Simple methods for language detection

Ganyan Tech
3 min read · Jun 3, 2021

Let’s talk about some Python packages that detect the language of a text.

Language detection is a common task in natural language processing, and there are many Python packages that can do it.

In this article we will discuss langid, langdetect, and TextBlob.

1. langid

The langid package comes pre-trained on a large number of languages (currently 97).

The langid package requires Python >= 2.7 and numpy. The main script langid/langid.py is cross-compatible with both Python 2 and Python 3, but the accompanying training tools are still Python 2 only.

langid.py is WSGI-compliant. langid.py will use fapws3 as a web server if available, and default to wsgiref.simple_server otherwise.

You can use this GitHub link to explore more about the langid package.

import langid

sample_data = [
    'Azaindole derivatives and their use as antithrombotic agents ',
    'Azaindol derivate und ihre Verwendung als antithrombotische Wirkstoffe ',
    "Dérivés de l'azaindole et leur utilisation comme agents antithrombotiques ",
    ''
]

for lang in sample_data:
    # classify() returns a (language_code, score) tuple
    language = langid.classify(lang)
    print(f"{lang} is related to {language}")
And the output is like below:

Azaindole derivatives and their use as antithrombotic agents  is related to ('en', -173.22279596328735)
Azaindol derivate und ihre Verwendung als antithrombotische Wirkstoffe  is related to ('de', -239.93667697906494)
Dérivés de l'azaindole et leur utilisation comme agents antithrombotiques  is related to ('fr', -274.12151527404785)
 is related to ('en', 9.061840057373047)
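
langid can also be constrained to a known subset of candidate languages, and it can report a normalized probability instead of the raw log score shown above. The snippet below is a minimal sketch based on the langid.py README (set_languages() and LanguageIdentifier.from_modelstring() are documented there; the example strings are my own):

import langid
from langid.langid import LanguageIdentifier, model

# Restrict classification to a known subset of languages
langid.set_languages(['en', 'de', 'fr'])
print(langid.classify("Dérivés de l'azaindole"))  # e.g. ('fr', <log score>)

# Get a confidence value normalized to a 0-1 probability
identifier = LanguageIdentifier.from_modelstring(model, norm_probs=True)
print(identifier.classify('Azaindole derivatives and their use as antithrombotic agents'))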

2. langdetect

This module is a port of Google’s language-detection library and supports 55 languages. It is not part of Python’s standard library, so it needs to be installed separately.

You can use this GitHub link to explore more about the langdetect package.

from langdetect import detect

# Note: detect() raises an exception on empty strings, so the empty item is omitted here
sample_data = [
    'Azaindole derivatives and their use as antithrombotic agents ',
    'Azaindol derivate und ihre Verwendung als antithrombotische Wirkstoffe ',
    "Dérivés de l'azaindole et leur utilisation comme agents antithrombotiques ",
]

for lang in sample_data:
    language = detect(lang)
    print(language)
And the output will be like below:
en
de
fr
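
If you also want to see how confident the prediction is, langdetect provides detect_langs(), which returns the candidate languages with probabilities. Below is a minimal sketch; wrapping the call in a try/except for LangDetectException (raised for empty or undetectable text, such as the empty string in the sample data above) is my own addition:

from langdetect import detect_langs
from langdetect.lang_detect_exception import LangDetectException

for lang in ['Azaindole derivatives and their use as antithrombotic agents', '']:
    try:
        # detect_langs() lists candidates with probabilities, e.g. [en:0.9999...]
        print(detect_langs(lang))
    except LangDetectException:
        print(f"could not detect a language for {lang!r}")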

NOTE

The language detection algorithm is non-deterministic, which means that if you run it on a text that is too short or too ambiguous, you might get different results every time you run it.

To enforce consistent results, run the following code before the first language detection:

from langdetect import DetectorFactory
DetectorFactory.seed = 0
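
For example, with the seed fixed, detecting a very short, ambiguous string several times now returns the same answer on every run (the example string is my own; without the seed, repeated calls on such input can flip between languages):

from langdetect import detect, DetectorFactory

DetectorFactory.seed = 0

# A short, ambiguous word that langdetect may otherwise classify inconsistently
for _ in range(3):
    print(detect('agents'))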

3. TextBlob

The TextBlob package requires the NLTK package and uses Google Translate for language detection.

Note: This solution requires internet access, because TextBlob detects languages by calling Google Translate’s API.

from textblob import TextBlob

# Empty strings are omitted: detect_language() needs non-trivial input
sample_data = [
    'Azaindole derivatives and their use as antithrombotic agents ',
    'Azaindol derivate und ihre Verwendung als antithrombotische Wirkstoffe ',
    "Dérivés de l'azaindole et leur utilisation comme agents antithrombotiques ",
]

for lang in sample_data:
    b = TextBlob(lang)
    print(b.detect_language())
And the output will be like below:
en
de
fr
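
Since TextBlob talks to the same Google Translate API, it can also translate the text it just detected. The sketch below is my own illustration (translate(to=...) is TextBlob’s documented method, and it needs internet access just like detect_language()):

from textblob import TextBlob

b = TextBlob("Dérivés de l'azaindole et leur utilisation comme agents antithrombotiques")
print(b.detect_language())   # 'fr'
print(b.translate(to='en'))  # English translation of the French title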

In the next article I will try to cover three more libraries.

Conclusion
In this article, I have tried to explain different Python libraries for detecting languages.
There are many more packages for language detection, such as spacy-langdetect, Pycld2, polyglot, Chardet, guess_language, fasttext, pycld3 and Googletrans.
Language detection is one of the key activities in the NLP process.
Hopefully, this article will help you.

Thanks for reading…
