You can use ML Kit to identify the language of a string of text. You can get the string's most likely language as well as confidence scores for all of the string's possible languages.
ML Kit recognizes text in more than 100 different languages in their native scripts. In addition, romanized text can be recognized for Arabic, Bulgarian, Chinese, Greek, Hindi, Japanese, and Russian. See the complete list of supported languages and scripts.
Feature | Bundled | Unbundled |
---|---|---|
Library name | com.google.mlkit:language-id | com.google.android.gms:play-services-mlkit-language-id |
Implementation | Model is statically linked to your app at build time. | Model is dynamically downloaded via Google Play Services. |
App size impact | About 900 KB size increase. | About 200 KB size increase. |
Initialization time | Model is available immediately. | Might have to wait for model to download before first use. |
Try it out
- Play around with the sample app to see an example usage of this API.
Before you begin
In your project-level build.gradle file, make sure to include Google's Maven repository in both your buildscript and allprojects sections.
Add the dependencies for the ML Kit Android libraries to your module's app-level Gradle file, which is usually app/build.gradle. Choose one of the following dependencies based on your needs:
For bundling the model with your app:
    dependencies {
      // ...
      // Use this dependency to bundle the model with your app
      implementation 'com.google.mlkit:language-id:17.0.6'
    }
For using the model in Google Play Services:
    dependencies {
      // ...
      // Use this dependency to use the dynamically downloaded model in Google Play Services
      implementation 'com.google.android.gms:play-services-mlkit-language-id:17.0.0'
    }
If you choose to use the model in Google Play Services, you can configure your app to automatically download the model to the device after your app is installed from the Play Store. To do so, add the following declaration to your app's AndroidManifest.xml file:
    <application ...>
      ...
      <meta-data
          android:name="com.google.mlkit.vision.DEPENDENCIES"
          android:value="langid" />
      <!-- To use multiple models: android:value="langid,model2,model3" -->
    </application>
You can also explicitly check the model availability and request download through Google Play services ModuleInstallClient API.
If you don't enable install-time model downloads or request explicit download, the model is downloaded the first time you run the identifier. Requests you make before the download has completed produce no results.
Identify the language of a string
To identify the language of a string, call LanguageIdentification.getClient() to get an instance of LanguageIdentifier, and then pass the string to the identifyLanguage() method of LanguageIdentifier.
For example:
Kotlin
    import android.util.Log
    import com.google.mlkit.nl.languageid.LanguageIdentification

    // `text` is the string to classify and `TAG` is your log tag.
    val languageIdentifier = LanguageIdentification.getClient()
    languageIdentifier.identifyLanguage(text)
        .addOnSuccessListener { languageCode ->
            if (languageCode == "und") {
                Log.i(TAG, "Can't identify language.")
            } else {
                Log.i(TAG, "Language: $languageCode")
            }
        }
        .addOnFailureListener {
            // Model couldn't be loaded or other internal error.
            // ...
        }
Java
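    import android.util.Log;
    import androidx.annotation.NonNull;
    import androidx.annotation.Nullable;
    import com.google.android.gms.tasks.OnFailureListener;
    import com.google.android.gms.tasks.OnSuccessListener;
    import com.google.mlkit.nl.languageid.LanguageIdentification;
    import com.google.mlkit.nl.languageid.LanguageIdentifier;

    // `text` is the string to classify and `TAG` is your log tag.
    LanguageIdentifier languageIdentifier =
            LanguageIdentification.getClient();
    languageIdentifier.identifyLanguage(text)
            .addOnSuccessListener(
                    new OnSuccessListener<String>() {
                        @Override
                        public void onSuccess(@Nullable String languageCode) {
                            // The parameter is @Nullable, so guard against null
                            // before comparing with "und".
                            if (languageCode == null || "und".equals(languageCode)) {
                                Log.i(TAG, "Can't identify language.");
                            } else {
                                Log.i(TAG, "Language: " + languageCode);
                            }
                        }
                    })
            .addOnFailureListener(
                    new OnFailureListener() {
                        @Override
                        public void onFailure(@NonNull Exception e) {
                            // Model couldn't be loaded or other internal error.
                            // ...
                        }
                    });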
If the call succeeds, a BCP-47 language code is passed to the success listener, indicating the language of the text. If no language is confidently detected, the code und (undetermined) is passed.
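Because the result is a plain BCP-47 string, it can be turned into a readable name with the standard java.util.Locale class. The following is a minimal sketch, not part of ML Kit: the class name LanguageCodeDisplay, the describe() helper, and the "Unknown language" fallback text are hypothetical, and it assumes romanized text is reported with a script subtag such as ja-Latn.

```java
import java.util.Locale;

final class LanguageCodeDisplay {
    /** Code passed to the success listener when no language is confidently detected. */
    static final String UNDETERMINED = "und";

    /** Returns an English display name for a BCP-47 code, or a fallback for "und" or null. */
    static String describe(String languageCode) {
        if (languageCode == null || UNDETERMINED.equals(languageCode)) {
            return "Unknown language"; // hypothetical fallback text
        }
        Locale locale = Locale.forLanguageTag(languageCode);
        String name = locale.getDisplayLanguage(Locale.ENGLISH);
        // Assumption: romanized text carries a script subtag such as "Latn".
        String script = locale.getScript();
        return script.isEmpty() ? name : name + " (" + script + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe("fr"));
        System.out.println(describe("ja-Latn"));
        System.out.println(describe("und"));
    }
}
```

In an app, a helper like this would typically be called from inside the success listener with the code it receives.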
By default, ML Kit returns a value other than und only when it identifies the language with a confidence value of at least 0.5. You can change this threshold by passing a LanguageIdentificationOptions object to getClient():
Kotlin
    val languageIdentifier = LanguageIdentification
        .getClient(LanguageIdentificationOptions.Builder()
            .setConfidenceThreshold(0.34f)
            .build())
Java
    LanguageIdentifier languageIdentifier = LanguageIdentification.getClient(
            new LanguageIdentificationOptions.Builder()
                    .setConfidenceThreshold(0.34f)
                    .build());
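To make the effect of the threshold concrete, the rule described above can be modeled in a few lines of plain Java. This is only an illustration of the documented behavior, not ML Kit's actual implementation: the ThresholdSketch class and its resolve() method are hypothetical, and the confidence values are made up.

```java
final class ThresholdSketch {
    static final String UNDETERMINED = "und";

    /**
     * Hypothetical model of the documented rule: the best guess is reported
     * only if its confidence reaches the threshold; otherwise "und" is reported.
     */
    static String resolve(String bestGuess, float confidence, float threshold) {
        return confidence >= threshold ? bestGuess : UNDETERMINED;
    }

    public static void main(String[] args) {
        // Made-up confidence of 0.40 for Spanish.
        System.out.println(resolve("es", 0.40f, 0.5f));  // default threshold
        System.out.println(resolve("es", 0.40f, 0.34f)); // lowered threshold
    }
}
```

Under this model, a guess with confidence 0.40 is reported as und at the default threshold of 0.5 but as es once the threshold is lowered to 0.34. Lowering the threshold trades fewer und results for more low-confidence answers.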
Get the possible languages of a string
To get the confidence values of a string's most likely languages, get an instance of LanguageIdentifier and then pass the string to the identifyPossibleLanguages() method.
For example:
Kotlin
    val languageIdentifier = LanguageIdentification.getClient()
    languageIdentifier.identifyPossibleLanguages(text)
        .addOnSuccessListener { identifiedLanguages ->
            for (identifiedLanguage in identifiedLanguages) {
                val language = identifiedLanguage.languageTag
                val confidence = identifiedLanguage.confidence
                Log.i(TAG, "$language $confidence")
            }
        }
        .addOnFailureListener {
            // Model couldn't be loaded or other internal error.
            // ...
        }
Java
    LanguageIdentifier languageIdentifier =
            LanguageIdentification.getClient();
    languageIdentifier.identifyPossibleLanguages(text)
            .addOnSuccessListener(new OnSuccessListener<List<IdentifiedLanguage>>() {
                @Override
                public void onSuccess(List<IdentifiedLanguage> identifiedLanguages) {
                    for (IdentifiedLanguage identifiedLanguage : identifiedLanguages) {
                        String language = identifiedLanguage.getLanguageTag();
                        float confidence = identifiedLanguage.getConfidence();
                        Log.i(TAG, language + " (" + confidence + ")");
                    }
                }
            })
            .addOnFailureListener(
                    new OnFailureListener() {
                        @Override
                        public void onFailure(@NonNull Exception e) {
                            // Model couldn't be loaded or other internal error.
                            // ...
                        }
                    });
If the call succeeds, a list of IdentifiedLanguage objects is passed to the success listener. From each object, you can get the language's BCP-47 code and the confidence that the string is in that language. These values indicate the confidence that the entire string is in the given language; ML Kit doesn't identify multiple languages in a single string.
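Once you have the candidates, a common next step is to rank them and decide whether the top one is convincing enough to act on. The sketch below shows one way to do that in plain Java. The Candidate class is a hypothetical stand-in for IdentifiedLanguage so that the example runs without the ML Kit dependency, and the ranked() and hasClearWinner() helpers, the margin value, and the confidence numbers are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class TopCandidateSketch {
    /** Hypothetical stand-in for IdentifiedLanguage: a BCP-47 tag plus a confidence. */
    static final class Candidate {
        final String languageTag;
        final float confidence;

        Candidate(String languageTag, float confidence) {
            this.languageTag = languageTag;
            this.confidence = confidence;
        }
    }

    /** Returns a copy of the candidates ordered from most to least confident. */
    static List<Candidate> ranked(List<Candidate> candidates) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((Candidate c) -> c.confidence).reversed());
        return sorted;
    }

    /** True when the best candidate leads the runner-up by at least the given margin. */
    static boolean hasClearWinner(List<Candidate> candidates, float margin) {
        List<Candidate> sorted = ranked(candidates);
        if (sorted.isEmpty()) {
            return false;
        }
        if (sorted.size() == 1) {
            return true;
        }
        return sorted.get(0).confidence - sorted.get(1).confidence >= margin;
    }

    public static void main(String[] args) {
        // Made-up values for illustration only.
        List<Candidate> candidates = Arrays.asList(
                new Candidate("pt", 0.21f),
                new Candidate("es", 0.72f),
                new Candidate("it", 0.05f));
        System.out.println(ranked(candidates).get(0).languageTag);
        System.out.println(hasClearWinner(candidates, 0.3f));
    }
}
```

With the real API, you would read the tag and confidence from each IdentifiedLanguage through getLanguageTag() and getConfidence() instead of the stand-in fields. A margin check like this can be useful for short or mixed strings, where several languages may score close together.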
By default, ML Kit returns only languages with confidence values of at least 0.01. You can change this threshold by passing a LanguageIdentificationOptions object to getClient():
Kotlin
    val languageIdentifier = LanguageIdentification
        .getClient(LanguageIdentificationOptions.Builder()
            .setConfidenceThreshold(0.5f)
            .build())
Java
    LanguageIdentifier languageIdentifier = LanguageIdentification.getClient(
            new LanguageIdentificationOptions.Builder()
                    .setConfidenceThreshold(0.5f)
                    .build());
If no language meets this threshold, the list has one item, with the value und.
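Callers usually want to treat that single und entry as "no result" rather than as a real candidate. A small check along these lines may help. It is a sketch under assumptions: UndeterminedCheck and isUndetermined() are hypothetical names, and the method takes the language tags you have already extracted from the result list rather than ML Kit objects.

```java
import java.util.Arrays;
import java.util.List;

final class UndeterminedCheck {
    /** True when the result consists of exactly one entry whose tag is "und". */
    static boolean isUndetermined(List<String> languageTags) {
        return languageTags.size() == 1 && "und".equals(languageTags.get(0));
    }

    public static void main(String[] args) {
        System.out.println(isUndetermined(Arrays.asList("und")));
        System.out.println(isUndetermined(Arrays.asList("en", "de")));
    }
}
```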