Class LdLocale

java.lang.Object
com.optimaize.langdetect.i18n.LdLocale

public final class LdLocale extends Object
A language-detector implementation of a Locale, similar to the java.util.Locale.

It represents an IETF BCP 47 tag, but does not implement all the features. Features can be added as needed.

It is constructed through the fromString(java.lang.String) factory method. The toString() method produces a parseable and persistable string.

The class is immutable.

The java.util.Locale cannot be used because it has issues for historical reasons, notably the language code conversion for Hebrew, Yiddish and Indonesian, and more. If one needs a Locale, it is simple to create one based on this object.
The ICU ULocale cannot be used because a) it has issues too (for our use case) and b) we're not using ICU in here [yet].
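
As a rough illustration of creating a java.util.Locale from the parts this class exposes, here is a minimal sketch. The helper class LocaleBridge and its method toLocale are hypothetical names and not part of the library; only java.util.Locale.Builder is a real JDK API. Note that the resulting Locale may normalize the codes it is given, which is exactly the behavior this class avoids.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of the library.
final class LocaleBridge {
    private LocaleBridge() {}

    /**
     * Builds a java.util.Locale from a language and optional script and region.
     * Pass null for an absent script or region.
     * Locale.Builder throws IllformedLocaleException on ill-formed input.
     */
    static Locale toLocale(String language, String script, String region) {
        Locale.Builder builder = new Locale.Builder().setLanguage(language);
        if (script != null) {
            builder.setScript(script);
        }
        if (region != null) {
            builder.setRegion(region);
        }
        return builder.build();
    }
}
```

With an LdLocale instance ld, the call would presumably look like LocaleBridge.toLocale(ld.getLanguage(), ld.getScript().orNull(), ld.getRegion().orNull()), since Guava's Optional offers orNull().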

This class does not perform any modifications on the input. The input is used as is, and the getters return it in exactly the same way. No standardization, canonicalization, or cleaning takes place.

The input is validated syntactically, but not for code existence. For example, the script code must be a syntactically valid ISO 15924 code such as "Latn" or "Cyrl", in the correct case, but whether the code actually exists is not checked. These code standards are not fixed: regional entities such as countries can change for political reasons, and languages are living entities. Certain codes may therefore exist only at some point in time (be introduced late, be deprecated or removed, or even be reassigned another meaning). It is not up to us to decide whether Kosovo is a country in 2015 or not. Anyone who needs to restrict input to a certain range of acceptable codes can validate them through other classes that have knowledge of the codes.
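
The difference between a syntactic check and an existence check can be sketched as follows. The class name ScriptSyntaxCheck and the regular expression are assumptions made for illustration; the actual private looksLikeScriptCode method of this class may be implemented differently.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; the pattern is an assumption, not the library's code.
final class ScriptSyntaxCheck {
    // Assumed shape of an ISO 15924 alpha-4 code:
    // one uppercase letter followed by three lowercase letters.
    private static final Pattern SCRIPT_SHAPE = Pattern.compile("[A-Z][a-z]{3}");

    private ScriptSyntaxCheck() {}

    /** Returns true if the string has the shape of a script code; existence is not checked. */
    static boolean looksLikeScriptCode(String s) {
        return s != null && SCRIPT_SHAPE.matcher(s).matches();
    }
}
```

Under this sketch "Latn" and "Cyrl" pass, "latn" and "LATN" fail because of their case, and a made-up code such as "Abcd" also passes, because only the shape is inspected.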

Language: as in BCP 47, the ISO 639-1 code must be used if there is one, for example "fr" for French. If not, the ISO 639-3 code should be used. Using ISO 639-2 is strongly discouraged. Right now this class enforces a 2- or 3-character code, but this may be relaxed in the future.

Script: Only ISO 15924, no discussion.

Region: same as for BCP 47. That means ISO 3166-1 alpha-2 and "UN M.49". I can imagine relaxing it in the future to also allow ISO 3166-2 codes. In most cases the "region" is a "country".
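
The language, script and region rules above can be sketched as a small parser and formatter. SimpleTag is a hypothetical stand-in, not the library's LdLocale. The subtag patterns are assumptions, in particular the lowercase requirement for the language and the fixed order language, script, region. Absent parts are represented by null here instead of Guava's Optional to keep the sketch free of dependencies.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration; not the library's LdLocale.
final class SimpleTag {
    // Assumed subtag shapes; the lowercase language requirement is a guess.
    private static final Pattern LANGUAGE = Pattern.compile("[a-z]{2,3}");
    private static final Pattern SCRIPT = Pattern.compile("[A-Z][a-z]{3}");
    private static final Pattern REGION = Pattern.compile("[A-Z]{2}|[0-9]{3}");

    final String language;
    final String script; // null when absent
    final String region; // null when absent

    private SimpleTag(String language, String script, String region) {
        this.language = language;
        this.script = script;
        this.region = region;
    }

    /** Parses "de", "de-Latn", "de-CH" or "de-Latn-CH"; the input is kept as is. */
    static SimpleTag parse(String tag) {
        String[] parts = tag.split("-", -1);
        if (parts.length > 3 || !LANGUAGE.matcher(parts[0]).matches()) {
            throw new IllegalArgumentException("Not a valid tag: " + tag);
        }
        String script = null;
        String region = null;
        for (int i = 1; i < parts.length; i++) {
            String part = parts[i];
            if (script == null && region == null && SCRIPT.matcher(part).matches()) {
                script = part; // a script may only come directly after the language
            } else if (region == null && REGION.matcher(part).matches()) {
                region = part; // a region must be the last subtag
            } else {
                throw new IllegalArgumentException("Unexpected subtag '" + part + "' in: " + tag);
            }
        }
        return new SimpleTag(parts[0], script, region);
    }

    /** Produces a string that parse() accepts again. */
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(language);
        if (script != null) {
            sb.append('-').append(script);
        }
        if (region != null) {
            sb.append('-').append(region);
        }
        return sb.toString();
    }
}
```

In this sketch SimpleTag.parse("es-419") yields the language "es" with the UN M.49 region "419", while SimpleTag.parse("de-CH-Latn") is rejected because the script follows the region.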

  • Field Details

    • language

      @NotNull private final String language
    • script

      @NotNull private final com.google.common.base.Optional<String> script
    • region

      @NotNull private final com.google.common.base.Optional<String> region
  • Constructor Details

    • LdLocale

      private LdLocale(@NotNull String language, @NotNull com.google.common.base.Optional<String> script, @NotNull com.google.common.base.Optional<String> region)
  • Method Details

    • fromString

      @NotNull public static LdLocale fromString(@NotNull String string)
      Parameters:
      string - The output of the toString() method.
      Returns:
      either a new or possibly a cached (immutable) instance.
    • looksLikeScriptCode

      private static boolean looksLikeScriptCode(String string)
    • looksLikeGeoCode3166_1

      private static boolean looksLikeGeoCode3166_1(String string)
    • looksLikeGeoCodeNumeric

      private static boolean looksLikeGeoCodeNumeric(String string)
    • assignLang

      private static String assignLang(String s)
    • toString

      public String toString()
      The output of this can be fed to the fromString() method.
      Overrides:
      toString in class Object
      Returns:
      for example "de" or "de-Latn" or "de-CH" or "de-Latn-CH", see class header.
    • getLanguage

      @NotNull public String getLanguage()
      Returns:
      ISO 639-1 or 639-3 language code, e.g. "fr" or "gsw", see class header.
    • getScript

      @NotNull public com.google.common.base.Optional<String> getScript()
      Returns:
      ISO 15924 script code, e.g. "Latn", see class header.
    • getRegion

      @NotNull public com.google.common.base.Optional<String> getRegion()
      Returns:
      ISO 3166-1 or UN M.49 code, e.g. "DE" or "150", see class header.
    • equals

      public boolean equals(Object o)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
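
Because fromString may return either a new or a cached instance, callers should compare instances with equals() rather than with ==. The following sketch shows what value equality over the three fields could look like. LocaleKey is a hypothetical class, and the assumption that equality covers exactly language, script and region is inferred from the field list, not confirmed by the source. Absent parts are modeled as null to avoid the Guava dependency.

```java
import java.util.Objects;

// Hypothetical value class for illustration; not the library's LdLocale.
final class LocaleKey {
    private final String language;
    private final String script; // null when absent
    private final String region; // null when absent

    LocaleKey(String language, String script, String region) {
        this.language = Objects.requireNonNull(language, "language");
        this.script = script;
        this.region = region;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof LocaleKey)) {
            return false;
        }
        LocaleKey other = (LocaleKey) o;
        // Plain string comparison: no case folding or canonicalization.
        return language.equals(other.language)
                && Objects.equals(script, other.script)
                && Objects.equals(region, other.region);
    }

    @Override
    public int hashCode() {
        return Objects.hash(language, script, region);
    }
}
```

Since no canonicalization happens, two keys that differ only in letter case, such as region "CH" versus "ch", are not equal in this sketch.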