DEV Community

Cover image for Java and BCP 47 Language Tags
John O'Conner
John O'Conner

Posted on

Java and BCP 47 Language Tags

Since Java 7, Java's Locale object has been updated to take on the role of a language tag as identified in RFC 5646 and BCP 47 specs. The newer language tag support gives developers the ability to be more precise in identifying language resources. The new Locale, used as a language tag, can be used to identify languages in this general form:

language[-script][-region]*(-variant)*(-extension)[-privateuse]
Enter fullscreen mode Exit fullscreen mode

Of course, you can continue to think of locale as a lang_region_variant identifier, but Java now uses the RFC 5646 spec to enhance the Locale class to support language, script, broader regions, and even newer extensions if needed. And if the rules for generating this string of identifiers seems intimidating, you can use the Locale.Builder class to build up the tag without worries of creating a malformed tag.

The primary language identifier is the almost the same item you've always known; it's an ISO 639 2-letter code or 3-letter code. The spec recommends using the shortest id possible.

The script is new. You can now add a proper script identifier that specifies the writing system used for the language. People can use multiple writing systems to write languages. For example, Japanese speakers/writers can use 3 or more different scripts for Japanese: kanji, hiragana, katakana, and even “romaji” or Latin script. Serbian is another language often written in either Latin or Cyrillic characters.

The region identifier was once limited to 2-letter ISO 3166 codes, but now you can also use the United Nations 3-digit macro geographical region codes in the region portion of a language tag. A macro geographical region identifies a larger region that comprises more than one country. For example, the UN currently defines Eastern Europe to be macro region 151 and includes 10 countries within it.

Finally, you can use variant, extension, and privateuse sub-tags to provide even more context for a language tag. See RFC 5646 for more details on these. I suggest that you also use the Locale.Builder class to assist if you need to use this level of detail.

Take a look at the Locale documentation for all the details on using these new features. They definitely give you much more control of how you identify and use language resources in your internationalized applications.

Top comments (0)