DEV Community πŸ‘©β€πŸ’»πŸ‘¨β€πŸ’»

James Moberg

Convert Unicode strings to ASCII with ColdFusion & JUnidecode

I’ve struggled for years to find the best way to convert accented Unicode characters (and other non-ASCII characters) to ASCII using ColdFusion. I’ve used regex, java.text.Normalizer, the ICU4J Transliterator and Apache Commons Lang3 StringUtils.stripAccents, and recently scrapped them all in favor of JUnidecode. JUnidecode is a Java port of the Text::Unidecode Perl module. The JUnidecode Java library has only one method: it takes a string and transliterates it to a valid 7-bit ASCII string (which, naturally, also strips diacritic marks).

Examples:

  • Москва becomes Moskva.
  • čeština becomes cestina.
  • Հայաստան becomes Hayastan.
  • Ελληνικά becomes Ellenika.
  • 北京 becomes Bei Jing.
  • Häuser Bäume Höfe Gärten becomes Hauser Baume Hofe Garten.
  • daß becomes dass.
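For comparison, here is roughly what the java.text.Normalizer approach mentioned above looks like. This is a minimal Java sketch of the common decompose-then-strip technique, not the author's earlier code, and the class and method names are hypothetical. It applies canonical decomposition (NFD) so that an accented letter splits into a base letter plus combining marks, then deletes the marks with a regex.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the Normalizer-based approach (not JUnidecode).
public class StripAccents {
    // Matches runs of Unicode "Mark" characters (combining diacritics, etc.).
    private static final Pattern COMBINING_MARKS = Pattern.compile("\\p{M}+");

    // Decompose accented letters (e.g. "ä" -> "a" + U+0308), then drop the marks.
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return COMBINING_MARKS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        String[] samples = {"čeština", "Häuser Bäume Höfe Gärten", "daß", "Москва"};
        for (String s : samples) {
            System.out.println(s + " -> " + stripAccents(s));
        }
    }
}
```

Applied to the examples above, this produces cestina and Hauser Baume Hofe Garten, but leaves daß and Москва unchanged: ß and Cyrillic letters do not decompose into a Latin base letter plus combining marks, so there is nothing to strip. Handling those cases takes transliteration, which is what JUnidecode provides.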

WARNING: Please be aware that JUnidecode doesn't handle emoji well. You may need to sanitize them (or convert them to aliases) using cf-emoji-java before converting to 7-bit ASCII.
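The post relies on cf-emoji-java for this step. Where that isn't available, one crude stand-in is to drop likely emoji before transliterating. The Java sketch below is an assumption-based fallback, not part of cf-emoji-java or JUnidecode, and its class and method names are hypothetical. It assumes that most emoji are encoded outside the Basic Multilingual Plane (above U+FFFF), so it removes every supplementary code point plus the zero-width joiner (U+200D) and emoji variation selector (U+FE0F) that emoji sequences use. Unlike cf-emoji-java, it cannot convert emoji to aliases.

```java
// Hypothetical fallback filter; a rough heuristic, not a real emoji detector.
public class EmojiPreFilter {
    // U+200D ZERO WIDTH JOINER glues multi-part emoji sequences together.
    private static final int ZWJ = 0x200D;
    // U+FE0F VARIATION SELECTOR-16 requests emoji-style rendering.
    private static final int VS16 = 0xFE0F;

    // Assumption: emoji are mostly supplementary code points. This drops every
    // code point above U+FFFF plus ZWJ and VS16, and keeps everything else.
    static String dropLikelyEmoji(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints()
             .filter(cp -> !Character.isSupplementaryCodePoint(cp) && cp != ZWJ && cp != VS16)
             .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // "Thanks " followed by U+1F44D THUMBS UP SIGN (a surrogate pair in UTF-16)
        System.out.println("[" + dropLikelyEmoji("Thanks \uD83D\uDC4D") + "]"); // prints "[Thanks ]"
    }
}
```

This heuristic both over-removes (rare CJK ideographs and historic scripts are also supplementary characters) and under-removes (BMP symbols such as U+2764 HEAVY BLACK HEART survive), so treat it as a fallback rather than a replacement for a real emoji library. Because ColdFusion runs on the JVM, a compiled class like this could be loaded with createObject("java", ...) once it is on the classpath.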

Here's a demo script I've written that has some generic test cases:
https://gist.github.com/JamoCA/6565bd4e2526b7c177a5f0cde3980d1c
