DEV Community

Cover image for How To Check if a Text Have Weird Characters
Ahmed Zeno
Ahmed Zeno

Posted on

5 1

How To Check if a Text Have Weird Characters

So how to know if ur text have strange characters or not ?

I've got a bug ticket from our QA says that the text style doesn't display as it should be when the language is Japanese.

In the beginning I tried to check the css and played around to find a css solution that supports all browsers and i thought it might be a font issue , then i realised that i need to do some changes on the code, so i decided i should check if the text has this weird cases(alphabets) or not.

so I wanted to do a quick a check if my text include Japanese characters and i ended up using this code.
It usages regular expressions

regex = /[\u3000-\u303F]|[\u3040-\u309F]|[\u30A0-\u30FF]|[\uFF00-\uFFEF]|[\u4E00-\u9FAF]|[\u2605-\u2606]|[\u2190-\u2195]|\u203B/g;
var input = "ビッグファームへようこそ!";

if(regex.test(input)) {
console.log("Japanese characters found")
}
else {
console.log("No Japanese characters");
}

Alt Text

And it worked for me and I started to understand the issue ,but what if u want to check the other languages like Chinese, Deutsche, Russian …….etc

Ive tried to write another RegExpr and use the .test() with it to get the result

var format = /[ !@#$%^&*()_+-=[]{};':"\|,.<>\/?]/; // ^ ^ document.write(format.test("My@string-with(some%text)") + "
"); document.write(format.test("My string with spaces") + "
"); document.write(format.test("MyStringContainingNoSpecialChars"));

And this one was also good,
The anchors (like ^ start of string/line, $ end of string/line and \b word boundaries) can restrict matches at specific places in a string. When using ^ the regex engine checks if the next subpattern appears right at the start of the string (or line if /m modifier is declared in the regex).

Same case with $: the preceding subpattern should match right at the end of the string.

but it's also include the spaces and counted as strange character so what to do:-

Alt Text

By removing the anchors, and the quantifier *. The * quantifier that would match even an empty string, thus we must remove it in order to actually check for the presence of at least 1 special character (actually, without any quantifiers we check for exactly one occurrence, same as if we were using {1} limiting quantifier).

After some searching for such an issue found article that talking about this issue and it was like this :-

The important thing , you have to define what is the strange or special characters for you :-

All chars other than ASCII chars: /[^\x00-\x7F]/ (demo)
All chars other than printable ASCII chars: /[^ -~]/ (demo)
Any printable ASCII chars other than space, letters and digits: /[!-\/:-@[-`{-~]/ (demo)
Any Unicode punctuation proper chars, the \p{P} Unicode property class:
ECMAScript 2018: /\p{P}/u
ES6+:

/[!-#%-*,-\/:;?@[-]_{}\u00A1\u00A7\u00AB\u00B6\u00B7\u00BB\u00BF\u037E\u0387\u055A-\u055F\u0589\u058A\u05BE\u05C0\u05C3\u05C6\u05F3\u05F4\u0609\u060A\u060C\u060D\u061B\u061E\u061F\u066A-\u066D\u06D4\u0700-\u070D\u07F7-\u07F9\u0830-\u083E\u085E\u0964\u0965\u0970\u09FD\u0A76\u0AF0\u0C84\u0DF4\u0E4F\u0E5A\u0E5B\u0F04-\u0F12\u0F14\u0F3A-\u0F3D\u0F85\u0FD0-\u0FD4\u0FD9\u0FDA\u104A-\u104F\u10FB\u1360-\u1368\u1400\u166D\u166E\u169B\u169C\u16EB-\u16ED\u1735\u1736\u17D4-\u17D6\u17D8-\u17DA\u1800-\u180A\u1944\u1945\u1A1E\u1A1F\u1AA0-\u1AA6\u1AA8-\u1AAD\u1B5A-\u1B60\u1BFC-\u1BFF\u1C3B-\u1C3F\u1C7E\u1C7F\u1CC0-\u1CC7\u1CD3\u2010-\u2027\u2030-\u2043\u2045-\u2051\u2053-\u205E\u207D\u207E\u208D\u208E\u2308-\u230B\u2329\u232A\u2768-\u2775\u27C5\u27C6\u27E6-\u27EF\u2983-\u2998\u29D8-\u29DB\u29FC\u29FD\u2CF9-\u2CFC\u2CFE\u2CFF\u2D70\u2E00-\u2E2E\u2E30-\u2E4E\u3001-\u3003\u3008-\u3011\u3014-\u301F\u3030\u303D\u30A0\u30FB\uA4FE\uA4FF\uA60D-\uA60F\uA673\uA67E\uA6F2-\uA6F7\uA874-\uA877\uA8CE\uA8CF\uA8F8-\uA8FA\uA8FC\uA92E\uA92F\uA95F\uA9C1-\uA9CD\uA9DE\uA9DF\uAA5C-\uAA5F\uAADE\uAADF\uAAF0\uAAF1\uABEB\uFD3E\uFD3F\uFE10-\uFE19\uFE30-\uFE52\uFE54-\uFE61\uFE63\uFE68\uFE6A\uFE6B\uFF01-\uFF03\uFF05-\uFF0A\uFF0C-\uFF0F\uFF1A\uFF1B\uFF1F\uFF20\uFF3B-\uFF3D\uFF3F\uFF5B\uFF5D\uFF5F-\uFF65\u{10100}-\u{10102}\u{1039F}\u{103D0}\u{1056F}\u{10857}\u{1091F}\u{1093F}\u{10A50}-\u{10A58}\u{10A7F}\u{10AF0}-\u{10AF6}\u{10B39}-\u{10B3F}\u{10B99}-\u{10B9C}\u{10F55}-\u{10F59}\u{11047}-\u{1104D}\u{110BB}\u{110BC}\u{110BE}-\u{110C1}\u{11140}-\u{11143}\u{11174}\u{11175}\u{111C5}-\u{111C8}\u{111CD}\u{111DB}\u{111DD}-\u{111DF}\u{11238}-\u{1123D}\u{112A9}\u{1144B}-\u{1144F}\u{1145B}\u{1145D}\u{114C6}\u{115C1}-\u{115D7}\u{11641}-\u{11643}\u{11660}-\u{1166C}\u{1173C}-\u{1173E}\u{1183B}\u{11A3F}-\u{11A46}\u{11A9A}-\u{11A9C}\u{11A9E}-\u{11AA2}\u{11C41}-\u{11C45}\u{11C70}\u{11C71}\u{11EF7}\u{11EF8}\u{12470}-\u{12474}\u{16A6E}\u{16A6F}\u{16AF5}\u{16B37}-\u{16B3B}\u{16B44}\u{16E97}-\u{16E9A}\u{1BC9F}\u{1DA87}-\u{1DA8B}\u{1E95E}\u{1E95F}]/u

Alt Text

Sentry image

See why 4M developers consider Sentry, “not bad.”

Fixing code doesn’t have to be the worst part of your day. Learn how Sentry can help.

Learn more

Top comments (0)