DEV Community

Tristan Elliott


Parsing non-Latin based Twitch usernames in Kotlin

Table of contents

  1. Introduction
  2. The Problem
  3. The Solution

The code

My app on the Google play store

Introduction

  • Here is a quick reminder: whenever you parse usernames or other user-generated content, check whether your parser can handle non-Latin text.

The Problem

  • Recently I ran into an issue where the regex in my parsing code simply did not work on non-Latin alphabets. For example, suppose I wanted to parse the display-name from this string: display-name=CoalTheTroll;emotes=;flags=;id=3ceab6bd-de3f-4d05-8038-5cebdb2af1c7; :tmi.twitch.tv USERNOTICE #cohhcarnage
  • The typical code would look like this:
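fun userNoticeParsing(text: String): String {
    // Matches only ASCII letters, digits, and underscores after "display-name="
    val displayNamePattern = "display-name=([a-zA-Z0-9_]+)".toRegex()
    val displayNameMatch = displayNamePattern.find(text)
    // find() returns null when nothing matches, so !! throws a NullPointerException
    return displayNameMatch?.groupValues?.get(1)!!
}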

  • The code above works for Latin display names. However, it fails when the display name uses a non-Latin script. For example, a display name written in Chinese characters, such as 不橋小結, does not match the pattern, so find returns null and the !! operator throws a NullPointerException, crashing the code.
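  • To see the failure without crashing, here is a minimal sketch that uses the same ASCII-only pattern but returns a nullable String instead of applying !!. The names asciiPattern and asciiDisplayName are hypothetical and exist only for this illustration:

```kotlin
// Same ASCII-only pattern as in the example above.
val asciiPattern = "display-name=([a-zA-Z0-9_]+)".toRegex()

// Hypothetical helper: returns null instead of throwing when nothing matches.
fun asciiDisplayName(text: String): String? =
    asciiPattern.find(text)?.groupValues?.get(1)
```

  • Calling asciiDisplayName("display-name=CoalTheTroll;emotes=;") returns "CoalTheTroll", while asciiDisplayName("display-name=不橋小結;emotes=;") returns null, because no character of 不橋小結 is in the class [a-zA-Z0-9_]. That null is what !! turns into a crash.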

The Solution

  • A simple solution (some might say lazy) is to stop restricting the match to ASCII characters. With regex, we simply say: match every character after display-name= up to the next semicolon. The code would look like this:
fun userNoticeParsing(text: String): String {
    val displayNamePattern = "display-name=([^;]+)".toRegex()
    val displayNameMatch = displayNamePattern.find(text)
    return displayNameMatch?.groupValues?.get(1) ?: "username"
}
  • With the regex above, display-name=([^;]+), we are saying: match display-name= and then one or more characters that are not a ;, stopping at the first ;. The parentheses () create a capture group, which lets us retrieve just the part we actually want: groupValues[0] is the whole match, and groupValues[1] is the display name. Lastly, we use the ?: (Elvis) operator to say: if no match is found, return "username".
  • Now, even with display names written in non-Latin scripts, such as Chinese characters, our code will work:
val text = "display-name=不橋小結;emotes=;flags=;id=3ceab6bd-de3f-4d05-8038-5cebdb2af1c7; :tmi.twitch.tv USERNOTICE #cohhcarnage"

fun userNoticeParsing(text: String): String {
    val displayNamePattern = "display-name=([^;]+)".toRegex()
    val displayNameMatch = displayNamePattern.find(text)
    return displayNameMatch?.groupValues?.get(1) ?: "username"
}

fun main() {
    val expectedUsername = "不橋小結"
    val actualUsername = userNoticeParsing(text)
    println(expectedUsername == actualUsername) // prints true
}
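
  • If matching everything up to the semicolon feels too permissive, one possible alternative (not part of my original code) is to keep an allow-list but build it from Unicode property classes instead of ASCII ranges. The sketch below assumes Kotlin on the JVM, where \p{L}, \p{M}, and \p{N} match letters, combining marks, and digits in any script. The names unicodeNamePattern and strictDisplayName are hypothetical, and which characters Twitch actually permits in display names is an assumption here, not something I have verified:

```kotlin
// Assumption: display names consist of letters, combining marks, digits, and underscores.
// \p{L} = any letter, \p{M} = combining marks, \p{N} = any digit, in any script.
val unicodeNamePattern = "display-name=([\\p{L}\\p{M}\\p{N}_]+)".toRegex()

// Hypothetical helper: returns the captured name, or null when nothing matches.
fun strictDisplayName(text: String): String? =
    unicodeNamePattern.find(text)?.groupValues?.get(1)
```

  • With this pattern, both display-name=CoalTheTroll; and display-name=不橋小結; yield their names, while a string with no display-name tag yields null. The trade-off is that any character outside the allow-list ends the match early, so the [^;]+ version is the safer default when the allowed character set is unknown.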


Conclusion

  • Thank you for taking the time out of your day to read this blog post of mine. If you have any questions or concerns please comment below or reach out to me on Twitter.
