Background
I previously wrote about the hidden pitfalls in JavaScript character counting. The issues go beyond the well-known surrogate pair problem:
| Issue | JS .length | Server receives |
|---|---|---|
| Emoji 🎉 | 2 | 1 character |
| a\nb\nc (textarea) | 5 | 7 chars (\n → \r\n) |
| Lone surrogate \uD800 | 1 | MySQL error 💥 |
That last one is especially problematic: a lone surrogate can cause your database insert to fail entirely.
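The first two rows of the table can be reproduced with plain string operations. This is a minimal sketch rather than the library's code: `countCodePoints` and `toCrlf` are hypothetical helper names, and the CRLF step assumes the textarea value is normalized on submission as the table describes.

```typescript
// Hypothetical helpers for illustration; they are not part of true-form-length.

// Array.from iterates a string by code point, so a surrogate pair counts once.
function countCodePoints(text: string): number {
  return Array.from(text).length
}

// Rewrite every newline variant as CRLF, the form a submitted textarea value takes.
function toCrlf(text: string): string {
  return text.replace(/\r\n|\r|\n/g, '\r\n')
}

console.log('🎉'.length)              // 2 (UTF-16 code units)
console.log(countCodePoints('🎉'))    // 1
console.log('a\nb\nc'.length)         // 5
console.log(toCrlf('a\nb\nc').length) // 7
```

The regular expression lists `\r\n` first so that input already containing CRLF is not doubled.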
I kept running into these issues in my own projects, so I ended up creating a small library to handle them. It might be useful for others dealing with similar problems, so I'm sharing it here.
What is true-form-length
true-form-length is a zero-dependency TypeScript library that counts characters as your server will receive them.
npm install true-form-length
Features
- Accurate character counting - Counts code points, not UTF-16 code units
- CRLF normalization - Matches HTTP form submission behavior
- Lone surrogate detection - Identifies and removes invalid UTF-16 sequences
- UTF-8 byte length - For database byte-limit validation
- URL-encoded length - For query parameter validation
- React, Vue, and Lit components - Drop-in UI components
Basic Usage
import { countChars } from 'true-form-length'
// Simple counting
const result = countChars('Hello 🎉')
console.log(result.length) // 7 (not 8)
console.log(result.byteLength) // 10 (UTF-8 bytes)
// Textarea with newlines
const textarea = countChars('Line1\nLine2')
console.log(textarea.length) // 12 (11 code points + 1 for \n → \r\n)
The countChars function returns:
interface CountResult {
  length: number // Code points (MySQL CHAR_LENGTH compatible)
  byteLength: number // UTF-8 bytes (MySQL LENGTH compatible)
  urlEncodedLength: number // Length after percent-encoding
  hasLoneSurrogate: boolean
  newlineCount: number
  wordCount: number
  sentenceCount: number
  readingTime: number // Minutes (200 wpm)
}
Validation
Direct Validation
import { isValidLength, isValidByteLength } from 'true-form-length'
// VARCHAR(255) check
isValidLength(userInput, 255)
// Index byte limit check (MySQL InnoDB: 767 bytes for COMPACT/REDUNDANT row formats)
isValidByteLength(userInput, 767)
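The two checks can disagree because a character limit and a byte limit measure different things. The sketch below shows the gap using the standard `TextEncoder` API; `utf8ByteLength` is a hypothetical helper, and the library's own `isValidByteLength` may be implemented differently.

```typescript
// Hypothetical helper for illustration; not the library's implementation.
function utf8ByteLength(text: string): number {
  // TextEncoder always encodes to UTF-8.
  return new TextEncoder().encode(text).length
}

const ascii = 'a'.repeat(255)  // 1 byte per character
const kana = 'あ'.repeat(255)  // 3 bytes per character
const emoji = '🎉'.repeat(255) // 4 bytes per character

console.log(utf8ByteLength(ascii)) // 255
console.log(utf8ByteLength(kana))  // 765
console.log(utf8ByteLength(emoji)) // 1020
```

All three strings are 255 characters long, yet only the last one exceeds a 767-byte limit.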
Zod Integration
import { z } from 'zod'
import { maxLength, maxByteLength, noLoneSurrogates } from 'true-form-length'
const schema = z.object({
  username: z.string()
    .refine(maxLength(255), 'Must be 255 characters or less')
    .refine(noLoneSurrogates(), 'Contains invalid characters'),
  bio: z.string()
    .refine(maxByteLength(767), 'Too long for database index'),
})
Yup Integration
import * as yup from 'yup'
import { maxLength, noLoneSurrogates } from 'true-form-length'
const schema = yup.object({
  username: yup.string()
    .test('maxLength', 'Too long', maxLength(255))
    .test('valid', 'Invalid characters', noLoneSurrogates()),
})
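Both integrations pass the return value of `maxLength(255)` straight to the schema, which suggests it is a plain predicate of the form `(value) => boolean`. Here is a minimal sketch of such a factory under that assumption. `maxCodePoints` and its counting rule are hypothetical stand-ins, not the library's implementation.

```typescript
// Hypothetical predicate factory; a stand-in for the shape maxLength() appears to have.
type StringPredicate = (value: string | undefined) => boolean

function maxCodePoints(limit: number): StringPredicate {
  return (value) => {
    if (value === undefined) return true // leave "required" checks to the schema
    // Normalize newlines to CRLF, then count code points: the UTF-16 length
    // minus one for every valid surrogate pair.
    const normalized = value.replace(/\r\n|\r|\n/g, '\r\n')
    const pairs = normalized.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g)
    const length = normalized.length - (pairs ? pairs.length : 0)
    return length <= limit
  }
}

const fitsPost = maxCodePoints(280)
console.log(fitsPost('🎉'.repeat(280))) // true: 280 code points, 560 UTF-16 units
console.log(fitsPost('a'.repeat(281)))  // false
```

Accepting `undefined` keeps the predicate usable with validators that run tests on optional fields.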
React Components
The library includes Radix UI-style compound components:
import { useState } from 'react'
import { Textarea } from 'true-form-length/react/textarea'
function PostEditor() {
  const [content, setContent] = useState('')
  return (
    <Textarea.Root
      maxLength={280}
      value={content}
      onValueChange={setContent}
      warningThreshold={0.9}
    >
      <Textarea.Input placeholder="What's happening?" />
      <Textarea.Counter />
      <Textarea.Error>Too many characters!</Textarea.Error>
      <Textarea.Progress />
    </Textarea.Root>
  )
}
Styling with data attributes:
textarea[data-state="over"] {
  border-color: red;
}
textarea[data-state="warning"] {
  border-color: orange;
}
/* Tailwind */
.counter {
  @apply text-gray-500 data-[state=over]:text-red-500 data-[state=warning]:text-amber-500;
}
Vue Components
<script setup lang="ts">
import { ref } from 'vue'
import * as Textarea from 'true-form-length/vue/textarea'
const content = ref('')
</script>
<template>
  <Textarea.Root v-model="content" :max-length="280" :warning-threshold="0.9">
    <Textarea.Input placeholder="What's happening?" />
    <Textarea.Counter />
    <Textarea.Error>Too many characters!</Textarea.Error>
  </Textarea.Root>
</template>
Presets
Built-in constants for common limits:
import {
  countChars,
  X_LIMIT, // 280
  SMS_LIMIT, // 160
  MYSQL_VARCHAR, // 255
  MYSQL_TEXT, // 65535
  INSTAGRAM_BIO, // 150
  LINKEDIN_POST, // 3000
  YOUTUBE_TITLE, // 100
  TIKTOK_CAPTION, // 2200
  X_URL_LENGTH, // 23 (t.co wrapper)
} from 'true-form-length'
// X-style URL counting (all URLs = 23 chars)
const tweet = 'Check out https://example.com/very/long/path'
const result = countChars(tweet, { urlLength: X_URL_LENGTH })
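One way to picture fixed-length URL counting is to swap each URL for a placeholder of the fixed length before counting. The sketch below does that with a deliberately simplified pattern. `URL_PATTERN` and `countWithFixedUrlLength` are hypothetical, and the library's URL detection is likely more thorough.

```typescript
// Hypothetical sketch; the URL pattern is simplified and may not match
// what the library (or X itself) treats as a URL.
const URL_PATTERN = /https?:\/\/[^\s]+/g

function countWithFixedUrlLength(text: string, urlLength: number): number {
  // Swap each URL for a placeholder of the fixed length, then count code points.
  const placeholder = 'x'.repeat(urlLength)
  const replaced = text.replace(URL_PATTERN, placeholder)
  const pairs = replaced.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g)
  return replaced.length - (pairs ? pairs.length : 0)
}

const post = 'Check out https://example.com/very/long/path'
console.log(post.length)                       // 44
console.log(countWithFixedUrlLength(post, 23)) // 33 ("Check out " is 10, plus 23)
```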
URL Encoding Validation
For query parameters and path segments:
import {
  countChars,
  isValidUrlEncodedLength,
  truncateUrlEncoded
} from 'true-form-length'
// Japanese characters expand significantly when URL-encoded
const result = countChars('こんにちは')
console.log(result.urlEncodedLength) // 45 (each char = 9 encoded)
// Validate before building URLs
isValidUrlEncodedLength(searchQuery, 2000)
// Smart truncation that won't break encoded sequences
truncateUrlEncoded('Hello World', 10) // "Hello W"
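Truncating safely means never cutting inside a `%XX` triple or between the bytes of one character. A minimal sketch of that idea follows: it adds whole code points while the encoded total still fits. `truncateToEncodedLength` is a hypothetical name, it assumes `encodeURIComponent` semantics, and the library's boundary rules may differ.

```typescript
// Hypothetical sketch of truncating by encoded length; not the library's code.
function truncateToEncodedLength(text: string, maxEncoded: number): string {
  let result = ''
  let used = 0
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i)
    const next = text.charCodeAt(i + 1) // NaN at the end of the string
    const isPair =
      unit >= 0xd800 && unit <= 0xdbff && next >= 0xdc00 && next <= 0xdfff
    // encodeURIComponent throws on a lone surrogate, so drop it here.
    if (!isPair && unit >= 0xd800 && unit <= 0xdfff) continue
    // Take a surrogate pair as one piece so it is never split.
    const piece = isPair ? text.slice(i, i + 2) : text.charAt(i)
    if (isPair) i++
    const cost = encodeURIComponent(piece).length
    if (used + cost > maxEncoded) break
    result += piece
    used += cost
  }
  return result
}

const greeting = 'こんにちは'
console.log(encodeURIComponent(greeting).length)   // 45
console.log(truncateToEncodedLength(greeting, 20)) // "こん" (18 encoded characters)
```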
Why Not Just [...str].length?
The spread operator fixes surrogate pairs, but misses CRLF normalization:
const text = 'Hello\nWorld'
// Spread operator
console.log([...text].length) // 11
// true-form-length (with CRLF normalization)
console.log(countChars(text).length) // 12
// What the server actually receives: 12 characters (\n becomes \r\n)
And lone surrogates can still cause issues:
const problematic = 'Valid text\uD800more text'
// Spread operator
console.log([...problematic].length) // 20 (the lone surrogate counts as one element)
// true-form-length
const result = countChars(problematic)
console.log(result.hasLoneSurrogate) // true
console.log(result.length) // 19 (lone surrogate excluded)
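Detecting a lone surrogate comes down to checking that every high surrogate (U+D800 to U+DBFF) is immediately followed by a low one (U+DC00 to U+DFFF), and that no low surrogate appears on its own. The following is a sketch of one way to do this. `stripLoneSurrogates` is a hypothetical helper, and removal is only one possible policy.

```typescript
// Hypothetical helper; illustrates one way to find and drop lone surrogates.
function stripLoneSurrogates(text: string): { cleaned: string; found: boolean } {
  let cleaned = ''
  let found = false
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i)
    const isHigh = unit >= 0xd800 && unit <= 0xdbff
    const isLow = unit >= 0xdc00 && unit <= 0xdfff
    if (isHigh) {
      const next = text.charCodeAt(i + 1)
      if (next >= 0xdc00 && next <= 0xdfff) {
        cleaned += text.slice(i, i + 2) // valid pair: keep both units
        i++
      } else {
        found = true // high surrogate with no low surrogate after it
      }
    } else if (isLow) {
      found = true // low surrogate that was not preceded by a high one
    } else {
      cleaned += text.charAt(i)
    }
  }
  return { cleaned, found }
}

const sample = stripLoneSurrogates('ab\uD800cd')
console.log(sample.found)   // true
console.log(sample.cleaned) // "abcd"
console.log(stripLoneSurrogates('ok 🎉').found) // false
```

Runtimes that implement `String.prototype.isWellFormed()` can do the detection natively.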
Notes
- Zero dependencies
- Single-pass algorithm
- ~50KB minified (tree-shakeable)
- React/Vue hooks use proper memoization
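To illustrate what a single pass can look like, the sketch below walks the UTF-16 code units once and updates several counters together. It is not the library's algorithm. `countInOnePass` is hypothetical, its field names only mirror `CountResult`, and it assumes bytes are counted after CRLF normalization with lone surrogates excluded.

```typescript
// Hypothetical single-pass counter; the rules below are assumptions.
interface PassResult {
  length: number // code points after CRLF normalization
  byteLength: number // UTF-8 bytes after CRLF normalization
  newlineCount: number
  hasLoneSurrogate: boolean
}

function countInOnePass(text: string): PassResult {
  const r: PassResult = { length: 0, byteLength: 0, newlineCount: 0, hasLoneSurrogate: false }
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i)
    if (unit === 0x0d || unit === 0x0a) {
      // Treat \r\n, \r, and \n each as one newline that is sent as CRLF.
      if (unit === 0x0d && text.charCodeAt(i + 1) === 0x0a) i++
      r.newlineCount++
      r.length += 2
      r.byteLength += 2
    } else if (unit >= 0xd800 && unit <= 0xdbff) {
      const next = text.charCodeAt(i + 1)
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++ // valid pair: one code point above U+FFFF, 4 bytes in UTF-8
        r.length += 1
        r.byteLength += 4
      } else {
        r.hasLoneSurrogate = true // excluded from both counts
      }
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      r.hasLoneSurrogate = true // stray low surrogate, also excluded
    } else {
      r.length += 1
      r.byteLength += unit < 0x80 ? 1 : unit < 0x800 ? 2 : 3
    }
  }
  return r
}

const hello = countInOnePass('Hello 🎉')
console.log(hello.length, hello.byteLength) // 7 10
const lines = countInOnePass('a\nb\nc')
console.log(lines.length, lines.newlineCount) // 7 2
```

The UTF-8 size comes from the code point range alone, so no intermediate string or byte array is allocated.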
Links
For more context on why these issues occur, I wrote about it in detail here: Beyond Surrogate Pairs: Hidden Pitfalls in JS Character Counting with PHP/MySQL