DEV Community

usapop

true-form-length: A Library for Accurate Character Counting in Web Forms

Background

I previously wrote about the hidden pitfalls in JavaScript character counting. The issues go beyond the well-known surrogate pair problem:

Issue                  | JS .length | Server receives
Emoji 🎉               | 2          | 1 character
a\nb\nc (textarea)     | 5          | 7 chars (\n → \r\n)
Lone surrogate \uD800  | 1          | MySQL error 💥

That last one is especially problematic: a lone surrogate can cause your database insert to fail entirely.
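The three rows can be reproduced with plain JavaScript built-ins, no library involved. This is a minimal sketch; `toSubmittedForm` is a hypothetical helper that only approximates the newline normalization browsers apply when a form is submitted.

```typescript
// Hypothetical helper (not part of any library): approximates how a browser
// normalizes newlines in a textarea value when the form is submitted.
function toSubmittedForm(value: string): string {
  return value.replace(/\r\n|\r|\n/g, '\r\n')
}

// 1. Emoji: two UTF-16 code units, one code point
const emoji = '🎉'
console.log(emoji.length)      // 2
console.log([...emoji].length) // 1

// 2. Textarea newlines: the DOM value uses \n, the submitted value uses \r\n
const textareaValue = 'a\nb\nc'
console.log(textareaValue.length)                  // 5
console.log(toSubmittedForm(textareaValue).length) // 7

// 3. Lone surrogate: a single code unit that is not valid Unicode text
const lone = '\uD800'
console.log(lone.length) // 1
```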

I kept running into these issues in my own projects, so I ended up creating a small library to handle them. It might be useful for others dealing with similar problems, so I'm sharing it here.

What is true-form-length

true-form-length is a zero-dependency TypeScript library that counts characters as your server will receive them.

npm install true-form-length

Features

  • Accurate character counting - Counts code points, not UTF-16 code units
  • CRLF normalization - Matches HTTP form submission behavior
  • Lone surrogate detection - Identifies and removes invalid UTF-16 sequences
  • UTF-8 byte length - For database byte-limit validation
  • URL-encoded length - For query parameter validation
  • React, Vue, and Lit components - Drop-in UI components

Basic Usage

import { countChars } from 'true-form-length'

// Simple counting
const result = countChars('Hello 🎉')
console.log(result.length)     // 7 (not 8)
console.log(result.byteLength) // 10 (UTF-8 bytes)

// Textarea with newlines
const textarea = countChars('Line1\nLine2')
console.log(textarea.length)   // 12 (11 code points + 1 for \n → \r\n)

The countChars function returns:

interface CountResult {
  length: number           // Code points (MySQL CHAR_LENGTH compatible)
  byteLength: number       // UTF-8 bytes (MySQL LENGTH compatible)
  urlEncodedLength: number // Length after percent-encoding
  hasLoneSurrogate: boolean
  newlineCount: number
  wordCount: number
  sentenceCount: number
  readingTime: number      // Minutes (200 wpm)
}

Validation

Direct Validation

import { isValidLength, isValidByteLength } from 'true-form-length'

// VARCHAR(255) check
isValidLength(userInput, 255)

// Index byte limit check (MySQL InnoDB: 767 bytes for COMPACT/REDUNDANT row formats)
isValidByteLength(userInput, 767)
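The two checks answer different questions, because a utf8mb4 character takes between 1 and 4 bytes. Here is a rough sketch of the arithmetic using only built-ins. The helper names are illustrative and are not the library's API.

```typescript
// Illustrative helpers, not the library's API.
const codePointLength = (s: string): number => [...s].length
const utf8ByteLength = (s: string): number => new TextEncoder().encode(s).length

// Worst case for utf8mb4 is 4 bytes per character, so a 767-byte index
// prefix is only guaranteed to hold floor(767 / 4) = 191 characters.
const MAX_BYTES_PER_CHAR = 4
const guaranteedChars = Math.floor(767 / MAX_BYTES_PER_CHAR)
console.log(guaranteedChars) // 191

// 200 emoji fit in VARCHAR(255) by character count but need 800 bytes.
const emojiRun = '🎉'.repeat(200)
console.log(codePointLength(emojiRun) <= 255) // true
console.log(utf8ByteLength(emojiRun) <= 767)  // false (800 bytes)
```

This is why a value can pass the character check and still be too long for an indexed column.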

Zod Integration

import { z } from 'zod'
import { maxLength, maxByteLength, noLoneSurrogates } from 'true-form-length'

const schema = z.object({
  username: z.string()
    .refine(maxLength(255), 'Must be 255 characters or less')
    .refine(noLoneSurrogates(), 'Contains invalid characters'),
  bio: z.string()
    .refine(maxByteLength(767), 'Too long for database index'),
})
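For readers curious what sits behind a refinement like this, here is a sketch of predicate factories with a similar shape. The names `maxCodePoints`, `wellFormed`, and `containsLoneSurrogate` are hypothetical stand-ins, and the library's own implementations may differ.

```typescript
// Hypothetical stand-ins for illustration; the library's own
// implementations may differ.
type StringPredicate = (value: string) => boolean

// True when the string contains a high surrogate without a following low
// surrogate, or a low surrogate without a preceding high surrogate.
function containsLoneSurrogate(value: string): boolean {
  for (let i = 0; i < value.length; i++) {
    const unit = value.charCodeAt(i)
    if (unit >= 0xd800 && unit <= 0xdbff) {
      const next = value.charCodeAt(i + 1) // NaN past the end of the string
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++ // valid pair, skip the low half
        continue
      }
      return true
    }
    if (unit >= 0xdc00 && unit <= 0xdfff) return true
  }
  return false
}

// Each factory returns a function from string to boolean.
const maxCodePoints = (limit: number): StringPredicate =>
  (value) => [...value].length <= limit

const wellFormed = (): StringPredicate =>
  (value) => !containsLoneSurrogate(value)
```

Functions of this shape can be passed to `z.string().refine(...)`, since `refine` accepts any function that returns a boolean.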

Yup Integration

import * as yup from 'yup'
import { maxLength, noLoneSurrogates } from 'true-form-length'

const schema = yup.object({
  username: yup.string()
    .test('maxLength', 'Too long', maxLength(255))
    .test('valid', 'Invalid characters', noLoneSurrogates()),
})

React Components

The library includes Radix UI-style compound components:

import { useState } from 'react'
import { Textarea } from 'true-form-length/react/textarea'

function PostEditor() {
  const [content, setContent] = useState('')

  return (
    <Textarea.Root
      maxLength={280}
      value={content}
      onValueChange={setContent}
      warningThreshold={0.9}
    >
      <Textarea.Input placeholder="What's happening?" />
      <Textarea.Counter />
      <Textarea.Error>Too many characters!</Textarea.Error>
      <Textarea.Progress />
    </Textarea.Root>
  )
}

Styling with data attributes:

textarea[data-state="over"] {
  border-color: red;
}

textarea[data-state="warning"] {
  border-color: orange;
}

/* Tailwind */
.counter {
  @apply text-gray-500 data-[state=over]:text-red-500 data-[state=warning]:text-amber-500;
}
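The `data-state` values used in the selectors above could be derived from the current count roughly as follows. This is a guess at the logic, not the component's actual code: `'over'` and `'warning'` come from the CSS, while `'ok'` and the inclusive threshold comparison are assumptions.

```typescript
// Assumption: 'over' and 'warning' match the CSS selectors; 'ok' is a
// made-up name for the default state.
type CounterState = 'ok' | 'warning' | 'over'

function counterState(
  length: number,
  maxLength: number,
  warningThreshold: number
): CounterState {
  if (length > maxLength) return 'over'
  // Assumption: the warning starts once the threshold fraction is reached.
  if (length >= maxLength * warningThreshold) return 'warning'
  return 'ok'
}

console.log(counterState(100, 280, 0.9)) // 'ok'
console.log(counterState(260, 280, 0.9)) // 'warning'
console.log(counterState(281, 280, 0.9)) // 'over'
```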

Vue Components

<script setup lang="ts">
import { ref } from 'vue'
import * as Textarea from 'true-form-length/vue/textarea'

const content = ref('')
</script>

<template>
  <Textarea.Root v-model="content" :max-length="280" :warning-threshold="0.9">
    <Textarea.Input placeholder="What's happening?" />
    <Textarea.Counter />
    <Textarea.Error>Too many characters!</Textarea.Error>
  </Textarea.Root>
</template>

Presets

Built-in constants for common limits:

import {
  X_LIMIT,            // 280
  SMS_LIMIT,          // 160
  MYSQL_VARCHAR,      // 255
  MYSQL_TEXT,         // 65535
  INSTAGRAM_BIO,      // 150
  LINKEDIN_POST,      // 3000
  YOUTUBE_TITLE,      // 100
  TIKTOK_CAPTION,     // 2200
  X_URL_LENGTH,       // 23 (t.co wrapper)
} from 'true-form-length'

// X-style URL counting (all URLs = 23 chars)
const tweet = 'Check out https://example.com/very/long/path'
const result = countChars(tweet, { urlLength: X_URL_LENGTH })
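To make the fixed-length URL rule concrete, here is a minimal sketch of the idea. `countWithFixedUrlLength` and its URL pattern are hypothetical, and real platforms detect URLs with far more elaborate rules.

```typescript
// Assumption: a URL is any run of non-whitespace starting with http:// or
// https://. Real platforms use far more elaborate URL detection.
const URL_PATTERN = /https?:\/\/\S+/g

// Counts code points outside URLs, then adds a fixed cost per URL.
function countWithFixedUrlLength(text: string, urlLength: number): number {
  const urls = text.match(URL_PATTERN) ?? []
  const withoutUrls = text.replace(URL_PATTERN, '')
  return [...withoutUrls].length + urls.length * urlLength
}

const sample = 'Check out https://example.com/very/long/path'
console.log(countWithFixedUrlLength(sample, 23)) // 33 ("Check out " is 10, plus 23)
```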

URL Encoding Validation

For query parameters and path segments:

import {
  countChars,
  isValidUrlEncodedLength,
  truncateUrlEncoded
} from 'true-form-length'

// Japanese characters expand significantly when URL-encoded
const result = countChars('こんにちは')
console.log(result.urlEncodedLength) // 45 (each char = 9 encoded)

// Validate before building URLs
isValidUrlEncodedLength(searchQuery, 2000)

// Smart truncation that won't break encoded sequences
truncateUrlEncoded('Hello World', 10) // "Hello Wo" (5 + 3 for %20 + 2 = 10)
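The expansion and the "don't break encoded sequences" idea can be illustrated with `encodeURIComponent` alone. The helpers below are a sketch under that assumption and are not the library's implementation. Note that `encodeURIComponent` throws a `URIError` on lone surrogates, so the input is assumed to be well-formed.

```typescript
// Illustrative only; input is assumed to contain no lone surrogates.
const urlEncodedLength = (s: string): number => encodeURIComponent(s).length

// Keep whole code points while the encoded length stays within the limit,
// so a %XX triplet (or a multi-byte character) is never cut in half.
function truncateByEncodedLength(s: string, maxEncoded: number): string {
  let used = 0
  let out = ''
  for (const ch of s) {
    const cost = encodeURIComponent(ch).length
    if (used + cost > maxEncoded) break
    used += cost
    out += ch
  }
  return out
}

console.log(urlEncodedLength('こんにちは'))            // 45
console.log(truncateByEncodedLength('こんにちは', 20)) // 'こん' (18 encoded characters)
```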

Why Not Just [...str].length?

The spread operator fixes surrogate pairs, but misses CRLF normalization:

const text = 'Hello\nWorld'

// Spread operator
console.log([...text].length) // 11

// true-form-length (with CRLF normalization)
console.log(countChars(text).length) // 12

// What the server actually receives: 12 characters (Hello\r\nWorld)

And lone surrogates can still cause issues:

const problematic = 'Valid text\uD800more text'

// Spread operator
console.log([...problematic].length) // 20

// true-form-length
const result = countChars(problematic)
console.log(result.hasLoneSurrogate) // true
console.log(result.length) // 19 (lone surrogate excluded)

Notes

  • Zero dependencies
  • Single-pass algorithm
  • ~50KB minified (tree-shakeable)
  • React/Vue hooks use proper memoization
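As an illustration of what a single pass can look like, the sketch below walks the UTF-16 code units once and accumulates code points, UTF-8 bytes, and a lone-surrogate flag together. The field names mirror part of `CountResult`, but `countInOnePass` and its logic are assumptions, not the library's source.

```typescript
// A sketch only: field names mirror part of CountResult, but the logic is
// an assumption about how such a pass could work.
interface PassResult {
  length: number     // code points after newline normalization to \r\n
  byteLength: number // UTF-8 bytes of the same normalized text
  hasLoneSurrogate: boolean
}

function countInOnePass(input: string): PassResult {
  let length = 0
  let byteLength = 0
  let hasLoneSurrogate = false

  for (let i = 0; i < input.length; i++) {
    const unit = input.charCodeAt(i)

    if (unit === 0x0d || unit === 0x0a) {
      // Any of \r\n, \r, \n is counted as \r\n: 2 characters, 2 bytes
      if (unit === 0x0d && input.charCodeAt(i + 1) === 0x0a) i++
      length += 2
      byteLength += 2
    } else if (unit >= 0xd800 && unit <= 0xdbff) {
      const next = input.charCodeAt(i + 1)
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++ // valid surrogate pair: one code point, four UTF-8 bytes
        length += 1
        byteLength += 4
      } else {
        hasLoneSurrogate = true // unpaired high surrogate, not counted
      }
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      hasLoneSurrogate = true // unpaired low surrogate, not counted
    } else {
      length += 1
      byteLength += unit < 0x80 ? 1 : unit < 0x800 ? 2 : 3
    }
  }
  return { length, byteLength, hasLoneSurrogate }
}

console.log(countInOnePass('Hello 🎉')) // { length: 7, byteLength: 10, hasLoneSurrogate: false }
console.log(countInOnePass('a\nb\nc'))  // { length: 7, byteLength: 7, hasLoneSurrogate: false }
```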

Links


For more context on why these issues occur, I wrote about it in detail here: Beyond Surrogate Pairs: Hidden Pitfalls in JS Character Counting with PHP/MySQL
