On a recent project, the client had a Chinese version of their site. The texts and titles were, of course, written using Chinese characters, which meant that the URLs were automatically generated using those characters as well.
For SEO and UX reasons, we wanted the URLs to use Latin characters instead. While search engines are perfectly capable of indexing Chinese URLs, we felt that Latin URLs would perform better in practice.
Our reasoning was:
- Latin URLs are more readable (especially since the audience was not exclusively Chinese), which we assumed would lead to higher click-through rates
- Hyphenated Latin words provide clearer URL keyword signals
- Sharing URLs and linking to them is much easier with Latin characters than with encoded Chinese characters
For example, a title like:
击掌!你太棒了!
would normally generate a URL segment using Chinese characters. With pinyin conversion applied, it instead becomes:
/ji-zhang-ni-tai-bang-le
The latter is much easier to read, share, and reason about, especially in mixed-language environments.
Converting Chinese characters to Latin using pinyin
Umbraco already tries to convert special characters when generating URLs, but the default character replacement set is quite limited. You can see the default configuration in the documentation article about Request Handler Settings.
Chinese has more than 50,000 characters, so mapping each one to a Latin equivalent by hand is not feasible. Instead, we can use pinyin, which is a Latin-based phonetic transcription system for Chinese characters.
Using pinyin allows us to convert Chinese text into readable Latin equivalents without maintaining huge character maps. There is an open-source NuGet package called NPinyin.Core that makes this conversion straightforward. It’s lightweight and integrates cleanly into existing logic.
This approach also works well for mixed-language titles, such as:
击掌!You Rock!
which results in:
/ji-zhang-you-rock
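As a rough sketch of the conversion step in isolation, the snippet below passes the Chinese part of the example title to `Pinyin.GetPinyin` from NPinyin.Core. It assumes a console project with the package installed. The exact spacing and casing of the returned string depend on the package, so no output is shown here.

```csharp
using System;
using NPinyin;

// Chinese part of the example title; punctuation is left out
// to keep the sketch focused on the conversion itself.
const string title = "击掌";

// GetPinyin accepts a string and returns its pinyin transcription.
string pinyin = Pinyin.GetPinyin(title);

Console.WriteLine(pinyin);
```

Turning that result into a URL segment still requires deciding where the dashes go, which is why the conversion is wrapped in a small amount of custom logic rather than used directly.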
Using a custom URL segment provider in Umbraco
With the conversion logic in place, the next step was to make Umbraco use it when generating URL segments.
When Umbraco generates URL segments, it runs through a set of URL segment providers, and it’s possible to create a custom one. The process is simple and well documented.
By adding a small amount of additional logic to separate Chinese characters with dashes, we ended up with a short and simple custom segment provider that generates readable, SEO-friendly Latin URLs for Chinese pages.
**Note:**
This approach converts each Chinese character to its pinyin representation and inserts dashes between characters. While this is not full linguistic word segmentation, it produces consistent, readable URL segments without adding unnecessary complexity.
Example implementation
using System.Text;
using NPinyin;
using Umbraco.Cms.Core.Models;
using Umbraco.Cms.Core.Strings;

namespace Project.Core.Urls;

// Custom URL segment provider that wraps Umbraco's default provider
// and applies Chinese to Pinyin conversion on top.
public class ChineseUrlSegmentProvider : IUrlSegmentProvider
{
    // We delegate to the default provider first, so we keep
    // Umbraco's built-in behavior for non-Chinese characters.
    private readonly DefaultUrlSegmentProvider _defaultUrlSegmentProvider;

    public ChineseUrlSegmentProvider(IShortStringHelper shortStringHelper)
    {
        _defaultUrlSegmentProvider = new DefaultUrlSegmentProvider(shortStringHelper);
    }

    public string? GetUrlSegment(IContentBase content, string? culture = null)
    {
        // Let Umbraco generate the initial URL segment,
        // then post-process it with our custom logic.
        return ChineseToPinyinWithDashes(
            _defaultUrlSegmentProvider.GetUrlSegment(content, culture)
        );
    }

    // Converts Chinese characters to pinyin and inserts dashes
    // between consecutive Chinese characters to improve readability.
    private static string? ChineseToPinyinWithDashes(string? input)
    {
        if (input is null)
        {
            return input;
        }

        var sb = new StringBuilder();
        var lastWasChinese = false;

        foreach (var c in input)
        {
            if (IsChinese(c))
            {
                // Insert a dash between consecutive Chinese characters
                if (sb.Length > 0 && lastWasChinese)
                {
                    sb.Append('-');
                }

                // Convert the individual character to pinyin
                sb.Append(Pinyin.GetPinyin(c.ToString()));
                lastWasChinese = true;
            }
            else
            {
                // Non-Chinese characters are passed through unchanged
                sb.Append(c);
                lastWasChinese = false;
            }
        }

        return sb.ToString();
    }

    // Simple Unicode range check for CJK Unified Ideographs
    private static bool IsChinese(char c)
    {
        return c >= 0x4E00 && c <= 0x9FFF;
    }
}
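
The provider only takes effect once it is registered with Umbraco. A minimal sketch of that registration follows, assuming a composer is used. The class name `ChineseUrlSegmentComposer` is hypothetical and not taken from the original project.

```csharp
using Umbraco.Cms.Core.Composing;
using Umbraco.Cms.Core.DependencyInjection;

namespace Project.Core.Urls;

// Hypothetical composer name; Umbraco discovers IComposer implementations at startup.
public class ChineseUrlSegmentComposer : IComposer
{
    public void Compose(IUmbracoBuilder builder)
    {
        // Insert places the provider at the start of the collection, so it is
        // consulted before the built-in DefaultUrlSegmentProvider.
        builder.UrlSegmentProviders().Insert<ChineseUrlSegmentProvider>();
    }
}
```

Putting the custom provider first matters here because it already delegates to the default provider internally, so it needs to be the one that answers.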
Results
With this approach in place, Chinese pages now automatically generate clean, readable, and shareable URL segments without any manual intervention. It works equally well for fully Chinese titles and mixed-language content, and it integrates cleanly into Umbraco’s existing routing pipeline.
The solution is lightweight, easy to maintain, and avoids the complexity of large character-mapping tables or full linguistic word segmentation, which makes it a practical choice for real-world Umbraco projects.