Azure Blob Storage and Azure Search - Handling non-ASCII characters with Azure Indexers
Azure Blob metadata fields only accept ASCII characters. To store a character like "ó" in a metadata field, URL-encode the value before writing it to Azure:
title: encodeURIComponent(docTitle),
The encoded value will look like "%C3%B3", which preserves the non-ASCII character in ASCII-safe form. Simply decode it before displaying it on the front end.
But what if my metadata needs to be searchable in Azure Search?
I pushed the above change to production, and it broke our search page. Special characters were not searchable because the search index contained only their encoded forms, and encoding every search query before executing it seemed like a bad idea.
Azure Indexers, which we used to move our data from the blob container into the search index, let you configure field mapping functions that transform the data as it is added to the index.
"urlDecode" is one of these mapping functions. The configuration looks like this:
{
  "sourceFieldName": "title",
  "targetFieldName": "my_document_title",
  "mappingFunction": {
    "name": "urlDecode"
  }
}
Now the special characters are stored unencoded in the search index and ready to be searched.
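For context, this mapping sits in the `fieldMappings` array of the indexer definition. A sketch of where it fits (the indexer, data source, and index names below are placeholders, not from the original setup):

```json
{
  "name": "my-blob-indexer",
  "dataSourceName": "my-blob-datasource",
  "targetIndexName": "my-search-index",
  "fieldMappings": [
    {
      "sourceFieldName": "title",
      "targetFieldName": "my_document_title",
      "mappingFunction": {
        "name": "urlDecode"
      }
    }
  ]
}
```

Updating the indexer with this definition and re-running it repopulates the index with the decoded values.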