Introduction
Usually, when a developer needs to determine whether a string contains multiple tokens/words, code like the following is used.
public static class Extensions
{
public static bool Search(this string line) =>
line.IndexOf("hello", StringComparison.OrdinalIgnoreCase) >= 0 &&
line.IndexOf("world", StringComparison.OrdinalIgnoreCase) >= 0;
}
Starting with .NET 8, Microsoft provides the System.Buffers.SearchValues<T> class.
Learn about SearchValues.
Which to use IndexOf or SearchValues?
SearchValues is a powerful type that improves the efficiency of search operations. By providing a dedicated and optimized method for lookups, it helps you write cleaner, more performant code, especially in scenarios where checking for multiple values is frequent.
SearchValues is not a replacement for IndexOf or IndexOfAny. Use SearchValues over larger strings; for smaller strings a developer can still use IndexOf && IndexOf, etc.
Examples for SearchValues
Does text contain spam?
The text can come from any source; to keep things simple, this example uses a text file.
TextBanned.txt
Hello Karen, I am writing to inform you that your account is now active.
This is not a spam message. Please click the link below
In the project, a JSON file holds the watched tokens/words.
bannedwords.json
[
{
"Id": "1",
"Name": "spam"
},
{
"Id": "2",
"Name": "advertisement"
},
{
"Id": "3",
"Name": "clickbait"
}
]
The following model is used to deserialize the file above.
public class BannedWord
{
public string Id { get; set; }
public string Name { get; set; }
}
Next, create a language extension method for SearchValues.
public static class GenericExtensions
{
/// <summary>
/// Determines whether the specified text contains any of the banned words.
/// </summary>
/// <param name="text">The text to be checked for banned words.</param>
/// <param name="bannedWords">An array of banned words to search for within the text.</param>
/// <returns>
/// <c>true</c> if the text contains any of the banned words; otherwise, <c>false</c>.
/// </returns>
public static bool HasBannedWords(this string text, params string[] bannedWords) =>
text.AsSpan().ContainsAny(SearchValues.Create(bannedWords, StringComparison.OrdinalIgnoreCase));
}
Note
The above extension method is case-insensitive. To support case-sensitive searches, either add logic and a bool parameter that determines whether the search is case-insensitive, or create an overloaded method for matching case.
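One way to follow the note is sketched here, assuming a bool parameter is preferred over a second overload. The class name BannedWordExtensions and the parameter ignoreCase are hypothetical and not part of the original source. As far as I know, the string overload of SearchValues.Create requires .NET 9 or later.

```csharp
using System;
using System.Buffers;

public static class BannedWordExtensions // hypothetical name
{
    // ignoreCase is an assumed parameter; passing true mirrors the
    // behavior of the original HasBannedWords method.
    public static bool HasBannedWords(this string text, bool ignoreCase, params string[] bannedWords)
    {
        // SearchValues<string> supports only Ordinal and OrdinalIgnoreCase.
        var comparison = ignoreCase
            ? StringComparison.OrdinalIgnoreCase
            : StringComparison.Ordinal;

        return text.AsSpan().ContainsAny(SearchValues.Create(bannedWords, comparison));
    }
}
```

With this shape, "This is not a SPAM message".HasBannedWords(true, "spam") matches while the same call with false does not.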
The following code first reads the words/tokens to search for by deserializing bannedwords.json, then reads TextBanned.txt, which is the file to scan for spam.
Note that the foreach statement uses Enumerable.Index, introduced in .NET 9, which allows deconstructing each element into the current index (zero-based) and the item, where the item is a line from the variable sentences.
Debug.WriteLine is used below because the source code was written in a Windows Forms project, where Console.WriteLine output is not displayed.
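A minimal sketch of that flow, assuming System.Text.Json for deserialization and caller-supplied file paths. SpamScanner, FindSpamLines and ScanFiles are hypothetical names, and this is not the article's original listing.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text.Json;

public class BannedWord
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public static class SpamScanner // hypothetical name
{
    // Returns the zero-based indexes of the sentences that contain a banned word.
    public static List<int> FindSpamLines(string bannedWordsJson, string[] sentences)
    {
        var bannedWords = JsonSerializer.Deserialize<List<BannedWord>>(bannedWordsJson)
                          ?? new List<BannedWord>();
        string[] tokens = bannedWords.Select(word => word.Name).ToArray();

        // Build the SearchValues instance once, outside the loop.
        SearchValues<string> values = SearchValues.Create(tokens, StringComparison.OrdinalIgnoreCase);

        var hits = new List<int>();

        // Enumerable.Index (.NET 9) yields (Index, Item) tuples.
        foreach (var (index, sentence) in sentences.Index())
        {
            if (sentence.AsSpan().ContainsAny(values))
            {
                Debug.WriteLine($"Line {index}: {sentence}");
                hits.Add(index);
            }
        }

        return hits;
    }

    // Assumed convenience wrapper: both paths are supplied by the caller.
    public static List<int> ScanFiles(string jsonPath, string textPath) =>
        FindSpamLines(File.ReadAllText(jsonPath), File.ReadAllLines(textPath));
}
```

Splitting the file access from the search keeps FindSpamLines easy to exercise with in-memory data.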
Find errors/warnings in the Visual Studio log file
When Visual Studio encounters errors they can be written to a log file by starting Visual Studio with the following command.
ActivityLog.xml can be opened by clicking on the file, but it usually has thousands of lines, and finding errors/warnings can be tedious.
Small look at ActivityLog.xml.
Of the following extension methods, the first and second use SearchValues. For the code sample that follows, the second is used because we are only interested in errors and warnings; the first would be used for general-purpose searches. The last extension method is the conventional approach, which is less flexible.
public static class Extensions
{
/// <summary>
/// Searches the specified string for any of the provided tokens case-insensitive.
/// </summary>
/// <param name="sender">The string to search within.</param>
/// <param name="tokens">An array of tokens to search for within the string.</param>
/// <returns>
/// <c>true</c> if any of the tokens are found within the string; otherwise, <c>false</c>.
/// </returns>
public static bool Search(this string sender, string[] tokens)
=> sender.AsSpan().ContainsAny(
SearchValues.Create(tokens,
StringComparison.OrdinalIgnoreCase));
/// <summary>
/// Determines whether the specified line contains a warning or error.
/// </summary>
/// <param name="line">The line of text to be checked for warnings or errors.</param>
/// <returns>
/// <c>true</c> if the line contains a warning or error; otherwise, <c>false</c>.
/// </returns>
public static bool LineHasWarningOrError(this string line)
{
ReadOnlySpan<string> tokens = ["<type>Error</type>", "<type>Warning</type>"];
return line.AsSpan().ContainsAny(SearchValues.Create(tokens, StringComparison.OrdinalIgnoreCase));
}
/// <summary>
/// Determines whether the specified line contains a warning or error using conventional string comparison.
/// </summary>
/// <param name="line">The line of text to be checked for warnings or errors.</param>
/// <returns>
/// <c>true</c> if the line contains a warning or error; otherwise, <c>false</c>.
/// </returns>
public static bool LineHasWarningOrErrorConventional(this string line) =>
line.IndexOf("<type>Error</type>", StringComparison.OrdinalIgnoreCase) >= 0 ||
line.IndexOf("<type>Warning</type>", StringComparison.OrdinalIgnoreCase) >= 0;
}
Executing code (full source is provided).
- First determine if the activity file exists, if so read it.
- Display the path and file name along with line count
- Iterate each line searching for errors and warnings.
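The steps above can be sketched roughly as follows. ActivityLogScanner, FindWarningsAndErrors and Scan are hypothetical names, the file name is assumed to be supplied by the caller, and the provided source code may be organized differently.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

public static class ActivityLogScanner // hypothetical name
{
    // Same tokens as LineHasWarningOrError, created once and reused.
    private static readonly SearchValues<string> Tokens = SearchValues.Create(
        new[] { "<type>Error</type>", "<type>Warning</type>" },
        StringComparison.OrdinalIgnoreCase);

    // Returns the lines that contain an error or warning entry.
    public static List<string> FindWarningsAndErrors(IEnumerable<string> lines)
    {
        var matches = new List<string>();
        foreach (var line in lines)
        {
            if (line.AsSpan().ContainsAny(Tokens))
            {
                matches.Add(line);
            }
        }
        return matches;
    }

    public static void Scan(string fileName)
    {
        // Step 1: determine if the activity file exists; if so, read it.
        if (!File.Exists(fileName))
        {
            Debug.WriteLine($"{fileName} not found");
            return;
        }
        string[] lines = File.ReadAllLines(fileName);

        // Step 2: display the path and file name along with the line count.
        Debug.WriteLine($"{fileName} ({lines.Length} lines)");

        // Step 3: iterate each line searching for errors and warnings.
        foreach (var line in FindWarningsAndErrors(lines))
        {
            Debug.WriteLine(line);
        }
    }
}
```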
Extra
Finding the activity log is not easy, and there may be more than one. To help find the right activity log, the provided source code has a class dedicated to working with the activity file, which includes providing the path to the activity file. This can be helpful for developers who want to examine older activity files.
Source code
The two links below point to different GitHub repositories. For the spam source code, check out the new .NET 9 features.
- Spam source code
- Activity log source code
Summary
SearchValues provides a new way to search for words/tokens in a string that performs better than IndexOf for larger strings and is more flexible than IndexOf.
Top comments (7)
How does it compare to System.Text.RegularExpressions, especially Compiled regular expression? (Performance scaling with 10, 100, ... search values and input text size.)
While I can't speak for the compiled regex, if you look at the code produced by using the regex source generator, especially in the case of word matches (i.e. spam|advertisement|clickbait), it will typically use SearchValues in the generated code. Based on that, I would expect the performance to be similar and to scale in a similar fashion.
Tried this out of curiosity (.net 9) with:
and compared to SearchValues it is usually way slower. Worst case for regex is when the input does not contain any of the searched sequences.
In some special cases, especially when the sequence is found at the beginning the compiled regex was quicker, but in other cases it was 2x - 20x slower.
I think it will be very dependent on the data set and what the regex is. I ran the following benchmark using Bogus to generate a large chunk of Lorem text and then tacking the word "Tuesday" at the beginning, at the end, and then not at all (a misspelled version however to keep the text basically identical in length). What I found is that, with the exception of the text where "Tuesday" was the first word, the performance was pretty similar between both:
For reference, here is the code:
I just had to try and compare this again. I used shorter input text (~3300 chars) and 24 searched words. (Your input is tested by methods ending '2'.) I used the same approach by placing one of the search words early or late into the sequence (or not at all - no match) in methods suffixed '1'.
SearchValues are scaling better with more search keywords.
Interesting, and good to know. Thanks for running some more tests.
One thing that is important to note with SearchValues is that it is a bit expensive to create, and so the recommended usage is to create it once and reuse it.
The code presented here is creating a new instance of SearchValues on each call. It is understandable for the example since we are allowing for custom values each time, but it should be pointed out that it is best practice to cache the instance and reuse it.
For instance, this example:
would be better written as:
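A minimal sketch of the cached form, assuming the fixed token list from LineHasWarningOrError. CachedSearchExtensions, WarningOrErrorTokens and LineHasWarningOrErrorCached are hypothetical names, and this is not the commenter's original code.

```csharp
using System;
using System.Buffers;

public static class CachedSearchExtensions // hypothetical name
{
    // Created once when the type is first used, then shared by every call.
    private static readonly SearchValues<string> WarningOrErrorTokens =
        SearchValues.Create(
            new[] { "<type>Error</type>", "<type>Warning</type>" },
            StringComparison.OrdinalIgnoreCase);

    // Same check as LineHasWarningOrError, without rebuilding SearchValues per call.
    public static bool LineHasWarningOrErrorCached(this string line) =>
        line.AsSpan().ContainsAny(WarningOrErrorTokens);
}
```

This only works when the tokens are known up front. A method that accepts arbitrary tokens on each call, such as Search, would need the caller to hold on to the SearchValues instance instead.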