Karen Payne

C# Search by multiple strings

Introduction

Usually when there is a need to determine if a string has multiple tokens/words a developer uses code like the following.

public static class Extensions
{
    public static bool Search(this string line) =>
        // IndexOf returns -1 when the token is absent and 0 for a match at the start.
        line.IndexOf("hello", StringComparison.OrdinalIgnoreCase) >= 0 && 
        line.IndexOf("world", StringComparison.OrdinalIgnoreCase) >= 0;
}

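Starting with .NET 8, Microsoft provides the type below; the multi-string searches used in this article rely on SearchValues<string>, which was added in .NET 9.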

System.Buffers.SearchValues<T> Class

Learn about SearchValues.

Which to use IndexOf or SearchValues?

SearchValues is a powerful type that improves the efficiency of search operations. By providing a dedicated and optimized method for lookups, it helps you write more performant and cleaner code, especially in scenarios where checking for multiple values is frequent.

SearchValues is not a replacement for IndexOf or IndexOfAny. SearchValues pays off over larger strings, which means that for smaller strings a developer can still use IndexOf && IndexOf, etc.
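One difference to keep in mind when choosing between the two: chaining IndexOf calls with && asks whether every token is present, while ContainsAny with SearchValues asks whether at least one is. The sketch below contrasts the two checks. The class and method names (TokenChecks, ContainsAnyToken, ContainsAllTokens) are made up for illustration, and the code assumes .NET 9 or later.

```csharp
using System;
using System.Buffers;
using System.Linq;

public static class TokenChecks
{
    // "Any token present": one ContainsAny call over the span using SearchValues.
    public static bool ContainsAnyToken(string line, SearchValues<string> tokens) =>
        line.AsSpan().ContainsAny(tokens);

    // "Every token present": the conventional approach, one scan per token.
    public static bool ContainsAllTokens(string line, params string[] tokens) =>
        tokens.All(token => line.Contains(token, StringComparison.OrdinalIgnoreCase));
}

// Usage (hypothetical values):
// var greetings = SearchValues.Create(["hello", "world"], StringComparison.OrdinalIgnoreCase);
// TokenChecks.ContainsAnyToken("Hello there", greetings);          // true
// TokenChecks.ContainsAllTokens("Hello there", "hello", "world");  // false
```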

Examples for SearchValues

Does text contain spam?

The text can come from any source, in this case to keep things simple, a text file.

TextBanned.txt

Hello Karen, I am writing to inform you that your account is now active.
This is not a spam message. Please click the link below

In the project, a JSON file holds the watched tokens/words.

bannedwords.json

[
  {
    "Id": "1",
    "Name": "spam"
  },
  {
    "Id": "2",
    "Name": "advertisement"
  },
  {
    "Id": "3",
    "Name": "clickbait"
  }
]

The following model is used to deserialize the file above.

public class BannedWord
{
    public string Id { get; set; }
    public string Name { get; set; }
}

Next, create a language extension method for SearchValues.

public static class GenericExtensions
{
    /// <summary>
    /// Determines whether the specified text contains any of the banned words.
    /// </summary>
    /// <param name="text">The text to be checked for banned words.</param>
    /// <param name="bannedWords">An array of banned words to search for within the text.</param>
    /// <returns>
    /// <c>true</c> if the text contains any of the banned words; otherwise, <c>false</c>.
    /// </returns>
    public static bool HasBannedWords(this string text, params string[] bannedWords) => 
        text.AsSpan().ContainsAny(SearchValues.Create(bannedWords, StringComparison.OrdinalIgnoreCase));
}

Note
The above extension method is case-insensitive. To support case-sensitive searches, either add logic with a bool parameter that determines whether the search is case-insensitive, or create an overloaded method for matching case.
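As a rough sketch of the bool-parameter variant mentioned in the note, the overload below picks the comparison at call time. The class name CaseAwareExtensions and the ignoreCase parameter are assumptions for illustration, not part of the original source. SearchValues.Create for strings accepts only Ordinal and OrdinalIgnoreCase, so the bool maps onto those two values.

```csharp
using System;
using System.Buffers;

public static class CaseAwareExtensions
{
    // Hypothetical overload: ignoreCase selects between the two comparisons
    // that SearchValues.Create supports for strings.
    public static bool HasBannedWords(this string text, bool ignoreCase, params string[] bannedWords)
    {
        var comparison = ignoreCase
            ? StringComparison.OrdinalIgnoreCase
            : StringComparison.Ordinal;

        return text.AsSpan().ContainsAny(SearchValues.Create(bannedWords, comparison));
    }
}

// Usage (hypothetical values):
// "This is SPAM".HasBannedWords(ignoreCase: true, "spam");   // true
// "This is SPAM".HasBannedWords(ignoreCase: false, "spam");  // false
```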

The following code first reads the words/tokens to search for by deserializing bannedwords.json, then reads TextBanned.txt, which is the file to scan for spam.

Note that the foreach statement uses Enumerable.Index, introduced in .NET 9, which allows deconstruction into the current index (zero-based) and the item, where each item is a line from the variable sentences.

Debug.WriteLine is used below because the source code was written in a Windows Forms project, where Console.WriteLine output is not shown.

code to search for spam using SearchValues
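The code above appears only as an image, so here is a minimal sketch of what such a scan could look like. It is not the author's exact code. The names SpamScanner and Scan and both path parameters are assumptions, System.Text.Json is assumed for deserialization, and BannedWord mirrors the model shown earlier. It requires .NET 9 for Enumerable.Index and SearchValues<string>.

```csharp
using System;
using System.Buffers;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text.Json;

// Mirrors the model used to deserialize bannedwords.json.
public class BannedWord
{
    public string Id { get; set; } = "";
    public string Name { get; set; } = "";
}

public static class SpamScanner
{
    // Both paths are supplied by the caller (hypothetical parameters).
    public static void Scan(string bannedWordsPath, string textPath)
    {
        // Read the tokens to watch for from the JSON file.
        var words = JsonSerializer.Deserialize<BannedWord[]>(File.ReadAllText(bannedWordsPath))
                    ?? Array.Empty<BannedWord>();
        string[] tokens = words.Select(word => word.Name).ToArray();

        // Create the SearchValues instance once and reuse it for every line.
        var banned = SearchValues.Create(tokens, StringComparison.OrdinalIgnoreCase);

        string[] sentences = File.ReadAllLines(textPath);

        // Enumerable.Index yields (zero-based index, item) tuples.
        foreach (var (index, line) in sentences.Index())
        {
            if (line.AsSpan().ContainsAny(banned))
            {
                Debug.WriteLine($"Line {index + 1} contains a banned word: {line}");
            }
        }
    }
}
```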

Find errors/warnings in Visual Studio log file

When Visual Studio encounters errors they can be written to a log file by starting Visual Studio with the following command.

devenv.exe /log

Open ActivityLog.xml by clicking on the file. It usually has thousands of lines, so finding errors/warnings by hand can be tedious.

Small look at ActivityLog.xml.

ActivityLog.xml small view

Of the following extension methods, the first and second use SearchValues. For this code sample the second will be used, as we are only interested in errors and warnings. The first extension method would be used for general-purpose searches. The last extension method is the conventional approach, which is less flexible.

public static class Extensions
{
    /// <summary>
    /// Searches the specified string for any of the provided tokens case-insensitive.
    /// </summary>
    /// <param name="sender">The string to search within.</param>
    /// <param name="tokens">An array of tokens to search for within the string.</param>
    /// <returns>
    /// <c>true</c> if any of the tokens are found within the string; otherwise, <c>false</c>.
    /// </returns>
    public static bool Search(this string sender, string[] tokens) 
        => sender.AsSpan().ContainsAny(
            SearchValues.Create(tokens, 
                StringComparison.OrdinalIgnoreCase));

    /// <summary>
    /// Determines whether the specified line contains a warning or error.
    /// </summary>
    /// <param name="line">The line of text to be checked for warnings or errors.</param>
    /// <returns>
    /// <c>true</c> if the line contains a warning or error; otherwise, <c>false</c>.
    /// </returns>
    public static bool LineHasWarningOrError(this string line)
    {
        ReadOnlySpan<string> tokens = ["<type>Error</type>", "<type>Warning</type>"];
        return line.AsSpan().ContainsAny(SearchValues.Create(tokens, StringComparison.OrdinalIgnoreCase));
    }

    /// <summary>
    /// Determines whether the specified line contains a warning or error using conventional string comparison.
    /// </summary>
    /// <param name="line">The line of text to be checked for warnings or errors.</param>
    /// <returns>
    /// <c>true</c> if the line contains a warning or error; otherwise, <c>false</c>.
    /// </returns>
    public static bool LineHasWarningOrErrorConventional(this string line) =>
        // IndexOf returns -1 when the token is missing; || matches a line with either token.
        line.IndexOf("<type>Error</type>", StringComparison.OrdinalIgnoreCase) >= 0 || 
        line.IndexOf("<type>Warning</type>", StringComparison.OrdinalIgnoreCase) >= 0;
}

Executing code (full source is provided).

  • First, determine if the activity file exists; if so, read it.
  • Display the path and file name along with the line count.
  • Iterate over each line, searching for errors and warnings.

full executing code
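The three steps listed above can be sketched roughly as follows. This is not the author's full code, which lives in the linked repository. ActivityLogScanner, Scan and the path parameter are hypothetical names, and the caller is assumed to know where its ActivityLog.xml is located. The SearchValues instance is created once in a static field so it is not rebuilt for every line.

```csharp
using System;
using System.Buffers;
using System.Diagnostics;
using System.IO;
using System.Linq;

public static class ActivityLogScanner
{
    // Built once and reused for every line that is scanned.
    private static readonly SearchValues<string> ErrorOrWarning =
        SearchValues.Create(
            ["<type>Error</type>", "<type>Warning</type>"],
            StringComparison.OrdinalIgnoreCase);

    public static void Scan(string path)
    {
        // Step 1: make sure the activity file exists, then read it.
        if (!File.Exists(path))
        {
            Debug.WriteLine($"Not found: {path}");
            return;
        }

        string[] lines = File.ReadAllLines(path);

        // Step 2: display the path and file name along with the line count.
        Debug.WriteLine($"{path} has {lines.Length} lines");

        // Step 3: iterate over each line, reporting errors and warnings.
        foreach (var (index, line) in lines.Index())
        {
            if (line.AsSpan().ContainsAny(ErrorOrWarning))
            {
                Debug.WriteLine($"{index + 1}: {line.Trim()}");
            }
        }
    }
}
```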

Extra

Finding the activity log is not easy, and there may be more than one. To assist with finding the right activity log, the provided source code has a class dedicated to working with the activity file. It includes providing the path to the activity file, which can be helpful for developers who want to examine older activity files.

Source code

The two links below point to different GitHub repositories. For the spam source code, check out the new .NET 9 features.

Spam Source code
Activity log Source code

Summary

SearchValues provides a new way to search for words/tokens in a string. It performs better than IndexOf on larger strings and is more flexible than IndexOf.

Top comments (7)

Peter Truchly

How does it compare to System.Text.RegularExpressions, especially Compiled regular expression? (Performance scaling with 10, 100, ... search values and input text size.)

jshergal

While I can't speak for the compiled regex, if you look at the code produced by using the regex source generator, especially in the case of word matches (i.e. spam|advertisement|clickbait), it will typically use SearchValues in the generated code.

Based on that, I would expect the performance to be similar and to scale in a similar fashion.

Peter Truchly

Tried this out of curiosity (.net 9) with:

[GeneratedRegex(@" ...")]
private static partial Regex CompiledRegex();
//by doing 1M times
for (int i = 0; i < repetitions; i++) { testRegex.IsMatch(testText); }

and compared to SearchValues it is usually way slower. Worst case for regex is when the input does not contain any of the searched sequences.
In some special cases, especially when the sequence is found at the beginning the compiled regex was quicker, but in other cases it was 2x - 20x slower.

jshergal

I think it will be very dependent on the data set and what the regex is. I ran the following benchmark using Bogus to generate a large chunk of Lorem text and then tacking the word "Tuesday" at the beginning, at the end, and then not at all (a misspelled version however to keep the text basically identical in length). What I found is that, with the exception of the text where "Tuesday" was the first word, the performance was pretty similar between both:

| Method                     | searchString        |        Mean |    Error |   StdDev |
|--------------------------- |-------------------- |------------:|---------:|---------:|
| FindStringWithRegex        | Con(...)day [20809] | 1,670.42 ms | 8.770 ms | 7.324 ms |
| FindStringWithSearchValues | Con(...)day [20809] | 1,656.58 ms | 7.650 ms | 7.156 ms |
| FindStringWithRegex        | Not(...)sya [20813] | 1,631.96 ms | 2.385 ms | 1.992 ms |
| FindStringWithSearchValues | Not(...)sya [20813] | 1,613.41 ms | 2.734 ms | 2.283 ms |
| FindStringWithRegex        | Tue(...)ins [20810] |    43.12 ms | 0.163 ms | 0.136 ms |
| FindStringWithSearchValues | Tue(...)ins [20810] |    26.43 ms | 0.493 ms | 0.462 ms |

For reference, here is the code:

[SimpleJob]
public partial class FindTextBenchmark
{
    private const int Iterations = 1_000_000;

    [GeneratedRegex(@"Monday|Tuesday|Wednesday", RegexOptions.IgnoreCase)]
    private static partial Regex MyReg();

    private static SearchValues<string> MySearchValues =
        SearchValues.Create(["Monday", "Tuesday", "Wednesday"], StringComparison.OrdinalIgnoreCase);

    private static readonly Lorem Data = new()
    {
        Random = new Randomizer(42)
    };

    private static readonly string BaseText = Data.Paragraphs(100);

    public IEnumerable<object> ArgumentStrings()
    {
        yield return "Contains" + BaseText + " Tuesday";
        yield return "Tuesday " + BaseText + " Contains";
        yield return "NotContains " + BaseText + " Teudsya";
    }

    [Benchmark]
    [ArgumentsSource(nameof(ArgumentStrings))]
    public int FindStringWithRegex(ReadOnlySpan<char> searchString)
    {
        int foundCount = 0;
        for (int i = 0; i < Iterations; ++i)
        {
            if (MyReg().IsMatch(searchString))
                foundCount++;
        }

        return foundCount;
    }

    [Benchmark]
    [ArgumentsSource(nameof(ArgumentStrings))]
    public int FindStringWithSearchValues(ReadOnlySpan<char> searchString)
    {
        int foundCount = 0;
        for (int i = 0; i < Iterations; ++i)
        {
            if (searchString.ContainsAny(MySearchValues))
                foundCount++;
        }

        return foundCount;
    }
}
 
Peter Truchly

I just had to try and compare this again. I used shorter input text (~3300 chars) and 24 searched words. (Your input is tested by methods ending '2'.) I used the same approach by placing one of the search words early or late into the sequence (or not at all - no match) in methods suffixed '1'.

  • It seems that SearchValues scales better with more search keywords.
BenchmarkDotNet v0.14.0, Windows 10
AMD Ryzen 9 9950X, 1 CPU, 16 logical and 16 physical cores
.NET SDK 9.0.100
[Host] .NET 9.0.0 (9.0.24.52809), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

| Method              | N       | text                 | Mean            |
|-------------------- |-------- |--------------------- |----------------:|
| UsingSearchValues2  | 1000    | Con(...)day [20809]  |       509.02 us |
| UsingCompiledRegex2 | 1000    | Con(...)day [20809]  |       527.17 us |
| UsingSearchValues2  | 1000000 | Con(...)day [20809]  |   508,156.44 us |
| UsingCompiledRegex2 | 1000000 | Con(...)day [20809]  |   536,759.98 us |
| UsingSearchValues2  | 1000    | Not(...)sya [20813]  |       495.79 us |
| UsingCompiledRegex2 | 1000    | Not(...)sya [20813]  |       507.24 us |
| UsingSearchValues2  | 1000000 | Not(...)sya [20813]  |   492,306.42 us |
| UsingCompiledRegex2 | 1000000 | Not(...)sya [20813]  |   510,228.88 us |
| UsingSearchValues2  | 1000    | Tue(...)ins [20810]  |        20.20 us |
| UsingCompiledRegex2 | 1000    | Tue(...)ins [20810]  |        28.29 us |
| UsingSearchValues2  | 1000000 | Tue(...)ins [20810]  |    15,613.21 us |
| UsingCompiledRegex2 | 1000000 | Tue(...)ins [20810]  |    27,636.95 us |
| UsingSearchValues1  | 1000    | Earl(...)ien. [3243] |        60.78 us |
| UsingCompiledRegex1 | 1000    | Earl(...)ien. [3243] |       197.24 us |
| UsingSearchValues1  | 1000000 | Earl(...)ien. [3243] |    59,129.08 us |
| UsingCompiledRegex1 | 1000000 | Earl(...)ien. [3243] |   197,564.64 us |
| UsingSearchValues1  | 1000    | Late(...)ien. [3242] |     1,588.84 us |
| UsingCompiledRegex1 | 1000    | Late(...)ien. [3242] |     7,135.61 us |
| UsingSearchValues1  | 1000000 | Late(...)ien. [3242] | 1,212,064.31 us |
| UsingCompiledRegex1 | 1000000 | Late(...)ien. [3242] | 7,153,609.21 us |
| UsingSearchValues1  | 1000    | NoMa(...)ien. [3234] |     1,463.93 us |
| UsingCompiledRegex1 | 1000    | NoMa(...)ien. [3234] |     7,150.13 us |
| UsingSearchValues1  | 1000000 | NoMa(...)ien. [3234] | 1,682,654.02 us |
| UsingCompiledRegex1 | 1000000 | NoMa(...)ien. [3234] | 7,153,832.79 us |
 
jshergal

Interesting, and good to know. Thanks for running some more tests 😃

jshergal • Edited

One thing that is important to note with SearchValues is that it is a bit expensive to create and so recommended usage is to create it once and reuse it.

The code presented here is creating a new instance of SearchValues on each call. It is understandable for the example since we are allowing for custom values each time, but it should be pointed out that it is best practice to cache the instance and reuse it.

For instance, this example:

    public static bool LineHasWarningOrError(this string line)
    {
        ReadOnlySpan<string> tokens = ["<type>Error</type>", "<type>Warning</type>"];
        return line.AsSpan().ContainsAny(SearchValues.Create(tokens, StringComparison.OrdinalIgnoreCase));
    }

would be better written as:

    private static readonly SearchValues<string> WarningsOrErrorSearch = SearchValues.Create(["<type>Error</type>", "<type>Warning</type>"], StringComparison.OrdinalIgnoreCase);

    public static bool LineHasWarningOrError(this string line)
    {
        return line.AsSpan().ContainsAny(WarningsOrErrorSearch);
    }