Roman Kondratev

Introducing RCParsing - brand new .NET framework for DSLs and data scrapers

If you've ever built a parser in .NET, you know the struggle: choosing between performance and readability, wrestling with token priorities, or debugging cryptic error messages. What if you could have it all – a parser that's both blazing fast and delightful to use?

Meet RCParsing – a revolutionary lexerless parsing library that puts developer experience first without compromising on performance.

Why Another Parsing Library?

Most parsing libraries force a trade-off, typically performance against readability. RCParsing breaks this pattern by offering:

  • 🐍 Hybrid Power: Unique barrier token support for indent-sensitive languages like Python and YAML
  • ☄️ Incremental Parsing: Edit large documents with instant feedback – perfect for LSP servers
  • 🌀 Lexerless Freedom: No more token priority headaches – parse directly from raw text
  • 🎨 Fluent API: Write parsers that read like clean BNF grammars
  • 🐛 Superior Debugging: Get detailed error messages with stack/walk traces and precise source locations

See It in Action

Simple Math Parser

var builder = new ParserBuilder();
builder.Settings.SkipWhitespaces();

builder.CreateMainRule("expression")
    .Number<double>()
    .LiteralChoice("+", "-")
    .Number<double>()
    .Transform(v => {
        var value1 = v.GetValue<double>(index: 0);
        var op = v.GetText(index: 1);
        var value2 = v.GetValue<double>(index: 2);
        return op == "+" ? value1 + value2 : value1 - value2;
    });

var parser = builder.Build();
var result = parser.Parse("10 + 15").GetValue<double>();
Console.WriteLine(result); // 25

Python-like Indentation Parsing

builder.BarrierTokenizers
    .AddIndent(indentSize: 4, "INDENT", "DEDENT");

builder.CreateRule("block")
    .Token("INDENT")
    .OneOrMore(b => b.Rule("statement"))
    .Token("DEDENT");

This killer feature lets you parse Python-style indentation naturally – something most .NET parsing libraries struggle with.
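To make the idea behind INDENT and DEDENT concrete, here is a minimal sketch of the general technique in plain C#. It is not RCParsing's implementation: `MarkIndentation` is a hypothetical helper written for this illustration, and it assumes space-only indentation in exact multiples of `indentSize`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of RCParsing: turns leading spaces into
// INDENT/DEDENT markers the way an indent-aware tokenizer typically does.
static List<string> MarkIndentation(string source, int indentSize)
{
    var output = new List<string>();
    int currentLevel = 0;

    foreach (var rawLine in source.Split('\n'))
    {
        var line = rawLine.TrimEnd('\r');
        if (line.Trim().Length == 0) continue; // blank lines do not change the level

        int spaces = line.Length - line.TrimStart(' ').Length;
        if (spaces % indentSize != 0)
            throw new FormatException(
                $"Indentation of {spaces} spaces is not a multiple of {indentSize}.");

        int level = spaces / indentSize;
        // Emit one marker per level entered or left.
        for (; currentLevel < level; currentLevel++) output.Add("INDENT");
        for (; currentLevel > level; currentLevel--) output.Add("DEDENT");

        output.Add(line.Trim());
    }

    // Close every block that is still open at end of input.
    for (; currentLevel > 0; currentLevel--) output.Add("DEDENT");
    return output;
}

var marked = MarkIndentation("if x:\n    y = 1\n    z = 2\nprint(y)", indentSize: 4);
Console.WriteLine(string.Join(" | ", marked));
// prints: if x: | INDENT | y = 1 | z = 2 | DEDENT | print(y)
```

Once the block boundaries are explicit markers like these, a rule such as "block" above can match them like any other token.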

Performance That Competes

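Don't let the great API fool you: RCParsing is built for speed. Check out these benchmarks against established libraries. In each table, Ratio is the mean time relative to the first row (the 1.00 baseline), so lower is faster: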

JSON Parsing (Combinator Style)

Method                 Mean         Ratio   Allocated
JsonShort_RCParsing     1,542 ns    1.00    2,280 B
JsonShort_Parlot        2,326 ns    1.51    1,960 B
JsonShort_Pidgin       11,427 ns    7.41    3,664 B

Expression Evaluation

Method                              Mean        Ratio   Allocated
ExpressionShort_RCParsing           2,527 ns    1.00    3,736 B
ExpressionShort_TokenCombination      474 ns    0.19      656 B
ExpressionShort_Parlot                591 ns    0.23      896 B

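The "Token Combination" style shows what's possible when you bypass AST construction and compute results directly: in this benchmark it runs roughly 5x faster than the default style and allocates far less (656 B versus 3,736 B).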

Python 3.13

Method                           Mean         Ratio   Allocated
PythonBig_RCParsing              35,644 us    1.00    37,397.53 KB
PythonBig_RCParsing_Optimized     5,249 us    0.15     3,863.07 KB
PythonBig_ANTLR                   5,631 us    0.16     6,699.11 KB

Yes, you read that right: this library can handle grammars ranging from A+B to the entire Python 3.13 grammar!

Real-World Ready

Advanced Error Messages

RCParsing gives you incredibly detailed error information:

The line where the error occurred (position 130):
    "tags": ["tag1", "tag2", "tag3"],,
                   line 5, column 35 ^

',' is unexpected character, expected one of:
  'string'
  literal '}'

['string'] Stack trace (top call recently):
- Sequence 'pair':
    'string' <-- here
    literal ':'
    'value'

... 316 hidden parsing steps. Total: 356 ...
[ENTER]   pos:128   literal '//'
[FAIL]    pos:128   literal '//' failed to match: '],,\r\n\t"isActive...'
[FAIL]    pos:128   Sequence... failed to match: '],,\r\n\t"isActive...'
...
[FAIL]    pos:0     'value' failed to match: '{\r\n\t"id": 1,\r\n\t...'
[FAIL]    pos:0     'content' failed to match: '{\r\n\t"id": 1,\r\n\t...'
... End of walk trace ...

Incremental Reparsing

Perfect for editors and IDEs:

var ast = jsonParser.Parse(json);
// Later, when the document changes:
var changedAst = ast.Reparsed(changedJson);
// Only changed sections are re-parsed!
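Incremental reparsing starts from knowing which part of the text actually changed. The sketch below shows one common way to find that span by trimming the shared prefix and suffix. It is a generic illustration, not RCParsing's algorithm: `FindChangedSpan` is a hypothetical helper, and the library may locate changes differently.

```csharp
using System;

// Hypothetical helper, not RCParsing's algorithm: finds the smallest span
// that differs between two versions of a document.
static (int Start, int OldLength, int NewLength) FindChangedSpan(string oldText, string newText)
{
    int maxCommon = Math.Min(oldText.Length, newText.Length);

    // Length of the shared prefix.
    int prefix = 0;
    while (prefix < maxCommon && oldText[prefix] == newText[prefix]) prefix++;

    // Length of the shared suffix, never overlapping the prefix.
    int suffix = 0;
    while (suffix < maxCommon - prefix
           && oldText[oldText.Length - 1 - suffix] == newText[newText.Length - 1 - suffix])
        suffix++;

    return (prefix, oldText.Length - prefix - suffix, newText.Length - prefix - suffix);
}

var span = FindChangedSpan("{\"id\": 1, \"ok\": true}", "{\"id\": 42, \"ok\": true}");
Console.WriteLine($"start={span.Start}, old={span.OldLength}, new={span.NewLength}");
// prints: start=7, old=1, new=2
```

Anything in the old tree that lies entirely outside the changed span can, in principle, be reused, which is why an edit to a large document need not cost a full parse.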

Beyond Traditional Parsing

Regex on Steroids (for data scraping)

var parser = builder.Build();
var prices = parser.FindAllMatches<dynamic>(input).ToList();
// Extract all "Price: X.XX USD" patterns with transformations
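For comparison, this is roughly what the same extraction looks like with plain `System.Text.RegularExpressions`. It is a baseline sketch, not RCParsing code: `ExtractPrices` is a hypothetical helper, and the pattern is an assumption about how "Price: X.XX USD" appears in the scraped text.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Plain-regex baseline, not RCParsing code. The pattern is an assumption
// about what "Price: X.XX USD" looks like in the input.
static decimal[] ExtractPrices(string input) =>
    Regex.Matches(input, @"Price:\s*([0-9]+\.[0-9]{2})\s*USD")
         .Cast<Match>()
         // The "transformation" step: captured text becomes a decimal value.
         .Select(m => decimal.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture))
         .ToArray();

var found = ExtractPrices("Widget. Price: 9.99 USD. Gadget. Price: 120.50 USD.");
Console.WriteLine(string.Join(", ", found.Select(p => p.ToString(CultureInfo.InvariantCulture))));
// prints: 9.99, 120.50
```

A regex handles a flat pattern like this well. The case for a grammar-based matcher is stronger when the fragments you scrape are nested or need structured transformations.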

Combinator Style for Maximum Speed

Skip AST construction entirely when you need raw performance:

builder.CreateToken("value")
    .SkipWhitespaces(b => b.Choice(
        c => c.Token("string"),
        c => c.Token("number"),
        // ... more choices
    ));

// Direct matching without AST
var result = parser.MatchToken<Dictionary<string, object>>("value", json);

Who Should Use RCParsing?

  • Language Developers: Creating DSLs or full programming languages
  • Tooling Authors: Building LSP servers, linters, or formatters
  • Data Engineers: Extracting and transforming complex data formats
  • Library Maintainers: Implementing configuration parsers or query languages
  • Anyone Tired of Parser Pain: Who wants a delightful parsing experience

Getting Started

dotnet add package RCParsing

Explore the documentation and try the web demo to see RCParsing in action.

What’s Next for RCParsing

RCParsing is actively developed with an exciting roadmap:

  • Grammar transformers for automatic optimization
  • Semantic analysis tools
  • NFA algorithm for complex rules
  • Visualization and debugging tools

Conclusion

Interested? Install the library and go parsing! Also check out the GitHub repo.

What’s the biggest parsing pain you’ve faced in .NET? I’d love to hear your cases in the comments below.