DEV Community: samvidmistry

Implementing a Language Server with Language Server Protocol - Basic Completion (Part 5)

samvidmistry — Sat, 18 Oct 2025 20:46:14 +0000

1. Introduction

In the previous post, I covered how we can show documentation upon hovering on any field in a JSON schema. At this point, we already have all of the lower-level functionality required to navigate the JSON schema as well as the JSON file being edited. This post will directly use those classes to implement completion, aka autocomplete.

In my opinion, completion is vital for one minor and one major reason. The minor reason is that it cuts down on repeated typing, although this may not be as pronounced in ARM templates as it is in other languages. The major reason is that it provides instant feedback about the correctness of the code you just wrote. If you are writing the name of a property and the completion UI does not show that field as a candidate, you can immediately pause and reassess whether what you are doing is correct. This is critical in the case of ARM templates since there is no compiler to check your files for you.

We will implement the completion functionality at two different levels of difficulty. The first will be a very basic completion that blindly returns all fields available at any particular location. The other will take into account the surrounding context, such as which fields are already specified and combinators such as AllOf, OneOf, and AnyOf. The latter ends up being rather complex, and since my implementation is just for illustrative purposes, it may not always give a correct answer.

You can find the basic implementation by checking out commit 9ede1d5. The comprehensive implementation is available in commit df70890 and will be covered in the next part.

2. Basic Completion

We will create a new handler for completion and implement the required methods. This is what the scaffolding of the class looks like. We create a class called CompletionHandler and take in the helper classes that we need. We also provide the registration options. One thing to note here is the field TriggerCharacters. This field takes an array of characters that, when typed in the editor, automatically trigger the completion UI. The right characters differ between languages. In C#, you may want to trigger completion on a period (.). In YAML, you may want to trigger completion on a hyphen (-). In this case, we are mainly interested in providing completion for keys, which are always enclosed within double quotes, so we provide the double quote (") as the trigger character for completion.

namespace Armls.Handlers;

public class CompletionHandler : CompletionHandlerBase
{
    private readonly BufferManager bufManager;
    private readonly MinimalSchemaComposer schemaComposer;

    public CompletionHandler(BufferManager manager, MinimalSchemaComposer schemaComposer)
    {
        bufManager = manager;
        this.schemaComposer = schemaComposer;
    }

    public override async Task<CompletionList> Handle(
        CompletionParams request,
        CancellationToken cancellationToken
    )
    {
        // Handle completion; the body is filled in below.
    }

    protected override CompletionRegistrationOptions CreateRegistrationOptions(CompletionCapability capability, ClientCapabilities clientCapabilities)
    {
        return new CompletionRegistrationOptions{
            DocumentSelector = TextDocumentSelector.ForPattern("**/*.json", "**/*.jsonc"),
            TriggerCharacters = new string[] { "\"" },
            ResolveProvider = false
        };
    }
}

The base algorithm to provide completion in our server is straightforward.

Construct a minimal schema for the ARM template.
Construct a path to the parent node of the cursor in the current file using JsonPathGenerator.
Navigate that path in the minimal schema to find all applicable child properties under that parent.
Return a list of these properties as completion candidates.

One complication here is that when we navigate to the parent node in the minimal schema and find the set of applicable keys, we may find ourselves pointing to a combinator, like OneOf. To keep things simple, we recursively travel the combinators until we get to a node that has direct child properties. In the end, we combine all results from all branches of the combinator and present a single list to the user. This may show completion candidates that are not always applicable, but this is a good way to get started. We create a method to recursively travel the combinators.

private IEnumerable<CompletionItem> FindCompletionCandidates(JSchema schema)
{
    if (schema.AllOf.Count != 0 || schema.AnyOf.Count != 0 || schema.OneOf.Count != 0)
    {
        return schema.AllOf.Concat(schema.AnyOf).Concat(schema.OneOf)
            .SelectMany(childSchema => FindCompletionCandidates(childSchema));
    }

    return schema.Properties.Select(kvp => new CompletionItem()
    {
        Label = kvp.Key,
        Documentation = new StringOrMarkupContent(kvp.Value.Description ?? "")
    });
}

Note that we also send the documentation for a field along with the completion candidate since many editors have provisions to show documentation alongside the completion list. Finally, we are ready to tackle the core of the handler. The code here is a slightly simplified version of what is available in the repository.

public override async Task<CompletionList> Handle(
        CompletionParams request,
        CancellationToken cancellationToken
)
{
    var buffer = bufManager.GetBuffer(request.TextDocument.Uri);
    var schemaUrl = buffer?.GetStringValue("$schema");
    if (buffer is null || schemaUrl is null)
    {
        // Without a buffer or a $schema entry there is nothing to complete against.
        return new CompletionList();
    }

    var schema = await schemaComposer.ComposeSchemaAsync(schemaUrl, buffer.GetResourceTypes());

    var cursor = new TSPoint{
        row = (uint)request.Position.Line,
        column = (uint)request.Position.Character,
    };

    // Schema path contains the path /till/ the last element, which in our case is the field we are trying to write.
    // So we get the path only till the parent.
    var path = Json.JsonPathGenerator.FromNode(buffer, buffer.ConcreteTree.RootNode().DescendantForPoint(cursor).Parent());

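    var targetSchema = Schema.SchemaNavigator.FindSchemaByPath(schema, path);
    if (targetSchema is null)
    {
        // The path did not resolve to a schema node, so there are no candidates.
        return new CompletionList();
    }

    return new CompletionList(FindCompletionCandidates(targetSchema).DistinctBy(c => c.Label + ":" + c.Documentation));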
}

At a high level, this method does the following:

Extracts the schema URL from the buffer and constructs the minimal schema.
Finds the path to the parent node of the current cursor location.
Finds the schema node corresponding to the parent node of the cursor.
Extracts completion candidates recursively from the schema node.
Deduplicates them to avoid showing two completion items with the same label and documentation.
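To see the flattening and deduplication steps in isolation, here is a minimal, self-contained sketch. It is not the code from the repository: it swaps JSchema for System.Text.Json's JsonObject so that it runs without extra packages, and the sample schema, its descriptions, and the Candidates helper are made up for illustration. It assumes C# 11 or later for the raw string literal.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json.Nodes;

// Hypothetical schema: a oneOf whose second branch is itself an allOf.
var schema = JsonNode.Parse("""
{
  "oneOf": [
    { "properties": {
        "name": { "description": "Resource name" },
        "type": { "description": "Resource type" } } },
    { "allOf": [
        { "properties": { "name": { "description": "Resource name" } } },
        { "properties": { "location": { "description": "Azure region" } } } ] }
  ]
}
""")!.AsObject();

// Flatten every branch, then drop entries with the same label and documentation.
var labels = Candidates(schema)
    .DistinctBy(c => c.Label + ":" + c.Doc)
    .Select(c => c.Label)
    .ToList();

Console.WriteLine(string.Join(", ", labels)); // name, type, location

// Descends through allOf/anyOf/oneOf until it reaches schemas that have
// direct properties, and returns one (label, documentation) pair per property.
static IEnumerable<(string Label, string Doc)> Candidates(JsonObject schema)
{
    var branches = new List<JsonObject>();
    foreach (var keyword in new[] { "allOf", "anyOf", "oneOf" })
    {
        if (schema[keyword] is JsonArray list)
        {
            branches.AddRange(list.OfType<JsonObject>());
        }
    }

    if (branches.Count > 0)
    {
        return branches.SelectMany(b => Candidates(b));
    }

    var props = schema["properties"] as JsonObject ?? new JsonObject();
    return props.Select(p => (
        Label: p.Key,
        Doc: p.Value?["description"]?.GetValue<string>() ?? ""));
}
```

The "name" property appears in both oneOf branches with the same description, so it is listed once. If the two branches documented it differently, both entries would survive, because the deduplication key includes the documentation.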

Finally, we register the handler in our MainAsync method.

var server = await LanguageServer.From(options =>
            options
                .WithInput(Console.OpenStandardInput())
                .WithOutput(Console.OpenStandardOutput())
                .WithServices(s =>
                    s.AddSingleton(new BufferManager()).AddSingleton(new MinimalSchemaComposer())
                )
                .WithHandler<TextDocumentSyncHandler>()
                .WithHandler<HoverHandler>()
                .WithHandler<CompletionHandler>() // newly added
);

3. Conclusion

This is how the completion looks in the Emacs UI. VS Code will show a similar UI for completion. Notice that the completion list shows the fields that are already defined in the file. We will see how to tackle this issue in the next part.

Setting up GoatCounter on my Homelab

samvidmistry — Fri, 15 Aug 2025 00:00:00 +0000

1. Introduction

Everyone loves dashboards. As I recently started blogging, I also wanted to see how my blog was doing. The numbers won't be anything to brag about since I just started blogging, but I wanted to see them nevertheless. When I started looking for solutions, I had some requirements in mind:

Free and open source
Self-hostable
Works on resource-constrained environments
Simple to set up

My homelab is almost a decade old with barely 1–2 GB of RAM to spare for this analytics software. With these requirements in mind, I fired up my LLM web interface and sent it on a deep research errand.

2. Plausible

My first choice was Plausible. It ticked all the boxes, or so I thought. Setting up Plausible was a breeze because NixOS already has a build defined for it. After running for a while, though, Plausible consumed whatever RAM was left on my homelab along with almost all of the swap space. It made the homelab unusable. Even logging into the homelab through SSH took minutes. Killing the process was another chore because it was running as a systemd service. This isn't to say that Plausible is bad software, just that it didn't fit my use case. Once I got the homelab back under control, I went back to the drawing board. This time I asked an LLM to find software that needs as few resources as possible to run, and I settled on GoatCounter.

3. GoatCounter

I looked at the homepage of GoatCounter and found it to be exactly what I was looking for. It checked all the boxes. No overly complicated menus or enterprise-grade features—just a simple, self-hostable, free and open source project.

The diagram above shows the setup I want to achieve. A VPS is necessary in this pipeline as the homelab is only reachable within your tailnet. Your VPS also needs a valid domain name and a certificate to support HTTPS. I'm using a free DuckDNS address for my VPS.

3.1. Installing GoatCounter

Setting up GoatCounter on NixOS was just four lines of configuration because a build is already defined in the NixOS configuration. Instructions for running with Docker are available here.

services.goatcounter = {
  enable = true;
  address = "0.0.0.0";
  proxy = true;
  extraArgs = [ "-automigrate" ];
};

The options are self-explanatory. An important option here is proxy = true. Since GoatCounter is running on my homelab behind a reverse proxy, telling GoatCounter about this setup disables all TLS enforcement. The responsibility of handling TLS falls to the VPS. This should be enough to get GoatCounter running on your machine. Because TLS is handled by the VPS, GoatCounter itself is served over plain HTTP: you can visit http://<tail-scale-url>:8081 and you'll be greeted with the create account page.

You can provide your signup info and the website at which GoatCounter will be accessible.

Note: You need to provide the domain where GoatCounter is accessible. Since the user's browser will use the domain name of your VPS, put that domain in Your site domain, not your Tailscale MagicDNS domain. Once you've set your VPS domain as the site where GoatCounter is available, you won't be able to log in to GoatCounter using your MagicDNS domain—you'll hit a login loop. This tripped me up during my setup as well. To log into GoatCounter and view the dashboard, use your VPS domain.

Once you set up your account, you'll be asked to log in. After logging in, you'll see a page like this, except that it won't show any pages and the charts will all be flat lines at 0. More images are available on the GoatCounter homepage.

3.2. Setting up reverse proxy

I use Caddy as my reverse proxy. Setup for Nginx should look similar. You can add a block like this to your Caddyfile to proxy requests to your homelab.

example.public.domain.net {
    encode gzip
    reverse_proxy http://example-homelab.magicdns.ts.net:8081 {
        # Caddy sets all of these properly by default
        # Showing the settings here for reference to use
        # with other reverse proxies
        header_up Host {host}
        header_up X-Forwarded-Proto https
        header_up X-Forwarded-For {remote}
        header_up X-Real-IP {remote}
    }
}

Since the reverse proxy terminates the connection and starts a new request from itself to your homelab, we need to update a few headers to make sure GoatCounter sees the origins of requests properly. Otherwise GoatCounter might think that all requests are coming from the VPS, which will render some of the stats useless.

header_up Host {host} → Preserves the original Host header that the client requested
header_up X-Forwarded-Proto https → The protocol used by the client
header_up X-Forwarded-For {remote} → Chain of IP addresses starting from the client
header_up X-Real-IP {remote} → IP address of the client as seen by the proxy

Caddy sets these headers by default, so you don't need to set them explicitly. I'm showing them here for reference.
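To make the effect of these headers concrete, here is a small sketch of how a backend behind a trusted proxy might pick the client address. This is an illustration only, not GoatCounter's actual logic: the ResolveClientIp helper, its precedence order, and the sample addresses are assumptions.

```csharp
using System;
using System.Linq;

// With forwarded headers, the original client address is recovered.
var viaProxy = ResolveClientIp("203.0.113.7, 100.64.0.2", "203.0.113.7", "100.64.0.2");

// Without them, every request appears to come from the proxy itself.
var noHeaders = ResolveClientIp(null, null, "100.64.0.2");

Console.WriteLine(viaProxy);  // 203.0.113.7
Console.WriteLine(noHeaders); // 100.64.0.2

// Prefers the left-most X-Forwarded-For entry (the original client), then
// X-Real-IP, and finally falls back to the TCP peer, which is the proxy.
static string ResolveClientIp(string? forwardedFor, string? realIp, string peerIp)
{
    var first = forwardedFor?.Split(',',
            StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
        .FirstOrDefault();

    if (!string.IsNullOrEmpty(first)) return first;
    if (!string.IsNullOrWhiteSpace(realIp)) return realIp.Trim();
    return peerIp;
}
```

Headers like these should only be trusted when the request really comes from your own proxy, since a client can send arbitrary values.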

3.3. Setting up JavaScript

Finally, we can add a script to the page that is served to the user. This script gathers information about the user's environment and sends it to our VPS, which proxies it to GoatCounter. There are other ways to get data into GoatCounter, covered here. You just need to add this little script tag to your pages.

<script data-goatcounter="https://example.public.domain.net/count"
        async src="//example.public.domain.net/count.js"></script>

The data-goatcounter attribute tells the script where to send the gathered user data.

3.4. Skipping your own views

A neat snippet in the count.js script lets you easily skip your own views of the blog. It doesn't matter much if you have a popular blog, but for a new or little-visited blog, your own views might skew the numbers. You can visit your own website with a small ID appended to the end, namely #toggle-goatcounter. This sets a field in your browser's local storage that tells it to ignore views from this browser. You'll see a visual confirmation as well.

4. Conclusion

There you have it: setting up a very simple tracking system on your blog to extract insights from the visits your blog gets. GoatCounter is free and open source software. If you find value in GoatCounter, consider donating to the author on GitHub Sponsors to support his work.

Implementing a Language Server with Language Server Protocol - Hover (Part 4)

samvidmistry — Sat, 09 Aug 2025 00:00:00 +0000

1. Introduction

In the previous post, we created minimal schemas for ARM templates that help validate their structure semantically. In this post, we will implement hover support using LSP. It involves:

Finding the node under the cursor
Walking up to the root node of the document
Finding the corresponding field in the JSON Schema

You can check out commit 78372cc to follow along. The result will look like this:

2. Path to Root

We construct the path from the node under the cursor to the root of the document. This ensures we always find documentation for the correct field, even when multiple fields share the same name. We create a utility class called JsonPathGenerator to generate this path. Our schema only has two types of containers: pair and array. In either case we record the address of the current field in that container. For pairs, we record the key. For arrays, we record the index of the element inside the array. Recording the index lets us detect arrays during schema traversal. If a segment in the path is an index, we know the next element is inside an array and we should look at the schema of the array's items rather than the array itself.

public static class JsonPathGenerator
{
    public static List<string> FromNode(Buffer.Buffer buffer, TSNode startNode)
    {
        var path = new List<string>();
        // childNode is the node we just came from, i.e. the direct child of currentNode.
        var childNode = startNode;
        var currentNode = startNode.Parent();
        while (currentNode != null)
        {
            if (currentNode.Type == "pair")
            {
                var keyNode = currentNode.ChildByFieldName("key");
                if (keyNode != null)
                {
                    path.Insert(0, keyNode.Text(buffer.Text).Trim('"'));
                }
            }
            else if (currentNode.Type == "array")
            {
                // Record the position, within the array, of the child we came from.
                uint index = 0;
                for (uint i = 0; i < currentNode.NamedChildCount; i++)
                {
                    if (currentNode.NamedChild(i).Equals(childNode))
                    {
                        index = i;
                        break;
                    }
                }
                path.Insert(0, index.ToString());
            }

            childNode = currentNode;
            currentNode = currentNode.Parent();
        }
        return path;
    }
}

3. Schema Traversal

Once we have the path to the element, we traverse the schema in the same order to locate the schema for the field under the cursor. We write another utility class for this purpose, called SchemaNavigator. The flow mirrors the path construction, with one wrinkle. The second half of the function simply checks whether the current path segment exists in the properties of the current schema object. If it doesn't, and the segment represents an array, we look for the segment in the definition of the array's items. The first half handles JSON Schema combinators. A property can be defined in terms of a combination of other properties using anyOf, allOf, or oneOf. If we encounter these, we perform a depth-first search through the schemas to find the field. This straightforward approach gets the point across for this article, though it may not be ideal for production software.

public static class SchemaNavigator
{
    public static JSchema? FindSchemaByPath(JSchema rootSchema, List<string> path)
    {
        JSchema? currentSchema = rootSchema;

        for (int i = 0; i < path.Count; i++)
        {
            var segment = path[i];
            if (currentSchema == null) return null;

            IList<JSchema>? combinator = null;
            if (currentSchema.AnyOf.Count > 0) { combinator = currentSchema.AnyOf; }
            else if (currentSchema.AllOf.Count > 0) { combinator = currentSchema.AllOf; }
            else if (currentSchema.OneOf.Count > 0) { combinator = currentSchema.OneOf; }

            if (combinator is not null)
            {
                foreach (var schemaPath in combinator)
                {
                    var nestedSchema = FindSchemaByPath(schemaPath, path.Skip(i).ToList());
                    if (nestedSchema is not null) return nestedSchema;
                }
                return null; // Path segment not found in any of the choices
            }

            if (currentSchema.Properties.TryGetValue(segment, out var propertySchema))
            {
                currentSchema = propertySchema;
            }
            // If the segment is an integer, attempt to navigate into an array.
            else if (currentSchema.Type == JSchemaType.Array && int.TryParse(segment, out _))
            {
                // For ARM templates, arrays usually have a single schema definition for all their items.
                if (currentSchema.Items.Count > 0) currentSchema = currentSchema.Items[0];
                else return null; // Array schema has no item definition.
            }
            else return null; // Path segment not found.
        }

        return currentSchema;
    }
}
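As a worked example of the traversal, here is a self-contained sketch of the same idea. It is an approximation rather than the repository code: it uses System.Text.Json's JsonObject in place of JSchema, treats "items" as a single schema object, and the Find helper and sample schema are hypothetical. It assumes C# 11 or later for the raw string literal.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json.Nodes;

// Hypothetical schema: an array of resources whose items are a oneOf.
var root = JsonNode.Parse("""
{
  "properties": {
    "resources": {
      "type": "array",
      "items": {
        "oneOf": [
          { "properties": { "sku":  { "description": "Pricing tier" } } },
          { "properties": { "name": { "description": "Resource name" } } }
        ]
      }
    }
  }
}
""")!.AsObject();

var hit = Find(root, new[] { "resources", "0", "name" });
var miss = Find(root, new[] { "resources", "0", "missing" });

Console.WriteLine(hit?["description"]?.GetValue<string>()); // Resource name
Console.WriteLine(miss is null);                            // True

// Walks `path` through the schema: keys are looked up in "properties",
// numeric segments step into "items", and combinators are searched depth-first.
static JsonObject? Find(JsonObject schema, IReadOnlyList<string> path, int start = 0)
{
    var current = schema;
    for (int i = start; i < path.Count; i++)
    {
        foreach (var keyword in new[] { "anyOf", "allOf", "oneOf" })
        {
            if (current[keyword] is JsonArray { Count: > 0 } branches)
            {
                // The first branch that resolves the rest of the path wins.
                foreach (var branch in branches.OfType<JsonObject>())
                {
                    var found = Find(branch, path, i);
                    if (found is not null) return found;
                }
                return null; // Segment not found in any of the choices.
            }
        }

        if (current["properties"] is JsonObject props && props[path[i]] is JsonObject child)
            current = child;
        else if (int.TryParse(path[i], out _) && current["items"] is JsonObject items)
            current = items;
        else
            return null; // Segment not found.
    }
    return current;
}
```

The segment "0" is what tells the walk to step into the array's item schema, which is why the path generator records array indices at all.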

4. Hover Handler

Now let's put it all together. We'll define a HoverHandler that finds the symbol under the cursor, constructs the path, traverses the schema, and returns the hover content. It needs access to the BufferManager to read the buffer text and to the MinimalSchemaComposer from the previous article to build a minimal schema that includes documentation.

public class HoverHandler : HoverHandlerBase
{
    private BufferManager bufManager;
    private MinimalSchemaComposer schemaComposer;

    public HoverHandler(BufferManager manager, MinimalSchemaComposer schemaComposer)
    {
        bufManager = manager;
        this.schemaComposer = schemaComposer;
    }

    public override async Task<Hover?> Handle(
        HoverParams request,
        CancellationToken cancellationToken
    )
    {
        var buffer = bufManager.GetBuffer(request.TextDocument.Uri);
        var schemaUrl = buffer.GetStringValue("$schema");
        var schema = await schemaComposer.ComposeSchemaAsync(schemaUrl, buffer.GetResourceTypes());
        var cursorPosition = new TSPoint()
        {
            row = (uint)request.Position.Line,
            column = (uint)request.Position.Character,
        };

        var rootNode = buffer.ConcreteTree.RootNode();
        var hoveredNode = rootNode.DescendantForPointRange(cursorPosition, cursorPosition);
        var path = Json.JsonPathGenerator.FromNode(buffer, hoveredNode);
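        var targetSchema = Schema.SchemaNavigator.FindSchemaByPath(schema, path);
        if (targetSchema?.Description is null)
        {
            // Nothing to show when the path does not resolve or has no description.
            return null;
        }
        return new Hover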
        {
            Contents = new MarkedStringsOrMarkupContent(
                new MarkupContent { Kind = MarkupKind.Markdown, Value = targetSchema.Description }
            ),
            Range = hoveredNode.GetRange(),
        };
    }
}

Finally, add the handler to the language server configuration in Program.cs.

private static async Task MainAsync()
{
    var server = await LanguageServer.From(options =>
            options
                .WithInput(Console.OpenStandardInput())
                .WithOutput(Console.OpenStandardOutput())
                .WithServices(s =>
                    s.AddSingleton(new BufferManager()).AddSingleton(new MinimalSchemaComposer())
                )
                .WithHandler<TextDocumentSyncHandler>()
                .WithHandler<HoverHandler>()
    );

    await server.WaitForExit;
}

5. Conclusion

There you have it: a straightforward way to provide hover documentation. The JSON Schema for a document contains a wealth of information that editors can use to provide various features. In the next post, we'll look at providing auto-completion through LSP. As a reminder, the first post in this series explains how to use the Armls VS Code extension to interact with Armls.

Implementing a Language Server with Language Server Protocol - Schema (Part 3)

samvidmistry — Sun, 20 Jul 2025 00:00:00 +0000

1. Introduction

In previous posts, we looked at an introduction to LSP and syntax checking using TreeSitter. This post will talk about using JSON Schema to check the schema of ARM templates. This post took longer than usual because of the challenges involved in taking the giant ARM template schema and making it usable for interactive scenarios.

2. JSON Schema

JSON Schema is a structured way of describing the schema of a JSON file. It uses the standard JSON format to describe the properties and their types. We can also encode a choice between multiple different types of values supported by a property by using oneOf, or compose a value out of multiple different components by using allOf. A small example of a JSON Schema might look like the following:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["$schema", "resources"],
  "properties": {
    "$schema": {
      "type": "string"
    },
    "resources": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["type", "name"],
        "properties": {
          "type": { "type": "string" },
          "name": { "type": "string" }
        }
      }
    }
  }
}

Among other things, this schema also says what properties are required to be specified. You can also refer to other schemas from a schema by using $ref and pointing to the schema through either a relative or an absolute path.

3. ARM Template Schema

Recent ARM templates refer to the latest ARM template schema, defined in 2019. Its URL appears in the id field of the abbreviated schema file below:

{
  "id": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "$schema": "http://json-schema.org/draft-04/schema#",
  "title": "Template",
  "description": "An Azure deployment template",
  "type": "object",
  "properties": {
    "$schema": {
      "type": "string",
      "description": "JSON schema reference"
    },
    // ...
    "resource": {
      "description": "Collection of resource schemas",
      "oneOf": [
        {
          "allOf": [
            {
              "$ref": "#/definitions/resourceBase"
            },
            {
              "oneOf": [
                {
                  "$ref": "https://schema.management.azure.com/schemas/2017-08-01-preview/Microsoft.Genomics.json#/resourceDefinitions/accounts"
                },
                {
                  "$ref": "https://schema.management.azure.com/schemas/2016-06-01/Microsoft.RecoveryServices.legacy.json#/resourceDefinitions/vaults"
                },
                // Tons of other references
              ]
            }
          ]
        },
        // ...
        {
          "$ref": "https://schema.management.azure.com/schemas/common/autogeneratedResources.json"
        }
      ]
    },
    // ...
  }
}

It describes the structure of various entities within an ARM template, like parameters, variables, functions, etc. The biggest section in the ARM template schema is the definition of a resource. The definition is just a ton of references to all the different resources and APIs defined by Azure over the years. Given the scale of the functionality provided by Azure, this section contains dozens upon dozens of references. Finally, it has a reference to common/autogeneratedResources.json, which has references to even more resources. If you download the schemas for all of the supported resource types for all their API versions, you will have close to 50K schemas that weigh over 9GB! Reading and managing over 9GB of text would be an issue even in a batch processing system, let alone an interactive system like a text editor. It was a big challenge for me to figure out how to support schema checking at reasonable speeds with data like that. I still went ahead and tried to work with this 9GB. Here are the things I tried.

3.1. Downloading the schemas

I decided to use the Json.NET Schema library for schema validation. It provides JSchemaUrlResolver, which can automatically download schemas from the internet for all $ref references. At the time of trying this method, I didn't know that the entire suite of schema files is 9GB+. I gave the library the base deploymentTemplate schema and let it download all references off the internet. Over an hour passed but it couldn't finish downloading all schemas. This made me realize that there must be a huge number of schema files being referenced and that the schema files might themselves be referencing other schema files. This was clearly not scalable.

3.2. Using `azure-resource-manager-schemas`

Azure has a public repository on GitHub at Azure/azure-resource-manager-schemas. This repository is supposed to contain the schemas for all resource providers on Azure. It also ships with a server that can locally serve all requests for https://schema.management.azure.com/schemas. This seemed promising. I cloned the repository and fired up the server. With the server running locally, the library was able to pull a lot of schemas from the server within a few minutes. A few minutes of loading time for the language server is still pretty high, but I was willing to work with it and optimize it later. It would be easy to either ship this server with the application or directly embed the schema files within the application. It might bloat the executable, but that would have been fine for the purposes of this tutorial. However, I ran into issues even before getting there. It turns out that this repository is incomplete and does not have all the schemas that are referred to within the web of references. I thought that it might be missing a few schemas that I could manually download and put in the local repository, but even after downloading over a dozen schemas manually the library kept finding missing schemas. This was clearly not scalable.

3.3. Shipping Schemas

I still didn't know that the total size of the web of schemas was over 9GB. I was thinking more along the lines of a few MBs, or a few hundred at most. At this point I turned to Claude. After some back and forth, Claude wrote me a script that recursively downloads all $ref schemas starting with deploymentTemplate. After the script finished running, I looked at the size of the downloaded folder and was shocked to see that it was 9GB. I was still stupid enough to try working with it. I first tried loading the schemas from the filesystem at runtime. The problem with Json.NET Schema, and possibly all other schema checking libraries, is that they load the entire referenced schema before they can validate the file, even though the file only refers to 3-5 resources out of the thousands of resources in the schema. It makes sense, because the library cannot be sure how many errors the file has without looking at the entire schema, especially when there is a huge list of oneOf resources. But this creates an issue, because the entire schema has to be loaded into memory. While trying to load all schemas off the disk, the virtual memory of armls grew to over 40GB. Next I tried embedding all schemas within the executable itself. My MBP has 64GB of RAM, so I thought loading a 9GB executable should at least be possible. Apart from the compilation time of over 500 seconds, the process kept dying with an out-of-memory exception as soon as it was launched. This wasn't going to work.

The core issue is that existing tools try to validate against the entire universe of possible resources, when any given template only uses a handful. The solution, as we'll see, involves dynamically constructing a minimal, relevant schema on-the-fly for each template. This gives the validator just enough information to do its job without boiling the ocean.

3.4. Writing Your Own Schema

I only get to work on this project on weekends, and even then I'm not always in the mood to tackle such a difficult problem. So I kept thinking about the problem now and then. Even LLMs didn't help much. A potential solution finally struck me this weekend. The problem was that the primary deploymentTemplate schema refers to too many unnecessary schemas from all of Azure's lifetime. I just needed it to refer to the half a dozen or fewer resources mentioned in any given ARM template. This meant modifying the deploymentTemplate schema on-the-fly to only contain references to the resources mentioned in the ARM template. I decided to try this idea. I was so fed up with trying various solutions to this problem that I wasn't even willing to code this one up myself. I fired up Claude Code and explained the idea to it. I gave it the deploymentTemplate file as a reference to find out which sections it needs to rewrite on-the-fly and which sections it needs to preserve. It wrote me a first draft. After prompting it to fix a few issues, the code worked. I was able to construct a minimal schema on-the-fly specifically for the resources mentioned in the file and use it to check the schema! Since the library needed only a handful of schemas to verify any reasonably sized ARM template, I didn't even need to bundle any schema files with the executable. The library could just download them off the internet almost instantaneously.
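The core of the idea can be sketched in a few lines. This is a hedged illustration, not the code Claude Code produced: the ResourceSchemaUrl and MinimalResourceRefs helpers are hypothetical, the URL shape is inferred from the $ref entries in the deploymentTemplate excerpt above, and only top-level resource types of the form Provider/resource are handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json.Nodes;

// Resource types and API versions as they would be collected from a template.
var used = new Dictionary<string, string>
{
    ["Microsoft.Storage/storageAccounts"] = "2021-04-01",
};

// This short list stands in for the huge list of references in the full schema.
var refs = MinimalResourceRefs(used);
Console.WriteLine(refs.ToJsonString());

// Builds the $ref URL for one resource type (assumed URL pattern).
static string ResourceSchemaUrl(string resourceType, string apiVersion)
{
    // "Microsoft.Storage/storageAccounts" -> "Microsoft.Storage", "storageAccounts"
    var parts = resourceType.Split('/', 2);
    if (parts.Length != 2)
    {
        throw new ArgumentException("Expected 'Provider/resource'", nameof(resourceType));
    }

    return $"https://schema.management.azure.com/schemas/{apiVersion}/{parts[0]}.json"
         + $"#/resourceDefinitions/{parts[1]}";
}

// Creates one { "$ref": ... } entry per resource type used in the template.
static JsonArray MinimalResourceRefs(IReadOnlyDictionary<string, string> resourceTypes)
{
    var result = new JsonArray();
    foreach (var (type, version) in resourceTypes)
    {
        JsonNode entry = new JsonObject { ["$ref"] = ResourceSchemaUrl(type, version) };
        result.Add(entry);
    }
    return result;
}
```

A validator given this list only has to resolve as many schema files as the template has distinct resource types, instead of the whole web of references.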

4. Code Walkthrough

We will be working with the commit ID 569cfdd for this post. I've also added a bunch of corresponding changes to TreeSitter bindings which I won't be covering.

4.1. `Buffer`

We'll first add a method in Buffer to get the set of resources and their API versions defined in the template. These will be used to define the minimal schema for validation.

class Buffer
{
    // Returns a dictionary mapping resource types to their API versions like {"Microsoft.Storage/storageAccounts": "2021-04-01"}.
    public Dictionary<string, string> GetResourceTypes()
    {
        var resourceTypesWithVersions = new Dictionary<string, string>();
        var query = new TSQuery(
            @"(pair (string (string_content) @key) (array (object) @resource))",
            TSJsonLanguage.Language()
        );
        var cursor = query.Execute(ConcreteTree.RootNode());
        while (cursor.Next(out TSQueryMatch? match))
        {
            var captures = match!.Captures();
            if (captures.Count >= 2 && captures[0].Text(Text).Equals("resources"))
            {
                var resourceNode = captures[1];

                resourceTypesWithVersions[GetPropertyValue(resourceNode, "type")] =
                    GetPropertyValue(resourceNode, "apiVersion");
            }
        }

        return resourceTypesWithVersions;
    }
}

4.2. `MinimalSchemaComposer`

We'll define a class to create the minimal schema for a template. It holds a schemaJsonCache to avoid downloading the base schema repeatedly and an HttpClient to download the base schema when it is not in the cache. I have the resource schemas downloaded locally and load them from disk, but you can load them from the internet as well. In my local schema directory the files are named with GUIDs, so I maintain a schemaIndex that maps the schema URLs for various resources to their local file names.

public class MinimalSchemaComposer
{
    private readonly Dictionary<string, string> schemaJsonCache;
    private readonly HttpClient httpClient;
    private readonly string schemaDirectory = "/Users/samvidmistry/Downloads/schemas";
    private readonly Dictionary<string, string> schemaIndex;

    public MinimalSchemaComposer()
    {
        schemaJsonCache = new();
        httpClient = new();
        var indexPath = Path.Combine(schemaDirectory, "schema_index.json");
        var indexJson = File.ReadAllText(indexPath);
        schemaIndex =
            JsonConvert.DeserializeObject<Dictionary<string, string>>(indexJson)
            ?? new Dictionary<string, string>();
    }

We then have the main function for composing the schema. We always want common/definitions.json included because it contains the definitions for basic ARM primitives like an expression. We then construct URLs for all mentioned resources based on the pattern followed by Azure schemas. For each resource we also add the corresponding $ref entry to a JArray, which will be used to create the minimal schema with references to just the required resources. We then load the schemas for all referenced resources from the local disk in parallel and register them with a JSchemaPreloadedResolver. They can also be downloaded from the internet directly with a JSchemaUrlResolver. Finally, we call the utility function ConstructSchemaWithResources to construct the final schema.

    public async Task<JSchema?> ComposeSchemaAsync(
        string baseSchemaUrl,
        Dictionary<string, string> resourceTypesWithVersions
    )
    {
        if (!schemaJsonCache.TryGetValue(baseSchemaUrl, out var schemaJson))
        {
            schemaJson = await httpClient.GetStringAsync(baseSchemaUrl);
            schemaJsonCache[baseSchemaUrl] = schemaJson;
        }

        // ... code to return if this is not a `deploymentTemplate`

        var resolver = new JSchemaPreloadedResolver();
        var resourceReferences = new JArray();

        // Always load common definitions
        var schemaUrls = new HashSet<string>
        {
            "https://schema.management.azure.com/schemas/common/definitions.json",
        };

        foreach (var (resourceType, apiVersion) in resourceTypesWithVersions)
        {
            var parts = resourceType.Split('/');
            var provider = parts[0];
            var resourceName = parts[1];
            var schemaUrl =
                $"https://schema.management.azure.com/schemas/{apiVersion}/{provider}.json";

            schemaUrls.Add(schemaUrl);
            resourceReferences.Add(
                new JObject { ["$ref"] = $"{schemaUrl}#/resourceDefinitions/{resourceName}" }
            );
        }

        (
            await Task.WhenAll(
                schemaUrls
                    .Where(url =>
                        schemaIndex.TryGetValue(url, out var filename)
                        && File.Exists(Path.Combine(schemaDirectory, filename))
                    )
                    .Select(async url => new
                    {
                        Url = new Uri(url),
                        Content = await File.ReadAllTextAsync(
                            Path.Combine(schemaDirectory, schemaIndex[url])
                        ),
                    })
            )
        )
            .ToList()
            .ForEach(s => resolver.Add(s.Url, s.Content));

        return ConstructSchemaWithResources(schemaJson, resourceReferences) is { } minimalSchemaJson
            ? JSchema.Load(new JsonTextReader(new StringReader(minimalSchemaJson)), resolver)
            : null;
    }

This utility function finds the exact path where the references to thousands of resource schemas are embedded and replaces them with the minimal JArray that we created. It then keeps only the first branch of the resource definition's oneOf, which drops references to other schemas such as autogeneratedResources, again a huge file with tons of references.

    private string? ConstructSchemaWithResources(string baseSchemaJson, JArray resourceReferences)
    {
        try
        {
            var schemaObj = JObject.Parse(baseSchemaJson);

            if (
                schemaObj.SelectToken("definitions.resource.oneOf[0].allOf[1].oneOf")
                is JArray resourceRefsArray
            )
            {
                resourceRefsArray.Replace(resourceReferences);
            }

            if (
                schemaObj.SelectToken("definitions.resource.oneOf") is JArray oneOfArray
                && oneOfArray.Any()
            )
            {
                oneOfArray.ReplaceAll(oneOfArray.First());
            }

            return schemaObj.ToString();
        }
        catch (Exception)
        {
            // Return null if generation fails; the caller treats this as "no schema"
            return null;
        }
    }
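
One detail worth calling out in ComposeSchemaAsync is the URL construction. The loop takes parts[0] as the provider and parts[1] as the resource name, which throws on a type without a slash and silently drops the trailing segments of a nested type such as Microsoft.Sql/servers/databases. Below is a minimal sketch of a more defensive version. SchemaReference and For are hypothetical names that are not part of armls, and joining nested segments with an underscore is my assumption about how the Azure schemas name nested resource definitions, so verify it against the schema file you are targeting.

```csharp
using System;

// Hypothetical helper, not part of armls: maps a resource type and API version
// to the schema URL and the $ref pointer used in the minimal schema.
public static class SchemaReference
{
    private const string SchemaHost = "https://schema.management.azure.com/schemas";

    public static (string SchemaUrl, string Ref) For(string resourceType, string apiVersion)
    {
        // "Microsoft.Sql/servers/databases" -> provider "Microsoft.Sql", rest "servers/databases"
        var separator = resourceType.IndexOf('/');
        if (separator <= 0 || separator == resourceType.Length - 1)
        {
            throw new ArgumentException(
                $"Expected '<provider>/<type>' but got '{resourceType}'.",
                nameof(resourceType)
            );
        }

        var provider = resourceType[..separator];

        // Assumption: nested types are joined with '_' under resourceDefinitions.
        var definitionName = resourceType[(separator + 1)..].Replace('/', '_');

        var schemaUrl = $"{SchemaHost}/{apiVersion}/{provider}.json";
        return (schemaUrl, $"{schemaUrl}#/resourceDefinitions/{definitionName}");
    }
}
```

Calling SchemaReference.For("Microsoft.Storage/storageAccounts", "2021-04-01") yields the same URL and $ref as the loop above, so the helper could replace the three lines that split the type by hand.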

4.3. `Analyzer`

We will now update the Analyzer to validate the template against the schema. We compose the minimal schema using MinimalSchemaComposer and then use Json.NET Schema to validate the buffer against it. For every error we get, we find the named node that contains the location of the error and highlight that node with a warning. Named nodes in TreeSitter are nodes that correspond to named rules in the grammar, as opposed to anonymous nodes such as punctuation. These are generally the nodes that hold some semantic significance in the larger grammar. In the JSON grammar these are nodes like object, pair, array, string, and number. Json.NET Schema returns the errors in a tree-shaped collection where it reports errors from the smallest token that contains the error to the largest structure that nests the smaller token, and every level in between. However, we are only interested in leaf errors so that we can highlight the smallest node that contains the error. So we have a utility function called GetLeafErrors that recursively processes the ValidationError objects to get the leaf nodes. We then find the smallest named descendant of the root node that spans the position of the error. It is useful to highlight the closest named structure instead of only the smallest token with the error because the writers of these files do not think at the level of tokens. They think at the level of entities, or in this case a single key or value, which make sense for the context. It is good design to highlight errors at the level of abstraction the writer is working with.

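    public async Task<IDictionary<DocumentUri, IEnumerable<Diagnostic>>> AnalyzeAsync(
        IReadOnlyDictionary<DocumentUri, Buffer.Buffer> buffers
    )
    {
        var diagnostics = new Dictionary<DocumentUri, IEnumerable<Diagnostic>>();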

        foreach (var (path, buf) in buffers)
        {
            // ... check for syntax errors

            // Extract resource types with their API versions from the ARM template
            var resourceTypesWithVersions = buf.GetResourceTypes();

            // Compose minimal schema with only needed resource definitions
            var schema = await schemaComposer.ComposeSchemaAsync(
                schemaUrl,
                resourceTypesWithVersions
            );

            // ... handle the case where schema is null

            IList<ValidationError> errors;
            var isValid = JToken.Parse(buf.Text).IsValid(schema, out errors);
            diagnostics[path] = GetLeafErrors(errors)
                .Where(e => !e.Message.Contains("Expected Object but got Array."))
                .Select(e => new Diagnostic
                {
                    Range = buf
                        .ConcreteTree.RootNode()
                        .NamedDescendantForPointRange(
                            new TSPoint
                            {
                                row = (uint)e.LineNumber - 1,
                                column = (uint)e.LinePosition - 1,
                            },
                            new TSPoint
                            {
                                row = (uint)e.LineNumber - 1,
                                column = (uint)e.LinePosition - 1,
                            }
                        )
                        .GetRange(),
                    Message = e.Message,
                    Severity = DiagnosticSeverity.Warning,
                })
                .ToList();
        }

        return diagnostics;
    }
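
GetLeafErrors is not shown in the snippet above. A minimal sketch of the idea follows, assuming the errors form a tree through a ChildErrors collection the way Json.NET Schema's ValidationError does. To keep the example runnable without the NuGet package, SchemaError is a hypothetical stand-in type that carries only the members used here; the real helper in the repository may look different.

```csharp
using System.Collections.Generic;

// Stand-in for Json.NET Schema's ValidationError, reduced to the members used here.
public sealed record SchemaError(
    string Message,
    int LineNumber,
    int LinePosition,
    IReadOnlyList<SchemaError> ChildErrors
);

public static class SchemaErrors
{
    // Depth-first walk that yields only the errors that have no child errors.
    public static IEnumerable<SchemaError> GetLeafErrors(IEnumerable<SchemaError> errors)
    {
        foreach (var error in errors)
        {
            if (error.ChildErrors.Count == 0)
            {
                yield return error;
                continue;
            }

            // A parent error only summarizes its children, so descend instead of reporting it.
            foreach (var leaf in GetLeafErrors(error.ChildErrors))
            {
                yield return leaf;
            }
        }
    }
}
```

Because the walk is depth-first, leaves come out in the order they appear in the error tree, which keeps the diagnostics in a stable order between runs.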

5. Conclusion

Once all of the pieces are implemented and connected, the schema errors will show up in your editor. Here's how it looks in VS Code.

This post covered the basics of how schema validation can be done for JSON files. This is by no means a production-ready implementation, but given the complexity of validating against the humongous web of ARM template schemas and the time I have on my hands, I feel this works well enough to get the point across. As we will see later, the schema files also provide us with other information about the structures used in the file. In the next post, we'll see how to provide hover functionality using the schemas we just loaded.

Implementing a Language Server with Language Server Protocol - Parsing (Part 2)

samvidmistry — Sat, 14 Jun 2025 00:00:00 +0000

1. Introduction

In the previous post, we covered the basics of LSP and how we can use C#-LSP to implement a language server that can communicate with a language client using LSP. The server had basic code to receive and track changes to all buffers of the project through BufferManager. It can be very tricky to design programs that work with completely free-flowing text. The resulting programs would also be very brittle. Hence we impose a certain structure on the text, which makes it easier for us to write programs. JSON is one such structure, and it is the one used to write ARM templates. These structures are generally defined and described using a grammar. You can read this Wikipedia entry for more information on programming language grammars. To make it easier and more efficient for programs to understand and interpret text in accordance with a grammar, we define lexers and parsers. The process of using a parser to interpret text is called parsing. In this article, we will use a parser for the JSON language to parse ARM templates and do basic error checking. To follow along, check out commit b794cd3 from the Armls repository, which has the changes described in this article.

2. TreeSitter

From the official documentation:

> Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited.

In other words, given the grammar of the JSON language, TreeSitter can generate a parser for us that can (incrementally) parse JSON files. We won't be using the incremental parsing functionality of TreeSitter in this series because we ask the language client for the complete changed file on every change. Another important feature of TreeSitter is that the parsers it generates are fault tolerant, i.e., the parser can recover from syntax errors in a file and continue to parse the rest of the (valid) file according to the rules of the grammar. We will use TreeSitter to create concrete syntax trees for all files in the project. These trees are useful for all sorts of functionality, like syntax checking, finding definitions and references, etc. TreeSitter is not the only parsing library out there; however, it is probably the most popular one. You can choose any parsing library that fits your needs (or write your own), but it must be fault tolerant to be able to identify all issues in a file in a single pass. The core API of TreeSitter is designed to work with an abstract concept of TSLanguage. Any language that provides a valid implementation of this structure can work with the library. For our use case, we want to work with the tree-sitter (GitHub) and tree-sitter-json (GitHub) repositories. Both come with a Makefile to easily generate a linkable library. Compilation on Windows might require MSYS2 and MinGW.

NOTE: Thoroughly covering the concepts of parsing is beyond the scope of this article. One can find various resources online that explain the concepts in various levels of depth. My AI agent says that Chapters 4-6 from the book Crafting Interpreters give an approachable introduction to the concepts of lexing and parsing.

2.1. C# Bindings

TreeSitter is a library written in pure C. To use TreeSitter with our language server written in C#, we need some sort of Foreign Function Interface (FFI) feature. Thankfully C#, like most major high-level languages, comes with the ability to interface with C libraries out of the box (P/Invoke). While one can directly call C functions from C# code, this becomes very verbose and awkward because of the different design philosophies of the two languages, C# being an object-oriented language and C being a purely imperative one. Programmers generally rely on language bindings to effectively use features of systems outside of their language of choice. Language bindings expose the functionality of an outside system in a way that is consistent and idiomatic for the language you are working in. In this case, the functionality of the C TreeSitter library, which works with structures and functions, will be exposed in a way that is idiomatic in C#, which is through classes and methods. The TreeSitter homepage links the official bindings for various languages, including C#. However, the linked C# bindings are outdated and only designed to be compiled on Windows. We will write our own bindings to work around this limitation. This will allow our code to work on all platforms that are supported by both C# and TreeSitter. I have written a set of bindings for all structures and methods required for parsing and basic error checking in the TreeSitter package in Armls. I encourage you to browse through the bindings to get an idea of the different kinds of functionality implemented. To get a deeper understanding of the C functions used in the bindings, look at their definitions in api.h, which I have checked into the repository for convenience.

3. Writing Bindings

The repository contains a set of bindings to interface with many constructs of TreeSitter. I will cover the concepts of writing C bindings in C# using the class TSQueryCursor.cs as it is short but touches virtually all of the required concepts.

using System.Runtime.InteropServices;

namespace Armls.TreeSitter;

public class TSQueryCursor
{
    internal readonly IntPtr cursor;

    internal TSQueryCursor(IntPtr cursor)
    {
        this.cursor = cursor;
    }

C is an imperative language that keeps the state and behaviors separate. Unlike OOP languages, the state lives in a struct while the behavior lives in a function independent of the struct. This requires us to pass the state explicitly to all functions, either through function parameters or through global variables. Virtually all methods in TreeSitter take in the state as the first parameter. In C#, generally the state is encapsulated and maintained by the objects themselves. Hence we are going to declare a pointer (IntPtr) to a cursor coming from C library as an instance variable in the class.

    [DllImport(
        "/Users/samvidmistry/projects/lsp/armls/tree-sitter/libtree-sitter.dylib",
        CallingConvention = CallingConvention.Cdecl
    )]
    // The C function returns a 1-byte bool, while P/Invoke assumes a 4-byte BOOL by default.
    [return: MarshalAs(UnmanagedType.I1)]
    private static extern bool ts_query_cursor_next_capture(
        IntPtr cursor,
        ref TSQueryMatchNative match,
        out uint capture_index
    );

Next we need to declare the signature of the native method that our class can invoke. We first use the DllImport attribute on the method declaration to specify which dynamically linked library will provide an implementation for this function. We also specify the calling convention for the function, which is just a set of rules about how to pass values to and receive values from unmanaged code. Next we declare the signature of the method. The method is declared static because it is not associated with any instance of this class, and extern because its implementation will be provided by an externally linked library. The first parameter is an IntPtr, a signed integer value that has the same bit-width as a pointer, i.e., an IntPtr can be used to store pointers and pass them to methods. Next we see the ref keyword on the second parameter. ref is a safe way to pass pointers to managed structures to unmanaged code. Their values can flow into unmanaged code, and any changes to the value are also reflected back in managed code. out works in the same way as ref except that changes can only flow out of unmanaged code to managed code. So there is no strict need to initialize this variable with any value in managed code.

    public bool Next(out TSQueryMatch? match)
    {
        TSQueryMatchNative matchNative = new();
        uint captureIndex = 0;
        if (ts_query_cursor_next_capture(cursor, ref matchNative, out captureIndex))
        {
            match = new TSQueryMatch(matchNative);
            return match.match.capture_count > 0;
        }

        match = null;
        return false;
    }
}

Finally, we expose an idiomatic C# method on the class equivalent to the native function we intend to call in that method. You call the native function in your method, marshaling and unmarshaling the requests and responses so that any of the values being returned from the method are also valid C# objects. This knowledge should empower you to be able to read any of the bindings implemented in TreeSitter package. Moving forward in this article and series, I will directly use the C# bindings to work with the syntax trees without showing the underlying binding. I will cover the technicalities of the library C functions as and when needed.

4. Syntax Checking

4.1. Parsing

The most basic thing you can do using a parser for any language is to check if the provided text conforms to the grammar for that language. This is also referred to as syntax checking. The TreeSitter API is designed to work independently of the language it is working with. The core logic of walking and manipulating the syntax trees lives in the TreeSitter repository, while mapping TreeSitter concepts to constructs in any particular language is outsourced to parsers generated from grammars. Generated parsers provide an implementation of the TSLanguage struct, which the rest of the code in TreeSitter works with in a language-independent way.

NOTE: I've omitted some details in the snippets like error handling and extern declarations. Look at the source files in GitHub repository for complete implementations.

In our case, we are working with JSON so let's first create a language class for JSON.

public static class TSJsonLanguage
{
    // ... extern declarations

    public static IntPtr Language()
    {
        return tree_sitter_json();
    }
}

This is a simple wrapper over the C function tree_sitter_json() provided by the tree-sitter-json library to make it C#-like. The TreeSitter parser returns a TSTree of the parsed text that can be manipulated. So let's create a wrapper for that.

public class TSTree
{
    IntPtr tree;

    // ... extern declarations

    public TSTree(IntPtr tree)
    {
        this.tree = tree;
    }

    public TSNode RootNode()
    {
        return new TSNode(ts_tree_root_node(tree));
    }
}

Now a TSTree consists of a set of TSNode structs representing nodes in the concrete syntax tree of parsed text. Let's create a wrapper for that.

[StructLayout(LayoutKind.Sequential)]
internal struct TSNodeNative
{
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 4)]
    public uint[] context;
    public IntPtr id;
    public IntPtr tree;
}

We need a C# struct that mirrors the C struct which various TreeSitter functions take as state. C# allows methods to be added directly to this struct as well, but we don't want the users of our package to manipulate the state for unmanaged code directly, so we will put another class wrapper over this native struct and expose sensible and safe methods.

public class TSNode
{
    internal readonly TSNodeNative node;

    // ... extern declarations

    internal TSNode(TSNodeNative nativeNode)
    {
        node = nativeNode;
    }

    public OmniSharp.Extensions.LanguageServer.Protocol.Models.Range GetRange()
    {
        var start = ts_node_start_point(node);
        var end = ts_node_end_point(node);

        return new OmniSharp.Extensions.LanguageServer.Protocol.Models.Range(
            new OmniSharp.Extensions.LanguageServer.Protocol.Models.Position(
                (int)start.row,
                (int)start.column
            ),
            new OmniSharp.Extensions.LanguageServer.Protocol.Models.Position(
                (int)end.row,
                (int)end.column
            )
        );
    }
}

The TSNode wrapper simply wraps a TSNodeNative struct and provides a convenience method to convert the bounds of a TSNodeNative to Range LSP type. Finally, we are ready to define a wrapper for the parser.

public class TSParser
{
    private IntPtr parser;

    // ... extern declarations

    public TSParser(IntPtr language)
    {
        parser = ts_parser_new();
        bool success = ts_parser_set_language(parser, language);
    }

    public TSTree ParseString(string text)
    {
        var (nativeText, length) = Utils.GetUnmanagedUTF8String(text);
        return new TSTree(
            ts_parser_parse_string_encoding(
                parser,
                IntPtr.Zero,
                nativeText,
                length,
                TSInputEncoding.TSInputEncodingUTF8
            )
        );
    }
}

TSParser simply takes a pointer to a TSLanguage, creates an instance of native parser struct and sets the language on it. It exposes a method to parse a string using the language used to construct the parser and returns a TSTree instance, holding the syntax tree of the parsed text.
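
ParseString relies on Utils.GetUnmanagedUTF8String, which is not shown in this article. Below is a minimal sketch of what such a helper could look like, assuming it copies the string into unmanaged memory as UTF-8 and reports the length in bytes. NativeStrings and ToUnmanagedUtf8 are hypothetical names; the actual helper in the repository may differ.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical stand-in for Utils.GetUnmanagedUTF8String; the real helper may differ.
public static class NativeStrings
{
    // Copies the string into unmanaged memory as UTF-8 and returns the pointer
    // together with the length in bytes. The caller must release the pointer
    // with Marshal.FreeHGlobal once the native call has returned.
    public static (IntPtr Pointer, uint ByteLength) ToUnmanagedUtf8(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);

        // Allocate at least one byte so that an empty string still gets a usable pointer.
        IntPtr pointer = Marshal.AllocHGlobal(Math.Max(bytes.Length, 1));
        Marshal.Copy(bytes, 0, pointer, bytes.Length);

        return (pointer, (uint)bytes.Length);
    }
}
```

The length has to be the UTF-8 byte count rather than text.Length, because a C# string counts UTF-16 code units and any non-ASCII character would make the two numbers disagree.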

4.2. Finding Errors

Whenever TreeSitter runs into text that is faulty as per the definition of the grammar, it flags that position and the surrounding faulty area by adding an (ERROR) node to the tree. We can find these error nodes and give their bounds to the language client to highlight syntax errors in the editor. To find a node with any particular structure, we need to walk the tree. Let's create some classes and bindings for tree traversal. TreeSitter uses a query language that describes the structure to match in Lisp notation. A query is executed on a tree through a mutable cursor struct that stores the relevant state for that search. You can incrementally advance this cursor to iterate through all matches. A query can have zero or more captures. A capture, much like a capture group in a regular expression, binds a name to a matched node so that we can refer to it later. With that terminology out of the way, let's create bindings for a query match. This is what we will get from the cursor as it walks the tree finding nodes which match the query pattern.

[StructLayout(LayoutKind.Sequential)]
internal struct TSQueryMatchNative
{
    public uint id;
    public ushort pattern_index;
    public ushort capture_count;
    public IntPtr captures; // Pointer to TSQueryCapture array
}

[StructLayout(LayoutKind.Sequential)]
internal struct TSQueryCaptureNative
{
    public TSNodeNative node;
    public uint index;
}

public class TSQueryMatch
{
    internal readonly TSQueryMatchNative match;

    internal TSQueryMatch(IntPtr queryMatchPtr)
    {
        match = Marshal.PtrToStructure<TSQueryMatchNative>(queryMatchPtr);
    }

    internal TSQueryMatch(TSQueryMatchNative nativeMatch)
    {
        match = nativeMatch;
    }

    public ICollection<TSNode> Captures()
    {
        var capturesList = new List<TSNode>();
        var count = match.capture_count;
        var capturesPtr = match.captures;

        int size = Marshal.SizeOf<TSQueryCaptureNative>();
        for (int i = 0; i < count; i++)
        {
            var capturePtr = IntPtr.Add(capturesPtr, i * size);
            var nativeCapture = Marshal.PtrToStructure<TSQueryCaptureNative>(capturePtr);
            capturesList.Add(new TSNode(nativeCapture.node));
        }

        return capturesList;
    }
}

The code above defines a couple of internal C# structs that correspond to C structs from the library. TSQueryMatchNative describes a match struct and TSQueryCaptureNative describes a particular capture within the query match. We expose a straightforward Captures() method from the C# wrapper which returns a collection of nodes, one corresponding to each capture.
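
The pointer arithmetic in Captures() can be tried out without TreeSitter at all. Below is a minimal sketch that writes a few structs into one block of unmanaged memory, the way a C library would hand us an array, and reads them back with the same IntPtr.Add and Marshal.PtrToStructure calls. Capture, NativeArrays, ReadArray, and WriteArray are hypothetical names for illustration only and are not part of the Armls bindings.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Illustrative struct; it stands in for TSQueryCaptureNative.
[StructLayout(LayoutKind.Sequential)]
public struct Capture
{
    public IntPtr Node;
    public uint Index;
}

public static class NativeArrays
{
    // Reads `count` consecutive structs starting at `first`.
    public static List<T> ReadArray<T>(IntPtr first, int count) where T : struct
    {
        var items = new List<T>(count);
        int size = Marshal.SizeOf<T>(); // marshaled size, used as the stride between elements
        for (int i = 0; i < count; i++)
        {
            items.Add(Marshal.PtrToStructure<T>(IntPtr.Add(first, i * size)));
        }

        return items;
    }

    // Simulates the native side: writes the structs back to back into one unmanaged block.
    // The caller must release the block with Marshal.FreeHGlobal.
    public static IntPtr WriteArray<T>(IReadOnlyList<T> items) where T : struct
    {
        int size = Marshal.SizeOf<T>();
        IntPtr block = Marshal.AllocHGlobal(Math.Max(size * items.Count, 1));
        for (int i = 0; i < items.Count; i++)
        {
            Marshal.StructureToPtr(items[i], IntPtr.Add(block, i * size), false);
        }

        return block;
    }
}
```

The important detail is that the stride comes from Marshal.SizeOf rather than from adding up field sizes by hand, since the marshaled layout can include padding after the last field.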

NOTE: Returning an ICollection from Captures() is not ideal. ICollection is not indexable. This makes it impossible to find out which item in the collection matches which capture. IList would have been better. It works fine for now because we are capturing a single item in our query but we will have to update it in the future if we search for a query with multiple captures.

Definition of TSQueryMatch makes it very straightforward to define a TSQueryCursor which just iterates over the native cursor objects and returns the matches.

public class TSQueryCursor
{
    internal readonly IntPtr cursor;

    // ... extern declarations

    internal TSQueryCursor(IntPtr cursor)
    {
        this.cursor = cursor;
    }

    public bool Next(out TSQueryMatch? match)
    {
        TSQueryMatchNative matchNative = new();
        uint captureIndex = 0;
        if (ts_query_cursor_next_capture(cursor, ref matchNative, out captureIndex))
        {
            match = new TSQueryMatch(matchNative);
            return match.match.capture_count > 0;
        }

        match = null;
        return false;
    }
}

We expose a method Next which iterates over the matches, binds a match to the out parameter to be consumed and returns false when it runs out of matches. This method makes it very convenient to consume results in a while loop. Finally we can define TSQuery which simply takes a query and a language and executes the query on a node, returning the cursor for iteration over matches.

public class TSQuery
{
    private IntPtr query;

    // ... extern declarations

    public TSQuery(string queryString, IntPtr language)
    {
        var (nativeQuery, length) = Utils.GetUnmanagedUTF8String(queryString);
        uint errorOffset;
        int errorType;
        query = ts_query_new(language, nativeQuery, length, out errorOffset, out errorType);
    }

    public TSQueryCursor Execute(TSNode node)
    {
        var cursor = new TSQueryCursor(ts_query_cursor_new());
        ts_query_cursor_exec(cursor.cursor, query, node.node);
        return cursor;
    }
}
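
The Next method shown earlier follows the "return bool, hand the value back through out" shape, which works well in a while loop. If you prefer foreach or LINQ, that shape can be adapted into an IEnumerable with a small iterator. This is only a sketch of an alternative design, not something the Armls bindings do; TryNext, CursorSequences, and Drain are hypothetical names.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical delegate matching the shape of TSQueryCursor.Next.
public delegate bool TryNext<T>(out T? item) where T : class;

public static class CursorSequences
{
    // Calls `next` until it returns false and yields every non-null item it produced.
    public static IEnumerable<T> Drain<T>(TryNext<T> next) where T : class
    {
        while (next(out T? item))
        {
            if (item is not null)
            {
                yield return item;
            }
        }
    }
}
```

With this in place, a loop over matches could be written as foreach (var match in CursorSequences.Drain<TSQueryMatch>(cursor.Next)), at the cost of one extra delegate allocation per query.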

4.3. Putting it all together

Now that we know how to parse text files into syntax trees and how to walk a syntax tree to find nodes matching a query, we can

- Parse an ARM template
- Query for errors
- Walk the tree with a cursor
- Publish the locations of (ERROR) nodes to the language client for highlighting in the editor

4.3.1. `Analyzer`

Let's create an Analyzer class that will analyze our buffers. Syntax checking is only one kind of analysis that we can do on a buffer. We can add more and more analyses like looking for missing variables, looking for missing resources, providing linting warnings and highlighting best practices, etc. All of it can be added to the analyzer, which will return a collection of diagnostics that the editor can highlight, simplifying our sync handler.

public class Analyzer
{
    private readonly TSQuery errorQuery;

    public Analyzer(TSQuery errorQuery)
    {
        this.errorQuery = errorQuery;
    }

    public IDictionary<DocumentUri, IEnumerable<Diagnostic>> Analyze(
        IReadOnlyDictionary<DocumentUri, Buffer.Buffer> buffers
    )
    {
        return buffers
            .Select(kvp => new KeyValuePair<DocumentUri, IEnumerable<Diagnostic>>(
                kvp.Key,
                AnalyzeBuffer(kvp.Value)
            ))
            .ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
    }

    private IEnumerable<Diagnostic> AnalyzeBuffer(Buffer.Buffer buf)
    {
        IEnumerable<Diagnostic> diagnostics = new List<Diagnostic>();
        var cursor = errorQuery.Execute(buf.ConcreteTree.RootNode());
        while (cursor.Next(out TSQueryMatch? match))
        {
            diagnostics = match!
                .Captures()
                .Select(n => new Diagnostic()
                {
                    Range = n.GetRange(),
                    Severity = DiagnosticSeverity.Error,
                    Source = "armls",
                    Message = "Syntax error",
                })
                .Concat(diagnostics);
        }

        return diagnostics;
    }
}

Analyzer simply takes a dictionary of buffers and analyzes them. Corresponding to each buffer, it produces a collection of diagnostics.

4.3.2. Sync Handler

In our TextDocumentSyncHandler, which receives updates to all text files, we will have an instance of a TSParser to re-parse the files as they change and an instance of Analyzer to analyze the changed files.

public class TextDocumentSyncHandler : TextDocumentSyncHandlerBase
{
    private readonly BufferManager bufManager;
    private readonly ILanguageServerFacade languageServer;
    private readonly TSParser parser;            // newly added
    private readonly Analyzer.Analyzer analyzer; // newly added

    public TextDocumentSyncHandler(BufferManager manager,
                                   ILanguageServerFacade languageServer)
    {
        bufManager = manager;
        parser = new TSParser(TSJsonLanguage.Language());  // initialize with JSON language
        this.languageServer = languageServer;

        // initialize with an error query
        analyzer = new Analyzer.Analyzer(new TSQuery(@"(ERROR) @error",
            TSJsonLanguage.Language())); 
    }

    // ...
}

@error next to our (ERROR) node in a TSQuery tells TreeSitter to put the information about (ERROR) node in that capture. Next we define a utility method to analyze buffers.

public class TextDocumentSyncHandler : TextDocumentSyncHandlerBase
{
    // ...
    private void AnalyzeWorkspace()
    {
        var diagnostics = analyzer.Analyze(bufManager.GetBuffers());

        foreach (var buf in diagnostics)
        {
            languageServer.SendNotification(
                new PublishDiagnosticsParams() { Uri = buf.Key,
                    Diagnostics = buf.Value.ToList() }
            );
        }
    }

    // ...
}

Note that we need to publish the diagnostics for each file independently to the client. It would have been sufficient to analyze only the file that was opened or changed and publish its diagnostics, but we are analyzing all the buffers here. This is fine as long as the performance is acceptable. Finally, we call this method whenever a new file is opened in the editor or an already open file changes.

public class TextDocumentSyncHandler : TextDocumentSyncHandlerBase
{
    // ...

    public override Task<Unit> Handle(
        DidOpenTextDocumentParams request,
        CancellationToken cancellationToken
    )
    {
        bufManager.Add(request.TextDocument.Uri,
            CreateBuffer(request.TextDocument.Text));

        AnalyzeWorkspace();  // Analyze

        return Unit.Task;
    }

    public override Task<Unit> Handle(
        DidChangeTextDocumentParams request,
        CancellationToken cancellationToken
    )
    {
        var text = request.ContentChanges.FirstOrDefault()?.Text;
        if (text is not null)
        {
            bufManager.Add(request.TextDocument.Uri, CreateBuffer(text));
            AnalyzeWorkspace();  // Analyze
        }

        return Unit.Task;
    }

    // ...
}

Running this in VSCode will look like this. I have removed the comma on line 11. You can see that the first error is coming from armls, with VSCode also running its own analysis and reporting the errors.

5. Conclusion

In this relatively long post, we learned many new concepts. In summary, we learned how to

- create C# bindings for a C library
- parse a block of text using a TreeSitter parser
- query a syntax tree and process the matches
- publish diagnostics to the editor/language client

Querying (ERROR) nodes from a tree does not cover all types of errors. More specifically, error nodes do not highlight locations where the parser was able to recover from the failure by adding a missing token. Those locations are marked with a (MISSING) node in the tree as described here. It would be a good exercise to add support for missing nodes to this project. As I showed, it becomes very easy to work with the syntax trees once you have the TreeSitter bindings in place. In future posts, we will exercise our newfound power to walk the trees and extract information about nodes to implement richer editing experiences.

Implementing a Server with Language Server Protocol (Part 1)

samvidmistry — Tue, 20 May 2025 00:00:00 +0000

1. Introduction

Language Server Protocol (LSP) is the de facto standard for providing a rich editor experience these days. Given its popularity, surprisingly little content can be found on the internet about how to implement your own language server from scratch. Even when the material does exist, it either only talks about the specification itself with no implementation, or it implements a very basic server that provides almost no useful functionality. This series of posts is my attempt to fill this void. In it, I will implement a simple language server for Azure Resource Manager (ARM) templates, and I will try to use as many LSP features as make sense for this case. This first post sets some context about what we will be doing and what technologies we will be using. The full source code of this server is available in this repository.

2. Brief Introduction to Language Server Protocol (LSP)

Roughly, LSP is a protocol to mediate the communication between an editor and a language server, over JSON-RPC. A language server implements the language server protocol to provide a rich editing experience for some set of related files, such as a C# project or a Ruby project. By implementing the language server protocol, a single language server can provide rich editing to all editors that support LSP, and an editor that supports LSP can work with all language servers.

LSP describes the set of messages clients and servers are supposed to exchange, along with the data required to be in those messages, to provide various features. LSP supports a wide range of features, some of which (in layman's terms) are:

Hovering on symbols
Providing auto-complete suggestions
Jumping to definition
Finding references
… and many more

2.1. An Exchange between Client and Server

Let's see an example of an exchange between a client and a server to implement the Hover functionality.

In the figure above, we can see an interaction between the language client and the language server. As the user moves their cursor over a variable named noOfCols, the language client constructs a HoverParams struct, filling in where exactly in the file the cursor is pointing. It then calls the method textDocument/hover on the language server through JSON-RPC, passing the HoverParams struct as the argument. The language server takes that position and maps it to the token that is currently under the cursor. It then looks up the information about that token and sends back a HoverResult struct containing the hover information, such as the type of the variable and the documentation about what that variable refers to. The language client can display this information however it is configured to, for example as a tooltip or in the echo area. This is more or less how all features in LSP work.
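To give a feel for what travels over the wire in such an exchange, here is a minimal sketch that builds a hover request by hand with System.Text.Json. The class HoverExchangeSketch and its methods are hypothetical names of my own, and the request id, URI, and position are made-up values. The method name textDocument/hover, the shape of the params, and the Content-Length framing follow the LSP specification.

```csharp
using System.Text;
using System.Text.Json;

// Hypothetical helper for illustration; an SDK normally does all of this for you.
public static class HoverExchangeSketch
{
    // Builds the JSON-RPC body of a hover request.
    // Positions in LSP are zero-based (line, character) pairs.
    public static string BuildHoverRequest(int id, string uri, int line, int character)
    {
        var request = new
        {
            jsonrpc = "2.0",
            id, // requests carry an id so the response can be matched to them
            method = "textDocument/hover",
            @params = new
            {
                textDocument = new { uri },
                position = new { line, character },
            },
        };
        return JsonSerializer.Serialize(request);
    }

    // Frames a JSON body the way the LSP base protocol expects:
    // a Content-Length header with the byte count, a blank line, then the body.
    public static string Frame(string json)
    {
        int byteCount = Encoding.UTF8.GetByteCount(json);
        return $"Content-Length: {byteCount}\r\n\r\n{json}";
    }
}
```

For example, HoverExchangeSketch.Frame(HoverExchangeSketch.BuildHoverRequest(1, "file:///tmp/example.cs", 10, 4)) produces the text a client would write to the server's standard input for a hover at line 10, character 4. The server's reply is framed the same way and carries the same id.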

In most cases, you will not have to implement the specifics of the protocol yourself. The official LSP website contains a list of SDKs for a variety of languages here. You can use the SDK for your preferred language and let it handle the communication with the language client. The library will expose clean functions/bindings for all the functionality offered by LSP, which you implement to provide that functionality for your particular technology, in this case ARM templates. I will be using C# and the LSP library from the OmniSharp project to implement the language server. In order to follow along with this post, you can clone the Armls repository and check out the commit with ID b794cd3. The armls directory contains the code for the language server, while armls-ext contains the code and VSIX for a VS Code extension that can utilize Armls. In order to use the extension, you will have to install the VSIX as explained here and set the path to the compiled armls binary.

3. Bird's Eye View of Armls

Armls primarily comprises three components:

C#-LSP library to handle interactions with language client
TreeSitter to parse ARM template JSON
Domain knowledge of ARM templates to implement LSP functionalities

The choice of language here is somewhat arbitrary and mostly dependent on what you are comfortable with. LSP SDKs are available for a large number of languages and they all provide the same functionality, idiomatic to the patterns in that language. You are relatively constrained in the choice of a parser though. It is critical that whatever parser you use (or write yourself) is fault tolerant. As the user writes code and modifies the file, the syntax tree is bound to have errors in it. Your parser must be comfortable parsing faulty and incomplete code to be able to highlight errors. Lastly, you will need domain knowledge of the technology you are providing a rich editing experience for.

4. C# Project

Start by creating a blank C# project for Armls by opening your terminal and running something like:

dotnet new console --name armls

Edit armls.csproj to add dependencies for the C#-LSP library and Microsoft's popular dependency injection library, which we will use to cleanly inject dependencies into our handlers.

5. Minimum Viable Server

Add the following to your Program.cs:

public static void Main()
{
    MainAsync().Wait();
}

private static async Task MainAsync()
{
    var server = await LanguageServer.From(options =>
            options
                .WithInput(Console.OpenStandardInput())
                .WithOutput(Console.OpenStandardOutput())
    );

    await server.WaitForExit;
}

This code simply creates a language server using the APIs from C#-LSP and connects the standard input and output of the console application as the IO streams for the language server. Believe it or not, you just created your own language server. This is all it takes to create a language server that does absolutely nothing. C#-LSP exposes various base classes for handlers that provide functionality for rich editing. There's TextDocumentSyncHandlerBase for handling the file change notifications coming from the language client. There's CompletionHandlerBase to provide completion candidates. And so on.

6. Managing Buffers

A buffer, for this discussion, roughly refers to a file, either open in the editor or on the file system. Language servers cannot rely only on the files on the file system because the servers need to provide diagnostics, like errors and warnings, for changes that haven't been saved yet. Hence LSP clients convey all changes made to a file to the server. We need to cache these changes in the server to be able to run analysis on them. To that end, we will create a class called BufferManager that is responsible for holding the latest state of all buffers.

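using System.Collections.Concurrent;
using System.Collections.Generic;
using OmniSharp.Extensions.LanguageServer.Protocol;

public class BufferManager
{
    // Maps a file system path to the latest known state of that buffer.
    private readonly IDictionary<string, Buffer> buffers;

    public BufferManager()
    {
        // Concurrent, since handlers may be invoked from different threads.
        buffers = new ConcurrentDictionary<string, Buffer>();
    }

    public void Add(DocumentUri uri, Buffer buf)
    {
        Add(uri.GetFileSystemPath(), buf);
    }

    public void Add(string path, Buffer buf)
    {
        // Adds a new buffer or overwrites the previous state for this path.
        buffers[path] = buf;
    }

    public IReadOnlyDictionary<string, Buffer> GetBuffers()
    {
        return buffers.AsReadOnly();
    }
}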

BufferManager contains a simple dictionary that maps a path to an instance of the Buffer class. Buffer is a very simple class that just has the text of a buffer, for now. Over time, it will grow to cache all information related to a buffer, like the concrete syntax tree of the parsed text.

public class Buffer
{
    public string Text;

    public Buffer(string text)
    {
        Text = text;
    }
}

7. Text Document Synchronization

In order to sync with all the text changes happening inside the editor, we need to provide an implementation of ITextDocumentSyncHandler. The interface provides various callbacks received from the editor about what the user is doing.

public interface ITextDocumentSyncHandler
{
    public abstract TextDocumentAttributes GetTextDocumentAttributes(DocumentUri uri);
    public abstract Task Handle(DidOpenTextDocumentParams request, CancellationToken cancellationToken);
    public abstract Task Handle(DidChangeTextDocumentParams request, CancellationToken cancellationToken);
    public abstract Task Handle(DidSaveTextDocumentParams request, CancellationToken cancellationToken);
    public abstract Task Handle(DidCloseTextDocumentParams request, CancellationToken cancellationToken);
}

You can extend a base implementation provided by C#-LSP which handles some boilerplate, named TextDocumentSyncHandlerBase.

public class TextDocumentSyncHandler : TextDocumentSyncHandlerBase
{
    private readonly BufferManager bufManager;
    private readonly ILanguageServerFacade languageServer;

    public TextDocumentSyncHandler(BufferManager manager,
                                   ILanguageServerFacade languageServer)
    {
        bufManager = manager;
        this.languageServer = languageServer;
    }

    // ...
}

To start with, the sync handler will need access to the BufferManager to cache all the changes we will receive from the language client. We will also get an instance of ILanguageServerFacade which, among other things, is the interface to communicate with the language client.

public class TextDocumentSyncHandler : TextDocumentSyncHandlerBase
{
    // ...
    public override TextDocumentAttributes GetTextDocumentAttributes(DocumentUri uri)
    {
        // Language ID of json and jsonc are just their names
        // which are also the extensions of the files.
        return new TextDocumentAttributes(uri, Path.GetExtension(uri.Path));
    }

    protected override TextDocumentSyncRegistrationOptions CreateRegistrationOptions(
        TextSynchronizationCapability capability,
        ClientCapabilities clientCapabilities
    )
    {
        return new TextDocumentSyncRegistrationOptions(TextDocumentSyncKind.Full);
    }

    private Buffer.Buffer CreateBuffer(string text)
    {
        return new Buffer(text);
    }
    // ...
}

We then implement GetTextDocumentAttributes which is supposed to provide some information about the file. We just provide the URI to the document as well as the language ID. We override CreateRegistrationOptions where we note that we want to get the full content of the file with every change, instead of just getting the changed region. We also create a utility method to create an instance of Buffer from the given text of the file.
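To see why full synchronization keeps the server simple, it helps to look at what the alternative would require. The following is a minimal sketch, not part of Armls, of how a server would have to patch its cached text if it asked for only the changed region. The class IncrementalSyncSketch and its methods are hypothetical, and the sketch assumes '\n' line endings and zero-based (line, character) positions.

```csharp
using System;

// Hypothetical helper showing the bookkeeping that TextDocumentSyncKind.Full avoids.
public static class IncrementalSyncSketch
{
    // Converts a zero-based (line, character) position into an offset into text.
    // Assumes '\n' line endings.
    public static int ToOffset(string text, int line, int character)
    {
        int offset = 0;
        for (int i = 0; i < line; i++)
        {
            int newline = text.IndexOf('\n', offset);
            if (newline < 0) throw new ArgumentOutOfRangeException(nameof(line));
            offset = newline + 1; // first character of the next line
        }
        return offset + character;
    }

    // Replaces the text between start (inclusive) and end (exclusive) with newText,
    // which is what applying a single incremental change amounts to.
    public static string ApplyChange(
        string text,
        (int Line, int Character) start,
        (int Line, int Character) end,
        string newText)
    {
        int from = ToOffset(text, start.Line, start.Character);
        int to = ToOffset(text, end.Line, end.Character);
        return text.Substring(0, from) + newText + text.Substring(to);
    }
}
```

With full synchronization none of this is needed: every change notification carries the whole document, so the server can simply replace the cached buffer. The trade-off is more data per notification, which is usually acceptable for files the size of a typical ARM template.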

public class TextDocumentSyncHandler : TextDocumentSyncHandlerBase
{
    // ...
    public override Task Handle(
        DidOpenTextDocumentParams request,
        CancellationToken cancellationToken
    )
    {
        bufManager.Add(request.TextDocument.Uri, CreateBuffer(request.TextDocument.Text));
        return Unit.Task;
    }

    public override Task Handle(
        DidChangeTextDocumentParams request,
        CancellationToken cancellationToken
    )
    {
        var text = request.ContentChanges.FirstOrDefault()?.Text;
        if (text is not null)
        {
            bufManager.Add(request.TextDocument.Uri, CreateBuffer(text));
        }
        return Unit.Task;
    }
    // ...
}

We then override the callbacks we get from the language client whenever a new document is opened (DidOpenTextDocumentParams) and whenever an open document is changed (DidChangeTextDocumentParams). In both cases, we simply get the latest content of the file and cache it in our BufferManager to be analyzed. Because we registered for full synchronization, the first content change already carries the entire text of the document. We don't need to do anything on document save and document close, so we leave those handlers out here.

8. Activating the Handler

Finally, we need to add the handler to the language server for the server to utilize it. We do it by registering it during the construction of the language server.

var server = await LanguageServer.From(options =>
            options
                .WithInput(Console.OpenStandardInput())
                .WithOutput(Console.OpenStandardOutput())
                .WithServices(s => s.AddSingleton(new BufferManager()))
                .WithHandler<TextDocumentSyncHandler>()
);

9. Conclusion

At this point, you have created a basic language server that is able to receive all text changes from the editor and cache them for analysis. That's all for this post. We will cover how to parse and analyze the text we just stored in our BufferManager in the next post.
