
Matěj Štágl

Legal Research is Changing: How AI Makes Your C# Applications Smarter


Last weekend, I found myself curious about how legal professionals could leverage AI to cut through mountains of case law and legal documents. After spending some time building a legal research assistant proof-of-concept, I realized we're at an interesting inflection point. AI is fundamentally transforming how legal research happens, and as developers, we have the tools to build these solutions ourselves.

The Shift From "Should We?" to "How Do We?"

Here's what struck me: in 2025, firms aren't asking whether to use AI anymore. They're asking how to integrate it effectively.

Tools like Microsoft Copilot, Casetext, and Lexis+ have proven that AI can:

  • Automate document reviews in minutes instead of days
  • Suggest relevant case law based on context, not just keywords
  • Perform compliance monitoring with impressive accuracy
  • Predict legal outcomes using historical data

But these are often black-box solutions with limited customization. What if you need something tailored to your firm's specific workflows? That's where building your own integration in C# becomes compelling.

Getting Started: Your First AI Legal Research Assistant

Before diving into code, let's install what we need:

dotnet add package LlmTornado
dotnet add package LlmTornado.Agents

Here's a basic legal research agent I threw together to demonstrate the concept:

using System;
using System.Linq;
using System.Threading.Tasks;
using LlmTornado;
using LlmTornado.Agents;
using LlmTornado.Chat;
using LlmTornado.Chat.Models;

// Initialize the API client
var api = new TornadoApi("your-api-key");

// Create a specialized legal research agent
var legalAgent = new TornadoAgent(
    client: api,
    model: ChatModel.OpenAi.Gpt4.Turbo,
    name: "LegalResearcher",
    instructions: @"You are a legal research assistant specializing in case law analysis.
        When analyzing cases, always:
        1. Identify relevant precedents
        2. Extract key legal principles
        3. Note jurisdictional considerations
        4. Cite sources accurately
        Provide detailed, professional responses suitable for legal professionals."
);

// Run a research query
var result = await legalAgent.Run(
    "Find cases related to software patent infringement where APIs were involved"
);

Console.WriteLine(result.Messages.Last().Content);

This creates an agent with legal-domain expertise baked into its instructions. The agent understands context and maintains conversation history, which is crucial when you're iterating through research questions.

Real-World Pattern: Document Analysis with Streaming

When I started working with legal documents, I quickly learned they're often massive: hundreds of pages of discovery, contracts, or case files.

Users want to see progress, not stare at a spinner for minutes. Tools like Lexis+ have shown that streaming responses dramatically improve user experience.

Here's how I implemented streaming document analysis:

using System;
using System.IO;
using System.Threading.Tasks;
using LlmTornado;
using LlmTornado.Agents;
using LlmTornado.Chat;
using LlmTornado.Chat.Models;
using LlmTornado.Agents.DataModels;

var api = new TornadoApi("your-api-key");

var documentAnalyzer = new TornadoAgent(
    client: api,
    model: ChatModel.OpenAi.Gpt4.Turbo,
    name: "DocumentAnalyzer",
    instructions: "Analyze legal documents for key clauses, risks, and compliance issues.",
    streaming: true
);

// Streaming callback to display results as they arrive
async ValueTask StreamHandler(AgentRunnerEvents runEvent)
{
    if (runEvent is AgentRunnerStreamingEvent streamingEvent)
    {
        if (streamingEvent.ModelStreamingEvent is ModelStreamingOutputTextDeltaEvent deltaEvent)
        {
            Console.Write(deltaEvent.DeltaText);
        }
    }
}

string contractText = await File.ReadAllTextAsync("employment_contract.txt");

Console.Write("Analysis: ");
var analysis = await documentAnalyzer.Run(
    $"Analyze this employment contract for potential liability issues:\n\n{contractText}",
    streaming: true,
    onAgentRunnerEvent: StreamHandler
);

The streaming approach writes tokens as they're generated, giving users immediate feedback. When analyzing complex documents, the interface feels responsive rather than frozen.

Semantic Search Over Case Law

One pattern I found especially powerful is combining embeddings with vector databases for semantic search.

According to legal tech analysis, AI-powered legal research tools can analyze millions of documents and surface the most relevant information with impressive accuracy. This is exactly what semantic search enables.

What's semantic search? Instead of matching keywords, it understands meaning. When you search for "API copyright cases," it finds relevant precedents even if they use different terminology like "application programming interface intellectual property disputes."
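Under the hood, this works by comparing embedding vectors with a distance metric like cosine similarity. Here's a toy sketch of the math (the 3-dimensional vectors are made up purely for illustration; real embeddings come from a model and have around 1,536 dimensions):

```csharp
using System;

// Made-up toy vectors standing in for real embeddings
double[] queryVec = { 0.9, 0.1, 0.3 };  // "API copyright cases"
double[] caseVec  = { 0.8, 0.2, 0.4 };  // "application programming interface IP disputes"
double[] otherVec = { 0.1, 0.9, 0.2 };  // "maritime salvage law"

// Cosine similarity: dot product of the vectors divided by
// the product of their magnitudes; close to 1 means similar meaning
static double CosineSimilarity(double[] a, double[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot   += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

Console.WriteLine($"Different wording, same meaning: {CosineSimilarity(queryVec, caseVec):F3}");
Console.WriteLine($"Unrelated topic:                 {CosineSimilarity(queryVec, otherVec):F3}");
```

The differently-worded but related texts score close to 1, while the unrelated topic scores much lower. A vector database simply does this comparison at scale, over millions of stored embeddings.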

Here's a practical implementation using a vector database for case law retrieval:

using System;
using System.Linq;
using System.Threading.Tasks;
using LlmTornado;
using LlmTornado.Embedding;
using LlmTornado.Embedding.Models;
using LlmTornado.VectorDatabases;
using LlmTornado.VectorDatabases.Faiss.Integrations;

var api = new TornadoApi("your-api-key");

// Initialize a local FAISS vector database backed by an index directory
// (good for proof-of-concepts; no external service required)
var vectorDb = new FaissVectorDatabase(
    indexDirectory: "./legal_cases_index",
    vectorDimension: 1536
);

await vectorDb.InitializeCollection("case_law");

// Index some case summaries
var caseSummaries = new[]
{
    "Smith v. Jones (2023): Court ruled APIs are copyrightable as creative works.",
    "Tech Corp v. StartupCo (2022): Fair use defense applied to API reimplementation.",
    "Oracle v. Google (2021): Supreme Court found Google's use of Java APIs was fair use."
};

// Create embeddings and store them
for (int i = 0; i < caseSummaries.Length; i++)
{
    var embeddingResult = await api.Embeddings.CreateEmbedding(
        EmbeddingModel.OpenAi.Gen3.Small,
        caseSummaries[i]
    );

    var embedding = embeddingResult?.Data.FirstOrDefault()?.Embedding;

    if (embedding != null)
    {
        await vectorDb.AddDocumentsAsync(new[]
        {
            new VectorDocument(
                id: $"case_{i}",
                content: caseSummaries[i],
                embedding: embedding
            )
        });
    }
}

// Semantic search query
string query = "case law about software copyright and fair use";
var queryEmbedding = await api.Embeddings.CreateEmbedding(
    EmbeddingModel.OpenAi.Gen3.Small,
    query
);

var results = await vectorDb.QueryByEmbeddingAsync(
    embedding: queryEmbedding.Data.First().Embedding,
    topK: 3,
    includeScore: true
);

Console.WriteLine("Most relevant cases:");
foreach (var doc in results)
{
    Console.WriteLine($"- {doc.Content} (Relevance: {doc.Score:F4})");
}

This isn't just keyword matching—it understands conceptual similarity. A paralegal searching for "API copyright cases" gets relevant precedents even when they use different words.

Adding Tools for Research Workflows


What really sold me on this approach was seeing how easily you could add domain-specific tools.

The 2025 legal tech reports emphasize that successful AI integration depends on how well it connects with existing workflows.

Here's an agent with tools for case citation verification and statute lookup:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Linq;
using System.Threading.Tasks;
using LlmTornado;
using LlmTornado.Agents;
using LlmTornado.Chat.Models;

var api = new TornadoApi("your-api-key");

// Define research tools as delegates
[Description("Verify if a case citation is valid and retrieve its details")]
string VerifyCitation(
    [Description("The case citation to verify, e.g., '123 F.3d 456'")] string citation)
{
    // In production, this would call a legal database API
    return $"Citation {citation} is valid. Case: Smith v. Jones (2023)";
}

[Description("Look up the full text of a statute by code")]
string LookupStatute(
    [Description("The statute code, e.g., '17 U.S.C. § 102'")] string statuteCode)
{
    // In production, this would query a statute database;
    // this stub just echoes the requested code
    return $"{statuteCode}: Subject matter of copyright: In general";
}

// Create agent with tools
var researchAgent = new TornadoAgent(
    client: api,
    model: ChatModel.OpenAi.Gpt4.Turbo,
    name: "ResearchAgent",
    instructions: "You are a legal research assistant. Use the provided tools to verify citations and look up statutes.",
    tools: new List<Delegate> { VerifyCitation, LookupStatute }
);

var response = await researchAgent.Run(
    "Verify the citation 123 F.3d 456 and look up 17 U.S.C. § 102"
);

Console.WriteLine(response.Messages.Last().Content);

The agent automatically decides when to call these tools based on the query. It's surprisingly intelligent about understanding when it needs additional information versus when it can answer directly.
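Conceptually, the runner behind this is a loop: send the conversation to the model, and if the model replies with a tool-call request instead of text, execute the matching delegate, append its result, and ask again. Here's a simplified sketch of that loop (hypothetical tuples stand in for model responses; this is not LlmTornado's actual internals):

```csharp
using System;
using System.Collections.Generic;

// Tool delegates, keyed by name (mirroring the delegates above)
var tools = new Dictionary<string, Func<string, string>>
{
    ["VerifyCitation"] = c => $"Citation {c} is valid.",
    ["LookupStatute"]  = s => $"{s}: Subject matter of copyright."
};

// Stand-in for the model: first turn requests a tool, second turn answers.
// (Text, ToolName, ToolArg) tuples play the role of model responses.
var scripted = new Queue<(string? Text, string? ToolName, string? ToolArg)>();
scripted.Enqueue((null, "VerifyCitation", "123 F.3d 456"));
scripted.Enqueue(("The citation checks out.", null, null));

var transcript = new List<string> { "user: Verify 123 F.3d 456" };
while (true)
{
    var (text, toolName, toolArg) = scripted.Dequeue();  // "call the model"
    if (toolName is null)
    {
        Console.WriteLine(text);  // plain text: the model answered, loop ends
        break;
    }
    // Tool call requested: run the delegate, feed the result back, ask again
    transcript.Add($"tool({toolName}): {tools[toolName](toolArg!)}");
}
```

The [Description] attributes on your delegates and parameters are what the model reads when deciding which tool, if any, fits the query, so it's worth writing them carefully.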

Structured Output for Contract Analysis

One challenge I ran into: legal workflows often need specific output formats for downstream systems.

You might need:

  • JSON for your case management software
  • Structured data for compliance reporting
  • Consistent formats for integration with existing tools

Here's how I solved that with structured outputs:

using System;
using System.ComponentModel;
using System.Linq;
using System.Threading.Tasks;
using LlmTornado;
using LlmTornado.Agents;
using LlmTornado.Chat;
using LlmTornado.Chat.Models;
using LlmTornado.Code;

// Define the output schema
[Description("Analysis of a legal contract")]
public struct ContractAnalysis
{
    [Description("Overall risk level: Low, Medium, or High")]
    public string RiskLevel { get; set; }

    [Description("List of identified risk factors")]
    public ContractRisk[] Risks { get; set; }

    [Description("Recommended actions to mitigate risks")]
    public string[] Recommendations { get; set; }
}

[Description("A specific risk identified in the contract")]
public struct ContractRisk
{
    [Description("Description of the risk")]
    public string Description { get; set; }

    [Description("Severity: Low, Medium, or High")]
    public string Severity { get; set; }

    [Description("Clause or section where risk was found")]
    public string Location { get; set; }
}

var api = new TornadoApi("your-api-key");

var contractAgent = new TornadoAgent(
    client: api,
    model: ChatModel.OpenAi.Gpt4.Turbo,
    instructions: "Analyze contracts for legal risks and compliance issues.",
    outputSchema: typeof(ContractAnalysis)
);

string contract = "Sample NDA with unlimited liability clause...";

var result = await contractAgent.Run(
    $"Analyze this contract:\n\n{contract}"
);

// Parse the structured output
var analysis = result.Messages.Last().Content.JsonDecode<ContractAnalysis>();

Console.WriteLine($"Risk Level: {analysis.RiskLevel}");
Console.WriteLine("\nIdentified Risks:");
foreach (var risk in analysis.Risks)
{
    Console.WriteLine($"- [{risk.Severity}] {risk.Description}");
    Console.WriteLine($"  Location: {risk.Location}");
}

The agent now returns perfectly structured JSON matching your schema. This makes integration with existing legal software much cleaner—no more parsing freeform text responses.
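If you'd rather not depend on the library's JsonDecode helper, the same payload can be parsed with plain System.Text.Json. A sketch, using a hand-written example payload shaped like the schema above (the structs are repeated so the snippet stands alone):

```csharp
using System;
using System.Text.Json;

// Example payload shaped like the agent's structured output
string json = """
{
  "RiskLevel": "High",
  "Risks": [
    { "Description": "Unlimited liability clause", "Severity": "High", "Location": "Section 7" }
  ],
  "Recommendations": [ "Cap liability at the contract value" ]
}
""";

// Case-insensitive matching tolerates camelCase property names in the response
var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
var analysis = JsonSerializer.Deserialize<ContractAnalysis>(json, options);

Console.WriteLine($"Risk Level: {analysis.RiskLevel}");
foreach (var risk in analysis.Risks)
    Console.WriteLine($"- [{risk.Severity}] {risk.Description} ({risk.Location})");

// Same shapes as the structs defined earlier in the article
public struct ContractAnalysis
{
    public string RiskLevel { get; set; }
    public ContractRisk[] Risks { get; set; }
    public string[] Recommendations { get; set; }
}

public struct ContractRisk
{
    public string Description { get; set; }
    public string Severity { get; set; }
    public string Location { get; set; }
}
```

Because the schema is enforced at generation time, deserialization failures become rare, but it's still worth guarding the Deserialize call in production code.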

Persistent Conversations for Complex Research

Legal research is rarely a one-shot question. You're iterating, following threads, asking follow-ups. I needed conversation persistence:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using LlmTornado;
using LlmTornado.Agents;
using LlmTornado.Chat;
using LlmTornado.Chat.Models;
using LlmTornado.Code;

var api = new TornadoApi("your-api-key");

var agent = new TornadoAgent(
    client: api,
    model: ChatModel.OpenAi.Gpt4.Turbo,
    instructions: "You are a legal research assistant with expertise in intellectual property law."
);

// Start a research session
Console.WriteLine("Starting research session...");
var conversation = await agent.Run("What's the test for software patent eligibility?");

// Continue with follow-ups
conversation = await agent.Run(
    "Can you cite specific cases?",
    appendMessages: conversation.Messages.ToList()
);

conversation = await agent.Run(
    "How has that test evolved since 2015?",
    appendMessages: conversation.Messages.ToList()
);

// Save the entire research session
conversation.Messages.ToList().SaveConversation("research_session.json");

// Later, resume the session
List<ChatMessage> savedMessages = new List<ChatMessage>();
await savedMessages.LoadMessagesAsync("research_session.json");

conversation.LoadConversation(savedMessages);
conversation = await agent.Run(
    "Based on what we discussed, how would this apply to AI-generated code?",
    appendMessages: conversation.Messages.ToList()
);

This pattern lets you build multi-session research workflows. A paralegal could start research on Monday, save their progress, and pick up exactly where they left off on Wednesday.

What I Learned

Building AI-powered legal research tools isn't as daunting as it sounds. The core patterns are straightforward once you understand them:

  • Semantic search for finding relevant cases
  • Tool integration for connecting to legal databases
  • Streaming responses for better user experience
  • Structured outputs for system integration

What matters most isn't the AI model itself. It's how well you integrate it into existing legal workflows.

The success stories from 2025 all share this trait: they didn't just bolt AI onto their stack. They deeply integrated it with how legal professionals actually work.

For more examples and implementation details, check the LlmTornado repository on GitHub.

Glossary of Key Terms

Semantic Search: Search based on meaning and context rather than exact keyword matching. Uses embeddings to understand conceptual similarity.

Embeddings: Numerical representations of text that capture semantic meaning. This enables similarity comparisons between documents.

Vector Database: A database optimized for storing and querying high-dimensional embeddings. This enables fast similarity search.

Streaming Response: Delivering AI-generated content incrementally as it's produced, rather than waiting for the complete response.

Tool Calling: The ability for an AI agent to invoke external functions or APIs to retrieve information or perform actions.

Structured Output: Constraining AI responses to match a specific schema (like JSON). This ensures predictable, parseable results.

Agent: An AI system with persistent instructions and context that can use tools and maintain conversation history.
