Rahul Kumar Jha

Extract Structured Data from Car Listings Using AI in .NET 10

🚗 From Chaos to Structure: Extracting Car Listings with AI

Ever struggled with parsing messy car listings from different sources? Imagine turning this:

"Check out this stylish Honda City 2018 model for sale, clocked only 30,000 km! Single owner, showroom condition, insurance valid. Yours for just ₹6.5 lakh."

Into this:

{
  "Make": "Honda",
  "Model": "City",
  "Year": 2018,
  "Mileage": 30000,
  "Price": 6.5,
  "AvailabilityType": "Sale",
  "Features": ["Single owner", "Showroom condition", "Insurance valid"],
  "OwnerCount": 1
}

Let me show you how to build this in under 100 lines of C# using the GitHub Models API! 🚀

🎯 The Problem

Car listings come in all shapes and sizes. Whether you're building a price comparison site, marketplace aggregator, or inventory management system, you need to:

  • Extract key details (make, model, year, mileage, price)
  • Handle different listing types (sale, lease, rent)
  • Deal with missing information gracefully
  • Process data at scale

Manually parsing this is tedious. Let AI do the heavy lifting! 💪

🛠️ The Solution

We'll use:

  • GitHub Models - Free access to powerful AI models
  • Microsoft.Extensions.AI - Unified AI abstraction for .NET
  • .NET 10 - Latest and greatest

📦 Quick Setup

First, grab your free GitHub token from GitHub Models (no credit card needed!).

Create a new console app:

dotnet new console -n TextExtraction
cd TextExtraction
dotnet add package Microsoft.Extensions.AI.OpenAI
dotnet add package Microsoft.Extensions.Configuration.UserSecrets

Store your token securely:

dotnet user-secrets init
dotnet user-secrets set "GitHubModels:Token" "your-github-token"

🎨 Building the Model

Create a CarDetails.cs file to define what we want to extract:

using System.Text.Json.Serialization;

[JsonConverter(typeof(JsonStringEnumConverter))]
public enum AvailabilityType
{
    Sale,
    Lease,
    Rent
}

public class CarDetails
{
    public string Make { get; set; } = string.Empty;
    public string Model { get; set; } = string.Empty;
    public int? Year { get; set; }
    public double? Mileage { get; set; }
    public double? Price { get; set; }
    public AvailabilityType? AvailabilityType { get; set; }
    public double? PricePerMonth { get; set; }
    public double? PricePerDay { get; set; }
    public string[]? Features { get; set; }
    public string? Location { get; set; }
    public string ShortSummary { get; set; } = string.Empty;
    public int? OwnerCount { get; set; }
}

Notice the nullable types? That's how we handle missing data elegantly! ✨
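
To see what the nullable types buy you, here is a minimal sketch that uses only System.Text.Json and needs no model call. ListingSketch is a hypothetical, trimmed-down stand-in for CarDetails, and the JSON is a made-up example of what a model might return for a lease listing that never mentions mileage.

```csharp
using System;
using System.Text.Json;

// Made-up model output for a lease listing: Mileage and Price are absent.
const string json = """
{
  "Make": "Hyundai",
  "Model": "Creta SX",
  "Year": 2020,
  "PricePerMonth": 22000
}
""";

ListingSketch listing = JsonSerializer.Deserialize<ListingSketch>(json)
    ?? throw new InvalidOperationException("JSON payload was null");

// Properties missing from the JSON keep their default, which is null for nullable types.
string mileage = listing.Mileage is double km ? $"{km} km" : "not stated";
Console.WriteLine($"{listing.Make} {listing.Model}, mileage: {mileage}");

// Hypothetical stand-in for CarDetails so the sketch compiles on its own.
public class ListingSketch
{
    public string Make { get; set; } = string.Empty;
    public string Model { get; set; } = string.Empty;
    public int? Year { get; set; }
    public double? Mileage { get; set; }
    public double? Price { get; set; }
    public double? PricePerMonth { get; set; }
}
```

With a non-nullable double, a missing mileage would silently become 0, which is indistinguishable from a car that has really been driven 0 km.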

🧠 The Magic: AI-Powered Extraction

Here's the core extraction logic in Program.cs:

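using Microsoft.Extensions.AI;
using Microsoft.Extensions.Configuration;  // ConfigurationBuilder, AddUserSecrets
using OpenAI;
using System.ClientModel;
using System.Text.Json;  // JsonSerializer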

// Configure the client
var configuration = new ConfigurationBuilder()
    .AddUserSecrets<Program>()
    .Build();

var credential = new ApiKeyCredential(
    configuration["GitHubModels:Token"] ?? 
    throw new InvalidOperationException("Token not found")
);

IChatClient chatClient = new OpenAIClient(credential, new OpenAIClientOptions
{
    Endpoint = new Uri("https://models.inference.ai.azure.com")
}).GetChatClient("gpt-4o-mini")
  .AsIChatClient();

// Define extraction schema
var prompt = @"Extract the following details from the car listing and return ONLY a valid JSON object:
{
  ""Make"": ""string - car manufacturer/brand"",
  ""Model"": ""string - car model name"",
  ""Year"": number - manufacturing year,
  ""Mileage"": number - kilometers driven,
  ""Price"": number - price in lakhs,
  ""AvailabilityType"": ""string - one of: Sale, Lease, Rent"",
  ""Features"": ""array of strings - notable features"",
  ""ShortSummary"": ""string - brief summary in 10-15 words"",
  ""OwnerCount"": number - previous owners (null if not mentioned)
}
Return only the JSON object, no additional text.";

// Sample car listings
var carListings = new List<string>
{
    "Honda City 2018 for sale, only 30,000 km! Single owner, showroom condition. ₹6.5 lakh.",
    "Hyundai Creta SX 2020 - premium SUV with sunroof. Monthly lease at ₹22,000.",
    "Toyota Innova Crysta 2019 - spacious 7-seater, 40,000 km, rent at ₹2,500/day."
};

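// Process each listing
Console.WriteLine("Processing car listings...\n");

foreach (var listing in carListings)
{
    // GetResponseAsync<T> requests JSON shaped like CarDetails and deserializes the reply
    var response = await chatClient.GetResponseAsync<CarDetails>(
        $"{prompt}\n\nCar Listing:\n{listing}"
    );

    if (response.TryGetResult(out CarDetails? carDetails) && carDetails != null)
    {
        Console.WriteLine($"✅ Extracted: {carDetails.Make} {carDetails.Model}");
        Console.WriteLine(JsonSerializer.Serialize(carDetails, 
            new JsonSerializerOptions { WriteIndented = true }));
    }
    else
    {
        // The reply could not be parsed into CarDetails
        Console.WriteLine($"⚠️ Could not extract: {listing}");
    }
}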

🎬 Run It!

dotnet run

Output:

Processing car listings...

✅ Extracted: Honda City
{
  "Make": "Honda",
  "Model": "City",
  "Year": 2018,
  "Mileage": 30000,
  "Price": 6.5,
  "AvailabilityType": "Sale",
  "Features": ["Single owner", "Showroom condition"],
  "OwnerCount": 1
}

✅ Extracted: Hyundai Creta SX
{
  "Make": "Hyundai",
  "Model": "Creta SX",
  "Year": 2020,
  "AvailabilityType": "Lease",
  "PricePerMonth": 22000,
  "Features": ["Premium SUV", "Sunroof"]
}

🚀 Level Up: Customization Ideas

1. Extract Different Fields

Add fuel type, transmission, color:

public string? FuelType { get; set; }  // Petrol/Diesel/Electric
public string? Transmission { get; set; }  // Manual/Automatic
public string? Color { get; set; }

2. Process Real-Time Data

Connect to web scraping APIs or RSS feeds:

var listings = await FetchListingsFromApi("https://api.carmarket.com/listings");

3. Add Validation

if (carDetails.Year < 1900 || carDetails.Year > DateTime.Now.Year)
{
    Console.WriteLine("⚠️ Invalid year detected");
}
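
One thing to watch in the check above: Year is an int?, and comparisons against null evaluate to false, so a listing with no year passes silently. A slightly fuller sketch might report missing and out-of-range values separately. ListingChecks and its parameters are hypothetical names for illustration, and the current year is passed in so the result is predictable.

```csharp
using System;
using System.Collections.Generic;

// Example: a listing with a future year and a negative mileage.
List<string> problems = ListingChecks.Validate(year: 2035, mileageKm: -5, price: null, currentYear: 2025);
foreach (string problem in problems)
{
    Console.WriteLine($"⚠️ {problem}");
}

// Hypothetical helper; it takes plain values so it does not depend on CarDetails.
public static class ListingChecks
{
    public static List<string> Validate(int? year, double? mileageKm, double? price, int currentYear)
    {
        var problems = new List<string>();

        if (year is null)
            problems.Add("Year is missing");
        else if (year < 1900 || year > currentYear)
            problems.Add($"Year {year} is out of range");

        // Lifted comparisons are false for null, so missing values are skipped here.
        if (mileageKm < 0)
            problems.Add("Mileage is negative");
        if (price <= 0)
            problems.Add("Price is not positive");

        return problems;
    }
}
```

Returning a list of messages instead of printing inside the check makes it easy to log the problems, skip the listing, or send it for manual review.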

4. Export to Database

await dbContext.CarListings.AddAsync(carDetails);
await dbContext.SaveChangesAsync();

5. Use a Better Model

For higher accuracy, switch to GPT-4o:

.GetChatClient("gpt-4o")  // More capable, slightly slower

💡 Pro Tips

  1. Keep temperature low for consistent extraction - set it explicitly (for example, ChatOptions.Temperature = 0) rather than relying on the service default
  2. Be specific in prompts - define the exact format you want
  3. Use nullable types - not all listings have all fields
  4. Batch process - handle multiple listings efficiently
  5. Monitor token usage - track costs with response.Usage
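
For tip 4, one possible approach is to run several extractions at once while capping how many are in flight, since hosted model endpoints are typically rate limited. The sketch below uses only the base class library. Batch.RunAsync and FakeExtractAsync are hypothetical names, and FakeExtractAsync is a stand-in for the real model call so the example runs without a token.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Stand-in for the model call: waits briefly, then pretends the first word is the make.
static async Task<string> FakeExtractAsync(string listing)
{
    await Task.Delay(50);
    return listing.Split(' ')[0];
}

var listings = new[] { "Honda City 2018", "Hyundai Creta SX 2020", "Toyota Innova Crysta 2019" };
string[] makes = await Batch.RunAsync(listings, FakeExtractAsync, maxConcurrency: 2);
Console.WriteLine(string.Join(", ", makes));  // Honda, Hyundai, Toyota

public static class Batch
{
    // Runs extract over every input with at most maxConcurrency calls in flight.
    // Results come back in the same order as the inputs.
    public static async Task<TResult[]> RunAsync<TResult>(
        IEnumerable<string> inputs, Func<string, Task<TResult>> extract, int maxConcurrency)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);

        List<Task<TResult>> tasks = inputs.Select(async input =>
        {
            await gate.WaitAsync();
            try { return await extract(input); }
            finally { gate.Release(); }
        }).ToList();

        return await Task.WhenAll(tasks);
    }
}
```

In the real program, the delegate would wrap the GetResponseAsync<CarDetails> call from Program.cs. A maxConcurrency of 1 gives the same sequential behavior as the foreach loop.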

🎯 Real-World Applications

  • 🏪 Marketplace Aggregation: Consolidate listings from multiple sources
  • 💰 Price Intelligence: Track pricing trends across markets
  • 📊 Analytics Dashboards: Build insights from unstructured data
  • 🤖 Chatbots: Power car recommendation bots
  • 📱 Mobile Apps: Parse user-submitted listings

🔗 Get the Full Code

Grab the complete working example from GitHub:

👉 genai-dotnet-basic_llm_tasks/TextExtraction

The repo includes:

  • ✅ Full source code with comments
  • ✅ 9 example car listings
  • ✅ Configuration setup guide
  • ✅ Detailed README

🎓 What You Learned

  • Using GitHub Models API in .NET
  • Strongly-typed AI responses with GetResponseAsync<T>
  • Schema-based extraction with AI
  • Handling unstructured data gracefully
  • Building production-ready text extraction

🤔 What's Next?

Try extracting:

  • 📄 Resume data (name, skills, experience)
  • 🧾 Invoices (vendor, amounts, dates)
  • 📧 Emails (sender, subject, key points)
  • 🏠 Real estate listings
  • 🍕 Restaurant menus (dishes, prices, ingredients)

The same pattern works for ANY text extraction task!

💬 Let's Connect!

What will you build with text extraction? Drop a comment below! 👇

Found this helpful? Give it a ❤️ and follow for more .NET + AI content!


Tags: #dotnet #ai #machinelearning #csharp #github #opensource #textextraction #nlp #automation


GitHub Repo: https://github.com/Rahul1994jh/genai-dotnet-basic_llm_tasks
