From Chaos to Structure: Extracting Car Listings with AI
Ever struggled with parsing messy car listings from different sources? Imagine turning this:
"Check out this stylish Honda City 2018 model for sale, clocked only 30,000 km! Single owner, showroom condition, insurance valid. Yours for just ₹6.5 lakh."
Into this:
{
  "Make": "Honda",
  "Model": "City",
  "Year": 2018,
  "Mileage": 30000,
  "Price": 6.5,
  "AvailabilityType": "Sale",
  "Features": ["Single owner", "Showroom condition", "Insurance valid"],
  "OwnerCount": 1
}
Let me show you how to build this in under 100 lines of C# using the GitHub Models API!
The Problem
Car listings come in all shapes and sizes. Whether you're building a price comparison site, marketplace aggregator, or inventory management system, you need to:
- Extract key details (make, model, year, mileage, price)
- Handle different formats (sale, lease, rent)
- Deal with missing information gracefully
- Process data at scale
Parsing this manually is tedious. Let AI do the heavy lifting!
The Solution
We'll use:
- GitHub Models - Free access to powerful AI models
- Microsoft.Extensions.AI - Unified AI abstraction for .NET
- .NET 10 - Latest and greatest
Quick Setup
First, grab your free GitHub token from GitHub Models (no credit card needed!).
Create a new console app:
dotnet new console -n TextExtraction
cd TextExtraction
dotnet add package Microsoft.Extensions.AI.OpenAI
dotnet add package Microsoft.Extensions.Configuration.UserSecrets
Store your token securely:
dotnet user-secrets init
dotnet user-secrets set "GitHubModels:Token" "your-github-token"
Building the Model
Create a CarDetails.cs file to define what we want to extract:
using System.Text.Json.Serialization;

// Serialize enum values as their names ("Sale") rather than numbers (0).
[JsonConverter(typeof(JsonStringEnumConverter))]
public enum AvailabilityType
{
    Sale,
    Lease,
    Rent
}

public class CarDetails
{
    public string Make { get; set; } = string.Empty;
    public string Model { get; set; } = string.Empty;
    public int? Year { get; set; }
    public double? Mileage { get; set; }
    public double? Price { get; set; }
    public AvailabilityType? AvailabilityType { get; set; }
    public double? PricePerMonth { get; set; }
    public double? PricePerDay { get; set; }
    public string[]? Features { get; set; }
    public string? Location { get; set; }
    public string ShortSummary { get; set; } = string.Empty;
    public int? OwnerCount { get; set; }
}
Notice the nullable types? That's how we handle missing data: a field the listing never mentions stays null instead of defaulting to a misleading 0.
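To make the null handling concrete, here is a minimal sketch that deserializes a listing with no mileage. It uses a trimmed-down stand-in class (CarDetailsDemo) and a helper (NullableDemo.DescribeMileage) that are made up for this illustration and are not part of the original project.

```csharp
using System;
using System.Text.Json;

// Trimmed-down stand-in for CarDetails, just enough to show the null handling.
public class CarDetailsDemo
{
    public string Make { get; set; } = string.Empty;
    public string Model { get; set; } = string.Empty;
    public int? Year { get; set; }
    public double? Mileage { get; set; }
    public double? PricePerMonth { get; set; }
}

public static class NullableDemo
{
    // Returns a display string for the mileage, or a fallback when the
    // JSON did not contain a "Mileage" key at all.
    public static string DescribeMileage(string json)
    {
        CarDetailsDemo car = JsonSerializer.Deserialize<CarDetailsDemo>(json)
            ?? throw new InvalidOperationException("JSON was null");

        // A missing key leaves the nullable property as null, so we can
        // tell "not mentioned" apart from "zero kilometers".
        return car.Mileage is double km ? $"{km} km" : "Mileage not listed";
    }
}
```

Calling NullableDemo.DescribeMileage with the lease listing's JSON (which has no "Mileage" key) takes the fallback branch, while a listing that includes "Mileage": 30000 takes the first one.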
The Magic: AI-Powered Extraction
Here's the core extraction logic in Program.cs:
using System.ClientModel;
using System.Text.Json;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Configuration;
using OpenAI;

// Configure the client
var configuration = new ConfigurationBuilder()
    .AddUserSecrets<Program>()
    .Build();

var credential = new ApiKeyCredential(
    configuration["GitHubModels:Token"] ??
    throw new InvalidOperationException("Token not found")
);

IChatClient chatClient = new OpenAIClient(credential, new OpenAIClientOptions
    {
        Endpoint = new Uri("https://models.inference.ai.azure.com")
    })
    .GetChatClient("gpt-4o-mini")
    .AsIChatClient();

// Define extraction schema
var prompt = @"Extract the following details from the car listing and return ONLY a valid JSON object:
{
  ""Make"": ""string - car manufacturer/brand"",
  ""Model"": ""string - car model name"",
  ""Year"": number - manufacturing year,
  ""Mileage"": number - kilometers driven,
  ""Price"": number - price in lakhs,
  ""AvailabilityType"": ""string - one of: Sale, Lease, Rent"",
  ""Features"": ""array of strings - notable features"",
  ""ShortSummary"": ""string - brief summary in 10-15 words"",
  ""OwnerCount"": number - previous owners (null if not mentioned)
}
Return only the JSON object, no additional text.";

// Sample car listings
var carListings = new List<string>
{
    "Honda City 2018 for sale, only 30,000 km! Single owner, showroom condition. ₹6.5 lakh.",
    "Hyundai Creta SX 2020 - premium SUV with sunroof. Monthly lease at ₹22,000.",
    "Toyota Innova Crysta 2019 - spacious 7-seater, 40,000 km, rent at ₹2,500/day."
};

Console.WriteLine("Processing car listings...");

// Process each listing
foreach (var listing in carListings)
{
    // GetResponseAsync<T> asks the model for JSON matching CarDetails
    var response = await chatClient.GetResponseAsync<CarDetails>(
        $"{prompt}\n\nCar Listing:\n{listing}"
    );

    // TryGetResult returns false if the reply could not be parsed as CarDetails
    if (response.TryGetResult(out CarDetails? carDetails) && carDetails != null)
    {
        Console.WriteLine($"Extracted: {carDetails.Make} {carDetails.Model}");
        Console.WriteLine(JsonSerializer.Serialize(carDetails,
            new JsonSerializerOptions { WriteIndented = true }));
    }
}
Run It!
dotnet run
Output (abbreviated):
Processing car listings...
Extracted: Honda City
{
  "Make": "Honda",
  "Model": "City",
  "Year": 2018,
  "Mileage": 30000,
  "Price": 6.5,
  "AvailabilityType": "Sale",
  "Features": ["Single owner", "Showroom condition"],
  "OwnerCount": 1
}
Extracted: Hyundai Creta
{
  "Make": "Hyundai",
  "Model": "Creta SX",
  "Year": 2020,
  "AvailabilityType": "Lease",
  "PricePerMonth": 22000,
  "Features": ["Premium SUV", "Sunroof"]
}
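One thing to watch in this output: the prompt asks for Price in lakhs, while PricePerMonth comes back in plain rupees. If you want a single unit downstream, a small helper along these lines could normalize it. The PriceUnits class and its method name are illustrative and not part of the original project; the only fact relied on is that 1 lakh equals 100,000.

```csharp
using System;

public static class PriceUnits
{
    // 1 lakh = 100,000 (Indian numbering system).
    private const decimal RupeesPerLakh = 100_000m;

    // Converts a price expressed in lakhs (as the extraction prompt requests)
    // into rupees. decimal is used for the result to avoid binary
    // floating-point drift in money values.
    public static decimal LakhsToRupees(double lakhs) =>
        (decimal)lakhs * RupeesPerLakh;
}
```

For the Honda City listing, PriceUnits.LakhsToRupees(6.5) gives the sale price in rupees, which can then be compared directly with the rupee-denominated lease and rental prices.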
Level Up: Customization Ideas
1. Extract Different Fields
Add fuel type, transmission, color:
public string? FuelType { get; set; } // Petrol/Diesel/Electric
public string? Transmission { get; set; } // Manual/Automatic
public string? Color { get; set; }
2. Process Real-Time Data
Connect to web scraping APIs or RSS feeds:
var listings = await FetchListingsFromApi("https://api.carmarket.com/listings");
3. Add Validation
// A null Year skips this check: comparisons against null evaluate to false
if (carDetails.Year < 1900 || carDetails.Year > DateTime.Now.Year)
{
    Console.WriteLine("Invalid year detected");
}
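The same idea extends to the other numeric fields. Here is a minimal sketch that collects every problem instead of printing the first one. ListingValidator is a hypothetical helper, and the checks on mileage and price are illustrative assumptions rather than rules from the original project. It takes plain values so it stays self-contained; in the real app you would pass carDetails.Year, carDetails.Mileage, and carDetails.Price.

```csharp
using System;
using System.Collections.Generic;

public static class ListingValidator
{
    // Returns a list of human-readable problems. An empty list means the
    // values look plausible. Null inputs are treated as "not mentioned"
    // and are skipped rather than flagged.
    public static List<string> Validate(int? year, double? mileage, double? price)
    {
        var problems = new List<string>();
        int currentYear = DateTime.Now.Year;

        // Same bounds as the inline check: 1900 through the current year.
        if (year is int y && (y < 1900 || y > currentYear))
            problems.Add($"Year {y} is outside 1900-{currentYear}");

        // Assumed rule: a negative odometer reading is an extraction error.
        if (mileage is double m && m < 0)
            problems.Add("Mileage cannot be negative");

        // Assumed rule: a listed price must be greater than zero.
        if (price is double p && p <= 0)
            problems.Add("Price must be positive");

        return problems;
    }
}
```

Returning a list rather than throwing lets you log questionable listings and still keep them for manual review.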
4. Export to Database
await dbContext.CarListings.AddAsync(carDetails);
await dbContext.SaveChangesAsync();
5. Use a Better Model
For higher accuracy, switch to GPT-4o:
.GetChatClient("gpt-4o") // More capable, slightly slower
Pro Tips
- Keep the temperature low for consistent extraction. The default is usually fine; if results vary between runs, lower it through ChatOptions.Temperature
- Be specific in prompts - define exact format you want
- Use nullable types - not all listings have all fields
- Batch process - handle multiple listings efficiently
- Monitor token usage - track costs with response.Usage
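For the batch-processing tip, one possible approach is to run several extractions concurrently while capping how many requests are in flight, since hosted model endpoints are typically rate limited. The sketch below is an assumption about how you might structure this, not code from the original project: BatchProcessor, the extract delegate, and the default limit of 3 are all made up for illustration. The delegate stands in for whatever wraps your chatClient.GetResponseAsync<CarDetails>(...) call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BatchProcessor
{
    // Runs `extract` over every listing with at most `maxConcurrency`
    // calls in flight at once, and returns results in input order.
    public static async Task<TResult[]> ProcessAsync<TResult>(
        IReadOnlyList<string> listings,
        Func<string, Task<TResult>> extract,
        int maxConcurrency = 3)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);

        var tasks = listings.Select(async listing =>
        {
            // Wait for a free slot before starting the call.
            await gate.WaitAsync();
            try
            {
                return await extract(listing);
            }
            finally
            {
                // Always free the slot, even if the call throws.
                gate.Release();
            }
        });

        // Task.WhenAll returns results in the same order as the inputs.
        return await Task.WhenAll(tasks);
    }
}
```

In Program.cs you would pass carListings plus a lambda that sends the prompt and returns the parsed CarDetails. Tune maxConcurrency to whatever rate limits your token allows.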
Real-World Applications
- Marketplace Aggregation: Consolidate listings from multiple sources
- Price Intelligence: Track pricing trends across markets
- Analytics Dashboards: Build insights from unstructured data
- Chatbots: Power car recommendation bots
- Mobile Apps: Parse user-submitted listings
Get the Full Code
Grab the complete working example from GitHub:
genai-dotnet-basic_llm_tasks/TextExtraction
The repo includes:
- Full source code with comments
- 9 example car listings
- Configuration setup guide
- Detailed README
What You Learned
- Using GitHub Models API in .NET
- Strongly-typed AI responses with GetResponseAsync<T>
- Schema-based extraction with AI
- Handling unstructured data gracefully
- Building production-ready text extraction
What's Next?
Try extracting:
- Resume data (name, skills, experience)
- Invoices (vendor, amounts, dates)
- Emails (sender, subject, key points)
- Real estate listings
- Restaurant menus (dishes, prices, ingredients)
The same pattern works for ANY text extraction task!
Let's Connect!
What will you build with text extraction? Drop a comment below!
Found this helpful? Give it a ❤️ and follow for more .NET + AI content!
Tags: #dotnet #ai #machinelearning #csharp #github #opensource #textextraction #nlp #automation
GitHub Repo: https://github.com/Rahul1994jh/genai-dotnet-basic_llm_tasks