bob lee

How I Built an FTIR Analysis Platform with Claude (and What I Learned About AI-Assisted Development)

Tags: python, chemistry, opensource, ai


The Backstory

I'm a materials science graduate, not a software developer. I know FTIR spectroscopy — identifying polymers, interpreting functional group peaks, matching unknown samples against reference libraries. But when I needed to search FTIR spectra programmatically, I hit a wall: the existing tools were either expensive enterprise packages or Excel macros from the early 2000s.

So I decided to build my own. And I used Claude (Anthropic's AI assistant) as my coding partner.

This is the story of how a domain expert with basic Python skills built a production FTIR search platform — 135,000 spectra, MCP server, API, community features — with AI writing about 70% of the code.


Step 1: The Core Algorithm

FTIR spectrum matching sounds complex, but the core is a simple interval comparison: given a set of peak positions from an unknown sample, find the library spectra with the most peaks that fall within a tolerance window of the query peaks (typically ±5 to ±15 cm⁻¹).
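A minimal sketch of that idea, not the production code: the function names, the fraction-of-peaks score, and the example peak lists are all assumptions made for illustration.

```python
from bisect import bisect_left


def count_matching_peaks(query_peaks, library_peaks, tolerance=10.0):
    """Count query peaks that have a library peak within +/- tolerance (cm^-1)."""
    lib = sorted(library_peaks)
    matched = 0
    for q in query_peaks:
        # Index of the first library peak at or above the window's lower edge.
        i = bisect_left(lib, q - tolerance)
        if i < len(lib) and lib[i] <= q + tolerance:
            matched += 1
    return matched


def rank_library(query_peaks, library, tolerance=10.0, top_n=5):
    """Rank library entries by the fraction of query peaks they match."""
    if not query_peaks:
        raise ValueError("query_peaks must not be empty")
    scored = [
        (name, count_matching_peaks(query_peaks, peaks, tolerance) / len(query_peaks))
        for name, peaks in library.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Illustrative peak lists only, not reference data.
library = {
    "material_a": [2915.0, 2848.0, 1472.0, 719.0],
    "material_b": [3026.0, 1601.0, 1493.0, 698.0],
}
query = [2917.0, 2850.0, 1470.0, 722.0]
print(rank_library(query, library))
# [('material_a', 1.0), ('material_b', 0.0)]
```

A score this simple treats every peak as equally informative, which is one reason a design can look correct on paper and still fail on real spectra.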

What Claude helped with:

  • Writing the initial peak-matching loop
  • Setting up the Django project structure
  • Designing the database schema for the spectral library

What I handled:

  • Understanding which tolerance values actually work (different wavenumber regions need different tolerances)
  • Validating match results against known materials
  • Rejecting the first three algorithm designs that looked correct on paper but failed on real data
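To make the first point concrete, here is one way a region-dependent tolerance could be expressed. The split at 1500 cm⁻¹ and both tolerance values are placeholders chosen from the ±5 to ±15 cm⁻¹ range above, not the values used on the platform.

```python
# Hypothetical (upper bound in cm^-1, tolerance in cm^-1) pairs, in ascending order.
TOLERANCE_BY_REGION = [
    (1500.0, 5.0),          # below 1500 cm^-1: tighter window (placeholder)
    (float("inf"), 15.0),   # 1500 cm^-1 and above: wider window (placeholder)
]


def tolerance_for(wavenumber):
    """Return the matching tolerance for the region containing wavenumber."""
    for upper_bound, tolerance in TOLERANCE_BY_REGION:
        if wavenumber < upper_bound:
            return tolerance
    raise ValueError(f"no tolerance defined for {wavenumber!r}")


def peaks_match(query, reference):
    """True if query lies within the region-specific window around reference."""
    return abs(query - reference) <= tolerance_for(reference)


print(peaks_match(2925.0, 2915.0))  # True: 10 cm^-1 apart, window is 15
print(peaks_match(730.0, 720.0))    # False: 10 cm^-1 apart, window is 5
```

Keeping the table as data rather than hard-coded branches makes it easy to adjust once validation against known materials shows which values actually work.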

Lesson: AI can write the code faster than you can, but it can't tell you if the chemistry is right. Domain expertise is the bottleneck, not code.


Step 2: Parsing FTIR Instrument Files

This was the hardest technical challenge. FTIR instruments output data in at least 6 different formats:

| Format | Origin | Difficulty | Notes |
|--------|--------|------------|-------|
| SPA | Thermo Nicolet | Medium | Binary, proprietary |
| SPC | GRAMS | Medium | Documented but complex |
| OPUS | Bruker | High | Completely proprietary |
| CSV | Universal | Easy | |
| JDX | JCAMP-DX | Medium | Standard, but implementations vary |
| XLSX | Labs | Easy | But endless variations |

What Claude helped with:

  • Writing binary file parsers from format documentation
  • Extracting peak tables from raw instrument data
  • Handling edge cases (missing metadata, non-standard headers)

What I handled:

  • Testing with real instrument files from my university lab
  • Identifying which format variants actually appear in practice
  • Setting up error handling for unparseable files
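The error-handling side can be sketched as a dispatcher that picks a parser by file extension and raises one well-defined error for anything it cannot read. The exception class, the function names, and the assumption of a two-column CSV (wavenumber, intensity) are illustrative, and only the CSV branch is filled in.

```python
import csv
import io
from pathlib import PurePath


class UnparseableSpectrumError(ValueError):
    """Raised when a file cannot be turned into (wavenumber, intensity) pairs."""


def parse_csv(raw):
    """Parse two-column CSV bytes, skipping header or other non-numeric rows."""
    text = raw.decode("utf-8-sig", errors="replace")
    points = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue
        try:
            points.append((float(row[0]), float(row[1])))
        except ValueError:
            continue  # header line or stray text
    if not points:
        raise UnparseableSpectrumError("no numeric rows found")
    return points


# Other formats would register their own parsers here.
PARSERS = {".csv": parse_csv}


def parse_spectrum(filename, raw):
    """Dispatch on the file extension, case-insensitively."""
    ext = PurePath(filename).suffix.lower()
    parser = PARSERS.get(ext)
    if parser is None:
        raise UnparseableSpectrumError(f"unsupported format: {ext or '(none)'}")
    return parser(raw)


print(parse_spectrum("sample.CSV", b"wavenumber,absorbance\n4000,0.01\n3998,0.02\n"))
# [(4000.0, 0.01), (3998.0, 0.02)]
```

A single exception type lets the calling code return a clear "could not parse this file" message instead of a stack trace.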

Lesson: Claude is surprisingly good at binary file parsing. I pasted format specs from Thermo and Bruker documentation, and it generated working parsers. But I caught three subtle byte-offset errors that would have silently corrupted data.
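Byte-offset errors are easier to catch when the parser checks its own output. The sketch below uses an invented layout (a little-endian point count, first and last wavenumber, then float32 intensities); it is not the SPA, SPC, or OPUS format, and the 20000 cm⁻¹ plausibility limit is an assumption.

```python
import struct

# Hypothetical header: uint32 point count, float32 first x, float32 last x.
HEADER = struct.Struct("<Iff")
MAX_PLAUSIBLE_WAVENUMBER = 20000.0  # assumed sanity limit, in cm^-1


def parse_hypothetical_binary(raw):
    """Parse the invented layout, failing loudly if the offsets look wrong."""
    if len(raw) < HEADER.size:
        raise ValueError("file shorter than header")
    n_points, first_x, last_x = HEADER.unpack_from(raw, 0)

    # Check 1: the declared point count must account for every byte.
    expected_size = HEADER.size + 4 * n_points
    if n_points == 0 or expected_size != len(raw):
        raise ValueError(f"size mismatch: expected {expected_size}, got {len(raw)}")

    # Check 2: the axis limits must look like wavenumbers (this also rejects NaN).
    if not all(0.0 < x <= MAX_PLAUSIBLE_WAVENUMBER for x in (first_x, last_x)):
        raise ValueError(f"implausible x range: {first_x} to {last_x}")

    intensities = struct.unpack_from(f"<{n_points}f", raw, HEADER.size)
    return {"first_x": first_x, "last_x": last_x, "intensities": intensities}


good = HEADER.pack(3, 4000.0, 400.0) + struct.pack("<3f", 0.5, 0.25, 0.125)
print(parse_hypothetical_binary(good)["intensities"])
# (0.5, 0.25, 0.125)
```

Reading the same bytes one position off, as in `parse_hypothetical_binary(good[1:])`, raises `ValueError` instead of returning plausible-looking garbage.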


Step 3: The MCP Server

MCP (Model Context Protocol) lets AI agents call your tool directly. Instead of a human typing peak values into a web form, an AI agent can send structured requests and receive structured results.

The MCP server, at fastapi_server/mcp_server.py, exposes one main tool:

analyze_ftir_spectrum(file_content, filename, peaks)

It accepts either an instrument file (file_content plus filename) or a peak list (peaks), and returns ranked matches with similarity scores.
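The either/or contract can be sketched in plain Python. This is not the code in mcp_server.py: the helper names, the rule that peaks take precedence when both inputs are given, and the shape of the result dict are assumptions, and file_content is treated as an opaque value.

```python
def validate_tool_input(file_content=None, filename=None, peaks=None):
    """Normalize a request into either a peak query or a file query."""
    if peaks:
        # Assumption: an explicit peak list wins if both inputs are supplied.
        return {"mode": "peaks", "peaks": sorted(float(p) for p in peaks)}
    if file_content is not None and filename:
        return {"mode": "file", "filename": filename, "content": file_content}
    raise ValueError("provide either peaks, or file_content together with filename")


def format_result(matches, top_n=10):
    """Turn (name, similarity) pairs into a ranked, JSON-serializable dict."""
    ranked = sorted(matches, key=lambda m: m[1], reverse=True)[:top_n]
    return {
        "matches": [
            {"name": name, "similarity": round(score, 3)} for name, score in ranked
        ]
    }


print(validate_tool_input(peaks=[2915.5, 1720]))
# {'mode': 'peaks', 'peaks': [1720.0, 2915.5]}
print(format_result([("material_b", 0.41), ("material_a", 0.93)]))
# {'matches': [{'name': 'material_a', 'similarity': 0.93}, {'name': 'material_b', 'similarity': 0.41}]}
```

Rejecting an empty request up front gives an AI agent a clear error to act on rather than an empty match list.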

What Claude generated: ~90% of the MCP server code, including the Pydantic output schemas, error handling, and feature documentation.


Step 4: What Broke in Production

Problem 1: Memory
Loading the entire 135K-spectrum library into memory on every request was fine locally. On a 2GB VPS with other services running, it caused OOM kills within hours.

  • Fix: Added Redis caching for frequent searches, lazy loading for the library, and a batch query size limit.
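The three parts of that fix can be sketched with the standard library alone. Here `functools.lru_cache` stands in for Redis, the tiny in-memory library stands in for the real database, and the batch limit of 50 is a made-up number.

```python
from functools import lru_cache

MAX_BATCH_SIZE = 50  # hypothetical limit on queries per request


def _read_library_from_storage():
    """Placeholder for the expensive load; illustrative peak lists only."""
    return {
        "material_a": (2915.0, 2848.0, 1472.0, 719.0),
        "material_b": (3026.0, 1601.0, 1493.0, 698.0),
    }


@lru_cache(maxsize=1)
def get_library():
    """Lazy loading: read the library on first use, then reuse it."""
    return _read_library_from_storage()


@lru_cache(maxsize=1024)
def cached_search(query_peaks, tolerance):
    """Cache results per (peaks, tolerance); stand-in for a Redis lookup."""
    results = []
    for name, peaks in get_library().items():
        hits = sum(any(abs(q - p) <= tolerance for p in peaks) for q in query_peaks)
        results.append((name, hits))
    results.sort(key=lambda item: item[1], reverse=True)
    return tuple(results)


def search_batch(queries, tolerance=10.0):
    """Reject oversized batches, then answer each query through the cache."""
    if len(queries) > MAX_BATCH_SIZE:
        raise ValueError(f"batch too large: {len(queries)} > {MAX_BATCH_SIZE}")
    # Sorting makes the cache key independent of the order peaks were typed in.
    return [cached_search(tuple(sorted(q)), tolerance) for q in queries]
```

Calling `search_batch([[2917.0, 720.0]])` and then `search_batch([[720.0, 2917.0]])` loads the library once and serves the second call from the cache. An in-process cache like this is per worker, which is one reason a shared store such as Redis is the more practical choice in production.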

Problem 2: Cloudflare timeouts
The MCP streamable-http transport needs persistent connections. Cloudflare's default 100-second timeout killed long searches.

  • Fix: Added server-sent events for progress reporting and tuned the Cloudflare timeout.
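For readers who have not used server-sent events, the wire format is just text: an `event:` line, a `data:` line, and a blank line. The sketch below shows a generator that emits progress events during a long scan; the event names, payload fields, and chunked loop are assumptions, and the actual search work is omitted.

```python
import json


def sse_event(event, data):
    """Format one server-sent event: event name, JSON payload, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def search_with_progress(library_size, chunk_size):
    """Yield a progress event per chunk scanned, then a final result event."""
    scanned = 0
    while scanned < library_size:
        scanned = min(scanned + chunk_size, library_size)
        # A real implementation would search this chunk before reporting.
        yield sse_event("progress", {"scanned": scanned, "total": library_size})
    yield sse_event("result", {"status": "complete"})


for chunk in search_with_progress(library_size=250, chunk_size=100):
    print(chunk, end="")
```

Because each event puts bytes on the wire, the connection does not sit silent while the search runs, which is what makes a proxy timeout less likely to fire.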

Problem 3: Hallucination-like false positives
The matching algorithm returned chemically impossible candidates for very short peak lists (2-3 peaks).

  • Fix: Added a minimum peak count threshold and a confidence penalty for low-peak queries.
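One possible shape for that fix is shown below. Both constants and the linear penalty are hypothetical; the article does not state the actual threshold or penalty formula.

```python
MIN_PEAKS = 4               # hypothetical: reject queries with fewer peaks
FULL_CONFIDENCE_PEAKS = 8   # hypothetical: no penalty at or above this count


def adjusted_score(raw_score, n_query_peaks):
    """Scale a similarity score down when the query has few peaks."""
    if n_query_peaks < MIN_PEAKS:
        raise ValueError(
            f"need at least {MIN_PEAKS} peaks, got {n_query_peaks}"
        )
    # Linear penalty: 4 of 8 peaks keeps half the score, 8 or more keeps all of it.
    factor = min(1.0, n_query_peaks / FULL_CONFIDENCE_PEAKS)
    return raw_score * factor


print(adjusted_score(0.8, 8))  # 0.8
print(adjusted_score(0.8, 4))  # 0.4
```

Raising an error for very short lists, rather than returning a low score, keeps chemically meaningless candidates out of the results entirely.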

The Result

FTIR.fun is now:

  • Live at https://ftir.fun
  • MCP endpoint: https://ftir.fun/mcp — connect from Claude, Cursor, Copilot, or any MCP client
  • OpenAPI spec: https://ftir.fun/openapi.platform.yaml
  • GitHub: github.com/jxbaoxiaodong/ftirfun-mcp
  • ~135,000 spectra indexed and searchable
  • ~70% of the code co-written with Claude
  • ~30% of the code rewritten after Claude's version failed in production

What I'd Tell Other Domain Experts Considering AI-Assisted Development

1. Start with the messy data, not the shiny framework.

I spent two weeks getting Claude to generate a perfect Docker Compose setup. Then I spent two months wrangling real FTIR instrument files. The infrastructure was the easy part — the data was the hard part.

2. AI will write code that looks right but is wrong.

Claude produced beautiful peak-matching code that passed unit tests and failed on real spectra. The peak positions "matched" mathematically but violated basic FTIR chemistry. You need domain knowledge to catch this.

3. Production is where the AI-generated code breaks first.

The code that looks clean in a notebook dies first under real load, real data variety, and real timeout limits. Be ready to rewrite the hot paths.

4. But the framework code is perfect for AI.

Settings, schemas, API routing, test scaffolding, README files, deployment scripts — Claude wrote these flawlessly. Let AI handle the glue while you focus on the domain logic.


What's Next

  • Confidence calibration (how reliable is a 0.85 similarity score?)
  • Expanded file format support
  • Public API with usage tiers
  • More MCP tools for agent workflows

FTIR.fun is an open-spectral-search project by a materials scientist who learned Python by building it. Questions, feedback, or FTIR datasets to contribute? ftir.fun@outlook.com
