plasmon

Posted on Mar 22 • Originally published at qiita.com

What Happens When You Bring LLMs Into a Semiconductor FAB — 5 ArXiv Papers, Brutally Honest Reviews

#semiconductor #llm #manufacturing #ai

ArXiv papers on semiconductor manufacturing x AI have been surging. From late 2024 onward, proposals have popped up for AI applied to every major FAB process: failure analysis (FA), anomaly detection, SPC, OPC, and tool matching.

Honest take — about half of these made me think "cool, but would this actually survive a production line?" But at the same time, there's genuine excitement: "if someone cracks this, manufacturing engineering changes fundamentally."

I straddle both the process engineering side and the software side, so I've seen the pattern of "beautiful theory that disintegrates the moment it hits a mass production line" more times than I can count. But that doesn't mean these problems aren't worth solving. Quite the opposite. Converting veteran engineers' tacit knowledge into formalized, searchable, reusable knowledge is a challenge that manufacturing as a whole has grappled with for decades — and LLMs plus RAG are the first tools that offer a technically credible approach.

With that lens, I read five notable papers and sorted them into "potentially useful" and "suspicious."

1. Shoving an LLM Agent Into Semiconductor Failure Analysis

Intelligent Assistants for the Semiconductor Failure Analysis with LLM-Based Planning Agents (2025)

Failure analysis is the most human-dependent process in semiconductor manufacturing. When a defective chip shows up, the flow goes: electrical test -> physical analysis -> cross-section SEM -> root cause identification. Veteran engineers run this loop on experience and gut feeling. It takes a new hire 5 years to become competent.

This paper proposes having an LLM-based Planning Agent orchestrate that entire flow.

Defective chip info (electrical test results, lot info, tool history)
    |
LLM Planning Agent
    |-- Search past FA case DB (RAG)
    |-- Select analysis methods (SEM? FIB? EBAC?)
    |-- Score priorities
    |-- Draft report
    |
Recommended action list for engineers

The critical design choice here: the LLM doesn't "produce answers" — it "produces plans." FA knowledge is ultra domain-specific and drowning in NDAs. Getting an LLM to understand the physics of failure mechanisms is a fool's errand. But pattern-matching on "when this failure mode appeared before, what procedure did the team use to hunt it down" — that sits squarely in RAG + LLM's sweet spot.
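To make the "plans, not answers" idea concrete, here is a minimal sketch of the retrieval half only, with no LLM call. It looks up the most similar past cases by symptom overlap and ranks the analysis steps those cases used. The case records, symptom tags, and function names are all hypothetical; a real system would use the paper's RAG pipeline over actual FA reports.

```python
from collections import Counter

# Hypothetical past FA cases: observed symptom tags and the analysis steps used.
PAST_CASES = [
    {"symptoms": {"leakage", "gate", "high_temp"},
     "steps": ["EBAC", "FIB", "cross-section SEM"]},
    {"symptoms": {"open", "via", "lot_edge"},
     "steps": ["FIB", "cross-section SEM"]},
    {"symptoms": {"leakage", "gate", "lot_edge"},
     "steps": ["EBAC", "cross-section SEM"]},
]

def jaccard(a, b):
    """Overlap between two symptom sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def plan_analysis(symptoms, cases=PAST_CASES, top_k=2):
    """Rank analysis steps by similarity-weighted votes from the top_k closest cases."""
    ranked = sorted(cases, key=lambda c: jaccard(symptoms, c["symptoms"]),
                    reverse=True)[:top_k]
    votes = Counter()
    for case in ranked:
        weight = jaccard(symptoms, case["symptoms"])
        for step in case["steps"]:
            votes[step] += weight
    # Drop steps that only came from cases with zero overlap.
    return [step for step, score in votes.most_common() if score > 0]

plan = plan_analysis({"leakage", "gate"})
print(plan)
```

The output is an ordered list of procedures for an engineer to review, not a root cause, which is the division of labor the paper argues for.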

My blunt assessment: Out of all five papers, this one has the highest chance of actually getting deployed. The reason is simple — FA engineers are chronically understaffed everywhere, and the slow ramp-up time for new hires is a universal pain point. However, the paper glosses over "knowledge base construction," which is actually the heaviest lift. Who's going to structure all those past FA cases and load them into a database? At most fabs, FA reports are scattered across Word docs and PDFs with zero searchability. Before you build the LLM agent, you'll spend two years on data cleanup. That's reality.

Why Local LLMs Are Non-Negotiable

FA data ranks at the very top of corporate secrets. Failure mode patterns, tool-specific fault tendencies, yield data — no company on earth is sending this to OpenAI's API.

This means these systems must run local LLMs inside an air-gapped FAB environment. The architecture is llama.cpp plus a quantized 30B-class model on an edge server, reaching usable inference speed on a single RTX 4060-class GPU. I've previously verified that Qwen2.5-32B runs realistically in this setup. Keep in mind that even at 4-bit quantization a model this size is well over 8GB, so on an 8GB card part of it has to run from system RAM. Qwen3.5-27B is out now, so inference quality should be even better, and the lower parameter count should leave less spilling out of 8GB VRAM. Haven't benchmarked it yet though.
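A quick back-of-the-envelope check on the memory side. This sketch counts weights only, at an assumed flat 4 bits per weight. Real quantization formats use somewhat more, and the KV cache comes on top, so treat the numbers as a lower bound.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate memory for model weights only, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8

VRAM_GB = 8  # RTX 4060-class card, as in the text

for name, params in [("32B", 32), ("27B", 27)]:
    need = weight_memory_gb(params, bits_per_weight=4)
    on_gpu = min(1.0, VRAM_GB / need)
    print(f"{name}: ~{need:.1f} GB of weights, at most {on_gpu:.0%} fits in {VRAM_GB} GB VRAM")
```

In llama.cpp the split between GPU and system RAM is controlled by the number of layers offloaded to the GPU (the `--n-gpu-layers` option), so a setup like this is a partial-offload configuration rather than a fully on-GPU one.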

2. Attacking Semiconductor Anomaly Detection with ML — N-BEATS + GNN Multivariate Approach

Unsupervised Anomaly Prediction with N-BEATS and Graph Neural Network in Multi-variate Semiconductor Process Time Series (2025)

A semiconductor production line has thousands of sensor parameters. Temperature, pressure, gas flow, RF power, film thickness. Monitoring all these multivariate time series for anomalies is the job, but three walls stand in the way:

Dimensionality is insane: thousands of parameters to monitor simultaneously
Anomalies are absurdly rare: less than 0.1% of all data points are anomalous (more than 99.9% are normal)
Parameters are entangled: single-variable thresholds miss anomalies that only show up in parameter interactions
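The third wall is easy to show with two parameters. In this sketch (made-up readings in arbitrary units), a suspect point sits comfortably inside each parameter's own 3-sigma band, yet it breaks the relationship between the two. The Mahalanobis distance used here is a textbook stand-in for "interaction-aware" scoring, not the paper's method.

```python
import math

# Hypothetical normal-operation readings for two coupled parameters
# (for example chamber pressure and gas flow).
normal = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]

def fit(data):
    """Sample means, variances and covariance of 2-D data."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    return mx, my, sxx, syy, sxy

def univariate_z(point, stats):
    """Per-parameter z-scores, which is all a single-variable threshold sees."""
    mx, my, sxx, syy, _ = stats
    return (point[0] - mx) / math.sqrt(sxx), (point[1] - my) / math.sqrt(syy)

def mahalanobis(point, stats):
    """Distance that accounts for the correlation between the two parameters."""
    mx, my, sxx, syy, sxy = stats
    dx, dy = point[0] - mx, point[1] - my
    det = sxx * syy - sxy ** 2
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

stats = fit(normal)
suspect = (2.0, 4.0)  # each value is inside the normal range on its own
print(univariate_z(suspect, stats))   # both well inside +/-3
print(mahalanobis(suspect, stats))    # far outside 3
print(mahalanobis((4.0, 3.9), stats)) # a normal point, for comparison
```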

The proposed method has two stages:

[Stage 1] N-BEATS
  Predict each parameter's time series -> compute residuals (gap between prediction and actual)
  Large residual = something is off

[Stage 2] GNN
  Represent inter-parameter dependencies as a graph
  -> Trace residual propagation patterns to backtrack "where did it first go wrong"

Choosing N-BEATS here is smart. Compared to Transformer-based time series models, it is generally lighter at inference time. If you're mounting this on real-time FAB monitoring, heavy models are a non-starter. Using a GNN to capture dependencies between parameters is also a sound approach for identifying cascade failures like "chamber temperature drifts -> film thickness shifts -> electrical characteristics degrade."
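The two-stage logic can be sketched without either model. Below, a moving-average forecaster stands in for N-BEATS and a plain walk over a hand-written dependency graph stands in for the GNN. The signals, graph, window size, and threshold are all invented for illustration.

```python
from statistics import mean, stdev

# Hypothetical dependency graph: each parameter maps to its direct upstream causes.
UPSTREAM = {
    "chamber_temp": [],
    "film_thickness": ["chamber_temp"],
    "electrical": ["film_thickness"],
}

def make_series(base, step_at, n=18, step=2.0):
    """Synthetic signal: small repeating noise plus a level shift from step_at on."""
    noise = [0.0, 0.1, -0.1]
    return [base + noise[i % 3] + (step if i >= step_at else 0.0) for i in range(n)]

def residuals(series, window=3):
    """Stage 1 stand-in: forecast each point as the mean of the previous `window` points."""
    return [series[i] - mean(series[i - window:i]) for i in range(window, len(series))]

def first_alarm(series, n_train=10, window=3, k=4.0):
    """Index of the first residual beyond k sigma of the normal-period residuals."""
    res = residuals(series, window)
    sigma = stdev(res[: n_train - window])  # fitted on the normal period only
    for offset, r in enumerate(res[n_train - window:]):
        if abs(r) > k * sigma:
            return n_train + offset
    return None

def root_candidate(alarms, start):
    """Stage 2 stand-in: walk upstream while an upstream parameter alarmed no later."""
    current = start
    while True:
        earlier = [p for p in UPSTREAM[current]
                   if alarms.get(p) is not None and alarms[p] <= alarms[current]]
        if not earlier:
            return current
        current = min(earlier, key=lambda p: alarms[p])

signals = {
    "chamber_temp": make_series(100.0, step_at=10),
    "film_thickness": make_series(50.0, step_at=12),
    "electrical": make_series(1.0, step_at=14),
}
alarms = {name: first_alarm(s) for name, s in signals.items()}
print(alarms)
print(root_candidate(alarms, "electrical"))
```

Starting from the last symptom to appear (electrical), the walk ends at the chamber temperature, which is the cascade described above.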

Being unsupervised (trained on normal data only) also lowers the bar for production deployment. Telling a semiconductor manufacturer to "collect a ton of labeled anomaly data" is practically asking for the impossible. Anomalies rarely happen, and when they do, the data's confidentiality level goes through the roof.

But here's where I get real. This paper's evaluation is mostly on simulated data. Validation on actual FAB data is thin. When a paper says "98% accuracy" on simulation, the question is how many percentage points it drops when exposed to the messy reality of a production line — sensor drift, jumps after maintenance, discontinuities from recipe changes. Papers that don't show this get a credibility discount from me.
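There is also a base-rate reason to distrust a headline accuracy number. With anomalies under 0.1% of points, accuracy is dominated by the normal class. The figures below are hypothetical, not taken from the paper. They just show how roughly 98% accuracy can coexist with mostly false alarms.

```python
def accuracy_and_precision(n_points, anomaly_rate, recall, false_alarm_rate):
    """Expected accuracy and alarm precision for a detector on imbalanced data."""
    anomalies = n_points * anomaly_rate
    normals = n_points - anomalies
    tp = anomalies * recall            # anomalies correctly flagged
    fp = normals * false_alarm_rate    # normal points wrongly flagged
    tn = normals - fp
    accuracy = (tp + tn) / n_points
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    return accuracy, precision

# Hypothetical detector: catches 90% of anomalies, false-alarms on 2% of normal points.
acc, prec = accuracy_and_precision(1_000_000, 0.001, recall=0.90, false_alarm_rate=0.02)
print(f"accuracy={acc:.3f}, precision={prec:.3f}")

# A detector that never alarms at all still scores 99.9% accuracy.
acc_null, prec_null = accuracy_and_precision(1_000_000, 0.001, recall=0.0, false_alarm_rate=0.0)
print(f"accuracy={acc_null:.3f}, precision={prec_null:.3f}")
```

In the first case about 4% of alarms are real, which is why precision and recall on real FAB data matter more than accuracy on simulation.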

3. Putting AI Prediction on Semiconductor SPC — From "React After the Fact" to "See It Coming"

Proactive Statistical Process Control Using AI: A Time Series Forecasting Approach for Semiconductor Manufacturing (2025)

SPC (Statistical Process Control) is the bedrock of semiconductor quality management. Plot measurements on a control chart, trigger an alert when control limits are breached. This method has been used for over 50 years.
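For readers from the software side, the classic check is small enough to fit in a few lines. This sketch sets 3-sigma limits from a baseline period using the plain sample standard deviation (production charts often estimate sigma differently, for example from moving ranges) and flags points outside the limits. The measurement values are made up.

```python
from statistics import mean, stdev

def control_limits(baseline, k=3.0):
    """Lower limit, center line and upper limit from in-control baseline data."""
    center = mean(baseline)
    sigma = stdev(baseline)
    return center - k * sigma, center, center + k * sigma

def breaches(measurements, lcl, ucl):
    """Indices of measurements that fall outside the control limits."""
    return [i for i, m in enumerate(measurements) if m < lcl or m > ucl]

baseline = [50.0, 50.2, 49.8, 50.1, 49.9, 50.0, 50.3, 49.7]  # hypothetical thickness, nm
lcl, center, ucl = control_limits(baseline)
new_points = [50.1, 50.4, 50.7, 50.2]
print(round(lcl, 2), round(center, 2), round(ucl, 2))
print(breaches(new_points, lcl, ucl))
```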

And that "alert after the limit is breached" part is the structural flaw. By the time it's breached, wafers are already defective — or at minimum headed for rework. A lot containing tens of millions of yen worth of wafers gets sacrificed before the system says "anomaly detected." Thanks for the heads up.

The paper's proposal is straightforward:

Traditional SPC:
  Measurement -> Control chart -> Limit breached -> Alert -> Reactive response
  (Casualties happen before you notice)

Proactive SPC:
  Past measurements -> Time series forecast model -> Predict next N points
  -> Detect limit breach before it happens -> Preventive intervention
  (Stop it before casualties)

The "look ahead" idea is not new, but earlier attempts lacked prediction accuracy, which drowned the floor in false alarms and exhausted operators. This paper argues that accuracy improvements from models such as N-BEATS and Temporal Fusion Transformer make this "finally production-viable."
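The proactive flow differs from the traditional one in a single step: the limit check runs on forecast points instead of measured ones. In this sketch a least-squares trend line stands in for N-BEATS or Temporal Fusion Transformer, and the limits and data are hypothetical.

```python
def fit_line(values):
    """Ordinary least-squares slope and intercept against the sample index."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def predicted_breach(values, lcl, ucl, horizon):
    """How many points ahead the forecast first leaves [lcl, ucl], or None."""
    slope, intercept = fit_line(values)
    last_index = len(values) - 1
    for step in range(1, horizon + 1):
        forecast = intercept + slope * (last_index + step)
        if forecast < lcl or forecast > ucl:
            return step
    return None

LCL, UCL = 49.35, 50.65                      # hypothetical control limits
drifting = [50.0, 50.1, 50.2, 50.3, 50.4]    # every point is still in control
stable = [50.0, 50.1, 49.9, 50.0, 50.1, 49.9]

print(predicted_breach(drifting, LCL, UCL, horizon=5))
print(predicted_breach(stable, LCL, UCL, horizon=5))
```

A reactive chart raises nothing on the drifting series because no point has crossed a limit yet, while the forecast check flags a breach a few points ahead.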

Semiconductor process time series have characteristics that play nicely with forecasting models:

Clear periodicity — lot cycles, PM (preventive maintenance) cycles
Gradual drift — consumable part degradation doesn't spike suddenly
Low noise — precision-controlled environment means high S/N ratio

Compared to time series in other domains (finance, retail), semiconductor data is "easier to predict." So it makes sense that proactive SPC would reach production viability in semiconductors first.

Of the five papers, this is the one I'd call "usable starting tomorrow." The reason: it can be implemented as an add-on layer on top of existing SPC infrastructure. Keep the existing control charts, keep the existing data pipelines, just add one prediction layer on top. It also avoids colliding with the conservative IT infrastructure culture of most FABs.

4. Quantifying Tool-to-Tool Matching with ML — A Boring But Lethal Problem in Semiconductor Manufacturing

Tool-to-Tool Matching Analysis Based Difference Score Computation Methods for Semiconductor Manufacturing (2025)

When you're running the same etch process on three tools, "tool #3 is cutting CD slightly wider" happens all the time. Tool-to-Tool Matching (TTTM) is unglamorous but directly hammers yield.

Traditional TTTM uses a "golden reference" — measuring the gap against the output of an idealized tool. But maintaining a golden reference in a production line is close to fantasy. Tools shift subtly after every maintenance event, parts swaps change their characteristics, and across tools from different vendors the comparison axes don't even align.

This paper proposes a pipeline that dynamically computes inter-tool difference scores without a golden reference. ML models correct for drift and seasonal variation, decomposing "which parameter has how much divergence."
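One simple way to score tools without a golden reference is to compare each tool against the fleet itself. The sketch below uses the fleet median as a moving reference and scales the gap by typical within-tool spread. It is an illustrative stand-in, not the paper's scoring method, and it skips the drift and seasonal correction entirely. Tool names, parameters, and readings are invented.

```python
from statistics import median

def mad(values, center):
    """Median absolute deviation of values around a given center."""
    return median(abs(v - center) for v in values)

def difference_scores(readings):
    """readings: {tool: {param: [values]}} -> {tool: {param: score}}.

    Score = (tool median - fleet median) / typical within-tool spread.
    """
    tools = list(readings)
    params = list(readings[tools[0]])
    scores = {t: {} for t in tools}
    for p in params:
        tool_median = {t: median(readings[t][p]) for t in tools}
        fleet = median(tool_median.values())
        spread = median(mad(readings[t][p], tool_median[t]) for t in tools)
        scale = spread if spread > 0 else 1.0  # avoid dividing by zero
        for t in tools:
            scores[t][p] = (tool_median[t] - fleet) / scale
    return scores

readings = {
    "tool_1": {"cd_nm": [45.0, 45.1, 44.9, 45.0, 45.1], "etch_rate": [100, 102, 98, 101, 99]},
    "tool_2": {"cd_nm": [45.1, 45.0, 45.0, 44.9, 45.1], "etch_rate": [101, 99, 100, 98, 102]},
    "tool_3": {"cd_nm": [45.6, 45.5, 45.7, 45.6, 45.5], "etch_rate": [99, 100, 102, 101, 98]},
}
scores = difference_scores(readings)
worst = max(((t, p) for t in scores for p in scores[t]),
            key=lambda tp: abs(scores[tp[0]][tp[1]]))
print(worst, round(scores[worst[0]][worst[1]], 1))
```

The per-parameter breakdown is what answers "which parameter has how much divergence": here tool_3 stands out on CD while its etch rate matches the fleet.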

Not a bad approach. But this space already has Lam Research and Applied Materials doing similar things with their proprietary software. How much novelty an academic paper can claim here is questionable. Equipment vendors have matching algorithms optimized for their own tools' data, delivered to customers under NDA. Whether the published method actually exceeds these proprietary solutions — honestly, I don't have enough information to judge.

5. Giving OPC Engineers an LLM Assistant — The Most Ambitious Proposal of the Five

Intelligent OPC Engineer Assistant for Semiconductor Manufacturing (2024)

OPC (Optical Proximity Correction) is the linchpin of the lithography process. At advanced nodes below 7nm, the gap between the mask pattern and what actually transfers to the wafer is massive, and the correction computations are enormous. Dependence on OPC engineers' experience and know-how is extreme.

This paper proposes LLM support for OPC engineer decision-making — recipe recommendations, interpretation of simulation results, parameter tuning advice. Same philosophy as Paper 1's FA agent: the LLM doesn't "replace judgment," it "assists judgment."

The concept makes sense. But of the five papers, this one is furthest from realization. OPC's knowledge domain intertwines lithography optics physics, resist chemistry, and etch reaction kinetics into a composite field where text-based knowledge alone can't capture everything. What OPC engineers look at daily is simulation images and contour maps, not text. How far an LLM's text processing capabilities actually reach here — I'm genuinely skeptical.

If multimodal LLMs (with image understanding) mature, the story might change. But trying to assist OPC with today's text-based LLMs feels like picking the wrong tool for the job.

Cross-Cutting Patterns Across All Five Papers — Local LLMs Are Becoming a Prerequisite for Manufacturing

Reading these individually is interesting, but lining them up reveals structural patterns in this field.

LLMs Function as a "Knowledge Interface Layer"

Papers 1 (FA) and 5 (OPC) both use LLMs, but neither asks the LLM to "perform analysis." They use it to "mediate access to knowledge." Expecting an LLM to directly understand semiconductor process physics is unreasonable. But taking engineers' tacit knowledge -> converting to text -> making it RAG-searchable -> conversational natural language interface — that's a realistic role as an interface layer.

This might look modest, but I think it's actually a significant paradigm shift. For the first time, there's a technically coherent answer to manufacturing's decades-old problem: "when the veteran retires, the knowledge vanishes." It doesn't need to be perfect. A system that covers 70% of a veteran's judgment calls would accelerate new engineer ramp-up by two years. The impact of that is far larger than any individual paper's accuracy improvement.

Unsupervised Methods Are Becoming the De Facto Standard

Papers 2 and 3 both work without labeled anomaly data: Paper 2 is explicitly unsupervised, and Paper 3's forecaster learns from the measurement history itself. In semiconductor manufacturing, mass collection of anomaly data is structurally difficult. Anomalies are rare, and their data is highly confidential. "Train on normal data only, catch anything that deviates from normal" is becoming the de facto standard approach for AI in this industry.

"Fix After It Breaks" -> "Stop Before It Breaks"

Papers 2 and 3 both aim at the same paradigm shift: reactive to proactive. This isn't just a technical improvement; it's a transformation of the quality management paradigm itself. Catching problems earlier speeds up yield improvement, which directly affects cost per wafer.

Edge Inference Is "Can't Start Without It," Not "Nice to Have"

FA data, process data, OPC data: none of it can leave the FAB. No matter how smart cloud LLMs get, the "data can't go outside" constraint isn't going away. For the semiconductor industry, local LLM x edge GPU isn't optional. It's a prerequisite.

The Gap Between Academia and the FAB Floor — And Why That's Exciting

You may have noticed by now — all five papers are at the "proposal" stage. Not a single one reports "results from running in a production FAB for six months."

This is a structural challenge in semiconductor manufacturing AI research. Real data can't be published due to NDAs. What can be published are validations on simulated data or anonymized data stripped of context. So a paper's "accuracy XX%" doesn't directly translate to production performance.

But I'd argue this gap is precisely where the next wave of value will be created. FA knowledge search, predictive SPC, unsupervised anomaly detection — these are things the production floor genuinely wants, and the direction is clearly right. What's desperately missing are engineers with the "messy implementation muscle" to bridge the gap between paper and production. Put differently: if you're someone who understands both process engineering and software engineering, this field is a gold mine right now.

A tip for reading these papers: before looking at "novelty of the method," first check "can the input data this method requires actually be collected at my FAB?" If the answer is no, it's not usable today. But if it's "not collectible now, but could be with a process change," then that process change itself becomes the next piece of work.


Next up, I want to try reproducing Paper 3's proactive SPC on an actual open dataset. Planning to test N-BEATS look-ahead accuracy on the SECOM dataset. If it works out, I might release it as a PoC package for FABs. Building the data preprocessing pipeline is admittedly tedious heavy lifting — but a factory that can stop before it breaks is a goal worth pursuing.
