How I Automated Water Quality Triage Using Multi-Agent Business Rules
Part 1 — Context and Foundations
TL;DR
I built AquaGuard-AI as a personal proof-of-concept to explore a problem I kept noticing while experimenting with operational monitoring patterns: aquaculture pond sensor readings are easy to collect, but the hard part is deciding what each reading means when species rules, feeding schedules, calibration age, and farm policy all interact. This experimental project routes pond readings through a LangGraph-orchestrated multi-agent pipeline backed by FastAPI and a lightweight aquaculture policy knowledge index. Each reading passes through sensor validation, species threshold checks, feeding window gates, compliance retrieval, and action generation before landing in a structured report with urgency levels and agent traces. The full source code lives at https://github.com/aniket-work/AquaGuard-AI. From my experience, the most interesting takeaway is not any single agent prompt, but how graph-based orchestration keeps specialist validation steps composable without turning business logic into spaghetti. This write-up documents my design choices, the code I wrote along the way, and what I would change if I extended the PoC further.
Introduction
A few weeks ago I started paying closer attention to what happens after a pond probe reports dissolved oxygen at 2.4 mg/L at six in the morning. From my observation reading aquaculture operations forums and anonymized sensor exports, the number itself is rarely the whole story. A field operator still has to answer harder questions. Is this critical for tilapia or merely concerning for catfish? Does a scheduled feed batch make the situation worse? Is the probe trustworthy given calibration age? Which farm policy clause applies, and how quickly should someone respond before stock loss?
I wanted to test whether multi-agent orchestration could compress that triage work without pretending to replace human judgment. AquaGuard-AI is the result of that experiment. It is not production software, not something I deployed at any employer, and not a claim about how aquaculture farms should operate. It is a solo PoC I built to learn LangGraph patterns in a domain that felt concrete and underserved in agent tutorials.
The architecture rhymes with other validation experiments I have tried: specialist agents, a coordinator graph, a knowledge layer for policy retrieval, and a thin API for integration. I deliberately chose aquaculture pond health rather than chatbots or document summarization because those demos are overrepresented, and because water quality incidents have crisp numeric inputs and operational outputs that make evaluation honest.
What pulled me toward this domain specifically was the asymmetry between measurement and response. Modern farms already emit dissolved oxygen, pH, temperature, ammonia, and turbidity readings from in-pond probes. The gap I noticed was structured validation: turning raw readings into urgency labels, species-aware anomaly tags, policy citations, and recommended actions that a field operator could accept or override quickly. I wondered whether explicit agent roles could mirror that mental sorting without hiding reasoning inside a black-box model.
I also wanted a use case where mistakes are obvious. If an agent labels a 2.4 mg/L reading as normal for tilapia grow-out, tests should scream. That clarity helped me iterate faster than I would have in a vague business insights demo where quality is subjective.
From where I stand, aquaculture validation shares a structural pattern with other enterprise-style agent experiments I have built: you receive structured events, apply domain-specific business rules in layers, attach policy citations, and emit operational recommendations with explicit urgency. I did not copy any prior tutorial wholesale. Instead, I asked what "validation" means when the inventory is live fish rather than shopping carts, then designed agents around that question.
What's This Article About?
This article walks through how I designed and implemented AquaGuard-AI, a multi-agent system that ingests batches of pond sensor readings and emits validation reports containing anomaly labels, urgency scores, aquaculture policy references, recommended interventions, and response hour estimates. You will see the LangGraph state machine I used, the specialist agents I wrote, the policy knowledge module that acts as a stand-in for retrieval-augmented generation, and the FastAPI surface that exposes the workflow to a simple dashboard.
I also cover setup and execution steps so you can reproduce the PoC locally, and I share edge cases I discovered while testing, including feeding-window conflicts that only appear when operator notes mention scheduled feed batches. Throughout, I frame conclusions as personal observations from an experimental build rather than authoritative guidance for regulated aquaculture environments.
Tech Stack
I kept the stack intentionally small. Complexity in agent systems often hides in dependencies long before it hides in prompts, and I wanted to feel every layer.
- Python 3.11 as the runtime. Strong typing with Pydantic models made pond reading batches easy to validate at API boundaries.
- LangGraph for orchestration. I considered a hand-rolled coordinator loop, but LangGraph gave me explicit state transitions and a graph I could diagram without lying.
- FastAPI for HTTP endpoints and to serve a minimal HTML dashboard from the same process.
- Pydantic v2 for schemas such as PondReading, ValidationResult, and BatchValidationReport.
- Rich for CLI tables. Terminal output matters in field operations workflows because many farm tools still live partly in terminals and JSON exports.
- Matplotlib for urgency and anomaly distribution charts. I wanted at least one visual summary of learned validation statistics, not because the charts are fancy, but because they make batch behavior obvious at a glance.
I skipped a full vector database in this PoC. A lexical policy index was enough to prove the RAG insertion point, and it kept the repository approachable for readers who want to clone and run in ten minutes.
In my experience shipping small services, clone-to-green time is what matters most in a PoC. Every extra infrastructure component costs a reader who bounces at step four of the README. LangGraph plus FastAPI plus a JSON fixture gets people to a colored terminal table quickly. That table is the emotional payoff that convinces someone to read the rest of the design sections.
| Component | Role in AquaGuard-AI | Why I chose it |
|---|---|---|
| LangGraph | Per-reading validation loop | Makes orchestration visible and testable |
| FastAPI | HTTP plus static dashboard | Single process dev experience |
| Pydantic | Schema guardrails | Prevents malformed reading batches early |
| Rich | CLI tables | Matches field-operator friendly output |
| Matplotlib | Urgency and anomaly charts | Quant summary for README and article |
Part 2 — Design Rationale
Why Read It?
If you are experimenting with LangGraph and tired of toy chatbots, aquaculture pond validation offers a disciplined sandbox. Inputs are structured sensor readings with species metadata. Outputs are enums, policy codes, and action lists. That makes it easy to tell when an agent misbehaved without hand-waving about subjective quality.
From my perspective, three ideas in this PoC transfer to other domains:
- Specialist agents beat monolithic prompts when each step has different failure modes. Species threshold scoring and feeding schedule gates should not share one prompt context window.
- Graph orchestration documents operational process better than nested if-statements. Farm operators already think in workflows; LangGraph mirrors that honestly.
- A thin API unlocks multiple clients with one workflow. I used both a Rich CLI and a browser dashboard against the same validation function.
The GitHub repository at https://github.com/aniket-work/AquaGuard-AI includes tests, sample data, diagrams, and a title animation GIF generated from terminal output plus a dashboard preview. Clone it, run python main.py, and you will see the same ASCII table aesthetic I embedded in the article cover animation.
Let's Design
Before writing code I sketched the lifecycle of a single pond reading. A probe reports dissolved oxygen, pH, temperature, ammonia, turbidity, biomass density, and calibration age. A validation system must answer five questions: is the sensor trustworthy, do species thresholds permit current chemistry, does a feeding window conflict exist, which farm policy clauses apply, and what intervention urgency is appropriate. Those questions map cleanly to five agent responsibilities plus a coordinator.
Coordinator and state
The coordinator owns batch progress. LangGraph state carries the reading list, accumulated results, farm name, and a numeric index. I chose explicit index-based iteration rather than recursive graph tricks because it made debugging easier when a single reading produced surprising output.
Sensor validation agent
The sensor agent checks calibration age and dissolved oxygen criticality before species rules run. Probes older than forty-five days flag sensor_calibration_due. Dissolved oxygen below species-critical floors triggers dissolved_oxygen_critical. I put calibration checks first because, in my opinion, acting on stale probe data is worse than acting slowly on fresh data.
Species rules agent
Species is where aquaculture business logic actually lives. Tilapia, shrimp, and catfish each carry distinct dissolved oxygen minimums, pH bands, ammonia ceilings, and temperature ranges. Shrimp tolerates narrower ammonia windows than tilapia. Catfish temperature stress appears above thirty-two Celsius. Encoding these as lookup tables made pytest assertions straightforward.
Feeding schedule agent
This agent encodes an operational gate I rarely see in generic IoT demos: do not feed when water quality fails species minimums. If operator notes mention a scheduled feed batch while dissolved oxygen or ammonia violates thresholds, the agent flags feeding_window_conflict. That single rule prevented several false "all clear" outcomes in my sample batch.
Compliance agent
This agent queries a small aquaculture policy chunk index. Each chunk includes a code like AQ-TIL-001, title, body, and species tag. Search is lexical overlap with species and category boosts. It is a deliberate stand-in for embedding retrieval, but the interface is stable: pass description and species, receive citation strings.
Action and urgency agents
Finally, the urgency agent maps anomaly sets to emergency, high, medium, low, or normal levels. The action agent converts urgency into recommended interventions and response hour counts. Critical dissolved oxygen triggers emergency aeration steps with a one-hour response window. Feeding conflicts map to high urgency with hold-feed instructions.
API and clients
FastAPI exposes /api/validate/sample for the bundled dataset and /api/validate for custom payloads. The root route serves a dark-themed dashboard that renders urgency cards and a results table after a POST call. I put the frontend in a single HTML file to avoid Node build tooling in a PoC README.
Part 3 — Implementation Deep Dive
Let's Get Cooking
Here is where I translate the design into code. I split the implementation into schema models, specialist agents, the LangGraph workflow, and the API layer. I wrote each block to be readable in isolation because, in my experience, agent repos rot quickly when everything lives in one thousand-line module.
Data models
I started with Pydantic models because pond readings arrive from JSON exports and IoT gateways in real life, and I wanted validation before any agent touched the data. Strong typing also made FastAPI response models trivial to declare.
from enum import Enum

from pydantic import BaseModel, Field


class UrgencyLevel(str, Enum):
    EMERGENCY = "emergency"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    NORMAL = "normal"


class PondReading(BaseModel):
    reading_id: str
    pond_id: str
    species: SpeciesType  # species enum, not shown in this excerpt
    dissolved_oxygen_mg_l: float = Field(ge=0.0, le=25.0)
    ph: float = Field(ge=0.0, le=14.0)
    temperature_c: float = Field(ge=-5.0, le=45.0)
    ammonia_mg_l: float = Field(ge=0.0, le=10.0)
    turbidity_ntu: float = Field(ge=0.0, le=500.0)
    biomass_kg_per_m3: float = Field(ge=0.0, le=50.0)
    sensor_calibration_days_ago: int = Field(ge=0, le=365)
    recorded_at: str
    operator_notes: str = ""
These models became the contract between CLI, API, and agents. Keeping enums strict prevented silent string drift in urgency labels when I generated charts later. I designed operator_notes as a first-class field because feeding schedule logic depends on human context that pure chemistry cannot capture.
Specialist agents
Each agent appends to an agent_trace list so I could explain decisions in the CLI and in future audit logs. The sensor validation agent checks calibration and dissolved oxygen floors:
def sensor_validation_agent(reading: PondReading, trace: list[str]) -> list[ReadingAnomaly]:
    anomalies: list[ReadingAnomaly] = []
    if reading.sensor_calibration_days_ago > 45:
        anomalies.append(ReadingAnomaly.SENSOR_CALIBRATION_DUE)
    if reading.dissolved_oxygen_mg_l < SPECIES_DO_CRITICAL[reading.species]:
        anomalies.append(ReadingAnomaly.DISSOLVED_OXYGEN_CRITICAL)
    elif reading.dissolved_oxygen_mg_l < SPECIES_DO_MIN[reading.species]:
        anomalies.append(ReadingAnomaly.DISSOLVED_OXYGEN_LOW)
    trace.append(f"SensorValidationAgent: found {len(anomalies)} sensor anomalies")
    return anomalies
Species rules encode the aquaculture intuition I mentioned earlier. Shrimp ammonia above 0.25 mg/L becomes ammonia_elevated. Tilapia pH outside 6.5 to 8.5 becomes ph_out_of_range. I put it this way because species mistakes are costlier than sensor mistakes: they directly influence mortality risk estimates.
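To make the two rules above concrete, here is a minimal, self-contained sketch. It is not the repository's module: the `Species` enum, the `species_rules` helper, and the string labels are illustrative names, and only thresholds quoted in this article are encoded (ammonia ceilings for all three species, the pH band for tilapia only).

```python
from enum import Enum


class Species(str, Enum):
    TILAPIA = "tilapia"
    SHRIMP = "shrimp"
    CATFISH = "catfish"


# Ammonia ceilings in mg/L, as quoted in this article.
AMMONIA_MAX = {Species.SHRIMP: 0.25, Species.TILAPIA: 0.5, Species.CATFISH: 0.6}

# Only the tilapia pH band is stated in the article, so the other
# species are left out here rather than guessed.
PH_BAND = {Species.TILAPIA: (6.5, 8.5)}


def species_rules(species: Species, ammonia_mg_l: float, ph: float) -> list[str]:
    """Return anomaly labels for one reading (hypothetical helper)."""
    anomalies: list[str] = []
    if ammonia_mg_l > AMMONIA_MAX[species]:
        anomalies.append("ammonia_elevated")
    band = PH_BAND.get(species)
    # Skip the pH check when no band is known for the species.
    if band is not None and not (band[0] <= ph <= band[1]):
        anomalies.append("ph_out_of_range")
    return anomalies


print(species_rules(Species.SHRIMP, ammonia_mg_l=0.3, ph=7.8))   # ['ammonia_elevated']
print(species_rules(Species.TILAPIA, ammonia_mg_l=0.3, ph=6.1))  # ['ph_out_of_range']
```

The same 0.3 mg/L ammonia value is an anomaly for shrimp and acceptable for tilapia, which is the kind of case a plain lookup table makes easy to assert in pytest.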
The feeding schedule agent inspects operator notes for feed keywords:
feed_requested = any(
    kw in notes for kw in ("feed scheduled", "feeding window", "distribute feed", "feed batch")
)
if feed_requested and (do_low or ammonia_high):
    anomalies.append(ReadingAnomaly.FEEDING_WINDOW_CONFLICT)
This block is short but important. It encodes a business rule that numeric thresholds alone cannot express.
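A standalone version of that gate can be sketched as follows. The `has_feeding_conflict` name and the lowercasing of notes are assumptions made for illustration; the keyword tuple is the one shown above.

```python
FEED_KEYWORDS = ("feed scheduled", "feeding window", "distribute feed", "feed batch")


def has_feeding_conflict(operator_notes: str, do_low: bool, ammonia_high: bool) -> bool:
    """True when notes mention a feed and water quality fails a threshold."""
    notes = operator_notes.lower()  # assumption: matching is case-insensitive
    feed_requested = any(kw in notes for kw in FEED_KEYWORDS)
    return feed_requested and (do_low or ammonia_high)


# A genuine conflict: feed is planned while dissolved oxygen is low.
print(has_feeding_conflict("Feed batch at 07:00", do_low=True, ammonia_high=False))

# Substring matching also fires on notes that only mention feed in passing.
print(has_feeding_conflict("Invoice for last feed batch filed", do_low=True, ammonia_high=False))
```

Both calls print True. The second one is a false positive, which is the cost of plain substring matching on free-text notes.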
The compliance agent wraps policy search:
def compliance_agent(reading, anomalies, trace) -> list[str]:
    # query and category are built earlier in the function (omitted in this excerpt)
    hits = search_policies(query, species=reading.species.value, category=category, top_k=1)
    refs = [f"{h.code}: {h.title}" for h in hits]
    trace.append(f"ComplianceAgent: attached {len(refs)} policy references")
    return refs
The action agent returns both steps and response hours. That pairing is what makes the output feel operational instead of academic.
LangGraph workflow
The graph is small but worth showing because it is the spine of the PoC:
from typing import Any

from langgraph.graph import END, StateGraph


def build_validation_graph() -> Any:
    graph = StateGraph(ValidationState)
    graph.add_node("load", _load_batch)
    graph.add_node("process", _process_next)
    graph.add_node("summarize", _summarize)
    graph.set_entry_point("load")
    graph.add_edge("load", "process")
    graph.add_conditional_edges(
        "process",
        _should_continue,
        {"process": "process", "summarize": "summarize"},
    )
    graph.add_edge("summarize", END)
    return graph.compile()
Each pass through process validates one pond reading and increments the index. When the index reaches the batch length, the graph routes to summarize, which is a hook I left open for aggregate analytics such as average confidence or anomaly histograms.
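The index-based loop can be emulated in plain Python to see the control flow without installing LangGraph. This is a sketch under assumptions: the state keys mirror the description in this article (reading list, results, farm name, index), but the exact field names and the node bodies are hypothetical, and a `while` loop stands in for the compiled graph's conditional edge.

```python
from typing import Any, TypedDict


class ValidationState(TypedDict):
    farm_name: str
    readings: list[dict[str, Any]]
    results: list[dict[str, Any]]
    index: int


def process_next(state: ValidationState) -> ValidationState:
    """Validate the reading at the current index, then advance the index."""
    reading = state["readings"][state["index"]]
    # Stand-in for the real agent chain: record that the reading was seen.
    result = {"reading_id": reading["reading_id"], "validated": True}
    return {
        **state,
        "results": state["results"] + [result],
        "index": state["index"] + 1,
    }


def should_continue(state: ValidationState) -> str:
    """Route back to 'process' until every reading has been handled."""
    return "process" if state["index"] < len(state["readings"]) else "summarize"


def run(state: ValidationState) -> ValidationState:
    # Like the graph, this enters 'process' once before checking the route,
    # so it assumes a non-empty batch.
    state = process_next(state)
    while should_continue(state) == "process":
        state = process_next(state)
    return state


final = run({
    "farm_name": "Demo Farm",
    "readings": [{"reading_id": "R-1"}, {"reading_id": "R-2"}],
    "results": [],
    "index": 0,
})
print(final["index"], [r["reading_id"] for r in final["results"]])
```

Because the edge from load to process is unconditional, an empty batch would reach the process node with nothing to read, so a guard in the loader or the API layer is worth considering.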
run_batch_validation computes urgency counts and anomaly distributions after the graph finishes. That summary feeds both the Rich CLI table and the Matplotlib chart in src/aquaguard/analytics.py.
FastAPI surface
The API is intentionally thin. It validates payloads, calls run_batch_validation, and returns a BatchValidationReport. The sample endpoint loads data/sample_pond_readings.json, which describes eight readings across tilapia, shrimp, and catfish ponds: critical dissolved oxygen on POND-A1, elevated ammonia on POND-B3, calibration overdue with high turbidity on POND-C2, low pH after rain on POND-A2, heat stress with biomass overload on POND-D1, feeding window conflict on POND-B1, a normal routine check on POND-E1, and ammonia spike with overstocking on POND-A3.
@app.post("/api/validate/sample", response_model=BatchValidationReport)
def validate_sample() -> BatchValidationReport:
    readings = sample_readings()
    return run_batch_validation(readings)
I wired CORS permissively because this is a local PoC dashboard, not a public deployment. If I hardened the experiment, authentication and farm tenancy would come first.
Batch aggregation logic
After the graph finishes, run_batch_validation aggregates urgency counts and anomaly histograms. This function is plain Python on purpose; not every step needs to be a graph node. I kept summarization outside the loop so unit tests can call validate_reading directly without invoking LangGraph when I want faster feedback.
return BatchValidationReport(
    farm_name=farm_name,
    processed=len(results),
    emergency_count=counts[UrgencyLevel.EMERGENCY],
    high_count=counts[UrgencyLevel.HIGH],
    ...
    summary_stats={
        "avg_confidence": round(sum(r.confidence for r in results) / len(results), 2),
        "anomaly_distribution": anomaly_distribution,
    },
)
The summary statistics feed the dashboard cards and the Matplotlib chart in the README. When I first omitted anomaly_distribution, the UI felt empty even though per-row validation was correct. Aggregates matter for human scanability.
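The aggregation step can be sketched with `collections.Counter`. The `Result` dataclass and `summarize` function below are simplified stand-ins with assumed field names, not the repository's models; the sketch also guards the average against an empty batch, which the excerpt above does not show.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Result:
    urgency: str
    confidence: float
    anomalies: list[str] = field(default_factory=list)


def summarize(results: list[Result]) -> dict:
    """Aggregate urgency counts and an anomaly histogram for one batch."""
    counts = Counter(r.urgency for r in results)
    anomaly_distribution = Counter(a for r in results for a in r.anomalies)
    # Avoid ZeroDivisionError when the batch is empty.
    avg = round(sum(r.confidence for r in results) / len(results), 2) if results else 0.0
    return {
        "processed": len(results),
        "emergency_count": counts["emergency"],  # Counter returns 0 for missing keys
        "high_count": counts["high"],
        "summary_stats": {
            "avg_confidence": avg,
            "anomaly_distribution": dict(anomaly_distribution),
        },
    }


report = summarize([
    Result("emergency", 0.92, ["dissolved_oxygen_critical", "feeding_window_conflict"]),
    Result("high", 0.87, ["feeding_window_conflict"]),
    Result("normal", 0.77),
])
print(report["summary_stats"])
```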
CLI presentation layer
main.py uses Rich tables because field operators often work from tabular summaries. I print farm name, urgency counts, and per-reading rows with color-coded urgency. The CLI became the source of truth for the GIF animation frames, which keeps marketing assets honest.
What surprised me during implementation
When I first ran the batch, a shrimp reading with borderline dissolved oxygen and elevated ammonia scored high urgency but missed feeding conflict until I added note keyword detection. I tightened feed phrase matching and prioritized ammonia patterns for shrimp. Another surprise: confidence scoring felt too flat when every result returned the same value. I added small boosts when multiple anomalies co-occur and when calibration is fresh, which better reflected my own certainty when testing.
Urgency mapping logic
The urgency agent applies a priority ladder I sketched on paper before coding. Critical dissolved oxygen always wins emergency status. Feeding window conflicts escalate to high even when chemistry is borderline, because distributing feed into poor water quality accelerates ammonia production. Isolated calibration overdue stays low unless paired with chemistry anomalies. I wrote the ladder as explicit conditionals rather than a scored formula because explainability beat marginal accuracy in this PoC.
def urgency_agent(anomalies: list[ReadingAnomaly], trace: list[str]) -> UrgencyLevel:
    if ReadingAnomaly.DISSOLVED_OXYGEN_CRITICAL in anomalies:
        return UrgencyLevel.EMERGENCY
    if ReadingAnomaly.FEEDING_WINDOW_CONFLICT in anomalies:
        return UrgencyLevel.HIGH
    # ... additional branches for low DO, ammonia, biomass, pH, temperature
Readers forking the repo should treat this function as the first customization point. Every farm operates with different risk appetite, and swapping urgency tables is easier than rewriting an LLM prompt.
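One way to make that customization point explicit is to express the ladder as data. The sketch below is an illustration, not the repository's function: it encodes only the rules stated in this article (critical dissolved oxygen, feeding conflict, isolated calibration overdue, no anomalies) and uses a placeholder `fallback` level for the branches the excerpt elides.

```python
# Priority ladder as (anomaly, urgency) pairs, checked top to bottom.
URGENCY_LADDER: list[tuple[str, str]] = [
    ("dissolved_oxygen_critical", "emergency"),
    ("feeding_window_conflict", "high"),
]


def urgency_for(
    anomalies: set[str],
    ladder: list[tuple[str, str]] = URGENCY_LADDER,
    fallback: str = "medium",
) -> str:
    """Map a set of anomaly labels to an urgency level (hypothetical helper)."""
    for anomaly, level in ladder:
        if anomaly in anomalies:
            return level
    if not anomalies:
        return "normal"
    if anomalies == {"sensor_calibration_due"}:
        return "low"  # calibration overdue on its own stays low
    # Assumption: remaining combinations get a placeholder level, because
    # the article does not spell out those branches.
    return fallback


print(urgency_for({"dissolved_oxygen_critical", "feeding_window_conflict"}))
print(urgency_for({"sensor_calibration_due"}))

# A farm with a different risk appetite can pass its own ladder.
print(urgency_for({"ammonia_elevated"}, ladder=[("ammonia_elevated", "high")]))
```

The three calls print emergency, low, and high respectively. Keeping the ladder as an ordered list preserves the explainability of explicit conditionals while letting a fork swap the table without touching the function body.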
Confidence scoring rationale
Confidence is not model probability here; it is a heuristic transparency score. Base confidence starts at 0.72. Multiple anomalies add 0.08. High or emergency urgency adds 0.07. Fresh calibration within thirty days adds 0.05. The cap is 0.96 because I never want a deterministic rules engine pretending certainty it cannot justify. When I present confidence in the CLI, I intend it as a sort key for human review, not as a statistical guarantee.
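The arithmetic above fits in a few lines. This sketch assumes that "multiple anomalies" means two or more and that "within thirty days" is inclusive; the function name and string urgency labels are illustrative.

```python
def confidence_score(anomaly_count: int, urgency: str, calibration_days_ago: int) -> float:
    """Heuristic transparency score, not a model probability."""
    score = 0.72  # base confidence
    if anomaly_count >= 2:  # assumption: "multiple" means two or more
        score += 0.08
    if urgency in ("high", "emergency"):
        score += 0.07
    if calibration_days_ago <= 30:  # assumption: thirty days inclusive
        score += 0.05
    return round(min(score, 0.96), 2)


print(confidence_score(0, "normal", calibration_days_ago=60))     # 0.72
print(confidence_score(2, "emergency", calibration_days_ago=10))  # 0.92
```

With only these three boosts the highest reachable value is 0.72 + 0.08 + 0.07 + 0.05 = 0.92, so the 0.96 cap acts as a guard in case further boosts are added later.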
Part 4 — Operations, Reflections, and Next Steps
Let's Setup
Step-by-step details can be found at the repository README: https://github.com/aniket-work/AquaGuard-AI
Locally, the setup path I used looks like this:
- Clone the repository and create a virtual environment inside the project root.
- Install dependencies from requirements.txt, which pins LangGraph, FastAPI, Rich, and Matplotlib among others.
- Run python main.py to validate the sample JSON batch and print the Rich summary table.
- Optional: run python main.py --chart to generate images/validation-stats-chart.png.
- Optional: start uvicorn aquaguard.api.server:app --app-dir src --reload and open the dashboard at port 8000.
- Optional: execute pytest tests/ -v to confirm emergency dissolved oxygen detection and feeding conflict presence.
I recommend keeping the virtual environment inside the repository folder for PoCs like this so paths to data/sample_pond_readings.json remain predictable when you run scripts from different working directories.
Environment variables are not required for the default demo. If you extend the project with embedding-based retrieval or an external LLM provider, add keys through a .env file but do not commit secrets.
Let's Run
In my latest run, python main.py against the bundled dataset reported eight processed readings, with two emergency and three high urgency findings. Emergency items included POND-A1 critical dissolved oxygen at 2.4 mg/L and POND-B1 feeding window conflict combined with sub-threshold dissolved oxygen for shrimp. Average confidence landed around eighty-six percent with an average response window near twelve hours across the mixed batch.
The CLI output ends with an ASCII-friendly table that mirrors what I animated in the repository GIF. That was intentional: I wanted the marketing asset and the actual tool output to tell the same story.
For API mode, POST /api/validate/sample returns JSON containing per-reading recommended_actions and policy_refs. The dashboard renders urgency cards and a results table without a separate frontend build step.
To generate analytics assets, run python main.py --chart. It writes images/validation-stats-chart.png, which I included in the README so GitHub visitors see quantitative behavior immediately.
When I demo this PoC to myself after a break, I use a three-step smoke path: CLI batch, pytest, then dashboard POST. That sequence exercises the graph, validation rules, and HTTP layer without needing external services. If all three pass, I trust the repository state enough to publish diagrams or write about it.
For readers who prefer API exploration with curl, curl -X POST http://localhost:8000/api/validate/sample returns the full BatchValidationReport JSON. I often pipe that output through jq to inspect a single reading's recommended_actions array and verify policy references attached correctly.
Species threshold tables I encoded
The business rules in this PoC are intentionally visible as Python dictionaries rather than buried in prompts. Tilapia dissolved oxygen minimum is 4.0 mg/L with a critical floor at 3.0 mg/L. Shrimp minimum is 5.0 mg/L with critical at 4.0 mg/L because shrimp mortality risk rises faster at marginal oxygen. Catfish tolerates lower oxygen but still flags critical below 2.5 mg/L. Ammonia ceilings differ similarly: shrimp at 0.25 mg/L, tilapia at 0.5 mg/L, catfish at 0.6 mg/L. I chose these numbers from publicly available aquaculture extension ranges, then simplified them for demo clarity. They are not farm-specific standards.
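As dictionaries, the dissolved oxygen numbers quoted above might look like the sketch below. The `classify_do` helper and the string keys are illustrative. The catfish minimum is not stated in this article, so it is left out rather than guessed, and the helper falls back to the critical check alone for that species.

```python
from typing import Optional

# Dissolved oxygen thresholds in mg/L, as quoted in this article.
SPECIES_DO_MIN = {"tilapia": 4.0, "shrimp": 5.0}  # catfish minimum not stated
SPECIES_DO_CRITICAL = {"tilapia": 3.0, "shrimp": 4.0, "catfish": 2.5}


def classify_do(species: str, do_mg_l: float) -> Optional[str]:
    """Return a dissolved oxygen anomaly label, or None when within limits."""
    if do_mg_l < SPECIES_DO_CRITICAL[species]:
        return "dissolved_oxygen_critical"
    minimum = SPECIES_DO_MIN.get(species)
    if minimum is not None and do_mg_l < minimum:
        return "dissolved_oxygen_low"
    return None


print(classify_do("tilapia", 2.4))  # dissolved_oxygen_critical
print(classify_do("shrimp", 4.5))   # dissolved_oxygen_low
print(classify_do("catfish", 3.0))  # None
```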
Temperature bands matter because heat increases metabolic oxygen demand. When POND-D1 reports 33.5 Celsius for shrimp with biomass at 16.5 kg/m3, two anomalies fire: temperature stress and biomass overload. That combination is exactly the kind of multi-signal case I wanted LangGraph to handle without one agent silently overriding another.
Human-in-the-loop considerations
Even in a deterministic PoC, I reserved mental space for human review. Emergency readings should page someone, not close the loop autonomously. LangGraph supports interrupt_before on nodes, and I would place that interrupt before any future "execute intervention" node if I connected to real actuators. Today the action agent only returns text recommendations. That boundary matters ethically and practically.
Edge Cases and Testing Philosophy
I wrote four pytest cases covering batch length, emergency dissolved oxygen detection, feeding window conflicts, and non-empty remediation steps on actionable results. Tests encode what I care about in this PoC: critical dissolved oxygen must never be downgraded quietly, and every non-normal result must include recommended actions.
Ambiguous operator notes remain a weakness. A note mentioning "feed" in an unrelated context might still trigger feeding logic if phrasing overlaps keywords. In production I would route low-confidence classifications to a human queue. Here, I expose agent_trace lists to make that queue possible later.
Species tables only cover three species in this demo. A real farm with hybrid stocking or polyculture would need composite rules. That limitation is acceptable in a demo but worth documenting honestly.
Walking Through a Sample Reading
To make the pipeline concrete, consider reading R-2026-0629-001 from the sample dataset: tilapia pond POND-A1 reports dissolved oxygen at 2.4 mg/L with a feed batch mentioned in operator notes. When I feed this batch through the graph, the sensor agent flags critical dissolved oxygen below the tilapia floor. The species agent confirms chemistry is otherwise within band. The feeding agent detects a feed-window conflict because notes mention feeding while dissolved oxygen is sub-threshold. The compliance agent retrieves AQ-TIL-001: Tilapia Dissolved Oxygen Minimum. The action agent responds with emergency aeration, feed suspension, and technician dispatch with a one-hour response window.
That end-to-end path takes milliseconds in my local runs, but the value is not speed. The value is repeatability. Every reading in the batch gets the same structured treatment, which means I can diff reports across code versions when I tune heuristics.
Policy Knowledge Base Design
I modeled the knowledge layer as a list of PolicyChunk dataclass rows rather than jumping straight to ChromaDB. Each chunk stores a code, title, body, species tag, and category. The search function tokenizes query and body text, counts overlap, and applies species and category boosts. This is intentionally primitive. From my experience building RAG demos, people sometimes spend days on embedding pipelines before validating whether retrieval inputs are structured correctly. Here, the interface is stable: search_policies(query, species, category, top_k).
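A lexical index with that interface can be sketched in under forty lines. Everything numeric here is an assumption: the boost weights (2.0 for species, 1.0 for category) are illustrative, the chunk bodies are placeholder text, and `AQ-SHR-900` is a made-up code. Only `AQ-TIL-001` and its title come from this article.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyChunk:
    code: str
    title: str
    body: str
    species: str
    category: str


POLICIES = [
    # Code and title from the article; body text is a placeholder.
    PolicyChunk("AQ-TIL-001", "Tilapia Dissolved Oxygen Minimum",
                "Hold feed and start aeration when dissolved oxygen falls below the tilapia minimum.",
                "tilapia", "dissolved_oxygen"),
    # Entirely hypothetical chunk, included so ranking has something to compare.
    PolicyChunk("AQ-SHR-900", "Example Shrimp Ammonia Ceiling",
                "Pause feeding when ammonia rises above the shrimp ceiling.",
                "shrimp", "ammonia"),
]


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_policies(query: str, species: str, category: str, top_k: int = 1) -> list[PolicyChunk]:
    """Rank chunks by token overlap plus species and category boosts."""
    query_tokens = _tokens(query)
    scored: list[tuple[float, PolicyChunk]] = []
    for chunk in POLICIES:
        score = float(len(query_tokens & _tokens(chunk.body)))
        if chunk.species == species:
            score += 2.0  # assumed boost weight
        if chunk.category == category:
            score += 1.0  # assumed boost weight
        if score > 0:
            scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


hits = search_policies("dissolved oxygen below minimum", species="tilapia", category="dissolved_oxygen")
print([f"{h.code}: {h.title}" for h in hits])
```

Swapping this for embedding retrieval would only change the body of `search_policies`; callers that format `code` and `title` into citation strings would not need to move.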
If I swap in embeddings later, agents above the knowledge layer stay unchanged. That separation mattered to me because it mirrors how I would approach a production migration: nail schemas and agent IO first, upgrade retrieval second.
Monolithic Prompt Versus Specialist Agents
Early in the experiment I tried a single prompt that asked for anomalies, urgency, citations, and actions in one JSON blob. It failed in boring ways: urgency would be conservative when citations were verbose, and shrimp ammonia violations were mislabeled as temperature stress when heat notes appeared in the same sentence. Splitting responsibilities into agents eliminated most of that cross-talk. Each agent receives only the fields it needs, writes to agent_trace, and returns a typed value the coordinator merges.
This aligns with how I think about LangGraph more generally. Nodes are functions with narrow contracts. Edges express control flow a field operator could whiteboard. When I revisit the monolithic prompt idea, it will be as a final summarization layer, not as the core reasoning engine.
Dashboard and Visualization Choices
The HTML dashboard uses a dark theme with urgency-colored badges because farm monitoring tools I have seen in demos often default to sterile white tables that hide urgency. I wanted emergency items to visually pop. The Matplotlib chart uses horizontal bars for urgency and top anomaly categories. Static charts export cleanly to README, which helps GitHub visitors who never run the code still grasp batch behavior.
Performance and Operational Notes
The sample batch of eight readings completes in under a second on my laptop, including graph orchestration overhead. I did not optimize for throughput because validation batches in this PoC are tiny. If batches grew to thousands of historical readings, I would parallelize per-reading validation outside LangGraph or shard by farm while keeping the same agent functions.
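For larger batches, one option is to fan per-reading validation out over a worker pool. The sketch below uses `concurrent.futures` from the standard library; `validate_reading` is a trivial stand-in for the real agent chain, and the dictionary keys are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def validate_reading(reading: dict) -> dict:
    """Stand-in for the per-reading agent chain (hypothetical logic)."""
    # 3.0 mg/L is the tilapia critical floor quoted in this article.
    return {
        "reading_id": reading["reading_id"],
        "do_critical": reading["dissolved_oxygen_mg_l"] < 3.0,
    }


def validate_batch_parallel(readings: list[dict], max_workers: int = 4) -> list[dict]:
    """Validate readings concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(validate_reading, readings))


batch = [
    {"reading_id": f"R-{i}", "dissolved_oxygen_mg_l": 2.0 + i}
    for i in range(5)
]
for row in validate_batch_parallel(batch):
    print(row)
```

`Executor.map` returns results in input order, so reports stay diffable. For pure-Python rule checks, threads are limited by the GIL, so a `ProcessPoolExecutor` or sharding by farm may be the better fit once the work is CPU-bound.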
Memory footprint stays small because readings are pydantic objects in a list, not streamed from an IoT broker. Logging is print-based in the CLI and trace lists in results. A next step would be structured JSON logs with reading IDs for observability, but I skipped that to keep the repository approachable.
Theory Behind Response Hours
Response hours encode how quickly I believe an operator must show correction before stock health degrades. Critical dissolved oxygen and feeding conflicts map to one or four hours in my heuristics because shrimp and tilapia cannot wait a business week at sub-threshold oxygen. Sensor calibration overdue maps to forty-eight hours because it is serious but not always an immediate mortality event. These numbers are not organizational standards; they are PoC placeholders where I tried to mirror operational urgency I observed in aquaculture extension materials.
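A lookup table is enough to sketch this mapping. Two things below are assumptions: that the four-hour window belongs to feeding conflicts (the text only says "one or four hours", with the one-hour window tied to critical dissolved oxygen elsewhere in this article), and that the tightest window wins when several anomalies co-occur. Anomalies without a stated window are left out.

```python
from typing import Iterable, Optional

# Response windows in hours for the anomalies this article gives numbers for.
RESPONSE_HOURS = {
    "dissolved_oxygen_critical": 1,
    "feeding_window_conflict": 4,   # assumed reading of "one or four hours"
    "sensor_calibration_due": 48,
}


def response_hours(anomalies: Iterable[str]) -> Optional[int]:
    """Return the tightest known window, or None if no window is defined."""
    hours = [RESPONSE_HOURS[a] for a in anomalies if a in RESPONSE_HOURS]
    return min(hours) if hours else None


print(response_hours(["sensor_calibration_due", "feeding_window_conflict"]))  # 4
print(response_hours(["dissolved_oxygen_critical", "feeding_window_conflict"]))  # 1
print(response_hours([]))  # None
```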
Integration Points I Deliberately Left Open
The FastAPI layer returns pydantic-serialized JSON ready for webhooks, SMS workers, or farm management dashboards I did not build. agent_trace arrays are the extension point for human review queues: a UI could highlight low-confidence rows or readings where sensor and species agents disagreed with a prior human label. The compliance agent returns string citations today; tomorrow it could return structured objects with URLs to regulatory documents.
I also left authentication out entirely. If this ever faced real farm credentials or tenant data, isolation and role-based access would precede any model upgrade.
Ethics and Boundaries
Automating pond health validation touches live animal welfare and economic outcomes for farm operators. I want to be explicit: this PoC does not activate aerators, does not suspend feed systems, and does not connect to live IoT control planes. It sorts readings you give it. If someone deployed a derivative system, human review on emergency outcomes would be non-negotiable in my view, and rule changes would require traceable versioning.
I also avoided implying employer affiliation. This build lives entirely in my personal GitHub namespace as an experiment.
Future Roadmap I Would Explore
If I continue this PoC, my next steps would be:
- Replace lexical policy search with embeddings stored in Chroma or pgvector, then measure citation precision against a held-out reading set.
- Add an LLM summarization step that rewrites recommended actions into field runbook language while keeping structured JSON underneath.
- Introduce pond history so repeat dissolved oxygen dips escalate urgency automatically.
- Export intervention ticket drafts with policy footnotes for farm management systems.
- Build a feedback endpoint where a human reviewer marks agent mistakes, creating training data for later fine-tuning.
None of those are implemented here. They are the natural extension points I noted while writing the graph skeleton.
What I would not change
If I rebuilt the PoC from scratch, I would keep three decisions exactly as they are: strict pydantic schemas at the boundary, separate specialist modules instead of one prompt file, and agent_trace on every result. Those choices cost almost nothing in lines of code but paid off every time I debugged a surprising urgency label.
Reader exercises
If you fork the repository, here are three exercises I found useful while learning:
- Add a new reading to sample_pond_readings.json with ambiguous operator notes and write a test that asserts your expected urgency.
- Introduce a sixth agent that drafts a plain-language summary paragraph per reading without altering structured fields.
- Swap lexical policy search for an embedding model and compare citation precision on ten hand-labeled examples.
Each exercise touches a different production concern: evaluation, human-readable output, and retrieval quality.
Comparing Batch Outputs Across Iterations
One habit I kept from prior agent experiments is saving JSON reports when iterating heuristics. AquaGuard-AI supports --json-out on the CLI. Diffing two JSON files after a species threshold tweak shows exactly which reading IDs changed urgency or response hours. That practice sounds tedious, but it prevented regressions when I adjusted shrimp ammonia ceilings. Without structured output, I would rely on eyeballing terminal tables, which does not scale past a handful of readings.
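The diffing habit can be scripted rather than done by eye. This sketch assumes a report shape with a top-level `results` list whose rows carry `reading_id`, `urgency`, and `response_hours`; those key names are guesses for illustration and may differ from what `--json-out` actually writes.

```python
import json
from typing import Any


def load_report(path: str) -> dict[str, Any]:
    """Read one saved report from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def index_by_reading(report: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Key each per-reading row by its reading_id."""
    return {row["reading_id"]: row for row in report["results"]}


def diff_reports(
    old: dict[str, Any],
    new: dict[str, Any],
    fields: tuple[str, ...] = ("urgency", "response_hours"),
) -> dict[str, dict[str, tuple[Any, Any]]]:
    """Return {reading_id: {field: (old_value, new_value)}} for changed rows."""
    old_rows, new_rows = index_by_reading(old), index_by_reading(new)
    changes: dict[str, dict[str, tuple[Any, Any]]] = {}
    for reading_id in sorted(old_rows.keys() & new_rows.keys()):
        changed = {
            name: (old_rows[reading_id].get(name), new_rows[reading_id].get(name))
            for name in fields
            if old_rows[reading_id].get(name) != new_rows[reading_id].get(name)
        }
        if changed:
            changes[reading_id] = changed
    return changes


before = {"results": [{"reading_id": "R-1", "urgency": "high", "response_hours": 4}]}
after = {"results": [{"reading_id": "R-1", "urgency": "emergency", "response_hours": 1}]}
print(diff_reports(before, after))
```

With files on disk, the same call becomes `diff_reports(load_report("before.json"), load_report("after.json"))`, where both file names are placeholders.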
Why Aquaculture and Not Another Domain
I rotate domains in my personal agent experiments to avoid repeating myself. Aquaculture sat outside my recent project history and offered a rich rule surface: species biology, equipment trust, feeding economics, and regulatory policy all intersect in one reading. That intersection is what makes multi-agent validation interesting. A single monolithic classifier would need to simultaneously reason about chemistry, biology, and operations. Splitting agents let me test each concern independently and combine results with traceable provenance.
Closing Thoughts
AquaGuard-AI started as a question about aquaculture triage labor, not as a pitch for autonomous farm management. What I ended up with is a compact LangGraph workflow that feels honest about its limits: deterministic agents, explicit traces, readable tests, and an API that could swap underneath without rewriting business logic.
From my experience, the hardest part of multi-agent tutorials is not drawing boxes on a whiteboard. It is choosing a domain where evaluation is grounded. Pond sensor readings gave me that grounding. When the urgency agent mislabeled a shrimp ammonia issue, tests failed or traces looked wrong immediately. When species heuristics worked, the CLI table matched my own gut ranking.
If you are building your own orchestration experiments, steal the structure more than the rules. Keep schemas strict, keep agents narrow, keep orchestration visible, and keep humans in the loop for emergency outcomes. The repository at https://github.com/aniket-work/AquaGuard-AI is there if you want to run the same path I did and remix it for a completely different operational domain.
One last reflection: agent hype often focuses on autonomy. This project convinced me that disciplined validation is the more interesting problem. Pond readings do not need a chatty assistant; they need reliable sorting, traceable reasoning, and fast handoff to humans when stakes are high. LangGraph helped me express that validation story in code without pretending the PoC is more than an experiment.
I will keep iterating on personal agent projects like this because each one teaches a reusable pattern. AquaGuard-AI's pattern is narrow agents, explicit graph state, honest tests, and dashboards that mirror terminal truth. That combination is what I would carry into the next domain, whatever it ends up being.
Cover animation for readers who prefer visual summaries:
Disclaimer
The views and opinions expressed here are solely my own and do not represent the views, positions, or opinions of my employer or any organization I am affiliated with. The content is based on my personal experience and experimentation and may be incomplete or incorrect. Any errors or misinterpretations are unintentional, and I apologize in advance if any statements are misunderstood or misrepresented.
Tags: python, langgraph, ai, aquaculture




