DEV Community

Davi Domingos de Oliveira
Data Modeling for Agriculture: Combining Cash Flow and Average Daily Gain (ADG) in the Same Database

1. Introduction: A Real Problem on the Ground

When people talk about technology in agriculture, the conversation usually jumps straight to the flashy stuff — autonomous tractors, drone-mapped fields, and bleeding-edge IoT sensors. But that's the reality for a small minority. There's a quiet technological divide running through the agricultural sector that rarely makes headlines.

Livestock alone accounts for roughly 6–7% of Brazil's entire GDP. Yet 77% of rural properties in the country belong to small and mid-sized producers. So how does the vast majority of these operators manage the complexity of running a farm — tracking supplementation costs, live-weight commodity price swings, and the Average Daily Gain (ADG) of dozens or hundreds of animals? With paper notebooks, whiteboards, and generic spreadsheets that don't talk to each other.

As engineers, we have a name for this: Dark Data. Thousands of data points generated every single day, lost or severely underutilized simply because they were never captured in a relational database. Profit margins in modern livestock production are razor-thin, and making decisions based on unstructured data is a bottleneck worth billions of dollars.

In this post, I want to walk through the architecture of the Livestock Management System (LMS), a zootechnical ERP I built to bring structure to this long tail of agricultural data — with a particular focus on how I solved the project's most significant performance bottleneck.


2. The Technical Challenge: Python vs. the Database

2.1. Understanding the Bottleneck (The ADG Math)

ADG (Average Daily Gain) is an estimate of an animal's daily weight gain, typically measured in kilograms per day. Given a table structured like this:

CREATE TABLE IF NOT EXISTS pesagens (
    id INT AUTO_INCREMENT PRIMARY KEY,
    animal_id INT NOT NULL,
    data_pesagem DATE NOT NULL,
    peso DECIMAL(10, 2) NOT NULL,
    deleted_at DATETIME NULL DEFAULT NULL,
    FOREIGN KEY (animal_id) REFERENCES animais(id)
);

The ADG formula is straightforward: (Latest Weight - First Weight) / Days between first and last weigh-in.
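To make the arithmetic concrete, here is a minimal sketch of the formula in plain Python (the weights and dates are invented for illustration):

```python
from datetime import date

def adg(first_weight, last_weight, first_date, last_date):
    """Average Daily Gain: total gain divided by days between weigh-ins."""
    days = (last_date - first_date).days
    return (last_weight - first_weight) / days

# A steer going from 280 kg to 322 kg over the 60 days between weigh-ins:
gain = adg(280.0, 322.0, date(2024, 1, 10), date(2024, 3, 10))  # 0.7 kg/day
```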

On a working farm, this metric directly drives decision-making. It's the number that tells a rancher whether a nutrition protocol is working — or whether a particular animal is just burning through feed budget with nothing to show for it.

The real problem starts when you try to turn that simple formula into scalable code.

2.2. The Backend Processing Trap (The Python Anti-Pattern)

My first instinct was to use MySQL purely as a data store and let the Python backend handle all the computation — something roughly like this:

# `cursor` is an open mysql-connector cursor (dictionary=True)
cursor.execute("SELECT animal_id, peso, data_pesagem FROM pesagens WHERE deleted_at IS NULL")
all_weigh_ins = cursor.fetchall()  # <--- THE BOTTLENECK STARTS HERE

herd_adg = {}

# Python doing the database's job: grouping every raw row in memory
for record in all_weigh_ins:
    herd_adg.setdefault(record['animal_id'], []).append(
        (record['data_pesagem'], record['peso'])
    )

# Calculating ADG per animal with CPU-heavy loops
results = []
for animal_id, weigh_ins in herd_adg.items():
    # Sort chronologically: first/last weight must be picked by date,
    # not by min/max weight, because an animal can lose weight over a period
    weigh_ins.sort(key=lambda w: w[0])
    first_date, first_weight = weigh_ins[0]
    last_date, last_weight = weigh_ins[-1]

    days = (last_date - first_date).days

    if days > 0:
        adg = (last_weight - first_weight) / days
        results.append({'animal_id': animal_id, 'adg': adg})

After the first load-testing round, I knew this approach was a dead end. The backend was carrying the full weight of raw computation, which caused a dramatic spike in server RAM consumption. The direct consequence: the system would slow down and eventually lock up as it scaled to more farms and larger herds.


3. The Solution: Push Intelligence into the Database

To fix the memory bottleneck, I flipped the logic entirely. Instead of pulling thousands of raw rows into Python and forcing the application layer to process them, why not do the math where the data already lives?

That's how I designed the "Performance-First" architecture of the LMS — delegating the heavy lifting to MySQL.

3.1. Optimized SQL Views and Window Functions

The centerpiece of this solution is a set of SQL Views. A View acts as a virtual table with the formula baked in. Instead of computing ADG on the fly in the backend at request time, I teach the calculation to the database itself.

To ensure the computation correctly reflects the chronological order of weigh-ins — even when an animal loses weight during a period — I used Common Table Expressions (CTEs) and Window Functions (ROW_NUMBER).

Here's how I structured the v_gmd_analitico view to solve this complex aggregation automatically:

CREATE OR REPLACE VIEW v_gmd_analitico AS
WITH PesagensOrdenadas AS (
    SELECT
        animal_id, data_pesagem, peso,
        ROW_NUMBER() OVER(PARTITION BY animal_id ORDER BY data_pesagem ASC) as rn_asc,
        ROW_NUMBER() OVER(PARTITION BY animal_id ORDER BY data_pesagem DESC) as rn_desc
    FROM pesagens
    WHERE deleted_at IS NULL
),
PrimeiraUltima AS (
    SELECT
        animal_id,
        MAX(CASE WHEN rn_asc = 1 THEN data_pesagem END) AS data_inicial,
        MAX(CASE WHEN rn_asc = 1 THEN peso END) AS peso_inicial,
        MAX(CASE WHEN rn_desc = 1 THEN data_pesagem END) AS data_final,
        MAX(CASE WHEN rn_desc = 1 THEN peso END) AS peso_final
    FROM PesagensOrdenadas
    GROUP BY animal_id
)
SELECT
    a.user_id, a.id as animal_id, a.brinco,
    p.peso_final,
    (p.peso_final - p.peso_inicial) as ganho_total,
    DATEDIFF(p.data_final, p.data_inicial) as dias,
    CASE
        WHEN DATEDIFF(p.data_final, p.data_inicial) > 0
        THEN (p.peso_final - p.peso_inicial) / DATEDIFF(p.data_final, p.data_inicial)
        ELSE 0
    END as gmd
FROM PrimeiraUltima p
JOIN animais a ON p.animal_id = a.id
WHERE p.data_inicial <> p.data_final
  AND a.deleted_at IS NULL;

Here's where the architecture pays off: when Flask needs to render a producer's dashboard, it no longer runs unbounded loops or pulls raw data across the network. The main listing queries the base animais table using server-side pagination, and the View is called surgically only when complex metrics are needed. MySQL handles all the grouping and guards against division-by-zero, delivering the zootechnical metric ready for aggregation — SELECT AVG(gmd) FROM v_gmd_analitico — or for building out a detailed animal profile page instantly.
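As a sketch of what that "surgical" call looks like, the herd-wide dashboard number reduces to a single aggregate over the view (the `user_id = 42` filter is illustrative; the column itself is exposed by the view above):

```sql
-- Herd-wide average daily gain for one producer's animals
SELECT AVG(gmd) AS herd_adg
FROM v_gmd_analitico
WHERE user_id = 42;
```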

3.2. Composite Indexes for Instant Lookups

Solving the ADG computation with a View was a major win, but what about when a producer wants to filter costs for a specific lot? Without proper indexing, MySQL would fall back to a Full Table Scan on every click — scanning the entire table regardless of how targeted the query was.

I addressed this with composite structural indexes; the two most critical are idx_pesagens_otimizada and idx_custos_busca. Even as a client registers thousands of animals over the years, every filtered lookup continues to return in milliseconds. A B-tree index lookup costs O(log n) instead of the O(n) of a full table scan, so from the application's point of view query latency stays effectively flat regardless of table size.
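The post names the indexes but not their column lists, so the definitions below are a plausible sketch rather than the actual DDL. The guiding rule is to lead with the columns filtered by equality and end with the range or sort column; the `custos` table and its columns are assumptions based on the cost-filtering use case described above:

```sql
-- Hypothetical column orders (the real DDL is not shown in the post):
CREATE INDEX idx_pesagens_otimizada
    ON pesagens (animal_id, deleted_at, data_pesagem);

-- Assumes a `custos` table with lot and date columns for cost lookups:
CREATE INDEX idx_custos_busca
    ON custos (lote_id, deleted_at, data_custo);
```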

3.3. Server-Side Pagination and Connection Pooling

Two final rules govern the communication layer between the API and the database:

Server-Side Pagination: The main dashboard never loads the entire animal roster in a single response. I use LIMIT and OFFSET directly in MySQL queries, sending the browser only what fits on the current page.

Connection Pooling: Using mysql-connector-python, I configured a connection pool at database setup time. This eliminates the massive TCP overhead of opening and closing a new database connection on every incoming API request — the pool keeps connections warm and ready to reuse.


4. Results and Real-World Impact

The best architecture isn't the one built with the most fashionable tools — it's the one that eliminates the user's actual bottleneck in the smartest way. Shifting to a database-first approach delivered immediate, measurable gains on two fronts.

Business Impact: The End of Dark Data

A livestock producer who previously spent around 2 hours manually cross-referencing weigh-in notebooks with slow Excel files now has the complete farm picture in front of them. The dashboard delivers the full herd ADG analysis and consolidated cash flow in under 3 minutes. Dark data became actionable intelligence.

Technical Impact: A Server That Can Actually Scale

The Flask server is no longer choking on in-memory computation. RAM consumption dropped sharply because Python stopped pulling thousands of rows over the wire to do arithmetic it was never meant to do. With the SQL View absorbing the heavy calculation, composite indexes accelerating every filtered query, and pagination controlling data transfer, the system is genuinely production-scalable. Double or triple the number of registered animals — the dashboard still opens in milliseconds.


5. Conclusion

Generic software fails in agriculture because it treats a steer's live weight like SKU inventory in a retail system. Livestock is a living system, and its key performance indicators — like Average Daily Gain — demand an architecture designed around them, not bolted on as an afterthought.

In the end, good software solves real problems. Going from a rancher's operational pain to a production-grade SQL View using CTEs and Window Functions is what separates purpose-built tooling from off-the-shelf software dressed up with a new logo.

Want to dig into the code or see this architecture running live?


Appendix B: Brazilian Agribusiness Terms — A Translator's Note

Two terms in the original Portuguese source require special handling for an international engineering audience:

"Arroba" (@ — the Brazilian livestock weight unit)
In Brazilian cattle markets, live weight is quoted per arroba — a unit equivalent to 15 kg (approximately 33 lbs). When a rancher tracks the "price per arroba," they're tracking the live-weight commodity price used to calculate the market value of an animal at a given weight. For this article, I opted to avoid the term entirely and instead refer to "live-weight commodity price" and "market price per kilogram," which convey the same economic concept without requiring domain-specific context.

"Boi Gordo" (lit. "Fat Ox" / Finished Beef Steer)
Boi Gordo is the Brazilian futures market category for a grain-finished steer ready for slaughter — the rough equivalent of "fed cattle" in the CME Group's Live Cattle futures contract in the US. Price fluctuation in this category is what the article refers to when mentioning commodity price swings that affect profitability. I translated this simply as "live-weight commodity price" throughout to keep the focus on the economic signal rather than the Brazilian market nomenclature.
