I'm building Zutato, a food-tracking app focused on nutrient transparency. One of the earliest decisions we made: don't use AI for actual nutrient values. Here's why.
We don't ask the AI what's in your carrot.
Sounds odd for 2026 – we use AI in plenty of places in the app. But the moment it's about the concrete nutrient value that ends up in your logbook, we deliberately don't ask a language model.
Plausible, but wrong.
Ask a language model how much iron is in 100 g of whole-grain oats. Ask it again tomorrow. Ask it the day after with slightly different wording. You'll get three answers that all sound convincing – and differ by several milligrams.
That's not a vibe, that's measurable. A 2025 study fed ChatGPT meal photographs and compared the estimates against reference values:
- Calcium 27.8 % below the real value
- Potassium 49.5 % below the real value
- Folate 38.6 % below the real value
- Vitamin D estimated at zero at the median
- Portion weight underestimated in 76.3 % of cases – and every micronutrient estimate builds on top of that
A separate 2025 study compared three language models and found ChatGPT and Claude at a mean absolute percentage error of around 36 % for weight and energy – but 40 to 73 % on the macronutrients themselves, with Gemini landing between 64 and 110 % depending on the nutrient. The authors see potential for rough tracking use cases, but explicitly call the models unsuitable for precise values.
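The two studies report different kinds of error: the first gives a signed deviation ("27.8 % below"), the second an absolute one. A minimal sketch of the difference, assuming both are expressed relative to the reference value. The function names and the numbers are made up for illustration and are not taken from either paper.

```python
def signed_pct_error(estimate: float, reference: float) -> float:
    """Error as a percentage of the reference; negative means underestimate."""
    if reference == 0:
        raise ValueError("percentage error is undefined for a zero reference")
    return (estimate - reference) * 100.0 / reference


def mean_signed_pct_error(pairs: list[tuple[float, float]]) -> float:
    """Average signed error: over- and underestimates cancel out (bias)."""
    return sum(signed_pct_error(e, r) for e, r in pairs) / len(pairs)


def mean_abs_pct_error(pairs: list[tuple[float, float]]) -> float:
    """Average absolute error: every miss counts, whatever its direction."""
    return sum(abs(signed_pct_error(e, r)) for e, r in pairs) / len(pairs)


# Made-up (estimate, reference) pairs in mg, NOT data from the studies.
EXAMPLE_PAIRS = [(60.0, 100.0), (130.0, 100.0), (90.0, 100.0)]

if __name__ == "__main__":
    print(round(mean_signed_pct_error(EXAMPLE_PAIRS), 1))  # -6.7
    print(round(mean_abs_pct_error(EXAMPLE_PAIRS), 1))     # 26.7
```

In this toy case the signed average looks harmless at about -7 %, while the absolute average shows that a single estimate is off by roughly 27 % on average. For a logbook entry, the second number is the one that matters.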
The underlying issue isn't "AI isn't good enough yet" – it's structural. Language models generate plausible-sounding values without a verifiable source behind them. There's no dataset the answer traces back to, and the next roll of the dice gives you a different number. For a tracking app that's a non-starter.
Crowdsourcing is great – just not enough as a sole source.
The obvious alternative would be Open Food Facts (OFF). Over 100,000 volunteers, more than 4 million products from 150 countries, open and free to use – a project we genuinely respect. Without OFF the whole conversation around open food data would be poorer.
What OFF does well: reach, openness, a huge pool of products that aren't digitally captured anywhere else. What OFF structurally can't do: guarantee that individual values were verified before publication. There is no binding, formal review step, so the API ultimately returns the same unverified values that contributors entered.
That's not a complaint – it's the honest consequence of the model. Crowdsourcing works for reach. For an app you trust with your own nutrition tracking, it isn't enough as a sole source.
What we're left with: math.
So we take the inconvenient path. For base ingredients – oats, carrots, lentils, tofu, olive oil – we rely on the Bundeslebensmittelschlüssel (BLS), the German national nutrient database maintained by the Max Rubner-Institut, with values whose origin is traceable.
For specific branded products, the manufacturer label is the primary source. When a manufacturer doesn't declare micronutrients – which is the rule, not the exception – we calculate. From a product's ingredient list and the known BLS values for each ingredient, the total nutrient profile can be derived deterministically. Same input, same output, every time. If a BLS update lands, it's clear which values shifted and why.
We're deliberately not going into the specific algorithms here. What matters isn't how clever the calculation is – what matters is that it's deterministic and traceable. An AI isn't, by design.
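As a toy illustration of what "deterministic and traceable" can mean, and explicitly not Zutato's actual algorithm: the sketch below assumes the mass fraction of each ingredient is already known (labels usually list only the order, so in practice this is the hard part), ignores processing losses, and uses placeholder numbers rather than real BLS values. All names (`REFERENCE_PER_100G`, `Contribution`, `derive_profile`) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical reference table: nutrient content per 100 g of ingredient.
# Placeholder numbers for illustration only, NOT real BLS values.
REFERENCE_PER_100G = {
    "oats": {"iron_mg": 4.0, "magnesium_mg": 130.0},
    "raisins": {"iron_mg": 2.0, "magnesium_mg": 40.0},
}


@dataclass(frozen=True)
class Contribution:
    """One traceable step: how much of a nutrient one ingredient adds."""
    ingredient: str
    nutrient: str
    amount: float  # per 100 g of finished product


def derive_profile(recipe: dict[str, float]):
    """Derive nutrients per 100 g of product from ingredient mass fractions.

    `recipe` maps ingredient name to its mass fraction (assumed known,
    must sum to 1). Returns the totals and the list of contributions.
    """
    if abs(sum(recipe.values()) - 1.0) > 1e-9:
        raise ValueError("mass fractions must sum to 1")

    totals: dict[str, float] = {}
    trace: list[Contribution] = []
    # Sorted iteration keeps the order of the trace independent of input order.
    for ingredient in sorted(recipe):
        fraction = recipe[ingredient]
        for nutrient, per_100g in sorted(REFERENCE_PER_100G[ingredient].items()):
            amount = fraction * per_100g
            trace.append(Contribution(ingredient, nutrient, amount))
            totals[nutrient] = totals.get(nutrient, 0.0) + amount
    return totals, trace


if __name__ == "__main__":
    totals, trace = derive_profile({"oats": 0.75, "raisins": 0.25})
    print(totals)  # {'iron_mg': 3.5, 'magnesium_mg': 107.5}
    for step in trace:
        print(step)
```

The point is the trace, not the arithmetic: every total can be broken down into per-ingredient contributions, so when a reference value changes, the affected products and the size of the shift follow directly from the table.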
More work for us, less magic for you.
The result is a curated, self-owned dataset we're responsible for. Smaller than OFF, growing more slowly, considerably less spectacular. In exchange, a value that reads 4.6 today still reads 4.6 two weeks from now – and if it doesn't, we can see exactly why.
That's the deal: we take on the work, you get a number you can trust.
If that sounds interesting – beta opens in autumn 2026, sign-up at zutato.com.
Sources
- O'Hara C. et al. (2025): An Evaluation of ChatGPT for Nutrient Content Estimation from Meal Photographs. Nutrients 17(4):607. doi.org/10.3390/nu17040607
- Fridolfsson J. et al. (2025): Performance Evaluation of 3 Large Language Models for Nutritional Content Estimation from Food Images. Curr. Dev. Nutr. 9(10):107556. doi.org/10.1016/j.cdnut.2025.107556
Originally published on zutato.com