CalEye.
The Method · Long read

How the AI thinks.

Six stages, hairline-connected. Cited from USDA SR-Legacy and ADA Exchange. Trained for 22 months on tens of thousands of medically-referenced meals.

Updated May 20, 2026 · 12 min read · By the CalEye editorial team

The Pipeline

Six stages. One photograph.

Each stage is a discrete model with a published failure rate. None operates as a black box.

  01 Segmentation: Isolate every dish on the plate as its own independent region.
  02 Identification: Match each region against a medically-referenced food database.
  03 Portion: Estimate weight in grams from on-plate scale cues.
  04 Macros: Resolve carbohydrates, fat, and protein per 100 g, then scale to portion.
  05 Glycemic load: Cross-reference to the University of Sydney GI Database, sum across dishes.
  06 Citation: Attach the exact source entry to every number, one tap away in the UI.

Stage 01 · Segmentation

Find every dish on the plate.

Before any food can be weighed or cross-referenced, the model must answer a simpler but deceptively hard question: where does one dish end and the next begin? Segmentation is the stage that draws that boundary, and it runs before anything else in the pipeline.

The underlying architecture is a transformer fine-tuned on approximately 40,000 annotated plate images — meals photographed across lighting conditions ranging from restaurant candlelight to outdoor mid-day sun. Each image was hand-labelled by a team of annotators with training in food service and clinical nutrition. The model learned not just what a dish looks like in isolation, but how dishes relate to each other spatially: the way a pool of curry intrudes on a rice border, the way a garnish belongs to the dish beneath it, the way the sauce of one preparation bleeds into the negative space occupied by another.

Failure modes here are real and acknowledged. The first and most common is overlapping foods — a pile of salad draped over the edge of a protein portion, or a spoonful of condiment shared between two dishes. When the model cannot cleanly assign a pixel region to a single dish, it returns a lower confidence score rather than forcing a hard boundary. The second failure mode is glassware: a glass of juice in the frame reads to the segmentation layer as a translucent object of uncertain content, and is typically flagged with low confidence or excluded from the segmentation map entirely. The third is stews and mixed preparations — biryani, curry, congee — where the visual boundary between dish and broth does not exist in a geometric sense.

When confidence is low, CalEye does not fabricate a clean output. The uncertainty is surfaced in the UI: the dish region is shown with a dotted rather than solid boundary indicator, and the word "Estimated" appears next to any number derived from a low-confidence segment. The decision to make uncertainty visible was deliberate, and came directly from clinical feedback in our early testing phase. Diabetics especially need to know when a number is reliable and when it is a best-available approximation.
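
The display rule described above can be sketched in a few lines. This is a minimal illustration rather than CalEye's implementation: the `Segment` type, the field names, and the 0.7 cut-off are assumptions made for the example, since the article does not publish the real threshold.

```python
from dataclasses import dataclass

# Hypothetical cut-off; the real threshold is not stated in the article.
LOW_CONFIDENCE_THRESHOLD = 0.7


@dataclass
class Segment:
    dish: str
    confidence: float  # segmentation confidence in [0, 1]


def is_low_confidence(segment: Segment) -> bool:
    return segment.confidence < LOW_CONFIDENCE_THRESHOLD


def boundary_style(segment: Segment) -> str:
    """Dotted outline for low-confidence regions, solid otherwise."""
    return "dotted" if is_low_confidence(segment) else "solid"


def format_number(value: float, unit: str, segment: Segment) -> str:
    """Mark any number derived from a low-confidence segment as Estimated."""
    text = f"{value:.0f} {unit}"
    if is_low_confidence(segment):
        text += " (Estimated)"
    return text
```

Keeping the flag on the segment, not on each number, means every figure derived from that region inherits the same label.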

Stage 02 · Identification

Match each region to a medically-referenced food.

Once a region is segmented, identification maps it to a specific entry in a curated food database — not a generic label, but a specific, traceable food record with a glycemic index value, a macro profile per 100 grams, and a citation to a primary source. The database currently contains approximately 12,000 entries drawn from USDA SR-Legacy, the ADA Exchange List, the Indian Council of Medical Research nutrient atlas, and the NIN Hyderabad food composition tables.

The distinction between generic and specific identification matters more than it might seem. The model does not return "rice" — it returns one of white rice, brown rice, parboiled rice, basmati, jasmine, arborio, or any of nineteen other rice preparations with meaningfully different glycemic indices. White basmati, for instance, has a GI of approximately 58. Jasmine rice runs closer to 109. A system that collapses both into "rice" and applies a midpoint GI is not making a reasonable approximation — it is making a clinically meaningful error, especially for the patient who has been told by their endocrinologist to avoid high-GI grains.

Western-only food databases fail silently for global users. A database built from American and European meal data has no entry for idli, dhokla, poha, jalebi, bibimbap, gado-gado, or feijoada. When such a database encounters one of these preparations, it either returns nothing or returns the closest Western approximation — a failure that is invisible to the user and potentially significant in glycemic terms. CalEye's database includes a multilingual food set spanning Indian subcontinent cuisines, East Asian preparations, Mediterranean dishes, and Latin American staples. Coverage is not complete — no 12,000-entry database can be — but the failure mode for out-of-set foods is explicit rather than silent.

When the model cannot match a region with sufficient confidence, the UI displays a "Closest match" disclosure: the identified food, its confidence percentage, and a prompt to correct or confirm. This is preferable to the silent guess — which is how every lookup-based competitor currently handles the same situation.

Stage 03 · Portion

The hard problem: how much.

If segmentation is the hardest geometric problem in the pipeline and identification is the hardest knowledge problem, portion estimation is the hardest measurement problem. A two-dimensional photograph contains no depth information. The model is being asked to reason about a three-dimensional volume of food — its height above the plate, the density of its packing, the way it spreads versus mounds — from a flat projection. This is the stage that introduces the most variance in our output numbers, and we think it is important to say so directly.

The approach relies on scale cues embedded in the image. Every photograph of a meal contains objects whose real-world dimensions are known with reasonable confidence: the diameter of a standard dinner plate, the length of a fork, the height of a beverage glass, the dimensions of a standard takeaway container. The model has been trained to identify these objects and use them as metric anchors. When a plate is identified in the frame, its diameter — typically between 24 and 30 centimetres for a dinner plate — becomes the reference unit against which food dimensions are estimated.

Plate-edge detection is therefore not incidental to the portion stage — it is the foundational measurement step. The model identifies the plate boundary, estimates its diameter from the image perspective, and uses that diameter to scale all food dimensions detected within the frame. From food dimensions, it estimates volume using preparation-type density priors: the density of cooked white rice, for instance, is approximately 0.75 g/cm³ in a loosely-packed serving. These priors are calibrated from physical measurements taken during the training data collection phase.
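
A rough sketch of the geometry, under simplifying assumptions that are ours and not the article's: the photo is taken from directly above, the food is treated as a slab with a single mean height, and the plate is assumed to be 27 cm across. The function and parameter names are hypothetical.

```python
def pixels_per_cm(plate_diameter_px: float, plate_diameter_cm: float = 27.0) -> float:
    """Image scale derived from the plate, which acts as the metric anchor."""
    return plate_diameter_px / plate_diameter_cm


def estimate_grams(
    food_area_px: float,
    mean_height_cm: float,
    plate_diameter_px: float,
    density_g_per_cm3: float,
    plate_diameter_cm: float = 27.0,  # assumed dinner-plate diameter
) -> float:
    """Area in pixels -> area in cm^2 -> volume in cm^3 -> weight in grams."""
    scale = pixels_per_cm(plate_diameter_px, plate_diameter_cm)
    area_cm2 = food_area_px / scale**2
    volume_cm3 = area_cm2 * mean_height_cm
    return volume_cm3 * density_g_per_cm3


# A rice region of 128,000 px^2 on a plate 1,080 px across, assumed 2.5 cm
# high, with the 0.75 g/cm^3 density prior quoted above.
rice_grams = estimate_grams(128_000, 2.5, 1_080, 0.75)
```

With these inputs the scale is 40 px/cm, the region covers 80 cm², and the estimate comes to 150 g. The height term is the weakest link, which is consistent with portion being the highest-variance stage.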

The output is grams with a confidence interval — not a point estimate. The UI displays the central estimate, and tapping any macro number surfaces the full interval. A meal that resolves cleanly — a single portion of chicken breast beside a scoop of rice on a round white plate, photographed from above in good light — will show an interval of roughly ±8%. A mixed restaurant plate in low light will show ±20% or wider, and the interface labels it accordingly.
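
To make the interval concrete, here is a small sketch that turns a central estimate and a relative error into bounds. It assumes a symmetric interval, which the article does not confirm; the function name is hypothetical.

```python
from typing import Tuple


def portion_interval(grams: float, relative_error: float) -> Tuple[float, float]:
    """Return (low, high) bounds for a central estimate.

    relative_error is a fraction, for example 0.08 for a +/-8% interval.
    """
    if grams < 0 or relative_error < 0:
        raise ValueError("grams and relative_error must be non-negative")
    return grams * (1 - relative_error), grams * (1 + relative_error)


clean_low, clean_high = portion_interval(150.0, 0.08)  # clean single-dish plate
mixed_low, mixed_high = portion_interval(150.0, 0.20)  # mixed plate in low light
```

For a 150 g central estimate, the clean-plate interval is roughly 138 g to 162 g, and the low-light interval is roughly 120 g to 180 g.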

One design decision that generated internal debate: when a stacked or layered food is present (a sandwich, a burger, a multi-layer casserole), the model tends to over-estimate slightly. We have deliberately left this bias uncorrected for our diabetic user base. For a person managing post-prandial blood glucose, an over-estimate of glycemic load that triggers a small conservative adjustment is less harmful than an under-estimate that lets a spike go unmanaged. The bias is disclosed in our accuracy documentation, and it is corrected for users who have indicated a weight-management rather than glycemic-management context.

When no reliable scale cue is visible in the frame — close-up shots of bowls with no surrounding context — the model prompts the user to retake with more of the table in frame. We do not produce a portion estimate without a credible scale anchor.

Stage 04 · Macros

Carbs, fat, protein — per 100 g.

Given the portion in grams from stage 3 and the food identity from stage 2, computing macros is in principle a straightforward multiplication. The per-100 g macro profile of the identified food — drawn from the USDA SR-Legacy or the equivalent regional source — is scaled by the estimated portion weight. The result is carbohydrates, fat, and protein in grams for the actual portion on the plate.
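
The scaling step can be sketched as follows. The type names are hypothetical, and the per-100 g figures for cooked white rice are rounded illustrative values, not quoted from a specific SR-Legacy entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Macros:
    carbs_g: float
    fat_g: float
    protein_g: float


def scale_to_portion(per_100g: Macros, portion_g: float) -> Macros:
    """Scale a per-100 g macro profile to the estimated portion weight."""
    factor = portion_g / 100.0
    return Macros(
        carbs_g=per_100g.carbs_g * factor,
        fat_g=per_100g.fat_g * factor,
        protein_g=per_100g.protein_g * factor,
    )


# Illustrative per-100 g profile for cooked white rice (rounded, assumed).
rice_per_100g = Macros(carbs_g=28.0, fat_g=0.3, protein_g=2.7)
rice_portion = scale_to_portion(rice_per_100g, portion_g=150.0)
```

A 150 g portion with this profile resolves to about 42 g of carbohydrate, 0.45 g of fat, and 4.05 g of protein.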

The reason we anchor on per-100 g values rather than per-serving values is worth explaining, because it is a design choice that runs counter to how nutrition labelling works in most jurisdictions. Serving sizes on packaged food can differ by a factor of four to eight across product categories: a "serving" of breakfast cereal is 30 g; a "serving" of pasta may be listed as 85 g dry or 190 g cooked. Neither figure is the portion that landed in your bowl. Per-100 g values are a stable, food-intrinsic measure that does not change with the serving size fiction printed on a label.

Calories are derived, not separately predicted. The 4-4-9 rule (4 kcal per gram of carbohydrate, 4 kcal per gram of protein, 9 kcal per gram of fat) is applied to the resolved macro figures to produce the calorie total. These are the general Atwater factors. USDA's SR-Legacy energy values rest on the same Atwater system, although many entries use food-specific factors, so a 4-4-9 total can differ slightly from the published kcal figure. Running a separate calorie prediction model would introduce a second source of variance without improving accuracy.
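
As a worked example of the 4-4-9 rule, the sketch below derives calories from resolved macro figures. The function name is ours, and the input values are illustrative.

```python
# General Atwater factors, in kcal per gram.
KCAL_PER_G_CARBS = 4.0
KCAL_PER_G_PROTEIN = 4.0
KCAL_PER_G_FAT = 9.0


def calories_449(carbs_g: float, fat_g: float, protein_g: float) -> float:
    """Derive kcal from macros using the 4-4-9 rule."""
    return (
        KCAL_PER_G_CARBS * carbs_g
        + KCAL_PER_G_PROTEIN * protein_g
        + KCAL_PER_G_FAT * fat_g
    )


# Illustrative portion: 42 g carbohydrate, 0.45 g fat, 4.05 g protein.
portion_kcal = calories_449(carbs_g=42.0, fat_g=0.45, protein_g=4.05)
```

Here the total is 168 + 16.2 + 4.05 = 188.25 kcal. Because calories are a pure function of the macros, any error in the calorie figure traces back to the macro and portion stages.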

One quality check runs at this stage that can trigger a pipeline branch: if the protein-to-fat ratio for the identified food is implausible given the visual evidence — a portion the model has identified as chicken breast but where the protein-to-fat ratio resolves outside the range consistent with any preparation of chicken — the system initiates a partial re-segmentation pass focused on that region. This catches misidentification errors that the identification stage's confidence score alone might not surface.
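
One way such a plausibility check could look is sketched below. The range table, its bounds, and the function names are hypothetical; the article does not publish the real ranges or say how the check is wired into the pipeline.

```python
import math
from typing import Dict, Tuple

# Hypothetical plausible protein-to-fat ranges per identified food.
PLAUSIBLE_PROTEIN_TO_FAT: Dict[str, Tuple[float, float]] = {
    "chicken breast": (2.0, 50.0),
}


def protein_to_fat_ratio(protein_g: float, fat_g: float) -> float:
    """Ratio of protein to fat; infinite when no fat is present."""
    return math.inf if fat_g == 0 else protein_g / fat_g


def needs_resegmentation(food: str, protein_g: float, fat_g: float) -> bool:
    """True when the ratio falls outside the plausible range for the food."""
    bounds = PLAUSIBLE_PROTEIN_TO_FAT.get(food)
    if bounds is None:
        return False  # no prior for this food, so nothing to check against
    low, high = bounds
    ratio = protein_to_fat_ratio(protein_g, fat_g)
    return not (low <= ratio <= high)
```

In this sketch a food with no known range passes by default, so the check is only as good as the coverage of the range table.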

Stage 05 · GL

Cross-reference to the glycemic-index table.

Glycemic load is computed from the carbohydrate figure produced in stage 4 and the glycemic index value associated with the identified food. The formula is: GL = (carbohydrates in grams × GI) / 100. This is the standard calculation used in the clinical literature, building on the glycemic index work of Jenkins et al. and subsequently validated in meta-analyses published in the American Journal of Clinical Nutrition. For a plain-language explanation of why glycemic load is more useful than glycemic index alone, see Glycemic load vs glycemic index — the one that matters.
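
The formula translates directly into code. In this small sketch the function name is ours, the GI of 58 is the approximate basmati figure quoted earlier, and the 42 g of carbohydrate is an illustrative portion.

```python
def glycemic_load(carbs_g: float, gi: float) -> float:
    """GL = (carbohydrates in grams x GI) / 100."""
    if carbs_g < 0 or gi < 0:
        raise ValueError("carbs_g and gi must be non-negative")
    return carbs_g * gi / 100.0


# 42 g of carbohydrate from white basmati rice at a GI of about 58.
basmati_gl = glycemic_load(carbs_g=42.0, gi=58.0)
```

The result is about 24.4. The same 42 g at a GI of 109 would give about 45.8, which is why collapsing rice varieties into one label matters.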

GI values are drawn from the University of Sydney's GI Database — the most comprehensive published source of peer-reviewed GI values. Where multiple GI values exist for a single food preparation (which is common — GI varies with cooking method, ripeness, and even cooling time for certain starches), the model applies a preparation-method prior: if the segmentation and identification stages have resolved the preparation type with sufficient confidence, the GI appropriate to that preparation is used. If not, a conservative upper-bound value is applied, consistent with the diabetic-user bias described in the portion stage notes.
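
The selection rule can be sketched as a lookup with a conservative fallback. The confidence threshold, the function name, and the GI values in the table are placeholders for illustration, not values from the Sydney database.

```python
from typing import Mapping, Optional


def select_gi(
    gi_by_preparation: Mapping[str, float],
    preparation: Optional[str],
    confidence: float,
    min_confidence: float = 0.8,  # hypothetical threshold
) -> float:
    """Use the preparation-specific GI when the preparation is resolved with
    enough confidence; otherwise fall back to the highest listed GI."""
    if not gi_by_preparation:
        raise ValueError("no GI values available for this food")
    if preparation in gi_by_preparation and confidence >= min_confidence:
        return gi_by_preparation[preparation]
    return max(gi_by_preparation.values())


# Placeholder GI values for one food across preparation methods.
example_gi = {"boiled": 60.0, "boiled_then_cooled": 50.0, "baked": 80.0}
```

With this table, a confidently identified boiled preparation gets 60, while an unresolved or low-confidence preparation gets the upper bound of 80.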

The UI surfaces GL rather than GI because GL is the clinically actionable number. GI measures the speed at which a food raises blood sugar in a fasting state, referenced to a fixed 50 g carbohydrate portion. GL measures the actual blood-sugar impact of the food as consumed, at the portion size actually eaten. A watermelon has a high GI (72) but a low GL per standard serving (approximately 5) because a standard serving contains very few carbohydrates. For mixed meals — which is most meals — the per-dish GL values are summed to produce a meal-level GL figure.
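
Summing per-dish values into a meal-level figure might look like the sketch below. The `Dish` type is hypothetical and the second dish's numbers are illustrative.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Dish:
    name: str
    carbs_g: float  # carbohydrate for the portion as eaten
    gi: float       # glycemic index used for this dish

    @property
    def glycemic_load(self) -> float:
        return self.carbs_g * self.gi / 100.0


def meal_glycemic_load(dishes: Iterable[Dish]) -> float:
    """Meal-level GL is the sum of the per-dish GL values."""
    return sum(d.glycemic_load for d in dishes)


meal = [
    Dish("white basmati rice", carbs_g=42.0, gi=58.0),
    Dish("lentil dal", carbs_g=20.0, gi=30.0),  # illustrative values
]
meal_gl = meal_glycemic_load(meal)
```

For this plate the rice contributes about 24.4 and the dal 6.0, for a meal-level GL of about 30.4.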

Stage 06 · Citation

Every number, traceable.

The final stage is not a computation — it is an accountability step. Every number CalEye returns is attached to the exact database entry that produced it: the USDA SR-Legacy food code, the ADA Exchange List category, the Sydney University GI record. Tapping any number in the CalEye UI surfaces a modal with the food name as it appears in the source database, the per-100 g macro profile, the GI value and its source citation, and the confidence score for each stage of the pipeline.
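
A minimal sketch of a number that carries its own provenance is shown below. The types, field names, and the `EXAMPLE-0001` identifier are placeholders; the article does not describe CalEye's actual record format.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Citation:
    source: str    # for example "USDA SR-Legacy"
    entry_id: str  # identifier within that source (placeholder here)


@dataclass(frozen=True)
class CitedValue:
    label: str
    value: float
    unit: str
    citation: Citation
    stage_confidence: Dict[str, float] = field(default_factory=dict)


def detail_panel(cv: CitedValue) -> str:
    """Render the text a tap on the number might surface."""
    lines = [
        f"{cv.label}: {cv.value:g} {cv.unit}",
        f"Source: {cv.citation.source} ({cv.citation.entry_id})",
    ]
    for stage, conf in cv.stage_confidence.items():
        lines.append(f"{stage} confidence: {conf:.0%}")
    return "\n".join(lines)


carbs = CitedValue(
    "Carbohydrates", 42.0, "g",
    Citation("USDA SR-Legacy", "EXAMPLE-0001"),
    {"Identification": 0.93, "Portion": 0.88},
)
```

Because the citation travels with the value, a clinician can trace any displayed figure to its source entry without rerunning the pipeline.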

Other AI nutrition applications return numbers without provenance. We believe this is the disqualifying failure mode for diabetic use. A person managing blood glucose cannot act on a carbohydrate figure whose lineage they cannot inspect. When a number from a black-box model disagrees with a clinician's expectation, there is no path to resolution — no way to know whether the error is in the AI's food identification, its portion estimation, its macro lookup, or its GI table. Citation is the mechanism that makes any disagreement resolvable, and makes the system auditable by the clinician, not just the patient.

In practice

What we get right. And what we don't.

The honest account of CalEye's accuracy is not a single headline figure — it is a distribution across meal types, lighting conditions, and photo angles. Here is what the internal test data shows, with the methodology visible so clinicians can evaluate it.

For single-dish photographs taken in reasonable ambient light with a visible plate edge — the conditions where all six pipeline stages can operate close to their design parameters — our internal test set of 1,200 plates shows a median error of ±8% on total carbohydrates. This covers a wide range of food types including Western, South Asian, and East Asian preparations. The test set was not drawn from the training data and was evaluated against manually weighed and lab-measured reference values.
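
For readers who want to reproduce this kind of figure on their own data, one plausible reading of "median error" is the median absolute percentage error against the weighed reference. That interpretation is an assumption, and the function name and sample numbers below are ours.

```python
from statistics import median
from typing import Sequence


def median_abs_pct_error(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Median of |predicted - reference| / reference, as a percentage.

    Reference values must be non-zero.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(p - r) / r * 100.0 for p, r in zip(predicted, reference)]
    return median(errors)


# Three plates: predicted versus lab-measured carbohydrate, in grams.
sample_error = median_abs_pct_error([46.0, 30.0, 54.0], [50.0, 30.0, 50.0])
```

The three plates have errors of 8%, 0%, and 8%, so the median is 8%. A median hides the tail, which is why the mixed-plate figure is reported separately.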

For mixed restaurant plates — multiple dishes, sauces overlapping, low or mixed light, no clear plate edge, portions plated decoratively rather than practically — error climbs to ±15–20%. We publish this figure because we think under-reporting it would be the category of dishonesty that makes AI medical tools dangerous.

Where we lose accuracy most consistently: deep-fried foods, where oil absorbed during frying is invisible to the camera and can represent 20–40% of the caloric content of the item; stews and braises, where hidden ingredients (ghee, coconut milk, sugar added during cooking) are undetectable from visual inspection; servings above approximately 500 g, where the portion model's scale estimation becomes less reliable; and composite dishes like biryani, paella, or lasagne, where multiple preparation steps and hidden fats have been physically integrated into a dish that reads as a single region.

Where we are confident: simple plates with clear ingredient boundaries, single-ingredient preparations (a grilled fish fillet, a bowl of oats, a piece of fruit), and packaged snacks where the barcode can be used as a cross-check against the visual estimate. For these categories, the ±8% figure holds across lighting conditions, and in our clinical partner evaluation, no endocrinologist reviewing the output flagged a number as clinically unacceptable for decision-support use.

A final and important caveat: CalEye is not a medical device and is not regulated as one. The numbers it returns are decision-support inputs, in the same category as a food diary or a nutrition label, and are not a replacement for the clinician-prescribed insulin protocol, the ADA Exchange List meal plan your dietitian has designed for you, or the regular A1C monitoring your endocrinologist has recommended. Our aim is to make the inputs to those clinical decisions more accurate than the mental estimates most people currently rely on. That is a meaningful improvement, and it is a more honest claim than those made by apps that position AI nutrition as a clinical replacement.

FAQ

Common questions.

How is this different from MyFitnessPal's photo feature?
MyFitnessPal's photo-to-food feature uses the photograph as a search shortcut: the image is classified into a category, and the result is a lookup against MFP's crowdsourced food database. The photograph is the interface, not the measurement. CalEye uses the photograph as the primary nutritional input: portion size is derived from the visual geometry of the image itself, not from a default serving size associated with a database entry. The practical difference is largest for home-cooked and restaurant meals, where no database entry reflects what was actually served. MFP's approach works well for packaged foods with reliable label data. Ours works for the other 70% of meals.
Does it work offline?
Yes. The core models — segmentation, identification, portion, and macro computation — run on-device. No network connection is required to photograph a meal and receive carbohydrate, calorie, and glycemic load figures. Citation links, which connect a number to its source entry in USDA SR-Legacy or the Sydney GI Database, require a network connection for the first view; once retrieved, the source record is cached locally and available offline for subsequent views. The on-device model is updated via a background refresh when a connection is available, consistent with the update schedule published in the app's settings screen.
Can I trust the carb count enough to dose insulin?
Not as a replacement for your clinician's insulin protocol — but yes as a more accurate input than the mental estimate most people currently use for dose calculation. The ADA's position is that carb counting improves glycemic control when the carb estimates are reasonably accurate. CalEye's ±8% accuracy on simple plates is substantially better than the estimated ±30–50% typical of unassisted mental estimation. For mixed restaurant plates, the ±15–20% figure is closer to the unassisted baseline, and we label those results accordingly. Always cross-check with your endocrinologist's formula and the correction factors in your personalised protocol. We are a decision-support tool, not a dosing calculator.
Why do you show a single calorie number? Where is the confidence interval?
We do give a confidence interval — it is one tap away. The primary UI surface shows the point estimate because most users want a single actionable number at a glance, and adding the interval to every display element would make the interface harder to read at the moment of use. Tapping any calorie, carbohydrate, or GL number surfaces a detail panel showing the full confidence interval, the contributing stage confidence scores, and the source citation. Marketing copy for CalEye shows the point estimate in headlines; the interval is always present in the product and is the number we recommend for clinical use. If you are using CalEye figures in a clinical context, always use the interval, not the headline.

Begin

See the method.
In your camera.