USDA SR-Legacy — What's in the Database Your App Uses
USDA Standard Reference Legacy (SR-Legacy, release 28) is the foundational nutrition database behind most calorie-counting apps, food label verification systems, and clinical nutrition calculators in use today. It is a collection of 8,789 food items with up to 150 nutrient values each, accumulated over six decades of laboratory measurement and published in its final form in 2018 before being superseded by FoodData Central. Understanding what SR-Legacy contains, how its values were generated, and where its coverage is weakest is essential for anyone who wants to interpret the numbers their tracking app returns. A single value in the database is not a single measurement but a statistical summary of multiple laboratory analyses, and the variance behind that summary can easily span ±15% for macronutrients in whole foods.
How USDA SR-Legacy Values Are Generated
Each entry in SR-Legacy represents a nationally representative composite sample: multiple lots of a food are purchased from grocery stores across different US regions, homogenized into a single analytical sample, and then analyzed by contract laboratories using standardized methods published by AOAC International (formerly the Association of Official Analytical Chemists). The goal is a nutrient profile that represents what an American consumer is likely to encounter when purchasing that food, not what a specific lot or batch contains.
The analytical process works in layers. Proximate composition — protein, fat, total carbohydrate, moisture, and ash — is determined first and forms the basis for energy calculation. Mineral content (calcium, iron, zinc, phosphorus, potassium, sodium, magnesium, and others) comes from separate atomic absorption or inductively coupled plasma analyses. Vitamin content requires yet another set of analytical runs, as different vitamins require different extraction and detection methods. Fatty acid profiles, amino acid profiles, and phytochemical data (where available) add further analytical complexity.
The number of samples analyzed per food item varies considerably. Staple commodities such as raw chicken breast, whole milk, and white rice may be represented by 80–100 samples collected over multiple years, making the database value a statistically stable estimate. Less common items may be based on 4–6 samples, and the statistical uncertainty of a 6-sample mean is substantially higher. SR-Legacy reports the number of samples (tagged as “n”) for each nutrient-food combination, allowing technically literate users to assess confidence in a specific value, but most apps do not surface this metadata.[1]
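As a rough illustration of why the sample count matters, the sketch below compares the standard error of a mean built from 90 samples with one built from 6. The standard deviation used here is a hypothetical figure chosen for illustration, not a value taken from SR-Legacy, and the function name is likewise an assumption.

```python
import math

def standard_error(sample_sd: float, n: int) -> float:
    """Standard error of the mean for n independent samples."""
    if n < 2:
        raise ValueError("need at least two samples")
    return sample_sd / math.sqrt(n)

# Hypothetical sample standard deviation (kcal per 100 g); not an SR-Legacy value.
ASSUMED_SD = 12.0

se_staple = standard_error(ASSUMED_SD, 90)  # well-sampled staple commodity
se_rare = standard_error(ASSUMED_SD, 6)     # sparsely sampled item

print(f"n=90: SE = {se_staple:.2f} kcal; n=6: SE = {se_rare:.2f} kcal")
```

With the same underlying spread, the 6-sample mean is a little under four times less certain than the 90-sample mean, which is the kind of difference the “n” field would let an app expose.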
A critical point for users: SR-Legacy nutrient values are means from the sample distribution, not ranges. When your app reports “230 kcal per 100 g of cooked chicken breast,” it is reporting the mean of a distribution that has real variance — the true caloric content of a specific piece of chicken you are eating could be 195–265 kcal per 100 g depending on the bird, the cut, the cooking method, and moisture loss. The single number in the database is not wrong; it is an accurate summary of the central tendency that conceals the distribution’s width.
Nutrient Derivation vs Direct Measurement
Not every nutrient value in SR-Legacy is generated by direct laboratory analysis. Significant portions of the database rely on derived or calculated values, and understanding this distinction matters for interpreting the accuracy of specific nutrient figures.
Carbohydrate by difference is the most consequential derivation. For most SR-Legacy entries, total carbohydrate is not directly measured. Instead, it is computed as:
Carbohydrate = 100 g − moisture − protein − total fat − ash
This means that carbohydrate absorbs the cumulative measurement error from all four of the directly measured proximate fractions. If moisture and fat are each overestimated by 1 g, carbohydrate is underestimated by 2 g, even though no carbohydrate measurement was ever made. Errors in opposite directions partly cancel, but nothing guarantees that they will. For foods with highly variable moisture content (fresh produce, cooked grains, meat), this propagated error can be substantial.[1]
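The by-difference formula and its error behaviour can be sketched in a few lines. The proximate values below are hypothetical round numbers, and the function name is an assumption made for this example.

```python
def carbohydrate_by_difference(moisture: float, protein: float,
                               fat: float, ash: float) -> float:
    """Total carbohydrate in g per 100 g, computed as the remainder."""
    return 100.0 - moisture - protein - fat - ash

# Hypothetical proximate values (g per 100 g), chosen only for illustration.
measured = {"moisture": 70.0, "protein": 3.0, "fat": 1.0, "ash": 1.0}
baseline = carbohydrate_by_difference(**measured)

# Same-direction errors add: +1 g moisture and +1 g fat remove 2 g of carbohydrate.
both_high = carbohydrate_by_difference(**{**measured, "moisture": 71.0, "fat": 2.0})

# Opposite-direction errors cancel: +1 g moisture and -1 g fat leave it unchanged.
offsetting = carbohydrate_by_difference(**{**measured, "moisture": 71.0, "fat": 0.0})

print(baseline, both_high, offsetting)
```

The carbohydrate figure moves by exactly the net error in the other four fractions, so same-sign errors accumulate while opposite-sign errors offset each other.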
Energy calculation from Atwater factors is another derivation. SR-Legacy calories are not measured by bomb calorimetry (direct heat measurement). Instead, they are calculated using Atwater conversion factors, which in their general form are 4 kcal per gram for protein, 4 kcal per gram for carbohydrate, and 9 kcal per gram for fat; for many foods the database applies food-specific variants of these values. The factors are averages derived from bomb calorimetry studies conducted in the late 19th and early 20th centuries on representative Western diets. They are reasonably accurate for mixed diets of processed and animal foods, but they systematically overestimate the metabolizable energy of high-fiber plant foods, where a proportion of the carbohydrate is fermented by gut bacteria rather than directly absorbed.[1]
Modern research on modified Atwater factors (developed by Livesey and others) accounts for fiber fermentation and produces lower calorie estimates for legumes and whole grains than SR-Legacy’s Atwater-based calculations. For a 150 g serving of cooked lentils, the difference between Atwater-based and modified Atwater-based energy estimates is approximately 15–20 kcal — small in isolation but meaningful if lentils are a daily dietary staple.
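A minimal sketch of the two calculations follows. The general 4/4/9 factors come from the text above; the 2 kcal per gram figure for fiber is an assumed illustrative value rather than one quoted in this article, and the nutrient profile is a hypothetical legume-like food, not an actual database entry.

```python
# General Atwater factors in kcal per gram, as quoted in the text.
GENERAL_FACTORS = {"protein": 4.0, "carbohydrate": 4.0, "fat": 9.0}

# Assumed energy value for fermentable fiber (kcal per gram); illustrative only.
ASSUMED_FIBER_FACTOR = 2.0

def atwater_kcal(protein_g: float, carbohydrate_g: float, fat_g: float) -> float:
    """Energy from the general factors, with fiber counted as ordinary carbohydrate."""
    return (protein_g * GENERAL_FACTORS["protein"]
            + carbohydrate_g * GENERAL_FACTORS["carbohydrate"]
            + fat_g * GENERAL_FACTORS["fat"])

def fiber_adjusted_kcal(protein_g: float, carbohydrate_g: float,
                        fat_g: float, fiber_g: float) -> float:
    """Same calculation, but the fiber share of carbohydrate uses the lower factor."""
    non_fiber_g = carbohydrate_g - fiber_g
    return (protein_g * GENERAL_FACTORS["protein"]
            + non_fiber_g * GENERAL_FACTORS["carbohydrate"]
            + fiber_g * ASSUMED_FIBER_FACTOR
            + fat_g * GENERAL_FACTORS["fat"])

# Hypothetical profile per 100 g: 9 g protein, 20 g total carbohydrate
# (of which 8 g fiber), 0.4 g fat.
general = atwater_kcal(9.0, 20.0, 0.4)
adjusted = fiber_adjusted_kcal(9.0, 20.0, 0.4, 8.0)
print(f"general: {general:.1f} kcal, fiber-adjusted: {adjusted:.1f} kcal")
```

The gap between the two results scales directly with fiber content, which is why the discrepancy matters mostly for legumes and whole grains.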
Imputed values for mixed dishes represent a third category of derivation. When SR-Legacy includes a composite food (a recipe rather than a single ingredient), the nutrient values are typically calculated from the ingredient profile rather than analytically measured. “Beef stew” in the database is not a laboratory measurement of a specific beef stew — it is the sum of nutritional contributions from the ingredient components, with adjustments for yield and cooking losses. The accuracy of this approach depends on how well the recipe in the database reflects the actual composition of the food you are eating.
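The recipe-calculation idea can be sketched as follows, assuming a simplified model in which ingredient energy is summed and then divided by the cooked weight of the dish. All ingredient weights, energy densities, and the cooked weight are hypothetical, and the class and function names are invented for this example; the actual USDA procedure also applies nutrient retention adjustments that are omitted here.

```python
from dataclasses import dataclass

@dataclass
class Ingredient:
    name: str
    grams: float          # weight added to the pot
    kcal_per_100g: float  # energy density of the ingredient as added

def recipe_kcal_per_100g(ingredients: list[Ingredient], cooked_weight_g: float) -> float:
    """Energy per 100 g of finished dish: summed ingredient energy over cooked weight."""
    total_kcal = sum(i.grams * i.kcal_per_100g / 100 for i in ingredients)
    return total_kcal / cooked_weight_g * 100

# Hypothetical stew; none of these figures are database values.
stew = [
    Ingredient("beef", 500, 200),
    Ingredient("potato", 300, 80),
    Ingredient("carrot", 200, 40),
    Ingredient("water", 500, 0),
]

# 1,500 g goes into the pot; assume 1,200 g remains after evaporation.
density = recipe_kcal_per_100g(stew, cooked_weight_g=1200)
print(f"{density:.1f} kcal per 100 g")
```

Changing the assumed cooked weight or the ingredient proportions shifts the result directly, which is why a database recipe that differs from the dish on your plate produces a different number.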
SR-Legacy vs FoodData Central: What Changed
FoodData Central (FDC), launched in April 2019, replaced SR-Legacy as USDA’s primary nutritional data resource. SR-Legacy is preserved within FDC for backward compatibility but receives no new entries — it is a historical dataset frozen at 2018 coverage.
FDC organizes nutritional data into four interlinked datasets with distinct data quality characteristics:[2]
Foundation Foods (~1,100 items as of 2024) are the FDC’s analytically rigorous backbone. Each Foundation Food entry includes not only mean nutrient values but also standard deviations — giving apps the information they need to represent nutrient uncertainty honestly rather than as false-precision point estimates. Foundation Foods are the most scientifically sound entries in any USDA dataset.
SR-Legacy (8,789 items) is preserved for historical continuity. By item count it has the broadest coverage of generic, unbranded foods among these datasets, which is why most nutrition apps continue to query it despite its limitations.
Branded Foods (~1 million+ items) contains manufacturer-submitted label data for packaged products. Coverage is broad but data quality is only as good as the label, which has a ±20% legal tolerance in the US and EU. Brand-specific entries are more accurate than generic database entries for specific packaged products but require regular updates as formulations change.
Survey Foods (FNDDS) supports NHANES dietary recall research and covers recipe-based food composites used in national dietary surveys. It is primarily of interest for epidemiological research rather than individual tracking applications.
Most nutrition apps use SR-Legacy as their primary database for unbranded foods because its 8,789-item coverage exceeds Foundation Foods’ current 1,100-item scope. The implication is that you are usually getting means without standard deviations — SR-Legacy values without the uncertainty data that FDC’s Foundation Foods now includes.
Global Cuisine Coverage Gaps
SR-Legacy reflects the food environment of the United States in the late 20th century. Its coverage of globally consumed staple foods is uneven, and the gaps are substantial for cuisines from South Asia, East Asia, West Africa, the Middle East, and Latin America.
A 2019 audit by Dunford et al. (Public Health Nutrition) examined nutritional database coverage for foods commonly consumed in India, Nigeria, and Mexico. Approximately 34% of commonly consumed foods in these populations had no direct SR-Legacy equivalent, forcing applications to substitute the nearest available analogue and sometimes introducing macronutrient estimation errors of 20–40%.[3] This gap is especially pronounced for dishes like those covered in our South Asian carb counting guide.
Examples of specific gaps:
- South Asian staples: Idli, dosa, upma, pongal, and regional rice varieties (Sona Masuri, Ponni, red rice) are either absent from SR-Legacy or represented by generic entries that do not capture preparation-method variation. A dosa made from a fermented rice-and-lentil batter has a nutritional profile and glycaemic index substantially different from a generic “rice pancake” entry.
- Dal varieties: Multiple lentil and legume preparations — chana dal, urad dal, moong dal, toor dal — are either absent or collapsed into a single “lentils, cooked” entry. The macronutrient differences between whole urad and split moong are not trivial.
- Regional African staples: Ugali, jollof rice, egusi soup, and most traditional West African dishes are absent. Apps serving African users either use rough analogues or rely on user-submitted entries without quality verification.
Regional databases partially fill these gaps: the Indian Food Composition Tables 2017 (IFCT 2017) published by the National Institute of Nutrition covers approximately 528 Indian foods with direct analytical measurement. The Sydney GI Database uses a comparable methodology for glycaemic index data. The PHE Nutrient Databank covers UK-specific products. The LanguaL food description thesaurus attempts cross-database harmonization. But these databases use different analytical protocols, nutrient definitions (available carbohydrate vs total carbohydrate, for example), and food description conventions — making direct cross-database comparison difficult without translation layers.
NDB Numbers, Refuse Percentages, and Yield Factors
Three technical elements of SR-Legacy structure the relationship between a raw as-purchased food item and its cooked, eaten portion. Apps that handle these correctly produce more accurate calorie estimates; apps that handle them incorrectly introduce systematic errors.
NDB numbers are the legacy identification keys assigned to each SR-Legacy food item (e.g., NDB 01009 for “Cheese, cheddar”). These keys allow apps to link to specific database entries and are the lookup identifiers that underpin barcode-to-nutrient mapping. When you scan a food barcode and the app returns nutritional data, it is typically resolving the barcode to an NDB number (or a branded food equivalent) and returning the associated values.
Refuse percentage represents the inedible fraction of a food as purchased — bone, shell, peel, seed — expressed as a percentage of the as-purchased weight. Chicken thigh, bone-in, has a refuse percentage of approximately 27% (the bone). If an app looks up “chicken thigh” and you weigh 200 g of a bone-in thigh, the edible portion is approximately 146 g, and the nutritional values should be applied to 146 g, not 200 g. Apps that do not adjust for refuse percentage overestimate calorie and protein content for bone-in meats, shell-on seafood, and whole fruit with inedible skins.
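The refuse adjustment amounts to one multiplication. The sketch below uses the 27% refuse figure from the paragraph above; the energy density is a hypothetical placeholder, and the function name is an assumption.

```python
def edible_weight(as_purchased_g: float, refuse_pct: float) -> float:
    """Edible portion in grams after removing the inedible fraction."""
    if not 0 <= refuse_pct < 100:
        raise ValueError("refuse percentage must be in [0, 100)")
    return as_purchased_g * (1 - refuse_pct / 100)

# Hypothetical energy density of the edible portion (kcal per 100 g).
ASSUMED_KCAL_PER_100G = 180.0

weighed_g = 200.0                          # bone-in thigh on the scale
edible_g = edible_weight(weighed_g, 27.0)  # about 146 g of meat

adjusted_kcal = edible_g * ASSUMED_KCAL_PER_100G / 100
unadjusted_kcal = weighed_g * ASSUMED_KCAL_PER_100G / 100
print(f"edible: {edible_g:.0f} g, adjusted: {adjusted_kcal:.0f} kcal, "
      f"unadjusted: {unadjusted_kcal:.0f} kcal")
```

Skipping the adjustment inflates every nutrient by the same proportion, since the bone weight is treated as if it were meat.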
Yield factors convert nutrient values from raw to cooked weight, accounting for moisture loss and fat gain or loss during cooking. “Chicken breast, raw” in SR-Legacy has different values per 100 g than “chicken breast, cooked, roasted”, not only because roasting changes the nutritional profile slightly but because roasting removes moisture, concentrating the remaining nutrients. A 150 g raw chicken breast yields approximately 110 g cooked chicken breast (a yield factor of ~0.73). If you weigh your chicken after cooking and apply raw-weight nutritional values, you underestimate protein by approximately 27%, because the 110 g cooked portion still carries the protein of the full 150 g raw breast. The reverse mistake, weighing raw and applying cooked values, overestimates instead. This is one of the most common systematic errors in protein tracking for people who cook at home.
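The raw-versus-cooked mismatch can be worked through numerically. This sketch uses the 150 g raw to 110 g cooked example from the text; the raw protein density is a hypothetical figure, and the function name is an assumption.

```python
def raw_equivalent_weight(cooked_g: float, yield_factor: float) -> float:
    """Raw weight that produced a given cooked weight."""
    return cooked_g / yield_factor

# Hypothetical protein density of the raw food (g protein per 100 g raw).
ASSUMED_RAW_PROTEIN_PER_100G = 22.5

yield_factor = 110 / 150  # about 0.73, from the example in the text
cooked_g = 110.0          # weight on the scale after roasting

# Correct: convert the cooked weight back to raw weight, then use raw values.
true_protein = raw_equivalent_weight(cooked_g, yield_factor) * ASSUMED_RAW_PROTEIN_PER_100G / 100

# Mismatch: cooked weight multiplied directly by raw per-100 g values.
naive_protein = cooked_g * ASSUMED_RAW_PROTEIN_PER_100G / 100

shortfall = 1 - naive_protein / true_protein
print(f"true: {true_protein:.2f} g, naive: {naive_protein:.2f} g, "
      f"shortfall: {shortfall:.0%}")
```

Pairing a cooked weight with raw per-100 g values understates protein by one minus the yield factor. The simplest safeguard is to match the weighing state to the database entry: raw weight with the raw entry, cooked weight with the cooked entry.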
How Apps Should Use — and Disclose — Database Provenance
Best practice for a nutrition app is to surface the database source alongside every nutrient figure, flag entries derived by difference or recipe calculation rather than direct measurement, and present confidence intervals for AI-estimated portions rather than false-precision single numbers. Most apps currently do none of these things.
SR-Legacy values for cooked mixed dishes are particularly uncertain: they are estimated from recipes, not analyzed as laboratory composites. An app blending AI vision for nutrition estimates (carrying ±20% error) with SR-Legacy recipe calculations (carrying ±10–15% error) stacks two approximation layers. That reality should be communicated to users transparently rather than implied by a precisely formatted calorie integer.[1]
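How two stacked error layers combine depends on an assumption the text does not settle. The sketch below shows two common conventions: root-sum-square, which assumes the portion error and the database error are independent, and a simple worst-case sum. Both function names and the 500 kcal estimate are hypothetical.

```python
import math

def combined_independent(*relative_errors: float) -> float:
    """Root-sum-square combination; assumes the error sources are independent."""
    return math.sqrt(sum(e * e for e in relative_errors))

def combined_worst_case(*relative_errors: float) -> float:
    """Simple sum; assumes every error pushes in the same direction."""
    return sum(relative_errors)

portion_error = 0.20  # AI vision estimate, from the text
recipe_error = 0.15   # upper end of the 10-15% recipe range, from the text

rss = combined_independent(portion_error, recipe_error)
worst = combined_worst_case(portion_error, recipe_error)

estimate_kcal = 500.0  # hypothetical app output
low, high = estimate_kcal * (1 - rss), estimate_kcal * (1 + rss)
print(f"RSS: ±{rss:.0%}, worst case: ±{worst:.0%}, "
      f"interval: {low:.0f} to {high:.0f} kcal")
```

Under either convention the combined uncertainty is wider than either layer alone, which is the argument for showing a range instead of a single integer.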
CalEye’s approach is to link each identified food item to its specific FoodData Central or SR-Legacy source, so users can see which database item was matched and identify mismatches. For AI-estimated portions, confidence intervals are displayed rather than single numbers. This transparency allows nutritionally literate users — and their healthcare providers — to audit the estimation and correct it where the database analogue does not fit the actual food consumed. The number is only as useful as the user’s ability to evaluate it.
References
1. U.S. Department of Agriculture, Agricultural Research Service. USDA National Nutrient Database for Standard Reference, Legacy Release (April 2018). Nutrient Data Laboratory Home Page. https://www.ars.usda.gov/northeast-area/beltsville-md-bhnrc/beltsville-human-nutrition-research-center/methods-and-application-of-food-composition-laboratory/mafcl-site-pages/sr11-sr28/
2. U.S. Department of Agriculture, Agricultural Research Service. FoodData Central (2019–present). https://fdc.nal.usda.gov/
3. Dunford EK, Ni Mhurchu C, Huang L, et al. “A Comparison of Nutrients in Foods Sold in Supermarkets in the United States, Australia, and New Zealand.” Public Health Nutrition 22, no. 10 (2019): 1892–1902.
4. Livesey G. “Energy Values of Unavailable Carbohydrate and Diets: An Inquiry and Analysis.” American Journal of Clinical Nutrition 51, no. 4 (1990): 617–637.
5. National Institute of Nutrition, Indian Council of Medical Research. Indian Food Composition Tables 2017 (IFCT 2017). Hyderabad: NIN-ICMR, 2017.
Frequently asked questions
- How are calorie values in USDA SR-Legacy actually calculated?
- Calories in SR-Legacy are not measured by direct combustion. They are calculated using Atwater factors — 4 kcal per gram for protein and carbohydrate, 9 kcal per gram for fat. Total carbohydrate itself is derived by subtracting moisture, protein, fat, and ash from 100 g, meaning it absorbs cumulative measurement error from all four directly measured fractions.
- How many food items does USDA SR-Legacy contain and why do apps still use it?
- SR-Legacy release 28 contains 8,789 food items with up to 150 nutrient values each. FoodData Central's newer Foundation Foods dataset covers only about 1,100 items, so most apps continue querying SR-Legacy for its broader unbranded food coverage despite it being frozen at 2018 and receiving no new entries.
- What is the refuse percentage and why does it matter for calorie accuracy?
- Refuse percentage is the inedible fraction of an as-purchased food — bone, shell, peel — expressed as a percentage of total weight. A bone-in chicken thigh has roughly 27% refuse, so nutritional values should apply to only 146 g of a 200 g thigh. Apps that skip this adjustment overestimate calories and protein for bone-in meats and whole fruits.
- Why is USDA SR-Legacy coverage poor for South Asian, African, and Latin American cuisines?
- SR-Legacy reflects the US food environment of the late 20th century. A 2019 audit found approximately 34% of commonly consumed foods in India, Nigeria, and Mexico had no direct SR-Legacy equivalent. Dishes such as idli, dosa, ugali, and regional lentil preparations are absent or collapsed into generic entries that can introduce macronutrient errors of 20–40%.
- How does cooking yield factor affect protein tracking when you weigh food after cooking?
- A 150 g raw chicken breast yields approximately 110 g cooked after roasting, a yield factor of about 0.73 from moisture loss. If you weigh cooked chicken and apply the raw-weight nutritional values from the database, you underestimate protein content by roughly 27%, because the cooked portion still carries the protein of the full raw weight. This is one of the most common systematic errors in home protein tracking.