CalEye.
Blog · how-to May 23, 2026 10 min read

How Fitness Devices Calculate Calories Burned — and Where They Err

Every major fitness wearable now displays a calorie burn estimate for your workout and your day. The number appears confident — two decimal places on some apps, updated in real time, color-coded by intensity zone. It is also, in most cases, wrong by between 15% and 30%. Not because the devices are badly made, but because the problem of measuring how much energy a human body expends is genuinely hard, and the sensors available in a wrist-worn device are indirect proxies for the underlying physiology, not direct measurements of it.

Understanding where the error comes from matters for anyone who uses calorie data to guide food intake, training load, or weight management. A person who trusts a wearable’s 700 kcal burn estimate and eats an extra 700 kcal to cover it may in reality have burned anywhere from about 500 to 900 kcal, leaving an unplanned surplus or deficit of up to roughly 200 kcal that day. Over weeks, that gap is the difference between weight maintenance and unintended weight gain or loss, and no amount of app engagement fixes a systematically miscalibrated input number.

This piece maps the full signal chain from body to display — the sensors, the algorithms, the population assumptions, and the compounding errors — so you can use your device’s output intelligently rather than uncritically.

The gold standard your wearable cannot use

Direct calorimetry — placing a person in a sealed, thermally insulated chamber and measuring the heat they emit — is the most accurate method of measuring energy expenditure. It is expensive, laboratory-bound, and requires the subject to sit in a room for hours. No wearable will ever use it.

Indirect calorimetry is the practical reference standard: it measures the volume of oxygen consumed (VO₂) and carbon dioxide produced (VCO₂) per unit time, then applies the Weir equation to derive energy expenditure. One liter of oxygen consumed corresponds to approximately 4.86 kcal of heat production at a typical respiratory exchange ratio.1 Metabolic carts and face masks worn in exercise physiology labs use indirect calorimetry. Research-grade portable metabolic analyzers like the COSMED K5 can perform indirect calorimetry during exercise in the field, but they cost upward of $20,000 and require calibration with reference gases before each session.
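A minimal Python sketch of the abbreviated Weir equation, which omits the small urinary-nitrogen (protein) term: energy expenditure (kcal/min) ≈ 3.941 × VO₂ + 1.106 × VCO₂, with both gas volumes in L/min. The function name and the example gas values are illustrative assumptions.

```python
def weir_kcal_per_min(vo2_l_per_min: float, vco2_l_per_min: float) -> float:
    """Abbreviated Weir equation (urinary nitrogen term omitted).

    vo2_l_per_min:  oxygen consumed, liters per minute
    vco2_l_per_min: carbon dioxide produced, liters per minute
    Returns energy expenditure in kcal per minute.
    """
    return 3.941 * vo2_l_per_min + 1.106 * vco2_l_per_min


# One liter of O2 at a respiratory exchange ratio (VCO2 / VO2) of 0.83
print(round(weir_kcal_per_min(1.0, 0.83), 2))  # 4.86

# Illustrative moderate exercise: 2.0 L/min of O2 at a ratio of 0.90
print(round(weir_kcal_per_min(2.0, 1.8), 1))  # 9.9
```

Moving the respiratory exchange ratio from 0.7, typical of fat oxidation, to 1.0, typical of carbohydrate oxidation, shifts the per-liter value from about 4.72 to 5.05 kcal, which is why an assumed ratio contributes a few percent of error.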

Consumer wearables work entirely with proxy signals: accelerometer data, photoplethysmography-derived heart rate, skin temperature (in some devices), and user-supplied biometric inputs (age, sex, height, weight). The gap between these proxies and direct metabolic measurement is where the error lives.

How accelerometers work and what they miss

The primary sensor in every fitness wearable is a three-axis accelerometer: a microelectromechanical (MEMS) structure in which a tiny proof mass shifts under acceleration, changing a capacitance that the chip converts into an acceleration reading along each spatial axis. The accelerometer samples wrist acceleration many times per second, and the device infers the magnitude and direction of movement from those samples.

Step counting from accelerometers is reasonably accurate under walking and running conditions where arm swing correlates with lower-limb stride. Published studies consistently report step count accuracy of 95–99% for most devices during walking and running on flat surfaces.2 The problem begins when movement doesn’t produce wrist acceleration proportional to metabolic demand.
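The step-counting idea can be sketched in a few lines of Python. This is a toy illustration, not any vendor's algorithm: it assumes samples in units of g, takes the vector magnitude of the three axes minus 1 g of gravity (similar in spirit to the ENMO metric used in accelerometry research), and counts a step each time that signal rises above a hypothetical 0.3 g threshold after having dropped back below it.

```python
import math


def dynamic_acceleration(samples):
    """Vector magnitude of (x, y, z) samples in g, minus 1 g for gravity, floored at 0."""
    return [max(math.sqrt(x * x + y * y + z * z) - 1.0, 0.0) for x, y, z in samples]


def count_steps(samples, threshold=0.3):
    """Count upward crossings of the threshold (a crude peak detector).

    threshold is a hypothetical tuning value; real devices use far more
    elaborate filtering and gait models.
    """
    steps = 0
    above = False
    for value in dynamic_acceleration(samples):
        if value > threshold and not above:
            steps += 1
            above = True
        elif value <= threshold:
            above = False
    return steps


RATE_HZ = 50  # assumed sampling rate
# Synthetic 5 s trace: wrist bobbing at 2 Hz with 0.5 g swings (about 2 steps per second).
walking = [(0.0, 0.0, 1.0 + 0.5 * math.sin(2 * math.pi * 2 * i / RATE_HZ)) for i in range(5 * RATE_HZ)]
# Synthetic 5 s trace: wrist almost still, with only 0.05 g of vibration.
still_wrist = [(0.0, 0.0, 1.0 + 0.05 * math.sin(2 * math.pi * 2 * i / RATE_HZ)) for i in range(5 * RATE_HZ)]

print(count_steps(walking))      # 10
print(count_steps(still_wrist))  # 0
```

The second trace mimics a wrist resting on handlebars: whatever the legs are doing, a wrist-based signal like this registers essentially no activity.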

Cycling is the canonical failure case. A cyclist’s wrists are stationary on the handlebars while the legs are driving high metabolic output. The accelerometer sees minimal wrist movement and severely underestimates energy expenditure. Most wearables address this by allowing the user to manually select “cycling” mode, switching reliance to heart rate rather than accelerometer for calorie estimation — but this introduces the heart rate estimation errors described below.

Weightlifting presents a related problem. A bicep curl generates large wrist acceleration but relatively modest metabolic demand. A heavy deadlift generates enormous metabolic demand but limited wrist movement. Accelerometer-based estimates of resistance training sessions are particularly unreliable — a 2021 study found mean errors of 33–42% across six wearables for resistance exercise protocols.2

Swimming, kayaking, rowing on an ergometer, and most racquet sports produce movement patterns that confound accelerometer-based step-counting models. Elliptical machines generate arm movement that some devices mistake for walking. The algorithm’s prediction is only as good as the movement pattern’s resemblance to the walking and running data it was trained on.

Heart rate as a metabolic proxy: the Keytel equations and their assumptions

When wearables switch from accelerometer-based to heart rate-based calorie estimation, as they do in exercise mode, they rely on empirically derived equations relating heart rate to oxygen consumption and hence energy expenditure. The most widely cited are the Keytel equations, published in 2005, which predict exercise energy expenditure from heart rate, body weight, age, and sex (an alternative version also includes measured VO₂max).3
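As a sketch of what such an equation looks like, here is the commonly reproduced form of the Keytel prediction without the VO₂max term. It outputs kJ/min, converted here to kcal/min by dividing by 4.184. The coefficients are the widely circulated version and should be checked against the original paper before any real use; the function name is hypothetical, and no wearable is claimed to use exactly this code.

```python
KJ_PER_KCAL = 4.184


def keytel_kcal_per_min(heart_rate_bpm, weight_kg, age_years, sex):
    """Heart-rate-based energy expenditure estimate (Keytel form without VO2max).

    sex: "male" or "female". Returns kcal per minute.
    Coefficients are the widely circulated published form (assumed, verify before use).
    """
    if sex == "male":
        kj_per_min = -55.0969 + 0.6309 * heart_rate_bpm + 0.1988 * weight_kg + 0.2017 * age_years
    elif sex == "female":
        kj_per_min = -20.4022 + 0.4472 * heart_rate_bpm - 0.1263 * weight_kg + 0.074 * age_years
    else:
        raise ValueError("sex must be 'male' or 'female'")
    return kj_per_min / KJ_PER_KCAL


# Hypothetical 30-year-old, 70 kg man at 150 bpm
print(round(keytel_kcal_per_min(150, 70, 30, "male"), 1))    # 14.2
# Hypothetical 30-year-old, 60 kg woman at 150 bpm
print(round(keytel_kcal_per_min(150, 60, 30, "female"), 1))  # 9.9
```

Nothing in the inputs describes fitness, hydration, medication, or temperature: two people with the same heart rate, weight, age, and sex get the same number.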

The Keytel equations were derived from 115 regularly exercising adults who completed a maximal oxygen uptake test and then steady-state exercise stages on a treadmill or cycle ergometer in a laboratory. The relationship between heart rate and VO₂ is approximately linear within the aerobic range, which is why the approach works at all. But the relationship is not fixed; it varies substantially with:

Fitness level. A trained athlete typically has a larger stroke volume and delivers more oxygen per heartbeat than an untrained person, so at the same 150 bpm the athlete is consuming more oxygen and burning more calories per minute (they sustain a faster pace to reach that heart rate). An equation that knows only heart rate and demographics cannot distinguish between these two people.

Dehydration. Cardiovascular drift — the progressive rise in heart rate at a fixed pace during prolonged exercise in heat — occurs partly because reduced plasma volume from sweating forces the heart to beat faster to maintain cardiac output. A wearable interpreting elevated heart rate as elevated metabolic output during a dehydrated late-mile run is making a systematic error that inflates the calorie estimate.

Caffeine and medications. Beta-blockers suppress heart rate at any given exercise intensity, causing a wearable to severely underestimate expenditure for anyone taking them. Stimulants — caffeine, pseudoephedrine — elevate resting heart rate, causing modest overestimation at rest and slight errors during exercise.

Heat and emotional stress. A sauna, a heated yoga room, or an anxiety-provoking situation all elevate heart rate via sympathetic nervous system activation without proportional increases in metabolic rate. Devices recording heart rate in a hot yoga session may overestimate calorie burn by 20–40% compared to the same session in a temperate environment.

The photoplethysmography (PPG) sensor that measures heart rate at the wrist introduces its own error layer. PPG works by shining infrared or green light through the skin and detecting the pulse in the reflected signal. Wrist PPG accuracy is affected by skin tone, tattoo ink, wrist position during exercise, and ambient light. Published comparisons between wrist PPG and chest-strap electrocardiography report mean errors of 5–15 bpm during vigorous exercise — which translates directly into calorie estimate error through the heart rate-to-VO₂ equations.3
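To see how a heart-rate error flows through, the sketch below assumes the commonly circulated male Keytel heart-rate coefficient (a device's proprietary model will differ) and computes the calorie error produced by a constant heart-rate offset. Because that equation is linear in heart rate, the error in kcal/min is simply the coefficient times the offset, whatever the true heart rate is. The function name is hypothetical.

```python
HR_COEFF_MALE_KJ = 0.6309  # kJ/min per bpm, male Keytel form without VO2max (assumed)
KJ_PER_KCAL = 4.184


def calorie_error_kcal(hr_error_bpm, minutes, hr_coeff_kj=HR_COEFF_MALE_KJ):
    """Calorie error from a constant heart-rate offset held over a session.

    Positive offsets (sensor overreads, heat, dehydration) inflate the estimate;
    negative offsets (e.g. heart rate held down by medication) deflate it.
    """
    return hr_coeff_kj * hr_error_bpm / KJ_PER_KCAL * minutes


for offset in (5, 10, 15):
    print(offset, "bpm ->", round(calorie_error_kcal(offset, 60)), "kcal over 60 min")
```

By the same arithmetic, a hypothetical 20 bpm suppression from a beta-blocker would remove roughly 180 kcal from an hour's estimate.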

Population-level equations applied to individuals

Every calorie estimation algorithm relies on equations derived from population samples. These samples, even when reasonably large, cannot capture the full range of individual metabolic variation.

Resting metabolic rate (RMR) varies by approximately ±15% around the values predicted by standard equations (Mifflin-St Jeor, Harris-Benedict), even though those equations already account for weight, height, age, and sex.4 This is the starting point for total daily energy expenditure calculations: an individual whose true RMR is 15% below the equation’s prediction will have their total expenditure systematically overestimated at every exercise intensity level.
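For scale, here is a minimal sketch using the Mifflin-St Jeor equation (weight in kg, height in cm, age in years) with the ±15% spread applied around its prediction. The example person and the function names are hypothetical.

```python
def mifflin_st_jeor_rmr(weight_kg, height_cm, age_years, sex):
    """Predicted resting metabolic rate in kcal/day (Mifflin-St Jeor)."""
    base = 10 * weight_kg + 6.25 * height_cm - 5 * age_years
    if sex == "male":
        return base + 5
    if sex == "female":
        return base - 161
    raise ValueError("sex must be 'male' or 'female'")


def rmr_band(predicted, spread=0.15):
    """Range of plausible true RMR values around a prediction (default ±15%)."""
    return predicted * (1 - spread), predicted * (1 + spread)


# Hypothetical 30-year-old man, 70 kg, 175 cm
predicted = mifflin_st_jeor_rmr(70, 175, 30, "male")
low, high = rmr_band(predicted)
print(round(predicted), round(low), round(high))  # 1649 1401 1896
```

A person at the low end of that band would have every day's total overestimated by roughly 250 kcal before any exercise is counted.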

The assumption that exercise expenditure can be modeled from heart rate alone ignores the effect of exercise mode. Running and cycling at the same heart rate produce different calorie burn because running involves eccentric loading, vertical oscillation, and a different muscle mass recruitment pattern. A 2017 Stanford study found that Apple Watch overestimated cycling expenditure by approximately 43% when using heart rate-based estimation and the user had not selected a specific activity mode.5

Individual variation in mechanical efficiency also matters. Two runners at the same pace, weight, and heart rate may differ in oxygen consumption by 5–10% based on running economy — a stable individual characteristic that reflects the metabolic cost of a given pace. Wearables cannot measure running economy; they assume a population average.

The compounding error: how individual errors multiply

The final calorie estimate is the product of multiple individually uncertain steps:

  1. Accelerometer or heart rate measurement (±5–15%)
  2. VO₂ estimation from heart rate via population equation (±10–15%)
  3. VO₂-to-kcal conversion using assumed respiratory exchange ratio (±3–5%)
  4. Addition of resting metabolic rate background (±10–15%)

When error terms compound multiplicatively and some are correlated (for example, dehydration inflates both heart rate and the exercise intensity signal), the composite error can substantially exceed 20%. In the Stanford study comparing seven devices, none achieved an energy expenditure error below 20%: median errors ranged from about 27% for the most accurate device to about 93% for the least accurate.5 Even these averages conceal the long tails where devices are wildly wrong for specific individuals or specific conditions.
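The difference between independent and aligned errors can be sketched numerically. This sketch takes the upper ends of the ranges for steps 1 to 3 in the list above (15%, 15%, 5%) and treats them as relative errors on one multiplicative chain; step 4 is additive and is left out for simplicity. Independent errors are combined with the first-order root-sum-square rule, and aligned errors by multiplying the (1 ± e) factors.

```python
import math


def independent_error(rel_errors):
    """First-order combined relative error for independent errors (root-sum-square)."""
    return math.sqrt(sum(e * e for e in rel_errors))


def aligned_bounds(rel_errors):
    """Worst case when every error pushes the same way: multiply the (1 ± e) factors."""
    low = math.prod(1 - e for e in rel_errors) - 1
    high = math.prod(1 + e for e in rel_errors) - 1
    return low, high


steps = [0.15, 0.15, 0.05]  # upper ends of steps 1-3 (measurement, HR->VO2, VO2->kcal)
print(f"independent: ±{independent_error(steps):.0%}")  # independent: ±22%
low, high = aligned_bounds(steps)
print(f"aligned: {low:+.0%} to {high:+.0%}")  # aligned: -31% to +39%
```

Even before any long-tail cases, errors that push in the same direction take the composite well past ±20%.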

What wearables actually measure well

The case is not that wearables are useless — it is that they are better used as training load proxies than as precise metabolic measurements.

Relative change over time is more reliable than absolute values. If your device reports that today’s run generated 20% more activity than last week’s equivalent run, the directionality is probably correct even if the absolute number is off. This makes wearable data useful for managing training progression, monitoring recovery patterns, and detecting drift in fitness metrics over weeks.

Heart rate zone training uses relative effort rather than absolute calorie counts. Using a device to keep effort in a target heart rate zone is a valid training tool that doesn’t require calorie accuracy — you’re using the heart rate signal directly, not the derived calorie estimate.

Active minutes or move minutes, where the device simply records whether you’re moving above a threshold intensity, are more reliably measured than calorie expenditure. Step count, despite accelerometer limitations, is more accurate than calorie estimation and is a useful proxy for overall daily activity volume.

Trend data over 4–8 week windows smooths the session-level noise. Wearable data aggregated over monthly periods can detect changes in average daily expenditure that correlate with changes in body composition — not because any individual day’s estimate is accurate, but because systematic bias tends to be consistent and relative trends remain meaningful.
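A small sketch shows why a consistent bias does not spoil trends. It assumes, hypothetically, a device that reads 20% high every day plus day-to-day noise that cancels within each week (a deterministic stand-in for real scatter), and it compares 28-day window means before and after a 5% rise in true expenditure. All numbers are invented for illustration.

```python
from statistics import mean


def window_means(daily, window=28):
    """Means of consecutive, non-overlapping windows of daily values."""
    return [mean(daily[i:i + window]) for i in range(0, len(daily) - window + 1, window)]


def relative_change(values):
    """Fractional change from the first window mean to the last."""
    return values[-1] / values[0] - 1


# Hypothetical true expenditure: 2400 kcal/day for 4 weeks, then 2520 kcal/day (+5%).
true_daily = [2400] * 28 + [2520] * 28
# Weekly noise pattern that sums to zero, standing in for day-to-day scatter.
noise = [250, -150, -300, 100, 200, -100, 0] * 8
# Hypothetical device: reads 20% high every day, plus the noise.
device_daily = [1.2 * t + n for t, n in zip(true_daily, noise)]

print(round(relative_change(window_means(true_daily)), 3))    # 0.05
print(round(relative_change(window_means(device_daily)), 3))  # 0.05
```

Individual days in this toy series read roughly 7% to 30% high, yet both series report the same 5% rise. Real noise does not cancel this neatly, but longer windows push it in that direction.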

How to use device data without being misled

The practical calibration approach: treat your device’s calorie estimates as systematically biased in a direction you should determine empirically for yourself. To estimate your personal bias, hold caloric intake constant for 4 weeks, record your device’s weekly expenditure estimates, and compare the implied energy balance to actual body weight change. If you maintain intake 500 kcal below your device’s implied maintenance and lose only 0.25 kg/week rather than the predicted 0.5 kg/week, your real deficit is only about half what the device implies, so the device is overestimating expenditure by approximately 250 kcal/day. Apply that correction factor going forward.
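The calibration arithmetic can be written down as a short Python sketch. It assumes a fixed energy content per kilogram of body-weight change, passed as a parameter: about 7,700 kcal/kg is a commonly used approximation, while the 0.5 kg/week figure above corresponds to roughly 7,000 kcal/kg. It also assumes weekly weight averages are trustworthy, which in practice means smoothing out water and glycogen swings. Function names and the intake and expenditure values are hypothetical.

```python
def expenditure_bias(avg_intake_kcal, avg_device_expenditure_kcal,
                     weekly_weight_change_kg, kcal_per_kg=7700.0):
    """Estimated daily overestimate of expenditure by the device (kcal/day).

    Positive: device overestimates expenditure. Negative: it underestimates.
    kcal_per_kg is an assumed energy content of body-weight change.
    """
    # Daily energy balance the device implies (negative means a deficit).
    implied_balance = avg_intake_kcal - avg_device_expenditure_kcal
    # Daily energy balance the scale implies.
    observed_balance = weekly_weight_change_kg * kcal_per_kg / 7
    return observed_balance - implied_balance


def corrected_expenditure(device_expenditure_kcal, bias_kcal):
    """Apply the personal correction to a new device estimate."""
    return device_expenditure_kcal - bias_kcal


# Scenario from the text, with hypothetical intake (2000) and device (2500) figures:
# 500 kcal/day below implied maintenance, losing 0.25 kg/week.
bias = expenditure_bias(2000, 2500, -0.25, kcal_per_kg=7000)
print(round(bias))                               # 250
print(round(corrected_expenditure(2600, bias)))  # 2350
```

With the default 7,700 kcal/kg, the same inputs give about 225 kcal/day, so the result is sensitive to that constant; rerunning the calibration periodically keeps it honest.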

Photograph-based food logging paired with wearable exercise data is a more calibrated system than relying on either input alone. The structural uncertainty in calorie counting comes from both sides of the energy balance equation — food intake estimation and expenditure estimation — and systematic calibration of both sides over time produces a more reliable picture than trusting either device’s absolute output.

Select activity modes explicitly rather than using general activity detection. Most wearables switch to heart rate-primary estimation when a specific activity is selected, and that shift — despite its own limitations — typically reduces accelerometer-mode errors for non-ambulatory activities like cycling, strength training, and rowing.

Ignore per-session calorie numbers for anything except relative comparison. The meal you consumed before a workout, the temperature of the gym, your hydration status, and your individual fitness level each affect the estimate in ways the device cannot know. The number that matters is the weekly trend, and the calibration that matters is how your body actually responds to the intake and expenditure you sustain over months.

References

  1. Weir JB. “New Methods for Calculating Metabolic Rate with Special Reference to Protein Metabolism.” Journal of Physiology 109, no. 1–2 (1949): 1–9.

  2. Evenson KR, Goto MM, Furberg RD. “Systematic Review of the Validity and Reliability of Consumer-Wearable Activity Trackers.” International Journal of Behavioral Nutrition and Physical Activity 12, no. 1 (2015): 159.

  3. Keytel LR, Goedecke JH, Noakes TD, et al. “Prediction of Energy Expenditure from Heart Rate Monitoring During Submaximal Exercise.” Journal of Sports Sciences 23, no. 3 (2005): 289–297.

  4. Frankenfield D, Roth-Yousey L, Compher C. “Comparison of Predictive Equations for Resting Metabolic Rate in Healthy Nonobese and Obese Adults.” Journal of the American Dietetic Association 105, no. 5 (2005): 775–789.

  5. Shcherbina A, Mattsson CM, Waggott D, et al. “Accuracy in Wrist-Worn, Sensor-Based Measurements of Heart Rate and Energy Expenditure in a Diverse Cohort.” Journal of Personalized Medicine 7, no. 2 (2017): 3.

  6. Strath SJ, Kaminsky LA, Ainsworth BE, et al. “Guide to the Assessment of Physical Activity: Clinical and Research Applications.” Circulation 128, no. 20 (2013): 2259–2279.

Frequently asked questions

How far off are fitness wearables when calculating calories burned?
Most consumer wearables produce errors of 15-30% on average, but a Stanford study of seven devices found median energy-expenditure errors ranging from about 27% for the most accurate device to 93% for the least accurate. Averages also mask wide variation caused by fitness level, hydration, medications, and activity type.
Why are wearables so inaccurate for cycling and weightlifting?
Accelerometers measure wrist movement, not metabolic output. Cyclists hold the handlebars, so their wrists barely move while their legs drive high energy expenditure. Weightlifting generates wrist motion that is inconsistent with actual effort. One study found mean errors of 33-42% across six devices during resistance training protocols.
What is the Keytel equation and why does it matter for calorie estimates?
The Keytel equation predicts exercise energy expenditure from heart rate, weight, age, and sex, and underpins many heart-rate-based calorie calculations. It was derived from 115 regularly exercising adults tested on treadmills or cycle ergometers, so it struggles with dehydration-driven heart-rate drift, beta-blockers, heat stress, and highly trained athletes whose heart rate-VO2 relationship differs from the population average.
How can I find my personal wearable calibration factor?
Hold caloric intake constant for four weeks and record your device's weekly expenditure estimates alongside actual body weight change. If you eat 500 kcal below the device's implied maintenance but lose only half the predicted weight, your real deficit is about 250 kcal, so your device is overestimating expenditure by roughly 250 kcal per day. Apply that correction going forward.
What does my wearable actually measure reliably?
Relative change over time is more reliable than absolute calorie values. Heart-rate zone monitoring, active minutes above a threshold, and step counts on flat surfaces are all more accurate than the derived calorie estimate. Weekly trend data averaged over 4-8 weeks smooths session-level noise into a signal that correlates with body-composition changes.