Nutrition Data Storage Hypothetical

How much storage would the nutrition labels for every food in the United States require? Let’s find out. I’ll be using C, since it’s the lowest-level language many programmers are comfortable with, and it is more than efficient enough for this task.

The data required for one label can be expressed in a C struct with 42 data items, including everything from vitamins to brand name to serving size. These are all the data points found on Nutrition Facts labels that cannot reasonably be calculated from other values. A struct is a type that groups related variables under one name, so every label follows the same layout.

I calculate 224 bytes per nutrition label, using C’s sizeof operator so that any padding the compiler inserts is included:

  • 50 letters each for food and brand names (100 bytes)
  • 10 letters each for the serving size unit and the total units (“container” being the longest) (20 bytes)
  • 12 32-bit floating point numbers for values that need decimal precision (48 bytes)
  • 2 32-bit floating point numbers for servings, to convert to fractions on the label (8 bytes)
  • 24 unsigned 16-bit integers (unsigned shorts) for whole numbers between 0 and 65,535 (48 bytes)
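
As a rough illustration of how such a struct could be laid out and measured, here is a minimal sketch. It is not the original code: the struct name, the field names, and the grouping of values into arrays are all hypothetical. It also assumes 12 decimal fields plus 2 serving fields, which is the split that adds up to the stated 42 items and 224 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The sketch assumes a 32-bit float, as the byte counts above do. */
static_assert(sizeof(float) == 4, "sketch assumes 32-bit float");

/* Hypothetical layout: names and array grouping are illustrative only. */
struct nutrition_label {
    char food_name[50];         /* 50 letters */
    char brand_name[50];        /* 50 letters */
    char serving_unit[10];      /* e.g. "container" */
    char total_unit[10];
    float decimal_values[12];   /* values that need decimal precision */
    float servings[2];          /* converted to fractions on the label */
    uint16_t whole_values[24];  /* whole numbers from 0 to 65,535 */
};

/* Size of one member without needing an instance (sizeof does not
   evaluate its operand, so the null pointer is never dereferenced). */
#define MEMBER_SIZE(type, member) sizeof(((type *)0)->member)

/* Bytes per label as the compiler lays it out, padding included. */
size_t label_size(void)
{
    return sizeof(struct nutrition_label);
}

/* Sum of the member sizes alone, ignoring any padding. */
size_t label_member_bytes(void)
{
    return MEMBER_SIZE(struct nutrition_label, food_name)
         + MEMBER_SIZE(struct nutrition_label, brand_name)
         + MEMBER_SIZE(struct nutrition_label, serving_unit)
         + MEMBER_SIZE(struct nutrition_label, total_unit)
         + MEMBER_SIZE(struct nutrition_label, decimal_values)
         + MEMBER_SIZE(struct nutrition_label, servings)
         + MEMBER_SIZE(struct nutrition_label, whole_values);
}
```

Comparing label_size() with label_member_bytes() shows how much padding, if any, the compiler added. With this ordering the floats start at byte 120, which is already 4-byte aligned, so common compilers should not need to add padding. A different field order could change that.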

The original code can be found here: https://fwylupek.com/code/nutrition_data.c

You might be wondering where the calories have gone. They are not stored: the macronutrients fat, protein, and carbohydrates will be used to calculate them. Protein and carbs each provide 4 calories per gram, while fat provides 9 calories per gram. Percent Daily Values will also be calculated rather than stored, especially since the reference amounts are prone to change over time.
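
The calculation described above could be sketched as follows. The function and constant names are hypothetical, and the daily reference amount is passed in as a parameter rather than hard-coded, on the assumption that it would be looked up from whatever guidance is current.

```c
#include <assert.h>

/* Calories per gram of each macronutrient, as stated in the text. */
#define KCAL_PER_G_FAT     9.0f
#define KCAL_PER_G_CARB    4.0f
#define KCAL_PER_G_PROTEIN 4.0f

/* Derive calories from grams of fat, carbohydrate, and protein. */
float calories_from_macros(float fat_g, float carbs_g, float protein_g)
{
    return KCAL_PER_G_FAT * fat_g
         + KCAL_PER_G_CARB * carbs_g
         + KCAL_PER_G_PROTEIN * protein_g;
}

/* Percent Daily Value: the amount in one serving as a percentage of
   a reference amount supplied by the caller (same unit for both). */
float percent_daily_value(float amount, float reference_amount)
{
    assert(reference_amount > 0.0f);
    return 100.0f * amount / reference_amount;
}
```

For a food with 10 g of fat, 20 g of carbs, and 5 g of protein, calories_from_macros(10, 20, 5) returns 190. Keeping the reference amount outside the stored label means a change in guidance only requires updating one table, not every record.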

So then, how many labels are needed? Open Food Facts maintains a collaborative database of 347,507 foods in the United States alone, at the time of writing. This should account for roughly 17 years’ worth of food, seeing as the USDA records about 20,000 new foods per year.

347,507 labels at 224 bytes each comes to 77,841,568 bytes, or about 74.24 mebibytes (MiB) of data, which is roughly 77.8 megabytes in decimal units! Is that more or less than you expected?
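
For anyone who wants to redo the arithmetic with a different label count or struct size, here is a small sketch. The function names are hypothetical, and the conversion uses 1 MiB = 1,048,576 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Total storage for label_count records of bytes_per_label bytes each. */
uint64_t total_bytes(uint64_t label_count, uint64_t bytes_per_label)
{
    /* Guard against overflow of the 64-bit product. */
    assert(bytes_per_label == 0 ||
           label_count <= UINT64_MAX / bytes_per_label);
    return label_count * bytes_per_label;
}

/* Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes). */
double bytes_to_mib(uint64_t bytes)
{
    return (double)bytes / (1024.0 * 1024.0);
}
```

With the figures above, total_bytes(347507, 224) returns 77,841,568, and passing that result to bytes_to_mib gives a value just under 74.24.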