Ask yourself a simple question: is a restaurant with an LA County score of 92 safer than one with an NYC score of 92? The answer is - you genuinely cannot tell, because those numbers measure completely different things. This is the core problem with restaurant health inspection data in the United States, and it is what makes building any kind of cross-market food safety product so much harder than it looks from the outside.
The US has no federal restaurant inspection standard. Health inspections are administered at the county or municipal level, and with over 3,000 distinct health jurisdictions in operation, you end up with a patchwork of incompatible inspection scoring systems that each reflect different priorities, violation taxonomies, and thresholds. Some cities use letter grades. Some use percentages. Some use pass/fail with a violation count tacked on. Some use numerical scores that look identical to other numerical scores but are computed from entirely different inputs. Directly comparing these outputs leads to systematically wrong conclusions.
This post explains the normalization methodology behind the FoodSafe Score API in full - the major scoring systems in use today, why direct comparison fails, how to build a principled translation layer, how to handle edge cases, and why maintaining this kind of system in-house is far more work than it appears.
The Major Scoring Systems in Use Today
Before building a normalization layer, you need to understand what you are normalizing from. The six systems described below cover the vast majority of US restaurant inspection records, but they are far from uniform.
New York City - Points Deducted from Zero
New York City's inspection system is inverted relative to most people's intuitions. Restaurants start with a score of zero, and violations add points. Lower scores are better. A restaurant with a score of 0-13 earns an A grade. 14-27 points earns a B. 28 or more points earns a C. The point values are tied to specific violation codes: a critical violation that poses an imminent public health hazard carries significantly more weight than an administrative non-compliance issue.
The NYC system is also notable for its "Grade Pending" state. When a restaurant scores in the B or C range, it has the right to request a re-inspection before a grade is officially posted. During the waiting period, it displays a "Grade Pending" card rather than its inspection result. This means that a restaurant with a Grade Pending status is not the same as a restaurant with no inspection history - it has likely recently failed an inspection. Any normalization system needs to handle this state explicitly rather than treating it as missing data.
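The NYC mechanics above can be sketched as a small function. This is an illustrative sketch, not the actual API implementation - the function name and the `GRADE_PENDING` / `NO_HISTORY` sentinel strings are assumptions for demonstration:

```python
def nyc_grade(score, grade_pending=False):
    """Map an NYC inspection score (points added per violation) to a grade."""
    if grade_pending:
        # Not the same as missing data: the establishment likely scored
        # in the B/C range and requested a re-inspection before posting.
        return "GRADE_PENDING"
    if score is None:
        return "NO_HISTORY"  # genuinely uninspected
    if score <= 13:
        return "A"           # 0-13 points
    if score <= 27:
        return "B"           # 14-27 points
    return "C"               # 28 or more points
```

Note that the Grade Pending branch comes first: it must be handled as its own state, never collapsed into "no data."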
Los Angeles County - Percentage Score
LA County uses a percentage-based score that runs 0-100, where higher is better. Establishments start at 100% and lose points for each violation found. The system distinguishes between three violation categories: major violations (also called "A" violations in LA's coding) carry a 4-point deduction, minor violations carry a 2-point deduction, and administrative violations carry a 1-point deduction. The letter grade is then derived from the percentage: 90-100% earns an A, 80-89% earns a B, 70-79% earns a C, and a score below 70% is posted as the raw number rather than a grade card and subjects the establishment to a compliance review.
The LA system looks superficially like a clean 0-100 scale, but the violation weights are calibrated to LA's local violation taxonomy, which does not map one-to-one with other jurisdictions' definitions of "major" vs. "minor." A 92% in LA reflects the absence of specific violations that LA considers major - not necessarily the same violations that NYC considers critical.
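Using the deductions described above (major -4, minor -2, administrative -1), the LA computation reduces to a few lines. This is a sketch under those stated deductions - the function names and the `SUB_70` fallback label are illustrative assumptions:

```python
def la_score(major, minor, admin):
    """Start at 100% and deduct per violation category (LA weights)."""
    return max(0, 100 - 4 * major - 2 * minor - 1 * admin)

def la_grade(score):
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    return "SUB_70"  # below 70: no letter grade; enforcement outcome varies
```

A restaurant with two major violations and nothing else lands at 92% - the same number as an NYC B-range score of 92 points, measuring something entirely different.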
Chicago - Pass/Fail with Violation Count
Chicago's system is the starkest departure from numerical scoring. Restaurants receive one of three results: "Pass," "Pass with Conditions," or "Fail." The inspection record includes a list of specific violations found, but the city does not publish a weighted numerical score. "Pass with Conditions" indicates that violations were found but corrected on-site during the inspection visit.
This binary model creates an obvious challenge for normalization: how do you convert a pass/fail result into a 0-100 score without losing meaningful signal? The answer requires leaning heavily on the violation count and category breakdown. A Chicago establishment that passed with zero violations is not equivalent to one that passed with eight minor violations that happened to fall below the failure threshold. The raw violation data carries the signal that the top-line result obscures.
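One illustrative way to recover a numeric signal from a Chicago record is to start from a base determined by the top-line result and deduct per recorded violation. The base values and the per-violation deduction below are placeholder assumptions for demonstration, not published Chicago values and not the FoodSafe weighting (which is described later in this post):

```python
def chicago_proxy_score(result, violations):
    """Derive a rough 0-100 signal from a Chicago pass/fail record."""
    if result == "Fail":
        base = 50   # assumed ceiling for a failed inspection
    elif result == "Pass with Conditions":
        base = 85   # violations found but corrected on-site
    else:
        base = 100  # clean pass
    # Each recorded violation still carries signal, even under a "Pass".
    return max(0, base - 3 * len(violations))
```

Under this sketch, a pass with eight minor violations scores 76 - meaningfully below a clean pass at 100, which is exactly the distinction the top-line result obscures.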
San Francisco - Numerical Score with Different Weights
San Francisco uses a numerical score that superficially resembles LA County's scale. Establishments start at 100 points and lose points for violations. However, the violation weights are calibrated differently: high-risk violations carry a deduction of 4-15 points depending on severity, medium-risk violations carry 2-5 points, and low-risk violations carry 1-2 points. The wide range within each category means that two SF establishments with identical scores may have arrived there through very different violation profiles.
SF also conducts routine inspections, complaint-based inspections, and re-inspections - and the inspection type affects how the record should be interpreted. A score from a targeted complaint-based inspection is not directly comparable to a routine scheduled inspection score from the same establishment, because the complaint inspection was triggered by a specific alleged problem rather than a random sampling of overall conditions.
King County (Seattle Area) - Modified Point System
King County in Washington State uses a system where violations are categorized by risk level and assigned point values: "Red" (high-risk) violations carry 5 points each, "Blue" (low-risk) violations carry 2 points. The total points determine whether an establishment is in compliance. Establishments are classified into risk categories (1-4) that affect inspection frequency - higher-risk establishments (full-service restaurants with complex menus) face more frequent inspections than lower-risk ones (convenience stores selling pre-packaged food).
The risk category classification adds a layer of complexity that matters for normalization: a 10-point score at a Category 1 (highest risk) establishment represents different underlying conditions than a 10-point score at a Category 3 establishment, because the inspection was more thorough and covered more potential violation types.
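The red/blue point arithmetic described above is trivial to express; what the code deliberately does not capture is the risk-category context, which is why the raw total alone is insufficient for cross-jurisdiction comparison. The function name is an illustrative assumption:

```python
RED_POINTS = 5   # high-risk ("Red") violations
BLUE_POINTS = 2  # low-risk ("Blue") violations

def king_county_points(red, blue):
    """Total violation points under King County's red/blue system.

    Note: the establishment's risk category (1-4) is NOT encoded here,
    even though it changes how a given total should be interpreted.
    """
    return RED_POINTS * red + BLUE_POINTS * blue
```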
Miami-Dade and Maricopa - Hybrid Systems
Miami-Dade County uses a system that combines violation counts with a disposition label (Met, Not Met, Not Applicable) for each line item in a standardized inspection form. Maricopa County (Phoenix metro) uses a points-based system with separate thresholds for routine compliance and for imminent health hazards that trigger immediate closure. Both systems require custom parsing logic to extract a meaningful comparison score, and both have changed their inspection forms and data formats multiple times in the past decade as they have migrated between inspection software vendors.
Why Direct Comparison Fails
The systems described above are not just different scales - they measure different things using different violation taxonomies applied at different inspection frequencies by inspectors trained under different protocols. An LA 92% is not equivalent to a 92/100 anywhere else, but the failure is deeper than just a scale problem.
Consider the specific question of what counts as a "critical" violation. In NYC, improper hot-holding temperature (keeping cooked food between 41°F and 140°F where pathogens multiply rapidly) is a critical violation worth 7 points. In LA County, the same condition is a "major" violation worth 4 percentage points. In Chicago, it is recorded as a specific violation code and contributes to the pass/fail determination. The underlying food safety risk is identical. The scoring impact is completely different.
Now consider inspection frequency. A Chicago restaurant that was inspected six months ago has a record that is six months stale. A NYC restaurant in a high-traffic area may have been inspected three times in the same period. If you simply average the raw scores, you are implicitly treating these records as equally fresh, which they are not. The Chicago pass/fail from six months ago deserves less weight than the NYC score from three weeks ago.
There is also the question of what inspectors actually check. Standard inspections in most jurisdictions follow a baseline protocol, but some jurisdictions have expanded their inspection forms to cover topics like allergen handling, food traceability documentation, and employee illness policies. A restaurant that gets high marks on an expanded-form inspection is not directly comparable to one that passes a baseline inspection - the expanded inspection covered more potential failure modes.
The fundamental error in naive cross-jurisdiction comparison is treating the output number as if it represents the same underlying measurement everywhere it appears. It does not. Normalization is the process of rebuilding that shared measurement from first principles using the raw violation data as input.
The Normalization Methodology
A principled normalization approach starts by discarding the top-line score from each jurisdiction entirely and working from the violation-level records. This is the key insight: instead of trying to convert an LA percentage into an NYC-equivalent score (which requires knowing how LA and NYC weight violations relative to each other), you re-derive a score from the underlying violation data using a single consistent weighting scheme applied to all jurisdictions.
Step 1 - Map Violations to a Common Taxonomy
The first step is mapping each jurisdiction's violation codes to a common violation taxonomy. This taxonomy has three categories that are consistent across all inputs:
- Critical violations - conditions with direct potential to cause foodborne illness. Includes improper temperature control, contamination of ready-to-eat food, presence of pests, sewage/plumbing failures, and bare-hand contact with ready-to-eat food.
- Non-critical violations - conditions that do not directly cause illness but indicate systemic hygiene problems. Includes improper labeling, non-food-contact surface cleanliness, equipment maintenance, and documentation issues.
- Corrected on-site violations - violations that were observed during the inspection but corrected before the inspector left the premises. These indicate a problem existed but was immediately remediated.
Building this mapping requires reviewing the specific violation codes and descriptions for each jurisdiction and making judgment calls about where ambiguous violations should be placed. Some jurisdictions have violation codes that span what the common taxonomy treats as two separate categories. These are resolved by reviewing the description and associated regulatory language rather than the code number alone.
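In code, the mapping is essentially a lookup table keyed by jurisdiction and local violation code. The fragment below is a hypothetical sketch - the local codes shown are illustrative placeholders, and the choice to default unmapped codes to non-critical is an assumption (a production mapping would flag unknowns for manual review):

```python
CRITICAL, NONCRITICAL, CORRECTED = "critical", "non_critical", "corrected_on_site"

# Hypothetical fragment of a (jurisdiction, local_code) -> category mapping.
# A real table is built by reviewing each jurisdiction's code book.
VIOLATION_TAXONOMY = {
    ("NYC", "02B"): CRITICAL,      # e.g. hot-holding temperature
    ("LA", "MAJ-7"): CRITICAL,     # placeholder code
    ("CHI", "38"): NONCRITICAL,    # placeholder code
}

def map_violation(jurisdiction, code, corrected_on_site=False):
    """Map a local violation code to the common three-category taxonomy."""
    if corrected_on_site:
        # Corrected-on-site status overrides the base category.
        return CORRECTED
    return VIOLATION_TAXONOMY.get((jurisdiction, code), NONCRITICAL)
```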
Step 2 - Apply Consistent Violation Weights
With violations mapped to the common taxonomy, the normalized score is computed from a baseline of 100:
| Violation Type | Point Deduction | Rationale |
|---|---|---|
| Critical | -25 pts | Direct pathogen contamination risk. One critical violation alone moves a restaurant from A to B territory. Two uncorrected critical violations put a restaurant in C range. |
| Non-critical | -5 pts | Hygiene practice failures that indicate systemic issues but do not directly cause illness. Three non-critical violations move a restaurant one grade band. |
| Corrected on-site | -2 pts | Violation existed but was immediately remediated. Retains a small penalty because the condition was present at all, but reflects the positive action taken. |
The weight ratios (25:5:2) are not arbitrary. They reflect the epidemiological severity differential between violation types. FDA food safety research consistently finds that critical temperature, contamination, and hygiene violations are responsible for the overwhelming majority of foodborne illness outbreaks traced to restaurants. Non-critical violations represent real but lower-probability risks. The 5:1 ratio between critical and non-critical weights reflects this proportional risk difference.
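Applied from the baseline of 100, the table above reduces to a short scoring function. This sketch assumes the score floors at zero, which the post does not state explicitly:

```python
# Deduction weights from the table above (25:5:2).
WEIGHTS = {"critical": 25, "non_critical": 5, "corrected_on_site": 2}

def normalized_inspection_score(violations):
    """Compute a normalized score from common-taxonomy category strings."""
    deduction = sum(WEIGHTS[v] for v in violations)
    return max(0, 100 - deduction)  # assumed floor at zero
```

A single critical violation yields 75; two yield 50 - matching the grade-band behavior the table's rationale column describes.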
Step 3 - Apply Jurisdiction-Specific Adjustments
Some jurisdictions conduct inspections under programs that are structurally different from standard routine inspections. These require adjustment before the raw violation count is used.
Establishments inspected under Chicago's "Consultation" program (a voluntary food safety review rather than a compliance inspection) have their violation records flagged and excluded from score computation, because the violation taxonomy and documentation standards differ from routine inspections. Miami-Dade establishments operating under a temporary food service permit are scored against a subset of the full violation taxonomy - the score is computed from the violations actually inspected rather than against the full baseline of 100.
Some jurisdictions also apply a severity modifier for repeat violations. A critical violation that appears in three consecutive inspections receives a higher effective weight than one appearing for the first time, because it suggests that the establishment is either unable or unwilling to correct a known problem. This is a jurisdiction-specific adjustment applied where the raw data includes explicit "repeat violation" flags in the inspection record.
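One way to express that repeat-violation modifier is to scale the base weight by the number of consecutive inspections that flagged the same violation. The 50%-per-repeat multiplier below is an illustrative assumption, not the actual modifier used:

```python
def effective_weight(base_weight, consecutive_occurrences=1):
    """Scale a violation weight for repeat offenses.

    Assumption: each additional consecutive occurrence adds 50% of the
    base weight, reflecting an unwillingness or inability to correct.
    """
    return base_weight * (1 + 0.5 * (consecutive_occurrences - 1))
```

This adjustment only applies where the source record carries an explicit repeat-violation flag; inferring repeats by matching violation text across inspections is far less reliable.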
Step 4 - Weight by Inspection Recency
When an establishment has multiple inspections on record, the normalized score is not a simple average of all historical inspection scores. Recent inspections receive more weight than older ones, with the weight decaying exponentially as inspection age increases.
The practical effect is that an establishment whose most recent inspection was six months ago, and whose previous inspection two years ago was much worse, will have a score that primarily reflects the recent result - with the older poor record acting as a mild drag rather than an equal co-contributor. This reflects the reality that food safety conditions change over time, and a recent improvement is meaningful evidence that should not be overwhelmed by a distant history.
The decay function uses a half-life of approximately 18 months. An inspection from 18 months ago carries half the weight of a same-day inspection. An inspection from 36 months ago carries one-quarter the weight. Inspections older than 48 months are retained in the historical record for context but effectively zero-weighted in the current score computation.
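The decay and blending described above can be sketched directly from the stated parameters (18-month half-life, hard cutoff after 48 months). The function names and the tuple-based input shape are assumptions for illustration:

```python
HALF_LIFE_MONTHS = 18.0
CUTOFF_MONTHS = 48  # older inspections are kept for context, zero-weighted

def recency_weight(age_months):
    """Exponential decay: half weight at 18 months, quarter at 36."""
    if age_months > CUTOFF_MONTHS:
        return 0.0
    return 0.5 ** (age_months / HALF_LIFE_MONTHS)

def blended_score(inspections):
    """Weighted average of (score, age_months) pairs by recency."""
    total_w = sum(recency_weight(a) for _, a in inspections)
    if total_w == 0:
        return None  # nothing recent enough to score
    return sum(s * recency_weight(a) for s, a in inspections) / total_w
```

For example, a 90 from today blended with a 50 from eighteen months ago gives (90 + 25) / 1.5, roughly 76.7 - the older poor record acts as a drag, not an equal co-contributor.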
Grade Thresholds
The normalized score maps to a letter grade using fixed thresholds that are consistent across all jurisdictions:

| Grade | Normalized Score |
|---|---|
| A | 85-100 |
| B | 70-84 |
| C | 50-69 |
| F | 0-49 |
These thresholds were calibrated against a validation dataset of known-outcome inspection records. The A/B threshold at 85 was chosen so that an establishment with a single uncorrected critical violation falls into B territory, while one with zero critical violations and a small number of non-critical violations can still earn an A. The B/C threshold at 70 was set so that establishments with two critical violations or a pattern of repeated non-critical violations fall into C. The F threshold at 49 captures establishments with three or more uncorrected critical violations, or two criticals plus additional violations - conditions that in most jurisdictions would trigger a follow-up inspection or conditional closure.
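The thresholds stated above (85 / 70 / 49) yield a simple mapping; the function name is illustrative:

```python
def letter_grade(score):
    """Map a normalized 0-100 score to a grade band (A/B at 85, B/C at 70, F below 50)."""
    if score >= 85:
        return "A"
    if score >= 70:
        return "B"
    if score >= 50:
        return "C"
    return "F"
```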
Edge Cases and Challenges
Closed and Reopened Establishments
When a restaurant closes - whether voluntarily, due to a lease ending, or due to a forced closure following a severe inspection failure - and then reopens under new ownership, the question is whether the new establishment inherits the inspection history of the old one. The answer depends on what changed.
If the same legal entity is operating at the same address under a new trade name, the inspection history is preserved. The underlying business is the same, and a rebrand does not reset the food safety record. If genuinely new ownership took over, the establishment gets a 12-month "new ownership" flag during which historical records from the prior operator are weighted at 20% of their normal value. This reflects the fact that new owners may have addressed the prior operator's problems - but it does not zero out the history entirely, because the physical plant and some staff often remain the same.
New Establishments with No Inspection History
Newly opened restaurants have no inspection record. The normalized score for these establishments defaults to 75 (squarely in the B range) with a "New - Pending First Inspection" flag. This is a deliberately conservative default. It does not assign a high score based on the absence of violations, because absence of a record is not the same as absence of violations - new establishments have simply not yet been inspected.
Once the first inspection result is recorded, the default is replaced with the actual computed score. Displaying the 75/B default is accompanied by prominent UI language clarifying that the score is a placeholder pending inspection, not an actual assessment.
Jurisdictions with Infrequent Inspections
Rural counties with limited health department resources may inspect lower-risk food establishments only once every 18-24 months. This creates a staleness problem: the most recent inspection result may simply be too old to be reliable as a current indicator. For establishments where the most recent inspection is older than 24 months, the normalized score is displayed with a staleness flag and the numerical score is visually de-emphasized relative to the date. The grade letter is not shown for records older than 36 months, since a letter grade implies current-ish validity that the data does not support.
Complaint-Based vs. Routine Inspections
Inspection type matters for interpretation. A routine inspection is a scheduled visit meant to sample overall compliance conditions. A complaint-based inspection is triggered by a specific allegation - a customer reporting illness, evidence of pests, or other concerns - and therefore tends to focus on the specific alleged problem area rather than the full establishment. Complaint inspections generally find violations at higher rates than routine inspections for the same establishments, because they are targeted at suspected problem areas.
In the normalization model, complaint-based inspections are included in the score computation but flagged as complaint-driven. When a sequence of complaint inspections closely follows a routine inspection, the complaint results receive 60% of the normal weight. When a complaint inspection is the only recent record, it receives full weight with an explanatory flag in the API response.
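The inspection-type weighting above can be sketched as follows. The 90-day window for "closely follows a routine inspection" is an assumption - the post does not specify the window - and the function name is illustrative:

```python
def inspection_type_weight(kind, days_since_last_routine=None):
    """Weight applied to an inspection based on how it was triggered."""
    if kind == "complaint":
        # Assumed window: a complaint inspection within 90 days of a
        # routine inspection gets 60% weight; otherwise it is the best
        # available evidence and keeps full weight (flagged in the API).
        if days_since_last_routine is not None and days_since_last_routine <= 90:
            return 0.6
        return 1.0
    return 1.0  # routine and other scheduled inspections: full weight
```

The design intent is asymmetric: complaint results are discounted only when a routine baseline exists nearby, never discarded.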
Validation Approach
Normalization methodology is only as good as its validation. The approach used here involves three validation tracks running in parallel.
Ground truth spot-checking takes a random sample of 500 restaurants monthly from a cross-section of major jurisdictions and manually reviews the raw government records against the normalized score. Any case where the normalized score diverges significantly from what the raw record would suggest - given reasonable professional judgment about the violations present - is flagged for methodology review.
Cross-jurisdiction calibration compares the grade distribution of normalized scores across jurisdictions. If the methodology were systematically biased toward one jurisdiction's standards, you would expect to see statistically different grade distributions between, say, NYC restaurants and LA restaurants after normalization. Persistent distribution differences prompt a review of whether jurisdiction-specific adjustment factors need updating.
Outcome correlation tracking compares normalized score bands against publicly available foodborne illness outbreak data linked to specific establishments. Establishments that generated confirmed outbreak reports should, in retrospect, show lower normalized scores at the time of the event. Checking this correlation over time verifies that the score is predictive of real food safety failures, not just administrative compliance.
Why Building This In-House Is Much Harder Than It Looks
After reading through the methodology above, a natural reaction is: "this is complex but tractable - we could build this." That assessment is correct for the initial build. Where it underestimates the challenge is in ongoing maintenance.
Health departments change their inspection software vendors every few years, which means the underlying data format and violation code taxonomy changes without notice. LA County migrated between vendors twice in the last decade, each time changing how violation records are structured in the public data export. Chicago has changed its inspection form's violation code numbering three times since 2010. Each change requires updating the violation-to-taxonomy mapping, re-running historical records through the updated mapping, and validating that the score distribution has not shifted in a way that would mislead users.
Health departments also change their scoring rules. NYC has updated its violation point values several times, with the most recent update adding new violation categories for allergen handling that did not exist in prior versions of the form. These rule changes require not just updating the data pipeline but also deciding how to handle historical records that were generated under the old rules.
Beyond the data maintenance burden, there is the legal and access layer. Some jurisdictions require a formal data sharing agreement before you can access bulk inspection data. Others make bulk data available freely but impose rate limits or access restrictions on programmatic queries. A handful require explicit attribution language whenever their data is displayed publicly. Managing this compliance surface across 3,000+ jurisdictions is a genuine operational overhead that scales with coverage, not with usage.
For teams that want to consume this data via API rather than maintaining the pipeline themselves, the integration guide covers the technical steps from authentication through production deployment. For teams evaluating whether to build or buy the normalization layer, the maintenance burden of the build path is the factor most consistently underestimated in early planning.
Conclusion
Normalizing food safety scores across US jurisdictions is not a data science problem with a clean closed-form solution. It is an engineering and maintenance problem that requires continuously mapping evolving local inspection systems to a shared standard, handling the edge cases and special programs that every jurisdiction has developed independently over decades, and validating the output against real-world outcomes to ensure the score remains meaningful over time.
The methodology described here - violation-level re-scoring with consistent weights, jurisdiction-specific adjustments, recency decay, and systematic validation - is what the FoodSafe Score API uses to produce a single 0-100 score and A/B/C/F grade for restaurants across every covered market. The goal is a score that a food delivery platform in Los Angeles and a franchise operator in Chicago can both trust for the same reasons, regardless of which health department issued the underlying inspection record.