Brandon Harris bio photo

Brandon Harris

Big Data + Analytics @ United Airlines

Twitter LinkedIn Instagram Github Photography

*Updated information and disclaimers here. Please don’t assume a restaurant is actually dirty before reading this.

Note - If you’re coming from Twitter, Reddit, Facebook, etc.. and don’t really care about the data behind the results, and just want to know where you may not want to eat, you can jump to the results by clicking here.

This semester I wound up taking Linear Algebra and Database Design. I design databases for a living so I was definitely happy to pair Linear Algebra with a course that I knew would be easy for me. For the most part the database class was what I expected, though the very basic concepts were taught with MS Access. I hadn’t touched Access in ~5 years so it was good to get a refresher. For our final project, we had to take an existing data set, normalize it, build a schema to hold the data, and then finally migrate the data into the actual database. This was all done with MySQL, and I have to say the admin tools today for MySQL have come a long way since I first started using it ~2002.

I found a great repository for data relating to Chicago at the City of Chicago Data Portal, and one set that struck me as particularly interesting was the food service inspection results. I like to eat out, and I live in Chicago. I figured this would probably be a good match.

I learned two important things after completing this project.

Firstly, while I applaud the effort of the City of Chicago to open up their data, the data quality is not the best I’ve ever seen. There were a lot of missing and duplicated values.

Secondly, I learned that I should probably not eat out so often.

Not only did the data list food inspection results (pass, fail, etc..) it listed the text of the violations. I’ll leave it as an exercise to the reader to look them up on your own. Consider yourself warned, some thing can’t be unseen.

Below is the schema we designed to house the data, pretty simple and nothing too unusual. This data set was relatively large and since there was no stipulation for the project around having hundreds of thousands of records, we decided to import just 2014 data.

One very cool feature that I will applaud the City of Chicago for having is latitude / longitude coordinates. Using the Haversine Formula with any known coordinate set made it easy to answer questions like ‘What restaurants within 1/2 mile of me have had the most inspection violations in 2014’? This also led me to realize some questions are better left unanswered, as I have happened to have already eaten at a number of these establishments.

If you live in Chicago and enjoy eating out, you may want to stop reading and take the blue pill right about now.

last chance..

going once..

going twice..

Red pill it is!

Here are some of the more interesting results from this project.

Most failed health inspections in 2014

SELECT 
    FACILITY_NAME,
    ADDRESS,
    failed_inspections,
    risks AS RISK_VALUE
FROM
    LICENSED_FACILITY L
        JOIN
    (SELECT 
        FACILITY_ID,
            COUNT(FACILITY_ID) AS failed_inspections,
            SUM(K.RISK_VALUE) AS risks
    FROM
        INSPECTION I
    JOIN RISK K ON K.RISK_ID = I.RISK_ID
    WHERE
        RESULT_ID = 3
    GROUP BY FACILITY_ID
    HAVING COUNT(facility_id) > 2
    ORDER BY 2 DESC) list ON list.FACILITY_ID = L.FACILITY_ID
FACILITY NAME ADDRESS FAIL COUNT RISK VALUE
TRINITY BAR 2721 N HALSTED ST 4 12
TAJ MAHAL 1512 W TAYLOR ST 4 12
HANA RESTAURANT 6803 N SHERIDAN RD 4 12
FISCHMANS WAGYU WAGON 324 N LEAVITT ST 4 8
NUEVO LEON RESTAURANT 1515 W 18TH ST 4 12
LAWNDALE FOOD MARKET 3705-3713 W 16TH ST 4 8
HAROLDS CHICKEN #2 6419 S COTTAGE GROVE AVE 3 6
McDONALD’S 225 S CANAL ST 3 6
NEW BUFFET CASTLE INC 3326 W BELMONT AVE 3 9
LA FLOR DE MICHOACAN 5155-5159 W ADDISON ST 3 9
TABERNACLE CHRISTIAN ACADEMY 1233 W 109TH PL 3 9
HONG KONG DELIGHT III, INC 1238 S CANAL ST 3 9
MARVIN RESTAURANT 901 S CICERO AVE 3 9
LAO SZE CHUAN RESTAURANT 2172 S ARCHER AVE 3 9
GLORIA JEANS COFFEE 79 E MADISON ST 3 6
PHOENIX RESTAURANT OF IL.L.L.C 2131 S ARCHER AVE 3 9
HALSTED STREET DELI & BAGEL 133 N WELLS ST 3 9
P J CLARKE’S 1204 N STATE PKWY 3 9
ELLY’S PANCAKE HOUSE 101 W NORTH AVE 3 9
RIBS UNLIMITED 1419 W 79TH ST 3 9
ONE STOP SHOP FOOD MART 11250 S MICHIGAN AVE 3 6
TACO BELL #30409 2575 N CLYBOURN AVE 3 9
LAS ISLAS MARIAS 6560 W FULLERTON AVE 3 9
EL PAISANO TACO’S 2429 W DIVISION ST 3 9
BUGSY’S 6301 S WESTERN AVE 3 9
TACO FRESCO 180 N WELLS ST 3 9
NEW HOPE LUTHERAN SCHOOL 6416 S Washtenaw (2700W) 3 9
TETE CHARCUTERIE 1110-1114 W RANDOLPH ST 3 9
RIB LADY 2 4203 W CERMAK RD 3 9
JEWEL FOOD STORE #3030 7530 S STONY ISLAND AVE 3 9
CORPORATE CONCIERGE SERVICES, INC. 210 S CANAL ST 3 3
SILVER’S SEAFOOD RESTAURANT 4829 N BROADWAY 3 9
EL TACONAZO RESTAURANT 5849 S KEDZIE AVE 3 9
EL TACONAZO RESTAURANT 4124 W NORTH AVE 3 9
NUEVO LEON RESTAURANT 3657 W 26TH ST 3 9
SU NUEVA CASA DEL GALLO BRAVO, INC. 3714 W LAWRENCE AVE 3 9
POTASH BROS. SUPERMARKET 1525 N CLARK ST 3 9
71ST GROCERY INC. 1908-1910 E 71ST ST 3 3
Skinner West 1260 W ADAMS 3 9
MODEL CHEF LLC 2300 S THROOP ST 3 6
HP CONVENIENT MART 111 E 51ST ST 3 6
ROYAL CARIBBEAN JERK 2126 E 71ST ST 3 9
ROBINSON ‘S NEW ORLEANS CAJUN 500 W MADISON ST 3 9
SOHO HOUSE 113-125 N GREEN 3 9
CLARKE’S ON LINCOLN 2441-2445 N LINCOLN AVE 3 9
NOAH FOOD GROUP, INC 4959 W AUGUSTA BLVD 3 6
GRAND TRIPLE CROWN 3034 S HALSTED ST 3 9
MY FIT FOODS 157 W NORTH AVE 3 9

The most frequently inspected restaurants in Chicago in 2014 (7+ inspections each)

SELECT 
    L.FACILITY_NAME, L.ADDRESS, COUNT(INSPECTION_ID) INSPECTIONS
FROM
    INSPECTION I
        JOINchinato
    LICENSED_FACILITY L ON L.FACILITY_ID = I.FACILITY_ID
GROUP BY I.FACILITY_ID
HAVING COUNT(INSPECTION_ID) >= (SELECT 
        MAX(insp_count) - 1
    FROM
        (SELECT 
            FACILITY_ID, COUNT(INSPECTION_ID) AS insp_count
        FROM
            INSPECTION
        GROUP BY FACILITY_ID) A)
ORDER BY 3 DESC
FACILITY NAME ADDRESS INSPECTIONS
LAO SZE CHUAN RESTAURANT 2172 S ARCHER AVE 8
TEQUILA RESTAURANT 5728 S ARCHER AVE 8
HONG KONG DELIGHT III INC 1238 S CANAL ST 8
NEW BUFFET CASTLE INC 3326 W BELMONT AVE 8
NUEVO LEON RESTAURANT 1515 W 18TH ST 8
PETE’S FRESH SUPERMARKET 2526 W CERMAK RD 7
McDONALDS 6355 S DR MARTIN LUTHER KING JR DR 7
JEWEL FOOD STORE #3030 7530 S STONY ISLAND AVE 7
OLD COUNTRY BUFFET 7601 S CICERO AVE 7

Restaurants with 2+ failed inspections in 2014 within 1 Mile of the Gleacher Center (where I have class).

SELECT 
    L.FACILITY_NAME,
    L.ADDRESS,
    L.CITY,
    failed_inspections,
    RISK_VALUE
FROM
    LICENSED_FACILITY L
        JOIN
    (SELECT 
        FACILITY_ID,
            COUNT(FACILITY_ID) AS failed_inspections,
            SUM(RISK_VALUE) AS RISK_VALUE
    FROM
        INSPECTION I
    LEFT JOIN RISK K ON K.RISK_ID = I.RISK_ID
    WHERE
        I.RESULT_ID = 3
    GROUP BY FACILITY_ID
    HAVING COUNT(facility_id) > 1
    ORDER BY 2 DESC) failed_list ON failed_list.FACILITY_ID = L.FACILITY_ID
        JOIN
    (SELECT 
        facility_id,
            facility_name,
            (3959 * ACOS(COS(RADIANS(41.8896193000000)) * COS(RADIANS(CAST(LATITUDE AS DECIMAL (13 , 2 )))) * COS(RADIANS(CAST(LONGITUDE AS DECIMAL (13 , 2 ))) - RADIANS(- 87.62223169999999)) + SIN(RADIANS(41.8896193000000)) * SIN(RADIANS(CAST(LATITUDE AS DECIMAL (13 , 2 )))))) AS distance
    FROM
        LICENSED_FACILITY F
    HAVING distance < 1
    ORDER BY distance ASC) closest_list ON closest_list.facility_id = failed_list.facility_id
ORDER BY 4 DESC
FACILITY NAME ADDRESS FAIL COUNT RISK VALUE
P J CLARKE’S 1204 N STATE PKWY 3 9
TACO FRESCO 180 N WELLS ST 3 9
GLORIA JEANS COFFEE 79 E MADISON ST 3 6
HALSTED STREET DELI & BAGEL 133 N WELLS ST 3 9
LA COCINA MEXICAN GRILL LLC 178 N FRANKLIN ST 2 6
DOUBLE TREE MAGNIFICENT MILE 300 E OHIO ST 2 6
PANERA BREAD 2 N MICHIGAN AVE 2 6
TROPHY ROOM 170 W ONTARIO ST 2 6
THE GRILL ON THE ALLEY 909 N MICHIGAN AVE 2 6
MCFADDEN’S 1206 N STATE PKWY 2 6
VORA 1028 N CLARK ST 2 6
AL’S BEEF 234 S WABASH AVE 2 4
PITA EXPRESS 100 W RANDOLPH ST 2 6
SEVEN 400 E RANDOLPH ST 2 6
NORTHWESTERN MEMORIAL HOSPITAL (CAFETERIA-2ND) 251 E HURON ST 2 6
HALSTED STREET DELI 222 W MERCHANDISE MART PLZ 2 6
DAVE & BUSTER’S 1030 N CLARK ST 2 6
HAIFA CAFE 19 N WELLS ST 2 6
Eggsperience 33 W ONTARIO ST 2 6
DOUGHNUT VAULT 228-230 W KINZIE ST 2 2
TGI FRIDAY’S 153 E ERIE ST 2 6
CHARTWELLS 525 S STATE ST 2 2
RED ROBIN’S BURGER WORKS 328 N MICHIGAN AVE 2 6
CHARTWELLS 525 S STATE ST 2 2
AU BON PAIN 200 W ADAMS ST 2 6
HASH HOUSE A GO GO 1212 N STATE PKWY 2 6
SUBWAY 304 W HUBBARD ST 2 6
GEORGIOS ON DIVISION 11 W DIVISION ST 2 6
SUSHI SAMBA 504 N WELLS ST 2 6

Northwestern Memorial Hospital? Really?! Yeeesh.

Most Gag Inducing Violation Text(s)

And the award goes to NICKY’S CHINESE FOOD, 5231 S WOODLAWN AVE (Hyde Park) for both of the below violations!

Inspection 1491142 - 07/30/2014 - OBSERVED OVER TWO HUNDRED RODENT (MOUSE) DROPPINGS THROUGHOUT KITCHEN, ON FLOOR UNDER COOKING EQUIPMENT AND SINKS, ON BOTTOM SHELVES OF PREP TABLES, IN STORAGE ROOM ON FLOOR UNDER SHELVES AND ON FOOD/SUPPLY STORAGE SHELVES.

Inspection 1493314 - 08/06/2014 - STILL EVIDENCE OF RODENT INFESTATION. ONE LIVE MOUSE SIGHTED RUNNING ALONG KITCHEN FLOOR. RODENT DROPPINGS ON FLOOR IN KITCHEN AND STORAGE AREAS AND ON SHELF IN KITCHEN. ALSO AT THIS TIME SIGHTED OVER 30 LIVE SMALL FLIES IN UTENSIL WASHING AREA, KITCHEN AND STORAGE AREA.

(Good news for Nicky’s fans, they passed their latest inspection on November 7, so while the violations are concerning, they appear to heave done a good job cleaning things up).

Bon Appétit!

Updated Dec 15, 2014

A few updates here as this post has gotten way more attention than I’d ever imagined.

Firstly, the data is sourced from the City of Chicago Data Portal which requires the following disclaimer.

“This site provides applications using data that has been modified for use from its original source, www.cityofchicago.org, the official website of the City of Chicago. The City of Chicago makes no claims as to the content, accuracy, timeliness, or completeness of any of the data provided at this site. The data provided at this site is subject to change at any time. It is understood that the data provided at this site is being used at one’s own risk.”

To piggy-back onto that, I also make no claims as to the accuracy of the data presented here.

Secondly, “dirtiest” is not a statement of fact, it’s an opinion (and solely my opinion) derived from my interpretation of the results of the data. If you’re concerned about the cleanliness of a specific restaurant, please do some research at the City of Chicago Data Portal. The entire set of data is present, and not just my set of data from January through early November 2014. Please don’t make generalizations about a restaurant or business based on this small subset of data. A restaurant could’ve been operating for 20 years, and then I just looked at data for 2014 when they had a violation for a water heater that happened to break as the inspector walked in. There are lots of explanations for a violation, and they are all detailed on the City’s website.

Finally, I’ve updated the previous data with a new column, “RISK VALUE”. This is the sum of the risk values associated with the total violations. A lower value is better. Since a violation for a poorly labeled food storage area could result in a failure, and that isn’t (IMO) on the same level of having flies in your prep area (also a failure), I think this new metric helps better represent the violations as an aggregate. The City of Chicago assigns risk values (1,2,3) to each violation. 1 = critical, 3 = kinda bad, but not the end of the world. In order to get a scale that makes more sense in this data, I’ve reversed those values, so 3 is the worst, and 1 is the ‘best’ kind of failure you could get (I didn’t change the values assigned by the city, I just created a new column in the database that translated 3=1, 2=2, 1=3).

So long story short, for the data provided, a business with 3 violations whose risk value is 9 had inspection failures that were more severe (overall) in the eyes of the city than a business who had 3 violations and a risk value of 3.

I hope that clears things up a bit as to the relative degree of the total failed inspections.

Some people have also expressed interest in the data. You can find the DB script on my GitHub