Procedural Update on the Vision Zero Project

Jack T.
Feb 10, 2023

I’ve been slowly (but steadily) collecting crash data for U.S. cities acknowledged by the Vision Zero Network for the last month and a half, and I’m happy to say that I’m almost done. About 90% of the cities have data collected and logged in the project repository on GitHub, and I’m still waiting for a response to my request for crash data from Albuquerque. If you want to see the Google Sheet with metadata on the cities, how much data is available, and source links, you can check it out here.

This whole project has been a great exercise in persistence and, perhaps most importantly, patience. Scouring city open data portals and state DOT sites can be time-consuming and frustrating. That’s something I learn nearly every time I search for relevant crash data. Some cities have pretty solid Vision Zero sites that include nice dashboards with easily accessible data. Some cities look like their websites were built in 1995 and haven’t been updated since 2000.

I haven’t been tracking this specifically, but it has seemed that the cities with better data access have all been in fairly liberal/Blue states. Chicago and New York City get first prize here, and Bellevue, WA comes in second. California’s cities earn an honorable mention because the University of California-Berkeley has perhaps the most amazing crash data query tool I’ve ever seen. Seriously, check it out, even if you’re not incredibly interested in transportation statistics.

Data quality and access have been major sticking points throughout this project. Some cities have data right there on their Vision Zero sites, while others do not, and I’ve had to go to state DOT sites to try to find good data. Once I’ve found the right query tool, it’s a crapshoot how well the tool will actually work and whether it’ll give me data that can be worked with (I guess any data can be worked with and manipulated. I just mean data with the fields I’m looking for). For example, Florida’s Department of Highway Safety tool limited results to 5,000 rows, so at best I could only pull a year’s worth of crashes at a time. Texas’ DOT site at least allowed 50,000 records to be returned, but even then I still had to run some queries in 6-month chunks for San Antonio.
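To give a sense of what running queries in chunks looks like, here is a minimal sketch in Python that splits a date range into 6-month windows. The function name and the window size are my own illustration, not part of any DOT tool, and each window would still have to be entered into the query tool by hand.

```python
from datetime import date, timedelta


def month_windows(start: date, end: date, months: int = 6):
    """Yield (window_start, window_end) pairs that cover start..end inclusive.

    Assumes `start` falls on the first day of a month; otherwise the first
    window is simply shorter than the rest.
    """
    cursor = start
    while cursor <= end:
        # Work out the first day of the month that is `months` months ahead.
        total = cursor.year * 12 + (cursor.month - 1) + months
        next_start = date(total // 12, total % 12 + 1, 1)
        # The window ends the day before the next one starts, capped at `end`.
        window_end = min(next_start - timedelta(days=1), end)
        yield cursor, window_end
        cursor = next_start


# Ten years of data in 6-month chunks, as with the San Antonio queries.
windows = list(month_windows(date(2013, 1, 1), date(2022, 12, 31)))
for window_start, window_end in windows[:2]:
    print(window_start, "to", window_end)
```

If a window still hits the row limit, the same function can be called with a smaller `months` value for just that stretch.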

Data organization has been key throughout this project. Due to query restrictions, I’m sometimes downloading up to 10 CSVs per city, so I’ve tried to keep a consistent file labeling format for every dataset. Most of the data is available on my GitHub, but I’m also exploring Google Cloud for some of the larger files that GitHub doesn’t support. Beyond organization, going through the crash datasets for each city to confirm that the required fields are there and that no duplicates were included by mistake takes up a lot of time.
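As an illustration of a consistent labeling format, here is a small Python sketch. The pattern itself (city, state, the word "crashes", then start and end dates) is a hypothetical example and not necessarily the one used in the repository.

```python
import re
from datetime import date


def slug(text: str) -> str:
    """Lowercase the text and replace runs of non-alphanumerics with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def crash_filename(city: str, state: str, start: date, end: date) -> str:
    """Build a file name like san-antonio_tx_crashes_20130101_20130630.csv.

    The naming pattern is an assumption made for this example.
    """
    return f"{slug(city)}_{state.lower()}_crashes_{start:%Y%m%d}_{end:%Y%m%d}.csv"


name = crash_filename("San Antonio", "TX", date(2013, 1, 1), date(2013, 6, 30))
print(name)
```

Putting the dates in year-month-day order means the files for one city sort chronologically in a folder listing.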

When it comes time to analyze the data, identifying duplicates is going to be key. I’ve already identified a potential sticky patch with some of San Antonio’s crash data. After combining all the datasets, I ended up with one data frame with over 1 million rows(!) for the years 2013 through 2022. When I saw that number, I felt like there had to be duplicates in the data. Sure enough, there were, but they weren’t the duplicates I was expecting. Every crash in San Antonio (and in many of the other cities) is assigned a Crash ID, typically a string of numbers. What has happened in this case is that when multiple people are involved in a crash, the same Crash ID is used for each individual involved. So if there’s a two-car collision, and Car A had a driver and a passenger while Car B only had a driver, then there will be 3 rows with the same Crash ID, which is a bit frustrating. I have to do some more exploratory analysis and maybe even re-download some of the data.
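Here is a rough sketch of how that situation could be checked with pandas, using a tiny made-up table for the two-car example. The column names ("Crash ID", "Unit", "Person Type") are assumptions for illustration; the real export may label them differently.

```python
import pandas as pd

# Made-up rows: crash 1001 is the two-car example (three people),
# crash 1002 is a single-driver crash.
people = pd.DataFrame(
    {
        "Crash ID": ["1001", "1001", "1001", "1002"],
        "Unit": ["A", "A", "B", "A"],
        "Person Type": ["Driver", "Passenger", "Driver", "Driver"],
    }
)

# Rows that are identical in every column would be true duplicates.
exact_duplicates = int(people.duplicated().sum())

# Rows that only share a Crash ID are separate people in the same crash,
# so collapse them to one row per crash and keep a head count.
crashes = people.groupby("Crash ID").size().reset_index(name="people_involved")

print(f"{len(people)} person rows, {len(crashes)} crashes, "
      f"{exact_duplicates} exact duplicates")
```

Comparing the row count against the number of unique Crash IDs is a quick way to tell whether a dataset is person-level or crash-level before counting anything.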

I haven’t used the federal DOT site that much during this project. I think I will once all the data for the cities are collected and I’ll look to the federal DOT for some Vehicle Miles Traveled (VMT) data. I think combining VMT with the crash data will tell a pretty compelling story about the success or failure of Vision Zero in the US.
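One common way to combine the two is a rate per 100 million vehicle miles traveled. I haven’t settled on how I’ll do this, so treat the Python sketch below as one possible approach; the function name and the numbers are made up for illustration.

```python
def crashes_per_100m_vmt(crashes: int, vmt: float) -> float:
    """Return crashes per 100 million vehicle miles traveled."""
    if vmt <= 0:
        raise ValueError("vmt must be positive")
    return crashes * 100_000_000 / vmt


# Made-up figures: 250 crashes over 5 billion vehicle miles in a year.
rate = crashes_per_100m_vmt(250, 5_000_000_000)
print(f"{rate:.2f} crashes per 100 million VMT")
```

Normalizing by miles driven makes cities of different sizes, and years with different amounts of driving, easier to compare than raw crash counts.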

So that’s where I’m at right now with the Vision Zero Project. My goal is to have all the data that’s available collected and housed on GitHub or Google Cloud by February 22. Then I’ll start going through some analysis and sharing some of my findings. Until then, if you’re driving, please drive safe and be on the lookout for pedestrians and bicyclists. If you’re walking, be safe and look both ways. If you’re biking, stay lit up like a Christmas tree!

Jack T.

Data enthusiast. Topics of interest are sports (all of them!), environment, and public policy.