SQLite Data Starter Packs
This is a collection of public datasets packaged as SQLite databases for you to practice on. You don’t have to worry about cleaning or importing the data: just download the SQLite database files and query them from your favorite SQLite client.
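As a rough sketch of what querying one of these files looks like outside a GUI client, the Python snippet below uses the standard-library `sqlite3` module to list the tables in a database and count their rows. The demo builds a throwaway in-memory database with made-up tables (`people`, `pets`) so that it runs without any download; those names are assumptions, not the schema of any dataset listed here.

```python
import sqlite3


def table_row_counts(conn):
    """Return {table_name: row_count} for every user table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    counts = {}
    for name in names:
        # Table names cannot be bound as parameters, so quote the identifier.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


# Stand-in database with hypothetical tables, so the sketch is self-contained.
demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE people (name TEXT, age INTEGER);
    CREATE TABLE pets (owner_name TEXT, pet_name TEXT);
    INSERT INTO people VALUES ('Ann', 34), ('Bo', 51);
    INSERT INTO pets VALUES ('Ann', 'Rex');
    """
)
print(table_row_counts(demo))
```

With a downloaded file, the in-memory connection would be replaced by something like `sqlite3.connect("simplefolks.sqlite")`, where the filename is a placeholder for wherever you saved the file. Note that `sqlite3.connect` creates a new, empty database if the path does not exist, so an empty result usually means the path is wrong.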
Quick links to the datasets
Dataset | Size | Tables |
---|---|---|
SimpleFolks for Simple SQL | 0.01 MB | 3 |
American Community Survey 1-Year Data for 2015 | 0.25 MB | 3 |
M3.0+ Earthquakes in the Contiguous U.S., 1995 through 2015 | 52.3 MB | 1 |
S.F. Food Inspections (LIVES) | 16.4 MB | 1 |
Census 2000 Surnames | 23.3 MB | 1 |
Dallas Police Officer-Involved Shootings | 0.4 MB | 3 |
Florida Death Row Roster | 0.1 MB | 1 |
Salaries of City Officials from the California Peninsula | 65.9 MB | 1 |
SFPD Incidents, 2012 through 2015 | 98.3 MB | 1 |
San Francisco Restaurant Health Inspections | 9.8 MB | 3 |
Social Security Administration Baby Names, 1980 through 2015 | 81.0 MB | 1 |
Social Security Administration Baby Names 2015 for All States | 11.4 MB | 1 |
California School SAT Performance and Poverty Data | 14.8 MB | 3 |
Gendered Baby Names 2015 | 19.6 MB | 1 |
Gender assessment of Hollywood Reporter's 2016 Power 100 Rankings | 1.6 MB | 2 |
About the datasets
SimpleFolks for Simple SQL
To make it easier to learn new SQL syntax, this is a very simple, very small database of people who go only by their first names and live in a world in which they own pets and homes.
American Community Survey 1-Year Data for 2015
Selected demographic data, including population by ethnicity and wealth, for U.S. states, places, and congressional districts. Note that the places table doesn’t have complete data.
M3.0+ Earthquakes in the Contiguous U.S., 1995 through 2015
Earthquakes within the contiguous United States, from 1995 through 2015, that have a magnitude of at least 3.0 as measured by the U.S. Geological Survey.
S.F. Food Inspections (LIVES)
A single-table, flattened version of the health department’s food inspection data. This is what Yelp uses to tie health scores to business listings.
Census 2000 Surnames
The most popular last names and their racial breakdowns as catalogued by the U.S. Census in 2000.
Dallas Police Officer-Involved Shootings
Officer-involved shootings as disclosed by the Dallas Police Department. Includes separate tables for officer and subject/suspect information.
Florida Death Row Roster
Inmates currently on Florida’s death row, with basic biographical information.
Salaries of City Officials from the California Peninsula
Anonymized salary and benefits information for city officials in San Mateo, Santa Clara, and San Francisco counties, as released by the California state controller.
SFPD Incidents, 2012 through 2015
Incidents reported to the San Francisco Police Department from 2012 through 2015.
San Francisco Restaurant Health Inspections
The San Francisco Dept. of Public Health’s database of eateries, inspections of those eateries, and violations found during the inspections.
Social Security Administration Baby Names, 1980 through 2015
Baby names as recorded for Social Security Number applications, counted by sex, nationwide, from 1980 through 2015.
Social Security Administration Baby Names 2015 for All States
Baby names as recorded for Social Security Number applications, counted by sex, for each state and nationwide, for 2015. It is similar to the nationwide 1980 through 2015 data but covers only 2015; unlike that dataset, it lets you compare name popularity across states.
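A cross-state comparison might look something like the sketch below. The table `names_by_state`, its columns, and the counts are all invented for illustration; the real database's schema may differ, and it may also contain nationwide rows that would need to be filtered out. The query reports each state's share of babies given a particular name, since raw counts mostly reflect how populous a state is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical layout and made-up counts.
    CREATE TABLE names_by_state (state TEXT, sex TEXT, name TEXT, babies INTEGER);
    INSERT INTO names_by_state VALUES
        ('CA', 'F', 'Emma', 300), ('CA', 'M', 'Noah', 700),
        ('TX', 'F', 'Emma', 400), ('TX', 'M', 'Noah', 400);
    """
)

# For each state: babies with the given name, and that count as a
# percentage of all babies recorded for the state.
SHARE_SQL = """
SELECT state,
       SUM(CASE WHEN name = :name THEN babies ELSE 0 END) AS name_babies,
       ROUND(100.0 * SUM(CASE WHEN name = :name THEN babies ELSE 0 END)
             / SUM(babies), 1) AS pct_of_state
FROM names_by_state
GROUP BY state
ORDER BY pct_of_state DESC
"""
shares = conn.execute(SHARE_SQL, {"name": "Emma"}).fetchall()
for row in shares:
    print(row)
```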
California School SAT Performance and Poverty Data
A database containing geospatial information, as well as SAT average scores and Free-or-Reduced-Price Meal eligibility data, for California schools.
Gendered Baby Names 2015
This dataset is a transformation of the data in the 2015 Social Security baby names dataset. Instead of having separate M and F entries for a name such as Leslie, this dataset has one entry for every name, with two additional fields that specify what that name’s majority gender is (and by how much).
This dataset is useful for joining against other tables that contain names in order to estimate gender. It includes name data for each state and nationwide.
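The transformation described above can be sketched as a single GROUP BY query. This is not necessarily how the dataset was actually produced: the table and column names (`babynames`, `babies`, `gender`, `ratio`, `total`) and the sample counts are assumptions made up for this example, and ties are arbitrarily resolved as F.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical layout: one row per (name, sex), with made-up counts.
    CREATE TABLE babynames (name TEXT, sex TEXT, babies INTEGER);
    INSERT INTO babynames VALUES
        ('Leslie', 'F', 300), ('Leslie', 'M', 100),
        ('Jordan', 'M', 600), ('Jordan', 'F', 200),
        ('Emma',   'F', 500);
    """
)

# Collapse the M and F rows into one row per name, recording the majority
# sex and the majority's percentage of all babies with that name.
GENDERED_SQL = """
SELECT name,
       CASE WHEN f_total >= m_total THEN 'F' ELSE 'M' END AS gender,
       ROUND(100.0 * MAX(f_total, m_total) / (f_total + m_total), 1) AS ratio,
       f_total + m_total AS total
FROM (
    SELECT name,
           SUM(CASE WHEN sex = 'F' THEN babies ELSE 0 END) AS f_total,
           SUM(CASE WHEN sex = 'M' THEN babies ELSE 0 END) AS m_total
    FROM babynames
    GROUP BY name
)
ORDER BY name
"""

gendered = conn.execute(GENDERED_SQL).fetchall()
for row in gendered:
    print(row)
```

In this toy data, Leslie collapses to a single row marked F at 75.0 percent, which is the kind of "majority gender, and by how much" pair the description refers to.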
Gender assessment of Hollywood Reporter's 2016 Power 100 Rankings
This database contains a copy of the Gendered Baby Names 2015 dataset (just nationwide, not each state), as well as a hand-copied spreadsheet I made of the Hollywood Reporter’s 100 Most Powerful People in Entertainment feature. Useful for an exercise in learning real-world messy JOINs.
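To give a sense of what a messy join involves, the sketch below matches full names against a gendered-names lookup by extracting the first word of each full name and comparing it case-insensitively. Both tables, their columns, and every row are invented for illustration; they are not the database's actual schema or contents. A LEFT JOIN keeps people whose first name finds no match, so the gaps stay visible instead of silently disappearing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical stand-ins for the two tables in the database.
    CREATE TABLE gendered_names (name TEXT, gender TEXT);
    INSERT INTO gendered_names VALUES ('Alex', 'M'), ('Maria', 'F');

    CREATE TABLE power_list (ranking INTEGER, full_name TEXT);
    INSERT INTO power_list VALUES
        (1, 'Maria Example'),
        (2, 'ALEX Sample'),
        (3, 'J.J. Placeholder');
    """
)

# Take everything before the first space as the first name. Appending a
# space guards against single-word names, where INSTR would return 0.
JOIN_SQL = """
SELECT p.ranking, p.full_name, g.gender
FROM power_list AS p
LEFT JOIN gendered_names AS g
       ON LOWER(g.name) = LOWER(
              SUBSTR(p.full_name, 1, INSTR(p.full_name || ' ', ' ') - 1)
          )
ORDER BY p.ranking
"""

matches = conn.execute(JOIN_SQL).fetchall()
for row in matches:
    print(row)
```

Rows that come back with no gender (here, the entry whose first name is given as initials) are the ones that need manual cleanup, which is where most of the effort in a real-world join tends to go.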