Carto, Political Boundaries, SQL, and Guns and Coffee
Using Carto and SQL Joins to make a choropleth of Starbucks and gunshops by zip code
Data to download
Just copy-and-paste the following URLs (from this subset of data) and import into your Carto Dataset maker:
Map the Congressional house districts
For reference's sake, this is what the Congressional Districts of the 114th Congress of the United States looks like as a wall poster, which you can download from the U.S. Census site:
The Census and the Congressional District shapefile data
The Census also provies the shapefiles for the district boundaries, i.e. the congress_districts_114th Dataset. You can download it yourself from the Census, or past Congresses boundaries, from the Census Cartographic Boundary Shapefiles - Congressional Districts
(If you haven't gotten the clue, the U.S. Census Bureau is the maintainer of this data, something that will make the suffering later in this lesson more bearable.)
Importing and creating a map from a shapefile
For our intents and purposes, cartographic shapefiles come as ZIP files. No need to unzip them ourselves. Import into Carto as if were just a CSV or any other data file.
Likewise, sreating a map from a dataset of shapes is same as creating map from CSV and points, except we'll get shapes, i.e polygons instead of points.
This is what the dataset looks like as a table:
And Carto is easily able to map it like this:
Styling the fill and stroke of the polygons
Go into the Layer > Style editor to set the Fill of each Polygon to 0.0 alpha (totally transparent) and a solid black outline:
Detour: Using NASA satellite photography to show urban density
In the next section, we're going to use SQL and JOIN statements to fix a real-world data problem – the lack of political data in the Congress district shapefile.
But let's pretend we didn't have SQL. Let's pretend we still wanted to show how congressional district boundaries mapped to population, but we didn't have another database table or spreadsheet.
Sans numerical data, let's use another kind of geospatial layer that purports to represent where humans live: satellite photography.
Carto has a section on Basemaps – the map layer underneath our data – that you can read for more detail. Here's what my map looks like when changing the Basemap from the default "Positron" style to the "NASA" Source, and "Night" Style:
What happened to the outlines of our congressional districts? They need to be something that is not black. Go back into the Layer > Style editor to change the Stroke color to something like lavender:
Human civilization generally needs light to live. It's a safe bet to say that where there are a lot of lights, according to NASA's photography, there are a lot of people. The density of Congressional district boundaries seems to correspond with the density of lights. Here's the boroughs of New York:
Meanwhile, Montana and Idaho, which have one at-large representative apiece, has very little light activity within their borders:
lights == people doesn't always work. Look at North Dakota vs. South Dakota – both have a single at-large rep, but the North has something the South doesn't:
It's fun to play with the NASA imagery, but ultimately, we'll have to join the Census shapefile data with datasets that actually have a count of population and other human activity.
The limits of the Census
We know that the U.S. Census is tasked with counting people. But the shapefile data columns have no indication of this – that is, there is no population count per shapefile row. There's just identifiers:
And the Census also doesn't include information about each district's elected House Congressmember, which is the entire point of making Congress districts. This makes a little more sense – the Census's job is to track population data and boundaries – a district (or state's) congressmember could change literally every day.
Joining Congressional data with the shapefile data
So if we want our shapefile map to include red states for Republican congressmembers and blue for Democratic, we need to get that party data from another table. That's easy enough: get a spreadsheet of Congressmembers. The Sunlight Foundation tracks such data, the congressmembers.csv file I've provided is a mirror of that.
Using FIPS to join "congressmembers" and "congress_districts_114th"
Ah but everything has to be difficult, doesn't it? The
congress_districts_114th shapefile data doesn't have a column for state, it has
congressmembers doesn't have a FIPS number for state, it uses postal abbreviations:
So we need to add another dataset, one that simply acts as a go-between a table with just postal codes, and another with FIPS. I refer to this as a lookup table, and it's as simple as it sounds:
Import it from this URL:
Mind the datatype of "fips"
One quick note about the
fips column – note how the column is listed as having a datatype of
string, which preserves the falues of
09, instead of non-zero-padded integers of
9 – we want that.
When importing the data, look in the lower-left column for the checkbox labeled, – and disable it:
SQL query to join the
us_state_codes.postal_code column to every district shapefile:
(note that I've aliased the
congress_districts_114th table as
lookups, for brevity's sake):
SELECT lookups.postal_code, shapes.* FROM congress_districts_114th AS shapes INNER JOIN us_state_codes AS lookups ON lookups.fips = shapes.statefp
To get the Congressional party data, e.g.
R, it's just another join and another table. However, remember that we need to join ON
congress_districts_114th.cd114fp, i.e. the district number of each house member
Note that I've aliased
SELECT lookups.postal_code, members.party, shapes.* FROM congress_districts_114th AS shapes INNER JOIN us_state_codes AS lookups ON lookups.fips = shapes.statefp INNER JOIN congressmembers AS members ON members.state = lookups.postal_code AND members.district = shapes.cd114fp;
And look at that sweet, sweet, partisan data, attached to our
congress_districts_114th columns (i.e.
Styling the Congress district shapefiles, by fill value
Sex and Congress
Let's try another angle on this particular map layer of Congressmembers and districts, inspired by: This map shows every place in the US that has ever had a woman in Congress.
It's just a matter of including the
SELECT lookups.postal_code, members.gender, members.party, shapes.* /*everything else is the same*/ FROM congress_districts_114th AS shapes INNER JOIN us_state_codes AS lookups ON lookups.fips = shapes.statefp INNER JOIN congressmembers AS members ON members.state = lookups.postal_code AND members.district = shapes.cd114fp;
Unlike Soo Oh's map, we don't purport to color states by whether or not they have ever been represented by a woman. We just show which districts have a female House rep, as of the 114th Congress:
Here's Soo Oh's state map:
Starbucks and Gun shops
Inspired by Flowing Data, Firearms Dealers vs. Burgers, Pizza, and Coffee
Here's the relevant datasets. We'll stick to zip code boundaries since both the Starbucks data and gunshop data have it:
Starbucks (geospatial point data)
Let's start with
starbucks_locations, because it has latitude and longitude fields and upon importing it, we will have a lot of cool dots.
I recommend making a whole new map, and use the Positron Lite layer. Go into the Layer > Style menu, where the default aggregation is just "none" (i.e. dots). Make them Starbucks green:
Tons of dots are not helpful. Let's try out the hexbin aggregation, which will look something similar to this:
(And this is where lesson abruptly stops for now…)