Carto, Political Boundaries, SQL, and Guns and Coffee

Using Carto and SQL Joins to make a choropleth of Starbucks and gunshops by zip code

Table of contents

Data to download

Just copy-and-paste the following URLs (from this subset of data) and import into your Carto Dataset maker:

Map the Congressional house districts

For reference's sake, this is what the Congressional Districts of the 114th Congress of the United States looks like as a wall poster, which you can download from the U.S. Census site:

wallmap-114-congress.png

The Census and the Congressional District shapefile data

The Census also provies the shapefiles for the district boundaries, i.e. the congress_districts_114th Dataset. You can download it yourself from the Census, or past Congresses boundaries, from the Census Cartographic Boundary Shapefiles - Congressional Districts

(If you haven't gotten the clue, the U.S. Census Bureau is the maintainer of this data, something that will make the suffering later in this lesson more bearable.)

Importing and creating a map from a shapefile

For our intents and purposes, cartographic shapefiles come as ZIP files. No need to unzip them ourselves. Import into Carto as if were just a CSV or any other data file.

Likewise, sreating a map from a dataset of shapes is same as creating map from CSV and points, except we'll get shapes, i.e polygons instead of points.

This is what the dataset looks like as a table:

table-congress-114th=dataset-geojson.png

And Carto is easily able to map it like this:

mapped-congress-district-polygons.png

Styling the fill and stroke of the polygons

Go into the Layer > Style editor to set the Fill of each Polygon to 0.0 alpha (totally transparent) and a solid black outline:

outline-districts-in-black.png

Detour: Using NASA satellite photography to show urban density

In the next section, we're going to use SQL and JOIN statements to fix a real-world data problem – the lack of political data in the Congress district shapefile.

But let's pretend we didn't have SQL. Let's pretend we still wanted to show how congressional district boundaries mapped to population, but we didn't have another database table or spreadsheet.

Sans numerical data, let's use another kind of geospatial layer that purports to represent where humans live: satellite photography.

Carto has a section on Basemaps – the map layer underneath our data – that you can read for more detail. Here's what my map looks like when changing the Basemap from the default "Positron" style to the "NASA" Source, and "Night" Style:

districts-at=night-normal.jpg

What happened to the outlines of our congressional districts? They need to be something that is not black. Go back into the Layer > Style editor to change the Stroke color to something like lavender:

lavender-districts.jpg

Human civilization generally needs light to live. It's a safe bet to say that where there are a lot of lights, according to NASA's photography, there are a lot of people. The density of Congressional district boundaries seems to correspond with the density of lights. Here's the boroughs of New York:

newyork-satellite-night-boundaries.jpg

Meanwhile, Montana and Idaho, which have one at-large representative apiece, has very little light activity within their borders:

montana-idaho-lights.jpg

But lights == people doesn't always work. Look at North Dakota vs. South Dakota – both have a single at-large rep, but the North has something the South doesn't:

nd-vs-nd-lights.jpg

It's fun to play with the NASA imagery, but ultimately, we'll have to join the Census shapefile data with datasets that actually have a count of population and other human activity.

The limits of the Census

We know that the U.S. Census is tasked with counting people. But the shapefile data columns have no indication of this – that is, there is no population count per shapefile row. There's just identifiers:

census-district-shp-no-pop-data.png

And the Census also doesn't include information about each district's elected House Congressmember, which is the entire point of making Congress districts. This makes a little more sense – the Census's job is to track population data and boundaries – a district (or state's) congressmember could change literally every day.

Joining Congressional data with the shapefile data

So if we want our shapefile map to include red states for Republican congressmembers and blue for Democratic, we need to get that party data from another table. That's easy enough: get a spreadsheet of Congressmembers. The Sunlight Foundation tracks such data, the congressmembers.csv file I've provided is a mirror of that.

Using FIPS to join "congressmembers" and "congress_districts_114th"

Ah but everything has to be difficult, doesn't it? The congress_districts_114th shapefile data doesn't have a column for state, it has statefps. And congressmembers doesn't have a FIPS number for state, it uses postal abbreviations:

congress-no-fips-postal-join.png

So we need to add another dataset, one that simply acts as a go-between a table with just postal codes, and another with FIPS. I refer to this as a lookup table, and it's as simple as it sounds:

Import it from this URL:

https://dannguyen.github.io/sql-carto-journalism-exercises/master/data/lookups/us_state_codes.csv

fips-lookups-usstatecodes.png

Mind the datatype of "fips"

One quick note about the fips column – note how the column is listed as having a datatype of string, which preserves the falues of 04 and 09, instead of non-zero-padded integers of 4 and 9 – we want that.

When importing the data, look in the lower-left column for the checkbox labeled, – and disable it:

disable-type-conversion-in-cartoimport.png

SQL JOINS

SQL query to join the us_state_codes.postal_code column to every district shapefile:

(note that I've aliased the congress_districts_114th table as shapes and us_state_codes as lookups, for brevity's sake):

SELECT
  lookups.postal_code, 
  shapes.*  
  
FROM 
  congress_districts_114th AS shapes

INNER JOIN 
  us_state_codes AS lookups 
    ON lookups.fips = shapes.statefp

postal-code-and-shapefiledistring.png

To get the Congressional party data, e.g. D and R, it's just another join and another table. However, remember that we need to join ON two fields:

  • Where congressmembers.state = us_state_codes.postal_code`
  • Where congressmembers.district = congress_districts_114th.cd114fp, i.e. the district number of each house member

Note that I've aliased congressmembers as members:

SELECT
  lookups.postal_code,
  members.party, 
  shapes.*  
  
FROM 
  congress_districts_114th AS shapes

INNER JOIN 
  us_state_codes AS lookups 
    ON lookups.fips = shapes.statefp
INNER JOIN 
  congressmembers AS members
    ON 
      members.state = lookups.postal_code
      AND members.district = shapes.cd114fp;

And look at that sweet, sweet, partisan data, attached to our congress_districts_114th columns (i.e. shapes.*):

party-col-triple-join.png

Styling the Congress district shapefiles, by fill value

red-blue-fill-congress-district.jpg

Sex and Congress

Let's try another angle on this particular map layer of Congressmembers and districts, inspired by: This map shows every place in the US that has ever had a woman in Congress.

It's just a matter of including the gender column:


SELECT
  lookups.postal_code,
  members.gender, 
  members.party,
  shapes.*  
/*everything else is the same*/
FROM 
  congress_districts_114th AS shapes

INNER JOIN 
  us_state_codes AS lookups 
    ON lookups.fips = shapes.statefp
INNER JOIN 
  congressmembers AS members
    ON 
      members.state = lookups.postal_code
      AND members.district = shapes.cd114fp;

Unlike Soo Oh's map, we don't purport to color states by whether or not they have ever been represented by a woman. We just show which districts have a female House rep, as of the 114th Congress:

gender-house-map.jpg

Here's Soo Oh's state map:

soo-oh-gender-congress-states-historical.png

Starbucks and Gun shops

Inspired by Flowing Data, Firearms Dealers vs. Burgers, Pizza, and Coffee

Here's the relevant datasets. We'll stick to zip code boundaries since both the Starbucks data and gunshop data have it:

Starbucks (geospatial point data)

Let's start with starbucks_locations, because it has latitude and longitude fields and upon importing it, we will have a lot of cool dots.

I recommend making a whole new map, and use the Positron Lite layer. Go into the Layer > Style menu, where the default aggregation is just "none" (i.e. dots). Make them Starbucks green:

starbucks-worldwide.png

Hexbinning Starbucks

Tons of dots are not helpful. Let's try out the hexbin aggregation, which will look something similar to this:

starbucks-hexbins.png

(And this is where lesson abruptly stops for now…)