Having singular data and focus

Anecdote, spreadsheet, database, FOIA – it doesn’t matter what you start with, only that you start somewhere. So here are some random bits of advice and examples.

What are you complaining about?

I mean, seriously.


By any account, Smith, a 29-year-old, little-known independent journalist, deserved a front row seat to the city’s hastily called press conference Tuesday. In a place where police shootings are frequent and discipline for officers rare, activists and others had seized on the release of dashcam video showing the October 2014 fatal shooting of Laquan McDonald, 17, as a potential watershed moment.

It was Smith—who left journalism school to pursue the real thing and scrapes by with part-time marketing and restaurant jobs and work as an Uber and Lyft driver to feed his investigation addiction—who had successfully sued the city to set in motion the day’s unprecedented events.

Brandon Smith definitely had less time and fewer resources to be a journalist than you do, even during finals week. Smith's story is inspirational. Maybe it's a bit disturbing to read how much of the judicial process depended on Smith's tenacity because the rest of the Chicago press had mostly given up. But that should feel pretty inspiring, too.

The singular of data is anecdote

Being stationed in beautiful Palo Alto puts you at a far remove from the intensity of Chicago's world and protests, so you don't benefit from the energy of being there. All you have is some skill at using data, which I'm afraid is usually not enough to magically find a story for you.

So stop hoping for the magic that doesn't exist in data, and find a story the "old-fashioned" way for now. Look for the kind of story that a data-lover/cultist ignores because they believe "The plural of anecdote is not data" instead of just the opposite.

As Nate Silver points out, not only is the original saying "The plural of anecdote is data," but that's also the way data actually works in the real world:

Data does not have a virgin birth. It comes to us from somewhere. Someone set up a procedure to collect and record it. Sometimes this person is a scientist, but she also could be a journalist.

This is why databases are hard, far harder than just learning SQL. But it also means that we can start with a story, then understand its data and the fuller context.

Local matters

Remember that police shootings were hardly covered at all before Ferguson, but luckily a passionate reporter had decided before then to start the collection:

Where to start? It's been almost a year since the fatal shooting of 31-year-old William Raff by Palo Alto police. The officers have been cleared, but a pending civil lawsuit from Raff's family probably means you won't get an interview with the officers, a la the New Yorker.

So what can you add to the lengthy Palo Alto coverage? If the local paper has written so much about it, maybe it's a sign this "just doesn't happen in a place like this."

OK, this is where your "normal" journalist personality ends, so that you can use data to go beyond your best guesses and poor memory.

  1. Don't just have a hunch; download the spreadsheet from Fatal Encounters to see when the last police-involved death happened.
  2. Ask the Palo Alto PD yourself. You should be able to ask for the police report, and then for any past reports of officer-involved shootings.
  3. According to Palo Alto Online, new guidelines for these incidents were created in 2012 by the Santa Clara County Police Chiefs' Association:

The review was conducted according to the 2012 "Officer-Involved Guidelines" adopted by the Santa Clara County Police Chief's Association, according to the DA's office. The Police Chiefs' Association guidelines state that the role of the district attorney is to "monitor the police investigation" and "when deemed necessary, perform an independent investigation, separate from that of the police investigation."

That report is a public record that you can probably find via Googling. But there's still the story of how things changed because of that report. And because it covers Santa Clara County, your reporting can look at officer-involved incidents countywide and how agencies have handled them. Palo Alto may be the closest "data point" to you, but it's not the only incident that impacts your story. Check it out.
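Step 1 in the list above, checking the Fatal Encounters spreadsheet instead of trusting a hunch, can be sketched in a few lines of Python. This is only a rough sketch and not the site's actual schema: the column names (`date`, `city`, `county`, `state`) and every sample row are made-up placeholders, so rename them to match whatever headers the real download uses.

```python
import csv
import io
from datetime import date

# Placeholder rows standing in for the downloaded spreadsheet.
# The column names and every value here are invented for illustration.
SAMPLE_CSV = """date,city,county,state
2011-03-02,Exampleville,Sample County,CA
2015-12-25,Exampleville,Sample County,CA
2014-07-19,Otherton,Sample County,CA
2013-01-05,Exampleville,Another County,NV
"""


def incidents_in(rows, **filters):
    """Return rows whose fields match every filter, most recent first."""
    matches = [
        row for row in rows
        if all(row[field].strip().lower() == value.lower()
               for field, value in filters.items())
    ]
    # Assumes dates are stored as YYYY-MM-DD; adjust the parsing if not.
    return sorted(matches, key=lambda r: date.fromisoformat(r["date"]), reverse=True)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Start with the closest "data point", then widen to the whole county.
city_incidents = incidents_in(rows, city="Exampleville", state="CA")
county_incidents = incidents_in(rows, county="Sample County", state="CA")

print("Most recent in city:", city_incidents[0]["date"])
print("Incidents countywide:", len(county_incidents))
```

To run this against a real download, swap `io.StringIO(SAMPLE_CSV)` for an opened file and use the spreadsheet's own column names. The point is the same either way: the list tells you when the last incident happened, so you don't have to guess.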

All you need is a list

A single FOIA is data. So is a list of FOIA requests. Just because your initial interest was police shootings doesn't mean your scrutiny has to focus on the officers exclusively. The existence of the 2012 "Officer-Involved Guidelines" report is a sign that someone in the county wanted to take a larger look at policing.

But did they? 4 years doesn't seem like a long time, but it's long enough for people to stop following up on whether goals have been met. Check out MuckRock's Use of Force Policy Project as an example of how a systematic process of FOIAs can be valuable.

Another inspiring example: the way Upturn collected data – by asking for anecdotes – on the state of predictive policing.
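To make "a list of FOIA requests is data" concrete, here is a rough sketch of a request log in Python. Everything in it is an assumption for illustration: the agency names, dates, and subjects are invented, and `RESPONSE_WINDOW_DAYS` is a placeholder that you should replace with the deadline in whatever public records law actually applies to your request.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumed response window in calendar days; check the statute that applies
# to your request before relying on this number.
RESPONSE_WINDOW_DAYS = 10


@dataclass
class RecordsRequest:
    agency: str                      # hypothetical agency name
    subject: str
    sent: date
    answered: Optional[date] = None  # None while still waiting

    def due(self) -> date:
        return self.sent + timedelta(days=RESPONSE_WINDOW_DAYS)

    def is_overdue(self, today: date) -> bool:
        return self.answered is None and today > self.due()


# Invented entries for illustration only.
log = [
    RecordsRequest("Agency A", "use-of-force policy", date(2016, 10, 3), date(2016, 10, 11)),
    RecordsRequest("Agency B", "use-of-force policy", date(2016, 10, 3)),
    RecordsRequest("Agency C", "use-of-force policy", date(2016, 11, 1)),
]

today = date(2016, 11, 5)
overdue = [r.agency for r in log if r.is_overdue(today)]
answered = [r.agency for r in log if r.answered is not None]

print("Overdue:", overdue)
print("Answered:", answered)
```

Even a log this small starts answering questions: who responded, who is stalling, and who needs a follow-up.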

Outrage doesn't uncover everything

Even though it was 5 years ago, you might remember this California slice of Occupy Wall Street:

I'm proud of how my former editor, Diana Lambert, led an investigation using public records that raised the kind of shit storm that forced UC Davis's Chancellor Katehi out of a job.

That's pretty amazing considering it's been 5 years since Katehi survived this catastrophic PR blowup:

But it looks like the damage was done, or at least, the Chancellor thought she needed to pay $15K per month to a shady tech company in hopes of cleaning up her reputation. That was never going to look good:

“Nevins & Associates is prepared to create and execute an online branding campaign designed to clean up the negative attention the University of California, Davis, and Chancellor Katehi have received related to the events that transpired in November 2011,” a six-page proposal from Nevins promised.

“Online evidence and the venomous rhetoric about UC Davis and the Chancellor are being filtered through the 24-hour news cycle, but it is at a tepid pace,” the proposal said.

The objectives Nevins outlined for the contract included “eradication of references to the pepper spray incident in search results on Google for the university and the Chancellor.”

It's also worth pointing out that this story began with asking why Katehi was moonlighting for a private school. The pepper spray part was already long forgiven…

You'd be surprised what "good" journalists don't do

It's hard for me to be too critical of students for not initially having a strong focus or drive in school, because even a supposedly experienced journalist like me drops important stories all the time.

This past September, the ACLU of Northern California revealed how police agencies across the country were paying tens of thousands to Geofeedia, a shady social-media aggregator, to see where things were being tweeted/facebooked from:

Police use of social media surveillance software is escalating, and activists are in the digital crosshairs

How did the ACLU do this? Through public records, the same process that you or I could have used:

So this summer, we requested records from 63 police departments, sheriffs, and district attorneys across California. And what we learned from the documents was alarming. We found no evidence in the documents of any public notice, debate, community input, or lawmaker vote about use of this invasive surveillance. And no agency produced a use policy that would limit how the tools were used and help protect civil rights and civil liberties.

To give you an idea of how potent the ACLU's investigation was: less than a month later, Twitter, Facebook, and Instagram cut off their data to Geofeedia. Geofeedia then cut half its staff, which feels pretty drastic after a $17M funding round.

But I really have no excuse. I even saw the ACLU's requests early last year on MuckRock. When I had the opportunity, my contribution to investigative data journalism was to tweet like a smug asshole:

I turned my nose up at Geofeedia's comically crap system, because I didn't worry, despite all evidence to the contrary, about how even an incompetent contractor can wreck lives. Other groups of people feel much differently, and thankfully have vigilant advocates with the ACLU.

The world is far more screwy than imagined

The fun part of data and docs is that they often reveal how bad your assumptions were. If there was one thing I was sure of, it was that Facebook – one of Geofeedia's data sources – could build itself a thousand better versions of Geofeedia rather than use the paltry overpriced features on offer.



Asking won't hurt

Even when it comes to pure data, which I should be good at, I'm pretty slow. I remember thinking how awesome this New York Times visualization of NYC taxi traffic was back in 2010:


I thought, gosh, I hope the Taxi and Limousine Commission releases that data for the rest of us to use. This was while I was working, ostensibly, as an investigative journalist at ProPublica. About 4 years later, they did, in response to Chris Whong, who is "just" a non-journalist data scientist who knew about the data, how to Google for the request form, and how to mail it in. They sent him the data without a fuss. Soon after, the TLC started putting all the data up online for anyone to click and download (http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml).

And even though Whong has been doing ground-breaking visualization analysis of the millions of data rows, that didn't stop FiveThirtyEight from doing its own stories, such as "Planes, Trains And Taxis: When To Take Public Transit From The Airport," and requesting its own data for "Uber Is Taking Millions Of Manhattan Rides Away From Taxis."

Bottom line: if someone has already asked for something, you have the right to ask for it too. You have nothing to lose.

Let's FOIA

Following the lead of