Getting to Know SQLite with a Client and a Database
A brief description of the SQLite database software, specifically, how to navigate a SQLite database file using a SQLite client.
Quick start and learn-by-doing
If you're completely new to SQLite and databases in general, it may be easier to learn via doing. This section contains a quick-start-click-here-type-this hand-holding walkthrough that should work on a relatively modern computer and operating system, while being understandable by a novice spreadsheet user.
Download and launch the DB Browser for SQLite client
Go to the sqlitebrowser.org homepage and download the relevant executable:
Now Install the client by…doing whatever you usually do to install an application. This may include going to your Downloads folder and double-clicking on what you just downloaded.
For Mac users, you'll get the standard popup that asks you to download the funny icon to your Applications folder:
Go to your Applications folder, find the DB Browser for SQLite icon, and double-click it:
If you've managed to install the latest version, 3.9.1, DB Browser will look something like this:
(If you're at around version 3.8, it should look similar and the differences aren't important)
Just like we can open Microsoft Excel without specifying a file, we can open our database client and start empty. But that's no fun. So exit DB Browser and move on to the next step of downloading a pre-build SQLite database file.
Download SQLite database files
I've created a "starter pack" of SQLite databases that are interesting and relatively easy to explore: SQLite Data Starter Packs.
At the top of the page, you can get a quick overview of what's available. But to keep things simple, let's download the following database:
Clicking the above link should immediately download the file into your Downloads folder, with this filename by default:
- florida-deathrow.sqlite
Opening a sqlite Database File
What's that .sqlite
extension mean? It's just a human-readable label (as all file extensions and names are) to remind us that the file can't be opened in Microsoft Excel or a text-editor.
By convention, when you installed DB Browser for SQLite, DB Browser should have established itself as the default program to open .sqlite
files when you double-click them. If not, you should be able to right-click the file and specify which program to use for opening a given file.
Try opening the florida-deathrow.sqlite
file in DB Browser:
And here's what DB Browser looks like when you've launched it while specifying a database file to open:
One panel to rule them all
This is just a quick cosmetic action that you can ignore or reverse later: in most programs, I like having just one window open. And for whatever reason, DB Browser defaults to opening two panels. The left-panel is fine. But the right-panel, titled "Edit Database Cell" is not only empty, but will not be particularly useful for the vast majority of our SQLite work. So just close it by hitting the X in the corner, or toggling the option in the View submenu:
And now we just have one panel:
Much cleaner!
Browsing data in a database
I'll cover the interface features specific to DB Browser in another tutorial. For now, it's more important we acquaint ourselves with the fundamental aspects of a database.
If you've followed my steps exactly, your client, i.e. DB Browser has the Florida death row roster database opened and ready to explore.
But what's a database?
But what is this death row data? Where did it come from? And what makes it a "database" instead of a spreadsheet?
The data came from this Florida Department of Corrections webpage – I wrote a quickie web-scraper but you can just assume I copied it by hand into a SQLite database file:
But why a SQLite database file? Just by looking at the webpage, you might assume that the tabular death row data can fit in a standard spreadsheet.
And you would be right: here's the death row roster as a Google Spreadsheet:
There's really nothing about this data that requires it to be in a database. In fact, because of its simple structure and small size, this death row data should be in a spreadsheet, where we can use all the powerful spreadsheet techniques – such as sorting, filtering, aggregating – that we're perfectly comfortable with.
But I imported this data into a SQLite database so that it's easier to see how those spreadsheet techniques map as database techniques. And also, to reaffirm the point that what makes data data is its structure, not the program you read it with.
Viewing the data with a database client
Later in this lesson, we'll see that the core concept of "opening a database" does not entitle us to actually seeing the data as we're accustomed to when opening a spreadsheet. But for now, let's take advantage of DB Browser, which is a client specifically designed to make databases seem point-and-click friendly.
Bring your attention to the nav bar that sits atop the main panel view. Currently, the Database Structure tab should be highlighted.
This Database Structure view isn't interesting to us right now, though you might have noticed that it appears to list a single table named inmates
with what seems to be field names that correspond to the column names we saw in the Google Spreadsheet.
Whatever. Click on the next nav tab: Browse Data:
This seems to have the convenience of a spreadsheet view; we can scroll up and down the list of rows. There's convenient Filter boxes above every column. And if we double-click a field, we can…edit the data?
If you play with this data-browsing view for more than a few seconds, it should be obvious that it is substantially inferior to what we're used to with a modern spreadsheet. This is because the great open-source authors of DB Browser lack design skills, but because databases aren't meant to be interactive like spreadsheets are.
It's hard to understand that "feature" now until we get into more important, substantial data. The other way to think of it is: when we work with databases, we have better ways of managing data than manually-fixing it by hand.
Querying the data
One last stop on the DB Browser interface: in the nav bar, click the Execute SQL tab:
It looks like nothing, but this the main canvas on which we will write programs to command the computer to fetch, explore, arrange, and calculate insights from data with far more sophistication and scale than we could with a spreadsheet.
Having a blank canvas should feel intimidating. Remember the death row data as a webpage and spreadsheet?
By default, a spreadsheet shows everything before you can ask otherwise. By default, a database shows you nothing, until you tell it what you want to see.
Hello, SQL
SQL is a programming language for SQLite and other databases in the SQL-family. No need to go into more background, so let's just write some code, starting with Brian Kernighan's classic "Hello, World" program, the proper program for when we have no f—–g clue what we're doing or if anything is even working properly.
The top box in the Execute SQL view is just a standard text input box. Type:
Hello, World
Hit Enter.
Then wait.
Sitting, waiting, wishing, with the prompt
If you're completely new to programming, you might be confused at how DB Browser – and indeed, the SQLite database engine – seems to be doing nothing even though you've typed some code and even hit Enter.
This is perfectly rational, even though it's hard for experienced programmers (including me) to understand how even a novice could be confused.
It's rational, because if you've been using modern computers with any regularity and even proficiency, since, say 2010, then you may have either forgotten, never realized, or never experienced the computer just waiting for you.
As soon as you open your web browser, your computer is sending off thousands of bytes of data relating to either what you were browsing in a previous session or what you are expected to browse in the next few minutes. As soon as you type in a couple characters into Google, Google already has an opinion about what you're confused with:
Facebook's system knows you and your friends so well that, as soon as you upload a photo, it can basically tag the faces in the photo even before you see the uploaded photo in your browser. As soon as you start typing anything to a friend, Facebook's servers are processing your words, even if you don't send the message
Hell, you can't even load a file in a modern spreadsheet program without it, the spreadsheet program, imposing its opinions and throwing error messages or corrupting the original data.
SQLite and the DB Browser client are as sophisticated as any modern software you use day-to-day. But DB Browser, aside from making some very narrow and simple auto-complete suggestions as we write a query, will do nothing until we hit the Execute button. So feel free to make mistakes, you're not being judged…yet:
And the SQLite engine itself, well, it takes minimalism to an extreme, as we'll see later.
Execute!
OK, let's get back to work. In DB Browser, the button that looks like the Play button on a music player will execute whatever code we've written:
Go ahead and hit it…And now comes the judgment:
Hello, World
is not proper SQL syntax. And so, we get a syntax error. In DB Browser, the bottom box is the results console and worth paying attention to later.
Here's one way to say "Hello, World", following the rules of SQL:
SELECT 'Hello, World';
Hooray, congrats on your first SQL program:
We'll elaborate later the details of the DB Browser interface. The important message is in the results console: not an error
Syntax highlighting
As you typed that command, you might notice that the color of the text changes. This is referred to as syntax-highlighting: SQL is a formal, and pretty simple language. DB Browser knows that SELECT
, in that context, is a SQL keyword, hence, the bolding. And in SQL, when text is enclosed in single-quote-marks, it has a whole different meeting.
For example, write and execute:
SELECT 'SELECT';
And now is the time to experiment and deliberately break things. In fact, that should be your mentality going forward if you hope to get a grasp of how SQL and databases work in the short time we have to study them.
What happens when we un-quote that second SELECT
?
SELECT SELECT;
Another syntax error:
Mind your grammar and syntax
Again, we'll be exploring this in much more detail later, but SQL is a language. Grammar and punctuation have meaning in the same ways they do in human languages, and missing a single word or punctuation mark or rule can be problematic:
Wrong: "She enjoys mans laughter."
Correct: "She enjoys man's laughter."
Error messages are good
It's hard to get used to being wrong, and the thing with a computer program, database or otherwise, is that it never tires of telling you when and why you're wrong. But this is an asset: we want to fail fast.
Ask this patent troll, who only found out about his poor punctuation after losing millions of dollars in a lawsuit.
Because there are scenarios far worse than being wrong. You can be technically correct, but absolutely wrong:
Grammatically correct: "She enjoys manslaughter."
A whirlwind tour of SQL
In subsequent lessons, we'll learn the SQL needed to elevate our data work. For now, I'll demonstrate how SQL can be used to find the kind of insights we found with spreadsheets.
Assuming you're following along with the Florida Death Row Roster as SQLite database, try just typing and execute the same code I write below:
Show me the data
Select and show all rows, all columns
Unlike spreadsheets, we can't get something, nevermind everything, without writing code:
sql
SELECT *
FROM "inmates";
Select specific columns
Remember how real-life datasets often contain irrelevant or redundant columns, and how these columns can make it hard to just read the spreadsheet? And the work it takes to manually hide or delete the columns? By being explicit in our SQL query, we declare exactly what we want and only what we want:
sql
SELECT
"Inmate Name", "Race/Gender",
"Crime", "Date Received"
FROM "inmates";
Sorting rows by column values
Not much more work than clicking a column-header in a spreadsheet – and a lot more precise. Here's the list of inmates ordered by youngest first – i.e. "birthdate in descending order":
sql
SELECT *
FROM "inmates"
ORDER BY "Date of Birth" DESC;
Limiting the number of rows
After doing a sort, we often don't care about the entire dataset. We just care about what is at the top of the sort. In a spreadsheet, we have to jump through a few hoops to hide the noise. With SQL, it's just another clause:
sql
SELECT *
FROM "inmates"
ORDER BY "Date of Birth" ASC
LIMIT 5;
Deriving columns
Not much different than creating a new column in spreadsheet filled with values derived from a formula, e.g:
=(B2+C2) / 100
But in SQL, we don't have to memorize arbitrary column-letters. Or do the mouse-clicking to insert a new column and replicating the values. We don't even have to create a column at all. We merely declare the formulas and calculations we wish to see:
sql
SELECT
"Inmate Name",
SUBSTR("Date of Birth", 1, 4),
SUBSTR("Date Received", 1, 4)
FROM "inmates";
Aliasing column names
Another step required in the spreadsheet version of data-insertion: naming the new columns.
We don't have to do that in SQL, as you can see in the previous screenshot. But if we want nicer, human-friendly headers, we just need to be a little more explicit:
sql
SELECT
"Inmate Name",
SUBSTR("Date of Birth", 1, 4) AS "birth_year",
SUBSTR("Date Received", 1, 4) AS "received_year"
FROM "inmates";
And we can do math, too: what is the youngest age that someone has entered death row? That's a function of the difference between received_year
and birth_year
:
sql
SELECT
"Inmate Name",
(SUBSTR("Date Received", 1, 4) - SUBSTR("Date of Birth", 1, 4))
AS age_received
FROM "inmates"
ORDER BY age_received ASC;
A nice side-effect of all this aliasing and rearranging of columns: we never alter the underlying data. In contrast, it's nearly impossible to do a sort of a spreadsheet without actually changing how the data is arranged.
Sometimes, we don't want that.
Filtering
To me, filtering data in spreadsheets has not only required a significant investment in the soul-killing work of learning-the-menus-of-a-spreadsheet, the process has always been quirky and prone to catastrophic human-error. This is why I never really teach how to filter data using a spreadsheet.
Filtering in SQL uses the same kind of conditional thinking – "show me rows in which some column is equal to/greater than/less than some other value" – but we can express it in code, rather than finicky checkboxes:
Find all inmates from a specific County
sql
SELECT *
FROM inmates
WHERE "County" = "Polk";
The conditional logic can be as flexible as we need it: Black males born after 1980, or, female inmates:
sql
SELECT *
FROM inmates
WHERE
("Race/Gender" = 'BM' AND "Date of Birth" > '1980-01-01')
OR SUBSTR("Race/Gender", 2, 1) = 'F';
Aggregations
Pivot tables are a killer feature of spreadsheets, and SQL, because of the rigid nature of databases, can't do all the fancy data-rearrangement that a pivot table can.
But with aggregation functions and the GROUP BY
clause, we can count, sum, average across categories in a dataset.
Getting a count of rows
First, how many rows in this dataset?
sql
SELECT COUNT(1) FROM `inmates`;
But some inmates are in for multiple crimes. So we need to count distinct ID numbers for the prisoners:
sql
SELECT COUNT(DISTINCT "DC#") AS totalinmates
FROM inmates;
But if we're interested in getting a list of inmates sorted by number of crimes committed, we use GROUP BY
to facet by a column value:
sql
SELECT "DC#", "Inmate Name", COUNT(1) AS crimecount
FROM inmates
GROUP BY "DC#"
ORDER BY crimecount DESC
LIMIT 10;
Getting a count by Race/Gender
Want a demographic count of distinct inmates? No problem:
sql
SELECT "Race/Gender",
COUNT(DISTINCT "DC#") AS inmatecount
FROM inmates
GROUP BY "Race/Gender"
ORDER BY inmatecount DESC;
Group and aggregate with whatever values you want
(note: There is a much more rigorous and sane way of calculating dates and time spans. I ignore it for brevity's sake)
What if I wanted to get a count of inmates by age? That can roughly and sloppily calculated with: 2016 - SUBSTR("Date of Birth", 1, 4)
:
sql
SELECT
2016 - SUBSTR("Date of Birth", 1, 4) AS "age",
COUNT(DISTINCT "DC#") AS inmatecount
FROM inmates
GROUP BY "age"
ORDER BY "age" ASC;
Those age groups (by each year) are a bit too granular. Let's do something simpler: group inmates by the decade in which they were born:
sql
SELECT
SUBSTR("Date of Birth", 1, 3) * 10
AS "decadeborn",
COUNT(DISTINCT "DC#") AS inmatecount
FROM inmates
GROUP BY "decadeborn"
ORDER BY "decadeborn" ASC;
One more aggregation: By race and by decade, and let's simplify it to "black" and "white":
sql
SELECT
SUBSTR("Date of Birth", 1, 3) * 10
AS "decadeborn",
SUBSTR("Race/Gender", 1, 1)
AS race,
COUNT(DISTINCT "DC#") AS inmatecount
FROM inmates
WHERE race IN('B', 'W')
GROUP BY "decadeborn", "race"
ORDER BY "decadeborn", "race" ASC;
Porting data
I think that's enough random SQL queries for now. Let's get back to the other parts of data work, particularly visualization.
Databases generally focus on storing and sorting through data. Which is why they can open and crunch datasets thousands of times bigger and faster than what spreadsheets can do. The tradeoff is that they aren't good for much else. Unlike spreadsheets, the typical database does not come with a built-in chart maker.
No problem. Spreadsheets are a great visualization tool. All we need to do is export our data out of the database, and import it into a spreadsheet.
I liked the most recent aggregation we did of Florida inmates. Did the results look like a table, i.e. a spreadsheet? That's because for all intents and purposes, it is another table. Unlike a spreadsheet, we don't have to create a new tab/table/sheet to store our results.
Exporting data
OK, one more DB-Browser-specific-tip. In the Execute SQL view, with the results shown, there is a little button in the bottom-right that allows us to Export to CSV:
We'll be directed to a pop-up menu: the default options are fine, though you should set "New line characters" to "Unix" just to be safe:
What does our new CSV file look like when we then import it into a new spreadsheet? Exactly as you'd expect:
Now, we do a quick Pivot Table to rearrange the data specifically to how spreadsheet-chart-making-tools wants it:
Note that this visual rearranging of data is one thing that is almost insurmountably difficult in SQL, so much that doing analysis in SQL and porting it to a spreadsheet is something you'll get used to doing.
Finally, we visualize, using Google Spreadsheet's chart wizard:
Here's the chart in its interactive glory:
Nothing spectacular, but this is just a quick demo. Everything we breezed through here applies for every dataset that we put into a database, and we haven't even gotten to the interesting and powerful parts of SQL.
Other clients
DB Browser is a great client for working with SQLite databases. But it's a client – a graphical one at that – not SQLIte itself. Here's a few other alternatives:
- SQLite Manager for Firefox: I used to use this for teaching, but its interface was clunky enough that it unnecessarily distracted students.
- Base : this OSX-specific client will cost you $25 but, as with any paid product, it has a decent layer of polish. However, I found DB Browser has more than enough polish for me.
- Navicat for SQLite: Another commercial solution, but cross-platform. I've used Navicat as a client for other databases, such as MySQL, with good success.
In this sense, DB Browser and other clients are not analogous to Excel and spreadsheet files: if you open a spreadsheet in Excel, and then open it in Apple Numbers or Google Spreadsheets, you may end up with very different results.
A database client, however, is meant to be a thin shell over the database engine itself.
Going Mr. Robot with the command-line
(this is a stub section, though going forward, most of my SQLIte demonstrations will be from the terminal)
SQLite doesn't have many bells and whistles, but it does come with a text-only client. I highly recommend it if you ever find yourself hating having to learn point-and-click interfaces. It's not for novices, but once you become confident in SQL as a language, you'll enjoy its minimalism.
Here's the official docs/instructions for SQLite's "command line shell".
How to activate it: Go to your system's shell or Terminal and type sqlite3
to run the SQLite database engine:
That's basically what's going on underneath the hood of DB Browser and every other client that you execute SELECT 'hello world';
on.
As with DB Browser, we typically want to jump into a database when launching SQLite. From the command-line, we can specify the name of the database that we want to use. In the snippet below, I open a database that's in my Downloads
folder, then run a few meta commands so that the text output is easier to read before I run a query:
References
-
SQLite Data Starter Packs
This is a collection of public datasets conveniently packaged as SQLite databases to practice on. You don't have to worry about the data cleaning/import process, just download the SQLite database files and query them from your favorite SQLite client. -
DB Browser for SQLite Homepage
DB Browser is a cross-platform SQLite client with a relatively friendly point-and-click interface for easing the transition from Excel to SQL. -
Official docs for Command Line Shell For SQLite
Do you often imagine yourself as Darlene/Elliott/Trenton/Tyrell from "Mr. Robot"? SQLite comes with a spartan command-line interface for writing and executing queries.