SQLite Simple Folks: WHERE and LIKE

How to filter data based on true/false conditions, including fuzzy-text matching

This article is part of a series: SQLite Simple Folks: Overview

Table of contents

Get the data

Once you've downloaded simplefolks.sqlite data file onto your computer, open it with your SQLite client and be ready to write some queries.

How to hack better filters with LIMIT and ORDER BY

In all honesty, knowing LIMIT and ORDER BY puts us in striking range of most of the interesting things we want to know from data, i.e. given a list of people/businesses/politicians, find the 5 or 10 or 100 richest/poorest/oldest/happiest/most despicable.

To refresh our memory of the people table, here's how to find the youngest person among all the people:

SELECT name, age, sex
FROM people
ORDER by age ASC
LIMIT 1;
name age sex
Leland 16 M

Being able to just sort by columns is so powerful that, when we need to filter the data in not-so-trivial way, we can frequently improvise a hack by just being clever about the data.

In the above query, we not only found the youngest person, but the youngest male. But how do we find the youngest female, that is, without manually searching the database or even changing the LIMIT 1 clause?

Well, we know that females are represented as F in the sex column. If we first sort people by sex in ascending order, the records at the top of the results will all be females.

Thus, making age ASC be the second ORDER BY criteria will put the record belonging to the youngest female at the top:

SELECT name, age, sex
FROM people
ORDER BY 
  sex ASC, age ASC
LIMIT 1;
name age sex
Georgina 19 F

Filtering with the WHERE clause

Being clever and making use of what little you have is good exercise. But there's a point where learning new syntax is much more efficient than thinking up hacks (especially hacks that won't apply to more complex data).

In the previous example, it should have felt very convoluted to create a complex sort to make female-associated records rise to the top, on account of the letter F being alphabetical priority over M for "males".

It feels far more direct for our query to ask just give me the records that have a sex of 'M', THEN do the sort

That's where WHERE comes in. Ignore LIMIT and ORDER for now; we use WHERE after we've selected data from a table and we want to filter it.

Here's how to retrieve only people who are listed as male:

SELECT *
FROM people
WHERE sex = 'M';
name sex age
Austin M 33
Blair M 90
Donald M 70
Leland M 16
Liam M 22
Michael M 48
Zed M 42

How the WHERE clause works is that we give it a true/false test, in this case sex = 'M', and it runs that test on every record in the data, e.g. imagine it asking each row in people: Do you have a 'M' in your sex column?. For the rows in which this test is true, they move along to the next part of the SQL processing, such as ORDER BY and LIMIT. For the rows that evaluate as false, they're ignored as far as results are concerned.

(though it's worth emphasizing that nothing gets deleted – SELECT statements don't delete rows no matter what filtering is used)

Re-running the above query, but adding on ORDER BY and LIMIT to find the youngest man:

SELECT *
FROM people
WHERE sex = 'M'
ORDER BY age ASC
LIMIT 1;

To find the youngest woman, we just change that conditional expression (or test, as I sometimes call it) to this:

WHERE sex = 'F'

And now, no more "clever", convoluted sorting that takes advantage of how the English language works:

SELECT *
FROM people
WHERE sex = 'F'
ORDER BY age ASC
LIMIT 1;

Comparison operators

Let's do a quick rundown of all the ways we can express a true/fall test.

Not equal

To negate an expression that checks for equality, put an exclamation mark before the equals sign. The following query would list all people whose sex are not female, which, in a more complex dataset, would be more than just males:

Greater/less than

SQL has the standard comparison operators learned in arithmetic school.

To find all people older than 60:

SELECT *
FROM people
WHERE age > 60;

To find all homes (remember there are other tables) with a value less than $100,000:

SELECT *
FROM homes
WHERE value < 100000;
owner_name area value
Blair suburbs 95000
Dani country 67000
Georgina country 82000
Leland country 42000
Michael country 82000
Phoebe urban 77000

Greater/less than or equal to

Append an equals sign to the > and < operators to express a condition like, "all years up to and including 1999".

To select all homes with a value of at least $400,000, i.e. greater than or equal to $400,000:

owner_name area value
Donald urban 450000
Donald urban 660000
Hillary urban 400000

Comparing string literals

Comparison operators aren't just for math. String literals can be compared alphabetically. For example, the following query would be just fine. I add a ORDER BY name so we can better see which names have been cut off:

SELECT * 
FROM people
WHERE name < 'Dani'
ORDER BY name;

Note how even how the record for Dani has been excluded because Dani is less than Dani, it is equal to it:

name sex age
Austin M 33
Blair M 90
Carolina F 28

Meanwhile, every name after Dani alphetically, starting with Donald, is also excluded.

Let's break it down to a single expression: is Dani less than Donald?

  'Dani' < 'Donald'

This evaluates to true because Donald comes after Dani in the alphabet, just like this numerical comparison also evaluates to true101 comes after 99:

   99 < 101

1 and 0 and true and false in SQLite

But don't take my word, test this with what you know about SQL:

SELECT 
  'Dani' < 'Donald', 
  99 < 101;
'Dani' < 'Donald' 99 < 101
1 1

The result of 1 is considered to be true. For those of you coming from another programming language, SQLite does not have a special type for true and false; just 1 and 0 respectively.

Just to confirm, let's select for something obviously false:

SELECT
  1 = 1
  1 = 2;
1 = 1 1 = 2
1 0

When the WHERE conditional is always true or false

Let's revisit the query for finding the youngest person in people, ignoring sex (and the WHERE condition):

SELECT *
FROM people
ORDER BY age ASC
LIMIT 1;

The result is Leland:

name sex age
Leland M 16

Now let's add a WHERE clause, but with an incredibly dumb conditional expression. If we can write a query as dumb as SELECT 1 = 1; – and SQLite will still run it, then what's stopping us from putting 1 = 2 in the WHERE condition?

Nothing. That's perfectly valid SQL.

Now try to picture what happens: the WHERE clause evaluates this always false test for every row selected. Since every row gets a false, every row is excluded from the results:

SELECT *
FROM people
WHERE 1 = 0
ORDER BY age ASC
LIMIT 1;

You should get an empty result set, but not a SQL errorhuge difference.

OK, let's try for something always true:

SELECT *
FROM people
WHERE 1 = 1
ORDER BY age ASC
LIMIT 1;

What did you expect?

name sex age
Leland M 16

If you run the query sans the LIMIT clause, the result will include every record because for every record, 1 = 1 is true.

Hence, this query with WHERE:

SELECT *
FROM people
WHERE 1 = 1
ORDER BY age ASC
LIMIT 1;

Is equivalent to:

SELECT *
FROM people
ORDER BY age ASC
LIMIT 1;

Which means that WHERE 1 = 1 is basically a no-op, i.e. a useless appendage of SQL code.

Fuzzy matching with LIKE

Often times, with real-world data and names, we don't know quite exactly how a name might be spelled. Someone could go by Jon in one database, and Jonathan in another.

In other programming languages and software packages, the concept of a wildcard is well-known.

A Google search for the double-quoted string of "dan boone" will only return results with literally, "dan boone". Not "danny boone". However, use the asterisk for a wildcard, and Google will know to return results for "daniel boone" as well as "dan boone" and everything in between:

[https://www.google.com/search?q="dan%20boone"](https://www.google.com/search?q="dan%20boone")

SQLite has the same functionality. But instead of using WHERE, we use LIKE. And instead of using the asterisk * for "zero or more of any character", we use the percentage sign, %.

LIKE is used the same spot as a comparison operator such as = or > would be used.

Let's pretend we don't know about the wildcard and try to find all people with names beginning with d. Here's a start:

SELECT * 
FROM people 
WHERE name LIKE 'D';

The result should be empty.

Now, try a wildcard:

SELECT * 
FROM people 
WHERE name LIKE 'D%';
name sex age
Dani F 41
Donald M 70

We can even put the wildcard on the opposite side of the searched-for value, e.g. to find all people with names that end in d:

SELECT * 
FROM people 
WHERE name LIKE '%d';
name sex age
Donald M 70
Leland M 16
Zed M 42

Note: for reasons that won't make sense until you, for some reason, learn database optimization, searching for what a value ends with will often be extremely suboptimal, because of how database values are indexed. It's usually better to search from left-to-right, but that doesn't affect us in this with the simple database at hand.

The IN keyword

LIKE is great, but it can be a bit too permissive with its wildcard. Sometimes we don't want a "fuzzy" match, but a match against exact values.

Instead of finding rows in which a person's name begins with D, what if we already know we want Dani and Donald? Then we can call IN:

(I especially recommend capitalizing IN so that it looks obviously like a SQL keyword)

SELECT * FROM people
WHERE
  name IN('Dani', 'Donald');

NOT IN for negation

To exclude known values, put NOT in front of IN:

SELECT * FROM people
WHERE
  name NOT IN('Dani', 'Donald');

BETWEEN for ranges

BETWEEN is another expression. Instead of using parentheses, we pass in two numbers, with they keyword AND in between:

SELECT * FROM people
WHERE 
  age BETWEEN 30 AND 60;
name sex age
Austin M 33
Dani F 41
Eliza F 37
Michael M 48
Phoebe F 52
Sherry F 39
Zed M 42

A bunch of things about string literals

Now that we've learned LIKE, it's worth pointing out how SQLite, by default, does not treat an uppercased-word equal to its lowercased version. The following query will return False:

SELECT 'DAN' = 'dan';

Which means that the following queries will return nothing from the people table:

SELECT * 
FROM people 
WHERE name = 'dani';


SELECT * 
FROM people 
WHERE name IN ('dani', 'donald');

The general exception is LIKE, which by default, will be _case-insensitive, even without a wildcard:

SELECT * 
FROM people 
WHERE name LIKE 'dani';

The following queries will get the same results:

SELECT * 
FROM people 
WHERE name LIKE 'd%';

SELECT * 
FROM people 
WHERE name LIKE 'D%';

Minding string literals

When we single-quote a value, SQLite will treat it as literal as possible. This can be annoying at first, especially when we're so used to Googling for daniel boone instead of Daniel Boone. But exactness is an inherent side-effect and feature of SQLite.

That said, the world is quite horrible at spelling, nevermind capitalizing proper nouns. When a database could contain Dani or DANI, I take no chances and just use LIKE:

SELECT * 
FROM people 
WHERE name LIKE 'dani';

Sorting string literals

If dani isn't equal to Dani, then it seems logical that lowercase letters are either greater or less than uppercase letters, because they can't be equal:

SELECT 'a' = 'A',
  'a' < 'A',
  'a' > 'A',
  'a' > 'ZZTOP';

Because lowercase-a comes later in the ASCII character set than uppercase-A, when it comes to sorting, a will be considered greater than letters A through Z:

'a' = 'A' 'a' < 'A' 'a' > 'A' 'a' > 'ZZTOP'
0 0 1 1

Sorting numbers that are strings

Special care must be taken when dealing with numbers that are single-quoted:

SELECT
  9 > 10, 
  '9' > '100000010';

In the first value, 9 is being compared to 10, as a number. 9 is less than 10, obviously.

But in the second comparison, 9, as a text character, is being compared to 1, because 100000010 is treated like any other word. 9 beats 1 and thus 100000010, just like d comes after allegory, because d is after a.

Conclusion

A whole chapter could be devoted to string literals, numbers, and data types in general. I only bring them up here so that you're at least aware of where mistakes can be made.

In the next lesson, we look at more forms of Boolean logic, to make more complex WHERE filters.

This article is part of a series: SQLite Simple Folks: Overview