Midterm SQL
A Midterm of SQL. Read the questions. Write the answers on pieces of paper.
Simple Data
Used for "Match the query" and "Write out the query".
"people" table
SELECT *
FROM people;
name | sex | age | eyes |
---|---|---|---|
Austin | M | 10 | blue |
Blair | M | 32 | brown |
Carolina | F | 28 | brown |
Dani | F | 22 | blue |
Eliza | F | 16 | blue |
"homes" table
SELECT *
FROM homes;
owner_name | area | value |
---|---|---|
Austin | urban | 500 |
Blair | suburbs | 800 |
Carolina | suburbs | 100 |
Carolina | urban | 200 |
Dani | country | 400 |
Donald | urban | 200 |
Eliza | urban | 100 |
Match the query
Just match the right query to the given result. (1 point each)
Example: Get a sum of home values, where the owner is female, grouped by eye color of the owner and area, ordered by highest total and then area (alphabetically)
(this is the toughest kind of query you'll have to do)
eyes | area | total |
---|---|---|
blue | country | 400 |
brown | urban | 200 |
brown | suburbs | 100 |
blue | urban | 100 |
The answer to this is XC:
XA.
SELECT
eyes, area, value AS total
FROM homes
INNER JOIN
people
ON people.sex = 'F'
ORDER BY total DESC;
XB.
SELECT
eyes, area, SUM(value) AS total
FROM homes
INNER JOIN
people
WHERE people.name = homes.owner_name
AND sex = 'F'
GROUP BY eyes, area
ORDER BY total, area;
XC.
SELECT
eyes, area, SUM(value) AS total
FROM homes
INNER JOIN
people
ON people.name = homes.owner_name
WHERE sex = 'F'
GROUP BY eyes, area
ORDER BY total DESC,
area ASC;
1. From "people", select all the males
The result:
name | sex | age | eyes |
---|---|---|---|
Austin | M | 10 | blue |
Blair | M | 32 | brown |
1A.
SELECT *
FROM people ;
1B.
SELECT *
FROM people
WHERE sex = 'M';
1C.
SELECT 'M'
FROM people;
2. List the values of the first 3 "homes", in order of value
The result:
value |
---|
800 |
500 |
400 |
2A.
SELECT value
FROM homes
ORDER BY
value DESC
LIMIT 3;
2B.
SELECT *
FROM homes
LIMIT 3
ORDER BY
value DESC;
2C.
SELECT *
FROM homes
WHERE value > 400;
3. Count the number of people with brown eyes
COUNT(*) |
---|
2 |
3A.
SELECT *
FROM people
GROUP BY eyes;
3B.
SELECT eyes
FROM people
WHERE eyes = "brown";
3C.
SELECT COUNT(*)
FROM people
WHERE eyes = "brown";
4. Count the number of people grouped by sex, in descending order of count
sex | the_count |
---|---|
F | 3 |
M | 2 |
4A.
SELECT
COUNT(*) AS the_count
FROM people;
4B.
SELECT sex,
COUNT(*) AS the_count
FROM people
GROUP BY sex;
4C.
SELECT GROUP BY(sex),
COUNT(*) AS the_count
FROM people;
5. List all the fields for the people over 20, in order of descending age
name | sex | age | eyes |
---|---|---|---|
Blair | M | 32 | brown |
Carolina | F | 28 | brown |
Dani | F | 22 | blue |
5A.
SELECT *
FROM people
WHERE age > 20
ORDER BY age;
5B.
SELECT *
FROM people
WHERE age > 20
ORDER BY age DESC;
5C.
SELECT *
FROM people
ORDER BY age DESC;
6. List the sum of the value of homes by area, in order of the sum per area.
area | total |
---|---|
urban | 1000 |
suburbs | 900 |
country | 400 |
6A.
SELECT area,
SUM(value) AS total
FROM homes
GROUP BY area
ORDER BY total DESC;
6B.
SELECT area,
value AS total
FROM homes
GROUP BY area
ORDER BY total DESC;
6C.
SELECT area,
SUM(value) AS total
FROM homes
ORDER BY total DESC;
7. Join all the "homes" to their proper owner in "people", in reverse alphabetical order of the owner's name
owner_name | area | value | name | sex | age | eyes |
---|---|---|---|---|---|---|
Eliza | urban | 100 | Eliza | F | 16 | blue |
Dani | country | 400 | Dani | F | 22 | blue |
Carolina | suburbs | 100 | Carolina | F | 28 | brown |
Carolina | urban | 200 | Carolina | F | 28 | brown |
Blair | suburbs | 800 | Blair | M | 32 | brown |
Austin | urban | 500 | Austin | M | 10 | blue |
7A.
SELECT *
FROM homes
ORDER BY
homes.owner_name DESC;
7B.
SELECT *
FROM homes
INNER JOIN
people ON
people.name = homes.owner_name
ORDER BY
homes.owner_name DESC;
7C.
SELECT *
FROM homes
WHERE
people.name = homes.owner_name
ORDER BY homes.owner_name DESC;
8. Find the homes owned by men, in descending order of value
owner_name | area | value |
---|---|---|
Blair | suburbs | 800 |
Austin | urban | 500 |
8A.
SELECT
owner_name, area, value
FROM homes
INNER JOIN
people ON
WHERE sex = 'M'
ORDER BY owner_name DESC;
8B.
SELECT
owner_name, area, value
FROM homes
WHERE people.name = owner_name
AND sex = 'M'
ORDER BY owner_name DESC;
8C.
SELECT
owner_name, area, value
FROM homes
INNER JOIN
people ON
people.name = owner_name
WHERE sex = 'M'
ORDER BY owner_name DESC;
9. Count the number of homes owned by women
COUNT(*) |
---|
4 |
9A.
SELECT COUNT(*)
FROM homes
INNER JOIN
people ON
homes.owner_name = people.name
WHERE sex = 'F';
9B.
SELECT sex, COUNT(*)
FROM homes
INNER JOIN
people ON
homes.owner_name = people.name
9C.
SELECT COUNT(*)
FROM homes
INNER JOIN
people
WHERE sex = 'F';
10. List the count and total value of homes, grouped by owner, in descending order of total value
owner_name | total_value | total_count |
---|---|---|
Blair | 800 | 1 |
Austin | 500 | 1 |
Dani | 400 | 1 |
Carolina | 300 | 2 |
Eliza | 100 | 1 |
10A.
SELECT GROUP BY(owner_name),
SUM(value) AS total_value,
COUNT(*) AS total_count
FROM homes
INNER JOIN
people ON
homes.owner_name = people.name;
10B.
SELECT owner_name,
SUM(value) AS total_value,
COUNT(*) AS total_count
FROM homes
INNER JOIN
people ON
homes.owner_name = people.name
GROUP BY owner_name
ORDER BY total_value DESC;
10C.
SELECT owner_name,
SUM(value) AS total_value,
COUNT(*) AS total_count
FROM homes
INNER JOIN
people ON
homes.owner_name = people.name;
11. List all homes owned by people whose names begin with 'D'
Hints:
-
No join or aggregation needed, just making sure you know how to use a function.
-
Use the substring function,
SUBSTR
, which the first argument is the string/column to "slice", the second argument is where to start cutting from, and the third argument is where to end the slice:
SELECT SUBSTR(owner_name, 1, 3) AS letters
FROM homes
LIMIT 3;
letters |
---|
Aus |
Bla |
Car |
(/ end of hints)
The result for the query for this given question:
first_initial | owner_name | area | value |
---|---|---|---|
D | Dani | country | 400 |
D | Donald | urban | 200 |
11A.
SELECT SUBSTR(owner_name, 1, 1) AS 'D',
homes.*
FROM homes;
11B.
SELECT SUBSTR(owner_name, 1, 1) AS first_initial,
homes.*
FROM homes
WHERE first_initial = 'D';
11C.
SELECT SUBSTR(owner_name, 1, 1)
FROM homes
WHERE first_initial = 'D';
Write out the query
(2 points each)
The results are provided, just write the query to get the answer.
Example: List all the homes owned by women, in order of value
The results:
owner_name | area | value |
---|---|---|
Dani | country | 400 |
Carolina | urban | 200 |
Carolina | suburbs | 100 |
Eliza | urban | 100 |
The answer:
SELECT homes.*
FROM homes
INNER JOIN
people
ON people.name = homes.owner_name
WHERE people.sex = 'F'
ORDER BY value DESC;
1. List the homes owned by "Carolina"
owner_name | area | value |
---|---|---|
Carolina | suburbs | 100 |
Carolina | urban | 200 |
2. List all the men in order of oldest to youngest
name | sex | age | eyes |
---|---|---|---|
Blair | M | 32 | brown |
Austin | M | 10 | blue |
3. Sum the values of homes
SUM(value) |
---|
2300 |
4. Calculate the count of people grouped by eye color, in ascending order of the count
(requires group-by-aggregation, but not a JOIN)
eyes | the_count |
---|---|
brown | 2 |
blue | 3 |
5. List all homes owned by brown-eyed men
(requires a JOIN, but not an aggregation)
owner_name | area | value |
---|---|---|
Blair | suburbs | 800 |
Twitter Joining
A set of tweets and hashtags from those tweets from Election 2016. The hypothetical results of the queries are given, just write the query.
NOTE: Only do questions 1, which just requires the tweets
table.
Question 2 is extra credit.
Twitter Tables
"tweets" - A list of tweets
ID | time_tweeted | screen_name | Text |
---|---|---|---|
789104196003868673 | 2016-10-20 14:01:15 | piersmorgan | My new @DailyMail column is about last night's final presidential TV debate. Posting soon. #Trump #Clinton https://t.co/lxEkbMPHAr |
789102711551635456 | 2016-10-20 13:55:21 | creynoldsnc | Get out there, North Carolina. Let's #TurnNCBlue https://t.co/6g0EeYDfWa |
789101425557110784 | 2016-10-20 13:50:15 | brianstelter | A first look at last night's #debate ratings… coming up next on @CNN TV with @CarolCNN… |
789095747635572736 | 2016-10-20 13:27:41 | realdonaldtrump | Thank you America! #MAGA |
789088297989640192 | 2016-10-20 12:58:05 | brianstelter | RT @ExtremeLiberal: Go @brianstelter for going after the "only elites" are concerned about our democratic process. @cnn #cnn |
tw_hashtags (optional table for optional question)
Hashtags as extracted from the text of tweet
For example, a tweet with ID 101
, with this text:
I bought and sold my first #apple computer
Has a corresponding entry in tw_hashtags
of:
tweet_id | tag |
---|---|
101 | apple |
(note: a tweet can have more than one hashtag, but that's not important for this section.)
Here's a sample from the table `tw_hashtags:
tweet_id | tag |
---|---|
325013411014520832 | aspen |
325624214998626305 | regionaldirector |
325624214998626305 | timbayly |
325624214998626305 | dannythomasmemorialpavillion |
326138349939212288 | celebapprentice |
Example: Get the 5 hashtags most frequently used in tweets sent on July 2016
Result:
tag | hcount |
---|---|
rncincle | 349 |
demsinphilly | 167 |
hannity | 161 |
trumppence16 | 83 |
womenwhowork | 78 |
Query:
SELECT tag,
COUNT(*) AS hcount
FROM tw_hashtags
INNER JOIN
tweets ON
tw_hashtags.tweet_ID = tweets.ID
WHERE
SUBSTR(time_tweeted, 1, 7) = "2016-07"
AND
GROUP BY tag
ORDER BY hcount DESC
LIMIT 5;
(The two following questions are not this difficult; this just is a reminder of all the concepts and where they go)
Questions
T1. Count the tweets that mention "vote" somewhere in their text
(2 points)
Hint: Remember to use LIKE
and the wildcard, %
, to see if some text string is part of another text string.
Expected answer:
tcount |
---|
3978 |
T2. (3 points Extra credit)
(3 points)
The prompt:
For the user named 'realdonaldtrump' list his 5 most used hashtags when tweeting in the hours between midnight and 5 A.M.
Basically, find all tweets by this user in that time period, connect them to their hashtags, then do a group count. All of the components are in in the example.
Hints
Hint: Don't forget that we just want it for the user with a screen name of realdonaldtrump
Hint: To get a tweet that was posted at midnight, i.e. 00
, you can use SUBSTR
to extract digits. Or you should just use STRFTIME
:
SELECT * from tweets
WHERE
STRFTIME("%H", time_tweeted) = "00"
ID | Posted at | Screen name | Text |
---|---|---|---|
795430555739373568 | 2016-11-07 00:59:57 | ingrahamangle | Sadly 12,000 ppl were unable to get into the #Minneapolis Event bc fire marshal shut it down! @realDonaldTrump https://t.co/HNr2BTcVJ8 |
(/end of hints)
The expected result of the query:
tag | hcount |
---|---|
trump2016 | 94 |
makeamericagreatagain | 74 |
bigleaguetruth | 57 |
debate | 45 |
maga | 41 |
Officer-involved shootings
Data
These data tables are example, simplified excerpts from the L.A. county shooting incidents. They do not represent the LA county's actual data but have been simplified for exam results.
deputies
INCIDENT NUMBER | INCIDENT DATE | INCIDENT TYPE | DEPUTY RACE | DEPUTY AGE | YEARS OF SERVICE | # OF PREVIOUS SHOOTINGS |
---|---|---|---|---|---|---|
1000914 | 2016-10-31 | HIT SHOOTING INCIDENT | FILIPINO | 47 | 0 | 0 |
1000914 | 2016-10-31 | HIT SHOOTING INCIDENT | HISPANIC | 30 | 1 | 0 |
1000684 | 2016-02-14 | HIT SHOOTING INCIDENT | HISPANIC | 32 | 8 | 0 |
1000874 | 2016-09-08 | HIT SHOOTING INCIDENT | WHITE | 26 | 4 | 0 |
1000450 | 2013-07-19 | HIT SHOOTING INCIDENT | HISPANIC | 44 | 18 | 1 |
people
A list of people (who were shot) connected to each shooting incident.
0
and 1
are meant to stand for "false" and "true", respectively.
INCIDENT NUMBER | PERSON AGE | PERSON RACE | DECEASED | WOUNDED | MENTAL HEALTH CONCERNS | ON PAROLE |
---|---|---|---|---|---|---|
1000472 | 46 | WHITE | 0 | 0 | 0 | 0 |
1000380 | 35 | HISPANIC | 0 | 0 | 0 | 1 |
1000467 | 49 | BLACK | 1 | 0 | 0 | 1 |
1000332 | 25 | HISPANIC | 0 | 0 | 0 | 1 |
1000654 | 53 | BLACK | 1 | 0 | 0 | 0 |
Example
To get the race of the deputy and person in the 5 most recent incidents in which the person died:
(remember to quote fields that have non-alphanumeric characters, i.e. "DEPUTY RACE"
)
SELECT people."INCIDENT NUMBER",
"INCIDENT DATE",
"PERSON RACE",
"DEPUTY RACE"
FROM people
INNER JOIN
deputies ON
deputies."INCIDENT NUMBER" = people."INCIDENT NUMBER"
WHERE "DECEASED" = 1
ORDER BY "INCIDENT DATE" DESC
LIMIT 5;
INCIDENT NUMBER | INCIDENT DATE | PERSON RACE | DEPUTY RACE |
---|---|---|---|
1000896 | 2016-10-14 | HISPANIC | HISPANIC |
1000895 | 2016-10-14 | BLACK | WHITE |
1000894 | 2016-10-14 | WHITE | HISPANIC |
1000874 | 2016-09-14 | WHITE | WHITE |
1000874 | 2016-09-14 | WHITE | WHITE |
Questions
P1. Get the count of incidents grouped by the person's race and the deputy's race, in descending order of incident count
(3 points)
Write out this query. It requires a JOIN
and GROUP BY
. And remember to quote field names, i.e. "DEPUTY RACE"
, when necessary. And alias the count of incidents as 'icount'
(see the header below)
Example result:
PERSON RACE | DEPUTY RACE | icount |
---|---|---|
HISPANIC | HISPANIC | 65 |
HISPANIC | WHITE | 42 |
BLACK | HISPANIC | 25 |
BLACK | WHITE | 23 |
WHITE | WHITE | 7 |
WHITE | HISPANIC | 3 |
P2. Essay SQL
(6 points)
A lot of police shootings analyses focus on race of officer and subject. But there are often many other aspects to the data.
- Describe a question you have of the data.
- Tell me why it's interesting, or at least more interesting to you beyond total number of shootings and race of the persons involved.
- Write out the SQL query using the
deputies
and/or thepeople
tables. - Sketch out the table results. They don't have to be correct answers, since you don't have the whole data at hand. I'm looking to see what you expect the data's format to look like, based on your query.
Your query for #4 doesn't have to be as complicated as the example or previous problem. But it should have at least:
- A filtering of the data by some condition
- An aggregation (count, average, max, min) or a JOIN statement between people and deputies.
- Be selective of the columns, i.e. don't use
*
In other words, I want something more complicated than this:
SELECT * FROM deputies
LIMIT 5;