TDM 20100: Project 11 — SQL
Motivation: Now we will apply our SQL skills by studying movies and TV shows.
Context: The Internet Movie DataBase www.imdb.com provides data tables here: datasets.imdbws.com which we have stored in a database for you here: /anvil/projects/tdm/data/movies_and_tv/imdb2024.db
Scope: There are 7 tables to get familiar with: akas, basics, crew, episode, name, principals, ratings
Dataset(s)
This project will use the following dataset:
-
/anvil/projects/tdm/data/movies_and_tv/imdb2024.db(Internet Movie DataBase (IMDB))
Our page in The Examples Book about SQL (in general) is given here: the-examples-book.com/tools/SQL/
Questions
Using the seminar kernel, if you run this line in a cell by itself:
%sql sqlite:////anvil/projects/tdm/data/movies_and_tv/imdb2024.db
then you will have the movies and TV database loaded.
|
If your kernel dies, then you need to re-run the line above. You also need to re-run this line at the start of any new Jupyter Lab session. |
The tables in this database are:
akas
basics
crew
episode
name
principals
ratings
Question 1 (2 pts)
-
From the
basicstable, display the entry for Friends. (The title constant for Friends istt0108778. Notice that this corresponds to the IMDB webpage for Friends: www.imdb.com/title/tt0108778 from IMDB.) -
Find all of the entries of the
principalstable that correspond to people in Friends. -
Use the
episodetable to discover how many episodes occurred during each season of Friends. For each season, print the season number and the number of episodes in that season.
|
Notice that the |
-
From the
basicstable, display the entry for Friends. -
Find all of the entries of the
principalstable that correspond to people in Friends. -
Use the
episodetable to discover how many episodes occurred during each season of Friends. For each season, print the season number and the number of episodes in that season.
Question 2 (2 pts)
Join the ratings and the basics table, to find the 13 titles that each have more than 2 million ratings. For each such title, output these values: tconst, averageRating, numVotes, primaryTitle, startYear, runtimeMinutes, and genres
-
For each of the 13 titles that each have more than 2 million ratings, output these values:
tconst,averageRating,numVotes,primaryTitle,startYear,runtimeMinutes, andgenres
Question 3 (2 pts)
Using the startYear values from the basics table, find the total number of entries in each startYear.
-
For each
startYearvalue from thebasicstable, print thestartYearand the total number of entries in corresponding to thatstartYear.
Question 4 (2 pts)
-
From the
nametable, find the nconst value for Emma Watson. (Notice that there are several entries with this name, but probably only one of them is the one that you want to analyze.) -
How many entries in the
principalstable correspond to Emma Watson (using only the correct value ofnconstthat you found in part a)?
-
From the
nametable, find the nconst value for Emma Watson. (Although several values appear, just find the 1 value that is correct for her.) -
How many entries in the
principalstable correspond to Emma Watson?
Question 5 (2 pts)
Join the basics and the ratings table to find the 3 entries that have startYear = 2024 and numVotes > 100000 and averageRating > 8. (Print all of the columns from both tables, for these 3 entries.)
-
Join the
basicsand theratingstable to find the 3 entries that havestartYear = 2024andnumVotes > 100000andaverageRating > 8. (Print all of the columns from both tables, for these 3 entries.)
Submitting your Work
We see that the SQL skills that we learned for the Lahman baseball database are directly applicable to analyzing the movies and TV database too! It is a good feeling to be able to apply what we have learned in a new setting!
-
firstname-lastname-project11.ipynb
|
You must double check your You will not receive full credit if your |