TDM 20100: Project 9 — SQL
Motivation: Now we learn how to write SQL queries that rely on more than one table.
Context: The JOIN in SQL enables us to make queries that rely on information from multiple SQL tables. It is absolutely important to tell SQL which rows need to agree, by including the ON portion of the JOIN statement.
Scope: SQLite queries use a JOIN to gather information from more than one table.
Dataset(s)
This project will use the following dataset:
-
/anvil/projects/tdm/data/lahman/lahman.db(Lahman baseball database)
Our page in The Examples Book about SQL (in general) is given here: the-examples-book.com/tools/SQL/
|
Before you begin the project, try the examples from the Lahman baseball database found on this webpage of The Examples Book: the-examples-book.com/tools/SQL/lahman-examples-one-join All of these examples rely on one |
Questions
Using the seminar kernel, if you run this line in a cell by itself:
%sql sqlite:////anvil/projects/tdm/data/lahman/lahman.db
|
If your kernel dies, then you need to re-run the line above. You also need to re-run this line at the start of any new Jupyter Lab session. |
Again, we remind students that the list of all of the tables in this database are:
AllstarFull
Appearances
AwardsManagers
AwardsPlayers
AwardsShareManagers
AwardsSharePlayers
Batting
BattingPost
CollegePlaying
Fielding
FieldingOF
FieldingOFsplit
FieldingPost
HallOfFame
HomeGames
Managers
ManagersHalf
Parks
People
Pitching
PitchingPost
Salaries
Schools
SeriesPost
Teams
TeamsFranchises
TeamsHalf
Please read the examples given here: the-examples-book.com/tools/SQL/lahman-examples-one-join and then you are ready to start the questions for this project!
|
In the page of examples, sometimes we write |
Question 1 (2 pts)
Join the Batting table to the People by matching the playerID values in these two tables. For all 48 players on the 2023 Chicago Cubs team, print their PlayerID (from either table), as well as their hits (H) and home runs (HR) from the Batting table, and also their nameFirst and nameLast from the People table.
-
Print the
playerID,H,HR,nameFirst, andnameLastvalues for all 48 of the players on the 2023 Chicago Cubs team.
Question 2 (2 pts)
Join the Batting table to the Pitching table by matching the playerID, yearID, and stint columns. There is only one person from 2023 appearing in both of these tables that hit more than 30 home runs. Print this person’s playerID and the number of home runs (HR) that they attained (from the Batting table).
-
Print the
PlayerIDand the number of home runs (HR) from theBattingtable for the only person who is in both theBattingandPitchingtable in 2023 who had more than 30 home runs (HR) in theBattingtable.
Question 3 (2 pts)
In this question, we will accomplish everything from Project 8, Question 3abc in just one query.
Join the People and Batting table by matching the playerID values in these two tables. Print only 1 row, corresponding to Rickey Henderson, displaying his playerID, nameFirst, nameLast, SUM(R), and SUM(SB) values.
|
He had 2295 runs scored altogether and 1406 stolen bases. |
-
Print only 1 row, corresponding to Rickey Henderson, displaying his
playerID,nameFirst,nameLast,SUM(R), andSUM(SB)values.
Question 4 (2 pts)
-
As in Project 8, Question 4a, use the
Battingtable but now alsoJOINthePeopletable (by matching theplayerIDvalues), to find the top 5 players of all time, in terms of their total number of hits, in other words, according toSUM(H). For the top 5 players (in terms of the total number of hits), print theirplayerID, theSUM(H)(in other words, their total number of hits in their careers), and theirnameFirstandnameLastvalues. -
Same question as 4b, but this time use home runs (according to
SUM(HR)) instead of hits.
-
For the top 5 players (in terms of the total number of hits), print their
playerID, theSUM(H)(in other words, their total number of hits in their careers), and theirnameFirstandnameLastvalues. -
For the top 5 players (in terms of the total number of home runs), print their
playerID, theSUM(HR)(in other words, their total number of home runs in their careers), and theirnameFirstandnameLastvalues.
Question 5 (2 pts)
Join the CollegePlaying and People tables on the playerID values. Print the DISTINCT(playerID) and nameFirst and nameLast values from the People table for each of the 15 distinct players that have schoolID = 'purdue' in the CollegePlaying table.
-
Print the
DISTINCT(playerID)andnameFirstandnameLastvalues from thePeopletable for each of the 15 distinct players that haveschoolID = 'purdue'in theCollegePlayingtable.
Submitting your Work
We hope that you enjoyed learning about databases this week! Please let us know if we can assist, as you are learning these new ideas!
-
firstname-lastname-project9.ipynb
|
You must double check your You will not receive full credit if your |