You signed in with another tab or window.Reload to refresh your session.You signed out in another tab or window.Reload to refresh your session.You switched accounts on another tab or window.Reload to refresh your session.Dismiss alert
Run a short set of SQL queries in Google BigQuery against the NYC Yellow Taxi public dataset to practice cloud data warehouse basics (public datasets, SQL, cost awareness).
Setup (5–10 minutes)
Sign in with a Google account and open BigQuery in the Google Cloud Console.
Use BigQuery Sandbox (no billing) or a classroom project provided by the instructor.
In the Explorer, locate the NYC Taxi public dataset (Yellow Trips). Choose a 2023 table if available. Tip: use the dataset browser and copy the fully qualified table name from the UI.
Instructions
Write each query in a separate cell/tab and label it with the exercise number in a comment.
Prefer date filters to limit scanned bytes. Use preview to confirm schemas before running.
Keep notes of any assumptions (e.g., which exact table you used).
Exercises (submit queries 1–6; 7–10 are optional)
Mandatory (1–6):
Count the number of trips in January 2023
Calculate the total revenue generated by taxi trips in 2023
Find the most popular pickup location
Analyze the number of trips per hour of the day
Calculate the average trip distance
Find the longest trip by distance
Optional (7–10):7. Calculate the total number of passengers by payment type8. Find the most common drop-off location for trips paid by credit card9. Calculate the total number of trips that had more than 4 passengers10. Subquery — Find the average fare for trips longer than the average trip distance
Hint: Use functions like EXTRACT, DATETIME/TIMESTAMP functions, GROUP BY, ORDER BY, and LIMIT. For revenue, sum fare components (e.g., fare_amount + tip_amount + tolls where applicable), based on the table schema you select.
Deliverables (submission)
Submit your solutions via a Pull Request.
Safety and cost notes
Public datasets are free to query; scanning bytes count toward sandbox limits. Always filter by date when possible.
Use the query validator to review “bytes processed” before running.