This repository is dedicated to solutions for LeetCode SQL questions implemented in PySpark.

bitoollearner/leetcode-pyspark

This repository is dedicated to unsolved LeetCode SQL questions implemented in PySpark.

This notebook contains a collection of SQL-related challenges sourced from LeetCode. The primary goal is to enhance SQL skills while leveraging PySpark, a powerful framework for large-scale data processing.

For each question, you will find:

  • A brief problem description
  • The relevant dataset or schema context
  • A step-by-step solution implemented using PySpark DataFrames and SQL functions

Question Breakdown by Difficulty:

Difficulty | Count
Easy | 111
Medium | 130
Hard | 60
Total | 301

All solutions in this notebook are implemented using PySpark to ensure consistency with big data processing practices. The approach uses PySpark’s Python API, DataFrame API, and built-in functions to solve problems efficiently.

Disclaimer:
All questions and problem statements belong to LeetCode. This notebook is solely for educational purposes, and all credit and ownership of the questions remain with LeetCode. If you have any questions, please reach out to info.bilearner@gmail.com.

Prerequisites

  • Databricks Account:
    • You need access to a Databricks account. Databricks provides a collaborative environment for big data analytics, including support for PySpark, which is needed to run the solutions.
  • Basic Knowledge of PySpark:
    • Familiarity with PySpark is essential. Users should understand how to write PySpark code to manipulate dataframes, perform transformations, and execute actions.
  • SQL Knowledge:
    • Basic knowledge of SQL is required to understand and write SQL queries within the PySpark framework. This includes understanding SQL syntax, querying databases, and manipulating data.
  • GitHub Repository Access:
    • Access to the GitHub repository containing the unsolved LeetCode SQL questions. Users should be able to clone the repository, review the solutions, and potentially contribute if allowed.

1. Setting up Databricks Premium (Paid Version)

Databricks Premium is a paid plan that offers advanced features such as higher compute power, security options, and integrations.

Step 1: Sign Up for Databricks

  1. Go to the Databricks website.
  2. Click "Start your free trial" (for a trial) or go to "Sign In" if you have an account.
  3. Choose "AWS", "Azure", or "GCP" as your cloud provider.
  4. Follow the registration process, providing details like your email, company, and cloud provider credentials.

Step 2: Create a Databricks Workspace

  1. In the cloud provider console (AWS, Azure, or GCP), create a Databricks workspace.
  2. Select the Premium plan during setup.
  3. Configure networking and security settings as required.
  4. Once created, launch the workspace from the cloud console.

Step 3: Create a Cluster

  1. Inside the Databricks workspace, go to Compute.
  2. Click Create Cluster.
  3. Choose a cluster name and select a runtime version (latest recommended).
  4. Select the number of workers (scale as needed).
  5. Click Create Cluster.

Step 4: Create a Notebook

  1. Navigate to Workspace > Users > Your Name.
  2. Click Create > Notebook.
  3. Name the notebook and select Python as the language.
  4. Attach it to your running cluster.

2. Setting up Databricks Community Edition (Free Version)

Databricks Community Edition is a free, limited version ideal for learning PySpark.

Step 1: Sign Up for Community Edition

  1. Go to the Databricks Community Edition signup page.
  2. Enter your email and complete the registration.
  3. Check your email for the verification link and activate your account.
  4. Log in to your Databricks Community workspace.

Step 2: Create a Cluster

  1. Click on Compute in the left panel.
  2. Click Create Cluster.
  3. Name your cluster.
  4. Choose the latest runtime version.
  5. Click Create Cluster (Community Edition supports only small clusters).

Step 3: Create a Notebook

  1. Go to Workspace > Users > Your Name.
  2. Click Create > Notebook.
  3. Name the notebook and select Python.
  4. Attach it to the running cluster.

Key Differences Between Premium and Community Edition

Feature | Databricks Premium | Databricks Community Edition
Price | Paid | Free
Cloud Providers | AWS, Azure, GCP | Databricks Cloud
Cluster Scaling | Scalable | Limited (Single Node)
Security Features | Advanced | Basic
Collaboration | Multi-user | Single-user

3. Step-by-Step Guide to Importing LeetCode SQL Questions Notebook into Jupyter Notebook

  1. Clone Your GitHub Repository:

    • First, ensure you have Git installed on your local machine. If not, download and install it from Git's official website.
    • Open your terminal (command prompt) and navigate to the directory where you want to clone your repository.
    • Clone your GitHub repository using the command:
      git clone <repository_url>
    • Replace <repository_url> with the URL of your GitHub repository. This will download your repository to your local machine.
  2. Install Required Dependencies:

    • Make sure you have Python installed on your machine. It's recommended to use Anaconda or Miniconda to manage your Python environments.
    • Install Jupyter Notebook and PySpark dependencies if you haven't already:
      pip install jupyter pyspark
  3. Launch Jupyter Notebook:

    • Navigate to the directory where your Jupyter Notebook files are located. Typically, this would be the root directory of your cloned repository.
    • Start Jupyter Notebook by running the command:
      jupyter notebook
    • This command will open a new tab in your web browser with the Jupyter Notebook interface.
  4. Open and Run Your Notebook:

    • In the Jupyter Notebook interface, navigate to the directory where your notebook file (*.ipynb) is located.
    • Click on the notebook file to open it.
    • Once the notebook is open, you can run each cell by pressing Shift + Enter or using the "Run" button in the toolbar.
    • Ensure that Spark is correctly initialized and configured in your notebook. You may need to import necessary libraries and set up the Spark session if it's not done automatically.
  5. Verify Spark Installation and Configuration:

    • Check if Spark is installed and configured correctly by running a basic Spark operation in one of the notebook cells. For example:
      from pyspark.sql import SparkSession

      # Initialize Spark session
      spark = SparkSession.builder \
          .appName("MyApp") \
          .getOrCreate()

      # Verify Spark session
      spark
    • If Spark is configured correctly, you should see the Spark session information printed without any errors.
  6. Execute and Test Your Notebook:

    • Execute each cell in your notebook to ensure that all code runs as expected.
    • Validate the results of the LeetCode SQL question solutions to confirm they are correct and work with PySpark.
  7. Save Your Work:

    • Once you have verified that everything is working correctly, save your notebook with any changes you have made.
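For the validation step above, one possible approach is to compare the rows a solution returns against the expected rows while ignoring order, since Spark does not guarantee row order without an explicit sort. The helper name `rows_match` is hypothetical and not part of PySpark or the repository; it works on any iterable of tuples, which includes the `Row` objects returned by `DataFrame.collect()`.

```python
from collections import Counter


def rows_match(actual, expected):
    """Return True when both collections hold the same rows, ignoring order.

    Duplicates are significant: a row that appears twice in `expected`
    must also appear twice in `actual`.
    """
    return Counter(tuple(r) for r in actual) == Counter(tuple(r) for r in expected)


# Plain tuples stand in for collected rows in this sketch.
print(rows_match([(1, "a"), (2, None)], [(2, None), (1, "a")]))
print(rows_match([(1,)], [(1,), (1,)]))
```

In a notebook cell this could be called as `rows_match(result_df.collect(), expected_rows)`, where `result_df` and `expected_rows` are placeholders for your own DataFrame and expected output.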

Additional Tips:

  • Environment Management: Consider using virtual environments or conda environments to manage dependencies and avoid conflicts between different projects.
  • Documentation: It's helpful to include documentation within your notebook, such as explanations of the SQL solutions and any specific configurations required for Spark.
  • Version Control: Regularly commit your changes to Git and push them to your GitHub repository to keep a versioned history of your work.
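As a small aid for the environment-management tip, the sketch below checks whether the packages from the install step can be imported in the active environment. The helper name `missing_packages` is hypothetical, and the module names `pyspark` and `notebook` are assumptions about what `pip install jupyter pyspark` makes importable.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed module names for the packages installed earlier in this guide.
missing = missing_packages(["pyspark", "notebook"])
print("Python", sys.version.split()[0])
print("Missing packages:", ", ".join(missing) if missing else "none")
```

Running this inside each virtual or conda environment shows quickly whether that environment is ready for the notebooks.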

By following these steps, you should be able to successfully import and run your LeetCode SQL questions notebook using PySpark in Jupyter Notebook on your local machine.

Note:
The Databricks Community Edition will be more than adequate for this activity.

LeetCode SQL Questions

S.No | Question | Difficulty
1 | 175. Combine Two Tables | Easy
2 | 176. Second Highest Salary | Medium
3 | 177. Nth Highest Salary | Medium
4 | 178. Rank Scores | Medium
5 | 180. Consecutive Numbers | Medium
6 | 181. Employees Earning More Than Their Managers | Easy
7 | 182. Duplicate Emails | Easy
8 | 183. Customers Who Never Order | Easy
9 | 184. Department Highest Salary | Medium
10 | 185. Department Top Three Salaries | Hard
11 | 196. Delete Duplicate Emails | Easy
12 | 197. Rising Temperature | Easy
13 | 262. Trips and Users | Hard
14 | 511. Game Play Analysis I | Easy
15 | 512. Game Play Analysis II | Easy
16 | 534. Game Play Analysis III | Medium
17 | 550. Game Play Analysis IV | Medium
18 | 569. Median Employee Salary | Hard
19 | 570. Managers with at Least 5 Direct Reports | Medium
20 | 571. Find Median Given Frequency of Numbers | Hard
21 | 574. Winning Candidate | Medium
22 | 577. Employee Bonus | Easy
23 | 578. Get Highest Answer Rate Question | Medium
24 | 579. Find Cumulative Salary of an Employee | Hard
25 | 580. Count Student Number in Departments | Medium
26 | 584. Find Customer Referee | Easy
27 | 585. Investments in 2016 | Medium
28 | 586. Customer Placing the Largest Number of Orders | Easy
29 | 595. Big Countries | Easy
30 | 596. Classes More Than 5 Students | Easy
31 | 597. Friend Requests I: Overall Acceptance Rate | Easy
32 | 601. Human Traffic of Stadium | Hard
33 | 602. Friend Requests II: Who Has the Most Friends | Medium
34 | 603. Consecutive Available Seats | Easy
35 | 607. Sales Person | Easy
36 | 608. Tree Node | Medium
37 | 610. Triangle Judgement | Easy
38 | 612. Shortest Distance in a Plane | Medium
39 | 613. Shortest Distance in a Line | Easy
40 | 614. Second Degree Follower | Medium
41 | 615. Average Salary: Departments VS Company | Hard
42 | 618. Students Report By Geography | Hard
43 | 619. Biggest Single Number | Easy
44 | 620. Not Boring Movies | Easy
45 | 626. Exchange Seats | Medium
46 | 627. Swap Salary | Easy
47 | 1045. Customers Who Bought All Products | Medium
48 | 1050. Actors and Directors Who Cooperated At Least Three Times | Easy
49 | 1068. Product Sales Analysis I | Easy
50 | 1069. Product Sales Analysis II | Easy
51 | 1070. Product Sales Analysis III | Medium
52 | 1075. Project Employees I | Easy
53 | 1076. Project Employees II | Easy
54 | 1077. Project Employees III | Medium
55 | 1082. Sales Analysis I | Easy
56 | 1083. Sales Analysis II | Easy
57 | 1084. Sales Analysis III | Easy
58 | 1097. Game Play Analysis V | Hard
59 | 1098. Unpopular Books | Medium
60 | 1107. New Users Daily Count | Medium
61 | 1112. Highest Grade For Each Student | Medium
62 | 1113. Reported Posts | Easy
63 | 1126. Active Businesses | Medium
64 | 1127. User Purchase Platform | Hard
65 | 1132. Reported Posts II | Medium
66 | 1141. User Activity for the Past 30 Days I | Easy
67 | 1142. User Activity for the Past 30 Days II | Easy
68 | 1148. Article Views I | Easy
69 | 1149. Article Views II | Medium
70 | 1158. Market Analysis I | Medium
71 | 1159. Market Analysis II | Hard
72 | 1164. Product Price at a Given Date | Medium
73 | 1173. Immediate Food Delivery I | Easy
74 | 1174. Immediate Food Delivery II | Medium
75 | 1179. Reformat Department Table | Easy
76 | 1193. Monthly Transactions I | Medium
77 | 1194. Tournament Winners | Hard
78 | 1204. Last Person to Fit in the Bus | Medium
79 | 1205. Monthly Transactions II | Medium
80 | 1211. Queries Quality and Percentage | Easy
81 | 1212. Team Scores in Football Tournament | Medium
82 | 1225. Report Contiguous Dates | Hard
83 | 1241. Number of Comments per Post | Easy
84 | 1251. Average Selling Price | Easy
85 | 1264. Page Recommendations | Medium
86 | 1270. All People Report to the Given Manager | Medium
87 | 1280. Students and Examinations | Easy
88 | 1285. Find the Start and End Number of Continuous Ranges | Medium
89 | 1294. Weather Type in Each Country | Easy
90 | 1303. Find the Team Size | Easy
91 | 1308. Running Total for Different Genders | Medium
92 | 1321. Restaurant Growth | Medium
93 | 1322. Ads Performance | Easy
94 | 1327. List the Products Ordered in a Period | Easy
95 | 1336. Number of Transactions per Visit | Hard
96 | 1341. Movie Rating | Medium
97 | 1350. Students With Invalid Departments | Easy
98 | 1355. Activity Participants | Medium
99 | 1364. Number of Trusted Contacts of a Customer | Medium
100 | 1369. Get the Second Most Recent Activity | Hard
101 | 1378. Replace Employee ID With The Unique Identifier | Easy
102 | 1384. Total Sales Amount by Year | Hard
103 | 1393. Capital Gain/Loss | Medium
104 | 1398. Customers Who Bought Products A and B but Not C | Medium
105 | 1407. Top Travellers | Easy
106 | 1412. Find the Quiet Students in All Exams | Hard
107 | 1421. NPV Queries | Easy
108 | 1435. Create a Session Bar Chart | Easy
109 | 1440. Evaluate Boolean Expression | Medium
110 | 1445. Apples & Oranges | Medium
111 | 1454. Active Users | Medium
112 | 1459. Rectangles Area | Medium
113 | 1468. Calculate Salaries | Medium
114 | 1479. Sales by Day of the Week | Hard
115 | 1484. Group Sold Products By The Date | Easy
116 | 1495. Friendly Movies Streamed Last Month | Easy
117 | 1501. Countries You Can Safely Invest In | Medium
118 | 1511. Customer Order Frequency | Easy
119 | 1517. Find Users With Valid E-Mails | Easy
120 | 1527. Patients With a Condition | Easy
121 | 1532. The Most Recent Three Orders | Medium
122 | 1543. Fix Product Name Format | Easy
123 | 1549. The Most Recent Orders for Each Product | Medium
124 | 1555. Bank Account Summary | Medium
125 | 1565. Unique Orders and Customers Per Month | Easy
126 | 1571. Warehouse Manager | Easy
127 | 1581. Customer Who Visited but Did Not Make Any Transactions | Easy
128 | 1587. Bank Account Summary II | Easy
129 | 1596. The Most Frequently Ordered Products for Each Customer | Medium
130 | 1607. Sellers With No Sales | Easy
131 | 1613. Find the Missing IDs | Medium
132 | 1623. All Valid Triplets That Can Represent a Country | Easy
133 | 1633. Percentage of Users Attended a Contest | Easy
134 | 1635. Hopper Company Queries I | Hard
135 | 1645. Hopper Company Queries II | Hard
136 | 1651. Hopper Company Queries III | Hard
137 | 1661. Average Time of Process per Machine | Easy
138 | 1667. Fix Names in a Table | Easy
139 | 1677. Product's Worth Over Invoices | Easy
140 | 1683. Invalid Tweets | Easy
141 | 1693. Daily Leads and Partners | Easy
142 | 1699. Number of Calls Between Two Persons | Medium
143 | 1709. Biggest Window Between Visits | Medium
144 | 1715. Count Apples and Oranges | Medium
145 | 1729. Find Followers Count | Easy
146 | 1731. The Number of Employees Which Report to Each Employee | Easy
147 | 1741. Find Total Time Spent by Each Employee | Easy
148 | 1747. Leetflex Banned Accounts | Medium
149 | 1757. Recyclable and Low Fat Products | Easy
150 | 1767. Find the Subtasks That Did Not Execute | Hard
151 | 1777. Product's Price for Each Store | Easy
152 | 1783. Grand Slam Titles | Medium
153 | 1789. Primary Department for Each Employee | Easy
154 | 1795. Rearrange Products Table | Easy
155 | 1809. Ad-Free Sessions | Easy
156 | 1811. Find Interview Candidates | Medium
157 | 1821. Find Customers With Positive Revenue this Year | Easy
158 | 1831. Maximum Transaction Each Day | Medium
159 | 1841. League Statistics | Medium
160 | 1843. Suspicious Bank Accounts | Medium
161 | 1853. Convert Date Format | Easy
162 | 1867. Orders With Maximum Quantity Above Average | Medium
163 | 1873. Calculate Special Bonus | Easy
164 | 1875. Group Employees of the Same Salary | Medium
165 | 1890. The Latest Login in 2020 | Easy
166 | 1892. Page Recommendations II | Hard
167 | 1907. Count Salary Categories | Medium
168 | 1917. Leetcodify Friends Recommendations | Hard
169 | 1919. Leetcodify Similar Friends | Hard
170 | 1934. Confirmation Rate | Medium
171 | 1939. Users That Actively Request Confirmation Messages | Easy
172 | 1949. Strong Friendship | Medium
173 | 1951. All the Pairs With the Maximum Number of Common Followers | Medium
174 | 1965. Employees With Missing Information | Easy
175 | 1972. First and Last Call On the Same Day | Hard
176 | 1978. Employees Whose Manager Left the Company | Easy
177 | 1988. Find Cutoff Score for Each School | Medium
178 | 1990. Count the Number of Experiments | Medium
179 | 2004. The Number of Seniors and Juniors to Join the Company | Hard
180 | 2010. The Number of Seniors and Juniors to Join the Company II | Hard
181 | 2020. Number of Accounts That Did Not Stream | Medium
182 | 2026. Low-Quality Problems | Easy
183 | 2041. Accepted Candidates From the Interviews | Medium
184 | 2051. The Category of Each Member in the Store | Medium
185 | 2066. Account Balance | Medium
186 | 2072. The Winner University | Easy
187 | 2082. The Number of Rich Customers | Easy
188 | 2084. Drop Type 1 Orders for Customers With Type 0 Orders | Medium
189 | 2112. The Airport With the Most Traffic | Medium
190 | 2118. Build the Equation | Hard
191 | 2142. The Number of Passengers in Each Bus I | Medium
192 | 2153. The Number of Passengers in Each Bus II | Hard
193 | 2159. Order Two Columns Independently | Medium
194 | 2173. Longest Winning Streak | Hard
195 | 2175. The Change in Global Rankings | Medium
196 | 2199. Finding the Topic of Each Post | Hard
197 | 2205. The Number of Users That Are Eligible for Discount | Easy
198 | 2228. Users With Two Purchases Within Seven Days | Medium
199 | 2230. The Users That Are Eligible for Discount | Easy
200 | 2238. Number of Times a Driver Was a Passenger | Medium
201 | 2252. Dynamic Pivoting of a Table | Hard
202 | 2253. Dynamic Unpivoting of a Table | Hard
203 | 2292. Products With Three or More Orders in Two Consecutive Years | Medium
204 | 2298. Tasks Count in the Weekend | Medium
205 | 2308. Arrange Table by Gender | Medium
206 | 2314. The First Day of the Maximum Recorded Degree in Each City | Medium
207 | 2324. Product Sales Analysis IV | Medium
208 | 2329. Product Sales Analysis V | Easy
209 | 2339. All the Matches of the League | Easy
210 | 2346. Compute the Rank as a Percentage | Medium
211 | 2356. Number of Unique Subjects Taught by Each Teacher | Easy
212 | 2362. Generate the Invoice | Hard
213 | 2372. Calculate the Influence of Each Salesperson | Medium
214 | 2377. Sort the Olympic Table | Easy
215 | 2388. Change Null Values in a Table to the Previous Value | Medium
216 | 2394. Employees With Deductions | Medium
217 | 2474. Customers With Strictly Increasing Purchases | Hard
218 | 2480. Form a Chemical Bond | Easy
219 | 2494. Merge Overlapping Events in the Same Hall | Hard
220 | 2504. Concatenate the Name and the Profession | Easy
221 | 2668. Find Latest Salaries | Easy
222 | 2669. Count Artist Occurrences On Spotify Ranking List | Easy
223 | 2686. Immediate Food Delivery III | Medium
224 | 2687. Bikes Last Time Used | Easy
225 | 2688. Find Active Users | Medium
226 | 2701. Consecutive Transactions with Increasing Amounts | Hard
227 | 2720. Popularity Percentage | Hard
228 | 2738. Count Occurrences in Text | Medium
229 | 2752. Customers with Maximum Number of Transactions on Consecutive Days | Hard
230 | 2783. Flight Occupancy and Waitlist Analysis | Medium
231 | 2837. Total Traveled Distance | Easy
232 | 2853. Highest Salaries Difference | Easy
233 | 2854. Rolling Average Steps | Medium
234 | 2893. Calculate Orders Within Each Interval | Medium
235 | 2922. Market Analysis III | Medium
236 | 2978. Symmetric Coordinates | Medium
237 | 2984. Find Peak Calling Hours for Each City | Medium
238 | 2985. Calculate Compressed Mean | Easy
239 | 2986. Find Third Transaction | Medium
240 | 2987. Find Expensive Cities | Easy
241 | 2988. Manager of the Largest Department | Medium
242 | 2989. Class Performance | Medium
243 | 2990. Loan Types | Easy
244 | 2991. Top Three Wineries | Hard
245 | 2993. Friday Purchases I | Medium
246 | 2994. Friday Purchases II | Hard
247 | 2995. Viewers Turned Streamers | Hard
248 | 3050. Pizza Toppings Cost Analysis | Medium
249 | 3051. Find Candidates for Data Scientist Position | Easy
250 | 3052. Maximize Items | Hard
251 | 3053. Classifying Triangles by Lengths | Easy
252 | 3054. Binary Tree Nodes | Medium
253 | 3055. Top Percentile Fraud | Medium
254 | 3056. Snaps Analysis | Medium
255 | 3057. Employees Project Allocation | Hard
256 | 3058. Friends With No Mutual Friends | Medium
257 | 3059. Find All Unique Email Domains | Easy
258 | 3060. User Activities within Time Bounds | Hard
259 | 3061. Calculate Trapping Rain Water | Hard
260 | 3087. Find Trending Hashtags | Medium
261 | 3089. Find Bursty Behavior | Medium
262 | 3103. Find Trending Hashtags II | Hard
263 | 3118. Friday Purchase III | Medium
264 | 3124. Find Longest Calls | Medium
265 | 3126. Server Utilization Time | Medium
266 | 3140. Consecutive Available Seats II | Medium
267 | 3150. Invalid Tweets II | Easy
268 | 3156. Employee Task Duration and Concurrent Tasks | Hard
269 | 3166. Calculate Parking Fees and Duration | Medium
270 | 3172. Second Day Verification | Easy
271 | 3182. Find Top Scoring Students | Medium
272 | 3188. Find Top Scoring Students II | Hard
273 | 3198. Find Cities in Each State | Easy
274 | 3204. Bitwise User Permissions Analysis | Medium
275 | 3214. Year on Year Growth Rate | Hard
276 | 3220. Odd and Even Transactions | Medium
277 | 3230. Customer Purchasing Behavior Analysis | Medium
278 | 3236. CEO Subordinate Hierarchy | Hard
279 | 3246. Premier League Table Ranking | Easy
280 | 3252. Premier League Table Ranking II | Medium
281 | 3262. Find Overlapping Shifts | Medium
282 | 3268. Find Overlapping Shifts II | Hard
283 | 3278. Find Candidates for Data Scientist Position II | Medium
284 | 3293. Calculate Product Final Price | Medium
285 | 3308. Find Top Performing Driver | Medium
286 | 3322. Premier League Table Ranking III | Medium
287 | 3328. Find Cities in Each State II | Medium
288 | 3338. Second Highest Salary II | Medium
289 | 3358. Books with NULL Ratings | Easy
290 | 3368. First Letter Capitalization | Hard
291 | 3374. First Letter Capitalization II | Hard
292 | 3384. Team Dominance by Pass Success | Hard
293 | 3390. Longest Team Pass Streak | Hard
294 | 3401. Find Circular Gift Exchange Chains | Hard
295 | 3415. Find Products with Three Consecutive Digits | Easy
296 | 3421. Find Students Who Improved | Medium
297 | 3436. Find Valid Emails | Easy
298 | 3451. Find Invalid IP Addresses | Hard
299 | 3465. Find Products with Valid Serial Numbers | Easy
300 | 3475. DNA Pattern Recognition | Medium
301 | 3482. Analyze Organization Hierarchy | Hard

🚀 Support This Project! ⭐

Hey everyone! 👋

We've created this GitHub repository to help the community solve LeetCode SQL questions using PySpark. If you find this project helpful, please consider giving it a star ⭐ and sharing it with other learners who might benefit from it!

Your support will help grow this resource and make it even better for everyone. Let’s learn and improve together! 🚀

🔗 LeetCode SQL Questions-(PySpark Unsolved)

Happy coding! 💻🔥
