How to Effectively Scrape Noon Data

This blog was initially posted to the Crawlbase Blog

Noon is one of the biggest e-commerce platforms in the Middle East, with millions of customers across the UAE, Saudi Arabia, and Egypt. It has a huge product catalog and processes thousands of transactions every day. Scraping Noon data helps businesses track prices, monitor competitors, and gather market insights.

But scraping Noon is tough. The website has dynamic content, JavaScript-rendered elements, and anti-bot measures that can block traditional scraping methods. We will use the Crawlbase Crawling API to extract search results and product details while handling these challenges.

This tutorial will show you how to scrape Noon data using Python with step-by-step examples for structured data extraction.

Let’s start!

Setting Up Your Python Environment

Before you start scraping Noon data, you need to set up your environment. This includes installing Python, installing the required libraries, and choosing the right IDE for coding.

Installing Python and Required Libraries

If you don’t have Python installed, download the latest version from python.org and follow the installation instructions for your OS.

Next, install the required libraries by running:

pip install crawlbase beautifulsoup4 pandas
  • Crawlbase – Bypasses anti-bot protections and scrapes JavaScript-heavy pages.
  • BeautifulSoup – Extracts structured data from HTML.
  • Pandas – Handles and stores data in CSV format.
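
To make sure everything installed correctly, you can run a quick import check. This is a small sketch that only verifies the packages can be imported; it makes no requests:

# verify_setup.py – quick check that the scraping dependencies are installed
from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup
import pandas as pd

print("crawlbase, beautifulsoup4, and pandas imported successfully")
print("pandas version:", pd.__version__)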

Choosing an IDE for Scraping

Choosing the right Integrated Development Environment (IDE) makes scraping easier. Here are some good options:

  • VS Code – Lightweight and feature-rich with great Python support.
  • PyCharm – Powerful debugging and automation features.
  • Jupyter Notebook – Ideal for interactive scraping and quick data analysis.

With Python installed, the libraries set up, and your IDE ready, you can start scraping Noon data.

Scraping Noon Search Results

Scraping search results from Noon will give you the product names, prices, ratings, and URLs. This data is useful for competitive analysis, price monitoring, and market research. In this section, we will guide you through the process of scraping search results from Noon, handling pagination, and storing the data in a CSV file.

Inspecting the HTML for CSS Selectors

Before we start writing the scraper, we need to inspect the HTML structure of Noon’s search results page. This lets us find the CSS selectors needed to extract the product details.

  1. Go to Noon.com and search for a product (e.g., "smartphones").
  2. Right-click on any product and choose Inspect or Inspect Element in Chrome Developer Tools.

Screenshot displaying HTML structure for Noon search results

  3. Identify the following key HTML elements (a small selector sketch follows this list):
  • Product Title: Found in the <div data-qa="product-name"> tag.
  • Price: Found in the <strong class="amount"> tag.
  • Currency: Found in the <span class="currency"> tag.
  • Ratings: Found in the <div class="dGLdNc"> tag.
  • Product URL: Found in the href attribute of the <a> tag.
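
To see how these selectors map onto the markup, here is a small self-contained sketch that runs the same CSS selectors against a simplified, made-up product card. It is illustrative only; the real page has many more wrappers, and generated class names like dGLdNc can change at any time.

from bs4 import BeautifulSoup

# Simplified, made-up markup that mirrors the structure described above
sample_html = """
<div class="grid">
  <span class="productContainer">
    <a href="/uae-en/example-phone/N12345678A/p/">
      <div data-qa="product-name">Example Smartphone 256GB</div>
      <strong class="amount">1,299</strong>
      <span class="currency">AED</span>
      <div class="dGLdNc">4.5</div>
    </a>
  </span>
</div>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
card = soup.select_one('div.grid > span.productContainer')

print(card.select_one('div[data-qa="product-name"]').text)  # Example Smartphone 256GB
print(card.select_one('strong.amount').text)                # 1,299
print(card.select_one('span.currency').text)                # AED
print(card.select_one('div.dGLdNc').text)                   # 4.5
print(card.select_one('a')['href'])                         # /uae-en/example-phone/N12345678A/p/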

Once you identify the relevant elements and their CSS classes or IDs, you can proceed to write the scraper.

Writing the Noon Search Listings Scraper

Now that we've inspected the HTML structure, we can write a Python script to scrape the product data from Noon. We’ll use the Crawlbase Crawling API to bypass anti-bot measures and BeautifulSoup to parse the HTML.

from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup

# Initialize Crawlbase API
crawling_api = CrawlingAPI({'token': 'YOUR_CRAWLBASE_TOKEN'})

def scrape_noon_search(query, page):
    """Scrape search results from Noon."""
    url = f"https://www.noon.com/uae-en/search/?q={query}&page={page}"
    options = {'ajax_wait': 'true', 'page_wait': '5000'}
    response = crawling_api.get(url, options)

    if response['headers']['pc_status'] == '200':
        return response['body'].decode('utf-8')
    else:
        print(f"Failed to fetch page {page}.")
        return None

def extract_product_data(html):
    """Extract product details from Noon search results."""
    soup = BeautifulSoup(html, 'html.parser')
    products = []

    for item in soup.select('div.grid > span.productContainer'):
        title = item.select_one('div[data-qa="product-name"]').text.strip() if item.select_one('div[data-qa="product-name"]') else ''
        price = item.select_one('strong.amount').text.strip() if item.select_one('strong.amount') else ''
        currency = item.select_one('span.currency').text.strip() if item.select_one('span.currency') else ''
        rating = item.select_one('div.dGLdNc').text.strip() if item.select_one('div.dGLdNc') else ''
        link = f"https://www.noon.com{item.select_one('a')['href']}" if item.select_one('a') else ''

        if title and price:
            products.append({
                'Title': title,
                'Price': price,
                'Currency': currency,
                'Rating': rating,
                'URL': link
            })

    return products

We first initialize the CrawlingAPI class with a token for authentication. The scrape_noon_search function fetches the HTML of a search results page from Noon for a given query and page number, waiting for AJAX content to load. The extract_product_data function parses the HTML with BeautifulSoup, extracting product titles, prices, currencies, ratings, and URLs, and returns the data as a list of dictionaries.
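
To try these two functions together, you can fetch a single page and print a few results. This is a minimal sketch; it assumes your Crawlbase token is set and that the page returns products:

html = scrape_noon_search("smartphones", 1)

if html:
    products = extract_product_data(html)
    print(f"Found {len(products)} products on page 1")
    for product in products[:3]:
        print(product['Title'], '-', product['Currency'], product['Price'])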

Handling Pagination

Noon’s search results span multiple pages. To scrape all the data, we need to handle pagination and loop through each page. Here's how we can do it:

def scrape_all_pages(query, max_pages):
    """Scrape multiple pages of search results."""
    all_products = []

    for page in range(1, max_pages + 1):
        print(f"Scraping page {page}...")
        html = scrape_noon_search(query, page)

        if html:
            products = extract_product_data(html)
            if not products:
                print("No more results found. Stopping.")
                break
            all_products.extend(products)
        else:
            break

    return all_products

This function loops through the specified number of pages, fetching and extracting product data until all pages are processed or no more results are found.
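
For example, to collect the first three pages of results for a query (assuming the functions above are defined and your token is valid):

all_products = scrape_all_pages("smartphones", max_pages=3)
print(f"Collected {len(all_products)} products in total")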

Storing Data in a CSV File

Once we’ve extracted the product details, we need to store the data in a structured format. The most common and easy-to-handle format is CSV. Below is the code to save the scraped data:

import csv

def save_to_csv(data, filename):
    """Save scraped data to a CSV file."""
    keys = data[0].keys() if data else ['Title', 'Price', 'Currency', 'Rating', 'URL']

    with open(filename, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=keys)
        writer.writeheader()
        writer.writerows(data)

    print(f"Data saved to {filename}")

This function takes the list of products and saves it as a CSV file, making it easy to analyze or import into other tools.
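
Since pandas was installed earlier, you can also write the same list of dictionaries with a DataFrame instead of the csv module. This is an equivalent sketch, not what the script above uses:

import pandas as pd

def save_to_csv_with_pandas(data, filename):
    """Save the scraped product list to CSV using pandas."""
    df = pd.DataFrame(data)
    df.to_csv(filename, index=False, encoding='utf-8')
    print(f"Data saved to {filename}")

# Usage: save_to_csv_with_pandas(all_products, 'noon_smartphones.csv')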

Complete Code Example

Here is the complete Python script to scrape Noon search results, handle pagination, and store the data in a CSV file:

from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup
import csv

# Initialize Crawlbase API
crawling_api = CrawlingAPI({'token': 'YOUR_CRAWLBASE_TOKEN'})

def scrape_noon_search(query, page):
    """Scrape product listings from Noon search results."""
    url = f"https://www.noon.com/uae-en/search/?q={query}&page={page}"
    options = {'ajax_wait': 'true', 'page_wait': '5000'}
    response = crawling_api.get(url, options)

    if response['headers']['pc_status'] == '200':
        return response['body'].decode('utf-8')
    else:
        print(f"Failed to fetch page {page}.")
        return None

def extract_product_data(html):
    """Extract product details from Noon search results."""
    soup = BeautifulSoup(html, 'html.parser')
    products = []

    for item in soup.select('div.grid > span.productContainer'):
        title = item.select_one('div[data-qa="product-name"]').text.strip() if item.select_one('div[data-qa="product-name"]') else ''
        price = item.select_one('strong.amount').text.strip() if item.select_one('strong.amount') else ''
        currency = item.select_one('span.currency').text.strip() if item.select_one('span.currency') else ''
        rating = item.select_one('div.dGLdNc').text.strip() if item.select_one('div.dGLdNc') else ''
        link = f"https://www.noon.com{item.select_one('a')['href']}" if item.select_one('a') else ''

        if title and price:
            products.append({
                'Title': title,
                'Price': price,
                'Currency': currency,
                'Rating': rating,
                'URL': link
            })

    return products

def scrape_all_pages(query, max_pages):
    """Scrape multiple pages of search results."""
    all_products = []

    for page in range(1, max_pages + 1):
        print(f"Scraping page {page}...")
        html = scrape_noon_search(query, page)

        if html:
            products = extract_product_data(html)
            if not products:
                print("No more results found. Stopping.")
                break
            all_products.extend(products)
        else:
            break

    return all_products

def save_to_csv(data, filename):
    """Save scraped data to a CSV file."""
    keys = data[0].keys() if data else ['Title', 'Price', 'Currency', 'Rating', 'URL']

    with open(filename, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=keys)
        writer.writeheader()
        writer.writerows(data)

    print(f"Data saved to {filename}")

def main():
    """Main function to run the scraper."""
    query = "smartphones"  # Change the search term as needed
    max_pages = 5          # Set the number of pages to scrape
    all_products = scrape_all_pages(query, max_pages)
    save_to_csv(all_products, 'noon_smartphones.csv')

if __name__ == "__main__":
    main()

noon_smartphones.csv Snapshot:

noon_smartphones.csv output file snapshot

Scraping Noon Product Pages

Scraping product pages on Noon gives you the full product details, including descriptions, specifications, and customer reviews. This data helps businesses optimize their product listings and understand customer behavior. In this section, we will go through inspecting the HTML structure of a product page, writing the scraper, and saving the data to a CSV file.

Inspecting the HTML for CSS Selectors

Before we write the scraper, we need to inspect the HTML structure of the product page to identify the correct CSS selectors for the elements we want to scrape. Here’s how to do it:

  1. Open a product page on Noon (e.g., a smartphone page).
  2. Right-click on a product detail (e.g., product name, price, description) and click on Inspect in Chrome Developer Tools.

Screenshot displaying HTML structure for Noon product pages

  3. Look for key elements, such as (a small selector sketch follows this list):
  • Product Name: Found in the <h1> tag whose data-qa attribute starts with "pdp-name-" (selector: h1[data-qa^="pdp-name-"]).
  • Price: Found in the <div data-qa="div-price-now"> tag.
  • Product Highlights: Found in an unordered list (<ul>) inside the <div class="oPZpQ"> container.
  • Product Specifications: Found in a table inside the <div class="dROUvm"> container, with each <tr> row containing two <td> cells.
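
The h1[data-qa^="pdp-name-"] selector uses a "starts with" attribute match, which helps because the data-qa value ends with a product-specific suffix. The small self-contained sketch below runs the listed selectors against made-up markup; it is illustrative only, and the generated class names oPZpQ and dROUvm can change without notice.

from bs4 import BeautifulSoup

# Made-up fragment that mirrors the product page elements listed above
sample_html = """
<h1 data-qa="pdp-name-example-phone">Example Smartphone 256GB</h1>
<div data-qa="div-price-now">AED 1,299</div>
<div class="oPZpQ"><ul><li>6.7-inch display</li><li>5000 mAh battery</li></ul></div>
<div class="dROUvm">
  <table>
    <tr><td>Model Number</td><td>SM-X123</td></tr>
    <tr><td>Colour</td><td>Black</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

# ^= matches any data-qa value that starts with "pdp-name-"
print(soup.select_one('h1[data-qa^="pdp-name-"]').text)      # Example Smartphone 256GB
print(soup.select_one('div[data-qa="div-price-now"]').text)  # AED 1,299
print(soup.select_one('div.oPZpQ ul').text.strip())          # highlights text
for row in soup.select('div.dROUvm table tr'):
    cells = row.find_all('td')
    print(cells[0].text, '->', cells[1].text)                 # specification rows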

Once you identify the relevant elements and their CSS classes or IDs, you can proceed to write the scraper.

Writing the Noon Product Page Scraper

Now, let's write a Python script to scrape product details from Noon product pages using the Crawlbase Crawling API and BeautifulSoup.

from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup
import re

# Initialize Crawlbase API
crawling_api = CrawlingAPI({'token': 'YOUR_CRAWLBASE_TOKEN'})

def scrape_product_page(product_url):
    """Scrape product details from a Noon product page."""
    options = {'ajax_wait': 'true', 'page_wait': '3000'}
    response = crawling_api.get(product_url, options)

    if response['headers']['pc_status'] == '200':
        return response['body'].decode('utf-8')
    else:
        print(f"Failed to fetch product page: {product_url}.")
        return None

def extract_product_details(html):
    """Extract details like name, price, highlights, and specifications."""
    soup = BeautifulSoup(html, 'html.parser')
    product = {}

    product['Name'] = soup.select_one('h1[data-qa^="pdp-name-"]').text.strip() if soup.select_one('h1[data-qa^="pdp-name-"]') else ''
    product['Price'] = soup.select_one('div[data-qa="div-price-now"]').text.strip() if soup.select_one('div[data-qa="div-price-now"]') else ''
    product['highlights'] = soup.select_one('div.oPZpQ ul').text.strip() if soup.select_one('div.oPZpQ ul') else ''
    product['specifications'] = {
        re.sub(r'\s+', '', row.find_all('td')[0].text.strip()): re.sub(r'\s+', '', row.find_all('td')[1].text.strip())
        for row in soup.select('div.dROUvm table tr')
        if len(row.find_all('td')) == 2
    }

    return product

Storing Data in a CSV File

Once we’ve extracted the product details, we need to store this information in a structured format like CSV for easy analysis. Here’s a simple function to save the scraped data:

import csv

def save_product_data_to_csv(products, filename):
    """Save product details to a CSV file."""
    keys = products[0].keys() if products else ['Name', 'Price', 'highlights', 'specifications']

    with open(filename, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=keys)
        writer.writeheader()
        writer.writerows(products)

    print(f"Data saved to {filename}")
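One thing to note: the specifications field is a Python dict, so csv.DictWriter will write its repr() into that cell. If you want cleaner cells, you could serialize it to JSON before saving. This is a small optional sketch, not part of the original script:

import json

def flatten_specifications(products):
    """Serialize each product's specifications dict to a JSON string for CSV output."""
    for product in products:
        if isinstance(product.get('specifications'), dict):
            product['specifications'] = json.dumps(product['specifications'], ensure_ascii=False)
    return products

# Usage: save_product_data_to_csv(flatten_specifications(product_data), 'noon_product_details.csv')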

Complete Code Example

Now, let's combine everything into a complete script. The main() function scrapes data for multiple product pages and stores the results in a CSV file.

from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup
import csv
import re

# Initialize Crawlbase API
crawling_api = CrawlingAPI({'token': 'YOUR_CRAWLBASE_TOKEN'})

def scrape_product_page(product_url):
    """Scrape product details from a Noon product page."""
    options = {'ajax_wait': 'true', 'page_wait': '3000'}
    response = crawling_api.get(product_url, options)

    if response['headers']['pc_status'] == '200':
        return response['body'].decode('utf-8')
    else:
        print(f"Failed to fetch product page: {product_url}.")
        return None

def extract_product_details(html):
    """Extract details like name, price, highlights, and specifications."""
    soup = BeautifulSoup(html, 'html.parser')
    product = {}

    product['Name'] = soup.select_one('h1[data-qa^="pdp-name-"]').text.strip() if soup.select_one('h1[data-qa^="pdp-name-"]') else ''
    product['Price'] = soup.select_one('div[data-qa="div-price-now"]').text.strip() if soup.select_one('div[data-qa="div-price-now"]') else ''
    product['highlights'] = soup.select_one('div.oPZpQ ul').text.strip() if soup.select_one('div.oPZpQ ul') else ''
    product['specifications'] = {
        re.sub(r'\s+', '', row.find_all('td')[0].text.strip()): re.sub(r'\s+', '', row.find_all('td')[1].text.strip())
        for row in soup.select('div.dROUvm table tr')
        if len(row.find_all('td')) == 2
    }

    return product

def save_product_data_to_csv(products, filename):
    """Save product details to a CSV file."""
    keys = products[0].keys() if products else ['Name', 'Price', 'highlights', 'specifications']

    with open(filename, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=keys)
        writer.writeheader()
        writer.writerows(products)

    print(f"Data saved to {filename}")

def main():
    """Main function to scrape product pages."""
    # List of product URLs to scrape
    product_urls = [
        'https://www.noon.com/uae-en/galaxy-s25-ai-dual-sim-silver-shadow-12gb-ram-256gb-5g-middle-east-version/N70140511V/p/?o=e12201b055fa94ee',
        'https://www.noon.com/uae-en/a78-5g-dual-sim-glowing-black-8gb-ram-256gb/N70115717V/p/?o=c99e13ae460efc6b'
    ]

    product_data = []

    for url in product_urls:
        print(f"Scraping {url}...")
        html = scrape_product_page(url)
        if html:
            product = extract_product_details(html)
            product_data.append(product)

    save_product_data_to_csv(product_data, 'noon_product_details.csv')

if __name__ == "__main__":
    main()

noon_product_details.csv Snapshot:

noon_product_details.csv output file snapshot

Final Thoughts

Scraping Noon data is a great way for businesses to track prices, analyze competitors, and improve product listings. The Crawlbase Crawling API makes the process easier by handling JavaScript rendering and CAPTCHA protections, so you get complete and accurate data without barriers.

With Python and BeautifulSoup, scraping data from Noon search results and product pages is straightforward. Follow ethical practices and set up the right environment, and you’ll have the insights you need to stay ahead in the competitive e-commerce game.

If you want to scrape other e-commerce platforms, check out our other guides.
