DEV Community
Artur Chukhrai for SerpApi

Originally published at serpapi.com

Scrape Brave Images with Python

Intro

Currently, we don't have an API that supports extracting data from Brave Search.

This blog post shows how you can do it yourself with the DIY solution provided below while we work on releasing a proper API.

The solution is suitable for personal use: it doesn't include the Legal US Shield that we offer for our paid production and above plans, and it has limitations such as the need to bypass blocks, for example, CAPTCHA.

You can check our public roadmap to track the progress for this API: [New API] Brave Search

What will be scraped

[Image: wwbs-brave-images]

What is Brave Search

The previous Brave blog post already described what Brave Search is. To avoid duplicating content, that information is not repeated in this blog post.

Full Code

If you don't need an explanation, have a look at the full code example in the online IDE.

import requests, json


def scrape_brave_images():
    # https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls
    params = {
        'q': 'dune 2021',       # query
        'source': 'web',        # source
        'size': 'All',          # size (Small, Medium, Large, Wallpaper)
        '_type': 'All',         # type (Photo, Clipart, AnimatedGifHttps, Transparent)
        'layout': 'All',        # layout (Square, Tall, Wide)
        'color': 'All',         # colors (Monochrome, ColorOnly, Red etc)
        'license': 'All',       # license (Public, Share, Modify etc)
        'offset': 0
    }

    # https://docs.python-requests.org/en/master/user/quickstart/#custom-headers
    headers = {
        'content-type': 'application/json',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
    }

    data = []
    old_page_result = []

    while True:
        html = requests.get('https://search.brave.com/api/images', headers=headers, params=params).json()

        new_page_result = html['results']

        if new_page_result == old_page_result:
            break

        for result in new_page_result:
            data.append({
                'title': result.get('title'),
                'link': result.get('url'),
                'source': result.get('source'),
                'width': result.get('properties').get('width'),
                'height': result.get('properties').get('height'),
                'image': result.get('properties').get('url')
            })

        params['offset'] += 151
        old_page_result = new_page_result

    return data


if __name__ == "__main__":
    brave_images = scrape_brave_images()
    print(json.dumps(brave_images, indent=2))

Preparation

Install libraries:

pip install requests

Reduce the chance of being blocked

Make sure you're passing a user-agent in the request headers to act as a "real" user visit. The default requests user-agent is python-requests, so websites understand that the request most likely comes from a script. Check what's your user-agent.

The blog post on how to reduce the chance of being blocked while web scraping can get you familiar with basic and more advanced approaches.

Code Explanation

Import libraries:

import requests, json
| Library | Purpose |
|---|---|
| requests | to make a request to the website. |
| json | to convert extracted data to a JSON-formatted string. |

Create URL parameters and request headers:

# https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls
params = {
    'q': 'dune 2021',       # query
    'source': 'web',        # source
    'size': 'All',          # size (Small, Medium, Large, Wallpaper)
    '_type': 'All',         # type (Photo, Clipart, AnimatedGifHttps, Transparent)
    'layout': 'All',        # layout (Square, Tall, Wide)
    'color': 'All',         # colors (Monochrome, ColorOnly, Red etc)
    'license': 'All',       # license (Public, Share, Modify etc)
    'offset': 0
}

# https://docs.python-requests.org/en/master/user/quickstart/#custom-headers
headers = {
    'content-type': 'application/json',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
}
| Code | Explanation |
|---|---|
| params | a prettier way of passing URL parameters to a request. |
| content-type | to indicate the original media type of the resource (prior to any content encoding applied for sending). In responses, a Content-Type header provides the client with the actual content type of the returned content. |
| user-agent | to act as a "real" user request from the browser by passing it to the request headers. The default requests user-agent is python-requests, so websites might understand that it's a bot or a script and block the request. Check what's your user-agent. |
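To see roughly what the params dictionary turns into, the query string can be built by hand. This is a minimal sketch that uses the standard library's urlencode instead of requests, so the exact encoding requests produces may differ in small details. build_url is a hypothetical helper and the dictionary is a shortened version of the one above.

```python
from urllib.parse import urlencode

BASE_URL = 'https://search.brave.com/api/images'


def build_url(params):
    """Append URL-encoded params to the endpoint (hypothetical helper)."""
    return f'{BASE_URL}?{urlencode(params)}'


# A shortened version of the params dictionary from the article.
example_params = {'q': 'dune 2021', 'source': 'web', 'offset': 0}
example_url = build_url(example_params)

print(example_url)
# https://search.brave.com/api/images?q=dune+2021&source=web&offset=0
```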

Create the data list to hold all the data, and the old_page_result list that we'll need later:

data = []
old_page_result = []

To scrape Brave images with pagination, you need to use the offset parameter of the URL, which is 0 for the first page, 151 for the second, and so on. Since data is retrieved from all pages, it is necessary to implement a while loop:

while True:
    # pagination will be here
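The offset sequence described above can be sketched on its own. The snippet assumes the step of 151 used in this post stays constant between pages; page_offsets is a hypothetical helper, not part of the original script.

```python
from itertools import count, islice

PAGE_STEP = 151  # offset increment used in this article


def page_offsets(step=PAGE_STEP):
    """Yield 0, step, 2 * step, ... indefinitely (hypothetical helper)."""
    return count(start=0, step=step)


# Offsets for the first four pages.
first_offsets = list(islice(page_offsets(), 4))
print(first_offsets)  # [0, 151, 302, 453]
```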

In each iteration of the loop, you make a request to the Brave API, passing the created request parameters and headers. The json() method then parses the JSON response into a Python object for further work:

html = requests.get('https://search.brave.com/api/images', headers=headers, params=params).json()
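The one-line request above raises an exception if the response is not JSON, which can happen when a request is blocked. Below is a more defensive sketch, not the original author's code. fetch_results and FakeResponse are hypothetical names, the 10-second timeout is an arbitrary choice, and the `get` argument is assumed to be a callable shaped like requests.get so the example can run without network access.

```python
def fetch_results(get, url, params, headers, timeout=10):
    """Return the 'results' list of one page, or [] if the body is not JSON.

    `get` is any callable shaped like requests.get (an assumption made so
    the sketch can be exercised offline).
    """
    response = get(url, params=params, headers=headers, timeout=timeout)
    response.raise_for_status()  # raises on 4xx/5xx status codes
    try:
        payload = response.json()
    except ValueError:
        # The body was not valid JSON, e.g. an HTML block page.
        return []
    return payload.get('results', [])


class FakeResponse:
    """Offline stand-in for a response object (hypothetical)."""

    def __init__(self, payload):
        self._payload = payload

    def raise_for_status(self):
        pass  # pretend the status code was 200

    def json(self):
        if self._payload is None:
            raise ValueError('not JSON')
        return self._payload


def fake_get(url, params=None, headers=None, timeout=None):
    return FakeResponse({'results': [{'title': 'Dune'}]})


def fake_blocked_get(url, params=None, headers=None, timeout=None):
    return FakeResponse(None)


ok_page = fetch_results(fake_get, 'https://search.brave.com/api/images', {}, {})
blocked_page = fetch_results(fake_blocked_get, 'https://search.brave.com/api/images', {}, {})
print(ok_page)       # [{'title': 'Dune'}]
print(blocked_page)  # []
```

With the real library, the call would look like fetch_results(requests.get, 'https://search.brave.com/api/images', params, headers).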

The new_page_result list contains all the results on the current page. It is compared with the old_page_result list. If they are the same, we have reached the last page and there is no more new data, so you need to break the loop:

new_page_result = html['results']

if new_page_result == old_page_result:
    break

📌Note: In the first iteration of the loop, the old_page_result list is empty. Unless the first page has no results either, the comparison is false and the loop continues.

By looping through the new_page_result list in a for loop, you can get the data. For each result, data such as title, link, source, width, height, and image are retrieved:

for result in new_page_result:
    data.append({
        'title': result.get('title'),
        'link': result.get('url'),
        'source': result.get('source'),
        'width': result.get('properties').get('width'),
        'height': result.get('properties').get('height'),
        'image': result.get('properties').get('url')
    })

📌Note: The image key contains a link to the full-resolution image.
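The extraction code calls .get() on result.get('properties'), which raises AttributeError if a result has no properties key. Whether the API ever omits that key is an assumption here; the sketch below only shows one way to guard against it. parse_result is a hypothetical helper and the sample values are taken from the output in this post.

```python
def parse_result(result):
    """Extract the fields used in this post, tolerating a missing 'properties'."""
    properties = result.get('properties') or {}
    return {
        'title': result.get('title'),
        'link': result.get('url'),
        'source': result.get('source'),
        'width': properties.get('width'),
        'height': properties.get('height'),
        'image': properties.get('url'),
    }


complete = parse_result({
    'title': 'DUNE 2021 Movie Poster : dune',
    'url': 'https://www.reddit.com/r/dune/comments/kh9som/dune_2021_movie_poster/',
    'source': 'reddit.com',
    'properties': {'width': 1890, 'height': 2800, 'url': 'https://preview.redd.it/3fl2s0q1ug661.jpg'},
})
partial = parse_result({'title': 'No properties here'})

print(complete['width'], complete['height'])  # 1890 2800
print(partial['image'])                       # None
```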

After extracting the data, you need to increase the value of the offset parameter by 151. The site increases this value in the same way when you click the button that shows more results, so the script simulates that behavior:

params['offset'] += 151

This is shown more clearly in the GIF below:

[GIF: brave-images-pagination]

At the end of each iteration, new_page_result is assigned to old_page_result so that the next page can be compared against it:

old_page_result = new_page_result
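The stopping rule can be tried offline by replacing the HTTP call with a fake page source. This is a sketch under stated assumptions, not the original script. collect_pages, fake_fetch, and FAKE_PAGES are hypothetical names, the fake API is assumed to keep returning the last page once the offset runs past the end, and max_pages is an added safety cap that the original code does not have.

```python
def collect_pages(fetch_page, step=151, max_pages=100):
    """Collect results page by page until a page repeats the previous one."""
    data = []
    old_page_result = []
    offset = 0
    for _ in range(max_pages):  # safety cap, not in the original loop
        new_page_result = fetch_page(offset)
        if new_page_result == old_page_result:
            break  # same page twice: no more new data
        data.extend(new_page_result)
        offset += step
        old_page_result = new_page_result
    return data


# Fake API: two distinct pages, then the last page repeats.
FAKE_PAGES = {0: ['a', 'b'], 151: ['c']}


def fake_fetch(offset):
    return FAKE_PAGES.get(offset, FAKE_PAGES[151])


collected = collect_pages(fake_fetch)
print(collected)  # ['a', 'b', 'c']
```

The same loop also stops if the source returns an empty list past the last page: the empty page is recorded as old_page_result, and the next empty page matches it.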

Output

[
  {
    "title": "Dune (2021) | The Poster Database (TPDb)",
    "link": "https://theposterdb.com/posters/42710?page=2",
    "source": "theposterdb.com",
    "width": 1365,
    "height": 2048,
    "image": "https://image.tmdb.org/t/p/original/2sxSn0jjjQoIIZfZjC6j5GZkMVR.jpg"
  },
  {
    "title": "Dune (2021) - Posters\u2014 The Movie Database (TMDB)",
    "link": "https://www.themoviedb.org/movie/438631-dune/images/posters",
    "source": "The Movie Database",
    "width": 2000,
    "height": 3000,
    "image": "https://www.themoviedb.org/t/p/original/7S56MF6XA1jIzD9I2ejMjd6aNvN.jpg"
  },
  {
    "title": "Dune (2021) - Posters\u2014 The Movie Database (TMDb)",
    "link": "https://www.themoviedb.org/movie/438631-dune/images/posters",
    "source": "The Movie Database",
    "width": 956,
    "height": 1333,
    "image": "https://www.themoviedb.org/t/p/original/AqjrlcNRSKx84CeNJyNueg6V1SR.jpg"
  },
  {
    "title": "Dune - Pel\u00edcula 2021 - SensaCine.com",
    "link": "http://www.sensacine.com/peliculas/pelicula-133392/",
    "source": "Sensacine",
    "width": 600,
    "height": 800,
    "image": "http://es.web.img2.acsta.net/pictures/20/04/15/09/53/3283826.jpg"
  },
  {
    "title": "DUNE 2021 Movie Poster : dune",
    "link": "https://www.reddit.com/r/dune/comments/kh9som/dune_2021_movie_poster/",
    "source": "reddit.com",
    "width": 1890,
    "height": 2800,
    "image": "https://preview.redd.it/3fl2s0q1ug661.jpg?auto=webp&s=ed5e4418f962103b0d47b5b466036d7b40aa761b"
  },
  ... other images
]
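The \u2014 and \u00ed sequences in the titles above appear because json.dumps escapes non-ASCII characters by default. If readable characters are preferred, the ensure_ascii=False argument keeps them as they are. A small sketch with one title from the output:

```python
import json

sample = [{'title': 'Dune (2021) - Posters\u2014 The Movie Database (TMDB)'}]

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(sample, indent=2)

# ensure_ascii=False keeps the original characters in the output string.
readable = json.dumps(sample, indent=2, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode back to the same data, so the choice only affects how the output reads.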

Links

Join us on Twitter | YouTube

Add a Feature Request💫 or a Bug🐞
