Fast python library for the Crawlbase API
crawlbase/crawlbase-python
A lightweight, dependency-free Python class that acts as a wrapper for the Crawlbase API.
Choose a way of installing:
- Download the Python class from GitHub.
- Or use the PyPI Python package manager.
pip install crawlbase
Then import the CrawlingAPI, ScraperAPI, etc. as needed.
from crawlbase import CrawlingAPI, ScraperAPI, LeadsAPI, ScreenshotsAPI, StorageAPI
First initialize the CrawlingAPI class.
api = CrawlingAPI({'token': 'YOUR_CRAWLBASE_TOKEN'})
Pass the URL that you want to scrape, plus any options from the ones available in the API documentation.
api.get(url, options={})
Example:
response = api.get('https://www.facebook.com/britneyspears')
if response['status_code'] == 200:
    print(response['body'])
You can pass any options from the Crawlbase API.
Example:
response = api.get('https://www.reddit.com/r/pics/comments/5bx4bx/thanks_obama/', {
    'user_agent': 'Mozilla/5.0 (Windows NT 6.2; rv:20.0) Gecko/20121202 Firefox/30.0',
    'format': 'json'
})
if response['status_code'] == 200:
    print(response['body'])
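The status check in the GET examples above repeats on every call, so it can be wrapped once. The following is a minimal sketch, not part of the library: `fetch_body` is a hypothetical helper name, and it assumes only what the examples show, namely that `api.get(url, options)` returns a dictionary with `status_code` and `body` keys.

```python
def fetch_body(api, url, options=None):
    """Return the response body on HTTP 200, otherwise None.

    `api` is assumed to be any object with a get(url, options) method that
    returns a dict containing 'status_code' and 'body', as in the examples.
    """
    response = api.get(url, options or {})
    if response['status_code'] == 200:
        return response['body']
    # Any other status is treated as a failure by this sketch.
    return None
```

With a real client this would be called as `fetch_body(api, 'https://www.reddit.com/r/pics/comments/5bx4bx/thanks_obama/', {'format': 'json'})`.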
Pass the URL that you want to scrape, the data that you want to send (either a dictionary or a string), plus any options from the ones available in the API documentation.
api.post(url, dictionary or string data, options={})
Example:
response = api.post('https://producthunt.com/search', {'text': 'example search'})
if response['status_code'] == 200:
    print(response['body'])
You can send the data as application/json instead of x-www-form-urlencoded by setting the post_content_type option to json.
import json
response = api.post('https://httpbin.org/post', json.dumps({'some_json': 'with some value'}), {'post_content_type': 'json'})
if response['status_code'] == 200:
    print(response['body'])
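The JSON POST pattern above has two parts that must stay in sync: serializing the payload and setting `post_content_type`. A small sketch that bundles them is shown below. `post_json` is a hypothetical helper name, not a library function; it assumes `api.post(url, data, options)` behaves as in the example.

```python
import json


def post_json(api, url, payload, options=None):
    """POST `payload` as application/json through a CrawlingAPI-like object.

    Copies the caller's options so the original dict is not mutated, then
    forces post_content_type to 'json' as described in the text.
    """
    merged = dict(options or {})
    merged['post_content_type'] = 'json'
    return api.post(url, json.dumps(payload), merged)
```

A call such as `post_json(api, 'https://httpbin.org/post', {'some_json': 'with some value'})` would then be equivalent to the explicit example.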
If you need to scrape a website built with a JavaScript framework such as React, Angular, or Vue, you just need to pass your JavaScript token and use the same calls. Note that only .get is available for JavaScript, not .post.
api = CrawlingAPI({'token': 'YOUR_JAVASCRIPT_TOKEN'})
response = api.get('https://www.nfl.com')
if response['status_code'] == 200:
    print(response['body'])
In the same way, you can pass additional JavaScript options.
response = api.get('https://www.freelancer.com', {'page_wait': 5000})
if response['status_code'] == 200:
    print(response['body'])
You can always get the original status and the Crawlbase status from the response. Read the Crawlbase documentation to learn more about those statuses.
response = api.get('https://craiglist.com')
print(response['headers']['original_status'])
print(response['headers']['pc_status'])
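When both statuses are needed together, for example for logging, they can be read in one step. This is a sketch under the assumption that the response is a plain dictionary and that `headers` is dictionary-like, as the example suggests. `crawl_statuses` is a hypothetical name, and the helper returns the values as found without converting their types.

```python
def crawl_statuses(response):
    """Return (original_status, pc_status) from a response dict.

    Missing headers yield None instead of raising KeyError, which is a
    choice made for this sketch and not documented library behavior.
    """
    headers = response.get('headers') or {}
    return headers.get('original_status'), headers.get('pc_status')
```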
If you have questions or need help using the library, please open an issue or contact us.
Usage of the Scraper API is very similar; just change the class that you initialize.
scraper_api = ScraperAPI({'token': 'YOUR_NORMAL_TOKEN'})
response = scraper_api.get('https://www.amazon.com/DualSense-Wireless-Controller-PlayStation-5/dp/B08FC6C75Y/')
if response['status_code'] == 200:
    print(response['json']['name'])  # Will print the name of the Amazon product
To find email leads, you can use the Leads API. Check the full API documentation if needed.
leads_api = LeadsAPI({'token': 'YOUR_NORMAL_TOKEN'})
response = leads_api.get_from_domain('microsoft.com')
if response['status_code'] == 200:
    print(response['json']['leads'])
Initialize with your Screenshots API token and call the get method.
screenshots_api = ScreenshotsAPI({'token': 'YOUR_NORMAL_TOKEN'})
response = screenshots_api.get('https://www.apple.com')
if response['status_code'] == 200:
    print(response['headers']['success'])
    print(response['headers']['url'])
    print(response['headers']['remaining_requests'])
    print(response['file'])
or specifying a file path
screenshots_api = ScreenshotsAPI({'token': 'YOUR_NORMAL_TOKEN'})
response = screenshots_api.get('https://www.apple.com', {'save_to_path': 'apple.jpg'})
if response['status_code'] == 200:
    print(response['headers']['success'])
    print(response['headers']['url'])
    print(response['headers']['remaining_requests'])
    print(response['file'])
or, if you set store=true, then screenshot_url is set in the returned headers
screenshots_api = ScreenshotsAPI({'token': 'YOUR_NORMAL_TOKEN'})
response = screenshots_api.get('https://www.apple.com', {'store': 'true'})
if response['status_code'] == 200:
    print(response['headers']['success'])
    print(response['headers']['url'])
    print(response['headers']['remaining_requests'])
    print(response['file'])
    print(response['headers']['screenshot_url'])
Note that the screenshots_api.get(url, options) method accepts an options parameter.
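The two screenshot options shown above, `save_to_path` and `store`, can be assembled with a small builder so that call sites stay readable. This is only a sketch: `screenshot_options` is a hypothetical helper, and it covers just the two options that appear in the examples, passing `store` as the string `'true'` exactly as they do.

```python
def screenshot_options(save_to_path=None, store=False):
    """Build an options dict for screenshots_api.get(url, options).

    Only the keys that were requested are included, so an empty dict is
    returned when neither option is set.
    """
    options = {}
    if save_to_path is not None:
        options['save_to_path'] = save_to_path
    if store:
        # The examples pass the flag as the string 'true', not a boolean.
        options['store'] = 'true'
    return options
```

For instance, `screenshots_api.get('https://www.apple.com', screenshot_options(save_to_path='apple.jpg'))` would mirror the file-path example.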
Initialize the Storage API using your private token.
storage_api = StorageAPI({'token': 'YOUR_NORMAL_TOKEN'})
Pass the URL that you want to get from Crawlbase Storage.
response = storage_api.get('https://www.apple.com')
if response['status_code'] == 200:
    print(response['headers']['original_status'])
    print(response['headers']['pc_status'])
    print(response['headers']['url'])
    print(response['headers']['rid'])
    print(response['headers']['stored_at'])
    print(response['body'])
or you can use the RID
response = storage_api.get('RID_REPLACE')
if response['status_code'] == 200:
    print(response['headers']['original_status'])
    print(response['headers']['pc_status'])
    print(response['headers']['url'])
    print(response['headers']['rid'])
    print(response['headers']['stored_at'])
    print(response['body'])
Note: Either the RID or the URL must be sent. Each is optional on its own, but sending one of the two is mandatory.
Delete request
To delete a storage item from your storage area, use the correct RID
if storage_api.delete('RID_REPLACE'):
    print('delete success')
else:
    print('Unable to delete')
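Deleting several items means repeating the call above once per RID. A possible sketch that does this and reports which deletions failed is shown below. `delete_many` is a hypothetical helper, and it relies only on the behavior visible in the example: `storage_api.delete(rid)` returns a truthy value on success.

```python
def delete_many(storage_api, rids):
    """Delete each RID in turn and return the list of RIDs that failed.

    An empty list means every delete call reported success.
    """
    failed = []
    for rid in rids:
        if not storage_api.delete(rid):
            failed.append(rid)
    return failed
```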
Bulk request
To make a bulk request, pass the RIDs as a list
response = storage_api.bulk(['RID1', 'RID2', 'RID3', ...])
if response['status_code'] == 200:
    for item in response['json']:
        print(item['original_status'])
        print(item['pc_status'])
        print(item['url'])
        print(item['rid'])
        print(item['stored_at'])
        print(item['body'])
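For long RID lists it may be convenient to split the bulk request into batches. The sketch below does that under stated assumptions: `chunked` and `bulk_fetch` are hypothetical helpers, the default `batch_size` of 100 is an illustrative value and not a documented limit, and batches that do not return status 200 are simply skipped.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError('size must be at least 1')
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_fetch(storage_api, rids, batch_size=100):
    """Call storage_api.bulk once per batch and collect the returned items.

    Assumes bulk() returns a dict with 'status_code' and, on success, a
    'json' list, as in the example. Failed batches contribute nothing.
    """
    results = []
    for batch in chunked(list(rids), batch_size):
        response = storage_api.bulk(batch)
        if response['status_code'] == 200:
            results.extend(response['json'])
    return results
```

A caller that needs to know which batches failed would have to extend this, for example by also returning the RIDs of skipped batches.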
RIDs request
To request a bulk list of RIDs from your storage area
rids = storage_api.rids()
print(rids)
You can also specify a limit as a parameter
storage_api.rids(100)
To get the total number of documents in your storage area
total_count = storage_api.totalCount()
print(total_count)
If you need a custom timeout, pass it when creating the class instance:
api = CrawlingAPI({'token': 'TOKEN', 'timeout': 120})
Timeout is in seconds.
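Because every class above takes the same kind of configuration dictionary, the token and timeout can be assembled in one place. The following is a sketch only: `build_config` is a hypothetical helper, and the `CRAWLBASE_TOKEN` environment variable name is an assumption made for illustration, not something the library reads on its own.

```python
import os


def build_config(token=None, timeout=None, env_var='CRAWLBASE_TOKEN'):
    """Return a config dict such as {'token': ..., 'timeout': ...}.

    Falls back to the environment variable named by `env_var` (a
    hypothetical name) when no token is passed. `timeout` is in seconds
    and is omitted from the dict when not given.
    """
    token = token or os.environ.get(env_var)
    if not token:
        raise ValueError('no token given and %s is not set' % env_var)
    config = {'token': token}
    if timeout is not None:
        config['timeout'] = timeout
    return config
```

The result could then be passed to a constructor, for example `CrawlingAPI(build_config(timeout=120))`.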
Copyright 2025 Crawlbase