Oxylabs Web Scraper API Scheduler

Requirements

Python 3.9+
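
If you are unsure which interpreter is on your path, a quick check like the sketch below can confirm it before you create the virtual environment. The helper name `python_is_supported` is hypothetical and not part of this repository.

```python
import sys

MIN_VERSION = (3, 9)  # minimum version stated in the requirements above


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True when the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


# Report the result for the interpreter running this snippet.
status = "OK" if python_is_supported() else "too old"
print(f"Python {sys.version.split()[0]}: {status}")
```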

Setting up Python Environment

Open a terminal or command prompt.

Clone this repository:

git clone git@gitlab-ssh.int.oxyplatform.io:oxybrain/scraper-api-setup.git

Navigate to your project directory:
```
cd scraper-api-setup/
```
Create a virtual environment using Python 3.9:
```
python3.9 -m venv venv
```
Activate the virtual environment:
- On Mac/Linux:
```
source venv/bin/activate
```
- On Windows:
```
venv\Scripts\activate
```

Installing Requirements

Ensure your virtual environment is activated, then install the required packages:
```
pip install -r requirements.txt
```

Setting up Scraper API

To set it up, you only need to prepare a list of URLs and a payload of Scraper API parameters, which can be generated in our playground. Below you will find the general steps for running our Scraper API, followed by a few examples of the different ways you can run it.

Steps:

Generate a payload by describing your scraping needs in the Oxylabs playground (https://dashboard.oxylabs.io/?route=/api-playground).
Copy the payload and store it either in `runtime_files/payload.json` or in a new file on your device.
Store the URLs you wish to scrape in `runtime_files/urls.txt` or in a new file on your device, one URL per line.
Run the script that registers and performs your scraping jobs (ensure you have completed the installation steps above and that the virtual environment is activated):
```
python3.9 run.py
```
This starts a setup wizard that asks for your account details and a few other settings.
Enter your Oxylabs username and password. They are cached in the `.env` file. If you later wish to change them, just remove this file.
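
Before starting the wizard, it can help to sanity-check the two input files. The sketch below is not part of this repository: the helper names `load_payload` and `load_urls` are hypothetical, and the rules it applies (the payload is a JSON object, every non-empty line is an http or https URL) are assumptions about what makes a usable input, not checks the script is known to perform.

```python
import json
from pathlib import Path
from urllib.parse import urlparse


def load_payload(path):
    """Parse the payload file and make sure it holds a JSON object."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    return payload


def load_urls(path):
    """Return the non-empty lines of the URL file, rejecting non-http(s) entries."""
    urls = []
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for number, line in enumerate(lines, start=1):
        url = line.strip()
        if not url:
            continue  # blank lines are skipped rather than treated as errors
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"line {number} is not a valid URL: {url!r}")
        urls.append(url)
    return urls
```

With the default locations, this would be called as `load_payload("runtime_files/payload.json")` and `load_urls("runtime_files/urls.txt")` from the project directory.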

Examples

Store completed jobs locally

Start the `run.py` script:
```
python3.9 run.py
```
When asked `Select where you wish to store the results.`, enter `1` to choose `Locally`.
When asked `Full path to existing directory where results should be stored:`, enter the path to the directory where completed jobs should be stored. This directory must already exist on your device.
That's it. Completed jobs will be stored in the chosen location after some time, depending on the number of URLs you asked to scrape.
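
Because the results directory must exist before the wizard asks for it, a small pre-check can save a failed run. This is a minimal sketch using only the standard library. `resolve_results_dir` is a hypothetical helper, not a function from this repository.

```python
from pathlib import Path


def resolve_results_dir(raw_path):
    """Expand and resolve a path, failing if it is not an existing directory."""
    path = Path(raw_path).expanduser().resolve()
    if not path.is_dir():
        raise NotADirectoryError(f"results directory does not exist: {path}")
    return path
```

If the directory is missing, it can be created beforehand with `Path(raw_path).mkdir(parents=True, exist_ok=True)`, and the resolved full path can then be pasted into the wizard.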

Image of wizard for reference

Store completed jobs in cloud

Only AWS S3 and Google Cloud Storage are supported for now.

Start the `run.py` script:
```
python3.9 run.py
```
When asked `Select where you wish to store the results.`, enter `2` to choose `Cloud`.
When asked `Path to cloud bucket and directory/partition where results should be stored:`, enter the path to the cloud directory where completed jobs should be stored. Bucket permissions should be adjusted by following these instructions: https://developers.oxylabs.io/scraper-apis/web-scraper-api/features/cloud-storage.
When asked `Do you want to schedule urls to be scraped repetitively?(y/n)`, enter `n`, since here we just want to scrape all URLs once.

Image of wizard for reference

Store completed jobs in cloud and schedule them to repeat

Start the `run.py` script:
```
python3.9 run.py
```
When asked `Select where you wish to store the results.`, enter `2` to choose `Cloud`.
When asked `Path to cloud bucket and directory/partition where results should be stored:`, enter the path to the cloud directory where completed jobs should be stored. Bucket permissions should be adjusted by following these instructions: https://developers.oxylabs.io/scraper-apis/web-scraper-api/features/cloud-storage.
When asked `Do you want to schedule urls to be scraped repetitively?(y/n)`, enter `y`.
Next, select the frequency.
- If you choose `Daily`, you will need to enter the hour at which scraping jobs will run each day.
- If you choose `Weekly`, you will need to select the weekday on which scraping jobs will run each week.
- If you choose `Monthly`, you will need to enter the day on which scraping jobs will run each month.
Lastly, you will be asked to specify the date and time when the scheduler should stop. You MUST enter the end time in the format `2032-12-21 12:34:45`. You can still stop the scheduler before the end time. An example of how to do it is in the next section.
You will get a schedule ID, which you should save.
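
The end time format above corresponds to the `strptime` pattern `%Y-%m-%d %H:%M:%S`, so a value can be checked before it is typed into the wizard. The sketch below also illustrates how Daily, Weekly and Monthly choices could be written as standard five-field cron expressions. Both helpers are hypothetical. That the end time must lie in the future and that the script builds cron expressions at all are assumptions. The minute and hour defaults in the weekly and monthly cases are arbitrary.

```python
from datetime import datetime

END_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # e.g. 2032-12-21 12:34:45


def parse_end_time(text, now=None):
    """Parse an end time and require it to be later than `now` (assumed rule)."""
    end_time = datetime.strptime(text.strip(), END_TIME_FORMAT)
    if end_time <= (now or datetime.now()):
        raise ValueError("end time must be in the future")
    return end_time


def cron_expression(frequency, value):
    """Illustrative mapping from a frequency choice to a cron expression."""
    if frequency == "daily" and 0 <= value <= 23:
        return f"0 {value} * * *"  # every day at <value>:00
    if frequency == "weekly" and 0 <= value <= 6:
        return f"0 0 * * {value}"  # 00:00 on weekday <value>, 0 = Sunday
    if frequency == "monthly" and 1 <= value <= 31:
        return f"0 0 {value} * *"  # 00:00 on day <value> of each month
    raise ValueError(f"unsupported schedule: {frequency!r} {value!r}")
```

For example, `cron_expression("daily", 6)` gives `0 6 * * *`, and `parse_end_time("2032-12-21T12:34:45")` raises `ValueError` because the date and time must be separated by a space.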

Image of wizard for reference

Stopping scheduler

Start the `stop.py` script:
```
python3.9 stop.py
```
Enter your Oxylabs username and password.
Select the schedule ID you wish to deactivate.

Image of wizard for reference
