Scrape
Start Scrape Job
Starts a scrape job for a given URL.
Method: client.scrape.start(params: StartScrapeJobParams): StartScrapeJobResponse
Endpoint: POST /api/scrape
Parameters:
StartScrapeJobParams:
url: string
- URL to scrape
session_options?: CreateSessionParams
scrape_options?: ScrapeOptions
Response: StartScrapeJobResponse
Example:
response = client.scrape.start(StartScrapeJobParams(url="https://example.com"))
print(response.job_id)
Get Scrape Job
Retrieves details of a specific scrape job.
Method: client.scrape.get(id: str): ScrapeJobResponse
Endpoint: GET /api/scrape/{id}
Parameters:
id: string
- Scrape job ID
Response: ScrapeJobResponse
Example:
response = client.scrape.get("182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e")
print(response.status)
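Because client.scrape.start returns before the job has finished, a caller that does not use start_and_wait has to poll client.scrape.get until the status is terminal. The following is a minimal sketch of such a loop, not the SDK's own implementation. The helper wait_for_job, its poll_interval and max_wait parameters, and the fake_get stub are all hypothetical names introduced here so the example runs without network access.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable

# Statuses from ScrapeJobStatus after which the job will not change again.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    get_job: Callable[[str], Any],
    job_id: str,
    poll_interval: float = 2.0,
    max_wait: float = 60.0,
) -> Any:
    """Call get_job(job_id) repeatedly until the job completes or fails.

    Hypothetical helper: get_job stands in for client.scrape.get.
    """
    deadline = time.monotonic() + max_wait
    while True:
        job = get_job(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"scrape job {job_id} is still {job.status} after {max_wait}s"
            )
        time.sleep(poll_interval)


# Stand-in for client.scrape.get: reports a new status on each call.
_statuses = iter(["pending", "running", "completed"])


def fake_get(job_id: str) -> SimpleNamespace:
    return SimpleNamespace(job_id=job_id, status=next(_statuses))


job = wait_for_job(
    fake_get, "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e", poll_interval=0.01
)
print(job.status)
```

With a real client, passing client.scrape.get as get_job should behave the same way, assuming the returned object exposes status as described under ScrapeJobResponse.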
Start Scrape Job and Wait
Starts a scrape job and waits for it to complete.
Method: client.scrape.start_and_wait(params: StartScrapeJobParams): ScrapeJobResponse
Parameters:
StartScrapeJobParams:
url: string
- URL to scrape
session_options?: CreateSessionParams
scrape_options?: ScrapeOptions
Response: ScrapeJobResponse
Example:
response = client.scrape.start_and_wait(StartScrapeJobParams(url="https://example.com"))
print(response.status)
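A returned ScrapeJobResponse can carry a status of "failed" with an error message, or a completed job whose data lacks the format you wanted, so it is worth checking before reading the content. The sketch below shows one way to do that. The function extract_markdown is a hypothetical helper, and the two SimpleNamespace objects are made-up stand-ins for responses, shaped after the ScrapeJobResponse and ScrapeJobData types documented on this page.

```python
from types import SimpleNamespace
from typing import Any


def extract_markdown(response: Any) -> str:
    """Return the scraped markdown, or raise if the job did not produce it.

    Hypothetical helper operating on an object shaped like ScrapeJobResponse.
    """
    if response.status == "failed":
        raise RuntimeError(f"scrape job {response.job_id} failed: {response.error}")
    if response.status != "completed" or response.data is None:
        raise RuntimeError(
            f"scrape job {response.job_id} has no data (status: {response.status})"
        )
    if response.data.markdown is None:
        raise ValueError("job completed but returned no markdown")
    return response.data.markdown


# Made-up responses for illustration only.
ok = SimpleNamespace(
    job_id="job-1",
    status="completed",
    error=None,
    data=SimpleNamespace(markdown="# Example Domain", html=None, links=None),
)
failed = SimpleNamespace(job_id="job-2", status="failed", error="blocked", data=None)

print(extract_markdown(ok))
```

Raising on the failed case keeps error handling in one place. A caller that prefers a fallback value can catch RuntimeError around the call.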
Types
ScrapeFormat
ScrapeFormat = Literal["markdown", "html", "links", "screenshot"]
ScrapeJobStatus
ScrapeJobStatus = Literal["pending", "running", "completed", "failed"]
ScrapeOptions
class ScrapeOptions(BaseModel):
    formats: Optional[List[ScrapeFormat]] = None
    include_tags: Optional[List[str]] = Field(
        default=None, serialization_alias="includeTags"
    )
    exclude_tags: Optional[List[str]] = Field(
        default=None, serialization_alias="excludeTags"
    )
    only_main_content: Optional[bool] = Field(
        default=None, serialization_alias="onlyMainContent"
    )
    wait_for: Optional[int] = Field(default=None, serialization_alias="waitFor")
    timeout: Optional[int] = Field(default=None, serialization_alias="timeout")
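The serialization_alias values mean the Python field names are snake_case while the serialized keys are camelCase. The sketch below re-declares the model from this page and dumps it with pydantic v2's model_dump to show the mapping. Whether the SDK serializes with exactly these arguments (by_alias=True, exclude_none=True) is an assumption, and the option values are made up.

```python
from typing import List, Literal, Optional

from pydantic import BaseModel, Field

ScrapeFormat = Literal["markdown", "html", "links", "screenshot"]


class ScrapeOptions(BaseModel):
    # Copied from the type definition on this page so the example is standalone.
    formats: Optional[List[ScrapeFormat]] = None
    include_tags: Optional[List[str]] = Field(
        default=None, serialization_alias="includeTags"
    )
    exclude_tags: Optional[List[str]] = Field(
        default=None, serialization_alias="excludeTags"
    )
    only_main_content: Optional[bool] = Field(
        default=None, serialization_alias="onlyMainContent"
    )
    wait_for: Optional[int] = Field(default=None, serialization_alias="waitFor")
    timeout: Optional[int] = Field(default=None, serialization_alias="timeout")


# Fields are set by their Python (snake_case) names.
options = ScrapeOptions(
    formats=["markdown", "links"],
    exclude_tags=["nav"],
    only_main_content=True,
)

# by_alias=True applies the camelCase aliases; exclude_none drops unset fields.
payload = options.model_dump(by_alias=True, exclude_none=True)
print(payload)
```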
StartScrapeJobResponse
class StartScrapeJobResponse(BaseModel):
    job_id: str = Field(alias="jobId")
ScrapeJobData
class ScrapeJobData(BaseModel):
    metadata: Optional[dict[str, Union[str, list[str]]]] = None
    html: Optional[str] = None
    markdown: Optional[str] = None
    links: Optional[List[str]] = None
ScrapeJobResponse
class ScrapeJobResponse(BaseModel):
    job_id: str = Field(alias="jobId")
    status: ScrapeJobStatus
    error: Optional[str] = None
    data: Optional[ScrapeJobData] = None
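Since job_id is declared with alias="jobId", the JSON key is jobId but the Python attribute is job_id. The sketch below re-declares the two models from this page and validates a made-up payload with pydantic v2 to show this. The payload contents are illustrative assumptions, not real API output.

```python
from typing import List, Literal, Optional, Union

from pydantic import BaseModel, Field

ScrapeJobStatus = Literal["pending", "running", "completed", "failed"]


class ScrapeJobData(BaseModel):
    metadata: Optional[dict[str, Union[str, list[str]]]] = None
    html: Optional[str] = None
    markdown: Optional[str] = None
    links: Optional[List[str]] = None


class ScrapeJobResponse(BaseModel):
    job_id: str = Field(alias="jobId")
    status: ScrapeJobStatus
    error: Optional[str] = None
    data: Optional[ScrapeJobData] = None


# Made-up payload in the camelCase shape implied by the alias.
raw = {
    "jobId": "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e",
    "status": "completed",
    "data": {
        "markdown": "# Example Domain",
        "links": ["https://www.iana.org/domains/example"],
    },
}

job = ScrapeJobResponse.model_validate(raw)

# The alias is used for input; the attribute keeps the Python field name.
print(job.job_id, job.status)
print(job.data.links)
```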