MontFerret/worker

Containerized Ferret worker

License

Apache-2.0 license

Worker

Worker is a simple HTTP server that accepts FQL (Ferret Query Language) queries, executes them, and returns their results.

What is Ferret?

Ferret is a declarative web scraping query language that allows you to extract data from web pages using a SQL-like syntax. Worker provides a REST API interface to execute FQL queries remotely, making it easy to integrate web scraping capabilities into your applications.

Common use cases:

Web scraping and data extraction from websites
Automated testing of web applications
Monitoring web pages for changes
Generating PDFs or screenshots from web pages
Collecting data for analytics and research

The OpenAPI v2 schema can be found in the repository's reference directory.

Quick start

Prerequisites

Docker (recommended) or Go 1.23+ for local installation
For local installation without Docker: Google Chrome or Chromium browser

Running with Docker

Worker ships with a dedicated Docker image that includes headless Google Chrome, so you can run queries using the cdp driver right away:

DockerHub:

docker run -d -p 8080:8080 montferret/worker

GitHub Container Registry:

docker run -d -p 8080:8080 ghcr.io/montferret/worker

Local Installation

Alternatively, if you want to use your own version of Chrome, you can run the Worker locally.

Install from script:

curl https://raw.githubusercontent.com/MontFerret/worker/master/install.sh | sh
worker

Build from source:

git clone https://github.com/MontFerret/worker.git
cd worker
make

Your First Query

Once the Worker is running, you can send FQL queries via POST requests to http://localhost:8080/:

Simple data extraction:

curl -X POST http://localhost:8080/ \
  -H "Content-Type: application/json" \
  -d '{
    "text": "LET doc = DOCUMENT(\"https://example.com\") RETURN doc.title"
  }'

Web scraping with browser automation:

curl -X POST http://localhost:8080/ \
  -H "Content-Type: application/json" \
  -d '{
    "text": "LET page = DOCUMENT(\"https://github.com\", { driver: \"cdp\" }) WAIT_ELEMENT(page, \"h1\") RETURN INNER_TEXT(page, \"h1\")"
  }'

Query with parameters:

curl -X POST http://localhost:8080/ \
  -H "Content-Type: application/json" \
  -d '{
    "text": "LET doc = DOCUMENT(@url) RETURN doc.title",
    "params": {
      "url": "https://example.com"
    }
  }'
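The same parameterized request can be sent from application code. The following is a minimal sketch using only the Python standard library; it assumes a Worker listening on http://localhost:8080/ as in the Docker quick start, and the names `build_query_request` and `run_query` are hypothetical helpers, not part of Worker.

```python
import json
import urllib.request

# Assumption: a Worker started as in the quick start above.
WORKER_URL = "http://localhost:8080/"


def build_query_request(text, params=None, url=WORKER_URL):
    """Build a POST request carrying an FQL query and optional parameters."""
    payload = {"text": text}
    if params:
        payload["params"] = params
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(text, params=None, url=WORKER_URL, timeout=60):
    """Send the query to a running Worker and return the decoded JSON body."""
    request = build_query_request(text, params, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a Worker running, `run_query("LET doc = DOCUMENT(@url) RETURN doc.title", {"url": "https://example.com"})` performs the same call as the curl command above. Keeping the request construction in its own function makes it possible to inspect the payload without touching the network.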

System Resource Requirements

2 CPU cores
2 GB of RAM

Usage

API Reference

Endpoints

POST /

Executes a given FQL query. The payload must have the following shape:

{
  "text": "LET doc = DOCUMENT('https://example.com') RETURN doc.title",
  "params": {
    "optional_param": "value"
  }
}

Request body:

text (string, required): The FQL query to execute
params (object, optional): Parameters to pass to the query (accessible via @param_name)

Response:

{
  "data": "Example Domain",
  "stats": {
    "execution_time": "1.234s"
  }
}
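A client usually wants the query result and the timing as separate values. The sketch below decodes a response body of the shape shown above; `QueryResult` and `parse_query_response` are hypothetical names, and the execution time is kept as the raw string because its exact format is not specified here.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class QueryResult:
    data: Any            # whatever the FQL query returned
    execution_time: str  # raw string from "stats", e.g. "1.234s"


def parse_query_response(raw_body):
    """Decode a Worker response body into a QueryResult."""
    body = json.loads(raw_body)
    stats = body.get("stats") or {}
    return QueryResult(
        data=body.get("data"),
        execution_time=stats.get("execution_time", ""),
    )
```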

Example with complex data extraction:

curl -X POST http://localhost:8080/ \
  -H "Content-Type: application/json" \
  -d '{
    "text": "LET page = DOCUMENT(@url, { driver: \"cdp\" }) LET links = ELEMENTS(page, \"a\") RETURN links[* LIMIT 5].href",
    "params": {
      "url": "https://news.ycombinator.com"
    }
  }'

GET /info

Returns worker information including Chrome, Ferret and worker versions:

{
  "ip": "127.0.0.1",
  "version": {
    "worker": "1.18.0",
    "chrome": {
      "browser": "125.0.6422.141",
      "protocol": "1.3",
      "v8": "12.5.227.39",
      "webkit": "537.36"
    },
    "ferret": "0.18.1"
  }
}
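The version fields are convenient for checking compatibility before sending queries. The sketch below turns an /info body of the shape shown above into comparable tuples. `parse_version` and `summarize_info` are hypothetical helpers, and the code assumes purely numeric dotted versions and a present "chrome" object; both assumptions may not hold for every build or when Chrome is disabled.

```python
import json


def parse_version(version):
    """Turn a dotted version such as "1.18.0" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def summarize_info(raw_body):
    """Pick the worker, Ferret and Chrome browser versions out of an /info body."""
    info = json.loads(raw_body)
    version = info["version"]
    return {
        "worker": parse_version(version["worker"]),
        "ferret": parse_version(version["ferret"]),
        "chrome": parse_version(version["chrome"]["browser"]),
    }
```

Tuples compare element by element, so a check such as `summarize_info(body)["worker"] >= (1, 18)` works without any extra parsing.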

GET /health

Health check endpoint that returns HTTP 200 when the service is healthy and all dependencies (like Chrome) are accessible. Returns HTTP 424 when dependencies are unavailable.

Healthy response:

HTTP/1.1 200 OK

Unhealthy response:

HTTP/1.1 424 Failed Dependency
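When probing this endpoint from Python, note that `urllib` raises `HTTPError` for a 424 response instead of returning it. The sketch below maps both outcomes to a label; `classify_health` and `check_health` are hypothetical names, and the base URL assumes the default port from this document.

```python
import urllib.error
import urllib.request


def classify_health(status_code):
    """Map the documented /health status codes to a short label."""
    if status_code == 200:
        return "healthy"
    if status_code == 424:
        return "dependency-unavailable"
    return "unknown"


def check_health(base_url="http://localhost:8080", timeout=5):
    """Call GET /health and classify the outcome."""
    try:
        with urllib.request.urlopen(base_url + "/health", timeout=timeout) as response:
            return classify_health(response.status)
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses arrive here; the status code is on the exception.
        return classify_health(error.code)
    except OSError:
        # Connection refused, DNS failure, timeout and similar transport errors.
        return "unreachable"
```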

Configuration

Command Line Options

  -log-level="debug"
    log level (trace, debug, info, warn, error, fatal, panic)
  -port=8080
    port to listen
  -body-limit=1000
    maximum size of request body in kb. 0 means no limit.
  -request-limit=0
    amount of requests per second for each IP. 0 means no limit.
  -request-limit-time-window=180
    amount of seconds for request rate limit time window.
  -cache-size=100
    amount of cached queries. 0 means no caching.
  -chrome-ip="127.0.0.1"
    Google Chrome remote IP address
  -chrome-port=9222
    Google Chrome remote debugging port
  -no-chrome=false
    disable Chrome driver
  -version=false
    show version
  -help=false
    show this list

Configuration Examples

Production deployment with rate limiting:

worker \
  -port=8080 \
  -log-level=info \
  -request-limit=10 \
  -request-limit-time-window=60 \
  -body-limit=2000 \
  -cache-size=500

Development with debugging:

worker \
  -port=3000 \
  -log-level=debug \
  -cache-size=0

Using external Chrome instance:

# Start Chrome with remote debugging
google-chrome --headless --remote-debugging-port=9222 &

# Start worker pointing to external Chrome
worker -chrome-ip=localhost -chrome-port=9222

Without Chrome (HTTP driver only):

worker -no-chrome=true
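When Worker is started from a script or test harness, the flags above can be kept in a dictionary and rendered into a command line. This is only a sketch: `build_worker_command` is a hypothetical helper, it assumes the `worker` binary is on the PATH, and it uses only flags listed in the command line options.

```python
import shlex


def build_worker_command(options, binary="worker"):
    """Render a dict of flag names and values as an argv list for the worker binary."""
    command = [binary]
    for name, value in options.items():
        if isinstance(value, bool):
            # Boolean flags are written as true/false, as in -no-chrome=true.
            value = "true" if value else "false"
        command.append(f"-{name}={value}")
    return command


dev_command = build_worker_command({"port": 3000, "log-level": "debug", "cache-size": 0})
print(shlex.join(dev_command))
```

The resulting list can be passed to `subprocess.Popen` directly, which avoids shell quoting issues.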

Docker Configuration

Custom port and configuration:

docker run -d \
  -p 3000:3000 \
  -e PORT=3000 \
  montferret/worker \
  worker -port=3000 -log-level=info

With volume for persistent cache:

docker run -d \
  -p 8080:8080 \
  -v /host/cache:/app/cache \
  montferret/worker

Security Considerations

⚠️ Important for Production Deployments:

Rate Limiting: Always enable rate limiting in production (-request-limit)
Body Size Limits: Set appropriate body size limits (-body-limit) to prevent abuse
Network Security: Worker should not be exposed directly to the internet without proper authentication
Query Validation: Consider implementing query validation/filtering for untrusted input
Resource Monitoring: Monitor CPU and memory usage as complex queries can be resource-intensive
Chrome Security: The bundled Chrome runs in sandboxed mode, but avoid running as root in production

Recommended production configuration:

worker \
  -port=8080 \
  -log-level=warn \
  -request-limit=5 \
  -request-limit-time-window=60 \
  -body-limit=1000 \
  -cache-size=200
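One way to approach the query validation point is to keep the FQL text fixed on the server side and validate only the parameters supplied by users. The sketch below checks a "url" parameter against a host allowlist before the request is forwarded to Worker. `ALLOWED_HOSTS`, `is_allowed_url` and `validate_params` are hypothetical names, and the allowlist contents are placeholders; this is an illustration of the idea, not a complete defense.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with the hosts your service is meant to scrape.
ALLOWED_HOSTS = {"example.com", "news.ycombinator.com"}


def is_allowed_url(url, allowed_hosts=ALLOWED_HOSTS):
    """Accept only http(s) URLs whose host is on the allowlist."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    return parts.hostname in allowed_hosts


def validate_params(params, allowed_hosts=ALLOWED_HOSTS):
    """Raise ValueError if the "url" parameter points at a host that is not allowed."""
    url = params.get("url", "")
    if not is_allowed_url(url, allowed_hosts):
        raise ValueError(f"URL not allowed: {url!r}")
    return params
```

Rejecting non-HTTP schemes and unknown hosts keeps user input from steering the browser toward local files or internal network addresses.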

Troubleshooting

Common Issues

Chrome connection failed:

Error: failed to connect to Chrome

Ensure Chrome is running with --remote-debugging-port=9222
Check if Chrome is accessible at the configured IP/port
For Docker: make sure Chrome service is healthy

Query timeout:

Error: query execution timeout

Complex pages may take longer to load
Consider adding explicit waits in your FQL query
Check network connectivity to target websites

Memory issues:

Error: out of memory

Reduce cache size (-cache-size)
Limit concurrent requests (-request-limit)
Monitor Chrome memory usage

Permission denied:

Error: permission denied accessing Chrome

Ensure proper user permissions for Chrome binary
In Docker, avoid running as root when possible

Debug Mode

Enable debug logging to troubleshoot issues:

worker -log-level=debug

Health Check

Monitor worker health:

curl http://localhost:8080/health
curl http://localhost:8080/info
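In deployment scripts it is often useful to wait until the health endpoint reports success before sending queries. The following sketch polls a probe function with a fixed delay. `wait_until_healthy` and `http_probe` are hypothetical helpers, and the URL assumes the default port used in this document.

```python
import time
import urllib.request


def http_probe(url="http://localhost:8080/health", timeout=5):
    """Return True only when GET /health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers HTTP error statuses such as 424 as well as connection failures.
        return False


def wait_until_healthy(probe=http_probe, attempts=10, delay=1.0, sleep=time.sleep):
    """Call probe() until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

The probe and the sleep function are parameters so that the loop can be exercised in tests without a running Worker.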

FQL Query Examples

Basic Web Scraping

// Extract page title
LET doc = DOCUMENT("https://example.com")
RETURN doc.title

// Get all links
LET doc = DOCUMENT("https://example.com")
LET links = ELEMENTS(doc, "a")
RETURN links[*].href

// Extract structured data
LET doc = DOCUMENT("https://news.ycombinator.com")
LET stories = ELEMENTS(doc, ".titleline > a")
RETURN stories[* LIMIT 10].{ title: INNER_TEXT(@), url: @.href }

Browser Automation with CDP

// Navigate and interact with page
LET page = DOCUMENT("https://github.com", { driver: "cdp" })
WAIT_ELEMENT(page, "input[name='q']")
INPUT(page, "input[name='q']", "ferret")
CLICK(page, "button[type='submit']")
WAIT_ELEMENT(page, ".repo-list-item")
RETURN ELEMENTS(page, ".repo-list-item h3 a")[*].{ name: INNER_TEXT(@), url: @.href }

// Generate a PDF of the page
LET page = DOCUMENT("https://example.com", { driver: "cdp" })
RETURN PDF(page)

Using Parameters

// Query with parameters (pass via "params" in POST body)
LET page = DOCUMENT(@url, { driver: "cdp" })
LET selector = @css_selector
RETURN ELEMENTS(page, selector)[*].{ text: INNER_TEXT(@), href: @.href }

Development

Building from Source

# Clone repository
git clone https://github.com/MontFerret/worker.git
cd worker

# Install dependencies
make install

# Build
make build

# Run tests
make test

# Start development server
make start

Contributing

Fork the repository
Create a feature branch: git checkout -b my-feature
Make your changes
Run tests: make test
Run linter: make lint
Commit changes: git commit -am 'Add some feature'
Push to the branch: git push origin my-feature
Submit a pull request

Project Structure

├── cmd/                    # Command-line interface
├── internal/               # Internal application code
│   ├── controllers/        # HTTP request handlers
│   ├── server/             # HTTP server configuration
│   └── storage/            # Caching layer
├── pkg/                    # Public packages
│   ├── caching/            # Cache implementation
│   └── worker/             # Core worker logic
├── reference/              # OpenAPI specification
└── assets/                 # Documentation assets

Releases: 29 (latest: v1.23.0, Feb 5, 2026)
