pgpro_scheduler - PostgreSQL extension for job scheduling

pgpro_scheduler lets you schedule job execution and control job activity in a PostgreSQL database.

A job is a set of SQL commands. Its schedule can be described as a crontab-like string or as a JSON object. The two methods can also be combined.

Each job can calculate its next start time. The job's SQL commands can be executed in a single transaction, or each command can be executed in its own transaction. You can also set an SQL statement to be executed if the main job transaction fails.

Installation

pgpro_scheduler is a regular PostgreSQL extension and requires no prerequisites.

Before building the extension from source, make sure that the PATH environment variable includes the path to the pg_config utility. Also make sure that you have the developer version of PostgreSQL installed, or that PostgreSQL was built from source code.

Install extension as follows:

$ cd pgpro_scheduler
$ make USE_PGXS=1
$ sudo make USE_PGXS=1 install
$ psql <DBNAME> -c "CREATE EXTENSION pgpro_scheduler"

Configuration

The extension defines a number of PostgreSQL configuration variables (GUCs). These variables control the scheduler configuration.

schedule.enabled - boolean, whether the scheduler is enabled on this system. Default value: false.
schedule.database - text, the list of database names on which the scheduler is enabled. Database names should be separated by commas. Default value: empty string.
schedule.schema - text, the name of the schema where the scheduler stores its tables and functions. Changing this value requires a restart. Normally you should not change this variable, but it can be useful if you want to run scheduled jobs on a hot-standby database: you can define a foreign data wrapper on the master system that maps the default scheduler schema to another one and use it on the replica. Default value: schedule.
schedule.nodename - text, the node name of this instance. Default value: master. You should not change or use it in a single-server configuration, but you must change it if you run the scheduler on a hot-standby database.
schedule.max_workers - integer, the maximum number of simultaneously running jobs per database. Default value: 2.
schedule.transaction_state - text, an internal variable that contains the state of the executed job. It is intended for use in the procedure that calculates the next job start time. Possible values are:
- success - transaction has finished successfully
- failure - transaction has failed to finish
- running - transaction is in progress
- undefined - transaction has not started yet
The last two values normally should not appear inside a user procedure. If you see them, it probably indicates an internal scheduler error.
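
As an illustration of how schedule.transaction_state might be used, here is a minimal sketch of a statement that calculates the next start time. It assumes that such a statement is expected to return a single timestamp value; the one-hour and five-minute intervals are arbitrary values chosen for this example. current_setting() is the standard PostgreSQL function for reading a configuration variable.

```sql
-- Hypothetical next-start-time statement: run again in an hour after a
-- successful transaction, but retry after five minutes if it failed.
-- Any other state (running, undefined) yields NULL here.
SELECT CASE current_setting('schedule.transaction_state')
           WHEN 'success' THEN now() + interval '1 hour'
           WHEN 'failure' THEN now() + interval '5 minutes'
       END AS next_start;
```

Outside of a scheduler-run job the variable may hold a different value or be unset, so this query is meaningful only in the context the scheduler provides.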

Management

You can manage the scheduler through the PostgreSQL variables described above.

For example, suppose you have a fresh PostgreSQL installation with the scheduler extension installed. You are going to use the scheduler with databases called 'database1' and 'database2'. You want 'database1' to be able to run 5 jobs in parallel and 'database2' to run 3.

Add the following line to your postgresql.conf (a change to shared_preload_libraries takes effect only after a server restart):

shared_preload_libraries = 'pgpro_scheduler'

Then start psql and execute the following commands:

# ALTER SYSTEM SET schedule.enabled = true;
# ALTER SYSTEM SET schedule.database = 'database1,database2';
# ALTER DATABASE database1 SET schedule.max_workers = 5;
# ALTER DATABASE database2 SET schedule.max_workers = 3;
# SELECT pg_reload_conf();

If you do not need different max_workers values for different databases, you can store a single value in the configuration file and then ask the server to reload the configuration. There is no need to restart.

Here is an example of postgresql.conf:

shared_preload_libraries = 'pgpro_scheduler'
schedule.enabled = on
schedule.database = 'database1,database2'
schedule.max_workers = 5
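
To check which values are actually in effect, you can query the standard pg_settings view. The following is a sketch that assumes the pgpro_scheduler library has already been loaded, so that its schedule.* variables are registered with the server.

```sql
-- List the scheduler variables visible in the current session,
-- together with where each value came from (default, configuration file,
-- database-level setting and so on).
SELECT name, setting, source
FROM pg_settings
WHERE name LIKE 'schedule.%'
ORDER BY name;
```

Values set with ALTER DATABASE apply to sessions started after the change, so connect to the database in question with a new session before checking its schedule.max_workers.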

The scheduler is designed as a background worker that dynamically starts other background workers. For this reason, make sure that the max_worker_processes variable is set to a proper value. The minimal acceptable value can be calculated using the following formula:

N_min = 1 + N_databases + MAX_WORKERS_1 + ... + MAX_WORKERS_n

where:

N_min - the minimal acceptable number of background workers in the system. Keep in mind that other subsystems, such as parallel queries, need to start background workers too, so adjust the value to cover their needs as well.
N_databases - the number of databases the scheduler works with
MAX_WORKERS_n - the value of the schedule.max_workers variable in the context of each database
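
As a worked example, take the two-database setup described earlier, where 'database1' allows 5 workers and 'database2' allows 3. The sketch below applies the formula and then compares the result with the current server setting. The value 16 in the last statement is only an illustration of leaving headroom for other background workers, not a recommendation.

```sql
-- N_min = 1 + N_databases + sum of per-database schedule.max_workers
--       = 1 + 2           + (5 + 3)
SELECT 1 + 2 + (5 + 3) AS n_min;   -- 11

-- Compare with the current limit (the PostgreSQL default is 8).
SHOW max_worker_processes;

-- Raise the limit if needed; 16 is an illustrative value.
ALTER SYSTEM SET max_worker_processes = 16;
```

Unlike the schedule.* variables set above, a change to max_worker_processes takes effect only after a server restart.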

SQL Schema

The extension uses the SQL schema schedule to store its internal tables and functions. Direct access to the tables is forbidden. All manipulations should be performed by means of the functions defined by the extension.

SQL Types

The scheduler defines 2 SQL types and uses them as return types for some of its functions.

cron_rec

This type describes information about the job to be scheduled.

CREATE TYPE schedule.cron_rec AS(
    id integer,             -- job id
    node text,              -- node name to be executed on
    name text,              -- job name
    comments text,          -- job's comment
    rule jsonb,             -- scheduling rules
    commands text[],        -- sql commands to be executed
    run_as text,            -- name of executor user
    owner text,             -- name of owner user
    start_date timestamp,   -- lower bound of execution window
                            -- NULL if unbound
    end_date timestamp,     -- upper bound of execution window
                            -- NULL if unbound
    use_same_transaction boolean,   -- if true the set of sql commands
                                    -- will be executed in same transaction
    last_start_available interval,  -- max time till scheduled job
                                    -- can wait execution if all allowed
                                    -- workers are busy
    max_instances int,      -- max number of simultaneous running instances
                            -- of this job
    max_run_time interval,  -- max execution time
    onrollback text,        -- SQL command to be performed on transaction
                            -- failure
    next_time_statement text,   -- SQL command to execute on main
                                -- transaction end to calculate next
                                -- start time
    active boolean,         -- true - job could be scheduled
    broken boolean          -- true - job has errors in configuration
                            -- that prevent its further execution
);

cron_job

This type describes information about a scheduled job execution.

CREATE TYPE schedule.cron_job AS(
    cron integer,           -- job id
    node text,              -- node name to be executed on
    scheduled_at timestamp, -- scheduled execution time
    name text,              -- job name
    comments text,          -- job comments
    commands text[],        -- sql commands to be executed
    run_as text,            -- name of executor user
    owner text,             -- name of owner user
    use_same_transaction boolean,   -- if true the set of sql commands
                                    -- will be executed in same transaction
    started timestamp,      -- timestamp of this job execution started
    last_start_available timestamp, -- time until job must be started
    finished timestamp,     -- timestamp of this job execution finished
    max_run_time interval,  -- max execution time
    max_instances int,      -- the number of instances run at the same time
    onrollback text,        -- statement on ROLLBACK
    next_time_statement text,   -- statement to calculate next start time
    status text,            -- status of this task: working, done, error
    message text            -- error message
);