Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Various experiments with PostgreSQL clustering

NotificationsYou must be signed in to change notification settings

postgrespro/postgres_cluster

Repository files navigation

pgpro_scheduler allows to schedule jobs execution and control their activityin PostgreSQL database.

The job is the set of SQL commands. Schedule table could be described as acrontab-like string or as a JSON object. It's possible to use combinationof both methods for scheduling settings.

Each job could calculate its next start time. The set of SQL commandscould be executed in the same transaction or each command could be executed inindividual one. It's possible to set SQL statement to be executed onfailure of main job transaction.

Installation

pgpro_scheduler is a regular PostgreSQL extension and requires no prerequisites.

Before build extension from the source make sure that the environment variablePATH includes path topg_config utility. Also make sure that you havedeveloper version of PostgreSQL installed or PostgrteSQL was built fromsource code.

Install extension as follows:

$ cd pgpro_scheduler$ make USE_PGXS=1$ sudo make USE_PGXS=1 install$ psql <DBNAME> -c "CREATE EXTENSION pgpro_scheduler"

Configuration

The extension defines a number of PostgreSQL variables (GUC). This variableshelp to handle scheduler configuration.

  • schedule.enabled - boolean, if scheduler is enabled in this system.Default value: false.

  • schedule.database - text, list of database names on which scheduleris enabled. Database names should be separated by comma.Default value: empty string.

  • schedule.schema - text, theschema name where scheduler store itstables and functions. To change this value restart required. Normallyyou should not change this variable but it could be useful if youwant run scheduled jobs on hot-standby database. So you can defineforeign data wrapper on master system to wrap default scheduler schemato another and use it on replica. Default value: schedule.

  • schedule.nodename - text, node name of this instance.Default value ismaster. You should not change or use it if you runsingle server configuration. But it is necessary to change this nameif you run scheduler on hot-standby database.

  • schedule.max_workers - integer, max number of simultaneously runningjobs for one database. Default value: 2.

  • schedule.transaction_state - text, this is internal variable.This variable contains state of executed job. This variable was designedto use with a next job start time calculation procedure.Possible values are:

    • success - transaction has finished successfully
    • failure - transaction has failed to finish
    • running - transaction is in progress
    • undefined - transaction has not started yet

    The last two values normally should not appear inside the user procedure. Ifyou got them probably it indicates an internal scheduler error.

Management

You could manage scheduler work by means of PostgreSQL variables describedabove.

For example, you have a fresh PostgreSQL installation with scheduler extensioninstalled. You are going to use scheduler with databases called 'database1' and'database2'. You want 'database1' be capable to run 5 jobs in parallel and'database2' - 3.

Put the following string to yourpostgresql.conf:

shared_preload_libraries = 'pgpro_scheduler'

Then startpsql and execute the following commands:

# ALTER SYSTEM SET schedule.enabled = true;# ALTER SYSTEM SET schedule.database = 'database1,database2';# ALTER DATABASE database1 SET schedule.max_workers = 5;# ALTER DATABASE database2 SET schedule.max_workers = 3;# SELECT pg_reload_conf();

If you do not need the different values inmax_workers you could storethe same in configuration file. Then ask server to reread configuration. Thereis no need to restart.

Here is an example ofpostgresql.conf:

shared_preload_libraries = 'pgpro_scheduler'schedule.enabled = onschedule.database = 'database1,database2'schedule.max_workers = 5

The scheduler is designed as background worker which dynamically startsanother bgworkers. That's why you should care about proper value inmax_worker_processes variable. The minimal acceptable valuecould be calculated using the following formula:

Nmin = 1 + Ndatabases + MAX_WORKERS1 + ... + MAX_WORKERSn

where:

  • Nmin - the minimal acceptable amount of bgworkers in thesystem. Consider the fact that other systems need to start backgroundworkers too. E.g. parallel queries. So you need to adjust the value totheir needs either.
  • Ndatabases - the number of databases scheduler works with
  • MAX_WORKERSn - the value ofschedule.max_workersvariable in context of each database

SQL Scheme

The extension uses SQL schemaschedule to store its internal tables andfunctions. Direct access to tables is forbidden. All manipulations shouldbe performed by means of functions defined by extension.

SQL Types

The scheduler defines 2 SQL types and use them as types for return valuesfor some of its functions.

cron_rec

This type describes information about the job to be scheduled.

CREATE TYPE schedule.cron_rec AS(id integer,             -- job idnode text,              -- node name to be executed onname text,              -- job name comments text,          -- job's commentrule jsonb,             -- scheduling rulescommands text[],        -- sql commands to be executedrun_as text,            -- name of executor userowner text,             -- name of owner userstart_date timestamp,   -- lower bound of execution window-- NULL if unboundend_date timestamp,     -- upper bound of execution window-- NULL if unbounduse_same_transaction boolean,   -- if true the set of sql commands -- will be executed in same transactionlast_start_available interval,  -- max time till scheduled job -- can wait execution if all allowed -- workers are busymax_instances int,-- max number of simultaneous running instances-- of this jobmax_run_time interval,  -- max execution timeonrollback text,        -- SQL command to be performed on transaction-- failurenext_time_statement text,   -- SQL command to execute on main -- transaction end to calculate next -- start timeactive boolean,         -- true - job could be scheduledbroken boolean          -- true - job has errors in configutration-- that prevent it's further execution);

cron_job

Type describes information about job scheduled execution

CREATE TYPE schedule.cron_job AS(cron integer,           -- job idnode text,              -- node name to be executed onscheduled_at timestamp, -- scheduled execution timename text,              -- job namecomments text,          -- job commentscommands text[],        -- sql commands to be executedrun_as text,            -- name of executor userowner text,             -- name of owner useruse_same_transaction boolean,-- if true the set of sql commands-- will be executed in same transactionstarted timestamp,      -- timestamp of this job execution startedlast_start_available timestamp,-- time untill job must be startedfinished timestamp,     -- timestamp of this job execution finishedmax_run_time interval,  -- max execution timemax_instances int,-- the number of instances run at the same timeonrollback text,        -- statement on ROLLBACKnext_time_statement text,-- statement to calculate next start timestatus text,-- status of this task: working, done, error message text-- error message);

Functions

schedule.create_job(cron text, sql text, node text)

Creates job and sets it active.

Arguments:

  • cron - crontab-like string to set schedule
  • sql - SQL statement to execute
  • node - node name, optional

Returns id of created job.

schedule.create_job(cron text, sqls text[], node text)

Creates job and sets it active.

Arguments:

  • cron - crontab-like string to set schedule
  • sqls - set of SQL statements to be executed
  • node - node name, optional

Returns id of created job.

schedule.create_job(date timestamp with time zone, sql text, node text)

Creates job and sets it active.

Arguments:

  • date - exact date of execution
  • sql - SQL statement to execute
  • node - node name, optional

Returns id of created job.

schedule.create_job(date timestamp with time zone, sqls text[], node text)

Creates job and sets it active.

Arguments:

  • date - exec date of execution
  • sqls - set of SQL statements to be executed
  • node - node name, optional

Returns id of created job.

schedule.create_job(dates timestamp with time zone[], sql text, node text)

Creates job and sets it active.

Arguments:

  • dates - set of execution dates
  • sql - SQL statement to execute
  • node - node name, optional

Returns id of created job.

schedule.create_job(dates timestamp with time zone[], sqls text[], node text)

Creates job and sets it active.

Arguments:

  • dates - set of execution dates
  • sqls - set of SQL statements to be executed
  • node - node name, optional

Returns id of created job.

schedule.create_job(data jsonb)

Creates job and sets it active.

The only argument is a JSONB object with information about job.

The object could contains the following keys, some of them could be omitted:

  • name - job name;
  • node - node name;
  • comments - job comments;
  • cron - cron-like string for scheduling settings;
  • rule - scheduling settings as JSONB object (see description later);
  • command - SQL statement to be executed;
  • commands - a set of SQL statements to be executed as an array;
  • run_as - user to execute command(s);
  • start_date - begin of period while command can be executed,could be NULL;
  • end_date - end of period while command can be executed,could be NULL;
  • date - exact date when command will be executed;
  • dates - set of exact dates when command will be executed;
  • use_same_transaction - if set of commands will be executed withinthe same transaction. Default: false;
  • last_start_available - for how long command execution could bepostponed if maximum number of allowed workers reached at the scheduledmoment. Time set in format ofinterval. E.g. '00:02:34' - it ispossible to wait for 2 minutes 34 seconds. If time is NULL or not setwaits forever. Default: NULL;
  • max_run_time - for how long scheduled job can be executed.Format:interval. If NULL or not set - there is no time limits.Default: NULL;
  • onrollback - SQL statement to be executed on ROLLBACK if maintransaction failure. Default: not defined;
  • next_time_statement - SQL statement to calculate next start time.

The rules of scheduling could be set as cron-like string (keycron) andalso as JSONB object (keyrule).

This object contains the following keys:

  • minutes - minutes, array of integers in range 0-59
  • hours - hours, array of integers in range 0-23
  • days - days of month, array of integers in range 1-31
  • months - months, array of integers in range 1-12
  • wdays - day of week, array of integers in range 0-6 where 0 - is Sunday
  • onstart - integer with value 0 or 1, if value equals to 1 job will beexecuted on scheduler start only once

Also job could be scheduled on exact date or set of dates. Use keysdateanddates keys accordingly.

All scheduling methods could be combined but the use of at least one of them ismandatory.

Keynext_time_statement may contain SQL statement to be executedafter the main transaction to calculate next start time. If key is definedthe first start time will be calculated by methods described above butsuccessive start times will be derived from this statement. The statementmust return record with the first field containing value of typetimestamp with time zone. If returning value be of the different type orstatement execution produce an error the job will be marked as broken and furtherexecution will be prohibited.

This statement will be executed in spite of main transaction execution state.It's possible to get state of main transaction form postgres variableschedule.transaction_state.

The possible values are:

  • success - transaction is successful
  • failure - transaction is failed
  • running - transaction is in progress
  • undefined - undefined - transaction has not been started yet

The last two values should not appear as a value of the keynext_time_statement within user procedure.

SQL statement to be executed could be set incommand orcommands key.The first one sets the single statement, the second - the set of statements.In fact the first key could contains the set of commands in one stringdivided by semicolon. In this case they all be executed in single transactionin spite of the valueuse_same_transaction. So for set of the statementsis better to use keycommands key as you get more control on execution.

Returns id of created job.

schedule.set_job_attributes(job_id integer, data jsonb)

Edits properties of existed job

Arguments:

  • job_id - job id
  • data - JSONB object with properties to be edited. The descriptionof keys and their structure could be found in functionschedule.create_job description.

The function returns boolean value:

  • true - properties changed successfully
  • false - properties unchanged

The user can edit properties of jobs it owns unless the user is superuser.

schedule.set_job_attribute(job_id integer, name text, value text || anyarray)

Edits one property of existed job.

Arguments:

  • job_id - job id
  • name - property name
  • value - property value

The full list of the properties could be found inschedule.create_jobfunction description. Some values are of array types. Theirvalue couldbe passed as an array, but if the value could not be an array the exceptionwill be raised.

The function returns boolean value, true on success and false on failure.

The user can edit properties of jobs it owns unless the user is superuser.

schedule.deactivate_job(job_id integer)

Deactivates job and suspends its further scheduling and execution.

Arguments:

  • job_id - job id

Returns true on success, false on failure.

schedule.activate_job(integer)

Activates job and starts its scheduling and execution.

Arguments:

  • job_id - job id

Returns true on success, false on failure.

schedule.drop_job(jobId integer)

Deletes job.

Arguments:

  • job_id - job id

Returns true on success, false on failure.

schedule.get_job(job_id integer)

Retrieves information about the job.

Arguments:

  • job_id - job id

The return value is of typecron_rec. Description of type could be foundinSQL types section.

schedule.get_user_owned_cron(username text)

Retrieves job list owned by user.

Arguments:

  • username - user name, optional

Returns the set of records of typecron_rec. Records contain informationabout jobs owned by user. If user name is omitted the session user name isused.

Retrieve jobs owned by another user is allowed only to superuser.

cron_rec type description can be found inSQL type section.

schedule.get_user_cron(username text)

Retrieves job list executed as user.

Arguments:

  • username - user name, optional

Returns the set of records of typecron_rec. Records contain informationabout jobs executed as user. If user name is not specified session user isused.

Retrieve jobs executed as another user is allowed only to superuser.

Typecron_rec description can be found inSQL type section.

schedule.get_user_active_jobs(username text)

Returns jobs list executed in this very moment as user passed in arguments.

Arguments:

  • username - user name, optional

If user name is omitted the session user name used. To list jobs executedas another user allowed only to superuser.

Return value is set of records of typecron_job.See the type description inSQL types section.

schedule.get_active_jobs()

Returns list of jobs being executed at that very moment.The function call allowed only to superuser.

Return value is set of records of typecron_job.See the type description inSQL types section.

schedule.get_log()

Returns list of all completed jobs.The function call allowed only to superuser.

Return value is set of records of typecron_job.See the type description inSQL types section.

schedule.get_user_log(username text)

Returns list of completed jobs executed as user passed in arguments.

Arguments:

  • username - user name, optional

If username is omitted the session user name used.Jobs executed as another users are accessible only to superuser.

Return value is set of records of typecron_job.See the type description inSQL types section.

schedule.clean_log()

Delete all records with information about completed jobs.Can be called by superuser only.

Returns the number of records deleted.

About

Various experiments with PostgreSQL clustering

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors38


[8]ページ先頭

©2009-2025 Movatter.jp