|
| 1 | +=== |
| 2 | +xtm |
| 3 | +=== |
| 4 | + |
| 5 | +Distributed transaction management tools for PostgreSQL. |
| 6 | + |
| 7 | +-------------------- |
| 8 | +Communication scheme |
| 9 | +-------------------- |
| 10 | +              ┏━━━━━━━━━┓ |
| 11 | +     ┌────────┨ Backend ┠──────────┐ |
| 12 | +     │        ┗━━━━━━━━━┛          │ |
| 13 | +┏━━━━┷━━━━┓   ┏━━━━━━━━━┓   ┏━━━━━━┷━━━━━━┓ |
| 14 | +┃ Arbiter ┠───┨ Backend ┠───┨ Coordinator ┃ |
| 15 | +┗━━━━┯━━━━┛   ┗━━━━━━━━━┛   ┗━━━━━━┯━━━━━━┛ |
| 16 | +     │        ┏━━━━━━━━━┓          │ |
| 17 | +     └──┬─────┨ Backend ┠───────┬──┘ |
| 18 | +        ┆     ┗━━━━━━━━━┛       ┆ |
| 19 | +libdtm + libsockhub     libpq + xtm procs |
| 20 | + |
| 21 | +----------------------- |
| 22 | +Coordinator-Backend API |
| 23 | +----------------------- |
| 24 | + |
| 25 | +This API is a set of PostgreSQL procedures that |
| 26 | +the coordinator can call with a "select" statement. |
| 27 | + |
| 28 | +FIXME: bring the API description up to date |
| 29 | + |
| 30 | +------------------------ |
| 31 | +Backend-Arbiter Protocol |
| 32 | +------------------------ |
| 33 | + |
| 34 | +The underlying protocol (libsockhub) also transmits the message length, so |
| 35 | +there is no need for 'argc'. Every command or reply is a series of int64 |
| 36 | +numbers. |
| 37 | + |
| 38 | +The format of all commands: |
| 39 | +[cmd, argv[0], argv[1], ...] |
| 40 | + |
| 41 | +'cmd' is the command code (one of the characters listed below). |
| 42 | +'argv[i]' are the arguments. |
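
As an illustration of the message format above, the following Python sketch packs a command into a series of int64 values and unpacks a reply. The little-endian byte order, the choice to send 'cmd' as the character code of the command letter, and the helper names pack_command and unpack_message are assumptions made for this example; the protocol description does not specify them.

```python
import struct


def pack_command(cmd, *argv):
    """Encode [cmd, argv[0], argv[1], ...] as consecutive int64 values.

    Assumption: 'cmd' travels as the character code of the command letter,
    and every value is a little-endian signed 64-bit integer.
    """
    words = [ord(cmd), *argv]
    return struct.pack("<%dq" % len(words), *words)


def unpack_message(payload):
    """Decode a payload into a list of int64 values.

    No 'argc' is needed: the number of values follows from the payload
    length, which the underlying transport already provides.
    """
    if len(payload) % 8 != 0:
        raise ValueError("payload length is not a multiple of 8 bytes")
    count = len(payload) // 8
    return list(struct.unpack("<%dq" % count, payload))
```

For example, pack_command('r', 100, 10) yields a 24-byte payload that unpack_message turns back into three integers.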
| 43 | + |
| 44 | +The commands: |
| 45 | + |
| 46 | +'r': reserve(minxid, minsize) |
| 47 | +Claims a range of at least 'minsize' xids, all ≥ 'minxid', for local use. |
| 48 | +This prevents the arbiter from using those values for global transactions. |
| 49 | + |
| 50 | +The arbiter replies with: |
| 51 | +[RES_OK, min, max] if it reserved the range [min, max] |
| 52 | +[RES_FAILED] on failure |
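
A backend might check a 'reserve' reply against its request roughly as follows. This is only a sketch: the numeric values of RES_OK and RES_FAILED, the function name check_reserve_reply, and the reading of [min, max] as an inclusive range are assumptions, not part of the protocol description.

```python
RES_OK = 0        # hypothetical value, chosen for this example only
RES_FAILED = -1   # hypothetical value, chosen for this example only


def check_reserve_reply(reply, minxid, minsize):
    """Return the reserved (min, max) range, or None if the arbiter failed.

    Raises ValueError if a successful reply does not satisfy the request,
    that is, if the range starts below minxid or is shorter than minsize.
    """
    if len(reply) != 3 or reply[0] != RES_OK:
        return None
    _, lo, hi = reply
    # Assumption: the range is inclusive on both ends.
    if lo < minxid or hi - lo + 1 < minsize:
        raise ValueError("arbiter reserved a range that violates the request")
    return (lo, hi)
```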
| 53 | + |
| 54 | +'b': begin(size) |
| 55 | +Starts a global transaction and assigns an 'xid' to it. 'size' is used |
| 56 | +to calculate the vote result. The arbiter also creates and returns the |
| 57 | +snapshot. |
| 58 | + |
| 59 | +The arbiter replies with: |
| 60 | +[RES_OK, xid, *snapshot] if the transaction started successfully |
| 61 | +[RES_FAILED] on failure |
| 62 | + |
| 63 | +See the 'snapshot' command description for the snapshot format. |
| 64 | + |
| 65 | +'s': status(xid, wait) |
| 66 | +Asks the arbiter about the status of the global transaction identified |
| 67 | +by the given 'xid'. |
| 68 | + |
| 69 | +If 'wait' is 1, the arbiter will not reply until it considers the |
| 70 | +transaction finished (all nodes have voted, or one node is dead). |
| 71 | + |
| 72 | +The arbiter replies with: |
| 73 | +[RES_TRANSACTION_UNKNOWN] if not started |
| 74 | +[RES_TRANSACTION_COMMITTED] if committed |
| 75 | +[RES_TRANSACTION_ABORTED] if aborted |
| 76 | +[RES_TRANSACTION_INPROGRESS] if in progress |
| 77 | +[RES_FAILED] if failed |
| 78 | + |
| 79 | +'y': for(xid, wait) |
| 80 | +Tells the arbiter that this node votes for commit of the global |
| 81 | +transaction identified by the given 'xid'. |
| 82 | + |
| 83 | +The reply and 'wait' logic is the same as for the 'status' command. |
| 84 | + |
| 85 | +'n': against(xid, wait) |
| 86 | +Tells the arbiter that this node votes against commit of the global |
| 87 | +transaction identified by the given 'xid'. |
| 88 | + |
| 89 | +The reply and 'wait' logic is the same as for the 'status' command. |
| 90 | + |
| 91 | +'h': snapshot(xid) |
| 92 | +Tells the arbiter to generate a snapshot for the global transaction |
| 93 | +identified by the given 'xid'. The arbiter will create a snapshot for |
| 94 | +every participant, so when each of them asks for the snapshot it will |
| 95 | +reply with the same snapshot. The arbiter generates a fresh version if |
| 96 | +the same client asks for a snapshot again for the same transaction. |
| 97 | + |
| 98 | +Joins the global transaction identified by the given 'xid', if not |
| 99 | +joined already. |
| 100 | + |
| 101 | +The arbiter replies with [RES_OK, gxmin, xmin, xmax, xcnt, xip[0], xip[1]...], |
| 102 | +where 'gxmin' is the smallest xmin among all available snapshots. |
| 103 | + |
| 104 | +In case of a failure, the arbiter replies with [RES_FAILED]. |
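
The snapshot reply layout described above could be decoded along these lines. This is a sketch under stated assumptions: the values of RES_OK and RES_FAILED, the Snapshot type, and the function name parse_snapshot_reply are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple, Optional

RES_OK = 0        # hypothetical value, chosen for this example only
RES_FAILED = -1   # hypothetical value, chosen for this example only


class Snapshot(NamedTuple):
    gxmin: int        # smallest xmin among all available snapshots
    xmin: int
    xmax: int
    xip: List[int]    # the xcnt xids listed in the reply


def parse_snapshot_reply(reply: List[int]) -> Optional[Snapshot]:
    """Parse [RES_OK, gxmin, xmin, xmax, xcnt, xip[0], xip[1], ...].

    Returns None when the reply does not start with RES_OK, for example
    when the arbiter answered [RES_FAILED].
    """
    if not reply or reply[0] != RES_OK:
        return None
    if len(reply) < 5:
        raise ValueError("snapshot reply is too short")
    gxmin, xmin, xmax, xcnt = reply[1:5]
    xip = list(reply[5:])
    # xcnt must agree with the number of xip entries actually received.
    if len(xip) != xcnt:
        raise ValueError("xcnt does not match the number of xip entries")
    return Snapshot(gxmin, xmin, xmax, xip)
```

Checking 'xcnt' against the number of values received is a cheap consistency test, since the transport already conveys the message length.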