DiceTechnology/dice-fairlinkPublic

NotificationsYou must be signed in to change notification settings
Fork5
Star38

JDBC Driver for read-only connections on AWS RDS Clusters

License

MIT license

38 stars 5 forks Branches Tags Activity

Star

Notifications

You must be signed in to change notification settings

Branches Tags

Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 297 Commits
.github/workflows		.github/workflows
cd		cd
images		images
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
gpg.asc.enc		gpg.asc.enc
pom.xml		pom.xml

Repository files navigation

dice-fairlink is a JDBC driver designed to connect to the read replicas of an AWS Aurora cluster.The driver will periodically obtain a description of the cluster and despatch connections to each read replicaon a round-robin fashion.

dice-fairlink doesnot handle read/write connections

Why do we need dice-fairlink (TL/DR version)?

Because in many cases Aurora will not evenly distribute the connections amongst all the available read replicas.

How can I use dice-fairlink (TL/DR version)?

Add dice-fairlink as a dependency to your JVM project
Add software.amazon.awssdk:rds and software.amazon.awssdk:resourcegroupstaggingapi as dependencies of your project
Addfairlink as a jdbc sub-protocol to your connection string's schema
If using the AWS API discovery mode, change your connection string's host to the name of your AWS Aurora cluster. Use the cluster's read-only endpoint otherwise
Ensure the code running dice-fairlink as the correct IAM policies (see sections Member Discovery and Exclusion Discovery)

Here for version 1.x.x ?

Version 2.x.x of dice-fairlink is substantially different internally, in particular in terms of configuration and the IAM permissions it needs to run. Please see the README.md of versions 1.x.xhere.

Changes from 2.x.x:

Added SQL discovery for postgres

Changes from 1.x.x:

Renamed the sub-protocol fromauroraro tofairlink (auroraro is still accepted for backward compatibility and will be removed in version 3.0.0)
Added SQL discovery (only for MySQL. Other underlaying drivers can still use the AWS API)
Added a randomised poller start delay to avoid swarming the AWS API if many applications using dice fairlink are started at the same time
Refactored to make it more testable
Reified multiple concepts (connection string, configuration, member and exclusion discoverers)
Added the option to validate connections at the point they are discovered and before they are returned (enabled by default)
Made dice-fairlink's dependencies of the typeprovided
Upgraded the AWS API to version 2

Usage Examples

dice-fairlink implements a generic sub-protocol of any existing jdbc protocol (psql,mysql,etc). For AWS API discovery (see below) the host section of the URL should be the cluster identifier and not the hostname of any cluster or instance endpoint.

The driver will accept urls in the formjdbc:fairlink:XXXX and delegate the actual handling of the connection to the driver of the protocolXXXX (which needs to be loadable by the JVM classloader).

Example with AWS API discovery:

In a cluster namedmy-cluster with three read replicasmy-cluster-r1,my-cluster-r2 and,my-cluster-r3, andthe following connection string

StringconnectionString ="jdbc:fairlink:mysql://my-cluster/my-schema";

dice-fairlink will returnmy-cluster-r1 for the first connection request,my-cluster-r2 to the secondand,my-cluster-r3 to the third. The forth request for a connection will again returnmy-cluster-r1, and so forth.

A connection test is performed before the replica is returned to the application. This is controlled by thevalidateConnection property (see Driver Properties section).

In this example dice-fairlink will use the available mysql driver to establish the connection to the read replica.

Driver properties:

discoveryMode=AWS_APIreplicaEndpointTemplate=%s.-xxxxx.a-region.amazonaws.comauroraClusterRegion=a-region

Dynamic changes to the cluster (node promotions, removals and additions) are automatically detected.

Example with MySQL discovery:

In a cluster namedmy-cluster with three read replicasmy-cluster-r1,my-cluster-r2 and,my-cluster-r3, andthe following connection string

StringconnectionString ="jdbc:fairlink:mysql://my-cluster.cluster-xxxxx.a-region.amazonaws.com/my-schema";

A connection test is performed before the replica is returned to the application. This is controlled by thevalidateConnection property (see Driver Properties section).

Driver properties:

discoveryMode=SQL_MYSQLreplicaEndpointTemplate=%s.-xxxxx.a-region.amazonaws.comfallbackEndpoint=my-cluster.cluster-ro--xxxxx.region.amazonaws.comauroraClusterRegion=a-region

In this example dice-fairlink will use the available mysql driver to establish the connection to the read replica.

Dynamic changes to the cluster (node promotions, removals and additions) are automatically detected.

Example with Postgres discovery:

In a cluster namedmy-cluster with three read replicasmy-cluster-r1,my-cluster-r2 and,my-cluster-r3, andthe following connection string

StringconnectionString ="jdbc:fairlink:postgresql://my-cluster.cluster-xxxxx.a-region.amazonaws.com/my-schema";

A connection test is performed before the replica is returned to the application. This is controlled by thevalidateConnection property (see Driver Properties section).

Driver properties:

discoveryMode=SQL_POSTGRESreplicaEndpointTemplate=%s.-xxxxx.a-region.amazonaws.comfallbackEndpoint=my-cluster.cluster-ro--xxxxx.region.amazonaws.comauroraClusterRegion=a-region

In this example dice-fairlink will use the available mysql driver to establish the connection to the read replica.

Dynamic changes to the cluster (node promotions, removals and additions) are automatically detected.

Member discovery

dice-fairlink offers two options to discover the members of an Aurora cluster. It polls for any changes and update its internal state with the configured frequency (see Driver Properties section). The driver does this once when the driver is loaded, and enters the periodic poll cycle This after a random delay of up to 10 seconds. This is to avoid, in scenarios of large clusters of applications using dice-fairlink, exceeding the rate limits imposed by AWS.

Versions 1.x.x of dice-fairlink are very prone to this problem, as they make 1+2*number_of_replicas (O(n)) calls on each cycle. Versions 2.x.x make 2 (O(1)) calls per cycle when usingAWS API discovery mode, and 0 calls per cycle when usingMySQL mode. See the Exclusions section to learn about an additional call on both discovery modes.

AWS API Mode

dice-fairlink uses theRDS API to discover the members of a cluster, and to determine which cluster member is the writer. To use this discovery mode, the code must be executed by an IAM user with the following policy:

[   {"Effect":"Allow","Action":["rds:DescribeDBClusters"      ],"Resource":["arn:aws:rds:*:<account_id>:cluster:<cluster_name_regex>"      ]   },    {"Effect":"Allow","Action":["rds:DescribeDBInstances","rds:ListTagsForResource"      ],"Resource":"*"    }]

Note that the regex above can be merely* depending on how strict you want your permissions to be.

When using theAWS API discovery mode, thehost part of the connection string must be thecluster identifier. For example, for a cluster with the cluster endpointabc.cluster-xxxxxx.eu-west-1.rds.amazonaws.com, it must be set toabc. The cluster's read-only endpoint will be used asfallback should an error occur in refreshing the list of cluster members, unless the device propertyfallbackEndpoint is set (not recommended to be set to a specific replica).On this discovery mode, dice-fairlink will additionally check if all the replicas are in theAVAILABLE state. This is because, during a DB Instance deletion, RDS will remove the tagsbefore removingit from the list of cluster members. This can cause a period where the exclusion tag is no longer present, but the server is deleted shortly after, causing connections to be dropped, causing application errors.

If your cluster uses theMySQL engine, it is recommended to use theMySQL discovery mode, as it is gentler on the AWS API.

SQL-based discovery modes

dice-fairlink can connect to the aurora database and use specific queries AWS provides to obtain the list of cluster members.

A SQL-based discovery mode does not require any special IAM user permissions.

When using the a sql discovery mode, thehost part of the connection string must be acluster endpoint.We recommend using the cluster (writer) endpoint as the hostname of all fairlink's connection strings.

To avoid burdening the writer node with read-only statements in the event of a failure, set thefallbackEndpoint driver property (see Driver Properties section).

MySQL Mode

dice-fairlink uses theinformation_schema.replica_host_status table available on Aurora MySQL clusters to list the members of a given cluster. The credentials on the connection string, or on the driver properties will be used.

When using theMySQL discovery mode, thehost part of the connection string must be acluster endpoint. Even thoughAWS's documentation states either the writer or reader endpoint will work - in fact thatany instance would work -we have observed situations where querying any instance other than the writer will return apartial list of cluster members, which will limit the effectiveness of dice-fairlink.For this reason, we recommend using the cluster (writer) endpoint as the hostname of all fairlink's connection strings. To avoid burdening the writer node with read-only statements in the event of a failure,set thefallbackEndpoint driver property (see Driver Properties section).

For example, for a cluster with the cluster endpointabc.cluster-xxxxxx.eu-west-1.rds.amazonaws.com, it must be set toabc.cluster-xxxxxx.eu-west-1.rds.amazonaws.com. This host will be used as a fallback in the case clusters without any replicas or if an error occurs, unless the variablefallbackEndpoint is set to the cluster's read endpoint (recommended).

Postgres Mode

dice-fairlink uses theaurora_replication_status() function available on Aurora Postgres clusters to list the members of a given cluster. The credentials on the connection string, or on the driver properties will be used.

Connection validation

dice-fairlink validates every replica before making it available to the application. This is done once on every discovery cycle. On versions 1.x.x dice-fairlink would rely on theRDSAPI returning instances in theavailable status. On versions 2.x.x dice-fairlink executesSELECT 1 on each discovered replica, using the underlaying protocol, as specified on the connection string. This validation occurs regardless of the discovery mode, and be disabled via thevalidateConnection property (see Driver Properties section)

Exclusions discovery

It is possible to prevent members of an Aurora cluster to receive connections despatched by dice-fairlink. This is particularly useful when scaling in a cluster, as it allows the connections to a particular node to drain prior to deletion, and thus avoiding application errors.

To exclude a member, tag it with key=Fairlink-Exclude and value=true. dice-fairlinkwill treat them as not available and skip them when assigning connections. Tag changes will be picked up within the time specified intagsPollInterval (see Driver Properties section). Please note that versions 1.x.x would poll for this information with a frequency determined byreplicaPollInterval instead.

dice-fairlink uses theResource Group API to obtain the list of all excluded members. Unfortunately this API does not offer a way to obtain only the information pertaining to a specific cluster, and as a consequence dice-fairlink will hold information aboutall RDS instances tagged withFairlink-Exclude=true in the entire region of the AWS account in question. It is also not possible to restrict this via an IAM policy. It is assumed the size of this collection will not be problematic, and that making a single call (instead of 1 call per database instance) outweighs this waste.

For the discovery of exclusions (not possible to switch off in the current version) to work correctly, the code must be executed by an IAM user with the following policy:

[   {"Effect":"Allow","Action":["tag:GetResources","tag:GetTagValues","tag:GetTagKeys"      ],"Resource":"*"   }]

Why do we need dice-fairlink (long version)?

Using AWS Aurora clusters with database connection pools is a common use case. A possible configuration is to pointa connection pool to the cluster's read-only endpoint. AWS claims (here,here, andhere)that Aurora will send the new connections to different read replicas in a quasi-round-robin fashion. It is well documented onthe references above that Aurora does this based on the number of connections each of the replicas is holding at the time of receiving a new connection request. This is done via DNS, with a 1 second TTL. This means that, for a period of 1second, all new connection requests will be sent to the same read replica.

Example:Consider an Aurora cluster with the read endpoint atread-endpoint-url, and read replicasr1,r2,r3, andr4.Also consider an application using a fixed-sized connection pool of 10 connections, recycled every 30 minutes. Finally,consider we have a cluster of 3 servers running this application. When we launch the servers for the first time, thefollowing is a possible timeline (times in ms), starting from an idle cluster:

t0: Server 1 comes online and pre-fills the connection pool, sending 10 connection requests toread-endpoint-url
t0: Aurora directs 10 connections tor1
t500: Server 2 comes online and pre-fills the connection pool, sending 10 connection requests toread-endpoint-url
t750: Aurora directs 10 connections tor1
t1500: Server 3 comes online and pre-fills the connection pool, sending 10 connection requests toread-endpoint-url
t1500: Aurora directs 10 connections tor2

The ideal scenario would be 10 connection on each read-replica. Unfortunately, as Server 1 and Server 2 populated their connection pools with less than 1 seconds' difference, and Aurora has cached the name resolution ofread-endpoint-url tor1for 1 second starting ont0, Server 2's requests will also be sent tor1.r1 ends up serving 20 connections,r2 10 connections andr3 will be idle.

The fact Aurora's uses DNS to distribute the connections amongst the available read replicas can also be problematic due to other components of a solution. If any network agent (local server, router, etc) caches DNS resolutions, the results will become harder to predict. On top of this, Java can also cache DNS resolutions. It does so by defaultforever, or for 30 seconds depending on the JVM version and vendor.

What other options did we try before writing dice-fairlink ?

We tried the following, commutative, options

Controlling DNS

In a controlled environment we disabled Java DNS cache (seehere,orhere) and any other intermediate caches between the serverand the Aurora cluster.

result: this allowed us to achieve the results described on the previous section.

Tweaking pool parameters

We configured our connection pool to not be fixed-sized and to have a much lower connection maximum lifetime (2 minutes). Additionally we had a random (maximum 2.5% of the maximum lifetime) variance on the maximum lifetime for each pool generation. Finally, each application server had a different maximum connection lifetime.

The rationale was to try to disperse connection requests to the Aurora cluster as much as possible.

result: with the non-deterministic random variables did generate better distribution in some occasions. However, the random nature of this experiment also means that, in other occasions, a single read replica received all 30 connections. It is not simple to reliably set all the variables mentioned above in such a way that each server will request a connection to Aurora if and only if no other server has requested a connection in the previous second.

How does dice-fairlink solve this problem ?

dice-fairlink does not require using the Aurora cluster read only endpoint. Instead, it keeps a list of addresses for every read replica of a given cluster. When the client application (through a connection pool or otherwise) requests a connection to the jdbc driver, dice-fairlink selects the next read replica and delegates theactual establishing of the connection to the underlying jdbc driver (see usage examples). The frequency with which this list is refreshed is configurable (see driver parameters).

In the current version, dice-fairlink does not dynamically mark replicas as faulty, or try to despatch connections taking into account how busy each replica is. It simply returns the read replica that hasn't been returned for longer(round-robin).

Installation

dice-fairlink is available fromMaven Central, with the coordinatestechnology.dice.open:dice-fairlink.

If you are using Maven, just add the following dependency to yourpom.xml:

<dependency>   <groupId>technology.dice.open</groupId>   <artifactId>dice-fairlink</artifactId>   <version>x.y.z</version></dependency>

dice-fairlink's dependencies are not transitive, and depends only on threeprovided artifacts. The application using dice-fairlink must include the following dependencies:

<dependency>   <groupId>software.amazon.awssdk</groupId>   <artifactId>rds</artifactId>   <version>2.7.23</version></dependency>   <dependency>   <groupId>software.amazon.awssdk</groupId>   <artifactId>resourcegroupstaggingapi</artifactId>   <version>2.7.23</version></dependency>

Any version compatible with2.7.23 will work.

Additionally, the jdbc driver for the underlying protocol must be available at runtime (usually loaded via SPI).

Driver properties

dice-fairlink uses the AWS RDS Java SDK to obtain information about the cluster, and needs a valid authenticationsource to establish the connection. Two modes of authentication are supported:default_chain,environment orbasic. Dependingon the chosen mode, different driver properties are required. This is the full list of properties:

replicaEndpointTemplate: theString.format() template to generate the replica hostnames, given theirDBInstanceIdentifiers. The resulting URI (which will maintain all the connetion string's non-hostname parts) is where dice-fairlink will send connections to. Mandatory.
fallbackEndpoint: the fallback URI to despatch if no replicas are found or if an error occurs. default: thehost specified in the connection string.
discoveryMode:{'AWS_API'|'SQL_MYSQL}. default:AWS_API
auroraClusterRegion: the AWS region of the cluster to connect to. Mandatory unless environment variableAWS_DEFAULT_REGION is set. If both provided,the value from data source properties object has priority.
auroraDiscoveryAuthMode:{'default_chain'|'environment'|'basic'}. default:default_chain
auroraDiscoveryKeyId: the AWS key id to connect to the Aurora cluster. Mandatory if the authentication mode isbasic.Ignored otherwise.
auroraDiscoverKeySecret: the AWS key secret to connect to the Aurora cluster. Mandatory if the authentication mode isbasic.Ignored otherwise.
replicaPollInterval: the interval, in seconds, between each refresh of the list of read replicas. default:30
tagsPollInterval: the interval, in seconds, between each refresh of the list of excluded replicas. default:120
validateConnection:{'true'|'false}. default:true

all properties (including the list above) will be passed to the underlying driver.

Discovery authentication modes

In order to discover the members of a given cluster, dice-fairlink makes use of the AWS RDS SDK. This means the client application must provide some meansof authentication for dice-fairlink to execute the necessary API methods. See section Member Discovery for further information.

default_chain mode: will use the AWS library default provider chain. This is, as of version1.11.251, the following order: environment, system properties,user profile, EC2 container credentials
environment: reads the key and secret fromAWS_ACCESS_KEY_ID/AWS_ACCESS_KEY andAWS_SECRET_KEY/AWS_SECRET_ACCESS_KEY variables
basic: takes the credentials from two driver properties as defined above

Logging

To limit the dependencies of dice-fairlink, thejava.util.logging package is used for logging.Client applications may make use of the popularslf4j library, in which case the following block ofbootstrap code is necessary to connect the two logging systems:

SLF4JBridgeHandler.removeHandlersForRootLogger();SLF4JBridgeHandler.install();

additionally, the following dependency must be added to the project:

<dependency>    <groupId>org.slf4j</groupId>    <artifactId>jul-to-slf4j</artifactId>    <version>x.y.z</version></dependency>

This will direct thejava.util.logging logging statements to SLF4J, and make them available to anylogging backend aslogback orlog4j.

About

JDBC Driver for read-only connections on AWS RDS Clusters

Movatterモバイル変換

License

DiceTechnology/dice-fairlink

Folders and files

Latest commit

History

Repository files navigation

Why do we need dice-fairlink (TL/DR version)?

How can I use dice-fairlink (TL/DR version)?

Here for version 1.x.x ?

Usage Examples

Example with AWS API discovery:

Driver properties:

Example with MySQL discovery:

Driver properties:

Example with Postgres discovery:

Driver properties:

Member discovery

AWS API Mode

SQL-based discovery modes

MySQL Mode

Postgres Mode

Connection validation

Exclusions discovery

Why do we need dice-fairlink (long version)?

What other options did we try before writing dice-fairlink ?

Controlling DNS

Tweaking pool parameters

How does dice-fairlink solve this problem ?

Installation

Driver properties

Discovery authentication modes

Logging

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages0

Uh oh!

Contributors6

Uh oh!

Languages

Packages