Ktor/Vertx spring-actuator style library - healthchecks, logging, database

sksamuel/cohort

Cohort is a Spring Actuator style replacement for Ktor and Vertx. It provides health checks for orchestrators like Kubernetes and management of logging, databases, JVM settings, memory and threads in production.

See changelog

Features

All features are disabled by default.

  • Comprehensive system healthchecks: Expose healthcheck endpoints that check for thread deadlocks, memory usage, disk space, cpu usage, garbage collection and more.
  • Resource healthchecks: Additional modules to monitor the health of Redis, Kafka, Elasticsearch, databases and other resources.
  • Micrometer integration: Send healthcheck metrics to a micrometer registry, so you can see which healthchecks are consistently failing or flaky.
  • Database pools: See runtime metrics such as active and idle connections in database pools such as Hikari Connection Pool.
  • JVM Info: Enable endpoints to export system properties, JVM arguments and version information, and O/S name / version.
  • Thread and heap dumps: Optional endpoints to export a thread dump or heap dump, in the standard JVM format, for analysis locally.
  • Database migrations: See the status of applied and pending database migrations from either Flyway or Liquibase.
  • Logging configuration: View configured loggers and levels and modify log levels at runtime.

How to use

For ktor projects:

Include the following dependencies in your build:

  • com.sksamuel.cohort:cohort-ktor:<version>

Then, to wire into Ktor, install the Cohort plugin and enable whichever features / endpoints you want to expose. Remember, endpoints are disabled by default for security, and you must enable them.

Here is a sample configuration with each feature enabled.

    install(Cohort) {

       // enable an endpoint to display operating system name and version
       operatingSystem = true

       // enable runtime JVM information such as vm options and vendor name
       jvmInfo = true

       // configure the Logback log manager to show effective log levels and allow runtime adjustment
       logManager = LogbackManager

       // show connection pool information
       dataSources = listOf(HikariDataSourceManager(ds))

       // show current system properties
       sysprops = true

       // enable an endpoint to dump the heap in hprof format
       heapdump = true

       // enable an endpoint to dump threads
       threaddump = true

       // enable healthchecks for kubernetes
       // each of these is optional and can map to any healthcheck url you wish
       // for example if you just want a single health endpoint, you could use /health
       healthcheck("/liveness", livechecks)
       healthcheck("/readiness", readychecks)
       healthcheck("/startup", startupchecks)
    }

For vertx projects:

Include the following dependencies in your build:

  • com.sksamuel.cohort:cohort-vertx:<version>

Then, in your app, create a cohort object using the initializeCohort function and pass that object to the router.cohort extension function.

Note: If you are deploying multiple instances of a verticle, be sure to initialize cohort outside of the verticle and pass in the instance. Otherwise, each verticle will create its own set of background checks.

Here is a sample configuration with each feature enabled.

    val cohort = initializeCohort {

       // enable an endpoint to display operating system name and version
       operatingSystem = true

       // enable runtime JVM information such as vm options and vendor name
       jvmInfo = true

       // configure the Logback log manager to show effective log levels and allow runtime adjustment
       logManager = LogbackManager

       // show connection pool information
       dataSources = listOf(HikariDataSourceManager(ds))

       // show current system properties
       sysprops = true

       // enable an endpoint to dump the heap in hprof format
       heapdump = true

       // enable an endpoint to dump threads
       threaddump = true

       // enable healthchecks for kubernetes
       // each of these is optional and can map to any healthcheck url you wish
       // for example if you just want a single health endpoint, you could use /health
       healthcheck("/liveness", livenessChecks)
       healthcheck("/readiness", readinessChecks)
       healthcheck("/startup", startupChecks)
    }

    val vertx = Vertx.vertx()
    val router = Router.router(vertx)
    router.cohort(cohort)

Other modules

Finally, add any additional modules for any features you wish to activate. For example, the kafka module requires com.sksamuel.cohort:cohort-kafka:<version>.

Healthchecks

Cohort provides HealthChecks for a variety of JVM metrics such as memory and thread deadlocks, as well as connectivity to services such as Kafka, Elasticsearch and databases.

We use health checks by adding them to a HealthCheckRegistry instance, along with an interval that sets how often to run each check. A registry requires a coroutine dispatcher to execute the checks on. Healthchecks can take advantage of coroutines to suspend if they need to do something IO based. Cohort will periodically run these healthchecks based on the passed schedule and record whether they are healthy or unhealthy.

For example:

    val checks = HealthCheckRegistry(Dispatchers.Default) {

       // detects if threads are mutually blocked on each other's locks
       register(ThreadDeadlockHealthCheck(), 1.minutes)

       // checks that we always have at least one database connection open
       register(HikariConnectionsHealthCheck(ds, 1), 5.seconds)
    }

With the registry created, we register it with Cohort by invoking the healthcheck method along with an endpoint url to expose it on.

For example:

    install(Cohort) {
       healthcheck("/healthcheck", checks)
    }

Whenever the endpoint is accessed, a 200 is returned if all health checks are currently reporting healthy, and a 500 otherwise.
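The status rule above can be summarized in a few lines of Kotlin. This is only a sketch of the rule as described, not Cohort's implementation: `CheckResult` and `statusCodeFor` are hypothetical names introduced here for illustration.

```kotlin
// Hypothetical model of a single check result; this is not a Cohort type.
data class CheckResult(val name: String, val healthy: Boolean)

// Mirrors the documented rule: 200 when every check reports healthy, 500 otherwise.
fun statusCodeFor(results: List<CheckResult>): Int =
   if (results.all { it.healthy }) 200 else 500
```

A single unhealthy check is enough to turn the whole endpoint into a 500, which is why it can be worth exposing separate endpoints for checks with different purposes.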

Which healthchecks you use is entirely up to you, and you may want to use some healthchecks for startup probes, some for readiness checks and some for liveness checks. See the section on kubernetes for discussion on how to structure healthchecks in a kubernetes environment.

If you wish to output the results of each metric scan, you can hook into micrometer.

Available Healthchecks

This table lists the available health checks and their uses.

| Healthcheck | Module | Details |
|---|---|---|
| AvailableCoresHealthCheck | cohort-core | Checks for a minimum number of available CPU cores. While the number of cores won't change during the lifetime of a pod, this check can be useful to avoid accidentally deploying pods into environments that don't have the required resources. |
| DaemonThreadsHealthCheck | cohort-core | Checks that the number of daemon threads does not exceed a threshold. |
| DatabaseConnectionHealthCheck | cohort-core | Checks that a database connection can be retrieved from a DataSource and that the connection is valid. This healthcheck is useful to determine if a DataSource is becoming contended and cannot return a connection in a timely manner. |
| DbcpConnectionsHealthCheck | cohort-dbcp | Checks that the number of connections in an Apache DBCP2 connection pool is at least equal to a min value. |
| DbcpMinIdleHealthCheck | cohort-dbcp | Checks that the number of idle connections in an Apache DBCP2 connection pool is at least equal to a min value. |
| DiskSpaceHealthCheck | cohort-core | Checks that the available disk space on a filestore is below a threshold. |
| DynamoDBHealthCheck | cohort-aws-dynamo | Checks connectivity to an AWS DynamoDB instance. |
| ElasticClusterHealthCheck | cohort-elastic | Checks that an elasticsearch cluster is reachable and the cluster is in "green" state. |
| ElasticClusterCommandCheck | cohort-elastic | Executes an arbitrary command against an elasticsearch cluster. |
| ElasticIndexHealthCheck | cohort-elastic | Checks that an index exists on an elastic cluster, with an optional setting to fail if the index is empty. |
| FreememHealthCheck | cohort-core | Checks that the available freemem is above a threshold. |
| GarbageCollectionTimeCheck | cohort-core | Checks that the time spent in GC is below a threshold. The time is specified as a percentage and is calculated as the period between invocations. |
| HikariConnectionsHealthCheck | cohort-hikari | Confirms that the number of connections in a Hikari DataSource connection pool is equal or above a threshold. This is useful to ensure a required number of connections are open before accepting traffic. |
| HikariMinIdleHealthCheck | cohort-hikari | Checks that the number of idle connections in a Hikari DataSource connection pool is at least equal to a min value. |
| HikariPendingThreadsHealthCheck | cohort-hikari | Checks that the number of threads awaiting a connection from a Hikari DataSource is below a threshold. This is useful to detect when queries are running slowly and causing threads to back up waiting for a connection. |
| HotSpotCompilationTimeHealthCheck | cohort-core | Is healthy once a specified HotSpot compilation time is reached. |
| HttpHealthCheck | cohort-http | Attempts to connect to a given HTTP host/port/method. |
| KafkaClusterHealthCheck | cohort-kafka | Confirms that a Kafka client can connect to a Kafka cluster. |
| KafkaLastPollHealthCheck | cohort-kafka | Asserts the last poll time for a Kafka Consumer was within a set threshold. |
| KafkaConsumerCountHealthCheck | cohort-kafka | Checks that a Kafka Consumer is consuming a minimum number of records between health checks. |
| KafkaProducerCountHealthCheck | cohort-kafka | Checks that a Kafka Producer is producing a minimum number of records between health checks. |
| KafkaTopicHealthCheck | cohort-kafka | Confirms that a Kafka cluster can be reached and a topic exists. |
| KafkaConsumerSubscriptionHealthCheck | cohort-kafka | Checks that a Kafka consumer is subscribed to a specified (or any) topic. |
| LiveThreadsHealthCheck | cohort-core | Checks that the number of live threads does not exceed a value. |
| LoadedClassesHealthCheck | cohort-core | Checks that the number of loaded classes is below a threshold. |
| MaxFileDescriptorsHealthCheck | cohort-core | Checks that the number of max file descriptors is at least a required level. |
| OpenFileDescriptorsHealthCheck | cohort-core | Checks that the number of open file descriptors is below a threshold. |
| PeakThreadsHealthCheck | cohort-core | Checks that the number of peak threads does not exceed a threshold. |
| MongoConnectionHealthCheck | cohort-mongo | Checks for connectivity to a Mongo instance. |
| ProcessCpuHealthCheck | cohort-core | Checks that the process cpu is below a threshold. |
| RabbitConnectionHealthCheck | cohort-rabbit | Checks for connectivity to a RabbitMQ instance. |
| RabbitQueueHealthCheck | cohort-rabbit | Checks for connectivity to, and existence of, a RabbitMQ queue. |
| RedisClusterHealthCheck | cohort-redis | Confirms that a connection can be opened to a Redis cluster using a Jedis client. Can optionally execute an arbitrary command. |
| RedisHealthCheck | cohort-redis | Confirms that a connection can be opened to a Redis instance using a Jedis client. Can optionally execute an arbitrary command. |
| RedisClusterHealthCheck | cohort-lettuce | Confirms that a connection can be opened to a Redis cluster using a Lettuce client. Can optionally execute an arbitrary command. |
| RedisHealthCheck | cohort-lettuce | Confirms that a connection can be opened to a Redis instance using a Lettuce client. Can optionally execute an arbitrary command. |
| StartedThreadsHealthCheck | cohort-core | Checks that the number of created and started threads does not exceed a threshold. |
| S3ReadBucketHealthCheck | cohort-aws-s3 | Checks for connectivity and permissions to read from an S3 bucket. |
| SQSQueueHealthCheck | cohort-aws-sqs | Checks for connectivity and existence of an SQS queue. |
| SNSHealthCheck | cohort-aws-sns | Checks for connectivity and existence of an SNS topic. |
| SystemCpuHealthCheck | cohort-core | Checks that the maximum system cpu is below a threshold. |
| SystemLoadHealthCheck | cohort-core | Checks that the maximum system load is below a threshold. |
| TcpHealthCheck | cohort-core | Attempts to ping a given host and port within a time period. Can be used to check connectivity to an arbitrary socket. |
| ThreadDeadlockHealthCheck | cohort-core | Checks for the presence of deadlocked threads. A single deadlocked thread marks this check as unhealthy. |
| ThreadStateHealthCheck | cohort-core | Checks that the number of threads in a given state does not exceed a value. For example, you could specify that the max number of BLOCKED threads is 100. |
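To give a feel for what a JVM-level check in the table involves, here is a minimal sketch of deadlock detection using the standard `ThreadMXBean` from `java.lang.management`. It is an illustration of the underlying JDK facility, not Cohort's source; the function names `deadlockedThreadIds` and `isDeadlockFree` are hypothetical.

```kotlin
import java.lang.management.ManagementFactory

// Returns the ids of threads deadlocked on monitors or ownable synchronizers.
// findDeadlockedThreads() returns null when no deadlock is detected, and may throw
// UnsupportedOperationException on JVMs that cannot monitor ownable synchronizers.
fun deadlockedThreadIds(): List<Long> =
   ManagementFactory.getThreadMXBean().findDeadlockedThreads()?.toList() ?: emptyList()

// A check in the spirit of the table entry: any deadlocked thread means unhealthy.
fun isDeadlockFree(): Boolean = deadlockedThreadIds().isEmpty()
```

Deadlock detection walks the JVM's threads, so a check like this is normally scheduled on a relaxed interval, as with the one minute interval used for ThreadDeadlockHealthCheck in the registry example earlier.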

Kubernetes

A Kubernetes kubelet offers three kinds of probes to know the status of a container.

  • liveness - Indicates whether the container is running. If the liveness probe fails, the kubelet kills the container (and restarts subject to the restart policy).
  • readiness - Indicates whether the container is ready to respond to requests. If the readiness probe fails, the kubelet removes the pod from receiving traffic.
  • startup - Indicates whether the application within the container has started. All other probes are disabled if a startup probe is provided, until it succeeds.

The kubelet uses liveness probes to know when to restart a container. Liveness probes help catch a situation where an application is running but is no longer useful. One such example is if a thread has stopped and the application does not have code to detect and restart the thread. Restarting a container in such a state can make the application available again despite the presence of bugs.

The kubelet uses readiness probes to know when a container should receive traffic. A pod is considered ready when all of its containers are ready. One use of this signal is to temporarily remove traffic from backends when they are unable to handle any more requests. For example, a service may have received more requests than it can handle, and so its backlog of requests is growing. Taking that pod out of the load balancers while it catches up can avoid the service crashing or needing a restart. A pod with containers reporting that they are not ready does not receive traffic through Kubernetes Services.

Readiness probes are not a substitute for proper scaling (either HPA or manually) but they can avoid a situation where all pods are killed, and a service is completely unavailable.

The kubelet uses startup probes to know when a container application has fully started. If such a probe is configured, it disables liveness and readiness checks until it succeeds, making sure those probes don't interfere with the application startup. Startup probes are very useful if an application needs to perform slow initialization work and, until that is complete, a liveness check would fail. This avoids a situation where the failing liveness checks result in the kubelet killing the pod before it is ready.

Healthcheck Endpoint Output

Here is an example of output from a health check with a series of configured health checks.

    [
       {
          "name": "com.sksamuel.cohort.memory.FreememHealthCheck",
          "healthy": true,
          "lastCheck": "2022-03-15T03:01:09.445932Z",
          "message": "Freemem is above threshold [433441040 >= 67108864]",
          "cause": null,
          "consecutiveSuccesses": 75,
          "consecutiveFailures": 0
       },
       {
          "name": "com.sksamuel.cohort.system.OpenFileDescriptorsHealthCheck",
          "healthy": true,
          "lastCheck": "2022-03-15T03:01:09.429469Z",
          "message": "Open file descriptor count within threshold [209 <= 16000]",
          "cause": null,
          "consecutiveSuccesses": 25,
          "consecutiveFailures": 0
       },
       {
          "name": "com.sksamuel.cohort.memory.GarbageCollectionTimeCheck",
          "healthy": true,
          "lastCheck": "2022-03-15T03:00:54.422194Z",
          "message": "GC Collection time was 0% [Max is 25]",
          "cause": null,
          "consecutiveSuccesses": 6,
          "consecutiveFailures": 0
       },
       {
          "name": "writer connections",
          "healthy": true,
          "lastCheck": "2022-03-15T03:01:09.445868Z",
          "message": "Database connections is equal or above threshold [8 >= 8]",
          "cause": null,
          "consecutiveSuccesses": 75,
          "consecutiveFailures": 0
       },
       {
          "name": "reader connections",
          "healthy": true,
          "lastCheck": "2022-03-15T03:01:09.445841Z",
          "message": "Database connections is equal or above threshold [8 >= 8]",
          "cause": null,
          "consecutiveSuccesses": 75,
          "consecutiveFailures": 0
       },
       {
          "name": "com.sksamuel.cohort.system.SystemCpuHealthCheck",
          "healthy": true,
          "lastCheck": "2022-03-15T03:01:09.463421Z",
          "message": "System CPU is below threshold [0.12667261373773417 < 0.9]",
          "cause": null,
          "consecutiveSuccesses": 75,
          "consecutiveFailures": 0
       },
       {
          "name": "com.sksamuel.cohort.threads.ThreadDeadlockHealthCheck",
          "healthy": true,
          "lastCheck": "2022-03-15T03:00:54.419733Z",
          "message": "There are 0 deadlocked threads",
          "cause": null,
          "consecutiveSuccesses": 6,
          "consecutiveFailures": 0
       }
    ]

Micrometer

Cohort will send healthcheck metrics to micrometer if configured. Add the cohort-micrometer module and then bind an instance of CohortMetrics to both your healthcheck registry and the micrometer registry.

For example:

    val micrometerRegistry = DatadogMeterRegistry(..) // or any other registry

    val healthcheckRegistry = HealthCheckRegistry(Dispatchers.Default) {
       register("foo", FooHealthCheck, 5.seconds)
       register("bar", BarHealthCheck, 3.seconds)
    }

    CohortMetrics(healthcheckRegistry).bindTo(micrometerRegistry)

Each health check will emit a metric under the key cohort.healthcheck with a name, type and healthy tag.

Logging

Cohort allows you to view the current logging configuration and update log levels at runtime.

To enable this, pass an instance of the LogManager interface for the logging framework you are using to the logManager parameter in the Cohort plugin configuration.

Once enabled, the endpoint GET /cohort/logging can be used to show current log information and PUT /cohort/logging/{name}/{level} can be used to modify a log level at runtime.

Cohort currently supports two LogManager implementations:

  • LogbackManager - add module com.sksamuel.cohort:cohort-logback:<version>
  • Log4j2Manager - add module com.sksamuel.cohort:cohort-log4j2:<version>

For example, for projects that use logback, you can configure like this:

    install(Cohort) {
       logManager = LogbackManager
    }

Here is example output showing the logging configuration:

    {
       "levels": [
          "DEBUG",
          "TRACE",
          "INFO",
          "ERROR",
          "OFF",
          "WARN"
       ],
       "loggers": [
          { "name": "ROOT", "level": "INFO" },
          { "name": "com", "level": "INFO" },
          { "name": "com.sksamuel", "level": "INFO" },
          { "name": "ktor", "level": "INFO" },
          { "name": "ktor.application", "level": "INFO" },
          { "name": "org", "level": "INFO" },
          { "name": "org.apache", "level": "INFO" },
          { "name": "org.apache.kafka", "level": "WARN" }
       ]
    }
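The PUT /cohort/logging/{name}/{level} endpoint described in this section can be called from any HTTP client. The sketch below builds such a request with the JDK's `java.net.http` API (Java 11 or later). The function name `logLevelRequest` and the base URL are hypothetical, and sending the request with an empty body is an assumption, since the text above does not say whether a body is expected.

```kotlin
import java.net.URI
import java.net.http.HttpRequest

// Builds (but does not send) a PUT request for the log level endpoint.
// baseUrl is a hypothetical address for wherever your application is listening.
// Assumption: the endpoint takes its arguments from the path and needs no request body.
fun logLevelRequest(baseUrl: String, logger: String, level: String): HttpRequest =
   HttpRequest.newBuilder(URI.create("$baseUrl/cohort/logging/$logger/$level"))
      .PUT(HttpRequest.BodyPublishers.noBody())
      .build()
```

For example, `logLevelRequest("http://localhost:8080", "org.apache.kafka", "WARN")` targets the `org.apache.kafka` logger from the output above; pass the result to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` to execute it.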

JVM Info

Displays information about the JVM state, including VM options, JVM version, and vendor name.

To enable, set jvmInfo to true inside the Cohort ktor configuration block:

    install(Cohort) {
       jvmInfo = true
    }

Example output:

    {
       "name": "106637@sam-H310M-A-2-0",
       "pid": 106637,
       "vmOptions": [
          "-Dvisualvm.id=32227655111670",
          "-javaagent:/home/sam/development/idea-IU-213.5744.125/lib/idea_rt.jar=36667:/home/sam/development/idea-IU-213.5744.125/bin",
          "-Dfile.encoding=UTF-8"
       ],
       "classPath": "/home/sam/development/workspace/......",
       "specName": "Java Virtual Machine Specification",
       "specVendor": "Oracle Corporation",
       "specVersion": "11",
       "vmName": "OpenJDK 64-Bit Server VM",
       "vmVendor": "AdoptOpenJDK",
       "vmVersion": "11.0.10+9",
       "startTime": 1647315704746,
       "uptime": 405278
    }
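The fields in this output correspond closely to properties of the JDK's standard `RuntimeMXBean`. The following sketch gathers a subset of them into a map. It is an illustration of where such values can come from, assuming nothing about how Cohort collects them; the function name `jvmInfo` here is a hypothetical helper, not the plugin setting.

```kotlin
import java.lang.management.ManagementFactory

// Collects a subset of the fields shown above from the standard RuntimeMXBean.
fun jvmInfo(): Map<String, Any> {
   val runtime = ManagementFactory.getRuntimeMXBean()
   return mapOf(
      "name" to runtime.name,
      "pid" to runtime.pid, // RuntimeMXBean.getPid() requires Java 10 or later
      "vmOptions" to runtime.inputArguments,
      "specVersion" to runtime.specVersion,
      "vmName" to runtime.vmName,
      "vmVendor" to runtime.vmVendor,
      "vmVersion" to runtime.vmVersion,
      "startTime" to runtime.startTime, // epoch milliseconds
      "uptime" to runtime.uptime,       // milliseconds since JVM start
   )
}
```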

Operating System

Displays the running operating system and version.

To enable, set operatingSystem to true inside the Cohort ktor configuration block:

    install(Cohort) {
       operatingSystem = true
    }

Example output:

    {
       "arch": "amd64",
       "name": "Linux",
       "version": "5.13.0-35-generic"
    }

Datasources

By passing one or more database pools to Cohort, you can see at runtime the current state of the pool(s). Once enabled, a GET request to /cohort/datasources will return information such as idle connection count, max pool size, connection timeouts and so on.

Cohort supports two connection pool libraries:

  • Apache Commons DBCP - add module com.sksamuel.cohort:cohort-dbcp
  • HikariCP - add module com.sksamuel.cohort:cohort-hikari

To activate this feature, wrap your DataSource in an appropriate DataSourceManager instance and pass it through to the Cohort plugin.

For example, if we had two connection pools, a writer pool using Hikari, and a reader pool using Apache DBCP, then we could configure like this:

    install(Cohort) {
       dataSources = listOf(
          ApacheDBCPDataSourceManager(reader),
          HikariDataSourceManager(writer),
       )
    }

Here is an example output for the above datasources:

    [
       {
          "name": "writer",
          "activeConnections": 0,
          "idleConnections": 8,
          "totalConnections": 8,
          "threadsAwaitingConnection": 0,
          "connectionTimeout": 30000,
          "idleTimeout": 600000,
          "maxLifetime": 1800000,
          "leakDetectionThreshold": 0,
          "maximumPoolSize": 16,
          "validationTimeout": 5000
       },
       {
          "name": "reader",
          "activeConnections": 0,
          "idleConnections": 8,
          "totalConnections": 8,
          "threadsAwaitingConnection": 0,
          "connectionTimeout": 30000,
          "idleTimeout": 600000,
          "maxLifetime": 1800000,
          "leakDetectionThreshold": 0,
          "maximumPoolSize": 16,
          "validationTimeout": 5000
       }
    ]

System Properties

Send a GET request to /cohort/sysprops to return the current system properties.

To enable, set sysprops to true inside the Cohort plugin configuration block:

    install(Cohort) {
       sysprops = true
    }

Here is an example of the output:

    {
       "sun.jnu.encoding": "UTF-8",
       "java.vm.vendor": "AdoptOpenJDK",
       "java.vendor.url": "https://adoptopenjdk.net/",
       "user.timezone": "America/Chicago",
       "os.name": "Linux",
       "java.vm.specification.version": "11",
       "user.country": "US",
       "sun.boot.library.path": "/home/sam/.sdkman/candidates/java/11.0.10.hs-adpt/lib",
       "sun.java.command": "com.myapp.MainKt",
       "user.home": "/home/sam",
       "java.version.date": "2021-01-19",
       "java.home": "/home/sam/.sdkman/candidates/java/11.0.10.hs-adpt",
       "file.separator": "/",
       "java.vm.compressedOopsMode": "Zero based",
       "line.separator": "\n",
       "java.specification.name": "Java Platform API Specification",
       "java.vm.specification.vendor": "Oracle Corporation",
       "sun.management.compiler": "HotSpot 64-Bit Tiered Compilers",
       "java.runtime.version": "11.0.10+9",
       "user.name": "sam",
       "path.separator": ":",
       "file.encoding": "UTF-8",
       "java.vm.name": "OpenJDK 64-Bit Server VM",
       "user.dir": "/home/sam/development/workspace/myapp",
       "os.arch": "amd64",
       "java.vm.specification.name": "Java Virtual Machine Specification",
       "java.awt.printerjob": "sun.print.PSPrinterJob",
       "java.class.version": "55.0"
    }

Heap Dump

Send a GET request to /cohort/heapdump to retrieve a heap dump for all live objects.

The file returned is in the format used by hprof.

To enable, set heapdump to true inside the Cohort plugin configuration block:

    install(Cohort) {
       heapdump = true
    }
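For context on what a heap dump of live objects in hprof format means, the sketch below produces one locally with the HotSpot-specific `HotSpotDiagnosticMXBean`. This shows the JDK facility only and is not a claim about how Cohort implements the endpoint; `dumpLiveHeap` is a hypothetical helper and it assumes a HotSpot-based JVM.

```kotlin
import com.sun.management.HotSpotDiagnosticMXBean
import java.lang.management.ManagementFactory
import java.nio.file.Files
import java.nio.file.Path

// Writes an hprof heap dump to a new temporary file and returns its path.
fun dumpLiveHeap(): Path {
   // dumpHeap refuses to overwrite an existing file, so target a fresh path in a new directory.
   // Recent JDKs also require the file name to end in .hprof.
   val target = Files.createTempDirectory("heapdump").resolve("live-objects.hprof")
   val bean = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean::class.java)
   bean.dumpHeap(target.toString(), true) // true = dump only live (reachable) objects
   return target
}
```

Heap dumps can be large and contain everything held in memory, including secrets, which is one reason an endpoint like this is disabled by default.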

Thread Dump

Send a GET request to /cohort/threaddump to retrieve a thread dump for all current threads.

To enable, set threaddump to true inside the Cohort plugin configuration block:

    install(Cohort) {
       threaddump = true
    }

Example output:

    "main" prio=5 Id=1 WAITING on io.netty.channel.AbstractChannel$CloseFuture@291c536c
        at java.base@11.0.10/java.lang.Object.wait(Native Method)
        -  waiting on io.netty.channel.AbstractChannel$CloseFuture@291c536c
        at java.base@11.0.10/java.lang.Object.wait(Object.java:328)
        at app//io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:253)
        at app//io.netty.channel.DefaultChannelPromise.await(DefaultChannelPromise.java:131)
        at app//io.netty.channel.DefaultChannelPromise.await(DefaultChannelPromise.java:30)
        at app//io.netty.util.concurrent.DefaultPromise.sync(DefaultPromise.java:404)
        at app//io.netty.channel.DefaultChannelPromise.sync(DefaultChannelPromise.java:119)
        at app//io.netty.channel.DefaultChannelPromise.sync(DefaultChannelPromise.java:30)
        ...

    "Reference Handler" daemon prio=10 Id=2 RUNNABLE
        at java.base@11.0.10/java.lang.ref.Reference.waitForReferencePendingList(Native Method)
        at java.base@11.0.10/java.lang.ref.Reference.processPendingReferences(Reference.java:241)
        at java.base@11.0.10/java.lang.ref.Reference$ReferenceHandler.run(Reference.java:213)

    "Finalizer" daemon prio=8 Id=3 WAITING on java.lang.ref.ReferenceQueue$Lock@1392e5c7
        at java.base@11.0.10/java.lang.Object.wait(Native Method)
        -  waiting on java.lang.ref.ReferenceQueue$Lock@1392e5c7
        at java.base@11.0.10/java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:155)
        at java.base@11.0.10/java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:176)
        at java.base@11.0.10/java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:170)

    "Signal Dispatcher" daemon prio=9 Id=4 RUNNABLE

    "Common-Cleaner" daemon prio=8 Id=19 TIMED_WAITING on java.lang.ref.ReferenceQueue$Lock@78e1959d
        at java.base@11.0.10/java.lang.Object.wait(Native Method)
        -  waiting on java.lang.ref.ReferenceQueue$Lock@78e1959d
        at java.base@11.0.10/java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:155)
        at java.base@11.0.10/jdk.internal.ref.CleanerImpl.run(CleanerImpl.java:148)
        at java.base@11.0.10/java.lang.Thread.run(Thread.java:834)
        at java.base@11.0.10/jdk.internal.misc.InnocuousThread.run(InnocuousThread.java:134)
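The layout of this output resembles what the JDK's `ThreadInfo.toString()` produces. As a rough sketch of how a similar dump can be generated in process (an assumption about the general technique, not a statement of Cohort's implementation), using a hypothetical helper named `threadDump`:

```kotlin
import java.lang.management.ManagementFactory

// Renders every live thread via ThreadInfo.toString(): a header line with the thread
// name, id and state, followed by stack frames. Long stacks are truncated with "...".
fun threadDump(): String =
   ManagementFactory.getThreadMXBean()
      .dumpAllThreads(true, true) // include locked monitors and ownable synchronizers
      .joinToString(separator = "\n") { it.toString() }
```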
