# goresilience

A library to improve the resilience of Go applications in an easy and flexible way.
Goresilience is a Go toolkit to increase the resilience of applications. At its core it is inspired by hystrix and similar libraries, but at the same time it is very different:
- Increase resilience of the programs.
- Easy to extend, test and with clean design.
- Go idiomatic.
- Use the decorator pattern (middleware), like Go's http.Handler does.
- Ability to create custom resilience flows, simple, advanced, specific... by combining different runners in chains.
- Safe defaults.
- Not coupled to any framework/library.
- Prometheus/Openmetrics metrics as first class citizen.
- Motivation
- Getting started
- Static Runners
- Adaptive Runners
- Other
- Architecture
- Extend using your own runners
## Motivation

You are wondering, why another circuit breaker library...?

Well, this is not a circuit breaker library. It is true that Go has some good circuit breaker libraries (like sony/gobreaker, afex/hystrix-go or rubyist/circuitbreaker), but there is a lack of a resilience toolkit that is easy to extend and customize and that establishes a design that can be built upon; that's why goresilience was born.
The aim of goresilience is to provide resilience runners that can be combined or used independently depending on the nature of the execution logic (complex, simple, performance critical, very reliable...).

Also, one of the key parts of goresilience is the ability to create new runners yourself and use them in combination with the bulkhead, the circuit breaker or any of the runners of this library or from others.
## Getting started

The usage of the library is simple. Everything is based on the `Runner` interface.

The runners can be used in two ways, in standalone mode (one runner):
```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/slok/goresilience/timeout"
)

func main() {
	// Create our command.
	cmd := timeout.New(timeout.Config{
		Timeout: 100 * time.Millisecond,
	})

	for i := 0; i < 200; i++ {
		// Execute.
		result := ""
		err := cmd.Run(context.TODO(), func(_ context.Context) error {
			if time.Now().Nanosecond()%2 == 0 {
				time.Sleep(5 * time.Second)
			}
			result = "all ok"
			return nil
		})

		if err != nil {
			result = "not ok, but fallback"
		}

		log.Printf("the result is: %s", result)
	}
}
```
or combined in a chain of multiple runners using runner middlewares. In this example the execution will be retried, timed out and concurrency controlled using a runner chain:
```go
package main

import (
	"context"
	"errors"
	"fmt"

	"github.com/slok/goresilience"
	"github.com/slok/goresilience/bulkhead"
	"github.com/slok/goresilience/retry"
	"github.com/slok/goresilience/timeout"
)

func main() {
	// Create our execution chain.
	cmd := goresilience.RunnerChain(
		bulkhead.NewMiddleware(bulkhead.Config{}),
		retry.NewMiddleware(retry.Config{}),
		timeout.NewMiddleware(timeout.Config{}),
	)

	// Execute.
	calledCounter := 0
	result := ""
	err := cmd.Run(context.TODO(), func(_ context.Context) error {
		calledCounter++
		if calledCounter%2 == 0 {
			return errors.New("you didn't expect this error")
		}
		result = "all ok"
		return nil
	})

	if err != nil {
		result = "not ok, but fallback"
	}

	fmt.Printf("result: %s", result)
}
```
As you can see, you can create any combination of resilient execution flows by combining the different runners of the toolkit.
## Static Runners

Static runners are the ones that are based on a static configuration and don't change based on the environment (unlike the adaptive ones).
### Timeout

This runner is based on the timeout pattern: it will execute the `goresilience.Func`, but if the execution lasts longer than the configured timeout duration it will return a timeout error.

Check the example.
### Retry

This runner is based on the retry pattern: it will retry the execution of the `goresilience.Func` up to N times if it fails.

It will use an exponential backoff with some jitter (for more information check this).

Check the example.
### Bulkhead

This runner is based on the bulkhead pattern: it will control the concurrency of the `goresilience.Func` executions that use the same runner.

It can also time out if a `goresilience.Func` has been waiting too long on the execution queue.

Check the example.
### Circuit breaker

This runner is based on the circuit breaker pattern: it will store the results of the executed `goresilience.Func` in N buckets of T duration each, and change the state of the circuit based on those measured metrics.

Check the example.
### Chaos

This runner is based on failure injection of errors and latency. It will inject those failures on the required executions (based on a percentage, or on all of them).

Check the example.
## Adaptive Runners

### Concurrency limit

Concurrency limit is based on Netflix's concurrency-limits library. It tries to implement the same features, but for the goresilience library (and compatible with the other runners).

It limits the concurrency with less configuration, adapting to the environment it is running in at each moment: hardware, load...

This Runner will limit the concurrency (like bulkhead), but it will use different TCP congestion control algorithms to adapt the concurrency limit based on errors and latency.
The Runner is based on 4 components:

- Limiter: measures and calculates the limit of concurrency based on different selectable algorithms, for example AIMD.
- Executor: executes the `goresilience.Func` itself; it has different queuing implementations that will prioritize and drop executions depending on the implementation.
- Runner: the runner itself that will be used by the user; it is the glue between the `Limiter` and the `Executor`. It has a policy that treats the execution result as an error, a success or an ignore for the Limiter algorithm.
- Result policy: a function that can be configured on the concurrencylimit Runner. It receives the result of the executed function and returns a result for the limit algorithm. This policy is responsible for telling the limit algorithm whether the received error should count as a success, count as a failure or be ignored in the calculation of the concurrency limit. For example: only count the errors that were 502s and ignore all the others.
Check the AIMD example. Check the CoDel example.
#### Executors

- `FIFO`: the default executor; it executes the queued jobs in first-in-first-out order and also has a queue wait timeout.
- `LIFO`: executes the queued jobs in last-in-first-out order and also has a queue wait timeout.
- `AdaptiveLIFOCodel`: implementation of Facebook's CoDel + adaptive LIFO algorithm. This executor is used with the `Static` limiter.
#### Limiters

- `Static`: sets a constant limit that will not change.
- `AIMD`: based on the AIMD TCP congestion control algorithm. It increases the limit at a constant rate, and when congestion occurs (by timeout or result failure) it decreases the limit by a configured factor.
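The AIMD update rule itself is tiny and can be sketched in a few lines (an illustration of the algorithm, not the library's limiter; `aimdLimit` is a made-up helper): grow the limit additively while executions succeed, and cut it multiplicatively when congestion is detected.

```go
package main

import "fmt"

// aimdLimit applies one AIMD step: additive increase (+1) on success,
// multiplicative decrease by backoffRatio on failure, never below 1.
func aimdLimit(limit int, success bool, backoffRatio float64) int {
	if success {
		return limit + 1
	}
	next := int(float64(limit) * backoffRatio)
	if next < 1 {
		next = 1
	}
	return next
}

func main() {
	limit := 10
	for _, ok := range []bool{true, true, false, true} {
		limit = aimdLimit(limit, ok, 0.5)
	}
	fmt.Println(limit) // 10 -> 11 -> 12 -> 6 -> 7, prints 7
}
```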
#### Result policies

- `FailureOnExternalErrorPolicy`: treats as failure every error that does not come from the concurrencylimit package.
- `NoFailurePolicy`: never returns a failure, just ignores errors when they occur; this can be used to adapt only on RTT/latency.
- `FailureOnRejectedPolicy`: treats as failure every execution that has been rejected with an `errors.ErrRejectedExecution` error.
## Other

### Metrics

All the runners can be measured using a `metrics.Recorder`, but instead of passing it to every runner, the runners will try to get this recorder from the context. So you can wrap any runner using `metrics.NewMiddleware` and it will activate metrics support on the wrapped runners. This should be the first runner of the chain.
At this moment only Prometheus is supported.

In this example the runners are measured.
Measuring always has a performance hit (not too high); in most cases it is not a problem, but there is a benchmark to see what the numbers are:
```text
BenchmarkMeasuredRunner/Without_measurement_(Dummy).-4      300000     6580 ns/op    677 B/op    12 allocs/op
BenchmarkMeasuredRunner/With_prometheus_measurement.-4      200000    12901 ns/op    752 B/op    15 allocs/op
```
### Hystrix-like

Using the different runners, a hystrix-like flow can be obtained. You can see a simple example of how it can be done in this example.
### HTTP middleware

Creating HTTP middlewares with goresilience runners is simple and clean. You can see an example of how it can be done in this example. The example shows how you can protect the server by load shedding, using an adaptive concurrencylimit `goresilience.Runner`.
## Architecture

At its core, goresilience is based on a very simple idea: the `Runner` interface. A `Runner` is the unit of execution; it accepts a `context.Context` and a `goresilience.Func`, and returns an `error`.
The idea of the Runner is the same as Go's `http.Handler`: having an interface, you can create chains of runners, also known as middlewares (also called the decorator pattern).
The library comes with decorators called `Middleware` that return a function wrapping one runner with another. This gives us the ability to create resilient execution flows, wrapping any runner to customize it with the pieces we want, including custom ones not in this library.
This way we could create an execution flow like this example:

```text
Circuit breaker
└── Timeout
    └── Retry
```
## Extend using your own runners

To create your own runner, you need to have 2 things in mind:

- Implement the `goresilience.Runner` interface.
- Provide constructors to get a `goresilience.Middleware`, so your `Runner` can be chained with other `Runner`s.
In this example (full example here) we create a new resilience runner for chaos engineering that will fail at a constant rate set by the `Config.FailEveryTimes` setting.
Following the library convention, with `NewFailer` we get the standalone Runner (the one that is not chainable), and with `NewFailerMiddleware` we get a `Middleware` that can be used with `goresilience.RunnerChain` to chain with other Runners.
Note: we can use `nil` on `New` because `NewMiddleware` uses `goresilience.SanitizeRunner`, which will return a valid Runner as the last part of the chain in case of being `nil` (for more information about this check `goresilience.command`).
```go
// Config is the configuration of constFailer.
type Config struct {
	// FailEveryTimes will make the runner return an error every N executed times.
	FailEveryTimes int
}

// New is like NewFailerMiddleware but will not wrap any other runner; it is standalone.
func New(cfg Config) goresilience.Runner {
	return NewMiddleware(cfg)(nil)
}

// NewMiddleware returns a new middleware that will wrap runners and will fail
// every N times of executions.
func NewMiddleware(cfg Config) goresilience.Middleware {
	return func(next goresilience.Runner) goresilience.Runner {
		calledTimes := 0
		// Use the RunnerFunc helper so we don't need to create a new type.
		return goresilience.RunnerFunc(func(ctx context.Context, f goresilience.Func) error {
			// We should lock the counter writes, not done because this is an example.
			calledTimes++

			if calledTimes == cfg.FailEveryTimes {
				calledTimes = 0
				return fmt.Errorf("failed due to %d call", cfg.FailEveryTimes)
			}

			// Run using the chain.
			next = goresilience.SanitizeRunner(next)
			return next.Run(ctx, f)
		})
	}
}
```