A fault-tolerant events/alerts correlation engine
myntra/cortex
Cortex is a fault-tolerant events correlation engine. It groups and correlates incoming events for further actions: creating/resolving incidents/alerts, or doing root cause analysis.
- Built-in regex matcher for capturing events into groups (called buckets).
- Built-in ES6 JavaScript interpreter (https://docs.k6.io/docs/modules) for executing correlation logic on buckets.
- React UI for creating rules and correlation scripts, listing rule execution history, and a playground to simulate correlation executions.
- REST API for CRUD operations on rules, scripts and execution history.
- CloudEvents input and output (https://cloudevents.io/).
- Fault tolerance built on top of https://github.com/hashicorp/raft and https://github.com/boltdb/bolt.
- Single fat, self-supervising binary using https://github.com/crawshaw/littleboss.
- MessagePack encoding/decoding for raft entries using https://github.com/tinylib/msgp.
The project is alpha quality and not yet ready for production.
Find relationships between N events received at M different points in time, using regex matchers and JavaScript.
To learn more about event correlation in general, please read: https://en.wikipedia.org/wiki/Event_correlation
- https://console.bluemix.net/catalog/services/event-management
- https://www.bigpanda.io/blog/algorithmic-alert-correlation/
- https://docs.servicenow.com/bundle/kingston-it-operations-management/page/product/event-management/concept/c_EMEventCorrelationRules.html
- Alerts/Events Correlation
- Event Gateway
- FAAS
- Incidents Management
Cortex runs the following steps to achieve event correlation:
- Match: incoming alert --> (convert from Site 24x7/Icinga) --> (match rule) --> Collect
- Collect --> (add to the rule's bucket, which dwells until the configured time has elapsed) --> Execute
- Execute --> (flush after the dwell period) --> (execute the configured script) --> Post
- Post --> (if the script sets `result`, post it to the hookEndpoint; if `result` is nil, post the bucket itself)
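The four steps above can be sketched for a single rule as follows. This is a minimal illustration under stated assumptions, not Cortex's Go implementation: time is driven by an explicit `now` argument instead of real timers, and `createEngine`, `ingest`, `tick`, `matches`, `runScript` and `post` are hypothetical names standing in for the real matcher, script runner and HTTP client.

```javascript
// Hypothetical sketch of the Match -> Collect -> Execute -> Post steps for a
// single rule, driven by an explicit clock (`now`) instead of real timers.
// `matches`, `runScript` and `post` are injected stand-ins, not Cortex APIs.
function createEngine(rule, { matches, runScript, post }) {
  let bucket = null; // the bucket currently dwelling, shaped like { rule, events }
  let firstSeen = 0; // arrival time of the bucket's first event

  return {
    // Match + Collect: keep only events the rule matches and add them to the
    // dwelling bucket, opening a new bucket if none exists.
    ingest(event, now) {
      if (!matches(rule, event.eventType)) return false;
      if (bucket === null) {
        bucket = { rule, events: [] };
        firstSeen = now;
      }
      bucket.events.push(event);
      return true;
    },

    // Execute + Post: once `dwell` has elapsed since the first event, flush
    // and reset the bucket, run the script, then post the script's result,
    // or the bucket itself when the result is null/undefined.
    tick(now) {
      if (bucket === null || now - firstSeen < rule.dwell) return null;
      const flushed = bucket;
      bucket = null; // the next matching event starts a new bucket
      const result = runScript(flushed);
      const payload = result === null || result === undefined ? flushed : result;
      post(rule.hookEndpoint, payload);
      return payload;
    },
  };
}
```

Passing the matcher, script runner and poster in as functions keeps the sketch testable without a network or a JavaScript sandbox.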
A rule contains an array of patterns used to capture events in a bucket:

```json
{
  "title": "a test rule",
  "id": "test-rule-id-1",
  "eventTypePatterns": ["acme.prod.icinga.check_disk", "acme.prod.site247.*"],
  "scriptID": "myscript.js",
  "dwell": 4000,
  "dwellDeadline": 3800,
  "maxDwell": 8000,
  "hookEndpoint": "http://localhost:3000/testrule",
  "hookRetry": 2
}
```

where:
- `eventTypePatterns` is the list of event type patterns whose matching events are collected in a bucket.
- `dwell` is the wait duration since the first matched event.
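The `dwell` field can be illustrated by grouping the arrival times of events that match one rule. This sketch assumes a bucket opens at its first event and flushes `dwell` time units later, and that a matching event arriving at or after the flush time starts a new bucket; it ignores `dwellDeadline` and `maxDwell`, whose semantics this README does not describe. `groupByDwell` is a hypothetical helper, not part of Cortex.

```javascript
// Hypothetical sketch: split the arrival times of events matching one rule
// into buckets. A bucket opens at its first event and flushes `dwell` later;
// the first matching event at or after that flush time opens a new bucket.
// dwellDeadline and maxDwell are deliberately ignored here.
function groupByDwell(arrivalTimes, dwell) {
  const buckets = [];
  let current = null;
  // copy before sorting so the caller's array is left untouched
  for (const t of [...arrivalTimes].sort((a, b) => a - b)) {
    if (current === null || t >= current.flushAt) {
      current = { openedAt: t, flushAt: t + dwell, times: [] };
      buckets.push(current);
    }
    current.times.push(t);
  }
  return buckets;
}
```

With the example rule's `dwell` of 4000, arrivals at 0, 1000, 3999, 4500, 6000 and 9000 fall into three buckets under these assumptions: [0, 1000, 3999], [4500, 6000] and [9000].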
Possible patterns:

| Rule pattern | Incoming event type | Expected match |
|---|---|---|
| `acme*` | `acme` | false |
| `acme*` | `acme.prod` | true |
| `acme.prod*` | `acme.prod.search` | true |
| `acme.prod*.checkout` | `acme.prod.search` | false |
| `acme.prod*.*` | `acme.prod.search` | false |
| `acme.prod*.*` | `acme.prod-1.search` | true |
| `acme.prod.*.*.*` | `acme.prod.search.node1.check_disk` | true |
| `acme.prod.*.*.check_disk` | `acme.prod.search.node1.check_disk` | true |
| `acme.prod.*.*.check_loadavg` | `acme.prod.search.node1.check_disk` | false |
| `*.prod.*.*.check_loadavg` | `acme.prod.search.node1.check_loadavg` | true |
| `acme.prod.*` | `acme.prod.search.node1.check_disk` | true |
| `acme.prod.search.node*.check_disk` | `acme.prod.search.node1.check_disk` | true |
| `acme.prod.search.node*.*` | `acme.prod.search.node1.check_disk` | true |
| `acme.prod.search.dc1-node*.*` | `acme.prod.search.node1.check_disk` | false |

Alerts are accepted as cloudevents.io events (https://github.com/cloudevents/spec/blob/master/json-format.md). Site 24x7 and Icinga integration sinks are also provided.
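Every pattern example above is consistent with a simple translation into an anchored regular expression in which `*` stands for one or more arbitrary characters (dots included) and everything else is literal. The sketch below assumes that translation; it reproduces the examples but is not taken from the Cortex source, and `patternToRegExp` and `ruleMatches` are hypothetical names.

```javascript
// Hypothetical sketch: turn an eventTypePatterns entry into an anchored
// RegExp, assuming `*` means "one or more of any character (dots included)".
function patternToRegExp(pattern) {
  const body = pattern
    .split("*")
    // escape regex metacharacters in the literal parts between wildcards
    .map((literal) => literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".+");
  return new RegExp(`^${body}$`);
}

// An event belongs to a rule's bucket if its type matches any of the patterns.
function ruleMatches(rule, eventType) {
  return rule.eventTypePatterns.some((p) => patternToRegExp(p).test(eventType));
}

// [rule pattern, incoming event type, expected match], as listed above
const CASES = [
  ["acme*", "acme", false],
  ["acme*", "acme.prod", true],
  ["acme.prod*", "acme.prod.search", true],
  ["acme.prod*.checkout", "acme.prod.search", false],
  ["acme.prod*.*", "acme.prod.search", false],
  ["acme.prod*.*", "acme.prod-1.search", true],
  ["acme.prod.*.*.*", "acme.prod.search.node1.check_disk", true],
  ["acme.prod.*.*.check_disk", "acme.prod.search.node1.check_disk", true],
  ["acme.prod.*.*.check_loadavg", "acme.prod.search.node1.check_disk", false],
  ["*.prod.*.*.check_loadavg", "acme.prod.search.node1.check_loadavg", true],
  ["acme.prod.*", "acme.prod.search.node1.check_disk", true],
  ["acme.prod.search.node*.check_disk", "acme.prod.search.node1.check_disk", true],
  ["acme.prod.search.node*.*", "acme.prod.search.node1.check_disk", true],
  ["acme.prod.search.dc1-node*.*", "acme.prod.search.node1.check_disk", false],
];

for (const [pattern, eventType, expected] of CASES) {
  const got = patternToRegExp(pattern).test(eventType);
  console.log(`${got === expected ? "ok  " : "FAIL"} ${pattern} ~ ${eventType} -> ${got}`);
}
```

Two rows pin down the wildcard's meaning under this reading: `acme*` not matching `acme` means `*` needs at least one character, and `acme.prod.*` matching a five-segment type means it can span dots.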
The engine collects similar events in a bucket over a time window using a regex matcher and then executes a JS (ES6) script. The script contains the correlation logic, which can go on to create incidents or alerts. The JS environment is limited: it is provided by embedding the k6.io JavaScript interpreter (https://docs.k6.io/docs/modules), an excellent library built on top of https://github.com/dop251/goja.
For the above example rule, incoming events whose `eventType` matches one of the `eventTypePatterns` are put in the same bucket:

```json
{
  "rule": {},
  "events": [
    {
      "cloudEventsVersion": "0.1",
      "eventType": "acme.prod.site247.search_down",
      "source": "site247",
      "eventID": "C234-1234-1234",
      "eventTime": "2018-04-05T17:31:00Z",
      "extensions": { "comExampleExtension": "value" },
      "contentType": "application/json",
      "data": { "appinfoA": "abc", "appinfoB": 123, "appinfoC": true }
    }
  ]
}
```

After the `dwell` period, the configured `myscript.js` is invoked and the bucket is passed to it:
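```javascript
import http from "k6/http";

// `result` is a special variable: if the script sets it, the engine picks it
// up and posts it to the rule's hookEndpoint; if it stays null, the bucket
// itself is posted instead.
let result = null;

// The default export is the entry function; the engine calls it with the bucket.
export default function (bucket) {
  bucket.events.forEach((event) => {
    // create an incident or alert, or do nothing
    http.post("http://acme.com/incident", JSON.stringify(event), {
      headers: { "Content-Type": "application/json" },
    });
  });
}
```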
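The correlation logic itself is up to the script author. As a hypothetical example, the plain function below (runnable outside Cortex, with no k6 imports) summarizes a bucket by event source and returns `null` for a single-event bucket, so that, per the Post step, the bucket itself would be posted instead. The name `correlate` and the shape of the summary are illustrative only.

```javascript
// Hypothetical correlation logic, written as a plain function so it can be
// tried outside Cortex. It groups a bucket's event types by source.
function correlate(bucket) {
  const events = bucket.events;
  // A single event is not treated as a correlation: returning null means
  // the engine would post the bucket itself.
  if (events.length < 2) return null;

  const bySource = {};
  for (const event of events) {
    if (!bySource[event.source]) bySource[event.source] = [];
    bySource[event.source].push(event.eventType);
  }
  return {
    summary: `${events.length} related events from ${Object.keys(bySource).length} source(s)`,
    // ISO-8601 UTC timestamps in the same format sort lexicographically
    firstEventTime: events.map((e) => e.eventTime).sort()[0],
    bySource,
  };
}
```

Inside a rule script, the default export could then assign `result = correlate(bucket);`, assuming the embedded interpreter accepts the syntax used here.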
If `result` is set, it is posted to the hookEndpoint. The bucket itself is then reset and evicted from the collect loop, and the execution record is stored so that it can be fetched later.
A new bucket is created when an event matches the rule again.
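Results, or the bucket itself when `result` is null, are posted to the rule's `hookEndpoint`, and the example rule also sets `hookRetry`. A delivery loop with retries might look like the sketch below. It assumes `hookRetry` counts retries after the first attempt, so `"hookRetry": 2` allows up to three POSTs; this README does not pin that down. `postWithRetry` and the injected `send` function are hypothetical: `send` stands for whatever performs the POST with an `application/json` body and returns `true` on success.

```javascript
// Hypothetical sketch of delivering a payload to hookEndpoint with retries.
// Assumption: hookRetry is the number of retries after the first attempt.
// `send(url, contentType, body)` is injected and returns true on success.
function postWithRetry(send, hookEndpoint, payload, hookRetry) {
  const body = JSON.stringify(payload);
  const maxAttempts = 1 + hookRetry;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (send(hookEndpoint, "application/json", body)) {
      return { ok: true, attempts: attempt };
    }
  }
  return { ok: false, attempts: maxAttempts };
}
```

A real client would be asynchronous and would usually back off between attempts; the synchronous, injected `send` keeps the retry counting easy to follow and to test.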
Rule results can be posted to a configured HTTP endpoint. The remote endpoint should accept a `POST` request with an `application/json` body. The endpoint and retry setting are configured on the rule:

```json
{
  "hookEndpoint": "http://localhost:3000/testrule",
  "hookRetry": 2
}
```

- git clone https://github.com/myntra/cortex
- ./release.sh
Starts a single node server.
TODO