
Originally published at duffn.github.io
Analyzing Cloudflare Logs with AWS Athena
As with many Cloudflare features, you can enable their Logpush service with the click of a button. Logpush sends your HTTP request logs to your cloud storage provider every 5 minutes.
If you are using AWS S3 for your storage, you can then use Athena to analyze your logs.
Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.
So with a little setup and some simple SQL, you can analyze your Cloudflare logs.
Table DDL
Before you can begin querying, however, you need to create an external table in Athena that matches the format of your Cloudflare logs, which are JSON with a newline separating each record.
Luckily, this is pretty easy to set up in Athena. Here is the DDL for all of the fields currently included in Cloudflare Logpush.
Note: You can customize the fields that Logpush includes. If you have, your list of fields may not match the one below exactly.
CREATE EXTERNAL TABLE cloudflare_logs (
  CacheCacheStatus string,
  CacheResponseBytes int,
  CacheResponseStatus int,
  CacheTieredFill boolean,
  ClientASN int,
  ClientCountry string,
  ClientDeviceType string,
  ClientIP string,
  ClientIPClass string,
  ClientRequestBytes int,
  ClientRequestHost string,
  ClientRequestMethod string,
  ClientRequestPath string,
  ClientRequestProtocol string,
  ClientRequestReferer string,
  ClientRequestURI string,
  ClientRequestUserAgent string,
  ClientSSLCipher string,
  ClientSSLProtocol string,
  ClientSrcPort int,
  EdgeColoID int,
  EdgeEndTimestamp string,
  EdgePathingOp string,
  EdgePathingSrc string,
  EdgePathingStatus string,
  EdgeRateLimitAction string,
  EdgeRateLimitID int,
  EdgeRequestHost string,
  EdgeResponseBytes int,
  EdgeResponseCompressionRatio double,
  EdgeResponseContentType string,
  EdgeResponseStatus int,
  EdgeServerIP string,
  EdgeStartTimestamp string,
  FirewallMatchesActions ARRAY<string>,
  FirewallMatchesSources ARRAY<string>,
  FirewallMatchesRuleIDs ARRAY<string>,
  OriginIP string,
  OriginResponseBytes int,
  OriginResponseHTTPExpires string,
  OriginResponseHTTPLastModified string,
  OriginResponseStatus int,
  OriginResponseTime bigint,
  OriginSSLProtocol string,
  ParentRayID string,
  RayID string,
  SecurityLevel string,
  WAFAction string,
  WAFFlags string,
  WAFMatchedVar string,
  WAFProfile string,
  WAFRuleID string,
  WAFRuleMessage string,
  WorkerCPUTime int,
  WorkerStatus string,
  WorkerSubrequest boolean,
  WorkerSubrequestCount int,
  ZoneID bigint
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://my-cloudflare-logs/'
Of course, change s3://my-cloudflare-logs/ to the bucket that you used when setting up Logpush.
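To make the record format concrete, here is a minimal Python sketch of what reading one log object amounts to: split the body on newlines and parse each line as its own JSON record. The sample records are invented for illustration (only the field names come from the DDL above), and the helper names are hypothetical. Keys are lowercased because Athena exposes column names in lowercase, which is how the queries later in this post refer to them.

```python
import json

# Hypothetical sample: two records, one JSON object per line.
# Field names come from the table DDL; the values are made up.
SAMPLE = (
    '{"ClientRequestProtocol": "HTTP/2", "ClientRequestBytes": 512, "EdgeColoID": 14}\n'
    '{"ClientRequestProtocol": "HTTP/1.1", "ClientRequestBytes": 2048, "EdgeColoID": 30}\n'
)


def parse_records(body):
    """Parse newline-delimited JSON into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def lowercase_keys(record):
    """Lowercase the keys, mirroring Athena's lowercase column names."""
    return {key.lower(): value for key, value in record.items()}


records = [lowercase_keys(r) for r in parse_records(SAMPLE)]
for record in records:
    print(record["clientrequestprotocol"], record["clientrequestbytes"])
```

This is only a local illustration of the format. In Athena, the SerDe named in the DDL does the parsing for you.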
Querying
And now that we have a table created in Athena, we can analyze our logs in a myriad of ways.
How about checking to see how many requests you've received by the request protocol?
SELECT count(*) AS requests, c.clientrequestprotocol
FROM "cloudflare_logs"."cloudflare_logs" c
GROUP BY c.clientrequestprotocol
ORDER BY count(*) DESC
LIMIT 10;
    requests   clientrequestprotocol
1   15063737   HTTP/2
2   6842951    HTTP/1.1
3   4342       HTTP/1.0
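If it helps to see what that GROUP BY is computing, the same aggregation can be sketched locally in Python over a handful of parsed records. The records below are invented sample data, not real results, and requests_by_protocol is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical parsed log records; only the field name comes from the table.
records = [
    {"clientrequestprotocol": "HTTP/2"},
    {"clientrequestprotocol": "HTTP/1.1"},
    {"clientrequestprotocol": "HTTP/2"},
    {"clientrequestprotocol": "HTTP/1.0"},
    {"clientrequestprotocol": "HTTP/2"},
]


def requests_by_protocol(rows, limit=10):
    """Count rows per protocol, most frequent first.

    Roughly GROUP BY protocol, ORDER BY count(*) DESC, LIMIT n.
    """
    counts = Counter(row["clientrequestprotocol"] for row in rows)
    return counts.most_common(limit)


for protocol, requests in requests_by_protocol(records):
    print(requests, protocol)
```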
Or maybe for reasons unknown, you want to see the average client request size in bytes for today, grouped by the Cloudflare edge colo ID.
SELECT avg(clientrequestbytes), edgecoloid
FROM "cloudflare_logs"."cloudflare_logs"
-- Assuming you are using the default date format with Logpush
WHERE date_trunc('day', from_iso8601_timestamp(edgestarttimestamp)) = current_date
GROUP BY edgecoloid
ORDER BY avg(clientrequestbytes) ASC;
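The WHERE clause only works if edgestarttimestamp holds an ISO 8601 string, for example 2019-04-10T15:47:23Z (an illustrative value, not taken from real logs). As a rough local analogue of from_iso8601_timestamp plus date_trunc('day', ...), this Python sketch parses such strings, keeps the records from a given day, and averages request bytes per colo. All sample values and helper names are made up for the example.

```python
from collections import defaultdict
from datetime import date, datetime, timezone

# Hypothetical records; timestamps are ISO 8601 strings in UTC.
records = [
    {"edgecoloid": 14, "clientrequestbytes": 500, "edgestarttimestamp": "2019-04-10T15:47:23Z"},
    {"edgecoloid": 14, "clientrequestbytes": 1500, "edgestarttimestamp": "2019-04-10T23:59:59Z"},
    {"edgecoloid": 30, "clientrequestbytes": 800, "edgestarttimestamp": "2019-04-10T00:00:00Z"},
    {"edgecoloid": 30, "clientrequestbytes": 9999, "edgestarttimestamp": "2019-04-09T12:00:00Z"},
]


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp; 'Z' is rewritten so older Pythons accept it."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def avg_bytes_by_colo(rows, day):
    """Average clientrequestbytes per edgecoloid for rows whose UTC date is `day`."""
    sizes = defaultdict(list)
    for row in rows:
        started = parse_timestamp(row["edgestarttimestamp"]).astimezone(timezone.utc)
        if started.date() == day:
            sizes[row["edgecoloid"]].append(row["clientrequestbytes"])
    averages = {colo: sum(values) / len(values) for colo, values in sizes.items()}
    # Smallest average first, matching ORDER BY avg(...) ASC.
    return sorted(averages.items(), key=lambda item: item[1])


print(avg_bytes_by_colo(records, date(2019, 4, 10)))
```

Passing the day explicitly keeps the example deterministic, whereas the query uses current_date instead.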
As you can imagine, the ways that you can slice and dice your Cloudflare HTTP logs are nearly limitless. Enjoy diving deep on your Cloudflare logs!