- Notifications
You must be signed in to change notification settings - Fork5.5k
Add support for JSON formatted logs#5799
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
This adds a new extra package dependency on `json-logging`and an environment variable, which when set to "true" regardlessof case, will try to use the json-logging non-web app log formatter.If the `json-logging` package is not installed but the environmentvariable is True, something like this will be logged but it will notcrash the application:```$ ENABLE_JSON_LOGGING=true jupyter notebookUnable to use json logging due to missing packages. Run "pip install notebook[json-logging]" to fix.Traceback (most recent call last): File "/home/osboxes/jupyter/notebook/notebook/notebookapp.py", line 144, in <module> import json_loggingModuleNotFoundError: No module named 'json_logging'```Initially I tried to add a new Bool config option to toggle this butthe problem is (from my limited understanding of traitlets and tornado)is that `_log_formatter_cls` needs to be set early and trying to se thelog formatter later in `init_logging` is too late - or at least I couldn'tfigure out a way to reset the log formatter dynamically (I tried calling`_log_format_changed` [1] but that didn't work).With this working you get logs like this:```{"written_at": "2020-10-07T21:10:51.606Z", "written_ts": 1602105051606265000, "msg": "404 GET /nbextensions/widgets/notebook/js/extension.js (127.0.0.1) 9.26ms referer=http://localhost:8888/notebooks/Untitled.ipynb", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "WARNING", "module": "log", "line_no": 49}{"written_at": "2020-10-07T21:10:54.443Z", "written_ts": 1602105054443309000, "msg": "Starting buffering for f260ddd7-938c-42d0-ac3b-455bea76694f:49f30b53fc4b4ec6a8f2fb748a171613", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "INFO", "module": "kernelmanager", "line_no": 222}{"written_at": "2020-10-07T21:10:54.446Z", "written_ts": 1602105054446264000, "msg": "Kernel shutdown: f260ddd7-938c-42d0-ac3b-455bea76694f", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "INFO", "module": "multikernelmanager", "line_no": 201}```An obvious improvement here would to be able to split the `msg` fields apartso that we can log things like response status code, request method, requestURL, response_time_ms etc. That should be possible with the `JSONLogWebFormatter`from `json-logging` but when I tried using that I was getting errors fromthe library about a correlation id (which is based on a request header wedon't use). The `json-logging` library supports several web frameworks likeFlask but unfortunately does not have built in support for Tornado, but it doessupport custom formatters so that might be a follow up option to improve on this.[1]https://github.com/ipython/traitlets/blob/4.3.3/traitlets/config/application.py#L195Closesjupyter#5798mriedem commentedOct 7, 2020
mriedem commentedOct 7, 2020
This is the error I get when trying to use And it looks like that's because I'm not going through one of the https://github.com/bobbui/json-logging-python/blob/1.2.8/json_logging/__init__.py#L132 |
This gets us closer to logging parts of the requestas separate json fields but not all the way there.For example:```json{"written_at": "2020-10-07T21:52:16.975Z", "written_ts": 1602107536975497000, "msg": "{'status': 404, 'method': 'GET', 'ip': '127.0.0.1', 'uri': '/nbextensions/widgets/notebook/js/extension.js', 'request_time': 1.9807815551757812, 'referer': 'http://localhost:8888/notebooks/test.ipynb'}", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "WARNING", "module": "log", "line_no": 54}```mriedem commentedOct 7, 2020
Getting a little further by changing the {"written_at":"2020-10-07T21:52:16.975Z","written_ts":1602107536975497000,"msg":"{'status': 404, 'method': 'GET', 'ip': '127.0.0.1', 'uri': '/nbextensions/widgets/notebook/js/extension.js', 'request_time': 1.9807815551757812, 'referer': 'http://localhost:8888/notebooks/test.ipynb'}","type":"log","logger":"NotebookApp","thread":"MainThread","level":"WARNING","module":"log","line_no":54}But the |
notebook/log.py Outdated
| log_method(json.dumps(dict(request.headers),indent=2)) | ||
| log_method(msg.format(**ns)) | ||
| if_enable_json_logs: | ||
| # FIXME: this still logs the msg as a serialized json string, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I wonder if we could use a partial to passlog in here:
notebook/notebook/notebookapp.py
Line 269 inb617d91
| log_function=log_request, |
Because that'd be theNotebookApp logger which is using our json log formatter.
notebook/log.py Outdated
| # log all headers if it caused an error | ||
| log_method(json.dumps(dict(request.headers),indent=2)) | ||
| log_method(msg.format(**ns)) | ||
| if_enable_json_logs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Just because this is true doesn't mean that json-logging is installed. If it's not, the request log line ends up looking like this:
[W 17:04:57.097 NotebookApp] {'status': 404, 'method': 'GET', 'ip': '127.0.0.1', 'uri': '/nbextensions/widgets/notebook/js/extension.js', 'request_time': 6.743431091308594, 'referer': 'http://localhost:8888/notebooks/Untitled.ipynb?kernel_name=python3'}Also note that with the json logging we're not formatting the request time to 2 decimal places:
'request_time': 6.743431091308594
kevin-bates commentedOct 8, 2020
Thanks Matt. This is an interesting conundrum on a couple counts.
I think these can be resolved in a more application-friendly manner by first introducing a boolean configurable Traitlets also provide the ability tovalidate a config option. This is where we could see if the We could let the default value for the I apologize for these details, but because there were so many unknowns I had, it was quicker just to try things out. I have some working code - but I'd rather help you navigate this if you don't mind. Otherwise, just let me know. |
mriedem commentedOct 8, 2020
@kevin-bates thanks for those detailed pointers.
Yup, no problem and I definitely appreciate the patience with helping me learn here (I don't really have experience with tornado or traitlets, mostly copy/tweak/test when I need to make changes). |
mriedem commentedOct 9, 2020
@kevin-bates I worked on your suggestions a bit this morning and captured the diffhere since it's not working. A few notes:
I think that's useful for someone looking at logging options to have those grouped together for context.
Thanks again for the help. |
kevin-bates commentedOct 9, 2020
Thanks Matt. The "tickle" should actually change the format. In this case, I removed the color-coding attributes. Here's the code I had put together the other day. Since you've changed the traitlet's name, that change would need to be reflected, as well as perhaps logging that the import failed during the validation. Lastly, the convention for envs is that they be prefixed with I found no changes necessary other than in notebookapp.py, but would prefer that you confirm the same. json_logging=Bool(config=True)@default('json_logging')def_default_json_logging(self):returnos.getenv('JUPYTER_ENABLE_JSON_LOGGING','false').lower()=='true'@validate('json_logging')def_validate_json_logging(self,proposal):value=proposal['value']ifvalueisTrue:try:importjson_loggingself._log_formatter_cls=json_logging.JSONLogFormatterself.log_format=u"[%(levelname)1.1s %(asctime)s.%(msecs).03d %(name)s] %(message)s"exceptImportError:value=Falsereturnvalue My concern with the change to |
mriedem commentedOct 9, 2020
Got it. I'm not totally sure about that solution since if I set
Done.
I see a bug in my code, I wasn't using
I renamed the config option to simply I'll tinker with this locally and see what I come up with. I'm also going to see what happens if we call https://github.com/bobbui/json-logging-python/blob/1.2.8/json_logging/__init__.py#L153 |
This adds the `NotebookApp.log_json` CLI option which defaults toFalse. The default value comes from the `JUPYTER_ENABLE_JSON_LOGGING`environment variable. A validation hook is added such that if jsonlogging is enabled but the `json-logging` package is not installed,an error is logged and the effective `log_json` value is False.If enabled and the `json-logging` package is installed, the `init_logging`method will initialize the non-web [1] json logging formatter.[1]https://github.com/bobbui/json-logging-python#21-non-web-application-log
mriedem commentedOct 9, 2020
This works and avoids the need to change Still need to figure out how to get and if you do it logs the request parameters as a serialized json string: |
In order to use the same logger from the notebook app weneed to pass it through to the log_request function. Todo that we need to use a partial function so the `log` kwargis available. If not provided the default logger is the sameas before, the tornado web access logger.This is necessary for the `log_json` settings to be used forboth the notebook app logs and the web app request logging.There is still a todo needed here so that the individualrequest fields get formatted separately rather than dumpedinto the `msg` field.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
mriedem commentedOct 9, 2020 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I'd like to add tests for this but I'm not sure if there is a great way to test it. Maybe something like start the notebook app server with the env var set and verify the logged message can be serialized and then deserialized JSON? Though that would also mean having |
This updates the log_request method to get the requestproperties logged separately if `log_json=True`. Thisends up looking like:```{"written_at": "2020-10-09T20:59:52.044Z", "written_ts": 1602277192044732000, "msg": "", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "WARNING", "module": "log", "line_no": 50, "status": 404, "method": "GET", "ip": "127.0.0.1", "uri": "/api/kernels/a6dd447b-282a-4e1c-ab65-eb340ad12965/channels?session_id=472907c27ed94e2cbfbae86501f7d159", "request_time": 8.63, "referer": "None"}```Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
kevin-bates left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Hi Matt. This is coming along, but I think still too spread out. I've added some comments that should help isolate some of these changes. Thanks.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
notebook/notebookapp.py Outdated
| root_dir='~'+root_dir[len(home):] | ||
| # Use the NotebookApp logger and its formatting for tornado request logging. | ||
| log_function=functools.partial(log_request,log=log) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I'm not finding this to be necessary. If I restore to the original, I get the same output.
This PR:{"written_at": "2020-10-09T21:11:30.439Z", "written_ts": 1602277890439353000, "msg": "200 GET /api/contents?type=directory&_=1602277851454 (127.0.0.1) 22.49ms", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "DEBUG", "module": "log", "line_no": 49}with changes reverted:{"written_at": "2020-10-09T21:19:03.578Z", "written_ts": 1602278343578833000, "msg": "200 GET /api/contents?type=directory&_=1602278343433 (127.0.0.1) 12.31ms", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "DEBUG", "module": "log", "line_no": 49}Uh oh!
There was an error while loading.Please reload this page.
mriedem left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Changes coming soon.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Just use the app logger since that has the default LogFormatterlogging until we change it to JSON.
kevin-bates left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
hi Matt - this is looking great. Just had the one comment regarding the debug statement that doesn't seem to get displayed.
Uh oh!
There was an error while loading.Please reload this page.
Make sure the default value is handled properly basedon environment variables. Also checks the validationcode based on whether or not json_logging is imported.
Basic test just to make sure log_request handles log_json=Trueproperly.
The intent of the test is to make sure that log_json defaultsto True when JUPYTER_ENABLE_JSON_LOGGING is set to "true" sothis makes the test more explicit and to pass when json-loggingis not installed so that does not interfere due to validation.
kevin-bates commentedOct 13, 2020
Hi Matt - would you mind merging with the latest master and pushing that commit? Your changes with legitimate test failures coincided with a general build issue and I'd like to have this PR be 'green' prior to its merge into master. Sorry for the inconvenience. |
kevin-bates left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Looks good - thank you Matt.
mriedem commentedOct 14, 2020
Thanks for all of your help Kevin. |
mriedem commentedOct 14, 2020
@kevin-bates quick question, do you have any idea of a timeline for the next planned release? Looking at recent releases it looks like you've been doing monthly releases which is cool. My team would like to avoid pulling this change in via a git hash so if there is a release planned on the horizon we'll wait, just wondering. |
kevin-bates commentedOct 14, 2020
Would getting a release out in the next couple of weeks be sufficient - say last week of October/first week of November? Although convenient, I'd rather not set a fixed cadence unless there are other maintainers that can assist. |
mriedem commentedOct 14, 2020
Yup, that would be great. |
Uh oh!
There was an error while loading.Please reload this page.
This adds a new extra package dependency on
json-loggingand an environment variable, which when set to "true" regardless
of case, will try to use the json-logging non-web app log formatter.
A
NotebookApp.log_jsonconfig option is also added which getsits default value from the
JUPYTER_ENABLE_JSON_LOGGINGenvironment variable.
If the
json-loggingpackage is not installed butlog_jsonis True,something like this will be logged but it will not crash the application:
With this working you get logs like this:
Web app requests get logged with separate properties like this:
{"written_at":"2020-10-09T20:59:52.044Z","written_ts":1602277192044732000,"msg":"","type":"log","logger":"NotebookApp","thread":"MainThread","level":"WARNING","module":"log","line_no":50,"status":404,"method":"GET","ip":"127.0.0.1","uri":"/api/kernels/a6dd447b-282a-4e1c-ab65-eb340ad12965/channels?session_id=472907c27ed94e2cbfbae86501f7d159","request_time":8.63,"referer":"None"}The nice thing about that is you can filter request logs on certain error
status codes or request times that take longer than some threshold to
setup alerts in your logging infrastructure (ELK, LogDNA, etc).
Closes#5798