Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Add support for JSON formatted logs#5799

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
kevin-bates merged 13 commits intojupyter:masterfrommriedem:5798-json-logging
Oct 14, 2020

Conversation

@mriedem
Copy link
Contributor

@mriedemmriedem commentedOct 7, 2020
edited
Loading

This adds a new extra package dependency onjson-logging
and an environment variable, which when set to "true" regardless
of case, will try to use the json-logging non-web app log formatter.
ANotebookApp.log_json config option is also added which gets
its default value from theJUPYTER_ENABLE_JSON_LOGGING
environment variable.

If thejson-logging package is not installed butlog_json is True,
something like this will be logged but it will not crash the application:

$ JUPYTER_ENABLE_JSON_LOGGING=true jupyter notebookUnable to use json logging due to missing packages. Run "pip install notebook[json-logging]" to fix.Traceback (most recent call last):  File "/home/osboxes/jupyter/notebook/notebook/notebookapp.py", line 144, in <module>    import json_loggingModuleNotFoundError: No module named 'json_logging'

With this working you get logs like this:

{"written_at": "2020-10-07T21:10:51.606Z", "written_ts": 1602105051606265000, "msg": "404 GET /nbextensions/widgets/notebook/js/extension.js (127.0.0.1) 9.26ms referer=http://localhost:8888/notebooks/Untitled.ipynb", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "WARNING", "module": "log", "line_no": 49}{"written_at": "2020-10-07T21:10:54.443Z", "written_ts": 1602105054443309000, "msg": "Starting buffering for f260ddd7-938c-42d0-ac3b-455bea76694f:49f30b53fc4b4ec6a8f2fb748a171613", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "INFO", "module": "kernelmanager", "line_no": 222}{"written_at": "2020-10-07T21:10:54.446Z", "written_ts": 1602105054446264000, "msg": "Kernel shutdown: f260ddd7-938c-42d0-ac3b-455bea76694f", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "INFO", "module": "multikernelmanager", "line_no": 201}

Web app requests get logged with separate properties like this:

{"written_at":"2020-10-09T20:59:52.044Z","written_ts":1602277192044732000,"msg":"","type":"log","logger":"NotebookApp","thread":"MainThread","level":"WARNING","module":"log","line_no":50,"status":404,"method":"GET","ip":"127.0.0.1","uri":"/api/kernels/a6dd447b-282a-4e1c-ab65-eb340ad12965/channels?session_id=472907c27ed94e2cbfbae86501f7d159","request_time":8.63,"referer":"None"}

The nice thing about that is you can filter request logs on certain error
status codes or request times that take longer than some threshold to
setup alerts in your logging infrastructure (ELK, LogDNA, etc).

Closes#5798

This adds a new extra package dependency on `json-logging`and an environment variable, which when set to "true" regardlessof case, will try to use the json-logging non-web app log formatter.If the `json-logging` package is not installed but the environmentvariable is True, something like this will be logged but it will notcrash the application:```$ ENABLE_JSON_LOGGING=true jupyter notebookUnable to use json logging due to missing packages. Run "pip install notebook[json-logging]" to fix.Traceback (most recent call last):  File "/home/osboxes/jupyter/notebook/notebook/notebookapp.py", line 144, in <module>    import json_loggingModuleNotFoundError: No module named 'json_logging'```Initially I tried to add a new Bool config option to toggle this butthe problem is (from my limited understanding of traitlets and tornado)is that `_log_formatter_cls` needs to be set early and trying to se thelog formatter later in `init_logging` is too late - or at least I couldn'tfigure out a way to reset the log formatter dynamically (I tried calling`_log_format_changed` [1] but that didn't work).With this working you get logs like this:```{"written_at": "2020-10-07T21:10:51.606Z", "written_ts": 1602105051606265000, "msg": "404 GET /nbextensions/widgets/notebook/js/extension.js (127.0.0.1) 9.26ms referer=http://localhost:8888/notebooks/Untitled.ipynb", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "WARNING", "module": "log", "line_no": 49}{"written_at": "2020-10-07T21:10:54.443Z", "written_ts": 1602105054443309000, "msg": "Starting buffering for f260ddd7-938c-42d0-ac3b-455bea76694f:49f30b53fc4b4ec6a8f2fb748a171613", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "INFO", "module": "kernelmanager", "line_no": 222}{"written_at": "2020-10-07T21:10:54.446Z", "written_ts": 1602105054446264000, "msg": "Kernel shutdown: f260ddd7-938c-42d0-ac3b-455bea76694f", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "INFO", "module": "multikernelmanager", "line_no": 201}```An obvious improvement here would to be able to split the `msg` fields apartso that we can log things like response status code, request method, requestURL, response_time_ms etc. That should be possible with the `JSONLogWebFormatter`from `json-logging` but when I tried using that I was getting errors fromthe library about a correlation id (which is based on a request header wedon't use). The `json-logging` library supports several web frameworks likeFlask but unfortunately does not have built in support for Tornado, but it doessupport custom formatters so that might be a follow up option to improve on this.[1]https://github.com/ipython/traitlets/blob/4.3.3/traitlets/config/application.py#L195Closesjupyter#5798
@mriedem
Copy link
ContributorAuthor

@sdague@rmoe Here is an idea for a start on JSON log formatting though it might not be super useful if we can't split apart the request-specific parts of themsg field.

@mriedem
Copy link
ContributorAuthor

This is the error I get when trying to useJSONLogWebFormatter:

--- Logging error ---Traceback (most recent call last):  File "/usr/lib/python3.6/logging/__init__.py", line 994, in emit    msg = self.format(record)  File "/usr/lib/python3.6/logging/__init__.py", line 840, in format    return fmt.format(record)  File "/home/osboxes/jupyter/notebook/.tox/lib/python3.6/site-packages/json_logging/__init__.py", line 243, in format    log_object = self._format_log_object(record, request_util=_request_util)  File "/home/osboxes/jupyter/notebook/.tox/lib/python3.6/site-packages/json_logging/__init__.py", line 341, in _format_log_object    "correlation_id": request_util.get_correlation_id(within_formatter=True),AttributeError: 'NoneType' object has no attribute 'get_correlation_id'

And it looks like that's because I'm not going through one of theinit_* framework methods:

https://github.com/bobbui/json-logging-python/blob/1.2.8/json_logging/__init__.py#L132

This gets us closer to logging parts of the requestas separate json fields but not all the way there.For example:```json{"written_at": "2020-10-07T21:52:16.975Z", "written_ts": 1602107536975497000, "msg": "{'status': 404, 'method': 'GET', 'ip': '127.0.0.1', 'uri': '/nbextensions/widgets/notebook/js/extension.js', 'request_time': 1.9807815551757812, 'referer': 'http://localhost:8888/notebooks/test.ipynb'}", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "WARNING", "module": "log", "line_no": 54}```
@mriedem
Copy link
ContributorAuthor

Getting a little further by changing thelog_request method so I see something like this now:

{"written_at":"2020-10-07T21:52:16.975Z","written_ts":1602107536975497000,"msg":"{'status': 404, 'method': 'GET', 'ip': '127.0.0.1', 'uri': '/nbextensions/widgets/notebook/js/extension.js', 'request_time': 1.9807815551757812, 'referer': 'http://localhost:8888/notebooks/test.ipynb'}","type":"log","logger":"NotebookApp","thread":"MainThread","level":"WARNING","module":"log","line_no":54}

But themsg is a serialized JSON string now.

log_method(json.dumps(dict(request.headers),indent=2))
log_method(msg.format(**ns))
if_enable_json_logs:
# FIXME: this still logs the msg as a serialized json string,
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I wonder if we could use a partial to passlog in here:

log_function=log_request,

Because that'd be theNotebookApp logger which is using our json log formatter.

# log all headers if it caused an error
log_method(json.dumps(dict(request.headers),indent=2))
log_method(msg.format(**ns))
if_enable_json_logs:
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Just because this is true doesn't mean that json-logging is installed. If it's not, the request log line ends up looking like this:

[W 17:04:57.097 NotebookApp] {'status': 404, 'method': 'GET', 'ip': '127.0.0.1', 'uri': '/nbextensions/widgets/notebook/js/extension.js', 'request_time': 6.743431091308594, 'referer': 'http://localhost:8888/notebooks/Untitled.ipynb?kernel_name=python3'}

Also note that with the json logging we're not formatting the request time to 2 decimal places:

'request_time': 6.743431091308594

@mriedemmriedem marked this pull request as draftOctober 7, 2020 22:19
@kevin-bates
Copy link
Member

Thanks Matt. This is an interesting conundrum on a couple counts.

  1. The logging formatter is something used very early - introducing potential chicken/egg situations.
  2. The tornado logging is a different facility, yet ideally, we want to leverage the same configurable setting to determine if JSON formatting should be used.

I think these can be resolved in a more application-friendly manner by first introducing a boolean configurablejson_logging that can be configured via a configuration file usingc.NotebookApp.json_logging = True. We'd also want to tie this to a CLI flag (e.g.,--json-logging such that, when present, it enables JSON formatting). You should be able to follow the example ofallow_root for these two details.

Traitlets also provide the ability tovalidate a config option. This is where we could see if thejson_logging package can be imported and, if so, "tickle" the current log_format value after setting the log_formatter_cls. The reason for this "tickle" is to trigger theobservation that takes place in the traitletsApplication class - which will perform a reset using the formatter_cls value - as you had mentioned. Modeling the validation after the 'ip' traitlet can get you started here.

We could let the default value for thejson-logging flag still derive from the env - which is helpful in containerized envs and this could be done via a _default_json_logging() entry that returns configurable's default. Again, 'ip' uses a defaulting method that you can model this after.

I apologize for these details, but because there were so many unknowns I had, it was quicker just to try things out. I have some working code - but I'd rather help you navigate this if you don't mind. Otherwise, just let me know.

@mriedem
Copy link
ContributorAuthor

@kevin-bates thanks for those detailed pointers.

I'd rather help you navigate this if you don't mind

Yup, no problem and I definitely appreciate the patience with helping me learn here (I don't really have experience with tornado or traitlets, mostly copy/tweak/test when I need to make changes).

kevin-bates reacted with thumbs up emoji

@mriedem
Copy link
ContributorAuthor

@kevin-bates I worked on your suggestions a bit this morning and captured the diffhere since it's not working. A few notes:

  • I added the config option but with namelog_format_json so that it would show up alongsidelog_format andlog_datefmt etc in the help output:
--NotebookApp.log_datefmt=<Unicode>    Default: '%Y-%m-%d %H:%M:%S'    The date format used by logging formatters for %(asctime)s--NotebookApp.log_format=<Unicode>    Default: '[%(name)s]%(highlevel)s %(message)s'    The Logging format template--NotebookApp.log_format_json=<Bool>    Default: False    Set to True to enable JSON formatted logs. Run "pip install notebook[json-    logging]" to install the required dependent packages. Can also be set using    the environment variable ENABLE_JSON_LOGGING=true.--NotebookApp.log_level=<Enum>    Default: 30    Choices: (0, 10, 20, 30, 40, 50, 'DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL')    Set the log level by value or name.

I think that's useful for someone looking at logging options to have those grouped together for context.

  • I'm assuming I didn't implement_validate_log_format_json correctly because it doesn't seem to be getting hit. I've tried running with bothENABLE_JSON_LOGGING=true jupyter notebook andjupyter notebook --NotebookApp.log_format_json=True. I never see the logging statements I put in_validate_log_format_json though, nor do I see the ImportError traceback if I uninstalljson-logging and try to start the server withlog_format_json=True. Is there something I'm missing there about when that validation function is supposed to be called?

  • I'm also not sure that I'm "tickling" thelog_format option correctly by doingself.log_format = self.log_format.

Thanks again for the help.

@kevin-bates
Copy link
Member

Thanks Matt. The "tickle" should actually change the format. In this case, I removed the color-coding attributes. Here's the code I had put together the other day. Since you've changed the traitlet's name, that change would need to be reflected, as well as perhaps logging that the import failed during the validation.

Lastly, the convention for envs is that they be prefixed withJUPYTER_.

I found no changes necessary other than in notebookapp.py, but would prefer that you confirm the same.

json_logging=Bool(config=True)@default('json_logging')def_default_json_logging(self):returnos.getenv('JUPYTER_ENABLE_JSON_LOGGING','false').lower()=='true'@validate('json_logging')def_validate_json_logging(self,proposal):value=proposal['value']ifvalueisTrue:try:importjson_loggingself._log_formatter_cls=json_logging.JSONLogFormatterself.log_format=u"[%(levelname)1.1s %(asctime)s.%(msecs).03d %(name)s] %(message)s"exceptImportError:value=Falsereturnvalue

My concern with the change tolog_format_json is that it implies the value should be some kind of format string - similar tolog_format, whereasjson_logging seems more about thekind of logging that will take place. The concern being that the "format" approach could be confusing. Yes, I would agree whatever the name is should be grouped near the others as its all the same "topic".

@mriedem
Copy link
ContributorAuthor

The "tickle" should actually change the format. In this case, I removed the color-coding attributes.

Got it. I'm not totally sure about that solution since if I setlog_format specifically and this changed it, I'd probably wonder what's going on if the formatting doesn't show up as expected. If there is no better solution we could at least document something about that in the config help.

the convention for envs is that they be prefixed with JUPYTER_.

Done.

I found no changes necessary other than in notebookapp.py

I see a bug in my code, I wasn't usingself._log_formatter_cls.

My concern with the change to log_format_json is that it implies the value should be some kind of format string

I renamed the config option to simplylog_json to take the "format" out of it.


I'll tinker with this locally and see what I come up with.

I'm also going to see what happens if we calljson_logging.init_non_web(enable_json=True) frominit_logging sinceinit_non_web does change the formatter forall of the loggers:

https://github.com/bobbui/json-logging-python/blob/1.2.8/json_logging/__init__.py#L153

This adds the `NotebookApp.log_json` CLI option which defaults toFalse. The default value comes from the `JUPYTER_ENABLE_JSON_LOGGING`environment variable. A validation hook is added such that if jsonlogging is enabled but the `json-logging` package is not installed,an error is logged and the effective `log_json` value is False.If enabled and the `json-logging` package is installed, the `init_logging`method will initialize the non-web [1] json logging formatter.[1]https://github.com/bobbui/json-logging-python#21-non-web-application-log
@mriedem
Copy link
ContributorAuthor

I'm also going to see what happens if we call json_logging.init_non_web(enable_json=True) from init_logging since init_non_web does change the formatter for all of the loggers

This works and avoids the need to changelog_format to trigger the_log_format_changed observer. Updated the PR.

Still need to figure out how to getlog.log_request working properly because if you run without the environment variable set it doesn't log the request parameters in json format:

{"written_at": "2020-10-09T19:32:51.825Z", "written_ts": 1602271971825341000, "msg": "404 GET /nbextensions/widgets/notebook/js/extension.js (127.0.0.1) 8.94ms referer=http://localhost:8888/notebooks/Untitled.ipynb", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "WARNING", "module": "log", "line_no": 59}

and if you do it logs the request parameters as a serialized json string:

{"written_at": "2020-10-09T19:34:05.565Z", "written_ts": 1602272045565346000, "msg": "{'status': 404, 'method': 'GET', 'ip': '127.0.0.1', 'uri': '/api/kernels/48ecd2e7-ae9d-4484-8910-70fb46d72491/channels?session_id=5aa3a7dcedf1427e8ff41ae0cb3dae94', 'request_time': 1.8270015716552734, 'referer': 'None'}", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "WARNING", "module": "log", "line_no": 57}

In order to use the same logger from the notebook app weneed to pass it through to the log_request function. Todo that we need to use a partial function so the `log` kwargis available. If not provided the default logger is the sameas before, the tornado web access logger.This is necessary for the `log_json` settings to be used forboth the notebook app logs and the web app request logging.There is still a todo needed here so that the individualrequest fields get formatted separately rather than dumpedinto the `msg` field.
@mriedemmriedem marked this pull request as ready for reviewOctober 9, 2020 20:15
@mriedem
Copy link
ContributorAuthor

mriedem commentedOct 9, 2020
edited
Loading

I'd like to add tests for this but I'm not sure if there is a great way to test it. Maybe something like start the notebook app server with the env var set and verify the logged message can be serialized and then deserialized JSON? Though that would also mean havingjson-logging installed to run tests.

This updates the log_request method to get the requestproperties logged separately if `log_json=True`. Thisends up looking like:```{"written_at": "2020-10-09T20:59:52.044Z", "written_ts": 1602277192044732000, "msg": "", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "WARNING", "module": "log", "line_no": 50, "status": 404, "method": "GET", "ip": "127.0.0.1", "uri": "/api/kernels/a6dd447b-282a-4e1c-ab65-eb340ad12965/channels?session_id=472907c27ed94e2cbfbae86501f7d159", "request_time": 8.63, "referer": "None"}```
Copy link
Member

@kevin-bateskevin-bates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Hi Matt. This is coming along, but I think still too spread out. I've added some comments that should help isolate some of these changes. Thanks.

root_dir='~'+root_dir[len(home):]

# Use the NotebookApp logger and its formatting for tornado request logging.
log_function=functools.partial(log_request,log=log)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'm not finding this to be necessary. If I restore to the original, I get the same output.

This PR:{"written_at": "2020-10-09T21:11:30.439Z", "written_ts": 1602277890439353000, "msg": "200 GET /api/contents?type=directory&_=1602277851454 (127.0.0.1) 22.49ms", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "DEBUG", "module": "log", "line_no": 49}with changes reverted:{"written_at": "2020-10-09T21:19:03.578Z", "written_ts": 1602278343578833000, "msg": "200 GET /api/contents?type=directory&_=1602278343433 (127.0.0.1) 12.31ms", "type": "log", "logger": "NotebookApp", "thread": "MainThread", "level": "DEBUG", "module": "log", "line_no": 49}

Copy link
ContributorAuthor

@mriedemmriedem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Changes coming soon.

Just use the app logger since that has the default LogFormatterlogging until we change it to JSON.
Copy link
Member

@kevin-bateskevin-bates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

hi Matt - this is looking great. Just had the one comment regarding the debug statement that doesn't seem to get displayed.

Make sure the default value is handled properly basedon environment variables. Also checks the validationcode based on whether or not json_logging is imported.
Basic test just to make sure log_request handles log_json=Trueproperly.
The intent of the test is to make sure that log_json defaultsto True when JUPYTER_ENABLE_JSON_LOGGING is set to "true" sothis makes the test more explicit and to pass when json-loggingis not installed so that does not interfere due to validation.
@kevin-bates
Copy link
Member

Hi Matt - would you mind merging with the latest master and pushing that commit? Your changes with legitimate test failures coincided with a general build issue and I'd like to have this PR be 'green' prior to its merge into master. Sorry for the inconvenience.

mriedem reacted with thumbs up emoji

Copy link
Member

@kevin-bateskevin-bates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Looks good - thank you Matt.

@kevin-bateskevin-bates merged commit1550410 intojupyter:masterOct 14, 2020
@mriedemmriedem deleted the 5798-json-logging branchOctober 14, 2020 12:53
@mriedem
Copy link
ContributorAuthor

Looks good - thank you Matt.

Thanks for all of your help Kevin.

kevin-bates reacted with thumbs up emoji

@mriedem
Copy link
ContributorAuthor

@kevin-bates quick question, do you have any idea of a timeline for the next planned release? Looking at recent releases it looks like you've been doing monthly releases which is cool. My team would like to avoid pulling this change in via a git hash so if there is a release planned on the horizon we'll wait, just wondering.

@kevin-bates
Copy link
Member

Would getting a release out in the next couple of weeks be sufficient - say last week of October/first week of November? Although convenient, I'd rather not set a fixed cadence unless there are other maintainers that can assist.

@mriedem
Copy link
ContributorAuthor

Would getting a release out in the next couple of weeks be sufficient - say last week of October/first week of November?

Yup, that would be great.

Sign up for freeto subscribe to this conversation on GitHub. Already have an account?Sign in.

Reviewers

1 more reviewer

@kevin-bateskevin-bateskevin-bates approved these changes

Reviewers whose approvals may not affect merge requirements

Assignees

No one assigned

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

RFE: Add support for JSON logging format

2 participants

@mriedem@kevin-bates

[8]ページ先頭

©2009-2025 Movatter.jp