Commit bb97e80

00399:CVE-2023-24329

pythongh-102153: Start stripping C0 control and space chars in `urlsplit` (pythonGH-102508)

`urllib.parse.urlsplit` has already been respecting the WHATWG spec a bit pythonGH-25595.

This adds more sanitizing to respect the "Remove any leading C0 control or space from input" [rule](https://url.spec.whatwg.org/GH-url-parsing:~:text=Remove%20any%20leading%20and%20trailing%20C0%20control%20or%20space%20from%20input.) in response to [CVE-2023-24329](https://nvd.nist.gov/vuln/detail/CVE-2023-24329).

Backported from Python 3.12

1 parent e7ecd65 commit bb97e80

File tree

4 files changed, +145 -3 lines changed

Doc/library
- urllib.parse.rst
Lib
- test
  - test_urlparse.py
- urllib
  - parse.py
Misc/NEWS.d/next/Security
- 2023-03-07-20-59-17.gh-issue-102153.14CLSZ.rst

`Doc/library/urllib.parse.rst`

Lines changed: 69 additions & 2 deletions

Original file line number	Diff line number	Diff line change
`@@ -126,6 +126,28 @@ or on combining URL components into a URL string.`
`126`	`126`	``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
`127`	`127`	`decomposed before parsing, no error will be raised.`
`128`	`128`
	`129`	`+ As is the case with all named tuples, the subclass has a few additional methods`
	`130`	+ and attributes that are particularly useful. One such method is :meth:`_replace`.
	`131`	+ The :meth:`_replace` method will return a new ParseResult object replacing specified
	`132`	`+ fields with new values.`
	`133`	`+`
	`134`	`+ .. doctest::`
	`135`	`+ :options: +NORMALIZE_WHITESPACE`
	`136`	`+`
	`137`	`+ >>> from urllib.parse import urlparse`
	`138`	`+ >>> u = urlparse('//www.cwi.nl:80/%7Eguido/Python.html')`
	`139`	`+ >>> u`
	`140`	`+ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',`
	`141`	`+params='', query='', fragment='')`
	`142`	`+ >>> u._replace(scheme='http')`
	`143`	`+ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',`
	`144`	`+params='', query='', fragment='')`
	`145`	`+`
	`146`	`+ .. warning::`
	`147`	`+`
	`148`	+ :func:`urlparse` does not perform validation. See :ref:`URL parsing
	`149`	+ security <url-parsing-security>` for details.
	`150`	`+`
`129`	`151`	`.. versionchanged:: 3.2`
`130`	`152`	`Added IPv6 URL parsing capabilities.`
`131`	`153`
`@@ -288,8 +310,14 @@ or on combining URL components into a URL string.`
`288`	`310`	``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
`289`	`311`	`decomposed before parsing, no error will be raised.`
`290`	`312`
`291`		- Following the `WHATWG spec`_ that updates RFC 3986, ASCII newline
`292`		- ``\n``, ``\r`` and tab ``\t`` characters are stripped from the URL.
	`313`	+ Following some of the `WHATWG spec`_ that updates RFC 3986, leading C0
	`314`	+ control and space characters are stripped from the URL. ``\n``,
	`315`	+ ``\r`` and tab ``\t`` characters are removed from the URL at any position.
	`316`	`+`
	`317`	`+ .. warning::`
	`318`	`+`
	`319`	+ :func:`urlsplit` does not perform validation. See :ref:`URL parsing
	`320`	+ security <url-parsing-security>` for details.
`293`	`321`
`294`	`322`	`.. versionchanged:: 3.6`
`295`	`323`	Out-of-range port numbers now raise :exc:`ValueError`, instead of
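
The two rules in this hunk (tabs and newlines removed at any position, leading C0 control and space characters stripped) can be checked from a Python prompt. The sketch below assumes an interpreter that already removes tabs and newlines, as the existing `versionchanged` note describes. `has_leading_strip` is a hypothetical helper, not part of `urllib.parse`; it only reports whether the running interpreter appears to carry this patch.

```python
from urllib.parse import urlsplit

# Tabs, carriage returns and newlines are removed wherever they occur.
p = urlsplit("https://www.py\nthon.org/do\tc/\r")
print(p.netloc, p.path)


def has_leading_strip() -> bool:
    """Hypothetical probe: True if urlsplit strips leading C0 control/space."""
    # Without the patch, the leading space and NUL prevent scheme detection.
    return urlsplit(" \x00https://www.python.org/").scheme == "https"


print(has_leading_strip())
```

On an interpreter without the patch the probe is expected to return `False`, because the leading characters stop the scheme from being recognized.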
`@@ -302,6 +330,9 @@ or on combining URL components into a URL string.`
`302`	`330`	`.. versionchanged:: 3.6.14`
`303`	`331`	`ASCII newline and tab characters are stripped from the URL.`
`304`	`332`
	`333`	`+ .. versionchanged:: 3.11.4`
	`334`	`+ Leading WHATWG C0 control and space characters are stripped from the URL.`
	`335`	`+`
`305`	`336`	`.. _WHATWG spec: https://url.spec.whatwg.org/#concept-basic-url-parser`
`306`	`337`
`307`	`338`	`.. function:: urlunsplit(parts)`
`@@ -371,6 +402,42 @@ or on combining URL components into a URL string.`
`371`	`402`	`.. versionchanged:: 3.2`
`372`	`403`	`Result is a structured object rather than a simple 2-tuple.`
`373`	`404`
	`405`	`+.. function:: unwrap(url)`
	`406`	`+`
	`407`	`+ Extract the url from a wrapped URL (that is, a string formatted as`
	`408`	+ ``<URL:scheme://host/path>``, ``<scheme://host/path>``, ``URL:scheme://host/path``
	`409`	+ or ``scheme://host/path``). If url is not a wrapped URL, it is returned
	`410`	`+ without changes.`
	`411`	`+`
	`412`	`+.. _url-parsing-security:`
	`413`	`+`
	`414`	`+URL parsing security`
	`415`	`+--------------------`
	`416`	`+`
	`417`	+The :func:`urlsplit` and :func:`urlparse` APIs do not perform validation of
	`418`	`+inputs. They may not raise errors on inputs that other applications consider`
	`419`	`+invalid. They may also succeed on some inputs that might not be considered`
	`420`	`+URLs elsewhere. Their purpose is for practical functionality rather than`
	`421`	`+purity.`
	`422`	`+`
	`423`	`+Instead of raising an exception on unusual input, they may instead return some`
	`424`	`+component parts as empty strings. Or components may contain more than perhaps`
	`425`	`+they should.`
	`426`	`+`
	`427`	`+We recommend that users of these APIs where the values may be used anywhere`
	`428`	`+with security implications code defensively. Do some verification within your`
	`429`	+code before trusting a returned component part. Does that ``scheme`` make
	`430`	+sense? Is that a sensible ``path``? Is there anything strange about that
	`431`	+``hostname``? etc.
	`432`	`+`
	`433`	`+What constitutes a URL is not universally well defined. Different applications`
	`434`	+have different needs and desired constraints. For instance the living `WHATWG
	`435`	+spec`_ describes what user facing web clients such as a web browser require.
	`436`	+While :rfc:`3986` is more general. These functions incorporate some aspects of
	`437`	`+both, but cannot be claimed compliant with either. The APIs and existing user`
	`438`	`+code with expectations on specific behaviors predate both standards leading us`
	`439`	`+to be very cautious about making API behavior changes.`
	`440`	`+`
`374`	`441`	`.. _parsing-ascii-encoded-bytes:`
`375`	`442`
`376`	`443`	`Parsing ASCII Encoded Bytes`
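
The "URL parsing security" section added above recommends verifying components before trusting them. A minimal sketch of such a check follows. The function name `is_acceptable_url` and the `ALLOWED_SCHEMES` policy are assumptions made for illustration; they are not part of this patch or of `urllib.parse`, and a real application would choose its own rules.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # assumption: example policy only


def is_acceptable_url(url: str) -> bool:
    """Hypothetical defensive check before trusting urlsplit() components."""
    # Reject C0 control characters and spaces anywhere in the input instead
    # of relying on the parser to strip or preserve them.
    if any(ord(ch) <= 0x20 for ch in url):
        return False
    try:
        parts = urlsplit(url)
        parts.port  # accessing .port raises ValueError for invalid ports
    except ValueError:
        return False
    # Does the scheme make sense for this application?
    if parts.scheme not in ALLOWED_SCHEMES:
        return False
    # Is there a hostname at all?
    if not parts.hostname:
        return False
    return True


print(is_acceptable_url("https://www.python.org/doc/"))
print(is_acceptable_url(" https://www.python.org/doc/"))
```

Rejecting whitespace and control characters up front keeps the result independent of whether the interpreter includes this patch.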

`Lib/test/test_urlparse.py`

Lines changed: 60 additions & 1 deletion

Original file line number	Diff line number	Diff line change
`@@ -660,14 +660,73 @@ def test_urlsplit_remove_unsafe_bytes(self):`
`660`	`660`	`            self.assertEqual(p.scheme, "https")`
`661`	`661`	`            self.assertEqual(p.geturl(), "https://www.python.org/")`
`662`	`662`
	`663`	`+    def test_urlsplit_strip_url(self):`
	`664`	`+        noise = bytes(range(0, 0x20 + 1))`
	`665`	`+        base_url = "http://User:Pass@www.python.org:080/doc/?query=yes#frag"`
	`666`	`+`
	`667`	`+        url = noise.decode("utf-8") + base_url`
	`668`	`+        p = urllib.parse.urlsplit(url)`
	`669`	`+        self.assertEqual(p.scheme, "http")`
	`670`	`+        self.assertEqual(p.netloc, "User:Pass@www.python.org:080")`
	`671`	`+        self.assertEqual(p.path, "/doc/")`
	`672`	`+        self.assertEqual(p.query, "query=yes")`
	`673`	`+        self.assertEqual(p.fragment, "frag")`
	`674`	`+        self.assertEqual(p.username, "User")`
	`675`	`+        self.assertEqual(p.password, "Pass")`
	`676`	`+        self.assertEqual(p.hostname, "www.python.org")`
	`677`	`+        self.assertEqual(p.port, 80)`
	`678`	`+        self.assertEqual(p.geturl(), base_url)`
	`679`	`+`
	`680`	`+        url = noise + base_url.encode("utf-8")`
	`681`	`+        p = urllib.parse.urlsplit(url)`
	`682`	`+        self.assertEqual(p.scheme, b"http")`
	`683`	`+        self.assertEqual(p.netloc, b"User:Pass@www.python.org:080")`
	`684`	`+        self.assertEqual(p.path, b"/doc/")`
	`685`	`+        self.assertEqual(p.query, b"query=yes")`
	`686`	`+        self.assertEqual(p.fragment, b"frag")`
	`687`	`+        self.assertEqual(p.username, b"User")`
	`688`	`+        self.assertEqual(p.password, b"Pass")`
	`689`	`+        self.assertEqual(p.hostname, b"www.python.org")`
	`690`	`+        self.assertEqual(p.port, 80)`
	`691`	`+        self.assertEqual(p.geturl(), base_url.encode("utf-8"))`
	`692`	`+`
	`693`	`+        # Test that trailing space is preserved as some applications rely on`
	`694`	`+        # this within query strings.`
	`695`	`+        query_spaces_url = "https://www.python.org:88/doc/?query= "`
	`696`	`+        p = urllib.parse.urlsplit(noise.decode("utf-8") + query_spaces_url)`
	`697`	`+        self.assertEqual(p.scheme, "https")`
	`698`	`+        self.assertEqual(p.netloc, "www.python.org:88")`
	`699`	`+        self.assertEqual(p.path, "/doc/")`
	`700`	`+        self.assertEqual(p.query, "query= ")`
	`701`	`+        self.assertEqual(p.port, 88)`
	`702`	`+        self.assertEqual(p.geturl(), query_spaces_url)`
	`703`	`+`
	`704`	`+        p = urllib.parse.urlsplit("www.pypi.org ")`
	`705`	`+        # That "hostname" gets considered a "path" due to the`
	`706`	`+        # trailing space and our existing logic... YUCK...`
	`707`	`+        # and re-assembles via geturl aka unurlsplit into the original.`
	`708`	`+        # django.core.validators.URLValidator (at least through v3.2) relies on`
	`709`	`+        # this, for better or worse, to catch it in a ValidationError via its`
	`710`	`+        # regular expressions.`
	`711`	`+        # Here we test the basic round trip concept of such a trailing space.`
	`712`	`+        self.assertEqual(urllib.parse.urlunsplit(p), "www.pypi.org ")`
	`713`	`+`
	`714`	`+        # with scheme as cache-key`
	`715`	`+        url = "//www.python.org/"`
	`716`	`+        scheme = noise.decode("utf-8") + "https" + noise.decode("utf-8")`
	`717`	`+        for _ in range(2):`
	`718`	`+            p = urllib.parse.urlsplit(url, scheme=scheme)`
	`719`	`+            self.assertEqual(p.scheme, "https")`
	`720`	`+            self.assertEqual(p.geturl(), "https://www.python.org/")`
	`721`	`+`
`663`	`722`	`    def test_attributes_bad_port(self):`
`664`	`723`	`        """Check handling of invalid ports."""`
`665`	`724`	`        for bytes in (False, True):`
`666`	`725`	`            for parse in (urllib.parse.urlsplit, urllib.parse.urlparse):`
`667`	`726`	`                for port in ("foo", "1.5", "-1", "0x10"):`
`668`	`727`	`                    with self.subTest(bytes=bytes, parse=parse, port=port):`
`669`	`728`	`                        netloc = "www.example.net:" + port`
`670`		`-                        url = "http://" + netloc`
	`729`	`+                        url = "http://" + netloc + "/"`
`671`	`730`	`                        if bytes:`
`672`	`731`	`                            netloc = netloc.encode("ascii")`
`673`	`732`	`                            url = url.encode("ascii")`
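
The trailing-space cases exercised by `test_urlsplit_strip_url` can be reproduced outside the test suite. This short sketch reuses the two inputs from the test; the variable names are chosen here for illustration only.

```python
from urllib.parse import urlsplit, urlunsplit

# A trailing space inside the query string is kept.
with_query = urlsplit("https://www.python.org:88/doc/?query= ")
print(repr(with_query.query))

# Without a scheme or "//", the whole string is treated as a path,
# and the trailing space survives the round trip through urlunsplit.
bare = urlsplit("www.pypi.org ")
print(repr(bare.netloc), repr(bare.path))
print(repr(urlunsplit(bare)))
```

Only leading characters are stripped from the URL, which is why both results keep their final space.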

`Lib/urllib/parse.py`

Lines changed: 13 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -25,6 +25,10 @@`
`25`	`25`	`scenarios for parsing, and for backward compatibility purposes, some`
`26`	`26`	`parsing quirks from older RFCs are retained. The testcases in`
`27`	`27`	`test_urlparse.py provides a good indicator of parsing behavior.`
	`28`	`+`
	`29`	`+The WHATWG URL Parser spec should also be considered. We are not compliant with`
	`30`	`+it either due to existing user code API behavior expectations (Hyrum's Law).`
	`31`	`+It serves as a useful guide when making changes.`
`28`	`32`	`"""`
`29`	`33`
`30`	`34`	`import re`
`@@ -76,6 +80,10 @@`
`76`	`80`	`'0123456789'`
`77`	`81`	`'+-.')`
`78`	`82`
	`83`	`+# Leading and trailing C0 control and space to be stripped per WHATWG spec.`
	`84`	`+# == "".join([chr(i) for i in range(0, 0x20 + 1)])`
	`85`	`+_WHATWG_C0_CONTROL_OR_SPACE = '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f '`
	`86`	`+`
`79`	`87`	`# Unsafe bytes to be removed per WHATWG spec`
`80`	`88`	`_UNSAFE_URL_BYTES_TO_REMOVE = ['\t', '\r', '\n']`
`81`	`89`
`@@ -426,6 +434,11 @@ def urlsplit(url, scheme='', allow_fragments=True):`
`426`	`434`	`    url, scheme, _coerce_result = _coerce_args(url, scheme)`
`427`	`435`	`    url = _remove_unsafe_bytes_from_url(url)`
`428`	`436`	`    scheme = _remove_unsafe_bytes_from_url(scheme)`
	`437`	`+    # Only lstrip url as some applications rely on preserving trailing space.`
	`438`	`+    # (https://url.spec.whatwg.org/#concept-basic-url-parser would strip both)`
	`439`	`+    url = url.lstrip(_WHATWG_C0_CONTROL_OR_SPACE)`
	`440`	`+    scheme = scheme.strip(_WHATWG_C0_CONTROL_OR_SPACE)`
	`441`	`+`
`429`	`442`	`    allow_fragments = bool(allow_fragments)`
`430`	`443`	`    key = url, scheme, allow_fragments, type(url), type(scheme)`
`431`	`444`	`    cached = _parse_cache.get(key, None)`
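
The preprocessing that this hunk adds to `urlsplit` can be illustrated in isolation. The sketch below is a stand-in, not the library code: `preprocess`, `C0_CONTROL_OR_SPACE` and `UNSAFE_BYTES` are hypothetical names, and it handles only `str` input, whereas the real function also accepts bytes via `_coerce_args`.

```python
# Same characters as _WHATWG_C0_CONTROL_OR_SPACE in the patch: U+0000..U+0020.
C0_CONTROL_OR_SPACE = "".join(chr(i) for i in range(0, 0x20 + 1))
UNSAFE_BYTES = ["\t", "\r", "\n"]


def preprocess(url: str, scheme: str = "") -> tuple[str, str]:
    """Hypothetical stand-in for the first steps of the patched urlsplit()."""
    # Step 1: remove tabs and newlines at any position.
    for b in UNSAFE_BYTES:
        url = url.replace(b, "")
        scheme = scheme.replace(b, "")
    # Step 2: strip only the leading side of the URL, both sides of the scheme.
    url = url.lstrip(C0_CONTROL_OR_SPACE)
    scheme = scheme.strip(C0_CONTROL_OR_SPACE)
    return url, scheme


print(preprocess(" \x00https://www.python.org/ ", " https "))
```

The asymmetry is deliberate in the patch: the URL keeps its trailing characters because some applications rely on them, while the default scheme is stripped on both sides.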

`Misc/NEWS.d/next/Security/2023-03-07-20-59-17.gh-issue-102153.14CLSZ.rst`

Lines changed: 3 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,3 @@`
	`1`	+:func:`urllib.parse.urlsplit` now strips leading C0 control and space
	`2`	`+characters following the specification for URLs defined by WHATWG in`
	`3`	`+response to CVE-2023-24329. Patch by Illia Volochii.`
