Issue36407

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/80588

classification

Title:	xml.dom.minidom wrong indentation writing for CDATA section
Type:	enhancement	Stage:	resolved
Components:	XML	Versions:	Python 3.8

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	eli.bendersky, scoder, serhiy.storchaka, vsurjaninov
Priority:	normal	Keywords:	patch

Created on 2019-03-23 15:38 by vsurjaninov, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked	Edit
PR 12514	merged	vsurjaninov,2019-03-23 16:00
PR 12578	closed	miss-islington,2019-03-27 06:19

Messages (5)
msg338681 -(view)	Author: Vladimir Surjaninov (vsurjaninov)*	Date: 2019-03-23 15:38
If we write XML with a CDATA section and pass non-empty indentation and newline parameters, the parent node of the section will contain useless indentation, which will be parsed as text.

Example:

>>> from xml.dom import minidom
>>> doc = minidom.Document()
>>> root = doc.createElement('root')
>>> doc.appendChild(root)
>>> node = doc.createElement('node')
>>> root.appendChild(node)
>>> data = doc.createCDATASection('</data>')
>>> node.appendChild(data)
>>> print(doc.toprettyxml(indent=' ' * 4))
<?xml version="1.0" ?>
<root>
    <node>
<![CDATA[</data>]]>    </node>
</root>

If we try to parse this output document, we won't get the CDATA value correctly.

The following code returns a string that contains only indentation characters:

>>> doc = minidom.parseString(xml_text)
>>> doc.getElementsByTagName('node')[0].firstChild.nodeValue

This returns a string with the CDATA value and indentation characters:

>>> doc.getElementsByTagName('node')[0].firstChild.wholeText

But we have a workaround:

>>> data.nodeType = data.TEXT_NODE
…
>>> print(doc.toprettyxml(indent=' ' * 4))
<?xml version="1.0" ?>
<root>
    <node><![CDATA[</data>]]></node>
</root>

It will be parsed correctly:

>>> doc.getElementsByTagName('node')[0].firstChild.nodeValue
</data>

But I think it would be better to fix the writing function so that this becomes the default behavior.
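The report above can be reproduced with a short standalone script. This is a minimal sketch, not part of the original report: the helper name `describe_children` is hypothetical, and the printed child list depends on the Python version (releases without the fix are expected to show whitespace-only text nodes around the CDATA section).

```python
from xml.dom import minidom

# Build <root><node><![CDATA[</data>]]></node></root> as in the report.
doc = minidom.Document()
root = doc.createElement('root')
doc.appendChild(root)
node = doc.createElement('node')
root.appendChild(node)
node.appendChild(doc.createCDATASection('</data>'))

xml_text = doc.toprettyxml(indent=' ' * 4)


def describe_children(element):
    """Return (nodeType, data) pairs for the direct children of element.

    Hypothetical helper for illustration only; it assumes every child
    is a text or CDATA node, which holds for this document.
    """
    return [(child.nodeType, child.data) for child in element.childNodes]


reparsed = minidom.parseString(xml_text)
reparsed_node = reparsed.getElementsByTagName('node')[0]

# Print each child's node type and data so stray whitespace is visible.
for node_type, data in describe_children(reparsed_node):
    print(node_type, repr(data))

# wholeText joins adjacent text and CDATA siblings, so stripping it
# recovers the CDATA value whether or not extra whitespace was written.
print(repr(reparsed_node.firstChild.wholeText.strip()))
```

Stripping `wholeText` only works here because the CDATA value itself has no leading or trailing whitespace; it is not a general substitute for the fix.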
msg338701 -(view)	Author: Stefan Behnel (scoder)*	Date: 2019-03-23 21:33
Yes, this case is incorrect. Pretty printing should not change character content inside of a simple tag.

The PR looks good to me.
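One way to express that expectation as a check is to collect the character data under an element before and after a pretty-print round trip and compare the two. This is a sketch under stated assumptions: `character_content` and `survives_pretty_print` are hypothetical helper names, not part of `xml.dom.minidom`, and the check only considers direct text and CDATA children.

```python
from xml.dom import Node, minidom


def character_content(element):
    """Concatenate the data of all direct text and CDATA children."""
    return ''.join(
        child.data
        for child in element.childNodes
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    )


def survives_pretty_print(doc, tag_name, indent='    '):
    """Return True if pretty-printing and re-parsing leaves the character
    content of the first tag_name element unchanged."""
    before = character_content(doc.getElementsByTagName(tag_name)[0])
    reparsed = minidom.parseString(doc.toprettyxml(indent=indent))
    after = character_content(reparsed.getElementsByTagName(tag_name)[0])
    return before == after


doc = minidom.parseString('<root><node><![CDATA[</data>]]></node></root>')
# Expected to be True on releases that include the fix (Python 3.8,
# per this issue) and False on earlier ones.
print(survives_pretty_print(doc, 'node'))
```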
msg338936 -(view)	Author: Serhiy Storchaka (serhiy.storchaka)*	Date: 2019-03-27 05:59
New changeset 384b81d923addd52125e94470b11d2574ca266a9 by Serhiy Storchaka (Vladimir Surjaninov) in branch 'master':
bpo-36407: Fix writing indentations of CDATA section (xml.dom.minidom). (GH-12514)
https://github.com/python/cpython/commit/384b81d923addd52125e94470b11d2574ca266a9
msg338939 -(view)	Author: Serhiy Storchaka (serhiy.storchaka)*	Date: 2019-03-27 06:19
Should we backport this change? I am not sure.
msg338943 -(view)	Author: Stefan Behnel (scoder)*	Date: 2019-03-27 07:04
I don't think this should be backported. Pretty-printing is not a production relevant feature, more of a "debugging, diffing and help users see what they get" kind of feature. It's good to have it fixed for the future, but we shouldn't bother users with it during a point release.

History
Date	User	Action	Args
2022-04-11 14:59:12	admin	set	github: 80588
2019-03-27 12:08:27	serhiy.storchaka	set	status: open -> closed resolution: fixed stage: patch review -> resolved
2019-03-27 07:04:43	scoder	set	messages: +msg338943
2019-03-27 06:19:42	serhiy.storchaka	set	messages: +msg338939
2019-03-27 06:19:22	miss-islington	set	pull_requests: +pull_request12522
2019-03-27 05:59:02	serhiy.storchaka	set	messages: +msg338936
2019-03-23 21:33:28	scoder	set	messages: +msg338701 versions: + Python 3.8
2019-03-23 16:00:14	vsurjaninov	set	keywords: +patch stage: patch review pull_requests: +pull_request12465
2019-03-23 15:40:39	xtreak	set	nosy: +scoder,eli.bendersky,serhiy.storchaka
2019-03-23 15:38:49	vsurjaninov	create
