
Python Pandas - Writing XML
Just as it can parse XML files, Pandas also provides an easy way to convert a DataFrame into an XML document. The DataFrame.to_xml() method renders the contents of a DataFrame as an XML document. XML (Extensible Markup Language) is a widely used data representation format because of its flexibility.
In this tutorial, we will learn about the functionality of the DataFrame.to_xml() method and its parameters, with examples that demonstrate different use cases.
The to_xml() Method
The Pandas DataFrame object provides a method called to_xml() for converting the contents of a DataFrame into an XML document. This method can write the output XML to a file or return it as a string. Its options also let you customize the XML structure, namespaces, attributes, formatting, and more.
Syntax
Following is the syntax of the to_xml() method −
DataFrame.to_xml(path_or_buffer=None, *, root_name='data', row_name='row', attr_cols=None, elem_cols=None, namespaces=None, prefix=None, ...)
Where,
- path_or_buffer: Specifies the XML output location. It can be a string, path object, or file-like object. If None, the XML is returned as a string instead of saving it to a file.
- root_name: Specifies the name of the root element in the XML document. Default is 'data'.
- row_name: Specifies the name of the row element. Default is 'row'.
- attr_cols: List of columns to be written as attributes in the row elements.
- elem_cols: List of columns to be written as child elements of the row element.
- namespaces: Dictionary of namespaces to include in the XML.
- prefix: Namespace prefix for elements and attributes.
You can get more details about this method from the DataFrame.to_xml() reference tutorial.
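The file-writing behaviour of path_or_buffer described above can be sketched as follows. This is a minimal illustration rather than part of the original tutorial: the file name contacts.xml is hypothetical, and parser="etree" is passed only so the sketch runs without lxml installed (an assumption about the environment; the default parser is lxml).

```python
import tempfile
from pathlib import Path

import pandas as pd

# Small sample DataFrame (same shape as the tutorial's examples)
df = pd.DataFrame({
    "name": ["Tanmay", "Manisha"],
    "company": ["TutorialsPoint", "TutorialsPoint"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "contacts.xml"  # hypothetical file name

    # When a path is given, to_xml() writes the file and returns None
    result = df.to_xml(out_path, parser="etree")

    # Read the file back to inspect what was written
    written = out_path.read_text(encoding="utf-8")

print(result)
print(written)
```

When path_or_buffer is omitted, the same call returns the XML as a string instead, which is what the remaining examples print.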
Example
Here is a simple example that converts a Pandas DataFrame into XML with the default settings of the DataFrame.to_xml() method.
    import pandas as pd

    # Sample DataFrame
    df = pd.DataFrame({
        'name': ['Tanmay', 'Manisha'],
        'company': ['TutorialsPoint', 'TutorialsPoint'],
        'phone': ['(011) 123-4567', '(011) 789-4567']
    })

    # Convert to XML
    print(df.to_xml())

Following is the output of the above code −

    <?xml version='1.0' encoding='utf-8'?>
    <data>
      <row>
        <index>0</index>
        <name>Tanmay</name>
        <company>TutorialsPoint</company>
        <phone>(011) 123-4567</phone>
      </row>
      <row>
        <index>1</index>
        <name>Manisha</name>
        <company>TutorialsPoint</company>
        <phone>(011) 789-4567</phone>
      </row>
    </data>
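The default output includes an index element for every row. If the index carries no meaning, it can be left out with the index parameter of to_xml(), which is not listed in the abbreviated syntax above. The sketch below is a hedged illustration: parser="etree" is used only to avoid depending on lxml, which is an assumption about the environment.

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    "name": ["Tanmay", "Manisha"],
    "company": ["TutorialsPoint", "TutorialsPoint"],
    "phone": ["(011) 123-4567", "(011) 789-4567"],
})

# index=False omits the row index from the XML output
xml_text = df.to_xml(index=False, parser="etree")
print(xml_text)
```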
Customizing Root and Row Names
When converting a Pandas DataFrame to XML, you can replace the default root and row element names ('data' and 'row') with names that better describe your data. To do this, use the root_name and row_name parameters of the DataFrame.to_xml() method.
Example
The following example uses the root_name and row_name parameters to customize the element tags of the XML output.
    import pandas as pd

    # Sample DataFrame
    df = pd.DataFrame({
        'name': ['Tanmay', 'Manisha'],
        'company': ['TutorialsPoint', 'TutorialsPoint'],
        'phone': ['(011) 123-4567', '(011) 789-4567']
    })

    # Convert to XML with custom root and row names
    print(df.to_xml(root_name="contact-info", row_name="contact"))

Following is the output of the above code −

    <?xml version='1.0' encoding='utf-8'?>
    <contact-info>
      <contact>
        <index>0</index>
        <name>Tanmay</name>
        <company>TutorialsPoint</company>
        <phone>(011) 123-4567</phone>
      </contact>
      <contact>
        <index>1</index>
        <name>Manisha</name>
        <company>TutorialsPoint</company>
        <phone>(011) 789-4567</phone>
      </contact>
    </contact-info>
Writing an Attribute-Centric XML
The attr_cols parameter of the DataFrame.to_xml() method writes the listed columns as attributes of each row element instead of as child elements.
Example
This example shows how to write attribute-centric XML using the Pandas to_xml() method. When you specify columns in attr_cols, their values appear as attributes of the row elements instead of child elements. The index is written as an attribute as well.
    import pandas as pd

    # Sample DataFrame
    df = pd.DataFrame({
        'name': ['Tanmay', 'Manisha'],
        'company': ['TutorialsPoint', 'TutorialsPoint'],
        'phone': ['(011) 123-4567', '(011) 789-4567']
    })

    # Write columns as attributes
    print(df.to_xml(attr_cols=df.columns.tolist()))

Following is the output of the above code −

    <?xml version='1.0' encoding='utf-8'?>
    <data>
      <row index="0" name="Tanmay" company="TutorialsPoint" phone="(011) 123-4567"/>
      <row index="1" name="Manisha" company="TutorialsPoint" phone="(011) 789-4567"/>
    </data>
Mixing Attributes and Elements
You can also write some columns as attributes and others as child elements by using the attr_cols and elem_cols parameters together. These parameters control the structure of each row element by defining which columns become attributes and which become child elements.
Example
This example demonstrates how to convert a DataFrame to XML with a mix of attributes and elements. Here, name is written as an attribute of the <row> element, while company and phone are nested as child elements of <row>. Because the index is included by default, it appears both as an attribute and as a child element.
    import pandas as pd

    # Sample DataFrame
    df = pd.DataFrame({
        'name': ['Tanmay', 'Manisha'],
        'company': ['TutorialsPoint', 'TutorialsPoint'],
        'phone': ['(011) 123-4567', '(011) 789-4567']
    })

    # Mix attributes and elements
    print(df.to_xml(attr_cols=['name'], elem_cols=['company', 'phone']))

Following is the output of the above code −

    <?xml version='1.0' encoding='utf-8'?>
    <data>
      <row index="0" name="Tanmay">
        <index>0</index>
        <company>TutorialsPoint</company>
        <phone>(011) 123-4567</phone>
      </row>
      <row index="1" name="Manisha">
        <index>1</index>
        <company>TutorialsPoint</company>
        <phone>(011) 789-4567</phone>
      </row>
    </data>
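One way to confirm which columns ended up as attributes and which as child elements is to parse the result with Python's standard xml.etree.ElementTree module. The following is a minimal sketch under two assumptions that are not part of the original example: index=False is passed so that only the listed columns appear, and parser="etree" is used so the code does not depend on lxml.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    "name": ["Tanmay", "Manisha"],
    "company": ["TutorialsPoint", "TutorialsPoint"],
    "phone": ["(011) 123-4567", "(011) 789-4567"],
})

# name becomes an attribute; company and phone become child elements.
# index=False keeps the index out of both (assumption for a cleaner result).
xml_text = df.to_xml(
    index=False,
    attr_cols=["name"],
    elem_cols=["company", "phone"],
    parser="etree",
)

# Parse the generated document and inspect the first <row>
root = ET.fromstring(xml_text.encode("utf-8"))
first_row = root[0]
child_tags = [child.tag for child in first_row]

print(first_row.attrib)  # attributes of the first row
print(child_tags)        # tags of its child elements
```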
Handling Hierarchical Columns
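Any hierarchical (MultiIndex) columns in a Pandas DataFrame are flattened into single element names joined with underscores when the DataFrame is converted to XML. A hierarchical row index is instead written as one element per index level.
Example
The following example converts a DataFrame with a hierarchical row index to XML. Because the index levels are unnamed, they are written as level_0 and level_1 elements, which are valid XML element names.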
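    import pandas as pd

    # Create a MultiIndex object
    index = pd.MultiIndex.from_tuples([('A', 'one'), ('A', 'two'), ('B', 'one'), ('B', 'two')])

    # Create hierarchical DataFrame
    data = [[1, 2], [3, 4], [5, 6], [7, 8]]
    df = pd.DataFrame(data, index=index, columns=['X', 'Y'])

    # Display the hierarchical DataFrame
    print("Hierarchical DataFrame:")
    print(df)

    # Convert to XML
    print('Output XML:')
    print(df.to_xml())

Following is the output of the above code −

    Hierarchical DataFrame:
           X  Y
    A one  1  2
      two  3  4
    B one  5  6
      two  7  8
    Output XML:
    <?xml version='1.0' encoding='utf-8'?>
    <data>
      <row>
        <level_0>A</level_0>
        <level_1>one</level_1>
        <X>1</X>
        <Y>2</Y>
      </row>
      <row>
        <level_0>A</level_0>
        <level_1>two</level_1>
        <X>3</X>
        <Y>4</Y>
      </row>
      <row>
        <level_0>B</level_0>
        <level_1>one</level_1>
        <X>5</X>
        <Y>6</Y>
      </row>
      <row>
        <level_0>B</level_0>
        <level_1>two</level_1>
        <X>7</X>
        <Y>8</Y>
      </row>
    </data>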
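The example above uses a hierarchical row index. For hierarchical column labels, the underscore flattening can be sketched as follows. The column tuples here are made up for illustration, index=False is passed to keep the output short, and parser="etree" is an assumption that avoids the lxml dependency.

```python
import pandas as pd

# Two-level column labels (illustrative names, not from the tutorial)
columns = pd.MultiIndex.from_tuples([("X", "min"), ("X", "max"), ("Y", "min")])
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=columns)

# Each column tuple is joined with "_" to form one element name,
# for example ("X", "min") becomes <X_min>
xml_text = df.to_xml(index=False, parser="etree")
print(xml_text)
```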
Adding Namespaces While Writing XML
Namespaces can be declared on the root element, and applied to other XML nodes, by using the namespaces parameter.
Example
The following example adds a default namespace to the XML document generated from a Pandas DataFrame with the to_xml() method.
    import pandas as pd

    # Create a MultiIndex object
    index = pd.MultiIndex.from_tuples([('A', 'one'), ('A', 'two'), ('B', 'one'), ('B', 'two')])

    # Create a DataFrame
    data = [[1, 2], [3, 4], [5, 6], [7, 8]]
    df = pd.DataFrame(data, index=index, columns=['X', 'Y'])

    # Add default namespace
    print(df.to_xml(namespaces={"": "https://example.com"}))

Following is the output of the above code −

    <?xml version='1.0' encoding='utf-8'?>
    <data xmlns="https://example.com">
      <row>
        <level_0>A</level_0>
        <level_1>one</level_1>
        <X>1</X>
        <Y>2</Y>
      </row>
      <row>
        <level_0>A</level_0>
        <level_1>two</level_1>
        <X>3</X>
        <Y>4</Y>
      </row>
      <row>
        <level_0>B</level_0>
        <level_1>one</level_1>
        <X>5</X>
        <Y>6</Y>
      </row>
      <row>
        <level_0>B</level_0>
        <level_1>two</level_1>
        <X>7</X>
        <Y>8</Y>
      </row>
    </data>
Writing XML with Namespace Prefix
You can apply a namespace prefix to the elements and attributes of the XML document by using the prefix parameter. The prefix must be one of the keys defined in the namespaces dictionary.
Example
This example uses the prefix parameter to specify the namespace prefix for the elements and attributes of the XML output.
    import pandas as pd

    # Create a MultiIndex object
    index = pd.MultiIndex.from_tuples([('A', 'one'), ('A', 'two'), ('B', 'one'), ('B', 'two')])

    # Create a DataFrame
    data = [[1, 2], [3, 4], [5, 6], [7, 8]]
    df = pd.DataFrame(data, index=index, columns=['X', 'Y'])

    # Add namespace with prefix
    print(df.to_xml(namespaces={"doc": "https://example.com"}, prefix="doc"))

Following is the output of the above code −

    <?xml version='1.0' encoding='utf-8'?>
    <doc:data xmlns:doc="https://example.com">
      <doc:row>
        <doc:level_0>A</doc:level_0>
        <doc:level_1>one</doc:level_1>
        <doc:X>1</doc:X>
        <doc:Y>2</doc:Y>
      </doc:row>
      <doc:row>
        <doc:level_0>A</doc:level_0>
        <doc:level_1>two</doc:level_1>
        <doc:X>3</doc:X>
        <doc:Y>4</doc:Y>
      </doc:row>
      <doc:row>
        <doc:level_0>B</doc:level_0>
        <doc:level_1>one</doc:level_1>
        <doc:X>5</doc:X>
        <doc:Y>6</doc:Y>
      </doc:row>
      <doc:row>
        <doc:level_0>B</doc:level_0>
        <doc:level_1>two</doc:level_1>
        <doc:X>7</doc:X>
        <doc:Y>8</doc:Y>
      </doc:row>
    </doc:data>
Disabling XML Declaration and Pretty Print
The xml_declaration and pretty_print options can be set to False to omit the XML declaration and to disable indented (pretty-printed) formatting.
Example
This example shows how to disable the XML declaration and pretty formatting using the xml_declaration and pretty_print parameters.
    import pandas as pd

    # Sample DataFrame
    df = pd.DataFrame({
        'name': ['Tanmay', 'Manisha'],
        'company': ['TutorialsPoint', 'TutorialsPoint'],
        'phone': ['(011) 123-4567', '(011) 789-4567']
    })

    # Disabling XML Declaration and Pretty Print
    print(df.to_xml(xml_declaration=False, pretty_print=False))

Following is the output of the above code −

    <data><row><index>0</index><name>Tanmay</name><company>TutorialsPoint</company><phone>(011) 123-4567</phone></row><row><index>1</index><name>Manisha</name><company>TutorialsPoint</company><phone>(011) 789-4567</phone></row></data>
Transforming XML with Stylesheet
You can transform the output XML with an XSLT stylesheet by using the stylesheet parameter of the DataFrame.to_xml() method. Pandas applies the stylesheet to the raw XML output to modify its structure. This option requires lxml to be installed.
Example
The following example demonstrates transforming XML with an XSLT stylesheet. It first defines an XSLT script and then passes it to to_xml() to transform the raw XML into a custom layout.
    import pandas as pd

    # Create an XSLT stylesheet
    xsl = """<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
       <xsl:output method="xml" omit-xml-declaration="no" indent="yes"/>
       <xsl:strip-space elements="*"/>
       <xsl:template match="/data">
          <contact>
             <xsl:apply-templates select="row"/>
          </contact>
       </xsl:template>
       <xsl:template match="row">
          <object index="{index}">
             <xsl:copy-of select="@*|node()"/>
          </object>
       </xsl:template>
    </xsl:stylesheet>"""

    # Sample DataFrame
    df = pd.DataFrame({
        'name': ['Tanmay', 'Manisha'],
        'company': ['TutorialsPoint', 'TutorialsPoint'],
        'phone': ['(011) 123-4567', '(011) 789-4567']
    })

    # Apply stylesheet
    print(df.to_xml(stylesheet=xsl))

Following is the output of the above code −

    <?xml version="1.0"?>
    <contact>
      <object index="0">
        <index>0</index>
        <name>Tanmay</name>
        <company>TutorialsPoint</company>
        <phone>(011) 123-4567</phone>
      </object>
      <object index="1">
        <index>1</index>
        <name>Manisha</name>
        <company>TutorialsPoint</company>
        <phone>(011) 789-4567</phone>
      </object>
    </contact>