Python Pandas Tutorial

Python Pandas - Working with HTML Data



The Pandas library can read and write data in many formats. One such format is HTML (HyperText Markup Language), which is commonly used to structure web content. HTML files may contain tabular data, which can be extracted and analyzed with Pandas.

An HTML table represents tabular data as rows and columns within a web page. This data can be extracted with the pandas.read_html() function, and a Pandas DataFrame can be written back to an HTML table with the DataFrame.to_html() method.

In this tutorial, we will learn how to work with HTML data using Pandas, including reading HTML tables and writing Pandas DataFrames to HTML tables.

Reading HTML Tables from a URL

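The pandas.read_html() function reads tables from HTML files, strings, or URLs. It parses the <table> elements in the HTML and returns a list of pandas.DataFrame objects, one per table found. The function relies on an HTML parser library being installed, such as lxml, or BeautifulSoup4 together with html5lib.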

Example

Here is a basic example of reading data from a URL using the pandas.read_html() function.

    import pandas as pd

    # Read the HTML tables from a URL
    url = "https://www.tutorialspoint.com/sql/sql-clone-tables.htm"
    tables = pd.read_html(url)

    # Access the first table from the URL
    df = tables[0]

    # Display the resultant DataFrame
    print('Output First DataFrame:')
    print(df.head())

Following is the output of the above code −

    Output First DataFrame:
       ID      NAME  AGE    ADDRESS  SALARY
    0   1    Ramesh   32  Ahmedabad  2000.0
    1   2    Khilan   25      Delhi  1500.0
    2   3   Kaushik   23       Kota  2000.0
    3   4  Chaitali   25     Mumbai  6500.0
    4   5    Hardik   27     Bhopal  8500.0
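Because read_html() returns a list even when a page holds a single table, it can help to inspect that list before indexing into it. The following is a minimal sketch, assuming a parser such as lxml is installed. The html_doc fragment and its values are made up for illustration, and an in-memory string wrapped in io.StringIO is used instead of a URL so that the example runs offline.

```python
from io import StringIO

import pandas as pd

# Hypothetical page fragment containing two tables (invented for this sketch).
html_doc = """
<h1>Report</h1>
<table>
  <tr><th>ID</th><th>NAME</th></tr>
  <tr><td>1</td><td>Ramesh</td></tr>
  <tr><td>2</td><td>Khilan</td></tr>
</table>
<p>Some text between the tables.</p>
<table>
  <tr><th>CITY</th><th>SALARY</th></tr>
  <tr><td>Delhi</td><td>1500.0</td></tr>
</table>
"""

# read_html() returns one DataFrame per <table> element it finds.
tables = pd.read_html(StringIO(html_doc))

print(type(tables).__name__)  # list
print(len(tables))            # 2

# Summarize each table before choosing which one to use.
for position, table in enumerate(tables):
    print(position, table.shape, list(table.columns))
```

Checking the length and column names first avoids silently picking the wrong table when a page layout changes.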

Reading HTML Data from a String

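HTML data can be read directly from a string by wrapping the string in Python's io.StringIO class. Since pandas 2.1, passing a literal HTML string to read_html() is deprecated, so wrapping it in StringIO is the recommended approach.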

Example

The following example demonstrates how to read the HTML string using StringIO without saving to a file.

    import pandas as pd
    from io import StringIO

    # Create an HTML string
    html_str = """
    <table>
       <tr><th>C1</th><th>C2</th><th>C3</th></tr>
       <tr><td>a</td><td>b</td><td>c</td></tr>
       <tr><td>x</td><td>y</td><td>z</td></tr>
    </table>
    """

    # Read the HTML string
    dfs = pd.read_html(StringIO(html_str))
    print(dfs[0])

Following is the output of the above code −

      C1 C2 C3
    0  a  b  c
    1  x  y  z

Example

An alternative that does not use io.StringIO is to save the HTML string to a temporary file and read that file with the pandas.read_html() function.

    import pandas as pd

    # Create an HTML string
    html_str = """
    <table>
       <tr><th>C1</th><th>C2</th><th>C3</th></tr>
       <tr><td>a</td><td>b</td><td>c</td></tr>
       <tr><td>x</td><td>y</td><td>z</td></tr>
    </table>
    """

    # Save to a temporary file and read
    with open("temp.html", "w") as f:
        f.write(html_str)

    df = pd.read_html("temp.html")[0]
    print(df)

Following is the output of the above code −

      C1 C2 C3
    0  a  b  c
    1  x  y  z
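The example above leaves temp.html behind in the working directory. One possible variation, sketched here with Python's standard tempfile module, writes the file into a temporary directory that is removed automatically. The names tmp_dir and html_path are illustrative and not part of the original example, and an installed HTML parser such as lxml is assumed.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Same HTML table as in the examples above.
html_str = """<table>
  <tr><th>C1</th><th>C2</th><th>C3</th></tr>
  <tr><td>a</td><td>b</td><td>c</td></tr>
  <tr><td>x</td><td>y</td><td>z</td></tr>
</table>"""

# The directory and everything in it is deleted when the with-block exits.
with tempfile.TemporaryDirectory() as tmp_dir:
    html_path = Path(tmp_dir) / "temp.html"
    html_path.write_text(html_str, encoding="utf-8")

    # Read the table while the file still exists.
    df = pd.read_html(str(html_path))[0]

print(df)
```

The DataFrame is fully loaded into memory inside the with-block, so it stays usable after the file has been removed.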

Handling Multiple Tables from an HTML file

When an HTML file contains multiple tables, the match parameter of the pandas.read_html() function can be used to read only the tables that contain specific text. The parameter accepts a string or a regular expression.

Example

The following example uses the match parameter to read, from an HTML page with multiple tables, a table that contains specific text.

    import pandas as pd

    # Read tables from a SQL tutorial
    url = "https://www.tutorialspoint.com/sql/sql-clone-tables.htm"
    tables = pd.read_html(url, match='Field')

    # Access the table
    df = tables[0]
    print(df.head())

Following is the output of the above code −

         Field           Type  Null  Key  Default  Extra
    1       ID        int(11)    NO  PRI      NaN    NaN
    2     NAME    varchar(20)    NO  NaN      NaN    NaN
    3      AGE        int(11)    NO  NaN      NaN    NaN
    4  ADDRESS       char(25)   YES  NaN      NaN    NaN
    5   SALARY  decimal(18,2)   YES  NaN      NaN    NaN
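The effect of match can also be checked without network access. The sketch below is a rough illustration, assuming a parser such as lxml is installed. The html_doc string and its contents are invented for this purpose. It compares the result with and without match, and shows that read_html() raises a ValueError when no table contains the requested text.

```python
from io import StringIO

import pandas as pd

# Hypothetical document with two tables; only the second contains "Field".
html_doc = """
<table>
  <tr><th>ID</th><th>NAME</th></tr>
  <tr><td>1</td><td>Ramesh</td></tr>
</table>
<table>
  <tr><th>Field</th><th>Type</th></tr>
  <tr><td>ID</td><td>int(11)</td></tr>
  <tr><td>NAME</td><td>varchar(20)</td></tr>
</table>
"""

# Without match, every table is returned.
all_tables = pd.read_html(StringIO(html_doc))

# With match, only tables containing the text "Field" are returned.
field_tables = pd.read_html(StringIO(html_doc), match="Field")

print(len(all_tables), len(field_tables))
print(field_tables[0])

# When nothing matches, read_html() raises ValueError instead of
# returning an empty list.
try:
    pd.read_html(StringIO(html_doc), match="NoSuchText")
    found_missing = True
except ValueError:
    found_missing = False

print(found_missing)
```

Catching the ValueError is useful in scripts that scan many pages, where some pages may not contain the table of interest.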

Writing DataFrames to HTML

Pandas DataFrame objects can be converted to HTML tables using the DataFrame.to_html() method. This method returns the HTML as a string when the buf parameter is None, which is the default.

Example

The following example demonstrates how to write a Pandas DataFrame to an HTML table using the DataFrame.to_html() method.

    import pandas as pd

    # Create a DataFrame
    df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

    # Convert the DataFrame to HTML table
    html = df.to_html()

    # Display the HTML string
    print(html)

Following is the output of the above code −

    <table border="1">
      <thead>
        <tr>
          <th></th>
          <th>A</th>
          <th>B</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>1</td>
          <td>2</td>
        </tr>
        <tr>
          <th>1</th>
          <td>3</td>
          <td>4</td>
        </tr>
      </tbody>
    </table>
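To illustrate the buf parameter mentioned earlier, the following sketch writes the HTML into an in-memory buffer instead of returning it, and uses the index parameter to leave out the row labels. The variable names buffer, result and html_no_index are illustrative choices for this sketch.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

# index=False omits the column of row labels (0, 1) from the table.
html_no_index = df.to_html(index=False)

# When buf is given, the HTML is written to it and the method returns None.
buffer = StringIO()
result = df.to_html(buf=buffer, index=False)

print(result)             # None
print(buffer.getvalue())  # the HTML table written to the buffer
```

A file path or an open file object can be passed as buf in the same way, which may be convenient when the table should be saved to disk rather than printed.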