
Python Pandas - Working with HTML Data
The Pandas library can read and write data in many formats. One of them is HTML (HyperText Markup Language), the standard format for structuring web content. HTML files often contain tabular data, which can be extracted and analyzed with Pandas.
An HTML table represents tabular data as rows and columns within a webpage. This data can be extracted with the pandas.read_html() function, and a Pandas DataFrame can be written back to an HTML table with the DataFrame.to_html() method.
In this tutorial, we will learn how to work with HTML data using Pandas, including reading HTML tables and writing Pandas DataFrames to HTML tables.
Reading HTML Tables from a URL
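The pandas.read_html() function reads tables from HTML files, strings, or URLs. It parses the <table> elements in the HTML and returns a list of pandas.DataFrame objects, one per table found. It relies on an installed HTML parser library, such as lxml, or BeautifulSoup4 together with html5lib.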
Example
Here is a basic example of reading data from a URL using the pandas.read_html() function.

    import pandas as pd

    # Read all HTML tables from a URL
    url = "https://www.tutorialspoint.com/sql/sql-clone-tables.htm"
    tables = pd.read_html(url)

    # Access the first table from the URL
    df = tables[0]

    # Display the resultant DataFrame
    print('Output First DataFrame:')
    print(df.head())

Following is the output of the above code −
Output First DataFrame:
| | ID | NAME | AGE | ADDRESS | SALARY |
|---|---|---|---|---|---|
| 0 | 1 | Ramesh | 32 | Ahmedabad | 2000.0 |
| 1 | 2 | Khilan | 25 | Delhi | 1500.0 |
| 2 | 3 | Kaushik | 23 | Kota | 2000.0 |
| 3 | 4 | Chaitali | 25 | Mumbai | 6500.0 |
| 4 | 5 | Hardik | 27 | Bhopal | 8500.0 |
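Because read_html() always returns a list, even when a page holds a single table, it can help to inspect every entry before choosing one. The following is a minimal offline sketch of that idea. The HTML content is made up for illustration and is not taken from the URL above.

```python
from io import StringIO

import pandas as pd

# Hypothetical page content with two separate <table> elements.
html = """
<table>
  <tr><th>ID</th><th>NAME</th></tr>
  <tr><td>1</td><td>Ramesh</td></tr>
  <tr><td>2</td><td>Khilan</td></tr>
</table>
<table>
  <tr><th>CITY</th><th>SALARY</th></tr>
  <tr><td>Delhi</td><td>1500.0</td></tr>
</table>
"""

# read_html() returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(StringIO(html))
table_count = len(tables)

# Print a short summary of each table so the right one can be picked by position.
for i, table in enumerate(tables):
    print(f"Table {i}: shape={table.shape}, columns={list(table.columns)}")
```

Looking at the shapes and column names first is usually safer than assuming the wanted table sits at index 0.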
Reading HTML Data from a String
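HTML data can be read directly from a string by wrapping the string in Python's io.StringIO class. Pandas 2.1 and later deprecate passing a literal HTML string to read_html(), so wrapping it in StringIO is the recommended approach.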
Example
The following example demonstrates how to read the HTML string using StringIO without saving to a file.

    import pandas as pd
    from io import StringIO

    # Create an HTML string
    html_str = """
    <table>
      <tr><th>C1</th><th>C2</th><th>C3</th></tr>
      <tr><td>a</td><td>b</td><td>c</td></tr>
      <tr><td>x</td><td>y</td><td>z</td></tr>
    </table>
    """

    # Read the HTML string
    dfs = pd.read_html(StringIO(html_str))
    print(dfs[0])

Following is the output of the above code −
| | C1 | C2 | C3 |
|---|---|---|---|
| 0 | a | b | c |
| 1 | x | y | z |
Example
This is an alternative way of reading the HTML string without using io.StringIO. Here we save the HTML string to a temporary file and read that file using the pandas.read_html() function.

    import pandas as pd

    # Create an HTML string
    html_str = """
    <table>
      <tr><th>C1</th><th>C2</th><th>C3</th></tr>
      <tr><td>a</td><td>b</td><td>c</td></tr>
      <tr><td>x</td><td>y</td><td>z</td></tr>
    </table>
    """

    # Save to a temporary file and read it back
    with open("temp.html", "w") as f:
        f.write(html_str)

    df = pd.read_html("temp.html")[0]
    print(df)

Following is the output of the above code −
| | C1 | C2 | C3 |
|---|---|---|---|
| 0 | a | b | c |
| 1 | x | y | z |
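The file-based approach leaves the file behind in the working directory. As a possible variation, the sketch below uses the standard tempfile module so the file is removed automatically once it has been read. The file name table.html is an arbitrary choice for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Same HTML string as in the example above.
html_str = """
<table>
  <tr><th>C1</th><th>C2</th><th>C3</th></tr>
  <tr><td>a</td><td>b</td><td>c</td></tr>
  <tr><td>x</td><td>y</td><td>z</td></tr>
</table>
"""

# The directory and everything in it are deleted when the with block exits.
with tempfile.TemporaryDirectory() as tmp_dir:
    html_path = Path(tmp_dir) / "table.html"  # hypothetical file name
    html_path.write_text(html_str, encoding="utf-8")

    # Pass the path as a string; read_html() treats it as a file to open.
    df = pd.read_html(str(html_path))[0]

print(df)
```

The DataFrame is fully loaded into memory inside the with block, so it remains usable after the temporary file is gone.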
Handling Multiple Tables from an HTML file
When an HTML file contains multiple tables, the match parameter of the pandas.read_html() function can be used to return only the tables that contain specific text. The value can be a string or a regular expression.
Example
The following example uses the match parameter to read a table containing specific text from an HTML page that has multiple tables.

    import pandas as pd

    # Read only the tables containing the text 'Field' from a SQL tutorial page
    url = "https://www.tutorialspoint.com/sql/sql-clone-tables.htm"
    tables = pd.read_html(url, match='Field')

    # Access the first matching table
    df = tables[0]
    print(df.head())

Following is the output of the above code −
| | Field | Type | Null | Key | Default | Extra |
|---|---|---|---|---|---|---|
| 1 | ID | int(11) | NO | PRI | NaN | NaN |
| 2 | NAME | varchar(20) | NO | NaN | NaN | NaN |
| 3 | AGE | int(11) | NO | NaN | NaN | NaN |
| 4 | ADDRESS | char(25) | YES | NaN | NaN | NaN |
| 5 | SALARY | decimal(18,2) | YES | NaN | NaN | NaN |
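The effect of match can also be checked offline. The sketch below builds a made-up HTML string with two tables and compares the result with and without match. It also shows that read_html() raises a ValueError when no table matches, which is worth handling when scraping pages whose layout may change. The search text 'Missing' is an arbitrary example value.

```python
from io import StringIO

import pandas as pd

# Hypothetical HTML with two tables; only the second contains the text "Field".
html = """
<table>
  <tr><th>ID</th><th>NAME</th></tr>
  <tr><td>1</td><td>Ramesh</td></tr>
</table>
<table>
  <tr><th>Field</th><th>Type</th></tr>
  <tr><td>ID</td><td>int(11)</td></tr>
  <tr><td>NAME</td><td>varchar(20)</td></tr>
</table>
"""

# Without match, every table is returned.
all_tables = pd.read_html(StringIO(html))

# With match, only tables whose text matches the pattern are returned.
matched = pd.read_html(StringIO(html), match="Field")
print(matched[0])

# If nothing matches, read_html() raises ValueError instead of returning [].
try:
    pd.read_html(StringIO(html), match="Missing")
    no_match_error = False
except ValueError:
    no_match_error = True
```

The result of match is still a list, so the matching table is accessed by index just as before.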
Writing DataFrames to HTML
Pandas DataFrame objects can be converted to HTML tables using the DataFrame.to_html() method. This method returns the HTML as a string when the buf parameter is None (the default); otherwise it writes the HTML to buf.
Example
The following example demonstrates how to write a Pandas DataFrame to an HTML table using the DataFrame.to_html() method.

    import pandas as pd

    # Create a DataFrame
    df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

    # Convert the DataFrame to an HTML table
    html = df.to_html()

    # Display the HTML string
    print(html)

Following is the output of the above code −

    <table border="1">
      <thead>
        <tr>
          <th></th>
          <th>A</th>
          <th>B</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>1</td>
          <td>2</td>
        </tr>
        <tr>
          <th>1</th>
          <td>3</td>
          <td>4</td>
        </tr>
      </tbody>
    </table>
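
The example above relies on the default buf=None. As a minimal sketch of the other case, the code below passes a file path as buf so the HTML is written to disk, and uses index=False to leave out the row labels. The temporary directory and the file name table.html are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

# Default call: returns the HTML as a string, including the index column.
html_with_index = df.to_html()

# index=False omits the row labels (0, 1) from the generated table.
html_no_index = df.to_html(index=False)

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "table.html"  # hypothetical output file

    # When buf is given, the HTML is written there and None is returned.
    result = df.to_html(buf=str(out_path), index=False)
    written = out_path.read_text(encoding="utf-8")

print(result)          # None, because the output went to the file
print(html_no_index)   # HTML table without the index column
```

Dropping the index is often preferable when the row labels are just the default 0, 1, 2, ... and carry no meaning for the reader of the page.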