Pandas DataFrame into a PostgreSQL Table

#postgres #datascience #python #tutorial

Final Result

We have this DataFrame in Jupyter Notebook. We clean it using Pandas — filling or deleting null values, adding new columns, converting data types, etc.
Now we want to view the cleaned DataFrame as a table inside a SQL database, so we can perform further analysis using SQL.

How to Convert a Pandas DataFrame into a Database Table

We'll usePandas andSQLAlchemy to connect Jupyter Notebook to a PostgreSQL database.

Steps Overview

Install Jupyter Notebook
Install PostgreSQL
Create a database to store the DataFrame as a table
Prepare the DataFrame in Jupyter
Connect Jupyter to PostgreSQL and convert the DataFrame into a table

1. Install Jupyter Notebook

Open your terminal or PowerShell and navigate (cd) to the directory where you want to store your Jupyter Notebook files.
For example:

cd ~/Documents/

Install and start Jupyter:

pipinstallnotebookjupyter notebook

You should see a new browser window open:

Click“New” → “Python 3 (ipykernel)” to create a new notebook.
This is where we’ll create and clean our DataFrame.

💡 Your laptop is now running a Jupyter Notebook server locally.
You have successfully installed Jupyter Notebook which we will use to write python code.

2. Install PostgreSQL

Open your terminal and run the following commands,for macOS only:

brewinstallpostgresql# install PostgreSQLbrew services list# check if PostgreSQL is runningbrew services start postgresql# start it if not runningpsql postgres# enter psql as the default 'postgres' user

You should see something like this:

psql (14.19 (Homebrew))Type "help" for help.postgres=#

This means you’re connected to a locally running PostgreSQL server.
Use\l to list all databases.

3. Create a Database

Inside the PostgreSQL terminal (psql), create a new database:

postgres=#CREATEDATABASEmydatabase;postgres=#\l-- list all databasespostgres=#\cmydatabase-- connect to the new database

4. Prepare the DataFrame in Jupyter

Run the following code in Jupyter:

!pipinstallpandasimportpandasaspddf=pd.DataFrame({'id':[1,2,3,4,5],'name':['Po','Tai Lung','Shifu','Oogway','Mr Ping'],'salary':[3000,2000,2500,4000,1000],'hire_date':pd.to_datetime(pd.Series(['2018.08.31','2001.01.01','1970.10.30','1950.01.01','2019.01.01']),format='%Y.%m.%d')})df.head()

5. Connect Jupyter to PostgreSQL and Load the DataFrame

!pipinstallpsycopg2-binarySQLAlchemyfromsqlalchemyimportcreate_engine# Step 1: Connect to PostgreSQLusername="postgres"# default userpassword=""# the password you set during installationhost="localhost"# if running locallyport="5432"# default PostgreSQL portdatabase="mydatabase"# the database you created earlierengine=create_engine(f"postgresql+psycopg2://{username}:{password}@{host}:{port}/{database}")# Step 2: Load the DataFrame into PostgreSQLtable_name="mytable"df.to_sql(table_name,engine,if_exists="replace",index=False)print(f"✅ Data successfully loaded into table'{table_name}' in database'{database}'.")

6. Verify the Table in PostgreSQL

In your terminal, run:

mydatabase=#SELECT*FROMmytable;

You should see the same data you created in Jupyter:

 id |   name    | salary |     hire_date----+------------+--------+---------------------  1 | Po         |   3000 | 2018-08-31 00:00:00  2 | Tai Lung   |   2000 | 2001-01-01 00:00:00  3 | Shifu      |   2500 | 1970-10-30 00:00:00  4 | Oogway     |   4000 | 1950-01-01 00:00:00  5 | Mr Ping    |   1000 | 2019-01-01 00:00:00

🏹 We successfully connected our Jupyter Notebook with the locally running psql database and made the dataframe on Jupyter available on our local database.

Additional Useful Commands

Command	Description
`\l`	List all databases
`\c database_name`	Connect to a specific database
`\dt`	List all tables in the current database
`\d table_name`	Show details of a specific table
`\dn`	List all schemas
`\du`	List all users
`\q`	Quit PostgreSQL
`\x`	Toggle expanded output for wide tables