98

The documentation for Pandas has numerous examples of best practices for working with data stored in various formats.

However, I am unable to find any good examples for working with databases like MySQL for example.

Can anyone point me to links or give some code snippets of how to convert query results usingmysql-python to data frames in Pandas efficiently ?

Scott Boston's user avatar
Scott Boston
154k15 gold badges160 silver badges207 bronze badges
askedApr 8, 2012 at 18:01
user1320615's user avatar
4

14 Answers14

103

As Wes says, io/sql's read_sql will do it, once you've gotten a database connection using a DBI compatible library. We can look at two short examples using theMySQLdb andcx_Oracle libraries to connect to Oracle and MySQL and query their data dictionaries. Here is the example forcx_Oracle:

import pandas as pdimport cx_Oracleora_conn = cx_Oracle.connect('your_connection_string')df_ora = pd.read_sql('select * from user_objects', con=ora_conn)    print 'loaded dataframe from Oracle. # Records: ', len(df_ora)ora_conn.close()

And here is the equivalent example forMySQLdb:

import MySQLdbmysql_cn= MySQLdb.connect(host='myhost',                 port=3306,user='myusername', passwd='mypassword',                 db='information_schema')df_mysql = pd.read_sql('select * from VIEWS;', con=mysql_cn)    print 'loaded dataframe from MySQL. records:', len(df_mysql)mysql_cn.close()
Owen's user avatar
Owen
1,6622 gold badges20 silver badges24 bronze badges
answeredJun 21, 2012 at 12:34
Keith C Campbell's user avatar
Sign up to request clarification or add additional context in comments.

Comments

58

For recent readers of this question: pandas have the following warning in theirdocs for version 14.0:

Warning: Some of the existing functions or function aliases have been deprecated and will be removed in future versions. This includes: tquery, uquery, read_frame, frame_query, write_frame.

And:

Warning: The support for the ‘mysql’ flavor when using DBAPI connection objects has been deprecated. MySQL will be further supported with SQLAlchemy engines (GH6900).

This makes many of the answers here outdated. You should usesqlalchemy:

from sqlalchemy import create_engineimport pandas as pdengine = create_engine('dialect://user:pass@host:port/schema', echo=False)f = pd.read_sql_query('SELECT * FROM mytable', engine, index_col = 'ID')
answeredJul 30, 2014 at 7:04
tktk's user avatar

5 Comments

loading a table with 133 rows and 7 columns takes around 30 secs.. can you give some insights regarding why is that?
@idoda [in general this is not the question's topic and it's better to ask a new question so you'd get more opinions]. Are you sure this is not a matter of request delay? Is simply sending the query and retrieving the results significantly faster?
@Korem I did thought about opening a new one, but I wanted to make sure it is not a trivial one first. When I use an mySql client (Sequel pro) and query the database, reuslts come up much faster. When you say "simply sending and then retrieving", is that what you mean? (using a client)
@idoda I mean comparing the time it takes to executeengine.execute("select * FROM mytable") with the time it takes to executepd.read_sql_query('SELECT * FROM mytable', engine)
Can one pass a sqlalchemy query (session.query as in my answer below) directly to a pandas method? That would be a ripper!
23

For the record, here is an example using a sqlite database:

import pandas as pdimport sqlite3with sqlite3.connect("whatever.sqlite") as con:    sql = "SELECT * FROM table_name"    df = pd.read_sql_query(sql, con)    print df.shape
propjk007's user avatar
propjk007
7051 gold badge12 silver badges19 bronze badges
answeredFeb 26, 2013 at 18:58
mbatchkarov's user avatar

1 Comment

You can specify the column to use as an index by specifyingindex_col='timestamp' inframe_query.
21

I prefer to create queries withSQLAlchemy, and then make a DataFrame from it.SQLAlchemy makes it easier to combineSQL conditions Pythonically if you intend to mix and match things over and over.

from sqlalchemy.ext.declarative import declarative_basefrom sqlalchemy import Tablefrom sqlalchemy import create_enginefrom sqlalchemy.orm import sessionmakerfrom pandas import DataFrameimport datetime# We are connecting to an existing serviceengine = create_engine('dialect://user:pwd@host:port/db', echo=False)Session = sessionmaker(bind=engine)session = Session()Base = declarative_base()# And we want to query an existing tabletablename = Table('tablename',     Base.metadata,     autoload=True,     autoload_with=engine,     schema='ownername')# These are the "Where" parameters, but I could as easily # create joins and limit resultsus = tablename.c.country_code.in_(['US','MX'])dc = tablename.c.locn_name.like('%DC%')dt = tablename.c.arr_date >= datetime.date.today() # Give me convenience or...q = session.query(tablename).\            filter(us & dc & dt) # That's where the magic happens!!!def querydb(query):    """    Function to execute query and return DataFrame.    """    df = DataFrame(query.all());    df.columns = [x['name'] for x in query.column_descriptions]    return dfquerydb(q)
answeredApr 28, 2014 at 3:08
dmvianna's user avatar

1 Comment

Also you must specify the driver if it's not the same asthe SQLAlchemy's default one:dialect+driver://user:pwd@host:port/db
11

MySQL example:

import MySQLdb as dbfrom pandas import DataFramefrom pandas.io.sql import frame_querydatabase = db.connect('localhost','username','password','database')data     = frame_query("SELECT * FROM data", database)
answeredJan 21, 2014 at 14:03
aerkenemesis's user avatar

1 Comment

frame_query is now deprecated. Now usepd.read_sql(query, db) instead.
8

The same syntax works for Ms SQL server using podbc also.

import pyodbcimport pandas.io.sql as psqlcnxn = pyodbc.connect('DRIVER={SQL Server};SERVER=servername;DATABASE=mydb;UID=username;PWD=password') cursor = cnxn.cursor()sql = ("""select * from mytable""")df = psql.frame_query(sql, cnxn)cnxn.close()
answeredOct 11, 2012 at 7:18
hedgcutter's user avatar

Comments

6

And this is how you connect to PostgreSQL using psycopg2 driver (install with "apt-get install python-psycopg2" if you're on Debian Linux derivative OS).

import pandas.io.sql as psqlimport psycopg2conn = psycopg2.connect("dbname='datawarehouse' user='user1' host='localhost' password='uberdba'")q = """select month_idx, sum(payment) from bi_some_table"""df3 = psql.frame_query(q, conn)
answeredSep 11, 2013 at 22:30
Will's user avatar

Comments

4

For Sybase the following works (withhttp://python-sybase.sourceforge.net)

import pandas.io.sql as psqlimport Sybasedf = psql.frame_query("<Query>", con=Sybase.connect("<dsn>", "<user>", "<pwd>"))
answeredJan 9, 2013 at 22:36
user1827356's user avatar

Comments

4

pandas.io.sql.frame_query is deprecated. Usepandas.read_sql instead.

joris's user avatar
joris
140k37 gold badges257 silver badges207 bronze badges
answeredDec 17, 2014 at 17:34
ajkl's user avatar

Comments

1

import the module

import pandas as pdimport oursql

connect

conn=oursql.connect(host="localhost",user="me",passwd="mypassword",db="classicmodels")sql="Select customerName, city,country from customers order by customerName,country,city"df_mysql = pd.read_sql(sql,conn)print df_mysql

That works just fine and using pandas.io.sql frame_works (with the deprecation warning). Database used is the sample database from mysql tutorial.

Tobi Nary's user avatar
Tobi Nary
4,6064 gold badges33 silver badges50 bronze badges
answeredFeb 14, 2016 at 14:57
user5925400's user avatar

Comments

0

This should work just fine.

import MySQLdb as mdbimport pandas as pdcon = mdb.connect(‘127.0.0.1’, ‘root’, ‘password’, ‘database_name’);with con: cur = con.cursor() cur.execute(“select random_number_one, random_number_two, random_number_three from randomness.a_random_table”) rows = cur.fetchall() df = pd.DataFrame( [[ij for ij in i] for i in rows] ) df.rename(columns={0: ‘Random Number One’, 1: ‘Random Number Two’, 2: ‘Random Number Three’}, inplace=True); print(df.head(20))
answeredJan 31, 2017 at 4:20
MontyPython's user avatar

Comments

0

This helped for me for connecting toAWS MYSQL(RDS) frompython 3.x basedlambda function and loading into a pandas DataFrame

import jsonimport boto3import pymysqlimport pandas as pduser = 'username'password = 'XXXXXXX'client = boto3.client('rds')def lambda_handler(event, context):    conn = pymysql.connect(host='xxx.xxxxus-west-2.rds.amazonaws.com', port=3306, user=user, passwd=password, db='database name', connect_timeout=5)    df= pd.read_sql('select * from TableName limit 10',con=conn)    print(df)    # TODO implement    #return {    #    'statusCode': 200,    #    'df': df    #}
answeredMay 28, 2019 at 8:20
Dheeraj Inampudi's user avatar

Comments

0

For Postgres users

import psycopg2import pandas as pdconn = psycopg2.connect("database='datawarehouse' user='user1' host='localhost' password='uberdba'")customers = 'select * from customers'customers_df = pd.read_sql(customers,conn)customers_df
answeredMar 6, 2020 at 9:07
EvaMwangi's user avatar

1 Comment

Could you point out the difference to theanswer of @Will and why your solution should be chosen?
0

Usingmysql.connector, you could write something like this:

import mysql.connectorimport pandas as pd# Database credentialsDB_HOST = 'host_ip'DB_NAME = 'db_name or schema'DB_USER = 'user_name'DB_PASS = 'password'try:    # Connect to the database    conn = mysql.connector.connect(        host=DB_HOST,        database=DB_NAME,        user=DB_USER,        password=DB_PASS    )    # Create a cursor object to execute SQL queries    cursor = conn.cursor()    # Example query    query = "SELECT * FROM your_table"    # Execute the query    cursor.execute(query)    # Fetch all the rows    rows = cursor.fetchall()    # Get column names    column_names = [desc[0] for desc in cursor.description]    # Create a DataFrame from the fetched rows and column names    df = pd.DataFrame(rows, columns=column_names)    # Process or analyze the DataFrame as needed    print(df)    # Close the cursor and connection    cursor.close()    conn.close()except mysql.connector.Error as error:    print(f"Failed to connect to MySQL: {error}")
answeredMay 17, 2023 at 18:14
Leo's user avatar

1 Comment

Your answer could be improved with additional supporting information. Pleaseedit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answersin the help center.

Your Answer

Sign up orlog in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

By clicking “Post Your Answer”, you agree to ourterms of service and acknowledge you have read ourprivacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.