The documentation for Pandas has numerous examples of best practices for working with data stored in various formats.
However, I am unable to find any good examples for working with databases such as MySQL.
Can anyone point me to links or give some code snippets showing how to efficiently convert query results obtained with mysql-python into Pandas data frames?
- Documentation: pandas.pydata.org/pandas-docs/stable/io.html#sql-queries (Mechanical snail, Jun 1, 2013 at 13:26)
- See also: stackoverflow.com/questions/15231646/… (Mechanical snail, Jun 1, 2013 at 13:29)
- Also take a look at Blaze. (Sergey Orshanskiy, Dec 6, 2014 at 3:59)
- If you're willing to spend money, I believe that Wes McKinney's book ("Python for Data Analysis") has some useful examples. (SmearingMap, Dec 17, 2014 at 18:00)
14 Answers
As Wes says, io/sql's read_sql will do it once you have a database connection from a DB-API-compatible library. Here are two short examples that use the MySQLdb and cx_Oracle libraries to connect to MySQL and Oracle and query their data dictionaries. Here is the example for cx_Oracle:

    import pandas as pd
    import cx_Oracle

    ora_conn = cx_Oracle.connect('your_connection_string')
    df_ora = pd.read_sql('select * from user_objects', con=ora_conn)
    print('loaded dataframe from Oracle. # Records: ', len(df_ora))
    ora_conn.close()

And here is the equivalent example for MySQLdb:

    import pandas as pd
    import MySQLdb

    mysql_cn = MySQLdb.connect(host='myhost', port=3306, user='myusername', passwd='mypassword', db='information_schema')
    df_mysql = pd.read_sql('select * from VIEWS;', con=mysql_cn)
    print('loaded dataframe from MySQL. records:', len(df_mysql))
    mysql_cn.close()
For recent readers of this question: pandas has the following warnings in its docs for version 0.14.0:
Warning: Some of the existing functions or function aliases have been deprecated and will be removed in future versions. This includes: tquery, uquery, read_frame, frame_query, write_frame.
And:
Warning: The support for the ‘mysql’ flavor when using DBAPI connection objects has been deprecated. MySQL will be further supported with SQLAlchemy engines (GH6900).
This makes many of the answers here outdated. You should use sqlalchemy:

    from sqlalchemy import create_engine
    import pandas as pd

    engine = create_engine('dialect://user:pass@host:port/schema', echo=False)
    f = pd.read_sql_query('SELECT * FROM mytable', engine, index_col='ID')
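To make the SQLAlchemy route concrete, here is a minimal sketch that runs without a database server. It assumes an in-memory SQLite engine (the `sqlite://` URL) standing in for the real `dialect://...` connection string, and the sample rows in `mytable` are made up for illustration; only the `read_sql_query` call mirrors the answer above.

```python
import pandas as pd
from sqlalchemy import create_engine

# Assumption: in-memory SQLite stands in for the real database server.
engine = create_engine("sqlite://", echo=False)

# Hypothetical sample data, written to a table through the same engine.
sample = pd.DataFrame({"ID": [1, 2, 3], "name": ["a", "b", "c"]})
sample.to_sql("mytable", engine, index=False)

# Same call shape as in the answer: SQL string, engine, index column.
df = pd.read_sql_query("SELECT * FROM mytable", engine, index_col="ID")
print(df)
```

Swapping the URL for a real one (for example a MySQL or PostgreSQL dialect with its driver installed) should leave the `read_sql_query` line unchanged, which is the main appeal of going through an engine.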
- engine.execute("select * FROM mytable") with the time it takes to execute pd.read_sql_query('SELECT * FROM mytable', engine)

For the record, here is an example using a SQLite database:
    import pandas as pd
    import sqlite3

    with sqlite3.connect("whatever.sqlite") as con:
        sql = "SELECT * FROM table_name"
        df = pd.read_sql_query(sql, con)
    print(df.shape)
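If the query needs values from Python, passing them through the `params` argument of `read_sql_query` is usually safer than formatting them into the SQL string. The sketch below assumes an in-memory SQLite database and a made-up `country` column; the `?` placeholder is the style used by the sqlite3 driver, and other drivers may expect a different one.

```python
import sqlite3

import pandas as pd

# Assumption: an in-memory database with made-up rows, so the sketch is runnable.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table_name (id INTEGER, country TEXT)")
con.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [(1, "US"), (2, "MX"), (3, "CA")],
)
con.commit()

# The driver substitutes the value for the "?" placeholder.
sql = "SELECT * FROM table_name WHERE country = ?"
df = pd.read_sql_query(sql, con, params=("US",))
print(df.shape)

con.close()
```

Note that `with sqlite3.connect(...) as con:` only commits or rolls back the transaction; it does not close the connection, so an explicit `con.close()` is still needed.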
- index_col='timestamp' in frame_query.

I prefer to create queries with SQLAlchemy, and then make a DataFrame from the result. SQLAlchemy makes it easier to combine SQL conditions Pythonically if you intend to mix and match things over and over.
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy import Table
    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker
    from pandas import DataFrame
    import datetime

    # We are connecting to an existing service
    engine = create_engine('dialect://user:pwd@host:port/db', echo=False)
    Session = sessionmaker(bind=engine)
    session = Session()
    Base = declarative_base()

    # And we want to query an existing table
    tablename = Table('tablename', Base.metadata, autoload=True, autoload_with=engine, schema='ownername')

    # These are the "Where" parameters, but I could as easily
    # create joins and limit results
    us = tablename.c.country_code.in_(['US','MX'])
    dc = tablename.c.locn_name.like('%DC%')
    dt = tablename.c.arr_date >= datetime.date.today() # Give me convenience or...
    q = session.query(tablename).\
        filter(us & dc & dt) # That's where the magic happens!!!

    def querydb(query):
        """
        Function to execute query and return DataFrame.
        """
        df = DataFrame(query.all())
        df.columns = [x['name'] for x in query.column_descriptions]
        return df

    querydb(q)
- dialect+driver://user:pwd@host:port/db

MySQL example:
    import MySQLdb as db
    from pandas import DataFrame
    from pandas.io.sql import frame_query

    database = db.connect('localhost', 'username', 'password', 'database')
    data = frame_query("SELECT * FROM data", database)
- frame_query is now deprecated. Use pd.read_sql(query, db) instead.

The same syntax also works for MS SQL Server using pyodbc.
    import pyodbc
    import pandas.io.sql as psql

    cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER=servername;DATABASE=mydb;UID=username;PWD=password')
    cursor = cnxn.cursor()
    sql = ("""select * from mytable""")
    df = psql.frame_query(sql, cnxn)
    cnxn.close()
And this is how you connect to PostgreSQL using the psycopg2 driver (install it with "apt-get install python-psycopg2" if you're on a Debian-derived Linux OS).

    import pandas.io.sql as psql
    import psycopg2

    conn = psycopg2.connect("dbname='datawarehouse' user='user1' host='localhost' password='uberdba'")
    # The aggregate needs a GROUP BY on the non-aggregated column
    q = """select month_idx, sum(payment) from bi_some_table group by month_idx"""
    df3 = psql.frame_query(q, conn)
For Sybase the following works (with http://python-sybase.sourceforge.net):

    import pandas.io.sql as psql
    import Sybase

    df = psql.frame_query("<Query>", con=Sybase.connect("<dsn>", "<user>", "<pwd>"))
pandas.io.sql.frame_query is deprecated. Use pandas.read_sql instead.
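As a rough illustration of that migration, the sketch below replaces a `frame_query` call with `pd.read_sql` and sets the index column at the same time. The `readings` table, its columns, and the in-memory SQLite connection are assumptions made up so the example runs on its own; with other databases, pandas generally expects a SQLAlchemy engine rather than a raw driver connection.

```python
import sqlite3

import pandas as pd

# Assumption: in-memory SQLite with a hypothetical "readings" table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (timestamp TEXT, value REAL)")
con.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("2013-06-01 13:26:37", 1.5), ("2013-06-01 13:29:21", 2.5)],
)
con.commit()

sql = "SELECT * FROM readings"

# Old, deprecated spelling: psql.frame_query(sql, con)
# Current spelling, here also using the "timestamp" column as the index:
df = pd.read_sql(sql, con, index_col="timestamp")
print(df)

con.close()
```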
Import the modules:

    import pandas as pd
    import oursql

Connect and query:

    conn = oursql.connect(host="localhost", user="me", passwd="mypassword", db="classicmodels")
    sql = "Select customerName, city,country from customers order by customerName,country,city"
    df_mysql = pd.read_sql(sql, conn)
    print(df_mysql)

That works just fine, and using pandas.io.sql's frame_query also works (with the deprecation warning). The database used is the sample database from the MySQL tutorial.
This should work just fine.
    import MySQLdb as mdb
    import pandas as pd

    con = mdb.connect('127.0.0.1', 'root', 'password', 'database_name')
    with con:
        cur = con.cursor()
        cur.execute("select random_number_one, random_number_two, random_number_three from randomness.a_random_table")
        rows = cur.fetchall()
        # Copy each row into a list and build the DataFrame
        df = pd.DataFrame([[ij for ij in i] for i in rows])
        df.rename(columns={0: 'Random Number One', 1: 'Random Number Two', 2: 'Random Number Three'}, inplace=True)
        print(df.head(20))
This worked for me for connecting to AWS MySQL (RDS) from a Python 3.x based Lambda function and loading the result into a pandas DataFrame:

    import json
    import boto3
    import pymysql
    import pandas as pd

    user = 'username'
    password = 'XXXXXXX'
    client = boto3.client('rds')

    def lambda_handler(event, context):
        conn = pymysql.connect(host='xxx.xxxxus-west-2.rds.amazonaws.com', port=3306, user=user, passwd=password, db='database name', connect_timeout=5)
        df = pd.read_sql('select * from TableName limit 10', con=conn)
        print(df)
        # TODO implement
        #return {
        #    'statusCode': 200,
        #    'df': df
        #}
For Postgres users:

    import psycopg2
    import pandas as pd

    # The connection-string keyword for the database name is dbname
    conn = psycopg2.connect("dbname='datawarehouse' user='user1' host='localhost' password='uberdba'")
    customers = 'select * from customers'
    customers_df = pd.read_sql(customers, conn)
    customers_df

Using mysql.connector, you could write something like this:
    import mysql.connector
    import pandas as pd

    # Database credentials
    DB_HOST = 'host_ip'
    DB_NAME = 'db_name or schema'
    DB_USER = 'user_name'
    DB_PASS = 'password'

    try:
        # Connect to the database
        conn = mysql.connector.connect(
            host=DB_HOST,
            database=DB_NAME,
            user=DB_USER,
            password=DB_PASS
        )

        # Create a cursor object to execute SQL queries
        cursor = conn.cursor()

        # Example query
        query = "SELECT * FROM your_table"

        # Execute the query
        cursor.execute(query)

        # Fetch all the rows
        rows = cursor.fetchall()

        # Get column names
        column_names = [desc[0] for desc in cursor.description]

        # Create a DataFrame from the fetched rows and column names
        df = pd.DataFrame(rows, columns=column_names)

        # Process or analyze the DataFrame as needed
        print(df)

        # Close the cursor and connection
        cursor.close()
        conn.close()

    except mysql.connector.Error as error:
        print(f"Failed to connect to MySQL: {error}")
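The cursor-based pattern above is not specific to MySQL: any DB-API cursor exposes `fetchall()` and `description`, so the conversion can be factored into a small function. The sketch below is one way to do that; `cursor_to_dataframe` is a hypothetical helper name, and an in-memory SQLite database with made-up rows stands in for the MySQL server so the code can run as-is.

```python
import sqlite3

import pandas as pd


def cursor_to_dataframe(cursor):
    """Hypothetical helper: build a DataFrame from an executed DB-API cursor."""
    # The first item of each description entry is the column name.
    column_names = [desc[0] for desc in cursor.description]
    return pd.DataFrame(cursor.fetchall(), columns=column_names)


# Assumption: in-memory SQLite stands in for the real database connection.
conn = sqlite3.connect(":memory:")
try:
    cursor = conn.cursor()
    cursor.execute("CREATE TABLE your_table (id INTEGER, name TEXT)")
    cursor.executemany(
        "INSERT INTO your_table VALUES (?, ?)", [(1, "alice"), (2, "bob")]
    )
    cursor.execute("SELECT * FROM your_table")
    df = cursor_to_dataframe(cursor)
    print(df)
finally:
    # Runs even if a query fails, so the connection is always released.
    conn.close()
```

Putting the `close()` call in a `finally` block is a small design change from the snippet above, where an exception raised by the query would skip the cleanup lines.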









