Power up your data science workflow with ChatGPT.
rvanasa/pandas-gpt
pandas-gpt is a Python library for doing almost anything with a pandas DataFrame using ChatGPT or any other Large Language Model (LLM).
pip install pandas-gpt[openai]
You may also want to install the optional openai and/or litellm dependencies.
Next, set the OPENAI_API_KEY environment variable to your OpenAI API key, or use the following code snippet:
import openai
openai.api_key = '<API Key>'
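As a minimal sketch of the environment-variable route, the key can also be set from Python before pandas_gpt is used. The helper name `ensure_openai_key` is hypothetical and is not part of pandas-gpt or openai; it only reads and writes `os.environ`.

```python
import os


def ensure_openai_key(fallback=None):
    """Return the key from OPENAI_API_KEY, setting it from `fallback` if unset.

    Hypothetical helper for illustration only; not part of pandas-gpt.
    """
    key = os.environ.get('OPENAI_API_KEY')
    if not key:
        if fallback is None:
            raise RuntimeError('OPENAI_API_KEY is not set and no fallback was given')
        # Store the fallback so libraries that read the variable later can see it.
        os.environ['OPENAI_API_KEY'] = fallback
        key = fallback
    return key
```

Calling `ensure_openai_key()` with no argument at the top of a notebook fails early with a clear message if the variable is missing, rather than at the first request.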
If you're looking for a free alternative to the OpenAI API, we encourage using Google Gemini for code completion:
pip install pandas-gpt[litellm]
import pandas_gpt
pandas_gpt.completer = pandas_gpt.LiteLLM('gemini/gemini-1.5-pro', api_key='...')
Setup and usage examples are available in this Google Colab notebook.
import pandas as pd
import pandas_gpt

# Load the demo CSV from its URL (pd.read_csv accepts URLs; pd.DataFrame does not)
df = pd.read_csv('https://gist.githubusercontent.com/bluecoconut/9ce2135aafb5c6ab2dc1d60ac595646e/raw/c93c3500a1f7fae469cba716f09358cfddea6343/sales_demo_with_pii_and_all_states.csv')

# Data transformation
df = df.ask('drop purchases from Laurenchester, NY')
df = df.ask('add a new Category column with values "cheap", "regular", or "expensive"')

# Queries
weekday = df.ask('which day of the week had the largest number of orders?')
top_10 = df.ask('what are the top 10 most popular products, as a table')

# Plotting
df.ask('plot monthly and hourly sales')
top_10.ask('horizontal bar plot with pastel colors')

# Allow changes to original dataset
df.ask('do something interesting', mutable=True)

# Show source code before running
df.ask('convert prices from USD to GBP', verbose=True)
It's possible to use a different language model with the completer config option:
import pandas_gpt

# Global default
pandas_gpt.completer = pandas_gpt.OpenAI('gpt-3.5-turbo')

# Custom completer for a specific request
df.ask('Do something interesting with the data', completer=pandas_gpt.LiteLLM('gemini/gemini-1.5-pro'))
By default, API keys are picked up from environment variables such as OPENAI_API_KEY. It's also possible to specify an API key for a particular call:
df.ask('Do something important with the data', completer=pandas_gpt.OpenAI('gpt-4o', api_key='...'))
pandas_gpt.completer = pandas_gpt.OpenAI('gpt-4o')
pandas_gpt.completer = pandas_gpt.LiteLLM('gemini/gemini-1.5-pro')
pandas_gpt.completer = pandas_gpt.LiteLLM('huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct')
pandas_gpt.completer = pandas_gpt.OpenRouter('anthropic/claude-3.5-sonnet')
def my_custom_completer(prompt: str) -> str:
    # Use an LLM or any other method to create a `process()` function that
    # takes a pandas DataFrame as a single argument, does some operations on it,
    # and returns a DataFrame.
    return 'def process(df): ...'

pandas_gpt.completer = my_custom_completer
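Since a custom completer returns Python source as a string, it can help to validate that string before handing it over. The following is a minimal sketch, assuming only what the snippet above states: the completer receives a prompt and returns source that defines a top-level `process(df)` function. The names `make_checked_completer` and `identity_completer` are hypothetical and not part of pandas-gpt.

```python
import ast


def make_checked_completer(inner):
    """Wrap a completer so its output is checked before it is returned.

    Hypothetical helper for illustration only; not part of pandas-gpt.
    """
    def completer(prompt: str) -> str:
        source = inner(prompt)
        # ast.parse raises SyntaxError if the source is not valid Python.
        tree = ast.parse(source)
        has_process = any(
            isinstance(node, ast.FunctionDef) and node.name == 'process'
            for node in tree.body
        )
        if not has_process:
            raise ValueError('completer output does not define process(df)')
        return source
    return completer


def identity_completer(prompt: str) -> str:
    # Stand-in for an LLM call: always returns a process() that leaves df unchanged.
    return 'def process(df):\n    return df\n'


checked_completer = make_checked_completer(identity_completer)
```

The wrapped function could then be assigned the same way as any other custom completer, for example `pandas_gpt.completer = checked_completer`. The check is purely syntactic and says nothing about whether the generated code is safe to run.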
If you want to use a fully customized API host such as Azure OpenAI Service, you can globally configure the openai and pandas-gpt packages:
import openai
openai.api_type = 'azure'
openai.api_base = '<Endpoint>'
openai.api_version = '<Version>'
openai.api_key = '<API Key>'

import pandas_gpt
pandas_gpt.completer = pandas_gpt.OpenAI(
    model='gpt-3.5-turbo',
    engine='<Engine>',
    deployment_id='<Deployment ID>',
)
- GitHub Copilot: General-purpose code completion (paid subscription)
- Sketch: AI-powered data summarization and code suggestions (works without an API key)
Please note that the limitations of ChatGPT also apply to this library. I would recommend using pandas-gpt in a sandboxed environment such as Google Colab, Kaggle, or GitPod.