Power up your data science workflow with ChatGPT.
rvanasa/pandas-gpt
pandas-gpt is a Python library for doing almost anything with a pandas DataFrame using ChatGPT or any other Large Language Model (LLM).
```bash
pip install pandas-gpt[openai]
```

You may also want to install the optional `openai` and/or `litellm` dependencies.
Next, set the `OPENAI_API_KEY` environment variable to your OpenAI API key, or use the following code snippet:

```python
import openai

openai.api_key = '<API Key>'
```
If you're looking for a free alternative to the OpenAI API, we encourage using Google Gemini for code completion:

```bash
pip install pandas-gpt[litellm]
```

```python
import pandas_gpt

pandas_gpt.completer = pandas_gpt.LiteLLM('gemini/gemini-1.5-pro', api_key='...')
```
Setup and usage examples are available in the project's Google Colab notebook.
```python
import pandas as pd
import pandas_gpt  # importing pandas_gpt makes df.ask(...) available

# Load the demo CSV (pd.DataFrame() cannot read a URL; pd.read_csv() can)
df = pd.read_csv('https://gist.githubusercontent.com/bluecoconut/9ce2135aafb5c6ab2dc1d60ac595646e/raw/c93c3500a1f7fae469cba716f09358cfddea6343/sales_demo_with_pii_and_all_states.csv')

# Data transformation
df = df.ask('drop purchases from Laurenchester, NY')
df = df.ask('add a new Category column with values "cheap", "regular", or "expensive"')

# Queries
weekday = df.ask('which day of the week had the largest number of orders?')
top_10 = df.ask('what are the top 10 most popular products, as a table')

# Plotting
df.ask('plot monthly and hourly sales')
top_10.ask('horizontal bar plot with pastel colors')

# Allow changes to original dataset
df.ask('do something interesting', mutable=True)

# Show source code before running
df.ask('convert prices from USD to GBP', verbose=True)
```
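The comments on the last two calls suggest that `ask` normally leaves the original DataFrame untouched, that `mutable=True` lifts that restriction, and that `verbose=True` prints the generated code before running it. The sketch below models those semantics for a generated `process(df)` function. `run_generated` is a hypothetical helper written for illustration, not pandas-gpt's internal implementation.

```python
import copy
from typing import Any, Dict


def run_generated(source: str, df: Any, *, mutable: bool = False, verbose: bool = False) -> Any:
    """Hypothetical sketch: execute generated source that defines process(df), then apply it.

    Models our reading of the `mutable` and `verbose` options; not pandas-gpt's own code.
    """
    if verbose:
        # Show the generated code before it runs, as verbose=True is described to do.
        print(source)
    # Work on a deep copy unless the caller explicitly allows changes to the original.
    target = df if mutable else copy.deepcopy(df)
    namespace: Dict[str, Any] = {}
    exec(source, namespace)  # defines process() inside an isolated namespace dict
    process = namespace.get("process")
    if not callable(process):
        raise ValueError("generated code did not define a callable process(df)")
    return process(target)
```

Deep-copying by default keeps the caller's data safe at the cost of extra memory on large DataFrames; for pandas objects, `copy.deepcopy` delegates to `DataFrame.copy(deep=True)`.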
It's possible to use a different language model with the `completer` config option:

```python
import pandas_gpt

# Global default
pandas_gpt.completer = pandas_gpt.OpenAI('gpt-3.5-turbo')

# Custom completer for a specific request
df.ask('Do something interesting with the data', completer=pandas_gpt.LiteLLM('gemini/gemini-1.5-pro'))
```
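If you want a different global default for a whole block of code rather than a single call, a small context manager can swap `pandas_gpt.completer` and restore it afterwards. This is a hypothetical helper (`temporary_completer` is not part of pandas-gpt), and it assumes the library reads `pandas_gpt.completer` each time `ask` runs, which the global-default example suggests but which we have not verified.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def temporary_completer(module: Any, completer: Any) -> Iterator[None]:
    """Hypothetical helper: replace module.completer for the duration of a with-block."""
    previous = module.completer
    module.completer = completer
    try:
        yield
    finally:
        # Restore the previous global default even if a request inside the block raised.
        module.completer = previous
```

Wrap the requests in `with temporary_completer(pandas_gpt, pandas_gpt.LiteLLM('gemini/gemini-1.5-pro')):` and the previous completer comes back when the block exits.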
By default, API keys are picked up from environment variables such as `OPENAI_API_KEY`. It's also possible to specify an API key for a particular call:

```python
df.ask('Do something important with the data', completer=pandas_gpt.OpenAI('gpt-4o', api_key='...'))
```
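The key lookup order described above (an explicit `api_key` first, then the environment variable) can be mirrored in your own code to fail early with a clear message before any request is sent. A minimal sketch: `resolve_api_key` and its `environ` parameter are hypothetical, the latter added so the lookup can be tested without touching the real environment.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(explicit: Optional[str] = None,
                    env_var: str = "OPENAI_API_KEY",
                    environ: Optional[Mapping[str, str]] = None) -> str:
    """Prefer an explicitly passed key; otherwise fall back to an environment variable."""
    if explicit:
        return explicit
    env = os.environ if environ is None else environ
    key = env.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"No API key found: pass api_key=... or set {env_var}")
    return key
```

For example, `pandas_gpt.OpenAI('gpt-4o', api_key=resolve_api_key())` raises immediately when `OPENAI_API_KEY` is unset, instead of failing on the first `df.ask` call.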
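The same option works with other providers:

```python
# OpenAI
pandas_gpt.completer = pandas_gpt.OpenAI('gpt-4o')

# Google Gemini (via LiteLLM)
pandas_gpt.completer = pandas_gpt.LiteLLM('gemini/gemini-1.5-pro')

# Hugging Face (via LiteLLM)
pandas_gpt.completer = pandas_gpt.LiteLLM('huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct')

# Anthropic (via OpenRouter)
pandas_gpt.completer = pandas_gpt.OpenRouter('anthropic/claude-3.5-sonnet')
```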
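To choose among these providers at runtime (say, from a config file or an environment variable), you can keep the model strings in a lookup table and build the completer by name. A sketch under assumptions: the preset keys, `PRESETS`, and `make_completer` are hypothetical, and it assumes each completer class accepts the model string as its first positional argument, as in the examples above. Extra keyword arguments such as `api_key` are forwarded unchanged.

```python
from typing import Any, Dict, Tuple

# Hypothetical presets copied from the provider examples: name -> (completer class, model string).
PRESETS: Dict[str, Tuple[str, str]] = {
    "openai": ("OpenAI", "gpt-4o"),
    "gemini": ("LiteLLM", "gemini/gemini-1.5-pro"),
    "llama": ("LiteLLM", "huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct"),
    "claude": ("OpenRouter", "anthropic/claude-3.5-sonnet"),
}


def make_completer(module: Any, provider: str, **kwargs: Any) -> Any:
    """Build a completer from a preset name, looking the class up on the given module."""
    try:
        class_name, model = PRESETS[provider]
    except KeyError:
        raise ValueError(f"unknown provider {provider!r}; choose from {sorted(PRESETS)}") from None
    # Pass the model string positionally, plus any extra options such as api_key.
    return getattr(module, class_name)(model, **kwargs)
```

For example, `pandas_gpt.completer = make_completer(pandas_gpt, 'gemini')` is equivalent to the `LiteLLM('gemini/gemini-1.5-pro')` line above. Passing the module in as an argument keeps the helper testable without pandas-gpt installed.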
A custom completer can be any function that takes a prompt string and returns Python source code:

```python
def my_custom_completer(prompt: str) -> str:
    # Use an LLM or any other method to create a `process()` function that
    # takes a pandas DataFrame as a single argument, does some operations on it,
    # and returns a DataFrame.
    return 'def process(df): ...'

pandas_gpt.completer = my_custom_completer
```
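Because a custom completer is just a function from a prompt string to source code, you can plug in a deterministic one to exercise code that calls `df.ask` without network access or API costs. A minimal sketch, assuming pandas-gpt calls the completer with one prompt string and executes the returned `process` definition as the comments in `my_custom_completer` describe; `make_recording_completer` and `IDENTITY_SOURCE` are hypothetical names.

```python
from typing import Callable, List, Tuple

# Hypothetical default: a process() that returns its input unchanged.
IDENTITY_SOURCE = "def process(df):\n    return df\n"


def make_recording_completer(source: str = IDENTITY_SOURCE) -> Tuple[Callable[[str], str], List[str]]:
    """Hypothetical offline completer: always returns `source` and records every prompt."""
    prompts: List[str] = []

    def completer(prompt: str) -> str:
        prompts.append(prompt)  # keep the exact prompt text for later inspection
        return source

    return completer, prompts
```

After `completer, prompts = make_recording_completer()` and `pandas_gpt.completer = completer`, each `df.ask(...)` call should run the identity `process`, and `prompts` holds the prompt text pandas-gpt generated, which is handy for seeing what a real model would receive.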
If you want to use a fully customized API host such as Azure OpenAI Service, you can globally configure the `openai` and `pandas-gpt` packages:

```python
import openai

openai.api_type = 'azure'
openai.api_base = '<Endpoint>'
openai.api_version = '<Version>'
openai.api_key = '<API Key>'

import pandas_gpt

pandas_gpt.completer = pandas_gpt.OpenAI(
    model='gpt-3.5-turbo',
    engine='<Engine>',
    deployment_id='<Deployment ID>',
)
```
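Rather than hard-coding the `<Endpoint>`, `<Version>`, `<API Key>`, `<Engine>`, and `<Deployment ID>` placeholders, you may prefer to read them from the environment. A sketch under assumptions: the environment variable names in `AZURE_VARS` and the `load_azure_settings` helper are hypothetical, chosen for this example only.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical mapping: setting name used in the snippet above -> environment variable name.
AZURE_VARS: Dict[str, str] = {
    "api_base": "AZURE_OPENAI_ENDPOINT",
    "api_version": "AZURE_OPENAI_API_VERSION",
    "api_key": "AZURE_OPENAI_API_KEY",
    "engine": "AZURE_OPENAI_ENGINE",
    "deployment_id": "AZURE_OPENAI_DEPLOYMENT_ID",
}


def load_azure_settings(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect the Azure settings from the environment, reporting every missing variable at once."""
    env = os.environ if environ is None else environ
    missing = [var for var in AZURE_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[var] for name, var in AZURE_VARS.items()}
```

Then assign `openai.api_base = settings['api_base']` (and likewise `api_version` and `api_key`) and pass `engine=settings['engine']` and `deployment_id=settings['deployment_id']` to `pandas_gpt.OpenAI`, as in the snippet above.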
Alternatives:

- GitHub Copilot: General-purpose code completion (paid subscription)
- Sketch: AI-powered data summarization and code suggestions (works without an API key)
Please note that the limitations of ChatGPT also apply to this library. Because pandas-gpt runs code generated by the language model, I would recommend using it in a sandboxed environment such as Google Colab, Kaggle, or GitPod.
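Since pandas-gpt runs model-generated code, you can add a cheap screen in front of execution in addition to sandboxing. The sketch below wraps a plain `prompt -> code` completer such as `my_custom_completer` and rejects output that imports or calls anything on a small deny-list. The deny-lists, `check_generated_code`, and `guarded` are hypothetical, and whether pandas-gpt's built-in completer objects can be wrapped the same way is an assumption. A static check like this is easy to bypass (for example through pandas' own file-writing methods), so it complements a sandbox rather than replacing one.

```python
import ast
from typing import Callable, FrozenSet

# Hypothetical deny-lists; adjust them to your environment.
BLOCKED_MODULES: FrozenSet[str] = frozenset({"os", "subprocess", "shutil", "socket", "sys"})
BLOCKED_CALLS: FrozenSet[str] = frozenset({"exec", "eval", "open", "compile", "__import__"})


def check_generated_code(source: str) -> None:
    """Raise ValueError if the code imports a blocked module or calls a blocked builtin by name."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            modules = []
        for module in modules:
            # Compare the top-level package, so "os.path" is caught by "os".
            if module.split(".")[0] in BLOCKED_MODULES:
                raise ValueError(f"blocked import: {module}")
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id in BLOCKED_CALLS:
            raise ValueError(f"blocked call: {node.func.id}()")


def guarded(completer: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a prompt -> code completer so its output is screened before it is returned."""
    def wrapper(prompt: str) -> str:
        source = completer(prompt)
        check_generated_code(source)
        return source
    return wrapper
```

For example, `pandas_gpt.completer = guarded(my_custom_completer)`.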