joeornstein/promptr


Format and Complete Few-Shot LLM Prompts

License

Unknown, MIT licenses found


promptr

We developed the promptr package so that researchers could easily format and submit LLM prompts using the R programming language. It provides a handful of convenient functions to query the OpenAI API and return the output as a tidy R dataframe. The package is intended to be particularly useful for social scientists using LLMs for text classification and scaling tasks.

Installation

You can install the release version of promptr from CRAN with:

install.packages('promptr')

Or you can install the latest development version from GitHub with:

# install.packages("devtools")
devtools::install_github("joeornstein/promptr")

You will also need a developer account with OpenAI and an API key. For best performance, you may also want to provide credit card information (this significantly boosts your API rate limit, even if you’re not spending money).

Once your account is created, copy-paste your API key into the following line of R code.

library(promptr)
openai_api_key('YOUR API KEY GOES HERE', install = TRUE)

Now you’re all set up!

Completing Prompts

The workhorse function of the promptr package is complete_prompt(). This function submits a prompt to the OpenAI API and returns a dataframe with the five most likely next word predictions and their associated probabilities.

library(promptr)
complete_prompt('I feel like a')
#>    token probability
#> 1    lot  0.20985606
#> 2 little  0.02118042
#> 3    kid  0.01374532
#> 4    new  0.01208388
#> 5    big  0.01204145

If you prefer the model to autoregressively generate text instead of outputting the next-word probabilities, you can set the max_tokens input greater than 1. The function will return a character object with the most likely completion.

complete_prompt('I feel like a', max_tokens = 18)
#> [1] " lot of people are gonna be like, \"Oh, I'm gonna be a doctor.\"\n\n"

Note that by default, the temperature input is set to 0, which means the model will always return the most likely completion for your prompt. Increasing temperature allows the model to randomly select words from its estimated probability distribution (see the API reference for more on these parameters).
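
To make the effect of temperature concrete, here is a minimal base-R sketch of softmax-style temperature scaling. The helper apply_temperature() and the three probabilities are made up for illustration; neither is part of promptr. The API applies temperature to the model's full vocabulary distribution, so rescaling a handful of tokens like this is only an approximation of the idea.

```r
# Hypothetical helper (not part of promptr): rescale a probability vector
# by a temperature, the way softmax temperature scaling works.
apply_temperature <- function(probs, temperature) {
  stopifnot(all(probs > 0), temperature >= 0)
  if (temperature == 0) {
    # Limiting case: all mass goes to the single most likely token
    out <- as.numeric(seq_along(probs) == which.max(probs))
  } else {
    logits <- log(probs) / temperature
    out <- exp(logits - max(logits))  # subtract the max for numerical stability
    out <- out / sum(out)
  }
  names(out) <- names(probs)
  out
}

# Made-up probabilities for three candidate next tokens
p <- c(lot = 0.6, little = 0.3, kid = 0.1)

apply_temperature(p, 0)  # deterministic: always "lot"
apply_temperature(p, 1)  # the original distribution
apply_temperature(p, 2)  # flatter: unlikely tokens are sampled more often
```

At temperature 0 the most likely token is always chosen, which is why the default setting gives reproducible completions.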

You can also change which model variant the function calls using the model input. By default, it is set to “gpt-3.5-turbo-instruct”, the RLHF variant of GPT-3.5. For the base GPT-3 variants, try “davinci-002” (175 billion parameters) or “babbage-002” (1.3 billion parameters).

Formatting Prompts

Manually typing prompts with multiple few-shot examples can be tedious and error-prone, particularly if you want to include context-specific instructions. We include the format_prompt() function to aid in that process.

The function is designed with classification problems in mind. If you input the text you would like to classify along with a set of instructions, the default prompt template looks like this:

prompt <- format_prompt(text = 'I feel positively morose today.',
                        instructions = 'Decide whether this statement is happy or sad.')
prompt
#> Decide whether this statement is happy or sad.
#>
#> Text: I feel positively morose today.
#> Classification:

You can customize the template using glue syntax, with placeholders for {text} and {label}.

format_prompt(text = 'I feel positively morose today.',
              instructions = 'Decide whether this statement is happy or sad.',
              template = 'Statement: {text}\nSentiment: {label}')
#> Decide whether this statement is happy or sad.
#>
#> Statement: I feel positively morose today.
#> Sentiment:

This function is particularly useful when including few-shot examples in the prompt. If you input these examples as a tidy dataframe, the format_prompt() function will paste them into the prompt according to the template. The examples dataframe must have at least two columns, one called “text” and the other called “label”.

examples <- data.frame(
  text = c('What a pleasant day!', 'Oh bother.',
           'Merry Christmas!', ':-('),
  label = c('happy', 'sad', 'happy', 'sad')
)
examples
#>                   text label
#> 1 What a pleasant day! happy
#> 2           Oh bother.   sad
#> 3     Merry Christmas! happy
#> 4                  :-(   sad

prompt <- format_prompt(text = 'I feel positively morose today.',
                        instructions = 'Decide whether this statement is happy or sad.',
                        examples = examples,
                        template = 'Statement: {text}\nSentiment: {label}')
prompt
#> Decide whether this statement is happy or sad.
#>
#> Statement: What a pleasant day!
#> Sentiment: happy
#>
#> Statement: Oh bother.
#> Sentiment: sad
#>
#> Statement: Merry Christmas!
#> Sentiment: happy
#>
#> Statement: :-(
#> Sentiment: sad
#>
#> Statement: I feel positively morose today.
#> Sentiment:
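
For readers curious how a template and an examples dataframe combine into a single prompt, the following base-R sketch reproduces the layout shown above. The helpers fill_template() and build_prompt() are hypothetical stand-ins written for illustration; they are not promptr's implementation, which relies on glue.

```r
# Hypothetical helper: substitute {text} and {label} into a template.
# An empty label leaves the final line open for the model to complete.
fill_template <- function(template, text, label = "") {
  out <- sub("{text}", text, template, fixed = TRUE)
  out <- sub("{label}", label, out, fixed = TRUE)
  trimws(out, which = "right")
}

# Hypothetical helper: instructions, then each labelled example,
# then the unlabelled text, separated by blank lines.
build_prompt <- function(text, instructions, examples, template) {
  shots <- mapply(fill_template,
                  text = examples$text, label = examples$label,
                  MoreArgs = list(template = template),
                  USE.NAMES = FALSE)
  query <- fill_template(template, text)
  paste(c(instructions, shots, query), collapse = "\n\n")
}

two_examples <- data.frame(
  text = c("What a pleasant day!", "Oh bother."),
  label = c("happy", "sad"),
  stringsAsFactors = FALSE
)

sketch <- build_prompt(
  text = "I feel positively morose today.",
  instructions = "Decide whether this statement is happy or sad.",
  examples = two_examples,
  template = "Statement: {text}\nSentiment: {label}"
)
cat(sketch)
```

The key design point is that the text to classify is formatted with the same template as the examples but with the label left blank, so the model's next token is the predicted label.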

Once you’re satisfied with the format of the prompt, you can submit it with complete_prompt():

complete_prompt(prompt)
#>     token  probability
#> 1     sad 9.990284e-01
#> 2     sad 6.382159e-04
#> 3     Sad 1.961563e-04
#> 4   happy 3.677703e-05
#> 5 sadness 2.776648e-05

The full pipeline (first formatting the text into a prompt, then submitting the prompt for completion) looks like this:

'What a joyous day for our adversaries.' |>
  format_prompt(instructions = 'Classify this text as happy or sad.',
                examples = examples) |>
  complete_prompt()
#>     token  probability
#> 1     sad 0.9931754130
#> 2   happy 0.0023576333
#> 3     sad 0.0021634900
#> 4     Sad 0.0007275062
#> 5 unhappy 0.0006792638

The biggest advantage of using text prompts like these is efficiency. One can request up to 2,048 next-word probability distributions in a single API call, whereas ChatGPT prompts (see next section) can only be submitted one at a time. Both the format_prompt() function and the complete_prompt() function are vectorized so that users can submit multiple texts to be classified simultaneously.

texts <- c('What a wonderful world??? As if!',
           'Things are looking up.',
           'Me gusta mi vida.')

texts |>
  format_prompt(instructions = 'Classify these texts as happy or sad.',
                examples = examples) |>
  complete_prompt()
#> [[1]]
#>     token  probability
#> 1     sad 0.9845923503
#> 2   happy 0.0101702041
#> 3     sad 0.0022756506
#> 4 unhappy 0.0005526699
#> 5         0.0005016985
#>
#> [[2]]
#>   token  probability
#> 1 happy 9.989103e-01
#> 2 happy 8.046505e-04
#> 3       7.620519e-05
#> 4       5.893237e-05
#> 5 Happy 2.052843e-05
#>
#> [[3]]
#>    token  probability
#> 1  happy 0.9957006846
#> 2  happy 0.0012367921
#> 3        0.0009202636
#> 4 unsure 0.0002593114
#> 5        0.0001682163
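
If you have more prompts than fit in one request, the arithmetic of splitting them into batches can be sketched as below. The helper batch_prompts() is hypothetical, and the default of 2,048 is taken from the per-call limit mentioned above. This README does not say whether complete_prompt() batches internally, so treat this only as an illustration of the idea.

```r
# Hypothetical helper (not part of promptr): split a vector of prompts
# into consecutive batches of at most `batch_size` elements.
batch_prompts <- function(prompts, batch_size = 2048) {
  stopifnot(batch_size >= 1)
  batch_id <- ceiling(seq_along(prompts) / batch_size)
  unname(split(prompts, batch_id))
}

# 5,000 placeholder prompts
placeholder_prompts <- paste("Prompt", 1:5000)
batches <- batch_prompts(placeholder_prompts)

length(batches)   # number of requests needed
lengths(batches)  # number of prompts in each request
```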

Example: Supreme Court Tweets

To illustrate the entire workflow, let’s classify the sentiment of social media posts from the Supreme Court Tweets dataset included in the package.

data(scotus_tweets) # the full dataset
data(scotus_tweets_examples) # a dataframe with few-shot examples

Let’s focus on tweets posted following the Masterpiece Cakeshop v. Colorado (2018) decision, formatting the prompts with a set of instructions and few-shot examples tailored to that context.

library(tidyverse)

masterpiece_tweets <- scotus_tweets |>
  filter(case == 'masterpiece')

instructions <- 'Read these tweets posted the day after the US Supreme Court ruled in favor of a baker who refused to bake a wedding cake for a same-sex couple (Masterpiece Cakeshop, 2018). For each tweet, decide whether its sentiment is Positive, Neutral, or Negative.'

masterpiece_examples <- scotus_tweets_examples |>
  filter(case == 'masterpiece')

masterpiece_tweets$prompt <- format_prompt(text = masterpiece_tweets$text,
                                           instructions = instructions,
                                           examples = masterpiece_examples)

masterpiece_tweets$prompt[3]
#> Read these tweets posted the day after the US Supreme Court ruled in favor of a baker who refused to bake a wedding cake for a same-sex couple (Masterpiece Cakeshop, 2018). For each tweet, decide whether its sentiment is Positive, Neutral, or Negative.
#>
#> Text: Thank you Supreme Court I take pride in your decision!!!!✝️ #SCOTUS
#> Classification: Positive
#>
#> Text: Supreme Court rules in favor of Colorado baker! This day is getting better by the minute!
#> Classification: Positive
#>
#> Text: Can’t escape the awful irony of someone allowed to use religion to discriminate against people in love.
#> Not my Jesus.
#> #opentoall #SCOTUS #Hypocrisy #MasterpieceCakeshop
#> Classification: Negative
#>
#> Text: I can’t believe this cake case went all the way to #SCOTUS . Can someone let me know what cake was ultimately served at the wedding? Are they married and living happily ever after?
#> Classification: Neutral
#>
#> Text: Supreme Court rules in favor of baker who would not make wedding cake for gay couple
#> Classification: Neutral
#>
#> Text: #SCOTUS set a dangerous precedent today. Although the Court limited the scope to which a business owner could deny services to patrons, the legal argument has been legitimized that one's subjective religious convictions trump (no pun intended) #humanrights. #LGBTQRights
#> Classification: Negative
#>
#> Text: The @Scotus ruling was a 🥧 pie-in-the-face to liberal lunacy.
#>
#> @charliekirk11 @Richzeoli @DennisDMZ
#>
#> 🎂🎂🎂🎂🎂🎂🎂🎂🎂
#>
#> #CakeEquality #SCOTUS #liberaltears
#> Classification:

Then we can submit this list of prompts using complete_prompt():

masterpiece_tweets$out <- complete_prompt(masterpiece_tweets$prompt)

The estimated probability distribution for each completion is now a list of dataframes in the out column. We can compute a simple sentiment score by taking the estimated probability each tweet is Positive minus the estimated probability the tweet is Negative:

masterpiece_tweets$score <- masterpiece_tweets$out |>
  lapply(mutate, token = str_to_lower(token)) |>
  lapply(summarize,
         positive = sum(probability[token == 'positive']),
         negative = sum(probability[token == 'negative'])) |>
  lapply(summarize, score = positive - negative) |>
  unlist()
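
The same score can be worked through by hand on a small example. The sketch below uses base R only and two made-up dataframes shaped like the complete_prompt() output; the probabilities are invented for illustration and sentiment_score() is a hypothetical helper.

```r
# Made-up output for two tweets: one dataframe of tokens and
# probabilities per prompt (values invented for illustration).
toy_out <- list(
  data.frame(token = c("Positive", "positive", "Negative"),
             probability = c(0.90, 0.05, 0.02)),
  data.frame(token = c("Negative", "Neutral", "Positive"),
             probability = c(0.70, 0.20, 0.05))
)

# Hypothetical helper: P(positive) - P(negative), ignoring case
sentiment_score <- function(df) {
  token <- tolower(df$token)
  sum(df$probability[token == "positive"]) -
    sum(df$probability[token == "negative"])
}

toy_scores <- vapply(toy_out, sentiment_score, numeric(1))
toy_scores
```

For the first tweet the score is (0.90 + 0.05) - 0.02 = 0.93, and for the second it is 0.05 - 0.70 = -0.65. Lowercasing first matters because the model often splits probability mass across capitalization variants of the same label.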

Finally, let’s compare those scores from GPT-3.5 with the authors’ hand-coded sentiment scores (-1 for Negative, 0 for Neutral, and +1 for Positive).

ggplot(data = masterpiece_tweets,
       mapping = aes(x = (expert1 + expert2 + expert3) / 3,
                     y = score)) +
  geom_jitter(width = 0.1) +
  labs(x = 'Hand-Coded Sentiment',
       y = 'GPT-3.5 Sentiment Score') +
  theme_bw()
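
Alongside the plot, a single summary number can be useful. The sketch below averages three coders' ratings and correlates the result with the model score. It assumes columns named expert1, expert2, expert3, and score, as in the plotting code above, but the four rows of data are made up so that the example runs without the package or an API key.

```r
# Made-up ratings and scores for four tweets (illustration only)
toy <- data.frame(
  expert1 = c(1, -1,  0, 1),
  expert2 = c(1, -1, -1, 1),
  expert3 = c(1,  0,  0, 1),
  score   = c(0.9, -0.6, -0.1, 0.8)
)

# Average the three hand-coded ratings, as on the plot's x-axis
toy$hand_coded <- rowMeans(toy[, c("expert1", "expert2", "expert3")])

# Pearson correlation between hand-coded and model-based sentiment
agreement <- cor(toy$hand_coded, toy$score)
agreement
```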

Chat Completions

The most recent OpenAI language models, including ChatGPT and GPT-4, have been fine-tuned to function as “chat” models, and interacting with them through the API requires a slightly different format for the inputs. Instead of a single text prompt, few-shot prompts are expressed in the form of a “dialogue” between the user and the model, which we can represent in R as a “list of lists”.

prompt <- list(
  list(role = 'user',
       content = 'Hello can you help me with a homework problem?'),
  list(role = 'assistant',
       content = 'Sure thing! What is the problem?'),
  list(role = 'user',
       content = 'I need to explain why Frederick the Great was so fond of potatoes?')
)

Users can submit a chat prompt to the API using the complete_chat() function. The default model is “gpt-3.5-turbo” (the most cost-effective chat model offered through the API as of February 2024).

complete_chat(prompt, max_tokens = 300)
#> [1] "Frederick the Great, also known as Frederick II of Prussia, was fond of potatoes for several reasons. One of the main reasons was that he recognized the nutritional value and versatility of potatoes. Potatoes are a rich source of carbohydrates, vitamins, and minerals, making them a valuable food source for his subjects, especially during times of famine or food shortages.\n\nAdditionally, Frederick promoted the cultivation of potatoes in Prussia because they were easy to grow and had a high yield compared to other crops. This made potatoes a cost-effective and efficient food source for the population.\n\nFurthermore, Frederick saw the potential of potatoes as a way to improve the agricultural productivity of his kingdom. By encouraging the cultivation of potatoes, he aimed to increase food security and reduce dependence on imported grains.\n\nOverall, Frederick the Great's fondness for potatoes was driven by their nutritional value, ease of cultivation, and potential to improve agricultural productivity in Prussia."

The format_chat() function allows users to create a chat prompt using the same syntax as format_prompt().

tweet <- masterpiece_tweets$text[4]
cat(tweet)
#> Let’s be real, lame anti-gay cake probably sucks anyway.
#>
#> Also, I love you Sonia Sotomayor and RBG ❤️🧡💛💚💙💜
#>
#> #masterpiececakeshop #scotus

prompt <- format_chat(tweet,
                      instructions = 'Read these tweets posted the day after the US Supreme Court ruled in favor of a baker who refused to bake a wedding cake for a same-sex couple (Masterpiece Cakeshop, 2018). For each tweet, decide whether its sentiment is Positive, Neutral, or Negative.',
                      examples = masterpiece_examples)
prompt
#> [[1]]
#> [[1]]$role
#> [1] "user"
#>
#> [[1]]$content
#> [1] "Read these tweets posted the day after the US Supreme Court ruled in favor of a baker who refused to bake a wedding cake for a same-sex couple (Masterpiece Cakeshop, 2018). For each tweet, decide whether its sentiment is Positive, Neutral, or Negative."
#>
#>
#> [[2]]
#> [[2]]$role
#> [1] "user"
#>
#> [[2]]$content
#> [1] "Thank you Supreme Court I take pride in your decision!!!!✝️ #SCOTUS"
#>
#>
#> [[3]]
#> [[3]]$role
#> [1] "assistant"
#>
#> [[3]]$content
#> [1] "Positive"
#>
#>
#> [[4]]
#> [[4]]$role
#> [1] "user"
#>
#> [[4]]$content
#> [1] "Supreme Court rules in favor of Colorado baker! This day is getting better by the minute!"
#>
#>
#> [[5]]
#> [[5]]$role
#> [1] "assistant"
#>
#> [[5]]$content
#> [1] "Positive"
#>
#>
#> [[6]]
#> [[6]]$role
#> [1] "user"
#>
#> [[6]]$content
#> [1] "Can’t escape the awful irony of someone allowed to use religion to discriminate against people in love. \r\nNot my Jesus. \r\n#opentoall #SCOTUS #Hypocrisy #MasterpieceCakeshop"
#>
#>
#> [[7]]
#> [[7]]$role
#> [1] "assistant"
#>
#> [[7]]$content
#> [1] "Negative"
#>
#>
#> [[8]]
#> [[8]]$role
#> [1] "user"
#>
#> [[8]]$content
#> [1] "I can’t believe this cake case went all the way to #SCOTUS . Can someone let me know what cake was ultimately served at the wedding? Are they married and living happily ever after?"
#>
#>
#> [[9]]
#> [[9]]$role
#> [1] "assistant"
#>
#> [[9]]$content
#> [1] "Neutral"
#>
#>
#> [[10]]
#> [[10]]$role
#> [1] "user"
#>
#> [[10]]$content
#> [1] "Supreme Court rules in favor of baker who would not make wedding cake for gay couple"
#>
#>
#> [[11]]
#> [[11]]$role
#> [1] "assistant"
#>
#> [[11]]$content
#> [1] "Neutral"
#>
#>
#> [[12]]
#> [[12]]$role
#> [1] "user"
#>
#> [[12]]$content
#> [1] "#SCOTUS set a dangerous precedent today. Although the Court limited the scope to which a business owner could deny services to patrons, the legal argument has been legitimized that one's subjective religious convictions trump (no pun intended) #humanrights. #LGBTQRights"
#>
#>
#> [[13]]
#> [[13]]$role
#> [1] "assistant"
#>
#> [[13]]$content
#> [1] "Negative"
#>
#>
#> [[14]]
#> [[14]]$role
#> [1] "user"
#>
#> [[14]]$content
#> [1] "Let’s be real, lame anti-gay cake probably sucks anyway. \r\n\r\nAlso, I love you Sonia Sotomayor and RBG ❤️🧡💛💚💙💜\r\n\r\n#masterpiececakeshop #scotus"
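
The structure of that chat prompt follows a simple pattern: the instructions as a first user message, each example as a user/assistant pair, and the text to classify as the final user message. The base-R sketch below builds a list with the same shape. The helper build_chat() is hypothetical and written only to illustrate the pattern; it is not promptr's implementation of format_chat().

```r
# Hypothetical helper (not promptr's code): build a "list of lists"
# dialogue from instructions, few-shot examples, and a text to classify.
build_chat <- function(text, instructions, examples) {
  msg <- function(role, content) list(role = role, content = content)

  # The instructions open the dialogue as a user message
  chat <- list(msg("user", instructions))

  # Each example becomes a user turn followed by the assistant's "answer"
  for (i in seq_len(nrow(examples))) {
    chat <- c(chat, list(msg("user", examples$text[i]),
                         msg("assistant", examples$label[i])))
  }

  # The text to classify is the last user turn, left unanswered
  c(chat, list(msg("user", text)))
}

chat_examples <- data.frame(
  text = c("What a pleasant day!", "Oh bother."),
  label = c("happy", "sad"),
  stringsAsFactors = FALSE
)

chat <- build_chat("I feel positively morose today.",
                   "Decide whether each statement is happy or sad.",
                   chat_examples)

# Inspect the sequence of roles in the dialogue
vapply(chat, function(m) m$role, character(1))
```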

One advantage of these chat models is that they typically do not require as many few-shot examples to perform well, but their big practical disadvantage is that we can only submit one chat to the API at a time.

response <- complete_chat(prompt)
response
#>        token  probability
#> 1   Positive 7.849799e-01
#> 2    Neutral 2.110320e-01
#> 3   Negative 2.354229e-03
#> 4      Mixed 1.621902e-03
#> 5   positive 2.702952e-06
#> 6       Post 1.892515e-06
#> 7   Positive 1.472733e-06
#> 8    Neutral 1.242802e-06
#> 9        Mix 1.100770e-06
#> 10   neutral 5.678884e-07
#> 11        Ne 5.622518e-07
#> 12       Pos 5.392126e-07
#> 13         N 3.356456e-07
#> 14       Net 2.261731e-07
#> 15 _positive 8.153610e-08
#> 16         - 6.318000e-08
#> 17         M 5.630869e-08
#> 18         I 4.956445e-08
#> 19     mixed 4.791496e-08
#> 20 .Positive 4.363649e-08
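
Because the same label can appear under several capitalization or whitespace variants, it can help to pool those variants before picking a winner. The sketch below does this in base R for the first five rows of the output above, copied by hand into a dataframe named top_rows so that the example runs without an API call. The pooling rule (lowercase and trim whitespace) is an assumption of this sketch, not something the package does for you.

```r
# First five rows of the response above, copied by hand
top_rows <- data.frame(
  token = c("Positive", "Neutral", "Negative", "Mixed", "positive"),
  probability = c(7.849799e-01, 2.110320e-01, 2.354229e-03,
                  1.621902e-03, 2.702952e-06)
)

# Normalize tokens so that "Positive" and "positive" count as one label
top_rows$label <- tolower(trimws(top_rows$token))

# Sum the probability mass per label and sort from most to least likely
totals <- aggregate(probability ~ label, data = top_rows, FUN = sum)
totals <- totals[order(-totals$probability), ]

totals
totals$label[1]  # the most likely label after pooling variants
```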
