Stanford DSPy: The framework for programming with foundation models
Paper: "DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines"
DSPy is the framework for solving advanced tasks with language models (LMs) and retrieval models (RMs). DSPy unifies techniques for prompting and fine-tuning LMs, as well as approaches for reasoning, self-improvement, and augmentation with retrieval and tools. All of these are expressed through modules that compose and learn.
To make this possible:
DSPy provides composable and declarative modules for instructing LMs in a familiar Pythonic syntax. It upgrades "prompting techniques" like chain-of-thought and self-reflection from hand-adapted string manipulation tricks into truly modular, generalized operations that learn to adapt to your task.
DSPy introduces an automatic compiler that teaches LMs how to conduct the declarative steps in your program. Specifically, the DSPy compiler will internally trace your program and then craft high-quality prompts for large LMs (or train automatic finetunes for small LMs) to teach them the steps of your task.
The DSPy compiler bootstraps prompts and finetunes from minimal data, without needing manual labels for the intermediate steps in your program. Instead of brittle "prompt engineering" with hacky string manipulation, you can explore a systematic space of modular and trainable pieces.
For complex tasks, DSPy can routinely teach powerful models like `GPT-3.5` and local models like `T5-base` or `Llama2-13b` to be much more reliable at tasks. DSPy will compile the same program into different few-shot prompts and/or finetunes for each LM.
If you want to see DSPy in action, open our intro tutorial notebook.
- Installation
- Framework Syntax
- Compiling: Two Powerful Concepts
- Tutorials & Documentation
- FAQ: Is DSPy right for me?
When we build neural networks, we don't write manual for-loops over lists of hand-tuned floats. Instead, we use a framework like PyTorch to compose declarative layers (e.g., `Convolution` or `Dropout`) and then use optimizers (e.g., SGD or Adam) to learn the parameters of the network.
Ditto! DSPy gives you the right general-purpose modules (e.g., `ChainOfThought`, `Retrieve`, etc.) and takes care of optimizing their prompts for your program and your metric, whatever they aim to do. Whenever you modify your code, your data, or your validation constraints, you can compile your program again and DSPy will create new effective prompts that fit your changes.
All you need is:
```
pip install dspy-ai
```
Or open our intro notebook in Google Colab:
Note: If you're looking for Demonstrate-Search-Predict (DSP), which is the previous version of DSPy, you can find it on the v1 branch of this repo.
For the optional Pinecone retrieval integration, include the `pinecone` extra:

```
pip install dspy-ai[pinecone]
```
For the optional Qdrant retrieval integration, include the `qdrant` extra:

```
pip install dspy-ai[qdrant]
```
For the optional ChromaDB retrieval integration, include the `chromadb` extra:

```
pip install dspy-ai[chromadb]
```
For the optional Marqo retrieval integration, include the `marqo` extra:

```
pip install dspy-ai[marqo]
```
DSPy hides tedious prompt engineering, but it cleanly exposes the important decisions you need to make: [1] what will your system design look like? [2] what are the important constraints on the behavior of your program?
You express your system as free-form Pythonic modules. DSPy will tune the quality of your program in whatever way you use foundation models: you can code with loops, `if` statements, or exceptions, and use DSPy modules within any Python control flow you think works for your task.
Suppose you want to build a simple retrieval-augmented generation (RAG) system for question answering. You can define your own `RAG` program like this:
```python
class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = self.retrieve(question).passages
        answer = self.generate_answer(context=context, question=question)
        return answer
```
A program has two key methods, which you can edit to fit your needs.
Your `__init__` method declares the modules you will use. Here, `RAG` will use the built-in `Retrieve` for retrieval and `ChainOfThought` for generating answers. DSPy offers general-purpose modules that take the shape of your own sub-tasks, not pre-built functions for specific applications.

Modules that use the LM, like `ChainOfThought`, require a signature. That is a declarative spec that tells the module what it's expected to do. In this example, we use the short-hand signature notation `context, question -> answer` to tell `ChainOfThought` that it will be given some `context` and a `question` and must produce an `answer`. We will discuss more advanced signatures below.
Your `forward` method expresses any computation you want to do with your modules. In this case, we use the modules `self.retrieve` and `self.generate_answer` to search for some `context` and then use the `context` and `question` to generate the `answer`!
You can now either use this `RAG` program in zero-shot mode or compile it to obtain higher quality. Zero-shot usage is simple: just define an instance of your program and then call it:
```python
rag = RAG()  # zero-shot, uncompiled version of RAG
rag("what is the capital of France?").answer  # -> "Paris"
```
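Note that calling the program assumes a default LM (and, for `Retrieve`, a retrieval model) has already been configured. A minimal sketch, assuming the OpenAI client and a hosted ColBERTv2 index; the model name and server URL here are illustrative:

```python
import dspy

# Configure a default LM and RM for all DSPy modules (illustrative values).
turbo = dspy.OpenAI(model="gpt-3.5-turbo")
rm = dspy.ColBERTv2(url="http://your-colbertv2-server:2017/wiki17_abstracts")
dspy.settings.configure(lm=turbo, rm=rm)
```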
The next section will discuss how to compile our simple `RAG` program. When we compile it, the DSPy compiler will annotate demonstrations of its steps: (1) retrieval, (2) using context, and (3) using chain-of-thought to answer questions. From these demonstrations, the DSPy compiler will make sure it produces an effective few-shot prompt that works well with your LM, retrieval model, and data. If you're working with small models, it'll finetune your model (instead of prompting) to do this task.
If you later decide you need another step in your pipeline, just add another module and compile again. Maybe add a module that takes the chat history into account during search? A hedged sketch of that idea follows.
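For illustration only, the extension might look something like this; the `ChatRAG` class and its `history, question -> query` signature are hypothetical, not built into DSPy:

```python
class ChatRAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        # New step: condense the chat history and question into a search query.
        self.generate_query = dspy.ChainOfThought("history, question -> query")
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question, history=""):
        query = self.generate_query(history=history, question=question).query
        context = self.retrieve(query).passages
        return self.generate_answer(context=context, question=question)
```

Compiling this variant again would bootstrap demonstrations for the new query-generation step as well.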
To make it possible to compile any program you write, DSPy introduces two simple concepts: Signatures and Teleprompters.
When we assign tasks to LMs in DSPy, we specify the behavior we need as a Signature. A signature is a declarative specification of the input/output behavior of a DSPy module.
Instead of investing effort into how to get your LM to do a sub-task, signatures enable you to inform DSPy what the sub-task is. Later, the DSPy compiler will figure out how to build a complex prompt for your large LM (or finetune your small LM) specifically for your signature, on your data, and within your pipeline.
A signature consists of three simple elements:
- A minimal description of the sub-task the LM is supposed to solve.
- A description of one or more input fields (e.g., an input question) that we will give to the LM.
- A description of one or more output fields (e.g., the question's answer) that we will expect from the LM.
We support two notations for expressing signatures. The short-hand signature notation is for quick development: you just provide your module (e.g., `dspy.ChainOfThought`) with a string of the form `input_field_name_1, ... -> output_field_name_1, ...`, with the fields separated by commas.
In the `RAG` class earlier, we saw:

```python
self.generate_answer = dspy.ChainOfThought("context, question -> answer")
```
In many cases, this barebones signature is sufficient. However, sometimes you need more control. In these cases, you can use the full notation to express a more fully-fledged signature, as below.
```python
class GenerateSearchQuery(dspy.Signature):
    """Write a simple search query that will help answer a complex question."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    query = dspy.OutputField()

### inside your program's __init__ function
self.generate_query = dspy.ChainOfThought(GenerateSearchQuery)
```
You can optionally provide a `prefix` and/or `desc` key for each input or output field to refine or constrain the behavior of modules using your signature. The description of the sub-task itself is specified as the docstring (i.e., `"""Write a simple..."""`).
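For instance, a minimal sketch of using `desc` to nudge an output field (this particular signature is illustrative, not part of DSPy):

```python
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")
```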
After defining the `RAG` program, we can compile it. Compiling a program will update the parameters stored in each module. For large LMs, this primarily takes the form of creating and validating good demonstrations for inclusion in your prompt(s).

Compiling depends on three things: a (potentially tiny) training set, a metric for validation, and your choice of teleprompter from DSPy. Teleprompters are powerful optimizers (included in DSPy) that can learn to bootstrap and select effective prompts for the modules of any program. (The "tele-" in the name means "at a distance", i.e., automatic prompting at a distance.)
DSPy typically requires very minimal labeling. For example, our `RAG` pipeline may work well with just a handful of examples that contain a question and its (human-annotated) answer. Your pipeline may involve multiple complex steps: our basic `RAG` example includes a retrieved context, a chain of thought, and the answer. However, you only need labels for the initial question and the final answer. DSPy will bootstrap any intermediate labels needed to support your pipeline. If you change your pipeline in any way, the data bootstrapped will change accordingly!
```python
my_rag_trainset = [
    dspy.Example(
        question="Which award did Gary Zukav's first book receive?",
        answer="National Book Award",
    ),
    ...
]
```
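Depending on your DSPy version, you may also need to mark which fields are inputs (the rest are treated as labels); a hedged sketch using the `with_inputs` helper:

```python
# Mark "question" as the input field; "answer" then serves as the label.
my_rag_trainset = [ex.with_inputs("question") for ex in my_rag_trainset]
```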
Second, define your validation logic, which will express some constraints on the behavior of your program or individual modules. For `RAG`, we might express a simple check like this:
```python
def validate_context_and_answer(example, pred, trace=None):
    # Check that the gold label and the predicted answer match.
    answer_match = example.answer.lower() == pred.answer.lower()

    # Check that the predicted answer comes from one of the retrieved contexts.
    context_match = any((pred.answer.lower() in c) for c in pred.context)

    return answer_match and context_match
```
Different teleprompters offer various tradeoffs in terms of how much they optimize cost versus quality, etc. For `RAG`, we might use the simple teleprompter called `BootstrapFewShot`. To do so, we instantiate the teleprompter itself with our validation function `validate_context_and_answer` and then compile against some training set `my_rag_trainset`.
```python
from dspy.teleprompt import BootstrapFewShot

teleprompter = BootstrapFewShot(metric=validate_context_and_answer)
compiled_rag = teleprompter.compile(RAG(), trainset=my_rag_trainset)
```
If we now use `compiled_rag`, it will invoke our LM with rich prompts containing few-shot demonstrations of chain-of-thought, retrieval-augmented question answering on our data.
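Calling the compiled program looks exactly like calling the uncompiled one; a minimal sketch (the answer shown in the comment is illustrative):

```python
# The compiled program exposes the same interface as the original RAG module.
compiled_rag(question="Which award did Gary Zukav's first book receive?").answer
# e.g., -> "National Book Award"
```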
While we work on new tutorials and documentation, please check out our intro notebook.
Or open it directly in free Google Colab:
- [Intro-01] Getting Started: High Quality Pipelined Prompts with Minimal Effort
- [Intro-02] Using DSPy For Your Own Task: Building Blocks
- [Intro-03] Adding Complexity: Multi-stage Programs
- [Intro-04] Adding Complexity for Your Own Task: Design Patterns
- [Advanced-01] Long-Form QA & Programmatic Evaluation
- [Advanced-02] Programmatic Evaluation II & Dataset Creation
- [Advanced-03] Compiling & Teleprompters
- [Advanced-04] Extending DSPy with Modules or Teleprompters
- [Advanced-05] Agents and General Tool Use in DSPy
- [Advanced-06] Reproducibility, Saving Programs, and Advanced Caching
We have work-in-progress module documentation at this PR. Please let us know if anything there is unclear.
- `dspy.Signature`
- `dspy.InputField`
- `dspy.OutputField`
- `dspy.Predict`
- `dspy.Retrieve`
- `dspy.ChainOfThought`
- `dspy.majority` (functional self-consistency)
- `dspy.ProgramOfThought` [see open PR]
- `dspy.ReAct`
- `dspy.MultiChainComparison`
- `dspy.SelfCritique` [coming soon]
- `dspy.SelfRevision` [coming soon]
- `dspy.teleprompt.LabeledFewShot`
- `dspy.teleprompt.BootstrapFewShot`
- `dspy.teleprompt.BootstrapFewShotWithRandomSearch`
- `dspy.teleprompt.LabeledFinetune` [coming soon]
- `dspy.teleprompt.BootstrapFinetune`
- `dspy.teleprompt.Ensemble`
- `dspy.teleprompt.kNN` [coming soon]
The DSPy philosophy and abstraction differ significantly from other libraries and frameworks, so it's usually straightforward to decide when DSPy is (or isn't) the right framework for your use case.
If you're an NLP/AI researcher (or a practitioner exploring new pipelines or new tasks), the answer is generally an invariable yes. If you're a practitioner doing other things, please read on.
In other words: why can't I just write my prompts directly as string templates? Well, for extremely simple settings, this might work just fine. (If you're familiar with neural networks, this is like expressing a tiny two-layer NN as a Python for-loop. It kinda works.)
However, when you need higher quality (or manageable cost), you need to iteratively explore multi-stage decomposition, improved prompting, data bootstrapping, careful finetuning, retrieval augmentation, and/or using smaller (or cheaper, or local) models. The true expressive power of building with foundation models lies in the interactions between these pieces. But every time you change one piece, you likely break (or weaken) multiple other components.
DSPy cleanly abstracts away (and powerfully optimizes) the parts of these interactions that are external to your actual system design. It lets you focus on designing the module-level interactions: the same program expressed in 10 or 20 lines of DSPy can easily be compiled into multi-stage instructions for `GPT-4`, detailed prompts for `Llama2-13b`, or finetunes for `T5-base`.
Oh, and you wouldn't need to maintain long, brittle, model-specific strings at the core of your project anymore.
Note: If you use LangChain as a thin wrapper around your own prompt strings, refer to answer [5.a] instead.
LangChain and LlamaIndex are popular libraries that target high-level application development with LMs. They offer many batteries-included, pre-built application modules that plug in with your data or configuration. In practice, many use cases genuinely don't need any special components. If you'd be happy to use someone's generic, off-the-shelf prompt for question answering over PDFs or standard text-to-SQL, as long as it's easy to set up on your data, then you will probably find a very rich ecosystem in these libraries.
Unlike these libraries, DSPy doesn't internally contain hand-crafted prompts that target specific applications you can build. Instead, DSPy introduces a very small set of much more powerful and general-purpose modules that can learn to prompt (or finetune) your LM within your pipeline on your data.
DSPy offers a whole different degree of modularity: when you change your data, make tweaks to your program's control flow, or change your target LM, the DSPy compiler can map your program into a new set of prompts (or finetunes) that are optimized specifically for this pipeline. Because of this, you may find that DSPy obtains the highest quality for your task, with the least effort, provided you're willing to implement (or extend) your own short program. In short, DSPy is for when you need a lightweight but automatically-optimizing programming model, not a library of predefined prompts and integrations.
If you're familiar with neural networks:
This is like the difference between PyTorch (i.e., representing DSPy) and HuggingFace Transformers (i.e., representing the higher-level libraries). If you simply want to use off-the-shelf `BERT-base-uncased` or `GPT2-large` or apply minimal finetuning to them, HF Transformers makes it very straightforward. If, however, you're looking to build your own architecture (or extend an existing one significantly), you have to quickly drop down into something much more modular like PyTorch. Luckily, HF Transformers is implemented in backends like PyTorch. We are similarly excited about high-level wrappers around DSPy for common applications. If these are implemented using DSPy, your high-level application can also adapt significantly to your data in a way that static prompt chains won't. Please open an issue if this is something you want to help with.
Guidance, LMQL, RELM, and Outlines are all exciting new libraries for controlling the individual completions of LMs, e.g., if you want to enforce a JSON output schema or constrain sampling to a particular regular expression.
This is very useful in many settings, but it's generally focused on low-level, structured control of a single LM call. It doesn't help ensure the JSON (or structured output) you get is going to be correct or useful for your task.
In contrast, DSPy automatically optimizes the prompts in your programs to align them with various task needs, which may also include producing valid structured outputs. That said, we are considering allowing Signatures in DSPy to express regex-like constraints that are implemented by these libraries.
DSPy is led by Omar Khattab at Stanford NLP with Chris Potts and Matei Zaharia.

Key contributors and team members include Arnav Singhvi, Paridhi Maheshwari, Keshav Santhanam, Sri Vardhamanan, Eric Zhang, Hanna Moazam, Thomas Joshi, Saiful Haq, and Ashutosh Sharma.

DSPy includes important contributions from Rick Battle and Igor Kotenkov. It reflects discussions with Lisa Li, David Hall, Ashwin Paranjape, Heather Miller, Chris Manning, Percy Liang, and many others.

The DSPy logo is designed by Chuyi Zhang.

To stay up to date or learn more, follow @lateinteraction on Twitter.
If you use DSPy or DSP in a research paper, please cite our work as follows:
```bibtex
@article{khattab2023dspy,
  title={DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines},
  author={Khattab, Omar and Singhvi, Arnav and Maheshwari, Paridhi and Zhang, Zhiyuan and Santhanam, Keshav and Vardhamanan, Sri and Haq, Saiful and Sharma, Ashutosh and Joshi, Thomas T. and Moazam, Hanna and Miller, Heather and Zaharia, Matei and Potts, Christopher},
  journal={arXiv preprint arXiv:2310.03714},
  year={2023}
}

@article{khattab2022demonstrate,
  title={Demonstrate-Search-Predict: Composing Retrieval and Language Models for Knowledge-Intensive {NLP}},
  author={Khattab, Omar and Santhanam, Keshav and Li, Xiang Lisa and Hall, David and Liang, Percy and Potts, Christopher and Zaharia, Matei},
  journal={arXiv preprint arXiv:2212.14024},
  year={2022}
}
```
You can also read more about the evolution of the framework from Demonstrate-Search-Predict to DSPy:
- Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP (Academic Paper, Dec 2022)
- Introducing DSP (Twitter Thread, Jan 2023)
- Releasing the DSP Compiler (v0.1) (Twitter Thread, Feb 2023)
- Releasing DSPy, the latest iteration of the framework (Twitter Thread, Aug 2023)
- DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines (Academic Paper, Oct 2023)