dottxt-ai/outlines

Structured Outputs

dottxt-ai.github.io/outlines/

License

Apache-2.0 license


🗒️ Structured outputs for LLMs 🗒️

Made with ❤👷️ by the team at .txt
Trusted by NVIDIA, Cohere, HuggingFace, vLLM, etc.

Need a high-performance commercial solution for structured outputs? Email us at contact@dottxt.co, or schedule a call.

Why Outlines?

LLMs are powerful but their outputs are unpredictable. Most solutions attempt to fix bad outputs after generation using parsing, regex, or fragile code that breaks easily.

Outlines guarantees structured outputs during generation — directly from any LLM.

Works with any model - Same code runs across OpenAI, Ollama, vLLM, and more
Simple integration - Just pass your desired output type: model(prompt, output_type)
Guaranteed valid structure - No more parsing headaches or broken JSON
Provider independence - Switch models without changing code
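
To make "structured outputs during generation" concrete, here is a toy sketch in plain Python. It is not how Outlines is implemented: `fake_scores` is a hypothetical stand-in for a model's next-token preferences, and the "tokens" are single characters. At each step, only characters that keep the output a prefix of an allowed answer are considered, so the result is always one of the allowed answers.

```python
def allowed_next_chars(prefix: str, choices: list[str]) -> set[str]:
    """Characters that keep `prefix` a valid prefix of at least one choice."""
    return {
        choice[len(prefix)]
        for choice in choices
        if choice.startswith(prefix) and len(choice) > len(prefix)
    }


def fake_scores(prefix: str) -> dict[str, float]:
    """Hypothetical stand-in for a model; left alone, it would start with 'M'."""
    scores = {ch: 0.1 for ch in "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ "}
    scores["M"] = 0.9  # e.g. the beginning of "Maybe"
    scores["Y"] = 0.5
    return scores


def constrained_generate(choices: list[str]) -> str:
    """Greedy decoding restricted to characters that can still reach a choice."""
    out = ""
    while out not in choices:
        allowed = allowed_next_chars(out, choices)
        scores = fake_scores(out)
        out += max(allowed, key=lambda ch: scores.get(ch, 0.0))
    return out


print(constrained_generate(["Yes", "No"]))  # Yes
```

Even though the toy model scores "M" highest, the constraint removes it from consideration, so no repair step is needed after generation.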

The Outlines Philosophy

Outlines follows a simple pattern that mirrors Python's own type system. Simply specify the desired output type, and Outlines will ensure your data matches that structure exactly:

For a yes/no response, use Literal["Yes", "No"]
For numerical values, use int
For complex objects, define a structure with a Pydantic model
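
One way to picture how a Python type can act as an output specification is to translate it into a pattern that valid outputs must match. The sketch below uses only the standard library and covers just `int` and `Literal`; the helper `type_to_regex` is hypothetical and is not part of the Outlines API.

```python
import re
from typing import Literal, get_args, get_origin


def type_to_regex(output_type) -> str:
    """Translate a small subset of Python types into a regex (illustrative only)."""
    if output_type is int:
        return r"[+-]?\d+"
    if get_origin(output_type) is Literal:
        return "|".join(re.escape(str(option)) for option in get_args(output_type))
    raise TypeError(f"Unsupported type in this sketch: {output_type!r}")


yes_no = re.compile(type_to_regex(Literal["Yes", "No"]))
integer = re.compile(type_to_regex(int))

print(bool(yes_no.fullmatch("Yes")))           # True
print(bool(yes_no.fullmatch("Maybe")))         # False
print(bool(integer.fullmatch("100")))          # True
print(bool(integer.fullmatch("100 degrees")))  # False
```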

Quickstart

Getting started with outlines is simple:

1. Install outlines

pip install outlines

2. Connect to your preferred model

import outlines
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "microsoft/Phi-3-mini-4k-instruct"
model = outlines.from_transformers(
    AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto"),
    AutoTokenizer.from_pretrained(MODEL_NAME)
)

3. Start with simple structured outputs

from typing import Literal
from pydantic import BaseModel

# Simple classification
sentiment = model(
    "Analyze: 'This product completely changed my life!'",
    Literal["Positive", "Negative", "Neutral"]
)
print(sentiment)  # "Positive"

# Extract specific types
temperature = model("What's the boiling point of water in Celsius?", int)
print(temperature)  # 100

4. Create complex structures

from pydantic import BaseModel
from enum import Enum

class Rating(Enum):
    poor = 1
    fair = 2
    good = 3
    excellent = 4

class ProductReview(BaseModel):
    rating: Rating
    pros: list[str]
    cons: list[str]
    summary: str

review = model(
    "Review: The XPS 13 has great battery life and a stunning display, but it runs hot and the webcam is poor quality.",
    ProductReview,
    max_new_tokens=200,
)

# The model returns a JSON string; validate it into a ProductReview object
review = ProductReview.model_validate_json(review)
print(f"Rating: {review.rating.name}")  # "Rating: good"
print(f"Pros: {review.pros}")  # "Pros: ['great battery life', 'stunning display']"
print(f"Summary: {review.summary}")  # "Summary: Good laptop with great display but thermal issues"
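
The call to `model_validate_json` above implies that the model hands back a JSON string rather than a Python object. As a rough standard-library illustration of what that validation step does for the `rating` field, the sketch below decodes a hand-written JSON string (a hypothetical output, not one produced by a model) and looks up the enum member by value.

```python
import json
from enum import Enum


class Rating(Enum):
    poor = 1
    fair = 2
    good = 3
    excellent = 4


# Hypothetical raw output, written by hand for illustration
raw = '{"rating": 3, "pros": ["great battery life"], "cons": ["runs hot"], "summary": "Good laptop"}'

data = json.loads(raw)
rating = Rating(data["rating"])  # look the enum member up by its value

print(rating.name)   # good
print(data["pros"])  # ['great battery life']
```

Pydantic performs this conversion, plus type checks on every other field, in a single call.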

Real-world examples

Here are production-ready examples showing how Outlines solves common problems:

🙋‍♂️ Customer Support Triage
This example shows how to convert a free-form customer email into a structured service ticket. By parsing attributes like priority, category, and escalation flags, the code enables automated routing and handling of support issues.

import outlines
from enum import Enum
from pydantic import BaseModel
from transformers import AutoTokenizer, AutoModelForCausalLM
from typing import List

MODEL_NAME = "microsoft/Phi-3-mini-4k-instruct"
model = outlines.from_transformers(
    AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto"),
    AutoTokenizer.from_pretrained(MODEL_NAME)
)

def alert_manager(ticket):
    print("Alert!", ticket)

class TicketPriority(str, Enum):
    low = "low"
    medium = "medium"
    high = "high"
    urgent = "urgent"

class ServiceTicket(BaseModel):
    priority: TicketPriority
    category: str
    requires_manager: bool
    summary: str
    action_items: List[str]

customer_email = """
Subject: URGENT - Cannot access my account after payment

I paid for the premium plan 3 hours ago and still can't access any features.
I've tried logging out and back in multiple times. This is unacceptable as I
have a client presentation in an hour and need the analytics dashboard.
Please fix this immediately or refund my payment.
"""

prompt = f"""
<|im_start|>user
Analyze this customer email:

{customer_email}
<|im_end|>
<|im_start|>assistant
"""

ticket = model(prompt, ServiceTicket, max_new_tokens=500)

# Use structured data to route the ticket
ticket = ServiceTicket.model_validate_json(ticket)
if ticket.priority == "urgent" or ticket.requires_manager:
    alert_manager(ticket)
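
The routing check `ticket.priority == "urgent"` relies on `TicketPriority` inheriting from both `str` and `Enum`. The short sketch below shows the difference; `PlainPriority` is a hypothetical variant without the `str` mixin, added only for contrast.

```python
from enum import Enum


class TicketPriority(str, Enum):
    low = "low"
    urgent = "urgent"


class PlainPriority(Enum):  # hypothetical: same member, no str mixin
    urgent = "urgent"


print(TicketPriority.urgent == "urgent")  # True
print(PlainPriority.urgent == "urgent")   # False
print(TicketPriority("urgent") is TicketPriority.urgent)  # True
```

With a plain `Enum`, the comparison would have to be written as `ticket.priority is TicketPriority.urgent` or against `ticket.priority.value`.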

📦 E-commerce product categorization
This use case demonstrates how outlines can transform product descriptions into structured categorization data (e.g., main category, sub-category, and attributes) to streamline tasks such as inventory management. Each product description is processed automatically, reducing manual categorization overhead.

import outlines
from pydantic import BaseModel
from transformers import AutoTokenizer, AutoModelForCausalLM
from typing import List, Optional

MODEL_NAME = "microsoft/Phi-3-mini-4k-instruct"
model = outlines.from_transformers(
    AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto"),
    AutoTokenizer.from_pretrained(MODEL_NAME)
)

def update_inventory(product, category, sub_category):
    print(f"Updated {product.split(',')[0]} in category {category}/{sub_category}")

class ProductCategory(BaseModel):
    main_category: str
    sub_category: str
    attributes: List[str]
    brand_match: Optional[str]

# Process product descriptions in batches
product_descriptions = [
    "Apple iPhone 15 Pro Max 256GB Titanium, 6.7-inch Super Retina XDR display with ProMotion",
    "Organic Cotton T-Shirt, Men's Medium, Navy Blue, 100% Sustainable Materials",
    "KitchenAid Stand Mixer, 5 Quart, Red, 10-Speed Settings with Dough Hook Attachment"
]

template = outlines.Template.from_string("""
<|im_start|>user
Categorize this product:

{{ description }}
<|im_end|>
<|im_start|>assistant
""")

# Get structured categorization for all products
categories = model(
    [template(description=desc) for desc in product_descriptions],
    ProductCategory,
    max_new_tokens=200
)

# Use categorization for inventory management
categories = [
    ProductCategory.model_validate_json(category) for category in categories
]
for product, category in zip(product_descriptions, categories):
    update_inventory(product, category.main_category, category.sub_category)

📊 Parse event details with incomplete data
This example uses outlines to parse event descriptions into structured information (like event name, date, location, type, and topics), even handling cases where the data is incomplete. It leverages union types to return either structured event data or a fallback “I don’t know” answer, ensuring robust extraction in varying scenarios.

import outlines
from typing import Union, List, Literal
from pydantic import BaseModel
from enum import Enum
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "microsoft/Phi-3-mini-4k-instruct"
model = outlines.from_transformers(
    AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto"),
    AutoTokenizer.from_pretrained(MODEL_NAME)
)

class EventType(str, Enum):
    conference = "conference"
    webinar = "webinar"
    workshop = "workshop"
    meetup = "meetup"
    other = "other"

class EventInfo(BaseModel):
    """Structured information about a tech event"""
    name: str
    date: str
    location: str
    event_type: EventType
    topics: List[str]
    registration_required: bool

# Create a union type that can either be a structured EventInfo or "I don't know"
EventResponse = Union[EventInfo, Literal["I don't know"]]

# Sample event descriptions
event_descriptions = [
    # Complete information
    """
    Join us for DevCon 2023, the premier developer conference happening on November 15-17, 2023
    at the San Francisco Convention Center. Topics include AI/ML, cloud infrastructure, and web3.
    Registration is required.
    """,

    # Insufficient information
    """
    Tech event next week. More details coming soon!
    """
]

# Process events
results = []
for description in event_descriptions:
    prompt = f"""
<|im_start|>system
You are a helpful assistant
<|im_end|>
<|im_start|>user
Extract structured information about this tech event:

{description}

If there is enough information, return a JSON object with the following fields:

- name: The name of the event
- date: The date where the event is taking place
- location: Where the event is taking place
- event_type: either 'conference', 'webinar', 'workshop', 'meetup' or 'other'
- topics: a list of topics of the conference
- registration_required: a boolean that indicates whether registration is required

If the information available does not allow you to fill this JSON, and only then, answer 'I don't know'.
<|im_end|>
<|im_start|>assistant
"""
    # Union type allows the model to return structured data or "I don't know"
    result = model(prompt, EventResponse, max_new_tokens=200)
    # The model returns text, as in the other examples: parse JSON answers
    # into EventInfo and keep the fallback answer as a plain string
    if result.strip().startswith("{"):
        result = EventInfo.model_validate_json(result)
    results.append(result)

# Display results
for i, result in enumerate(results):
    print(f"Event {i+1}:")
    if isinstance(result, str):
        print(f"  {result}")
    else:
        # It's an EventInfo object
        print(f"  Name: {result.name}")
        print(f"  Type: {result.event_type}")
        print(f"  Date: {result.date}")
        print(f"  Topics: {', '.join(result.topics)}")
    print()

# Use structured data in downstream processing
structured_count = sum(1 for r in results if isinstance(r, EventInfo))
print(f"Successfully extracted data for {structured_count} of {len(results)} events")

🗂️ Categorize documents into predefined types
In this case, outlines classifies documents into predefined categories (e.g., “Financial Report,” “Legal Contract”) using a literal type specification. The resulting classifications are displayed in both a table format and through a category distribution summary, illustrating how structured outputs can simplify content management.

import outlines
from typing import Literal, List
import pandas as pd
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "microsoft/Phi-3-mini-4k-instruct"
model = outlines.from_transformers(
    AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto"),
    AutoTokenizer.from_pretrained(MODEL_NAME)
)

# Define classification categories using Literal
DocumentCategory = Literal[
    "Financial Report",
    "Legal Contract",
    "Technical Documentation",
    "Marketing Material",
    "Personal Correspondence"
]

# Sample documents to classify
documents = [
    "Q3 Financial Summary: Revenue increased by 15% year-over-year to $12.4M. EBITDA margin improved to 23% compared to 19% in Q3 last year. Operating expenses...",
    "This agreement is made between Party A and Party B, hereinafter referred to as 'the Parties', on this day of...",
    "The API accepts POST requests with JSON payloads. Required parameters include 'user_id' and 'transaction_type'. The endpoint returns a 200 status code on success."
]

template = outlines.Template.from_string("""
<|im_start|>user
Classify the following document into exactly one category among the following categories:
- Financial Report
- Legal Contract
- Technical Documentation
- Marketing Material
- Personal Correspondence

Document:
{{ document }}
<|im_end|>
<|im_start|>assistant
""")

# Classify documents
def classify_documents(texts: List[str]) -> List[DocumentCategory]:
    results = []
    for text in texts:
        prompt = template(document=text)
        # The model must return one of the predefined categories
        category = model(prompt, DocumentCategory, max_new_tokens=200)
        results.append(category)
    return results

# Perform classification
classifications = classify_documents(documents)

# Create a simple results table
results_df = pd.DataFrame({
    "Document": [doc[:50] + "..." for doc in documents],
    "Classification": classifications
})
print(results_df)

# Count documents by category
category_counts = pd.Series(classifications).value_counts()
print("\nCategory Distribution:")
print(category_counts)

📅 Schedule a meeting from requests with Function Calling
This example demonstrates how outlines can interpret a natural language meeting request and translate it into a structured format matching a predefined function’s parameters. Once the meeting details are extracted (e.g., title, date, duration, attendees), they are used to automatically schedule the meeting.

import outlines
import json
from typing import List, Optional
from datetime import date
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "microsoft/phi-4"
model = outlines.from_transformers(
    AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto"),
    AutoTokenizer.from_pretrained(MODEL_NAME)
)

# Define a function with typed parameters
def schedule_meeting(
    title: str,
    date: date,
    duration_minutes: int,
    attendees: List[str],
    location: Optional[str] = None,
    agenda_items: Optional[List[str]] = None
):
    """Schedule a meeting with the specified details"""
    # In a real app, this would create the meeting
    meeting = {
        "title": title,
        "date": date,
        "duration_minutes": duration_minutes,
        "attendees": attendees,
        "location": location,
        "agenda_items": agenda_items
    }
    return f"Meeting '{title}' scheduled for {date} with {len(attendees)} attendees"

# Natural language request
user_request = """
I need to set up a product roadmap review with the engineering team for next
Tuesday at 2pm. It should last 90 minutes. Please invite john@example.com,
sarah@example.com, and the product team at product@example.com.
"""

# Outlines automatically infers the required structure from the function signature
prompt = f"""
<|im_start|>user
Extract the meeting details from this request:

{user_request}
<|im_end|>
<|im_start|>assistant
"""
meeting_params = model(prompt, schedule_meeting, max_new_tokens=200)

# The result is a dictionary matching the function parameters
meeting_params = json.loads(meeting_params)
print(meeting_params)

# Call the function with the extracted parameters
result = schedule_meeting(**meeting_params)
print(result)
# "Meeting 'Product Roadmap Review' scheduled for 2023-10-17 with 3 attendees"
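
To give a sense of what "inferring structure from a function signature" can mean, the sketch below uses the standard library's `inspect` module to list each parameter, its annotation, and whether it is required. This is an illustration under assumptions, not a description of Outlines internals: `describe_parameters` is a hypothetical helper, the function is a trimmed copy of the one above, and the `params` dictionary is written by hand.

```python
import inspect
from datetime import date
from typing import List, Optional


def schedule_meeting(
    title: str,
    date: date,
    duration_minutes: int,
    attendees: List[str],
    location: Optional[str] = None,
):
    """Trimmed copy of the function above (agenda_items omitted)."""
    return f"Meeting '{title}' scheduled for {date} with {len(attendees)} attendees"


def describe_parameters(func) -> dict:
    """Map each parameter name to its annotation and whether it is required."""
    params = inspect.signature(func).parameters
    return {
        name: {
            "annotation": p.annotation,
            "required": p.default is inspect.Parameter.empty,
        }
        for name, p in params.items()
    }


spec = describe_parameters(schedule_meeting)
print(list(spec))                    # ['title', 'date', 'duration_minutes', 'attendees', 'location']
print(spec["location"]["required"])  # False

# JSON has no date type, so a decoded "date" field arrives as a string;
# it can be converted explicitly before calling the function.
params = {
    "title": "Roadmap review",
    "date": "2023-10-17",
    "duration_minutes": 90,
    "attendees": ["john@example.com"],
}
params["date"] = date.fromisoformat(params["date"])
print(schedule_meeting(**params))
```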

📝 Dynamically generate prompts with re-usable templates
Using Jinja-based templates, this example shows how to generate dynamic prompts for tasks like sentiment analysis. It illustrates how to easily re-use and customize prompts—including few-shot learning strategies—for different content types while ensuring the outputs remain structured.

import outlines
from typing import List, Literal
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "microsoft/phi-4"
model = outlines.from_transformers(
    AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto"),
    AutoTokenizer.from_pretrained(MODEL_NAME)
)

# 1. Create a reusable template with Jinja syntax
sentiment_template = outlines.Template.from_string(
    """
    <|im_start|>user
    Analyze the sentiment of the following {{ content_type }}:

    {{ text }}

    Provide your analysis as either "Positive", "Negative", or "Neutral".
    <|im_end|>
    <|im_start|>assistant
    """
)

# 2. Generate prompts with different parameters
review = "This restaurant exceeded all my expectations. Fantastic service!"
prompt = sentiment_template(content_type="review", text=review)

# 3. Use the templated prompt with structured generation
result = model(prompt, Literal["Positive", "Negative", "Neutral"])
print(result)  # "Positive"

# Templates can also be loaded from files
example_template = outlines.Template.from_file("templates/few_shot.txt")

# Use with examples for few-shot learning
examples = [
    ("The food was cold", "Negative"),
    ("The staff was friendly", "Positive")
]
few_shot_prompt = example_template(examples=examples, query="Service was slow")
print(few_shot_prompt)
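
The contents of `templates/few_shot.txt` are not shown here, so the layout below is an assumption. As a rough picture of what a few-shot template produces, this sketch builds a comparable prompt in plain Python, with an ordinary `for` loop standing in for a Jinja loop; `render_few_shot` is a hypothetical helper, not part of Outlines.

```python
def render_few_shot(examples: list[tuple[str, str]], query: str) -> str:
    """Build a few-shot prompt: labelled examples first, then the open query."""
    lines = ["Classify the sentiment of each text."]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Sentiment: {label}")
    lines.append(f"Text: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


examples = [
    ("The food was cold", "Negative"),
    ("The staff was friendly", "Positive"),
]
few_shot_prompt = render_few_shot(examples, query="Service was slow")
print(few_shot_prompt)
```

Keeping the examples as data and the layout in a template means either can change without touching the other.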

Model Integrations

Model type	Description	Documentation
Server Support	vLLM and Ollama	Server Integrations →
Local Model Support	transformers and llama.cpp	Model Integrations →
API Support	OpenAI and Gemini	API Integrations →

Core Features

Feature	Description	Documentation
Multiple Choices	Constrain outputs to predefined options	Multiple Choices Guide →
Function Calls	Infer structure from function signatures	Function Guide →
JSON/Pydantic	Generate outputs matching JSON schemas	JSON Guide →
Regular Expressions	Generate text following a regex pattern	Regex Guide →
Grammars	Enforce complex output structures	Grammar Guide →

Other Features

Feature	Description	Documentation
Prompt templates	Separate complex prompts from code	Template Guide →
Custom types	Intuitive interface to build complex types	Python Types Guide →
Applications	Encapsulate templates and types into functions	Application Guide →

About .txt

Outlines is developed and maintained by .txt, a company dedicated to making LLMs more reliable for production applications.

Our focus is on advancing structured generation technology through:

🧪 Cutting-edge Research: We publish our findings on structured generation
🚀 Enterprise-grade solutions: You can license our enterprise-grade libraries.
🧩 Open Source Collaboration: We believe in building in public and contributing to the community

Community

💡 Have an idea? Come chat with us on Discord
🐞 Found a bug? Open an issue
🧩 Want to contribute? Consult our contribution guide.

Cite Outlines

@article{willard2023efficient,
  title={Efficient Guided Generation for Large Language Models},
  author={Willard, Brandon T and Louf, R{\'e}mi},
  journal={arXiv preprint arXiv:2307.09702},
  year={2023}
}