License

Apache-2.0 license


Replicate Python client

This is a Python client for Replicate. It lets you run models from your Python code or Jupyter notebook, and do various other things on Replicate.

Breaking Changes in 1.0.0

The 1.0.0 release contains breaking changes:

The replicate.run() method now returns FileOutput objects instead of URL strings by default for models that output files. FileOutput implements an iterable interface similar to httpx.Response, making it easier to work with files efficiently.

To revert to the previous behavior, you can opt out of FileOutput by passing use_file_output=False to replicate.run():

output = replicate.run("acmecorp/acme-model", use_file_output=False)

In most cases, updating existing applications to call output.url should resolve any issues. But we recommend using the FileOutput objects directly as we have further improvements planned to this API and this approach is guaranteed to give the fastest results.

Tip

👋 Check out an interactive version of this tutorial on Google Colab.

Requirements

Python 3.8+

Install

pip install replicate

Authenticate

Before running any Python scripts that use the API, you need to set your Replicate API token in your environment.

Grab your token from replicate.com/account and set it as an environment variable:

export REPLICATE_API_TOKEN=<your token>

We recommend not adding the token directly to your source code, because you don't want to put your credentials in source control. If anyone used your API key, their usage would be charged to your account.
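A script can check for the token up front so that a missing variable fails with a clear message before any model call is made. The sketch below uses only the standard library; `require_token` is a hypothetical helper name, not part of the replicate package.

```python
import os


def require_token(env=None, name="REPLICATE_API_TOKEN"):
    """Return the API token from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of the replicate package.
    """
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"{name} is not set; export it before running this script")
    return token
```

Calling `require_token()` at the top of a script reads from the real environment; passing a dictionary instead makes the check easy to exercise in tests.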

Alternative authentication

As of replicate 1.0.7 and cog 0.14.11 it is possible to pass a REPLICATE_API_TOKEN via the context as part of a prediction request.

The Replicate() constructor will now use this context when available. This grants cog models the ability to use the Replicate client libraries, scoped to a user on a per-request basis.

Run a model

Create a new Python file and add the following code, replacing the model identifier and input with your own:

>>> import replicate
>>> outputs = replicate.run(
...     "black-forest-labs/flux-schnell",
...     input={"prompt": "astronaut riding a rocket like a horse"}
... )
>>> outputs
[<replicate.helpers.FileOutput object at 0x107179b50>]
>>> for index, output in enumerate(outputs):
...     with open(f"output_{index}.webp", "wb") as file:
...         file.write(output.read())

replicate.run raises ModelError if the prediction fails. You can access the exception's prediction property to get more information about the failure.

import replicate
from replicate.exceptions import ModelError

try:
    output = replicate.run("stability-ai/stable-diffusion-3", {"prompt": "An astronaut riding a rainbow unicorn"})
except ModelError as e:
    if "(some known issue)" in e.prediction.logs:
        pass

    print("Failed prediction: " + e.prediction.id)

Note

By default the Replicate client will hold the connection open for up to 60 seconds while waiting for the prediction to complete. This is designed to optimize getting the model output back to the client as quickly as possible.

The timeout can be configured by passing wait=x to replicate.run() where x is a timeout in seconds between 1 and 60. To disable the sync mode you can pass wait=False.

AsyncIO support

You can also use the Replicate client asynchronously by prepending async_ to the method name.

Here's an example of how to run several predictions concurrently and wait for them all to complete:

import asyncio
import replicate

# https://replicate.com/stability-ai/sdxl
model_version = "stability-ai/sdxl:39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b"
prompts = [
    f"A chariot pulled by a team of {count} rainbow unicorns"
    for count in ["two", "four", "six", "eight"]
]

async def main():
    # asyncio.TaskGroup requires Python 3.11 or later.
    async with asyncio.TaskGroup() as tg:
        tasks = [
            tg.create_task(replicate.async_run(model_version, input={"prompt": prompt}))
            for prompt in prompts
        ]

    # The task group waits for every task before exiting, so the results are ready here.
    results = [task.result() for task in tasks]
    print(results)

asyncio.run(main())

To run a model that takes a file input you can pass either a URL to a publicly accessible file on the Internet or a handle to a file on your local device.

>>> output = replicate.run(
...     "andreasjansson/blip-2:f677695e5e89f8b236e52ecd1d3f01beb44c34606419bcc19345e046d8f786f9",
...     input={"image": open("path/to/mystery.jpg", "rb")}
... )
>>> output
"an astronaut riding a horse"

Run a model and stream its output

Replicate’s API supports server-sent event streams (SSEs) for language models. Use the stream method to consume tokens as they're produced by the model.

import replicate

for event in replicate.stream(
    "meta/meta-llama-3-70b-instruct",
    input={
        "prompt": "Please write a haiku about llamas.",
    },
):
    print(str(event), end="")

Tip

Some models, like meta/meta-llama-3-70b-instruct, don't require a version string. You can always refer to the API documentation on the model page for specifics.

You can also stream the output of a prediction you create. This is helpful when you want the ID of the prediction separate from its output.

prediction = replicate.predictions.create(
    model="meta/meta-llama-3-70b-instruct",
    input={"prompt": "Please write a haiku about llamas."},
    stream=True,
)

for event in prediction.stream():
    print(str(event), end="")

For more information, see "Streaming output" in Replicate's docs.

Run a model in the background

You can start a model and run it in the background using async mode:

>>> model = replicate.models.get("kvfrans/clipdraw")
>>> version = model.versions.get("5797a99edc939ea0e9242d5e8c9cb3bc7d125b1eac21bda852e5cb79ede2cd9b")
>>> prediction = replicate.predictions.create(
...     version=version,
...     input={"prompt": "Watercolor painting of an underwater submarine"})
>>> prediction
Prediction(...)
>>> prediction.status
'starting'
>>> dict(prediction)
{"id": "...", "status": "starting", ...}
>>> prediction.reload()
>>> prediction.status
'processing'
>>> print(prediction.logs)
iteration: 0, render:loss: -0.6171875
iteration: 10, render:loss: -0.92236328125
iteration: 20, render:loss: -1.197265625
iteration: 30, render:loss: -1.3994140625
>>> prediction.wait()
>>> prediction.status
'succeeded'
>>> prediction.output
<replicate.helpers.FileOutput object at 0x107179b50>
>>> with open("output.png", "wb") as file:
...     file.write(prediction.output.read())
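The session above blocks in prediction.wait(). If you would rather do other work between checks, you can poll with reload() yourself. The following is a minimal sketch that relies only on the status attribute and reload() method shown above; `poll_until_done` and `FakePrediction` are hypothetical names, and treating "failed" as a terminal status is an assumption (only "succeeded" and "canceled" appear in this README).

```python
import time

# Statuses after which a prediction is assumed not to change again.
# "succeeded" and "canceled" appear in this README; "failed" is an assumption.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def poll_until_done(prediction, interval=0.5, max_polls=120, sleep=time.sleep):
    """Call prediction.reload() until a terminal status is reached.

    Hypothetical helper; returns the final status or raises TimeoutError.
    """
    for _ in range(max_polls):
        if prediction.status in TERMINAL_STATUSES:
            return prediction.status
        sleep(interval)
        prediction.reload()
    if prediction.status in TERMINAL_STATUSES:
        return prediction.status
    raise TimeoutError(f"prediction still {prediction.status!r} after {max_polls} polls")


class FakePrediction:
    """Stand-in that steps through fixed statuses so the sketch runs offline."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)
        self.status = next(self._statuses)

    def reload(self):
        # Keep the last status once the scripted list is exhausted.
        self.status = next(self._statuses, self.status)


final_status = poll_until_done(
    FakePrediction(["starting", "processing", "succeeded"]),
    sleep=lambda seconds: None,  # skip real sleeping in the demo
)
```

With a real prediction object you would pass it in place of the stand-in and keep the default `sleep`.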

Run a model in the background and get a webhook

You can run a model and get a webhook when it completes, instead of waiting for it to finish:

model = replicate.models.get("ai-forever/kandinsky-2.2")
version = model.versions.get("ea1addaab376f4dc227f5368bbd8eff901820fd1cc14ed8cad63b29249e9d463")
prediction = replicate.predictions.create(
    version=version,
    input={"prompt": "Watercolor painting of an underwater submarine"},
    webhook="https://example.com/your-webhook",
    webhook_events_filter=["completed"]
)

For details on receiving webhooks, see replicate.com/docs/webhooks.
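On the receiving side, your endpoint gets an HTTP request whose body describes the prediction. The sketch below shows one way to parse such a body with the standard library. It assumes the body is a JSON object with "id" and "status" keys, mirroring dict(prediction) shown earlier; check the webhook documentation for the actual payload. `handle_webhook` is a hypothetical name.

```python
import json


def handle_webhook(body: bytes) -> str:
    """Parse a webhook request body and summarize the prediction.

    Assumes a JSON object with "id" and "status" keys (an assumption based on
    dict(prediction); verify against the webhook docs).
    """
    payload = json.loads(body)
    prediction_id = payload["id"]
    status = payload["status"]
    if status == "succeeded":
        return f"{prediction_id}: output ready"
    return f"{prediction_id}: finished with status {status}"
```

In a web framework you would call this from the route registered at the webhook URL, passing the raw request body.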

Compose models into a pipeline

You can run a model and feed the output into another model:

laionide = replicate.models.get("afiaka87/laionide-v4").versions.get("b21cbe271e65c1718f2999b038c18b45e21e4fba961181fbfae9342fc53b9e05")
swinir = replicate.models.get("jingyunliang/swinir").versions.get("660d922d33153019e8c263a3bba265de882e7f4f70396546b6c9c8f9d47a021a")
image = laionide.predict(prompt="avocado armchair")
upscaled_image = swinir.predict(image=image)

Get output from a running model

Run a model and get its output while it's running:

iterator = replicate.run(
    "pixray/text2image:5c347a4bfa1d4523a58ae614c2194e15f2ae682b57e3797a5bb468920aa70ebf",
    input={"prompts": "san francisco sunset"}
)

for index, image in enumerate(iterator):
    with open(f"file_{index}.png", "wb") as file:
        file.write(image.read())

Cancel a prediction

You can cancel a running prediction:

>>> model = replicate.models.get("kvfrans/clipdraw")
>>> version = model.versions.get("5797a99edc939ea0e9242d5e8c9cb3bc7d125b1eac21bda852e5cb79ede2cd9b")
>>> prediction = replicate.predictions.create(
...     version=version,
...     input={"prompt": "Watercolor painting of an underwater submarine"}
... )
>>> prediction.status
'starting'
>>> prediction.cancel()
>>> prediction.reload()
>>> prediction.status
'canceled'

List predictions

You can list all the predictions you've run:

replicate.predictions.list()
# [<Prediction: 8b0ba5ab4d85>, <Prediction: 494900564e8c>]

Lists of predictions are paginated. You can get the next page of predictions by passing the next property as an argument to the list method:

page1 = replicate.predictions.list()

if page1.next:
    page2 = replicate.predictions.list(page1.next)
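To walk every page rather than just the second, the cursor-following step can be wrapped in a generator. This is a minimal sketch: `iter_results` and `fake_list` are hypothetical names, and it assumes each page exposes `results` and `next` attributes, as the model-listing example in this README does.

```python
from types import SimpleNamespace


def iter_results(list_page):
    """Yield every item across pages by following `next` cursors.

    `list_page` is any callable that returns a page when called with no
    arguments or with a cursor, for example replicate.predictions.list.
    """
    page = list_page()
    while True:
        yield from page.results
        if not page.next:
            break
        page = list_page(page.next)


# Stand-in pages so the sketch runs without network access.
_pages = {
    None: SimpleNamespace(results=["p1", "p2"], next="cursor-1"),
    "cursor-1": SimpleNamespace(results=["p3"], next=None),
}


def fake_list(cursor=None):
    return _pages[cursor]


all_items = list(iter_results(fake_list))
```

Because it is a generator, you can stop iterating early without fetching the remaining pages.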

Load output files

Output files are returned as FileOutput objects:

import replicate
from PIL import Image  # pip install pillow

output = replicate.run(
    "stability-ai/stable-diffusion:27b93a2413e7f36cd83da926f3656280b2931564ff050bf9575f1fdf9bcd7478",
    input={"prompt": "wavy colorful abstract patterns, oceans"}
)

# This has a .read() method that returns the binary data.
with open("my_output.png", "wb") as file:
    file.write(output[0].read())

# It also implements the iterator protocol to stream the data.
background = Image.open(output[0])

FileOutput

FileOutput is a file-like object returned from the replicate.run() method that makes it easier to work with models that output files. It implements Iterator and AsyncIterator for reading the file data in chunks, as well as read() and aread() to read the entire file into memory.

Note

At this time, read() and aread() do not accept a size argument to read up to size bytes.

Lastly, the URL of the underlying data source is available on the url attribute, though we recommend you use the object as an iterator or use its read() or aread() methods, as the url property may not always return HTTP URLs in future.

print(output.url)  #=> "data:image/png;base64,xyz123..." or "https://delivery.replicate.com/..."

To consume the file directly:

with open('output.bin', 'wb') as file:
    file.write(output.read())

Or for very large files they can be streamed:

with open(file_path, 'wb') as file:
    for chunk in output:
        file.write(chunk)
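The chunked loop can be wrapped in a small helper that also reports how many bytes were written, which is handy for logging or sanity checks. This is a sketch only: `save_stream` is a hypothetical name, and the demo feeds it a plain list of bytes as a stand-in for a FileOutput.

```python
import os
import tempfile


def save_stream(chunks, path):
    """Write an iterable of bytes chunks to `path` and return the bytes written."""
    written = 0
    with open(path, "wb") as file:
        for chunk in chunks:
            written += file.write(chunk)
    return written


# Demo with stand-in chunks; a FileOutput would be passed the same way.
demo_path = os.path.join(tempfile.mkdtemp(), "output.bin")
demo_written = save_stream([b"abc", b"de"], demo_path)
```

Only one chunk is held in memory at a time, so the approach scales to outputs larger than available RAM.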

Each of these methods has an equivalent asyncio API.

async with aiofiles.open(filename, 'wb') as file:
    await file.write(await output.aread())

async with aiofiles.open(filename, 'wb') as file:
    async for chunk in output:
        await file.write(chunk)

Common web frameworks all support building streaming responses from Iterator types:

Django

@condition(etag_func=None)
def stream_response(request):
    output = replicate.run("black-forest-labs/flux-schnell", input={...}, use_file_output=True)
    return HttpResponse(output, content_type='image/webp')

FastAPI

@app.get("/")
async def main():
    output = replicate.run("black-forest-labs/flux-schnell", input={...}, use_file_output=True)
    return StreamingResponse(output)

Flask

@app.route('/stream')
def streamed_response():
    output = replicate.run("black-forest-labs/flux-schnell", input={...}, use_file_output=True)
    return app.response_class(stream_with_context(output))

You can opt out of FileOutput by passing use_file_output=False to the replicate.run() method.

output = replicate.run("acmecorp/acme-model", use_file_output=False)

List models

You can list the models you've created:

replicate.models.list()

Lists of models are paginated. You can get the next page of models by passing the next property as an argument to the list method, or you can use the paginate method to fetch pages automatically.

# Automatic pagination using `replicate.paginate` (recommended)
models = []
for page in replicate.paginate(replicate.models.list):
    models.extend(page.results)
    if len(models) > 100:
        break

# Manual pagination using `next` cursors
page = replicate.models.list()
while page:
    models.extend(page.results)
    if len(models) > 100:
        break
    page = replicate.models.list(page.next) if page.next else None

You can also find collections of featured models on Replicate:

>>> collections = [collection for page in replicate.paginate(replicate.collections.list) for collection in page]
>>> collections[0].slug
"vision-models"
>>> collections[0].description
"Multimodal large language models with vision capabilities like object detection and optical character recognition (OCR)"
>>> replicate.collections.get("text-to-image").models
[<Model: stability-ai/sdxl>, ...]

Create a model

You can create a model for a user or organization with a given name, visibility, and hardware SKU:

import replicate

model = replicate.models.create(
    owner="your-username",
    name="my-model",
    visibility="public",
    hardware="gpu-a40-large"
)

Here's how to list all the available hardware for running models on Replicate:

>>> [hw.sku for hw in replicate.hardware.list()]
['cpu', 'gpu-t4', 'gpu-a40-small', 'gpu-a40-large']

Fine-tune a model

Use the training API to fine-tune models to make them better at a particular task. To see what language models currently support fine-tuning, check out Replicate's collection of trainable language models.

If you're looking to fine-tune image models, check out Replicate's guide to fine-tuning image models.

Here's how to fine-tune a model on Replicate:

training = replicate.trainings.create(
    model="stability-ai/sdxl",
    version="39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b",
    input={
        "input_images": "https://my-domain/training-images.zip",
        "token_string": "TOK",
        "caption_prefix": "a photo of TOK",
        "max_train_steps": 1000,
        "use_face_detection_instead": False
    },
    # You need to create a model on Replicate that will be the destination for the trained version.
    destination="your-username/model-name"
)

Customize client behavior

The replicate package exports a default shared client. This client is initialized with an API token set by the REPLICATE_API_TOKEN environment variable.

You can create your own client instance to pass a different API token value, add custom headers to requests, or control the behavior of the underlying HTTPX client:

import os
from replicate.client import Client

replicate = Client(
    api_token=os.environ["SOME_OTHER_REPLICATE_API_TOKEN"],
    headers={
        "User-Agent": "my-app/1.0"
    }
)

Warning

Never hardcode authentication credentials like API tokens into your code. Instead, pass them as environment variables when running your program.

Experimental `use()` interface

Versions of replicate >= 1.1.0b1 include a new experimental use() function that is intended to make running a model feel closer to calling a function than making an API request.

Some key differences from replicate.run():

You "import" the model using the use() syntax; after that you call the model like a function.
The output type matches the model definition.
Support for streaming is baked in for all models.
File outputs are represented as PathLike objects and downloaded to disk when used*.

Note

* We've replaced the FileOutput implementation with Path objects. However, to avoid downloading files before they are needed, we've implemented a PathProxy class that defers the download until the first time the object is used. If you need the underlying URL of the Path object you can use the get_path_url(path: Path) -> str helper.
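The deferred-download idea can be illustrated with a small os.PathLike wrapper that fetches on first use. This is an illustrative sketch only, not the library's PathProxy: `LazyPath`, `fetch`, and `fake_fetch` are hypothetical names, and the "download" here just records that it was called.

```python
import os


class LazyPath(os.PathLike):
    """Illustrative stand-in for deferred downloads (not the library's PathProxy).

    `fetch` is a hypothetical callable that downloads the file and returns its
    local path; it runs once, the first time the path is needed.
    """

    def __init__(self, fetch):
        self._fetch = fetch
        self._path = None

    def __fspath__(self):
        if self._path is None:
            self._path = os.fspath(self._fetch())
        return self._path


fetch_calls = []


def fake_fetch():
    # Stand-in for a real download; records each invocation.
    fetch_calls.append("download")
    return "/tmp/output.webp"


lazy = LazyPath(fake_fetch)
calls_before_use = len(fetch_calls)  # nothing has been fetched yet
first = os.fspath(lazy)              # triggers the fetch
second = os.fspath(lazy)             # reuses the cached path
```

Anything that accepts os.PathLike, such as open(), would trigger the same one-time fetch through `__fspath__`.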

Examples

To use a model:

Important

For now use() MUST be called in the top level module scope. We may relax this in future.

import replicate

flux_dev = replicate.use("black-forest-labs/flux-dev")
outputs = flux_dev(prompt="a cat wearing an amusing hat")

for output in outputs:
    print(output)  # Path(/tmp/output.webp)

Models that implement iterators will return the output of the completed run as a list unless explicitly streaming (see Streaming section below). Language models that define x-cog-iterator-display: concatenate will return strings:

claude = replicate.use("anthropic/claude-4-sonnet")

output = claude(prompt="Give me a recipe for tasty smashed avocado on sourdough toast that could feed all of California.")

print(output)  # "Here's a recipe to feed all of California (about 39 million people)! ..."

You can pass the results of one model directly into another:

import replicate

flux_dev = replicate.use("black-forest-labs/flux-dev")
claude = replicate.use("anthropic/claude-4-sonnet")

images = flux_dev(prompt="a cat wearing an amusing hat")

result = claude(prompt="describe this image for me", image=images[0])

print(str(result))  # "This shows an image of a cat wearing a hat ..."

To create an individual prediction that has not yet resolved, use the create() method:

claude = replicate.use("anthropic/claude-4-sonnet")

prediction = claude.create(prompt="Give me a recipe for tasty smashed avocado on sourdough toast that could feed all of California.")

prediction.logs()  # get current logs (WIP)

prediction.output()  # get the output

Streaming

Many models, particularly large language models (LLMs), will yield partial results as the model is running. To consume outputs from these models as they run you can pass the streaming argument to use():

claude = replicate.use("anthropic/claude-4-sonnet", streaming=True)

output = claude(prompt="Give me a recipe for tasty smashed avocado on sourdough toast that could feed all of California.")

for chunk in output:
    print(chunk)  # "Here's a recipe ", "to feed all", " of California"

Downloading file outputs

Output files are provided as Python os.PathLike objects. These are supported by most of the Python standard library like open() and Path, as well as third-party libraries like pillow and ffmpeg-python.

The first time the file is accessed it will be downloaded to a temporary directory on disk ready for use.

Here's an example of how to use the pillow package to convert file outputs:

import replicate
from PIL import Image

flux_dev = replicate.use("black-forest-labs/flux-dev")

images = flux_dev(prompt="a cat wearing an amusing hat")
for i, path in enumerate(images):
    with Image.open(path) as img:
        img.save(f"./output_{i}.png", format="PNG")

For libraries that do not support Path or PathLike instances you can use open() as you would with any other file. For example to use requests to upload the file to a different location:

import replicate
import requests

flux_dev = replicate.use("black-forest-labs/flux-dev")

images = flux_dev(prompt="a cat wearing an amusing hat")
for path in images:
    with open(path, "rb") as f:
        r = requests.post("https://api.example.com/upload", files={"file": f})

Accessing outputs as HTTPS URLs

If you do not need to download the output to disk, you can access the underlying URL for a Path object returned from a model call by using the get_path_url() helper.

import replicate
from replicate import get_path_url

flux_dev = replicate.use("black-forest-labs/flux-dev")
outputs = flux_dev(prompt="a cat wearing an amusing hat")

for output in outputs:
    print(get_path_url(output))  # "https://replicate.delivery/xyz"

Async Mode

By default use() will return a function instance with a sync interface. You can pass use_async=True to have it return an AsyncFunction that provides an async interface.

import asyncio
from pathlib import Path

import replicate

async def main():
    flux_dev = replicate.use("black-forest-labs/flux-dev", use_async=True)
    outputs = await flux_dev(prompt="a cat wearing an amusing hat")

    for output in outputs:
        print(Path(output))

asyncio.run(main())

In streaming mode, an AsyncIterator is returned.

import asyncio
import replicate

async def main():
    claude = replicate.use("anthropic/claude-3.5-haiku", streaming=True, use_async=True)
    output = await claude(prompt="say hello")

    # Stream the response as it comes in.
    async for token in output:
        print(token)

    # Wait until model has completed. This will return either a `list` or a `str` depending
    # on whether the model uses AsyncIterator or ConcatenateAsyncIterator. You can check this
    # on the model schema by looking for `x-cog-display: concatenate`.
    print(await output)

asyncio.run(main())
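An object that can be both iterated with `async for` and awaited for a final value may look unusual. The sketch below shows how the two protocols can coexist on one class. It is illustrative only and is not the library's OutputIterator: `StreamedOutput`, `source`, and `fake_tokens` are hypothetical names, and for simplicity each use re-reads a canned token list rather than sharing one live stream.

```python
import asyncio


class StreamedOutput:
    """Illustrative object supporting both `async for` and `await`.

    `source` is a hypothetical zero-argument callable returning an async
    iterator of string chunks.
    """

    def __init__(self, source, concatenate=True):
        self._source = source
        self._concatenate = concatenate

    def __aiter__(self):
        # `async for` pulls chunks one at a time.
        return self._source()

    def __await__(self):
        # `await` collects every chunk, then joins or lists them.
        async def collect():
            chunks = [chunk async for chunk in self._source()]
            return "".join(chunks) if self._concatenate else chunks

        return collect().__await__()


async def fake_tokens():
    # Stand-in for tokens arriving from a model.
    for token in ["Hel", "lo"]:
        yield token


async def demo():
    output = StreamedOutput(fake_tokens)
    streamed = [token async for token in output]
    return streamed, await output


demo_streamed, demo_final = asyncio.run(demo())
```

Passing `concatenate=False` makes the awaited value a list of chunks, matching the list-or-string distinction described in the comment above.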

Typing

By default use() knows nothing about the interface of the model. To provide a better developer experience we provide two methods to add type annotations to the function returned by the use() helper.

1. Provide a function signature

The use method accepts a function signature as an additional hint keyword argument. When provided it will use this signature for the model() and model.create() functions.

from pathlib import Path
from replicate import use

# Flux takes a required prompt string and optional image and seed.
def hint(*, prompt: str, image: Path | None = None, seed: int | None = None) -> str: ...

flux_dev = use("black-forest-labs/flux-dev", hint=hint)
output1 = flux_dev()  # will warn that `prompt` is missing
output2 = flux_dev(prompt="str")  # output2 will be typed as `str`

2. Provide a class

The second method requires creating a callable class with a name field. The name will be used as the function reference when passed to use().

from pathlib import Path
from replicate import use

class FluxDev:
    name = "black-forest-labs/flux-dev"

    def __call__(self, *, prompt: str, image: Path | None = None, seed: int | None = None) -> str: ...

flux_dev = use(FluxDev)
output1 = flux_dev()  # will warn that `prompt` is missing
output2 = flux_dev(prompt="str")  # output2 will be typed as `str`

Warning

Currently the typing system doesn't correctly support the streaming flag for models that return lists or use iterators. We're working on improvements here.

In future we hope to provide tooling to generate and provide these models as packages to make working with them easier. For now you may wish to create your own.

API Reference

The Replicate Python Library provides several key classes and functions for working with models in pipelines:

`use()` Function

Creates a callable function wrapper for a Replicate model.

def use(ref: FunctionRef, *, streaming: bool = False, use_async: bool = False) -> Function | AsyncFunction

def use(ref: str, *, hint: Callable | None = None, streaming: bool = False, use_async: bool = False) -> Function | AsyncFunction

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `ref` | `str \| FunctionRef` | Required | Model reference (e.g., "owner/model" or "owner/model:version") |
| `hint` | `Callable \| None` | `None` | Function signature for type hints |
| `streaming` | `bool` | `False` | Return OutputIterator for streaming results |
| `use_async` | `bool` | `False` | Return AsyncFunction instead of Function |

Returns:

Function - Synchronous model wrapper (default)
AsyncFunction - Asynchronous model wrapper (when use_async=True)

`Function` Class

A synchronous wrapper for calling Replicate models.

Methods:

| Method | Signature | Description |
| --- | --- | --- |
| `__call__()` | `(*args, **inputs) -> Output` | Execute the model and return final output |
| `create()` | `(*args, **inputs) -> Run` | Start a prediction and return Run object |

Properties:

| Property | Type | Description |
| --- | --- | --- |
| `openapi_schema` | `dict` | Model's OpenAPI schema for inputs/outputs |
| `default_example` | `dict \| None` | Default example inputs (not yet implemented) |

`AsyncFunction` Class

An asynchronous wrapper for calling Replicate models.

Methods:

| Method | Signature | Description |
| --- | --- | --- |
| `__call__()` | `async (*args, **inputs) -> Output` | Execute the model and return final output |
| `create()` | `async (*args, **inputs) -> AsyncRun` | Start a prediction and return AsyncRun object |

Properties:

| Property | Type | Description |
| --- | --- | --- |
| `openapi_schema()` | `async () -> dict` | Model's OpenAPI schema for inputs/outputs |
| `default_example` | `dict \| None` | Default example inputs (not yet implemented) |

`Run` Class

Represents a running prediction with access to output and logs.

Methods:

| Method | Signature | Description |
| --- | --- | --- |
| `output()` | `() -> Output` | Get prediction output (blocks until complete) |
| `logs()` | `() -> str \| None` | Get current prediction logs |

Behavior:

When streaming=True: Returns OutputIterator immediately
When streaming=False: Waits for completion and returns final result

`AsyncRun` Class

Asynchronous version of Run for async model calls.

Methods:

| Method | Signature | Description |
| --- | --- | --- |
| `output()` | `async () -> Output` | Get prediction output (awaits completion) |
| `logs()` | `async () -> str \| None` | Get current prediction logs |

`OutputIterator` Class

Iterator wrapper for streaming model outputs.

Methods:

| Method | Signature | Description |
| --- | --- | --- |
| `__iter__()` | `() -> Iterator[T]` | Synchronous iteration over output chunks |
| `__aiter__()` | `() -> AsyncIterator[T]` | Asynchronous iteration over output chunks |
| `__str__()` | `() -> str` | Convert to string (concatenated or list representation) |
| `__await__()` | `() -> List[T] \| str` | Await all results (string for concatenate, list otherwise) |