License

Apache-2.0 license

Replicate Python client

This is a Python client for Replicate. It lets you run models from your Python code or Jupyter notebook, and do various other things on Replicate.

Breaking Changes in 1.0.0

The 1.0.0 release contains breaking changes:

The replicate.run() method now returns FileOutputs instead of URL strings by default for models that output files. FileOutput implements an iterable interface similar to httpx.Response, making it easier to work with files efficiently.

To revert to the previous behavior, you can opt out of FileOutput by passing use_file_output=False to replicate.run():

output = replicate.run("acmecorp/acme-model", use_file_output=False)

In most cases, updating existing applications to call output.url should resolve any issues. However, we recommend using the FileOutput objects directly, as we have further improvements planned for this API and this approach is guaranteed to give the fastest results.
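For code that has to handle both the old string outputs and the new file objects during a migration, a small normalizing helper is one option. The sketch below is illustrative only: `output_to_url` and `FakeFileOutput` are hypothetical names that are not part of the replicate package, and the only assumption about real file outputs is the `url` attribute described in this README.

```python
from typing import Any


def output_to_url(item: Any) -> str:
    """Return a URL for a pre-1.0 string output or a FileOutput-like object."""
    if isinstance(item, str):
        return item
    # Assumption: file outputs expose a `url` attribute.
    return item.url


class FakeFileOutput:
    """Stand-in for a real file output so the sketch runs offline."""

    def __init__(self, url: str) -> None:
        self.url = url


mixed_outputs = ["https://example.com/a.png", FakeFileOutput("https://example.com/b.png")]
urls = [output_to_url(item) for item in mixed_outputs]
print(urls)
```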

Tip

👋 Check out an interactive version of this tutorial on Google Colab.

Requirements

Python 3.8+

Install

pip install replicate

Authenticate

Before running any Python scripts that use the API, you need to set your Replicate API token in your environment.

Grab your token from replicate.com/account and set it as an environment variable:

export REPLICATE_API_TOKEN=<your token>

We recommend not adding the token directly to your source code, because you don't want to put your credentials in source control. If anyone used your API key, their usage would be charged to your account.
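A script can fail early with a clear message when the variable is missing, rather than failing later on the first API call. This is a minimal sketch using only the standard library; `require_api_token` is a hypothetical helper, not something the replicate package provides.

```python
import os
from typing import Mapping, Optional


def require_api_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the token from the environment, or raise if it is missing or blank."""
    env = os.environ if env is None else env
    token = env.get("REPLICATE_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "REPLICATE_API_TOKEN is not set; export it before running this script."
        )
    return token
```

Calling `require_api_token()` at the top of a script reads from `os.environ`; passing a mapping is only a convenience for testing.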

Alternative authentication

As of replicate 1.0.7 and cog 0.14.11 it is possible to pass a REPLICATE_API_TOKEN via the context as part of a prediction request.

The Replicate() constructor will now use this context when available. This grants cog models the ability to use the Replicate client libraries, scoped to a user on a per-request basis.

Run a model

Create a new Python file and add the following code, replacing the model identifier and input with your own:

>>> import replicate
>>> outputs = replicate.run(
...     "black-forest-labs/flux-schnell",
...     input={"prompt": "astronaut riding a rocket like a horse"}
... )
>>> outputs
[<replicate.helpers.FileOutput object at 0x107179b50>]
>>> for index, output in enumerate(outputs):
...     with open(f"output_{index}.webp", "wb") as file:
...         file.write(output.read())

replicate.run raises ModelError if the prediction fails. You can access the exception's prediction property to get more information about the failure.

import replicate
from replicate.exceptions import ModelError

try:
    output = replicate.run("stability-ai/stable-diffusion-3", {"prompt": "An astronaut riding a rainbow unicorn"})
except ModelError as e:
    if "(some known issue)" in e.prediction.logs:
        pass

    print("Failed prediction: " + e.prediction.id)
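When logging failures, it can help to keep only the prediction ID and the last few log lines. The following sketch assumes nothing beyond the `id` and `logs` attributes used above; `summarize_failure` is a hypothetical helper, and a `SimpleNamespace` stands in for the real prediction object so the example runs offline.

```python
from types import SimpleNamespace
from typing import Any


def summarize_failure(prediction: Any, tail: int = 3) -> str:
    """Build a short message from a failed prediction's ID and last log lines."""
    logs = prediction.logs or ""  # treat missing logs as empty
    last_lines = logs.strip().splitlines()[-tail:]
    return "Failed prediction " + prediction.id + "\n" + "\n".join(last_lines)


# Stand-in for `e.prediction` from the except block above.
fake_prediction = SimpleNamespace(
    id="abc123",
    logs="step 1\nstep 2\nstep 3\nCUDA out of memory\n",
)
print(summarize_failure(fake_prediction, tail=2))
```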

Note

By default the Replicate client will hold the connection open for up to 60 seconds while waiting for the prediction to complete. This is designed to optimize getting the model output back to the client as quickly as possible.

The timeout can be configured by passing wait=x to replicate.run() where x is a timeout in seconds between 1 and 60. To disable the sync mode you can pass wait=False.
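If the wait value comes from configuration or user input, it can be checked against the documented range before being passed on. This is a sketch only: `validate_wait` is a hypothetical helper, and rejecting `True` is a choice made here for simplicity, not a statement about what the client accepts.

```python
from typing import Union


def validate_wait(wait: Union[bool, int]) -> Union[bool, int]:
    """Accept False (sync mode disabled) or an integer from 1 to 60 seconds."""
    if wait is False:
        return False
    # bool is a subclass of int, so exclude True explicitly (sketch-level choice).
    if isinstance(wait, bool) or not isinstance(wait, int):
        raise TypeError("wait must be False or an integer number of seconds")
    if not 1 <= wait <= 60:
        raise ValueError("wait must be between 1 and 60 seconds")
    return wait
```

The validated value could then be passed as `replicate.run(..., wait=validate_wait(value))`.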

AsyncIO support

You can also use the Replicate client asynchronously by prepending async_ to the method name.

Here's an example of how to run several predictions concurrently and wait for them all to complete:

import asyncio
import replicate

# https://replicate.com/stability-ai/sdxl
model_version = "stability-ai/sdxl:39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b"
prompts = [
    f"A chariot pulled by a team of {count} rainbow unicorns"
    for count in ["two", "four", "six", "eight"]
]

async def main():
    # asyncio.TaskGroup requires Python 3.11 or later.
    async with asyncio.TaskGroup() as tg:
        tasks = [
            tg.create_task(replicate.async_run(model_version, input={"prompt": prompt}))
            for prompt in prompts
        ]

    # All tasks have finished once the TaskGroup block exits.
    results = await asyncio.gather(*tasks)
    print(results)

asyncio.run(main())
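asyncio.TaskGroup was added in Python 3.11, while this package supports Python 3.8+. On older interpreters the same fan-out can be written with asyncio.gather alone. The sketch below shows that pattern; `fake_async_run` is a hypothetical stand-in for replicate.async_run so the example runs without network access or an API token.

```python
import asyncio
from typing import List


async def fake_async_run(ref: str, input: dict) -> str:
    """Hypothetical stand-in for replicate.async_run."""
    await asyncio.sleep(0.01)  # simulate waiting on the API
    return "output for: " + input["prompt"]


async def run_all(prompts: List[str]) -> List[str]:
    # gather() runs the coroutines concurrently and returns results in input order.
    coroutines = [
        fake_async_run("owner/model", input={"prompt": prompt}) for prompt in prompts
    ]
    return await asyncio.gather(*coroutines)


results = asyncio.run(run_all(["two unicorns", "four unicorns"]))
print(results)
```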

To run a model that takes a file input you can pass either a URL to a publicly accessible file on the Internet or a handle to a file on your local device.

>>> output = replicate.run(
...     "andreasjansson/blip-2:f677695e5e89f8b236e52ecd1d3f01beb44c34606419bcc19345e046d8f786f9",
...     input={"image": open("path/to/mystery.jpg", "rb")}
... )
>>> output
"an astronaut riding a horse"

Run a model and stream its output

Replicate’s API supports server-sent event streams (SSEs) for language models. Use the stream method to consume tokens as they're produced by the model.

import replicate

for event in replicate.stream(
    "meta/meta-llama-3-70b-instruct",
    input={
        "prompt": "Please write a haiku about llamas.",
    },
):
    print(str(event), end="")

Tip

Some models, like meta/meta-llama-3-70b-instruct, don't require a version string. You can always refer to the API documentation on the model page for specifics.
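The identifiers in this README come in two shapes: `owner/name` and `owner/name:version`. If you need to tell them apart in your own code, a small parser is enough. This is a sketch under that assumption; `parse_model_ref` and `ModelRef` are hypothetical names and are not part of the replicate package.

```python
from typing import NamedTuple, Optional


class ModelRef(NamedTuple):
    owner: str
    name: str
    version: Optional[str]


def parse_model_ref(ref: str) -> ModelRef:
    """Split 'owner/name' or 'owner/name:version' into its parts."""
    path, _, version = ref.partition(":")
    owner, sep, name = path.partition("/")
    if not sep or not owner or not name:
        raise ValueError(f"expected 'owner/name' or 'owner/name:version', got {ref!r}")
    return ModelRef(owner, name, version or None)


print(parse_model_ref("meta/meta-llama-3-70b-instruct"))
```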

You can also stream the output of a prediction you create. This is helpful when you want the ID of the prediction separate from its output.

prediction = replicate.predictions.create(
    model="meta/meta-llama-3-70b-instruct",
    input={"prompt": "Please write a haiku about llamas."},
    stream=True,
)

for event in prediction.stream():
    print(str(event), end="")

For more information, see "Streaming output" in Replicate's docs.
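If you want the full text as well as the live output, you can accumulate the events while printing them. A minimal sketch, assuming only that each event converts to text with `str()` as in the examples above; `collect_stream` is a hypothetical helper, and a plain list stands in for a real stream.

```python
from typing import Iterable


def collect_stream(events: Iterable[object]) -> str:
    """Print each event as it arrives and return the concatenated text."""
    parts = []
    for event in events:
        text = str(event)
        print(text, end="")
        parts.append(text)
    return "".join(parts)


# A list of strings stands in for replicate.stream(...) here.
full_text = collect_stream(["Llamas ", "graze ", "softly"])
```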

Run a model in the background

You can start a model and run it in the background using async mode:

>>> model = replicate.models.get("kvfrans/clipdraw")
>>> version = model.versions.get("5797a99edc939ea0e9242d5e8c9cb3bc7d125b1eac21bda852e5cb79ede2cd9b")
>>> prediction = replicate.predictions.create(
...     version=version,
...     input={"prompt": "Watercolor painting of an underwater submarine"})

>>> prediction
Prediction(...)

>>> prediction.status
'starting'

>>> dict(prediction)
{"id": "...", "status": "starting", ...}

>>> prediction.reload()
>>> prediction.status
'processing'

>>> print(prediction.logs)
iteration: 0, render:loss: -0.6171875
iteration: 10, render:loss: -0.92236328125
iteration: 20, render:loss: -1.197265625
iteration: 30, render:loss: -1.3994140625

>>> prediction.wait()

>>> prediction.status
'succeeded'

>>> prediction.output
<replicate.helpers.FileOutput object at 0x107179b50>

>>> with open("output.png", "wb") as file:
...     file.write(prediction.output.read())
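prediction.wait() blocks until the prediction finishes. If you would rather enforce your own deadline, you can poll with reload() instead. The sketch below assumes only the `status` attribute and `reload()` method shown above, plus the terminal statuses "succeeded", "failed" and "canceled"; `wait_until_done` and `FakePrediction` are hypothetical names used so the example runs offline.

```python
import time
from typing import Any, List

TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_until_done(prediction: Any, poll_interval: float = 0.5, timeout: float = 300.0) -> str:
    """Reload the prediction until it reaches a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout
    while prediction.status not in TERMINAL_STATUSES:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction still {prediction.status!r} after {timeout} seconds")
        time.sleep(poll_interval)
        prediction.reload()
    return prediction.status


class FakePrediction:
    """Stand-in that steps through a fixed list of statuses on each reload()."""

    def __init__(self, statuses: List[str]) -> None:
        self._statuses = iter(statuses)
        self.status = next(self._statuses)

    def reload(self) -> None:
        self.status = next(self._statuses)


final_status = wait_until_done(
    FakePrediction(["starting", "processing", "succeeded"]), poll_interval=0.01
)
print(final_status)
```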

Run a model in the background and get a webhook

You can run a model and get a webhook when it completes, instead of waiting for it to finish:

model = replicate.models.get("ai-forever/kandinsky-2.2")
version = model.versions.get("ea1addaab376f4dc227f5368bbd8eff901820fd1cc14ed8cad63b29249e9d463")
prediction = replicate.predictions.create(
    version=version,
    input={"prompt": "Watercolor painting of an underwater submarine"},
    webhook="https://example.com/your-webhook",
    webhook_events_filter=["completed"]
)

For details on receiving webhooks, see replicate.com/docs/webhooks.
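On the receiving side, the handler mostly needs to decode the request body and decide whether there is output to act on. The sketch below assumes the body is a JSON-encoded prediction with `id`, `status` and `output` fields; `handle_webhook` is a hypothetical function, it is framework-agnostic, and it does not verify the request's authenticity, which a production endpoint should do.

```python
import json
from typing import Any, Dict, Optional


def handle_webhook(body: bytes) -> Optional[Dict[str, Any]]:
    """Return the ID and output of a succeeded prediction, or None otherwise."""
    payload = json.loads(body)
    # Assumption: the payload mirrors the prediction object.
    if payload.get("status") != "succeeded":
        return None
    return {"id": payload.get("id"), "output": payload.get("output")}


sample_body = json.dumps(
    {"id": "abc123", "status": "succeeded", "output": ["https://example.com/out.png"]}
).encode()
print(handle_webhook(sample_body))
```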

Compose models into a pipeline

You can run a model and feed the output into another model:

laionide = replicate.models.get("afiaka87/laionide-v4").versions.get("b21cbe271e65c1718f2999b038c18b45e21e4fba961181fbfae9342fc53b9e05")
swinir = replicate.models.get("jingyunliang/swinir").versions.get("660d922d33153019e8c263a3bba265de882e7f4f70396546b6c9c8f9d47a021a")
image = laionide.predict(prompt="avocado armchair")
upscaled_image = swinir.predict(image=image)

Get output from a running model

Run a model and get its output while it's running:

iterator = replicate.run(
    "pixray/text2image:5c347a4bfa1d4523a58ae614c2194e15f2ae682b57e3797a5bb468920aa70ebf",
    input={"prompts": "san francisco sunset"}
)

for index, image in enumerate(iterator):
    with open(f"file_{index}.png", "wb") as file:
        file.write(image.read())

Cancel a prediction

You can cancel a running prediction:

>>> model = replicate.models.get("kvfrans/clipdraw")
>>> version = model.versions.get("5797a99edc939ea0e9242d5e8c9cb3bc7d125b1eac21bda852e5cb79ede2cd9b")
>>> prediction = replicate.predictions.create(
...     version=version,
...     input={"prompt": "Watercolor painting of an underwater submarine"}
... )

>>> prediction.status
'starting'

>>> prediction.cancel()

>>> prediction.reload()
>>> prediction.status
'canceled'

List predictions

You can list all the predictions you've run:

replicate.predictions.list()
# [<Prediction: 8b0ba5ab4d85>, <Prediction: 494900564e8c>]

Lists of predictions are paginated. You can get the next page of predictions by passing the next property as an argument to the list method:

page1 = replicate.predictions.list()

if page1.next:
    page2 = replicate.predictions.list(page1.next)
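The same cursor-following logic can be wrapped in a generator that yields every item across pages. This sketch assumes only what the examples in this README show: a list function that accepts an optional cursor and returns a page with `results` and `next`. `iter_results` and `fake_list` are hypothetical names; the fake pages let the example run offline.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterator


def iter_results(list_fn: Callable[..., Any]) -> Iterator[Any]:
    """Yield every result, following `next` cursors until they run out."""
    page = list_fn()
    while page is not None:
        yield from page.results
        page = list_fn(page.next) if page.next else None


# Two fake pages keyed by cursor, standing in for replicate.predictions.list.
_pages = {
    None: SimpleNamespace(results=["p1", "p2"], next="cursor-2"),
    "cursor-2": SimpleNamespace(results=["p3"], next=None),
}


def fake_list(cursor=None):
    return _pages[cursor]


all_items = list(iter_results(fake_list))
print(all_items)
```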

Load output files

Output files are returned as FileOutput objects:

import replicate
from PIL import Image  # pip install pillow

output = replicate.run(
    "stability-ai/stable-diffusion:27b93a2413e7f36cd83da926f3656280b2931564ff050bf9575f1fdf9bcd7478",
    input={"prompt": "wavy colorful abstract patterns, oceans"}
)

# This has a .read() method that returns the binary data.
with open("my_output.png", "wb") as file:
    file.write(output[0].read())

# It also implements the iterator protocol to stream the data.
background = Image.open(output[0])

FileOutput

FileOutput is a file-like object returned from the replicate.run() method that makes it easier to work with models that output files. It implements Iterator and AsyncIterator for reading the file data in chunks, as well as read() and aread() to read the entire file into memory.

Note

read() and aread() do not currently accept a size argument to read up to size bytes.

Lastly, the URL of the underlying data source is available on the url attribute, though we recommend you use the object as an iterator or use its read() or aread() methods, as the url property may not always return HTTP URLs in the future.

print(output.url)  #=> "data:image/png;base64,xyz123..." or "https://delivery.replicate.com/..."

To consume the file directly:

with open('output.bin', 'wb') as file:
    file.write(output.read())

Very large files can be streamed instead:

with open(file_path, 'wb') as file:
    for chunk in output:
        file.write(chunk)
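The chunked write can be factored into a helper that also reports how many bytes were saved. This sketch works with any iterable of bytes, which is all it assumes about a file output; `save_chunks` is a hypothetical name, and a list of byte strings stands in for the real object.

```python
import os
import tempfile
from typing import Iterable


def save_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write chunks to `path` one at a time and return the total bytes written."""
    written = 0
    with open(path, "wb") as file:
        for chunk in chunks:
            written += file.write(chunk)
    return written


# A list of byte strings stands in for a file output here.
demo_path = os.path.join(tempfile.mkdtemp(), "output.bin")
size = save_chunks([b"abc", b"defg"], demo_path)
print(size)
```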

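Each of these methods has an equivalent asyncio API.

import aiofiles  # pip install aiofiles

async def save_all_at_once(output, filename):
    # Read the whole file into memory, then write it.
    async with aiofiles.open(filename, 'wb') as file:
        await file.write(await output.aread())

async def save_in_chunks(output, filename):
    # Stream the file chunk by chunk.
    async with aiofiles.open(filename, 'wb') as file:
        async for chunk in output:
            await file.write(chunk)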

Common web frameworks all accept Iterator types for streaming responses:

Django

@condition(etag_func=None)
def stream_response(request):
    output = replicate.run("black-forest-labs/flux-schnell", input={...}, use_file_output=True)
    return HttpResponse(output, content_type='image/webp')

FastAPI

@app.get("/")
async def main():
    output = replicate.run("black-forest-labs/flux-schnell", input={...}, use_file_output=True)
    return StreamingResponse(output)

Flask

@app.route('/stream')
def streamed_response():
    output = replicate.run("black-forest-labs/flux-schnell", input={...}, use_file_output=True)
    return app.response_class(stream_with_context(output))

You can opt out of FileOutput by passing use_file_output=False to the replicate.run() method.

output = replicate.run("acmecorp/acme-model", use_file_output=False)

List models

You can list the models you've created:

replicate.models.list()

Lists of models are paginated. You can get the next page of models by passing the next property as an argument to the list method, or you can use the paginate method to fetch pages automatically.

# Automatic pagination using `replicate.paginate` (recommended)
models = []
for page in replicate.paginate(replicate.models.list):
    models.extend(page.results)
    if len(models) > 100:
        break

# Manual pagination using `next` cursors
page = replicate.models.list()
while page:
    models.extend(page.results)
    if len(models) > 100:
        break
    page = replicate.models.list(page.next) if page.next else None

You can also find collections of featured models on Replicate:

>>> collections = [collection for page in replicate.paginate(replicate.collections.list) for collection in page]
>>> collections[0].slug
"vision-models"
>>> collections[0].description
"Multimodal large language models with vision capabilities like object detection and optical character recognition (OCR)"

>>> replicate.collections.get("text-to-image").models
[<Model: stability-ai/sdxl>, ...]

Create a model

You can create a model for a user or organization with a given name, visibility, and hardware SKU:

import replicate

model = replicate.models.create(
    owner="your-username",
    name="my-model",
    visibility="public",
    hardware="gpu-a40-large"
)

Here's how to list all the available hardware for running models on Replicate:

>>> [hw.sku for hw in replicate.hardware.list()]
['cpu', 'gpu-t4', 'gpu-a40-small', 'gpu-a40-large']

Fine-tune a model

Use the training API to fine-tune models to make them better at a particular task. To see what language models currently support fine-tuning, check out Replicate's collection of trainable language models.

If you're looking to fine-tune image models, check out Replicate's guide to fine-tuning image models.

Here's how to fine-tune a model on Replicate:

training = replicate.trainings.create(
    model="stability-ai/sdxl",
    version="39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b",
    input={
        "input_images": "https://my-domain/training-images.zip",
        "token_string": "TOK",
        "caption_prefix": "a photo of TOK",
        "max_train_steps": 1000,
        "use_face_detection_instead": False
    },
    # You need to create a model on Replicate that will be the destination for the trained version.
    destination="your-username/model-name"
)

Customize client behavior

The replicate package exports a default shared client. This client is initialized with an API token set by the REPLICATE_API_TOKEN environment variable.

You can create your own client instance to pass a different API token value, add custom headers to requests, or control the behavior of the underlying HTTPX client:

import os
from replicate.client import Client

replicate = Client(
    api_token=os.environ["SOME_OTHER_REPLICATE_API_TOKEN"],
    headers={
        "User-Agent": "my-app/1.0"
    }
)