Olumide Shittu for Orquesta


Integrate Orquesta with Cohere using Python SDK

Orquesta provides your product teams with no-code collaboration tooling to experiment, operate, and monitor LLMs and remote configurations within your SaaS. Using Orquesta, you can easily perform prompt engineering, prompt management, experiment in production, push new versions directly to production, and roll back instantly.

Cohere, on the other hand, offers language processing to any system via an API: it trains large language models and puts them behind a very simple interface.

Source: Cohere.

This article guides you through integrating your SaaS with Orquesta and Cohere using our Python SDK. By the end of the article, you'll know how to set up a prompt in Orquesta, perform prompt engineering, request a prompt variant using our SDK code generator, map the Orquesta response with Cohere, send a payload to Cohere, and report the response back to Orquesta for observability and monitoring.

Prerequisites

To follow along with this tutorial, you will need the following:

  • Jupyter Notebook (or any IDE of your choice).

  • Orquesta Python SDK.

Integration

Follow these steps to integrate the Python SDK with Cohere.

Step 1 - Install SDK and create a client instance

```shell
pip install orquesta-sdk
pip install cohere
```

To create a client instance, you need your Orquesta API key, which can be found in your workspace at https://my.orquesta.dev/<workspace-name>/settings/developers

Copy it and add the following code to your notebook to initialize the Orquesta client.

```python
import time

import cohere
from orquesta_sdk import OrquestaClient, OrquestaClientOptions
from orquesta_sdk.helpers import orquesta_cohere_parameters_mapper
from orquesta_sdk.prompts import OrquestaPromptMetrics

# Initialize Orquesta client
api_key = "ORQUESTA-API-KEY"
options = OrquestaClientOptions(api_key=api_key, ttl=3600)
client = OrquestaClient(options)
```

Explanation:

  • Import the time module to measure how long the completion request takes.

  • Import cohere so we can use the Cohere API.

  • The OrquestaClient and OrquestaClientOptions classes, defined in the orquesta_sdk module, are imported.

  • The Orquesta SDK has helper functions that map and interface between Orquesta and specific LLM providers. For this integration, we use the orquesta_cohere_parameters_mapper helper.

  • To log all interactions with Cohere, we use the OrquestaPromptMetrics class.

  • We create an instance of OrquestaClientOptions and configure it with the api_key and the ttl (Time to Live), in seconds, for the local cache; by default, it is 3600 seconds (1 hour).

Finally, an instance of the OrquestaClient class is created and initialized with the previously configured options object. This client instance can now interact with the Orquesta service, using the provided API key for authentication.
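Rather than hardcoding the API key as in the snippet above, you may prefer to read it from the environment. This is a minimal sketch of that pattern; the variable name ORQUESTA_API_KEY is our own convention here, not something the SDK requires:

```python
import os

# Read the Orquesta API key from an environment variable, falling back to
# the placeholder used in this tutorial (the env var name is illustrative).
api_key = os.environ.get("ORQUESTA_API_KEY", "ORQUESTA-API-KEY")
```

You would then pass this `api_key` into `OrquestaClientOptions` exactly as before.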

Step 2 - Enable Cohere models in Model Garden

Head over to Orquesta's Model Garden and enable the Cohere models you want to use.

Enable Cohere models in Model Garden

Step 3 - Set up a completion prompt and variants

The next step is to set up your completion prompt; ensure the type is Completion, not Chat, when using Cohere.

To create a prompt, click Add Prompt, provide a prompt key and an optional Domain, and select Completion.

Set up a completion prompt and variants

Once that is set up, create your first completion: give your prompt a name, add all the necessary information, and click Save.

Set up a completion prompt and variants

Step 4 - Request a variant from Orquesta using the SDK

Our flexible configuration matrix lets you define multiple prompt variants based on custom context, so you can work with different prompts and hyperparameters per environment, country, locale, or user segment. The Code Snippet Generator makes it easy to request a prompt variant.

Code Snippet Generator

Once you open the Code Snippet Generator, copy the code snippet and paste it into your editor.

Code Snippet Generator

```python
# Query the prompt from Orquesta
prompt = client.prompts.query(
    key="data_completion",
    context={"environments": ["test"]},
    variables={},
)
```

Step 5 - Map the Orquesta response to Cohere using a Helper

As established at the beginning of this tutorial, integrating these two technologies relies on a helper provided by Orquesta: orquesta_cohere_parameters_mapper.

```python
# Start time of the completion request
start_time = time.time()
print(f'Start time: {start_time}')

co = cohere.Client('COHERE-API-KEY')  # Insert your Cohere API key

completion = co.generate(
    **orquesta_cohere_parameters_mapper(prompt.value),
    model=prompt.value.get("model"),
    prompt=prompt.value.get('prompt'),
)

# End time of the completion request
end_time = time.time()
print(f'End time: {end_time}')

# Calculate the difference (latency) in milliseconds
latency = (end_time - start_time) * 1000
print(f'Latency is: {latency}')
```

Latency

Explanation:

  • We record the start time using the time module.

  • An instance of the Cohere client is created.

  • Using the generate() endpoint, we generate realistic text conditioned on a given input.

  • The generate() endpoint also accepts other body parameters, such as the prompt (a required string), the model, num_generations, max_tokens, temperature, etc. For simplicity, we only pass model and prompt.

  • We record the end time and calculate the latency.
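If you do want to set some of those extra body parameters yourself, one way is to merge an overrides dict on top of the values mapped from Orquesta before unpacking into generate(). The parameter names below are Cohere generate() options mentioned above, but the mapped values and overrides are illustrative, not real API output:

```python
# Values as they might come back from orquesta_cohere_parameters_mapper
# (illustrative stand-ins for this sketch).
mapped_params = {"temperature": 0.7, "max_tokens": 50}

# Extra Cohere generate() body parameters we want to control ourselves.
overrides = {"max_tokens": 120, "num_generations": 1}

# In a dict merge, later keys win, so overrides take precedence.
request_body = {**mapped_params, **overrides}
# co.generate(model=..., prompt=..., **request_body)
```

Keeping the mapped values first means Orquesta's prompt configuration stays the default, and you only override what you explicitly choose to.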

Step 6 - Report analytics back to Orquesta

After each query, Orquesta generates a log with a Trace ID. Using the add_metrics() method, you can attach additional information, such as the llm_response, metadata, latency, and economics.

```python
# Tokenize responses
prompt_tokenization = co.tokenize(prompt.value.get('prompt'))
completion_tokenization = co.tokenize(completion.generations[0].text)

prompt_tokens = len(prompt_tokenization.tokens)
completion_tokens = len(completion_tokenization.tokens)
total_tokens = prompt_tokens + completion_tokens

# Report the metrics back to Orquesta
metrics = OrquestaPromptMetrics(
    economics={
        "total_tokens": total_tokens,
        "completion_tokens": completion_tokens,
        "prompt_tokens": prompt_tokens,
    },
    llm_response=completion.generations[0].text,
    latency=latency,
    metadata={
        "finish_reason": completion.generations[0].finish_reason,
    },
)

prompt.add_metrics(metrics=metrics)
```

Conclusion

With these easy steps, you have successfully integrated Orquesta with Cohere. This is just the tip of the iceberg: as of the time of writing, Orquesta only supports the generate() endpoint, but in the future you will be able to use other endpoints, such as embed, classify, summarize, detect-language, etc.

Orquesta supports other SDKs, such as Angular, Node.js, React, and TypeScript. Refer to our documentation for more information.

Full Code Example

```python
import time

import cohere
from orquesta_sdk import OrquestaClient, OrquestaClientOptions
from orquesta_sdk.helpers import orquesta_cohere_parameters_mapper
from orquesta_sdk.prompts import OrquestaPromptMetrics

# Initialize Orquesta client
api_key = "ORQUESTA-API-KEY"
options = OrquestaClientOptions(api_key=api_key, ttl=3600)
client = OrquestaClient(options)

co = cohere.Client('COHERE-API-KEY')  # Insert your Cohere API key

# Query the prompt from Orquesta
prompt = client.prompts.query(
    key="data_completion",
    context={"environments": ["test"]},
    variables={},
    metadata={"user_id": 45515},
)

# Start time of the completion request
start_time = time.time()
print(f'Start time: {start_time}')

completion = co.generate(
    **orquesta_cohere_parameters_mapper(prompt.value),
    model=prompt.value.get("model"),
    prompt=prompt.value.get('prompt'),
)

# End time of the completion request
end_time = time.time()
print(f'End time: {end_time}')

# Calculate the difference (latency) in milliseconds
latency = (end_time - start_time) * 1000
print(f'Latency is: {latency}')

# Tokenize responses
prompt_tokenization = co.tokenize(prompt.value.get('prompt'))
completion_tokenization = co.tokenize(completion.generations[0].text)

prompt_tokens = len(prompt_tokenization.tokens)
completion_tokens = len(completion_tokenization.tokens)
total_tokens = prompt_tokens + completion_tokens

# Report the metrics back to Orquesta
metrics = OrquestaPromptMetrics(
    economics={
        "total_tokens": total_tokens,
        "completion_tokens": completion_tokens,
        "prompt_tokens": prompt_tokens,
    },
    llm_response=completion.generations[0].text,
    latency=latency,
    metadata={
        "finish_reason": completion.generations[0].finish_reason,
    },
)

prompt.add_metrics(metrics=metrics)
```
