Posted on Sep 18, 2023

Integrate Orquesta with OpenAI using Python SDK

#openai #llm #promptengineering #ai

Orquesta is a powerful LLM Ops suite designed to manage both public and private LLMs (Large Language Models) from a single source. It offers full transparency on performance and costs while reducing your release cycles from weeks to minutes. Integrating Orquesta into your current setup or a new workflow requires a couple of lines of code, ensuring seamless collaboration and transparency in prompt engineering and prompt management for your team.

With Orquesta, you gain access to several LLM Ops features, enabling your team to:

Collaborate directly across product, engineering, and domain expert teams.
Manage prompts for both public and private LLM models.
Customize and localize prompt variants based on your data model.
Push new versions directly to production and roll back instantly.
Obtain model-specific token and cost estimates.
Gain insights into model-specific costs, performance, and latency.
Gather both quantitative and qualitative end-user feedback.
Experiment in production and gather real-world feedback.
Make decisions grounded in real-world information.

This article guides you through integrating your SaaS with Orquesta and OpenAI using our Python SDK. By the end of the article, you'll know how to set up a prompt in Orquesta, perform prompt engineering, request a prompt variant using the SDK code generator, map the Orquesta response with OpenAI, send a payload to OpenAI, and report the response back to Orquesta for observability and monitoring.

Prerequisites

To follow along with this tutorial, you will need the following:

Jupyter Notebook (or any IDE of your choice).
An OpenAI account; you can sign up here.
Orquesta Python SDK.

Integration

Follow these steps to integrate the Python SDK with OpenAI.

Step 1 - Install SDK and create a client instance

pip install orquesta-sdk

To create a client instance, you need access to the Orquesta API key, which can be found in your workspace: https://my.orquesta.dev/<workspace-name>/settings/developers.

Copy it and add the following code to your notebook to initialize the Orquesta client.

from orquesta_sdk import OrquestaClient, OrquestaClientOptions

api_key = "<ORQUESTA_API_KEY>"
options = OrquestaClientOptions(api_key=api_key, ttl=3600)
client = OrquestaClient(options)

The OrquestaClient and OrquestaClientOptions classes are imported from the orquesta_sdk module. The API key, which is used for authentication, is assigned to the variable api_key. You can either hard-code the key this way or read it from an environment variable: api_key = os.environ.get("ORQUESTA_API_KEY", "__API_KEY__") (this requires import os). An instance of the OrquestaClientOptions class is then created and configured with the api_key and the ttl (time to live), in seconds, for the local cache; by default, it is 3600 seconds (1 hour).
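If you take the environment-variable route, it can help to fail early when the variable is missing instead of silently falling back to a placeholder. The following is a minimal sketch using only the standard library; load_api_key is a hypothetical helper name, not part of the Orquesta SDK.

```python
import os


def load_api_key(var_name="ORQUESTA_API_KEY", env=None):
    """Return the API key stored in `var_name`, or raise if it is missing.

    `load_api_key` is a hypothetical helper for this article, not an SDK function.
    `env` defaults to os.environ and can be replaced by a plain dict for testing.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return key
```

You could then write api_key = load_api_key() and pass the result to OrquestaClientOptions exactly as in the snippet above.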

Finally, an instance of the OrquestaClient class is created and initialized with the previously configured options object. This client instance can now interact with the Orquesta service using the provided API key for authentication.
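To build intuition for what a ttl on a local cache means, here is a toy time-to-live cache. It is only an illustration of the general technique and makes no claim about how the Orquesta SDK implements its cache; TTLCache and its methods are names made up for this sketch.

```python
import time


class TTLCache:
    """Toy time-to-live cache (illustrative only, not the SDK's implementation)."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested without waiting
        self._entries = {}       # key -> (expires_at, value)

    def set(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            # The entry is stale: drop it so the caller fetches a fresh copy.
            del self._entries[key]
            return default
        return value


# Demonstration with a fake clock instead of real waiting.
fake_now = [0.0]
cache = TTLCache(ttl_seconds=3600, clock=lambda: fake_now[0])
cache.set("customer-support-chat", "cached prompt")
fresh = cache.get("customer-support-chat")    # still within the hour
fake_now[0] = 3600.0                          # one hour later
expired = cache.get("customer-support-chat")  # entry has expired
```

A longer ttl means fewer round trips but a longer delay before a newly published prompt version is picked up; a shorter one trades the other way.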

Step 2 - Set up a prompt and its variants

After successfully connecting to Orquesta, continue in the Orquesta Admin Panel to set up your prompt and its variants. A prompt is the specific task you give to the LLM; the response is the output of the language model accomplishing that task. To create a prompt, click on Add Prompt and enter the prompt key.

The image above represents the Prompt Studio in Orquesta, where:

The name of the prompt variant.
Notes: this is where you leave notes for other collaborators.
Since we are working on a chat prompt, this is where you manage the System-User-Assistant messages.
Prompt variables provide flexibility in your prompts.
Prompt tokens and cost are estimated based on the model selected.
Model Selector.
Click Save once you are done.

Step 3 - Request a variant from Orquesta using the SDK

Our flexible configuration matrix allows you to define multiple prompt variants based on custom context. This lets you serve different prompts and hyperparameters per, for example, environment, country, locale, or user segment. The Code Snippet Generator makes it easy to request a prompt variant.

Once you open the Code Snippet Generator, you can use the generated snippet to consume your first prompt from your application.
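As a mental model for requesting a variant by context, the toy matcher below picks the first variant whose rule overlaps with the supplied context on every field the rule names. This is purely illustrative: the matching actually happens inside Orquesta, its real resolution logic is not described in this article, and the variant names and rules here are invented.

```python
def matches(rule, context):
    """A rule matches when every field it names shares a value with the context."""
    return all(
        set(allowed) & set(context.get(field, []))
        for field, allowed in rule.items()
    )


def select_variant(variants, context, default=None):
    """Return the name of the first variant whose rule matches the context."""
    for name, rule in variants:
        if matches(rule, context):
            return name
    return default


# Hypothetical variants, ordered from most to least specific.
variants = [
    ("dutch-b2c", {"country": ["NLD"], "user-segment": ["b2c"]}),
    ("english-fallback", {"locale": ["en"]}),
]

# Same context shape as the query used later in this article.
context = {
    "environments": ["test"],
    "country": ["BEL", "NLD"],
    "locale": ["en"],
    "user-segment": ["b2c"],
}

selected = select_variant(variants, context)
```

With this context, the first rule already matches, so selected is "dutch-b2c"; a context containing only an English locale would fall through to the second variant.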

Step 4 - Map the Orquesta response to OpenAI using a Helper

Map the Orquesta response to OpenAI's API using the Helper functions. Each LLM provider has its own Helper function in Orquesta.

For OpenAI, use the Helper: orquesta_openai_parameters_mapper or Class: OrquestaOpenAIPromptParameters.

import os
import time

import openai
from orquesta_sdk import OrquestaClient, OrquestaClientOptions
from orquesta_sdk.helpers import orquesta_openai_parameters_mapper
from orquesta_sdk.prompts import OrquestaPromptMetrics

openai.api_key = "<OPENAI_API_KEY>"

Paste the code copied from the Code Snippet Generator here.

# Query the prompt from Orquesta
prompt = client.prompts.query(
    key="customer-support-chat",
    context={
        "environments": ["test"],
        "country": ["BEL", "NLD"],
        "locale": ["en"],
        "user-segment": ["b2c"],
    },
    variables={"customer_name": "John"},
    metadata={"user_id": 45515},
)

if prompt.has_error:
    print("There was an error while fetching the prompt")
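The check above prints a message when prompt.has_error is set but then carries on, so a later call would still run with an unusable prompt. One option is to stop right there. The sketch below assumes only the two attributes used in this article (has_error and value); require_prompt and PromptFetchError are hypothetical names, and a SimpleNamespace stands in for the SDK's result object so the example runs on its own.

```python
from types import SimpleNamespace


class PromptFetchError(RuntimeError):
    """Raised when the prompt could not be fetched (hypothetical helper type)."""


def require_prompt(prompt):
    """Return prompt.value, or raise if the query reported an error."""
    if getattr(prompt, "has_error", False):
        raise PromptFetchError("There was an error while fetching the prompt")
    return prompt.value


# Stand-ins for the object returned by client.prompts.query(...);
# the model name and empty message list are sample data only.
ok_prompt = SimpleNamespace(
    has_error=False,
    value={"model": "gpt-3.5-turbo", "messages": []},
)
failed_prompt = SimpleNamespace(has_error=True, value=None)

value = require_prompt(ok_prompt)
```

Calling require_prompt(failed_prompt) raises PromptFetchError, which your application can catch to fall back to a default prompt or return an error to the user.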

You can now send the payload to OpenAI and receive the response.

# Start time of the completion request
start_time = time.time()
print(f'Start time:{start_time}')

completion = openai.ChatCompletion.create(
    **orquesta_openai_parameters_mapper(prompt.value),
    model=prompt.value.get("model"),
    messages=prompt.value.get("messages"),
)

# End time of the completion request
end_time = time.time()
print(f'End time:{end_time}')

# Calculate the difference (latency) in milliseconds
latency = (end_time - start_time) * 1000
print(f'Latency is:{latency}')
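If you time more than one call, the start/end bookkeeping can be wrapped in a small helper. This sketch swaps time.time() for time.perf_counter(), a monotonic clock intended for measuring intervals; timed_call is a hypothetical name, and the built-in sum stands in for the OpenAI request so the example runs without network access.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, latency in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    latency_ms = (time.perf_counter() - start) * 1000
    return result, latency_ms


# `sum` is only a stand-in for the completion request.
result, latency_ms = timed_call(sum, [1, 2, 3])
```

In the flow above, the same helper could wrap openai.ChatCompletion.create with the keyword arguments shown, giving you the completion and its latency in one step.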

Step 5 - Report analytics back to Orquesta

After each query, Orquesta generates a log with a Trace ID. Using the add_metrics() method, you can add additional information, such as the llm_response, metadata, latency, and economics.

# Report the metrics back to Orquesta
metrics = OrquestaPromptMetrics(
    economics={
        "total_tokens": completion.usage.get("total_tokens"),
        "completion_tokens": completion.usage.get("completion_tokens"),
        "prompt_tokens": completion.usage.get("prompt_tokens"),
    },
    llm_response=completion.choices[0].message.content,
    latency=latency,
    metadata={
        "finish_reason": completion.choices[0].finish_reason,
    },
)

prompt.add_metrics(metrics=metrics)
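The economics dictionary above is assembled by hand from the usage object. A small helper can build it from any mapping with a .get method and fill in total_tokens when it is absent. This is a sketch under that assumption; build_economics is a hypothetical name, and the token counts are made-up sample data.

```python
def build_economics(usage):
    """Build the economics dict from a usage mapping, tolerating missing fields."""
    prompt_tokens = int(usage.get("prompt_tokens") or 0)
    completion_tokens = int(usage.get("completion_tokens") or 0)
    # Fall back to the sum when the provider did not report a total.
    total_tokens = int(usage.get("total_tokens") or prompt_tokens + completion_tokens)
    return {
        "total_tokens": total_tokens,
        "completion_tokens": completion_tokens,
        "prompt_tokens": prompt_tokens,
    }


# Sample usage data, shaped like the fields read in the snippet above.
sample_usage = {"prompt_tokens": 42, "completion_tokens": 18, "total_tokens": 60}
economics = build_economics(sample_usage)
```

The result can be passed as the economics argument of OrquestaPromptMetrics in place of the inline dictionary.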

Conclusion

And that's it: you have integrated Orquesta with OpenAI using the Python SDK! With Orquesta, you can design, test, and manage prompts for all your LLM providers using real-time logs, versioning, code snippets, and a playground for your prompts.

Orquesta supports other SDKs such as Angular, Node.js, React, and TypeScript. Refer to our documentation for more information.

Full Code Example

import os
import time

import openai
from orquesta_sdk import OrquestaClient, OrquestaClientOptions
from orquesta_sdk.helpers import orquesta_openai_parameters_mapper
from orquesta_sdk.prompts import OrquestaPromptMetrics

openai.api_key = "<OPENAI_API_KEY>"

# Initialize Orquesta client
api_key = "<ORQUESTA_API_KEY>"
options = OrquestaClientOptions(api_key=api_key, ttl=3600)
client = OrquestaClient(options)

# Query the prompt from Orquesta
prompt = client.prompts.query(
    key="customer-support-chat",
    context={
        "environments": ["test"],
        "country": ["BEL", "NLD"],
        "locale": ["en"],
        "user-segment": ["b2c"],
    },
    variables={"customer_name": "John"},
    metadata={"user_id": 45515},
)

if prompt.has_error:
    print("There was an error while fetching the prompt")

# Start time of the completion request
start_time = time.time()
print(f'Start time:{start_time}')

completion = openai.ChatCompletion.create(
    **orquesta_openai_parameters_mapper(prompt.value),
    model=prompt.value.get("model"),
    messages=prompt.value.get("messages"),
)

# End time of the completion request
end_time = time.time()
print(f'End time:{end_time}')

# Calculate the difference (latency) in milliseconds
latency = (end_time - start_time) * 1000
print(f'Latency is:{latency}')

# Report the metrics back to Orquesta
metrics = OrquestaPromptMetrics(
    economics={
        "total_tokens": completion.usage.get("total_tokens"),
        "completion_tokens": completion.usage.get("completion_tokens"),
        "prompt_tokens": completion.usage.get("prompt_tokens"),
    },
    llm_response=completion.choices[0].message.content,
    latency=latency,
    metadata={
        "finish_reason": completion.choices[0].finish_reason,
    },
)

prompt.add_metrics(metrics=metrics)