Learn how to use inference caching with TensorZero Gateway.
The `cache_options.enabled` parameter supports four modes:

- `write_only` (default): Only write to the cache but don't serve cached responses
- `read_only`: Only read from the cache but don't write new entries
- `on`: Both read from and write to the cache
- `off`: Disable caching completely

```python
from tensorzero import TensorZeroGateway

with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    response = client.inference(
        model_name="openai::gpt-4o-mini",
        input={
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of Japan?",
                }
            ]
        },
        cache_options={
            "enabled": "on",  # read and write to cache
            "max_age_s": 3600,  # optional: cache entries older than 1h (3600s) are disregarded for reads
        },
    )

    print(response)
```

The `max_age_s` parameter applies only to the retrieval of cached responses. The cache does not automatically delete old entries (i.e. it is not a TTL).
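Since `cache_options` is a plain dictionary, one way to avoid typos in the mode string is to validate it before the request. The helper below is a minimal sketch and is not part of the TensorZero SDK; the function name `build_cache_options` and the validation logic are our own, with the four mode strings taken from the list above.

```python
# Hypothetical helper (not part of the TensorZero SDK) that builds a
# cache_options dict and validates the mode against the four documented values.
VALID_CACHE_MODES = {"write_only", "read_only", "on", "off"}


def build_cache_options(enabled="write_only", max_age_s=None):
    """Return a cache_options dict to pass to client.inference().

    enabled: one of "write_only" (default), "read_only", "on", "off".
    max_age_s: optional maximum age in seconds for cached reads.
    """
    if enabled not in VALID_CACHE_MODES:
        raise ValueError(f"unknown cache mode: {enabled!r}")
    options = {"enabled": enabled}
    if max_age_s is not None:
        options["max_age_s"] = max_age_s
    return options
```

For example, `build_cache_options("on", max_age_s=3600)` produces the same dictionary used in the inference call above, while an unrecognized mode fails fast with a `ValueError` instead of being silently sent to the gateway.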