Advanced Usage

The Python SDK provides advanced usage options for your application. This includes data masking, logging, sampling, filtering, and more.

Masking Sensitive Data

If your trace data (inputs, outputs, metadata) might contain sensitive information (PII, secrets), you can provide a `mask` function during client initialization. This function will be applied to all relevant data before it's sent to Langfuse.

The `mask` function should accept `data` as a keyword argument and return the masked data. The returned data must be JSON-serializable.

```python
from langfuse import Langfuse
from typing import Any
import re

def pii_masker(data: Any, **kwargs) -> Any:
    # Example: simple email masking. Implement your more robust logic here.
    if isinstance(data, str):
        return re.sub(
            r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+",
            "[EMAIL_REDACTED]",
            data,
        )
    elif isinstance(data, dict):
        return {k: pii_masker(data=v) for k, v in data.items()}
    elif isinstance(data, list):
        return [pii_masker(data=item) for item in data]
    return data

langfuse = Langfuse(mask=pii_masker)

# Now, any input/output/metadata will be passed through pii_masker
with langfuse.start_as_current_observation(
    as_type="span",
    name="user-query",
    input={"email": "test@example.com", "query": "..."},
) as span:
    # The 'email' field in the input will be masked.
    pass
```

Logging

The Langfuse SDK uses Python's standard `logging` module. The main logger is named `"langfuse"`. To enable detailed debug logging, you can do one of the following:

  1. Set the `debug=True` parameter when initializing the `Langfuse` client.
  2. Set the `LANGFUSE_DEBUG="True"` environment variable.
  3. Configure the `"langfuse"` logger manually:
```python
import logging

langfuse_logger = logging.getLogger("langfuse")
langfuse_logger.setLevel(logging.DEBUG)
```

The default log level for the `langfuse` logger is `logging.WARNING`.
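For example, you can attach your own handler and format to the `langfuse` logger using only the standard library (a minimal sketch; the handler and format are your choice):

```python
import logging

# Get the SDK's logger and raise its verbosity
langfuse_logger = logging.getLogger("langfuse")
langfuse_logger.setLevel(logging.DEBUG)

# Attach a console handler with a custom format
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
)
langfuse_logger.addHandler(handler)

langfuse_logger.debug("debug logging is now visible")
```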

Sampling

You can configure the SDK to sample traces by setting the `sample_rate` parameter during client initialization (or via the `LANGFUSE_SAMPLE_RATE` environment variable). This value should be a float between `0.0` (sample 0% of traces) and `1.0` (sample 100% of traces).

If a trace is not sampled, none of its observations (spans, generations) or associated scores will be sent to Langfuse.

```python
from langfuse import Langfuse

# Sample approximately 20% of traces
langfuse_sampled = Langfuse(sample_rate=0.2)
```

Filtering by Instrumentation Scope

You can configure the SDK to filter out spans from specific instrumentation libraries by using the `blocked_instrumentation_scopes` parameter. This is useful when you want to exclude infrastructure spans while keeping your LLM and application spans.

```python
from langfuse import Langfuse

# Filter out database spans
langfuse = Langfuse(
    blocked_instrumentation_scopes=["sqlalchemy", "psycopg"]
)
```

How it works:

When third-party libraries create OpenTelemetry spans (through their instrumentation packages), each span has an associated “instrumentation scope” that identifies which library created it. The Langfuse SDK filters spans at the export level based on these scope names.

You can see the instrumentation scope name for any span in the Langfuse UI under the span's metadata (`metadata.scope.name`). Use this to identify which scopes you want to filter.
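Conceptually, the export-level filtering works like this simplified sketch (not the SDK's actual code; plain dicts stand in for OpenTelemetry span objects):

```python
# Drop spans whose instrumentation scope name is on the blocklist,
# keeping application/LLM spans from other scopes untouched.

BLOCKED_SCOPES = {"sqlalchemy", "psycopg"}

def filter_spans(spans, blocked=BLOCKED_SCOPES):
    """Keep only spans whose scope name is not blocked."""
    return [s for s in spans if s["scope_name"] not in blocked]

spans = [
    {"name": "SELECT users", "scope_name": "sqlalchemy"},
    {"name": "chat-completion", "scope_name": "langfuse-sdk"},
]
print(filter_spans(spans))  # only the "chat-completion" span remains
```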

⚠️ Cross-Library Span Relationships

When filtering instrumentation scopes, be aware that blocking certain libraries may break trace tree relationships if spans from blocked and non-blocked libraries are nested together.

For example, if you block parent spans but keep child spans from a separate library, you may see “orphaned” LLM spans whose parent spans were filtered out. This can make traces harder to interpret.

Consider the impact on trace structure when choosing which scopes to filter.

Isolated TracerProvider

You can configure a separate OpenTelemetry TracerProvider for use with Langfuse. This creates isolation between Langfuse tracing and your other observability systems.

Benefits of isolation:

  • Langfuse spans won’t be sent to your other observability backends (e.g., Datadog, Jaeger, Zipkin)
  • Third-party library spans won’t be sent to Langfuse
  • Independent configuration and sampling rates
⚠️ While TracerProviders are isolated, they share the same OpenTelemetry context for tracking active spans. This can cause span relationship issues where:

  • A parent span from one TracerProvider might have children from another TracerProvider
  • Some spans may appear “orphaned” if their parent spans belong to a different TracerProvider
  • Trace hierarchies may be incomplete or confusing

Plan your instrumentation carefully to avoid confusing trace structures.

```python
from opentelemetry.sdk.trace import TracerProvider
from langfuse import Langfuse

# Do not set as the global tracer provider, to keep isolation
langfuse_tracer_provider = TracerProvider()

langfuse = Langfuse(tracer_provider=langfuse_tracer_provider)

# Span will be isolated from the remaining OTel instrumentation
langfuse.start_span(name="myspan").end()
```

Using `ThreadPoolExecutor`s

Please use the OpenTelemetry `ThreadingInstrumentor` to ensure that the OpenTelemetry context is correctly propagated to all threads.

main.py
```python
from opentelemetry.instrumentation.threading import ThreadingInstrumentor

ThreadingInstrumentor().instrument()
```
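To see why this is needed: OpenTelemetry stores the active span in `contextvars`, and worker threads do not inherit the caller's context by default. A standard-library sketch of the problem and of the copy-and-run approach the instrumentor applies (the variable here is illustrative, not OpenTelemetry's own):

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Stand-in for OpenTelemetry's context variable holding the active span
active_span = contextvars.ContextVar("active_span", default=None)
active_span.set("parent-span")

with ThreadPoolExecutor() as pool:
    # Worker threads start with a fresh context: the parent span is lost
    print(pool.submit(active_span.get).result())  # None

    # Capture the caller's context and re-enter it in the worker,
    # which is effectively what ThreadingInstrumentor arranges
    ctx = contextvars.copy_context()
    print(pool.submit(ctx.run, active_span.get).result())  # parent-span
```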

Distributed tracing

To maintain the trace context across service and process boundaries, rely on OpenTelemetry's native context propagation as much as possible.

Using the `trace_context` argument to 'force' a parent-child relationship may lead to unexpected trace updates, as the resulting span will be treated as a root span server-side.
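For reference, OpenTelemetry's default propagator carries the context between services in a W3C `traceparent` HTTP header of the form `version-trace_id-span_id-flags`. A minimal sketch of that shape (the helper functions are illustrative, not the OpenTelemetry API):

```python
# Build and parse a W3C Trace Context `traceparent` header:
# 2-hex version, 32-hex trace ID, 16-hex parent span ID, 2-hex flags.

def build_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"

def parse_traceparent(header: str) -> dict:
    version, trace_id, span_id, flags = header.split("-")
    return {"trace_id": trace_id, "parent_span_id": span_id, "sampled": flags == "01"}

header = build_traceparent("0af7651916cd43dd8448eb211c80319c", "b7ad6b7169203331")
print(parse_traceparent(header)["trace_id"])  # 0af7651916cd43dd8448eb211c80319c
```

In practice, let OpenTelemetry's propagation API inject and extract this header for you rather than constructing it by hand.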

Multi-Project Setup (Experimental)

⚠️

Multi-project setups are experimental and have important limitations regarding third-party OpenTelemetry integrations.

The Langfuse Python SDK supports routing traces to different projects within the same application by using multiple public keys. This works because the Langfuse SDK adds a specific span attribute containing the public key to all spans it generates.

How it works:

  1. Span Attributes: The Langfuse SDK adds a specific span attribute containing the public key to spans it creates
  2. Multiple Processors: Multiple span processors are registered onto the global tracer provider, each with their respective exporters bound to a specific public key
  3. Filtering: Within each span processor, spans are filtered based on the presence and value of the public key attribute
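The routing idea in the steps above can be sketched as follows (simplified; the span-dict shape and the attribute name are illustrative, not the SDK's internals):

```python
# Each registered processor only exports spans carrying its own public key.
# Spans without the attribute (e.g. third-party instrumentation) end up
# in every project.

def route(spans, public_keys):
    """Group spans by public-key attribute; unkeyed spans go to all buckets."""
    buckets = {pk: [] for pk in public_keys}
    for span in spans:
        pk = span["attributes"].get("langfuse.public_key")  # illustrative name
        if pk is None:
            for bucket in buckets.values():
                bucket.append(span)
        elif pk in buckets:
            buckets[pk].append(span)
    return buckets

spans = [
    {"name": "gen-a", "attributes": {"langfuse.public_key": "pk-a"}},
    {"name": "gen-b", "attributes": {"langfuse.public_key": "pk-b"}},
    {"name": "http-request", "attributes": {}},  # third-party span: no key
]
buckets = route(spans, ["pk-a", "pk-b"])
print([s["name"] for s in buckets["pk-a"]])  # ['gen-a', 'http-request']
```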

Important Limitation with Third-Party Libraries:

Third-party libraries that emit OpenTelemetry spans automatically (e.g., HTTP clients, databases, other instrumentation libraries) do not have the Langfuse public key span attribute. As a result:

  • These spans cannot be routed to a specific project
  • They are processed by all span processors and sent to all projects
  • All projects will receive these third-party spans

Why is this experimental? This approach requires that the `public_key` parameter be passed to all Langfuse SDK executions across all integrations to ensure proper routing, and third-party spans will appear in all projects.

Initialization

To set up multiple projects, initialize separate Langfuse clients for each project:

```python
from langfuse import Langfuse

# Initialize clients for different projects
project_a_client = Langfuse(
    public_key="pk-lf-project-a-...",
    secret_key="sk-lf-project-a-...",
    base_url="https://cloud.langfuse.com",
)

project_b_client = Langfuse(
    public_key="pk-lf-project-b-...",
    secret_key="sk-lf-project-b-...",
    base_url="https://cloud.langfuse.com",
)
```

Integration Usage

For all integrations in multi-project setups, you must specify the `public_key` parameter to ensure traces are routed to the correct project.

Observe Decorator:

Pass `langfuse_public_key` as a keyword argument to the top-most observed function (not to the decorator). From Python SDK >= 3.2.2, nested decorated functions automatically pick up the public key from the execution context they run in. Calls to `get_client` are also aware of the current `langfuse_public_key` in the decorated function's execution context, so passing `langfuse_public_key` again there is not necessary.

```python
from langfuse import observe

@observe
def nested():
    # get_client call is context aware:
    # if it runs inside another decorated function that has
    # langfuse_public_key passed, it does not need passing here again
    ...

@observe
def process_data_for_project_a(data):
    # Passing `langfuse_public_key` here again is not necessary,
    # as it is stored in the execution context
    nested()
    return {"processed": data}

@observe
def process_data_for_project_b(data):
    # Passing `langfuse_public_key` here again is not necessary,
    # as it is stored in the execution context
    nested()
    return {"enhanced": data}

# Route to Project A:
# the top-most decorated function needs the `langfuse_public_key` kwarg
result_a = process_data_for_project_a(
    data="input data",
    langfuse_public_key="pk-lf-project-a-...",
)

# Route to Project B:
# the top-most decorated function needs the `langfuse_public_key` kwarg
result_b = process_data_for_project_b(
    data="input data",
    langfuse_public_key="pk-lf-project-b-...",
)
```

OpenAI Integration:

Add `langfuse_public_key` as a keyword argument to the OpenAI execution:

```python
from langfuse.openai import openai

client = openai.OpenAI()

# Route to Project A
response_a = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello from Project A"}],
    langfuse_public_key="pk-lf-project-a-...",
)

# Route to Project B
response_b = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello from Project B"}],
    langfuse_public_key="pk-lf-project-b-...",
)
```

Langchain Integration:

Add `public_key` to the `CallbackHandler` constructor:

```python
from langfuse.langchain import CallbackHandler
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# Create handlers for different projects
handler_a = CallbackHandler(public_key="pk-lf-project-a-...")
handler_b = CallbackHandler(public_key="pk-lf-project-b-...")

llm = ChatOpenAI(model_name="gpt-4o")
prompt = ChatPromptTemplate.from_template("Tell me about {topic}")
chain = prompt | llm

# Route to Project A
response_a = chain.invoke(
    {"topic": "machine learning"},
    config={"callbacks": [handler_a]},
)

# Route to Project B
response_b = chain.invoke(
    {"topic": "data science"},
    config={"callbacks": [handler_b]},
)
```

Important Considerations:

  • Every Langfuse SDK execution across all integrations must include the appropriate public key parameter
  • Missing public key parameters may result in traces being routed to the default project or lost
  • Third-party OpenTelemetry spans (from HTTP clients, databases, etc.) will appear in all projects since they lack the Langfuse public key attribute

Passing `completion_start_time` for TTFT tracking

If you are using the Python SDK to manually create generations, you can pass the `completion_start_time` parameter. This allows Langfuse to calculate the time to first token (TTFT) for you.

```python
from langfuse import get_client
import datetime
import time

langfuse = get_client()

# Start observation with a specific type
with langfuse.start_as_current_observation(
    as_type="generation",
    name="TTFT-Generation",
) as generation:
    # Simulate LLM time to first token
    time.sleep(3)

    # Update the generation with the time the model started to generate
    generation.update(
        completion_start_time=datetime.datetime.now(),
        output="some response",
    )

# Flush events in short-lived applications
langfuse.flush()
```

Self-signed SSL certificates (self-hosted Langfuse)

If you are self-hosting Langfuse and you'd like to use self-signed SSL certificates, you will need to configure the SDK to trust the self-signed certificate:

⚠️

Changing SSL settings has major security implications depending on your environment. Be sure you understand these implications before you proceed.

1. Set OpenTelemetry span exporter to trust self-signed certificate

.env
```bash
OTEL_EXPORTER_OTLP_TRACES_CERTIFICATE="/path/to/my-selfsigned-cert.crt"
```

2. Set HTTPX to trust certificate for all other API requests to Langfuse instance

main.py
```python
import os
import httpx
from langfuse import Langfuse

httpx_client = httpx.Client(
    verify=os.environ["OTEL_EXPORTER_OTLP_TRACES_CERTIFICATE"]
)

langfuse = Langfuse(httpx_client=httpx_client)
```

Observation Types

Langfuse supports multiple observation types to provide context for different components of LLM applications. The full list of observation types is documented here: Observation types.

Setting observation types with the `@observe` decorator

By setting the `as_type` parameter in the `@observe` decorator, you can specify the observation type for a method:

```python
from langfuse import observe

# Tool calls to external services
@observe(as_type="tool")
def retrieve_context(query):
    results = vector_store.get(query)
    return results
```

Setting observation types with client methods and context manager

With the Langfuse client, you can directly create an observation with a defined type:

```python
from langfuse import get_client

langfuse = get_client()

def process_with_manual_tracing():
    trace = langfuse.trace(name="document-processing")

    # Create different observation types
    embedding_obs = trace.start_observation(
        as_type="embedding",
        name="document-embedding",
        input={"document": "text content"},
    )
    embeddings = generate_embeddings("text content")
    embedding_obs.update(output={"embeddings": embeddings})
    embedding_obs.end()
```

The context manager approach provides automatic resource cleanup:

```python
from langfuse import get_client

langfuse = get_client()

def process_with_context_managers():
    with langfuse.start_as_current_observation(
        as_type="chain",
        name="retrieval-pipeline",
    ) as chain:
        # Retrieval step
        with langfuse.start_as_current_observation(
            as_type="retriever",
            name="vector-search",
        ) as retriever:
            search_results = perform_vector_search("user question")
            retriever.update(output={"results": search_results})
```