Description
Environment: Mac OS, RHEL 8 (doesn't matter)
Configure traces and/or metrics GRPC exporter with invalid collector URL

    # init tracing
    trace_provider = TracerProvider(resource=resource)
    trace_exporter = create_trace_exporter(exporter, otlp_config, jaeger_config)
    trace_processor = BatchSpanProcessor(trace_exporter)
    trace_provider.add_span_processor(trace_processor)
    telemetry_trace.set_tracer_provider(trace_provider)

    # init metrics
    global _telemetry_meter  # pylint: disable=global-statement
    metrics_exporter = create_metrics_exporter(exporter, otlp_config)
    metric_reader = PeriodicExportingMetricReader(
        metrics_exporter, export_interval_millis=export_interval, export_timeout_millis=3000
    )
    metrics_provider = MeterProvider(resource=resource, metric_readers=[metric_reader])
    telemetry_metrics.set_meter_provider(metrics_provider)
    _telemetry_meter = telemetry_metrics.get_meter_provider().get_meter(service_name, str(service_version))

What is the expected behavior?
When executing:

    meter_provider = telemetry_metrics.get_meter_provider()
    if meter_provider and (shutdown_meter := getattr(meter_provider, 'shutdown', None)):
        shutdown_meter(timeout_millis=3000)

it should shut down in about 3 seconds.
What is the actual behavior?
It takes ~60 seconds to shut down, with logs like:

    WARNING - opentelemetry.exporter.otlp.proto.grpc.exporter::_export:363 | Transient error StatusCode.UNAVAILABLE encountered while exporting metrics, retrying in 8s.

Additional context
From what it looks like, both the metrics and traces exporters use this base/mixin (OTLPExporterMixin), which has a timeout_millis parameter.
Going one level up, OTLPMetricExporter.shutdown calls it correctly:

    def shutdown(self, timeout_millis: float = 30_000, **kwargs) -> None:
        OTLPExporterMixin.shutdown(self, timeout_millis=timeout_millis)

However, PeriodicExportingMetricReader.shutdown calls OTLPMetricExporter.shutdown with a different kwarg name, which seems to be completely ignored:

    def shutdown(self, timeout_millis: float = 30_000, **kwargs) -> None:
        deadline_ns = time_ns() + timeout_millis * 10**6

        def _shutdown():
            self._shutdown = True

        did_set = self._shutdown_once.do_once(_shutdown)
        if not did_set:
            _logger.warning("Can't shutdown multiple times")
            return

        self._shutdown_event.set()
        if self._daemon_thread:
            self._daemon_thread.join(
                timeout=(deadline_ns - time_ns()) / 10**9
            )
        self._exporter.shutdown(timeout=(deadline_ns - time_ns()) / 10**6)  # <--- timeout vs timeout_millis

- Given the use of time_ns(), if the correct kwarg name were used, it would lead to the error of a negative timeout value being supplied, since deadline_ns - time_ns() goes negative once the deadline has passed.
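
To make the two problems above concrete, here is a minimal, self-contained sketch. It does not use the real OpenTelemetry classes: FakeExporter and remaining_millis are hypothetical names made up for illustration. It only shows how a mismatched keyword is silently absorbed by **kwargs, and one possible way to clamp the remaining time so it never goes negative.

```python
from time import time_ns


class FakeExporter:
    """Hypothetical stand-in for an exporter; NOT the real OTLPMetricExporter."""

    def __init__(self) -> None:
        self.seen_timeout_millis = None

    def shutdown(self, timeout_millis: float = 30_000, **kwargs) -> None:
        # Any unknown keyword (such as `timeout`) lands in **kwargs and is
        # ignored, so timeout_millis silently keeps its default value.
        self.seen_timeout_millis = timeout_millis


def remaining_millis(deadline_ns: int) -> float:
    """Milliseconds left until the deadline, clamped so it is never negative."""
    return max(0.0, (deadline_ns - time_ns()) / 10**6)


deadline_ns = time_ns() + 3000 * 10**6  # 3 second budget

wrong = FakeExporter()
wrong.shutdown(timeout=remaining_millis(deadline_ns))  # mismatched kwarg name
print(wrong.seen_timeout_millis)  # 30000, the default, not the 3 s budget

right = FakeExporter()
right.shutdown(timeout_millis=remaining_millis(deadline_ns))  # matching name
print(right.seen_timeout_millis <= 3000)  # True
```

Whether clamping to zero or skipping the call entirely is the better behaviour once the deadline has passed is a design decision for the maintainers; the sketch only illustrates the clamping option.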
As for traces, the exporter calls OTLPExporterMixin.shutdown without propagating any timeout at all.
This leads to bad behaviour when combined with k8s and an async application, since the timeout-less thread lock blocks the event loop and leaves containers hanging in the k8s cluster until they are killed.
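
Until timeouts are propagated, one possible application-side workaround is to run the blocking shutdown call in a daemon thread and wait on it with a bound, so the event loop stays responsive. This is only a sketch under assumptions: bounded_shutdown and slow_shutdown are hypothetical names, and slow_shutdown merely simulates a provider shutdown stuck in export retries. It needs Python 3.9+ for asyncio.to_thread.

```python
import asyncio
import threading
import time


async def bounded_shutdown(shutdown_fn, timeout_s: float) -> bool:
    """Run a blocking shutdown callable in a daemon thread, waiting at most timeout_s.

    Returns True if the callable finished in time, False otherwise.
    """
    worker = threading.Thread(target=shutdown_fn, daemon=True)
    worker.start()
    # join() blocks, so it is run in the default executor instead of on the
    # event loop; it returns after timeout_s at the latest.
    await asyncio.to_thread(worker.join, timeout_s)
    return not worker.is_alive()


def slow_shutdown() -> None:
    # Hypothetical stand-in for a shutdown call that keeps retrying exports.
    time.sleep(2)


finished = asyncio.run(bounded_shutdown(slow_shutdown, timeout_s=0.2))
print(finished)  # False: the wait gave up after 0.2 s
```

The trade-off is that a daemon thread still running at interpreter exit is abandoned, so telemetry that has not been exported yet may be lost. It works around the hang but does not replace a fix in the exporter.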