[Devoxx FR 2024] Instrumenting Java Applications with OpenTelemetry: A Comprehensive Guide


Introduction

In a recent presentation at a Paris JUG event, Bruce Bujon, an R&D Engineer at Datadog and an open-source developer, delivered an insightful talk on instrumenting Java applications with OpenTelemetry. This powerful observability framework is transforming how developers monitor and analyze application performance, infrastructure, and security. In this detailed post, we’ll explore the key concepts from Bruce’s presentation, breaking down OpenTelemetry, its components, and practical steps to implement it in Java applications.

What is OpenTelemetry?

OpenTelemetry is an open-source observability framework designed to collect, process, and export telemetry data in a vendor-agnostic manner. It captures data from various sources—such as virtual machines, databases, and applications—and exports it to observability backends for analysis. Importantly, OpenTelemetry focuses solely on data collection and management, leaving visualization and analysis to backend tools like Datadog, Jaeger, or Grafana.

The framework supports three primary signals:

  • Traces: These map the journey of requests through an application, highlighting the time taken by each component or microservice.
  • Logs: Timestamped events, such as user actions or system errors, familiar to most developers.
  • Metrics: Aggregated numerical data, like request rates, error counts, or CPU usage over time.

In his talk, Bruce focused on traces, which are particularly valuable for understanding performance bottlenecks in distributed systems.
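
To make the idea of a trace concrete, here is a minimal sketch of a trace as a list of timed spans that share a trace ID and point to their parent. It is an illustrative model only: `TraceSketch`, `SpanRecord`, the shortened span IDs, and the timings are assumptions made for this post, not OpenTelemetry classes or data from the talk.

```java
import java.time.Duration;
import java.util.Comparator;
import java.util.List;

/** Illustrative model only; these are not OpenTelemetry classes. */
final class TraceSketch {

    /** One timed operation in a trace. parentSpanId is null for the root span. */
    record SpanRecord(String traceId, String spanId, String parentSpanId,
                      String service, String name, long startMillis, long endMillis) {
        Duration duration() {
            return Duration.ofMillis(endMillis - startMillis);
        }
    }

    /** Returns the span with the longest wall-clock duration. */
    static SpanRecord slowest(List<SpanRecord> spans) {
        return spans.stream()
                .max(Comparator.comparing(SpanRecord::duration))
                .orElseThrow();
    }

    /** Time spent in the span itself, assuming its direct children do not overlap. */
    static Duration selfTime(SpanRecord span, List<SpanRecord> all) {
        Duration children = all.stream()
                .filter(s -> span.spanId().equals(s.parentSpanId()))
                .map(SpanRecord::duration)
                .reduce(Duration.ZERO, Duration::plus);
        return span.duration().minus(children);
    }

    /** Made-up trace: the Order service calls the Storage service once. */
    static List<SpanRecord> sampleTrace() {
        String traceId = "4bf92f3577b34da6a3ce929d0e0e4736";
        // Span ids are shortened for readability; real ones are 16 hex characters.
        return List.of(
                new SpanRecord(traceId, "a1", null, "order", "GET /order", 0, 120),
                new SpanRecord(traceId, "b2", "a1", "order", "GET /stock (client)", 10, 100),
                new SpanRecord(traceId, "c3", "b2", "storage", "GET /stock (server)", 20, 90));
    }

    public static void main(String[] args) {
        List<SpanRecord> spans = sampleTrace();
        for (SpanRecord span : spans) {
            System.out.println(span.service() + " " + span.name()
                    + " total=" + span.duration().toMillis() + "ms"
                    + " self=" + selfTime(span, spans).toMillis() + "ms");
        }
    }
}
```

Separating total time from self time is what lets a trace view show which service actually holds the bottleneck, rather than just which request was slow.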

Why Use OpenTelemetry for Java Applications?

For Java developers, OpenTelemetry offers a standardized way to instrument applications, ensuring compatibility with various observability backends. Its flexibility allows developers to collect telemetry data without being tied to a specific tool, making it ideal for diverse tech stacks. Bruce highlighted its growing adoption, noting that OpenTelemetry is the second most active project in the Cloud Native Computing Foundation (CNCF), behind only Kubernetes.

Instrumenting a Java Application: A Step-by-Step Guide

Bruce demonstrated three approaches to instrumenting Java applications with OpenTelemetry, using a simple example of two web services: an “Order” service and a “Storage” service. The goal was to trace a request from the Order service, which calls the Storage service to check stock levels for items like hats, bags, and socks.

Approach 1: Manual Instrumentation with OpenTelemetry API and SDK

The first approach involves manually instrumenting the application using the OpenTelemetry API and SDK. This method offers maximum control but requires significant development effort.

Steps:

  1. Add Dependencies: Include the OpenTelemetry Bill of Materials (BOM) to manage library versions, along with the API, SDK, OTLP exporter, and semantic conventions.
  2. Initialize the SDK: Set up a TracerProvider with a resource that identifies the service through attributes such as the service name (e.g., “storage”) and the deployment environment.
  3. Create a Tracer: Use the Tracer to generate spans for specific operations, such as a web route or internal method.
  4. Instrument Routes: For each route or method, create a span using a SpanBuilder, set its kind (e.g., “server”) and any attributes, and mark the start and end of the span.
  5. Export Data: Configure the SDK to export spans to an OpenTelemetry Collector via the OTLP protocol.
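
The span lifecycle in steps 3 to 5 can be sketched without any dependency. The code below is a stand-in, not the OpenTelemetry API: `MiniTracer`, `ActiveSpan`, `FinishedSpan`, and `handleStockRequest` are hypothetical names, the in-memory list plays the role of the exporter, and parent/child linking is left out. Real code would use OpenTelemetry's Tracer and SpanBuilder in the same start, annotate, end pattern.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Dependency-free stand-in for the tracer/span lifecycle; not the OpenTelemetry API. */
final class ManualSpanSketch {

    /** A finished span as it would be handed to an exporter. */
    record FinishedSpan(String name, String kind, Map<String, String> attributes, long durationNanos) {}

    /** Collects finished spans in memory; a real setup would send them to a Collector over OTLP. */
    static final class MiniTracer {
        private final List<FinishedSpan> exported = new ArrayList<>();

        ActiveSpan startSpan(String name, String kind) {
            return new ActiveSpan(this, name, kind);
        }

        List<FinishedSpan> exported() {
            return List.copyOf(exported);
        }
    }

    /** An in-flight span; closing it records the duration and hands it to the tracer. */
    static final class ActiveSpan implements AutoCloseable {
        private final MiniTracer tracer;
        private final String name;
        private final String kind;
        private final Map<String, String> attributes = new LinkedHashMap<>();
        private final long startNanos = System.nanoTime();

        private ActiveSpan(MiniTracer tracer, String name, String kind) {
            this.tracer = tracer;
            this.name = name;
            this.kind = kind;
        }

        ActiveSpan setAttribute(String key, String value) {
            attributes.put(key, value);
            return this;
        }

        @Override
        public void close() {
            long duration = System.nanoTime() - startNanos;
            tracer.exported.add(new FinishedSpan(name, kind, Map.copyOf(attributes), duration));
        }
    }

    /** Hypothetical Storage route: one span for the route, one for the internal lookup. */
    static int handleStockRequest(MiniTracer tracer, String item) {
        try (ActiveSpan route = tracer.startSpan("GET /stock", "server")
                .setAttribute("http.request.method", "GET")) {
            try (ActiveSpan lookup = tracer.startSpan("lookupStock", "internal")
                    .setAttribute("item", item)) {
                return item.equals("hat") ? 3 : 0; // placeholder stock data
            }
        }
    }

    public static void main(String[] args) {
        MiniTracer tracer = new MiniTracer();
        handleStockRequest(tracer, "hat");
        tracer.exported().forEach(System.out::println);
    }
}
```

Using try-with-resources guarantees that every span is ended even when the handler throws, which is the main thing manual instrumentation has to get right. It also shows why this approach gets verbose: every route and method needs its own wrapping.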

Example Output: Bruce showed a trace with two spans—one for the route and one for an internal method—displayed in Datadog’s APM view, with attributes like service name and HTTP method.

Pros: Fine-grained control over instrumentation.

Cons: Verbose and time-consuming, especially for large applications or libraries with private APIs.

Approach 2: Framework Support with Spring Boot

The second approach leverages framework-specific integrations, such as Spring Boot’s OpenTelemetry starter, to automate instrumentation.

Steps:

  1. Add Spring Boot Starter: Include the OpenTelemetry starter, which bundles the API, SDK, exporter, and autoconfigure dependencies.
  2. Configure Environment Variables: Set variables for the service name, OTLP endpoint, and other settings.
  3. Run the Application: The starter automatically instruments web routes, capturing HTTP methods, routes, and response codes.
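
For step 2, the sketch below shows how configuration property names map to environment variable names, assuming the starter follows the OpenTelemetry SDK autoconfigure convention (upper-case the name, replace dots and dashes with underscores). `OtelEnvSketch` is a hypothetical helper, and the endpoint and `deployment.environment=dev` values are placeholders rather than settings from the talk.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper illustrating the property-to-environment-variable naming rule. */
final class OtelEnvSketch {

    /** Converts a property name such as "otel.service.name" to its environment-variable form. */
    static String toEnvVar(String propertyName) {
        return propertyName.toUpperCase(Locale.ROOT).replace('.', '_').replace('-', '_');
    }

    /** Builds the environment for one service; all values are supplied by the caller. */
    static Map<String, String> environmentFor(String serviceName, String otlpEndpoint) {
        Map<String, String> env = new LinkedHashMap<>();
        env.put(toEnvVar("otel.service.name"), serviceName);
        env.put(toEnvVar("otel.exporter.otlp.endpoint"), otlpEndpoint);
        // Placeholder resource attribute; adjust to your own environment name.
        env.put(toEnvVar("otel.resource.attributes"), "deployment.environment=dev");
        return env;
    }

    public static void main(String[] args) {
        environmentFor("order", "http://localhost:4318")
                .forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

Keeping these settings in the environment rather than in code means the same build can report to a different backend in each deployment.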

Example Output: Bruce demonstrated a trace for the Order service, with spans automatically generated for routes and tagged with HTTP metadata.

Pros: Minimal code changes and good generic instrumentation.

Cons: Limited customization and varying support across frameworks (e.g., Bruce noted that the Spring Boot starter did not instrument JDBC calls out of the box).

Approach 3: Auto-Instrumentation with JVM Agent

The third and most powerful approach uses the OpenTelemetry JVM agent for automatic instrumentation, requiring minimal code changes.

Steps:

  1. Add the JVM Agent: Attach the OpenTelemetry Java agent to the JVM using a command-line option (e.g., -javaagent:opentelemetry-javaagent.jar).
  2. Configure Environment Variables: Use autoconfigure variables (around 80 options) to customize the agent’s behavior.
  3. Remove Manual Instrumentation: Eliminate SDK, exporter, and framework dependencies, keeping only the API and semantic conventions for custom instrumentation.
  4. Run the Application: The agent instruments web servers, clients, and libraries (e.g., JDBC, Kafka) at runtime.
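
As a rough illustration of steps 1 and 2, the following sketch assembles the launch command and environment with `ProcessBuilder`. The jar names and the service name are placeholders, and `AgentLaunchSketch` is a hypothetical class. The process is deliberately not started, so the sketch runs without the jars being present.

```java
/** Hypothetical launcher showing where the -javaagent option and environment variables go. */
final class AgentLaunchSketch {

    /** Builds a ProcessBuilder that would start the application with the agent attached. */
    static ProcessBuilder launcher(String agentJar, String appJar, String serviceName) {
        ProcessBuilder builder = new ProcessBuilder(
                "java", "-javaagent:" + agentJar, "-jar", appJar);
        // The agent reads its settings from the environment of the launched JVM.
        builder.environment().put("OTEL_SERVICE_NAME", serviceName);
        builder.inheritIO();
        return builder;
    }

    public static void main(String[] args) {
        ProcessBuilder builder = launcher("opentelemetry-javaagent.jar", "storage.jar", "storage");
        System.out.println(String.join(" ", builder.command()));
        // builder.start() would launch the JVM; omitted so the sketch runs without the jars.
    }
}
```

In practice the same two pieces usually live in a shell script, a Dockerfile, or a deployment manifest. The point is that the application code itself does not change.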

Example Output: Bruce showcased a complete distributed trace, including spans for both services, web clients, and servers, with context propagation handled automatically.

Pros: Comprehensive instrumentation with minimal effort, supporting over 100 libraries.

Cons: Potential conflicts with other JVM agents (e.g., security tools) and limited support for native images (e.g., Quarkus).

Context Propagation: Linking Traces Across Services

A critical aspect of distributed tracing is context propagation, ensuring that spans from different services are linked within a single trace. Bruce explained that without propagation, the Order and Storage services generated separate traces.

To address this, OpenTelemetry uses HTTP headers (e.g., W3C’s traceparent and tracestate) to carry tracing context. In the manual approach, Bruce implemented a RestTemplate interceptor in Spring to inject headers and a Quarkus filter to extract them. The JVM agent, however, handles this automatically, simplifying the process.
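
The inject and extract steps described above can be sketched with the W3C traceparent format, which is `version-traceid-parentid-flags` in lowercase hex. This is a simplified illustration, not Bruce's interceptor or filter code: `TraceParentSketch` and `TraceContext` are hypothetical names, a plain `Map` stands in for HTTP headers, only version `00` is accepted, and tracestate is ignored.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

/** Simplified traceparent handling; a plain Map stands in for HTTP headers. */
final class TraceParentSketch {

    /** Tracing context carried between services: trace id, parent span id, sampled flag. */
    record TraceContext(String traceId, String spanId, boolean sampled) {}

    /** Client side: writes the context into the outgoing headers. */
    static void inject(TraceContext ctx, Map<String, String> headers) {
        headers.put("traceparent",
                "00-" + ctx.traceId() + "-" + ctx.spanId() + "-" + (ctx.sampled() ? "01" : "00"));
    }

    /** Server side: reads the context back, or returns null if the header is absent or malformed. */
    static TraceContext extract(Map<String, String> headers) {
        String value = headers.get("traceparent");
        if (value == null) {
            return null;
        }
        String[] parts = value.split("-");
        if (parts.length != 4
                || !parts[0].equals("00")
                || !parts[1].matches("[0-9a-f]{32}")
                || !parts[2].matches("[0-9a-f]{16}")
                || !parts[3].matches("[0-9a-f]{2}")) {
            return null;
        }
        boolean sampled = (Integer.parseInt(parts[3], 16) & 1) == 1;
        return new TraceContext(parts[1], parts[2], sampled);
    }

    /** Starts a child span in the same trace with a fresh, non-zero 64-bit span id. */
    static TraceContext childOf(TraceContext parent) {
        long id;
        do {
            id = ThreadLocalRandom.current().nextLong();
        } while (id == 0); // an all-zero span id is invalid
        return new TraceContext(parent.traceId(), String.format("%016x", id), parent.sampled());
    }

    public static void main(String[] args) {
        // Order service: inject the current context into the outgoing request.
        TraceContext orderSpan =
                new TraceContext("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", true);
        Map<String, String> headers = new HashMap<>();
        inject(orderSpan, headers);

        // Storage service: extract the context and continue the same trace.
        TraceContext storageSpan = childOf(extract(headers));
        System.out.println(headers.get("traceparent"));
        System.out.println("same trace: " + storageSpan.traceId().equals(orderSpan.traceId()));
    }
}
```

Because the Storage span reuses the incoming trace ID and records the caller's span ID as its parent, the backend can stitch both services into one trace instead of two.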

Additional Considerations

  • Baggage: In response to an audience question, Bruce clarified that OpenTelemetry’s baggage feature allows propagating business-specific metadata across services, complementing tracing context.
  • Cloud-Native Support: Cloud platforms often ship their own proprietary monitoring solutions, and native OpenTelemetry support on services such as AWS Lambda varies. Bruce suggested further exploration for specific use cases like batch jobs or serverless functions.
  • Performance: The JVM agent modifies bytecode at runtime, which may impact startup time but generally has negligible runtime overhead.
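
To illustrate the baggage point above, here is a minimal sketch of how business metadata can travel in a W3C `baggage` header as comma-separated `key=value` pairs with percent-encoded values. `BaggageSketch` and the keys `customer.tier` and `campaign` are made up for illustration. The sketch does not encode keys and ignores optional `;property` metadata.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Simplified baggage header encoding and decoding; keys are assumed to be plain tokens. */
final class BaggageSketch {

    /** Serializes entries as key1=value1,key2=value2 with percent-encoded values. */
    static String encode(Map<String, String> entries) {
        return entries.entrySet().stream()
                .map(e -> e.getKey() + "=" + percentEncode(e.getValue()))
                .collect(Collectors.joining(","));
    }

    /** Parses a baggage header back into a map, skipping malformed members. */
    static Map<String, String> decode(String header) {
        Map<String, String> entries = new LinkedHashMap<>();
        if (header == null || header.isBlank()) {
            return entries;
        }
        for (String member : header.split(",")) {
            // Drop optional ";property" metadata, which this sketch ignores.
            String pair = member.split(";", 2)[0].trim();
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue;
            }
            entries.put(pair.substring(0, eq).trim(), percentDecode(pair.substring(eq + 1).trim()));
        }
        return entries;
    }

    private static String percentEncode(String value) {
        // URLEncoder targets form encoding and emits '+' for spaces; use %20 instead.
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    private static String percentDecode(String value) {
        // Protect literal '+' so URLDecoder does not turn it into a space.
        return URLDecoder.decode(value.replace("+", "%2B"), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> entries = new LinkedHashMap<>();
        entries.put("customer.tier", "gold");
        entries.put("campaign", "spring sale");
        String header = encode(entries);
        System.out.println("baggage: " + header);
        System.out.println(decode(header));
    }
}
```

Baggage travels with every downstream request, so it is best kept small and free of sensitive data.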

Conclusion

OpenTelemetry is a game-changer for Java developers seeking to enhance application observability. As Bruce demonstrated, it offers three flexible approaches—manual instrumentation, framework support, and auto-instrumentation—catering to different needs and expertise levels. The JVM agent stands out for its ease of use and comprehensive coverage, making it an excellent starting point for teams new to OpenTelemetry.

To get started, add the OpenTelemetry Java agent to your application with a single command-line option and configure it via environment variables. This minimal setup allows you to immediately observe your application’s behavior and assess OpenTelemetry’s value for your team.

The code and slides from Bruce’s presentation are available on GitHub, providing a practical reference for implementing OpenTelemetry in your projects. Whether you’re monitoring microservices or monoliths, OpenTelemetry empowers you to gain deep insights into your applications’ performance and behavior.
