Recent Posts
Archives

Posts Tagged ‘Monitoring’

PostHeaderIcon Observability for Modern Systems: From Metrics to Traces

Good monitoring doesn’t just tell you when things are broken—it explains why.

1) White-Box vs Black-Box Monitoring

White-box: metrics from inside the system (CPU, memory, app metrics). Example: http_server_requests_seconds from Spring Actuator.

Black-box: synthetic probes simulating user behavior (ping APIs, load test flows). Example: periodic “buy flow” test in production.

2) Tracing Distributed Transactions

Use OpenTelemetry to propagate context across microservices:

// Spring Boot setup
implementation "io.opentelemetry:opentelemetry-exporter-otlp:1.30.0"

// Annotate spans
Span span = tracer.spanBuilder("checkout").startSpan();
try (Scope scope = span.makeCurrent()) {
    paymentService.charge(card);
    inventoryService.reserve(item);
} finally {
    span.end();
}

These traces flow into Jaeger or Grafana Tempo to visualize bottlenecks across services.

3) Example Dashboard for a High-Value Service

  • Availability: % successful requests (SLO vs actual).
  • Latency: p95/p99 end-to-end response times.
  • Error Rate: 4xx vs 5xx breakdown.
  • Dependency Health: DB latency, cache hit ratio, downstream service SLOs.
  • User metrics: active sessions, checkout success rate.

PostHeaderIcon [SpringIO2024] Revving Up the Good Old Samaritan: Spring Boot Admin by Jatin Makhija @ Spring I/O 2024

At Spring I/O 2024 in Barcelona, Jatin Makhija, an engineering leader at Deutsche Telekom Digital Labs, delivered an insightful presentation on leveraging Spring Boot Admin to enhance application monitoring and management. With a rich background in startups like Exigo and VWO, Jatin shared practical use cases and live demonstrations, illustrating how Spring Boot Admin empowers developers to streamline operations in complex, distributed systems. This talk, filled with actionable insights, highlighted the tool’s versatility in addressing real-world challenges, from log management to feature flag automation.

Empowering Log Management

Jatin began by addressing a universal pain point for developers: debugging production issues. He emphasized the critical role of logs in resolving incidents, noting that Spring Boot Admin allows engineers to dynamically adjust log levels—from info to trace—in seconds without redeploying applications. Through a live demo, Jatin showcased how to filter logs at the class level, enabling precise debugging. However, he cautioned about the costs of excessive logging, both in infrastructure and compliance with GDPR. By masking personally identifiable information (PII) and reverting log levels promptly, teams can maintain security and efficiency. This capability ensures rapid issue resolution while keeping customers satisfied, as Jatin illustrated with real-time log adjustments.

Streamlining Feature Flags

Feature flags are indispensable in modern applications, particularly in multi-tenant environments. Jatin explored how Spring Boot Admin simplifies their management, allowing teams to toggle features without redeploying. He presented two compelling use cases: a legacy discount system and a mobile exchange program. In the latter, Jatin demonstrated dynamically switching locales (e.g., from German to English) to adapt third-party integrations, ensuring seamless user experiences across regions. By refreshing application contexts on the fly, Spring Boot Admin reduces downtime and enhances testing coverage. Jatin’s approach empowers product owners to experiment confidently, minimizing technical debt and ensuring robust feature validation.

Automating Operations

Automation is a cornerstone of efficient development, and Jatin showcased how Spring Boot Admin’s REST APIs can be harnessed to automate testing workflows. By integrating with CI/CD pipelines like Jenkins and test frameworks such as Selenium, teams can dynamically patch configurations and validate multi-tenant setups. A recorded demo illustrated an automated test toggling a mobile exchange feature, highlighting increased test coverage and early defect detection. Jatin emphasized that this automation reduces manual effort, boosts regression testing accuracy, and enables scalable deployments, allowing teams to ship with confidence.

Scaling Monitoring and Diagnostics

Monitoring distributed systems is complex, but Spring Boot Admin simplifies it with built-in metrics and diagnostics. Jatin demonstrated accessing health statuses, thread dumps, and heap dumps through the tool’s intuitive interface. He shared a story of debugging a Kubernetes pod misconfiguration, where Spring Boot Admin revealed discrepancies in CPU allocation, preventing application instability. By integrating the Git Commit Plugin, teams can track deployment details like commit IDs and timestamps, enhancing traceability in microservices. Jatin also addressed scalability, showcasing a deployment managing 374 instances across 24 applications, proving Spring Boot Admin’s robustness in large-scale environments.

Links:

PostHeaderIcon [SpringIO2019] Cloud Native Spring Boot Admin by Johannes Edmeier

At Spring I/O 2019 in Barcelona, Johannes Edmeier, a seasoned developer from Germany, captivated attendees with his deep dive into managing Spring Boot applications in Kubernetes environments using Spring Boot Admin. As the maintainer of this open-source project, Johannes shared practical insights into integrating Spring Boot Admin with Kubernetes via the Spring Cloud Kubernetes project. His session illuminated how developers can gain operational visibility and control without altering application code, making it a must-know tool for cloud-native ecosystems. This post explores Johannes’ approach, highlighting its relevance for modern DevOps.

Understanding Spring Boot Admin

Spring Boot Admin, a four-and-a-half-year-old project boasting over 17,000 GitHub stars, is an Apache-licensed tool designed to monitor and manage Spring Boot applications. Johannes, employed by ConSol, a German consultancy, dedicates 20% of his work time—and significant personal hours—to its development. The tool provides a user-friendly interface to visualize metrics, logs, and runtime configurations, addressing the limitations of basic monitoring solutions like plain metrics or logs. For Kubernetes-deployed applications, it leverages Spring Boot Actuator endpoints to deliver comprehensive insights without requiring code changes or new container images.

The challenge in cloud-native environments lies in achieving visibility into distributed systems. Johannes emphasized that Kubernetes, a common denominator across cloud vendors, demands robust monitoring tools. Spring Boot Admin meets this need by integrating with Spring Cloud Kubernetes, enabling service discovery and dynamic updates as services scale or fail. This synergy ensures developers can manage applications seamlessly, even in complex, dynamic clusters.

Setting Up Spring Boot Admin on Kubernetes

Configuring Spring Boot Admin for Kubernetes is straightforward, as Johannes demonstrated. Developers start by including the Spring Boot Admin starter server dependency, which bundles the UI and REST endpoints, and the Spring Cloud Kubernetes starter for service discovery. These dependencies, managed via Spring Cloud BOM, simplify setup. Johannes highlighted the importance of enabling the admin server, discovery client, and scheduling annotations in the application class to ensure health checks and service updates function correctly. A common pitfall, recently addressed in the documentation, is forgetting to enable scheduling, which prevents dynamic service updates.

For Kubernetes deployment, Johannes pre-built a Docker image and configured a service account with role-based access control (RBAC) to read pod, service, and endpoint data. This minimal RBAC setup avoids unnecessary permissions, enhancing security. An ingress and service complete the deployment, allowing access to the Spring Boot Admin UI. Johannes showcased a wallboard view, ideal for team dashboards, and demonstrated real-time monitoring by simulating a service failure, which triggered a yellow “restricted” status and subsequent recovery as Kubernetes rescheduled the pod.

Enhancing Monitoring with Actuator Endpoints

Spring Boot Admin’s power lies in its integration with Spring Boot Actuator, which exposes endpoints like health, info, metrics, and more. By default, only health and info endpoints are exposed, but Johannes showed how to expose all endpoints using a Kubernetes environment variable (management.endpoints.web.exposure.include=*). This unlocks detailed views for metrics, environment properties, beans, and scheduled tasks. For instance, the health endpoint provides granular details when set to “always” show details, revealing custom health indicators like database connectivity.

Johannes also highlighted advanced features, such as rendering Swagger UI links via the info endpoint’s properties, simplifying access to API documentation. For security, he recommended isolating Actuator endpoints on a separate management port (e.g., 9080) to prevent public exposure via the main ingress. Spring Cloud Kubernetes facilitates this by allowing developers to specify the management port for discovery, ensuring Spring Boot Admin accesses Actuator endpoints securely while keeping them hidden from external traffic.

Customization and Security Considerations

Spring Boot Admin excels in customization, catering to specific monitoring needs. Johannes demonstrated how to add top-level links to external tools like Grafana or Kibana, or embed them as iframes, reducing the need to memorize URLs. For advanced use cases, developers can create custom views using Vue.js, as Johannes did to toggle application status (e.g., setting a service to “out of service”). This flexibility extends to notifications, supporting Slack, Microsoft Teams, and email via simple configurations, with a test SMTP server like MailHog for demos.

Security is a critical concern, as Spring Boot Admin proxies requests to Actuator endpoints. Johannes cautioned against exposing the admin server publicly, citing an unsecured instance found via Google. He outlined three security approaches: no authentication (not recommended), session-based authentication with cookies, or OAuth2 with token forwarding, where the target application validates access. A service account handles background health checks, ensuring minimal permissions. For Keycloak integration, Johannes referenced a blog post by his colleague Tomas, showcasing Spring Boot Admin’s compatibility with modern security frameworks.

Runtime Management and Future Enhancements

Spring Boot Admin empowers runtime management, a standout feature Johannes showcased. The loggers endpoint allows dynamic adjustment of logging levels, with a forthcoming feature to set levels across all instances simultaneously. Other endpoints, like Jolokia for JMX interaction, enable runtime reconfiguration but require caution due to their power. Heap and thread dump endpoints aid debugging but risk exposing sensitive data or overwhelming resources. Johannes also previewed upcoming features, like minimum instance checks, enhancing Spring Boot Admin’s robustness in production.

For Johannes, Spring Boot Admin is more than a monitoring tool—it’s a platform for operational excellence. By integrating seamlessly with Kubernetes and Spring Boot Actuator, it addresses the complexities of cloud-native applications, empowering developers to focus on delivering value. His session at Spring I/O 2019 underscores its indispensable role in modern software ecosystems.

Links:

PostHeaderIcon [ScalaDaysNewYork2016] Monitoring Reactive Applications: New Approaches for a New Paradigm

Reactive applications, built on event-driven and asynchronous foundations, require innovative monitoring strategies. At Scala Days New York 2016, Duncan DeVore and Henrik Engström, both from Lightbend, explored the challenges and solutions for monitoring such systems. They discussed how traditional monitoring falls short for reactive architectures and introduced Lightbend’s approach to addressing these challenges, emphasizing adaptability and precision in observing distributed systems.

The Shift from Traditional Monitoring

Duncan and Henrik began by outlining the limitations of traditional monitoring, which relies on stack traces in synchronous systems to diagnose issues. In reactive applications, built with frameworks like Akka and Play, the asynchronous, message-driven nature disrupts this model. Stack traces lose relevance, as actors communicate without a direct call stack. The speakers categorized monitoring into business process, functional, and technical types, highlighting the need to track metrics like actor counts, message flows, and system performance in distributed environments.

The Impact of Distributed Systems

The rise of the internet and cloud computing has transformed system design, as Duncan explained. Distributed computing, pioneered by initiatives like ARPANET, and the economic advantages of cloud platforms have enabled businesses to scale rapidly. However, this shift introduces complexities, such as network partitions and variable workloads, necessitating new monitoring approaches. Henrik noted that reactive systems, designed for scalability and resilience, require tools that can handle dynamic data flows and provide insights into system behavior without relying on traditional metrics.

Challenges in Monitoring Reactive Systems

Henrik detailed the difficulties of monitoring asynchronous systems, where data flows through push or pull models. In push-based systems, monitoring tools must handle high data volumes, risking overload, while pull-based systems allow selective querying for efficiency. The speakers emphasized anomaly detection over static thresholds, as thresholds are hard to calibrate and may miss nuanced issues. Anomaly detection, exemplified by tools like Prometheus, identifies unusual patterns by correlating metrics, reducing false alerts and enhancing system understanding.

Lightbend’s Monitoring Solution

Duncan and Henrik introduced Lightbend Monitoring, a subscription-based tool tailored for reactive applications. It integrates with Akka actors and Lagom circuit breakers, generating metrics and traces for backends like StatsD and Telegraf. The solution supports pull-based monitoring, allowing selective data collection to manage high data volumes. Future enhancements include support for distributed tracing, Prometheus integration, and improved Lagom compatibility, aiming to provide a comprehensive view of system health and performance.

Links: