[DevoxxGR2024] The Architect Elevator: Mid-Day Keynote by Gregor Hohpe

In his mid-day keynote at Devoxx Greece 2024, Gregor Hohpe, reflecting on two decades as an architect, presented the Architect Elevator—a metaphor for architects connecting organizational layers from the “engine room” to the “penthouse.” Rejecting the notion that architects are the smartest decision-makers, Gregor argued they amplify collective intelligence by sharing models, revealing blind spots, and fostering better decisions. Using metaphors, sketches, and multi-dimensional thinking, architects bridge technical and business strategies, ensuring alignment in complex, fast-changing environments.

Redefining the Architect’s Role

Gregor emphasized that being an architect is a mindset, not a title. Architects don’t make all decisions but boost the team’s IQ through seven maneuvers: connecting organizational layers, using metaphors, drawing abstract sketches, expanding solution spaces, trading options, zooming in/out, and embracing non-binary thinking. The value lies in spanning multiple levels—executive strategy to hands-on engineering—rather than sitting in an ivory tower or engine room alone.

The Architect Elevator Metaphor

Organizations are layered like skyscrapers, with management at the top and developers below, often isolated by middle management. This “loosely coupled” structure creates illusions of success upstairs and unchecked freedom downstairs, misaligning strategy and execution. Architects ride the elevator to connect these layers, ensuring technical decisions support business goals. For example, a strategy to enter new markets requires automated, cloud-based systems for replication, while product diversification demands robust integration.

Connecting Levels with Metaphors and Sketches

Gregor advocated using metaphors to invite stakeholders into technical discussions, avoiding jargon that alienates smart executives. For instance, explaining automation’s role in security and cost-efficiency aligns engine-room work with C-suite priorities. Sketches, like Frank Gehry’s architectural drawings, should capture mental models, not blueprints, abstracting complexity to focus on purpose and constraints. These foster shared understanding across layers.

Multi-Dimensional Thinking

Architects expand solution spaces by adding dimensions to debates. For example, speed vs. quality arguments are resolved by automation and shift-left testing. Similarly, cloud lock-in concerns are reframed by balancing switching costs against benefits like scalability. Gregor’s experience at an insurance company showed standardization (harmonization) enables innovation by locking down protocols while allowing diverse languages, trading one option for another. The Black-Scholes formula illustrates that options (e.g., scalability) are more valuable in uncertain environments, justifying architecture’s role.

Zooming In and Out

Zooming out reveals system characteristics, like layering’s trade-offs (clean dependencies vs. latency) or resilience in loosely coupled designs. Local optimization, as in pre-DevOps silos, often fails globally. Architects optimize globally, aligning teams via feedback cycles and value stream mapping. Zooming also applies to models: different abstractions (e.g., topographical vs. political maps) answer different questions, requiring architects to tailor models to stakeholders’ needs.

Architecture and Agility in Uncertainty

Gregor highlighted that architecture and agility thrive in uncertainty, providing options (e.g., scalability) and flexibility. Using a car metaphor, agility is the steering wheel, and architecture the engine—both are essential. Architects avoid binary thinking (e.g., “all containers”), embracing trade-offs in a multi-dimensional solution space to align with business needs.

Practical Takeaways

  • Connect Layers: Bridge technical and business strategy with clear communication.
  • Use Metaphors and Sketches: Simplify concepts to engage stakeholders.
  • Think Multi-Dimensionally: Reframe problems to expand solutions.
  • Zoom In/Out: Optimize globally, tailoring abstractions to questions.
  • Embrace Uncertainty: Leverage architecture and agility to create valuable options.

Hashtags: #SocioTechnical #GregorHohpe #DevoxxGR2024 #ArchitectureMindset #PlatformStrategy

[DevoxxGR2024] Socio-Technical Smells: How Technical Problems Cause Organizational Friction by Adam Tornhill

At Devoxx Greece 2024, Adam Tornhill delivered a compelling session on socio-technical smells, emphasizing how technical issues in codebases create organizational friction. Using behavioral code analysis, which combines code metrics with team interaction data, Adam demonstrated how to identify and mitigate five common challenges: architectural coordination bottlenecks, implicit team dependencies, knowledge risks, scaling issues tied to Brooks’s Law, and the impact of bad code on morale and attrition. Through real-world examples from codebases like Facebook’s Folly, Hibernate, ASP.NET Core, and Telegram for Android, he showcased practical techniques to align technical and organizational design, reducing waste and improving team efficiency.

Overcrowded Systems and Brooks’s Law

Adam introduced the concept of overcrowded systems with a story from his past, where a product company’s subsystem, developed by 25 people over two years, faced critical deadlines. After analysis, Adam’s team recommended scrapping the code and rewriting it with just five developers, delivering in two and a half months instead of three. This success highlighted Brooks’s Law (from The Mythical Man-Month, 1975), which states that adding people to a late project increases coordination overhead, delaying delivery. A visualization showed that beyond a certain team size, communication costs outweigh productivity gains. Solutions include shrinking teams to match work modularity or redesigning systems for higher modularity to support parallel work.

Coordination Bottlenecks in Code

Using behavioral code analysis on git logs, Adam identified coordination bottlenecks where multiple developers edit the same files. Visualizations of Facebook’s Folly C++ library revealed a file modified by 58 developers in a year, indicating a “god class” with low cohesion. Code smells like complex if-statements, lengthy comments, and nested logic confirmed this. Similarly, Hibernate’s AbstractEntityPersister class, with over 5,000 lines and 380 methods, showed poor cohesion. By extracting methods into cohesive classes (e.g., lifecycle or proxy), developers can reduce coordination needs, creating natural team boundaries.

Implicit Dependencies and Change Coupling

Adam explored inter-module dependencies using change coupling, a technique that analyzes git commit patterns to find files that co-evolve, revealing logical dependencies not visible in static code. In ASP.NET Core, integration tests showed high cohesion within a package, but an end-to-end Razor Page test coupled with four packages indicated low cohesion and high change costs. In Telegram for Android, a god class (ChatActivity) was a change coupling hub, requiring modifications for nearly every feature. Adam recommended aligning architecture with the problem domain to minimize cross-team dependencies and avoid “shotgun surgery,” where changes scatter across multiple services.

Knowledge Risks and Truck Factor

Adam discussed knowledge risks using the truck factor—the number of developers who can leave before a codebase becomes unmaintainable. In React, with 1,500 contributors, the truck factor is two, meaning 50% of knowledge is lost if two key developers leave. Vue.js has a truck factor of one, risking 70% knowledge loss. Visualizations highlighted files with low truck factors, poor code health, and high activity as onboarding risks. Adam advised prioritizing refactoring of such code to reduce key-person dependencies and ease onboarding, as unfamiliarity often masquerades as complexity.

Bad Code’s Organizational Impact

A study showed that changes to “red” (low-quality) code take up to 10 times longer than to “green” (high-quality) code, with unfamiliar developers needing 50% more time for small tasks and double for larger ones. A story about a German team perceiving an inherited codebase as a “mess” revealed that its issues stemmed from poor onboarding, not technical debt. Adam emphasized addressing root causes—training and onboarding—over premature refactoring. Bad code also lowers morale, increases attrition, and amplifies organizational problems, making socio-technical alignment critical.

Practical Takeaways

Adam’s techniques, supported by tools like CodeScene and research in his book Your Code as a Crime Scene, offer actionable insights:
  • Use Behavioral Code Analysis: Leverage git logs to detect coordination bottlenecks and change coupling.
  • Increase Cohesion: Refactor god classes and align architecture with domains to reduce team dependencies.
  • Mitigate Knowledge Risks: Prioritize refactoring high-risk code with low truck factors to ease onboarding.
  • Address Root Causes: Invest in onboarding to avoid mistaking unfamiliarity for complexity.
  • Visualize Patterns: Use tools to highlight socio-technical smells, enabling data-driven decisions.

Understanding Dependency Management and Resolution: A Look at Java, Python, and Node.js

Mastering how dependencies are handled can define your project’s success or failure. Let’s explore the nuances across today’s major development ecosystems.

Introduction

Every modern application relies heavily on external libraries. These libraries accelerate development, improve security, and enable integration with third-party services. However, unmanaged dependencies can lead to catastrophic issues — from version conflicts to severe security vulnerabilities. That’s why understanding dependency management and resolution is absolutely essential, particularly across different programming ecosystems.

What is Dependency Management?

Dependency management involves declaring external components your project needs, installing them properly, ensuring their correct versions, and resolving conflicts when multiple components depend on different versions of the same library. It also includes updating libraries responsibly and securely over time. In short, good dependency management prevents issues like broken builds, “dependency hell”, or serious security holes.

Java: Maven and Gradle

In the Java ecosystem, dependency management is an integrated and structured part of the build lifecycle, using tools like Maven and Gradle.

Maven and Dependency Scopes

Maven uses a declarative pom.xml file to list dependencies. A particularly important notion in Maven is the dependency scope.

Scopes control where and how dependencies are used. Examples include:

  • compile (default): Needed at both compile time and runtime.
  • provided: Needed for compile, but provided at runtime by the environment (e.g., Servlet API in a container).
  • runtime: Needed only at runtime, not at compile time.
  • test: Used exclusively for testing (JUnit, Mockito, etc.).
  • system: Like provided, but the JAR is supplied explicitly via a local systemPath (deprecated practice).

<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.13.2</version>
  <scope>test</scope>
</dependency>
    

This nuanced control lets Java developers avoid bloating production artifacts with unnecessary libraries and fine-tune build behavior, a level of granularity largely missing from simpler systems like pip or npm.

Gradle

Gradle, which offers both Groovy and Kotlin DSLs, supports the same idea through configurations such as implementation, runtimeOnly, and testImplementation. These map roughly to Maven scopes but offer even finer control.


dependencies {
    implementation 'org.springframework.boot:spring-boot-starter'
    testImplementation 'org.springframework.boot:spring-boot-starter-test'
}
    

Python: pip and Poetry

Python dependency management is simpler but less structured than Java’s. With pip, there is no formal concept of scopes.

pip

Developers typically separate main dependencies and development dependencies manually using different files:

  • requirements.txt – Main project dependencies.
  • requirements-dev.txt – Development and test dependencies (pytest, tox, etc.).

This manual split is prone to human error and lacks the rigorous environment control that Maven or Gradle enforce.

Poetry

Poetry improves the situation by introducing a structured division:


[tool.poetry.dependencies]
requests = "^2.31"

[tool.poetry.dev-dependencies]
pytest = "^7.1"
    

Poetry brings concepts closer to Maven scopes, but they are still less fine-grained (no runtime/compile distinction, for instance).

Node.js: npm and Yarn

JavaScript dependency managers like npm and yarn allow a simple distinction between regular and development dependencies.

npm

Dependencies are declared in package.json under different sections:

  • dependencies – Needed in production.
  • devDependencies – Needed only for development (e.g., testing libraries, linters).

{
  "dependencies": {
    "express": "^4.18.2"
  },
  "devDependencies": {
    "mocha": "^10.2.0"
  }
}
    

While convenient, npm’s dependency management lacks Maven’s level of strictness around dependency resolution, often leading to version mismatches or “node_modules bloat.”

Key Differences Between Ecosystems

When switching between Java, Python, and Node.js environments, developers must be aware of the following fundamental differences:

1. Formality of Scopes

Java’s Maven/Gradle ecosystem defines scopes formally at the dependency level. Python (pip) and JavaScript (npm) ecosystems use looser, file- or section-based categorization.

2. Handling of Transitive Dependencies

Maven and Gradle resolve and include transitive dependencies automatically, with well-defined conflict resolution strategies (e.g., Maven’s “nearest version wins”). pip historically had weak transitive dependency handling, leading to issues unless versions were carefully pinned, although the resolver introduced in pip 20.3 has improved matters. npm has flattened node_modules since version 3, yet conflicts and duplicated packages still occur in complex trees.

3. Lockfiles

npm/yarn and Python Poetry use lockfiles (package-lock.json, yarn.lock, poetry.lock) to ensure consistent dependency installations across machines. Maven and Gradle historically did not need lockfiles because they strictly followed declared versions and scopes. However, Gradle introduced lockfile support with dependency locking in newer versions.
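
In Gradle, dependency locking is a core feature; a minimal sketch (using the Groovy DSL seen earlier) looks like this:

dependencyLocking {
    lockAllConfigurations()
}

Running gradle dependencies --write-locks then generates a gradle.lockfile per project, which can be committed to version control just like package-lock.json or poetry.lock.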

4. Dependency Updating Strategy

Java developers often manage dependency versions manually inside pom.xml or use dependencyManagement blocks for centralized control. pip requires updating requirements.txt by hand or regenerating it via pip freeze. npm and Yarn allow semver ranges (“^”, “~”), but automatic updates can lead to subtle breakages if not monitored carefully.
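
As an illustration of such centralized control, a dependencyManagement block in a parent pom.xml might look like the following sketch (the artifact and version are purely illustrative):

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>2.17.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>

Modules that declare jackson-databind without a version then inherit 2.17.0, and transitive occurrences are pinned to the same version, taming the “nearest version wins” behavior described above.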

Best Practices Across All Languages

  • Pin exact versions wherever possible to avoid surprise updates.
  • Use lockfiles and commit them to version control (Git).
  • Separate production and development/test dependencies explicitly.
  • Use dependency scanners (e.g., OWASP Dependency-Check, Snyk, npm audit) regularly to detect vulnerabilities.
  • Prefer stable, maintained libraries with good community support and recent commits.

Conclusion

Dependency management, while often overlooked early in projects, becomes critical as applications scale. Maven and Gradle offer the most fine-grained controls via dependency scopes and conflict resolution. The Python and JavaScript ecosystems are evolving rapidly but still require developers to manage versions much more carefully by hand. Understanding these differences, and applying best practices accordingly, will ensure smoother builds, faster delivery, and safer production systems.

Interested in deeper dives into dependency vulnerability scanning, SBOM generation, or automatic dependency update pipelines? Subscribe to our blog for more in-depth content!

[Devoxx FR 2024] Mastering Reproducible Builds with Apache Maven: Insights from Hervé Boutemy


Introduction

In a recent presentation, Hervé Boutemy, a veteran Maven maintainer, Apache Software Foundation member, and Solution Architect at Sonatype, delivered a compelling talk on reproducible builds with Apache Maven. With over 20 years of experience in Java, CI/CD, DevOps, and software supply chain security, Hervé shared his five-year journey to make Maven builds reproducible, a critical practice for achieving the highest level of trust in software, as defined by SLSA Level 4. This post dives into the key concepts, practical steps, and surprising benefits of reproducible builds, based on Hervé’s insights and hands-on demonstrations.

What Are Reproducible Builds?

Reproducible builds ensure that compiling the same source code, with the same environment and build tools, produces identical binaries, byte-for-byte. This practice verifies that the distributed binary matches the source code, eliminating risks like malicious tampering or unintended changes. Hervé highlighted the infamous XZ incident, where discrepancies between source tarballs and Git repositories went unnoticed—reproducible builds could have caught this by ensuring the binary matched the expected source.

Originally pioneered by Linux distributions like Debian in 2013, reproducible builds have gained traction in the Java ecosystem. Hervé’s work has led to over 2,000 verified reproducible releases from 500+ open-source projects on Maven Central, with stats growing weekly.

Why Reproducible Builds Matter

Reproducible builds are primarily about security. They allow anyone to rebuild a project and confirm that the binary hasn’t been compromised (e.g., no backdoors or, as Hervé humorously put it, “foireux” — roughly, “dodgy” — additions). But Hervé’s five-year experience revealed additional benefits:

  • Build Validation: Ensure patches or modifications don’t introduce unintended changes. A “build successful” message doesn’t guarantee the binary is correct—reproducible builds do.
  • Data Leak Prevention: Hervé found sensitive data (e.g., usernames, machine names, even a PGP passphrase!) embedded in Maven Central artifacts, exposing personal or organizational details.
  • Enterprise Trust: When outsourcing development, reproducible builds verify that a vendor’s binary matches the provided source, saving time and reducing risk.
  • Build Efficiency: Reproducible builds enable caching optimizations, improving build performance.

These benefits extend beyond security, making reproducible builds a powerful tool for developers, enterprises, and open-source communities.

Implementing Reproducible Builds with Maven

Hervé outlined a practical workflow to achieve reproducible builds, demonstrated through his open-source project, reproducible-central, which includes scripts and rebuild recipes for 3,500+ compilations across 627+ projects. Here’s how to make your Maven builds reproducible:

Step 1: Rebuild and Verify

Start by rebuilding a project from its source (e.g., a Git repository tag) and comparing the output binary to a reference (e.g., Maven Central or an internal repository). Hervé’s rebuild.sh script automates this:

  • Specify the Environment: Define the JDK (e.g., JDK 8 or 17), OS (Windows, Linux, FreeBSD), and Maven command (e.g., mvn clean verify -DskipTests).
  • Use Docker: The script creates a Docker image with the exact environment (JDK, OS, Maven version) to ensure consistency.
  • Compare Binaries: The script downloads the reference binary and checks if the rebuilt binary matches, reporting success or failure.

Hervé demonstrated this with the Maven Javadoc Plugin (version 3.5.0), showing a 100% reproducible build when the environment matched the original (e.g., JDK 8 on Windows).

Step 2: Diagnose Differences

If the binaries don’t match, use diffoscope, a tool from the Linux reproducible builds community, to analyze differences. Diffoscope compares archives (e.g., JARs), nested archives, and even disassembles bytecode to pinpoint issues like:

  • Timestamps: JARs include file timestamps, which vary by build time.
  • File Order: ZIP-based JARs don’t guarantee consistent file ordering.
  • Bytecode Variations: Different JDK major versions produce different bytecode, even for the same target (e.g., targeting Java 8 with JDK 17 vs. JDK 8).
  • Permissions: File permissions (e.g., group write access) differ across environments.

Hervé showed a case where a build failed due to a JDK mismatch (JDK 11 vs. JDK 8), which diffoscope revealed through bytecode differences.

Step 3: Configure Maven for Reproducibility

To make builds reproducible, address common sources of “noise” in Maven projects:

  • Fix Timestamps: Set a consistent timestamp using the project.build.outputTimestamp property, managed by the Maven Release or Versions plugins. This ensures JARs have identical timestamps across builds.
  • Upgrade Plugins: Many Maven plugins historically introduced variability (e.g., random timestamps or environment-specific data). Hervé contributed fixes to numerous plugins, and his artifact:check-buildplan goal identifies outdated plugins, suggesting upgrades to reproducible versions.
  • Avoid Non-Reproducible Outputs: Skip Javadoc generation (highly variable) and GPG signing (non-reproducible by design) during verification.

For example, Hervé explained that configuring project.build.outputTimestamp and upgrading plugins eliminated timestamp and file-order issues in JARs, making builds reproducible.
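
Concretely, the property is declared once in the pom.xml; the timestamp value below is only an example, and release tooling can keep it up to date automatically:

<properties>
  <project.build.outputTimestamp>2024-04-17T10:00:00Z</project.build.outputTimestamp>
</properties>

With this property set, the archive-producing plugins (JAR, WAR, etc.) stamp entries with the fixed timestamp instead of the current build time.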

Step 4: Test Locally

Before scaling, test reproducibility locally using mvn verify (not install, which pollutes the local repository). The artifact:compare goal compares your build output to a reference binary (e.g., from Maven Central or an internal repository). For internal projects, specify your repository URL as a parameter.

To test without a remote repository, build twice locally: run mvn install for the first build, then mvn verify for the second, comparing the results. This catches issues like unfixed dates or environment-specific data.

Step 5: Scale and Report

For large-scale verification, adapt Hervé’s reproducible-central scripts to your internal repository. These scripts generate reports with group IDs, artifact IDs, and reproducibility scores, helping track progress across releases. Hervé’s stats (e.g., 100% reproducibility for some projects, partial for others) provide a model for enterprise reporting.

Challenges and Lessons Learned

Hervé shared several challenges and insights from his journey:

  • JDK Variability: Bytecode differs across major JDK versions, even for the same target. Always match the original JDK major version (e.g., JDK 8 for a Java 8 target).
  • Environment Differences: Windows vs. Linux line endings (CRLF vs. LF) or file permissions (e.g., group write access) can break reproducibility. Docker ensures consistent environments.
  • Plugin Issues: Older plugins introduced variability, but Hervé’s contributions have made modern versions reproducible.
  • Unexpected Findings: Reproducible builds uncovered sensitive data in Maven Central artifacts, highlighting the need for careful build hygiene.

One surprising lesson came from file permissions: Hervé discovered that newer Linux distributions default to non-writable group permissions, unlike older ones, requiring adjustments to build recipes.

Interactive Learning: The Quiz

Hervé ended with a fun quiz to test the audience’s understanding, presenting rebuild results and asking, “Reproducible or not?” Examples included:

  • Case 1: A Maven Javadoc Plugin 3.5.0 build matched the reference perfectly (reproducible).
  • Case 2: A build showed bytecode differences due to a JDK mismatch (JDK 11 vs. JDK 8, not reproducible).
  • Case 3: A build differed only in file permissions (group write access), fixable by adjusting the environment (reproducible with a corrected recipe).

The quiz reinforced a key point: reproducibility requires precise environment matching, but tools like diffoscope make debugging straightforward.

Getting Started

Ready to make your Maven builds reproducible? Follow these steps:

  1. Clone reproducible-central and explore Hervé’s scripts and stats.
  2. Run mvn artifact:check-buildplan to identify and upgrade non-reproducible plugins.
  3. Set project.build.outputTimestamp in your POM file to fix JAR timestamps.
  4. Test locally with mvn verify and artifact:compare, specifying your repository if needed.
  5. Scale up using rebuild.sh and Docker for consistent environments, adapting to your internal repository.

Hervé encourages feedback to improve his tools, so if you hit issues, reach out via the project’s GitHub or Apache’s community channels.

Conclusion

Reproducible builds with Maven are not only achievable but transformative, offering security, trust, and operational benefits. Hervé Boutemy’s work demystifies the process, providing tools, scripts, and a clear roadmap to success. From preventing backdoors to catching configuration errors and sensitive data leaks, reproducible builds are a must-have for modern Java development.

Start small with artifact:check-buildplan, test locally, and scale with reproducible-central. As Hervé’s 3,500+ rebuilds show, the Java community is well on its way to making reproducibility the norm. Join the movement, and let’s build software we can trust!

[Devoxx FR 2024] Instrumenting Java Applications with OpenTelemetry: A Comprehensive Guide


Introduction

In a recent presentation at a Paris JUG event, Bruce Bujon, an R&D Engineer at Datadog and an open-source developer, delivered an insightful talk on instrumenting Java applications with OpenTelemetry. This powerful observability framework is transforming how developers monitor and analyze application performance, infrastructure, and security. In this detailed post, we’ll explore the key concepts from Bruce’s presentation, breaking down OpenTelemetry, its components, and practical steps to implement it in Java applications.

What is OpenTelemetry?

OpenTelemetry is an open-source observability framework designed to collect, process, and export telemetry data in a vendor-agnostic manner. It captures data from various sources—such as virtual machines, databases, and applications—and exports it to observability backends for analysis. Importantly, OpenTelemetry focuses solely on data collection and management, leaving visualization and analysis to backend tools like Datadog, Jaeger, or Grafana.

The framework supports three primary signals:

  • Traces: These map the journey of requests through an application, highlighting the time taken by each component or microservice.
  • Logs: Timestamped events, such as user actions or system errors, familiar to most developers.
  • Metrics: Aggregated numerical data, like request rates, error counts, or CPU usage over time.

In his talk, Bruce focused on traces, which are particularly valuable for understanding performance bottlenecks in distributed systems.

Why Use OpenTelemetry for Java Applications?

For Java developers, OpenTelemetry offers a standardized way to instrument applications, ensuring compatibility with various observability backends. Its flexibility allows developers to collect telemetry data without being tied to a specific tool, making it ideal for diverse tech stacks. Bruce highlighted its growing adoption, noting that OpenTelemetry is the second most active project in the Cloud Native Computing Foundation (CNCF), behind only Kubernetes.

Instrumenting a Java Application: A Step-by-Step Guide

Bruce demonstrated three approaches to instrumenting Java applications with OpenTelemetry, using a simple example of two web services: an “Order” service and a “Storage” service. The goal was to trace a request from the Order service, which calls the Storage service to check stock levels for items like hats, bags, and socks.

Approach 1: Manual Instrumentation with OpenTelemetry API and SDK

The first approach involves manually instrumenting the application using the OpenTelemetry API and SDK. This method offers maximum control but requires significant development effort.

Steps:

  1. Add Dependencies: Include the OpenTelemetry Bill of Materials (BOM) to manage library versions, along with the API, SDK, OTLP exporter, and semantic conventions.
  2. Initialize the SDK: Set up a TracerProvider with a resource defining the service (e.g., “storage”) and attributes like service name and deployment environment.
  3. Create a Tracer: Use the Tracer to generate spans for specific operations, such as a web route or internal method.
  4. Instrument Routes: For each route or method, create a span using a SpanBuilder, set attributes (e.g., span kind as “server”), and mark the start and end of the span.
  5. Export Data: Configure the SDK to export spans to an OpenTelemetry Collector via the OTLP protocol.

Example Output: Bruce showed a trace with two spans—one for the route and one for an internal method—displayed in Datadog’s APM view, with attributes like service name and HTTP method.

Pros: Fine-grained control over instrumentation.

Cons: Verbose and time-consuming, especially for large applications or libraries with private APIs.
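
To make the manual approach concrete, here is a minimal, self-contained sketch of steps 2–5 with the OpenTelemetry API and SDK; the service name, span name, and attribute are illustrative rather than taken from Bruce’s demo code:

import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.resources.Resource;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;

public class StorageTracing {

    // Step 2: initialize the SDK with a resource that names the service
    private static final OpenTelemetry OTEL = OpenTelemetrySdk.builder()
            .setTracerProvider(SdkTracerProvider.builder()
                    .setResource(Resource.getDefault().merge(Resource.create(
                            Attributes.of(AttributeKey.stringKey("service.name"), "storage"))))
                    // Step 5: export spans to an OpenTelemetry Collector over OTLP (gRPC)
                    .addSpanProcessor(BatchSpanProcessor.builder(
                            OtlpGrpcSpanExporter.builder().build()).build())
                    .build())
            .buildAndRegisterGlobal();

    // Step 3: obtain a tracer for this instrumentation scope
    private static final Tracer TRACER = OTEL.getTracer("com.example.storage");

    // Step 4: wrap a route handler in a span
    int checkStock(String item) {
        Span span = TRACER.spanBuilder("GET /stock")
                .setSpanKind(SpanKind.SERVER)
                .setAttribute("app.item.name", item)
                .startSpan();
        try (Scope ignored = span.makeCurrent()) {
            return 42; // the real stock lookup would go here
        } finally {
            span.end();
        }
    }
}

The BOM from step 1 keeps the opentelemetry-api, opentelemetry-sdk, and opentelemetry-exporter-otlp artifacts on consistent versions.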

Approach 2: Framework Support with Spring Boot

The second approach leverages framework-specific integrations, such as Spring Boot’s OpenTelemetry starter, to automate instrumentation.

Steps:

  1. Add Spring Boot Starter: Include the OpenTelemetry starter, which bundles the API, SDK, exporter, and autoconfigure dependencies.
  2. Configure Environment Variables: Set variables for the service name, OTLP endpoint, and other settings.
  3. Run the Application: The starter automatically instruments web routes, capturing HTTP methods, routes, and response codes.

Example Output: Bruce demonstrated a trace for the Order service, with spans automatically generated for routes and tagged with HTTP metadata.

Pros: Minimal code changes and good generic instrumentation.

Cons: Limited customization and varying support across frameworks (e.g., the Spring Boot starter does not instrument JDBC out of the box).

Approach 3: Auto-Instrumentation with JVM Agent

The third and most powerful approach uses the OpenTelemetry JVM agent for automatic instrumentation, requiring minimal code changes.

Steps:

  1. Add the JVM Agent: Attach the OpenTelemetry Java agent to the JVM using a command-line option (e.g., -javaagent:opentelemetry-javaagent.jar).
  2. Configure Environment Variables: Use autoconfigure variables (around 80 options) to customize the agent’s behavior.
  3. Remove Manual Instrumentation: Eliminate SDK, exporter, and framework dependencies, keeping only the API and semantic conventions for custom instrumentation.
  4. Run the Application: The agent instruments web servers, clients, and libraries (e.g., JDBC, Kafka) at runtime.

Example Output: Bruce showcased a complete distributed trace, including spans for both services, web clients, and servers, with context propagation handled automatically.

Pros: Comprehensive instrumentation with minimal effort, supporting over 100 libraries.

Cons: Potential conflicts with other JVM agents (e.g., security tools) and limited support for native images (e.g., Quarkus).

Context Propagation: Linking Traces Across Services

A critical aspect of distributed tracing is context propagation, ensuring that spans from different services are linked within a single trace. Bruce explained that without propagation, the Order and Storage services generated separate traces.

To address this, OpenTelemetry uses HTTP headers (e.g., W3C’s traceparent and tracestate) to carry tracing context. In the manual approach, Bruce implemented a RestTemplate interceptor in Spring to inject headers and a Quarkus filter to extract them. The JVM agent, however, handles this automatically, simplifying the process.
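
As a rough sketch of what the manual RestTemplate interceptor can look like on the Order side (the class itself is an assumption, not Bruce’s exact code), the W3C headers are injected through OpenTelemetry’s propagation API:

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.context.Context;
import io.opentelemetry.context.propagation.TextMapSetter;
import org.springframework.http.HttpRequest;
import org.springframework.http.client.ClientHttpRequestExecution;
import org.springframework.http.client.ClientHttpRequestInterceptor;
import org.springframework.http.client.ClientHttpResponse;

import java.io.IOException;

public class TracingRestTemplateInterceptor implements ClientHttpRequestInterceptor {

    // Writes each propagation field (e.g., traceparent and tracestate) into the outgoing headers
    private static final TextMapSetter<HttpRequest> SETTER =
            (request, key, value) -> request.getHeaders().set(key, value);

    @Override
    public ClientHttpResponse intercept(HttpRequest request, byte[] body,
                                        ClientHttpRequestExecution execution) throws IOException {
        // Inject the current span context before the call leaves the JVM
        GlobalOpenTelemetry.getPropagators()
                .getTextMapPropagator()
                .inject(Context.current(), request, SETTER);
        return execution.execute(request, body);
    }
}

Registering the interceptor on the RestTemplate (restTemplate.getInterceptors().add(...)) is enough for the Storage service, or any other OpenTelemetry-aware server, to join the same trace.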

Additional Considerations

  • Baggage: In response to an audience question, Bruce clarified that OpenTelemetry’s baggage feature allows propagating business-specific metadata across services, complementing tracing context.
  • Cloud-Native Support: While cloud providers like AWS Lambda have proprietary monitoring solutions, their native support for OpenTelemetry varies. Bruce suggested further exploration for specific use cases like batch jobs or serverless functions.
  • Performance: The JVM agent modifies bytecode at runtime, which may impact startup time but generally has negligible runtime overhead.

Conclusion

OpenTelemetry is a game-changer for Java developers seeking to enhance application observability. As Bruce demonstrated, it offers three flexible approaches—manual instrumentation, framework support, and auto-instrumentation—catering to different needs and expertise levels. The JVM agent stands out for its ease of use and comprehensive coverage, making it an excellent starting point for teams new to OpenTelemetry.

To get started, add the OpenTelemetry Java agent to your application with a single command-line option and configure it via environment variables. This minimal setup allows you to immediately observe your application’s behavior and assess OpenTelemetry’s value for your team.

The code and slides from Bruce’s presentation are available on GitHub, providing a practical reference for implementing OpenTelemetry in your projects. Whether you’re monitoring microservices or monoliths, OpenTelemetry empowers you to gain deep insights into your applications’ performance and behavior.

[DevoxxBE2023] How Sand and Java Create the World’s Most Powerful Chips

Johan Janssen, an architect at ASML, captivated the DevoxxBE2023 audience with a deep dive into the intricate process of chip manufacturing and the role of Java in optimizing it. Johan, a seasoned speaker and JavaOne Rock Star, explained how ASML’s advanced lithography machines, powered by Java-based software, enable the creation of cutting-edge computer chips used in devices worldwide.

From Sand to Silicon Wafers

Johan began by demystifying chip production, starting with silica sand, an abundant resource transformed into silicon ingots and sliced into wafers. These wafers, approximately 30 cm in diameter, serve as the foundation for chips, hosting up to 600 chips per wafer or thousands for smaller sensors. He passed around a wafer adorned with Java’s mascot, Duke, illustrating the physical substrate of modern electronics.

The process involves printing multiple layers—up to 200—onto wafers using extreme ultraviolet (EUV) lithography machines. These machines, requiring four Boeing 747s for transport, achieve precision at the nanometer scale, with transistors as small as three nanometers. Johan likened this to driving a car 300 km and retracing the path with only 2 mm deviation, highlighting the extraordinary accuracy required.

The Role of EUV Lithography

Johan detailed the EUV lithography process, where tin droplets are hit by a 40-kilowatt laser to generate plasma at sun-like temperatures, producing EUV light. This light, directed by ultra-flat mirrors, patterns wafers through reticles costing €250,000 each. The process demands cleanroom environments, as even a single dust particle can ruin a chip, and involves continuous calibration to maintain precision across thousands of parameters.

ASML’s machines, some over 30 years old, remain in use for producing sensors and less advanced chips, demonstrating their longevity. Johan also previewed future advancements, such as high numerical aperture (NA) machines, which will enable even smaller transistors, further enhancing chip performance and energy efficiency.

Java-Powered Analytics Platform

At the heart of Johan’s talk was ASML’s Java-based analytics platform, which processes 31 terabytes of data weekly to optimize chip production. Built on Apache Spark, the platform distributes computations across worker nodes, supporting plugins for data ingestion, UI customization, and processing. These plugins allow departments to integrate diverse data types, from images to raw measurements, and support languages like Julia and C alongside Java.

The platform, running on-premise to protect sensitive data, consolidates previously disparate applications, improving efficiency and user experience. Johan highlighted a machine learning use case where the platform increased defect detection from 70% to 92% without slowing production, showcasing Java’s role in handling complex computations.

Challenges and Solutions in Chip Manufacturing

Johan discussed challenges like layer misalignment, which can cause short circuits or defective chips. The platform addresses these by analyzing wafer plots to identify correctable errors, such as adjusting subsequent layers to compensate for misalignments. Non-correctable errors may result in downgrading chips (e.g., from 16 GB to 8 GB RAM), ensuring minimal waste.

He emphasized a pragmatic approach to tool selection, starting with REST endpoints and gradually adopting Kafka for streaming data as needs evolved. Johan also noted ASML’s collaboration with tool maintainers to enhance compatibility, such as improving Spark’s progress tracking for customer feedback.

Future of Chip Manufacturing

Looking ahead, Johan highlighted the industry’s push to diversify chip production beyond Taiwan, driven by geopolitical and economic factors. However, building new factories, or “fabs,” costing $10–20 billion, faces challenges like equipment backlogs and the need for highly skilled operators. ASML’s customer support teams, working alongside clients like Intel, underscore the specialized knowledge required.

Johan concluded by stressing the importance of a forward-looking mindset, with ASML’s roadmap prioritizing innovation over rigid methodologies. This approach, combined with Java’s robustness, ensures the platform’s scalability and adaptability in a rapidly evolving industry.

[DevoxxBE2023] Moving Java Forward Together: Community Power

Sharat Chander, Oracle’s Senior Director of Java Developer Engagement, delivered a compelling session at DevoxxBE2023, emphasizing the Java community’s pivotal role in driving the language’s evolution. With over 25 years in the IT industry, Sharat’s passion for Java and community engagement shone through as he outlined how developers can contribute to Java’s future, ensuring its relevance for decades to come.

The Legacy and Longevity of Java

Sharat began by reflecting on Java’s 28-year journey, a testament to its enduring impact on software development. He engaged the audience with a poll, revealing the diverse experience levels among attendees, from those using Java for five years to veterans with over 25 years of expertise. This diversity underscores Java’s broad adoption across industries, from small startups to large enterprises.

Java’s success, Sharat argued, stems from its thoughtful innovation strategy. Unlike the “move fast and break things” mantra, the Java team prioritizes stability and backward compatibility, ensuring that applications built on older versions remain functional. Projects like Amber, Panama, and the recent introduction of virtual threads in Java 21 exemplify this incremental yet impactful approach to innovation.

Balancing Stability and Progress

Sharat addressed the tension between rapid innovation and maintaining stability, a challenge given Java’s extensive history. He highlighted the six-month release cadence introduced to reduce latency to innovation, allowing developers to adopt new features without waiting for major releases. This approach, likened to a train arriving every three minutes, minimizes disruption and enhances accessibility.

The Java team’s commitment to trust, innovation, and predictability guides its development process. Sharat emphasized that Java’s design principles—established 28 years ago—continue to shape its evolution, ensuring it meets the needs of diverse applications, from AI and big data to emerging fields like quantum computing.

Community as the Heart of Java

The core of Sharat’s message was the community’s role in Java’s vitality. He debunked the “build it and they will come” myth, stressing that Java’s success relies on active community participation. Programs like the OpenJDK project invite developers to engage with mailing lists, review code check-ins, and contribute to technical decisions, fostering transparency and collaboration.

Sharat also highlighted foundational programs like the Java Community Process (JCP) and Java Champions, who advocate for Java independently, providing critical feedback to the Java team. He encouraged attendees to join Java User Groups (JUGs), noting the nearly 400 groups worldwide as vital hubs for knowledge sharing and networking.

Digital Engagement and Future Initiatives

Recognizing the digital era’s impact, Sharat discussed Oracle’s efforts to reach Java’s 10 million developers through platforms like dev.java. This portal aggregates learning resources, community content, and programs like the JEP Café and Sip of Java series, which offer digestible insights into Java’s features. The recently launched Java Playground provides a browser-based environment for experimenting with code snippets, accelerating feature adoption.

Sharat also announced the community contributions initiative on dev.java, featuring content from Java Champions like Venkat Subramaniam and Hannes Kutz. This platform aims to showcase community expertise, encouraging developers to submit their best practices via GitHub pull requests.

Nurturing Diversity and Inclusion

A poignant moment in Sharat’s talk was his call for greater gender diversity in the Java community. He acknowledged the industry’s shortcomings in achieving balanced representation and urged collective action to expand the community’s mindshare. Programs like JDuchess aim to create inclusive spaces, ensuring Java’s evolution benefits from diverse perspectives.

[DevoxxBE2023] Making Your @Beans Intelligent: Spring AI Innovations

At DevoxxBE2023, Dr. Mark Pollack delivered an insightful presentation on integrating artificial intelligence into Java applications using Spring AI, a project inspired by advancements in AI frameworks like LangChain and LlamaIndex. Mark, a seasoned Spring developer since 2003 and leader of the Spring Data project, explored how Java developers can harness pre-trained AI models to create intelligent applications that address real-world challenges. His talk introduced the audience to Spring AI’s capabilities, from simple “Hello World” examples to sophisticated use cases like question-and-answer systems over custom documents.

The Genesis of Spring AI

Mark began by sharing his journey into AI, sparked by the transformative impact of ChatGPT. Unlike traditional AI development, which often required extensive data cleaning and model training, pre-trained models like those from OpenAI offer accessible APIs and vast knowledge bases, enabling developers to focus on application engineering rather than data science. Mark highlighted how Spring AI emerged from his exploration of code generation, leveraging the structured nature of code within these models to create a framework tailored for Java developers. This framework abstracts the complexity of AI model interactions, making it easier to integrate AI into Spring-based applications.

Spring AI draws inspiration from Python’s AI ecosystem but adapts these concepts to Java’s idioms, emphasizing component abstractions and pluggability. Mark emphasized that this is not a direct port but a reimagination, aligning with the Spring ecosystem’s strengths in enterprise integration and batch processing. This approach positions Spring AI as a bridge between Java’s robust software engineering practices and the dynamic world of AI.

Core Components of AI Applications

A significant portion of Mark’s presentation focused on the architecture of AI applications, which extends beyond merely calling a model. He introduced a conceptual framework involving contextual data, AI frameworks, and models. Contextual data, akin to ETL (Extract, Transform, Load) processes, involves parsing and transforming data—such as PDFs—into embeddings stored in vector databases. These embeddings enable efficient similarity searches, crucial for use cases like question-and-answer systems.

Mark demonstrated a simple AI client in Spring AI, which abstracts interactions with various AI models, including OpenAI, Hugging Face, Amazon Bedrock, and Google Vertex. This portability allows developers to switch models without significant code changes. He also showcased the Spring CLI, a tool inspired by JavaScript’s Create React App, which simplifies project setup by generating starter code from existing repositories.

Prompt Engineering and Its Importance

Prompt engineering emerged as a critical theme in Mark’s talk. He explained that crafting effective prompts is essential for directing AI models to produce desired outputs, such as JSON-formatted responses or specific styles of answers. Spring AI’s PromptTemplate class facilitates this by allowing developers to create reusable, stateful templates with placeholders for dynamic content. Mark illustrated this with a demo where a prompt template generated a joke about a raccoon, highlighting the importance of roles (system and user) in defining the context and tone of AI responses.
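
A minimal sketch of that idea, assuming the Spring AI 1.x API (class and package names shifted between early milestones, so treat this as illustrative rather than the exact code from the demo):

import org.springframework.ai.chat.messages.Message;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.PromptTemplate;
import org.springframework.ai.chat.prompt.SystemPromptTemplate;

import java.util.List;
import java.util.Map;

public class JokePrompt {

    Prompt buildPrompt(String animal) {
        // System role: sets the assistant's overall behavior and tone
        Message systemMessage = new SystemPromptTemplate(
                "You are a friendly assistant that replies with one short, family-friendly joke.")
                .createMessage();

        // User role: the actual request, rendered from a reusable template with a placeholder
        Message userMessage = new PromptTemplate("Tell me a joke about a {animal}.")
                .createMessage(Map.of("animal", animal));

        return new Prompt(List.of(systemMessage, userMessage));
    }
}

The resulting Prompt is then handed to the configured chat client, which forwards it to whichever model (OpenAI, Bedrock, Vertex, and so on) the application has been wired against.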

He also touched on the concept of “dogfooding,” where AI models are used to refine prompts, creating a feedback loop that enhances their effectiveness. This iterative process, combined with evaluation techniques, ensures that applications deliver accurate and relevant responses, addressing challenges like model hallucinations—where AI generates plausible but incorrect information.

Retrieval Augmented Generation (RAG)

Mark introduced Retrieval Augmented Generation (RAG), a technique to overcome the limitations of AI models’ context windows, which restrict the amount of data they can process. RAG involves pre-processing data into smaller fragments, converting them into embeddings, and storing them in vector databases for similarity searches. This approach allows developers to provide only relevant data to the model, improving efficiency and accuracy.

In a demo, Mark showcased RAG with a bicycle shop dataset, where a question about city-commuting bikes retrieved relevant product descriptions from a vector store. This process mirrors traditional search engines but leverages AI to synthesize answers, demonstrating how Spring AI integrates with vector databases like Milvus and PostgreSQL to handle complex queries.

Real-World Applications and Future Directions

Mark highlighted practical applications of Spring AI, such as enabling question-and-answer systems for financial documents, medical records, or government programs like Medicaid. These use cases illustrate AI’s potential to make complex information more accessible, particularly for non-technical users. He also discussed the importance of evaluation in AI development, advocating for automated scoring mechanisms to assess response quality beyond simple test passing.

Looking forward, Mark outlined Spring AI’s roadmap, emphasizing robust core abstractions and support for a growing number of models and vector databases. He encouraged developers to explore the project’s GitHub repository and participate in its evolution, underscoring the rapid pace of AI advancements and the need for community involvement.

[DevoxxUK2024] Processing XML with Kafka Connect by Dale Lane

Dale Lane, a seasoned developer at IBM with a deep focus on event-driven architectures, delivered a compelling session at DevoxxUK2024, unveiling a powerful Kafka Connect plugin designed to streamline XML data processing. With extensive experience in Apache Kafka and Flink, Dale addressed the challenges of integrating XML data into Kafka pipelines, a task often fraught with complexity due to the format’s incompatibility with Kafka’s native data structures like Avro or JSON. His presentation offers practical solutions for developers seeking to bridge external systems with Kafka, transforming XML into more manageable formats or generating XML outputs for legacy systems. Through clear examples, Dale illustrates how this open-source plugin enhances flexibility and efficiency in Kafka Connect pipelines, empowering developers to handle diverse data integration scenarios with ease.

Understanding Kafka Connect Pipelines

Dale begins by demystifying Kafka Connect, a robust framework for moving data between Kafka and external systems. He outlines two primary pipeline types: source pipelines, which import data from external systems into Kafka, and sink pipelines, which export Kafka data to external destinations. A source pipeline typically involves a connector to fetch data, optional transformations to modify or filter it, and a converter to serialize the data into formats like Avro or JSON for Kafka topics. Conversely, a sink pipeline starts with a converter to deserialize Kafka data, followed by transformations and a connector to deliver it to an external system. This foundational explanation sets the stage for understanding where and how XML processing fits into these workflows, ensuring developers grasp the pipeline’s modular structure before diving into specific use cases.

Converting XML for Kafka Integration

A common challenge Dale addresses is integrating XML data from external systems, such as IBM MQ or XML-based web services, into Kafka’s ecosystem, which favors structured formats. He introduces the Kafka Connect plugin, available on GitHub under an Apache license, as a solution to parse XML into structured records early in the pipeline. For instance, using an IBM MQ source connector, the plugin can transform XML documents from a message queue into a generic structured format, allowing subsequent transformations and serialization into JSON or Avro. Dale demonstrates this with a weather API that returns XML strings, showing how the plugin converts these into structured objects for further processing, making them compatible with Kafka tools that struggle with raw XML. This approach significantly enhances the usability of external data within Kafka’s ecosystem.

Generating XML Outputs from Kafka

For scenarios where external systems require XML, Dale showcases the plugin’s ability to convert Kafka’s JSON or Avro messages into XML strings within a sink pipeline. He provides an example using a Kafka topic with JSON messages destined for an IBM MQ system, where the plugin, integrated as part of the sink connector, transforms structured data into XML before delivery. Another case involves an HTTP sink connector posting to an XML-based web service, such as an XML-RPC API. Here, the pipeline deserializes JSON, applies transformations to align with the API’s payload requirements, and uses the plugin to produce an XML string. This flexibility ensures seamless communication with legacy systems, bridging modern Kafka workflows with traditional XML-based infrastructure.

Enhancing Pipelines with Schema Support

Dale emphasizes the plugin’s schema handling capabilities, which add robustness to XML processing. In source pipelines, the plugin can reference an external XSD schema to validate and structure XML data, which is then paired with an Avro converter to submit schemas to a registry, ensuring compatibility with Kafka’s schema-driven ecosystem. In sink pipelines, enabling schema inclusion generates an XSD alongside the XML output, providing a clear description of the data’s structure. Dale illustrates this with a stock price connector, where enabling schema support produces XML events with accompanying XSDs, enhancing interoperability. This feature is particularly valuable for maintaining data integrity across systems, making the plugin a versatile tool for complex integration tasks.

[DevoxxBE2023] Securing the Supply Chain for Your Java Applications by Thomas Vitale

At Devoxx Belgium 2023, Thomas Vitale, a software engineer and architect at Systematic, delivered an authoritative session on securing the software supply chain for Java applications. As the author of Cloud Native Spring in Action and a passionate advocate for cloud-native technologies, Thomas provided a comprehensive exploration of securing every stage of the software lifecycle, from source code to deployment. Drawing on the SLSA framework and CNCF research, he demonstrated practical techniques for ensuring integrity, authenticity, and resilience using open-source tools like Gradle, Sigstore, and Kyverno. Through a blend of theoretical insights and live demonstrations, Thomas illuminated the critical importance of supply chain security in today’s threat landscape.

Safeguarding Source Code with Git Signatures

Thomas began by defining the software supply chain as the end-to-end process of delivering software, encompassing code, dependencies, tools, practices, and people. He emphasized the risks at each stage, starting with source code. Using Git as an example, Thomas highlighted its audit trail capabilities but cautioned that commit authorship can be manipulated. In a live demo, he showed how he could impersonate a colleague by altering Git’s username and email, underscoring the need for signed commits. By enforcing signed commits with GPG or SSH keys—or preferably a keyless approach via GitHub’s single sign-on—developers can ensure commit authenticity, establishing a verifiable provenance trail critical for supply chain security.

Managing Dependencies with Software Bills of Materials (SBOMs)

Moving to dependencies, Thomas stressed the importance of knowing exactly what libraries are included in a project, especially given vulnerabilities like Log4j. He introduced Software Bills of Materials (SBOMs) as a standardized inventory of software components, akin to a list of ingredients. Using the CycloneDX plugin for Gradle, Thomas demonstrated generating an SBOM during the build process, which provides precise dependency details, including versions, licenses, and hashes for integrity verification. This approach, integrated into Maven or Gradle, ensures accuracy over post-build scanning tools like Snyk, enabling developers to identify vulnerabilities, check license compliance, and verify component integrity before production.
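
For a Gradle build like the one in the demo, wiring in the CycloneDX plugin is roughly the following sketch (the plugin version is illustrative; check the Gradle plugin portal for the current release):

plugins {
    id 'java'
    id 'org.cyclonedx.bom' version '1.8.2'
}

Running gradle cyclonedxBom then writes the SBOM (JSON and/or XML, depending on configuration) under build/reports, ready to be fed into downstream analysis tools.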

Thomas further showcased Dependency-Track, an OWASP project, to analyze SBOMs and flag vulnerabilities, such as a critical issue in SnakeYAML. He introduced the Vulnerability Exploitability Exchange (VEX) standard, which complements SBOMs by documenting whether vulnerabilities affect an application. In his demo, Thomas marked a SnakeYAML vulnerability as a false positive due to Spring Boot’s safe deserialization, demonstrating how VEX communicates security decisions to stakeholders, reducing unnecessary alerts and ensuring compliance with emerging regulations.

Building Secure Artifacts with Reproducible Builds

The build phase, Thomas explained, is another critical juncture for security. Using Spring Boot as an example, he outlined three packaging methods: JAR files, native executables, and container images. He critiqued Dockerfiles for introducing non-determinism and maintenance overhead, advocating for Cloud Native Buildpacks as a reproducible, secure alternative. In a demo, Thomas built a container image with Buildpacks, highlighting its fixed creation timestamp (January 1, 1980) to ensure identical outputs for unchanged inputs, enhancing security by eliminating variability. This reproducibility, coupled with SBOM generation during the build, ensures artifacts are both secure and traceable.

Signing and Verifying Artifacts with SLSA

To ensure artifact integrity, Thomas introduced the SLSA framework, which provides guidelines for securing software artifacts across the supply chain. He demonstrated signing container images with Sigstore’s Cosign tool, using a keyless approach to avoid managing private keys. This process, integrated into a GitHub Actions pipeline, ensures that artifacts are authentically linked to their creator. Thomas further showcased SLSA’s provenance generation, which documents the artifact’s origin, including the Git commit hash and build steps. By achieving SLSA Level 3, his pipeline provided non-falsifiable provenance, ensuring traceability from source code to deployment.

Securing Deployments with Policy Enforcement

The final stage, deployment, requires validating artifacts to ensure they meet security standards. Thomas demonstrated using Cosign and the SLSA Verifier to validate signatures and provenance, ensuring only trusted artifacts are deployed. On Kubernetes, he introduced Kyverno, a policy engine that enforces signature and provenance checks, automatically rejecting non-compliant deployments. This approach ensures that production environments remain secure, aligning with the principle of validating metadata to prevent unauthorized or tampered artifacts from running.

Conclusion: A Holistic Approach to Supply Chain Security

Thomas’s session at Devoxx Belgium 2023 provided a robust framework for securing Java application supply chains. By addressing source code integrity, dependency management, build reproducibility, artifact signing, and deployment validation, he offered a comprehensive strategy to mitigate risks. His practical demonstrations, grounded in open-source tools and standards like SLSA and VEX, empowered developers to adopt these practices without overwhelming complexity. Thomas’s emphasis on asking “why” at each step encouraged attendees to tailor security measures to their context, ensuring both compliance and resilience in an increasingly regulated landscape.
