
[NodeCongress2024] The Supply Chain Security Crisis in Open Source: A Shift from Vulnerabilities to Malicious Attacks

Lecturer: Feross Aboukhadijeh

Feross Aboukhadijeh is an entrepreneur, prolific open-source programmer, and the Founder and CEO of Socket, a developer-first security platform. He is renowned in the JavaScript ecosystem for creating widely adopted open-source projects such as WebTorrent and Standard JS, and for maintaining over 100 npm packages. Academically, he serves as a Lecturer at Stanford University, where he has taught the course CS 253 Web Security. His professional career includes roles at major technology companies like Quora, Facebook, Yahoo, and Intel.

Abstract

This article analyzes the escalating threat landscape within the open-source software (OSS) supply chain, focusing on malicious package attacks as opposed to traditional security vulnerabilities. Drawing from the lecture, it outlines the primary attack vectors, including typosquatting, dependency confusion, and sophisticated account takeovers (e.g., the XZ Utils backdoor). The analysis highlights the methodological shortcomings of the existing vulnerability reporting systems (CVEs and GitHub Security Advisories) in detecting these novel risks. Finally, it details the emerging practice of using static analysis, dynamic runtime analysis, and Large Language Models (LLMs) to proactively audit package behavior and safeguard the software supply chain.

Context: The Evolving Open Source Threat Model

The dependency model of modern software development, characterized by the massive reuse of third-party open-source packages, has created a fertile ground for large-scale security breaches. The fundamental issue is the inherent trust placed in thousands of transitive dependencies, which collectively form the software supply chain. The context of security has shifted from managing known vulnerabilities to defending against deliberate malicious injection.

Analysis of Primary Attack Vectors

Attackers employ several cunning strategies to compromise the supply chain:

  1. Typosquatting and Name Confusion: This low-effort but high-impact method involves publishing a package with a name slightly misspelled from a popular one (e.g., eslunt instead of eslint). Developers accidentally install the malicious version, which often contains code to exfiltrate environment variables, system information, or credentials.
  2. Dependency Confusion: This technique exploits automated build tools in private development environments. By publishing a malicious package to a public registry (like npm) with the same name as a private internal dependency, the public package is often inadvertently downloaded and prioritized, leading to unauthorized code execution.
  3. Account Takeover and Backdoors: This represents the most sophisticated class of attack, exemplified by the XZ Utils incident. Attackers compromise a maintainer’s account (for example via phishing) or patiently earn maintainer status through social engineering, then subtly introduce a backdoor into a critical, widely used project. The XZ Utils attack in particular was characterized by years of preparation and extremely complex obfuscation: a Trojanized m4 macro hid the malicious payload and executed it only under specific conditions (e.g., when built on a Linux distribution with sshd installed).

Methodological Innovations in Defense

The traditional security model, reliant on the Common Vulnerabilities and Exposures (CVE) database, is inadequate for detecting these malicious behaviors. A new, analytical methodology is required, focusing on package auditing and behavioral analysis:

  • Static Manifest Analysis: Packages can be analyzed for red flags in their manifest file (package.json), such as the use of risky postinstall scripts, which execute code immediately upon installation and are often used by malware (see the example manifest after this list).
  • Runtime Behavioral Analysis (Sandboxing): The most effective defense is to run the package installation and observe its behavior in a sandboxed environment, checking for undesirable actions like networking activity or shell command execution.
  • LLM-Assisted Analysis: Advanced security tools are now using Large Language Models (LLMs) to reason about the relationship between a package’s declared purpose and its actual code. An LLM can be prompted to assess whether a dependency that claims to be a utility function is legitimately opening network connections, providing a powerful, context-aware method for identifying behavioral anomalies.
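
To make the first bullet concrete, here is a minimal, hypothetical package.json of the kind a static manifest scanner would flag. The package name and script file are invented for illustration:

{
  "name": "eslunt",
  "version": "1.0.0",
  "scripts": {
    "postinstall": "node collect-env.js"
  }
}

An install script like this runs arbitrary code the moment npm install completes, before the package is ever imported, which is why its mere presence is treated as a signal worth auditing.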

Conclusion and Implications for Robust Software Engineering

The rise of malicious supply chain attacks mandates a paradigm shift in how developers approach dependency management. The existing vulnerability-centric system is too noisy and fails to address the root cause of these sophisticated exploits. For secure and robust software engineering, the definition of “open-source security” must be expanded beyond traditional vulnerability scanning to include maintenance risks (unmaintained or low-quality packages). Proactive defense requires the implementation of continuous, behavioral auditing tools that leverage advanced techniques like LLMs to identify deviations from expected package behavior.

Hashtags: #OpenSourceSecurity #SupplyChainAttack #SoftwareSupplyChain #LLMSecurity #Typosquatting #NodeCongress

Understanding Dependency Management and Resolution: A Look at Java, Python, and Node.js

Mastering how dependencies are handled can define your project’s success or failure. Let’s explore the nuances across today’s major development ecosystems.

Introduction

Every modern application relies heavily on external libraries. These libraries accelerate development, improve security, and enable integration with third-party services. However, unmanaged dependencies can lead to catastrophic issues — from version conflicts to severe security vulnerabilities. That’s why understanding dependency management and resolution is absolutely essential, particularly across different programming ecosystems.

What is Dependency Management?

Dependency management involves declaring external components your project needs, installing them properly, ensuring their correct versions, and resolving conflicts when multiple components depend on different versions of the same library. It also includes updating libraries responsibly and securely over time. In short, good dependency management prevents issues like broken builds, “dependency hell”, or serious security holes.

Java: Maven and Gradle

In the Java ecosystem, dependency management is an integrated and structured part of the build lifecycle, using tools like Maven and Gradle.

Maven and Dependency Scopes

Maven uses a declarative pom.xml file to list dependencies. A particularly important notion in Maven is the dependency scope.

Scopes control where and how dependencies are used. Examples include:

  • compile (default): Needed at both compile time and runtime.
  • provided: Needed for compile, but provided at runtime by the environment (e.g., Servlet API in a container).
  • runtime: Needed only at runtime, not at compile time.
  • test: Used exclusively for testing (JUnit, Mockito, etc.).
  • system: Provided by the system explicitly (deprecated practice).

<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.13.2</version>
  <scope>test</scope>
</dependency>
    

This nuanced control allows Java developers to avoid bloating production artifacts with unnecessary libraries, and to fine-tune build behaviors. This is a major feature missing from simpler systems like pip or npm.

Gradle

Gradle, offering both Groovy and Kotlin DSLs, also supports scopes through configurations like implementation, runtimeOnly, testImplementation, which have similar meanings to Maven scopes but are even more flexible.


dependencies {
    implementation 'org.springframework.boot:spring-boot-starter'
    testImplementation 'org.springframework.boot:spring-boot-starter-test'
}
    

Python: pip and Poetry

Python dependency management is simpler, but also less structured than Java’s. With pip, there is no formal concept of scopes.

pip

Developers typically separate main dependencies and development dependencies manually using different files:

  • requirements.txt – Main project dependencies.
  • requirements-dev.txt – Development and test dependencies (pytest, tox, etc.).

This manual split is prone to human error and lacks the rigorous environment control that Maven or Gradle enforce.
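
A minimal illustration of this convention (the file names follow the split above; the pinned versions are arbitrary examples):

# requirements.txt
requests==2.31.0

# requirements-dev.txt
-r requirements.txt
pytest==7.4.0

Developers then run pip install -r requirements.txt in production and pip install -r requirements-dev.txt locally. Nothing in the tooling prevents a test-only library from slipping into the production file; the discipline is purely conventional.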

Poetry

Poetry improves the situation by introducing a structured division:


[tool.poetry.dependencies]
requests = "^2.31"

[tool.poetry.dev-dependencies]
pytest = "^7.1"
    

Poetry brings concepts closer to Maven scopes, though they remain less fine-grained (there is no runtime/compile distinction, for instance). Note that newer Poetry releases express development dependencies as groups, e.g., [tool.poetry.group.dev.dependencies].

Node.js: npm and Yarn

JavaScript dependency managers like npm and yarn allow a simple distinction between regular and development dependencies.

npm

Dependencies are declared in package.json under different sections:

  • dependencies – Needed in production.
  • devDependencies – Needed only for development (e.g., testing libraries, linters).

{
  "dependencies": {
    "express": "^4.18.2"
  },
  "devDependencies": {
    "mocha": "^10.2.0"
  }
}
    

While convenient, npm’s dependency management lacks Maven’s level of strictness around dependency resolution, often leading to version mismatches or “node_modules bloat.”

Key Differences Between Ecosystems

When switching between Java, Python, and Node.js environments, developers must be aware of the following fundamental differences:

1. Formality of Scopes

Java’s Maven/Gradle ecosystem defines scopes formally at the dependency level. Python (pip) and JavaScript (npm) ecosystems use looser, file- or section-based categorization.

2. Handling of Transitive Dependencies

Maven and Gradle resolve and include transitive dependencies automatically, with well-defined conflict resolution strategies. Maven’s “nearest wins” rule, for example, means that if your project depends on C:1.0 directly and on C:2.0 through an intermediate library, the directly declared C:1.0 is chosen because it sits closer to the root of the dependency tree. pip historically had weak transitive dependency handling (a proper backtracking resolver only became the default in pip 20.3), so careful pinning was essential. npm has flattened the node_modules tree since npm v3 and tightened peer dependency handling in v7+, but conflicts still occur in complex trees.

3. Lockfiles

npm/yarn and Python’s Poetry use lockfiles (package-lock.json, yarn.lock, poetry.lock) to ensure consistent dependency installations across machines. Maven and Gradle historically did without lockfiles because, for fixed declared versions, their resolution is deterministic. Gradle has since added explicit lockfile support through its dependency-locking feature.

4. Dependency Updating Strategy

Java developers often manage dependency versions manually inside pom.xml or use dependencyManagement blocks for centralized control. pip requires updating requirements.txt by hand or regenerating it via pip freeze. npm/yarn allow semver range rules (“^” accepts minor and patch updates, “~” only patches), but such auto-updating can lead to subtle breakages if not handled carefully.

Best Practices Across All Languages

  • Pin exact versions wherever possible to avoid surprise updates.
  • Use lockfiles and commit them to version control (Git).
  • Separate production and development/test dependencies explicitly.
  • Use dependency scanners (e.g., OWASP Dependency-Check, Snyk, npm audit) regularly to detect vulnerabilities.
  • Prefer stable, maintained libraries with good community support and recent commits.

Conclusion

Dependency management, while often overlooked early in projects, becomes critical as applications scale. Maven and Gradle offer the most fine-grained control via dependency scopes and conflict resolution. The Python and JavaScript ecosystems are evolving rapidly but still demand more manual care from developers. Understanding these differences, and applying best practices accordingly, will ensure smoother builds, faster delivery, and safer production systems.

Interested in deeper dives into dependency vulnerability scanning, SBOM generation, or automatic dependency update pipelines? Subscribe to our blog for more in-depth content!

[Devoxx FR 2024] Mastering Reproducible Builds with Apache Maven: Insights from Hervé Boutemy


Introduction

In a recent presentation, Hervé Boutemy, a veteran Maven maintainer, Apache Software Foundation member, and Solution Architect at Sonatype, delivered a compelling talk on reproducible builds with Apache Maven. With over 20 years of experience in Java, CI/CD, DevOps, and software supply chain security, Hervé shared his five-year journey to make Maven builds reproducible, a critical practice for achieving the highest level of trust in software, as defined by SLSA Level 4. This post dives into the key concepts, practical steps, and surprising benefits of reproducible builds, based on Hervé’s insights and hands-on demonstrations.

What Are Reproducible Builds?

Reproducible builds ensure that compiling the same source code, with the same environment and build tools, produces identical binaries, byte-for-byte. This practice verifies that the distributed binary matches the source code, eliminating risks like malicious tampering or unintended changes. Hervé highlighted the infamous XZ incident, where discrepancies between source tarballs and Git repositories went unnoticed—reproducible builds could have caught this by ensuring the binary matched the expected source.

Originally pioneered by Linux distributions like Debian in 2013, reproducible builds have gained traction in the Java ecosystem. Hervé’s work has led to over 2,000 verified reproducible releases from 500+ open-source projects on Maven Central, with stats growing weekly.

Why Reproducible Builds Matter

Reproducible builds are primarily about security. They allow anyone to rebuild a project and confirm that the binary hasn’t been compromised (e.g., no backdoors or “dodgy” additions, as Hervé humorously put it). But Hervé’s five-year experience revealed additional benefits:

  • Build Validation: Ensure patches or modifications don’t introduce unintended changes. A “build successful” message doesn’t guarantee the binary is correct—reproducible builds do.
  • Data Leak Prevention: Hervé found sensitive data (e.g., usernames, machine names, even a PGP passphrase!) embedded in Maven Central artifacts, exposing personal or organizational details.
  • Enterprise Trust: When outsourcing development, reproducible builds verify that a vendor’s binary matches the provided source, saving time and reducing risk.
  • Build Efficiency: Reproducible builds enable caching optimizations, improving build performance.

These benefits extend beyond security, making reproducible builds a powerful tool for developers, enterprises, and open-source communities.

Implementing Reproducible Builds with Maven

Hervé outlined a practical workflow to achieve reproducible builds, demonstrated through his open-source project, reproducible-central, which includes scripts and rebuild recipes for 3,500+ compilations across 627+ projects. Here’s how to make your Maven builds reproducible:

Step 1: Rebuild and Verify

Start by rebuilding a project from its source (e.g., a Git repository tag) and comparing the output binary to a reference (e.g., Maven Central or an internal repository). Hervé’s rebuild.sh script automates this:

  • Specify the Environment: Define the JDK (e.g., JDK 8 or 17), OS (Windows, Linux, FreeBSD), and Maven command (e.g., mvn clean verify -DskipTests).
  • Use Docker: The script creates a Docker image with the exact environment (JDK, OS, Maven version) to ensure consistency.
  • Compare Binaries: The script downloads the reference binary and checks if the rebuilt binary matches, reporting success or failure.

Hervé demonstrated this with the Maven Javadoc Plugin (version 3.5.0), showing a 100% reproducible build when the environment matched the original (e.g., JDK 8 on Windows).

Step 2: Diagnose Differences

If the binaries don’t match, use diffoscope, a tool from the Linux reproducible builds community, to analyze differences. Diffoscope compares archives (e.g., JARs), nested archives, and even disassembles bytecode to pinpoint issues like:

  • Timestamps: JARs include file timestamps, which vary by build time.
  • File Order: ZIP-based JARs don’t guarantee consistent file ordering.
  • Bytecode Variations: Different JDK major versions produce different bytecode, even for the same target (e.g., targeting Java 8 with JDK 17 vs. JDK 8).
  • Permissions: File permissions (e.g., group write access) differ across environments.

Hervé showed a case where a build failed due to a JDK mismatch (JDK 11 vs. JDK 8), which diffoscope revealed through bytecode differences.

Step 3: Configure Maven for Reproducibility

To make builds reproducible, address common sources of “noise” in Maven projects:

  • Fix Timestamps: Set a consistent timestamp using the project.build.outputTimestamp property, managed by the Maven Release or Versions plugins. This ensures JARs have identical timestamps across builds.
  • Upgrade Plugins: Many Maven plugins historically introduced variability (e.g., random timestamps or environment-specific data). Hervé contributed fixes to numerous plugins, and his artifact:check-buildplan goal identifies outdated plugins, suggesting upgrades to reproducible versions.
  • Avoid Non-Reproducible Outputs: Skip Javadoc generation (highly variable) and GPG signing (non-reproducible by design) during verification.

For example, Hervé explained that configuring project.build.outputTimestamp and upgrading plugins eliminated timestamp and file-order issues in JARs, making builds reproducible.
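
In practice this is a one-line addition to the POM. A minimal sketch (the timestamp value is an arbitrary example; any fixed ISO-8601 UTC date works, and the Release or Versions plugins can manage it for you, as noted above):

<properties>
  <project.build.outputTimestamp>2024-04-01T00:00:00Z</project.build.outputTimestamp>
</properties>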

Step 4: Test Locally

Before scaling, test reproducibility locally using mvn verify (not install, which pollutes the local repository). The artifact:compare goal compares your build output to a reference binary (e.g., from Maven Central or an internal repository). For internal projects, specify your repository URL as a parameter.

To test without a remote repository, build twice locally: run mvn install for the first build, then mvn verify for the second, comparing the results. This catches issues like unfixed dates or environment-specific data.
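
A local double-build session, following the steps above, might look like this. The goals belong to the Maven Artifact Plugin mentioned in this talk; the reference.repo parameter for pointing at another repository is shown as an assumption and should be checked against the plugin documentation:

# first build: install the artifacts to the local repository
mvn clean install -DskipTests

# second build: rebuild and compare against the first
mvn clean verify artifact:compare -DskipTests

# alternatively, compare against a remote reference repository
mvn clean verify artifact:compare -Dreference.repo=https://repo.example.com/releases

If the two builds differ, the comparison reports the mismatching files, which you can then inspect with diffoscope.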

Step 5: Scale and Report

For large-scale verification, adapt Hervé’s reproducible-central scripts to your internal repository. These scripts generate reports with group IDs, artifact IDs, and reproducibility scores, helping track progress across releases. Hervé’s stats (e.g., 100% reproducibility for some projects, partial for others) provide a model for enterprise reporting.

Challenges and Lessons Learned

Hervé shared several challenges and insights from his journey:

  • JDK Variability: Bytecode differs across major JDK versions, even for the same target. Always match the original JDK major version (e.g., JDK 8 for a Java 8 target).
  • Environment Differences: Windows vs. Linux line endings (CRLF vs. LF) or file permissions (e.g., group write access) can break reproducibility. Docker ensures consistent environments.
  • Plugin Issues: Older plugins introduced variability, but Hervé’s contributions have made modern versions reproducible.
  • Unexpected Findings: Reproducible builds uncovered sensitive data in Maven Central artifacts, highlighting the need for careful build hygiene.

One surprising lesson came from file permissions: Hervé discovered that newer Linux distributions default to non-writable group permissions, unlike older ones, requiring adjustments to build recipes.

Interactive Learning: The Quiz

Hervé ended with a fun quiz to test the audience’s understanding, presenting rebuild results and asking, “Reproducible or not?” Examples included:

  • Case 1: A Maven Javadoc Plugin 3.5.0 build matched the reference perfectly (reproducible).
  • Case 2: A build showed bytecode differences due to a JDK mismatch (JDK 11 vs. JDK 8, not reproducible).
  • Case 3: A build differed only in file permissions (group write access), fixable by adjusting the environment (reproducible with a corrected recipe).

The quiz reinforced a key point: reproducibility requires precise environment matching, but tools like diffoscope make debugging straightforward.

Getting Started

Ready to make your Maven builds reproducible? Follow these steps:

  1. Clone reproducible-central and explore Hervé’s scripts and stats.
  2. Run mvn artifact:check-buildplan to identify and upgrade non-reproducible plugins.
  3. Set project.build.outputTimestamp in your POM file to fix JAR timestamps.
  4. Test locally with mvn verify and artifact:compare, specifying your repository if needed.
  5. Scale up using rebuild.sh and Docker for consistent environments, adapting to your internal repository.

Hervé encourages feedback to improve his tools, so if you hit issues, reach out via the project’s GitHub or Apache’s community channels.

Conclusion

Reproducible builds with Maven are not only achievable but transformative, offering security, trust, and operational benefits. Hervé Boutemy’s work demystifies the process, providing tools, scripts, and a clear roadmap to success. From preventing backdoors to catching configuration errors and sensitive data leaks, reproducible builds are a must-have for modern Java development.

Start small with artifact:check-buildplan, test locally, and scale with reproducible-central. As Hervé’s 3,500+ rebuilds show, the Java community is well on its way to making reproducibility the norm. Join the movement, and let’s build software we can trust!

[Devoxx FR 2024] Instrumenting Java Applications with OpenTelemetry: A Comprehensive Guide


Introduction

In his presentation at Devoxx France 2024, Bruce Bujon, an R&D Engineer at Datadog and an open-source developer, delivered an insightful talk on instrumenting Java applications with OpenTelemetry. This powerful observability framework is transforming how developers monitor and analyze application performance, infrastructure, and security. In this detailed post, we’ll explore the key concepts from Bruce’s presentation, breaking down OpenTelemetry, its components, and practical steps to implement it in Java applications.

What is OpenTelemetry?

OpenTelemetry is an open-source observability framework designed to collect, process, and export telemetry data in a vendor-agnostic manner. It captures data from various sources—such as virtual machines, databases, and applications—and exports it to observability backends for analysis. Importantly, OpenTelemetry focuses solely on data collection and management, leaving visualization and analysis to backend tools like Datadog, Jaeger, or Grafana.

The framework supports three primary signals:

  • Traces: These map the journey of requests through an application, highlighting the time taken by each component or microservice.
  • Logs: Timestamped events, such as user actions or system errors, familiar to most developers.
  • Metrics: Aggregated numerical data, like request rates, error counts, or CPU usage over time.

In his talk, Bruce focused on traces, which are particularly valuable for understanding performance bottlenecks in distributed systems.

Why Use OpenTelemetry for Java Applications?

For Java developers, OpenTelemetry offers a standardized way to instrument applications, ensuring compatibility with various observability backends. Its flexibility allows developers to collect telemetry data without being tied to a specific tool, making it ideal for diverse tech stacks. Bruce highlighted its growing adoption, noting that OpenTelemetry is the second most active project in the Cloud Native Computing Foundation (CNCF), behind only Kubernetes.

Instrumenting a Java Application: A Step-by-Step Guide

Bruce demonstrated three approaches to instrumenting Java applications with OpenTelemetry, using a simple example of two web services: an “Order” service and a “Storage” service. The goal was to trace a request from the Order service, which calls the Storage service to check stock levels for items like hats, bags, and socks.

Approach 1: Manual Instrumentation with OpenTelemetry API and SDK

The first approach involves manually instrumenting the application using the OpenTelemetry API and SDK. This method offers maximum control but requires significant development effort.

Steps:

  1. Add Dependencies: Include the OpenTelemetry Bill of Materials (BOM) to manage library versions, along with the API, SDK, OTLP exporter, and semantic conventions.
  2. Initialize the SDK: Set up a TracerProvider with a resource defining the service (e.g., “storage”) and attributes like service name and deployment environment.
  3. Create a Tracer: Use the Tracer to generate spans for specific operations, such as a web route or internal method.
  4. Instrument Routes: For each route or method, create a span using a SpanBuilder, set attributes (e.g., span kind as “server”), and mark the start and end of the span.
  5. Export Data: Configure the SDK to export spans to an OpenTelemetry Collector via the OTLP protocol (a condensed sketch of steps 2–5 follows this list).
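
A condensed sketch of steps 2–5, assuming the OpenTelemetry Java SDK and OTLP exporter artifacts are on the classpath (the endpoint, service, and span names are illustrative, and builder APIs can vary slightly between SDK versions):

import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.resources.Resource;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;
import static io.opentelemetry.api.common.AttributeKey.stringKey;

// 2. Initialize the SDK with a resource describing the service
Resource resource = Resource.getDefault()
    .merge(Resource.create(Attributes.of(stringKey("service.name"), "storage")));
SdkTracerProvider tracerProvider = SdkTracerProvider.builder()
    .setResource(resource)
    // 5. Export spans to a collector over OTLP
    .addSpanProcessor(BatchSpanProcessor.builder(
        OtlpGrpcSpanExporter.builder().setEndpoint("http://localhost:4317").build()).build())
    .build();
OpenTelemetrySdk sdk = OpenTelemetrySdk.builder().setTracerProvider(tracerProvider).build();

// 3. Create a tracer, then 4. wrap an operation in a span
Tracer tracer = sdk.getTracer("storage-service");
Span span = tracer.spanBuilder("GET /stock").setSpanKind(SpanKind.SERVER).startSpan();
try (Scope scope = span.makeCurrent()) {
    // handle the request: look up stock levels, call internal methods...
} finally {
    span.end();
}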

Example Output: Bruce showed a trace with two spans—one for the route and one for an internal method—displayed in Datadog’s APM view, with attributes like service name and HTTP method.

Pros: Fine-grained control over instrumentation.

Cons: Verbose and time-consuming, especially for large applications or libraries with private APIs.

Approach 2: Framework Support with Spring Boot

The second approach leverages framework-specific integrations, such as Spring Boot’s OpenTelemetry starter, to automate instrumentation.

Steps:

  1. Add Spring Boot Starter: Include the OpenTelemetry starter, which bundles the API, SDK, exporter, and autoconfigure dependencies.
  2. Configure Environment Variables: Set variables for the service name, OTLP endpoint, and other settings (sketched after these steps).
  3. Run the Application: The starter automatically instruments web routes, capturing HTTP methods, routes, and response codes.
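
The variables in step 2 use the standard OpenTelemetry autoconfiguration names; the values below are illustrative:

export OTEL_SERVICE_NAME=order
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_RESOURCE_ATTRIBUTES=deployment.environment=dev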

Example Output: Bruce demonstrated a trace for the Order service, with spans automatically generated for routes and tagged with HTTP metadata.

Pros: Minimal code changes and good generic instrumentation.

Cons: Limited customization and varying support across frameworks (e.g., Spring Boot doesn’t support JDBC out of the box).

Approach 3: Auto-Instrumentation with JVM Agent

The third and most powerful approach uses the OpenTelemetry JVM agent for automatic instrumentation, requiring minimal code changes.

Steps:

  1. Add the JVM Agent: Attach the OpenTelemetry Java agent to the JVM using a command-line option (e.g., -javaagent:opentelemetry-javaagent.jar); a complete launch command is sketched after these steps.
  2. Configure Environment Variables: Use autoconfigure variables (around 80 options) to customize the agent’s behavior.
  3. Remove Manual Instrumentation: Eliminate SDK, exporter, and framework dependencies, keeping only the API and semantic conventions for custom instrumentation.
  4. Run the Application: The agent instruments web servers, clients, and libraries (e.g., JDBC, Kafka) at runtime.
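
Putting steps 1, 2, and 4 together, launching a service under the agent can be as simple as the following (the jar names and values are illustrative):

export OTEL_SERVICE_NAME=storage
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
java -javaagent:opentelemetry-javaagent.jar -jar storage-service.jar

Because the agent rewrites bytecode at class-load time, this one command instruments web servers, HTTP clients, JDBC drivers, Kafka clients, and more, without touching application code.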

Example Output: Bruce showcased a complete distributed trace, including spans for both services, web clients, and servers, with context propagation handled automatically.

Pros: Comprehensive instrumentation with minimal effort, supporting over 100 libraries.

Cons: Potential conflicts with other JVM agents (e.g., security tools) and limited support for native images (e.g., Quarkus).

Context Propagation: Linking Traces Across Services

A critical aspect of distributed tracing is context propagation, ensuring that spans from different services are linked within a single trace. Bruce explained that without propagation, the Order and Storage services generated separate traces.

To address this, OpenTelemetry uses HTTP headers (e.g., W3C’s traceparent and tracestate) to carry tracing context. In the manual approach, Bruce implemented a RestTemplate interceptor in Spring to inject headers and a Quarkus filter to extract them. The JVM agent, however, handles this automatically, simplifying the process.
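
Concretely, the trace context travels as plain HTTP headers. A traceparent value carries a version, a 16-byte trace ID, the 8-byte ID of the calling span, and trace flags; the example below uses the illustrative IDs from the W3C specification:

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01

The receiving service extracts these values and creates its spans as children of the caller’s span, stitching both services into a single trace.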

Additional Considerations

  • Baggage: In response to an audience question, Bruce clarified that OpenTelemetry’s baggage feature allows propagating business-specific metadata across services, complementing tracing context.
  • Cloud-Native Support: While cloud providers like AWS Lambda have proprietary monitoring solutions, their native support for OpenTelemetry varies. Bruce suggested further exploration for specific use cases like batch jobs or serverless functions.
  • Performance: The JVM agent modifies bytecode at runtime, which may impact startup time but generally has negligible runtime overhead.

Conclusion

OpenTelemetry is a game-changer for Java developers seeking to enhance application observability. As Bruce demonstrated, it offers three flexible approaches—manual instrumentation, framework support, and auto-instrumentation—catering to different needs and expertise levels. The JVM agent stands out for its ease of use and comprehensive coverage, making it an excellent starting point for teams new to OpenTelemetry.

To get started, add the OpenTelemetry Java agent to your application with a single command-line option and configure it via environment variables. This minimal setup allows you to immediately observe your application’s behavior and assess OpenTelemetry’s value for your team.

The code and slides from Bruce’s presentation are available on GitHub, providing a practical reference for implementing OpenTelemetry in your projects. Whether you’re monitoring microservices or monoliths, OpenTelemetry empowers you to gain deep insights into your applications’ performance and behavior.

[PHPForumParis2023] Streams: We All Underestimate Predis! – Alexandre Daubois

Alexandre Daubois, lead Symfony developer at Wanadev Digital, delivered a concise yet impactful session at Forum PHP 2023, spotlighting the power of Predis, a PHP client for Redis. Focusing on his team’s work at Wanadev Digital, Alexandre shared how Predis’s stream capabilities resolved critical performance issues in their 3D home modeling tool, Kozikaza. His talk highlighted practical applications of Redis streams, inspiring developers to leverage this underutilized tool for efficient data handling.

The Power of Redis Streams

Alexandre introduced Redis streams as a lightweight, in-memory data structure ideal for handling large datasets. At Wanadev Digital, the Kozikaza platform, which enables users to design 3D home models in browsers, faced challenges with storing and processing large JSON models. Alexandre explained how Predis’s stream functionality allowed his team to write data incrementally to cloud storage, avoiding memory bottlenecks. This approach enabled Kozikaza to handle massive datasets, such as 50GB JSON files, efficiently.

Solving Real-World Challenges

Detailing the implementation, Alexandre described how Predis’s Lazy Stream feature facilitated piecewise data writing to cloud buckets, resolving memory constraints in Kozikaza’s workflow. He shared user behavior insights, noting that long session times (up to six hours) made initial load times less critical, as users kept the application open. This context allowed Alexandre’s team to prioritize functionality over premature optimization, using Predis to deliver a robust solution under tight deadlines.

[KotlinConf2023] Transforming Farmers’ Lives in Kenya: Apollo Agriculture’s Android Apps with Harun Wangereka

Harun Wangereka, a Software Engineer at Apollo Agriculture and a Google Developer Expert for Android, delivered an inspiring presentation at KotlinConf’23 about how his company is leveraging Android technology to change the lives of farmers in Kenya. His talk detailed Apollo Agriculture’s two core Android applications, built entirely in Kotlin, which are offline-first and utilize server-driven UI (SDUI) with Jetpack Compose to cater to the unique challenges of their user base. Harun is also active in Droidcon Kenya.

Apollo Agriculture’s mission is to empower small-scale farmers by bundling financing, high-quality farm inputs, agronomic advice, insurance, and market access. Their tech-based approach uses satellite data and machine learning for credit decisions and automated operations to keep costs low and the model scalable. The customer journey involves signup via agents or SMS/USSD, KYC (know-your-customer) data collection (including GPS farm outlines), automated credit decisions (often within minutes), input pickup from agro-dealers, digital advice via voice trainings, and loan repayment post-harvest.

Addressing Unique Challenges in the Kenyan Context

Harun highlighted several critical challenges that shaped their app development strategy:
  • Low-Memory Devices: Many agents and farmers use entry-level Android devices with limited RAM and storage. The apps need to be lightweight and performant.
  • Low/Intermittent Internet Bandwidth: Internet connectivity can be unreliable and expensive. An offline-first approach is crucial, allowing agents to perform tasks without constant internet access and sync data later.
  • Diverse User Needs and Rapid Iteration: The agricultural domain requires frequent updates to forms, workflows, and information provided to farmers and agents. A flexible UI system that can be updated without frequent app releases is essential.

These challenges led Apollo Agriculture to adopt a server-driven UI (SDUI) approach. Initially implemented with Anko (a deprecated Kotlin library for Android UI), they later rewrote this system entirely using Jetpack Compose.

Server-Driven UI with Jetpack Compose

The core of their SDUI system relies on JSON responses from the server that define the UI components, their properties, validations, and conditional logic.
Key aspects of their implementation include:
  • Task-Based Structure: The app presents tasks to agents (e.g., onboarding a farmer, collecting survey data). Each task is represented by a JSON schema from the server.
  • Dynamic Form Rendering: The JSON schema defines various UI elements like text inputs, number inputs, date pickers, location pickers (with map integration for capturing farm boundaries), image inputs (with compression), and more. These are dynamically rendered using Jetpack Compose.
  • Stateful Composable Components: Harun detailed their approach to building stateful UI components in Compose. Each question or input field manages its own state (value, errors, visibility) using remember and mutableStateOf. Validation logic (e.g., required fields, min/max length) is also defined in the JSON and applied dynamically.
  • Triggers and Conditionals: The JSON schema supports triggers (e.g., “on save”) and complex conditional logic using an internal tool called “Choice Expressions” and an implementation of JSON Schema. This allows UI elements or entire sections to be shown/hidden or enabled/disabled based on user input or other conditions, enabling dynamic and context-aware forms.
  • Offline First: Task schemas and user data are stored locally, allowing full offline functionality. Data is synced with the server when connectivity is available.
  • Testing: They extensively test their dynamic UI components and state logic in isolation, verifying state changes, validation behavior, and conditional rendering.

Harun shared examples of the JSON structure for defining UI elements, properties (like labels, hints, input types), validators, and conditional expressions. He walked through how a simple text input composable would manage its state, handle user input, and apply validation rules based on the server-provided schema.
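
A hypothetical fragment in the spirit of what Harun described (the field names are invented for illustration and are not Apollo Agriculture’s actual schema):

{
  "type": "text_input",
  "id": "farmer_name",
  "label": "Farmer name",
  "hint": "As it appears on the national ID",
  "validators": [
    { "type": "required" },
    { "type": "min_length", "value": 3 }
  ],
  "visible_if": { "question": "has_national_id", "equals": true }
}

The renderer maps each type to a stateful composable, applies the validators as the value changes, and re-evaluates visible_if whenever a referenced answer changes.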

Learnings and Future Considerations

The journey involved migrating from Anko to Jetpack Compose for their SDUI renderer, which Compose’s reactive DSL made more manageable and maintainable. They found Compose to be well-suited for building dynamic, stateful UIs.
Challenges encountered included handling keyboard interactions smoothly with scrolling content and managing the complexity of deeply nested conditional UIs.
When asked about open-sourcing their powerful form-rendering engine, Harun mentioned it’s a possibility they are considering, as the core logic is already modularized, and community input could be valuable. He also noted that while some pricing information is dynamic (e.g., based on farm size), they try to standardize core package prices to avoid confusion for farmers.

Harun Wangereka’s talk provided a compelling case study of how Kotlin and Jetpack Compose can be used to build sophisticated, resilient, and impactful Android applications that address real-world challenges in demanding environments.

[PHPForumParis2023] Open/Closed Principle: Extend, Don’t Extends! – Thomas Dutrion

Thomas Dutrion, CTO and a passionate advocate for clean code, presented an engaging session at Forum PHP 2023 on the Open/Closed Principle, a cornerstone of the SOLID principles. With a playful nod to avoiding PHP’s extends keyword, Thomas clarified how to design extensible systems without relying on inheritance. His talk, infused with practical examples and a call for team collaboration, offered PHP developers a clear framework for building flexible, maintainable codebases.

Demystifying the Open/Closed Principle

Thomas began by explaining the Open/Closed Principle, which states that software entities should be open for extension but closed for modification. He emphasized that this principle enables developers to add new functionality without altering existing code, reducing the risk of introducing bugs. Using relatable analogies, Thomas distinguished between “extending” a system’s behavior through design patterns and the pitfalls of using PHP’s extends for inheritance, which can lead to rigid, tightly coupled code.

Practical Techniques for Extension

Delving into implementation, Thomas showcased techniques like decorators and callbacks to achieve extensibility. He provided examples of middleware patterns, such as those defined in PSR-15, where requests are passed through a stack of middleware before reaching a handler, allowing behavior to be extended dynamically. Thomas cautioned against overly complex callback chains, advocating for clear, team-aligned designs. His demonstrations highlighted how these patterns maintain code stability while allowing for seamless enhancements.

Team Collaboration and Clarity

Concluding his talk, Thomas stressed the importance of team agreement when applying the Open/Closed Principle. He noted that patterns like decorators often rely on dependency injection, which can obscure implementation details unless well-documented. By advocating for clear communication and tools like event dispatchers, Thomas inspired developers to work collaboratively, ensuring extensible designs are both effective and understandable within their teams.

[KotlinConf2023] KotlinConf’23 Closing Panel: Community Questions and Future Insights

KotlinConf’23 concluded with its traditional Closing Panel, an open forum where attendees could pose their burning questions to a diverse group of experts from the Kotlin community, including key figures from JetBrains and Google. The panel, moderated by Hadi Hariri, featured prominent names such as Roman Elizarov, Egor Tolstoy, Maxim Shafirov (CEO of JetBrains), Svetlana Isakova, Pamela Hill, Sebastian Aigner (all JetBrains), Grace Kloba, Kevin Galligan, David Blanc, Wenbo, Jeffrey van Gogh (all Google), Jake Wharton (Cash App), and Zac Sweers (Slack), among others.

The session was lively, covering a wide range of topics from language features and tooling to ecosystem development and the future of Kotlin across different platforms.

Kotlin’s Ambitions and Language Evolution

One of the initial questions addressed Kotlin’s overarching goal, humorously framed as whether Kotlin aims to “get rid of other programming languages”. Roman Elizarov quipped they only want to get rid of “bad ones,” while Egor Tolstoy clarified that Kotlin’s focus is primarily on application development (services, desktop, web, mobile) rather than systems programming.

Regarding Kotlin 2.0 and the possibility of removing features, the panel indicated a strong preference for maintaining backward compatibility; any removal would target features that already have a clearly superior replacement (the way newer K2-era compiler-plugin mechanisms are expected to supersede older ones such as KAPT was floated as a hypothetical example). The discussion also touched on the desire for a unified, official Kotlin style guide and formatter to reduce community fragmentation around tooling, though Zac Sweers noted that even with an official tool, community alternatives would likely persist.

Multiplatform, Compose, and Ecosystem

A significant portion of the Q&A revolved around Kotlin Multiplatform (KMP) and Compose Multiplatform.
  • Dart Interoperability: Questions arose about interoperability between Kotlin/Native (especially for Compose on iOS, which uses Skia) and Dart/Flutter. While direct, deep interoperability wasn’t presented as a primary focus, the general sentiment was that both ecosystems are strong, and developers choose based on their needs. The panel emphasized that Compose for iOS aims for a native feel and deep integration with iOS platform features.
  • Compose UI for iOS and Material Design: A recurring concern was whether Compose UI on iOS would feel “too Material Design” and not native enough for iOS users. Panelists from JetBrains and Google acknowledged this, stressing ongoing efforts to ensure Compose components on iOS adhere to Cupertino (iOS native) design principles and feel natural on the platform. Jake Wharton added that making Kotlin APIs feel idiomatic to iOS developers is crucial for adoption.
  • Future of KMP: The panel expressed strong optimism for KMP’s future, highlighting its stability and growing library support. They see KMP becoming the default way to build applications when targeting multiple platforms with Kotlin. The focus is on making KMP robust and ensuring a great developer experience across all supported targets.

Performance, Tooling, and Emerging Areas

  • Build Times: Concerns about Kotlin/Native build times, especially for iOS, were acknowledged. The team is continuously working on improving compiler performance and reducing build times, with K2 expected to bring further optimizations.
  • Project Loom and Coroutines: Roman Elizarov reiterated points from his earlier talk, stating that Loom is excellent for migrating existing blocking Java code, while Kotlin Coroutines offer finer-grained control and structured concurrency, especially beneficial for UI and complex asynchronous workflows. They are not mutually exclusive and can coexist.
  • Kotlin in Gaming: While not a primary focus historically, the panel acknowledged growing interest and some community libraries for game development with Kotlin. The potential for KMP in this area was also noted.
  • Documentation: The importance of clear, comprehensive, and up-to-date documentation was a recurring theme, with the panel acknowledging it as an ongoing effort.
  • AI and Kotlin: When asked about AI taking developers’ jobs, Zac Sweers offered a pragmatic take: AI won’t take your job, but someone who knows how to use AI effectively might. The panel highlighted that Kotlin is well-suited for building AI tools and applications.

The panel concluded with the exciting reveal of Kotlin’s reimagined mascot, Kodee (spelled K-O-D-E-E), a cute, modern character designed to represent the language and its community. Pins of Kodee were made available to attendees, adding a fun, tangible takeaway to the conference’s close.

[DevoxxBE2023] How Sand and Java Create the World’s Most Powerful Chips

Johan Janssen, an architect at ASML, captivated the DevoxxBE2023 audience with a deep dive into the intricate process of chip manufacturing and the role of Java in optimizing it. Johan, a seasoned speaker and JavaOne Rock Star, explained how ASML’s advanced lithography machines, powered by Java-based software, enable the creation of cutting-edge computer chips used in devices worldwide.

From Sand to Silicon Wafers

Johan began by demystifying chip production, starting with silica sand, an abundant resource transformed into silicon ingots and sliced into wafers. These wafers, approximately 30 cm in diameter, serve as the foundation for chips, hosting up to 600 chips per wafer or thousands for smaller sensors. He passed around a wafer adorned with Java’s mascot, Duke, illustrating the physical substrate of modern electronics.

The process involves printing multiple layers—up to 200—onto wafers using extreme ultraviolet (EUV) lithography machines. These machines, requiring four Boeing 747s for transport, achieve precision at the nanometer scale, with transistors as small as three nanometers. Johan likened this to driving a car 300 km and retracing the path with only 2 mm deviation, highlighting the extraordinary accuracy required.

The Role of EUV Lithography

Johan detailed the EUV lithography process, where tin droplets are hit by a 40-kilowatt laser to generate plasma at sun-like temperatures, producing EUV light. This light, directed by ultra-flat mirrors, patterns wafers through reticles costing €250,000 each. The process demands cleanroom environments, as even a single dust particle can ruin a chip, and involves continuous calibration to maintain precision across thousands of parameters.

ASML’s machines, some over 30 years old, remain in use for producing sensors and less advanced chips, demonstrating their longevity. Johan also previewed future advancements, such as high numerical aperture (NA) machines, which will enable even smaller transistors, further enhancing chip performance and energy efficiency.

Java-Powered Analytics Platform

At the heart of Johan’s talk was ASML’s Java-based analytics platform, which processes 31 terabytes of data weekly to optimize chip production. Built on Apache Spark, the platform distributes computations across worker nodes, supporting plugins for data ingestion, UI customization, and processing. These plugins allow departments to integrate diverse data types, from images to raw measurements, and support languages like Julia and C alongside Java.

The platform, running on-premise to protect sensitive data, consolidates previously disparate applications, improving efficiency and user experience. Johan highlighted a machine learning use case where the platform increased defect detection from 70% to 92% without slowing production, showcasing Java’s role in handling complex computations.

Challenges and Solutions in Chip Manufacturing

Johan discussed challenges like layer misalignment, which can cause short circuits or defective chips. The platform addresses these by analyzing wafer plots to identify correctable errors, such as adjusting subsequent layers to compensate for misalignments. Non-correctable errors may result in downgrading chips (e.g., from 16 GB to 8 GB RAM), ensuring minimal waste.

He emphasized a pragmatic approach to tool selection, starting with REST endpoints and gradually adopting Kafka for streaming data as needs evolved. Johan also noted ASML’s collaboration with tool maintainers to enhance compatibility, such as improving Spark’s progress tracking for customer feedback.

Future of Chip Manufacturing

Looking ahead, Johan highlighted the industry’s push to diversify chip production beyond Taiwan, driven by geopolitical and economic factors. However, building new factories, or “fabs,” costing $10–20 billion, faces challenges like equipment backlogs and the need for highly skilled operators. ASML’s customer support teams, working alongside clients like Intel, underscore the specialized knowledge required.

Johan concluded by stressing the importance of a forward-looking mindset, with ASML’s roadmap prioritizing innovation over rigid methodologies. This approach, combined with Java’s robustness, ensures the platform’s scalability and adaptability in a rapidly evolving industry.

[PHPForumParis2023] Webperf: Boost Your PHP Apps with HTTP 103 Early Hints – Kévin Dunglas

Kévin Dunglas, co-founder of Les-Tilleuls.coop and creator of API Platform, delivered a dynamic session at Forum PHP 2023 on leveraging HTTP 103 Early Hints to enhance web performance in PHP applications. Drawing from his extensive experience in the PHP ecosystem and inspiration from technologies like Go and the Caddy web server, Kévin explored how this HTTP status code optimizes page load times. His talk provided actionable insights for developers seeking to improve user experiences through cutting-edge web protocols.

Understanding HTTP 103 Early Hints

Kévin introduced HTTP 103 Early Hints, a status code that allows servers to preemptively inform browsers about critical resources, such as CSS or JavaScript files, before the main response is fully processed. Unlike server push, which sends resources directly, Early Hints enables browsers to check their cache, reducing unnecessary data transfers. Kévin explained how this mechanism, supported by modern browsers, enhances performance by initiating resource fetching earlier, particularly for PHP applications built with frameworks like Symfony.
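
On the wire, the exchange defined by RFC 8297 looks like this (the resource paths are illustrative): the server emits an interim 103 response carrying Link preload headers, then the final response once the PHP application has finished rendering:

HTTP/1.1 103 Early Hints
Link: </style.css>; rel=preload; as=style
Link: </app.js>; rel=preload; as=script

HTTP/1.1 200 OK
Content-Type: text/html
...

The browser can begin fetching style.css and app.js, or serve them from cache, while the server is still executing the PHP controller.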

Practical Implementation in PHP

Delving into implementation, Kévin demonstrated how to integrate HTTP 103 Early Hints into PHP applications, using examples from API Platform. He highlighted the role of reverse proxies like Vulcain, developed in collaboration with Google, to enable Early Hints for web APIs. By showing how to configure servers to send these hints, Kévin illustrated their impact on reducing latency, especially for front-end and API-driven applications. His practical examples made the concept accessible, encouraging developers to adopt this technique.

Future Potential and Collaboration

Kévin concluded by discussing ongoing efforts to expand Early Hints’ applicability, particularly for APIs, through contributions from developers like Robin. He emphasized the collaborative nature of open-source projects, urging the PHP community to contribute to tools like Vulcain. By highlighting the performance benefits and ease of integration, Kévin inspired developers at Les-Tilleuls.coop and beyond to explore this emerging standard, enhancing the speed and efficiency of their applications.
