
Gradle: A Love-Hate Journey at Margot Bank

At Devoxx France 2019, David Wursteisen and Jérémy Martinez, developers at Margot Bank, delivered a candid talk on their experience with Gradle while building a core banking system from scratch. Their 45-minute session, “Gradle, je t’aime: moi non plus,” explored why they chose Gradle over alternatives, its developer-friendly features, script maintenance strategies, and persistent challenges like memory consumption. This post dives into their insights, offering a comprehensive guide for developers navigating build tools in complex projects.

Choosing Gradle for a Modern Banking System

Margot Bank, a startup redefining corporate banking, embarked on an ambitious project in 2017 to rebuild its IT infrastructure, including a core banking system (CBS) with Kotlin and Java modules. The CBS comprised applications for payments, data management, and a central “core” module, all orchestrated with microservices. Selecting a build tool was critical, given the need for speed, flexibility, and scalability. The team evaluated Maven, SBT, Bazel, and Gradle. Maven, widely used in Java ecosystems, lacked frequent updates, risking obsolescence. SBT’s Scala-based DSL added complexity, unsuitable for a Kotlin-focused stack. Bazel, while powerful for monorepos, didn’t support generic languages well. Gradle emerged as the winner, thanks to its task-based architecture, where tasks like compile, jar, and assemble form a dependency graph, and only the tasks whose inputs have changed are re-executed. This incremental build system saved time, crucial for Margot’s rapid iterations. Frequent releases (e.g., Gradle 5.1.1 in 2019) and a dynamic Groovy DSL further cemented its appeal, aligning with Devoxx’s emphasis on modern build tools.
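
To make the task-graph idea concrete, here is a minimal, hypothetical Gradle Kotlin DSL snippet (not Margot Bank’s actual build): a custom archive task declares where it sits in the graph and what it reads and writes, so Gradle can skip it when nothing has changed.

    // build.gradle.kts -- hypothetical task; names and paths are illustrative
    tasks.register<Zip>("packageReports") {
        dependsOn(tasks.named("test"))                                // position in the task graph
        from(layout.buildDirectory.dir("reports"))                    // declared inputs
        destinationDirectory.set(layout.buildDirectory.dir("dist"))   // declared output
        archiveFileName.set("reports.zip")
    }

Because inputs and outputs are declared, a second run of the task is reported as UP-TO-DATE unless the report files actually change.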

Streamlining Development with Gradle’s Features

Gradle’s developer experience shone at Margot Bank, particularly with IntelliJ IDEA integration. The IDE auto-detected source sets (e.g., main, test, integrationTest) and tasks, enabling seamless task execution. Eclipse support, though less polished, handled basic imports. The Gradle Wrapper, a small script and JAR committed to the repository, automated setup by downloading the specified Gradle version (e.g., 5.1.1) from a custom URL, secured with checksums. This ensured consistency across developer machines, a boon for onboarding. Dependency management leveraged configurations like api and implementation. For example, marking a third-party client like AmazingMail as implementation in a web app module hid its classes from transitive dependencies, reducing coupling. Composite builds, introduced in recent Gradle versions, allowed local projects (e.g., a mailer module) to be linked without publishing to Maven Local, streamlining multi-project workflows. A notable pain point was disk usage: open-source projects’ varying Gradle versions accumulated 4GB on developers’ machines, as IntelliJ redundantly downloaded sources alongside binaries. Addressing an audience question, the team emphasized selective caching (e.g., wrapper binaries) to mitigate overhead, highlighting Gradle’s balance of power and complexity.
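
As a sketch of those two features (the coordinates, versions, and module paths below are invented for the example; only AmazingMail and the mailer module are named in the talk), the api/implementation split lives in the module’s build script, while a composite build is declared in settings.gradle.kts:

    // web-app/build.gradle.kts
    plugins {
        `java-library`   // provides the api/implementation separation
    }

    dependencies {
        api("com.margotbank:core-model:1.0.0")           // exposed to consumers of this module
        implementation("com.amazingmail:client:2.3.0")   // hidden from consumers' compile classpath
    }

    // settings.gradle.kts -- composite build: use the local mailer project without publishing it
    includeBuild("../mailer")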

Enhancing Builds with Plugins and Kotlin DSL

For script maintainers, standardizing configurations across Margot’s projects was paramount. The team developed an internal Gradle plugin to centralize settings for linting (e.g., Ktlint), Nexus repositories, and releases. Applied via apply plugin: 'com.margotbank.standard', it ensured uniformity, reducing configuration drift. For project-specific logic, buildSrc proved revolutionary. This module housed Kotlin code for tasks like version management, keeping build.gradle files declarative. For instance, a Versions.kt object centralized dependency versions (e.g., junit:5.3.1), with unused ones grayed out in IntelliJ for cleanup. Migrating from Groovy to Kotlin DSL brought static typing benefits: autocompletion, refactoring, and navigation. A sourceSets.create("integrationTest") call, though verbose, clarified intent compared to Groovy’s dynamic integrationTest {}. Migration was iterative, file by file, avoiding disruptions. Challenges included verbose syntax for plugins like JaCoCo, requiring explicit casts. A buildSrc extension for commit message parsing (e.g., extracting Git SHAs) exemplified declarative simplicity. This approach, inspired by Devoxx’s focus on maintainable scripts, empowered developers to contribute to shared tooling, fostering collaboration across teams.
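
A minimal sketch of the buildSrc approach described above, with hypothetical file layout and values (only junit:5.3.1 comes from the talk):

    // buildSrc/src/main/kotlin/Versions.kt
    object Versions {
        const val junit = "5.3.1"
    }

    // build.gradle.kts -- the build script stays declarative and references the constants
    dependencies {
        testImplementation("org.junit.jupiter:junit-jupiter-api:${Versions.junit}")
    }

    // Kotlin DSL is wordier than Groovy's integrationTest {}, but the intent is explicit
    sourceSets.create("integrationTest") {
        compileClasspath += sourceSets["main"].output
        runtimeClasspath += sourceSets["main"].output
    }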

Gradle’s performance, driven by daemons that keep processes in memory, was a double-edged sword. Daemons reduced startup time, but multiple instances (e.g., 5.1.1 and 5.0.10) occasionally ran concurrently, consuming excessive RAM. On CI servers, Gradle crashed under heavy loads, prompting tweaks: disabling daemons, adjusting Docker memory, and upgrading to Gradle 4.4.5 for better memory optimization. Diagnostics remained elusive, as crashes stemmed from either Gradle or the Kotlin compiler. Configuration tweaks like enabling caching (org.gradle.caching=true) and parallel task execution (org.gradle.parallel=true) improved build times, but required careful tuning. The team allocated maximum heap space (-Xmx4g) upfront to handle large builds, reflecting Margot’s resource-intensive CI pipeline. An audience question on caching underscored selective imports (e.g., excluding redundant sources) to optimize costs. Looking ahead, Margot planned to leverage build caching for granular task reuse and explore tools like Build Queue for cleaner pipelines. Despite frustrations, Gradle’s flexibility and evolving features—showcased at Devoxx—made it indispensable, though memory management demanded ongoing vigilance.
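
The settings mentioned above typically live in gradle.properties; here is an illustrative sketch (only the caching and parallel flags, the -Xmx4g heap, and the option of disabling the daemon come from the talk):

    # gradle.properties -- illustrative values
    # reuse task outputs across builds
    org.gradle.caching=true
    # run independent tasks in parallel where the task graph allows it
    org.gradle.parallel=true
    # give the build JVM its maximum heap upfront
    org.gradle.jvmargs=-Xmx4g
    # on memory-constrained CI agents the daemon can be disabled entirely
    org.gradle.daemon=false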

Links:

Hashtags: #Gradle #KotlinDSL #BuildTools #DavidWursteisen #JeremyMartinez #DevoxxFrance2019

[VivaTech 2018] How VCs Are Growing Tomorrow’s Euro-corns

Philippe Botteri and Bernard Liautaud, moderated by Emmanuelle Duten, Editor-in-Chief at Les Echos/Capital Finance, explored how venture capitalists foster European unicorns at VivaTech 2018. Recorded in Paris and available on YouTube, this panel featuring partners from Accel and Balderton Capital discusses creating ecosystems in which startups can thrive. This post delves into their strategies. Connect with Philippe and Bernard on LinkedIn, and visit VivaTech for more.

Building a Robust Ecosystem

Philippe highlights Europe’s progress in producing high-value exits, citing Spotify’s $30 billion exit in 2018, surpassing U.S. (Dropbox, $12 billion) and Asian (Flipkart, $20 billion) counterparts. This shift reflects a maturing ecosystem, with Europe’s 25 unicorns trailing the U.S.’s 110 and China’s 60. Bernard emphasizes the ecosystem’s growth since 2006, driven by experienced entrepreneurs, global VCs, and ambitious talent. Top French universities now see 20-30% of graduates joining startups, not banks, signaling a cultural shift toward innovation.

The availability of capital, especially at early stages, supports this growth. Bernard notes that European funds have tripled in size, fostering competition and higher valuations. However, late-stage funding lags, with European champions raising $1.5 billion on average compared to $7.5 billion for U.S. firms. Philippe sees this as a maturity gap, not a failure, with Europe catching up rapidly through global ambitions and talent influx.

Prioritizing Sustainable Growth

Bernard argues that unicorn status, often driven by a single investor’s enthusiasm, is a misleading metric. He advocates focusing on revenue and long-term impact, aiming to build companies with $1-10 billion in revenue over 10-15 years. A billion-dollar valuation doesn’t guarantee sustainability; some firms reach $1 billion with just $50 million in revenue, only to be acquired. Spotify, generating over $1 billion quarterly, exemplifies the ideal: a scalable, high-revenue business.

Philippe counters that valuations reflect potential, not current worth. A $1 billion price tag signals a VC’s belief in a $5-10 billion future outcome, balanced against the risk of failure. Rapid technology adoption drives larger outcomes, justifying higher valuations. Both agree that sustainable growth requires aligning capital, talent, and ambition to create enduring European giants, not fleeting unicorns.

The influx of late-stage capital, like SoftBank’s $100 billion Vision Fund, creates winner-takes-all dynamics in certain sectors. Bernard notes this gives funded companies an edge but doesn’t suit all industries, where multiple players can coexist. Philippe emphasizes liquidity for employees and founders, critical for retaining talent. Late-stage rounds and secondary sales provide this, delaying IPOs but ensuring stakeholders benefit.

Emmanuelle raises audience concerns about overvaluation and bubbles. Both panelists dismiss bubble fears, describing the market as vibrant, not overheated. Philippe notes that competition on hot deals may inflate valuations, but real metrics—consumer and B2B growth—underpin most successes. Bernard predicts cyclical downturns but sees no systemic risk, with Europe’s ecosystem poised to produce innovative, global leaders.

Hashtags: #VivaTech #PhilippeBotteri #BernardLiautaud #Accel #BaldertonCapital #Unicorns #VentureCapital #Spotify #EuropeanStartups


[DevoxxFR 2018] Java in Docker: Best Practices for Production

The practice of running Java applications within Docker containers has become widely adopted in modern software deployment, yet it is not devoid of potential challenges, particularly when transitioning to production environments. Charles Sabourdin, a freelance architect, and Jean-Christophe Sirot, an engineer at Docker, collaborated at DevoxxFR2018 to share their valuable experiences and disseminate best practices for optimizing Java applications inside Docker containers. Their insightful talk directly addressed common and often frustrating issues, such as containers crashing unexpectedly, applications consuming excessive RAM leading to node instability, and encountering CPU throttling. They offered practical solutions and configurations aimed at ensuring smoother and more reliable production deployments for Java workloads.

The presenters initiated their session with a touch of humor, explaining why operations teams might exhibit a degree of apprehension when tasked with deploying a containerized Java application into a production setting. It’s a common scenario: containers that perform flawlessly on a developer’s local machine can begin to behave erratically or fail outright in production. This discrepancy often stems from a fundamental misunderstanding of how the Java Virtual Machine (JVM) interacts with the resource limits imposed by the container’s control groups (cgroups). Several key problems frequently surface in this context. Perhaps the most common is memory mismanagement; the JVM, particularly older versions, might not be inherently aware of the memory limits defined for its container by the cgroup. This lack of awareness can lead the JVM to attempt to allocate and use more memory than has been allocated to the container by the orchestrator or runtime. Such overconsumption inevitably results in the container being abruptly terminated by the operating system’s Out-Of-Memory (OOM) killer, a situation that can be difficult to diagnose without understanding this interaction.

Similarly, CPU resource allocation can present challenges. The JVM might not accurately perceive the CPU resources available to it within the container, such as CPU shares or quotas defined by cgroups. This can lead to suboptimal decisions in sizing internal thread pools (like the common ForkJoinPool or garbage collection threads) or can cause the application to experience unexpected CPU throttling, impacting performance. Another frequent issue is Docker image bloat. Overly large Docker images not only increase deployment times across the infrastructure but also expand the potential attack surface by including unnecessary libraries or tools, thereby posing security vulnerabilities. The talk aimed to equip developers and operations personnel with the knowledge to anticipate and mitigate these common pitfalls. During the presentation, a demonstration application, humorously named “ressources-munger,” was used to simulate these problems, clearly showing how an application could consume excessive memory leading to an OOM kill by Docker, or how it might trigger excessive swapping if not configured correctly, severely degrading performance.

JVM Memory Management and CPU Considerations within Containers

A significant portion of the discussion was dedicated to the intricacies of JVM memory management within the containerized environment. Charles and Jean-Christophe elaborated that older JVM versions, specifically those prior to Java 8 update 131 and Java 9, were not inherently “cgroup-aware”. This lack of awareness meant that the JVM’s default heap sizing heuristics—for example, typically allocating up to one-quarter of the physical host’s memory for the heap—would be based on the total resources of the host machine rather than the specific limits imposed on the container by its cgroup. This behavior is a primary contributor to unexpected OOM kills when the container’s actual memory limit is much lower than what the JVM assumes based on the host.

Several best practices were shared to address these memory-related issues effectively. The foremost recommendation is to use cgroup-aware JVM versions. Modern Java releases, particularly Java 8 update 191 and later, and Java 10 and newer, incorporate significantly improved cgroup awareness. For older Java 8 updates (specifically 8u131 to 8u190), experimental flags such as -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap can be employed to enable the JVM to better respect container memory limits. In Java 10 and subsequent versions, this behavior became standard and often requires no special flags. However, even with cgroup-aware JVMs, explicitly setting the heap size using parameters like -Xms for the initial heap size and -Xmx for the maximum heap size is frequently a recommended practice for predictability and control. Newer JVMs also offer options like -XX:MaxRAMPercentage, allowing for more dynamic heap sizing relative to the container’s allocated memory. It’s crucial to understand that the JVM’s total memory footprint extends beyond just the heap; it also requires memory for metaspace (which replaced PermGen in Java 8+), thread stacks, native libraries, and direct memory buffers. Therefore, when allocating memory to a container, it is essential to account for this total footprint, not merely the -Xmx value. A common guideline suggests that the Java heap might constitute around 50-75% of the total memory allocated to the container, with the remainder reserved for these other essential JVM components and any other processes running within the container. Tuning metaspace parameters, such as -XX:MetaspaceSize and -XX:MaxMetaspaceSize, can also prevent excessive native memory consumption, particularly in applications that dynamically load many classes.
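
Putting those flags together, a hedged example of how the JVM might be launched inside a container, depending on the Java version (the heap, metaspace, and percentage values are illustrative, not recommendations):

    # Java 8u131-8u190: opt in to experimental cgroup awareness
    java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -jar app.jar

    # Java 8u191+ / Java 10+: cgroup-aware by default; size the heap relative to the container limit
    java -XX:MaxRAMPercentage=75.0 -XX:MaxMetaspaceSize=256m -jar app.jar

    # Or pin the heap explicitly, leaving headroom for metaspace, thread stacks, and native memory
    java -Xms512m -Xmx1g -jar app.jar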

Regarding CPU resources, the presenters noted that the JVM’s perception of available processors is also influenced by its cgroup awareness. In environments where CPU resources are constrained, using flags like -XX:ActiveProcessorCount can be beneficial to explicitly inform the JVM about the number of CPUs it should consider for sizing its internal thread pools, such as the common ForkJoinPool or the threads used for garbage collection. Optimizing the Docker image itself is another critical aspect of preparing Java applications for production. This involves choosing a minimal base image, such as alpine-jre, distroless, or official “slim” JRE images, instead of a full operating system distribution, to reduce the image size and potential attack surface. Utilizing multi-stage builds in the Dockerfile is a highly recommended technique; this allows developers to use a larger image containing build tools like Maven or Gradle and a full JDK in an initial stage, and then copy only the necessary application artifacts (like the JAR file) and a minimal JRE into a final, much smaller runtime image. Furthermore, being mindful of Docker image layering by combining related commands in the Dockerfile where possible can help reduce the number of layers and optimize image size. For applications on Java 9 and later, tools like jlink can be used to create custom, minimal JVM runtimes that include only the Java modules specifically required by the application, further reducing the image footprint. The session strongly emphasized that a collaborative approach between development and operations teams, combined with a thorough understanding of both JVM internals and Docker containerization principles, is paramount for successfully and reliably running Java applications in production environments.
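
A minimal multi-stage Dockerfile in the spirit of those recommendations; the base image tags, paths, and artifact name are assumptions for this sketch, not the presenters’ exact setup:

    # Stage 1: build with a full JDK and Maven
    FROM maven:3.5-jdk-8 AS build
    WORKDIR /src
    COPY pom.xml .
    COPY src ./src
    RUN mvn -q package

    # Stage 2: run on a slim, JRE-only image
    FROM openjdk:8-jre-slim
    WORKDIR /app
    COPY --from=build /src/target/app.jar app.jar
    # -XX:MaxRAMPercentage requires 8u191+ or Java 10+
    ENTRYPOINT ["java", "-XX:MaxRAMPercentage=75.0", "-jar", "app.jar"]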

Links:

Hashtags: #Java #Docker #JVM #Containerization #DevOps #Performance #MemoryManagement #DevoxxFR2018 #CharlesSabourdin #JeanChristopheSirot #BestPractices #ProductionReady #CloudNative

[DevoxxFR 2018] Watch Out! Don’t Plug in That USB, You Might Get Seriously Hacked!

The seemingly innocuous USB drive, a ubiquitous tool for data transfer and device charging, can harbor hidden dangers. At Devoxx France 2018, Aurélien Loyer and Nathan Damie, both from Zenika, delivered a cautionary and eye-opening presentation titled “Attention ! Ne mets pas cette clé tu risques de te faire hacker très fort !” (Watch Out! Don’t Plug in That USB, You Might Get Seriously Hacked!). They demonstrated how easily a modified USB device, commonly known as a “BadUSB,” can be used to execute arbitrary code on a victim’s computer, often by emulating a keyboard.

The speakers explained that these malicious devices often don’t rely on exploiting software vulnerabilities. Instead, they leverage a fundamental trust computers place in Human Interface Devices (HIDs), such as keyboards and mice. A BadUSB device, disguised as a regular flash drive or even embedded within a cable or other peripheral, can announce itself to the operating system as a keyboard and then rapidly type and execute commands – all without the user’s direct interaction beyond plugging it in.

What is BadUSB and How Does It Work?

Aurélien Loyer and Nathan Damie explained that many BadUSB devices are not standard USB flash drives but rather contain small microcontrollers like the Adafruit Trinket or Pro Micro. These microcontrollers are programmed (often using the Arduino IDE) to act as a Human Interface Device (HID), specifically a keyboard. When plugged into a computer, the operating system recognizes it as a keyboard and accepts input from it. The pre-programmed script on the microcontroller can then “type” a sequence of commands at high speed. This could involve opening a command prompt or terminal, downloading and executing malware from the internet, exfiltrating data, or performing other malicious actions.

The speakers demonstrated this live by plugging a device into a computer, which then automatically opened a text editor and typed a message, followed by executing commands to open a web browser and navigate to a specific URL. They showed the simplicity of the Arduino code required: essentially initializing the keyboard library and then sending keystroke commands (e.g., Keyboard.print(), Keyboard.press(), Keyboard.release()). More sophisticated attacks could involve a delay before execution, or triggering the payload based on certain conditions, making them harder to detect. Nathan even demonstrated a modified gaming controller that could harbor such a payload, executing it unexpectedly. The core danger lies in the fact that computers are generally designed to trust keyboard inputs without question.

Potential Dangers and Countermeasures

The implications of BadUSB attacks are significant. Aurélien and Nathan highlighted how easily these devices can be disguised. They showed examples of microcontrollers small enough to fit inside the plastic casing of a standard USB drive, or even integrated into USB cables or other peripherals like a mouse with a hidden logger. This makes visual inspection unreliable. The attack vector often relies on social engineering: an attacker might drop “lost” USB drives in a parking lot or other public area, hoping a curious individual will pick one up and plug it into their computer. Even seemingly harmless devices like e-cigarettes could potentially be weaponized if they contain a malicious microcontroller and are plugged into a USB port for charging.

As for countermeasures, the speakers emphasized caution. The most straightforward advice is to never plug in USB devices from unknown or untrusted sources. For situations where using an untrusted USB device for charging is unavoidable (though not recommended for data transfer), they mentioned “USB condoms” – small hardware dongles that physically block the data pins in a USB connection, allowing only power to pass through. However, this would render a data-carrying device like a flash drive unusable for its primary purpose. The session served as a stark reminder that physical security and user awareness are crucial components of overall cybersecurity, as even the most common peripherals can pose a threat.

Links:

Hashtags: #BadUSB #CyberSecurity #HardwareHacking #SocialEngineering #USB #Microcontroller #Arduino #Zenika #DevoxxFR2018 #PhysicalSecurity

[DevoxxFR 2018] Are you “merge” or “rebase” oriented?

Git, the distributed version control system, has become an indispensable tool in the modern developer’s arsenal, revolutionizing how teams collaborate on code. Its flexibility and power, however, come with a degree of complexity that can initially intimidate newcomers, particularly when it comes to integrating changes from different branches. At Devoxx France 2018, Jonathan Detoeuf, a freelance developer with a passion for Software Craftsmanship and Agile methodologies, tackled one of Git’s most debated topics in his presentation: “T’es plutôt merge ou rebase ?” (Are you more of a merge or rebase person?). He aimed to demystify these two fundamental Git commands, explaining their respective use cases, how to avoid common pitfalls, and ultimately, how to maintain a clean, understandable project history.

Jonathan began by underscoring the importance of the Git log (history) as a “Tower of Babel” – a repository of the team’s collective knowledge, containing not just the current source code but the entire evolution of the project. Well-crafted commit messages and a clear history are crucial for understanding past decisions, tracking down bugs, and onboarding new team members. With this premise, the choice between merging and rebasing becomes more than just a technical preference; it’s about how clearly and effectively a team communicates its development story through its Git history. Jonathan’s talk provided practical guidance, moving beyond the often-unhelpful official Git documentation that offers freedom but little explicit recommendation on when to use which strategy.

Understanding the Basics: Merge vs. Rebase and History

Before diving into specific recommendations, Jonathan Detoeuf revisited the core mechanics of merging and rebasing, emphasizing their impact on project history. A standard merge (often a “merge commit” when branches have diverged) integrates changes from one branch into another by creating a new commit that has two parent commits. This explicitly shows where a feature branch was merged back into a main line, preserving the historical context of parallel development. A fast-forward merge, on the other hand, occurs if the target branch hasn’t diverged; Git simply moves the branch pointer forward. Rebasing, in contrast, re-applies commits from one branch onto the tip of another, creating a linear history as if the changes were made sequentially. This can make the history look cleaner but rewrites it, potentially losing the context of when changes were originally made in relation to other branches.

Jonathan stressed the value of a well-maintained Git history. It’s not just a log of changes but a narrative of the project’s development. Clear commit messages are vital as they convey the intent behind changes. A good history allows for “archaeology” – understanding why a particular piece of code exists or how a bug was introduced, even years later when the original developers are no longer around. Therefore, the decision to merge or rebase should be guided by the desire to create a history that is both accurate and easy to understand. He cautioned that many developers fear losing code with Git, especially during conflict resolution, making it important to master these integration techniques.

The Case for Merging: Durable Branches and Significant Events

Jonathan Detoeuf advocated for using merge commits (specifically, non-fast-forward merges) primarily for integrating “durable” branches or marking significant events in the project’s lifecycle. Durable branches are long-lived branches like main, develop, or release branches. When merging one durable branch into another (e.g., merging a release branch into main), a merge commit clearly signifies this integration point. Similarly, a merge commit is appropriate for marking key milestones such as the completion of a release, the end of a sprint, or a deployment to production. These merge commits act as explicit markers in the history, making it easy to see when major features or versions were incorporated.

He contrasted this with merging minor feature branches where a simple fast-forward merge might be acceptable if the history remains clear, or if a rebase is preferred for a cleaner linear history before the final integration. The key is that the merge commit should add value by highlighting a significant integration point or preserving the context of a substantial piece of work being completed. If it’s just integrating a pull request for a small, self-contained feature that has been reviewed, other strategies like rebase followed by a fast-forward merge, or even “squash and merge,” might be preferable to avoid cluttering the main line history with trivial merge bubbles. Jonathan’s advice leans towards using merge commits judiciously to preserve meaningful historical context, especially for branches that represent a significant body of work or a persistent line of development.

The Case for Rebasing: Feature Branches and Keeping it Clean

Rebasing, according to Jonathan Detoeuf, finds its primary utility when working with local or short-lived feature branches before they are shared or merged into a more permanent branch. When a developer is working on a feature and the main branch (e.g., develop or main) has advanced, rebasing the feature branch onto the latest state of the main branch can help incorporate upstream changes cleanly. This process rewrites the feature branch’s history by applying its commits one by one on top of the new base, resulting in a linear sequence of changes. This makes the feature branch appear as if it was developed sequentially after the latest changes on the main branch, which can simplify the final merge (often allowing a fast-forward merge) and lead to a cleaner, easier-to-read history on the main line.

Jonathan also highlighted git pull --rebase as a way to update a local branch with remote changes, avoiding unnecessary merge commits that can clutter the local history when simply trying to synchronize with colleagues’ work on the same branch. Furthermore, interactive rebase (git rebase -i) is a powerful tool for “cleaning up” the history of a feature branch before creating a pull request or merging. It allows developers to squash multiple work-in-progress commits into more meaningful, atomic commits, edit commit messages, reorder commits, or even remove unwanted ones. This careful curation of a feature branch’s history before integration ensures that the main project history remains coherent and valuable. However, a crucial rule for rebasing is to never rebase a branch that has already been pushed and is being used by others, as rewriting shared history can cause significant problems for collaborators. The decision-making flowchart Jonathan presented often guided towards rebasing for feature branches to integrate changes from a durable branch, or to clean up history before a fast-forward merge.

Best Practices and Conflict Avoidance

Beyond the when-to-merge-vs-rebase dilemma, Jonathan Detoeuf shared several best practices for smoother collaboration and conflict avoidance. Regularly committing small, atomic changes makes it easier to manage history and resolve conflicts if they arise. Communicating with team members about who is working on what can also prevent overlapping efforts on the same files. Structuring the application well, with clear separation of concerns into different files or modules, naturally reduces the likelihood of merge conflicts.

When conflicts do occur, understanding the changes using git diff and carefully resolving them is key. Jonathan also touched upon various Git workflows, such as feature branching, Gitflow, or trunk-based development, noting that the choice of merge/rebase strategy often aligns with the chosen workflow. For instance, the “feature merge” (or GitHub flow) often involves creating a feature branch, working on it, and then merging it back (often via a pull request, which might use a squash merge or a rebase-and-merge strategy depending on team conventions). He ultimately provided a decision tree to help developers choose: for durable branches, merging is generally preferred to integrate other durable branches or significant features. For feature branches, rebasing is often used to incorporate changes from durable branches or to clean up history before a fast-forward merge. The overarching goal is to maintain an informative and clean project history that serves the team well.

Hashtags: #Git #VersionControl #Merge #Rebase #SoftwareDevelopment #DevOps #JonathanDetoeuf #DevoxxFR2018 #GitWorkflow #SourceControl

[DevoxxFR 2018] Apache Kafka: Beyond the Brokers – Exploring the Ecosystem

Apache Kafka is often recognized for its high-throughput, distributed messaging capabilities, but its power extends far beyond just the brokers. Florent Ramière from Confluent, a company significantly contributing to Kafka’s development, presented a comprehensive tour of the Kafka ecosystem at DevoxxFR2018. He aimed to showcase the array of open-source components that revolve around Kafka, enabling robust data integration, stream processing, and more.

Kafka Fundamentals and the Confluent Platform

Florent began with a quick refresher on Kafka’s core concept: an ordered, replayable log of messages (events) where consumers can read at their own pace from specific offsets. This design provides scalability, fault tolerance, and guaranteed ordering (within a partition), making it a cornerstone for event-driven architectures and handling massive data streams (Confluent sees clients handling up to 60 GB/s).

To get started, while Kafka involves several components like brokers and ZooKeeper, the Confluent Platform offers tools to simplify setup. The confluent CLI can start a local development environment with Kafka, ZooKeeper, Kafka SQL (ksqlDB), Schema Registry, and more with a single command. Docker images are also readily available for containerized deployments.

Kafka Connect: Bridging Kafka with External Systems

A significant part of the ecosystem is Kafka Connect, a framework for reliably streaming data between Kafka and other systems. Connectors act as sources (ingesting data into Kafka from databases, message queues, etc.) or sinks (exporting data from Kafka to data lakes, search indexes, analytics platforms, etc.). Florent highlighted the availability of numerous pre-built connectors for systems like JDBC databases, Elasticsearch, HDFS, S3, and Change Data Capture (CDC) tools.

He drew a parallel between Kafka Connect and Logstash, noting that while Logstash is excellent, Kafka Connect is designed as a distributed, fault-tolerant, and scalable service for these data integration tasks. It supports lightweight transformations (e.g., filtering, renaming fields, anonymization) within the Connect pipeline, and connectors are configured through a REST API. This makes it a powerful tool for building data pipelines without writing extensive custom code.

Stream Processing with Kafka Streams and ksqlDB

Once data is in Kafka, processing it in real-time is often the next step. Kafka Streams is a client library for building stream processing applications directly in Java (or Scala). Unlike frameworks like Spark or Flink that often require separate processing clusters, Kafka Streams applications are standalone Java applications that read from Kafka, process data, and can write results back to Kafka or external systems. This simplifies deployment and monitoring. Kafka Streams provides a rich DSL for operations like filtering, mapping, joining streams and tables (a table in Kafka Streams is a view of the latest value for each key in a stream), windowing, and managing state, all with exactly-once processing semantics.
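
To give a feel for the library, here is a compact sketch of a Kafka Streams topology, written in Kotlin against the Java API (the topic names, application id, and bootstrap server are placeholders, not from the talk):

    import java.util.Properties
    import org.apache.kafka.common.serialization.Serdes
    import org.apache.kafka.streams.KafkaStreams
    import org.apache.kafka.streams.StreamsBuilder
    import org.apache.kafka.streams.StreamsConfig
    import org.apache.kafka.streams.kstream.Produced

    fun main() {
        val props = Properties().apply {
            put(StreamsConfig.APPLICATION_ID_CONFIG, "greeting-counter")     // placeholder
            put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")    // placeholder
            put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().javaClass)
            put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().javaClass)
        }

        val builder = StreamsBuilder()
        builder.stream<String, String>("greetings")              // read a topic as a stream
            .filter { _, value -> value.isNotBlank() }           // stateless transformation
            .groupByKey()
            .count()                                             // stateful aggregation into a table
            .toStream()
            .to("greeting-counts", Produced.with(Serdes.String(), Serdes.Long()))

        KafkaStreams(builder.build(), props).start()             // a plain JVM app, no extra cluster
    }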

For those who prefer SQL to Java/Scala, ksqlDB (formerly Kafka SQL or KSQL) offers a SQL-like declarative language to define stream processing logic on top of Kafka topics. Users can create streams and tables from Kafka topics, perform continuous queries (SELECT statements that run indefinitely, emitting results as new data arrives), joins, aggregations over windows, and write results to new Kafka topics. ksqlDB runs as a separate server and uses Kafka Streams internally. It also manages stateful operations by storing state in RocksDB and backing it up to Kafka topics for fault tolerance. Florent emphasized that while ksqlDB is powerful for many use cases, complex UDFs or very intricate logic might still be better suited for Kafka Streams directly.

Schema Management and Other Essential Tools

When dealing with structured data in Kafka, especially in evolving systems, schema management becomes crucial. The Confluent Schema Registry helps manage and enforce schemas (typically Avro, but also Protobuf and JSON Schema) for messages in Kafka topics. It ensures schema compatibility (e.g., backward, forward, full compatibility) as schemas evolve, preventing data quality issues and runtime errors in producers and consumers. REST Proxy allows non-JVM applications to produce and consume messages via HTTP. Kafka also ships with command-line tools for performance testing (e.g., kafka-producer-perf-test, kafka-consumer-perf-test), latency checking, and inspecting consumer group lags, which are vital for operations and troubleshooting. Effective monitoring, often using JMX metrics exposed by Kafka components fed into systems like Prometheus via JMX Exporter or Jolokia, is also critical for production deployments.

Florent concluded by encouraging exploration of the Confluent Platform demos and his “kafka-story” GitHub repository, which provide step-by-step examples of these ecosystem components.

Links:

Hashtags: #ApacheKafka #KafkaConnect #KafkaStreams #ksqlDB #Confluent #StreamProcessing #DataIntegration #DevoxxFR2018 #FlorentRamiere #EventDrivenArchitecture #Microservices #BigData

[DevoxxFR 2018] Software Heritage: Preserving Humanity’s Software Legacy

Software is intricately woven into the fabric of our modern world, driving industry, fueling innovation, and forming a critical part of our scientific and cultural knowledge. Recognizing the profound importance of the source code that underpins this digital infrastructure, the Software Heritage initiative was launched. At Devoxx France 2018, Roberto Di Cosmo, a professor, director of Software Heritage, and affiliated with Inria, delivered an insightful talk titled “Software Heritage: Pourquoi et comment préserver le patrimoine logiciel de l’Humanité” (Software Heritage: Why and How to Preserve Humanity’s Software Legacy). He articulated the mission to collect, preserve, and share all publicly available software source code, creating a universal archive for future generations – a modern-day Library of Alexandria for software.

Di Cosmo began by emphasizing that source code is not just a set of instructions for computers; it’s a rich repository of human knowledge, ingenuity, and history. From complex algorithms to the subtle comments left by developers, source code tells a story of problem-solving and technological evolution. However, this invaluable heritage is fragile and at risk of being lost due to obsolete storage media, defunct projects, and disappearing hosting platforms.

The Mission: Collect, Preserve, Share

The core mission of Software Heritage, as outlined by Roberto Di Cosmo, is threefold: to collect, preserve, and make accessible the entirety of publicly available software source code. This ambitious undertaking aims to create a comprehensive and permanent archive – an “Internet Archive for source code” – safeguarding it from loss and ensuring it remains available for research, education, industrial development, and cultural understanding.

The collection process involves systematically identifying and archiving code from a vast array of sources, including forges like GitHub, GitLab, and Bitbucket, institutional repositories like HAL, and hosting platforms such as Gitorious and Google Code (many of which are now defunct, highlighting the urgency). Preservation is a long-term commitment, requiring strategies to combat digital obsolescence and ensure the integrity and continued accessibility of the archived code over decades and even centuries. Sharing this knowledge involves providing tools and interfaces for researchers, developers, historians, and the general public to explore this vast repository, discover connections between projects, and trace the lineage of software. Di Cosmo stressed that this is not just about backing up code; it’s about building a structured, interconnected knowledge base.

Technical Challenges and Approach

The scale of this endeavor presents significant technical challenges. The sheer volume of source code is immense and constantly growing. Code exists in numerous version control systems (Git, Subversion, Mercurial, etc.) and packaging formats, each with its own metadata and history. To address this, Software Heritage has developed a sophisticated infrastructure capable of ingesting code from diverse origins and storing it in a universal, canonical format.

A key element of their technical approach is the use of a Merkle tree structure, similar to what Git uses. All software artifacts (files, directories, commits, revisions) are identified by cryptographic hashes of their content. This allows for massive deduplication (since identical files or code snippets are stored only once, regardless of how many projects they appear in) and ensures the integrity and verifiability of the archive. This graph-based model also allows for the reconstruction of the full development history of software projects and the relationships between them. Di Cosmo explained that this structure not only saves space but also provides a powerful way to navigate and understand the evolution of software. The entire infrastructure itself is open source.
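
As a toy illustration of the content-addressing part of that design (this is not Software Heritage’s code, and it shows only deduplication, not the full Merkle DAG), a store keyed by the SHA-1 of each blob’s content keeps identical files only once:

    import java.security.MessageDigest

    // Toy content-addressable store: the identifier is derived from the content itself,
    // so identical blobs collapse to a single entry no matter how many projects contain them.
    class BlobStore {
        private val blobs = mutableMapOf<String, ByteArray>()

        fun put(content: ByteArray): String {
            val id = MessageDigest.getInstance("SHA-1")
                .digest(content)
                .joinToString("") { "%02x".format(it) }
            if (id !in blobs) blobs[id] = content   // deduplication happens here
            return id
        }

        fun get(id: String): ByteArray? = blobs[id]
    }

    fun main() {
        val store = BlobStore()
        val a = store.put("println(\"hello\")".toByteArray())
        val b = store.put("println(\"hello\")".toByteArray())
        println(a == b)   // true: same content, same identifier, stored once
    }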

A Universal Archive for All

Roberto Di Cosmo emphasized that Software Heritage is built as a common infrastructure for society, serving multiple purposes. For industry, it provides a reference point for existing code, preventing reinvention and facilitating reuse. For science, it offers a vast dataset for research on software engineering, programming languages, and the evolution of code, and is crucial for the reproducibility of research that relies on software. For education, it’s a rich learning resource. And for society as a whole, it preserves a vital part of our collective memory and technological heritage.

He concluded with a call to action, inviting individuals, institutions, and companies to support the initiative. This support can take many forms: contributing code from missing sources, helping to develop tools and connectors for different version control systems, providing financial sponsorship, or simply spreading the word about the importance of preserving our software legacy. Software Heritage aims to be a truly global and collaborative effort to ensure that the knowledge embedded in source code is not lost to time.

Links:

Hashtags: #SoftwareHeritage #OpenSource #Archive #DigitalPreservation #SourceCode #CulturalHeritage #RobertoDiCosmo #Inria #DevoxxFR2018

[DevoxxFR 2018] Deploying Microservices on AWS: Compute Options Explored at Devoxx France 2018

At Devoxx France 2018, Arun Gupta and Tiffany Jernigan, both from Amazon Web Services (AWS), delivered a three-hour deep-dive session titled Compute options for Microservices on AWS. This hands-on tutorial explored deploying a microservices-based application using various AWS compute options: EC2, Amazon Elastic Container Service (ECS), AWS Fargate, Elastic Kubernetes Service (EKS), and AWS Lambda. Through a sample application with web app, greeting, and name microservices, they demonstrated local testing, deployment pipelines, service discovery, monitoring, and canary deployments. The session, rich with code demos, is available on YouTube, with code and slides on GitHub.

Microservices: Solving Business Problems

Arun Gupta opened by addressing the monolith vs. microservices debate, emphasizing that the choice depends on business needs. Microservices enable agility, frequent releases, and polyglot environments but introduce complexity. AWS simplifies this with managed services, allowing developers to focus on business logic. The demo application featured three microservices: a public-facing web app, and internal greeting and name services, communicating via REST endpoints. Built with WildFly Swarm, a Java EE-compliant server, the application produced a portable fat JAR, deployable as a container or Lambda function. The presenters highlighted service discovery, ensuring the web app could locate stateless instances of greeting and name services.

EC2: Full Control for Traditional Deployments

Amazon EC2 offers developers complete control over virtual machines, ideal for those needing to manage the full stack. The presenters deployed the microservices on EC2 instances, running WildFly Swarm JARs. Using Maven and a Docker profile, they generated container images, pushed to Docker Hub, and tested locally with Docker Compose. A docker stack deploy command spun up the services, accessible via curl localhost:8080, returning responses like “hello Sheldon.” EC2 requires manual scaling and cluster management, but its flexibility suits custom stacks. The GitHub repo includes configurations for EC2 deployments, showcasing integration with AWS services like CloudWatch for logging.

Amazon ECS: Orchestrating Containers

Amazon ECS simplifies container orchestration, managing scheduling and scaling. The presenters created an ECS cluster in the AWS Management Console, defining task definitions for the three microservices. Task definitions specified container images, CPU, and memory, with an Application Load Balancer (ALB) enabling path-based routing (e.g., /resources/greeting). Using the ECS CLI, they deployed services, ensuring high availability across multiple availability zones. CloudWatch integration provided metrics and logs, with alarms for monitoring. ECS reduces operational overhead compared to EC2, balancing control and automation. The session highlighted ECS’s deep integration with AWS services, streamlining production workloads.

AWS Fargate: Serverless Containers

Introduced at re:Invent 2017, AWS Fargate abstracts server management, allowing developers to focus on containers. The presenters deployed the same microservices using Fargate, specifying task definitions with AWS VPC networking for fine-grained security. The Fargate CLI, a GitHub project by AWS’s John Pignata, simplified setup, creating ALBs and task definitions automatically. A curl to the load balancer URL returned responses like “howdy Penny.” Fargate’s per-second billing and task-level resource allocation optimize costs. Available initially in US East (N. Virginia), Fargate suits developers prioritizing simplicity. The session emphasized its role in reducing infrastructure management.

Elastic Kubernetes Service (EKS): Kubernetes on AWS

EKS, in preview during the session, brings managed Kubernetes to AWS. The presenters deployed the microservices on an EKS cluster, using kubectl to manage pods and services. They introduced Istio, a service mesh, to handle traffic routing and observability. Istio’s sidecar containers enabled 50/50 traffic splits between “hello” and “howdy” versions of the greeting service, configured via YAML manifests. Chaos engineering was demonstrated by injecting 5-second delays in 10% of requests, testing resilience. AWS X-Ray, integrated via a daemon set, provided service maps and traces, identifying bottlenecks. EKS, later supporting Fargate, offers flexibility for Kubernetes users. The GitHub repo includes EKS manifests and Istio configurations.

AWS Lambda: Serverless Microservices

AWS Lambda enables serverless deployments, eliminating server management. The presenters repurposed the WildFly Swarm application for Lambda, using the Serverless Application Model (SAM). Each microservice became a Lambda function, fronted by API Gateway endpoints (e.g., /greeting). SAM templates defined functions, APIs, and DynamoDB tables, with sam local start-api testing endpoints locally via Dockerized Lambda runtimes. Responses like “howdy Sheldon” were verified with curl localhost:3000. SAM’s package and deploy commands uploaded functions to S3, while canary deployments shifted traffic (e.g., 10% to new versions) with CloudWatch alarms. Lambda’s per-second billing and 300-second execution limit suit event-driven workloads. The session showcased SAM’s integration with AWS services and the Serverless Application Repository.

Deployment Pipelines: Automating with AWS CodePipeline

The presenters built a deployment pipeline using AWS CodePipeline, a managed service inspired by Amazon’s internal tooling. A GitHub push triggered the pipeline, which used AWS CodeBuild to build Docker images, pushed them to Amazon Elastic Container Registry (ECR), and deployed them to an ECS cluster. For Lambda, SAM templates were packaged and deployed. CloudFormation templates automated resource creation, including VPCs, subnets, and ALBs. The pipeline ensured immutable deployments with commit-based image tags, maintaining production stability. The GitHub repo provides CloudFormation scripts, enabling reproducible environments. This approach minimizes manual intervention, supporting rapid iteration.

Monitoring and Logging: AWS X-Ray and CloudWatch

Monitoring was a key focus, with AWS X-Ray providing end-to-end tracing. In ECS and EKS, X-Ray daemons collected traces, generating service maps showing web app, greeting, and name interactions. For Lambda, X-Ray was enabled natively via SAM templates. CloudWatch offered metrics (e.g., CPU usage) and logs, with alarms for thresholds. In EKS, Kubernetes tools like Prometheus and Grafana were mentioned, but X-Ray’s integration with AWS services was emphasized. The presenters demonstrated debugging Lambda functions locally using SAM CLI and IntelliJ, enhancing developer agility. These tools ensure observability, critical for distributed microservices.

Choosing the Right Compute Option

The session concluded by comparing compute options. EC2 offers maximum control but requires managing scaling and updates. ECS balances automation and flexibility, ideal for containerized workloads. Fargate eliminates server management, suiting simple deployments. EKS caters to Kubernetes users, with Istio enhancing observability. Lambda, best for event-driven microservices, minimizes operational overhead but has execution limits. Factors like team expertise, application requirements, and cost influence the choice. The presenters encouraged feedback via GitHub issues to shape AWS’s roadmap. Visit aws.amazon.com/containers for more.

Links:

Hashtags: #AWS #Microservices #ECS #Fargate #EKS #Lambda #DevoxxFR2018 #ArunGupta #TiffanyJernigan #CloudComputing

[DevoxxFR 2017] Introduction to the Philosophy of Artificial Intelligence

The rapid advancements and increasing integration of artificial intelligence into various aspects of our lives raise fundamental questions that extend beyond the purely technical realm into the domain of philosophy. As machines become capable of performing tasks that were once considered uniquely human, such as understanding language, recognizing patterns, and making decisions, we are prompted to reconsider our definitions of intelligence, consciousness, and even what it means to be human. At DevoxxFR 2017, Eric Lefevre Ardant and Sonia Ouchtar offered a thought-provoking introduction to the philosophy of artificial intelligence, exploring key concepts and thought experiments that challenge our understanding of machine intelligence and its potential implications.

Eric and Sonia began by acknowledging the pervasive presence of “AI” in contemporary discourse, noting that the term is often used broadly to encompass everything from simple algorithms to hypothetical future superintelligence. They stressed the importance of developing a critical perspective on these discussions and acquiring the vocabulary necessary to engage with the deeper philosophical questions surrounding AI. Their talk aimed to move beyond the hype and delve into the core questions that philosophers have grappled with as the possibility of machine intelligence has become more concrete.

The Turing Test: A Criterion for Machine Intelligence?

A central focus of the presentation was the Turing Test, proposed by Alan Turing in 1950 as a way to determine if a machine can exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human. Eric and Sonia explained the setup of the test, which involves a human interrogator interacting with both a human and a machine through text-based conversations. If the interrogator cannot reliably distinguish the machine from the human after a series of conversations, the machine is said to have passed the Turing Test.

They discussed the principles behind the test, highlighting that it focuses on observable behavior (linguistic communication) rather than the internal workings of the machine. The Turing Test has been influential but also widely debated. Eric and Sonia presented some of the key criticisms of the test, such as the argument that simulating intelligent conversation does not necessarily imply true understanding or consciousness.

The Chinese Room Argument: Challenging the Turing Test

To further explore the limitations of the Turing Test and the complexities of defining machine intelligence, Eric and Sonia introduced John Searle’s Chinese Room argument, a famous thought experiment proposed in 1980. They described the scenario: a person who does not understand Chinese is locked in a room with a large set of Chinese symbols, a rulebook in English for manipulating these symbols, and incoming batches of Chinese symbols (representing questions). By following the rules in the rulebook, the person can produce outgoing batches of Chinese symbols (representing answers) that are appropriate responses to the incoming questions, making it appear to an outside observer that the person understands Chinese.

Sonia and Eric explained that Searle’s argument is that even if the person in the room can pass the Turing Test for understanding Chinese (by producing seemingly intelligent responses), they do not actually understand Chinese. They are simply manipulating symbols according to rules, without any genuine semantic understanding. The Chinese Room argument is a direct challenge to the idea that passing the Turing Test is a sufficient criterion for claiming a machine possesses true intelligence or understanding. It raises profound questions about the nature of understanding, consciousness, and whether symbolic manipulation alone can give rise to genuine cognitive states.

The talk concluded by emphasizing that the philosophy of AI is a fertile and ongoing area of inquiry with deep connections to various other disciplines, including neuroscience, psychology, linguistics, and computer science. Eric and Sonia encouraged attendees to continue exploring these philosophical questions, recognizing that understanding the fundamental nature of intelligence, both human and artificial, is crucial as we continue to develop increasingly capable machines. The session provided a valuable framework for critically evaluating claims about AI and engaging with the ethical and philosophical implications of artificial intelligence.

Hashtags: #AI #ArtificialIntelligence #Philosophy #TuringTest #ChineseRoom #MachineIntelligence #Consciousness #EricLefevreArdant #SoniaOUCHTAR #PhilosophyOfAI


[DevoxxUS2017] Java EE 8: Adapting to Cloud and Microservices

At DevoxxUS2017, Linda De Michiel, a pivotal figure in the Java EE architecture team and Specification Lead for the Java EE Platform at Oracle, delivered a comprehensive overview of Java EE 8’s development. With her extensive experience since 1997, Linda highlighted the platform’s evolution to embrace cloud computing and microservices, aligning with modern industry trends. Her presentation detailed updates to existing Java Specification Requests (JSRs) and introduced new ones, while also previewing plans for Java EE 9. This post explores the key themes of Linda’s talk, emphasizing Java EE 8’s role in modern enterprise development.

Evolution from Java EE 7

Linda began by reflecting on Java EE 7, which focused on HTML5 support, modernized web-tier APIs, and simplified development through Contexts and Dependency Injection (CDI). Building on this foundation, Java EE 8 shifts toward cloud-native and microservices architectures. Linda noted that emerging trends, such as containerized deployments and distributed systems, influenced the platform’s direction. By enhancing CDI and introducing new APIs, Java EE 8 aims to streamline development for scalable, cloud-based applications, ensuring developers can build robust systems that meet contemporary demands.

Enhancements to Core JSRs

A significant portion of Linda’s talk focused on updates to existing JSRs, including CDI 2.0, JSON Binding (JSON-B), JSON Processing (JSON-P), and JAX-RS. She announced that CDI 2.0 had unanimously passed its public review ballot, a milestone for the expert group. JSON-B and JSON-P, crucial for data interchange in modern applications, have reached proposed final draft stages, while JAX-RS enhances RESTful services with reactive programming support. Linda highlighted the open-source nature of these implementations, such as GlassFish and Jersey, encouraging community contributions to refine these APIs for enterprise use.

New APIs for Modern Challenges

Java EE 8 introduces new JSRs to address cloud and microservices requirements, notably the Security API. Linda discussed its early draft review, which aims to standardize authentication and authorization across distributed systems. Servlet and JSF updates are also progressing, with JSF nearing final release. These APIs enable developers to build secure, scalable applications suited for microservices architectures. Linda emphasized the platform’s aggressive timeline for a summer release, underscoring the community’s commitment to delivering production-ready solutions that align with industry shifts toward cloud and container technologies.

Community Engagement and Future Directions

Linda stressed the importance of community feedback, directing developers to the Java EE specification project on java.net for JSR details and user groups. She highlighted the Adopt-a-JSR program, led by advocates like Heather VanCura, as a channel for aggregating feedback to expert groups. Looking ahead, Linda briefly outlined Java EE 9’s focus on further cloud integration and modularity. By inviting contributions through open-source platforms like GlassFish, Linda encouraged developers to shape the platform’s future, ensuring Java EE remains relevant in a rapidly evolving technological landscape.

Links: