
[DevoxxFR 2018] Java in Docker: Best Practices for Production

Running Java applications inside Docker containers has become standard practice in modern software deployment, yet it brings real challenges, particularly when transitioning to production environments. Charles Sabourdin, a freelance architect, and Jean-Christophe Sirot, an engineer at Docker, joined forces at Devoxx France 2018 to share their experience and best practices for running Java applications inside Docker containers. Their talk directly addressed common and often frustrating issues, such as containers crashing unexpectedly, applications consuming excessive RAM and destabilizing nodes, and unexpected CPU throttling. They offered practical solutions and configurations aimed at smoother, more reliable production deployments for Java workloads.

The presenters initiated their session with a touch of humor, explaining why operations teams might exhibit a degree of apprehension when tasked with deploying a containerized Java application into a production setting. It’s a common scenario: containers that perform flawlessly on a developer’s local machine can begin to behave erratically or fail outright in production. This discrepancy often stems from a fundamental misunderstanding of how the Java Virtual Machine (JVM) interacts with the resource limits imposed by the container’s control groups (cgroups). Several key problems frequently surface in this context. Perhaps the most common is memory mismanagement; the JVM, particularly older versions, might not be inherently aware of the memory limits defined for its container by the cgroup. This lack of awareness can lead the JVM to attempt to allocate and use more memory than has been allocated to the container by the orchestrator or runtime. Such overconsumption inevitably results in the container being abruptly terminated by the operating system’s Out-Of-Memory (OOM) killer, a situation that can be difficult to diagnose without understanding this interaction.

Similarly, CPU resource allocation can present challenges. The JVM might not accurately perceive the CPU resources available to it within the container, such as CPU shares or quotas defined by cgroups. This can lead to suboptimal decisions in sizing internal thread pools (like the common ForkJoinPool or garbage collection threads) or can cause the application to experience unexpected CPU throttling, impacting performance. Another frequent issue is Docker image bloat. Overly large Docker images not only increase deployment times across the infrastructure but also expand the potential attack surface by including unnecessary libraries or tools, thereby posing security vulnerabilities. The talk aimed to equip developers and operations personnel with the knowledge to anticipate and mitigate these common pitfalls. During the presentation, a demonstration application, humorously named “ressources-munger,” was used to simulate these problems, clearly showing how an application could consume excessive memory leading to an OOM kill by Docker, or how it might trigger excessive swapping if not configured correctly, severely degrading performance.

JVM Memory Management and CPU Considerations within Containers

A significant portion of the discussion was dedicated to the intricacies of JVM memory management within the containerized environment. Charles and Jean-Christophe elaborated that older JVM versions, specifically those prior to Java 8 update 131 and Java 9, were not inherently “cgroup-aware”. This lack of awareness meant that the JVM’s default heap sizing heuristics—for example, typically allocating up to one-quarter of the physical host’s memory for the heap—would be based on the total resources of the host machine rather than the specific limits imposed on the container by its cgroup. This behavior is a primary contributor to unexpected OOM kills when the container’s actual memory limit is much lower than what the JVM assumes based on the host.

Several best practices were shared to address these memory-related issues effectively. The foremost recommendation is to use cgroup-aware JVM versions. Modern Java releases, particularly Java 8 update 191 and later, and Java 10 and newer, incorporate significantly improved cgroup awareness. For older Java 8 updates (specifically 8u131 to 8u190), experimental flags such as -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap can be employed to enable the JVM to better respect container memory limits. In Java 10 and subsequent versions, this behavior became standard and often requires no special flags. However, even with cgroup-aware JVMs, explicitly setting the heap size using parameters like -Xms for the initial heap size and -Xmx for the maximum heap size is frequently a recommended practice for predictability and control. Newer JVMs also offer options like -XX:MaxRAMPercentage, allowing for more dynamic heap sizing relative to the container’s allocated memory. It’s crucial to understand that the JVM’s total memory footprint extends beyond just the heap; it also requires memory for metaspace (which replaced PermGen in Java 8+), thread stacks, native libraries, and direct memory buffers. Therefore, when allocating memory to a container, it is essential to account for this total footprint, not merely the -Xmx value. A common guideline suggests that the Java heap might constitute around 50-75% of the total memory allocated to the container, with the remainder reserved for these other essential JVM components and any other processes running within the container. Tuning metaspace parameters, such as -XX:MetaspaceSize and -XX:MaxMetaspaceSize, can also prevent excessive native memory consumption, particularly in applications that dynamically load many classes.
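The flag combinations above can be summarized in a few illustrative command lines. These are hedged examples, not runnable as-is: the flags are real HotSpot options, but app.jar is a placeholder for your application artifact.

```shell
# Java 8u131–8u190: opt in to cgroup memory limits via experimental flags
java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -jar app.jar

# Java 8u191+ and Java 10+: cgroup awareness is the default; size the heap as a
# fraction of the container's memory instead of the host's
java -XX:MaxRAMPercentage=75.0 -jar app.jar

# Explicit bounds remain the most predictable choice
java -Xms256m -Xmx512m -jar app.jar
```

Remember that whichever option you choose, the container's memory limit must cover the heap plus metaspace, thread stacks, and native buffers.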

Regarding CPU resources, the presenters noted that the JVM’s perception of available processors is also influenced by its cgroup awareness. In environments where CPU resources are constrained, using flags like -XX:ActiveProcessorCount can be beneficial to explicitly inform the JVM about the number of CPUs it should consider for sizing its internal thread pools, such as the common ForkJoinPool or the threads used for garbage collection. Optimizing the Docker image itself is another critical aspect of preparing Java applications for production. This involves choosing a minimal base image, such as alpine-jre, distroless, or official “slim” JRE images, instead of a full operating system distribution, to reduce the image size and potential attack surface. Utilizing multi-stage builds in the Dockerfile is a highly recommended technique; this allows developers to use a larger image containing build tools like Maven or Gradle and a full JDK in an initial stage, and then copy only the necessary application artifacts (like the JAR file) and a minimal JRE into a final, much smaller runtime image. Furthermore, being mindful of Docker image layering by combining related commands in the Dockerfile where possible can help reduce the number of layers and optimize image size. For applications on Java 9 and later, tools like jlink can be used to create custom, minimal JVM runtimes that include only the Java modules specifically required by the application, further reducing the image footprint. The session strongly emphasized that a collaborative approach between development and operations teams, combined with a thorough understanding of both JVM internals and Docker containerization principles, is paramount for successfully and reliably running Java applications in production environments.
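The multi-stage pattern described above can be sketched as a short Dockerfile. This is an illustrative sketch: the image tags, paths, and flag values are assumptions, not the presenters' exact configuration.

```dockerfile
# Build stage: full JDK and Maven, never shipped to production
FROM maven:3-jdk-8 AS build
WORKDIR /src
COPY pom.xml .
COPY src ./src
RUN mvn -q package

# Runtime stage: minimal JRE base image, only the application artifact is copied in
FROM openjdk:8-jre-alpine
COPY --from=build /src/target/app.jar /app/app.jar
# Container-aware heap sizing (Java 8u191+/10+); the percentage is illustrative
ENTRYPOINT ["java", "-XX:MaxRAMPercentage=75.0", "-jar", "/app/app.jar"]
```

The final image contains only the JRE and the JAR, keeping both size and attack surface small.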


Hashtags: #Java #Docker #JVM #Containerization #DevOps #Performance #MemoryManagement #DevoxxFR2018 #CharlesSabourdin #JeanChristopheSirot #BestPractices #ProductionReady #CloudNative

[DevoxxFR 2018] Watch Out! Don’t Plug in That USB, You Might Get Seriously Hacked!

The seemingly innocuous USB drive, a ubiquitous tool for data transfer and device charging, can harbor hidden dangers. At Devoxx France 2018, Aurélien Loyer and Nathan Damie, both from Zenika, delivered a cautionary and eye-opening presentation titled “Attention ! Ne mets pas cette clé tu risques de te faire hacker très fort !” (Watch Out! Don’t Plug in That USB, You Might Get Seriously Hacked!). They demonstrated how easily a modified USB device, commonly known as a “BadUSB,” can be used to execute arbitrary code on a victim’s computer, often by emulating a keyboard.

The speakers explained that these malicious devices often don’t rely on exploiting software vulnerabilities. Instead, they leverage a fundamental trust computers place in Human Interface Devices (HIDs), such as keyboards and mice. A BadUSB device, disguised as a regular flash drive or even embedded within a cable or other peripheral, can announce itself to the operating system as a keyboard and then rapidly type and execute commands – all without the user’s direct interaction beyond plugging it in.

What is BadUSB and How Does It Work?

Aurélien Loyer and Nathan Damie explained that many BadUSB devices are not standard USB flash drives but rather contain small microcontrollers like the Adafruit Trinket or Pro Micro. These microcontrollers are programmed (often using the Arduino IDE) to act as a Human Interface Device (HID), specifically a keyboard. When plugged into a computer, the operating system recognizes it as a keyboard and accepts input from it. The pre-programmed script on the microcontroller can then “type” a sequence of commands at high speed. This could involve opening a command prompt or terminal, downloading and executing malware from the internet, exfiltrating data, or performing other malicious actions.

The speakers demonstrated this live by plugging a device into a computer, which then automatically opened a text editor and typed a message, followed by executing commands to open a web browser and navigate to a specific URL. They showed the simplicity of the Arduino code required: essentially initializing the keyboard library and then sending keystroke commands (e.g., Keyboard.print(), Keyboard.press(), Keyboard.release()). More sophisticated attacks could involve a delay before execution, or triggering the payload based on certain conditions, making them harder to detect. Nathan even demonstrated a modified gaming controller that could harbor such a payload, executing it unexpectedly. The core danger lies in the fact that computers are generally designed to trust keyboard inputs without question.
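To illustrate the simplicity the speakers described, here is a hedged, Arduino-style sketch for an ATmega32U4-class board (such as a Pro Micro) using the standard Keyboard library. The key sequence and URL are placeholders for demonstration; this is a sketch of the technique, not the speakers' actual demo code.

```cpp
#include <Keyboard.h>

void setup() {
  Keyboard.begin();
  delay(2000);                     // give the OS time to enumerate the "keyboard"
  Keyboard.press(KEY_LEFT_GUI);    // open a run dialog (Windows layout assumed)
  Keyboard.press('r');
  Keyboard.releaseAll();
  delay(500);
  Keyboard.print("https://example.com");  // "type" a command or URL
  Keyboard.press(KEY_RETURN);
  Keyboard.releaseAll();
  Keyboard.end();
}

void loop() {}                     // payload runs once, at plug-in
```

A dozen lines are enough to weaponize a device the operating system trusts unconditionally, which is precisely the danger the talk highlighted.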

Potential Dangers and Countermeasures

The implications of BadUSB attacks are significant. Aurélien and Nathan highlighted how easily these devices can be disguised. They showed examples of microcontrollers small enough to fit inside the plastic casing of a standard USB drive, or even integrated into USB cables or other peripherals like a mouse with a hidden logger. This makes visual inspection unreliable. The attack vector often relies on social engineering: an attacker might drop “lost” USB drives in a parking lot or other public area, hoping a curious individual will pick one up and plug it into their computer. Even seemingly harmless devices like e-cigarettes could potentially be weaponized if they contain a malicious microcontroller and are plugged into a USB port for charging.

As for countermeasures, the speakers emphasized caution. The most straightforward advice is to never plug in USB devices from unknown or untrusted sources. For situations where using an untrusted USB device for charging is unavoidable (though not recommended for data transfer), they mentioned “USB condoms” – small hardware dongles that physically block the data pins in a USB connection, allowing only power to pass through. However, this would render a data-carrying device like a flash drive unusable for its primary purpose. The session served as a stark reminder that physical security and user awareness are crucial components of overall cybersecurity, as even the most common peripherals can pose a threat.


Hashtags: #BadUSB #CyberSecurity #HardwareHacking #SocialEngineering #USB #Microcontroller #Arduino #Zenika #DevoxxFR2018 #PhysicalSecurity

[DevoxxFR 2018] Are you “merge” or “rebase” oriented?

Git, the distributed version control system, has become an indispensable tool in the modern developer’s arsenal, revolutionizing how teams collaborate on code. Its flexibility and power, however, come with a degree of complexity that can initially intimidate newcomers, particularly when it comes to integrating changes from different branches. At Devoxx France 2018, Jonathan Detoeuf, a freelance developer with a passion for Software Craftsmanship and Agile methodologies, tackled one of Git’s most debated topics in his presentation: “T’es plutôt merge ou rebase ?” (Are you more of a merge or rebase person?). He aimed to demystify these two fundamental Git commands, explaining their respective use cases, how to avoid common pitfalls, and ultimately, how to maintain a clean, understandable project history.

Jonathan began by underscoring the importance of the Git log (history) as a “Tower of Babel” – a repository of the team’s collective knowledge, containing not just the current source code but the entire evolution of the project. Well-crafted commit messages and a clear history are crucial for understanding past decisions, tracking down bugs, and onboarding new team members. With this premise, the choice between merging and rebasing becomes more than just a technical preference; it’s about how clearly and effectively a team communicates its development story through its Git history. Jonathan’s talk provided practical guidance, moving beyond the often-unhelpful official Git documentation that offers freedom but little explicit recommendation on when to use which strategy.

Understanding the Basics: Merge vs. Rebase and History

Before diving into specific recommendations, Jonathan Detoeuf revisited the core mechanics of merging and rebasing, emphasizing their impact on project history. A standard merge (often a “merge commit” when branches have diverged) integrates changes from one branch into another by creating a new commit that has two parent commits. This explicitly shows where a feature branch was merged back into a main line, preserving the historical context of parallel development. A fast-forward merge, on the other hand, occurs if the target branch hasn’t diverged; Git simply moves the branch pointer forward. Rebasing, in contrast, re-applies commits from one branch onto the tip of another, creating a linear history as if the changes were made sequentially. This can make the history look cleaner but rewrites it, potentially losing the context of when changes were originally made in relation to other branches.

Jonathan stressed the value of a well-maintained Git history. It’s not just a log of changes but a narrative of the project’s development. Clear commit messages are vital as they convey the intent behind changes. A good history allows for “archaeology” – understanding why a particular piece of code exists or how a bug was introduced, even years later when the original developers are no longer around. Therefore, the decision to merge or rebase should be guided by the desire to create a history that is both accurate and easy to understand. He cautioned that many developers fear losing code with Git, especially during conflict resolution, making it important to master these integration techniques.

The Case for Merging: Durable Branches and Significant Events

Jonathan Detoeuf advocated for using merge commits (specifically, non-fast-forward merges) primarily for integrating “durable” branches or marking significant events in the project’s lifecycle. Durable branches are long-lived branches like main, develop, or release branches. When merging one durable branch into another (e.g., merging a release branch into main), a merge commit clearly signifies this integration point. Similarly, a merge commit is appropriate for marking key milestones such as the completion of a release, the end of a sprint, or a deployment to production. These merge commits act as explicit markers in the history, making it easy to see when major features or versions were incorporated.

He contrasted this with merging minor feature branches where a simple fast-forward merge might be acceptable if the history remains clear, or if a rebase is preferred for a cleaner linear history before the final integration. The key is that the merge commit should add value by highlighting a significant integration point or preserving the context of a substantial piece of work being completed. If it’s just integrating a pull request for a small, self-contained feature that has been reviewed, other strategies like rebase followed by a fast-forward merge, or even “squash and merge,” might be preferable to avoid cluttering the main line history with trivial merge bubbles. Jonathan’s advice leans towards using merge commits judiciously to preserve meaningful historical context, especially for branches that represent a significant body of work or a persistent line of development.

The Case for Rebasing: Feature Branches and Keeping it Clean

Rebasing, according to Jonathan Detoeuf, finds its primary utility when working with local or short-lived feature branches before they are shared or merged into a more permanent branch. When a developer is working on a feature and the main branch (e.g., develop or main) has advanced, rebasing the feature branch onto the latest state of the main branch can help incorporate upstream changes cleanly. This process rewrites the feature branch’s history by applying its commits one by one on top of the new base, resulting in a linear sequence of changes. This makes the feature branch appear as if it was developed sequentially after the latest changes on the main branch, which can simplify the final merge (often allowing a fast-forward merge) and lead to a cleaner, easier-to-read history on the main line.

Jonathan also highlighted git pull --rebase as a way to update a local branch with remote changes, avoiding unnecessary merge commits that can clutter the local history when simply trying to synchronize with colleagues’ work on the same branch. Furthermore, interactive rebase (git rebase -i) is a powerful tool for “cleaning up” the history of a feature branch before creating a pull request or merging. It allows developers to squash multiple work-in-progress commits into more meaningful, atomic commits, edit commit messages, reorder commits, or even remove unwanted ones. This careful curation of a feature branch’s history before integration ensures that the main project history remains coherent and valuable. However, a crucial rule for rebasing is to never rebase a branch that has already been pushed and is being used by others, as rewriting shared history can cause significant problems for collaborators. The decision-making flowchart Jonathan presented often guided towards rebasing for feature branches to integrate changes from a durable branch, or to clean up history before a fast-forward merge.
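The rebase-then-fast-forward flow described above can be sketched end to end. The following builds a throwaway repository so the commands are runnable anywhere; branch and file names are illustrative.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name Demo
git checkout -qb main

echo base > app.txt
git add app.txt
git commit -qm "initial commit"

git checkout -qb feature/login        # short-lived feature branch
echo login > login.txt
git add login.txt
git commit -qm "add login page"

git checkout -q main                  # meanwhile, main moves on
echo fix >> app.txt
git commit -qam "hotfix on main"

git checkout -q feature/login
git rebase -q main                    # replay the feature commit on top of main

git checkout -q main
git merge --ff-only -q feature/login  # fast-forward: linear history, no merge bubble
git log --oneline                     # three commits in a straight line
```

Because the feature branch was rebased first, the final merge is a pure fast-forward and the main line stays linear, which is exactly the clean history the talk advocates for short-lived branches.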

Best Practices and Conflict Avoidance

Beyond the when-to-merge-vs-rebase dilemma, Jonathan Detoeuf shared several best practices for smoother collaboration and conflict avoidance. Regularly committing small, atomic changes makes it easier to manage history and resolve conflicts if they arise. Communicating with team members about who is working on what can also prevent overlapping efforts on the same files. Structuring the application well, with clear separation of concerns into different files or modules, naturally reduces the likelihood of merge conflicts.

When conflicts do occur, understanding the changes using git diff and carefully resolving them is key. Jonathan also touched upon various Git workflows, such as feature branching, Gitflow, or trunk-based development, noting that the choice of merge/rebase strategy often aligns with the chosen workflow. For instance, the “feature merge” (or GitHub flow) often involves creating a feature branch, working on it, and then merging it back (often via a pull request, which might use a squash merge or a rebase-and-merge strategy depending on team conventions). He ultimately provided a decision tree to help developers choose: for durable branches, merging is generally preferred to integrate other durable branches or significant features. For feature branches, rebasing is often used to incorporate changes from durable branches or to clean up history before a fast-forward merge. The overarching goal is to maintain an informative and clean project history that serves the team well.

Hashtags: #Git #VersionControl #Merge #Rebase #SoftwareDevelopment #DevOps #JonathanDetoeuf #DevoxxFR2018 #GitWorkflow #SourceControl

[DevoxxFR 2018] Apache Kafka: Beyond the Brokers – Exploring the Ecosystem

Apache Kafka is often recognized for its high-throughput, distributed messaging capabilities, but its power extends far beyond just the brokers. Florent Ramière from Confluent, a company significantly contributing to Kafka’s development, presented a comprehensive tour of the Kafka ecosystem at Devoxx France 2018. He aimed to showcase the array of open-source components that revolve around Kafka, enabling robust data integration, stream processing, and more.

Kafka Fundamentals and the Confluent Platform

Florent began with a quick refresher on Kafka’s core concept: an ordered, replayable log of messages (events) where consumers can read at their own pace from specific offsets. This design provides scalability, fault tolerance, and guaranteed ordering (within a partition), making it a cornerstone for event-driven architectures and handling massive data streams (Confluent sees clients handling up to 60 GB/s).

To get started, while Kafka involves several components like brokers and ZooKeeper, the Confluent Platform offers tools to simplify setup. The confluent CLI can start a local development environment with Kafka, ZooKeeper, Kafka SQL (ksqlDB), Schema Registry, and more with a single command. Docker images are also readily available for containerized deployments.

Kafka Connect: Bridging Kafka with External Systems

A significant part of the ecosystem is Kafka Connect, a framework for reliably streaming data between Kafka and other systems. Connectors act as sources (ingesting data into Kafka from databases, message queues, etc.) or sinks (exporting data from Kafka to data lakes, search indexes, analytics platforms, etc.). Florent highlighted the availability of numerous pre-built connectors for systems like JDBC databases, Elasticsearch, HDFS, S3, and Change Data Capture (CDC) tools.

He drew a parallel between Kafka Connect and Logstash, noting that while Logstash is excellent, Kafka Connect is designed as a distributed, fault-tolerant, and scalable service for these data integration tasks. It supports transformations (e.g., filtering, renaming fields, anonymization) within the Connect pipeline, and connectors are configured and managed via a REST API. This makes it a powerful tool for building data pipelines without writing extensive custom code.
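As a hedged sketch of such a configuration, a JDBC source connector streaming a database table into Kafka might be registered with a payload like the following (the connector class and property names are standard Kafka Connect JDBC options; the connection URL, table, and topic prefix are illustrative):

```json
{
  "name": "orders-jdbc-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db:5432/shop",
    "table.whitelist": "orders",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "topic.prefix": "db-"
  }
}
```

Submitted with an HTTP POST to the Connect REST API, this would continuously copy new rows of the orders table into the db-orders topic without any custom code.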

Stream Processing with Kafka Streams and ksqlDB

Once data is in Kafka, processing it in real-time is often the next step. Kafka Streams is a client library for building stream processing applications directly in Java (or Scala). Unlike frameworks like Spark or Flink that often require separate processing clusters, Kafka Streams applications are standalone Java applications that read from Kafka, process data, and can write results back to Kafka or external systems. This simplifies deployment and monitoring. Kafka Streams provides a rich DSL for operations like filtering, mapping, joining streams and tables (a table in Kafka Streams is a view of the latest value for each key in a stream), windowing, and managing state, all with exactly-once processing semantics.
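A minimal sketch of that DSL is shown below. It assumes the org.apache.kafka:kafka-streams dependency on the classpath and a broker at localhost:9092; the topic names are illustrative.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class GreetingFilter {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "greeting-filter");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
    props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

    StreamsBuilder builder = new StreamsBuilder();
    KStream<String, String> greetings = builder.stream("greetings");
    greetings.filter((key, value) -> value != null && !value.isEmpty())
             .mapValues(String::toUpperCase)  // simple per-record transformation
             .to("greetings-upper");          // write results back to Kafka

    new KafkaStreams(builder.build(), props).start();
  }
}
```

The whole application is an ordinary Java process, which is the deployment simplicity Florent contrasted with cluster-based processing frameworks.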

For those who prefer SQL to Java/Scala, ksqlDB (formerly Kafka SQL or KSQL) offers a SQL-like declarative language to define stream processing logic on top of Kafka topics. Users can create streams and tables from Kafka topics, perform continuous queries (SELECT statements that run indefinitely, emitting results as new data arrives), joins, aggregations over windows, and write results to new Kafka topics. ksqlDB runs as a separate server and uses Kafka Streams internally. It also manages stateful operations by storing state in RocksDB and backing it up to Kafka topics for fault tolerance. Florent emphasized that while ksqlDB is powerful for many use cases, complex UDFs or very intricate logic might still be better suited for Kafka Streams directly.
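As a hedged sketch of this declarative style (KSQL-era syntax; the stream, columns, and topic are illustrative), a continuous windowed aggregation might look like:

```sql
-- Declare a stream over an existing Kafka topic
CREATE STREAM pageviews (user_id VARCHAR, page VARCHAR)
  WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');

-- Continuous query: views per user over a 1-minute tumbling window;
-- results are materialized into a table backed by a new topic
CREATE TABLE views_per_user AS
  SELECT user_id, COUNT(*) AS views
  FROM pageviews
  WINDOW TUMBLING (SIZE 1 MINUTE)
  GROUP BY user_id;
```

The second statement never terminates: it keeps emitting updated counts as new events arrive, which is the "continuous query" model described above.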

Schema Management and Other Essential Tools

When dealing with structured data in Kafka, especially in evolving systems, schema management becomes crucial. The Confluent Schema Registry helps manage and enforce schemas (typically Avro, but also Protobuf and JSON Schema) for messages in Kafka topics. It ensures schema compatibility (e.g., backward, forward, full compatibility) as schemas evolve, preventing data quality issues and runtime errors in producers and consumers. REST Proxy allows non-JVM applications to produce and consume messages via HTTP. Kafka also ships with command-line tools for performance testing (e.g., kafka-producer-perf-test, kafka-consumer-perf-test), latency checking, and inspecting consumer group lags, which are vital for operations and troubleshooting. Effective monitoring, often using JMX metrics exposed by Kafka components fed into systems like Prometheus via JMX Exporter or Jolokia, is also critical for production deployments.

Florent concluded by encouraging exploration of the Confluent Platform demos and his “kafka-story” GitHub repository, which provide step-by-step examples of these ecosystem components.


Hashtags: #ApacheKafka #KafkaConnect #KafkaStreams #ksqlDB #Confluent #StreamProcessing #DataIntegration #DevoxxFR2018 #FlorentRamiere #EventDrivenArchitecture #Microservices #BigData

[DevoxxFR 2018] Software Heritage: Preserving Humanity’s Software Legacy

Software is intricately woven into the fabric of our modern world, driving industry, fueling innovation, and forming a critical part of our scientific and cultural knowledge. Recognizing the profound importance of the source code that underpins this digital infrastructure, the Software Heritage initiative was launched. At Devoxx France 2018, Roberto Di Cosmo, a professor, director of Software Heritage, and affiliated with Inria, delivered an insightful talk titled “Software Heritage : Pourquoi et comment préserver le patrimoine logiciel de l’Humanité” (Software Heritage: Why and How to Preserve Humanity’s Software Legacy). He articulated the mission to collect, preserve, and share all publicly available software source code, creating a universal archive for future generations – a modern-day Library of Alexandria for software.

Di Cosmo began by emphasizing that source code is not just a set of instructions for computers; it’s a rich repository of human knowledge, ingenuity, and history. From complex algorithms to the subtle comments left by developers, source code tells a story of problem-solving and technological evolution. However, this invaluable heritage is fragile and at risk of being lost due to obsolete storage media, defunct projects, and disappearing hosting platforms.

The Mission: Collect, Preserve, Share

The core mission of Software Heritage, as outlined by Roberto Di Cosmo, is threefold: to collect, preserve, and make accessible the entirety of publicly available software source code. This ambitious undertaking aims to create a comprehensive and permanent archive – an “Internet Archive for source code” – safeguarding it from loss and ensuring it remains available for research, education, industrial development, and cultural understanding.

The collection process involves systematically identifying and archiving code from a vast array of sources, including forges like GitHub, GitLab, and Bitbucket, institutional repositories like HAL, and hosting platforms such as Gitorious and Google Code (both now defunct, highlighting the urgency). Preservation is a long-term commitment, requiring strategies to combat digital obsolescence and ensure the integrity and continued accessibility of the archived code over decades and even centuries. Sharing this knowledge involves providing tools and interfaces for researchers, developers, historians, and the general public to explore this vast repository, discover connections between projects, and trace the lineage of software. Di Cosmo stressed that this is not just about backing up code; it’s about building a structured, interconnected knowledge base.

Technical Challenges and Approach

The scale of this endeavor presents significant technical challenges. The sheer volume of source code is immense and constantly growing. Code exists in numerous version control systems (Git, Subversion, Mercurial, etc.) and packaging formats, each with its own metadata and history. To address this, Software Heritage has developed a sophisticated infrastructure capable of ingesting code from diverse origins and storing it in a universal, canonical format.

A key element of their technical approach is the use of a Merkle tree structure, similar to what Git uses. All software artifacts (file contents, directories, revisions, releases) are identified by cryptographic hashes of their content. This allows for massive deduplication (since identical files or code snippets are stored only once, regardless of how many projects they appear in) and ensures the integrity and verifiability of the archive. This graph-based model also allows for the reconstruction of the full development history of software projects and the relationships between them. Di Cosmo explained that this structure not only saves space but also provides a powerful way to navigate and understand the evolution of software. The entire infrastructure itself is open source.
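The deduplication principle is easy to illustrate: hashing identical content yields identical identifiers, so an archive keyed by content hash stores each artifact once. (Software Heritage uses Git-compatible intrinsic identifiers rather than plain sha256sum; the sketch below only demonstrates the principle.)

```shell
set -e
dir=$(mktemp -d)
cd "$dir"

# The same file content, appearing in two unrelated "projects"...
mkdir -p project-a project-b
printf 'int main(void){return 0;}\n' > project-a/main.c
printf 'int main(void){return 0;}\n' > project-b/main.c

# ...yields the same content identifier, so a content-addressed
# archive stores the bytes only once
id_a=$(sha256sum project-a/main.c | cut -d' ' -f1)
id_b=$(sha256sum project-b/main.c | cut -d' ' -f1)
[ "$id_a" = "$id_b" ] && echo "deduplicated under id $id_a"
```

Applied at the scale of all public source code, this is what makes a universal archive tractable in storage terms.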

A Universal Archive for All

Roberto Di Cosmo emphasized that Software Heritage is built as a common infrastructure for society, serving multiple purposes. For industry, it provides a reference point for existing code, preventing reinvention and facilitating reuse. For science, it offers a vast dataset for research on software engineering, programming languages, and the evolution of code, and is crucial for the reproducibility of research that relies on software. For education, it’s a rich learning resource. And for society as a whole, it preserves a vital part of our collective memory and technological heritage.

He concluded with a call to action, inviting individuals, institutions, and companies to support the initiative. This support can take many forms: contributing code from missing sources, helping to develop tools and connectors for different version control systems, providing financial sponsorship, or simply spreading the word about the importance of preserving our software legacy. Software Heritage aims to be a truly global and collaborative effort to ensure that the knowledge embedded in source code is not lost to time.


Hashtags: #SoftwareHeritage #OpenSource #Archive #DigitalPreservation #SourceCode #CulturalHeritage #RobertoDiCosmo #Inria #DevoxxFR2018

[DevoxxFR 2018] Deploying Microservices on AWS: Compute Options Explored at Devoxx France 2018

At Devoxx France 2018, Arun Gupta and Tiffany Jernigan, both from Amazon Web Services (AWS), delivered a three-hour deep-dive session titled “Compute Options for Microservices on AWS”. This hands-on tutorial explored deploying a microservices-based application using various AWS compute options: EC2, Amazon Elastic Container Service (ECS), AWS Fargate, Elastic Kubernetes Service (EKS), and AWS Lambda. Through a sample application with web app, greeting, and name microservices, they demonstrated local testing, deployment pipelines, service discovery, monitoring, and canary deployments. The session, rich with code demos, is available on YouTube, with code and slides on GitHub.

Microservices: Solving Business Problems

Arun Gupta opened by addressing the monolith vs. microservices debate, emphasizing that the choice depends on business needs. Microservices enable agility, frequent releases, and polyglot environments but introduce complexity. AWS simplifies this with managed services, allowing developers to focus on business logic. The demo application featured three microservices: a public-facing web app, and internal greeting and name services, communicating via REST endpoints. Built with WildFly Swarm, a Java EE-compliant server, the application produced a portable fat JAR, deployable as a container or Lambda function. The presenters highlighted service discovery, ensuring the web app could locate stateless instances of greeting and name services.
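The composition described above can be sketched in plain Java. Here the two internal services are stubbed as in-process suppliers; in the actual demo they were separate WildFly Swarm processes reached over REST via service discovery, so class and method names below are illustrative:

```java
import java.util.function.Supplier;

// Minimal sketch of the demo's three-service composition. The greeting and
// name services are stubbed as in-process suppliers; in the talk they were
// separate WildFly Swarm processes located through service discovery.
public class WebApp {
    private final Supplier<String> greetingService; // e.g. "hello" or "howdy"
    private final Supplier<String> nameService;     // e.g. "Sheldon" or "Penny"

    public WebApp(Supplier<String> greetingService, Supplier<String> nameService) {
        this.greetingService = greetingService;
        this.nameService = nameService;
    }

    // The public-facing web app composes the two internal responses.
    public String handleRequest() {
        return greetingService.get() + " " + nameService.get();
    }

    public static void main(String[] args) {
        WebApp app = new WebApp(() -> "hello", () -> "Sheldon");
        System.out.println(app.handleRequest()); // prints "hello Sheldon"
    }
}
```

Because the internal services are stateless, the web app can call any instance of each; that property is what makes the discovery and scaling stories in the following sections straightforward.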

EC2: Full Control for Traditional Deployments

Amazon EC2 offers developers complete control over virtual machines, ideal for those needing to manage the full stack. The presenters deployed the microservices on EC2 instances, running WildFly Swarm JARs. Using Maven and a Docker profile, they generated container images, pushed them to Docker Hub, and tested locally with Docker Compose. A docker stack deploy command spun up the services, accessible via curl localhost:8080, returning responses like "hello Sheldon." EC2 requires manual scaling and cluster management, but its flexibility suits custom stacks. The GitHub repo includes configurations for EC2 deployments, showcasing integration with AWS services like CloudWatch for logging.

Amazon ECS: Orchestrating Containers

Amazon ECS simplifies container orchestration, managing scheduling and scaling. The presenters created an ECS cluster in the AWS Management Console, defining task definitions for the three microservices. Task definitions specified container images, CPU, and memory, with an Application Load Balancer (ALB) enabling path-based routing (e.g., /resources/greeting). Using the ECS CLI, they deployed services, ensuring high availability across multiple availability zones. CloudWatch integration provided metrics and logs, with alarms for monitoring. ECS reduces operational overhead compared to EC2, balancing control and automation. The session highlighted ECS’s deep integration with AWS services, streamlining production workloads.

AWS Fargate: Serverless Containers

Introduced at re:Invent 2017, AWS Fargate abstracts server management, allowing developers to focus on containers. The presenters deployed the same microservices using Fargate, specifying task definitions with AWS VPC networking for fine-grained security. The Fargate CLI, a GitHub project by AWS’s John Pignata, simplified setup, creating ALBs and task definitions automatically. A curl to the load balancer URL returned responses like “howdy Penny.” Fargate’s per-second billing and task-level resource allocation optimize costs. Available initially in US East (N. Virginia), Fargate suits developers prioritizing simplicity. The session emphasized its role in reducing infrastructure management.

Elastic Kubernetes Service (EKS): Kubernetes on AWS

EKS, in preview during the session, brings managed Kubernetes to AWS. The presenters deployed the microservices on an EKS cluster, using kubectl to manage pods and services. They introduced Istio, a service mesh, to handle traffic routing and observability. Istio's sidecar containers enabled 50/50 traffic splits between "hello" and "howdy" versions of the greeting service, configured via YAML manifests. Chaos engineering was demonstrated by injecting 5-second delays in 10% of requests, testing resilience. AWS X-Ray, integrated via a daemon set, provided service maps and traces, identifying bottlenecks. EKS, which later gained Fargate support, offers flexibility for Kubernetes users. The GitHub repo includes EKS manifests and Istio configurations.
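The 50/50 split Istio applies happens inside the mesh and is configured declaratively in YAML, not in application code; the decision it makes per request boils down to weighted random routing, which a plain-Java illustration (with hypothetical version names) makes concrete:

```java
import java.util.Random;

// Illustration of the weighted routing Istio performs inside the mesh.
// In the talk this was configured via YAML VirtualService manifests; this
// class only mirrors the per-request decision logic and is not an Istio API.
public class WeightedRouter {
    private final Random random;
    private final int helloWeight; // percentage of traffic for the "hello" version

    public WeightedRouter(int helloWeight, long seed) {
        this.helloWeight = helloWeight;
        this.random = new Random(seed); // seeded here only for reproducibility
    }

    // Returns which version of the greeting service receives this request.
    public String route() {
        return random.nextInt(100) < helloWeight ? "greeting-hello" : "greeting-howdy";
    }
}
```

Setting the weight to 50 yields the talk's 50/50 canary split; dialing it from 0 to 100 over time is the essence of a progressive rollout.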

AWS Lambda: Serverless Microservices

AWS Lambda enables serverless deployments, eliminating server management. The presenters repurposed the WildFly Swarm application for Lambda, using the Serverless Application Model (SAM). Each microservice became a Lambda function, fronted by API Gateway endpoints (e.g., /greeting). SAM templates defined functions, APIs, and DynamoDB tables, with sam local start-api testing endpoints locally via Dockerized Lambda runtimes. Responses like "howdy Sheldon" were verified with curl localhost:3000. SAM's package and deploy commands uploaded functions to S3, while canary deployments shifted traffic (e.g., 10% to new versions) with CloudWatch alarms. Lambda's fine-grained billing (metered in 100 ms increments at the time) and 300-second execution limit suit event-driven workloads. The session showcased SAM's integration with AWS services and the Serverless Application Repository.
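A dependency-free sketch of the handler shape behind the /greeting endpoint helps show what "each microservice became a Lambda function" means in practice. The real demo implemented the RequestHandler interface from aws-lambda-java-core and was fronted by API Gateway via a SAM template; this sketch only mimics the proxy-style request/response maps, and the names are illustrative:

```java
import java.util.Map;

// Sketch of a Lambda-style handler for the /greeting endpoint. The actual
// demo used aws-lambda-java-core's RequestHandler wired through API Gateway;
// this version has no AWS dependency and only mirrors the mapping:
// an incoming request map goes in, a statusCode/body map comes out.
public class GreetingHandler {
    public Map<String, Object> handle(Map<String, Object> request) {
        String path = (String) request.getOrDefault("path", "/");
        boolean known = "/greeting".equals(path);
        return Map.of(
            "statusCode", known ? 200 : 404,
            "body", known ? "howdy" : "not found");
    }
}
```

Because the function is just request-in/response-out, sam local start-api can run it in a local Dockerized runtime exactly as API Gateway would invoke it in the cloud.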

Deployment Pipelines: Automating with AWS CodePipeline

The presenters built a deployment pipeline using AWS CodePipeline, a managed service inspired by Amazon's internal tooling. A GitHub push triggered the pipeline, which used AWS CodeBuild to build Docker images, pushed them to Amazon Elastic Container Registry (ECR), and deployed to an ECS cluster. For Lambda, SAM templates were packaged and deployed. CloudFormation templates automated resource creation, including VPCs, subnets, and ALBs. The pipeline ensured immutable deployments with commit-based image tags, maintaining production stability. The GitHub repo provides CloudFormation scripts, enabling reproducible environments. This approach minimizes manual intervention, supporting rapid iteration.

Monitoring and Logging: AWS X-Ray and CloudWatch

Monitoring was a key focus, with AWS X-Ray providing end-to-end tracing. In ECS and EKS, X-Ray daemons collected traces, generating service maps showing web app, greeting, and name interactions. For Lambda, X-Ray was enabled natively via SAM templates. CloudWatch offered metrics (e.g., CPU usage) and logs, with alarms for thresholds. In EKS, Kubernetes tools like Prometheus and Grafana were mentioned, but X-Ray’s integration with AWS services was emphasized. The presenters demonstrated debugging Lambda functions locally using SAM CLI and IntelliJ, enhancing developer agility. These tools ensure observability, critical for distributed microservices.

Choosing the Right Compute Option

The session concluded by comparing compute options. EC2 offers maximum control but requires managing scaling and updates. ECS balances automation and flexibility, ideal for containerized workloads. Fargate eliminates server management, suiting simple deployments. EKS caters to Kubernetes users, with Istio enhancing observability. Lambda, best for event-driven microservices, minimizes operational overhead but has execution limits. Factors like team expertise, application requirements, and cost influence the choice. The presenters encouraged feedback via GitHub issues to shape AWS’s roadmap. Visit aws.amazon.com/containers for more.

Hashtags: #AWS #Microservices #ECS #Fargate #EKS #Lambda #DevoxxFR2018 #ArunGupta #TiffanyJernigan #CloudComputing

PostHeaderIcon [DevoxxFR 2017] Introduction to the Philosophy of Artificial Intelligence

The rapid advancements and increasing integration of artificial intelligence into various aspects of our lives raise fundamental questions that extend beyond the purely technical realm into the domain of philosophy. As machines become capable of performing tasks that were once considered uniquely human, such as understanding language, recognizing patterns, and making decisions, we are prompted to reconsider our definitions of intelligence, consciousness, and even what it means to be human. At DevoxxFR 2017, Eric Lefevre-Ardant and Sonia Ouchtar offered a thought-provoking introduction to the philosophy of artificial intelligence, exploring key concepts and thought experiments that challenge our understanding of machine intelligence and its potential implications.

Eric and Sonia began by acknowledging the pervasive presence of “AI” in contemporary discourse, noting that the term is often used broadly to encompass everything from simple algorithms to hypothetical future superintelligence. They stressed the importance of developing a critical perspective on these discussions and acquiring the vocabulary necessary to engage with the deeper philosophical questions surrounding AI. Their talk aimed to move beyond the hype and delve into the core questions that philosophers have grappled with as the possibility of machine intelligence has become more concrete.

The Turing Test: A Criterion for Machine Intelligence?

A central focus of the presentation was the Turing Test, proposed by Alan Turing in 1950 as a way to determine if a machine can exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human. Eric and Sonia explained the setup of the test, which involves a human interrogator interacting with both a human and a machine through text-based conversations. If the interrogator cannot reliably distinguish the machine from the human after a series of conversations, the machine is said to have passed the Turing Test.

They discussed the principles behind the test, highlighting that it focuses on observable behavior (linguistic communication) rather than the internal workings of the machine. The Turing Test has been influential but also widely debated. Eric and Sonia presented some of the key criticisms of the test, such as the argument that simulating intelligent conversation does not necessarily imply true understanding or consciousness.

The Chinese Room Argument: Challenging the Turing Test

To further explore the limitations of the Turing Test and the complexities of defining machine intelligence, Eric and Sonia introduced John Searle’s Chinese Room argument, a famous thought experiment proposed in 1980. They described the scenario: a person who does not understand Chinese is locked in a room with a large set of Chinese symbols, a rulebook in English for manipulating these symbols, and incoming batches of Chinese symbols (representing questions). By following the rules in the rulebook, the person can produce outgoing batches of Chinese symbols (representing answers) that are appropriate responses to the incoming questions, making it appear to an outside observer that the person understands Chinese.

Sonia and Eric explained that Searle’s argument is that even if the person in the room can pass the Turing Test for understanding Chinese (by producing seemingly intelligent responses), they do not actually understand Chinese. They are simply manipulating symbols according to rules, without any genuine semantic understanding. The Chinese Room argument is a direct challenge to the idea that passing the Turing Test is a sufficient criterion for claiming a machine possesses true intelligence or understanding. It raises profound questions about the nature of understanding, consciousness, and whether symbolic manipulation alone can give rise to genuine cognitive states.

The talk concluded by emphasizing that the philosophy of AI is a fertile and ongoing area of inquiry with deep connections to various other disciplines, including neuroscience, psychology, linguistics, and computer science. Eric and Sonia encouraged attendees to continue exploring these philosophical questions, recognizing that understanding the fundamental nature of intelligence, both human and artificial, is crucial as we continue to develop increasingly capable machines. The session provided a valuable framework for critically evaluating claims about AI and engaging with the ethical and philosophical implications of artificial intelligence.

Hashtags: #AI #ArtificialIntelligence #Philosophy #TuringTest #ChineseRoom #MachineIntelligence #Consciousness #EricLefevreArdant #SoniaOuchtar #PhilosophyOfAI


PostHeaderIcon [DevoxxFR] How to be a Tech Lead in an XXL Pizza Team Without Drowning

The role of a Tech Lead is multifaceted, requiring a blend of technical expertise, mentorship, and facilitation skills. However, these responsibilities become significantly more challenging when leading a large team, humorously dubbed an “XXL pizza team,” potentially comprising fifteen or more individuals, with a substantial number of developers. Damien Beaufils shared his valuable one-year retrospective on navigating this complex role within such a large and diverse team, offering practical insights on how to effectively lead, continue contributing technically, and avoid being overwhelmed.

Damien’s experience was rooted in leading a team working on a public-facing website, notable for its heterogeneity. The team was mixed in terms of skill sets, gender, and composition (combining client and vendor personnel), adding layers of complexity to the leadership challenge.

Balancing Technical Contribution and Leadership

A key tension for many Tech Leads is balancing hands-on coding with leadership duties. Damien addressed this directly, emphasizing that while staying connected to the code is important for credibility and understanding, the primary focus must shift towards enabling the team. He detailed practices put in place to foster collective ownership of the codebase and enhance overall product quality. These included encouraging pair programming, implementing robust code review processes, and establishing clear coding standards.

The goal was to distribute technical knowledge and responsibility across the team rather than concentrating it solely with the Tech Lead. By empowering team members to take ownership and contribute to quality initiatives, Damien found that the team’s overall capability and autonomy increased, allowing him to focus more on strategic technical guidance and facilitation.

Fostering Learning, Progression, and Autonomy

Damien highlighted several successful strategies employed to promote learning, progression, and autonomy within the XXL team. These successes were not achieved by acting as a “super-hero” dictating solutions but through deliberate efforts to facilitate growth. Initiatives such as organizing internal workshops, encouraging knowledge sharing sessions, and providing opportunities for developers to explore new technologies contributed to a culture of continuous learning.

He stressed the importance of the Tech Lead acting as a coach, guiding individuals and the team towards self-improvement and problem-solving. By fostering an environment where team members felt empowered to make technical decisions and learn from both successes and failures, Damien helped build a more resilient and autonomous team. This shift from relying on a single point of technical expertise to distributing knowledge and capability was crucial for managing the scale and diversity of the team effectively.

Challenges and Lessons Learned

Damien was also candid about the problems encountered and the strategies that proved less effective. Leading a large, mixed team inevitably presents communication challenges, potential conflicts, and the difficulty of ensuring consistent application of standards. He discussed the importance of clear communication channels, active listening, and addressing issues proactively.

One crucial lesson learned was the need for clearly defined, measurable indicators to track progress in areas like code quality, team efficiency, and technical debt reduction. Without objective metrics, it’s challenging to assess the effectiveness of implemented practices and demonstrate improvement. Damien concluded that while there’s no magic formula for leading an XXL team, a pragmatic approach focused on empowering the team, fostering a culture of continuous improvement, and using data to inform decisions is essential for success without becoming overwhelmed.

Hashtags: #TechLead #TeamManagement #SoftwareDevelopment #Leadership #DevOps #Agile #DamienBeaufils

PostHeaderIcon [DevoxxFR 2017] Why Your Company Should Store All Its Code in a Single Repo

How an organization structures and manages its source code repositories is far more than a technical implementation detail; it is a fundamental architectural choice with profound implications for development workflow efficiency, team collaboration, the ease of code sharing and reuse, and the effectiveness of the entire software delivery pipeline, including Continuous Integration and deployment. In recent years, amplified by the adoption of microservices architectures and the rise of distributed teams, the prevailing trend has been to organize code into numerous independent repositories (the multi-repo approach), typically one per application, service, or even library. Yet, as Thierry Abaléa highlighted in his concise and provocative talk at DevoxxFR 2017, some of the most innovative and successful technology companies in the world, including Google, Facebook, and Twitter, maintain their vast and complex codebases within a single, unified repository: a monorepo. This striking divergence prompted the central question of his presentation: what significant, perhaps non-obvious advantages drive these organizations to embrace a monorepo despite its perceived complexities, and are those benefits transferable to other organizations, regardless of their size, industry, or current architectural choices?

Thierry began by acknowledging the intuitive appeal of the multi-repo model, where the organization of source code naturally mirrors the structure of teams or the decomposition of applications into independent services, and he conceded that for very small projects or nascent organizations it can seem straightforward. He then sharply contrasted this with the monorepo approach favored by the aforementioned tech giants. While creating many small, independent repositories seems simpler initially, that simplicity rapidly erodes as the number of services, applications, libraries, and teams grows. Managing dependencies between hundreds or thousands of repositories, coordinating changes that span service boundaries, and keeping tooling, build processes, and deployment pipelines consistent across a fragmented codebase become increasingly time-consuming and error-prone in a large-scale multi-repo environment.

Unpacking the Compelling and Often Underestimated Advantages of the Monorepo

Thierry articulated several compelling and often underestimated benefits associated with adopting and effectively managing a monorepo strategy. A primary and perhaps the most impactful advantage is the unparalleled ease and efficiency of code sharing and reuse across different projects, applications, and teams within the organization. With all code residing in a single, unified place, developers can readily discover, access, and incorporate libraries, components, or utility functions developed by other teams elsewhere within the company without the friction of adding external dependencies or navigating multiple repositories. This inherent discoverability and accessibility fosters consistency in tooling and practices, reduces redundant effort spent on reinventing common functionalities, and actively promotes the creation and adoption of a shared internal ecosystem of high-quality, reusable code assets.

Furthermore, a monorepo can significantly enhance cross-team collaboration and facilitate large-scale refactorings that span multiple components or services. Changes affecting several parts of the system can often be made atomically in a single commit, simplifying the coordination of complex updates and reducing the version-compatibility and dependency-hell problems that often plague multi-repo setups. Thierry also highlighted simpler dependency and version management: a monorepo has a single, unified version of the entire codebase at any point in time, eliminating the need to track and synchronize versions across numerous independent repositories and preventing conflicts from incompatible library versions. Finally, he argued that a monorepo inherently facilitates a more effective cross-application Continuous Integration (CI) pipeline. A commit can automatically trigger builds and tests for all affected downstream components, applications, and services, so interactions between different parts of the system are tested before changes merge into the main development line, leading to higher confidence in the overall stability and correctness of the entire system.

Addressing Practical Considerations, Challenges, and Potential Drawbacks

While making a strong and persuasive case for the advantages of a monorepo, Thierry also presented a balanced and realistic view by addressing the practical considerations, significant challenges, and potential drawbacks associated with this approach. He acknowledged that managing and scaling the underlying tooling (such as version control systems like Git or Mercurial, build systems like Bazel or Pants, and Continuous Integration infrastructure) to handle a massive monorepo containing millions of lines of code and potentially thousands of developers requires significant investment in infrastructure, tooling development, and specialized expertise. Companies like Google, Facebook, and Microsoft have had to develop highly sophisticated custom solutions or heavily adapt and extend existing open-source tools to manage their enormous repositories efficiently and maintain performance. Thierry noted that contributions from these leading companies back to open-source projects like Git and Mercurial are gradually making monorepo tooling more accessible and performant for other organizations.

He also pointed out that successfully adopting, implementing, and leveraging a monorepo effectively necessitates a strong and mature engineering culture characterized by high levels of transparency, trust, communication, and effective collaboration across different teams and organizational boundaries. If teams operate in silos with poor communication channels and a lack of awareness of work happening elsewhere in the codebase, a monorepo can potentially exacerbate issues related to unintentional breaking changes or conflicting work rather than helping to solve them. Thierry suggested that a full, immediate, “big bang” switch to a monorepo might not be feasible, practical, or advisable for all organizations. A phased or incremental approach, perhaps starting with new projects, consolidating code within a specific department or domain, or gradually migrating related services into a monorepo, could be a more manageable and lower-risk way to transition and build the necessary tooling, processes, and cultural practices over time. The talk provided a nuanced and well-rounded perspective, encouraging organizations to carefully consider the significant potential benefits of a monorepo for improving collaboration, code sharing, and CI efficiency, while being acutely mindful of the required investment in tooling, infrastructure, and, critically, the importance of fostering a collaborative and transparent engineering culture.

Hashtags: #Monorepo #CodeOrganization #EngineeringPractices #ThierryAbalea #SoftwareArchitecture #VersionControl #ContinuousIntegration #Collaboration #Google #Facebook #Twitter #DeveloperProductivity

PostHeaderIcon [DevoxxFR] Kill Your Branches, Do Feature Toggles

For many software development teams, managing feature branches in version control can be a source of significant pain and delays, particularly when branches diverge over long periods, leading to complex and time-consuming merge conflicts. Morgan Leroi proposed an alternative strategy: minimize or eliminate long-lived feature branches in favor of using Feature Toggles. His presentation explored the concepts behind feature toggles, their benefits, and shared practical experience on how this approach can streamline development workflows and enable new capabilities like activating features on demand.

Morgan opened by illustrating the common frustration associated with merging branches that have diverged significantly, describing it as a “traumatic experience”. This pain point underscores the need for development practices that reduce the time code spends in isolation before being integrated.

Embracing Feature Toggles

Feature Toggles, also known as Feature Flags, are a technique that allows developers to enable or disable specific features in an application at runtime, without deploying new code. The core idea is to merge code frequently into the main development branch (e.g., main or master), even if features are not yet complete or ready for production release. The incomplete or experimental features are wrapped in toggles that can be controlled externally.

Morgan explained that this approach addresses the merge hell problem by ensuring code is integrated continuously in small increments, minimizing divergence. It also decouples deployment from release; code containing new features can be deployed to production disabled, and the feature can be “released” or activated later via the toggle when ready.

Practical Benefits and Use Cases

Beyond simplifying merging, Feature Toggles offer several tangible benefits. Morgan highlighted their use by major industry players, including Amazon, demonstrating their effectiveness at scale. A key advantage is the ability to activate new features on demand, for specific user groups, or even for individual users. This enables phased rollouts, A/B testing, and easier rollback if a feature proves problematic.

Morgan detailed the application of feature toggles in A/B testing scenarios. By showing different versions of a feature (or the presence/absence of a feature) to different user segments, teams can collect metrics on user behavior and make data-driven decisions about which version is more effective. This allows for continuous experimentation and optimization based on real-world usage. He suggested that even a simple boolean configuration toggle (if (featureIsEnabled) { ... }) can be a starting point. Morgan encouraged developers to consider feature toggles as a powerful tool for improving development flow, reducing merge pain, and gaining flexibility in releasing new functionality. He challenged attendees to reflect on whether their current branching strategy is serving them well and to consider experimenting with feature toggles. Morgan Leroi is a Staff Software Engineer at Algolia.
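A hand-rolled toggle with percentage-based rollout and stable per-user bucketing, the kind of simple starting point alluded to above, might look like the following. The class, feature names, and bucketing scheme are illustrative, not taken from the talk; production systems typically delegate this to a dedicated flag service:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal feature-toggle sketch with percentage-based rollout. A feature at
// 0% is off for everyone, at 100% it is on for everyone, and anything in
// between enables it for a stable subset of users (useful for A/B tests).
public class FeatureToggles {
    private final Map<String, Integer> rolloutPercent = new ConcurrentHashMap<>();

    // Runtime control: flipping a percentage releases a feature without a deploy.
    public void setRollout(String feature, int percent) {
        rolloutPercent.put(feature, percent);
    }

    // Deterministic per-user bucketing: the same user always lands in the
    // same bucket for a given feature, keeping A/B cohorts stable across
    // requests so collected metrics stay meaningful.
    public boolean isEnabled(String feature, String userId) {
        int percent = rolloutPercent.getOrDefault(feature, 0);
        int bucket = Math.abs((feature + ":" + userId).hashCode()) % 100;
        return bucket < percent;
    }
}
```

Call sites then reduce to the simple guard mentioned above, e.g. if (toggles.isEnabled("newCheckout", userId)) { ... }, and a problematic feature is rolled back by setting its percentage to zero rather than by reverting and redeploying code.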

Hashtags: #FeatureToggles #BranchingStrategy #ContinuousDelivery #DevOps #SoftwareDevelopment #Agile #MorganLeroi #DevoxxFR2017