[VivaTech 2019] Funding and Growing Tomorrow’s Unicorns
A 25-minute panel at VivaTech 2019, available on YouTube and moderated by Emmanuelle Duten of Les Echos/Capital Finance, featured Philippe Botteri, Partner at Accel, Virginie Morgon, CEO of Eurazeo, and David Thevenon, Partner at SoftBank Investment Advisers. All three panelists are active on LinkedIn. They discussed Europe’s unicorn boom, and this post, written for investors, entrepreneurs, and policymakers, explores the drivers behind it.
Europe’s Unicorn Momentum
Philippe highlighted Europe’s unicorn surge, with 17–18 created in 2018, fueled by $23 billion in investments. Accel’s $575 million fund targets 22 European cities, a shift from London-Tel Aviv dominance 15 years ago. Virginie noted that two-thirds of 2018’s new unicorns were European, driven by ambitious founders and growing growth capital. David emphasized Europe’s robust ecosystem, with SoftBank’s investments in Germany and beyond, signaling that the region now rivals global hubs, supported by professional early-stage and growth investors.
Characteristics of Unicorn Founders
Virginie stressed that unicorn founders, like Farfetch’s José Neves, exhibit exceptional execution and ambition, mastering complex platforms (e.g., logistics, delivery). Philippe cited Doctolib’s Stan, whose passion for transforming European healthcare inspired Accel’s Series A investment, now a unicorn. David pointed to OYO’s Ritesh Agarwal, scaling from 13 to 200,000 hotel rooms since 2015, driven by urgency and global vision. These founders combine strategic thinking, platform-building (e.g., Grab’s shift to financial services), and relentless focus, distinguishing them in competitive markets.
Supporting Unicorn Growth
Beyond capital, VCs provide operational support. Philippe’s Accel leverages its global network (Silicon Valley, London, India) to help software startups relocate to the U.S., hiring top sales talent. Virginie’s Eurazeo offers strategic guidance, from commercial partnerships to U.S. expansion, as seen with ContentSquare. David’s SoftBank provides long-term capital (12-year funds) and market access (China, Latin America), fostering peace of mind for innovation. This hands-on partnership—$100 million on average to reach unicorn status—ensures rapid scaling, even if profitability lags behind growth.
Navigating the Application Lifecycle in Kubernetes
At Devoxx France 2019, Charles Sabourdin and Jean-Christophe Sirot, seasoned professionals in cloud-native technologies, delivered an extensive exploration of managing application lifecycles within Kubernetes. Charles, an architect with over 15 years in Linux and Java, and Jean-Christophe, a Docker expert since 2002, combined their expertise to demystify Docker’s underpinnings, Kubernetes’ orchestration, and the practicalities of continuous integration and delivery (CI/CD). Through demos and real-world insights, they addressed security challenges across development and business-as-usual (BAU) phases, proposing organizational strategies to streamline containerized workflows. This post captures their comprehensive session, offering a roadmap for developers and operations teams navigating Kubernetes ecosystems.
Docker’s Foundations: Isolation and Layered Efficiency
Charles opened the session by revisiting Docker’s core principles, emphasizing its reliance on Linux kernel features like namespaces and control groups (cgroups). Unlike virtual machines (VMs), which bundle entire operating systems, Docker containers share the host kernel, isolating processes within lightweight environments. This design achieves hyper-density, allowing more containers to run on a single machine compared to VMs. Charles demonstrated launching a container, highlighting its process isolation using commands like ps within a containerized bash session, contrasting it with the host’s process list. He introduced Docker’s layer system, where images are built as immutable, stacked deltas, optimizing storage through shared base layers. Tools like Dive, he noted, help inspect these layers, revealing command histories and suggesting size optimizations. This foundation sets the stage for Kubernetes, enabling efficient, portable application delivery across environments.
Kubernetes: Orchestrating Scalable Deployments
Jean-Christophe transitioned to Kubernetes, describing it as a resource orchestrator that manages containerized applications across node pools. Kubernetes abstracts infrastructure complexities, using declarative configurations to maintain desired application states. Key components include pods—the smallest deployable units housing containers—replica sets for scaling, and deployments for managing updates. Charles demonstrated creating a namespace and deploying a sample application using kubectl run, which scaffolds deployments, replica sets, and pods. He showcased rolling updates, where Kubernetes progressively replaces pods to ensure zero downtime, configurable via parameters like maxSurge and maxUnavailable. The duo emphasized Kubernetes’ auto-scaling capabilities, which adjust pod counts based on load, and the importance of defining resource limits to prevent performance bottlenecks. Their demo underscored Kubernetes’ role in achieving resilient, scalable deployments, aligning with hyper-density goals.
CI/CD Pipelines: Propagating Versions Seamlessly
The session delved into CI/CD pipelines, illustrating how Docker tags facilitate version propagation across development, pre-production, and production environments. Charles outlined a standard process: developers build Docker images tagged with version numbers (e.g., 1.1, 1.2) or environment labels (e.g., prod, staging). These images, stored in registries like Docker Hub or private repositories, are pulled by Kubernetes clusters for deployment. Jean-Christophe highlighted debates around tagging strategies, noting that version-based tags ensure traceability, while environment tags simplify environment-specific deployments. Their demo integrated tools like Jenkins and JFrog Artifactory, automating builds, tests, and deployments. They stressed the need for robust pipeline configurations to avoid resource overuse, citing Jenkins’ default manual build triggers for tagged releases as a safeguard. This pipeline approach ensures consistent, automated delivery, bridging development and production.
Security Across the Lifecycle: Development vs. BAU
Security emerged as a central theme, with Charles contrasting development and BAU phases. During development, teams rapidly address Common Vulnerabilities and Exposures (CVEs) with frequent releases, leveraging tools like JFrog Xray and Clair to scan images for vulnerabilities. Xray integrates with Artifactory, while Clair, an open-source solution, scans registry images for known CVEs. However, in BAU, where releases are less frequent, unpatched vulnerabilities pose greater risks. Charles shared an anecdote about a PHP project where a dependency switch broke builds after two years, underscoring the need for ongoing maintenance. They advocated for practices like running containers in read-only mode and using non-root users to minimize attack surfaces. Tools like OWASP Dependency-Track, they suggested, could enhance visibility into library vulnerabilities, though current scanners often miss non-package dependencies. This dichotomy highlights the need for automated, proactive security measures throughout the lifecycle.
Organizational Strategies: Balancing Complexity and Responsibility
Drawing from their experiences, Charles and Jean-Christophe proposed organizational solutions to manage Kubernetes complexity. They introduced a “1-2-3 model” for image management: Level 1 uses vendor-provided images (e.g., official MySQL images) managed by operations; Level 2 involves base images built by dedicated teams, incorporating standardized tooling; and Level 3 allows project-specific images, with teams assuming maintenance responsibilities. This model clarifies ownership, reducing risks like disappearing maintainers when projects transition to BAU. They emphasized cross-team collaboration, encouraging developers and operations to share knowledge and align on practices like Dockerfile authorship and resource allocation in YAML configurations. Charles reflected on historical DevOps silos, advocating for shared vocabularies and traceable decisions to navigate evolving best practices. Their firsthand feedback underscored the importance of balancing automation with human oversight to maintain robust, secure Kubernetes environments.
Links:
- Devoxx France 2019 Video
- Kubernetes Documentation
- Docker Documentation
- JFrog Xray Documentation
- Clair GitHub Repository
- OWASP Dependency-Track
- Dive GitHub Repository
Hashtags: #Kubernetes #Docker #DevOps #CICD #Security #DevoxxFR #CharlesSabourdin #JeanChristopheSirot #JFrog #Clair
[DevoxxFR 2019] Back to Basics: Stop Wasting Time with Dates
At Devoxx France 2019, Frédéric Camblor, a web developer at 4SH in Bordeaux, delivered an insightful session on mastering date and time handling in software applications. Drawing from years of noting real-world issues in a notebook, Frédéric aimed to equip developers with the right questions to ask when working with dates, ensuring they avoid common pitfalls like time zone mismatches, daylight saving time (DST) quirks, and leap seconds.
Understanding Time Fundamentals
Frédéric began by exploring the historical context of time measurement, contrasting ancient solar-based “true time” with modern standardized systems. He introduced Greenwich Mean Time (GMT), now deprecated in favor of Coordinated Universal Time (UTC), which is based on International Atomic Time. UTC, defined by the highly regular oscillations of cesium-133 atoms (9,192,631,770 per second), is geopolitically agnostic, free from DST or seasonal shifts; the Unix epoch used in computing is fixed at January 1, 1970, 00:00:00 UTC.
The distinction between GMT and UTC lies in the irregularity of Earth’s rotation, affected by tidal forces and earthquakes. To keep astronomical time (UT1) aligned with atomic time, leap seconds may be inserted at the end of June or December, as decided by the International Earth Rotation and Reference Systems Service (IERS). In Java, these leap seconds are smoothed over the last 1,000 seconds of June or December, making them transparent to developers. Frédéric emphasized the role of the Network Time Protocol (NTP), which synchronizes computer clocks to atomic time via a global network of root nodes, ensuring sub-second accuracy despite local quartz oscillator drift.
Time Representations in Software
Frédéric outlined three key time representations developers encounter: timestamps, ISO 8601 datetimes, and local dates/times. Timestamps, the simplest, count seconds or milliseconds since the 1970 epoch but face limitations, such as the 2038 overflow issue on 32-bit systems (though mitigated in Java). ISO 8601 datetimes (e.g., 2019-04-18T12:00:00+01:00) offer human-readable precision with time zone offsets, enhancing clarity over raw timestamps. Local dates/times, however, are complex, often lacking explicit time zone or DST context, leading to ambiguities in scenarios like recurring meetings.
Each representation has trade-offs. Timestamps are precise but opaque, ISO 8601 is readable but requires parsing, and local times carry implicit assumptions that can cause bugs if not clarified. Frédéric urged developers to choose representations thoughtfully based on application needs.
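To make the trade-offs concrete, here is a minimal sketch using java.time, shown in Kotlin (the specific values are illustrative): the timestamp and the ISO 8601 value pin down an exact instant, while the local date/time does not.

```kotlin
import java.time.Instant
import java.time.LocalDateTime
import java.time.OffsetDateTime

fun main() {
    // 1. Timestamp: a count of seconds (or milliseconds) since the 1970 epoch; precise but opaque.
    val timestamp = Instant.ofEpochSecond(1_555_588_800L)
    println(timestamp)               // 2019-04-18T12:00:00Z

    // 2. ISO 8601 with an explicit offset: human-readable and still unambiguous.
    val iso = OffsetDateTime.parse("2019-04-18T12:00:00+01:00")
    println(iso.toInstant())         // 2019-04-18T11:00:00Z

    // 3. Local date/time: no offset or zone, so it cannot be placed on the timeline by itself.
    val local = LocalDateTime.parse("2019-04-18T12:00:00")
    println(local)                   // 2019-04-18T12:00 (its meaning depends on implicit context)
}
```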
Navigating Time Zones and DST
Time zones, defined by the IANA database, are geopolitical regions with uniform time rules, distinct from time zone offsets (e.g., UTC+1). Frédéric clarified that a time zone like Europe/Paris can yield different offsets (UTC+1 or UTC+2) depending on DST, which requires a time zone table to resolve. These tables, updated frequently (e.g., nine releases in 2018), reflect geopolitical changes, such as Russia’s abrupt time zone shifts or the EU’s 2018 consultation to abolish DST by 2023. Frédéric highlighted the importance of updating time zone data in systems like Java (via JRE updates or TZUpdater), MySQL, or Node.js to avoid outdated rules.
DST introduces further complexity, creating “local time gaps” during spring transitions (e.g., 2:00–3:00 AM doesn’t exist) and overlaps in fall (e.g., 2:00–3:00 AM occurs twice). Libraries handle these differently: Moment.js adjusts invalid times, while Java throws exceptions. Frédéric warned against scheduling tasks like CRON jobs at local times prone to DST shifts (e.g., 2:30 AM), recommending UTC-based scheduling to avoid missed or duplicated executions.
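A small Kotlin sketch against java.time illustrates both effects for Europe/Paris in 2019 (the dates used are that year’s actual transition days): the spring-forward day lasts only 23 hours, and the fall-back overlap can be resolved to either offset.

```kotlin
import java.time.Duration
import java.time.LocalDate
import java.time.ZoneId

fun main() {
    val paris = ZoneId.of("Europe/Paris")

    // Spring forward (2019-03-31): clocks jump from 02:00 to 03:00, so the day lasts 23 hours.
    val springDay = LocalDate.of(2019, 3, 31)
    val dayLength = Duration.between(
        springDay.atStartOfDay(paris),
        springDay.plusDays(1).atStartOfDay(paris)
    )
    println("2019-03-31 in Europe/Paris lasts $dayLength")   // PT23H

    // Fall back (2019-10-27): 02:30 exists twice; the API lets you pick which occurrence you mean.
    val ambiguous = LocalDate.of(2019, 10, 27).atTime(2, 30).atZone(paris)
    println(ambiguous.withEarlierOffsetAtOverlap())          // 2019-10-27T02:30+02:00[Europe/Paris]
    println(ambiguous.withLaterOffsetAtOverlap())            // 2019-10-27T02:30+01:00[Europe/Paris]
}
```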
Common Pitfalls and Misconceptions
Frédéric debunked several myths, such as “a day is always 24 hours” or “comparing dates is simple.” DST can result in 23- or 25-hour days, and leap years (every four years, except centurial years not divisible by 400) add complexity. For instance, 2000 was a leap year, but 2100 won’t be. Comparing dates requires distinguishing between equality (same moment) and identity (same time zone), as Java’s equals() and isEqual() methods behave differently.
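The difference is easy to demonstrate with java.time’s OffsetDateTime (a Kotlin sketch; the timestamps are arbitrary): the two values below denote the same instant but carry different offsets.

```kotlin
import java.time.OffsetDateTime

fun main() {
    val parisNoon = OffsetDateTime.parse("2019-04-18T12:00:00+02:00")
    val utcTen = OffsetDateTime.parse("2019-04-18T10:00:00Z")

    println(parisNoon == utcTen)       // false: equals() also compares the local date-time and offset
    println(parisNoon.isEqual(utcTen)) // true: isEqual() only compares the position on the timeline
}
```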
JavaScript’s Date object was singled out for its flaws, including inconsistent parsing (dashes vs. slashes shift time zones), zero-based months, and unreliable handling of pre-1970 dates. Frédéric recommended using libraries like Moment.js, Moment-timezone, or Luxon to mitigate these issues. He also highlighted edge cases, such as the non-existent December 30, 2011, in Samoa due to a time zone shift, which can break calendar applications.
Best Practices for Robust Date Handling
Frédéric shared practical strategies drawn from real-world experience. Servers and databases should operate in UTC to avoid DST issues and expose conversion bugs when client and server time zones differ. For searches involving local dates (e.g., retrieving messages by date), he advocated defining a date range (e.g., 00:00–23:59 in the user’s time zone) rather than a single date to account for implicit time zone assumptions. Storing future dates requires capturing the user’s time zone to handle potential rule changes.
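As a sketch of that date-range approach (Kotlin, with a hypothetical dayRangeInUtc helper not from the talk), the user’s local calendar day is translated into an explicit UTC interval that a UTC-based server or database can query directly.

```kotlin
import java.time.Instant
import java.time.LocalDate
import java.time.ZoneId

// Hypothetical helper: convert a user-supplied calendar day into a half-open UTC interval.
fun dayRangeInUtc(day: LocalDate, userZone: ZoneId): Pair<Instant, Instant> {
    val start = day.atStartOfDay(userZone).toInstant()
    val end = day.plusDays(1).atStartOfDay(userZone).toInstant() // exclusive upper bound
    return start to end
}

fun main() {
    val (from, to) = dayRangeInUtc(LocalDate.of(2019, 4, 18), ZoneId.of("Europe/Paris"))
    println("$from .. $to") // 2019-04-17T22:00:00Z .. 2019-04-18T22:00:00Z
}
```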
For time-only patterns (e.g., recurring 3:00 PM meetings), storing the user’s time zone is critical to resolve DST ambiguities. Frédéric advised against storing times in datetime fields (e.g., as 1970-01-01T15:00:00), recommending string storage with time zone metadata. For date-only patterns like birthdays, using dedicated data structures prevents inappropriate operations, and storing at 12:00 UTC minimizes time zone shift bugs. Finally, he cautioned against local datetimes without time zones, as they cannot be reliably placed on a timeline.
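For the recurring-meeting case, a minimal sketch (Kotlin, with a hypothetical RecurringTime type) stores only the wall-clock time plus the user’s time zone and resolves a concrete instant only once a specific date is known, so later DST or rule changes are applied automatically.

```kotlin
import java.time.LocalDate
import java.time.LocalTime
import java.time.ZoneId

// Hypothetical storage model: the wall-clock time and the user's zone, nothing else.
data class RecurringTime(val time: LocalTime, val zone: ZoneId)

fun main() {
    val meeting = RecurringTime(LocalTime.of(15, 0), ZoneId.of("Europe/Paris"))

    // Resolve one occurrence: July 1, 2019 falls in summer time (UTC+2), so 15:00 local is 13:00Z.
    val occurrence = LocalDate.of(2019, 7, 1).atTime(meeting.time).atZone(meeting.zone)
    println(occurrence.toInstant()) // 2019-07-01T13:00:00Z
}
```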
Frédéric concluded by urging developers to question assumptions, update time zone data, and use appropriate time scales. His engaging talk, blending humor, history, and hard-earned lessons, left attendees ready to tackle date and time challenges with confidence.
Hashtags: #DateTime #TimeZones #DST #ISO8601 #UTC #DevoxxFR2019 #FrédéricCamblor #4SH #Java #JavaScript
Gradle: A Love-Hate Journey at Margot Bank
At Devoxx France 2019, David Wursteisen and Jérémy Martinez, developers at Margot Bank, delivered a candid talk on their experience with Gradle while building a core banking system from scratch. Their 45-minute session, “Gradle, je t’aime: moi non plus,” explored why they chose Gradle over alternatives, its developer-friendly features, script maintenance strategies, and persistent challenges like memory consumption. This post dives into their insights, offering a comprehensive guide for developers navigating build tools in complex projects.
Choosing Gradle for a Modern Banking System
Margot Bank, a startup redefining corporate banking, embarked on an ambitious project in 2017 to rebuild its IT infrastructure, including a core banking system (CBS) with Kotlin and Java modules. The CBS comprised applications for payments, data management, and a central “core” module, all orchestrated with microservices. Selecting a build tool was critical, given the need for speed, flexibility, and scalability. The team evaluated Maven, SBT, Bazel, and Gradle. Maven, widely used in Java ecosystems, lacked frequent updates, risking obsolescence. SBT’s Scala-based DSL added complexity, unsuitable for a Kotlin-focused stack. Bazel, while powerful for monorepos, didn’t support generic languages well. Gradle emerged as the winner, thanks to its task-based architecture, where tasks like compile, jar, and assemble form a dependency graph, executing only modified components. This incremental build system saved time, crucial for Margot’s rapid iterations. Frequent releases (e.g., Gradle 5.1.1 in 2019) and a dynamic Groovy DSL further cemented its appeal, aligning with Devoxx’s emphasis on modern build tools.
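As an illustration of that task graph, here is a small Gradle Kotlin DSL sketch (the packageDocs task and its paths are invented for the example, not taken from Margot Bank’s build): a task declares its inputs and outputs, is wired into assemble, and is skipped on later runs when nothing changed.

```kotlin
// build.gradle.kts (illustrative); assumes the java or base plugin is applied, which provides "assemble".
tasks.register<Zip>("packageDocs") {
    from(layout.projectDirectory.dir("docs"))                     // declared input
    destinationDirectory.set(layout.buildDirectory.dir("dist"))   // declared output
    archiveFileName.set("docs.zip")
}

tasks.named("assemble") {
    dependsOn("packageDocs") // assemble now pulls packageDocs in, but only reruns it if docs/ changed
}
```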
Streamlining Development with Gradle’s Features
Gradle’s developer experience shone at Margot Bank, particularly with IntelliJ IDEA integration. The IDE auto-detected source sets (e.g., main, test, integrationTest) and tasks, enabling seamless task execution. Eclipse support, though less polished, handled basic imports. The Gradle Wrapper, a binary committed to repositories, automated setup by downloading the specified Gradle version (e.g., 5.1.1) from a custom URL, secured with checksums. This ensured consistency across developer machines, a boon for onboarding. Dependency management leveraged dynamic configurations like api and implementation. For example, marking a third-party client like AmazingMail as implementation in a web app module hid its classes from transitive dependencies, reducing coupling. Composite builds, introduced in recent Gradle versions, allowed local projects (e.g., a mailer module) to be linked without publishing to Maven Local, streamlining multi-project workflows. A notable pain point was disk usage: open-source projects’ varying Gradle versions accumulated 4GB on developers’ machines, as IntelliJ redundantly downloaded sources alongside binaries. Addressing an audience question, the team emphasized selective caching (e.g., wrapper binaries) to mitigate overhead, highlighting Gradle’s balance of power and complexity.
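A Gradle Kotlin DSL sketch of that dependency split (the module names and AmazingMail coordinates are hypothetical) shows how implementation keeps a third-party client off consumers’ compile classpaths while api deliberately exposes a shared module.

```kotlin
// build.gradle.kts of the web-app module (illustrative).
plugins {
    `java-library`
}

dependencies {
    // Exposed: modules depending on web-app also see :core on their compile classpath.
    api(project(":core"))

    // Hidden: AmazingMail stays an internal detail of this module.
    implementation("com.amazingmail:client:1.2.3")

    testImplementation("org.junit.jupiter:junit-jupiter:5.3.1")
}
```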
Enhancing Builds with Plugins and Kotlin DSL
For script maintainers, standardizing configurations across Margot’s projects was paramount. The team developed an internal Gradle plugin to centralize settings for linting (e.g., Ktlint), Nexus repositories, and releases. Applied via apply plugin: 'com.margotbank.standard', it ensured uniformity, reducing configuration drift. For project-specific logic, buildSrc proved revolutionary. This module housed Kotlin code for tasks like version management, keeping build.gradle files declarative. For instance, a Versions.kt object centralized dependency versions (e.g., junit:5.3.1), with unused ones grayed out in IntelliJ for cleanup. Migrating from Groovy to Kotlin DSL brought static typing benefits: autocompletion, refactoring, and navigation. A sourceSets.create("integrationTest") call, though verbose, clarified intent compared to Groovy’s dynamic integrationTest {}. Migration was iterative, file by file, avoiding disruptions. Challenges included verbose syntax for plugins like JaCoCo, requiring explicit casts. A buildSrc extension for commit message parsing (e.g., extracting Git SHAs) exemplified declarative simplicity. This approach, inspired by Devoxx’s focus on maintainable scripts, empowered developers to contribute to shared tooling, fostering collaboration across teams.
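The buildSrc approach can be sketched as a plain Kotlin object (the exact names and versions here are illustrative) that every build script can reference, keeping version strings out of the scripts themselves.

```kotlin
// buildSrc/src/main/kotlin/Versions.kt (illustrative)
object Versions {
    const val junit = "5.3.1"
    const val ktlint = "0.31.0" // assumed version, for illustration only
}

// Usage in any build.gradle.kts:
// dependencies {
//     testImplementation("org.junit.jupiter:junit-jupiter:${Versions.junit}")
// }
```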
Navigating Performance and Memory Challenges
Gradle’s performance, driven by daemons that keep processes in memory, was a double-edged sword. Daemons reduced startup time, but multiple instances (e.g., 5.1.1 and 5.0.10) occasionally ran concurrently, consuming excessive RAM. On CI servers, Gradle crashed under heavy loads, prompting tweaks: disabling daemons, adjusting Docker memory, and upgrading to Gradle 4.4.5 for better memory optimization. Diagnostics remained elusive, as crashes stemmed from either Gradle or the Kotlin compiler. Configuration tweaks like enabling caching (org.gradle.caching=true) and parallel task execution (org.gradle.parallel=true) improved build times, but required careful tuning. The team allocated maximum heap space (-Xmx4g) upfront to handle large builds, reflecting Margot’s resource-intensive CI pipeline. An audience question on caching underscored selective imports (e.g., excluding redundant sources) to optimize costs. Looking ahead, Margot planned to leverage build caching for granular task reuse and explore tools like Build Queue for cleaner pipelines. Despite frustrations, Gradle’s flexibility and evolving features—showcased at Devoxx—made it indispensable, though memory management demanded ongoing vigilance.
Hashtags: #Gradle #KotlinDSL #BuildTools #DavidWursteisen #JeremyMartinez #DevoxxFrance2019
[VivaTech 2018] How VCs Are Growing Tomorrow’s Euro-corns
Philippe Botteri and Bernard Liautaud, moderated by Emmanuelle Duten, Editor-in-Chief at Les Echos/Capital Finance, explored how venture capitalists foster European unicorns at VivaTech 2018. Recorded in Paris and available on YouTube, this panel featuring Accel and Balderton Capital discusses creating ecosystems for startups to thrive. This post delves into their strategies across three subsections. Philippe and Bernard are both on LinkedIn, and more details are available on the VivaTech website.
Building a Robust Ecosystem
Philippe highlights Europe’s progress in producing high-value exits, citing Spotify’s $30 billion exit in 2018, surpassing U.S. (Dropbox, $12 billion) and Asian (Flipkart, $20 billion) counterparts. This shift reflects a maturing ecosystem, with Europe’s 25 unicorns trailing the U.S.’s 110 and China’s 60. Bernard emphasizes the ecosystem’s growth since 2006, driven by experienced entrepreneurs, global VCs, and ambitious talent. Top French universities now see 20-30% of graduates joining startups, not banks, signaling a cultural shift toward innovation.
The availability of capital, especially at early stages, supports this growth. Bernard notes that European funds have tripled in size, fostering competition and higher valuations. However, late-stage funding lags, with European champions raising $1.5 billion on average compared to $7.5 billion for U.S. firms. Philippe sees this as a maturity gap, not a failure, with Europe catching up rapidly through global ambitions and talent influx.
Prioritizing Sustainable Growth
Bernard argues that unicorn status, often driven by a single investor’s enthusiasm, is a misleading metric. He advocates focusing on revenue and long-term impact, aiming to build companies with $1-10 billion in revenue over 10-15 years. A billion-dollar valuation doesn’t guarantee sustainability; some firms reach $1 billion with just $50 million in revenue, only to be acquired. Spotify, generating over $1 billion quarterly, exemplifies the ideal: a scalable, high-revenue business.
Philippe counters that valuations reflect potential, not current worth. A $1 billion price tag signals a VC’s belief in a $5-10 billion future outcome, balanced against the risk of failure. Rapid technology adoption drives larger outcomes, justifying higher valuations. Both agree that sustainable growth requires aligning capital, talent, and ambition to create enduring European giants, not fleeting unicorns.
Navigating Capital and Exits
The influx of late-stage capital, like SoftBank’s $100 billion Vision Fund, creates winner-takes-all dynamics in certain sectors. Bernard notes this gives funded companies an edge but doesn’t suit all industries, where multiple players can coexist. Philippe emphasizes liquidity for employees and founders, critical for retaining talent. Late-stage rounds and secondary sales provide this, delaying IPOs but ensuring stakeholders benefit.
Emmanuelle raises audience concerns about overvaluation and bubbles. Both panelists dismiss bubble fears, describing the market as vibrant, not overheated. Philippe notes that competition on hot deals may inflate valuations, but real metrics—consumer and B2B growth—underpin most successes. Bernard predicts cyclical downturns but sees no systemic risk, with Europe’s ecosystem poised to produce innovative, global leaders.
Hashtags: #VivaTech #PhilippeBotteri #BernardLiautaud #Accel #BaldertonCapital #Unicorns #VentureCapital #Spotify #EuropeanStartups
[DevoxxFR 2018] Java in Docker: Best Practices for Production
The practice of running Java applications within Docker containers has become widely adopted in modern software deployment, yet it is not devoid of potential challenges, particularly when transitioning to production environments. Charles Sabourdin, a freelance architect, and Jean-Christophe Sirot, an engineer at Docker, collaborated at DevoxxFR2018 to share their valuable experiences and disseminate best practices for optimizing Java applications inside Docker containers. Their insightful talk directly addressed common and often frustrating issues, such as containers crashing unexpectedly, applications consuming excessive RAM leading to node instability, and encountering CPU throttling. They offered practical solutions and configurations aimed at ensuring smoother and more reliable production deployments for Java workloads.
Navigating Common Pitfalls: Why Operations Teams May Approach Java Containers with Caution
The presenters initiated their session with a touch of humor, explaining why operations teams might exhibit a degree of apprehension when tasked with deploying a containerized Java application into a production setting. It’s a common scenario: containers that perform flawlessly on a developer’s local machine can begin to behave erratically or fail outright in production. This discrepancy often stems from a fundamental misunderstanding of how the Java Virtual Machine (JVM) interacts with the resource limits imposed by the container’s control groups (cgroups). Several key problems frequently surface in this context. Perhaps the most common is memory mismanagement; the JVM, particularly older versions, might not be inherently aware of the memory limits defined for its container by the cgroup. This lack of awareness can lead the JVM to attempt to allocate and use more memory than has been allocated to the container by the orchestrator or runtime. Such overconsumption inevitably results in the container being abruptly terminated by the operating system’s Out-Of-Memory (OOM) killer, a situation that can be difficult to diagnose without understanding this interaction.
Similarly, CPU resource allocation can present challenges. The JVM might not accurately perceive the CPU resources available to it within the container, such as CPU shares or quotas defined by cgroups. This can lead to suboptimal decisions in sizing internal thread pools (like the common ForkJoinPool or garbage collection threads) or can cause the application to experience unexpected CPU throttling, impacting performance. Another frequent issue is Docker image bloat. Overly large Docker images not only increase deployment times across the infrastructure but also expand the potential attack surface by including unnecessary libraries or tools, thereby posing security vulnerabilities. The talk aimed to equip developers and operations personnel with the knowledge to anticipate and mitigate these common pitfalls. During the presentation, a demonstration application, humorously named “ressources-munger,” was used to simulate these problems, clearly showing how an application could consume excessive memory leading to an OOM kill by Docker, or how it might trigger excessive swapping if not configured correctly, severely degrading performance.
JVM Memory Management and CPU Considerations within Containers
A significant portion of the discussion was dedicated to the intricacies of JVM memory management within the containerized environment. Charles and Jean-Christophe elaborated that older JVM versions, specifically those prior to Java 8 update 131 and Java 9, were not inherently “cgroup-aware”. This lack of awareness meant that the JVM’s default heap sizing heuristics—for example, typically allocating up to one-quarter of the physical host’s memory for the heap—would be based on the total resources of the host machine rather than the specific limits imposed on the container by its cgroup. This behavior is a primary contributor to unexpected OOM kills when the container’s actual memory limit is much lower than what the JVM assumes based on the host.
Several best practices were shared to address these memory-related issues effectively. The foremost recommendation is to use cgroup-aware JVM versions. Modern Java releases, particularly Java 8 update 191 and later, and Java 10 and newer, incorporate significantly improved cgroup awareness. For older Java 8 updates (specifically 8u131 to 8u190), experimental flags such as -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap can be employed to enable the JVM to better respect container memory limits. In Java 10 and subsequent versions, this behavior became standard and often requires no special flags. However, even with cgroup-aware JVMs, explicitly setting the heap size using parameters like -Xms for the initial heap size and -Xmx for the maximum heap size is frequently a recommended practice for predictability and control. Newer JVMs also offer options like -XX:MaxRAMPercentage, allowing for more dynamic heap sizing relative to the container’s allocated memory.
It’s crucial to understand that the JVM’s total memory footprint extends beyond just the heap; it also requires memory for metaspace (which replaced PermGen in Java 8+), thread stacks, native libraries, and direct memory buffers. Therefore, when allocating memory to a container, it is essential to account for this total footprint, not merely the -Xmx value. A common guideline suggests that the Java heap might constitute around 50-75% of the total memory allocated to the container, with the remainder reserved for these other essential JVM components and any other processes running within the container. Tuning metaspace parameters, such as -XX:MetaspaceSize and -XX:MaxMetaspaceSize, can also prevent excessive native memory consumption, particularly in applications that dynamically load many classes.
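A quick way to check what the JVM actually perceives from inside a container (a minimal Kotlin sketch, not part of the talk’s demo) is to print the processor count and maximum heap it reports before tuning any flags.

```kotlin
fun main() {
    val runtime = Runtime.getRuntime()
    println("Available processors: ${runtime.availableProcessors()}")
    println("Max heap (bytes):     ${runtime.maxMemory()}")
    // On a cgroup-aware JVM inside a 512 MiB container, maxMemory() reflects the container limit
    // (by default roughly a quarter of it, unless -Xmx or -XX:MaxRAMPercentage overrides it).
}
```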
Regarding CPU resources, the presenters noted that the JVM’s perception of available processors is also influenced by its cgroup awareness. In environments where CPU resources are constrained, using flags like -XX:ActiveProcessorCount can be beneficial to explicitly inform the JVM about the number of CPUs it should consider for sizing its internal thread pools, such as the common ForkJoinPool or the threads used for garbage collection.
Optimizing the Docker image itself is another critical aspect of preparing Java applications for production. This involves choosing a minimal base image, such as alpine-jre, distroless, or official “slim” JRE images, instead of a full operating system distribution, to reduce the image size and potential attack surface. Utilizing multi-stage builds in the Dockerfile is a highly recommended technique; this allows developers to use a larger image containing build tools like Maven or Gradle and a full JDK in an initial stage, and then copy only the necessary application artifacts (like the JAR file) and a minimal JRE into a final, much smaller runtime image. Furthermore, being mindful of Docker image layering by combining related commands in the Dockerfile where possible can help reduce the number of layers and optimize image size. For applications on Java 9 and later, tools like jlink can be used to create custom, minimal JVM runtimes that include only the Java modules specifically required by the application, further reducing the image footprint. The session strongly emphasized that a collaborative approach between development and operations teams, combined with a thorough understanding of both JVM internals and Docker containerization principles, is paramount for successfully and reliably running Java applications in production environments.
Links:
- Docker Official Website
- OpenJDK Docker Hub Official Images
- Understanding JVM Memory Management (Baeldung)
Hashtags: #Java #Docker #JVM #Containerization #DevOps #Performance #MemoryManagement #DevoxxFR2018 #CharlesSabourdin #JeanChristopheSirot #BestPractices #ProductionReady #CloudNative
[DevoxxFR 2018] Watch Out! Don’t Plug in That USB, You Might Get Seriously Hacked!
The seemingly innocuous USB drive, a ubiquitous tool for data transfer and device charging, can harbor hidden dangers. At Devoxx France 2018, Aurélien Loyer and Nathan Damie, both from Zenika, delivered a cautionary and eye-opening presentation titled “Attention ! Ne mets pas cette clé tu risques de te faire hacker très fort !” (Watch Out! Don’t Plug in That USB, You Might Get Seriously Hacked!). They demonstrated how easily a modified USB device, commonly known as a “BadUSB,” can be used to execute arbitrary code on a victim’s computer, often by emulating a keyboard.
The speakers explained that these malicious devices often don’t rely on exploiting software vulnerabilities. Instead, they leverage a fundamental trust computers place in Human Interface Devices (HIDs), such as keyboards and mice. A BadUSB device, disguised as a regular flash drive or even embedded within a cable or other peripheral, can announce itself to the operating system as a keyboard and then rapidly type and execute commands – all without the user’s direct interaction beyond plugging it in.
What is BadUSB and How Does It Work?
Aurélien Loyer and Nathan Damie explained that many BadUSB devices are not standard USB flash drives but rather contain small microcontrollers like the Adafruit Trinket or Pro Micro. These microcontrollers are programmed (often using the Arduino IDE) to act as a Human Interface Device (HID), specifically a keyboard. When plugged into a computer, the operating system recognizes it as a keyboard and accepts input from it. The pre-programmed script on the microcontroller can then “type” a sequence of commands at high speed. This could involve opening a command prompt or terminal, downloading and executing malware from the internet, exfiltrating data, or performing other malicious actions.
The speakers demonstrated this live by plugging a device into a computer, which then automatically opened a text editor and typed a message, followed by executing commands to open a web browser and navigate to a specific URL. They showed the simplicity of the Arduino code required: essentially initializing the keyboard library and then sending keystroke commands (e.g., Keyboard.print(), Keyboard.press(), Keyboard.release()). More sophisticated attacks could involve a delay before execution, or triggering the payload based on certain conditions, making them harder to detect. Nathan even demonstrated a modified gaming controller that could harbor such a payload, executing it unexpectedly. The core danger lies in the fact that computers are generally designed to trust keyboard inputs without question.
Potential Dangers and Countermeasures
The implications of BadUSB attacks are significant. Aurélien and Nathan highlighted how easily these devices can be disguised. They showed examples of microcontrollers small enough to fit inside the plastic casing of a standard USB drive, or even integrated into USB cables or other peripherals like a mouse with a hidden logger. This makes visual inspection unreliable. The attack vector often relies on social engineering: an attacker might drop “lost” USB drives in a parking lot or other public area, hoping a curious individual will pick one up and plug it into their computer. Even seemingly harmless devices like e-cigarettes could potentially be weaponized if they contain a malicious microcontroller and are plugged into a USB port for charging.
As for countermeasures, the speakers emphasized caution. The most straightforward advice is to never plug in USB devices from unknown or untrusted sources. For situations where using an untrusted USB device for charging is unavoidable (though not recommended for data transfer), they mentioned “USB condoms” – small hardware dongles that physically block the data pins in a USB connection, allowing only power to pass through. However, this would render a data-carrying device like a flash drive unusable for its primary purpose. The session served as a stark reminder that physical security and user awareness are crucial components of overall cybersecurity, as even the most common peripherals can pose a threat.
Links:
- Zenika: https://www.zenika.com/
- SparkFun Pro Micro: https://www.sparkfun.com/products/12640 (the RP2040 version is a newer model)
Hashtags: #BadUSB #CyberSecurity #HardwareHacking #SocialEngineering #USB #Microcontroller #Arduino #Zenika #DevoxxFR2018 #PhysicalSecurity
[DevoxxFR 2018] Are you “merge” or “rebase” oriented?
Git, the distributed version control system, has become an indispensable tool in the modern developer’s arsenal, revolutionizing how teams collaborate on code. Its flexibility and power, however, come with a degree of complexity that can initially intimidate newcomers, particularly when it comes to integrating changes from different branches. At Devoxx France 2018, Jonathan Detoeuf, a freelance developer with a passion for Software Craftsmanship and Agile methodologies, tackled one of Git’s most debated topics in his presentation: “T’es plutôt merge ou rebase ?” (Are you more of a merge or rebase person?). He aimed to demystify these two fundamental Git commands, explaining their respective use cases, how to avoid common pitfalls, and ultimately, how to maintain a clean, understandable project history.
Jonathan began by underscoring the importance of the Git log (history) as a “Tower of Babel” – a repository of the team’s collective knowledge, containing not just the current source code but the entire evolution of the project. Well-crafted commit messages and a clear history are crucial for understanding past decisions, tracking down bugs, and onboarding new team members. With this premise, the choice between merging and rebasing becomes more than just a technical preference; it’s about how clearly and effectively a team communicates its development story through its Git history. Jonathan’s talk provided practical guidance, moving beyond the often-unhelpful official Git documentation that offers freedom but little explicit recommendation on when to use which strategy.
Understanding the Basics: Merge vs. Rebase and History
Before diving into specific recommendations, Jonathan Detoeuf revisited the core mechanics of merging and rebasing, emphasizing their impact on project history. A standard merge (often a “merge commit” when branches have diverged) integrates changes from one branch into another by creating a new commit that has two parent commits. This explicitly shows where a feature branch was merged back into a main line, preserving the historical context of parallel development. A fast-forward merge, on the other hand, occurs if the target branch hasn’t diverged; Git simply moves the branch pointer forward. Rebasing, in contrast, re-applies commits from one branch onto the tip of another, creating a linear history as if the changes were made sequentially. This can make the history look cleaner but rewrites it, potentially losing the context of when changes were originally made in relation to other branches.
Jonathan stressed the value of a well-maintained Git history. It’s not just a log of changes but a narrative of the project’s development. Clear commit messages are vital as they convey the intent behind changes. A good history allows for “archaeology” – understanding why a particular piece of code exists or how a bug was introduced, even years later when the original developers are no longer around. Therefore, the decision to merge or rebase should be guided by the desire to create a history that is both accurate and easy to understand. He cautioned that many developers fear losing code with Git, especially during conflict resolution, making it important to master these integration techniques.
The Case for Merging: Durable Branches and Significant Events
Jonathan Detoeuf advocated for using merge commits (specifically, non-fast-forward merges) primarily for integrating “durable” branches or marking significant events in the project’s lifecycle. Durable branches are long-lived branches like main, develop, or release branches. When merging one durable branch into another (e.g., merging a release branch into main), a merge commit clearly signifies this integration point. Similarly, a merge commit is appropriate for marking key milestones such as the completion of a release, the end of a sprint, or a deployment to production. These merge commits act as explicit markers in the history, making it easy to see when major features or versions were incorporated.
He contrasted this with merging minor feature branches where a simple fast-forward merge might be acceptable if the history remains clear, or if a rebase is preferred for a cleaner linear history before the final integration. The key is that the merge commit should add value by highlighting a significant integration point or preserving the context of a substantial piece of work being completed. If it’s just integrating a pull request for a small, self-contained feature that has been reviewed, other strategies like rebase followed by a fast-forward merge, or even “squash and merge,” might be preferable to avoid cluttering the main line history with trivial merge bubbles. Jonathan’s advice leans towards using merge commits judiciously to preserve meaningful historical context, especially for branches that represent a significant body of work or a persistent line of development.
The Case for Rebasing: Feature Branches and Keeping it Clean
Rebasing, according to Jonathan Detoeuf, finds its primary utility when working with local or short-lived feature branches before they are shared or merged into a more permanent branch. When a developer is working on a feature and the main branch (e.g., develop or main) has advanced, rebasing the feature branch onto the latest state of the main branch can help incorporate upstream changes cleanly. This process rewrites the feature branch’s history by applying its commits one by one on top of the new base, resulting in a linear sequence of changes. This makes the feature branch appear as if it was developed sequentially after the latest changes on the main branch, which can simplify the final merge (often allowing a fast-forward merge) and lead to a cleaner, easier-to-read history on the main line.
Jonathan also highlighted git pull --rebase as a way to update a local branch with remote changes, avoiding unnecessary merge commits that can clutter the local history when simply trying to synchronize with colleagues’ work on the same branch. Furthermore, interactive rebase (git rebase -i) is a powerful tool for “cleaning up” the history of a feature branch before creating a pull request or merging. It allows developers to squash multiple work-in-progress commits into more meaningful, atomic commits, edit commit messages, reorder commits, or even remove unwanted ones. This careful curation of a feature branch’s history before integration ensures that the main project history remains coherent and valuable. However, a crucial rule for rebasing is to never rebase a branch that has already been pushed and is being used by others, as rewriting shared history can cause significant problems for collaborators. The decision-making flowchart Jonathan presented often guided towards rebasing for feature branches to integrate changes from a durable branch, or to clean up history before a fast-forward merge.
Best Practices and Conflict Avoidance
Beyond the when-to-merge-vs-rebase dilemma, Jonathan Detoeuf shared several best practices for smoother collaboration and conflict avoidance. Regularly committing small, atomic changes makes it easier to manage history and resolve conflicts if they arise. Communicating with team members about who is working on what can also prevent overlapping efforts on the same files. Structuring the application well, with clear separation of concerns into different files or modules, naturally reduces the likelihood of merge conflicts.
When conflicts do occur, understanding the changes using git diff and carefully resolving them is key. Jonathan also touched upon various Git workflows, such as feature branching, Gitflow, or trunk-based development, noting that the choice of merge/rebase strategy often aligns with the chosen workflow. For instance, the “feature merge” (or GitHub flow) often involves creating a feature branch, working on it, and then merging it back (often via a pull request, which might use a squash merge or a rebase-and-merge strategy depending on team conventions). He ultimately provided a decision tree to help developers choose: for durable branches, merging is generally preferred to integrate other durable branches or significant features. For feature branches, rebasing is often used to incorporate changes from durable branches or to clean up history before a fast-forward merge. The overarching goal is to maintain an informative and clean project history that serves the team well.
Hashtags: #Git #VersionControl #Merge #Rebase #SoftwareDevelopment #DevOps #JonathanDetoeuf #DevoxxFR2018 #GitWorkflow #SourceControl
[DevoxxFR 2018] Apache Kafka: Beyond the Brokers – Exploring the Ecosystem
Apache Kafka is often recognized for its high-throughput, distributed messaging capabilities, but its power extends far beyond just the brokers. Florent Ramière from Confluent, a company significantly contributing to Kafka’s development, presented a comprehensive tour of the Kafka ecosystem at DevoxxFR2018. He aimed to showcase the array of open-source components that revolve around Kafka, enabling robust data integration, stream processing, and more.
Kafka Fundamentals and the Confluent Platform
Florent began with a quick refresher on Kafka’s core concept: an ordered, replayable log of messages (events) where consumers can read at their own pace from specific offsets. This design provides scalability, fault tolerance, and guaranteed ordering (within a partition), making it a cornerstone for event-driven architectures and handling massive data streams (Confluent sees clients handling up to 60 GB/s).
To get started, while Kafka involves several components like brokers and ZooKeeper, the Confluent Platform offers tools to simplify setup. The confluent CLI can start a local development environment with Kafka, ZooKeeper, Kafka SQL (ksqlDB), Schema Registry, and more with a single command. Docker images are also readily available for containerized deployments.
Kafka Connect: Bridging Kafka with External Systems
A significant part of the ecosystem is Kafka Connect, a framework for reliably streaming data between Kafka and other systems. Connectors act as sources (ingesting data into Kafka from databases, message queues, etc.) or sinks (exporting data from Kafka to data lakes, search indexes, analytics platforms, etc.). Florent highlighted the availability of numerous pre-built connectors for systems like JDBC databases, Elasticsearch, HDFS, S3, and Change Data Capture (CDC) tools.
He drew a parallel between Kafka Connect and Logstash, noting that while Logstash is excellent, Kafka Connect is designed as a distributed, fault-tolerant, and scalable service for these data integration tasks. It allows for transformations (e.g., filtering, renaming fields, anonymization) within the Connect pipeline via a REST API for configuration. This makes it a powerful tool for building data pipelines without writing extensive custom code.
Stream Processing with Kafka Streams and ksqlDB
Once data is in Kafka, processing it in real-time is often the next step. Kafka Streams is a client library for building stream processing applications directly in Java (or Scala). Unlike frameworks such as Spark or Flink, which often require separate processing clusters, Kafka Streams applications are standalone Java applications that read from Kafka, process data, and can write results back to Kafka or external systems. This simplifies deployment and monitoring. Kafka Streams provides a rich DSL for operations like filtering, mapping, joining streams and tables (a table in Kafka Streams is a view of the latest value for each key in a stream), windowing, and managing state, all with exactly-once processing semantics.
For those who prefer SQL to Java/Scala, ksqlDB (formerly Kafka SQL or KSQL) offers a SQL-like declarative language to define stream processing logic on top of Kafka topics. Users can create streams and tables from Kafka topics, perform continuous queries (SELECT statements that run indefinitely, emitting results as new data arrives), joins, aggregations over windows, and write results to new Kafka topics. ksqlDB runs as a separate server and uses Kafka Streams internally. It also manages stateful operations by storing state in RocksDB and backing it up to Kafka topics for fault tolerance. Florent emphasized that while ksqlDB is powerful for many use cases, complex UDFs or very intricate logic might still be better suited for Kafka Streams directly.
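A minimal Kafka Streams application looks roughly like the following Kotlin sketch (the topic names, application id, and transformation are invented for illustration): an ordinary JVM process that reads a topic, transforms it, and writes the result back, with no separate processing cluster.

```kotlin
import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.KafkaStreams
import org.apache.kafka.streams.StreamsBuilder
import org.apache.kafka.streams.StreamsConfig
import org.apache.kafka.streams.kstream.Consumed
import org.apache.kafka.streams.kstream.Produced
import java.util.Properties

fun main() {
    val props = Properties().apply {
        put(StreamsConfig.APPLICATION_ID_CONFIG, "payments-normalizer") // illustrative id
        put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    }

    val builder = StreamsBuilder()
    builder.stream("payments", Consumed.with(Serdes.String(), Serdes.String()))
        .filter { _, value -> value != null && value.isNotBlank() } // drop empty records
        .mapValues { value -> value.uppercase() }                    // trivial stateless transformation
        .to("payments-normalized", Produced.with(Serdes.String(), Serdes.String()))

    val streams = KafkaStreams(builder.build(), props)
    streams.start()
    Runtime.getRuntime().addShutdownHook(Thread(streams::close))
}
```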
Schema Management and Other Essential Tools
When dealing with structured data in Kafka, especially in evolving systems, schema management becomes crucial. The Confluent Schema Registry helps manage and enforce schemas (typically Avro, but also Protobuf and JSON Schema) for messages in Kafka topics. It ensures schema compatibility (e.g., backward, forward, full compatibility) as schemas evolve, preventing data quality issues and runtime errors in producers and consumers. REST Proxy allows non-JVM applications to produce and consume messages via HTTP. Kafka also ships with command-line tools for performance testing (e.g., kafka-producer-perf-test, kafka-consumer-perf-test), latency checking, and inspecting consumer group lags, which are vital for operations and troubleshooting. Effective monitoring, often using JMX metrics exposed by Kafka components fed into systems like Prometheus via JMX Exporter or Jolokia, is also critical for production deployments.
Florent concluded by encouraging exploration of the Confluent Platform demos and his “kafka-story” GitHub repository, which provide step-by-step examples of these ecosystem components.
Links:
- Confluent Platform
- Apache Kafka Connect Documentation
- Apache Kafka Streams Documentation
- ksqlDB (formerly KSQL) Website
- Confluent Schema Registry Documentation
- Florent Ramière’s Kafka Story on GitHub
- Confluent Blog
Hashtags: #ApacheKafka #KafkaConnect #KafkaStreams #ksqlDB #Confluent #StreamProcessing #DataIntegration #DevoxxFR2018 #FlorentRamiere #EventDrivenArchitecture #Microservices #BigData
[DevoxxFR 2018] Software Heritage: Preserving Humanity’s Software Legacy
Software is intricately woven into the fabric of our modern world, driving industry, fueling innovation, and forming a critical part of our scientific and cultural knowledge. Recognizing the profound importance of the source code that underpins this digital infrastructure, the Software Heritage initiative was launched. At Devoxx France 2018, Roberto Di Cosmo, a professor, director of Software Heritage, and affiliated with Inria, delivered an insightful talk titled “Software Heritage: Pourquoi et comment préserver le patrimoine logiciel de l’Humanité” (Software Heritage: Why and How to Preserve Humanity’s Software Legacy). He articulated the mission to collect, preserve, and share all publicly available software source code, creating a universal archive for future generations – a modern-day Library of Alexandria for software.
Di Cosmo began by emphasizing that source code is not just a set of instructions for computers; it’s a rich repository of human knowledge, ingenuity, and history. From complex algorithms to the subtle comments left by developers, source code tells a story of problem-solving and technological evolution. However, this invaluable heritage is fragile and at risk of being lost due to obsolete storage media, defunct projects, and disappearing hosting platforms.
The Mission: Collect, Preserve, Share
The core mission of Software Heritage, as outlined by Roberto Di Cosmo, is threefold: to collect, preserve, and make accessible the entirety of publicly available software source code. This ambitious undertaking aims to create a comprehensive and permanent archive – an “Internet Archive for source code” – safeguarding it from loss and ensuring it remains available for research, education, industrial development, and cultural understanding.
The collection process involves systematically identifying and archiving code from a vast array of sources, including forges like GitHub, GitLab, and Bitbucket, institutional repositories like HAL, and now-defunct hosting platforms such as Gitorious and Google Code, whose disappearance highlights the urgency. Preservation is a long-term commitment, requiring strategies to combat digital obsolescence and ensure the integrity and continued accessibility of the archived code over decades and even centuries. Sharing this knowledge involves providing tools and interfaces for researchers, developers, historians, and the general public to explore this vast repository, discover connections between projects, and trace the lineage of software. Di Cosmo stressed that this is not just about backing up code; it’s about building a structured, interconnected knowledge base.
Technical Challenges and Approach
The scale of this endeavor presents significant technical challenges. The sheer volume of source code is immense and constantly growing. Code exists in numerous version control systems (Git, Subversion, Mercurial, etc.) and packaging formats, each with its own metadata and history. To address this, Software Heritage has developed a sophisticated infrastructure capable of ingesting code from diverse origins and storing it in a universal, canonical format.
A key element of their technical approach is the use of a Merkle tree structure, similar to what Git uses. All software artifacts (files, directories, commits, revisions) are identified by cryptographic hashes of their content. This allows for massive deduplication (since identical files or code snippets are stored only once, regardless of how many projects they appear in) and ensures the integrity and verifiability of the archive. This graph-based model also allows for the reconstruction of the full development history of software projects and the relationships between them. Di Cosmo explained that this structure not only saves space but also provides a powerful way to navigate and understand the evolution of software. The entire infrastructure itself is open source.
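The deduplication effect of content addressing can be illustrated with a toy Kotlin sketch (this is not Software Heritage’s actual identifier scheme, only the underlying idea): identical content always maps to the same hash, so it is stored exactly once no matter how many projects contain it.

```kotlin
import java.security.MessageDigest

// Toy content-addressable store: the key is a SHA-1 hash of the bytes themselves.
fun contentId(bytes: ByteArray): String =
    MessageDigest.getInstance("SHA-1").digest(bytes).joinToString("") { "%02x".format(it) }

fun main() {
    val store = HashMap<String, ByteArray>() // deduplicated by construction

    val files = listOf("println(\"hello\")", "println(\"hello\")", "println(\"world\")")
    files.forEach { source -> store[contentId(source.toByteArray())] = source.toByteArray() }

    println("${files.size} files archived, ${store.size} unique blobs stored") // 3 files, 2 blobs
}
```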
A Universal Archive for All
Roberto Di Cosmo emphasized that Software Heritage is built as a common infrastructure for society, serving multiple purposes. For industry, it provides a reference point for existing code, preventing reinvention and facilitating reuse. For science, it offers a vast dataset for research on software engineering, programming languages, and the evolution of code, and is crucial for the reproducibility of research that relies on software. For education, it’s a rich learning resource. And for society as a whole, it preserves a vital part of our collective memory and technological heritage.
He concluded with a call to action, inviting individuals, institutions, and companies to support the initiative. This support can take many forms: contributing code from missing sources, helping to develop tools and connectors for different version control systems, providing financial sponsorship, or simply spreading the word about the importance of preserving our software legacy. Software Heritage aims to be a truly global and collaborative effort to ensure that the knowledge embedded in source code is not lost to time.
Links:
- Roberto Di Cosmo (Director, Software Heritage; Professor; Inria): www.linkedin.com/in/roberto-di-cosmo/
- Software Heritage: https://www.softwareheritage.org/
- Inria (French National Institute for Research in Digital Science and Technology): https://www.inria.fr/fr
- Devoxx France: https://www.devoxx.fr/
Hashtags: #SoftwareHeritage #OpenSource #Archive #DigitalPreservation #SourceCode #CulturalHeritage #RobertoDiCosmo #Inria #DevoxxFR2018