
PostHeaderIcon [DevoxxBE2024] Thinking Like an Architect

In a reflective talk at Devoxx Belgium 2024, Gregor Hohpe, a veteran architect, shared insights from two decades of experience in “Thinking Like an Architect.” Hohpe debunked the myth of architects as all-knowing decision-makers, instead portraying them as “IQ boosters” who enhance team decision-making through models, metaphors, and multi-level communication. Despite a minor issue with a clicker during the presentation, his engaging delivery and relatable examples, like the “architect elevator,” offered practical strategies for navigating complex organizational and technical landscapes.

Connecting Levels with the Architect Elevator

Hohpe introduced the “architect elevator,” a metaphor for architects’ role in bridging organizational layers—from developers to executives. He argued that the most valuable architects connect business strategy to technical implementation, translating complex trade-offs into terms executives understand without oversimplifying. For example, automation and frequent releases (developer priorities) enable security and cost-efficiency (executive concerns). This connection counters the isolation caused by layered organizations, where management may assume all is well due to buzzwords like Kubernetes, while developers operate with unchecked freedom.

Seeing More Dimensions in Decision-Making

Architects expand solution spaces by revealing additional dimensions, Hohpe explained. Using a sketch of a cylinder mistaken as a circle or rectangle, he showed how architects resolve debates—like speed versus quality—by introducing options like automated testing. At AWS, Hohpe tackled vendor lock-in by framing it as a two-dimensional trade-off: switching costs versus benefits. This approach, inspired by Adrian Cockcroft’s analogy of marriage as “accepted lock-in,” fosters rational discussions, avoiding binary thinking and helping teams find balanced solutions.

Selling Options to Defer Decisions

Hohpe likened architects to options traders, deferring decisions to reduce uncertainty. For instance, standard APIs allow language flexibility, sacrificing some protocol options to gain adaptability. In a financial firm, he explained this to executives using options trading, noting that options’ value rises with volatility—a concept they instantly grasped via the Black-Scholes formula. This metaphor underscores architecture’s increasing relevance in uncertain environments, aligning it with agile methodologies, which thrive under similar conditions. However, options come at the cost of complexity, a trade-off architects must weigh.
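The volatility point can be made concrete with a toy Black-Scholes calculation (not from the talk; a minimal stdlib-only sketch with illustrative spot, strike, and rate numbers): holding everything else fixed, a higher volatility input yields a more valuable option.

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(spot, strike, rate, sigma, t):
    """Black-Scholes price of a European call option."""
    d1 = (log(spot / strike) + (rate + sigma**2 / 2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    return spot * norm_cdf(d1) - strike * exp(-rate * t) * norm_cdf(d2)

calm = bs_call(100, 100, 0.02, 0.10, 1.0)      # low-volatility environment
volatile = bs_call(100, 100, 0.02, 0.40, 1.0)  # high-volatility environment
assert volatile > calm > 0  # same terms, more uncertainty, pricier option
```

The same intuition carries over to architectural options: the more uncertain the environment, the more it is worth paying (in complexity) to keep a decision open.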

Zooming In and Out for System-Wide Perspective

To tackle complexity, architects must zoom in and out, balancing local and global optima. Hohpe illustrated this with two systems using identical components but different connections, yielding opposite characteristics (e.g., latency versus resilience). Local optimization, like perfecting a single component, often fails to ensure system-wide success, as seen in operations where “all lights are green, but nothing works.” By viewing systems holistically, architects ensure decisions align with broader goals, avoiding pitfalls like excessive layering that propagates changes unnecessarily.

Using Models to Navigate Uncertainty

Hohpe emphasized models as architects’ best tools for simplifying complexity. Comparing geocentric and heliocentric solar system models, he showed how the right model makes decisions obvious, even if imperfect. Models vary by purpose—topographical maps for hiking, population density for logistics—requiring architects to choose based on the question at hand. In uncertain environments, models shine by forcing assumptions, enabling scenario-based planning (e.g., low, medium, high user loads). Hohpe urged architects to avoid absolutes, embracing shades of gray to find optimal trade-offs.

Hashtags: #SoftwareArchitecture #ArchitectMindset #AgileArchitecture #DevoxxBE2024

PostHeaderIcon [DevoxxBE2024] Project Leyden: Improving Java’s Startup Time by Per Minborg, Sébastien Deleuze

Per Minborg and Sébastien Deleuze delivered an insightful joint presentation at Devoxx Belgium 2024, unveiling the transformative potential of Project Leyden to enhance Java application startup time, warmup, and footprint. Per, from Oracle’s Java Core Library team, and Sébastien, a Spring Framework core committer at Broadcom, explored how Leyden shifts computation across time to optimize performance. Despite minor demo hiccups, such as Wi-Fi-related delays, their talk combined technical depth with practical demonstrations, showcasing how Spring Boot 3.3 leverages Leyden’s advancements, cutting startup times significantly and paving the way for future Java optimizations.

Understanding Project Leyden’s Mission

Project Leyden, an open-source initiative under OpenJDK, aims to address long-standing Java performance challenges: startup time, warmup time, and memory footprint. Per explained startup as the duration from launching a program to its first useful operation, like displaying “Hello World” or serving a Spring app’s initial request. Warmup, conversely, is the time to reach peak performance via JIT compilation. Leyden’s approach involves shifting computations earlier (e.g., at build time) or later (e.g., via lazy initialization) while preserving Java’s dynamic nature. Unlike GraalVM Native Image or Project CRaC, which sacrifice dynamism for speed, Leyden maintains compatibility, allowing developers to balance performance and flexibility.

Class Data Sharing (CDS) and AOT Cache: Today’s Solutions

Per introduced Class Data Sharing (CDS), a feature available since JDK 5, and its evolution into the Ahead-of-Time (AOT) Cache, a cornerstone of Leyden’s strategy. CDS preloads JDK classes, while AppCDS, introduced in JDK 10, extends this to application classes. The AOT Cache, an upcoming enhancement, stores class objects, resolved linkages, and method profiles, enabling near-instant startup. Sébastien demonstrated this with a Spring Boot Pet Clinic application, reducing startup from 3.2 seconds to 800 milliseconds using CDS and AOT Cache. The process involves a training run to generate the cache, which is then reused for faster deployments, though it requires consistent JVM and classpath configurations.

Spring Boot’s Synergy with Leyden

Sébastien highlighted the collaboration between the Spring and Leyden teams, initiated after a 2023 JVM Language Summit case study. Spring Boot 3.3 introduces features to simplify CDS and AOT Cache usage, such as extracting executable JARs into a CDS-friendly layout. A demo showed how a single command extracts the JAR, runs a training phase, and generates a cache, which is then embedded in a container image. This reduced startup times by up to 4x and memory usage by 20% when combined with Spring’s AOT optimizations. Sébastien also demonstrated how AOT Cache retains JIT “warmness,” enabling near-peak performance from startup, though a minor performance plateau gap is being addressed.
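As a workflow sketch, the extract-train-run cycle Sébastien demonstrated looks roughly like the following (command names come from Spring Boot 3.3's tools jarmode and the JDK's class-data-sharing flags; the jar name myapp.jar and the archive file name are placeholders):

```shell
# 1. Extract the executable JAR into a CDS-friendly layout
java -Djarmode=tools -jar myapp.jar extract --destination app

# 2. Training run: start the context, exit early, and dump the class archive
java -XX:ArchiveClassesAtExit=app.jsa -Dspring.context.exit=onRefresh -jar app/myapp.jar

# 3. Subsequent starts reuse the archive (same JVM and classpath required)
java -XX:SharedArchiveFile=app.jsa -jar app/myapp.jar
```

The archive produced in step 2 is what gets embedded in the container image, so every deployment skips the class loading and linking work done during training.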

Future Horizons and Trade-offs

Looking ahead, Leyden plans to introduce stable values, a hybrid between mutable and immutable fields, offering final-like performance with flexible initialization. Per emphasized that Leyden avoids the heavy constraints of GraalVM (e.g., limited reflection) or CRaC (e.g., Linux-only, security concerns with serialized secrets). While CRaC achieves millisecond startups, its lifecycle complexities and security risks limit adoption. Leyden’s AOT Cache, conversely, offers significant gains (2–4x faster startups) with minimal constraints, making it ideal for most use cases. Developers can experiment with Leyden’s early access builds to optimize their applications, with further enhancements like code cache storage on the horizon.

Hashtags: #ProjectLeyden #Java #SpringBoot #AOTCache #CDS #StartupTime #JVM #DevoxxBE2024 #PerMinborg #SébastienDeleuze

PostHeaderIcon [Scala IO Paris 2024] Calculating Is Funnier Than Guessing

In the ScalaIO Paris 2024 session “Calculating is funnier than guessing”, Regis Kuckaertz, a French developer living in an English-speaking country, captivated the audience with a methodical approach to writing compilers for domain-specific languages (DSLs) in Scala. The talk debunked the mystique of compiler construction, emphasizing a principled, calculation-based process over ad-hoc guesswork. Using equational reasoning and structural induction, the speaker derived a compiler and stack machine for a simple boolean expression language, Expr, and extended the approach to the more complex ZPure datatype from the ZIO Prelude library. The result was a correct-by-construction compiler, offering performance gains over interpreters while remaining accessible to functional programmers.

Laying the Foundation with Equational Reasoning

The talk began by highlighting the limitations of interpreters for DSLs, which, while easy to write via structural induction, incur runtime overhead. The speaker argued that functional programming’s strength lies in embedding DSLs, citing examples like Cats Effect, ZIO, and Kulo for metrics. To achieve “abstraction without remorse,” DSLs must be compiled into efficient machine code. The proposed method, inspired by historical work on calculating compilers, avoids pre-made recipes, instead using a single-step derivation process combining evaluation, continuation-passing style (CPS), and defunctionalization.

For the Expr language, comprising boolean constants, negation, and conjunction, the speaker defined a denotational semantics with an evaluator function. This function maps expressions to boolean values, e.g., evaluating And(Not(B(true)), B(false)) to a boolean result. The evaluator was refined to make implicit behaviors explicit, such as Scala’s left-to-right evaluation of &&, ensuring the specification aligns with developer expectations. This step underscored the importance of intimate familiarity with execution details, uncovered through the derivation process.

Deriving a Compiler for Expr

The core of the talk was deriving a compiler and stack machine for Expr using equational reasoning. The correctness specification required that compiling an expression and executing it on a stack yields the same result as evaluating the expression and pushing it onto the stack. The compiler was defined with a helper function using symbolic CPS, taking a continuation to guide code generation. For each constructor—B (boolean), Not, and And—the speaker applied the specification, reducing expressions step-by-step.

For B, a Push instruction was introduced to place a boolean on the stack. For Not, a Neg instruction negated the top stack value, with the subexpression compiled inductively. For And, the derivation distributed stack operations over conditional branches, introducing an If instruction to select continuations based on a boolean. The final Compile function used a Halt continuation to stop execution. The resulting machine language and stack machine, implemented as an imperative tail-recursive loop, fit on a single slide, achieving orders-of-magnitude performance improvements over the interpreter.
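Although the talk derived everything in Scala, the shape of the result is compact enough to sketch in Python (a sketch under my own naming: comp, compile_expr, and exec_code are mine, while the instruction set mirrors the talk's Push, Neg, If, and Halt):

```python
from dataclasses import dataclass

# --- Source language: boolean expressions ---
@dataclass
class B:
    val: bool
@dataclass
class Not:
    e: object
@dataclass
class And:
    l: object
    r: object

def evaluate(e):
    """Denotational semantics: Scala-style left-to-right, short-circuiting &&."""
    if isinstance(e, B):   return e.val
    if isinstance(e, Not): return not evaluate(e.e)
    if isinstance(e, And): return evaluate(e.l) and evaluate(e.r)
    raise ValueError(e)

# --- Target language: machine code as a linked chain of instructions ---
@dataclass
class Push:
    val: bool
    next: object
@dataclass
class Neg:
    next: object
@dataclass
class If:
    then_: object
    else_: object
class Halt:
    pass

def comp(e, c):
    """Derived compiler: c is the (symbolic) continuation, i.e. the code to run next."""
    if isinstance(e, B):   return Push(e.val, c)
    if isinstance(e, Not): return comp(e.e, Neg(c))
    if isinstance(e, And):
        # Left-to-right: if the left operand is false, short-circuit to False.
        return comp(e.l, If(comp(e.r, c), Push(False, c)))
    raise ValueError(e)

def compile_expr(e):
    return comp(e, Halt())

def exec_code(code):
    """The tail-recursive stack machine, rendered as an imperative loop."""
    stack = []
    while not isinstance(code, Halt):
        if isinstance(code, Push):
            stack.append(code.val); code = code.next
        elif isinstance(code, Neg):
            stack.append(not stack.pop()); code = code.next
        elif isinstance(code, If):
            code = code.then_ if stack.pop() else code.else_
    return stack[-1]

expr = And(Not(B(True)), B(False))
assert exec_code(compile_expr(expr)) == evaluate(expr) == False
```

The correctness specification, that compiling then executing equals evaluating then pushing, holds by construction for each constructor case, which is exactly what the equational derivation guarantees.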

Tackling Complexity with ZPure

To demonstrate real-world applicability, the speaker applied the technique to ZPure, a datatype from ZIO Prelude for pure computations with state, logging, and error handling. The language includes constructors for pure values, failures, error handling, state management, logging, and flat mapping. The evaluator threads state and logs, succeeding or failing based on the computation. The compiler derivation followed the same process, introducing instructions like Push, Throw, Load, Store, Log, Mark, Unmark, and Call to handle values, errors, state, and continuations.

The derivation for ZPure required careful handling of failures, where a Throw instruction invokes a failure routine that unwinds the stack until it finds a handler or crashes. For Catch and FlatMap, the speaker applied the induction hypothesis, introducing stack markers to manage handlers and continuations. Despite Scala functions in ZPure requiring runtime compilation, the speaker proposed defunctionalization—using data types like Flow or lambda calculus encodings—to eliminate this, though this was left as future work. The resulting compiler and machine, again fitting on a slide, were correct by construction, with unreachable cases confidently excluded.

Reflections and Future Directions

The talk emphasized that calculating compilers is a mechanical, repeatable process, not a mysterious art. By deriving machine instructions through equational reasoning, developers ensure correctness without extensive unit testing. The speaker noted a limitation in ZPure: its evaluator and compiler allow non-terminating expressions, which a partial monad could address. Future work includes defunctionalizing ZPure to avoid runtime compilation and optimizing machine code into directed acyclic graphs to reduce duplication.

The speaker recommended resources like Philip Wadler’s papers on calculating compilers, encouraging functional programmers to explore this approachable technique. The talk, blending humor with rigor, demonstrated that compiling DSLs is not only feasible but also “funnier” than guessing, offering a path to efficient, correct code.

Hashtags: #Scala #CompilerDesign #EquationalReasoning #ZPure #ScalaIOParis2024 #FunctionalProgramming

PostHeaderIcon [AWS Summit Paris 2024] Winning Fundraising Strategies for 2024

The AWS Summit Paris 2024 session “Levée de fonds en 2024 – Les stratégies gagnantes” (SUP112-FR) offered a 29-minute panel with investors sharing insights on startup fundraising. Anaïs Monlong (Iris Capital), Audrey Soussan (Ventech), and Samantha Jérusalmy (Elaia) discussed market trends, investor expectations, and pitching tips for early-stage startups. With European VC funding down 40% to €45B in 2023 (Atomico, 2024), this post outlines strategies to secure funding in 2024.

2024 Fundraising Market

Samantha Jérusalmy described 2024 as a challenging market in the wake of the 2021 bubble, with investors prioritizing profitability (80% of VCs, per PitchBook, 2024). However, Audrey Soussan highlighted ample liquidity, with early-stage deals (Seed/Series A) making up 60% of EU funding in 2023 (Dealroom, 2024). Anaïs Monlong noted a tripling of VC assets in five years, driven by corporate interest in tech, especially AI (€10B raised in 2023, per Sifted, 2024). Sectors like cloud-enabled industries and data utilization remain attractive.

Investor Expectations

Samantha explained VC business models: funds (e.g., €150–250M) seek 10–30% stakes, aiming for exits at €1B+ to return multiples (2–5x). A €1B exit with 10% yields €100M, insufficient for a €200M fund without multiple “unicorns.” Investors need billion-euro addressable markets. Audrey advised Seed startups to show €50K monthly revenue or design partners, while Series A requires recurring revenue. Anaïs emphasized strong tech cores (e.g., Shi Technology, Exotec) for industrial transformation.
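The fund math is worth spelling out (a back-of-the-envelope sketch using the panel's illustrative numbers):

```python
fund_size = 200_000_000    # a €200M fund
stake = 0.10               # 10% ownership at exit
exit_value = 1_000_000_000 # a €1B "unicorn" exit

proceeds = stake * exit_value   # what flows back to the fund
assert proceeds == 100_000_000
assert proceeds < fund_size     # one €1B exit does not even return the fund

# Returning 3x the fund at this stake requires several such exits
exits_needed = (3 * fund_size) / proceeds
assert exits_needed == 6.0
```

This is why investors insist on billion-euro addressable markets: the portfolio only works if a handful of companies can plausibly reach outcomes of that size.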

Pitching Best Practices

Anaïs recommended concise pitch decks: market size, product screenshots, team background. Avoid premature valuation claims, as pricing varies widely. Target one fund contact to ensure follow-up, leveraging their sector fit. Audrey suggested sizing rounds for 18–24 months of runway at the lower end, adjusting upward if oversubscribed. Overshooting and then cutting back (e.g., asking for €5M, then settling for €1M) signals weakness. Samantha stressed pre-pitch fund alignment, avoiding large funds for sub-€1B markets.

Valuation Strategies

Samantha likened valuation to a “marriage,” advising entrepreneurs to build rapport before discussing terms. Audrey urged creating competition among investors to optimize valuation, but warned high valuations risk harsh terms or down rounds (lower valuations in later rounds). Anaïs clarified valuations aren’t discounted cash flow-based but market-driven, aligning with recent deals. All advised balancing valuation with investor value-add and long-term equity story to avoid Series A/B traps.

PostHeaderIcon [KCD UK 2024] Deep Dive into Kubernetes Runtime Security

Saeid Bostandoust, founder of CubeDemi.io, delivered an in-depth presentation at KCD UK 2024 on Kubernetes runtime security, focusing on tools and techniques to secure containers during execution. As a CNCF project contributor, Saeid explored Linux security features like Linux Capabilities, SELinux, AppArmor, Seccomp-BPF, and KubeArmor, providing a comprehensive overview of how these can be applied in Kubernetes to mitigate zero-day attacks and enforce policies. His talk emphasized practical implementation, observability, and policy enforcement, aligning with KCD UK 2024’s focus on securing cloud-native environments.

Understanding Runtime Security

Saeid defined runtime security as protecting applications during execution, contrasting it with pre-runtime measures like static code analysis. Runtime security focuses on mitigating zero-day attacks and malicious behavior through real-time intrusion detection, process isolation, policy enforcement, and monitoring. Linux offers over 30 security mechanisms, including SELinux, AppArmor, Linux Capabilities, Seccomp-BPF, and namespaces, alongside kernel drivers to counter threats like Meltdown and Spectre. Saeid focused on well-known features, explaining their roles and Kubernetes integration.

Linux Capabilities: Historically, processes were either privileged (with full root permissions) or unprivileged, leading to vulnerabilities like privilege escalation via commands like ping. Linux Capabilities, introduced to granularly assign permissions, allow processes to perform specific actions (e.g., opening raw sockets for ping) without full root privileges. In Kubernetes, capabilities can be configured in pod manifests to drop unnecessary permissions, enhancing security even for root-run containers.
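In a pod manifest, this looks like dropping all capabilities and adding back only what the workload needs (a minimal sketch; the pod name, image, and the NET_RAW choice are illustrative, NET_RAW being what ping-style raw sockets require):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ping-pod
spec:
  containers:
    - name: app
      image: busybox
      securityContext:
        capabilities:
          drop: ["ALL"]      # start from zero privileges
          add: ["NET_RAW"]   # just enough to open raw sockets
```

Even if the container process runs as root, dropping capabilities limits what that root can actually do.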

Seccomp-BPF: Seccomp (Secure Computing) restricts system calls a process can make. Originally limited to basic calls (read, write, exit), Seccomp-BPF extends this with customizable profiles. In Kubernetes, a Seccomp-BPF profile can be defined in a JSON file and applied via a pod’s security context, terminating processes that attempt unauthorized system calls. Saeid demonstrated a restrictive profile that limits a container to basic operations, preventing it from running if additional system calls are needed.
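A restrictive profile along the lines Saeid demonstrated might look like this (a sketch; the exact syscall list and the choice of default action depend on the workload):

```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {
      "names": ["read", "write", "exit", "exit_group", "futex", "nanosleep"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

The file is placed in the kubelet's seccomp profile directory on the node and referenced from the pod via securityContext.seccompProfile with type: Localhost and localhostProfile pointing at the JSON file; any syscall outside the allow list then fails (or kills the process, if SCMP_ACT_KILL is used as the default action).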

LSM Modules (AppArmor, SELinux, BPF-LSM): Linux Security Modules (LSM) provide hooks to intercept operations, such as file access or network communication. AppArmor uses path-based profiles, while SELinux employs label-based policies. BPF-LSM, a newer option, allows dynamic policy injection via eBPF code, offering flexibility without requiring application restarts. Saeid noted that BPF-LSM, available in kernel versions 5.7 and later, supports stacking with other LSMs, enhancing Kubernetes security.

KubeArmor: Simplifying Policy Enforcement

KubeArmor, a CNCF project, simplifies runtime security by allowing users to define policies via Kubernetes Custom Resource Definitions (CRDs). These policies are translated into AppArmor, SELinux, or BPF-LSM profiles, depending on the worker node’s LSM. KubeArmor addresses the challenge of syncing profiles across cluster nodes, automating deployment and updates. It uses eBPF for observability, monitoring system calls and generating telemetry for tools like Prometheus and Elasticsearch. Saeid showcased KubeArmor’s architecture, including a daemon set with an init container for compiling eBPF code and a relay server for aggregating logs and alerts.

An example policy demonstrated KubeArmor denying access to sensitive files (e.g., /etc/passwd, /etc/shadow) and commands (e.g., apt, apt-get), with logs showing enforcement details. Unlike manual AppArmor or SELinux profiles, which are complex and hard to scale, KubeArmor’s declarative approach and default deny policies simplify securing containers, preventing access to dangerous assets like /proc mounts.
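A policy of that kind might look roughly like this (a sketch based on KubeArmor's CRD format; the selector label and binary paths are illustrative):

```yaml
apiVersion: security.kubearmor.com/v1
kind: KubeArmorPolicy
metadata:
  name: block-sensitive-access
  namespace: default
spec:
  selector:
    matchLabels:
      app: webapp            # pods this policy applies to
  file:
    matchPaths:
      - path: /etc/passwd
      - path: /etc/shadow
  process:
    matchPaths:
      - path: /usr/bin/apt
      - path: /usr/bin/apt-get
  action: Block
```

KubeArmor translates this one declarative resource into the AppArmor, SELinux, or BPF-LSM profile appropriate for each node, which is precisely the syncing problem it automates away.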

Practical Implementation and Community Engagement

Saeid provided practical examples, such as configuring a pod to drop all capabilities except those needed, applying a Seccomp-BPF profile to restrict system calls, and using KubeArmor to enforce file access policies. He highlighted KubeArmor’s integration with tools like OPA Gatekeeper to block unauthorized commands (e.g., kubectl exec) when BPF-LSM is unavailable. For further learning, Saeid offered a 50% discount on CubeDemi.io’s container security workshop, encouraging KCD UK 2024 attendees to deepen their Kubernetes security expertise.

Hashtags: #Kubernetes #RuntimeSecurity #KubeArmor #CNCF #LinuxSecurity #SaeidBostandoust #KCDUK2024

PostHeaderIcon [KotlinConf2024] Hacking Sony Cameras with Kotlin

At KotlinConf2024, Rahul Ravikumar, a Google software engineer, shared his adventure reverse-engineering Sony’s Bluetooth Low Energy (BLE) protocol to build a Kotlin Multiplatform (KMP) remote camera app. Frustrated by Sony’s bloated apps—Imaging Edge Mobile (1.9 stars) and Creators’ App (3.3 stars)—Rahul crafted a lean solution for his Sony Alpha a7r Mark 5, focusing on remote control. Using Compose Multiplatform for desktop and mobile, and Sony’s C SDK via cinterop, he demonstrated how KMP enables cross-platform innovation. His live demo, taking a photo with a single button click, thrilled the audience and showcased BLE’s potential for fun and profit.

Reverse-Engineering Sony’s BLE Protocol

Rahul’s journey began with Sony’s underwhelming app ecosystem, prompting him to reverse-engineer the camera’s undocumented BLE protocol. BLE’s Generic Access Profile (GAP) handles device discovery, with the camera (peripheral) advertising its presence and the phone (central) connecting. The Generic Attribute Profile (GATT) manages commands, using 16-bit UUIDs for services like Sony’s remote control (FF01 for commands, FF02 for notifications). Unable to use Android’s HCI Snoop logs due to Sony’s Wi-Fi Direct reliance, Rahul employed a USB BLE sniffer and Wireshark to capture GATT traffic. He identified Sony’s company ID (0x02D01) and camera marker (0x03000) in advertising packets. Key operations—reset (0x0106), focus (0x0107), and capture (0x0109)—form a state machine, with notifications (e.g., 0x023F) confirming actions. This meticulous process, decoding hexadecimal payloads, enabled Rahul to control the camera programmatically.
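The command flow can be sketched as a small state machine (a Python sketch, not the talk's Kotlin: the opcodes and characteristic UUIDs are the ones cited in the talk, but the two-byte big-endian framing and the helper names capture_sequence, write, and wait_notification are my assumptions):

```python
# Characteristic UUIDs and opcodes as described in the talk
REMOTE_COMMAND = 0xFF01   # write commands here
REMOTE_NOTIFY  = 0xFF02   # subscribe here for status notifications

CMD_RESET   = (0x0106).to_bytes(2, "big")
CMD_FOCUS   = (0x0107).to_bytes(2, "big")
CMD_CAPTURE = (0x0109).to_bytes(2, "big")
FOCUS_ACQUIRED = (0x023F).to_bytes(2, "big")  # confirmation notification

def capture_sequence(write, wait_notification):
    """Drive the capture state machine via injected BLE I/O callbacks."""
    write(REMOTE_COMMAND, CMD_RESET)
    write(REMOTE_COMMAND, CMD_FOCUS)
    if wait_notification() != FOCUS_ACQUIRED:
        raise RuntimeError("camera did not confirm focus")
    write(REMOTE_COMMAND, CMD_CAPTURE)
    write(REMOTE_COMMAND, CMD_RESET)   # release the shutter

# Dry run against a fake transport
log = []
capture_sequence(lambda char, payload: log.append((char, payload)),
                 lambda: FOCUS_ACQUIRED)
assert [p for _, p in log] == [CMD_RESET, CMD_FOCUS, CMD_CAPTURE, CMD_RESET]
```

Injecting the transport as callbacks keeps the state machine testable without a camera, which is the same separation the real app needs between protocol logic and platform BLE stacks.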

Building a KMP Remote Camera App

With the protocol cracked, Rahul built a KMP app using Compose Multiplatform, targeting Android and desktop. The app’s BLE scanner filters for Sony’s manufacturer data (0x03000), ignoring irrelevant metadata like model codes. Connection logic uses Kotlin Flows to monitor peripheral state, ensuring seamless reconnections. Capturing a photo involves sending reset and focus commands to FF01, awaiting focus confirmation on FF02, then triggering capture and shutter reset. For advanced features, Rahul integrated Sony’s C SDK via cinterop, navigating its complexities to access functions like interval shooting. His live demo, despite an initially powered-off camera, succeeded when the camera advertised, and a button click took a photo, earning audience cheers. The app’s simplicity contrasts Sony’s feature-heavy apps, proving KMP’s power for cross-platform development. Rahul’s GitHub repository offers the code, inviting developers to explore BLE and KMP for their own projects.

Hashtags: #KotlinMultiplatform #BluetoothLE

PostHeaderIcon [PyConUS 2024] Pandas + Dask DataFrame 2.0: A Leap Forward in Distributed Computing

At PyCon US 2024, Patrick Hoefler delivered an insightful presentation on the advancements in Dask DataFrame 2.0, particularly its enhanced integration with pandas and its performance compared to other big data tools like Spark, DuckDB, and Polars. As a maintainer of both pandas and Dask, Patrick, who works at Coiled, shared how recent improvements have transformed Dask into a robust and efficient solution for distributed computing, making it a compelling choice for handling large-scale datasets.

Enhanced String Handling with Arrow Integration

One of the most significant upgrades in Dask DataFrame 2.0 is its adoption of Apache Arrow for string handling, moving away from the less efficient NumPy object data type. Patrick highlighted that this shift has resulted in substantial performance gains. For instance, string operations are now two to three times faster in pandas, and in Dask, they can achieve up to tenfold improvements due to better multithreading capabilities. Additionally, memory usage has been drastically reduced—by approximately 60 to 70% in typical datasets—making Dask more suitable for memory-constrained environments. This enhancement ensures that users can process large datasets with string-heavy columns more efficiently, a critical factor in distributed workloads.

Revolutionary Shuffle Algorithm

Patrick emphasized the complete overhaul of Dask’s shuffle algorithm, which is pivotal for distributed systems where data must be communicated across multiple workers. The previous algorithm scaled poorly, its cost growing faster than linearly as dataset sizes grew. The new peer-to-peer (P2P) shuffle algorithm, however, scales linearly, ensuring that doubling the dataset size only doubles the workload. This improvement not only boosts performance but also enhances reliability, allowing Dask to handle arbitrarily large datasets with constant memory usage by leveraging disk storage when necessary. Such advancements make Dask a more resilient choice for complex data processing tasks.

Query Planning: A Game-Changer

The introduction of a logical query planning layer marks a significant milestone for Dask. Historically, Dask executed operations as they were received, often leading to inefficient processing. The new query optimizer employs techniques like column projection and predicate pushdown, which significantly reduce unnecessary data reads and network transfers. For example, by identifying and prioritizing filters and projections early in the query process, Dask can minimize data movement, potentially leading to performance improvements of up to 1000x in certain scenarios. This optimization makes Dask more intuitive and efficient, aligning it closer to established systems like Spark.
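Column projection and predicate pushdown are easy to illustrate in miniature (a stdlib-only sketch of the idea, not Dask's optimizer: read_columns stands in for a columnar reader such as a parquet scan, and the function and column names are mine):

```python
def read_columns(rows, columns):
    """Stand-in for a columnar reader that materializes only `columns`."""
    return [{c: r[c] for c in columns} for r in rows]

def run_query(rows, wanted_cols, predicate, pred_col):
    # Optimized plan: project only the columns the query touches and
    # filter while scanning, instead of loading everything and
    # filtering at the end.
    needed = set(wanted_cols) | {pred_col}
    scanned = read_columns(rows, sorted(needed))
    return [{c: r[c] for c in wanted_cols}
            for r in scanned if predicate(r[pred_col])]

table = [{"id": i, "value": i * 10, "junk": "x" * 100} for i in range(5)]
out = run_query(table, ["value"], lambda v: v > 2, "id")
assert out == [{"value": 30}, {"value": 40}]  # the wide "junk" column is never read
```

In a distributed setting the same rewrite also shrinks what travels over the network, which is where the largest speedups come from.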

Benchmarking Against the Giants

Patrick presented comprehensive benchmarks using the TPC-H dataset to compare Dask’s performance against Spark, DuckDB, and Polars. At a 100 GB scale, DuckDB often outperformed others due to its single-node optimization, but Dask held its own. At larger scales (1 TB and 10 TB), Dask’s distributed nature gave it an edge, particularly when DuckDB struggled with memory constraints on complex queries. Against Spark, Dask showed remarkable progress, outperforming it in most queries at the 1 TB scale and maintaining competitiveness at 10 TB, despite some overhead issues that Patrick noted are being addressed. These results underscore Dask’s growing capability to handle enterprise-level data processing tasks.

Hashtags: #Dask #Pandas #BigData #DistributedComputing #PyConUS2024 #PatrickHoefler #Coiled #Spark #DuckDB #Polars

PostHeaderIcon Running Docker Natively on WSL2 (Ubuntu 24.04) in Windows 11

For many developers, Docker Desktop has long been the default solution to run Docker on Windows. However, licensing changes and the desire for a leaner setup have pushed teams to look for alternatives. Fortunately, with the maturity of Windows Subsystem for Linux 2 (WSL2), it is now possible to run the full Docker Engine directly inside a Linux distribution such as Ubuntu 24.04, while still accessing containers seamlessly from both Linux and Windows.

In this guide, I’ll walk you through a clean, step-by-step setup for running Docker Engine inside WSL2 without Docker Desktop, explain how Windows and WSL2 communicate, and share best practices for maintaining a healthy development environment.


Why Run Docker Inside WSL2?

Running Docker natively inside WSL2 has several benefits:

  • No licensing issues – you avoid Docker Desktop’s commercial license requirements.
  • Lightweight – no heavy virtualization layer; containers run directly inside your WSL Linux distro.
  • Integrated networking – on Windows 11 with modern WSL versions,
    containers bound to localhost inside WSL are automatically reachable from Windows.
  • Familiar Linux workflow – you install and use Docker exactly as you would on a regular Ubuntu server.

Step 1 – Update Ubuntu

Open your Ubuntu 24.04 terminal and ensure your system is up to date:

sudo apt update && sudo apt upgrade -y

Step 2 – Install Docker Engine

Install Docker using the official Docker repository:

# Install prerequisites
sudo apt install -y ca-certificates curl gnupg lsb-release

# Add Docker’s GPG key
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
  sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

# Configure Docker repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker Engine
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

Step 3 – Run Docker Without sudo

To avoid prefixing every command with sudo, add your user to the docker group:

sudo usermod -aG docker $USER

Restart your WSL terminal (or run newgrp docker) for the change to take effect, then verify:

docker --version
docker ps

If docker ps reports that the daemon is not running, start it with sudo service docker start. Ubuntu 24.04 on WSL2 enables systemd by default, so the service normally starts automatically.

Step 4 – Test Networking

One of the most common questions is:
“Will my containers be accessible from both Ubuntu and Windows?”
The answer is yes on modern Windows 11 with WSL2.
Let’s test it by running an Nginx container:

docker run -d -p 8080:80 --name webtest nginx
  • Inside Ubuntu (WSL): curl http://localhost:8080
  • From Windows (browser or PowerShell): http://localhost:8080

Thanks to WSL2’s localhost forwarding, Windows traffic to localhost is routed
into the WSL network, making containers instantly accessible without extra configuration.


Step 5 – Run Multi-Container Applications with Docker Compose

The Docker Compose plugin is already installed as part of the package above. Check the version:

docker compose version

Create a docker-compose.yml for a WordPress + MySQL stack:

version: "3.9"
services:
  db:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: rootpass
      MYSQL_DATABASE: wordpress
      MYSQL_USER: wpuser
      MYSQL_PASSWORD: wppass
    volumes:
      - db_data:/var/lib/mysql

  wordpress:
    image: wordpress:latest
    ports:
      - "8080:80"
    environment:
      WORDPRESS_DB_HOST: db:3306
      WORDPRESS_DB_USER: wpuser
      WORDPRESS_DB_PASSWORD: wppass
      WORDPRESS_DB_NAME: wordpress

volumes:
  db_data:

Start the services (first remove the earlier webtest container with docker rm -f webtest, since it also binds port 8080):

docker compose up -d

Once the containers are running, open http://localhost:8080 in your Windows browser
to access WordPress. The containers are managed entirely inside WSL2,
but networking feels seamless.


Maintenance: Cleaning Up Docker Data

Over time, Docker accumulates images, stopped containers, volumes, and networks.
This can take up significant disk space inside your WSL distribution.
Here are safe maintenance commands to keep your environment clean:

Remove Unused Objects

docker system prune -a --volumes
  • -a: removes all unused images, not just dangling ones
  • --volumes: also removes unused volumes

Reset Everything (Dangerous)

If you need to wipe your Docker environment completely (images, containers, volumes, networks):

docker stop $(docker ps -aq) 2>/dev/null
docker rm -f $(docker ps -aq) 2>/dev/null
docker volume rm $(docker volume ls -q) 2>/dev/null
docker network rm $(docker network ls -q) 2>/dev/null
docker image rm -f $(docker image ls -q) 2>/dev/null

⚠️ Use this only if you want to start fresh. All data will be removed.


Conclusion

By running Docker Engine directly inside WSL2, you gain a powerful, lightweight, and license-free Docker environment that integrates seamlessly with Windows 11. Your containers are accessible from both Linux and Windows, Docker Compose works out of the box, and maintenance is straightforward with prune commands.

This approach is particularly well-suited for developers who want the flexibility of Docker without the overhead of Docker Desktop. With WSL2 and Ubuntu 24.04, you get the best of both worlds: Linux-native Docker with Windows accessibility.

PostHeaderIcon Predictive Modeling and the Illusion of Signal

Introduction

Vincent Warmerdam delves into the illusions often encountered in predictive modeling, highlighting the cognitive traps and statistical misconceptions that lead to overconfidence in model performance.

The Seduction of Spurious Correlations

Models often perform well on training data by exploiting noise rather than genuine signal. Vincent emphasizes critical thinking and statistical rigor to avoid being misled by deceptively strong results.

Building Robust Models

Using robust cross-validation, considering domain knowledge, and testing against out-of-sample data are vital strategies to counteract the illusion of predictive prowess.
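The illusion is easy to reproduce (a stdlib-only sketch, not from the talk: a "model" that memorizes pure-noise training data scores perfectly in-sample and collapses out of sample):

```python
import random

random.seed(0)
# Pure-noise "target": there is no real signal to find
train = [(random.random(), random.choice([0, 1])) for _ in range(50)]
test  = [(random.random(), random.choice([0, 1])) for _ in range(50)]

# "Model" that memorizes the training set and guesses the majority class otherwise
memory = {x: y for x, y in train}
majority = round(sum(y for _, y in train) / len(train))
def predict(x):
    return memory.get(x, majority)

train_acc = sum(predict(x) == y for x, y in train) / len(train)
test_acc  = sum(predict(x) == y for x, y in test) / len(test)
assert train_acc == 1.0     # looks perfect in-sample
assert test_acc < train_acc # the "signal" evaporates out of sample
```

Holding out data the model has never seen is the only thing that exposes the memorization; in-sample metrics alone cannot.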

Conclusion

Data science is not just coding and modeling — it requires constant skepticism, critical evaluation, and humility. Vincent reminds us to stay vigilant against the comforting but dangerous mirage of false predictability.

PostHeaderIcon Building Intelligent Data Products at Scale

Introduction

Thomas Vachon shares insights into scaling data-driven products, blending machine learning, engineering, and user-centric design to create impactful and intelligent applications.

Key Ingredients for Success

Building intelligent products requires aligning data pipelines, model training, deployment infrastructure, and feedback loops. Vachon stresses the importance of cross-functional collaboration between data scientists, software engineers, and product teams.

Real-World Lessons

From architectural best practices to team organization strategies, Vachon illustrates how to navigate the complexity of scaling data initiatives sustainably.

Conclusion

Intelligent data products demand not only technical excellence but also thoughtful design, scalability planning, and user empathy from day one.