
[PyConUS 2024] Pandas + Dask DataFrame 2.0: A Leap Forward in Distributed Computing

At PyCon US 2024, Patrick Hoefler delivered an insightful presentation on the advancements in Dask DataFrame 2.0, particularly its enhanced integration with pandas and its performance compared to other big data tools like Spark, DuckDB, and Polars. As a maintainer of both pandas and Dask, Patrick, who works at Coiled, shared how recent improvements have transformed Dask into a robust and efficient solution for distributed computing, making it a compelling choice for handling large-scale datasets.

Enhanced String Handling with Arrow Integration

One of the most significant upgrades in Dask DataFrame 2.0 is its adoption of Apache Arrow for string handling, moving away from the less efficient NumPy object data type. Patrick highlighted that this shift has resulted in substantial performance gains. For instance, string operations are now two to three times faster in pandas, and in Dask, they can achieve up to tenfold improvements due to better multithreading capabilities. Additionally, memory usage has been drastically reduced—by approximately 60 to 70% in typical datasets—making Dask more suitable for memory-constrained environments. This enhancement ensures that users can process large datasets with string-heavy columns more efficiently, a critical factor in distributed workloads.
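The memory savings come from Arrow's columnar layout: instead of one heap-allocated Python object per string, Arrow stores a single contiguous UTF-8 buffer plus an offsets array. A rough back-of-the-envelope sketch using only the standard library (the exact numbers vary by Python build and string contents, but the shape of the result matches the figures quoted in the talk):

```python
import sys

# Compare the rough memory cost of a NumPy object-dtype column (one
# Python str object plus one pointer per element) with an Arrow-style
# layout (one contiguous UTF-8 buffer plus 4-byte offsets).
strings = [f"user_{i}" for i in range(10_000)]

# Object dtype: an 8-byte pointer per element + each str's own object header.
object_cost = 8 * len(strings) + sum(sys.getsizeof(s) for s in strings)

# Arrow layout: concatenated UTF-8 bytes + (n + 1) int32 offsets.
arrow_cost = sum(len(s.encode("utf-8")) for s in strings) + 4 * (len(strings) + 1)

savings = 1 - arrow_cost / object_cost
print(f"object: {object_cost}, arrow: {arrow_cost}, saved: {savings:.0%}")
```

For short strings like these, the per-object header dominates the object-dtype cost, so the Arrow-style layout saves well over half the memory.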

Revolutionary Shuffle Algorithm

Patrick emphasized the complete overhaul of Dask’s shuffle algorithm, which is pivotal for distributed systems where data must be communicated across multiple workers. The previous task-based algorithm scaled superlinearly: the amount of work grew faster than the data itself, so performance degraded as dataset sizes grew. The new peer-to-peer (P2P) shuffle algorithm scales linearly, ensuring that doubling the dataset size only doubles the workload. This improvement not only boosts performance but also enhances reliability, allowing Dask to handle arbitrarily large datasets with constant memory usage by leveraging disk storage when necessary. Such advancements make Dask a more resilient choice for complex data processing tasks.
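The core idea behind a hash-partition shuffle can be sketched in a few lines of plain Python (a toy model, not Dask's actual implementation): each row is hashed on its key and routed directly to exactly one output partition, so the total work grows linearly with the number of rows.

```python
from collections import defaultdict

def p2p_shuffle(partitions, n_out, key=lambda row: row[0]):
    """Toy hash-partition shuffle: every row is routed once, directly
    to its destination partition, so cost is O(total rows)."""
    buckets = defaultdict(list)
    for part in partitions:        # in Dask, each worker handles
        for row in part:           # its own input partitions
            buckets[hash(key(row)) % n_out].append(row)
    return [buckets[i] for i in range(n_out)]

parts = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
result = p2p_shuffle(parts, n_out=2)
# Rows sharing a key always land in the same output partition.
```

Dask's real P2P shuffle adds worker-to-worker transfers and disk spilling on top of this routing scheme, which is what keeps memory usage constant for arbitrarily large inputs.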

Query Planning: A Game-Changer

The introduction of a logical query planning layer marks a significant milestone for Dask. Historically, Dask executed operations as they were received, often leading to inefficient processing. The new query optimizer employs techniques like column projection and predicate pushdown, which significantly reduce unnecessary data reads and network transfers. By identifying and applying filters and projections as early as possible in the query plan, Dask can minimize data movement, leading to performance improvements of up to 1000x in certain scenarios. This optimization makes Dask more intuitive and efficient, bringing it closer to established systems like Spark.
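A toy illustration of these two optimizations (not Dask's actual optimizer): by pushing the filter and the column selection down to the scan, the query touches only the cells it actually needs instead of materializing every column first.

```python
# A tiny "table" with a wide payload column, as a list of dicts.
TABLE = [{"id": i, "value": i * 2, "payload": "x" * 100} for i in range(1_000)]

def naive(table):
    # Execute operators in arrival order: materialize every column,
    # then filter, then project -- lots of wasted work.
    rows = [dict(r) for r in table]
    rows = [r for r in rows if r["id"] < 10]
    return [r["value"] for r in rows]

def optimized(table):
    # Predicate pushdown + column projection: read only the "id" and
    # "value" cells, and only for rows that pass the filter.
    return [r["value"] for r in table if r["id"] < 10]

assert naive(TABLE) == optimized(TABLE)
```

In a distributed setting the gap is far larger than in this toy: with columnar formats like Parquet, pushed-down projections and predicates mean entire columns and row groups are never read from storage or sent over the network at all.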

Benchmarking Against the Giants

Patrick presented comprehensive benchmarks using the TPC-H dataset to compare Dask’s performance against Spark, DuckDB, and Polars. At a 100 GB scale, DuckDB often outperformed others due to its single-node optimization, but Dask held its own. At larger scales (1 TB and 10 TB), Dask’s distributed nature gave it an edge, particularly when DuckDB struggled with memory constraints on complex queries. Against Spark, Dask showed remarkable progress, outperforming it in most queries at the 1 TB scale and maintaining competitiveness at 10 TB, despite some overhead issues that Patrick noted are being addressed. These results underscore Dask’s growing capability to handle enterprise-level data processing tasks.

Hashtags: #Dask #Pandas #BigData #DistributedComputing #PyConUS2024 #PatrickHoefler #Coiled #Spark #DuckDB #Polars

[DefCon32] Closing Ceremonies & Awards

As the echoes of innovation and collaboration fade from the halls of the Las Vegas Convention Center, the closing ceremonies of DEF CON 32 encapsulate the spirit of a community that thrives on engagement, resilience, and shared purpose. Hosted by Jeff Moss, known as Dark Tangent, alongside contributors like Mar Williams and representatives from various teams, the event reflects on achievements, honors trailblazers, and charts a course forward. Amid reflections on past giants and celebrations of current triumphs, the gathering underscores the hacker ethos: pushing boundaries while fostering inclusivity and growth.

Jeff opens with a tone of relief and gratitude, acknowledging the unforeseen venue shift that tested the community’s adaptability. What began as a potential setback transformed into a revitalized experience, with attendees praising the spacious layout that evoked the intimacy of earlier conventions. This backdrop sets the stage for a moment of solemnity, where participants pause to honor those who paved the way—mentors, innovators, and unsung heroes whose legacies endure in the collective memory.

The theme of “engage” permeates the proceedings, inspiring initiatives that extend the conference’s impact beyond its annual confines. Jeff highlights two new ventures aimed at channeling the community’s expertise toward societal good and personal advancement. These efforts embody a commitment to proactive involvement, bridging the gap between hacker ingenuity and real-world challenges.

Honoring the Past: A Moment of Reflection

In a poignant start, Jeff calls for silence to remember predecessors whose contributions form the foundation of today’s cybersecurity landscape. This ritual serves as a reminder that progress stems from accumulated wisdom, urging attendees to carry forward the ethos of giving back. The gesture resonates deeply, connecting generations and reinforcing the communal bonds that define DEF CON.

Transitioning to celebration, the ceremonies spotlight individuals and organizations embodying selfless dedication. Jeff presents the Uber Contributor Award to The Prophet, a figure whose decades-long involvement spans writing for 2600 magazine, educating newcomers, and organizing events like Telephreak Challenge and QueerCon. His journey from phreaker to multifaceted influencer exemplifies the transformative power of sustained engagement. The Prophet’s acceptance speech captures the magic of the community, where dreams materialize through collective effort.

Similarly, the Electronic Frontier Foundation (EFF) receives recognition for over two decades of advocacy, raising $130,000 this year alone to support speakers and defend digital rights. Their representative emphasizes EFF’s role in amplifying security research for global benefit, aligning with DEF CON’s mission to empower ethical hacking.

Embracing the Theme: Engagement in Action

The “engage” motif drives discussions on evolving the community’s role in an increasingly complex digital world. Jeff articulates how this concept prompted bold experiments, acknowledging the uncertainties but embracing potential failures as learning opportunities. This mindset reflects the hacker’s adaptability, turning challenges into catalysts for innovation.

Attendees share feedback on the new venue, noting reduced overcrowding and a more relaxed atmosphere reminiscent of DEF CON’s earlier editions. Such observations validate the rapid pivot from the previous location, a decision thrust upon organizers by an unexpected contract termination. Jeff recounts the whirlwind process with humor, crediting quick alliances and the community’s resilience for the seamless transition.

Spotlight on Creativity: The Badge Unveiled

Mar Williams takes the stage to demystify the DEF CON 32 badge, a testament to accessible design and collaborative artistry. Drawing from a concept rooted in inclusivity, Mar aimed to create something approachable for novices while offering depth for experts. Partnering with Raspberry Pi, the badge incorporates layers of interactivity—from loading custom ROMs to developing games via GB Studio.

Acknowledgments flow to the team: Bonnie Finley for 3D modeling and game art, Chris Maltby for plugins and development, Nutmeg for additional game work, Will Tuttle for narrative input, Ada Rose Cannon for character creation, Legion 303 for audio, and others like ICSN for manufacturing. Mar’s vision emphasizes community participation, with the badge’s game dedicating itself to players who engage and make an impact. Challenges like SOS signals and proximity interactions foster connections, while post-conference resources encourage ongoing tinkering.

Triumphs in Competition: Village and Challenge Winners

The ceremonies burst with energy as winners from myriad contests are announced, showcasing the breadth of skills within the community. From the AI Village Capture the Flag, where teams like AI Cyber Challenge victors demonstrate prowess in emerging tech, to the Aviation Village’s high-flying achievements, each victory highlights specialized expertise.

Notable accolades include the AppSec Village’s top performers in secure coding, the Biohacking Village’s innovative health hacks, and the Car Hacking Village’s vehicular exploits. The Cloud Village CTF crowns champions in scalable defenses, while the Crypto & Privacy Village recognizes cryptographic ingenuity. Diversity shines through in the ICS Village’s industrial control triumphs and the IoT Village’s device dissections.

Special mentions go to the Lockpick Village’s dexterity masters, the Misinformation Village’s truth-seekers, and the Packet Hacking Village’s network ninjas. The Password Cracking Contest and Physical Pentest Challenge celebrate brute force and subtle infiltration, respectively. The Policy Village engages in advocacy wins, and the Recon Village excels in intelligence gathering.

Celebrating Hands-On Innovation: More Contest Highlights

The Red Team Village’s strategic simulations yield victors in offensive operations, complemented by the RFID Village’s access control breakthroughs. Rogue Access Point contests reward wireless wizardry, while the Soldering Skills Village honors precise craftsmanship.

The Space Security Village pushes boundaries in orbital defenses, and the Tamper Evident Village masters detection of intrusions. Telecom and Telephreak challenges revive analog artistry, with the Vishing Competition testing social engineering finesse. The Voting Village exposes electoral vulnerabilities, and the WiFi Village dominates spectrum battles.

Wireless CTF and Wordle Hacking round out the roster, each contributing to a tapestry of technical mastery and creative problem-solving.

Organizational Gratitude: Behind-the-Scenes Heroes

Jeff extends heartfelt thanks to departments, goons, and volunteers who orchestrated the event amid upheaval. Retiring goons like GMark, Noise, Ira, Estang, Gataca, Duna, The Samorphix, Brick, Wham, and Casper receive nods for their service, earning lifetime attendance. New “noons” are welcomed, injecting fresh energy.

Gold badge holders, signifying a decade of dedication, are celebrated for their enduring commitment. This segment underscores the human element sustaining DEF CON’s scale and vibrancy.

Looking Ahead: Community and Continuity

Social channels keep the conversation alive year-round, from Discord movie nights to YouTube archives and Instagram updates. The DEF CON Social Mastodon server offers a moderated space adhering to the code of conduct, providing a haven amid social media fragmentation.

A lighthearted anecdote from Jeff about the badge’s “dark chocolate” Easter egg illustrates serendipitous joy, where proximity triggers whimsical interactions. Such moments encapsulate the conference’s blend of seriousness and play.

Finally, anticipation builds for DEF CON 33, slated for August 7-10 at the same venue. Jeff reflects on the positive reception, affirming the space’s role in reducing FOMO and enhancing connections. With content continually uploaded online, the community remains engaged, ready to disengage only until the next convergence.


[DefCon32] Counter Deception: Defending Yourself in a World Full of Lies

The digital age promised universal access to knowledge, yet it has evolved into a vast apparatus for misinformation. Tom Cross and Greg Conti examine this paradox, tracing deception’s roots from ancient stratagems to modern cyber threats. Drawing on military doctrines and infosec experiences, they articulate principles for crafting illusions and, crucially, for dismantling them. Their discourse empowers individuals to navigate an ecosystem where truth is obscured, fostering tools and mindsets to reclaim clarity.

Deception, at its essence, conceals reality to gain advantage, influencing decisions or inaction. Historical precedents abound: the Trojan Horse’s cunning infiltration, Civil War Quaker guns mimicking artillery, or the Persian Gulf War’s feigned amphibious assault diverting attention from a land offensive. In contemporary conflicts, like Russia’s invasion of Ukraine, fabricated narratives such as the “Ghost of Kyiv” bolster morale while masking intentions. These tactics transcend eras, targeting not only laypersons but experts, code, and emerging AI systems.

In cybersecurity, falsehoods manifest at every layer: spoofed signals in the electromagnetic spectrum, false flags in malware attribution, or fabricated personas for network access and influence propagation. Humans fall prey through phishing, typo-squatting, or mimicry, while specialists encounter deceptive metadata or rotating infrastructures. Malware detection evades scrutiny via polymorphism or fileless techniques, and AI succumbs to data poisoning or jailbreaks. Strategically, deception scales from tactical engagements to national objectives, concealing capabilities or projecting alternatives.

Maxims of Effective Deception

Military thinkers have distilled deception into enduring guidelines. Sun Tzu advocated knowing adversaries intimately while veiling one’s own plans, emphasizing preparation and adaptability. Von Clausewitz viewed war—and by extension, conflict—as enveloped in uncertainty, where illusions amplify fog. Modern doctrines, like those from the U.S. Joint Chiefs, outline six tenets: focus on key decision-makers, integration with operations, centralized control for consistency, timeliness to exploit windows, security to prevent leaks, and adaptability to evolving conditions.

These principles manifest in cyber realms. Attackers exploit cognitive biases—confirmation, anchoring, availability—embedding falsehoods in blind spots. Narratives craft compelling stories, leveraging emotions like fear or outrage to propagate. Coordination ensures unified messaging across channels, while adaptability counters defenses. In practice, state actors deploy bot networks for amplification, or cybercriminals use deepfakes for social engineering. Understanding these offensive strategies illuminates defensive countermeasures.

Inverting Principles for Countermeasures

Flipping offensive maxims yields defensive strategies. To counter focus, broaden information sources, triangulating across diverse perspectives to mitigate echo chambers. Against integration, scrutinize contexts: does a claim align with broader evidence? For centralized control, identify coordination patterns—sudden surges in similar messaging signal orchestration.

Timeliness demands vigilance during critical periods, like elections, where rushed judgments invite errors. Security’s inverse promotes transparency, fostering open verification. Adaptability encourages continuous learning, refining discernment amid shifting tactics.

Practically, countering biases involves self-awareness: question assumptions, seek disconfirming evidence. Triangulation cross-references claims against reliable outlets, fact-checkers, or archives. Detecting narratives entails pattern recognition—recurring themes, emotional triggers, or inconsistencies. Tools like reverse image searches or metadata analyzers expose fabrications.

Applying Counter Deception in Digital Ecosystems

The internet’s structure amplifies deceit, yet hackers’ ingenuity can reclaim agency. Social media, often ego-centric, distorts realities through algorithmic funhouse mirrors. Curating expert networks—via follows, endorsements—filters noise, prioritizing credible voices. Protocols for machine-readable endorsements, akin to LinkedIn but open, enable querying endorsed specialists on topics, surfacing informed commentary.

Innovative protocols like backlinks—envisioned by pioneers such as Vannevar Bush, Douglas Engelbart, and Ted Nelson—remain underexplored. These allow viewing inbound references, revealing critiques or extensions. Projects like Xanadu or Hyperscope hint at potentials: annotating documents with trusted overlays, highlighting recent edits for scrutiny. Content moderation challenges stymied widespread adoption, but coupling with decentralized systems like Mastodon offers paths forward.

Large language models (LLMs) present dual edges: prone to hallucinations, yet adept at structuring unstructured data. Dispassionate analysis could unearth omitted facts from narratives, or map expertise by parsing academic sites to link profiles. Defensive tools might flag biases or inconsistencies, augmenting human judgment per Engelbart’s augmentation ethos.

Scaling countermeasures involves education: embedding media literacy in curricula, emphasizing critical inquiry. Resources like Media Literacy Now provide K-12 frameworks, while frameworks like “48 Critical Thinking Questions” prompt probing—who benefits, where’s the origin? Hackers, adept at discerning falsehoods, can prototype tools—feed analyzers, narrative detectors—leveraging open protocols for innovation.

Ultimately, countering deception demands vigilance and creativity. By inverting offensive doctrines, individuals fortify perceptions, transforming the internet from a misinformation conduit into a truth-seeking engine.


[DefCon32] AMDSinkclose – Universal Ring2 Privilege Escalation

In the intricate landscape of hardware security, vulnerabilities often lurk within architectural designs that have persisted for years. Enrique Nissim and Krzysztof Okupski, principal security consultants at IOActive, unravel a profound flaw in AMD processors, dubbed AMDSinkclose. Their exploration reveals how this issue enables attackers to escalate privileges to System Management Mode (SMM), granting unparalleled access to system resources. By dissecting the mechanics of SMM and the processor’s memory handling, they demonstrate exploitation paths that bypass traditional safeguards, affecting a vast array of devices from laptops to servers.

SMM represents one of the most potent execution environments in x86 architectures, offering unrestricted control over I/O devices and memory. It operates stealthily, invisible to operating systems, hypervisors, and security tools like antivirus or endpoint detection systems. During boot, firmware initializes hardware and loads SMM code into a protected memory region called SMRAM. At runtime, the OS can invoke SMM services for tasks such as power management or security checks via System Management Interrupts (SMIs). When an SMI triggers, the processor saves its state in SMRAM, executes the necessary operations, and resumes normal activity. This isolation makes SMM an attractive target for persistence mechanisms, including bootkits or firmware implants.

The duo’s prior research focused on vendor misconfigurations and software flaws in SMM components, yielding tools for vulnerability detection and several CVEs in 2023. However, AMDSinkclose shifts the lens to an inherent processor defect. Unlike Intel systems, where SMM-related Model-Specific Registers (MSRs) are accessible only within SMM, AMD allows ring-0 access to these registers. While an SMM lock bit prevents runtime tampering with key configurations, a critical oversight in the documentation exposes two fields—TClose and AClose—not covered by this lock. TClose, in particular, redirects data accesses in SMM to Memory-Mapped I/O (MMIO) instead of SMRAM, creating a pathway for manipulation.

Architectural Foundations and the Core Vulnerability

At the heart of SMM security lies the memory controller’s role in protecting SMRAM. Firmware configures registers like TSEG Base, TSEG Mask, and SMM Base to overlap and shield this region. The TSEG Mask includes fields for enabling protections, but the unlocked TClose bit allows ring-0 users to set it, altering behavior without violating the lock. When activated, instruction fetches in SMM remain directed to DRAM, but data accesses divert to MMIO. This split enables attackers to control execution by mapping malicious content into the MMIO space.

The feature originated around 2006 to allow SMM code to access I/O devices using SMRAM’s physical addresses, though no vendors appear to utilize it. Documentation warns against leaving TClose set upon SMM exit, as it could misdirect state saves to MMIO. Yet, from ring-0, setting this bit and triggering an SMI causes immediate system instability—freezes or hangs—due to erroneous data handling. This echoes the 2015 Memory Sinkhole attack by Christopher Domas, which remapped the APIC to overlap TSEG, but AMDSinkclose affects the entire TSEG region, amplifying the impact.

Brainstorming exploits, Enrique and Krzysztof considered remapping PCI devices to overlay SMRAM, but initial attempts failed due to hardware restrictions. Instead, they targeted the SMM entry point, a vendor-defined layout typically following EDK2 standards. This includes a core area for support code, per-core SMM bases with entry points at offset 0x8000, and save states at 0xFE00. By setting TClose and invoking an SMI, data reads from these offsets redirect to MMIO, allowing control if an attacker maps a suitable device there.

Exploitation Techniques and Multi-Core Challenges

Exploiting AMDSinkclose requires precise manipulation of the Global Descriptor Table (GDT) and Interrupt Descriptor Table (IDT) within SMM. Upon SMI entry, the processor operates in real mode, loading a GDT from the save state to transition to protected mode. By controlling data fetches via TClose, attackers can supply a malicious GDT, enabling arbitrary code execution. The challenge lies in aligning MMIO mappings with SMM offsets, as direct PCI remapping proved ineffective.

The solution involves leveraging the processor’s address wraparound behavior. In protected mode, addresses exceeding 4GB wrap around, but SMM’s real-mode entry point operates at a lower level where this wraparound can be exploited. By setting the SMM base to a high address like 0xFFFFFFF0, data accesses wrap to low MMIO regions (0x0 to 0xFFF), where integrated devices like the Local APIC reside. This allows overwriting the GDT with controlled content from the APIC’s registers.

Multi-core systems introduce complexity, as all cores enter SMM simultaneously during a broadcast SMI. The exploit must handle concurrent execution, ensuring only one core performs the malicious action while others halt safely. Disabling Simultaneous Multithreading (SMT) simplifies this, but wraparound enables targeting specific cores. Testing on Ryzen laptops confirmed reliability, with code injection succeeding across threads.

Impact on Firmware and Mitigation Strategies

The ramifications extend to firmware persistence. Once in SMM, attackers disable SPI flash protections like ROM Armor, enabling writes to non-volatile storage. Depending on configurations—such as Platform Secure Boot (PSB)—outcomes vary. Fully enabled protections limit writes to variables, potentially breaking Secure Boot by altering keys. Absent PSB, full firmware implants become feasible, resistant to OS reinstalls or updates, as malware can intercept and falsify flash operations.

Research on vendor configurations reveals widespread vulnerabilities: many systems lack ROM Armor or PSB, exposing them to implants. Even with protections enabled, bootkits remain possible, executing before the OS loader runs. A fused disable of PSB ensures perpetual vulnerability.

AMD’s microcode update addresses the issue, though coverage may vary. OEMs can patch SMM entry points to detect and halt on TClose activation, integrable into EDK2 or Coreboot. Users might trap MSR accesses via hypervisors. Reported in October 2023, CVE-2023-31315 was assigned, with an advisory published recently. Exploit code is forthcoming, underscoring the need for deepened architectural scrutiny.


Running Docker Natively on WSL2 (Ubuntu 24.04) in Windows 11

For many developers, Docker Desktop has long been the default solution to run Docker on Windows. However, licensing changes and the desire for a leaner setup have pushed teams to look for alternatives. Fortunately, with the maturity of Windows Subsystem for Linux 2 (WSL2), it is now possible to run the full Docker Engine directly inside a Linux distribution such as Ubuntu 24.04, while still accessing containers seamlessly from both Linux and Windows.

In this guide, I’ll walk you through a clean, step-by-step setup for running Docker Engine inside WSL2 without Docker Desktop, explain how Windows and WSL2 communicate, and share best practices for maintaining a healthy development environment.


Why Run Docker Inside WSL2?

Running Docker natively inside WSL2 has several benefits:

  • No licensing issues – you avoid Docker Desktop’s commercial license requirements.
  • Lightweight – no heavy virtualization layer; containers run directly inside your WSL Linux distro.
  • Integrated networking – on Windows 11 with modern WSL versions,
    containers bound to localhost inside WSL are automatically reachable from Windows.
  • Familiar Linux workflow – you install and use Docker exactly as you would on a regular Ubuntu server.

Step 1 – Update Ubuntu

Open your Ubuntu 24.04 terminal and ensure your system is up to date:

sudo apt update && sudo apt upgrade -y

Step 2 – Install Docker Engine

Install Docker using the official Docker repository:

# Install prerequisites
sudo apt install -y ca-certificates curl gnupg lsb-release

# Add Docker’s GPG key
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
  sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

# Configure Docker repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker Engine
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

Step 3 – Run Docker Without sudo

To avoid prefixing every command with sudo, add your user to the docker group:

sudo usermod -aG docker $USER

Close and reopen your WSL terminal (or run wsl --shutdown from PowerShell) for the group change to take effect. On Ubuntu 24.04 with a current WSL build, systemd is enabled by default and the Docker daemon starts automatically; if it is not running, start it manually:

sudo service docker start

Then verify:

docker --version
docker ps

Step 4 – Test Networking

One of the most common questions is:
“Will my containers be accessible from both Ubuntu and Windows?”
The answer is yes on modern Windows 11 with WSL2.
Let’s test it by running an Nginx container:

docker run -d -p 8080:80 --name webtest nginx
  • Inside Ubuntu (WSL): curl http://localhost:8080
  • From Windows (browser or PowerShell): http://localhost:8080

Thanks to WSL2’s localhost forwarding, Windows traffic to localhost is routed
into the WSL network, making containers instantly accessible without extra configuration.


Step 5 – Run Multi-Container Applications with Docker Compose

The Docker Compose plugin is already installed as part of the package above. Check the version:

docker compose version

Create a docker-compose.yml for a WordPress + MySQL stack:

version: "3.9"
services:
  db:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: rootpass
      MYSQL_DATABASE: wordpress
      MYSQL_USER: wpuser
      MYSQL_PASSWORD: wppass
    volumes:
      - db_data:/var/lib/mysql

  wordpress:
    image: wordpress:latest
    ports:
      - "8080:80"
    environment:
      WORDPRESS_DB_HOST: db:3306
      WORDPRESS_DB_USER: wpuser
      WORDPRESS_DB_PASSWORD: wppass
      WORDPRESS_DB_NAME: wordpress

volumes:
  db_data:

If the webtest container from Step 4 is still running, it is holding port 8080; remove it first with docker rm -f webtest. Then start the services:

docker compose up -d

Once the containers are running, open http://localhost:8080 in your Windows browser
to access WordPress. The containers are managed entirely inside WSL2,
but networking feels seamless.


Maintenance: Cleaning Up Docker Data

Over time, Docker accumulates images, stopped containers, volumes, and networks.
This can take up significant disk space inside your WSL distribution.
Here are safe maintenance commands to keep your environment clean:

Remove Unused Objects

docker system prune -a --volumes
  • -a: removes all unused images, not just dangling ones
  • --volumes: also removes unused volumes

Reset Everything (Dangerous)

If you need to wipe your Docker environment completely (images, containers, volumes, networks):

docker stop $(docker ps -aq) 2>/dev/null
docker rm -f $(docker ps -aq) 2>/dev/null
docker volume rm $(docker volume ls -q) 2>/dev/null
docker network rm $(docker network ls -q) 2>/dev/null
docker image rm -f $(docker image ls -q) 2>/dev/null

⚠️ Use this only if you want to start fresh. All data will be removed.


Conclusion

By running Docker Engine directly inside WSL2, you gain a powerful, lightweight, and license-free Docker environment that integrates seamlessly with Windows 11. Your containers are accessible from both Linux and Windows, Docker Compose works out of the box, and maintenance is straightforward with prune commands.

This approach is particularly well-suited for developers who want the flexibility of Docker without the overhead of Docker Desktop. With WSL2 and Ubuntu 24.04, you get the best of both worlds: Linux-native Docker with Windows accessibility.

Predictive Modeling and the Illusion of Signal

Introduction

Vincent Warmerdam delves into the illusions often encountered in predictive modeling, highlighting the cognitive traps and statistical misconceptions that lead to overconfidence in model performance.

The Seduction of Spurious Correlations

Models often perform well on training data by exploiting noise rather than genuine signal. Vincent emphasizes critical thinking and statistical rigor to avoid being misled by deceptively strong results.
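A minimal illustration of this trap (standard library only): a 1-nearest-neighbour "model" memorizes pure noise perfectly in-sample, while its held-out accuracy stays at chance. The strong training score is an illusion, not signal.

```python
import random

random.seed(0)

# Labels are pure coin flips: there is no signal to learn.
data = [([random.random() for _ in range(5)], random.choice([0, 1]))
        for _ in range(200)]
train, test = data[:100], data[100:]

def predict(x, memory):
    # 1-nearest-neighbour: return the label of the closest stored point.
    return min(memory, key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m[0])))[1]

train_acc = sum(predict(x, train) == y for x, y in train) / len(train)
test_acc = sum(predict(x, train) == y for x, y in test) / len(test)
print(train_acc, test_acc)  # perfect in-sample, roughly chance out of sample
```

The training accuracy is exactly 1.0 because each training point is its own nearest neighbour, which is precisely why only out-of-sample evaluation reveals the absence of signal.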

Building Robust Models

Using robust cross-validation, considering domain knowledge, and testing against out-of-sample data are vital strategies to counteract the illusion of predictive prowess.

Conclusion

Data science is not just coding and modeling — it requires constant skepticism, critical evaluation, and humility. Vincent reminds us to stay vigilant against the comforting but dangerous mirage of false predictability.

Building Intelligent Data Products at Scale

Introduction

Thomas Vachon shares insights into scaling data-driven products, blending machine learning, engineering, and user-centric design to create impactful and intelligent applications.

Key Ingredients for Success

Building intelligent products requires aligning data pipelines, model training, deployment infrastructure, and feedback loops. Vachon stresses the importance of cross-functional collaboration between data scientists, software engineers, and product teams.

Real-World Lessons

From architectural best practices to team organization strategies, Vachon illustrates how to navigate the complexity of scaling data initiatives sustainably.

Conclusion

Intelligent data products demand not only technical excellence but also thoughtful design, scalability planning, and user empathy from day one.

Boosting AI Reliability: Uncertainty Quantification with MAPIE


Introduction

Thierry Cordier and Valentin Laurent introduce MAPIE, a Python library within scikit-learn-contrib, designed for uncertainty quantification in machine learning models.

MAPIE on GitHub

Managing Uncertainty in Machine Learning

In AI applications — from autonomous vehicles to medical diagnostics — understanding prediction uncertainty is crucial. MAPIE uses conformal prediction methods to generate prediction intervals with controlled confidence, ensuring safer and more interpretable AI systems.
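The core idea behind conformal prediction can be sketched without MAPIE itself. The following minimal split-conformal example uses only NumPy; the toy data, variable names, and hand-rolled least-squares fit are ours for illustration, not MAPIE's API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data: y = 2x + Gaussian noise
X = rng.uniform(0, 10, size=200)
y = 2 * X + rng.normal(0, 1, size=200)

# Split conformal needs a proper training set and a held-out calibration set
X_train, y_train = X[:100], y[:100]
X_cal, y_cal = X[100:], y[100:]

# Fit a simple least-squares line on the training half
A = np.vstack([X_train, np.ones_like(X_train)]).T
coef, intercept = np.linalg.lstsq(A, y_train, rcond=None)[0]

def predict(x):
    return coef * x + intercept

# Conformity scores: absolute residuals on the calibration half
scores = np.abs(y_cal - predict(X_cal))

# The (1 - alpha) quantile of the scores gives the interval half-width,
# with a finite-sample correction of (n + 1) / n
alpha = 0.1
n = len(scores)
q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))

# ~90% prediction interval for a new point
x_new = 5.0
lower, upper = predict(x_new) - q, predict(x_new) + q
print(f"[{lower:.2f}, {upper:.2f}]")
```

Under the exchangeability assumption, intervals built this way cover the true value at least (1 - alpha) of the time, regardless of the underlying model; MAPIE packages this logic (and more refined variants) behind a scikit-learn-compatible interface.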

Key Features

MAPIE supports regression, classification, time series forecasting, and complex tasks like multi-label classification and semantic segmentation. It integrates seamlessly with scikit-learn, TensorFlow, PyTorch, and custom models.

Real-World Use Cases

By generating calibrated prediction intervals, MAPIE enables selective classification, robust decision-making under uncertainty, and provides statistical guarantees critical for safety-critical AI systems.

Conclusion

MAPIE empowers data scientists to quantify uncertainty elegantly, bridging the gap between predictive power and real-world reliability.

PostHeaderIcon [PyData Paris 2024] Exploring Quarto Dashboard for Impactful and Visual Communication

Watch the video

Introduction

Christophe Dervieux introduces Quarto, a powerful open-source scientific and technical publishing system, and its dashboard format. Designed to create impactful visual communication directly from Jupyter Notebooks, Quarto enables the seamless creation of interactive charts, dashboards, and dynamic narratives.

Building Visual Communication with Quarto

Quarto extends standard markdown with advanced features tailored for scientific writing. It offers support for multiple computation engines, allowing narratives and executable code to merge into various outputs: PDF, HTML pages, websites, books, and especially dashboards. The dashboard format enhances data communication by organizing visual metrics in an efficient and impactful layout.

Using Quarto, rendering a Jupyter notebook becomes simple: a single command-line instruction (quarto render) outputs polished, shareable dashboards. Extensions for VS Code, JupyterLab, and the Positron IDE streamline the experience further.

Dashboard Features and Design

Dashboards in Quarto organize content using components like cards, rows, columns, sidebars, and tabs. Each element structures visual outputs like plots, tables, and value boxes, allowing maximum clarity. Customization is straightforward, leveraging YAML configuration and Bootstrap-based theming. Users can create multi-page navigation, interactivity through JavaScript libraries, and adapt layouts for specific audiences.

Recent updates even enable branding dashboards easily with SCSS themes, making Quarto ideal for both scientific and corporate environments.
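As a flavor of what a dashboard source looks like, here is a minimal hypothetical .qmd file (the title, rows, and value-box contents are our own illustration, following Quarto's documented dashboard conventions):

````markdown
---
title: "Sales Overview"
format: dashboard
---

## Row {height=30%}

```{python}
#| content: valuebox
#| title: "Total revenue"
dict(value="1.2M", icon="currency-euro", color="primary")
```

## Row

### Column {width=60%}

```{python}
#| title: "Revenue by month"
# any plotting code (matplotlib, plotly, ...) renders into this card
```

### Column {width=40%}

```{python}
#| title: "Top products"
# a table or second plot fills the remaining column
```
````

Running quarto render on such a file produces a self-contained HTML dashboard whose cards, rows, and columns follow the layout declared above.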

Conclusion

Quarto revolutionizes technical communication by enabling scientists and analysts to produce professional-grade dashboards and publications effortlessly. Christophe’s session at PyData Paris 2024 showcased the simplicity, power, and flexibility Quarto brings to modern data storytelling.

PostHeaderIcon JpaSystemException: A collection with cascade=”all-delete-orphan” was no longer referenced by the owning entity instance

Case:

Entity declaration:

    // In the User entity (Lists.newArrayList comes from Guava)
    @OneToMany(cascade = CascadeType.ALL, orphanRemoval = true)
    private List<Foo> foos = Lists.newArrayList();

This block:

    user.getFoos().clear();
    // instantiate `foos`, e.g.: final List<Foo> foos = myService.createFoos(bla, bla);
    user.setFoos(foos);

generates this error:

org.springframework.orm.jpa.JpaSystemException: A collection with cascade="all-delete-orphan" was no longer referenced by the owning entity instance: com.github.lalou.jonathan.blabla.User.foos

Fix:

Do not use setFoos(); rather, after clearing, use addAll(). Hibernate keeps a reference to the original collection instance in order to track orphans, so the owning entity must mutate that same instance instead of replacing it. In other words, replace:

    user.getFoos().clear();
    user.setFoos(foos);

with

    user.getFoos().clear();
    user.getFoos().addAll(foos);
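The point can be demonstrated without Hibernate at all: what matters is that the collection instance the entity handed out stays the same object. This plain-Java sketch (the User and field names mirror the example above, but this is not JPA code) shows that clear() plus addAll() preserves identity while setFoos() would not:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for the JPA entity: Hibernate holds a reference to the
// collection instance it manages, so the fix must mutate that instance.
class User {
    private List<String> foos = new ArrayList<>();
    public List<String> getFoos() { return foos; }
}

public class Demo {
    public static void main(String[] args) {
        User user = new User();
        user.getFoos().add("old");
        List<String> tracked = user.getFoos(); // what Hibernate would hold

        List<String> replacement = List.of("a", "b");
        user.getFoos().clear();
        user.getFoos().addAll(replacement);

        // Same collection instance, new contents: orphan tracking stays intact
        System.out.println(tracked == user.getFoos()); // prints true
        System.out.println(user.getFoos());            // prints [a, b]
    }
}
```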


(copied to https://stackoverflow.com/questions/78858499/jpasystemexception-a-collection-with-cascade-all-delete-orphan-was-no-longer )