
[PyConUS 2024] Pandas + Dask DataFrame 2.0: A Leap Forward in Distributed Computing

At PyCon US 2024, Patrick Hoefler delivered an insightful presentation on the advancements in Dask DataFrame 2.0, particularly its enhanced integration with pandas and its performance compared to other big data tools like Spark, DuckDB, and Polars. As a maintainer of both pandas and Dask, Patrick, who works at Coiled, shared how recent improvements have transformed Dask into a robust and efficient solution for distributed computing, making it a compelling choice for handling large-scale datasets.

Enhanced String Handling with Arrow Integration

One of the most significant upgrades in Dask DataFrame 2.0 is its adoption of Apache Arrow for string handling, moving away from the less efficient NumPy object data type. Patrick highlighted that this shift has resulted in substantial performance gains. For instance, string operations are now two to three times faster in pandas, and in Dask, they can achieve up to tenfold improvements due to better multithreading capabilities. Additionally, memory usage has been drastically reduced—by approximately 60 to 70% in typical datasets—making Dask more suitable for memory-constrained environments. This enhancement ensures that users can process large datasets with string-heavy columns more efficiently, a critical factor in distributed workloads.
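
To make this concrete, here is a minimal sketch of how a user might opt into Arrow-backed strings today; the option and config names reflect current pandas 2.x and Dask releases, and the dataset path is hypothetical:

import dask
import dask.dataframe as dd
import pandas as pd

# Arrow-backed strings in plain pandas (2.x):
pd.set_option("mode.string_storage", "pyarrow")

# Recent Dask releases convert object-dtype strings to Arrow strings
# automatically; the behavior is controlled by this config flag:
dask.config.set({"dataframe.convert-string": True})

df = dd.read_parquet("s3://bucket/dataset/")  # hypothetical path
print(df.dtypes)  # string columns appear as string[pyarrow]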

Revolutionary Shuffle Algorithm

Patrick emphasized the complete overhaul of Dask’s shuffle algorithm, which is pivotal for distributed systems where data must be communicated across multiple workers. The previous task-based algorithm scaled poorly: its cost grew log-linearly (on the order of n log n) as dataset sizes increased. The new peer-to-peer (P2P) shuffle algorithm scales linearly, ensuring that doubling the dataset size only doubles the workload. This improvement not only boosts performance but also enhances reliability, allowing Dask to handle arbitrarily large datasets with constant memory usage by leveraging disk storage when necessary. Such advancements make Dask a more resilient choice for complex data processing tasks.
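
As a rough sketch, recent Dask versions select the P2P shuffle automatically on a distributed cluster; it can also be requested explicitly via configuration (the config key is from current releases, and the path and column name are assumptions):

import dask
import dask.dataframe as dd

dask.config.set({"dataframe.shuffle.method": "p2p"})

df = dd.read_parquet("s3://bucket/dataset/")  # hypothetical path
shuffled = df.shuffle(on="account_id")        # repartition by key across workers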

Query Planning: A Game-Changer

The introduction of a logical query planning layer marks a significant milestone for Dask. Historically, Dask executed operations as they were received, often leading to inefficient processing. The new query optimizer employs techniques like column projection and predicate pushdown, which significantly reduce unnecessary data reads and network transfers. For example, by identifying and applying filters and projections early in the query plan, Dask can minimize data movement, yielding performance improvements of up to 1000x in certain scenarios. This optimization makes Dask more intuitive and efficient, bringing it closer to established systems like Spark.
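
A small, hypothetical query illustrates what the optimizer can exploit; with query planning enabled (the default in recent releases), only the referenced columns are read and the filter is pushed into the Parquet scan:

import dask.dataframe as dd

df = dd.read_parquet("s3://bucket/trips/")  # hypothetical path and columns
result = (
    df[df["fare"] > 10]                     # predicate pushdown candidate
    .groupby("passenger_count")["fare"]     # column projection: only two
    .mean()                                 # columns are read from disk
    .compute()
)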

Benchmarking Against the Giants

Patrick presented comprehensive benchmarks using the TPC-H dataset to compare Dask’s performance against Spark, DuckDB, and Polars. At a 100 GB scale, DuckDB often outperformed others due to its single-node optimization, but Dask held its own. At larger scales (1 TB and 10 TB), Dask’s distributed nature gave it an edge, particularly when DuckDB struggled with memory constraints on complex queries. Against Spark, Dask showed remarkable progress, outperforming it in most queries at the 1 TB scale and maintaining competitiveness at 10 TB, despite some overhead issues that Patrick noted are being addressed. These results underscore Dask’s growing capability to handle enterprise-level data processing tasks.

Hashtags: #Dask #Pandas #BigData #DistributedComputing #PyConUS2024 #PatrickHoefler #Coiled #Spark #DuckDB #Polars

[DevoxxGR2024] Butcher Virtual Threads Like a Pro at Devoxx Greece 2024 by Piotr Przybyl

Piotr Przybyl, a Java Champion and developer advocate at Elastic, captivated audiences at Devoxx Greece 2024 with a dynamic exploration of Java 21’s virtual threads. Through vivid analogies, practical demos, and a touch of humor, Piotr demystified virtual threads, highlighting their potential and pitfalls. His talk, rich with real-world insights, offered developers a guide to leveraging this transformative feature while avoiding common missteps. As a seasoned advocate for technologies like Elasticsearch and Testcontainers, Piotr’s presentation was a masterclass in navigating modern Java concurrency.

Understanding Virtual Threads

Piotr began by contextualizing virtual threads within Java’s concurrency evolution. Introduced in Java 21 under Project Loom, virtual threads address the limitations of traditional platform threads, which are costly to create and limited in number. Unlike platform threads, virtual threads are lightweight, managed by a scheduler that mounts and unmounts them from carrier threads during I/O operations. This enables a thread-per-request model, scaling applications to handle millions of concurrent tasks. Piotr likened virtual threads to taxis in a busy city like Athens, efficiently transporting passengers (tasks) without occupying resources during idle periods.

However, virtual threads are not a universal solution. Piotr emphasized that they do not inherently speed up individual requests but improve scalability by handling more concurrent tasks. Their API remains familiar, aligning with existing thread practices, making adoption seamless for developers accustomed to Java’s threading model.
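
A minimal sketch of the thread-per-task model Piotr described, using the standard Java 21 API:

import java.util.concurrent.Executors;
import java.util.stream.IntStream;

public class VirtualThreadsDemo {
    public static void main(String[] args) {
        // One cheap virtual thread per task; try-with-resources waits
        // for all submitted tasks before closing the executor.
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            IntStream.range(0, 100_000).forEach(i ->
                executor.submit(() -> {
                    Thread.sleep(1_000); // blocking unmounts the virtual
                    return i;            // thread from its carrier thread
                }));
        }
    }
}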

Common Pitfalls and Pinning

A central theme of Piotr’s talk was “pinning,” a performance issue in which a virtual thread remains tied to its carrier thread, negating the benefits. Pinning occurs during blocking I/O or native calls inside synchronized blocks, akin to keeping a taxi running during a lunch break. Piotr demonstrated this with a legacy Elasticsearch client, using Testcontainers and Toxiproxy to simulate slow network calls. By enabling tracing with -Djdk.tracePinnedThreads=full, he identified and resolved pinning issues, replacing synchronized methods with modern, non-blocking clients.

Piotr cautioned against misuses like thread pooling or reusing virtual threads, which disrupt their lightweight design. He advocated for careful monitoring using JFR events to ensure threads remain unpinned, ensuring optimal performance in production environments.
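
The pinning scenario is easy to reproduce. A minimal sketch follows (the lock and sleep stand in for the legacy client's synchronized I/O); running it with the tracing flag prints the offending stack trace:

// Run with: java -Djdk.tracePinnedThreads=full PinningDemo
public class PinningDemo {
    private static final Object LOCK = new Object();

    public static void main(String[] args) throws InterruptedException {
        Thread vt = Thread.ofVirtual().start(() -> {
            synchronized (LOCK) {                  // monitor held while blocking:
                try {
                    Thread.sleep(100);             // the virtual thread stays
                } catch (InterruptedException e) { // pinned to its carrier
                    Thread.currentThread().interrupt();
                }
            }
        });
        vt.join();
        // Replacing synchronized with a java.util.concurrent lock, or a
        // non-blocking client, removes the pinning.
    }
}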

Structured Concurrency and Scoped Values

Piotr explored structured concurrency, a preview feature in Java 21, designed to eliminate thread leaks and cancellation delays. By creating scopes that manage forks, developers can ensure tasks complete or fail together, simplifying error handling. He demonstrated a shutdown-on-failure scope, where a single task failure cancels all others, contrasting this with the complexity of managing interdependent futures.
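
In code, the shutdown-on-failure scope looks roughly like this (Java 21 preview API, so it needs --enable-preview; the fetch methods are placeholders):

import java.util.concurrent.StructuredTaskScope;

public class ScopeDemo {
    String loadPage() throws Exception {
        try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
            var user  = scope.fork(this::fetchUser);   // both subtasks run
            var order = scope.fork(this::fetchOrder);  // on virtual threads
            scope.join().throwIfFailed();  // one failure cancels the sibling
            return user.get() + " / " + order.get();
        }
    }

    String fetchUser()  { return "frodo"; }  // placeholder lookups
    String fetchOrder() { return "ring"; }
}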

Scoped Values, another preview feature, offer an immutable, one-way alternative to thread locals, preventing bugs like data leakage in thread pools. Piotr illustrated their use in maintaining request context, warning against mutability to preserve reliability. These features, he argued, complement virtual threads, fostering robust, maintainable concurrent applications.
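
A sketch of the request-context use case (also a Java 21 preview feature, run with --enable-preview):

public class ScopedValueDemo {
    // Immutable, one-way alternative to a mutable ThreadLocal:
    private static final ScopedValue<String> REQUEST_ID = ScopedValue.newInstance();

    public static void main(String[] args) {
        ScopedValue.where(REQUEST_ID, "req-42").run(ScopedValueDemo::handle);
    }

    static void handle() {
        // Visible here and in everything called within the scope, but it
        // cannot be reassigned, so nothing leaks across pooled threads.
        System.out.println("handling " + REQUEST_ID.get());
    }
}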

Practical Debugging and Best Practices

Through live coding, Piotr showcased how debugging with logging can inadvertently introduce I/O, unmounting virtual threads and degrading performance. He compared this to a concert where logging scatters tasks, reducing completion rates. To mitigate this, he recommended avoiding I/O in critical paths and using structured concurrency for monitoring.

Piotr’s best practices included using framework-specific annotations (e.g., Quarkus, Spring) to enable virtual threads and ensuring tasks are interruptible. He urged developers to test thoroughly, leveraging tools like Testcontainers to simulate real-world conditions. His blog post on testing unpinned threads provides further guidance for practitioners.

Conclusion

Piotr’s presentation was a clarion call to embrace virtual threads with enthusiasm and caution. By understanding their mechanics, avoiding pitfalls like pinning, and leveraging structured concurrency, developers can unlock unprecedented scalability. His engaging analogies and practical demos made complex concepts accessible, empowering attendees to modernize Java applications responsibly. As Java evolves, Piotr’s insights ensure developers remain equipped to navigate its concurrency landscape.


[DefCon32] Closing Ceremonies & Awards

As the echoes of innovation and collaboration fade from the halls of the Las Vegas Convention Center, the closing ceremonies of DEF CON 32 encapsulate the spirit of a community that thrives on engagement, resilience, and shared purpose. Hosted by Jeff Moss, known as Dark Tangent, alongside contributors like Mar Williams and representatives from various teams, the event reflects on achievements, honors trailblazers, and charts a course forward. Amid reflections on past giants and celebrations of current triumphs, the gathering underscores the hacker ethos: pushing boundaries while fostering inclusivity and growth.

Jeff opens with a tone of relief and gratitude, acknowledging the unforeseen venue shift that tested the community’s adaptability. What began as a potential setback transformed into a revitalized experience, with attendees praising the spacious layout that evoked the intimacy of earlier conventions. This backdrop sets the stage for a moment of solemnity, where participants pause to honor those who paved the way—mentors, innovators, and unsung heroes whose legacies endure in the collective memory.

The theme of “engage” permeates the proceedings, inspiring initiatives that extend the conference’s impact beyond its annual confines. Jeff highlights two new ventures aimed at channeling the community’s expertise toward societal good and personal advancement. These efforts embody a commitment to proactive involvement, bridging the gap between hacker ingenuity and real-world challenges.

Honoring the Past: A Moment of Reflection

In a poignant start, Jeff calls for silence to remember predecessors whose contributions form the foundation of today’s cybersecurity landscape. This ritual serves as a reminder that progress stems from accumulated wisdom, urging attendees to carry forward the ethos of giving back. The gesture resonates deeply, connecting generations and reinforcing the communal bonds that define DEF CON.

Transitioning to celebration, the ceremonies spotlight individuals and organizations embodying selfless dedication. Jeff presents the Uber Contributor Award to The Prophet, a figure whose decades-long involvement spans writing for 2600 magazine, educating newcomers, and organizing events like Telephreak Challenge and QueerCon. His journey from phreaker to multifaceted influencer exemplifies the transformative power of sustained engagement. The Prophet’s acceptance speech captures the magic of the community, where dreams materialize through collective effort.

Similarly, the Electronic Frontier Foundation (EFF) receives recognition for over two decades of advocacy, raising $130,000 this year alone to support speakers and defend digital rights. Their representative emphasizes EFF’s role in amplifying security research for global benefit, aligning with DEF CON’s mission to empower ethical hacking.

Embracing the Theme: Engagement in Action

The “engage” motif drives discussions on evolving the community’s role in an increasingly complex digital world. Jeff articulates how this concept prompted bold experiments, acknowledging the uncertainties but embracing potential failures as learning opportunities. This mindset reflects the hacker’s adaptability, turning challenges into catalysts for innovation.

Attendees share feedback on the new venue, noting reduced overcrowding and a more relaxed atmosphere reminiscent of DEF CON’s earlier editions. Such observations validate the rapid pivot from the previous location, a decision thrust upon organizers by an unexpected contract termination. Jeff recounts the whirlwind process with humor, crediting quick alliances and the community’s resilience for the seamless transition.

Spotlight on Creativity: The Badge Unveiled

Mar Williams takes the stage to demystify the DEF CON 32 badge, a testament to accessible design and collaborative artistry. Drawing from a concept rooted in inclusivity, Mar aimed to create something approachable for novices while offering depth for experts. Built in partnership with Raspberry Pi, the badge incorporates layers of interactivity, from loading custom ROMs to developing games via GB Studio.

Acknowledgments flow to the team: Bonnie Finley for 3D modeling and game art, Chris Maltby for plugins and development, Nutmeg for additional game work, Will Tuttle for narrative input, Ada Rose Cannon for character creation, Legion 303 for audio, and others like ICSN for manufacturing. Mar’s vision emphasizes community participation, with the badge’s game dedicating itself to players who engage and make an impact. Challenges like SOS signals and proximity interactions foster connections, while post-conference resources encourage ongoing tinkering.

Triumphs in Competition: Village and Challenge Winners

The ceremonies burst with energy as winners from myriad contests are announced, showcasing the breadth of skills within the community. From the AI Village Capture the Flag, where teams like AI Cyber Challenge victors demonstrate prowess in emerging tech, to the Aviation Village’s high-flying achievements, each victory highlights specialized expertise.

Notable accolades include the AppSec Village’s top performers in secure coding, the Biohacking Village’s innovative health hacks, and the Car Hacking Village’s vehicular exploits. The Cloud Village CTF crowns champions in scalable defenses, while the Crypto & Privacy Village recognizes cryptographic ingenuity. Diversity shines through in the ICS Village’s industrial control triumphs and the IoT Village’s device dissections.

Special mentions go to the Lockpick Village’s dexterity masters, the Misinformation Village’s truth-seekers, and the Packet Hacking Village’s network ninjas. The Password Cracking Contest and Physical Pentest Challenge celebrate brute force and subtle infiltration, respectively. The Policy Village engages in advocacy wins, and the Recon Village excels in intelligence gathering.

Celebrating Hands-On Innovation: More Contest Highlights

The Red Team Village’s strategic simulations yield victors in offensive operations, complemented by the RFID Village’s access control breakthroughs. Rogue Access Point contests reward wireless wizardry, while the Soldering Skills Village honors precise craftsmanship.

The Space Security Village pushes boundaries in orbital defenses, and the Tamper Evident Village masters detection of intrusions. Telecom and Telephreak challenges revive analog artistry, with the Vishing Competition testing social engineering finesse. The Voting Village exposes electoral vulnerabilities, and the WiFi Village dominates spectrum battles.

Wireless CTF and Wordle Hacking round out the roster, each contributing to a tapestry of technical mastery and creative problem-solving.

Organizational Gratitude: Behind-the-Scenes Heroes

Jeff extends heartfelt thanks to departments, goons, and volunteers who orchestrated the event amid upheaval. Retiring goons like GMark, Noise, Ira, Estang, Gataca, Duna, The Samorphix, Brick, Wham, and Casper receive nods for their service, earning lifetime attendance. New “noons” are welcomed, injecting fresh energy.

Gold badge holders, signifying a decade of dedication, are celebrated for their enduring commitment. This segment underscores the human element sustaining DEF CON’s scale and vibrancy.

Looking Ahead: Community and Continuity

Social channels keep the conversation alive year-round, from Discord movie nights to YouTube archives and Instagram updates. The DEF CON Social Mastodon server offers a moderated space adhering to the code of conduct, providing a haven amid social media fragmentation.

A lighthearted anecdote from Jeff about the badge’s “dark chocolate” Easter egg illustrates serendipitous joy, where proximity triggers whimsical interactions. Such moments encapsulate the conference’s blend of seriousness and play.

Finally, anticipation builds for DEF CON 33, slated for August 7-10 at the same venue. Jeff reflects on the positive reception, affirming the space’s role in reducing FOMO and enhancing connections. With content continually uploaded online, the community remains engaged, ready to disengage only until the next convergence.


[DefCon32] Counter Deception: Defending Yourself in a World Full of Lies

The digital age promised universal access to knowledge, yet it has evolved into a vast apparatus for misinformation. Tom Cross and Greg Conti examine this paradox, tracing deception’s roots from ancient stratagems to modern cyber threats. Drawing on military doctrines and infosec experiences, they articulate principles for crafting illusions and, crucially, for dismantling them. Their discourse empowers individuals to navigate an ecosystem where truth is obscured, fostering tools and mindsets to reclaim clarity.

Deception, at its essence, conceals reality to gain advantage, influencing decisions or inaction. Historical precedents abound: the Trojan Horse’s cunning infiltration, Civil War Quaker guns mimicking artillery, or the Persian Gulf War’s feigned amphibious assault diverting attention from a land offensive. In contemporary conflicts, like Russia’s invasion of Ukraine, fabricated narratives such as the “Ghost of Kyiv” bolster morale while masking intentions. These tactics transcend eras, targeting not only laypersons but experts, code, and emerging AI systems.

In cybersecurity, falsehoods manifest at every layer: spoofed signals in the electromagnetic spectrum, false flags in malware attribution, or fabricated personas for network access and influence propagation. Humans fall prey through phishing, typo-squatting, or mimicry, while specialists encounter deceptive metadata or rotating infrastructures. Malware detection evades scrutiny via polymorphism or fileless techniques, and AI succumbs to data poisoning or jailbreaks. Strategically, deception scales from tactical engagements to national objectives, concealing capabilities or projecting alternatives.

Maxims of Effective Deception

Military thinkers have distilled deception into enduring guidelines. Sun Tzu advocated knowing adversaries intimately while veiling one’s own plans, emphasizing preparation and adaptability. Von Clausewitz viewed war—and by extension, conflict—as enveloped in uncertainty, where illusions amplify fog. Modern doctrines, like those from the U.S. Joint Chiefs, outline six tenets: focus on key decision-makers, integration with operations, centralized control for consistency, timeliness to exploit windows, security to prevent leaks, and adaptability to evolving conditions.

These principles manifest in cyber realms. Attackers exploit cognitive biases—confirmation, anchoring, availability—embedding falsehoods in blind spots. Narratives craft compelling stories, leveraging emotions like fear or outrage to propagate. Coordination ensures unified messaging across channels, while adaptability counters defenses. In practice, state actors deploy bot networks for amplification, or cybercriminals use deepfakes for social engineering. Understanding these offensive strategies illuminates defensive countermeasures.

Inverting Principles for Countermeasures

Flipping offensive maxims yields defensive strategies. To counter focus, broaden information sources, triangulating across diverse perspectives to mitigate echo chambers. Against integration, scrutinize contexts: does a claim align with broader evidence? For centralized control, identify coordination patterns—sudden surges in similar messaging signal orchestration.

Timeliness demands vigilance during critical periods, like elections, where rushed judgments invite errors. Security’s inverse promotes transparency, fostering open verification. Adaptability encourages continuous learning, refining discernment amid shifting tactics.

Practically, countering biases involves self-awareness: question assumptions, seek disconfirming evidence. Triangulation cross-references claims against reliable outlets, fact-checkers, or archives. Detecting narratives entails pattern recognition—recurring themes, emotional triggers, or inconsistencies. Tools like reverse image searches or metadata analyzers expose fabrications.

Applying Counter Deception in Digital Ecosystems

The internet’s structure amplifies deceit, yet hackers’ ingenuity can reclaim agency. Social media, often ego-centric, distorts realities through algorithmic funhouse mirrors. Curating expert networks—via follows, endorsements—filters noise, prioritizing credible voices. Protocols for machine-readable endorsements, akin to LinkedIn but open, enable querying endorsed specialists on topics, surfacing informed commentary.

Innovative protocols like backlinks—envisioned by pioneers such as Vannevar Bush, Douglas Engelbart, and Ted Nelson—remain underexplored. These allow viewing inbound references, revealing critiques or extensions. Projects like Xanadu or Hyperscope hint at potentials: annotating documents with trusted overlays, highlighting recent edits for scrutiny. Content moderation challenges stymied widespread adoption, but coupling with decentralized systems like Mastodon offers paths forward.

Large language models (LLMs) present dual edges: prone to hallucinations, yet adept at structuring unstructured data. Dispassionate analysis could unearth omitted facts from narratives, or map expertise by parsing academic sites to link profiles. Defensive tools might flag biases or inconsistencies, augmenting human judgment per Engelbart’s augmentation ethos.

Scaling countermeasures involves education: embedding media literacy in curricula, emphasizing critical inquiry. Resources like Media Literacy Now provide K-12 frameworks, while frameworks like “48 Critical Thinking Questions” prompt probing—who benefits, where’s the origin? Hackers, adept at discerning falsehoods, can prototype tools—feed analyzers, narrative detectors—leveraging open protocols for innovation.

Ultimately, countering deception demands vigilance and creativity. By inverting offensive doctrines, individuals fortify perceptions, transforming the internet from a misinformation conduit into a truth-seeking engine.


[DefCon32] AMDSinkclose – Universal Ring -2 Privilege Escalation

In the intricate landscape of hardware security, vulnerabilities often lurk within architectural designs that have persisted for years. Enrique Nissim and Krzysztof Okupski, principal security consultants at IOActive, unravel a profound flaw in AMD processors, dubbed AMDSinkclose. Their exploration reveals how this issue enables attackers to escalate privileges to System Management Mode (SMM), granting unparalleled access to system resources. By dissecting the mechanics of SMM and the processor’s memory handling, they demonstrate exploitation paths that bypass traditional safeguards, affecting a vast array of devices from laptops to servers.

SMM represents one of the most potent execution environments in x86 architectures, offering unrestricted control over I/O devices and memory. It operates stealthily, invisible to operating systems, hypervisors, and security tools like antivirus or endpoint detection systems. During boot, firmware initializes hardware and loads SMM code into a protected memory region called SMRAM. At runtime, the OS can invoke SMM services for tasks such as power management or security checks via System Management Interrupts (SMIs). When an SMI triggers, the processor saves its state in SMRAM, executes the necessary operations, and resumes normal activity. This isolation makes SMM an attractive target for persistence mechanisms, including bootkits or firmware implants.

The duo’s prior research focused on vendor misconfigurations and software flaws in SMM components, yielding tools for vulnerability detection and several CVEs in 2023. However, AMDSinkclose shifts the lens to an inherent processor defect. Unlike Intel systems, where SMM-related Model-Specific Registers (MSRs) are accessible only within SMM, AMD allows ring-0 access to these registers. While an SMM lock bit prevents runtime tampering with key configurations, a critical oversight in the documentation exposes two fields—TClose and AClose—not covered by this lock. TClose, in particular, redirects data accesses in SMM to Memory-Mapped I/O (MMIO) instead of SMRAM, creating a pathway for manipulation.

Architectural Foundations and the Core Vulnerability

At the heart of SMM security lies the memory controller’s role in protecting SMRAM. Firmware configures registers like TSEG Base, TSEG Mask, and SMM Base to overlap and shield this region. The TSEG Mask includes fields for enabling protections, but the unlocked TClose bit allows ring-0 users to set it, altering behavior without violating the lock. When activated, instruction fetches in SMM remain directed to DRAM, but data accesses divert to MMIO. This split enables attackers to control execution by mapping malicious content into the MMIO space.

The feature originated around 2006 to allow SMM code to access I/O devices using SMRAM’s physical addresses, though no vendors appear to utilize it. Documentation warns against leaving TClose set upon SMM exit, as it could misdirect state saves to MMIO. Yet, from ring-0, setting this bit and triggering an SMI causes immediate system instability—freezes or hangs—due to erroneous data handling. This echoes the 2015 Memory Sinkhole attack by Christopher Domas, which remapped the APIC to overlap TSEG, but AMDSinkclose affects the entire TSEG region, amplifying the impact.
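
For readers who want to inspect these registers themselves, here is a minimal sketch using the Linux msr driver; the register addresses come from AMD's manuals and the bit layout is as described in the talk, but treat both as assumptions to verify against the APM:

import struct

MSR_SMM_ADDR = 0xC0010112   # TSEG base (per AMD APM; verify)
MSR_SMM_MASK = 0xC0010113   # TSEG mask plus valid/close control bits

def rdmsr(cpu: int, reg: int) -> int:
    """Read an MSR via /dev/cpu/N/msr (requires root and 'modprobe msr')."""
    with open(f"/dev/cpu/{cpu}/msr", "rb", buffering=0) as f:
        f.seek(reg)
        return struct.unpack("<Q", f.read(8))[0]

mask = rdmsr(0, MSR_SMM_MASK)
# Assumed bit layout: 0=AValid, 1=TValid, 2=AClose, 3=TClose, the two
# "close" fields being the ones left outside the SMM lock.
print(f"TValid={(mask >> 1) & 1}  TClose={(mask >> 3) & 1}")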

Brainstorming exploits, Enrique and Krzysztof considered remapping PCI devices to overlay SMRAM, but initial attempts failed due to hardware restrictions. Instead, they targeted the SMM entry point, a vendor-defined layout typically following EDK2 standards. This includes a core area for support code, per-core SMM bases with entry points at offset 0x8000, and save states at 0xFE00. By setting TClose and invoking an SMI, data reads from these offsets redirect to MMIO, allowing control if an attacker maps a suitable device there.

Exploitation Techniques and Multi-Core Challenges

Exploiting AMDSinkclose requires precise manipulation of the Global Descriptor Table (GDT) and Interrupt Descriptor Table (IDT) within SMM. Upon SMI entry, the processor operates in real mode, loading a GDT from the save state to transition to protected mode. By controlling data fetches via TClose, attackers can supply a malicious GDT, enabling arbitrary code execution. The challenge lies in aligning MMIO mappings with SMM offsets, as direct PCI remapping proved ineffective.

The solution involves leveraging the processor’s address wraparound behavior. In protected mode, addresses exceeding 4GB wrap around, but SMM’s real-mode entry point operates at a lower level where this wraparound can be exploited. By setting the SMM base to a high address like 0xFFFFFFF0, data accesses wrap to low MMIO regions (0x0 to 0xFFF), where integrated devices like the Local APIC reside. This allows overwriting the GDT with controlled content from the APIC’s registers.

Multi-core systems introduce complexity, as all cores enter SMM simultaneously during a broadcast SMI. The exploit must handle concurrent execution, ensuring only one core performs the malicious action while others halt safely. Disabling Simultaneous Multithreading (SMT) simplifies this, but wraparound enables targeting specific cores. Testing on Ryzen laptops confirmed reliability, with code injection succeeding across threads.

Impact on Firmware and Mitigation Strategies

The ramifications extend to firmware persistence. Once in SMM, attackers disable SPI flash protections like ROM Armor, enabling writes to non-volatile storage. Depending on configurations—such as Platform Secure Boot (PSB)—outcomes vary. Fully enabled protections limit writes to variables, potentially breaking Secure Boot by altering keys. Absent PSB, full firmware implants become feasible, resistant to OS reinstalls or updates, as malware can intercept and falsify flash operations.

Research on vendor configurations reveals widespread vulnerabilities: many systems lack ROM Armor or PSB, exposing them to implants. Even with protections, bootkits remain possible, executing before the OS loader. A fused disable of PSB leaves a platform perpetually vulnerable.

AMD’s microcode update addresses the issue, though coverage may vary across products. OEMs can patch SMM entry points to detect and halt on TClose activation, a check that can be integrated into EDK2 or Coreboot. Users might trap MSR accesses via hypervisors. The researchers reported the flaw in October 2023; it was assigned CVE-2023-31315, and an advisory was published recently. Exploit code is forthcoming, underscoring the need for deepened architectural scrutiny.


[DevoxxBE2024] A Kafka Producer’s Request: Or, There and Back Again by Danica Fine

Danica Fine, a developer advocate at Confluent, took Devoxx Belgium 2024 attendees on a captivating journey through the lifecycle of a Kafka producer’s request. Her talk demystified the complex process of getting data into Apache Kafka, often treated as a black box by developers. Using a Hobbit-themed example, Danica traced a producer.send() call from client to broker and back, detailing configurations and metrics that impact performance and reliability. By breaking down serialization, partitioning, batching, and broker-side processing, she equipped developers with tools to debug issues and optimize workflows, making Kafka less intimidating and more approachable.

Preparing the Journey: Serialization and Partitioning

Danica began with a simple schema for tracking Hobbit whereabouts, stored in a topic with six partitions and a replication factor of three. The first step in producing data is serialization, converting objects into bytes for brokers, controlled by key and value serializers. Misconfigurations here can lead to errors, so monitoring serialization metrics is crucial. Next, partitioning determines which partition receives the data. The default partitioner uses a key’s hash or sticky partitioning for keyless records to distribute data evenly. Configurations like partitioner.class, partitioner.ignore.keys, and partitioner.adaptive.partitioning.enable allow fine-tuning, with adaptive partitioning favoring faster brokers to avoid hot partitions, especially in high-throughput scenarios like financial services.
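
A minimal producer sketch makes the first two steps tangible; it uses the confluent-kafka Python client, whose configuration keys mirror the Java client's, with a hand-rolled JSON serializer and a hypothetical topic and schema:

import json
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "partitioner": "murmur2_random",  # keyed records hash to a partition
})

record = {"hobbit": "Frodo", "location": "Rivendell"}  # assumed schema

producer.produce(
    topic="hobbit-whereabouts",                # hypothetical topic
    key="Frodo".encode("utf-8"),               # same key, same partition
    value=json.dumps(record).encode("utf-8"),  # serialization step
)
producer.flush()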

Batching for Efficiency

To optimize throughput, Kafka groups records into batches before sending them to brokers. Danica explained key configurations: batch.size (default 16KB) sets the maximum batch size, while linger.ms (default 0) controls how long to wait to fill a batch. Setting linger.ms above zero introduces latency but reduces broker load by sending fewer requests. buffer.memory (default 32MB) allocates space for batches, and misconfigurations can cause memory issues. Metrics like batch-size-avg, records-per-request-avg, and buffer-available-bytes help monitor batching efficiency, ensuring optimal throughput without overwhelming the client.

Sending the Request: Configurations and Metrics

Once batched, data is sent via a produce request over TCP, with configurations like max.request.size (default 1MB) limiting batch volume and acks determining how many replicas must acknowledge the write. Setting acks to “all” ensures high durability but increases latency, while acks=1 or 0 prioritizes speed. enable.idempotence and transactional.id prevent duplicates, with transactions ensuring consistency across sessions. Metrics like request-rate, requests-in-flight, and request-latency-avg provide visibility into request performance, helping developers identify bottlenecks or overloaded brokers.
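
Pulling the batching and request settings together, a hedged configuration sketch might look like this; the values are illustrative rather than recommendations:

from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "batch.size": 16_384,              # max bytes per batch (the default)
    "linger.ms": 20,                   # trade ~20 ms latency for fuller batches
    "acks": "all",                     # wait for all in-sync replicas
    "enable.idempotence": True,        # retries cannot create duplicates
    "statistics.interval.ms": 60_000,  # emit client metrics for monitoring
})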

Broker-Side Processing: From Socket to Disk

On the broker, requests enter the socket receive buffer, then are processed by network threads (default 3) and added to the request queue. IO threads (default 8) validate data with a cyclic redundancy check and write it to the page cache, later flushing to disk. Configurations like num.network.threads, num.io.threads, and queued.max.requests control thread and queue sizes, with metrics like network-processor-avg-idle-percent and request-handler-avg-idle-percent indicating thread utilization. Data is stored in a commit log with log, index, and snapshot files, supporting efficient retrieval and idempotency. The log.flush.rate and local-time-ms metrics ensure durable storage.

Replication and Response: Completing the Journey

Unfinished requests await replication in a “purgatory” data structure, with follower brokers fetching updates every 500ms (often faster). The remote-time-ms metric tracks replication duration, critical for acks=all. Once replicated, the broker builds a response, handled by network threads and queued in the response queue. Metrics like response-queue-time-ms and total-time-ms measure the full request lifecycle. Danica emphasized that understanding these stages empowers developers to collaborate with operators, tweaking configurations like default.replication.factor or topic-level settings to optimize performance.

Empowering Developers with Kafka Knowledge

Danica concluded by encouraging developers to move beyond treating Kafka as a black box. By mastering configurations and monitoring metrics, they can proactively address issues, from serialization errors to replication delays. Her talk highlighted resources like Confluent Developer for guides and courses on Kafka internals. This knowledge not only simplifies debugging but also fosters better collaboration with operators, ensuring robust, efficient data pipelines.


Running Docker Natively on WSL2 (Ubuntu 24.04) in Windows 11

For many developers, Docker Desktop has long been the default solution to run Docker on Windows. However, licensing changes and the desire for a leaner setup have pushed teams to look for alternatives. Fortunately, with the maturity of Windows Subsystem for Linux 2 (WSL2), it is now possible to run the full Docker Engine directly inside a Linux distribution such as Ubuntu 24.04, while still accessing containers seamlessly from both Linux and Windows.

In this guide, I’ll walk you through a clean, step-by-step setup for running Docker Engine inside WSL2 without Docker Desktop, explain how Windows and WSL2 communicate, and share best practices for maintaining a healthy development environment.


Why Run Docker Inside WSL2?

Running Docker natively inside WSL2 has several benefits:

  • No licensing issues – you avoid Docker Desktop’s commercial license requirements.
  • Lightweight – no heavy virtualization layer; containers run directly inside your WSL Linux distro.
  • Integrated networking – on Windows 11 with modern WSL versions,
    containers bound to localhost inside WSL are automatically reachable from Windows.
  • Familiar Linux workflow – you install and use Docker exactly as you would on a regular Ubuntu server.

Step 1 – Update Ubuntu

Open your Ubuntu 24.04 terminal and ensure your system is up to date:

sudo apt update && sudo apt upgrade -y

Step 2 – Install Docker Engine

Install Docker using the official Docker repository:

# Install prerequisites
sudo apt install -y ca-certificates curl gnupg lsb-release

# Add Docker’s GPG key
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
  sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

# Configure Docker repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker Engine
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
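
On current WSL2 builds, Ubuntu 24.04 enables systemd by default, so the Docker service should start automatically after installation. If docker ps later complains that the daemon is not running (typical of older WSL setups), enable and start it explicitly:

sudo systemctl enable --now docker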

Step 3 – Run Docker Without sudo

To avoid prefixing every command with sudo, add your user to the docker group:

sudo usermod -aG docker $USER

Restart your WSL terminal for the change to take effect, then verify:

docker --version
docker ps

Step 4 – Test Networking

One of the most common questions is:
“Will my containers be accessible from both Ubuntu and Windows?”
The answer is yes on modern Windows 11 with WSL2.
Let’s test it by running an Nginx container:

docker run -d -p 8080:80 --name webtest nginx
  • Inside Ubuntu (WSL): curl http://localhost:8080
  • From Windows (browser or PowerShell): http://localhost:8080

Thanks to WSL2’s localhost forwarding, Windows traffic to localhost is routed
into the WSL network, making containers instantly accessible without extra configuration.


Step 5 – Run Multi-Container Applications with Docker Compose

The Docker Compose plugin is already installed as part of the package above. Check the version:

docker compose version

Create a docker-compose.yml for a WordPress + MySQL stack:

version: "3.9"
services:
  db:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: rootpass
      MYSQL_DATABASE: wordpress
      MYSQL_USER: wpuser
      MYSQL_PASSWORD: wppass
    volumes:
      - db_data:/var/lib/mysql

  wordpress:
    image: wordpress:latest
    ports:
      - "8080:80"
    environment:
      WORDPRESS_DB_HOST: db:3306
      WORDPRESS_DB_USER: wpuser
      WORDPRESS_DB_PASSWORD: wppass
      WORDPRESS_DB_NAME: wordpress

volumes:
  db_data:

Start the services:

docker compose up -d

Once the containers are running, open http://localhost:8080 in your Windows browser
to access WordPress. The containers are managed entirely inside WSL2,
but networking feels seamless.


Maintenance: Cleaning Up Docker Data

Over time, Docker accumulates images, stopped containers, volumes, and networks.
This can take up significant disk space inside your WSL distribution.
Here are safe maintenance commands to keep your environment clean:

Remove Unused Objects

docker system prune -a --volumes
  • -a: removes all unused images, not just dangling ones
  • --volumes: also removes unused volumes

Reset Everything (Dangerous)

If you need to wipe your Docker environment completely (images, containers, volumes, networks):

docker stop $(docker ps -aq) 2>/dev/null
docker rm -f $(docker ps -aq) 2>/dev/null
docker volume rm $(docker volume ls -q) 2>/dev/null
docker network rm $(docker network ls -q) 2>/dev/null
docker image rm -f $(docker image ls -q) 2>/dev/null

⚠️ Use this only if you want to start fresh. All data will be removed.


Conclusion

By running Docker Engine directly inside WSL2, you gain a powerful, lightweight, and license-free Docker environment that integrates seamlessly with Windows 11. Your containers are accessible from both Linux and Windows, Docker Compose works out of the box, and maintenance is straightforward with prune commands.

This approach is particularly well-suited for developers who want the flexibility of Docker without the overhead of Docker Desktop. With WSL2 and Ubuntu 24.04, you get the best of both worlds: Linux-native Docker with Windows accessibility.

[DevoxxUK2024] Exploring the Power of AI-Enabled APIs by Akshata Sawant

Akshata Sawant, a Senior Developer Advocate at Salesforce, delivered an insightful presentation at DevoxxUK2024, illuminating the transformative potential of AI-enabled APIs. With a career spanning seven years in API development and a recent co-authored book on MuleSoft for Salesforce developers, Akshata expertly navigates the convergence of artificial intelligence and application programming interfaces. Her talk explores how AI-powered APIs are reshaping industries by enhancing automation, data analysis, and user experiences, while also addressing critical ethical and security considerations. Through practical examples and a clear framework, Akshata demonstrates how these technologies synergize to create smarter, more connected systems.

The Evolution of APIs and AI Integration

Akshata begins by likening APIs to a waiter, facilitating seamless communication between disparate systems, such as a customer ordering food and a kitchen preparing it. This analogy underscores the fundamental role of APIs in enabling interoperability across applications. She traces the evolution of APIs from the cumbersome Enterprise JavaBeans (EJB) and SOAP-based systems to the more streamlined REST APIs, noting their pervasive adoption across industries. The advent of AI has further accelerated this evolution, leading to what Akshata terms “API sprawling,” where APIs are integral to integration ecosystems. She introduces three key aspects of AI-enabled APIs: consuming pre-built AI APIs, using AI to streamline API development, and embedding AI models into custom APIs to enhance functionality.

Practical Applications of AI-Enabled APIs

The first aspect Akshata explores is the use of pre-built AI APIs, which are readily available from providers like Google Cloud and Microsoft Azure. These APIs, encompassing generative AI, text, language, image, and video processing, allow developers to integrate advanced capabilities without building complex models from scratch. For instance, Google Cloud’s AI APIs offer use-case-specific endpoints that can be embedded into applications, enabling rapid deployment of intelligent features. Akshata highlights the accessibility of these APIs, which come with pricing models and trial options, making them viable for businesses seeking to enhance automation or data processing. She engages the audience by inquiring about their experience with such APIs, emphasizing their growing relevance in modern development.

The second dimension involves leveraging AI to accelerate API development. Akshata describes the API management lifecycle—designing, simulating, publishing, and documenting APIs—as a complex, iterative process. AI tools can simplify these stages, particularly in generating OpenAPI specifications and documentation. She provides an example where a simple prompt to an AI model produces a comprehensive OpenAPI specification for an order management system, streamlining a traditionally time-consuming task. Additionally, AI-driven intelligent document processing can scan invoices or purchase orders, extract relevant fields, and generate REST APIs with GET and POST methods, complete with auto-generated documentation. This approach significantly reduces manual effort and enhances efficiency.

Embedding AI into Custom APIs

The third aspect focuses on embedding AI models, such as large language models (LLMs) or custom co-pilot solutions, into APIs to create sophisticated applications. Akshata showcases Salesforce’s Einstein Assistant, which integrates with OpenAI’s models to process natural language requests. For example, querying “customer details for Mark” triggers an API call that matches the request to predefined actions, retrieves relevant data, and delivers a response. This seamless integration exemplifies how AI can elevate APIs beyond mere data transfer, enabling dynamic, context-aware interactions. Akshata emphasizes that such embeddings allow developers to create tailored solutions that enhance user experiences, such as personalized customer service or automated workflows.

Ethical and Security Considerations

While celebrating the potential of AI-enabled APIs, Akshata candidly addresses their challenges. She underscores the importance of ethical considerations, such as ensuring unbiased AI outputs and protecting user privacy. Security is another critical concern, as integrating AI into APIs introduces vulnerabilities that must be mitigated through robust authentication and data encryption. Akshata’s balanced perspective highlights the need for responsible development practices to maximize benefits while minimizing risks, ensuring that AI-driven solutions remain trustworthy and secure.


[OxidizeConf2024] Continuous Compliance with Rust in Automotive Software

Introduction to Automotive Compliance

The automotive industry, with its intricate blend of mechanical and electronic systems, demands rigorous standards to ensure safety and reliability. Vignesh Radhakrishnan from Thoughtworks delivered an insightful presentation at OxidizeConf2024, exploring the concept of continuous compliance in automotive software development using Rust. He elucidated how the shift from mechanical to software-driven vehicles has amplified the need for robust compliance processes, particularly in adhering to standards like ISO 26262 and Automotive SPICE (ASPICE). These standards are pivotal in ensuring that automotive software meets stringent safety and quality requirements, safeguarding drivers and passengers alike.

Vignesh highlighted the transformation in the automotive landscape, where modern vehicles integrate complex software for features like adaptive headlights and reverse assist cameras. Unlike mechanical components with predictable failure patterns, software introduces variability that necessitates standardized compliance to maintain quality. The presentation underscored the challenges of traditional compliance methods, which are often manual, disconnected from development workflows, and conducted at the end of the development cycle, leading to inefficiencies and delayed feedback.

Continuous Compliance: A Paradigm Shift

Continuous compliance represents a transformative approach to integrating safety and quality assessments into the software development lifecycle. Vignesh emphasized that this practice involves embedding compliance checks within the development pipeline, allowing for immediate feedback on non-compliance issues. By maintaining documentation close to the code, such as requirements and test cases, developers can ensure traceability and accountability. This method not only streamlines the audit process but also reduces the mean-time-to-recovery when issues arise, enhancing overall efficiency.

The use of open-source tools like Sphinx, a Python documentation generator, was a focal point of Vignesh’s talk. Sphinx facilitates bidirectional traceability by linking requirements to code components, enabling automated generation of audit-ready documentation in HTML and PDF formats. Vignesh demonstrated a proof-of-concept telemetry project, showcasing how Rust’s cohesive toolchain, including Cargo and Clippy, integrates seamlessly with these tools to produce compliant software artifacts. This approach minimizes manual effort and ensures that compliance is maintained iteratively with every code commit.

Rust’s Role in Simplifying Compliance

Rust’s inherent features make it an ideal choice for automotive software development, particularly in achieving continuous compliance. Vignesh highlighted Rust’s robust toolchain, which includes tools like Cargo for building, testing, and formatting code. Unlike C or C++, where developers rely on disparate tools from multiple vendors, Rust offers a unified, developer-friendly environment. This cohesiveness simplifies the integration of compliance processes into continuous integration (CI) pipelines, as demonstrated in Vignesh’s example using CircleCI to automate compliance checks.

Moreover, Rust’s emphasis on safety and ownership models reduces common programming errors, aligning well with the stringent requirements of automotive standards. By leveraging Rust’s capabilities, developers can produce cleaner, more maintainable code that inherently supports compliance efforts. Vignesh’s example of generating traceability matrices and architectural diagrams using open-source tools like PlantUML further illustrated how Rust can enhance the compliance process, making it more accessible and cost-effective.

Practical Implementation and Benefits

In his demonstration, Vignesh showcased a practical implementation of continuous compliance using a telemetry project that streams data to AWS. By integrating Sphinx with Rust code, he illustrated how requirements, test cases, and architectural designs could be documented and linked automatically. This setup allows for real-time compliance assessments, ensuring that software remains audit-ready at all times. The use of open-source plugins and tools provides flexibility, enabling adaptation to various input sources like Jira, further streamlining the process.

The benefits of this approach are manifold. Continuous compliance fosters greater accountability within development teams, as non-compliance issues are identified early. It also enhances flexibility by allowing integration with existing project tools, reducing dependency on proprietary solutions. Vignesh cited the Ferrocene compiler as a real-world example, where similar open-source tools have been used to generate compliance artifacts, demonstrating the feasibility of this approach in large-scale projects.


[DevoxxUK2024] Game, Set, Match: Transforming Live Sports with AI-Driven Commentary by Mark Needham & Dunith Danushka

Mark Needham, from ClickHouse’s product team, and Dunith Danushka, a Senior Developer Advocate at Redpanda, presented an innovative experiment at DevoxxUK2024, showcasing an AI-driven co-pilot for live sports commentary. Inspired by the BBC’s live text commentary for sports like tennis and football, their solution automates repetitive summarization tasks, freeing human commentators to focus on nuanced insights. By integrating Redpanda for streaming, ClickHouse for analytics, and a large language model (LLM) for text generation, they demonstrate a scalable architecture for real-time commentary. Their talk details the technical blueprint, practical implementation, and broader applications, offering a compelling pattern for generative AI in streaming data contexts.

Real-Time Data Streaming with Redpanda

Dunith introduces Redpanda, a Kafka-compatible streaming platform written in C++ to maximize modern hardware efficiency. Unlike Kafka, Redpanda consolidates components like the broker, schema registry, and HTTP proxy into a single binary, simplifying deployment and management. Its web-based console and CLI (rpk) facilitate debugging and administration, such as creating topics and inspecting payloads. In their demo, Mark and Dunith simulate a tennis match by feeding JSON-formatted events into a Redpanda topic named “points.” These events, capturing match details like scores and players, are published at 20x speed using a Python script with the Twisted library. Redpanda’s ability to handle high-throughput streams—hundreds of thousands of messages per second—ensures robust real-time data ingestion, setting the stage for downstream processing.

Analytics with ClickHouse

Mark explains ClickHouse’s role as a column-oriented analytics database optimized for aggregation queries. Unlike row-oriented databases like PostgreSQL, ClickHouse stores columns contiguously, enabling rapid processing of operations like counts or averages. Its vectorized query execution processes column chunks in parallel, enhancing performance for analytics tasks. In the demo, events from Redpanda are ingested into ClickHouse via a Kafka engine table, which mirrors the “points” topic. A materialized view transforms incoming JSON data into a structured table, converting timestamps and storing match metadata. Mark also creates a “matches” table for historical context, demonstrating ClickHouse’s ability to ingest streaming data in real time without batch processing, a key feature for dynamic applications.
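
A hedged sketch of that wiring, using the clickhouse-connect Python client to issue the DDL; the table and column names are assumptions based on the talk:

import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")

# Kafka engine table mirroring the Redpanda "points" topic:
client.command("""
    CREATE TABLE points_queue (payload String)
    ENGINE = Kafka
    SETTINGS kafka_broker_list = 'localhost:9092',
             kafka_topic_list = 'points',
             kafka_group_name = 'clickhouse-points',
             kafka_format = 'JSONAsString'
""")

# Materialized view that parses each event into the structured table:
client.command("""
    CREATE MATERIALIZED VIEW points_mv TO points AS
    SELECT JSONExtractString(payload, 'player') AS player,
           JSONExtractUInt(payload, 'score')    AS score,
           now()                                AS received_at
    FROM points_queue
""")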

Generating Commentary with AI

The core innovation lies in generating human-like commentary using an LLM, specifically OpenAI’s model. Mark and Dunith design a Streamlit-based web application, dubbed the “Live Text Commentary Admin Center,” where commentators can manually input text or trigger AI-generated summaries. The application queries ClickHouse for recent events (e.g., the last minute or game) using SQL, converts results to JSON, and feeds them into the LLM with a prompt instructing it to write concise, present-tense summaries for tennis fans. For example, a query retrieving the last game’s events might yield, “Zverev and Alcaraz slug it out in an epic five-set showdown.” While effective with frontier models like GPT-4, smaller models like Llama 3 struggled, highlighting the need for robust LLMs. The generated text is published to a Redpanda “live_text” topic, enabling flexible consumption.
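
The query-to-prompt round trip reduces to a few lines; here is a sketch assuming the OpenAI Python SDK and the tables above:

import json
import clickhouse_connect
from openai import OpenAI

ch = clickhouse_connect.get_client(host="localhost")
llm = OpenAI()  # reads OPENAI_API_KEY from the environment

rows = list(ch.query(
    "SELECT * FROM points WHERE received_at > now() - INTERVAL 1 MINUTE"
).named_results())

reply = llm.chat.completions.create(
    model="gpt-4o",  # a frontier model; smaller models struggled in the demo
    messages=[{
        "role": "user",
        "content": "You are a live tennis commentator. Write one concise, "
                   "present-tense summary for tennis fans of these events:\n"
                   + json.dumps(rows, default=str),
    }],
)
print(reply.choices[0].message.content)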

Broadcasting and Future Applications

To deliver commentary to end users, Mark and Dunith employ Server-Sent Events (SSE) via a FastAPI server, streaming Redpanda’s “live_text” topic to a Streamlit web app. This setup mirrors real-world applications like Wikipedia’s recent changes feed, ensuring low-latency updates. The demo showcases commentary appearing in real time, with potential extensions like tweeting updates or storing them in a data warehouse. Beyond sports, Dunith highlights the architecture’s versatility for domains like live auctions, traffic updates, or food delivery tracking (e.g., Uber Eats notifications). Future enhancements include fine-tuning smaller LLMs, integrating fine-grained statistics via text-to-SQL, or summarizing multiple matches for comprehensive coverage, demonstrating the pattern’s adaptability for real-time generative applications.
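
A compact sketch of the delivery leg: a FastAPI endpoint that relays the live_text topic to browsers as Server-Sent Events (the broker address and group id are assumptions):

from confluent_kafka import Consumer
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

def commentary_stream():
    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "sse-frontend",
        "auto.offset.reset": "latest",
    })
    consumer.subscribe(["live_text"])
    try:
        while True:
            msg = consumer.poll(1.0)
            if msg is None or msg.error():
                continue
            yield f"data: {msg.value().decode('utf-8')}\n\n"  # SSE framing
    finally:
        consumer.close()

@app.get("/live")
def live():
    return StreamingResponse(commentary_stream(), media_type="text/event-stream")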
