PostHeaderIcon CPU vs GPU: Why GPUs Dominate AI Workloads: A Practical, Code-Driven Explanation for Developers

Modern artificial intelligence workloads—particularly those associated with deep learning—have reshaped the way computation is structured and executed. While CPUs remain indispensable for general-purpose tasks, GPUs have become the de facto standard for training and running machine learning models.

This shift is not incidental. It is driven by a deep alignment between the mathematical structure of AI and the architectural characteristics of GPUs. In this article, we examine this alignment and illustrate it with representative code commonly found in real-world AI systems.

The Computational Nature of AI

At its core, modern machine learning is an exercise in large-scale numerical optimization. Whether training a convolutional network or a transformer, the dominant operations are:

  • Matrix multiplications
  • Tensor contractions
  • Element-wise transformations
  • Non-linear activations

These operations are instances of linear algebra applied at scale. Crucially, they exhibit a high degree of data parallelism: the same operation is applied repeatedly across large datasets.

From Mathematical Abstraction to Code

To understand why GPUs excel, it is instructive to look at how AI code is written in practice.

Example 1: A Simple Neural Network Layer (PyTorch)

import torch
import torch.nn as nn

# Define a simple linear layer
layer = nn.Linear(in_features=1024, out_features=512)

# Simulated batch of input data
x = torch.randn(64, 1024)  # batch size = 64

# Forward pass
y = layer(x)

The operation above is fundamentally a matrix multiplication followed by a bias addition. Each output element is computed independently, making the workload inherently parallel.
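That independence is easy to see in a plain-Python sketch of the same computation (illustrative only; frameworks dispatch this to optimized BLAS routines or GPU kernels):

```python
# Naive matrix multiply: every output element y[i][j] depends only on
# row i of x and column j of w, so all elements can be computed in parallel.
def matmul(x, w):
    rows, inner, cols = len(x), len(w), len(w[0])
    return [[sum(x[i][k] * w[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]

x = [[1, 2], [3, 4]]
w = [[5, 6], [7, 8]]
print(matmul(x, w))  # [[19, 22], [43, 50]]
```

A GPU exploits exactly this structure, assigning each output element (or tile of elements) to its own thread.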

Example 2: Training Step in a Neural Network

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(
    nn.Linear(1024, 512),
    nn.ReLU(),
    nn.Linear(512, 10)
)

optimizer = optim.Adam(model.parameters())
criterion = nn.CrossEntropyLoss()

# Dummy input and labels
inputs = torch.randn(64, 1024)
targets = torch.randint(0, 10, (64,))

# Forward pass
outputs = model(inputs)

# Compute loss
loss = criterion(outputs, targets)

# Backward pass
loss.backward()

# Update weights
optimizer.step()
optimizer.zero_grad()

Both the forward and backward passes are dominated by tensor operations applied across entire batches, reinforcing the highly parallel nature of AI workloads.

Example 3: Convolutional Operation (Core of CNNs)

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)

# Batch of images: (batch_size, channels, height, width)
images = torch.randn(32, 3, 224, 224)

# Apply convolution
features = conv(images)

Convolutions apply the same kernel across spatial dimensions, resulting in a massive number of independent computations—ideal for parallel execution.
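A minimal sketch makes the independence concrete: each output position of a convolution is a local dot product that can be computed without reference to any other position (illustrative pure Python, not how frameworks implement it):

```python
# One output position of a 2D convolution: a dot product between the kernel
# and a local image patch. Every (row, col) position is independent.
def conv_at(image, kernel, row, col):
    k = len(kernel)
    return sum(image[row + i][col + j] * kernel[i][j]
               for i in range(k) for j in range(k))

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]  # 2x2 diagonal kernel

print(conv_at(image, kernel, 0, 0))  # 1 + 5 = 6
print(conv_at(image, kernel, 1, 1))  # 5 + 9 = 14
```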

Example 4: Attention Mechanism (Transformer Core)

import torch
import torch.nn.functional as F

def attention(Q, K, V):
    scores = Q @ K.transpose(-2, -1)
    scores = scores / (Q.size(-1) ** 0.5)
    weights = F.softmax(scores, dim=-1)
    return weights @ V

# Simulated query, key, value tensors
Q = torch.randn(32, 8, 128, 64)  # batch, heads, seq_len, dim
K = torch.randn(32, 8, 128, 64)
V = torch.randn(32, 8, 128, 64)

output = attention(Q, K, V)

This pattern—matrix multiplication followed by normalization and weighted aggregation—is central to modern transformer architectures and exemplifies the computational intensity of AI workloads.

Architectural Alignment

A clear pattern emerges from these examples:

  • Uniform operations applied across large tensors
  • Minimal branching or complex control flow
  • Heavy reliance on linear algebra primitives

These characteristics align closely with GPU design, which emphasizes throughput and parallel execution.

Memory Throughput and Data Movement

AI workloads are not only compute-intensive but also data-intensive. Large tensors must be moved efficiently between memory and compute units. GPUs provide significantly higher memory bandwidth than CPUs, enabling sustained performance for such operations.
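A back-of-envelope arithmetic-intensity calculation illustrates why bandwidth matters. The formula below is the standard FLOPs-per-byte estimate for a dense matrix multiply, not a measurement of any particular device:

```python
# Arithmetic intensity (FLOPs per byte) of an M x K by K x N float32 matmul.
# High intensity favors compute-rich GPUs; low intensity makes memory
# bandwidth the bottleneck.
def arithmetic_intensity(m, k, n, bytes_per_elem=4):
    flops = 2 * m * k * n                                   # multiply + add per term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)  # read A, B; write C
    return flops / bytes_moved

# A 1024x1024 matmul performs roughly 170 FLOPs for every byte moved.
print(round(arithmetic_intensity(1024, 1024, 1024), 1))
```

Large matmuls have high arithmetic intensity, but element-wise operations do not, which is why GPU memory bandwidth (several times that of a typical CPU) matters as much as raw FLOPs.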

The Role of Frameworks

Modern frameworks abstract away hardware complexity while exposing high-level primitives such as tensor operations and automatic differentiation. This allows developers to write expressive code while leveraging specialized hardware.

Conclusion

The preference for GPUs in AI is a consequence of structural compatibility between workload and architecture. AI code is inherently parallel, tensor-centric, and dominated by linear algebra operations.

GPUs are designed precisely to execute such workloads efficiently at scale. For software developers, understanding this alignment is essential to building performant and scalable machine learning systems.

Further Exploration

  • Computational graphs and automatic differentiation
  • Transformer architectures
  • Mixed-precision training
  • GPU execution models

PostHeaderIcon [AWSReInvent2025] Amazon S3 Performance: Architecture, Design, and Optimization for Data-Intensive Systems

Lecturer

Ian Heritage is a Senior Solutions Architect at Amazon Web Services, specializing in Amazon S3 and large-scale data storage architectures. With deep expertise in performance engineering and distributed systems, Ian Heritage helps organizations design and optimize their storage layers for high-throughput and low-latency applications, including machine learning training and real-time analytics. He is a prominent figure in the AWS storage community, known for his technical deep-dives into S3’s internal mechanics and best practices for performance at scale.

Abstract

This article explores the internal architecture and performance optimization strategies of Amazon S3, the industry-leading object storage service. It provides a detailed analysis of the differences between S3 General Purpose and the newly introduced S3 Express One Zone storage class, highlighting the architectural trade-offs between regional durability and sub-millisecond latency. The discussion covers advanced request management techniques, including prefix partitioning, request routing, and the role of the AWS Common Runtime (CRT) in maximizing throughput. By examining these technical foundations, the article offers practical guidance for architecting storage solutions that can handle millions of requests per second and petabytes of data for modern AI and analytics workloads.

S3 Storage Class Selection for High Performance

The performance of an S3-based application is fundamentally determined by the selection of the storage class. For over a decade, S3 General Purpose (Standard) has been the default choice, offering 99.999999999% (11 9s) of durability by replicating data across at least three Availability Zones. While this provides extreme reliability, the regional replication introduces a baseline latency that may be too high for certain “request-intensive” applications, such as machine learning model checkpoints or high-frequency trading logs.

To address these needs, AWS introduced S3 Express One Zone. This storage class is designed for workloads that require consistent, single-digit millisecond latency. By storing data within a single Availability Zone and utilizing a new, purpose-built architecture, Express One Zone can deliver up to 10x the performance of S3 Standard at a 50% lower request cost. This class is ideal for applications that perform frequent, small I/O operations where the overhead of regional replication would be the primary bottleneck. The choice between Standard and Express One Zone is thus a strategic decision between geographic durability and extreme performance.

Request Routing, Partitioning, and the Scale-Out Architecture

At its core, Amazon S3 is a massively distributed system that scales out to handle virtually unlimited throughput. The key to this scaling is “partitioning.” S3 automatically partitions buckets based on the object keys (names). Each partition can support a specific number of requests: 3,500 PUT/COPY/POST/DELETE requests and 5,500 GET/HEAD requests per second per prefix. For many years, users were advised to use randomized prefixes to ensure even distribution across partitions.

Modern S3 architecture has evolved to handle this automatically, but understanding prefix design remains crucial for performance. When an application’s request rate increases, S3 detects the hot spot and splits the partition to handle the load. However, this process takes time. For workloads that burst from zero to millions of requests instantly, pre-partitioning or using a wide range of prefixes is still a best practice. By spreading data across multiple prefixes (e.g., bucket/prefix1/, bucket/prefix2/), an application can linearly scale its throughput to accommodate massive concurrency, limited only by the client’s network bandwidth and CPU.
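A minimal sketch of prefix spreading, assuming a hash-based naming scheme (the prefix count and naming format here are illustrative, not an AWS recommendation):

```python
import hashlib

# Spread object keys across N prefixes so request load distributes evenly
# over S3 partitions; a deterministic hash keeps keys findable.
def prefixed_key(key, num_prefixes=16):
    bucket_id = int(hashlib.md5(key.encode()).hexdigest(), 16) % num_prefixes
    return f"prefix{bucket_id:02d}/{key}"

print(prefixed_key("logs/2025/app.log"))
```

Because the assignment is deterministic, readers can recompute the prefix from the key alone, while writes fan out across up to 16 partitions.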

Client-Side Optimization with AWS CRT and SDKs

While the S3 service is designed for scale, the performance experienced by the end-user is often limited by the client-side implementation. To bridge this gap, AWS developed the Common Runtime (CRT) library. The CRT is a set of open-source, C-based libraries that implement high-performance networking best practices, such as automatic request retries, congestion control, and most importantly, multipart transfers.

# Conceptual example of parallel multipart transfer with the AWS SDK for
# Python (Boto3); CRT-based clients apply the same idea automatically.
import boto3
from boto3.s3.transfer import TransferConfig

# Allow large objects to be split into parts and transferred in parallel
config = TransferConfig(use_threads=True, max_concurrency=10)
s3 = boto3.client('s3')

s3.upload_file('large_data.zip', 'my-bucket', 'data.zip', Config=config)

The CRT automatically breaks large objects into smaller parts and uploads or downloads them in parallel. This utilizes the full network capacity of the EC2 instance and mitigates the impact of single-path network congestion. For applications using the AWS CLI or SDKs for Java, Python, and C++, opting into the CRT-based clients can result in a significant throughput increase—often double or triple the speed of standard clients for large files. Additionally, the CRT handles the complexities of DNS load balancing and connection pooling, ensuring that requests are distributed efficiently across the S3 frontend fleet.
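The part-splitting idea can be sketched with a simple helper; the 8 MiB part size below is illustrative, and the CRT chooses its own sizes automatically:

```python
import math

# Split an object into fixed-size byte ranges for parallel transfer.
# The last part may be smaller than the others.
def part_ranges(object_size, part_size=8 * 1024 * 1024):
    n = math.ceil(object_size / part_size)
    return [(i * part_size, min((i + 1) * part_size, object_size) - 1)
            for i in range(n)]

ranges = part_ranges(5 * 1024**3)  # a 5 GiB object
print(len(ranges))   # 640 parts
print(ranges[0])     # (0, 8388607)
```

Each range becomes an independent request, so the transfer parallelizes across connections in the same way S3 parallelizes across partitions.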

Case Study: Optimization for Machine Learning and Analytics

Machine learning training is a premier use case for S3 performance optimization. During the training of large language models (LLMs), hundreds or thousands of GPUs must simultaneously read training data and write model “checkpoints.” These checkpoints are multi-gigabyte files that must be saved quickly to avoid idling expensive compute resources. By combining S3 Express One Zone with the CRT-based client, researchers can achieve the throughput necessary to saturate the high-speed networking of P4 and P5 instances.

In analytics, the use of “Range Gets” is a critical optimization. Instead of downloading an entire 1GB Parquet file to read a few columns, an application can request specific byte ranges. This reduces the amount of data transferred and speeds up query execution. S3 is optimized to handle these range requests efficiently, and when combined with a partitioned data layout (e.g., partitioning by date or region), it enables sub-second query responses over petabytes of data. This architectural synergy between storage class, partitioning, and client-side logic is what allows S3 to serve as the foundation for the world’s largest data lakes.
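A small helper shows the byte-range arithmetic; the header format is standard HTTP, and boto3's `get_object` accepts it via its `Range` parameter (the bucket and key below are placeholders):

```python
# Build the HTTP Range header for a byte window. Reading 64 KiB of a
# 1 GiB Parquet file transfers roughly 0.006% of the object.
def range_header(start, length):
    return f"bytes={start}-{start + length - 1}"

header = range_header(0, 64 * 1024)
print(header)  # bytes=0-65535

# Usage sketch (requires AWS credentials; names are placeholders):
# import boto3
# s3 = boto3.client("s3")
# s3.get_object(Bucket="my-data-lake", Key="part-0000.parquet", Range=header)
```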


PostHeaderIcon [KotlinConf2025] Simplifying Full-Stack Kotlin: A Fresh Take with HTMX and Ktor

Becoming a full-stack developer is a highly sought-after and valuable skill in today’s tech landscape, allowing individuals to own features from start to finish and make holistic architectural decisions. This versatility is particularly important for small teams and startups. However, the role can be intimidating due to the extensive list of technologies one is expected to master, including Kubernetes, Postgres, Kotlin, Ktor, and numerous JavaScript frameworks. Anders Sveen’s talk challenges this complexity, proposing a simpler, more streamlined approach to web development by using HTMX and Ktor with Kotlin.

The Case for Simplicity

Sveen poses a crucial question: do we truly need all this complexity when HTML and CSS remain stable, unlike the ever-changing frontend frameworks? He argues that many applications don’t require the overhead of a modern JavaScript Single Page Application (SPA), since everything ultimately renders to HTML anyway. His proposed solution uses technologies like HTMX, AlpineJS, and Unpoly, which build upon HTML and CSS rather than replacing them, allowing developers to achieve 98% of SPA functionality with significantly less frontend code and complexity.

A Synergistic Solution

The core of the presentation demonstrates how HTMX and kotlinx.html combine with Ktor to build modern, interactive web applications. The stack offers a refreshing simplicity, leveraging Ktor’s powerful backend capabilities, kotlinx.html’s type-safe HTML generation, and HTMX’s elegant method for handling frontend interactions. The talk also highlights how this simplified stack can reduce the need for microservices and complex technical setups by minimizing unnecessary coordination within development teams. Sveen, with 20 years of experience, emphasizes that this approach allows developers to be more full-stack, enabling them to quickly take an idea, deliver a solution, and learn from user feedback.


PostHeaderIcon [DevoxxBE2025] Robotics and GraalVM Native Libraries

Lecturer

Florian Enner is a co-founder and chief software engineer at HEBI Robotics, a company focused on modular robotic systems for research and industrial applications. Holding a Master’s degree in Robotics from Carnegie Mellon University, he has contributed to advancements in real-time control software and hardware integration, with publications in venues like the IEEE International Conference on Robotics and Automation.

Abstract

This article explores the application of Java in robotics development, with a particular emphasis on real-time control and the emerging role of GraalVM’s native shared libraries as a potential substitute for portions of C++ codebases. It elucidates core concepts in modular robotic hardware and software design, positioned within the framework of HEBI Robotics’ efforts to create adaptable platforms for autonomous and inspection tasks. By examining demonstrations of robotic assemblies and code compilation processes, the narrative underscores approaches to achieving platform independence, optimizing execution speed, and incorporating safety protocols. The exploration assesses environmental factors in embedded computing, ramifications for workflow efficiency and system expandability, and offers perspectives on migrating established code for greater development agility.

Innovations in Modular Robotic Components

HEBI Robotics develops interchangeable elements that serve as sophisticated foundational units for assembling tailored robotic configurations, comparable to enhanced construction sets. These encompass drive mechanisms, visual sensors, locomotion foundations, and power sources, engineered to support swift prototyping across sectors like manufacturing oversight and self-governing navigation. The breakthrough resides in the drive units’ unified architecture, merging propulsion elements, position detectors, and regulation circuits into streamlined modules that permit sequential linking, thereby minimizing cabling demands and mitigating intricacies in articulated assemblies.

Situated within the broader landscape of robotics, this methodology counters the segmentation where pre-built options frequently fall short of bespoke requirements. Through standardized yet modifiable parts, HEBI promotes innovation in academic and commercial settings, allowing practitioners to prioritize advanced algorithms over fundamental assembly. For example, drive units facilitate instantaneous regulation at 1 kHz frequencies, incorporating adjustments for voltage fluctuations in portable energy scenarios and protective expirations to avert erratic operations.

Procedurally, the computational framework spans multiple programming environments, accommodating dialects such as Java, C++, Python, and MATLAB to broaden usability. Illustrations depict mechanisms like multi-legged walkers or wheeled units directed through wired or wireless connections, underscoring the arrangement’s durability in practical deployments. Ramifications involve diminished entry thresholds for exploratory groups, fostering accelerated cycles of refinement and more secure implementations, especially for novices or instructional purposes.

Java’s Application in Real-Time Robotic Control

The deployment of Java in robotics contests traditional assumptions regarding its fitness for temporally stringent duties, which have historically been the domain of more direct languages. At HEBI, Java drives regulatory cycles on integrated Linux platforms, capitalizing on its comprehensive toolkit for output while attaining consistent timing. Central to this is the administration of memory reclamation interruptions via meticulous distribution tactics and localized variables for thread-specific information.

The API hides hardware details, enabling Java applications to dispatch commands and receive feedback promptly. A simple Java routine (class and method names paraphrased for illustration) can coordinate a robotic arm’s motions:

import com.hebi.robotics.*;

public class LimbRegulation {
    public static void main(String[] args) {
        ModuleSet components = ModuleSet.fromDetection("limb");
        Group assembly = components.formAssembly();
        Directive dir = Directive.generate();
        dir.assignPlacement(new double[]{0.0, Math.PI/2, 0.0}); // Define joint placements
        assembly.transmitDirective(dir);
    }
}

This script detects the modules, organizes them into a control group, and issues position commands. Safety is enforced through command lifetimes: if a command is not renewed within a designated interval (e.g., 100 ms), the actuators shut down, guarding against runaway behavior.

Placed in perspective, this strategy diverges from C++’s prevalence in constrained devices, providing Java’s strengths in clarity and swift iteration. Examination indicates Java equaling C++ delays in regulatory sequences, with slight burdens alleviated by enhancements like preemptive assembly. Ramifications extend to group formation: Java’s accessibility draws varied expertise, hastening initiative timelines while upholding dependability.

GraalVM Native Image for Shared Libraries

GraalVM’s Native Image compilation converts Java code into standalone executables or shared libraries, offering a pathway to replace performance-critical C++ segments. At HEBI, this is being investigated for shared libraries: Java logic is compiled into .so files invokable from C++.

The procedure entails configuring GraalVM for reflection and resources, then building:

native-image --shared -jar mymodule.jar -H:Name=mymodule

This yields a shared library plus a generated C header. A basic illustration compiles a Java class whose method is exposed to C++ callers as a plain C function:

import org.graalvm.nativeimage.IsolateThread;
import org.graalvm.nativeimage.c.function.CEntryPoint;

public class Conference {
    // Exported as the C symbol "sum" in the compiled shared library
    @CEntryPoint(name = "sum")
    public static int sum(IsolateThread thread, int first, int second) {
        return first + second;
    }
}

Compiled into libconference.so, it is callable from C++. The demonstrations confirm successful runs, with “Greetings Conference” printed from Java-derived logic.

Situated within robotics’ demand for low-latency components, this bridges languages, permitting Java for high-level logic and C++ at the hardware boundary. Performance assessments show near-native speeds, with startup-time advantages over a conventional JVM. The implications include streamlined maintenance: Java’s memory safety reduces defects in control code, while native compilation guarantees compatibility with existing C++ frameworks.

Efficiency Examination and Practical Illustrations

Efficiency benchmarks contrast GraalVM modules to C++ counterparts: in sequences, delays are equivalent, with Java’s reclamation regulated for predictability. Practical illustrations encompass serpentine overseers traversing conduits, regulated through Java for trajectory planning.

Examination discloses GraalVM’s promise in constrained contexts, where rapid assemblies (under 5 minutes for minor modules) permit swift refinements. Protective attributes, like speed restrictions, merge effortlessly.

Ramifications: blended codebases capitalize on advantages, boosting expandability for intricate mechanisms like equilibrium platforms involving users.

Prospective Paths in Robotic Computational Frameworks

GraalVM vows additional mergers, such as multilingual modules for fluid multi-dialect calls. HEBI foresees complete Java regulations, lessening C++ dependence for superior output.

Obstacles: guaranteeing temporal assurances in assembled scripts. Prospective: wider integrations in robotic structures.

In overview, GraalVM empowers Java in robotics, fusing proficiency with creator-oriented instruments for novel arrangements.

Links:

  • Lecture video: https://www.youtube.com/watch?v=md2JFgegN7U
  • Florian Enner on LinkedIn: https://www.linkedin.com/in/florian-enner-59b81466/
  • Florian Enner on GitHub: https://github.com/ennerf
  • HEBI Robotics website: https://www.hebirobotics.com/

PostHeaderIcon [MunchenJUG] Reliability in Enterprise Software: A Critical Analysis of Automated Testing in Spring Boot Ecosystems (27/Oct/2025)

Lecturer

Philip Riecks is an independent software consultant and educator specializing in Java, Spring Boot, and cloud-native architectures. With over seven years of professional experience in the software industry, Philip has established himself as a prominent voice in the Java ecosystem through his platform, Testing Java Applications Made Simple. He is a co-author of the influential technical book Stratospheric: From Zero to Production with Spring Boot and AWS, which bridges the gap between local development and production-ready cloud deployments. In addition to his consulting work, he produces extensive educational content via his blog and YouTube channel, focusing on demystifying complex testing patterns for enterprise developers.

Abstract

In the contemporary landscape of rapid software delivery, automated testing serves as the primary safeguard for application reliability and maintainability. This article explores the methodologies for demystifying testing within the Spring Boot framework, moving beyond superficial unit tests toward a comprehensive strategy that encompasses integration and slice testing. By analyzing the “Developer’s Dilemma”—the friction between speed of delivery and the confidence provided by a robust test suite—this analysis identifies key innovations such as the “Testing Pyramid” and specialized Spring Boot test slices. The discussion further examines the technical implications of external dependency management through tools like Testcontainers and WireMock, advocating for a holistic approach that treats test code with the same rigor as production logic.

The Paradigm Shift in Testing Methodology

Traditional software development often relegated testing to a secondary phase, frequently outsourced to separate quality assurance departments. However, the rise of DevOps and continuous integration has necessitated a shift toward “test-driven” or “test-enabled” development. Philip Riecks identifies that the primary challenge for developers is not the lack of tools, but the lack of a clear strategy. Testing is often perceived as a bottleneck rather than an accelerator.

The methodology proposed focuses on the Testing Pyramid, which prioritizes a high volume of fast, isolated unit tests at the base, followed by a smaller number of integration tests, and a minimal set of end-to-end (E2E) tests at the apex. The innovation in Spring Boot testing lies in its ability to provide “Slice Testing,” allowing developers to load only specific parts of the application context (e.g., the web layer or the data access layer) rather than the entire infrastructure. This approach significantly reduces test execution time while maintaining high fidelity.

Architectural Slicing and Context Management

One of the most powerful features of the Spring Boot ecosystem is its refined support for slice testing via annotations. This allows for an analytical approach to testing where the scope of the test is strictly defined by the architectural layer under scrutiny.

  1. Web Layer Testing: Using @WebMvcTest, developers can test REST controllers without launching a full HTTP server. This slice provides a mocked environment where the web infrastructure is active, but business services are replaced by mocks (e.g., using @MockBean).
  2. Data Access Testing: The @DataJpaTest annotation provides a specialized environment for testing JPA repositories. It typically uses an in-memory database by default, ensuring that database interactions are verified without the overhead of a production-grade database.
  3. JSON Serialization: @JsonTest isolates the serialization and deserialization logic, ensuring that data structures correctly map to their JSON representations.

This granular control prevents “Context Bloat,” where tests become slow and brittle due to the unnecessary loading of the entire application environment.

Code Sample: A Specialized Controller Test Slice

import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.post;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.status;

@WebMvcTest(UserRegistrationController.class)
class UserRegistrationControllerTest {

    @Autowired
    private MockMvc mockMvc;

    @MockBean
    private UserRegistrationService registrationService;

    @Test
    void shouldRegisterUserSuccessfully() throws Exception {
        mockMvc.perform(post("/api/users")
                .contentType(MediaType.APPLICATION_JSON)
                .content("{\"username\": \"priecks\", \"email\": \"philip@example.com\"}"))
                .andExpect(status().isCreated());
    }
}

Managing External Dependencies: Testcontainers and WireMock

A significant hurdle in integration testing is the reliance on external systems such as databases, message brokers, or third-party APIs. Philip emphasizes the move away from “In-Memory” databases (like H2) for testing production-grade applications, citing the risk of “Environment Parity” issues where H2 behaves differently than a production PostgreSQL instance.

The integration of Testcontainers allows developers to spin up actual Docker instances of their production infrastructure during the test lifecycle. This ensures that the code is tested against the exact same database engine used in production. Similarly, WireMock is utilized to simulate external HTTP APIs, allowing for the verification of fault-tolerance mechanisms like retries and circuit breakers without depending on the availability of the actual external service.

Consequences of Testing on Long-term Maintainability

The implications of a robust testing strategy extend far beyond immediate bug detection. A well-tested codebase enables fearless refactoring. When developers have a “safety net” of automated tests, they can update dependencies, optimize algorithms, or redesign components with the confidence that existing functionality remains intact.

Furthermore, Philip argues that the responsibility for quality must lie with the engineer who writes the code. In an “On-Call” culture, the developer who builds the system also runs it. This ownership model, supported by automated testing, transforms software engineering from a process of “handing over” code to one of “carefully crafting” resilient systems.

Conclusion

Demystifying Spring Boot testing requires a transition from viewing tests as a chore to seeing them as a fundamental engineering discipline. By leveraging architectural slices, managing dependencies with Testcontainers, and adhering to the Testing Pyramid, developers can build applications that are not only functional but also sustainable. The ultimate goal is to reach a state where testing provides joy through the confidence it instills, ensuring that the software remains a robust asset for the enterprise rather than a source of technical debt.


PostHeaderIcon [DotJs2025] Recreating Windows Media Player Art with Web MIDI API

Windows Media Player’s pulsating visualizations are a piece of 2000s nostalgia, and the Web MIDI API makes it possible to recreate that synesthetic aesthetic in the browser. Vadim Smirnov, developer advocate at CKEditor, revived it at dotJS 2025, scripting synthesizers to spawn color spectra from the scales they play. A self-described crafter of quirky code, Vadim used the talk to showcase the breadth of browser APIs, from the Gamepad API to Web MIDI.

His starting point was MDN’s catalogue of Web APIs, a cornucopia running from the Ambient Light Sensor to the Credential Management API. Web MIDI stands out because it handshakes with real hardware: calling navigator.requestMIDIAccess() exposes connected inputs, and each incoming note drives the visuals, with velocity mapped to color intensity and pitch to the particle palette.

Canvas rendering ties it together: particles proliferate and shift hue as Web Audio waveforms and MIDI messages modulate the animation. For viewers without a synthesizer, Vadim demonstrated keyboard fallbacks, and he published his repositories for anyone to replicate, an open invitation to explore the browser’s weirder corners.

The takeaway: browser APIs open a portal to physical hardware, and a little code can turn keys into color.

APIs’ Abundant Arsenal

Vadim walked through MDN’s API listing, from the Gamepad API to Web MIDI. The Web MIDI flow is simple: request access, listen for note messages, and map velocity and pitch to visual parameters.

Synths’ Spectra and Sketches

On the canvas, particles pulse in time with the music as Web Audio analysis amplifies the MIDI motifs. Vadim’s parting advice: a plain computer keyboard works as input for those without a synth, his repositories are public, and embracing the weird is the point.


PostHeaderIcon [RivieraDev2025] Moustapha Agack – One Pixel at a Time: Running DOOM on an E-Reader

Moustapha Agack regaled the Riviera DEV 2025 crowd with a tale of audacious tinkering in his session, chronicling his quest to resurrect the iconic DOOM on a humble Kindle e-reader. Lacking embedded systems expertise, Moustapha embarked on this odyssey driven by whimsy and challenge, transforming a 25-euro thrift find into a retro gaming relic. His narrative wove through hardware idiosyncrasies, software sorcery, and triumphant playback, celebrating the open-source ethos that fuels such feats.

The Allure of DOOM: A Porting Phenomenon

Moustapha kicked off by immersing attendees in DOOM’s lore, the 1993 id Software opus that pioneered first-person shooters with its labyrinthine levels and demonic foes. Its source code, liberated in 1997 under GPL, has spawned thousands of ports—from pregnancy tests to analytics dashboards—cementing its status as internet folklore. Moustapha quipped about the “Run DOOM on Reddit” subreddit, where biweekly posts chronicle absurd adaptations, like voice-powered variants or alien hardware hypotheticals.

The game’s appeal lies in its modular C codebase: clean patterns, hardware abstraction layers, and raycasting renderer make it portable gold. Moustapha praised its elegance—efficient collision detection, binary space partitioning—contrasting his novice TypeScript background with the raw C grit. This disparity fueled his motivation: prove that curiosity trumps credentials in maker pursuits.

Decoding the Kindle: E-Ink Enigmas

Shifting to hardware, Moustapha dissected the Kindle 4 (2010 model), a $25 Boncoin bargain boasting 500,000 pixels of e-ink wizardry. Unlike LCDs, e-ink mimics paper via electrophoretic microspheres—black-and-white beads in oil, manipulated by electric fields for grayscale shades. He likened pixels to microscopic disco balls: charged fields flip beads, yielding 16-level grays but demanding full refreshes to banish “ghosting” artifacts.

The ARM9 processor (532 MHz), 256MB RAM, and Linux kernel (2.6.31) promise viability, yet jailbreaking—via USB exploits—unlocks framebuffer access for custom rendering. Moustapha detailed framebuffer mechanics: direct memory writes trigger screen updates, but e-ink’s sluggish 500ms latency and power draw necessitate optimizations like partial refreshes. His setup bypassed Amazon’s sandbox, installing a minimal environment sans GUI, priming the device for DOOM’s pixel-pushing demands.
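The framebuffer idea can be sketched in a few lines. The snippet below is an illustration in Python rather than the port's actual C code: a plain `bytearray` stands in for an `mmap` of `/dev/fb0`, and the one-byte-per-pixel layout with 16-level quantization is an assumption, not the Kindle's documented format.

```python
def write_pixel(fb: bytearray, x: int, y: int, gray: int, stride: int) -> None:
    """Write one 8-bit grayscale pixel into a framebuffer-like buffer.

    On a jailbroken Kindle the buffer would typically be an mmap of
    /dev/fb0; here a bytearray stands in. Keeping only the top four
    bits quantizes 256 grays down to the panel's 16 levels
    (hypothetical pixel layout).
    """
    fb[y * stride + x] = gray & 0xF0

# Simulated 8x4 framebuffer, one byte per pixel
stride, height = 8, 4
fb = bytearray(stride * height)
write_pixel(fb, 2, 1, 0x9C, stride)
```

On real hardware, writing memory alone is not enough: a device-specific driver call is typically needed to ask the e-ink controller to refresh the changed region, which is where the latency Moustapha describes comes in.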

Cross-Compilation Conundrums and Code Conjuring

The crux lay in bridging architectures: compiling DOOM’s x86-centric code for ARM. Moustapha chronicled toolchain tribulations—Dockerized GCC cross-compilers, dependency hunts yielding bloated binaries. He opted for Chocolate Doom, a faithful source port, stripping extraneous features for e-ink austerity: monochrome palettes, scaled resolutions (400×600 to 320×240), and throttled framerates (1-2 FPS) to sync with refresh cycles.
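The resolution-scaling step amounts to a nearest-neighbour resample. The Python below is illustrative only; the real port does this in C inside its video layer, and the 160×100 target here is an arbitrary example, not the talk's exact figures.

```python
def downscale(frame: bytes, src_w: int, src_h: int, dst_w: int, dst_h: int) -> bytearray:
    """Nearest-neighbour downscale of a flat grayscale frame buffer."""
    out = bytearray(dst_w * dst_h)
    for y in range(dst_h):
        sy = y * src_h // dst_h  # nearest source row
        for x in range(dst_w):
            sx = x * src_w // dst_w  # nearest source column
            out[y * dst_w + x] = frame[sy * src_w + sx]
    return out

# DOOM renders 320x200 internally; shrink a dummy frame to 160x100
src = bytearray(range(256)) * (320 * 200 // 256)
small = downscale(src, 320, 200, 160, 100)
```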

Input mapping proved fiendish: no joystick meant keyboard emulation via five tactile buttons, scripted in Lua for directional strafing. Rendering tweaks—dithered grayscale conversion, waveform controls for ghost mitigation—ensured legibility. Moustapha shared war stories: endless iterations debugging endianness mismatches, memory overflows, and linker woes, underscoring embedded development’s unforgiving precision.
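The dithered grayscale conversion mentioned above can be pictured with an ordered (Bayer) dither, one common way to map 8-bit gray onto 16 e-ink levels. This is a sketch of the general technique, not the port's actual conversion or waveform-handling code.

```python
# Classic 4x4 Bayer threshold matrix
BAYER4 = [
    [ 0,  8,  2, 10],
    [12,  4, 14,  6],
    [ 3, 11,  1,  9],
    [15,  7, 13,  5],
]

def dither_to_16(frame: bytes, w: int, h: int) -> bytearray:
    """Ordered dither from 8-bit gray down to 16 e-ink gray levels.

    Adding the per-pixel Bayer offset before the integer divide makes
    flat areas break into a fine checker of adjacent levels instead of
    hard banding.
    """
    out = bytearray(w * h)
    for y in range(h):
        for x in range(w):
            v = frame[y * w + x] + BAYER4[y % 4][x % 4]
            out[y * w + x] = min(15, v // 16)
    return out

# A flat mid-gray frame dithers into a mix of two adjacent levels
flat = bytearray([120] * 16)
levels = dither_to_16(flat, 4, 4)
```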

Triumph and Takeaways: Pixels in Motion

Victory arrived with a live demo: DOOM’s corridors flickering on e-ink, demons dispatched amid deliberate blips. Moustapha beamed at this personal milestone—a 2000s internet kid etching his port into legend. He open-sourced everything: binaries, scripts, slides via Slidev (Markdown-JS hybrid for interactive decks), inviting Kindle owners to replicate.

Reflections abounded: e-ink’s constraints honed creativity, cross-compilation demystified low-level ops, and DOOM’s legacy affirmed open-source’s democratizing force. Moustapha urged aspiring hackers: embrace imperfection, iterate relentlessly, and revel in absurdity. His odyssey reminds that innovation blooms in unlikely crucibles—one pixel, one port at a time.

Links:

PostHeaderIcon [GoogleIO2024] Tune and Deploy Gemini with Vertex AI and Ground with Cloud Databases: Building AI Applications

Vertex AI offers a comprehensive lifecycle for Gemini models, enabling customization and deployment. Ivan Nardini and Bala Narasimhan demonstrated fine-tuning, evaluation, and grounding techniques, using a media company scenario to illustrate practical applications.

Addressing Business Challenges with AI Solutions

Ivan framed the discussion around Symol Media’s issues: rising churn rates, declining engagement, and dropping satisfaction scores. Analysis revealed users spending under a minute on articles, signaling navigation and content quality problems.

The proposed AI-driven revamp personalizes the website, recommending articles based on preferences. This leverages Gemini Pro on Vertex AI, fine-tuned with company data for tailored summaries and suggestions.

Bala explained the architecture, integrating Cloud SQL for PostgreSQL with vector embeddings for semantic search, ensuring relevant content delivery.

Fine-Tuning and Deployment on Vertex AI

Ivan detailed supervised fine-tuning (SFT) on Vertex AI, using datasets of article summaries to adapt Gemini. This process, accessible via console or APIs, involves parameter-efficient tuning for cost-effectiveness.

Deployment creates scalable endpoints, with monitoring ensuring performance. Evaluation compares models using metrics like ROUGE, validating improvements.
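For intuition about what a ROUGE score measures, here is a minimal ROUGE-1 F1 computation: unigram overlap between a candidate summary and a reference. Vertex AI's evaluation tooling computes such metrics for you; this sketch is illustrative only.

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Minimal ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

A tuned model whose summaries share more unigrams with reference summaries scores higher, which is how before/after comparisons are made quantitative.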

These steps, available since 2024, enable production-ready AI with minimal infrastructure management.

Grounding with Cloud Databases for Accuracy

Bala focused on retrieval-augmented generation (RAG) using Cloud SQL’s vector capabilities. Embeddings from articles are stored and queried semantically, grounding responses in factual data to reduce hallucinations.
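The retrieval step can be pictured with a toy in-memory version: rank stored article embeddings by cosine similarity to the query embedding, then feed the top hits to the model as grounding context. In the session's architecture this ranking runs inside Cloud SQL for PostgreSQL via its vector support; the vectors and titles below are made up.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec: list[float], articles: list[dict], k: int = 2) -> list[dict]:
    """Return the k articles whose embeddings best match the query."""
    ranked = sorted(articles, key=lambda a: cosine(query_vec, a["embedding"]), reverse=True)
    return ranked[:k]

# Toy 3-dimensional embeddings; real ones have hundreds of dimensions
articles = [
    {"title": "Streaming tips", "embedding": [0.9, 0.1, 0.0]},
    {"title": "Churn analysis", "embedding": [0.1, 0.9, 0.1]},
    {"title": "Cooking shows",  "embedding": [0.2, 0.2, 0.9]},
]
hits = top_k([0.2, 0.95, 0.05], articles, k=1)
```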

The jumpstart solution deploys this stack easily, with observability tools monitoring query performance and cache usage.

Launched in 2024, this integration supports production gen AI apps with robust data handling.

Observability and Future Enhancements

The demo showcased insights for query optimization, including execution plans and user metrics. Future plans include expanded vector support across Google Cloud databases.

This holistic approach empowers developers to build trustworthy AI solutions.

Links:

PostHeaderIcon [NDCMelbourne2025] TDD & DDD from the Ground Up – Chris Simon

Chris Simon, a seasoned developer and co-organizer of Domain-Driven Design Australia, presents a compelling live-coding session at NDC Melbourne 2025, demonstrating how Test-Driven Development (TDD) and Domain-Driven Design (DDD) can create maintainable, scalable software. Through a university enrollment system example, Chris illustrates how TDD’s iterative red-green-refactor cycle and DDD’s focus on ubiquitous language and domain modeling can evolve a simple CRUD application into a robust solution. His approach highlights the power of combining these methodologies to adapt to changing requirements without compromising code quality.

Starting with TDD: The Red-Green-Refactor Cycle

Chris kicks off by introducing TDD’s core phases: writing a failing test (red), making it pass with minimal code (green), and refactoring to improve structure. Using a .NET-based university enrollment system, he begins with a basic test to register a student, ensuring a created status response. Each step is deliberately small, balancing test and implementation to minimize risk. This disciplined approach, Chris explains, builds a safety net of tests, allowing confident code evolution as complexity increases.
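The session live-codes in .NET; the Python sketch below (names like `StudentRegistry` and `register_student` are illustrative, not the talk's exact API) shows the shape of that first red-green step.

```python
class StudentRegistry:
    def __init__(self):
        self._students = {}

    def register_student(self, student_id: str, name: str) -> int:
        # "Green": the minimal implementation that satisfies the test.
        self._students[student_id] = name
        return 201  # HTTP-style "created" status

def test_registering_a_student_returns_created():
    # "Red" first: this assertion fails until register_student exists
    # and returns the created status.
    registry = StudentRegistry()
    assert registry.register_student("s-1", "Ada") == 201

test_registering_a_student_returns_created()
```

Each cycle adds one such failing test, the least code to pass it, then a refactor under the safety net the earlier tests provide.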

Incorporating DDD: Ubiquitous Language and Domain Logic

As the system grows, Chris introduces DDD principles, particularly the concept of ubiquitous language. He renames methods to reflect business intent, such as “register” instead of “create” for students, and uses a static factory method to encapsulate logic. His IDE extension, Contextive, further supports this by providing domain term definitions across languages, ensuring consistency. By moving validation logic, like checking room availability, into domain models, Chris ensures business rules are encapsulated, reducing controller complexity and enhancing maintainability.
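The pattern can be sketched as follows, in a hypothetical Python analogue of the talk's .NET model: the factory method name mirrors the business vocabulary, and the enrollment rule lives inside the model rather than the controller.

```python
class Course:
    def __init__(self, name: str, capacity: int):
        self._name = name
        self._capacity = capacity
        self._enrolled = 0

    @classmethod
    def schedule(cls, name: str, capacity: int) -> "Course":
        # Static factory encapsulating the creation rules; "schedule"
        # comes from the (hypothetical) ubiquitous language, not CRUD.
        if capacity <= 0:
            raise ValueError("a course needs at least one seat")
        return cls(name, capacity)

    def enrol(self) -> None:
        # Business rule enforced by the domain model itself.
        if self._enrolled >= self._capacity:
            raise ValueError("course is full")
        self._enrolled += 1
```

Because callers can only go through `schedule` and `enrol`, invariants cannot be bypassed by a controller that forgets a validation check.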

Handling Complexity: Refactoring for Scalability

As requirements evolve, such as preventing course over-enrollment, Chris encounters a race condition in the initial implementation. He demonstrates how TDD’s tests catch this issue, allowing safe refactoring. Through event storming, he rethinks the domain model, delaying room allocation until course popularity is known. This shift, informed by domain expert collaboration, optimizes resource utilization and eliminates unnecessary constraints, showcasing DDD’s ability to align code with business needs.

Balancing Testing Strategies

Chris explores the trade-offs between API-level and unit-level testing. While API tests protect the public contract, unit tests for complex scheduling algorithms allow faster, more efficient test setup. By testing a scheduler that matches courses to rooms based on enrollment counts, he ensures robust logic without overcomplicating API tests. This strategic balance, he argues, maintains refactorability while addressing intricate business rules, a key takeaway for developers navigating complex domains.
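The kind of scheduling logic that benefits from unit-level tests can be sketched as a greedy allocation: place the most popular courses in the largest rooms. The names and the rule itself are illustrative, not the session's exact algorithm.

```python
def allocate_rooms(courses: list[tuple[str, int]],
                   rooms: list[tuple[str, int]]) -> dict[str, str]:
    """Greedy room allocation sketch.

    courses: (name, enrolled) pairs; rooms: (room, capacity) pairs.
    Returns a {course: room} mapping for courses that fit somewhere.
    """
    assignment = {}
    free_rooms = sorted(rooms, key=lambda r: r[1], reverse=True)
    for name, enrolled in sorted(courses, key=lambda c: c[1], reverse=True):
        for i, (room, capacity) in enumerate(free_rooms):
            if capacity >= enrolled:
                assignment[name] = room
                free_rooms.pop(i)  # each room is used at most once
                break
    return assignment
```

Logic like this is cheap to exercise exhaustively in unit tests, while the API tests stay focused on the public contract.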

Adapting to Change with Confidence

The session culminates in a significant refactor, removing the over-enrollment check after realizing it’s applied at the wrong stage. Chris’s tests provide the confidence to make this change, ensuring no unintended regressions. By making domain model setters private, he confirms the system adheres to DDD principles, encapsulating business logic effectively. This adaptability, driven by TDD and DDD, underscores the value of iterative development and domain collaboration in building resilient software.

Links:

PostHeaderIcon [AWSReInforce2025] AWS Network Firewall: Latest features and deployment options (NIS201-NEW)

Lecturer

Amish Shah serves as Product Manager for AWS Network Firewall, driving capabilities that simplify stateful inspection at scale. His team focuses on reducing operational complexity while maintaining granular control across VPC and Transit Gateway environments.

Abstract

The technical session introduces enhancements to AWS Network Firewall that address deployment complexity, visibility gaps, and threat defense sophistication. Through Transit Gateway integration, automated domain management, and active threat defense, it establishes patterns for consistent security policy enforcement across hybrid architectures.

Transit Gateway Integration Architecture

Native Transit Gateway attachment eliminates appliance sprawl:

VPC A → TGW → Network Firewall Endpoint → VPC B

Traffic flows symmetrically through firewall endpoints in each Availability Zone. Centralized route table management propagates 10.0.0.0/8 via firewall inspection while maintaining 172.16.0.0/12 for direct connectivity. This pattern supports:

  • 100 Gbps aggregate throughput
  • Automatic failover across AZs
  • Consistent policy application across spokes

Multiple VPC Endpoint Support

The new capability permits multiple firewall endpoints per VPC (illustrative configuration; the subnet IDs are placeholders):

endpoints:
  - subnet_id: subnet-0aaa   # placeholder, in us-east-1a
    az: us-east-1a
  - subnet_id: subnet-0bbb   # placeholder, in us-east-1b
    az: us-east-1b
  - subnet_id: subnet-0ccc   # placeholder, in us-east-1c
    az: us-east-1c

Each endpoint maintains independent health status. Route tables direct traffic to healthy endpoints, achieving 99.999% availability. This eliminates single points of failure in multi-AZ architectures.

Automated Domain List Management

Dynamic domain lists update hourly from AWS threat intelligence:

{
  "source": "AWSManaged",
  "name": "PhishingDomains",
  "update_frequency": "3600",
  "action": "DROP"
}

Integration with Route 53 Resolver DNS Firewall enables layer 7 blocking before connection establishment. The console provides visibility into list versions, rule hits, and update timestamps.

Active Threat Defense with Managed Rules

The new managed rule group consumes real-time threat intelligence:

{
  "rule_group": "AttackInfrastructure",
  "action": "DROP",
  "threat_signatures": 1500000,
  "update_source": "AWS Threat Intel"
}

Rules target C2 infrastructure, exploit kits, and phishing domains. Capacity consumption appears in console metrics, enabling budget planning. Organizations can toggle to ALERT mode for forensic analysis before enforcement.

Operational Dashboard and Metrics

The enhanced dashboard displays:

  • Top talkers by bytes/packets
  • Rule group utilization
  • Threat signature matches
  • Endpoint health status

A log query along these lines backs the top-talkers view:

SELECT source_ip, SUM(bytes)
FROM firewall_logs
WHERE action = 'DROP'
GROUP BY source_ip
ORDER BY 2 DESC
LIMIT 10

CloudWatch integration enables alerting on anomalous patterns.

Deployment Best Practices

Reference architectures include:

  1. Centralized Egress: Internet-bound traffic via TGW to shared firewall
  2. Distributed Ingress: Public ALB → firewall endpoint → application VPC
  3. Hybrid Connectivity: Site-to-Site VPN through firewall inspection

Terraform modules automate endpoint creation, policy attachment, and logging configuration.

Conclusion: Simplified Security at Scale

The enhancements transform Network Firewall from complex appliance management into a cloud-native security fabric. Transit Gateway integration eliminates topology constraints, automated domain lists reduce rule maintenance, and active threat defense blocks known bad actors at line rate. Organizations achieve consistent, scalable protection without sacrificing operational agility.

Links: