
[OxidizeConf2024] Continuous Compliance with Rust in Automotive Software

Introduction to Automotive Compliance

The automotive industry, with its intricate blend of mechanical and electronic systems, demands rigorous standards to ensure safety and reliability. Vignesh Radhakrishnan from Thoughtworks delivered an insightful presentation at OxidizeConf2024, exploring the concept of continuous compliance in automotive software development using Rust. He elucidated how the shift from mechanical to software-driven vehicles has amplified the need for robust compliance processes, particularly in adhering to standards like ISO 26262 and Automotive SPICE (ASPICE). These standards are pivotal in ensuring that automotive software meets stringent safety and quality requirements, safeguarding drivers and passengers alike.

Vignesh highlighted the transformation in the automotive landscape, where modern vehicles integrate complex software for features like adaptive headlights and reverse assist cameras. Unlike mechanical components with predictable failure patterns, software introduces variability that necessitates standardized compliance to maintain quality. The presentation underscored the challenges of traditional compliance methods, which are often manual, disconnected from development workflows, and conducted at the end of the development cycle, leading to inefficiencies and delayed feedback.

Continuous Compliance: A Paradigm Shift

Continuous compliance represents a transformative approach to integrating safety and quality assessments into the software development lifecycle. Vignesh emphasized that this practice involves embedding compliance checks within the development pipeline, allowing for immediate feedback on non-compliance issues. By maintaining documentation close to the code, such as requirements and test cases, developers can ensure traceability and accountability. This method not only streamlines the audit process but also reduces the mean-time-to-recovery when issues arise, enhancing overall efficiency.

The use of open-source tools like Sphinx, a Python documentation generator, was a focal point of Vignesh’s talk. Sphinx facilitates bidirectional traceability by linking requirements to code components, enabling automated generation of audit-ready documentation in HTML and PDF formats. Vignesh demonstrated a proof-of-concept telemetry project, showcasing how Rust’s cohesive toolchain, including Cargo and Clippy, integrates seamlessly with these tools to produce compliant software artifacts. This approach minimizes manual effort and ensures that compliance is maintained iteratively with every code commit.
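No configuration was shown on stage, but a minimal Sphinx setup for this kind of traceability might resemble the following conf.py fragment. It assumes the sphinx-needs extension (a common choice for requirement/test-case objects in Sphinx) together with PlantUML, which Vignesh mentioned for diagrams; the project name, directive names, and prefixes are illustrative.

```python
# conf.py -- hypothetical Sphinx configuration for requirement traceability.
# All names below are illustrative; the talk only names Sphinx and PlantUML.
project = "telemetry-poc"

extensions = [
    "sphinx_needs",            # requirement/test-case objects with links
    "sphinxcontrib.plantuml",  # architecture diagrams, as mentioned in the talk
]

# Define the artifact types an auditor would expect to see,
# e.g. ".. req::" and ".. test::" directives linked from the docs.
needs_types = [
    dict(directive="req", title="Requirement", prefix="R_", color="#BFD8D2"),
    dict(directive="test", title="Test Case", prefix="T_", color="#DCB239"),
]
```

With such a setup, `sphinx-build` can emit the HTML/PDF audit artifacts on every commit, keeping the documentation next to the code as the talk advocates.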

Rust’s Role in Simplifying Compliance

Rust’s inherent features make it an ideal choice for automotive software development, particularly in achieving continuous compliance. Vignesh highlighted Rust’s robust toolchain, which includes tools like Cargo for building, testing, and formatting code. Unlike C or C++, where developers rely on disparate tools from multiple vendors, Rust offers a unified, developer-friendly environment. This cohesiveness simplifies the integration of compliance processes into continuous integration (CI) pipelines, as demonstrated in Vignesh’s example using CircleCI to automate compliance checks.
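No pipeline definition was shown in the talk; a hypothetical CircleCI configuration along these lines (the job name, Docker image, and exact steps are all assumptions) illustrates how the compliance checks could slot into CI:

```yaml
version: 2.1
jobs:
  compliance:
    docker:
      - image: cimg/rust:1.75       # illustrative toolchain image
    steps:
      - checkout
      - run: cargo fmt --all -- --check   # formatting gate
      - run: cargo clippy -- -D warnings  # lint findings fail the build
      - run: cargo test                   # requirement-linked test cases
      # Traceability artifacts (e.g. Sphinx-generated HTML/PDF) would
      # be built and stored here as a further step.
workflows:
  on-commit:
    jobs:
      - compliance
```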

Moreover, Rust’s emphasis on safety and ownership models reduces common programming errors, aligning well with the stringent requirements of automotive standards. By leveraging Rust’s capabilities, developers can produce cleaner, more maintainable code that inherently supports compliance efforts. Vignesh’s example of generating traceability matrices and architectural diagrams using open-source tools like PlantUML further illustrated how Rust can enhance the compliance process, making it more accessible and cost-effective.

Practical Implementation and Benefits

In his demonstration, Vignesh showcased a practical implementation of continuous compliance using a telemetry project that streams data to AWS. By integrating Sphinx with Rust code, he illustrated how requirements, test cases, and architectural designs could be documented and linked automatically. This setup allows for real-time compliance assessments, ensuring that software remains audit-ready at all times. The use of open-source plugins and tools provides flexibility, enabling adaptation to various input sources like Jira, further streamlining the process.

The benefits of this approach are manifold. Continuous compliance fosters greater accountability within development teams, as non-compliance issues are identified early. It also enhances flexibility by allowing integration with existing project tools, reducing dependency on proprietary solutions. Vignesh cited the Ferrocene compiler as a real-world example, where similar open-source tools have been used to generate compliance artifacts, demonstrating the feasibility of this approach in large-scale projects.


[DevoxxUK2024] Game, Set, Match: Transforming Live Sports with AI-Driven Commentary by Mark Needham & Dunith Danushka

Mark Needham, from ClickHouse’s product team, and Dunith Danushka, a Senior Developer Advocate at Redpanda, presented an innovative experiment at DevoxxUK2024, showcasing an AI-driven co-pilot for live sports commentary. Inspired by the BBC’s live text commentary for sports like tennis and football, their solution automates repetitive summarization tasks, freeing human commentators to focus on nuanced insights. By integrating Redpanda for streaming, ClickHouse for analytics, and a large language model (LLM) for text generation, they demonstrate a scalable architecture for real-time commentary. Their talk details the technical blueprint, practical implementation, and broader applications, offering a compelling pattern for generative AI in streaming data contexts.

Real-Time Data Streaming with Redpanda

Dunith introduces Redpanda, a Kafka-compatible streaming platform written in C++ to maximize modern hardware efficiency. Unlike Kafka, Redpanda consolidates components like the broker, schema registry, and HTTP proxy into a single binary, simplifying deployment and management. Its web-based console and CLI (rpk) facilitate debugging and administration, such as creating topics and inspecting payloads. In their demo, Mark and Dunith simulate a tennis match by feeding JSON-formatted events into a Redpanda topic named “points.” These events, capturing match details like scores and players, are published at 20x speed using a Python script with the Twisted library. Redpanda’s ability to handle high-throughput streams—hundreds of thousands of messages per second—ensures robust real-time data ingestion, setting the stage for downstream processing.
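The replay logic can be sketched in a few lines of Python. The topic name ("points") and the 20x speed-up come from the talk; the event field names are assumptions, since only "match details like scores and players" were described. The network call itself needs nothing special — Redpanda speaks the Kafka protocol, so any standard Kafka client can publish the payload.

```python
import json

SPEEDUP = 20  # replay the historical match 20x faster, as in the demo

def build_event(match_id, player1, player2, score, ts):
    """Serialize one match event as a producer would publish it.
    Field names are illustrative, not the demo's actual schema."""
    return json.dumps({
        "match_id": match_id,
        "player1": player1,
        "player2": player2,
        "score": score,
        "timestamp": ts,
    }).encode("utf-8")

def replay_delay(prev_ts, next_ts, speedup=SPEEDUP):
    """Seconds to sleep between two historical events at the given speed-up."""
    return max(0.0, (next_ts - prev_ts) / speedup)

# With any Kafka-protocol client, publishing is then a one-liner, e.g.:
#   producer.send("points", build_event(...))
```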

Analytics with ClickHouse

Mark explains ClickHouse’s role as a column-oriented analytics database optimized for aggregation queries. Unlike row-oriented databases like PostgreSQL, ClickHouse stores columns contiguously, enabling rapid processing of operations like counts or averages. Its vectorized query execution processes column chunks in parallel, enhancing performance for analytics tasks. In the demo, events from Redpanda are ingested into ClickHouse via a Kafka engine table, which mirrors the “points” topic. A materialized view transforms incoming JSON data into a structured table, converting timestamps and storing match metadata. Mark also creates a “matches” table for historical context, demonstrating ClickHouse’s ability to ingest streaming data in real time without batch processing, a key feature for dynamic applications.
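The DDL itself was demonstrated live rather than shared, but the ingestion path described — a Kafka engine table mirroring the "points" topic feeding a MergeTree table through a materialized view — might be sketched as follows. Column names and settings are assumptions, using ClickHouse's documented Kafka engine syntax:

```sql
-- Hypothetical sketch of the ingestion path described in the talk.
CREATE TABLE points_queue (raw String)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'localhost:9092',
         kafka_topic_list  = 'points',
         kafka_group_name  = 'clickhouse-points',
         kafka_format      = 'JSONAsString';

CREATE TABLE points (
    match_id   String,
    event_time DateTime64(3),
    payload    String
) ENGINE = MergeTree ORDER BY (match_id, event_time);

-- The materialized view moves each message from the queue table into
-- the structured table as it arrives -- no batch step involved.
CREATE MATERIALIZED VIEW points_mv TO points AS
SELECT JSONExtractString(raw, 'match_id') AS match_id,
       parseDateTime64BestEffort(JSONExtractString(raw, 'timestamp')) AS event_time,
       raw AS payload
FROM points_queue;
```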

Generating Commentary with AI

The core innovation lies in generating human-like commentary using an LLM, specifically OpenAI’s model. Mark and Dunith design a Streamlit-based web application, dubbed the “Live Text Commentary Admin Center,” where commentators can manually input text or trigger AI-generated summaries. The application queries ClickHouse for recent events (e.g., the last minute or game) using SQL, converts results to JSON, and feeds them into the LLM with a prompt instructing it to write concise, present-tense summaries for tennis fans. For example, a query retrieving the last game’s events might yield, “Zverev and Alcaraz slug it out in an epic five-set showdown.” While effective with frontier models like GPT-4, smaller models like Llama 3 struggled, highlighting the need for robust LLMs. The generated text is published to a Redpanda “live_text” topic, enabling flexible consumption.
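The query-to-prompt step can be sketched as below. The prompt wording and event field names are assumptions; the talk only describes querying the last minute or game of events, converting the rows to JSON, and instructing the model to write concise present-tense summaries for tennis fans.

```python
import json

PROMPT_TEMPLATE = (
    "You are a live tennis commentator. Write one concise, present-tense "
    "sentence summarising these events for tennis fans:\n{events}"
)

def build_prompt(events):
    """events: list of dicts, as a ClickHouse query result converted to JSON.
    Field names below are illustrative, not the demo's actual schema."""
    return PROMPT_TEMPLATE.format(events=json.dumps(events, indent=2))

recent = [
    {"game": 5, "server": "Alcaraz", "point_winner": "Zverev", "score": "30-40"},
    {"game": 5, "server": "Alcaraz", "point_winner": "Zverev", "score": "game Zverev"},
]
prompt = build_prompt(recent)
# The prompt is then sent to the LLM (e.g. via the OpenAI client), and the
# returned text is published to the "live_text" topic for consumers.
```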

Broadcasting and Future Applications

To deliver commentary to end users, Mark and Dunith employ Server-Sent Events (SSE) via a FastAPI server, streaming Redpanda’s “live_text” topic to a Streamlit web app. This setup mirrors real-world applications like Wikipedia’s recent changes feed, ensuring low-latency updates. The demo showcases commentary appearing in real time, with potential extensions like tweeting updates or storing them in a data warehouse. Beyond sports, Dunith highlights the architecture’s versatility for domains like live auctions, traffic updates, or food delivery tracking (e.g., Uber Eats notifications). Future enhancements include fine-tuning smaller LLMs, integrating fine-grained statistics via text-to-SQL, or summarizing multiple matches for comprehensive coverage, demonstrating the pattern’s adaptability for real-time generative applications.


[DevoxxUK2024] Enter The Parallel Universe of the Vector API by Simon Ritter

Simon Ritter, Deputy CTO at Azul Systems, delivered a captivating session at DevoxxUK2024, exploring the transformative potential of Java’s Vector API. This innovative API, introduced as an incubator module in JDK 16 and now in its eighth iteration in JDK 23, empowers developers to harness Single Instruction Multiple Data (SIMD) instructions for parallel processing. By leveraging Advanced Vector Extensions (AVX) in modern processors, the Vector API enables efficient execution of numerically intensive operations, significantly boosting application performance. Simon’s talk navigates the intricacies of vector computations, contrasts them with traditional concurrency models, and demonstrates practical applications, offering developers a powerful tool to optimize Java applications.

Understanding Concurrency and Parallelism

Simon begins by clarifying the distinction between concurrency and parallelism, a common source of confusion. Concurrency involves tasks that overlap in execution time but may not run simultaneously, as the operating system may time-share a single CPU. Parallelism, however, ensures tasks execute simultaneously, leveraging multiple CPUs or cores. For instance, two users editing documents on separate machines achieve parallelism, while a single-core CPU running multiple tasks creates the illusion of parallelism through time-sharing. Java’s threading model, introduced in JDK 1.0, facilitates concurrency via the Thread class, but coordinating data sharing across threads remains challenging. Simon highlights how Java evolved with the concurrency utilities in JDK 5, the Fork/Join framework in JDK 7, and parallel streams in JDK 8, each simplifying concurrent programming while introducing trade-offs, such as non-deterministic results in parallel streams.

The Essence of Vector Processing

The Vector API, distinct from the legacy java.util.Vector class, enables true parallel processing within a single execution unit using SIMD instructions. Simon explains that vectors in mathematics represent sets of values, unlike scalars, and the Vector API applies this concept by storing multiple values in wide registers (e.g., 256-bit AVX2 registers). These registers, divided into lanes (e.g., eight 32-bit integers), allow a single operation, such as adding a constant, to process all lanes in one clock cycle. This contrasts with iterative loops, which process elements sequentially. Historical context reveals SIMD’s roots in 1960s supercomputers like the ILLIAC IV and Cray-1, with modern implementations in Intel’s MMX, SSE, and AVX instructions, culminating in AVX-512 with 512-bit registers. The Vector API abstracts these complexities, enabling developers to write cross-platform code without targeting specific microarchitectures.
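The lane-wise idea is not specific to Java; as an analogy (not the Vector API itself), numpy's vectorized operations show the same contrast between a scalar loop that touches one element per step and a single operation applied across all "lanes" at once:

```python
import numpy as np

# Scalar-style loop: one addition per element, one at a time,
# the way a conventional for-loop processes an array.
def add_scalar(a, c):
    out = np.empty_like(a)
    for i in range(len(a)):
        out[i] = a[i] + c
    return out

# Vectorized form: conceptually a single operation applied to every
# "lane" at once, which is what a SIMD register does in hardware.
def add_vector(a, c):
    return a + c
```

numpy dispatches the second form to compiled (often SIMD-accelerated) loops, so it is both faster and closer to the mental model of the Vector API's lane-wise operations.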

Leveraging the Vector API

Simon illustrates the Vector API’s practical application through its core components: Vector, VectorSpecies, and VectorShape. The Vector class, parameterized by type (e.g., Integer), supports operations like addition and multiplication across all lanes. Subclasses like IntVector handle primitive types, offering methods like fromArray to populate vectors from arrays. VectorShape defines register sizes (64 to 512 bits or S_MAX for the largest available), ensuring portability across architectures like Intel and ARM. VectorSpecies combines type and shape, specifying, for example, an IntVector with eight lanes in a 256-bit register. Simon demonstrates a loop processing a million-element array, using VectorSpecies to calculate iterations based on lane count, and employs VectorMask to handle partial arrays, ensuring no side effects from unused lanes. This approach optimizes performance for numerically intensive tasks, such as matrix computations or data transformations.

Performance Insights and Trade-offs

The Vector API’s performance benefits shine in specific scenarios, particularly when autovectorization by the JIT compiler is insufficient. Simon references benchmarks from Tomas Zezula, showing that explicit Vector API usage outperforms autovectorization for small arrays (e.g., 64 elements) due to better register utilization. However, for larger arrays (e.g., 2 million elements), memory access latency—100+ cycles for RAM versus 3-5 for L1 cache—diminishes gains. Conditional operations, like adding only even-valued elements, further highlight the API’s value, as the C2 JIT compiler often fails to autovectorize such cases. Azul’s Falcon JIT compiler, based on LLVM, improves autovectorization, but explicit Vector API usage remains superior for complex operations. Simon emphasizes that while the API offers significant flexibility through masks and shuffles, its benefits wane with large datasets due to memory bottlenecks.


Predictive Modeling and the Illusion of Signal

Introduction

Vincent Warmerdam delves into the illusions often encountered in predictive modeling, highlighting the cognitive traps and statistical misconceptions that lead to overconfidence in model performance.

The Seduction of Spurious Correlations

Models often perform well on training data by exploiting noise rather than genuine signal. Vincent emphasizes critical thinking and statistical rigor to avoid being misled by deceptively strong results.
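This trap is easy to reproduce. The sketch below (not from the talk) generates pure noise — features and target with no relationship at all — yet, with enough features, some "predictor" always looks strongly correlated with the target by chance alone:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 50, 1000

# Pure noise: neither the features nor the target carry any signal.
X = rng.standard_normal((n_samples, n_features))
y = rng.standard_normal(n_samples)

# Correlation of every feature with the target.
Xc = (X - X.mean(axis=0)) / X.std(axis=0)
yc = (y - y.mean()) / y.std()
corrs = np.abs(Xc.T @ yc) / n_samples

# The best-looking feature appears meaningful despite being noise.
best = corrs.max()
```

With 50 samples and 1000 noise features, the strongest spurious correlation typically exceeds 0.4 — exactly the kind of deceptively strong result Vincent warns about.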

Building Robust Models

Using robust cross-validation, considering domain knowledge, and testing against out-of-sample data are vital strategies to counteract the illusion of predictive prowess.
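A minimal illustration of why out-of-sample testing matters (an example of the general point, not code from the talk): a flexible model memorises noise and beats the honest model on training data, but the ordering flips on held-out points.

```python
import numpy as np

rng = np.random.default_rng(42)

# Noisy linear ground truth.
x = np.linspace(0, 1, 30)
y = 2 * x + rng.normal(0, 0.3, size=x.size)

# Hold out the last third as out-of-sample data.
x_tr, y_tr = x[:20], y[:20]
x_te, y_te = x[20:], y[20:]

def mse(deg):
    """Train error vs held-out error for a polynomial of given degree."""
    coef = np.polyfit(x_tr, y_tr, deg)
    pred_tr = np.polyval(coef, x_tr)
    pred_te = np.polyval(coef, x_te)
    return np.mean((pred_tr - y_tr) ** 2), np.mean((pred_te - y_te) ** 2)

train_lo, test_lo = mse(1)    # honest linear model
train_hi, test_hi = mse(15)   # flexible model that memorises noise
```

The degree-15 fit wins on training error yet collapses on the held-out segment — the "illusion of predictive prowess" made concrete.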

Conclusion

Data science is not just coding and modeling — it requires constant skepticism, critical evaluation, and humility. Vincent reminds us to stay vigilant against the comforting but dangerous mirage of false predictability.

Building Intelligent Data Products at Scale

Introduction

Thomas Vachon shares insights into scaling data-driven products, blending machine learning, engineering, and user-centric design to create impactful and intelligent applications.

Key Ingredients for Success

Building intelligent products requires aligning data pipelines, model training, deployment infrastructure, and feedback loops. Vachon stresses the importance of cross-functional collaboration between data scientists, software engineers, and product teams.

Real-World Lessons

From architectural best practices to team organization strategies, Vachon illustrates how to navigate the complexity of scaling data initiatives sustainably.

Conclusion

Intelligent data products demand not only technical excellence but also thoughtful design, scalability planning, and user empathy from day one.

Boosting AI Reliability: Uncertainty Quantification with MAPIE


Introduction

Thierry Cordier and Valentin Laurent introduce MAPIE, a Python library within scikit-learn-contrib, designed for uncertainty quantification in machine learning models.


Managing Uncertainty in Machine Learning

In AI applications — from autonomous vehicles to medical diagnostics — understanding prediction uncertainty is crucial. MAPIE uses conformal prediction methods to generate prediction intervals with controlled confidence, ensuring safer and more interpretable AI systems.
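The split-conformal recipe that MAPIE builds on can be sketched in plain numpy. This is the generic method, not MAPIE's API: fit on one half of the data, measure absolute residuals on a held-out calibration half, and widen every prediction by the appropriate residual quantile.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy regression data: y = 3x + noise.
x = rng.uniform(0, 1, 400)
y = 3 * x + rng.normal(0, 0.5, 400)

# Split: fit a model on one half, calibrate residuals on the other.
x_fit, y_fit = x[:200], y[:200]
x_cal, y_cal = x[200:], y[200:]

slope, intercept = np.polyfit(x_fit, y_fit, 1)

def predict(t):
    return slope * t + intercept

# Conformity scores: absolute residuals on the calibration set.
scores = np.abs(y_cal - predict(x_cal))

# Quantile giving ~90% coverage (alpha = 0.1), with the usual
# finite-sample correction (n + 1 in the numerator).
alpha = 0.1
n = len(scores)
q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))

def interval(t):
    """Prediction interval with a distribution-free coverage guarantee."""
    p = predict(t)
    return p - q, p + q
```

MAPIE wraps this logic (and more refined variants) behind a scikit-learn-style estimator interface, so users get the same guarantees without hand-rolling the calibration step.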

Key Features

MAPIE supports regression, classification, time series forecasting, and complex tasks like multi-label classification and semantic segmentation. It integrates seamlessly with scikit-learn, TensorFlow, PyTorch, and custom models.

Real-World Use Cases

By generating calibrated prediction intervals, MAPIE enables selective classification, robust decision-making under uncertainty, and provides statistical guarantees critical for safety-critical AI systems.

Conclusion

MAPIE empowers data scientists to quantify uncertainty elegantly, bridging the gap between predictive power and real-world reliability.

[PyData Paris 2024] Exploring Quarto Dashboard for Impactful and Visual Communication



Introduction

Christophe Dervieux introduces us to Quarto Dashboards, an output format of Quarto, a powerful open-source scientific and technical publishing system. Designed to create impactful visual communication directly from Jupyter Notebooks, Quarto enables the seamless creation of interactive charts, dashboards, and dynamic narratives.

Building Visual Communication with Quarto

Quarto extends standard markdown with advanced features tailored for scientific writing. It offers support for multiple computation engines, allowing narratives and executable code to merge into various outputs: PDF, HTML pages, websites, books, and especially dashboards. The dashboard format enhances data communication by organizing visual metrics in an efficient and impactful layout.

Using Quarto, rendering a Jupyter notebook becomes simple: with just a command-line instruction (quarto render), users can output polished, shareable dashboards. Additional extensions, such as those available in VS Code, JupyterLab, and Positron IDEs, streamline this experience further.

Dashboard Features and Design

Dashboards in Quarto organize content using components like cards, rows, columns, sidebars, and tabs. Each element structures visual outputs like plots, tables, and value boxes, allowing maximum clarity. Customization is straightforward, leveraging YAML configuration and Bootstrap-based theming. Users can create multi-page navigation, interactivity through JavaScript libraries, and adapt layouts for specific audiences.

Recent updates even enable branding dashboards easily with SCSS themes, making Quarto ideal for both scientific and corporate environments.

Conclusion

Quarto revolutionizes technical communication by enabling scientists and analysts to produce professional-grade dashboards and publications effortlessly. Christophe’s session at PyData Paris 2024 showcased the simplicity, power, and flexibility Quarto brings to modern data storytelling.

[DevoxxUK2024] Devoxx UK Introduces: Aspiring Speakers 2024, Short Talks

The Aspiring Speakers 2024 session at DevoxxUK2024, organized in collaboration with the London Java Community, showcased five emerging talents sharing fresh perspectives on technology and leadership. Rajani Rao explores serverless architectures, Yemurai Rabvukwa bridges chemistry and cybersecurity, Farhath Razzaque delves into AI-driven productivity, Manogna Machiraju tackles imposter syndrome in leadership, and Leena Mooneeram offers strategies for platform team synergy. Each 10-minute talk delivers actionable insights, reflecting the diversity and innovation within the tech community. This session highlights the power of new voices in shaping the future of software development.

Serverless Revolution with Rajani Rao

Rajani Rao, a principal technologist at Viva and founder of the Women Coding Community, presents a compelling case for serverless computing. Using a restaurant analogy—contrasting home cooking (traditional computing) with dining out (serverless)—Rajani illustrates how serverless eliminates infrastructure management, enhances scalability, and optimizes costs. She shares a real-world example of porting a REST API from Windows EC2 instances to AWS Lambda, handling 6 billion monthly requests. This shift, completed in a day, resolved issues like CPU overload and patching failures, freeing the team from maintenance burdens. The result was not only operational efficiency but also a monetized service, boosting revenue and team morale. Rajani advocates starting small with serverless to unlock creativity and improve developer well-being.

Chemistry Meets Cybersecurity with Yemurai Rabvukwa

Yemurai Rabvukwa, a cybersecurity engineer and TikTok content creator under STEM Bab, draws parallels between chemistry and cybersecurity. Her squiggly career path—from studying chemistry in China to pivoting to tech during a COVID-disrupted study abroad—highlights transferable skills like analytical thinking and problem-solving. Yemurai identifies three intersections: pharmaceuticals, healthcare, and energy. In pharmaceuticals, both fields use a prevent-detect-respond framework to safeguard systems and ensure quality. The 2017 WannaCry attack on the NHS underscores a multidisciplinary approach in healthcare, involving stakeholders to restore services. In energy, geopolitical risks and ransomware target renewable sectors, emphasizing cybersecurity’s critical role. Yemurai’s journey inspires leveraging diverse backgrounds to tackle complex tech challenges.

AI-Powered Productivity with Farhath Razzaque

Farhath Razzaque, a freelance full-stack engineer and AI enthusiast, explores how generative AI can transform developer productivity. Quoting DeepMind’s Demis Hassabis, Farhath emphasizes AI’s potential to accelerate innovation. He outlines five levels of AI adoption: zero-shot prompting for quick error resolution, AI apps like Cursor IDE for streamlined coding, prompt engineering for precise outputs, agentic workflows for collaborative AI agents, and custom solutions using frameworks like LangChain. Farhath highlights open-source tools like NoAI Browser and MakeReal, which rival commercial offerings at lower costs. By automating repetitive tasks and leveraging domain expertise, developers can achieve 10x productivity gains, preparing for an AI-driven future.

Overcoming Imposter Syndrome with Manogna Machiraju

Manogna Machiraju, head of engineering at Domestic & General, shares a candid exploration of imposter syndrome in leadership roles. Drawing from her 2017 promotion to engineering manager, Manogna recounts overworking to prove her worth, only to face project failure and team burnout. This prompted reflection on her role’s expectations, realizing she wasn’t meant to code but to enable her team. She advocates building clarity before acting, appreciating team efforts, and embracing tolerable imperfection. Manogna also addresses the challenge of not being the expert in senior roles, encouraging curiosity and authenticity over faking expertise. Her principle—leaning into discomfort with determination—offers a roadmap for navigating leadership doubts.

Platform Happiness with Leena Mooneeram

Leena Mooneeram, a platform engineer at Chainalysis, presents a developer’s guide to platform happiness, emphasizing mutual engagement between engineers and platform teams. Viewing platforms as products, Leena suggests three actions: be an early adopter to shape tools and build relationships, contribute by fixing documentation or small bugs, and question considerately with context and urgency details. These steps enhance platform robustness and reduce friction. For instance, early adopters provide critical feedback, while contributions like PRs for typos streamline workflows. Leena’s mutual engagement model fosters collaboration, ensuring platforms empower engineers to build software joyfully and efficiently.


[OxidizeConf2024] Writing Rust Bindings for ThreadX

Crafting Safe Interfaces for Embedded Systems

In the domain of embedded systems, where reliability and efficiency are paramount, Rust has emerged as a powerful tool for building robust software. At OxidizeConf2024, Sojan James from Acsia Technologies delivered an engaging presentation on creating Rust bindings for ThreadX, a compact real-time operating system (RTOS) tailored for microcontrollers. With nearly two decades of experience in C programming and a passion for Rust since 2018, Sojan shared his journey of developing bindings that bridge ThreadX’s C-based architecture with Rust’s safety guarantees, offering practical strategies for embedded developers.

ThreadX, known for its lightweight footprint and static memory allocation, is widely used in automotive digital cockpits and other resource-constrained environments. Sojan’s goal was to create a safe Rust API over ThreadX’s C interfaces, enabling developers to leverage Rust’s type safety and ownership model. His approach involved generating unsafe bindings, wrapping them in a safe abstraction layer, and building sample applications on an STM32 microcontroller. This process, completed primarily during a week-long Christmas project, demonstrates Rust’s potential to enhance embedded development with minimal overhead.

Strategies for Binding Development

Sojan outlined a systematic approach to developing Rust bindings for ThreadX. The first step was creating unsafe bindings to interface with ThreadX’s C API, using Rust’s foreign function interface (FFI) to call C functions directly. This required careful handling of callbacks and memory management, as ThreadX’s static allocation model aligns well with Rust’s borrow checker. Sojan emphasized the importance of reviewing the generated bindings to identify areas where Rust’s ownership semantics could expose architectural inconsistencies, though he noted ThreadX’s maturity minimized such issues.

To create a safe API, Sojan wrapped the unsafe bindings in Rust structs and enums, introducing a typed channel interface for message passing. For example, he demonstrated a queue of type Event, an enum ensuring type safety at compile time. This approach prevents common errors, such as mixing incompatible data types, enhancing reliability in safety-critical applications like automotive systems. A demo on an STM32 showcased two tasks communicating via a 64-byte queue within a 2KB block pool, highlighting the practical application of these bindings in real-world scenarios.

Future Directions and Community Engagement

While Sojan’s bindings are functional, challenges remain, particularly with ThreadX’s timer callbacks, which lack a context pointer, complicating Rust’s safe abstraction. He plans to address this by exploring alternative callback mechanisms or additional abstractions. The bindings, hosted on GitHub, are open for community contributions, reflecting Sojan’s commitment to collaborative development. At Acsia, Rust is being integrated into automotive platforms, including an R&D project, signaling its growing adoption in the industry.

Sojan’s work underscores Rust’s potential to modernize embedded development, offering memory safety without sacrificing performance. By sharing his code and inviting contributions, he fosters a community-driven approach to refining these bindings. As Rust gains traction in automotive and other embedded domains, Sojan’s strategies provide a blueprint for developers seeking to integrate modern programming paradigms with established RTOS platforms, paving the way for safer, more efficient systems.


JpaSystemException: A collection with cascade="all-delete-orphan" was no longer referenced by the owning entity instance

Case:

Entity declaration:

    @OneToMany(cascade = CascadeType.ALL, orphanRemoval = true)
    private List<Foo> foos = Lists.newArrayList();

This block

            user.getFoos().clear();
            // instantiate `foos`, e.g.: final List<Foo> foos = myService.createFoos(bla, bla);
            user.setFoos(foos);

generates this error:

org.springframework.orm.jpa.JpaSystemException: A collection with cascade="all-delete-orphan" was no longer referenced by the owning entity instance: com.github.lalou.jonathan.blabla.User.foos

Fix:

Do not call setFoos(); instead, after clearing, re-populate the existing collection with addAll(). Hibernate keeps a reference to the collection instance it originally wrapped; replacing that instance with a brand-new list leaves the tracked collection unreferenced, which is exactly what the error message reports. In other words, replace:

            user.getFoos().clear();
            user.setFoos(foos);

with

            user.getFoos().clear();
            user.getFoos().addAll(foos);


(copied to https://stackoverflow.com/questions/78858499/jpasystemexception-a-collection-with-cascade-all-delete-orphan-was-no-longer )