Archive for the ‘General’ Category

[ScalaDaysNewYork2016] Monitoring Reactive Applications: New Approaches for a New Paradigm

Reactive applications, built on event-driven and asynchronous foundations, require innovative monitoring strategies. At Scala Days New York 2016, Duncan DeVore and Henrik Engström, both from Lightbend, explored the challenges and solutions for monitoring such systems. They discussed how traditional monitoring falls short for reactive architectures and introduced Lightbend’s approach to addressing these challenges, emphasizing adaptability and precision in observing distributed systems.

The Shift from Traditional Monitoring

Duncan and Henrik began by outlining the limitations of traditional monitoring, which relies on stack traces in synchronous systems to diagnose issues. In reactive applications, built with frameworks like Akka and Play, the asynchronous, message-driven nature disrupts this model. Stack traces lose relevance, as actors communicate without a direct call stack. The speakers categorized monitoring into business process, functional, and technical types, highlighting the need to track metrics like actor counts, message flows, and system performance in distributed environments.
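The loss of the call stack is easy to see in a minimal sketch (assuming Akka's classic actor API of that era; all names are illustrative). The request travels as a message, so the receiving code runs later, on a dispatcher thread, with no frame pointing back to the sender:

[scala]import akka.actor.{Actor, ActorRef, ActorSystem, Props}

// Two actors exchanging messages asynchronously. An exception thrown in
// Requester's receive yields a stack trace rooted in the dispatcher,
// not in the line that sent the original message.
class Worker extends Actor {
  def receive = {
    case n: Int => sender() ! n * 2 // reply asynchronously
  }
}

class Requester(worker: ActorRef) extends Actor {
  worker ! 21 // fire-and-forget: no call stack survives this send
  def receive = {
    case result: Int => println(s"got $result")
  }
}

object Demo extends App {
  val system = ActorSystem("demo")
  val worker = system.actorOf(Props[Worker], "worker")
  system.actorOf(Props(new Requester(worker)), "requester")
}[/scala]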

The Impact of Distributed Systems

The rise of the internet and cloud computing has transformed system design, as Duncan explained. Distributed computing, pioneered by initiatives like ARPANET, and the economic advantages of cloud platforms have enabled businesses to scale rapidly. However, this shift introduces complexities, such as network partitions and variable workloads, necessitating new monitoring approaches. Henrik noted that reactive systems, designed for scalability and resilience, require tools that can handle dynamic data flows and provide insights into system behavior without relying on traditional metrics.

Challenges in Monitoring Reactive Systems

Henrik detailed the difficulties of monitoring asynchronous systems, where data flows through push or pull models. In push-based systems, monitoring tools must handle high data volumes, risking overload, while pull-based systems allow selective querying for efficiency. The speakers emphasized anomaly detection over static thresholds, as thresholds are hard to calibrate and may miss nuanced issues. Anomaly detection, exemplified by tools like Prometheus, identifies unusual patterns by correlating metrics, reducing false alerts and enhancing system understanding.
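The two collection models are simple to contrast in code. Below is an illustrative sketch of the pull side (class and metric names are hypothetical): the application only keeps cheap counters up to date, and the monitoring system samples them on its own schedule, so data volume is bounded by the scrape rate rather than the event rate.

[scala]import java.util.concurrent.atomic.LongAdder

// Pull-based collection: events update in-memory counters; a scraper
// (e.g. behind an HTTP endpoint) reads a snapshot when it chooses to.
class PullMetrics {
  private val processed   = new LongAdder
  private val mailboxSize = new LongAdder

  def recordEnqueue(): Unit = mailboxSize.increment()
  def recordDequeue(): Unit = { mailboxSize.decrement(); processed.increment() }

  // Called by the monitoring system, not on every event.
  def snapshot(): Map[String, Long] =
    Map("mailbox-size" -> mailboxSize.sum, "messages-processed" -> processed.sum)
}[/scala]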

Lightbend’s Monitoring Solution

Duncan and Henrik introduced Lightbend Monitoring, a subscription-based tool tailored for reactive applications. It integrates with Akka actors and Lagom circuit breakers, generating metrics and traces for backends like StatsD and Telegraf. The solution supports pull-based monitoring, allowing selective data collection to manage high data volumes. Future enhancements include support for distributed tracing, Prometheus integration, and improved Lagom compatibility, aiming to provide a comprehensive view of system health and performance.
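Lightbend Monitoring's own wire format and configuration are product-specific, but the StatsD backend it can feed speaks a very small plain-text protocol over UDP. A sketch of that receiving side (host, port, and metric names are illustrative):

[scala]import java.net.{DatagramPacket, DatagramSocket, InetAddress}

// Minimal StatsD client: metrics travel as "name:value|type" datagrams;
// "|c" marks a counter increment.
class StatsDClient(host: String = "localhost", port: Int = 8125) {
  private val socket  = new DatagramSocket()
  private val address = InetAddress.getByName(host)

  def counter(name: String, delta: Long = 1): Unit = {
    val payload = s"$name:$delta|c".getBytes("UTF-8")
    socket.send(new DatagramPacket(payload, payload.length, address, port))
  }
}

// Example: new StatsDClient().counter("actors.messages.processed")[/scala]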

[DotSecurity2017] Post-Quantum Cryptography

In the shadowed corridors of computational evolution, where qubits dance on the precipice of unraveling classical safeguards, the specter of quantum supremacy looms as both marvel and menace. Tanja Lange, a pioneering cryptographer and chair of the Coding Theory and Cryptology group at Eindhoven University of Technology, confronted this conundrum at dotSecurity 2017, elucidating the imperative for encryption resilient to tomorrow’s quantum tempests. With a career illuminating the interstices of mathematics and machine security, Tanja dissected the vulnerabilities plaguing contemporary ciphers—RSA’s reliance on factorization’s fortress, ECC’s elliptic enigmas—while heralding lattice-based bastions and code-theoretic countermeasures as beacons of post-quantum fortitude. This discourse transcends abstraction; it charts a course for safeguarding secrets sown today from harvests reaped by adversaries armed with tomorrow’s arithmetic.

Tanja’s treatise commenced with cryptography’s ubiquity: the browser’s lock icon, a talisman of TLS’s aegis, enshrines RSA or Diffie-Hellman duos, their potency predicated on problems polynomials presume intractable. Yet Shor’s quantum sleight—factoring felled in polynomial time, discrete logs dispatched—threatens this tranquility. Grover’s oracle amplifies the assault: symmetric keys halved in effective strength, AES-256’s bulwark bruised to 128-bit equivalence. Retroactive peril compounds: “harvest now, decrypt later,” with state actors stockpiling encrypted streams today for quantum quelling tomorrow. Tanja tallied timelines: milestones such as Google’s Sycamore supremacy claim of 2019 and IBM’s 2023 roadmap past 1,000 qubits keep shortening the horizon before a machine could crack 2048-bit RSA, making migration urgent well before such hardware arrives.

Post-quantum’s pantheon pivots on presumptions quantum-proof: lattices’ learning with errors (LWE), multivariate quadratics’ mazes, hash’s hierarchies. Tanja traversed LWE’s labyrinth: vectors veiled in noise, decoding’s dichotomy—structured sparsity succumbing sans trapdoors, randomness repelling revelation. McEliece’s mantle, code-based cryptography’s cornerstone since 1978, endures: Goppa codes’ generator matrices, encryption as error-infused syndromes—decryption’s discernment demands secret scaffolds. Tanja touted standardization’s sprint: NIST’s 2016 clarion, 2022’s Kyber crystallization (lattice largesse), Dilithium’s digital signatures—round three’s rites refining resilience.
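To ground the LWE idea, here is a deliberately toy sketch of Regev-style encryption of a single bit, with parameters far too small to be secure (real schemes use dimensions in the hundreds). The public key hides the secret vector behind small noise; decryption works because the accumulated noise stays below a quarter of the modulus.

[scala]import scala.util.Random

// Toy Regev-style LWE encryption of one bit -- illustration only.
object ToyLWE {
  val n = 8      // secret dimension
  val q = 257    // modulus
  val m = 32     // number of LWE samples in the public key
  val rnd = new Random()

  def mod(x: Int): Int = ((x % q) + q) % q
  def noise(): Int = rnd.nextInt(3) - 1 // small error in {-1, 0, 1}

  // Key generation: publish (A, b = A*s + e mod q); keep s secret.
  def keyGen(): (Array[Array[Int]], Array[Int], Array[Int]) = {
    val s = Array.fill(n)(rnd.nextInt(q))
    val a = Array.fill(m, n)(rnd.nextInt(q))
    val b = a.map(row => mod(row.zip(s).map { case (x, y) => x * y }.sum + noise()))
    (a, b, s)
  }

  // Encrypt: sum a random subset of the samples, hide the bit at q/2.
  def encrypt(a: Array[Array[Int]], b: Array[Int], bit: Int): (Array[Int], Int) = {
    val subset = (0 until m).filter(_ => rnd.nextBoolean())
    val c1 = Array.tabulate(n)(j => mod(subset.map(i => a(i)(j)).sum))
    val c2 = mod(subset.map(i => b(i)).sum + bit * (q / 2))
    (c1, c2)
  }

  // Decrypt: c2 - <c1, s> is near 0 for bit 0, near q/2 for bit 1.
  def decrypt(s: Array[Int], c1: Array[Int], c2: Int): Int = {
    val inner = mod(c1.zip(s).map { case (x, y) => x * y }.sum)
    val d = mod(c2 - inner)
    if (math.min(d, q - d) < q / 4) 0 else 1
  }

  def main(args: Array[String]): Unit =
    Seq(0, 1).foreach { bit =>
      val (a, b, s) = keyGen()
      val (c1, c2) = encrypt(a, b, bit)
      println(s"bit=$bit -> decrypted=${decrypt(s, c1, c2)}")
    }
}[/scala]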

Challenges cascade: key sizes’ kilobyte burdens (Kyber’s 1KB public, McEliece’s megabyte monoliths), signatures’ sprawl—yet optimizations orbit: hybrid harbingers blending classical clutches with quantum cautions. Tanja tempered trepidation: current crypto’s continuum, migration’s mosaic—signal spikes, certificate cascades. Her horizon: PQC’s proliferation, from Chrome’s 2024 infusions to IETF’s interoperability—ensuring enclaves eternal against entanglement’s edge.

Quantum’s Quandary and Classical Cracks

Tanja traced threats: Shor’s sieve shattering RSA’s ramparts, Grover’s grope gnawing symmetric sinews—harvest’s haunt, 2025’s qubit quorum. ECC’s edifice echoes: elliptic’s enigmas eclipsed, Diffie-Hellman’s duels dissolved.

Lattice Locks and Code Crypts

LWE’s veil: noise’s nebula, trapdoors’ trove—McEliece’s matrices, Goppa’s girth. NIST’s novelties: Kyber’s kernels, Dilithium’s declarations—hybrids’ harmony, keys’ curtailment.

Migration’s Mandate and Horizons

Tanja’s timeline: signal’s surge, certs’ cascade—Chrome’s convergence, IETF’s accord. PQC’s promise: enclaves enduring, entanglement evaded.

[ScalaDaysNewYork2016] Lightbend Lagom: Crafting Microservices with Precision

Microservices have become a cornerstone of modern software architecture, yet their complexity often poses challenges. At Scala Days New York 2016, Mirco Dotta, a software engineer at Lightbend, introduced Lagom, an open-source framework designed to simplify the creation of reactive microservices. Mirco showcased how Lagom, meaning “just right” in Swedish, balances developer productivity with adherence to reactive principles, offering a seamless experience from development to production.

The Philosophy of Lagom

Mirco emphasized that Lagom prioritizes appropriately sized services over the “micro” aspect of microservices. By focusing on clear boundaries and isolation, Lagom ensures services are neither too small nor overly complex, aligning with the Swedish concept of sufficiency. Built on Play Framework and Akka, Lagom is inherently asynchronous and non-blocking, promoting scalability and resilience. Mirco highlighted its opinionated approach, which standardizes service structures to enhance consistency across teams, allowing developers to focus on domain logic rather than infrastructure.

Development Environment Efficiency

Lagom’s development environment, inspired by Play Framework, is a standout feature. Mirco demonstrated this with a sample application called Cheerer, a Twitter-like service. Using a single SBT command, runAll, developers can launch all services, including an embedded Cassandra server, service locator, and gateway, within one JVM. The environment supports hot reloading, automatically recompiling and restarting services upon code changes. This streamlined setup, consistent across different machines, frees developers from managing complex scripts, enhancing productivity and collaboration.

Service and Persistence APIs

Lagom’s service API is defined through a descriptor method, specifying endpoints and metadata for inter-service communication. Mirco showcased a “Hello World” service, illustrating how services expose endpoints that other services can call, facilitated by the service locator. For persistence, Lagom defaults to Cassandra, leveraging its scalability and resilience, but allows flexibility for other data stores. Mirco advocated for event sourcing and CQRS (Command Query Responsibility Segregation), noting their suitability for microservices. These patterns enable immutable event logs and optimized read views, simplifying data management and scalability.
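A sketch of what such a descriptor looks like, written against the Scala API Lagom later gained (the 1.0 release shown in the talk exposed a Java API of the same shape; service and path names are illustrative):

[scala]import akka.NotUsed
import com.lightbend.lagom.scaladsl.api.{Descriptor, Service, ServiceCall}

// The descriptor names the service for the service locator and maps
// paths onto the methods other services can call.
trait HelloService extends Service {
  def hello(name: String): ServiceCall[NotUsed, String]

  override def descriptor: Descriptor = {
    import Service._
    named("hello").withCalls(
      pathCall("/api/hello/:name", hello _)
    )
  }
}[/scala]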

Production-Ready Features

Transitioning to production is seamless with Lagom, as Mirco demonstrated through its integration with SBT Native Packager, supporting formats like Docker images and RPMs. Lightbend Conductor, available for free in development, simplifies orchestration, offering features like rolling upgrades and circuit breakers for fault tolerance. Mirco highlighted ongoing work to support other orchestration tools like Kubernetes, encouraging community contributions to expand Lagom’s ecosystem. Circuit breakers and monitoring capabilities further ensure service reliability in production environments.

[ScalaDaysNewYork2016] Connecting Reactive Applications with Fast Data Using Reactive Streams

The rapid evolution of data processing demands systems that can handle real-time information efficiently. At Scala Days New York 2016, Luc Bourlier, a software engineer at Lightbend, delivered an insightful presentation on integrating reactive applications with fast data architectures using Apache Spark and Reactive Streams. Luc demonstrated how Spark Streaming, enhanced with backpressure support in Spark 1.5, enables seamless connectivity between reactive systems and real-time data processing, ensuring responsiveness under varying workloads.

Understanding Fast Data

Luc began by defining fast data as the application of big data tools and algorithms to streaming data, enabling near-instantaneous insights. Unlike traditional big data, which processes stored datasets, fast data focuses on analyzing data as it arrives. Luc illustrated this with a scenario where a business initially runs batch jobs to analyze historical data but soon requires daily, hourly, or even real-time updates to stay competitive. This shift from batch to streaming processing underscores the need for systems that can adapt to dynamic data inflows, a core principle of fast data architectures.

Spark Streaming and Backpressure

Central to Luc’s presentation was Spark Streaming, an extension of Apache Spark designed for real-time data processing. Spark Streaming processes data in mini-batches, allowing it to leverage Spark’s in-memory computation capabilities, a significant advancement over Hadoop’s disk-based MapReduce model. Luc highlighted the introduction of backpressure in Spark 1.5, a feature developed by his team at Lightbend. Backpressure dynamically adjusts the data ingestion rate based on processing capacity, preventing system overload. By analyzing the number of records processed and the time taken in each mini-batch, Spark computes an optimal ingestion rate, ensuring stability even under high data volumes.
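Spark's actual estimator is PID-based; the simplified, purely proportional sketch below (all names are illustrative) conveys the core idea of deriving the next ingestion bound from how the last mini-batch actually performed.

[scala]// Derive the next ingestion rate (records/second) from the last batch.
// If processing took longer than the batch interval, scale ingestion down.
def nextRate(recordsProcessed: Long, processingTimeMs: Long, batchIntervalMs: Long): Long = {
  val sustained = recordsProcessed * 1000.0 / processingTimeMs  // measured throughput
  val headroom  = math.min(1.0, batchIntervalMs.toDouble / processingTimeMs)
  math.max(1L, (sustained * headroom).toLong)
}[/scala]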

Reactive Streams Integration

To connect reactive applications with Spark Streaming, Luc introduced Reactive Streams, a set of Java interfaces designed to facilitate communication between systems with backpressure support. These interfaces allow a reactive application, such as one generating random numbers for a Pi computation demo, to feed data into Spark Streaming without overwhelming the system. Luc demonstrated this integration using a Raspberry Pi cluster, showcasing how backpressure ensures the system remains stable by throttling the data producer when processing lags. This approach maintains responsiveness, a key tenet of reactive systems, by aligning data production with consumption capabilities.
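The interfaces themselves are tiny: Publisher, Subscriber, Subscription, and Processor, with backpressure expressed as explicit demand. A minimal consumer that never requests more than it can handle might look like this (the processing body is a placeholder):

[scala]import org.reactivestreams.{Subscriber, Subscription}

// Demand-driven consumer: the producer may only emit what was requested,
// so a slow consumer throttles a fast producer instead of being flooded.
class BoundedSubscriber[T](batchSize: Int = 100) extends Subscriber[T] {
  private var subscription: Subscription = _
  private var outstanding = 0

  override def onSubscribe(s: Subscription): Unit = {
    subscription = s
    outstanding = batchSize
    s.request(batchSize) // signal initial demand explicitly
  }

  override def onNext(element: T): Unit = {
    process(element)
    outstanding -= 1
    if (outstanding == 0) { // ask for the next batch only when ready
      outstanding = batchSize
      subscription.request(batchSize)
    }
  }

  override def onError(t: Throwable): Unit = t.printStackTrace()
  override def onComplete(): Unit = println("stream finished")

  private def process(element: T): Unit = () // application logic goes here
}[/scala]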

Practical Demonstration and Challenges

Luc’s live demo vividly illustrated the integration process. He presented a dashboard displaying a reactive application computing Pi approximations, with Spark analyzing the generated data in real time. Initially, the system handled 1,000 elements per second efficiently, but as the rate increased to 4,000, processing delays emerged without backpressure, causing data to accumulate in memory. By enabling backpressure, Luc showed how Spark adjusted the ingestion rate, maintaining processing times around one second and preventing system failure. He noted challenges, such as the need to handle variable-sized records, but emphasized that backpressure significantly enhances system reliability.

Future Enhancements

Looking forward, Luc discussed ongoing improvements to Spark’s backpressure mechanism, including better handling of aggregated records and potential integration with Reactive Streams for enhanced pluggability. He encouraged developers to explore Reactive Streams at reactivestreams.org, noting its inclusion in Java 9’s concurrent package. These advancements aim to further streamline the connection between reactive applications and fast data systems, making real-time processing more accessible and robust.

[ScalaDaysNewYork2016] Scala’s Road Ahead: Shaping the Future of a Versatile Language

Scala, a language renowned for blending functional and object-oriented programming, stands at a pivotal juncture as outlined by its creator, Martin Odersky, in his keynote at Scala Days New York 2016. Martin’s address explored Scala’s unique identity, recent developments like Scala 2.12 and the Scala Center, and the experimental Dotty compiler, offering a vision for the language’s evolution over the next five years. This talk underscored Scala’s commitment to balancing simplicity, power, and theoretical rigor while addressing community needs.

Scala’s Recent Milestones

Martin began by reflecting on Scala’s steady growth, evidenced by increasing job postings and Google Trends for Scala tutorials. The establishment of the Scala Center marks a significant milestone, providing a hub for community collaboration with support from industry leaders like Lightbend and Goldman Sachs. Additionally, Scala 2.12, set for release in mid-2016, optimizes for Java 8, leveraging lambdas and default methods to produce more compact and faster code. This release, with 33 new features and contributions from 65 committers, reflects Scala’s vibrant community and commitment to progress.

The Scala Center: Fostering Community Collaboration

The Scala Center, as Martin described, serves as a steward for Scala, focusing on projects that benefit the entire community. By coordinating contributions and fostering industrial partnerships, it aims to streamline development and ensure Scala’s longevity. While Martin deferred detailed discussion to Heather Miller’s keynote, he emphasized the center’s role in unifying efforts to enhance Scala’s ecosystem, making it a cornerstone for future growth.

Dotty: A New Foundation for Scala

Central to Martin’s vision is Dotty, a new Scala compiler built on the Dependent Object Types (DOT) calculus. This theoretical foundation, proven sound after an eight-year effort, provides a robust basis for evaluating new language features. Dotty, with a leaner codebase of 45,000 lines compared to the current compiler’s 75,000, offers faster compilation and simplifies the language’s internals by encoding complex features like type parameters into a minimal subset. This approach enhances confidence in language evolution, allowing developers to experiment with new constructs without compromising stability.

Evolving Scala’s Libraries

Looking beyond Scala 2.12, Martin outlined plans for Scala 2.13, focusing on revamping the standard library, particularly collections. Inspired by Spark’s lazy evaluation and pair datasets, Scala aims to simplify collections while maintaining compatibility. Proposals include splitting the library into a core module, containing essentials like collections, and a platform module for additional functionalities like JSON handling. This modular approach would enable dynamic updates and broader community contributions, addressing the challenges of maintaining a monolithic library.

Addressing Language Complexity

Martin acknowledged Scala’s reputation for complexity, particularly with features like implicits, which, while powerful, can lead to unexpected behavior if misused. To mitigate this, he proposed style guidelines, such as the principle of least power, encouraging developers to use the simplest constructs necessary. Additionally, he suggested enforcing rules for implicit conversions, limiting them to packages containing the source or target types to reduce surprises. These measures aim to balance Scala’s flexibility with usability, ensuring it remains approachable.
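As an illustration of the proposed scoping rule (the types here are hypothetical): placing a conversion in the companion object of its target type keeps it in implicit scope exactly where it is meaningful, and nowhere else.

[scala]import scala.language.implicitConversions

case class Meters(value: Double)

object Meters {
  // Lives in the companion of the target type -- one of the locations
  // the proposed rule would permit, so it cannot fire as a surprise
  // in unrelated code.
  implicit def fromDouble(d: Double): Meters = Meters(d)
}

def walk(distance: Meters): String = s"walked ${distance.value} m"

// walk(3.5) compiles without any import: companion objects are always
// in implicit scope for their own type.[/scala]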

Future Innovations: Simplifying and Strengthening Scala

Martin’s vision for Scala includes several forward-looking features. Implicit function types will reduce boilerplate by abstracting over implicit parameters, while effect systems will treat side effects like exceptions as capabilities, enhancing type safety. Nullable types, modeled as union types, address Scala’s null-related issues, aligning it with modern languages like Kotlin. Generic programming improvements, inspired by libraries like Shapeless, aim to eliminate tuple limitations, and better records will support data engines like Spark. These innovations, grounded in Dotty’s foundations, promise a more robust and intuitive Scala.
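A flavor of the nullable-types idea, in the union-type syntax Dotty was exploring at the time (it later shipped with Scala 3): null-ability becomes part of the type, so the compiler refuses to treat the value as a String until null has been ruled out.

[scala]def firstWord(s: String | Null): String = s match {
  case null         => ""                        // the null case must be handled
  case text: String => text.takeWhile(_ != ' ')  // provably a String here
}[/scala]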

[ScalaDaysNewYork2016] Spark 2.0: Evolving Big Data Processing with Structured APIs

Apache Spark, a cornerstone in big data processing, has significantly shaped the landscape of distributed computing with its functional programming paradigm rooted in Scala. In a keynote address at Scala Days New York 2016, Matei Zaharia, the creator of Spark, elucidated the evolution of Spark’s APIs, culminating in the transformative release of Spark 2.0. This presentation highlighted how Spark has progressed from its initial vision of a unified engine to a more sophisticated platform with structured APIs like DataFrames and Datasets, enabling enhanced performance and usability for developers worldwide.

The Genesis of Spark’s Vision

Spark was conceived with two primary ambitions: to create a unified engine capable of handling diverse big data workloads and to offer a concise, language-integrated API that mirrors working with local data collections. Matei explained that unlike the earlier MapReduce model, which was groundbreaking yet limited, Spark extended its capabilities to support iterative computations, streaming, and interactive data exploration. This unification was critical, as prior to Spark, developers often juggled multiple specialized systems, each with its own complexities, making integration cumbersome. By leveraging Scala’s functional constructs, Spark introduced Resilient Distributed Datasets (RDDs), allowing developers to perform operations like map, filter, and join with ease, abstracting the complexities of distributed computing.

The success of this vision is evident in Spark’s widespread adoption. With over a thousand organizations deploying it, including on clusters as large as 8,000 nodes, Spark has become the most active open-source big data project. Its libraries for SQL, streaming, machine learning, and graph processing have been embraced, with 75% of surveyed organizations using multiple components, demonstrating the power of its unified approach.

Challenges with the Functional API

Despite its strengths, the original RDD-based API presented challenges, particularly in optimization and efficiency. Matei highlighted that the functional API, while intuitive, conceals the semantics of computations, making it difficult for the engine to optimize operations automatically. For instance, operations like groupByKey can lead to inefficient memory usage, as they materialize large intermediate datasets unnecessarily. This issue is exemplified in a word count example where groupByKey creates a sequence of values before summing them, consuming excessive memory when a simpler reduceByKey could suffice.
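The contrast fits in a few lines, assuming an existing RDD[String] named lines. Both versions are correct, but the functional API gives the engine no way to notice that the first buffers every group of ones only to sum it:

[scala]val pairs = lines.flatMap(_.split(" ")).map(word => (word, 1))

val counts1 = pairs
  .groupByKey()        // ships and materializes every 1 for each word
  .mapValues(_.sum)

val counts2 = pairs
  .reduceByKey(_ + _)  // combines partial sums map-side before the shuffle[/scala]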

Moreover, the reliance on Java objects for data storage introduces significant memory overhead. Matei illustrated this with a user class example, where headers, pointers, and padding consume roughly two-thirds of the allocated memory, a critical concern for in-memory computing frameworks like Spark. These challenges underscored the need for a more structured approach to data processing.

Introducing Structured APIs: DataFrames and Datasets

To address these limitations, Spark introduced DataFrames and Datasets, structured APIs built atop the Spark SQL engine. These APIs impose a defined schema on data, enabling the engine to understand and optimize computations more effectively. DataFrames, dynamically typed, resemble tables in a relational database, supporting operations like filtering and aggregation through a domain-specific language (DSL). Datasets, statically typed, extend this concept by aligning closely with Scala’s type system, allowing developers to work with case classes for type safety.

Matei demonstrated how DataFrames enable declarative programming, where operations are expressed as logical plans that Spark optimizes before execution. For example, filtering users by state generates an abstract syntax tree, allowing Spark to optimize the query plan rather than executing operations eagerly. This declarative nature, inspired by data science tools like Pandas, distinguishes Spark’s DataFrames from similar APIs in R and Python, enhancing performance through lazy evaluation and optimization.
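A sketch of both flavors against a hypothetical users file, assuming an existing SparkSession named spark (Spark 2.0 APIs):

[scala]import spark.implicits._

case class User(name: String, state: String, age: Int)

val df = spark.read.json("users.json")      // DataFrame: rows typed at runtime
val adults = df.filter(df("age") >= 21)     // builds a logical plan; runs lazily

val ds = df.as[User]                        // Dataset: checked against the case class
val names = ds.filter(_.state == "CA").map(_.name)[/scala]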

Optimizing Performance with Project Tungsten

A significant focus of Spark 2.0 is Project Tungsten, which addresses the shifting bottlenecks in big data systems. Matei noted that while I/O was the primary constraint in 2010, advancements in storage (SSDs) and networking (10-40 gigabit) have shifted the focus to CPU efficiency. Tungsten employs three strategies: runtime code generation, cache locality exploitation, and off-heap memory management. By encoding data in a compact binary format, Spark reduces memory overhead compared to Java objects. Code generation, facilitated by the Catalyst optimizer, produces specialized bytecode that operates directly on binary data, improving CPU performance. These optimizations ensure Spark can leverage modern hardware trends, delivering significant performance gains.

Structured Streaming: A Unified Approach to Real-Time Processing

Spark 2.0 introduces structured streaming, a high-level API that extends the benefits of DataFrames and Datasets to streaming computations. Matei emphasized that real-world streaming applications often involve batch and interactive workloads, such as updating a database for a web application or applying a machine learning model. Structured streaming treats streams as infinite DataFrames, allowing developers to use familiar APIs to define computations. The engine then incrementally executes these plans, maintaining state and handling late data efficiently. For instance, a batch job grouping data by user ID can be adapted to streaming by changing the input source, with Spark automatically updating results as new data arrives.
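Sketched with Spark 2.0's API (path, schema, and column names are illustrative), the batch query and its streaming counterpart share the same logical plan; only the source and sink change:

[scala]// Batch: group stored events by user.
val static = spark.read.json("events/")
val batchCounts = static.groupBy("userId").count()

// Streaming: the same computation, executed incrementally as data arrives.
val stream = spark.readStream.schema(static.schema).json("events/")
val query = stream.groupBy("userId").count()
  .writeStream
  .outputMode("complete")  // keep the full aggregate up to date
  .format("console")
  .start()[/scala]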

This approach simplifies the development of continuous applications, enabling seamless integration of streaming, batch, and interactive processing within a single API, a capability that sets Spark apart from other streaming engines.

Future Directions and Community Engagement

Looking ahead, Matei outlined Spark’s commitment to evolving its APIs while maintaining compatibility. The structured APIs will serve as the foundation for new libraries, facilitating interoperability across languages like Python and R. Additionally, Spark’s data source API allows applications to seamlessly switch between storage systems like Hive, Cassandra, or JSON, enhancing flexibility. Matei also encouraged community participation, noting that Databricks offers a free Community Edition with tutorials to help developers explore Spark’s capabilities.

(long tweet) When ‘filter’ does not work with Primefaces’ datatable

Abstract

Sometimes, the filter function in Primefaces’ <p:dataTable/> does not work when the field on which filtering operates is typed as an enum.

Explanation

Actually, in order to filter, Primefaces relies on a direct '=' comparison, which for an enum amounts to a reference check. The hack is to force Primefaces to compare on the enum name instead.

Quick fix

In the enum class, add the following block:

[java]public String getName(){ return name(); }[/java]

Have the datatable declaration to look like:

[xml]<p:dataTable id="castorsDT" var="castor" value="#{managedCastorListManagedBean.initiatedCastors}" widgetVar="castorsTable" filteredValue="#{managedCastorListManagedBean.filteredCastors}">
[/xml]

Declare the enum-filtered column like this:

[xml]<p:column sortBy="#{castor.castorWorkflowStatus}" filterable="true" filterBy="#{castor.castorWorkflowStatus.name}" filterMatchMode="in">
<f:facet name="filter">
<p:selectCheckboxMenu label="#{messages['status']}" onchange="PF('castorsTable').filter()">
<f:selectItems value="#{transverseManagedBean.allCastorWorkflowStatuses}" var="cws" itemLabel="#{cws.name}" itemValue="#{cws.name}"/>
</p:selectCheckboxMenu>
</f:facet>
</p:column>[/xml]

Notice how the filtering attribute is declared:

[xml]filterable="true" filterBy="#{castor.castorWorkflowStatus.name}" filterMatchMode="in"[/xml]

In other terms, the comparison is forced to rely on equals() of class String, through the calls to getName() and name().

Feedback from Devoxx France 2016 (4): Gradle: Harder, Better, Stronger, Faster

The session was presented by Andres Almiray, a Canoo Fellow and Java Champion who hails from Mexico. Officially, the talk presented Gradle for advanced usage; nevertheless, Andres barely concealed his intention to make us leave Maven for Gradle.

Read the rest of this entry »

Jonathan LALOU recommends… Stephane TORTAJADA

I wrote the following recommendation on Stephane TORTAJADA’s profile on LinkedIn:

I reported to Stephane for one year. I recommend Stephane for his management, which is based on a few principles:
* understand the technical and functional problems personally
* allow teammates to make errors, in order to learn from them
* protect teammates from external pressures and impediments
* escalate the relevant information, both up and down

Jonathan LALOU recommends… Ahmed CHAARI

I wrote the following recommendation on Ahmed CHAARI’s profile on LinkedIn:

Ahmed has shown his ability to absorb and learn complex technologies, as well as to improve his skills in a short time and in a not-so-easy environment. This is why I recommend Ahmed as a software engineer for any Java-focused team.