Posts Tagged ‘Akka’

[ScalaDaysNewYork2016] Large-Scale Graph Analysis with Scala and Akka

Ben Fonarov, a Big Data specialist at Capital One, presented a compelling case study at Scala Days New York 2016 on building a large-scale graph analysis engine using Scala, Akka, and HBase. Ben detailed the architecture and implementation of Athena, a distributed time-series graph system designed to deliver integrated, real-time data to enterprise users, addressing the challenges of data overload in a banking environment.

Addressing Enterprise Data Needs

Ben Fonarov opened by outlining the motivation behind Athena: the need to provide integrated, real-time data to users at Capital One. Unlike traditional table-based thinking, Athena represents data as a graph, modeling entities like accounts and transactions to align with business concepts. Ben highlighted the challenges of data overload, with multiple data warehouses and ETL processes generating vast datasets. Athena’s visual interface allows users to define graph schemas, ensuring data is accessible in a format that matches their mental models.
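
To ground the graph-first idea, the sketch below shows what a minimal time-series graph schema might look like; the types, field names, and the at method are illustrative assumptions rather than Athena's actual API.

```scala
// Hypothetical time-series property-graph model; Vertex, Edge and at(...) are
// illustrative names, not Athena's actual types.
final case class Vertex(id: Long, label: String, properties: Map[String, String])

final case class Edge(
  from: Long,             // source vertex id
  to: Long,               // target vertex id
  label: String,          // e.g. "OWNS", "PAID"
  validFrom: Long,        // epoch millis at which the relationship became true
  validTo: Option[Long]   // None means the relationship is still valid
)

final case class Graph(vertices: Map[Long, Vertex], edges: Seq[Edge]) {
  // The graph as it existed at a given point in time
  def at(timestamp: Long): Graph =
    copy(edges = edges.filter(e =>
      e.validFrom <= timestamp && e.validTo.forall(_ >= timestamp)))
}
```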

Architectural Considerations

Ben described two architectural approaches to building Athena. The naive implementation used a single actor to process queries, which was insufficient for production-scale loads. The robust solution leveraged an Akka cluster, distributing query processing across nodes for scalability. A query parser translated user requests into graph traversals, while actors managed tasks and streamed results to users. This design ensured low latency and scalability, handling up to 200 billion nodes efficiently.
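
The contrast between the two designs can be sketched with classic Akka actors; the Query, TraversalStep, and PartialResult messages below are hypothetical stand-ins for Athena's real protocol, and in production the workers would be spread across an Akka cluster rather than held in a local Vector.

```scala
import akka.actor.{Actor, ActorRef, Props}

final case class Query(text: String)
final case class TraversalStep(expression: String, replyTo: ActorRef)
final case class PartialResult(rows: Seq[String])

// Naive design: one actor parses and executes every query - fine for a demo,
// a bottleneck at production scale.
class NaiveQueryActor extends Actor {
  def receive: Receive = {
    case Query(text) => sender() ! PartialResult(Seq(s"all rows for $text"))
  }
}

// Robust design: parse the query into traversal steps and hand them to workers,
// which reply directly to the requester.
class TraversalWorker extends Actor {
  def receive: Receive = {
    case TraversalStep(expr, replyTo) => replyTo ! PartialResult(Seq(s"rows for $expr"))
  }
}

class QueryCoordinator(workerCount: Int) extends Actor {
  private val workers: Vector[ActorRef] =
    Vector.tabulate(workerCount)(i => context.actorOf(Props[TraversalWorker](), s"worker-$i"))

  def receive: Receive = {
    case Query(text) =>
      // A real parser would translate the query into graph traversals;
      // splitting on '.' stands in for that here.
      val requester = sender()
      text.split('.').zipWithIndex.foreach { case (step, i) =>
        workers(i % workerCount) ! TraversalStep(step, requester)
      }
  }
}
```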

Streaming and Optimization

A key feature of Athena, Ben explained, was its ability to stream results in real time, avoiding the batch processing limitations of frameworks like TinkerPop’s Gremlin. By using Akka’s actor-based concurrency, Athena processes queries incrementally, delivering results as they are computed. Ben discussed optimizations, such as limiting the number of nodes per actor to prevent bottlenecks, and plans to integrate graph algorithms like PageRank to enhance analytical capabilities.
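
A rough illustration of that idea, with hypothetical message types, might look like this: each traversal actor caps the number of nodes it handles per batch and emits partial results immediately instead of buffering the whole answer.

```scala
import akka.actor.{Actor, ActorRef, Props}

final case class Traverse(nodeIds: Seq[Long], replyTo: ActorRef)
final case class Partial(nodeIds: Seq[Long])
case object Done

// Holds at most `maxNodes` per batch and streams results as soon as each batch is done.
class BoundedTraversalActor(maxNodes: Int) extends Actor {
  def receive: Receive = {
    case Traverse(nodeIds, replyTo) =>
      nodeIds.grouped(maxNodes).foreach(batch => replyTo ! Partial(batch))
      replyTo ! Done
  }
}

object BoundedTraversalActor {
  def props(maxNodes: Int): Props = Props(new BoundedTraversalActor(maxNodes))
}
```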

Future Directions and Community Engagement

Ben concluded by sharing future plans for Athena, including adopting a Gremlin-like DSL for graph traversals and integrating with tools like Spark and H2O. He emphasized the importance of community feedback, inviting developers to join Capital One’s data team to contribute to Athena’s evolution. Running on AWS EC2, Athena represents a scalable solution for enterprise graph analysis, poised to transform how banks handle complex data relationships.

Links:

[ScalaDaysNewYork2016] Perfect Scalability: Architecting Limitless Systems

Michael Nash, co-author of Applied Akka Patterns, delivered an insightful exploration of scalability at Scala Days New York 2016, distinguishing it from performance and outlining strategies to achieve near-linear scalability using the Lightbend ecosystem. Michael’s presentation delved into architectural principles, real-world patterns, and tools that enable systems to handle increasing loads without failure.

Scalability vs. Performance

Michael Nash clarified that scalability is the ability to handle greater loads without breaking, distinct from performance, which focuses on processing the same load faster. Using a simple graph, Michael illustrated how performance improvements shift response times downward, while scalability extends the system’s capacity to handle more requests. He cautioned that poorly designed systems hit scalability limits, leading to errors or degraded performance, emphasizing the need for architectures that avoid these bottlenecks.

Avoiding Scalability Pitfalls

Michael identified key enemies of scalability, such as shared databases, synchronous communication, and sequential IDs. He advocated for denormalized, isolated data stores per microservice, using event sourcing and CQRS to decouple systems. For instance, an inventory service can update based on events from a customer service without direct database access, enhancing scalability. Michael also warned against overusing Akka cluster sharding, which introduces overhead, recommending it only when consistency is critical.
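
The decoupling Michael described can be sketched as a small event-sourced projection; the event and view names below are illustrative, not taken from his examples.

```scala
object InventoryProjection {
  // Event published by another service (name is illustrative)
  final case class OrderPlaced(orderId: String, sku: String, quantity: Int)

  // The inventory service's own denormalized read model (the CQRS query side)
  final case class InventoryView(stockBySku: Map[String, Int]) {
    def applyEvent(event: OrderPlaced): InventoryView = {
      val remaining = stockBySku.getOrElse(event.sku, 0) - event.quantity
      copy(stockBySku = stockBySku.updated(event.sku, remaining))
    }
  }

  def main(args: Array[String]): Unit = {
    // Replaying the event log rebuilds the view from scratch - the essence of event sourcing.
    val events = List(OrderPlaced("o-1", "sku-42", 2), OrderPlaced("o-2", "sku-42", 1))
    val view   = events.foldLeft(InventoryView(Map("sku-42" -> 10))) {
      (current, event) => current.applyEvent(event)
    }
    println(view.stockBySku("sku-42")) // 7: updated without touching the other service's database
  }
}
```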

Leveraging the Lightbend Ecosystem

The Lightbend ecosystem, including Scala, Akka, and Spark, provides robust tools for scalability, Michael explained. Akka’s actor model supports asynchronous messaging, ideal for distributed systems, while Spark handles large-scale data processing. Tools like Docker, Mesos, and Lightbend’s ConductR streamline deployment and orchestration, enabling rolling upgrades without downtime. Michael emphasized integrating these tools with continuous delivery and deep monitoring to maintain system health under high loads.

Real-World Applications and DevOps

Michael shared case studies from IoT wearables to high-finance systems, highlighting common patterns like event-driven architectures and microservices. He stressed the importance of DevOps in scalable systems, advocating for automated deployment pipelines and monitoring to detect issues early. By embracing failure as inevitable and designing for resilience, systems can scale across data centers, as seen in continent-spanning applications. Michael’s practical advice included starting deployment planning early to avoid scalability bottlenecks.

Links:

[ScalaDaysNewYork2016] The Zen of Akka: Mastering Asynchronous Design

At Scala Days New York 2016, Konrad Malawski, a key member of the Akka team at Lightbend, delivered a profound exploration of the principles guiding the effective use of Akka, a toolkit for building concurrent and distributed systems. Konrad’s presentation, inspired by the philosophical lens of “The Tao of Programming,” offered practical insights into designing applications with Akka, emphasizing the shift from synchronous to asynchronous paradigms to achieve robust, scalable architectures.

Embracing the Messaging Paradigm

Konrad Malawski began by underscoring the centrality of messaging in Akka’s actor model. Drawing from Alan Kay’s vision of object-oriented programming, Konrad explained that actors encapsulate state and communicate solely through messages, mirroring real-world computing interactions. This approach fosters loose coupling, both spatially and temporally, allowing components to operate independently. A single actor, Konrad noted, is limited in utility, but when multiple actors collaborate—such as delegating tasks to specialized actors like a “yellow specialist”—powerful patterns like worker pools and sharding emerge. These patterns enable efficient workload distribution, aligning perfectly with the distributed nature of modern systems.
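
A minimal sketch of that delegation pattern, with purely illustrative names, could look like the following worker pool built from Akka's routing support.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.routing.RoundRobinPool

final case class Paint(wall: String)

class YellowSpecialist extends Actor {
  def receive: Receive = {
    case Paint(wall) => println(s"${self.path.name} painted $wall yellow")
  }
}

object WorkerPoolSketch extends App {
  val system = ActorSystem("zen")

  // A router spreads incoming work over five identical specialists - the simplest worker
  // pool; cluster sharding generalizes the same idea across nodes.
  val specialists =
    system.actorOf(RoundRobinPool(5).props(Props[YellowSpecialist]()), "specialists")

  (1 to 10).foreach(i => specialists ! Paint(s"wall-$i"))

  Thread.sleep(500) // crude wait so the pool can finish before shutdown (demo only)
  system.terminate()
}
```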

Structuring Actor Systems for Clarity

A common pitfall for newcomers to Akka, Konrad observed, is creating unstructured systems with actors communicating chaotically. To counter this, he advocated for hierarchical actor systems using context.actorOf to spawn child actors, ensuring a clear supervisory structure. This hierarchy not only organizes actors but also enhances fault tolerance through supervision, where parent actors manage failures of their children. Konrad cautioned against actor selection—directly addressing actors by path—as it leads to brittle designs akin to “stealing a TV from a stranger’s house.” Instead, actors should be introduced through proper references, fostering maintainable and predictable interactions.
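
As a rough sketch of that guidance (names are illustrative), a parent can own both the creation and the failure handling of its children, while outside callers only ever talk to the parent.

```scala
import akka.actor.{Actor, ActorRef, OneForOneStrategy, Props, SupervisorStrategy}
import scala.concurrent.duration._

final case class Job(payload: String)

class Worker extends Actor {
  def receive: Receive = {
    case Job(payload) if payload.isEmpty => throw new IllegalArgumentException("empty job")
    case Job(payload)                    => println(s"${self.path} handled $payload")
  }
}

class Manager extends Actor {
  // Restart a failing child up to 3 times per minute; anything else escalates.
  override val supervisorStrategy: SupervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 1.minute) {
      case _: IllegalArgumentException => SupervisorStrategy.Restart
    }

  // The child is created with context.actorOf, so it sits under this manager in the hierarchy.
  private val worker: ActorRef = context.actorOf(Props[Worker](), "worker")

  def receive: Receive = {
    case job: Job => worker ! job // callers message the manager; nobody addresses the child by path
  }
}
```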

Balancing Power and Constraints

Konrad emphasized the philosophy of “constraints liberate, liberties constrain,” a principle echoed across Scala conferences. Akka actors, being highly flexible, can perform a wide range of tasks, but this power can overwhelm developers. He contrasted actors with more constrained abstractions like futures, which handle single values, and Akka Streams, which enforce a static data flow. These constraints enable optimizations, such as transparent backpressure in streams, which are harder to implement in the dynamic actor model. However, actors excel in distributed settings, where messaging simplifies scaling across nodes, making Akka a versatile choice for complex systems.
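
The difference in constraints can be made concrete with a tiny Akka Streams sketch; it assumes Akka 2.6 or later, where the implicit ActorSystem also supplies the materializer.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}

object StreamsSketch extends App {
  implicit val system: ActorSystem = ActorSystem("streams")

  val fastSource = Source(1 to 1000)
  val slowSink = Sink.foreach[Int] { n =>
    Thread.sleep(100) // deliberately slow consumer
    println(n)
  }

  // Because the blueprint is static, the library can propagate demand upstream:
  // the source never outruns the slow sink, with no hand-written flow control.
  fastSource.map(n => n * n).runWith(slowSink)
}
```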

Community and Future Directions

Konrad highlighted the vibrant Akka community, encouraging contributions through platforms like GitHub and Gitter. He noted ongoing developments, such as Akka Typed, an experimental API that enhances type safety in actor interactions. By sharing resources like the Reactive Streams TCK and community-driven initiatives, Konrad underscored Lightbend’s commitment to evolving Akka collaboratively. His call to action was clear: engage with the community, experiment with new features, and contribute to shaping Akka’s future, ensuring it remains a cornerstone of reactive programming.
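
To give a flavour of typed actor interactions, here is a minimal sketch written against the Akka Typed API as it later stabilized (akka.actor.typed); the experimental API demonstrated in 2016 differed in its details.

```scala
import akka.actor.typed.{ActorRef, Behavior}
import akka.actor.typed.scaladsl.Behaviors

object Greeter {
  final case class Greet(name: String, replyTo: ActorRef[Greeted])
  final case class Greeted(message: String)

  // The protocol lives in the types: this behavior can only receive Greet, and replyTo
  // can only accept Greeted, so protocol mismatches become compile errors.
  val behavior: Behavior[Greet] = Behaviors.receiveMessage { msg =>
    msg.replyTo ! Greeted(s"Hello, ${msg.name}!")
    Behaviors.same
  }
}
```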

Links:

[ScalaDaysNewYork2016] Monitoring Reactive Applications: New Approaches for a New Paradigm

Reactive applications, built on event-driven and asynchronous foundations, require innovative monitoring strategies. At Scala Days New York 2016, Duncan DeVore and Henrik Engström, both from Lightbend, explored the challenges and solutions for monitoring such systems. They discussed how traditional monitoring falls short for reactive architectures and introduced Lightbend’s approach to addressing these challenges, emphasizing adaptability and precision in observing distributed systems.

The Shift from Traditional Monitoring

Duncan and Henrik began by outlining the limitations of traditional monitoring, which relies on stack traces in synchronous systems to diagnose issues. In reactive applications, built with frameworks like Akka and Play, the asynchronous, message-driven nature disrupts this model. Stack traces lose relevance, as actors communicate without a direct call stack. The speakers categorized monitoring into business process, functional, and technical types, highlighting the need to track metrics like actor counts, message flows, and system performance in distributed environments.

The Impact of Distributed Systems

The rise of the internet and cloud computing has transformed system design, as Duncan explained. Distributed computing, pioneered by initiatives like ARPANET, and the economic advantages of cloud platforms have enabled businesses to scale rapidly. However, this shift introduces complexities, such as network partitions and variable workloads, necessitating new monitoring approaches. Henrik noted that reactive systems, designed for scalability and resilience, require tools that can handle dynamic data flows and provide insights into system behavior without relying on traditional metrics.

Challenges in Monitoring Reactive Systems

Henrik detailed the difficulties of monitoring asynchronous systems, where data flows through push or pull models. In push-based systems, monitoring tools must handle high data volumes, risking overload, while pull-based systems allow selective querying for efficiency. The speakers emphasized anomaly detection over static thresholds, as thresholds are hard to calibrate and may miss nuanced issues. Anomaly detection, exemplified by tools like Prometheus, identifies unusual patterns by correlating metrics, reducing false alerts and enhancing system understanding.
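
As a toy illustration of the distinction, and not of Lightbend's actual implementation, an anomaly check can compare a metric against its own recent history instead of a hand-picked constant:

```scala
object AnomalySketch {
  // Flags a value that deviates from its recent history by more than `tolerance`
  // standard deviations - no fixed threshold to calibrate by hand.
  def isAnomalous(history: Seq[Double], latest: Double, tolerance: Double = 3.0): Boolean = {
    val mean   = history.sum / history.size
    val stdDev = math.sqrt(history.map(x => math.pow(x - mean, 2)).sum / history.size)
    stdDev > 0 && math.abs(latest - mean) / stdDev > tolerance
  }

  def main(args: Array[String]): Unit = {
    val mailboxSizes = Seq(10.0, 12.0, 11.0, 9.0, 13.0, 10.0, 11.0, 12.0)
    println(isAnomalous(mailboxSizes, latest = 11.0)) // false: normal fluctuation
    println(isAnomalous(mailboxSizes, latest = 90.0)) // true: worth alerting on
  }
}
```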

Lightbend’s Monitoring Solution

Duncan and Henrik introduced Lightbend Monitoring, a subscription-based tool tailored for reactive applications. It integrates with Akka actors and Lagom circuit breakers, generating metrics and traces for backends like StatsD and Telegraf. The solution supports pull-based monitoring, allowing selective data collection to manage high data volumes. Future enhancements include support for distributed tracing, Prometheus integration, and improved Lagom compatibility, aiming to provide a comprehensive view of system health and performance.

Links:

[ScalaDaysNewYork2016] Lightbend Lagom: Crafting Microservices with Precision

Microservices have become a cornerstone of modern software architecture, yet their complexity often poses challenges. At Scala Days New York 2016, Mirco Dotta, a software engineer at Lightbend, introduced Lagom, an open-source framework designed to simplify the creation of reactive microservices. Mirco showcased how Lagom, meaning “just right” in Swedish, balances developer productivity with adherence to reactive principles, offering a seamless experience from development to production.

The Philosophy of Lagom

Mirco emphasized that Lagom prioritizes appropriately sized services over the “micro” aspect of microservices. By focusing on clear boundaries and isolation, Lagom ensures services are neither too small nor overly complex, aligning with the Swedish concept of sufficiency. Built on Play Framework and Akka, Lagom is inherently asynchronous and non-blocking, promoting scalability and resilience. Mirco highlighted its opinionated approach, which standardizes service structures to enhance consistency across teams, allowing developers to focus on domain logic rather than infrastructure.

Development Environment Efficiency

Lagom’s development environment, inspired by Play Framework, is a standout feature. Mirco demonstrated this with a sample application called Chirper, a Twitter-like service. Using a single SBT command, runAll, developers can launch all services, including an embedded Cassandra server, service locator, and gateway, within one JVM. The environment supports hot reloading, automatically recompiling and restarting services upon code changes. This streamlined setup, consistent across different machines, frees developers from managing complex scripts, enhancing productivity and collaboration.

Service and Persistence APIs

Lagom’s service API is defined through a descriptor method, specifying endpoints and metadata for inter-service communication. Mirco showcased a “Hello World” service, illustrating how services expose endpoints that other services can call, facilitated by the service locator. For persistence, Lagom defaults to Cassandra, leveraging its scalability and resilience, but allows flexibility for other data stores. Mirco advocated for event sourcing and CQRS (Command Query Responsibility Segregation), noting their suitability for microservices. These patterns enable immutable event logs and optimized read views, simplifying data management and scalability.
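
A minimal sketch of such a descriptor, written against Lagom's Scala API (which arrived after this talk) with an illustrative service name, looks roughly like this:

```scala
import akka.NotUsed
import com.lightbend.lagom.scaladsl.api.{Descriptor, Service, ServiceCall}

trait HelloService extends Service {
  // One logical call, exposed as GET /api/hello/:id and discoverable via the service locator
  def hello(id: String): ServiceCall[NotUsed, String]

  override final def descriptor: Descriptor = {
    import Service._
    named("hello").withCalls(
      pathCall("/api/hello/:id", hello _)
    )
  }
}
```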

Production-Ready Features

Transitioning to production is seamless with Lagom, as Mirco demonstrated through its integration with SBT Native Packager, supporting formats like Docker images and RPMs. Lightbend ConductR, available for free in development, simplifies orchestration, offering features like rolling upgrades and circuit breakers for fault tolerance. Mirco highlighted ongoing work to support other orchestration tools like Kubernetes, encouraging community contributions to expand Lagom’s ecosystem. Circuit breakers and monitoring capabilities further ensure service reliability in production environments.

Links: