Posts Tagged ‘ScalaDaysNewYork2016’
[ScalaDaysNewYork2016] Large-Scale Graph Analysis with Scala and Akka
Ben Fonarov, a Big Data specialist at Capital One, presented a compelling case study at Scala Days New York 2016 on building a large-scale graph analysis engine using Scala, Akka, and HBase. Ben detailed the architecture and implementation of Athena, a distributed time-series graph system designed to deliver integrated, real-time data to enterprise users, addressing the challenges of data overload in a banking environment.
Addressing Enterprise Data Needs
Ben Fonarov opened by outlining the motivation behind Athena: the need to provide integrated, real-time data to users at Capital One. Unlike traditional table-based thinking, Athena represents data as a graph, modeling entities like accounts and transactions to align with business concepts. Ben highlighted the challenges of data overload, with multiple data warehouses and ETL processes generating vast datasets. Athena’s visual interface allows users to define graph schemas, ensuring data is accessible in a format that matches their mental models.
Architectural Considerations
Ben described two architectural approaches to building Athena. The naive implementation used a single actor to process queries, which was insufficient for production-scale loads. The robust solution leveraged an Akka cluster, distributing query processing across nodes for scalability. A query parser translated user requests into graph traversals, while actors managed tasks and streamed results to users. This design ensured low latency and scalability, handling up to 200 billion nodes efficiently.
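To make the fan-out concrete, here is a minimal Akka sketch of a coordinator that parses a query into traversal steps and routes them to a pool of workers. The message types and the parser are hypothetical placeholders; Athena's actual query protocol was not shown in the talk.

```scala
import akka.actor.{Actor, ActorRef, Props}
import akka.routing.RoundRobinPool

// Hypothetical messages; Athena's real query protocol was not shown.
case class GraphQuery(raw: String)
case class TraversalStep(step: String, requester: ActorRef)

// A coordinator that parses an incoming query into traversal steps and fans
// them out to a pool of workers instead of processing everything itself.
class QueryCoordinator(workerProps: Props, poolSize: Int) extends Actor {
  private val workers =
    context.actorOf(RoundRobinPool(poolSize).props(workerProps), "workers")

  // Placeholder for the real query parser described in the talk.
  private def parse(raw: String): Seq[String] = raw.split("\\.").toSeq

  def receive: Receive = {
    case GraphQuery(raw) =>
      parse(raw).foreach(step => workers ! TraversalStep(step, sender()))
  }
}
```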
Streaming and Optimization
A key feature of Athena, Ben explained, was its ability to stream results in real time, avoiding the batch processing limitations of frameworks like TinkerPop’s Gremlin. By using Akka’s actor-based concurrency, Athena processes queries incrementally, delivering results as they are computed. Ben discussed optimizations, such as limiting the number of nodes per actor to prevent bottlenecks, and plans to integrate graph algorithms like PageRank to enhance analytical capabilities.
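The streaming idea can be sketched in the same hypothetical terms: a worker that emits each bounded batch of neighbours to the requester as soon as it is computed, rather than buffering the full traversal result.

```scala
import akka.actor.{Actor, ActorRef, Props}

// Hypothetical protocol: a worker expands a set of nodes and streams partial
// results back as they are produced, bounded by a per-actor batch limit.
case class Expand(nodeIds: Seq[Long], requester: ActorRef)
case class PartialResult(neighbours: Seq[Long])
case object TraversalComplete

class TraversalWorker(neighbours: Long => Seq[Long], maxNodesPerBatch: Int) extends Actor {
  def receive: Receive = {
    case Expand(nodeIds, requester) =>
      // Emit each bounded batch as soon as it is computed.
      nodeIds.grouped(maxNodesPerBatch).foreach { batch =>
        requester ! PartialResult(batch.flatMap(neighbours))
      }
      requester ! TraversalComplete
  }
}

object TraversalWorker {
  def props(neighbours: Long => Seq[Long], maxNodesPerBatch: Int): Props =
    Props(new TraversalWorker(neighbours, maxNodesPerBatch))
}
```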
Future Directions and Community Engagement
Ben concluded by sharing future plans for Athena, including adopting a Gremlin-like DSL for graph traversals and integrating with tools like Spark and H2O. He emphasized the importance of community feedback, inviting developers to join Capital One’s data team to contribute to Athena’s evolution. Running on AWS EC2, Athena represents a scalable solution for enterprise graph analysis, poised to transform how banks handle complex data relationships.
Links:
[ScalaDaysNewYork2016] Implicits Inspected and Explained: Demystifying Scala’s Power
At Scala Days New York 2016, Tim Soethout, a functional programmer at ING Bank, offered a comprehensive guide to Scala’s implicits, a feature often perceived as magical by developers transitioning from basic to advanced Scala programming. Tim’s presentation bridged this gap, providing clear explanations and practical examples to demonstrate how implicits enhance code expressiveness and flexibility.
Understanding Implicits
Tim Soethout began by defining implicits as a mechanism for providing values or conversions without explicit references, enabling concise and flexible code. Drawing parallels with object-oriented programming, Tim explained that implicits extend “is-a” and “has-a” relationships with “is-viewable-as,” allowing developers to add rich interfaces to existing types. For instance, in Akka, the ! (tell) operator uses an implicit sender parameter, simplifying message passing. Similarly, Scala’s Futures rely on implicit execution contexts to manage asynchronous operations, abstracting thread scheduling from developers.
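Both examples rest on the same mechanism, as this small sketch using Akka's tell signature and a plain Future illustrates:

```scala
import akka.actor.{Actor, ActorRef}
import scala.concurrent.{ExecutionContext, Future}

class Forwarder(target: ActorRef) extends Actor {
  // Inside an Actor, `self` is an implicit ActorRef, so the tell operator
  //   def !(message: Any)(implicit sender: ActorRef = Actor.noSender): Unit
  // picks it up without the sender ever being written out.
  def receive: Receive = {
    case msg => target ! msg // equivalent to target.!(msg)(self)
  }
}

object Doubler {
  // Futures work the same way: the implicit ExecutionContext decides where
  // the body runs, so callers never schedule threads by hand.
  def double(n: Int)(implicit ec: ExecutionContext): Future[Int] =
    Future(n * 2)
}
```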
Compiler Resolution of Implicits
A key focus of Tim’s talk was demystifying how the Scala compiler resolves implicits. He outlined the compiler’s search process, which prioritizes local scope, companion objects, and package objects related to the involved types. Tim cautioned against implicit conversions with mismatched semantics, as they can lead to unexpected behavior. Using a live coding demo, he illustrated how implicits enable expressive DSLs, such as JSON serialization libraries, by automatically resolving type-specific writers, thus reducing boilerplate code.
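A compact way to see companion-object resolution at work, independent of any particular library, is a REPL-style snippet like the following:

```scala
case class Meters(value: Double)

object Meters {
  // Lives in the companion object, so it is part of the implicit scope of
  // Meters and is found without any import at the call site.
  implicit val ordering: Ordering[Meters] = Ordering.by((m: Meters) => m.value)
}

List(Meters(3), Meters(1), Meters(2)).sorted // List(Meters(1.0), Meters(2.0), Meters(3.0))
```

Because the compiler searches the companion object of Meters when it needs an Ordering[Meters], the call to sorted compiles with no import at the call site.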
Type Classes and Extensibility
Tim explored type classes as a powerful application of implicits, allowing non-intrusive library extensions. By defining behaviors like JSON serialization in companion objects, developers can extend functionality without modifying core libraries. He demonstrated this with a JSON writer example, where implicits ensured type-safe serialization for complex data structures. Tim emphasized that this approach fosters loose coupling, making libraries more modular and easier to maintain.
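A stripped-down version of that idea, with illustrative names rather than any specific library's API, looks like this:

```scala
trait JsonWriter[A] {
  def write(value: A): String
}

object JsonWriter {
  implicit val intWriter: JsonWriter[Int] = new JsonWriter[Int] {
    def write(i: Int): String = i.toString
  }
  implicit val stringWriter: JsonWriter[String] = new JsonWriter[String] {
    def write(s: String): String = "\"" + s + "\""
  }
  // Derived instance: a list is serializable whenever its elements are.
  implicit def listWriter[A](implicit w: JsonWriter[A]): JsonWriter[List[A]] =
    new JsonWriter[List[A]] {
      def write(xs: List[A]): String = xs.map(w.write).mkString("[", ",", "]")
    }
}

object Json {
  // The compiler picks the right writer at each call site.
  def toJson[A](value: A)(implicit writer: JsonWriter[A]): String =
    writer.write(value)
}

// Json.toJson(List(1, 2, 3)) == "[1,2,3]"
// Json.toJson("hi")          == "\"hi\""
```

Adding support for a new type means supplying one more implicit instance; the toJson entry point and the existing instances are left untouched.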
Practical Debugging Tips
Addressing common challenges, Tim offered strategies for debugging implicits, such as inspecting bytecode or leveraging IDEs to trace implicit resolutions. He warned against chaining multiple implicit conversions, as the compiler restricts itself to a single conversion to avoid complexity. By sharing practical examples, Tim equipped developers with the tools to harness implicits effectively, ensuring they enhance rather than obscure code clarity.
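One quick check worth knowing (a general trick, not something specific to the talk) is to summon the implicit yourself with implicitly; if that line fails to compile, so will any call that needs the same implicit:

```scala
import scala.concurrent.ExecutionContext
import scala.concurrent.ExecutionContext.Implicits.global

// If this compiles, any call needing an ExecutionContext here will too;
// if it does not, the error points straight at the missing implicit.
val ec: ExecutionContext = implicitly[ExecutionContext]
```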
Links:
[ScalaDaysNewYork2016] Finagle Under the Hood: Unraveling Twitter’s RPC System
At Scala Days New York 2016, Vladimir Kostyukov, a member of Twitter’s Finagle team, provided an in-depth exploration of Finagle, a scalable and extensible RPC system written in Scala. Vladimir elucidated how Finagle simplifies the complexities of distributed systems, offering a functional programming model that enhances developer productivity while managing intricate backend operations like load balancing and circuit breaking.
The Essence of Finagle
Vladimir Kostyukov introduced Finagle as a robust RPC system used extensively at Twitter and other organizations. Unlike traditional application frameworks, Finagle focuses on facilitating communication between services, abstracting complexities such as connection pooling and load balancing. Vladimir highlighted its protocol-agnostic nature, supporting protocols like HTTP and Twitter’s custom Mux, which enables efficient multiplexing. This flexibility allows developers to build services in Scala or Java, seamlessly integrating Finagle into diverse tech stacks.
Client-Server Dynamics
Delving into Finagle’s internals, Vladimir described its client-server model, where services are treated as composable functions. When a client sends a request, Finagle’s stack—comprising modules for connection pooling, load balancing, and failure handling—processes it efficiently. On the server side, incoming requests are routed through similar modules, ensuring resilience and scalability. Vladimir emphasized the “power of two choices” load balancing algorithm, which selects the least-loaded node from two randomly chosen servers, achieving near-optimal load distribution in constant time.
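The “services as functions” idea is visible in Finagle's own hello-world shape: a Service[Request, Response] is simply a function from a request to a future response, and the same abstraction is used on both sides of the wire. The REPL-style sketch below follows the standard finagle-http API.

```scala
import com.twitter.finagle.{Http, Service}
import com.twitter.finagle.http.{Request, Response}
import com.twitter.util.Future

// A service is literally a function: Request => Future[Response].
val echo: Service[Request, Response] = new Service[Request, Response] {
  def apply(req: Request): Future[Response] = {
    val rep = Response()
    rep.contentString = req.uri
    Future.value(rep)
  }
}

// The same abstraction serves and calls; the client stack adds connection
// pooling, load balancing and failure handling underneath.
val server = Http.serve(":8080", echo)
val client: Service[Request, Response] = Http.newService("localhost:8080")
```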
Advanced Features and Ecosystem
Vladimir showcased Finagle’s advanced capabilities, such as streaming support for large payloads and integration with tools like Zipkin for tracing and Twitter Server for monitoring. Libraries like Finatra and Featherbed further enhance Finagle’s utility, enabling developers to build RESTful APIs and type-safe HTTP clients. These features make Finagle a powerful choice for handling high-throughput systems, as demonstrated by its widespread adoption at Twitter for managing massive data flows.
Community and Future Prospects
Encouraging community engagement, Vladimir invited developers to contribute to Finagle’s open-source repository on GitHub. He discussed ongoing efforts to support HTTP/2 and improve performance, underscoring Finagle’s evolution toward a utopian RPC system. By offering a welcoming environment for pull requests and feedback, Vladimir emphasized the collaborative spirit driving Finagle’s development, ensuring it remains a cornerstone of scalable service architectures.
Links:
[ScalaDaysNewYork2016] Nightmare Before Best Practices: Lessons from Failure
At Scala Days New York 2016, José Castro, a software engineer at Codacy, delivered a riveting presentation that diverged from the typical conference narrative. Instead of showcasing success stories, José shared cautionary tales of software development mishaps, emphasizing the critical importance of adhering to best practices to prevent costly errors. Through vivid anecdotes, he illustrated how neglecting simple procedures can lead to significant financial and operational setbacks, offering valuable lessons for developers.
The Costly Oversight in Payment Systems
José Castro began with a chilling account of a website launch that initially seemed successful but resulted in a €180,000 loss. The development team had integrated a shopping cart with a bank’s payment system, but for three weeks, no customer payments were processed. José recounted how a developer’s personal purchase revealed that the system was authorizing transactions without completing charges, a flaw unnoticed due to inadequate testing. The bank’s policy allowed only one week to finalize charges, rendering earlier transactions uncollectible. This oversight, José emphasized, could have been prevented with rigorous integration testing and automated checks to ensure payment flows were correctly implemented.
Deployment Disasters and Human Error
Another tale José shared involved a deployment error that brought down a critical system for 12 hours. A developer, tasked with updating a customer-facing application, accidentally deployed to the production environment instead of staging, overwriting essential configurations. The absence of proper deployment protocols and environment safeguards exacerbated the issue, leading to significant downtime. José highlighted the need for automated deployment pipelines and environment-specific configurations to prevent such human errors, ensuring that production systems remain insulated from untested changes.
The Perils of Inadequate Documentation
José also recounted a scenario where insufficient documentation led to a prolonged outage in a payment processing system. A critical configuration change was made without updating the documentation, leaving the team unable to troubleshoot when the system failed. This lack of clarity delayed recovery, costing the company valuable time and revenue. José advocated for documentation-driven development, where comprehensive records of system configurations and procedures are maintained, enabling quick resolution of issues and reducing dependency on individual knowledge.
Fostering a Healthy Code Review Culture
In addressing code review challenges, José discussed the emotional barriers developers face when receiving feedback. He shared an example of a team member who successfully separated personal ego from code quality, embracing constructive criticism. To mitigate conflicts, José recommended automated code review tools like Codacy, which provide objective feedback, reducing interpersonal tension. By automating routine checks, teams can focus on higher-level implementation discussions, fostering a collaborative environment and improving code quality without bruising egos.
Links:
[ScalaDaysNewYork2016] Perfect Scalability: Architecting Limitless Systems
Michael Nash, co-author of Applied Akka Patterns, delivered an insightful exploration of scalability at Scala Days New York 2016, distinguishing it from performance and outlining strategies to achieve near-linear scalability using the Lightbend ecosystem. Michael’s presentation delved into architectural principles, real-world patterns, and tools that enable systems to handle increasing loads without failure.
Scalability vs. Performance
Michael Nash clarified that scalability is the ability to handle greater loads without breaking, distinct from performance, which focuses on processing the same load faster. Using a simple graph, Michael illustrated how performance improvements shift response times downward, while scalability extends the system’s capacity to handle more requests. He cautioned that poorly designed systems hit scalability limits, leading to errors or degraded performance, emphasizing the need for architectures that avoid these bottlenecks.
Avoiding Scalability Pitfalls
Michael identified key enemies of scalability, such as shared databases, synchronous communication, and sequential IDs. He advocated for denormalized, isolated data stores per microservice, using event sourcing and CQRS to decouple systems. For instance, an inventory service can update based on events from a customer service without direct database access, enhancing scalability. Michael also warned against overusing Akka cluster sharding, which introduces overhead, recommending it only when consistency is critical.
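A schematic example of that decoupling, with entirely hypothetical event and view names, might look like this: the inventory side maintains its own denormalized view purely from events, never reaching into the customer service's database.

```scala
// Hypothetical events and view; the point is that the inventory side keeps
// its own denormalized state and never queries the customer database.
sealed trait CustomerEvent
case class OrderPlaced(customerId: String, productId: String, quantity: Int) extends CustomerEvent
case class OrderCancelled(customerId: String, productId: String, quantity: Int) extends CustomerEvent

class InventoryView {
  private var stock: Map[String, Int] = Map.empty[String, Int].withDefaultValue(0)

  // Applied for every event consumed from the customer service's event log.
  def apply(event: CustomerEvent): Unit = event match {
    case OrderPlaced(_, productId, qty)    => stock = stock.updated(productId, stock(productId) - qty)
    case OrderCancelled(_, productId, qty) => stock = stock.updated(productId, stock(productId) + qty)
  }

  def available(productId: String): Int = stock(productId)
}
```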
Leveraging the Lightbend Ecosystem
The Lightbend ecosystem, including Scala, Akka, and Spark, provides robust tools for scalability, Michael explained. Akka’s actor model supports asynchronous messaging, ideal for distributed systems, while Spark handles large-scale data processing. Tools like Docker, Mesos, and Lightbend’s ConductR streamline deployment and orchestration, enabling rolling upgrades without downtime. Michael emphasized integrating these tools with continuous delivery and deep monitoring to maintain system health under high loads.
Real-World Applications and DevOps
Michael shared case studies from IoT wearables to high-finance systems, highlighting common patterns like event-driven architectures and microservices. He stressed the importance of DevOps in scalable systems, advocating for automated deployment pipelines and monitoring to detect issues early. By embracing failure as inevitable and designing for resilience, systems can scale across data centers, as seen in continent-spanning applications. Michael’s practical advice included starting deployment planning early to avoid scalability bottlenecks.
Links:
[ScalaDaysNewYork2016] Domain-Driven Design and Onion Architecture in Scala
Wade Waldron, a senior consultant at Lightbend and co-author of Applied Akka Patterns, presented a compelling case for combining Domain-Driven Design (DDD) and Onion Architecture at Scala Days New York 2016. Using the relatable example of frying an egg, Wade illustrated how these methodologies enhance code clarity, maintainability, and portability, leveraging Scala’s expressive features to model complex domains effectively.
Defining the Core Domain
Wade Waldron introduced DDD as a methodology that prioritizes the core domain—the sphere of knowledge central to a system—over peripheral concerns like user interfaces or databases. Using the egg-frying case study, Wade demonstrated how Scala’s case classes and traits enable rapid prototyping of domain models, such as eggs, frying pans, and cooking processes. By focusing on domain rules rather than technology, developers can create stable, reusable code. Wade emphasized that DDD does not require CQRS or event sourcing, though these patterns complement it, allowing flexibility in implementation.
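A rough sketch in the spirit of that live demo (the exact names and rules from the talk are not reproduced here) shows how little ceremony the domain model needs:

```scala
sealed trait Doneness
case object Raw extends Doneness
case object SunnySideUp extends Doneness
case object OverHard extends Doneness

case class Egg(doneness: Doneness = Raw)

trait FryingPan {
  def fry(egg: Egg, target: Doneness): Egg
}

object SimpleFryingPan extends FryingPan {
  // The cooking rule lives in the domain, not in a database or UI layer.
  def fry(egg: Egg, target: Doneness): Egg = egg.copy(doneness = target)
}
```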
Leveraging Onion Architecture
Onion Architecture, Wade explained, organizes code into concentric layers, with the domain model at the core and infrastructure, like databases, at the outer layers. This structure ensures portability by isolating the domain from external dependencies. In the egg-frying example, repositories and services abstract interactions with external systems, allowing seamless swaps of databases or APIs without altering the core logic. Wade showcased how Scala’s concise syntax supports this layering, enabling developers to maintain clean package structures and focus on business logic.
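Reusing the Egg type from the sketch above, the repository boundary can be expressed as a plain trait in the core, with interchangeable implementations living in the outer layer; the in-memory variant below is also what makes testing cheap.

```scala
import scala.concurrent.Future

case class EggId(value: String)

// Domain core: the repository is just a trait the domain depends on.
trait EggRepository {
  def find(id: EggId): Future[Option[Egg]]
  def save(id: EggId, egg: Egg): Future[Unit]
}

// Outer layer: one of many interchangeable implementations (Cassandra, SQL,
// an HTTP API, ...). The in-memory one doubles as a fast test double.
class InMemoryEggRepository extends EggRepository {
  private var store: Map[EggId, Egg] = Map.empty

  def find(id: EggId): Future[Option[Egg]] =
    Future.successful(store.get(id))

  def save(id: EggId, egg: Egg): Future[Unit] = {
    store = store.updated(id, egg)
    Future.successful(())
  }
}
```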
Enhancing Portability and Testability
A key benefit of combining DDD with Onion Architecture, as Wade highlighted, is the ability to refactor implementations without impacting consumers. By defining clear APIs and using repositories, developers can switch databases or rewrite domain models transparently. Wade shared real-world examples where his team performed live migrations and database swaps unnoticed by users, thanks to the abstraction layers. This approach also simplifies testing, as in-memory repositories can mimic real data stores, enhancing test efficiency and reliability.
Engaging Stakeholders with Domain Language
Wade stressed the importance of using a ubiquitous language in DDD to align developers and stakeholders. In the egg-frying scenario, terms like “fry” and “cook” bridge technical and non-technical discussions, ensuring clarity. However, Wade acknowledged challenges in large organizations where stakeholders may focus on technical details. He advised persistently steering conversations back to the domain level, fostering a shared understanding that drives effective collaboration and reduces miscommunication.
Links:
[ScalaDaysNewYork2016] The Zen of Akka: Mastering Asynchronous Design
At Scala Days New York 2016, Konrad Malawski, a key member of the Akka team at Lightbend, delivered a profound exploration of the principles guiding the effective use of Akka, a toolkit for building concurrent and distributed systems. Konrad’s presentation, inspired by the philosophical lens of “The Tao of Programming,” offered practical insights into designing applications with Akka, emphasizing the shift from synchronous to asynchronous paradigms to achieve robust, scalable architectures.
Embracing the Messaging Paradigm
Konrad Malawski began by underscoring the centrality of messaging in Akka’s actor model. Drawing from Alan Kay’s vision of object-oriented programming, Konrad explained that actors encapsulate state and communicate solely through messages, mirroring real-world computing interactions. This approach fosters loose coupling, both spatially and temporally, allowing components to operate independently. A single actor, Konrad noted, is limited in utility, but when multiple actors collaborate—such as delegating tasks to specialized actors like a “yellow specialist”—powerful patterns like worker pools and sharding emerge. These patterns enable efficient workload distribution, aligning perfectly with the distributed nature of modern systems.
Structuring Actor Systems for Clarity
A common pitfall for newcomers to Akka, Konrad observed, is creating unstructured systems with actors communicating chaotically. To counter this, he advocated for hierarchical actor systems, using context.actorOf to spawn child actors and thereby ensure a clear supervisory structure. This hierarchy not only organizes actors but also enhances fault tolerance through supervision, where parent actors manage the failures of their children. Konrad cautioned against actor selection (directly addressing actors by path), as it leads to brittle designs akin to “stealing a TV from a stranger’s house.” Instead, actors should be introduced through proper references, fostering maintainable and predictable interactions.
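A minimal sketch of that shape, with hypothetical message and actor names:

```scala
import akka.actor.{Actor, ActorRef, Props}

case class Job(id: Int)

class Worker extends Actor {
  def receive: Receive = {
    case Job(id) => // ... do the actual work for this job ...
  }
}

// The supervisor spawns its workers with context.actorOf, so they sit below
// it in the hierarchy and their failures fall under its supervision strategy,
// instead of being looked up by path from somewhere else in the system.
class Supervisor extends Actor {
  private val workers: Vector[ActorRef] =
    (1 to 4).map(i => context.actorOf(Props[Worker], s"worker-$i")).toVector

  def receive: Receive = {
    case job: Job => workers(math.abs(job.id) % workers.size) ! job
  }
}
```

Because the workers are children of the supervisor, a crash in any of them is handled by the supervisor rather than leaking into the rest of the system.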
Balancing Power and Constraints
Konrad emphasized the philosophy of “constraints liberate, liberties constrain,” a principle echoed across Scala conferences. Akka actors, being highly flexible, can perform a wide range of tasks, but this power can overwhelm developers. He contrasted actors with more constrained abstractions like futures, which handle single values, and Akka Streams, which enforce a static data flow. These constraints enable optimizations, such as transparent backpressure in streams, which are harder to implement in the dynamic actor model. However, actors excel in distributed settings, where messaging simplifies scaling across nodes, making Akka a versatile choice for complex systems.
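Akka Streams makes that trade-off tangible: the graph below is static, so the runtime can propagate demand from the sink back to the source without any extra code. This REPL-style snippet uses the ActorMaterializer API current at the time of the talk.

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

implicit val system = ActorSystem("zen-of-akka")
implicit val materializer = ActorMaterializer()

// The shape of the graph is fixed up front: source, transformation, sink.
// That constraint is what lets the runtime apply backpressure transparently:
// the slow consumer regulates how fast the source is pulled.
Source(1 to 1000000)
  .map(_ * 2)
  .runWith(Sink.foreach(println))
```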
Community and Future Directions
Konrad highlighted the vibrant Akka community, encouraging contributions through platforms like GitHub and Gitter. He noted ongoing developments, such as Akka Typed, an experimental API that enhances type safety in actor interactions. By sharing resources like the Reactive Streams TCK and community-driven initiatives, Konrad underscored Lightbend’s commitment to evolving Akka collaboratively. His call to action was clear: engage with the community, experiment with new features, and contribute to shaping Akka’s future, ensuring it remains a cornerstone of reactive programming.
Links:
[ScalaDaysNewYork2016] Monitoring Reactive Applications: New Approaches for a New Paradigm
Reactive applications, built on event-driven and asynchronous foundations, require innovative monitoring strategies. At Scala Days New York 2016, Duncan DeVore and Henrik Engström, both from Lightbend, explored the challenges and solutions for monitoring such systems. They discussed how traditional monitoring falls short for reactive architectures and introduced Lightbend’s approach to addressing these challenges, emphasizing adaptability and precision in observing distributed systems.
The Shift from Traditional Monitoring
Duncan and Henrik began by outlining the limitations of traditional monitoring, which relies on stack traces in synchronous systems to diagnose issues. In reactive applications, built with frameworks like Akka and Play, the asynchronous, message-driven nature disrupts this model. Stack traces lose relevance, as actors communicate without a direct call stack. The speakers categorized monitoring into business process, functional, and technical types, highlighting the need to track metrics like actor counts, message flows, and system performance in distributed environments.
The Impact of Distributed Systems
The rise of the internet and cloud computing has transformed system design, as Duncan explained. Distributed computing, pioneered by initiatives like ARPANET, and the economic advantages of cloud platforms have enabled businesses to scale rapidly. However, this shift introduces complexities, such as network partitions and variable workloads, necessitating new monitoring approaches. Henrik noted that reactive systems, designed for scalability and resilience, require tools that can handle dynamic data flows and provide insights into system behavior without relying on traditional metrics.
Challenges in Monitoring Reactive Systems
Henrik detailed the difficulties of monitoring asynchronous systems, where data flows through push or pull models. In push-based systems, monitoring tools must handle high data volumes, risking overload, while pull-based systems allow selective querying for efficiency. The speakers emphasized anomaly detection over static thresholds, as thresholds are hard to calibrate and may miss nuanced issues. Anomaly detection, exemplified by tools like Prometheus, identifies unusual patterns by correlating metrics, reducing false alerts and enhancing system understanding.
Lightbend’s Monitoring Solution
Duncan and Henrik introduced Lightbend Monitoring, a subscription-based tool tailored for reactive applications. It integrates with Akka actors and Lagom circuit breakers, generating metrics and traces for backends like StatsD and Telegraf. The solution supports pull-based monitoring, allowing selective data collection to manage high data volumes. Future enhancements include support for distributed tracing, Prometheus integration, and improved Lagom compatibility, aiming to provide a comprehensive view of system health and performance.
Links:
[ScalaDaysNewYork2016] Lightbend Lagom: Crafting Microservices with Precision
Microservices have become a cornerstone of modern software architecture, yet their complexity often poses challenges. At Scala Days New York 2016, Mirco Dotta, a software engineer at Lightbend, introduced Lagom, an open-source framework designed to simplify the creation of reactive microservices. Mirco showcased how Lagom, meaning “just right” in Swedish, balances developer productivity with adherence to reactive principles, offering a seamless experience from development to production.
The Philosophy of Lagom
Mirco emphasized that Lagom prioritizes appropriately sized services over the “micro” aspect of microservices. By focusing on clear boundaries and isolation, Lagom ensures services are neither too small nor overly complex, aligning with the Swedish concept of sufficiency. Built on Play Framework and Akka, Lagom is inherently asynchronous and non-blocking, promoting scalability and resilience. Mirco highlighted its opinionated approach, which standardizes service structures to enhance consistency across teams, allowing developers to focus on domain logic rather than infrastructure.
Development Environment Efficiency
Lagom’s development environment, inspired by Play Framework, is a standout feature. Mirco demonstrated this with a sample application called Chirper, a Twitter-like service. Using a single SBT command, runAll, developers can launch all services, including an embedded Cassandra server, service locator, and gateway, within one JVM. The environment supports hot reloading, automatically recompiling and restarting services upon code changes. This streamlined setup, consistent across different machines, frees developers from managing complex scripts, enhancing productivity and collaboration.
Service and Persistence APIs
Lagom’s service API is defined through a descriptor method, specifying endpoints and metadata for inter-service communication. Mirco showcased a “Hello World” service, illustrating how services expose endpoints that other services can call, facilitated by the service locator. For persistence, Lagom defaults to Cassandra, leveraging its scalability and resilience, but allows flexibility for other data stores. Mirco advocated for event sourcing and CQRS (Command Query Responsibility Segregation), noting their suitability for microservices. These patterns enable immutable event logs and optimized read views, simplifying data management and scalability.
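Lagom's Scala API was still taking shape in 2016 (the demos leaned on the Java API), but a descriptor in that style conveys the idea: the service names itself and lists the calls that other services can reach through the service locator.

```scala
import akka.NotUsed
import com.lightbend.lagom.scaladsl.api.{Service, ServiceCall}

// A descriptor in the style of Lagom's Scala API; names are illustrative.
trait HelloService extends Service {
  def hello(id: String): ServiceCall[NotUsed, String]

  override final def descriptor = {
    import Service._
    // The named service and its calls are what the service locator exposes
    // to other services and to the gateway.
    named("hello").withCalls(
      pathCall("/api/hello/:id", hello _)
    )
  }
}
```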
Production-Ready Features
Transitioning to production is seamless with Lagom, as Mirco demonstrated through its integration with SBT Native Packager, supporting formats like Docker images and RPMs. Lightbend ConductR, available for free in development, simplifies orchestration, offering features like rolling upgrades and circuit breakers for fault tolerance. Mirco highlighted ongoing work to support other orchestration tools like Kubernetes, encouraging community contributions to expand Lagom’s ecosystem. Circuit breakers and monitoring capabilities further ensure service reliability in production environments.
Links:
[ScalaDaysNewYork2016] Connecting Reactive Applications with Fast Data Using Reactive Streams
The rapid evolution of data processing demands systems that can handle real-time information efficiently. At Scala Days New York 2016, Luc Bourlier, a software engineer at Lightbend, delivered an insightful presentation on integrating reactive applications with fast data architectures using Apache Spark and Reactive Streams. Luc demonstrated how Spark Streaming, enhanced with backpressure support in Spark 1.5, enables seamless connectivity between reactive systems and real-time data processing, ensuring responsiveness under varying workloads.
Understanding Fast Data
Luc began by defining fast data as the application of big data tools and algorithms to streaming data, enabling near-instantaneous insights. Unlike traditional big data, which processes stored datasets, fast data focuses on analyzing data as it arrives. Luc illustrated this with a scenario where a business initially runs batch jobs to analyze historical data but soon requires daily, hourly, or even real-time updates to stay competitive. This shift from batch to streaming processing underscores the need for systems that can adapt to dynamic data inflows, a core principle of fast data architectures.
Spark Streaming and Backpressure
Central to Luc’s presentation was Spark Streaming, an extension of Apache Spark designed for real-time data processing. Spark Streaming processes data in mini-batches, allowing it to leverage Spark’s in-memory computation capabilities, a significant advancement over Hadoop’s disk-based MapReduce model. Luc highlighted the introduction of backpressure in Spark 1.5, a feature developed by his team at Lightbend. Backpressure dynamically adjusts the data ingestion rate based on processing capacity, preventing system overload. By analyzing the number of records processed and the time taken in each mini-batch, Spark computes an optimal ingestion rate, ensuring stability even under high data volumes.
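In practice, enabling the mechanism is a configuration change rather than a code change. The flag below is the standard one; the initial-rate cap is an extra knob whose availability should be checked against the Spark version in use.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Backpressure is switched on with a single configuration flag; the rate
// estimator then bounds how many records each mini-batch ingests, based on
// the previous batch's processing time.
val conf = new SparkConf()
  .setAppName("fast-data-demo")
  .set("spark.streaming.backpressure.enabled", "true")
  // Optional cap on the very first batch, before any rate has been estimated
  // (an assumption here; verify against the Spark docs for your version):
  .set("spark.streaming.backpressure.initialRate", "1000")

val ssc = new StreamingContext(conf, Seconds(1))
```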
Reactive Streams Integration
To connect reactive applications with Spark Streaming, Luc introduced Reactive Streams, a set of Java interfaces designed to facilitate communication between systems with backpressure support. These interfaces allow a reactive application, such as one generating random numbers for a Pi computation demo, to feed data into Spark Streaming without overwhelming the system. Luc demonstrated this integration using a Raspberry Pi cluster, showcasing how backpressure ensures the system remains stable by throttling the data producer when processing lags. This approach maintains responsiveness, a key tenet of reactive systems, by aligning data production with consumption capabilities.
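The Reactive Streams contract is small: a Publisher, a Subscriber, a Subscription, and a Processor, with demand signalled explicitly through request(n). A schematic Subscriber shows how the consumer stays in control of the rate:

```scala
import org.reactivestreams.{Subscriber, Subscription}

// Demand is signalled explicitly with request(n), so the publisher can never
// push more elements than the consumer has asked for.
class CountingSubscriber extends Subscriber[Long] {
  private var subscription: Subscription = _

  def onSubscribe(s: Subscription): Unit = {
    subscription = s
    s.request(10) // initial demand
  }

  def onNext(element: Long): Unit = {
    // ... hand the element over to the processing side ...
    subscription.request(1) // ask for exactly one more
  }

  def onError(t: Throwable): Unit = t.printStackTrace()
  def onComplete(): Unit = println("stream finished")
}
```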
Practical Demonstration and Challenges
Luc’s live demo vividly illustrated the integration process. He presented a dashboard displaying a reactive application computing Pi approximations, with Spark analyzing the generated data in real time. Initially, the system handled 1,000 elements per second efficiently, but as the rate increased to 4,000, processing delays emerged without backpressure, causing data to accumulate in memory. By enabling backpressure, Luc showed how Spark adjusted the ingestion rate, maintaining processing times around one second and preventing system failure. He noted challenges, such as the need to handle variable-sized records, but emphasized that backpressure significantly enhances system reliability.
Future Enhancements
Looking forward, Luc discussed ongoing improvements to Spark’s backpressure mechanism, including better handling of aggregated records and potential integration with Reactive Streams for enhanced pluggability. He encouraged developers to explore Reactive Streams at reactivestreams.org, noting its inclusion in Java 9’s concurrent package. These advancements aim to further streamline the connection between reactive applications and fast data systems, making real-time processing more accessible and robust.