Posts Tagged ‘Scala’

[Scala IO Paris 2024] Escaping the False Dichotomy: Sanely Automatic Derivation in Scala

In the ScalaIO Paris 2024 session “Slow Auto, Inconvenient Semi: Escaping False Dichotomy with Sanely Automatic Derivation,” Mateusz Kubuszok delivered a user-focused exploration of typeclass derivation in Scala. With over nine years of Scala experience and as a co-maintainer of the Chimney library, Mateusz examined the trade-offs between automatic and semi-automatic derivation, proposing a “sanely automatic” approach that balances usability, compile-time performance, and runtime efficiency. Using JSON serialization libraries like Circe and Jsoniter-Scala as case studies, the talk highlighted how library design choices impact users and offered a practical solution to common derivation pain points, supported by benchmarks and a public repository.

Understanding Typeclasses and Derivation

Mateusz began by demystifying typeclasses, describing them as parameterized interfaces (e.g., an Encoder[T] for JSON serialization) whose implementations are provided by the compiler via implicits or givens. Typeclass derivation automates creating these implementations for complex types like case classes or sealed traits by combining implementations for their fields or subtypes. For example, encoding a User(name: String, address: Address) requires encoders for String and Address, which the compiler recursively resolves.
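The pattern can be sketched with a minimal, hypothetical Encoder typeclass (simplified for illustration; not Circe's actual API), with hand-written instances standing in for what derivation would generate:

```scala
// A typeclass is a parameterized interface whose instances the compiler
// locates via implicit search.
trait Encoder[A] {
  def encode(value: A): String
}

object Encoder {
  // Base instance for String.
  implicit val stringEncoder: Encoder[String] =
    s => "\"" + s + "\""
}

case class Address(city: String)
case class User(name: String, address: Address)

object Instances {
  import Encoder._
  // What derivation would generate for Address: combine field encoders.
  implicit val addressEncoder: Encoder[Address] =
    a => s"""{"city":${stringEncoder.encode(a.city)}}"""
  // Encoding User recursively reuses the String and Address encoders.
  implicit val userEncoder: Encoder[User] =
    u => s"""{"name":${stringEncoder.encode(u.name)},"address":${addressEncoder.encode(u.address)}}"""
}
```

Here `Instances.userEncoder.encode(User("Ada", Address("Paris")))` produces `{"name":"Ada","address":{"city":"Paris"}}` — derivation automates exactly this wiring.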

Derivation comes in two flavors: automatic and semi-automatic. Automatic derivation, popularized by Circe, uses implicits to generate implementations on demand, but can lead to slow compilation and runtime performance issues. Semi-automatic derivation requires explicit calls (e.g., deriveEncoder[Address]) to define implicits, ensuring consistency but adding manual overhead. Mateusz emphasized that these choices, made by library authors, significantly affect users through compilation times, error messages, and performance.
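The difference in user experience can be illustrated with a toy Show typeclass (the names are illustrative, not any library's real API):

```scala
trait Show[A] { def show(a: A): String }

object Show {
  implicit val intShow: Show[Int] = (a: Int) => a.toString
}

case class Point(x: Int, y: Int)

// Automatic style: an implicit def materializes an instance on demand at
// every use site (real libraries do this with macros, Shapeless, or
// Mirrors) — the user writes nothing, at the cost of repeated derivation.
object auto {
  implicit def derivedPointShow(implicit si: Show[Int]): Show[Point] =
    p => s"Point(${si.show(p.x)}, ${si.show(p.y)})"
}

// Semi-automatic style: the user calls a derivation method once and binds
// the result, so every use site shares the same cached instance.
object semiauto {
  def deriveShow(implicit si: Show[Int]): Show[Point] =
    p => s"Point(${si.show(p.x)}, ${si.show(p.y)})"
}

object SemiInstances {
  implicit val pointShow: Show[Point] = semiauto.deriveShow
}
```

The semi-automatic binding must be repeated for every intermediate type in a nested hierarchy — the verbosity the talk set out to eliminate.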

The Pain Points of Automatic and Semi-Automatic Derivation

Automatic derivation in Circe recursively generates encoders for nested types, checking for existing implicits before falling back to derivation. This can cause circular dependencies or stack overflows if not managed carefully. Semi-automatic derivation avoids this by always generating new instances, but requires users to define implicits for every intermediate type, increasing code verbosity. Mateusz shared anecdotes of developers banning automatic derivation due to compilation times ballooning to 10 minutes for complex JSON hierarchies.

Benchmarks on a nested case class (five levels deep) revealed stark differences. On Scala 2.13, Circe’s automatic derivation took 14 seconds to compile a single file, versus 12 seconds for semi-automatic. On Scala 3, automatic derivation soared to 46 seconds (cold JVM) or 16 seconds (warm), while semi-automatic took 10 seconds or 1 second, respectively. Runtime performance also suffered: automatic derivation on Scala 3 was up to 10 times slower than semi-automatic, likely due to large inlined methods overwhelming JVM optimization.

Comparing Libraries: Circe vs. Jsoniter-Scala

Mateusz contrasted Circe with Jsoniter-Scala, which prioritizes performance. Jsoniter-Scala uses a single macro to generate recursive codec implementations, avoiding intermediate implicits except for custom overrides. This reduces memory allocation and compilation overhead. Benchmarks showed Jsoniter-Scala compiling faster than Circe (e.g., 2–3 times faster on Scala 3) and running three times faster, even when Circe was given a head start by testing on JSON representations rather than strings.

Jsoniter-Scala’s approach minimizes implicit searches, embedding logic in macros instead of relying on the compiler’s typechecker. For example, encoding a User with an Address field involves one codec handling all nested types, unlike Circe’s recursive implicit resolution. This results in fewer allocations and better error messages, as macros can pinpoint failure causes (e.g., a missing implicit for a specific field).
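The contrast can be sketched without the library itself — this is a hand-written stand-in for the kind of code Jsoniter-Scala's macro generates, not its actual API:

```scala
case class Address(city: String)
case class User(name: String, address: Address)

// One value encodes the whole tree: nested types are handled inline in a
// single pass over a StringBuilder, with no intermediate implicits and no
// implicit search per field.
val userCodec: User => String = { u =>
  val sb = new StringBuilder
  sb.append("{\"name\":\"").append(u.name).append("\",")
    .append("\"address\":{\"city\":\"").append(u.address.city).append("\"}}")
  sb.toString
}
```

Because the whole structure is known inside one macro expansion, allocation of intermediate encoder objects disappears along with the implicit searches.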

Sanely Automatic Derivation: A New Approach

Inspired by Jsoniter-Scala, Mateusz proposed “sanely automatic derivation” to combine automatic derivation’s convenience with semi-automatic’s performance. Using his Chimney library as a testbed, he split typeclasses into two traits: one for user-facing APIs and another for automatic derivation, invisible to implicit searches to avoid circular dependencies. This allows recursive derivation with minimal implicits, using macros to handle nested types efficiently.
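In outline, the split looks like this (a simplified sketch, not Chimney's actual internals):

```scala
// User-facing typeclass: participates in implicit search as usual.
trait Encoder[A] {
  def encode(a: A): String
}

// Derivation-side subtype: never offered as an implicit, so when the macro
// recursively derives instances for nested fields it works with this type
// directly and cannot re-trigger (or loop through) implicit search.
trait AutoDerivedEncoder[A] extends Encoder[A]

case class Box(value: Int)

// What the macro would emit for Box: reuse a user-supplied Encoder[Int]
// if one is in scope, or derive the field inline otherwise.
val derivedBoxEncoder: AutoDerivedEncoder[Box] =
  new AutoDerivedEncoder[Box] {
    def encode(b: Box): String = s"""{"value":${b.value}}"""
  }
```

User-supplied overrides remain ordinary `Encoder` implicits, so customization still works; only the recursive machinery is hidden from the typechecker.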

Mateusz implemented this for a Jsoniter-Scala wrapper on Scala 3, achieving compilation times comparable to Jsoniter-Scala’s single-implicit approach and faster than Circe’s semi-automatic derivation. Runtime performance matched Jsoniter-Scala’s, with negligible overhead. Error messages were improved by logging macro actions (e.g., which field caused a failure) via Scala’s macro settings, viewable in IDEs like VS Code without console output.

A Fair Comparison: Custom Typeclass Benchmark

To ensure fairness, Mateusz created a custom Show typeclass (similar to Circe’s ShowPretty) for pretty-printing case classes using StringBuilder. He implemented it with Shapeless (Scala 2), Mirrors (Scala 3), Magnolia (both), and his sanely automatic approach. Initial results showed his approach outperforming Shapeless and Mirrors but trailing Magnolia’s semi-automatic derivation. By adding caching within macros (reusing derived implementations for repeated types), his approach became the fastest across all platforms, compiling in less time than Shapeless, Mirrors, or Magnolia, with better runtime performance.

This caching, inspired by Jsoniter-Scala, avoids re-deriving the same type multiple times within a macro, reducing method size and enabling JVM optimization. The change required minimal code, demonstrating that library authors can adopt this approach with a single, non-invasive pull request.

Implications for Scala’s Future

Mateusz concluded by addressing Scala’s reputation for slow compilation, citing a VirtusLab survey where 53% of developers complained about compile times, often tied to typeclass derivation. Libraries like Shapeless and Magnolia prioritize developer convenience over user experience, leading to opaque errors and performance issues. His sanely automatic derivation offers a user-friendly alternative: one import, fast compilation, efficient runtime, and debuggable errors.

By sharing his Chimney Macro Commons library, Mateusz encourages library authors to rethink derivation strategies. While macros require more maintenance than Shapeless or Magnolia, they become viable as libraries scale and prioritize user needs. He urged developers to provide feedback to library maintainers, challenging assumptions that automatic and semi-automatic are the only options, to make Scala more accessible and production-ready.

Hashtags: #Scala #TypeclassDerivation #ScalaIOParis2024 #Circe #JsoniterScala #Chimney #Performance #Macros #Scala3

[Scala IO Paris 2024] Calculating Is Funnier Than Guessing

In the ScalaIO Paris 2024 session “Calculating is funnier than guessing”, Regis Kuckaertz, a French developer living in an English-speaking country, captivated the audience with a methodical approach to writing compilers for domain-specific languages (DSLs) in Scala. The talk debunked the mystique of compiler construction, emphasizing a principled, calculation-based process over ad-hoc guesswork. Using equational reasoning and structural induction, the speaker derived a compiler and stack machine for a simple boolean expression language, Expr, and extended the approach to the more complex ZPure datatype from the ZIO Prelude library. The result was a correct-by-construction compiler, offering performance gains over interpreters while remaining accessible to functional programmers.

Laying the Foundation with Equational Reasoning

The talk began by highlighting the limitations of interpreters for DSLs, which, while easy to write via structural induction, incur runtime overhead. The speaker argued that functional programming’s strength lies in embedding DSLs, citing examples like Cats Effect, ZIO, and Kulo for metrics. To achieve “abstraction without regret,” DSLs must be compiled into efficient machine code. The proposed method, inspired by historical work on calculating compilers, avoids pre-made recipes, instead using a single-step derivation process combining evaluation, continuation-passing style (CPS), and defunctionalization.

For the Expr language, comprising boolean constants, negation, and conjunction, the speaker defined a denotational semantics with an evaluator function. This function maps expressions to boolean values, e.g., evaluating And(Not(B(true)), B(false)) to a boolean result. The evaluator was refined to make implicit behaviors explicit, such as Scala’s left-to-right evaluation of &&, ensuring the specification aligns with developer expectations. This step underscored the importance of intimate familiarity with execution details, uncovered through the derivation process.
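Reconstructed from the talk's description (the names are illustrative), the language and its evaluator look like:

```scala
sealed trait Expr
case class B(value: Boolean)     extends Expr
case class Not(e: Expr)          extends Expr
case class And(l: Expr, r: Expr) extends Expr

// Denotational semantics: map an expression to the boolean it denotes.
// The And case makes short-circuiting and left-to-right order explicit,
// rather than silently relying on Scala's && behaving the way we assume.
def eval(e: Expr): Boolean = e match {
  case B(b)      => b
  case Not(e1)   => !eval(e1)
  case And(l, r) => if (eval(l)) eval(r) else false
}
```

For example, `eval(And(Not(B(true)), B(false)))` evaluates to `false` without ever inspecting the right operand.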

Deriving a Compiler for Expr

The core of the talk was deriving a compiler and stack machine for Expr using equational reasoning. The correctness specification required that compiling an expression and executing it on a stack yields the same result as evaluating the expression and pushing it onto the stack. The compiler was defined with a helper function using symbolic CPS, taking a continuation to guide code generation. For each constructor—B (boolean), Not, and And—the speaker applied the specification, reducing expressions step-by-step.

For B, a Push instruction was introduced to place a boolean on the stack. For Not, a Neg instruction negated the top stack value, with the subexpression compiled inductively. For And, the derivation distributed stack operations over conditional branches, introducing an If instruction to select continuations based on a boolean. The final Compile function used a Halt continuation to stop execution. The resulting machine language and stack machine, implemented as an imperative tail-recursive loop, fit on a single slide, achieving orders-of-magnitude performance improvements over the interpreter.
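Putting the pieces together, here is one compact reconstruction of the derived compiler and machine (the Expr ADT and evaluator are repeated so the snippet is self-contained; instruction names follow the talk, the rest is an assumption of this sketch):

```scala
sealed trait Expr
case class B(value: Boolean)     extends Expr
case class Not(e: Expr)          extends Expr
case class And(l: Expr, r: Expr) extends Expr

def eval(e: Expr): Boolean = e match {
  case B(b)      => b
  case Not(e1)   => !eval(e1)
  case And(l, r) => if (eval(l)) eval(r) else false
}

// Machine code: each instruction carries its continuation (defunctionalized CPS).
sealed trait Code
case class Push(b: Boolean, next: Code)    extends Code
case class Neg(next: Code)                 extends Code
case class If(ifTrue: Code, ifFalse: Code) extends Code
case object Halt                           extends Code

// comp(e, k): code that pushes the value of e, then continues with k.
def comp(e: Expr, k: Code): Code = e match {
  case B(b)      => Push(b, k)
  case Not(e1)   => comp(e1, Neg(k))
  case And(l, r) => comp(l, If(comp(r, k), Push(false, k)))
}

def compile(e: Expr): Code = comp(e, Halt)

// The stack machine as a tail-recursive loop.
@annotation.tailrec
def exec(c: Code, stack: List[Boolean]): List[Boolean] = c match {
  case Push(b, k) => exec(k, b :: stack)
  case Neg(k)     => exec(k, !stack.head :: stack.tail)
  case If(t, f)   => exec(if (stack.head) t else f, stack.tail)
  case Halt       => stack
}
```

The correctness specification — `exec(compile(e), s) == eval(e) :: s` for every expression and stack — is what each case of `comp` was calculated from, so it holds by construction.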

Tackling Complexity with ZPure

To demonstrate real-world applicability, the speaker applied the technique to ZPure, a datatype from ZIO Prelude for pure computations with state, logging, and error handling. The language includes constructors for pure values, failures, error handling, state management, logging, and flat mapping. The evaluator threads state and logs, succeeding or failing based on the computation. The compiler derivation followed the same process, introducing instructions like Push, Throw, Load, Store, Log, Mark, Unmark, and Call to handle values, errors, state, and continuations.

The derivation for ZPure required careful handling of failures, where a Throw instruction invokes a failure routine that unwinds the stack until it finds a handler or crashes. For Catch and FlatMap, the speaker applied the induction hypothesis, introducing stack markers to manage handlers and continuations. Despite Scala functions in ZPure requiring runtime compilation, the speaker proposed defunctionalization—using data types like Flow or lambda calculus encodings—to eliminate this, though this was left as future work. The resulting compiler and machine, again fitting on a slide, were correct by construction, with unreachable cases confidently excluded.

Reflections and Future Directions

The talk emphasized that calculating compilers is a mechanical, repeatable process, not a mysterious art. By deriving machine instructions through equational reasoning, developers ensure correctness without extensive unit testing. The speaker noted a limitation in ZPure: its evaluator and compiler allow non-terminating expressions, which a partial monad could address. Future work includes defunctionalizing ZPure to avoid runtime compilation and optimizing machine code into directed acyclic graphs to reduce duplication.

The speaker recommended resources like Philip Wadler’s papers on calculating compilers, encouraging functional programmers to explore this approachable technique. The talk, blending humor with rigor, demonstrated that compiling DSLs is not only feasible but also “funnier” than guessing, offering a path to efficient, correct code.

Hashtags: #Scala #CompilerDesign #EquationalReasoning #ZPure #ScalaIOParis2024 #FunctionalProgramming

[ScalaDays 2019] Techniques for Teaching Scala

Noel Welsh, co-founder of Underscore and a respected voice in the Scala community, known for his educational contributions including the books “Creative Scala” and “Essential Scala,” shared his valuable insights on “Techniques for Teaching Scala” at ScalaDays Lausanne 2019. His presentation moved beyond the technical intricacies of the language to explore the pedagogical approaches that make learning Scala more effective, engaging, and accessible to newcomers.

Addressing Scala’s Complexity

Welsh likely began by acknowledging the perceived complexity of Scala. While powerful, its rich feature set, including its blend of object-oriented and functional paradigms, can present a steep learning curve. His talk would have focused on how educators and mentors can break down these complexities into digestible components, fostering a positive learning experience that encourages long-term adoption and mastery of the language.

Structured and Incremental Learning

One of the core themes Welsh might have explored is the importance of a structured and incremental approach to teaching. Instead of overwhelming beginners with advanced concepts like implicits, monads, or higher-kinded types from day one, he probably advocated for starting with Scala’s more familiar object-oriented features, gradually introducing functional concepts as the learner builds confidence and a foundational understanding. He might have shared specific curriculum structures or learning pathways that have proven successful in his experience, perhaps drawing from his work at Underscore Training or the philosophy behind “Creative Scala,” which uses creative coding exercises to make learning fun and tangible.

Practical Application and Real-World Examples

Another key technique Welsh likely emphasized is the power of practical application and real-world examples. Abstract theoretical explanations, he might have argued, are less effective than hands-on coding exercises that allow learners to see Scala in action and understand how its features solve concrete problems. This could involve guiding students through building small, meaningful projects or using Scala for data manipulation, web development, or other relatable tasks. The use of tools like Scala Worksheets or Ammonite scripts for immediate feedback and experimentation could also have been highlighted as beneficial for learning.

Psychological Aspects of Learning

Welsh may have also delved into the psychological aspects of learning a new programming language. Addressing common frustrations, building a supportive learning environment, and celebrating small victories can significantly impact a student’s motivation and persistence. He might have discussed the role of clear, concise explanations, well-chosen analogies, and the importance of providing constructive feedback. Techniques for teaching specific Scala concepts that often trip up beginners, such as pattern matching, futures, or the collections library, could have been illustrated with practical teaching tips.

Investing in Scala Education

Ultimately, Noel Welsh’s presentation would have been a call to the Scala community to invest not just in developing the language and its tools, but also in refining the art and science of teaching it. By adopting more effective pedagogical techniques, the community can lower the barrier to entry, nurture new talent, and ensure that Scala’s power is accessible to a broader audience of developers, fostering a more vibrant and sustainable ecosystem.

Hashtags: #ScalaDays2019 #Scala #Education #Teaching #Pedagogy #NoelWelsh #CreativeScala

[ScalaDays 2019] Scala Best Practices Unveiled

At ScalaDays Lausanne 2019, Nicolas Rinaudo, CTO of Besedo, delivered a candid and insightful talk on Scala best practices, drawing from his experience merging Node, Java, and Scala teams. Speaking at EPFL’s 10th anniversary conference, Nicolas shared pitfalls he wished he’d known as a beginner, from array comparisons to exceptions, offering practical solutions. His interactive approach—challenging the audience to spot code issues—drew laughter, applause, and questions on tools like Scalafix, making his session a highlight for developers seeking to write robust Scala code.

Array Comparisons and Type Annotations

Nicolas began with a deceptive array comparison using ==, which fails because arrays use reference equality on the JVM, unlike other Scala collections. Beginners from Java or JavaScript backgrounds often stumble here, expecting value equality. The fix, sameElements, ensures correct comparisons, while Scala 3’s immutable arrays behave sanely. Another common error is omitting type annotations on public methods. Without explicit types, the compiler may infer overly precise types like Some instead of Option, risking binary compatibility issues in libraries. Nicolas advised always annotating public members, noting that Scala 3 requires explicit type annotations on implicit members, though the core issue persists, so vigilance remains essential for maintainable code.
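Both pitfalls fit in a few lines (the method names below are illustrative):

```scala
val a = Array(1, 2, 3)
val b = Array(1, 2, 3)
val byReference = a == b            // false: arrays compare by reference on the JVM
val byElements  = a.sameElements(b) // true: element-wise comparison
val listEq      = List(1, 2, 3) == List(1, 2, 3) // true: value equality

// Without an annotation the inferred result type is Some[Int] — an
// over-precise public signature that breaks if the body later returns None.
def firstEvenInferred = Some(2)
// Annotating the intended abstraction keeps the API and binary compatibility stable:
def firstEven: Option[Int] = Some(2)
```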

Sealed Traits and Case Classes

Enumerations in Scala, Nicolas warned, are treacherous, allowing non-exhaustive pattern matches that compile but crash at runtime. Sealed traits, with subtypes declared in the same file, enable the compiler to enforce exhaustive matching, preventing such failures. He recommended enabling fatal warnings to catch these issues. Similarly, non-final case classes can be extended, leading to unexpected equality behaviors, like two distinct objects comparing equal due to inherited methods. Making case classes final avoids this, though Scala’s default remains unchanged even in Scala 3. Nicolas also noted that sealed traits aren’t foolproof—subtypes can be extended unless final, a subtlety that surprises learners but is manageable with disciplined design.
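A minimal sketch of both recommendations:

```scala
// Sealed: all subtypes live in the same file, so the compiler can check
// pattern matches for exhaustiveness (fatal with -Xfatal-warnings / -Werror).
sealed trait Status
case object Active   extends Status
case object Inactive extends Status

def label(s: Status): String = s match {
  case Active   => "active"
  case Inactive => "inactive" // deleting this case triggers a compile-time warning
}

// Final: the case class cannot be extended, so its generated equality
// cannot be broken by a subclass adding state.
final case class UserId(value: Long)
```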

Exceptions and Referential Transparency

Exceptions, common in Java, disrupt Scala’s referential transparency, where expressions should be replaceable with their values without altering behavior. Nicolas likened exceptions to goto statements, as their origin is untraceable, and Scala’s unchecked exceptions evade compile-time checks. For instance, parsing a string to an integer may throw a NumberFormatException, yet most possible inputs fail to parse, making failure a common case rather than an exceptional one — a poor fit for exceptions. Instead, use Option for absent values, Either for errors with context, or Try for Java interop. These encode errors in return types, enhancing predictability. Scala 3 offers no changes here, so Nicolas’s advice remains critical for writing transparent, robust code.
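The three encodings side by side (using `toIntOption`, available since Scala 2.13):

```scala
import scala.util.{Try, Success}

// Absence: Option when the caller only needs to know "no value".
def parseInt(s: String): Option[Int] = s.toIntOption

// Errors with context: Either carries a description of the failure.
def parseIntE(s: String): Either[String, Int] =
  s.toIntOption.toRight(s"not a number: '$s'")

// Java interop: Try captures the thrown exception at the boundary.
def parseIntT(s: String): Try[Int] = Try(s.toInt)
```

Callers now see the failure mode in the type — `parseInt("oops")` returns `None` instead of throwing — and the compiler forces them to handle it.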

Enforcing Best Practices

To prevent these pitfalls, Nicolas advocated rigorous code reviews, citing research showing reduced defects. Automated tools like WartRemover, Scalafix, Scapegoat, and Scalastyle detect issues during builds, though some, like WartRemover’s strict defaults, may need tuning. Migrating to Scala 3 addresses many problems, such as implicit conversions and enumeration ambiguities, with features like the enum keyword and explicit import rules. Nicolas invited the audience to share additional practices, pointing to his companion website for detailed articles. His acknowledgment of Scala contributors, especially for Scala 3 insights, underscored the community’s role in evolving best practices, making his talk a call to action for cleaner coding.

Hashtags: #ScalaDays2019 #Scala #BestPractices #FunctionalProgramming

[ScalaDays 2019] Preserving Privacy with Scala

At ScalaDays Lausanne 2019, Manohar Jonnalagedda and Jakob Odersky, researchers turned industry innovators, unveiled a Scala-powered approach to secure multi-party computation (MPC) at EPFL’s 10th anniversary conference. Their talk, moments before lunch, captivated attendees with a protocol to average salaries without revealing individual data, sparking curiosity about privacy-preserving applications. Manohar and Jakob, from Inpher, detailed a compiler transforming high-level code into secure, distributed computations, addressing real-world challenges like GDPR-compliant banking and satellite collision detection, earning applause and probing questions on security and scalability.

A Privacy-Preserving Protocol

Manohar opened with a relatable scenario: wanting to compare salaries with Jakob without disclosing personal figures. Their solution, a privacy-preserving protocol, lets three parties—Manohar, Jakob, and their CTO Dmitar—compute an average securely. Each generates three random numbers summing to zero, sharing them such that each party holds a unique view of partial sums. In Scala, this is modeled with a SecretValue type for private integers and a SharedNumber list, accessible only by the corresponding party. Each sums their shares, publishes the result, and the final sum reveals the average without exposing individual salaries. This protocol, using random shares, ensures no single party can deduce another’s data unless all communications are intercepted, balancing simplicity and security.
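The idea can be sketched with additive secret sharing, an equivalent formulation of the zero-sum-randomness trick (illustrative only, not Inpher's implementation):

```scala
import scala.util.Random

// Each party splits its salary into three random shares that sum back to
// the salary, keeps one, and hands one to each other party. A party then
// publishes only the sum of the three shares it holds, so no individual
// salary is ever revealed — yet the published partial sums add up to the
// true total. (Long arithmetic wraps consistently, so overflow cancels.)
def split(secret: Long, rnd: Random): IndexedSeq[Long] = {
  val r1 = rnd.nextLong()
  val r2 = rnd.nextLong()
  IndexedSeq(r1, r2, secret - r1 - r2) // the three shares sum to secret
}

val rnd      = new Random(2024)
val salaries = Seq(50000L, 60000L, 70000L)           // private inputs
val shares   = salaries.map(split(_, rnd))           // shares(i)(j): party i's j-th share
val partials = (0 until 3).map(j => shares.map(_(j)).sum) // each party publishes one sum
val average  = partials.sum / salaries.size
```

Here `average` comes out to 60000 without any party ever disclosing its own input — an eavesdropper would need all three partial-sum messages plus the private shares to reconstruct a salary.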

Secure Multi-Party Computation

Jakob explained MPC as a cryptographic subfield enabling joint function computation without revealing private inputs. The salary example used addition, but MPC supports multiplication via Beaver triples, precomputed by a trusted dealer for efficiency. With addition and multiplication, MPC handles polynomials, enabling linear and logistic regression or exponential approximations via Taylor polynomials. Manohar highlighted Scala’s role in modeling these operations, with functions for element-wise addition and revealing sums. The protocol achieves information-theoretic security for integers, where masked values are indistinguishable, but floating-point numbers require computational security due to distribution challenges. This flexibility makes MPC suitable for complex computations, from machine learning to statistical analysis, all while preserving privacy.

Real-World Applications

MPC shines in domains where data sensitivity or legal constraints, like GDPR, restrict sharing. Manohar cited ING, a bank building credit-scoring models across European countries without moving user data across borders, complying with GDPR. Another compelling case involved satellite operators—American, Russian, or Chinese—secretly computing collision risks to avoid incidents like the 2009 Iridium-Cosmos crash, which threatened the International Space Station. Jakob emphasized that Inpher’s XOR platform, legally vetted by Baker McKenzie, ensures GDPR compliance by keeping data at its source. These use cases, from finance to defense, underscore MPC’s value in enabling secure collaboration, with Scala providing a robust, type-safe foundation for protocol implementation.

Building a Compiler for MPC

To scale MPC beyond simple embeddings, Manohar and Jakob’s team developed a compiler at Inpher, targeting a high-level language resembling Python or Scala for data scientists. This compiler transforms linear algebra-style code into low-level primitives for distributed execution across parties’ virtual machines, verified to prevent data leaks. It performs static analysis to optimize memory and communication, inferring masking parameters to minimize computational overhead. For example, multiplying masked floating-point numbers risks format explosion, so the compiler uses fixed-point representations and statistical bounds to maintain efficiency. The output, resembling assembly for MPC engines, manages memory allocation and propagates data dimensions. While currently MPC-focused, the compiler’s design could integrate other privacy techniques, offering a versatile platform for secure computation.

Hashtags: #ScalaDays2019 #Scala #Privacy #MPC