Recent Posts
Archives

PostHeaderIcon CTO Perspective: Choosing a Tech Stack for Mainframe Rebuild

Original post

From LinkedIn: https://www.linkedin.com/posts/matthias-patzak_cto-technology-activity-7312449287647375360-ogNg?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAWqBcBNS5uEX9jPi1JPdGxlnWwMBjXwaw

Summary of the question

As CTO for a mainframe rebuild (core banking/insurance/retail app, 100 teams/1000 people with Cobol expertise), considering Java/Kotlin, TypeScript/Node.js, Go, and Python. Key decision criteria are technical maturity/stability, robust community, and innovation/adoption. The CTO finds these criteria sound and seeks a language recommendation.

TL;DR: my response

  • Team, mainframe rebuild: Java/Kotlin are frontrunners due to maturity, ecosystem, and team’s Java-adjacent skills. Go has niche potential. TypeScript/Node.js and Python less ideal for core.
  • Focus now: deep PoC comparing Java (Spring Boot) vs. Kotlin on our use cases. Evaluate developer productivity, readability, interoperability, performance.
  • Develop comprehensive Java/Kotlin training for our 100 Cobol-experienced teams.
  • Strategic adoption plan (Java, Kotlin, or hybrid) based on PoC and team input is next.
  • This balances proven stability with modern practices on the JVM for our core.

My detailed opinion

As a CTO with experience in these large-scale transformations, my priority remains a solution that balances technical strength with the pragmatic realities of our team’s current expertise and long-term maintainability.

While Go offers compelling performance characteristics, the specific demands of our core business application – be it in banking, insurance, or retail – often prioritize a mature ecosystem, robust enterprise patterns, and a more gradual transition path for our significant team. Given our 100 teams deeply skilled in Cobol, the learning curve and the availability of readily transferable concepts become key considerations.

Therefore, while acknowledging Go’s strengths in certain cloud-native scenarios, I want to emphasize the strategic advantages of the Java/Kotlin ecosystem for our primary language choice, with a deliberate hesitation and deeper exploration between these two JVM-based options.

Re-emphasizing Java and Exploring Kotlin More Deeply:

  • Java’s Enduring Strength: Java’s decades of proven stability in building mission-critical enterprise systems cannot be overstated. The JVM’s resilience, the vast array of mature libraries and frameworks (especially Spring Boot), and the well-established architectural patterns provide a solid and predictable foundation. Moreover, the sheer size of the Java developer community ensures a deep pool of talent and readily available support for our teams as they transition. For a core system in a regulated industry, this level of established maturity significantly mitigates risk.

  • Kotlin’s Modern Edge and Interoperability: Kotlin presents a compelling evolution on the JVM. Its modern syntax, null safety features, and concise code can lead to increased developer productivity and reduced boilerplate – benefits I’ve witnessed firsthand in JVM-based projects. Crucially, Kotlin’s seamless interoperability with Java is a major strategic advantage. It allows us to:

    • Gradually adopt Kotlin: Teams can start by integrating Kotlin into existing Java codebases, allowing for a phased learning process without a complete overhaul.
    • Leverage the entire Java ecosystem: Kotlin developers can effortlessly use any Java library or framework, giving us access to the vast resources of the Java world.
    • Attract modern talent: Kotlin’s growing popularity can help us attract developers who are excited about working with a modern, yet stable, language on a proven platform.

Why Hesitate Between Java and Kotlin?

The decision of whether to primarily adopt Java or Kotlin (or a strategic mix) requires careful consideration of our team’s specific needs and the long-term vision:

  • Learning Curve: While Kotlin is designed to be approachable for Java developers, there is still a learning curve associated with its new syntax and features. We need to assess how quickly our large Cobol-experienced team can become proficient in Kotlin.
  • Team Preference and Buy-in: Understanding our developers’ preferences and ensuring buy-in for the chosen language is crucial for successful adoption.
  • Long-Term Ecosystem Evolution: While both Java and Kotlin have strong futures on the JVM, we need to consider the long-term trends and the level of investment in each language within the enterprise space.
  • Specific Use Cases: Certain parts of our system might benefit more from Kotlin’s conciseness or specific features, while other more established components might initially remain in Java.

Proposed Next Steps (Revised Focus):

  1. Targeted Proof of Concept (PoC) – Deep Dive into Java and Kotlin: Instead of a broad PoC including Go, let’s focus our initial efforts on a detailed comparison of Java (using Spring Boot) and Kotlin on representative use cases from our core business application. This PoC should specifically evaluate:
    • Developer Productivity: How quickly can teams with a Java-adjacent mindset (after initial training) develop and maintain code in both languages?
    • Code Readability and Maintainability: How do the resulting codebases compare in terms of clarity and ease of understanding for a large team?
    • Interoperability Scenarios: How seamlessly can Java and Kotlin code coexist and interact within the same project?
    • Performance Benchmarking: While the JVM provides a solid base, are there noticeable performance differences for our specific workloads?
  2. Comprehensive Training and Upskilling Program: We need to develop a detailed training program that caters to our team’s Cobol background and provides clear pathways for learning both Java and Kotlin. This program should include hands-on exercises and mentorship opportunities.
  3. Strategic Adoption Plan: Based on the PoC results and team feedback, we’ll develop a strategic adoption plan that outlines whether we’ll primarily focus on Java, Kotlin, or a hybrid approach. This plan should consider the long-term maintainability and talent acquisition goals.

While Go remains a valuable technology for specific niches, for the core of our mainframe rebuild, our focus should now be on leveraging the mature and evolving Java/Kotlin ecosystem and strategically determining the optimal path for our large and experienced team. This approach minimizes risk while embracing modern development practices on a proven platform.

PostHeaderIcon 🚀 Mastering Flyway Migrations with Maven

Managing database migrations can be tricky, but Flyway simplifies the process with versioned scripts for schema changes, data updates, and rollbacks.
Here’s a quick guide with useful examples:

✅ 1️⃣ Creating a Table (V1 – Initial migration)
“`
CREATE TABLE users (
id SERIAL PRIMARY KEY,
username VARCHAR(50)…,
email VARCHAR(100)…,
created_at TIMESTAMP…
);
“`
✅ 2️⃣ Inserting Sample Data (V2)
“`
INSERT INTO users (username, email) VALUES
(‘alice’, ‘alice@example.com‘),
(‘bob’, ‘bob@…’);
“`

✅ 3️⃣ Adding a New Column (V3 – Schema change)
“`
ALTER TABLE users ADD COLUMN last_login TIMESTAMP;
“`

✅ 4️⃣ Renaming a Column (V4)
“`
ALTER TABLE users RENAME COLUMN email TO contact;
“`

♻ Undo Script (U4__Rename_email_to_contact.sql)
“`
ALTER TABLE users RENAME COLUMN contact TO email;
“`

✅ 5️⃣ Deleting a Column (V5)
“`
ALTER TABLE users DROP COLUMN last_login;
“`

♻ Undo Script (U5__Revert_remove_last_login.sql)
“`
ALTER TABLE users ADD COLUMN last_login TIMESTAMP;
“`

✅ 6️⃣ Deleting Specific Data (V6)
“`
DELETE FROM users WHERE username = ‘alice’;
“`

♻ Undo Script (U6__Revert_delete_user.sql)
“`
INSERT INTO users (username, contact) VALUES (‘alice’, ‘alice@example.com‘);
“`

💡 Configuring Flyway in pom.xml
To integrate Flyway into your Spring Boot or Java project, add the following configuration in your `pom.xml`:
“`
<properties>
<flyway.version>11.4.1</flyway.version>
</properties>

<dependencies>
<!– Flyway Core –>
<dependency>
<groupId>org.flywaydb</groupId>
<artifactId>flyway-core</artifactId>
<version>${flyway.version}</version>
</dependency>

<!– Database Driver (Example: PostgreSQL) –>
<dependency>
“org.postgresql:postgresql:runtime”
</dependency>
</dependencies>

<build>
<plugins>
<plugin>
<groupId>org.flywaydb</groupId>
<artifactId>flyway-maven-plugin</artifactId>
<version>${flyway.version}</version>
<configuration>
<url>jdbc:postgresql://localhost:5432/mydb</url>
<user>myuser</user>
<password>mypassword</password>
<schemas>public</schemas>
<locations>filesystem:src/main/resources/db/migration</locations>
</configuration>
</plugin>
</plugins>
</build>
“`

📂 Migration scripts should be placed in:
`src/main/resources/db/migration/`
Example:
“`
V1__Create_users_table.sql
V2__Insert_sample_data.sql
V3__Add_last_login_column.sql
“`

💡 Flyway Maven Plugin Commands
👉Apply migrations:
“`mvn flyway:migrate“`
👉Undo the last migration (if `flyway.licenseKey` is provided):
“`mvn flyway:undo“`
👉Check migration status:
“`mvn flyway:info“`
👉Repair migration history:
“`mvn flyway:repair“`
Clean database (⚠Deletes all tables!):
“`mvn flyway:clean“`

PostHeaderIcon 🚀 Making Spring AOP Work with Struts 2: A Powerful Combination! 🚀

Spring AOP (Aspect-Oriented Programming) and Struts 2 might seem like an unusual pairing, but when configured correctly, they can bring cleaner, more modular, and reusable code to your Struts-based applications.

The Challenge:

  • Struts 2 manages its own action instances for each request, while Spring’s AOP relies on proxying beans managed by the Spring container. This means Struts actions are not Spring beans by default, making AOP trickier to apply.
  • The Solution: Making Struts 2 Actions Spring-Managed
  • To make Spring AOP work with Struts 2, follow these steps:


✅ Step 1: Enable Spring integration with Struts 2


Ensure your `struts.xml` is configured to use Spring:

“`<constant name=”struts.objectFactory” value=”spring”/>“`

This makes Struts retrieve action instances from the Spring context instead of creating them directly.

✅ Step 2: Define Actions as Spring Beans


In your applicationContext.xml or equivalent Spring configuration, define your Struts actions:

“`
<bean id=”myAction” class=”com.example.MyStrutsAction” scope=”prototype”/>
“`
Setting the scope to “prototype” ensures a new instance per request, preserving Struts 2 behavior.


✅ Step 3: Apply AOP with `@Aspect`

Now, you can apply Spring AOP to your Struts actions just like any other Spring-managed bean:

“`
@Aspect
@Component
public class LoggingAspect {

@Before(“execution(* com.example.MyStrutsAction.execute(..))”)
public void logBefore(JoinPoint joinPoint) {
System.out.println(“Executing: ” + joinPoint.getSignature().toShortString());
}
}
“`

This will log method executions before any `execute()` method in your actions runs!

Key Benefits of This Approach
🔹 Separation of Concerns – Keep logging, security, and transaction management outside your action classes.
🔹 Reusability – Apply cross-cutting concerns like security or caching without modifying Struts actions.
🔹 Spring’s Full Power – Leverage dependency injection and other Spring features within your Struts 2 actions.
🔥 By integrating Spring AOP with Struts 2, you get the best of both worlds: Struts’ flexible request handling and Spring’s powerful aspect-oriented capabilities. Ready to make your legacy Struts 2 app cleaner and more maintainable? Let’s discuss!

PostHeaderIcon SpringBatch: How to have different schedules, per environment, for instance: keep the fixedDelay=60000 in prod, but schedule with a Cron expression in local dev?

Case

In SpringBatch, a batch is scheduled in a bean JobScheduler with

[java]
@Scheduled(fixedDelay = 60000)
void doSomeThing(){…}
[/java]

.
How to have different schedules, per environment, for instance: keep the fixedDelay=60000 in prod, but schedule with a Cron expression in local dev?

Solution

Add this block to the <JobScheduler:

[java]
@Value("${jobScheduler.scheduling.enabled:true}")
private boolean schedulingEnabled;

@Value("${jobScheduler.scheduling.type:fixedDelay}")
private String scheduleType;

@Value("${jobScheduler.scheduling.fixedDelay:60000}")
private long fixedDelay;

@Value("${jobScheduler.scheduling.initialDelay:0}")
private long initialDelay;

@Value("${jobScheduler.scheduling.cron:}")
private String cronExpression;

@Scheduled(fixedDelayString = "${jobScheduler.scheduling.fixedDelay:60000}", initialDelayString = "${jobScheduler.scheduling.initialDelay:0}")
@ConditionalOnProperty(name = "jobScheduler.scheduling.type", havingValue = "fixedDelay")
public void scheduleFixedDelay() throws Exception {
if ("fixedDelay".equals(scheduleType) || "initialDelayFixedDelay".equals(scheduleType)) {
doSomething();
}
}

@Scheduled(cron = "${jobScheduler.scheduling.cron:0 0 1 * * ?}")
@ConditionalOnProperty(name = "jobScheduler.scheduling.type", havingValue = "cron", matchIfMissing = false)
public void scheduleCron() throws Exception {
if ("cron".equals(scheduleType)) {
doSomething(); }
}
[/java]

In application.yml, add:

[xml]
jobScheduler:
# noinspection GrazieInspection
scheduling:
enabled: true
type: fixedDelay
fixedDelay: 60000
initialDelay: 0
cron: 0 0 1 31 2 ? # every 31st of February… which means: never
[/xml]

(note the cron expression: leaving it empty may prevent SpringBoot from starting)

In application.yml, add:

[xml]
jobScheduler:
# noinspection GrazieInspection
scheduling:
type: cron
cron: 0 0 1 * * ?
[/xml]

It should work now ;-).

PostHeaderIcon SnowFlake❄: Why does SUM() return NULL instead of 0?

🙋‍♀️🙋‍♂️ Question

I’m running a SUM() query on a Snowflake table, but instead of returning 0 when there are no matching rows, it returns NULL.

For example, I have this table:

CREATE OR REPLACE TABLE sales (
    region STRING,
    amount NUMBER
);

INSERT INTO sales VALUES 
    ('North', 100),
    ('South', 200),
    ('North', NULL),
    ('East', 150);

Now, I run the following query to sum the sales for a region that doesn’t exist:

SELECT SUM(amount) FROM sales WHERE region = 'West';
  • Expected output: 0
  • Actual output: NULL

Why is this happening, and how can I make Snowflake return 0 instead of NULL?

Answer

 First and basic approach: explicitly filter out NULL before aggregation:
SELECT SUM(CASE WHEN amount IS NOT NULL THEN amount ELSE 0 END) AS total_sales 
FROM sales 
WHERE region = 'West'; 

This method ensures that NULL does not interfere with the SUM calculation.

✅ Even better: Use COALESCE() to handle NULL.

By default, SUM() returns NULL if there are no rows that match the condition or if all matching rows contain NULL.

🔹 To return 0 instead of NULL, use COALESCE(), which replaces NULL with a default value:

SELECT COALESCE(SUM(amount), 0) AS total_sales 
FROM sales 
WHERE region = 'West'; 

🔹 This ensures that when SUM(amount) is NULL, it gets converted to 0.

 (copied to https://stackoverflow.com/questions/79524739/why-does-sum-return-null-instead-of-0 )

PostHeaderIcon A Tricky Java Question

Here’s a super tricky Java interview question that messes with developer intuition:

❓ Weird Question:

“What will be printed when executing the following code?”

import java.util.*;
public class TrickyJava {
 public static void main(String[] args) {
 List list = Arrays.asList("T-Rex", "Velociraptor", "Dilophosaurus");
 list.replaceAll(s -> s.toUpperCase());
 System.out.println(list);
 }
 }

The Trap:

At first glance, everything looks normal:

Arrays.asList(...) creates a List.
replaceAll(...) is a method in List that modifies elements using a function.
Strings are converted to uppercase.
Most developers will expect this output:

[T-REX, VELOCIRAPTOR, DILOPHOSAURUS]

But surprise! This code sometimes throws an UnsupportedOperationException.

 

✅ Correct Answer:

The output depends on the JVM implementation!

It might work and print:

[T-REX, VELOCIRAPTOR, DILOPHOSAURUS]

Or it might crash with:

Exception in thread "main" java.lang.UnsupportedOperationException
at java.util.AbstractList$Itr.remove(AbstractList.java:572)
at java.util.AbstractList.remove(AbstractList.java:212)
at java.util.AbstractList$ListItr.remove(AbstractList.java:582)
at java.util.List.replaceAll(List.java:500)

Why?

Arrays.asList(...) does not return a regular ArrayList, but rather a fixed-size list backed by an array.
The replaceAll(...) method attempts to modify the list in-place, which is not allowed for a fixed-size list.
Some JVM implementations optimize this internally, making it work, but it is not guaranteed to succeed.

Key Takeaways

Arrays.asList(...) returns a fixed-size list, not a modifiable ArrayList.
Modifying it directly (e.g., add(), remove(), replaceAll()) can fail with UnsupportedOperationException.
Behavior depends on the JVM implementation and internal optimizations.

How to Fix It?

To ensure safe modification, wrap the list in a mutable ArrayList:

List list = new ArrayList<>(Arrays.asList("T-Rex", "Velociraptor", "Dilophosaurus"));
list.replaceAll(s -> s.toUpperCase());
System.out.println(list); // ✅ Always works!

PostHeaderIcon [PHPForumParis 2024] Is WordPress a lost cause?

Cyrille Coquard, a seasoned WordPress developer, took the stage at PHPForumParis2024 to address a contentious question: Is WordPress a lost cause? With humor and insight, Cyrille tackles the platform’s reputation, often marred by perceptions of outdated code and technical debt. By drawing parallels between WordPress and PHP’s evolution, he argues that WordPress is undergoing a quiet transformation toward professionalism. His talk, aimed at PHP developers, demystifies WordPress’s architecture and advocates for modern development practices to elevate its potential.

The Shared Legacy of WordPress and PHP

Cyrille opens by highlighting the intertwined histories of WordPress and PHP, both born in an era of amateur-driven development. This shared origin, while fostering accessibility, has led to technical debt that tarnishes their reputations. He compares WordPress to a “Fiat or Clio”—a practical, accessible tool for the masses—contrasting it with frameworks like Symfony, likened to a high-end race car. This metaphor underscores WordPress’s mission to democratize web creation, prioritizing user-friendliness over developer-centric complexity. Cyrille emphasizes that the platform’s early design choices, while not always optimal, reflect its commitment to simplicity and affordability.

Plugins and Themes: Extending WordPress’s Power

A core strength of WordPress lies in its extensibility through plugins and themes. Cyrille explains how themes allow for visual customization, enabling the 40% of websites powered by WordPress to look unique. Plugins, meanwhile, add functionality or modify behavior, addressing both generic and specific user needs. He illustrates this with examples like WooCommerce for e-commerce and Gravity Forms for form creation. By leveraging pre-existing plugins, developers can meet common requirements efficiently, reserving custom development for unique challenges. This approach, Cyrille notes, significantly reduces costs, as seen in his work at WP Rocket, where a single plugin optimizes performance across millions of sites.

Modernizing WordPress Development with Launchpad

To address WordPress’s development challenges, Cyrille introduces Launchpad, his open-source framework designed to bring modern PHP practices to the WordPress ecosystem. Inspired by Laravel and Symfony, Launchpad simplifies plugin creation by introducing concepts like subscribers and service providers. These patterns, familiar to PHP developers, reduce the learning curve for newcomers while promoting clean, maintainable code. Cyrille demonstrates how to create a simple plugin, emphasizing event-driven development that hooks into WordPress’s core functionality. By providing a standardized, well-documented framework, Launchpad aims to bridge the gap between WordPress’s amateur roots and professional standards.

Hashtags: #WordPress #PHP #WebDevelopment #Launchpad #PHPForumParis2024 #CyrilleCoquard #WPRocket

PostHeaderIcon [PyData Global 2024] Making Gaussian Processes Useful

Bill Engels and Chris Fonnesbeck, both brilliant software developers from PyMC Labs, delivered an insightful 90-minute tutorial at PyData Global 2024 titled “Making Gaussian Processes Useful.” Aimed at demystifying Gaussian processes (GPs) for practicing data scientists, their session bridged the gap between theoretical complexity and practical application. Using baseball analytics as a motivating example, Chris introduced Bayesian modeling and GPs, while Bill provided hands-on strategies for overcoming computational and identifiability challenges. This post explores their comprehensive approach, offering actionable insights for leveraging GPs in real-world scenarios.

Bayesian Inference and Probabilistic Programming

Chris kicked off the tutorial by grounding the audience in Bayesian inference, often implemented through probabilistic programming. He described it as writing software with partially random outputs, enabled by languages like PyMC that provide primitives for random variables. Unlike deterministic programming, probabilistic programming allows modeling distributions over variables, including functions via GPs. Chris explained that Bayesian inference involves specifying a joint probability model for data and parameters, using Bayes’ formula to derive the posterior distribution. This posterior reflects what we learn about unknown parameters after observing data, with the likelihood and priors as key components. The computational challenge lies in the normalizing constant, a multidimensional integral that probabilistic programming libraries handle numerically, freeing data scientists to focus on model specification.

Hierarchical Modeling with Baseball Data

To illustrate Bayesian modeling, Chris used the example of estimating home run probabilities for baseball players. He introduced a simple unpooled model where each player’s home run rate is modeled with a beta prior and a binomial likelihood, reflecting hits over plate appearances. Using PyMC, this model is straightforward to implement, with each line of code corresponding to a mathematical component. However, Chris highlighted its limitations: players with few at-bats yield highly uncertain estimates, leaning heavily on the flat prior. This led to the introduction of hierarchical modeling, or partial pooling, where individual home run rates are drawn from a population distribution with hyperparameters (mean and standard deviation). This approach shrinks extreme estimates, producing more realistic rates, as seen when comparing unpooled estimates (with outliers up to 80%) to pooled ones (clustered below 10%, aligning with real-world data like Barry Bonds’ 15% peak).

Gaussian Processes as a Hierarchical Extension

Chris transitioned to GPs, framing them as a generalization of hierarchical models for continuous predictors, such as player age affecting home run rates. Unlike categorical groups, GPs model relationships where similarity decreases with distance (e.g., younger players’ performance is more similar). A GP is a distribution over functions, parameterized by a mean function (often zero) and a covariance function, which defines how outputs covary based on input proximity. Chris emphasized two key properties of multivariate Gaussians—easy marginalization and conditioning—that make GPs computationally tractable despite their infinite dimensionality. By evaluating a covariance function at specific inputs, a GP yields a finite multivariate normal, enabling flexible, nonlinear modeling without explicitly parameterizing the function’s form.

Computational Challenges and the HSGP Approximation

One of the biggest hurdles with GPs is their computational cost, particularly for latent GPs used with non-Gaussian data like binomial home run counts. Chris explained that the posterior covariance function requires inverting a matrix, which scales cubically with the number of data points (e.g., thousands of players). This makes exact GPs infeasible for large datasets. To address this, he introduced the Hilbert Space Gaussian Process (HSGP) approximation, which reduces cubic compute time to linear by approximating the GP with a finite set of basis functions. These functions depend on the data, while coefficients rely on hyperparameters like length scale and amplitude. Chris demonstrated implementing an HSGP in PyMC to model age effects, specifying 100 basis functions and a boundary three times the data range, resulting in a model that ran in minutes rather than years.

Practical Debugging with GPs

Bill took over to provide practical tips for fitting GPs, emphasizing their sensitivity to priors and the need for debugging. He revisited the baseball example, modeling batting averages with a hierarchical model before introducing a GP to account for age effects. Bill showed that a standard hierarchical model treats players as exchangeable, pooling information equally across all players. A GP, however, allows local pooling, where players of similar ages inform each other more strongly. He introduced the exponentiated quadratic covariance function, which uses a length scale to define “closeness” in age and a scale parameter for effect size. Bill highlighted common pitfalls, such as small length scales reducing a GP to a standard hierarchical model or large length scales causing identifiability issues with intercepts, and provided solutions like informative priors (e.g., inverse gamma, log-normal) to constrain length scales to realistic ranges.

Advanced GP Modeling for Slugging Percentage

Bill concluded with a sophisticated model for slugging percentage, a metric reflecting hitting power, using 10 years of baseball data. The model included player, park, and season effects, with an HSGP to capture age effects. He initially used an exponentiated quadratic covariance function but encountered sampling issues (divergences), a common problem with GPs. Bill fixed this by switching to a Matern 5/2 covariance function, which assumes less smoothness and better suits real-world data, and adopting a centered parameterization for stronger age effects. These changes reduced divergences to near zero, producing a reliable model. The resulting age curve peaked at 26, aligning with baseball wisdom, and showed a decline for older players, demonstrating the GP’s ability to capture nonlinear trends.

Key Takeaways and Resources

Bill and Chris emphasized that GPs extend hierarchical models by enabling local pooling over continuous variables, but their computational and identifiability challenges require careful handling. Informative priors, appropriate covariance functions (e.g., Matern over exponential quadratic), and approximations like HSGP are critical for practical use. They encouraged using PyMC for its high-level interface and the Nutpie sampler for efficiency, while noting alternatives like GPFlow for specialized needs. Their GitHub repository, linked below, includes slides and notebooks for further exploration, making this tutorial a valuable resource for data scientists aiming to apply GPs effectively.

Links:

 

PostHeaderIcon [Scala IO Paris 2024] Escaping the False Dichotomy: Sanely Automatic Derivation in Scala

In the ScalaIO Paris 2024 session “Slow Auto, Inconvenient Semi: Escaping False Dichotomy with Sanely Automatic Derivation,” Mateusz Kubuszok delivered a user-focused exploration of typeclass derivation in Scala. With over nine years of Scala experience and as a co-maintainer of the Chimney library, Mateusz examined the trade-offs between automatic and semi-automatic derivation, proposing a “sanely automatic” approach that balances usability, compile-time performance, and runtime efficiency. Using JSON serialization libraries like Circe and Jsoniter-Scala as case studies, the talk highlighted how library design choices impact users and offered a practical solution to common derivation pain points, supported by benchmarks and a public repository.

Understanding Typeclasses and Derivation

Mateusz began by demystifying typeclasses, describing them as parameterized interfaces (e.g., an Encoder[T] for JSON serialization) whose implementations are provided by the compiler via implicits or givens. Typeclass derivation automates creating these implementations for complex types like case classes or sealed traits by combining implementations for their fields or subtypes. For example, encoding a User(name: String, address: Address) requires encoders for String and Address, which the compiler recursively resolves.

Derivation comes in two flavors: automatic and semi-automatic. Automatic derivation, popularized by Circe, uses implicits to generate implementations on-demand, but can lead to slow compilation and runtime performance issues. Semi-automatic derivation requires explicit calls (e.g., deriveEncoder[Address]) to define implicits, ensuring consistency but adding manual overhead. Mateusz emphasized that these choices, made by library authors, significantly affect users through compilation times, error messages, and performance.

The Pain Points of Automatic and Semi-Automatic Derivation

Automatic derivation in Circe recursively generates encoders for nested types, checking for existing implicits before falling back to derivation. This can cause circular dependencies or stack overflows if not managed carefully. Semi-automatic derivation avoids this by always generating new instances, but requires users to define implicits for every intermediate type, increasing code verbosity. Mateusz shared anecdotes of developers banning automatic derivation due to compilation times ballooning to 10 minutes for complex JSON hierarchies.

Benchmarks on a nested case class (five levels deep) revealed stark differences. On Scala 2.13, Circe’s automatic derivation took 14 seconds to compile a single file, versus 12 seconds for semi-automatic. On Scala 3, automatic derivation soared to 46 seconds (cold JVM) or 16 seconds (warm), while semi-automatic took 10 seconds or 1 second, respectively. Runtime performance also suffered: automatic derivation on Scala 3 was up to 10 times slower than semi-automatic, likely due to large inlined methods overwhelming JVM optimization.

Comparing Libraries: Circe vs. Jsoniter-Scala

Mateusz contrasted Circe with Jsoniter-Scala, which prioritizes performance. Jsoniter-Scala uses a single macro to generate recursive codec implementations, avoiding intermediate implicits except for custom overrides. This reduces memory allocation and compilation overhead. Benchmarks showed Jsoniter-Scala compiling faster than Circe (e.g., 2–3 times faster on Scala 3) and running three times faster, even when Circe was given a head start by testing on JSON representations rather than strings.

Jsoniter-Scala’s approach minimizes implicit searches, embedding logic in macros instead of relying on the compiler’s typechecker. For example, encoding a User with an Address field involves one codec handling all nested types, unlike Circe’s recursive implicit resolution. This results in fewer allocations and better error messages, as macros can pinpoint failure causes (e.g., a missing implicit for a specific field).

Sanely Automatic Derivation: A New Approach

Inspired by Jsoniter-Scala, Mateusz proposed “sanely automatic derivation” to combine automatic derivation’s convenience with semi-automatic’s performance. Using his Chimney library as a testbed, he split typeclasses into two traits: one for user-facing APIs and another for automatic derivation, invisible to implicit searches to avoid circular dependencies. This allows recursive derivation with minimal implicits, using macros to handle nested types efficiently.

Mateusz implemented this for a Jsoniter-Scala wrapper on Scala 3, achieving compilation times comparable to Jsoniter-Scala’s single-implicit approach and faster than Circe’s semi-automatic derivation. Runtime performance matched Jsoniter-Scala’s, with negligible overhead. Error messages were improved by logging macro actions (e.g., which field caused a failure) via Scala’s macro settings, viewable in IDEs like VS Code without console output.

A Fair Comparison: Custom Typeclass Benchmark

To ensure fairness, Mateusz created a custom Show typeclass (similar to Circe’s ShowPretty) for pretty-printing case classes using StringBuilder. He implemented it with Shapeless (Scala 2), Mirrors (Scala 3), Magnolia (both), and his sanely automatic approach. Initial results showed his approach outperforming Shapeless and Mirrors but trailing Magnolia’s semi-automatic derivation. By adding caching within macros (reusing derived implementations for repeated types), his approach became the fastest across all platforms, compiling in less time than Shapeless, Mirrors, or Magnolia, with better runtime performance.

This caching, inspired by Jsoniter-Scala, avoids re-deriving the same type multiple times within a macro, reducing method size and enabling JVM optimization. The change required minimal code, demonstrating that library authors can adopt this approach with a single, non-invasive pull request.

Implications for Scala’s Future

Mateusz concluded by addressing Scala’s reputation for slow compilation, citing a Virtus Lab survey where 53% of developers complained about compile times, often tied to typeclass derivation. Libraries like Shapeless and Magnolia prioritize developer convenience over user experience, leading to opaque errors and performance issues. His sanely automatic derivation offers a user-friendly alternative: one import, fast compilation, efficient runtime, and debuggable errors.

By sharing his Chimney Macro Commons library, Mateusz encourages library authors to rethink derivation strategies. While macros require more maintenance than Shapeless or Magnolia, they become viable as libraries scale and prioritize user needs. He urged developers to provide feedback to library maintainers, challenging assumptions that automatic and semi-automatic are the only options, to make Scala more accessible and production-ready.

Hashtags: #Scala #TypeclassDerivation #ScalaIOParis2024 #Circe #JsoniterScala #Chimney #Performance #Macros #Scala3

PostHeaderIcon [DevoxxBE2024] How JavaScript Happened: A Short History of Programming Languages

In an engaging session at Devoxx Belgium 2024, Mark Rendle traced the evolution of programming languages leading to JavaScript’s creation in 1995. Titled “How JavaScript Happened: A Short History of Programming Languages,” the talk blended humor and history, from Ada Lovelace’s 1840s program to JavaScript’s rapid development for Netscape Navigator 2.0. Despite a brief battery scare during the presentation, Rendle’s storytelling and FizzBuzz examples across languages captivated the audience, offering insights into language design and JavaScript’s eclectic origins.

The Dawn of Programming

Rendle began in the 1830s with Ada Lovelace, who wrote the first program for Charles Babbage’s unbuilt Analytical Engine, introducing programming notation 120 years before computers existed. The 1940s saw programmable machines like Colossus, built to crack German ciphers, and ENIAC, programmed by women who deciphered its operation without manuals. These early systems, configured via patch cables, laid the groundwork for modern computing, though programming remained labor-intensive.

The Rise of High-Level Languages

The 1950s marked a shift with Fortran, created by John Backus to simplify machine code translation for IBM’s 701 mainframe. Fortran introduced if statements, the asterisk for multiplication (due to punch card limitations), and the iterator variable i, still ubiquitous today. ALGOL 58 and 60 followed, bringing block structures, if-then-else, and BNF grammar, formalized by Backus. Lisp, developed by John McCarthy, introduced first-class functions, the heap, and early garbage collection, while Simula pioneered object-oriented programming with classes and inheritance.

From APL to C and Beyond

Rendle highlighted APL’s concise syntax, enabled by its unique keyboard and dynamic typing, influencing JavaScript’s flexibility. The 1960s and 70s saw BCPL, B, and C, with C introducing curly braces, truthiness, and the iconic “hello world” program. Smalltalk added reflection, virtual machines, and the console, while ML introduced functional programming concepts like arrow functions. Scheme, a simplified Lisp, directly influenced JavaScript’s initial design as a browser scripting language, shaped to compete with Java applets.

JavaScript’s Hasty Creation

In 1995, Brendan Eich created JavaScript in ten days for Netscape Navigator 2.0, initially as a Scheme-like language with a DOM interface. To counter Java applets, it adopted a C-like syntax and prototypal inheritance (inspired by Self), as classical inheritance wasn’t feasible in Scheme. Rendle humorously speculated on advising Eich to add static typing and classical inheritance, noting JavaScript’s roots in Fortran, ALGOL, Lisp, and others. Despite its rushed origins, JavaScript inherited a rich legacy, from Fortran’s syntax to Smalltalk’s object model.

The Legacy and Future of JavaScript

Rendle concluded by reflecting on JavaScript’s dominance, driven by its browser integration, and its ongoing evolution, with features like async/await (from C#) and proposed gradual typing. He dismissed languages like COBOL and Pascal for lacking influential contributions, crediting BASIC for inspiring programmers despite adding little to language design. JavaScript, a synthesis of 70 years of innovation, continues to evolve, shaped by decisions from 1955 to today, proving no language is immune to historical influence.

Hashtags: #JavaScript #ProgrammingHistory #MarkRendle #DevoxxBE2024