Recent Posts
Archives

Posts Tagged ‘CloudNative’

PostHeaderIcon [DevoxxFR2025] Boosting Java Application Startup Time: JVM and Framework Optimizations

In the world of modern application deployment, particularly in cloud-native and microservice architectures, fast startup time is a crucial factor impacting scalability, resilience, and cost efficiency. Slow-starting applications can delay deployments, hinder auto-scaling responsiveness, and consume resources unnecessarily. Olivier Bourgain, in his presentation, delved into strategies for significantly accelerating the startup time of Java applications, focusing on optimizations at both the Java Virtual Machine (JVM) level and within popular frameworks like Spring Boot. He explored techniques ranging from garbage collection tuning to leveraging emerging technologies like OpenJDK’s Project Leyden and Spring AOT (Ahead-of-Time Compilation) to make Java applications lighter, faster, and more efficient from the moment they start.

The Importance of Fast Startup

Olivier began by explaining why fast startup time matters in modern environments. In microservices architectures, applications are frequently started and stopped as part of scaling events, deployments, or rolling updates. A slow startup adds to the time it takes to scale up to handle increased load, potentially leading to performance degradation or service unavailability. In serverless or function-as-a-service environments, cold starts (the time it takes for an idle instance to become ready) are directly impacted by application startup time, affecting latency and user experience. Faster startup also improves developer productivity by reducing the waiting time during local development and testing cycles. Olivier emphasized that optimizing startup time is no longer just a minor optimization but a fundamental requirement for efficient cloud-native deployments.

JVM and Garbage Collection Optimizations

Optimizing the JVM configuration and understanding garbage collection behavior are foundational steps in improving Java application startup. Olivier discussed how different garbage collectors (like G1, Parallel, or ZGC) can impact startup time and memory usage. Tuning JVM arguments related to heap size, garbage collection pauses, and just-in-time (JIT) compilation tiers can influence how quickly the application becomes responsive. While JIT compilation is crucial for long-term performance, it can introduce startup overhead as the JVM analyzes and optimizes code during initial execution. Techniques like Class Data Sharing (CDS) were mentioned as a way to reduce startup time by sharing pre-processed class metadata between multiple JVM instances. Olivier provided practical tips and configurations for optimizing JVM settings specifically for faster startup, balancing it with overall application performance.

Framework Optimizations: Spring Boot and Beyond

Popular frameworks like Spring Boot, while providing immense productivity benefits, can sometimes contribute to longer startup times due to their extensive features and reliance on reflection and classpath scanning during initialization. Olivier explored strategies within the Spring ecosystem and other frameworks to mitigate this. He highlighted Spring AOT (Ahead-of-Time Compilation) as a transformative technology that analyzes the application at build time and generates optimized code and configuration, reducing the work the JVM needs to do at runtime. This can significantly decrease startup time and memory footprint, making Spring Boot applications more suitable for resource-constrained environments and serverless deployments. Project Leyden in OpenJDK, aiming to enable static images and further AOT compilation for Java, was also discussed as a future direction for improving startup performance at the language level. Olivier demonstrated how applying these framework-specific optimizations and leveraging AOT compilation can have a dramatic impact on the startup speed of Java applications, making them competitive with applications written in languages traditionally known for faster startup.

Links:

PostHeaderIcon [DevoxxGR2025] Email Re-Platforming Case Study

George Gkogkolis from Travelite Group shared a 15-minute case study at Devoxx Greece 2025 on re-platforming to process 1 million emails per hour.

The Challenge

Travelite Group, a global OTA handling flight tickets in 75 countries, processes 350,000 emails daily, expected to hit 2 million. Previously, a SaaS ticketing system struggled with growing traffic, poor licensing, and subpar user experience. Sharding the system led to complex agent logins and multiplexing issues with the booking engine. Market research revealed no viable alternatives, as vendors’ licensing models couldn’t handle the scale, prompting an in-house solution.

The New Platform

The team built a cloud-native, microservices-based platform within a year, going live in December 2024. It features a receiving app, a React-based web UI with Mantine Dev, a Spring Boot backend, and Amazon DocumentDB, integrated with Amazon SES and S3. Emails land in a Postfix server, are stored in S3, and processed via EventBridge and SQS. Data migration was critical, moving terabytes of EML files and databases in under two months, achieving a peak throughput of 1 million emails per hour by scaling to 50 receiver instances.

Lessons Learned

Starting with migration would have eased performance optimization, as synthetic data didn’t match production scale. Cloud-native deployment simplified scaling, and a backward-compatible API eased integration. Open standards (EML, Open API) ensured reliability. Future plans include AI and LLM enhancements by 2025, automating domain allocation for scalability.

Links

PostHeaderIcon [DotJs2024] The Future of Serverless is WebAssembly

Envision a computing paradigm where applications ignite with the swiftness of a spark, unbound by the sluggish boot times of traditional servers, and orchestrated in a symphony of polyglot harmony. David Flanagan, a seasoned software engineer and educator with a storied tenure at Fermyon Technologies, unveiled this vision at dotJS 2024, championing WebAssembly (Wasm) as the linchpin for next-generation serverless architectures. Drawing from his deep immersion in cloud-native ecosystems—from Kubernetes orchestration to edge computing—Flanagan demystified how Wasm’s component model, fortified by WASI Preview 2, ushers in nanosecond-scale invocations, seamless interoperability, and unprecedented portability. This isn’t mere theory; it’s a blueprint for crafting resilient microservices that scale effortlessly across diverse runtimes.

Flanagan’s discourse pivoted on relatability, eschewing abstract metrics for visceral analogies. To grasp nanoseconds’ potency—where a single tick equates to a second in a thrashing Metallica riff like “Master of Puppets”—he likened Wasm’s cold-start latency to everyday marvels. Traditional JavaScript functions, mired in milliseconds, mirror a leisurely coffee brew at Starbucks; Wasm, conversely, evokes an espresso shot from a high-end machine, frothy and instantaneous. Benchmarks underscore this: Spin, Fermyon’s runtime, clocks in at 200 nanoseconds versus AWS Lambda’s 100-500 milliseconds, a gulf vast enough to render prior serverless pains obsolete. Yet, beyond velocity lies versatility—Wasm’s binary format, agnostic to origin languages, enables Rust, Go, Zig, or TypeScript modules to converse fluidly via standardized interfaces, dismantling silos that once plagued polyglot teams.

At the core lies the WIT (WebAssembly Interface Types) component model, a contractual scaffold ensuring type-safe handoffs. Flanagan illustrated with a Spin-powered API: a Rust greeter module yields to a TypeScript processor, each oblivious to the other’s internals yet synchronized via WIT-defined payloads. This modularity extends to stateful persistence—key-value stores mirroring Redis or SQLite datastores—without tethering to vendor lock-in. Cron scheduling, WebSocket subscriptions, even LLM inferences via Hugging Face models, integrate natively; a mere TOML tweak provisions MQTT feeds or GPU-accelerated prompts, all sandboxed for ironclad isolation. Flanagan’s live sketches revealed Spin’s developer bliss: CLI scaffolds in seconds, hot-reloading for iterative bliss, and Fermyon Cloud’s edge deployment scaling to zero cost.

This tapestry of traits—rapidity, portability, composability—positions Wasm as serverless’s salvation. Flanagan evoked Drupal’s Wasm incarnation: a full CMS, sans server, piping content through browser-native execution. For edge warriors, it’s liberation; for monoliths, a migration path sans rewrite. As toolchains mature—Wazero for Go, Wasmer for universal hosting—the ecosystem beckons builders to reimagine distributed systems, where functions aren’t fleeting but foundational.

Nanosecond Precision in Practice

Flanagan anchored abstractions in benchmarks, equating Wasm’s 200ns starts to life’s micro-moments—a blink’s brevity amplified across billions of requests. Spin’s plumbing abstracts complexities: TOML configs summon Redis proxies or SQLite veins, yielding KV/SQL APIs that ORMs like Drizzle embrace. This precision cascades to AI: one-liner prompts leverage remote GPUs, democratizing inference without infrastructural toil.

Polyglot Harmony and Extensibility

WIT’s rigor ensures Rust’s safety meshes with Go’s concurrency, TypeScript’s ergonomics—all via declarative interfaces. Spin’s extensibility invites custom components; 200 Rust lines birth integrations, from Wy modules to templated hooks. Flanagan heralded Fermyon Cloud’s provisioning: edge-global, zero-scale, GPU-ready— a canvas for audacious architectures where Wasm weaves the warp and weft.

Links:

PostHeaderIcon [DevoxxGR2024] The Art of Debugging Inside K8s Environment at Devoxx Greece 2024 by Andrii Soldatenko

At Devoxx Greece 2024, Andrii Soldatenko, a seasoned software engineer and tech evangelist at Dynatrace, delivered an engaging presentation on mastering the art of debugging within Kubernetes (K8s) environments. With a blend of humor, practical insights, and real-world strategies, Andrii illuminated the complexities of troubleshooting cloud-native applications. Drawing from his extensive experience, he provided actionable techniques to enhance debugging efficiency, making the session a valuable resource for developers navigating the intricacies of Kubernetes. His talk emphasized proactive design, robust tooling, and a systematic approach to resolving issues in distributed systems.

The Challenges of Debugging in Kubernetes

Andrii began by acknowledging the inherent difficulties of debugging in modern cloud-native environments. Unlike traditional development, where a local debugger suffices, Kubernetes introduces layers of complexity with containers, pods, and distributed architectures. He humorously outlined his “eight stages of debugging,” from denial (“this can’t happen”) to self-realization (“I wrote this code”), resonating with developers who face similar emotional journeys. These stages underscore the psychological and technical hurdles of troubleshooting in K8s, where issues often stem from accidental complexities like misconfigured resources or network policies.

The dynamic nature of Kubernetes, with its orchestration of pods, nodes, and services, demands a shift in debugging mindset. Andrii emphasized that while writing YAML manifests for K8s is straightforward, ensuring they function as intended is not. He highlighted the absence of comprehensive debugging guides, noting that most literature focuses on deployment rather than troubleshooting. This gap inspired his talk, which aimed to equip developers with practical strategies to diagnose and resolve issues effectively.

Strategies for Effective Debugging

To tackle Kubernetes debugging, Andrii proposed a structured approach, starting with a high-level mind map for assessing pod states. For instance, a pod in a “Pending” state might indicate resource shortages or port conflicts, while a “Crashing” pod could signal health probe failures. He focused on scenarios where pods are running but behaving unexpectedly, a common yet challenging issue. Andrii advocated revisiting init containers, which perform setup tasks like data migrations. By temporarily replacing their commands with a sleep directive, developers can use kubectl exec to inspect the container’s state, checking volumes, permissions, or network access.

For containers lacking debugging tools, Andrii introduced ephemeral containers, a Kubernetes feature since version 1.8 designed for interactive troubleshooting. By launching an ephemeral container with tools like netcat or a debugger, developers can inspect a pod’s state without altering its primary container. He shared a practical example of debugging a Go application by sharing process namespaces, allowing access to the application’s processes. This approach enables setting breakpoints and navigating code, even in minimal, distroless containers.

Leveraging Tools for Enhanced Debugging

Andrii showcased several tools to streamline Kubernetes debugging. He recommended building custom debug containers tailored to specific needs, such as including sqlite, python, or network utilities, and shared his own debug container on GitHub. For network-related issues, he highlighted a pre-existing container with tools like tcpdump, which simplifies packet inspection without requiring manual installations. Andrii also praised Stern, a CLI tool for tailing logs across multiple pods in a replica set, making it easier to trace requests and identify exceptions.

For developers using Visual Studio Code, Andrii demonstrated remote debugging by configuring a launch.json file to connect to a Kubernetes pod. By exposing a debug port and using tools like Telepresence, developers can intercept cluster traffic and test changes locally, bypassing slow CI/CD cycles. He also highlighted K9s, a terminal-based UI for Kubernetes, with a custom plugin for initiating debug sessions via kubectl debug. These tools collectively enhance efficiency, allowing developers to focus on problem-solving rather than manual configuration.

Best Practices for Proactive Debugging

Andrii concluded with actionable best practices to prevent and address debugging challenges. He stressed embedding version information, like Git commit SHAs, into container images to synchronize codebases during remote debugging. Scaling down traffic to a single pod ensures consistent debugging sessions, avoiding request distribution across replicas. He also advocated for a blameless culture, where developers use debuggers to slow down and analyze issues methodically rather than rushing to fix symptoms.

By sharing his GitHub repository and additional resources, Andrii encouraged attendees to experiment with these techniques. His talk was a compelling call to action for developers to embrace robust debugging practices, ensuring resilience and reliability in Kubernetes environments. Through practical demonstrations and a lighthearted approach, he demystified the complexities of cloud-native debugging, empowering developers to tackle issues with confidence.

Links:

PostHeaderIcon [DevoxxGR2024] Devoxx Greece 2024 Sustainability Chronicles: Innovate Through Green Technology With Kepler and KEDA

At Devoxx Greece 2024, Katie Gamanji, a senior field engineer at Apple and a technical oversight committee member for the Cloud Native Computing Foundation (CNCF), delivered a compelling presentation on advancing environmental sustainability within the cloud-native ecosystem. With Kubernetes celebrating its tenth anniversary, Katie emphasized the urgent need for technologists to integrate green practices into their infrastructure strategies. Her talk explored how tools like Kepler and KEDA’s carbon-aware operator enable practitioners to measure and mitigate carbon emissions, while fostering a vibrant, inclusive community to drive these efforts forward. Drawing from her extensive experience and leadership in the CNCF, Katie provided a roadmap for aligning technological innovation with climate responsibility.

The Imperative of Cloud Sustainability

Katie began by underscoring the critical role of sustainability in the tech sector, particularly given the industry’s contribution to global greenhouse gas emissions. She highlighted that the tech sector accounts for 1.4% of global emissions, a figure that could soar to 10% within a decade without intervention. However, by leveraging renewable energy, emissions could be reduced by up to 80%. International agreements like COP21 and the United Nations’ Sustainable Development Goals (SDGs) have spurred national regulations, compelling organizations to assess and report their carbon footprints. Major cloud providers, such as Google Cloud Platform (GCP), have set ambitious net-zero targets, with GCP already operating on renewable energy since 2022. Yet, Katie stressed that sustainability cannot be outsourced solely to cloud providers; organizations must embed these principles internally.

The emergence of “GreenOps,” inspired by FinOps, encapsulates the processes, tools, and cultural shifts needed to achieve digital sustainability. By optimizing infrastructure—through strategies like using spot instances or serverless architectures—organizations can reduce both costs and emissions. Katie introduced a four-phase strategy proposed by the FinOps Foundation’s Environmental Sustainability Working Group: awareness, discovery, roadmap, and execution. This framework encourages organizations to educate stakeholders, benchmark emissions, implement automated tools, and iteratively pursue ambitious sustainability goals.

Measuring Emissions with Kepler

To address emissions within Kubernetes clusters, Katie introduced Kepler, a CNCF sandbox project developed by Red Hat and IBM. Kepler, a Kubernetes Efficient Power Level Exporter, utilizes eBPF to probe system statistics and export power consumption metrics to Prometheus for visualization in tools like Grafana. Deployed as a daemon set, Kepler collects node- and container-level metrics, focusing on power usage and resource utilization. By tracing CPU performance counters and Linux kernel trace points, it calculates energy consumption in joules, converting this to kilowatt-hours and multiplying by region-specific emission factors for gases like coal, petroleum, and natural gas.

Katie demonstrated Kepler’s practical application using a Grafana dashboard, which displayed emissions per gas and allowed granular analysis by container, day, or namespace. This visibility enables organizations to identify high-emission components, such as during traffic spikes, and optimize accordingly. As a sandbox project, Kepler is gaining momentum, and Katie encouraged attendees to explore it, provide feedback, or contribute to its development, reinforcing its potential to establish a baseline for carbon accounting in cloud-native environments.

Scaling Sustainably with KEDA’s Carbon-Aware Operator

Complementing Kepler’s observational capabilities, Katie introduced KEDA (Kubernetes Event-Driven Autoscaler), a graduated CNCF project, and its carbon-aware operator. KEDA, created by Microsoft and Red Hat, scales applications based on external events, offering a rich catalog of triggers. The carbon-aware operator optimizes emissions by scaling applications according to carbon intensity—grams of CO2 equivalent emitted per kilowatt-hour consumed. In scenarios where infrastructure is powered by renewable sources like solar or wind, carbon intensity approaches zero, allowing for maximum application replicas. Conversely, high carbon intensity, such as from coal-based energy, prompts scaling down to minimize emissions.

Katie illustrated this with a custom resource definition (CRD) that configures scaling behavior based on carbon intensity forecasts from providers like WattTime or Electricity Maps. In her demo, a Grafana dashboard showed an application scaling from 15 replicas at a carbon intensity of 530 to a single replica at 580, dynamically responding to grid data. This proactive approach ensures sustainability is embedded in scheduling decisions, aligning resource usage with environmental impact.

Nurturing a Sustainable Community

Beyond technology, Katie emphasized the pivotal role of the Kubernetes community in driving sustainability. Operating on principles of inclusivity, open governance, and transparency, the community fosters innovation through technical advisory groups (TAGs) focused on domains like observability, security, and environmental sustainability. The TAG Environmental Sustainability, established just over a year ago, aims to benchmark emissions across graduated CNCF projects, raising awareness and encouraging greener practices.

To sustain this momentum, Katie highlighted the need for education and upskilling. Resources like the Kubernetes and Cloud Native Associate (KCNA) certification and her own Cloud Native Fundamentals course on Udacity lower entry barriers for newcomers. By diversifying technical and governing boards, the community can continue to evolve, ensuring it scales alongside technological advancements. Katie’s vision is a cloud-native ecosystem where innovation and sustainability coexist, supported by a nurturing, inclusive community.

Conclusion

Katie Gamanji’s presentation at Devoxx Greece 2024 was a clarion call for technologists to prioritize environmental sustainability. By leveraging tools like Kepler and KEDA’s carbon-aware operator, practitioners can measure and mitigate emissions within Kubernetes clusters, aligning infrastructure with climate goals. Equally important is the community’s role in fostering education, inclusivity, and collaboration to sustain these efforts. Katie’s insights, grounded in her leadership at Apple and the CNCF, offer a blueprint for innovating through green technology while building a resilient, forward-thinking ecosystem.

Links:

PostHeaderIcon [DevoxxBE2023] Securing the Supply Chain for Your Java Applications by Thomas Vitale

At Devoxx Belgium 2023, Thomas Vitale, a software engineer and architect at Systematic, delivered an authoritative session on securing the software supply chain for Java applications. As the author of Cloud Native Spring in Action and a passionate advocate for cloud-native technologies, Thomas provided a comprehensive exploration of securing every stage of the software lifecycle, from source code to deployment. Drawing on the SLSA framework and CNCF research, he demonstrated practical techniques for ensuring integrity, authenticity, and resilience using open-source tools like Gradle, Sigstore, and Kyverno. Through a blend of theoretical insights and live demonstrations, Thomas illuminated the critical importance of supply chain security in today’s threat landscape.

Safeguarding Source Code with Git Signatures

Thomas began by defining the software supply chain as the end-to-end process of delivering software, encompassing code, dependencies, tools, practices, and people. He emphasized the risks at each stage, starting with source code. Using Git as an example, Thomas highlighted its audit trail capabilities but cautioned that commit authorship can be manipulated. In a live demo, he showed how he could impersonate a colleague by altering Git’s username and email, underscoring the need for signed commits. By enforcing signed commits with GPG or SSH keys—or preferably a keyless approach via GitHub’s single sign-on—developers can ensure commit authenticity, establishing a verifiable provenance trail critical for supply chain security.

Managing Dependencies with Software Bills of Materials (SBOMs)

Moving to dependencies, Thomas stressed the importance of knowing exactly what libraries are included in a project, especially given vulnerabilities like Log4j. He introduced Software Bills of Materials (SBOMs) as a standardized inventory of software components, akin to a list of ingredients. Using the CycloneDX plugin for Gradle, Thomas demonstrated generating an SBOM during the build process, which provides precise dependency details, including versions, licenses, and hashes for integrity verification. This approach, integrated into Maven or Gradle, ensures accuracy over post-build scanning tools like Snyk, enabling developers to identify vulnerabilities, check license compliance, and verify component integrity before production.

Thomas further showcased Dependency-Track, an OWASP project, to analyze SBOMs and flag vulnerabilities, such as a critical issue in SnakeYAML. He introduced the Vulnerability Exploitability Exchange (VEX) standard, which complements SBOMs by documenting whether vulnerabilities affect an application. In his demo, Thomas marked a SnakeYAML vulnerability as a false positive due to Spring Boot’s safe deserialization, demonstrating how VEX communicates security decisions to stakeholders, reducing unnecessary alerts and ensuring compliance with emerging regulations.

Building Secure Artifacts with Reproducible Builds

The build phase, Thomas explained, is another critical juncture for security. Using Spring Boot as an example, he outlined three packaging methods: JAR files, native executables, and container images. He critiqued Dockerfiles for introducing non-determinism and maintenance overhead, advocating for Cloud Native Buildpacks as a reproducible, secure alternative. In a demo, Thomas built a container image with Buildpacks, highlighting its fixed creation timestamp (January 1, 1980) to ensure identical outputs for unchanged inputs, enhancing security by eliminating variability. This reproducibility, coupled with SBOM generation during the build, ensures artifacts are both secure and traceable.

Signing and Verifying Artifacts with SLSA

To ensure artifact integrity, Thomas introduced the SLSA framework, which provides guidelines for securing software artifacts across the supply chain. He demonstrated signing container images with Sigstore’s Cosign tool, using a keyless approach to avoid managing private keys. This process, integrated into a GitHub Actions pipeline, ensures that artifacts are authentically linked to their creator. Thomas further showcased SLSA’s provenance generation, which documents the artifact’s origin, including the Git commit hash and build steps. By achieving SLSA Level 3, his pipeline provided non-falsifiable provenance, ensuring traceability from source code to deployment.

Securing Deployments with Policy Enforcement

The final stage, deployment, requires validating artifacts to ensure they meet security standards. Thomas demonstrated using Cosign and the SLSA Verifier to validate signatures and provenance, ensuring only trusted artifacts are deployed. On Kubernetes, he introduced Kyverno, a policy engine that enforces signature and provenance checks, automatically rejecting non-compliant deployments. This approach ensures that production environments remain secure, aligning with the principle of validating metadata to prevent unauthorized or tampered artifacts from running.

Conclusion: A Holistic Approach to Supply Chain Security

Thomas’s session at Devoxx Belgium 2023 provided a robust framework for securing Java application supply chains. By addressing source code integrity, dependency management, build reproducibility, artifact signing, and deployment validation, he offered a comprehensive strategy to mitigate risks. His practical demonstrations, grounded in open-source tools and standards like SLSA and VEX, empowered developers to adopt these practices without overwhelming complexity. Thomas’s emphasis on asking “why” at each step encouraged attendees to tailor security measures to their context, ensuring both compliance and resilience in an increasingly regulated landscape.

Links:

PostHeaderIcon [DevoxxPL2022] How We Migrate Customers and Internal Teams to Kubernetes • Piotr Bochyński

At Devoxx Poland 2022, Piotr Bochyński, a seasoned cloud native expert at SAP, shared a compelling narrative on transitioning customers and internal teams from a Cloud Foundry-based platform to Kubernetes. His presentation illuminated the strategic imperatives, technical challenges, and practical solutions that defined SAP’s journey toward a multi-cloud Kubernetes ecosystem. By leveraging open-source projects like Kyma and Gardener, Piotr’s team addressed the limitations of their legacy platform, fostering developer productivity and operational scalability. His insights offer valuable lessons for organizations contemplating a similar migration.

Understanding Platform as a Service

Piotr began by contextualizing Platform as a Service (PaaS), a model that abstracts infrastructure complexities, allowing developers to focus on application development. Unlike Infrastructure as a Service (IaaS), which provides raw virtual machines, PaaS delivers managed runtimes, middleware, and automation, accelerating time-to-market. However, this convenience comes with trade-offs, such as reduced control and potential vendor lock-in, often tied to opinionated frameworks like the 12-factor application methodology. Piotr highlighted SAP’s initial adoption of Cloud Foundry, an open-source PaaS, to avoid vendor dependency while meeting multi-cloud requirements driven by legal and business needs, particularly in sectors like banking. Yet, Cloud Foundry’s constraints, such as single HTTP port exposure and reliance on outdated technologies like BOSH, prompted SAP to explore Kubernetes as a more flexible alternative.

Kubernetes: A Platform for Platforms

Kubernetes, as Piotr elucidated, is not a traditional PaaS but a container orchestration framework that serves as a foundation for building custom platforms. Its declarative API and extensibility distinguish it from predecessors, enabling consistent management of diverse resources like deployments, namespaces, and custom objects. Piotr illustrated this with the thermostat analogy: developers declare a desired state (e.g., 22 degrees), and Kubernetes controllers reconcile the actual state to match it. This pattern, applied uniformly across resources, empowers developers to extend Kubernetes with custom controllers, such as a hypothetical thermostat resource. The Kyma project, an open-source initiative led by SAP, builds on this extensibility, providing opinionated building blocks like Istio-based API gateways, NATS eventing, and serverless functions to bridge the gap between raw Kubernetes and a developer-friendly PaaS.

Overcoming Migration Challenges

The migration to Kubernetes presented multifaceted challenges, from technical complexity to cultural adoption. Piotr emphasized the steep learning curve associated with Kubernetes’ vast resource set, compounded by additional components like Prometheus and Istio. To mitigate this, SAP employed Kyma to abstract complexities, offering simplified resources like API rules that encapsulate Istio configurations for secure service exposure. Another hurdle was ensuring multi-cloud compatibility. SAP’s Gardener project, a managed Kubernetes solution, addressed this by providing a consistent, Kubernetes-compliant layer across providers like AWS, Azure, and Google Cloud. Piotr also discussed operational scalability, managing thousands of clusters for hundreds of teams. By applying the Kubernetes controller pattern, SAP automated cluster provisioning, upgrades, and security patching, reducing manual intervention and ensuring reliability.

Lessons from the Journey

Reflecting on the migration, Piotr candidly shared missteps that shaped SAP’s approach. Early attempts to shield users from Kubernetes’ complexity by mimicking Cloud Foundry’s API failed, as developers craved direct control over Kubernetes resources. Similarly, restricting cluster admin roles to prevent misconfigurations stifled innovation, leading SAP to grant greater flexibility. Some technology choices, like the Service Catalog project, proved inefficient, underscoring the importance of aligning with Kubernetes’ operator pattern. License changes in tools like Grafana also necessitated pivots, highlighting the need for vigilance in open-source dependencies. Piotr’s takeaways resonate broadly: Kubernetes is a long-term investment, requiring a balance of opinionated tooling and developer freedom, with automation as a cornerstone for scalability.

Links: