Jonathan Lalou's Blog

Posts Tagged ‘AWS’

[AWSReInvent2025] Modern Secrets Management: Advancing from Traditional Practices to Security Frameworks Prepared for Artificial Intelligence

Lecturers

Resh Desai, Zach Miller, and Jake Farrell presented this session. Resh Desai works as a solutions architect at Amazon Web Services, driving forward developments in secrets management. Zach Miller is a Senior Worldwide Security Specialist Solutions Architect at AWS, specializing in cryptography, keys, secrets, and certificates. Jake Farrell serves as Senior Director of Engineering at Acquia, which provides open digital experience platforms.

Abstract

The presentation sheds light on the evolution of secrets management, highlighting AWS Secrets Manager as a central tool for handling the complete lifecycle of sensitive credentials. It weighs the advantages and drawbacks of centralized versus decentralized approaches, outlines key capabilities like encryption, automated rotation, cross-region replication, and high-volume retrieval, and details Acquia’s comprehensive migration efforts. In addition, it explores strategies for multi-tenant separation, patterns for Kubernetes integration, future synergies with agentic AI, and the latest service improvements that support third-party rotations and easier container-based deployments.

Core Functionalities of AWS Secrets Manager

AWS Secrets Manager provides a purpose-built service dedicated to managing the entire lifecycle of application secrets, database credentials, and API keys, setting it apart from IAM for identity management or KMS for cryptographic operations. By default, every secret undergoes envelope encryption with AWS-managed KMS keys, though users can opt for customer-managed keys to support scenarios such as cross-account sharing.

This setup integrates smoothly with CloudTrail to deliver thorough auditing of all actions, from creation and modification to deletion. Automation through Lambda enables rotation schedules that align precisely with enterprise policies, whether set at 30 or 90 days. For resilience, multi-region replication ensures secrets remain available during regional failovers. The service handles up to 10,000 transactions per second for retrieval, further enhanced by an open-source agent that implements caching with configurable time-to-live periods, thereby improving both efficiency and the overall developer experience.
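To make the caching idea concrete, here is a minimal sketch of a time-to-live cache in front of a secret lookup. It is not the open-source agent's implementation: the class name `TTLSecretCache`, the injected `fetch` callable, and the 300-second TTL are all assumptions chosen for illustration. In a real deployment, `fetch` might wrap a Secrets Manager retrieval call.

```python
import time
from typing import Callable, Dict, Tuple


class TTLSecretCache:
    """Hypothetical cache: serves a stored value until it is ttl_seconds old."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # backend lookup (assumed, injected by caller)
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the demo can control time
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, secret_id: str) -> str:
        now = self._clock()
        entry = self._entries.get(secret_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]          # still fresh: no backend call
        value = self._fetch(secret_id)
        self._entries[secret_id] = (now, value)
        return value


def demo_backend_calls() -> int:
    """Run four reads against a fake backend and return how many reached it."""
    calls = []
    fake_time = [0.0]

    def fake_fetch(secret_id: str) -> str:
        calls.append(secret_id)
        return f"value-of-{secret_id}"

    cache = TTLSecretCache(fake_fetch, ttl_seconds=300, clock=lambda: fake_time[0])
    cache.get("db/password")     # miss: first backend call
    cache.get("db/password")     # hit
    fake_time[0] = 299.0
    cache.get("db/password")     # hit, still inside the TTL
    fake_time[0] = 300.0
    cache.get("db/password")     # expired: second backend call
    return len(calls)
```

A shorter TTL picks up rotated secrets sooner at the price of more retrieval calls; a longer one does the opposite, so the value is best chosen relative to the rotation schedule.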

Together, these features create a secure and traceable environment that integrates seamlessly with the wider AWS security landscape.

Navigating Centralized and Decentralized Deployment Choices

When designing secrets storage, architects must decide between consolidating secrets in a single dedicated account or distributing them closer to the applications that consume them. Centralized configurations often resonate with organizations in regulated sectors, as they allow for standardized practices in naming, tagging, and permission enforcement, typically achieved through enforced CI/CD pipelines or bespoke abstraction layers. Such consistency bolsters monitoring and control across the enterprise, although it requires significant initial investment in development and can delay the adoption of newly released capabilities.

On the other hand, a decentralized model empowers individual application teams to manage secrets directly via consoles or SDKs, offering greater adaptability to unique requirements. This approach streamlines onboarding and accommodates specialized needs more naturally, but it calls for robust supplementary governance to ensure alignment with broader standards.

In practice, the ideal configuration depends on factors like secret creation processes, ongoing management, replication demands, access patterns, and visibility needs, reflecting insights gathered from diverse customer experiences rather than a one-size-fits-all rule.

Acquia’s Migration Experience and Multi-Tenant Architecture

Acquia maintains oversight of over 300,000 distinct secret paths distributed across multiple AWS accounts, supporting millions of daily ephemeral pod instances and tens of thousands of hourly API interactions. Moving away from older systems required careful categorization of secrets into groups such as customer-supplied elements (including third-party tokens and environment variables), internal service communications, and emerging hybrid forms suited to AI agents.

To manage this complexity, Acquia developed a custom fronting API that applies type-specific rules for validation, scoping, and lifecycle policies, such as mandatory rotation or timed expiry. Rigorous least-privilege principles ensure complete separation between platform operations and customer data. For delivery into runtime environments, the organization relies on open-source components like the External Secrets Operator combined with AWS CSI drivers, which synchronize and inject secrets into Kubernetes as variables, configuration templates, or command-line flags. Strategic caching layers further reduce direct API calls, delivering noticeable gains in speed and expense control.

Through this disciplined, layered framework, Acquia achieves robust multi-tenancy while addressing gaps that IAM alone cannot fully cover in interconnected service scenarios.

Future Directions in Agentic AI Collaboration

Looking ahead, Acquia’s designs feature an AI gateway that provides a unified point for observing model invocations routed through Amazon Bedrock, complemented by a standardized factory for quickly provisioning secure agents. By embedding Secrets Manager deeply, the platform enables on-demand injection of properly scoped credentials, allowing smooth evolution alongside emerging AI features without compromising protective measures.

This ongoing partnership with AWS has yielded tangible benefits in operational streamlining, lower maintenance burdens, and enhanced overall performance.

Latest Service Developments and Their Wider Impact

Innovations continue to simplify adoption in container environments, with EKS add-ons now automating the installation and configuration of CSI drivers. The introduction of managed external secrets brings one-click rotation capabilities to external providers like Salesforce, removing the need for custom scripting and eliminating risks of desynchronization.

Native integrations now span more than 55 AWS services, making secret management largely invisible to end users. These advances lower the barriers to adopting advanced security practices, enabling teams to concentrate on innovation even as autonomous systems increase demands on privilege management.

In essence, effective secrets governance forms the bedrock of durable, expandable systems vital for both current operations and forthcoming intelligent workloads.

Posted in en-US | Tags: Acquia, AgenticAI, AutomatedRotation, AWS, AWSReInvent2025, AWSSecretsManager, CloudSecurity, DataProtection, KubernetesIntegration, MultiTenantSecurity, SecretsManagement | No Comments »

[AWSReInvent2025] Supercharging DevOps with AI-Driven Observability: The Next Frontier in SRE

Author: Jonathan Lalou

Lecturers

Elizabeth Fuentes is a Senior Developer Advocate at Amazon Web Services (AWS), specializing in the intersection of Artificial Intelligence and DevOps practices. With extensive experience in cloud architecture and software engineering, Elizabeth focuses on how Generative AI can streamline complex CI/CD pipelines and enhance Site Reliability Engineering (SRE). She is a key contributor to AWS educational initiatives, having co-developed advanced courses on AI-driven automation. Joining her is Laas Alina, a software architect and open-source enthusiast who focuses on implementing multi-agent systems and the Model Context Protocol (MCP) to solve observability challenges at scale.

Abstract

As software systems grow increasingly distributed and complex, traditional observability—centered on manual log analysis and reactive dashboards—is becoming insufficient. This article explores the paradigm shift toward AI-driven observability, where Generative AI serves not just as a query tool, but as an active participant in failure detection, correlation, and resolution. By leveraging Amazon Bedrock and Amazon Q, organizations can transition from “reactive” to “predictive” DevOps. The discussion analyzes the methodology of building AI agents that simulate architectural stress, automatically explain multi-layered failures, and provide traceable, actionable recommendations. We examine the implementation of the Model Context Protocol (MCP) in establishing sophisticated multi-agent systems (MAS) that transform raw data into contextual understanding, ultimately reducing the Mean Time to Resolution (MTTR) and enhancing systemic resilience.

The Evolution of Observability: From Metrics to Contextual Understanding

The traditional pillars of observability—metrics, logs, and traces—provide the “what” of a system’s state but often fail to provide the “why” in real-time. In high-velocity DevOps environments, the sheer volume of telemetry data can overwhelm human operators, leading to “alert fatigue” and delayed responses to critical incidents. Elizabeth posits that the integration of Generative AI marks the fourth pillar of observability: Contextual Intelligence. This evolution moves the industry beyond simple threshold-based monitoring toward systems that understand the semantic relationship between a failed deployment, a spike in latency, and a specific line of code.

By utilizing Large Language Models (LLMs) through Amazon Bedrock, DevOps teams can ingest vast amounts of unstructured log data and receive summaries that highlight anomalies that might be missed by traditional regex-based filters. The methodology involves training the AI to recognize “normal” operational patterns and identifying deviations not just by value, but by the intent of the system’s behavior. This contextual layer allows for a more nuanced interpretation of system health, where the AI can distinguish between a benign resource spike and a precursor to a cascading failure.

Architecting AI Agents for Predictive Troubleshooting

The transition to AI-driven observability is characterized by the deployment of “Micro-agents”—specialized AI entities designed to handle specific segments of the DevOps lifecycle. These agents operate within a Multi-Agent System (MAS), where they collaborate to solve complex incidents. For instance, a “Monitoring Agent” might detect a performance degradation and immediately trigger a “Diagnosis Agent” to correlate the event with recent CI/CD pipeline changes.
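As a rough illustration of the hand-off described above, the sketch below shows the kind of correlation step a diagnosis agent might run once a monitoring agent raises an alert. The `Alert` and `Deployment` types, the `suspect_deployments` function, and the 30-minute look-back window are hypothetical and not taken from the session.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Deployment:
    service: str
    commit: str
    finished_at: datetime


@dataclass(frozen=True)
class Alert:
    service: str
    metric: str
    fired_at: datetime


def suspect_deployments(alert: Alert, deployments: List[Deployment],
                        window: timedelta = timedelta(minutes=30)) -> List[Deployment]:
    """Return deployments of the alerting service that finished at most
    `window` before the alert fired, newest first."""
    candidates = [
        d for d in deployments
        if d.service == alert.service
        and timedelta(0) <= alert.fired_at - d.finished_at <= window
    ]
    return sorted(candidates, key=lambda d: d.finished_at, reverse=True)
```

Time proximity only narrows the search. A real agent would still need logs, traces, or a diff of the change to support any causal claim.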

Elizabeth and Laas Alina emphasize the importance of the Model Context Protocol (MCP) in this architecture. MCP acts as the communication backbone, allowing agents to share context without losing the “lineage” of a decision. When an AI agent recommends a specific architectural change or a rollback, it must provide clear traceability. This is crucial for maintaining trust in automated systems. The agents do not operate in a vacuum; they interact with tools like Amazon Q to provide developers with instant explanations of failures directly within their Integrated Development Environment (IDE) or chat interface.

# Example of an AI-driven observability agent configuration (illustrative schema)
agent:
  name: "IncidentDiagnosticAgent"
  provider: "AmazonBedrock"
  model: "claude-3-sonnet"
  capabilities:
    - log_analysis
    - metric_correlation
    - trace_summarization
  mcp_config:
    protocol_version: "1.0"
    shared_context: "deployment_metadata"
  safety_guardrails:
    - max_token_usage: 4000
    - human_in_the_loop_required: true

Transforming CI/CD through Generative AI and Simulation

Beyond reactive troubleshooting, AI-driven observability empowers proactive system design. One of the most innovative concepts discussed is the use of AI agents to simulate “stress-test” scenarios within a digital twin of the production environment. These agents can intentionally inject failures—similar to Chaos Engineering—and then observe how the observability stack responds. This creates a feedback loop where the AI helps engineers identify “blind spots” in their monitoring before a real incident occurs.

Furthermore, Generative AI transforms the CI/CD pipeline by automatically generating “failure explanations.” Instead of a developer sifting through a 5,000-line build log, Amazon Q can provide a concise summary: “The build failed because the new database schema in commit X is incompatible with the connection pool settings in environment Y.” This level of automated insight accelerates the “inner loop” of development, allowing engineers to focus on innovation rather than infrastructure archeology.

The Human-AI Partnership: Strategic Implications

A common concern in the industry is the replacement of human engineers by AI. However, Elizabeth argues that the future belongs to the “augmented engineer.” AI is a force multiplier that automates the repetitive, “drudge work” of observability—log parsing and initial triage—allowing human experts to focus on high-level strategy and complex architectural decisions. The goal is to transform teams from being “reactive” (fighting fires) to “proactive” (preventing fires).

Implementing these systems requires a cultural shift toward AI-literacy within DevOps teams. Organizations must establish safety guardrails to ensure that AI-driven recommendations are validated and that automated actions (like auto-remediation) have clear rollback paths. By embracing AI as a strategic tool, DevOps and SRE teams can achieve a level of operational excellence that was previously unattainable, ensuring that as systems grow in scale, their reliability grows in parallel.

Posted in en-US | Tags: AI, AmazonBedrock, AmazonQ, Automation, AWS, AWSReInvent2025, CloudComputing, devops, ElizabethFuentes, GenerativeAI, LaasAlina, Observability, SRE | No Comments »

[AWSReInventPartnerSessions2024] Data Mesh at Moderna: One dbt to Unify Data and People (DAT206)

Author: Jonathan Lalou

Lecturers

Connor McArthur co-founded dbt Labs, where he contributes to developing workflows for data transformations inspired by software engineering principles. With over a decade in engineering leadership, Connor focuses on metadata-driven analytics to enhance governance and development speed. Sri Kamireddy leads data initiatives at Moderna, overseeing the integration of diverse data platforms to support organizational goals in biotechnology.

Abstract

This detailed review investigates Moderna’s adoption of dbt Cloud to construct a unified data mesh architecture, integrating disparate data systems for enhanced coherence and efficiency. It scrutinizes the contextual demands of multi-platform environments, methodological use of cross-platform dbt mesh, and implications for data governance, engineering workflows, and business outcomes in a high-stakes industry.

Contextual Demands in Multi-Platform Data Environments

Organizations like Moderna operate in complex data landscapes, often employing multiple warehouses due to acquisitions, team preferences, or specialized needs. This diversity, while beneficial for tailored solutions, introduces fragmentation, complicating integration and governance. Moderna’s setup includes Amazon EMR, Spark, Redshift, Athena, and Glue, reflecting a hybrid approach to handle vast datasets from supply chain, manufacturing, and shipments.

The challenge lies in unifying these without duplicating data or losing lineage, which could delay insights critical for operations like vaccine distribution. dbt addresses this by providing an opinionated workflow based on software development cycles, generating active metadata that informs a data control plane. This plane centralizes building, deploying, orchestrating, observing, and cataloging analytics stacks.

Methodological Application of Cross-Platform dbt Mesh

dbt mesh enables seamless connectivity across platforms using Iceberg for interoperability. At Moderna, this approach streamlined combining supply chain data from a data lake (Athena) with manufacturing data (Redshift). Once the relevant models are marked as public within dbt, a model in one project can reference a model in another, preserving end-to-end lineage.

A custom SDK wrapper enforces data quality checks and metadata inclusion during model development, ensuring governance without stifling domain-driven engineering. Lake Formation tags maintain access controls, preventing silos.

Code sample for referencing models across projects in dbt:

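# dependencies.yml in the downstream project: declares the upstream project
projects:
  - name: athena_project

# models/schema.yml in athena_project: exposes the model to other projects
models:
  - name: supply_chain_data
    access: public

-- models/unified_product.sql in the downstream project (file names are illustrative)
{{ config(materialized='table') }}

SELECT *
FROM {{ ref('athena_project', 'supply_chain_data') }}
JOIN redshift_manufacturing ON ...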

This demonstrates methodological simplicity in unifying data flows.

Implications for Governance and Operational Efficiency

The approach reduces engineering workloads by eliminating custom scripts, freeing engineers to focus on value-added tasks. Enhanced lineage helps business users trace the origins of metrics, fostering trust and faster decision-making.

In biotechnology, where timely insights impact global health, this efficiency is crucial. Scalable infrastructure with controlled costs supports Moderna’s data-driven culture, emphasizing strong platforms, governance, and security.

In summary, dbt mesh at Moderna exemplifies how unified tools bridge platform divides, promoting cohesive data estates that drive innovation and reliability.

Posted in en-US | Tags: Analytics, AWS, AWSReInventPartnerSessions2024, ConnorMcArthur, DataIntegration, DataMesh, dbtCloud, dbtLabs, Governance, Moderna, SriKamireddy | No Comments »

[AWSReInvent2025] High-Performance Storage Architectures for AI/ML, Analytics, and HPC Workloads

Author: Jonathan Lalou

Lecturer

Aditi is a Senior Product Manager for Amazon FSx at Amazon Web Services (AWS). With years of experience working directly with customers on high-performance workloads, she focuses on pushing the technical boundaries of what is possible with cloud storage to meet the demands of modern compute-intensive applications.

Abstract

This article examines the critical role of high-performance storage in supporting modern AI/ML, analytics, and High-Performance Computing (HPC) workloads. As organizations scale their compute resources—incorporating hundreds or thousands of CPU and GPU cores—storage often becomes the primary bottleneck, preventing linear performance scaling. We explore the technical architectures of Amazon FSx and Amazon S3, focusing on how these services address the needs of both “lift-and-shift” file-based applications and “cloud-native” S3-based data lakes. By analyzing customer use cases in genomics, media rendering, and large language model (LLM) training, we detail the methodologies for achieving peak performance at scale.

The Storage Bottleneck in Compute-Intensive Workloads

Modern high-performance workloads are characterized by their extreme reliance on massive datasets and high-core-count compute clusters. In an ideal cloud environment, adding more compute resources should lead to a proportional increase in work completed—a concept known as linear scaling. However, traditional storage solutions often fail to keep pace with the throughput demands of these clusters, leading to a performance plateau.

When storage becomes the bottleneck, compute instances sit underutilized as they compete for access to the same data store. This is particularly detrimental given that 90% to 95% of the expenditure for these workloads is typically allocated to compute resources. Consequently, an inefficient storage layer not only extends the time to insight but also significantly increases the total cost of ownership (TCO). To avoid this, storage must be architected to scale linearly alongside compute.
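A small back-of-the-envelope model shows why a faster storage tier can pay for itself. Apart from the roughly 95% compute share quoted above, every number here is an assumption: the hourly rates, the 10 hours of useful compute work, and the utilization figures are invented for illustration.

```python
def job_cost(compute_rate: float, storage_rate: float,
             busy_hours: float, utilization: float) -> float:
    """Total cost of a job needing `busy_hours` of useful compute time.

    If storage stalls keep the cluster busy only `utilization` of the time,
    the job's wall-clock duration stretches to busy_hours / utilization,
    and both compute and storage are billed for that whole duration.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    wall_clock_hours = busy_hours / utilization
    return (compute_rate + storage_rate) * wall_clock_hours


# Hypothetical rates: compute is 95 of every 100 cost units per hour.
slow_storage = job_cost(compute_rate=95, storage_rate=5, busy_hours=10, utilization=0.5)
# Doubling the storage spend, assumed here to remove the stalls entirely.
fast_storage = job_cost(compute_rate=95, storage_rate=10, busy_hours=10, utilization=1.0)
```

Under these assumptions the job costs 2000 units on the slow tier and 1050 on the fast one: doubling the storage line item nearly halves the total, because the dominant compute bill stops paying for idle time.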

Navigating the Path to the Cloud: File Systems vs. Object Storage

Organizations generally approach high-performance storage on AWS from two distinct backgrounds: those with long-standing on-premises file-based workflows and those who have built native cloud applications around object storage.

The Persistence of File-Based Architectures

Despite the rise of object storage, file systems remain the preferred interface for many researchers and developers due to three primary factors:
* Familiar Interface: The intuitive nature of files and directories simplifies complex data management for data scientists and developers.
* Granular Permissions: File systems provide robust POSIX permissions, allowing for fine-grained control over which users can read, write, or execute specific files.
* Consistent Data Access: For workloads where multiple users or compute nodes access the same data simultaneously, the strong consistency of file systems ensures that all parties see the most recent data updates.

Amazon FSx for High-Performance File Access

Amazon FSx addresses these needs by providing fully managed file systems that offer the performance of local storage with the scalability of the cloud. For “lift-and-shift” scenarios, FSx allows organizations to move their existing HPC and AI/ML pipelines to AWS without refactoring their applications.

Accelerating Generative AI and ML Workloads

The emergence of generative AI has placed a renewed emphasis on data strategy. Whether an organization is building a model from scratch or fine-tuning a foundational model, the quality and accessibility of its proprietary data are the primary differentiators.

Retrieval Augmented Generation (RAG)

To move beyond generic AI responses and reduce hallucinations, many organizations are implementing Retrieval Augmented Generation (RAG). RAG allows foundational models to access evolving, large-scale data lakes without requiring the data to be manually loaded into a prompt.

The RAG methodology involves:
1. Vectorization: Converting organizational data into vectors—numeric representations that capture semantic meaning.
2. Semantic Search: Using spatial similarity to compare a query vector against the data lake’s vectors to find the most relevant information.
3. Augmentation: Feeding the retrieved context back into the model to generate a more accurate and business-specific response.
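The three steps above can be sketched in a few lines. This is a toy under stated assumptions: `vectorize` uses plain word counts in place of a learned embedding model, the documents are invented, and `augment` only builds the prompt string without calling any model.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def vectorize(text: str) -> Dict[str, int]:
    """Step 1 (toy): word counts stand in for a learned embedding."""
    return dict(Counter(re.findall(r"[a-z0-9]+", text.lower())))


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, documents: List[str], k: int = 1) -> List[str]:
    """Step 2: rank documents by similarity to the query and keep the top k."""
    query_vec = vectorize(query)
    ranked = sorted(documents,
                    key=lambda doc: cosine(query_vec, vectorize(doc)),
                    reverse=True)
    return ranked[:k]


def augment(query: str, context: List[str]) -> str:
    """Step 3: place the retrieved context ahead of the question."""
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


# Invented sample data for illustration.
DOCS = [
    "Refunds are issued within 14 days of purchase.",
    "Our warehouse ships orders every weekday.",
    "Support is available by email around the clock.",
]
QUESTION = "How many days do refunds take?"
PROMPT = augment(QUESTION, retrieve(QUESTION, DOCS, k=1))
```

In production the document vectors would be computed once at ingestion and kept in a vector index rather than recomputed per query as this sketch does.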

Ingestion and Data Strategy with Amazon S3

Amazon S3 serves as the foundational data lake for these AI workflows due to its cost-effectiveness and virtually unlimited scalability. Organizations typically utilize two ingestion patterns:
* Batch Ingestion: Suitable for static or infrequently changing data such as historical records and product catalogs.
* Real-Time Ingestion: Essential for agentic workflows where AI models must respond to the latest available information.

Modernizing Self-Managed Databases with Amazon FSx

While fully managed services like Amazon RDS are popular, certain business and technical requirements drive organizations toward self-managed database architectures on AWS.

Drivers for Self-Managed Databases

Organizations choose to self-manage databases like Oracle, SQL Server, or SAP HANA for several reasons:
* Granular Control: The ability to choose specific versions of the database engine and the underlying operating system.
* Custom Protection Policies: Implementing specific backup intervals and recovery procedures that may not be available in managed services.
* High Resilience: Scaling databases across multiple Availability Zones or regions with custom failover configurations.

Optimization through Storage Features

A common oversight in database deployment is the potential for the storage layer to add significant value beyond simple data persistence. Amazon FSx file systems (including FSx for NetApp ONTAP, OpenZFS, and Windows File Server) enable features like:
* Snapshots and Cloning: Facilitating rapid testing and database upgrades by creating near-instantaneous copies of production environments.
* Performance Tuning: Choosing the right FSx service can significantly optimize the TCO and performance of database environments, particularly for high-transaction workloads.

Conclusion

As compute power continues to expand, the storage layer must evolve from a passive repository into a high-performance engine. By leveraging Amazon FSx and S3, organizations can eliminate storage bottlenecks, enabling their most demanding AI, HPC, and database workloads to scale linearly and cost-effectively in the cloud.

Posted in en-US | Tags: AaronDaly, Aditi, AmazonFSx, AmazonS3, AWS, AWSreInvent, AWSReInvent2025, CloudComputing, CloudStorage, Databases, GenAI, HPC, Jim, JordanDolman, MachineLearning, MonicaVeahore, RAG | No Comments »

[AWSReInvent2025] Advancements in AWS Infrastructure as Code: A Comprehensive Year-in-Review of CloudFormation and CDK Innovations

Author: Jonathan Lalou

Lecturer

The session is delivered by product managers from Amazon Web Services who oversee the development and roadmap of AWS CloudFormation and the AWS Cloud Development Kit.

Abstract

This article provides an exhaustive and detailed retrospective on the notable progress achieved throughout the past year in AWS infrastructure as code services, with particular emphasis on both AWS CloudFormation and the AWS Cloud Development Kit (CDK). It meticulously examines a range of enhancements, including improved validation mechanisms, clearer error diagnostics, expanded construct libraries, seamless integration with artificial intelligence assistance through Model Context Protocol servers, and advanced troubleshooting utilities. The discussion analyzes how these collective innovations substantially elevate deployment reliability, enhance developer productivity, and introduce greater intelligence into infrastructure management practices for organizations of all scales.

The Critical and Enduring Role of Infrastructure as Code in Modern Cloud Architectures

Infrastructure as code has firmly established itself as an indispensable discipline for enterprises striving to achieve consistency, traceability, and accelerated iteration in their cloud operations. AWS CloudFormation offers a robust declarative approach, allowing practitioners to define resources through structured templates in JSON or YAML formats, thereby guaranteeing identical provisioning outcomes across development, staging, and production environments.

Complementing this, the AWS Cloud Development Kit empowers developers with programmatic flexibility, enabling infrastructure definition in familiar programming languages while automatically generating underlying CloudFormation templates. This duality accommodates diverse team preferences and skill sets.

The advancements introduced over the year have strategically bridged these paradigms, delivering unified capabilities that address contemporary challenges related to scale, complexity, and the evolving demands of developer experience in dynamic cloud ecosystems.

Significant Refinements Enhancing AWS CloudFormation Reliability and Practitioner Usability

AWS CloudFormation has benefited from meaningful improvements in change set validation processes, enhanced clarity in error messaging, and more intuitive management of deployment workflows. These refinements work collectively to substantially reduce the frequency of failed deployments by surfacing potential conflicts, resource constraints, or configuration incompatibilities earlier in the provisioning lifecycle.

Furthermore, the introduction of server-side APIs now enables programmatic pre-validation of proposed changes, allowing integration into continuous integration pipelines for automated safeguards that prevent runtime disruptions and promote greater confidence in infrastructure updates.
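The session summary does not name the new pre-validation APIs, so the sketch below swaps in a long-standing technique instead: a CI step that inspects the response of CloudFormation's DescribeChangeSet call and flags resources that would be removed or replaced. The function name `risky_changes` and the gating policy are assumptions. The dictionary keys follow the shape of the DescribeChangeSet response as I understand it and should be checked against the API reference.

```python
from typing import Any, Dict, List

# Replacement values treated as risky in this hypothetical policy.
RISKY_REPLACEMENT = {"True", "Conditional"}


def risky_changes(change_set: Dict[str, Any]) -> List[str]:
    """List resources a change set would remove or (possibly) replace.

    `change_set` is expected to look like a DescribeChangeSet response:
    {"Changes": [{"ResourceChange": {"Action": ..., "Replacement": ...}}]}
    """
    findings: List[str] = []
    for change in change_set.get("Changes", []):
        resource = change.get("ResourceChange", {})
        logical_id = resource.get("LogicalResourceId", "<unknown>")
        action = resource.get("Action")
        if action == "Remove":
            findings.append(f"{logical_id}: will be removed")
        elif action == "Modify" and resource.get("Replacement") in RISKY_REPLACEMENT:
            findings.append(f"{logical_id}: may be replaced")
    return findings
```

A pipeline could fail the build, or require a manual approval, whenever the returned list is non-empty.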

Substantial Growth and Maturation Within the AWS Cloud Development Kit Ecosystem

The AWS Cloud Development Kit has experienced considerable expansion in supported programming languages and the availability of high-level constructs. Numerous libraries, both community-contributed and AWS-maintained, have progressed from experimental developer preview stages to full general availability, covering an extensive array of common architectural patterns across networking, security, serverless computing, and data processing domains.

This maturation process provides developers with higher-level abstractions that encapsulate established best practices, thereby significantly reducing the amount of boilerplate code required and promoting greater architectural consistency across distributed teams.

Transformative Integration of Artificial Intelligence Assistance Through Model Context Protocol Servers

One of the most pivotal innovations involves the creation of specialized Model Context Protocol servers tailored specifically for CDK and CloudFormation contexts. These servers curate and expose AWS-specific expertise—including recommended practices, construct libraries at various maturity levels, and detailed cloud context information—directly to artificial intelligence-powered coding assistants.

As a result, developers receive highly contextually relevant suggestions that align precisely with AWS service conventions and idioms, dramatically accelerating the creation of secure, efficient, and idiomatic implementations while substantially lowering the cognitive burden associated with recalling intricate service details.

Strengthening Troubleshooting and Validation Tooling for Proactive Issue Resolution

New diagnostic capabilities encompass server-side APIs designed for interrogating deployment states and identifying root causes of issues, complemented by local static analysis utilities that perform early detection of syntax errors within CDK source code.

These tools operate across both programmatic CDK definitions and the generated CloudFormation templates, enabling practitioners to identify and resolve configuration problems well before they manifest during actual deployments.

Community-Driven Construct Libraries and Enhanced Cloud Context Integration

The ecosystem continues to benefit from active contributions spanning AWS internal teams and external community participants, with constructs systematically progressing through alpha evaluation and eventual general availability phases.

Additional cloud context features further enrich artificial intelligence interactions by providing service-specific insights and recommendations.

Practitioners are strongly encouraged to explore dedicated workshops that offer guided paths for understanding and implementing MCP server integration in real-world scenarios.

Measurable Organizational Benefits and Strategic Adoption Considerations

These multifaceted improvements collectively lower entry barriers for effective infrastructure management while delivering tangible advantages. Development teams realize enhanced confidence in deployment outcomes, accelerated onboarding for new members, and improved adherence to evolving architectural standards across projects.

The incorporation of artificial intelligence guidance represents a fundamental paradigm shift toward more intelligent, assisted development experiences that amplify human expertise rather than seeking to replace it.

Looking Toward the Future of Intelligent Infrastructure Orchestration

Continued investment in these areas clearly signals an ongoing commitment to deepening the convergence between programmatic expressiveness and declarative safety, increasingly augmented by artificial intelligence capabilities that guide practitioners toward optimal architectural outcomes.

Organizations that fully leverage these evolving tools position themselves advantageously for sustained operational excellence amid the accelerating complexity of modern cloud environments.

Links:

Lecture Video

Posted in en-US | Tags: AWS, AWSCDK, AWSReInvent2025, CloudFormation, IaC, MCPIntegration, reInvent2025 | No Comments »

[AWSReInvent2025] Basketball’s AI Revolution: How AWS and the NBA Are Changing the Game

Author: Jonathan Lalou

Lecturers

Chris Benyarko is Executive Vice President of Direct-to-Consumer at the NBA, overseeing fan engagement and digital strategies. Andy Oh serves as Principal of Live Sports Events at Prime Video, leading NBA broadcasting partnerships. Kristen Schaff is Global Director of Sports Partnerships at AWS, managing collaborations across major leagues. Relevant links include Chris Benyarko’s LinkedIn profile (https://www.linkedin.com/in/chris-benyarko-/) and Kristen Schaff’s LinkedIn profile (https://www.linkedin.com/in/kristen-schaff/).

Abstract

This article investigates the NBA’s digital transformation via AWS, focusing on AI-driven analytics, fan personalization, and broadcasting innovations. It analyzes partnerships enhancing game strategies, viewer experiences, and global engagement, with implications for sports technology scalability.

The NBA-AWS Partnership: Shared Vision and Technological Foundations

The NBA’s strategic alliance with AWS, formally unveiled on October 1st, is rooted in a mutual commitment to innovation and a focus on fan experiences. Chris Benyarko emphasizes that this partnership goes beyond technology provision and positions AWS as a collaborator in advancing the league’s goals. At its foundation lies a shared philosophy: the NBA describes itself as obsessed with its current and future fans, while AWS brings its customer-centric approach. This alignment enables the league to use AWS infrastructure across its operations and to accelerate the pace of technological change.

In the broader context of basketball’s ongoing evolution, the need for sophisticated, data-driven solutions has never been more pressing. AWS offers a scalable cloud platform that excels in handling complex analytics, artificial intelligence, and machine learning tasks, converting vast amounts of raw data into meaningful insights that inform decision-making at every level. Kristen Schaff highlights what drew AWS to the NBA, pointing out the league’s dynamic, fast-paced nature and its abundance of data as ideal attributes that align perfectly with AWS’s technological strengths. From player performance tracking to predictive modeling, this collaboration leverages AWS’s tools to address the unique demands of professional sports.

The methodology underpinning this partnership involves a comprehensive migration of workflows to AWS services, ensuring low-latency streaming and personalized content delivery that reaches audiences worldwide. By combining the NBA’s deep domain knowledge with AWS’s technical prowess, the alliance not only enhances current offerings but also paves the way for future innovations that could redefine the sport.

AI and Analytics Transforming Gameplay and Strategy

Artificial intelligence is at the forefront of reshaping basketball analytics, influencing everything from individual player development to collective team strategies during games. Chris Benyarko delves into the capabilities of Second Spectrum’s optical tracking system, which deploys 29 cameras in each arena to capture an astonishing 100 million data points per night. These metrics encompass detailed aspects such as player speed, defensive positioning, and shot quality, providing coaches and analysts with granular information that was previously unattainable.

AWS plays a pivotal role in this transformation by powering machine learning models that forecast game outcomes and simulate various scenarios, thereby assisting coaches in refining their tactics. The implications are significant, as teams can now gain substantial competitive advantages through data-informed decisions, while fans benefit from enriched content on platforms like NBA League Pass, including automated highlight reels that capture the most thrilling moments. Andy Oh complements this by describing how Prime Video integrates AWS for real-time statistical overlays, which add layers of depth to the viewing experience and foster greater immersion.

Nevertheless, challenges such as data latency persist, and the partnership addresses these through continuous infrastructure optimizations, ensuring that the flow of information remains timely and reliable.

Enhancing Fan Engagement Through Personalization

Personalization has emerged as a key driver in elevating fan engagement, utilizing AI to deliver content that resonates on an individual level. Chris Benyarko explains the progression of NBA League Pass, which now employs AI to generate highlights in multiple languages, offer alternate viewing streams focused on specific players, and provide predictive elements like real-time win probabilities. These features not only cater to diverse global audiences but also deepen the connection between fans and the game.
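To make the idea of a real-time win probability concrete, the following is a minimal sketch that maps score margin and time remaining to a probability with a logistic function. The function name, the `scale` constant, and the formula itself are illustrative assumptions; the session does not describe the model actually used by the NBA or AWS.

```python
import math


def win_probability(margin, seconds_left, scale=0.4):
    """Toy estimate of a team's win probability.

    margin: current score difference (positive if the team leads).
    seconds_left: game time remaining, in seconds.
    scale: hypothetical tuning constant, not a fitted value.
    """
    # A lead matters more as time runs out, so shrink it by sqrt(minutes left).
    minutes_left = max(seconds_left, 1.0) / 60.0
    z = scale * margin / math.sqrt(minutes_left)
    return 1.0 / (1.0 + math.exp(-z))


tied = win_probability(0, 600)          # tied game
early_lead = win_probability(5, 2400)   # 5-point lead with 40 minutes left
late_lead = win_probability(5, 60)      # same lead with 1 minute left
```

The same five-point lead yields a higher probability late in the game than early on, which is the behavior a live overlay would be expected to show.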

AWS’s extensive global network facilitates this by guaranteeing low-latency delivery to over 200 countries, making high-quality experiences accessible regardless of location. Kristen Schaff underscores the importance of data privacy within these personalization efforts, ensuring that the NBA’s fan-first principles are upheld through secure, unified data management practices.

An analysis of this approach reveals its potential to shift traditional passive spectatorship toward more interactive and tailored interactions, which in turn boosts viewer retention and opens new avenues for monetization through precisely targeted advertising.

Broadcasting Innovations and Latency Challenges

Prime Video’s integration of NBA content exemplifies how AWS enables groundbreaking broadcasting innovations. Andy Oh outlines the process of capturing feeds directly from arenas and minimizing transmission hops to achieve near-real-time delivery, a critical factor especially for integrations involving live betting.
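The effect of removing a transmission hop can be sketched as simple arithmetic: end-to-end delay is the sum of the per-hop delays. The hop names and millisecond values below are hypothetical placeholders, not measurements from the session.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    name: str
    latency_ms: int  # hypothetical per-hop delay


def end_to_end_latency_ms(hops):
    """Total delay from arena to screen is the sum of every hop's delay."""
    return sum(hop.latency_ms for hop in hops)


# Hypothetical pipeline with an intermediate relay.
relayed = [
    Hop("arena encoder", 200),
    Hop("regional relay", 150),
    Hop("origin", 300),
    Hop("CDN edge", 100),
    Hop("player buffer", 2000),
]
# The same pipeline with the relay removed.
direct = [hop for hop in relayed if hop.name != "regional relay"]

saved_ms = end_to_end_latency_ms(relayed) - end_to_end_latency_ms(direct)
```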

Among the notable advancements is AI-generated commentary available in various languages, powered by Amazon Bedrock for natural and accurate translations. The broader implications extend to democratizing access to premium content, thereby expanding the NBA’s global footprint and attracting new demographics. However, the persistent challenge of avoiding spoilers drives an ongoing emphasis on latency reduction, with AWS tools providing the means for continuous monitoring and swift adjustments to maintain optimal performance.

Implications for Sports and Broader Industries

The NBA-AWS partnership offers valuable insights that transcend the realm of sports, demonstrating the power of real-time data platforms, personalized content delivery, and AI in production environments. Chris Benyarko envisions extending these technologies to non-professional leagues, potentially increasing participation by making advanced analytics more widely available.

Looking ahead, AI could further innovate by predicting injuries or optimizing training regimens, fundamentally altering athletic preparation and performance. These developments not only enhance the sport but also provide scalable models applicable to other industries seeking to leverage data for competitive advantage.

Conclusion

The synergy between AWS and the NBA vividly illustrates the transformative potential of AI in sports. By enhancing analytics, personalization, and broadcasting through advanced cloud technologies, this collaboration redefines fan engagement and sets a precedent for innovation across various sectors.

Links:

https://www.youtube.com/watch?v=pZczwGVzWxo
https://www.linkedin.com/in/chris-benyarko-/
https://www.linkedin.com/in/kristen-schaff/

Posted in en-US | Tags: AIRevolution, AWS, AWSReInvent2025, Basketball, FanEngagement, GlobalStreaming, LatencyOptimization, NBA, Personalization, PrimeVideo, SportsAnalytics | No Comments »

[AWSReInventPartnerSessions2024] Usage

Author: Jonathan Lalou

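```python
# generate_code and test_code are assumed to be defined in the earlier part of this example.
spec = "Sort a list of numbers"
code = generate_code(spec)
# Each test pairs an input list with its expected sorted output.
tests = [([3, 1, 2], [1, 2, 3]), ([5, 4], [4, 5])]
if test_code(code, tests):
    print("Code passes tests")
```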

This exemplifies the iterative process of generation and validation central to the platform.
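The generate-then-validate loop can be sketched end to end as follows. This is a self-contained illustration, not GenWizard's implementation: `propose_code` is a hypothetical stand-in for a model call that returns hard-coded candidates, and `passes_tests` and `generate_until_valid` are likewise names invented here.

```python
def propose_code(spec, attempt):
    """Hypothetical stand-in for a model call: returns candidate source code."""
    candidates = [
        "def solve(xs):\n    return xs",          # first try: does not sort
        "def solve(xs):\n    return sorted(xs)",  # second try: correct
    ]
    return candidates[min(attempt, len(candidates) - 1)]


def passes_tests(code, tests):
    """Run the candidate's solve() against (input, expected) pairs."""
    namespace = {}
    # exec is tolerable here only because the candidates are hard-coded;
    # real generated code should run in a sandbox.
    exec(code, namespace)
    solve = namespace["solve"]
    return all(solve(list(given)) == expected for given, expected in tests)


def generate_until_valid(spec, tests, max_attempts=3):
    """Regenerate until a candidate passes, or give up after max_attempts."""
    for attempt in range(max_attempts):
        code = propose_code(spec, attempt)
        if passes_tests(code, tests):
            return attempt, code
    return None


tests = [([3, 1, 2], [1, 2, 3]), ([5, 4], [4, 5])]
result = generate_until_valid("Sort a list of numbers", tests)
```

In this sketch the first candidate fails validation and the second is accepted, so the loop stops at attempt index 1.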

Analytical Implications for Efficiency and Innovation

The deployment of GenWizard reveals profound implications for operational efficiency. By automating repetitive tasks, it allows teams to focus on high-value activities, reducing project timelines by up to seventy percent in some cases. This efficiency stems from the platform’s ability to handle complex correlations and predictions, as seen in incident management where noise reduction leads to faster resolutions.

Innovation is fostered through enhanced decision-making. The system’s knowledge base, enriched with historical data and AI insights, supports proactive strategies like predictive maintenance and application rationalization. For instance, analyzing application portfolios identifies redundancies, enabling cost savings and streamlined operations.

Collaboration with technology partners like AWS amplifies these benefits. Amazon Q’s integration ensures seamless natural language interactions, democratizing access to advanced tools and promoting a culture of continuous improvement.

Consequences for Enterprise Adoption and Future Directions

Enterprise adoption of such platforms mitigates risks associated with legacy systems, facilitating smoother migrations and modernizations. However, challenges include ensuring data privacy and model accuracy, addressed through robust governance frameworks.

Future directions involve expanding agentic capabilities to encompass more lifecycle stages, potentially incorporating multimodal AI for broader applications. This could revolutionize industries by enabling autonomous operations, where systems self-optimize based on real-time data.

In conclusion, the fusion of generative AI with service delivery platforms like GenWizard, powered by AWS, represents a paradigm shift toward intelligent, efficient technology management, promising sustained competitive advantages.

Links:

Posted in en-US | Tags: Accenture, AmazonQ, AWS, AWSReInventPartnerSessions2024, GenerativeAI, Innovation, KishorPanth, LukeHiggins, ServiceDelivery, TechnologyLifecycle | No Comments »

[KotlinConf2025] Blueprints for Scale: What AWS Learned Building a Massive Multiplatform Project (2nd version)

Author: Jonathan Lalou

In a talk on building large-scale, multiplatform projects, Ian Botsford and Matis Lazdins of Amazon Web Services (AWS) shared their experiences creating the AWS SDK for Kotlin. This project is colossal, spanning over 300 services and targeting eight different platforms, with its code distributed across four repositories and nearly 500 Gradle modules. The talk provided a blueprint for managing a codebase with over 8.6 million lines of code, 98% of which is auto-generated. The key to their success, they claimed, was a set of five core principles that kept their maintainers sane and productive.

A Principled Approach to Development

Botsford and Lazdins detailed five tenets for managing a project of this scale: owning your dependencies, structuring the project for growth, designing for Kotlin Multiplatform (KMP) from the beginning, maintaining backward compatibility, and optimizing the maintainer experience. They provided a practical example of owning dependencies by discussing their choice of HTTP clients. Instead of exposing third-party library types directly, which could lead to inconsistent configurations and vulnerability to unexpected API changes, they created a common, abstract interface to maintain consistency and shield users from underlying implementation details.
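The "own your dependencies" idea can be illustrated with a small sketch. It is written in Python for consistency with the other examples on this page rather than in Kotlin, and every name in it (`HttpEngine`, `FakeEngine`, `ServiceClient`) is hypothetical; it does not reproduce the AWS SDK for Kotlin's actual interfaces.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HttpRequest:
    method: str
    url: str
    headers: dict = field(default_factory=dict)


@dataclass(frozen=True)
class HttpResponse:
    status: int
    body: bytes = b""


class HttpEngine(ABC):
    """The only HTTP abstraction the public API exposes."""

    @abstractmethod
    def send(self, request):
        """Send an HttpRequest and return an HttpResponse."""


class FakeEngine(HttpEngine):
    """In-memory engine; a real adapter would wrap a third-party client here."""

    def __init__(self):
        self.seen = []

    def send(self, request):
        self.seen.append(request)
        return HttpResponse(status=200, body=b"ok")


class ServiceClient:
    """Depends on HttpEngine only, never on a third-party client type."""

    def __init__(self, engine):
        self._engine = engine

    def ping(self):
        request = HttpRequest("GET", "https://example.invalid/ping")
        return self._engine.send(request).status


engine = FakeEngine()
status = ServiceClient(engine).ping()
```

Because callers see only `HttpEngine`, the underlying library can be swapped or upgraded without changing any public signature.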

Automating for Maintainer Sanity

A significant part of their strategy focused on the maintainer experience. Lazdins explained the importance of automating repetitive and mundane tasks to free up time for more complex work. They developed broad checks to catch issues before they are merged, which helps prevent regressions and enforce project standards. The speakers stressed that these checks should be highly informative but also overridable, giving developers autonomy while providing valuable feedback. This focus on a positive maintainer experience is crucial for the health of any large open-source project and is a key factor in sustaining releases that ship daily, sometimes several times a day.
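An "informative but overridable" check might look like the sketch below. The check names, messages, and override mechanism are assumptions made for illustration; the talk summary does not specify how the team's checks are implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    name: str
    passed: bool
    message: str  # explains what failed and how to fix or override it


def can_merge(results, overrides=frozenset()):
    """Return (allowed, blocking): blocking lists failed checks nobody overrode."""
    blocking = [r for r in results if not r.passed and r.name not in overrides]
    return (not blocking, blocking)


# Hypothetical check results for one pull request.
results = [
    CheckResult("api-compat", False, "Public API changed; override if the break is intended."),
    CheckResult("changelog", True, "Changelog entry found."),
]

allowed_by_default, blocking = can_merge(results)
allowed_with_override, _ = can_merge(results, overrides={"api-compat"})
```

The failed check blocks the merge by default and reports why, while an explicit override lets a maintainer proceed deliberately.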

Links:

Posted in en-US | Tags: AWS, IanBotsford, KMP, Kotlin, KotlinConf2025, KotlinMultiplatform, LargeScaleProjects, MatisLazdins, SoftwareDevelopment | No Comments »

[AWSReInvent2025] Introducing Nitro Isolation Engine: Transparency through Mathematics

Author: Jonathan Lalou

Lecturers

JD Bean is a principal architect in AWS’s compute and ML services organization, specializing in virtualization and security innovations. Kareem Raslan serves as a senior principal engineer in AWS’s Nitro hypervisor team, focusing on hardware-software integration for cloud security. Nathan Chong is a principal applied scientist in AWS’s automated reasoning group, with expertise in formal verification and mathematical proofs. Relevant links include JD Bean’s LinkedIn profile (https://www.linkedin.com/in/jdbean/) and Nathan Chong’s LinkedIn profile (https://www.linkedin.com/in/nathan-chong-aws/).

Abstract

This article explores the AWS Nitro Isolation Engine, an advancement in the Nitro System that employs formal verification to ensure mathematical certainty in workload isolation. It examines the evolution of Nitro’s design, the application of automated reasoning for proofs, and the implications for cloud security, emphasizing compartmentalization and transparency.

The Evolution of the AWS Nitro System

The AWS Nitro System has fundamentally transformed the landscape of cloud virtualization by prioritizing enhanced security, superior performance, and accelerated innovation. JD Bean traces its development back to 2012, explaining how it culminated in a public launch in 2017 that marked a departure from conventional hypervisors such as Xen. At its core, the system relies on a customized version of the KVM hypervisor tailored specifically for cloud environments, complemented by the sixth generation of proprietary Nitro Silicon. This infrastructure underpins all EC2 instances introduced since 2018, demonstrating AWS’s commitment to reimagining virtualization.

In earlier iterations, systems like Xen depended on a component known as Dom0, which essentially functioned as a general-purpose operating system to handle essential tasks such as input/output operations, orchestration, and monitoring. However, as AWS expanded its services and built deeper relationships with customers, the limitations of Xen became increasingly apparent. The team recognized the need to push beyond these constraints, leading to a comprehensive reinvention that eliminated superfluous elements and relocated AWS-specific functions to dedicated hardware. Consequently, the Nitro System features a streamlined host operating system reduced to a minimal kernel, which not only minimizes potential attack surfaces but also enforces a policy of zero operator access, thereby isolating customer data from AWS personnel.

Within this broader context, the rise of cloud adoption has amplified the demand for confidential computing, where sensitive workloads require robust protections against unauthorized access. The Nitro architecture addresses these needs by compartmentalizing only the most critical isolation functions, which in turn optimizes efficiency and reduces vulnerabilities. This design philosophy ensures that customers can leverage the cloud’s scalability without compromising on security, setting the stage for subsequent advancements like the Nitro Isolation Engine.

Design and Implementation of the Nitro Isolation Engine

Building upon the foundational principles of the Nitro System, the Nitro Isolation Engine introduces a compact and formally verified module that significantly bolsters isolation assurances. Kareem Raslan elaborates on its compartmentalization strategy, noting how non-essential operations are shifted to user space, leaving behind a concise kernel comprising fewer than 100,000 lines of code dedicated solely to vital activities such as memory allocation and interrupt handling.

This engine is currently implemented on the Graviton5 processor, available in preview, and uses specialized hardware extensions to make transitions across compartments secure. The implementation methodology centers on rigorous specification: the engine’s expected behaviors, such as maintaining strict workload separation, are stated as precise mathematical models. The team then uses tools like Isabelle to prove that the actual code conforms to these specifications.

Nathan Chong further illuminates the process of automated reasoning, beginning with intuitive examples like the formula for the sum of the first n natural numbers and progressing to sophisticated machine-checked proofs. For the engine, this approach extends to verifying properties over potentially infinite states, which ensures that unauthorized access paths are entirely eliminated. The result is a system that not only performs efficiently but also withstands rigorous scrutiny, providing customers with unparalleled confidence in their data’s protection.

The implications of this design are profound, as it substantially diminishes the risk of exploitation by confining the trusted computing base to a minimal footprint. By verifying a smaller codebase through automated means, the engine mitigates issues stemming from legacy components, paving the way for a more secure cloud ecosystem.

Automated Reasoning and Mathematical Proofs

Automated reasoning stands as a cornerstone of the Nitro Isolation Engine, offering what the presenters describe as “transparency through mathematics” by delivering incontrovertible assurances of isolation. Nathan Chong contrasts informal proofs and specifications with their machine-checked counterparts in the Isabelle theorem prover, where each logical step is mechanically validated to prevent errors.

At the heart of this process lie core concepts such as specifications, which define the precise behaviors a system must exhibit, and proofs, which consist of finite chains of reasoning that irrefutably establish desired properties. For domains involving infinite possibilities, such as the natural numbers, techniques like mathematical induction are employed: a base case confirms the property for the initial value, while the inductive step demonstrates its preservation across subsequent values, much like a cascade of falling dominoes.
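The sum formula mentioned in the talk gives a compact worked example of induction. The derivation below is the standard textbook argument, written out here for illustration rather than transcribed from the session.

```latex
\textbf{Claim.} For every natural number $n \ge 1$,
\[
  \sum_{k=1}^{n} k = \frac{n(n+1)}{2}.
\]

\textbf{Base case} ($n = 1$): $\sum_{k=1}^{1} k = 1 = \frac{1 \cdot 2}{2}$.

\textbf{Inductive step.} Assume $\sum_{k=1}^{n} k = \frac{n(n+1)}{2}$ for some $n \ge 1$. Then
\begin{align*}
  \sum_{k=1}^{n+1} k
    &= \frac{n(n+1)}{2} + (n+1) \\
    &= \frac{n(n+1) + 2(n+1)}{2} \\
    &= \frac{(n+1)(n+2)}{2},
\end{align*}
which is the claim for $n + 1$. By induction, the formula holds for all $n \ge 1$.
```

A finite chain of steps thus covers infinitely many cases, which is the property that makes the technique useful for reasoning about unbounded state spaces.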

Scaling these methods to the complexities of the Nitro Isolation Engine requires advanced mathematical frameworks, including separation logic for managing memory resources, refinement techniques for bridging abstraction levels, and theorem provers to automate verification. Drawing on decades of research in formal methods, this approach ensures comprehensive coverage of real-world scenarios, including concurrent operations that could otherwise introduce subtle vulnerabilities.

An analysis of this methodology reveals its inherent value: unlike traditional testing, which is confined to finite scenarios, mathematical proofs provide exhaustive guarantees, fostering a level of trust that is essential for confidential computing environments. This not only elevates security standards but also enables organizations to innovate with greater assurance.
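A classic arithmetic example, chosen here for illustration and not drawn from the session, shows why finite testing can mislead: Euler's polynomial n² + n + 41 yields a prime for every n from 0 to 39, so a test suite limited to those inputs would pass, yet the claim "always prime" fails at n = 40.

```python
def is_prime(k):
    """Trial division; adequate for the small values used here."""
    if k < 2:
        return False
    divisor = 2
    while divisor * divisor <= k:
        if k % divisor == 0:
            return False
        divisor += 1
    return True


def euler(n):
    """Euler's prime-generating polynomial."""
    return n * n + n + 41


# A "test suite" covering the first 40 inputs passes...
passes_first_40 = all(is_prime(euler(n)) for n in range(40))

# ...but the property is false in general: 40*40 + 40 + 41 = 1681 = 41 * 41.
first_counterexample = next(n for n in range(100) if not is_prime(euler(n)))
```

A proof would have to account for every n, which is exactly where the property breaks down.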

Implications for Cloud Security and Future Innovations

The introduction of the Nitro Isolation Engine heralds a new era in cloud security, where mathematical proofs become the benchmark for verifying system integrity. By emphasizing compartmentalization, the engine minimizes the trusted computing base, thereby reducing the potential for exploits and enhancing overall resilience. It is currently available in preview as an always-on feature on Graviton5 processors, and users can request access through designated AWS channels.

On a broader scale, the consequences extend to industries with stringent privacy requirements, such as finance and healthcare, where verifiable isolation can mitigate compliance risks and build customer confidence. AWS’s ongoing commitment to elevating security standards—evident throughout the Nitro System’s history—suggests that future innovations will continue to prioritize robust protections, allowing for rapid advancements without sacrificing safety.

This transparency through mathematics not only demystifies complex systems but also empowers users to make informed decisions about their cloud strategies, ultimately contributing to a more secure digital landscape.

Conclusion

The Nitro Isolation Engine exemplifies AWS’s unwavering dedication to pioneering secure and innovative cloud infrastructure. Through the rigorous application of formal verification, it achieves mathematical certainty in workload isolation, thereby redefining transparency and trust in the realm of virtualization.

Links:

https://www.youtube.com/watch?v=hqqKi3E-oG8
https://www.linkedin.com/in/jdbean/
https://www.linkedin.com/in/nathan-chong-aws/

Posted in en-US | Tags: AutomatedReasoning, AWS, AWSReInvent2025, CloudSecurity, ConfidentialComputing, FormalVerification, Graviton5, IsolationEngine, MathematicalProofs, NitroSystem, Virtualization | No Comments »

[AWSReInvent2025] Revolutionizing DevSecOps: How Cathay Pacific Achieved 75% Faster Security with Agentic AI

Author: Jonathan Lalou

Lecturers

Mike Markell is a Practice Manager for AWS Professional Services in Hong Kong, where he leads digital transformation and security initiatives for major enterprises across Asia. Naresh Sharma is a senior technology leader at Cathay Pacific Airways, overseeing the airline’s global application security and DevSecOps strategy. Tony Leong is a Senior Security Architect at Cathay, specializing in building AI-powered security tooling and integrating AppSec-as-Code into high-velocity deployment pipelines.

Abstract

In the highly regulated and high-stakes environment of global aviation, managing security across more than 4,000 annual deployments presents a massive operational challenge. This article details how Cathay Pacific Airways revolutionized its “security-first” culture by moving beyond traditional security scanning to a comprehensive DevSecOps model. The core methodology centers on the implementation of Agentic AI and a RAG-based (Retrieval-Augmented Generation) assistant to solve the industry’s “false positive crisis.” By deploying “AI-powered security champions” and customized scanning rules, Cathay achieved a 75% reduction in vulnerability remediation time and a 50% reduction in security operations costs. The analysis explores the technical and cultural shifts required to empower over 1,000 developers to become proactive security practitioners while maintaining the airline’s rapid pace of innovation.

Context: The Bottleneck of Manual Security Reviews

For a global leader like Cathay Pacific, the pace of digital innovation is essential for maintaining a competitive edge in the aviation industry. However, this speed was being severely hindered by the limitations of traditional security scanning tools. The primary conflict centered on a high noise-to-signal ratio, where approximately 78% of the vulnerabilities identified by standard tools were determined to be false positives. This created a crisis where security teams were overwhelmed by alerts, leading to significant delays in the deployment of features for the airline’s fleet.

Furthermore, the manual review process required to validate these alerts created significant friction between the security and development teams. Developers often viewed security requirements as a hurdle that slowed down their ability to deliver value, while security professionals struggled to keep up with the volume of code being produced. To overcome these challenges, Cathay needed a solution that could scale with their deployment frequency—which covers everything from customer-facing apps to critical flight operation systems—without compromising on the rigorous safety standards that define the brand.

Methodology: Implementing Shift-Left Security with AI

The solution implemented by Cathay Pacific and AWS Professional Services involved a comprehensive “shift-left” strategy, which integrates security at the very beginning of the software development lifecycle. The cornerstone of this methodology is the use of Agentic AI. Unlike traditional static scanners, these AI agents act as “security champions” that provide real-time, context-aware guidance to developers as they write code. This allows for the identification of security anti-patterns and the suggestion of defensive coding practices before the code is even committed to a repository.

Another critical component of the methodology is the AppSec-as-Code library. This centralized knowledge base translates complex security policies into programmatic requirements that can be automatically enforced within CI/CD pipelines. To make this information accessible to developers, the team developed a RAG-based (Retrieval-Augmented Generation) assistant. This tool allows developers to query internal security standards using natural language, receiving accurate and context-specific advice instantly. Finally, the team moved away from “out of the box” tool configurations in favor of highly customized scanning rules. This technical fine-tuning was essential for drastically reducing the false-positive rate and ensuring that the security team only focused on legitimate threats.
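The idea of translating security policies into programmatic requirements that a pipeline can enforce can be sketched as follows. The policy IDs, configuration keys, and `evaluate` function are hypothetical; Cathay's AppSec-as-Code library is not described at this level of detail in the session.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Policy:
    policy_id: str
    description: str
    is_satisfied: Callable[[dict], bool]


# Hypothetical policies expressed as code instead of prose documents.
POLICIES = [
    Policy(
        "SEC-001",
        "TLS must be enabled",
        lambda cfg: cfg.get("tls_enabled") is True,
    ),
    Policy(
        "SEC-002",
        "No password-like keys in plain environment variables",
        lambda cfg: not any("password" in key.lower() for key in cfg.get("env", {})),
    ),
]


def evaluate(config, policies=POLICIES):
    """Return the IDs of violated policies; an empty list means the gate passes."""
    return [p.policy_id for p in policies if not p.is_satisfied(config)]


compliant = {"tls_enabled": True, "env": {"LOG_LEVEL": "info"}}
risky = {"tls_enabled": False, "env": {"DB_PASSWORD": "example"}}

compliant_violations = evaluate(compliant)
risky_violations = evaluate(risky)
```

A CI/CD step could fail the build whenever `evaluate` returns a non-empty list and print each policy's description as guidance for the developer.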

Technical Analysis of Operational Gains

The implementation of AI-driven DevSecOps has yielded remarkable quantitative results for Cathay Pacific. The most significant outcome is a 75% reduction in the time required to remediate vulnerabilities. Because the AI agents filter out the vast majority of false positives and provide developers with clear, actionable fix suggestions, the entire security lifecycle has been compressed. This has also led to a 70% improvement in developer security capability, as the tools effectively serve as an automated, on-the-job training system that reinforces secure coding habits.

From a financial perspective, the automation of manual reviews and the reduction in wasted engineering time have led to a 50% cost reduction in security operations. The airline is now able to manage over 4,000 deployments annually with a higher level of confidence and lower overhead than was previously possible. A critical technical lesson learned during the journey was that “by default, no tool is perfect.” Success required a commitment to continuous customization and a willingness to collaborate with product vendors to tune their tools to the specific needs of the aviation industry. This iterative feedback loop was the key to moving from “human-in-the-loop” automation to a more efficient “AI-informed” model.

Consequences: A Cultural and Technical Transformation

The transformation at Cathay Pacific extended far beyond the technical architecture; it required a fundamental shift in the organization’s culture. The success of the project was predicated on a “can-do” spirit and the setting of ambitious targets that challenged the status quo. By providing developers with the tools to take ownership of security, the organization has fostered a culture where security is seen as a shared responsibility rather than an external constraint.

The implications for the global aviation and enterprise sectors are significant. Cathay has proven that it is possible to maintain a high-velocity deployment schedule in a safety-critical environment by leveraging the power of generative AI. Looking forward, the organization plans to develop even more insightful dashboards to provide security leaders with real-time visibility into the health of the application portfolio. The journey serves as a powerful testament to how Agentic AI can bridge the gap between agility and security, turning a potential bottleneck into a powerful competitive advantage.

Links:

Posted in en-US | Tags: AgenticAI, Automation, AWS, AWSReInvent2025, CathayPacific, Cybersecurity, DevSecOps, GenerativeAI, MikeMarkell, NareshSharma, ShiftLeft, TonyLeong | No Comments »