Jonathan Lalou's Blog

Posts Tagged ‘AgenticAI’

[AWSReInvent2025] Modern Secrets Management: Advancing from Traditional Practices to Security Frameworks Prepared for Artificial Intelligence

Lecturers

Resh Desai, Zach Miller, and Jake Farrell presented this session. Resh Desai works as a solutions architect at Amazon Web Services, driving forward developments in secrets management. Zach Miller is a Senior Worldwide Security Specialist Solutions Architect at AWS, specializing in cryptography, keys, secrets, and certificates. Jake Farrell serves as Senior Director of Engineering at Acquia, which provides open digital experience platforms.

Abstract

The presentation sheds light on the evolution of secrets management, highlighting AWS Secrets Manager as a central tool for handling the complete lifecycle of sensitive credentials. It weighs the advantages and drawbacks of centralized versus decentralized approaches, outlines key capabilities like encryption, automated rotation, cross-region replication, and high-volume retrieval, and details Acquia’s comprehensive migration efforts. In addition, it explores strategies for multi-tenant separation, patterns for Kubernetes integration, future synergies with agentic AI, and the latest service improvements that support third-party rotations and easier container-based deployments.

Core Functionalities of AWS Secrets Manager

AWS Secrets Manager provides a purpose-built service dedicated to managing the entire lifecycle of application secrets, database credentials, and API keys, setting it apart from IAM for identity management or KMS for cryptographic operations. By design, every secret undergoes envelope encryption with AWS-managed KMS keys, though users can opt for customer-managed keys to support scenarios such as cross-account sharing.

This setup integrates smoothly with CloudTrail to deliver thorough auditing of all actions, from creation and modification to deletion. Automation through Lambda enables rotation schedules that align precisely with enterprise policies, whether set at 30 or 90 days. For resilience, multi-region replication ensures secrets remain available during regional failovers. The service handles up to 10,000 transactions per second for retrieval, further enhanced by an open-source agent that implements caching with configurable time-to-live periods, thereby improving both efficiency and the overall developer experience.

Together, these features create a secure and traceable environment that integrates seamlessly with the wider AWS security landscape.

Navigating Centralized and Decentralized Deployment Choices

When designing secrets storage, architects must decide between consolidating secrets in a single dedicated account or distributing them closer to the applications that consume them. Centralized configurations often resonate with organizations in regulated sectors, as they allow for standardized practices in naming, tagging, and permission enforcement—typically achieved through enforced CI/CD pipelines or bespoke abstraction layers. Such consistency bolsters monitoring and control across the enterprise, although it requires significant initial investment in development and can introduce latency when adopting newly released capabilities.

On the other hand, a decentralized model empowers individual application teams to manage secrets directly via consoles or SDKs, offering greater adaptability to unique requirements. This approach streamlines onboarding and accommodates specialized needs more naturally, but it calls for robust supplementary governance to ensure alignment with broader standards.

In practice, the ideal configuration depends on factors like secret creation processes, ongoing management, replication demands, access patterns, and visibility needs, reflecting insights gathered from diverse customer experiences rather than a one-size-fits-all rule.

Acquia’s Migration Experience and Multi-Tenant Architecture

Acquia maintains oversight of over 300,000 distinct secret paths distributed across multiple AWS accounts, supporting millions of daily ephemeral pod instances and tens of thousands of hourly API interactions. Moving away from older systems required careful categorization of secrets into groups such as customer-supplied elements (including third-party tokens and environment variables), internal service communications, and emerging hybrid forms suited to AI agents.

To manage this complexity, Acquia developed a custom fronting API that applies type-specific rules for validation, scoping, and lifecycle policies, such as mandatory rotation or timed expiry. Rigorous least-privilege principles ensure complete separation between platform operations and customer data. For delivery into runtime environments, the organization relies on open-source components like the External Secrets Operator combined with AWS CSI drivers, which synchronize and inject secrets into Kubernetes as variables, configuration templates, or command-line flags. Strategic caching layers further reduce direct API calls, delivering noticeable gains in speed and expense control.

Through this disciplined, layered framework, Acquia achieves robust multi-tenancy while addressing gaps that IAM alone cannot fully cover in interconnected service scenarios.

Future Directions in Agentic AI Collaboration

Looking ahead, Acquia’s designs feature an AI gateway that provides a unified point for observing model invocations routed through Amazon Bedrock, complemented by a standardized factory for quickly provisioning secure agents. By embedding Secrets Manager deeply, the platform enables on-demand injection of properly scoped credentials, allowing smooth evolution alongside emerging AI features without compromising protective measures.

This ongoing partnership with AWS has yielded tangible benefits in operational streamlining, lower maintenance burdens, and enhanced overall performance.

Latest Service Developments and Their Wider Impact

Innovations continue to simplify adoption in container environments, with EKS add-ons now automating the installation and configuration of CSI drivers. The introduction of managed external secrets brings one-click rotation capabilities to external providers like Salesforce, removing the need for custom scripting and eliminating risks of desynchronization.

Native integrations now span more than 55 AWS services, making secret management largely invisible to end users. These progresses reduce entry barriers to advanced security practices, enabling teams to concentrate on innovation even as autonomous systems increase demands on privilege management.

In essence, effective secrets governance forms the bedrock of durable, expandable systems vital for both current operations and forthcoming intelligent workloads.

Links:

Posted in en-US | Tags: Acquia, AgenticAI, AutomatedRotation, AWS, AWSReInvent2025, AWSSecretsManager, CloudSecurity, DataProtection, KubernetesIntegration, MultiTenantSecurity, SecretsManagement | No Comments »

[VoxxedDaysBucharest2026] Building a Sarcastic, Agentic Pair Programmer: Alexander Chatzizacharias on Crafting Playful LLM Workflows

Author: Jonathan Lalou

Lecturer

Alexander Chatzizacharias is a software engineer at JDriven, a specialized consultancy in the Netherlands focused on JVM technologies and modern software development practices. With a unique background blending Dutch and Greek influences and a keen interest in game studies, Alexander brings creativity and playful thinking to technical challenges. He frequently speaks on topics including Java, Spring Boot, AI applications, and innovative development workflows.

Abstract

As mainstream AI coding assistants converge toward similar polished but somewhat generic experiences, Alexander Chatzizacharias demonstrates how to build a highly personalized, characterful AI pair programmer named “Pip.” Inspired by interactions with a sarcastic colleague named Ricardo, Pip incorporates personality through vectorized Slack history, utilizes Spring Boot and Kotlin, runs entirely locally with Qwen models via Ollama, and employs sophisticated workflows, multi-vector RAG, and the Model Context Protocol (MCP) to create delightful and productive assistance while addressing challenges like non-determinism and model drift.

The Homogenization of AI Assistants and the Quest for Personality

Alexander observes that leading AI coding tools have converged on remarkably similar chat-based interfaces and interaction patterns, largely influenced by OpenAI’s design choices. While incremental improvements continue, the overall experience feels increasingly uniform. This observation inspired the creation of Pip — an intentionally quirky, sarcastic AI pair programmer that injects personality drawn from real colleague interactions.

By processing Slack conversation history into vector embeddings stored in Qdrant, Pip can retrieve and emulate Ricardo’s characteristic sarcastic tone, witty retorts, and playful threats (such as threatening to delete poorly written code). This transforms the assistant from a neutral tool into a more engaging, human-like collaborator that questions unclear requirements, offers humorous feedback, and makes the development process more enjoyable.

Technical Architecture: Workflows, Agents, and Local Execution

Pip is implemented as a Spring Boot application written in Kotlin, with an IntelliJ IDEA plugin providing the frontend interface. Everything runs locally to maintain privacy and control: Qwen 3.5 models served through Ollama handle the language tasks.

Rather than pursuing fully autonomous agents, Alexander favors structured workflows that provide greater determinism and reliability — attributes particularly valued in enterprise environments. A categorization agent, functioning as an LLM-as-Judge, routes incoming queries to appropriate specialized handlers. Each handler uses carefully crafted system prompts derived from Slack history to consistently embody the desired personality traits.

The architecture incorporates multiple specialized agents for response generation, sophisticated RAG pipelines leveraging both dense and sparse vector representations with ColBERT reranking for improved retrieval quality, and integration with the Model Context Protocol (MCP) for tool usage such as playing music or generating memes when appropriate.

RAG, Tools, and the Challenges of Non-Determinism

Retrieval-Augmented Generation forms a cornerstone of Pip’s capabilities, dynamically pulling relevant context to overcome the inherent token limitations of even advanced models. Multi-vector search strategies combine semantic understanding with keyword precision for more reliable information retrieval from project documentation, codebases, and conversation history.

Tool integration via MCP enables rich interactions but introduces additional complexity due to the non-deterministic nature of local models. Alexander discusses practical challenges including prompt sensitivity to model updates (“model locking” strategies), the art of prompt engineering which he likens to “vibe checking,” and the necessity of implementing guardrails to maintain appropriate behavior boundaries.

Implications for Future AI Development

Alexander encourages attendees to experiment with building personalized, domain-specific AI assistants using accessible open-source tools. While acknowledging the increasing commercialization of AI, he emphasizes the current window of opportunity for creative, playful implementations that enhance both productivity and developer satisfaction.

Pip serves as an inspiring example of how thoughtful combination of RAG techniques, vector databases, workflow orchestration, and personality injection can create AI tools that feel genuinely collaborative rather than merely functional.

Links:

Posted in en-US | Tags: AgenticAI, AI, AlexanderChatzizacharias, Java, Kotlin, LLM, MCP, PairProgramming, Qdrant, Qwen, RAG, SpringBoot, VectorDatabase, VoxxedDaysBucharest2026 | No Comments »

[AWSReInvent2025] Accelerating Enterprise Modernization: The Architecture of Composable AI Agents

Author: Jonathan Lalou

Lecturer

Mortaza Chowri is the Head of Product Management for the AWS Transform team, where he leads the development of next-generation tools for complex workload migration. He is an expert in leveraging generative AI to automate technical debt reduction for large-scale enterprises. Joining him are Alexi and Ravi, who serve as senior architects within the AWS Transform division, specializing in agentic AI implementation and the creation of composable system frameworks. The session also features strategic insights from the leadership team at Capgemini, who collaborate with AWS to deliver industry-specific modernization solutions for global banking and automotive clients.

Abstract

Enterprise modernization is frequently paralyzed by the extreme complexity of legacy systems, particularly decades-old mainframes and aging Windows-bound .NET applications. This article explores the innovative framework of AWS Transform, a centralized service that utilizes “Agentic AI” to automate and streamline the migration process. The methodology centers on the concept of composability, which allows AWS partners to integrate their proprietary industry knowledge and specialized tools with foundational AI agents. By utilizing a sophisticated chat-based interface and automated business rule extraction, the platform enables a seamless transition from legacy COBOL and .NET Framework 4.x to modern, cloud-native architectures. The analysis demonstrates how these composable agents create a continuous feedback loop that significantly reduces manual effort, improves documentation, and ensures business logic remains intact during high-risk migrations.

Context: The Burden of Technical Debt and Knowledge Atrophy

Many of the world’s most critical systems, particularly in finance and manufacturing, are still dependent on infrastructure built in the late 20th century. These legacy environments present three primary obstacles that prevent organizations from achieving modern agility. First, knowledge atrophy has become a critical risk, as the original architects of these mainframe systems have often retired, leaving behind “black box” applications that lack contemporary documentation. Second, the technical debt associated with older languages like COBOL is immense, as these systems were never designed to leverage modern cloud features such as serverless compute or elastic auto-scaling.

Third, the mission-critical nature of these systems creates a state of risk aversion, where the fear of breaking a core business process during a manual rewrite often leads to stagnation. AWS Transform was specifically developed to break this cycle of inertia. By providing a unified experience that integrates discovery, assessment, and modernization into a single platform, AWS allows enterprises to view their legacy code as an asset to be reimagined rather than a liability to be feared.

Methodology: Agentic AI and the Composable Framework

The core technical innovation of AWS Transform is the transition from static point solutions to a dynamic, “unified experience” powered by specialized AI agents. These agents are designed to perform complex technical tasks with a level of autonomy that far exceeds traditional automation scripts. The methodology is built upon several key pillars of agentic behavior. Discovery agents are tasked with automatically mapping technical artifacts, such as physical servers and complex database schemas, to their optimal cloud-native equivalents.

Modernization agents, specifically those tuned for mainframe environments, perform the difficult work of extracting business rules from legacy code. This process generates comprehensive documentation that allows current engineers to “comprehend” the underlying logic of systems they did not build. The most transformative aspect of this methodology is its composability for partners. AWS provides the foundational intelligence and large language models, while partners such as Capgemini can “compose” these with their own specialized knowledge bases and custom transformation rules. This enables the creation of industry-specific agents, such as a modernization assistant specifically optimized for banking regulations or complex automotive production logic.

Technical Analysis of Mainframe Rule Extraction

The implementation of these agents in real-world scenarios, particularly through the collaboration with Capgemini, highlights a sophisticated “forward engineering” approach. In this workflow, the AI agents first scan the legacy code to identify core business logic and immutable rules. This extraction phase is critical because it ensures that while the code is updated, the essential business functions remain perfectly intact. Following extraction, the reimagination phase begins, where these rules are integrated into a modern architecture that meets cloud-native standards for security and performance.

Practitioners interact with these systems through a chat experience within the AWS Transform interface, allowing them to query both the AI agents and integrated domain experts directly. This interaction model democratizes the modernization process, making it accessible to developers who may not have expertise in COBOL but are proficient in modern languages like Java or Python. The platform serves as a bridge, translating the “what” of legacy business logic into the “how” of modern cloud execution.

Outcomes: Efficiency, Consistency, and Continuous Learning

The deployment of composable AI agents has fundamentally altered the economics and speed of enterprise modernization. By automating the most labor-intensive parts of code comprehension and translation, organizations have reported a reduction in manual effort by as much as 80%. This allows teams to focus on high-value innovation rather than the repetitive task of line-by-line code migration. Furthermore, the platform ensures architectural consistency across a large organization, preventing the fragmentation that often occurs when different teams use varying migration tools.

One of the most significant consequences of this approach is the continuous improvement of the agents themselves. Every modernization task performed through the platform provides feedback data that enhances the underlying AI models. As these agents encounter more diverse enterprise environments, their ability to handle edge cases and complex business rules grows exponentially. This creates a virtuous cycle where each successful migration makes the next one faster and more reliable, effectively solving the problem of knowledge atrophy for the long term.

Conclusion

The shift toward agentic AI and composable architectures represents a milestone in the evolution of enterprise IT. AWS Transform provides a robust framework that allows organizations to tackle their most daunting legacy challenges with a level of confidence and speed that was previously impossible. By allowing partners to integrate their unique industry expertise into a centralized AI system, AWS has created a scalable ecosystem that transforms modernization from a risky, multi-year endeavor into a manageable and continuous strategic process.

Links:

Posted in en-US | Tags: AgenticAI, AWSReInvent2025, AWSTransform, Capgemini, CloudMigration, ComposableArchitecture, EnterpriseIT, GenerativeAI, MainframeModernization, Modernization, MortazaChowri | No Comments »

[VoxxedDaysTicino2026] Agentic AI Patterns

Author: Jonathan Lalou

Lecturer

Kevin Dubois is a Senior Principal Developer Advocate at IBM, previously with Red Hat, focusing on Java, AI, and cloud-native development. As a Java Champion and Technical Lead for the CNCF Developer Experience Technical Advisory Group, Kevin authors content, speaks internationally, and contributes to open-source projects. Mario Fusco, co-presenter, is a Senior Principal Software Engineer at IBM (Red Hat), leading the Drools project. A Java Champion with expertise in functional programming and domain-specific languages, Mario coordinates the Milano Java User Group and frequently speaks on software engineering topics. Relevant links include Kevin’s LinkedIn profile (https://ch.linkedin.com/in/kevindubois), Mario’s LinkedIn profile (https://it.linkedin.com/in/mario-fusco-3467213), and Mario’s X account (https://x.com/mariofusco).

Abstract

This article investigates patterns in agentic AI systems as presented by Kevin Dubois and Mario Fusco, emphasizing orchestration of AI services for complex tasks. It delineates foundational components, workflow-based orchestration, autonomous agent models, and extensible planners. Through analysis of methodologies in LangChain4j with Quarkus, it elucidates contexts, implementations, and ramifications for building sophisticated AI applications.

Foundations of AI Services and Agentic Systems

Kevin and Mario initiate their discourse by establishing core elements of AI-infused applications, particularly within Java ecosystems using LangChain4j and Quarkus. An AI service fundamentally interfaces with a large language model (LLM) to process inputs and yield responses. However, effective integration demands more: precise prompting to elicit desired outputs, memory management to sustain conversational context, tool invocation for external actions, and data augmentation via retrieval-augmented generation (RAG).

Prompting emerges as pivotal; vague instructions yield suboptimal results, whereas structured prompts enhance accuracy. Memory, absent in standalone LLMs, requires client-side tracking—LangChain4j automates this, customizable via caching. Tools enable LLMs to perform actions like database queries or email dispatch, via function calling where LLMs request tool usage.

RAG integrates proprietary data: embeddings store vectorized information in databases like Pinecone, retrieved to enrich prompts. Moderation filters harmful content, ensuring ethical outputs.

Agentic systems extend this: agents, autonomous entities with goals, leverage these components. Patterns categorize into workflows (predefined paths) and autonomous agents (dynamic LLM-directed processes). Contexts include scenarios needing multi-step reasoning, like trip planning involving weather, flights, and accommodations.

Implications: These foundations enable modular, scalable AI, but demand careful design to mitigate errors like hallucinations.

Code illustrates basics:

@RegisterAiService
interface WeatherAgent {
    String getWeather(String city);
}

This defines an AI service interfacing with an LLM for weather queries.

Workflow-Based Orchestration of Agents

Workflow patterns orchestrate agents through coded sequences, suitable for predictable tasks. Kevin and Mario detail sequential, parallel, conditional, and looping workflows in LangChain4j.

Sequential invokes agents in order: e.g., weather retrieval followed by outfit suggestion. Parallel executes concurrently, aggregating outputs—useful for independent subtasks like multi-city weather checks.

Conditional branches based on outputs: if weather is rainy, suggest indoor activities. Looping iterates until conditions met, like refining content via reviewer-critic cycles.

Methodology employs builders:

AgenticSystem system = AgenticSystem.builder()
    .sequence(weatherAgent, outfitAgent)
    .build();

Execution yields structured results, with event logs for monitoring.

Contexts: Workflows suit deterministic processes, reducing LLM variability. Implications: Enhance efficiency but limit adaptability; error handling via retries or prompt adjustments is crucial.

Autonomous and Dynamic Agent Orchestration

Autonomous patterns empower an LLM-orchestrator to dynamically select agents, ideal for unstructured tasks. The orchestrator evaluates inputs, plans invocations, and executes, adapting via reasoning.

Mario explains: Orchestrator prompts guide tool (agent) selection. Execution involves planning, tool calls, and result integration until resolution.

AgenticSystem system = AgenticSystem.builder()
    .autonomous(orchestrator)
    .agents(agent1, agent2)
    .build();

Contexts: Handles ambiguity, like open-ended queries. Implications: Increases flexibility but risks infinite loops or off-track reasoning; human-in-the-loop mitigates via approvals.

Multimodal extensions process PDFs or generate images, expanding applicability.

Extensible Planners for Custom Agentic Patterns

To accommodate diverse needs, Mario introduces pluggable planners, abstracting orchestration. This service provider interface (SPI) allows custom implementations, like goal-oriented patterns using A* search.

Planners initialize with agents, determining next actions: invoke agents (sequentially/parallel) or conclude. Existing patterns refactor atop this.

Goal-oriented example: Define prerequisites and goals; algorithm generates invocation graphs.

Planner customPlanner = new GoalOrientedPlanner(agents);
AgenticSystem system = AgenticSystem.builder()
    .planner(() -> customPlanner)
    .build();

Hybridization combines patterns, e.g., goal-oriented with loops for refinement.

Contexts: Custom scenarios like adaptive learning systems. Implications: Fosters innovation, but requires algorithmic expertise; promotes modularity in AI design.

In summary, Kevin and Mario’s patterns advance agentic AI, blending structure with dynamism for robust applications.

Links:

Posted in en-US | Tags: AgenticAI, AIOrchestration, AutonomousAgents, IBM, KevinDubois, LangChain4j, MarioFusco, Quarkus, RedHat, VoxxedDaysTicino2026 | No Comments »

[AWSReInvent2025] The Agentic Frontier: Lessons from Anthropic’s 2025 AI Deployments

Author: Jonathan Lalou

Lecturer

Danny Leybovich is a Product Lead at Anthropic, dedicated to building the infrastructure and models that empower the next generation of AI developers. With a focus on high-reasoning models and developer experience, Danny has been instrumental in the launch of Claude Code and the evolution of Anthropic’s agentic framework. His work centers on the practical realities of moving AI from “cool demo” to “reliable autonomous system.”

Abstract

2025 marked a pivotal shift in the artificial intelligence landscape: the transition from interactive chatbots to autonomous AI agents. This article synthesizes the key discoveries made by Anthropic during this transformative year, particularly through the development of Claude Code and the deployment of the Opus 4.5 frontier model. It explores the “agentic architecture” required for long-horizon autonomous work, emphasizing the critical roles of context engineering and skill acquisition. The analysis examines the shift toward “agent-first” workflows, where the model is no longer a passive assistant but an active participant with multi-hour reasoning capabilities. By investigating patterns of reliability and the evolution of AI engineering practices, this article provides a roadmap for the next wave of agentic AI.

The Shift to Agent-First Workflows

In the early stages of generative AI, the predominant interaction pattern was the “chat” interface—a stateless exchange where a human provided a prompt and the model provided a response. 2025 saw the obsolescence of this limited model in favor of “agent-first” workflows. In an agentic architecture, the model is granted the autonomy to use tools, manage its own memory, and pursue goals over extended periods—sometimes lasting hours.

This shift changes the fundamental role of the developer. Instead of engineering a single prompt, the developer now engineers an environment in which an agent can succeed. This involves defining clear objectives, providing access to necessary APIs, and implementing “guardrails” that ensure the agent remains on track during autonomous loops. The rise of “Claude Code”—an agent that can autonomously file GitHub issues and build applications—serves as the flagship example of this transition.

Advanced Context Engineering: Beyond the Context Window

While early AI discussions focused heavily on the size of the “context window,” Anthropic’s experience in 2025 highlighted that quality of context is far more important than raw volume. Context engineering is the practice of strategically selecting and formatting the information provided to the model to maximize reasoning accuracy and minimize hallucinations.

Effective context engineering for agents involves:

State Management: Keeping track of what the agent has already done and what remains to be accomplished.
Relevant Document Retrieval: Using RAG (Retrieval-Augmented Generation) to pull only the most pertinent information into the reasoning loop.
Semantic Chunking: Ensuring that the information is presented in a way that the model can easily digest and connect to other data points.

By focusing on context engineering, developers can enable agents to maintain “state” across long horizons, allowing for complex tasks like refactoring an entire codebase or conducting multi-step regulatory research without losing the thread of the original objective.

Tool Construction and Skill Acquisition

A primary differentiator for AI agents is their ability to interact with the world through tools. In 2025, Anthropic refined the methodology for “teaching” agents new skills through tool construction. A “skill” is essentially a well-defined tool—such as a Python interpreter, a SQL query engine, or a web search function—that the model knows how and when to invoke.

The engineering challenge lies in creating “reliable” tools. If a tool’s output is ambiguous or inconsistent, the agent’s reasoning loop will break. Therefore, tool writing has become a core discipline within AI engineering. Developers must create tools that provide “structured feedback” to the model, allowing the agent to self-correct if a tool call fails. This iterative loop of tool use and self-correction is what allows agents to handle “long-horizon” tasks that were previously impossible for LLMs.

Analyzing the Performance of Opus 4.5

The release of the Opus 4.5 frontier model provided the reasoning “horsepower” necessary for the agentic revolution. Unlike smaller models that might prioritize speed, Opus 4.5 is optimized for high-reasoning tasks. Its performance characteristics include a significant reduction in “logic drift”—the tendency of a model to lose focus during long sequences of thought.

In production environments, Opus 4.5 has demonstrated an ability to navigate “deep” decision trees. For example, when tasked with finding a bug in a complex software system, the model can formulate a hypothesis, write a test to prove it, analyze the test results, and then iteratively refine its approach. This capability for “autonomous debugging” is a hallmark of the newest wave of AI, where the model’s intelligence is leveraged not just for text generation, but for problem-solving in dynamic environments.

Code Sample: Defining a Secure Tool for Claude Agentic Workflows

'''
 Conceptual tool definition for an Anthropic Agent
 This tool allows the agent to safely query a database
''' 

def get_tool_definition():
    return {
        "name": "query_database",
        "description": "Allows the agent to execute read-only SQL queries to retrieve customer data.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The SQL query to execute. Must be read-only."
                },
                "max_rows": {
                    "type": "integer",
                    "default": 10
                }
            },
            "required": ["query"]
        }
    }

'''
This structure enables the model to 'reason' about when it needs 
to fetch data versus when it can rely on its internal knowledge.
'''

Long-Horizon Autonomous Reliability

The final frontier explored in 2025 was the challenge of reliability. For an agent to be truly useful, it must be able to work for hours without human intervention. This requires a robust infrastructure that can handle model timeouts, API failures, and unexpected edge cases.

Anthropic’s research into long-horizon agents suggests that reliability is not a feature of the model alone, but a result of the model-infrastructure synergy. This includes:

Checkpointing: Periodically saving the agent’s state so it can resume after a failure.
Human-in-the-Loop (HITL) Triggers: Designing the agent to “ask for help” when it reaches a confidence threshold that is too low.
Verification Loops: Implementing a secondary model or a deterministic process to verify the agent’s output before it is committed.

These patterns are what define the current state of the art in AI engineering, moving the industry toward a future where agents are trusted partners in the enterprise.

Conclusion

The lessons of 2025 are clear: the future of AI belongs to autonomous agents. By mastering the disciplines of context engineering, tool construction, and long-horizon reliability, developers can leverage models like Claude Opus 4.5 to solve problems of unprecedented complexity. As we look ahead, the trends established this year—particularly the move toward agent-first workflows—will define the next decade of technological innovation. The demo era is over; the production era of agentic AI has begun.

Links:

Posted in en-US | Tags: AgenticAI, AIAgents, AIERa, Anthropic, AWSReInvent2025, Claude, ClaudeCode, ContextEngineering, MachineLearning, Opus45, SoftwareEngineering | No Comments »

[AWSReInvent2025] Scaling Customer Support, Compliance, and Productivity with Conversational AI at Coinbase

Author: Jonathan Lalou

Lecturer

Joshua Smith is a Senior Solutions Architect at Amazon Web Services (AWS), specializing in financial services. He collaborates closely with major institutions to design scalable, secure cloud architectures.
Vara Maharivan serves as Director of Machine Learning and Artificial Intelligence at Coinbase, leading the company’s efforts to integrate advanced AI and machine learning capabilities across its cryptocurrency platform.

Abstract

This session examines how Coinbase, a leading cryptocurrency exchange, has deployed a unified generative AI platform built on Amazon Bedrock to transform three critical operational domains: customer support, regulatory compliance, and internal developer productivity. The presentation details the architectural approach, key AWS services leveraged, real-world performance metrics, and the strategic roadmap ahead. By combining retrieval-augmented generation (RAG), tool execution, and domain-specific agents, Coinbase has achieved substantial automation, cost efficiencies, and enhanced user experiences while maintaining rigorous security and compliance standards.

The Evolution of Generative AI in Financial Services

Joshua Smith opened the discussion by contextualizing the rapid maturation of generative AI within financial services. In 2023, early adoption centered on foundational concerns such as data trust and secure retrieval mechanisms. By 2024, the introduction of Amazon Bedrock enabled broader experimentation in areas like customer support, with focus shifting toward scalability, granular access controls, and integration with existing enterprise tools. Entering 2025, the landscape has progressed toward fully agentic, multi-agent systems capable of autonomously orchestrating complex workflows.

Smith emphasized that the primary challenge is no longer prototyping conversational interfaces but rather re-engineering entire business processes to deliver measurable impact on key performance indicators. This shift demands robust infrastructure, advanced security primitives, and operational frameworks tailored for agentic workloads.

AWS Services Enabling Production-Grade Agentic AI

Central to the discussion was Amazon Bedrock, a fully managed service providing access to leading foundation models through a unified API. Bedrock supports private model customization, guardrails for safety, cost-latency optimization, and, notably, Agent Core—a suite of capabilities designed to operationalize agents at scale.

Agent Core addresses critical production gaps: a serverless runtime supporting long-running multimodal agents (up to eight hours), checkpointing and recovery, identity management compatible with existing providers, secure token vaults, shared and private memory, tool discovery with fine-grained controls, and centralized observability combining logs, traces, and metrics. These components collectively mitigate risks highlighted in industry reports, such as escalating costs, unclear value, and insufficient security, which threaten the viability of agentic initiatives.

Coinbase’s Strategic Vision for AI Integration

Vara Maharivan outlined Coinbase’s mission to increase economic freedom through a trusted global cryptocurrency platform. The company rests on three pillars: building trust via top-tier security, enhancing accessibility through intuitive experiences, and scaling operations efficiently across more than 100 countries.

AI and machine learning have long underpinned fraud detection, risk assessment, personalization, and infrastructure scaling at Coinbase. Recent innovations include graph neural network-based risk scoring for blockchain addresses, ERC-20 scam token detection combining smart contract auditing with ML, and predictive scaling models to handle market volatility.

With the advent of large language models, Coinbase identified three high-impact generative AI domains: customer support automation, compliance process acceleration, and developer productivity enhancement.

Transforming Customer Support with Agentic Workflows

Crypto markets exhibit extreme volatility, driving unpredictable spikes in user inquiries that challenge traditional human-staffed support models. Coinbase addressed this through a unified generative AI platform granting fluid access to models and internal data via standardized interfaces.

The architecture features a virtual assistant handling routine interactions autonomously and an agent-assist tool empowering human representatives. The virtual assistant resolves straightforward cases end-to-end, while the assistive tool synthesizes real-time information from knowledge bases and tools, providing agents with contextual summaries, suggested responses, and multilingual capabilities.

Results demonstrate significant impact: approximately 65% of customer contacts are now automated, yielding nearly five million annualized employee-hour savings. Automated cases resolve in under ten minutes—contrasting sharply with up to forty minutes for human-handled escalations—dramatically improving customer satisfaction and operational efficiency.

Streamlining Compliance through AI-Augmented Investigations

Regulatory compliance in financial services demands rigorous processes such as KYC, KYB, and transaction monitoring. These workflows are labor-intensive, require exhaustive explainability, and must adapt to diverse jurisdictional requirements.

Coinbase augmented traditional ML-based risk detection models (deployed via Anyscale on AWS EKS) with generative AI. A compliance-assist tool aggregates data from internal systems and open-source intelligence, producing narrative summaries and risk signals for human reviewers.

At the core lies an autoresolution engine orchestrating holistic reviews. Upon a high-risk alert, the engine coordinates data synthesis, automated actions, human-in-the-loop feedback, and customer information requests. Final decisions—such as filing Suspicious Activity Reports—remain with human compliance officers, preserving accountability while accelerating throughput and consistency.

Boosting Developer Productivity across the SDLC

Developer efficiency emerged as another strategic priority. Coinbase provides multiple best-in-class coding assistants (e.g., Claude Code, Cursor) powered by Anthropic models via Bedrock, allowing engineers to select preferred tools.

A custom GitHub Action automates pull-request reviews: summarizing changes, generating natural-language comments, enforcing conventions, identifying testing gaps, and offering debugging guidance for CI failures. This shifts human review toward higher-value architectural concerns.

For quality assurance, an in-house UI testing tool translates natural-language test descriptions into autonomous browser actions across form factors, achieving parity with human accuracy, triple the bug-detection rate, and 86% cost reduction versus manual testing.

Quantifiable outcomes include nearly 40% of daily code being AI-generated or influenced (targeting 50%), 75,000 annual hours saved via automated PR reviews, and dramatically faster test introduction.

Future Directions and Platform Modernization

Coinbase aims to democratize agentic AI across the organization, enabling every employee to experiment and innovate. Ongoing efforts focus on modernizing existing tools and scaling enterprise-wide impact.

Agent Core features—secure deployment, robust identity management, advanced memory, and interoperability—are viewed as pivotal for the next phase of expansion.

Conclusion

The Coinbase case illustrates a mature approach to generative AI deployment: leveraging a unified platform on Amazon Bedrock to address volatility-driven operational challenges while upholding security and regulatory standards. By combining autonomous agents, human augmentation, and rigorous evaluation, the company has realized substantial automation, cost savings, and quality improvements across support, compliance, and engineering functions. As agentic systems evolve, such integrated architectures offer a blueprint for financial institutions seeking transformative efficiency without compromising trust.

Links:

Lecture video

Posted in en-US | Tags: AgenticAI, AmazonBedrock, AWSreInvent, AWSReInvent2025, Coinbase, Compliance, Crypto, CustomerSupport, DeveloperProductivity, FinancialServices, GenerativeAI, JoshuaSmith, MachineLearning, VaraMaharivan | No Comments »

[AWSReInvent2025] Revolutionizing DevSecOps: How Cathay Pacific Achieved 75% Faster Security with Agentic AI

Author: Jonathan Lalou

Lecturer

Mike Markell is a Practice Manager for AWS Professional Services in Hong Kong, where he leads digital transformation and security initiatives for major enterprises across Asia. Naresh Sharma is a senior technology leader at Cathay Pacific Airways, overseeing the airline’s global application security and DevSecOps strategy. Tony Leong is a Senior Security Architect at Cathay, specialized in building AI-powered security tooling and integrating AppSec-as-Code into high-velocity deployment pipelines.

Abstract

In the highly regulated and high-stakes environment of global aviation, managing security across more than 4,000 annual deployments presents a massive operational challenge. This article details how Cathay Pacific Airways revolutionized its “security-first” culture by moving beyond traditional security scanning to a comprehensive DevSecOps model. The core methodology centers on the implementation of Agentic AI and a RAG-based (Retrieval-Augmented Generation) assistant to solve the industry’s “false positive crisis.” By deploying “AI-powered security champions” and customized scanning rules, Cathay achieved a 75% reduction in vulnerability remediation time and a 50% reduction in security operations costs. The analysis explores the technical and cultural shifts required to empower over 1,000 developers to become proactive security practitioners while maintaining the airline’s rapid pace of innovation.

Context: The Bottleneck of Manual Security Reviews

For a global leader like Cathay Pacific, the pace of digital innovation is essential for maintaining a competitive edge in the aviation industry. However, this speed was being severely hindered by the limitations of traditional security scanning tools. The primary conflict centered on a high noise-to-signal ratio, where approximately 78% of the vulnerabilities identified by standard tools were determined to be false positives. This created a crisis where security teams were overwhelmed by alerts, leading to significant delays in the deployment of features for the airline’s fleet.

Furthermore, the manual review process required to validate these alerts created significant friction between the security and development teams. Developers often viewed security requirements as a hurdle that slowed down their ability to deliver value, while security professionals struggled to keep up with the volume of code being produced. To overcome these challenges, Cathay needed a solution that could scale with their deployment frequency—which covers everything from customer-facing apps to critical flight operation systems—without compromising on the rigorous safety standards that define the brand.

Methodology: Implementing Shift-Left Security with AI

The solution implemented by Cathay Pacific and AWS Professional Services involved a comprehensive “shift-left” strategy, which integrates security at the very beginning of the software development lifecycle. The cornerstone of this methodology is the use of Agentic AI. Unlike traditional static scanners, these AI agents act as “security champions” that provide real-time, context-aware guidance to developers as they write code. This allows for the identification of security anti-patterns and the suggestion of defensive coding practices before the code is even committed to a repository.

Another critical component of the methodology is the AppSec-as-Code library. This centralized knowledge base translates complex security policies into programmatic requirements that can be automatically enforced within CI/CD pipelines. To make this information accessible to developers, the team developed a RAG-based (Retrieval-Augmented Generation) assistant. This tool allows developers to query internal security standards using natural language, receiving accurate and context-specific advice instantly. Finally, the team moved away from “out of the box” tool configurations in favor of highly customized scanning rules. This technical fine-tuning was essential for drastically reducing the false-positive rate and ensuring that the security team only focused on legitimate threats.

Technical Analysis of Operational Gains

The implementation of AI-driven DevSecOps has yielded remarkable quantitative results for Cathay Pacific. The most significant outcome is a 75% reduction in the time required to remediate vulnerabilities. Because the AI agents filter out the vast majority of false positives and provide developers with clear, actionable fix suggestions, the entire security lifecycle has been compressed. Qualitatively, this has led to a 70% improvement in developer security capability, as the tools effectively serve as an automated, on-the-job training system that reinforces secure coding habits.

From a financial perspective, the automation of manual reviews and the reduction in wasted engineering time have led to a 50% cost reduction in security operations. The airline is now able to manage over 4,000 deployments annually with a higher level of confidence and lower overhead than was previously possible. A critical technical lesson learned during the journey was that “by default, no tool is perfect.” Success required a commitment to continuous customization and a willingness to collaborate with product vendors to tune their tools to the specific needs of the aviation industry. This iterative feedback loop was the key to moving from “human-in-the-loop” automation to a more efficient “AI-informed” model.

Consequences: A Cultural and Technical Transformation

The transformation at Cathay Pacific extended far beyond the technical architecture; it required a fundamental shift in the organization’s culture. The success of the project was predicated on a “can-do” spirit and the setting of ambitious targets that challenged the status quo. By providing developers with the tools to take ownership of security, the organization has fostered a culture where security is seen as a shared responsibility rather than an external constraint.

The implications for the global aviation and enterprise sectors are significant. Cathay has proven that it is possible to maintain a high-velocity deployment schedule in a safety-critical environment by leveraging the power of generative AI. Looking forward, the organization plans to develop even more insightful dashboards to provide security leaders with real-time visibility into the health of the application portfolio. The journey serves as a powerful testament to how Agentic AI can bridge the gap between agility and security, turning a potential bottleneck into a powerful competitive advantage.

Links:

Posted in en-US | Tags: AgenticAI, Automation, AWS, AWSReInvent2025, CathayPacific, Cybersecurity, DevSecOps, GenerativeAI, MikeMarkell, NareshSharma, ShiftLeft, TonyLeong | No Comments »

[GoogleIO2025] Google I/O ’25 Keynote

Author: Jonathan Lalou

Keynote Speakers

Sundar Pichai serves as the Chief Executive Officer of Alphabet Inc. and Google, overseeing the company’s strategic direction with a focus on artificial intelligence integration across products and services. Born in India, he holds degrees from the Indian Institute of Technology Kharagpur, Stanford University, and the Wharton School, and has been instrumental in advancing Google’s cloud computing and AI initiatives since joining the firm in 2004.

Demis Hassabis acts as the Co-Founder and Chief Executive Officer of Google DeepMind, leading efforts in artificial general intelligence and breakthroughs in areas like protein folding and game-playing AI. A former child chess prodigy with a PhD in cognitive neuroscience from University College London, he has received knighthood for his contributions to science and technology.

Liz Reid holds the position of Vice President of Search at Google, directing product management and engineering for core search functionalities. She joined Google in 2003 as its first female engineer in the New York office and has spearheaded innovations in local search and AI-enhanced experiences.

Johanna Voolich functions as the Chief Product Officer at YouTube, guiding product strategies for the platform’s global user base. With extensive experience at Google in search, Android, and Workspace, she emphasizes AI-driven enhancements for content creation and consumption.

Dave Burke previously served as Vice President of Engineering for Android at Google, contributing to the platform’s development for over a decade before transitioning to advisory roles in AI and biotechnology.

Donald Glover is an acclaimed American actor, musician, writer, and director, known professionally as Childish Gambino in his music career. Born in 1983, he has garnered multiple Emmy and Grammy awards for his work in television series like Atlanta and music albums exploring diverse themes.

Sameer Samat operates as President of the Android Ecosystem at Google, responsible for the operating system’s user and developer experiences worldwide. Holding a bachelor’s degree in computer science from the University of California San Diego, he has held leadership roles in product management across Google’s mobile and ecosystem divisions.

Abstract

This examination delves into the pivotal announcements from the Google I/O 2025 keynote, centering on breakthroughs in artificial intelligence models, agentic systems, search enhancements, generative media, and extended reality platforms. It dissects the underlying methodologies driving these advancements, their contextual evolution from research prototypes to practical implementations, and the far-reaching implications for technological accessibility, societal problem-solving, and ethical AI deployment. By analyzing demonstrations and strategic integrations, the discourse illuminates how Google’s full-stack approach fosters rapid innovation while addressing real-world challenges.

Evolution of AI Models and Infrastructure

The keynote commences with Sundar Pichai highlighting the accelerated pace of AI development within Google’s ecosystem, emphasizing the transition from foundational research to widespread application. Central to this narrative is the Gemini model family, which has seen substantial enhancements since its inception. Pichai notes the deployment of over a dozen models and features in the past year, underscoring a methodology that prioritizes swift iteration and integration. For instance, the Gemini 2.5 Pro model achieves top rankings on benchmarks like the Ella Marina leaderboard, reflecting a 300-point increase in ELO scores—a metric evaluating model performance across diverse tasks.

This progress is underpinned by Google’s proprietary infrastructure, exemplified by the seventh-generation TPU named Ironwood. Designed for both training and inference at scale, it offers a tenfold performance boost over predecessors, enabling 42.5 exaflops per pod. Such hardware advancements facilitate cost reductions and efficiency gains, allowing models to process outputs at unprecedented speeds—Gemini models dominate the top three positions for tokens per second on leading leaderboards. The implications extend to democratizing AI, as lower prices and higher performance make advanced capabilities accessible to developers and users alike.

Demis Hassabis elaborates on the intelligence layer, positioning Gemini 2.5 Pro as the world’s premier foundation model. Updated previews have empowered creators to generate interactive applications from sketches or simulate urban environments, demonstrating multimodal reasoning that spans text, code, and visuals. The incorporation of LearnM, a specialized educational model, elevates its utility in learning scenarios, topping relevant benchmarks. Meanwhile, the refined Gemini 2.5 Flash serves as an efficient alternative, appealing to developers for its balance of speed and affordability.

Methodologically, these models leverage vast datasets and advanced training techniques, including reinforcement learning from human feedback, to enhance reasoning and contextual understanding. The context of this evolution lies in Google’s commitment to a full-stack AI strategy, integrating hardware, software, and research. Implications include fostering an ecosystem where AI augments human creativity, though challenges like computational resource demands necessitate ongoing optimizations to ensure equitable access.

Agentic Systems and Personalization Strategies

A significant portion of the presentation explores agentic AI, where systems autonomously execute tasks while remaining under user oversight. Pichai introduces concepts like Project Starline evolving into Google Beam, a 3D video platform that merges multiple camera feeds via AI to create immersive communications. This innovation, collaborating with HP, employs real-time rendering at 60 frames per second, implying enhanced remote interactions that mimic physical presence.

Building on this, Project Astra’s capabilities migrate to Gemini Live, enabling contextual awareness through camera and screen sharing. Demonstrations reveal its application in everyday scenarios, such as interview preparation or fitness training. The introduction of multitasking in Project Mariner allows oversight of up to ten tasks, utilizing “teach and repeat” mechanisms where agents learn from single demonstrations. Available via the Gemini API, this tool invites developer experimentation, with partners like UiPath integrating it for automation.

The agent ecosystem is bolstered by protocols like the open agent-to-agent framework and Model Context Protocol (MCP) compatibility in the Gemini SDK, facilitating inter-agent communication and service access. In practice, agent mode in the Gemini app exemplifies this by sourcing apartment listings, applying filters, and scheduling tours—streamlining complex workflows.

Personalization emerges as a complementary frontier, with “personal context” allowing models to draw from user data across Google apps, ensuring privacy through user controls. An example in Gmail illustrates personalized smart replies that emulate individual styles by analyzing past communications and documents. This methodology relies on secure data handling and fine-tuned models, implying deeper user engagement but raising ethical considerations around data consent and bias mitigation.

Overall, these agentic and personalized approaches shift AI from reactive tools to proactive assistants, contextualized within Google’s product suite. The implications are transformative for productivity, yet require robust governance to balance utility with user autonomy.

Innovations in Search and Information Retrieval

Liz Reid advances the discussion on search evolution, framing AI Overviews and AI Mode as pivotal shifts. With over 1.5 billion monthly users, AI Overviews synthesize responses from web content, enhancing query resolution. AI Mode extends this into conversational interfaces, supporting complex, multi-step inquiries like travel planning by integrating reasoning, tool usage, and web interaction.

Methodologically, this involves grounding models in real-time data, ensuring factual accuracy through citations and diverse perspectives. Demonstrations showcase handling ambiguous queries, such as dietary planning, by breaking them into sub-tasks and verifying outputs. The introduction of video understanding allows analysis of uploaded content, providing step-by-step guidance.

Contextually, these features address information overload in an era of abundant data, implying improved user satisfaction—evidenced by higher engagement metrics. However, implications include potential disruptions to content ecosystems, necessitating transparency in sourcing to maintain trust.

Generative Media and Creative Tools

Johanna Voolich and Donald Glover spotlight generative media, with Imagine 3 and V3 models enabling high-fidelity image and video creation. Imagine 3’s stylistic versatility and V3’s narrative consistency allow seamless editing, as Glover illustrates in crafting a short film.

The Flow tool democratizes filmmaking by generating clips from prompts, supporting extensions and refinements. Methodologically, these leverage diffusion-based architectures trained on vast datasets, ensuring coherence across outputs.

Context lies in empowering creators, with implications for industries like entertainment—potentially lowering barriers but raising concerns over authenticity and intellectual property. Subscription plans like Google AI Pro and Ultra provide access, fostering experimentation.

Android XR Platform and Ecosystem Expansion

Sameer Samat introduces Android XR, optimized for headsets and glasses, integrating Gemini for contextual assistance. Project Muhan with Samsung offers immersive experiences, while glasses prototypes enable hands-free interactions like navigation and translation.

Partnerships with Gentle Monster and Warby Parker emphasize style, with developer previews forthcoming. Methodologically, this builds on Android’s ecosystem, ensuring app compatibility.

Implications include redefining human-computer interaction, enhancing accessibility, but demanding advancements in battery life and privacy.

Societal Impacts and Prospective Horizons

The keynote culminates in applications like Firesat for wildfire detection and drone relief during disasters, showcasing AI’s role in societal challenges. Pichai envisions near-term realizations in robotics, medicine, quantum computing, and autonomous vehicles.

This forward-looking context underscores ethical deployment, with implications for global equity. Personal anecdotes reinforce technology’s inspirational potential, urging collaborative progress.

Links:

Posted in en-US | Tags: AgenticAI, AndroidXR, ArtificialIntelligence, DaveBurke, DemisHassabis, DonaldGlover, GeminiModels, GenerativeMedia, GoogleDeepMind, GoogleIO2025, JohannaVoolich, LizReid, SameerSamat, SundarPichai | No Comments »

[AWSReInforce2025] From possibility to production: A strong, flexible foundation for AI security

Author: Jonathan Lalou

Lecturer

The session features AWS security specialists who architect the AI security substrate, combining expertise in machine learning operations, formal methods, and cloud-native controls. Their work spans Bedrock Guardrails, SageMaker security boundaries, and agentic workflow protection.

Abstract

The presentation constructs a comprehensive AI security framework that accelerates development while maintaining enterprise-grade controls. Through layered defenses—data provenance, model isolation, runtime guardrails, and agentic supervision—it demonstrates how AWS transforms AI security from a deployment blocker into an innovation catalyst, with real-world deployments illustrating production readiness.

AI Security Risk Taxonomy and Defense Layering

AI systems introduce novel threat vectors: training data poisoning, prompt injection, model inversion, and agentic escape. AWS categorizes these across the ML lifecycle:

Data Layer: Provenance tracking, differential privacy, synthetic data generation
Model Layer: Isolation via confidential computing, integrity verification
Inference Layer: Input/output filtering, rate limiting, behavioral monitoring
Agentic Layer: Tool access control, execution sandboxing, human-in-loop gates

Defense in depth applies at each stratum, with controls compounding rather than duplicating effort.

Data Security and Provenance Foundation

Data forms the bedrock of AI trustworthiness. Amazon Macie now classifies training datasets, identifying PII leakage before model ingestion. SageMaker Feature Store implements cryptographic commitment—hashing datasets to immutable ledger entries—enabling audit trails for regulatory compliance.

\# SageMaker data provenance
feature_group = FeatureGroup(name="credit-risk")
feature_group.create(...)
commit_hash = feature_group.commit(data_frame)
audit_log.put(commit_hash, metadata)

This provenance chain supports model cards that document training data composition, bias metrics, and fairness constraints, satisfying EU AI Act requirements.

Model Isolation and Confidential Computing

Model intellectual property requires protection equivalent to source code. AWS Nitro Enclaves provide hardware-isolated execution environments:

\# Enclave attestation document
curl --cert enclave.crt --key enclave.key \
  https://enclave.local/attestation

The enclave receives encrypted model weights, decrypts internally, and serves inferences without exposing parameters. Memory encryption and remote attestation prevent exfiltration even from privileged host processes. Bedrock custom models execute within enclaves by default, eliminating trust in underlying infrastructure.

Runtime Guardrails and Content Moderation

Amazon Bedrock Guardrails implement multi-faceted content filtering:

{
  "blockedInputMessaging": "Policy violation",
  "blockedOutputsMessaging": "Response blocked",
  "contentPolicyConfig": {
    "filtersConfig": [
      {"type": "HATE", "inputStrength": "HIGH"},
      {"type": "PROMPT_INJECTION", "inputStrength": "MEDIUM"}
    ]
  }
}

Filters operate at token level, with configurable strength thresholds. PII redaction, topic blocking, and word denylists combine with contextual analysis to prevent jailbreak attempts. Guardrails integrate with CodeWhisperer to scan generated code for vulnerabilities before execution.

Agentic AI Supervision and Execution Control

Agentic workflows—LLMs that invoke tools, APIs, or other models—amplify risk surface. AWS implements execution sandboxing:

@bedrock_agent
def trading_agent(prompt):
    tools = [
        {"name": "execute_trade", "permissions": "trading:execute"},
        {"name": "read_portfolio", "permissions": "trading:read"}
    ]
    return agent.invoke(prompt, tools)

IAM-bound tool invocation ensures least privilege. Step Functions orchestrate multi-agent workflows with approval gates for high-risk actions. Anthropic’s enterprise deployment uses this pattern to route sensitive queries through human review while automating routine analysis.

Production Deployments and Operational Resilience

Robinhood’s AI-powered fraud detection processes 10 million transactions daily using SageMaker endpoints behind WAF rules that detect prompt injection patterns. BMW’s infrastructure optimization agent operates across 1,300 accounts with VPC-private networking and KMS-encrypted prompts.

These deployments share common patterns:
– Immutable infrastructure via ECS Fargate
– Blue/green model updates with Shadow Mode testing
– Continuous evaluation using held-out datasets
– Automated rollback triggered by drift detection

Future Threat Modeling and Adaptive Controls

Emerging risks—model stealing via API querying, adversarial example crafting—require proactive modeling. AWS invests in automated reasoning to prove guardrail efficacy against known attack classes. Formal methods verify that prompt filters cannot be bypassed through encoding obfuscation.

Agentic systems introduce non-deterministic execution paths. Step Functions now support probabilistic branching with confidence thresholds, routing uncertain decisions to human oversight. This hybrid approach balances automation velocity with risk management.

Conclusion: Security as AI Innovation Substrate

The AWS AI security framework demonstrates that rigorous controls need not impede velocity. By providing data provenance, model isolation, runtime guardrails, and agentic supervision as managed services, AWS enables organizations to progress from proof-of-concept to production without security debt. The flexible control plane—configurable via console, API, or IaC—adapts to evolving regulations and threat landscapes. Security becomes the substrate that accelerates AI adoption, transforming defensive posture into competitive advantage.

Links:

Lecture Video

Posted in en-US | Tags: AgenticAI, AISecurity, AWSreInforce, AWSReInforce2025, BedrockGuardrails, MLSecurity, NitroEnclaves | No Comments »

[DevoxxFR2025] Building an Agentic AI with Structured Outputs, Function Calling, and MCP

Author: Jonathan Lalou

The rapid advancements in Artificial Intelligence, particularly in large language models (LLMs), are enabling the creation of more sophisticated and autonomous AI agents – programs capable of understanding instructions, reasoning, and interacting with their environment to achieve goals. Building such agents requires effective ways for the AI model to communicate programmatically and to trigger external actions. Julien Dubois, in his deep-dive session, explored key techniques and a new protocol essential for constructing these agentic AI systems: Structured Outputs, Function Calling, and the Model-Controller Protocol (MCP). Using practical examples and the latest Java SDK developed by OpenAI, he demonstrated how to implement these features within LangChain4j, showcasing how developers can build AI agents that go beyond simple text generation.

Structured Outputs: Enabling Programmatic Communication

One of the challenges in building AI agents is getting LLMs to produce responses in a structured format that can be easily parsed and used by other parts of the application. Julien explained how Structured Outputs address this by allowing developers to define a specific JSON schema that the AI model must adhere to when generating its response. This ensures that the output is not just free-form text but follows a predictable structure, making it straightforward to map the AI’s response to data objects in programming languages like Java. He demonstrated how to provide the LLM with a JSON schema definition and constrain its output to match that schema, enabling reliable programmatic communication between the AI model and the application logic. This is crucial for scenarios where the AI needs to provide data in a specific format for further processing or action.

Function Calling: Giving AI the Ability to Act

To be truly agentic, an AI needs the ability to perform actions in the real world or interact with external tools and services. Julien introduced Function Calling as a powerful mechanism that allows developers to define functions in their code (e.g., Java methods) and expose them to the AI model. The LLM can then understand when a user’s request requires calling one of these functions and generate a structured output indicating which function to call and with what arguments. The application then intercepts this output, executes the corresponding function, and can provide the function’s result back to the AI, allowing for a multi-turn interaction where the AI reasons, acts, and incorporates the results into its subsequent responses. Julien demonstrated how to define function “signatures” that the AI can understand and how to handle the function calls triggered by the AI, showcasing scenarios like retrieving information from a database or interacting with an external API based on the user’s natural language request.

MCP: Standardizing LLM Interaction

While Structured Outputs and Function Calling provide the capabilities for AI communication and action, the Model-Controller Protocol (MCP) emerges as a new standard to streamline how LLMs interact with various data sources and tools. Julien discussed MCP as a protocol that aims to standardize the communication layer between AI models (the “Model”) and the application logic that orchestrates them and provides access to external resources (the “Controller”). This standardization can facilitate building more portable and interoperable AI agentic systems, allowing developers to switch between different LLMs or integrate new tools and data sources more easily. While details of MCP might still be evolving, its goal is to provide a common interface for tasks like function calling, accessing external knowledge, and managing conversational state. Julien illustrated how libraries like LangChain4j are adopting these concepts and integrating with protocols like MCP to simplify the development of sophisticated AI agents. The presentation, rich in code examples using the OpenAI Java SDK, provided developers with the practical knowledge and tools to start building the next generation of agentic AI applications.

Links:

Julien Dubois: https://www.linkedin.com/in/juliendubois/
Microsoft: https://www.microsoft.com/
LangChain4j on GitHub: https://github.com/langchain4j/langchain4j
OpenAI: https://openai.com/
Devoxx France LinkedIn: https://www.linkedin.com/company/devoxx-france/
Devoxx France Bluesky: https://bsky.app/profile/devoxx.fr
Devoxx France Website: https://www.devoxx.fr/

Posted in en-US | Tags: AgenticAI, AI, DevoxxFR2025, FunctionCalling, Java, JulienDubois, LangChain4j, LLM, MCP, Microsoft, OpenAI, StructuredOutputs | No Comments »