Jonathan Lalou's Blog

Posts Tagged ‘GenerativeAI’

[AWSReInventPartnerSessions2024] Architecting Real-Time Generative AI Applications: A Confluent-AWS-Anthropic Integration Framework

Lecturer

Pascal Vuylsteker serves as Senior Director of Innovation at Confluent, where he pioneers scalable data streaming architectures that underpin enterprise artificial intelligence systems. Mario Rodriguez functions as Senior Partner Solutions Architect at AWS, specializing in generative AI service orchestration across cloud environments. Gavin Doyle leads the Applied AI team at Anthropic, directing development of safe, steerable, and interpretable large language models.

Abstract

This scholarly examination delineates a comprehensive methodology for constructing real-time generative AI applications through the synergistic integration of Confluent’s streaming platform, Amazon Bedrock’s managed foundation model ecosystem, and Anthropic’s Claude models. The analysis elucidates data governance centrality, retrieval-augmented generation (RAG) with continuous contextual synchronization, Flink-mediated inference execution, and vector database orchestration. Through architectural decomposition and configuration exemplars, it demonstrates how these components eliminate data silos, ensure temporal relevance in AI outputs, and enable secure, scalable enterprise innovation.

Governance-Centric Modern Data Architecture

Enterprise competitiveness increasingly hinges upon real-time data streaming capabilities, with seventy-nine percent of IT leaders affirming its strategic necessity. However, persistent barriers—siloed repositories, skill asymmetries, governance complexity, and generative AI’s voracious data requirements—impede realization.

Contemporary data architectures position governance as the foundational core, ensuring security, compliance, and accessibility. Radiating outward are data warehouses, streaming analytics engines, and generative AI applications. This configuration systematically dismantles silos while satisfying instantaneous insight demands.

Confluent operationalizes this vision by providing real-time data integration across ingestion pipelines, data lakes, and batch processing systems. It delivers precisely contextualized information at the moment of need—prerequisite for effective generative AI deployment.

Amazon Bedrock complements this through managed access to foundation models from Anthropic, AI21 Labs, Cohere, Meta, Mistral AI, Stability AI, and Amazon. The service supports experimentation, fine-tuning, continued pre-training, and agent orchestration. Security architecture prohibits customer data incorporation into base models, maintains isolation for customized variants, implements encryption, enforces granular access controls, and complies with HIPAA, GDPR, SOC, ISO, and CSA STAR.

Proprietary data constitutes the primary differentiation vector. Three techniques leverage this advantage: RAG injects external knowledge into prompts; fine-tuning specializes models on domain corpora; continued pre-training expands comprehension using enterprise datasets.

\# Bedrock model customization (conceptual)
modelCustomization:
  baseModel: anthropic.claude-3-sonnet
  trainingData: s3://enterprise-corpus/
  fineTuning:
    epochs: 3
    learningRate: 0.0001

Real-Time Contextual Injection and Flink Inference Orchestration

Confluent integrates directly with vector databases, ensuring conversational systems operate upon current, relevant information. This transcends mere data transport to deliver AI-actionable context.

Flink Inference enables real-time machine learning via Flink SQL, dramatically simplifying model integration into operational workflows. Configuration defines endpoints, authentication, prompts, and invocation patterns.

The architectural pipeline commences with document publication to Kafka topics. Documents undergo chunking for parallel processing, embedding generation via Bedrock/Anthropic, and indexing into MongoDB Atlas with original chunks. Quick-start templates deploy this workflow, incorporating structured data summarization through Claude for natural language querying.

Chatbot interactions initiate via API/Kafka, generate embeddings, retrieve documents, construct prompts with streaming context, and invoke Claude. Token optimization employs conversation summarization; enhanced vector queries via Claude-generated reformulations yield superior retrieval.

-- Flink model definition
CREATE MODEL claude_haiku WITH (
  'connector' = 'anthropic',
  'endpoint' = 'https://api.anthropic.com/v1/messages',
  'api.key' = 'sk-ant-...',
  'model' = 'claude-3-haiku-20240307'
);

-- Real-time inference
INSERT INTO responses
SELECT ML_PREDICT('claude_haiku', enriched_prompt) FROM interactions;

Flink provides cost-effective scaling, automatic elasticity, and native integration with AWS services, Snowflake, Databricks, and MongoDB. Anthropic models remain fully accessible via Bedrock.

Strategic Implications for Enterprise AI

The methodology transforms RAG from static knowledge injection to dynamic reasoning augmentation. Contextual retrieval and in-context learning mitigate hallucinations while enabling domain-specific differentiation.

Organizations achieve decision-making superiority through proprietary data in real-time contexts. Governance scales securely; challenges like data drift necessitate continuous refinement.

Future trajectories include declarative inference and hybrid vector-stream architectures for anticipatory intelligence.

Links:

Video: How to build a real-time gen AI app with AWS, Confluent & Anthropic

Posted in en-US | Tags: Anthropic, AWSreInvent, AWSReInventPartnerSessions2024, Bedrock, Confluent, DataStreaming, Flink, GavinDoyle, GenerativeAI, MarioRodriguez, PascalVuylsteker, RAG, RealTimeAI | No Comments »

[DevoxxUK2024] Breaking AI: Live Coding and Hacking Applications with Generative AI by Simon Maple and Brian Vermeer

Author: Jonathan Lalou

Simon Maple and Brian Vermeer, both seasoned developer advocates with extensive experience at Snyk and other tech firms, delivered an electrifying live coding session at DevoxxUK2024, exploring the double-edged sword of generative AI in software development. Simon, recently transitioned to a stealth-mode startup, and Brian, a current Snyk advocate, demonstrate how tools like GitHub Copilot and ChatGPT can accelerate coding velocity while introducing significant security risks. Through a live-coded Spring Boot coffee shop application, they expose vulnerabilities such as SQL injection, directory traversal, and cross-site scripting, emphasizing the need for rigorous validation and security practices. Their engaging, demo-driven approach underscores the balance between innovation and caution, offering developers actionable insights for leveraging AI safely.

Accelerating Development with Generative AI

Simon and Brian kick off by highlighting the productivity boost offered by generative AI tools, citing studies that suggest a 55% increase in developer efficiency and a 27% higher likelihood of meeting project goals. They build a Spring Boot application with a Thymeleaf front end, using Copilot to generate a homepage with a banner and product table. The process showcases AI’s ability to rapidly produce code snippets, such as HTML fragments, based on minimal prompts. However, they caution that this speed comes with risks, as AI often prioritizes completion over correctness, potentially embedding vulnerabilities. Their live demo illustrates how Copilot’s suggestions evolve with context, but also how developers must critically evaluate outputs to ensure functionality and security.

Exposing SQL Injection Vulnerabilities

The duo dives into a search functionality for their coffee shop application, where Copilot generates a query to filter products by name or description. However, the initial code concatenates user input directly into an SQL query, creating a classic SQL injection vulnerability. Brian demonstrates an exploit by injecting malicious input to set product prices to zero, highlighting how unchecked AI-generated code can compromise a system. They then refactor the code using prepared statements, showing how parameterization separates user input from the query execution plan, effectively neutralizing the vulnerability. This example underscores the importance of understanding AI outputs and applying secure coding practices, as tools like Copilot may not inherently prioritize security.

Mitigating Directory Traversal Risks

Next, Simon and Brian tackle a profile picture upload feature, where Copilot generates code to save files to a directory. The initial implementation concatenates user-provided file names with a base path, opening the door to directory traversal attacks. Using Burp Suite, they demonstrate how an attacker could overwrite critical files by manipulating the file name with “../” sequences. To address this, they refine the code to normalize paths, ensuring files remain within the intended directory. The session highlights the limitations of AI in detecting complex vulnerabilities like path traversal, emphasizing the need for developer vigilance and tools like Snyk to catch issues early in the development cycle.

Addressing Cross-Site Scripting Threats

The final vulnerability explored is cross-site scripting (XSS) in a product page feature. The AI-generated code directly embeds user input (product names) into HTML without sanitization, allowing Brian to inject a malicious script that captures session cookies. They demonstrate both reflective and stored XSS, showing how attackers could exploit these to hijack user sessions. While querying ChatGPT for a code review fails to pinpoint the XSS issue, Simon and Brian advocate for using established libraries like Spring Utils for input sanitization. This segment reinforces the necessity of combining AI tools with robust security practices and automated scanning to mitigate risks that AI might overlook.

Balancing Innovation and Security

Throughout the session, Simon and Brian stress that generative AI, while transformative, demands a cautious approach. They liken AI tools to junior developers, capable of producing functional code but requiring oversight to avoid errors or vulnerabilities. Real-world examples, such as a Samsung employee leaking sensitive code via ChatGPT, underscore the risks of blindly trusting AI outputs. They advocate for education, clear guidelines, and security tooling to complement AI-assisted development. By integrating tools like Snyk for vulnerability scanning and fostering a culture of code review, developers can harness AI’s potential while safeguarding their applications against threats.

Links:

Posted in en-US | Tags: BrianVermeer, ChatGPT, Copilot, DevoxxUK2024, GenerativeAI, Security, SimonMaple, Snyk, SpringBoot | No Comments »

[SpringIO2024] Text-to-SQL: Chat with a Database Using Generative AI by Victor Martin & Corrado De Bari @ Spring I/O 2024

Author: Jonathan Lalou

At Spring I/O 2024 in Barcelona, Victor Martin, a product manager for Oracle Database, delivered a compelling session on Text-to-SQL, a transformative approach to querying databases using natural language, powered by generative AI. Stepping in for his colleague Corrado De Bari, who was unable to attend, Victor explored how Large Language Models (LLMs) and Oracle’s innovative tools, including Spring AI and Select AI, enable business users with no SQL expertise to interact seamlessly with databases. The talk highlighted practical implementations, security considerations, and emerging technologies like AI Vector Search, offering a glimpse into the future of database interaction.

The Promise of Text-to-SQL

Text-to-SQL leverages LLMs to translate natural language queries into executable SQL, democratizing data access for non-technical users. Victor began by posing a challenge: how long would it take to build a REST endpoint for a business user to query a database using plain text? Traditionally, this task required manual SQL construction, schema validation, and error handling. With modern frameworks like Spring Boot and Oracle’s Select AI, this process is streamlined. Select AI, integrated into Oracle Database 19c and enhanced in 23 AI, supports features like RUN_SQL to execute generated queries, NARRATE to return results as human-readable text, and EXPLAIN_SQL to detail query reasoning. Victor emphasized that these tools reduce development time, enabling rapid deployment of user-friendly database interfaces.

Configuring Oracle Database for Text-to-SQL

Implementing Text-to-SQL requires minimal configuration within Oracle Database. Victor outlined the steps: first, set up an Access Control List (ACL) to allow external LLM calls, specifying the host and port. Next, create credentials for the LLM service (e.g., Oracle Cloud Infrastructure Generative AI, Open AI, or Azure Open AI) using the DBMS_CLOUD_AI package. Finally, define a profile linking the schema, tables, and chosen LLM. This profile is selected per session to ensure queries use the correct context. Victor demonstrated this with a Spring Boot application, where the profile is set before invoking Select AI. The simplicity of this setup, combined with Spring AI’s abstraction, makes it accessible even for developers new to AI-driven database interactions.

Enhancing Queries with Schema Annotations

A key challenge in Text-to-SQL is ensuring LLMs interpret ambiguous schemas correctly. Victor highlighted that table and column names like “C1” or “Table1” can confuse models. To address this, Oracle Database supports annotations—comments on tables and columns that provide business context. For example, annotating a column as “process status” with possible values clarifies its purpose, aiding the LLM in generating accurate joins and filters. These annotations, which don’t affect production queries, are created collaboratively by DBAs and business stakeholders. Victor shared a real-world example from Oracle’s telecom applications, where annotated schemas improved query precision, enabling complex queries without manual intervention.

AI Vector Search: Querying Unstructured Data

Victor introduced AI Vector Search, a cutting-edge feature in Oracle Database 23 AI, which extends Text-to-SQL to unstructured data. Unlike traditional SQL, which queries structured data, vector search encodes text, images, or audio into high-dimensional vectors representing semantic meaning. These vectors, stored as a new VECTOR data type, enable similarity-based queries. For instance, a job search query for “software engineer positions in New York” can combine structured filters (e.g., location) with vector-based matching of job descriptions and resumes. Victor explained how embedding models, deployed via Oracle’s DBMS_DATA_MINING package, generate these vectors, with metrics like cosine similarity determining relevance. This capability opens new use cases, from document search to personalized recommendations.

Links:

Posted in en-US | Tags: AIVectorSearch, CorradoDeBari, GenerativeAI, OracleDatabase, SelectAI, SpringAI, SpringIO2024, TextToSQL, VictorMartin | No Comments »

[DotAI2024] DotAI 2024: Ines Montani – Crafting Resilient NLP Systems in the Generative Era

Author: Jonathan Lalou

Ines Montani, co-founder and CEO of Explosion AI, illuminated the pitfalls and potentials of natural language processing pipelines at DotAI 2024. As a core contributor to spaCy—an open-source NLP powerhouse—and Prodigy, a data annotation suite, Montani champions modular tools that blend human intuition with computational might. Her address critiqued the “prompts suffice” ethos, advocating hybrid architectures that fuse rules, examples, and generative flair for robust, production-viable solutions.

Harmonizing Paradigms for Enduring Intelligence

Montani traced instruction evolution: from rigid rules yielding brittle brittleness to supervised learning’s nuanced exemplars, now augmented by in-context prompts’ linguistic alchemy. Rules shine in clarity for novices, yet crumble under data flux; examples infuse domain savvy but demand curation toil; prompts democratize prototyping, yet hallucinate sans anchors.

The synergy? Layered pipelines where rules scaffold prompts, examples calibrate outputs, and LLMs infuse creativity. Montani showcased spaCy’s evolution: rule-based tokenizers ensure consistency, while generative components handle ambiguity, like entity resolution in noisy texts. This modularity mitigates drift, preserving fidelity across model swaps.

In industrial extraction—parsing resumes or contracts—Montani stressed data’s primacy: raw inputs reveal logic gaps, prompting refactorings that unearth “window-knocking machines”—flawed proxies mistaking correlation for causation. A chatbot querying calendars, she analogized, falters if oblivious to time zones; true utility demands holistic orchestration.

Fostering Modularity Amid Generative Hype

Montani cautioned against abstraction overload: leaky layers spawn brittle facades, where one-liners unravel on edge cases. Instead, embrace transparency—Prodigy’s active learning loops refine datasets iteratively, blending human oversight with AI proposals to curb over-reliance.

Retrieval-augmented generation (RAG) exemplifies balanced integration: LLMs query structured stores, yielding chat interfaces atop databases, supplanting clunky GUIs. Yet, Montani warned, context dictates efficacy; for analytical dives, raw views trump conversational veils.

Her ethos: interrogate intent—who wields the tool, what risks lurk? Surprise greets data dives, unveiling bespoke logics that generative magic alone can’t conjure. Efficiency, privacy, and modularity—spaCy’s hallmarks—thwart big-tech monoliths, empowering bespoke ingenuity.

In sum, Montani’s blueprint rejects compromise: generative AI amplifies, not supplants, principled engineering, birthing interfaces that endure and elevate.

Links:

Posted in en-US | Tags: DotAI2024, GenerativeAI, InesMontani, NLP, Prodigy, spaCy | No Comments »

[PHPForumParis2023] Experience Report: Building Two Open-Source Personal AIs with OpenAI – Maxime Thoonsen

Author: Jonathan Lalou

Maxime Thoonsen, CTO at Theodo, shared an exhilarating session at Forum PHP 2023, detailing his experience building two open-source personal AI applications using OpenAI’s technologies. As an organizer of the Generative AI Paris Meetup, Maxime’s passion for the PHP community and innovative AI solutions shone through. His step-by-step approach demystified AI development, encouraging PHP developers to explore generative AI by demonstrating its simplicity and potential through practical examples.

Understanding Generative AI’s Potential

Maxime began by introducing the capabilities of generative AI, emphasizing its accessibility for PHP developers. He explained how OpenAI’s APIs enable the creation of applications that process and generate human-like text. Drawing from his work at Theodo, Maxime showcased two personal AI projects, illustrating how they leverage semantic search and embeddings to deliver tailored responses. His enthusiasm for the community, where he began his speaking career, underscored the collaborative spirit driving AI innovation.

Practical AI Development with OpenAI

Delving into the technical details, Maxime walked the audience through building AI applications using OpenAI’s APIs. He highlighted the simplicity of implementing semantic search to retrieve relevant data from documents, advising against premature fine-tuning in favor of straightforward similarity searches. Responding to an audience question, Maxime noted the availability of open-source alternatives like Llama and Mistral, though he acknowledged OpenAI’s GPT-4 as a leader in embedding accuracy. His examples empowered developers to start building AI-driven features in their PHP projects.

Navigating the AI Ecosystem

Maxime concluded by addressing the rapidly evolving AI landscape, likening it to the proliferation of JavaScript frameworks. He emphasized the cost-effectiveness of smaller open-source models for specific use cases, while noting OpenAI’s edge in precision. His talk inspired developers to join communities like the Generative AI Paris Meetup to explore AI further, fostering a sense of curiosity and experimentation within the PHP ecosystem.

Links:

Posted in en-US | Tags: GenerativeAI, MaximeThoonsen, OpenAI, PHP, PHPForumParis2023, Theodo | No Comments »

[DevoxxBE2023] Build a Generative AI App in Project IDX and Firebase by Prakhar Srivastav

Author: Jonathan Lalou

At Devoxx Belgium 2023, Prakhar Srivastav, a software engineer at Google, unveiled the power of Project IDX and Firebase in crafting a generative AI mobile application. His session illuminated how developers can harness these tools to streamline full-stack, multiplatform app development directly from the browser, eliminating cumbersome local setups. Through a live demonstration, Prakhar showcased the creation of “Listed,” a Flutter-based app that leverages Google’s PaLM API to break down user-defined goals into actionable subtasks, offering a practical tool for task management. His engaging presentation, enriched with real-time coding, highlighted the synergy of cloud-based development environments and AI-driven solutions.

Introducing Project IDX: A Cloud-Based Development Revolution

Prakhar introduced Project IDX as a transformative cloud-based development environment designed to simplify the creation of multiplatform applications. Unlike traditional setups requiring hefty binaries like Xcode or Android Studio, Project IDX enables developers to work entirely in the browser. Prakhar demonstrated this by running Android and iOS emulators side-by-side within the browser, showcasing a Flutter app that compiles to multiple platforms—Android, iOS, web, Linux, and macOS—from a single codebase. This eliminates the need for platform-specific configurations, making development accessible even on lightweight devices like Chromebooks.

The live demo featured “Listed,” a mobile app where users input a goal, such as preparing for a tech talk, and receive AI-generated subtasks and tips. For instance, entering “give a tech talk at a conference” yielded steps like choosing a relevant topic and practicing the presentation, with a tip to have a backup plan for technical issues. Prakhar’s real-time tweak—changing the app’s color scheme from green to red—illustrated the iterative development flow, where changes are instantly reflected in the emulator, enhancing productivity and experimentation.

Harnessing the PaLM API for Generative AI

Central to the app’s functionality is Google’s PaLM API, which Prakhar utilized to integrate generative AI capabilities. He explained that large language models (LLMs), like those powering the PaLM API, act as sophisticated autocomplete systems, predicting likely text outputs based on extensive training data. For “Listed,” the text API was chosen for its suitability in single-turn interactions, such as generating subtasks from a user’s query. Prakhar emphasized the importance of crafting effective prompts, comparing a vague prompt like “the sky is” to a precise one like “complete the sentence: the sky is,” which yields more relevant results.

To enhance the AI’s output, Prakhar employed few-shot prompting, providing the model with examples of desired responses. For instance, for the query “go camping,” the prompt included sample subtasks like choosing a campsite and packing meals, along with a tip about wildlife safety. This structured approach ensured the model generated contextually accurate and actionable suggestions, making the app intuitive for users tackling complex tasks.

Securing AI Integration with Firebase Extensions

Integrating the PaLM API into a mobile app poses security challenges, particularly around API key exposure. Prakhar addressed this by leveraging Firebase Extensions, which provide pre-packaged solutions to streamline backend integration. Specifically, he used a Firebase Extension to securely call the PaLM API via Cloud Functions, avoiding the need to embed sensitive API keys in the client-side Flutter app. This setup not only enhances security but also simplifies infrastructure management, as the extension handles logging, monitoring, and optional AppCheck for client verification.

In the live demo, Prakhar navigated the Firebase Extensions Marketplace, selecting the “Call PaLM API Securely” extension. With a few clicks, he deployed Cloud Functions that exposed a POST API for sending prompts and receiving AI-generated responses. The code walkthrough revealed a straightforward implementation in Dart, where the app constructs a JSON payload with the prompt, model name (text-bison-001), and temperature (0.25 for deterministic outputs), ensuring seamless and secure communication with the backend.

Building the Flutter App: Simplicity and Collaboration

The Flutter app’s architecture, built within Project IDX, was designed for simplicity and collaboration. Prakhar walked through the main.dart file, which scaffolds the app’s UI with a material-themed interface, an input field for user queries, and a list to display AI-generated tasks. The app uses anonymous Firebase authentication to secure backend calls without requiring user logins, enhancing accessibility. A PromptBuilder class dynamically constructs prompts by combining predefined prefixes and examples, ensuring flexibility in handling varied user inputs.

Project IDX’s integration with Visual Studio Code’s open-source framework added collaborative features. Prakhar demonstrated how developers can invite colleagues to a shared workspace, enabling real-time collaboration. Additionally, the IDE’s AI capabilities allow users to explain selected code or generate new snippets, streamlining development. For instance, selecting the PromptBuilder class and requesting an explanation provided detailed insights into its parameters, showcasing how Project IDX enhances developer productivity.

Links:

Posted in en-US | Tags: CloudDevelopment, DevoxxBE2023, Firebase, Flutter, GenerativeAI, Google, MobileDevelopment, PaLMAPI, PrakharSrivastav, ProjectIDX | No Comments »