
[NDCMelbourne2025] How to Work with Generative AI in JavaScript – Phil Nash

Phil Nash, a developer relations engineer at DataStax, delivers a comprehensive guide to leveraging generative AI in JavaScript at NDC Melbourne 2025. His talk demystifies the process of building AI-powered applications, emphasizing that JavaScript developers can harness existing skills to create sophisticated solutions without needing deep machine learning expertise. Through practical examples and insights into tools like Gemini and retrieval-augmented generation (RAG), Phil empowers developers to explore this rapidly evolving field.

Understanding Generative AI Fundamentals

Phil begins by addressing the excitement surrounding generative AI, noting its accessibility since the release of the GPT-3.5 API two years ago. He emphasizes that JavaScript developers are well-positioned to engage with AI due to robust tooling and APIs, despite the field’s Python-centric origins. Using Google’s Gemini model as an example, Phil demonstrates how to generate content with minimal code, highlighting the importance of understanding core concepts like token generation and model behavior.
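That minimal-code claim is easy to picture. As a rough sketch rather than Phil's exact demo, using the @google/generative-ai Node.js package with an API key read from the environment and an assumed model name and prompt:

import { GoogleGenerativeAI } from "@google/generative-ai";

// Illustrative setup only: the model name and prompt are assumptions, not the demo's values.
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

const result = await model.generateContent("Explain generative AI to a JavaScript developer in two sentences.");
console.log(result.response.text()); // the generated text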

He explains tokenization, using OpenAI’s byte pair encoding as an example: text is broken into tokens, and the model predicts each next token probabilistically. Parameters like top-k, top-p, and temperature let developers control output randomness, with Phil cautioning that overly high settings produce nonsensical results, humorously illustrated by a chaotic AI-generated story about a gnome.
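Those sampling parameters map directly onto the request configuration. A minimal sketch, assuming the same @google/generative-ai SDK (the values are illustrative):

// Higher temperature/top-p widen the sampling distribution; lower values make output more predictable.
const storyteller = genAI.getGenerativeModel({
  model: "gemini-1.5-flash",
  generationConfig: {
    temperature: 1.8, // deliberately high, the kind of setting behind the chaotic gnome story
    topK: 40,         // consider only the 40 most likely next tokens
    topP: 0.95,       // nucleus sampling: keep tokens until 95% of the probability mass is covered
  },
});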

Enhancing AI with Prompt Engineering

Prompt engineering emerges as a critical skill for refining AI outputs. Phil contrasts zero-shot prompting, which offers minimal context, with techniques like providing examples or system prompts to guide model behavior. For instance, a system prompt defining a “capital city assistant” ensures concise, accurate responses. He also explores chain-of-thought prompting, where instructing the model to think step-by-step improves its ability to solve complex problems, such as a modified river-crossing riddle.
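As a sketch of the system-prompt idea (the wording is an assumption, not Phil's exact prompt), the SDK accepts a system instruction alongside the model name:

const capitalAssistant = genAI.getGenerativeModel({
  model: "gemini-1.5-flash",
  // The system prompt constrains behaviour before any user message arrives.
  systemInstruction:
    "You are a capital city assistant. Reply only with the capital city of the country the user names.",
});

const reply = await capitalAssistant.generateContent("Australia");
console.log(reply.response.text()); // expected: "Canberra"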

Phil underscores the need for evaluation to ensure prompt reliability, as slight changes can significantly alter outcomes. This structured approach transforms prompt engineering from guesswork into a disciplined practice, enabling developers to tailor AI responses effectively.

Retrieval-Augmented Generation for Contextual Awareness

To address AI models’ limitations, such as outdated or private data, Phil introduces retrieval-augmented generation (RAG). RAG enhances models by integrating external data, like conference talk descriptions, into prompts. He explains how vector embeddings—multidimensional representations of text—enable semantic searches, using cosine similarity to find relevant content. With DataStax’s Astra DB, developers can store and query vectorized data efficiently, as demonstrated in a demo where Phil’s bot retrieves details about NDC Melbourne talks.
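Astra DB performs the comparison itself, but the metric is worth seeing once. A plain JavaScript sketch of cosine similarity between two embedding vectors:

// Returns a value near 1 for semantically similar texts and near 0 for unrelated ones.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}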

This approach allows AI to provide contextually relevant answers, such as identifying AI-related talks or conference events, making it a powerful tool for building intelligent applications.

Streaming Responses and Building Agents

Phil highlights the importance of user experience, noting that AI responses can be slow. Streaming, supported by APIs like Gemini’s generateContentStream, delivers tokens incrementally, improving perceived performance. He demonstrates streaming results to a webpage using JavaScript’s fetch and text decoder streams, showcasing how to create responsive front-end experiences.
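A condensed sketch of both halves of that flow, with assumed endpoint, variable, and element names (illustrative plumbing, not the demo's code):

// Server side: forward tokens to the HTTP response as the model produces them.
const result = await model.generateContentStream("Summarise the AI talks at NDC Melbourne.");
for await (const chunk of result.stream) {
  res.write(chunk.text()); // res is the Node HTTP/Express response object
}
res.end();

// Browser side: read the streamed body incrementally and append it to the page.
const response = await fetch("/api/chat", { method: "POST", body: JSON.stringify({ question }) });
const reader = response.body.pipeThrough(new TextDecoderStream()).getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  outputElement.textContent += value;
}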

The talk culminates with AI agents, which Phil describes as systems that perceive, reason, plan, and act using tools. By defining functions in JSON schema, developers can enable models to perform tasks like arithmetic or fetching web content. A demo bot uses tools to troubleshoot a keyboard issue and query GitHub, illustrating agents’ potential to solve complex problems dynamically.
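The tool definitions themselves are plain JSON schema handed to the model. A hedged sketch of a single declaration (an arithmetic tool is used here for brevity; the demo's actual tools handled keyboard troubleshooting and GitHub queries):

const tools = [{
  functionDeclarations: [{
    name: "add_numbers",
    description: "Add two numbers and return the sum.",
    parameters: {
      type: "object",
      properties: {
        a: { type: "number", description: "First operand" },
        b: { type: "number", description: "Second operand" },
      },
      required: ["a", "b"],
    },
  }],
}];

const agent = genAI.getGenerativeModel({ model: "gemini-1.5-flash", tools });
// When the model chooses to use the tool, its response contains a functionCall part;
// the application runs the matching function and sends the result back for a final answer.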

Conclusion: Empowering JavaScript Developers

Phil concludes by encouraging developers to experiment with generative AI, leveraging tools like Langflow for visual prototyping and exploring browser-based models like Gemini Nano. His talk is a call to action, urging JavaScript developers to build innovative applications by combining AI capabilities with their existing expertise. By mastering prompt engineering, RAG, streaming, and agents, developers can create powerful, user-centric solutions.

Links:

[DevoxxUK2025] Concerto for Java and AI: Building Production-Ready LLM Applications

At DevoxxUK2025, Thomas Vitale, a software engineer at Systematic, delivered an inspiring session on integrating generative AI into Java applications to enhance his music composition process. Combining his passion for music and software engineering, Thomas showcased a “composer assistant” application built with Spring AI, addressing real-world use cases like text classification, semantic search, and structured data extraction. Through live coding and a musical performance, he demonstrated how Java developers can leverage large language models (LLMs) for production-ready applications, emphasizing security, observability, and developer experience. His talk culminated in a live composition for an audience-chosen action movie scene, blending AI-driven suggestions with human creativity.

The Why Factor for AI Integration

Thomas introduced his “Why Factor” for evaluating hyped technologies like generative AI. First, identify the problem: for his composer assistant, he needed to organize and access musical data efficiently. Second, assess production readiness: LLMs must be secure and reliable for real-world use. Third, prioritize developer experience: tools like Spring AI simplify integration without disrupting workflows. By focusing on these principles, Thomas avoided blindly adopting AI, ensuring it solved specific issues, such as automating data classification to free up time for creative tasks like composing music.

Enhancing Applications with Spring AI

Using a Spring Boot application with a Thymeleaf frontend, Thomas integrated Spring AI to connect to LLMs like those from Ollama (local) and Mistral AI (cloud). He demonstrated text classification by creating a POST endpoint to categorize musical data (e.g., “Irish tin whistle” as an instrument) using a chat client API. To mitigate risks like prompt injection attacks, he employed Java enumerations to enforce structured outputs, converting free text into JSON-parsed Java objects. This approach ensured security and usability, allowing developers to swap models without code changes, enhancing flexibility for production environments.
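The demo used Spring AI's chat client with a Java enum to force the model into a fixed set of categories. Purely as a conceptual JavaScript sketch of that guard, not the talk's implementation, and with callModel standing in as a hypothetical helper for any chat completion call:

// Conceptual sketch only - the talk implemented this with Spring AI and a Java enum.
const CATEGORIES = ["INSTRUMENT", "GENRE", "MOOD", "OTHER"];

const prompt = `Classify the musical term into exactly one of: ${CATEGORIES.join(", ")}.
Respond with JSON like {"category": "..."} and nothing else.
Term: Irish tin whistle`;

const { category } = JSON.parse(await callModel(prompt)); // callModel is a placeholder
if (!CATEGORIES.includes(category)) {
  throw new Error("Unexpected category"); // free-form or injected text never reaches the database
}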

Semantic Search and Retrieval-Augmented Generation

Thomas addressed the challenge of searching musical data by meaning, not just keywords, using semantic search. By leveraging embedding models in Spring AI, he converted text (e.g., “melancholic”) into numerical vectors stored in a PostgreSQL database, enabling searches for related terms like “sad.” He extended this with retrieval-augmented generation (RAG), where a chat client advisor retrieves relevant data before querying the LLM. For instance, asking, “What instruments for a melancholic scene?” returned suggestions like cello, based on his dataset, improving search accuracy and user experience.

Structured Data Extraction and Human Oversight

To streamline data entry, Thomas implemented structured data extraction, converting unstructured director notes (e.g., from audio recordings) into JSON objects for database storage. Spring AI facilitated this by defining a JSON schema for the LLM to follow, ensuring structured outputs. Recognizing LLMs’ potential for errors, he emphasized keeping humans in the loop, requiring users to review extracted data before saving. This approach, applied to his composer assistant, reduced manual effort while maintaining accuracy, applicable to scenarios like customer support ticket processing.

Tools and MCP for Enhanced Functionality

Thomas enhanced his application with tools, enabling LLMs to call internal APIs, such as saving composition notes. Using Spring Data, he annotated methods to make them accessible to the model, allowing automated actions like data storage. He also introduced the Model Context Protocol (MCP), implemented in Quarkus, to integrate with external music software via MIDI signals. This allowed the LLM to play chord progressions (e.g., in A minor) through his piano software, demonstrating how MCP extends AI capabilities across local processes, though he cautioned it’s not yet production-ready.

Observability and Live Composition

To ensure production readiness, Thomas integrated OpenTelemetry for observability, tracking LLM operations like token usage and prompt augmentation. During the session, he invited the audience to choose a movie scene (action won) and used his application to generate a composition plan, suggesting chord progressions (e.g., I-VI-III-VII) and instruments like percussion and strings. He performed the music live, copy-pasting AI-suggested notes into his software, fixing minor bugs, and adding creative touches, showcasing a practical blend of AI automation and human artistry.

Links:

[AWSReInventPartnerSessions2024] Constructing Real-Time Generative AI Systems through Integrated Streaming, Managed Models, and Safety-Centric Language Architectures

Lecturer

Pascal Vuylsteker serves as Senior Director of Innovation at Confluent, where he spearheads advancements in scalable data streaming platforms designed to empower enterprise artificial intelligence initiatives. Mario Rodriguez operates as Senior Partner Solutions Architect at AWS, concentrating on seamless integrations of generative AI services within cloud ecosystems. Gavin Doyle heads the Applied AI team at Anthropic, directing efforts toward developing reliable, interpretable, and ethically aligned large language models.

Abstract

This comprehensive scholarly analysis investigates the foundational principles and practical methodologies for deploying real-time generative AI applications by harmonizing Confluent’s data streaming capabilities with Amazon Bedrock’s fully managed foundation model access and Anthropic’s advanced language models. The discussion centers on establishing robust data governance frameworks, implementing retrieval-augmented generation with continuous contextual updates, and leveraging Flink SQL for instantaneous inference. Through detailed architectural examinations and illustrative configurations, the article elucidates how these components dismantle data silos, ensure up-to-date relevance in AI responses, and facilitate scalable, secure innovation across organizational boundaries.

Establishing Governance-Centric Modern Data Infrastructures

Contemporary enterprise environments increasingly acknowledge the indispensable role of data streaming in fostering operational agility. Empirical insights reveal that seventy-nine percent of information technology executives consider real-time data flows essential for maintaining competitive advantage. Nevertheless, persistent obstacles—ranging from fragmented technical competencies and isolated data repositories to escalating governance complexities and heightened expectations from generative AI adoption—continue to hinder comprehensive exploitation of these potentials.

To counteract such impediments, contemporary data architectures prioritize governance as the pivotal nucleus. This core ensures that information remains secure, compliant with regulatory standards, and readily accessible to authorized stakeholders. Encircling this nucleus are interdependent elements including data warehouses for structured storage, streaming analytics for immediate processing, and generative AI applications that derive actionable intelligence. Such a holistic configuration empowers institutions to eradicate silos, achieve elastic scalability, and satisfy burgeoning demands for instantaneous insights.

Confluent emerges as the vital connective framework within this paradigm, facilitating uninterrupted real-time data synchronization across disparate systems. By bridging ingestion pipelines, data lakes, and batch-oriented workflows, Confluent guarantees that information arrives at designated destinations precisely when required. Absent this foundational layer, the construction of cohesive generative AI solutions becomes substantially more arduous, often resulting in delayed or inconsistent outputs.

Complementing this streaming backbone, Amazon Bedrock delivers a fully managed service granting access to an array of foundation models sourced from leading providers such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon itself. Bedrock supports diverse experimentation modalities, enables model customization through fine-tuning or extended pre-training, and permits the orchestration of intelligent agents without necessitating extensive coding expertise. From a security perspective, Bedrock rigorously prohibits the incorporation of customer data into baseline models, maintains isolation for fine-tuned variants, implements encryption protocols, enforces granular access controls aligned with AWS identity management, and adheres to certifications including HIPAA, GDPR, SOC, ISO, and CSA STAR.

The differentiation of generative AI applications hinges predominantly on proprietary datasets. Organizations possessing comparable access to foundation models achieve superiority by capitalizing on unique internal assets. Three principal techniques harness this advantage: retrieval-augmented generation incorporates external knowledge directly into prompt engineering; fine-tuning crafts specialized models tailored to domain-specific corpora; continued pre-training broadens model comprehension using enterprise-scale information repositories.

For instance, an online travel agency might synthesize personalized itineraries by amalgamating live flight availability, client profiles, inventory levels, and historical preferences. AWS furnishes an extensive suite of services accommodating unstructured, structured, streaming, and vectorized data formats, thereby enabling seamless integration across heterogeneous sources while preserving lifecycle security.

Orchestrating Real-Time Contextual Enrichment and Inference Mechanisms

Confluent assumes a critical position by directly interfacing with vector databases, thereby assuring that conversational AI frameworks consistently operate upon the most pertinent and current information. This integration transcends basic data translocation, emphasizing the delivery of contextualized, AI-actionable content.

Central to this orchestration is Flink Inference, a sophisticated capability within Confluent Cloud that facilitates instantaneous machine learning predictions through Flink SQL syntax. This approach dramatically simplifies the embedding of predictive models into operational workflows, yielding immediate analytical outcomes and supporting real-time decision-making grounded in accurate, contemporaneous data.

Configuration commences with establishing connectivity between Flink environments and target models utilizing the Confluent command-line interface. Parameters specify endpoints, authentication credentials, and model identifiers—accommodating various Claude iterations alongside other compatible architectures. Subsequent commands define reusable prompt templates, allowing baseline instructions to persist while dynamic elements vary per invocation. Finally, data insertion invokes the ML_PREDICT function, passing relevant parameters for processing.

Architecturally, the pipeline initiates with document or metadata publication to Kafka topics, forming ingress points for downstream transformation. Where appropriate, documents undergo segmentation into manageable chunks to promote parallel execution and enhance computational efficiency. Embeddings are then generated for each segment leveraging Bedrock or Anthropic services, after which these vector representations—accompanied by original chunks—are indexed within a vector store such as MongoDB Atlas.

To accelerate adoption, dedicated quick-start repositories provide deployable templates encapsulating this workflow. Notably, these templates incorporate structured document summarization via Claude, converting tabular or hierarchical data into narrative abstracts suitable for natural language querying.

Interactive sessions begin through API gateways or direct Kafka clients, enabling bidirectional real-time communication. User queries generate embeddings, which subsequently retrieve semantically aligned documents from the vector repository. Retrieved artifacts, augmented by available streaming context, inform prompt construction to maximize relevance and precision. The resultant engineered prompt undergoes processing by Claude on Anthropic Cloud, producing responses that reflect both historical knowledge and live situational awareness.

Efficiency enhancements include conversational summarization to mitigate token proliferation and refine large language model performance. Empirical observations indicate that Claude-generated query reformulations for vector retrieval substantially outperform direct human phrasing, yielding markedly superior document recall.

CREATE MODEL anthropic_claude WITH (
  'connector' = 'anthropic',
  'endpoint' = 'https://api.anthropic.com/v1/messages',
  'api.key' = 'sk-ant-your-key-here',
  'model' = 'claude-3-opus-20240229'
);

CREATE TABLE refined_queries AS
SELECT ML_PREDICT(
  'anthropic_claude',
  CONCAT('Rephrase for vector search: ', user_query)
) AS optimized_query
FROM raw_interactions;

Flink’s value proposition extends beyond connectivity to encompass cost-effectiveness, automatic scaling for voluminous workloads, and native interoperability with extensive ecosystems. Confluent maintains certified integrations across major AWS offerings, prominent data warehouses including Snowflake and Databricks, and leading vector databases such as MongoDB. Anthropic models remain comprehensively accessible via Bedrock, reflecting strategic collaborations spanning product interfaces to silicon-level optimizations.

Analytical Implications and Strategic Trajectories for Enterprise AI Deployment

The methodological synthesis presented—encompassing streaming orchestration, managed model accessibility, and safety-oriented language processing—fundamentally reconfigures retrieval-augmented generation from static knowledge injection to dynamic reasoning augmentation. This evolution proves indispensable for domains requiring precise interpretation, such as regulatory compliance or legal analysis.

Strategic ramifications are profound. Organizations unlock domain-specific differentiation by leveraging proprietary datasets within real-time contexts, achieving decision-making superiority unattainable through generic models alone. Governance frameworks scale securely, accommodating enterprise-grade requirements without sacrificing velocity.

Persistent challenges, including data provenance assurance and model drift mitigation, necessitate ongoing refinement protocols. Future pathways envision declarative inference paradigms wherein prompts and policies are codified as infrastructure, alongside hybrid architectures merging vector search with continuous streaming for anticipatory intelligence.

Links:

[GoogleIO2024] Under the Hood with Google AI: Exploring Research, Impact, and Future Horizons

Delving into AI’s foundational elements, Jeff Dean, James Manyika, and Koray Kavukcuoglu, moderated by Laurie Segall, discussed Google’s trajectory. Their dialogue traced historical shifts, current breakthroughs, and societal implications, offering profound perspectives on technology’s evolution.

Tracing AI’s Evolution and Key Milestones

Jeff recounted AI’s journey from rule-based systems to machine learning, highlighting neural networks’ resurgence around 2010 due to computational advances. Early applications at Google, like spelling corrections, paved the way for vision, speech, and language tasks. Koray noted hardware investments’ role in enabling generative methods, transforming content creation across fields.

James emphasized AI’s multiplier effect, reshaping sciences like biology and software development. The panel agreed that multimodal, long-context models like Gemini represent culminations of algorithmic and infrastructural progress, allowing generalization to novel challenges.

Addressing Societal Impacts and Ethical Considerations

James stressed AI’s mirror to humanity, prompting grapples with bias, fairness, and values—issues societies must collectively resolve. Koray advocated responsible deployment, integrating safety from inception through techniques like watermarking and red-teaming. Jeff highlighted balancing innovation with safeguards, ensuring models align with human intent while mitigating harms.

Discussions touched on global accessibility, with efforts to support underrepresented languages and equitable benefits. The leaders underscored collaborative approaches, involving diverse stakeholders to navigate complexities.

Envisioning AI’s Future Applications and Challenges

Koray envisioned AI accelerating healthcare, solving diseases efficiently worldwide. Jeff foresaw enhancements across human endeavors, from education to scientific discovery, if pursued thoughtfully. James hoped AI fosters better humanity, aiding complex problem-solving.

Challenges include advancing agentic systems for multi-step reasoning, improving evaluation beyond benchmarks, and ensuring inclusivity. The panel expressed optimism, viewing AI as an amplifier for positive change when guided responsibly.

Links:

[AWSReInventPartnerSessions2024] Architecting Real-Time Generative AI Applications: A Confluent-AWS-Anthropic Integration Framework

Lecturer

Pascal Vuylsteker serves as Senior Director of Innovation at Confluent, where he pioneers scalable data streaming architectures that underpin enterprise artificial intelligence systems. Mario Rodriguez functions as Senior Partner Solutions Architect at AWS, specializing in generative AI service orchestration across cloud environments. Gavin Doyle leads the Applied AI team at Anthropic, directing development of safe, steerable, and interpretable large language models.

Abstract

This scholarly examination delineates a comprehensive methodology for constructing real-time generative AI applications through the synergistic integration of Confluent’s streaming platform, Amazon Bedrock’s managed foundation model ecosystem, and Anthropic’s Claude models. The analysis elucidates data governance centrality, retrieval-augmented generation (RAG) with continuous contextual synchronization, Flink-mediated inference execution, and vector database orchestration. Through architectural decomposition and configuration exemplars, it demonstrates how these components eliminate data silos, ensure temporal relevance in AI outputs, and enable secure, scalable enterprise innovation.

Governance-Centric Modern Data Architecture

Enterprise competitiveness increasingly hinges upon real-time data streaming capabilities, with seventy-nine percent of IT leaders affirming its strategic necessity. However, persistent barriers—siloed repositories, skill asymmetries, governance complexity, and generative AI’s voracious data requirements—impede realization.

Contemporary data architectures position governance as the foundational core, ensuring security, compliance, and accessibility. Radiating outward are data warehouses, streaming analytics engines, and generative AI applications. This configuration systematically dismantles silos while satisfying instantaneous insight demands.

Confluent operationalizes this vision by providing real-time data integration across ingestion pipelines, data lakes, and batch processing systems. It delivers precisely contextualized information at the moment of need—prerequisite for effective generative AI deployment.

Amazon Bedrock complements this through managed access to foundation models from Anthropic, AI21 Labs, Cohere, Meta, Mistral AI, Stability AI, and Amazon. The service supports experimentation, fine-tuning, continued pre-training, and agent orchestration. Security architecture prohibits customer data incorporation into base models, maintains isolation for customized variants, implements encryption, enforces granular access controls, and complies with HIPAA, GDPR, SOC, ISO, and CSA STAR.

Proprietary data constitutes the primary differentiation vector. Three techniques leverage this advantage: RAG injects external knowledge into prompts; fine-tuning specializes models on domain corpora; continued pre-training expands comprehension using enterprise datasets.

# Bedrock model customization (conceptual)
modelCustomization:
  baseModel: anthropic.claude-3-sonnet
  trainingData: s3://enterprise-corpus/
  fineTuning:
    epochs: 3
    learningRate: 0.0001

Real-Time Contextual Injection and Flink Inference Orchestration

Confluent integrates directly with vector databases, ensuring conversational systems operate upon current, relevant information. This transcends mere data transport to deliver AI-actionable context.

Flink Inference enables real-time machine learning via Flink SQL, dramatically simplifying model integration into operational workflows. Configuration defines endpoints, authentication, prompts, and invocation patterns.

The architectural pipeline commences with document publication to Kafka topics. Documents undergo chunking for parallel processing, embedding generation via Bedrock/Anthropic, and indexing into MongoDB Atlas with original chunks. Quick-start templates deploy this workflow, incorporating structured data summarization through Claude for natural language querying.

Chatbot interactions initiate via API/Kafka, generate embeddings, retrieve documents, construct prompts with streaming context, and invoke Claude. Token optimization employs conversation summarization; enhanced vector queries via Claude-generated reformulations yield superior retrieval.

-- Flink model definition
CREATE MODEL claude_haiku WITH (
  'connector' = 'anthropic',
  'endpoint' = 'https://api.anthropic.com/v1/messages',
  'api.key' = 'sk-ant-...',
  'model' = 'claude-3-haiku-20240307'
);

-- Real-time inference
INSERT INTO responses
SELECT ML_PREDICT('claude_haiku', enriched_prompt) FROM interactions;

Flink provides cost-effective scaling, automatic elasticity, and native integration with AWS services, Snowflake, Databricks, and MongoDB. Anthropic models remain fully accessible via Bedrock.

Strategic Implications for Enterprise AI

The methodology transforms RAG from static knowledge injection to dynamic reasoning augmentation. Contextual retrieval and in-context learning mitigate hallucinations while enabling domain-specific differentiation.

Organizations achieve decision-making superiority through proprietary data in real-time contexts. Governance scales securely; challenges like data drift necessitate continuous refinement.

Future trajectories include declarative inference and hybrid vector-stream architectures for anticipatory intelligence.

Links:

[DevoxxUK2024] Breaking AI: Live Coding and Hacking Applications with Generative AI by Simon Maple and Brian Vermeer

Simon Maple and Brian Vermeer, both seasoned developer advocates with extensive experience at Snyk and other tech firms, delivered an electrifying live coding session at DevoxxUK2024, exploring the double-edged sword of generative AI in software development. Simon, who recently transitioned to a stealth-mode startup, and Brian, a current Snyk advocate, demonstrate how tools like GitHub Copilot and ChatGPT can accelerate coding velocity while introducing significant security risks. Through a live-coded Spring Boot coffee shop application, they expose vulnerabilities such as SQL injection, directory traversal, and cross-site scripting, emphasizing the need for rigorous validation and security practices. Their engaging, demo-driven approach underscores the balance between innovation and caution, offering developers actionable insights for leveraging AI safely.

Accelerating Development with Generative AI

Simon and Brian kick off by highlighting the productivity boost offered by generative AI tools, citing studies that suggest a 55% increase in developer efficiency and a 27% higher likelihood of meeting project goals. They build a Spring Boot application with a Thymeleaf front end, using Copilot to generate a homepage with a banner and product table. The process showcases AI’s ability to rapidly produce code snippets, such as HTML fragments, based on minimal prompts. However, they caution that this speed comes with risks, as AI often prioritizes completion over correctness, potentially embedding vulnerabilities. Their live demo illustrates how Copilot’s suggestions evolve with context, but also how developers must critically evaluate outputs to ensure functionality and security.

Exposing SQL Injection Vulnerabilities

The duo dives into a search functionality for their coffee shop application, where Copilot generates a query to filter products by name or description. However, the initial code concatenates user input directly into an SQL query, creating a classic SQL injection vulnerability. Brian demonstrates an exploit by injecting malicious input to set product prices to zero, highlighting how unchecked AI-generated code can compromise a system. They then refactor the code using prepared statements, showing how parameterization separates user input from the query execution plan, effectively neutralizing the vulnerability. This example underscores the importance of understanding AI outputs and applying secure coding practices, as tools like Copilot may not inherently prioritize security.
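The refactoring shown was Java and JDBC; the same parameterization pattern, sketched in Node.js with the pg driver (table and column names are hypothetical):

import pg from "pg";

const client = new pg.Client(); // connection settings come from the standard PG* environment variables
await client.connect();

// Safe search: the user-supplied term travels as a bound parameter, never as part of the SQL text,
// so input like "'; UPDATE products SET price = 0; --" cannot alter the statement.
async function searchProducts(term) {
  const { rows } = await client.query(
    "SELECT * FROM products WHERE name ILIKE $1 OR description ILIKE $1",
    [`%${term}%`]
  );
  return rows;
}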

Mitigating Directory Traversal Risks

Next, Simon and Brian tackle a profile picture upload feature, where Copilot generates code to save files to a directory. The initial implementation concatenates user-provided file names with a base path, opening the door to directory traversal attacks. Using Burp Suite, they demonstrate how an attacker could overwrite critical files by manipulating the file name with “../” sequences. To address this, they refine the code to normalize paths, ensuring files remain within the intended directory. The session highlights the limitations of AI in detecting complex vulnerabilities like path traversal, emphasizing the need for developer vigilance and tools like Snyk to catch issues early in the development cycle.
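Again the demo was in Java, but the normalization check translates directly. A Node.js sketch with an assumed upload directory:

import path from "node:path";

const uploadDir = path.resolve("uploads");

function safeUploadPath(fileName) {
  // Resolve the requested name against the upload directory, then confirm the result
  // is still inside it - "../" sequences would escape the directory and fail this check.
  const target = path.resolve(uploadDir, fileName);
  if (!target.startsWith(uploadDir + path.sep)) {
    throw new Error("Invalid file name");
  }
  return target;
}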

Addressing Cross-Site Scripting Threats

The final vulnerability explored is cross-site scripting (XSS) in a product page feature. The AI-generated code directly embeds user input (product names) into HTML without sanitization, allowing Brian to inject a malicious script that captures session cookies. They demonstrate both reflective and stored XSS, showing how attackers could exploit these to hijack user sessions. While querying ChatGPT for a code review fails to pinpoint the XSS issue, Simon and Brian advocate for using established libraries like Spring Utils for input sanitization. This segment reinforces the necessity of combining AI tools with robust security practices and automated scanning to mitigate risks that AI might overlook.
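The talk's remedy was an established server-side escaping utility; the same discipline in plain JavaScript is to neutralize markup characters before user text reaches the page. A minimal sketch:

// Replace the characters HTML treats as markup so an injected <script> tag renders as inert text.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

cell.innerHTML = escapeHtml(productName); // or assign to textContent, which never parses HTML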

Balancing Innovation and Security

Throughout the session, Simon and Brian stress that generative AI, while transformative, demands a cautious approach. They liken AI tools to junior developers, capable of producing functional code but requiring oversight to avoid errors or vulnerabilities. Real-world examples, such as a Samsung employee leaking sensitive code via ChatGPT, underscore the risks of blindly trusting AI outputs. They advocate for education, clear guidelines, and security tooling to complement AI-assisted development. By integrating tools like Snyk for vulnerability scanning and fostering a culture of code review, developers can harness AI’s potential while safeguarding their applications against threats.

Links:

[SpringIO2024] Text-to-SQL: Chat with a Database Using Generative AI by Victor Martin & Corrado De Bari @ Spring I/O 2024

At Spring I/O 2024 in Barcelona, Victor Martin, a product manager for Oracle Database, delivered a compelling session on Text-to-SQL, a transformative approach to querying databases using natural language, powered by generative AI. Stepping in for his colleague Corrado De Bari, who was unable to attend, Victor explored how Large Language Models (LLMs), combined with Spring AI and Oracle tools such as Select AI, enable business users with no SQL expertise to interact seamlessly with databases. The talk highlighted practical implementations, security considerations, and emerging technologies like AI Vector Search, offering a glimpse into the future of database interaction.

The Promise of Text-to-SQL

Text-to-SQL leverages LLMs to translate natural language queries into executable SQL, democratizing data access for non-technical users. Victor began by posing a challenge: how long would it take to build a REST endpoint for a business user to query a database using plain text? Traditionally, this task required manual SQL construction, schema validation, and error handling. With modern frameworks like Spring Boot and Oracle’s Select AI, this process is streamlined. Select AI, integrated into Oracle Database 19c and enhanced in 23 AI, supports features like RUN_SQL to execute generated queries, NARRATE to return results as human-readable text, and EXPLAIN_SQL to detail query reasoning. Victor emphasized that these tools reduce development time, enabling rapid deployment of user-friendly database interfaces.

Configuring Oracle Database for Text-to-SQL

Implementing Text-to-SQL requires minimal configuration within Oracle Database. Victor outlined the steps: first, set up an Access Control List (ACL) to allow external LLM calls, specifying the host and port. Next, create credentials for the LLM service (e.g., Oracle Cloud Infrastructure Generative AI, Open AI, or Azure Open AI) using the DBMS_CLOUD_AI package. Finally, define a profile linking the schema, tables, and chosen LLM. This profile is selected per session to ensure queries use the correct context. Victor demonstrated this with a Spring Boot application, where the profile is set before invoking Select AI. The simplicity of this setup, combined with Spring AI’s abstraction, makes it accessible even for developers new to AI-driven database interactions.

Enhancing Queries with Schema Annotations

A key challenge in Text-to-SQL is ensuring LLMs interpret ambiguous schemas correctly. Victor highlighted that table and column names like “C1” or “Table1” can confuse models. To address this, Oracle Database supports annotations—comments on tables and columns that provide business context. For example, annotating a column as “process status” with possible values clarifies its purpose, aiding the LLM in generating accurate joins and filters. These annotations, which don’t affect production queries, are created collaboratively by DBAs and business stakeholders. Victor shared a real-world example from Oracle’s telecom applications, where annotated schemas improved query precision, enabling complex queries without manual intervention.

AI Vector Search: Querying Unstructured Data

Victor introduced AI Vector Search, a cutting-edge feature in Oracle Database 23 AI, which extends Text-to-SQL to unstructured data. Unlike traditional SQL, which queries structured data, vector search encodes text, images, or audio into high-dimensional vectors representing semantic meaning. These vectors, stored as a new VECTOR data type, enable similarity-based queries. For instance, a job search query for “software engineer positions in New York” can combine structured filters (e.g., location) with vector-based matching of job descriptions and resumes. Victor explained how embedding models, deployed via Oracle’s DBMS_DATA_MINING package, generate these vectors, with metrics like cosine similarity determining relevance. This capability opens new use cases, from document search to personalized recommendations.

Links:

[DotAI2024] DotAI 2024: Ines Montani – Crafting Resilient NLP Systems in the Generative Era

Ines Montani, co-founder and CEO of Explosion AI, illuminated the pitfalls and potentials of natural language processing pipelines at DotAI 2024. As a core contributor to spaCy—an open-source NLP powerhouse—and Prodigy, a data annotation suite, Montani champions modular tools that blend human intuition with computational might. Her address critiqued the “prompts suffice” ethos, advocating hybrid architectures that fuse rules, examples, and generative flair for robust, production-viable solutions.

Harmonizing Paradigms for Enduring Intelligence

Montani traced instruction evolution: from rigid rules yielding brittle systems to supervised learning’s nuanced exemplars, now augmented by in-context prompts’ linguistic alchemy. Rules shine in clarity for novices, yet crumble under data flux; examples infuse domain savvy but demand curation toil; prompts democratize prototyping, yet hallucinate sans anchors.

The synergy? Layered pipelines where rules scaffold prompts, examples calibrate outputs, and LLMs infuse creativity. Montani showcased spaCy’s evolution: rule-based tokenizers ensure consistency, while generative components handle ambiguity, like entity resolution in noisy texts. This modularity mitigates drift, preserving fidelity across model swaps.

In industrial extraction—parsing resumes or contracts—Montani stressed data’s primacy: raw inputs reveal logic gaps, prompting refactorings that unearth “window-knocking machines”—flawed proxies mistaking correlation for causation. A chatbot querying calendars, she analogized, falters if oblivious to time zones; true utility demands holistic orchestration.

Fostering Modularity Amid Generative Hype

Montani cautioned against abstraction overload: leaky layers spawn brittle facades, where one-liners unravel on edge cases. Instead, embrace transparency—Prodigy’s active learning loops refine datasets iteratively, blending human oversight with AI proposals to curb over-reliance.

Retrieval-augmented generation (RAG) exemplifies balanced integration: LLMs query structured stores, yielding chat interfaces atop databases, supplanting clunky GUIs. Yet, Montani warned, context dictates efficacy; for analytical dives, raw views trump conversational veils.

Her ethos: interrogate intent—who wields the tool, what risks lurk? Surprise greets data dives, unveiling bespoke logics that generative magic alone can’t conjure. Efficiency, privacy, and modularity—spaCy’s hallmarks—thwart big-tech monoliths, empowering bespoke ingenuity.

In sum, Montani’s blueprint rejects compromise: generative AI amplifies, not supplants, principled engineering, birthing interfaces that endure and elevate.

Links:

[PHPForumParis2023] Experience Report: Building Two Open-Source Personal AIs with OpenAI – Maxime Thoonsen

Maxime Thoonsen, CTO at Theodo, shared an exhilarating session at Forum PHP 2023, detailing his experience building two open-source personal AI applications using OpenAI’s technologies. As an organizer of the Generative AI Paris Meetup, Maxime’s passion for the PHP community and innovative AI solutions shone through. His step-by-step approach demystified AI development, encouraging PHP developers to explore generative AI by demonstrating its simplicity and potential through practical examples.

Understanding Generative AI’s Potential

Maxime began by introducing the capabilities of generative AI, emphasizing its accessibility for PHP developers. He explained how OpenAI’s APIs enable the creation of applications that process and generate human-like text. Drawing from his work at Theodo, Maxime showcased two personal AI projects, illustrating how they leverage semantic search and embeddings to deliver tailored responses. His enthusiasm for the community, where he began his speaking career, underscored the collaborative spirit driving AI innovation.

Practical AI Development with OpenAI

Delving into the technical details, Maxime walked the audience through building AI applications using OpenAI’s APIs. He highlighted the simplicity of implementing semantic search to retrieve relevant data from documents, advising against premature fine-tuning in favor of straightforward similarity searches. Responding to an audience question, Maxime noted the availability of open-source alternatives like Llama and Mistral, though he acknowledged OpenAI’s GPT-4 as a leader in embedding accuracy. His examples empowered developers to start building AI-driven features in their PHP projects.

Navigating the AI Ecosystem

Maxime concluded by addressing the rapidly evolving AI landscape, likening it to the proliferation of JavaScript frameworks. He emphasized the cost-effectiveness of smaller open-source models for specific use cases, while noting OpenAI’s edge in precision. His talk inspired developers to join communities like the Generative AI Paris Meetup to explore AI further, fostering a sense of curiosity and experimentation within the PHP ecosystem.

Links:

[DevoxxBE2023] Build a Generative AI App in Project IDX and Firebase by Prakhar Srivastav

At Devoxx Belgium 2023, Prakhar Srivastav, a software engineer at Google, unveiled the power of Project IDX and Firebase in crafting a generative AI mobile application. His session illuminated how developers can harness these tools to streamline full-stack, multiplatform app development directly from the browser, eliminating cumbersome local setups. Through a live demonstration, Prakhar showcased the creation of “Listed,” a Flutter-based app that leverages Google’s PaLM API to break down user-defined goals into actionable subtasks, offering a practical tool for task management. His engaging presentation, enriched with real-time coding, highlighted the synergy of cloud-based development environments and AI-driven solutions.

Introducing Project IDX: A Cloud-Based Development Revolution

Prakhar introduced Project IDX as a transformative cloud-based development environment designed to simplify the creation of multiplatform applications. Unlike traditional setups requiring hefty binaries like Xcode or Android Studio, Project IDX enables developers to work entirely in the browser. Prakhar demonstrated this by running Android and iOS emulators side-by-side within the browser, showcasing a Flutter app that compiles to multiple platforms—Android, iOS, web, Linux, and macOS—from a single codebase. This eliminates the need for platform-specific configurations, making development accessible even on lightweight devices like Chromebooks.

The live demo featured “Listed,” a mobile app where users input a goal, such as preparing for a tech talk, and receive AI-generated subtasks and tips. For instance, entering “give a tech talk at a conference” yielded steps like choosing a relevant topic and practicing the presentation, with a tip to have a backup plan for technical issues. Prakhar’s real-time tweak—changing the app’s color scheme from green to red—illustrated the iterative development flow, where changes are instantly reflected in the emulator, enhancing productivity and experimentation.

Harnessing the PaLM API for Generative AI

Central to the app’s functionality is Google’s PaLM API, which Prakhar utilized to integrate generative AI capabilities. He explained that large language models (LLMs), like those powering the PaLM API, act as sophisticated autocomplete systems, predicting likely text outputs based on extensive training data. For “Listed,” the text API was chosen for its suitability in single-turn interactions, such as generating subtasks from a user’s query. Prakhar emphasized the importance of crafting effective prompts, comparing a vague prompt like “the sky is” to a precise one like “complete the sentence: the sky is,” which yields more relevant results.

To enhance the AI’s output, Prakhar employed few-shot prompting, providing the model with examples of desired responses. For instance, for the query “go camping,” the prompt included sample subtasks like choosing a campsite and packing meals, along with a tip about wildlife safety. This structured approach ensured the model generated contextually accurate and actionable suggestions, making the app intuitive for users tackling complex tasks.
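Because a few-shot prompt is just structured text, the technique is easy to sketch outside Dart as well. A JavaScript illustration with made-up wording (not the app's actual prompt):

// Few-shot prompt: fixed instructions, one or more worked examples, then the user's goal.
const examples = [
  {
    goal: "go camping",
    subtasks: ["Choose a campsite", "Book the pitch", "Plan and pack meals"],
    tip: "Store food securely so it does not attract wildlife.",
  },
];

function buildPrompt(userGoal) {
  const shots = examples
    .map((e) => `Goal: ${e.goal}\nSubtasks: ${e.subtasks.join("; ")}\nTip: ${e.tip}`)
    .join("\n\n");
  return `Break the goal into subtasks and give one practical tip.\n\n${shots}\n\nGoal: ${userGoal}\nSubtasks:`;
}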

Securing AI Integration with Firebase Extensions

Integrating the PaLM API into a mobile app poses security challenges, particularly around API key exposure. Prakhar addressed this by leveraging Firebase Extensions, which provide pre-packaged solutions to streamline backend integration. Specifically, he used a Firebase Extension to securely call the PaLM API via Cloud Functions, avoiding the need to embed sensitive API keys in the client-side Flutter app. This setup not only enhances security but also simplifies infrastructure management, as the extension handles logging, monitoring, and optional AppCheck for client verification.

In the live demo, Prakhar navigated the Firebase Extensions Marketplace, selecting the “Call PaLM API Securely” extension. With a few clicks, he deployed Cloud Functions that exposed a POST API for sending prompts and receiving AI-generated responses. The code walkthrough revealed a straightforward implementation in Dart, where the app constructs a JSON payload with the prompt, model name (text-bison-001), and temperature (0.25 for deterministic outputs), ensuring seamless and secure communication with the backend.

Building the Flutter App: Simplicity and Collaboration

The Flutter app’s architecture, built within Project IDX, was designed for simplicity and collaboration. Prakhar walked through the main.dart file, which scaffolds the app’s UI with a material-themed interface, an input field for user queries, and a list to display AI-generated tasks. The app uses anonymous Firebase authentication to secure backend calls without requiring user logins, enhancing accessibility. A PromptBuilder class dynamically constructs prompts by combining predefined prefixes and examples, ensuring flexibility in handling varied user inputs.

Project IDX’s integration with Visual Studio Code’s open-source framework added collaborative features. Prakhar demonstrated how developers can invite colleagues to a shared workspace, enabling real-time collaboration. Additionally, the IDE’s AI capabilities allow users to explain selected code or generate new snippets, streamlining development. For instance, selecting the PromptBuilder class and requesting an explanation provided detailed insights into its parameters, showcasing how Project IDX enhances developer productivity.

Links: