Posts Tagged ‘RAG’
[SpringIO2025] Real-World AI Patterns with Spring AI and Vaadin by Marcus Hellberg / Thomas Vitale
Lecturers
Marcus Hellberg is the Vice President of AI Research at Vaadin, a company specializing in tools for Java developers to build web applications. As a Java Champion with nearly 20 years of experience in Java and web development, he focuses on integrating AI capabilities into Java ecosystems. Thomas Vitale is a software engineer at Systematic, a Danish software company, with expertise in cloud-native solutions, Java, and AI. He is the author of “Cloud Native Spring in Action” and an upcoming book on developer experience on Kubernetes, and serves as a CNCF Ambassador.
- Marcus Hellberg on LinkedIn
- Marcus Hellberg on GitHub
- Thomas Vitale on LinkedIn
- Thomas Vitale on GitHub
Abstract
This article examines practical patterns for incorporating artificial intelligence into Java applications using Spring AI and Vaadin, transitioning from experimental to production-ready implementations. It analyzes techniques for memory management, guardrails, multimodality, retrieval-augmented generation, tool calling, and agents, with implications for security, user experience, and system integration. Insights emphasize robust, observable AI workflows in on-premises or cloud environments.
Memory Management and Streaming in AI Interactions
Integrating large language models (LLMs) into applications requires addressing their stateless nature, where each interaction lacks inherent context from prior exchanges. Spring AI provides advisors—interceptor-like mechanisms—to augment prompts with conversation history, enabling short-term memory. For instance, a MessageChatMemoryAdvisor retains the last N messages, ensuring continuity without manual tracking.
This pattern enhances user interactions in chat-based interfaces, built here with Vaadin’s component model for server-side Java UIs. A vertical layout hosts message lists and inputs, injecting a ChatClientBuilder to construct clients with advisors. Basic interactions involve prompting the model and appending responses, but for realism, streaming via reactive fluxes improves responsiveness, subscribing to token streams and updating UI progressively.
Code illustration:
ChatClient chatClient = builder.build();
messageInput.addSubmitListener(submitEvent -> {
    String message = submitEvent.getMessage();
    messageList.addMessage("You", message);
    // Stream the reply into a separate assistant item, not the user's message
    MessageItem assistantItem = messageList.addMessage("Assistant", "");
    UI ui = UI.getCurrent();
    chatClient.prompt()
            .user(message)
            .stream()
            .content()
            .subscribe(token ->
                    // Vaadin UI updates from a reactive stream go through ui.access
                    ui.access(() -> assistantItem.append(token)));
});
Streaming suits verbose responses, reducing perceived latency, while observability integrations (e.g., OpenTelemetry) trace interactions for debugging nondeterministic behaviors.
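The short-term memory provided by MessageChatMemoryAdvisor amounts to a sliding window over the conversation. A framework-free sketch of that idea (class and method names here are illustrative, not Spring AI's):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Keeps only the last N messages, mirroring the advisor's last-N window.
public class SlidingWindowMemory {
    private final int capacity;
    private final Deque<String> messages = new ArrayDeque<>();

    public SlidingWindowMemory(int capacity) {
        this.capacity = capacity;
    }

    public void add(String message) {
        if (messages.size() == capacity) {
            messages.removeFirst(); // evict the oldest message
        }
        messages.addLast(message);
    }

    // The messages that would be prepended to the next prompt.
    public List<String> window() {
        return List.copyOf(messages);
    }
}
```

In the real advisor this window is prepended to each outgoing prompt before it reaches the model, which is what gives the stateless LLM its apparent continuity.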
Guardrails for Security and Validation
AI workflows must mitigate risks like sensitive data leaks or invalid outputs. Input guardrails intercept prompts, using on-premises models to check for compliance with policies, blocking unauthorized queries (e.g., personal information). Output guardrails validate responses, reprompting for corrections if deserialization fails.
Advisors enable this: a default advisor with a local chat model filters inputs/outputs. For example, querying an address might be blocked if flagged, preventing cloud exposure. This ensures determinism in structured outputs, converting unstructured text to Java objects via JSON instructions.
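The input-guardrail flow can be sketched without a framework: before a prompt leaves the application, a local check decides whether it may pass. The sketch below substitutes a simple pattern-based screen for the on-premises model used in the talk; the class name and patterns are illustrative:

```java
import java.util.regex.Pattern;

public class InputGuardrail {
    // Simplified stand-in for a local-model compliance check:
    // flag prompts that appear to contain personal data.
    private static final Pattern EMAIL =
            Pattern.compile("[\\w.+-]+@[\\w-]+\\.[\\w.]+");
    private static final Pattern SSN =
            Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");

    public static boolean isAllowed(String prompt) {
        return !EMAIL.matcher(prompt).find() && !SSN.matcher(prompt).find();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("What is the capital of Denmark?")); // true
        System.out.println(isAllowed("Email john@example.com the file")); // false
    }
}
```

A production guardrail would delegate this decision to a local model via an advisor, but the control flow is the same: blocked prompts never reach the cloud endpoint.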
Implications include privacy preservation in regulated sectors and integration with Spring Security for role-based tool access.
Multimodality and Retrieval-Augmented Generation
LLMs extend beyond text through multimodality, processing images, audio, or videos. Spring AI’s entity methods augment prompts for structured extraction, e.g., parsing attendee details from images into tables for programmatic use.
Retrieval-augmented generation (RAG) combats hallucinations by embedding external data as vectors in stores like PostgreSQL. A RetrievalAugmentationAdvisor retrieves relevant documents via similarity search, augmenting prompts. Customizations allow empty contexts for fallback to model knowledge.
Example:
VectorStore vectorStore = pgVectorStore; // an injected PgVectorStore bean
DocumentRetriever retriever = VectorStoreDocumentRetriever.builder()
        .vectorStore(vectorStore)
        .similarityThreshold(0.5)
        .build();
RetrievalAugmentationAdvisor advisor = RetrievalAugmentationAdvisor.builder()
        .documentRetriever(retriever)
        .queryAugmenter(ContextualQueryAugmenter.builder()
                .allowEmptyContext(true)
                .build())
        .build();
This pattern grounds responses in proprietary data, with thresholds controlling retrieval scope.
Tool Calling, Agents, and Dynamic Integrations
Tool calling empowers LLMs as agents, invoking external functions for tasks like database queries. Annotations describe tools, passed to clients for dynamic selection. For products, a service might expose query/update methods:
@Tool(description = "Fetch products from the database")
public List<Product> getProducts(
        @ToolParam(description = "Category filter") String category) {
    // Database query
}
Agents orchestrate tools, potentially via the Model Context Protocol for external services. Demonstrations included generating a theme from a screenshot and editing CSS through file system tools, highlighting both the nondeterminism of such workflows and the need for safeguards.
In conclusion, these patterns enable production AI, emphasizing modularity, security, and observability for robust Java applications.
[DotJs2025] Prompting is the New Scripting: Meet GenAIScript
As generative AI spreads, plain scripting struggles to keep pace with it. Yohan Lasorsa, principal developer advocate at Microsoft and an Angular GDE, presented GenAIScript at dotJS 2025: a JavaScript-based toolkit that wraps the complexity of working with LLMs in a small, familiar API. Drawing on 15 years of experience ranging from IoT to cloud, Yohan compared the moment to jQuery's heyday: just as jQuery tamed browser inconsistencies, GenAIScript aims to tame today's proliferation of models and APIs for everyday developers.
He recalled how jQuery ended browser fragmentation and simplified event handling; twenty years later, generative AI poses a similar problem, with models multiplying and APIs diverging. GenAIScript answers with a thin JavaScript surface: the talk showed calls along the lines of await ai.chat('prompt') for conversation and ai.forEach(items, 'summarize') for batch summarization. Demos included reading files from the workspace (fs.readFile), chaining prompts into pipelines (ai.pipe(model).chat(query)), and AST-driven refactoring of Angular code, replacing manual CLI work with semantic transformations.
The same surface extends to agents (ai.agent({tools})), retrieval-augmented generation (ai.retrieve({query, store})), and vision (ai.vision(image)). The emphasis throughout is ergonomics: built-in support for providers such as Bedrock and Ollama, plus extensibility through plugins. Yohan's caveat: GenAIScript is a tool for scripting and experimentation rather than large applications, though frameworks may well grow out of it.
The takeaway: GenAIScript makes prompting scriptable without the usual friction, lowering the barrier to building with AI in JavaScript.
jQuery’s Echo in AI’s Era
Yohan drew a direct parallel between jQuery's smoothing of browser quirks and today's sprawl of models and APIs: GenAIScript plays the same unifying role, giving chat and iteration a consistent JavaScript interface.
Patterns’ Parade and Potentials
The patterns on display spanned agents, RAG retrieval, prompt pipelines, and vision. Yohan's examples included Angular migrations and a Bedrock integration, with a plugin system that leaves room for the ecosystem to grow.
[NDCMelbourne2025] How to Work with Generative AI in JavaScript – Phil Nash
Phil Nash, a developer relations engineer at DataStax, delivers a comprehensive guide to leveraging generative AI in JavaScript at NDC Melbourne 2025. His talk demystifies the process of building AI-powered applications, emphasizing that JavaScript developers can harness existing skills to create sophisticated solutions without needing deep machine learning expertise. Through practical examples and insights into tools like Gemini and retrieval-augmented generation (RAG), Phil empowers developers to explore this rapidly evolving field.
Understanding Generative AI Fundamentals
Phil begins by addressing the excitement surrounding generative AI, noting its accessibility since the release of the GPT-3.5 API two years ago. He emphasizes that JavaScript developers are well-positioned to engage with AI due to robust tooling and APIs, despite the field’s Python-centric origins. Using Google’s Gemini model as an example, Phil demonstrates how to generate content with minimal code, highlighting the importance of understanding core concepts like token generation and model behavior.
He explains tokenization, using OpenAI’s byte pair encoding as an example, where text is broken into probabilistic tokens. Parameters like top-k, top-p, and temperature allow developers to control output randomness, with Phil cautioning against overly high settings that produce nonsensical results, humorously illustrated by a chaotic AI-generated story about a gnome.
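Temperature's effect can be made concrete in a few lines (shown here in Java for consistency with the other snippets on this page): dividing the model's raw scores (logits) by the temperature before applying the softmax sharpens the distribution when the temperature is low and flattens it toward uniform when it is high.

```java
public class TemperatureSampling {
    // Softmax over logits scaled by temperature.
    public static double[] softmax(double[] logits, double temperature) {
        double[] probs = new double[logits.length];
        double max = Double.NEGATIVE_INFINITY;
        for (double l : logits) max = Math.max(max, l / temperature);
        double sum = 0;
        for (int i = 0; i < logits.length; i++) {
            // Subtract the max for numerical stability.
            probs[i] = Math.exp(logits[i] / temperature - max);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) probs[i] /= sum;
        return probs;
    }

    public static void main(String[] args) {
        double[] logits = {2.0, 1.0, 0.5};
        // Low temperature: nearly all probability mass on the top token.
        System.out.println(softmax(logits, 0.1)[0]);
        // High temperature: distribution approaches uniform.
        System.out.println(softmax(logits, 10.0)[0]);
    }
}
```

Sampling from the low-temperature distribution is nearly deterministic; sampling from the high-temperature one produces the kind of incoherent output Phil's gnome story illustrates.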
Enhancing AI with Prompt Engineering
Prompt engineering emerges as a critical skill for refining AI outputs. Phil contrasts zero-shot prompting, which offers minimal context, with techniques like providing examples or system prompts to guide model behavior. For instance, a system prompt defining a “capital city assistant” ensures concise, accurate responses. He also explores chain-of-thought prompting, where instructing the model to think step-by-step improves its ability to solve complex problems, such as a modified river-crossing riddle.
Phil underscores the need for evaluation to ensure prompt reliability, as slight changes can significantly alter outcomes. This structured approach transforms prompt engineering from guesswork into a disciplined practice, enabling developers to tailor AI responses effectively.
Retrieval-Augmented Generation for Contextual Awareness
To address AI models’ limitations, such as outdated or private data, Phil introduces retrieval-augmented generation (RAG). RAG enhances models by integrating external data, like conference talk descriptions, into prompts. He explains how vector embeddings—multidimensional representations of text—enable semantic searches, using cosine similarity to find relevant content. With DataStax’s Astra DB, developers can store and query vectorized data efficiently, as demonstrated in a demo where Phil’s bot retrieves details about NDC Melbourne talks.
This approach allows AI to provide contextually relevant answers, such as identifying AI-related talks or conference events, making it a powerful tool for building intelligent applications.
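The cosine similarity underlying these semantic searches is straightforward to compute: the dot product of two vectors divided by the product of their magnitudes, giving 1.0 for identical directions and values near 0 for unrelated ones. A minimal version (in Java, for consistency with the other snippets on this page):

```java
public class CosineSimilarity {
    // Cosine of the angle between two equal-length vectors.
    public static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Vector databases such as Astra DB run this comparison (or an equivalent metric) at scale over indexed embeddings rather than in application code.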
Streaming Responses and Building Agents
Phil highlights the importance of user experience, noting that AI responses can be slow. Streaming, supported by APIs like Gemini’s generateContentStream, delivers tokens incrementally, improving perceived performance. He demonstrates streaming results to a webpage using JavaScript’s fetch and text decoder streams, showcasing how to create responsive front-end experiences.
The talk culminates with AI agents, which Phil describes as systems that perceive, reason, plan, and act using tools. By defining functions in JSON schema, developers can enable models to perform tasks like arithmetic or fetching web content. A demo bot uses tools to troubleshoot a keyboard issue and query GitHub, illustrating agents’ potential to solve complex problems dynamically.
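On the application side, tool calling reduces to a name-to-function dispatch: the model returns a tool name with arguments, and the host invokes the matching function and feeds the result back. A simplified, model-free sketch (in Java; the tool names are illustrative):

```java
import java.util.Map;
import java.util.function.Function;

public class ToolDispatcher {
    // Registered tools the model may choose from, keyed by name.
    private static final Map<String, Function<String, String>> TOOLS = Map.of(
            "add", args -> {
                String[] parts = args.split(",");
                return String.valueOf(
                        Integer.parseInt(parts[0].trim())
                        + Integer.parseInt(parts[1].trim()));
            },
            "echo", args -> args
    );

    // Invoke the tool the model selected; in a real agent loop the
    // result is returned to the model for its next reasoning step.
    public static String dispatch(String toolName, String args) {
        Function<String, String> tool = TOOLS.get(toolName);
        if (tool == null) {
            return "unknown tool: " + toolName;
        }
        return tool.apply(args);
    }
}
```

The JSON schema Phil describes is what tells the model which names and argument shapes exist; the dispatch itself is this ordinary lookup.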
Conclusion: Empowering JavaScript Developers
Phil concludes by encouraging developers to experiment with generative AI, leveraging tools like Langflow for visual prototyping and exploring browser-based models like Gemini Nano. His talk is a call to action, urging JavaScript developers to build innovative applications by combining AI capabilities with their existing expertise. By mastering prompt engineering, RAG, streaming, and agents, developers can create powerful, user-centric solutions.
[AWSReInventPartnerSessions2024] Architecting Real-Time Generative AI Applications: A Confluent-AWS-Anthropic Integration Framework
Lecturers
Pascal Vuylsteker serves as Senior Director of Innovation at Confluent, where he pioneers scalable data streaming architectures that underpin enterprise artificial intelligence systems. Mario Rodriguez is a Senior Partner Solutions Architect at AWS, specializing in generative AI service orchestration across cloud environments. Gavin Doyle leads the Applied AI team at Anthropic, directing development of safe, steerable, and interpretable large language models.
Abstract
This scholarly examination delineates a comprehensive methodology for constructing real-time generative AI applications through the synergistic integration of Confluent’s streaming platform, Amazon Bedrock’s managed foundation model ecosystem, and Anthropic’s Claude models. The analysis elucidates data governance centrality, retrieval-augmented generation (RAG) with continuous contextual synchronization, Flink-mediated inference execution, and vector database orchestration. Through architectural decomposition and configuration exemplars, it demonstrates how these components eliminate data silos, ensure temporal relevance in AI outputs, and enable secure, scalable enterprise innovation.
Governance-Centric Modern Data Architecture
Enterprise competitiveness increasingly hinges upon real-time data streaming capabilities, with seventy-nine percent of IT leaders affirming its strategic necessity. However, persistent barriers—siloed repositories, skill asymmetries, governance complexity, and generative AI’s voracious data requirements—impede realization.
Contemporary data architectures position governance as the foundational core, ensuring security, compliance, and accessibility. Radiating outward are data warehouses, streaming analytics engines, and generative AI applications. This configuration systematically dismantles silos while satisfying instantaneous insight demands.
Confluent operationalizes this vision by providing real-time data integration across ingestion pipelines, data lakes, and batch processing systems. It delivers precisely contextualized information at the moment of need—prerequisite for effective generative AI deployment.
Amazon Bedrock complements this through managed access to foundation models from Anthropic, AI21 Labs, Cohere, Meta, Mistral AI, Stability AI, and Amazon. The service supports experimentation, fine-tuning, continued pre-training, and agent orchestration. Security architecture prohibits customer data incorporation into base models, maintains isolation for customized variants, implements encryption, enforces granular access controls, and complies with HIPAA, GDPR, SOC, ISO, and CSA STAR.
Proprietary data constitutes the primary differentiation vector. Three techniques leverage this advantage: RAG injects external knowledge into prompts; fine-tuning specializes models on domain corpora; continued pre-training expands comprehension using enterprise datasets.
# Bedrock model customization (conceptual)
modelCustomization:
  baseModel: anthropic.claude-3-sonnet
  trainingData: s3://enterprise-corpus/
  fineTuning:
    epochs: 3
    learningRate: 0.0001
Real-Time Contextual Injection and Flink Inference Orchestration
Confluent integrates directly with vector databases, ensuring conversational systems operate upon current, relevant information. This transcends mere data transport to deliver AI-actionable context.
Flink Inference enables real-time machine learning via Flink SQL, dramatically simplifying model integration into operational workflows. Configuration defines endpoints, authentication, prompts, and invocation patterns.
The architectural pipeline commences with document publication to Kafka topics. Documents undergo chunking for parallel processing, embedding generation via Bedrock/Anthropic, and indexing into MongoDB Atlas with original chunks. Quick-start templates deploy this workflow, incorporating structured data summarization through Claude for natural language querying.
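The chunking step in that pipeline can be illustrated with a fixed-size splitter with overlap, a common default before embedding (sizes here are illustrative; production systems often split on sentence or token boundaries instead):

```java
import java.util.ArrayList;
import java.util.List;

public class Chunker {
    // Split text into fixed-size chunks with overlap, so that context
    // spanning a chunk boundary appears in both neighboring chunks.
    public static List<String> chunk(String text, int size, int overlap) {
        if (size <= overlap) {
            throw new IllegalArgumentException("size must exceed overlap");
        }
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < text.length(); start += size - overlap) {
            int end = Math.min(start + size, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break;
        }
        return chunks;
    }
}
```

Each chunk is then embedded independently and indexed alongside its original text, which is what the similarity search later retrieves.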
Chatbot interactions initiate via API/Kafka, generate embeddings, retrieve documents, construct prompts with streaming context, and invoke Claude. Token optimization employs conversation summarization; enhanced vector queries via Claude-generated reformulations yield superior retrieval.
-- Flink model definition
CREATE MODEL claude_haiku WITH (
    'connector' = 'anthropic',
    'endpoint' = 'https://api.anthropic.com/v1/messages',
    'api.key' = 'sk-ant-...',
    'model' = 'claude-3-haiku-20240307'
);

-- Real-time inference
INSERT INTO responses
SELECT ML_PREDICT('claude_haiku', enriched_prompt) FROM interactions;
Flink provides cost-effective scaling, automatic elasticity, and native integration with AWS services, Snowflake, Databricks, and MongoDB. Anthropic models remain fully accessible via Bedrock.
Strategic Implications for Enterprise AI
The methodology transforms RAG from static knowledge injection to dynamic reasoning augmentation. Contextual retrieval and in-context learning mitigate hallucinations while enabling domain-specific differentiation.
Organizations achieve decision-making superiority through proprietary data in real-time contexts. Governance scales securely; challenges like data drift necessitate continuous refinement.
Future trajectories include declarative inference and hybrid vector-stream architectures for anticipatory intelligence.