[SpringIO2025] Real-World AI Patterns with Spring AI and Vaadin by Marcus Hellberg / Thomas Vitale
Lecturers
Marcus Hellberg is the Vice President of AI Research at Vaadin, a company specializing in tools for Java developers to build web applications. As a Java Champion with nearly 20 years of experience in Java and web development, he focuses on integrating AI capabilities into Java ecosystems. Thomas Vitale is a software engineer at Systematic, a Danish software company, with expertise in cloud-native solutions, Java, and AI. He is the author of “Cloud Native Spring in Action” and an upcoming book on developer experience on Kubernetes, and serves as a CNCF Ambassador.
- Marcus Hellberg on LinkedIn
- Marcus Hellberg on GitHub
- Thomas Vitale on LinkedIn
- Thomas Vitale on GitHub
Abstract
This article examines practical patterns for incorporating artificial intelligence into Java applications using Spring AI and Vaadin, transitioning from experimental to production-ready implementations. It analyzes techniques for memory management, guardrails, multimodality, retrieval-augmented generation, tool calling, and agents, with implications for security, user experience, and system integration. Insights emphasize robust, observable AI workflows in on-premises or cloud environments.
Memory Management and Streaming in AI Interactions
Integrating large language models (LLMs) into applications requires addressing their stateless nature, where each interaction lacks inherent context from prior exchanges. Spring AI provides advisors—interceptor-like mechanisms—to augment prompts with conversation history, enabling short-term memory. For instance, a MessageChatMemoryAdvisor retains the last N messages, ensuring continuity without manual tracking.
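The advisor setup can be sketched as follows. This assumes Spring AI's 1.x API and an injected `ChatClient.Builder`; the window size is illustrative:

```java
// Sketch: a ChatClient whose advisor keeps a sliding window of recent messages,
// giving the stateless model short-term memory. Values here are illustrative.
ChatMemory chatMemory = MessageWindowChatMemory.builder()
        .maxMessages(20) // retain the last N messages per conversation
        .build();

ChatClient chatClient = builder
        .defaultAdvisors(MessageChatMemoryAdvisor.builder(chatMemory).build())
        .build();
```

With the advisor in place, each prompt is automatically augmented with the retained conversation history before it reaches the model.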
This pattern enhances chat-based interfaces, built here with Vaadin’s component model for server-side Java UIs. A vertical layout hosts the message list and input, and an injected `ChatClient.Builder` constructs clients with advisors. A basic interaction prompts the model and appends the full response; streaming via reactive fluxes improves responsiveness by subscribing to the token stream and updating the UI progressively.
Code illustration:
ChatClient chatClient = builder.build();
messageInput.addSubmitListener(submitEvent -> {
    String message = submitEvent.getValue();
    messageList.addMessage("You", message);
    // Append streamed tokens to a separate assistant message, not the user's item.
    // addMessage/appendText are demo helpers; stock Vaadin MessageList differs.
    MessageItem botItem = messageList.addMessage("Assistant", "");
    UI ui = UI.getCurrent();
    chatClient.prompt()
            .user(message)
            .stream()
            .content() // Flux<String> of response tokens
            // UI updates from the reactive thread must go through ui.access (requires @Push)
            .subscribe(token -> ui.access(() -> botItem.appendText(token)));
});
Streaming suits verbose responses, reducing perceived latency, while observability integrations (e.g., OpenTelemetry) trace interactions for debugging nondeterministic behaviors.
Guardrails for Security and Validation
AI workflows must mitigate risks like sensitive data leaks or invalid outputs. Input guardrails intercept prompts, using on-premises models to check for compliance with policies, blocking unauthorized queries (e.g., personal information). Output guardrails validate responses, reprompting for corrections if deserialization fails.
Advisors enable this: a default advisor with a local chat model filters inputs/outputs. For example, querying an address might be blocked if flagged, preventing cloud exposure. This ensures determinism in structured outputs, converting unstructured text to Java objects via JSON instructions.
Implications include privacy preservation in regulated sectors and integration with Spring Security for role-based tool access.
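The input-guardrail idea can be illustrated framework-agnostically. This minimal sketch blocks prompts that appear to contain personal information before they reach a cloud model; the patterns and policy keywords are assumptions, not from the talk:

```java
import java.util.List;
import java.util.regex.Pattern;

// Minimal, framework-agnostic input guardrail: reject prompts that appear
// to contain personal information. Patterns are illustrative examples only.
class InputGuardrail {
    private static final List<Pattern> BLOCKED = List.of(
        Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b"),        // SSN-like number
        Pattern.compile("\\b[\\w.+-]+@[\\w-]+\\.[\\w.]+\\b"), // email address
        Pattern.compile("(?i)home address")                   // policy keyword
    );

    // Returns true only if no blocked pattern occurs in the prompt.
    static boolean isAllowed(String prompt) {
        return BLOCKED.stream().noneMatch(p -> p.matcher(prompt).find());
    }
}
```

In a Spring AI application, the same check would sit inside an advisor so it runs on every request; a local on-premises model can replace the regexes for semantic policy checks.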
Multimodality and Retrieval-Augmented Generation
LLMs extend beyond text through multimodality, processing images, audio, or videos. Spring AI’s entity methods augment prompts for structured extraction, e.g., parsing attendee details from images into tables for programmatic use.
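Structured extraction from an image can be sketched with Spring AI's `entity` support; the `Attendee` record, its field names, and `imageResource` are illustrative assumptions:

```java
// Illustrative record for structured output; field names are assumptions.
record Attendee(String name, String company) {}

// The user message combines text with image media; entity() instructs the
// model to return JSON and deserializes it into Java objects.
List<Attendee> attendees = chatClient.prompt()
        .user(u -> u.text("List the attendees shown in this photo")
                    .media(MimeTypeUtils.IMAGE_PNG, imageResource))
        .call()
        .entity(new ParameterizedTypeReference<List<Attendee>>() {});
```

The resulting list can be bound directly to a Vaadin grid or table for programmatic use.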
Retrieval-augmented generation (RAG) combats hallucinations by embedding external data as vectors in stores like PostgreSQL. A RetrievalAugmentationAdvisor retrieves relevant documents via similarity search, augmenting prompts. Customizations allow empty contexts for fallback to model knowledge.
Example:
// Assumes a VectorStore bean (e.g. backed by PostgreSQL/pgvector) is available
RetrievalAugmentationAdvisor advisor = RetrievalAugmentationAdvisor.builder()
        .documentRetriever(VectorStoreDocumentRetriever.builder()
                .vectorStore(vectorStore)
                .similarityThreshold(0.50) // drop weakly related documents
                .build())
        .queryAugmenter(ContextualQueryAugmenter.builder()
                .allowEmptyContext(true)   // fall back to model knowledge
                .build())
        .build();
This pattern grounds responses in proprietary data, with thresholds controlling retrieval scope.
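To make the similarity search behind retrieval concrete, here is a minimal in-memory sketch, not Spring AI's implementation: documents and the query are embedding vectors, and cosine similarity above a threshold decides what is retrieved.

```java
import java.util.*;

// Minimal in-memory illustration of the similarity search behind RAG retrieval.
// Real deployments delegate this to a vector store such as PostgreSQL/pgvector.
class SimilaritySearch {
    // Cosine similarity between two embedding vectors of equal length.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Return documents whose similarity to the query meets the threshold, best first.
    static List<String> retrieve(double[] query, Map<String, double[]> docs, double threshold) {
        return docs.entrySet().stream()
            .filter(e -> cosine(query, e.getValue()) >= threshold)
            .sorted((x, y) -> Double.compare(
                cosine(query, y.getValue()), cosine(query, x.getValue())))
            .map(Map.Entry::getKey)
            .toList();
    }
}
```

The threshold plays the same role as `similarityThreshold` in the advisor above: raising it narrows retrieval to closely matching documents, lowering it widens the context.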
Tool Calling, Agents, and Dynamic Integrations
Tool calling empowers LLMs as agents, invoking external functions for tasks like database queries. Annotations describe tools, passed to clients for dynamic selection. For products, a service might expose query/update methods:
@Tool(description = "Fetch products from the database, optionally filtered by category")
public List<Product> getProducts(@ToolParam(description = "Category filter") String category) {
    // Delegate to the data layer; productRepository is an illustrative bean
    return productRepository.findByCategory(category);
}
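The annotated object is then registered on the client, letting the model decide when to invoke it. A sketch assuming Spring AI's `ChatClient` API; `ProductTools` is a hypothetical bean holding the annotated method:

```java
// Sketch: registering an annotated tool object with the ChatClient.
// ProductTools is a hypothetical bean containing @Tool-annotated methods.
String answer = chatClient.prompt()
        .user("Which products are in the 'office' category?")
        .tools(new ProductTools()) // model may call getProducts(...) as needed
        .call()
        .content();
```

The model sees the tool descriptions, chooses whether to call them, and the framework routes the call and feeds the result back into the conversation.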
Agents orchestrate multiple tools, potentially via the Model Context Protocol (MCP) for external services. Demonstrations included generating a UI theme from a screenshot and editing CSS through file-system tools, highlighting the models’ nondeterminism and the need for safeguards.
In conclusion, these patterns enable production AI, emphasizing modularity, security, and observability for robust Java applications.