Posts Tagged ‘SpringAI’

[SpringIO2025] Real-World AI Patterns with Spring AI and Vaadin by Marcus Hellberg / Thomas Vitale

Lecturer

Marcus Hellberg is the Vice President of AI Research at Vaadin, a company specializing in tools for Java developers to build web applications. As a Java Champion with nearly 20 years of experience in Java and web development, he focuses on integrating AI capabilities into Java ecosystems. Thomas Vitale is a software engineer at Systematic, a Danish software company, with expertise in cloud-native solutions, Java, and AI. He is the author of “Cloud Native Spring in Action” and an upcoming book on developer experience on Kubernetes, and serves as a CNCF Ambassador.

Abstract

This article examines practical patterns for incorporating artificial intelligence into Java applications using Spring AI and Vaadin, transitioning from experimental to production-ready implementations. It analyzes techniques for memory management, guardrails, multimodality, retrieval-augmented generation, tool calling, and agents, with implications for security, user experience, and system integration. Insights emphasize robust, observable AI workflows in on-premises or cloud environments.

Memory Management and Streaming in AI Interactions

Integrating large language models (LLMs) into applications requires addressing their stateless nature: each interaction lacks inherent context from prior exchanges. Spring AI provides advisors, interceptor-like mechanisms that augment prompts with conversation history and so give the model short-term memory. For instance, a MessageChatMemoryAdvisor backed by a windowed chat memory replays the last N messages with each request, ensuring continuity without manual tracking.
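
The "last N messages" idea can be pictured without any framework. The following is a minimal plain-Java sketch, not Spring AI's implementation; the class WindowedChatMemory and its methods are hypothetical names introduced only for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration type; not part of Spring AI.
final class WindowedChatMemory {
    record Message(String role, String text) {}

    private final int maxMessages;
    private final Deque<Message> window = new ArrayDeque<>();

    WindowedChatMemory(int maxMessages) {
        if (maxMessages < 1) {
            throw new IllegalArgumentException("maxMessages must be >= 1");
        }
        this.maxMessages = maxMessages;
    }

    // Record a message, evicting the oldest one once the window is full.
    void add(String role, String text) {
        if (window.size() == maxMessages) {
            window.removeFirst();
        }
        window.addLast(new Message(role, text));
    }

    // Build what would be sent to the stateless model:
    // the retained history followed by the new user message.
    List<Message> promptFor(String userText) {
        List<Message> prompt = new ArrayList<>(window);
        prompt.add(new Message("user", userText));
        return prompt;
    }
}
```

With a window of two, a third stored message pushes out the first, so the model only ever sees the most recent exchanges plus the new question.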

This pattern enhances user interactions in chat-based interfaces, built here with Vaadin’s component model for server-side Java UIs. A vertical layout hosts the message list and input, and an injected ChatClient.Builder constructs clients with the required advisors. A basic interaction prompts the model and appends the complete response; streaming improves responsiveness by subscribing to the reactive Flux of tokens and updating the UI progressively.

Code illustration:

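// Assumes a Vaadin view with MessageList and MessageInput fields and an injected
// ChatClient.Builder; addItem/appendText require a recent Vaadin 24 release.
ChatClient chatClient = builder.build();
messageInput.addSubmitListener(submitEvent -> {
    String message = submitEvent.getValue();
    messageList.addItem(new MessageListItem(message, Instant.now(), "You"));
    // Stream tokens into a separate assistant item, not into the user's message.
    MessageListItem reply = new MessageListItem("", Instant.now(), "Assistant");
    messageList.addItem(reply);
    UI ui = UI.getCurrent();
    // ui.access applies each update under the UI lock; server push delivers it as it arrives.
    chatClient.prompt().user(message).stream().content()
        .subscribe(token -> ui.access(() -> reply.appendText(token)));
});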

Streaming suits verbose responses, reducing perceived latency, while observability integrations (e.g., OpenTelemetry) trace interactions for debugging nondeterministic behaviors.

Guardrails for Security and Validation

AI workflows must mitigate risks like sensitive data leaks or invalid outputs. Input guardrails intercept prompts, using on-premises models to check for compliance with policies, blocking unauthorized queries (e.g., personal information). Output guardrails validate responses, reprompting for corrections if deserialization fails.

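Advisors enable this: a default advisor backed by a local chat model filters inputs and outputs. For example, a query asking for someone's address can be flagged and blocked before it ever reaches a cloud-hosted model. The same mechanism makes structured outputs more dependable (though not fully deterministic): JSON format instructions let Spring AI convert unstructured text into Java objects, and a failed conversion can trigger a corrective re-prompt.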
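
The output-guardrail loop described above (validate, then re-prompt on failure) might look roughly like the following plain-Java sketch. It is an illustration under stated assumptions, not the speakers' code: OutputGuardrail is a hypothetical helper, and the model is stood in for by a simple prompt-to-text function.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper; the model is represented as a plain prompt -> text function.
final class OutputGuardrail {
    // Calls the model, tries to parse the reply, and re-prompts with the
    // parse error appended until parsing succeeds or attempts run out.
    static <T> Optional<T> callWithValidation(Function<String, String> model,
                                              String prompt,
                                              Function<String, T> parser,
                                              int maxAttempts) {
        String currentPrompt = prompt;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String reply = model.apply(currentPrompt);
            try {
                return Optional.of(parser.apply(reply));
            } catch (RuntimeException e) {
                // Feed the failure back so the next attempt can correct itself.
                currentPrompt = prompt
                        + "\nYour previous reply could not be parsed (" + e.getMessage()
                        + "). Reply with the requested format only.";
            }
        }
        return Optional.empty(); // caller decides how to handle a persistent failure
    }
}
```

Bounding the number of attempts matters: a model that never produces valid output should surface as an explicit failure rather than an endless retry loop.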

Implications include privacy preservation in regulated sectors and integration with Spring Security for role-based tool access.

Multimodality and Retrieval-Augmented Generation

LLMs extend beyond text through multimodality, processing images, audio, or videos. Spring AI’s entity methods augment prompts for structured extraction, e.g., parsing attendee details from images into tables for programmatic use.

Retrieval-augmented generation (RAG) reduces hallucinations by embedding external data as vectors in stores such as PostgreSQL with the pgvector extension. A RetrievalAugmentationAdvisor retrieves relevant documents via similarity search and adds them to the prompt. The query augmenter can be configured to allow an empty context, so the model falls back to its own knowledge when nothing relevant is retrieved.

Example:

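// Assumes a VectorStore bean (for example, one backed by PostgreSQL) is injected as vectorStore.
DocumentRetriever retriever = VectorStoreDocumentRetriever.builder()
    .vectorStore(vectorStore)
    .build();
RetrievalAugmentationAdvisor advisor = RetrievalAugmentationAdvisor.builder()
    .documentRetriever(retriever)
    // allowEmptyContext(true) lets the model answer even when no documents match
    .queryAugmenter(ContextualQueryAugmenter.builder().allowEmptyContext(true).build())
    .build();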

This pattern grounds responses in proprietary data, with thresholds controlling retrieval scope.
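
To make the empty-context behavior concrete, here is a small plain-Java approximation of what a contextual query augmenter does with the retrieved documents. ContextualPrompt and its prompt wording are assumptions made for illustration; Spring AI's actual templates differ.

```java
import java.util.List;

// Hypothetical sketch of prompt augmentation; not Spring AI's implementation.
final class ContextualPrompt {
    static String augment(String question, List<String> retrievedDocs, boolean allowEmptyContext) {
        if (retrievedDocs.isEmpty()) {
            // With an empty context allowed, the question passes through unchanged
            // and the model answers from its own knowledge; otherwise it is told to decline.
            return allowEmptyContext
                    ? question
                    : "Tell the user that the knowledge base has no answer to: " + question;
        }
        StringBuilder sb = new StringBuilder("Answer using only the context below.\n\nContext:\n");
        for (String doc : retrievedDocs) {
            sb.append("- ").append(doc).append('\n');
        }
        return sb.append("\nQuestion: ").append(question).toString();
    }
}
```

The trade-off is between coverage and grounding: allowing an empty context keeps the assistant helpful for general questions, while disallowing it confines answers to the proprietary data.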

Tool Calling, Agents, and Dynamic Integrations

Tool calling lets LLMs act as agents by invoking external functions for tasks like database queries. Methods are annotated with a description and registered with the chat client, and the model decides at runtime which tool to call. For products, a service might expose query/update methods:

@Tool(description = "Fetch products from database")
public List<Product> getProducts(@ToolParam(description = "Category filter") String category) {
    // productRepository is a hypothetical Spring Data repository used for illustration.
    return productRepository.findByCategory(category);
}

Agents orchestrate tools, potentially via Model Context Protocol for external services. Demonstrations include theme generation from screenshots, editing CSS via file system tools, highlighting nondeterminism and the need for safeguards.
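
Conceptually, tool calling comes down to a catalog of named, described functions that the model can pick from, plus a dispatcher that refuses anything outside the catalog. The sketch below shows that idea in plain Java; ToolRegistry is a hypothetical class for illustration and does not reflect how Spring AI or MCP implement it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry; tools take and return plain strings to keep the sketch small.
final class ToolRegistry {
    record Tool(String description, Function<String, String> handler) {}

    private final Map<String, Tool> tools = new LinkedHashMap<>();

    void register(String name, String description, Function<String, String> handler) {
        tools.put(name, new Tool(description, handler));
    }

    // The catalog that would be sent to the model so it can choose a tool.
    String describe() {
        StringBuilder sb = new StringBuilder();
        tools.forEach((name, tool) ->
                sb.append(name).append(": ").append(tool.description()).append('\n'));
        return sb.toString();
    }

    // Executes the tool the model asked for; unknown names are rejected
    // instead of being trusted, which is one basic safeguard.
    String invoke(String name, String argument) {
        Tool tool = tools.get(name);
        if (tool == null) {
            throw new IllegalArgumentException("Unknown tool: " + name);
        }
        return tool.handler().apply(argument);
    }
}
```

Because the model's choice is nondeterministic, the dispatcher is the natural place for checks such as allow-lists or role-based access before anything is executed.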

In conclusion, these patterns enable production AI, emphasizing modularity, security, and observability for robust Java applications.


[SpringIO2025] Spring I/O 2025 Keynote

Lecturer

The keynote features Spring leadership: Juergen Hoeller (Framework Lead), Rossen Stoyanchev (Web), Ana Maria Mihalceanu (AI), Moritz Halbritter (Boot), Mark Paluch (Data), Josh Long (Advocate), Mark Pollack (Messaging). Collectively, they steer the Spring portfolio’s technical direction and community engagement.

Abstract

The keynote unveils Spring Framework 7.0 and Boot 4.0, which adopt Jakarta EE 11 as a baseline and keep JDK 17 as the minimum Java version while targeting newer LTS releases, and which advance AOT compilation, virtual threads, structured concurrency, and AI integration. Live demonstrations and roadmap disclosures illustrate how these enhancements, combined with refined observability, web capabilities, and data access, position Spring as a leading platform for cloud-native Java development.

Baseline Evolution: JDK and Jakarta EE 11

Spring Framework 7.0 retains a JDK 17 minimum; on JDK 21 and later, applications can use virtual threads for lightweight concurrency, alongside records for immutable data carriers. Jakarta EE 11 carries forward the Core Profile and CDI Lite introduced in Jakarta EE 10, trimming enterprise bloat. The demonstration showcases a virtual thread-per-request web handler processing 100,000 concurrent connections with minimal heap, contrasting traditional thread pools. The updated baseline also supports native image compilation via Spring AOT, reducing startup to milliseconds and memory footprint by 90%.
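
The thread-per-request model on virtual threads can be tried with the JDK alone. The sketch below assumes JDK 21 or later and is not the keynote's demo; VirtualThreadDemo is a hypothetical class, and the sleep merely stands in for blocking I/O. In a Spring Boot application (3.2 or later), the comparable switch is the property spring.threads.virtual.enabled=true.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class; requires JDK 21+.
final class VirtualThreadDemo {
    // Runs `requests` blocking tasks, one virtual thread each, and returns
    // how many of them completed on a virtual thread.
    static int handleAll(int requests) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<Boolean>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                futures.add(executor.submit(() -> {
                    Thread.sleep(Duration.ofMillis(10)); // stands in for blocking I/O
                    return Thread.currentThread().isVirtual();
                }));
            }
            int completed = 0;
            for (Future<Boolean> future : futures) {
                if (future.get()) {
                    completed++;
                }
            }
            return completed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A task failed", e);
        }
    }
}
```

Each task blocks, yet no platform thread is held while it sleeps, which is why this style scales to very large numbers of concurrent requests without a large thread pool.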

AOT and Native Image Optimization

Spring Boot 4.0 refines AOT processing through Project Leyden integration, pre-computing bean definitions and proxy classes at build time. Native executables start up in under 50ms, making them suitable for serverless platforms. The live demo compiles a Kafka Streams application to a GraalVM native image, achieving sub-second cold starts and 15MB RSS, which changes the deployment economics of event-driven microservices.

AI Integration and Modern Web Capabilities

Spring AI matures with function calling, tool integration, and vector database support. A live-coded agent retrieves beans from a running context to answer natural language queries about application metrics. On the web side, virtual threads reduce the need to offload blocking calls to Schedulers.boundedElastic() in WebFlux applications, simplifying code that previously had to be written reactively. The demonstration contrasts traditional Mono/Flux composition with straightforward sequential logic executing on virtual threads, preserving backpressure while improving readability.

Data, Messaging, and Observability Advancements

Spring Data advances R2DBC connection pooling and Redis Cluster native support. Spring for Apache Kafka 4.0 introduces configurable retry templates and Micrometer metrics out-of-the-box. Unified observability aggregates metrics, traces, and logs: Prometheus exposes 200+ Kafka client metrics, OpenTelemetry correlates spans across HTTP and Kafka, and structured logging propagates MDC context. A Grafana dashboard visualizes end-to-end latency from REST ingress to database commit, enabling proactive incident response.

Community and Future Trajectory

The keynote celebrates Spring’s global community, highlighting contributions to null-safety (JSpecify), virtual thread testing, and AOT hint generation. Planned enhancements include JDK 23 support, Project Panama integration for native memory access, and AI-driven configuration validation. The vision positions Spring as the substrate for the next decade of Java innovation, balancing cutting-edge capabilities with backward compatibility.


[DevoxxUK2025] Concerto for Java and AI: Building Production-Ready LLM Applications

At DevoxxUK2025, Thomas Vitale, a software engineer at Systematic, delivered an inspiring session on integrating generative AI into Java applications to enhance his music composition process. Combining his passion for music and software engineering, Thomas showcased a “composer assistant” application built with Spring AI, addressing real-world use cases like text classification, semantic search, and structured data extraction. Through live coding and a musical performance, he demonstrated how Java developers can leverage large language models (LLMs) for production-ready applications, emphasizing security, observability, and developer experience. His talk culminated in a live composition for an audience-chosen action movie scene, blending AI-driven suggestions with human creativity.

The Why Factor for AI Integration

Thomas introduced his “Why Factor” to evaluate hype technologies like generative AI. First, identify the problem: for his composer assistant, he needed to organize and access musical data efficiently. Second, assess production readiness: LLMs must be secure and reliable for real-world use. Third, prioritize developer experience: tools like Spring AI simplify integration without disrupting workflows. By focusing on these principles, Thomas avoided blindly adopting AI, ensuring it solved specific issues, such as automating data classification to free up time for creative tasks like composing music.

Enhancing Applications with Spring AI

Using a Spring Boot application with a Thymeleaf frontend, Thomas integrated Spring AI to connect to LLMs like those from Ollama (local) and Mistral AI (cloud). He demonstrated text classification by creating a POST endpoint to categorize musical data (e.g., “Irish tin whistle” as an instrument) using a chat client API. To mitigate risks like prompt injection attacks, he employed Java enumerations to enforce structured outputs, converting free text into JSON-parsed Java objects. This approach ensured security and usability, allowing developers to swap models without code changes, enhancing flexibility for production environments.
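
The enumeration-based safeguard can be sketched in plain Java as follows. This is an assumption-laden illustration rather than Thomas's code: MusicClassifier and its Category values are hypothetical, and the model's reply is treated as an untrusted string.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical classifier helper; the category names are illustrative only.
final class MusicClassifier {
    enum Category { INSTRUMENT, GENRE, MOOD, UNKNOWN }

    // Prompt that constrains the model to the enum's vocabulary.
    static String buildPrompt(String text) {
        return "Classify the text into exactly one of " + Arrays.toString(Category.values())
                + ". Reply with the category name only.\nText: " + text;
    }

    // Accepts only an exact category name; anything else, including text an
    // injected instruction might have produced, collapses to UNKNOWN.
    static Category parse(String modelReply) {
        String normalized = modelReply.strip().toUpperCase(Locale.ROOT);
        for (Category category : Category.values()) {
            if (category.name().equals(normalized)) {
                return category;
            }
        }
        return Category.UNKNOWN;
    }
}
```

Because the rest of the application only ever sees a Category value, free-form model output cannot leak into downstream logic, whichever model produced it.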

Semantic Search and Retrieval-Augmented Generation

Thomas addressed the challenge of searching musical data by meaning, not just keywords, using semantic search. By leveraging embedding models in Spring AI, he converted text (e.g., “melancholic”) into numerical vectors stored in a PostgreSQL database, enabling searches for related terms like “sad.” He extended this with retrieval-augmented generation (RAG), where a chat client advisor retrieves relevant data before querying the LLM. For instance, asking, “What instruments for a melancholic scene?” returned suggestions like cello, based on his dataset, improving search accuracy and user experience.
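
Searching by meaning reduces to comparing vectors. The following plain-Java sketch ranks stored items by cosine similarity to a query vector; SemanticSearch is a hypothetical class, and in a real setup the vectors would come from an embedding model and the ranking would be done by the vector store.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory search; vectors are assumed to be non-zero embeddings.
final class SemanticSearch {
    // Cosine similarity: 1.0 for the same direction, 0.0 for orthogonal vectors.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the labels of the k entries most similar to the query, best first.
    static List<String> topK(double[] query, Map<String, double[]> index, int k) {
        return index.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> -cosine(query, e.getValue())))
                .limit(k)
                .map(Map.Entry::getKey)
                .toList();
    }
}
```

A query vector for "melancholic" lands close to the stored vector for "sad" and far from "joyful", even though the words share no characters, which is what keyword search cannot do.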

Structured Data Extraction and Human Oversight

To streamline data entry, Thomas implemented structured data extraction, converting unstructured director notes (e.g., from audio recordings) into JSON objects for database storage. Spring AI facilitated this by defining a JSON schema for the LLM to follow, ensuring structured outputs. Recognizing LLMs’ potential for errors, he emphasized keeping humans in the loop, requiring users to review extracted data before saving. This approach, applied to his composer assistant, reduced manual effort while maintaining accuracy, applicable to scenarios like customer support ticket processing.

Tools and MCP for Enhanced Functionality

Thomas enhanced his application with tools, enabling LLMs to call internal APIs, such as saving composition notes. Using Spring Data, he annotated methods to make them accessible to the model, allowing automated actions like data storage. He also introduced the Model Context Protocol (MCP), implemented in Quarkus, to integrate with external music software via MIDI signals. This allowed the LLM to play chord progressions (e.g., in A minor) through his piano software, demonstrating how MCP extends AI capabilities across local processes, though he cautioned it’s not yet production-ready.

Observability and Live Composition

To ensure production readiness, Thomas integrated OpenTelemetry for observability, tracking LLM operations like token usage and prompt augmentation. During the session, he invited the audience to choose a movie scene (action won) and used his application to generate a composition plan, suggesting chord progressions (e.g., I-VI-III-VII) and instruments like percussion and strings. He performed the music live, copy-pasting AI-suggested notes into his software, fixing minor bugs, and adding creative touches, showcasing a practical blend of AI automation and human artistry.


[SpringIO2024] Text-to-SQL: Chat with a Database Using Generative AI by Victor Martin & Corrado De Bari @ Spring I/O 2024

At Spring I/O 2024 in Barcelona, Victor Martin, a product manager for Oracle Database, delivered a compelling session on Text-to-SQL, a transformative approach to querying databases using natural language, powered by generative AI. Stepping in for his colleague Corrado De Bari, who was unable to attend, Victor explored how Large Language Models (LLMs) and Oracle’s innovative tools, including Spring AI and Select AI, enable business users with no SQL expertise to interact seamlessly with databases. The talk highlighted practical implementations, security considerations, and emerging technologies like AI Vector Search, offering a glimpse into the future of database interaction.

The Promise of Text-to-SQL

Text-to-SQL leverages LLMs to translate natural language queries into executable SQL, democratizing data access for non-technical users. Victor began by posing a challenge: how long would it take to build a REST endpoint for a business user to query a database using plain text? Traditionally, this task required manual SQL construction, schema validation, and error handling. With modern frameworks like Spring Boot and Oracle’s Select AI, this process is streamlined. Select AI, integrated into Oracle Database 19c and enhanced in 23ai, supports actions such as runsql to execute generated queries, narrate to return results as human-readable text, and explainsql to detail query reasoning. Victor emphasized that these tools reduce development time, enabling rapid deployment of user-friendly database interfaces.

Configuring Oracle Database for Text-to-SQL

Implementing Text-to-SQL requires minimal configuration within Oracle Database. Victor outlined the steps: first, set up an Access Control List (ACL) to allow external LLM calls, specifying the host and port. Next, create credentials for the LLM service (e.g., Oracle Cloud Infrastructure Generative AI, OpenAI, or Azure OpenAI) with DBMS_CLOUD.CREATE_CREDENTIAL. Finally, use the DBMS_CLOUD_AI package to define a profile linking the schema, tables, and chosen LLM. This profile is selected per session to ensure queries use the correct context. Victor demonstrated this with a Spring Boot application, where the profile is set before invoking Select AI. The simplicity of this setup, combined with Spring AI’s abstraction, makes it accessible even for developers new to AI-driven database interactions.
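
From a Java application, one way to call Select AI is through a prepared statement. The sketch below only builds the SQL text and assumes the stateless DBMS_CLOUD_AI.GENERATE function is available in the target database, as an alternative to setting the profile on the session; SelectAiStatements is a hypothetical helper, not something shown in the talk.

```java
import java.util.Locale;

// Hypothetical helper that builds SQL for a java.sql.PreparedStatement.
final class SelectAiStatements {
    // Select AI actions; the enum guarantees only known keywords reach the SQL text.
    enum Action { RUNSQL, SHOWSQL, EXPLAINSQL, NARRATE, CHAT }

    // Bind parameter 1 is the natural-language prompt, parameter 2 the profile name.
    // Binding them avoids concatenating user text into the statement.
    static String generateCall(Action action) {
        return "SELECT DBMS_CLOUD_AI.GENERATE(prompt => ?, profile_name => ?, action => '"
                + action.name().toLowerCase(Locale.ROOT)
                + "') FROM dual";
    }
}
```

A caller would prepare this statement, set the user's question and a profile name (whatever was created during configuration) with setString, and read the single CLOB or string column from the result.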

Enhancing Queries with Schema Annotations

A key challenge in Text-to-SQL is ensuring LLMs interpret ambiguous schemas correctly. Victor highlighted that table and column names like “C1” or “Table1” can confuse models. To address this, Oracle Database supports annotations—comments on tables and columns that provide business context. For example, annotating a column as “process status” with possible values clarifies its purpose, aiding the LLM in generating accurate joins and filters. These annotations, which don’t affect production queries, are created collaboratively by DBAs and business stakeholders. Victor shared a real-world example from Oracle’s telecom applications, where annotated schemas improved query precision, enabling complex queries without manual intervention.

AI Vector Search: Querying Unstructured Data

Victor introduced AI Vector Search, a cutting-edge feature in Oracle Database 23ai, which extends Text-to-SQL to unstructured data. Unlike traditional SQL, which queries structured data, vector search encodes text, images, or audio into high-dimensional vectors representing semantic meaning. These vectors, stored as a new VECTOR data type, enable similarity-based queries. For instance, a job search query for “software engineer positions in New York” can combine structured filters (e.g., location) with vector-based matching of job descriptions and resumes. Victor explained how embedding models, deployed via Oracle’s DBMS_DATA_MINING package, generate these vectors, with metrics like cosine similarity determining relevance. This capability opens new use cases, from document search to personalized recommendations.


[DevoxxBE2023] Making Your @Beans Intelligent: Spring AI Innovations

At DevoxxBE2023, Dr. Mark Pollack delivered an insightful presentation on integrating artificial intelligence into Java applications using Spring AI, a project inspired by advancements in AI frameworks like LangChain and LlamaIndex. Mark, a seasoned Spring developer since 2003 and leader of the Spring Data project, explored how Java developers can harness pre-trained AI models to create intelligent applications that address real-world challenges. His talk introduced the audience to Spring AI’s capabilities, from simple “Hello World” examples to sophisticated use cases like question-and-answer systems over custom documents.

The Genesis of Spring AI

Mark began by sharing his journey into AI, sparked by the transformative impact of ChatGPT. Unlike traditional AI development, which often required extensive data cleaning and model training, pre-trained models like those from OpenAI offer accessible APIs and vast knowledge bases, enabling developers to focus on application engineering rather than data science. Mark highlighted how Spring AI emerged from his exploration of code generation, leveraging the structured nature of code within these models to create a framework tailored for Java developers. This framework abstracts the complexity of AI model interactions, making it easier to integrate AI into Spring-based applications.

Spring AI draws inspiration from Python’s AI ecosystem but adapts these concepts to Java’s idioms, emphasizing component abstractions and pluggability. Mark emphasized that this is not a direct port but a reimagination, aligning with the Spring ecosystem’s strengths in enterprise integration and batch processing. This approach positions Spring AI as a bridge between Java’s robust software engineering practices and the dynamic world of AI.

Core Components of AI Applications

A significant portion of Mark’s presentation focused on the architecture of AI applications, which extends beyond merely calling a model. He introduced a conceptual framework involving contextual data, AI frameworks, and models. Contextual data, akin to ETL (Extract, Transform, Load) processes, involves parsing and transforming data—such as PDFs—into embeddings stored in vector databases. These embeddings enable efficient similarity searches, crucial for use cases like question-and-answer systems.

Mark demonstrated a simple AI client in Spring AI, which abstracts interactions with various AI models, including OpenAI, Hugging Face, Amazon Bedrock, and Google Vertex AI. This portability allows developers to switch models without significant code changes. He also showcased the Spring CLI, a tool inspired by JavaScript’s Create React App, which simplifies project setup by generating starter code from existing repositories.

Prompt Engineering and Its Importance

Prompt engineering emerged as a critical theme in Mark’s talk. He explained that crafting effective prompts is essential for directing AI models to produce desired outputs, such as JSON-formatted responses or specific styles of answers. Spring AI’s PromptTemplate class facilitates this by allowing developers to create reusable, stateful templates with placeholders for dynamic content. Mark illustrated this with a demo where a prompt template generated a joke about a raccoon, highlighting the importance of roles (system and user) in defining the context and tone of AI responses.
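
The placeholder mechanism behind a prompt template is simple to sketch. The class below is a hypothetical plain-Java stand-in, not Spring AI's PromptTemplate; it only mimics the {name} placeholder style to show how a reusable template is filled in.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a prompt template with {name} placeholders.
final class SimplePromptTemplate {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    private final String template;

    SimplePromptTemplate(String template) {
        this.template = template;
    }

    // Replaces every {name} with its value; a missing value is an error
    // rather than a silently incomplete prompt.
    String render(Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String value = values.get(matcher.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Missing value for {" + matcher.group(1) + "}");
            }
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

For example, the template "Tell me a {adjective} joke about a {topic}." rendered with adjective "short" and topic "raccoon" yields "Tell me a short joke about a raccoon.", which would then be sent as the user message.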

He also touched on the concept of “dogfooding,” where AI models are used to refine prompts, creating a feedback loop that enhances their effectiveness. This iterative process, combined with evaluation techniques, ensures that applications deliver accurate and relevant responses, addressing challenges like model hallucinations—where AI generates plausible but incorrect information.

Retrieval Augmented Generation (RAG)

Mark introduced Retrieval Augmented Generation (RAG), a technique to overcome the limitations of AI models’ context windows, which restrict the amount of data they can process. RAG involves pre-processing data into smaller fragments, converting them into embeddings, and storing them in vector databases for similarity searches. This approach allows developers to provide only relevant data to the model, improving efficiency and accuracy.
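
The pre-processing step, splitting documents into smaller fragments before embedding them, can be illustrated with a character-based splitter. This is a simplified sketch under stated assumptions: TextChunker is hypothetical, and production splitters typically count tokens rather than characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical character-based splitter with overlapping fragments.
final class TextChunker {
    // Splits text into fragments of at most chunkSize characters; consecutive
    // fragments share `overlap` characters so sentences cut at a boundary
    // still appear whole in one of them.
    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("Require chunkSize > 0 and 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(text.length(), start + chunkSize);
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last fragment reached the end of the text
            }
        }
        return chunks;
    }
}
```

Each fragment would then be embedded and stored, so that at query time only the few fragments nearest to the question need to fit into the model's context window.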

In a demo, Mark showcased RAG with a bicycle shop dataset, where a question about city-commuting bikes retrieved relevant product descriptions from a vector store. This process mirrors traditional search engines but leverages AI to synthesize answers, demonstrating how Spring AI integrates with vector databases like Milvus and PostgreSQL to handle complex queries.

Real-World Applications and Future Directions

Mark highlighted practical applications of Spring AI, such as enabling question-and-answer systems for financial documents, medical records, or government programs like Medicaid. These use cases illustrate AI’s potential to make complex information more accessible, particularly for non-technical users. He also discussed the importance of evaluation in AI development, advocating for automated scoring mechanisms to assess response quality beyond simple test passing.

Looking forward, Mark outlined Spring AI’s roadmap, emphasizing robust core abstractions and support for a growing number of models and vector databases. He encouraged developers to explore the project’s GitHub repository and participate in its evolution, underscoring the rapid pace of AI advancements and the need for community involvement.
