Posts Tagged ‘OpenAI’

[DevoxxFR2025] Building an Agentic AI with Structured Outputs, Function Calling, and MCP

The rapid advancements in Artificial Intelligence, particularly in large language models (LLMs), are enabling the creation of more sophisticated and autonomous AI agents – programs capable of understanding instructions, reasoning, and interacting with their environment to achieve goals. Building such agents requires effective ways for the AI model to communicate programmatically and to trigger external actions. Julien Dubois, in his deep-dive session, explored key techniques and a new protocol essential for constructing these agentic AI systems: Structured Outputs, Function Calling, and the Model Context Protocol (MCP). Using practical examples and the new Java SDK from OpenAI, he demonstrated how to implement these features, including with LangChain4j, showcasing how developers can build AI agents that go beyond simple text generation.

Structured Outputs: Enabling Programmatic Communication

One of the challenges in building AI agents is getting LLMs to produce responses in a structured format that can be easily parsed and used by other parts of the application. Julien explained how Structured Outputs address this by allowing developers to define a specific JSON schema that the AI model must adhere to when generating its response. This ensures that the output is not just free-form text but follows a predictable structure, making it straightforward to map the AI’s response to data objects in programming languages like Java. He demonstrated how to provide the LLM with a JSON schema definition and constrain its output to match that schema, enabling reliable programmatic communication between the AI model and the application logic. This is crucial for scenarios where the AI needs to provide data in a specific format for further processing or action.
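
To make this concrete, here is a minimal sketch in the spirit of the demo, using LangChain4j's AI Services to map the model's JSON output onto a Java record. The record, interface, and model name are illustrative, not taken from the talk; recent LangChain4j versions can additionally enforce a strict JSON schema through builder options.

```java
import dev.langchain4j.model.openai.OpenAiChatModel;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.UserMessage;

// The target structure: LangChain4j maps the model's JSON reply onto this
// record, so the rest of the application works with a typed object.
record WeatherReport(String city, double temperatureCelsius, String summary) {}

interface WeatherAssistant {
    @UserMessage("Give a weather report for {{it}}")
    WeatherReport reportFor(String city);
}

public class StructuredOutputDemo {
    public static void main(String[] args) {
        OpenAiChatModel model = OpenAiChatModel.builder()
                .apiKey(System.getenv("OPENAI_API_KEY"))
                .modelName("gpt-4o-mini") // illustrative model name
                .build();

        WeatherAssistant assistant = AiServices.create(WeatherAssistant.class, model);

        // The response arrives as a Java object, not free-form text.
        WeatherReport report = assistant.reportFor("Paris");
        System.out.println(report.summary());
    }
}
```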

Function Calling: Giving AI the Ability to Act

To be truly agentic, an AI needs the ability to perform actions in the real world or interact with external tools and services. Julien introduced Function Calling as a powerful mechanism that allows developers to define functions in their code (e.g., Java methods) and expose them to the AI model. The LLM can then understand when a user’s request requires calling one of these functions and generate a structured output indicating which function to call and with what arguments. The application then intercepts this output, executes the corresponding function, and can provide the function’s result back to the AI, allowing for a multi-turn interaction where the AI reasons, acts, and incorporates the results into its subsequent responses. Julien demonstrated how to define function “signatures” that the AI can understand and how to handle the function calls triggered by the AI, showcasing scenarios like retrieving information from a database or interacting with an external API based on the user’s natural language request.
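
A compact sketch of how this can look with LangChain4j's @Tool annotation follows; the tool, agent, and model names are invented for illustration. The framework intercepts the model's tool-call request, executes the annotated Java method, and feeds the result back to the model automatically.

```java
import dev.langchain4j.agent.tool.Tool;
import dev.langchain4j.model.openai.OpenAiChatModel;
import dev.langchain4j.service.AiServices;

// Tools are plain Java methods; the @Tool description tells the model
// when it may call them and what they do.
class OrderTools {
    @Tool("Looks up the shipping status of an order by its identifier")
    String orderStatus(String orderId) {
        // A real implementation would query a database or external API.
        return "Order " + orderId + " is out for delivery";
    }
}

interface SupportAgent {
    String chat(String userMessage);
}

public class FunctionCallingDemo {
    public static void main(String[] args) {
        SupportAgent agent = AiServices.builder(SupportAgent.class)
                .chatLanguageModel(OpenAiChatModel.builder()
                        .apiKey(System.getenv("OPENAI_API_KEY"))
                        .modelName("gpt-4o-mini") // illustrative
                        .build())
                .tools(new OrderTools())
                .build();

        // The model decides to call orderStatus("42"), LangChain4j runs it,
        // and the final answer incorporates the tool's result.
        System.out.println(agent.chat("Where is my order 42?"));
    }
}
```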

MCP: Standardizing LLM Interaction

While Structured Outputs and Function Calling provide the capabilities for AI communication and action, the Model Context Protocol (MCP) emerges as a new standard to streamline how LLMs interact with various data sources and tools. Julien discussed MCP as a protocol that standardizes the communication layer between AI-powered applications and the servers that expose tools, data sources, and prompts to them. This standardization can facilitate building more portable and interoperable agentic AI systems, allowing developers to switch between different LLMs or integrate new tools and data sources more easily. While the details of MCP are still evolving, its goal is to provide a common interface for tasks like tool invocation, access to external knowledge, and context management. Julien illustrated how libraries like LangChain4j are adopting these concepts and integrating with MCP to simplify the development of sophisticated AI agents. The presentation, rich in code examples using the OpenAI Java SDK, provided developers with the practical knowledge and tools to start building the next generation of agentic AI applications.
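
As an illustration of that direction, LangChain4j ships an MCP client module; a rough sketch of wiring an MCP server's tools into an agent might look like the following. The server command is a placeholder, and since MCP is young, the exact class names and builders may shift between releases.

```java
import java.util.List;

import dev.langchain4j.mcp.McpToolProvider;
import dev.langchain4j.mcp.client.DefaultMcpClient;
import dev.langchain4j.mcp.client.McpClient;
import dev.langchain4j.mcp.client.transport.McpTransport;
import dev.langchain4j.mcp.client.transport.stdio.StdioMcpTransport;
import dev.langchain4j.service.tool.ToolProvider;

public class McpDemo {
    public static void main(String[] args) {
        // Start an MCP server as a subprocess and talk to it over stdio.
        // The filesystem server below is just an example; any MCP-compliant
        // server can sit behind the same client code.
        McpTransport transport = new StdioMcpTransport.Builder()
                .command(List.of("npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"))
                .build();

        McpClient mcpClient = new DefaultMcpClient.Builder()
                .transport(transport)
                .build();

        // Every tool the server advertises becomes available to the agent,
        // without the application hard-coding any tool definitions.
        ToolProvider toolProvider = McpToolProvider.builder()
                .mcpClients(List.of(mcpClient))
                .build();

        // The provider is then handed to an AI service, e.g.:
        // AiServices.builder(Assistant.class).toolProvider(toolProvider)...
    }
}
```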

[DotAI2024] Romain Huet and Katia Gil Guzman – Pioneering AI Innovations at OpenAI

Romain Huet and Katia Gil Guzman, of OpenAI's Developer Experience team, charted the horizon of AI integration at DotAI 2024. Huet, Head of Developer Experience with earlier stints at Stripe and Twitter, and Guzman, a solutions architect turned developer advocate, walked through OpenAI's philosophy of iterative deployment. Their talk traced OpenAI's trajectory from GPT-3's early API to today's multimodal frontier, encouraging builders to design natively AI-driven products.

From Experimentation to Ecosystem Maturity

Huet looked back on GPT-3's 2020 launch: an API that invited tinkering and yielded unforeseen gems such as AI Dungeon's interactive narratives and early code autocompletion. That exploratory ethos, he emphasized, grew into a vibrant ecosystem, which now includes the Assistants API for persistent conversation threads and fine-tuning for bespoke adaptations.

Guzman traced the Assistants API's evolution: function calling connects models to external systems, orchestrating tools such as databases or calendars while avoiding hallucination pitfalls, and retrieval threads embed knowledge bases, fostering context-aware dialogues that scale from prototype to enterprise.

Together they underscored OpenAI's research-to-product cadence: iterative releases, from GPT-4's multimodal abilities to o1's reasoning chains, put these capabilities in more hands. Huet spotlighted the Pioneers Program, which partners with select founders on custom fine-tuned models, accelerating innovation while feeding real-world insights back into the research.

Multimodal Horizons and Real-Time Interactions

Guzman demoed the Realtime API: low-latency voice pipelines fuse speech processing with tool invocation, enabling immersive exchanges, such as querying cosmic data mid-conversation and visualizing trajectories with integrated graphics. Audio is only the first modality; vision is next, pointing toward interfaces that converse fluidly across senses.

Huet framed this as a reinvention of the interface: beyond text, agents can navigate richer environments, leveraging GPT-4's perceptual depth for grounded actions. Early adopters, he noted, are already crafting speech-to-speech experiences, from piloting virtual worlds to debugging by voice, foreshadowing a renaissance in conversational computing.

With a Paris office on the way, Huet and Guzman rallied the French tech community: use these primitives to rework legacy software into intuitive, AI-native experiences, and help write the next chapter of the AGI story.

Forging the Next Wave of AI Natives

Huet closed by casting the work as a collaborative journey: developers act as AGI co-pilots, surfacing use cases that refine the models iteratively. Guzman's parting advice: make the most of early access, since being first into new modalities confers a real advantage.

Together, they affirmed OpenAI's role: not solitary savants, but enablers of collective ingenuity, with APIs as canvases for tomorrow's intelligent applications.

[PHPForumParis2023] Experience Report: Building Two Open-Source Personal AIs with OpenAI – Maxime Thoonsen

Maxime Thoonsen, CTO at Theodo, shared an exhilarating session at Forum PHP 2023, detailing his experience building two open-source personal AI applications using OpenAI’s technologies. As an organizer of the Generative AI Paris Meetup, Maxime’s passion for the PHP community and innovative AI solutions shone through. His step-by-step approach demystified AI development, encouraging PHP developers to explore generative AI by demonstrating its simplicity and potential through practical examples.

Understanding Generative AI’s Potential

Maxime began by introducing the capabilities of generative AI, emphasizing its accessibility for PHP developers. He explained how OpenAI’s APIs enable the creation of applications that process and generate human-like text. Drawing from his work at Theodo, Maxime showcased two personal AI projects, illustrating how they leverage semantic search and embeddings to deliver tailored responses. His enthusiasm for the community, where he began his speaking career, underscored the collaborative spirit driving AI innovation.

Practical AI Development with OpenAI

Delving into the technical details, Maxime walked the audience through building AI applications using OpenAI’s APIs. He highlighted the simplicity of implementing semantic search to retrieve relevant data from documents, advising against premature fine-tuning in favor of straightforward similarity searches. Responding to an audience question, Maxime noted the availability of open-source alternatives like Llama and Mistral, though he acknowledged OpenAI’s models, GPT-4 among them, as leaders in accuracy. His examples empowered developers to start building AI-driven features in their PHP projects.
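
To illustrate the similarity-search idea Maxime described, here is a tiny, dependency-free sketch (in Java rather than PHP, for consistency with the other write-ups on this blog). It assumes documents and the query have already been turned into embedding vectors, for instance via OpenAI's embeddings endpoint, and picks the closest document by cosine similarity; the numbers are toy values invented for illustration.

```java
import java.util.Map;

public class SemanticSearchSketch {

    // Cosine similarity: closer to 1.0 means more semantically similar.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
        Map<String, double[]> documents = Map.of(
                "refund policy", new double[]{0.9, 0.1, 0.0},
                "shipping times", new double[]{0.1, 0.8, 0.3});
        double[] query = {0.85, 0.15, 0.05}; // embedding of "how do I get my money back?"

        String best = null;
        double bestScore = -1;
        for (var entry : documents.entrySet()) {
            double score = cosine(query, entry.getValue());
            if (score > bestScore) { bestScore = score; best = entry.getKey(); }
        }
        System.out.println("Most relevant document: " + best);
    }
}
```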

Navigating the AI Ecosystem

Maxime concluded by addressing the rapidly evolving AI landscape, likening it to the proliferation of JavaScript frameworks. He emphasized the cost-effectiveness of smaller open-source models for specific use cases, while noting OpenAI’s edge in precision. His talk inspired developers to join communities like the Generative AI Paris Meetup to explore AI further, fostering a sense of curiosity and experimentation within the PHP ecosystem.

[DevoxxBE2023] Making Your @Beans Intelligent: Spring AI Innovations

At DevoxxBE2023, Dr. Mark Pollack delivered an insightful presentation on integrating artificial intelligence into Java applications using Spring AI, a project inspired by advancements in AI frameworks like LangChain and LlamaIndex. Mark, a seasoned Spring developer since 2003 and leader of the Spring Data project, explored how Java developers can harness pre-trained AI models to create intelligent applications that address real-world challenges. His talk introduced the audience to Spring AI’s capabilities, from simple “Hello World” examples to sophisticated use cases like question-and-answer systems over custom documents.

The Genesis of Spring AI

Mark began by sharing his journey into AI, sparked by the transformative impact of ChatGPT. Unlike traditional AI development, which often required extensive data cleaning and model training, pre-trained models like those from OpenAI offer accessible APIs and vast knowledge bases, enabling developers to focus on application engineering rather than data science. Mark highlighted how Spring AI emerged from his exploration of code generation, leveraging the structured nature of code within these models to create a framework tailored for Java developers. This framework abstracts the complexity of AI model interactions, making it easier to integrate AI into Spring-based applications.

Spring AI draws inspiration from Python’s AI ecosystem but adapts these concepts to Java’s idioms, emphasizing component abstractions and pluggability. Mark emphasized that this is not a direct port but a reimagination, aligning with the Spring ecosystem’s strengths in enterprise integration and batch processing. This approach positions Spring AI as a bridge between Java’s robust software engineering practices and the dynamic world of AI.

Core Components of AI Applications

A significant portion of Mark’s presentation focused on the architecture of AI applications, which extends beyond merely calling a model. He introduced a conceptual framework involving contextual data, AI frameworks, and models. Contextual data, akin to ETL (Extract, Transform, Load) processes, involves parsing and transforming data—such as PDFs—into embeddings stored in vector databases. These embeddings enable efficient similarity searches, crucial for use cases like question-and-answer systems.

Mark demonstrated a simple AI client in Spring AI, which abstracts interactions with various AI models, including OpenAI, Hugging Face, Amazon Bedrock, and Google Vertex. This portability allows developers to switch models without significant code changes. He also showcased the Spring CLI, a tool inspired by JavaScript’s Create React App, which simplifies project setup by generating starter code from existing repositories.
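
As a rough sketch of that model-portable client, using the fluent ChatClient API of more recent Spring AI releases (the exact calls have evolved since the talk, so treat this as illustrative rather than what was shown on stage):

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
class JokeController {

    private final ChatClient chatClient;

    // Spring Boot auto-configures a ChatClient.Builder for whichever model
    // starter is on the classpath (OpenAI, Bedrock, Vertex, ...), so swapping
    // providers is a dependency and property change, not a code change.
    JokeController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    @GetMapping("/joke")
    String joke() {
        return chatClient.prompt()
                .user("Tell me a joke")
                .call()
                .content();
    }
}
```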

Prompt Engineering and Its Importance

Prompt engineering emerged as a critical theme in Mark’s talk. He explained that crafting effective prompts is essential for directing AI models to produce desired outputs, such as JSON-formatted responses or specific styles of answers. Spring AI’s PromptTemplate class facilitates this by allowing developers to create reusable, stateful templates with placeholders for dynamic content. Mark illustrated this with a demo where a prompt template generated a joke about a raccoon, highlighting the importance of roles (system and user) in defining the context and tone of AI responses.
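
A minimal sketch of the raccoon-joke idea with Spring AI's PromptTemplate follows; the placeholder names and the final print statement are illustrative.

```java
import java.util.Map;

import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.PromptTemplate;

public class PromptTemplateSketch {
    public static void main(String[] args) {
        // A reusable template whose placeholders are filled in at call time.
        PromptTemplate template = new PromptTemplate(
                "Tell me a {adjective} joke about a {animal}");

        Prompt prompt = template.create(Map.of(
                "adjective", "clean",
                "animal", "raccoon"));

        // The rendered prompt would then be passed to a chat model,
        // e.g. chatModel.call(prompt); here we just show the result.
        System.out.println(prompt.getContents());
    }
}
```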

He also touched on the concept of “dogfooding,” where AI models are used to refine prompts, creating a feedback loop that enhances their effectiveness. This iterative process, combined with evaluation techniques, ensures that applications deliver accurate and relevant responses, addressing challenges like model hallucinations—where AI generates plausible but incorrect information.

Retrieval Augmented Generation (RAG)

Mark introduced Retrieval Augmented Generation (RAG), a technique to overcome the limitations of AI models’ context windows, which restrict the amount of data they can process. RAG involves pre-processing data into smaller fragments, converting them into embeddings, and storing them in vector databases for similarity searches. This approach allows developers to provide only relevant data to the model, improving efficiency and accuracy.

In a demo, Mark showcased RAG with a bicycle shop dataset, where a question about city-commuting bikes retrieved relevant product descriptions from a vector store. This process mirrors traditional search engines but leverages AI to synthesize answers, demonstrating how Spring AI integrates with vector databases like Milvus and PostgreSQL to handle complex queries.
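
A stripped-down sketch of the retrieval half of that demo, against Spring AI's VectorStore abstraction (the bike descriptions are invented, and the store itself, e.g. PGVector or Milvus, is assumed to be auto-configured):

```java
import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;

class BikeCatalog {

    // Adding documents embeds their text and stores the resulting vectors.
    void load(VectorStore vectorStore) {
        vectorStore.add(List.of(
                new Document("The Metro 8: an upright commuter bike with fenders and a rack."),
                new Document("The Ridgeline: a full-suspension mountain bike for rough trails.")));
    }

    // At question time, only the most similar fragments are retrieved and
    // stuffed into the prompt, keeping the model's context window small.
    List<Document> relevantDocs(VectorStore vectorStore, String question) {
        return vectorStore.similaritySearch(question);
    }
}
```

Asking for city-commuting bikes would surface the Metro 8 description, which is then handed to the model to synthesize an answer, exactly the search-then-generate pattern Mark described.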

Real-World Applications and Future Directions

Mark highlighted practical applications of Spring AI, such as enabling question-and-answer systems for financial documents, medical records, or government programs like Medicaid. These use cases illustrate AI’s potential to make complex information more accessible, particularly for non-technical users. He also discussed the importance of evaluation in AI development, advocating for automated scoring mechanisms to assess response quality beyond simple test passing.

Looking forward, Mark outlined Spring AI’s roadmap, emphasizing robust core abstractions and support for a growing number of models and vector databases. He encouraged developers to explore the project’s GitHub repository and participate in its evolution, underscoring the rapid pace of AI advancements and the need for community involvement.
