Jonathan Lalou's Blog

Posts Tagged ‘GemmaModels’

[GoogleIO2025] Google I/O ’25 Developer Keynote

Keynote Speakers

Josh Woodward serves as the Vice President of Google Labs, where he leads teams focused on advancing AI products, including the Gemini app and innovative tools like NotebookLM and AI Studio. His work emphasizes turning AI research into practical applications that align with Google’s mission to organize the world’s information.

Logan Kilpatrick is the Lead Product Manager for Google AI Studio, specializing in the Gemini API and artificial general intelligence initiatives. With a background in computer science from Harvard and Oxford, and prior experience at NASA and OpenAI, he drives product development to make AI accessible for developers.

Paige Bailey holds the position of Lead Product Manager for Generative Models at Google DeepMind. Her expertise lies in machine learning, with a focus on democratizing advanced AI technologies to enable developers to create innovative applications.

Diana Wong is a Group Product Manager at Google, contributing to Android ecosystem advancements. She oversees product strategies that enhance user experiences across devices, drawing from her education at Carnegie Mellon University.

Florina Muntenescu is a Developer Relations Manager at Google, specializing in Android development. With a background in computer science from Babeș-Bolyai University, she advocates for tools like Jetpack Compose and promotes best practices in app performance and adaptability.

Addy Osmani is the Head of Chrome Developer Experience at Google, serving as a Senior Staff Engineering Manager. He leads efforts to improve developer tools in Chrome, with a strong emphasis on performance, AI integration, and web standards.

David East is the Developer Relations Lead for Project IDX at Google, with extensive experience in Firebase. He has been instrumental in backend-as-a-service products, focusing on cloud-based development workspaces.

Gus Martins is the Product Manager for the Gemma family of open models at Google DeepMind. His role involves making AI models adaptable for various domains, including healthcare and multilingual applications, while fostering community contributions.

Abstract

This article examines the key innovations presented in the Google I/O 2025 Developer Keynote, focusing on advancements in AI-driven development tools across Google’s ecosystem. It explores updates to the Gemini API, Android enhancements, web technologies, Firebase Studio, and the Gemma open models, analyzing their technical foundations, practical implementations, and broader implications for software engineering. By dissecting demonstrations and announcements, the discussion highlights how these tools facilitate rapid prototyping, multimodal AI integration, and cross-platform development, ultimately aiming to empower developers in creating performant, adaptive applications.

Advancements in Gemini API and AI Studio

The keynote opens with a strong emphasis on the Gemini API, showcasing its evolution as a cornerstone for building intelligent applications. Josh Woodward introduces the concept of blending code and design through experimental tools like Stitch, which leverages Gemini 2.5 Flash for rapid interface generation. This model, noted for its speed and cost-efficiency, enables developers to transition from textual prompts to functional designs and markup in minutes. For instance, a prompt to create an app for discovering California activities generates editable screens in Figma format, complete with customizable themes such as dark mode with lime green accents.

Logan Kilpatrick delves deeper into AI Studio, positioning it as a prototyping environment that answers whether ideas can be realized with Gemini. The introduction of the 2.5 Flash native audio model enhances voice agent capabilities, supporting 24 languages and ignoring extraneous noises—ideal for real-world applications. Key improvements include function calling, search grounding, and URL context, allowing models to fetch and integrate web data dynamically. An example demonstrates grounding responses with developer docs, where a prompt yields a concise summary of function calling: connecting models to external APIs for real-world actions.

A practical illustration involves generating a text adventure game using Gemini and Imagen, where the model reasons through specifications, generates code, and self-corrects errors. This iterative, multi-turn process underscores the API’s role in accelerating development cycles. Furthermore, support for the Model Context Protocol (MCP) in the GenAI SDK facilitates integration with open-source tools, expanding the ecosystem.

Paige Bailey extends this by remixing a maps app into a “keynote companion” agent named Casey, demonstrating live audio processing and UI updates. Using functions like increment_utterance_count, the agent tracks mentions of Gemini-related terms, showcasing sliding context windows for long-running sessions. Asynchronous function calls enable non-blocking operations, such as fetching fun facts via search grounding, while structured JSON outputs ensure UI consistency.

These advancements reflect a methodological shift toward agentive AI, where models not only process inputs but execute actions autonomously. The implications are profound: developers can build conversational apps for e-commerce or navigation with minimal code, reducing latency and enhancing user engagement. However, challenges like ensuring data privacy in multimodal inputs warrant careful consideration in production environments.

AI Integration in Android Development

Shifting to mobile ecosystems, Diana Wong and Florina Muntenescu highlight how AI powers “excellent” Android apps—defined by delight, performance, and cross-device compatibility. The Androidify app exemplifies this, using selfies and image generation to create personalized Android bots. Under the hood, Gemini’s multimodal capabilities process images via generate_content, followed by Imagen 3 for robot rendering, all orchestrated through Firebase with just five lines of code.

On-device AI via Gemini Nano offers APIs for tasks like summarization and rewriting, ensuring privacy by avoiding server transmissions. The Material 3 Expressive update introduces playful elements, such as cookie-shaped buttons and morphing animations, available in Compose Material Alpha. Live updates in Android 16 provide time-sensitive notifications, enhancing user relevance.

Performance optimizations, including R8 and baseline profiles, yield significant gains, as evidenced by Reddit’s one-star rating increase. API changes in Android 16 eliminate orientation restrictions, promoting responsive UIs. Collaboration with Samsung on desktop windowing and adaptive layouts in Compose supports foldables, tablets, Chromebooks, cars, and XR devices like Project Muhan and Aura.

Developer productivity tools in Android Studio leverage Gemini for natural language-based end-to-end testing. For example, a journey script selects photos via descriptions like “woman with a pink dress,” automating assertions without manual synchronization. An AI agent for dependency updates scans projects, suggesting migrations like Kotlin 2.0, streamlining maintenance.

The contextual implications are clear: AI reduces barriers to creating adaptive, performant apps, boosting engagement metrics—Canva reports twice-weekly usage among cross-device users. Methodologically, this integrates cloud and on-device models, balancing power and privacy, but requires developers to optimize for diverse hardware, potentially increasing testing complexity.

Enhancing Web Development with Chrome Tools

Addy Osmani and Yuna Shin focus on web innovations, advocating for a “powerful web made easier” through AI-infused tools. Project IDX, now Firebase Studio, enables prompt-based app creation, but the web segment emphasizes Chrome DevTools and built-in AI APIs.

Baseline integration in VS Code and ESLint provides browser compatibility checks directly in tooltips, warning on unsupported features. AI assistance in DevTools uses natural language to debug issues, such as misaligned buttons fixed via transform properties, applying changes to workspaces without context switching.

The redesigned performance panel identifies layout shifts, with Gemini suggesting fixes like font optimizations. Seven new AI APIs, backed by Gemini Nano, support on-device processing for privacy-sensitive scenarios. Multimodal capabilities process audio and images, demonstrated by extracting ticket details to highlight seats in a theater app.

Hybrid solutions with Firebase allow fallback to cloud models, ensuring cross-browser compatibility. Partners like Deote leverage these for faster onboarding, projecting 30% efficiency gains.

Analytically, this methodology embeds AI in workflows, reducing debugging time and enabling scalable features. Implications include broader AI adoption in regulated sectors, but raise questions about model biases in automated fixes. The fine-tuning for web contexts ensures relevance, fostering a more inclusive developer experience.

Innovations in Firebase Studio

David East presents Firebase Studio as a cloud-based AI workspace for full-stack app generation. Importing Figma designs via Builder.io translates to functional components, as shown with a furniture store app. Gemini assists in extending designs, creating product detail pages with routing, data flow, and add-to-cart features using 2.5 Pro.

Automatic backend provisioning detects needs for databases or authentication, generating blueprints and code. This open, extensible VM allows custom stacks, with deployment to Firebase Hosting.

The approach streamlines prototyping, breaking changes into reviewable steps and auto-generating descriptions for placeholders. Implications extend to rapid iteration, lowering entry barriers for non-coders, though dependency on AI prompts necessitates clear specifications to avoid errors.

Expanding the Gemma Family of Open Models

Gus Martins introduces Gemma 3N, a lightweight model running on 2GB RAM with audio understanding, available in AI Studio and open-source tools. Med-Gemma advances healthcare applications, analyzing radiology images.

Fine-tuning demonstrations use LoRA in Google Colab, creating personalized emoji translators. The new AI-first Colab transforms prompts into UIs, facilitating comparisons between base and tuned models.

Community-driven variants, like Navarasa for Indic languages and S-Gemma for sign languages, highlight multilingual prowess. Dolphin Gemma, fine-tuned on vocalization data, aids marine research.

This open model strategy democratizes AI, enabling domain-specific adaptations. Implications include ethical advancements in accessibility and science, but require safeguards against misuse in sensitive areas like healthcare.

Implications and Future Directions

Collectively, these innovations signal a paradigm where AI augments every development stage, from ideation to deployment. Methodologically, multimodal models and agentive tools reduce boilerplate, fostering creativity. Contexts like privacy and performance drive hybrid approaches, with implications for inclusive tech—empowering global developers.

Future directions may involve deeper ecosystem integrations, addressing scalability and bias. As tools mature, they promise transformative impacts on software paradigms, urging ethical considerations in AI adoption.

Links:

Posted in en-US | Tags: AddyOsmani, AIDevelopment, AndroidDevelopment, ChromeDevTools, DavidEast, DianaWong, FirebaseStudio, FlorinaMuntenescu, GeminiAPI, GemmaModels, GoogleDeepMind, GoogleIO2025, GusMartins, JoshWoodward, LoganKilpatrick, MultimodalAI, PaigeBailey | No Comments »

[GoogleIO2024] What’s New in Google AI: Advancements in Models, Tools, and Edge Computing

Author: Jonathan Lalou

The realm of artificial intelligence is advancing rapidly, as evidenced by insights from Josh Gordon, Laurence Moroney, and Joana Carrasqueira. Their discussion illuminated progress in Gemini APIs, open-source frameworks, and on-device capabilities, underscoring Google’s efforts to democratize AI for creators worldwide.

Breakthroughs in Gemini Models and Developer Interfaces

Josh highlighted Gemini 1.5 Pro’s multimodal prowess, handling extensive contexts like hours of video or thousands of images. Demonstrations included analyzing museum footage for exhibit details and extracting insights from lengthy PDFs, such as identifying themes in historical texts. Audio processing shone in examples like transcribing and querying lectures, revealing the model’s versatility.

Google AI Studio facilitates prototyping, with seamless transitions to code via SDKs in Python, JavaScript, and more. The Gemini API Cookbook offers practical guides, while features like context caching reduce costs for repetitive prompts. Developers can tune models swiftly, as shown in a book recommendation app refined with synthetic data.

Empowering Frameworks for Efficient AI Development

Joana explored Keras and JAX, pivotal for scalable AI. Keras 3.0 supports multiple backends, enabling seamless transitions between TensorFlow, PyTorch, and JAX, ideal for diverse workflows. Its streamlined APIs accelerate prototyping, as illustrated in a classification task using minimal code.

JAX’s strengths in high-performance computing were evident in examples like matrix operations and neural network training, leveraging just-in-time compilation for speed. PaliGemma, a vision-language model, exemplifies fine-tuning for tasks like captioning, with Kaggle Models providing accessible datasets. These tools lower barriers, fostering innovation across research and production.

On-Device AI and Responsible Innovation

Laurence introduced Google AI Edge, unifying on-device solutions to simplify adoption. MediaPipe abstractions ease complexities in preprocessing and model management, now supporting PyTorch conversions. The Model Explorer aids in tracing inferences, enhancing transparency.

Fine-tuned Gemma models run locally for privacy-sensitive applications, like personalized book experts using retrieval-augmented generation. Emphasis on agentic workflows hints at future self-correcting systems. Laurence stressed AI’s human-centric nature, urging ethical considerations through published principles, positioning it as an amplifier for global problem-solving.

Links:

Posted in en-US | Tags: AIEdge, GeminiAPI, GemmaModels, GoogleAI, GoogleIO2024, JAX, JoanaCarrasqueira, JoshGordon, Keras, LaurenceMoroney, OnDeviceAI, ResponsibleAI, SundarPichai | No Comments »

[GoogleIO2024] Developer Keynote: Innovations in AI and Development Tools at Google I/O 2024

Author: Jonathan Lalou

The Developer Keynote at Google I/O 2024 showcased a transformative vision for software creation, emphasizing how generative artificial intelligence is reshaping the landscape for creators worldwide. Delivered by a team of Google experts, the session highlighted accessible AI models, enhanced productivity across platforms, and new tools designed to simplify complex workflows. This presentation underscored Google’s commitment to empowering millions of developers through an ecosystem that spans billions of devices, fostering innovation without the burden of underlying infrastructure challenges.

Advancing AI Accessibility and Model Integration

A core theme of the keynote revolved around making advanced AI capabilities available to every programmer. The speakers introduced Gemini 1.5 Flash, a lightweight yet powerful model optimized for speed and cost-effectiveness, now accessible globally via the Gemini API in Google AI Studio. This tool balances quality, efficiency, and affordability, enabling developers to experiment with multimodal applications that incorporate audio, video, and extensive context windows. For instance, Jacqueline demonstrated a personal workflow where voice memos and prior blog posts were synthesized into a draft article, illustrating how large context windows—up to two million tokens—unlock novel interactions while reducing computational expenses through features like context caching.

This approach extends beyond simple API calls, as the team emphasized techniques such as model tuning and system instructions to personalize outputs. Real-world examples included Loc.AI’s use of Gemini for renaming elements in frontend designs from Figma, enhancing code readability by interpreting nondescript labels. Similarly, Invision leverages the model’s speed for real-time environmental descriptions aiding low-vision users, while Zapier automates podcast editing by removing filler words from audio uploads. These cases highlight how Gemini empowers practical transformations, from efficiency gains to user delight, encouraging participation in the Gemini API developer competition for innovative applications.

Enhancing Mobile Development with Android and Gemini

Shifting focus to mobile ecosystems, the keynote delved into Android’s evolution as an AI-centric operating system. With over three billion devices, Android now integrates Gemini to enable on-device experiences that prioritize privacy and low latency. Gemini Nano, the most efficient model for edge computing, powers features like smart replies in messaging without data leaving the device, available on select hardware like the Pixel 8 Pro and Samsung Galaxy S24 series, with broader rollout planned.

Early adopters such as Patreon and Grammarly showcased its potential: Patreon for summarizing community chats, and Grammarly for intelligent suggestions. Maru elaborated on Kotlin Multiplatform support in Jetpack libraries, allowing shared business logic across Android, iOS, and web, as seen in Google Docs migrations. Compose advancements, including performance boosts and adaptive layouts, were highlighted, with examples from SoundCloud demonstrating faster UI development and cross-form-factor compatibility. Testing improvements, like Android Device Streaming via Firebase and resizable emulators, ensure robust validation for diverse hardware.

Jamal illustrated Gemini’s role in Android Studio, evolving from Studio Bot to provide code optimizations, translations, and multimodal inputs for rapid prototyping. A demo converted a wireframe image into functional Jetpack Compose code, underscoring how AI accelerates from ideation to implementation.

Revolutionizing Web and Cross-Platform Experiences

The web’s potential was amplified through AI integrations, marking its 35th anniversary with tools like WebGPU and WebAssembly for on-device inference. John discussed how these enable efficient model execution across devices, with examples like Bilibili’s 30% session duration increase via MediaPipe’s image recognition. Chrome’s enhancements, including AI-powered dev tools for error explanations and code suggestions, streamline debugging, as shown in a Boba tea app troubleshooting CORS issues.

Aaron introduced Project IDX, now in public beta, as an integrated workspace for full-stack, multiplatform development, incorporating Google Maps, DevTools, and soon Checks for privacy compliance. Flutter’s updates, including WebAssembly support for up to 2x performance gains, were exemplified by Bricket’s cross-platform expansion. Firebase’s evolution, with Data Connect for SQL integration, App Hosting for scalable web apps, and Genkit for seamless AI workflows, further simplifies backend connections.

Customizing AI Models and Future Prospects

Shabani and Lawrence explored open models like Gemma, with new variants such as PaliGemma for vision-language tasks and the upcoming Gemma 2 for enhanced performance on optimized hardware. A demo in Colab illustrated fine-tuning Gemma for personalized book recommendations, using synthetic data from Gemini and on-device inference via MediaPipe. Project Gameface’s Android expansion demonstrated accessibility advancements, while an early data science agent concept showcased multi-step reasoning with long context.

The keynote concluded with resources like accelerators and the Google Developer Program, emphasizing community-driven innovation. Eugene AI’s emissions reduction via DeepMind research exemplified real-world impact, reinforcing Google’s ecosystem for reaching global audiences.

Links:

Posted in en-US | Tags: AIInnovation, AndroidDevelopment, DeepMind, DemisHassabis, DeveloperTools, Firebase, Flutter, GeminiAI, GemmaModels, GoogleIO2024, Jacqueline, JoshWoodward, Multiplatform, ProjectIDX, SundarPichai, WebGPU | No Comments »