Archive for the ‘General’ Category
Tutorial: Windows XP as a Guest VM in VirtualBox
Target and Constraints
I need Windows XP running as a virtual machine (VM). Don’t think of reusing your former OEM licence; it won’t work: Windows checks and distinguishes between OEM and other licences.
Microsoft provides VHD files: you can consider them as “virtual” HDDs. Officially, these VHD files are intended for developers to test their websites on various Windows versions (XP to 7) and Internet Explorer versions (6 to 9).
The VHD files provided for Windows XP need a licence key to be activated, and therefore have two main drawbacks:
- After three days and/or three reboots, the system will no longer allow you to log in. That’s quite a limitation :-(.
- Worse still: the VHD file provided by Microsoft will be completely disabled on February 14th, 2013!
Finally, I insist on having an absolutely legal solution, since it will be deployed on both personal (Ubuntu) and professional (Windows 8) desktop computers. I do not want to waste my time playing hide-and-seek with the authorities.
Prerequisites
In this post I will assume you are somewhat familiar with VirtualBox. If you are not, browse the web, ask Google, RTFM, or, as a last resort, leave a message in the comments and I’ll try to find a moment to write a short tutorial.
- Download the Windows XP VHD file from this page: Internet Explorer Application Compatibility VPC Image. Even though you get a .exe file, you can uncompress it with a regular 7-Zip.
- Download the Windows XP SP3 ISO file from that page: Windows XP Service Pack 3 – ISO-9660 CD Image File. Keep xpsp3_5512.080413-2113_usa_x86fre_spcd.iso as is.
- Download the PCINTPC5 Ethernet drivers from this page: NDIS5 Driver for Microsoft Windows Server 2003, Windows XP, Windows 2000, Windows ME and Windows 98.
- Unzip V4.51.zip and convert it to an ISO file; as a reminder, the Linux command is: [bash]mkisofs -o target.iso -J -rock sourceFolder[/bash]
- Alternatively, you can download the ISO that I prepared: ethernet_drivers_for_WinXP_VirtualBox.iso (don’t forget to thank me for the precomputed work).
Operations
Classic
- Create a VM within VirtualBox
- Name it “Windows XP” for instance
- Use the VHD file downloaded and unzipped above as the virtual hard disk.
Specific
- Run the VM. You must log in as IEUser. The default password is Password1 (on French keyboards: Pqsszord&).
- Do not validate the licence.
- The VM will require CmBatt.sys (and possibly another file):
  - On the host system, mount the SP3 ISO: Devices > CD/DVD Devices > Choose a virtual CD/DVD disk file > select the WinXP SP3 ISO (xpsp3_5512.080413-2113_usa_x86fre_spcd.iso).
  - On the guest system, run the CD, e.g. Windows+E > D:\ > Autoplay > Install. All the files will be unzipped into a folder such as C:\1a2b3c4d5e... (with a hexadecimal name).
  - In the dialog asking for CmBatt.sys, select it in C:\1a2b3c4d5e...\i386.
- Windows XP will then ask for drivers and try to download them, but the Ethernet card has not been installed yet!
- On the host system:
  - Mount ethernet_drivers_for_WinXP_VirtualBox.iso (cf. above for details).
  - Devices > Install Guest Additions > accept all.
- On the guest system: manually install the drivers for the Ethernet card.
- In order to bypass the February 14th limitation:
  - If you read this post after February 14th, 2013: set the system time to January 1st, 2013, for instance (I didn’t test this; it should work).
  - Disable time synchronization between the host and guest systems, e.g.: [bash]$VIRTUALBOX_HOME/app32/VBoxManage setextradata "Windows XP" "VBoxInternal/Devices/VMMDev/0/Config/GetHostTimeDisabled" 1[/bash]
Now everything should work. I suggest taking a snapshot ;-), then reverting to it as often as needed.
Conclusions
Officially, the VHD files provided by Microsoft are intended for developers who need to test their websites on obsolete, out-of-date browsers like Internet Explorer. But you can imagine many other uses. In my case, the interest is having the VM as a module in a complete integrated testing environment, within a software forge.
My opinion? The solution provided by Microsoft does exist, and it’s better than nothing; still, implementing it is far from easy. The limitations and the complexity of the installation spoil the user experience. It’s a pity, because the idea of VHDs is great, but it does not match the convenience of prebuilt open-source VirtualBox images: http://www.virtualboxes.org
[DevoxxFR2012] Practicing DDD in a Flash – Sculptor, the DDD Code Generator for Java
Ulrich Vachon is a DDD and agile practitioner with experience at software vendors. He promotes expressive modeling and rapid feedback.
This article expands the live coding demo of Sculptor, a DSL-based code generator for DDD applications in Java. Domain-Driven Design is powerful but verbose. Sculptor accelerates bootstrapping while preserving DDD principles. Using a simple DSL, developers define aggregates, value objects, services, and repositories. Sculptor generates Spring, JPA, REST, and MongoDB code.
Sculptor DSL and Code Generation
A live demo built a blog application:
Application Blog {
  Module posts {
    Entity Post {
      @Id String id;
      String title;
      String content;
      @ManyToOne Author author;
    }
    ValueObject Author {
      String name;
      String email;
    }
    Service PostService {
      Post save(Post post);
      List<Post> findAll();
    }
  }
}
Sculptor generated entities, repositories, services, controllers, and tests.
Customization with the Gap Mechanism
The gap keyword allows hand-written extensions without regeneration conflicts.
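As a sketch of how such a gap mechanism typically works (the class and method names below are hypothetical illustrations, not taken from the talk): Sculptor regenerates a base class on every build, while hand-written code lives in a subclass that is generated only once and never overwritten afterwards.

```java
// Hypothetical illustration of the gap-class pattern:
// PostServiceBase is regenerated on every build; PostServiceImpl is
// created once as an empty "gap" class and then edited by hand.
public class PostServiceImpl extends PostServiceBase {

    @Override
    public Post save(Post post) {
        // Hand-written business rule added in the gap class;
        // regeneration of the base class leaves this code untouched.
        if (post.getTitle() == null || post.getTitle().isEmpty()) {
            throw new IllegalArgumentException("A post needs a title");
        }
        return super.save(post);
    }
}
```

This separation is what lets generated and hand-written code coexist without merge conflicts.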
Links
Relevant links include the Sculptor Project at sites.google.com/site/fornaxsculptor and the original video at YouTube: Practicing DDD in a Flash.
[DevoxxBE2012] Apache TomEE: Java EE 6 Web Profile on Tomcat
David Blevins, a veteran in open-source Java EE and founder of projects like OpenEJB and TomEE, showcased Apache TomEE. With over a decade in specifications like EJB and CDI, David positioned TomEE as a bridge for Tomcat users seeking Java EE capabilities.
He polled the audience, revealing widespread Tomcat use alongside other servers, highlighting the migration pain TomEE addresses. David described TomEE as Tomcat enhanced with Java EE, unzipping Tomcat, adding Apache projects like OpenJPA and CXF, then certifying the bundle.
Emphasizing small size, certification, and Tomcat fidelity, David outlined distributions: Web Profile (minimal specs), JAX-RS (adding REST), and Plus (including JMS, JAX-WS).
Understanding the Web Profile
David clarified the Java EE 6 Web Profile, a subset of 12 specs from the full 24, excluding outdated ones like CORBA and CMP. This acknowledges Java EE’s growth, focusing on essentials for modern apps.
He noted exclusions like JAX-RS (added in EE 7) and inclusions like JavaMail in TomEE’s Web Profile for practicality. David projected EE 7’s profile reductions, potentially enabling full-profile TomEE certification.
Demonstrating TomEE in Action
In a live demo, David set up TomEE in Eclipse using Tomcat adapters, creating a servlet with EJB injection and JPA. He deployed seamlessly, showcasing CDI, transactions, and web services, all within Tomcat’s familiar environment.
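The kind of servlet-with-EJB-injection shown in the demo can be sketched as follows (class names and the greeting logic are illustrative, not taken from the session; the point is that TomEE lets a plain servlet inject a local EJB inside a Tomcat-style deployment):

```java
import java.io.IOException;
import javax.ejb.EJB;
import javax.ejb.Stateless;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// A trivial stateless session bean, managed by TomEE's EJB container.
@Stateless
class GreetingService {
    public String greet(String name) {
        return "Hello, " + name;
    }
}

@WebServlet("/hello")
public class HelloServlet extends HttpServlet {

    @EJB
    private GreetingService greetingService; // injected by the container

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        resp.getWriter().println(greetingService.greet("Devoxx"));
    }
}
```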
David highlighted TomEE’s lightweight footprint—under 30MB—booting quickly with low memory. He integrated tools like Arquillian for testing, demonstrating in-container and embedded modes.
Advanced Features and Configuration
David explored clustering with Hazelcast, enabling session replication without code changes. He discussed production readiness, citing users like OpenShift and Jelastic.
Configuration innovations include flat XML-properties hybrids, human-readable times (e.g., “2 minutes”), and dynamic resource creation. David showed overriding via command-line or properties, extending for custom objects injectable via @Resource.
Error handling stands out: TomEE collects all deployment issues before failing, providing detailed, multi-level feedback to accelerate fixes.
Community and Future Directions
Celebrating TomEE’s first year, David shared growth metrics—surging commits and mailing lists—inviting contributions. He mentioned production adopters praising its simplicity and performance.
David announced a logo contest, encouraging participation. In Q&A, he affirmed production use, low memory needs, and solid components like OpenJPA.
Overall, David’s talk positioned TomEE as an empowering evolution for Tomcat loyalists, blending familiarity with Java EE power.
[DevoxxFR2012] Toward Sustainable Software Development – Quality, Productivity, and Longevity in Software Engineering
Frédéric Dubois brings ten years of experience in JEE architecture, agile practices, and software quality. A pragmatist at heart, he focuses on continuous improvement, knowledge sharing, and sustainable delivery over rigid processes.
This article expands Frédéric Dubois’s 2012 talk into a manifesto for sustainable software development. Rejecting the idea that quality is expensive, he proves that technical excellence drives long-term productivity. A three-year-old application should not be unmaintainable. Yet many teams face escalating costs with each new feature. Dubois challenged the audience: productivity is not about delivering more features faster today, but about maintaining velocity tomorrow, next year, and five years from now.
The True Cost of Technical Debt
Quality and productivity are intimately linked, but not in the way most assume. High quality reduces defects, simplifies evolution, and prevents technical debt. Low quality creates a vicious cycle of bugs, rework, and frustration. Dubois shared a case study: a banking application delivered on time but with poor design. Two years later, a simple change required three months of work. The same team, using TDD and refactoring, built a similar system in half the time with one-tenth the defects.
Agile Practices for Long-Term Velocity
Agile practices, when applied pragmatically, enable sustainability. Short feedback loops, automated tests, and collective ownership prevent knowledge silos. Fixed-price contracts and outsourcing often incentivize cutting corners. Transparency, shared metrics, and demo-driven development align business and technical goals.
Links
Relevant links include the original video at YouTube: Toward Sustainable Development.
[DevoxxFR2012] “Obésiciel” and Environmental Impact: Green Patterns Applied to Java – Toward Sustainable Computing
Olivier Philippot is an electronics and computer engineer with over a decade of experience in energy management systems and sustainable technology design. Having worked in R&D labs and large industrial groups, he has dedicated his career to understanding the environmental footprint of digital systems. A founding member of the French Green IT community, Olivier contributes regularly to GreenIT.fr, participates in AFNOR working groups on eco-design standards, and trains organizations on sustainable IT practices. His work bridges hardware, software, and policy to reduce the carbon intensity of computing.
This article presents a comprehensively expanded analysis of Olivier Philippot’s 2012 DevoxxFR presentation, Obésiciel and Environmental Impact: Green Patterns Applied to Java, reimagined as a foundational text on software eco-design and technical debt’s environmental cost. The talk introduced the concept of obésiciel, software that grows increasingly resource-hungry with each release, driving premature hardware obsolescence. Philippot revealed a startling truth: manufacturing a single computer emits seventy to one hundred times more CO2 than one year of use, yet software bloat has tripled performance demands every five years, reducing average PC lifespan from six to two years.
Through Green Patterns, JVM tuning strategies, data efficiency techniques, and lifecycle analysis, this piece offers a practical framework for Java developers to build lighter, longer-lived, and lower-impact applications. Updated for 2025, it integrates GraalVM native images, Project Leyden, energy-aware scheduling, and carbon-aware computing, providing a complete playbook for sustainable Java development.
The Environmental Cost of Software Bloat
Manufacturing a laptop emits two hundred to three hundred kilograms of CO2 equivalent. The use phase emits twenty to fifty kilograms per year. Software-driven obsolescence forces upgrades every two to three years. Philippot cited Moore’s Law irony: while transistors double every eighteen months, software efficiency has decreased due to abstraction layers, framework overhead, and feature creep.
Green Patterns for Data Efficiency
Green Patterns for Java include data efficiency. String concatenation in loops is inefficient:
String log = "";
for (String s : list) log += s;
Use StringBuilder instead:
StringBuilder sb = new StringBuilder();
for (String s : list) sb.append(s);
Also use compression, binary formats like Protocol Buffers, and lazy loading.
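A minimal, runnable comparison of the two approaches (both produce the same result; the difference is that the naive version allocates a new String on every iteration, while the builder reuses one mutable buffer):

```java
import java.util.List;

public class ConcatDemo {
    // Naive concatenation: allocates a new String on every iteration.
    static String concatNaive(List<String> parts) {
        String log = "";
        for (String s : parts) log += s;
        return log;
    }

    // StringBuilder: one mutable buffer, far fewer allocations.
    static String concatBuilder(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String s : parts) sb.append(s);
        return sb.toString();
    }
}
```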
JVM Tuning for Energy Efficiency
JVM optimization includes:
-XX:+UseZGC
-XX:ReservedCodeCacheSize=128m
-XX:+UseCompressedOops
-XX:+UseContainerSupport
GraalVM Native Image reduces memory by ninety percent, startup to fifty milliseconds, and energy by sixty percent.
Carbon-Aware Computing in 2025
EDIT:
In 2025, carbon-aware Java includes Project Leyden for static images without warmup, energy profilers like JFR and PowerAPI, cloud carbon APIs from AWS and GCP, and edge deployment to reduce data center hops.
Links
Relevant links include GreenIT.fr at greenit.fr, GraalVM Native Image at graalvm.org/native-image, and the original video at YouTube: Obésiciel and Environmental Impact.
[DevoxxBE2012] On the Road to JDK 8: Lambda, Parallel Libraries, and More
Joseph Darcy, a key figure in Oracle’s JDK engineering team, presented an insightful overview of JDK 8 developments. With extensive experience in language evolution, including leading Project Coin for JDK 7, Joseph outlined the platform’s future directions, balancing innovation with compatibility.
He began by contextualizing JDK 8’s major features, particularly lambda expressions and default methods, set for release in September 2013. Joseph polled the audience on JDK usage, noting the impending end of public updates for JDK 6 and urging transitions to newer versions.
Emphasizing a quantitative approach to compatibility, Joseph described experiments analyzing millions of lines of code to inform decisions, such as lambda conversions from inner classes.
Evolving the Language with Compatibility in Mind
Joseph elaborated on the JDK’s evolution policy, prioritizing binary compatibility while allowing measured source and behavioral changes. He illustrated this with diagrams showing compatibility spaces for different release types, from updates to full platforms.
A core challenge, he explained, is evolving interfaces compatibly. Unlike classes, interfaces cannot add methods without breaking implementations. To address this, JDK 8 introduces default methods, enabling API evolution without user burden.
This ties into lambda support, where functional interfaces facilitate closures. Joseph contrasted this with past changes like generics, which preserved migration compatibility through erasure, avoiding VM modifications.
Lambda Expressions and Implementation Techniques
Diving into lambdas, Joseph defined them as anonymous methods capturing enclosing scope values. He traced their long journey into Java, noting their ubiquity in modern languages.
For implementation, Joseph rejected simple inner class translations due to class explosion and performance overhead. Instead, JDK 8 leverages invokedynamic from JDK 7, allowing runtime strategies like class spinning or method handles.
This indirection decouples binary representation from implementation, enabling optimizations. Joseph shared benchmarks showing non-capturing lambdas outperforming inner classes, especially multithreaded.
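The difference is visible in source form (a sketch, not the talk’s exact example): the anonymous class below compiles to a separate class file, while the lambda compiles to a private method plus an invokedynamic call site, letting the runtime choose the instantiation strategy.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LambdaVsInner {
    // Pre-JDK-8 style: an anonymous inner class, one extra class file.
    static Comparator<String> byLengthInner() {
        return new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        };
    }

    // JDK 8 style: a lambda targeting the same functional interface.
    static Comparator<String> byLengthLambda() {
        return (a, b) -> Integer.compare(a.length(), b.length());
    }

    // Helper to show both comparators behave identically.
    static List<String> sortedByLength(List<String> in, Comparator<String> c) {
        String[] copy = in.toArray(new String[0]);
        Arrays.sort(copy, c);
        return Arrays.asList(copy);
    }
}
```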
Serialization posed challenges, resolved via indirection to reconstruct lambdas independently of runtime details.
Parallel Libraries and Bulk Operations
Joseph highlighted how lambdas enable powerful libraries, abstracting behavior as generics abstract types. Streams introduce pipeline operations—filter, map, reduce—with laziness and fork-join parallelism.
Using the Fork/Join Framework from JDK 7, these libraries handle load balancing implicitly, encapsulating complexity. Joseph demonstrated conversions from collections to streams, facilitating scalable concurrent applications.
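A small runnable pipeline in the filter–map–reduce style described above; switching `stream()` to `parallelStream()` hands the work to the common fork/join pool with no other code change:

```java
import java.util.List;

public class StreamDemo {
    // Sum the lengths of the words longer than three characters.
    static int totalLongWordLength(List<String> words) {
        return words.stream()                     // lazy pipeline
                    .filter(w -> w.length() > 3)  // keep long words only
                    .mapToInt(String::length)     // map each word to its length
                    .sum();                       // terminal reduction
    }
}
```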
Broader JDK 8 Features and Future Considerations
Beyond lambdas, Joseph mentioned annotations on types and repeating annotations, enhancing expressiveness. He stressed deferring decisions to avoid constraining future evolutions, like potential method reference enhancements.
In summary, Joseph portrayed JDK 8 as a coordinated update across language, libraries, and VM, inviting community evaluation through available builds.
[DevoxxFR2012] Node.js and JavaScript Everywhere – A Comprehensive Exploration of Full-Stack JavaScript in the Modern Web Ecosystem
Matthew Eernisse is a seasoned web developer whose career spans over fifteen years of building interactive, high-performance applications using JavaScript, Ruby, and Python. As a core engineer at Yammer, Microsoft’s enterprise social networking platform, he has been at the forefront of adopting Node.js for mission-critical services, contributing to a polyglot architecture that leverages the best tools for each job. Author of the influential SitePoint book Build Your Own Ajax Web Applications, Matthew has long championed JavaScript as a first-class language beyond the browser. A drummer, fluent Japanese speaker, and father of three living in San Francisco, he brings a unique blend of technical depth, practical experience, and cultural perspective to his work. His personal blog at fleegix.org remains a valuable archive of JavaScript patterns and web development insights.
This article presents an exhaustively elaborated, deeply extended, and comprehensively restructured expansion of Matthew Eernisse’s 2012 DevoxxFR presentation, Node.js and JavaScript Everywhere, transformed into a definitive treatise on the rise of full-stack JavaScript and its implications for modern software architecture. Delivered at a pivotal moment, just three years after Node.js’s initial release, the talk challenged prevailing myths about server-side JavaScript while offering a grounded, experience-driven assessment of its real-world benefits. Far from being a utopian vision of “write once, run anywhere,” Matthew argued that Node.js’s true power lay in its event-driven, non-blocking I/O model, ecosystem velocity, and developer productivity, advantages that were already reshaping Yammer’s backend services.
This expanded analysis delves into the technical foundations of Node.js, including the V8 engine, libuv, and the event loop, the architectural patterns that emerged at Yammer such as microservices, real-time messaging, and API gateways, and the cultural shifts required to adopt JavaScript on the server. It includes detailed code examples, performance benchmarks, deployment strategies, and lessons learned from production systems handling millions of users.
EDIT
In the 2025 landscape, this piece integrates Node.js 20+, Deno, Bun, TypeScript, Server Components, Edge Functions, and WebAssembly, while preserving the original’s pragmatic, hype-free tone. Through rich narratives, system diagrams, and forward-looking speculation, this work serves as both a historical archive and a practical guide for any team evaluating JavaScript as a backend language.
Debunking the Myths of “JavaScript Everywhere”
The phrase JavaScript Everywhere became a marketing slogan that obscured the technology’s true value. Matthew opened his talk by debunking three common myths. First, the idea that developers write the same code on client and server is misleading. In reality, client and server have different concerns: security, latency, state management. Shared logic such as validation or formatting is possible, but full code reuse is rare and often an anti-pattern. Second, the notion that Node.js is only for real-time apps is incorrect. While excellent for WebSockets and chat, Node.js excels in I/O-heavy microservices, API gateways, and data transformation pipelines, not just real-time. Third, the belief that Node.js replaces Java, Rails, or Python is false. At Yammer, Node.js was one tool among many. Java powered core services. Ruby on Rails drove the web frontend. Node.js handled high-concurrency, low-latency endpoints. The real win was developer velocity, ecosystem momentum, and operational simplicity.
The Node.js Architecture: Event Loop and Non-Blocking I/O
Node.js is built on a single-threaded, event-driven architecture. Unlike traditional threaded servers like Apache or Tomcat, Node.js uses an event loop to handle thousands of concurrent connections. A simple HTTP server demonstrates this:
const http = require('http');
http.createServer((req, res) => {
setTimeout(() => {
res.end('Hello after 2 seconds');
}, 2000);
}).listen(3000);
While one request waits, the event loop processes others. This is powered by libuv, which abstracts OS-level async I/O such as epoll, kqueue, and IOCP. Google’s V8 engine compiles JavaScript to native machine code using JIT compilation. In 2012, V8 was already outperforming Ruby and Python in raw execution speed. Recently, V8 TurboFan and Ignition have pushed performance into Java and C# territory.
Yammer’s Real-World Node.js Adoption
In 2011, Yammer began experimenting with Node.js for real-time features, activity streams, notifications, and mobile push. By 2012, they had over fifty Node.js microservices in production, a real-time messaging backbone using Socket.IO, an API proxy layer routing traffic to Java and Rails backends, and a mobile backend serving iOS and Android apps. A real-time activity stream example illustrates this:
io.on('connection', (socket) => {
socket.on('join', (room) => {
socket.join(room);
redis.subscribe(`activity:${room}`);
});
});
redis.on('message', (channel, message) => {
const room = channel.split(':')[1];
io.to(room).emit('activity', JSON.parse(message));
});
This architecture scaled to millions of concurrent users with sub-100ms latency.
The npm Ecosystem and Developer Productivity
Node.js’s greatest strength is npm, the largest package registry in the world. In 2012, it had approximately twenty thousand packages; today it exceeds two and a half million. At Yammer, developers used Express.js for routing, Socket.IO for WebSockets, Redis for pub/sub, Mocha and Chai for testing, and Grunt (now Webpack or Vite) for builds. Developers could prototype a service in hours, not days.
Deployment, Operations, and Observability
Yammer ran Node.js on Ubuntu LTS with Upstart, now systemd. Services were containerized early using Docker in 2013. Monitoring used StatsD and Graphite, logging via Winston to ELK. A docker-compose example shows this:
version: '3'
services:
  api:
    image: yammer/activity-stream
    ports: ["3000:3000"]
    environment:
      - REDIS_URL=redis://redis:6379
The 2025 JavaScript Backend Landscape
EDIT:
The 2025 landscape includes Node.js 20 with ESM and Workers, Fastify and Hono instead of Express, native WebSocket API and Server-Sent Events instead of Socket.IO, Vite, esbuild, and SWC instead of Grunt, and async/await and Promises instead of callbacks. New runtimes include Deno, secure by default and TypeScript-native, and Bun, Zig-based with ten times faster startup. Edge platforms include Cloudflare Workers, Vercel Edge Functions, and AWS Lambda@Edge.
Matthew closed with a clear message: ignore the hype. Node.js is not a silver bullet. But for I/O-bound, high-concurrency, real-time, or rapid-prototype services, it is unmatched. In 2025, as full-stack TypeScript, server components, and edge computing dominate, his 2012 insights remain profoundly relevant.
Links
Relevant links include Matthew Eernisse’s blog at fleegix.org, the Yammer Engineering Blog at engineering.yammer.com, the Node.js Official Site at nodejs.org, and the npm Registry at npmjs.com. The original video is available at YouTube: Node.js and JavaScript Everywhere.
[DevoxxBE2012] Spring 3.2 and 3.3: Themes and Trends
In a dynamic presentation, Josh Long, a prominent Spring developer advocate and author, delved into the evolving landscape of the Spring Framework. As someone deeply embedded in the Spring ecosystem, Josh highlighted how Spring continues to address modern development challenges while maintaining its core principles. He began by recapping the framework’s foundational aspects, emphasizing its role in promoting clean, extensible code without unnecessary reinvention.
Josh explained that Spring operates as a lightweight dependency injection container, layered with vertical technologies for diverse needs like mobile development, big data handling, and web applications. This decoupling from underlying infrastructure enables seamless transitions between environments, from traditional servers to cloud platforms. He noted the increasing complexity in data stores, caching solutions, and client interfaces, underscoring Spring’s relevance in today’s fragmented tech world. By focusing on dependency injection, aspect-oriented programming, and portable service abstractions, Spring empowers developers to build robust, maintainable systems.
Transitioning to recent advancements, Josh reviewed Spring 3.1, released in December 2011, which introduced features like environment profiles and Java-based configuration. These enhancements facilitate tailored bean activations across development stages, simplifying configurations that diverge between local setups and production clouds. He illustrated this with examples of data sources, showing how profiles partition configurations effectively.
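The profile-partitioned data-source idea can be sketched as follows (the bean bodies are illustrative assumptions, not Josh’s exact demo): each @Bean method is activated only when its profile is active, so the same application code runs unchanged against a local embedded database or a production JNDI data source.

```java
import javax.sql.DataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Profile;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseBuilder;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseType;
import org.springframework.jdbc.datasource.lookup.JndiDataSourceLookup;

@Configuration
public class DataSourceConfig {

    @Bean
    @Profile("dev")
    public DataSource devDataSource() {
        // Embedded in-memory database for local development.
        return new EmbeddedDatabaseBuilder()
                .setType(EmbeddedDatabaseType.H2)
                .build();
    }

    @Bean
    @Profile("prod")
    public DataSource prodDataSource() {
        // Container-managed data source looked up via JNDI in production.
        return new JndiDataSourceLookup()
                .getDataSource("java:comp/env/jdbc/main");
    }
}
```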
Moreover, Josh discussed the caching abstraction in Spring 3.1, which provides a unified SPI for various caches like EHCache and Redis. This abstraction, combined with annotations for cache management, streamlines performance optimizations without locking developers into specific implementations.
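A sketch of what the cache abstraction looks like in code (the service, cache name, and lookup method are hypothetical examples): the same annotations work whether the backing store is EhCache, Redis, or a simple ConcurrentMap.

```java
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class QuoteService {

    @Cacheable("quotes")
    public Quote findQuote(String symbol) {
        // expensiveLookup is a placeholder for a slow repository or
        // remote call; it only runs on a cache miss.
        return expensiveLookup(symbol);
    }

    @CacheEvict(value = "quotes", key = "#symbol")
    public void refresh(String symbol) {
        // Evicts the stale entry; the next findQuote repopulates it.
    }

    private Quote expensiveLookup(String symbol) {
        return new Quote(symbol); // stand-in for the real data access
    }
}
```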
Core Refinements in Spring 3.2
Shifting focus to Spring 3.2, slated for release by year’s end, Josh outlined its core refinements. Building on Java 7, it incorporates asynchronous support from Servlet 3.0, enabling efficient handling of long-running tasks in web applications. He demonstrated this with controller methods returning Callable or DeferredResult, allowing requests and responses to process in separate threads, enhancing scalability.
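The Callable-returning controller style he demonstrated can be sketched like this (the handler and its payload are illustrative): returning a Callable releases the servlet container thread immediately, and the result is produced later on a Spring-managed thread.

```java
import java.util.concurrent.Callable;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.ResponseBody;

@Controller
public class ReportController {

    @RequestMapping("/report")
    @ResponseBody
    public Callable<String> generateReport() {
        // Java 7-era anonymous class, matching the Spring 3.2 time frame.
        return new Callable<String>() {
            @Override
            public String call() throws Exception {
                Thread.sleep(2000); // simulate a long-running computation
                return "report ready";
            }
        };
    }
}
```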
Josh also introduced the Spring MVC Test Framework, a tool for unit testing controllers with mocked servlet APIs. This framework, revamped for 3.2, integrates seamlessly with existing test contexts, promoting better code quality through isolated testing.
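A sketch of the Spring MVC Test Framework in use (HomeController and the expected body are hypothetical): the controller is exercised against a mocked servlet environment, with no running server.

```java
import org.springframework.test.web.servlet.MockMvc;
import org.springframework.test.web.servlet.setup.MockMvcBuilders;

import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.get;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.content;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.status;

// HomeController is assumed to be a @Controller returning "welcome" for GET /.
MockMvc mockMvc = MockMvcBuilders.standaloneSetup(new HomeController()).build();

mockMvc.perform(get("/"))
       .andExpect(status().isOk())
       .andExpect(content().string("welcome"));
```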
Additionally, upgrades to the Spring Expression Language (SpEL) and backported features from 3.1.x bolster the framework’s expressiveness and compatibility. Josh emphasized that these changes maintain Spring’s low-risk upgrade path, ensuring stability for enterprise adopters.
Looking Ahead to Spring 3.3
Josh then previewed Spring 3.3, expected in late 2013, which promises substantial innovations. Central to this release is support for Java SE 8 features, including lambdas, which align naturally with Spring’s single abstract method interfaces. He showcased how lambdas simplify callbacks in components like JdbcTemplate, reducing boilerplate code.
Furthermore, Josh touched on enhanced Groovy support and the integration of the Grails Bean Builder, expanding Spring’s appeal for dynamic languages. The release will also track Java EE 7 APIs, such as JCache 1.0 and JMS 2.0, with annotation-centric endpoints for message-driven architectures.
WebSocket support, crucial for real-time web applications, will be fully integrated into Spring MVC, complementing existing messaging capabilities in Spring Integration.
Strategic Motivations and Community Impact
Throughout his talk, Josh articulated the motivations behind Spring’s shorter release cycles, aiming to deliver timely features without overwhelming users. He stressed the framework’s alignment with emerging standards, positioning it as a bridge between Java SE 7/8 and EE 7.
Josh also shared insights into community contributions, mentioning the GitHub-based model and Gradle builds that foster collaboration. He encouraged feedback, highlighting his role in curating community resources like the weekly roundup on springsource.org.
In closing, Josh fielded questions on topics like bean metadata navigation and conditional caching, reinforcing Spring’s commitment to developer productivity. His enthusiasm underscored Spring’s enduring value in navigating the complexities of modern software engineering.
[DevoxxFR2012] Lily: Big Data for Dummies – A Comprehensive Journey into Democratizing Apache Hadoop and HBase for Enterprise Java Developers
Lecturers
Steven Noels stands as one of the most visionary figures in the evolution of open-source Java ecosystems, having co-founded Outerthought in the early 2000s with a mission to push the boundaries of content management, RESTful architecture, and scalable data systems. His flagship creation, Daisy CMS, became a cornerstone for large-scale, multilingual content platforms used by governments and global enterprises, demonstrating that Java could power mission-critical, document-centric applications at internet scale. But Noels’ ambition extended far beyond traditional CMS. Recognizing the seismic shift toward big data in the late 2000s, he pivoted Outerthought—and later NGDATA—toward building tools that would make the Apache Hadoop ecosystem accessible to the average enterprise Java developer. Lily, launched in 2010, was the culmination of this vision: a platform that wrapped the raw power of HBase and Solr into a cohesive, Java-friendly abstraction layer, eliminating the need for MapReduce expertise or deep systems programming.
Bruno Guedes, an enterprise Java architect at SFEIR with over a decade of experience in distributed systems and search infrastructure, brought the practitioner’s perspective to the stage. Having worked with Lily from its earliest alpha versions, Guedes had deployed it in production environments handling millions of records, integrating it with legacy Java EE applications, Spring-based services, and real-time analytics pipelines. His hands-on experience—debugging schema migrations, tuning SolrCloud clusters, and optimizing HBase compactions—gave him unique insight into both the promise and the pitfalls of big data adoption in conservative enterprise settings. Together, Noels and Guedes formed a perfect synergy: the visionary architect and the battle-tested engineer, delivering a presentation that was equal parts inspiration and practical engineering.
Abstract
This article represents an exhaustively elaborated, deeply extended, and comprehensively restructured expansion of Steven Noels and Bruno Guedes’ seminal 2012 DevoxxFR presentation, “Lily, Big Data for Dummies”, transformed into a definitive treatise on the democratization of big data technologies for the Java enterprise. Delivered in a bilingual format that reflected the global nature of the Apache community, the original talk introduced Lily as a groundbreaking platform that unified Apache HBase’s scalable, distributed storage with Apache Solr’s full-text search and analytics capabilities, all through a clean, type-safe Java API. The core promise was radical in its simplicity: enterprise Java developers could build petabyte-scale, real-time searchable data systems without writing a single line of MapReduce, without mastering Zookeeper quorum mechanics, and without abandoning the comforts of POJOs, annotations, and IDE autocompletion.
This expanded analysis delves far beyond the original demo to explore the philosophical foundations of Lily’s design, the architectural trade-offs in integrating HBase and Solr, the real-world production patterns that emerged from early adopters, and the lessons learned from scaling Lily to billions of records. It includes detailed code walkthroughs, performance benchmarks, schema evolution strategies, and failure mode analyses.
EDIT:
Updated for the 2025 landscape, this piece maps Lily’s legacy concepts to modern equivalents—Apache HBase 2.5, SolrCloud 9, OpenSearch, Delta Lake, Trino, and Spring Data Hadoop—while preserving the original vision of big data for the rest of us. Through rich narratives, architectural diagrams, and forward-looking speculation, this work serves not just as a historical archive, but as a practical guide for any Java team contemplating the leap into distributed, searchable big data systems.
The Big Data Barrier in 2012: Why Hadoop Was Hard for Java Developers
To fully grasp Lily’s significance, one must first understand the state of big data in 2012. The Apache Hadoop ecosystem—launched in 2006—was already a proven force in internet-scale companies like Yahoo, Facebook, and Twitter. HDFS provided fault-tolerant, distributed storage. MapReduce offered a programming model for batch processing. HBase, modeled after Google’s Bigtable, delivered random, real-time read/write access to massive datasets. And Solr, built on top of Lucene, powered full-text search at scale.
Yet for the average enterprise Java developer, this stack was inaccessible. Writing a MapReduce job required:
– Learning a functional programming model in Java that felt alien to OO practitioners.
– Mastering job configuration, input/output formats, and partitioners.
– Debugging distributed failures across dozens of nodes.
– Waiting minutes to hours for job completion.
HBase, while promising real-time access, demanded:
– Manual row key design to avoid hotspots.
– Deep knowledge of compaction, splitting, and region server tuning.
– Integration with Zookeeper for coordination.
Solr, though more familiar, required:
– Separate schema.xml and solrconfig.xml files.
– Manual index replication and sharding.
– Complex commit and optimization strategies.
The result? Big data remained the domain of specialized data engineers, not the Java developers who built the business logic. Lily was designed to change that.
Lily’s Core Philosophy: Big Data as a First-Class Java Citizen
At its heart, Lily was built on a simple but powerful idea: big data should feel like any other Java persistence layer. Just as Spring Data made MongoDB, Cassandra, or Redis accessible via repositories and annotations, Lily aimed to make HBase and Solr feel like JPA with superpowers.
The Three Pillars of Lily
Steven Noels articulated Lily’s architecture in three interconnected layers:
1. The Storage Layer (HBase)
Lily used HBase as its primary persistence engine, storing all data as versioned, column-family-based key-value pairs. But unlike raw HBase, Lily abstracted away row key design, column family management, and versioning policies. Developers worked with POJOs, and Lily handled the mapping.
2. The Indexing Layer (Solr)
Every mutation in HBase triggered an asynchronous indexing event to Solr. Lily kept the two systems closely synchronized, so search results typically reflected the latest data within milliseconds. This was achieved through a message queue (Kafka or RabbitMQ) and idempotent indexing.
3. The Java API Layer
The crown jewel was Lily’s type-safe, annotation-driven API. Developers defined their data model using plain Java classes:
@LilyRecord
public class Customer {

    @LilyId
    private String id;

    @LilyField(family = "profile")
    private String name;

    @LilyField(family = "profile")
    private int age;

    @LilyField(family = "activity", indexed = true)
    private List<String> recentSearches;

    @LilyFullText
    private String bio;
}
The @LilyRecord annotation told Lily to persist this object in HBase. @LilyField specified column families and indexing behavior. @LilyFullText triggered Solr indexing. No XML. No schema files. Just Java.
The Lily Repository: Spring Data, But for Big Data
Lily’s LilyRepository interface was modeled after Spring Data’s CrudRepository, but with big data superpowers:
public interface CustomerRepository extends LilyRepository<Customer, String> {

    List<Customer> findByName(String name);

    @Query("age:[* TO 30]")
    List<Customer> findYoungCustomers();

    @Query("bio:java AND recentSearches:hadoop")
    List<Customer> findJavaHadoopEnthusiasts();
}
Behind the scenes, Lily:
– Translated method names to HBase scans.
– Converted @Query annotations to Solr queries.
– Executed searches across sharded SolrCloud clusters.
– Returned fully hydrated POJOs.
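The first of those steps, deriving a query from a finder-method name, can be sketched in plain Java. This is an illustrative, hypothetical helper in the spirit of Spring Data’s derived queries, not Lily’s actual parser:

```java
// Hypothetical sketch: how a repository proxy might derive a POJO field
// name from a finder method. Not Lily's real implementation.
public class FinderMethodParser {

    // "findByName" -> "name"; "findByRecentSearches" -> "recentSearches"
    public static String fieldFor(String methodName) {
        String prefix = "findBy";
        if (!methodName.startsWith(prefix)) {
            throw new IllegalArgumentException("Not a finder method: " + methodName);
        }
        String property = methodName.substring(prefix.length());
        // Lower-case the first character to match the POJO field name.
        return Character.toLowerCase(property.charAt(0)) + property.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(fieldFor("findByName"));           // name
        System.out.println(fieldFor("findByRecentSearches")); // recentSearches
    }
}
```

From the field name, the proxy would then build either an HBase scan filter or a Solr field query, depending on whether the field is indexed.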
Bruno Guedes demonstrated this in a live demo:
CustomerRepository repo = lily.getRepository(CustomerRepository.class);
repo.save(new Customer("1", "Alice", 28, Arrays.asList("java", "hadoop"), "Java dev at NGDATA"));
List<Customer> results = repo.findJavaHadoopEnthusiasts();
The entire operation—save, index, search—took under 50ms on a 3-node cluster.
Under the Hood: How Lily Orchestrated HBase and Solr
Lily’s magic was in its orchestration layer. When a save() was called:
1. The POJO was serialized to HBase Put operations.
2. The mutation was written to HBase with a version timestamp.
3. A change event was published to a message queue.
4. A Solr indexer consumed the event and updated the search index.
5. Near-real-time consistency was achieved via HBase’s WAL and Solr’s soft commits.
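The five steps above can be compressed into a single-JVM sketch: an in-memory map stands in for HBase, a BlockingQueue for the message bus, and a second map for the Solr index. All names are illustrative; none of this is Lily’s code:

```java
import java.util.*;
import java.util.concurrent.*;

// Minimal sketch of the dual-write flow: write to the store,
// publish a change event, and let an indexer consume events
// asynchronously to update the search index.
public class DualWriteSketch {
    static final Map<String, String> store = new ConcurrentHashMap<>();       // "HBase"
    static final Map<String, String> index = new ConcurrentHashMap<>();       // "Solr"
    static final BlockingQueue<String> events = new LinkedBlockingQueue<>();  // "Kafka"

    // Steps 1-3: persist the record, then publish a change event.
    static void save(String id, String value) {
        store.put(id, value);
        events.add(id);
    }

    // Steps 4-5: the indexer consumes events and updates the index.
    // Re-reading the store makes indexing idempotent: replaying the
    // same event converges to the same index state.
    static void drainIndexer() {
        String id;
        while ((id = events.poll()) != null) {
            index.put(id, store.get(id));
        }
    }

    public static void main(String[] args) {
        save("1", "Alice");
        save("1", "Alice v2"); // a second mutation for the same row
        drainIndexer();
        System.out.println(index.get("1")); // Alice v2
    }
}
```

The idempotency point matters: because the indexer reads the current store state rather than trusting the event payload, duplicate or reordered events cannot leave the index ahead of the store.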
For reads:
– findById → HBase Get.
– findByName → HBase scan with secondary index.
– @Query → Solr query with HBase post-filtering.
This dual-write, eventual consistency model was a deliberate trade-off for performance and scalability.
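The last read path, an index query followed by store-side filtering, can be sketched as follows. The contains() check stands in for a real Solr query and the map for HBase; both substitutions are assumptions made for illustration:

```java
import java.util.*;

// Sketch of the "Solr query with HBase post-filtering" read path:
// the index narrows the candidate set cheaply, then the store is
// consulted for the authoritative rows.
public class PostFilterSketch {
    public record Customer(String id, String bio, int age) {}

    // Step 1: the "index" returns candidate IDs matching a text query
    // (a naive contains() stands in for Solr). Step 2: candidates are
    // fetched from the "store" and post-filtered on a structured
    // predicate, here age < 30.
    public static List<String> youngJavaIds(Map<String, Customer> store) {
        List<String> candidates = store.values().stream()
            .filter(c -> c.bio().contains("java"))
            .map(Customer::id)
            .sorted()
            .toList();
        return candidates.stream()
            .map(store::get)
            .filter(c -> c.age() < 30)
            .map(Customer::id)
            .toList();
    }

    public static void main(String[] args) {
        Map<String, Customer> store = Map.of(
            "1", new Customer("1", "java dev", 28),
            "2", new Customer("2", "java architect", 45),
            "3", new Customer("3", "ops", 30));
        System.out.println(youngJavaIds(store)); // [1]
    }
}
```

Post-filtering against the store is what keeps results correct under eventual consistency: the index may briefly over-report candidates, but the authoritative rows decide.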
Schema Evolution and Versioning: The Enterprise Reality
One of Lily’s most enterprise-friendly features was schema evolution. In HBase, adding a column family required manual admin intervention. In Lily, it was automatic:
// Version 1
@LilyField(family = "profile")
private String email;
// Version 2
@LilyField(family = "profile")
private String phone; // New field, no migration needed
Lily stored multiple versions of the same record, allowing old code to read new data and vice versa. This was critical for rolling deployments in large organizations.
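The “old code reads new data” property amounts to tolerant reading: a v1 reader materializes only the fields it knows and ignores the rest. A minimal sketch, assuming records are stored as simple field maps (Lily’s real record model is richer):

```java
import java.util.*;

// Sketch of a tolerant reader: code built against schema v1 keeps
// only the fields it knows about, so a v2 writer can add fields
// (like "phone") without breaking older readers.
public class TolerantReaderSketch {
    public static Map<String, String> readV1(Map<String, String> stored) {
        // v1 knows only "email"; "phone" (added in v2) is ignored.
        Set<String> knownFields = Set.of("email");
        Map<String, String> view = new HashMap<>();
        for (Map.Entry<String, String> e : stored.entrySet()) {
            if (knownFields.contains(e.getKey())) {
                view.put(e.getKey(), e.getValue());
            }
        }
        return view;
    }

    public static void main(String[] args) {
        // A record written by v2 code carries an extra "phone" field.
        Map<String, String> storedByV2 = Map.of(
            "email", "alice@example.com",
            "phone", "+32 475 000 000");
        System.out.println(readV1(storedByV2)); // {email=alice@example.com}
    }
}
```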
Production Patterns and Anti-Patterns
Bruno Guedes shared war stories from production:
– Hotspot avoidance: Never use auto-incrementing IDs. Use hashed or UUID-based keys.
– Index explosion: applying @LilyFullText to large, free-form fields bloats the Solr index. Use @LilyField(indexed = true) for structured search instead.
– Compaction storms: Schedule major compactions during low traffic.
– Zookeeper tuning: Increase tick time for large clusters.
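The first pattern, spreading sequential keys across regions, is commonly done by salting the row key with a hash-derived prefix, and can be sketched with the JDK alone. The two-hex-digit salt width is an arbitrary choice for this example:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of the hotspot-avoidance advice above: prefix the natural
// key with a short hash-derived salt so that sequential IDs no
// longer sort into the same HBase region.
public class SaltedKeySketch {
    public static String saltedKey(String naturalKey) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                .digest(naturalKey.getBytes(StandardCharsets.UTF_8));
            // Two hex digits give 256 buckets; a scan must then fan
            // out over the buckets, the price paid for even writes.
            return String.format("%02x-%s", digest[0] & 0xff, naturalKey);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always present
        }
    }

    public static void main(String[] args) {
        // Sequential IDs get scattered across key space:
        System.out.println(saltedKey("customer-000001"));
        System.out.println(saltedKey("customer-000002"));
    }
}
```

Because the salt is derived from the key itself (not random), point lookups stay cheap: the reader recomputes the same prefix before issuing the Get.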
The Lily Ecosystem in 2012
Lily shipped with:
– Lily CLI for schema inspection and cluster management.
– Lily Maven Plugin for deploying schemas.
– Lily SolrCloud Integration with automatic sharding.
– Lily Kafka Connect for streaming data ingestion.
Lily’s Legacy After 2018: Where the Ideas Live On
EDIT:
Although Lily itself was archived in 2018, its core concepts continue to thrive in modern tools.
The original HBase POJO mapping is now embodied in Spring Data Hadoop.
Lily’s Solr integration has evolved into SolrJ + OpenSearch.
The repository pattern that Lily pioneered is carried forward by Spring Data R2DBC.
Schema evolution, once a key Lily feature, is now handled by Apache Atlas.
Finally, Lily’s near-real-time search capability lives on through the Elasticsearch Percolator.
Conclusion: Big Data Doesn’t Have to Be Hard
Steven Noels closed with a powerful message:
“Big data is not about MapReduce. It’s not about Zookeeper. It’s about solving business problems at scale. Lily proved that Java developers can do that—without becoming data engineers.”
EDIT:
In 2025, as lakehouse architectures, real-time analytics, and AI-driven search dominate, Lily’s vision of big data as a first-class Java citizen remains more relevant than ever.
Links
Sosh vs Free Mobile
A few days ago I left Free Mobile. In short, the network is still not up to par, despite recent improvements. The big weak point remains 3G data, which for me is the fundamental reason I invest in a smartphone and a phone plan at all, when a simple prepaid card would easily cover my 30 to 45 minutes of monthly calls.
So I switched to Sosh. First assessment after a few weeks:
- Sosh works better than Free Mobile. I temper that judgment, though: while the network, throughput, and ping are noticeably better, there is nothing spectacular about it. Even on Orange’s network, I often lose UMTS and fall back to EDGE, or even lose the data connection entirely.
- I have access to Google Maps (and Navigation) again, as well as Google Play, and my RSS feeds are updated more regularly. Lately it was taking me 10 to 15 minutes before I could even use the GPS!
- I can start a stream with a reasonable chance that it will actually work.
- My Nexus’s battery is back to a normal lifespan; on Free, largely because of constant antenna switching, I had to charge it two or even three times a day!
- While the network works well, the logistics are not up to scratch:
- order placed on a Thursday, SIM card received the following Wednesday.
- number porting requested for the Saturday after delivery, but it will only happen the Thursday after that, i.e. two weeks after placing the order!
- the account management interface is not really well designed (personal opinion).
In short, even if it costs more, at least it works! I have not yet been to New York or Mumbai to check how it behaves abroad, but I do not expect any particular problems.
I will not cast stones at Free Mobile or Xavier Niel: I am grateful to them for driving down mobile plan prices within a few months. I may well go back to Free in a few months, if their network becomes worthy of the name. For now, I refuse to pay €15.99 for a service that works “from time to time”… and never when it is urgent :-@