[DevoxxFR2014] Architecture and Utilization of Big Data at PagesJaunes
Lecturer
Jean-François Paccini serves as the Chief Technology Officer (CTO) at PagesJaunes Groupe, overseeing technological strategies for local information services. His leadership has driven the integration of big data technologies to enhance data processing and user experience in digital products.
Abstract
This article analyzes the strategic adoption of big data technologies at PagesJaunes, from initial convictions to practical implementations. It examines the architecture for audience data collection, innovative applications like GeoLive for real-time visualization, and machine learning for search relevance, while projecting future directions and implications for business intelligence.
Strategic Convictions and Initial Architecture
PagesJaunes, part of a group including Mappy and other local service entities, has transitioned to predominantly digital revenue, generating 70% of its 2014 turnover online. This shift produces abundant data from user interactions—over 140 million monthly searches, 3 million reviews, and nearly 1 million mobile visits—offering insights into user behavior adaptable in real-time.
The conviction driving big data adoption is the untapped value in this data “gold mine,” combined with accessible technologies like Hadoop. Rather than responding to specific business demands, the initiative stemmed from technological foresight: proving potential through modest investments in open-source tools and commodity hardware.
The initial opportunity arose from refactoring the audience data collection chain, traditionally handling web server logs, application metrics, and mobile data via batch scripts and a columnar database. Challenges included delays (often J+2 to J+4) and error recovery issues. The new architecture employs Flume collectors feeding a Hadoop cluster of about 50 nodes, storing 10 terabytes and processing 75 gigabytes daily—costing far less than legacy systems.
Innovative Applications: GeoLive and Beyond
To demonstrate value, the team developed GeoLive during an internal innovation contest, visualizing real-time searches on a French map. Each flashing point represents a query, delayed by about five minutes, illustrating media ubiquity across territories. Categories like “psychologist” or “dermatologist” highlight local concerns.
GeoLive created a “wow effect,” winning the contest and gaining executive enthusiasm. Industrialized for the company lobby and sales tools, it tangibly showcases search volume and coverage, shifting perceptions from abstract metrics to visual impact.
Building on this, big data extended to core operations via machine learning for search relevance. Users often seek products or services ambiguously (e.g., “rice in Marseille” yielding funeral rites instead of food retailers). Traditional analysis covered only top 10,000 queries manually; Hadoop enables exhaustive session examination, identifying weak queries through reformulations.
Tools like Hive and custom developments, aided by a data scientist, model query fragility. This loop informs indexers to refine rules, detecting missing professionals and enhancing results continuously.
Future Projections and Organizational Impact
Looking forward, PagesJaunes aims to industrialize A/B testing for algorithm variants, real-time user segmentation, and fraud detection (e.g., scraping bots). Data journalism will leverage regional trends for insights.
Predictions include 90% of data intelligence projects adopting these technologies within 18 months, with Hadoop potentially replacing the corporate data warehouse for audience analytics. This evolution demands data scientist roles for sophisticated modeling, avoiding naive correlations.
The journey underscores big data’s role in fostering innovation, as seen in the “Make It” contest energizing cross-functional teams. Such events reveal creative potential, leading to production implementations and cultural shifts toward agility.
Implications for Digital Transformation
Big data at PagesJaunes exemplifies how convictions in data value and technology accessibility can drive transformation. From modest clusters to mission-critical applications, it enhances user experience and operational efficiency. Challenges like tool maturity for non-technical analysts persist, but evolving ecosystems promise broader accessibility.
Ultimately, this approach positions PagesJaunes to personalize experiences, introduce affinity services, and maintain competitiveness in local search, illustrating big data’s strategic imperative in digital economies.