
[DevoxxFR2014] Architecture and Utilization of Big Data at PagesJaunes

Speaker

Jean-François Paccini serves as the Chief Technology Officer (CTO) at PagesJaunes Groupe, overseeing technological strategies for local information services. His leadership has driven the integration of big data technologies to enhance data processing and user experience in digital products.

Abstract

This article analyzes the strategic adoption of big data technologies at PagesJaunes, from initial convictions to practical implementations. It examines the architecture for audience data collection, innovative applications like GeoLive for real-time visualization, and machine learning for search relevance, while projecting future directions and implications for business intelligence.

Strategic Convictions and Initial Architecture

PagesJaunes, part of a group including Mappy and other local service entities, has transitioned to predominantly digital revenue, generating 70% of its 2014 turnover online. This shift produces abundant data from user interactions—over 140 million monthly searches, 3 million reviews, and nearly 1 million mobile visits—offering insights into user behavior adaptable in real-time.

The conviction driving big data adoption is the untapped value in this data “gold mine,” combined with accessible technologies like Hadoop. Rather than responding to specific business demands, the initiative stemmed from technological foresight: proving potential through modest investments in open-source tools and commodity hardware.

The initial opportunity arose from refactoring the audience data collection chain, which traditionally handled web server logs, application metrics, and mobile data through batch scripts and a columnar database. Challenges included delays of two to four days (J+2 to J+4) and brittle error recovery. The new architecture employs Flume collectors feeding a Hadoop cluster of about 50 nodes, storing 10 terabytes and processing 75 gigabytes daily, at a fraction of the cost of the legacy systems.
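The talk does not detail the jobs themselves; as a rough sketch, the kind of daily rollup such a collection chain produces (the log format, field names, and sample data below are all hypothetical) might look like:

```python
from collections import Counter
from datetime import datetime

def parse_search_log(line):
    # Hypothetical tab-separated format: timestamp, query, department code
    ts, query, dept = line.rstrip("\n").split("\t")
    return datetime.fromisoformat(ts), query, dept

def daily_query_counts(lines):
    """Roll search logs up into counts per (day, department) --
    the kind of batch aggregate the Hadoop cluster computes."""
    counts = Counter()
    for line in lines:
        ts, _query, dept = parse_search_log(line)
        counts[(ts.date().isoformat(), dept)] += 1
    return counts

logs = [
    "2014-04-16T10:00:00\tplombier\t75",
    "2014-04-16T10:05:00\tdermatologue\t13",
    "2014-04-16T11:00:00\tplombier\t75",
]
print(daily_query_counts(logs))
```

In the real chain the same aggregation runs as a distributed job over Flume-delivered files rather than an in-memory loop.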

Innovative Applications: GeoLive and Beyond

To demonstrate value, the team developed GeoLive during an internal innovation contest: a real-time visualization of searches on a map of France. Each flashing point represents a query, displayed with a delay of about five minutes, illustrating the service’s ubiquity across the territory. Categories like “psychologist” or “dermatologist” highlight local concerns.

GeoLive created a “wow effect,” winning the contest and gaining executive enthusiasm. Industrialized for the company lobby and sales tools, it tangibly showcases search volume and coverage, shifting perceptions from abstract metrics to visual impact.
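GeoLive’s internals were not shown; a minimal sketch of its one distinctive mechanic, buffering geocoded searches so they display about five minutes behind real time, could look like this (class and field names are illustrative):

```python
from collections import deque
from datetime import datetime, timedelta

DELAY = timedelta(minutes=5)

class GeoLiveBuffer:
    """Buffers geocoded searches and releases them once they are
    DELAY old, mimicking GeoLive's roughly five-minute display lag."""

    def __init__(self):
        # (timestamp, lat, lon, category) tuples in arrival order
        self.pending = deque()

    def push(self, ts, lat, lon, category):
        self.pending.append((ts, lat, lon, category))

    def ready(self, now):
        """Pop and return every point old enough to display."""
        out = []
        while self.pending and now - self.pending[0][0] >= DELAY:
            out.append(self.pending.popleft())
        return out

t0 = datetime(2014, 4, 16, 10, 0)
buf = GeoLiveBuffer()
buf.push(t0, 48.85, 2.35, "psychologist")                          # Paris
buf.push(t0 + timedelta(minutes=1), 43.30, 5.37, "dermatologist")  # Marseille
print(len(buf.ready(t0 + timedelta(minutes=5))))  # only the first point is due
```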

Building on this, big data extended to core operations via machine learning for search relevance. Users often phrase queries ambiguously: a French search for “riz” (rice) in Marseille could return funeral “rites” instead of food retailers. Traditional analysis covered only the top 10,000 queries, reviewed manually; Hadoop enables exhaustive examination of sessions, identifying weak queries through user reformulations.

Tools like Hive and custom developments, aided by a data scientist, model query fragility. This loop informs indexers to refine rules, detecting missing professionals and enhancing results continuously.
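The actual fragility model is not public; a toy version of the underlying idea, treating a query quickly followed by a different query in the same session as a likely failure, can be sketched as:

```python
from collections import defaultdict

def fragility_scores(sessions):
    """Score each query by how often users reformulate it: a query
    followed by a different query in the same session is treated as
    a likely failure.  Toy model, not PagesJaunes' actual one."""
    attempts = defaultdict(int)
    reformulated = defaultdict(int)
    for queries in sessions:
        # Pair each query with the next one in the session (None at the end)
        for q, nxt in zip(queries, queries[1:] + [None]):
            attempts[q] += 1
            if nxt is not None and nxt != q:
                reformulated[q] += 1
    return {q: reformulated[q] / attempts[q] for q in attempts}

sessions = [
    ["riz marseille", "riz cantonais marseille", "epicerie asiatique marseille"],
    ["plombier paris"],
]
scores = fragility_scores(sessions)
print(scores["riz marseille"], scores["plombier paris"])
```

A production version would also weight by time between queries and by whether the user ultimately clicked a result, but the feedback loop to the indexers has the same shape.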

Future Projections and Organizational Impact

Looking forward, PagesJaunes aims to industrialize A/B testing for algorithm variants, real-time user segmentation, and fraud detection (e.g., scraping bots). Data journalism will leverage regional trends for insights.

Predictions include 90% of data intelligence projects adopting these technologies within 18 months, with Hadoop potentially replacing the corporate data warehouse for audience analytics. This evolution calls for dedicated data scientist roles capable of sophisticated modeling that avoids naive correlations.

The journey underscores big data’s role in fostering innovation, as seen in the “Make It” contest energizing cross-functional teams. Such events reveal creative potential, leading to production implementations and cultural shifts toward agility.

Implications for Digital Transformation

Big data at PagesJaunes exemplifies how convictions in data value and technology accessibility can drive transformation. From modest clusters to mission-critical applications, it enhances user experience and operational efficiency. Challenges like tool maturity for non-technical analysts persist, but evolving ecosystems promise broader accessibility.

Ultimately, this approach positions PagesJaunes to personalize experiences, introduce affinity services, and maintain competitiveness in local search, illustrating big data’s strategic imperative in digital economies.


[DevoxxFR2013] From Cloud Experimentation to On-Premises Maturity: Strategic Infrastructure Repatriation at Mappy

Speaker

Cyril Morcrette serves as Technical Director at Mappy, a pioneering French provider of geographic and local commerce services with thirteen million euros in annual revenue and eighty employees. Under his leadership, Mappy has evolved from a traditional route planning service into a comprehensive platform integrating immersive street-level imagery, local business discovery, and personalized recommendations. His infrastructure strategy reflects deep experience with both cloud and on-premises environments, informed by multiple large-scale projects that pushed technological boundaries.

Abstract

Cloud computing excels at enabling rapid prototyping and handling uncertain demand, but its cost structure can become prohibitive as projects mature and usage patterns stabilize. This presentation chronicles Mappy’s journey with immersive geographic visualization — a direct competitor to Google Street View — from initial cloud deployment to eventual repatriation to on-premises infrastructure. Cyril Morcrette examines the economic, operational, and technical factors that drove this decision, providing a framework for evaluating infrastructure choices throughout the application lifecycle. Through detailed cost analysis, performance metrics, and migration case studies, he demonstrates that cloud is an ideal launch platform but often not the optimal long-term home for predictable, high-volume workloads. The session concludes with practical guidance for smooth repatriation and the broader implications for technology strategy in established organizations.

The Immersive Visualization Imperative

Mappy’s strategic pivot toward immersive geographic experiences required capabilities beyond traditional mapping: panoramic street-level imagery, 3D reconstruction, and real-time interaction. The project demanded massive storage (terabytes of high-resolution photos), significant compute for image processing, and low-latency delivery to users.

Initial estimates suggested explosive, unpredictable traffic growth. Marketing teams envisioned viral adoption, while technical teams worried about infrastructure bottlenecks. Procuring sufficient on-premises hardware would require months of lead time and capital approval — unacceptable for a market-moving initiative.

Amazon Web Services offered an immediate solution: spin up instances, store petabytes in S3, process imagery with EC2 spot instances. The cloud’s pay-as-you-go model eliminated upfront investment and provided virtually unlimited capacity.

Cloud-First Development: Speed and Agility

The project launched entirely in AWS. Development teams used EC2 for processing pipelines, S3 for raw and processed imagery, CloudFront for content delivery, and Elastic Load Balancing for web servers. Auto-scaling handled traffic spikes during marketing campaigns.

This environment enabled rapid iteration:
– Photographers uploaded imagery directly to S3 buckets
– Lambda functions triggered processing workflows
– Machine learning models (running on GPU instances) detected business facades and extracted metadata
– Processed panoramas were cached in CloudFront edge locations

Within months, Mappy delivered a functional immersive experience covering major French cities. The cloud’s flexibility absorbed the uncertainty of early adoption while development teams refined algorithms and user interfaces.
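The event-driven shape of such a pipeline can be illustrated with a Lambda-style handler that turns S3 upload notifications into processing jobs. The event layout below follows AWS’s S3 notification format; the bucket and key names are invented, and the real Mappy pipeline is not public:

```python
def handler(event, context=None):
    """Lambda-style entry point: extract (bucket, key) processing
    jobs from an S3 upload notification event."""
    jobs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        jobs.append((bucket, key))
    return jobs

# A single-upload notification, in the S3 event notification shape
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "raw-imagery"},
                "object": {"key": "paris/pano_0001.jpg"}}}
    ]
}
print(handler(sample_event))
```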

The Economics of Maturity

As the product stabilized, usage patterns crystallized. Daily active users grew steadily but predictably. Storage requirements, while large, increased linearly. Processing workloads became batch-oriented rather than real-time.

Cost analysis revealed a stark reality: cloud expenses were dominated by data egress, storage, and compute hours — all now predictable and substantial. Mappy’s existing data center, built for core mapping services, had significant spare capacity with fully amortized hardware.

Cyril presents the tipping-point calculation:
– Cloud monthly cost: €45,000 (storage, compute, bandwidth)
– On-premises equivalent: €12,000 (electricity, maintenance, depreciation)
– Break-even: four months

The decision to repatriate was driven by simple arithmetic, but execution required careful planning.
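The arithmetic is easy to make explicit. The monthly figures come from the talk; the one-off migration cost below is an assumption, chosen only so that the sketch reproduces the stated four-month break-even:

```python
def breakeven_months(cloud_monthly, onprem_monthly, migration_cost):
    """Months until cumulative savings cover the one-off migration cost."""
    savings = cloud_monthly - onprem_monthly
    if savings <= 0:
        return None  # repatriation never pays off
    return migration_cost / savings

# Monthly figures from the talk; migration cost is an assumed one-off.
print(breakeven_months(45_000, 12_000, 132_000))  # 4.0
```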

Repatriation Strategy and Execution

The migration followed a phased approach:

  1. Data Transfer: Used AWS Snowball devices to move petabytes of imagery back to on-premises storage. Parallel transfers leveraged Mappy’s high-bandwidth connectivity.

  2. Processing Pipeline: Reimplemented image processing workflows on internal GPU clusters. Custom scripts replaced Lambda functions, achieving equivalent throughput at lower cost.

  3. Web Tier: Deployed Nginx and Varnish caches on existing web servers. CDN integration with Akamai preserved low-latency delivery.

  4. Monitoring and Automation: Migrated CloudWatch metrics to Prometheus/Grafana. Ansible playbooks replaced CloudFormation templates.
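One concrete concern in step 1 is verifying that petabytes of imagery arrived intact. A simple manifest-based integrity check, illustrative rather than Mappy’s actual tooling, might look like:

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk=1 << 20):
    """Stream a file through SHA-256 without loading it whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def verify_transfer(manifest, root):
    """Return the files whose on-prem copy is missing or whose
    digest does not match the manifest recorded before shipping."""
    bad = []
    for rel, expected in manifest.items():
        p = Path(root) / rel
        if not p.exists() or sha256_of(p) != expected:
            bad.append(rel)
    return bad
```

In practice the manifest would be produced on the cloud side before the Snowball ships, and the check parallelized across the GPU cluster’s idle hours.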

Performance remained comparable: page load times stayed under two seconds, and system availability exceeded 99.95%. The primary difference was cost — reduced by seventy-five percent.

Operational Benefits of On-Premises Control

Beyond economics, repatriation delivered strategic advantages:
– Data Sovereignty: full control over sensitive geographic imagery
– Performance Predictability: eliminated cloud provider throttling risks
– Integration Synergies: shared infrastructure with core mapping services reduced operational complexity
– Skill Leverage: existing systems administration expertise applied directly

Cyril notes that while cloud elasticity was lost, the workload’s maturity rendered it unnecessary. Capacity planning became straightforward, with hardware refresh cycles aligned to multi-year budgets.

Lessons for Infrastructure Strategy

Mappy’s experience yields a generalizable framework:
1. Use cloud for uncertainty: Prototyping, viral growth potential, or seasonal spikes
2. Monitor cost drivers: Storage, egress, compute hours
3. Model total cost of ownership: Include migration effort and operational overhead
4. Plan repatriation paths: Design applications with infrastructure abstraction
5. Maintain hybrid capability: Keep cloud skills current for future needs

The cloud is not a destination but a tool — powerful for certain phases, less optimal for others.

Conclusion: Right-Sizing Infrastructure for Business Reality

Mappy’s journey from cloud experimentation to on-premises efficiency demonstrates that infrastructure decisions must evolve with product maturity. The cloud enabled rapid innovation and market entry, but long-term economics favored internal hosting for stable, high-volume workloads. Cyril’s analysis provides a blueprint for technology leaders to align infrastructure with business lifecycle stages, avoiding the trap of cloud religion or on-premises dogma. The optimal stack combines both environments strategically, using each where it delivers maximum value.
