Jonathan Lalou's Blog

[DevoxxPL2022] Successful AI-NLP Project: What You Need to Know

At Devoxx Poland 2022, Robert Wcisło and Łukasz Matug, data scientists at UBS, shared insights on ensuring the success of AI and NLP projects, drawing from their experience implementing AI solutions in a large investment bank. Their presentation highlighted critical success factors for deploying machine learning (ML) models into production, addressing common pitfalls and offering practical guidance across the project lifecycle.

Understanding the Challenges

The speakers noted that enthusiasm for AI often outpaces practical outcomes: 2018 data indicated that only 10% of ML projects reached production. Although this figure may have improved since, many projects still fail because of misaligned expectations or inadequate preparation. To counter this, they outlined a simplified three-phase process (Prepare, Build, and Maintain) that integrates Software Development Lifecycle (SDLC) and MLOps principles and focuses on delivering business value and a good user experience.

Prepare Phase: Setting the Foundation

Łukasz emphasized the importance of the Prepare phase, where clarity on business needs is critical. Many stakeholders, inspired by AI hype, expect miraculous solutions without defining specific outcomes. Key considerations include:

Defining the Output: Understand the business problem and the desired results, such as the labels the model should assign (e.g., fraud vs. non-fraud in fraud detection). Reduce ambiguity by explicitly defining what the application should achieve.
Evaluating ML Necessity: ML excels in areas like recommendation systems, language understanding, anomaly detection, and personalization, but it’s not a universal solution. For one-off problems, simpler analytics may suffice.
Red Flags: ML models rarely achieve 100% accuracy, and every gain in accuracy requires more data and testing, which increases costs. Highly regulated industries may demand transparency, which is hard to provide for complex models. Data availability is also critical: without sufficient data, ML is infeasible, though workarounds such as transfer learning or purchasing data exist.
Universal Performance Metric: Establish a metric aligned with business goals (e.g., click-through rate, precision/recall) to measure success, unify stakeholder expectations, and guide development priorities for cost efficiency.
Tooling and Infrastructure: Align software and data science teams with shared tools (e.g., Git, data access, experiment logs). Ensure compliance with data restrictions (e.g., GDPR, cross-border rules) and secure access to production-like data and infrastructure (e.g., GPUs).
Automation Levels: Decide the role of AI—ranging from no AI (human baseline) to full automation. Partial automation, where models handle clear cases and humans review uncertain ones, is often practical. Consider ethical principles like fairness, compliance, and no-harm to avoid bias or regulatory issues.
Model Utilization: Plan how the model will be served—binary distribution, API service, embedded application, or self-service platform. Each approach impacts user experience, scalability, and maintenance.
Scalability and Reuse: Design for scalability and consider reusing datasets or models to enhance future projects and reduce costs.
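To make the partial-automation idea and the business metric concrete, here is a minimal sketch, assuming the model returns a label plus a calibrated confidence score. Predictions above a threshold are accepted automatically, the rest go to human review, and precision/recall on the automated subset shows what that threshold costs. The `Prediction` class, the 0.9 threshold and the sample data are hypothetical illustrations, not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    doc_id: str
    label: str         # label proposed by the model, e.g. "fraud" or "ok"
    confidence: float  # assumed to be a calibrated score in [0, 1]


def route(predictions, threshold):
    """Partial automation: confident predictions are accepted automatically,
    everything else is queued for human review."""
    automated = [p for p in predictions if p.confidence >= threshold]
    review = [p for p in predictions if p.confidence < threshold]
    return automated, review


def precision_recall(y_true, y_pred, positive):
    """Precision and recall for a single positive class (e.g. "fraud")."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labelled validation sample used to choose the threshold.
preds = [
    Prediction("d1", "fraud", 0.97),
    Prediction("d2", "ok", 0.95),
    Prediction("d3", "fraud", 0.55),
    Prediction("d4", "ok", 0.92),
    Prediction("d5", "fraud", 0.91),
]
truth = {"d1": "fraud", "d2": "ok", "d3": "ok", "d4": "ok", "d5": "ok"}

automated, review = route(preds, threshold=0.9)
precision, recall = precision_recall(
    [truth[p.doc_id] for p in automated],
    [p.label for p in automated],
    positive="fraud",
)
print(f"auto={len(automated)} review={len(review)} "
      f"precision={precision:.2f} recall={recall:.2f}")
# → auto=4 review=1 precision=0.50 recall=1.00
```

Raising the threshold sends more cases to humans but typically improves precision on what is automated; agreeing on that trade-off with stakeholders is exactly what a shared metric is for.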

Build Phase: Crafting the Model

Robert focused on the Build phase, offering technical tips to streamline development:

Data Management: Data evolves, so models need retraining to address drift. For NLP projects, make sure the data covers diverse document templates, including text with slang or errors. Track data provenance and lineage to monitor sources and transformations and keep the pipeline stable.
Data Quality: Most ML projects involve smaller datasets (hundreds to thousands of points), where quality trumps quantity. Address imbalances by collaborating with clients for better data or using simpler models. Perform sanity checks to ensure representativeness, avoiding overly curated data that misaligns with production (e.g., professional photos vs. smartphone images).
Metadata and Tagging: Use tags (e.g., source, date, document type) to simplify debugging and maintenance. For instance, identifying underperforming data (e.g., low-quality German PDFs) becomes easier with metadata.
Labeling Strategy: Noisy or ambiguous labels degrade model performance (e.g., annotators confusing “bridges” with the actor Jeff Bridges, or disagreeing on whether a drawing of a bicycle counts as a bicycle). Use human-level performance (HLP) as the reference, measured either against ground truth (e.g., biopsy results) or as agreement between human labelers. A consistent labeling strategy, documented with clear examples, reduces ambiguity and improves data quality. Tools like Amazon Mechanical Turk or in-house labeling platforms can streamline this process.
Training Tips: Use transfer learning to leverage pre-trained models, reducing data needs. Active learning prioritizes labeling hard examples, while pseudo-labeling uses existing models to pre-annotate data, saving time if the model is reliable. Ensure determinism by fixing seeds for reproducibility during debugging. Start with lightweight models (e.g., BERT Tiny) to establish baselines before scaling to complex models.
Baselines: Compare against prior models, heuristic-based systems, or simple proofs-of-concept to contextualize progress toward HLP. An 85% accuracy may be sufficient if it aligns with HLP, but 60% after extensive effort signals issues.
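The metadata-tagging advice can be sketched as a per-slice accuracy report. This assumes each evaluation record carries the prediction, the gold label and a `meta` dict of tags; that schema, the function name and the sample records are hypothetical. Sorting the slices worst-first makes a problem such as poorly handled German PDFs stand out immediately.

```python
from collections import defaultdict


def accuracy_by_tags(records, tags):
    """Accuracy per metadata slice, worst slice first.

    Each record is assumed to be a dict with "pred", "gold" and a "meta" dict
    of tags such as source, date, language or format (hypothetical schema).
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for rec in records:
        key = tuple(rec["meta"].get(t, "<missing>") for t in tags)
        totals[key] += 1
        hits[key] += int(rec["pred"] == rec["gold"])
    rows = [(key, hits[key] / totals[key], totals[key]) for key in totals]
    return sorted(rows, key=lambda row: row[1])  # stable: ties keep first-seen order


records = [
    {"pred": "invoice", "gold": "invoice", "meta": {"lang": "en", "format": "pdf"}},
    {"pred": "invoice", "gold": "contract", "meta": {"lang": "de", "format": "pdf"}},
    {"pred": "contract", "gold": "contract", "meta": {"lang": "de", "format": "docx"}},
    {"pred": "letter", "gold": "invoice", "meta": {"lang": "de", "format": "pdf"}},
    {"pred": "letter", "gold": "letter", "meta": {"lang": "en", "format": "docx"}},
]

for key, acc, n in accuracy_by_tags(records, ("lang", "format")):
    print(f"{'/'.join(key)}: accuracy={acc:.2f} (n={n})")
# first line printed: de/pdf: accuracy=0.00 (n=2)
```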
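When HLP is estimated from inter-human agreement, one common statistic is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The speakers did not name a specific measure; kappa is used here as one reasonable choice, and the annotations below are hypothetical. A low kappa usually points to an ambiguous labeling guideline rather than a modelling problem.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators on the same items,
    corrected for the agreement expected by chance."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("expected two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement if each annotator labelled independently
    # with their own label frequencies.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used the same single label everywhere
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of six images, some of which are drawings.
annotator_1 = ["bicycle", "bicycle", "other", "other", "bicycle", "bicycle"]
annotator_2 = ["bicycle", "other", "other", "other", "bicycle", "bicycle"]
print(f"kappa = {cohens_kappa(annotator_1, annotator_2):.2f}")  # → kappa = 0.67
```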
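Two of the training tips, fixed seeds and active learning, can be sketched with the standard library alone. The split uses an explicitly seeded local RNG so it is identical on every run (frameworks such as NumPy or PyTorch keep their own random state and would need seeding separately). For active learning, the sketch uses least-confidence uncertainty sampling, one common way to find "hard" examples; the function names and probabilities are hypothetical.

```python
import random


def train_val_split(items, val_fraction, seed):
    """Reproducible split: a local RNG seeded explicitly gives the same
    shuffle on every run, which helps when debugging a pipeline."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def least_confident(pool, predict_proba, k):
    """Uncertainty sampling for active learning: return the k unlabelled items
    whose highest class probability is lowest, i.e. the hardest examples."""
    ranked = sorted(pool, key=lambda item: max(predict_proba(item)))
    return ranked[:k]


# Hypothetical class probabilities from a current (e.g. small baseline) model.
probs = {
    "doc-a": [0.98, 0.02],
    "doc-b": [0.51, 0.49],
    "doc-c": [0.70, 0.30],
    "doc-d": [0.55, 0.45],
}
train, val = train_val_split(sorted(probs), val_fraction=0.25, seed=42)
to_label = least_confident(probs, probs.__getitem__, k=2)
print(to_label)  # → ['doc-b', 'doc-d']
```

The same ranking can drive pseudo-labeling in reverse: the most confident items are the best candidates for model pre-annotation, provided the model is reliable enough.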

Maintain Phase: Sustaining Performance

Maintenance is critical because, unlike traditional software, ML models are exposed to data drift and evolving inputs. Strategies include:

Deployment Techniques: Use A/B testing to compare model versions, shadow mode to evaluate models in parallel with human processes, canary deployments to test on a small traffic subset, or blue-green deployments for seamless rollbacks.
Monitoring: Beyond system metrics, monitor inputs (e.g., image brightness, speech volume, input length) and outputs (e.g., the predictions themselves, or user behavior such as query frequency). Detect data or concept drift to keep the model relevant.
Reuse: Reuse models, data, and experiences to reduce uncertainty, lower costs, and build organizational capabilities for future projects.
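Input monitoring can start with something as simple as comparing the distribution of input lengths in production with the training data. The sketch below uses the Population Stability Index (PSI), a widely used drift score that the speakers did not specifically mention; the bucket edges, the sample lengths and the 0.25 alert level (a common rule of thumb, not a universal threshold) are assumptions.

```python
import bisect
import math


def bucket_shares(values, edges):
    """Share of values per bucket; `edges` are sorted interior cut points,
    so there are len(edges) + 1 buckets."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def psi(reference, current, edges, eps=1e-4):
    """Population Stability Index between a reference window (e.g. training
    inputs) and a recent production window; eps avoids log(0)."""
    ref = bucket_shares(reference, edges)
    cur = bucket_shares(current, edges)
    return sum((c - r) * math.log((c + eps) / (r + eps)) for r, c in zip(ref, cur))


# Hypothetical input lengths (in tokens) of documents sent to an NLP model.
train_lengths = [120, 80, 95, 300, 150, 60, 110, 90, 130, 100]
prod_lengths = [20, 35, 40, 25, 110, 30, 45, 15, 28, 33]
edges = [50, 100, 200]

score = psi(train_lengths, prod_lengths, edges)
if score > 0.25:  # a commonly used rule of thumb, not a universal threshold
    print(f"input-length drift detected (PSI={score:.2f})")
```

The same function works for any numeric input or output signal, such as image brightness or prediction confidence, as long as the bucket edges are fixed from the reference window.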
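A canary deployment needs a routing rule that sends a small, stable fraction of traffic to the new model version. A minimal sketch, assuming each request carries a stable routing key such as a client id: hashing the key gives a deterministic bucket, so a given caller always sees the same version, and dropping the percentage to zero acts as a rollback. The function names and the stand-in models are hypothetical; real platforms usually implement this in the load balancer or service mesh.

```python
import hashlib


def in_canary(routing_key: str, canary_percent: float) -> bool:
    """Send roughly `canary_percent` % of traffic to the new model version.
    Hashing a stable key (e.g. a client id) keeps each caller on one version."""
    digest = hashlib.sha256(routing_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < canary_percent * 100


def predict(routing_key, text, stable_model, canary_model, canary_percent=5.0):
    """Route one request; setting canary_percent to 0 acts as an instant rollback."""
    model = canary_model if in_canary(routing_key, canary_percent) else stable_model
    return model(text)


# Hypothetical stand-ins for two deployed model versions.
def stable_model(text):
    return "v1"


def canary_model(text):
    return "v2"


share = sum(in_canary(f"client-{i}", 5.0) for i in range(10_000)) / 10_000
print(f"share of traffic on canary: {share:.3f}")
```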

Key Takeaways

The speakers stressed reusing existing resources to demystify AI, reduce costs, and enhance efficiency. By addressing business needs, data quality, and operational challenges early, teams can increase the likelihood of delivering impactful AI-NLP solutions. They invited attendees to discuss further at the UBS stand, emphasizing practical application over theoretical magic.

Posted in en-US | Tags: AINLP, DataScience, DevoxxPL2022, DevoxxPoland, LukaszMatug, MachineLearning, MLOps, RobertWcislo, UBS