[AWSReInventPartnerSessions2024] Data Mesh at Moderna: One dbt to Unify Data and People (DAT206)

Lecturers

Connor McArthur co-founded dbt Labs, where he contributes to developing workflows for data transformations inspired by software engineering principles. With over a decade in engineering leadership, Connor focuses on metadata-driven analytics to enhance governance and development speed. Sri Kamireddy leads data initiatives at Moderna, overseeing the integration of diverse data platforms to support organizational goals in biotechnology.

Abstract

This detailed review investigates Moderna’s adoption of dbt Cloud to construct a unified data mesh architecture, integrating disparate data systems for enhanced coherence and efficiency. It scrutinizes the contextual demands of multi-platform environments, methodological use of cross-platform dbt mesh, and implications for data governance, engineering workflows, and business outcomes in a high-stakes industry.

Contextual Demands in Multi-Platform Data Environments

Organizations like Moderna operate in complex data landscapes, often employing multiple warehouses due to acquisitions, team preferences, or specialized needs. This diversity, while beneficial for tailored solutions, introduces fragmentation, complicating integration and governance. Moderna’s setup includes Amazon EMR, Spark, Redshift, Athena, and Glue, reflecting a hybrid approach to handle vast datasets from supply chain, manufacturing, and shipments.

The challenge lies in unifying these without duplicating data or losing lineage, which could delay insights critical for operations like vaccine distribution. dbt addresses this by providing an opinionated workflow based on software development cycles, generating active metadata that informs a data control plane. This plane centralizes building, deploying, orchestrating, observing, and cataloging analytics stacks.

Methodological Application of Cross-Platform dbt Mesh

dbt mesh connects projects across platforms, using Apache Iceberg as the interoperable table format. At Moderna, this approach streamlined combining supply chain data from a data lake (Athena) with manufacturing data (Redshift). Once upstream models are exposed as public within dbt (the model-level `access: public` setting), a project on one platform can reference models built on another through a two-argument `ref`, preserving end-to-end lineage.

A custom SDK wrapper enforces data quality checks and metadata inclusion during model development, ensuring governance without stifling domain-driven engineering. Lake Formation tags maintain access controls, preventing silos.
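The session does not show Moderna's wrapper itself, so the following is only a minimal sketch of the kind of gate such a wrapper might apply before a model is accepted. The `ModelSpec` class, the `governance_violations` function, and the required metadata keys (`owner`, `domain`) are all hypothetical names chosen for illustration, not part of dbt or of Moderna's SDK.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical governance policy: metadata keys every model must carry.
REQUIRED_META_KEYS = ("owner", "domain")


@dataclass
class ModelSpec:
    """Illustrative stand-in for a model definition submitted by a domain team."""
    name: str
    description: str = ""
    meta: Dict[str, str] = field(default_factory=dict)
    tests: List[str] = field(default_factory=list)


def governance_violations(model: ModelSpec) -> List[str]:
    """Return a list of human-readable policy violations (empty if compliant)."""
    problems = []
    if not model.description.strip():
        problems.append(f"{model.name}: missing description")
    for key in REQUIRED_META_KEYS:
        if key not in model.meta:
            problems.append(f"{model.name}: missing meta key '{key}'")
    if not model.tests:
        problems.append(f"{model.name}: no data quality tests declared")
    return problems


if __name__ == "__main__":
    draft = ModelSpec(name="unified_product", description="Supply chain joined to manufacturing")
    for problem in governance_violations(draft):
        print(problem)
```

Returning a list of violations rather than raising on the first one lets a domain team see every gap in a single pass, which fits the stated goal of enforcing governance without slowing development.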

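Code sample for referencing models across projects in dbt, split across the three files involved (file names follow dbt conventions; the project and model names are illustrative):

# dependencies.yml (downstream project): declare the upstream dbt project
projects:
  - name: athena_project

# models/schema.yml (upstream athena_project): expose the model to other projects
models:
  - name: supply_chain_data
    access: public

-- models/unified_product.sql (downstream project)
{{ config(materialized='table') }}

SELECT *
FROM {{ ref('athena_project', 'supply_chain_data') }}
JOIN redshift_manufacturing ON ...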

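The two-argument `ref` is the only cross-project plumbing the downstream model needs: dbt resolves where the upstream model lives and records the dependency in the lineage graph.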

Implications for Governance and Operational Efficiency

The approach reduces engineering workload by eliminating custom integration scripts, freeing engineers to focus on higher-value tasks. End-to-end lineage lets business users trace a metric back to its origins, fostering trust and faster decision-making.
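To make the lineage point concrete, here is a small sketch of how a metric's upstream origins could be walked from dependency metadata. It assumes a dictionary shaped like the `nodes` section of dbt's `manifest.json` artifact, where each node lists its parents under `depends_on.nodes`; the node identifiers in the sample are invented for illustration.

```python
from typing import Set


def upstream_lineage(manifest: dict, unique_id: str) -> Set[str]:
    """Return every node that `unique_id` depends on, directly or transitively."""
    nodes = manifest.get("nodes", {})
    seen: Set[str] = set()
    stack = [unique_id]
    while stack:
        current = stack.pop()
        # Nodes absent from `nodes` (for example sources) simply have no recorded parents here.
        parents = nodes.get(current, {}).get("depends_on", {}).get("nodes", [])
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical manifest fragment; real manifests carry many more fields per node.
sample_manifest = {
    "nodes": {
        "model.analytics.unified_product": {
            "depends_on": {
                "nodes": [
                    "model.athena_project.supply_chain_data",
                    "model.analytics.redshift_manufacturing",
                ]
            }
        },
        "model.athena_project.supply_chain_data": {
            "depends_on": {"nodes": ["source.athena_project.raw.shipments"]}
        },
        "model.analytics.redshift_manufacturing": {"depends_on": {"nodes": []}},
    }
}

if __name__ == "__main__":
    for node in sorted(upstream_lineage(sample_manifest, "model.analytics.unified_product")):
        print(node)
```

Because the cross-project reference is recorded like any other dependency, the same traversal crosses the platform boundary without special handling.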

In biotechnology, where timely insights impact global health, this efficiency is crucial. Scalable infrastructure with controlled costs supports Moderna’s data-driven culture, emphasizing strong platforms, governance, and security.

In summary, dbt mesh at Moderna exemplifies how unified tools bridge platform divides, promoting cohesive data estates that drive innovation and reliability.
