Databricks Lakehouse Platform

ELT With Spark SQL and Python

Incremental Data Processing

SELECT * FROM enrollments_updates

To track table changes, use the DESCRIBE HISTORY command on the enrollments_updates table.

DESCRIBE HISTORY enrollments_updates

%sql
-- drop the enrollments_updates table
DROP TABLE enrollments_updates

# remove the checkpoint location associated with our Auto Loader stream
dbutils.fs.rm("dbfs:/mnt/DEA-Book/checkpoints/enrollments", True)

Medallion Architecture

The medallion architecture, also called multi-hop architecture, is a layered data design that improves the structure and quality of data as it moves through successive stages. It consists of three layers: bronze, silver, and gold.

Each layer adds value, ensuring a structured and scalable transformation process.

Bronze Layer

The bronze layer is the first stage of the medallion architecture, where raw data is ingested and stored without transformation. It preserves the original format for auditing and traceability. Data sources include files, databases, and streaming platforms like Kafka. The goal is to capture all data, regardless of quality, as a single source of truth.
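The idea of landing data untouched can be sketched in a few lines. This is a minimal plain-Python illustration, not Spark or Delta Lake code; the function name `ingest_to_bronze`, the source label, and the record fields (`student_id`, `course_id`, `quantity`) are hypothetical and chosen only for the example.

```python
from datetime import datetime, timezone


def ingest_to_bronze(raw_records, source_name):
    """Wrap each raw record unchanged, adding only ingestion metadata."""
    ingested_at = datetime.now(timezone.utc).isoformat()
    return [
        # The payload is stored as received, even if it is malformed.
        {"payload": record, "source": source_name, "ingested_at": ingested_at}
        for record in raw_records
    ]


# Hypothetical raw enrollment records, including one of poor quality.
raw = [
    {"student_id": "S1", "course_id": "C1", "quantity": "2"},
    {"student_id": None, "course_id": "C2", "quantity": "abc"},
]
bronze = ingest_to_bronze(raw, "enrollments_json")
```

Note that the malformed record is kept: filtering at this stage would make it impossible to audit what the source actually sent.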

Silver Layer

The silver layer processes raw data to improve its quality and make it ready for analysis. This includes cleaning, normalizing, validating, and enriching the data, often by joining it with other sources. The goal is to ensure accuracy and consistency, creating a reliable foundation for analytics and reporting.
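As a rough sketch of what those steps might look like, the following plain-Python function validates, normalizes, deduplicates, and enriches a batch of raw rows. It is an illustration under assumptions rather than Spark code: the function `to_silver`, the `students` lookup, and all field names are hypothetical.

```python
def to_silver(bronze_rows, students):
    """Validate, normalize, deduplicate, and enrich raw enrollment rows."""
    silver = []
    seen = set()
    for row in bronze_rows:
        student_id = row.get("student_id")
        course_id = row.get("course_id")
        # Validate: both keys are required.
        if not student_id or not course_id:
            continue
        # Normalize: quantity arrives as text; skip rows that are not numeric.
        try:
            quantity = int(row.get("quantity"))
        except (TypeError, ValueError):
            continue
        # Deduplicate on the (student, course) pair.
        key = (student_id, course_id)
        if key in seen:
            continue
        seen.add(key)
        # Enrich: join with the students lookup; skip unknown students.
        student = students.get(student_id)
        if student is None:
            continue
        silver.append({
            "student_id": student_id,
            "course_id": course_id,
            "quantity": quantity,
            "email": student["email"].strip().lower(),
        })
    return silver


# Hypothetical input rows and lookup data.
bronze_rows = [
    {"student_id": "S1", "course_id": "C1", "quantity": "2"},
    {"student_id": "S1", "course_id": "C1", "quantity": "2"},    # duplicate
    {"student_id": None, "course_id": "C2", "quantity": "1"},    # missing key
    {"student_id": "S2", "course_id": "C3", "quantity": "abc"},  # bad number
    {"student_id": "S2", "course_id": "C1", "quantity": "1"},
]
students = {
    "S1": {"email": " Ana@Example.com "},
    "S2": {"email": "bo@example.com"},
}
silver = to_silver(bronze_rows, students)
```

Rejected rows are simply skipped here to keep the sketch short; a real pipeline might route them to a quarantine table instead.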

Gold Layer

The gold layer contains fully refined, business-ready data. Here, data is aggregated and summarized to support decision-making, such as KPIs, financial reports, and customer analytics. This layer is optimized for reporting, dashboards, and advanced use cases like machine learning and AI.
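The aggregation step can be illustrated with a small plain-Python sketch that rolls cleaned rows up into one summary row per course. It stands in for what would normally be a grouped query over a silver table; the function `to_gold`, the metric names, and the sample rows are hypothetical.

```python
from collections import defaultdict


def to_gold(silver_rows):
    """Aggregate cleaned rows into one summary row per course."""
    totals = defaultdict(lambda: {"enrollments": 0, "total_quantity": 0})
    for row in silver_rows:
        stats = totals[row["course_id"]]
        stats["enrollments"] += 1
        stats["total_quantity"] += row["quantity"]
    # Sort by course_id so the output order is deterministic.
    return [
        {"course_id": course_id, **stats}
        for course_id, stats in sorted(totals.items())
    ]


# Hypothetical cleaned rows, as a silver table might provide them.
silver_rows = [
    {"student_id": "S1", "course_id": "C1", "quantity": 2},
    {"student_id": "S2", "course_id": "C1", "quantity": 1},
    {"student_id": "S3", "course_id": "C2", "quantity": 4},
]
gold = to_gold(silver_rows)
```

The result is small and shaped for direct consumption by a report or dashboard, which is the point of this layer.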

Benefits of Medallion Architecture

Build Data Pipelines with Delta Live Tables

Production Pipelines

Data Governance