Learn Data Engineering For Free
In-depth guides, tutorials, and best practices from our production experience. No gatekeeping — just practical knowledge to level up your skills.
Browse by Topic
Deep dives into the tools and concepts that matter most.
Airbyte
Airbyte is an open-source data integration platform that replicates data from applications, APIs, and databases to data warehouses, lakes, and other destinations. It's designed to democratize data int
Airflow
Apache Airflow is an open-source platform for authoring, scheduling, and monitoring workflows programmatically. Originally developed by Airbnb in 2014, it enables you to define complex data pipelines
Census
Census is the leading Reverse ETL platform that syncs data from your data warehouse to the business tools your teams use every day. Founded in 2018, Census has pioneered the "operational analytics" ca
Claude Code
Claude Code is Anthropic's official AI-powered software engineering tool that brings Claude's capabilities directly into your development workflow. It's a terminal-based AI coding assistant that can r
Dagster
Dagster is a modern data orchestrator designed around the concept of "data assets" rather than tasks. It emphasizes what data you're building (assets) instead of how you're building it (tasks), making
Databricks
Databricks is a unified analytics platform built on Apache Spark that combines data engineering, data science, and machine learning in a collaborative cloud-based environment. Founded by the creators
dbt
dbt (data build tool) is an open-source command-line tool that enables data analysts and engineers to transform data in their warehouses more effectively. It allows you to write modular SQL transforma
dlt
dlt (data load tool) is an open-source Python library that makes building data pipelines simple and maintainable. Created by dltHub in 2022, dlt takes a code-first approach to data ingestion, allowing
Docker
Docker is an open-source platform for developing, shipping, and running applications in containers. Released in 2013, Docker revolutionized software deployment by making it easy to package application
Fivetran
Fivetran is a fully managed data integration platform that automatically syncs data from various sources into your data warehouse. It handles the Extract and Load (EL) portions of modern ELT pipelines
Great Expectations
Great Expectations is an open-source Python library for validating, documenting, and profiling data to maintain quality and improve communication between teams. It helps data teams eliminate pipeline
Kafka
Apache Kafka is a distributed event streaming platform capable of handling trillions of events per day. It's designed for high-throughput, fault-tolerant, real-time data pipelines and streaming applic
Looker
Looker is a modern business intelligence and data analytics platform that uses a unique modeling language (LookML) to define business logic once and reuse it everywhere. Now part of Google Cloud, Look
MCP
The Model Context Protocol (MCP) is an open protocol created by Anthropic that standardizes how AI applications connect to external data sources and tools. Think of it as USB-C for AI—a universal conn
n8n
n8n (pronounced "n-eight-n") is an open-source workflow automation tool that allows you to connect different services and automate tasks without writing code. It provides a visual, node-based interfac
Prefect
Prefect is a modern workflow orchestration platform that enables data engineers to build, run, and monitor data pipelines with Python. It's designed to handle everything from simple ETL jobs to comple
Snowflake
Snowflake is a cloud-native data warehouse platform built from the ground up for the cloud. Unlike traditional databases, Snowflake separates compute and storage, enabling unprecedented scalability, p
Tableau
Tableau is the market-leading business intelligence and data visualization platform that enables users to see and understand their data. Founded in 2003 and acquired by Salesforce in 2019, Tableau has
Popular Articles
Our most-read guides and tutorials.
Apache Airflow
Getting Started with Apache Airflow
This guide will take you from installing Airflow to running your first production-ready DAG.
dbt (data build tool)
Getting Started with dbt
This guide will walk you through setting up dbt and running your first transformation.
Want Structured Learning?
Check out our comprehensive courses for guided learning paths.