
Getting Started with Apache Airflow

This guide will take you from installing Airflow to running your first production-ready DAG.



Prerequisites

  • Python 3.8+ installed
  • pip package manager
  • Basic Python knowledge
  • 4GB+ RAM recommended
  • Linux/macOS (Windows requires WSL)

Quick Start (5 Minutes)

Option 1: Install with pip (Development)
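Assuming Airflow 2.x on Python 3.8+, a minimal pip-based setup looks like this. The constraints file pins dependency versions known to work together; `AIRFLOW_VERSION=2.9.3` is an example, so adjust it to the release you want:

```shell
export AIRFLOW_HOME=~/airflow

AIRFLOW_VERSION=2.9.3
PYTHON_VERSION="$(python -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
pip install "apache-airflow==${AIRFLOW_VERSION}" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"

# Initialize the metadata database and create the admin/admin login used below
airflow db migrate
airflow users create --username admin --password admin \
  --firstname Admin --lastname User --role Admin --email admin@example.com

# Start the scheduler and the webserver (use two terminals, or background one)
airflow scheduler &
airflow webserver --port 8080
```

On releases before 2.7, `airflow db init` replaces `airflow db migrate`. Airflow also ships an `airflow standalone` command that does all of the above in one process, but it generates a random admin password instead of admin/admin.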

Access UI: Open http://localhost:8080 (login: admin/admin)


Option 2: Docker Compose (Recommended)
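The official `docker-compose.yaml` brings up the webserver, scheduler, worker, metadata Postgres, and Redis together. A typical setup, with the version number again an example to match to your target release:

```shell
# Fetch the official compose file for the release you want
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.9.3/docker-compose.yaml'

# Create the mounted folders and set the host user id (needed on Linux)
mkdir -p ./dags ./logs ./plugins ./config
echo "AIRFLOW_UID=$(id -u)" > .env

# Run one-time database initialization, then start all services
docker compose up airflow-init
docker compose up -d
```

Note that the compose stack's default web login is airflow/airflow rather than the admin/admin used in the pip setup.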


Step 1: Understand Airflow Structure

After installation, your AIRFLOW_HOME directory looks like:
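Exact contents vary by Airflow version, but a fresh `AIRFLOW_HOME` typically contains something like:

```
airflow/
├── airflow.cfg          # main configuration file
├── airflow.db           # SQLite metadata DB (default, dev only)
├── dags/                # your DAG files go here (create it if missing)
├── logs/                # scheduler and task logs
└── webserver_config.py  # webserver / auth settings
```

If `dags/` does not exist yet, create it; its location is controlled by `dags_folder` in `airflow.cfg`.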


Step 2: Create Your First DAG

Create ~/airflow/dags/hello_world_dag.py:


Step 3: Verify Your DAG
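Before touching the UI, confirm the scheduler can parse the file. A few checks, assuming the file path from the previous step:

```shell
# The file should import cleanly on its own
python ~/airflow/dags/hello_world_dag.py

# The DAG should appear in the parsed list
airflow dags list | grep hello_world

# Any parse failures are recorded here
airflow dags list-import-errors
```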


Step 4: Run Your DAG

Via Web UI:

  1. Go to http://localhost:8080
  2. Find hello_world DAG
  3. Toggle it ON (unpause)
  4. Click Trigger DAG (play button)
  5. Click on DAG run to see task progress

Via CLI:
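The equivalent CLI sequence; `say_hello` is the illustrative task id assumed in this guide's example DAG:

```shell
# Unpause so the scheduler will pick it up
airflow dags unpause hello_world

# Kick off a manual run
airflow dags trigger hello_world

# Or run a single task in isolation: no scheduler needed, nothing written to the DB
airflow tasks test hello_world say_hello 2024-01-01
```

`airflow tasks test` is the fastest feedback loop while developing, since it ignores dependencies and retries.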


Step 5: Monitor Execution

Web UI Features:

DAG View:

  • Graph: Visual representation of task dependencies
  • Code: View DAG source code
  • Calendar: See historical runs
  • Task Duration: Performance over time

Task Instance View:

  • Logs: Detailed execution logs
  • XCom: Data passed between tasks
  • Rendered: See templated values
  • Details: Runtime info, duration, state

Check Task Logs:
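Logs are also plain files on disk. The layout below is the Airflow 2.3+ default and the run id will differ per run, so treat these paths as a sketch:

```shell
# One folder per DAG, run, task, and attempt
ls ~/airflow/logs/dag_id=hello_world/

# Example path for the first attempt of one task instance
cat ~/airflow/logs/dag_id=hello_world/run_id=manual__*/task_id=say_hello/attempt=1.log
```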


Step 6: Common DAG Patterns

Pattern 1: Extract-Transform-Load (ETL)

Pattern 2: Parallel Tasks

Pattern 3: Conditional Branching


Step 7: Working with Connections

Add Connection via UI:

  1. Go to Admin > Connections
  2. Click + to add new connection
  3. Fill in details (example for Snowflake):
    • Conn Id: snowflake_default
    • Conn Type: Snowflake
    • Host: xy12345.snowflakecomputing.com
    • Schema: public
    • Login: username
    • Password: password
    • Extra: {"account": "xy12345", "warehouse": "compute_wh", "database": "analytics"}

Add Connection via CLI:
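The same Snowflake connection as above, created from the command line:

```shell
airflow connections add snowflake_default \
    --conn-type snowflake \
    --conn-host xy12345.snowflakecomputing.com \
    --conn-schema public \
    --conn-login username \
    --conn-password password \
    --conn-extra '{"account": "xy12345", "warehouse": "compute_wh", "database": "analytics"}'
```

In production, prefer a secrets backend or the `AIRFLOW_CONN_*` environment-variable convention over storing passwords in the metadata DB.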

Use Connection in DAG:


Step 8: Essential Configuration

Edit ~/airflow/airflow.cfg:
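Section and option names shift a little between versions (for example, `sql_alchemy_conn` moved from `[core]` to `[database]` in 2.3), but the settings most worth touching first look like this:

```ini
[core]
# Folder the scheduler scans for DAG files
dags_folder = /home/user/airflow/dags
# Hide the bundled example DAGs
load_examples = False
# Max task instances running across the whole installation
parallelism = 32

[database]
# Use PostgreSQL instead of the default SQLite for anything beyond local dev
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow

[webserver]
web_server_port = 8080
```

Every option can also be overridden with an environment variable of the form `AIRFLOW__SECTION__OPTION`, which is usually how container deployments are configured.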


Step 9: Install Provider Packages

Provider packages extend Airflow with operators, hooks, and sensors for specific services:
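Providers install with pip; the package names below are a few common examples:

```shell
pip install apache-airflow-providers-snowflake
pip install apache-airflow-providers-amazon
pip install apache-airflow-providers-postgres

# See which providers are installed
airflow providers list
```

Install providers with the same constraints file you used for Airflow itself to keep versions compatible.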


Step 10: Essential Commands

DAG Management:
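Using the `hello_world` DAG from this guide as the example id:

```shell
airflow dags list                 # all parsed DAGs
airflow dags list-import-errors   # DAG files that failed to parse
airflow dags pause hello_world
airflow dags unpause hello_world
airflow dags trigger hello_world
airflow dags backfill hello_world -s 2024-01-01 -e 2024-01-07
```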

Task Testing:
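Again with the example DAG and its assumed `say_hello` task id:

```shell
airflow tasks list hello_world                       # tasks in a DAG
airflow tasks test hello_world say_hello 2024-01-01  # run one task, no DB record
airflow dags test hello_world 2024-01-01             # run the whole DAG in-process
```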

Database:
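Metadata-database maintenance commands (Airflow 2.7+ names; older releases use `airflow db init`/`upgrade`):

```shell
airflow db check     # verify connectivity
airflow db migrate   # create or upgrade the schema
airflow db clean --clean-before-timestamp '2024-01-01'   # prune old metadata
```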

Users:
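User management, with `jane` as a placeholder username:

```shell
airflow users list
airflow users create --username jane --password changeme \
    --firstname Jane --lastname Doe --role Viewer --email jane@example.com
airflow users delete --username jane
```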


Troubleshooting

Problem: DAG not appearing in UI

Solutions:
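The usual causes are a wrong `dags_folder`, an import error, or a scheduler that isn't running. These checks narrow it down, using this guide's example file path:

```shell
# Is the file in the folder the scheduler actually scans?
airflow config get-value core dags_folder

# Does the file import cleanly on its own?
python ~/airflow/dags/hello_world_dag.py

# Any recorded parse failures?
airflow dags list-import-errors
```

Also note that new files can take up to the scheduler's parsing interval (30 seconds by default) to appear.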

Problem: Tasks stuck in "queued" state

Solutions:
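Tasks only leave "queued" when a scheduler is running and the executor has a free slot. Two quick checks:

```shell
# Is a scheduler process alive? If not, start one.
ps aux | grep "[a]irflow scheduler"

# SequentialExecutor (the SQLite default) runs one task at a time;
# confirm which executor is configured
airflow config get-value core executor
```

If the executor and scheduler look healthy, check pool slots and the DAG's concurrency limits in the UI.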

Problem: "Broken DAG" error

Solutions:
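The "Broken DAG" banner in the UI shows a truncated traceback; reproducing the import locally gives the full one:

```shell
# Reproduce the import-error traceback locally
python ~/airflow/dags/hello_world_dag.py

# A ModuleNotFoundError usually means a provider or dependency is missing
# from the scheduler's environment, e.g. (placeholder package):
pip install apache-airflow-providers-postgres
```

Make sure the fix lands in the same Python environment the scheduler and workers actually use.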


Production Checklist

Before deploying to production:

  • Use PostgreSQL or MySQL (not SQLite)
  • Set load_examples = False
  • Configure email alerts for failures
  • Set up monitoring (Prometheus, StatsD)
  • Use secrets backend (AWS Secrets Manager, Vault)
  • Enable RBAC for access control
  • Set appropriate parallelism and concurrency
  • Use CeleryExecutor or KubernetesExecutor
  • Set up log persistence (S3, GCS, Azure Blob)
  • Configure backups for metadata database
  • Implement CI/CD for DAG deployment
  • Set catchup=False for most DAGs
  • Add comprehensive logging in tasks
  • Test backfilling behavior

Next Steps

Now that you have Airflow running:

  1. Explore Use Cases - Real-world pipeline examples
  2. Learn Best Practices - Production patterns
  3. Try Tutorials - Hands-on projects
  4. Check Resources - Community and tools

Need Help?


Ready for more? Continue to Airflow Use Cases to see real-world implementations.
