# Getting Started with Apache Airflow

This guide will take you from installing Airflow to running your first production-ready DAG.
## Prerequisites
- Python 3.8+ installed
- pip package manager
- Basic Python knowledge
- 4GB+ RAM recommended
- Linux/macOS (Windows requires WSL)
## Quick Start (5 Minutes)

### Option 1: Install with pip (Development)
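A typical local install looks like the following; the pinned version (2.8.1 here) and the constraints URL are illustrative, so substitute the release you actually want:

```shell
# Set AIRFLOW_HOME before installing (defaults to ~/airflow)
export AIRFLOW_HOME=~/airflow

# Install with the official constraints file to get a tested dependency set
AIRFLOW_VERSION=2.8.1
PYTHON_VERSION="$(python -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
pip install "apache-airflow==${AIRFLOW_VERSION}" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"

# Run the DB migration, admin user creation, webserver, and scheduler in one process
airflow standalone
```

Note that `airflow standalone` prints a generated admin password on first run; if you want the `admin/admin` login, create that user yourself with `airflow users create`.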
Access UI: Open http://localhost:8080 (login: admin/admin)
### Option 2: Docker Compose (Recommended)
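The steps below follow the official Docker Compose setup from the Airflow documentation; the version in the URL is illustrative:

```shell
# Fetch the official docker-compose.yaml for the release you want
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.8.1/docker-compose.yaml'

# Create the mounted directories and record the host user id
mkdir -p ./dags ./logs ./plugins ./config
echo "AIRFLOW_UID=$(id -u)" > .env

# Initialize the database and create the default airflow/airflow user
docker compose up airflow-init

# Start all services in the background
docker compose up -d
```

With this setup the web UI login defaults to `airflow`/`airflow` rather than `admin`/`admin`.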
## Step 1: Understand Airflow Structure
After installation, your AIRFLOW_HOME directory looks like:
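The exact contents vary by version, but a typical development layout is:

```text
~/airflow/
├── airflow.cfg          # main configuration file
├── airflow.db           # SQLite metadata DB (default dev setup)
├── dags/                # put your DAG files here
├── logs/                # task and scheduler logs
└── webserver_config.py  # web UI configuration
```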
## Step 2: Create Your First DAG

Create `~/airflow/dags/hello_world_dag.py`:
## Step 3: Verify Your DAG
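A quick way to check that the scheduler can parse and see your DAG (file path assumes the layout above):

```shell
# Check the file parses without raising
python ~/airflow/dags/hello_world_dag.py

# Confirm the scheduler has picked it up
airflow dags list | grep hello_world

# Show any import errors hit while parsing the dags folder
airflow dags list-import-errors
```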
## Step 4: Run Your DAG

### Via Web UI
- Go to http://localhost:8080
- Find the `hello_world` DAG
- Toggle it ON (unpause)
- Click Trigger DAG (the play button)
- Click on the DAG run to see task progress
### Via CLI
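The same actions from the command line (DAG id matches the example above):

```shell
# Unpause the DAG so the scheduler will run it
airflow dags unpause hello_world

# Trigger a manual run
airflow dags trigger hello_world

# Watch the state of recent runs
airflow dags list-runs -d hello_world
```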
## Step 5: Monitor Execution

### Web UI Features

**DAG View:**
- Graph: Visual representation of task dependencies
- Code: View DAG source code
- Calendar: See historical runs
- Task Duration: Performance over time
**Task Instance View:**
- Logs: Detailed execution logs
- XCom: Data passed between tasks
- Rendered: See templated values
- Details: Runtime info, duration, state
### Check Task Logs
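Task logs also land on disk under `$AIRFLOW_HOME/logs`, keyed by DAG, run, and task (this directory layout applies to Airflow 2.3+; the run id below is illustrative):

```shell
# Browse logs for one DAG
ls ~/airflow/logs/dag_id=hello_world/

# Read the first attempt of a specific task instance
cat ~/airflow/logs/dag_id=hello_world/run_id=manual__*/task_id=say_hello/attempt=1.log
```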
## Step 6: Common DAG Patterns
### Pattern 1: Extract-Transform-Load (ETL)
### Pattern 2: Parallel Tasks
### Pattern 3: Conditional Branching
## Step 7: Working with Connections
### Add Connection via UI

- Go to Admin > Connections
- Click + to add a new connection
- Fill in the details (example for Snowflake):
  - Conn Id: `snowflake_default`
  - Conn Type: `Snowflake`
  - Host: `xy12345.snowflakecomputing.com`
  - Schema: `public`
  - Login: `username`
  - Password: `password`
  - Extra: `{"account": "xy12345", "warehouse": "compute_wh", "database": "analytics"}`
### Add Connection via CLI
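The same Snowflake connection can be created from the command line (values mirror the UI example above):

```shell
airflow connections add snowflake_default \
    --conn-type snowflake \
    --conn-host xy12345.snowflakecomputing.com \
    --conn-schema public \
    --conn-login username \
    --conn-password password \
    --conn-extra '{"account": "xy12345", "warehouse": "compute_wh", "database": "analytics"}'

# Verify it was stored
airflow connections get snowflake_default
```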
### Use Connection in a DAG
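A sketch assuming the `apache-airflow-providers-snowflake` package is installed; the DAG id, table, and query are illustrative:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

with DAG(
    dag_id="snowflake_example",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_query = SnowflakeOperator(
        task_id="run_query",
        snowflake_conn_id="snowflake_default",  # the connection created above
        sql="SELECT COUNT(*) FROM analytics.public.orders",  # illustrative query
    )
```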
## Step 8: Essential Configuration

Edit `~/airflow/airflow.cfg`:
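A few settings worth changing early; section names follow Airflow 2.x and the values are illustrative:

```ini
[core]
dags_folder = /home/user/airflow/dags
executor = LocalExecutor
load_examples = False
parallelism = 32

[database]
# Moved here from [core] in Airflow 2.3; use a real database outside development
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow

[webserver]
web_server_port = 8080
```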
## Step 9: Install Provider Packages
Airflow providers add operators for specific services:
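Install only the providers you need; each adds hooks and operators for one service:

```shell
pip install apache-airflow-providers-postgres
pip install apache-airflow-providers-amazon
pip install apache-airflow-providers-google
pip install apache-airflow-providers-snowflake

# List the providers registered in your environment
airflow providers list
```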
## Step 10: Essential Commands
### DAG Management
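The day-to-day DAG commands (the `hello_world` id and dates are illustrative):

```shell
airflow dags list                     # all DAGs the scheduler knows about
airflow dags pause hello_world        # stop scheduling new runs
airflow dags unpause hello_world
airflow dags trigger hello_world      # start a manual run
airflow dags backfill hello_world -s 2024-01-01 -e 2024-01-07   # rerun a date range
airflow dags delete hello_world       # remove DAG metadata from the DB
```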
### Task Testing
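These commands run tasks in isolation without touching scheduler state, which makes them the fastest debugging loop:

```shell
# Run a single task for one logical date, without recording state in the DB
airflow tasks test hello_world say_hello 2024-01-01

# Dry-run an entire DAG for one logical date
airflow dags test hello_world 2024-01-01

# List the tasks in a DAG
airflow tasks list hello_world
```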
### Database
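Common metadata-database commands (the cleanup timestamp is illustrative):

```shell
airflow db init    # create tables on first install (use `airflow db migrate` on 2.7+)
airflow db check   # verify the metadata DB is reachable
airflow db clean --clean-before-timestamp '2024-01-01'   # purge old records (2.3+)
```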
### Users
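User management from the CLI (the names and email below are placeholders):

```shell
airflow users create \
    --username admin \
    --password admin \
    --firstname Ada \
    --lastname Lovelace \
    --role Admin \
    --email admin@example.com

airflow users list
```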
## Troubleshooting

### Problem: DAG not appearing in UI

Solutions:

- Confirm the file is inside the `dags_folder` configured in airflow.cfg
- Run `airflow dags list-import-errors` to surface parsing failures
- Wait for the scheduler's next scan of the dags folder, or restart the scheduler
- Make sure the file actually defines a `DAG` object at module level

### Problem: Tasks stuck in "queued" state

Solutions:

- Check that a scheduler is running (`airflow scheduler`) and, for CeleryExecutor, that workers are up
- Inspect pool and concurrency limits (Admin > Pools, `parallelism`, `max_active_tasks_per_dag`)
- Look at the scheduler logs for errors while claiming task slots

### Problem: "Broken DAG" error

Solutions:

- Run `python path/to/your_dag.py` to reproduce the import error locally
- Install any missing packages the DAG imports (often a forgotten provider)
- Read the full traceback in the UI banner or via `airflow dags list-import-errors`
## Production Checklist

Before deploying to production:

- Use PostgreSQL or MySQL (not SQLite)
- Set `load_examples = False`
- Configure email alerts for failures
- Set up monitoring (Prometheus, StatsD)
- Use a secrets backend (AWS Secrets Manager, Vault)
- Enable RBAC for access control
- Set appropriate parallelism and concurrency
- Use CeleryExecutor or KubernetesExecutor
- Set up log persistence (S3, GCS, Azure Blob)
- Configure backups for the metadata database
- Implement CI/CD for DAG deployment
- Set `catchup=False` for most DAGs
- Add comprehensive logging in tasks
- Test backfilling behavior
## Next Steps
Now that you have Airflow running:
- Explore Use Cases - Real-world pipeline examples
- Learn Best Practices - Production patterns
- Try Tutorials - Hands-on projects
- Check Resources - Community and tools
## Need Help?
- Questions? Ask in Apache Airflow Slack
- Want expert guidance? Book a consultation
- Team training? Custom workshops available
Ready for more? Continue to Airflow Use Cases to see real-world implementations.