Getting Started with dlt
This guide will help you install dlt, run your first pipeline, and understand the core workflows.
Installation
Prerequisites
- Python 3.8 or higher
- pip (Python package installer)
Install dlt
Basic Installation:
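The base package installs from PyPI:

```shell
pip install dlt
```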
With Specific Destination:
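Destination extras pull in the matching driver, for example DuckDB:

```shell
pip install "dlt[duckdb]"
```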
Verify Installation:
Your First Pipeline
Simple Example: Load Data to DuckDB
Step 1: Create a Python file (my_first_pipeline.py):
```python
import dlt

# Sample rows to load (any iterable of dicts works)
data = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
]

# Create pipeline
pipeline = dlt.pipeline(pipeline_name="my_first_pipeline", destination="duckdb", dataset_name="test_data")

# Run pipeline
load_info = pipeline.run(data, table_name="users")

# Print results
print(load_info)
```
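Step 2: Run the file:

```shell
python my_first_pipeline.py
```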
Output: `print(load_info)` prints a short summary of the load: the pipeline name, the destination, the dataset, and the status of each load package.
Step 3: Query the data:
Or use DuckDB directly:
Loading from an API
Example: GitHub API
Using Verified Sources
dlt provides 50+ pre-built sources you can use immediately.
Initialize a Verified Source
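`dlt init` scaffolds a source and destination pair into the current directory; the `github` source is used here as an example:

```shell
dlt init github duckdb
```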
This creates a folder with the source's code (e.g. github/), a starter pipeline script, a .dlt/ directory containing config.toml and secrets.toml, and a requirements.txt listing the source's dependencies.
Configure Credentials
Edit .dlt/secrets.toml:
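The key names depend on the source you initialized; for the github example above the file might look like:

```toml
[sources.github]
access_token = "ghp_..."
```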
Or use environment variables:
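dlt maps secrets to environment variables by uppercasing the keys and joining sections with double underscores:

```shell
export SOURCES__GITHUB__ACCESS_TOKEN="ghp_..."
```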
Run the Verified Source
Run:
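Install the source's dependencies, then run the scaffolded script (named after the source, e.g. github_pipeline.py here):

```shell
pip install -r requirements.txt
python github_pipeline.py
```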
Configuration
Project Structure
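A project laid out by `dlt init` looks roughly like this (names follow the github example above):

```text
.
├── .dlt/
│   ├── config.toml      # non-sensitive settings
│   └── secrets.toml     # credentials (add to .gitignore)
├── github/              # verified source code
├── github_pipeline.py   # starter pipeline script
└── requirements.txt
```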
Config Files
.dlt/config.toml (Non-sensitive):
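A minimal example; any non-secret setting can live here:

```toml
[runtime]
log_level = "WARNING"
```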
.dlt/secrets.toml (Sensitive - add to .gitignore):
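A sketch with one source credential and one destination credential (section and key names depend on what you use):

```toml
[sources.github]
access_token = "ghp_..."

[destination.postgres.credentials]
database = "analytics"
username = "loader"
password = "..."
host = "localhost"
port = 5432
```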
Environment Variables
Alternative to config files:
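Any config or secret key can be supplied as an environment variable using the `SECTION__SUBSECTION__KEY` convention:

```shell
export DESTINATION__POSTGRES__CREDENTIALS__PASSWORD="..."
export RUNTIME__LOG_LEVEL="INFO"
```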
Loading to Different Destinations
DuckDB (Local)
BigQuery
Setup:
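Install the BigQuery extra:

```shell
pip install "dlt[bigquery]"
```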
.dlt/secrets.toml:
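Service-account credentials go under the destination's credentials section (values shown are placeholders):

```toml
[destination.bigquery]
location = "US"

[destination.bigquery.credentials]
project_id = "my-gcp-project"
private_key = "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
client_email = "loader@my-gcp-project.iam.gserviceaccount.com"
```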
Pipeline:
Snowflake
Setup:
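Install the Snowflake extra:

```shell
pip install "dlt[snowflake]"
```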
.dlt/secrets.toml:
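Snowflake credentials follow the same pattern (values are placeholders; `host` is your account identifier):

```toml
[destination.snowflake.credentials]
database = "ANALYTICS"
username = "LOADER"
password = "..."
host = "xy12345.us-east-1"
warehouse = "COMPUTE_WH"
role = "LOADER_ROLE"
```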
Pipeline:
PostgreSQL
Setup:
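Install the Postgres extra; connection details then go under `[destination.postgres.credentials]` in `.dlt/secrets.toml`:

```shell
pip install "dlt[postgres]"
```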
Pipeline:
Incremental Loading
Tracking New Records
Tracking Updated Records
Handling Schema Evolution
Automatic Schema Evolution
Controlling Schema
Error Handling and Monitoring
Check Load Results
Logging
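Log verbosity is a runtime setting, so it can come from `config.toml` (`[runtime] log_level`) or an environment variable:

```shell
export RUNTIME__LOG_LEVEL="DEBUG"
python my_first_pipeline.py
```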
Retry Logic
Working with DataFrames
Querying Loaded Data
Using SQL Client
Using Destination-Specific Tools
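For DuckDB, the database file can be opened with the duckdb CLI; warehouses like BigQuery or Snowflake have their own consoles:

```shell
duckdb my_first_pipeline.duckdb -c "SELECT * FROM test_data.users;"
```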
Common Patterns
Loading Multiple Tables
Pagination
Transforming Data
Troubleshooting
Common Issues
Issue: Module not found. dlt (or a destination extra) is not installed in the active environment; reinstall with `pip install dlt` (or e.g. `pip install "dlt[duckdb]"`) and confirm the script runs under the same interpreter or virtualenv.
Issue: Credentials not found. Check that `.dlt/secrets.toml` sits next to the script you run (dlt resolves `.dlt/` relative to the working directory), that section and key names match what the source or destination expects, and that any environment variables are uppercase with `__` separators.
Issue: Schema evolution errors. A field changed to an incompatible data type, or a schema contract set to `freeze` is rejecting new columns; inspect the stored schema and either relax the contract, add explicit column hints, or fix the source data.
Issue: Pipeline runs but no data loaded. The resource may have yielded nothing, often because an incremental cursor is already past all available records; print `load_info` and `pipeline.last_trace` to check row counts, and reset the pipeline's state if you need a full reload.
Debug Mode
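The CLI can inspect the last run of a named pipeline, and debug logging can be enabled per invocation:

```shell
# Show the trace of the last run: timings, jobs, failures
dlt pipeline my_first_pipeline trace

# Re-run with full log output
RUNTIME__LOG_LEVEL="DEBUG" python my_first_pipeline.py
```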
Next Steps
Now that you've created your first pipelines:
- Explore Verified Sources: Try `dlt init --list-sources`
- Add Incremental Loading: Reduce data transfer
- Integrate with dbt: Transform loaded data
- Set up Orchestration: Use with Airflow/Dagster
- Deploy to Production: See Best Practices
Recommended Learning Path
- Load data from simple API
- Add incremental loading
- Load to production warehouse (BigQuery/Snowflake)
- Integrate with dbt for transformations
- Orchestrate with Airflow
- Monitor and optimize
Ready for production? Check out:
- Use Cases - Real-world scenarios
- Best Practices - Production patterns
- Tutorials - Hands-on projects