Tutorial 1: Build a Complete ETL Pipeline with Prefect
In this tutorial, you'll build a production-ready ETL pipeline that extracts data from a REST API, transforms it, and loads it into a database. You'll learn core Prefect concepts including flows, tasks, error handling, caching, and deployment.
Time: 60-90 minutes
Level: Beginner
Prerequisites: Python 3.8+, basic SQL knowledge
What You'll Build
A daily ETL pipeline that:
- Extracts user data from JSONPlaceholder API
- Transforms and enriches the data
- Loads data to SQLite database
- Includes error handling and retries
- Implements caching for efficiency
- Runs on a schedule
Tech Stack:
- Prefect 2.x
- Pandas for data manipulation
- SQLite for storage
- Requests for API calls
Step 1: Project Setup
Create Project Directory
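The directory name below is just a suggestion:

```shell
mkdir prefect-etl-tutorial
cd prefect-etl-tutorial
```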
Create Virtual Environment
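Use a virtual environment so the tutorial's dependencies stay isolated:

```shell
python -m venv .venv
source .venv/bin/activate   # on Windows: .venv\Scripts\activate
```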
Install Dependencies
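Install the tutorial's stack (pinning Prefect to the 2.x line, since that is what this tutorial targets):

```shell
pip install "prefect>=2.0,<3" pandas requests
```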
Create Project Files
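Create the empty files you'll fill in during the following steps:

```shell
touch etl_pipeline.py validation.py test_pipeline.py
```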
Step 2: Configure Prefect
Option A: Use Prefect Cloud (Recommended)
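If you have a Prefect Cloud account, authenticate the CLI:

```shell
prefect cloud login
```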
Follow the prompts to authenticate.
Option B: Use Local Server
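Alternatively, run the open-source server locally and point the client at it:

```shell
prefect server start
# in a second terminal, point the client at the local API:
prefect config set PREFECT_API_URL=http://127.0.0.1:4200/api
```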
Step 3: Build the Extract Task
Create etl_pipeline.py:
Step 4: Build the Transform Task
Add to etl_pipeline.py:
Step 5: Build the Load Task
Add to etl_pipeline.py:
Step 6: Create the Main Flow
Add to etl_pipeline.py:
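A sketch of the flow, assuming the tasks from Steps 3-5 are defined above it in the same file:

```python
# etl_pipeline.py -- Step 6: wire the tasks into a flow.
from prefect import flow

@flow(name="etl-pipeline", log_prints=True)  # log_prints routes print() into Prefect logs
def etl_flow(db_path: str = "users.db") -> int:
    raw = extract_users()
    df = transform_users(raw)
    return load_users(df, db_path=db_path)

if __name__ == "__main__":
    etl_flow()
```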
Step 7: Test the Pipeline
Run your pipeline locally:
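From the project directory, with your virtual environment active:

```shell
python etl_pipeline.py
```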
Expected output: Prefect prints a log line as each task run is created and another as it finishes in a `Completed` state, ending with the flow run itself completing.
Verify Data in Database
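Query the database with the `sqlite3` CLI (the table and column names match the assumptions made in the load and transform steps above):

```shell
sqlite3 users.db "SELECT COUNT(*) FROM users;"
sqlite3 users.db "SELECT id, name, email_domain FROM users LIMIT 5;"
```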
Step 8: Add Data Validation
Create validation.py:
Add validation to your flow:
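The flow from Step 6 can then be updated so the run fails fast before anything is written (a sketch; it assumes the task names used earlier):

```python
# etl_pipeline.py -- Step 8: insert validation between transform and load.
from prefect import flow
from validation import validate_users

@flow(name="etl-pipeline", log_prints=True)
def etl_flow(db_path: str = "users.db") -> int:
    raw = extract_users()
    df = transform_users(raw)
    df = validate_users(df)   # raises ValidationError before load_users runs
    return load_users(df, db_path=db_path)
```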
Step 9: Create a Deployment
Create Deployment Configuration
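One way to generate a deployment spec with the Prefect 2.x CLI; flag names can vary slightly across 2.x minor versions (check `prefect deployment build --help`), and the pool name `etl-pool` and cron schedule are assumptions:

```shell
prefect deployment build ./etl_pipeline.py:etl_flow \
  --name daily-etl \
  --pool etl-pool \
  --cron "0 6 * * *" \
  --output deployment.yaml
```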
Edit deployment.yaml
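Open the generated file and confirm (or adjust) the schedule and pool; the exact field layout differs slightly between Prefect 2.x versions, so treat this excerpt as a guide:

```yaml
# deployment.yaml (excerpt)
name: daily-etl
schedule:
  cron: 0 6 * * *
  timezone: UTC
work_pool_name: etl-pool
```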
Apply Deployment
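Register the deployment with your Prefect API:

```shell
prefect deployment apply deployment.yaml
```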
Step 10: Start a Worker and Run
Start Worker
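Assuming the `etl-pool` name from the deployment step: create the work pool once, then start a worker that polls it:

```shell
prefect work-pool create etl-pool --type process
prefect worker start --pool etl-pool
```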
Trigger a Run
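Kick off an ad-hoc run from the CLI, using the flow and deployment names from the earlier steps:

```shell
prefect deployment run 'etl-pipeline/daily-etl'
```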
Monitor in UI
Visit the Prefect UI to see:
- Flow run status
- Task execution timeline
- Logs from each task
- Run duration and performance
Step 11: Add Notifications
Create a notification for failures:
Step 12: Testing
Create test_pipeline.py:
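A sketch of a test module; it assumes the task names defined in the earlier steps. Calling `task.fn(...)` runs the underlying function directly, without the Prefect engine:

```python
# test_pipeline.py -- unit tests against the raw task functions.
from etl_pipeline import transform_users, load_users

SAMPLE = [{
    "id": 1, "name": "Leanne Graham", "username": "Bret",
    "email": "Sincere@april.biz",
    "address": {"street": "Kulas Light", "city": "Gwenborough"},
    "company": {"name": "Romaguera-Crona"},
}]

def test_transform_flattens_and_derives_domain():
    df = transform_users.fn(SAMPLE)
    assert df.loc[0, "city"] == "Gwenborough"
    assert df.loc[0, "email_domain"] == "april.biz"

def test_load_writes_rows(tmp_path):
    df = transform_users.fn(SAMPLE)
    assert load_users.fn(df, db_path=str(tmp_path / "test.db")) == 1
```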
Run tests:
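With pytest installed in your environment:

```shell
pip install pytest
pytest test_pipeline.py -v
```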
Exercises
Exercise 1: Add Incremental Loading
Modify the pipeline to only load new/updated records based on a timestamp.
Hint:
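One possible shape for the hint, assuming the `loaded_at` column from Step 4 and the `users` table from Step 5:

```python
# Incremental-loading sketch: keep only rows newer than what's already loaded.
import sqlite3

import pandas as pd

def incremental_filter(df: pd.DataFrame, db_path: str = "users.db") -> pd.DataFrame:
    """Drop rows at or before the latest loaded_at already in the table."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute("SELECT MAX(loaded_at) FROM users").fetchone()
    except sqlite3.OperationalError:
        return df          # table doesn't exist yet: first run loads everything
    finally:
        conn.close()
    if row[0] is None:     # table exists but is empty
        return df
    # ISO-8601 timestamps compare correctly as strings
    return df[df["loaded_at"] > row[0]]
```

Pair this with `if_exists="append"` (instead of `"replace"`) in the load task so earlier rows survive.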
Exercise 2: Add More Data Sources
Extract and merge data from the /comments endpoint.
Exercise 3: Implement Error Notifications
Send an email or Slack message when the pipeline fails.
Exercise 4: Add Data Profiling
Generate statistics about the data (row counts, null percentages, etc.).
Common Issues & Solutions
Issue: "Cannot connect to Prefect API"
Solution:
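Check which API the client is pointed at, then start or re-point as needed:

```shell
prefect config view
# if the API URL is missing or wrong, either start a local server...
prefect server start
# ...or point the client at a running one:
prefect config set PREFECT_API_URL=http://127.0.0.1:4200/api
```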
Issue: "Task cached but data changed"
Solution: Clear the cache or change the cache key function:
Issue: "Worker not picking up runs"
Solution:
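Confirm the pool exists, that the deployment targets the same pool, and that a worker is actually polling it (names assume the earlier steps):

```shell
prefect work-pool ls
prefect deployment inspect 'etl-pipeline/daily-etl'
prefect worker start --pool etl-pool
```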
Next Steps
Congratulations! You've built a production-ready ETL pipeline with Prefect.
What you learned:
- ✅ Creating flows and tasks
- ✅ Error handling and retries
- ✅ Caching for efficiency
- ✅ Deploying and scheduling
- ✅ Monitoring and logging
- ✅ Data validation
Continue learning:
- Tutorial 2: ML Model Training Pipeline
- Best Practices - Production patterns
- Use Cases - More real-world examples
Complete Code
Find the complete code for this tutorial on GitHub: prefect-tutorials