ETL is an automated process that extracts raw data, transforms it into the information required for analysis, and loads it into a data warehouse. There are different ways to build an ETL pipeline; in this post we’ll be using three main tools:

  • Airflow: one of the most powerful platforms used by Data Engineers to orchestrate workflows.
  • AWS ECS/Fargate: a container management service that makes it easy to run, stop, and manage containers without provisioning servers yourself.
  • AWS S3: AWS’s Simple Storage Service, an object store for the pipeline’s data.

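To make the three steps concrete, here is a minimal sketch of the kind of job the ETL container might run. The source URL, bucket, key, and column names are placeholders for illustration, not values from the original post:

```python
import csv
import io

import boto3
import requests

# Placeholder source and destination -- substitute your own.
SOURCE_URL = "https://example.com/raw_data.csv"
BUCKET = "my-etl-bucket"

def extract() -> str:
    """Extract: download the raw CSV as text."""
    response = requests.get(SOURCE_URL, timeout=30)
    response.raise_for_status()
    return response.text

def transform(raw: str) -> str:
    """Transform: keep only the columns needed for analysis
    (the field names here are hypothetical)."""
    reader = csv.DictReader(io.StringIO(raw))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["id", "value"], extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

def load(clean: str) -> None:
    """Load: write the transformed file to S3."""
    s3 = boto3.client("s3")
    s3.put_object(Bucket=BUCKET, Key="clean/data.csv", Body=clean.encode("utf-8"))

if __name__ == "__main__":
    load(transform(extract()))
```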
The architecture that we will be building follows the schema below:

Architecture schema using AWS, Docker, and Airflow (designed with Lucid.app).
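As a sketch of how Airflow ties the pieces together, the DAG below launches the containerized ETL job on Fargate using the Amazon provider package’s EcsRunTaskOperator. Every AWS resource name here (cluster, task definition, container, subnet) is a placeholder, and operator and parameter names vary across Airflow/provider versions (older provider releases expose ECSOperator instead):

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.ecs import EcsRunTaskOperator

# All AWS resource names below are placeholders -- substitute the
# cluster, task definition, container, and subnets from your account.
with DAG(
    dag_id="etl_on_fargate",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_etl = EcsRunTaskOperator(
        task_id="run_etl_container",
        cluster="etl-cluster",
        task_definition="etl-task",
        launch_type="FARGATE",
        overrides={
            "containerOverrides": [
                {"name": "etl-container", "command": ["python", "etl.py"]}
            ]
        },
        network_configuration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-012345"],
                "assignPublicIp": "ENABLED",
            }
        },
    )
```

Because the heavy lifting happens inside the container, the Airflow worker only needs AWS credentials and network access to the ECS API, which keeps the scheduler lightweight.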

You can read the full blog post on my Medium at Towards Data Science: Building an ETL pipeline with Airflow and ECS