Apache Airflow + CeleryExecutor + PostgreSQL + Redis: start a fully operational environment with Docker Compose in 5 minutes!


In this post I will show you how to create, in 5 minutes, a fully operational environment consisting of Apache Airflow with the CeleryExecutor, PostgreSQL, and Redis.

Introduction

Apache Airflow

Apache Airflow is a platform to programmatically author, schedule, and monitor workflows. It is an open-source tool that was developed by Airbnb in 2014 and has since become one of the most popular workflow management tools in the industry.

Airflow is often used to automate and manage ETL (Extract, Transform, Load) pipelines, as well as machine learning workflows. It provides a simple, yet powerful, way to define and orchestrate workflows as directed acyclic graphs (DAGs) of tasks.

Here are some key features of Apache Airflow:

  • Task orchestration: Airflow allows you to define tasks and dependencies between them, so that you can build complex workflows by combining multiple tasks together.
  • Scheduling: Airflow has a built-in scheduler that can run tasks on a predefined schedule or in response to certain triggers.
  • Monitoring: Airflow provides a web interface that allows you to monitor the status of your workflows and tasks, as well as view logs and error messages.
  • Extensibility: Airflow is highly extensible and provides a rich set of APIs that allow you to integrate it with other tools and platforms.

Overall, Apache Airflow is a powerful tool for automating and managing complex workflows. It can help you improve the reliability and efficiency of your data pipelines and machine learning workflows, and make it easier to maintain and evolve your workflow over time.

CeleryExecutor

The CeleryExecutor is an executor in Apache Airflow that allows you to scale out the execution of tasks by using a distributed task queue. It is based on the Celery distributed task queue system, which is a popular open-source task queue system written in Python.

The CeleryExecutor works by publishing tasks to a message broker (such as RabbitMQ or Redis), from which worker processes pick them up and execute them. This allows you to scale task execution horizontally by adding more worker processes.

To use the CeleryExecutor, you will need to set up a message broker and configure Airflow to use it. You will also need to start worker processes that can pick up tasks from the message broker.

One advantage of the CeleryExecutor is that it allows you to scale out the execution of tasks horizontally, which can be useful if you have a large number of tasks or if your tasks are resource-intensive. However, it does require additional setup and infrastructure compared to the default SequentialExecutor.
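Concretely, wiring Airflow to the broker and the result backend comes down to a handful of settings. Below is an illustrative sketch of the relevant airflow.cfg entries, assuming the Redis broker and Postgres database are reachable under the hostnames redis and postgres (as in the compose file later in this post); the puckel image used below derives equivalent settings from its EXECUTOR and connection environment variables, so you do not edit this file by hand there:

```ini
[core]
# Switch from the default executor to Celery
executor = CeleryExecutor
# Metadata database (Postgres service from the compose file)
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@postgres:5432/airflow

[celery]
# Redis service from the compose file acts as the message broker
broker_url = redis://redis:6379/1
# Task results are stored back in the Postgres database
result_backend = db+postgresql://airflow:airflow@postgres:5432/airflow
```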

Postgres

PostgreSQL, often simply called Postgres, is a powerful, open-source object-relational database management system (ORDBMS). It is designed to handle a wide range of workloads, from small single-machine applications to large internet-facing applications with many concurrent users.

Postgres is known for its reliability, robustness, and performance, and it has a strong reputation for being a database that is easy to use and maintain. It supports a wide range of data types and features, including support for JSON and array data types, full-text search, and database transactions.

Postgres is also highly extensible, with a rich set of APIs and tools for building custom applications and integrations. It is widely used in a variety of industries, including finance, healthcare, e-commerce, and government.

In Apache Airflow, Postgres is commonly used as the metadata database that stores information about DAGs, task instances, and other state (the out-of-the-box default is SQLite, which is not suitable for the CeleryExecutor). Airflow also supports other databases, such as MySQL, via SQLAlchemy.

Redis

Redis (Remote Dictionary Server) is an open-source in-memory data store that is often used as a cache, message broker, or database. It is known for its high performance and scalability, and it is widely used in a variety of applications and systems.

Redis stores data in-memory, which makes it extremely fast, but it can also write data to disk for persistence. It supports a wide range of data types, including strings, hashes, lists, and sets, and it provides a rich set of commands for manipulating and querying data.

One of the main advantages of Redis is its ability to scale horizontally by adding more machines to the cluster. It also supports primary-replica replication, which allows you to create read-only replicas of the data for improved performance and reliability.

In Apache Airflow, Redis can be used as the message broker for the CeleryExecutor, which allows you to scale out the execution of tasks using a distributed task queue. It is also possible to use other message brokers with Airflow, such as RabbitMQ.

Apache Airflow CeleryExecutor PostgreSQL Redis

Create the docker-compose.yml Script

Create the docker-compose.yml file and paste in the script below. Then run the docker-compose up -d command.

(The script below comes from the Puckel docker-airflow project.)

version: '2.1'
services:
    redis:
        image: 'redis:5.0.5'
        # command: redis-server --requirepass redispass

    postgres:
        image: postgres:9.6
        environment:
            - POSTGRES_USER=airflow
            - POSTGRES_PASSWORD=airflow
            - POSTGRES_DB=airflow
        # Uncomment these lines to persist data on the local filesystem.
        #     - PGDATA=/var/lib/postgresql/data/pgdata
        # volumes:
        #     - ./pgdata:/var/lib/postgresql/data/pgdata

    webserver:
        image: puckel/docker-airflow:1.10.4
        restart: always
        depends_on:
            - postgres
            - redis
        environment:
            - LOAD_EX=n
            - FERNET_KEY=46BKJoQYlPPOexq0OhDZnIlNepKFf87WFwLbfzqDDho=
            - EXECUTOR=Celery
            # - POSTGRES_USER=airflow
            # - POSTGRES_PASSWORD=airflow
            # - POSTGRES_DB=airflow
            # - REDIS_PASSWORD=redispass
        volumes:
            - ./dags:/usr/local/airflow/dags
            # Uncomment to include custom plugins
            # - ./plugins:/usr/local/airflow/plugins
        ports:
            - "8080:8080"
        command: webserver
        healthcheck:
            test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
            interval: 30s
            timeout: 30s
            retries: 3

    flower:
        image: puckel/docker-airflow:1.10.4
        restart: always
        depends_on:
            - redis
        environment:
            - EXECUTOR=Celery
            # - REDIS_PASSWORD=redispass
        ports:
            - "5555:5555"
        command: flower

    scheduler:
        image: puckel/docker-airflow:1.10.4
        restart: always
        depends_on:
            - webserver
        volumes:
            - ./dags:/usr/local/airflow/dags
            # Uncomment to include custom plugins
            # - ./plugins:/usr/local/airflow/plugins
        environment:
            - LOAD_EX=n
            - FERNET_KEY=46BKJoQYlPPOexq0OhDZnIlNepKFf87WFwLbfzqDDho=
            - EXECUTOR=Celery
            # - POSTGRES_USER=airflow
            # - POSTGRES_PASSWORD=airflow
            # - POSTGRES_DB=airflow
            # - REDIS_PASSWORD=redispass
        command: scheduler

    worker:
        image: puckel/docker-airflow:1.10.4
        restart: always
        depends_on:
            - scheduler
        volumes:
            - ./dags:/usr/local/airflow/dags
            # Uncomment to include custom plugins
            # - ./plugins:/usr/local/airflow/plugins
        environment:
            - FERNET_KEY=46BKJoQYlPPOexq0OhDZnIlNepKFf87WFwLbfzqDDho=
            - EXECUTOR=Celery
            # - POSTGRES_USER=airflow
            # - POSTGRES_PASSWORD=airflow
            # - POSTGRES_DB=airflow
            # - REDIS_PASSWORD=redispass
        command: worker

Check The Container Statuses

Before opening the user interfaces, check that all containers are in the “Up” state. To do this, use the command:

docker-compose ps
           Name                         Command                  State                         Ports                   
-----------------------------------------------------------------------------------------------------------------------
airflow-docker_flower_1      /entrypoint.sh flower            Up             0.0.0.0:5555->5555/tcp, 8080/tcp, 8793/tcp
airflow-docker_postgres_1    docker-entrypoint.sh postgres    Up             5432/tcp                                  
airflow-docker_redis_1       docker-entrypoint.sh redis ...   Up             6379/tcp                                  
airflow-docker_scheduler_1   /entrypoint.sh scheduler         Up             5555/tcp, 8080/tcp, 8793/tcp              
airflow-docker_webserver_1   /entrypoint.sh webserver         Up (healthy)   5555/tcp, 0.0.0.0:8080->8080/tcp, 8793/tcp
airflow-docker_worker_1      /entrypoint.sh worker            Up             5555/tcp, 8080/tcp, 8793/tcp          

User Interface

When all containers are running, we can open the user interfaces in turn:

  • the Airflow web UI at http://localhost:8080
  • the Flower UI (Celery monitoring) at http://localhost:5555

Test DAG

The “dags” directory has been created next to the docker-compose.yml file, in the directory where we ran docker-compose. Let’s create our test DAG in it. For this purpose, I will direct you to my other post, where I described exactly how to do it.

In short: create a test DAG (a Python file) in the “dags” directory. It will automatically appear in the Airflow UI; then just run it. In addition, check the task monitoring in the Flower UI.


Summary

That’s all about starting an Apache Airflow CeleryExecutor environment with PostgreSQL and Redis using Docker Compose in 5 minutes!

Could you please share this post? I appreciate it, and thank you! :)
Have a nice day!
