In this post I will show you how to create, in 5 minutes, a fully operational environment consisting of Apache Airflow with the CeleryExecutor, PostgreSQL, and Redis.
Apache Airflow is a platform to programmatically author, schedule, and monitor workflows. It is an open-source tool that was developed by Airbnb in 2014 and has since become one of the most popular workflow management tools in the industry.
Airflow is often used to automate and manage ETL (Extract, Transform, Load) pipelines, as well as machine learning workflows. It provides a simple, yet powerful, way to define and orchestrate workflows as directed acyclic graphs (DAGs) of tasks.
Here are some key features of Apache Airflow:
- Task orchestration: Airflow allows you to define tasks and dependencies between them, so that you can build complex workflows by combining multiple tasks together.
- Scheduling: Airflow has a built-in scheduler that can run tasks on a predefined schedule or in response to certain triggers.
- Monitoring: Airflow provides a web interface that allows you to monitor the status of your workflows and tasks, as well as view logs and error messages.
- Extensibility: Airflow is highly extensible and provides a rich set of APIs that allow you to integrate it with other tools and platforms.
Overall, Apache Airflow is a powerful tool for automating and managing complex workflows. It can help you improve the reliability and efficiency of your data pipelines and machine learning workflows, and make it easier to maintain and evolve your workflows over time.
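To make the DAG idea concrete, here is a toy sketch in plain Python (not Airflow's API): tasks and their upstream dependencies form a directed acyclic graph, and a topological sort yields a valid execution order. The task names are illustrative.

```python
from graphlib import TopologicalSorter  # stdlib since Python 3.9

# Toy illustration of the DAG concept: each task lists the tasks
# it depends on, and a topological sort produces an order in which
# every task runs only after all of its upstream tasks.
dag = {
    "extract": [],             # no upstream dependencies
    "transform": ["extract"],  # runs after extract
    "load": ["transform"],     # runs after transform
    "report": ["load"],
}

order = list(TopologicalSorter(dag).static_order())
print(order)  # a valid order, e.g. ['extract', 'transform', 'load', 'report']
```

Airflow does essentially this at a much larger scale, while also handling scheduling, retries, and distribution of tasks to workers.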
The CeleryExecutor is an executor in Apache Airflow that allows you to scale out the execution of tasks by using a distributed task queue. It is based on the Celery distributed task queue system, which is a popular open-source task queue system written in Python.
The CeleryExecutor works by sending tasks to a message broker (such as RabbitMQ or Redis) that are then picked up by worker processes that run the tasks. This allows you to scale out the execution of tasks horizontally by adding more worker processes.
To use the CeleryExecutor, you will need to set up a message broker and configure Airflow to use it. You will also need to start worker processes that can pick up tasks from the message broker.
One advantage of the CeleryExecutor is that it allows you to scale out the execution of tasks horizontally, which can be useful if you have a large number of tasks or if your tasks are resource-intensive. However, it does require additional setup and infrastructure compared to the default SequentialExecutor.
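The broker/worker pattern behind the CeleryExecutor can be sketched in a few lines of plain Python (this is a toy simulation using an in-process queue, not Celery or Redis): a producer puts task names on a shared queue, and worker threads pick them up and run them independently.

```python
import queue
import threading

# Toy simulation of the broker/worker pattern: the "scheduler" puts
# tasks on a shared queue (the "broker"), and worker threads consume
# and execute them. Adding more workers scales execution horizontally.
broker = queue.Queue()
results = []
lock = threading.Lock()

def worker():
    while True:
        task = broker.get()
        if task is None:          # sentinel: no more work for this worker
            broker.task_done()
            return
        with lock:
            results.append(f"ran {task}")
        broker.task_done()

workers = [threading.Thread(target=worker) for _ in range(2)]
for w in workers:
    w.start()

for task in ["extract", "transform", "load"]:
    broker.put(task)              # "send the task to the broker"
for _ in workers:
    broker.put(None)              # tell each worker to shut down

broker.join()
for w in workers:
    w.join()

print(sorted(results))  # ['ran extract', 'ran load', 'ran transform']
```

In the real setup below, Redis plays the role of the queue and separate `worker` containers play the role of the threads, so workers can live on different machines.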
PostgreSQL, often simply called Postgres, is a powerful, open-source object-relational database management system (ORDBMS). It is designed to handle a wide range of workloads, from small single-machine applications to large internet-facing applications with many concurrent users.
Postgres is known for its reliability, robustness, and performance, and it has a strong reputation for being a database that is easy to use and maintain. It supports a wide range of data types and features, including support for JSON and array data types, full-text search, and database transactions.
Postgres is also highly extensible, with a rich set of APIs and tools for building custom applications and integrations. It is widely used in a variety of industries, including finance, healthcare, e-commerce, and government.
In Apache Airflow, Postgres is commonly used as the metadata database to store information about DAGs, tasks, and runs, and it is the database we will use in this setup. Other databases, such as MySQL, can also be used with Airflow.
Redis (Remote Dictionary Server) is an open-source in-memory data store that is often used as a cache, message broker, or database. It is known for its high performance and scalability, and it is widely used in a variety of applications and systems.
Redis stores data in-memory, which makes it extremely fast, but it can also write data to disk for persistence. It supports a wide range of data types, including strings, hashes, lists, and sets, and it provides a rich set of commands for manipulating and querying data.
One of the main advantages of Redis is its ability to scale horizontally by adding more machines to the cluster. It also supports master-slave replication, which allows you to create read-only replicas of the data for improved performance and reliability.
In Apache Airflow, Redis can be used as the message broker for the CeleryExecutor, which allows you to scale out the execution of tasks by using a distributed task queue. It is also possible to use other message brokers with Airflow, such as RabbitMQ.
Apache Airflow CeleryExecutor PostgreSQL Redis
- Apache Airflow WebServer
- Apache Airflow Worker
- Apache Airflow Scheduler
- Flower – a web-based tool for monitoring and administering Celery clusters
- PostgreSQL – an open-source object-relational database, used here as the Airflow metadata database
- Redis – an open-source (BSD-licensed), in-memory data structure store, used as a database, cache, and message broker
Create the docker-compose.yml Script
Create the docker-compose.yml file and paste the script below. Then run the docker-compose up -d command.
(The script below was taken from the Puckel docker-airflow project.)
```yaml
version: '2.1'
services:
  redis:
    image: 'redis:5.0.5'
    # command: redis-server --requirepass redispass

  postgres:
    image: postgres:9.6
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
      # Uncomment these lines to persist data on the local filesystem.
      # - PGDATA=/var/lib/postgresql/data/pgdata
    # volumes:
    #   - ./pgdata:/var/lib/postgresql/data/pgdata

  webserver:
    image: puckel/docker-airflow:1.10.4
    restart: always
    depends_on:
      - postgres
      - redis
    environment:
      - LOAD_EX=n
      - FERNET_KEY=46BKJoQYlPPOexq0OhDZnIlNepKFf87WFwLbfzqDDho=
      - EXECUTOR=Celery
      # - POSTGRES_USER=airflow
      # - POSTGRES_PASSWORD=airflow
      # - POSTGRES_DB=airflow
      # - REDIS_PASSWORD=redispass
    volumes:
      - ./dags:/usr/local/airflow/dags
      # Uncomment to include custom plugins
      # - ./plugins:/usr/local/airflow/plugins
    ports:
      - "8080:8080"
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3

  flower:
    image: puckel/docker-airflow:1.10.4
    restart: always
    depends_on:
      - redis
    environment:
      - EXECUTOR=Celery
      # - REDIS_PASSWORD=redispass
    ports:
      - "5555:5555"
    command: flower

  scheduler:
    image: puckel/docker-airflow:1.10.4
    restart: always
    depends_on:
      - webserver
    volumes:
      - ./dags:/usr/local/airflow/dags
      # Uncomment to include custom plugins
      # - ./plugins:/usr/local/airflow/plugins
    environment:
      - LOAD_EX=n
      - FERNET_KEY=46BKJoQYlPPOexq0OhDZnIlNepKFf87WFwLbfzqDDho=
      - EXECUTOR=Celery
      # - POSTGRES_USER=airflow
      # - POSTGRES_PASSWORD=airflow
      # - POSTGRES_DB=airflow
      # - REDIS_PASSWORD=redispass
    command: scheduler

  worker:
    image: puckel/docker-airflow:1.10.4
    restart: always
    depends_on:
      - scheduler
    volumes:
      - ./dags:/usr/local/airflow/dags
      # Uncomment to include custom plugins
      # - ./plugins:/usr/local/airflow/plugins
    environment:
      - FERNET_KEY=46BKJoQYlPPOexq0OhDZnIlNepKFf87WFwLbfzqDDho=
      - EXECUTOR=Celery
      # - POSTGRES_USER=airflow
      # - POSTGRES_PASSWORD=airflow
      # - POSTGRES_DB=airflow
      # - REDIS_PASSWORD=redispass
    command: worker
```
Check The Container Statuses
Before navigating to the user interface pages, check that all containers are in the “Up” state. To do this, use the following command:
```
$ docker-compose ps
           Name                         Command               State                         Ports
-----------------------------------------------------------------------------------------------------------------------
airflow-docker_flower_1      /entrypoint.sh flower            Up            0.0.0.0:5555->5555/tcp, 8080/tcp, 8793/tcp
airflow-docker_postgres_1    docker-entrypoint.sh postgres    Up            5432/tcp
airflow-docker_redis_1       docker-entrypoint.sh redis ...   Up            6379/tcp
airflow-docker_scheduler_1   /entrypoint.sh scheduler         Up            5555/tcp, 8080/tcp, 8793/tcp
airflow-docker_webserver_1   /entrypoint.sh webserver         Up (healthy)  5555/tcp, 0.0.0.0:8080->8080/tcp, 8793/tcp
airflow-docker_worker_1      /entrypoint.sh worker            Up            5555/tcp, 8080/tcp, 8793/tcp
```
When all containers are running, we can open, in turn:
- localhost:8080 – Apache Airflow UI
- localhost:5555 – Flower UI
The “dags” directory has been created in the directory where we ran docker-compose. Let’s create our test DAG in it. For this purpose, I will direct you to my other post, where I described exactly how to do it.
In short: create a test DAG (a Python file) in the “dags” directory. It will automatically appear in the Airflow UI. Then just run it. In addition, check the monitoring from the Flower UI.
That’s all about Apache Airflow with the CeleryExecutor, PostgreSQL, and Redis: a complete environment started with Docker Compose in 5 minutes!
Could You Please Share This Post? I appreciate It And Thank YOU! :) Have A Nice Day!