You are currently viewing Apache Airflow Short Introduction And Simple DAG – 2 Cool Secrets to Becoming an Airflow Beginner!
Could You Please Share This Post? I Appreciate It And Thank YOU! :) Have A Nice Day!
4.9
(2192)

Apache Airflow is a software which you can easily use to schedule and monitor your workflows. [ Apache Airflow Short introduction and simple DAG ]. It’s written in Python. As each software Airflow also consist of concepts which describes main and atomic functionalities. In Airflow you will encounter:

  • DAG (Directed Acyclic Graph) – collection of task which in combination create the workflow. In DAG you specify the relationships between takes (sequences or parallelism of tasks), order and dependencies.
  • Operator – represents the single task. From technical point of view you can treat operator like “wrapper” for operation.
  • Task – is a instance of specific run of operator in the DAG representation.

Please find the list of some available Operators in version 2.2.5 of Airflow:

  • BashOperator
  • BranchPythonOperator
  • CheckOperator
  • DockerOperator
  • DummyOperator
  • DruidCheckOperator
  • EmailOperator
  • GenericTransfer
  • HiveToDruidTransfer
  • HiveToMySqlTransfer
  • Hive2SambaOperator
  • HiveOperator
  • HiveStatsCollectionOperator
  • IntervalCheckOperator
  • JdbcOperator
  • LatestOnlyOperator
  • MsSqlOperator
  • MsSqlToHiveTransfer
  • MySqlOperator
  • MySqlToHiveTransfer
  • OracleOperator
  • PigOperator
  • PostgresOperator
  • PrestoCheckOperator
  • PrestoIntervalCheckOperator
  • PrestoToMySqlTransfer
  • PrestoValueCheckOperator
  • PythonOperator
  • PythonVirtualenvOperator
  • S3FileTransformOperator
  • S3ToHiveTransfer
  • S3ToRedshiftTransfer
  • ShortCircuitOperator
  • SimpleHttpOperator
  • SlackAPIOperator
  • SlackAPIPostOperator
  • SqliteOperator
  • SubDagOperator
  • TriggerDagRunOperator
  • ValueCheckOperator
  • RedshiftToS3Transfer

Create simple DAG with two operators

As our exercise we will create DAG with two BashOperator tasks. Both of them will print the message to console.

First of all, we have to create the new python file in AIRFLOW_HOME/dags directory. The Airflow scheduler monitors this folder in interval of time and after few seconds you are able to see your DAG in Airflow UI.

In my case AIRFLOW_HOME=/home/pawel/airflow => It determines that my dags I need to upload into /home/pawel/airflow/dags folder. Below you can find DAG implementation. I saved it into simple-dag.py file in dags directory. (Apache Airflow Short introduction and simple DAG)

import airflow
from airflow.operators.bash_operator import BashOperator
from airflow.models import DAG
from datetime import datetime, timedelta

args = {
    'owner': 'Pawel',
    'start_date': datetime(2018, 12, 02, 16, 40, 00),
    'email': ['bigdataetlcom@gmail.com'],
    'email_on_failure': False,
    'email_on_retry': False
}
dag = DAG(dag_id='Simple_DAG', default_args=args, schedule_interval='@daily', concurrency=1, max_active_runs=1,
          catchup=False)
task_1 = BashOperator(
    task_id='task_1',
    bash_command='echo Hello, This is my first DAG in Airflow!',
    dag=dag
)
task_2 = BashOperator(
    task_id='task_2',
    bash_command='echo With two operators!',
    dag=dag
)
task_1 >> task_2

Airflow UI

In configuration file: (AIRFLOW_HOMEairflow.cfg) you can fine the “endpoint_url” parameter. As default it is set to: http://localhost:8080. If you are running your Airflow service on your local machine you should be able to open this url in your browser. In case when you have Airflow installed on remote host machine I recommend you change the “localhost” to public IP of your Airflow host. Additionally please change “base_url” parameter as well. (Apache Airflow Short introduction and simple DAG )

endpoint_url = http://localhost:8080
base_url = http://localhost:8080

Run created DAG

You should be able to see our created DAG in Airflow UI as below in the picture. I have already launched it and it’s a reason why you can see “On” button with dark green background. When you trigger your DAG you should be able to see that in the “Recent Tasks” section are created new task which are going through each status of task (in case of succeeded scenario of course): (Apache Airflow Short introduction and simple DAG )

  1. Scheduled
  2. Queued
  3. Running
  4. Finished
Apache Airflow Short introduction and simple DAG - check cool DAG in 5 min!

“DAG Runs” section represents information about all running of DAGs on DAG level instead of “Recent task” which shows only the status of recent tasks of recent DAG as the name suggests :). (The DAG is a container for task/tasks).

Let’s open the succeeded run of DAG and see the logs of each task. In below image was presented the “Graph view” of our DAG. It is very useful when you would like to verify whether you task in DAG are in a correct order of execution and see the execution status of each task. In case of failure you will see that, for example “task_2” will be marked as yellow (when the task status will be set to “up_for_retry”) or red in case of failed.

Here you can easily go to the logs of each task, just click left button on selected task you will see the modal dialog with many options. One of them is button “Logs”. (Apache Airflow Short introduction and simple DAG )

Apache Airflow Short introduction and simple DAG - check cool DAG in 5 min!

Logs of #Task_1

Apache Airflow Short introduction and simple DAG - check cool DAG in 5 min!

Logs of #Task_2

Apache Airflow Short introduction and simple DAG - check cool DAG in 5 min!

Summary

After this post you should be able to create, run and debug the simple DAG in Airflow.

Apache Airflow is a powerful tool to orchestrate workflows in the projects and organizations. Despite it is still in Apache Incubator Airflow is used by many “Big Players” in IT world. (Apache Airflow Short introduction and simple DAG )

Could You Please Share This Post? 
I appreciate It And Thank YOU! :)
Have A Nice Day!

BigData-ETL: image 7YOU MIGHT ALSO LIKE

How useful was this post?

Click on a star to rate it!

Average rating 4.9 / 5. Vote count: 2192

No votes so far! Be the first to rate this post.

As you found this post useful...

Follow us on social media!

We are sorry that this post was not useful for you!

Let us improve this post!

Tell us how we can improve this post?