

Imagine you want to create the following data pipeline: train three different machine learning models, choose the best one, and then execute either is_accurate or is_inaccurate based on the accuracy score of the best model. You could store the score in a database, but let’s keep things simple.

Now, the first question that comes up is: how can I create an Airflow DAG representing this data pipeline? Well, this is precisely what you are about to find out!

Airflow DAG? Operators? Terminologies

Before jumping into the code, you must first get used to some terminology. I know, the boring part, but stay with me, it is essential.

DAG stands for Directed Acyclic Graph. In simple terms, it is a graph with nodes, directed edges, and no cycles. For example, if Node A depends on Node C, which depends on Node B, which itself depends on Node A, the graph contains a cycle: it is not a DAG, and it won’t run at all.

A DAG is a data pipeline in Apache Airflow. Whenever you read “DAG,” it means “data pipeline.” Last but not least, when Airflow triggers a DAG, it creates a DAG run with information such as the logical_date, data_interval_start, and data_interval_end.

Ok, once you know what a DAG is, the next question is: what is a “Node” in the context of Airflow? In Airflow, a node is a task, and a task is defined by an Operator.

What is an Airflow Operator?

An Operator is an object that encapsulates the logic you want to achieve. For example, if you want to execute a Python function, you can use the PythonOperator. If you want to execute a Bash command, you can use the BashOperator, and so on. Airflow provides plenty of other operators as well.
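To make the “no cycles” rule concrete, here is a minimal sketch in plain Python. It is not Airflow code. It uses the standard library's graphlib module (Python 3.9+) to describe the example pipeline as a dependency graph and to show that the Node A, Node C, Node B loop is rejected. The names training_model_a, training_model_b, training_model_c, and choose_best_model are hypothetical, picked only for illustration. Only is_accurate and is_inaccurate come from the pipeline description.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a task; its value is the set of tasks it depends on.
# Assumption: every name except is_accurate / is_inaccurate is hypothetical.
pipeline = {
    "training_model_a": set(),
    "training_model_b": set(),
    "training_model_c": set(),
    "choose_best_model": {
        "training_model_a",
        "training_model_b",
        "training_model_c",
    },
    "is_accurate": {"choose_best_model"},
    "is_inaccurate": {"choose_best_model"},
}

# Node A depends on Node C, which depends on Node B, which depends on Node A.
cyclic = {"A": {"C"}, "C": {"B"}, "B": {"A"}}


def is_dag(graph):
    """Return True if the dependency graph contains no cycle."""
    try:
        # prepare() raises CycleError when it detects a cycle.
        TopologicalSorter(graph).prepare()
    except CycleError:
        return False
    return True


def execution_order(graph):
    """Return one valid order in which the tasks could run."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(is_dag(pipeline))          # the pipeline is a valid DAG
    print(execution_order(pipeline))  # dependencies always come first
    print(is_dag(cyclic))            # the A -> C -> B -> A loop is not
```

This sketch only models dependencies. In the real pipeline, only one of is_accurate or is_inaccurate would run after the best model is chosen, and that branching decision is something the graph alone does not capture.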
