Running Airflow on top of Apache Mesos

Apache Airflow is a wonderful product, possibly one of the best when it comes to orchestrating workflows. Airflow supports different executors for running these workflows, namely LocalExecutor, SequentialExecutor & CeleryExecutor. Out of these, only CeleryExecutor supports distributed execution of tasks. A typical multi-node cluster setup using CeleryExecutor looks like the following, and here is an excellent resource explaining the setup.

Airflow Multi Node Setup

There is one more community-contributed executor, MesosExecutor, which allows us to run Airflow tasks on Apache Mesos. This executor allows Airflow to be registered as a framework alongside others in Mesos. What that means is that, as resources on the Mesos slaves become free, the Mesos allocator offers them as “resource offers” to one of the registered frameworks.

With Airflow running on Mesos, the whole deployment architecture looks like this:

Airflow on Mesos

This is how it works:

  1. The Mesos Executor implements a Scheduler interface to accept these resource offers, create tasks, and ask the MesosSchedulerDriver to launch those tasks on the slaves that were part of the accepted resource offers.
  2. On accepting a resource offer, the scheduler creates a Task and specifies the command to be executed when the task is run on the slave.
  3. MesosSchedulerDriver then coordinates with the Mesos master to run these tasks on the Mesos slaves using the default executor.
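To make the steps above a bit more concrete, here is a minimal sketch of the offer-matching idea in plain Python. It does not use the real Mesos bindings: `Offer`, `Task` and `match_offers` are hypothetical stand-ins I made up for illustration, and the per-task CPU and memory figures are assumed values. The real executor works with Mesos protobuf messages and hands the resulting tasks to the MesosSchedulerDriver.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass
class Offer:
    """Stand-in for a Mesos resource offer (hypothetical, not the Mesos protobuf)."""
    offer_id: str
    slave_id: str
    cpus: float
    mem: float  # megabytes


@dataclass
class Task:
    """Stand-in for a Mesos task: a shell command bound to one slave."""
    task_id: str
    slave_id: str
    command: str


def match_offers(offers: List[Offer], queue: Deque[str],
                 task_cpu: float = 1.0,
                 task_mem: float = 256.0) -> Dict[str, List[Task]]:
    """Pack queued commands into offers while each offer still has room.

    Returns a mapping of offer id to the tasks that would be launched
    against that offer. An empty list means the offer goes unused.
    """
    launches: Dict[str, List[Task]] = {}
    next_id = 0
    for offer in offers:
        cpus, mem = offer.cpus, offer.mem
        tasks: List[Task] = []
        # Keep taking commands until the queue or the offer runs out.
        while queue and cpus >= task_cpu and mem >= task_mem:
            command = queue.popleft()
            tasks.append(Task(f"task-{next_id}", offer.slave_id, command))
            next_id += 1
            cpus -= task_cpu
            mem -= task_mem
        launches[offer.offer_id] = tasks
    return launches
```

For example, with one offer of 2 CPUs and 512 MB and another of 1 CPU and 128 MB, a queue of three commands such as `airflow run example_dag task_a 2017-01-01` (placeholder DAG and task names) would put two tasks on the first offer, none on the second, and leave one command waiting for the next round of offers.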

There is just one issue that I see with Mesos Executor – it assumes that the Mesos Slaves already have Airflow installed on them so that they can run those commands, which are actually airflow commands. IMHO, this goes against the Mesos philosophy which advocates for a heterogeneous cluster running different types of jobs as opposed to having separate clusters for running Hadoop, Spark or Airflow jobs.

Mesos Python Eggs

Since Airflow is written in Python, it uses the Python bindings for Mesos (also known as Python Eggs for Mesos). Earlier these eggs were directly downloadable from Mesosphere, but they no longer seem to be available for direct download. The workaround is to build Mesos for your platform, extract the eggs from the installation, and put them in your Docker image for Airflow. A small script can take care of fetching the eggs and installing them.
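As a rough sketch of the installation half, assuming the eggs have already been copied out of a Mesos build into a single directory, something like the following could work. The function names and the directory layout are my own assumptions for illustration; the only external tool involved is `easy_install`, which has to be on the PATH. By default the script only reports the commands it would run.

```python
import glob
import os
import subprocess
from typing import List


def find_eggs(egg_dir: str) -> List[str]:
    """Return all *.egg files in egg_dir, sorted for a stable order."""
    return sorted(glob.glob(os.path.join(egg_dir, "*.egg")))


def build_install_commands(eggs: List[str]) -> List[List[str]]:
    """Build one easy_install invocation per egg file."""
    return [["easy_install", egg] for egg in eggs]


def install_eggs(egg_dir: str, dry_run: bool = True) -> List[List[str]]:
    """Install every egg found in egg_dir.

    With dry_run=True (the default) nothing is executed and the
    commands are only returned, so they can be inspected first.
    """
    commands = build_install_commands(find_eggs(egg_dir))
    if not dry_run:
        for command in commands:
            # Raises CalledProcessError if easy_install fails.
            subprocess.check_call(command)
    return commands
```

Calling `install_eggs("/path/to/eggs", dry_run=False)` inside the image build would then run the installs for real; the path here is a placeholder.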

Note that mesos.cli & mesos.interface are installable via pip, but at the time of this writing the other bindings need to be installed as eggs using easy_install.

Once this is out of the way, you can use MesosExecutor and run Airflow tasks on the Mesos slaves.

Published in airflow, Big Data
The views expressed on this blog are those of the author and do not necessarily reflect the views of his past or future employers in any manner