How to install Apache Airflow

The steps below install Apache Airflow on Linux using pip, which is the recommended method.

  1. Prerequisites:
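    • Python 3 (the minimum supported version depends on the Airflow release; recent releases no longer support Python 3.6, so check the documentation for the release you plan to install)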
    • pip (Python package installer)
  2. Create a Virtual Environment (Optional): While not strictly necessary, it is good practice to create a virtual environment so that the Python packages Airflow requires stay isolated from your system’s Python environment. You can create one using virtualenv or the built-in venv module.

         # Install virtualenv if you haven't already
         pip install virtualenv

         # Create a virtual environment
         virtualenv airflow_env

         # Activate the virtual environment
         source airflow_env/bin/activate

  3. Install Airflow: Once you have your environment set up, you can install Apache Airflow using pip.

         pip install apache-airflow

  4. Initialize Airflow Database: After installing Airflow, you need to initialize the metadata database. Airflow uses this database to store metadata related to task execution, connections, variables, and more. In Airflow 2.7 and later, airflow db init is deprecated in favor of airflow db migrate.

         airflow db init

  5. Start the Web Server and Scheduler: Airflow consists of a web server and a scheduler. The web server provides a UI to monitor and interact with your workflows, while the scheduler triggers your workflows according to their schedules and submits their tasks for execution. Each command keeps running in the foreground, so start them in separate terminals.

         # Start the web server
         airflow webserver --port 8080

         # Start the scheduler
         airflow scheduler

  6. Access Airflow UI: Once the web server is running, you can access the Airflow UI by opening a web browser and navigating to http://localhost:8080 or the appropriate address if you specified a different port.
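
A plain pip install apache-airflow takes the latest release and whatever dependency versions pip resolves at that moment. The Airflow installation documentation describes pinning the release and passing a constraints file so that the install is reproducible. The following is a minimal sketch of that idea, not a definitive script: the version 2.9.3 is only an example value, and the helper names python_minor_version, constraints_url, and install_command are hypothetical. It prints the install command for review instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: build a pinned, constraints-based pip command for Airflow.
# AIRFLOW_VERSION is an example value (assumption); set it to the release you need.
AIRFLOW_VERSION="${AIRFLOW_VERSION:-2.9.3}"

# Print the "major.minor" version of a Python interpreter (default: python3).
python_minor_version() {
  "${1:-python3}" -c 'import sys; print("%d.%d" % sys.version_info[:2])'
}

# Build the constraints-file URL for an Airflow version and a Python version.
constraints_url() {
  local airflow_version="$1" python_version="$2"
  echo "https://raw.githubusercontent.com/apache/airflow/constraints-${airflow_version}/constraints-${python_version}.txt"
}

# Print the install command rather than executing it, so it can be reviewed first.
install_command() {
  local airflow_version="$1" python_version="$2"
  echo "pip install \"apache-airflow==${airflow_version}\" --constraint \"$(constraints_url "$airflow_version" "$python_version")\""
}

install_command "$AIRFLOW_VERSION" "$(python_minor_version)"
```

Run the script inside the activated virtual environment so that the detected Python version matches the interpreter pip will install into, then copy the printed command into the terminal in place of the unpinned install in step 3.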