To install Apache Airflow on Linux, follow the general steps below. They install Airflow with pip, which is the recommended method.
- Prerequisites:
  - Python 3 (the minimum version depends on the Airflow release: early 2.x releases accepted 3.6, while recent releases require 3.8 or newer)
  - pip (Python package installer)
- Create a Virtual Environment (Optional): While not strictly necessary, it’s good practice to create a virtual environment to isolate the Python packages required for Airflow from your system’s Python environment. You can create one with virtualenv or the built-in venv module.

      # Install virtualenv if you haven't already
      pip install virtualenv
      # Create a virtual environment
      virtualenv airflow_env
      # Activate the virtual environment
      source airflow_env/bin/activate

- Install Airflow: Once your environment is set up, install Apache Airflow using pip.

      pip install apache-airflow

- Initialize Airflow Database: After installing Airflow, you need to initialize the metadata database. Airflow uses this database to store metadata related to task execution, connections, variables, and more. On Airflow 2.7 and later, this command is deprecated in favor of airflow db migrate.

      airflow db init

- Start the Web Server and Scheduler: Airflow consists of a web server and a scheduler. The web server provides a UI to monitor and interact with your workflows, while the scheduler triggers tasks according to their schedules and dependencies. Both commands run in the foreground, so start each one in its own terminal.

      # Start the web server
      airflow webserver --port 8080
      # Start the scheduler (in a second terminal)
      airflow scheduler

- Access Airflow UI: Once the web server is running, you can access the Airflow UI by opening a web browser and navigating to http://localhost:8080, or the appropriate address if you specified a different port.
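Before running the steps above, it can help to confirm that the prerequisites are in place. The following is a minimal sketch, not part of Airflow itself: version_ge and check_prereqs are hypothetical helper names, the default minimum of 3.8 is an assumption that you should adjust to the Airflow release you plan to install, and sort -V assumes GNU coreutils, which most Linux distributions provide.

```shell
#!/usr/bin/env bash
# Assumed minimum Python version; override with MIN_PYTHON=... to match
# the Airflow release you intend to install.
MIN_PYTHON="${MIN_PYTHON:-3.8}"

# version_ge A B: succeeds when dotted version A is greater than or equal to B.
# Relies on the version-aware ordering of GNU sort -V.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_prereqs: report whether python3 and pip are available,
# and whether python3 meets the assumed minimum version.
check_prereqs() {
    if ! command -v python3 >/dev/null 2>&1; then
        echo "python3: not found"
        return 1
    fi

    py_version="$(python3 -c 'import platform; print(platform.python_version())')"
    if version_ge "$py_version" "$MIN_PYTHON"; then
        echo "python3: $py_version (ok)"
    else
        echo "python3: $py_version (older than $MIN_PYTHON)"
        return 1
    fi

    if python3 -m pip --version >/dev/null 2>&1; then
        echo "pip: ok"
    else
        echo "pip: not found"
        return 1
    fi
}
```

After sourcing the script, running check_prereqs && echo "Ready to install Airflow" prints one status line per prerequisite and the final message only if both checks pass. Calling pip as python3 -m pip ensures the check applies to the same interpreter that was version-checked.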