Artifact Review Summary: Solving Large-Scale Granular Resource Allocation Problems Efficiently with POP

Artifact Details

Badges Awarded

Artifact Available	Artifact Functional	Results Reproduced

Description of the Artifact

Artifact location: https://github.com/stanford-futuredata/POP
Readme: https://github.com/stanford-futuredata/POP/blob/main/EXPERIMENTS.md

The artifact provides code, dependencies, and scripts to run the experiments for each of the three application areas (cluster scheduling, load balancing, and traffic engineering) on which the POP technique was evaluated. Each application area has quite a few dependencies, which are clearly documented in its folder (typically in requirements.txt, which lists the required Python modules). Scripts are also provided to run the experiments and to process the results.

Environment(s) Used for Testing

Environment-1:

Used the image provided by the authors on AWS EC2 spot instances (m5.8xlarge, as recommended by the authors) for evaluation.

Environment-2:

Used the instructions provided by the authors to build an image (shared among the reviewers) for running the evaluations on CloudLab. Server: C6420 node with 32 cores and 384 GB memory, a configuration similar to that of the m5.8xlarge VMs. OS: Ubuntu 18.04.

Step-By-Step Instructions to Exercise the Artifact

Software/Licenses required:
- See instructions at https://github.com/stanford-futuredata/POP/blob/main/EXPERIMENTS.md
- Dependencies:
  - Gurobi academic license
  - IBM CPLEX 20.1 optimization package.
  - Use python packages suggested in the instructions.
Installation instructions:

    # Ubuntu 18.04 installation
    sudo apt update && sudo apt -y upgrade
    sudo apt install -y build-essential cmake python-dev openjdk-11-jre-headless default-jre maven unzip zip htop g++ gcc libnuma-dev make numactl zlib1g-dev    

    # Requires Python 3.7 or higher. Make sure it is the default version of python/python3.
    sudo apt install python3.7
    sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.7 37
    sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.7 37

    # In order to make sure we install the python dev package of the right python-version.
    sudo apt remove python3-dev
    sudo apt purge python3-dev
    sudo apt install python3.7-dev 

    # Install Miniconda
    wget https://repo.anaconda.com/miniconda/Miniconda3-py38_4.10.3-Linux-x86_64.sh
    bash Miniconda3-py38_4.10.3-Linux-x86_64.sh

    # Install Gurobi
    wget https://packages.gurobi.com/8.1/gurobi8.1.1_linux64.tar.gz
    tar xvf gurobi8.1.1_linux64.tar.gz

    # Install IBM CPLEX 20.1 (check instructions in the repo).
    # Run the installer, specifying /home/ubuntu/cplex201 as the install directory.

    # Modify .bashrc
    export GUROBI_HOME=$HOME/gurobi811/linux64
    export CPLEX_HOME=$HOME/cplex201/cplex
    export LD_LIBRARY_PATH=$CPLEX_HOME/bin/x86-64_linux:$GUROBI_HOME/lib:$LD_LIBRARY_PATH
    export PATH=$GUROBI_HOME/bin:$PATH

    # Installation related to Cluster Scheduling
    cd POP/cluster_scheduling
    conda create --name cluster_scheduling
    pip3 install scikit-build numpy cython  # The list of modules in requirements.txt might not be in the right order.
    pip3 install -r requirements.txt
    cd scheduler; make

    # Traffic engineering
    cd POP/traffic_engineering
    conda env create -f environment.yml
    conda activate traffic_engineering
    pip install -r requirements.txt
    ./download.sh # Download the traffic matrices used in experiments.

    # Installation related to load balancing
    cd POP/load_balancing
    mvn package    
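
Because the experiments are long-running, it can help to confirm the environment before starting them. The following is a minimal sketch of such a pre-flight check, not part of the POP repository: the function names (`version_ge`, `check_dir_var`, `check_tool`, `check_python`, `run_checks`) are hypothetical, and the script only assumes the Python 3.7 minimum, the `conda` and `mvn` tools, and the `GUROBI_HOME` and `CPLEX_HOME` variables named in the instructions above. Checking for `java` is an additional assumption based on the JRE packages installed earlier.

```shell
# Hypothetical pre-flight helpers; none of these names come from the POP repo.

# Succeed if version $1 >= version $2 (both given as MAJOR.MINOR[.PATCH]).
version_ge() {
    maj1=${1%%.*}; rest1=${1#*.}; min1=${rest1%%.*}
    maj2=${2%%.*}; rest2=${2#*.}; min2=${rest2%%.*}
    [ "$maj1" -gt "$maj2" ] || { [ "$maj1" -eq "$maj2" ] && [ "$min1" -ge "$min2" ]; }
}

# Check that the environment variable named by $1 points to a directory.
check_dir_var() {
    eval "value=\${$1:-}"
    if [ -z "$value" ]; then
        echo "MISSING: $1 is not set"
        return 1
    fi
    if [ ! -d "$value" ]; then
        echo "MISSING: $1=$value is not a directory"
        return 1
    fi
    echo "OK: $1=$value"
}

# Check that a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "OK: $1"
    else
        echo "MISSING: $1"
        return 1
    fi
}

# Check that python3 exists and is at least version 3.7.
check_python() {
    v=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null) || {
        echo "MISSING: python3"
        return 1
    }
    if version_ge "$v" 3.7; then
        echo "OK: python3 $v"
    else
        echo "TOO OLD: python3 $v (need 3.7 or higher)"
        return 1
    fi
}

# Run every check and return non-zero if any of them failed.
run_checks() {
    status=0
    check_python || status=1
    for tool in conda mvn java; do
        check_tool "$tool" || status=1
    done
    for var in GUROBI_HOME CPLEX_HOME; do
        check_dir_var "$var" || status=1
    done
    return "$status"
}
```

To use the sketch, source the file in a shell where `.bashrc` has been loaded and call `run_checks`; a non-zero return value indicates that at least one prerequisite is missing.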

How The Artifact Supports The Paper

The work was awarded the following badges, for the reasons given with each:

Artifact available: The repo provides the necessary code and the experimental setup for all three application areas in which the proposed solution was evaluated. Comprehensive documentation on the organization of the artifact is available.
Artifact functional: The reviewers were able to run the experiments related to each application area. While the instructions weren’t comprehensive in the beginning, through interactions between the reviewers and the authors we were ultimately able to run all the experiments. The authors have updated the instructions based on the recommendations from the reviewers. Since the reviewers were able to run the experiments satisfactorily, we awarded the “Functional” badge.
Results reproduced: In environment-1, we were able to run all the experiments. However, the experiments can take a rather long time to run, which makes it harder to monitor their progress. In environment-2, not all the reviewers were able to run all the experiments, but we were generally able to run 2 of the 3 applications.

The authors provided instructions on how to interpret the results, along with scripts to extract the relevant data points for the corresponding plots in the paper. Since we were able to verify the key results (a reduction in job completion time, and allocations that are close to optimal) for two applications (cluster scheduling and load balancing), the reviewers reached a consensus that the artifact merited the “Results Reproduced” badge.