We are happy to report on the OSDI’21 artifact evaluation (AE) process. This is the second time that OSDI has conducted such a process, and we hope to keep improving it so that artifact evaluation becomes more common in our community’s conferences.


We continued the three-badge approach (rather than a single-badge approach) used at OSDI’20. The three badges are:

  • Artifacts Available: To earn this badge, the AEC must judge that the artifacts associated with the paper have been made available for retrieval, permanently and publicly.
  • Artifacts Functional: To earn this badge, the AEC must judge that the artifacts conform to the expectations set by the paper in terms of functionality, usability, and relevance.
  • Results Reproduced: To earn this badge, the AEC must judge that they can use the submitted artifacts to obtain the main results presented in the paper.


We had 28 reviewers, and we assigned 2 or 3 artifacts to each reviewer so that each artifact was evaluated by 3 reviewers. The evaluation process had two key phases: the kick-the-tires phase and the in-depth evaluation phase. During the kick-the-tires phase, reviewers made a quick first pass over all their assignments to identify obvious problems and communicated them to the authors. After the kick-the-tires phase, reviewers evaluated each assignment thoroughly and wrote detailed reviews. Finally, reviewers coordinated with fellow AEC members and decided which badges to award to each artifact.


OSDI’21 accepted 31 papers, and 26 of them participated in the AE, a participation rate of 84%, a significant increase over OSDI’20 (70%) and SOSP’19 (61%). Of the 26 submitted artifacts:

  • 26 artifacts received the Artifacts Available badge (100%).
  • 23 artifacts received the Artifacts Functional badge (88%).
  • 20 artifacts received the Results Reproduced badge (77%).

Key Takeaways

CloudLab Resources: Our experience showed that CloudLab can effectively facilitate the evaluation process. We suggest that future AEC chairs prepare CloudLab resources and make them available from the beginning of the evaluation process.

Usage of Screencasts: Some artifacts could be evaluated only via screencasts due to various constraints. This posed challenges around identifying a consistent standard for evaluating screencasts. We suggest that future AEC chairs provide clear guidance on screencasts to authors and announce ahead of time whether they count toward the different badges.

Finally, we deeply thank the authors and the AEC members for all their efforts in making the OSDI’21 AE possible, especially during a pandemic.

Guyue (Grace) Liu, Carnegie Mellon University
Manuel Rigger, ETH Zürich
Lalith Suresh, VMware Research
OSDI’21 Artifact Evaluation Committee Co-chairs