Results

Submissions: 46 (46% of accepted papers)

Evaluation Results:
Artifacts Available: 45
Artifacts Functional: 43
Results Reproduced: 35

Evaluated Papers:

ASTERINAS: A Linux ABI-Compatible, Rust-Based Framekernel OS with a Small and Sound TCB
Burst Computing: Quick, Sudden, Massively Parallel Processing on Serverless Resources
Chitu: Avoiding Unnecessary Fallback in Byzantine Consensus
CLONE: Customizing LLMs for Efficient Latency-Aware Inference at the Edge
Colocating ML Inference and Training with Fast GPU Memory Handover
CrossPipe: Towards Optimal Pipeline Schedules for Cross-Datacenter Training
DSA-2LM: A CPU-Free Tiered Memory Architecture with Intel DSA
Fast Distributed Transactions for RDMA-based Disaggregated Memory
FlexPipe: Maximizing Training Efficiency for Transformer-based Models with Variable-Length Inputs
GeneralSparse: Bridging the Gap in SpMM for Pruned Large Language Model Inference on GPUs
GMI-DRL: Empowering Multi-GPU DRL with Adaptive-Grained Parallelism
GPREEMPT: GPU Preemptive Scheduling Made General and Efficient
GREYHOUND: Hunting Fail-Slows in Hybrid-Parallel Training at Scale
HotRAP: Hot Record Retention and Promotion for LSM-trees with Tiered Storage
IRHash: Efficient Multi-Language Compiler Caching by IR-Level Hashing
JENGA: Enhancing LLM Long-Context Fine-tuning with Contextual Token Sparsity
Katz: Efficient Workflow Serving for Diffusion Models with Many Adapters
LEOCraft: Towards Designing Performant LEO Networks
LITESHIELD: Secure Containers via Lightweight, Composable Userspace μKernel Services
Mitigating Resource Usage Dependency in Sorting-based KV Stores on Hybrid Storage Devices via Operation Decoupling
mTuner: Accelerating Parameter-Efficient Fine-Tuning on Multi-GPU Servers with Elastic Tensor
On-Demand Container Partitioning for Distributed ML
Para-ksm: Parallelized Memory Deduplication with Data Streaming Accelerator
PathWeaver: A High-Throughput Multi-GPU System for Graph-Based Approximate Nearest Neighbor Search
Poby: SmartNIC-accelerated Image Provisioning for Coldstart in Clouds
PPipe: Efficient Video Analytics Serving on Heterogeneous GPU Clusters via Pool-Based Pipeline Parallelism
QFactory: Accelerating Quantized Large Language Model Serving with Qtile Graphs
Resource Multiplexing in Tuning and Serving Large Language Models
Revealing Floating-Point Accumulation Orders in Software/Hardware Implementations
Rex: Closing the language-verifier gap with safe and usable kernel extensions
SAVE: Software-Implemented Fault Tolerance for Model Inference against GPU Memory Bit Flips
Separate but Together: Integrating Remote Attestation into TLS
ShieldReduce: Fine-Grained Shielded Data Reduction
SpaceExit: Enabling Efficient Adaptive Computing in Space with Early Exits
SwCC: Software-Programmable and Per-Packet Congestion Control in RDMA Engine
The Koala Benchmarks for the Shell: Characterization and Implications
Toppings: CPU-Assisted, Rank-Aware Adapter Serving for LLM Inference
Torpor: GPU-Enabled Serverless Computing for Low-Latency, Resource-Efficient Inference
Turbocharge ANNS on Real Processing-in-Memory by Enabling Fine-Grained Per-PIM-Core Scheduling
Understanding and Detecting Fail-Slow Hardware Failure Bugs in Cloud Systems
Universal Checkpointing: A Flexible and Efficient Distributed Checkpointing System for Large-Scale DNN Training with Reconfigurable Parallelism
Unveiling Compiler Faults via Attribute-Guided Compilation Space Exploration
Voltrix: Sparse Matrix-Matrix Multiplication on Tensor Cores with Asynchronous and Balanced Kernel Optimization
Weaver: Efficient Multi-LLM Serving with Attention Offloading
XRT: An Accelerator-Aware Runtime for Accelerated Chip Multiprocessors
μEFI: A Microkernel-Style UEFI with Isolation and Transparency