Results

Evaluation Results:

  • 14 artifacts: Available, Functional, and Results Reproduced
  • 7 artifacts: Available and Functional
  • 0 artifacts: Functional and Results Reproduced
Title | Available | Functional | Reproduced | Available at
------------------------------------------------------------
AI Realtor: Towards Grounded Persuasive Language Generation for Automated Copywriting | Artifacts Available (v1.1) | Artifacts Evaluated - Functional (v1.1) |  | Artifact
Robust Batch-Level Query Routing for Large Language Models under Cost and Capacity Constraints | Artifacts Available (v1.1) | Artifacts Evaluated - Functional (v1.1) |  | Artifact
CAMI: Cost-Aware Agent-Guided Multi-Indexing for Semantic Retrieval | Artifacts Available (v1.1) | Artifacts Evaluated - Functional (v1.1) |  | Artifact
Glia: A Human-Inspired AI for Automated Systems Design and Optimization | Artifacts Available (v1.1) | Artifacts Evaluated - Functional (v1.1) |  | Artifact
OpaqueToolsBench: Learning Nuances of Tool Behavior Through Interaction | Artifacts Available (v1.1) | Artifacts Evaluated - Functional (v1.1) |  | Artifact
Robust Agent Compensation (RAC): Teaching AI Agents to Compensate | Artifacts Available (v1.1) | Artifacts Evaluated - Functional (v1.1) |  | Artifact
ViBench: A Benchmark on Vibe Coding | Artifacts Available (v1.1) | Artifacts Evaluated - Functional (v1.1) |  | Artifact
AgentStop: Terminating Local AI Agents Early to Save Energy in Consumer Devices | Artifacts Available (v1.1) | Artifacts Evaluated - Functional (v1.1) | Results Reproduced (v1.1) | Artifact
Context, Reasoning, and Hierarchy: A Cost–Performance Study of Compound LLM Agent Design in an Adversarial POMDP | Artifacts Available (v1.1) | Artifacts Evaluated - Functional (v1.1) | Results Reproduced (v1.1) | Artifact
Do Agents Need to Plan Step-by-Step? Rethinking Planning Horizon in Data-Centric Tool Calling | Artifacts Available (v1.1) | Artifacts Evaluated - Functional (v1.1) | Results Reproduced (v1.1) | Artifact
Exploring and Developing a Pre-Model Safeguard with Draft Models | Artifacts Available (v1.1) | Artifacts Evaluated - Functional (v1.1) | Results Reproduced (v1.1) | Artifact
FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast | Artifacts Available (v1.1) | Artifacts Evaluated - Functional (v1.1) | Results Reproduced (v1.1) | Artifact
How To Steer Your Multi-Agent System: Human-LLM Collaborative Planning | Artifacts Available (v1.1) | Artifacts Evaluated - Functional (v1.1) | Results Reproduced (v1.1) | Artifact
Improving Coherence and Persistence in Agentic AI for System Optimization | Artifacts Available (v1.1) | Artifacts Evaluated - Functional (v1.1) | Results Reproduced (v1.1) | Artifact
Learning from Supervision with Semantic and Episodic Memory: A Reflective Approach to Agent Adaptation | Artifacts Available (v1.1) | Artifacts Evaluated - Functional (v1.1) | Results Reproduced (v1.1) | Artifact
optimize_anything: Unified Text Optimization can Outperform Specialized Systems | Artifacts Available (v1.1) | Artifacts Evaluated - Functional (v1.1) | Results Reproduced (v1.1) | Artifact
Retrieval-Augmented LLMs for Security Incident Analysis | Artifacts Available (v1.1) | Artifacts Evaluated - Functional (v1.1) | Results Reproduced (v1.1) | Artifact
Securing Agents With Tracked Capabilities | Artifacts Available (v1.1) | Artifacts Evaluated - Functional (v1.1) | Results Reproduced (v1.1) | Artifact
Securing the Agent: Vendor-Neutral, Multitenant Enterprise Retrieval and Tool Use | Artifacts Available (v1.1) | Artifacts Evaluated - Functional (v1.1) | Results Reproduced (v1.1) | Artifact
Vista: Verifier-in-the-Loop Agentic Reinforcement Learning for Quantum Program Synthesis | Artifacts Available (v1.1) | Artifacts Evaluated - Functional (v1.1) | Results Reproduced (v1.1) | Artifact
Who Decides the Trade-off? Resolution Policy as Delegation Governance in Autonomous Agents | Artifacts Available (v1.1) | Artifacts Evaluated - Functional (v1.1) | Results Reproduced (v1.1) | Artifact