Thomas Kuntz
Projects
OS-Harm
A benchmark for measuring the safety of computer use agents (NeurIPS Spotlight Benchmark)