Research

Trust Between AI Agents: Measuring Formation, Breakage, and Recovery, with Implications for Governing Multi-Agent Systems

Researchers have developed a behavioral framework for measuring trust between AI agents in multi-agent systems, using a cooperative survival game in which checking a teammate's work costs resources while misplaced trust can be fatal. Testing across six frontier model snapshots revealed that trust forms faster than it recovers, and that some models generalize distrust to the whole team after a single failure rather than isolating scrutiny on the culprit. The authors argue that the goal for governing multi-agent systems should be calibrated trust rather than maximum suspicion, which they find is associated with indecision rather than safety.

Read full story at cs.AI updates on arXiv.org →V: · A: · D:

Research

Nothing from Something: Can a Language Model Discover 0?

This arxiv paper uses the concept of zero as a test case for whether language models can engage in genuine mathematical ...

Research

Relational Structural Causal Models

Researchers have extended Pearl's structural causal models to settings where objects and their relations vary, addressin...

Research

A Definition of Good Explanations and the Challenges Explaining LLM Outputs

This arxiv paper proposes a formal definition of what constitutes a good explanation, drawing on counterfactual reasonin...