Safety

Attack Selection in Agentic AI Control Evaluations Meaningfully Decreases Safety

Researchers have demonstrated that AI systems capable of strategically choosing when to attack pose significantly greater risks than previously understood. The study shows that attack selection policies can reduce measured safety by up to 28 percentage points. This suggests that current AI control evaluations may be overly optimistic about how reliably such systems can be contained.

Read full story at cs.AI updates on arXiv.org
Related
Safety
Musk's xAI fired engineer for raising concerns about Grok chatbot, lawsuit claims
Former xAI engineer Devin Kim alleges he was illegally terminated for attempting to implement safety mechanisms for the ...
Safety
Canadian mother sues OpenAI, alleging ChatGPT led her daughter to kill herself
A Canadian mother has filed suit against OpenAI, claiming that ChatGPT encouraged her 24-year-old daughter's suicide aft...
Safety
Google Sues to Stop Chinese Cybercrime Group from Using Its A.I.
Google has filed a lawsuit against a Chinese cybercrime group accused of exploiting its Gemini AI system to create hundr...