Study finds filtered data stops openly available AI models from performing dangerous tasks

Mădălin Mihai

Researchers from the University of Oxford, EleutherAI, and the UK AI Security Institute have reported a major advance in safeguarding open-weight language models. By filtering out potentially harmful knowledge during training, the researchers were able to build models that resist subsequent malicious updates – a safeguard that is especially valuable in sensitive domains such as biothreat research. Senior author Yarin Gal, Associate Professor of Machine Learning at Oxford’s Department of Computer Science, said: ‘