AndronETalksNews
AndronETalksNews
Futurism – The Byte
By Maggie Harrison – Dupré
How hard would it be to train an AI model to be secretly evil? As it turns out, according to AI researchers, not very — and attempting to reroute a bad apple AI’s more sinister proclivities might backfire in the long run.
In a yet-to-be-peer-reviewed new paper, researchers at the Google-backed AI firm Anthropic claim they were able to train advanced large language models (LLMs) with “exploitable code,” meaning it can be triggered to prompt bad AI behavior via seemingly benign words or phrases. As the Anthropic researchers write in the paper, humans often engage in “strategically deceptive behavior,” meaning “behaving helpfully in most situations, but then behaving very differently to pursue alternative objectives when given the opportunity.” If an AI system were trained to do the same, the scientists wondered, could they “detect it and remove it using current state-of-the-art safety training techniques?”
Cookie | Duration | Description |
---|---|---|
cookielawinfo-checkbox-analytics | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics". |
cookielawinfo-checkbox-functional | 11 months | The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional". |
cookielawinfo-checkbox-necessary | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary". |
cookielawinfo-checkbox-others | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other. |
cookielawinfo-checkbox-performance | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance". |
viewed_cookie_policy | 11 months | The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data. |