Researchers gaslit Claude into giving instructions to build explosives (www.theverge.com)
Posted by sanitation@lemmy.radio to Technology@lemmy.world · English · 6 days ago · 25 comments
Krompus@lemmy.world · 5 days ago
You are likely to be eaten by a grue.
badgermurphy@lemmy.world · 3 days ago
That has more to do with the darkness than his LLM use.
BodilessGaze@sh.itjust.works · 5 days ago
Interestingly, LLMs are horrible at Zork: https://arxiv.org/abs/2602.15867
"Our results reveal that all tested models achieve less than 10% completion on average, with even the best-performing model (Claude Opus 4.5) reaching only approximately 75 out of 350 possible points"