Anthropic experiments with AI introspection

However, this ability to introspect is limited and “highly unreliable,” the Anthropic researchers emphasize. Models (at least for now) still cannot introspect the way humans can, or to the extent we do.

Checking its intentions

The Anthropic researchers wanted to know whether Claude could describe and, in a sense, reflect on its own reasoning. This required the researchers to compare Claude’s self-reported “thoughts” with its internal processes, somewhat like hooking a human up to a brain monitor, asking questions, then analyzing the scan to map thoughts to the areas of the brain they activated.

The researchers tested model introspection with “concept injection,” which essentially involves inserting representations of completely unrelated concepts (as neural activation vectors) into a model while it’s thinking about something else. The model is then asked to loop back, identify the interloping thought, and describe it. When the model does so accurately, the researchers say, this suggests that it’s “introspecting.”
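The mechanics of concept injection can be sketched in toy form: take the activations a model produces mid-computation and add in a scaled vector that represents an unrelated concept. This is a minimal illustration, not Anthropic’s actual method; the function name, vector sizes, and values here are all hypothetical, and real experiments operate on transformer hidden states with thousands of dimensions.

```python
# Toy sketch of "concept injection": adding a scaled, unrelated concept
# vector into a model's intermediate activations. All names and values
# are hypothetical; real work uses high-dimensional transformer states.

def inject_concept(activations, concept_vector, strength=1.0):
    """Return activations with strength * concept_vector added in."""
    return [a + strength * c for a, c in zip(activations, concept_vector)]

# Hidden state while the model is "thinking about" something else.
hidden = [0.0, 1.0, -1.0, 0.5]

# Direction previously extracted for an unrelated concept.
concept = [1.0, 0.0, 0.5, 0.0]

steered = inject_concept(hidden, concept, strength=2.0)
print(steered)
```

The model would then continue generating from the steered activations, and the test is whether it can notice and name the injected concept rather than simply being influenced by it.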
