Anthropic experiments with AI introspection

However, this ability to introspect is limited and “highly unreliable,” the Anthropic researchers emphasize. Models (at least for now) still cannot introspect the way humans can, or to the extent we do.

Checking its intentions

The Anthropic researchers wanted to know whether Claude could describe and, in a sense, reflect on its own reasoning. This required the researchers to compare Claude’s self-reported “thoughts” with its internal processes, somewhat like hooking a human up to a brain monitor, asking questions, then analyzing the scan to map thoughts to the areas of the brain they activated.

The researchers tested model introspection with “concept injection,” which essentially involves inserting representations of completely unrelated concepts (as neural activation vectors) into a model while it’s thinking about something else. The model is then asked to loop back, identify the interloping thought, and describe it. When the model does so accurately, the researchers say, this suggests that it’s “introspecting.”
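The mechanics of concept injection can be sketched in toy form: take the activations a model produces mid-computation and add in a scaled vector that represents an unrelated concept. This is a minimal illustration, not Anthropic’s actual method; the function name, vector sizes, and values here are all hypothetical, and real experiments operate on transformer hidden states with thousands of dimensions.

```python
# Toy sketch of "concept injection": adding a scaled, unrelated concept
# vector into a model's intermediate activations. All names and values
# are hypothetical; real work uses high-dimensional transformer states.

def inject_concept(activations, concept_vector, strength=1.0):
    """Return activations with strength * concept_vector added in."""
    return [a + strength * c for a, c in zip(activations, concept_vector)]

# Hidden state while the model is "thinking about" something else.
hidden = [0.0, 1.0, -1.0, 0.5]

# Direction previously extracted for an unrelated concept.
concept = [1.0, 0.0, 0.5, 0.0]

steered = inject_concept(hidden, concept, strength=2.0)
print(steered)
```

The model would then continue generating from the steered activations, and the test is whether it can notice and name the injected concept rather than simply being influenced by it.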
