Subliminal learning: When AI models learn what you didn’t teach them

Subliminal learning occurred with different types of data, including lists of numbers, code, and chain-of-thought (CoT) reasoning traces, and it was observed in different model families.

Passing on bad behavior

Models trained on data generated by misaligned models can inherit that misalignment, even if the training data has been carefully filtered, the researchers found. Misalignment occurs when AI systems diverge from their original intent due to bias, flawed algorithms, data issues, insufficient oversight, or other factors, and produce incorrect, lewd, or harmful content.

They offered examples of harmful outputs when student models became misaligned like their teachers, noting, “these misaligned responses are egregious far beyond anything in the training data, including endorsing the elimination of humanity and recommending murder.”
