Researchers propose a self-distillation fix for ‘catastrophic forgetting’ in LLMs

During training, the same model plays two roles. A teacher version is conditioned on both the query and expert examples. A student version sees only the query, reflecting real-world deployment. The student updates its parameters to align with the teacher’s predictions on its own generated outputs.
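The article does not publish the training code, but the on-policy distillation step it describes can be sketched in miniature. In this toy sketch (all names and numbers are illustrative, not the researchers' implementation), the "teacher" and "student" are reduced to next-token logit vectors over a tiny vocabulary, and the student's logits are nudged by gradient descent to minimize the KL divergence from the teacher's distribution:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl(p, q):
    # KL(p || q): how far the student distribution q is from the teacher p.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy 3-token vocabulary. The logits stand in for model outputs:
# the teacher is conditioned on the query plus expert demonstrations,
# the student sees only the query (hypothetical values).
teacher_logits = [2.0, 0.5, -1.0]
student_logits = [0.0, 0.0, 0.0]

teacher_p = softmax(teacher_logits)

# Distillation loop: for KL(teacher || student) with q = softmax(z),
# the gradient with respect to the student logits z is simply (q - p).
lr = 1.0
for _ in range(500):
    student_p = softmax(student_logits)
    student_logits = [z - lr * (q - p)
                      for z, q, p in zip(student_logits, student_p, teacher_p)]

final_kl = kl(teacher_p, softmax(student_logits))
print(final_kl)  # shrinks toward 0 as the student matches the teacher
```

The real method additionally samples the student's own generations and distills on those tokens (the "on-policy" part), which this closed-form toy omits; the sketch only shows the core objective of pulling the query-only student toward the demonstration-conditioned teacher.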

“In sequential learning experiments, SDFT enables a single model to accumulate multiple skills over time without performance regression, establishing on-policy distillation as a practical path to continual learning from demonstrations,” the researchers said.

Challenges to overcome

SDFT appears quite practical, as the technique removes the need to maintain "model zoos" of separate adapters or fine-tuned variants, according to Lian Jye Su, chief analyst at Omdia.
