As the training costs of machine learning models rise [1], continual learning (CL) emerges as a useful countermeasure. In CL, a machine learning model (e.g., an LLM such as GPT) is trained on a continually arriving stream of data (e.g., text data). Crucially, in CL the data cannot be stored, so only the most recent data is available for training. The main challenge is then to train on the current data (often called a task) while not forgetting the knowledge learned from the old tasks. Not forgetting old knowledge is essential because, at test time, the model is evaluated on the test data of all seen tasks. This challenge is commonly described as catastrophic forgetting in the literature, and is part of the stability-plasticity tradeoff.
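To make the setup concrete, here is a minimal sketch of a continual learning loop in PyTorch. The model, task loaders, and hyperparameters are hypothetical placeholders, not from this article; the point is only the structure: train on one task at a time, with no access to earlier data, then evaluate on every task seen so far.

```python
import torch
from torch import nn, optim

def train_continually(model: nn.Module, task_loaders, test_loaders, epochs=1, lr=1e-3):
    """Train on a sequence of tasks; old task data is no longer available."""
    optimizer = optim.SGD(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()

    for task_id, train_loader in enumerate(task_loaders):
        # Training: only the current task's data can be used.
        model.train()
        for _ in range(epochs):
            for inputs, targets in train_loader:
                optimizer.zero_grad()
                loss = criterion(model(inputs), targets)
                loss.backward()
                optimizer.step()

        # Evaluation: the model is tested on all tasks seen so far,
        # which is where catastrophic forgetting becomes visible.
        model.eval()
        with torch.no_grad():
            for seen_id, test_loader in enumerate(test_loaders[: task_id + 1]):
                correct, total = 0, 0
                for inputs, targets in test_loader:
                    preds = model(inputs).argmax(dim=1)
                    correct += (preds == targets).sum().item()
                    total += targets.size(0)
                print(f"after task {task_id}: accuracy on task {seen_id} = {correct / total:.3f}")
```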
On the one hand, the stability-plasticity tradeoff refers to keeping network parameters (e.g., layer weights) stable so as not to forget (stability). On the other hand, it means allowing parameter changes in order to learn from novel tasks (plasticity). CL methods approach this tradeoff from several directions, which I've written about in a previous article.
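As one illustrative direction (not the specific methods covered in the linked article), regularization-based approaches balance the tradeoff by penalizing how far parameters drift from their values after the previous task. The sketch below uses a simple L2 penalty; the weighting factor `lam` and the `old_params` snapshot are assumed for illustration.

```python
import torch
from torch import nn

def loss_with_stability_penalty(model: nn.Module, task_loss: torch.Tensor,
                                old_params: dict, lam: float = 1.0) -> torch.Tensor:
    """task_loss drives plasticity; the penalty term enforces stability."""
    # Pull each weight toward the value it had after the previous task.
    penalty = sum(((p - old_params[n]) ** 2).sum()
                  for n, p in model.named_parameters())
    return task_loss + lam * penalty

# After finishing a task, snapshot the parameters for the next task's penalty:
# old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
```

A large `lam` favors stability (less forgetting, slower adaptation), while a small `lam` favors plasticity; this is the tradeoff in its simplest form.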