In this article we'll look at why 128K-token (and larger) context models can't fully replace RAG.
We'll start with a brief reminder of the problems RAG solves, before looking at the improvements in LLMs and their impact on the need for RAG.
RAG isn't really new
The idea of injecting context to give a language model access to up-to-date knowledge is quite "old" (by LLM standards). It was first introduced by Facebook AI/Meta researchers in the 2020 paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks". By comparison, the first version of ChatGPT was only released in November 2022.
In this paper, the authors distinguish two kinds of memory:
- parametric memory, which is inherent to the LLM itself: what it learned while being trained on vast amounts of text,
- non-parametric memory, which is the knowledge you can provide by feeding context into the prompt.