There’s a race toward language models with longer context windows. But how good are they, and how can we know?
This article was originally published on Art Fish Intelligence.
The context window of large language models (the amount of text they can process at once) has been growing at an exponential rate.
In 2018, language models like BERT, T5, and GPT-1 could take up to 512 tokens as input. Now, in the summer of 2024, this number has jumped to 2 million tokens (in publicly available LLMs). But what does this mean for us, and how do we evaluate these increasingly capable models?
The recently released Gemini 1.5 Pro model can take in up to 2 million tokens. But what does 2 million tokens even mean?
If we estimate that 3 words roughly equal about 4 tokens, it means that 2 million tokens can (almost) fit the entire Harry Potter and Lord of the Rings series.
(The total word count of all seven books in the Harry Potter series is 1,084,625. The total word count of the Lord of the Rings series is 481,103. 1,084,625 + 481,103 = 1,565,728 words in total.)
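For a quick back-of-envelope check, here is a minimal sketch of that arithmetic in Python. The word counts are the figures cited above, and the 3-words-to-4-tokens ratio is only the rough heuristic from earlier, not an exact tokenizer count:

```python
# Back-of-envelope estimate: how many tokens would both series need?
# Word counts are the figures cited above; the 3-words-to-4-tokens ratio
# is only a rough heuristic, not the output of a real tokenizer.
HARRY_POTTER_WORDS = 1_084_625   # all seven Harry Potter books
LOTR_WORDS = 481_103             # the Lord of the Rings series
TOKENS_PER_WORD = 4 / 3          # ~4 tokens for every 3 words

total_words = HARRY_POTTER_WORDS + LOTR_WORDS
estimated_tokens = total_words * TOKENS_PER_WORD

print(f"Total words: {total_words:,}")                # 1,565,728
print(f"Estimated tokens: {estimated_tokens:,.0f}")   # ~2,087,637
```

Under this rough estimate, the two series come out just over the 2 million token mark, which is why they only (almost) fit in the context window.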