MIT MLSys Discussion Group

Improving language models by retrieving from trillions of tokens.

Summary. When generating outputs, Retro searches a large database and retrieves chunks of text that are similar to its input in embedding space. The retrieved text is encoded and then incorporated into the language model's intermediate representations via cross-attention. The result is an increase in the model's effective memory capacity without a significant increase in the number of parameters.
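
To make the two-step mechanism concrete, below is a minimal PyTorch sketch of (1) nearest-neighbor retrieval of database chunks by embedding similarity and (2) a cross-attention block that lets the model's hidden states attend to the encoded neighbors. The names (`retrieve_neighbors`, `RetroCrossAttention`), dimensions, and the use of a single standard multi-head attention layer are illustrative assumptions, not the paper's exact chunked cross-attention architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def retrieve_neighbors(query_emb, db_embs, db_tokens, k=2):
    """Return the k database chunks whose embeddings are most similar
    (by cosine similarity) to the query chunk embedding."""
    sims = F.cosine_similarity(query_emb.unsqueeze(0), db_embs, dim=-1)
    top = sims.topk(k).indices
    return db_tokens[top]  # (k, chunk_len)

class RetroCrossAttention(nn.Module):
    """Cross-attention block: the language model's hidden states attend
    to the encoded representations of the retrieved neighbors."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, hidden, neighbor_enc):
        # hidden: (batch, seq_len, d_model) -- intermediate LM states
        # neighbor_enc: (batch, k * chunk_len, d_model) -- encoded neighbors
        attended, _ = self.attn(hidden, neighbor_enc, neighbor_enc)
        return self.norm(hidden + attended)  # residual connection

# Toy usage with random tensors standing in for real embeddings / encodings.
d_model, chunk_len, k = 64, 8, 2
db_embs = torch.randn(1000, d_model)                     # database chunk embeddings
db_tokens = torch.randint(0, 32000, (1000, chunk_len))   # database chunk tokens
query_emb = torch.randn(d_model)                         # embedding of the input chunk

neighbors = retrieve_neighbors(query_emb, db_embs, db_tokens, k=k)

hidden = torch.randn(1, 16, d_model)                     # decoder hidden states
neighbor_enc = torch.randn(1, k * chunk_len, d_model)    # encoder output for neighbors
out = RetroCrossAttention(d_model)(hidden, neighbor_enc)
print(out.shape)  # torch.Size([1, 16, 64])
```

Because the retrieval database and its encoder sit outside the language model's weights, scaling the database grows the memory the model can draw on while the parameter count stays roughly fixed.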