
Not so secret, and also precisely how LangChain [1] and GPT Index (LlamaIndex) [2] got so popular. Here's a quick rundown:

0) You can't easily add new knowledge to current LLMs. Training them from scratch on additional data is impractical, and fine-tuning is better suited to teaching the structure of a language or task than to injecting new facts.

1) To get an external corpus of data into an LLM, you need to fit it into the prompt.

2) Some documents/corpora are too large to fit into a prompt because of token limits.

3) You can retrieve relevant chunks of context by embedding the query and finding the top-k chunk embeddings most similar to it.

4) Stuff as many of those top-k chunks as you can into the prompt and run the query.
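The retrieval steps above can be sketched in a few lines. `embed` here is a stand-in for a real embedding model (in practice an API or model call); this toy version uses bag-of-words counts so the example is self-contained and runnable.

```python
# Sketch of embedding-based retrieval (steps 2-4 above).
from collections import Counter
import math

def embed(text):
    # Toy embedding: word-count vector. A real system would call an
    # embedding model here instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k_chunks(query, chunks, k=2):
    # Rank chunks by similarity to the query embedding, keep the top k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "LLMs have fixed token limits on their context window.",
    "Embeddings map text to vectors for similarity search.",
    "Bananas are a good source of potassium.",
]
best = top_k_chunks("how do embeddings enable similarity search?", chunks)
# Stuff the retrieved chunks into the prompt ahead of the question.
prompt = "Context:\n" + "\n".join(best) + "\n\nQuestion: ..."
```

In a real pipeline, chunk embeddings are computed once and stored in a vector index; only the query is embedded at request time.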

Now, here's where it gets crazier.

1) Imagine you have an LLM with a token limit of 8k tokens.

2) Split the original document or corpus into 4k token chunks.

3) Imagine that the leaf nodes of a "chunk tree" are set to these 4k chunks.

4) Summarize these leaf nodes pair-wise (two at a time) with the LLM to generate their parent nodes. You now have a layer above the leaf nodes.

5) Repeat until you reach a single root node. That node is the result of tree-summarizing your document using LLMs.
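The tree-summarization loop above can be sketched as follows. `summarize_pair` stands in for an LLM call that condenses two chunks into one; this toy version just truncates the concatenation so the example runs without a model.

```python
# Sketch of pairwise tree summarization (steps 2-5 above).
def summarize_pair(a, b, limit=4000):
    # Stand-in for an LLM summarization call: combine two chunks and
    # cap the result so it still fits in the context window.
    return (a + " " + b)[:limit]

def tree_summarize(chunks):
    # Start at the leaf layer and repeatedly build parent layers.
    level = list(chunks)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            if len(pair) == 2:
                nxt.append(summarize_pair(*pair))
            else:
                nxt.append(pair[0])  # odd chunk carries up unchanged
        level = nxt  # one layer closer to the root
    return level[0]  # root node: summary of the whole corpus

root = tree_summarize(["chunk one", "chunk two", "chunk three"])
```

For n leaf chunks this makes roughly n-1 LLM calls, which is where the extra cost relative to the embedding approach comes from.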

This approach makes many more calls to the LLM, with its own tradeoffs and advantages, and it is essentially what LlamaIndex is about. The first approach lets you compute embeddings once and make far fewer LLM calls per query.

[1] https://langchain.readthedocs.io/en/latest/ [2] https://gpt-index.readthedocs.io/en/latest/guides/index_guid...


