I have a setup with Open WebUI running in Docker, Ollama on a dedicated server with 3 GPUs, and MinerU (GPU-accelerated) in the same Docker environment as Open WebUI. Open WebUI is configured to use the 'nomic-embed-text' model for embedding. The Text Splitter is at its defaults: Chunk Size 1000, Chunk Overlap 100.

Step 1. MinerU works correctly: one parsed document (PDF, 143 KB) was uploaded via Open WebUI to a Knowledge collection. I then sent a prompt related to the document and got a good response from the model (mistral-small3.1).

Step 2. I added a second document (PDF, 737 KB) to the same collection, sent a new prompt, and got a wrong (not relevant) response along with an Ollama error:

I increased the model's 'num_ctx' parameter to 26000 and tried again. This time the response looked correct (relevant).

Step 3. I added a third document (PDF, 749 KB) to the same collection, sent a new prompt, and again got a wrong (not relevant) response and an Ollama error:

I increased 'num_ctx' to 54000 and tried again. The response looked correct, but generation was noticeably slower.

My question is: how do I configure this to work with 1000 PDF documents of 500 KB each in one collection? With 10000 PDFs? The mistral-small3.1 model supports a context length of 131072 tokens, and a simple calculation shows I cannot fit everything into it. Where am I going wrong, and what do I need to check or configure? I also tried the 'paraphrase-multilingual' embedding model instead of 'nomic-embed-text', with the same result.
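The scaling worry above can be made concrete with a back-of-the-envelope calculation. The numbers below are assumptions for illustration only (roughly 4 characters per token, and roughly 500,000 characters of extracted text per 500 KB PDF, which is not exact since PDF file size is not text size): prompt size grows with the collection if whole documents are sent, but stays roughly constant if only a few retrieved chunks are sent.

```python
# Rough context-budget estimate: sending whole documents vs. chunked retrieval.
# Assumptions (not from the thread): ~4 characters per token, and ~500,000
# characters of extracted text per 500 KB PDF.

CHARS_PER_TOKEN = 4          # rough average for plain text
MODEL_CTX = 131_072          # mistral-small3.1 maximum context length

def tokens_full_documents(num_docs, chars_per_doc=500_000):
    """Tokens needed if every document's full text goes into the prompt."""
    return num_docs * chars_per_doc // CHARS_PER_TOKEN

def tokens_retrieval(top_k=3, chunk_chars=1000):
    """Tokens needed if only the top-k retrieved chunks (Chunk Size 1000) go in."""
    return top_k * chunk_chars // CHARS_PER_TOKEN

print(tokens_full_documents(3))   # 375000 tokens: 3 docs already exceed 131072
print(tokens_retrieval())         # 750 tokens, independent of collection size
```

Under these assumptions even three 500 KB documents blow past the model's context window, which matches the errors seen after the second and third uploads; retrieval keeps the prompt small no matter how many PDFs are in the collection.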
Don't enable 'Bypass Embedding and Retrieval' or 'Full Context Mode' (in Admin Settings > Documents).
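To see why that setting matters: with those options off, Open WebUI embeds the query and injects only the top-k most similar chunks into the prompt, so prompt size does not grow with the collection. A toy sketch of that retrieval step (illustrative only, not Open WebUI's actual implementation; the "embeddings" here are made-up 2-D vectors):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k_chunks(query_vec, chunk_vecs, chunks, k=3):
    """Return the k chunks whose embeddings are most similar to the query."""
    scored = sorted(zip(chunks, chunk_vecs),
                    key=lambda cv: cosine(query_vec, cv[1]),
                    reverse=True)
    return [chunk for chunk, _ in scored[:k]]

# Toy example: only the closest chunks ever reach the model's context.
chunks = ["about cats", "about dogs", "about cars"]
vecs = [(1.0, 0.1), (0.9, 0.2), (0.0, 1.0)]
print(top_k_chunks((1.0, 0.0), vecs, chunks, k=2))
# → ['about cats', 'about dogs']
```

With bypass/full-context mode on, every document's full text is pushed into the prompt instead, which is why num_ctx had to keep growing as documents were added.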
I found this option and it was turned on. I have turned it off and will review the results shortly.


and in retrieval?