Retrievers in LLM

Summary

Retrievers in large language models (LLMs) search for external information and supply it to the model in order to improve its performance. Techniques such as retrieval-augmented generation (RAG) let an LLM retrieve external information and integrate it into the input context before making its final prediction. In this process it is important to resolve the preference gap between the retriever and the LLM. Approaches include fine-tuning the retriever and the LLM together, or fine-tuning only the LLM.
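
The retrieve-then-integrate flow described above can be sketched in a few lines. This is a minimal illustration, not a real RAG stack: the scoring function is a toy word-overlap count standing in for an actual retriever, and `CORPUS`, `retrieve`, and `build_prompt` are hypothetical names introduced only for this example. No LLM is called; the sketch stops at the prompt that would be sent to one.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, passage: str) -> int:
    """Toy relevance score: number of query tokens that also occur in the passage."""
    shared = Counter(tokenize(query)) & Counter(tokenize(passage))
    return sum(shared.values())


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k passages with the highest overlap score (assumed stand-in for a retriever)."""
    ranked = sorted(corpus, key=lambda passage: score(query, passage), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Integrate the retrieved passages into the input context given to the LLM."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical external knowledge source.
CORPUS = [
    "A retriever searches external documents and passes them to the LLM.",
    "Fine-tuning updates model weights on task-specific data.",
    "Bananas are rich in potassium.",
]

query = "What does a retriever pass to the LLM?"
top_passages = retrieve(query, CORPUS, k=2)
prompt = build_prompt(query, top_passages)
print(prompt)
```

In a real system the overlap score would typically be replaced by a sparse or dense retriever, and `prompt` would be passed to the LLM for the final prediction.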

Key Concepts

  • Retrieval-augmented Generation (RAG): a technique in which the LLM retrieves external information and integrates it into the input context before making its final prediction.

  • Retriever: the module that searches for external information and supplies it to the LLM.

  • Fine-tuning: training the retriever and the LLM, jointly or separately, to resolve the preference gap.

  • Preference Gap: the difference between the preferences of the retriever and those of the LLM. Resolving it is important.
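
One simple way to make the preference gap concrete is to compare how the retriever and the LLM rank the same passages. The sketch below is an assumption made for illustration, not the method of any referenced paper: the passage IDs and all scores are made-up numbers, `llm_utility` is a hypothetical per-passage measure of how much each passage helps the LLM, and top-k overlap is just one possible disagreement measure.

```python
def top_k(scores: dict[str, float], k: int) -> list[str]:
    """Return the k passage IDs with the highest scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def top_k_overlap(retriever_scores: dict[str, float],
                  llm_scores: dict[str, float],
                  k: int) -> float:
    """Fraction of the retriever's top-k passages that are also in the LLM's top-k."""
    retriever_top = set(top_k(retriever_scores, k))
    llm_top = set(top_k(llm_scores, k))
    return len(retriever_top & llm_top) / k


# Illustrative, made-up scores for four passages.
retriever_scores = {"p1": 0.92, "p2": 0.85, "p3": 0.40, "p4": 0.10}
llm_utility = {"p1": 0.20, "p2": 0.90, "p3": 0.75, "p4": 0.05}

overlap = top_k_overlap(retriever_scores, llm_utility, k=2)
print(f"top-2 overlap: {overlap:.2f}")  # prints "top-2 overlap: 0.50"
```

An overlap below 1.0 means the retriever puts passages at the top that the LLM finds less useful. That mismatch is what joint or LLM-only fine-tuning is meant to reduce.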

References

  • Bridging the Preference Gap between Retrievers and LLMs: https://arxiv.org/html/2401.06954v1

  • Langchain: How to view the context my retriever used when invoke: https://stackoverflow.com/questions/78322637/langchain-how-to-view-the-context-my-retriever-used-when-invoke

  • Hi can we have multiple retrievers in the retrievalQA chain?: https://github.com/langchain-ai/langchain/discussions/16898

  • Neural Retrievers are Biased Towards LLM-Generated Content: https://arxiv.org/abs/2310.20501

  • How to include metadata of retrieved content in the Output of retriever: https://www.reddit.com/r/LangChain/comments/1b1k4p7/how_to_include_metadata_of_retrieved_content_in/