I and other LibraryThing developers have done a lot of this work in the
process of making Talpa.ai, so here's my quick take:
1. Fine-tuning is a poor way to add knowledge to an LLM, especially at
scale. It's mostly useful for controlling how LLM "thinking" is
presented, for example by ensuring clean, standardized output. It can also
help reduce how many input tokens you need and speed up the results.
This is our experience; yours may be different. But it's at least
a common view. (See
https://www.reddit.com/r/LocalLLaMA/comments/16q13lm/this_research_may_explain_why_finetuning_doesnt/
.)
2. RAG is more likely to get you results. It's good for validation and for
cases where the model has no clue about some facts. So, for example, if you
want to use proprietary content to answer a query, you can use a vectorized
search to find content, then feed it to an LLM (which is all RAG is) and see what
happens. You can fine-tune the model you use for RAG to ensure the output
is clean and standard. RAG can be cheap, but it tends to involve making
very long prompts, so if you're using a commercial service, you'll want to
think about the cost of input tokens. Although cheaper than output tokens,
they add up fast!
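
A minimal sketch of the retrieve-then-prompt flow described above. Everything in it is an assumption for illustration: the sample documents are made up, the "vectors" are plain word counts standing in for a real embedding model, and the final call to an LLM is left out, so the sketch stops at the prompt you would send.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for your proprietary content.
DOCUMENTS = [
    "The reading room is open Monday through Friday from 9 to 5.",
    "Interlibrary loan requests usually take about two weeks.",
    "Special collections materials must be requested a day ahead.",
]


def vectorize(text):
    """Toy 'embedding': a bag-of-words count vector (assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = vectorize(query)
    ranked = sorted(documents, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, passages):
    """Assemble the (potentially long) prompt that would be sent to the LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the passages below.\n"
        f"Passages:\n{context}\n"
        f"Question: {query}\n"
    )


query = "How long do interlibrary loan requests take?"
passages = retrieve(query, DOCUMENTS)
prompt = build_prompt(query, passages)
print(prompt)
# Word count is only a rough proxy for input tokens, but it shows how the
# prompt grows with every passage you retrieve.
print(len(prompt.split()), "words in the prompt")
```

Raising k puts more context in front of the model, but it also lengthens every prompt, which is where the input-token cost comes from.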
Anyway, RAG is probably what you want, but the way people throw around RAG
now you'd think it was some fantastic new idea that transcends the
limitations of LLMs. It's really not. RAG is just giving LLMs some of
what you want them to think about, and hoping they think through it well.
You still need to feed it the right data, and just because you give it
something to think about doesn't mean it will think through it well. If
LLMs are "unlimited, free stupid people," then with RAG they are in effect
"unlimited, free stupid people in possession of the text I found."
You can find a deeper critique of RAG by Gary Marcus here:
https://garymarcus.substack.com/p/no-rag-is-probably-not-going-to-rescue
I'm eager to hear how things go!
I would, of course, be grateful for any feedback on Talpa
(https://www.talpa.ai), which is in active development with a new version
due any day now. It also uses a third technique, which probably has a name.
That technique is using LLMs not for their knowledge or for RAG, but to
parse user queries in such a way that they can be answered by library data
systems, not LLMs. LLMs can parse language incorrectly, but language is
their greatest strength and, unlike facts and interpretations, seldom
involves hallucinations. Then we use real, authoritative library and book
data, which has no hallucination problem.
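
A rough sketch of that general idea, not of how Talpa itself is built. The catalog records, the field names, and the llm_parse function are all hypothetical; llm_parse returns a canned JSON string where a real system would call an LLM. The point is the division of labor: the model only turns language into a structured query, and the answer comes from the data.

```python
import json

# Hypothetical catalog records standing in for authoritative library data.
CATALOG = [
    {"title": "Dune", "author": "Frank Herbert", "year": 1965,
     "subjects": ["science fiction", "deserts"]},
    {"title": "The Left Hand of Darkness", "author": "Ursula K. Le Guin",
     "year": 1969, "subjects": ["science fiction", "gender"]},
    {"title": "Desert Solitaire", "author": "Edward Abbey", "year": 1968,
     "subjects": ["deserts", "memoir"]},
]

# The only fields the data system accepts (an assumption for this sketch).
ALLOWED_FIELDS = {"subject", "author", "year_from", "year_to"}


def llm_parse(user_query):
    """Hypothetical stand-in for an LLM call that returns a JSON query."""
    return '{"subject": "science fiction", "year_from": 1960, "year_to": 1969}'


def validate(raw):
    """Reject anything that is not a JSON object using only known fields."""
    parsed = json.loads(raw)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object")
    unknown = set(parsed) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return parsed


def search(catalog, q):
    """Answer the structured query from catalog data alone."""
    titles = []
    for record in catalog:
        if "subject" in q and q["subject"] not in record["subjects"]:
            continue
        if "author" in q and q["author"].lower() not in record["author"].lower():
            continue
        if "year_from" in q and record["year"] < q["year_from"]:
            continue
        if "year_to" in q and record["year"] > q["year_to"]:
            continue
        titles.append(record["title"])
    return titles


structured = validate(llm_parse("science fiction novels from the 1960s"))
titles = search(CATALOG, structured)
print(titles)
```

Because the model's output is validated against a fixed set of fields before it touches the data, a bad parse shows up as an error or an empty result rather than as an invented book.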
Best,
Tim
On Mon, Feb 26, 2024 at 4:07 PM Eric Lease Morgan wrote:
> Who out here in Code4Lib Land is practicing with either one or both of the
> following things: 1) fine-tuning large-language models, or 2)
> retrieval-augmented generation (RAG). If there is somebody out there, then
> I'd love to chat.
>
> When it comes to generative AI -- things like ChatGPT -- one of the first
> things us librarians say is, "I don't know how I can trust those results
> because I don't know from whence the content originated." Thus, if we were
> to create our own model, then we can trust the results. Right? Well, almost.
> The things of ChatGPT are "large language models" and the creation of such
> things is very expensive. They require more content than we have, more
> computing horsepower than we are willing to buy, and more computing
> expertise than we are willing to hire. On the other hand there is a process
> called "fine-tuning", where one's own content is used to supplement an
> existing large-language model, and in the end the model knows about one's
> own content. I plan to experiment with this process; I plan to fine-tune an
> existing large-language model and experiment with its use.
>
> Another approach to generative AI is called RAG -- retrieval-augmented
> generation. In this scenario, one's content is first indexed using any
> number of different techniques. Next, given a query, the index is searched
> for matching documents. Third, the matching documents are given as input to
> the large-language model, and the model uses the documents to structure the
> result -- a simple sentence, a paragraph, a few paragraphs, an outline, or
> some sort of structured data (CSV, JSON, etc.). In any case, only the
> content given to the model is used for analysis, and the model's primary
> purpose is to structure the result. Compared to fine-tuning, RAG is
> computationally dirt cheap. Like fine-tuning, I plan to experiment with RAG.
>
> To the best of my recollection, I have not seen very much discussion on
> this list about the technological aspects of fine-tuning or RAG. If you
> are working with these technologies, then I'd love to hear from you. Let's share
> war stories.
>
> --
> Eric Morgan
> Navari Family Center for Digital Scholarship
> University of Notre Dame
>
--
Check out my library at https://www.librarything.com/profile/timspalding