On May 16, 2025, at 9:23 AM, Eric Lease Morgan <[log in to unmask]> wrote:
> What distance measure do you suggest I use when implementing vector similarity search?
>
> I have piles o' sentences. Almost more than I can count, literally. I have successfully looped through subsets of these sentences, vectorized them (think "indexed"), and stored the result in a Postgres database through the use of an extension called pgvector...
>
> [1] https://github.com/pgvector/pgvector
> [2] https://medium.com/advanced-deep-learning/understanding-vector-similarity-b9c10f7506de
I have finished my investigations into vectorizing sentences, saving the results to a Postgres database, and querying the results. But alas, the linked suite [1], while very functional, is incomplete and poorly described, because I subsequently learned how to do all of the same things and more with SQLite and an SQLite module called sqlite_vec. [2, 3]
That said, if your computing stack requires Postgres, then the linked zip file [1] may speed up your investigations.
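
For anyone weighing the distance-measure question quoted above, here is a minimal sketch in plain Python of the three measures that pgvector exposes as operators (`<->` for L2 distance, `<=>` for cosine distance, `<#>` for negative inner product). It is not taken from the zip file; the function names and the toy vectors are made up for illustration, and a real database would compute these for you.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Dot product; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus cosine similarity; ignores vector length."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


def rank(query, vectors, distance=cosine_distance):
    """Return the indexes of vectors, nearest to the query first."""
    return sorted(range(len(vectors)), key=lambda i: distance(query, vectors[i]))


if __name__ == "__main__":
    # hypothetical two-dimensional "sentence vectors"
    vectors = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    print(rank([1.0, 0.0], vectors))
```

Cosine distance is a common starting point for sentence embeddings because it compares direction rather than magnitude. If the vectors are already normalized to unit length, all three measures produce the same ranking.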
[1] temporarily available suite of Python scripts - https://distantreader.org/tmp/vectors2postgres.zip
[2] sqlite_vec home - https://github.com/asg017/sqlite-vec
[3] sqlite_vec documentation - https://alexgarcia.xyz/sqlite-vec/installation.html
--
Eric Morgan
University of Notre Dame