pgml-dashboard/content/blog/generating-llm-embeddings-with-open-source-models-in-postgresml.md
Lines changed: 7 additions & 5 deletions
@@ -127,8 +127,10 @@ Since our corpus of documents (movie reviews) are all relatively short and simil
 
 It takes a couple of minutes to download and cache the `intfloat/e5-small` model to generate the first embedding. After that, it's pretty fast.
 
+Note how we prefix the text we want to embed with either `passage: ` or `query: `, the e5 model requires us to prefix our data with `passage: ` if we're generating embeddings for our corpus and `query: ` if we want to find semantically similar content.
+
 ```postgresql
-SELECT pgml.embed('intfloat/e5-small', 'hi mom');
+SELECT pgml.embed('intfloat/e5-small', 'passage: hi mom');
 ```
 
 This is a pretty powerful function, because we can pass any arbitrary text to any open source model, and it will generate an embedding for us. We can benchmark how long it takes to generate an embedding for a single review, using client-side timings in Postgres:
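The prefix rule in the added note above is easy to forget when inputs are built in application code. The sketch below routes every `pgml.embed` input through one helper so the role prefix is always applied. It is a minimal illustration under stated assumptions: `e5_input` and `embed_call` are hypothetical names, not part of PostgresML, and the `(sql, params)` pair is shaped for a DB-API driver such as psycopg, whose `%s` placeholders keep the text out of the SQL string. It only prepares inputs; no model is loaded and no database is contacted.

```python
# Hypothetical helpers (not part of PostgresML): they only prepare inputs for
# pgml.embed and never load a model or touch a database.

# Role prefixes the e5 family expects, per the note in the diff above.
E5_PREFIXES = {"passage": "passage: ", "query": "query: "}

# Parameterized statement; %s is the placeholder style used by psycopg.
EMBED_SQL = "SELECT pgml.embed(%s, %s);"


def e5_input(text: str, role: str) -> str:
    """Prefix `text` for e5: 'passage' for corpus documents, 'query' for searches."""
    try:
        prefix = E5_PREFIXES[role]
    except KeyError:
        raise ValueError(f"role must be 'passage' or 'query', got {role!r}") from None
    return prefix + text


def embed_call(
    text: str, role: str, model: str = "intfloat/e5-small"
) -> tuple[str, tuple[str, str]]:
    """Return (sql, params) for cursor.execute(); model and text are bound as data."""
    return EMBED_SQL, (model, e5_input(text, role))


if __name__ == "__main__":
    # Corpus documents get the passage prefix; search text gets the query prefix.
    print(embed_call("hi mom", "passage"))
    print(embed_call("Best 1980's scifi movie", "query"))
```

Raising on an unknown role, rather than silently passing the text through, makes a missing prefix fail loudly instead of quietly degrading search quality.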
@@ -147,7 +149,7 @@ Aside from using this function with strings passed from a client, we can use it
 | Star Wars, Episode V: The Empire Strikes Back (Widescreen Edition) | 78 | 4.44 | 0.88717948717948718000 | 0.8295302273865711 | 0.9999999999999998 | 2.716709714566058 |
 | Star Wars, Episode IV: A New Hope (Widescreen Edition) | 80 | 4.36 | 0.87250000000000000000 | 0.8339361274771777 | 0.9336656923446551 | 2.640101819821833 |
@@ -280,15 +280,15 @@ LIMIT 10;
 
 !!!
 
-Bingo. Now we're boosting movies by `(customer_cosine_similiarity - 0.9) * 10`, and we've kept our previous boost for movies with a high average star rating. Not only does Episode V top the list as expected, Episode IV is a close second. This query has gotten fairly complex! But the results are perfect for me, I mean our hypothetical customer who is searching for "Best 1980's scifi movie" but has already revealed to us with their one movie review that they think like the comment "I love all Star Wars, but Empire Strikes Back is particularly amazing". I promise I'm not just doing all of this to find a new movie to watch tonight.
+Bingo. Now we're boosting movies by `(customer_cosine_similarity - 0.9) * 10`, and we've kept our previous boost for movies with a high average star rating. Not only does Episode V top the list as expected, Episode IV is a close second. This query has gotten fairly complex! But the results are perfect for me, I mean our hypothetical customer who is searching for "Best 1980's scifi movie" but has already revealed to us with their one movie review that they think like the comment "I love all Star Wars, but Empire Strikes Back is particularly amazing". I promise I'm not just doing all of this to find a new movie to watch tonight.
 
 You can compare this to our non-personalized results from the previous article for reference Forbidden Planet used to be the top result, but now it's #3.
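The boost term in the corrected line, `(customer_cosine_similarity - 0.9) * 10`, is plain arithmetic on a cosine similarity. The sketch below restates it in Python on toy vectors (made-up numbers, not real e5 embeddings). `cosine_similarity` and `personalization_boost` are hypothetical names; the 0.9 floor and scale of 10 are taken from the line above. It does not reproduce the query's separate star-rating boost, whose exact form is not shown in this diff.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def personalization_boost(similarity: float, floor: float = 0.9, scale: float = 10.0) -> float:
    """(similarity - floor) * scale: positive above the floor, negative below it."""
    return (similarity - floor) * scale


if __name__ == "__main__":
    # Toy 3-d vectors standing in for embeddings (illustrative only).
    customer_review = [0.2, 0.9, 0.4]
    candidates = {
        "close match": [0.25, 0.85, 0.45],
        "unrelated": [0.9, 0.1, 0.3],
    }
    for name, vec in candidates.items():
        sim = cosine_similarity(customer_review, vec)
        print(f"{name}: similarity={sim:.3f} boost={personalization_boost(sim):+.3f}")
```

Subtracting 0.9 means only similarities above 0.9 raise a movie's score, and multiplying by 10 stretches that 0.1-wide band onto roughly 0 to 1. The term is unbounded below, so a loosely related review (like the "unrelated" toy vector, similarity of about 0.41) can push a movie well down the ranking.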