pgml-dashboard/static/docs/guides/transformers/embeddings.md
5 additions & 15 deletions
@@ -63,28 +63,18 @@ ORDER BY similarity DESC
 LIMIT 50;
 ```
 
-```
-WITH query AS (
-SELECT pgml.embed('sentence-transformers/all-MiniLM-L6-v2', 'Star Wars christmas special is on Disney') AS embedding
-)
-SELECT text, pgml.cosine_similarity(tweet_embeddings_2.embedding, query.embedding) AS similarity
-FROM tweet_embeddings_2, query
-ORDER BY similarity DESC
-LIMIT 50;
-```
 On small datasets (<100k rows), a linear search that compares every row to the query will give sub-second results, which may be fast enough for your use case. For larger datasets, you may want to consider various indexing strategies offered by additional extensions.
 
 - [Cube](https://www.postgresql.org/docs/current/cube.html) is a built-in extension that provides a fast indexing strategy for finding similar vectors. By default it has an arbitrary limit of 100 dimensions, unless Postgres is compiled with a larger size.
 - [PgVector](https://github.com/pgvector/pgvector) supports embeddings up to 2000 dimensions out of the box, and provides a fast indexing strategy for finding similar vectors.
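
To make the indexing option above concrete, here is a minimal sketch of an indexed similarity search with PgVector. It is an illustration under stated assumptions, not the guide's own method: it assumes the pgvector extension is installed, that a table named `tweet_embeddings` (hypothetical) has a `text` column and a `REAL[]` column named `embedding`, and that `pgml.embed` returns a `REAL[]` that can be cast to `vector`. The table name `tweet_vectors` and the `lists = 100` setting are also placeholders.

```sql
-- Assumption: pgvector is installed on this server.
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical table: copy the array embeddings into a pgvector column.
-- all-MiniLM-L6-v2 produces 384-dimensional embeddings.
CREATE TABLE tweet_vectors AS
SELECT text, embedding::vector(384) AS embedding
FROM tweet_embeddings;

-- Approximate nearest-neighbor index over cosine distance.
-- lists = 100 is an illustrative starting point, not a tuned value.
CREATE INDEX ON tweet_vectors
USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

-- <=> is pgvector's cosine distance operator (1 - cosine similarity),
-- so ascending order returns the most similar rows first.
WITH query AS (
    SELECT pgml.embed('sentence-transformers/all-MiniLM-L6-v2', 'Star Wars christmas special is on Disney')::vector(384) AS embedding
)
SELECT text
FROM tweet_vectors
ORDER BY embedding <=> (SELECT embedding FROM query)
LIMIT 50;
```

The query vector is passed as a scalar subquery so the `ORDER BY` has the form `column <=> value`, which is the shape the index can serve. Whether the planner actually chooses the index depends on table size and settings, so it is worth confirming with `EXPLAIN`.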