Commit e889f72

Update embeddings.md to use cosine distance in pgvector example (#716)
1 parent 08fa0cd commit e889f72

1 file changed, 5 additions and 15 deletions

pgml-dashboard/static/docs/guides/transformers/embeddings.md

Lines changed: 5 additions & 15 deletions
@@ -63,28 +63,18 @@ ORDER BY similarity DESC
 LIMIT 50;
 ```
 
-```
-WITH query AS (
-SELECT pgml.embed('sentence-transformers/all-MiniLM-L6-v2', 'Star Wars christmas special is on Disney') AS embedding
-)
-SELECT text, pgml.cosine_similarity(tweet_embeddings_2.embedding, query.embedding) AS similarity
-FROM tweet_embeddings_2, query
-ORDER BY similarity DESC
-LIMIT 50;
-```
 On small datasets (<100k rows), a linear search that compares every row to the query will give sub-second results, which may be fast enough for your use case. For larger datasets, you may want to consider various indexing strategies offered by additional extensions.
 
 - [Cube](https://www.postgresql.org/docs/current/cube.html) is a built-in extension that provides a fast indexing strategy for finding similar vectors. By default it has an arbitrary limit of 100 dimensions, unless Postgres is compiled with a larger size.
 - [PgVector](https://github.com/pgvector/pgvector) supports embeddings up to 2000 dimensions out of the box, and provides a fast indexing strategy for finding similar vectors.
 
 ```
 CREATE EXTENSION vector;
-CREATE TABLE items (text text, embedding vector(384));
-insert into items select text, embedding from tweet_embeddings_2;
+CREATE TABLE items (text TEXT, embedding VECTOR(768));
+INSERT INTO items SELECT text, embedding FROM tweet_embeddings;
+CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops);
 WITH query AS (
-SELECT pgml.embed('sentence-transformers/all-MiniLM-L6-v2', 'Star Wars christmas special is on Disney')::vector AS embedding
+SELECT pgml.embed('distilbert-base-uncased', 'Star Wars christmas special is on Disney')::vector AS embedding
 )
-SELECT * FROM items, query ORDER BY items.embedding <-> query.embedding LIMIT 10;
-
-CREATE INDEX ON tweet_embeddings_2 USING ivfflat (embedding vector_cosine_ops);
+SELECT * FROM items, query ORDER BY items.embedding <=> query.embedding LIMIT 10;
 ```
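
Note on the operator change: in pgvector, `<->` orders by Euclidean (L2) distance and `<=>` orders by cosine distance, and an index built with `vector_cosine_ops` is only usable for `<=>` queries. The two metrics can also rank the same rows differently when embeddings are not normalized. The following is a minimal pure-Python sketch of that difference; the two-dimensional vectors and the item names are made up for illustration and do not come from the commit or from any real model.

```python
import math


def l2_distance(a, b):
    """Euclidean distance, the metric behind pgvector's <-> operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity, the metric behind pgvector's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical toy data: one item points the same way as the query but is
# much longer; the other is close in space but points in another direction.
query = [1.0, 0.0]
items = {
    "same_direction_far": [10.0, 0.0],
    "different_direction_near": [1.0, 1.0],
}

# Rank the items nearest-first under each metric.
by_l2 = sorted(items, key=lambda name: l2_distance(items[name], query))
by_cosine = sorted(items, key=lambda name: cosine_distance(items[name], query))

print("L2 order:    ", by_l2)
print("cosine order:", by_cosine)
```

Under L2 distance the nearby vector ranks first, while under cosine distance the vector with the same direction ranks first. This is one reason to keep the query operator consistent with the operator class used to build the index.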
