- An instance of the Database class is created by passing the connection information.
- The method [`create_or_get_collection`](#create-or-get-a-collection) retrieves the collection named `test_pgml_sdk_1` if it exists; otherwise it creates a new collection.
This creates a new schema in a PostgreSQL database if it does not already exist and creates tables and indices for documents, chunks, models, splitters, and embeddings.
#### Upsert Documents

```python
-collection.upsert_documents(documents)
+await collection.upsert_documents(documents)
```

This method inserts or updates documents in a database table based on their ID, text, and metadata.
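
To make the insert-or-update behaviour and the new `await` call style concrete, here is a minimal sketch that does not touch the SDK or PostgreSQL. `InMemoryCollection` is a hypothetical stand-in, and the document keys `id`, `text`, and `metadata` are assumed from the description above rather than taken from the SDK's schema.

```python
import asyncio
from typing import Any


class InMemoryCollection:
    """Hypothetical stand-in for an SDK collection; keeps rows in a dict."""

    def __init__(self) -> None:
        self.rows: dict[str, dict[str, Any]] = {}

    async def upsert_documents(self, documents: list[dict[str, Any]]) -> None:
        # A new id inserts a row; an existing id overwrites the stored row.
        for doc in documents:
            self.rows[doc["id"]] = {
                "text": doc["text"],
                "metadata": doc.get("metadata", {}),
            }


async def main() -> InMemoryCollection:
    collection = InMemoryCollection()
    await collection.upsert_documents([
        {"id": "a", "text": "first draft"},
        {"id": "b", "text": "another document"},
    ])
    # Same id "a": this call updates the row instead of adding a third one.
    await collection.upsert_documents([
        {"id": "a", "text": "revised", "metadata": {"lang": "en"}},
    ])
    return collection


collection = asyncio.run(main())
print(collection.rows["a"]["text"])
```

Because the methods are coroutines, they have to be awaited inside an `async` function and driven by an event loop, here via `asyncio.run`.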
#### Generate Chunks

```python
-collection.generate_chunks(splitter_id=1)
+await collection.generate_chunks(splitter_id=1)
```

This method generates chunks of text from unchunked documents using a specified text splitter. By default it uses `RecursiveCharacterTextSplitter` with default parameters. `splitter_id` is optional; to use a different splitter, pass the `splitter_id` of a splitter you have registered. See `register_text_splitter` below.
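
The idea behind recursive character splitting can be illustrated with a small pure-Python sketch. This is a simplified stand-in written for this explanation, not LangChain's `RecursiveCharacterTextSplitter`: it has no chunk overlap, and the function name, separator list, and `chunk_size` parameter are assumptions.

```python
def split_recursive(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters.

    Tries the coarsest separator first and falls back to finer ones
    only for pieces that are still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # No separator left: cut at fixed character offsets.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) > chunk_size:
            # The piece alone is too long: retry with finer separators.
            chunks.extend(split_recursive(piece, chunk_size, finer))
            current = ""
        else:
            current = piece
    if current:
        chunks.append(current)
    return chunks


text = "alpha beta gamma delta"
chunks = split_recursive(text, chunk_size=11)
print(chunks)
```

Paragraph and line breaks are preferred as cut points, so chunks tend to follow the structure of the text, and a chunk is only cut mid-word when no separator fits.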
This method generates embeddings from the text chunks. By default it uses the `intfloat/e5-small` embeddings model. `model_id` is optional. You can pass the `model_id` of a model you have registered, along with a `splitter_id`. See `register_model` below.
@@ -227,53 +238,42 @@ This methods generates embeddings uing the chunks from the text. By default it u
#### Vector Search

```python
-results = collection.vector_search("Who won 20 grammy awards?", top_k=2, model_id=1, splitter_id=1)
+results = await collection.vector_search("Who won 20 grammy awards?", top_k=2, model_id=1, splitter_id=1)
```

This method converts the input query into an embedding and searches the embeddings table for the nearest matches. You can change the number of results using `top_k`. You can also pass the specific `splitter_id` and `model_id` that were used for chunking and generating embeddings.
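
As a rough picture of what a nearest-match search with `top_k` does, the sketch below ranks a few hand-made vectors by cosine similarity in plain Python. It is an illustration only: the SDK runs the search inside PostgreSQL, the choice of cosine similarity is an assumption, and the three-dimensional vectors stand in for real model embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_vector_search(query_vec, rows, top_k=2):
    """Return the top_k (score, chunk) pairs, best match first."""
    scored = [(cosine_similarity(query_vec, vec), chunk) for chunk, vec in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical (chunk, embedding) rows; real embeddings have hundreds of dimensions.
rows = [
    ("chunk about grammys", [0.9, 0.1, 0.0]),
    ("chunk about football", [0.0, 0.2, 0.9]),
    ("chunk about music awards", [0.8, 0.3, 0.1]),
]
results = toy_vector_search([1.0, 0.0, 0.0], rows, top_k=2)
for score, chunk in results:
    print(f"{score:.3f}  {chunk}")
```

The query and the stored chunks must be embedded by the same model for the scores to be comparable, which is why `model_id` and `splitter_id` can be passed to the search.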
#### Register Model

```python
-collection.register_model(model_name="hkunlp/instructor-xl", model_params={"instruction": "Represent the Wikipedia document for retrieval: "})
+await collection.register_model(model_name="hkunlp/instructor-xl", model_params={"instruction": "Represent the Wikipedia document for retrieval: "})
```

This function registers a model in the database, creating a record if one does not already exist. `model_name` is the name of the open source HuggingFace model being registered, and `model_params` is a dictionary of parameters for configuring the model. It can be empty if no parameters are needed.
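
The "create a record if it does not already exist" behaviour is a get-or-create pattern. The sketch below shows it with an in-memory registry; `ModelRegistry` is hypothetical, and two details are assumptions not confirmed by the text: that registration returns an integer id, and that a record is identified by the model name together with its parameters.

```python
import json


class ModelRegistry:
    """Hypothetical in-memory registry illustrating get-or-create."""

    def __init__(self):
        self._ids = {}

    def register_model(self, model_name, model_params=None):
        # Serialize params with sorted keys so equal dicts give equal keys.
        key = (model_name, json.dumps(model_params or {}, sort_keys=True))
        if key not in self._ids:
            self._ids[key] = len(self._ids) + 1  # create a new record
        return self._ids[key]  # otherwise reuse the existing record


registry = ModelRegistry()
params = {"instruction": "Represent the Wikipedia document for retrieval: "}
first_id = registry.register_model("hkunlp/instructor-xl", params)
second_id = registry.register_model("hkunlp/instructor-xl", params)
other_id = registry.register_model("intfloat/e5-small")
print(first_id, second_id, other_id)
```

Registering the same model twice is harmless under this pattern, so setup code can call it unconditionally.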
This function allows for the registration of a text splitter in a database, creating a record if it doesn't already exist. `splitter_name` is the name of the splitter from [LangChain](https://python.langchain.com/en/latest/reference/modules/text_splitter.html) and `splitter_params` are chunking parameters that the splitter supports.
-
-
-### Developer Setup
-1. Install Python 3.11. SDK should work for Python >=3.8.
-2. Install poetry `pip install poetry`
-3. Initialize Python environment
-
-```
-poetry env use python3.11
-poetry shell
-poetry install
-poetry build
-```
-4. SDK uses your local PostgresML database by default
If it is not up to date with `pgml.embed`, please [sign up for a free database](https://postgresml.org/signup) and set the `PGML_CONNECTION` environment variable to the serverless hosted database.
+This function allows for the registration of a text splitter in a database, creating a record if it doesn't already exist. The following [LangChain](https://python.langchain.com/en/latest/reference/modules/text_splitter.html) splitters are supported.