From 799a43318d79ba1e54f4fc48bb4feaf19be12d1b Mon Sep 17 00:00:00 2001 From: aplchiangh Date: Wed, 5 Jul 2023 12:51:06 -0400 Subject: [PATCH 1/5] startig js sdk documentation Update index.js update readme update readme update readme add usage links add example repo add full example add more add env update readme add vector search res remove examples add info to example revert test changes fix caps --- pgml-sdks/rust/pgml/javascript/README.md | 566 ++++++++++++++++++ .../rust/pgml/javascript/examples/README.md | 7 + .../examples/getting-started/README.md | 12 + .../examples/getting-started/index.js | 47 ++ .../getting-started/package-lock.json | 18 + .../examples/getting-started/package.json | 15 + .../javascript/tests/javascript-tests/test.js | 2 +- 7 files changed, 666 insertions(+), 1 deletion(-) create mode 100644 pgml-sdks/rust/pgml/javascript/examples/README.md create mode 100644 pgml-sdks/rust/pgml/javascript/examples/getting-started/README.md create mode 100644 pgml-sdks/rust/pgml/javascript/examples/getting-started/index.js create mode 100644 pgml-sdks/rust/pgml/javascript/examples/getting-started/package-lock.json create mode 100644 pgml-sdks/rust/pgml/javascript/examples/getting-started/package.json diff --git a/pgml-sdks/rust/pgml/javascript/README.md b/pgml-sdks/rust/pgml/javascript/README.md index 3446b8ca3..d5ad8f79f 100644 --- a/pgml-sdks/rust/pgml/javascript/README.md +++ b/pgml-sdks/rust/pgml/javascript/README.md @@ -1 +1,567 @@ # Open Source Alternative for Building End-to-End Vector Search Applications without OpenAI & Pinecone + +## Table of Contents + +- [Overview](#overview) +- [Quickstart](#quickstart) +- [Usage](#usage) + - [Create or Get a Collection](#create-or-get-a-collection) + - [Upsert Documents](#upsert-documents) + - [Generate Chunks](#generate-chunks) + - [Generate Embeddings](#generate-embeddings) + - [Vector Search](#vector-search) + - [Register Model](#register-model) + - [Register Text Splitter](#register-text-splitter) +- 
[Developer setup](#developer-setup)
+- [API Reference](#api-reference)
+- [Roadmap](#roadmap)
+
+## Overview
+
+The pgml SDK is designed to facilitate the development of scalable vector search applications on PostgreSQL databases. With this SDK, you can seamlessly manage various database tables related to documents, text chunks, text splitters, LLM (Large Language Model) models, and embeddings. By leveraging the SDK's capabilities, you can efficiently index LLM embeddings using PgVector for fast and accurate queries.
+
+### Key Features
+
+- **Automated Database Management**: With the SDK, you can easily handle the management of database tables related to documents, text chunks, text splitters, LLM models, and embeddings. This automated management simplifies the process of setting up and maintaining your vector search application's data structure.
+
+- **Embedding Generation from Open Source Models**: The Javascript SDK provides the ability to generate embeddings using hundreds of open source models. These models, trained on vast amounts of data, capture the semantic meaning of text and enable powerful analysis and search capabilities.
+
+- **Flexible and Scalable Vector Search**: The Javascript SDK empowers you to build flexible and scalable vector search applications. It seamlessly integrates with PgVector, a PostgreSQL extension specifically designed for handling vector-based indexing and querying. By leveraging these indices, you can perform advanced searches, rank results by relevance, and retrieve accurate and meaningful information from your database.
+
+### Use Cases
+
+Embeddings, the core concept of the pgml SDK, find applications in various scenarios, including:
+
+- Search: Embeddings are commonly used for search functionalities, where results are ranked by relevance to a query string. By comparing the embeddings of query strings and documents, you can retrieve search results in order of their similarity or relevance.
+ +- Clustering: With embeddings, you can group text strings by similarity, enabling clustering of related data. By measuring the similarity between embeddings, you can identify clusters or groups of text strings that share common characteristics. + +- Recommendations: Embeddings play a crucial role in recommendation systems. By identifying items with related text strings based on their embeddings, you can provide personalized recommendations to users. + +- Anomaly Detection: Anomaly detection involves identifying outliers or anomalies that have little relatedness to the rest of the data. Embeddings can aid in this process by quantifying the similarity between text strings and flagging outliers. + +- Classification: Embeddings are utilized in classification tasks, where text strings are classified based on their most similar label. By comparing the embeddings of text strings and labels, you can classify new text strings into predefined categories. + +### How the SDK Works + +The SDK streamlines the development of vector search applications by abstracting away the complexities of database management and indexing. Here's an overview of how the SDK works: + +- **Document and Text Chunk Management**: The SDK provides a convenient interface to create, update, and delete documents and their corresponding text chunks. You can easily organize and structure your text data within the PostgreSQL database. + +- **Open Source Model Integration**: With the SDK, you can seamlessly incorporate a wide range of open source models to generate high-quality embeddings. These models capture the semantic meaning of text and enable powerful analysis and search capabilities. + +- **Embedding Indexing**: The Javascript SDK utilizes the PgVector extension to efficiently index the embeddings generated by the open source models. This indexing process optimizes search performance and allows for fast and accurate retrieval of relevant results. 
+ +- **Querying and Search**: Once the embeddings are indexed, you can perform vector-based searches on the documents and text chunks stored in the PostgreSQL database. The SDK provides intuitive methods for executing queries and retrieving search results. + +## Quickstart + +Follow the steps below to quickly get started with the Javascript SDK for building scalable vector search applications on PostgresML databases. + +### Prerequisites + +Before you begin, make sure you have the following: + +- PostgresML Database: You can [sign up for a free GPU-powered database](https://postgresml.org/signup) or [spin up a database using Docker](https://github.com/postgresml/postgresml#installation). Ensure you have a PostgresML database version >`2.3.1`. Set the `PGML_CONNECTION` environment variable to the connection string of your PostgresML database. If not set, the SDK will use the default connection string for your local installation `postgres://postgres@127.0.0.1:5433/pgml_development`. If you are running Postgres locally then make sure: + + - Python version >=3.8.1 + + - `postgresql` command line utility + - Ubuntu: `sudo apt install libpq-dev` + - Centos/Fedora/Cygwin/Babun: `sudo yum install libpq-devel` + - Mac: `brew install postgresql` + +### Installation + +To install the Javascript SDK: + +```bash +npm install pgml +``` + +or + +```bash +yarn add pgml +``` + +### Example Usage + +In the example below we will step through the code required to create a collection, upsert documents, generate chunks, generate embeddings, and perform vector search. 
+
+#### Initialize project
+
+Run the following command to create a new npm project:
+
+```bash
+mkdir pgml_example && cd pgml_example && npm init
+```
+
+Install the required npm packages:
+
+```bash
+npm install pgml dotenv
+```
+
+Create the index.js and .env files:
+
+```bash
+touch index.js .env
+```
+
+Add your Postgres connection string to the .env file:
+
+```bash
+PGML_CONNECTION=postgres://postgres@localhost:5433/pgml_development
+```
+
+#### Create a collection
+
+Add the following code to index.js:
+
+```javascript
+const pgml = require("pgml");
+require("dotenv").config();
+
+const CONNECTION_STRING =
+  process.env.PGML_CONNECTION ||
+  "postgres://postgres@127.0.0.1:5433/pgml_development";
+
+const main = async () => {
+  const db = await pgml.newDatabase(CONNECTION_STRING);
+  const collection_name = "hello_world";
+  const collection = await db.create_or_get_collection(collection_name);
+};
+```
+
+**Explanation:**
+
+- The code imports the pgml SDK and dotenv.
+- It defines the CONNECTION_STRING variable, reading the connection string from the PGML_CONNECTION environment variable and falling back to the default local connection string if it is not set.
+- Inside `main`, an instance of the Database class is created by passing in the connection string. `newDatabase` is awaited inside the async `main` function because top-level `await` is not available in CommonJS modules.
+- The [`create_or_get_collection`](#create-or-get-a-collection) method retrieves the collection named `hello_world` if it exists, or creates a new one.
+
+Continuing within `main()`
+
+```javascript
+const documents = [
+  {
+    name: "Document One",
+    text: "document one contents...",
+  },
+  {
+    name: "Document Two",
+    text: "document two contents...",
+  },
+];
+await collection.upsert_documents(documents);
+await collection.generate_chunks();
+await collection.generate_embeddings();
+```
+
+**Explanation:**
+
+- We define a list of documents with the `name` and `text` fields.
The "text" field contains the string that will be embedded. The other fields will be stored as a 'metadata' object. +- The [`upsert_documents`](#upsert-documents) method is called to insert or update the documents in the collection. +- The [`generate_chunks`](#generate-chunks) method splits the documents into smaller text chunks for efficient indexing and search. +- The [`generate_embeddings`](#generate-embeddings) method generates embeddings for the documents in the collection. + +Continuing within `main()` + +```javascript +const results = await collection.vector_search( + "What are the contents of document one?", + {}, + 1 +); +// convert the results to array of objects +const results = queryResults.map((result) => { + const [similarity, text, metadata] = result; + return { + similarity, + text, + metadata, + }; +}); +await db.archive_collection(collection_name); +return results; +``` + +**Explanation:** + +- The [`vector_search`](#vector-search) method is used to perform a vector-based search on the collection. The first argument, `What are the contents of document one?`, represents the text for which you want to find the most similar results. The second argument is an object that holds the embedding model parameters, and in this case, it is empty. The third argument specifies the number of results to return. +- Next we convert the results to an array of objects. +- Lastly, the `archive_collection` method is called to archive the collection and free up resources in the PostgresML database. + +Call `main` function. 
+
+```javascript
+main().then((results) => {
+  console.log("success!", results);
+});
+```
+
+**Putting it all together**
+
+Assuming you followed along, you should have a file `index.js` that looks like this:
+
+```javascript
+const pgml = require("pgml");
+require("dotenv").config();
+
+const CONNECTION_STRING =
+  process.env.PGML_CONNECTION ||
+  "postgres://postgres@127.0.0.1:5433/pgml_development";
+
+const main = async () => {
+  const db = await pgml.newDatabase(CONNECTION_STRING);
+  const collection_name = "hello_world";
+  const collection = await db.create_or_get_collection(collection_name);
+  const documents = [
+    {
+      name: "Document One",
+      text: "document one contents...",
+    },
+    {
+      name: "Document Two",
+      text: "document two contents...",
+    },
+  ];
+  await collection.upsert_documents(documents);
+  await collection.generate_chunks();
+  await collection.generate_embeddings();
+  const queryResults = await collection.vector_search(
+    "What are the contents of document one?", // query text
+    {}, // embedding model parameters
+    1 // top_k
+  );
+
+  // convert the results to an array of objects
+  const results = queryResults.map((result) => {
+    const [similarity, text, metadata] = result;
+    return {
+      similarity,
+      text,
+      metadata,
+    };
+  });
+
+  await db.archive_collection(collection_name);
+  return results;
+};
+
+main().then((results) => {
+  console.log("Vector search Results: ", results);
+});
+```
+
+Execute the following command:
+
+```bash
+node index.js
+```
+
+You should see the search results printed in the terminal. As you can see, our vector search engine found the right text chunk with the answer we are looking for.
+
+```json
+[
+  {
+    "similarity": 0.917946581193032,
+    "text": "document one contents...",
+    "metadata": { "name": "Document One" }
+  }
+]
+```
+
+## Usage
+
+### High-level Description
+
+The Javascript SDK provides a set of functionalities to build scalable vector search applications on PostgreSQL databases.
It enables users to create a collection, which represents a schema in the database, to store tables for documents, chunks, models, splitters, and embeddings. The Collection class in the SDK handles all operations related to these tables, allowing users to interact with the collection and perform various tasks.
+
+### Connect to Database
+
+`.newDatabase(CONNECTION_STRING)`
+
+This method establishes a connection to the database.
+
+#### Parameters:
+
+- `CONNECTION_STRING` (required): The connection string for the database, in the format `postgres://username@hostname:port/database`.
+
+The method initializes a connection pool to the database and creates a table named `pgml.collections` if it does not already exist.
+
+#### Usage:
+
+```javascript
+const pgml = require("pgml");
+
+const CONNECTION_STRING =
+  process.env.PGML_CONNECTION ||
+  "postgres://postgres@127.0.0.1:5433/pgml_development";
+
+const db = await pgml.newDatabase(CONNECTION_STRING);
+```
+
+### Create or Get a Collection
+
+`.create_or_get_collection(collection_name)`
+
+This method either creates a new collection or retrieves an existing one from a PostgreSQL database.
+
+#### Parameters:
+
+- `collection_name` (required): The name of the collection to be created or retrieved. Must be a string.
+
+If the collection already exists in the database, this method will return that collection. If the collection does not exist, this method will create a new collection with the specified name, along with the associated tables and indices for documents, chunks, models, splitters, and embeddings.
+
+#### Usage:
+
+```javascript
+const collection_name = "test_collection";
+const collection = await db.create_or_get_collection(collection_name);
+```
+
+In the above example, a collection named `test_collection` is either created or retrieved from the database.
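Since this method returns the existing collection whenever the name is already taken, scripts that need a fresh collection on every run (for tests or experiments) can derive a unique name first. The helper below is not part of the pgml SDK; it is a plain JavaScript sketch, and the slug rules are an assumption you can adapt:

```javascript
// Hypothetical helper (not part of the pgml SDK): derive a unique
// collection name so repeated runs do not reuse an existing collection.
const uniqueCollectionName = (base) => {
  // Normalize to lowercase identifier characters, then append a
  // timestamp so each run produces a distinct name.
  const slug = base.toLowerCase().replace(/[^a-z0-9]+/g, "_");
  return `${slug}_${Date.now()}`;
};

console.log(uniqueCollectionName("Test Collection"));
```

You could then call `db.create_or_get_collection(uniqueCollectionName("test_collection"))` to get a distinct collection per run, and pass the same generated name to `db.archive_collection` when you are done.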
+
+Ensure that the name provided is unique if you intend to create a new collection, as this function will return the existing collection if a collection with the same name already exists in the database.
+
+### Upsert Documents
+
+`.upsert_documents(documents)`
+
+This method is used to insert or update documents in a database table based on their ID, text, and any additional fields. All the fields except `id` and `text` will be aggregated and stored in a `metadata` object.
+
+#### Parameters:
+
+- `documents` (required): An array of document objects to be inserted or updated in the database. Each document should be an object containing at least `id` and `text` fields. Any other fields will be stored as metadata.
+  - `id`: A unique identifier for the document. If a document with the same ID already exists in the database, the document will be updated with the new text and metadata.
+  - `text`: The text content of the document.
+  - Other fields (optional): Any other fields in the document will be stored as metadata. The structure of the metadata can vary based on the specific needs of your application.
+
+#### Usage:
+
+```javascript
+let documents = [
+  {
+    id: "1",
+    text: "This is a sample document",
+    author: "John Doe",
+    date: "2023-07-05",
+  },
+  {
+    id: "2",
+    text: "This is another sample document",
+  },
+];
+
+await collection.upsert_documents(documents);
+```
+
+In the above example, two documents are upserted into the database. The first document includes the additional fields `author` and `date`, which will be stored as metadata.
+
+To update a document, simply upsert a document with the same `id` but different `text` or additional fields.
For example:
+
+```javascript
+let updated_document = {
+  id: "1",
+  text: "This is an updated sample document",
+  author: "John Doe",
+  date: "2023-07-05",
+  version: "2.0",
+};
+
+await collection.upsert_documents([updated_document]);
+```
+
+In this case, the document with ID `1` is updated with new text and the additional field `version`. The fields `author`, `date`, and `version` will be stored as metadata.
+
+### Generate Chunks
+
+`.generate_chunks(splitter_id)`
+
+This method is used to generate chunks of text from unchunked documents using a specified text splitter.
+
+#### Parameters:
+
+- `splitter_id` (optional): The ID of the splitter used for segmenting the text into chunks. This parameter is optional. If not specified, it defaults to `1`, which corresponds to the `RecursiveCharacterTextSplitter` with default parameters.
+
+The `splitter_id` should correspond to a splitter that is already registered. If you want to use a different splitter, you need to register it first.
+
+#### Usage:
+
+```javascript
+await collection.generate_chunks(1);
+```
+
+In the above example, `splitter_id` is specified as `1`, which means that the default `RecursiveCharacterTextSplitter` is used to generate chunks.
+
+To use a different splitter, you need to pass its corresponding ID. For example:
+
+```javascript
+await collection.generate_chunks(2);
+```
+
+In this case, the splitter with ID `2` is used. Ensure that this ID corresponds to a registered splitter. If the splitter isn't registered yet, you'll need to do so before using this method. For information on how to register a new splitter, refer to the [`register_text_splitter`](#register-text-splitter) documentation.
+
+### Generate Embeddings
+
+`.generate_embeddings(splitter_id, model_id)`
+
+This method generates embeddings from the chunks of text.
+
+#### Parameters:
+
+- `splitter_id` (optional): The ID of the splitter used for segmenting the text into chunks. This parameter is optional.
If not specified, it defaults to `1`. +- `model_id` (optional): The ID of the model used to generate embeddings. This parameter is optional. If not specified, it defaults to `1`, corresponding to the `intfloat/e5-small` embeddings model. + +Both `splitter_id` and `model_id` should correspond to a splitter and model that are already registered. + +#### Usage: + +```javascript +await collection.generate_embeddings(1, 1); +``` + +In the above example, both `splitter_id` and `model_id` are specified as `1`, which means that the default splitter and `intfloat/e5-small` model are used to generate embeddings. + +To use a different splitter or model, you need to pass their corresponding IDs. For example: + +```javascript +await collection.generate_embeddings(2, 3); +``` + +In this case, the splitter with ID `2` and the model with ID `3` are used. Ensure that these IDs correspond to a registered splitter and model. + +### Vector Search + +`.vector_search(query, top_k, splitter_id, model_id)` + +This method converts the input query into embeddings and searches the embeddings table for the nearest matches. + +#### Parameters: + +- `query` (required): The query text that needs to be converted into embeddings for vector search. +- `top_k` (optional): The number of top matches that should be returned. This parameter is optional. If not specified, it defaults to `10`. +- `splitter_id` (optional): The ID of the splitter used for segmenting the query text into chunks. This parameter is optional. If not specified, it defaults to `1`. +- `model_id` (optional): The ID of the model used to convert query into embeddings. This parameter is optional. If not specified, it defaults to `1`, corresponding to the `intfloat/e5-small` embeddings model. + +Both `splitter_id` and `model_id` should correspond to a splitter and model that are already registered. 
+
+#### Usage:
+
+```javascript
+const queryResults = await collection.vector_search(
+  "Who won 20 Grammy awards?",
+  2, // top_k
+  1, // splitter_id
+  1 // model_id
+);
+```
+
+The `vector_search` method returns an array of results, where each result is an array with the similarity score at index 0, the text at index 1, and the metadata object at index 2.
+
+You can format the results into an array of objects like so:
+
+```javascript
+// convert the results to an array of objects
+const results = queryResults.map((result) => {
+  const [similarity, text, metadata] = result;
+  return {
+    similarity,
+    text,
+    metadata,
+  };
+});
+```
+
+### Register Model
+
+`.register_model(model_name, model_params)`
+
+This method registers a model for use in the collection, creating a record if the model does not already exist.
+
+#### Parameters:
+
+- `model_name` (required): The name of the open source HuggingFace model being registered. This should be a string that represents the model name.
+- `model_params` (optional): An object containing parameters for configuring the model. It can be left empty if no special configuration is needed for the model.
+
+The `model_name` should correspond to a valid HuggingFace model name. The `model_params`, if provided, should be an object whose keys are parameter names and whose values are the corresponding settings.
+
+#### Usage:
+
+```javascript
+const modelId = await collection.register_model("hkunlp/instructor-xl", {
+  instruction: "Represent the Wikipedia document for retrieval: ",
+});
+```
+
+In the above example, the `model_name` is "hkunlp/instructor-xl", and `model_params` is an object that sets the instruction to "Represent the Wikipedia document for retrieval: ".
+
+To register a model without any special parameters, you can simply pass the model name.
For example: + +```javascript +const modelId = await collection.register_model("distilbert-base-uncased"); +``` + +In this case, the model "distilbert-base-uncased" is registered with the default parameters. Make sure that the model name corresponds to a valid HuggingFace model. + +### Register Text Splitter + +`.register_text_splitter(splitter_name, parameters)` + +This method registers a new text splitter in the system. + +#### Parameters: + +- `splitter_name` (required): The name of the splitter. The system currently supports the following splitter names: + + - `"character"` + - `"latex"` + - `"markdown"` + - `"nltk"` + - `"python"` + - `"recursive_character"` + - `"spacy"` + +- `parameters` (required): This is an object that contains the parameters specific to the splitter. For example, if you're using the `"recursive_character"` splitter, the parameters could be: + - `chunk_size`: Specifies the size of each chunk of text. + - `chunk_overlap`: Specifies how much each chunk should overlap with the next. + +The parameters required will depend on the splitter being used. Please refer to the [LangChain documentation](https://python.langchain.com/en/latest/reference/modules/text_splitter.html) for more details. + +#### Usage: + +```javascript +const textSplitterId = await collection.register_text_splitter( + "recursive_character", + { chunk_size: "100", chunk_overlap: "20" } +); +``` + +In the above example, the `recursive_character` splitter is registered with a `chunk_size` of `100` and `chunk_overlap` of `20`. The method returns the ID of the registered splitter, which can be used in other methods like `.generate_embeddings()`. + +Ensure that the `splitter_name` corresponds to one of the registered splitters, and the parameters match the requirements of that specific splitter. + +### Developer Setup + +This Javascript library is generated from our core rust-sdk. Please check [rust-sdk documentation](../../rust/pgml/README.md) for developer setup. 
+ +### API Reference + +- [Database](./docs/pgml/database.md) +- [Collection](./docs/pgml/collection.md) + +### Roadmap + +- Enable filters on document metadata in `vector_search`. [Issue](https://github.com/postgresml/postgresml/issues/663) +- `text_search` functionality on documents using Postgres text search. [Issue](https://github.com/postgresml/postgresml/issues/664) +- `hybrid_search` functionality that does a combination of `vector_search` and `text_search` in an order specified by the user. [Issue](https://github.com/postgresml/postgresml/issues/665) +- Ability to call and manage OpenAI embeddings for comparison purposes. [Issue](https://github.com/postgresml/postgresml/issues/666) +- Save `vector_search` history for downstream monitoring of model performance. [Issue](https://github.com/postgresml/postgresml/issues/667) +- Perform chunking on the DB with multiple langchain splitters. [Issue](https://github.com/postgresml/postgresml/issues/668) diff --git a/pgml-sdks/rust/pgml/javascript/examples/README.md b/pgml-sdks/rust/pgml/javascript/examples/README.md new file mode 100644 index 000000000..ef3119e70 --- /dev/null +++ b/pgml-sdks/rust/pgml/javascript/examples/README.md @@ -0,0 +1,7 @@ +## Javascript Examples + +Here we have a set of examples of different use cases of the pgml javascript SDK. + +## Examples: + +1. [Getting Started](./getting-started/) - Simple npm project that uses the pgml SDK to create a collection, upsert documents into the collection, and run a vector search on the collection. diff --git a/pgml-sdks/rust/pgml/javascript/examples/getting-started/README.md b/pgml-sdks/rust/pgml/javascript/examples/getting-started/README.md new file mode 100644 index 000000000..df955fbf9 --- /dev/null +++ b/pgml-sdks/rust/pgml/javascript/examples/getting-started/README.md @@ -0,0 +1,12 @@ +# Getting Started with the PGML Javascript SDK + +In this example repo you will find a basic script that you can run to get started with the PGML Javascript SDK. 
This script will create a collection, upsert documents into the collection, generate chunks, generate embeddings, and run a vector search on the collection. + +## Steps to run the example + +1. Clone the repo +2. Install dependencies + `npm install` +3. Create a .env file and set `PGML_CONNECTION` to your Postgres connection string +4. Open index.js and check out the code +5. Run the script `node index.js` diff --git a/pgml-sdks/rust/pgml/javascript/examples/getting-started/index.js b/pgml-sdks/rust/pgml/javascript/examples/getting-started/index.js new file mode 100644 index 000000000..199001850 --- /dev/null +++ b/pgml-sdks/rust/pgml/javascript/examples/getting-started/index.js @@ -0,0 +1,47 @@ +const pgml = require("pgml"); +require("dotenv").config(); + +const CONNECTION_STRING = + process.env.PGML_CONNECTION || + "postgres://postgres@127.0.0.1:5433/pgml_development"; + +const main = async () => { + const db = await pgml.newDatabase(CONNECTION_STRING); + const collection_name = "hello_world_2"; + const collection = await db.create_or_get_collection(collection_name); + const documents = [ + { + name: "Document One", + text: "document one contents...", + }, + { + name: "Document Two", + text: "document two contents...", + }, + ]; + await collection.upsert_documents(documents); + await collection.generate_chunks(); + await collection.generate_embeddings(); + const queryResults = await collection.vector_search( + "What are the contents of document one?", // query text + {}, // embedding model parameters + 1 // top_k + ); + + // convert the results to array of objects + const results = queryResults.map((result) => { + const [similarity, text, metadata] = result; + return { + similarity, + text, + metadata, + }; + }); + + await db.archive_collection(collection_name); + return results; +}; + +main().then((results) => { + console.log("Vector search Results: ", results); +}); diff --git a/pgml-sdks/rust/pgml/javascript/examples/getting-started/package-lock.json 
b/pgml-sdks/rust/pgml/javascript/examples/getting-started/package-lock.json new file mode 100644 index 000000000..78645d335 --- /dev/null +++ b/pgml-sdks/rust/pgml/javascript/examples/getting-started/package-lock.json @@ -0,0 +1,18 @@ +{ + "name": "getting-started", + "version": "1.0.0", + "lockfileVersion": 1, + "requires": true, + "dependencies": { + "dotenv": { + "version": "16.3.1", + "resolved": "https://registry.npmjs.org/dotenv/-/dotenv-16.3.1.tgz", + "integrity": "sha512-IPzF4w4/Rd94bA9imS68tZBaYyBWSCE47V1RGuMrB94iyTOIEwRmVL2x/4An+6mETpLrKJ5hQkB8W4kFAadeIQ==" + }, + "pgml": { + "version": "0.1.6", + "resolved": "https://registry.npmjs.org/pgml/-/pgml-0.1.6.tgz", + "integrity": "sha512-gjuEYDPl7TrnsxtL2htXb0NCKQ6FhM9kyRvzwYNhYQSRdnEHVps4Yk3vIL9QoZj1y9niuGW6aXebx4T734TJhQ==" + } + } +} diff --git a/pgml-sdks/rust/pgml/javascript/examples/getting-started/package.json b/pgml-sdks/rust/pgml/javascript/examples/getting-started/package.json new file mode 100644 index 000000000..ee1e375d6 --- /dev/null +++ b/pgml-sdks/rust/pgml/javascript/examples/getting-started/package.json @@ -0,0 +1,15 @@ +{ + "name": "getting-started", + "version": "1.0.0", + "description": "", + "main": "index.js", + "scripts": { + "test": "echo \"Error: no test specified\" && exit 1" + }, + "author": "", + "license": "ISC", + "dependencies": { + "dotenv": "^16.3.1", + "pgml": "^0.1.6" + } +} diff --git a/pgml-sdks/rust/pgml/javascript/tests/javascript-tests/test.js b/pgml-sdks/rust/pgml/javascript/tests/javascript-tests/test.js index 8594e8163..dec834130 100644 --- a/pgml-sdks/rust/pgml/javascript/tests/javascript-tests/test.js +++ b/pgml-sdks/rust/pgml/javascript/tests/javascript-tests/test.js @@ -35,4 +35,4 @@ async function test() { await db.archive_collection(collection_name); } -test().then(() => console.log("\nTests Done!")).catch((err) => console.log(err)); +test().then(() => console.log("\nTests Done!")).catch((err) => console.log(err)); \ No newline at end of file From 
cd29f8ae6d10992b9cdd643a127437d1a0e51284 Mon Sep 17 00:00:00 2001 From: Alex Boquist Date: Wed, 5 Jul 2023 16:13:10 -0400 Subject: [PATCH 2/5] Update README.md --- pgml-sdks/rust/pgml/javascript/examples/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pgml-sdks/rust/pgml/javascript/examples/README.md b/pgml-sdks/rust/pgml/javascript/examples/README.md index ef3119e70..3ee6a0d01 100644 --- a/pgml-sdks/rust/pgml/javascript/examples/README.md +++ b/pgml-sdks/rust/pgml/javascript/examples/README.md @@ -4,4 +4,4 @@ Here we have a set of examples of different use cases of the pgml javascript SDK ## Examples: -1. [Getting Started](./getting-started/) - Simple npm project that uses the pgml SDK to create a collection, upsert documents into the collection, and run a vector search on the collection. +1. [Getting Started](./getting-started/) - Simple project that uses the pgml SDK to create a collection, upsert documents into the collection, and run a vector search on the collection. 
From 68c36dc2b4db2a0b68c8d9fe88ab88cca08a10b7 Mon Sep 17 00:00:00 2001
From: Alex Boquist
Date: Wed, 5 Jul 2023 16:14:37 -0400
Subject: [PATCH 3/5] Update index.js

---
 .../rust/pgml/javascript/examples/getting-started/index.js | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pgml-sdks/rust/pgml/javascript/examples/getting-started/index.js b/pgml-sdks/rust/pgml/javascript/examples/getting-started/index.js
index 199001850..20fe6f3e7 100644
--- a/pgml-sdks/rust/pgml/javascript/examples/getting-started/index.js
+++ b/pgml-sdks/rust/pgml/javascript/examples/getting-started/index.js
@@ -7,7 +7,7 @@ const CONNECTION_STRING =
 
 const main = async () => {
   const db = await pgml.newDatabase(CONNECTION_STRING);
-  const collection_name = "hello_world_2";
+  const collection_name = "hello_world";
   const collection = await db.create_or_get_collection(collection_name);
   const documents = [
     {

From 0d1f712f65c7a2d8629c3a1ac52f7663d160d38b Mon Sep 17 00:00:00 2001
From: aplchiangh
Date: Wed, 5 Jul 2023 16:28:36 -0400
Subject: [PATCH 4/5] remove python/postgres dep lines

---
 pgml-sdks/rust/pgml/javascript/README.md | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/pgml-sdks/rust/pgml/javascript/README.md b/pgml-sdks/rust/pgml/javascript/README.md
index d5ad8f79f..f3ddd918e 100644
--- a/pgml-sdks/rust/pgml/javascript/README.md
+++ b/pgml-sdks/rust/pgml/javascript/README.md
@@ -62,16 +62,7 @@ Follow the steps below to quickly get started with the Javascript SDK for buildi
 
 Before you begin, make sure you have the following:
 
-- PostgresML Database: You can [sign up for a free GPU-powered database](https://postgresml.org/signup) or [spin up a database using Docker](https://github.com/postgresml/postgresml#installation). Ensure you have a PostgresML database version >`2.3.1`. Set the `PGML_CONNECTION` environment variable to the connection string of your PostgresML database. If not set, the SDK will use the default connection string for your local installation `postgres://postgres@127.0.0.1:5433/pgml_development`. If you are running Postgres locally then make sure:
-
-  - Python version >=3.8.1
-
-  - `postgresql` command line utility
-    - Ubuntu: `sudo apt install libpq-dev`
-    - Centos/Fedora/Cygwin/Babun: `sudo yum install libpq-devel`
-    - Mac: `brew install postgresql`
-
-### Installation
+- PostgresML Database: You can [sign up for a free GPU-powered database](https://postgresml.org/signup) or [spin up a database using Docker](https://github.com/postgresml/postgresml#installation). Ensure you have a PostgresML database version >`2.3.1`. Set the `PGML_CONNECTION` environment variable to the connection string of your PostgresML database. If not set, the SDK will use the default connection string for your local installation `postgres://postgres@127.0.0.1:5433/pgml_development`.
 
 To install the Javascript SDK:

From ce7369d3186d32d3a48e7b7ad604cba64ecea12a Mon Sep 17 00:00:00 2001
From: aplchiangh
Date: Wed, 5 Jul 2023 16:34:21 -0400
Subject: [PATCH 5/5] fix param order

---
 pgml-sdks/rust/pgml/javascript/README.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/pgml-sdks/rust/pgml/javascript/README.md b/pgml-sdks/rust/pgml/javascript/README.md
index f3ddd918e..6c4ee83ae 100644
--- a/pgml-sdks/rust/pgml/javascript/README.md
+++ b/pgml-sdks/rust/pgml/javascript/README.md
@@ -402,14 +402,14 @@ In this case, the splitter with ID `2` is used. Ensure that this ID corresponds
 
 ### Generate Embeddings
 
-`.generate_embeddings(splitter_id, model_id)`
+`.generate_embeddings(model_id, splitter_id)`
 
 This method generates embeddings from the chunks of text.
 
 #### Parameters:
 
-- `splitter_id` (optional): The ID of the splitter used for segmenting the text into chunks. This parameter is optional. If not specified, it defaults to `1`.
 - `model_id` (optional): The ID of the model used to generate embeddings. This parameter is optional. If not specified, it defaults to `1`, corresponding to the `intfloat/e5-small` embeddings model.
+- `splitter_id` (optional): The ID of the splitter used for segmenting the text into chunks. This parameter is optional. If not specified, it defaults to `1`.
 
 Both `splitter_id` and `model_id` should correspond to a splitter and model that are already registered.
 
@@ -427,11 +427,11 @@ To use a different splitter or model, you need to pass their corresponding IDs.
 await collection.generate_embeddings(2, 3);
 ```
 
-In this case, the splitter with ID `2` and the model with ID `3` are used. Ensure that these IDs correspond to a registered splitter and model.
+In this case, the splitter with ID `3` and the model with ID `2` are used. Ensure that these IDs correspond to a registered splitter and model.
 
 ### Vector Search
 
-`.vector_search(query, top_k, splitter_id, model_id)`
+`.vector_search(query, top_k, model_id, splitter_id)`
 
 This method converts the input query into embeddings and searches the embeddings
@@ -439,8 +439,8 @@ This method converts the input query into embeddings and searches the embeddings
 
 - `query` (required): The query text that needs to be converted into embeddings for vector search.
 - `top_k` (optional): The number of top matches that should be returned. This parameter is optional. If not specified, it defaults to `10`.
-- `splitter_id` (optional): The ID of the splitter used for segmenting the query text into chunks. This parameter is optional. If not specified, it defaults to `1`.
 - `model_id` (optional): The ID of the model used to convert query into embeddings. This parameter is optional. If not specified, it defaults to `1`, corresponding to the `intfloat/e5-small` embeddings model.
+- `splitter_id` (optional): The ID of the splitter used for segmenting the query text into chunks. This parameter is optional. If not specified, it defaults to `1`.
 
 Both `splitter_id` and `model_id` should correspond to a splitter and model that are already registered.
 
@@ -450,8 +450,8 @@ Both `splitter_id` and `model_id` should correspond to a splitter and model that
 const results = await collection.vector_search(
   "Who won 20 grammy awards?",
   2, // top_k
-  1, // splitter_id
-  1 // model_id
+  1, // model_id
+  1 // splitter_id
 );
 ```
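The `vector_search` call patched above embeds the query and returns the nearest stored chunks ranked by similarity, with `top_k` capping the result count. As a rough, SDK-independent illustration of that ranking step (toy vectors and hypothetical helper names — in practice PgVector performs this inside the database):

```javascript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored embeddings by similarity to a query embedding and
// keep the k closest, mirroring the top_k parameter above.
function topK(query, embeddings, k) {
  return embeddings
    .map((e, i) => ({ index: i, score: cosineSimilarity(query, e.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const query = [1, 0]; // made-up query embedding
const stored = [
  { vector: [0.9, 0.1] },
  { vector: [0, 1] },
  { vector: [0.7, 0.7] },
];
console.log(topK(query, stored, 2).map((r) => r.index)); // [ 0, 2 ]
```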