Commit 6c0a09d

Chatbot Image updates (#1570)
1 parent ba4b3a7 commit 6c0a09d

File tree

8 files changed: +876 additions, -4 deletions

pgml-cms/docs/.gitbook/assets/Chatbots_Flow-Diagram.svg

Lines changed: 281 additions & 0 deletions

pgml-cms/docs/.gitbook/assets/Chatbots_King-Diagram.svg

Lines changed: 78 additions & 0 deletions

pgml-cms/docs/.gitbook/assets/Chatbots_Limitations-Diagram.svg

Lines changed: 275 additions & 0 deletions

pgml-cms/docs/.gitbook/assets/Chatbots_Tokens-Diagram.svg

Lines changed: 238 additions & 0 deletions
Binary file removed (-107 KB), contents not shown.
Binary file removed (-14.7 KB), contents not shown.
Binary file removed, contents not shown.

pgml-cms/docs/guides/chatbots/README.md

Lines changed: 4 additions & 4 deletions
@@ -30,7 +30,7 @@ Here is an example flowing from:
 
 text -> tokens -> LLM -> probability distribution -> predicted token -> text
 
-<figure><img src="https://files.gitbook.com/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FrvfCoPdoQeoovZiqNG90%2Fuploads%2FPzJzmVS3uNhbvseiJbgi%2FScreenshot%20from%202023-12-13%2013-19-33.png?alt=media&#x26;token=11d57b2a-6aa3-4374-b26c-afc6f531d2f3" alt=""><figcaption><p>The flow of inputs through an LLM. In this case the inputs are "What is Baldur's Gate 3?" and the output token "14" maps to the word "I"</p></figcaption></figure>
+<figure><img src="../../.gitbook/assets/Chatbots_Limitations-Diagram.svg" alt=""><figcaption><p>The flow of inputs through an LLM. In this case the inputs are "What is Baldur's Gate 3?" and the output token "14" maps to the word "I"</p></figcaption></figure>
 
 {% hint style="info" %}
 We have simplified the tokenization process. Words do not always map directly to tokens. For instance, the word "Baldur's" may actually map to multiple tokens. For more information on tokenization checkout [HuggingFace's summary](https://huggingface.co/docs/transformers/tokenizer\_summary).
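
The flow this hunk describes can be sketched concretely in a few lines. This is a minimal sketch assuming HuggingFace's transformers library and torch; the model choice (gpt2) is an arbitrary small stand-in for illustration, not the model the guide uses:

```python
# Minimal sketch of: text -> tokens -> LLM -> probability distribution
# -> predicted token -> text. Assumes the transformers and torch libraries;
# "gpt2" is an arbitrary small model chosen only for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# text -> tokens. Note that one word can become several tokens.
inputs = tokenizer("What is Baldur's Gate 3?", return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]))

# tokens -> LLM -> probability distribution over the entire vocabulary.
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]
probabilities = torch.softmax(logits, dim=-1)

# probability distribution -> predicted token -> text.
predicted_token_id = int(torch.argmax(probabilities))
print(predicted_token_id, tokenizer.decode([predicted_token_id]))
```

Taking the argmax is the simplest possible decoding strategy; chatbots in practice usually sample from the distribution instead.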
@@ -108,11 +108,11 @@ What does an `embedding` look like? `Embeddings` are just vectors (for our use c
 embedding_1 = embed("King") # embed returns something like [0.11, -0.32, 0.46, ...]
 ```
 
-<figure><img src="../../.gitbook/assets/embedding_king.png" alt=""><figcaption><p>The flow of word -> token -> embedding</p></figcaption></figure>
+<figure><img src="../../.gitbook/assets/Chatbots_King-Diagram.svg" alt=""><figcaption><p>The flow of word -> token -> embedding</p></figcaption></figure>
 
 `Embeddings` aren't limited to words, we have models that can embed entire sentences.
 
-<figure><img src="../../.gitbook/assets/embeddings_tokens.png" alt=""><figcaption><p>The flow of sentence -> tokens -> embedding</p></figcaption></figure>
+<figure><img src="../../.gitbook/assets/Chatbots_Tokens-Diagram.svg" alt=""><figcaption><p>The flow of sentence -> tokens -> embedding</p></figcaption></figure>
 
 Why do we care about `embeddings`? `Embeddings` have a very interesting property. Words and sentences that have close [semantic similarity](https://en.wikipedia.org/wiki/Semantic\_similarity) sit closer to one another in vector space than words and sentences that do not have close semantic similarity.
 
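One common way to measure "closeness" in vector space is cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration; real embeddings have hundreds or thousands of dimensions:

```python
# Cosine similarity: 1.0 means the vectors point the same way, 0 means
# unrelated, negative means opposed. The vectors here are invented toy
# values, not the output of any real embedding model.
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

embedding_king = [0.11, -0.32, 0.46]   # stand-in for embed("King")
embedding_queen = [0.12, -0.29, 0.50]  # stand-in for embed("Queen")
embedding_car = [-0.72, 0.91, 0.01]    # stand-in for embed("Car")

# Semantically related words should score much closer to 1.0.
print(cosine_similarity(embedding_king, embedding_queen))  # ~0.997
print(cosine_similarity(embedding_king, embedding_car))    # ~-0.55
```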
@@ -157,7 +157,7 @@ print(context)
 
 There is a lot going on with this, let's check out this diagram and step through it.
 
-<figure><img src="../../.gitbook/assets/chatbot_flow.png" alt=""><figcaption><p>The flow of taking a document, splitting it into chunks, embedding those chunks, and then retrieving a chunk based off of a users query</p></figcaption></figure>
+<figure><img src="../../.gitbook/assets/Chatbots_Flow-Diagram.svg" alt=""><figcaption><p>The flow of taking a document, splitting it into chunks, embedding those chunks, and then retrieving a chunk based off of a users query</p></figcaption></figure>
 
 Step 1: We take the document and split it into chunks. Chunks are typically a paragraph or two in size. There are many ways to split documents into chunks, for more information check out [this guide](https://www.pinecone.io/learn/chunking-strategies/).
 
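The diagram's whole chunk -> embed -> retrieve flow can be approximated in a short script. This is a sketch, not the guide's implementation: the blank-line chunking, the sentence-transformers model, and document.txt are all hypothetical stand-ins:

```python
# Sketch of: document -> chunks -> embeddings -> retrieved chunk for a query.
# Assumptions: sentence-transformers as the embedding model and splitting on
# blank lines as the chunking strategy; the guide prescribes neither.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # arbitrary model choice

def embed(text):
    return model.encode(text)

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Step 1: split the document into chunks (here, one chunk per paragraph).
document = open("document.txt").read()  # hypothetical input file
chunks = [c.strip() for c in document.split("\n\n") if c.strip()]

# Step 2: embed every chunk ahead of time.
chunk_embeddings = [embed(chunk) for chunk in chunks]

# Step 3: embed the user's query and return the most similar chunk.
def retrieve(query):
    query_embedding = embed(query)
    scores = [cosine_similarity(query_embedding, e) for e in chunk_embeddings]
    return chunks[int(np.argmax(scores))]

print(retrieve("What is Baldur's Gate 3?"))
```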