Introduction
In this tutorial, we'll walk you through using the @polyfact/vectorizer package to vectorize code repositories and PDFs. This will help you convert your textual data into vector representations that can be used for various machine learning and data analysis tasks.
Table of Contents
- Stack
- Step 1: Installation
- Step 2: Usage as a Library
- Step 2.1: Usage via Command Line Interface
- Conclusion
- Other resources
Stack
To use the PolyFact vectorizer, you only need a terminal with Node.js and npm installed, plus your preferred work environment. Choose a document or a repository you want to vectorize.
Step 1: Installation
To get started, you need to install the @polyfact/vectorizer package. You can do this using the Node Package Manager (npm):
npm install @polyfact/vectorizer
If you want to use the CLI globally, you can install it like this:
npm install -g @polyfact/vectorizer
Step 2: Usage as a Library
Importing the Library
First, let's import the @polyfact/vectorizer library and set up the vectorizer:
import Vectorizer, { SourceType } from "@polyfact/vectorizer";
const token = "your-api-token";
const maxTokens = 1000; // Adjust as needed
const sourceType = SourceType.DIRECTORY;
const vectorizer = new Vectorizer(token, maxTokens, sourceType);
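Hard-coding the token in source makes it easy to leak. Here is a minimal sketch that reads it from an environment-style object instead. The variable name POLYFACT_TOKEN and the helper readToken are hypothetical and not part of the package:

```typescript
// Hypothetical helper: POLYFACT_TOKEN is an assumed variable name,
// not one that the package itself defines or reads.
type Env = Record<string, string | undefined>;

function readToken(env: Env, name: string = "POLYFACT_TOKEN"): string {
  // Trim so that stray whitespace from a .env file or shell export is ignored.
  const token = env[name]?.trim();
  if (!token) {
    throw new Error(`Missing API token: set the ${name} environment variable`);
  }
  return token;
}

// In Node.js you would pass process.env; a literal object keeps this sketch self-contained.
const apiToken = readToken({ POLYFACT_TOKEN: "your-api-token" });
```

The resulting value can then be passed as the first argument to the Vectorizer constructor shown above.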
Vectorizing Code Repositories
Now, let's see how you can use the vectorizer to process code repositories:
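// progressCallback is not defined in the original snippet; this logger is an assumed placeholder.
const progressCallback = (...args) => console.log("progress:", ...args);
const filePaths = ["path/to/your/repository"];
const files = await vectorizer.readFiles(filePaths);
await vectorizer.vectorize(files, progressCallback);
const memoryId = vectorizer.getMemoryId();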
In this code snippet, the vectorizer reads folders, PDFs, or audio files from the specified paths and converts them into vectors. Upon completion, getMemoryId returns a unique memory ID. You can pass this ID as the memoryId option of the generate function so that, when you send a task related to your files, PDFs, or audio, the model draws directly on your embeddings.
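The flow from reading files to handing the memory ID to generate can be sketched as a single helper. The types below are assumptions. VectorizerLike mirrors only the three methods used in this tutorial, and GenerateFn is a guess at the shape of the generate function (a task string plus an options object carrying memoryId). Check the SDK documentation for the real signatures.

```typescript
// Structural type covering only the methods used in this tutorial (assumed shapes).
interface VectorizerLike<F> {
  readFiles(paths: string[]): Promise<F[]>;
  vectorize(files: F[], onProgress: (...args: unknown[]) => void): Promise<unknown>;
  getMemoryId(): string;
}

// Assumed shape of the generate function; the real signature may differ.
type GenerateFn = (task: string, options: { memoryId: string }) => Promise<string>;

// Hypothetical helper: vectorizes the given paths, then sends a task that uses the memory.
async function askAboutFiles<F>(
  vectorizer: VectorizerLike<F>,
  generate: GenerateFn,
  paths: string[],
  task: string,
): Promise<string> {
  const files = await vectorizer.readFiles(paths);
  await vectorizer.vectorize(files, (...args) => console.log("progress:", ...args));
  const memoryId = vectorizer.getMemoryId();
  // Passing memoryId lets the model draw on the embeddings created above.
  return generate(task, { memoryId });
}
```

Taking the vectorizer and generate as parameters keeps the helper independent of any particular import path and makes it easy to exercise with stand-ins before spending API calls.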
It is also possible to use the PolyFact SDK to do the same thing, except for the PDFs. You can find out more here.
Step 2.1: Usage via Command Line Interface
Vectorize a Code Repository
To vectorize an entire code repository, use the following CLI command:
@polyfact/vectorizer repo path/to/your/repository --token your-api-token --max-token 1000
Vectorize PDF Files
To vectorize PDF files, use the following CLI command:
@polyfact/vectorizer pdf file1.pdf file2.pdf --token your-api-token
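When the CLI is driven from a script, it helps to assemble the arguments in one place. Here is a sketch that uses only the subcommands and flags shown above (repo, pdf, --token, --max-token). The helper buildVectorizerArgs is hypothetical, and since the commands above only show --max-token with repo, it is emitted for repo jobs only:

```typescript
// Hypothetical description of the two CLI invocations shown in this tutorial.
type CliJob =
  | { kind: "repo"; path: string; token: string; maxToken?: number }
  | { kind: "pdf"; files: string[]; token: string };

function buildVectorizerArgs(job: CliJob): string[] {
  if (job.kind === "repo") {
    const args = ["repo", job.path, "--token", job.token];
    // --max-token is optional here; it is added only when a value is supplied.
    if (job.maxToken !== undefined) {
      args.push("--max-token", String(job.maxToken));
    }
    return args;
  }
  if (job.files.length === 0) {
    throw new Error("A pdf job needs at least one file");
  }
  return ["pdf", ...job.files, "--token", job.token];
}
```

Returning an array rather than a single string means the arguments can be handed to a process spawner without shell quoting issues, for example when paths contain spaces.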
Conclusion
Congratulations! You've learned how to use the @polyfact/vectorizer package to vectorize code repositories and PDFs with PolyFact. These vector representations can be useful for various machine learning and data analysis tasks. Feel free to explore the PolyFact SDK documentation to learn more about how to use the generated memory ID in your projects.
For more information and more packages, refer to the official documentation.
Other resources: