InterSystems Developer for InterSystems

Posted on • Originally published at community.intersystems.com

PrivateGPT exploring the Documentation

Consider the new business interest in applying generative AI to local, commercially sensitive, private data and information, without exposure to public clouds. Like a match needs the energy of striking to ignite, the tech lead's new "activation energy" challenge is to reveal how investing in GPU hardware could support novel competitive capabilities. That capability can in turn reveal the use-cases that provide new value and savings.

Sharpening this axe begins with a functional protocol for running LLMs on a local laptop.

My local Mac has an M1 processor. Early experiments exploring Falcon models found that the flow of the toolkits was primarily towards using CUDA graphics cards. There are "compressed" ( quantized ) versions of models; however, on my exploration path these could only be loaded into a CUDA GPU, which I didn't have. There is no bitsandbytes support for M1 / M2 processors, and AutoGPTQ quantization didn't support the MPS processor either. Running the unquantized models on CPU was prohibitively slow.

I had spotted the PrivateGPT project, and the following steps got things running.

# install developer tools
xcode-select --install

# create python sandbox
mkdir privateGPT
cd privateGPT/
python3 -m venv .
# activate local context
source bin/activate

# privateGPT uses poetry for python module management
privateGPT> pip install poetry

# sync privateGPT project
privateGPT> git clone https://github.com/imartinez/privateGPT

# enable MPS for model loading and processing
privateGPT> CMAKE_ARGS="-DLLAMA_METAL=on" pip install --force-reinstall --no-cache-dir llama-cpp-python

privateGPT> cd privateGPT

# install python dependencies for local mode and the web ui
privateGPT> poetry install --with ui,local

# download and configure the default models
privateGPT> poetry run python3 scripts/setup

# launch the web interface to confirm it is operational on the default model
privateGPT> python3 -m private_gpt
# navigate Safari to http://localhost:8001/

# to bulk import documentation, the web interface needs to be stopped first,
# as the vector database is not in multi-user mode
privateGPT> [control] + "C"

# import some PDFs
privateGPT> curl "https://docs.intersystems.com/irislatest/csp/docbook/pdfs.zip" -o /tmp/pdfs.zip
privateGPT> unzip /tmp/pdfs.zip -d /tmp
# took a few hours to process
privateGPT> make ingest /tmp/pdfs/pdfs/

# launch the web interface again to query the documentation
privateGPT> python3 -m private_gpt

Experiments with the default model Mistral-7B-Instruct

Some of the things that worked reasonably well were queries looking for more textual content.

1. What is a lock Table?

Image description

2. Write Object Script "Hello World"

Image description

3. INSERT in SQL Statement

The trailing semicolon is a habit learned from training on other implementations of SQL.

Currently deployed versions of IRIS would have an issue with the trailing semicolon, but newer versions address this common generative nuance by discarding it rather than treating it as an error.
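As a hedged sketch of where the semicolon bites ( Demo.Person is a hypothetical table, assumed to exist for illustration ), the generated statement can be run through %SQL.Statement:

    // Run the generated INSERT through a dynamic SQL statement
    Set sql = "INSERT INTO Demo.Person (Name) VALUES ('Smith,John');"
    Set stmt = ##class(%SQL.Statement).%New()
    // On older IRIS versions the trailing ";" typically fails at prepare time;
    // newer versions simply discard it
    Set sc = stmt.%Prepare(sql)
    If $SYSTEM.Status.IsError(sc) {
        Write !,"Prepare error: ",$SYSTEM.Status.GetErrorText(sc)
    } Else {
        Set rs = stmt.%Execute()
        Write !,"SQLCODE: ",rs.%SQLCODE
    }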

Image description

4. ObjectScript FOR loop

The challenge here is treating the increment as the maximum counter, so it creates an endless loop incrementing by "10" instead of a loop that increments by "1" ten times. As a minor point, the trailing Quit is not necessary to the loop, but is tidy in terms of a well-contained line label.
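For reference, the ObjectScript FOR arguments are start:increment:end, so a loop that increments by "1" ten times looks like this ( a minimal sketch ):

    // Ten iterations, incrementing by 1: For var=start:increment:end
    For i=1:1:10 {
        Write !,"Iteration ",i
    }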

Image description

5. ObjectScript continue

Here the use of documentation has clearly confused ObjectScript with IRIS BASIC language examples ( the THEN keyword ). A query-docs approach possibly needs to use "ObjectScript" as a metadata filter, or have upstream-generated sets of help PDFs that are limited to a particular language implementation. Python and Java examples could be anticipated to impose a similar effect. The use of "i%2" is Python syntax for the modulo operator, whereas "i#2" would be expected in ObjectScript.
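For comparison, a minimal sketch of the intended ObjectScript pattern, using "#" for modulo and the Continue command:

    // Write only the odd numbers from 1 to 10
    For i=1:1:10 {
        // "#" is the ObjectScript modulo operator; expressions evaluate
        // strictly left-to-right, so i#2=0 means (i#2)=0
        If i#2=0 { Continue }
        Write !,i," is odd"
    }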

Image description


Swapping out models

New models can be added by downloading GGUF-format models from https://huggingface.co/ into the models sub-directory.

Here the naming convention contains "Q" + level to indicate quantization loss versus size, where a lower "Q" is effectively a smaller download with more quality loss.

settings.yaml

local:
  llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.1-GGUF
  llm_hf_model_file: mistral-7b-instruct-v0.1.Q4_K_M.gguf
  #llm_hf_repo_id: TheBloke/Orca-2-13B-GGUF
  #llm_hf_model_file: orca-2-13b.Q6_K.gguf
  #llm_hf_repo_id: TheBloke/XwinCoder-34B-GGUF
  #llm_hf_model_file: xwincoder-34b.Q6_K.gguf
  embedding_hf_model_name: BAAI/bge-small-en-v1.5

Using "LLM Chat" mode ( no query documents ) with the "xwincoder-34b" model suggests that many "code" recommendations can come from the existing trained model.

It demonstrated an interesting learned confusion between Globals and Routines, which are both referenced by the caret ( "^" ).

The enthusiasm for the "%" prefix when invoking methods may be a pattern learned from system class documentation, rather than an understanding that a method name, whatever it is, is what invokes the method.

There are configuration references in system management that undertake actions in the %SYS namespace, and this is generally quite separate from the activities of third-party code.

It was interesting to see the invocation as well as the implementation example, uncovering such disparities ( invoked as a Routine instead of a class method ).
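To make the distinction concrete, a minimal sketch of the different references ( ^MyGlobal, MyRoutine and My.Class are hypothetical names ):

    // A global: persistent data, referenced with "^"
    Set ^MyGlobal("greeting")="Hello World"
    Write !,^MyGlobal("greeting")

    // A routine: executable code, also referenced with "^"
    Do Label^MyRoutine

    // A class method: invoked via ##class(...), no "^" involved
    Do ##class(My.Class).MyMethod()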

Image description

General Generative capabilities ( Model Orca 13b Q6 )

1. Generating questions and answers from supplied text

Image description

2. Explaining differences of concepts from supplied text

Image description

3. Summarizing information into a shorter form

Image description

Integrate with IRIS

PrivateGPT can be put to work from IRIS by exercising the available JSON endpoints with surprisingly little code.

The following code snippets demonstrate direct usage from ObjectScript.

This can easily be wrapped by IRIS Integration Operations and Messages to provide a reusable facility with configuration, message tracing, etc.

IRIS Conversation example

// Construct a request message via IRIS, building a Dynamic Object in ObjectScript
    // The property "use_context" = true ensures that preloaded documents are searched
    // for similarity and autoloaded into context to support response generation
    // The property "prompt" is the full prompt provided to the LLM
    Set request={"use_context":true,"prompt":"What is an IRIS lock table"}
    // Using %Net.HttpRequest for direct access
    Set hr=##class(%Net.HttpRequest).%New()
    Set hr.Server="127.0.0.1"
    Set hr.Port=8001
    Set hr.ContentType="application/json"
   
    Do hr.SetHeader("Accept","application/json")
    // Embed the request as JSON
    Do hr.EntityBody.Write(request.%ToJSON())
    // Make request
    // The optional "2" argument causes the response to be output to the default device ( terminal )
    // Useful for analysing returned output
    Do hr.Post("/v1/completions",2)
    // The stream should already be rewound, but stated here
    // in case the stream needs to be reused in testing
    Do hr.HttpResponse.Data.Rewind()
    // Turn the response back into IRIS %Dynamic Object
    Set response={}.%FromJSON(hr.HttpResponse.Data.Read(32000))
    // Grab the generated response
    Set outputMessage=response.choices.%Get(0).message
    // Grab the metadata for references used by generation
    Set outputSources=response.choices.%Get(0).sources.%Get(0).document
   
    Write !,"Output:",outputMessage.content
    Write !,"Page reference:",outputSources."doc_metadata"."page_label"
    Write !,"Filename:",outputSources."doc_metadata"."file_name"

Example output:

Image description


IRIS Retrieve Embedding Example

// Construct a request message via IRIS, building a Dynamic Object
    Set textForEmbedding={"input":"This is some test text from IRIS database"}
    // Using %Net.HttpRequest for direct access
    Set hr=##class(%Net.HttpRequest).%New()
    Set hr.Server="127.0.0.1"
    Set hr.Port=8001
    Set hr.ContentType="application/json"
    Do hr.SetHeader("Accept","application/json")
    // Embed the request as JSON
    Do hr.EntityBody.Write(textForEmbedding.%ToJSON())
    // Make request
    // The optional "2" argument causes the response to be output to the default device ( terminal )
    // Useful for analysing returned output
    Do hr.Post("/v1/embeddings",2)
    // The stream should already be rewound, but stated here
    // in case the stream needs to be reused in testing
    Do hr.HttpResponse.Data.Rewind()
    // Turn the response back into IRIS %Dynamic Object
    Set response={}.%FromJSON(hr.HttpResponse.Data.Read(32000))
    // Example of iterator to loop over embeddings array
    Set iter=response.data.%Get(0).embedding.%GetIterator()
    // Output of floating point numbers of the returned embedding
    While iter.%GetNext(.key, .value, .type) {
        Write !,"key=",key,", value=",value,", type=",type
    }

Example output:

Image description

Further IRIS integration

The web API also supports:

  • dynamically loading new source documents
  • listing existing source documents
  • deleting existing source documents
  • a health API to indicate availability ( see the sketch below )

Further details available at: https://docs.privategpt.dev/api-reference/api-reference/ingestion
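For example, a minimal availability probe from ObjectScript against the health API, following the same %Net.HttpRequest pattern as above:

    // Check that PrivateGPT is up before sending work to it
    Set hr=##class(%Net.HttpRequest).%New()
    Set hr.Server="127.0.0.1"
    Set hr.Port=8001
    Do hr.Get("/health")
    // A 200 status with body {"status":"ok"} indicates availability
    Write !,"HTTP status: ",hr.HttpResponse.StatusCode
    Write !,"Body: ",hr.HttpResponse.Data.Read(1000)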

Summary thoughts and ideas

Local inference ( running quantized LLMs on a laptop for productivity ) can be a useful way to initially scale the application of existing and internally retrained models. It gives business users the flexibility to privately explore and share new use-cases and prompt-engineering recipes.

To use PrivateGPT better for documentation, it would need deeper reconfiguration to lower the generative temperature, reducing the creativity and improving the accuracy of answers.
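Assuming the installed release exposes sampling parameters in settings.yaml ( the llm.temperature key below is an assumption; check the settings schema of your version ), that could look something like:

llm:
  # assumed key: lower values trade creativity for more literal answers
  temperature: 0.1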

Technical Documentation and user manuals are no longer intended simply for human readers.

Can existing documentation pipelines be easily repurposed with metadata to shape documentation output that is better consumed and repurposed for generative AI?

A single documentation resource covering multiple code languages makes it difficult to generalize usefully without conflating code examples. Hence the hypothesis that, in the short term, both the documentation "code language" and the "source retraining" of models would be better suited to mono-code-language resources and assistants.

How well can retraining an existing model "unlearn" existing code conflations to be replaced with the useful and expected code syntax?

Hope this inspires new explorations.

References

Martínez Toro, I., Gallego Vico, D., & Orgaz, P. (2023). PrivateGPT [Computer software]. https://github.com/imartinez/privateGPT

Resources

https://github.com/imartinez/privateGPT

https://docs.privategpt.dev/

https://huggingface.co/models

https://github.com/ggerganov/llama.cpp
