Ranjan Dailata

User Defined Summary and Data Mining

Introduction

In this blog post, you will be guided through user-defined summary generation combined with the data mining of your choice, for example the extraction of keywords and highlights. For demonstration purposes, the code uses the Google Gemini model.

Hands-on

  1. Head over to Google Colab.
  2. Log in to Google Cloud and get your Project ID and Location.
  3. Use the code below to initialize Vertex AI.
import sys

# Additional authentication is required for Google Colab
if "google.colab" in sys.modules:
    # Authenticate user to Google Cloud
    from google.colab import auth

    auth.authenticate_user()

PROJECT_ID = "<<project_id>>"  # @param {type:"string"}
LOCATION = "<<location>>"  # @param {type:"string"}

# Initialize Vertex AI with the project information defined above
import vertexai

vertexai.init(project=PROJECT_ID, location=LOCATION)

We will consider the user preference shown below. You could store this preference anywhere; for simplicity, we use mock data here.

user_albert_preference = {
    "prompt_template":{
        "summary_template_1": """
            You are an expert summary generator. Generate a clean and concise summary in less than 100 lines.

            Prompt 1: Identify key sections in the text for summary generation
            Prompt 2: Extract key information from the introduction section
            Prompt 3: Parse out the main objective or purpose of the text
            Prompt 4: Identify any key findings or conclusions discussed in the text
            Prompt 5: Summarize the main arguments or points presented in the text
            Prompt 6: Summarize the overall tone or attitude of the text 

            Output the summary as per the below schema.
            {
              "summary": "",
              "highlights": [],
              "keywords": []
            }

        """,
        "summary_template_2": """
          1. Present a brief snapshot of the content to be summarized.
          2. Uncover the essential insights, emphasizing the core elements.
          3. Illuminate the primary theme or objective that underlies the material.
          4. Incorporate pertinent details that enrich the overall context.
          5. Emphasize the necessity for brevity, focusing on the key information.
          6. Stress the importance of a clear and coherent flow in the summary.
          7. Encourage the exclusion of repetitive information for a streamlined summary.
        """
    }
}

As you can see, the above user preference has two summary template options. The code below shows how to run one of them with the Google Gemini model. You can provide any text of your choice as the input to summarize and see the response.

Here's the code sample.

from vertexai.preview.generative_models import GenerativeModel

# Replace the placeholder with the text you want to summarize
text_to_summarize = "<<text_to_summarize>>"

def generate_summary(text_to_summarize, max_output_tokens):
    model = GenerativeModel("gemini-pro")
    # Combine the user's summary template with the content to summarize
    prompt = f"""You are an expert summary generator. Please follow the below rules for the summary generation.
{user_albert_preference['prompt_template']['summary_template_1']}
Here's the content:
{text_to_summarize}
"""
    responses = model.generate_content(
        prompt,
        generation_config={
            "max_output_tokens": max_output_tokens,
            "temperature": 0.9,
            "top_p": 1,
        },
        stream=True,
    )

    # With stream=True the response arrives in chunks; print each one as it arrives
    for response in responses:
        print(response.candidates[0].content.parts[0].text, end="")

generate_summary(text_to_summarize, 8000)
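The function above always uses `summary_template_1`. Here is a minimal sketch of picking either template by name. Note that `build_prompt`, `template_name`, and the shortened `preferences` dictionary are hypothetical names introduced only for illustration; they are not part of the original code.

```python
# Hypothetical, shortened stand-in for the user preference dictionary above.
preferences = {
    "prompt_template": {
        "summary_template_1": "Generate a clean and concise summary.",
        "summary_template_2": "Present a brief snapshot of the content.",
    }
}


def build_prompt(preferences, template_name, text_to_summarize):
    """Return the full prompt for the chosen template (illustrative helper)."""
    templates = preferences["prompt_template"]
    if template_name not in templates:
        raise KeyError(
            f"Unknown template {template_name!r}; choose one of {sorted(templates)}"
        )
    # Same layout as the prompt in generate_summary: rules first, then content.
    return (
        "Please follow the below rules for the summary generation.\n"
        f"{templates[template_name]}\n"
        "Here's the content:\n"
        f"{text_to_summarize}"
    )


prompt = build_prompt(preferences, "summary_template_2", "Some text to summarize.")
print(prompt)
```

The resulting string could then be passed to `model.generate_content(...)` in place of the inline f-string, so switching templates only means changing the name.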

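Because `summary_template_1` asks for output in a JSON schema, the streamed chunks can be joined and parsed to get at the keywords and highlights directly. The sketch below assumes the model returned valid JSON, possibly wrapped in a Markdown code fence. Both `parse_summary` and the sample `chunks` list are hypothetical; the chunks are made-up data standing in for the text of each streamed response.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker that models sometimes add


def parse_summary(chunks):
    """Join streamed text chunks and parse them as JSON (illustrative helper)."""
    raw = "".join(chunks).strip()
    # Strip a surrounding Markdown code fence, if present.
    if raw.startswith(FENCE):
        raw = raw.strip("`").strip()
        # Drop the optional "json" language tag that follows the opening fence.
        if raw.lower().startswith("json"):
            raw = raw[4:]
    return json.loads(raw)


# Made-up chunks standing in for the text of each streamed response.
chunks = [
    FENCE + "json\n",
    '{"summary": "A short summary.", ',
    '"highlights": ["First point"], ',
    '"keywords": ["gemini", "summary"]}\n',
    FENCE,
]
result = parse_summary(chunks)
print(result["keywords"])
```

If the model does not return valid JSON, `json.loads` raises `json.JSONDecodeError`, so real code would likely want to catch that and fall back to the raw text.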
