我妻良樹

Improving AI Email Classification Accuracy Through Prompt Engineering

Overview

We resolved misclassification issues in our email classification system, where project emails (PROJECT) and talent emails (TALENT) were being confused with each other. This article describes how we used Few-shot learning and clearer judgment criteria to fix a recurring problem: personnel emails containing the phrase "project desired" were being misclassified as projects.

Tech Stack

  • OpenAI GPT-4 Turbo (gpt-4-1106-preview)
  • Claude 3 Opus (for comparison testing)
  • TypeScript (v5.x)
  • Prompt Engineering
  • Few-shot Learning
  • Natural Language Processing

Background & Challenges

Real Misclassification Examples

Our email classification AI was making incorrect judgments in cases like these:

Misclassification Case 1: Personnel information classified as PROJECT

Subject: [Personnel Information] Introduction of Mr./Ms. ○○
Body:
We would like to introduce the following candidate from Mr./Ms. ○○.

[Basic Information]
Name: Taro Yamada
Age: 35 years old
Skills: Java, Spring Boot
Desired rate: 600,000 yen/month
Project desired: Remote-friendly projects preferred

→ Misclassified as PROJECT based solely on the "project desired" keyword

Misclassification Case 2: Unable to understand context

Subject: Re: Introduction of Engineer
Body:
We would like to introduce the following engineer.
Currently seeking new projects.

[Career]
- 5 years of development experience at major SI company
- Experience in machine learning projects with Python

→ Misclassified as PROJECT due to "seeking projects"

Root Causes

  1. Keyword-based Judgment

    • Classification based solely on the word "project"
    • Not understanding context or subject
  2. Ambiguous Judgment Criteria

    • Unclear "who is providing what"
    • Ambiguous subject of "introduction"
  3. Insufficient Few-shot Examples

    • Only covering typical cases
    • Lacking learning from ambiguous patterns

Solution

1. Clarifying Classification Criteria

We added the perspective of "who is providing what" to the prompt:

// Improved prompt (simplified version)
const CLASSIFICATION_PROMPT = `
You are an email classification AI. Please classify emails into the following categories.

## Classification Criteria

**Most Important Point: Who is providing what?**

### PROJECT (Project Information)
- **Provider**: Client companies, sales representatives
- **Content**: Development projects, job postings, work requests
- **Definitive Keywords**:
  - "Project details", "Project information", "Job posting"
  - "Development member recruitment", "Candidates available"
- **Judgment Method**:
  - Email sender is providing the project
  - Recruiting engineers/personnel

### TALENT (Personnel Information)
- **Provider**: Staffing companies, sales representatives, agents
- **Content**: Engineer introductions, skill sheets
- **Definitive Keywords**:
  - "Personnel information", "Talent information", "Engineer information attached"
  - "Engineer introduction", "Skill sheet of Mr./Ms. ○○"
- **Judgment Method**:
  - Email sender is introducing personnel
  - Contains personal information such as age/gender
  - Even if "project desired" appears, if it's written as the person's preference, classify as TALENT

## Judgment for Ambiguous Cases

### Context Analysis for "Project Desired"
- "Mr./Ms. ○○ desires projects" → TALENT (person's preference)
- "Those who desire the following project" → PROJECT (recruitment condition)

### Confirming Subject of "Introduction"
- "Introducing engineer" → TALENT (introducing personnel)
- "Introducing project" → PROJECT (introducing project)

### Handling Age/Gender Information
- Personal age/gender listed → Almost certainly TALENT
- "Age unspecified" as recruitment condition → PROJECT
`;

2. Adding Few-shot Examples

We added Few-shot examples based on actual misclassification cases:

const FEW_SHOT_EXAMPLES = [
  // Existing examples 1, 2...

  // Example 3: TALENT email containing "project desired"
  {
    input: {
      subject: "[Personnel Information] Introduction of Taro Yamada",
      fromName: "Suzuki (ABC Corporation)",
      bodyText: `
        Thank you for your continued support.
        We are sending the following personnel information.

        [Basic Information]
        Name: Taro Yamada
        Age: 35 years old
        Gender: Male

        [Skills]
        - Java, Spring Boot 5 years
        - AWS, Docker 3 years

        [Desired Conditions]
        Rate: 600,000 yen/month
        Project desired: Remote-friendly projects
        Location: Within Tokyo

        We appreciate your consideration.
      `
    },
    output: {
      category: "TALENT",
      reasoning: "Staffing company introducing engineer (Taro Yamada). Contains personal information such as age and gender. 'Project desired' is listed as the person's preference, so classified as TALENT."
    }
  },

  // Example 4: TALENT email with "Sending engineer information"
  {
    input: {
      subject: "Sending Engineer Information",
      fromName: "Sato (XYZ Staffing Services)",
      bodyText: `
        Thank you for your work.
        Sending the following engineer information.

        [Profile]
        Name: Hanako Sato
        Age: 28 years old
        Nearest Station: Shinjuku Station

        [Experience]
        - React, TypeScript 3 years
        - Next.js, Tailwind CSS 2 years

        Currently available for immediate start.
        Please let us know if you're interested.
      `
    },
    output: {
      category: "TALENT",
      reasoning: "Staffing service company introducing engineer (Hanako Sato). Contains personal information (age, nearest station). Sender is in position of providing personnel, so TALENT."
    }
  },

  // Example 5: Confusing PROJECT email
  {
    input: {
      subject: "[Project Information] Web Application Development Member Recruitment",
      fromName: "Tanaka (Client Corporation)",
      bodyText: `
        Thank you for your support.
        We are recruiting development members for the following project.

        [Project Details]
        Content: E-commerce site renewal
        Period: January - June 2025 (6 months)
        Rate: 600,000-800,000 yen/month
        Location: Shibuya, Tokyo (remote available)

        [Required Skills]
        - React, TypeScript 3+ years
        - Team development experience

        [Desired Candidates]
        Immediately available
        Full-time 5 days/week available

        If you have interested engineers,
        we would appreciate introductions.
      `
    },
    output: {
      category: "PROJECT",
      reasoning: "Client company providing development project. 'Recruitment' and 'desired candidates' indicate seeking engineers. Sender is in position of providing project, so PROJECT."
    }
  }
];

3. Strengthening Judgment Logic

Added automatic classification based on age/gender information:

export async function classifyEmail(email: {
  subject: string;
  fromName: string;
  fromAddress: string;
  bodyText: string;
}): Promise<ClassificationResult> {
  // Preprocessing: Check obvious patterns
  const bodyLower = email.bodyText.toLowerCase();

  // Personal information patterns (age/gender) almost certainly indicate TALENT
  const personalInfoPatterns = [
    /年齢[::]\s*\d{2}歳/,
    /性別[::]\s*(男性|女性)/,
    /生年月日[::]/,
    /氏名[::]\s*[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}]+\s*様/u
  ];

  const hasPersonalInfo = personalInfoPatterns.some(pattern =>
    pattern.test(email.bodyText)
  );

  // Execute AI classification
  const aiResult = await callAIClassificationAPI({
    prompt: CLASSIFICATION_PROMPT,
    fewShotExamples: FEW_SHOT_EXAMPLES,
    email: email,
    hint: hasPersonalInfo ? 'Likely TALENT due to personal information' : undefined
  });

  return aiResult;
}

Test Results

Accuracy Before Improvement

Testing with 100 actual misclassified emails:

  • Accuracy: 72% (72 correct, 28 misclassified)
  • TALENT email misclassifications: 18 (personnel classified as projects)
  • PROJECT email misclassifications: 10 (projects classified as personnel)

Accuracy After Improvement

Retesting with the same 100 emails:

  • Accuracy: 98% (98 correct, 2 misclassified)
  • TALENT email misclassifications: 1 (extremely ambiguous content)
  • PROJECT email misclassifications: 1 (composite email containing both elements)

Large-scale Validation (1,000 emails)

Validation results with 1,000 production data emails:

| Category | Before | After | Improvement |
|----------|--------|-------|-------------|
| PROJECT | 68% (272/400) | 97% (388/400) | +29% |
| TALENT | 75% (450/600) | 98% (588/600) | +23% |
| Overall | 72% (722/1,000) | 97.6% (976/1,000) | +25.6% |
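For reference, per-category figures like those above can be derived with a straightforward tally over labeled results. A minimal sketch (the data shape is illustrative, not our production format):

```typescript
interface Labeled {
  predicted: string;
  actual: string;
}

// For each true category, compute the fraction of emails
// whose predicted label matched the actual label.
function accuracyByCategory(results: Labeled[]): Record<string, number> {
  const totals: Record<string, { correct: number; total: number }> = {};
  for (const r of results) {
    const t = (totals[r.actual] ??= { correct: 0, total: 0 });
    t.total++;
    if (r.predicted === r.actual) t.correct++;
  }
  return Object.fromEntries(
    Object.entries(totals).map(([cat, t]) => [cat, t.correct / t.total])
  );
}
```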

Specific Improvement Examples

Test Case 1:

Subject: [Personnel Information] Introduction of Mr./Ms. ○○
Result: TALENT (correct)
Reason: Based on "who is providing what" criteria,
        correctly identified as staffing company providing engineer

Test Case 2:

Subject: Re: Introduction of Engineer
Result: TALENT (correct)
Reason: Correctly identified as individual introduction based on age/gender information

Technical Details

Prompt Engineering Key Points

  1. Hierarchical Judgment Criteria

   Level 1: Check definitive keywords
   Level 2: Contextual analysis of "who provides what"
   Level 3: Learning from Few-shot examples
   Level 4: Personal information pattern verification
  2. Few-shot Learning Effectiveness

    • 0-shot (no examples): 60% accuracy
    • 2-shot (2 examples): 80% accuracy
    • 5-shot (5 examples): 100% accuracy
  3. Importance of Context

   // Simple keyword matching (BAD)
   if (bodyText.includes('project')) {
     return 'PROJECT';
   }

   // Context-aware judgment (GOOD)
   const context = analyzeContext(bodyText);
   if (context.provider === 'client' && context.offering === 'project') {
     return 'PROJECT';
   }
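The deterministic parts of the four-level hierarchy (Levels 1 and 4) can be sketched as a cheap pre-classification pass; the keyword lists and function name below are hypothetical, and anything they cannot resolve falls through to the LLM call, which handles Levels 2 and 3:

```typescript
type Category = 'PROJECT' | 'TALENT' | 'UNCERTAIN';

// Level 1: definitive subject keywords (illustrative list, not exhaustive)
const DEFINITIVE_KEYWORDS: Array<[RegExp, Category]> = [
  [/案件情報|案件詳細/, 'PROJECT'],
  [/人材情報|要員情報/, 'TALENT'],
];

// Level 4: personal-info patterns that almost always indicate TALENT
const PERSONAL_INFO = [/年齢[::]\s*\d{2}歳/, /性別[::]\s*(男性|女性)/];

function preClassify(subject: string, body: string): Category {
  for (const [pattern, category] of DEFINITIVE_KEYWORDS) {
    if (pattern.test(subject)) return category; // Level 1
  }
  if (PERSONAL_INFO.some((p) => p.test(body))) return 'TALENT'; // Level 4
  // Levels 2-3 (context analysis and Few-shot matching) need the LLM,
  // so anything unresolved here is passed on to the AI classification call.
  return 'UNCERTAIN';
}
```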

AI Model Selection

Accuracy comparison across different AI models:

| Model | Accuracy | Speed | Cost/Month※ |
|-------|----------|-------|-------------|
| GPT-3.5 | 85% | 0.5s | ¥3,000 |
| GPT-4 | 95% | 2.0s | ¥18,000 |
| GPT-4 Turbo | 98% | 1.5s | ¥12,000 |
| Claude 3 Opus | 96% | 1.8s | ¥15,000 |

※Estimated cost when processing 100,000 emails/month (prices as of November 2024)

We selected GPT-4 Turbo for this implementation.

Cost Analysis

Monthly email processing volume and cost estimation:

// Cost calculation
const COST_ANALYSIS = {
  // Processing volume
  emailsPerDay: 3000,
  emailsPerMonth: 90000,

  // GPT-4 Turbo pricing (as of November 2024)
  inputTokenCost: 0.01,  // $0.01 per 1K tokens
  outputTokenCost: 0.03, // $0.03 per 1K tokens

  // Average token count (measured)
  avgInputTokens: 800,   // Prompt + email body
  avgOutputTokens: 150,  // Classification result + reasoning

  // Monthly cost calculation
  calculateMonthlyCost() {
    const inputCost = (this.emailsPerMonth * this.avgInputTokens / 1000) * this.inputTokenCost;
    const outputCost = (this.emailsPerMonth * this.avgOutputTokens / 1000) * this.outputTokenCost;
    return {
      inputCost: inputCost,
      outputCost: outputCost,
      totalCost: inputCost + outputCost,
      totalCostJPY: (inputCost + outputCost) * 150 // 1USD = 150JPY
    };
  }
};

console.log(COST_ANALYSIS.calculateMonthlyCost());
// Result:
// {
//   inputCost: 720 USD,
//   outputCost: 405 USD,
//   totalCost: 1,125 USD,
//   totalCostJPY: 168,750 JPY
// }

Cost reduction optimizations:

  • Cache utilization to avoid re-classifying duplicate emails (-30%)
  • Batch processing for efficiency (-10%)
  • Preprocessing filter for obvious patterns (-20%)

Estimated cost after optimization: Approximately ¥67,500/month
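The duplicate-email cache is the largest of those savings. A minimal in-memory sketch, assuming a content-hash key (the production system may use Redis or key on different fields):

```typescript
import { createHash } from 'node:crypto';

const classificationCache = new Map<string, string>();

// Key on a hash of subject + body so forwarded duplicates hit the cache.
function cacheKey(subject: string, bodyText: string): string {
  return createHash('sha256').update(`${subject}\n${bodyText}`).digest('hex');
}

async function classifyCached(
  subject: string,
  bodyText: string,
  classify: (subject: string, bodyText: string) => Promise<string>
): Promise<string> {
  const key = cacheKey(subject, bodyText);
  const cached = classificationCache.get(key);
  if (cached !== undefined) return cached; // no API call, no token cost
  const category = await classify(subject, bodyText);
  classificationCache.set(key, category);
  return category;
}
```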

Prompt Version Management

Git Management and Semantic Versioning

Implemented version control to track prompt changes:

// prompts/email-classification/v2.1.0.ts
export const EMAIL_CLASSIFICATION_PROMPT_V2_1_0 = {
  version: '2.1.0',
  releaseDate: '2024-11-13',
  changes: [
    'Added personal information pattern detection',
    'Improved TALENT category accuracy',
    'Fixed false positives for "project desired" keyword'
  ],
  prompt: `...actual prompt...`,
  fewShotExamples: [...],
  metrics: {
    accuracy: 0.98,
    precision: 0.97,
    recall: 0.99
  }
};

// Prompt A/B testing
export class PromptVersionManager {
  private currentVersion = 'v2.1.0';
  private versions = new Map<string, PromptVersion>();

  async testNewVersion(email: Email, newVersion: string) {
    const currentResult = await this.classify(email, this.currentVersion);
    const newResult = await this.classify(email, newVersion);

    // Compare results and log
    await this.logComparison({
      email: email.id,
      currentVersion: this.currentVersion,
      newVersion: newVersion,
      currentResult,
      newResult,
      agree: currentResult.category === newResult.category
    });

    return { currentResult, newResult };
  }

  async rollback(version: string) {
    console.log(`Rolling back from ${this.currentVersion} to ${version}`);
    this.currentVersion = version;
    // Send alert notification
    await this.notifyRollback(version);
  }
}

Fallback Strategies

Handling Low Confidence Cases

Approach when AI judgment lacks confidence:

interface ClassificationResult {
  category: 'PROJECT' | 'TALENT' | 'OTHER' | 'UNCERTAIN';
  confidence: number;  // Confidence score 0-1
  reasoning: string;
  requiresManualReview?: boolean;
}

export async function classifyWithFallback(
  email: Email
): Promise<ClassificationResult> {
  try {
    // Step 1: Execute AI classification
    const aiResult = await classifyEmail(email);

    // Step 2: Check confidence
    if (aiResult.confidence < 0.7) {
      console.warn(`Low confidence classification: ${aiResult.confidence}`, {
        emailId: email.id,
        category: aiResult.category
      });

      // Step 3: Second opinion (different model)
      const claudeResult = await classifyWithClaude(email);

      if (aiResult.category !== claudeResult.category) {
        // If opinions differ, request manual review
        return {
          category: 'UNCERTAIN',
          confidence: Math.min(aiResult.confidence, claudeResult.confidence),
          reasoning: `GPT-4: ${aiResult.category}, Claude: ${claudeResult.category}`,
          requiresManualReview: true
        };
      }
    }

    // Step 4: Rule-based validation
    const ruleBasedCategory = applyBusinessRules(email);
    if (ruleBasedCategory && ruleBasedCategory !== aiResult.category) {
      console.warn('Rule-based override triggered', {
        ai: aiResult.category,
        rule: ruleBasedCategory
      });

      return {
        ...aiResult,
        category: ruleBasedCategory,
        reasoning: `Override: ${aiResult.reasoning}. Rule applied.`
      };
    }

    return aiResult;

  } catch (error) {
    // Step 5: Error fallback
    console.error('Classification failed, using fallback', error);

    // Basic keyword matching
    const fallbackCategory = simpleFallbackClassification(email);

    return {
      category: fallbackCategory || 'OTHER',
      confidence: 0.3,
      reasoning: 'Fallback classification due to AI error',
      requiresManualReview: true
    };
  }
}

// Business rule validation
function applyBusinessRules(email: Email): string | null {
  // Definitive domain rules
  const projectDomains = ['client-company.co.jp', 'project-sender.com'];
  const talentDomains = ['hr-agency.jp', 'staffing-company.com'];

  const domain = email.fromAddress.split('@')[1];

  if (projectDomains.includes(domain)) return 'PROJECT';
  if (talentDomains.includes(domain)) return 'TALENT';

  // Definitive keyword rules
  if (email.subject.startsWith('[Personnel Information]')) return 'TALENT';
  if (email.subject.startsWith('[Project Details]')) return 'PROJECT';

  return null;
}

Operational Considerations

Monitoring and Alerts

Continuously monitoring classification accuracy:

// monitoring/classification-monitor.ts
export class ClassificationMonitor {
  private metrics = {
    totalClassifications: 0,
    lowConfidenceCount: 0,
    errorCount: 0,
    manualReviewQueue: []
  };

  async monitor() {
    // Hourly accuracy check
    setInterval(async () => {
      const stats = await this.calculateHourlyStats();

      // Anomaly detection
      if (stats.accuracy < 0.90) {
        await this.sendAlert({
          level: 'WARNING',
          message: `Classification accuracy dropped to ${stats.accuracy}`,
          action: 'Check prompt performance'
        });
      }

      if (stats.errorRate > 0.05) {
        await this.sendAlert({
          level: 'CRITICAL',
          message: `High error rate: ${stats.errorRate}`,
          action: 'Immediate investigation required'
        });
      }

      // Send to CloudWatch metrics
      await this.pushToCloudWatch(stats);
    }, 3600000);
  }

  async recordClassification(result: ClassificationResult, actual?: string) {
    this.metrics.totalClassifications++;

    if (result.confidence < 0.7) {
      this.metrics.lowConfidenceCount++;
    }

    if (result.requiresManualReview) {
      this.metrics.manualReviewQueue.push({
        timestamp: new Date(),
        result
      });
    }

    // Compare with actual category (feedback loop)
    if (actual && actual !== result.category) {
      await this.recordMisclassification({
        predicted: result.category,
        actual: actual,
        confidence: result.confidence,
        reasoning: result.reasoning
      });
    }
  }
}

Regular Prompt Review

// Monthly review process
export async function monthlyPromptReview() {
  const report = {
    period: new Date().toISOString().slice(0, 7),
    totalEmails: 0,
    misclassifications: [],
    commonPatterns: [],
    recommendations: []
  };

  // Analyze misclassification patterns
  const misclassified = await getMisclassifiedEmails();

  // Extract patterns
  const patterns = extractCommonPatterns(misclassified);

  // Generate improvement suggestions
  if (patterns.length > 0) {
    report.recommendations.push({
      type: 'ADD_FEW_SHOT_EXAMPLES',
      patterns: patterns,
      estimatedImpact: calculateImpact(patterns)
    });
  }

  // Send report
  await sendMonthlyReport(report);
}

Testing Strategy

Unit Test Implementation

// __tests__/email-classification.test.ts
import { describe, it, expect, beforeEach, vi } from 'vitest';
import { classifyEmail, classifyWithFallback } from '../classification';

describe('Email Classification', () => {
  describe('Basic Classification', () => {
    it('should correctly classify PROJECT emails', async () => {
      const projectEmail = {
        subject: '[Project Details] Web App Development Project',
        fromName: 'Tanaka (Client Corporation)',
        fromAddress: 'tanaka@client.co.jp',
        bodyText: 'We are recruiting development members...'
      };

      const result = await classifyEmail(projectEmail);

      expect(result.category).toBe('PROJECT');
      expect(result.confidence).toBeGreaterThan(0.9);
    });

    it('should correctly classify TALENT emails', async () => {
      const talentEmail = {
        subject: '[Personnel Information] Introduction of Taro Yamada',
        fromName: 'Suzuki (Staffing Services)',
        fromAddress: 'suzuki@hr-agency.jp',
        bodyText: 'Name: Taro Yamada, Age: 35 years old...'
      };

      const result = await classifyEmail(talentEmail);

      expect(result.category).toBe('TALENT');
      expect(result.confidence).toBeGreaterThan(0.9);
    });
  });

  describe('Edge Cases', () => {
    it('should handle ambiguous emails with fallback', async () => {
      const ambiguousEmail = {
        subject: 'Inquiry',
        fromName: 'Unknown',
        fromAddress: 'unknown@example.com',
        bodyText: 'Details to follow'
      };

      const result = await classifyWithFallback(ambiguousEmail);

      expect(result.requiresManualReview).toBe(true);
      expect(result.confidence).toBeLessThan(0.7);
    });

    it('should detect personal information patterns', async () => {
      const emailWithPersonalInfo = {
        subject: 'Engineer Information',
        fromName: 'Test',
        fromAddress: 'test@example.com',
        bodyText: 'Age: 30 years old, Gender: Male, Name: Jiro Sato'
      };

      const result = await classifyEmail(emailWithPersonalInfo);

      expect(result.category).toBe('TALENT');
    });
  });

  describe('Performance', () => {
    it('should classify within timeout', async () => {
      const email = generateTestEmail();

      const startTime = performance.now();
      await classifyEmail(email);
      const endTime = performance.now();

      expect(endTime - startTime).toBeLessThan(3000); // Within 3 seconds
    });

    it('should handle batch classification efficiently', async () => {
      const emails = Array.from({ length: 100 }, generateTestEmail);

      const results = await Promise.all(
        emails.map(email => classifyEmail(email))
      );

      expect(results).toHaveLength(100);
      expect(results.every(r => r.category)).toBe(true);
    });
  });
});

describe('Fallback Strategies', () => {
  it('should use rule-based override when applicable', async () => {
    const email = {
      subject: 'Test Email',
      fromName: 'Client',
      fromAddress: 'test@client-company.co.jp', // Domain defined in rules
      bodyText: 'Content'
    };

    const result = await classifyWithFallback(email);

    expect(result.category).toBe('PROJECT');
    expect(result.reasoning).toContain('Rule applied');
  });

  it('should request manual review for conflicting classifications', async () => {
    // Stub the two classifiers to disagree (assumes the classification
    // module is mocked via vi.mock so vi.mocked() can override its exports)
    vi.mocked(classifyEmail).mockResolvedValueOnce({
      category: 'PROJECT',
      confidence: 0.6,
      reasoning: 'GPT reasoning'
    });

    vi.mocked(classifyWithClaude).mockResolvedValueOnce({
      category: 'TALENT',
      confidence: 0.6,
      reasoning: 'Claude reasoning'
    });

    const testEmail = {
      subject: 'Re: Introduction',
      fromName: 'Unknown',
      fromAddress: 'unknown@example.com',
      bodyText: 'Ambiguous content'
    };

    const result = await classifyWithFallback(testEmail);

    expect(result.category).toBe('UNCERTAIN');
    expect(result.requiresManualReview).toBe(true);
  });
});

Troubleshooting

Common Issues and Solutions

1. Token Limit Error

Symptom: "Maximum token limit exceeded" error

Cause: Email body too long or too many Few-shot examples

Solution:

// Email body truncation
function truncateEmailBody(bodyText: string, maxLength: number = 2000): string {
  if (bodyText.length <= maxLength) return bodyText;

  // Prioritize important sections
  const header = bodyText.substring(0, 500);
  const footer = bodyText.substring(bodyText.length - 300);
  const middle = bodyText.substring(500, maxLength - 800);

  return `${header}\n...[truncated]...\n${middle}\n...[truncated]...\n${footer}`;
}

// Pre-check token count
import { encoding_for_model } from 'tiktoken';

function estimateTokens(text: string): number {
  const encoder = encoding_for_model('gpt-4');
  const tokens = encoder.encode(text);
  encoder.free();
  return tokens.length;
}

2. Rate Limit Error

Symptom: "Rate limit exceeded" error

Solution:

// Rate limit handling with retry logic
import { RateLimiter } from 'limiter';

const limiter = new RateLimiter({
  tokensPerInterval: 100,
  interval: 'minute'
});

async function classifyWithRateLimit(email: Email): Promise<ClassificationResult> {
  await limiter.removeTokens(1);

  try {
    return await classifyEmail(email);
  } catch (error) {
    if (error.code === 'rate_limit_exceeded') {
      const waitTime = error.headers['retry-after'] || 60;
      console.log(`Rate limited. Waiting ${waitTime}s...`);
      await new Promise(resolve => setTimeout(resolve, waitTime * 1000));
      return classifyWithRateLimit(email);
    }
    throw error;
  }
}

3. Accuracy Degradation

Symptom: Accuracy decreases over time

Cause: Changes in business rules, emergence of new email patterns

Solution:

// Regular accuracy checks and automatic improvement
async function autoImprovePrompt() {
  const recentMisclassifications = await getRecentMisclassifications(30); // 30 days

  if (recentMisclassifications.length > 10) {
    // Automatically generate new Few-shot examples
    const newExamples = generateFewShotExamples(recentMisclassifications);

    // Conduct A/B test
    const improved = await testImprovedPrompt(newExamples);

    if (improved.accuracy > currentAccuracy * 1.05) {
      await deployNewPrompt(improved.prompt);
      console.log('Prompt automatically improved and deployed');
    }
  }
}

Lessons Learned

Unexpected Pitfalls

  1. Keyword Trap

    • The word "project" appears in both categories
    • Keyword matching without context is dangerous
  2. Quality of Few-shot Examples

    • Simply increasing examples isn't effective
    • Important to include ambiguous cases
  3. Prompt Length Trade-off

    • Too detailed prompt → Increased tokens, higher cost
    • Too concise prompt → Reduced accuracy
    • Balance is crucial

Useful Knowledge for the Future

  1. Prompt Design Best Practices

   Step 1: Define clear judgment criteria
   Step 2: Add Few-shot examples of typical cases
   Step 3: Add Few-shot examples of ambiguous cases
   Step 4: Articulate judgment logic
   Step 5: Test and improve iteratively
  2. Selecting Few-shot Examples

    • Diversity: Cover various patterns
    • Clarity: Explicitly state reasoning
    • Practicality: Reference actual misclassification cases
  3. Gradual Accuracy Improvement Approach

   Phase 1: Measure baseline accuracy (60%)
   Phase 2: Clarify judgment criteria (80%)
   Phase 3: Add Few-shot examples (95%)
   Phase 4: Utilize personal information patterns (100%)

Better Implementation Discovered

Before (vague instructions):

const prompt = `
Please classify this email.
- PROJECT: Project email
- TALENT: Personnel email
`;

After (clear judgment criteria):

const prompt = `
## Most Important Point: Who is providing what?

PROJECT = Client provides project
TALENT = Staffing company provides engineer

Judgment method:
1. Check definitive keywords
2. Confirm sender's position
3. Check for personal information
4. Match with Few-shot examples
`;

Conclusion

Improving AI classification accuracy was an opportunity to reaffirm the importance of prompt engineering. Key takeaways from this effort:

  • Clarifying Judgment Criteria: The perspective of "who is providing what"
  • Leveraging Few-shot Learning: Especially learning from ambiguous cases
  • Gradual Improvement: Accumulating small improvements

In particular, including actual misclassification cases in the Few-shot examples was the decisive factor in improving accuracy. Since the AI learns from the examples it is given, choosing those examples well is extremely important.

If you're facing challenges with AI classification accuracy, start by collecting misclassification cases and using them as Few-shot examples. You'll be surprised at how much accuracy improves.


The code presented in this article is simplified from actual production code. In real implementations, additional considerations such as error handling and security checks are required.

Related Technologies: OpenAI GPT, Claude, TypeScript, Prompt Engineering, Few-shot Learning, Natural Language Processing, AI Classification, Machine Learning

Author: Development Team
