DEV Community

Jeremy K.


Extract Text from Word in C# (No Office) | Free Tool Guide

In day-to-day development, processing Word documents is a frequent requirement: extracting key clauses from contracts, parsing data from business reports, retrieving fixed fields from template documents, and so on.

This article shows how to extract content from Word documents with Free Spire.Doc for .NET. No Office installation is required and the library costs nothing. It covers everything from basic full-document text extraction to reading individual paragraphs together with their formatting.


What is Free Spire.Doc for .NET?

It is a free Word processing library for .NET developers. Its main advantages are:

  • No Office dependency: Microsoft Office does not need to be installed;
  • Multi-format support: Works with both legacy .doc (Word 97-2003) and modern .docx (Word 2007 and later) files, which covers most everyday scenarios;
  • Lightweight: Small package size and fast loading, with no additional runtime to deploy.

⚠️ Limitation: The free edition is limited to 500 paragraphs per document, so it is suited to small and medium-sized documents only.


Extract content from Word (text and formatting)

1. Install the library

Install the library from NuGet using either of the following methods:

  • Method 1: NuGet Package Manager Console. Open the console, enter the following command and press Enter:
  Install-Package FreeSpire.Doc
  • Method 2: Graphical interface. Right-click the project → "Manage NuGet Packages" → search for "FreeSpire.Doc" → click "Install".

💡 Tip: After installation, import the core Spire.Doc namespace at the top of your code. Paragraph-level and formatting types such as Paragraph and HorizontalAlignment live in Spire.Doc.Documents, so import that namespace as well when you read paragraphs or their formats.

2. Basic: Extract full document text

If you only need the plain text of a Word document (formatting ignored), a few lines built around the GetText() method are enough:

using Spire.Doc;
using System.IO;

namespace WordExtractor
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a Document instance and load the target Word file
            Document doc = new Document();
            doc.LoadFromFile("ContractTemplate.docx");

            // Extract full document text
            string fullText = doc.GetText();

            // Save the extracted text to a plain-text (.txt) file
            File.WriteAllText("ExtractedContractText.txt", fullText);
        }
    }
}
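Because GetText() returns an ordinary string, retrieving a fixed field from a template (one of the use cases mentioned at the start) needs nothing more than standard .NET string handling. The sketch below is not part of Spire.Doc: the TemplateFields class and FindField method are hypothetical names, and it assumes the template places each field on its own line in the form "Label: value", for example "Contract No.: HT-2024-001" (made-up sample data).

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (not part of Spire.Doc): reads "Label: value" lines
// from the plain text returned by Document.GetText().
public static class TemplateFields
{
    // Returns the trimmed value after "label:" on the first line that starts with the label,
    // or null if no such line exists.
    public static string? FindField(string text, string label)
    {
        // (?m) lets ^ and $ match at line boundaries. [ \t] is used instead of \s so the
        // match never runs past the end of a line; \r? tolerates Windows (CRLF) line endings.
        string pattern = @"(?m)^[ \t]*" + Regex.Escape(label) + @"[ \t]*:[ \t]*(?<value>[^\r\n]*?)[ \t]*\r?$";
        Match match = Regex.Match(text, pattern);
        return match.Success ? match.Groups["value"].Value : null;
    }
}
```

With the basic example above, a call would look like TemplateFields.FindField(fullText, "Contract No."). Because the pattern is anchored to the start of a line, a label that only appears in the middle of a sentence is not picked up.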

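⚠️ Note: A relative path such as "ContractTemplate.docx" is resolved against the program's current working directory, which is not necessarily your project folder. If you get a "file not found" error, check that the file exists at that location or pass an absolute path.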
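To make that error easier to diagnose, you could resolve the path yourself before calling LoadFromFile. The InputPath.Resolve helper below is a hypothetical sketch that uses only System.IO: it tries the path relative to the current working directory, then relative to AppContext.BaseDirectory (the folder containing the compiled application), and otherwise throws a FileNotFoundException that lists both locations it checked.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of Spire.Doc) for locating the input Word file.
public static class InputPath
{
    public static string Resolve(string path)
    {
        // Candidate 1: the path as given; a relative path resolves against the current working directory.
        string fromCwd = Path.GetFullPath(path);
        if (File.Exists(fromCwd))
            return fromCwd;

        // Candidate 2: relative to the folder that contains the compiled application.
        string fromBase = Path.GetFullPath(Path.Combine(AppContext.BaseDirectory, path));
        if (File.Exists(fromBase))
            return fromBase;

        // Neither location exists: report both so the mismatch is obvious.
        throw new FileNotFoundException(
            $"Word file not found. Checked: {fromCwd} and {fromBase}", path);
    }
}
```

Usage would be doc.LoadFromFile(InputPath.Resolve("ContractTemplate.docx"));. If the Word file is copied to the build output (for example with the "Copy to Output Directory" property in the project), the second check finds it even when the working directory is somewhere else.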

3. Advanced: Read specific paragraphs + format

Sometimes you need only one specific part of the document, or its formatting as well (for example, whether a title is centered). The following code reads the fifth paragraph of the first section together with its alignment and spacing:

using System.IO;
using System.Text;
using Spire.Doc;
using Spire.Doc.Documents;

// Load the document (same file as in the basic example)
Document doc = new Document();
doc.LoadFromFile("ContractTemplate.docx");

// Locate the target paragraph: the fifth paragraph of the first section (indexes start at 0).
// The section must contain at least five paragraphs for index 4 to exist.
Section targetSection = doc.Sections[0];
Paragraph targetPara = targetSection.Paragraphs[4];

// Extract the paragraph text and its format information
string paraText = targetPara.Text;                                  // Paragraph text
HorizontalAlignment align = targetPara.Format.HorizontalAlignment;  // Alignment (Left, Center, Right, ...)
float beforeSpacing = targetPara.Format.BeforeSpacing;              // Spacing before the paragraph, in points
float afterSpacing = targetPara.Format.AfterSpacing;                // Spacing after the paragraph, in points

// Save the results, including the format information
using (StreamWriter sw = new StreamWriter("SpecificParagraphDetails.txt", false, Encoding.UTF8))
{
    sw.WriteLine("=== Paragraph Details ===");
    sw.WriteLine($"Paragraph text: {paraText}");
    sw.WriteLine($"Alignment: {align}");
    sw.WriteLine($"Before spacing: {beforeSpacing}pt | After spacing: {afterSpacing}pt");
}
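Hard-coded paragraph indexes break as soon as the document changes. If you know the part you need by its heading text instead, one alternative is to slice the plain text returned by GetText(). The TextSections.Between helper below is a hypothetical sketch, not a Spire.Doc API; it assumes each heading occupies its own line (for example "Article 3" followed later by "Article 4"), and it returns text only, without formatting.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Spire.Doc) that slices plain text by heading lines.
public static class TextSections
{
    // Returns the non-empty, trimmed lines after the line equal to startHeading,
    // up to (not including) the line equal to endHeading, or to the end of the text
    // if endHeading never appears. Returns an empty list if startHeading is not found.
    public static List<string> Between(string text, string startHeading, string endHeading)
    {
        // "\r\n" is listed first so a Windows line ending is treated as one separator.
        string[] lines = text.Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.None);
        var result = new List<string>();
        bool inside = false;

        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (!inside)
            {
                // Start collecting on the line after the start heading.
                inside = line == startHeading;
                continue;
            }
            if (line == endHeading)
                break;
            if (line.Length > 0)
                result.Add(line);
        }
        return result;
    }
}
```

For example, TextSections.Between(doc.GetText(), "Article 3", "Article 4") would return the non-empty lines of Article 3. If you also need formatting, the same idea can be applied by looping over a section's Paragraphs and comparing each paragraph's Text with the heading.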

Recommended Use Cases

  • Individual developers and small teams;
  • Processing small documents (fewer than 500 paragraphs) such as contracts, reports and templates;
  • Extracting text, basic formatting, tables, images and similar content.

In conclusion, Free Spire.Doc for .NET is a solid choice for lightweight Word extraction: it is free, does not require Office and is easy to use, so it can replace slow, error-prone manual copying.

Top comments (1)

OnlineProxy

You can yank text from .doc/.docx in C# without Office: install FreeSpire.Doc, call Document.LoadFromFile() + Document.GetText(), then dump it with File.WriteAllText(). Easy peasy. In ASP.NET Core, the classic “file not found” is usually path drama: use absolute paths via IWebHostEnvironment.ContentRootPath and make sure the app’s got permissions. For Docker/serverless, roll with .NET 8 base images, install fonts, spin up a fresh Document per request and add solid error handling for pathing, corruption, and other gotchas.