DEV Community

jelizaveta
jelizaveta

Posted on

Python TXT to Excel (Automatic Delimiter Detection)

In data processing workflows, we often encounter the need to convert TXT text files to Excel format. However, a common challenge is that TXT files may use different delimiters—tabs, commas, semicolons, vertical bars, or even spaces. If the code can only handle a single delimiter, errors will occur frequently. This article introduces how to use the Spire.XLS for Python library to create an intelligent tool that can automatically detect delimiters and complete the conversion.

Why Do We Need Automatic Delimiter Detection?

In practice, the TXT files we receive come in various formats:

  • Files exported from databases may use tabs (\t)
  • CSV files typically use commas (,)
  • Files exported from certain systems use vertical bars (|) or semicolons (;)

The traditional approach is to manually inspect the file and then modify the delimiter in the code. This is not only inefficient but also error-prone. Our goal is to let the program figure it out itself!

Core Technology: Spire.XLS for Python

Spire.XLS is a powerful Excel manipulation library that can create, read, and modify Excel files without needing Microsoft Office installed. Compared to openpyxl and xlswriter, its API design is more intuitive, making it particularly suitable for rapid development.

Installation :

pip install spire.xls
Enter fullscreen mode Exit fullscreen mode

Complete Code Implementation

python

from spire.xls import *
from spire.xls.common import *

def detect_delimiter(file_path, sample_lines=5):
    """Automatically detect the most likely delimiter in a text file"""
    common_delimiters = ["\t", ",", "|", ";", " "]

    with open(file_path, "r") as file:
        sample = [file.readline() for _ in range(sample_lines)]

    delimiter_counts = {}
    for delim in common_delimiters:
        count = sum(line.count(delim) for line in sample)
        if count > 0:
            delimiter_counts[delim] = count

    if not delimiter_counts:
        return "\t"  # Default to tab if no delimiter detected

    return max(delimiter_counts, key=delimiter_counts.get)

# Perform conversion
file_path = "Data.txt"
delimiter = detect_delimiter(file_path)
print(f"Detected delimiter: {repr(delimiter)}")

# Read and split data by delimiter
with open(file_path, "r") as file:
    lines = file.readlines()
data = [line.strip().split(delimiter) for line in lines]

# Create Excel workbook and write data
workbook = Workbook()
sheet = workbook.Worksheets[0]

for row_num, row_data in enumerate(data):
    for col_num, cell_data in enumerate(row_data):
        sheet.Range[row_num + 1, col_num + 1].Value = cell_data
        sheet.Range[1, col_num + 1].Style.Font.IsBold = True  # Bold header row

sheet.AllocatedRange.AutoFitColumns()  # Auto-fit column widths
workbook.SaveToFile("TXTtoExcel.xlsx", ExcelVersion.Version2016)
workbook.Dispose()
Enter fullscreen mode Exit fullscreen mode

Code Explanation

1. Delimiter Detection Mechanism

The detect_delimiter() function reads the first 5 lines of the file, counts occurrences of each candidate delimiter, and returns the one with the highest frequency. The sampling approach avoids reading the entire large file, offering excellent performance.

2. Writing Data to Excel

Using Spire.XLS's object model, cells are directly located via row and column indices. Note that Excel's rows and columns start counting from 1, while Python lists start from 0, hence the need for row_num + 1.

3. Formatting Enhancements

  • The header row is automatically bolded for better readability
  • AutoFitColumns() automatically adjusts column widths based on content
  • Resources are released after conversion (Dispose())

Practical Application Example

Suppose there is a Data.txt file with the following content:

text

Name|Department|Salary|Hire Date
Zhang San|Technical|8500|2023-01-15
Li Si|Marketing|9200|2022-11-20
Enter fullscreen mode Exit fullscreen mode

The program will automatically detect the delimiter as | and generate a properly formatted Excel table with bold headers and auto-adjusted column widths.

Important Notes

  1. File Encoding : UTF-8 encoding is used by default. If the file uses a different encoding like GBK, add the encoding='gbk' parameter to open().
  2. Large Data Processing : The free version of Spire.XLS has row limitations for very large files (max 150 rows). For production environments, consider the commercial version.

Summary

By combining automatic delimiter detection technology with Spire.XLS's powerful writing capabilities, we've implemented a universal TXT to Excel conversion tool. This script can:

  • ✅ Automatically identify 5 common delimiters
  • ✅ Complete conversion without manual intervention
  • ✅ Output beautifully formatted Excel tables

Whether you're a data analyst, operations engineer, or regular office user, you can benefit from this tool. You can save this code as a universal utility to handle text files in various formats at any time, dramatically improving work efficiency!

Top comments (0)