DEV Community

Cover image for Excel Import in ASP.NET Core: 12 Mistakes That Cause Real-World Problems
Chloe
Chloe

Posted on

Excel Import in ASP.NET Core: 12 Mistakes That Cause Real-World Problems

Excel import often seems straightforward—until it reaches production.

A spreadsheet uploaded by one user works perfectly, while another triggers validation failures, duplicate records, incorrect dates, or performance issues. Reading Excel files is usually the easy part. The real challenge is handling inconsistent, user-generated data reliably and at scale.

In this article, we'll explore the most common pitfalls developers encounter when importing Excel files in ASP.NET Core applications and discuss practical ways to avoid them. We'll cover file validation, worksheet parsing, data validation, duplicate detection, performance optimization, and security considerations.

The examples use Spire.XLS for worksheet parsing, but the concepts apply equally to EPPlus, ClosedXML, NPOI, MiniExcel, and other Excel libraries.

The Typical Excel Import Pipeline

Many developers think of Excel import as:

Read Excel → Save to Database
Enter fullscreen mode Exit fullscreen mode

In production, the process is usually more complex:

Upload File
    ↓
Validate File
    ↓
Parse Worksheet
    ↓
Transform Data
    ↓
Validate Business Rules
    ↓
Detect Duplicates
    ↓
Bulk Insert
    ↓
Generate Import Report
Enter fullscreen mode Exit fullscreen mode

8-step Excel import pipeline process

Each stage introduces different risks. Understanding the pipeline makes it easier to identify where problems occur and how to prevent them.

Secure File Uploads

Pipeline Stage: Upload File

Before reading worksheets and validating rows, make sure the uploaded file itself is safe to process.

Many import implementations focus heavily on parsing logic while performing little validation on the uploaded file. This can lead to security issues, resource exhaustion, and unexpected runtime failures.

Pitfall #1 — Trusting the File Extension

A file named employees.xlsx is not necessarily a valid Excel file.

Users may accidentally upload the wrong file, and malicious actors can rename almost any file to use an .xlsx extension. If your application trusts the filename alone, invalid files may reach your parser and cause failures.

Instead of validating only the extension, verify the file content itself.

Modern .xlsx files are ZIP-based packages and begin with the magic number 50 4B 03 04.

private static bool IsValidExcelFile(IFormFile file)
{
    using var stream = file.OpenReadStream();

    byte[] magicBytes = new byte[4];
    int bytesRead = stream.Read(magicBytes, 0, 4);

    if (bytesRead < 4)
        return false;

    return magicBytes[0] == 0x50 &&
           magicBytes[1] == 0x4B &&
           magicBytes[2] == 0x03 &&
           magicBytes[3] == 0x04;
}
Enter fullscreen mode Exit fullscreen mode

You should also validate MIME types:

private static readonly string[] AllowedMimeTypes =
{
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "application/vnd.ms-excel"
};

private static bool HasValidMimeType(IFormFile file) =>
    AllowedMimeTypes.Contains(file.ContentType);
Enter fullscreen mode Exit fullscreen mode

A simple rule:

Never trust the filename. Always validate the file itself.

Pitfall #2 — No Upload Size Limit

Excel files are usually small during development.

In production, users may upload spreadsheets containing tens of thousands of rows, embedded images, or multiple worksheets. Without limits, a single upload can consume excessive server resources.

ASP.NET Core allows upload limits directly on the endpoint:

[HttpPost("import")]
[RequestSizeLimit(10_000_000)] // 10 MB
[RequestFormLimits(MultipartBodyLengthLimit = 10_000_000)]
public async Task<IActionResult> Import(IFormFile file)
{
    if (file == null || file.Length == 0)
        return BadRequest("No file uploaded.");

    if (!HasValidMimeType(file))
        return BadRequest("Invalid file type.");

    if (!IsValidExcelFile(file))
        return BadRequest("File does not appear to be a valid Excel document.");

    // proceed to parsing
}
Enter fullscreen mode Exit fullscreen mode

The exact limit depends on your business requirements, but having no limit at all is rarely a good idea.

4-layer file upload filter process

Reading Excel Data Reliably

Pipeline Stage: Parse Worksheet

Once a file passes upload validation, the next challenge is extracting data correctly.

Many import implementations work perfectly with carefully prepared sample files but fail when real users upload spreadsheets containing blank rows, reordered columns, or unexpected formatting.

Pitfall #3 — Hardcoded Column Indexes

A common approach is reading data using fixed column positions:

var name = row[0].Value;
var email = row[3].Value;
var department = row[5].Value;
Enter fullscreen mode Exit fullscreen mode

This works until someone rearranges the worksheet.

If the Email column is moved, your import logic silently starts mapping the wrong data.

A safer approach is header-based mapping:

var headerRow = sheet.Rows[0];
var headers = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

for (int i = 0; i < headerRow.Columns.Count; i++)
{
    var headerValue = headerRow.Columns[i].Value?.ToString()?.Trim();

    if (!string.IsNullOrEmpty(headerValue))
        headers[headerValue] = i;
}

var requiredHeaders = new[] { "Name", "Email", "Department" };

var missingHeaders = requiredHeaders
    .Where(h => !headers.ContainsKey(h))
    .ToList();

if (missingHeaders.Any())
    return BadRequest(
        $"Missing required columns: {string.Join(", ", missingHeaders)}");

for (int i = 1; i < sheet.Rows.Count; i++)
{
    var row = sheet.Rows[i];

    var name = row.Columns[headers["Name"]].Value?.ToString();
    var email = row.Columns[headers["Email"]].Value?.ToString();
    var department = row.Columns[headers["Department"]].Value?.ToString();
}
Enter fullscreen mode Exit fullscreen mode

With this approach, column order becomes irrelevant.

Hardcoded vs header-based mapping comparison

Pitfall #4 — Assuming Row 1 Is Always the Header

Real-world spreadsheets often contain title rows, metadata, or blank lines before the actual header row.

Instead of assuming the first row contains headers, search for the required column names before processing data.

private static int FindHeaderRowIndex(
    Worksheet sheet,
    IEnumerable<string> expectedHeaders)
{
    for (int i = 0; i < Math.Min(sheet.Rows.Count, 10); i++)
    {
        var row = sheet.Rows[i];

        var cellValues = row.Columns
            .Select(c => c.Value?.ToString()?.Trim() ?? string.Empty)
            .ToList();

        if (expectedHeaders.All(h =>
            cellValues.Any(v =>
                v.Equals(h, StringComparison.OrdinalIgnoreCase))))
        {
            return i;
        }
    }

    return -1;
}
Enter fullscreen mode Exit fullscreen mode

Once the header row is located, begin importing from the next row.

This small improvement makes your import process much more tolerant of user-generated spreadsheets.

Pitfall #5 — Empty Rows in the Middle of Data

Users frequently insert blank rows to visually separate sections of data.

Without proper checks, blank rows may cause:

  • Validation errors
  • Null reference exceptions
  • Empty database records

Verify that a row contains data before processing it.

private static bool IsRowEmpty(CellRange row)
{
    return row.Columns.All(c =>
        string.IsNullOrWhiteSpace(c.Value?.ToString()));
}
Enter fullscreen mode Exit fullscreen mode

Then skip empty rows:

for (int i = headerRowIndex + 1; i < sheet.Rows.Count; i++)
{
    var row = sheet.Rows[i];

    if (IsRowEmpty(row))
        continue;

    // process row
}
Enter fullscreen mode Exit fullscreen mode

Filtering empty rows early keeps downstream validation logic much cleaner.

Pitfall #6 — Mixed Data Types in a Column

Excel allows users to enter almost anything into a cell.

Consider a Quantity column:

100
250
N/A
-
Enter fullscreen mode Exit fullscreen mode

Direct conversions can easily throw exceptions:

int quantity = int.Parse(cell.Value);
Enter fullscreen mode Exit fullscreen mode

A safer approach is defensive parsing:

private static int? ParseInt(string value)
{
    if (string.IsNullOrWhiteSpace(value))
        return null;

    return int.TryParse(
        value.Trim(),
        out var result)
        ? result
        : null;
}
Enter fullscreen mode Exit fullscreen mode

Even better, record invalid values as validation errors and allow users to correct them later.

The goal isn't to trust Excel data—it's to assume the data may be imperfect and handle it gracefully.

Validation & Error Reporting

Pipeline Stage: Validate Business Rules

Successfully reading worksheet data doesn't mean the data is valid.

Rows may contain missing fields, malformed emails, invalid dates, negative quantities, or values that violate business rules. The real challenge is deciding how your import process should respond when those errors occur.

Pitfall #7 — Inadequate Validation Strategy

If you've ever received an Excel file with 100 rows and multiple data issues, you've probably experienced this.

The importer finds the first error.

The user fixes it.

Uploads again.

Then discovers the second error.

And repeats that cycle over and over.

Problem 1: Failing Fast

A common mistake is stopping the import as soon as the first error is encountered:

foreach (var row in rows)
{
    if (string.IsNullOrEmpty(row.Email))
        throw new Exception($"Row {row.Index}: Email is required.");

    Save(row);
}
Enter fullscreen mode Exit fullscreen mode

While simple, this approach creates a poor user experience.

Imagine a spreadsheet containing hundreds of rows. The user fixes the first error, uploads again, discovers another error, fixes that one, uploads again, and repeats the process several times.

Instead, collect all validation errors and return them together.

A simple pattern is to separate valid rows from invalid ones:

public class ImportResult<T>
{
    public List<T> ValidRows { get; set; } = new();
    public List<RowError> Errors { get; set; } = new();
    public int TotalRows { get; set; }

    public bool HasErrors => Errors.Any();
    public int SuccessCount => ValidRows.Count;
    public int ErrorCount => Errors.Count;
}

public class RowError
{
    public int RowNumber { get; set; }
    public string Column { get; set; }
    public string Message { get; set; }
}
Enter fullscreen mode Exit fullscreen mode

This allows users to see every issue in a single import attempt.

Problem 2: Missing Business Validation

Another common mistake is validating file structure but not validating business rules.

Consider:

Name  = ""
Email = "abc"
Age   = -10
Enter fullscreen mode Exit fullscreen mode

Excel can read these values successfully, but they clearly violate business requirements.

A dedicated validator for each imported entity keeps validation logic organized and maintainable.

Using FluentValidation:

public class EmployeeRowValidator : AbstractValidator<EmployeeRow>
{
    public EmployeeRowValidator()
    {
        RuleFor(x => x.Name)
            .NotEmpty()
            .MaximumLength(100);

        RuleFor(x => x.Email)
            .NotEmpty()
            .EmailAddress();

        RuleFor(x => x.Age)
            .InclusiveBetween(18, 100);
    }
}
Enter fullscreen mode Exit fullscreen mode

Integrate validation into the import process:

var validator = new EmployeeRowValidator();
var result = new ImportResult<EmployeeRow>();

for (int i = 0; i < dataRows.Count; i++)
{
    var row = dataRows[i];
    var rowNumber = i + 2;

    var validationResult = validator.Validate(row);

    if (!validationResult.IsValid)
    {
        foreach (var error in validationResult.Errors)
        {
            result.Errors.Add(new RowError
            {
                RowNumber = rowNumber,
                Column = error.PropertyName,
                Message = error.ErrorMessage
            });
        }
    }
    else
    {
        result.ValidRows.Add(row);
    }
}
Enter fullscreen mode Exit fullscreen mode

Turning Errors into Actionable Feedback

Once validation is complete, return structured error information that helps users fix problems quickly:

if (result.HasErrors)
{
    return UnprocessableEntity(new
    {
        totalRows = result.TotalRows,
        successCount = result.SuccessCount,
        errorCount = result.ErrorCount,
        errors = result.Errors
    });
}
Enter fullscreen mode Exit fullscreen mode

A response like this allows the frontend to display row numbers, column names, and validation messages in a user-friendly error report.

Bad vs good UX validation strategies

Handling Duplicate Records

Pipeline Stage: Detect Duplicates

Import failures are obvious.

Duplicate records are often more dangerous because they can go unnoticed for a long time.

Pitfall #8 — The Duplicate Problem Nobody Notices

Consider:

Name          Email
John Smith    john@example.com
Sarah Jones   sarah@example.com
John Smith    john@example.com
Enter fullscreen mode Exit fullscreen mode

Without duplicate detection, both rows may be inserted successfully.

Fix #1 — Detect Duplicates Within the File

Track unique values during processing.

For example, if email addresses must be unique:

var seenEmails = new HashSet<string>();

foreach (var employee in employees)
{
    if (!seenEmails.Add(employee.Email))
    {
        errors.Add(new RowError
        {
            Message = $"Duplicate email: {employee.Email}"
        });
    }
}
Enter fullscreen mode Exit fullscreen mode

Fix #2 — Enforce Database Constraints

Application-level validation should never be your only protection.

Create a database constraint:

CREATE UNIQUE INDEX IX_Employees_Email
ON Employees(Email);
Enter fullscreen mode Exit fullscreen mode

Even if duplicate records slip through application validation, the database prevents them from being stored.

Two-layer duplicate protection infographic

When Upserts Make Sense

In some scenarios, users regularly re-import existing datasets.

Instead of rejecting existing records, consider an upsert strategy where existing records are updated while new records are inserted. This approach prevents duplicate data while allowing recurring imports to remain efficient and user-friendly.

Performance at Scale

Pipeline Stage: Parse Worksheet + Bulk Insert

Most Excel import features are tested with files containing only a few hundred rows.

Everything works perfectly—until someone uploads a spreadsheet with 50,000 rows.

Pitfall #9 — Loading the Entire Workbook into Memory

Many Excel libraries load the entire workbook before processing begins.

For large files, this can dramatically increase memory usage and reduce scalability.

If you only need sequential access to rows, consider streaming instead.

MiniExcel supports asynchronous streaming:

using var stream = file.OpenReadStream();

await foreach (var row in MiniExcel.QueryAsync<EmployeeRow>(stream))
{
    // Validate and process row
}
Enter fullscreen mode Exit fullscreen mode

Streaming reduces memory consumption and makes large imports much more efficient.

As a general rule:

If you expect imports containing tens of thousands of rows, avoid keeping the entire dataset in memory whenever possible.

Pitfall #10 — When SaveChanges() Becomes the Bottleneck

Another common bottleneck appears during database insertion.

The following pattern generates one database transaction per row:

foreach (var row in result.ValidRows)
{
    _dbContext.Employees.Add(new Employee { ... });
    await _dbContext.SaveChangesAsync();
}
Enter fullscreen mode Exit fullscreen mode

For large imports, performance quickly deteriorates.

A much better option is bulk insertion.

SqlBulkCopy

For SQL Server, SqlBulkCopy is one of the fastest approaches available:

private async Task BulkInsertAsync(List<EmployeeRow> rows)
{
    var table = new DataTable();

    table.Columns.Add("Name", typeof(string));
    table.Columns.Add("Email", typeof(string));
    table.Columns.Add("Department", typeof(string));

    foreach (var row in rows)
    {
        table.Rows.Add(
            row.Name,
            row.Email,
            row.Department);
    }

    using var connection = new SqlConnection(_connectionString);

    await connection.OpenAsync();

    using var transaction = connection.BeginTransaction();
    using var bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.Default, transaction)
    {
        DestinationTableName = "Employees",
        BatchSize = 1000
    };

    bulkCopy.ColumnMappings.Add("Name", "Name");
    bulkCopy.ColumnMappings.Add("Email", "Email");
    bulkCopy.ColumnMappings.Add("Department", "Department");

    try
    {
        await bulkCopy.WriteToServerAsync(table);
        transaction.Commit();
    }
    catch
    {
        transaction.Rollback();
        throw;
    }
}
Enter fullscreen mode Exit fullscreen mode

Bulk operations can reduce import times dramatically when processing large datasets.

Row-by-row vs batch insert performance

Consider Background Processing for Very Large Imports

Even with streaming and bulk inserts, imports containing hundreds of thousands of rows may still take time to complete.

For these scenarios, consider processing files asynchronously using tools such as Hangfire, .NET Worker Services, or Azure Functions. Returning a job ID immediately and performing the import in the background improves user experience and makes retries easier when failures occur.

Culture & Formatting Issues

Pipeline Stage: Clean & Transform

Some of the most difficult import bugs are caused by data that appears valid but is interpreted differently depending on culture or locale.

Pitfall #11 — Culture-Specific Formatting

Date parsing is one of the most common examples.

The value:

01/02/2025
Enter fullscreen mode Exit fullscreen mode

could represent:

  • January 2, 2025
  • February 1, 2025

depending on locale.

The safest approach is to define a standard format and enforce it consistently.

For example:

Date of Joining (yyyy-MM-dd)
Enter fullscreen mode Exit fullscreen mode

Then parse using an explicit format:

private static DateTime? ParseDate(string value)
{
    if (string.IsNullOrWhiteSpace(value))
        return null;

    if (DateTime.TryParseExact(
            value.Trim(),
            "yyyy-MM-dd",
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out var result))
    {
        return result;
    }

    return null;
}
Enter fullscreen mode Exit fullscreen mode

This ensures consistent behavior regardless of server configuration.

Date format ambiguity across regions

Decimal Separator Confusion

Decimal values present a similar challenge.

Examples:

1,234.56   (US)
1.234,56   (EU)
1 234,56   (Some locales)
Enter fullscreen mode Exit fullscreen mode

Without normalization, the same value may be interpreted differently.

A defensive parsing approach:

private static decimal? ParseDecimal(string value)
{
    if (string.IsNullOrWhiteSpace(value))
        return null;

    var normalized = value.Trim();

    if (normalized.Contains(',') && normalized.Contains('.'))
    {
        int lastComma = normalized.LastIndexOf(',');
        int lastPeriod = normalized.LastIndexOf('.');

        if (lastComma > lastPeriod)
        {
            normalized = normalized
                .Replace(".", "")
                .Replace(",", ".");
        }
        else
        {
            normalized = normalized.Replace(",", "");
        }
    }

    return decimal.TryParse(
        normalized,
        NumberStyles.Any,
        CultureInfo.InvariantCulture,
        out var result)
        ? result
        : null;
}
Enter fullscreen mode Exit fullscreen mode

For applications where numeric precision is critical, the best solution is often to enforce a consistent format through import templates and validation rules.

The Excel Security Risk Most Developers Ignore

Pipeline Stage: Parse + Store

File validation protects your application from invalid uploads, but it doesn't eliminate all security risks. Even a valid Excel file may contain malicious content or exploit weaknesses in the parsing process.

Uploaded spreadsheets are external input, and external input should never be trusted.

Pitfall #12 — Formula Injection

Formula injection occurs when user-supplied values are later exported back to Excel and interpreted as formulas.

For example:

=CMD|'/C calc'!A0
Enter fullscreen mode Exit fullscreen mode

Your import process may treat this as a normal string and store it successfully. Later, when the data is exported and opened in Excel, the value may be executed as a formula.

The characters that trigger formula evaluation are:

=
+
-
@
Enter fullscreen mode Exit fullscreen mode

A simple mitigation is to sanitize any string beginning with these characters.

private static readonly char[] FormulaStartChars =
{
    '=', '+', '-', '@'
};

private static string SanitizeCell(string value)
{
    if (string.IsNullOrEmpty(value))
        return value;

    if (FormulaStartChars.Contains(value[0]))
        return "'" + value;

    return value;
}
Enter fullscreen mode Exit fullscreen mode

Prefixing the value with an apostrophe forces Excel to treat it as plain text instead of a formula.

Apply sanitization during the transformation stage before validation and persistence:

private static EmployeeRow TransformRow(RawRow raw)
{
    return new EmployeeRow
    {
        Name = SanitizeCell(raw.Name?.Trim()),
        Email = SanitizeCell(raw.Email?.Trim()),
        Department = SanitizeCell(raw.Department?.Trim()),
        Notes = SanitizeCell(raw.Notes?.Trim())
    };
}
Enter fullscreen mode Exit fullscreen mode

Formula injection danger vs. safety

Processing Malicious Files Without Isolation

Formula injection targets the data inside a workbook.

A different risk comes from the workbook itself.

Malformed .xlsx files can trigger excessive resource consumption or exploit vulnerabilities in parsing libraries. While actively maintained libraries reduce this risk, production systems should still take precautions.

For most applications, the practical minimum is:

  • Keep parsing libraries updated
  • Apply upload size limits
  • Use rate limiting
  • Consider background processing or isolated workers for large imports

Example rate limiting configuration:

builder.Services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter("import", limiter =>
    {
        limiter.PermitLimit = 5;
        limiter.Window = TimeSpan.FromMinutes(1);
    });
});
Enter fullscreen mode Exit fullscreen mode

These safeguards help reduce the impact of malicious or resource-intensive uploads.

Testing Your Import Pipeline

By this point, we've covered file validation, parsing, validation rules, duplicate detection, performance optimization, formatting issues, and security concerns.

The final step is ensuring those protections continue working as your application evolves.

A simple strategy is maintaining a collection of test files that target common edge cases:

testdata/
├── empty.xlsx
├── duplicate-emails.xlsx
├── invalid-dates.xlsx
└── 100k-rows.xlsx
Enter fullscreen mode Exit fullscreen mode

Whenever a production issue is discovered, add a corresponding test file to the collection.

For example:

[Fact]
public void Validation_CollectsAllErrors_NotJustFirst()
{
    var file = ExcelTestFixtures.CreateWithRows(new[]
    {
        new EmployeeRow
        {
            Name = "",
            Email = "notanemail"
        },
        new EmployeeRow
        {
            Name = "Valid",
            Email = "valid@test.com"
        }
    });

    var result = ImportPipeline.Run(file);

    Assert.Equal(1, result.SuccessCount);
    Assert.Equal(2, result.ErrorCount);
}
Enter fullscreen mode Exit fullscreen mode

A useful rule of thumb:

Every import bug you fix should become a test case.

Over time, this creates a safety net that prevents old issues from reappearing.

Final Thoughts

If there's one lesson I've learned from building Excel import features, it's this:

Reading Excel is rarely the hard part.

The difficult part is dealing with everything that happens after the file is uploaded.

Validation.
Duplicate detection.
Data transformation.
Performance.
Security.
Error reporting.

Those are the areas that determine whether an importer survives real-world usage.

The good news is that most production issues aren't caused by complex algorithms. They're usually caused by assumptions.

Assumptions about file structure.
Assumptions about data quality.
Assumptions about scale.

Question those assumptions early, and your Excel import pipeline will be far more reliable when real users start uploading real spreadsheets.

Top comments (0)