DEV Community

circobit

HTML Tables with Hidden Data: Scraping What You Can't See

The table shows 10 columns. You export it. The CSV has 10 columns.

But the page has 15 columns of data. Where did the other 5 go?

HTML tables often contain more data than what's visible. Hidden columns, data attributes, collapsed rows—all invisible to basic extraction methods.

Here's how to find and extract the data you can't see.

Types of Hidden Data

1. CSS-Hidden Columns

The simplest case: columns exist in the DOM but are hidden with CSS.

<th style="display: none;">Internal ID</th>
<td style="display: none;">12345</td>

Or via classes:

.hidden-column { display: none; }

Why sites do this: Mobile responsiveness (hide columns on small screens), internal data for JavaScript, progressive disclosure.

How to detect: Open DevTools, inspect the table, look for cells with display: none or visibility: hidden.

2. Data Attributes

HTML5 allows custom data-* attributes on any element. Tables use these to store metadata that JavaScript accesses but users don't see.

<tr data-id="12345" data-category="electronics" data-stock="47">
  <td>Laptop</td>
  <td>$999</td>
</tr>

The visible table shows "Laptop" and "$999". But the row carries three extra data points.

Common data attributes:

  • data-id — internal identifier
  • data-sort-value — numeric value for sorting (when display shows "Jan 2024" but sort needs 202401)
  • data-raw — unformatted value (when display shows "$1.2M" but data is 1200000)
  • data-href — link URL

3. Title and Tooltip Attributes

The title attribute provides hover text that often contains additional information.

<td title="Updated: 2024-01-15 14:32:00 UTC">Jan 15</td>

The cell displays "Jan 15". The full timestamp is in the tooltip.

4. Collapsed/Expandable Rows

Tables with drill-down functionality hide detail rows until clicked.

<tr class="parent-row">
  <td>Category A</td>
  <td>$50,000</td>
</tr>
<tr class="child-row" style="display: none;">
  <td>— Subcategory A1</td>
  <td>$30,000</td>
</tr>
<tr class="child-row" style="display: none;">
  <td>— Subcategory A2</td>
  <td>$20,000</td>
</tr>

The data exists. It's just not expanded.

5. Lazy-Loaded Content

Some tables show a subset and load more via JavaScript when you scroll or click "Load More."

<table id="results">
  <!-- First 20 rows rendered -->
</table>
<button onclick="loadMore()">Show More Results</button>

The remaining data isn't in the DOM until you trigger loading.

Extracting Hidden Data

Extracting CSS-Hidden Columns

With JavaScript (in browser console):

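// Force hidden cells to render, whether display: none comes from an inline style or a stylesheet rule
document.querySelectorAll('td, th').forEach(el => {
  if (getComputedStyle(el).display === 'none') {
    el.style.setProperty('display', 'table-cell', 'important');
  }
});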

Now export normally—the columns are visible.

With Python + BeautifulSoup:

BeautifulSoup parses the markup and never applies CSS, so it returns every element in the HTML regardless of visibility.

from bs4 import BeautifulSoup

# html holds the page source as a string
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table')

for row in table.find_all('tr'):
    cells = [td.get_text(strip=True) for td in row.find_all(['td', 'th'])]
    print(cells)  # Includes hidden columns
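
Because a parser returns hidden and visible cells alike, it can help to record which ones were hidden so they can be labeled in the export. The sketch below is one way to do that using only Python's standard library. It assumes the cells are hidden with inline styles and that the markup has explicit closing tags; class-based hiding cannot be detected without the stylesheet. The names `CellCollector` and `collect_cells` are illustrative, not part of any library.

```python
from html.parser import HTMLParser
import re

# Matches inline declarations that hide an element (assumption: inline styles only).
HIDDEN_STYLE = re.compile(r"(display\s*:\s*none|visibility\s*:\s*hidden)", re.I)


class CellCollector(HTMLParser):
    """Collects (text, hidden) pairs for every <td>/<th>, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of (text, hidden) tuples per <tr>
        self._cell = None   # [text_parts, hidden] while inside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            style = dict(attrs).get("style") or ""
            self._cell = [[], bool(HIDDEN_STYLE.search(style))]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = "".join(self._cell[0]).strip()
            if not self.rows:
                self.rows.append([])  # tolerate cells that appear outside a <tr>
            self.rows[-1].append((text, self._cell[1]))
            self._cell = None


def collect_cells(html):
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


sample = """
<table>
  <tr><th>Name</th><th style="display: none;">Internal ID</th></tr>
  <tr><td>Laptop</td><td style="display: none;">12345</td></tr>
</table>
"""
rows = collect_cells(sample)
for row in rows:
    print(row)
```

Each cell comes back with a flag, so an export step can keep, drop, or rename the hidden columns instead of silently mixing them with the visible ones.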

Extracting Data Attributes

JavaScript approach:

const rows = document.querySelectorAll('table tr');
const data = [];

rows.forEach(row => {
  const rowData = {
    // Visible text
    cells: Array.from(row.querySelectorAll('td')).map(td => td.textContent),
    // Data attributes
    id: row.dataset.id,
    category: row.dataset.category,
    stock: row.dataset.stock
  };
  data.push(rowData);
});

console.log(JSON.stringify(data, null, 2));

Python approach:

for row in table.find_all('tr'):
    # Get data attributes
    attrs = {k: v for k, v in row.attrs.items() if k.startswith('data-')}

    # Get cell text
    cells = [td.get_text(strip=True) for td in row.find_all('td')]

    print(f"Data: {attrs}, Cells: {cells}")
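
If the goal is a CSV, the row attributes can be appended as extra columns next to the visible cells. This is a minimal sketch using only the standard library. The helper names (`RowAttrCollector`, `rows_to_csv`) and the column headers are assumptions made for illustration, and it expects well-formed markup with closing tags.

```python
import csv
import io
from html.parser import HTMLParser


class RowAttrCollector(HTMLParser):
    """Collects each <tr>'s data-* attributes together with its <td> text."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            data_attrs = {k: v for k, v in attrs if k.startswith("data-")}
            self.records.append({"attrs": data_attrs, "cells": []})
        elif tag == "td" and self.records:
            self._in_td = True
            self.records[-1]["cells"].append("")

    def handle_data(self, data):
        if self._in_td:
            self.records[-1]["cells"][-1] += data

    def handle_endtag(self, tag):
        if tag == "td" and self._in_td:
            self._in_td = False
            cells = self.records[-1]["cells"]
            cells[-1] = cells[-1].strip()


def rows_to_csv(html, cell_headers):
    parser = RowAttrCollector()
    parser.feed(html)
    parser.close()
    # The union of attribute names, in first-seen order, becomes the extra columns.
    attr_names = []
    for rec in parser.records:
        for name in rec["attrs"]:
            if name not in attr_names:
                attr_names.append(name)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(list(cell_headers) + attr_names)
    for rec in parser.records:
        if not rec["cells"]:
            continue  # skip rows without <td> cells, such as header rows
        writer.writerow(rec["cells"] + [rec["attrs"].get(n, "") for n in attr_names])
    return out.getvalue()


sample = """
<table>
  <tr data-id="12345" data-category="electronics" data-stock="47">
    <td>Laptop</td>
    <td>$999</td>
  </tr>
</table>
"""
csv_text = rows_to_csv(sample, ["Product", "Price"])
print(csv_text)
```

Rows that lack a given attribute get an empty value in that column, which keeps the CSV rectangular.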

Extracting Title Attributes

const cells = document.querySelectorAll('table td');
cells.forEach(cell => {
  if (cell.title) {
    console.log(`Display: ${cell.textContent}, Full: ${cell.title}`);
  }
});
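
The same pairing of display text and tooltip can be done outside the browser. The following is a rough Python equivalent using only the standard library. `TitleCollector` and `collect_titles` are hypothetical names, and the sketch assumes the `title` sits directly on the `<td>` rather than on a nested element.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Pairs each <td>'s visible text with its title attribute, if it has one."""

    def __init__(self):
        super().__init__()
        self.pairs = []      # list of (display_text, title)
        self._title = None   # title of the cell currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._title = dict(attrs).get("title")
            self._text = []

    def handle_data(self, data):
        self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "td":
            if self._title:
                self.pairs.append(("".join(self._text).strip(), self._title))
            self._title = None


def collect_titles(html):
    parser = TitleCollector()
    parser.feed(html)
    parser.close()
    return parser.pairs


sample = (
    '<table><tr>'
    '<td title="Updated: 2024-01-15 14:32:00 UTC">Jan 15</td>'
    '<td>42</td>'
    '</tr></table>'
)
pairs = collect_titles(sample)
for display, full in pairs:
    print(f"Display: {display}, Full: {full}")
```

Cells without a tooltip are skipped, mirroring the `if (cell.title)` check in the JavaScript version.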

Expanding Collapsed Rows

Option 1: Click all expanders

// Find and click all expand buttons
document.querySelectorAll('.expand-btn, .toggle-row').forEach(btn => {
  btn.click();
});

// Wait for DOM to update, then export
setTimeout(() => {
  // Export logic here
}, 1000);

Option 2: Remove hidden class

document.querySelectorAll('.child-row, .detail-row').forEach(row => {
  row.style.display = '';
  row.classList.remove('hidden', 'collapsed');
});
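
When the child rows are already in the page source, as in the earlier markup, no clicking is needed: a parser sees them whether or not they are expanded. The sketch below groups each child row under the parent row that precedes it. It assumes the `parent-row` and `child-row` class names from that example and markup with explicit closing tags; `RowGrouper` and `group_rows` are illustrative names only.

```python
from html.parser import HTMLParser


class RowGrouper(HTMLParser):
    """Groups rows classed "child-row" under the row that precedes them."""

    def __init__(self):
        super().__init__()
        self.groups = []        # list of {"parent": [...], "children": [[...], ...]}
        self._row = None        # cells of the row being read
        self._is_child = False
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            classes = (dict(attrs).get("class") or "").split()
            self._is_child = "child-row" in classes
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_td = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_td:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._row is not None:
            cells = [c.strip() for c in self._row]
            if self._is_child and self.groups:
                self.groups[-1]["children"].append(cells)
            else:
                self.groups.append({"parent": cells, "children": []})
            self._row = None


def group_rows(html):
    parser = RowGrouper()
    parser.feed(html)
    parser.close()
    return parser.groups


sample = """
<table>
  <tr class="parent-row"><td>Category A</td><td>$50,000</td></tr>
  <tr class="child-row" style="display: none;"><td>Subcategory A1</td><td>$30,000</td></tr>
  <tr class="child-row" style="display: none;"><td>Subcategory A2</td><td>$20,000</td></tr>
</table>
"""
groups = group_rows(sample)
print(groups)
```

Keeping the parent/child link matters for totals: summing every row in a flat export would count each category twice.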

Triggering Lazy Load

This is trickier—you need to simulate the action that triggers loading.

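// Scroll to bottom to trigger infinite scroll
window.scrollTo(0, document.body.scrollHeight);

// Or click "Load More" until the button is disabled or removed.
// Top-level await works in the DevTools console; elsewhere, wrap this in an async function.
let loadMore = document.querySelector('.load-more-btn');
while (loadMore && !loadMore.disabled) {
  loadMore.click();
  await new Promise(r => setTimeout(r, 500)); // Wait for load
  loadMore = document.querySelector('.load-more-btn'); // Re-query: the button may have been removed
}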
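
The same idea carries over to automated scripts: keep triggering the load until the row count stops growing, with a cap so the loop always ends. The Python sketch below shows only that control flow. `click_load_more` and `count_rows` are hypothetical callables that whatever browser-automation tool is in use would have to supply, and `FakePage` is a stand-in so the example runs without a browser.

```python
import time


def load_all(click_load_more, count_rows, max_clicks=50, wait_seconds=0.5):
    """Clicks "load more" until the row count stops growing or max_clicks is hit.

    click_load_more and count_rows are hypothetical callables supplied by the
    caller; they are not part of any real library API.
    """
    previous = count_rows()
    for _ in range(max_clicks):
        click_load_more()
        time.sleep(wait_seconds)  # crude wait; a real script would poll for new rows
        current = count_rows()
        if current <= previous:   # nothing new arrived, assume the table is complete
            return current
        previous = current
    return previous


class FakePage:
    """Stand-in for a page that holds 65 rows and reveals 20 per click."""

    def __init__(self, total=65, batch=20):
        self.total = total
        self.batch = batch
        self.visible = min(batch, total)
        self.clicks = 0

    def click(self):
        self.clicks += 1
        self.visible = min(self.visible + self.batch, self.total)

    def count(self):
        return self.visible


page = FakePage()
loaded = load_all(page.click, page.count, wait_seconds=0)
print(loaded, page.clicks)
```

Stopping on "no new rows" rather than on a fixed number of clicks means one extra click is spent confirming the end, which is usually an acceptable cost.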

Real-World Example: Sort Values

Sports statistics tables often display formatted numbers but sort by raw values.

<td data-sort="1234567">1.23M</td>
<td data-sort="20240115">Jan 15, 2024</td>
<td data-sort="0.347">34.7%</td>

If you only extract the visible text, you get formatted strings that won't sort or calculate correctly.

Extraction with both values:

const cells = document.querySelectorAll('table td');
const data = Array.from(cells).map(cell => ({
  display: cell.textContent.trim(),
  sortValue: cell.dataset.sort || cell.textContent.trim()
}));

When Browser Extensions Help

Good table export tools handle some of this automatically:

  • CSS-hidden columns can be included via an option
  • Data attributes can be extracted as additional columns
  • Number normalization can parse "$1.23M" into 1230000
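
When a cell has no raw value in an attribute, number normalization has to work from the display string. The following is a small sketch of that parsing step, not how any particular extension implements it. It assumes K, M, and B mean thousand, million, and billion, treats a trailing % as a fraction, and uses `Decimal` to avoid floating-point rounding. `normalize` is an illustrative name.

```python
from decimal import Decimal
import re

# Assumed suffix meanings; some sources use these letters differently.
SUFFIXES = {"K": Decimal(10) ** 3, "M": Decimal(10) ** 6, "B": Decimal(10) ** 9}
NUMBER = re.compile(r"^\s*[$€£]?\s*(-?[\d,]*\.?\d+)\s*([KMB%]?)\s*$", re.I)


def normalize(text):
    """Returns a Decimal for strings like "$1.23M" or "34.7%", or None if unparseable."""
    match = NUMBER.match(text)
    if not match:
        return None
    value = Decimal(match.group(1).replace(",", ""))
    suffix = match.group(2).upper()
    if suffix == "%":
        return value / 100
    return value * SUFFIXES.get(suffix, 1)


for text in ["$1.23M", "34.7%", "$50,000", "Jan 15"]:
    print(text, "->", normalize(text))
```

Returning `None` for strings such as dates keeps the caller in control: those cells can fall back to the original text instead of being coerced into a wrong number.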

HTML Table Exporter extracts visible content with proper handling of merged cells and formatting. For more complex extraction needs without code, see our guide on scraping tables from websites without code.

For data attributes and deeply hidden content, you may need the JavaScript approaches above.

The Inspection Workflow

Before extracting any table:

  1. Open DevTools (F12)
  2. Inspect the table element
  3. Check for:
    • display: none on columns
    • data-* attributes on rows/cells
    • title attributes with extra info
    • Hidden child rows
    • "Load more" buttons
  4. Decide: Is the visible data enough, or do you need the hidden data?
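
Part of this checklist can be approximated with a script when there are many pages to review. The sketch below counts a few of the signals in raw HTML using only the standard library. It sees inline styles only, so class-based hiding and lazy-loaded rows still need a manual look in DevTools. `TableAudit` and `audit` are hypothetical names.

```python
from collections import Counter
from html.parser import HTMLParser
import re

# Matches inline declarations that hide an element (assumption: inline styles only).
HIDDEN_STYLE = re.compile(r"(display\s*:\s*none|visibility\s*:\s*hidden)", re.I)


class TableAudit(HTMLParser):
    """Counts hidden-data signals on <tr>, <td>, and <th> elements."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag not in ("tr", "td", "th"):
            return
        attrs = dict(attrs)
        hidden = bool(HIDDEN_STYLE.search(attrs.get("style") or ""))
        if hidden and tag == "tr":
            self.counts["hidden rows"] += 1
        elif hidden:
            self.counts["hidden cells"] += 1
        if any(name.startswith("data-") for name in attrs):
            self.counts["elements with data-* attributes"] += 1
        if attrs.get("title"):
            self.counts["elements with title text"] += 1


def audit(html):
    parser = TableAudit()
    parser.feed(html)
    parser.close()
    return dict(parser.counts)


sample = """
<table>
  <tr data-id="1">
    <td title="Updated: 2024-01-15">Jan 15</td>
    <td style="display: none;">12345</td>
  </tr>
  <tr class="child-row" style="display: none;"><td>Detail</td></tr>
</table>
"""
report = audit(sample)
print(report)
```

A report of all zeros does not prove the table is complete, but any non-zero count is a quick hint that the visible export is missing something.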

Most of the time, visible data is sufficient. But when it's not, knowing how to find and extract hidden data makes the difference.

Summary

HTML tables can contain multiple layers of data:

| Layer | How to Access |
|---|---|
| Visible text | Standard export |
| CSS-hidden columns | Remove display:none or use BeautifulSoup |
| Data attributes | JavaScript dataset or Python attrs |
| Title/tooltips | Extract title attribute |
| Collapsed rows | Expand or remove hidden class |
| Lazy-loaded | Trigger loading first |

The data is usually there. You just need to know where to look.


Need to export the data you can see? Learn more at gauchogrid.com/html-table-exporter or try HTML Table Exporter free on the Chrome Web Store.
