The table shows 10 columns. You export it. The CSV has 10 columns.
But the page has 15 columns of data. Where did the other 5 go?
HTML tables often contain more data than what's visible. Hidden columns, data attributes, collapsed rows—all invisible to basic extraction methods.
Here's how to find and extract the data you can't see.
Types of Hidden Data
1. CSS-Hidden Columns
The simplest case: columns exist in the DOM but are hidden with CSS.
<th style="display: none;">Internal ID</th>
<td style="display: none;">12345</td>
Or via classes:
.hidden-column { display: none; }
Why sites do this: Mobile responsiveness (hide columns on small screens), internal data for JavaScript, progressive disclosure.
How to detect: Open DevTools, inspect the table, look for cells with display: none or visibility: hidden.
2. Data Attributes
HTML5 allows custom data-* attributes on any element. Tables use these to store metadata that JavaScript accesses but users don't see.
<tr data-id="12345" data-category="electronics" data-stock="47">
<td>Laptop</td>
<td>$999</td>
</tr>
The visible table shows "Laptop" and "$999". But the row carries three extra data points.
Common data attributes:
- data-id: internal identifier
- data-sort-value: numeric value for sorting (when display shows "Jan 2024" but sort needs 202401)
- data-raw: unformatted value (when display shows "$1.2M" but data is 1200000)
- data-href: link URL
3. Title and Tooltip Attributes
The title attribute provides hover text that often contains additional information.
<td title="Updated: 2024-01-15 14:32:00 UTC">Jan 15</td>
The cell displays "Jan 15". The full timestamp is in the tooltip.
4. Collapsed/Expandable Rows
Tables with drill-down functionality hide detail rows until clicked.
<tr class="parent-row">
<td>Category A</td>
<td>$50,000</td>
</tr>
<tr class="child-row" style="display: none;">
<td>— Subcategory A1</td>
<td>$30,000</td>
</tr>
<tr class="child-row" style="display: none;">
<td>— Subcategory A2</td>
<td>$20,000</td>
</tr>
The data exists. It's just not expanded.
5. Lazy-Loaded Content
Some tables show a subset and load more via JavaScript when you scroll or click "Load More."
<table id="results">
<!-- First 20 rows rendered -->
</table>
<button onclick="loadMore()">Show More Results</button>
The remaining data isn't in the DOM until you trigger loading.
Extracting Hidden Data
Extracting CSS-Hidden Columns
With JavaScript (in browser console):
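// Force every cell to render. Clearing the inline style alone would miss
// columns hidden by a class rule such as .hidden-column, so override with !important.
document.querySelectorAll('td, th').forEach(el => {
  el.style.setProperty('display', 'table-cell', 'important');
});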
Now export normally—the columns are visible.
With Python + BeautifulSoup:
BeautifulSoup ignores CSS. It extracts all elements regardless of visibility.
from bs4 import BeautifulSoup

# `html` is the page source as a string
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table')
for row in table.find_all('tr'):
    cells = [td.get_text(strip=True) for td in row.find_all(['td', 'th'])]
    print(cells)  # Includes hidden columns
Extracting Data Attributes
JavaScript approach:
const rows = document.querySelectorAll('table tr');
const data = [];
rows.forEach(row => {
  const rowData = {
    // Visible text
    cells: Array.from(row.querySelectorAll('td')).map(td => td.textContent),
    // Data attributes
    id: row.dataset.id,
    category: row.dataset.category,
    stock: row.dataset.stock
  };
  data.push(rowData);
});
console.log(JSON.stringify(data, null, 2));
Python approach:
# Reuses `table` from the BeautifulSoup example above
for row in table.find_all('tr'):
    # Get data attributes
    attrs = {k: v for k, v in row.attrs.items() if k.startswith('data-')}
    # Get cell text
    cells = [td.get_text(strip=True) for td in row.find_all('td')]
    print(f"Data: {attrs}, Cells: {cells}")
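To carry those attributes into an export, one option is to append them as extra columns. The sketch below is a minimal illustration, not a definitive implementation: it uses only the standard library's html.parser instead of BeautifulSoup so it runs without extra dependencies, and the `RowCollector` class, the `rows_to_csv` helper, and the "Product"/"Price" headers are hypothetical names chosen for this example.

```python
import csv
import io
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect each <tr> as its data-* attributes plus its cell text."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            data_attrs = {k: v for k, v in attrs if k.startswith("data-")}
            self.rows.append({"attrs": data_attrs, "cells": []})
        elif tag in ("td", "th") and self.rows:
            self._in_cell = True
            self.rows[-1]["cells"].append("")

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1]["cells"][-1] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False


def rows_to_csv(rows, headers):
    """Write visible cells first, then one column per data-* attribute."""
    attr_names = sorted({name for row in rows for name in row["attrs"]})
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(headers + attr_names)
    for row in rows:
        extra = [row["attrs"].get(name, "") for name in attr_names]
        writer.writerow(row["cells"] + extra)
    return out.getvalue()


sample = """
<table>
  <tr data-id="12345" data-category="electronics" data-stock="47">
    <td>Laptop</td><td>$999</td>
  </tr>
</table>
"""

collector = RowCollector()
collector.feed(sample)
csv_text = rows_to_csv(collector.rows, ["Product", "Price"])
print(csv_text)
```

Sorting the attribute names keeps the column order stable even when some rows lack an attribute; those rows get an empty cell in that column.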
Extracting Title Attributes
const cells = document.querySelectorAll('table td');
cells.forEach(cell => {
  if (cell.title) {
    console.log(`Display: ${cell.textContent}, Full: ${cell.title}`);
  }
});
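The same pairing can be done on saved HTML outside the browser. Here is a rough Python equivalent, assuming only the standard library; `TitleCollector` is an illustrative name, and the sample markup reuses the timestamp cell from earlier in this post.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Pair the visible text of each titled <td> with its title attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []          # [display_text, title] for each titled cell
        self._in_titled = False  # True while inside a <td> that has a title

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            title = dict(attrs).get("title")
            self._in_titled = bool(title)
            if title:
                self.pairs.append(["", title])

    def handle_data(self, data):
        if self._in_titled:
            self.pairs[-1][0] += data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_titled = False


sample = (
    "<table><tr>"
    '<td title="Updated: 2024-01-15 14:32:00 UTC">Jan 15</td>'
    "<td>Done</td>"
    "</tr></table>"
)

collector = TitleCollector()
collector.feed(sample)
for display, full in collector.pairs:
    print(f"Display: {display}, Full: {full}")
```

Cells without a title are skipped, which mirrors the `if (cell.title)` check in the console version.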
Expanding Collapsed Rows
Option 1: Click all expanders
// Find and click all expand buttons
document.querySelectorAll('.expand-btn, .toggle-row').forEach(btn => {
  btn.click();
});
// Wait for DOM to update, then export
setTimeout(() => {
  // Export logic here
}, 1000);
Option 2: Remove hidden class
document.querySelectorAll('.child-row, .detail-row').forEach(row => {
  row.style.display = '';
  row.classList.remove('hidden', 'collapsed');
});
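When the child rows are already in the page source, parsing the HTML directly sidesteps expanding altogether, because a parser never applies CSS. The sketch below groups each hidden child row under the parent that precedes it. It assumes the `parent-row` and `child-row` class names from the example earlier in this post (real sites will use their own), and `GroupedRows` is a hypothetical helper built on the standard library.

```python
from html.parser import HTMLParser


class GroupedRows(HTMLParser):
    """Nest each child-row under the most recent non-child row."""

    def __init__(self):
        super().__init__()
        self.groups = []       # [{"parent": [...], "children": [[...], ...]}]
        self._row = None       # cell list of the row currently being read
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            classes = (dict(attrs).get("class") or "").split()
            self._row = []
            if "child-row" in classes and self.groups:
                self.groups[-1]["children"].append(self._row)
            else:
                self.groups.append({"parent": self._row, "children": []})
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False


sample = """
<table>
  <tr class="parent-row"><td>Category A</td><td>$50,000</td></tr>
  <tr class="child-row" style="display: none;"><td>Subcategory A1</td><td>$30,000</td></tr>
  <tr class="child-row" style="display: none;"><td>Subcategory A2</td><td>$20,000</td></tr>
</table>
"""

parser = GroupedRows()
parser.feed(sample)
for group in parser.groups:
    print(group["parent"], "->", group["children"])
```

This only helps when the detail rows ship with the initial HTML. If they are fetched on click, the browser-side options above are still needed.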
Triggering Lazy Load
This is trickier—you need to simulate the action that triggers loading.
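// Scroll to bottom to trigger infinite scroll
window.scrollTo(0, document.body.scrollHeight);

// Or click "Load More" repeatedly (top-level await works in the DevTools console)
const MAX_CLICKS = 50; // Safety cap so the loop cannot run forever
for (let i = 0; i < MAX_CLICKS; i++) {
  // Re-query each time: the button may be removed or replaced once everything is loaded
  const loadMore = document.querySelector('.load-more-btn');
  if (!loadMore || loadMore.disabled) break;
  loadMore.click();
  await new Promise(r => setTimeout(r, 500)); // Wait for load
}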
Real-World Example: Sort Values
Sports statistics tables often display formatted numbers but sort by raw values.
<td data-sort="1234567">1.23M</td>
<td data-sort="20240115">Jan 15, 2024</td>
<td data-sort="0.347">34.7%</td>
If you only extract the visible text, you get formatted strings that won't sort or calculate correctly.
Extraction with both values:
const cells = document.querySelectorAll('table td');
const data = Array.from(cells).map(cell => ({
  display: cell.textContent.trim(),
  sortValue: cell.dataset.sort || cell.textContent.trim()
}));
When Browser Extensions Help
Good table export tools handle some of this automatically:
- CSS-hidden columns can be included via an option
- Data attributes can be extracted as additional columns
- Number normalization can parse "$1.23M" into 1230000
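Number normalization is also easy to approximate yourself when no raw value is available in a data attribute. The function below is a simplified sketch, not how any particular extension implements it: `normalize_number` and the `SUFFIXES` table are hypothetical, and it assumes a leading "$", comma thousands separators, and K/M/B magnitude suffixes. It uses Decimal to avoid floating-point rounding.

```python
from decimal import Decimal
import re

# Multipliers for common magnitude suffixes (an assumed convention; sites vary).
SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def normalize_number(text):
    """Parse strings like "$1.23M", "34.7%", or "$50,000" into a Decimal.

    Returns None when the text does not look like a number.
    """
    cleaned = text.strip().replace(",", "").lstrip("$")
    match = re.fullmatch(r"(-?\d+(?:\.\d+)?)([KMB%]?)", cleaned, re.IGNORECASE)
    if match is None:
        return None
    value = Decimal(match.group(1))
    suffix = match.group(2).upper()
    if suffix == "%":
        return value / 100  # percentages become fractions, e.g. 34.7% -> 0.347
    return value * SUFFIXES.get(suffix, 1)


for raw in ["$1.23M", "34.7%", "$50,000", "Jan 15"]:
    print(raw, "->", normalize_number(raw))
```

Returning None for non-numeric text lets the caller fall back to the original string, the same way the sort-value example falls back to the visible text.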
HTML Table Exporter extracts visible content with proper handling of merged cells and formatting. For more complex extraction needs without code, see our guide on scraping tables from websites without code.
For data attributes and deeply hidden content, you may need the JavaScript approaches above.
The Inspection Workflow
Before extracting any table:
- Open DevTools (F12)
- Inspect the table element
- Check for:
  - display: none on columns
  - data-* attributes on rows/cells
  - title attributes with extra info
  - Hidden child rows
  - "Load more" buttons
- Decide: Is the visible data enough, or do you need the hidden data?
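For saved HTML, parts of this checklist can be automated. The following sketch counts each signal in a single pass. It is a rough first filter under stated assumptions: `TableAudit` is a hypothetical helper, it only sees inline styles (class-based hiding needs the stylesheet, so DevTools remains the reliable check), and it counts every button as a candidate "Load more" control.

```python
from collections import Counter
from html.parser import HTMLParser


class TableAudit(HTMLParser):
    """Count hidden-data signals: hidden cells and rows, data-*, title, buttons."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        style = (attrs.get("style") or "").replace(" ", "").lower()
        hidden = "display:none" in style
        if tag in ("td", "th") and hidden:
            self.counts["hidden_cells"] += 1
        if tag == "tr" and hidden:
            self.counts["hidden_rows"] += 1
        if tag in ("tr", "td", "th"):
            if any(name.startswith("data-") for name in attrs):
                self.counts["data_attribute_elements"] += 1
            if attrs.get("title"):
                self.counts["titled_elements"] += 1
        if tag == "button":
            self.counts["buttons"] += 1


sample = """
<table>
  <tr data-id="1">
    <td title="Updated: 2024-01-15">Jan 15</td>
    <td style="display: none;">12345</td>
  </tr>
  <tr style="display: none;"><td>Subcategory A1</td></tr>
</table>
<button onclick="loadMore()">Show More Results</button>
"""

audit = TableAudit()
audit.feed(sample)
print(dict(audit.counts))
```

Any non-zero count is a prompt to look closer at that layer before deciding whether the visible data is enough.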
Most of the time, visible data is sufficient. But when it's not, knowing how to find and extract hidden data makes the difference.
Summary
HTML tables can contain multiple layers of data:
| Layer | How to Access |
|---|---|
| Visible text | Standard export |
| CSS-hidden columns | Remove display:none or use BeautifulSoup |
| Data attributes | JavaScript dataset or Python attrs |
| Title/tooltips | Extract title attribute |
| Collapsed rows | Expand or remove hidden class |
| Lazy-loaded | Trigger loading first |
The data is usually there. You just need to know where to look.
Need to export the data you can see? Learn more at gauchogrid.com/html-table-exporter or try HTML Table Exporter free on the Chrome Web Store.