Want to know what Warren Buffett is buying? Or what Citadel just dumped from their portfolio?
Every institutional investment manager with discretion over $100 million or more in U.S.-listed stocks (technically, "13(f) securities") must file a Form 13F with the SEC every quarter, disclosing those holdings. This data is completely public, but working with EDGAR's raw filings is a nightmare.
In this tutorial, we'll build a Python tool that tracks institutional holdings programmatically.
The Problem with Raw EDGAR Data
If you've ever tried to pull data from SEC EDGAR, you know the pain:
- CIK numbers instead of company names
- Inconsistent XML/SGML formats across different filings
- Rate limiting (10 requests per second)
- No official JSON API for 13F holdings, just raw filing documents
We'll skip all of that by using the SEC EDGAR Financial Data API, a third-party service on RapidAPI that handles the parsing and returns clean JSON.
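If you do go to EDGAR directly, the 10-requests-per-second cap is easy to respect on the client side. Below is a minimal sketch of a throttle that spaces calls at least 0.1 s apart. `Throttle` is a hypothetical helper (not part of `requests` or any SEC tooling), and its clock and sleep functions are injectable so it can be tested without real delays.

```python
import time


class Throttle:
    """Hypothetical helper: enforce a minimum interval between calls.

    With max_per_second=10, calls are spaced >= 0.1 s apart, which keeps a
    single-threaded client under SEC EDGAR's 10 requests/second limit.
    """

    def __init__(self, max_per_second=10, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until it is safe to make the next request."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now
```

Call `throttle.wait()` before each `requests.get`. The SEC also asks automated clients to send a descriptive `User-Agent` header (your name or company plus a contact email).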
Setup
Get your API key from RapidAPI (free tier: 100 requests/month), then:
```bash
pip install requests tabulate
```
Step 1: Search for an Institutional Investor
```python
import requests

RAPIDAPI_KEY = "YOUR_RAPIDAPI_KEY"
BASE_URL = "https://sec-edgar-financial-data-api.p.rapidapi.com"

HEADERS = {
    "x-rapidapi-host": "sec-edgar-financial-data-api.p.rapidapi.com",
    "x-rapidapi-key": RAPIDAPI_KEY,
}


def search_company(query):
    """Search SEC filers by name and print the first few matches."""
    resp = requests.get(
        f"{BASE_URL}/companies/search",
        params={"query": query},
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
    results = resp.json()  # a list of records with "name" and "cik" fields
    for company in results[:5]:
        print(f"{company['name']} (CIK: {company['cik']})")
    return results


companies = search_company("Berkshire Hathaway")
```
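Hard-coding the key is fine for a quick experiment, but it is easy to commit by accident. Here is a small sketch that reads it from an environment variable instead. The variable name `RAPIDAPI_KEY` and the `build_headers` helper are my own choices, not something the API requires.

```python
import os

API_HOST = "sec-edgar-financial-data-api.p.rapidapi.com"


def build_headers(env_var="RAPIDAPI_KEY"):
    """Build RapidAPI request headers, reading the key from the environment."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "x-rapidapi-host": API_HOST,
        "x-rapidapi-key": key,
    }
```

Run `export RAPIDAPI_KEY=...` in your shell, then use `HEADERS = build_headers()` in place of the hard-coded dictionary.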
Step 2: Get 13F Holdings
```python
from tabulate import tabulate


def get_holdings(cik):
    """Fetch a filer's 13F holdings and print the top 10 positions by value."""
    resp = requests.get(
        f"{BASE_URL}/companies/{cik}/holdings",
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()

    holdings = data.get("holdings", [])
    # Assumes "value" is in US dollars. (Raw 13F filings submitted before
    # January 2023 reported values in thousands of dollars.)
    total_value = sum(h.get("value", 0) for h in holdings)
    print(f"\nTotal Portfolio Value: ${total_value / 1e9:.1f}B")
    print(f"Number of Positions: {len(holdings)}\n")

    # Top 10 by value
    top = sorted(holdings, key=lambda h: h.get("value", 0), reverse=True)[:10]
    table = []
    for h in top:
        name = h.get("nameOfIssuer", "Unknown")
        value = h.get("value", 0)
        pct = (value / total_value * 100) if total_value else 0
        shares = h.get("sharesOrPrincipalAmount", 0)
        table.append([name, f"${value / 1e9:.1f}B", f"{pct:.1f}%", f"{shares:,}"])

    print(tabulate(table, headers=["Company", "Value", "% Portfolio", "Shares"]))
    return holdings


# Berkshire Hathaway's CIK
holdings = get_holdings("1067983")
```
Output:
```text
Total Portfolio Value: $267.4B
Number of Positions: 42

Company              Value     % Portfolio    Shares
-------------------  --------  -------------  -----------
Apple Inc.           $91.2B    34.1%          400,000,000
Bank of America      $29.5B    11.0%          680,233,587
American Express     $26.8B    10.0%          151,610,700
Coca-Cola            $23.6B    8.8%           400,000,000
Chevron              $17.4B    6.5%           118,610,534
```
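One caveat with the top-10 ranking: a single 13F can list the same issuer on several rows (different share classes, or the same stock reported under different sub-managers), so one large holding may show up split into pieces. Whether this API already merges those rows isn't documented here, so below is a hedged sketch that combines rows by issuer name before ranking. It assumes the same `nameOfIssuer` and `value` fields used above; `top_positions` is a hypothetical helper and the sample data is made up.

```python
from collections import defaultdict


def top_positions(holdings, n=10):
    """Merge rows that share an issuer name, then return the n largest.

    Returns a list of (issuer, value, weight) tuples, where weight is the
    issuer's share of total reported value (0.0 to 1.0).
    """
    by_issuer = defaultdict(float)
    for h in holdings:
        by_issuer[h.get("nameOfIssuer", "Unknown")] += h.get("value", 0)

    total = sum(by_issuer.values())
    ranked = sorted(by_issuer.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return [
        (issuer, value, value / total if total else 0.0)
        for issuer, value in ranked
    ]


# Toy input (made-up numbers) with one issuer split across two rows
sample = [
    {"nameOfIssuer": "APPLE INC", "value": 60.0},
    {"nameOfIssuer": "APPLE INC", "value": 20.0},
    {"nameOfIssuer": "COCA COLA CO", "value": 20.0},
]
for issuer, value, weight in top_positions(sample):
    print(f"{issuer:<15} {value:>6.1f} {weight:>6.1%}")
```

Matching on the issuer name is the simplest key available in the fields shown above; if your data also carries a CUSIP, that would be a more robust key.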
Step 3: Track Multiple Funds
```python
# Well-known institutional investors and their SEC CIKs
FUNDS = {
    "Berkshire Hathaway": "1067983",
    "Bridgewater Associates": "1350694",
    "Renaissance Technologies": "1037389",
    "Citadel Advisors": "1423053",
    "Two Sigma": "1179392",
}


def compare_funds():
    print("=" * 60)
    print(" INSTITUTIONAL HOLDINGS COMPARISON")
    print("=" * 60)

    divider = chr(9472) * 40  # a line of "─" box-drawing characters
    for name, cik in FUNDS.items():
        print(f"\n{divider}")
        print(f" {name}")
        print(divider)
        try:
            get_holdings(cik)
        except Exception as e:
            print(f" Error: {e}")


compare_funds()
```
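Each run of `compare_funds()` spends five requests, which adds up quickly against a 100-requests/month free tier. Since a fund's 13F holdings only change when it files (roughly once a quarter), caching responses on disk is a cheap fix. This is a minimal sketch assuming JSON-serializable responses; `cached_fetch`, the cache directory name, and the 24-hour expiry are illustrative choices, and `fetch` stands in for whatever function actually calls the API.

```python
import json
import time
from pathlib import Path


def cached_fetch(key, fetch, cache_dir="edgar_cache", max_age_hours=24):
    """Return fetch() results for `key`, reusing a JSON file on disk if fresh.

    `fetch` is any zero-argument callable returning JSON-serializable data.
    """
    path = Path(cache_dir) / f"{key}.json"
    if path.exists():
        age_hours = (time.time() - path.stat().st_mtime) / 3600
        if age_hours < max_age_hours:
            return json.loads(path.read_text())

    # Cache miss or stale file: call the API and store the result
    data = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data))
    return data
```

For example, `cached_fetch(f"holdings_{cik}", lambda: requests.get(f"{BASE_URL}/companies/{cik}/holdings", headers=HEADERS, timeout=30).json())` only spends a request the first time per day for each CIK.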
Step 4: Find Consensus Picks
One of the most interesting analyses: which stocks are held by several top funds at the same time?
```python
from collections import defaultdict


def find_consensus_picks(fund_ciks, min_funds=3):
    # issuer name -> {fund name -> {"value": total value, "shares": total shares}}
    # Rows are merged per fund because one 13F can list the same issuer on
    # several rows; otherwise a single fund could be counted more than once.
    all_holdings = defaultdict(dict)

    for name, cik in fund_ciks.items():
        try:
            resp = requests.get(
                f"{BASE_URL}/companies/{cik}/holdings",
                headers=HEADERS,
                timeout=30,
            )
            resp.raise_for_status()
            data = resp.json()
        except (requests.RequestException, ValueError) as e:
            print(f"Error fetching {name}: {e}")
            continue

        for h in data.get("holdings", []):
            # Keyed on issuer *name*; different filers may spell it differently
            issuer = h.get("nameOfIssuer", "Unknown")
            entry = all_holdings[issuer].setdefault(name, {"value": 0, "shares": 0})
            entry["value"] += h.get("value", 0)
            entry["shares"] += h.get("sharesOrPrincipalAmount", 0)

    # Keep stocks held by at least `min_funds` distinct funds
    consensus = {
        issuer: funds for issuer, funds in all_holdings.items()
        if len(funds) >= min_funds
    }

    print(f"\n{'=' * 60}")
    print(f" CONSENSUS PICKS (held by {min_funds}+ funds)")
    print(f"{'=' * 60}\n")

    for issuer, funds in sorted(
        consensus.items(),
        key=lambda x: len(x[1]),
        reverse=True,
    ):
        total = sum(f["value"] for f in funds.values())
        fund_names = ", ".join(funds)
        print(f" {issuer}")
        print(f"   Held by {len(funds)} funds | Total value: ${total / 1e9:.1f}B")
        print(f"   Funds: {fund_names}\n")

    return consensus


consensus = find_consensus_picks(FUNDS, min_funds=2)
```
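If you already have each fund's holdings in memory, the counting step doesn't need the network at all. The sketch below separates analysis from fetching, using `collections.Counter` over per-fund sets of issuer names so each fund counts at most once per stock. `consensus_counts` is a hypothetical helper; it assumes the same `nameOfIssuer` field as above and, like the function above, matches on issuer name, so spelling differences between filers will split one stock into separate entries. The portfolios are made-up toy data.

```python
from collections import Counter


def consensus_counts(portfolios, min_funds=2):
    """Count how many funds hold each issuer.

    portfolios: {fund_name: [holding dicts with "nameOfIssuer"], ...}
    Returns [(issuer, n_funds), ...] for issuers held by >= min_funds funds,
    most widely held first (ties broken alphabetically).
    """
    counts = Counter()
    for holdings in portfolios.values():
        # A set, so duplicate rows within one filing count once per fund
        counts.update({h.get("nameOfIssuer", "Unknown") for h in holdings})
    picks = [(issuer, n) for issuer, n in counts.items() if n >= min_funds]
    return sorted(picks, key=lambda x: (-x[1], x[0]))


# Toy portfolios (made-up funds and issuers)
portfolios = {
    "Fund A": [{"nameOfIssuer": "AAA CORP"}, {"nameOfIssuer": "BBB INC"}],
    "Fund B": [{"nameOfIssuer": "AAA CORP"}, {"nameOfIssuer": "AAA CORP"}],
    "Fund C": [{"nameOfIssuer": "AAA CORP"}, {"nameOfIssuer": "BBB INC"}],
}
print(consensus_counts(portfolios, min_funds=2))
```

Keeping this step pure also makes it easy to rerun with different `min_funds` thresholds without spending more API requests.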
What You Can Build From Here
This foundation enables several interesting applications:
- Quarter-over-quarter tracking: diff holdings between filings to see what funds added, reduced, or exited
- Alert system: get notified when a specific fund opens a new position or exits one
- Sector analysis: aggregate holdings by sector to see where institutional money is flowing
- Correlation with price: check whether stock prices move after 13F disclosures (they often do)
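The quarter-over-quarter idea boils down to comparing two snapshots. Here is a minimal sketch, assuming you have already reduced each quarter's holdings to an `{issuer: shares}` dictionary (for example, by summing `sharesOrPrincipalAmount` per `nameOfIssuer`). `diff_holdings` and its category names are illustrative, and the snapshots are toy data. Its "new" and "exited" buckets are exactly what an alert system would watch.

```python
def diff_holdings(prev, curr):
    """Classify position changes between two {issuer: shares} snapshots."""
    changes = {"new": [], "exited": [], "added": [], "reduced": []}
    for issuer in sorted(set(prev) | set(curr)):
        before = prev.get(issuer, 0)
        after = curr.get(issuer, 0)
        if before == 0 and after > 0:
            changes["new"].append((issuer, after))
        elif before > 0 and after == 0:
            changes["exited"].append((issuer, before))
        elif after > before:
            changes["added"].append((issuer, after - before))
        elif after < before:
            changes["reduced"].append((issuer, before - after))
    return changes


# Toy snapshots (made-up share counts)
q1 = {"AAA CORP": 100, "BBB INC": 50, "CCC CO": 10}
q2 = {"AAA CORP": 150, "BBB INC": 20, "DDD LTD": 5}
for kind, items in diff_holdings(q1, q2).items():
    print(kind, items)
```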
API Reference
The SEC EDGAR Financial Data API provides these endpoints:
| Endpoint | Description |
|---|---|
| `/companies/search` | Search SEC filers by name |
| `/companies/{cik}/holdings` | Get 13F institutional holdings |
| `/companies/{cik}/filings` | Get filing history (10-K, 10-Q, 8-K) |
Free tier: 100 requests/month. Pro: $19/month for 5,000 requests.
Full Python wrapper on GitHub: edgar-python
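Not financial advice. 13F data is delayed (filings are due within 45 days after quarter end) and only covers long positions in "13(f) securities": mainly U.S. exchange-listed stocks and ETFs, plus some listed options and convertible bonds. It doesn't show short positions, most fixed income, or cash.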
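To put numbers on that delay: the deadline is 45 calendar days after the quarter ends. This small sketch computes it with the standard library. `quarter_end` and `filing_deadline` are hypothetical helpers, and they ignore the roll-forward to the next business day when day 45 lands on a weekend or holiday.

```python
from datetime import date, timedelta


def quarter_end(year, quarter):
    """Last calendar day of the given quarter (1-4)."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1-4")
    month = quarter * 3
    if month == 12:
        return date(year, 12, 31)
    # Otherwise: first day of the following month, minus one day
    return date(year, month + 1, 1) - timedelta(days=1)


def filing_deadline(year, quarter):
    """Nominal 13F deadline: 45 days after quarter end (no holiday adjustment)."""
    return quarter_end(year, quarter) + timedelta(days=45)


for q in range(1, 5):
    print(f"Q{q} 2024: ends {quarter_end(2024, q)}, 13F due by {filing_deadline(2024, q)}")
```

So holdings as of December 31 may not be public until mid-February, and a fund can have traded out of a position long before you see it.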
What institutional data do you track? Drop a comment — I'm curious what other data sources people are combining with 13F data.