Overview
What data you need depends on your fund: stage focus, sector specialization, team size, and budget. A pre-seed fund sourcing emerging founders needs different data than a growth fund doing due diligence on Series B companies.
This page outlines starter kits for different fund profiles. These are starting points, not prescriptions. Your specific needs will vary.
Pre-Seed / Seed Focus
You’re looking for companies before anyone else knows about them. Signal data based on government registries matters more than comprehensive funding history: even a comprehensive funding database won’t contain many of the companies you’re interested in.
What you need:
- Data to support your macro and market trend research
- Early-stage signal data (who’s starting companies, what’s trending)
- Founder and team data (background, previous experience)
- Basic company data (to track what you find)
What you probably don’t need yet:
- Comprehensive funding databases (most of your targets won’t be in them)
- Detailed financial data (too early for meaningful financials)
Typical stack:
| Category | Recommendation |
|---|---|
| Early signal data | Gravity (US) or Evertrace (Europe) |
| People and company data | Choose either People Data Labs or Coresignal for coverage of people and companies |
| Research tools | Perplexity API for quick market research |
At this stage, signal and people data matter more than comprehensive funding databases. Focus your budget there.
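To make “signal data based on government registries” concrete, here is a minimal sketch that scans a registry-style CSV export for recently incorporated companies in industries you care about. The column names (`company_name`, `incorporated_on`, `industry_code`), the industry codes, and the sample rows are all hypothetical; real registry exports use their own schemas.

```python
import csv
import io
from datetime import date, timedelta

def recent_incorporations(csv_text, as_of, days=30, industry_codes=None):
    """Return names of companies incorporated within `days` before `as_of`.

    Assumes hypothetical columns: company_name, incorporated_on (YYYY-MM-DD),
    industry_code. If `industry_codes` is given, keep only matching rows.
    """
    cutoff = as_of - timedelta(days=days)
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        incorporated = date.fromisoformat(row["incorporated_on"])
        if not (cutoff <= incorporated <= as_of):
            continue  # outside the look-back window
        if industry_codes and row["industry_code"] not in industry_codes:
            continue  # not an industry we track
        hits.append(row["company_name"])
    return hits

# Fictional sample rows for illustration only.
sample = """company_name,incorporated_on,industry_code
Acme Robotics Ltd,2024-05-20,62012
Old Widgets Ltd,2019-01-15,62012
Bakery Co Ltd,2024-05-25,10710
"""
print(recent_incorporations(sample, as_of=date(2024, 6, 1), industry_codes={"62012"}))
# ['Acme Robotics Ltd']
```

In practice you would run this against each new registry release and push the hits into whatever tool you use to track companies.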
Series A / Series B Focus
You’re evaluating companies with some traction. You need a balance of signal data and comprehensive coverage.
What you need:
- Funding history and investor data
- Growth signals (hiring, web traffic, product launches)
- Team composition and changes
- Competitive landscape data
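Hiring is one growth signal you can quantify yourself once you have people data. A minimal sketch, assuming you already have monthly headcount snapshots per company from a people-data or signal provider; the dict-of-lists input format, the six-month window, and the 50% threshold are hypothetical choices, not provider defaults.

```python
def headcount_growth(monthly_counts, months=6):
    """Fractional headcount growth over the last `months` months.

    `monthly_counts` is a list of headcounts ordered oldest to newest
    (hypothetical input format). Returns None if there is not enough
    history or the starting headcount is zero.
    """
    if len(monthly_counts) <= months:
        return None
    start, end = monthly_counts[-months - 1], monthly_counts[-1]
    if start == 0:
        return None
    return (end - start) / start

def fast_growers(companies, months=6, threshold=0.5):
    """Names whose growth over `months` is at least `threshold` (0.5 = 50%)."""
    result = []
    for name, counts in companies.items():
        growth = headcount_growth(counts, months)
        if growth is not None and growth >= threshold:
            result.append(name)
    return result

snapshots = {
    "company_a": [10, 11, 12, 14, 15, 18, 20],  # fictional data
    "company_b": [40, 40, 41, 41, 42, 42, 43],
}
print(fast_growers(snapshots))  # ['company_a']
```

Relative growth favors small teams (going from 2 to 4 people is 100%), so you may want a minimum starting headcount as well.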
What you probably don’t need:
- Deep public market data
- Heavy patent/research databases (unless sector-specific)
Typical stack:
| Category | Recommendation |
|---|---|
| Company data | Crunchbase or Dealroom; PitchBook if you can splurge |
| Signal data | Specter or Harmonic for growth signals |
| People data | People Data Labs, Coresignal, or MixRank for team composition |
| Web traffic | SimilarWeb if evaluating consumer companies |
| Research | Perplexity API or Exa for competitive research |
This is the “balanced” tier: you need both signal data (to find companies with momentum) and comprehensive company data (for due diligence). It’s also where you might start experimenting with “flat files” instead of APIs (see Accessing Data).
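Much of the practical work with flat files is matching rows in a bulk dump to companies you already track. A minimal sketch, assuming a CSV dump with hypothetical `name` and `website` columns: normalize websites to bare domains and join on those, since company names are often spelled inconsistently across sources. The column names and sample rows are made up for illustration.

```python
import csv
import io
from urllib.parse import urlparse

def normalize_domain(url):
    """Reduce a URL or bare domain to a lowercase host without 'www.'."""
    url = url.strip().lower()
    if "://" not in url:
        url = "http://" + url  # urlparse needs a scheme to find the host
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def match_watchlist(dump_csv, watchlist_urls):
    """Return {domain: row} for dump rows whose website is on the watchlist."""
    wanted = {normalize_domain(u) for u in watchlist_urls}
    matches = {}
    for row in csv.DictReader(io.StringIO(dump_csv)):
        domain = normalize_domain(row["website"])
        if domain in wanted:
            matches[domain] = row
    return matches

dump = """name,website
Acme Inc.,https://www.acme.example/
Globex,globex.example
"""
print(sorted(match_watchlist(dump, ["acme.example", "http://initech.example"])))
# ['acme.example']
```

Domains are not a perfect key (companies rebrand, and some share a parent domain), but they are usually far more stable than names.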
Growth / Late Stage Focus
You’re doing deeper due diligence on established companies. Comprehensive data and financial metrics matter most.
What you need:
- Comprehensive funding databases
- Financial and operational metrics
- Market and competitive analysis
- Public company comparables
What you probably don’t need:
- Early-stage signal data (your targets are already known)
Typical stack:
| Category | Recommendation |
|---|---|
| Company data | PitchBook for comprehensive financials, valuations, cap tables |
| Financial data | S&P Capital IQ for public comps |
| People data | People Data Labs for team composition and hiring trends |
At this stage, the coverage and quality of premium data become worth the investment: you need detailed financials, valuation history, and deal terms that lighter providers don’t offer. You probably also need full data dumps rather than just API access.
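One thing full dumps let you do is diff successive snapshots yourself to see what changed (new rounds, updated totals) instead of re-querying an API company by company. A minimal sketch, assuming each snapshot has already been loaded into a dict keyed by a stable company ID; the field names and values are hypothetical.

```python
def diff_snapshots(old, new):
    """Compare two snapshots shaped {company_id: {field: value}}.

    Returns (added_ids, removed_ids, changes), where `changes` maps each
    company_id present in both snapshots to {field: (old_value, new_value)}
    for every field whose value differs.
    """
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changes = {}
    for cid in old.keys() & new.keys():
        fields = old[cid].keys() | new[cid].keys()
        delta = {
            f: (old[cid].get(f), new[cid].get(f))
            for f in fields
            if old[cid].get(f) != new[cid].get(f)
        }
        if delta:
            changes[cid] = delta
    return added, removed, changes

# Fictional snapshots for illustration only.
january = {"c1": {"last_round": "Series B", "total_raised_musd": 40}}
february = {
    "c1": {"last_round": "Series C", "total_raised_musd": 90},
    "c2": {"last_round": "Series A", "total_raised_musd": 12},
}
added, removed, changes = diff_snapshots(january, february)
print(added, removed, changes["c1"]["last_round"])
# ['c2'] [] ('Series B', 'Series C')
```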
Deep Tech / Bio Focus
You’re evaluating technical founders and novel technology. Research and patent data become critical.
What you need:
- Academic publication databases
- Patent and IP data
- Technical founder backgrounds
- Research institution connections
Additional considerations:
- Many deep tech companies won’t appear in standard funding databases until later
- Founder evaluation requires different signals (publications, citations, lab affiliations)
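Citation counts are commonly summarized with the h-index: the largest h such that a researcher has h papers with at least h citations each. Semantic Scholar, for example, reports an h-index on author profiles, but if you assemble publication data yourself it is easy to compute from per-paper citation counts. A minimal sketch:

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break  # counts only decrease from here, so no larger h is possible
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4
```

The h-index varies a lot by field and career stage, so it works better for comparing founders within one discipline than across them.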
Typical stack:
| Category | Recommendation |
|---|---|
| Academic | arXiv (AI/ML, physics), PubMed (bio/healthcare) |
| Research tools | Semantic Scholar (citation networks, research impact) or Lens (linking patents to academic research) |
| People data | People Data Labs for founder backgrounds |
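arXiv offers a free query API that returns an Atom feed. A minimal sketch using only the standard library: build a query URL for an author (the `au:` field prefix and the `search_query`, `max_results`, `sortBy`, and `sortOrder` parameters come from arXiv’s API; the author string is a placeholder) and parse titles and publication dates from the response. The parser is exercised here on a tiny hand-written feed rather than a live request.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
ARXIV_API = "https://export.arxiv.org/api/query"

def arxiv_author_query_url(author, max_results=20):
    """Build an arXiv API URL listing an author's most recent submissions."""
    params = {
        "search_query": f"au:{author}",
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return ARXIV_API + "?" + urllib.parse.urlencode(params)

def parse_arxiv_feed(xml_text):
    """Return [(title, published_date)] from an arXiv Atom feed."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall(f"{ATOM}entry"):
        # Titles in the feed can contain line breaks; collapse whitespace.
        title = " ".join(entry.findtext(f"{ATOM}title", "").split())
        published = entry.findtext(f"{ATOM}published", "")[:10]
        papers.append((title, published))
    return papers

# Live use (network required): import urllib.request, then
#   urllib.request.urlopen(arxiv_author_query_url("...")).read()
sample_feed = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>An Example
      Paper Title</title>
    <published>2024-03-01T12:00:00Z</published>
  </entry>
</feed>"""
print(parse_arxiv_feed(sample_feed))
# [('An Example Paper Title', '2024-03-01')]
```

arXiv asks API clients to throttle their requests, so add a delay between calls if you loop over many founders.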
For bio/healthcare specifically, add:
| Category | Recommendation |
|---|---|
| Clinical trials | ClinicalTrials.gov |
| Drug pipeline | BioMedTracker for pipeline intelligence |
| FDA data | FDA databases |
Research and patent data are mostly free. Your budget goes toward people data and specialized tools like BioMedTracker.
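ClinicalTrials.gov is one of these free sources and has a JSON API (version 2). A minimal sketch, assuming the v2 `/api/v2/studies` endpoint and its `query.spons` (sponsor/collaborator) search parameter: build a request for a company’s trials and pull out the NCT ID, title, status, and phases from each study record. The response paths follow the v2 study structure as I understand it; verify them against the current API documentation. The sponsor name and sample record are placeholders.

```python
import urllib.parse

CTGOV_STUDIES = "https://clinicaltrials.gov/api/v2/studies"

def trials_url(sponsor, page_size=50):
    """Build a ClinicalTrials.gov v2 search URL for a sponsor's studies."""
    params = {"query.spons": sponsor, "pageSize": page_size, "format": "json"}
    return CTGOV_STUDIES + "?" + urllib.parse.urlencode(params)

def summarize_trials(response):
    """Extract (nct_id, title, status, phases) from a parsed v2 response dict."""
    rows = []
    for study in response.get("studies", []):
        proto = study.get("protocolSection", {})
        ident = proto.get("identificationModule", {})
        status = proto.get("statusModule", {})
        design = proto.get("designModule", {})
        rows.append((
            ident.get("nctId"),
            ident.get("briefTitle"),
            status.get("overallStatus"),
            design.get("phases", []),
        ))
    return rows

sample_response = {  # placeholder record shaped like a v2 response
    "studies": [{
        "protocolSection": {
            "identificationModule": {"nctId": "NCT00000000", "briefTitle": "Example Study"},
            "statusModule": {"overallStatus": "RECRUITING"},
            "designModule": {"phases": ["PHASE2"]},
        }
    }]
}
print(summarize_trials(sample_response))
# [('NCT00000000', 'Example Study', 'RECRUITING', ['PHASE2'])]
```

Sponsor names are free text, so expect to try a few spellings of a company’s name (and its subsidiaries) to get full coverage.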
Regional / Sector Specialist
You focus on a specific geography or vertical. Niche data providers often have better coverage than generalists.
What you need:
- Regional/sector-specific data providers
- Local market intelligence
- Sector-specific signals and metrics
Key insight:
Generalist data providers often have weak coverage outside US tech. If you invest in Europe, Asia, or specific verticals, look for specialized providers who focus on your market.
By geography:
| Region | Recommendations |
|---|---|
| US | Gravity for signals, Crunchbase for company data |
| Europe | Evertrace for signals, Dealroom for company data |
By sector:
| Sector | Recommendations |
|---|---|
| Consumer | SimilarWeb for web traffic, data.ai for mobile apps |
| E-commerce | Jungle Scout for Amazon, SimilarWeb for traffic |
| Real estate | CARTO or SafeGraph for location intelligence |
| Fintech | SEC EDGAR for filings, standard company providers for funding data |
| Climate | EIA for energy data, EPA databases for emissions |
The key is finding providers with deep coverage in your specific market rather than relying on generalists.
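SEC EDGAR is a good example of a free, sector-relevant source. A minimal sketch using its submissions API: the `data.sec.gov/submissions/CIK##########.json` endpoint takes a CIK zero-padded to 10 digits, and its `filings.recent` object holds parallel arrays such as `form`, `filingDate`, and `accessionNumber`. The code filters for Form D (and amendments, `D/A`), the notice companies file for exempt securities offerings, which makes it a useful private-fundraising signal. The SEC asks automated clients to send a descriptive User-Agent; the one below is a placeholder, and the sample response is hand-made.

```python
import urllib.request

def submissions_request(cik, user_agent="Example Fund research@example.com"):
    """Build a request for a company's EDGAR submissions JSON.

    The CIK is zero-padded to 10 digits. The User-Agent is a placeholder;
    replace it with your own name and contact email.
    """
    url = f"https://data.sec.gov/submissions/CIK{int(cik):010d}.json"
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def form_d_filings(submissions):
    """Return (filingDate, accessionNumber) pairs for Form D and D/A filings."""
    recent = submissions.get("filings", {}).get("recent", {})
    forms = recent.get("form", [])
    dates = recent.get("filingDate", [])
    accessions = recent.get("accessionNumber", [])
    return [
        (d, a) for f, d, a in zip(forms, dates, accessions)
        if f in ("D", "D/A")
    ]

# Live use (network required, also needs `import json`):
#   json.load(urllib.request.urlopen(submissions_request(cik)))
sample_submissions = {  # hand-made, shaped like the real response
    "filings": {"recent": {
        "form": ["D", "8-K", "D/A"],
        "filingDate": ["2024-02-01", "2024-03-15", "2024-04-10"],
        "accessionNumber": ["0000000000-24-000001", "0000000000-24-000002",
                            "0000000000-24-000003"],
    }}
}
print(form_d_filings(sample_submissions))
# [('2024-02-01', '0000000000-24-000001'), ('2024-04-10', '0000000000-24-000003')]
```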
Budget Considerations
Your data budget should scale with fund size and strategy:
- Small fund (under $50M): Focus on 1-2 core providers. Start with what you absolutely need.
- Mid-size fund ($50-250M): Can afford broader coverage, typically 3-5 providers.
- Large fund (over $250M): 3-5 core providers (typically more expensive ones), plus additional budget reserved for project- or deal-specific data sources.