# Data Starter Kits

> Recommended data stacks for different fund types, stages, and focus areas.

## Overview

What data you need depends on your fund: stage focus, sector specialization, team size, and budget. A pre-seed fund sourcing emerging founders needs different data than a growth fund doing due diligence on Series B companies.

This page outlines starter kits for different fund profiles. These are starting points, not prescriptions. Your specific needs will vary.

## Pre-Seed / Seed Focus

You're looking for companies before anyone else knows about them. Signal data based on government registries matters more than comprehensive funding history: even a comprehensive funding database will miss many of the companies you're interested in, because most of them haven't raised a recorded round yet.

**What you need:**

* Data to support your macro and market trend research
* Early-stage signal data (who's starting companies, what's trending)
* Founder and team data (background, previous experience)
* Basic company data (to track what you find)

**What you probably don't need yet:**

* Comprehensive funding databases (most of your targets won't be in them)
* Detailed financial data (too early for meaningful financials)

**Typical stack:**

| Category                | Recommendation                                                                                                                                  |
| ----------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------- |
| Early Signal data       | [Gravity](https://www.gravity.inc/) (US) or [Evertrace](https://www.evertrace.io/) (Europe)                                                     |
| People and Company data | Choose either [People Data Labs](https://www.peopledatalabs.com/) or [Coresignal](https://coresignal.com/) for coverage of people and companies |
| Research tools          | [Perplexity API](https://docs.perplexity.ai) for quick market research                                                                          |

At this stage, signal and people data matter more than comprehensive funding databases. Focus your budget there.

***

## Series A / Series B Focus

You're evaluating companies with some traction. You need a balance of signal data and comprehensive coverage.

**What you need:**

* Funding history and investor data
* Growth signals (hiring, web traffic, product launches)
* Team composition and changes
* Competitive landscape data

**What you probably don't need:**

* Deep public market data
* Heavy patent/research databases (unless sector-specific)

**Typical stack:**

| Category     | Recommendation                                                                                                                                      |
| ------------ | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| Company data | [Crunchbase](https://www.crunchbase.com/) or [Dealroom](https://dealroom.co/); [PitchBook](https://www.pitchbook.com/) if you can splurge           |
| Signal data  | [Specter](https://www.tryspecter.com/) or [Harmonic](https://harmonic.ai/) for growth signals                                                       |
| People data  | [People Data Labs](https://www.peopledatalabs.com/), [Coresignal](https://coresignal.com/), or [MixRank](https://mixrank.com/) for team composition |
| Web traffic  | [SimilarWeb](https://www.similarweb.com/) if evaluating consumer companies                                                                          |
| Research     | [Perplexity API](https://docs.perplexity.ai) or [Exa](https://exa.ai/) for competitive research                                                     |

This is the "balanced" tier. You need both signal data (to find companies with momentum) and comprehensive company data (for due diligence). It is also the stage where you might start to experiment with "flat files" instead of APIs (see [Accessing Data](/guide/part-3-technical-foundations/data-providers/accessing-data)).
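As a rough illustration of what working with a flat file can look like, the sketch below reads a CSV of monthly headcount snapshots and ranks companies by hiring growth. The column names (`company`, `month`, `headcount`) and the sample rows are hypothetical, not any provider's actual schema.

```python
import csv
import io
from collections import defaultdict


def headcount_growth(csv_file) -> dict[str, float]:
    """Fractional headcount growth between each company's first and last snapshot."""
    snapshots = defaultdict(list)
    for row in csv.DictReader(csv_file):
        snapshots[row["company"]].append((row["month"], int(row["headcount"])))

    growth = {}
    for company, rows in snapshots.items():
        rows.sort()  # ISO "YYYY-MM" strings sort chronologically
        first, last = rows[0][1], rows[-1][1]
        # Skip companies with a single snapshot or a zero starting headcount.
        if len(rows) > 1 and first > 0:
            growth[company] = (last - first) / first
    return growth


# Hypothetical sample data standing in for a provider's flat-file export.
SAMPLE = """company,month,headcount
acme,2024-01,20
acme,2024-07,30
globex,2024-01,100
globex,2024-07,105
"""

growth = headcount_growth(io.StringIO(SAMPLE))
ranked = sorted(growth.items(), key=lambda kv: kv[1], reverse=True)
```

With a real export you would pass an open file handle instead of `io.StringIO`, and you would likely normalize growth by time window before comparing companies.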

***

## Growth / Late Stage Focus

You're doing deeper due diligence on established companies. Comprehensive data and financial metrics matter most.

**What you need:**

* Comprehensive funding databases
* Financial and operational metrics
* Market and competitive analysis
* Public company comparables

**What you probably don't need:**

* Early-stage signal data (your targets are already known)

**Typical stack:**

| Category       | Recommendation                                                                                                      |
| -------------- | ------------------------------------------------------------------------------------------------------------------- |
| Company data   | [PitchBook](https://pitchbook.com/) for comprehensive financials, valuations, cap tables                            |
| Financial data | [S\&P Capital IQ](https://www.spglobal.com/marketintelligence/en/solutions/sp-capital-iq-platform) for public comps |
| People data    | [People Data Labs](https://www.peopledatalabs.com/) for team composition and hiring trends                          |

At this stage, the coverage and quality of premium data become worth the investment. You need detailed financials, valuation history, and deal terms that lighter providers don't offer. You probably also need full data dumps, rather than just API access.

***

## Deep Tech / Bio Focus

You're evaluating technical founders and novel technology. Research and patent data become critical.

**What you need:**

* Academic publication databases
* Patent and IP data
* Technical founder backgrounds
* Research institution connections

**Additional considerations:**

* Many deep tech companies won't appear in standard funding databases until later
* Founder evaluation requires different signals (publications, citations, lab affiliations)

**Typical stack:**

| Category       | Recommendation                                                                                                                                                           |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Academic       | [arXiv](https://arxiv.org/) (AI/ML, physics), [PubMed](https://pubmed.ncbi.nlm.nih.gov/) (bio/healthcare)                                                                |
| Research tools | [Semantic Scholar](https://www.semanticscholar.org/) for citation networks and research impact, or [Lens](https://www.lens.org/) for linking patents to academic research |
| People data    | [People Data Labs](https://www.peopledatalabs.com/) for founder backgrounds                                                                                              |

**For bio/healthcare specifically, add:**

| Category        | Recommendation                                                                              |
| --------------- | ------------------------------------------------------------------------------------------- |
| Clinical trials | [ClinicalTrials.gov](https://clinicaltrials.gov/)                                           |
| Drug pipeline   | [BioMedTracker](https://www.biomedtracker.com/) for pipeline intelligence                   |
| FDA data        | [FDA databases](https://www.fda.gov/drugs/drug-approvals-and-databases/drugsfda-data-files) |

Research and patent data are mostly free. Your budget goes toward people data and specialized tools like BioMedTracker.
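Because arXiv exposes a public query API that returns Atom XML, a first pass at monitoring new papers in a field can be scripted with the standard library alone. The following is a minimal sketch: the function names are illustrative, and rate limiting, paging, and matching authors to founders are left out.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "https://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used by the feed


def build_arxiv_url(category: str, max_results: int = 10) -> str:
    """Build a query URL for the newest submissions in one arXiv category."""
    params = {
        "search_query": f"cat:{category}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_arxiv_feed(xml_text) -> list[dict]:
    """Extract title, date, authors, and link from an Atom feed (str or bytes)."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall(f"{ATOM}entry"):
        title = entry.findtext(f"{ATOM}title", default="")
        papers.append({
            "title": " ".join(title.split()),  # collapse line breaks in titles
            "published": entry.findtext(f"{ATOM}published", default=""),
            "authors": [a.findtext(f"{ATOM}name", default="")
                        for a in entry.findall(f"{ATOM}author")],
            "url": entry.findtext(f"{ATOM}id", default=""),
        })
    return papers


def fetch_recent(category: str, max_results: int = 10) -> list[dict]:
    """Call the live API; not invoked here so the sketch runs offline."""
    with urllib.request.urlopen(build_arxiv_url(category, max_results), timeout=30) as resp:
        return parse_arxiv_feed(resp.read())
```

Calling `fetch_recent("cs.LG")` would return the latest machine learning submissions as a list of dictionaries, which you could then join against your people data to spot authors worth tracking.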

***

## Regional / Sector Specialist

You focus on a specific geography or vertical. Niche data providers often have better coverage than generalists.

**What you need:**

* Regional/sector-specific data providers
* Local market intelligence
* Sector-specific signals and metrics

**Key insight:**

Generalist data providers often have weak coverage outside US tech. If you invest in Europe, Asia, or specific verticals, look for specialized providers who focus on your market.

**By geography:**

| Region | Recommendations                                                                                             |
| ------ | ----------------------------------------------------------------------------------------------------------- |
| US     | [Gravity](https://www.gravity.inc/) for signals, [Crunchbase](https://www.crunchbase.com/) for company data |
| Europe | [Evertrace](https://www.evertrace.io/) for signals, [Dealroom](https://dealroom.co/) for company data       |

**By sector:**

| Sector      | Recommendations                                                                                                       |
| ----------- | --------------------------------------------------------------------------------------------------------------------- |
| Consumer    | [SimilarWeb](https://www.similarweb.com/) for web traffic, [data.ai](https://www.data.ai/) for mobile apps            |
| E-commerce  | [Jungle Scout](https://www.junglescout.com/) for Amazon, [SimilarWeb](https://www.similarweb.com/) for traffic        |
| Real estate | [CARTO](https://carto.com/) or [SafeGraph](https://safegraph.com/) for location intelligence                          |
| Fintech     | [SEC EDGAR](https://www.sec.gov/edgar) for filings, standard company providers for funding data                       |
| Climate     | [EIA](https://www.eia.gov/) for energy data, [EPA databases](https://www.epa.gov/enviro/data-downloads) for emissions |

The key is finding providers with deep coverage in your specific market rather than relying on generalists.

***

## Budget Considerations

Your data budget should scale with fund size and strategy:

* **Small fund (under \$50M):** Focus on 1-2 core providers. Start with what you absolutely need.
* **Mid-size fund (\$50-250M):** You can afford broader coverage; 3-5 providers is typical.
* **Large fund (over \$250M):** Also 3-5 providers (but typically more expensive ones), plus additional budget reserved for project- or deal-specific data sources.
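
If you want to encode these tiers in an internal planning tool, a lookup like the one below is one way to do it. This is only a sketch of the rule of thumb above: the names are hypothetical, and treating exactly \$50M and \$250M as mid-size is an assumption, since the tiers above don't say which side the boundaries fall on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataBudgetTier:
    name: str
    min_providers: int
    max_providers: int
    note: str


def data_budget_tier(fund_size_musd: float) -> DataBudgetTier:
    """Map fund size (in millions of USD) to the rule-of-thumb tiers above."""
    if fund_size_musd < 0:
        raise ValueError("fund size cannot be negative")
    if fund_size_musd < 50:
        return DataBudgetTier("small", 1, 2, "start with what you absolutely need")
    if fund_size_musd <= 250:  # boundary handling is an assumption
        return DataBudgetTier("mid-size", 3, 5, "broader coverage")
    return DataBudgetTier(
        "large", 3, 5,
        "pricier providers, plus budget for project- or deal-specific sources",
    )


tier = data_budget_tier(120)
summary = f"{tier.name}: {tier.min_providers}-{tier.max_providers} providers"
```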
