Overview
Much of the technical work at a VC fund, especially when you’re just getting started, is building glue between different tools your team already uses. Connecting Granola (meeting notes) to Attio (CRM). Connecting your data warehouse to an MCP server. Connecting external data providers to your sourcing tool. Connecting portfolio company data to your dashboards.

Some of these integrations exist out of the box. Granola has a built-in Attio integration. Many CRMs integrate with common productivity tools. But many connections require custom code: a service that runs on a schedule to sync data, a webhook handler that processes events from vendors, or an API wrapper that normalizes different data formats.

This chapter covers common integration patterns you’ll encounter, how to validate data from external APIs (the biggest source of problems), how to handle webhooks and rate limits, and how to design internal APIs if you’re building services. The focus is on practical patterns that work for VC funds, not comprehensive API design theory.

Common Integration Patterns
The integrations you build fall into a few common patterns.

Tool-to-tool glue: connecting SaaS tools your team uses. Examples (a minimal sketch of this kind of glue follows the list):
- Meeting transcription tool (Granola, Otter) → CRM (Attio) to automatically log conversations
- CRM → data warehouse for analysis
- Email → CRM to track outreach
- Calendar → CRM to log meetings
- Portfolio company dashboards (Pry, Numeric) → your data warehouse for consolidated metrics
- PitchBook API → data warehouse for funding data
- People Data Labs → database for employee/people data (LinkedIn-like data)
- Harmonic/Specter → sourcing tool or data warehouse
- Crunchbase → database for company data
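As a concrete (and heavily simplified) example of this kind of glue, here is a sketch of a scheduled job that pulls recent meeting notes from one API and logs them to a CRM. Both endpoints and payload shapes are hypothetical placeholders, not Granola’s or Attio’s real APIs.

```typescript
// Sketch of a scheduled sync job: pull recent notes from a notes tool and log them
// to a CRM. Endpoints, headers, and payload shapes are hypothetical placeholders.
export async function syncMeetingNotes() {
  const since = new Date(Date.now() - 24 * 60 * 60 * 1000).toISOString();

  // 1. Pull notes created in the last 24 hours from the source tool
  const notesRes = await fetch(`https://api.notes-tool.example/notes?since=${since}`, {
    headers: { Authorization: `Bearer ${process.env.NOTES_API_KEY}` },
  });
  if (!notesRes.ok) throw new Error(`notes fetch failed: ${notesRes.status}`);
  const notes: { id: string; companyDomain: string; summary: string }[] =
    await notesRes.json();

  // 2. Write each note to the CRM as an activity on the matching company
  for (const note of notes) {
    await fetch("https://api.crm.example/activities", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.CRM_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        companyDomain: note.companyDomain,
        note: note.summary,
        source: "meeting-notes-sync",
      }),
    });
  }
}

// Run this on a schedule: cron, a Vercel cron job, or your orchestrator of choice.
```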
Validating API Responses: Your Most Important Defense
The biggest source of problems in integrations is trusting external APIs to return what you expect. API schemas change. Vendors return errors in unexpected formats. Required fields are sometimes null. Data types don’t match documentation. Never trust external APIs. Validate everything that comes in and everything that goes out.

Use validation libraries
For TypeScript: Zod. For Python: Pydantic. These libraries let you define schemas for your data and automatically validate objects against them.
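For example, here is a minimal sketch of validating a vendor response with Zod before using it. The schema fields and the endpoint are illustrative, not any particular vendor’s actual API.

```typescript
import { z } from "zod";

// Illustrative schema for a company record from a hypothetical enrichment vendor.
// The field names here are assumptions, not a real vendor's schema.
const CompanySchema = z.object({
  id: z.string(),
  name: z.string(),
  domain: z.string().nullable(), // vendors often return null where you expect a string
  employeeCount: z.number().int().nullable(),
  lastFundingDate: z.coerce.date().optional(),
});

type Company = z.infer<typeof CompanySchema>;

export async function fetchCompany(id: string): Promise<Company> {
  const res = await fetch(`https://api.example-vendor.com/companies/${id}`, {
    headers: { Authorization: `Bearer ${process.env.VENDOR_API_KEY}` },
  });
  if (!res.ok) throw new Error(`Vendor returned ${res.status}`);

  // safeParse never throws; it reports exactly which fields failed validation
  const parsed = CompanySchema.safeParse(await res.json());
  if (!parsed.success) {
    throw new Error(`Unexpected vendor payload: ${parsed.error.message}`);
  }
  return parsed.data;
}
```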
Webhook Handling
Many vendors (especially CRM systems like Attio, Affinity) provide webhooks: they call your HTTP endpoint when events happen (company updated, deal stage changed, meeting logged). This is more efficient than polling their API constantly to check for changes.

Setting up webhooks
The first time is tricky. You need (a sketch of a verification handler follows the list):
- An HTTPS endpoint that the vendor can reach (not localhost; use a service like ngrok for local development, and deploy to production for real webhooks)
- To register your endpoint with the vendor (usually through their dashboard)
- To handle webhook verification (vendors send a signature to prove the request actually came from them, not a malicious actor)
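As a sketch of the verification step, here is a generic HMAC-SHA256 check in a Next.js route handler. The header name, secret, and signature format are assumptions; every vendor documents its own scheme, so match theirs rather than copying this verbatim.

```typescript
import crypto from "node:crypto";

// Assumed scheme: the vendor signs the raw request body with a shared secret and
// sends the hex digest in an "x-webhook-signature" header. Real vendors vary.
const WEBHOOK_SECRET = process.env.WEBHOOK_SECRET ?? "";

export async function POST(request: Request) {
  const rawBody = await request.text(); // verify against the raw bytes, not parsed JSON
  const received = request.headers.get("x-webhook-signature") ?? "";

  const expected = crypto
    .createHmac("sha256", WEBHOOK_SECRET)
    .update(rawBody)
    .digest("hex");

  // timingSafeEqual throws if the lengths differ, so guard first
  const valid =
    received.length === expected.length &&
    crypto.timingSafeEqual(Buffer.from(received), Buffer.from(expected));

  if (!valid) {
    return new Response("invalid signature", { status: 401 });
  }

  const event = JSON.parse(rawBody);
  // Acknowledge quickly; do heavy processing asynchronously (queue, background job)
  console.log("webhook event", event.type ?? "unknown");
  return new Response("ok", { status: 200 });
}
```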
Rate Limiting and Backoff Strategies
External APIs have rate limits. PitchBook might allow 100 requests per minute. People Data Labs (for LinkedIn data) might allow 500 requests per day. Exceed these limits and you get errors (429 Too Many Requests) or get blocked.

API costs: you often pay per request or per entity
Beyond rate limits, many vendors charge per request or per entity returned. People Data Labs might charge $0.01 per person record. PitchBook charges based on data access. Harmonic and Specter have usage-based pricing. This changes how you think about API usage. You can’t just make thousands of exploratory requests to see what’s available. Each request costs money. Be thoughtful about (see the caching sketch after this list):
- Only requesting data you actually need (don’t pull full company profiles if you just need basic info)
- Caching responses so you don’t request the same data repeatedly
- Batching requests when possible (some APIs let you request multiple entities in one call)
- Validating input before making API calls (don’t waste money on requests that will fail)
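One way to avoid paying twice for the same record is a cache in front of the paid lookup. A minimal sketch, assuming a single long-running process; for anything shared across jobs you would persist the cache (a database table, Redis, or files in object storage). The enrichPerson call in the usage comment is a hypothetical paid lookup, not a real client.

```typescript
// Minimal sketch: memoize paid API lookups so repeated calls for the same key are free.
const cache = new Map<string, { value: unknown; fetchedAt: number }>();
const TTL_MS = 1000 * 60 * 60 * 24; // assumption: enrichment data is fine for a day

export async function cachedLookup<T>(
  key: string,
  fetcher: () => Promise<T>
): Promise<T> {
  const hit = cache.get(key);
  if (hit && Date.now() - hit.fetchedAt < TTL_MS) {
    return hit.value as T; // cache hit: no request, no cost
  }
  const value = await fetcher(); // this is the call that costs money
  cache.set(key, { value, fetchedAt: Date.now() });
  return value;
}

// Usage (enrichPerson is hypothetical):
// const person = await cachedLookup(`pdl:${email}`, () => enrichPerson(email));
```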
Use exponential backoff
When a request fails with a retriable error like a 429, wait before retrying and double the wait each time (a retry helper sketch follows the list):
- First retry: wait 1 second
- Second retry: wait 2 seconds
- Third retry: wait 4 seconds
- Fourth retry: wait 8 seconds
- Give up after 5 attempts (or whatever limit makes sense)
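A sketch of that policy as a generic fetch wrapper. The retriable status codes mirror the categorization in the next section, and the delays and five-attempt limit are the illustrative numbers from the list above.

```typescript
// Retry a request with exponential backoff: 1s, 2s, 4s, 8s, then give up.
// The retriable status codes and the 5-attempt limit are illustrative choices.
const RETRIABLE = new Set([429, 503, 504]);

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

export async function fetchWithRetry(
  url: string,
  init: RequestInit = {},
  maxAttempts = 5
): Promise<Response> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (attempt > 0) {
      await sleep(1000 * 2 ** (attempt - 1)); // 1s, 2s, 4s, 8s
    }
    try {
      const res = await fetch(url, init);
      if (res.ok || !RETRIABLE.has(res.status)) {
        return res; // success, or an error that retrying won't fix
      }
      lastError = new Error(`HTTP ${res.status}`);
    } catch (err) {
      lastError = err; // network errors are retriable
    }
  }
  throw lastError instanceof Error ? lastError : new Error("request failed");
}
```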
Error Handling and Retries
Not all errors should be retried. Some are permanent (bad request, authentication failure), others are transient (network timeout, server temporarily unavailable).

Categorize errors
- Retriable: 429 (rate limit), 503 (service unavailable), 504 (timeout), network errors. These are temporary; retry with backoff.
- Non-retriable: 400 (bad request), 401 (unauthorized), 403 (forbidden), 404 (not found). These won’t succeed if you retry. Fix the problem or skip the request.
- Context-dependent: 500 (server error) might be temporary or might indicate a bug in the vendor’s API. Retry a few times, but not indefinitely.
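One way to encode that categorization is a small helper that decides whether a given status is worth retrying. The specific rule for 500s here (retry up to three times) is an assumption, not a standard.

```typescript
type ErrorKind = "retriable" | "non-retriable";

// Mirrors the categorization above. The rule for 500s (retry, but only a few
// times) is an assumption about typical vendor behavior, not a universal rule.
export function classifyStatus(status: number, attempt: number): ErrorKind {
  if ([429, 503, 504].includes(status)) return "retriable";
  if ([400, 401, 403, 404].includes(status)) return "non-retriable";
  if (status === 500) return attempt < 3 ? "retriable" : "non-retriable";
  // Anything else: treat server errors as retriable, client errors as not.
  return status >= 500 ? "retriable" : "non-retriable";
}
```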
Working with LLM Providers
If you’re building features that use LLMs (research assistants, summarization, analysis), integrating with OpenAI, Anthropic, or other providers has specific considerations.

LLM providers have lots of downtime
Especially when rolling out new models, providers have outages. OpenAI’s API goes down. Anthropic has rate limits that are lower than you expect. Your features break when this happens.

Use an AI gateway for failover
Services like Vercel’s AI SDK or LiteLLM provide failover across multiple LLM providers. If OpenAI is down, automatically fail over to Anthropic. If Anthropic is rate-limited, try OpenAI. A provider-agnostic sketch of the pattern follows.
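A minimal sketch of the failover pattern, assuming each provider call is wrapped in a function with a common signature. In practice a gateway like the AI SDK or LiteLLM handles this for you; the callOpenAI and callAnthropic wrappers in the usage comment are hypothetical names.

```typescript
// Each provider is wrapped in a function with the same signature.
// callOpenAI / callAnthropic are hypothetical wrappers around the real SDK calls.
type CompletionFn = (prompt: string) => Promise<string>;

export async function completeWithFailover(
  prompt: string,
  providers: CompletionFn[]
): Promise<string> {
  let lastError: unknown;
  for (const provider of providers) {
    try {
      return await provider(prompt); // first provider that succeeds wins
    } catch (err) {
      lastError = err; // outage or rate limit: fall through to the next provider
    }
  }
  throw new Error(`All providers failed: ${String(lastError)}`);
}

// Usage (wrappers are assumptions, not real SDK names):
// const answer = await completeWithFailover(prompt, [callAnthropic, callOpenAI]);
```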
Internal API Design
If you’re building internal services that need APIs (your research platform, sourcing tool, portfolio dashboard), keep internal API design simple and consistent; a minimal route handler sketch follows the list below.

Pick one language and framework for everything
At Inflection, everything is TypeScript with Next.js. This consistency means:
- You write code the same way across services
- You can reuse libraries and utilities
- You can move between codebases easily
- Deployment and infrastructure are consistent
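As a sketch of what simple and consistent can look like in that stack: a Next.js App Router route handler returning a uniform { data, error } envelope. The envelope shape and the getCompanies helper are illustrative choices, not a prescribed standard.

```typescript
// app/api/companies/route.ts (hypothetical internal endpoint).
// The { data, error } envelope is one consistent shape you could standardize on.
import { NextResponse } from "next/server";

// Hypothetical data-access helper; a real service would query your database here.
async function getCompanies(limit: number) {
  return [{ id: "c_1", name: "Example Co", stage: "Seed" }].slice(0, limit);
}

export async function GET(request: Request) {
  const { searchParams } = new URL(request.url);
  const limit = Number(searchParams.get("limit") ?? "50");

  if (!Number.isFinite(limit) || limit < 1 || limit > 500) {
    return NextResponse.json(
      { data: null, error: "limit must be between 1 and 500" },
      { status: 400 }
    );
  }

  const companies = await getCompanies(limit);
  return NextResponse.json({ data: companies, error: null });
}
```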
Authentication and Data Delivery Patterns
Different vendors deliver data in different ways. Some provide APIs, others provide file exports.

API-based vendors
Most modern vendors (PitchBook, Harmonic, Specter) provide REST APIs. You authenticate with an API token (a bearer token in the Authorization header) and make HTTP requests. Store API tokens securely (environment variables, or a secrets manager like AWS Secrets Manager). Don’t commit them to git. Rotate them periodically.

File-based vendors
Some vendors provide flat files (CSV, Parquet, JSON) that you download or that they upload to your S3 bucket. Examples: quarterly data dumps, historical archives, bulk exports.

Prefer Parquet over CSV/TSV: if vendors offer multiple formats, always choose Parquet. It’s columnar (faster for analytics), includes schema information (you know data types without guessing), compresses well (smaller files), and loads much faster into data warehouses. CSV/TSV files require parsing, have encoding issues, carry no schema, and are slower to work with. Ask vendors to provide Parquet if they don’t already.

Import these files into your data warehouse using tools like (a Parquet load sketch follows the list):
- Dagster: Data orchestration, can schedule file imports and transformations
- Airflow: Workflow orchestration, similar to Dagster
- BigQuery Data Transfer Service: If you use BigQuery, can automatically import files from S3/GCS
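For the BigQuery path, a minimal sketch of loading a Parquet file from Cloud Storage with the official Node.js clients. The dataset, table, and bucket names are placeholders, and it assumes the vendor files already land in GCS; for S3 drops you would use the Data Transfer Service or copy the files across first.

```typescript
import { BigQuery } from "@google-cloud/bigquery";
import { Storage } from "@google-cloud/storage";

// Placeholder names; swap in your own dataset, table, bucket, and object path.
const DATASET = "vendor_data";
const TABLE = "pitchbook_companies";
const BUCKET = "my-fund-vendor-drops";
const FILE = "pitchbook/2024-q4/companies.parquet";

export async function loadParquetIntoBigQuery() {
  const bigquery = new BigQuery();
  const storage = new Storage();

  const [job] = await bigquery
    .dataset(DATASET)
    .table(TABLE)
    .load(storage.bucket(BUCKET).file(FILE), {
      sourceFormat: "PARQUET", // Parquet carries its own schema
      writeDisposition: "WRITE_TRUNCATE", // replace the table each load; use WRITE_APPEND for increments
    });

  console.log(`Load job ${job.id} finished`);
}
```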