Overview

Much of the technical work at a VC fund is building glue between tools. Connecting meeting notes to your CRM, data providers to your warehouse, portfolio data to dashboards. Some integrations exist out of the box, but many require custom code. This chapter covers common integration patterns, validation strategies, webhook handling, and rate limits.

Common Integration Patterns

The integrations you build fall into a few common patterns.

Tool-to-tool glue

Connecting the SaaS tools your team uses. Examples:
  • Meeting transcription tool (Granola, Otter) → CRM (Attio) to automatically log conversations
  • CRM → data warehouse for analysis
  • Email → CRM to track outreach
  • Calendar → CRM to log meetings
Some of these have out-of-the-box integrations. Many require custom code to map fields, handle authentication, and reconcile differences in data models.

Data vendor → your systems

Pulling data from external providers and loading it into your infrastructure. These integrations usually run on schedules (nightly data syncs) or in response to specific events (when you add a company to your CRM, enrich it with PitchBook data). See Accessing Data for how vendors deliver data.

Internal service integrations

If you’re building multiple services (research platform, sourcing tool, internal APIs), they need to talk to each other. Your research platform might need portfolio company data from your data warehouse. Your sourcing tool might need to check your CRM to avoid suggesting companies you’ve already passed on.

LLM/AI integrations

Many funds are building features that use LLMs. These require integrations with OpenAI, Anthropic, or other providers, often combined with your own data (RAG systems pulling from your research or CRM).

The common thread: you’re moving data between systems, transforming it to fit different schemas, and handling failures when things break.

Validating API Responses

The biggest source of problems in integrations is trusting external APIs to return what you expect. API schemas change. Vendors return errors in unexpected formats. Required fields are sometimes null. Data types don’t match documentation. Never trust external APIs. Validate everything.

Use validation libraries

For TypeScript: Zod. For Python: Pydantic. These libraries let you define schemas for your data and automatically validate objects against them.
// TypeScript with Zod
import { z } from "zod"

const CompanySchema = z.object({
  id: z.string(),
  name: z.string(),
  founded_date: z.string().datetime().optional(),
  funding_total: z.number().positive().optional(),
  employee_count: z.number().int().positive().optional(),
})

// When you get data from an API
const response = await fetch("https://api.vendor.com/companies/123")
const data = await response.json()

// Validate it
const company = CompanySchema.parse(data) // Throws if invalid
// or
const result = CompanySchema.safeParse(data) // Returns success/error
if (!result.success) {
  console.error("Invalid data from API:", result.error)
}

# Python with Pydantic
import requests
from pydantic import BaseModel, ValidationError, field_validator
from datetime import datetime
from typing import Optional

class Company(BaseModel):
    id: str
    name: str
    founded_date: Optional[datetime] = None
    funding_total: Optional[float] = None
    employee_count: Optional[int] = None

    @field_validator('funding_total')
    @classmethod
    def funding_must_be_positive(cls, v):
        if v is not None and v < 0:
            raise ValueError('funding must be positive')
        return v

# When you get data from an API
response = requests.get('https://api.vendor.com/companies/123')
data = response.json()

# Validate it
try:
    company = Company(**data)
except ValidationError as e:
    print(f'Invalid data from API: {e}')
Why this matters

Without validation, bad data silently flows into your system. A field that’s supposed to be a number is suddenly a string. A required field is null. These errors cascade: your data warehouse holds invalid data, your dashboards show wrong information, your analyses are incorrect.

With validation, you catch errors at the boundary. When an API returns bad data, you know immediately. You can log the error, alert yourself, and handle it gracefully rather than letting corrupted data spread through your systems.

Validate both inbound and outbound data

When you’re calling external APIs, validate the data you’re sending. This catches mistakes in your code before they hit the vendor’s API. When you’re exposing APIs for internal use, validate inputs from callers.

Webhook Handling

Many vendors (especially CRM systems) provide webhooks: they call your HTTP endpoint when events happen (company updated, deal stage changed, meeting logged). This is more efficient than constantly polling their API.

Setting up webhooks

You need:
  • An HTTPS endpoint the vendor can reach (use webhook.site for local development with dummy data)
  • To register your endpoint with the vendor (usually through their dashboard)
  • To handle webhook verification (vendors send a signature to prove the request came from them)
Key considerations

Verify signatures: Always verify that webhooks actually came from the vendor. Vendors send a signature (usually an HMAC of the request body computed with a shared secret). Verify it before processing.

Handle idempotency: Vendors may send the same webhook multiple times (network retries, infrastructure issues on their side). Make your handler idempotent: processing the same event twice should be safe. Track event IDs you’ve seen and skip duplicates.

Return quickly: Webhook handlers should return 200 OK within a few seconds. Don’t do expensive processing in the handler itself. Accept the webhook, queue the work, return success, and process asynchronously.
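The considerations above can be sketched in a single handler. This is a minimal illustration using Python's standard hmac module; the secret, the in-memory event-ID set, and the queue_for_processing hook are assumptions to adapt to your vendor and infrastructure:

```python
import hmac
import hashlib

# Shared secret from the vendor's dashboard (placeholder value)
WEBHOOK_SECRET = b"example-secret"

# Event IDs already processed, for idempotency (use a database in production)
seen_event_ids: set[str] = set()

def verify_signature(body: bytes, signature_header: str) -> bool:
    """Recompute the HMAC-SHA256 of the raw body and compare in constant time."""
    expected = hmac.new(WEBHOOK_SECRET, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)

def handle_webhook(body: bytes, signature_header: str, event_id: str) -> int:
    """Return an HTTP status code quickly; queue heavy work instead of doing it here."""
    if not verify_signature(body, signature_header):
        return 401  # reject requests that can't prove knowledge of the secret
    if event_id in seen_event_ids:
        return 200  # duplicate delivery: acknowledge, but don't reprocess
    seen_event_ids.add(event_id)
    # queue_for_processing(body)  # hypothetical: hand off to a worker queue
    return 200
```

Note the constant-time comparison (hmac.compare_digest) rather than `==`, which protects against timing attacks on the signature check.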

Rate Limits and API Costs

External APIs have rate limits. PitchBook might allow 100 requests per minute; People Data Labs might allow 500 requests per day. Exceed these and you get 429 errors or get blocked.

API costs

Beyond rate limits, many vendors charge per request or per entity returned. This changes how you think about API usage:
  • Only request data you actually need
  • Cache responses so you don’t request the same data repeatedly
  • Batch requests when possible
  • Validate input before making API calls (don’t waste money on requests that will fail)
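Caching is often the highest-leverage of these. A minimal sketch of an in-memory TTL cache keyed by URL (the TTL value and fetch_with_cache name are illustrative; in production you would likely back this with Redis or your warehouse):

```python
import time

# Minimal in-memory TTL cache: URL -> (timestamp, response data)
_cache: dict[str, tuple[float, dict]] = {}
CACHE_TTL_SECONDS = 3600  # assumed: vendor data may be an hour stale

def fetch_with_cache(url: str, fetch_fn) -> dict:
    """Return a cached response if still fresh, otherwise call fetch_fn and cache it."""
    now = time.monotonic()
    cached = _cache.get(url)
    if cached is not None:
        stored_at, data = cached
        if now - stored_at < CACHE_TTL_SECONDS:
            return data  # cache hit: no API call, no extra spend
    data = fetch_fn(url)  # cache miss: one billable request
    _cache[url] = (now, data)
    return data
```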
Monitor your API spending and set up alerts for when it exceeds thresholds.

Always respect vendor limits

Don’t try to work around rate limits by spinning up multiple API keys or using proxies. Vendors notice and will block you.

Exponential backoff

When you hit a rate limit or get a transient error, retry with exponential backoff:
  1. First retry: wait 1 second
  2. Second retry: wait 2 seconds
  3. Third retry: wait 4 seconds
  4. Fourth retry: wait 8 seconds
  5. Give up after 5 attempts
Add jitter (random variation) to prevent many clients from retrying simultaneously.
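The schedule above can be sketched as a small retry wrapper. TransientError and the 10% jitter factor are assumptions for illustration; in real code you would raise the transient error from your HTTP layer on 429/503 responses:

```python
import random
import time

class TransientError(Exception):
    """Raised by request_fn for retriable failures (429, 503, network errors)."""

def retry_with_backoff(request_fn, max_attempts: int = 5, base_delay: float = 1.0):
    """Retry request_fn on transient errors, doubling the wait each attempt."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            delay = base_delay * (2 ** attempt)      # 1s, 2s, 4s, 8s...
            delay += random.uniform(0, delay * 0.1)  # jitter: up to 10% extra
            time.sleep(delay)
```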

Error Handling

Not all errors should be retried. Some are permanent, others are transient.

Categorize errors
  • Retriable: 429 (rate limit), 503 (service unavailable), 504 (timeout), network errors. Retry with backoff.
  • Non-retriable: 400 (bad request), 401 (unauthorized), 403 (forbidden), 404 (not found). Fix the problem or skip the request.
  • Context-dependent: 500 (server error) might be temporary or might indicate a vendor bug. Retry a few times, but not indefinitely.
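These categories can be encoded in a small decision function. A minimal sketch (the function name and the three-retry cap for 500s are assumptions):

```python
RETRIABLE_STATUS = {429, 503, 504}          # rate limit, unavailable, timeout
NON_RETRIABLE_STATUS = {400, 401, 403, 404}  # fix the request or skip it

def should_retry(status_code: int, attempt: int, max_500_retries: int = 3) -> bool:
    """Decide whether to retry a failed request based on its status code."""
    if status_code in RETRIABLE_STATUS:
        return True
    if status_code in NON_RETRIABLE_STATUS:
        return False  # retrying won't help; the request itself is the problem
    if status_code == 500:
        return attempt < max_500_retries  # retry a few times, not indefinitely
    return False
```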
Use transactions

When working with databases, wrap operations that need to succeed together in transactions. If anything fails, the transaction rolls back.
await db.transaction(async (tx) => {
  const [company] = await tx
    .insert(companies)
    .values({ id: data.id, name: data.name })
    .returning() // returns an array of inserted rows; destructure the first

  for (const round of data.funding_rounds) {
    await tx.insert(fundingRounds).values({
      company_id: company.id,
      round_name: round.name,
      amount: round.amount,
    })
  }
  // If any insert fails, everything rolls back
})
Useful error messages

When something fails, include context: what failed, why it failed, what happens next. Not “An error occurred.” Instead: “Failed to sync company Acme Inc from PitchBook: Rate limit exceeded (429). Will retry in 60 seconds.”

Working with LLM Providers

If you’re building features that use LLMs:

Use an AI gateway for failover. LLM providers have frequent outages and rate limits. Services like Vercel’s AI SDK or LiteLLM provide automatic failover between providers.

Handle streaming responses. When building UIs, you’ll want LLM APIs to stream tokens incrementally rather than block until the full response is ready.

Set timeouts. LLM requests can take 30+ seconds. Set 60-90 second timeouts so slow requests don’t block indefinitely.
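If you roll your own failover instead of using a gateway, the core pattern is simple: try providers in order and fall back on any failure. A minimal sketch, where each provider is an arbitrary callable you supply (the function name and error handling are illustrative):

```python
def call_with_failover(prompt: str, providers: list) -> str:
    """Try each provider in order; fall back to the next on any failure.

    Each provider is a callable taking the prompt and returning text.
    """
    errors = []
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as exc:  # outage, rate limit, timeout: try the next one
            errors.append(exc)
    raise RuntimeError(f"All {len(providers)} providers failed: {errors}")
```

A gateway like LiteLLM does essentially this for you, plus retries, load balancing, and unified request formats, which is why it is usually the better choice in production.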

Authentication Patterns

API tokens

Most vendors provide REST APIs authenticated with bearer tokens. Store tokens securely:
  • Use environment variables, not hardcoded in code
  • Use a secrets manager for production
  • Never commit tokens to git
  • Rotate periodically
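In practice, environment-variable handling is worth a fail-fast helper so a missing token surfaces at startup rather than as a confusing 401 later. A minimal sketch (the variable name VENDOR_API_TOKEN is a placeholder):

```python
import os

def get_api_token(env_var: str = "VENDOR_API_TOKEN") -> str:
    """Read a bearer token from the environment; fail fast if it's missing."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(
            f"{env_var} is not set. Export it locally or add it to your "
            "secrets manager; never hardcode tokens in source."
        )
    return token
```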
User authentication for internal tools

For internal dashboards, prefer OAuth (Google, Microsoft) over managing separate passwords. For service-to-service authentication, use API tokens or service accounts. Don’t use user credentials for automated processes.

Orchestrating data imports

For scheduled data imports, use orchestration tools:
  • Dagster: Data orchestration, can schedule imports and transformations
  • Airflow: Workflow orchestration, similar to Dagster

The Bottom Line

Much of the technical work at VC funds is glue code. Validate everything with Zod or Pydantic. Respect rate limits with exponential backoff. Use transactions to avoid partial failures. For LLMs, use an AI gateway for failover.