Investor Intelligence: How to Score 800 Family Offices in a Week
Most early-stage fundraising teams pick LPs by gut feel and warm intros. Here's what changes when you replace that with structured scoring across hundreds of family offices, RIAs, and impact funds.

Founder & CEO, Airful
A board member at one of our clients said something to me last year that stuck. The team had just spent four months chasing 30 institutional LPs that looked, on paper, like a perfect fit. Six of the 30 took meetings; two of the six invested. The hit rate at every stage was lower than the team expected.
The problem wasn't the deck. It wasn't the founder. It wasn't the round size. It was that nobody had asked, before any of those meetings happened, whether the 30 LPs actually invested in companies at this stage and in this sector. The list had been built from a directory and a few warm intros. It had the right surface features and almost none of the right structural ones.
A few weeks later we started building what became the investor intelligence engine. The idea was simple. Stop picking LPs from a list. Start scoring them against an explicit model of who they actually back. Then sort.
The first run scored 800 family offices and impact-aligned RIAs in a week. The top 88 became the focused outreach list for the next quarter. Conversion rates roughly tripled compared to the previous approach.
This post is about how that engine works, what it does and doesn't tell you, and how a smaller fundraising team can build something similar without an expensive data vendor.
Why most LP targeting fails
The classical approach to LP targeting looks like this. You start with a directory or a database. You filter on stated investment focus (sector, stage, geography, check size). You narrow to a few hundred names. You research the most promising ones manually. You reach out.
This sounds reasonable. It fails in predictable ways.
Stated investment focus is mostly aspirational. A family office that says it invests in "sustainability" may have made one such investment in the last three years and twenty investments in software. A fund that says it does "early stage" may have a hard floor at Series A. The bio on a website is marketing. The portfolio is the data.
Stage filters are too coarse. "Pre-seed to Series B" covers four years of company maturity and check sizes from $50K to $5M. You need to know not just the stage but the cadence: when does this firm typically deploy a second check, when do they exit, what's their actual sweet spot.
Geographic filters miss the implicit ones. A fund based in New York may say it invests globally. In practice, 85% of its portfolio is within a four-hour flight. Stated geography is a poor predictor of actual geography.
Warm intro paths are noisier than people think. A mutual connection on LinkedIn doesn't mean the intro will land well. The right question is whether the connection has the standing and the context to make a useful intro, not whether they exist.
The cumulative effect of all this is that lists built by stated criteria are 60-70% noise. You spend most of your fundraising time meeting LPs who were never going to invest.
What scoring changes
Scoring inverts the workflow. Instead of starting with a list and filtering, you start with an explicit model of what makes a good LP for your specific round, then you apply that model to the entire universe of candidates.
The model needs to express, at minimum, four things.
Stated fit is the obvious one. Does the LP say they invest in this category? This is the noisy criterion, but it's still useful as a first filter. Sustainability investors say they invest in sustainability. Health investors say they invest in health.
Demonstrated fit is the more important one. Do they actually invest in this category? This is observed from portfolio data, recent press releases, regulatory filings, and public commitments. It usually disagrees with stated fit by a meaningful margin.
Stage and cadence fit asks whether they write checks in your range, whether they typically lead, follow, or co-invest, and when they last made a deal at your stage. An LP that hasn't made a Series A investment in 18 months may have moved upmarket.
Strategic fit is about the thesis match beyond the financials. An impact fund focused on climate adaptation is structurally different from one focused on circular materials, even though both call themselves climate funds.
Each of these gets a score. The scores combine into a composite. The composite sorts the list.
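As a minimal sketch of that composite, here is one way to combine the four fit dimensions. The weights below are illustrative, not a prescribed standard; the point is that the noisy stated-fit signal carries less weight than the portfolio evidence.

```python
# Combine the four fit dimensions into one composite score.
# Weights are illustrative; tune them per raise.
WEIGHTS = {
    "stated_fit": 0.15,        # noisy, so weighted lightly
    "demonstrated_fit": 0.40,  # portfolio evidence dominates
    "stage_cadence_fit": 0.30,
    "strategic_fit": 0.15,
}

def composite_score(scores: dict) -> float:
    """Weighted average of 0-1 sub-scores, returned on a 0-100 scale."""
    total = sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
    return round(100 * total, 1)

# High stated fit but weak portfolio evidence drags the composite down.
lp = {"stated_fit": 0.9, "demonstrated_fit": 0.3,
      "stage_cadence_fit": 0.7, "strategic_fit": 0.5}
```

An LP like the one above lands mid-pack despite saying all the right things, which is exactly the behavior you want from the model.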
What goes into the model
We built ours specifically for impact-adjacent funds and family offices because that's the wedge most relevant to the firms we work with. The structure generalizes, though.
The signals we found most predictive, in rough order:
Recent portfolio composition. The last twelve to twenty-four months of investments tell you far more than the founder bio. Look at company stage, sector, check size, lead vs participant. Build a vector representation of the firm's recent activity, not a single category label.
Public commitments. Regulatory filings, signed pledges (B Corp, Climate Pledge, GIIN, UNPRI), and stated allocation goals are weak signals on their own but useful when combined. A firm that's publicly committed to 30% climate allocation and is at 8% currently is a different prospect than one that's at 27%.
Source of capital. A family office deploying second-generation wealth has a different risk profile than one deploying first-generation wealth. A foundation endowment behaves differently than a CIO-led pool. Where the money comes from constrains what it'll do.
Decision structure. Solo deciders move fastest. Family offices with an IC of two or three move next. Foundations with a board approval requirement move slowly. Knowing this in advance changes how you sequence outreach.
Connection density. Not warm intros specifically, but the count and quality of paths from your network to the LP. A firm with eight credible connection paths is structurally easier to reach than one with two.
Each signal gets normalized to a 0-1 score. The composite uses weights tuned to the specific raise. A Series A health-tech round weights different signals than a Series B climate infrastructure round.
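The normalization step can be as simple as a clipped linear transform per signal. The transforms and horizons below are one reasonable choice under our assumptions (a 24-month recency window, diminishing value past eight connection paths), not the only defensible one.

```python
# Normalize heterogeneous raw signals onto a common 0-1 scale.
# Horizons and saturation points are assumptions; adjust to your raise.

def clip01(x: float) -> float:
    return max(0.0, min(1.0, x))

def recency_score(months_since_last_deal: float, horizon: float = 24.0) -> float:
    """1.0 for a deal this month, decaying linearly to 0.0 at the horizon."""
    return clip01(1.0 - months_since_last_deal / horizon)

def connection_score(path_count: int, saturation: int = 8) -> float:
    """More credible paths help, with no extra credit past `saturation`."""
    return clip01(path_count / saturation)
```

Once every signal lives on the same 0-1 scale, swapping weight profiles between raises is a one-line change rather than a rebuild.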
The data layer
This is where most teams get stuck. The signals are obvious in principle. Gathering them at scale is the hard part.
The good news is that most of what you need is public. Portfolio data is on firm websites, in press releases, in Crunchbase, in PitchBook (if you have it), and in regulatory filings for institutions above certain thresholds. Public commitments are on signatory lists. Even some of the decision structure can be inferred from press coverage and team bios.
The expensive route is to use a commercial database. PitchBook, Preqin, Capital IQ, S&P Global. These work but they're priced for large funds, not lean fundraising teams.
The lean route is to combine targeted scraping (with consent and within terms of service), public APIs, and LLM-assisted enrichment. Crunchbase's API gives you the firm-level basics. LinkedIn gives you team and connection data. Public press releases give you the recent investment cadence. Run an LLM over the firm's homepage and recent news to extract stated thesis and recent commitments.
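The LLM-assisted enrichment step can follow a pattern like the one below. The prompt, the JSON schema, and the `call_llm` function are all placeholders for whatever model API you use; the durable part is extracting structured fields and failing closed when the model returns malformed output.

```python
import json

# Extract a firm's stated thesis from its homepage text with an LLM.
# `call_llm`, the prompt, and the schema are illustrative assumptions.
EXTRACTION_PROMPT = """Read the firm homepage text below and return JSON
with keys: stated_sectors (list), stated_stages (list), commitments (list).
Return only JSON.

TEXT:
{text}"""

def call_llm(prompt: str) -> str:
    # Stubbed here for illustration; swap in a real model call.
    return ('{"stated_sectors": ["climate"], '
            '"stated_stages": ["seed"], "commitments": ["UNPRI"]}')

def extract_thesis(homepage_text: str) -> dict:
    raw = call_llm(EXTRACTION_PROMPT.format(text=homepage_text))
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Fail closed: an empty extraction scores zero rather than crashing.
        return {"stated_sectors": [], "stated_stages": [], "commitments": []}
```

Keeping the extraction schema small and validating it on every call matters more than prompt cleverness; a scoring pipeline that silently ingests malformed extractions is worse than one with gaps.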
We built the first version in about three weeks of focused work. Total cost in data and API calls was under $2K. The output beat what any of the commercial databases could have produced because the model was tuned specifically for the round we were running.
What the output looks like
The end product, for the team running the raise, is a ranked list with reasoning attached.
Each LP entry has a composite score (0-100), a recommended outreach priority (focus, watch, deprioritize), a one-paragraph thesis fit summary explaining why the score is what it is, and a list of connection paths ranked by likelihood of yielding a useful intro.
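A sketch of what one entry might look like as a data structure, assuming the fields described above. The priority thresholds are illustrative; set your own cut lines based on how much outreach capacity the team has.

```python
from dataclasses import dataclass, field

# One entry in the ranked output list. Thresholds are assumptions.
@dataclass
class LPEntry:
    name: str
    composite: float          # 0-100 composite score
    fit_summary: str          # one-paragraph reasoning behind the score
    connection_paths: list = field(default_factory=list)  # best first

    @property
    def priority(self) -> str:
        if self.composite >= 75:
            return "focus"
        if self.composite >= 50:
            return "watch"
        return "deprioritize"
```

Deriving the priority bucket from the score, rather than storing it, means the buckets stay consistent as scores drift with each biweekly refresh.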
The list gets updated every two weeks as new data comes in. New portfolio investments. New press coverage. New commitments. The scores drift accordingly.
The team's job is no longer "who should we reach out to next". It's "which of the top 30 do we focus on this week".
What this doesn't replace
A few caveats worth being honest about.
Scoring doesn't replace the judgment call on whether a specific LP is the right partner for your company. Two firms with identical scores can be very different to work with. Founder-friendliness, follow-on capacity, board dynamics, post-investment support: none of this fits neatly into a model. You still need human judgment on the final shortlist.
Scoring doesn't tell you when to raise. The composition of the LP universe matters less than the macro signal of whether your sector is in or out of favor at any given moment. A perfect scoring system in a bad market gets you 5% conversion instead of 2%, not 50%.
Scoring doesn't generate the relationship. The work of building credibility, refining the pitch, and showing up consistently is the actual fundraising. Scoring just makes sure that work goes to the right targets.
And scoring is wrong some of the time. The 88th-ranked LP might be your lead. The third-ranked one might never respond. Models compress noise, they don't eliminate it.
What scoring does, reliably, is shift the distribution. You spend more time with LPs who could plausibly invest, and less time with LPs who never could.
What changes for the fundraising team
In our experience three things shift.
Outreach hit rates go up. The first-meeting rate on the focused-88 list ran around 35%, versus 20% on the prior unsorted list. The qualified-meeting rate (meetings that progressed to a follow-up) improved by an even wider margin.
Time-to-close compresses. When the team isn't burning weeks on poorly targeted outreach, the calendar fills faster and the round closes faster. That particular round closed about 40% ahead of the originally planned timeline.
And fundraising becomes a repeatable operation rather than a project. The intelligence layer stays in place between rounds. The next time the company raises, it isn't starting from scratch. The model improves with each cycle. The data layer compounds.
Companies that operate this way treat fundraising as an ongoing system, not a quarterly project. The intelligence engine is part of the infrastructure, sitting next to the CRM and the financial model.
Where to start
If you're running a raise in the next twelve months and want to build something like this, the order goes roughly this way.
Start by defining your target investor profile explicitly. Not "Series A climate investors" but something closer to "lead-or-participant Series A investors with $20M+ AUM, at least one recent investment in regenerative agriculture or carbon removal, board-friendly governance, US-based". The more specific, the better.
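Writing the profile down as data rather than prose makes it enforceable. A minimal sketch, with field names and thresholds that are ours, not a standard: the hard constraints become a pre-scoring filter, while softer fit gets scored rather than gated.

```python
# The target profile as data, so it can drive filters and scoring.
# Field names and thresholds are illustrative assumptions.
TARGET_PROFILE = {
    "stage": "Series A",
    "min_aum_usd": 20_000_000,
    "sectors": {"regenerative agriculture", "carbon removal"},
    "geographies": {"US"},
}

def passes_floor(lp: dict, profile: dict = TARGET_PROFILE) -> bool:
    """Hard filter applied before scoring; soft fit is scored, not gated."""
    return (
        lp["aum_usd"] >= profile["min_aum_usd"]
        and profile["stage"] in lp["stages"]
        and bool(profile["sectors"] & lp["recent_sectors"])
        and lp["hq"] in profile["geographies"]
    )
```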
Then list every signal you'd use to score against that profile. Stated thesis, portfolio overlap, recent investment cadence, decision structure, connection paths. Write them down.
Build a scrappy first version. A spreadsheet is fine. Score 50 LPs by hand against the model. See if the rankings match your gut on the ones you already know well. If they do, the model is roughly right. If they don't, refine until it does.
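One concrete way to check whether "the rankings match your gut" is a rank correlation between the model's scores and your own 1-10 ratings for LPs you know well. A from-scratch Spearman correlation (assuming no tied scores) is enough at this scale; anything above roughly 0.7 suggests the model is directionally right.

```python
# Compare model rankings against gut ratings for known LPs.
# Spearman rank correlation from scratch; assumes no tied values.

def ranks(values):
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(xs, ys):
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - (6 * d2) / (n * (n ** 2 - 1))

model = [88, 72, 65, 90, 40]  # engine's composite scores
gut   = [9, 7, 6, 8, 3]       # your 1-10 ratings for the same five LPs
```

If the correlation is low, disagree with the model LP by LP: each disagreement is either a mistuned weight or a gut call you can't defend, and both are worth finding before you scale to 800.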
Once the model holds up at 50 LPs, automate the data layer. This is the work most teams skip because it feels like overkill. It isn't. Doing it once gives you reusable infrastructure across multiple raises, and across multiple portfolio companies if you're a fund.
Then run the engine across the universe, sort, and operate.
The teams that do this consistently outperform the ones that don't. Not because they're smarter or better connected, but because they spent the upfront time learning who they're actually targeting before they started knocking on doors.
If you're running a raise this year and your current target list feels too noisy, we can help you build an investor intelligence engine tuned to your specific thesis. Most setups land in two to three weeks.