
Lead Scoring That Predicts Who Buys (Not Who Clicks)

Most lead scoring is a popularity contest dressed up as math. A lead opens three emails, downloads a whitepaper, and visits the pricing page twice — and the system hands sales a "hot" 92. The rep calls. The lead is a student writing a paper, a competitor doing research, or a curious individual contributor with no budget. Meanwhile a VP at a perfect-fit account who read one page and never filled out a form sits at a 12, untouched. The score measured activity. It did not measure buying.

The key takeaway up front: a useful lead score answers two separate questions and never confuses them — should we sell to this account (fit) and are they in a buying window right now (intent). Score them on two independent axes, subtract points for disqualifying signals, and calibrate the whole thing against leads that actually converted. Do that and the number stops being noise reps ignore and starts being a queue they trust.

Why most lead scores quietly fail

The classic single-number model fails for a structural reason: it adds fit points and behavior points into the same total. A high-fit lead who hasn't done anything and a low-fit lead who clicked everything can land on the identical score — and they are nothing alike. One deserves patient nurturing; the other deserves a polite disqualification. Collapsing them into one number destroys exactly the distinction a rep needs to decide what to do next.

Three failure patterns follow from that mistake:

  • Activity inflation. Points accrue for every open and click, so the most bored leads — researchers, job seekers, students, vendors — float to the top. Engagement is not the same as intent.
  • No floor. Without negative scoring, a "free email domain + student title + competitor company" lead can still rack up a high score on behavior alone. Nothing pushes obvious bad fits back down.
  • Set-and-forget thresholds. Someone picks "100 = MQL" on day one and never checks whether 100-point leads actually convert better than 60-point leads. Usually they don't, and reps quietly learn to ignore the score.

When reps stop trusting the number, they revert to working leads by gut or by recency — which is the exact problem scoring was supposed to solve.

The two-axis model: fit and intent

Score every lead on two independent axes, then read them together rather than summing them.

Fit (0–100): would we want them as a customer? This is firmographic and demographic, and it barely changes over time. Company size, industry, region, the buyer's role and seniority, and whether they match a profile that has bought before. Fit answers "is this account worth a rep's hour at all?"

Intent (0–100): are they in a buying window now? This is behavioral and time-sensitive. High-intent signals are the ones that correlate with evaluation, not idle curiosity: pricing-page visits, demo requests, a reply to outreach, multiple people from the same account engaging, a return visit after weeks of silence. Intent decays — a pricing visit from 60 days ago is not the signal a pricing visit from yesterday is.
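One way to make intent decay concrete is a half-life: each signal loses half its points every fixed number of days. The sketch below is only an illustration. The 14-day half-life and the function name `decayed_points` are assumptions, not values from any particular tool, and the right rate should come from your own sales cycle.

```python
from datetime import date

# Assumption: a 14-day half-life. Pick a rate that matches your own cycle.
HALF_LIFE_DAYS = 14

def decayed_points(points: float, event_day: date, today: date,
                   half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Halve an intent signal's points for every half-life since the event."""
    age_days = max((today - event_day).days, 0)
    return points * 0.5 ** (age_days / half_life_days)

today = date(2024, 6, 30)
fresh = decayed_points(20, date(2024, 6, 29), today)  # pricing visit yesterday
stale = decayed_points(20, date(2024, 5, 1), today)   # pricing visit 60 days ago
print(round(fresh, 1), round(stale, 1))
```

With these assumed settings, yesterday's visit keeps nearly all of its 20 points while the 60-day-old visit keeps almost none, which is the behavior the intent axis needs.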

Now plot leads on the resulting grid and give each quadrant its own action:

  • High fit, high intent — call today. This is the queue.
  • High fit, low intent — nurture deliberately; these are your best future pipeline. Don't burn them with a hard sell before they're ready.
  • Low fit, high intent — handle with care. Often a researcher or a too-small account. Route to self-serve or a low-cost path, not a senior rep.
  • Low fit, low intent — leave them alone. Spending rep time here is the most expensive mistake in the funnel.

The grid is the whole point. A single number can't tell "nurture" from "disqualify"; two axes can.
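The grid above reduces to a small routing function. This is a minimal sketch: the cut point of 60 on each axis is a placeholder assumption (real cut points come from calibration), and `quadrant_action` is a hypothetical name.

```python
# Assumption: 60 is a placeholder cut point on both axes, not a recommendation.
FIT_CUT = 60
INTENT_CUT = 60

def quadrant_action(fit: int, intent: int) -> str:
    """Map a (fit, intent) pair, each on a 0 to 100 scale, to the quadrant's action."""
    high_fit = fit >= FIT_CUT
    high_intent = intent >= INTENT_CUT
    if high_fit and high_intent:
        return "call today"
    if high_fit:
        return "nurture"
    if high_intent:
        return "self-serve or low-cost path"
    return "leave alone"

print(quadrant_action(95, 75))  # call today
print(quadrant_action(90, 10))  # nurture
```

Note that the function never adds the two numbers together; it only compares each axis against its own line.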

Add negative scoring — the part everyone skips

The fastest improvement to a broken model is usually subtraction, not addition. Negative scoring pushes obvious bad fits down so they can't ride engagement to the top of the queue. Subtract for:

  • Disqualifying titles — student, intern, "researcher," and roles with no plausible buying authority for your product.
  • Non-buyer domains — free email providers when your ICP is corporate, known competitor domains, your own employees filling out forms.
  • Geographies you don't serve, or company sizes far outside your range.
  • Staleness — let intent points decay. A lead who was hot two months ago and silent since is no longer hot; the model should say so without a human remembering to check.

Negative scoring is what keeps the curious-but-useless out of the rep's day. Most teams add ten positive rules and zero negative ones, then wonder why the top of the queue is full of tire-kickers.
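A rough sketch of negative fit rules might look like the following. The domain lists are illustrative placeholders, the student and free-domain penalties borrow the illustrative values used in the worked example in this article, and the competitor penalty of 50 is an assumption. The function names are hypothetical.

```python
import re

# Illustrative placeholders; maintain real lists for your own market.
FREE_EMAIL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com"}
COMPETITOR_DOMAINS = {"competitor.example"}
DISQUALIFYING_TITLE_WORDS = {"student", "intern", "researcher"}

def fit_penalty(email: str, title: str) -> int:
    """Return a negative (or zero) adjustment to the fit score."""
    domain = email.rsplit("@", 1)[-1].lower()
    # Match whole words so "International" does not trigger the "intern" rule.
    title_words = set(re.findall(r"[a-z]+", title.lower()))
    penalty = 0
    if domain in FREE_EMAIL_DOMAINS:
        penalty -= 15
    if domain in COMPETITOR_DOMAINS:
        penalty -= 50  # assumed value
    if title_words & DISQUALIFYING_TITLE_WORDS:
        penalty -= 30
    return penalty

def apply_penalty(raw_fit: int, penalty: int) -> int:
    """Apply the penalty and keep the fit score on the 0 to 100 scale."""
    return max(0, min(100, raw_fit + penalty))

print(fit_penalty("jordan@gmail.com", "Graduate Student"))  # -45
```

Staleness is handled separately through intent decay, so these rules only touch the fit axis.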

A worked example

Take two leads and run both models.

Lead A — "Jordan," a graduate student. Free Gmail address, title "Research Assistant," visited the blog 9 times, opened 6 emails, downloaded 2 ebooks.

  • Old single-score model: +5 per email open (30), +10 per download (20), +3 per visit (27) = 77 → "hot," routed to a rep. The rep spends 25 minutes discovering there's no budget and no company.
  • Two-axis model: Fit takes penalties for the student title (−30) and the free domain (−15) and earns no company credit → Fit 8. Intent = lots of low-value blog activity, no pricing or demo signal → Intent 35. Lands in low fit / low intent: at most an automated nurture track or self-serve, never a rep call.

Lead B — "Priya," VP of Operations at a 600-person target-industry company. Corporate domain, matches a profile that has bought before, visited pricing once, returned after three weeks and requested a demo. Opened only one email.

  • Old single-score model: +5 open (5), +20 pricing (20), +0 for the demo because nobody built that rule = 25 → ignored.
  • Two-axis model: Fit = senior role (+40), right size and industry (+40), buyer profile match (+15) → Fit 95. Intent = demo request (+40), pricing visit (+20), return-after-silence (+15) → Intent 75. Lands squarely in high fit / high intent → call today.

The single-score model put the student above the VP. The two-axis model puts them in completely different lanes — which is the entire job of a lead score. None of these point values are universal; they're illustrative. Yours come from calibration, which is the next step.
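For readers who want to check the arithmetic, here is the same example as a short script. The point values are copied from the example and are illustrative only; the dictionary and function names are hypothetical.

```python
# Point values from the worked example; illustrative, not universal.
OLD_POINTS = {"email_open": 5, "download": 10, "blog_visit": 3, "pricing_visit": 20}

def old_single_score(activity: dict) -> int:
    """Sum points per event; events with no rule (the demo request) score 0."""
    return sum(OLD_POINTS.get(event, 0) * count for event, count in activity.items())

jordan = {"email_open": 6, "download": 2, "blog_visit": 9}
priya = {"email_open": 1, "pricing_visit": 1, "demo_request": 1}

# Two-axis values for Priya, kept as two separate numbers.
priya_fit = 40 + 40 + 15     # senior role, size and industry, buyer profile match
priya_intent = 40 + 20 + 15  # demo request, pricing visit, return after silence

print(old_single_score(jordan), old_single_score(priya))  # 77 25
print(priya_fit, priya_intent)                            # 95 75
```

The old model ranks Jordan (77) far above Priya (25) simply because nobody wrote a rule for the demo request and every open and visit counted.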

Calibrate against who actually converted

This is the discipline that separates a scoring model from astrology, and it's the step almost everyone skips.

  1. Pull a real cohort — every lead from the last 60–90 days (or one full sales cycle), with the outcome attached: booked a meeting, became an opportunity, closed.
  2. Look at the converters and reverse-engineer them. Which fit attributes did buyers actually share? Which behaviors reliably preceded a booked meeting? Weight those; drop the signals that converters and non-converters showed equally — those are noise, not signal.
  3. Set thresholds from the data, not a round number. Find the fit/intent cut points above which conversion meaningfully jumps, and make those your "work it now" lines. If 100-point leads convert no better than 60-point leads, your threshold is wrong.
  4. Re-check quarterly. Your ICP, product, and market move. A model calibrated once and frozen drifts back into noise within a couple of quarters.
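The back-test in steps 1 to 3 can be sketched in a few lines: group the cohort into score bands and compare conversion rates. The cohort below is made up purely for illustration, and `conversion_by_band` is a hypothetical helper, not part of any CRM.

```python
from collections import defaultdict

def conversion_by_band(leads, band_width=20):
    """leads: iterable of (score, converted) pairs on a 0 to 100 scale.
    Returns {band_start: conversion_rate}, ordered by band."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for score, converted in leads:
        # Clamp to 99 so a score of exactly 100 joins the top band.
        band = min(score, 99) // band_width * band_width
        totals[band] += 1
        wins[band] += int(converted)
    return {band: wins[band] / totals[band] for band in sorted(totals)}

# Made-up cohort: (score, booked a meeting?)
cohort = [(15, False), (18, False), (45, False), (52, True), (58, False),
          (65, True), (72, True), (78, False), (88, True), (95, True)]

rates = conversion_by_band(cohort)
for band, rate in rates.items():
    print(f"{band}-{band + 19}: {rate:.0%}")
```

If the rate does not climb as the bands go up, the weights or the threshold need rework. Run the same check on fit and intent separately to find each axis's cut point.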

You don't need a data scientist for the first pass — a spreadsheet of converters versus non-converters and an honest look at what differs is enough to beat any hand-picked point system. Good scoring is downstream of clean targeting, which is why the same fit thinking shows up in our sales prospecting guide; the score is just that ICP made measurable and time-aware.

Common mistakes and why they happen

  • Summing fit and intent into one number. It feels simpler, but it erases the one distinction reps need. Keep the axes separate.
  • Rewarding all engagement equally. A blog visit and a demo request are not the same intent. Weight by what precedes a booked meeting, not by what's easy to track.
  • No negative scoring. Without a floor, your queue fills with researchers and competitors. Subtraction is half the model.
  • Never letting intent decay. Yesterday's pricing visit and last quarter's are different signals. A score that doesn't age is lying to your reps.
  • Picking thresholds by intuition. "100 = MQL" is a guess until the back-test confirms 100-point leads actually convert better. Let the cohort set the line.

The root cause under most of these is the same: teams optimize for what's easy to measure (clicks, opens) instead of what actually predicts a purchase. The cure is always to anchor the model to outcomes.

Edge cases and caveats

  • Sparse data. A brand-new product or a tiny lead volume can't support statistical calibration. Start with a sensible fit/intent split and clear negative rules, and tighten as data accrues. A reasoned model beats no model.
  • Long, committee-driven cycles. When five people from one account engage, score the account, not just the individual — buying-group intent is a stronger signal than any one contact's behavior.
  • Pure outbound. Cold prospects generate little behavioral intent by definition, so fit and reply-signal carry the weight. Don't penalize an outbound lead for not browsing your site.
  • Product-led motions. In-product usage often beats marketing-site behavior as an intent signal. Score what the user does inside the product, not just emails opened.
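For the committee-driven case, one possible way to score the account rather than the individual is to take the strongest contact's intent and add a bonus for each additional engaged contact. This formula is an assumption for illustration, not an established standard, and the parameter values are placeholders to calibrate.

```python
def account_intent(contact_intents, engaged_floor=30, bonus_per_extra=10):
    """Roll contact-level intent scores (0 to 100) up to one account score.

    Assumed rule: strongest contact's score, plus a bonus for every
    additional contact at or above `engaged_floor`, capped at 100.
    """
    if not contact_intents:
        return 0
    engaged = [score for score in contact_intents if score >= engaged_floor]
    bonus = bonus_per_extra * max(len(engaged) - 1, 0)
    return min(100, max(contact_intents) + bonus)

print(account_intent([40]))              # 40
print(account_intent([40, 35, 50, 10]))  # 70
```

Under this rule, three moderately engaged people at one account outrank a single contact with the same top score, which matches the buying-group point above.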

FAQ

What's the difference between lead scoring and lead qualification?

Scoring is automated prioritization — a number that ranks who to work first. Qualification is the human conversation (budget, authority, need, timing) that confirms whether a high-scoring lead is real. Scoring decides who gets the call; qualification decides what happens on it.

How many scoring rules do I actually need?

Fewer than most teams build. A handful of high-signal fit attributes, a handful of genuine intent behaviors, and a few negative rules usually outperform a sprawling rulebook. Every rule you can't tie to conversion is noise diluting the ones that matter.

How often should I recalibrate the model?

Roughly quarterly, or whenever your ICP, pricing, or market shifts noticeably. Models drift as the business changes; a score calibrated once and frozen slowly decays back into the activity-counting problem it was meant to fix.

Can I do this without expensive software?

Yes. The two-axis logic and a back-test of converters versus non-converters work in a spreadsheet. Marketing-automation and CRM tools help you operationalize a good model at scale — they can't invent the weights for you, and they'll happily automate a bad model.

Why do high-scoring leads keep wasting my reps' time?

Almost always because negative scoring is missing and engagement is being mistaken for intent. Without a floor, researchers and competitors ride engagement to the top; without decay, stale leads stay "hot." Add disqualifying rules, weight intent by what precedes a booked meeting, and recalibrate against real outcomes.

The trick

If you remember one thing: never add fit points and behavior points into the same total. Score "should we sell to them" and "are they buying now" on two separate axes, give each quadrant its own action, subtract for disqualifiers, and set your thresholds from leads that actually converted — not from a round number someone liked on launch day.

Want a scoring model your reps will actually trust? Pull your last 60 days of leads, split fit from intent on two axes, rebuild the weights from who really booked, and put the high-fit, high-intent quadrant at the top of the queue. Start at prospectuso.com.
