Essay ~9 min read

Why your BI vendor can't ship your dashboard

A six-month engagement, three quarterly reviews, and a deliverable that didn't answer the question. I rebuilt it in a weekend with Cloudflare and the right schema. The lesson is not what you think.

Every operator has a version of this story. Mine cost six months and a vendor relationship.

We had a clean question: which sales activities actually predict customer retention twelve months out? We had the data — Salesforce, an ecommerce platform, NPS responses from SurveyMonkey, and a couple of usage feeds from our LMS. We had budget. We hired a BI vendor on a recommendation from someone whose dashboard I'd seen and admired.

Six months later, after three quarterly reviews and what felt like a hundred Slack threads, we had: a normalized data warehouse with eleven tables, a Looker workbook with twenty-three charts, and zero answers to the original question. The charts looked like they should answer the question. They didn't.

One Saturday, frustrated, I wrote the SQL myself. Not in their warehouse — in Python, against the Salesforce REST API, joined to two CSV exports from the ecommerce platform on email plus a 7-day window, with NPS appended where we had a match. By Sunday afternoon I had a static JSON file with the answer, hosted on a Cloudflare page my team could view in a browser. Total cost: about $0 in infrastructure, twelve hours of my time.

The thing I wrote wasn't better than the vendor's product in any abstract sense. It was uglier, less reusable, and would not survive a serious data-engineering audit. But it answered the question. And the question was the only thing that mattered.

The vendor wasn't bad. They were optimizing for the wrong thing.

I want to be clear about this part because it's the part most operators get wrong when they retell this story. The BI vendor was competent. Their data warehouse was well-structured. The Looker dashboards followed best practices. Their team showed up on time and answered every email.

They were, however, optimizing for warehouse architecture — a stable, normalized, documented schema that any future analyst could query against. That's the right output if your problem is "we have 47 reports running off janky exports and we need to consolidate them." It's the wrong output if your problem is "we have one specific question and we need an answer."

Their schema couldn't represent the question we'd asked. Specifically: our retention question required joining sales activity logs to ecommerce purchase records on email with a fuzzy date window, because reps' Salesforce logs and customers' ecommerce purchases didn't have a clean foreign key. The denormalization required to express that join — a temporary, ad-hoc table where each row is a (customer, sales-touch, purchase) triple — is exactly the kind of thing a normalized warehouse refuses to materialize.
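A minimal pandas sketch of that join (my illustration, not the vendor's schema or my actual script; the column names and toy data are invented):

```python
# Hypothetical sketch of the (customer, sales-touch, purchase) join: exact
# match on email, fuzzy match on date within a 7-day window.
import pandas as pd

touches = pd.DataFrame({
    "email": ["a@x.com", "a@x.com", "b@x.com"],
    "touch_date": pd.to_datetime(["2024-01-02", "2024-03-01", "2024-01-10"]),
})
purchases = pd.DataFrame({
    "email": ["a@x.com", "b@x.com"],
    "purchase_date": pd.to_datetime(["2024-01-05", "2024-02-20"]),
    "amount": [120.0, 80.0],
})

# merge_asof requires both frames sorted on their date keys
touches = touches.sort_values("touch_date")
purchases = purchases.sort_values("purchase_date")

triples = pd.merge_asof(
    touches, purchases,
    left_on="touch_date", right_on="purchase_date",
    by="email",                    # exact key: email
    tolerance=pd.Timedelta("7D"),  # fuzzy key: dates within 7 days
    direction="nearest",
)
# touches with no purchase inside the window come back with NaT / NaN
```

Each row of `triples` is one (customer, sales-touch, purchase) record; unmatched touches survive with null purchase fields, which is exactly the ad-hoc denormalization a normalized warehouse resists materializing.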

The vendor knew this. They had proposed an ETL pipeline to bridge the two systems. The pipeline was scoped at three months and another quarter of their team's time. It would have worked. It would also have cost as much as the original engagement again, and would have introduced a dependency on them that we'd have paid for in perpetuity.

The schema you need depends entirely on the question you're trying to answer. The schema a BI vendor builds depends on what they think you'll ask in three years.

What I actually did, in twelve hours

Here's the embarrassing part: the technical work was trivial. The whole thing was four files.

  1. fetch_sf.py — a 60-line Python script that paginated through the Salesforce REST API for sales activities in the relevant window, dumped them to a Parquet file.
  2. fetch_ecom.py — a 30-line script that read the existing CSV exports we already had on a shared drive.
  3. join.py — about 80 lines. The actual logic. Match on email, fuzzy-match on date with a 7-day window, attach NPS where present, compute retention windows, output a denormalized JSON.
  4. index.html — a single static page with a Chart.js bar chart and a sortable table. Hosted on Cloudflare Pages. Access-gated to our team's Google accounts.

That's it. Twelve hours, zero infrastructure cost, one operator who knew the question writing the SQL.
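For concreteness, here's a toy version of join.py's final step: label each customer as retained and dump the denormalized JSON the static page charts. The retention definition below (a repeat purchase 365+ days after the first) is my assumption for the sketch, not the essay's actual definition.

```python
# Toy sketch only. The "retained at 12 months" rule and the data are invented
# to show the shape of the output, not the real analysis.
import json
from datetime import date, timedelta

purchases = {  # email -> purchase dates (toy data)
    "a@x.com": [date(2023, 1, 5), date(2024, 2, 1)],
    "b@x.com": [date(2023, 3, 1)],
}

def retained_12mo(dates):
    """Assumed rule: any purchase 365+ days after the first purchase."""
    first = min(dates)
    return any(d >= first + timedelta(days=365) for d in dates)

rows = [{"email": e, "retained_12mo": retained_12mo(ds)}
        for e, ds in purchases.items()]
payload = json.dumps(rows, indent=2)  # the static file index.html loads
```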

The vendor's six-month build is still useful — it's a real warehouse and it'll pay off the next time someone needs to run reports across the same data. But it never could have answered this question, because answering this question required someone who knew the operating context to make a series of small decisions that aren't visible in a requirements doc.

Decisions you can't outsource

Things I decided in those twelve hours that the vendor never could have:

  1. Join on email rather than a foreign key, because reps' Salesforce logs and customers' ecommerce purchases had no clean ID in common.
  2. Use a 7-day window for matching a sales touch to a purchase, instead of insisting on same-day matches that would never land.
  3. Append NPS where we had a match, and move on where we didn't, rather than blocking the analysis on survey coverage.
  4. Treat one quarter of rep activity logs as unreliable, because I remembered the team change.

Each of those is a five-minute conversation if you already work in the company. Each of those is a three-meeting tangent if you're a vendor trying to write a requirements doc. Multiply by twenty such decisions and you get the six-month gap.

The actual lesson: who should be writing the SQL

The takeaway most people draw from a story like this is "BI vendors are bad" or "build it yourself." That's the wrong takeaway. BI vendors are great. I would hire the same vendor again, for the right kind of work. The right kind of work is materializing a stable schema for queries you already know you'll run.

The takeaway is about who should be writing the SQL for the specific question you're trying to answer right now.

My answer, after a few years of this: the operator who knows the question. Not the analyst, not the vendor, not the data engineer. The person who can recognize, in the moment, that the rep's activity logs are unreliable for that one quarter — because they remember the team change. That kind of context cannot be transcribed into a requirements document. It can only be applied by someone who's been in the room.

Until very recently, this was an absurd suggestion. Operators don't write SQL. They don't know APIs. They don't deploy infrastructure. The cost of becoming the person who writes the SQL was higher than the cost of misalignment with a vendor.

That math has changed. With Claude in the loop, the cost of writing 60 lines of Python that paginates through the Salesforce API is roughly an hour for an operator who's never written Python. The cost of joining two CSVs is fifteen minutes. The cost of hosting a static page on Cloudflare is zero. The whole bottleneck of "you need a developer" has collapsed for this category of work — the category I'd call one-off operational analytics.
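For flavor, here's roughly what those 60 lines boil down to. This is a sketch, not my actual fetch_sf.py: the instance URL and token are placeholders, and `get` is injectable only so the loop can be exercised without a live Salesforce org.

```python
# Sketch of paginating Salesforce's REST query endpoint: you run a SOQL
# query, then follow nextRecordsUrl until it disappears. Placeholders below.
INSTANCE = "https://example.my.salesforce.com"  # placeholder org URL
API = "/services/data/v59.0/query"

def fetch_all(soql, get=None, token="REPLACE_ME"):
    """Run a SOQL query and follow pagination to the end."""
    if get is None:
        import requests  # third-party; only needed for real calls
        get = requests.get
    headers = {"Authorization": f"Bearer {token}"}
    url = f"{INSTANCE}{API}"
    params = {"q": soql}
    records = []
    while url:
        payload = get(url, headers=headers, params=params).json()
        records.extend(payload["records"])
        # Salesforce includes a relative nextRecordsUrl while pages remain
        next_url = payload.get("nextRecordsUrl")
        url = f"{INSTANCE}{next_url}" if next_url else None
        params = None  # the SOQL only goes on the first request
    return records
```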

The right person to write the SQL is the person who knows the question. AI just made it newly possible for that to be the same person who lived the operating context.

What this changes if you're a sales / CX / RevOps leader

Three things change.

First: the queue of "we should hire someone to build that dashboard" requests on your roadmap is mostly fake. Most of those dashboards are one-off analytical questions that should be answered, written down, and never queried again. They don't need a warehouse. They need a Python script and a static page.

Second: the BI vendor relationship — if you have one — should be scoped tighter. Their product is materializing schemas for recurring queries. Their product is not "answer our questions." Pull the answer-our-questions work in-house, and let the vendor do what they're actually good at.

Third: someone on your team should be the operator-with-Claude. Not necessarily you. But someone whose calendar isn't fully stand-ups, who knows the operating context, and who's willing to write 60 lines of Python on a Saturday. That person, with that toolkit, will outship a six-figure vendor relationship for this category of work — every single time.

I am, transparently, that person for hire. But the deeper point is that this role exists now in a way it didn't five years ago. Whether you fill it with me, or with someone on your team, or with a fractional builder — fill it. Otherwise you're going to keep paying vendors to materialize schemas that don't answer your actual questions.


The corollary, since someone always asks

"But what about scale? What about reuse? What about reliability?"

Most one-off operational analytics questions never need to be re-run. The retention question I answered in 2024 doesn't need to be re-run quarterly — once we had the answer, it changed how we sold, and the question was settled. If you find yourself running the same script every month for a year, that is the moment to graduate it into the warehouse. Not before.

Static JSON files on Cloudflare are, in fact, surprisingly reliable. Mine has been up for two years. Cloudflare's free tier handles more traffic than my entire company will ever generate. The reliability concern is mostly an artifact of how we used to think about data infrastructure — a world where everything had to live in a database because the database was the only thing that could serve a query fast enough. That world has been over for about a decade and most operating teams haven't caught up.

Reuse is the only legitimate concern, and it's the one I'm most often wrong about. Sometimes a script I wrote in twelve hours becomes the de facto monthly report. When that happens, I rewrite it properly — in the warehouse, with documentation, with a scheduled refresh. By that point I know exactly what schema it needs, because I've been running the query for six months. The vendor can build the warehouse against a known target. Everyone wins.

Got a "this should be possible" wall you keep hitting?

I build the kind of operator-side tools described in this essay — embedded part-time with sales, CX, and RevOps teams. Most engagements ship a working tool in week one.

© 2026 Sam Slater