I asked ChatGPT to help design a financial model for a paid newsletter: grow from hundreds of subscribers to more than 10,000, with marketing spend. The bot spun out tables, charts and sensible-sounding assumptions by the minute. Then the cracks appeared: contradictory break-even points, shifting subscriber numbers and math that quietly contradicted itself. What started as a productivity lift ended up as a reminder of why AI can't be your fount of financial truth.
The brief was simple: price per subscriber, churn, customer acquisition cost (CAC) and expected payback period. ChatGPT produced tables ready for Excel, demonstrated what several CACs would do to margins and plotted monthly cash flow. I could adjust a lever and see the curve redraw instantly. That's the magic. But in the world of financial modeling, "almost right" is merely a synonym for wrong.
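For context, here is a minimal sketch of the kind of model we were iterating on. Every number below is an illustrative placeholder, not the actual plan, and the structure is a simplification of what the chat produced.

```python
# Minimal sketch of a subscriber/cash-flow model (all figures are placeholders).
price = 10.00          # monthly price per subscriber ($)
monthly_churn = 0.04   # share of subscribers lost each month
cac = 30.00            # cost to acquire one new subscriber ($)
starting_subs = 800    # the plan started with existing subscribers, not zero
new_subs_per_month = 500
fixed_costs = 4_000    # editorial and tooling per month ($)

subs, cumulative_cash = starting_subs, 0.0
for month in range(1, 25):
    subs = subs * (1 - monthly_churn) + new_subs_per_month
    revenue = subs * price
    marketing = new_subs_per_month * cac   # CAC applies to new adds only
    cash = revenue - marketing - fixed_costs
    cumulative_cash += cash
    if cumulative_cash >= 0:
        print(f"Cumulative breakeven in month {month}: {subs:,.0f} subscribers")
        break
```

Twenty lines of arithmetic, which is exactly why the inconsistencies that followed were so easy to miss.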

Where the numbers started to wobble
The first red flag: the model said we would reach profitability in one month, then "explained" in prose that breakeven came later. Same conversation, same inputs, different answers. The bot apologized and moved the conversation along, without explaining why it had contradicted itself.
Then it forgot an initial assumption: the plan started with existing subscribers, not zero. That one missing piece threw off all the math that followed. Later, when estimating terminal value, it gave two different "ending subscribers" figures for the same period, one in the narrative and one in the table. Each time it cheerfully admitted the mistake and never identified its source.
Small errors magnified: marketing costs averaged across the full subscriber base rather than applied to new adds; churn recorded monthly in one table and annually in another; a "profitability" chart that didn't match the ledger entries below it.
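That monthly-versus-annual churn mix-up is worse than it sounds, because churn compounds rather than multiplies. A quick illustration with made-up numbers:

```python
# Monthly and annual churn are not interchangeable; compounding matters.
monthly_churn = 0.05
annual_churn = 1 - (1 - monthly_churn) ** 12
print(f"{monthly_churn:.0%} monthly churn is about {annual_churn:.0%} annually")  # ~46%, not 60%
```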
I found myself playing forensic accountant, scrolling back to reclaim my ground truths (CAC, churn, starting base) while the model rewrote the story.
Why financial planning is tough for LLMs
Large language models don't think the way analysts do; they predict text that looks right given what came before. That is great for brainstorming, but a liability when precision is called for. Longer conversations are riskier, because earlier facts fall out of the model's context window or get crowded out by newer prompts. OpenAI itself warns that outputs can be speculative, are most reliable in short exchanges, and that important details should be double-checked.
Researchers and critics such as the computer scientist Gary Marcus have argued that these systems can confidently output statements that aren't grounded in any stable internal logic. The U.S. National Institute of Standards and Technology's AI Risk Management Framework stresses traceability, validation and human oversight, all controls vital to financial modeling. In other words, the tech is awesome, but it's not an accounting system.
There's also the allure problem. Generative AI, according to McKinsey, could unlock trillions of dollars in annual value. That promise can push teams beyond the safe use cases. But in subscriptions and media, CAC, lifetime value and churn are so sensitive that a small swing in any one of them can put the whole P&L on life support. Many operators aim for an LTV-to-CAC ratio around 3:1, a benchmark popularized by firms like Bain; whiff on the inputs and the headline metric becomes meaningless.
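To see how sensitive that headline ratio is, here is a back-of-envelope version of the calculation with placeholder inputs (one common simplification is LTV as monthly contribution divided by monthly churn):

```python
# Illustrative LTV-to-CAC check; every input is a placeholder assumption.
price = 10.00          # monthly price per subscriber ($)
gross_margin = 0.85    # share of revenue kept after delivery costs
monthly_churn = 0.04   # share of subscribers lost each month
cac = 30.00            # cost to acquire one new subscriber ($)

ltv = price * gross_margin / monthly_churn   # simple LTV approximation
ratio = ltv / cac
print(f"LTV is about ${ltv:,.0f}; LTV:CAC is about {ratio:.1f}:1 (3:1 is the common target)")
```

Double churn to 8% monthly and the same inputs land near 3.5:1; a misstated assumption quietly halves the metric the whole plan hangs on.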
Guardrails that rescued the project
Snapshot assumptions early and often. Keep a plain-language "source-of-truth" block: price, churn cadence, starting base, CAC bands, channels, trial rules. Paste it into the chat periodically and have the model echo it back before creating a new table, as in the sketch below.
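One way to keep that block handy, with purely illustrative values:

```python
# A pasteable "source-of-truth" block (all values are illustrative).
ASSUMPTIONS = {
    "price_per_month_usd": 10.00,
    "starting_subscribers": 800,          # not zero; the plan builds on an existing base
    "churn": "4% monthly (not annual)",
    "cac_band_usd": "25-40 per new subscriber",
    "channels": ["newsletter swaps", "paid social"],
    "trial_rules": "14-day free trial, card required",
}

block = "\n".join(f"- {key}: {value}" for key, value in ASSUMPTIONS.items())
print("NON-NEGOTIABLE ASSUMPTIONS (echo these back before any new table):\n" + block)
```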
Separate narrative from math. Use the AI to draft logic and formulas, then rebuild the model in a spreadsheet where every cell is transparent, testable and version-controlled. Add quick checks: totals tie out, churn and acquisition apply to the right cohorts, charts are connected to the correct ranges.
Automate sanity checks. Before each update, have the bot confirm that breakeven, cumulative cash flow and ending subscribers agree between table and text. Ask it to state which assumptions changed from the previous version and which did not.
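You can also run the tie-out yourself. Here is a rough sketch of the kind of check I mean; `rows`, `stated_breakeven_month` and `stated_ending_subs` are hypothetical stand-ins for whatever the chat produced, parsed into plain numbers:

```python
# Sketch of tie-out checks to run before trusting a regenerated table.
def check_model(rows, stated_breakeven_month, stated_ending_subs):
    """rows: list of {"month": int, "cash": float, "subscribers": float}."""
    cumulative = 0.0
    breakeven_month = None
    for row in rows:
        cumulative += row["cash"]
        if breakeven_month is None and cumulative >= 0:
            breakeven_month = row["month"]
    assert breakeven_month == stated_breakeven_month, (
        f"table says breakeven in month {breakeven_month}, "
        f"text says month {stated_breakeven_month}"
    )
    assert abs(rows[-1]["subscribers"] - stated_ending_subs) < 1, (
        "ending subscribers in table and narrative disagree"
    )
```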
Keep chats short. Start new threads for separate sections: acquisition modeling, pricing tests, retention scenarios and terminal value. OpenAI notes that reliability is higher in shorter exchanges; treat long, rambling chats as risk multipliers.
Anchor the model in your own data. If you have the stack, retrieval-augmented generation can ground key variables in a document or database so the bot pulls constants instead of making them up; businesses use this approach precisely to keep assumptions from drifting. Even if you don't use RAG, paste the authoritative metrics from your analytics tools into the chat and label them non-negotiable.
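A lightweight version of that grounding, without any retrieval infrastructure, is to load measured values from your own export and prepend them to every prompt. The file name and fields below are hypothetical:

```python
# Minimal "grounding" sketch: pull authoritative metrics from your own export
# instead of letting the model invent them (file name and fields are hypothetical).
import json

with open("analytics_export.json") as f:
    metrics = json.load(f)   # e.g. {"starting_subscribers": 812, "monthly_churn": 0.043}

preamble = (
    "Use ONLY these measured values; do not restate or adjust them:\n"
    + "\n".join(f"- {name}: {value}" for name, value in metrics.items())
)
# `preamble` then goes at the top of every prompt that asks for a new table.
```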
Reality over aesthetics. Sexy charts are seductive; stubbornly let the ledger win. If the graph and the table don't agree, the truth is in the table, or it isn't anywhere until you've audited the formulas. Think of the model as a junior analyst: useful, quick, sometimes brilliant and always double-checked.
The bottom line
Generative AI speeds up ideation, surfaces options you might have missed and condenses early modeling from days to hours. It can outline a plan; it should not audit one. For me, ChatGPT sped up the first draft and slowed down the second; I rebuilt trust one empty cell at a time. That trade is workable if you go in with eyes open, a verification routine in hand, and a willingness to say no to any number that can't show its work.