How to Analyze NPS Verbatims with ChatGPT or Claude

June 30, 2026

You have a few hundred NPS responses with open-text comments, and you want to know what is driving the score without reading every line. ChatGPT and Claude are genuinely good at this for a one-off analysis: paste the verbatims, ask for the themes, and you will get a usable read in minutes. The trick is knowing how to prompt them well, and knowing exactly where that approach stops working, because the same method that nails a quarterly batch quietly breaks when you try to run it continuously.

This guide covers both: a reliable way to analyze NPS verbatims with ChatGPT or Claude, where they shine, and where you need a platform instead.

How to analyze NPS verbatims with ChatGPT or Claude

These six steps produce a defensible analysis from a batch of verbatims.

1. Export the verbatims with their scores. Pull each comment alongside its 0-10 score and, if you have them, the account and segment. The score band (promoter, passive, detractor) is what lets the model separate why people are loyal from why they are frustrated.
2. Clean and de-identify. Strip names, emails, and anything sensitive before pasting customer data into a general LLM. Keep the comment, the score, and non-identifying attributes.
3. Ask for themes, not a summary. Prompt the model to group the comments into recurring themes and return a structured list, for example: "Group these NPS comments into themes. For each theme, give a label, the count, the average score, and three representative quotes." Structure beats a paragraph summary you cannot act on.
4. Separate promoter and detractor drivers. Run the promoter and detractor comments separately, or ask the model to split them, so you get what is winning loyalty and what is losing it as two distinct lists rather than one blended pile.
5. Quantify and rank. Ask for theme counts and the score impact of each, so you can rank drivers by how much they appear to move the number rather than by which quote stood out. Spot-check the counts against the raw comments.
6. Verify against the source. Open the actual comments behind two or three themes to confirm the model grouped them sensibly. LLMs occasionally merge distinct issues or invent a tidy theme, so a quick trace back to source keeps the analysis honest.
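
The preparation in the first four steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed workflow: the `Response` record, the `score_band` and `build_prompt` helpers, and the sample comments are all hypothetical. The band cut-offs are the standard NPS definition (0-6 detractor, 7-8 passive, 9-10 promoter), and the request to cite comment IDs is an addition to the example prompt that makes the later spot-check easier.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One survey response (hypothetical record layout)."""
    comment_id: str
    score: int      # 0-10 NPS score
    comment: str    # assumed to be de-identified already


def score_band(score: int) -> str:
    """Map a 0-10 score to its standard NPS band."""
    if not 0 <= score <= 10:
        raise ValueError(f"NPS score must be 0-10, got {score}")
    if score >= 9:
        return "promoter"
    if score >= 7:
        return "passive"
    return "detractor"


def build_prompt(responses: list[Response], band: str) -> str:
    """Build a theme-extraction prompt for a single score band."""
    selected = [r for r in responses if score_band(r.score) == band]
    lines = [f"[{r.comment_id}] (score {r.score}) {r.comment}" for r in selected]
    instructions = (
        f"Group these NPS {band} comments into themes. For each theme, give a "
        "label, the count, the average score, and three representative quotes. "
        "Cite the bracketed comment IDs behind each theme."
    )
    return instructions + "\n\n" + "\n".join(lines)


# Made-up sample data for illustration only.
responses = [
    Response("c1", 10, "Support answered within the hour."),
    Response("c2", 3, "Setup took weeks and the docs were out of date."),
    Response("c3", 8, "Works fine, but reporting is basic."),
    Response("c4", 2, "Pricing jumped at renewal with no warning."),
]

# One prompt per band keeps promoter and detractor drivers separate.
detractor_prompt = build_prompt(responses, "detractor")
promoter_prompt = build_prompt(responses, "promoter")
print(detractor_prompt)
```

Building one prompt per band mirrors step 4: the model never sees promoter and detractor comments in the same request, so the two driver lists cannot blend.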

Done carefully, this gives you a solid read on a batch. The limits show up when the batch becomes a program.

Where ChatGPT and Claude work well

For ad-hoc analysis, LLMs are excellent and often the right tool. A few hundred verbatims from last quarter's relationship survey, a one-time investigation into a score dip, a quick read before a board meeting: these are exactly the cases where pasting comments into Claude or ChatGPT and prompting for themes saves hours and produces something genuinely useful. They handle nuance, sarcasm, and mixed sentiment well, they need no setup, and for a single analyst doing a single pass, the output is more than good enough. If your NPS analysis is occasional and perfect consistency between runs is not critical, this is a fine workflow and you do not need anything heavier.

Where LLMs fall short for ongoing NPS analysis

The cracks appear the moment NPS analysis becomes continuous rather than occasional, and they are structural, not prompt-fixable.

The first is consistency across waves. NPS is measured repeatedly, and trend analysis only holds if a theme means the same thing this quarter as last. A fresh LLM pass invents its own categories each time, so "onboarding friction" might be one theme this wave and split across three the next, which makes wave-over-wave comparison unreliable. A consistent, self-learning taxonomy solves this by maintaining stable categories that still adapt as new issues appear.
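
You can see this drift for yourself by comparing the theme labels that two separate passes returned. The sketch below is illustrative only: the wave labels are made up, the `label_overlap` helper is hypothetical, and Jaccard overlap is simply one easy measure chosen here, not something a particular tool is known to use.

```python
def label_overlap(wave_a: set[str], wave_b: set[str]) -> float:
    """Jaccard overlap between two sets of theme labels (1.0 = identical)."""
    if not wave_a and not wave_b:
        return 1.0
    return len(wave_a & wave_b) / len(wave_a | wave_b)


# Hypothetical theme labels returned by two independent LLM passes.
q1_themes = {"onboarding friction", "pricing", "support speed"}
q2_themes = {
    "setup complexity", "missing docs", "slow implementation",
    "pricing", "support speed",
}

overlap = label_overlap(q1_themes, q2_themes)
only_q1 = sorted(q1_themes - q2_themes)
only_q2 = sorted(q2_themes - q1_themes)

print(f"Label overlap: {overlap:.2f}")
print("Missing from Q2:", only_q1)
print("New in Q2:", only_q2)
```

A low overlap does not tell you whether customers changed or the labels did. In this made-up case "onboarding friction" has plausibly been split into three new labels, which is exactly the ambiguity that makes wave-over-wave comparison unreliable.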

The second is traceability and scale. A model that returns "32% cite pricing" hands you a number you cannot audit unless you trace it back yourself, and pasting thousands of verbatims every wave is neither practical nor reliable.

The third is context: an LLM reading raw comments does not know which detractor is a six-figure account, because it has no customer context graph tying each verbatim to revenue and segment. So it can tell you what customers said but not which themes threaten the most revenue, which is the part that drives action. This is the same gap that separates ad-hoc summarizing from real NPS analytics: the model reads the language, but the platform reads the language continuously, consistently, and tied to who said it. It is also why the broader tools for analyzing NPS verbatims exist as a category alongside general LLMs.
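
To make the traceability and context gaps concrete, here is a minimal sketch of what joining themes to account revenue looks like when you do it by hand. Everything in it is an assumption for illustration: the comment IDs, theme assignments, and revenue figures are made up, and `rank_by_revenue` is a hypothetical helper, not how any platform implements its context graph.

```python
from collections import defaultdict

# Hypothetical theme assignments from an LLM pass: comment ID -> theme label.
theme_of = {
    "c1": "pricing", "c2": "onboarding", "c3": "pricing",
    "c4": "onboarding", "c5": "onboarding",
}

# Hypothetical account context joined from a CRM export:
# comment ID -> annual revenue of the account that wrote the comment.
revenue_of = {"c1": 120_000, "c2": 5_000, "c3": 80_000, "c4": 8_000, "c5": 7_000}


def rank_by_revenue(theme_of: dict[str, str], revenue_of: dict[str, int]) -> list[dict]:
    """Rank themes by the total revenue of the accounts behind them.

    Each row keeps its comment IDs, so every number traces back to source.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for comment_id, theme in theme_of.items():
        grouped[theme].append(comment_id)

    ranked = [
        {
            "theme": theme,
            "comment_ids": sorted(ids),
            "count": len(ids),
            # Comments with no known account contribute zero revenue.
            "revenue": sum(revenue_of.get(i, 0) for i in ids),
        }
        for theme, ids in grouped.items()
    ]
    return sorted(ranked, key=lambda row: row["revenue"], reverse=True)


ranked = rank_by_revenue(theme_of, revenue_of)
for row in ranked:
    print(f'{row["theme"]}: {row["count"]} comments, '
          f'{row["revenue"]} revenue, IDs {row["comment_ids"]}')
```

In this made-up data, onboarding has more comments but pricing carries ten times the revenue, so a count-only ranking and a revenue-weighted ranking point at different priorities.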

How to choose between an LLM and a platform

Use the cadence and stakes to decide. For a one-off analysis, an occasional read, or an exploratory investigation, ChatGPT or Claude is the faster, cheaper, and perfectly good choice. For continuous NPS analysis, where you need consistent themes across waves, traceability to source, revenue context, and the capacity to handle thousands of verbatims automatically, a dedicated platform is the right tool. Many teams use both: LLMs for specific investigations, a platform for the ongoing program.

The decision rule: weight consistency, traceability, and context for continuous analysis, and weight speed and zero setup for one-off reads.

FAQ

Can you analyze NPS verbatims with ChatGPT or Claude?

Yes, and for a one-off batch they work well. Export the comments with their scores, de-identify them, and prompt the model to group them into themes with counts, average scores, and representative quotes, separating promoters from detractors. Then verify a few themes against the source comments. For an occasional analysis of a few hundred verbatims, this produces a genuinely useful read in minutes.

What is the best prompt for analyzing NPS comments with an LLM?

Ask for structure, not a summary. A reliable prompt is along the lines of: "Group these NPS comments into themes. For each theme, return a label, the number of comments, the average score, and three representative quotes, and separate promoter themes from detractor themes." Structuring the request this way gives you a rankable list you can act on rather than a paragraph you have to re-interpret.
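
If you also ask the model to return that structure as JSON with the comment IDs behind each theme, checking its arithmetic becomes mechanical. The sketch below assumes that reply shape; the `model_reply` string is a hand-written stand-in rather than real model output, and `check_themes` and the field names are hypothetical.

```python
import json

# Source data you exported: comment ID -> score (made-up values).
scores = {"c1": 2, "c2": 3, "c3": 6, "c4": 1}

# Hand-written stand-in for a model reply in the assumed JSON shape.
model_reply = """
[
  {"label": "pricing", "count": 2, "average_score": 2.5,
   "comment_ids": ["c1", "c2"]},
  {"label": "onboarding", "count": 3, "average_score": 3.5,
   "comment_ids": ["c3", "c4"]}
]
"""


def check_themes(reply: str, scores: dict[str, int], tolerance: float = 0.05) -> list[str]:
    """Recompute each theme's count and average score from the source data."""
    problems = []
    for theme in json.loads(reply):
        label, ids = theme["label"], theme["comment_ids"]
        unknown = [i for i in ids if i not in scores]
        if unknown or not ids:
            problems.append(f"{label}: missing or unknown comment IDs {unknown}")
            continue
        if theme["count"] != len(ids):
            problems.append(f"{label}: claims {theme['count']} comments, cites {len(ids)}")
        actual = sum(scores[i] for i in ids) / len(ids)
        if abs(actual - theme["average_score"]) > tolerance:
            problems.append(f"{label}: claims average {theme['average_score']}, actual {actual:.2f}")
    return problems


problems = check_themes(model_reply, scores)
for problem in problems:
    print(problem)
```

This only checks the numbers. Whether the comments in a theme actually belong together still needs a human read of the source.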

When should you use a platform instead of ChatGPT or Claude for NPS?

When NPS analysis becomes continuous. LLMs invent fresh categories each run, which breaks wave-over-wave trends; they do not trace results to source reliably; and they lack the revenue context to tell you which themes threaten the most ARR. For ongoing programs that need consistent themes, traceability, and revenue weighting at volume, a dedicated platform is the better fit, with LLMs still useful for ad-hoc investigations.

Is it safe to paste customer feedback into ChatGPT or Claude?

De-identify first. Strip names, emails, and any sensitive details before pasting customer data into a general LLM, and check your company's data-handling policies, since pasting raw customer information into third-party tools can raise privacy and compliance concerns. Keeping only the comment, score, and non-identifying attributes reduces the risk while still letting the model do the theme analysis.
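
As a rough starting point, a first de-identification pass can be scripted. The sketch below is an assumption-laden illustration, not a complete PII solution: the regular expressions catch only obvious email and phone patterns, the `scrub` helper is hypothetical, and names are removed only if you supply them (for example, from the contact column of your own export).

```python
import re

# Deliberately simple patterns; they will miss unusual formats.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def scrub(comment: str, names: tuple[str, ...] = ()) -> str:
    """Replace emails, phone-like numbers, and known names with placeholders."""
    text = EMAIL.sub("[EMAIL]", comment)
    text = PHONE.sub("[PHONE]", text)
    for name in names:
        # Whole-word, case-insensitive match on each supplied name.
        text = re.sub(rf"\b{re.escape(name)}\b", "[NAME]", text, flags=re.IGNORECASE)
    return text


# Made-up comment for illustration only.
raw = "Ask Dana Lee (dana.lee@example.com, +1 415-555-0100) about the renewal."
clean = scrub(raw, names=("Dana Lee",))
print(clean)
```

Treat the output as a draft: skim the scrubbed comments before pasting, since free text can identify a customer in ways no pattern will catch.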

How does Enterpret differ from using an LLM for NPS verbatims?

Enterpret analyzes NPS verbatims continuously rather than in one-off pastes. Its adaptive taxonomy keeps themes consistent across waves while adapting to new issues, its customer context graph ties each verbatim to account and revenue, and every theme traces back to the source comments. An LLM is excellent for an ad-hoc read; Enterpret is built for the ongoing program where consistency, context, and traceability matter.

For an occasional read, an LLM is a fine choice. For continuous NPS analysis, see the tools for analyzing NPS verbatims or book a demo.

