Manually Tagging Customer Feedback is Ridiculous

Varun Sharma
Co-founder, CEO
May 19, 2022

It’s your 38th support ticket of the day. This one has been open for 5 days now. There has been a considerable back and forth, and you’ve had to get input from the engineering team as well. Finally, the customer confirms that the issue is resolved and thanks you for your help. While the overall resolution time on this one was high, you did well on the other two metrics - time to first response, and the CSAT score. You heave a sigh of relief, and excitedly move your mouse to hit ‘Close Ticket’.

But just then, a familiar horror strikes you: “I have to apply tags for this multi-day support interaction so that the product team can learn from it.” “Why can't my seemingly smart product team figure this out for themselves?”, you wonder for a moment. You stop yourself from wondering too long, since there are a dozen more tickets left in your queue.

You open the tags dropdown, scroll through the utterly random and fast-growing assortment of 700+ support tags, and select the two that make the most sense. Time for the 39th ticket.

A customer support tool shows a section for tags for the customer support agent to apply to the ticket. The dropdown shows that there are over 700 different tags to choose from.
Most product teams still rely on support teams to choose from hundreds of tags to manually tag customer feedback. Put yourself in their shoes -- this can't scale.

It's 2022, and this is still how 95%+ of product orgs learn from their customer feedback. Multiple folks have written about the effort and process required to create and maintain a robust feedback taxonomy. Initiatives like these help teams stay close to customer pain, and they can work at lower feedback volumes. However, manual tagging falls apart spectacularly once things start to scale and customer feedback comes in from multiple channels.

Fareed Mosavat brilliantly summarized this for me from his experience leading product and growth at companies like Slack and Instacart.

"Qualitative data is a critical input in product decision-making. Unfortunately, product teams can't even use the data most of the time because of how inaccurate the manual tagging is. If somehow, you do come around to the data trust factor, then you are faced with high-level tags like 'performance issues' or 'billing'. These are good to know but can't be acted upon in any meaningful manner since there is no way to dive any deeper actually to extract granular insights from it." - Fareed Mosavat, Chief Development Officer, Reforge

Here are the four most common systemic flaws we see in manually tagged customer feedback at fast-scaling companies.

Why manually tagging customer feedback doesn’t work

1. Inconsistent across all feedback sources

Each feedback source has its own taxonomy, if there is even the effort to tag all sources in the first place. Consequently, your Gong.io call recordings have a completely different set of tags compared to what your Sales Engineers are using in Salesforce notes, to how your social media team categorizes Twitter feedback, to how your support team tags Zendesk tickets.

What exactly are you learning from?

Image shows icons for different feedback sources like Slack, Twitter, Zendesk, app store reviews, and Salesforce; superimposed over a jumble of different tags for customer feedback.
Customer feedback comes in from multiple channels, each with their own teams categorizing and tagging feedback in their own way. This makes it impossible to aggregate learnings across all your feedback sources.

2. Grossly Inaccurate

Dozens of humans quickly selecting from an ever-growing list of 500+ tags and applying them. What else were you expecting? The surprising part is not that there is inaccuracy, but the magnitude of it.

I’ve never seen a company achieve greater than 60% accuracy in manually applied tags, something that all major stakeholders are already aware of.

This lack of trust means that qualitative data, a potentially invaluable source of customer insights, goes unused.

3. Tags that are too broad to be actionable

Tagging broad topics like ‘Billing - refund’ and ‘Mobile - performance’ is helpful for a 1000-foot view of the topics of customer feedback, but misses all the in-depth context and granular pain that product development teams crave to learn to drive product improvements.

4. Feedback taxonomy that can’t adapt and scale with your product

Most manual tagging endeavors start with a relatively small list of 15-20 tags, which explodes over time by up to 50X as demand for granularity increases and as product teams continuously ship new features and improvements.

Consequently, there is no proper thought put into the best way to organize your customer feedback taxonomy, and each set of new tags with every feature launch is just added to the list as an afterthought.

By the time this becomes a major pain for the product team, and you think of recreating and re-setting your feedback taxonomy, you’re already sitting on a corpus of about half a million pieces of feedback. What are you going to do, re-tag all of it manually?

If it is so flawed, why is everyone still manually tagging as much customer feedback as they can?

Because learning from customer pain is the most important input in building a long-term successful product.

Flying blind on building products is a death knell. So, despite it being so painful and fundamentally flawed, almost all product development orgs go through this ordeal of learning and scaling manual tagging to give them the best shot of their product’s success.

So, how do you set up an effective feedback taxonomy that scales well over time? You need automation to counteract the major systemic flaws of learning from manually tagged customer feedback. But just randomly plugging in GPT-3 or Google Natural Language API isn’t going to solve any of the aforementioned systemic flaws. Here's how to do it correctly.

How to set up an effective automated customer feedback taxonomy

Here are the key requirements that you need to set up your automated feedback taxonomy for success.

1. Customized and fine-tuned to your product’s terminology

Is it ‘zoom’ as in ‘I can’t zoom in on the image’ or ‘zoom’ as in ‘we need a Zoom integration’? These details matter a lot, which is why just plugging in an off-the-shelf NLP system like GPT-3 or AWS Comprehend is a terrible idea.

Any Machine Learning you use to automate feedback tagging absolutely has to be customized and fine-tuned to your company and product.

2. Granular enough to make it actionable

“There were 100 more mentions of dark mode this month compared to last month.” Okay, thank you. But we already have a dark mode. So, what exactly is the feedback and who’s asking for it? Now compare this to: “Since launch of the new beta UX, there is a 40% increase in users saying that the dark mode on Android does not work properly.”

This is actionable and gives you enough context into the user pain. Just broad topics won’t be actionable. Your feedback system needs to be able to identify and categorize granular and precise reasons from the raw feedback.

Chart showing the top issues since launch for a beta group. The chart shows the top issues with granular reasons, like "UI not responsive", "screen is cut-off at the left", and "dark mode not working on Android".
Broad topics like "mobile" and "dark mode" aren't actionable. Your feedback system needs to be able to identify and categorize granular and precise reasons from the raw feedback.
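To make the dark mode comparison concrete, here is a minimal sketch of the arithmetic behind a statement like "a 40% increase since launch". It assumes feedback has already been categorized down to a (topic, reason, platform) level; the counts, key names, and the percent_change helper are all made up for illustration and are not taken from any real product or tool.

```python
def percent_change(previous, current):
    """Percent change from previous to current; None when there is no baseline."""
    if previous == 0:
        return None
    return (current - previous) * 100 / previous


# Hypothetical mention counts keyed by (topic, reason, platform).
before_launch = {("dark mode", "does not work properly", "Android"): 50}
after_launch = {("dark mode", "does not work properly", "Android"): 70}

# Compare each granular reason against its own pre-launch baseline.
changes = {
    key: percent_change(before_launch.get(key, 0), count)
    for key, count in after_launch.items()
}

for (topic, reason, platform), change in changes.items():
    print(f"{topic} / {reason} / {platform}: {change}% change since launch")
```

A raw count such as "100 more mentions" cannot be computed per reason and platform unless the tags are this granular to begin with, which is the point of the requirement.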

3. Consistent across all sources and languages

Imagine having analytics only on your Android app, and intentionally not tracking iOS and web users. You wouldn’t do that, right? So, why only learn from just the feedback of one or two sources, and intentionally leave out all others?

It is critical that the feedback taxonomy you have is consistently applied across all sources of feedback and across all languages. What you see as 20 complaints about the new sync issues might actually be 200, which may completely change how you think about its prioritization.
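As a rough illustration of why a shared taxonomy changes the numbers, the sketch below rolls source-specific tags up into one label before counting. The source names, tags, labels, and the count_by_canonical helper are hypothetical, and a hand-written lookup table stands in for whatever automated categorization you actually use; it only demonstrates the aggregation step.

```python
from collections import Counter

# Hypothetical mapping from (source, source-specific tag) to one shared label.
CANONICAL = {
    ("zendesk", "sync-bug"): "Sync: data not syncing",
    ("twitter", "syncing complaints"): "Sync: data not syncing",
    ("salesforce", "Sync issue (SE note)"): "Sync: data not syncing",
    ("zendesk", "billing-refund"): "Billing: refund request",
}


def count_by_canonical(feedback):
    """Count feedback items per shared label; unmapped tags stay visible."""
    counts = Counter()
    for source, tag in feedback:
        counts[CANONICAL.get((source, tag), "UNMAPPED")] += 1
    return counts


# Made-up sample: the same sync problem reported through three channels.
feedback = (
    [("zendesk", "sync-bug")] * 2
    + [("twitter", "syncing complaints")] * 3
    + [("salesforce", "Sync issue (SE note)")] * 1
    + [("zendesk", "billing-refund")] * 1
    + [("app_store", "misc")] * 1
)

counts = count_by_canonical(feedback)
print(counts.most_common())
```

Looking at Zendesk alone, the sync problem shows up twice in this sample; counted across all mapped sources, it shows up six times.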

4. Updates automatically as your product changes

The best product development teams are always learning, experimenting, and shipping. Your feedback taxonomy needs to be a living thing that can adapt and automatically update every few weeks to accommodate new feedback patterns as well as new feature launches.

5. Backwards compatible and re-organizable

Hypergrowth teams periodically evaluate how their product pods are structured, and which product areas are under whose purview. It is important for the feedback taxonomy to be flexible enough to adapt to such changes, and to update all historically applied feedback tags accordingly.
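One way to think about this requirement, sketched below under simplifying assumptions: if each piece of feedback stores only a stable identifier for its granular reason, and the grouping of reasons into product areas lives in a separate mapping, then a reorganization only changes the mapping and all historical feedback follows along. The Feedback class, reason IDs, and area names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    text: str
    reason_id: str  # stable identifier for a granular reason (hypothetical)


# Hypothetical reason -> product area mappings, before and after a reorg.
areas_before = {"dark_mode_android": "Mobile", "refund_delay": "Billing"}
areas_after = {"dark_mode_android": "Design Systems", "refund_delay": "Billing"}


def area_of(item, areas):
    """Resolve an item's product area at read time from the current mapping."""
    return areas.get(item.reason_id, "Unassigned")


# Historical feedback is never re-tagged; only the mapping changes.
history = [
    Feedback("Dark mode is broken on my Pixel", "dark_mode_android"),
    Feedback("Still waiting on my refund", "refund_delay"),
]

before = [area_of(item, areas_before) for item in history]
after = [area_of(item, areas_after) for item in history]
print(before, after)
```

The design choice here is to resolve the product area when reading rather than storing it on each item, so nobody has to re-tag half a million pieces of feedback by hand.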

Are you still trying to learn from manually tagged feedback?

What has your company tried to implement in order to learn more effectively from customer feedback? Anything that has worked well, or not?

At Enterpret, we have put a lot of thought and effort into solving for all 5 of these required capabilities for product teams like Notion, Loom, Descript, The Browser Company and Ironclad. We collect your raw customer feedback wherever it lives and create a machine learning model customized to your product to automatically categorize feedback into themes and reasons. We layer that with easy-to-use analytics so that you can prioritize effectively and ensure you’re working on the highest impact customer problems.

If you’d like to share your story or learn more, reach out at namaste@enterpret.com or schedule a demo.
