Manually Tagging Customer Feedback is Ridiculous
Why product teams need a better system for quantifying and learning from their customer feedback.
It’s your 38th support ticket of the day. This one has been open for 5 days now. There has been considerable back and forth, and you’ve had to pull in the engineering team as well. Finally, the customer confirms that the issue is resolved and thanks you for your help. While the overall resolution time on this one was high, you did well on the other two metrics: time to first response and CSAT score. You heave a sigh of relief and excitedly move your mouse to hit ‘Close Ticket’.
But just then, a familiar horror strikes you – “I have to apply tags to this multi-day support interaction so that the product team can learn from it”. “Why can't my seemingly smart product team figure this out for themselves?”, you wonder for a moment. You stop yourself from wondering any further, since there are a dozen more tickets left in your queue.
You open the tags dropdown, scroll through the utterly random and fast-growing assortment of 700+ support tags, and select the two that make the most sense. Time for the 39th ticket.
It's 2022, and this is still how 95%+ of product orgs learn from their customer feedback. Multiple folks have written about the effort and process required to create and maintain a robust feedback taxonomy. Initiatives like these help teams stay close to customer pain, and they can work at lower feedback volumes. However, manual tagging falls apart spectacularly once volumes start to scale and customer feedback comes in from multiple channels.
Fareed Mosavat brilliantly summarized this for me from his experience leading product and growth at companies like Slack and Instacart.
"Qualitative data is a critical input in product decision-making. Unfortunately, product teams can't even use the data most of the time because of how inaccurate the manual tagging is. If somehow, you do come around to the data trust factor, then you are faced with high-level tags like 'performance issues' or 'billing'. These are good to know but can't be acted upon in any meaningful manner since there is no way to dive any deeper actually to extract granular insights from it." - Fareed Mosavat, Chief Development Officer, Reforge
Here are the four most common systemic flaws we see in manually tagged customer feedback at fast-scaling companies.
First, each feedback source has its own taxonomy, if there is even an effort to tag all sources in the first place. Consequently, your Gong.io call recordings have one set of tags, your Sales Engineers use another in their Salesforce notes, your social media team categorizes Twitter feedback a third way, and your support team tags Zendesk tickets a fourth.
What exactly are you learning from?
Second, inaccuracy. Picture dozens of humans quickly selecting from an ever-growing list of 500+ tags and applying them in a rush. What else were you expecting? The surprising part is not that there is inaccuracy, but the magnitude of it.
I’ve never seen a company achieve better than 60% accuracy in manually applied tags, and every major stakeholder is already aware of it.
This lack of trust means that qualitative data, a potentially invaluable source of customer insights, goes unused.
Third, a lack of granularity. Tagging broad topics like ‘Billing - refund’ and ‘Mobile - performance’ is helpful for a 1000-foot view of what customers are talking about, but it misses all the in-depth context and granular pain that product development teams need in order to drive product improvements.
Fourth, taxonomy bloat. Most manual tagging endeavors start with a relatively small list of 15-20 tags, which explodes over time by up to 50X as demand for granularity increases and as product teams continuously ship new features and improvements.
Consequently, little thought goes into the best way to organize your customer feedback taxonomy, and the set of new tags that arrives with every feature launch is just appended to the list as an afterthought.
By the time this becomes a major pain for the product team and you think about recreating and resetting your feedback taxonomy, you’re already sitting on a corpus of about half a million pieces of feedback. What are you going to do, re-tag all of it manually?
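To make the contrast concrete, here is a minimal sketch of what a deliberately organized, hierarchical taxonomy might look like, as opposed to an ever-growing flat list. The product areas and reasons are hypothetical, and a real taxonomy would live in a database rather than in code.

```python
# A minimal sketch of a hierarchical feedback taxonomy, in contrast to a flat,
# ever-growing tag list. Product areas and reasons below are hypothetical.
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class TaxonomyNode:
    name: str
    children: list[TaxonomyNode] = field(default_factory=list)

    def add(self, child_name: str) -> TaxonomyNode:
        child = TaxonomyNode(child_name)
        self.children.append(child)
        return child


# Start with broad product areas...
root = TaxonomyNode("feedback")
billing = root.add("billing")
mobile = root.add("mobile")

# ...and slot new, granular reasons under an existing area when a feature ships,
# instead of appending yet another free-floating tag to a flat list.
billing.add("refund delayed")
mobile.add("android dark mode broken")


def all_paths(node: TaxonomyNode, prefix: str = "") -> list[str]:
    """Flatten the tree into 'area / reason' paths for reporting."""
    path = f"{prefix}{node.name}"
    if not node.children:
        return [path]
    paths = []
    for child in node.children:
        paths.extend(all_paths(child, prefix=f"{path} / "))
    return paths


print(all_paths(root))
# ['feedback / billing / refund delayed', 'feedback / mobile / android dark mode broken']
```

The point is not the data structure itself, but that every new tag gets a deliberate home instead of becoming afterthought number 701.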
Why do product teams put up with all of this? Because learning from customer pain is the most important input in building a long-term successful product.
Flying blind while building products is a death knell. So, despite it being painful and fundamentally flawed, almost all product development orgs go through this ordeal of setting up and scaling manual tagging to give themselves the best shot at their product’s success.
So, how do you set up an effective feedback taxonomy that scales well over time? You need automation to counteract the major systemic flaws of learning from manually tagged customer feedback. But just randomly plugging in GPT-3 or the Google Natural Language API isn’t going to solve any of the aforementioned systemic flaws. Here's how to do it correctly.
Here are the five key capabilities you need to set your automated feedback taxonomy up for success.
First, customization. Is it ‘zoom’ as in ‘I can’t zoom in on the image’, or ‘zoom’ as in ‘we need a Zoom integration’? These details matter a lot, which is why just plugging in an off-the-shelf NLP system like GPT-3 or AWS Comprehend is a terrible idea.
Any machine learning you use to automate feedback tagging absolutely has to be customized and fine-tuned to your company and product.
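As a rough illustration of what “fine-tuned to your product” means, here is a minimal sketch of a text classifier trained on a handful of product-specific examples. The labels, training sentences, and the choice of a TF-IDF plus logistic regression pipeline are hypothetical stand-ins; a production system would train on thousands of labeled feedback records.

```python
# A minimal sketch of a classifier that has learned *your* product's vocabulary.
# The labels and examples below are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# The same word ("zoom") maps to two very different reasons depending on context.
examples = [
    ("I can't zoom in on the image in the viewer", "viewer: zoom controls broken"),
    ("pinch to zoom does nothing on my screenshots", "viewer: zoom controls broken"),
    ("we really need a Zoom integration for meetings", "integrations: Zoom requested"),
    ("please let us start a Zoom call from inside a task", "integrations: Zoom requested"),
]
texts, labels = zip(*examples)

# Word and bigram features plus a simple linear model, trained on product-specific data.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["zoom is broken when I open an image"]))
print(model.predict(["any plans for a zoom meeting integration?"]))
```

An off-the-shelf topic model has never seen your product, so both of those sentences collapse into something like ‘zoom’; a model trained on your own feedback can tell a viewer bug apart from an integration request.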
Second, granularity. “There were 100 more mentions of dark mode this month compared to last month.” Okay, thank you. But we already have a dark mode. So what exactly is the feedback, and who’s asking for it? Now compare this to: “Since the launch of the new beta UX, there has been a 40% increase in users saying that dark mode on Android does not work properly.”
This is actionable and gives you real context on the user pain. Broad topics alone won’t be actionable; your feedback system needs to identify and categorize granular, precise reasons from the raw feedback.
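To show what a granular, precise reason could look like in practice, here is a minimal sketch contrasting a broad topic tag with a structured reason record. The field names are hypothetical, not a prescribed schema.

```python
# A minimal sketch of the difference between a broad topic tag and a granular,
# structured "reason". Field names are hypothetical.
from dataclasses import dataclass


@dataclass
class BroadTag:
    topic: str            # e.g. "Mobile - performance"


@dataclass
class GranularReason:
    product_area: str     # "dark mode"
    platform: str         # "android"
    issue: str            # "does not render correctly"
    cohort: str           # "new beta UX users"
    verbatim: str         # the customer's own words


old_way = BroadTag(topic="Mobile - performance")

new_way = GranularReason(
    product_area="dark mode",
    platform="android",
    issue="does not render correctly",
    cohort="new beta UX users",
    verbatim="Since the new beta, dark mode on my phone is half white, half black.",
)

# The broad tag tells you *where* to look; the granular reason tells you *what*
# to fix, *for whom*, and keeps the evidence attached.
print(old_way)
print(new_way)
```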
Third, comprehensive coverage. Imagine having analytics only on your Android app and intentionally not tracking iOS and web users. You wouldn’t do that, right? So why learn from the feedback of just one or two sources and intentionally leave out all the others?
It is critical that your feedback taxonomy is applied consistently across all sources of feedback and across all languages. What looks like 20 complaints about the new sync issue might actually be 200, which could completely change how you prioritize it.
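Here is a minimal sketch of what applying one taxonomy across every source might look like: feedback from each channel is normalized into a shared record and then classified by the same model, regardless of where it came from or what language it is in. The payload shapes and the classify() stub are hypothetical.

```python
# A minimal sketch of normalizing feedback from different channels into one
# shared record before a single taxonomy is applied.
from dataclasses import dataclass


@dataclass
class FeedbackRecord:
    source: str      # "zendesk", "twitter", "gong", ...
    author: str
    text: str
    language: str


def normalize_zendesk(ticket: dict) -> FeedbackRecord:
    return FeedbackRecord(
        source="zendesk",
        author=ticket["requester"],
        text=ticket["description"],
        language=ticket.get("lang", "en"),
    )


def normalize_tweet(tweet: dict) -> FeedbackRecord:
    return FeedbackRecord(
        source="twitter",
        author=tweet["user"],
        text=tweet["text"],
        language=tweet.get("lang", "en"),
    )


def classify(record: FeedbackRecord) -> str:
    """Stand-in for the real model: every record, from every source and every
    language, goes through the *same* taxonomy."""
    return "sync: new sync fails on large files"  # hypothetical reason


records = [
    normalize_zendesk({"requester": "a@b.com", "description": "Sync hangs on big PDFs"}),
    normalize_tweet({"user": "@dev_anna", "text": "la sincronización falla con archivos grandes", "lang": "es"}),
]

# Counting reasons over this unified stream is what turns "20 complaints in
# Zendesk" into the real number across every channel.
print([classify(r) for r in records])
```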
Fourth, adaptability. The best product development teams are always learning, experimenting, and shipping. Your feedback taxonomy needs to be a living thing that adapts and automatically updates every few weeks to accommodate new feedback patterns as well as new feature launches.
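One way such a refresh could work, sketched below, is to periodically cluster the recent feedback that the current taxonomy cannot map with confidence, and review the clusters as candidate new reasons. The example sentences and the TF-IDF plus KMeans choice are illustrative only, not a production recipe.

```python
# A minimal sketch of surfacing *new* themes from recent, unmapped feedback so
# the taxonomy can be refreshed every few weeks.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Feedback that the current model couldn't map to an existing reason with
# confidence -- often the first sign of a new feature launch or a new bug.
unmapped = [
    "the new AI summary button gives me an empty summary",
    "AI summaries are blank on long documents",
    "exported CSV is missing the created_at column",
    "CSV export drops the created date field",
]

vectors = TfidfVectorizer().fit_transform(unmapped)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for label, text in zip(clusters, unmapped):
    print(label, text)

# Reviewing the clusters suggests candidate reasons to add to the taxonomy,
# e.g. "ai summary: empty output" and "csv export: missing columns".
```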
Fifth, flexibility. Hypergrowth teams periodically re-evaluate how their product pods are structured and which product areas fall under whose purview. The feedback taxonomy needs to be flexible enough to adapt to such changes easily, and to update all historically applied feedback tags accordingly.
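Because the tags are applied by a model rather than by hand, bringing history in line with a reorg can be as simple as re-running the model or applying a mapping, as in this minimal sketch. The pod names and the mapping are hypothetical.

```python
# A minimal sketch of re-mapping historically applied tags after product pods
# are reorganized, so old feedback lands under the new ownership structure.
pod_remap = {
    "mobile: dark mode": "appearance pod: dark mode",
    "mobile: performance": "core experience pod: performance",
    "billing: refund": "monetization pod: refunds",
}

historical_tags = [
    {"feedback_id": 101, "tag": "mobile: dark mode"},
    {"feedback_id": 102, "tag": "billing: refund"},
    {"feedback_id": 103, "tag": "search: no results"},  # unaffected tags pass through
]

# Apply the new structure retroactively instead of re-tagging by hand.
retagged = [{**row, "tag": pod_remap.get(row["tag"], row["tag"])} for row in historical_tags]
print(retagged)
```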
What has your company tried to implement in order to learn more effectively from customer feedback? Anything that has worked well, or not?
At Enterpret, we have put a lot of thought and effort into solving for all five of these capabilities for product teams like Notion and Cameo. We collect your raw customer feedback wherever it lives and create a machine learning model customized to your product that automatically categorizes feedback into themes and reasons. We layer that with easy-to-use analytics so that you can prioritize effectively and ensure you’re working on the highest-impact customer problems.