Why you should forget qualitative data coding

Surprisingly, most people in the qualitative insights business still think that manual coding is the best way to organize and structure observations and interpretations from qualitative data.

It’s a stubborn misconception, and it persists because it’s hard to keep up with advancements in natural language processing.

Progress in machine learning and AI keeps accelerating, and the knowledge gap widens with every improvement in these technologies.

It’s essential to close this knowledge gap because data from customer feedback systems, survey open ends, social media text and more often contains the most valuable and actionable insights.

In this article, we share 6 facts about the disadvantages of qualitative data coding through the lens of state-of-the-art qualitative data analysis.

We aim to equip professionals with the necessary understanding of the opportunity cost of qualitative data coding.

As we’ll see, manual coding and software-assisted coding are very similar from these perspectives.

1. Coding is time-consuming

Picture this process:

“A coder needs to make sense of 1000 open-ended survey responses. She reads through the first 100 responses line-by-line to get an idea of the content.

She decides the number of codes she wants to use and groups the responses into themes assigning the first set of codes.

She continues reading to uncover new themes. She adds and removes codes from the original code frame.

She subjectively decides how to prioritize important themes.

She continues to iterate, repeating the previous steps over and over again until she creates a code-frame for the 1000 responses.”

It’s an iterative, long process. But how much time does it take?

Based on our conversations with market research companies, it takes about an hour to code 100 open-ended survey responses.

From another angle, it takes approximately 8 hours to code an hour-long transcribed semi-structured interview.

Of course, these numbers may vary depending on the dataset, the coder’s experience and other factors as well.

Some market research companies told us that their rule of thumb is 1 minute per code.
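To put these rules of thumb into numbers, here is a minimal Python sketch. The function names are ours, and the average of 1.5 codes per response is a purely hypothetical assumption for illustration; only the "1 hour per 100 responses" and "1 minute per code" figures come from the conversations mentioned above.

```python
def coding_hours_by_responses(n_responses, hours_per_100=1.0):
    """Estimate coding time from the '1 hour per 100 responses' figure."""
    return n_responses / 100 * hours_per_100


def coding_hours_by_codes(n_code_assignments, minutes_per_code=1.0):
    """Estimate coding time from the '1 minute per code' rule of thumb."""
    return n_code_assignments * minutes_per_code / 60


n_responses = 1000
codes_per_response = 1.5  # hypothetical average, not a figure from the article

print(coding_hours_by_responses(n_responses))                       # 10.0
print(coding_hours_by_codes(n_responses * codes_per_response))      # 25.0
```

The two rules of thumb give different estimates, which is expected: the real number depends on the dataset and on how many codes each response receives.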

Is it fast? Is it slow? Is it worth the effort?

Well, it’s still the best approach to interpret qualitative data. Right?


With state-of-the-art text analytics solutions, it takes minutes to analyze that amount of data, with even higher accuracy.

With that in mind, coding is an unnecessarily time-consuming solution. Why would you waste your time with unnecessary processes and activities?

2. Coding is expensive

“Time is money.”

As the process of qualitative coding takes a lot of time to complete, it’s proportionally expensive.

It’s not only the time that makes coding relatively expensive but also the expertise required. Coders are usually trained professionals with years of experience in coding.

You need to pay for this expertise.

According to our conversations with insights and market research companies, outsourced qualitative data coding can cost up to 50 cents per code.

The total cost significantly increases with larger datasets. Furthermore, subsequent projects require new code frames to capture unknown, emerging themes.
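As a rough illustration of how the bill grows, the sketch below multiplies the number of assigned codes by the upper-bound price quoted above. The function name and the example volumes are hypothetical; the 50 cents per code is the "up to" figure from our conversations, so the results are ceilings rather than typical prices.

```python
def max_outsourcing_cost(n_code_assignments, dollars_per_code=0.50):
    """Upper-bound cost in dollars, assuming a flat per-code price."""
    return n_code_assignments * dollars_per_code


# Hypothetical volumes of assigned codes
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} codes -> up to ${max_outsourcing_cost(n):,.2f}")
```

This assumes a flat per-code price with no volume discount, which may not match how a given vendor actually charges.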

3. Coder’s skill is a limiting factor

Coders need to be trained as their skills and experiences influence the quality of the analysis.

We want our qualitative insights to be grouped into codes based on the context and not only keywords.

It means that the coder will have to read through the text and interpret it to create and assign codes based on the meaning of the text.

It requires expertise, and it takes time to do well.

The coder must also choose the right number of codes in the code frame.

Each code should be generic enough to apply to multiple comments but specific enough to be actually useful.

4. Coders introduce bias to the analysis

According to our conversations with leading market research companies, “there is a maximum 70% agreement between 2 human coders”.

It means that coders code the same dataset differently, as subjective judgement is always part of the observation.
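To make "agreement" concrete, here is a minimal sketch of how it can be measured for two coders who each assign one code per response. The labels are made-up example data. Cohen's kappa is not mentioned in our conversations; we add it here as a commonly used statistic that corrects raw agreement for the agreement expected by chance.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of responses where both coders assigned the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty dataset")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Agreement corrected for chance, for two coders and one code per response."""
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: probability both coders pick the same code independently
    p_expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if p_expected == 1:
        return 1.0  # both coders used a single identical code throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Made-up labels for ten responses
coder_a = ["price", "price", "service", "service", "quality",
           "price", "service", "quality", "price", "service"]
coder_b = ["price", "service", "service", "service", "quality",
           "price", "quality", "quality", "price", "price"]

print(percent_agreement(coder_a, coder_b))        # 0.7
print(round(cohens_kappa(coder_a, coder_b), 2))   # 0.55
```

In this example the coders agree on 7 of 10 responses, yet the chance-corrected score is noticeably lower, because some of that agreement would happen even if both coders picked codes at random.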

Let’s stop here for a moment and think about it. It’s crazy, isn’t it?

It introduces additional, unnecessary challenges to consistency over time. It becomes someone’s job to do quality control.

Humans have a lot of inherent, preconceived notions that can influence their coding decisions. Coders might not be aware of these notions, which makes them hard to control.

This can cause confirmation bias and tunnel vision: coders often look for evidence in the data to confirm an existing hypothesis, neglecting important but contradictory insights.

5. Coding is inaccurate

Based on the previous points, one can easily argue that data coding is inaccurate. It’s subjective, biased, time-consuming and therefore error-prone.

When we define accuracy, we ask two important questions:

To what extent does the analysis capture all the themes in a dataset? We can call this aspect ‘coverage’.

On average, to what extent are the codes accurately assigned to certain themes? We can call this aspect ‘assignment accuracy’.
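The two questions can be turned into simple ratios. The sketch below is one possible way to compute them, assuming a trusted reference ("gold standard") exists for the dataset; the function names and the example data are hypothetical.

```python
def coverage(code_frame, reference_themes):
    """Share of the themes present in the data that the code frame captures."""
    reference = set(reference_themes)
    if not reference:
        raise ValueError("reference_themes must not be empty")
    return len(reference & set(code_frame)) / len(reference)


def assignment_accuracy(assigned_codes, reference_codes):
    """Share of responses whose assigned code matches the reference code."""
    if len(assigned_codes) != len(reference_codes) or not reference_codes:
        raise ValueError("Both lists must cover the same non-empty dataset")
    correct = sum(a == r for a, r in zip(assigned_codes, reference_codes))
    return correct / len(reference_codes)


# Hypothetical example: the code frame misses the 'delivery' theme
reference_themes = {"price", "service", "quality", "delivery"}
code_frame = {"price", "service", "quality"}

reference_codes = ["price", "service", "delivery", "quality"]
assigned_codes = ["price", "service", "price", "quality"]

print(coverage(code_frame, reference_themes))                # 0.75
print(assignment_accuracy(assigned_codes, reference_codes))  # 0.75
```

In practice the reference itself usually comes from human judgement, so these scores are only as reliable as the reference they are measured against.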

When coders start with a pre-defined set of codes (e.g. from previous research), they usually miss essential themes from the data.

They can’t maximize coverage without changing the previous code frame, as the new dataset most likely contains new themes.

When coders create codes based on the qualitative data itself, they might get better results.

When coders maximize coverage, the process takes a lot of time. Hence there’s more room for error that can compromise the assignment accuracy.

6. Coding doesn’t scale

Ultimately, we all want to improve our customers’ experience, grow revenue and reduce costs.

We want to be more effective in our actions to do more with less.

Scaling is when the outcomes and benefits of our actions increase without a substantial increase in the input resources.

Based on our conversations with leading market research and insights companies, “qualitative feedback contains the most actionable and useful insights.”

More qualitative feedback = better insights.

Surprisingly, most of these companies can’t ramp up their qualitative data collection because they don’t have the capacity to analyze it.

It’s because qualitative data coding doesn’t scale.

2x more qualitative data = 2x more time spent on coding, 2x more cost, and 2x more bias and errors.


Qualitative data coding is expensive, time-consuming, biased and therefore doesn’t scale.

The good thing is that with today’s rapid advancement in artificial intelligence, it doesn’t have to be this way.

Using the powerful no-code Relevance AI platform, you can analyze your qualitative data in minutes, automating your coding practices. It’s consistent, unbiased, and it’s designed to scale.

Benedek Zajkas