Synthetic data has its uses – but real insight needs real people
While synthetic data will have an impact on the research industry, marketers should not rely on it for true insights.
It’s hard to believe that it’s been less than three years since OpenAI first released ChatGPT to the public.
That moment marked a turning point in the public’s perception of artificial intelligence. It gave most people their first tangible sense of what generative AI could achieve, setting minds whirring about the seemingly limitless potential of the technology.
The last two and a half years have therefore been characterised by a mad scramble for businesses to adopt the technology and demonstrate the efficiencies it is delivering. In market research, organisations are successfully integrating AI to remove the “drudge work”, vastly speeding up the laborious process of mass data analysis and reducing the time taken to provide marketing teams with the valuable insights they need.
AI-generated, “synthetic” data is also proving to be a useful tool for research development. The ability to put surveys to AI-generated profiles that closely mimic your audience is an effective way to test hypotheses, and potentially fast-fail them before resources are invested in carrying out the real research.
However, as with all technological developments, the real skill when it comes to AI is in recognising and understanding its limitations.
Arguing, as Evidenza’s Jon Lombardo and Peter Weinberg did in a recent Marketing Week column, that synthetic data alone offers a risk-free, silver-bullet replacement for speaking to real people is a dangerous fallacy. It carries grave risks of its own, not just for businesses and marketing teams, but for our fundamental understanding of society.
Synthetic data is just that, synthetic
It’s worth reiterating that synthetic data is created by drawing on huge volumes of existing sources, complete with inaccuracies and biases that may not be apparent to the user.
Artificial profiles of the audience you are trying to understand are derived from that data; these profiles then predict how their real-life counterparts would answer the questions being put to them, based on how they have responded to similar questions in the past.
The resulting data can be useful in providing a broad prediction of how a certain group might feel about a particular topic, but it offers no original “insight” at all. It tells marketers nothing about the actual views of their customer base.
While undoubtedly cheaper in the short term, using synthetic data as the sole basis for marketing campaigns is fraught with long-term reputational and financial risk. Without engaging real people, you simply can’t predict with any certainty how they will react to your content. In an age when marketers are increasingly being asked to do more with less, the potential for an entire campaign to misfire on this basis is something they can ill afford.
Similarly, while monitoring broad trends is useful for keeping an eye on the temperature of public opinion, we know from history that the most valuable insights – the ones that provide those sparks of innovation that fuel the best marketing campaigns – often come from face-to-face conversations with real people.
I saw that first hand in my days with Shell. No algorithm would have told me that when truck drivers in the Middle East express their assessment of the quality of engine oil, they do so by rubbing thumb and middle finger together, rather than using the word ‘viscosity’. That physical gesture simply wouldn’t make it into data transcripts.
The great value of in-person focus groups is that, done well, they’re inherently unpredictable. It’s often the case that data-backed hypotheses are debunked in their entirety, giving researchers invaluable insight and additional considerations to feed into their campaign development.
This simply cannot be replaced by AI-derived data.
Synthetic respondents do not exist in isolation
More fundamentally, synthetic data is itself only possible because of the high-quality real-world research that informs it.
In other words, you can only use AI to predict what your customer might think about a new campaign because somebody asked them about their views on a similar one in the past.
AI needs to be regularly refreshed with primary research, or the data becomes outdated, out of context and “polluted” by the biases and problems with historic information. It creates a doom loop where AI is informing AI to the point at which it becomes impossible to decipher what is real and what isn’t.
While this type of data is ultimately useless for marketers, it also has a more chilling societal effect. With LLMs tending towards the ‘average’, societal biases become continually reinforced. Those who don’t experience life through the norms of the majority are missed, further marginalising the views and opinions of underrepresented groups.
All you need to do is read Caroline Criado-Perez’s Invisible Women to understand the real-world dangers of decision-making being dominated by one section of society.
There is not just a moral rationale for avoiding this, but a commercial one too. Think of the market opportunities a business could miss by failing to accurately capture the diverse views from across its customer base.
Harnessing AI’s magic while recognising its limitations
There’s no doubting that synthetic data, and AI more broadly, are incredible developments that will support marketing and research teams in the years to come.
However, there’s a danger that AI becomes the emperor’s new clothes: inflated claims about its efficacy won’t help marketers integrate it sensibly into their suite of tools.
To describe synthetic data as a risk-free alternative to real research is to fundamentally misunderstand, firstly, how it works, and secondly, what the word “insight” truly means.
Jane Frost CBE is the chief executive of the Market Research Society (MRS)