- mhoeschele
Our Conversations Analysis is Superhuman
When we set out to design an AI-powered topic clustering system, we had four main goals:
Unbiased analysis
Treat distinct thoughts as distinct
Cluster based on the underlying meaning
Produce clear descriptions of each topic
Unbiased analysis
To make sure our analysis is unbiased, we started by assuming it didn’t need to know anything in advance about the responses it was analyzing. This means that we produce great analysis on day one, with no training data required!
Treat distinct thoughts as distinct
After looking at thousands of user responses, it was clear that people love to open up and tell our chatbots what’s on their minds. One of the side effects is that they tend to give more than one thought in a response, which is great! To ensure that we neither ignored one of the thoughts nor assumed that they were related, we came up with a way to identify the distinct thoughts in a response and separate them before forming topic clusters.
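As a rough illustration of the idea, the sketch below separates a response into candidate thoughts with a naive rule: split on sentence punctuation and on a few connecting words. The function name `split_thoughts` and the word list are hypothetical, and this simple rule stands in for our actual method, which is not described here.

```python
import re

# Hypothetical boundary rule: sentence punctuation, or a connecting word
# ("but", "also", "plus") that often introduces a new thought.
_BOUNDARY = re.compile(
    r"[.!?;]+\s*|,?\s+\b(?:but|also|plus)\b\s+",
    re.IGNORECASE,
)

def split_thoughts(response: str) -> list[str]:
    """Return the non-empty fragments of a response, one per candidate thought."""
    parts = _BOUNDARY.split(response)
    return [p.strip() for p in parts if p and p.strip()]

example = "The app is fast, but the checkout keeps crashing. Love the new colors!"
for thought in split_thoughts(example):
    print(thought)
```

Each fragment can then be clustered on its own, so a comment about speed and a complaint about checkout from the same person do not end up forced into one topic.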
Cluster based on the underlying meaning
When a human analyzes data to identify the topics being discussed, they put responses together if they describe similar ideas. We have achieved the same result with our Natural Language Processing (NLP) algorithm, which focuses on the underlying meaning of each person’s response, not the specific words they use. To stay on the cutting edge, we have taken pre-trained language models and optimized them for the conversational data we collect.
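To make "clustering on meaning" concrete, here is a minimal sketch that groups responses by the cosine similarity of their vectors. It assumes each response has already been turned into a numeric vector by a language model; the hand-made toy vectors, the greedy grouping rule, and the 0.8 threshold are all illustrative assumptions, not our production algorithm.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def cluster_by_meaning(vectors, threshold=0.8):
    """Greedy grouping: a vector joins the first cluster whose first member
    is similar enough; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of indices into `vectors`."""
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Toy vectors standing in for model output (assumed, not real embeddings).
toy_vectors = [
    [0.9, 0.1, 0.0],  # "checkout keeps crashing"
    [0.8, 0.2, 0.1],  # "payment page freezes"
    [0.0, 0.1, 0.9],  # "love the new colors"
]
print(cluster_by_meaning(toy_vectors))
```

The first two responses share no words, yet their vectors point in nearly the same direction, so they land in the same cluster while the third stays separate.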
Produce clear descriptions of each topic
Now, all of this world-class analysis is of no use if you still have to read all the responses to understand the general topic. We have found that the two most critical pieces of information for understanding a topic cluster are a title and a one- to two-sentence summary. To generate these, we give our AI all of the responses in a given topic cluster and ask it to generate a title and summary that best represent the topic described by the group while also being grammatically correct. We also provide the most representative keywords and exemplar verbatim responses.
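One plausible way to choose exemplar responses, sketched here under stated assumptions, is to pick the responses whose vectors sit closest to the cluster's average vector. The function names and toy vectors are hypothetical, and this is an illustration of the general idea rather than a description of how our system selects exemplars.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def pick_exemplars(responses, vectors, k=1):
    """Return the k responses most similar to the cluster centroid."""
    center = centroid(vectors)
    ranked = sorted(
        zip(responses, vectors),
        key=lambda pair: cosine(pair[1], center),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# Toy cluster: verbatim responses with assumed (hand-made) vectors.
cluster_responses = [
    "checkout keeps crashing",
    "payment page freezes",
    "cart was slow once",
]
cluster_vectors = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.5, 0.5, 0.3],
]
print(pick_exemplars(cluster_responses, cluster_vectors, k=1))
```

Showing the response nearest the center gives readers a verbatim quote that is typical of the group rather than an outlier.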
What makes our analysis unique is how we modified existing analysis tools that were optimized for datasets of multi-page documents. We made these technologies work with the conversational data we gather and produce instantly understandable results. This proprietary process consistently produces topics that group responses that are highly similar in meaning, regardless of how many responses there are or how long they are.