How to Analyze Audio 98% Faster with AI

A Comprehensive Guide to Automating Audio Analysis

In this blog post, we demonstrate how to automate the manual task of analyzing audio files.

This is a step-by-step guide to building an AI Tool for audio analysis from scratch.

What we cover in this post

In this blog post, we discuss how to develop an AI Tool capable of converting audio to text, analyzing that text, summarizing it, and extracting specific themes that we define in advance.

As a tangible example, we've recorded an interview about a person's visit to McDonald's. In less than 10 minutes, we've developed an AI Tool that can analyze that interview with a single click.

Tasks the AI Tool will execute

  1. Transcribe the audio, converting speech into text
  2. Summarize the key talking points in the audio file
  3. Extract predefined themes from the audio file
  4. Synthesize findings into a structured table with specific columns (e.g. key theme, summary, quotes from transcript)
  5. Write a comprehensive report from the analysis and the summary table
  6. Share the AI Tool so that a broader audience can use it

Ready to start your journey? Let's dive in!

Build your AI Tool to analyze audio 5000% faster

1. Set up the inputs for the AI tool

In this scenario, we use the audio file of the interview we recorded.

Our inputs include:

  1. The audio file
  2. The themes to extract from the audio file

2. Convert audio to text

In this step, we'll add a 'Convert audio to text' transformation. We'll be using OpenAI’s model for this, but you can choose from a variety of models available in our transformation library.

We upload the audio file by dragging it into the file input we set up in the first step.

The transcription step needs to read that file, so we switch the step to 'variable selection' mode and reference the variable of our file input.
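If you want to see what a transcription step like this looks like in code, here is a minimal sketch. It assumes the official OpenAI Python SDK (v1 or later) and its `whisper-1` speech-to-text model; it is not a description of how the 'Convert audio to text' transformation works internally. The function names and the file name `interview.mp3` are hypothetical.

```python
from pathlib import Path


def transcribe_audio(audio_path, client, model="whisper-1"):
    """Send an audio file to a speech-to-text endpoint and return the text.

    `client` is assumed to expose `audio.transcriptions.create(model=, file=)`,
    which is the shape of the OpenAI Python SDK (v1+).
    """
    with Path(audio_path).open("rb") as audio_file:
        response = client.audio.transcriptions.create(model=model, file=audio_file)
    return response.text


def make_openai_client():
    """Create a real client. Needs `pip install openai` and OPENAI_API_KEY set."""
    from openai import OpenAI  # imported lazily so the sketch loads without the SDK

    return OpenAI()
```

With an API key configured, `transcribe_audio("interview.mp3", make_openai_client())` would return the transcript as a plain string, which plays the role of `transcript.text` in the later steps.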

3. Summarize the contents of the audio file

In this step, we'll add our first Large Language Model (LLM) step. We have a variety of models to choose from, but in this instance, we'll be using GPT-3.5, the same model ChatGPT uses.

This LLM step is designed to write a summary based on the transcribed text.


CONTEXT: """ {{transcript.text}} """

Summarise the key talking points in the interview transcript above.
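The double curly braces in the prompt are placeholders that get replaced with the output of an earlier step before the prompt is sent to the model. To make that idea concrete, here is a small sketch of placeholder substitution. It is an illustration only, not the platform's actual templating engine, and `render_prompt` is a hypothetical helper.

```python
import re

# Matches placeholders such as {{transcript.text}} or {{ themes }}.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render_prompt(template, variables):
    """Replace each {{dotted.name}} with the matching value from `variables`.

    Dotted names walk nested dictionaries; a missing name raises KeyError,
    so a mistyped variable fails loudly instead of reaching the model.
    """
    def lookup(match):
        value = variables
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)

    return PLACEHOLDER.sub(lookup, template)
```

For example, calling `render_prompt` with the summary prompt and `{"transcript": {"text": "..."}}` returns the prompt with the transcript pasted between the triple quotes.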

4. Extract predefined themes from the audio file

In this step, rather than merely receiving a summary of the audio file, we aim to identify specific themes that our AI tool will extract from the audio.

First, we need to incorporate a 'text list' input with values like customer information, store location, experience summary, and so forth.

We'll then set these values as defaults, so we don't need to enter them each time we run our AI tool.

Subsequently, we'll modify our LLM prompt to accept these themes.


CONTEXT: """ {{transcript.text}} """

Based on the above transcript, populate information about the key themes in this list: """ {{themes}} """

The output should be in bullet point format.
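Here is a rough sketch of how a list of themes with default values could be turned into this prompt. The default themes are the ones mentioned above; the function name and the choice to render the list as one theme per line are our assumptions, since the way `{{themes}}` is serialized is not specified here.

```python
# Defaults taken from the example themes in this post.
DEFAULT_THEMES = ["customer information", "store location", "experience summary"]


def build_theme_prompt(transcript, themes=None):
    """Build the theme-extraction prompt, falling back to the default themes."""
    chosen = list(themes) if themes else DEFAULT_THEMES
    # Assumption: one theme per line, prefixed with a dash.
    theme_block = "\n".join(f"- {theme}" for theme in chosen)
    return (
        f'CONTEXT: """ {transcript} """\n\n'
        "Based on the above transcript, populate information about "
        f'the key themes in this list: """\n{theme_block}\n"""\n\n'
        "The output should be in bullet point format."
    )
```

Calling `build_theme_prompt(transcript)` uses the defaults, while passing your own list overrides them for a single run.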

5. Synthesize findings into a structured table

In this step, we'll delve deeper. Instead of presenting our summary in bullet points, we aim to display it in a table.

This will provide us with a clear view of key themes, the summary and the actual quotes from the interview that support the summary.

To accomplish this, we need to slightly modify the prompt.


CONTEXT: """ {{transcript.text}} """

Based on the above transcript, extract information about the key themes in this list: """ {{themes}} """

Using this information, create a table with columns: "Key theme", "Summary", "Quotes from transcript".
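If you want to use the table outside the tool, you may need to turn the model's answer into structured rows. The sketch below assumes the model returns a Markdown pipe table, which is common but not guaranteed; `parse_markdown_table` is a hypothetical helper and does not handle cells that themselves contain a pipe character.

```python
def parse_markdown_table(text):
    """Parse a Markdown pipe table into a list of dicts keyed by column header."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip any prose the model wrote around the table
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if len(rows) < 2:
        return []
    header, body = rows[0], rows[1:]
    # Drop the separator row, e.g. |---|:---:|---|
    body = [r for r in body if not all(c and set(c) <= set("-:") for c in r)]
    return [dict(zip(header, r)) for r in body]
```

Each returned dict then has the keys "Key theme", "Summary" and "Quotes from transcript", provided the model used exactly those column names.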

6. Write a comprehensive report about the analysis

Finally, in this step, we'll add a new LLM step where we'll instruct the AI to write a comprehensive research report for us, based on previous findings like summaries, quotes, and extracted themes.


CONTEXT: """ {{summary_table.answer}} """

Using the table of information in the context above, write a report about the interview that can be given to executives of the restaurant chain.

The report should conclude with a set of recommendations presented as a table that includes a projected impact for those recommendations.


  1. The report should be written at a high school reading level
  2. The report should be no more than 500 words long
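The important idea in this step is chaining: the answer of the table step becomes the context of the report step. Here is a simplified sketch of that flow. The `llm` argument is any function that takes a prompt string and returns the model's answer, so the names, the shortened report prompt, and the word-count check are our assumptions rather than part of the tool.

```python
TABLE_PROMPT = (
    'CONTEXT: """ {transcript} """\n\n'
    "Based on the above transcript, extract information about the key themes "
    'in this list: """ {themes} """\n\n'
    "Using this information, create a table with columns: "
    '"Key theme", "Summary", "Quotes from transcript".'
)

REPORT_PROMPT = (
    'CONTEXT: """ {summary_table} """\n\n'
    "Using the table of information in the context above, write a report about "
    "the interview that can be given to executives of the restaurant chain.\n\n"
    "The report should be no more than {max_words} words long."
)


def analyze_interview(transcript, themes, llm, max_words=500):
    """Run the table step, then feed its answer into the report step."""
    table = llm(TABLE_PROMPT.format(transcript=transcript, themes="\n".join(themes)))
    report = llm(REPORT_PROMPT.format(summary_table=table, max_words=max_words))
    return {
        "summary_table": table,
        "report": report,
        # Models do not always respect length limits, so check afterwards.
        "within_word_limit": len(report.split()) <= max_words,
    }
```

Passing the model in as a function keeps the sketch independent of any particular provider and makes it easy to try with a stub before spending API credits.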

7. Share the app or embed it as an iframe

You can share this app or embed it in any application as an iframe. Click on the ‘Use’ tab, then click on the ‘Share’ button to generate a shareable link.
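Once you have the link, embedding comes down to placing it in a standard HTML iframe tag. The helper below is a hypothetical sketch that builds such a tag; the example URL in the usage note is a placeholder, not a real share link, and the width and height are arbitrary defaults.

```python
from html import escape


def iframe_snippet(share_url, width="100%", height=600):
    """Return an HTML iframe tag for the given link, with attributes escaped."""
    return '<iframe src="{}" width="{}" height="{}" style="border:0"></iframe>'.format(
        escape(share_url, quote=True),
        escape(str(width), quote=True),
        escape(str(height), quote=True),
    )
```

For instance, `iframe_snippet("https://example.com/your-share-link")` produces a tag you can paste into any page that allows custom HTML.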

Wrapping up

Ready to kick-start your journey with AI audio analysis?

By automating audio analysis with AI, you can save time, reduce errors, and gain valuable insights that can help you make more informed decisions faster.

Why not take the next step and sign up to Relevance AI for free?

You can start getting value in just a few minutes. With Relevance AI, you can build everything described in this blog post and much more.

September 8, 2023
Benedek Zajkas