Convert audio/video to text
Extract text from an audio/video file
Audio/Video to text helps save us time when the source data is in audio/video format. When the data is transcribed, it is easy to apply any text processing tool such as summarization, quote extraction and theme and topic identification. More importantly, we will no longer need to listen to the file over and over again when we require some information.
On this page, we will introduce three available Tool steps at Relevance to convert audio/video to text.
Relevance is currently more focused on text analysis. As a result, the supported upload size does not need to be massive. Keep in mind that audio is all needed for transcription. So, if you are working with large video files that might not be possible to upload to the platform, you can use tools such as QuickTime to export the audio from and audio file.
How to use Convert audio/video to text step
Available converters
- Open AI audio-to-text converter: only supports audio input and you will need your API key to use this converter
- Deepgram audio/video-to-text converter: Relevance AI’s default audio and video-to-text converter
- Assembly AI audio/video to text converter: Supports both audio and video, highly accurate and the most expensive option
Add the component
Add the desired audio (video) to text converter step to your Tool (check how to get started with creating a tool).
File URL
All the aforementioned converters require a file as an input. If your file is publicly accessible on the web (i.e. with no
authentication or sign-up requirement), simply provide the URL directly or as a
text input. Otherwise, you will need to add a
File-to-URL input.
In either situation, use the {{variable name}}
to provide the data to the converter.
Follow the links below for more information about
- How to run a step
- How to delete a step
- How to configure output
- How to configure a default value
- How to move a step in a Tool
- How to duplicate a step
- How to add condition to a step (i.e. execute only if a condition is met)
- How to loop a step (i.e. run one step multiple times)
Access the step output
The output produced by different audio/video processing tools follows different structure. You might want to experiment with the structure in a code step.
Deepgram
The output is a dictionary. Below you can see samples where the default name assigned to the step audio_to_text_v2
is used.
Note that a step name is different from the step title. Step titles can be found on the top left
of steps. A step name is shown on the bottom left, in smaller font and highlighted green.
- To access the transcription in a JavaScript code step:
- To access utterances and metadata such as speaker label and start and end time of each utterance in a JavaScript code step:
AssemblyAI
The output is a dictionary. Below you can see samples where the default name assigned to the step audio_to_text_v3
is used.
Note that a step name is different from the step title. Step titles can be found on the top left
of steps. A step name is shown on the bottom left, in smaller font and highlighted green.
- To access the transcription in a JavaScript code step:
- To access utterances and metadata such as speaker label and start and end time of each utterance in a JavaScript code step:
Common errors
Unsupported file format
An error similar to the one noted below indicates that the provided input file is not of the supported formats.
Use an audio/video editor to export your file to common audio/video formats and try again.
File size
This error only occurs for too large files (60+ MB). If you are working with large video files, use video editing tools such as Quicktime to extract the audio.
Was this page helpful?