How to generate OpenAI embeddings for an entire dataset

OpenAI's embeddings are based on the same GPT models that most people have seen used to generate content. In this tutorial, we’ll show how to generate OpenAI embeddings for an entire dataset, using Relevance AI as our data store and compute layer. We’ll store the data and the vectors in a Dataset and use a Workflow to generate the embeddings. The whole process takes about 10 minutes and doesn’t require writing a line of code.

Creating an OpenAI account

To use OpenAI's embeddings API, we need an account and an API key. Head to the OpenAI dashboard, set up your account and create an API key from the settings.
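The dashboard route needs no code, but it can help to see what the API key is used for. The following is a minimal sketch, not the code Relevance AI runs: it builds a request to OpenAI's public `/v1/embeddings` endpoint using only the Python standard library. The model name `text-embedding-3-small`, the `OPENAI_API_KEY` environment variable and the helper names are assumptions chosen for illustration.

```python
import json
import urllib.request

OPENAI_EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"


def build_embedding_request(texts, api_key, model="text-embedding-3-small"):
    """Build (but do not send) a POST request for a batch of texts.

    The model name is an assumption; use whichever embedding model
    your OpenAI account has access to.
    """
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        OPENAI_EMBEDDINGS_URL,
        data=body,
        headers={
            # The API key is sent as a bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def embed(texts, api_key, model="text-embedding-3-small"):
    """Send the request and return one vector per input text."""
    request = build_embedding_request(texts, api_key, model)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Each returned item carries the index of the input it belongs to.
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Example usage (requires network access and a valid key):
#   import os
#   vectors = embed(["hello world"], os.environ["OPENAI_API_KEY"])
```

Keeping the key in an environment variable rather than in the script avoids committing it to version control by accident.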

Creating a Relevance AI account

Relevance AI will be our database, compute and API, so we’ll need an account to get started. Head to the Relevance AI dashboard and set up your account.

Creating your first dataset

Grab your dataset - this can be a CSV, PDF, video file or more. If it’s a CSV, make sure it has a header row: the header of each column becomes the name of the corresponding field. Once you have it ready, drag and drop it into the dashboard and name your dataset.
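To illustrate why the header row matters, here is a small sketch using Python's standard `csv` module. It shows how a header turns each CSV line into a row of named fields; it is not how Relevance AI parses uploads, and the sample columns `title` and `description` are made up for the example.

```python
import csv
import io


def csv_to_rows(csv_text):
    """Parse CSV text into (field names, list of row dicts).

    The first line is treated as the header, so every later line
    becomes a dict keyed by the column names.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    field_names = list(reader.fieldnames or [])
    rows = list(reader)
    return field_names, rows


# Hypothetical sample data, for illustration only.
sample = (
    "title,description\n"
    "Red shoes,Lightweight running shoes\n"
    "Blue hat,Wool winter hat\n"
)

fields, rows = csv_to_rows(sample)
print(fields)
print(rows[0]["description"])
```

Without the header line, the first data row would be consumed as field names, which is why a missing header tends to show up as oddly named fields.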

Running your first workflow

Workflows are at-scale transformations of Datasets. Relevance AI has a large list of workflows that you can run. For this tutorial, we’ll use the workflow that vectorises text using OpenAI. Head to the Workflows tab, search for OpenAI and select the “Vectorise with OpenAI” workflow. In the form, select the fields to vectorise, choose which GPT model to use and enter your OpenAI API key. Click run and a progress indicator will show the workflow’s status.
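Conceptually, a vectorise workflow walks over the dataset in batches, embeds the chosen field and writes the vector back onto each row. The sketch below shows that idea only; the real workflow's internals are not described in this tutorial. The function names, the batch size and the `_vector_` field suffix are assumptions, and `fake_embed_batch` is a stand-in for a real call to OpenAI.

```python
def vectorise_rows(rows, field, embed_batch, batch_size=100):
    """Add an embedding to every row that has text in `field`.

    `embed_batch` is any function mapping a list of strings to a list
    of vectors (for example, a wrapper around the OpenAI API).
    The output field name is a hypothetical convention.
    """
    vector_field = f"{field}_vector_"
    for start in range(0, len(rows), batch_size):
        # Skip rows with missing or empty text; there is nothing to embed.
        batch = [row for row in rows[start:start + batch_size] if row.get(field)]
        if not batch:
            continue
        vectors = embed_batch([str(row[field]) for row in batch])
        for row, vector in zip(batch, vectors):
            row[vector_field] = vector
    return rows


def fake_embed_batch(texts):
    """Stand-in embedder: [length, number of spaces] per text."""
    return [[float(len(text)), float(text.count(" "))] for text in texts]


rows = [{"title": "red shoes"}, {"title": ""}, {"title": "hat"}]
vectorise_rows(rows, "title", fake_embed_batch, batch_size=2)
print(rows)
```

Batching keeps the number of API calls low, which matters for both speed and rate limits when the dataset is large.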

Next steps

You now have a Dataset containing your data, with a field on each row holding its OpenAI embedding. From here you can run other workflows that make use of the embeddings, or use the API to make vector search queries or get instant answers. Read more about how to get Q&A out of your newly created dataset here.
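For intuition about what a vector search query does with these embeddings, here is a minimal sketch that ranks rows by cosine similarity to a query vector. It is a brute-force illustration under assumed names (`vector_search`, the `title_vector_` field, toy two-dimensional vectors), not the Relevance AI search API, which would typically use an index rather than scanning every row.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat zero vectors as unrelated to everything.
    return dot / (norm_a * norm_b)


def vector_search(rows, query_vector, vector_field, top_k=3):
    """Return the top_k (score, row) pairs, most similar first."""
    scored = [
        (cosine_similarity(query_vector, row[vector_field]), row)
        for row in rows
        if vector_field in row
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy vectors for illustration; real embeddings have many more dimensions.
rows = [
    {"title": "a", "title_vector_": [1.0, 0.0]},
    {"title": "b", "title_vector_": [0.0, 1.0]},
    {"title": "c", "title_vector_": [1.0, 1.0]},
]
for score, row in vector_search(rows, [1.0, 0.0], "title_vector_", top_k=2):
    print(row["title"], round(score, 3))
```

In practice the query vector would come from embedding the search text with the same model that was used for the dataset, since vectors from different models are not comparable.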

Daniel Vassilev
Vector Embeddings