With the advent of ChatGPT, there's been a lot of interest in using AI. However, one of the most common questions we get asked is how users can make use of ChatGPT-like technology on their own data that won't fit into its prompt limits.
In this short blog, we'll cover how you can use ChatGPT on your own CSV and avoid the prompt limit by making use of other AI techniques with minimal effort.
For further exploration, we've crafted a detailed blog post on how to tackle context limits in large language models.
Bypassing the ChatGPT limit
ChatGPT limits the amount of data it processes because of the resources required. The larger the input, the more resources it takes to generate a response, which can be expensive and slow. For this reason, limits are enforced to keep the service stable and available for free (or at a cost now with the new Professional paid tier).
The way around the limit is to be selective about what goes into the input. For example, if we have a PDF with 30 pages, we need to extract only the parts of the PDF that are relevant to our question for ChatGPT to use.
To be feasible, this needs to be automated. Otherwise we'd spend so much time searching for the relevant sections ourselves that we wouldn't need AI at all.
Thankfully, this is where vector embeddings come in. We can use AI search to compare our question to all the content in the PDF and then extract the top few results and use those as input for ChatGPT.
Now, when we ask ChatGPT a question about our data, we provide it with just the sections that can help answer the question, which means we stay within the limit.
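To make the idea concrete, here is a minimal sketch of the retrieve-then-ask flow described above. It is only an illustration under a few assumptions: the `embed` function is a toy word-count stand-in for a real embedding model, and the function names (`top_k`, `build_prompt`) and the sample text are made up for this example. In practice you would swap in embeddings from a real model and send the resulting prompt to the language model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, chunks: list[str], k: int = 2) -> str:
    """Put only the most relevant chunks into the prompt, not the whole document."""
    context = "\n".join(f"- {c}" for c in top_k(question, chunks, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical sections extracted from a longer document.
chunks = [
    "Refunds are processed within 14 days of the return being received.",
    "Our office is open Monday to Friday, 9am to 5pm.",
    "Shipping to Europe takes 5 to 7 business days.",
    "Gift cards cannot be exchanged for cash.",
]

prompt = build_prompt("How long do refunds take?", chunks, k=2)
print(prompt)
```

The prompt stays small no matter how large the document grows, because only the top `k` sections are ever included.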
How to try it yourself?
If you'd like to try this for yourself you have two choices:
- If you know how to code, you can follow a tutorial (like this one from OpenAI) to prepare the data and build out the service needed to use it. If you're looking to just prototype then this can be fairly easy. However, if you're looking to build a production service then there are many complications involved.
- Use a fully managed solution like Relevance AI and its "Ask Relevance" feature, which automatically prepares your data and provides a search box (and API) to extract the relevant content and run GPT. You can follow a tutorial (5 minutes) about it here and sign up here.
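If you go the do-it-yourself route, the "prepare the data" step for a CSV might look something like the sketch below. It assumes one simple approach, turning each row into a short line of text that can later be embedded and searched. The function name `rows_to_chunks` and the sample data are hypothetical, and a real pipeline may chunk or format rows differently.

```python
import csv
import io


def rows_to_chunks(csv_text: str) -> list[str]:
    """Turn each CSV row into one 'column: value' text chunk, skipping empty cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    chunks = []
    for row in reader:
        chunks.append("; ".join(f"{col}: {val}" for col, val in row.items() if val))
    return chunks


# Hypothetical sample data standing in for your own CSV file.
sample = (
    "product,price,notes\n"
    "Desk lamp,29.99,Ships in 2 days\n"
    "Office chair,149.00,\n"
)

chunks = rows_to_chunks(sample)
for chunk in chunks:
    print(chunk)
```

Keeping the column names in each chunk gives the search step and the language model some context about what each value means.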
Use GPT on your own CSV or PDF within a few minutes with Relevance AI
An alternative method is to fine-tune GPT on your own dataset in order to "teach" the large language model about your data. This can be a great approach if you have the budget and know how to optimise the training for your data without reducing the model's performance.
It would also require continuous retraining whenever new data is added, whereas with the previous approach it is as simple as adding the new data to your dataset.