One of the frequently used templates at Relevance is “Categorize text”. This Tool receives a list of predefined categories/themes/topics and a dataset with at least one column of text values, such as reviews. The Tool goes through the data and assigns each text value to the matching categories/themes/topics from the provided list.

You can bring your own list of categories/themes/topics (BYO) or use the Suggest category Tool as the first step of the text categorization flow.

How to use the Tool

Locate the Tool on the template page and click Use template. You can use the Tool as is or clone it.

Tool inputs and output

The Tool requires three main inputs and accepts one optional input:

  1. Text to categorize
  2. List of categories/themes/topics: enter one category per line
  3. Maximum number of expected categories per input text

Provide the main inputs and hit Run once. Within a few seconds, you will see the LLM response, similar to what is shown in the image below.

Categorize text

  4. Example (optional; example(s) of categorization done by you): LLMs generally work better when they see samples. Provide sample(s) of your text data and show how you would categorize them using items from the list that you entered in the second input. Use a comma (,) when multiple categories/themes/topics apply.

Good Categories

  1. Clear: AI is not an expert in your domain. Keep the categories simple and clear, as if you were helping an intern.
  2. Unbiased: Even though AI understands sentiment, it is better to keep the categories neutral (e.g. Service rather than Good service).
  3. Combined where relevant: Use / to combine two related categories (e.g. Customer Service / Support).
  4. Distinct: Avoid including overlapping categories.

The output is a list of categories assigned to the input.

Tool execution

Tools and templates can be

Tool components

If you clone a template or make a Tool from scratch, you will have access to the Build tab. Build is where you put together different components to build a Tool suitable for your needs.

User inputs

  1. Long text input: An input text component suitable for long text pieces (more than one line), such as answers to a question, reviews, or a text to summarize.

    This component is used twice in this Tool. Both the input text (Text to categorize) and the list of categories (List of categories) are long text inputs.

  2. Numeric input: An input component suitable for providing numeric values, such as scores, age, maximum or minimum required values.

    Use the default value (3) or enter your preferred maximum number of categories to be assigned to each input.

  3. Table: A component for entering structured data as input, for instance, rows of samples, each containing fields such as name, last name and age.

    This component allows you to provide samples of text categorization done by you.

    • Enter the input sample under “text”
    • Enter matching categories under “topics”
    • Use categories/themes/topics from the list you provided with the exact same spelling
    • Use , when multiple categories/themes/topics apply
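
Because the sample categories must match the provided list exactly, it can help to check the table before running the Tool. The following is a minimal sketch, not part of the Tool itself: it assumes each table row is a dictionary with the "text" and "topics" fields described above, and the helper name `find_unknown_topics` is hypothetical.

```python
def find_unknown_topics(examples: list[dict[str, str]], categories: list[str]) -> list[str]:
    """Return sample topics that do not exactly match any provided category."""
    allowed = set(categories)
    unknown: list[str] = []
    for row in examples:
        # Multiple topics in one row are separated by commas.
        for topic in row["topics"].split(","):
            topic = topic.strip()
            if topic and topic not in allowed and topic not in unknown:
                unknown.append(topic)
    return unknown


# Illustrative data only (assumed, not taken from the Tool).
categories = ["Pricing", "Delivery", "Customer Service / Support"]
examples = [
    {"text": "The parcel arrived late and the price was too high.", "topics": "Delivery, Pricing"},
    {"text": "Support never answered my email.", "topics": "Customer service / Support"},
]

print(find_unknown_topics(examples, categories))
# ['Customer service / Support']  (lowercase "service" does not match the list)
```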

Tool steps

There are 4 components under Tool steps in this analysis flow. These components take care of three tasks: properly formatting the provided categories/themes/topics, running the LLM step, and formatting the output.

Properly formatting the provided categories/themes/topics

String to List formatting code

A Python code component is available to run Python code when necessary.

In this case, the Python code filters out empty lines and creates a list of categories/themes/topics from the provided text.
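
A minimal sketch of what such a formatting step could look like is shown below. It is an illustration under assumptions, not the Tool's actual code: the function name `string_to_list` is hypothetical, and trimming surrounding whitespace is an added convenience.

```python
def string_to_list(raw_categories: str) -> list[str]:
    """Split a multi-line string into categories, skipping empty lines."""
    return [line.strip() for line in raw_categories.splitlines() if line.strip()]


# One category per line, with a blank line and stray spaces in the input.
raw = "Pricing\n\nCustomer Service / Support\n  Delivery  \n"
print(string_to_list(raw))
# ['Pricing', 'Customer Service / Support', 'Delivery']
```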

Large Language Model (LLM)

A large language model component is set up to give you access to GPT (and many other LLMs). In the prompt section, you provide the required information as well as instructions on what the LLM is expected to do.

A Good Prompt

  1. Be short and precise in your instruction/request to the LLM
  2. Stick to one term when referring to the same concept throughout the prompt
  3. Note the goals and important instructions as close as possible to the end of the prompt
  4. Explicitly note constraints and goals
  5. Mark the scope of the data using delimiters such as ", """ or similar
  6. Include formatting instructions when necessary
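
To illustrate these guidelines, here is a hedged sketch of how a categorization prompt could be assembled. The wording of the prompt and the function name `build_prompt` are assumptions for illustration and are not the prompt used in the template. The data is delimited with """, one term ("categories") is used throughout, and the goal, constraints and formatting instruction come at the end.

```python
def build_prompt(text: str, categories: list[str], max_categories: int = 3) -> str:
    """Assemble a categorization prompt with delimited data and the goal at the end."""
    category_lines = "\n".join(categories)
    return (
        f'Categories:\n"""\n{category_lines}\n"""\n\n'
        f'Text:\n"""\n{text}\n"""\n\n'
        f"Assign at most {max_categories} categories from the list above to the text. "
        "Use only categories from the list, spelled exactly as given. "
        "Return the categories on one line, separated by commas."
    )


prompt = build_prompt("The parcel arrived two weeks late.", ["Pricing", "Delivery"], 2)
print(prompt)
```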

Properly formatting the output

  1. A string to JSON component is available, which receives a CSV file and extracts the data in JSON format, which can later be used for further processing.

    This is to make sure we can properly save the results in the desired structure.

  2. Filtering out unwanted categories/themes/topics

    A Python code component is available to run Python code when necessary.

    Occasionally, the LLM might output categories/themes/topics that are not in the pre-specified list. A simple Python code snippet in this Tool takes care of this.
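
The two output-formatting steps above can be sketched roughly as follows. This is an illustration under assumptions, not the Tool's actual code: it assumes the LLM returns one comma-separated line of categories, the function names `parse_llm_output` and `filter_categories` are hypothetical, and the `{"categories": [...]}` output shape is chosen only for the example.

```python
import csv
import io
import json


def parse_llm_output(llm_output: str) -> list[str]:
    """Parse one comma-separated line of categories into a list."""
    reader = csv.reader(io.StringIO(llm_output), skipinitialspace=True)
    row = next(reader, [])
    return [item.strip() for item in row if item.strip()]


def filter_categories(predicted: list[str], allowed: list[str]) -> list[str]:
    """Keep only categories that exactly match an entry in the allowed list."""
    allowed_set = set(allowed)
    return [category for category in predicted if category in allowed_set]


allowed = ["Pricing", "Delivery", "Customer Service / Support"]
llm_output = "Pricing, Shipping speed, Delivery"  # "Shipping speed" is not in the list

result = filter_categories(parse_llm_output(llm_output), allowed)
print(json.dumps({"categories": result}))
# {"categories": ["Pricing", "Delivery"]}
```

Exact matching is used here because the categories are expected to keep the same spelling as the provided list; a more lenient comparison (for example, case-insensitive) would be a separate design choice.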