The Extract Website Content Tool step lets you scrape a website and convert its content into text or HTML for your Agents and Tools.

Add the Extract Website Content Tool step to your Tool

To add the Extract Website Content Tool step to your Tool:

  1. Create a new Tool, then search for the ‘Extract Website Content’ Tool step
  2. Click ‘Expand’ to see the full Tool step
  3. In ‘Website URL’, enter the URL of the website you want to scrape
  4. Under ‘Method’, select ‘Text’ or ‘HTML’, depending on the format you want the website content extracted in
  5. Click ‘Run step’ to test your Tool step with your inputs!

Advanced Settings

Model

You can choose between two models to use for this Tool step:

  • Apify
  • Browserless

The Advanced Settings below vary depending on which Model you choose.

Apify Advanced Settings

Scrape Type

You can select between two scrape types:

  • Simple HTML (cheaper)
  • Full Web Page (expensive)

Use proxies

You can use proxies to scrape the website, but this is more expensive.

Max depth

The maximum number of links the crawler will follow recursively, counting from the start URL. The start URL has a depth of 0, pages it links to have a depth of 1, and so on. Capped at 10.

Max pages

The maximum number of pages to crawl. Capped at 100.
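
To see how Max depth and Max pages interact, here is a minimal sketch of a crawler that respects both limits. It runs over an in-memory map of links instead of a real website, and it assumes a breadth-first visiting order; the function name crawl_order and the links map are hypothetical and are not part of the Tool step, whose actual crawl order may differ.

```python
from collections import deque

MAX_DEPTH_CAP = 10   # cap stated for 'Max depth'
MAX_PAGES_CAP = 100  # cap stated for 'Max pages'


def crawl_order(start_url, links, max_depth, max_pages):
    """Return (url, depth) pairs in the order a breadth-first crawl visits them.

    `links` maps each URL to the URLs it links to (a stand-in for real pages).
    """
    max_depth = min(max_depth, MAX_DEPTH_CAP)
    max_pages = min(max_pages, MAX_PAGES_CAP)

    visited = []
    seen = {start_url}
    queue = deque([(start_url, 0)])  # the start URL has a depth of 0

    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append((url, depth))
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for next_url in links.get(url, []):
            if next_url not in seen:
                seen.add(next_url)
                queue.append((next_url, depth + 1))
    return visited


site = {"home": ["about", "blog"], "about": ["team"], "blog": [], "team": ["jobs"]}
print(crawl_order("home", site, max_depth=1, max_pages=100))
print(crawl_order("home", site, max_depth=10, max_pages=2))
```

In this sketch, a Max depth of 0 returns only the start URL, while Max pages stops the crawl as soon as the page count is reached, whichever limit is hit first.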

Browserless Advanced Settings

These Advanced Settings can only be used if you select ‘Text’ as your ‘Method’ in the Tool step.

Element selector

You can specify which HTML elements to scrape. By default, this is set to body. Use ‘+ New item’ to add more elements to the list of elements to be scraped.
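
As a rough illustration of what an element selector does, the sketch below keeps only the text found inside the listed elements. It assumes each selector is a plain tag name (such as body or h1) and that the HTML is well formed with closing tags; the Tool step itself may accept richer selectors. The names ElementTextExtractor and scrape_elements are hypothetical.

```python
from html.parser import HTMLParser


class ElementTextExtractor(HTMLParser):
    """Collects text that appears inside any of the selected tags."""

    def __init__(self, selectors=("body",)):
        super().__init__()
        self._selectors = set(selectors)
        self._open_selected = 0  # number of selected elements currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self._selectors:
            self._open_selected += 1

    def handle_endtag(self, tag):
        if tag in self._selectors and self._open_selected > 0:
            self._open_selected -= 1

    def handle_data(self, data):
        # Keep non-empty text only while inside a selected element.
        if self._open_selected > 0 and data.strip():
            self.chunks.append(data.strip())


def scrape_elements(html, selectors=("body",)):
    """Return the text chunks found inside the selected elements."""
    parser = ElementTextExtractor(selectors)
    parser.feed(html)
    parser.close()
    return parser.chunks


page = (
    "<html><head><title>Title</title></head>"
    "<body><h1>Heading</h1><p>Paragraph</p><footer>Footer</footer></body></html>"
)
print(scrape_elements(page))                    # default selector: body
print(scrape_elements(page, ["h1", "footer"]))  # a list of elements
```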

Extra headers

If a website needs additional information before it can be scraped, such as authentication details, provide that data as a JSON object. The object below shows an example where an authentication token called auth-token and a user-id are required.

{
    "auth-token":"AUTHENTICATION-TOKEN",
    "user-id":"USER-ID"
}
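
For context, extra headers like these are sent along with the HTTP request for the page. The sketch below shows the general idea with Python's standard library: it parses the JSON object and attaches each key as a request header. It only builds the request and does not send it; the function name build_request and the example URL are assumptions for illustration, not how the Tool step is implemented.

```python
import json
import urllib.request

EXTRA_HEADERS_JSON = """
{
    "auth-token": "AUTHENTICATION-TOKEN",
    "user-id": "USER-ID"
}
"""


def build_request(url, extra_headers_json):
    """Build (but do not send) a request carrying the extra headers."""
    headers = json.loads(extra_headers_json)
    if not isinstance(headers, dict):
        raise ValueError("Extra headers must be a JSON object")
    # Header names and values are sent as strings.
    headers = {str(name): str(value) for name, value in headers.items()}
    return urllib.request.Request(url, headers=headers)


request = build_request("https://example.com", EXTRA_HEADERS_JSON)
# urllib stores header names capitalized, e.g. 'Auth-token'.
print(request.header_items())
```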

Common errors

What’s the difference between this Tool step and Firecrawl?

Firecrawl is another Tool step you can use to scrape and extract website content. Unlike this Tool step, it requires you to sign up for Firecrawl and provide your own Firecrawl API key. However, it offers more options for scraping outputs.

You can use either Tool step, and we recommend trying both to see which one better suits your Agents and Tools.