Extract Website Content Tool step
Use the Extract Website Content Tool step to turn websites into text and HTML for your Agents and Tools
The Extract Website Content Tool step lets you scrape a website and convert it into text or HTML files for your Agents and Tools.
Add the Extract Website Content Tool step to your Tool
To add the Extract Website Content Tool step to your Tool:
- Create a new Tool, then search for the ‘Extract Website Content’ Tool step
- Click ‘Expand’ to see the full Tool step
- In ‘Website URL’, enter the URL of the website you want to scrape and extract content from
- Under ‘Method’, select ‘Text’ or ‘HTML’, depending on the format you want the website content extracted as
- Click ‘Run step’ to test your Tool step with your inputs!
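Before running the step, it can help to sanity-check the two required inputs. The sketch below is only an illustration: the class name `ExtractWebsiteContentInputs` and its field names are hypothetical and are not part of the product. It assumes that ‘Website URL’ should be a full http(s) URL and that ‘Method’ is one of the two options listed above.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

ALLOWED_METHODS = ("Text", "HTML")  # the two options listed under 'Method'


@dataclass
class ExtractWebsiteContentInputs:
    """Hypothetical container for the two required fields of the step."""

    website_url: str
    method: str

    def validate(self):
        # Assumption: the step expects a full URL including the scheme.
        parsed = urlparse(self.website_url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Not a full http(s) URL: {self.website_url!r}")
        if self.method not in ALLOWED_METHODS:
            raise ValueError(f"Method must be one of {ALLOWED_METHODS}")
        return self


inputs = ExtractWebsiteContentInputs("https://example.com", "Text").validate()
print(inputs)
```

Checking inputs up front like this is optional; the Tool step itself will report a problem when you click ‘Run step’ with a bad value.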
Advanced Settings
Model
You can choose between two models to use for this Tool step:
- Apify
- Browserless
The following Advanced Settings will then vary based on which Model you choose.
Apify Advanced Settings
Scrape Type
You can select between two scrape types:
- Simple HTML (cheaper)
- Full Web Page (expensive)
Use proxies
You can use proxies to scrape the website, but this is more expensive.
Max depth
The maximum number of link hops from the start URL that the crawler will follow recursively. The start URL has a depth of 0. Capped at 10.
Max pages
The maximum number of pages to crawl. Capped at 100.
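To see how ‘Max depth’ and ‘Max pages’ interact, here is a minimal sketch of a crawl bounded by both limits. It assumes a breadth-first crawl, which may not match the order the actual crawler uses. The function `crawl_order` and the `site` link map are hypothetical stand-ins for real HTTP requests; only the caps of 10 and 100 come from the settings above.

```python
from collections import deque

MAX_DEPTH_CAP = 10   # cap stated for 'Max depth'
MAX_PAGES_CAP = 100  # cap stated for 'Max pages'


def crawl_order(links, start_url, max_depth, max_pages):
    """Return the pages a breadth-first crawl would visit under both limits.

    `links` is a hypothetical in-memory map of URL -> list of linked URLs.
    """
    max_depth = min(max_depth, MAX_DEPTH_CAP)
    max_pages = min(max_pages, MAX_PAGES_CAP)

    visited = []
    seen = {start_url}
    queue = deque([(start_url, 0)])  # the start URL has a depth of 0

    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for linked in links.get(url, []):
            if linked not in seen:
                seen.add(linked)
                queue.append((linked, depth + 1))
    return visited


site = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/a/1"],
    "https://example.com/b": [],
    "https://example.com/a/1": [],
}

print(crawl_order(site, "https://example.com/", max_depth=1, max_pages=100))
```

With a max depth of 0 only the start URL is scraped. With a max depth of 1 the pages it links to are included as well, and the page limit stops the crawl early if it is reached first.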
Browserless Advanced Settings
These Advanced Settings can only be used if you select ‘Text’ as your ‘Method’ in the Tool step.
Element selector
You can specify which HTML element to scrape. By default, it is set to body. Using ‘+ New item’, you can specify a list of elements to be scraped.
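The effect of an element selector can be illustrated with a small sketch that keeps only the text inside the selected elements. This is not how the Tool step is implemented: it uses Python's standard `html.parser`, and it assumes selectors are plain tag names such as body or p, whereas the real step may accept richer selectors. The names `SelectedText` and `extract_text` are hypothetical.

```python
from html.parser import HTMLParser


class SelectedText(HTMLParser):
    """Collect text that sits inside any of the selected tag names."""

    def __init__(self, selectors):
        super().__init__()
        self.selectors = set(selectors)
        self.depth = 0    # number of selected elements currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.selectors:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.selectors and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only while inside a selected element.
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html, selectors=("body",)):
    parser = SelectedText(selectors)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    "<html><head><title>Demo</title></head><body>"
    "<h1>Pricing</h1><p>Plans start at $10.</p>"
    "<footer>Contact us</footer></body></html>"
)

print(extract_text(page))                 # default selector: body
print(extract_text(page, ["h1", "p"]))    # a list of elements
```

Narrowing the selector list is a simple way to leave out navigation or footer text that you do not want your Agents and Tools to see.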
Extra headers
If you need to provide special information to be able to scrape a website, provide the data as a JSON object. For example, a website might require an authentication token called auth-token and a user-id.
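As an illustration of such a JSON object, the sketch below builds one with the two header names mentioned above and prints it. The values are placeholders, not real credentials, and the exact headers a given website expects are an assumption you will need to confirm for that site.

```python
import json

# Placeholder values; replace them with whatever the target site requires.
extra_headers = {
    "auth-token": "YOUR_AUTH_TOKEN",
    "user-id": "YOUR_USER_ID",
}

# Serialize to a JSON object that can be pasted into 'Extra headers'.
extra_headers_json = json.dumps(extra_headers, indent=2)
print(extra_headers_json)
```

Generating the object with `json.dumps` rather than typing it by hand avoids common JSON mistakes such as single quotes or trailing commas.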
Common errors
What’s the difference between this Tool step and Firecrawl?
Firecrawl is another Tool step you can use to scrape and extract website content. Unlike this Tool step, it requires you to sign up for Firecrawl and bring your own Firecrawl API Key. However, it comes with more options for scraping outputs.
Both Tool steps can be used, and we recommend trying both to see which one better suits your needs for your Agents and Tools.