Using the n8n HTML node with a CSS selector is the most effective way to pinpoint and extract specific data from a webpage’s HTML for your automations. The process typically involves two steps: first, use the HTTP Request node to fetch the page’s source HTML; second, feed that HTML into the HTML node’s ‘Extract HTML Content’ operation. Within the HTML node, you define your extraction logic by providing a CSS Selector to target an element, specifying the Return Value (Text, an Attribute, or the inner HTML), and naming the Key where the extracted data will be stored. The result: unstructured web content turned into clean, usable JSON data.
What Even is a CSS Selector? (The Basics for Automation Pros)
Alright, let’s get one thing straight. You don’t need to be a front-end developer to be a master of web scraping. But you do need to understand the language of the web, and a big part of that is CSS selectors.
So, what are they? Think of a CSS selector as a specific address for a piece of information on a webpage. A webpage is just a big document of nested boxes (called HTML elements). A selector is the instruction that tells a browser—or in our case, n8n—exactly how to navigate through those boxes to find the one you want.
It’s like telling a friend how to find a book in a library. You wouldn’t just say, “It’s in the library.” You’d say, “Go to the second floor, find the ‘Fiction’ section, look for the shelf labeled ‘Sci-Fi’, and grab the third book from the left.” That’s a selector!
For most of your n8n web scraping, you’ll only need to know a few basic types:
- Element Selectors: Target all elements of a certain type (e.g., `p` for paragraphs, `h2` for subheadings, `a` for links).
- Class Selectors: Target elements with a specific class attribute. These start with a dot (e.g., `.product-title`).
- ID Selectors: Target the one unique element with a specific ID. These start with a hash (e.g., `#main-content`).
Understanding these three is 90% of the battle.
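To make the three syntaxes concrete, here is a tiny JavaScript sketch (the helper name is invented for illustration, not part of n8n) that tells the selector types apart by their leading character, exactly the way you will read them in this guide:

```javascript
// Illustrative helper: classify a simple CSS selector by its first character,
// mirroring the three selector types described above.
function selectorType(selector) {
  if (selector.startsWith('#')) return 'id';    // e.g. #main-content
  if (selector.startsWith('.')) return 'class'; // e.g. .product-title
  return 'element';                             // e.g. p, h2, a
}

console.log(selectorType('p'));              // element
console.log(selectorType('.product-title')); // class
console.log(selectorType('#main-content'));  // id
```

If you can read a selector and name its type like this, you can already decode most of what you will meet in the wild.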
Your Web Scraping Toolkit: HTTP Request + HTML Node
In n8n, web scraping is a beautiful two-step dance. You can’t extract data from a page you don’t have.
- The Fetch: First, you drop an HTTP Request node onto your canvas. You set the Request Method to `GET` and paste the URL of the page you want to scrape. When you run this, n8n goes to that URL and grabs the entire raw HTML source code, just like your browser would.
- The Extract: Next, you connect an HTML node. You set the Operation to `Extract HTML Content`. The input from the HTTP Request node (the raw HTML) automatically becomes the source. Now, you tell the node what to pull out. This is where you’ll configure the Extraction Values using your newfound CSS selector knowledge.
In the Extraction Values section, you’ll define:
- Key: The name for your new JSON field (e.g., `productName`).
- CSS Selector: The ‘address’ of the data you want (e.g., `h1.product-title`).
- Return Value: What you want from that address. Usually, this is `Text` (the visible text), but you could also grab an `Attribute` (like the `href` from a link) or the raw `HTML`.
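Put together, those three fields map naturally onto a small config object. The sketch below is purely illustrative (the property names and selectors are mine for explanation, not n8n’s exact workflow JSON), but it shows how each Key becomes a field on the output item:

```javascript
// Illustrative only: how Key / CSS Selector / Return Value fit together.
// Property names and selectors here are invented for explanation.
const extractionValues = [
  { key: 'productName', cssSelector: 'h1.product-title', returnValue: 'Text' },
  { key: 'productUrl',  cssSelector: 'a.product-link',   returnValue: 'Attribute', attribute: 'href' },
  { key: 'rawCard',     cssSelector: 'div.product-card', returnValue: 'HTML' },
];

// After extraction, each configured key becomes a field on the JSON item.
// Example output (values are made up to show the shape):
const exampleOutput = {
  productName: 'Acme Widget',          // visible text of h1.product-title
  productUrl: '/products/acme-widget', // href attribute of a.product-link
  rawCard: '<h1 class="product-title">Acme Widget</h1>', // inner HTML
};
```

One config row per piece of data you want, and the node emits one clean JSON object per page.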
The Easiest Way to Find Your n8n HTML CSS Selector
Now, here’s the part that feels like a superpower. You don’t have to manually read through thousands of lines of HTML to figure out the selector. Your browser will do it for you!
I still remember the first time I did this; it felt like cheating. Here’s how:
- Open the webpage you want to scrape in a browser like Chrome or Firefox.
- Find the exact piece of data you want to extract (a price, a name, a headline).
- Right-click on it and select Inspect.
- Your browser’s Developer Tools will pop up, with the exact line of HTML for that element highlighted.
- Right-click on that highlighted HTML line.
- Navigate to Copy > Copy selector.
Voila! You now have a CSS selector copied to your clipboard, ready to paste directly into your n8n HTML node. Easy, right?
Building More Resilient Selectors (The Pro Move)
Let’s be honest about this. The selector your browser copies for you is often… ugly and fragile. It might look something like this: `body > div:nth-child(2) > main > div:nth-child(4) > p`.
This selector is extremely specific about the position of the element. If the website’s developer adds a single new `<div>` anywhere in that chain, your selector breaks. Your workflow fails. You get paged at 3 AM. Not ideal.
A professional automator builds for resilience. Instead of relying on position, we look for more stable identifiers, like unique class names or data attributes.
Case Study: Scraping a Product Table
Imagine you want to scrape a table of products, getting the product name and its price from each row. A beginner might right-click the first product name, copy its selector, then right-click the first price, copy its selector, and so on. This is tedious and will break.
The pro approach is to think structurally.
- Inspect the table. You notice each product row is a `<tr>` element with the class `product-row`.
- Inspect the name and price. Inside each `product-row`, the name is in a `<span>` with the class `product-name`, and the price is in a `<div>` with the class `price-tag`.
Instead of a fragile selector, you can now build a robust one. First, you’d configure the HTML node to grab all the rows using the selector `tr.product-row`. This will give you one item for each row.
Then, you could use a Code node or another HTML node in a loop to extract the name and price from each row using the selectors `.product-name` and `.price-tag` respectively. This workflow will continue to work even if the website adds more columns or changes the layout, as long as those class names remain.
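Here is what that per-row extraction could look like as plain JavaScript (the kind of code you could drop into an n8n Code node). This is a deliberately naive, regex-based stand-in for what the HTML node’s selector engine does; it only copes with simple, well-formed fragments, so on real pages prefer a second HTML node or a proper parser:

```javascript
// Naive stand-in for the second extraction step: pull the text of the first
// element carrying a given class out of one row's HTML fragment.
// Illustrative only; regexes are brittle on arbitrary HTML.
function extractByClass(rowHtml, className) {
  const re = new RegExp(`class="[^"]*\\b${className}\\b[^"]*"[^>]*>([^<]*)<`);
  const match = rowHtml.match(re);
  return match ? match[1].trim() : null;
}

// One row as the first HTML node might hand it onward
// (in an n8n Code node you would loop over the incoming items):
const row =
  '<tr class="product-row">' +
  '<td><span class="product-name">Widget</span></td>' +
  '<td><div class="price-tag">$9.99</div></td>' +
  '</tr>';

console.log(extractByClass(row, 'product-name')); // Widget
console.log(extractByClass(row, 'price-tag'));    // $9.99
```

Notice the extraction keys on the class names, not the position of the cells, which is exactly what makes the structural approach survive layout changes.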
| Selector Type | Example | Pros | Cons |
|---|---|---|---|
| Browser-Copied | `div:nth-child(4) > td:nth-child(2)` | Fast to get for a one-off task. | Extremely fragile; breaks with minor layout changes. |
| Class-Based | `tr.product-row span.product-name` | Much more resilient to layout changes. | Relies on developers using consistent class names. |
| Attribute-Based | `span[data-testid="product-name"]` | Very stable; often used for testing. | Not available on all websites. |
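The stability argument for attribute-based selectors fits in a few lines: the match keys on a value developers set deliberately (often for their own tests), not on styling hooks that get renamed in refactors. A sketch, with invented function and data:

```javascript
// Illustrative: an attribute selector like span[data-testid="product-name"]
// keys on a deliberately-set value, so it survives a CSS class rename.
function matchesAttr(attrs, name, value) {
  return attrs[name] === value;
}

// Before and after a styling refactor, the data-testid is unchanged:
const before = { class: 'product-name',  'data-testid': 'product-name' };
const after  = { class: 'pn-v2 font-lg', 'data-testid': 'product-name' };

console.log(matchesAttr(before, 'data-testid', 'product-name')); // true
console.log(matchesAttr(after,  'data-testid', 'product-name')); // true
```

When you spot a `data-testid` (or similar `data-*` attribute) while inspecting, reach for it first.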
Common Pitfalls and How to Solve Them
As you venture into web scraping, you’ll hit a few common snags. Here’s how to jump over them.
- The Dreaded Space in Class Names: Sometimes a class attribute has multiple names, like `class="btn primary-action"`. If you see a selector in the n8n forums like `div.btn primary-action`, it won’t work: the space is interpreted as a descendant combinator, so it looks for an element named `primary-action` somewhere inside a `div.btn`. To target one element that carries both classes, chain the class selectors together without a space: `div.btn.primary-action`.
- Dynamically Loaded Content: This is the big one. If you run your workflow and get an empty result, but you can clearly see the data on the webpage, it’s likely loaded by JavaScript after the initial page load. The n8n HTTP Request node only gets the initial HTML. The best solution? Open the browser’s Dev Tools, go to the Network tab, filter by Fetch/XHR, and refresh the page. You can often find the direct API call the website uses to get that data. Calling that API directly is almost always more stable than scraping the front end.
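The first pitfall is easy to demo in a few lines. A chained selector such as `div.btn.primary-action` demands that a single element carry every listed class; here is a small sketch of that membership check (function name invented for illustration):

```javascript
// Illustrative: does an element's class attribute satisfy every class
// named in a chained selector like "div.btn.primary-action"?
function hasAllClasses(classAttr, chainedSelector) {
  const wanted = chainedSelector.split('.').slice(1); // ['btn', 'primary-action']
  const present = classAttr.split(/\s+/);             // classes on the element
  return wanted.every((c) => present.includes(c));
}

console.log(hasAllClasses('btn primary-action', 'div.btn.primary-action')); // true
console.log(hasAllClasses('btn', 'div.btn.primary-action'));                // false
```

A selector with a space (`div.btn primary-action`) asks a completely different question: is there a `primary-action` element nested inside the `div.btn`? Keep the two straight and you will dodge one of the most common forum questions.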
Conclusion: From Web Page to Workflow Data
Mastering the n8n HTML CSS selector is a gateway skill. It transforms the entire internet into a potential data source for your automations. Start by letting the browser do the heavy lifting with “Copy selector,” but always aim to refine it for resilience by looking for stable class or ID attributes. Remember the two-step dance of HTTP Request and HTML Extract, and always be suspicious of content that might be loaded dynamically by JavaScript. With these principles in mind, you’re well on your way to pulling any data you need, directly into the heart of your n8n workflows.