Faceted Search and SEO – This Is How It Works

Filter and sorting functions help your users find exactly the content on your website that they are looking for. That’s good for your users and therefore also for Google, isn’t it? Yes and no! If implemented poorly, faceted navigation can hurt your SEO performance. In this blog post, I’ll tell you what matters.

What is a faceted search?

Faceted search (often also called faceted navigation) is a functionality built into online shops that lets you narrow down a large selection of products with the help of certain properties (facets). Using filters and sorting, users can navigate to the desired products in a targeted manner.

On a category page with numerous T-shirts, for example, users can narrow the range down to the desired product by selecting properties such as color and size. As a rule, the faceted navigation is clearly visible above the product list or in the sidebar.

Faceted navigation is used not only in classic e-commerce online shops but also on the websites of numerous other industries:

  • Real estate portals (selection in terms of number of rooms, area, districts)
  • Car portals (selection in terms of mileage, color, body shape, first registration)
  • Travel websites (selection in terms of rating stars, accommodation types, hotel facilities, room facilities)

What is filter navigation?

The terms filter navigation and faceted search are often used synonymously. Strictly speaking, this is not correct, because filter navigation is only one part of a faceted search: in contrast to a faceted search, filter navigation only ever restricts a selection by a single criterion.

Combination of filter and facet navigation

A combination of filters and faceted navigation can now be found in a large number of online shops. In a football boots category, for example, only one playing surface can be selected (a filter), while several values can be selected for size and color (facets).

Potential SEO issues

A faceted search is ideal from the user’s point of view: you can click through the individual facets quickly and easily to exactly the right product. However, it can lead to serious SEO problems, which are usually due to how the search is implemented.

Why? Selecting a facet often changes the URL because a corresponding parameter is added. For example, after selecting the color blue, the URL example.de/t-shirts/ becomes example.de/t-shirts/?color=blau. Unless the website operator has taken countermeasures, this newly created, independent URL may be indexed by the search engine.

Duplicate content can arise

We speak of duplicate content when identical content is available on different URLs. This causes major problems for search engines, as their algorithms have to decide which URL is the most suitable and should rank for a given search query.

Parameter URLs can cause problems in this area. If a category page with t-shirts is filtered by size, for example, the result is a large number of pages whose content hardly differs. Likewise, the pages filtered by size can be duplicates of the regular category page.

Duplicate content can also arise from the order of the filters. Depending on whether the color or the size was selected first, the URLs could look like this:

  • example.de/t-shirts/?color=blau&groesse=large or
  • example.de/t-shirts/?groesse=large&color=blau.

For Google, these are different URLs, but the content is identical, which harbors a potential duplicate content problem.

The crawl budget is strained

Every website has a specific crawl budget. This is the amount of resources that Google reserves for your website. In simple terms: the number of pages that the Googlebot visits. Small websites with a manageable number of subpages usually don’t have to worry about the crawl budget.

Larger shops, on the other hand, should keep this budget in mind. Let’s stay with the t-shirt example. Assume the properties have the following numbers of values:

  • Color: 10
  • Size: 6
  • Neckline: 3
  • Brand: 10

This results in 10 * 6 * 3 * 10 = 1,800 possible combinations and thus 1,800 potential URLs. In the worst case, Google then spends the crawl budget on these filter URLs rather than on the actually important URLs of your site.

Internal link juice is wasted

The parameters result in a large number of new URLs, which are also linked on every page. As a result, the internal link juice is spread across many less relevant pages instead of being concentrated on the important category pages.

Analysis of possible problems

To find out whether your faceted URLs are in the search engine index and whether they are causing you problems, you should analyze several things.

Are parameter URLs in the Google index?

The first port of call to check whether your faceted URLs are indexed is Google search itself. With the advanced Google search operators site: and inurl:, you can quickly and easily check the indexing of certain URLs. The number of hits that Google shows you gives you a first impression of the extent.
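
For example, to get a rough overview of indexed color-filter URLs, you can use a query like the following (the domain and parameter name are taken from the t-shirt example above):

    site:example.de inurl:color=

The number of results is only a rough indicator, but it quickly shows whether filter URLs have made it into the index at all.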

If you are a user of Google Search Console (which I would strongly recommend), you can check for individual URLs whether they are already indexed. To do this, use the URL inspection field at the top and enter any URL of your property.

Log file analysis

With the help of a log file analysis, you can determine which URLs Google visits and how often. You should keep an eye on whether the search engines are crawling your important category and product pages. If this is not the case and the majority of requests instead go to your faceted URLs, there is an urgent need for action.
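
If your web server writes standard access logs, a rough first check is possible with a few lines of Python. This is only a sketch: access.log is a placeholder for your actual log file, and matching on the user agent alone is not bulletproof, since it can be faked.

    # Rough sketch: compare Googlebot hits on filter URLs vs. the main category page
    # ("access.log" and the URL patterns are placeholders for your own setup).
    filter_hits = category_hits = 0
    with open("access.log") as log:
        for line in log:
            if "Googlebot" not in line:
                continue
            if "color=" in line or "groesse=" in line:
                filter_hits += 1
            elif "GET /t-shirts/ " in line:
                category_hits += 1
    print(f"Filter URLs: {filter_hits}, category page: {category_hits}")

If the first number dwarfs the second, the crawler is spending its budget on your filter URLs.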

SEO best practices for faceted search

You now know the problems that can arise and how to analyze them. But how should you best deal with the faceted search?

Use Canonical Tag

One way to exclude identical pages from indexing is to specify a canonical URL. This is done with a canonical tag in the HTML source code. The tag tells Google which URL should be taken into account for indexing.

For example, suppose you want to prevent a t-shirt category page filtered by size from being indexed and instead have only the unfiltered category page considered. In that case, the canonical tag on each filter URL must point to the unfiltered t-shirt page.
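
Staying with the example, the tag in the <head> of a filtered URL such as example.de/t-shirts/?groesse=large could look like this:

    <link rel="canonical" href="https://example.de/t-shirts/" />

Every filter variant carries the same tag, so the ranking signals are consolidated on the unfiltered category page.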

When should you use the canonical tag?

  • You want to prevent your faceted pages from being indexed.
  • You have an online shop with only a few products and a few selectable properties. Therefore, the crawl budget is no problem for you; Google regularly visits your most important pages.
  • You accept programming effort (if your system does not offer an integrated standard solution).
  • You want to bundle link juice on your important subpages and strengthen them in a targeted manner.

The canonical tag is basically a means of preventing duplicates. However, it also has disadvantages. For one, it is just a hint, not a directive, so Google does not have to follow your recommendation and can continue to index the filtered category page. The crawl budget is also still burdened, as Google has to crawl each filtered URL in order to read its canonical tag.

Set the robots tag to noindex

In principle, you can completely prevent your filtered pages from ending up in the Google index in the first place. To do this, set the robots meta tag to noindex on the corresponding pages. This instruction tells search engines not to index the page in question.
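
For the filtered t-shirt pages, the tag in the <head> could look like this (the follow value signals that the links on the page may still be followed):

    <meta name="robots" content="noindex, follow">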

With this, you have already prevented potential duplicate content problems and inflation of the index. However, the Google crawler will continue to visit these pages in the future, even if less frequently. Thus, despite this measure, your crawl budget continues to be burdened. In addition, link juice is lost when linking to noindex pages.

When should you use the noindex tag?

  • You want to make sure that your faceted pages are not indexed.
  • You have an online shop with only a few products and a few selectable properties. Therefore, the crawl budget is no problem for you; Google regularly visits your most important pages.
  • You are looking for a solution for all search engines.
  • You accept programming effort (if your system does not offer an integrated standard solution).

Use the robots.txt file

To get the most out of your crawl budget, you should use the robots.txt file. In this file, you can specify which areas of a website may and may not be visited by crawlers. But be careful: filter URLs that are already in the index are not removed this way.
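
For the parameters from the examples above, the corresponding rules could look like this (the * wildcard is supported by Googlebot and most major crawlers; the parameter names are the ones used in this article):

    User-agent: *
    # Block any URL that contains one of the filter parameters
    Disallow: /*color=
    Disallow: /*groesse=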

Because of the robots.txt rule, the search engine is no longer allowed to visit the page and therefore can no longer read any robots tag instructions such as noindex. You also create a “black box” for Google: with internally linked pages blocked in robots.txt, Google sees the links but doesn’t know what content is hidden behind them. Does Google like that?

When should you use the Robots.txt file?

  • You need a solution for all search engines.
  • You are looking for a quick and easy solution to stop your faceted pages from being crawled.
  • Your faceted pages are not yet in the search engine index.
  • You accept that Google will still index pages if they are linked from other external sites.
  • You know what you are doing: under certain circumstances, you make entire (important) areas of the page inaccessible to search engines!

Parameters in the Google Search Console

Similar to the robots.txt file, you can also use Google Search Console to prevent crawling of certain URLs. However, this function is only available if you have created your property as a URL-prefix property. In addition, the settings then only apply to the Googlebot, not to other search engines.

Under the menu item “Legacy tools and reports,” you can select the URL Parameters function and add a new parameter. For example, let’s assume that selecting the color filter adds a color parameter to the URL on your site. You can then configure the parameter so that no URLs containing it are crawled.

When should you consider parameter control in Google Search Console?

  • Your filter control is done via parameters.
  • Your faceted pages are not yet in the search engine index.
  • You are looking for a quick solution that you can implement without programming effort.
  • You know what you are doing: if used incorrectly, Google may ignore important subpages of your website.

Use PRG pattern

To prevent Google from visiting your faceted URLs at all, you can use the PRG pattern. PRG stands for Post-Redirect-Get. When you select a facet, the content of the page changes, but because the request is sent via POST, the URL stays the same and there is no crawlable link that Google would follow.

The next step is a redirect to a GET request, which makes the parameter URL visible to the user. This has the advantage that the user can also share this URL with others.
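
To make the flow concrete, here is a minimal sketch in Python with Flask. The routes and the color parameter are illustrative assumptions, not a fixed convention; the decisive point is that the filter is triggered by a form POST instead of a crawlable link.

    # Minimal PRG sketch with Flask: the filter form POSTs (no crawlable link),
    # the server redirects, and the user lands on a shareable GET URL.
    from flask import Flask, request, redirect

    app = Flask(__name__)

    @app.route("/t-shirts/", methods=["GET"])
    def category():
        # GET: render the category page, applying any filter from the URL.
        color = request.args.get("color", "all")
        return f"T-shirt category, color filter: {color}"

    @app.route("/t-shirts/filter", methods=["POST"])
    def apply_filter():
        # POST: the filter form submits here. Crawlers do not submit forms,
        # so no crawlable parameter URL appears in the page's HTML.
        color = request.form.get("color", "")
        # 303 See Other sends the browser on to the GET URL with the parameter.
        return redirect(f"/t-shirts/?color={color}", code=303)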

When should you consider using PRG patterns?

  • You need a solution for all search engines.
  • You have an online shop with many products and many selectable properties. Google visits the resulting faceted pages very often, but not your important pages. The crawl budget is a problem for your site.
  • You want to specifically strengthen your most important subpages.
  • You accept a lot of programming effort.

Which of these solutions is the right one in your case depends on various factors, such as the size of your site, your IT resources, and your products (maybe some filters should even be indexed?). Last but not least, you always have to weigh the effort against the benefit.

Ideally, you don’t just use one of the above solutions but combine them. This prioritization applies to many online shops:

  1. Gold standard: PRG pattern + canonical tag + configure parameters
  2. Quite a quick solution: Noindex tag (but then without canonical, please!)
  3. Robots.txt: I would only block pages with robots.txt if you have massive crawling problems, the Googlebot is crawling far too much, and you need a quick solution. Ideally, you then also mask the links using the PRG pattern.
  4. Whatever works and costs nothing (except time): At least configure the parameters in the Google Search Console.

Deliberate indexing of filter pages

I have now shown you the options for excluding your faceted URLs from indexing. However, deliberately releasing filter or facet pages for indexing can make sense in individual cases. To find out whether these URLs hold potential for you, you should first examine a few points.

Is there a demand for your filter page?

Are you already using filters? Then you can use your web analytics tool to find out which filters are used and how often. Do you have a search function on your website? Very good! Based on the searches performed, you can find out whether certain queries keep popping up that you should serve with a dedicated faceted URL.

Is there enough search volume?

As part of keyword research, you can also check whether there is any relevant search volume for the search term that your faceted URL is supposed to serve. Can you answer the question about sufficient search volume in the affirmative? Great, then you should consider indexing.

Do you have enough products on offer?

So there is relevant search volume for your URL’s search term. But is your offer attractive enough? Can it meet the searcher’s expectations? In an online shop, you should typically make sure that you can offer enough products. A category page with only one or two products looks meagre in most cases and should not be considered for indexing.

Can you influence important SEO elements of the URL?

Finally, you should make sure that you can optimize important SEO elements of your filter page individually with regard to your search term. This usually includes:

  • Page title and meta description
  • Headings
  • Texts

Conclusion

As you have seen, faceted search is a very nice thing from the user’s point of view. But you now also know the pitfalls that can arise from an SEO perspective.

There is no one-size-fits-all solution for dealing with it; it always depends on your individual circumstances. Often, a mixture of the above measures is the answer.

If you see potential in the URLs that arise via the facets, it makes sense to also have them crawled and indexed. If there is no potential or you cannot meet the demand, you should at least use the canonical tag and ideally prevent the crawling of the pages using the PRG pattern. I wish you every success in optimizing!