Duplicate Content: Understand, Find & Avoid Dangers

Most website operators have heard it at some point: duplicate content is dangerous, and Google doesn’t like it. That is true, but there is no need to panic. Here you can find out what is behind it and what you have to pay attention to.

What is duplicate content?

Duplicate content (literally “double content”, “DC” for short) is content that appears identically on several pages on the Internet. This is not just about copied text, but above all about completely identical pages. A distinction is made between internal and external duplicate content: internal means that the same content appears more than once within one domain, for example on digitalgarg.com. External means that the content appears on multiple domains.

Duplicate content causes problems for search engines like Google. As a result, the affected pages are harder to find in the search results or are filtered out entirely. To keep a website from running into ranking problems because of duplicate content, every indexed page must have enough “unique content”: content that was created for one page only and appears only on that page.

Why is duplicate content a problem?

Googlebot doesn’t like duplicate content. If it finds too much of it on a domain, rankings can suffer: duplicate pages are filtered out of the results, and where Google suspects deliberate manipulation, the site can be demoted.

Duplicate content is a big issue for Google. On the one hand, it is hard to determine algorithmically which page of a domain best fits a search query. On the other hand, Google wants to save crawling resources: crawling 100 versions of the same page wastes a lot of hardware capacity, and therefore money. The basics on the subject are available directly from Google: “Duplicated content.”

When is duplicate content a problem?

Duplicate content becomes a real problem above all when Google cannot decide which page is more relevant, so that several pages alternate in the rankings. But since you probably want to hear a number: once you have more duplicate content than unique content, you have a problem, at the latest. The perfectly search-engine-optimized page consists of 100% unique content, in theory.

What are typical examples of duplicate content?

Duplicate content has many faces. A few of the classics are these:

  • Websites that are reachable via https://example.com, http://example.com, http://www.example.com and https://www.example.com (and not redirected to a single version)
  • Upper and lower case URLs such as example.com/Example and example.com/example
  • Own URLs for print versions
  • Additional PDFs with product information such as technical details, which (should) also be given on the product landing page.
  • Numerous product detail pages for specific sizes, colors, and shapes
  • Affiliate URL parameters such as ?partnerid=2858
  • Parameter URLs for sorting and displaying product overviews
  • /index.htm, /com/ and similar things that content management systems produce
  • Automatically generated tag pages.
  • And in a way, pagination pages too.

The list can probably go on forever. And there’s something like that on every domain – guaranteed.

How can you find duplicate content?

The easiest way to find duplicate content on your site is to google text blocks. Simply put a distinctive sentence from the text in quotation marks and search for it.

To really find all the copies, go to the end of the results and click the link that repeats the search with the omitted results included. This also displays the duplicate pages that Google had filtered out.

Since searching for every text block by hand would be cumbersome, there are also helpful tools. Google’s new Search Console has an “Index Coverage” report. In it, click on “Excluded” in the diagram.

Below the diagram, you will then see the reasons for exclusion, including the cases in which Google itself has classified a page as a duplicate of another URL, such as “Duplicate without user-selected canonical”.

Clicking on an affected URL opens a menu from which you can jump directly to the URL Inspection tool. There you can even see which URL Google treats as the original.

But you can also find duplicate content using various tools available in the market.

How can you avoid duplicate content?

There are various ways to avoid duplicate content. The most basic: don’t let duplicate content arise in the first place. It starts with clean crawl control, which means you shouldn’t link to duplicate pages, so that search engines never have to deal with them.

But if the duplicate page already exists, then ideally you should send it straight to the intended original URL with a 301 redirect. Then your site will stay slim and healthy.

But there is also duplicate content that is useful for your visitors. For example, URLs with sorting, affiliate URLs, or product variants. You can keep these pages, but they have to refer to the original URL via a canonical link. This is what it looks like:

<link rel="canonical" href="https://www.digitalgarg.com/duplicate-content-guide/" />

This link in the <head> of your page, invisible to users, tells search engines which page should appear in the search results. Search engines then understand the duplicate URL and the original URL as one piece of content and can deal with it.
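To make clear which page carries the tag, here is a minimal sketch. It assumes a hypothetical shop category at https://www.example.com/shoes/ that is also reachable with a sorting parameter; the URLs and the title are illustrative assumptions, not taken from a real site. The tag goes on the parameter URL and points to the original:

```html
<head>
  <!-- This <head> belongs to the hypothetical duplicate URL
       https://www.example.com/shoes/?sort=price -->
  <meta charset="utf-8">
  <title>Shoes, sorted by price</title>
  <!-- Points search engines to the unsorted original page -->
  <link rel="canonical" href="https://www.example.com/shoes/" />
</head>
```

Keep in mind that search engines treat the canonical link as a strong hint rather than a command, so the rest of your signals (internal links, sitemap) should point to the same original URL.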

If you have duplicate content that should not appear in the index in any version, for example very similar hub pages that are only used for navigation, then you should set them to “noindex, follow” using a robots meta tag to keep them out of the search engines’ indexes. However, it is even better to question whether these pages need to exist at all.
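As a short sketch, the robots meta tag for “noindex, follow” would look roughly like this in the <head> of the page in question (the surrounding page is assumed, only the meta tag matters):

```html
<head>
  <!-- Keep this page out of the index, but still let crawlers follow its links -->
  <meta name="robots" content="noindex, follow">
</head>
```

For this to work, the page must remain crawlable: if it is blocked in robots.txt, the crawler never gets to see the tag.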

If you have duplicate pages that should each be found in search, then only one thing helps: you have to make the content of each one unique. Even if your services are identical regardless of whether you repair a laptop or a desktop PC, for example: if you want the services to be found separately, you have to write separate content for each. Of course, this also applies to product descriptions in online shops.

Special cases of duplicate content

Recurring text modules

Even single paragraphs that appear on several of your pages are a form of duplicate content. Google calls this “recurring text blocks”:

“Minimize recurring text blocks: Instead of adding extensive copyright notices at the end of each page, you can just provide a short summary with a link to detailed information.”

That is not a trivial point. Try to put as little text as possible in the footer and similar site-wide areas. Shipping information and the like are also duplicate content! Google reacts quite sensitively, especially if you put 300 words about your great shop at the end of every single page. It is not obvious what is supposed to be bad about that for the user, but Google presumably has its reasons.

External duplicate content

If content appears on multiple domains, Google has to choose an original. As a rule, this is the page on which the Googlebot first found the content. But other signals such as links to the source are also an indication for Google.

So if you publish a press release and want your own page to be found for it, make sure that you publish first. What matters is that Google crawls your page first. You can speed this up in Google Search Console by requesting indexing for the URL: in the old version via “Crawl” > “Fetch as Google”, in the new version via the URL Inspection tool, followed by “Request indexing”.

External duplicate content is not particularly problematic for the original. However, if you use supplier content for product descriptions, it will likely be used by other websites as well. Then it is very unlikely that your site will be found. You should, therefore, always create your product descriptions yourself.

But if you use quotations, this is usually not a problem. To be on the safe side, you can mark them as quotes in the source code using the “blockquote” tag:

<blockquote> This is a quote. </blockquote>

International duplicate content

If you are active in Germany, France, and Switzerland, you probably have separate pages for each country with correspondingly adjusted prices, telephone numbers, and shipping information. The “hreflang” attribute was invented so that you don’t run into duplicate content problems in this case. In the <head> of your page, you tell the search engines which of the pages is intended for which country and which language. For example, this code says that example.de is in German and for Germany, and example.fr is in French and for France:

<link rel="alternate" hreflang="de-de" href="https://www.example.de/duplicate-content" />

<link rel="alternate" hreflang="fr-fr" href="https://www.example.fr/duplicate-content" />
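Switzerland, the third country mentioned above, could be covered in the same way. The following is only a sketch: the domain example.ch and its /de/ and /fr/ paths are hypothetical assumptions, chosen to show one German and one French version for Switzerland. Every page in the set should carry the complete list, including the line for itself:

```html
<head>
  <link rel="alternate" hreflang="de-de" href="https://www.example.de/duplicate-content" />
  <link rel="alternate" hreflang="fr-fr" href="https://www.example.fr/duplicate-content" />
  <!-- Hypothetical Swiss pages: German and French, both for Switzerland -->
  <link rel="alternate" hreflang="de-ch" href="https://www.example.ch/de/duplicate-content" />
  <link rel="alternate" hreflang="fr-ch" href="https://www.example.ch/fr/duplicate-content" />
</head>
```

The value always consists of the language code first and the country code second, so “de-ch” means German for Switzerland.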


How bad is duplicate content really?

The tricky thing about duplicate content is that its effects usually aren’t directly visible. Nevertheless, it is ballast that slows your site down.

Only when your site has thousands of URLs and its structure becomes more and more complicated (canonicals, hreflang, different domains) do the problems get really serious. As a hobby site operator, you usually don’t have any major problems with duplicate content, as long as you write your own content and keep an eye on the wildest excesses of your content management system.

If you want to understand the topic of SEO even better, you are welcome to contact us.
