Duplicate Content in the SERPs Sucks!

The theme of this post is: don’t create multiple pages, subdomains, or domains with substantially duplicate content. Almost every day when I visit new blogs on the internet I spot duplicated content. The most common case I see is bloggers who set up free blogs on WordPress.com, where blogger-initiated advertising and duplicate content are not allowed, and then create a mirror site on a free Blogger blog containing all the same content so they can benefit from the meager income Google AdSense provides. The second most common case is articles from article directories republished on multiple sites. The third is very similar content on multiple sites that differs only in that a few words or paragraphs have been added to the core text.

What constitutes duplicate content?

Duplicate content is content that can be accessed on more than one URL.
“Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin. Examples of non-malicious duplicate content could include:

  • Discussion forums that can generate both regular and stripped-down pages targeted at mobile devices
  • Store items shown or linked via multiple distinct URLs
  • Printer-only versions of web pages
If your site contains multiple pages with largely identical content, there are a number of ways you can indicate your preferred URL to Google. (This is called “canonicalization”.) However, in some cases, content is deliberately duplicated across domains in an attempt to manipulate search engine rankings or win more traffic. Deceptive practices like this can result in a poor user experience, when a visitor sees substantially the same content repeated within a set of search results.”
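In its simplest form, the “same content at more than one URL” problem can be found by hashing each page’s text and grouping the URLs that produce the same hash. The Python sketch below assumes you already have each URL’s HTML in a dictionary; the `pages` data, the example.com URLs, and the helper names are hypothetical, the tag stripping is deliberately crude, and it only catches exact duplicates, not near-duplicates.

```python
import hashlib
import re
from collections import defaultdict


def content_fingerprint(html_text: str) -> str:
    """Hash a page's text after crudely stripping tags and normalizing whitespace/case."""
    text = re.sub(r"<[^>]+>", " ", html_text)  # replace tags with spaces (rough, not a real HTML parser)
    text = re.sub(r"\s+", " ", text).strip().lower()  # collapse whitespace, ignore case
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def group_duplicate_urls(pages: dict[str, str]) -> list[list[str]]:
    """Return sorted groups of URLs whose normalized text is identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, html_text in pages.items():
        groups[content_fingerprint(html_text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical store pages: one product reachable via three URLs, plus a distinct product.
pages = {
    "https://example.com/shoes/red-sneaker": "<h1>Red Sneaker</h1><p>Size 9</p>",
    "https://example.com/sale?item=42": "<h1>Red  Sneaker</h1>\n<p>Size 9</p>",
    "https://example.com/shoes/red-sneaker?print=1": "<h1>Red Sneaker</h1> <p>Size 9</p>",
    "https://example.com/shoes/blue-sneaker": "<h1>Blue Sneaker</h1><p>Size 9</p>",
}

if __name__ == "__main__":
    for group in group_duplicate_urls(pages):
        print(group)
```

Each printed group is a set of URLs serving the same text, i.e. a set that needs one preferred (canonical) URL.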

    Why is duplicate content an issue?

    One of the biggest issues in SEO is duplicate content. If search engine spiders can’t tell which version of a web page or document is the original or canonical version, the result is less than ideal search visibility. Most duplicate content is created by blog-scraping sploggers who steal content by subscribing to RSS feeds. Some duplicate content is created by the authors of the content themselves, and the latter is what this article focuses on.

    Search engines are designed to provide the most relevant results to those who use them. When a blog fails to make the ascent to the top of the search engine rankings and SERPs (search engine results pages), the issue of duplicate content arises. Search engines like Google, Yahoo, Bing, and Ask have developed tools and filters that locate and remove web pages containing duplicated content in order to deliver the most relevant and timely results to searchers. Duplicate content does not have to be identical to be spotted and removed by a search engine crawler. Web pages with a similarity of over 60% will definitely be detected and will impede any ranking success a blogger is aiming to enjoy.
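Search engines don’t publish exactly how their duplicate filters work, but a common textbook way to measure how “appreciably similar” two texts are is to break each into overlapping word sequences (shingles) and compute the Jaccard similarity of the two sets. The Python sketch below is illustrative only: the shingle size of 3 and the helper names are assumptions, and the 0.6 threshold simply mirrors the 60% figure above rather than any documented Google, Bing, or Yahoo setting.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 3) -> float:
    """Size of the shingle-set intersection divided by the size of the union."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)


def looks_duplicated(a: str, b: str, threshold: float = 0.6) -> bool:
    """Flag a pair as near-duplicates above an assumed similarity threshold."""
    # 0.6 echoes the post's "over 60%" figure; it is not a published search-engine value.
    return jaccard_similarity(a, b) > threshold


original = "search engines filter pages that repeat the same content across many sites"
padded = original + " and bloggers should write unique posts"  # same core text, a few words added
unrelated = "copyright law applies to blogs just as it does to print"

if __name__ == "__main__":
    print(jaccard_similarity(original, padded), looks_duplicated(original, padded))
    print(jaccard_similarity(original, unrelated), looks_duplicated(original, unrelated))
```

With these sample strings the padded copy shares 10 of 16 distinct shingles (0.625) and is flagged, which is exactly the “core text plus a few added words” case, while the unrelated sentence scores 0.0.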

    Matt Cutts of Google introduces the canonical link element

    Whenever content on a site can be found at multiple URLs, it should be canonicalized for search engines. This can be accomplished using a 301 redirect to the correct URL, using the rel=canonical link element, or in some cases using the Parameter handling tool in Google Webmaster Central. Ways to properly handle cross-domain content duplication are described in “Handling legitimate cross-domain content duplication” on the Official Google Webmaster Central Blog.
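The two on-site fixes named above can be sketched framework-independently in Python: decide on one preferred URL, then either answer the other variants with a 301 redirect or have their pages declare the preferred URL with a rel=canonical link element. Everything here is hypothetical (the `IGNORED_PARAMS` set, the helper names, example.com), and which parameters are safe to drop is a site-specific decision; this is not how any particular CMS or Google tool does it.

```python
from html import escape
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: query parameters this example site treats as not changing page content.
IGNORED_PARAMS = {"print", "utm_source", "utm_medium", "sessionid"}


def canonical_url(url: str) -> str:
    """Choose one preferred form: https, lowercase host, kept params sorted, no fragment."""
    parts = urlsplit(url)
    kept = [(key, value) for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in IGNORED_PARAMS]
    return urlunsplit(("https", parts.netloc.lower(), parts.path or "/",
                       urlencode(sorted(kept)), ""))


def canonical_link_element(url: str) -> str:
    """Build the rel=canonical element to place in the <head> of a duplicate page."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


def redirect_if_needed(url: str) -> Optional[tuple[int, str]]:
    """Return (301, Location) if url is not already canonical, else None."""
    target = canonical_url(url)
    return None if target == url else (301, target)


if __name__ == "__main__":
    print(redirect_if_needed("http://Example.com/shoes/red-sneaker?utm_source=feed"))
    print(canonical_link_element("https://example.com/shoes/red-sneaker?print=1"))
```

As a general rule of thumb, a 301 suits variants nobody needs to reach any more, while rel=canonical suits pages that should stay accessible, such as a printer-friendly version. Sorting the kept parameters means the same page requested with its parameters in a different order also collapses to a single form.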

    Get with the program, please!

    On my regular reading rounds today I came across the following comment relating to traffic generation and link building:
    “Submit some of your more popular posts to article directories in order to gain greater exposure”
    “Let me just make myself 100% clear on this statement….
    It is false, do not submit any content from your site/blog to article marketing directories, if you do it will be labeled duplicate content and no doubt your page will be thrown into the supplementary index.” (Tim Grice in SEO – Some Common Newbie Mistakes)

    1. It seems clear to me that those creating duplicate-content mirror blogs on WordPress.com and Blogger (Blogspot) are motivated by greed, and fall into the group who deliberately duplicate content across domains in an attempt to manipulate search engine rankings and/or secure more traffic. I report all such sites when I encounter them.

    The types of blogs allowed and not allowed on the WordPress.com blogging platform and the Terms of Service prevent using a WordPress.com blog as a publicly available, indexed duplicate-content blog. WordPress.com Staff will suspend or delete all duplicate content blogs reported to them. If you have exported your content from a blog on another platform such as Blogger, Israblog, LiveJournal, Movable Type, Typepad, Posterous, Spaces, Tapuz, Vox, or Yahoo! 360 and then imported it into a free hosted WordPress.com blog, change the visibility of the original blog to “private” so there will be no duplicate content issue. If you don’t, my understanding is that the first copy to be indexed will be considered the original, and all other copies will be considered duplicates.

    2. EzineArticles and most other article directories do accept articles that have been previously published elsewhere, provided you are the sole copyright holder. However, Hubpages, Buzzle, eHow and Knol do not allow duplicate content. They want only unique content on their sites and will delete your article(s), and your account if you persist. Anyone who can write can also rewrite, so smart bloggers don’t duplicate their content and risk having the copies in article directories and elsewhere outrank their own blog posts in the SERPs.

    3. Reputable blog directories do not allow duplicate content sites to be registered. If such sites slip in under the radar and are reported to the site admins, they will be deleted from the directory.

    4. When syndicating content via RSS, create different versions of the article you want to syndicate rather than posting the same article everywhere.

    Further reading: Six Easy Ways to Eliminate Pesky Duplicate Content

    Plagiarism checkers:

    There are many free plagiarism checkers you can use online. Copyscape is a free plagiarism checker that lets you detect duplicate content and check whether articles are original.

    plagium (beta) – Track plagiarism by pasting your original text.

    Conclusion:

    I rely on search engines to do research for my contracted work and before creating and publishing blog posts, and I resent going through screen after screen of duplicated content in the SERPs. I think it is a good strategy for search engines to penalize sites with duplicate content by omitting them from the search results. Google’s algorithm will continue to be adjusted over time to fit one simple goal: return the most relevant, helpful pages for any particular search. Really? Then why isn’t Google doing a better job? Duplicate Content in the SERPs Sucks!

    Update:  Google Webmaster Central: Duplicate content summit at SMX Advanced.

    30 thoughts on “Duplicate Content in the SERPs Sucks!”

    1. Pingback: Panda and Penguin algorithm updates | one cool site

    2. Pingback: Official Google Webmaster Central Blog: Raising awareness of cross-domain URL selections « one cool site

    3. Pingback: Can Google detect which content is original? « one cool site

    4. Oh. Just thought . . .
      What about famous quotes? I get them from a book (paper, ink, etc.) but I know they exist elsewhere on the Internet. Of course there is no rewriting possibility, there.

      • As long as proper attribution is made, there is no problem with using quotes. However, you also need to include your own unique words in the content. Copyright basics for bloggers

        The most important thing to know is that the law pertaining to copyright and to plagiarizing the work of others in an attempt to pass it off as your own is the same in cyberspace as it is in the print world.

    5. Well. I searched for TOS and got this page–information I really needed.
      What about a blog that states: Please tell everyone you know about this.
      The context is a group of needy persons being defended and helped by the blogger. I only sorta rewrote it, and did link back to the original source…Oh, I think I need to touch it up a little? Make it even more original to me in wording?
      So…
      How do I find the thread, here, about wp TOS? Thanks so much.

    6. TT,
      This is GREAT information. I have nothing really to add and, luckily, I haven’t had to deal with any duplicate content issues. I just want you to know that I found every portion of this valuable. You are my number one blogging resource and you continue to earn that position time and time again. So thankful.

      • @Janene,
        Thanks so much for the compliment. I value it, and your faithful readership and comments too. I aim to provide useful information to bloggers in every one of my posts, which now number over 500. If you ever have a topic you would like to suggest, please feel free to do so. Have a great holiday season. :)

    7. Hi TT.
      I have reproduced an article from the Guardian.uk which addresses internet related issues. I reproduced it and clearly attributed it to the Guardian and then added my own comments at the end.
      Is this OK or would it constitute duplication in the terms you describe ?
      Best regards and have a great holiday break and may 2011 be a biggie for you.

      • The Guardian holds copyright to the content and there’s zero doubt about that. Did you secure the permission of the copyright holder to reproduce their entire article? If not, then you violated their copyright and created duplicate content, so no, it’s NOT okay: it’s a copyright violation.

        Succinctly stated, whether or not the author of any original digital work has posted a copyright notice on their site or on the work itself is irrelevant. It does not change the fact that they hold the copyright to their works, and the works cannot be republished unless or until their permission has been given.

        The only time a complete post can be legally republished is when prior written permission has been received from the copyright holder. In other words, the same rules that apply to the world of print also apply in cyberspace. Otherwise, republishing a BRIEF excerpt, correctly identifying its author, and providing a link back to the original post is the correct protocol. See here please > http://onecoolsitebloggingtips.com/2009/02/05/copyright-basics-for-bloggers/

    8. Hey, Timethief!

      Don’t know if this is possible, but figured I’d ask.

      Rather than start a Blog at Blogger or WordPress, we started out posting on our own domain.

      I know that our blog is “powered by WordPress,” whatever that may mean. I’m not very tech savvy.

      But, because we never had our site at wetookthebait.wordpress.com, we lose out on a lot of traffic that could potentially come from wordpress (I know we don’t show up in internal wordpress search engines, for example.)

      I know we don’t want to set up a wordpress blog and just copy and paste our blog entries to a new wordpress blog, because that would show us as having a mirror blog.

      I understand that it’s pretty simple to migrate a blog from wordpress or blogger to your own domain.

      Is the reverse also true? Or, do we just lose out on all of the wordpress blog searches because we started out the wrong way?

      • Powered by WordPress.org means that you are using free open source WordPress.org software on your site. WordPress.ORG and WordPress.COM are completely separate and run on different software. WordPress.com vs. WordPress.org – WordPress.com is a hosted blog service on a multi-user blogging platform. You do not have to download software, pay for hosting or manage a web server. WordPress.com does not permit uploading themes or plugins. WordPress.org is free software. You can install themes and plugins, run ads, and edit the database. Check out the article on the differences. http://support.wordpress.com/com-vs-org/

        As your blog is not a free blog hosted on the WordPress.COM multi-user blogging platform, it’s not part of the WordPress.COM community, and it cannot benefit from features such as traffic from the global tagging pages and other promotional features that apply only to WordPress.COM blogs within that community. For example, its posts cannot be featured in WordPress.COM Freshly Pressed articles on the front page, in Top Blogs, Growing Blogs, or Blogs of the Day.

        Your blog is indexed by Google (405 results). Your blog is indexed by Bing (29 results). Your WordPress version (2.9.2) is out of date and that means it can be vulnerable to security exploits – upgrade now! And keep your versions up to date at all times.

        If the bottom line here is that you want traffic from WordPress.com to a WordPress.ORG site then that’s NOT possible.
        I’m not clear what you mean by “reverse”. Note the WordPress.COM Terms of Service prohibits using a WordPress.COM blog to drive traffic to third party sites. If you wish you can get a free WordPress.COM blog and purchase domain mapping http://en.support.wordpress.com/domain-mapping/ and import the content from your current WordPress.org site into it – after you upgrade to the most recent version of WordPress.org on your site. http://en.support.wordpress.com/moving-a-blog/#moving-from-wordpress-org Then you can cancel your web hosting and close down your site. Once you have done that then all the URLs currently directing traffic to your site will seamlessly direct traffic to your new WordPress.com blog.

      • I’m with you on that and it’s exactly what prompted me to publish this post. I’m sick and tired of bloggers posting the same content on multiple domains. As a search engine user, I find those people are wasting my time by attempting to game search engine results. I would like to see Google eliminate all that duplicated content from the SERPs. The original is enough, thanks!

    9. Thanks, you just saved me from creating duplicate websites at Blogger and Typepad..
      Not that I am interested in the SERP ratings very much, but just don’t need bad press.. – as much as possible..!
      ;)
      Thanks Titi!

    10. Oh boy, I may have done exactly what you are telling us not to do here. Motivated by a desire to expose more readers to the issue of slave labor used to harvest cocoa beans, I posted a duplicate on Culinate, a web food community to which I belong. Here is my duplicate copy. http://www.culinate.com/user/Cooking+in+Mexico/blog/chocolate_slavery_and_our_collective_guilt_
      Here is the original. http://cookinginmexico.com/2010/11/19/chocolate-slavery-and-our-collective-guilt/
      They are word for word identical.
      Should I remove the one on Culinate?

      Thank you.

      Kathleen

      • Hi Kathleen,
        The duplicate was created in November. Deleting it is, IMHO, not the best choice, as it’s already indexed, and doing that will mean that when people click the Culinate link they will get a “404” page not found error. As you do have access to the piece, what you can do is rewrite it, then edit the duplicate, remove its content, and replace it with the rewritten text.
