Can Google detect which content is original?

Has your blog content ever been stolen? Have you ever used Google search and been incensed to discover that the stolen version (i.e. the duplicate content) appears higher in the SERPs (search engine results pages) than your original article?

Duplicate content is content that can be accessed at more than one URL. “Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar.” If search engine spiders can’t tell which version of a web page or document is the original (canonical) version, the result can be reduced search visibility.
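To make “appreciably similar” a little more concrete, one common textbook approach is to split each text into overlapping three-word sequences (shingles) and compare the two sets with Jaccard similarity. The Python sketch below is only an illustration under that assumption; it is not how Google detects duplicates, and the shingle size and sample sentences are arbitrary choices.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences (shingles) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 3) -> float:
    """Shared shingles divided by all distinct shingles: 1.0 means identical."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


original = "Duplicate content is content that can be accessed on more than one URL."
copied = "Duplicate content is content that can be accessed on more than one web address."
print(jaccard_similarity(original, original))  # 1.0
print(round(jaccard_similarity(original, copied), 2))  # 0.77
```

A lightly reworded copy still scores high, which is why near-duplicates can be treated much like exact copies.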

Duplicate content within a domain

Duplicate content within a domain is a common problem on blogs, where multiple URLs can point to the same content; for example, full posts may display on Archive, Category and Tag pages as well as on their own pages. On self-hosted WordPress.org installs, a “noindex, follow” robots meta tag can be used to instruct Google and the other search engines to crawl such a page and follow its links but not add the page to their index. This cannot be done on free hosted blogs on WordPress.com, because it is a multi-user blogging platform where users cannot access or edit themes and templates. With Panda rolling out globally and Google advising site owners to remove duplicate and non-original content, what is one to do?
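On a self-hosted site, the directive described above is usually written in the page’s head as `<meta name="robots" content="noindex, follow">`. As a rough way to confirm that an archive or tag page actually carries it, here is a minimal Python sketch using only the standard library. The function name and sample page are hypothetical, and it only inspects meta tags, not any robots directives sent in HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def is_noindex_follow(html: str) -> bool:
    """True if the page asks crawlers not to index it but still to follow its links."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    d = parser.directives
    # "follow" is the default, so we only require that nothing forbids following;
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in d and "nofollow" not in d and "none" not in d


archive_page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(is_noindex_follow(archive_page))  # True
```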

To reduce duplicate content within my domain I have taken these steps:

  1. I have set my RSS feeds (Settings > Reading) to “Summary” rather than “Full” to reduce content theft.
  2. I have a Copyright page and display copyright notices, also to reduce content theft.
  3. I do not use a theme that displays full posts on Archive, Category and Tag pages. Instead I use the Inuit Types theme, which automatically displays excerpts of post content on the front page and on Archive, Category and Tag pages.
  4. I copy and paste a sentence from my latest post into Google search a few hours after publication to search for duplicates.
  5. I use Copyscape to search for duplicates.
  6. I also use Plagium (beta) to track plagiarism.
  7. I have set up Google Alerts for my domain names.
  8. I act immediately when I discover my content has been stolen and file a DMCA takedown notice when required.
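Step 4 can be made slightly more systematic: wrapping the copied sentence in quotation marks asks Google for exact-phrase matches, and the `-site:` operator can exclude your own blog so that anything left over is a candidate copy. The Python sketch below only builds the search URL for you to open in a browser; it does not fetch or scrape results. The function name and the example domain are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def exact_phrase_search_url(sentence: str, own_domain: Optional[str] = None) -> str:
    """Build a Google search URL for an exact-phrase query, optionally excluding your own site."""
    # Remove straight double quotes inside the sentence so they don't end the phrase early.
    phrase = sentence.replace('"', "").strip()
    query = f'"{phrase}"'
    if own_domain:
        # -site: drops results from your own domain; whatever remains is a candidate copy.
        query += f" -site:{own_domain}"
    # urlencode percent-encodes the quotes and colon and turns spaces into "+".
    return "https://www.google.com/search?" + urlencode({"q": query})


url = exact_phrase_search_url("hello world", own_domain="example.com")
print(url)  # https://www.google.com/search?q=%22hello+world%22+-site%3Aexample.com
```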

Duplicate content across domains

Though it isn’t the only cause, the most obvious cause of duplicate content across domains is people intentionally lifting content from other sites for their own use. Many content thieves are using free Blogspot hosting and AdSense (Google owns both) to make money from stolen blog content. In March, Google changed its search algorithm with the “Panda update,” which was aimed at rooting out duplicate content from content farms, thereby delivering more relevant results and improving users’ search experience. The bad news is that Google’s new “Panda” algorithm is ranking some stolen content higher than the original versions.

Kunal Pradhan, Ahmedabad, India posed this question to Matt Cutts of Google:

“Google crawls site A every hour and site B once in a day. Site B writes an article, site A copies it changing time stamp. Site A gets crawled first by Googlebot. Whose content is original in Google’s eyes and rank highly? If it’s A, then how does that do justice to site B?”

How can I make sure that Google knows my content is original?

Updated June 21st, 2011

Will showing recent posts on my homepage cause a duplicate content issue?

Further reading on the Google Panda Algorithm update:
Why you should offer partial feeds after Google Panda Update
The Panda that hates farms (Matt Cutts and Amit Singhal Wired interview)

37 thoughts on “Can Google detect which content is original?”

  1. Pingback: Panda and Penguin algorithm updates | one cool site

  2. Pingback: Reposting content from other sites | one cool site

  3. Pingback: Google Penguin Update | one cool site

  4. Pingback: Official Google Webmaster Central Blog: Raising awareness of cross-domain URL selections « one cool site

  5. Excellent site and great advice. I have a question, though, about the stats: when I click on my page (when I’m logged out), it seems to add 2 page views for every 1 visit. Is this the case with other people visiting my site? I mean, if my site stats were 500 per month, does that actually mean the real number of individual people viewing is 250? Or am I just plain wrong?

    • WordPress stats are not counting “people”. WordPress stats are counting PageViews.
      If you are using a theme that displays an excerpt on the front page and a visitor clicks into that front page, they must click either the “read more” tag or the post title to read the entire post on its own page.
      1 click home page + 1 click post on its own page.

      If you are using a theme that displays an excerpt on the front page and a visitor clicks directly into the post from another site like Twitter or Google, they read the entire post on its own page.
      1 click post on its own page.

      And so on. Support documentation on Stats: http://en.support.wordpress.com/stats/

  6. Thanks timethief … you’ve saved me lots of time :) I’m new to WP and I’ve only been blogging for less than a week, so thanks to you I can focus more on blogging about my rantings and chaotic life than having to scour through the WP Support page. Yay to women who know their stuff (you). See ya around …

  7. Sorry to hear of your injury, TiTi. How did that happen?

    I hope it heals quickly so you can get out and about. In the meantime, may the seedlings in your greenhouse flourish.

    We’re not getting a lot of rain here, but overall, it’s been a horrendous and LONG winter. Now we are finally seeing lush green and lots of blooms popping out everywhere. Good for the spirits! :)

  8. Very good and useful information, TiTi. This has happened to me many times, either with images or even, whole posts. I will have to bookmark this article to come back to, for reference. Thanks for posting it.

    I hope you are well and enjoying the spring season, pain free! Or at least, with minimal pain.

    • Hi Lynda,
      It’s good to hear from you. Unfortunately, I have broken some bones in my foot and I am not getting around much these days. Hiking, dancing, etc. are out for the next 8 weeks. I have started plants in my greenhouse and I’m looking forward to seeing the rain quit. Enough already! :) Take care and be well.

  9. Hi, Timethief

    Thanks for the useful info. Like you, I try to only include an excerpt of articles on front pages and feeds (to reduce duplicates). Now, I see it may minimize theft too. I’ve noticed the copyright page on your blog before and think it’s great. Thought about adding one to my site after I saw yours.

    The added steps and video you included were also very helpful.

    Just curious: how often have you discovered stolen content from your blog? And was it easy to get the person to remove the content after you filed a DMCA notice?

  10. Outstanding post with many tools listed to help us find, fight, and protect our content. I have used several of the tools you listed to protect and defend my content when my blog’s content was attacked. One of your old posts on how to file a DMCA notice was very useful and was key to helping me deal with my content theft.

  11. After reading this post and following the links in it, I think I should now take SEO seriously. I plan to study SEO formally as soon as I get some time. Thanks for the post.

    • Hello Phoxis,
      I only know the basics and I read what SEO professionals and others have to say. The effects of the update are being reported in many articles throughout the blogosphere, so I decided to share what I have done myself in response to what I have read, and the video too. Thanks for commenting.

      • I believe that to do this right I need to study it formally and allot myself some time to understand what is going on behind the curtain, and then decide from that knowledge what should be done. And this post is an eye-opener. Thanks for this.

  12. Hi Timethief

    I had been posting only summaries to my RSS feed, but I heard from my Kindle blog customers that they only got the summary on their Kindles, so I had to change it. Am I doing something wrong?

  13. timethief,

    All this just makes my head spin! But you’ve offered some excellent tips that I will look to employing. I appreciate the video. Thanks.

    • Hi Sandra,
      Heads are spinning. SEO experts are reporting what they have observed. I have done a great deal more research than what I referred to here. What I chose to do was to cut down my post and provide only the information I think is most relevant to my reader community.

      P.S. Have I told you lately how much I value your friendship? I don’t know if you are the huggy type or not but if you are then here’s a big {HUG}. :)
