Can Google detect which content is original?

Has your blog content ever been stolen? Have you ever used Google search and been incensed to discover that the stolen version, i.e. the duplicate content, ranks higher in the SERPs (search engine results pages) than your original article?

Duplicate content is content that can be accessed at more than one URL. “Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar.” If search engine spiders can’t tell which version of a web page or document is the original or canonical version, the consequence is reduced search visibility.
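To make “appreciably similar” a little more concrete, here is a minimal Python sketch that compares two pieces of text by the word sequences they share. It is only an illustration: the function names `shingles` and `similarity` and the choice of five-word sequences are my own assumptions, and nothing here describes how Google actually measures duplication.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a, b, size=5):
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

A score near 1.0 suggests one text is largely a copy of the other, while a score near 0.0 suggests they share little wording. Where to draw the line between the two is a judgment call.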

Duplicate content within a domain

Duplicate content within a domain is a common problem on blogs where multiple URLs can refer to the same content, for example, if you have full posts displaying on Archives, Categories and Tags pages. On self-hosted WordPress.org installs, the “noindex, follow” robots meta tag can be used to instruct Google and the other search engines to crawl the page and follow its links but not add the page to their indexes. This cannot be done on free hosted blogs on WordPress.com, as it’s a multi-user blogging platform where users cannot access and edit themes or templates. With Panda rolling out globally and Google giving advice to remove duplicate content and non-original content, what is one to do?
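On a self-hosted install, the tag in question normally sits in the page head as `<meta name="robots" content="noindex, follow">`. The Python sketch below checks whether a page's HTML carries that tag. The class name `RobotsMetaParser` and the function `robots_directives` are names invented for this example, and it only looks at the meta tag, not at HTTP headers or robots.txt.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # The content attribute is a comma-separated list of directives.
            for directive in attr.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots directives declared in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives
```

Running `robots_directives` over the saved source of an Archives, Categories or Tags page and looking for `"noindex"` in the result is one quick way to confirm that the theme or plugin is emitting the tag where you expect it.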

To reduce duplicate content within my domain I have taken these steps:

  1. I have set my RSS feeds (Settings > Reading) to “Summary” rather than “Full” to reduce content theft.
  2. I have a Copyright page and copyright notices, also to reduce content theft.
  3. I do not use a theme that displays full posts on Archives, Categories and Tags pages. Instead I use the Inuit Types theme, as it automatically provides excerpts of post content on the Front page and on Archives, Categories and Tags pages.
  4. I copy and paste a sentence from my latest post into Google search a few hours after publication to search for duplicates.
  5. I use Copyscape to search for duplicates.
  6. I also use plagium (beta) to track plagiarism.
  7. I have set up Google Alerts for my domain names.
  8. I act immediately when I discover my content has been stolen and file a DMCA takedown notice when required.
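Step 4 in the list above can be partly scripted. The Python sketch below picks the longest sentence from a post and builds a Google search URL that looks for it as an exact, quoted phrase. The helper names `pick_sentence` and `exact_phrase_search_url` and the eight-word minimum are assumptions made for illustration. The script only builds the URL; you still open it in a browser and review the results yourself.

```python
import re
from urllib.parse import urlencode


def pick_sentence(post_text, min_words=8):
    """Return the longest sentence with at least min_words words, or None."""
    sentences = re.split(r"(?<=[.!?])\s+", post_text.strip())
    candidates = [s for s in sentences if len(s.split()) >= min_words]
    if not candidates:
        return None
    return max(candidates, key=lambda s: len(s.split()))


def exact_phrase_search_url(sentence):
    """Build a search URL that wraps the sentence in double quotes."""
    phrase = sentence.strip().rstrip(".!?")
    return "https://www.google.com/search?" + urlencode({"q": f'"{phrase}"'})
```

Longer sentences tend to make better fingerprints because they are less likely to appear elsewhere by coincidence, which is why the sketch prefers them.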

Duplicate content across domains

Though it isn’t the only cause, the most obvious cause of duplicate content is people intentionally lifting content from other sites for their own use. Many content thieves are using Blogspot free hosting and AdSense (Google owns both) to make money from stolen blog content. In March Google decided to change the search algorithm by means of the “Panda update.” It was aimed at rooting out duplicate content from content farms, thereby delivering relevant results and enriching users’ search experience. The bad news is that Google’s new “Panda” algorithm is ranking some stolen content higher than the original versions.

Kunal Pradhan of Ahmedabad, India, posed this question to Matt Cutts of Google:

“Google crawls site A every hour and site B once in a day. Site B writes an article, site A copies it changing time stamp. Site A gets crawled first by Googlebot. Whose content is original in Google’s eyes and rank highly? If it’s A, then how does that do justice to site B?”

How can I make sure that Google knows my content is original?

Updated June 21st, 2011

Will showing recent posts on my homepage cause a duplicate content issue?

Further reading on the Google Panda Algorithm update:
Why you should offer partial feeds after Google Panda Update
The Panda that hates farms (Matt Cutts and Amit Singhal Wired interview)

5 Google Webmasters Video Tutorials

These 5 videos introduce how Google discovers, crawls and indexes your site’s pages, and how Google displays them in search results. They also touch lightly upon challenges webmasters and search engines face, such as duplicate content and the effective indexing of Flash and AJAX content. Lastly, they talk about the benefits of what Webmaster Central and other useful Google products offer.

Matt Cutts on How Google Search Works

The life span of a Google query is less than half a second, and involves quite a few steps before you see the most relevant results. Here’s how it all works.

How Google Search Works

Updated: How Google Social Search works

Related posts found in this blog:
Blogging Resources: Search Engines
Blogging Tips: Tag to Increase Traffic
YouTube and Google Tips from Matt Cutts
What factors influence video results in Universal Search?
