Get Links From Scraped Content

Photo credit Andrea Piacquadio  

If you’ve managed a website for any length of time, you probably already know that many website owners copy content from other sites and present it as their own. Some do this sparingly, copying and pasting from a competitor or a related website once or twice. Others automate the process and copy large volumes of content. While this practice can be detrimental for everyone involved, in some cases we can use it to our advantage and build links that point back to our websites from the sites copying our content.

Admittedly, these links may not be the best quality, and many search engines may ignore them. However, a link from within copied content back to the original source can provide some level of link equity and can also help the search engines understand who the original author is. In this article we will explain a series of steps you can take to make it more likely that a website copying your content links back to the original source (your website).

Include self-referencing canonical tags

Self-referencing canonical tags tell the search engines which version of a URL they should index. When used on every page of your website, they can reduce the risk associated with duplicate content appearing on other web pages, because they give the search engines an explicit signal about which URL the content belongs to. This matters for scraped content because Google and other search engines need guidance on which copy should rank and which should be ignored. If you are the original owner, the copied content should rank lower than yours. Search engines treat the canonical tag as a hint rather than a directive, so it makes that outcome more likely rather than guaranteed.

Also, in some instances content scrapers copy all of the HTML on each page and republish it. In that case the canonical tag is republished as well. It will reference the origin of the scraped content, which signals to the search engines that ranking and link value should be credited to your site instead of the site that copied you.

You can easily create canonical tags using our meta tag generator. If you are using a CMS like WordPress, there are many plugins that can add them automatically.
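If you would rather generate the tag yourself, here is a minimal Python sketch of what a self-referencing canonical tag looks like. The function name `canonical_tag` is a hypothetical helper made up for this example, and the URL is the placeholder used in the sample letter later in this article.

```python
from html import escape
from urllib.parse import urlsplit


def canonical_tag(page_url: str) -> str:
    """Return a self-referencing canonical <link> element for page_url.

    Hypothetical helper for illustration; place the result inside <head>.
    """
    parts = urlsplit(page_url)
    # A canonical URL should be absolute, so require a scheme and a host.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"canonical URL must be absolute: {page_url!r}")
    # Escape the URL so characters such as & and " are safe in an attribute.
    return f'<link rel="canonical" href="{escape(page_url, quote=True)}">'


print(canonical_tag("https://RealSite.com/dogs-are-cool/"))
```

Because the tag carries the full URL of the original page, it still points at your site if a scraper republishes the page's HTML unchanged.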

Link internally with absolute URLs

If content scrapers are using an automated process to copy your website, it is likely that they are also taking all of the HTML found in the content, including images and links. This means you should link to other pages on your website from within your own content as much as possible, which increases the likelihood that these links will be replicated on other people’s websites when they copy your content.

To make this tactic work, all of the internal links on your website must use absolute URLs. An absolute URL is the complete address of the page: it begins with the scheme (http:// or https://), includes the domain name, and ends with the full path. This ensures that the entire URL is carried over to the other website when the content is scraped. If you omit the first parts of the URL, such as the domain name, the link becomes relative, and on the scraper’s site it will point to a page on their website rather than yours. This is why it’s critically important to always use absolute URLs when linking to internal pages on your website.
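As an illustration, the Python sketch below rewrites relative links in a snippet of HTML as absolute URLs using the standard library's `urljoin`. The function name `absolutize_links` is hypothetical, and the regular expression only handles double-quoted `href` attributes on `<a>` tags, so treat it as a starting point rather than a full HTML rewriter.

```python
import re
from urllib.parse import urljoin

# Matches the href value of an <a> tag. Assumption: attributes use double
# quotes. A real site may need a proper HTML parser instead of a regex.
HREF_PATTERN = re.compile(r'(<a\b[^>]*?\bhref=")([^"]*)(")', re.IGNORECASE)


def absolutize_links(html: str, page_url: str) -> str:
    """Rewrite each link in html as an absolute URL, resolved against page_url."""

    def replace(match: re.Match) -> str:
        prefix, href, suffix = match.groups()
        # urljoin leaves already-absolute URLs unchanged and resolves
        # relative ones against the URL of the page that contains them.
        return prefix + urljoin(page_url, href) + suffix

    return HREF_PATTERN.sub(replace, html)


snippet = '<p>See also <a href="/cats-are-cool/">our cat article</a>.</p>'
print(absolutize_links(snippet, "https://RealSite.com/dogs-are-cool/"))
```

Running something like this over your content before it is published means that a scraped copy carries links that still resolve to your domain.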

Find duplicates and do outreach

The two tips above assume that those copying content are using an automated approach. However, some website owners may copy and paste your content by hand, in which case internal links and canonical tags would not be carried over. In this case we need to search for these content scrapers and contact them to request a link to the original source.

You can identify these content scrapers by copying a section of your content and pasting it into a search engine. It’s best to copy several sentences at a time and enclose them in quotation marks so that the search engine looks for exact-match passages.
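If you have a lot of content to check, building these quoted searches by hand gets tedious. The Python sketch below splits text into groups of sentences and wraps each group in quotation marks. The function names and the two-sentence default are assumptions made for this example, and the sentence splitting is deliberately naive.

```python
import re
from urllib.parse import quote_plus


def exact_match_queries(text: str, sentences_per_query: int = 2) -> list[str]:
    """Split text into quoted passages suitable for exact-match searches."""
    # Naive sentence split: break on whitespace that follows . ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    queries = []
    for i in range(0, len(sentences), sentences_per_query):
        passage = " ".join(sentences[i:i + sentences_per_query])
        # Drop any double quotes inside the passage so they cannot
        # terminate the quoted phrase early.
        passage = passage.replace('"', "")
        queries.append(f'"{passage}"')
    return queries


def search_url(query: str) -> str:
    """Build a search URL for the query (Google is used here as an example)."""
    return "https://www.google.com/search?q=" + quote_plus(query)


for query in exact_match_queries("Dogs are cool. They fetch. They nap."):
    print(query, "->", search_url(query))
```

Open each URL in a browser and look for results that are not your own site.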

Another method for identifying this type of content duplication is to use a service such as Copyscape, which can scan your website’s content and compare it against its own index of content on the internet. These services can even score the frequency and level of duplication between two pieces of content. This helps you quickly identify websites that may be copying your content and compile a list of the worst offenders.
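To give a rough sense of how duplication between two pieces of content can be scored, here is a small Python sketch that measures what fraction of your five-word sequences also appear on a suspect page. This is not how Copyscape or any particular service works; the function names, the five-word window, and the example URLs are all assumptions chosen for illustration.

```python
import re


def shingles(text: str, size: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping word sequences (shingles) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one window: treat it as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap_score(original: str, suspect: str, size: int = 5) -> float:
    """Fraction of the original's shingles that also appear in suspect."""
    original_shingles = shingles(original, size)
    if not original_shingles:
        return 0.0
    shared = original_shingles & shingles(suspect, size)
    return len(shared) / len(original_shingles)


def rank_offenders(original: str, pages: dict[str, str]) -> list[tuple[str, float]]:
    """Sort suspect pages (URL -> text) from most to least copied."""
    scores = [(url, overlap_score(original, text)) for url, text in pages.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


mine = "Dogs are cool because they are loyal, friendly, and always happy to see you."
suspects = {
    "https://ScrapperSite.com/dogs-are-cool/": "Welcome! " + mine,
    "https://example.com/cats/": "Cats sleep all day and ignore everyone in the house.",
}
for url, score in rank_offenders(mine, suspects):
    print(f"{score:.2f}  {url}")
```

A score near 1 means almost all of your passage appears on the other page, which makes it a good candidate for the outreach list.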

After you have compiled a list of websites copying your content, you can begin outreach to each one. When contacting these websites, it is smart to let them know you are aware that they have violated your copyright, but that you may allow them to keep the content if they provide a link back to the original source or your homepage. In these situations the website owner will often realize that linking to your website is much easier than escalating the issue, especially if they are in violation of copyright terms. Below is a sample letter you can use for your outreach:

To whom it may concern,

We have noticed content on your website for which we own the copyright.

Copied Content: https://ScrapperSite.com/dogs-are-cool/

Original Content: https://RealSite.com/dogs-are-cool/


We are flattered that you liked our content enough to copy it; however, we would like credit for the work we do. Please add a link to the copied content that points back to the original source or our homepage.

If this issue is not resolved within 15 days, we will be forced to submit a DMCA takedown request to your web hosting provider.

Thank you for your time,
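If your list of offenders is long, you may want to fill in the letter automatically. The Python sketch below does this with the standard library's `string.Template`. The function name `outreach_letter` and the `sender` signature field are hypothetical additions for this example; the rest of the wording follows the sample letter above.

```python
from string import Template

# Letter text based on the sample above. $sender is an assumption:
# the sample letter ends without a signature.
LETTER = Template("""To whom it may concern,

We have noticed content on your website for which we own the copyright.

Copied Content: $copied_url

Original Content: $original_url

We are flattered that you liked our content enough to copy it; however, we would like credit for the work we do. Please add a link to the copied content that points back to the original source or our homepage.

If this issue is not resolved within $days days, we will be forced to submit a DMCA takedown request to your web hosting provider.

Thank you for your time,
$sender""")


def outreach_letter(copied_url: str, original_url: str, sender: str, days: int = 15) -> str:
    """Fill in the outreach letter for one copied page."""
    return LETTER.substitute(
        copied_url=copied_url,
        original_url=original_url,
        sender=sender,
        days=days,
    )


print(outreach_letter(
    "https://ScrapperSite.com/dogs-are-cool/",
    "https://RealSite.com/dogs-are-cool/",
    sender="Your Name",
))
```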

Acquiring links from copied content may not produce the best link opportunities, but it can be an easy way to turn a bad situation into something positive. While other link building techniques should probably be pursued first, this one can provide some links while you are also mitigating duplicate content issues.

About Joe

Joe Hall is the creator of LinkBuildingIdeas.com, an SEO consultant, and Principal Analyst at his company Hall Analysis. He loves science fiction, making things, and reading.
