10 March 2010 · 5 minute read

Spiders, bots and other creepy crawlers: protecting your company website

Websites contain a wealth of information. Unfortunately, the human Internet users you hope are accessing your site are not the only ones attracted to it. Automated tools, frequently referred to as spiders, bots and screen scrapers, may be crawling your company website too.

Given the potential of the Internet to consolidate and manipulate information, automated data aggregation has become a business model for many companies. Using specialized software applications, a scraper can almost instantaneously access numerous websites, submit queries to the sites, extract data from them and gather copies of the actual web pages viewed. Websites containing unique collections of proprietary, pricing and consumer information are frequent targets of scrapers.

Some aggregators, such as reputable online travel sites that provide a collection of airline and hotel options, may “scrape” source websites with the authorization of the website owners. Others, however, attack competitors’ sites, engaging in unauthorized scraping of large volumes of data. Yet others may crawl sites searching for email addresses, credit card numbers or other sensitive information.

Best practices to battle the bots

To a certain extent, companies can combat scrapers with technology. However, because a website must remain reasonably accessible to its intended users, and because aggregation software adapts quickly, it is impossible to block all unauthorized access. Companies should therefore adopt certain best practices that put them in the strongest possible position to pursue legal remedies should they become the target of a harmful attack.
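As a purely illustrative sketch of the technical side, a web server can refuse requests whose User-Agent header matches a known scraper. The minimal Python (WSGI) example below is hypothetical; the blocked agent markers are placeholder assumptions, not a vetted blocklist.

    # Hypothetical sketch: deny requests from user agents on a blocklist.
    # The markers below are illustrative placeholders, not a real blocklist.
    BLOCKED_AGENT_MARKERS = ("badbot", "examplescraper")

    def application(environ, start_response):
        # WSGI entry point: inspect the User-Agent header of each request.
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(marker in agent for marker in BLOCKED_AGENT_MARKERS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Automated access is prohibited by this site's terms of use.\n"]
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"Welcome.\n"]

Filters like this are easily evaded by scrapers that spoof a browser’s User-Agent string, which is one reason technology alone cannot block all unauthorized access and why the legal best practices below remain essential.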

Best practices include:

  • incorporating well-drafted terms of use on your website
  • incorporating a special instruction expressly telling scrapers to keep out, and
  • sending violators cease-and-desist letters that enclose a copy of your website terms of use

Website terms of use

Well-drafted terms of use are critical to protecting the company website. The terms should contain language that expressly prohibits access by spiders, bots, scrapers and other web crawlers. For example, the terms might contain a provision like the following:

You may not use any deep-link, page-scrape, spider, robot, crawl, index, Internet agent or other automatic device, program, algorithm or technology which does the same things, to use, access, copy, acquire information, generate impressions, input information, store information, search, generate searches or monitor any portion of this website.

Website terms of use should be easily accessible from every page of the website.

Robot talk

Your website should include machine-readable instructions that expressly tell bots not to access the site: typically a robots.txt file, also called a robot exclusion file, placed at the root of the site. While compliance with robots.txt files is voluntary, including one is often necessary to stop a court from later concluding that you impliedly licensed bots to access the site.
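For illustration, a minimal robots.txt file, served from the root of the site as the robots exclusion standard requires, might read as follows (the named bot is a placeholder, not a real crawler):

    # Tell all compliant crawlers to stay out of the entire site
    User-agent: *
    Disallow: /

    # Alternatively, single out a particular bot by the name it announces
    User-agent: ExampleScraperBot
    Disallow: /

A well-behaved crawler fetches /robots.txt before anything else and honors its directives; a scraper that ignores them has, at a minimum, disregarded an express instruction to keep out.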

Cease-and-desist letters

If a website owner can identify the source of the scraping, taking prompt action through a written demand can sometimes stop the activity prior to formal legal action or better position the owner to take legal action against the scraper. Written demands should always incorporate a copy of the website terms of use so the scraper cannot later attempt to defend its conduct by claiming it did not have notice of the explicit terms it is accused of violating.

Available legal remedies

If preventive measures fail, website owners do have legal remedies. Claims for breach of contract, violation of the federal Computer Fraud and Abuse Act and trespass are potentially available against almost any scraping attack.

While the enforceability of website terms and conditions is likely to be challenged by scrapers, a growing number of courts have now enforced website terms, even when presented only as a browse-wrap agreement (an agreement that users assent to by use of the website). A key to increasing the likelihood of enforceability is ensuring that the scraper has actual notice of the terms it is alleged to be breaching. Articulating a plausible damage claim caused by the scraping activity is also critical. The damage could be harm to the target’s computer system, or it could be commercial in nature.

Most website owners victimized by unauthorized scraping can also state a claim under the federal Computer Fraud and Abuse Act (CFAA), 18 U.S.C. § 1030 (2008). The claim is based on the scraper’s unauthorized access to the website owner’s server.

Finally, a number of courts have indicated a willingness to apply traditional trespass theory to cyberspace. However, these claims will likely be successful only if the website owner can establish some harm to its computer system or threat to the integrity of the system. Large-scale attacks that reduce the capacity and functionality of a server are good candidates for this sort of claim, but isolated attacks that do not impair the server should not be alleged under the trespass theory.

In addition to the above remedies, which center on unauthorized access, the taking of material from the website may trigger additional claims, such as misappropriation or copyright infringement. If the website owner employs technological measures to block scraping and the defendant circumvents those measures, the owner may also have a claim under the Digital Millennium Copyright Act, which prohibits circumvention of technological measures that control access to a copyrighted work.

Conclusion

Unauthorized automated access to websites will only increase as scraping software grows more sophisticated and aggregators continue to reap easy profits from others’ work. Website owners must protect themselves, not only through technological measures but also by implementing best practices and pursuing effective legal remedies.

For more information, please contact Gina Durham.
