Websites contain a wealth of information. Unfortunately, the human Internet users you hope are accessing your site are not the only ones attracted to it. Automated tools, frequently referred to as spiders, bots and screen scrapers, may be crawling your company website too.
Given the potential of the Internet to consolidate and manipulate information, automated data aggregation has become a business model for many companies. Using specialized software applications, a scraper can almost instantaneously access numerous websites, submit queries to the sites, extract data from them and gather copies of the actual web pages viewed. Websites containing unique collections of proprietary, pricing and consumer information are frequent targets of scrapers.
Some aggregators, such as reputable online travel sites that provide a collection of airline and hotel options, may “scrape” source websites with the authorization of the website owners. Others, however, attack competitors’ sites, engaging in unauthorized scraping of large volumes of data. Yet others may crawl sites searching for email addresses, credit card numbers or other sensitive information.
Best practices to battle the bots
To a certain extent, companies can combat scrapers via technology. However, because you need to make your website reasonably accessible to intended users and because aggregation software is malleable, it is impossible to block all unauthorized access. Companies should therefore adopt practices that put them in the best possible position to pursue legal remedies if they become the target of a harmful attack.
Best practices include:
- including terms and conditions on your website that expressly prohibit automated access, and
- incorporating a special instruction expressly telling scrapers to keep out.
A terms-and-conditions provision prohibiting automated access might read:
“You may not use any deep-link, page-scrape, spider, robot, crawl, index, Internet agent or other automatic device, program, algorithm or technology which does the same things, to use, access, copy, acquire information, generate impressions, input information, store information, search, generate searches or monitor any portion of this website.”
Your website should incorporate machine-readable code that expressly tells bots not to access the site. For example, a site might include a special instruction called a robots.txt or robot exclusion file. While compliance with robots.txt files is voluntary, inclusion of such files is often necessary to stop a court from later concluding that you impliedly licensed bots to access the site.
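To make the robots.txt idea concrete, here is a minimal sketch in Python of what such a file can look like and how a compliant bot would read it. The file contents, the bot name "ExampleBot" and the www.example.com address are assumptions chosen for illustration, not taken from any actual site. The check uses the robots.txt parser in Python's standard library.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to stay out of the whole site.
# On a real site this text would be served from /robots.txt at the site root.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A well-behaved bot runs a check like this before requesting a page.
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://www.example.com/prices"))
```

With the file above, the check reports that the page is off limits. A blanket exclusion like this also turns away search engine crawlers, so in practice site owners often add narrower rules for the bots they do want to admit.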
Available legal remedies
If preventative measures fail, website owners do have legal remedies. Claims for breach of contract, violation of the federal Computer Fraud and Abuse Act and trespass are potentially available in almost any scraping attack.
While the enforceability of website terms and conditions is likely to be challenged by scrapers, a growing number of courts have now enforced website terms, even when presented only as a browse-wrap agreement (an agreement that users assent to by use of the website). A key to increasing the likelihood of enforceability is ensuring that the scraper has actual notice of the terms it is alleged to be breaching. Articulating a plausible damage claim caused by the scraping activity is also critical. The damage could be harm to the target’s computer system, or it could be commercial in nature.
Most website owners victimized by unauthorized scraping can also state a claim under the federal Computer Fraud and Abuse Act (CFAA), 18 U.S.C. § 1030 (2008). The claim is based on the scraper’s unauthorized access to the website owner’s server.
Finally, a number of courts have indicated a willingness to apply traditional trespass theory to cyberspace. However, these claims will likely be successful only if the website owner can establish some harm to its computer system or threat to the integrity of the system. Large-scale attacks that reduce the capacity and functionality of a server are good candidates for this sort of claim, but isolated attacks that do not impair the server should not be alleged under the trespass theory.
In addition to the above remedies that center around unauthorized access, the taking of material from the website may trigger additional claims, such as misappropriation or copyright infringement. If the website owner employs technological measures to block scraping, and the defendant circumvents such measures, the website owner may also have a claim under the Digital Millennium Copyright Act, which prohibits circumvention of technological measures that control access to a copyrighted work.
Unauthorized access to websites by automated means will only increase as the sophistication of scraping software improves and aggregators continue to enjoy easy profits in taking others’ work. Website owners must protect themselves, not only through technological measures but also by implementing best practices and pursuing effective legal remedies.
For more information, please contact Gina Durham.