Don't Tread on Me: Moderating Access to OSN Data with SpikeStrip
Christo Wilson
Alessandra Sala
Joseph Bonneau
Robert Zablit
Ben Y. Zhao
Proceedings of The 3rd Workshop on Online Social Networks (WOSN)
Paper Abstract
Data access on today's web is open and unrestricted. Aside from small pockets of restricted content, site administrators have little control over the flow of their online data. This has a number of negative consequences, including unauthorized cross-site deep links, online click fraud, and rogue crawling attacks, in which profit-driven crawlers extract valuable datasets from their owners, such as airline websites, recommendation engines, and online social networks.
In this paper, we propose to return control of online data to their owners
using a general "link encryption" primitive that associates each web
session with a unique "view" of the visited site. By applying hyperlink
encryption to selected pages, a site administrator can eliminate
cross-site deep linking, as well as identify and rate-limit sessions
performing rogue crawls, all while using whitelists to provide unfettered
data access to legitimate crawlers. We implement SpikeStrip, an Apache
module that uses link encryption and a scalable rate-limiting mechanism to
throttle rogue crawlers. Using benchmarks and measurements on an Apache
server hosting data from three different sites, we show that SpikeStrip
imposes negligible overhead on server throughput while significantly
raising the bar for rogue crawlers.
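
To make the "link encryption" idea concrete, here is a minimal Python sketch of per-session link tokens. The abstract does not say which cipher or token format SpikeStrip uses, so everything below is an assumption for illustration only: the construction (a deterministic, HMAC-based scheme in the style of synthetic-IV encryption), the function names encrypt_link and decrypt_link, and the SERVER_SECRET constant are all hypothetical and are not taken from the paper or from the Apache module.

```python
import base64
import hashlib
import hmac

# Hypothetical server-wide secret; a real deployment would load and rotate this securely.
SERVER_SECRET = b"demo-secret-change-me"
SIV_LEN = 16  # bytes of the synthetic IV, which doubles as the integrity tag


def _session_key(session_id: str) -> bytes:
    """Derive a per-session key so that each session gets its own 'view' of the links."""
    return hmac.new(SERVER_SECRET, session_id.encode(), hashlib.sha256).digest()


def _keystream(key: bytes, siv: bytes, length: int) -> bytes:
    """Expand (key, siv) into `length` pseudo-random bytes using HMAC in counter mode."""
    out = b""
    counter = 0
    while len(out) < length:
        block_input = b"stream" + siv + counter.to_bytes(4, "big")
        out += hmac.new(key, block_input, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_link(session_id: str, path: str) -> str:
    """Return an opaque token for `path` that is only meaningful within `session_id`."""
    key = _session_key(session_id)
    plain = path.encode()
    # The IV is derived from the plaintext, so the same path always maps to the
    # same token within a session, while different paths use different keystreams.
    siv = hmac.new(key, b"siv" + plain, hashlib.sha256).digest()[:SIV_LEN]
    cipher = _xor(plain, _keystream(key, siv, len(plain)))
    return base64.urlsafe_b64encode(siv + cipher).decode()


def decrypt_link(session_id: str, token: str):
    """Return the original path, or None if the token does not belong to this session."""
    key = _session_key(session_id)
    try:
        raw = base64.urlsafe_b64decode(token.encode())
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None
    if len(raw) < SIV_LEN:
        return None
    siv, cipher = raw[:SIV_LEN], raw[SIV_LEN:]
    plain = _xor(cipher, _keystream(key, siv, len(cipher)))
    expected = hmac.new(key, b"siv" + plain, hashlib.sha256).digest()[:SIV_LEN]
    if not hmac.compare_digest(siv, expected):
        return None  # token was minted for another session, or was tampered with
    return plain.decode()
```

Under these assumptions, a token produced by encrypt_link for one session decrypts back to the original path only within that session. The same token presented under a different session identifier fails the integrity check and yields None, which is the property that would let a server reject deep links copied from elsewhere. Rate limiting and crawler whitelisting, which the abstract also mentions, are outside the scope of this sketch.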