Don't Tread on Me: Moderating Access to OSN Data with SpikeStrip

Christo Wilson
Alessandra Sala
Joseph Bonneau
Robert Zablit
Ben Y. Zhao

Proceedings of The 3rd Workshop on Online Social Networks (WOSN)


Paper Abstract

Data access on today's web is open and unrestricted. Aside from small pockets of restricted content, site administrators have little control over the flow of their online data. This has a number of negative consequences, including unauthorized cross-site deep links, online click fraud, and rogue crawling attacks, in which unauthorized, profit-driven crawlers extract valuable datasets from the sites that own them, such as airline websites, recommendation engines, and online social networks.

In this paper, we propose to return control of online data to their owners using a general "link encryption" primitive that associates each web session with a unique "view" of the visited site. By applying hyperlink encryption to selected pages, a site administrator can eliminate cross-site deep linking, as well as identify and rate-limit sessions performing rogue crawls, all while using whitelists to provide unfettered data access to legitimate crawlers. We implement SpikeStrip, an Apache module that uses link encryption and a scalable rate-limiting mechanism to throttle rogue crawlers. Using benchmarks and measurements on an Apache server hosting data from three different sites, we show that SpikeStrip imposes negligible overhead on server throughput while significantly raising the bar for rogue crawlers.
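To make the idea of a per-session "view" concrete, the following is a minimal Python sketch, not SpikeStrip's actual implementation (which is an Apache module whose internals the abstract does not describe). It assumes each session has an identifier, derives per-session keys from a server secret, and turns each path into an opaque token that only validates within the session it was issued for. The names `SERVER_SECRET`, `encrypt_link`, `decrypt_link`, and `SessionLimiter` are hypothetical, the HMAC-based keystream is a stand-in for a real cipher and is for illustration only, and the fixed request cap is an assumed placeholder for the paper's rate-limiting mechanism.

```python
import base64
import hashlib
import hmac
import os

# Hypothetical server-wide secret; a real deployment would persist and rotate it.
SERVER_SECRET = os.urandom(32)


def _derive_key(session_id: str, label: bytes) -> bytes:
    """Derive a per-session key; distinct labels give independent keys."""
    return hmac.new(SERVER_SECRET, label + session_id.encode(), hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Illustrative keystream built from HMAC-SHA256 in counter mode."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt_link(session_id: str, path: str) -> str:
    """Return an opaque, URL-safe token for `path`, valid only in this session."""
    enc_key = _derive_key(session_id, b"enc")
    mac_key = _derive_key(session_id, b"mac")
    nonce = os.urandom(8)
    data = path.encode()
    ct = bytes(a ^ b for a, b in zip(data, _keystream(enc_key, nonce, len(data))))
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()[:16]
    return base64.urlsafe_b64encode(nonce + ct + tag).decode()


def decrypt_link(session_id: str, token: str):
    """Return the original path, or None if the token is not from this session."""
    try:
        raw = base64.urlsafe_b64decode(token.encode())
    except ValueError:
        return None
    if len(raw) < 24:  # 8-byte nonce + 16-byte tag at minimum
        return None
    nonce, ct, tag = raw[:8], raw[8:-16], raw[-16:]
    mac_key = _derive_key(session_id, b"mac")
    expected = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()[:16]
    if not hmac.compare_digest(tag, expected):
        return None
    enc_key = _derive_key(session_id, b"enc")
    return bytes(a ^ b for a, b in zip(ct, _keystream(enc_key, nonce, len(ct)))).decode()


class SessionLimiter:
    """Assumed placeholder policy: cap the number of requests per session."""

    def __init__(self, max_requests: int):
        self.max_requests = max_requests
        self.counts = {}

    def allow(self, session_id: str) -> bool:
        self.counts[session_id] = self.counts.get(session_id, 0) + 1
        return self.counts[session_id] <= self.max_requests
```

In this sketch, a link copied from one session fails validation in any other, which is the property that blocks cross-site deep linking, and because every valid request is tied to a session, a per-session counter is enough to notice a session that follows an unusually large number of links.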