Identifying Value in Crowdsourced Wireless Signal Measurements

Zhijing Li
Ana Nika
Xinyi Zhang
Yanzi Zhu
Yuanshun Yao
Ben Y. Zhao
Haitao Zheng

Proceedings of the 26th World Wide Web Conference (WWW 2017)

Paper Abstract

While crowdsourcing is an attractive approach to collecting large-scale wireless measurements, understanding the quality and variance of the resulting data is difficult. Our work analyzes the quality of crowdsourced cellular signal measurements in the context of basestation localization, using large international public datasets (419M signal measurements and ~1M cells) and corresponding ground truth values. Performing localization using raw received signal strength (RSS) data produces poor results and very high variance. Applying supervised learning improves results moderately, but variance remains high. Instead, we propose feature clustering, a novel application of unsupervised learning to detect hidden correlations among measurement instances, their features, and localization accuracy. Our results identify RSS standard deviation and RSS-weighted dispersion mean as the key features that correlate with highly predictive measurement samples for sparse and dense measurements, respectively. Finally, we show how optimizing crowdsourced measurements for these two features dramatically improves localization accuracy and reduces variance.
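
To make the two named features concrete, the following Python sketch computes them for one cell's measurements, together with a simple RSS-weighted centroid as a baseline location estimate. The abstract does not define the features or the localization method, so everything here is an assumption made for illustration: the weighted-centroid estimator, the conversion of dBm to linear power for weighting, the reading of "RSS-weighted dispersion mean" as the weighted mean distance of samples from their weighted centroid, and all function names. Coordinates are assumed to be planar, in meters.

```python
import math
import statistics


def dbm_to_mw(rss_dbm):
    """Convert RSS from dBm to linear power in milliwatts."""
    return 10 ** (rss_dbm / 10.0)


def weighted_centroid(samples):
    """Estimate a cell location as the RSS-weighted centroid.

    samples: list of (x, y, rss_dbm) tuples in planar coordinates (meters).
    Assumption: stronger samples are weighted by linear power; this is a
    common baseline, not necessarily the method used in the paper.
    """
    weights = [dbm_to_mw(rss) for _, _, rss in samples]
    total = sum(weights)
    cx = sum(w * x for w, (x, _, _) in zip(weights, samples)) / total
    cy = sum(w * y for w, (_, y, _) in zip(weights, samples)) / total
    return cx, cy


def rss_std(samples):
    """Population standard deviation of the RSS values (in dB)."""
    return statistics.pstdev([rss for _, _, rss in samples])


def rss_weighted_dispersion_mean(samples):
    """Weighted mean distance of samples from their weighted centroid.

    Assumption: this is one plausible reading of the feature name; the
    paper's exact definition may differ.
    """
    cx, cy = weighted_centroid(samples)
    weights = [dbm_to_mw(rss) for _, _, rss in samples]
    dists = [math.hypot(x - cx, y - cy) for x, y, _ in samples]
    return sum(w * d for w, d in zip(weights, dists)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical measurements: (x, y, rss_dbm)
    demo = [(0.0, 0.0, -60.0), (10.0, 0.0, -65.0), (0.0, 10.0, -70.0)]
    print("centroid:", weighted_centroid(demo))
    print("RSS std (dB):", rss_std(demo))
    print("dispersion mean (m):", rss_weighted_dispersion_mean(demo))
```

Under these assumptions, a cell's measurement set could be scored on both features before localization, for example to prefer sample sets whose feature values fall in the range that the clustering step associates with low error.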