Complexity vs. Performance: Empirical Analysis of Machine Learning as a Service
Ben Y. Zhao
Proceedings of the 17th ACM SIGCOMM Internet Measurement Conference (IMC 2017)
Machine learning classifiers are basic research tools used in numerous types of network analysis and modeling. To reduce the need for domain expertise and the cost of running local ML classifiers, network researchers can instead rely on centralized Machine Learning as a Service (MLaaS) platforms.
In this paper, we evaluate the effectiveness of MLaaS systems ranging from fully automated, turnkey systems to fully customizable systems, and find that with more user control comes greater risk: good decisions produce even higher performance, while poor decisions result in harsher performance penalties. We also find that server-side optimizations help fully automated systems outperform the default settings of their competitors, but they still lag far behind well-tuned MLaaS systems, which compare favorably to standalone ML libraries. Finally, we find that classifier choice is the dominant factor in determining model performance, and that users can approximate the performance of an optimal classifier choice by experimenting with a small random subset of classifiers. While network researchers should approach MLaaS systems with caution, they can achieve results comparable to standalone classifiers if they have sufficient insight into key decisions such as classifier and feature selection.
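To make the last finding concrete, here is a minimal sketch, not taken from the paper, of how a researcher might approximate the optimal classifier by cross-validating a small random subset of candidates. The synthetic dataset, the candidate pool, and the subset size k=3 are illustrative assumptions; scikit-learn stands in as a representative standalone ML library.

```python
# Hypothetical sketch (not from the paper): approximate the best classifier
# by cross-validating a small random subset of candidate models.
import random

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder dataset; a network researcher would substitute real features.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Illustrative candidate pool; the paper's exact classifier set is not
# reproduced here.
pool = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(),
    GaussianNB(),
    KNeighborsClassifier(),
    SVC(),
    DecisionTreeClassifier(),
]

# Evaluate only a small random subset of the pool instead of every model.
subset = random.sample(pool, k=3)
scores = {clf.__class__.__name__: cross_val_score(clf, X, y, cv=5).mean()
          for clf in subset}
best = max(scores, key=scores.get)
print(f"best of random subset: {best} ({scores[best]:.3f})")
```

In practice, the candidate pool and evaluation metric would be chosen to match the network-analysis task at hand.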