Efficient Estimation of Dynamic Density Functions with an Application to Outlier Detection
August 2, 2012
2:30 p.m.
Abdulhakim Qahtan
Abstract
Estimating the density of a data stream characterizes the distribution of the streaming data, which is useful for online clustering and outlier detection. However, estimating the underlying density function is challenging because it changes over time in unpredictable ways. In this talk, I will introduce our method for estimating dynamic density over data streams, named KDE-Track because it is based on the conventional and widely used Kernel Density Estimation (KDE) method. KDE-Track estimates the density efficiently, with linear complexity, by interpolating on a kernel model that is incrementally updated as streaming data arrive. Both theoretical analysis and experimental validation show that KDE-Track outperforms traditional KDE and a baseline method, Cluster-Kernels, in estimation accuracy on complex density structures in data streams, in computing time, and in memory usage. KDE-Track is also shown to capture the dynamic density of synthetic and real-world data in a timely manner. In addition, KDE-Track is used to accurately detect outliers in sensor data and is compared with two existing methods developed for detecting outliers and cleaning sensor data.
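
The general idea of interpolating on an incrementally updated kernel model can be illustrated with a minimal sketch. This is not the KDE-Track algorithm itself; it only assumes a fixed one-dimensional grid, a Gaussian kernel with a fixed bandwidth, and a running-mean update over all samples seen so far. The names GridKDE, update, query, and is_outlier, as well as the threshold value, are hypothetical and chosen for illustration.

```python
import numpy as np


class GridKDE:
    """Illustrative grid-based density estimator (not the published KDE-Track)."""

    def __init__(self, lo, hi, num_points=200, bandwidth=0.3):
        self.grid = np.linspace(lo, hi, num_points)  # fixed evaluation points
        self.h = bandwidth                           # assumed fixed bandwidth
        self.density = np.zeros(num_points)          # density estimate at grid points
        self.n = 0                                   # number of samples seen

    def _kernel(self, x):
        # Gaussian kernel centred at x, evaluated at every grid point.
        u = (self.grid - x) / self.h
        return np.exp(-0.5 * u * u) / (self.h * np.sqrt(2.0 * np.pi))

    def update(self, x):
        # Running-mean update: cost is linear in the grid size per sample,
        # and no past samples need to be stored.
        self.n += 1
        self.density += (self._kernel(x) - self.density) / self.n

    def query(self, x):
        # Linear interpolation between the two nearest grid points.
        return np.interp(x, self.grid, self.density)


def is_outlier(estimator, x, threshold=1e-3):
    # Hypothetical rule: flag a point whose estimated density is below a threshold.
    return bool(estimator.query(x) < threshold)
```

In such a sketch, each arriving value is passed to update, and query or is_outlier can be called at any time without revisiting earlier data. A running mean weights all past samples equally, so tracking a density that changes over time would additionally require a sliding window or a forgetting factor, which is omitted here.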