Approximate calculation of Tukey's depth and median with high-dimensional data

  • Milan Merkle, University of Belgrade, Faculty of Electrical Engineering
  • Milica Bogićević, University of Belgrade, Faculty of Electrical Engineering

Abstract

We present a new fast approximate algorithm for computing Tukey (halfspace) depth level sets, together with its implementation, ABCDepth. Given a $d$-dimensional data set for any $d\geq 2$, the algorithm is based on a representation of level sets as intersections of balls in $\mathbb{R}^d$ (M. Merkle, J. Math. Anal. Appl. 370 (2010)). Our approach does not require projecting sample points onto directions. This novel idea enables the calculation of level sets in very high dimensions with complexity that is linear in $d$, which provides a great advantage over all other approximate algorithms. Using different versions of this algorithm, we demonstrate approximate calculation of the deepest set of points (the "Tukey median") and of Tukey's depth of a sample point and of an out-of-sample point, all with complexity linear in $d$. An additional theoretical advantage of this approach is that the data points are not assumed to be in "general position". Examples with real and synthetic data show that, in high dimensions, the execution time of the algorithm in all the mentioned versions is much smaller than that of other implemented algorithms, and that it can accept thousands of multidimensional observations.
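
To make the ball-based idea concrete, here is a minimal Python sketch. It is not the ABCDepth implementation and does not reproduce the authors' algorithm; it is only an illustration under stated assumptions. It relies on the fact that a closed halfspace is a limit of large balls, so the fraction of sample points lying at least as far from a distant center $c$ as a point $x$ is an upper bound on the Tukey depth of $x$, and the minimum over many centers approximates that depth. The function name `approx_tukey_depth`, its parameters, the random choice of centers, and the tolerance are all hypothetical choices made for this example.

```python
import numpy as np

def approx_tukey_depth(x, X, n_centers=500, radius_factor=1e3, seed=0):
    """Illustrative upper-bound approximation of the Tukey depth of x
    with respect to the sample X (n rows, d columns).

    Hypothetical sketch: uses only Euclidean distances to far-away
    ball centers, never projections onto directions.
    """
    X = np.asarray(X, dtype=float)
    x = np.asarray(x, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)

    # Place centers far from the data, in random directions around the mean
    # (an assumption of this sketch, not a prescription from the paper).
    mean = X.mean(axis=0)
    spread = max(np.linalg.norm(X - mean, axis=1).max(), 1e-12)
    dirs = rng.standard_normal((n_centers, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    centers = mean + radius_factor * spread * dirs

    best = n
    for c in centers:
        # Each distance costs O(d), so one center costs O(n d).
        dist_x = np.linalg.norm(x - c)
        dist_all = np.linalg.norm(X - c, axis=1)
        # Count sample points outside the open ball of radius |x - c|;
        # the small relative tolerance guards against rounding in ties.
        count = np.count_nonzero(dist_all >= dist_x * (1.0 - 1e-9))
        best = min(best, count)
    return best / n

# Usage: four corners of a square plus its center.
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1], [0, 0]], dtype=float)
print(approx_tukey_depth(X[4], X))                 # depth of the center point
print(approx_tukey_depth(X[0], X))                 # depth of a corner point
print(approx_tukey_depth(np.array([5.0, 5.0]), X)) # out-of-sample point
```

Because every step uses distances only, the cost of this sketch grows linearly with $d$ for a fixed number of centers and sample points. The accuracy depends on how many centers are used and how far away they are placed.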


Keywords: Big data, multivariate medians, depth functions, computing Tukey's depth.

Published
Sep 12, 2018
How to Cite
MERKLE, Milan; BOGIĆEVIĆ, Milica. Approximate calculation of Tukey's depth and median with high-dimensional data. Yugoslav Journal of Operations Research, [S.l.], v. 28, n. 4, p. 475-499, sep. 2018. ISSN 2334-6043. Available at: <http://yujor.fon.bg.ac.rs/index.php/yujor/article/view/631>. Date accessed: 17 jan. 2019.
Section
Articles