Open Access

Understanding the concept of outlier and its relevance to the assessment of data quality: Probabilistic background theory

Earth, Planets and Space 2014 57:BF03351881

https://doi.org/10.1186/BF03351881

Received: 18 November 2004

Accepted: 19 July 2005

Published: 20 June 2014

Abstract

In recent years, increasing interest in the study of outliers can be observed; however, there is as yet no general definition of an outlier. In the present paper we introduce a generic descriptive definition of an outlier. We observe that outlier problems have so far been treated in a statistical way, without proper attention to their probability-theoretic background. To fill this gap, we attempt to establish a probabilistic background theory. Within this framework, large deviations are considered as the probability-theoretic model of outliers, and the interrelationship of the laws of large numbers, the central limit theorems and large deviations is clarified. These considerations are specialized to the case of a statistical sample, which is important for the assessment of data quality. Some methodological and historical aspects of geodesy, geophysics and astronomy are also mentioned. We show that the data analysis carried out by Kepler in the process of discovering his famous elliptic law of planetary motion is relevant to the outlier problem. This methodologically interesting fact is a new result in the history of the geosciences. We establish that the accuracy of the Chebyshev inequality increases as the deviation of the random variable from its expectation increases. The possibility of applying the Chebyshev inequality to the outlier problem is pointed out.
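The abstract does not give the paper's formulas, so the following is only a minimal sketch under stated assumptions. It uses the standard Chebyshev inequality, P(|X − μ| ≥ kσ) ≤ 1/k², in a hypothetical helper `flag_outliers` that marks values lying at least k sample standard deviations from the sample mean, and compares the bound with the exact two-sided tail of a normal distribution. Reading "accuracy" as the absolute gap between the bound and the true tail probability is an assumption made here, not necessarily the authors' definition. Under that reading the gap shrinks as k grows, even though the ratio of bound to true normal tail grows.

```python
import math


def chebyshev_bound(k: float) -> float:
    """Chebyshev upper bound on P(|X - mu| >= k * sigma).

    Holds for any distribution with finite variance; the bound is
    trivial (capped at 1) for k <= 1.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return min(1.0, 1.0 / k ** 2)


def normal_two_sided_tail(k: float) -> float:
    """Exact P(|Z| >= k) for a standard normal Z, computed with erfc."""
    return math.erfc(k / math.sqrt(2.0))


def flag_outliers(sample: list[float], k: float = 3.0) -> list[float]:
    """Hypothetical helper: return values at least k sample standard
    deviations away from the sample mean.

    Chebyshev guarantees that at most a fraction 1/k**2 of any
    finite-variance distribution lies that far out, whatever its shape.
    Assumption: mean and standard deviation are estimated from the same
    sample, so a large outlier also inflates the threshold (masking).
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return [x for x in sample if abs(x - mean) >= k * sd]


if __name__ == "__main__":
    # Compare the distribution-free bound with the exact normal tail.
    for k in (1.5, 2.0, 3.0, 4.0):
        bound = chebyshev_bound(k)
        exact = normal_two_sided_tail(k)
        print(f"k={k}: bound={bound:.4f}, normal={exact:.6f}, gap={bound - exact:.4f}")

    # Illustrative (made-up) measurements with one gross error.
    data = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 15.0]
    print(flag_outliers(data, k=2.0))
```

With k = 2 only the value 15.0 is flagged in the illustrative data; with k = 3 it is not, because the outlier itself inflates the sample standard deviation. The data set is invented for demonstration and does not come from the paper.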

Key words

Assessment of data quality, Berry-Esseen theorem, Chebyshev inequality, large deviations, outliers