Understanding the concept of outlier and its relevance to the assessment of data quality: Probabilistic background theory

Monhor, Davaadorjin; Takemoto, Shuzo

doi:10.1186/BF03351881

Article
Open access
Published: 20 June 2014

Earth, Planets and Space volume 57, pages 1009–1018 (2005)

Abstract

Interest in the study of outliers has grown in recent years, yet no general definition of an outlier exists so far. In the present paper we introduce a generic descriptive definition of an outlier. We observe that outlier problems have so far been treated statistically, without proper attention to their probability-theoretic background. To fill this gap, we attempt to establish a probabilistic background theory. Within this framework, large deviations are considered as the probability-theoretic model of an outlier, and the interrelationship among the laws of large numbers, the central limit theorems and large deviations is clarified. These considerations are specialized to the case of a statistical sample, which is important for the assessment of data quality. Some methodological and historical aspects of geodesy, geophysics and astronomy are also mentioned. We show that the data analysis carried out by Kepler in discovering his famous elliptic law of planetary motion is relevant to the outlier problem. This methodologically interesting fact is a new result in the history of the geosciences. We establish that the accuracy of the Chebyshev inequality increases as the deviation of the random variable from its expectation increases. The possibility of applying the Chebyshev inequality to the outlier problem is pointed out.
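
The statement about the Chebyshev inequality can be illustrated numerically. The sketch below is not the authors' derivation: it assumes a standard normal variable as an example distribution and reads "accuracy" as the absolute gap between the Chebyshev bound 1/k² and the exact two-sided tail probability P(|X − μ| ≥ kσ). The function names are hypothetical and chosen only for this illustration.

```python
import math


def chebyshev_bound(k: float) -> float:
    """Chebyshev upper bound on P(|X - mu| >= k * sigma), capped at 1."""
    return min(1.0, 1.0 / k**2)


def normal_two_sided_tail(k: float) -> float:
    """Exact P(|Z| >= k) for a standard normal Z (assumed example distribution)."""
    return math.erfc(k / math.sqrt(2.0))


def absolute_gap(k: float) -> float:
    """Difference between the Chebyshev bound and the exact normal tail."""
    return chebyshev_bound(k) - normal_two_sided_tail(k)


# Deviations measured in units of the standard deviation.
ks = [1, 2, 3, 4, 5]
gaps = [absolute_gap(k) for k in ks]

for k, gap in zip(ks, gaps):
    print(f"k={k}: bound={chebyshev_bound(k):.4f}, "
          f"exact={normal_two_sided_tail(k):.6f}, gap={gap:.4f}")
```

Under these assumptions the absolute gap shrinks as k grows, which is one way to read the claim. The ratio of the bound to the exact tail grows in this example, so the illustration supports the claim only in the absolute sense. As a screening rule, the bound says that at most a fraction 1/k² of values from any finite-variance distribution can lie k or more standard deviations from the mean.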

Author information

Authors and Affiliations

Department of Geophysics, Graduate School of Science, Kyoto University, Kitashirakawa Oiwake-cho, Sakyo-ku, Kyoto City, Kyoto, 606-8502, Japan
Davaadorjin Monhor & Shuzo Takemoto

Corresponding author

Correspondence to Davaadorjin Monhor.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Monhor, D., Takemoto, S. Understanding the concept of outlier and its relevance to the assessment of data quality: Probabilistic background theory. Earth Planet Sp 57, 1009–1018 (2005). https://doi.org/10.1186/BF03351881

Received: 18 November 2004
Revised: 28 June 2005
Accepted: 19 July 2005
Published: 20 June 2014
Issue Date: November 2005
DOI: https://doi.org/10.1186/BF03351881
