    Neural Computation
    October 01 1998

    Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms

    In Special Collection: CogNet
    Thomas G. Dietterich
    Department of Computer Science, Oregon State University, Corvallis, OR 97331, U.S.A.
    Received: October 21, 1996
    Accepted: January 9, 1998
    Online ISSN: 1530-888X
    Print ISSN: 0899-7667
    © 1998 Massachusetts Institute of Technology
    Neural Computation (1998) 10 (7): 1895–1923.
    Citation

    Thomas G. Dietterich; Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Comput 1998; 10 (7): 1895–1923. doi: https://doi.org/10.1162/089976698300017197


      Abstract

      This article reviews five approximate statistical tests for determining whether one learning algorithm outperforms another on a particular learning task. These tests are compared experimentally to determine their probability of incorrectly detecting a difference when no difference exists (type I error). Two widely used statistical tests are shown to have high probability of type I error in certain situations and should never be used: a test for the difference of two proportions and a paired-differences t test based on taking several random train-test splits. A third test, a paired-differences t test based on 10-fold cross-validation, exhibits somewhat elevated probability of type I error. A fourth test, McNemar's test, is shown to have low type I error. The fifth test is a new test, 5 × 2 cv, based on five iterations of twofold cross-validation. Experiments show that this test also has acceptable type I error. The article also measures the power (ability to detect algorithm differences when they do exist) of these tests. The cross-validated t test is the most powerful. The 5 × 2 cv test is shown to be slightly more powerful than McNemar's test. The choice of the best test is determined by the computational cost of running the learning algorithm. For algorithms that can be executed only once, McNemar's test is the only test with acceptable type I error. For algorithms that can be executed 10 times, the 5 × 2 cv test is recommended, because it is slightly more powerful and because it directly measures variation due to the choice of training set.
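
The abstract names the two recommended tests but does not state their formulas. As a rough illustration only, the sketch below uses the forms in which these tests are commonly presented: a continuity-corrected McNemar statistic compared against a chi-square distribution with 1 degree of freedom, and a 5 × 2 cv t statistic compared against a t distribution with 5 degrees of freedom. The function names, argument layout, and example numbers are assumptions made for this sketch and are not taken from the article.

```python
import math


def mcnemar_statistic(n01, n10):
    """Continuity-corrected McNemar statistic (assumed standard form).

    n01: test examples misclassified by algorithm A but not by B.
    n10: test examples misclassified by algorithm B but not by A.
    """
    if n01 + n10 == 0:
        return 0.0  # the two classifiers never disagree
    return (abs(n01 - n10) - 1) ** 2 / (n01 + n10)


def mcnemar_p_value(n01, n10):
    """P-value under a chi-square distribution with 1 degree of freedom.

    For 1 degree of freedom, P(X > x) equals erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(mcnemar_statistic(n01, n10) / 2))


def five_by_two_cv_t(diffs):
    """5 x 2 cv t statistic (assumed standard form).

    diffs: five (p1, p2) pairs, one per replication of twofold
    cross-validation, where p1 and p2 are the differences in error
    rate (A minus B) on the two folds of that replication.
    """
    if len(diffs) != 5:
        raise ValueError("expected exactly five replications")
    variances = []
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2
        variances.append((p1 - mean) ** 2 + (p2 - mean) ** 2)
    denom = math.sqrt(sum(variances) / 5)
    if denom == 0:
        raise ValueError("zero estimated variance; statistic undefined")
    # The numerator uses only the first fold of the first replication.
    return diffs[0][0] / denom


# Hypothetical numbers, for illustration only.
p = mcnemar_p_value(10, 25)
t = five_by_two_cv_t(
    [(0.02, 0.04), (0.03, 0.01), (0.02, 0.02), (0.05, 0.03), (0.01, 0.03)]
)
# Reject "no difference" at the 0.05 level if p < 0.05 (McNemar)
# or |t| > 2.571 (two-sided critical value for 5 degrees of freedom).
print(p < 0.05, abs(t) > 2.571)
```

Under these assumptions, McNemar's test needs only one training run per algorithm, while the 5 × 2 cv statistic needs ten, which matches the cost trade-off described in the abstract.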
