The British Journal for the Philosophy of Science Advance Access originally published online on April 11, 2006
The British Journal for the Philosophy of Science 2006 57(2):323-357; doi:10.1093/bjps/axl003
© The Author (2006). Published by Oxford University Press on behalf of British Society for the Philosophy of Science. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Severe Testing as a Basic Concept in a Neyman–Pearson Philosophy of Induction

Deborah G. Mayo and Aris Spanos

Virginia Tech, Department of Philosophy, Blacksburg, VA 24061, USA mayod{at}vt.edu
Virginia Tech, Department of Economics, Blacksburg, VA 24061, USA aris{at}vt.edu

Despite the widespread use of key concepts of the Neyman–Pearson (N–P) statistical paradigm—type I and II errors, significance levels, power, confidence levels—they have been the subject of philosophical controversy and debate for over 60 years. Both current and long-standing problems of N–P tests stem from unclarity and confusion, even among N–P adherents, as to how a test's (pre-data) error probabilities are to be used for (post-data) inductive inference as opposed to inductive behavior. We argue that the relevance of error probabilities is to ensure that only statistical hypotheses that have passed severe or probative tests are inferred from the data. The severity criterion supplies a meta-statistical principle for evaluating proposed statistical inferences, avoiding classic fallacies from tests that are overly sensitive, as well as those not sensitive enough to particular errors and discrepancies.

  1. Introduction and overview
    1.1 Behavioristic and inferential rationales for Neyman–Pearson (N–P) tests
    1.2 Severity rationale: induction as severe testing
    1.3 Severity as a meta-statistical concept: three required restrictions on the N–P paradigm

  2. Error statistical tests from the severity perspective
    2.1 N–P test T(α): type I, II error probabilities and power
    2.2 Specifying test T(α) using p-values

  3. Neyman's post-data use of power
    3.1 Neyman: does failure to reject H warrant confirming H?

  4. Severe testing as a basic concept for an adequate post-data inference
    4.1 The severity interpretation of acceptance (SIA) for test T(α)
    4.2 The fallacy of acceptance (i.e., an insignificant difference): Ms Rosy
    4.3 Severity and power

  5. Fallacy of rejection: statistical vs. substantive significance
    5.1 Taking a rejection of H0 as evidence for a substantive claim or theory
    5.2 A statistically significant difference from H0 may fail to indicate a substantively important magnitude
    5.3 Principle for the severity interpretation of a rejection (SIR)
    5.4 Comparing significant results with different sample sizes in T(α): large n problem
    5.5 General testing rules for T(α), using the severe testing concept

  6. The severe testing concept and confidence intervals
    6.1 Dualities between one- and two-sided intervals and tests
    6.2 Avoiding shortcomings of confidence intervals

  7. Beyond the N–P paradigm: pure significance, and misspecification tests
  8. Concluding comments: have we shown severity to be a basic concept in a N–P philosophy of induction?






