The British Journal for the Philosophy of Science Advance Access originally published online on April 11, 2006
The British Journal for the Philosophy of Science 2006 57(2):323-357; doi:10.1093/bjps/axl003
Severe Testing as a Basic Concept in a Neyman–Pearson Philosophy of Induction
Virginia Tech, Department of Philosophy, Blacksburg, VA 24061, USA mayod{at}vt.edu
Virginia Tech, Department of Economics, Blacksburg, VA 24061, USA aris{at}vt.edu
Despite the widespread use of key concepts of the Neyman–Pearson (NP) statistical paradigm (type I and II errors, significance levels, power, confidence levels), they have been the subject of philosophical controversy and debate for over 60 years. Both current and long-standing problems of NP tests stem from unclarity and confusion, even among NP adherents, as to how a test's (pre-data) error probabilities are to be used for (post-data) inductive inference as opposed to inductive behavior. We argue that the relevance of error probabilities is to ensure that only statistical hypotheses that have passed severe or probative tests are inferred from the data. The severity criterion supplies a meta-statistical principle for evaluating proposed statistical inferences, avoiding classic fallacies from tests that are overly sensitive, as well as those not sensitive enough to particular errors and discrepancies.
- Introduction and overview
- 1.1 Behavioristic and inferential rationales for Neyman–Pearson (NP) tests
- 1.2 Severity rationale: induction as severe testing
- 1.3 Severity as a meta-statistical concept: three required restrictions on the NP paradigm
- Error statistical tests from the severity perspective
- 2.1 NP test T(α): type I, II error probabilities and power
- 2.2 Specifying test T(α) using p-values
- Neyman's post-data use of power
- 3.1 Neyman: does failure to reject H warrant confirming H?
- Severe testing as a basic concept for an adequate post-data inference
- 4.1 The severity interpretation of acceptance (SIA) for test T(α)
- 4.2 The fallacy of acceptance (i.e., an insignificant difference): Ms Rosy
- 4.3 Severity and power
- Fallacy of rejection: statistical vs. substantive significance
- 5.1 Taking a rejection of H0 as evidence for a substantive claim or theory
- 5.2 A statistically significant difference from H0 may fail to indicate a substantively important magnitude
- 5.3 Principle for the severity interpretation of a rejection (SIR)
- 5.4 Comparing significant results with different sample sizes in T(α): large n problem
- 5.5 General testing rules for T(α), using the severe testing concept
- The severe testing concept and confidence intervals
- 6.1 Dualities between one and two-sided intervals and tests
- 6.2 Avoiding shortcomings of confidence intervals
- Beyond the NP paradigm: pure significance, and misspecification tests
- Concluding comments: have we shown severity to be a basic concept in a NP philosophy of induction?