The British Journal for the Philosophy of Science Advance Access originally published online on March 2, 2009
The British Journal for the Philosophy of Science 2009 60(2):345-375; doi:10.1093/bjps/axp008

© The Author (2009). Published by Oxford University Press on behalf of British Society for the Philosophy of Science. All rights reserved. For permissions, please email: journals.permissions@oxfordjournals.org

Explanationist Aid for the Theory of Inductive Logic

Michael Huemer

Department of Philosophy, University of Colorado, Boulder, CO 80309-0232, USA, dqvgudj02{at}sneakemail.com


    Abstract
 
A central problem facing a probabilistic approach to the problem of induction is the difficulty of sufficiently constraining prior probabilities so as to yield the conclusion that induction is cogent. The Principle of Indifference, according to which alternatives are equiprobable when one has no grounds for preferring one over another, represents one way of addressing this problem; however, the Principle faces the well-known problem that multiple interpretations of it are possible, leading to incompatible conclusions. I propose a partial solution to the latter problem, drawing on the notion of explanatory priority. The resulting synthesis of Bayesian and inference-to-best-explanation approaches affords a principled defense of prior probability distributions that support induction.

  1. A Probabilistic Formulation of the Problem of Induction
  2. A Problem with Objective Bayesianism
    2.1 Intuitive motivation for the Principle of Indifference
    2.2 The inconsistency objection
    2.3 An effort to contain the problem

  3. Explanationist Relief for Objective Bayesianism
    3.1 Explanation and explanatory priority
    3.2 Explanatory priority and the assignment of priors
    3.3 In defense of Laplace
    3.4 The metaphysics of the explanationist defense: causation and laws
    3.5 Inference to the best explanation?

  4. Problems and objections
    4.1 Unknown explanatory possibilities
    4.2 Empirical reasoning about explanatory priority
    4.3 The probability of deterministic laws
    4.4 Changing chances
    4.5 Scruples concerning a priori probability


    1. A Probabilistic Formulation of the Problem of Induction
 
The problem of induction is the problem of explaining why it often makes sense to accept conclusions that are supported only by inductive arguments. I take an inductive argument to be a species of non-demonstrative argument in which what is known to be true of a sample from some population is extended to other members of the population not included in the sample. Sometimes induction is represented as proceeding according to the following pattern:

All observed A's have been B.

Therefore (probably), all A's are B.

Sometimes, instead, induction is represented as following this pattern:

All observed A's have been B.

Therefore (probably), the next A to be observed will be B.

Hereafter, I shall focus mainly on the second sort of inductive inference, partly because it seems more likely that the second sort of induction can be justified than that the first can, though the justification of the second sort of induction is nevertheless nontrivial and philosophically interesting.

The problem of induction is a problem largely because of the perceived force of inductive skepticism, the view that the premises of an inductive argument as such provide no epistemic reason for accepting the conclusion of that argument. While a number of influential philosophers have embraced it,¹ the view's counter-intuitiveness (entailing as it does that we presently have no evidence that the Earth revolves around the sun, and that there is no epistemic reason to think that placing my hand in a fire will be painful) seems sufficient reason for seeking a way of avoiding inductive skepticism. In any case, I shall assume hereafter that a non-skeptical resolution of the problem of induction is desirable.

This is not to say that we should aim at defending the rationality of every inductive inference. A plausible theory of induction may impose strictures on cogent inductive inferences that rule out many actual or possible inductions. Two candidate strictures that come to mind are that the sample that the inductive premises concern should be large, and that it should be sufficiently varied. Doubtless there are other such plausible conditions. But we shall be satisfied if we can defend the thesis that at least some inductive inferences are cogent. Hereafter, when I discuss inductive inferences, I shall have in mind those inductive inferences that are the best candidates for cogent inferences—that is, inductions in which the sample is large and varied; there are no special reasons for doubting the conclusion; the premises and conclusion use ordinary predicates, rather than ‘grue-like’ predicates; and so on. This assumption is fair, since inductive skeptics deny that induction can be justified even in the seemingly most favorable of circumstances.

Clearly, the conclusion of an inductive argument is not certain to be true given that the premises are. Once we acknowledge this, it is natural to turn to a probabilistic formulation of the issue: those who accept the cogency of some forms of induction (hereafter, ‘inductivists’) are naturally taken as claiming that the conclusion of an inductive argument is supported by its premises in the sense that the premises render the conclusion more probable. Inductive skeptics are naturally read as claiming that the conclusion of an inductive argument is not rendered more probable by its premises. This formulation of the issue requires an epistemic or logical interpretation of probability, rather than a physical interpretation. Hereinafter, I shall assume that such an interpretation is acceptable, addressing myself to the question of to what extent the notion of epistemic probability affords a solution to the problem of induction.

It will be convenient hereafter to discuss inductivism and inductive skepticism in terms of a simple, admittedly artificial example. If we can come to an understanding of this case, we will have a better chance of subsequently generalizing our results:

Example 1 A physical process X has been discovered, the laws governing which are as yet unknown, except that the process must produce exactly one of two outcomes, A or B, on every occasion. No relevant further information is known about X, A, or B. We plan an experiment in which X will occur n times, and we will observe on each occasion whether A or B results.

Let $A_i$ = [Outcome A occurs on the $i$th trial],

$U_i$ = [Outcome A occurs on all of the first $i$ trials].

In this case, we wish to consider whether (at least for large values of $i$) $U_i$ provides probabilistic evidence for $A_{i+1}$. The following three positions are possible:

Inductivism: $P(A_{i+1} \mid U_i) > P(A_{i+1})$,
Skepticism: $P(A_{i+1} \mid U_i) = P(A_{i+1})$,
Counter-inductivism: $P(A_{i+1} \mid U_i) < P(A_{i+1})$.

Inductivism, inductive skepticism, and counter-inductivism, so defined, are each probabilistically coherent views. Perhaps the easiest way to see the coherence of inductive skepticism (pace David Stove²) is to consider a model of inductive skepticism, that is, a possible case in which the correct probability distribution would in fact be the one employed by the inductive skeptic: Suppose a fair coin is to be flipped a large number of times. Suppose that the first 50 flips result in heads up. Given this, what is the objective chance that the coin will land heads up on the next flip? Answer: 1/2, the same as the prior probability of the coin landing heads up on any given trial. What happens during the first 50 flips is independent of what happens on any subsequent flip, since the coin is fair and has no memory of what happened previously. Since this is a possible distribution for the objective chances, it is also a coherent distribution for epistemic or subjective probabilities, since the latter are governed by the same axioms. The inductive skeptic's view is that distinct observations are analogous to distinct flips of a coin known to be fair: they are entirely probabilistically independent of each other.

The counter-inductivist distribution, on the other hand, is similar to the probability distribution appropriate to a game of Russian Roulette: the more times you have pulled the trigger (without spinning the barrel again) and not been shot, the more likely it is that you will be shot the next time. Despite the occasional human tendency to commit the gambler's fallacy, there may be no one who has advanced a general counter-inductivist probability distribution (though Popper and Miller [1983] come close).
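The Russian Roulette comparison can be made concrete with a small sketch. It assumes a six-chamber revolver holding one bullet, a uniformly random starting position, and no re-spinning between pulls; the chamber count and the function name `p_shot_next` are illustrative assumptions rather than details given in the text.

```python
from fractions import Fraction


def p_shot_next(pulls_survived, chambers=6):
    """Chance that the next pull fires, given that `pulls_survived` pulls were empty.

    Assumes one bullet, a uniformly random starting chamber, and no
    re-spinning (all hypothetical modelling choices for this illustration).
    """
    remaining = chambers - pulls_survived
    if remaining <= 0:
        raise ValueError("the bullet must already have fired")
    # The bullet is equally likely to sit in any chamber not yet tried.
    return Fraction(1, remaining)


# Each survived pull raises the chance of being shot on the next one.
for k in range(6):
    print(k, p_shot_next(k))
```

On these assumptions, a run of one kind of outcome (no shot) makes the same outcome less probable on the next trial, which is the counter-inductivist pattern.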

Our task is to explain why the admittedly coherent probability distribution of the skeptic or the counter-inductivist is rationally inferior to some inductivist probability distribution.


    2. A Problem with Objective Bayesianism
 
2.1 Intuitive motivation for the Principle of Indifference
Objective Bayesians recognize constraints on initial probability distributions that go beyond the Kolmogorov axioms.³ Ideally, we might hope that such constraints will uniquely determine the prior probability of every proposition. But even much more modest constraints could suffice to avoid inductive skepticism: as long as we can constrain priors sufficiently that, for example, the drawing of a series of black balls from an urn supports the hypothesis that the next ball drawn will be black, we will have made significant progress on the problem of induction.

The Principle of Indifference, according to which the probabilities of two alternatives are equal whenever one lacks reason for favoring one over the other, is perhaps the most popular way of constraining prior probabilities. This principle can be motivated by an epistemic or logical interpretation of probability. Suppose that the probability of a proposition (for a given person) is understood as a measure of how much reason one has to believe that proposition, or the degree to which that proposition is supported by one's evidence. Then the Principle of Indifference amounts to the claim that, if one has no reason for preferring one alternative over another, then one has as much reason, or evidence, for the one proposition as for the other. This principle seems close to an analytic truth, though it presupposes the substantive assumption that how much reason one has to believe a proposition may be treated as a quantity. It seems that, if one does not have an equal amount of reason to believe A as to believe B, then one must have more reason to believe one than to believe the other. But this is incompatible with one's having no reason to prefer either alternative. Therefore, if one has no reason to prefer either A or B, then they must have equal epistemic probabilities.

2.2 The inconsistency objection
Consider one illustration of the common charge that the Principle of Indifference is inconsistent:

Example 2 Sue has taken a trip of 100 miles in her car. The trip took between 1 and 2 hours, and thus, Sue's average speed was between 50 and 100 miles per hour. Given only this information, what is the probability that the trip took between 1 hour and 1 1/2 hours?⁴

Here is one solution. Using a generalization of the Principle of Indifference, we assign a flat probability density over the range of possible durations of the trip, from 1 hour to 2 hours. Since the interval from 1 hour to 1 1/2 hours is one-half of the total range of possibilities, the probability of the true time falling in that interval is 1/2.

Here is another solution. Again using a generalization of the Principle of Indifference, we assign a flat probability density over the range of possible average velocities with which Sue may have traveled. Now, the time of Sue's journey was between 1 hour and 1 1/2 hours if and only if her velocity was between 66 2/3 mph (= 100 miles/1 1/2 hours) and 100 mph (= 100 miles/1 hour). Since the interval from 66 2/3 to 100 mph is two-thirds of the total range of possible velocities, the probability of the true velocity falling in that interval is 2/3.

These two answers are inconsistent, yet both seem to be arrived at by equally natural applications of the Principle of Indifference. At worst, we might conclude that the Principle of Indifference is inconsistent (Van Fraassen [1989], p. 303; Howson and Urbach [1989], pp. 45–8). At best, we might say that the principle stands in need of clarification: When we wish to deploy the Principle of Indifference, under what way of partitioning the possibilities ought we to assign each possibility an equal prior probability? For cases with a continuous range of alternatives, with respect to what variable ought we to assign a uniform prior probability density?
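The two conflicting answers in Example 2 can be checked with exact arithmetic. The sketch below is only an illustration: the helper `uniform_prob` and the variable names are assumptions introduced here, and it simply measures the same event under a flat density over duration and under a flat density over average speed.

```python
from fractions import Fraction

DISTANCE = Fraction(100)                            # miles
T_MIN, T_MAX = Fraction(1), Fraction(2)             # hours
V_MIN, V_MAX = DISTANCE / T_MAX, DISTANCE / T_MIN   # 50 and 100 mph


def uniform_prob(lo, hi, a, b):
    """Probability of the interval [a, b] under a flat density on [lo, hi]."""
    return (min(b, hi) - max(a, lo)) / (hi - lo)


t_cut = Fraction(3, 2)  # 1 1/2 hours

# Flat density over durations: the event is 1 <= t <= 1.5.
p_flat_time = uniform_prob(T_MIN, T_MAX, T_MIN, t_cut)

# Flat density over speeds: the same event is 100/1.5 <= v <= 100.
p_flat_speed = uniform_prob(V_MIN, V_MAX, DISTANCE / t_cut, V_MAX)

print(p_flat_time, p_flat_speed)  # 1/2 2/3
```

The event is the same in both computations; only the variable over which the density is taken to be flat differs.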

2.3 An effort to contain the problem
We might seek to limit the impact of the inconsistency objection by arguing that at least in some cases, we have clear intuitions about which of a set of partitions of the space of possibilities is relevant. In those cases, we may deploy the Principle of Indifference. In cases like the above, in which we have no clear intuitions discriminating among some possible ways of characterizing the possibilities, perhaps we are unable to determine which of a set of numbers is the correct probability for a given proposition, or perhaps there is no uniquely correct probability.

Suppose, for example, that I inform you that I have a playing card in my pocket. Suppose you know nothing about me, so that you have no knowledge of what sort of playing cards I might prefer to keep in my pocket, and I refuse to tell you by what physical process the playing card in my pocket was selected. Given this, what is the probability that the card in my pocket is a four of clubs? Here is one solution: the four of clubs is one of 52 possible playing cards. Applying the Principle of Indifference, each of the possible cards has an equal probability of being in my pocket. So the probability that the card in my pocket is the four of clubs is 1/52.

Now, in the style of the Inconsistency Objection, here is another solution. The card in my pocket is either a three, or a four, or something else. Applying the Principle of Indifference, each of these alternatives has an equal probability. So the probability of the card's being a four is 1/3. Now, if it is a four, then it is either a club or not a club. Applying the Principle of Indifference again, each of these alternatives receives 1/2 probability. So the probability of the card's being the four of clubs is (1/3)(1/2) = 1/6.

This version of the inconsistency objection is intuitively uncompelling. The reason is that, though we may lack a general account of how possibilities should be partitioned when applying the Principle of Indifference, the partitioning required by the ‘1/6’ solution to the problem does not strike us as equally natural as the partitioning required by the ‘1/52’ solution. Rather, the partitioning used for the ‘1/52’ solution is clearly the more natural. In contrast, Sue's journey (Example 2) presents an intuitively compelling puzzle because the speed of Sue's car and the time of her journey seem equally natural variables in terms of which to characterize the possibilities. As a result, we might say that in the case of Sue's journey, the answer to the problem is either indeterminate or unknown, but that nevertheless, in the case of the card in my pocket, the problem has a clear, unique answer of 1/52.

Though I have some sympathy with this line of thought, it offers us little help with the problem of induction. For skeptics can defend their position with an application of the Principle of Indifference that seems intuitively natural, or at least not clearly artificial as in the ‘1/6’ solution to the playing card problem. This application of the Principle of Indifference is to assign an equal initial probability to each possible sequence of observations, or to each possible way of distributing properties to individuals. In Example 1, this amounts to assigning to each possible sequence of A and B results the same probability. Since there are $2^i$ possible ways of distributing A and B among $i$ members of a sequence, the probability of each possible sequence is $(1/2)^i$. This is not an intuitively strained or artificial way of interpreting the Principle of Indifference. But of course, it amounts to the ‘fair coin’ probability distribution: the outcome of any iteration of process X will be probabilistically independent of the outcomes of any other iterations. $P(A_{i+1}) = 1/2$, since $A_{i+1}$ is one of the two possible outcomes of the $(i+1)$th iteration of X; $P(U_i) = (1/2)^i$, since $U_i$ describes exactly one of the $2^i$ possible sequences of the first $i$ outcomes; and $P(U_i \mathbin{\&} A_{i+1}) = (1/2)^{i+1}$, since $(U_i \mathbin{\&} A_{i+1})$ describes exactly one of the $2^{i+1}$ possible sequences of the first $i + 1$ outcomes. Applying the axiom of conditional probability, we obtain $P(A_{i+1} \mid U_i) = P(U_i \mathbin{\&} A_{i+1})/P(U_i) = (1/2)^{i+1}/(1/2)^i = 1/2$, the same as the initial probability of $A_{i+1}$. Hence, inductive skepticism seems to be vindicated.
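The skeptic's calculation can be verified by brute-force enumeration. The following sketch gives every A/B sequence of length i + 1 the same weight and then conditions on the first i outcomes all being A; the function name `skeptic_conditional` is an assumption made for this illustration.

```python
from fractions import Fraction
from itertools import product


def skeptic_conditional(i):
    """P(A_{i+1} | U_i) when all A/B sequences of length i + 1 are equiprobable."""
    seqs = list(product("AB", repeat=i + 1))
    weight = Fraction(1, len(seqs))  # (1/2)^(i+1) for each sequence
    # U_i: the first i outcomes are all A (the last outcome is unconstrained).
    p_u = sum(weight for s in seqs if all(x == "A" for x in s[:i]))
    # U_i & A_{i+1}: all i + 1 outcomes are A.
    p_u_and_next = sum(weight for s in seqs if all(x == "A" for x in s))
    return p_u_and_next / p_u


# However long the run of As, the next A stays at probability 1/2.
for i in range(1, 8):
    print(i, skeptic_conditional(i))
```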

Another seemingly natural interpretation of the Principle of Indifference results in an inductivist probability distribution which we may call the Laplacean distribution. This interpretation assigns an equal initial probability to each possible proportion of As in the sequence. The proportion of As in a sequence of i instances of process X is either 0/i or 1/i or ... or i/i. So each of these possibilities has an initial probability of 1/(i + 1). This distribution favors induction: after i cases of A have been observed, with no Bs, the probability of the next observed case being A as well is given by


\[
P(A_{i+1} \mid U_i) = \frac{i+1}{i+2}.
\]

This is the Rule of Succession invoked by Bayes ([1763], scholium to Proposition 9), Laplace ([1995], pp. 10–1), and others to defend induction.⁵
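The Laplacean distribution can be checked the same way. The sketch below makes one assumption that the text leaves implicit: the probability 1/(n + 1) given to each proportion of As is shared equally among the sequences that have that proportion. The names `laplace_weight` and `laplace_conditional` are hypothetical. Under that assumption the enumeration gives the Rule of Succession value (i + 1)/(i + 2), whatever the total number of trials n.

```python
from fractions import Fraction
from itertools import product
from math import comb


def laplace_weight(seq):
    """Prior probability of one A/B sequence of length n.

    Each proportion k/n receives 1/(n + 1); that mass is split equally
    among the C(n, k) sequences with k As (an assumption of this sketch).
    """
    n, k = len(seq), seq.count("A")
    return Fraction(1, (n + 1) * comb(n, k))


def laplace_conditional(i, n):
    """P(A_{i+1} | U_i) for an experiment of n >= i + 1 trials."""
    seqs = list(product("AB", repeat=n))
    p_u = sum(laplace_weight(s) for s in seqs if s[:i] == ("A",) * i)
    p_u_next = sum(
        laplace_weight(s) for s in seqs if s[: i + 1] == ("A",) * (i + 1)
    )
    return p_u_next / p_u


for i in range(1, 6):
    print(i, laplace_conditional(i, n=8), Fraction(i + 1, i + 2))
```

Unlike the equal-weight-per-sequence prior, this prior makes a run of As raise the probability of a further A.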

If we are to wield the Principle of Indifference against inductive skepticism, then, we must supply a rationale for preferring an inductivist prior probability distribution, such as Laplace's distribution, over the inductive skeptic's distribution. It is here that objective Bayesians are most in need of aid. And it is here that explanationism enters our story.


    3. Explanationist Relief for Objective Bayesianism
 
3.1 Explanation and explanatory priority
The explanationist holds that much of our non-demonstrative reasoning is to be understood in terms of inference to the best explanation (Harman [1965]; Foster [1982–3]; Niiniluoto [1999]; Lipton [2004]). Whether and how this approach comports with Bayesianism remains a matter of dispute. Bayes’ Theorem seems to provide at least partial support for the explanationist approach: in choosing between candidate explanations $h_1$ and $h_2$ for evidence $e$, one factor that seems relevant is the likelihood ratio $P(e \mid h_1)/P(e \mid h_2)$. The greater this is, the better $h_1$ is as an explanation of $e$, compared to $h_2$; other things being equal, the hypothesis that more strongly predicts the evidence is the better explanation. Bayesians will go along with this approach so far.
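The role of the likelihood ratio is easiest to see in the odds form of Bayes' Theorem, where posterior odds equal prior odds multiplied by the likelihood ratio. The numbers below are purely hypothetical and chosen only for illustration, as is the function name `posterior_odds`.

```python
from fractions import Fraction


def posterior_odds(prior_odds, p_e_given_h1, p_e_given_h2):
    """Odds of h1 against h2 after conditionalizing on evidence e."""
    likelihood_ratio = p_e_given_h1 / p_e_given_h2
    return prior_odds * likelihood_ratio


# Hypothetical inputs: equal priors, and h1 predicts e three times as strongly.
odds = posterior_odds(Fraction(1), Fraction(9, 10), Fraction(3, 10))
print(odds)  # 3
```

With the priors held equal, the hypothesis that more strongly predicts the evidence ends up favored by exactly the likelihood ratio.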

But there is more to explanation than likelihood ratios reveal. An explanation must do more than induce a higher probability for the explanandum than the explanandum's initial probability. For instance, typically P(e | a & e) > P(e), yet (a & e) does not count as an explanation of e. Importantly, the explanans must be in some sense prior to (or more basic than, or more fundamental than) the explanandum. (a & e) violates this criterion for explaining e. Henceforth, I shall refer to this crucial relation that an explanatory fact must bear to its explanandum as ‘explanatory priority’. Following are examples of some kinds of explanatory priority:

  1. Causal priority: If A (partly) causes B, then the occurrence of A (that is, the fact that A occurs) is ‘prior’ to that of B in the order of explanation, meaning that A's occurrence is a candidate to figure in an explanation of B's occurrence, whereas B's occurrence is not fit to serve in an explanation of A's.
  2. Temporal priority: If A is a fact about events or states that are temporally prior to (exist before) the events or states that B concerns, then A is explanatorily prior to B. (A may still, of course, fail to satisfy some other requirement for explaining B.)⁶ For these purposes, an eternal or timeless fact may also be treated as prior to facts about what happens at particular times.⁷
  3. The part–whole relation: The existence, arrangement, and intrinsic features of the parts of an object are explanatorily prior to the existence and features of the whole.⁸
  4. The in-virtue-of relation: If B holds in virtue of A's holding, then A is explanatorily prior to B. The determinable-determinate relation may be a species of the in-virtue-of relation: if d is a determinate of D, then an object that has d will also have D in virtue of its having d. So a thing's having d will be explanatorily prior to its having D.⁹
  5. Supervenience: At least some forms of supervenience are also instances of explanatory priority. For instance, the object on which I am seated is a chair in virtue of its parts having certain microphysical properties and relations, the properties and relations on which its chairhood supervenes. So the instantiation of those properties and relations is explanatorily prior to this object's being a chair.

I treat explanatory priority as a relation between facts or propositions, since I take facts or propositions to be the sort of things that explain and are explained.¹⁰ Suitable rephrasing, however, could accommodate the view that events may explain or be explained. I shall not attempt to fully analyze the concept of explanation. I assume, however, that A's being a good explanation of B has at least these two important necessary conditions: (i) that A should be explanatorily prior to B, and (ii) that B should be more probable given A than otherwise ($P(B \mid A) > P(B)$), understanding probability in a logical or epistemic sense.¹¹ In some cases, such as the case of causal explanations, it may seem as though the relevant sort of probability in condition (ii) is physical probability. However, provided that the explanans A includes the relevant causal laws or other facts that determine the physical probabilities, the relation between the logical probabilities $P_l(B \mid A)$ and $P_l(B)$ will mirror the relation between the physical probabilities $P_p(B \mid A)$ and $P_p(B)$. And it is reasonable to hold that A must include such causal laws to be a candidate for the full explanation of B.

3.2 Explanatory priority and the assignment of priors
In the literature on Bayesianism and inference to the best explanation, some have suggested that explanationism should be incorporated into a Bayesian framework through prior probabilities, roughly by one's assigning higher probabilities to propositions that are felt to be explanatory (Niiniluoto [1999], p. S448; Okasha [2000], pp. 702–4; Lipton [2004], pp. 115–6). Van Fraassen ([1989], pp. 138, 160–9) has suggested, instead, that the explanationist would give a bonus to the posterior probability of a hypothesis that is judged as the best explanation of the evidence, thereby violating Bayesian conditionalization. Each of these proposals seems artificial. Both have the flavor of ad hoc modifications to Bayesianism designed to humor explanationists.

Explanatory priority may affect the assignment of prior probabilities in a different way: it may feature in a partial solution to the problem of the interpretation of the Principle of Indifference, so that, rather than humoring explanationists, Bayesians may receive crucial aid from explanationists on a central problem for their view. The way in which considerations of explanatory priority may modify (or clarify) the Principle of Indifference is this: in applying the Principle of Indifference, one ought to assign equal probabilities (or a uniform probability density) at the most explanatorily basic level. I call this the Explanatory Priority Proviso to the Principle of Indifference. Suppose, that is, that we have two partitions of the space of possibilities, one that divides the possibilities into mutually exclusive, jointly exhaustive alternatives $h_1, \ldots, h_n, \ldots$, and another that divides the possibilities into mutually exclusive, jointly exhaustive alternatives $j_1, \ldots, j_n, \ldots$.¹² Suppose further that each of the $h_i$ is explanatorily prior to each of the $j_i$. Then the former partition should be preferred to the latter for purposes of applying the Principle of Indifference. For the case of continuous ranges of possibilities, suppose we have two variables, $v_1$ and $v_2$, each of whose values exhaust the possibilities. But suppose that $v_1$'s having the value that it does is explanatorily prior to $v_2$'s having its value. Then $v_1$ should be preferred to $v_2$ for purposes of applying the Principle of Indifference.¹³

Let us begin with some examples designed both to clarify how this interpretation may be applied and to exhibit its plausibility.

Example 3 You are informed that a certain lamp is either on or off, and also that a single marble was recently drawn from a bag containing only red, blue, and/or green marbles. If a red marble was drawn from the bag, then the person drawing the marble made sure the lamp would be on (turning it on if necessary). If either a blue or a green marble was drawn, then he made sure the lamp would be off. Given just this information, what is the probability that the lamp is on? What is the probability that a red marble was drawn?

Solution 1: The lamp is either on or off. Applying the Principle of Indifference, each of these alternatives has probability 1/2. The lamp is on if and only if a red marble was drawn from the bag. So the probability that a red marble was drawn is also 1/2, while the probability of a blue marble is 1/4, and the probability of a green marble is 1/4.

Solution 2: The marble drawn from the bag was red, blue, or green. Applying the Principle of Indifference, each of these alternatives has probability 1/3. The lamp is on if and only if a red marble was drawn from the bag. So the probability that the lamp is on is 1/3.

Solution 2 is the intuitively correct one. This is explained by the Explanatory Priority Proviso. The drawing of the marble is causally and temporally prior to the lamp's current state, so the possible results of the marble-drawing are explanatorily prior to the possible states of the lamp. Therefore, the Principle of Indifference is to be applied to the possible marble-drawing results. Solution 1 is incorrect, because the lamp's state is determined by the prior results of the drawing; we must therefore first assign probabilities to the possible results of the drawing, and determine the probability of the lamp's being on from that probability distribution.
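Solution 2 amounts to placing the uniform prior on the explanatorily prior partition (the marble colours) and deriving the lamp's probabilities from it. A minimal sketch follows; the helper `pushforward` and the other names are assumptions introduced for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(prior, effect_of):
    """Derive probabilities for effects from a prior over their causes."""
    out = defaultdict(Fraction)  # Fraction() is zero
    for cause, p in prior.items():
        out[effect_of(cause)] += p
    return dict(out)


# Indifference is applied at the explanatorily basic level: the draw.
marbles = ["red", "blue", "green"]
marble_prior = {m: Fraction(1, len(marbles)) for m in marbles}

# The lamp's state is determined by the result of the draw.
lamp = pushforward(marble_prior, lambda m: "on" if m == "red" else "off")
print(lamp["on"], lamp["off"])  # 1/3 2/3
```

Solution 1 runs in the opposite direction, fixing the lamp's probabilities first and forcing the marble probabilities to fit them.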

Example 4 You are informed that a conscious brain has recently been artificially created. (This supposition is meant to neutralize your background knowledge of the sorts of states that brains are typically in.) The brain has been put in one of the 4 million possible states recognized by modern brain science. Assume that mental states supervene on physical states, and that 100,000 of the 4 million possible brain states realize overall painful mental states, 50,000 realize pleasurable mental states, and the remainder realize hedonically neutral mental states (or states that are between pleasure and pain). What is the probability, on this information, that the brain is in pain?

Solution 1: The brain is either in a painful state, in a pleasurable state, or in a hedonically neutral state. Applying the Principle of Indifference, each of these alternatives has a probability of 1/3.

Solution 2: Each of the possible brain states is equally probable. Since 100,000 of those states realize pain, the probability that the brain is in pain is 100,000/4,000,000 = 0.025.

Again, Solution 1 is intuitively wrong. One should not assign 1/3 probability to the brain's being in pain, because the brain's hedonic state is determined by its (explanatorily prior) physical state, and only 0.025 of the possible physical states give rise to pain.
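The figures in Example 4 can be tabulated directly. The sketch below spreads a uniform prior over the 4 million physical states and reads off the probability of each hedonic category from the counts stated in the example; the variable names are illustrative assumptions.

```python
from fractions import Fraction

TOTAL_STATES = 4_000_000
state_counts = {"pain": 100_000, "pleasure": 50_000}
# The remainder of the physical states realize hedonically neutral states.
state_counts["neutral"] = TOTAL_STATES - sum(state_counts.values())

# Uniform prior over physical states, aggregated by the mental state realized.
hedonic_prior = {k: Fraction(v, TOTAL_STATES) for k, v in state_counts.items()}

for kind, p in hedonic_prior.items():
    print(kind, p, float(p))
```

Only the pain figure (0.025) is stated in the text; the other two values follow from the same counts.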

Now that we have a sense of the plausibility of the Explanatory Priority Proviso, let us apply it to the problematic case discussed in Section 2.2:

Example 2 Sue has traveled a distance of 100 miles in between 1 hour and 2 hours. Her average velocity was between 50 and 100 mph. What is the probability that her trip lasted between 1 and 1 1/2 hours and thus that her average velocity was between 66 2/3 and 100 mph?

Solution: The duration of Sue's trip is causally explained by the speed at which she drove, not vice versa. Imagine someone, upon learning that Sue took two hours to arrive at her destination, asking, ‘Why did it take her so long?’ He might be told, ‘Because she was only going 50 miles per hour.’ This sort of explanation would be apt whether the question was ‘Why did it take so long?’, ‘Why did it take so little time?’, or simply ‘Why did it take exactly two hours?’—in any of these cases, one could cite the speed at which Sue drove, given the distance she had to travel. In contrast, imagine someone, upon learning that Sue was driving at 50 miles per hour, asking, ‘Why was she moving so slowly?’ He could not be appropriately answered, ‘Because she took two hours to get there.’ Relatedly, if Sue had wanted to make her journey shorter, she could have brought this about by driving faster. But she could not have made herself drive faster by arriving sooner. Thus, it seems that Sue's velocity is causally prior to the duration of her trip. Therefore, we assign a uniform probability density over the possible average velocities of Sue's trip. Since the measure of the interval [66 2/3, 100] is 2/3 of the measure of [50, 100], the probability of Sue's velocity falling in the former interval is 2/3.
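A uniform density over velocity also fixes a (non-uniform) density over duration, and the two descriptions then agree on the probability of the event. The derivation below is a sketch in notation introduced here for illustration: $V$ is the average velocity in mph, $T = 100/V$ is the duration in hours, and $f_V$, $f_T$ are their densities.

```latex
\[
f_V(v) = \frac{1}{50}, \qquad 50 \le v \le 100, \qquad T = \frac{100}{V}
\]
\[
f_T(t) = f_V\!\left(\frac{100}{t}\right)\left|\frac{d}{dt}\,\frac{100}{t}\right|
       = \frac{1}{50}\cdot\frac{100}{t^{2}}
       = \frac{2}{t^{2}}, \qquad 1 \le t \le 2
\]
\[
P\!\left(1 \le T \le \tfrac{3}{2}\right)
  = \int_{1}^{3/2} \frac{2}{t^{2}}\,dt
  = \left[-\frac{2}{t}\right]_{1}^{3/2}
  = 2 - \frac{4}{3}
  = \frac{2}{3}
\]
```

On this reading, shorter durations receive more density than longer ones, which is why the first half of the time interval carries probability 2/3 rather than 1/2.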

Note that here we do not apply a uniform probability density to the possible durations of Sue's trip on the grounds that ‘velocity’ is defined in terms of distance and duration. The sort of priority invoked in the Explanatory Priority Proviso is metaphysical rather than conceptual. What matters is that Sue's velocity is metaphysically prior to the duration of her trip, because the velocity causally determines the time it will take to go 100 miles, not the ostensible fact that the concept of velocity is dependent on the concept of duration. One reason for preferring a reliance on metaphysical priority rather than conceptual priority is that conceptual priority may differ between different subjects. Suppose one individual formed the concept of duration first, and then formed the concept of velocity by defining velocity as distance traveled per unit time, while another individual formed the concept of velocity (or rate of change) first, and only later formed the concept of duration.¹⁴ It seems that these beings might nonetheless have the same information relevant to assigning probabilities in Example 2, and thus that our theory should not require them to endorse different answers to the problem.

One might doubt that conceptual priority relations can differ between subjects in this way. However, another argument against relying on conceptual priority is that doing so may result in intuitively wrong answers in cases like Example 4. Suppose that the mental concepts used in Example 4 (‘pain’, ‘pleasure’) are psychologically basic, since they are formed on the basis of direct introspection. But suppose that the concepts used for identifying the four million different brain states are largely theoretical and require complex definitions. Intuitively, this makes no difference to the correct treatment of Example 4.

3.3 In defense of Laplace
The Explanatory Priority Proviso does not resolve every puzzle regarding the interpretation of the Principle of Indifference. In some cases, we may have two ways of characterizing the possibilities, neither of which is intuitively more natural than the other, and neither of which classifies the alternatives in terms of explanatorily prior propositions. In such cases, perhaps the relevant probabilities are indeterminate, or perhaps some other principle is required to assess the relevant probabilities.

Nevertheless, the Explanatory Priority Proviso makes important progress toward solving the problem of induction, as it helps to resolve the dispute between the skeptical interpretation of the Principle of Indifference and the inductivist interpretation discussed in Section 2.3 above. Return to our original example:

Example 1. Process X is to be repeated $n$ times, producing either A or B on each occasion. Where $A_i$ is the proposition that outcome A occurs on the $i$th trial and $U_i$ is the proposition that A occurs on all of the first $i$ trials, what is $P(A_{i+1} \mid U_i)$?

Solution: The physical process in question has some physical probability, or objective chance, of producing A on any given occasion. This objective chance is explanatorily prior to the individual outcomes or sequences of outcomes. Therefore, we assign a uniform (logical) probability density over the possible values of this objective chance, rather than over the possible sequences of outcomes.15 Thus, we assign


\[ \rho(c) = 1, \quad 0 \le c \le 1 \tag{1} \]
where $c$ is the objective chance and $\rho(c)$ is the probability density function for $c$. To find $P(A_{i+1} \mid U_i)$, we invoke the axiom of conditional probability:


\[ P(A_{i+1} \mid U_i) = \frac{P(A_{i+1} \wedge U_i)}{P(U_i)} = \frac{P(U_{i+1})}{P(U_i)} \tag{2} \]
To determine the quantities on the right-hand side of Equation (2), we use the probability density given in Equation (1):


\[ P(U_i) = \int_0^1 P(U_i \mid C = c)\, \rho(c)\, dc \tag{3} \]
where ‘$C = c$’ denotes the proposition that the objective chance of outcome A is $c$. The probability that the first $i$ instances of the experiment will result in outcome A, given that the objective chance on each instance is $c$, is $c^i$ (invoking a version of Lewis’ [1986] Principal Principle). Thus, we have:


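\[ P(U_i) = \int_0^1 c^i \, dc = \frac{1}{i+1}, \qquad P(U_{i+1}) = \int_0^1 c^{i+1} \, dc = \frac{1}{i+2} \tag{4} \]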

Substituting Equations (4) into Equation (2), we arrive at the Rule of Succession:


\[ P(A_{i+1} \mid U_i) = \frac{1/(i+2)}{1/(i+1)} = \frac{i+1}{i+2} \tag{5} \]

Here, the Rule of Succession is justified, not by an arbitrary decision to privilege the classification of possibilities in terms of the possible proportions of A and B results over the classification in terms of the possible sequences of A and B results, but by the fact that the objective chances are explanatorily prior to the sequences, and thus that the Principle of Indifference must be applied at the level of objective chances. With the Rule of Succession, we have a reasonably strong form of inductivism: if we observe 98 As in a row, we attain a 99% probability that the next case will also be an A.
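
The derivation can be reproduced numerically. The sketch below is an illustration rather than part of the original argument: the function names are hypothetical, and the midpoint-rule integrator is included only as a sanity check that the integral of $c^i$ over [0, 1] equals $1/(i+1)$.

```python
from fractions import Fraction

def prob_all_A(i):
    """P(U_i) under a uniform density on the chance c: the integral of c**i over [0, 1]."""
    return Fraction(1, i + 1)

def prob_all_A_numeric(i, steps=10_000):
    """Midpoint-rule approximation of the same integral, as a sanity check."""
    width = 1.0 / steps
    return sum(((k + 0.5) * width) ** i for k in range(steps)) * width

def rule_of_succession(i):
    """P(A_{i+1} | U_i) = P(U_{i+1}) / P(U_i)."""
    return prob_all_A(i + 1) / prob_all_A(i)

print(rule_of_succession(98))  # 99/100
```

With 98 consecutive As, the function returns 99/100, matching the 99% figure in the text.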

3.4 The metaphysics of the explanationist defense: causation and laws
This defense of Laplace's probability distribution and the Rule of Succession is available only on certain metaphysical assumptions, which may explain why neither Hume nor Carnap followed this route. To employ the preceding defense, one must accept the existence of objective chances, and one must accept that objective chances are explanatorily prior to particular events. Under what metaphysical conditions would objective chances be explanatorily prior to particular events? Suppose that the objective chance of outcome A is determined by the laws of nature and general, standing background conditions. Suppose further that laws of nature are conceived as eternal or timeless facts that in some sense ‘govern’ what happens in the world. Then the laws will be explanatorily prior to particular events. The standing background conditions as well will typically be explanatorily prior to the particular events whose objective chances we are concerned with, due to their temporal and causal priority. This makes it plausible that the resulting objective chances are explanatorily prior to particular events.

Consider an alternative metaphysical view. Suppose that, as Hume ([1975], p. 76) would have it, causation is nothing but constant conjunction, so that whether a type of event A causes a type of event B is determined by whether, in general, events of kind A are followed by events of kind B. Laws of nature, let us suppose, are nothing but summaries of patterns to be found in the particular events.16 This view seems to entail a reversal of the explanatory priority relation that we normally take to obtain. On the common-sense view about laws of nature, one ought to say: Events of type A are generally followed by events of type B, because it is a law that A-events are followed by B-events. On the Humean view, one ought to say: it is a law that A-events are followed by B-events, because events of type A are generally followed by events of type B. I take it that the ‘because’ in each case signals (at least) an explanatory relation. On the Humean view, we first have facts about what particular events occur at what times and places. Facts about causation and laws then supervene on, and are nothing over and above, those particular facts. On this view, causal priority ought not to be taken as implying explanatory priority. To say that As cause Bs is just to say that A-type events are always followed by B-type events. The mere existence of such a contingent pattern in the phenomena is not explanatorily prior to the particular occurrences of Bs.17 Rather, the facts about what sorts of particular events occur at what times are explanatorily prior to facts about the patterns in the series of events. 
This accords with three of the criteria of explanatory priority given in Section 3.1: (i) the particular events in a series are parts of the series, so facts about the former are prior to features of the latter; (ii) the causal and nomic facts, on the Humean view, hold in virtue of the facts about particular events; and (iii) the causal and nomic facts on the Humean view supervene on the facts about particular events. In contrast, on a realist view of causation and laws (see Armstrong [1983]; Tooley [1987]), the causal laws do not supervene on and do not hold in virtue of the facts about which particular events occur at which times, and since causal laws are not merely facts about patterns obtaining in a series of particular events, (i) is irrelevant.

For this reason, given the Explanatory Priority Proviso, Humean views of causation and laws induce a different sort of probability distribution from non-Humean, realist views. On a Humean view, the appropriate application of the Principle of Indifference is to assign equal probabilities to the possible sequences of particular events, resulting in the inductive skeptic's probability distribution.18 On a realist view, on the other hand, causal and nomological facts, including facts about objective chances, are explanatorily prior to facts about sequences of particular events, and the resulting interpretation of the Principle of Indifference, as we have seen, yields an inductivist probability distribution. This is one reason for preferring non-Humean theories of causation and laws, since they yield the intuitively correct sort of probability distributions.
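
The contrast between the two distributions can be made concrete by brute-force enumeration of short outcome sequences. This is a minimal sketch under stated assumptions: the names `humean_weight` and `realist_weight` are labels introduced here for illustration, and the realist weight uses the standard result that the integral of $c^k (1-c)^{n-k}$ over [0, 1] is $1/((n+1)\binom{n}{k})$.

```python
from fractions import Fraction
from itertools import product
from math import comb

def next_A_given_run(i, n, weight):
    """P(A_{i+1} | U_i) over all A/B sequences of length n (requires i < n)."""
    seqs = list(product("AB", repeat=n))
    run = [s for s in seqs if all(x == "A" for x in s[:i])]
    run_then_A = [s for s in run if s[i] == "A"]
    return sum(weight(s) for s in run_then_A) / sum(weight(s) for s in run)

def humean_weight(seq):
    # Indifference applied to sequences: every sequence is equally probable.
    return Fraction(1, 2 ** len(seq))

def realist_weight(seq):
    # Indifference applied to the objective chance c, then averaged over c.
    n, k = len(seq), seq.count("A")
    return Fraction(1, (n + 1) * comb(n, k))

print(next_A_given_run(5, 8, humean_weight))   # 1/2
print(next_A_given_run(5, 8, realist_weight))  # 6/7
```

Under the sequence-level prior, five As in a row leave the next outcome at 1/2, whereas the chance-level prior yields 6/7, in agreement with the Rule of Succession.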

3.5 Inference to the best explanation?
In what sense, if any, does the approach advanced here involve inference to the best explanation? Initially, it might appear that, while the approach makes use of the notion of explanatory priority, no actual inference to the best explanation is required to arrive at inductive conclusions. Rather, it appears that one arrives at an inductive conclusion by simply conditionalizing on some set of evidence, starting from an inductivist prior probability distribution. Considerations of explanatory priority feature in the motivation for that prior distribution, but even that does not obviously involve one in making an inference to the best explanation—at no point in the reasoning given in defense of the Laplacean distribution did we need to make the claim that some hypothesis was the best explanation for anything, as opposed to the mere claim that some hypotheses are explanatorily prior to others. And it seems that, once we have the appropriate prior distribution, at no later stage need we make the claim that some hypothesis is the best explanation for anything, either.

Though I am not greatly concerned with whether my approach involves genuine inference to the best explanation, it seems to me that it at least involves something very much like inference to the best explanation. On my approach, one begins by considering a set of alternatives that are explanatorily prior to the data and so are in that minimal sense potential explanations of the data. Which of these alternatives has the greatest posterior probability will be determined by the initial plausibility of each alternative together with the degree to which it predicts the evidence, P(e | h). The notion of an initially plausible, explanatorily prior hypothesis that confers a high probability on the evidence is at least something close to that of a good explanation of the evidence.

In our above example used to derive the Rule of Succession, the essential reason why $A_{i+1}$ receives a high probability conditional on $U_i$ (for large $i$) is that the discovery of $U_i$ confers a high posterior probability on hypotheses placing the objective chance of outcome A near the top of its range of possible values. The initially flat density distribution over $C$ becomes skewed toward the top end. An explanationist might plausibly say: the best explanation for the evidence $U_i$ is that $c$ is close to 1. This is the best explanation, because this hypothesis (i) is explanatorily prior to the data, and (ii) confers a much higher likelihood on the data than the alternative explanatorily prior hypotheses (such as that $c$ is close to 1/2 or that $c$ is close to 0). We infer that this hypothesis is probably correct (which is to say, we raise our degree of belief in it), whereupon we must also raise our degree of belief that outcome A will occur in the future. It seems to me that the inductive prediction is supported both by an inference to the best explanation and by good Bayesian reasoning.

Admittedly, the notion of inference to the best explanation employed here is a thin one. The only criteria of good explanation that play a role in proper inductive reasoning, in my view, are those that play a role in the assessment of probabilities. For example, does the simplicity of a potentially explanatory hypothesis count toward the quality of the explanation, for purposes of making an inference to the best explanation? Only insofar as that simplicity bears on either the prior probability of the hypothesis, P(h), or the likelihood P(e | h).19 Similarly, if pragmatic factors affect the quality of an explanation, these factors nevertheless will be irrelevant to confirmation, unless they affect the proper assessment of logical probabilities. Thus, I do not defend inference to the best explanation without qualification: I defend inference in accordance with the principles of logical probability, I argue that some features implicated in the notion of a good explanation (the notion of explanatory priority in particular) also feature in the correct principles of logical probability, and I defend inference to the best explanation insofar as it accords with Bayesian reasoning, given those principles. It may be that my notion of explanation is too thin: perhaps what is ‘the best explanation’ for a set of phenomena depends in part on pragmatic or other factors not relevant to the assignment of logical probabilities. If so, I would not defend inference to the best explanation per se; I would defend only what we might call ‘inference to the L-best explanation’, where the L-best explanation is the explanation that is best in terms of the factors that are relevant to logical probability.


    4. Problems and objections
The Explanationist-Bayesian approach raises a number of issues and problems that require further analysis. Here, I can offer only brief sketches of how a defender of the approach might seek to address some of these problems.

4.1 Unknown explanatory possibilities
The Explanatory Priority Proviso calls for the Principle of Indifference to be applied to the alternatives at the most explanatorily basic level. But in some cases, we do not know what the most explanatorily basic level is. Indeed, sometimes empirical investigation reveals new explanatory possibilities of which we were previously unaware. This is particularly to be expected if, as I have suggested, both causal priority and the part–whole relation imply explanatory priority. Suppose, for example, that we seek to explain the behavior of some chemical substance. In the light of atomic theory, hypotheses about the properties and arrangement of the atoms of which that substance is composed are among the explanatorily prior alternatives. We would thus want to begin by assigning probabilities to those alternatives in a suitably neutral manner. Later investigation may reveal, however, that atoms are composed of subatomic particles. We would thus want to assign probabilities in a neutral manner to alternative hypotheses about the subatomic particles, rather than to the alternative hypotheses about atoms.

The case of unknown explanatory possibilities raises a number of issues. One issue is familiar to Bayesians for other reasons: the Explanatory Priority Proviso appears to impose an unrealistic demand on epistemic agents. Given that we are often unaware of the explanatorily most basic alternatives, we cannot follow the directive to assign equal probabilities to these alternatives. This is analogous to a problem sometimes raised for Bayesians: given that mortal humans are unable to identify all the necessary truths, it is unrealistic to require that a rational person assign probability 1 to every proposition that is in fact necessary.

Perhaps the most natural way to deal with the problem is to say that one rationally ought to apply the Principle of Indifference to the alternatives at the most explanatorily basic level that one is aware of.20 This naturally suggests the view that, when one learns of new potentially explanatory alternatives, one will need to revise one's degrees of belief by a process other than conditionalization, a process designed to adjust one's degrees of belief to what they would have been, had one known of the new potentially explanatory alternatives earlier and had one then assigned each of them equal probabilities. Though Bayesians may be uncomfortable here, presumably this is the same sort of response as one would want to make to the problem of unknown necessary truths: when one discovers a new necessary truth—say, by proving a new theorem—one should revise one's degrees of belief (leaving aside the issue of uncertainty as to the soundness of the proof) by assigning probability 1 to that newly discovered truth. This is not a process of conditionalization, but rather, one might say, a process of correcting for one's earlier cognitive limitation.

4.2 Empirical reasoning about explanatory priority
In some cases, it seems clear that we must engage in empirical reasoning to discover the explanatory priority relations among facts. This is made particularly clear by the Causal Priority principle enunciated in Section 3.1, namely, the principle that the occurrence of a cause is explanatorily prior to the occurrence of its effect. What is causally prior to what is surely an empirical matter, so it seems that what is explanatorily prior to what is also at least partly an empirical matter. But the approach I have advanced seems to require that we know explanatory priority relations a priori, so that we may use them to assign logical probabilities to propositions a priori. How can my approach accommodate empirical reasoning about explanatory priority relations?

As this objection recognizes, my approach requires that at least some facts about explanatory priority can be known a priori. However, my approach does not require that all facts about explanatory priority are knowable a priori; the approach allows some cases of empirical reasoning about explanatory priority. To illustrate, consider the following example.

Example 5. Psychologists have discovered a set of correlations between people's political beliefs and people's emotional states. For instance, subjects with emotional attitude E1 are more likely to have belief B1, and vice versa. The psychologists consider three theories: (i) the relevant emotions causally influence the relevant beliefs; (ii) the relevant beliefs influence the relevant emotions; and (iii) there are no relevant causal relations, so any correlations are purely coincidental. It is not known which of these theories is true.

This example is simplified—in reality, more than three theories would be under consideration—but the example will nevertheless serve both to raise the problem for my view and to explain the outlines of a solution. On theory (i), facts about a subject's emotional state are explanatorily prior to facts about the subject's cognitive state; on theory (ii), facts about the subject's cognitive state are prior to facts about his emotional state; and on theory (iii), neither is explanatorily prior. Thus, given the Explanatory Priority Proviso, the three theories would call for different probability distributions over the set of possible mental states of a given subject. Given theory (i), we should assign a uniform prior over the possible emotional states of a given subject. Given (ii), we should assign a uniform prior over the possible cognitive states of a given subject. And given (iii), we should assign a uniform prior over the Cartesian product of the possible emotional states and the possible cognitive states of the subject. But we cannot know which theory is correct without engaging in empirical reasoning. So it seems that we cannot know how to assign prior probabilities without first engaging in empirical reasoning to determine which theory is true. This is problematic, since, on the theory of logical probability, we need prior probabilities to get any such empirical reasoning started.

The solution to this sort of problem is to introduce a further level of possibilities that is explanatorily prior both to a subject's emotional state and to a subject's cognitive state. This extra level is that of the possibilities with regard to the causal principles. The facts about the psychological laws are explanatorily prior to facts about subjects’ particular mental states, assuming a realist account of laws (per Section 3.4). Theories (i), (ii), and (iii) may be regarded as three competing theories about what these laws are like. We should therefore start by applying the Principle of Indifference at the level of these three theories, assigning each a prior probability of 1/3 (see note 21). We then assign a uniform prior probability distribution over the possible emotional states of a subject conditional on theory (i), a uniform distribution over the possible cognitive states of a subject conditional on theory (ii), and a uniform distribution over the Cartesian product of the possible emotional states and the possible cognitive states of a subject conditional on theory (iii). This gives us the prior probability distribution from which we may engage in empirical reasoning to decide how likely (i), (ii), and (iii) are in the light of our evidence. If we find (i) to be highly probable given the evidence, we will have empirically confirmed that emotional states are causally and thus explanatorily prior to cognitive states. It is thus that the Explanatory Priority approach accommodates some empirical reasoning about what has explanatory priority, given some other, a priori judgments about explanatory priority. In this case, we may start with a priori knowledge of the necessary truth that causal laws are explanatorily prior to particular matters of fact.

4.3 The probability of deterministic laws
The Laplacean probability function recommended in Section 3.3 lends support to Karl Popper's claim that the initial probability of any universal deterministic law is zero.22 For it seems that the hypothesis of a deterministic law—say, a law requiring A always to result from process X—is equivalent to the hypothesis that the objective chance of A's resulting from X is 1. That is one possible value of the objective chance, out of a continuous infinity of possibilities, so the prior probability of the objective chance taking on exactly that value is zero. Now, Bayes’ Theorem tells us, for any hypothesis h and evidence e:


\[ P(h \mid e) = \frac{P(h)\, P(e \mid h)}{P(e)} \]

If P(h) = 0, then P(h | e) = 0, for any possible evidence e that itself has a non-zero prior. Therefore, a hypothesis with zero initial probability can never be confirmed. And so it seems that deterministic laws can never be confirmed. On this view, if A were observed to result from process X a large number of times with no exceptions, we would have evidence only for the claim that the objective chance of A is very high—for instance, we might confirm that it is greater than 0.999—but not that it is exactly 1.
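
A short sketch may make both halves of this point concrete. It is illustrative only and the function names are hypothetical. The second function relies on the fact that, under the uniform prior of Section 3.3, the posterior density of the chance after $i$ consecutive As is $(i+1)c^i$, so the probability that the chance exceeds a threshold $t$ is $1 - t^{i+1}$.

```python
def posterior(prior_h, likelihood, prior_e):
    """Bayes' Theorem: P(h | e) = P(h) * P(e | h) / P(e), for P(e) > 0."""
    if prior_e <= 0:
        raise ValueError("the evidence must have a non-zero prior probability")
    return prior_h * likelihood / prior_e

def prob_chance_above(t, i):
    """P(c > t | U_i) under the uniform prior; posterior density is (i + 1) * c**i."""
    return 1 - t ** (i + 1)

i = 10_000
# A hypothesis with zero prior stays at zero however favorable the evidence.
print(posterior(0.0, 1.0, 1 / (i + 1)))
# The interval hypothesis "c > 0.999" can nevertheless become highly probable.
print(prob_chance_above(0.999, i))
```

The first call returns zero regardless of the likelihood, while the second approaches 1 as $i$ grows, which is the asymmetry described above.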

This conclusion strikes me as incorrect. While I would not wish to entirely rule out the hypothesis that the objective chance of A is some very high number less than one, I think that with a sufficient number of positive instances of A, in a sufficiently wide variety of circumstances, with no known exceptions, scientists would reasonably conclude that a deterministic law was operating.

It therefore seems to me that the Laplacean probability distribution requires modification. In particular, it seems to me incorrect to equate a deterministic law with a law that sets the objective chance of some outcome of a process to either 1 or 0 (see note 23). To see why, consider the following descriptions of two allegedly possible worlds:

In World 1, there is a deterministic law that process X must produce outcome A. Process X occurs an infinite number of times in World 1, and on exactly five of those occasions, A fails to result.

In World 2, there is a law that the probability of outcome A resulting from process X is 1. Process X occurs an infinite number of times in World 2, and on exactly five of those occasions, A fails to result.

World 1 is obviously logically impossible. World 2, however, is logically possible. In standard probability theory, that an event has probability zero is logically compatible with that event's actually occurring. (If an infinitely sharp dart is thrown so as to hit a random location on a dartboard, each geometric point on the board has zero probability of being hit, since it constitutes a measure-zero portion of the board; yet it is guaranteed that some point will be hit.) Likewise, A's failure to occur in some circumstance is logically compatible with its having a probability 1 of occurring in that circumstance. Indeed, A's failure to occur on five out of infinitely many trials does not even disconfirm the supposition that A has probability 1 of occurring on each occasion. After all, the relative frequency with which A fails to occur in World 2 matches the objective chance of A's failure to occur (assuming that we treat $5/\infty$ as 0), just as we would expect. It is thus difficult to see what objection one could have to the possibility of World 2.

If World 1 is logically impossible while World 2 is logically possible, then a deterministic law to the effect that X must produce A must be something different from a law to the effect that X has probability 1 of producing A. A deterministic law entails a statement of physical probability, but no statement about physical probability entails any deterministic law. So deterministic laws say something more than probabilistic laws; they are not merely a special case of probabilistic laws. Once we recognize this, we need to modify our view of the space of possibilities in Example 1. Earlier, it appeared that the possibilities with regard to the laws governing process X can be adequately characterized by assigning a value between 0 and 1 inclusive to the objective chance of A's eventuating. But this leaves deterministic laws out of consideration, since no value assigned to this objective chance entails the presence of a deterministic law. To take account of the deterministic possibility, we may reason as follows. There are two fundamental alternatives: either the outcome of X is causally determined, or it is undetermined. Each of these possibilities has a prior probability of 1/2. If determinism holds, then either A is necessary or A is impossible, so each of these alternatives has a prior probability of 1/4. Finally, we assign a uniform probability density over all the possible indeterministic values of $c$, that is: $\rho(c) = 1/2$, for $0 \le c \le 1$.

This new proposed probability distribution is even more friendly to induction than Laplace's. In place of the Rule of Succession, it leads to the stronger inductivist conclusion:24



\[ P(A_{i+1} \mid U_i) = \frac{(i+1)(i+4)}{(i+2)(i+3)} \quad \text{for } i \ge 1 \]

And it allows the deterministic hypothesis that A is causally necessary to be confirmed, in accordance with the formula25



\[ P(\text{A is necessary} \mid U_i) = \frac{i+1}{i+3} \quad \text{for } i \ge 1 \]

For example, the observation of 97 As in a row confers a 98% probability on the universal law that all outcomes must be A. However, this probability distribution also gives the scarcely believable verdict that, after observing just one instance of A, one should have 83% confidence that the next iteration of X will produce outcome A as well.
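
These figures can be recomputed directly from the modified prior: probability 1/4 that A is necessary, 1/4 that A is impossible, and density 1/2 over the indeterministic chances. The sketch below is an illustration with hypothetical function names, not the author's own calculation. It treats $U_0$ as the trivially true proposition, so the closed-form expressions apply only for $i \ge 1$.

```python
from fractions import Fraction

def prob_all_A(i):
    """P(U_i) under the modified prior."""
    if i == 0:
        return Fraction(1)  # no observations yet: U_0 is trivially true
    necessary = Fraction(1, 4)                           # 'A is necessary' guarantees U_i
    indeterministic = Fraction(1, 2) * Fraction(1, i + 1)  # integral of (1/2) * c**i
    return necessary + indeterministic                   # 'A is impossible' contributes 0

def next_A(i):
    """P(A_{i+1} | U_i) = P(U_{i+1}) / P(U_i)."""
    return prob_all_A(i + 1) / prob_all_A(i)

def law_given_run(i):
    """Posterior probability that A is necessary, given i consecutive As."""
    return Fraction(1, 4) / prob_all_A(i)

print(next_A(1), float(next_A(1)))   # 5/6, about 0.83
print(law_given_run(97))             # 49/50
```

The values 5/6 and 49/50 correspond to the 83% and 98% figures quoted in the text.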

We might try to correct the probability distribution in one or more ways. First, the supposition that, if determinism is true, then the objective chance of A is either 1 or 0 is mistaken: the truth of determinism means that some causally sufficient conditions, either for outcome A or for outcome B, are present in each iteration of process X. This does not entail that the conditions defining process X themselves contain such sufficient conditions; thus, the objective chance of outcome A, relative to the reference class of all instances of process X, may be non-extreme. (Analogously, though the outcomes of coin flips may well be determined in the actual world, it is not the case that the objective chance of a coin coming up heads on any flip is 1, nor is it 0; it is 1/2.) One would therefore need to assign probabilities, perhaps through another application of the Principle of Indifference, to the alternative hypotheses as to which possible conditions determine the outcome of X, and to the possible distributions of the potentially causally relevant initial conditions among instances of X, in order to determine the probability that c = 1 given determinism.

Second, we might think that, rather than just two fundamental alternatives (determinism and indeterminism), we should consider three fundamental alternatives: (i) the outcome of X is governed by deterministic laws, (ii) the outcome is governed by indeterministic laws, and (iii) the outcome is governed by no laws (which perhaps implies that the physical probability of A on each occasion is 1/2).

Each of these suggested corrections would preserve the possibility of confirmation for the universal deterministic law that all outcomes of X are A, while reducing the ease with which inductive conclusions are reached (that is, increasing the amount of evidence needed to reach a given level of probability, either for $A_{i+1}$ or for the hypothesis that A is necessary).

4.4 Changing chances
Thus far, I have treated the objective chance of a given outcome's resulting from a given process as fixed. A Humean skeptic might question this assumption. Perhaps there are no stable objective chances, either because objective chances in general do not exist, or because the objective chance of an event changes over time. It is no surprise, one may feel, that if we assume stable objective chances, we can thence reason to inductive conclusions from inductive evidence. But no inductive skeptic worth his salt will grant such an assumption. And if we must entertain the possibility that the objective chances may change unpredictably, it is far less obvious how inductive conclusions can be derived.26

In response, I offer a generalization of the result of Section 3.3 (leaving aside the complications introduced in Section 4.3 immediately above). Rather than assuming that there are stable objective chances, we may consider two hypotheses, S and ~S, where S is the hypothesis that there are stable objective chances; in particular, that the objective chance of outcome A's resulting from process X in Example 1 is some fixed number between 0 and 1 inclusive. As we saw in Section 3.3, given S, the probability of $A_{i+1}$ given $U_i$ is $(i + 1)/(i + 2)$. That is,

\[ P(A_{i+1} \mid U_i, S) = \frac{i+1}{i+2} \tag{6} \]
What is the probability of $A_{i+1}$ given $U_i$, on the assumption that ~S? The correct answer to this is unclear, but let us take the most pessimistic, skeptical assumption: let us suppose that, given ~S, inductive evidence is completely irrelevant to predictions about the future, and thus that

\[ P(A_{i+1} \mid U_i, {\sim}S) = P(A_{i+1} \mid {\sim}S) = \frac{1}{2} \tag{7} \]
(This would be true if, for example, objective chances changed in a completely unpredictable way at every instant. Equation (7) thus represents the most extreme skeptical position.)

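The overall probability of $A_{i+1}$ given $U_i$ is given by the equation

\[ P(A_{i+1} \mid U_i) = P(A_{i+1} \mid U_i, S)\, P(S \mid U_i) + P(A_{i+1} \mid U_i, {\sim}S)\, P({\sim}S \mid U_i) \tag{8} \]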
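The respective probabilities of S and of ~S given $U_i$ can be calculated using Bayes’ Theorem:

\[ P(S \mid U_i) = \frac{P(S)\, P(U_i \mid S)}{P(S)\, P(U_i \mid S) + P({\sim}S)\, P(U_i \mid {\sim}S)} \tag{9} \]

\[ P({\sim}S \mid U_i) = \frac{P({\sim}S)\, P(U_i \mid {\sim}S)}{P(S)\, P(U_i \mid S) + P({\sim}S)\, P(U_i \mid {\sim}S)} \tag{10} \]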
Given our pessimistic, skeptical assumption about ~S, we should assess $P(U_i \mid {\sim}S)$ as $(1/2)^i$. Given our earlier derivation, assuming S, of the Laplacean probability distribution, we should assess $P(U_i \mid S)$ as $1/(i + 1)$. Thus:

\[ P(U_i \mid S) = \frac{1}{i+1} \tag{11} \]

\[ P(U_i \mid {\sim}S) = \frac{1}{2^i} \tag{12} \]
Finally, we can solve for $P(A_{i+1} \mid U_i)$ by substituting the right-hand sides of Equations (11) and (12) into Equations (9) and (10) to obtain expressions for $P(S \mid U_i)$ and $P({\sim}S \mid U_i)$, then substituting those expressions, as well as the right-hand sides of Equations (6) and (7), into Equation (8). Omitting the intermediate algebra and letting ‘$s$’ stand for $P(S)$, the result is

\[ P(A_{i+1} \mid U_i) = \frac{s\,(i+1)\,2^{i+1} + (1-s)(i+1)(i+2)}{s\,(i+2)\,2^{i+1} + 2\,(1-s)(i+1)(i+2)} \tag{13} \]

Equation (13) gives the probability of the inductive prediction $A_{i+1}$ as a function of the number, $i$, of consecutive positive instances observed and the initial probability, $s$, assigned to the hypothesis of stable objective chances. We therefore need no longer assume that objective chances are stable; we may instead treat that as a hypothesis to which we assign some initial probability, and we may track the effects of inductive evidence both upon that hypothesis (using Equation (9)) and upon the inductive prediction $A_{i+1}$ (using Equation (13)).

What does Equation (13) tell us about the effects of inductive evidence? When $s$ is zero, the right-hand side of Equation (13) reduces to 1/2, for all values of $i$. When $s$ is nonzero, the right-hand side of Equation (13) takes on its minimum value of 1/2 when $i = 0$, and increases monotonically with $i$, approaching 1 as $i$ approaches infinity (as $i$ increases without bound, the term containing $2^{i+1}$ dominates both the numerator and the denominator). Thus, provided $P(S) \neq 0$, Equation (13) yields the general inductivist thesis articulated in Section 1, namely that $P(A_{i+1} \mid U_i) > P(A_{i+1})$, for all $i > 0$.

Quantitatively, the inductive support is impressive, for most values of P(S). Suppose that, in a skeptical mood, we assign an initial probability of only one in one million to the hypothesis of stable objective chances for process X and outcome A. Even in this case, P(Ai+1 | Ui) exceeds 95% by the time i = 30 (see Figure 1). The only way to avoid inductivism is to set P(S) = 0, thus in effect laying claim to a priori absolute certainty that no stable objective chances exist, a stance most unbecoming a skeptic.
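The quantitative claim above can be checked numerically. The following is a minimal sketch, assuming that the prediction is obtained by combining P(Ui | S) = 1/(i + 1), P(Ui | ~S) = (1/2)^i, and the conditional predictions (i + 1)/(i + 2) under S and 1/2 under ~S; the function names are illustrative and do not come from the original text.

```python
from fractions import Fraction


def posterior_stable(i: int, s: Fraction) -> Fraction:
    """P(S | U_i): probability of stable chances after i positive instances.

    Assumes P(U_i | S) = 1/(i+1) and P(U_i | ~S) = (1/2)**i, with s = P(S).
    """
    return (2**i * s) / (2**i * s + (i + 1) * (1 - s))


def predictive_probability(i: int, s: Fraction) -> Fraction:
    """P(A_{i+1} | U_i) in closed form under the same assumptions.

    Additionally assumes P(A_{i+1} | U_i & S) = (i+1)/(i+2) and
    P(A_{i+1} | U_i & ~S) = 1/2.
    """
    numerator = 2 ** (i + 1) * (i + 1) * s + (i + 1) * (i + 2) * (1 - s)
    denominator = 2 ** (i + 1) * (i + 2) * s + 2 * (i + 1) * (i + 2) * (1 - s)
    return numerator / denominator


# Tabulate the curve for a one-in-a-million prior on stable chances.
prior = Fraction(1, 1_000_000)
for i in (0, 10, 20, 29, 30, 40):
    print(i, float(posterior_stable(i, prior)), float(predictive_probability(i, prior)))
```

Exact rational arithmetic is used so that the tiny prior is not lost to rounding. With a prior of one in one million, the value at i = 29 is still below 0.95 and the value at i = 30 is about 0.956.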


Figure 1 Probability of an inductive prediction as a function of the number of observed positive instances, assuming a 0.000001 initial probability of the existence of stable objective chances.

 
4.5 Scruples concerning a priori probability
Some philosophers would object to the notion of the a priori probability of a proposition that I have relied upon throughout. They would say that to assign probabilities to propositions, one must always have at least some empirical evidence. In the absence of any relevant evidence, one should say that the probability of a hypothesis is simply unknown or indeterminate.

It is important first to clarify this idea. If the sort of probability one has in mind is physical probability, then it is very plausible to claim that one must have some empirical evidence to assess the probability of an event or proposition. On the other hand, if one has in mind epistemic probability, it is not obvious why one's ignorance of empirical facts should in principle interfere with one's assignment of probabilities. An epistemic probability assessment is simply an assessment of how much reason one currently has for believing some proposition. It is unclear why one's ability to say how much reason one presently has for believing something should require empirical information beyond the knowledge of what one's current mental state is. A shortage of empirical information may prevent one from describing the physical facts about some process—but it should not prevent one from describing the state of one's information itself. According to the motivation for the Principle of Indifference suggested in Section 2.1, in assigning a uniform probability distribution over some set of alternatives, one is merely claiming that one has as much reason to believe any of the alternatives as to believe any of the others. That is compatible with the claim that one has exactly no reason to believe any of the alternatives. Indeed, the latter claim would seem if anything to support the former, rather than undermining it as the present objection supposes.

Those who press the present line of objection, then, either reject the notion of epistemic probability, or claim that, despite what I have said, one cannot assess how much reason one has for believing a proposition without specific empirical evidence bearing on that proposition (see Achinstein [1995] for a defense of the latter view).

It seems to me that the latter position is more problematic than is commonly recognized. Consider two fundamental principles that are true of conditional probabilities on the standard conception. First: if, in the light of some evidence e (where e may include relevant background knowledge), h has a certain probability of being true, then that probability is P(h | e). I have stated this principle in such a way as to leave open the possibility that h may fail to have any probability relative to some sets of evidence. Perhaps, as an empiricist would wish to maintain, when e contains no evidence relevant to h, then h has no probability relative to e. Nevertheless, most empiricists would accept that on some occasions, a proposition has a probability in the light of our evidence; for example, in the light of my current knowledge, the probability that the next coin I flip will come up heads is 1/2. I assume then that, where e includes all of my current actual evidence relevant to coin flips, P(the next coin I flip will come up heads | e) = 1/2.

Second, according to the axiom usually taken to define conditional probabilities, for any hypothesis h and evidence e, P(h | e) = P(h & e)/P(e). On the standard view of conditional probability, if P(h | e) exists, then the terms on the right-hand side of that equation must have values. If either P(h & e) or P(e) is indeterminate, inscrutable, undefined, or the like, then it seems to follow that P(h | e) is also indeterminate, inscrutable, undefined, or the like.27 This is true regardless of what h and e are. It is true even if e is the richest set of empirical information you care to consider. Thus, if it is in general impossible to assign the unconditional probability of any contingent proposition—perhaps because such probabilities are always indeterminate or inscrutable—then it seems that it is also impossible to assign the probability of any contingent proposition in the light of any evidence whatsoever. No amount of data collection, in other words, will help us overcome the initial inscrutability, because the assessment of that data—what it supports, and how strongly—is relative to an initial probability distribution.28

I think this point is often overlooked because it is imagined that one can assign probabilities of events simply by looking at the observed relative frequencies in a large number of trials—we naively, and quite wrongly, are inclined to think that the need or relevance of a prior probability distribution will be overcome if we only take a sufficiently large sample.29 One way to see the wrongness of this is simply to recall some of the prior probability distributions we have discussed above. The skeptical distribution, the Laplacean distribution, and the distributions discussed in Section 4.3 above, all differ on what the probability of Ai+1 is, and they continue to disagree no matter how much data comes in. That is, they differ on the value of P(Ai+1 | Ui) for every i.30
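The persistence of this disagreement can be illustrated with a short sketch. It assumes that Ui+1 is equivalent to Ui & Ai+1, so that P(Ai+1 | Ui) = P(Ui+1)/P(Ui). It takes P(Ui) = (1/2)^i for the skeptical distribution and 1/(i + 1) for the Laplacean distribution. For one Section 4.3 distribution it takes P(Ui) = 1/4 + 1/(2(i + 1)) for i ≥ 1, a value that is an assumption read off the ingredients listed in note 24. The function names are hypothetical.

```python
from fractions import Fraction
from typing import Callable


def p_u_skeptical(i: int) -> Fraction:
    """P(U_i) when every sequence of outcomes is equally probable."""
    return Fraction(1, 2**i)


def p_u_laplace(i: int) -> Fraction:
    """P(U_i) under a uniform prior over the objective chance of A."""
    return Fraction(1, i + 1)


def p_u_mixed(i: int) -> Fraction:
    """P(U_i) assuming the note 24 ingredients: 1/4 that A is necessary,
    1/4 that A is impossible, and density 1/2 over chances c in [0, 1]."""
    if i == 0:
        return Fraction(1)  # no observations yet, so U_0 is certain
    return Fraction(1, 4) + Fraction(1, 2 * (i + 1))


def predictive(p_u: Callable[[int], Fraction], i: int) -> Fraction:
    """P(A_{i+1} | U_i) = P(U_{i+1}) / P(U_i), assuming U_{i+1} = U_i & A_{i+1}."""
    return p_u(i + 1) / p_u(i)


# The three priors give three different predictions at every sample size shown.
for i in (1, 10, 100, 1000):
    print(i, *(float(predictive(p, i)) for p in (p_u_skeptical, p_u_laplace, p_u_mixed)))
```

On these assumptions the skeptical prediction stays at 1/2, the Laplacean prediction is (i + 1)/(i + 2), and the third is (i + 1)(i + 4)/((i + 2)(i + 3)), so no two of them coincide for any i ≥ 1.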

Now, if the opponent of a priori probabilities were to propose some specific rule for assigning probabilities in the light of a given amount of evidence, we could look at the results this rule delivers to see which (if any) prior probability distributions it agreed with. (If the rule does not agree with the outcome of conditionalization starting from any prior probability distribution, then the rule is probabilistically incoherent.) For instance, suppose one proposes that, for all i ≤ 1,000, P(Ai+1 | Ui) is indeterminate, but thereafter, P(Ai+1 | Ui) = (i + 1)/(i + 2). Then one is agreeing with the results of the Laplacean distribution, though restricting one's agreement to cases where i > 1,000. It is hard to see how one could have a coherent rationale for such a view. For if one initially objects to the alleged arbitrariness of selecting Laplace's distribution a priori over other equally coherent probability distributions, then shouldn't one equally object to the arbitrariness of selecting the Rule of Succession for cases where i > 1,000 over other, equally coherent rules, such as those that would agree with the results of some of the alternative, equally coherent prior probability distributions just mentioned? The same objection would apply whatever rule one adopts—any scruples about privileging one prior probability distribution over the other, equally coherent distributions ought to apply equally to privileging one rule for assigning posterior probabilities over other, equally coherent rules. Any rationale for considering one rule to be uniquely correct will likely constitute a rationale for considering the prior probability distributions that agree with that rule to be objectively preferable to all other prior probability distributions.

The initial motivation for the scruples about a priori probability is clear enough: devising a comprehensive and precise set of general principles delimiting a priori probabilities has proven extremely difficult, so much so that one might be forgiven for doubting that such principles exist. However, if we accede to this doubt, we must also surrender to skepticism: if there are no a priori epistemic probabilities, then there are no epistemic probabilities whatsoever. One therefore could not say—no matter what evidence one had—that a scientific theory, or any other inductive conclusion, was likely to be true.

I have offered a first step toward a solution of this problem—a step, that is, toward articulating the principles governing a priori objective probabilities. In brief, in the absence of evidence favoring one alternative over another, one should assign equal epistemic probabilities to the explanatorily basic alternatives. A great deal of additional work needs to be done, both in attempting to apply this principle to problem cases, and in seeking additional principles governing a priori probabilities. Nevertheless, the approach can be shown in simple examples to yield inductivist probability distributions and to rule out inductive skepticism. This suggests that a hybrid Explanationist-Bayesian treatment along these general lines may offer the most promising avenue of attack on the problem of induction.

I owe thanks to Christian Lee, three anonymous referees for this journal, and most of all Michael Tooley for many insightful and invaluable comments on the manuscript.


    Notes
 
1 Hume ([1975], pp. 25–39) and Popper ([1961], pp. 27–30) endorse inductive skepticism. Peirce ([1932], pp. 470–1), Wittgenstein ([1981], 5.15), and Keynes ([1921], pp. 56–7) endorse what I refer to in the text below as the skeptical probability distribution.

2 Stove ([1986], pp. 51–4) argues that inductive skepticism is probabilistically incoherent, given some assumptions regarding non-extreme prior probabilities. My characterization of inductive skepticism in the text enables the skeptic to avoid Stove's argument.

3 Objective Bayesians introduce constraints beyond the axioms (i) that the probability of any proposition must be greater than or equal to zero, (ii) that the probability of a tautology must be 1, (iii) that P(A ∨ B) = P(A) + P(B) whenever A and B are mutually exclusive, and (iv) that P(A & B) = P(A) × P(B | A). I lack space here to discuss the more popular, subjective variety of Bayesianism (see Howson and Urbach [1989]; de Finetti [1974]).

4 Fumerton ([1995], p. 215) discusses this example.

5 The Laplacean distribution is equivalent to Carnap's ([1962], pp. 562–77) m* measure, leading to his recommended confirmation function, c*. Carnap ([1962], pp. 567–8) derives a general formula that gives the Rule of Succession as a special case when families of two atomic predicates are considered. Unfortunately, Carnap later took back his support for c* ([1980], pp. 110–9).

6 Consider a case in which a kind of event, C, regularly causes A followed by B, where A and B are not causally related to each other. In such a case, the occurrence of A might well raise the probability of B's occurring due solely to its raising the probability that C has occurred. In my view, A would also be explanatorily prior to B. Yet A would not explain B. This shows that some further relation between A and B is required for explanation beyond explanatory priority and the probabilistic relation.

7 Principles (1) and (2) can come into conflict if backward causation is possible. In such a case, I believe that principle (1), that causal priority implies explanatory priority, would take precedence; however, I hold that backward causation is not possible.

8 Entangled particles in quantum mechanics may be thought to provide a counterexample to this rule, because the state of an entangled system does not supervene on the states and arrangement of the particles composing the system. But this need not be seen as a counterexample, for two reasons: First, the wave function may be seen as itself a component of the system, distinct from the particles, as in Bohm's interpretation of quantum mechanics. The properties of the system as a whole would then be fully explicable in terms of the properties of the parts (including the wave function). Second, the claim that A is explanatorily prior to B does not in general entail that B supervenes on A. Explanatory priority is merely one component in a good explanation; A's being explanatorily prior to B does not entail that A fully explains B.

9 I thank Christian Lee for this example of explanatory priority.

10 Facts may be taken as the primary relata of explanatory priority relations. A proposition p may be said to be explanatorily prior to a proposition q when the fact that p (or, in the event that p is false, the fact that ~p) is explanatorily prior to the fact that q (or the fact that ~q).

11 The notion of a ‘good’ explanation is partly epistemic; it is close to that of a satisfying explanation. Notably, A may be in fact the correct explanation of B without A's being a very good explanation of B—consider a case in which the actual causal history of B involves a highly complex and improbable sequence of coincidences. A description of that causal history would correctly (truthfully) explain B, yet it would not be satisfying as an explanation. Meanwhile, a more simple and elegant hypothesis, better supported by our available evidence, might offer a more satisfying explanation of B and yet be false. This simply illustrates the fallibility of inference to the best explanation.

As Michael Tooley has pointed out (personal communication), there may even be cases in which A correctly explains B despite A's lowering the probability of B: suppose there are probabilistic laws of nature, that A has a 50% chance of probabilistically causing B, but that, due to A's interfering with other potential causes of B, the occurrence of A actually lowers the probability of B's occurrence overall. On a given occasion, A together with a description of the relevant probabilistic laws might correctly explain B. Nevertheless, this would not be a ‘good’ explanation of B.

12 Exclusiveness and exhaustiveness should be understood probabilistically, i.e., we may call h1 and h2 mutually exclusive iff P(h1 & h2) = 0 (even if h1 does not logically contradict h2). Similarly, the hi are jointly exhaustive iff P(h1 ∨ h2 ∨ …) = 1.

13 By a ‘variable’ here, I mean a certain sort of property, namely, a determinable that may take on any of a continuous set of determinate values. I treat variables as intensional entities; thus, x and y may be distinct variables even if the same objects have both properties, and even if the values of x and y are always equal. ‘Partitions’, on the other hand, are extensional: a partition on a set is just a set of (mutually exclusive, jointly exhaustive) subsets of the given set.

14 Piaget ([1969], Chapter 2) claims that children in fact acquire the concept of velocity before acquiring the concepts of duration or time.

15 Bayes ([1763], scholium) takes this approach.

16 Lewis ([1994]) defends a view along these general lines, taking inspiration from Hume.

17 Compare Dretske's ([1977], p. 262) argument that mere regularities do not explain their instances.

18 This explains and vindicates the intuition, often pressed by advocates of inference to the best explanation (Dretske [1977], p. 267; Foster [1982–3], pp. 91–2; Armstrong [1983], pp. 52–9), that non-realist views about causation or laws engender inductive skepticism. Among such advocates, Tooley ([1987], p. 135) comes closest to justifying the intuition in a probabilistic framework.

19 I discuss this issue in my ([2009]), where I argue that the simplicity of a hypothesis is typically correlated with its likelihood, P(e | h), in relation to evidence that it accommodates.

20 As one referee has pointed out, this qualification, together with the claims of Section 3.4, implies that, if one justifiably believes a Humean account of causation and laws, then one is justified in accepting the skeptical probability distribution, and therewith inductive skepticism. Conversely, if, as I believe, inductive skepticism is unjustified, then a Humean account of causation and laws is unjustified.

21 This solution is a simplification, given the simplified nature of the problem. The precise approach would be to list all the possibilities with regard to what the relevant causal laws are, and assign a uniform prior probability distribution across this set of possibilities; since in reality more than three possibilities could be enumerated, a more complex probability function than the one here mentioned would be required. The solution stated in the text is intended to illustrate qualitatively the general approach.

22 Popper ([1961], pp. 363–8) claims that the initial probability of any universal law applying to an infinite population is zero. Carnap ([1980], p. 145) recognizes as a problem that his system of inductive logic generates this result for all values of λ.

23 I owe the following argument in the text to Michael Tooley (personal communication), though he may not endorse it.

24 P(Ui) here is calculated by the formula, P(Ui) = P(Ui | A is necessary) × P(A is necessary) + P(Ui | A is impossible) × P(A is impossible) + $\int_0^1 P(U_i \mid C = c)\,\rho(c)\,dc$, where P(A is necessary) = 1/4, P(Ui | A is necessary) = 1, P(A is impossible) = 1/4, P(Ui | A is impossible) = 0, ρ(c) = 1/2, and P(Ui | C = c) = c^i.

25 This formula is derived starting from the equation, P(A is necessary | Ui) = P(Ui | A is necessary) × P(A is necessary) / P(Ui), where P(A is necessary) is 1/4 and P(Ui | A is necessary) is 1. P(Ui) is calculated as in the previous note.

26 This objection is due to an anonymous referee for this journal.

27 Hájek ([2003]) poses powerful objections to this standard view, which I lack space to discuss here. To give some of the flavor of the objections, let Q be a logically possible proposition with zero initial probability. Intuitively, it seems that P(Q | Q) = 1, even though P(Q & Q)/P(Q) = 0/0 is undefined.

Hájek takes conditional probability as primitive, arguing that conditional probabilities may exist when the relevant unconditional probabilities do not. Nonetheless, his view provides no way of determining the value of P(h | e) in the present context, so opponents of a priori probability should still regard P(h | e) as indeterminate or inscrutable.

28 This argument applies to the claim that initial probabilities are entirely indeterminate. If one accepts that the initial probabilities of propositions are confined to non-extreme ranges, though perhaps lacking perfectly precise values, and one posits certain nice properties of acceptable probability distributions, then one will typically find posterior probabilities confined to narrower ranges, as the convergence theorems of Savage ([1954]) and Hawthorne ([1993], [1994]) show. But I take it that principled opponents of a priori probability would see no reason to accept these assumptions.

29 One version of this mistake is to endorse ‘the straight rule’, the idea that the probability of a given outcome's occurring on the next of a series of trials should be reckoned equal to the relative frequency with which that outcome has occurred in the previous trials. See Carnap ([1980], pp. 85–6) for brief but effective criticisms of this rule.

30 The convergence theorems often invoked by Bayesians (Savage [1954], pp. 46–50; Hawthorne [1993], [1994]) require ruling out at the start some coherent probability distributions, including the skeptical distribution. In addition, they fail to guarantee convergence for any given, finite amount of data. For any finite set of data, and any desired degree of divergence, there exist prior probability distributions, satisfying the stipulations of the convergence theorems, such that the desired amount of divergence will exist after conditionalizing on the given data, starting from those prior probabilities.


    References
 

    Achinstein P. Are Empirical Evidence Claims A Priori? British Journal for the Philosophy of Science (1995) 46:447–73.

    Armstrong D. M. What Is a Law of Nature? (1983) Cambridge: Cambridge University Press.

    Bayes T. An Essay Towards Solving a Problem in the Doctrine of Chances. Philosophical Transactions of the Royal Society (1763) 53:370–418.

    Carnap R. Logical Foundations of Probability (1962) 2nd edition. Chicago: University of Chicago Press.

    Carnap R. A Basic System of Inductive Logic, Part II. In: Studies in Inductive Logic and Probability—Jeffrey R., ed. (1980) 2. Berkeley: University of California Press. 7–155.

    de Finetti B. Theory of Probability—Machi A., Smith A., eds. (1974) London: John Wiley.

    Dretske F. I. Laws of Nature. Philosophy of Science (1977) 44:248–68.

    Foster J. Induction, Explanation, and Natural Necessity. Proceedings of the Aristotelian Society (1982–3) 83:87–101.

    Fumerton R. Metaepistemology and Skepticism (1995) Lanham, MD: Rowman & Littlefield.

    Hájek A. What Conditional Probability Could Not Be. Synthese (2003) 137:273–323.

    Harman G. The Inference to the Best Explanation. Philosophical Review (1965) 74:88–95.

    Hawthorne J. Bayesian Induction Is Eliminative Induction. Philosophical Topics (1993) 21:99–138.

    Hawthorne J. On the Nature of Bayesian Convergence. PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association (1994) 1:241–9.

    Howson C., Urbach P. Scientific Reasoning: The Bayesian Approach (1989) 1st edition. Chicago: Open Court.

    Huemer M. When Is Parsimony a Virtue? Philosophical Quarterly (2009) 59:216–36.

    Hume D. Enquiry Concerning Human Understanding. In: Enquiries Concerning Human Understanding and Concerning the Principles of Morals—Selby-Bigge L. A., Nidditch P. H., eds. (1975) 3rd edition. Oxford: Clarendon Press.

    Keynes J. M. A Treatise on Probability (1921) London: Macmillan.

    Laplace P. S. Philosophical Essay on Probabilities—Dale A., ed. (1995) New York: Springer.

    Lewis D. A Subjectivist's Guide to Objective Chance. In: Philosophical Papers (1986) 2. Oxford: Oxford University Press. 83–113.

    Lewis D. Humean Supervenience Debugged. Mind (1994) 103:473–90.

    Lipton P. Inference to the Best Explanation (2004) 2nd edition. London: Routledge.

    Niiniluoto I. Defending Abduction. Philosophy of Science (1999) 66(Suppl.):S436–51.

    Okasha S. Van Fraassen's Critique of Inference to the Best Explanation. Studies in History and Philosophy of Science (2000) 31:691–710.

    Piaget J. The Child's Conception of Time—Pomerans A. J., ed. (1969) New York: Basic Books.

    Peirce C. S. Collected Papers of Charles Sanders Peirce—Hartshorne C., Weiss P., eds. (1932) 2. Cambridge, MA: Harvard University Press.

    Popper K. R. The Logic of Scientific Discovery (1961) New York: Science Editions.

    Popper K. R., Miller D. A Proof of the Impossibility of Inductive Probability. Nature (1983) 302:687–8.

    Savage L. J. The Foundations of Statistics (1954) New York: John Wiley.

    Stove D. C. The Rationality of Induction (1986) Oxford: Clarendon Press.

    Tooley M. Causation: A Realist Approach (1987) Oxford: Clarendon Press.

    van Fraassen B. Laws and Symmetry (1989) Oxford: Clarendon Press.

    Wittgenstein L. Tractatus Logico-Philosophicus. Translated by Ogden C. K. (1981) London: Routledge.

