In situations where the sample space is continuous we will follow the same procedure as in the previous section. Thus, for example, if \(X\) is a continuous random variable with density function \(f(x)\), and if \(E\) is an event with positive probability, we define a conditional density function by the formula \[f(x|E) = \left\{ \matrix{ f(x)/P(E), & \mbox{if}\ x \in E, \cr 0, & \mbox{if}\ x \notin E. \cr} \right.\] Then for any event \(F\), we have \[P(F|E) = \int_F f(x|E)\,dx = \int_{F \cap E} \frac{f(x)}{P(E)}\,dx = \frac{P(F \cap E)}{P(E)}\ ,\] which is the conditional probability of \(F\) given \(E\). We can think of the conditional density function as being 0 except on \(E\), and normalized to have integral 1 over \(E\). Note that if the original density is a uniform density corresponding to an experiment in which all events of equal size are equally likely, then the same will be true for the conditional density.
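To make these formulas concrete, here is a minimal Python sketch (an illustration added here, not part of the original text) that approximates \(P(F|E)\) by numerical integration of the conditional density. The density \(f(x) = 2x\) on \([0,1]\) and the events \(E\) and \(F\) are arbitrary choices for the example.

```python
import numpy as np

# Illustrative density f(x) = 2x on [0, 1] (zero elsewhere); E and F are
# arbitrary events chosen for this sketch.
x = np.linspace(0.0, 1.0, 1_000_001)
dx = x[1] - x[0]
f = 2.0 * x

in_E = x <= 0.5                        # E = [0, 1/2]
P_E = np.sum(f[in_E]) * dx             # P(E) = integral of f over E, about 1/4

# Conditional density: f(x)/P(E) on E, and 0 off E.
f_cond = np.where(in_E, f / P_E, 0.0)

in_F = (x >= 0.25) & (x <= 0.5)        # F = [1/4, 1/2]
P_F_given_E = np.sum(f_cond[in_F]) * dx
print(P_F_given_E)                     # about (3/16)/(1/4) = 0.75
```

The exact value is \(P(F \cap E)/P(E) = (3/16)/(1/4) = 3/4\), which the Riemann sum reproduces to several decimal places.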
Example \(\PageIndex{1}\):

In the spinner experiment (cf. Example 2.1.1), suppose we know that the spinner has stopped with its head in the upper half of the circle, \(0 \leq x \leq 1/2\). What is the probability that \(1/6 \leq x \leq 1/3\)?
Solution

Here \(E = [0,1/2]\), \(F = [1/6,1/3]\), and \(F \cap E = F\). Hence \[\begin{array}{rcl} P(F|E) &=& \dfrac{P(F \cap E)}{P(E)} \\ &=& \dfrac{1/6}{1/2} \\ &=& \dfrac13\ . \end{array}\]

Example \(\PageIndex{2}\):

In the dart game (cf. Example 2.2.2), suppose we know that the dart lands in the upper half of the target. What is the probability that its distance from the center is less than 1/2?

Solution

Here \(E = \{(x,y) : y \geq 0\}\) and \(F = \{(x,y) : x^2 + y^2 < (1/2)^2\}\). The region \(F \cap E\) is a half-disk of area \(\pi/8\), and \(E\) is a half-disk of area \(\pi/2\), so \[\begin{array}{rcl} P(F|E) &=& \dfrac{P(F \cap E)}{P(E)} \\ &=& \dfrac{(\pi/8)/\pi}{(\pi/2)/\pi} \\ &=& \dfrac14\ . \end{array}\]

Example \(\PageIndex{3}\):

We return to the exponential density (cf. Example 2.2.7). We suppose that we are observing a lump of plutonium-239. Our experiment consists of waiting for an emission, then starting a clock, and recording the length of time \(X\) that passes until the next emission. Experience has shown that \(X\) has an exponential density with some parameter \(\lambda\), which depends upon the size of the lump. Suppose that when we perform this experiment, we notice that the clock reads \(r\) seconds, and is still running. What is the probability that there is no emission in a further \(s\) seconds?

Solution

Let \(G(t)\) be the probability that the next particle is emitted after time \(t\). Then \[\begin{array}{rcl} G(t) &=& \displaystyle\int_t^\infty \lambda e^{-\lambda x}\,dx \\ &=& \left. -e^{-\lambda x}\,\right|_t^\infty \;=\; e^{-\lambda t}\ . \end{array}\] Let \(E\) be the event "the next particle is emitted after time \(r\)," and \(F\) the event "the next particle is emitted after time \(r + s\)." Then \[\begin{array}{rcl} P(F|E) &=& \dfrac{P(F \cap E)}{P(E)} \\ &=& \dfrac{G(r+s)}{G(r)} \\ &=& \dfrac{e^{-\lambda(r+s)}}{e^{-\lambda r}} \\ &=& e^{-\lambda s}\ . \end{array}\] Thus the probability that we must wait a further \(s\) seconds, given that there has been no emission in the first \(r\) seconds, does not depend on \(r\): the exponential density is memoryless.

Independent Events

If \(E\) and \(F\) are two events with positive probability in a continuous sample space, then, as in the case of discrete sample spaces, we define \(E\) and \(F\) to be independent if \(P(E|F) = P(E)\) and \(P(F|E) = P(F)\). As before, each of the above equations implies the other, so that to see whether two events are independent, only one of these equations must be checked. It is also the case that, if \(E\) and \(F\) are independent, then \(P(E \cap F) = P(E)P(F)\).

Example \(\PageIndex{4}\):

In the dart game (see Example \(\PageIndex{2}\)), let \(E\) be the event that the dart lands in the upper half of the target (\(y \geq 0\)) and \(F\) the event that the dart lands in the right half of the target (\(x \geq 0\)). Then \(P(E \cap F)\) is the probability that the dart lies in the first quadrant of the target, and \[\begin{array}{rcl} P(E \cap F) &=& \dfrac{1}{\pi} \displaystyle\int_{E \cap F} 1\,dx\,dy \\ &=& \dfrac{\mbox{Area}\,(E \cap F)}{\pi} \\ &=& \left(\dfrac{\mbox{Area}\,(E)}{\pi}\right)\left(\dfrac{\mbox{Area}\,(F)}{\pi}\right) \\ &=& \left(\dfrac{1}{\pi} \displaystyle\int_E 1\,dx\,dy\right) \left(\dfrac{1}{\pi} \displaystyle\int_F 1\,dx\,dy\right) \\ &=& P(E)P(F)\ , \end{array}\] so that \(E\) and \(F\) are independent. What makes this work is that the events \(E\) and \(F\) are described by restricting different coordinates. This idea is made more precise below.

Joint Density and Cumulative Distribution Functions

In a manner analogous with discrete random variables, we can define joint density functions and cumulative distribution functions for multi-dimensional continuous random variables.

Definition \(\PageIndex{1}\)

Let \(X_1,~X_2, \ldots,~X_n\) be continuous random variables associated with an experiment, and let \(\bar{X} = (X_1,~X_2, \ldots,~X_n)\). Then the joint cumulative distribution function of \(\bar{X}\) is defined by \[F(x_1, x_2, \ldots, x_n) = P(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n)\ .\] The joint density function of \(\bar{X}\) satisfies the following equation: \[F(x_1, x_2, \ldots, x_n) = \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} \cdots \int_{-\infty}^{x_n} f(t_1, t_2, \ldots, t_n)\,dt_n\,dt_{n-1} \cdots dt_1\ .\]

It is straightforward to show that, in the above notation, \[f(x_1, x_2, \ldots, x_n) = \frac{\partial^n F(x_1, x_2, \ldots, x_n)}{\partial x_1\,\partial x_2 \cdots \partial x_n}\ . \tag{4.4}\]

Independent Random Variables

As with discrete random variables, we can define mutual independence of continuous random variables.

Definition \(\PageIndex{2}\)

Let \(X_1\), \(X_2\), …, \(X_n\) be continuous random variables with cumulative distribution functions \(F_1(x),~F_2(x), \ldots,~F_n(x)\). Then these random variables are mutually independent if \[F(x_1, x_2, \ldots, x_n) = F_1(x_1)F_2(x_2) \cdots F_n(x_n)\] for any choice of \(x_1, x_2, \ldots, x_n\).
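As a quick numerical illustration of this definition (a minimal sketch added here, not part of the original text), we can estimate a joint cumulative distribution function by Monte Carlo and compare it with the product of the marginals. The choice of distributions, the test point \((a, b)\), and the sample size are all arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Two independent draws: X1 uniform on [0, 1], X2 exponential with lambda = 1.
x1 = rng.random(n)
x2 = rng.exponential(1.0, n)

a, b = 0.3, 0.8   # an arbitrary test point (x1, x2)

joint = np.mean((x1 <= a) & (x2 <= b))         # estimate of F(a, b)
product = np.mean(x1 <= a) * np.mean(x2 <= b)  # estimate of F1(a) * F2(b)

print(joint, product)  # the two estimates agree up to sampling error
```

Because the draws are independent, the two printed values differ only by sampling noise; for dependent variables (such as \(X_1\) and \(X_3\) in the example below) the product test fails at suitable test points.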
Thus, if \(X_1,~X_2, \ldots,~X_n\) are mutually independent, then the joint cumulative distribution function of the random variable \(\bar{X} = (X_1, X_2, \ldots, X_n)\) is just the product of the individual cumulative distribution functions. When two random variables are mutually independent, we shall say more briefly that they are independent. Using Equation 4.4, the following theorem can easily be shown to hold for mutually independent continuous random variables.

Theorem \(\PageIndex{1}\)

Let \(X_1\), \(X_2\), …, \(X_n\) be continuous random variables with density functions \(f_1(x),~f_2(x), \ldots,~f_n(x)\). Then these random variables are mutually independent if and only if \[f(x_1, x_2, \ldots, x_n) = f_1(x_1)f_2(x_2) \cdots f_n(x_n)\] for any choice of \(x_1, x_2, \ldots, x_n\).

Example \(\PageIndex{5}\):

In this example, we define three random variables, \(X_1,\ X_2\), and \(X_3\). We will show that \(X_1\) and \(X_2\) are independent, and that \(X_1\) and \(X_3\) are not independent. Choose a point \(\omega = (\omega_1,\omega_2)\) at random from the unit square. Set \(X_1 = \omega_1^2\), \(X_2 = \omega_2^2\), and \(X_3 = \omega_1 + \omega_2\). Find the joint distributions \(F_{12}(r_1,r_2)\) and \(F_{13}(r_1,r_3)\).

Solution

We have already seen (see Example 2.13) that \[F_1(r_1) = P(-\infty < X_1 \leq r_1) = \sqrt{r_1}\ , \qquad \mbox{if}\ 0 \leq r_1 \leq 1\ ,\] and similarly, \[F_2(r_2) = \sqrt{r_2}\ ,\] if \(0 \leq r_2 \leq 1\). Now we have (see Figure \(\PageIndex{1}\)) \[\begin{array}{rcl} F_{12}(r_1,r_2) &=& P(X_1 \leq r_1\ \mbox{and}\ X_2 \leq r_2) \\ &=& P(\omega_1 \leq \sqrt{r_1}\ \mbox{and}\ \omega_2 \leq \sqrt{r_2}) \\ &=& \mbox{Area}\,(E_1) \\ &=& \sqrt{r_1}\,\sqrt{r_2} \\ &=& F_1(r_1)F_2(r_2)\ . \end{array}\] In this case \(F_{12}(r_1,r_2) = F_1(r_1)F_2(r_2)\), so that \(X_1\) and \(X_2\) are independent. On the other hand, if \(r_1 = 1/4\) and \(r_3 = 1\), then (see Figure \(\PageIndex{2}\)) \[\begin{array}{rcl} F_{13}(1/4,1) &=& P(X_1 \leq 1/4,\ X_3 \leq 1) \\ &=& P(\omega_1 \leq 1/2,\ \omega_1 + \omega_2 \leq 1) \\ &=& \mbox{Area}\,(E_2) \\ &=& \dfrac12 - \dfrac18 \;=\; \dfrac38\ . \end{array}\] Now recalling that \[F_3(r_3) = \left\{ \matrix{ 0, & \mbox{if}\ r_3 < 0, \cr (1/2)r_3^2, & \mbox{if}\ 0 \leq r_3 \leq 1, \cr 1 - (1/2)(2 - r_3)^2, & \mbox{if}\ 1 \leq r_3 \leq 2, \cr 1, & \mbox{if}\ 2 < r_3, \cr} \right.\] (see Example 2.14), we have \(F_1(1/4)F_3(1) = (1/2)(1/2) = 1/4 \ne 3/8\). Hence, \(X_1\) and \(X_3\) are not independent random variables. A similar calculation shows that \(X_2\) and \(X_3\) are not independent either.

Although we shall not prove it here, the following theorem is a useful one. The statement also holds for mutually independent discrete random variables. A proof may be found in Rényi.\(^{17}\)

Theorem \(\PageIndex{2}\)

Let \(X_1, X_2, \ldots, X_n\) be mutually independent continuous random variables and let \(\phi_1(x), \phi_2(x), \ldots, \phi_n(x)\) be continuous functions. Then \(\phi_1(X_1),\) \(\phi_2(X_2), \ldots, \phi_n(X_n)\) are mutually independent.

Independent Trials

Using the notion of independence, we can now formulate for continuous sample spaces the notion of independent trials (see Definition 4.5).

Definition

A sequence \(X_1\), \(X_2\), …, \(X_n\) of random variables \(X_i\) that are mutually independent and have the same density is called an independent trials process.

As in the case of discrete random variables, these independent trials processes arise naturally in situations where an experiment described by a single random variable is repeated \(n\) times, as in the sketch below.
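The following minimal Python sketch (added here for illustration; the exponential density and the parameter values are arbitrary choices, not from the text) simulates one realization of an independent trials process: \(n\) mutually independent draws from a single common density.

```python
import numpy as np

rng = np.random.default_rng(42)

# An independent trials process: n mutually independent draws, all with the
# same exponential density (lambda = 2, so the common mean is 1/2).
lam, n = 2.0, 8
trials = rng.exponential(scale=1.0 / lam, size=n)

print(trials)         # one realization (X_1, ..., X_n)
print(trials.mean())  # for large n this sample mean approaches 1/lambda
```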
We consider next an example which involves a sample space with both discrete and continuous coordinates.

Beta Density

For this example we shall need a new density function called the beta density. This density has two parameters, \(\alpha\) and \(\beta\), and is defined by \[B(\alpha,\beta,x) = \left\{ \matrix{ (1/B(\alpha,\beta))\,x^{\alpha - 1}(1 - x)^{\beta - 1}, & \mbox{if}\ 0 \leq x \leq 1, \cr 0, & \mbox{otherwise.} \cr} \right.\] Here \(B(\alpha,\beta)\) is the normalizing constant, chosen so that the density has integral 1: \[B(\alpha,\beta) = \int_0^1 x^{\alpha - 1}(1 - x)^{\beta - 1}\,dx\ .\]

Example \(\PageIndex{6}\):

In medical problems it is often assumed that a drug is effective with a probability \(x\) each time it is used and the various trials are independent, so that one is, in effect, tossing a biased coin with probability \(x\) for heads. Before further experimentation, you do not know the value of \(x\), but past experience might give some information about its possible values. It is natural to represent this information by sketching a density function to determine a distribution for \(x\). Thus, we are considering \(x\) to be a continuous random variable, which takes on values between 0 and 1. If you have no knowledge at all, you would sketch the uniform density. If past experience suggests that \(x\) is very likely to be near 2/3, you would sketch a density with maximum at 2/3 and a spread reflecting your uncertainty in the estimate of 2/3. You would then want to find a density function that reasonably fits your sketch. The beta densities provide a class of densities that can be fit to most sketches you might make. For example, for \(\alpha > 1\) and \(\beta > 1\) the beta density is bell-shaped, with the parameters \(\alpha\) and \(\beta\) determining its peak and its spread.

Assume that the experimenter has chosen a beta density to describe the state of his knowledge about \(x\) before the experiment. Then he gives the drug to \(n\) subjects and records the number \(i\) of successes. The number \(i\) is a discrete random variable, so we may conveniently describe the set of possible outcomes of this experiment by referring to the ordered pair \((x, i)\). We let \(m(i|x)\) denote the probability that we observe \(i\) successes given the value of \(x\). By our assumptions, \(m(i|x)\) is the binomial distribution with probability \(x\) for success: \[m(i|x) = b(n,x,i) = {n \choose i} x^i(1 - x)^{n - i}\ .\] In particular, if the experimenter begins with the uniform density on \([0,1]\) (the beta density with \(\alpha = \beta = 1\)) and observes \(i\) successes in \(n\) trials, one can show that the probability of a success on the next trial is \((i + 1)/(n + 2)\).

Example \(\PageIndex{7}\): (Two-armed bandit problem)

You are in a casino and confronted by two slot machines. Each machine pays off either 1 dollar or nothing. The probability that the first machine pays off a dollar is \(x\) and that the second machine pays off a dollar is \(y\). We assume that \(x\) and \(y\) are random numbers chosen independently from the interval \([0,1]\) and unknown to you. You are permitted to make a series of ten plays, each time choosing one machine or the other. How should you choose to maximize the number of times that you win?

One strategy that sounds reasonable is to calculate, at every stage, the probability that each machine will pay off and choose the machine with the higher probability. Let win(\(i\)), for \(i = 1\) or 2, be the number of times that you have won on the \(i\)th machine. Similarly, let lose(\(i\)) be the number of times you have lost on the \(i\)th machine. Then, from Example \(\PageIndex{6}\), the probability \(p(i)\) that you win if you choose the \(i\)th machine is \[p(i) = \frac{\mbox{win}\,(i) + 1}{\mbox{win}\,(i) + \mbox{lose}\,(i) + 2}\ .\]
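This strategy is easy to simulate. The following minimal Python sketch (added for illustration; the session length, number of sessions, and random seed are arbitrary choices) plays ten-play sessions with hidden payoff probabilities drawn uniformly from \([0,1]\), always choosing the machine with the larger \(p(i)\) from the formula above, and reports the average number of wins per session.

```python
import numpy as np

rng = np.random.default_rng(7)

def play(n_plays=10):
    """One session: always choose the machine with the higher p(i)."""
    x = rng.random(2)                      # hidden payoff probabilities x, y
    win = np.zeros(2)
    lose = np.zeros(2)
    total = 0
    for _ in range(n_plays):
        p = (win + 1) / (win + lose + 2)   # p(i) from the formula above
        i = int(np.argmax(p))              # machine with the higher p(i)
        if rng.random() < x[i]:            # this play pays off
            win[i] += 1
            total += 1
        else:
            lose[i] += 1
    return total

# Average winnings over many simulated ten-play sessions.
print(np.mean([play() for _ in range(20_000)]))
```

Ties in `np.argmax` go to the first machine; with \(x\) and \(y\) uniform on \([0,1]\), the average winnings per session come out somewhat above 5, reflecting the strategy's tendency to lock onto the better machine.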