Probability Mass Functions and Density Functions In An Experimental Design

Afza.Malik GDA

Probability Calculation and Research Design 

Probability mass functions and density functions In An Experimental Design, Reading a pdf, Probability calculations.


A probability mass function (pmf) is just a full description of the possible outcomes and their probabilities for some discrete random variable. In some situations it is written in simple list form, giving f(x) for each possible value x, where f(x) is the probability that random variable X takes on value x, with f(x) = 0 implied for all other x values. Such a list is a valid probability distribution when each probability is between 0 and 1 and the sum of all of the probabilities is 1.00. In other cases we can use a formula for f(x), e.g.,

$$f(x) = \frac{4!}{x!\,(4-x)!}\, p^x (1-p)^{4-x}, \quad x = 0, 1, 2, 3, 4,$$

which is the so-called binomial distribution with parameters 4 and p. It is not necessary to understand the mathematics of this formula for this course, but if you want to try you will need to know that the exclamation mark symbol is pronounced “factorial” and that r! represents the product of all the integers from 1 to r. As an exception, 0! = 1.

    This particular pmf represents the probability distribution for getting x “successes” out of 4 “trials” when each trial has a success probability of p independently. This formula is a shortcut for the five different possible outcome values. 
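As a rough illustration of how the formula expands into the five list-form probabilities, here is a minimal Python sketch. It assumes the standard binomial form f(x) = 4!/(x!(4-x)!) p^x (1-p)^(4-x); the function name `binomial_pmf` and the value p = 0.3 are hypothetical choices made only for this example.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    # comb(n, x) equals n! / (x! * (n - x)!)
    return comb(n, x) * p**x * (1 - p) ** (n - x)


p = 0.3  # hypothetical success probability, chosen only for illustration

# List form of the pmf: one probability for each of the five outcomes 0..4
pmf_list = {x: binomial_pmf(x, 4, p) for x in range(5)}

for x, prob in pmf_list.items():
    print(f"f({x}) = {prob:.4f}")

# A valid pmf must sum to 1
print("sum =", round(sum(pmf_list.values()), 10))
```

Changing p changes the five probabilities, but their sum stays at 1.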

    If you prefer you can calculate out the five different probabilities and use the first form for the pmf. Another example is the so-called geometric distribution, which represents the outcome for an experiment in which we count the number of independent trials until the first success is seen. The pmf is:

$$f(x) = p\,(1-p)^{x-1}, \quad x = 1, 2, 3, \ldots$$

and it can be shown that this is a valid distribution with the sum of this infinitely long series equal to 1.00 for any value of p between 0 and 1. This pmf cannot be written in the list form. (Again the mathematical details are optional.)
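The claim that the infinite series sums to 1 can be checked numerically, at least approximately. The sketch below assumes the usual form of the geometric pmf, f(x) = p(1-p)^(x-1) for x = 1, 2, 3, ..., where x counts trials up to and including the first success. The helper names and p = 0.2 are hypothetical.

```python
def geometric_pmf(x: int, p: float) -> float:
    """Probability that the first success occurs on trial x (x = 1, 2, 3, ...)."""
    return p * (1 - p) ** (x - 1)


def partial_sum(n_terms: int, p: float) -> float:
    """Sum of the first n_terms probabilities of the geometric pmf."""
    return sum(geometric_pmf(x, p) for x in range(1, n_terms + 1))


p = 0.2  # hypothetical success probability, for illustration only

# The partial sums climb toward 1 as more terms are included
for n_terms in (5, 20, 100):
    print(n_terms, partial_sum(n_terms, p))
```

Each partial sum equals 1 - (1-p)^n, which is why the list can never be written out in full yet still totals 1 in the limit.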

    By definition a random variable takes on numeric values (i.e., it maps real experimental outcomes to numbers). Therefore it is easy and natural to think about the pmf of any discrete quantitative experimental variable, whether it is explanatory or outcome. 

    For categorical experimental variables, we do not need to assign numbers to the categories, but we always can do that, and then it is easy to consider that variable as a random variable with a finite pmf. Of course, for nominal categorical variables the order of the assigned numbers is meaningless, and for ordinal categorical variables it is most convenient to use consecutive integers for the assigned numeric values.

“Probability mass functions apply to discrete outcomes. A pmf is just a list of all possible outcomes for a given experiment and the probabilities for each outcome.”
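The two validity conditions for a list-form pmf (every probability between 0 and 1, and a total of 1) are easy to check by computer. The following is a minimal sketch; the function `is_valid_pmf` and both example lists are hypothetical and are not taken from the text.

```python
from math import isclose


def is_valid_pmf(pmf: dict) -> bool:
    """Check that every probability lies in [0, 1] and that they sum to 1."""
    probs = list(pmf.values())
    in_range = all(0 <= p <= 1 for p in probs)
    # isclose tolerates tiny floating-point rounding in the sum
    return in_range and isclose(sum(probs), 1.0)


# Hypothetical list-form pmf: outcome value -> probability
example_pmf = {1: 0.25, 2: 0.35, 3: 0.40}

# Hypothetical counterexample: the probabilities add up to more than 1
not_a_pmf = {1: 0.5, 2: 0.6}

print(is_valid_pmf(example_pmf))
print(is_valid_pmf(not_a_pmf))
```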

    For continuous random variables, we use a somewhat different method for summarizing all of the information in a probability distribution. 

    This is the probability density function (pdf), usually represented as “f(x)”, which does not represent probabilities directly but from which the probability that the outcome falls in a certain range can be calculated using integration from calculus. (If you don't remember integration from calculus, don't worry, it is OK to skip over the details.)

    One of the simplest pdf's is that of the uniform distribution, where all real numbers between a and b are equally likely and numbers less than a or greater than b are impossible. The pdf is:

$$f(x) = \frac{1}{b-a}, \quad a \le x \le b,$$

with f(x) = 0 for all other x. The general probability formula for any continuous random variable is

$$\Pr(t \le X \le u) = \int_t^u f(x)\,dx.$$

    In this formula ∫ · dx means that we must use calculus to carry out integration. Note that we use capital X for the random variable in the probability statement because this refers to the potential outcome of an experiment that has not yet been conducted, while the formulas for pdf and pmf use lower case x because they represent calculations done for each of several possible outcomes of the experiment. 

    Also note that, in the pdf but not the pmf, we could replace either or both ≤ signs with < signs because the probability that the outcome is exactly equal to t or u (to an infinite number of decimal places) is zero.

So for the continuous uniform distribution, for any a ≤ t ≤ u ≤ b,

$$\Pr(t \le X \le u) = \int_t^u \frac{1}{b-a}\,dx = \frac{u-t}{b-a}.$$

    You can check that this always gives a number between 0 and 1, and the probability of any individual outcome (where u=t) is zero, while the probability that the outcome is some number between a and b is 1 (t=a, u=b). You can also see that, e.g., the probability that X is in the middle third of the interval from a to b is 1/3, etc. 
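These checks can be run directly. The sketch below uses the standard result for the continuous uniform distribution, Pr(t ≤ X ≤ u) = (u - t)/(b - a) for a ≤ t ≤ u ≤ b. The function name and the endpoints a = 2, b = 8 are hypothetical.

```python
def uniform_interval_prob(t: float, u: float, a: float, b: float) -> float:
    """Pr(t <= X <= u) for X uniform on [a, b], requiring a <= t <= u <= b."""
    if not (a <= t <= u <= b):
        raise ValueError("requires a <= t <= u <= b")
    return (u - t) / (b - a)


a, b = 2.0, 8.0  # hypothetical endpoints, for illustration only
third = (b - a) / 3

print(uniform_interval_prob(5.0, 5.0, a, b))                    # single point (u = t)
print(uniform_interval_prob(a, b, a, b))                        # whole interval (t = a, u = b)
print(uniform_interval_prob(a + third, a + 2 * third, a, b))    # middle third
```

The three printed values correspond to the three cases in the paragraph above: a single point, the whole interval, and the middle third.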

    Of course, there are many interesting and useful continuous distributions other than the continuous uniform distribution. Some other examples are given below. Each is fully characterized by its probability density function.

Reading a pdf

    In general, we often look at a plot of the probability density function, f(x), vs. the possible outcome values, x. This plot is high in the regions of likely outcomes and low in less likely regions. The well-known standard Gaussian distribution (see 3.2) has a bell-shaped graph centered at zero with about two thirds of its area between x = -1 and x = +1 and about 95% between x = -2 and x = +2. But a pdf can have many different shapes.
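The "about two thirds" and "about 95%" figures can be reproduced with a few lines of Python. This sketch relies on the well-known identity that the standard Gaussian cumulative probability up to x equals 0.5(1 + erf(x/√2)); the helper names are hypothetical.

```python
from math import erf, sqrt


def std_normal_cdf(x: float) -> float:
    """Area under the standard Gaussian pdf to the left of x."""
    return 0.5 * (1 + erf(x / sqrt(2)))


def area_between(lo: float, hi: float) -> float:
    """Area under the standard Gaussian pdf between lo and hi."""
    return std_normal_cdf(hi) - std_normal_cdf(lo)


print(round(area_between(-1, 1), 4))  # close to two thirds
print(round(area_between(-2, 2), 4))  # close to 95%
```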

    It is worth understanding that many pdf's come in “families” of similarly shaped curves. These various curves are named or “indexed” by one or more numbers called parameters. 

    For example, the family of Gaussian (also called Normal) distributions is indexed by the mean and variance (or standard deviation) of the distribution. The t-distributions, which are all centered at 0, are indexed by a single parameter called the degrees of freedom. The chi-square family of distributions is also indexed by a single degrees of freedom value. The F distributions are indexed by two degrees of freedom numbers, designated the numerator and denominator degrees of freedom.

    In this course we will not do any integration. We will use tables or a computer program to calculate probabilities for continuous random variables. We don't even need to know the formula of the pdf because the most commonly used formulas are known to the computer by name. Sometimes we will need to specify degrees of freedom or other parameters so that the computer will know which pdf of a family of pdf's to use.
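As one example of a distribution being "known to the computer by name", Python's standard library provides `statistics.NormalDist`, which takes the mean and standard deviation as parameters. The sketch below is only an illustration; the mean of 100 and standard deviation of 15 are hypothetical values. Families such as the t, chi-square, and F distributions are not in the standard library and would need a statistics package such as SciPy.

```python
from statistics import NormalDist

# Pick one member of the Gaussian family by giving its parameters.
# The mean and standard deviation here are hypothetical.
dist = NormalDist(mu=100, sigma=15)

# Probability that the outcome falls between 85 and 115,
# i.e. within one standard deviation of the mean
prob_within_one_sd = dist.cdf(115) - dist.cdf(85)

print(round(prob_within_one_sd, 4))
```

No integration is written out by hand; the `cdf` method does that work once the distribution and its parameters have been named.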

    Despite our heavy reliance on the computer, getting a feel for the idea of a probability density function is critical to the level of understanding of data analysis and interpretation required in this course. 

    At a minimum you should realize that a pdf is a curve with outcome values on the horizontal axis, and that the vertical height of the curve tells which values are likely and which are not. The total area under the curve is 1.0, and the area under the curve between any two “x” values is the probability that the outcome will fall between those values.
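To make the "area under the curve" idea concrete, the following sketch approximates areas under the standard Gaussian pdf by adding up many thin rectangles (the midpoint rule). It is a rough numerical illustration, not how statistical software necessarily does the calculation; the function names, the number of slices, and the choice of -10 to 10 as a stand-in for the whole real line are assumptions.

```python
from math import exp, pi, sqrt


def std_normal_pdf(x: float) -> float:
    """Height of the standard Gaussian curve at x."""
    return exp(-x * x / 2) / sqrt(2 * pi)


def area_under(pdf, lo: float, hi: float, n: int = 10_000) -> float:
    """Approximate the area under pdf between lo and hi with n thin rectangles."""
    width = (hi - lo) / n
    # Each rectangle's height is the pdf evaluated at the middle of its slice
    return sum(pdf(lo + (i + 0.5) * width) for i in range(n)) * width


total_area = area_under(std_normal_pdf, -10, 10)  # nearly the whole curve
middle_area = area_under(std_normal_pdf, -1, 1)   # Pr(-1 <= X <= 1)

print(round(total_area, 6))
print(round(middle_area, 4))
```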

    For continuous random variables, we calculate the probability that the outcome falls in some interval, not that the outcome exactly equals some value. This calculation is normally done by a computer program which uses integral calculus on a “probability density function.” 

Probability calculations

    This section reviews the most basic probability calculations. It is worthwhile, but not essential, to become familiar with these calculations. For many readers, the boxed material may be sufficient. You won't need to memorize any of these formulas for this course.

    Remember that in probability theory we don't worry about where probability assignments (a pmf or pdf) come from. Instead we are concerned with how to calculate other probabilities given the assigned probabilities. Let's start with calculation of the probability of a "complex" or "compound" event that is constructed from the simple events of a discrete random variable.

    For example, if we have a discrete random variable that is the number of correct answers that a student gets on a test of 5 questions, i.e., an integer in the set {0, 1, 2, 3, 4, 5}, then we could be interested in the probability that the student gets an even number of questions correct, or fewer than 2, or more than 3, or between 3 and 4, etc. 

    All of these probabilities are for outcomes that are subsets of the sample space of all 6 possible “elementary” outcomes, and all of these are the union (joining together) of some of the 6 possible “elementary” outcomes. In the case of any complex outcome that can be written as the union of some other disjoint (non-overlapping) outcomes, the probability of the complex outcome is the sum of the probabilities of the disjoint outcomes. To complete this example look at Table 3.1 which shows assigned probabilities for the elementary outcomes of the random variable we will call T (the test outcome) and for several complex events.
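The addition rule for disjoint outcomes can be sketched in a few lines of Python. The probabilities assigned to T below are hypothetical, invented only for this illustration; they are not the values from Table 3.1. Exact fractions are used so the sums involve no rounding.

```python
from fractions import Fraction

# Hypothetical pmf for T, the number of correct answers out of 5.
# These values are made up for illustration and are NOT from Table 3.1.
pmf_T = {
    0: Fraction(1, 20),
    1: Fraction(2, 20),
    2: Fraction(4, 20),
    3: Fraction(6, 20),
    4: Fraction(5, 20),
    5: Fraction(2, 20),
}


def prob(event) -> Fraction:
    """Add the probabilities of the elementary outcomes that make up the event."""
    return sum((p for t, p in pmf_T.items() if event(t)), Fraction(0))


print("Pr(T even)        =", prob(lambda t: t % 2 == 0))
print("Pr(T < 2)         =", prob(lambda t: t < 2))
print("Pr(T > 3)         =", prob(lambda t: t > 3))
print("Pr(3 <= T <= 4)   =", prob(lambda t: 3 <= t <= 4))
```

Each complex event is just a set of elementary outcomes, so its probability is the sum of their individual probabilities.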

 Disjoint addition rule

You should think of the probability of a complex event such as T.
