ABOUT THE BOOK INTRODUCTION TO 
PROBABILITY AND  
Probability theory has a very long history dating back to the 
seventeenth century. It is a well-established branch of 
mathematics that has applications in every area of human STOCHASTIC PROCESSES
discipline and daily experiences.
WITH APPLICATIONS
This is an introductory textbook dealing with probability and 
stochastic processes. It is designed for undergraduate and 
postgraduate students in Statistics, Mathematics, the Physical 
and Social Sciences, Engineering and Computer Science. It Marginal Distribution of pH
presents a thorough treatment of probability and stochastic 
ideas and methods necessary for a firm understanding of the 
subject. The text can be used in a variety of course lengths, 1
levels, and areas of emphasis. ----------1/ \ p 7 S
_ 0 0 28125 0.J2S125
0.J3984J75I
 / ^ -- ----T
The material is divided into three parts. The first part covers V
Seffetl
basic probability topics for undergraduate students. The second 
part covers advanced probability topics that are of interest to 
postgraduate students while the third part deals with topics in Thr function P"".
stochastic processes that are taught both at undergraduate and 
postgraduate levels.
Very little statistical background is assumed in order to obtain 
full benefits from the use of the text. Also, numerous examples 
and practice questions are included to aid understanding of all 
the subject areas covered by the book.
The publication of this book is a demonstration of our 
commitment to the provision o f relevant and current materials 
for Statistics students in higher institutions of learning.
FASCO PUBLISHERS
ISBN 978- 9 78- 52890- 0 - 8
UNIVERSITY OF IBADAN LIBRARY
ilN l k o d u c t i o n  t o  p r o b a b i l it y  AND STOCHASTIC PROCESSES 
(WITH APPLICATIONS)
FOREW ARD
(c) 2014 by Shittu, Olanrewaju 1., Otekunrin, Oluwaseun O., 
This impressive book by Shittu O.I., Otekunrin O. A., Udomboso C. G., and 
Udomboso, Christopher G., Adepoju, Kazeem A.
Adepoju K A. (o f the Department o f Statistics, University o f Ibadan, Nigeria) 
encompasses the essence o f probability, and stochastic processes under a 
hirst Published: May, 2014 common shade. The authors are to be commended for their lucid presentation 
as well as their broad coverage o f  the subject matter.
All rights reserved.
A special feature o f the book is that the exercises and the test form an 
Vo part o f  this publication may be reproduced or transmitted in any form integrated pattern. These exercises are designed to encourage the student to 
or by any means, electronic or mechanical, including photocopy, reread the text, practice them and become thoroughly familiar with the 
recording, or any information storage or retrieval system, without techniques described. This will help in impressing on the student the methods 
permission in writing from  the publisher. and logic o f  establishing the techniques.
Student involved in statistics-oriented discipline, and professional statistician in 
Fasco Publishers, 67 Gbadebo Street, Mokola. Ibadan. a wide variety o f fields will find this book a highly useful volume for study, 
Tel: 08032934309, 08051036056 and application. This is a scholarly undertaking by the four authors, and l have 
a full appreciation for the job nicely done.
ISBN 9 7 8 - 9 7 8 - 52890 - 0 - 8 G.N. Amahia (Ph.D)
Professor o f  Statistics 
University o f Ibadan 
Pnnted in Ibadan by Fasco Printing Works Ibadan.
iii
UNIVERSITY OF IBADAN LIBRARY
P R E F A C E T A B L E  o f  c o n t e n t s
Probability theory has a very long history dating back to the seventeenth 
Foreward
century. It is a well-established branch o f mathematics that has applications in iii
every area o f  human discipline and daily experiences. Preface
iv
1 his is an introductory textbook dealing with probability and stochastic Contents v
processes. It is designed for undergraduate and postgraduate students in Chapter 1 - The Mathematics of Choice
Statistics. M athematics, the physical and social sciences, engineering and 1.0 Introduction
2
computer science. It presents a thorough treatment o f probability and stochastic 1.1 Fundamental Principle of Counting
ideas and methods necessary for a firm understanding o f the subject. The text 1.2 Permutation
can be usedin a variety o f  course lengths, levels, and areas o f emphasis. 1.3 Combination
1.4 Stirling Numbers of the Second Kind 16
Hie material is divided into three parts. The first part covers basic probability 1.5 Allocation and Matching Problems 28
topics for undergraduate students. The second part covers advanced probability Chapter 2 - Elements of Probability
topics that are o f  interest to postgraduate students while the third part deals 2.1 Introduction
with topics in stochastic processes that are taught both at undergraduate and 2 2 Definition of Terms and Concepts 8̂
postgraduate levels. 2.3 The Approaches to the definition of Probability
Very little statistical background is assumed in order to obtain fiillbenefits from 2.4 Probability of an event
the use o f the text. Also, numerous examples and practice questions are 2.5 Consequences of Probability Axioms
included to aid understanding o f all the subject areas covered by the book. 2.6 Rules of Probability
2.7 Venn Diagrams
2.8 The Principle of Inclusion and Exclusion
The publication o f  this book is a demonstration o f  our commitment to the •4
2.9 Conditional Probability and Independence 55
provision o f relevant and current materials for Statistics students in higher 2.10 Statistical Independence
institutions o f  learning o f  the authors.
( haptcr 3 - Conditional Probability and Bayes Theorem ()1
This text which cannot be said to be exhaustive was developed from the years 3.1 Conditional Probability
3.2 Independence
o f learning and teaching o f probability and stochastic processes. While we 
3.3 Bayes Theorem ^
claim responsibility for some errors that could have been made inadvertently in 3.4 Total Probability ^
this first edition, we welcome comments and objective criticisms from the users 67
o f  this hook.
iv v
UNIVERSITY OF IBADAN LIBRARY
Chapter 7 - Probability Generating Functions (PGf) 1-42
7.1 Introduction 142
Chapter 4 - Fundamentals of Probability Functions 78 7.2 Properties of PGF 142
4.1 Introduction 78 7.3 Probability Generating Functions approach for deriving Means and Variances
4.2 Probability Density Function (pdf) 78 of some Discrete Distributions 144
4.3 Distribution Function 82 7.4 Binomial Distribution 144
4.4 Distribution Function for Discrete Random Variables 85
4.5 Joint Distribution Function 87 Chapter 8 - Moment Generating Functions 147
4.5.1 Conditional Distribution of Jointly Distributed Random Variables 89 8.1 Moment Generating Function 152
4.6 Independence of Functions of Random Variables 90 8.2 m.g.f for Bivariate Distribu:ion 155
4.7 Functions of Random Variables 97 8.3 Obtaining Moments from m.g.f 156
Chapter 5 - Some Discrete Probability Distributions 102 Chapter 9 - Characteristic Functions 158
5.1 Bernoulli Trials and Binomial Distribution 102 9.1 The Characteristic Function (c.f.) 158
5.2 Binomial Distribution 103 9.2 Exponential Distribution 159
5.3 Poisson Distribution 107 9.3 Gamma Distribution 160
5.4 Properties of a Poisson Experiment 107 9.4 Characteristics Function of the Sum of Independent Random Variables 166
5.5 Mean and Variance of a Poisson Distribution 108 9.5 Some Special Probability Distribution 167
5.6 The Poisson Distribution as an Approximation to the Binomial Distribution 110 9.6 The Inversion Formula 169
5.7 Hypcrgeomctric Distribution 111
5.8 Mean and Variance of Hypergeometric Distribution 113 Chapter 10 - Measurable Function 175
5.9 Binomial Distribution as an approximation to the 1 Iypergeometric Distribution 116 10.1 Some definitions 175
5.10 Negative Binomial and Geometric Distributions 118 10.2 Obtaining Countable Class of Disjoint 1 SO
5.11 Negative Binomial Distribution 118 10.3 Abstract Model for Probability of an Event 175
5.12 Geometric Distribution 120 10.4 Axiom for Finite Probability Space 176
5.13 Multinomial Distribution 124 10.5 The Halley-De-Moivre Theorem 184
10.6 Probability Space 187
Chapter 6 - Some Continuous Probability Distributions 127 10.7 Sigma-Field (a -  Field) 189
6.0 Introduction 127 10.8 Borel Field 190
6.1 Normal Distribution 127 10.9 Random Variable in Measure Space 191
6.2 Exponential Distribution 128
6.3 Gamma Distribution 130 Chapter 11 - Limit Theorems and Law of Large Numbers 199
6.4 Pareto Distribution 134 11.1 Introduction 199
6.5 Maxwell Distribution 137 11.2 Concept of Limit 199
11.3 Markov’s Inequality 200
11.4 Bienayme-Chebyshev’s Inequality 201
vii
vi
UNIVERSITY OF IBADAN LIBRARY
11.5 Convergence of Random Variables 204 17.5 Compound Distributions 259
11.6 Laws of Large Number 204 17.6 Markov Chain 260
17.7 Stationarity Assumption 271
Chapter 12 - Principles of Convergence and Central limit Theorem 215 17.8 Absorbing Markov Chain 272
12.1 Introduction 215
12.2 Convergence of Random Variable 215 Chapter 18 - Equilibrium (Steady State) and Passage Time Probabilities 285
12.3 Cauchy-Schwarz Inequality 218 18.1 Introduction 285
12.4 Borel-Cantclli Lemma 219 18.2 Graph of Marginal Distribution of 286
12.5 The Central Limit Theorem 221 18.3 Stationary Distribution 288
12.6 The Central Limit Theorem 221 18.4 First-Passage and First-Return Probabilities 291
12.7 Strong Law of Large Numbers for Independent Random Variables 222 18.5 Distribution of Number of Steps for First Passage 293
12.8 Bolzano-Cauchy Criterion for Convergence 226 15.6 First Return (Recurrence) 294
12.9 First Borel-Cantelli Lemma 226
12.10 Second Borel-Cantelli Lemma 227 Chapter 19 - Chapman-Kolmogorov Equations and Classification of States 297
12.11 The Zero-One-Law 229 19.1 Introduction 297
12.12 Limit Theorems for Sums of Independent R.V’s Lindcberg-Levy Theorem 232 19.2 Classification of States 300
19.3 Discrete Time Process 303
Chapter 13 - Introduction to Brownian Motion 236 19.5 Continuous Time Process 310
13.1 Brownian Motion (Weiner Process) 236 19.6 The Exponential Process 312
13.2 Brownian Process 237
13.3 Multinomial Distribution and Gaussian Process 237 Chapter 20 - Introduction to the Theory of Games and Queuing Models 319
13.4 Properties of a Brownian motion (B. M) 239 20.1 Games Theory 319
20.2 Gambler’s Ruin 319
Chapter 16 - Introduction to Stochastic Processes 246
20.3 Queuing Theory 331
16.1 Basic Concepts 246 20.4 The Basic Queuing Process 335
16.2 Discrete-Time Markov Chains 247 20.5 Poisson Process and Exponential Distribution 336
16.3 Classification of General Stochastic Processes 249
20.6 Classification of Queuing System 337
16.4 Classical Type of Stochastic Processes 249 20.7 Poisson Queues 338
16.5 Markov Processes 251
Chapter 17 - Generating Functions and Markov Chains 254
17.1 Introduction 254
17.2 Basic Definitions and TailP robabilities 254
17.3 Moment-Generating Function 256
17.4 Convolutions 258
viii ix
UNIVERSITY OF IBADAN LIBRARY
PART ONE
UNIVERSITY OF IBADAN LIBRARY
C H A PT E R  1 This is because \A\ +  |S | counts every element o f A D B twice. Lei us illustrate this 
T H E  M A T H E M A T IC S OF C H O IC E with the following example.
1.0 Introduction
Many real life situations requires enumerating the number of possible ways of taking Example 2: If A =  (2.3, 4, 5, 6). |/1| = 5 and B = ( 3,4, 5 ,6 ,7). |/?| =  5 
a number of decisions out of many available ones, or the number of ways an event can then. |/1| + \B\ = 10 
occur, number o f possible outcomes of an experiment. All o f the above require the act A U B =  2 ,3 ,4 ,5 ,6 ,7  
of either counting, choosing, arranging or a combination of the above. Therefore, it is \A U B \ - 6
just apt to introduce the reader first, to some basic principles of counting. Since A and B are not disjoint. \A U B\ < \A \ 4- \B\
Compensating for this double counting yields the formula
1.1 Fundamental Principle of Counting j/1 U B | = |/1| + \B\ — \A n  B | ................eqn.U)
If one experiment can result in n possible outcomes an experiment can result in k 1-rum our example, A fl B = 3, 4 ,5 ,6  
possible outcomes, then nk is the total number of possible outcomes from the two |i4 n  B\ = 4 
experiments. \A u B \  = 5 + 5 - 4  
Consider a finite sequence of decisions. Suppose the number of choices for each =  6
individual decision is independent of decisions made previously in the sequence. Thus proving equation (1)
Then, the number o f ways to make the whole sequence o f decisions is the product of Theorem 1:\A U B U C\ =  \A\ + |S | + |C| - \ A n B \ -  |/1 n  C \ - \ B n C \  +
these numbers o f choices i.e. n! \A D B n  C\ for three sets A. B and C.
Proof:
Example 1: The number of four-letter words that can formed by rearranging the We know from equation (1) above that \A U B\ = \A \ 4- \B\ — \A n  B\
letters in the word PLAN is 4! =24. Then, for 3 sets. \A U B U C\ = \A U [B U C]|
PLAN PLNA PALN PANL PNLA PNAL = \A\ 4- \B UC| -  |/1 n  [B UC]|
LPAN LPNA LAPN LANP LNPA LNAP Applying equation (1) to \B U C| gives
APLN APNL ALPN ALNP AN PL ANLP \A U S u  C| =  \A\ +  [\B\ + \C\ -  |S n C |] - \ A n  [B U C ]|........... eqn (2)
NPLA NPAL NLP A NLAP NAPL NALP Because A n  [S U Cl =  {A D S ) U (/I n  C). we can apply equation(l) again to obtain
\A n  |S  u  CJ| = \A n  B\ + \A r\ C\ -  \A r\ B r\ C |........... eqn (3)
1.1.2 The Second Counting Principle (The Principle of Inclusion and Exclusion) Finally, a combination o f equations (2) and (3) yields^
If a set is the disjoint union of two (or more) subsets, then the number o f elements in \A U S U C| =  [\A\ + \B\ +  |C|] -  [\A n  S | +  \A D C| 4- |S  n  C|] +  \A n  B n 
the set is the sum of the numbers of elements in the subsets, i.e. C |........... eqn (4)
n(A U S) = n(/l) +  n (S ) implying that \A U B\ = \A\ 4- |S | if A and B arc disjoint. 
Theorem \:\A U B\ <  |/1| 4- \B\ if A and B are not disjoint.
2
UNIVERSITY OF IBADAN
LIBRARY
When we subtract the sum of the number o f elements in such pairwise intersections, 
Thus proving theorem 2. some elements may have been subtracted more than once. Those are the elements that 
From this derivation, we notice that an element o f A n  B n  C is counted 7 times in belong to at least three of the sets/1;. W!e add the sum of the elements of intersections 
equation(4). the First 3 times with a plus sign, then 3 times with a minus sign and then of the sets taken three at a time. (Mote: the condition i < j  < k ensures that every 
once more with a plus sign. intersection is counted only once)
Example 1.1: IF/l =  {1, 2 ,3 ,4}B =  {3, 4, 5, 6} C = {2,4, 6, 7} then The process continues with sums being alternately added or subtracted until we come 
A U fiU C  =  {1,2,3,4,5,6, 7} to the last term which is the intersection o f all sets A, thus proving the theorem.
|/1 U B U C\ = 7 ...........................(a) Let S = A  U A2 U ... .U A , and A f  = S\Ai lhen the PIE principle can also be 
expressed as
Ml = 4 
\B 1 = 4  Mic n ..... n  A ,c| =
\C 1 = 4
MI +  M I+ IC I =  12 +1l<k/<n |A  n  A l M i H A n / i J  +
/ i n e  =  3 ,4 , / i n c  = 2 , 4 , e n c  = 4,6 
-  ( - i ) " +1 In /ifl
In this example. |/1 n  B\ = \A n  C\ = \B n C| =  2 so that Example 1.2: Let A be the subset ol the first 700 hundred numbers
|/1 n  /?| +  |/1 n  CI + \B r c I =  6 and S  — {1.2,..... ,700} that are divisible bv7. Find the number o f elements in 5 that are
/ i n e n c  =  4 ,| 4 n sn c |  =  i not divisible by 7.
Iherel'orc. |/1 U /? U C| =  |/1| +  |Z?| +  |C| — |/ ln /? | — |/ l n C | — \B n  C\ + Solution:
\A n  B n C \ A =  {7,14,21 ,28 ,35 ,42 ,49.......}
=  1 2 - 6  + 1 =  7 ........................ (b) Ml =  ioo 
Thus, (a) = (b) thus establishing theorem 2. Mcl = \s\ -  Ml
Generally, the Principle of Inclusion and Exclusion (PIE) states that: =  7 0 0 -  ioo  
If A - A ..... ,A„ are Finite sets, the cardinality of their union =  600
IA  U A2 U ....U 4 n | =
Example 1.3: Find the number o f integers from 1 to 1000 that are not divisible by 5, 
n 6 and 8
Y M i l  -  y  \Ai n Aj\ +  V  |A n Aj n Ak\ -■ ■ ■  +  ( - i ) " +1 |n A I
4-J £—l \£i<j<n t—tl<i<j<k<n Solution: Let A - A . A  be the subset consisting of those integers that are divisible 
Proof: by 5. 6 and 8. Fhe number we are interested in is
On the left is the number o f elements in the union o f n sets. On the right, we First |A\ n  A [ n  A J| = 1000 -  |/J,| -  | A | -  |A | + |A  n  A2\ + 14, n  A2\ + |A  n  A | -
count elements in each o f the sets separately and add them up. If the sets A  are not | A n  A  n  A |
disjoint, the elements that belong to at least two of the sets A , or the intersections 
1 . 1 I 1000 1 ^  1000
A  n A , are counted more than once. We wish to consider every such intersection, but M, = 1------=200 |.A| = = 166 \AX\ = 100
L ^ J L  b = 125s ^
each only once. Since A, n At = Aj n  A , vve should consider only pairs (A , A )  vv't*1 
i < / .
4
UNIVERSITY OF IBADAN LIBRARY
Noli-. 1 lie results lor |,1,|. |.4,| and |# 3| were achieved by using the round down.
Definition 3: The number o f different ways to choose r  things from n things with 
notation [_ J which involves the dropping of the fractional part. replacement if  order does not matter is C(r -I- n -  1, /-)
To compute the number in a 2 and 3 -  set interaction, we use the least Example 1.5: How many different three letter “words” can be produced using the 
“alphabet” ALEXY?
common multiple (LCM) i.e.
1000 Solution: Since there are no restrictions on the number of times a letter can be used, 
K  °  A?\ = = j  j 53 =  125 such words can be formed.30
Example 1.6: At a fundraising luncheon, each o f 50 patrons is given a numbered
1000 = 25 ticket, while its duplicate is placed in a bowl from which prize-winning numbers will 
40 be drawn.
1000
|.-1,0/1, | = = 41 i) If' the prizes are #50, #100, and #150, how many outcomes are possible 24
assuming that winning tickets are not returned to the bowl.
1000
M = ii) If the prizes are the same, say, #70 each for the 3 prizes, how many outcomes 12
are possible assuming that winning tickets are not returned to the bowl?
Thus. |.< n .4 j[  = 1000-200-166-125 + 33 + 25 + 41 -8  = 600 iii) If the winning tickets are returned to the bowl, how many outcomes are 
possible when the prizes are as in (i) and (ii) respectively?
Important remarks:
Definition 1: The number of ordered selections o f r  elements chosen from an n- Solution:
element set is P(n, r). i) P (50 ,3) =  117600 different outcomes are possible
P (n ,r ) = (?i -  l) (n  -  2 ).... (n -  [?~ -  1]) ii) Since the prizes are the same, then order is not important implying that there 
=  n(/i -  l)(?t -  2 ).... (?i — r  +  1) are C(50,3) =  19600 different possible outcomes.
n! iii) a) Different prizes, with replacement, order matters: 503 =  125000 
b) Same prizes, with replacement, order does not matter:
=  r! C(n, r ) C(3 + 50 -  1,3) =  22100
Example 1.4: Suppose 6 members of Adeola’s School Parent Teachers Association 
Note: In choosing with replacement, elements may be chosen more than once. If order
meet to select a 3-member delegation to represent the association at a statewide 
does not matter, then we are only concerned with the multiplicity with which each 
convention. If the laws stipulate that each delegation be comprised of a delegate, a element is chosen.
first alternate and a second alternate. How many ways can the 6 members comply fable 1.1 summarizes the four scenarios that we have considered.
from among themselves?
Solution: P (6,3) =  120 ways or 3! C(6 , 3) =  120 ways
Definition 2: The number of ways o f making a sequence of r decisions each of which 
has n choices is n r if  order matters and the selection is with replacement.
6
7
UNIVERSITY OF IBADAN LIBRARY
Table 1.1
Order Matters O rder does not matter Example 1.8: How many different words of three letters can be formed with letters A, 
Without replacement P (n ,r) C(n, r) B. C. D, E and F no letter is repeated?
With replacement n r C (r +  n -  l , r ) Solution: The first letter can be arranged in 6 ways 
The second letter can be arranged in 5 ways 
1.3 Permutation: The ordered arrangement of n distinct items taking all or r of them The third letter can be arranged in 4 ways.
at a time is called permutation. The items are usually assumed to be arranged on a line Total number o f arrangement is 6 x 5 x 4 x 3.
without replacement such that if two of the r objects are interchanged, it results into Alternatively
different permutation (arrangement). 6p3 =  =  6 x 5 x 4  = 120 ways.
The number of permutation— o f n (nn
 ! items taking r at a time is denoted-r)! (A) Permutation of n things, not all of which are distinct.n P r fhc number o f permutations o f// things taking all at a time where p  of them 
This is the same as the number of ways, in which r spaces can be fill taking n are alike o f one kind, q are alike of another kind and r  alike of the third kind is
7l!
different items at a time. p'.q'.rl
The first place is filled in n way, the second (n  -  1) ways ... and r place if filled in 
Example 1.9: In how many ways can the letters o f the word STATISTICS be 
(n +  r  + 1) ways. This r places is filled in (a -  l ) ( n  -  2) ... (?i — l r  +  1)! ways. arranged.
••• vtpr =  n (n  — l) (n  -  2) ... (n -  r — 1)
Solution:
The number of permutations of n distinct items taking all at a time is 
T occurs 3 times 
n (n  -  l ) (n  -  2) ... 3.2.1. =  n! ways 
I occurs 2 times 
The symbol n! is called m factorial and we define 0! =  1.
s occurs 3 times.
So the number o f possible arrangement is 
Example 1.7: Evaluate SPs 
^ =  50400 ways.
Solution:
r = _(5i-3i )_1 (B) When certain things always o r never occur:
Sx4x3\x2\ (i) s item will always occur: Given n items to arrange taking r at a lime out of 
~  2 ! which S of them will always occur, keep aside the S  items and arrange the 
= 5 x 4 x 3  remaining (n -  s) items taking (r — s) at a time.
=  60 ways The S items can be arranged taking S’ at a time in rPs ways.
Example 1.4: If 12Pr =  17160, find/-. The total number o f permutations is n -  SPr_^X rPs.
Solution: 13Pr = 13(12)(11) ... (12 — r  +  1) = 13 (12)(11)(10) (ii) Never occur: Leave out the S  items and find the number of permutation of (n -  s) items taking r at a time. i.e.
=  13 - r  ••• r  =  3
8
9
UNIVERSITY OF IBADAN
LIBRARY
Example 1.10: A committee of 7 representative of a class consists o f class captain Solution:
and his deputy. On a visit to the Head-teacher there are four seats. How many ways If the two books are to stand together, regard the two books as one. then the number 
can the committee be seated it: of arrangement is 219Fg =  2 x 9 !  =  72560 ways. Number of arrangement without 
(i) there is no restriction restriction is P10 = 10! =  3628800 ways so the permutation when the two books are 
(ii) the class captain and his deputy must sit. not to stand together is
(iii) one o f the students committed a crime and can not sit down even if there are 1 0 1 -2 x 9 !
enough seats, and =  3628800 -7 2 5 7 6 0  
(iv) determine the probability of the event in (ii) and (iii) above. =  2903040 ways
Example 1.12: Letters o f the word “ARRANGE' are to be arranged. Find the 
Solution: probability if:
(i) When there is no restriction (i) two r’s do not occur together
n  =  7, r  =  4 (ii) if the two R’s and two A’s do occur together
7P4 =  =  7 x 6 x 5 x 4  =  840 ways
Solution:
(ii) keep aside the class captain and his deputy:
4p! ( n - 2 ) P r . 2 =  5/>2X4P2 =  ^ , ^ (i) Without restriction, number of arrangement's ~  =  1260 ways. When two
=  5 x  4 x  3 x 12 R 's occur together is ^  =  360 way when two R’s do not occur together is 
=  12 x  60 1 2 6 0 -3 6 0  =  900 ways.
=  720 ways P (two R 's not occur together) =  = 0.714
1260
(iii) Leave out the criminal then we have (ii) If two R’s and two A’s do occur together we have (A, A) (R. R) N G. E i.e.,
P5 =  5! =  120 ways.
= 6 x 5 x 4 x 3  (D) When the number of items not occurring together is more than two
= 360 ways Some kind o f logic would have to be applied here. It is better illustrated with 
an example.
(C) Perm utation when two things are not to occur together:
Procedure Example 1.13: In how many ways can 5 blue cars and 4 red cards be arranged in a 
straight car park two red cars are not to stand together.
(a) Find permutation without restriction
(b) Find permutation when two things occur together. Solution: First, the first 5 cards are positioned as indicated below 
(c) The difference between (a) and (b) gives the number of arrangement when two X B X B X B X B X B X
things do not occur together. The blue cars can be arranged in 5! ways. Now there are 6 vacant positions (marked 
Example 1.11: In how many ways can 10 different books be arranged on a shelf if X). The remaining 4 red cars can be arranged in P4 =  360 ways. The required 
two particular books are not to stand together? number o f ways o f parking 5 blue cars and 4 red cars is 5! X P4
10 11
UNIVERSITY OF IBADAN LIBRARY
The remaining 4 digits can be fitted in
=  120 x  360 P4 = 4! =  6 ways
=  43200 ways So, the total number o f numbers divisible by 2 =  24 x 2 =  48.
(E) When Items are repeated: ( d ) Suppose we have to form numbers which begin with 1 and end with 3. Here 
The number o f permutation of n different items taking r at a time, when each the first and the last places are fixed.
item may occur an number of times is nr . Then, the remaining 3 digits can be filled in.
Example: 1.14: A die is rolled 4 times what is the sample space.
1 2  4 5 3 
Solution: 1 2  5 4 3
A die has six faces, hence may occur in 6 ways. 1 4  2 5 3
The sample space is 1 4 5 2 3 = 6 ways
64 =  1296 1 5  2 4 3 
(F) Formation of numbers with digits: 1 5  4 2 3
The idea of permutation can be applied in the formation of numbers with digits. This (e) Suppose we have to form a number where 1 or 3 is in the beginning or the end. 
is particularly useful in a raffle draw. Let us illustrate with a simple case. Then the two digits can be arranged among themselves in 2! ways. Hence total 
number o f arrangement will be P3X 2 =  12 ways.
Example 1.15: Suppose the five digits 1, 2. 3, 4. 5 are given. To find the total number (0 Suppose we have to form numbers greater than 30,000. Here there should be 3 
of numbers which can be formed under different conditions. or 4 or 5 in ten thousand’s place which can be filled in 3 ways.
(a) Without restriction =  P5 =  5! =  120 ways. The remaining 4 digits tilled in 4! ways.
(b) Suppose 5 always occur in the tenth place. Now the tenth place is fixed, then Therefore, we have, i.e.
the remaining four places can be fitted with four digits as l \  = 4! =  24 ways. i.e. 3 1 2  4 5 
1 2 3 4 5 2 1 3  5 4 3 2 1 4 5 etc
1 2 4 5 3 2 1 4  5 3 i.e.. total number o f numbers
3 X P4 
1 3 2 5 4  2 3 1 5 4  x 2 = 2 4  ways = 3 X 2 4  = 72
1 3  4 5 2 2 3 4 5 1 Example 1.16: How many numbers can be formed with digits 1.2. 4, 0, 5 when any 
1 4 3 5 2  2 4 1 5 3 is not repeated in any number?
1 4  2 5 3 2 4 3 5 1 Solution: There are 5 digits in all including zero. The number of single digit numbers 
is Px. The number of two digit number is P2. Out of this, some have zero in the tenth
(c) Suppose we have to form a number divisible by 2. Then the unit's place must 
be occupied by 2 or 4 which can be arranged in 2 ways.
13
12
UNIVERSITY OF IBADAN LIBRARY
Example 1.18: Suppose the letters of the word STAPLER is given to form 
place and so reduces to one digit number. Hence the number of two digit numbers is words.
P2 -  P\- Similarly, the nutrber of three digit number is P3 -  P2. (a) If there is no restriction, the number of words is
The total number o f numbers is P7 =  7! = 5040 words.
Px + (P2 ~ Pi) + t o  “  Pz) + t o  -  Ps) + t o  ~ PJ (b) Suppose all words to be formed begins with S. The remaining 6 places can be 
4 +  16 +  48 +  96 +  96 filled in 6! = 720.
260 numbers. (c) Suppose all words to be formed begins with S or ends with E. The two 
positions can be filled in P2 = 2 ways. The other 6 digits can be filled in 
Example 1.17: P6 = 6! =  70 ways.
(i) Find the sum of all the numbers that can be formed with digits 1, 3, 4, 7, 5, 9 
Hence total number of words is 2 x 120 = 240 words.
taking all at a time. (d) If all words formed must begin with S  and end with E. The two places are now 
(ii) Find the probability o f having a number with 3 in the tenth place.
fixed. Then the remaining 5 places can be filled in 5! = 120 ways. Hence, 120 
words are formed.
Solution: (e) Suppose two vowels A and E are to stand together. Regard A and E as one
(i) We need to consider when each digit occupy a particular place. The number of 
a, E, STPLR
permutation when 1 is in the unit place is Ps = 5! =  120. The number of 
STPLR can be arranged among themselves in 6! = 120 ways.
permutation when any o f the given numbers occupy the unit place is also 
The two vowels can be arranged in 2 ways.
5! =  120 ways. Hence we can sum all the numbers in the unit place a
Hence the total number of words is 2 x 120 = 240 words.
120(29)* 1 =  3 480*  1 (0 If three particular letter are to occupy the even places. The first letter can be 
Similarly the sum of numbers in the 10th place is also
filled in 3 ways, the second in 3 ways and the third in 1 way. a total of 6 ways.
120(1 + 3 + 4 +  5 + 7 + 9) =  2480 * 10
Then, the remaining 4 letters can be filled in 4! = 24 ways. Hence, the total 
=  34800 number of words is 6 x 24 = 144
In the same manner, the sum of all the numbers is
3480 (100,000 + 10,000 +  1,000 +  100 +  10 + 1) (H) Ordered:
=  3430 (111111) =  386666280 Arrangement of items round a circle:
(ii) The number o f numbers taking all at a time without restriction is Things can be arranged round a circle in (i) clockwise and (ii) anti- clockwise 
P6 = 6! = 720 direction.
The number o f numbers when 3 occupy the tenth place is P5 = 120 
Pr (a number 3 in the tenth ^p lace) =  —720  =  0.1667
(G) Formation o f words with letters:
This is similar to what we illustrated in Formation o fn umbers with digits.
15
14
UNIVERSITY OF IBADAN LIBRARY
Example 1.19: In how many ways can 7 people sit round a circular dinning tabic
=  “ (7 — 1)!
Example 1.21: !n how many ways can a committee o f 5 be selected from amongst 6
=  360 ways
boys and 7 girls; if the committee must consist o f (i) 2 boys and 3 girls, (ii) at most 3 
(i) The number o f arrangements when the direction (clockwise or anticlockwise) boys?
is specified is (n — 1)!. This is because one of the items can be used as a Solution: There are a total o f 13 persons.
starting point.
(i) The total number o f combination is 2 boys can be selected from 6 boys in (®) 
(ii) When the direction o f arrangement is not specified is ^ (n -  1)! ways. ways.
3 girls can be selected from 7 girls in Q  ways.
Example 2.17: How many ways can 20 different beads be arranged to form a 
necklace? Total number o f combination is
=  2 (n  — 1)‘ ( 2)  (3)  = 15x35 = 525 ways
=  ^ (19!) ways (ii) There could be 0, 1 ,2  and maximum of 3 boys. Hence the total number of 
combination is
Example 1.20: A round table conference is to be held by 10 persons such that 2 
particular person may wish to sit together. ® © +® ® +© ® +© Q
Solution: Regard the 2 people as one. We now have 9 persons. The two persons can =  21 +  210 +  525 +  420 
be arranged in 2! ways. The 9 persons can be arranged in (9 -  1)! ways. The total =  1176 ways
number of arrangement is Example 1.22: A box contains 20 balls all o f which are o f the same size. 15 o f them
8! x 2! = 80640 ways are Red and 5 Black balls. 4 balls are selected at random from the box, find the 
probability o f having:
1.4 Combination (i) exactly 2 black balls.
The number o f arrangement or ‘selection’ of n different items taking some or (ii) at least 1 red ball
all of the number o f things at a time irrespective o f the order is referred to as 
Solution:
combination.
The number o f combination n things taking rat a time is denoted by (i) The first thing to do is to find the combination o f any 4 balls out o f 20 (i.e. 
sample space) ( ^ ) -
(n-r)!r!
Most of the problems on selection without replacement can be solved using Number o f ways o f choosing 2 black from 5 is Q .
combination approach.
Number of ways o f choosing the remaining 2 from 15 red balls is ( ^ ) .
Number o f outcomes of favour of the event is © ( ? )
fSVlSX
16 P(2 black anti 2 red balls) =  - =  0.217
( 4 )
17
UNIVERSITY OF IBADAN LIBRARY
Example 1.24: A certain examination consists of 12 questions divided into two parts 
(ii) The probability o f having at least 1 red ball is of 6 questions each. How many ways can a student choose any 8 questions if he must 
( y x a + t v T / M y K M f l G ) attempt exactly 5 questions from the first part?"_   75 +  1050 +  2 2 7 5 +t 1)365 Solution: From the first part, questions are selected in ^  =  6 ways.4845 In the second part, 3 questions are selected = 20 ways.
=  0.983
The required number is ^  ^  =  120 ways.
Combination when a particular thing must be included or not included
(A)
The number o f ways o f choosing r things out of n in which k particular thing
(i) (B) When all items arc alike and each of them may be disposed off in 2 ways:
always occur is In this situation, the item may be included or rejected. The total number of ways of 
The number of ways of choosing r  things out o f n which k particular thing disposing all things is 2 x  2 x  ...x n times =  2n. This include a case where all the 
(ii)
. f n  -  k \  items are rejected.
never occur is y r ) I lencc. the total number of ways in which one or more things are included is 2” -  1. 
This is equivalent to Q )  +  ( n ^  j )  ------f  ( j )
Example 1.23: 15 players were invited for a crucial football match. In how many
ways can 11 players be chosen if
(i) the skipper must be included Example 1.25: In how many ways can a student solve one or more questions out of 8 
(ii) a particular player is injured and must not be included.
in a paper?
(iii) player A must be included and player B must not be included.
Solution: The student may either solve a question or leave it (i.e. 2 ways). The total 
number o f ways of solving one or two or all the questions is
Solution:
(i) If the skipper is selected first, we have 14 players left to select the remaining 01
10 players. = 2 5 6 - 1  
/’14\
The required number is = 1001 ways. = 255 ways
( Note:ii) Remove the injured player, now select 11 from the remaining 14 players.
If it must include a case where none of the questions is solved, then the required 
The required number is /■14\ = 364 ways. number is
(iii) If we remove B and select player A ©+(?Mz)+-+®=28
Then required number is { ^ )  =  286 ways. =  256 ways
Example 1.26: How many different products can be formed with the letters a. b. c. d, 
e a n d /
19
18
UNIVERSITY OF IBADAN LIBRARY
Solution: =  5.36 x 1028
Solution: The number of ways in which one or more of the six letter (b) If the group of 13 cards are to be arranged, in how many ways can this be 
=  26 =  1 done?
Rut this includes a single letter which is not a product. Hence, the number of products Solution:(-1032!1)4 =  1.28x 103°
i.e. 26 -  6 -  1 =  57.
Example 1.29: How many ways can 18 books be divided?
(C) When some items are alike and each of them can be disposed in a way: (i) equally or
Given n = [p + q + r  + s + — ] items out of which p. q, r, s of them are alike and (ii) in ratio 1:2:3
p  can be chosen in (p +  1) ways
q can be chosen in (c/ + 1) ways Solution:
/■ can be chosen in ( r  + 1) ways. (i) 18 books can be divided into 3 groups of 6 each. Then the required number is 
then the total number of combinations is (p + l)(q  +  l ) ( r  +  l ) ( s  + 1) — 1 ways. 18! =  17,153,138 ways
Example 1.27: How many factors has 2160? (ii) lo divide 18 books in ratio 1:2:3 each group would consist of 3. 6, 9 
respectively.
Solution: The factors of 2160 are i.e.
2160 =  1 6 x 2 7 x 5  Hence the required number is =  4,084,080 ways.
=  24x 33x 5 1 
But (h) Permutation and Combination Occurring Simultaneously
24 can be formed in 5 ways. Some problems require the application of the permutation and combination 
33 can be formed in 4 ways. approaches simultaneously. We shall give a theory which may be proved.
51 can be formed in 2 ways.
Hence the total number o f factors are 5 x 4 x 2 = 4 0 . I heorem: If there are in different things o f one kid, n different thins o f the 2nd kind 
and k  different things of the 3rd kind. The number of permutation which can he formed 
(D) When Sharing (Dividing) n items into different groups: containing ro f  the first..? of the second and/ of the third is
A number o f items can shared among a group of people equally or in given (?) * O x (j) * (r *s + D |
proportion.
Example 1.30: How many ways can 5 boys and 4 girls selected from among 12 boys 
(i) If n  =  p +  q + r  and p = q = r.
. M* and 9 girls be arranged on a bench?
Then the number of ways of sharing n things equally is
Solution: 5 boys are selected from 12 in ways.
(ii) [ fn  = p + q + r  and p *  q =£ r, then the number of ways o f sharing n things 
proportionally is ^ 4 girls are selected from 9 in Q  ways.
Example 1.28:
(a) In how many ways can a deck of 52 cards be shared among 4 players equally? 21
20
UNIVERSITY OF IBADAN LIBRARY
but the 9 people can be arranged among themselves in ap9 — 9! ways (F) Combination with repetition
Os2)  ( )   = 3.62* 10'° Sometimes we are interested in the number of combinations of items when The required number is each o f the items may be repeated. Given n items, the number of combinations 4 91 taking r at a lime then repetitions are allowed is denoted by nHr where
Example 1.31: „_ H (rn +=r - l()"( n ++ r’-- 2-) . .1.('n) =+ r--<n±q2LV r  /  (n+ r-l)!r!(a) How many words each containing 2 vowels and 3.consonants can be formed r! r - l )  ( n - l) n  
with 5 vowels and 8 consonants? _  ( n + r - l) ( nr!+ r - 2 ) .. .n  
(b) How many words can be formed if
Example 1.32: How many combinations of 4 digit numbers can be formed from the 
(i) ‘a' must be included digits 2.4. 5. 7. 8. 9 if the digits may be repeated at least once?
(ii) the words must contain at least two consonants?
Solution: There are 6 digits, to take any 4 at a time, the required number is 
6 Hr =  
Solution: r  \  4! =  JJ 4!L5!
(a) 2 vowels can be chosen from 5 in
=  126
Example 1.33:In an experiment, 2 dice are rolled once. Find the total number of 
3 consonants can be chosen from 8 in outcomes if
the 5 letters can be arranged among themselves in 5! ways. (i) they are distinct
The required number is (ii) they are of distinguishable
( 2) ( 3)  5! =  5 6 0 x  120 Solution
(b) “a’ is a vowel = 67200 ways. On a single die there are 1,2, 3. 4, 5, 6 (6 numbers)
(i) if ‘a’ must be included, we need one more vowel. The required number is (i) If they are distinct, the total number of outcomes is 62 = 36
( ^ ( 3)  5! =  33600 ways (ii) If they are not distinguishable, then any number on the die may be repeated. 
If the word must contain at least 2 consonant, then it could contain 2 or more I Icnce the required total number o f outcomes is(ii)
consonants. =  21.
The required number is M ultinomialCoe Ifi  c i cn ts
This is a generalized version o f basic counting principle.
Consider a set of n-distinct items to be divided into r distinct groups o f sizes 
=  33600 + 67200
nXl n2, n3, ..., n r where £ - =1 n i = n.
= 100800 ways
22 23
UNIVERSITY OF IBADAN LIBRARY
The number o f possible choices for the first group is ( " ) . second groiip is ( "
The other partition of S =  {1, 2, 3} are
/Tl — Jlj — tl2 
third group is v n3 {1} U {2} U {3}-3 blocks
Then !t otal pos(snib- tlei j !d ivision (fonr- nnx1-,nn22--,-n---3n, r._..] ),!n r is therefore {1, 2} U {3}-2 blocks{1, 3} U {2}- 2 blocks
711(71-7!! )! ’ (7 1 -T il - 7 1 2) ! n 2 ! ..........  0l71r !
a n - n iN  / n - n j  —n2 -------- nr _a
{1,2,3}- 1 block
n 2 n r.i* Thus, S has a total of 5 different partitions made up of:
One of {1} U {2,3}.{1} U {3,2}, {2,3} U {1} and {3,2} U {1} 
nt ! n2! ?i3! ... nr \ {1}U{2}U{3}
Example: There are 12 Super Falcons to be divided into two teams o f 6 girls each. {1,2} U {3}
How many different divisions is possible. {1.3} U {2}
12»
Solution: There are —  = 33,264 divisions {1,2,3}
Exercises: Definition 5: The number partitions of {1,2,3, ...m} into n blocks is denoted by 
1. A U.I. football team plays 8 games in succession, winning 3, losing 3 and S(m, n) and this is known as the Stirling number of the second kind.
Note: S(m, n) = 0 if n < 1 or n > m.
ending 2 in a tie. Show that the number of ways this can happen is (®) (^ )  =
Also, SCm, 1) =  1 =  S(m, m). This is because there is just one way to partition 
0! {1,2,3, ...m} into a single block and
31312!
~) Find n  and r such that the following equation is true {lj U {2} U {3} U .....U {m} is the unique unordered way o f expressing {'1,2,3, ...m}
as the disjoint union o fm  nonempty subsets.
1.5.1 Stirling’s Identity: For any two rpositive integers m  and r.
1.5 Stirling Numbers of the Second Kind r! 5(m, r) = ^ ( - l ) r+cC(r, t)
Definition 4: Let 5 be a set. A partition o f S is an ordered collection o f pairwise, c=i
disjoint, nonempty subsets o f 5 whose union is all of S. The subsets of a partition are Therefore 5 (m ,r) =  ^ £ t= i ( —1 )7 +£C(r, t) t ,n
called blocks. Example 1.34: 5 (4 ,1 ) =  C( 1 ,1)14 =  1 
For S = /lj U A2 U A3 U ... U Ak to be a partition of S:
5(4,2) =  ^ l - C ( 2 , l ) l 4 + C (2 ,2 )24]
i. Ai n  Aj = 0 whenever i ^  j
ii. A j * 0 . \ < j < k = ^ [ -2  +  16] =  7
Two partitions are equal if and only if they have the same blocks.
For instance. {1} U {2,3},{1} U {3,2}, {2,3} U {1} and {3,2} U {1} are 4 different 5 (4 .3) =  i [ C ( 3 , l ) l 4 -  C(3, 2)24 + C(3,3)34] 6
looking ways of writing the same two-block partition of S = {1,2, 3}
= 7613 - 4 8  + 81] =  6 
24
25
UNIVERSITY OF IBADAN LIBRARY
Solution: This is 5(m, 1) +  5(m, 2) +  ••• +  5(m ,n). This is the same as finding the 
5(4.4) =  - ^  [—C (4,1)14 + C(4,2)24 -  C (4.3)34 +  C(4 ,4)44]
number o f ways in which {1,2,.... m] can be partitioned into n  or fewer blocks since 
it is no longer a requirement that no urn be left empty.
=  ^ - [ - 4  +  96 -  324 +  256] = 1
Example 7: The number o f ways to distribute four labelled balls among two 
unlabelled urns is 5 (4 ,1 ) +  5(4 ,2) =  1 + 7 =  8 i. e.
1.5.2 Application of Stirling’s num ber o f the second kind to distribution of 
5 (4 ,1) =  {1,2,3,4}&{ }
objects into urns 5(4 ,2 ) =  {1 }&{2,3,4},{2}&{1,3,4}.
We are interested in the question "In how many different ways can m  balls be 
{3)&{1.2,4},{4}&{1.2. 3}.{1,2}&{3,4}.{1,3}&{:i,4}, {1,4}&{2,3}
distributed among n  urns?” We are going to answer this question by considering 
Variation 3: In how many ways can m labelled balls be distributed among n labelled 
whether the balls and urns are labelled or not and whether a particular urn can be left urns? This is nm.
empty? Example 1.36: Five labelled balls can be distributed among 3 labelled urns in 
We will consider 4 variations: 3s = 243 ways.
Variation 1: In how many ways can m labelled balls be distributed among n 
unlabelled urns if no urn is left empty? This is the same as “In how many ways can Variation 4: In how many ways can m labelled balls be distributed among n labelled 
urns if no urn is left empty? This is ?t! 5(m, n).
the set {1, 2, 3, ...m ] be partitioned into n blocks. This is 5(m, n).
Example 1.35: In how many ways can 4 labelled balls be distributed among 2 There are 5 (m ,n ) ways to distribute m labelled balls among n  unlabelled urns using 
variation 1. After the distribution o f the balls, there are n! ways to label the urns. By 
unlabelled urns if no urn is left empty?
the fundamental principle of counting, the answer is n\S(m , n).
Solution:5(4, 2) =  7 that is if the balls are labelled 1, 2 ,3 ,4  then the 7 possibilities 
Example 9: In how many ways can 5 labelled balls be distributed among 3 labelled 
are urns if no urn is left empty?
{1}&{2,3,4}
Solution:3! 5(5 ,3)
{2}&{1,3.4}
Example: Suppose that a secretary prepares 5 letters and 5 envelopes to send to 5 
{3}&{1,2,4}
different people. If the letters were randomly stuffed into the envelopes, a match 
{4}&{1,2,3}
occurs if  a letter is inserted in the proper envelope.
{1,2}&{3,4}
(i) In how many ways can the letters be stuffed into the envelopes so that no letter 
{1,3}&{2,4} falls into the proper envelope?
{1,4}&{2,3}
(ii) What is the probability that none of the letters is placed in the right envelope?
Because the urns are unlabelled, (iii) What is the probability that at least one of the letters is placed in the right 
{2}&{1,3,4} =  {1,3,4}&{2) etc. envelope?
Variation 2: In how many ways can m labelled balls be distributed among n 
(iv) What is the probability that exactly 3 o f  the letters were placed in the right 
unlabelled urns? envelope?
26 27
UNIVERSITY OF IBADAN LIBRARY
letter has only one matching envelop). Then it is possible to determine the probability 
that the secretary sniffs the letters randomly into right envelops.
Solution:
The total number of derangements for the 5 letters is
(i) 1.6.1 Derangements
D5 = 5! i _ i  + 1 _ 1 + ! _ !
Definition 1: A derangement o f (1. 2..... n) is a permutation //, /'?,..... , /„ of (1 .2 ...... n)
1! + 2! 3! + 4! 5! J such that it* 1, iyt 2......./>t«.
=  120 [1 -  1 +  0.5 +  0.1667 +  0.0417 + 0.00833] Thus, a derangement of (1. 2,...,n) is a permutation /'/, i2....  z'„ of (1, 2,....n) in which
= 120(0.71673) no integer is in its natural position: /'/^ 1, ij# 2, . ., i„*n.
=86.008 ways Denote by D„ the number of derangement of (1, 2....... n)
(ii) Probability that none o f the letters is placed in the right envelope is given as Consider the following example for illustration:
d l _ 1 _ }_ + _1_ __1_ + J_ _  J_
5! 1! + 2! 3! 4! 5! Example 1: At a party, 10 gentlemen check their hats. In how many ways can their 
=  0.716 hats be returned so that no gentleman gets the hat with which he arrived?
(iii) The probability that at least one o f theletters is placed in the right envelope is This problem consists o f an n-element set X in which each element has a 
1 Prob [None of the letters is placed in the right envelope] specified location. We are required/asked to find the number of permutations of the 
=  1 - (0 .7 1 6 ) set X in which no element is in its specified location.
=  0.2833 Here, the set X is the set of 10 hats and the specified location of a hat is (the 
probability that exactly 3 of the letters were placed in the right envelope is
(iv) The head of) the gentlemen to which it belongs.
given by Let us take X to be the set {1,2........ ,10} in which the location of each of the
integers is that specified by its p, ositi1 on in the sequence 1 ,2 ..........10.( A ' /  2! 3! (A -A )! Theorem: For n > I, Dn = n! 1 ----- h —1 ----1+
N\ 1! 2! 3!
+ ( - ! ) "n-l 
Proof: Let S be the set o f all n! permutations of (1. 2.........n). For j = 1, 2,..., n. let p,
% - 3 ) ! l - l  + 91 be the property that in a permutation, j is in its natural position. Thus, the permutation
5! //, hi...... in of (1 ,2 ........ ,n) has property pj provided z} = j. A permutation of (1, 2,...n)
= 0.083 is a derangement if and only if it has none of the properties pi, pi.......pn-
Let Aj denote the set o f permutations of (1, 2..... n) with property pj. (j = 1,2,
1.6 Allocation and M atching Problems n). The derangements of (1.2..... n) are those permutations in
A,1' n  A\ n ..... n  A*.
Introduction
Matching and allocation are some of the classic problems in probability theory. I his Thus, Dn = | A\ n  A, n ........n  A,1, |
problems dated back to the early 18th century has many variations. There are many The PIE is used to evaluate D„ as follows:
ways to describe the problem. One such description is the example o f matching letters 
with envelops. Suppose there are '/letters with //matching envelops (assume that each 29
28
UNIVERSITY OF IBADAN LIBRARY
The permutation in A| are of the form 1. A . . w h e r e  /„ is a permutation of rhus, —j- is the probability that it is a derangement if we select a permutation o f (1, 
(2..........n). Thus, |A11 = (n -  1)! And more generally for |A,| = (n -  1)! for j = 1, 2, 2..... n) at random.
........ n.
The permutations in A |n  A2 are of the form 1, 2, 13. ...in where /j ....... i„ is a 1.6.2 The Matching Problem
permutation o f (3........... n). Thus. | A |n  A2| = (n -  2)! Suppose that an absent minded secretary prepares n letters and envelopes to 
send to n different people. If the letters were randomly stuffed into the envelopes, a 
Generally. |A jn A,| = (n -2 ) !  for any 2 combinations (i .j) of (1 .2 ...... n).
match occurs if a letter is inserted in the proper envelope.
For any integer k, with 1 < k < n, the permutations in A in  A2n .....r v \k are of
the form 1.2 .......k, /'k 1 in- where /'*-/......./„ is a permutation of (k+1......... n). Thus, Example 2: Suppose that each of jV men in a room throws his shirt into the centre of 
|A in  A2n....r»Ak| = (n -  k)!. the room. The shirts are first mixed up and then each man randomly selects a shirt.
Generally. |A i,n  Ai2 n . . . .n A ik| = (n -  k)! for any k-combination (/|. /2,..... /k) of (1. (1) What is the probability that none of the men selects his own shirt?
2... n): (2) What is the probability that at least one of the men selects his own shirt?
k -  combinations of (1. 2.......n), applying the inclusion- (3) What is the probability that exactly k of the men select their own shirt?Since there are
Solution:
exclusion principle, we obtain: 1. From our discussion on derangement, the probability that none of 
D„ = n\ - (« -!)! + +HrC (« -« ) ! the men selects his own shirt is----- 3 (" - 2 ,!- ( ; ) (" - 3>i+..... PN , _ 1  + I _ i  +n\ n\ . n\ JV! 1! 2! 3! •+ (-D "  —JV!=  / ; ! ----------- - + —  + •+ (-D" -
1! 2! 3! n\ 2. The probability that at least one of the men selects his own shirt is 
. r. 1 1 1 •+ (-D ' 1 -  Prob [None selects his own shirt]
[ 1! 2! 3! n\
1 bus. from example 1 above. 1 -1  + -  -  -  + 
, ( - 0 * 
2! 3! JV!
1 1 1 1 1
D,o = 10! 1 1 1 1 + 1
1! ' 2! 3! 4! 5! 6! 7! 8! 9! H- I - -  + - ........... - 1- ^ .
2! 3! JV!
You should be able to supply the final answer for Dm 
1 I  + 1  ( - 0 *
Note: (i) The series expansion for e '] = 1 -  ^1!  -t- 2! -  73!j  + 4! 2! 3!.................... /V!
3. The probability that exactly k o f the men select their own shirt is as follows: 
(iij—  is the ratio of the number of derangement of (1, 2.......n) to the total First fix attention on a particular set o f k  men. The number o f ways in which this and 
n\
number of permutations o f (1 .2 ..... n). only this k men can select their own shirt is equal to the number of ways in which the 
other N-k men can select among their shirts in such a way that none of them selects 
his own shirt.
31
30
UNIVERSITY OF IBADAN LIBRARY
The probability that none o f the N-K men, (selecting among their shirts), selects his Solution:
1. 6 men. 6 women divided into 2 groups
own shirt is 1 -  1 + — -  — +
(i) two groups o f 6persons each
It follows that the number of ways in which the set o f men selecting their own shirts 
corresponds to the set of k men under consideration is
(-I)"'*
(N-K)!
2! 3! + ..........+ { N - K ) \
Also, as there are possible selections of a group of K men, it follows that there
14.4375
924
are
/VN =  0.0156
( N - K ) \
K 2! 3! ( N - K ) \
(ii) 6 ! 6!
ways in which exactly K of the men select their own shirts. X
All males and all females 2S3!3! 233!3!
The probability required is thus 12 !
, , . N-K 2 6 3! 3!f N 
( N - K ) \ 1 — 1 H---
i -----i- -------(----i-r----*----
, K 2! 3! (N -K )\
N\ ■_2363!!3_!/y (2-5)2 6.25
26162!6 14.4375 14.43 =  0.43292! 3! (N - K ) \ ! !
K\
Example 4:
e~
This result is approximately — , for large N. k = 0,1............... (a) State the principle o f inclusion and exclusion.
K\
(b) Suppose 15% of apple and 10 consignments were toxic. If the consignment 
Example 3:
Suppose there are a group o f six men and six women. They are to be paired in groups consists o f 60% apple and 40% mango, what is the probability that a fruit 
selected at random is toxic?
of 2 for the purpose of determining roommates.
(i) What is the probability that both groups will have the same number of 
Solutions:
male and female.
(ii) What is the probability that there are no male and female as (b) 15% of apple are toxic, 10% of mangoes are toxic
Consignment: 60% apple, 40% mango 
roommates?
Let F represent fruit; A: apple, M: mango 
Let T represent toxic fruit
32 33
UNIVERSITY OF IBADAN LIBRARY
5880
(i) P(T) =  P(A\T)P(A) +  P(M\T)P(M)
"  282475247 
= 0.15(0.6) + 0.10(0.4) =  7.369 x 10~14
= 0.09 +  0.04
= 0.13 Example 6:
(ii) PWT) = ^ Suppose that each of the 10 men in a room throws in their cap into the center of the 
0.09 room to be picked by 10 ladies in the annual marriage fixing ceremony. What is the 
013 probability that
=  0.0117 (i) No lady picks the cap o f the man o f her •:hoice.
Example 5: (ii) At least one lady picks the cap o f the man of her choice.
3.(a) Give the Stirling’s identity. (iii) Exactly 7 ladies could not pick the cap of men of their choice.
(b)(i) In how many ways can 10 labelled balls be distributed among 7 labelled urns 
(ii) What is the probability ii' the urns are unlabeled and non of them is lell empty. Solution:
1 lOmen and 10 ladiesSolution: ^n!= fI l - 11!  + -2! - -3!  + -4! - -  + —10!J](a). Stirling Identityr (i) Pr (No lady  picked a cap) =  [1 -  1 +  0.5 -  0.1667 +  0.0417 -  0.0083 +s(m-r)=^ I (- )r+,C)tM 0.0014 -  0.0002 T 0.000 -  0.000 + 0.000]
f=l =  0.3679
wherem and r are positive integers (ii) Pr (at least one lady picked a cap) =  1 -  Pr (No lady picked a cap)
b(i)7n =  10 labelled balls =  1 -  0.3679
n =  7labelled balls =  0.6321
Number of ways is n m = 7 10 (iii) n -  kwhere n = 10 , k = 7
=  282,475,249ways 1 0 - 7  =  3
(uses the principle o f inclusion and exclusion) 
b(ii) 5 (10 ,7 ) = P(k) “  (* )< "  "  V '  l 1 ~1! + 2! “  3! + * ( n -/0 l]
£  [ ( j )  110( - 1 )8 +  Q  210( - 1 )9 +  ( 3)  3‘°(-l)> ° + Q  4 10(—l )11 +
Q  510( -1 )12 +  Q  6‘°(—l ) 13 +  Q  710(—l) 14] 7! 7!
29635200 1 -  1 +  0 .5 -0 .1 6 6 7
“ 7! 7!
= 5880ways 0.333
5040
Therefore, P r[5 (10 ,7 )] =  -S~ ^ -
34 35
UNIVERSITY OF IBADAN LIBRARY
= 0.00006
(10) fen children are to uc grouped into two clubs in such a way that five will
= 6.61 x  10"5 belong to each club. If in watch club a secretary and a president is to selected, 
Therefore Pr(exactly 7 ladies could not pick the cap o f men of their choice) is 1 — ^ in how many ways can this be done?
i.el - 6 .6 6  x 10”s = 0.9993 (11) A shelf contains Chemistry, Mathematics and Economic text books. In how 
many ways can S books be selected?
(12) Show that:
Practice Questions a. nP (n  -  l ,r )  =  P(n,r  + 1)
(1) Show that ( ” )  =  (n " r ) b. P{n  +  l ,r )  =  rP(n ,r  -  1) -F P(n,r)
(2) If Cn_4 =  15; find n. 13. In how many ways can lour elements be chosen from a ten-element set:
(3) An examination question is divided into three sections A, B. C with 3. 4 and 5 a. with replacement if order matters?
question respectively. A student is required to answer t questions each from. b. with replacement if order does not matter?
Sections A and B and 3 from Section C. In how many ways can he write the c. without replacement if order does not matter?
d. without replacement if order matters?
examination?
3. In how many ways can six balls be distributed among four urns i f :
(4) In how many ways can he solve one or more question in Section C. a. the urns are labelled but the balls are not?
(5) If the paper is one o f the professional examination papers where candidates are b. the balls are labelled but the urns are not?
required to attempt as many questions as possible, find the total number of c. both balls and urns are labelled?
ways a candidate can write the examination if must attempt at least one d. neither balls nor urns are labelled?
question? 14. Show that Ds =  44
(6) In how many ways can a person purchase two or more items out o f 5? 15. Seven gentlemen check their hats at a party. How many different ways can 
(7) A nursery school pupil learning simple arithmetic is given 5 counters with their hats be returned so that:
digits 2. 1,3. 0, 4. 5 to form numbers. Find the probability that the pupil is a) no gentleman receives his own hat?
about to form a b) at least one gentleman receives his own hat?
(a(i)) 3 digit number c) at least two gentlemen receive their own hat?
(ii) a number greater than 400.000
(b) Using all the digits except 0. how many numbers can be formed and what is 
their sum?
(8) How many ways can the letters o f the sentence “Daddy did a deadly deed” be 
formed?
(9) A boy found a keylock for which the combination was unknown, but correct 
combination is a four digit number d l( cl2, d3l d4, where d,, t =  1,2, 3,4 is 
selected from 1, 2, 3, 4. 5, 6. 7, 8. How many different lock combinations arc 
possible results in such keylock?
36
37
UNIVERSITY OF IBADAN LIBRARY
CHAPTER 2
E L E M E N T S  O F  P R O B A B IL IT Y at a time; draw of two cards from a deck one after the other; a random 
selection of a ball from a box and examine the colour.
2.1 Introduction
The definition o f probability is as varied as the values of any random variable. Its (c) An outcome: This is a possible result o f a trial or an experiment. In a toss of 
definition depends on the extent to level one is knowledgeable o f the use and power two coins, an outcome could be any one of HH, HT, TH, TT. The possible 
outcomes in a throw of a die are, 1, 2, 3, 4, 5, 6.
o f probability concept.
Probability can be defined as a measure of uncertainty concerning a phenomenon. It (d) Sample Space: Is the totalily o f all possible outcomes o f an experiment. It is a 
can also be defined as a real value that measures the degree o f belief one has in the set o f all finite or countably infinite number o f elementary outcomes 
occurrence of a specified event. Probability is also described as the study of random ex,e 2, -  ,enIt is usually represented byS =  [e1 ,e2, ... ,e n}
phenomena. Most phenomena studies in the Physical Science. Biological Sciences. The sample space in a toss o f a coin and a die is represented by 
Engineering and even Social Sciences are looked at not only from deterministic but H1H 2H 3H 4H 5H 6H 
also from a random point of view. Therefore the theory o f probability has as its T IT  2T 3T 4T 5T 6T 
central feature, the concept of a repeatable random experiment, the outcome of which 1 2 3 4 5 6
is uncertain.
To the Statistician, probability remains the vehicle that enables him use information in i.e. S = [IH, 2H ,3H , 5//, 17\ 27, 3 7 ,4 7 ,57\ 67}
the sample to make inferences or describe a population from which the sample was The sample space when a die is thrown twice is
obtained. I'hus the study o f probability prepares a strong background for reliable S =  {11 ,1 ,2 ,1 ,3 ,1 ,4 , 1, 5,1, 6 ,1 ,2 ,2 2 ..... 66}
statistical inference. No wonder Professor Sir John Kingman remarked in a review (c) An Event: Is a subset o f  a sample space.
Lecture in 1984 on the 150th anniversary o f founding of the Royal Statistical Society It consists of one or more possible outcomes of an experiment. It is usually 
that “the theory of Probability lies at the root o f all statistical theory”. denoted by capital letters A, B, C, D, .... It should be noted that a subset in a
given set could consist o f all the possible outcomes or none o f the outcomes of 
Definition the given set.2.2 of Terms and Concepts
Before we define probability as a concept, it is necessary to review the definition of e.g. When a die is tossed once, we define. Set 
some probability terms that shall be employed in our discussions. A = {s e t  o f  even number} = [2,4,6}
(a) A Trial: Is any process or an act which generate a number o f outcome which B ={s-et o f  prim e num ber} ={1,3,5}
can not be predicted a priori. A trial usually results into only one of the C = {s e t  o f  num ber g rea te r  than 7} ={0}
possible outcomes e.g., A toss of a coin once, will lead to either a Had (IT) or a (f) Mutually exclusive events: Two events A and B are said to be mutually 
tail (T) turning up. The selection o f a card from a deck o f well shuffled cards exclusive, if the occurrence o f A prevents the occurrence of B. This implies 
result in one of the cards being drawn. that the two events can not occur together i.e. A n B= e.g. the occurrence o f H  
(b) A Random Experiment: Is any operation which when repeated generates a prevent the occurrence o f 7  in a toss o f a coin.
number o f outcomes which cannot be predetermined, e.g. A toss of two coins (g) Mutually Exhaustive Events: Events Av Az, A3, A4, ... ,A n are said to be 
mutually exhaustive if they constitute the sample space, i.e.
38
39
UNIVERSITY OF IBADAN
LIBRARY
number of outcomes for an experiment. There is no requirement that the experiment 
2i= 1> s . be performed bpef,o .r.e  the probability is determined, i.e.Number o f  outcom es in fa vo u r  o f  A _
} Total num ber o f  outcom es fo r  experim ent N
However, some events could be both mutually exclusive and exhaustive. This implies Where N is the total number o f possible outcomes
that they are disjointed and yet their sum is equal to the sample space. This would be ThusProbability is a measure o f likelihood that a specific event will occur.
illustrated later in (1.8). It should be noted that the last two probability terms are 
Example 2.3.1: Find the probability o f obtaining any number in a simple thrown of a 
associated with one experiment only. die.
(h) Independent Events: Two events A and B are said to be independent if the
Solution: The experiment has six outcomes 1, 2, 3, 4, 5, 6.
occurrence of A does not affect B. This implies that the two events can occur
together, e.g. the event o f an event number and a Tail in a throw of a coin and P (a number) -------------- --------------- =  -Total num ber o f  outcom es  6
a die at once.
(i) Sure/Certain Event: The sample space S  is the only sure event. The Example 2.3.2: Find the probability o f obtaining an event number in one roll of a die. 
Solution: Let A be the event o f an even number,
probability of a certain event E is one (P{E) =  1)
4 =  {2, 4, 6}; n (A) =  3
(j) Impossible Event: This is the complement of the sure event. It is an empty 
5 = {1,2,3,4,5,6}; n (S) =  6
set 0.
p r ^ \  Number o f  outcomes included in A _  3 _  ^  ^
Total num ber o f  outcom es 6
2.3 The Approaches to the definition of Probability This approach to the definition o f probability only holds for finite sample space where
The three conceptual approaches to the definition of probability (1) the classical elementary events are equally likely. However this assumption is not always true in
approach, (2) the relative frequency approach and (3) the axiomatic approach, (4) the real life as all events are not equally likely. After all we are not equally endowed.
subjective approach. These three concepts are explained as follows: (b) Frequency or ‘aposteriori’ probabilityApproach: This method defines
probability as an idealization of the proportion o f times that a certain event will occur
(a) Classical or ‘a priori’ Approach
in repeated trials o f an experiment under the same condition. Thus, in an experiment
If there are nnumber of exhaustive, mutually exclusive and equally likely cases of an 
is repeated /V times and n(A ), is the number o f times that A occur, then the relative 
event and suppose that nA of them are favourable to the happenings of an event A frequency is
under the given set of conditions, then (A) =  ^  . An example is the toss of a die n(/l)
N
once. The six possible outcomes are 1,2,3,4,5,6. The probability of occurrence of a 2
But relative frequencies are not probabilities but approximate probabilities. If the
is -. The probability is ‘a priori’, that is it can be determined before carrying out the
6 experiment is repeated indefinitely, the relative frequency will approach the actual or 
experiment. theoretical probability.
This method assumes that the elementary outcomes of an experiment are equally n(A)
likely. It defines the probability of an elementary event e{ as 1 divided by the total P(A) = limn—ca N
40 41
UNIVERSITY OF IBADAN LIBRARY
However, there is a requirement that the experiment be performed before the real world occurring at random is then determined satisfying certain properties (called
probability is determined. Hence, the probability is determined aposteriori. It should axioms).
he noted that some events in real life cannot be repeated before the probability is 
determined. Even if it can be determined the limit may not converge. 2.4 Probability of an event
Example 2.3: Fifty o f the 800 cars that enters the University o f Ibadan on a If A is an event from an experiment E with sample space the real valued function 
graduation day are found to be Jeep. Assuming different cars comes into the campus P(A)\s called the probability of A which satisfy the following axioms:
randomly, what is the probability that the next car is a Jeep? (1) 0 <  P(A) <  1 for every event A
(2) PCS) =  1
Solution: Let N  be the total number of cars and n be the total number o f Lexus. Then 
N=800, n=50 (3) P(A, U A2 U ...) =  P (/la) +  P(A2)+...
CO 
Using the relative frequency concept of probability, the probability that the next car 
being a Lexus is
1 =  1
P (Lexus) =  £  =  -51 =  0.0625 for every finite or infinite sequence o f disjoint event Av  A2 ...
(c) Subjective Probability: is the probability assigned to an event based on 2.5 Consequences of Probability' Axioms 
subjective judgement, experience, information and believe. Such probabilities Theorem I
assigned arbitrarily are usually influenced by the biases and experience of the (a) If .-I is a given event and Ac is the compliment o f A. then P (AC) = 1 -  P(A).
person assigning it. Proof: A U Ac =  S
For instance the probability of the following events are subjective: P(A  + Ac) = P(S) =  1 by axiom (2)
1. The probability that Jude, who is taking statistics in the second .-. P(A)  + P(AC) = 1/1 and Ac are mutually exclusive
semester will score seven points in the course. =  P(AC) = l - P ( A ) .
2. The probability that a particular Football Club win the maiden match 
with another club. (b) Theorem II:
3. The probability that Ade will win the case he has filed against his Given that cj) c  S, then P(A) = 0 
landlord. Proof:
Since subjective probabilities is based on the individual’s own judgement, it is rarely S  U 0  = S.
used in practice as it lacks the theoretical backing. P(S U 0 ) =  S = 1 by axiom (2) 
(d) Axiomatic or theoretical Approach: To circumvent the difficulties posed by P(S) + P (0 ) =  1 since P(S) =  1 
the earlier approaches to the definition of probability and based on the study of 1 + P(0) = 1
random of random phenomena, researchers have developed a mathematical = P(0) =  0.
expression of certain aspects of the real world. The probability of a certain part of the
42 43
UNIVERSITY OF IB
DAN LIBRARY
2.6 Rules ol Probability
Theorem 1: Let 5 be a sample space and P (.) be a probability function on S : then the l licorem 5: Commutative laws:
probability that the event A does not happen is 1 -  P(/l) i.e. P(A') =  1 -  P(A). / l u f i  =  f lu / l  
/ l n B  = 8 n / i
Proof: Theorem 6: Associative laws:
From definition. /I n  /!' =  0  ; / l u  /l' =  5 A U (B U C) =  (A U B)  U C 
P(/l U A') =  P(S) A n ( B  n C) =  (A n f l ) n c
P(/l U A') =  P(5) = 1 
P(/l U A') =  P(i4) +  P(A') = 1 Theorem 7: Distributive laws:
P (/l') =  1 -  P(/|) /i n (B u  c )  =  (/i n B)  u (a  u c )  
A u (B n c )  =  (a  u B) n (a  u c)
Theorem 2: Let S be a sample space with probability function P ( . ); then 0 <  
P(/l) < 1 lor any event A in S. (A')'  =  A
Proof: A' = S \  A
My property (1). P(/1) >  0
We need to show that P (/l) <  1 Thus
I roni theorem ( I ». P (/l) -f P(/T) =  1 A n S  = A 
Mut P(A')  > 0  A u S  = S 
So. P(A) = 1 -  P(A') <  1 A n 0 =  0  
/l U 0  =  /I
fheorem 3: Let S  be a sample space with a probability function P ( .). If 0  is the Also
impossible e\ent. then P (0 ) =  0. i4 n /T  =  0 
Proof: Observe that 0 = S' /I u  A'  =  5  
from property (3). we get P(5 U S') = P(S) + P(S') A n /l  =  A 
P(S) + P(0) A u A = A
Mut S  U S' = 6* and P(S) =  1 
Therefore P (0 ) =  0 Theorem 11: De Morgan's laws:
(A U B)'  = A ' r \B '
Theorem 4: If >1l and /12 are subsets o f S such that Ax c  A2. thenP(/li) ^  P (/l2)- (A n B)'  = A ' V B '
Theorem 12:
A -  B = A n  B' = A \ B  
P(A \ B )  = P(/l n  B')  =  P(A) -  P(A n B)
44
45
UNIVERSITY OF IBADAN LIBRARY
Solving Problems using Venn diagrams
Theorem 13: Example 1: In a sample o f 1000 foodstuff stores taken at an Ibadan market, the 
P(A U B) = P(A) + P(B) -  P{A n  B) following facts emerged:
if A and B are disjoint, that is P(A n B) = 0, 200 of them slock rice, 240 stock beans, 250 slock gaari, 64 stock both beans and rice. 
then P{A U B) = P (/l) +  P (5 ) 97 stock both rice and gaari, while 60 stock beans and gaari. If 430 do not stock rice.
do not stock beans and do not stock gaari, how many of the stores stock rice, beans 
Theorem 14: and gaari?
P(0) =  0 Solution:
Theorem 15: Multiplicative law of Probability
If there are two events A and B, probabilities o f their happening being P (/l) and P (P ) 
respectively, then the probability P(AB) of the simultaneous occurrence o f the events 
A and B is equal to the probability o f A multiplied by the conditional probability of 
B(i. e. the probability o f B when A has occurred) or the probability of B multiplied by 
the conditional probability o f A i.e.P(AB) = P(/1)P(P /A )  = P(B)P(A/B
2.7 Venn Diagrams
A set is a collection of objects, which can be distinguished from each other. The 
objects comprising the set are called the elements o f the set and they may be finite or
Let: R represent rice stores 
infinite in number.
Venn diagrams are diagrammatical representation of sets. For instance, consider the B represents beans stores 
set A =  {1,2, 3 ,4 ,5 ,6 ,7 ,8 ,9} , suppose that A has a subset B = { 2 ,3 ,4 ,5}. The G represents Gaari stores
diagrammatic representation of this is shown below. Let jc represents those that stock all the 3 food items
A Those that stock gaari alone are 250 -  [(97 -  x )  + x  + (60 -  x)] = 93 +  x 
Those that stock beans alone are 240 -  [(60 -  x)  +  (x) 4- (64 — x)] =  116 4- x
Those that stock rice alone are 200 -  [(64 — x) + x  + (97 — x)] = 39 +  x 
430 did not stock any o f the food items
Therefore. 1000 = (39 + x)  4- (93 4- x)+(116 4- x )  +  x  +  ( 6 4 -  x)  4- (60 -  x )  +  
(97 - x )  + 430
And x = 1 0 0 0 -8 9 9  =  101
Therefore 101 stores stock rice, beans and gaari.
47
46
UNIVERSITY OF IBADAN LIBRARY
2.8 The Principle of Inclusion and Exclusion
2.8.1 The Second Counting Principle M u B U C| = Ml + [M l + |C| -  M n  C|] -  M n [B U C ]|............eqn (2)
If a set is the disjoint union of two (or more) subsets, then the number of elements in Because A n  [B U C] = (A n  S) U (A n  C), we can apply equation( 1) again to obtain
the set is the sum of the numbers o f elements in the subsets, i.e. |/1 n M u  C]\ = \A n B\ +  M n  C| - \A n B n  C |........... eqn (3)
n(/l U B)  =  n(/l) +  n(B )  implying that \A U B\ = |/11 +  \B\ if A and B are disjoint. Finally, a combination o f equations (2) and (3) yields
M U B u  C| =  [Ml +  Ml +  Ml] -  [M n  a |  +  M n  C| +  M n c | ]  +  M n  s  n
Theorem I:|/l U B\ < \A\ + \B\ if A and B are not disjoint. Cl.......... eqn (4)
This is because Ml + \B\ counts every element of A n B twice. Let us illustrate this Thus proving theorem 2.
with the following example. From this derivation, we notice that an element of A n  B n  C is counted 7 times in 
equation(4), the first 3 times with a plus sign, then 3 times with a minus sign and then 
Example 2: If A =  (2 .3 .4 .5 .6 ), \A\ =  5 and B = ( 3 ,4 ,5 ,6 ,7 ), \B\ =  5 once more with a plus sign.
then. Ml +  \B\ =  10 
A U B  =  2 ,3 ,4 ,5 ,6 ,7  Example 3: If A =  {1,2 ,3 ,4}B =  {3,4,5,6}C = {2,4,6,7} then
M u f l |  =  6 A u f l u C  =  {1,2,3,4,5,6,7}
Since A and B are not disjoint, \A U B\ <  Ml + Ml |/1 U £ U C | =  7 ...........................(a)
Compensating for this double counting yields the formula Ml = 4 
Ml = 4 
M u B\ =  Ml + Ml -  M n B I............... eqn.(l)
From our example. A n  B = 3 ,4 ,5 ,6  Ml = 4
\A r \B  | = 4  Ml + Ml + Ml = 12
M U S |  = 5  +  5 - 4  A n B = 3 ,4 ,/l n  C = 2,4, f? n  C = 4 ,6  
In this example, \A n  B\ = \A n  C| =  \B n  C| = 2 so that 
= 6
thus proving equation (1) M n B\ + M n  C| +  M n  C| =  6 and
A n f i n c  =  4 ,M n f in C |  =  i
Theorem 2:\A U B U C| =  Ml + Ml + |C| -  \A n  B\ -  \A D C| -  \B n C| + Therefore. \A U B U C| =  Ml +  Ml + Ml -  \A O B\ -  \A n  C| -  \B n C| +
M n  S n  C| for three sets A, B and C. M n s n c |
= 1 2 - 6  + 1 =  7 ......................... (b)
Proof: I bus. (a) = (b) thus establishing theorem 2.
We know from equation (1) above that | A U 8 | =  |/1| + Ml -  M n B\ Generally, the Principle of Inclusion and Exclusion (PIE) states that:
Then, for 3 sets. \A U B U C\ = \A U [B U C]| P. AltA2, are finite sets, the cardinality o f their union
= MI +  M U C | -  \A n [ B  UC]| Mi U/ l2 U ...,U An\ =n
Applying equation (1) to \B U C\ gives
Y-1 M , i -L—y'1  < i< j< n  p ( n / i / | + —Y1 l ’i i<  j<k<n |4 ,n / l / n -••• +  ( - i ) n+1 | n >1, |
48
49
UNIVERSITY OF IBADAN LIBRARY
Proof: Solution: Let At . A2, A3 be the subset consisting of those integers that are divisible 
On the left is the number o f elements in the union of n sets. On the right, we first by 5. 6  and 8. The number we are interested in is
count elements in each o f the sets separately and add them up. If the sets At are not 
n A \ n A '}| = 1000 -  |j4,| -  |/i,| -  |/1,| + |A, n  A21 + \Al r*Ai\ + \A, n  A}\ -  
disjoint, the elements that belong to at least two o f the sets Ah or the intersections | A, n A 2n  A}\
A, ft Aj. are counted more than once. We wish to consider every such intersection, but 
each only once. Since A{ n  Aj = Aj n  At, we should consider only pairs (Ai.Aj) with W = [!M J=200 W = [1 -’—^ J'  = 166 1/1,1 ='100 = 125
i < j .
When we subtract the sum of the number of elements in such pairwise intersections, Note: The results for |/ll|, |. 'l , | and |.4,| were achieved by using the round down, 
some elements may have been subtracted more than once. Those are the elements that notation [  J which involves the dropping o f the fractional part.
belong to at least three of the sets A{. We add the sum of the elements of intersections To compute the number in a 2 and 3 -  set interaction, we use the least 
of the sets taken three at a time. (Note: the condition i < j  < k  ensures that every common multiple (LCM) i.e.
intersection is counted only once)
The process continues with sums being alternately added or subtracted until we come 
to the last term which is the intersection of all sets A{ thus proving the theorem.
Let S = Ax U A2 U ....U An and A *  = S\Ai then the PIE principle can also be
expressed as
MiC n .....D A nc | =
n
i s i - y w y ' I S  \a , n  Aj n  Ak \ + ■■■ = 8i< jsn  ^—l l< i< j< kin
-  ( - l ) ”+1 |n >l<| Thus. Î l,1 r \A \  n ^ j |
------  . . ]u» u n 2
are any two events of 
Example 4: Let A be the subset of the First 700 hundred numbers 5 =  {1 ,2 ,.......700} rule an experiment with sample space S. then we have the addition
that are divisible by 7. Find the number o f elements in 5 that are not divisible by 7. 
Solution: u  a 2)  =  P(At ) + P(A2) -  P(A] n  Az)Proof:
A = {7,14,21,28,35,42,49 .......} In a Venn diagram 
\A\ = 100 Fig. 1.1
\A'\ = \ S \ - \ A \ P(At U A2) = P(A l) U P(A2) = 1 
= 7 0 0 -  100 P(A1 U A Z) =  P(A,) + P(A2 n A f )  
= 600
but P(A2 n A \ )  =  P{A2) -  P(A2 n  a 2)
Example 5: Find the number of integers from 1 to 1000 that are not divisible by 5. 6 
and 8
50 51
UNIVERSITY OF IBADAN LIBR
RY
( ii)  P(YuG) =  P(Y) + P(G)
P(AX U A2) =  P (/li) +  P{A2)P{Ai n  A2)Addition rule 5 4 9
••• P(A1 U A2) =  P (/li)  + P (/l2)^ (^ i n A2)Addition rule (since only one ball is drawn P(Y  n  G) = 0)
However, if Ajand A2 have no point in common, that is when Atand A2 are mutually (iii) P (P  )  = 1 — P(B)  =  1 — ^  =  0.6
exclusive (iv) P(B U G)c = l - P ( B U G )
P(Aj n A 2) =  0 since Aj n A2 =  0 = 1 -  IP(B)  +  P(B)]
We have I* =  (At U A2) =  P(At ) + P(A2)Special Addition rule P =  (Aj U A2) =
P(A,) + P(A2)Special Addition rule a
Using the same procedure fort any three events A. B and C. 20
P(A U B U C) = P(A) + P(/i) + P(C) -  P{A n B) -  P(A n C) -  P(A n C) = 0.4
-  P(A n B n C) Alternatively.
P(neither Black nor Green) =  P(Yellow or Red)
Example: A coin is rolled three times, what is the probability of getting (i) 1 head, (ii) ~P(Y) + P(B)
2 heads, (iii) at l east 2 Solution: Let H and T  heads.represent Head and Tail respectively.
Let the sample space be defined as = ^ = 0-4
5 =  [HUH, THH, TTH. HTT. , TTT} (v) P(B D K)= 0 see note in (ii) above.HTH. HHT. THT
(i) P(1 head) =  {I ITT, TUT,TTH} =
Example: A survey o f 500 students taking one or more courses in Algebra. Physics
(ii) P(2 head) =  {HHT.TH1I.HTH} =  *} and Statistics during one semester revealed the following numbers o f students in 
(iii) P(at least 2 head) =  P(2 heads) +  P(3 heads) indicated subject:
=  -  +  -  =  -  =  0.5 Algebra 186 Algebra and Physics 83 8 8 8
Note: The events o f 2 heads and 3 heads are mutually exclusive. Physics 295 Physics and Statistics 217 
Statistics 329 Algebra and Statistics 63
A student is selected at random what is the probability that he takes
Examples: A bag contains 8 black balls; 3 red balls, 4 green balls and 5 yelkns ball: (i) all the three subjects
all of which arc of the same size. If a ball is drawn at random from the bag. what ii (ii) Statistics but not Physics
me probability that the ball is (i) black, (ii) either yellow or green (iii) not black, iv (iii) Statistics but not Physics and Algebra
neither black nor green, (v) black and yellow? (iv) Statistics. Algebra but not Physics
Solution: Let B R. G and )' represent the event of black, red. green anc 'ciiow bal: (v) Algebra or Physics
respectively. Total number of balls = 20. 
n(B) 8
n )  KB) =  —- f  =  — =0 . 4  
n(S; 20
53
52
UNIVERSITY OF IBADAN LIBRARY
Solution: Let A. P and S denotes the event o f a student taking Algebra. Physics and =  P (s ) -  P(5 n Pc) 
Statistics respectively. =  P(5) -  [P(S n  P) -  P(A n P n 5)]
Presenting the information in a Venn diagram we have _3_ 5
2 9
 010 
 
65 5
20107  | 50503 
~  500
=  0.33
(v) P (Algebra or Physics)
=  P(A  n P)
i.e. _P1(8U6  P+ )  2=95  P_ ( _A83)_ +  P(P) -  P(A u P)
n(A n A n Bc) =  n(A n S) -  n(A n P n  S) = 10 500 500 500
n(P n S n Bc) = n(P n S) -  n(A n P n S) = 164 _~   530908 
n(A n P n Sc) =  n(A n P) -  n(A n P n  S) = 30 =  0.796
Using the addition rule, we can find the number of students that takes all the three
subjects. 2.9 Conditional Probability and Independence
n(/l U P U 5) =  n(/l) + n(P ) +  n(/l n P) -  n(A  n  5) + n(A n P n 5)
If A and B are any two events, the conditional probability of A given B is the 
500 =  186 +  329 -  83 -  217 -  63 +  n(A fl P n S ) 
probability that even A will occur given that event B has already occurred.
n(A n P n s )  = 53 This is equivalent to the probability of events A and B (occurring simultaneously) 
••• P(All three subjects) = ^  = 0.106 divided by probability of event B.
(ii) P(Statistics but not Physics) i.e. P{A/B)  = provided P(B) *  0
=  P(S  n Pc) = P(/l n s )  = P(B)P (A /B)  = P{A)P{B).
=3 2P509
(
0 
5
 _
) 2 -517
  P(S n P) In general00 P(A, n A3 n ...An) =  P(Al )P(Al / A 2)P{A3/ A l n A2) ... P {A n )/ ( A , ... An)
= —500  = 0 .2 2 4
(iii) P (Statistics but not Physics and Algebra) Let Al ,A2, Az denote the 1st, 2nd and 3rd cards
=  P(S) -  P(A n P) P(A1 n  a 2 n a 3) 4=  P3( /l1)2.P(i42/-41) .P ( /i3/A 1 n a 2)
=3 2P9( S) -  P(/l n P) -  P(S n P) +  P(A n P n 5) = —52 x  5—1  x  —50500 _  _58030_  _  251070  +   _5503^ 0_ 13226400 
-   _£~~ 50!0. =  0.00018
= 0.164
(iv) P (Statistics. Algebra but not Physics)
54 55
UNIVERSITY OF IBADAN LIBRARY
Example: A bag contains 10 while balls and 15 black balls. Two balls are drawn in (b) without replacement
succession (a) with replacement (b) without replacement. What is the probability that (i ) P ( ,B C W )  = P (B ) .P (W /B )
(i) the first ball is black and the second white —~   2155 * 2150  =  0  2 5
(ii) both are black
(iii) both are of the same colour (li) P(B
(iv) both are of different colours _1  n  B215  1)4 = P(Bl ). P(B2/ B l )
(v) the second is black given that the first is white. ~ i I * ^ = ° - 3 5
Solution: Let B and W denote black and white balls respectively. (iii) P (both black or both white) = P t f ^ P15f  a /1B4 x, )1 0+  P9(l4r1)P (l^2/lV 1)(a) with replacement 25 *  24 +  2 5 *  24
( i )  P ( B= D15W )1 0= = P 0(B.2)4.  P(W)25 25 =  0.35 +  0.15 —x — =  0.50
(iv) P (both are of different colours) = P_( B15) P 1(0W /B10)  +1 5P ( W )P (B /W )(ii) n  S2) = P ( B ) x P ( B ) 25 *  25 +  2 4 * 2 4
= ©  = 0 3 6 =  0.25 +  0.25 
=  0.50
(iii) P (both black or both white) = P(B^ n B2) +  n  W2) ( v ) P ( B / l V ) = ^ p
==  00..S326 + 0.16 
_ 1255  *  2140   .1 00. ' 25_  0̂ 245 
=  0.625
(iv) P (both are of different colou[r1s5)  =1 0P1( B,  n[1 0W )1 5+1 P(W  n B)=~   2[25* 25J +  1.25* 25] 2.10 Statistical Independence= 0.(400.24) Two events A and B are said to be independent if the probability that B occurs is not influenced by whether A has occurred or not. 
i.e. P(B) =  P{B/A)
V '  V '  J P(W) =  00.6.4 Hence events A and B are independent if 
P ( A H B )  = P(A).P(B)
From the last result, we could see that the =tw 0o. 6e.v ents are independent, hence, Three events are said to be mutually independent ifP ( B / W )  = P(W)  
(i) They are pairwise independent, i.e.
because the drawing is with replacement.
P(A r  B ) =  P(A). P{By, P (A r \C )  = P(A) cot P{C);
56 57
UNIVERSITY OF IBADAN LIBRARY
P(B n c )  = P{A). P(B).P(C) and
= P(ABA) + P(B.A.B')
(ii) P(A fl B n C) =  P(A). P(B). P(C)
x
It should be noted that mutually exclusive events are not independent as the *) +
_  _5_
occurrence of one rules out the possibility of the other, i.e. _  36
P(A /B)  = P(B/A)  = 0. ( iv ) P(B wins at least one game = 1- P(no game) 
Example: What is the chance of getting two sixes in two rollings of a single die? =  1 -  P(B1 B2 B3)
Solution:
P (six in 1st die) = — P (six in 2nd die) = — 
6 6 19
Since the two events are independent 27
Example: An unbiased die is rolled n times 
P (six in 1st and 2nd die) = —
6 6 36 (i) Determine the probability that at least one six is observed in the ^trials.
Example:/! and B plays 12 games of Ayo (Yoruba traditional game). A wins 6 and B 
wins 4 and two are drawn. They agree to play three games more. Find the probability Calculate the value o f n if the probability is to be approximately -•
Solution: 2
that:
(i) A wins all the three games P(a six in a throw) =  -
6
(ii) Two games end in a tie P(no six in a throw) =  -
(iii) A and B wins alternately 6
(iv) B wins at least one game. (i) P(at least 1 six in n trials) =  1 -  P(no six in n trials)
Solution: Let A and B represent the event of A and B winning the game and D 
winning the game and D denote the event of a tie. (ii) If the probability is^ ; then
©  =  ? 
n l ° E ( i )  =  i ° g ( j )
( i) P(A wins all three) = \ x \ x \  = \2 2  2 o n _  log(V2) 
(ii) P (2 games and in ties) =  P(D. D. D)c + P(DC. D. D) +  P(D. Dc. D) log(5/6)
/ I  1 5 \ /5 1 1\ f l  5 I n n  =  4
“ k V ' e M e V f i M e V ' i J
5 Example: Determine the probability for each of the following events.
72 (a) A king or an ace or jack o f clubs or queen of diamond appears in a single card 
from a well shuffled ordinary deck o f cards.
(iii) If A and E -  B wins alternately in two mutually exclusive ways. (b) The sum of 8 appears in a single toss of. a pair o f fair dice.
58 59
UNIVERSITY OF IBADAN LIBRARY
(c) A 7 or 11 comes up in a single toss of a pair ol dice A n B =  {HH),A n C  =  {HT},B nC  =  {TH),A n  B n c = 0 
P(A) = P(B) = P(C) =  -4 = 0 . 5
Solution:
P (A  n B )  =  P ( A ) .  P ( B )  =  i ;  P ( B  D C )  =  P ( B ) .  P ( C )  =  -l  
(a) P(King) = ^ ;  P(an ace) =  ^
P (A  f lC ) =  P ( A ) .  P { C )  =  P {A  n B  n C) *  PQ4). P ( B ) .  P ( C )
P (Jack o f club) =  ^  ~
62 Hence events A, B and C are not mutually exclusive.P (Queen o f diamond) =  — Example: An urn contains P ' white and 'q‘ black balls and the second contains ' C  
P (a kin' d4, an ace, J. o f club or5 Q. o f diamond) white and d'  black balls. A ball is drawn at random from the first and put into the ,52 +  52 +  S 2 Ĥ  52/ ~  26 second. Then a ball is drawn from the second urn. Find the probability that the ball is 
white.
(b) Solution: This is a conditional probability.
Dice 1 2 3 4 5 6 Total number o f ball in the 1st Um is (P +  q )
1 2 3 4 5 6 7 Total, number of ball in the 2nd Urn is (c +  d )
2 3 4 5 6 7 8 Total number o f ball in the 2nd after the first draw is c + d  -I- 1 
3 4 5 6 7 8 9 P (white ball in the 2nd um)
4 5 6 7 8 9 10 = P ( W ) P ( W /B  + P ( W ) P ) ( W / W )
5 6 7 8 9 10 11 =  — -— (— ) + — —  (— ) 
10 11 12 c + d + l V p + d /  c + d + 1 \p+qJ6 7 8 9 _  c(p+q)
(c+d+\)(p+q)
c
P(sum =  8) = ^ ; ~  c + d +1
(O P ( 7 ) = £ ; P ( H )  =  £
2 
P ( 7 ° r n )  =  ^  +  ^ = 9
Example: A pair o f fair coins is tossed once. Let A be the event o f head on the first 
coin and B the event of head on the second coin first coin and B the event o f head on 
the second coin while C is the event of exactly o head is events A, B and C mutually
independent?
Solution:
5’ =  {HH.HT, TH, TT}
A =  {////, HT),B =  {////, TH) 
C  =  [HT,TH}
60 61
UNIVERSITY OF IBADAN LIBRARY
C H A P T E R  3
. _5J_
C O N D I T I O N A L  P R O B A B I L I T Y  A N D  B A Y E S ’ T H E O R E M
145
(ii)
Let G, n  G': denote the event that the two students selected are both girls. 
VI Conditional Probability PG. nC,  = P(GX) P(G2 /G,)
Supposed A and B are any two events such that A is the prior event and B is the 
posterior event. There is the possibility that there are points o f  intersection between 12 J2  
30 V 29 
the two events such that the occurrence of one is conditioned on the other. Thus we 
132 22
give the following definition.
870 ~ 145
Definition 1: Let A and B be two events in the sample space S with given probability (iii)
space IS. A. B. P(.)) where P(.) is a real valued function, the conditional probability of /i, B2 vj G,G, is the event that both students selected are o f the same sex. 
event A given the event B has occurred denoted by P[A/B], is defined by P(BtB2v G lG 2) =  P{B,B2)  + P(G,G2)
Since Bt n B ,  and G,Gz are mutually exclusive
P(A/B) = ,P(B)>0  , this implies that
r ( n ) P(B,B2 u G,G2) = 51 2*?
P( A r*B)= P(AI  B).P(B) 145 145
73
Also P(B/A) — *P(A)> 0 which also impliesP ( A n  B) -  P(B! A).P(A)
l (A) 145
Example I : Two students arc chosen at random from a class consisting of 18 boys (iv) B,G, u  G, B: is the event that the two students selected are a boy and a girl.
and 12 girls. What is the probability that the two students selected are: (B,G, u G,B2) = P(B,Gj) + P(G,B2)
(a) both boys (b) both girls (c) of the same sex (d) a boy and a girl. -  P{B[ ) . P ( G 2 / B i )  + P(Gx) . P ( B 2I G x)
Solution: Let B| to be the event that the first student selected is a boy. 2 1  J1  12 2 130 A 29 30 V 29
Let B> be the event that the second student selected is a boy.
Let B, o  B, denote the event that the two students selected are both boys. = 1  r 12/  + 1  x  1 8 / = 7 2 /5  / 2 9  5 r  729 7145
(i) P( Bt n  B2) = />(/*, ) , P ( B J B l ) where Example 2:A boy has 10 identical marbles in a container consisting o f 6 red and 4
blue marbles. He draws two marbles at random one after the other from the container 
without replacement. Find the probability that:
P ( B j B t) = (a) the first draw is red while the second is blue ' / 9
lb) both draws are o f the same colour
3 17
Therefore, P(Bt n B 2) = -  x  — (c) both draws are o f different colours.
Solution:
(a) Let R| be the event that the first draw is red
Let B: be the event that the second draw is blue.
62
63
UNIVERSITY OF IBADAN
LIBRARY
3.2 Independence
The event Rt n  B,  is the event that the first draw is red while the second draw is blue. 
^ n f l ]
P(Rt n B 2 ) = P(R{) . P { B J R x) where Recall that P [A/B] />(£)> 0/ W
PiR,)  = X o  Definition 2: Two events /! and B are said to be stochastically or statistically 
independent if and only if any one o f the following conditions is satisfied:
p ( b j r x) = y 9 (i) P(AnB) = P(A)P(B)
6 4
P(R, o B ,) —  x — (ii) P(A/B)= P(A) if P(B) > 010 9 (iii) P(B/A) = P(B) if P(A) >0
4^ It is easily shown that (i) implies (ii), (ii) implies (iii) and (iii) implies (i). See Post­
15 test (2 ).
(b) Let R| be the event that the first draw is red. Therefore. P(Ar\ B) = P(A/B)P(B)  = P(B/A)P(A)  if P(A) and P(B) are non-zero.
Let R-2 be the event that the second draw is red. This implies that one o f the events is independent o f the other. In fact,
Let B| be the event that the first draw is blue 
Let B2 be the event that the second draw is blue. P\.UB\--P{AnB) ._p tBIA)pW = P^)P{A)
P(B) P(B) P(B) ’
Therefore So. if P(A). P(B) > 0 and one of the events is independent o f the other, then the 
/>(/?,/?, u  /?,/?,) = P(RtR2) + P{BxB2) since /?,/?, and B{B2 are mutually
second event is also independent o f the first. Thus, independence is a symmetric 
excusive. relation.
Pi /?,/?,)=  P{RX) P(R2 I R x) 
Remark: Two mutually exclusive events A and B are independent if and only if  P(A) 
10 '' 9 P(B) 0 which is true if and only if either P(A) or P(B) = 0
Also, if P(A) *  0 and P(B) * 0, then A and B independent implies that they are not 
mutually exclusive.
P(Bt/?,) = P(BX) x P ( B J B ,)
Definition 2: Events A/, A: ..........4n from A in the probability space [.S'. A, P(.)] are
_ 4_ 3 said to be completely independent if and only if 
10 '  9
(i) P (/f n  A,) = P(A,)P(A,)  fo rte /
12
15 (ii) P(At n  Ai n  Ak) -  P(A,)P(A,)P(Ak) for i * j , j  *k,  i * k
, = 1 _
Therefore, n  B\B^  = » + » 15 (iii) 1̂n1 A. = n#s|/> ( 4 )
Note: (i) These events are said to be pairwise independent if 
P(Ai n A l )= P(A ,)P(At ) for a l l / * /
64 65
UNIVERSITY OF IBADAN LIBRARY
(ii) Pairwise independence does not imply independence P( B/A) - P ( B n A )
(iii) .1 and B mutually exclusive implies that they are not independent. P(A)
This implies that P (A n  B) = P (B n  A) -  P(B/A)P(A)
Example 3: Suppose two dice are tossed. Let A denote the event o f an odd total. B. 
the event o f an ace on the first die, and C the event o f a total of seven. Therefore P(A/B) = ~̂ - - A)P('A)
P{B)
(i) Are A and B independent?
fhe above is known as Bayes theorem.
(ii) Are A and C independent?
(iii) Are B and C independent? 3.4 Total Probability Rule and Baye’s Theorem
Solution: If there are two or more events where one is the prior and the other in the 
P[A/B] -  4 ~ P(A) posterior event, it is often desirable to determine the probability that a particular event 
has occurred given that the other event has previously occurred. Even though this kind 
P[A/C] l* P [ A ] -'/2  
of problem can be solve by merely applying the addition and multiplication rule, 
P1C/B| i= P (C )  = '/6 much compact procedure has been developed called the Baye's theorem.
So, A and B are independent 
A is not independent o f C Baye’s Theorem
B and C are independent Let a sample space S  of an experiment be partitioned into n mutually exclusive 
and exhaustive events A X,A2. .... An. Let B be an arbitrary event that occurred after 
Example 4: Let A| denote the event of an odd face on the-first die,
the experiment been performed. Such that P (/!,■) =£ 0, i = 1 ,2 ,..., n then.
Let Ajdenote the event o f an odd face on the second die.
Let A3 denote the event of an odd total in the random experiment consisting o f two /5(B) =  s r = i ^ f) W A )
and
dice. Then.
P W B )  =
P(A,)P(A2) = j x  j  = P ( A , n A 2)
P(A,)P(A,) = j x  j  = PfA2/A,]P(A,) = P ( A , n A ,) Proof: Let the events A,and B be depicted as in Fig. 1.3
P lA ,n A ,)  =  i = P ( A 2) P lA J  
1 herefore. A|. A 2 and A3 are pairwise independent.
But P( A, n / I ,  n  .4,) = 0 * j  = P(At)P(A ,)P(A ,)
So. A |. A: and A3 are not independent.
3.3 Haves Theorem
P ( A n B ) By definition o f conditional probability, we have
Given that P(A/B) = P(B)
66 67
UNIVERSITY OF IBADAN LIBR
RY
Solution:
p(Ai n B) | et k be the event o f picking an apple. 
P(B/Ai) P M I sing the table below:
P(Ai n B )  = P(Ai')P(B/Ai) Slate of Nature P(B.) P(E/Bj) P(Bi)P(E/Bi) P(B,/E)
B, (4A. 10) X X
We know thal XP(Aj n B) B:U aT 40)
P i A J B ) X X Xs X" T c/t T B3(2A.30) X X 715 X
Such that P{Ai n  B) = P{ B)P {AJB) Total 1 - x 5 1
But total probability is The required probability 
P(B)  =  P(Ai n  B)  + p (A2 n B )  + P(A2 n  B)  +  -  +  P(/ln n s )
P(B|/E) -  '/,
Using (1) in (3) we have
P(B) = P^JPCB/ZIO +  P(/I2)P(B//12) +  -  + P(An) ( B / A n)
11 ^ P I B ^ P I E /  B,)
=  Y j P(.Ai)( .B/Ai)
i= 1 4 1
Using (3) in (2) we have the Bayes' formula defined as: 5 3 = X
VlS
P(.Ai)P(B/Ai) Example 2 :In a certain town, there are only two brands o f hamburgers available, 
P(A‘/ B )  ~  £ "  1 P{A,)P (S//1,) Brand A and Brand B. It is known that people who eat Brand A hamburger have a 
Example I: The contents o f 3 identical baskets B,(i = 1, 2. 3) are: 30% probability o f suffering stomach pain and those who eat Brand B hamburger 
have a 25% probability o f suffering stomach pain. Twice as many people eat Brand B 
Bp 4 apples and 1 orange 
compared to Brand A hamburgers. However, no one eats both varieties. Supposing 
B2: 1 apple and 4 oranges 
one day, you meet someone suffering from stomach pain who has just eaten a 
Ru 2 apples and 3 oranges
A basket is selected at random and from it, a fruit is picked. The fruit picked turns out hamburger what is the probability that they have eaten Brand A and what is the 
to be an apple on inspection. What is the probability that it come from the first basket probability that they have eaten. Brand B?
Solution: l.et S denote people who have just eaten a hamburger
Let A denote people who have eaten Brand A hamburger 
Lei B denote people who have eaten Brand B hamburger 
Let C denote people who are suffering stomach pains 
We are given that 
P(A )- X 
P(B) X
69
68
UNIVERSITY OF IBADAN LIBRARY
Si)"., chance o f catching fish, the corresponding figures for the river and the lake are
P(C7A) = 0.3 40% and 60% respectively.
P(C/B) = 0.25 (a) Find the probability that he catches fish on a given Saturday.
S -  A <JB (b) What is the probability that he catches fish an at least three o f the fire 
As those who have stomach pain have either eaten Brand A or B. then A n B  = tf consecutive Saturdays?
(c) If on a particular Saturday, he comes home without catching anything, where 
P(C) = P (C n  S) = P (C n  A) + P (C n  B)
= P(C/A)P(A) + P(C/B)P(B) is it most likely he has been?
(d) His friend, who is also a fisherman, chooses among the three locations with 
0.3 x X  + 0.25 x K  
equal probabilities. Find the probability that the two fishermen will meet at 
= % least once in the next three weekends? (Any assumptions made should be 
P(C/A)P(A)
I hen P( A/C) --------- clearly stated).
0.3 X{&) _  „  
~ 7% Solution: Let S. R and L denote the event that he goes to the sea, the river and the ---------  _  
%30 A- lake respectively and F  denote the event that he catches fish.P(C /  B)P(B) _ 0.25 (% ) 
AndPtB/C) —  ^ P(S) = ' - ;P( .F/S)=±
P(R ) =  i ; P ( F / f i )  =  |
= %
So. if someone has stomach pain, the probability that they have eaten Brand A P (0  =  f P ( P / i ) = ;
hamburger is X  and lhe probability that they have eaten Brand B is X • (a) Using the idea of total probability,
P (F ) =  P (5 )P (F /5 )  +  P(R)P(F/R ) +  P{L)P(F/L)
Example 4: Suppose 15% of apple and 10 consignments were toxic. If the 
consignment consists o f 60% apple and 40% mango, what is the probability that a
fruit selected at random is toxic?
Solution: Let B be the event of toxic fruit and. A 1 ,A1 be events of selected fruit being (b) Let the number o f Saturdays on which he catches fish be a random variable X  with
an apply and a mango respectively.
=  ^  =  0 ^ 2) “ 0.4 P{X >  3) =  P(X = 3) +  P(X =  4) + P(X = 5)
W / A d - g - 0 . 1 S ;  P ( f i / / l2) =  ^  =  0.1 =  ( 3) (0 .65)3 (6.35)2 +  (!j) (0.65)4 (0.35)1 + (j!) (0 .65)5 (0.35)°
P{B)  = P{Al )P{B/A l)  +  P(A2) P ( B /A 2) =  0.3364 + 0.3124 + 0.116
=  (0.6 x 0.15) +  (0.4 x 0.1) =  0.13 =  0.765
Example 5: Every Saturday a fisherman goes to the river, the sea and a lake to catch I lere we need to calculate the probability that he goes to each o f the locations without 
llshes with probabilities j ;  ^an d ^  and respectively. If he goes to the sea. there is an catching fish.
71
70
UNIVERSITY OF IBADAN LIBRARY
P(v/ l ) =  —100  =  0.6; P (v /l2) =  —100 =  0.4 
n  _  p(sr\Fl) P ( P / / L )  =  —  = 0 . 1 5 ;  P{B/A2) =  — =  0.1
P ( S / F 1) = P(Fl)
P (P ) =  P d A j P i B / A j  +  P{A2)P {B /A 2)
=  p(s)p(f V£) =  M  =  ^ p . 2 8 6
( - )  £  ^ = (0.6 x 0.15) +  (0.4 x 0.1) =  0.13p p
Example 7: F.very Saturday a fisherman goes to the river, the sea and a lake to catch
Similarly, fishes with probabilities and  ^ respectively. If he goes to the sea, there is an
P ( « / F 1)  =  M  2  =  ¥  =  ^ = o . 4291 20 80% chance of catching fish, the corresponding figures for the river and the lake are
40% and 60% respectively.
P ( i / P 1) = PWpZ \ ' R) =  =  7 =  0-286 (a) Find the probability that he catches fish on a given Saturday.
So it is most likely that he has been to the river. (b) What is the probability that he catches fish an at least three o f the fire 
(d) Let S l lS2 denote the event that the first and second fisherman goes to the sea consecutive Saturdays?
respectively, and define R1, R2 ,Llt L2 similarly. (c) If on a particular Saturday, he comes home without catching anything, where 
The probability that they meet on a given Saturday (assuming independence) is it most likely he has been?
(d) Flis friend, who is also a fisherman, chooses among the three locations with 
is
P(Sl n s2) +  P(R1 n r 2) +  P (LX n L2) equal probabilities. Find the probability that the two fishermen will meet at 
12  i3 , 41  31 , 14  i3 least once in the next three weekends? (Any assumptions made should be 
3 clearly slated).=  -  =  0.33
Probability that they fail to meet on a Saturday is Solution: Let S, R and L denote the event that he goes to the sea, the river and the 
( i - i )  =  1  =  0.666 lake respectively and F denote the event that he catches fish.
The probability that they fail to meet on three consecutive Saturdays is P ( S ) = j ; P ( F / J ) = i
b - ; )  = £ = 0296 P{R) = \ - . P(F/R)  =  \
The probability that they meet at least once in three weekends is P { L ) = \ - P { F / L ) = \
= 1 -  P( failed to meet)
(a) Using the idea of total probability
=  1 -  0.296
P(F) =  P (5 )P (P /5 ) +  P (P )P (F /P ) +  P(L)P(F /L )
=  0.703 1 4  1 2  1 3
Example 6: Suppose 15% o f apple and 10 consignments were toxic. If the = -2 x -5 x -4 x -5 x -4
consignment consists o f 60% apple and 40% mango, what is the probability that a =  —20  = 0 .6 5  
fruit selected at random is toxic?
Solution: Let B be the event o f toxic fruit and Av A2be events o f selectedfruit being 
an apply and a mango respectively.
73
72
UNIVERSITY OF IBADAN LIBRARY
(b) Let the number of Saturdays on which he catches fish be a random variable X =  1 — P (f a i le d  to meet)  
=  1 -  0.296 
with B ( s , ~ ) =  0.703
P(X >  3) =  PCX =  3) +  PCX =  4) +  PCX =  5) = ( ! )  (0.65)3(6.35)2 +
Practice Questions
( ! )  (0.65)4(0 .35)1 +  ( ^  (0.65)s(0.35)° 1. If/11,/12, and A3 be any three events, prove that
=  0.3364 + 0.3124 +  0.116
3
=  0.765
Here we need to calculate the probability that he goes to each of the locations without P M , + /l2 -M 3) =  £  p M i) -  Y , p  M l +  A2 + /13)
1=1 i=j
catching fish It is important to note that addition theorem can be validly applied only when 
P(F') the mutually exclusive events belong to the same set.
P i s W / s )  _ M - 1  =  n?R6 2. A newspaper vendor sells three papers: the Times, the Punch and the Commet.
p(f') -  ^ 70 customers bought the Times. 60 the Punch and 50 the Commet on a
Similarly. particular day. 17 bought Times and the Punch and 15 the Punch and the
Commet and 16 the Commet and the Time while 3 customers bought all three 
P ( P / n  =  « ^  =  #  =  L  0.429
papers. Every customer bought at least one type of paper. Using Venn diagram 
or otherwise; find;
P(L /F ' )  = nL)Pl' ' /R) =  ?  =  L  0.286 
1 '  20 (i) how many customers patronized the newsagent on that particular day?
So it is mostly likely that he has been to the river. (ii) how many customers bought a single paper?
(d) Let Si. Sj denote the event that the first and second fishermen goes to the sea (iii) how many customers bought Times but not Commet?
respectively, and define R/, R:. L /, ^similarly. (iv) how many customers bought the Punch or Commet. but not the Times?
.the probability that they meet on a given Saturday (assuming independence) is 3. A random sample of 60 candidates who sat for Part I and II of an examination 
P(SX n S 2) +  P{RX n R2)  +  P{LX n  L2) in 1984 is taken. The table below' shows the number of candidates who passed 
= -1 x 1- x1- x1- x1 - x1 -  or failed each part of the examination.
2 3 4 3 4 - 3
Part I
= rs = 0.33
Part 11 Pass Pass Fail Total
Probability that they fail to meet on a Saturday is Fail 20 35
( i - j K = 0-666 Total 24 60
1 he probability that they fail to meet on three consecutive Saturdays is
i) copy and complete the table
( 1 - j ) 3 = ^ = 0-296 ii) if a candidate is chosen at random from the sample, use the table to
l lie probability that they meet at least once in three weekends is
74 75
UNIVERSITY OF IBADAN LIBRARY
find ihc probability that the candidate:
a) passed part II component B3 will fail with probability 0.6. Also, if component B| fails, the 
b) passed parts 1 and 11 device will shut off with probability 0.2 ; if component EL fails, the device will 
c) passed part II but failed part I. shut off with probability 0.5. if component B3 fails, the device will shut off 
iii) if a candidate is chosen at random from the subgroup o f those who failed with probability 0.1. The device suddenly shuts off, what is the probability 
Part I, find the probability that the candidate passed Part II. that the shut off was caused by the failure of component B|.
4. Given that: 9. Stores X, Y. Z sell brands A. B and C of men’s shirts. A customer buys 50% 
(i) P(AnB) = P(A)P(B) of his shirts at X. 20% at Y and 30% at Z. Store X sells 25% brand A. 40% 
(ii) P(A/B) = P(A) if P(B) > 0 brand B and 25% brand C'. Store Y sells 40% brand A, and 20% brand B and 
(iii) P(B/A) = P(B) if P(A) >0 30% brand C. Store Z sells 20%
Show that (i) implies (ii). (ii) implies (iii) and (iii) implies (i)
5. Consider the experiment of tossing 2coins. Let the sample space S = {(H,H), 
(H.T), (T.I-I). (TGI)! and assume that each point is equally likely. Find:
i ) the probability of two heads given a head on the first coin
ii) the probability of two heads given at least one head.
6. Given that two dice are tossed. What is the probability that their sum will be 6 
given that one face shows 2?
7. A certain brand of compact disc (CD) player has an unreliable integrated 
circuit [/C]. which fails to function on 1% of the models as soon as the player 
is connected. On 20% of these occasions, the light displays fail and the buttons 
fail to respond, so that it appears exactly the same as if the power connection 
is faulty. No other component failure causes that symptom. However, 2% of 
people who buy the CD player fail to fit the plug correctly, in such a way that 
they also experience a complete loss of power. A customer rings the supplier 
of the CD players saying that the light displays and buttons are not functioning 
on the CD. What is the probability that the fault is due to the IC failing as 
opposed to the poorly fitted plug?
8. An electronic has 3 components and the failure of any one of them may or 
may not cause the device to shut off automatically. Furthermore, these failures 
are the only possible causes for a shut-off and the probability that two of the 
components will fail simultaneously is negligible. At any time, component B| 
will fail with probability 0.1, component B? will fail with probability 0.3 and
76
77
UNIVERSITY OF IBADAN LI RARY
CHAPTER4 t i \ ) for every interval fa, b ]
FUNDAMENTALS OF PROBABILITY FUNCTIONS P ( a < X < b }  = { * f (x) dx
Then X is said to be a continuous random variable with pdf
4.1 Introduction
1 lowever. f (i) and (ii) above holds and
A random variable X is a real valued function that assigns values to every elementary 00
outcomes of an experiment. Let E be an experiment, with elementary outcomes (iii)  ^ / ( x o  =  1' and
el , e2, e3, e4, .........in the sample space S, thenS =  (el l e2, e3l e4............. }. i =co
A .random variable X can take values 1,2,3 ,4 ,........ for finite or countable infinite (iv) for all i, i =  1 , ci +  1 ,... ,b s . t .
elementary event. b
An event may consist o f one or more elementary events, for example: P(a <  X  < b)  =
A = {ev e3, ek+1: e,eS} 1=1
B =  {</>} a null set Then X is said to be discrete random variable with probability mass function (pm 0 f(Xi)
C = {<?!} a singleton Note:
D = {ex, e 3} a doubleton d rx rx
f a  J hx) dx ~ f{x)> !\x) -  J /(X) dx
Independent events: Two events A and B are independent if the occurrence o f A has 
no influence on the occurrence o f B and vise versa, Where /Jxj is the pdfof the random variable X and F{x) is the distribution function, then
i.e P(AHB)  =  P(A) .P(B)
F[t) = /(Oancl
Independent Random Variables
The random variable X and Y are said to be independent if for any two set of real 
numbers if for all A and B. - /= (« ) ]= /io
P{X < a ,y  < b} = P{X <  a,)P{Y  <  b)
P ( A r \ B  = P(A). P(B) Consider a continuous random variable X defined on an interval (0. a]. Let x be a 
point on [0. a) i.e. a  value o f x.
4.2 Probability Density Function (pdf) P(>a) = Pr{x0 < X  < x 0 +  xa]
Suppose X is a random variable and 3 a function / w  such that It follows that
(•) /(*) ^  0 p(2xa) =  Pr{xo <  x < x 0 +  2*a}
(ii) / u ) has at most a finite number o f discontinuity in every finite interval on the =  Pr{x0 < X < xQ +  x]  +  Pr[xQ + a: < X < x 0 T 2xa]
real line '-P {x ) + P (x)
(*»») C mf(x)d x = 1 =2 P iX)
78
79
UNIVERSITY OF IBADAN LIBRARY
11 follows that
P( n x )  =  n P w
X and Y will be dependent if  Z is the number o f successes in the n +  m  trials i.e. 
If (0 <  x  < a) and we consider/5̂ ) to be contiunuious at x  = 0, then it is Z =  X + Y
KmPw  =  Pm  =  0
It follows from the above that Example
Pr(x =  x0) =  0 f o r a n y x 0. If X and Y are independent binomial random variable with respective parameters
Thus for a continuous random variables we define a probability density function (pdf) (n,p)  and (m,p).  Calculate the distribution of X + Y
f(x) such that Solution
Let
Pr{a < X < b] = t f f M dx 
n
For all real values a and b
P(X + Y = K)  = ^  P(X  =  t, Y = K -  i)
Equation (3) can be rewritten as i= 0 
Pr{a < X < a + h} = h f a  + 0(h)
Or n
Pr{a < X < x  +  dx} =  f o d x  = Y j P(X = i) P(Y = K -  0  
(=0
Front the above, we can deduce the following
(0  fix) ^  0 ii
(ii)Pr{a < X < b) = f *  dw dx =K
i= 0
(tit) J  _ f (X)dx  = 1  =  Pr {—oc <  A' <  -cc]
(iv) 0 <  f (x) <  1
In term of the joint distribution function, the distribution o f X and Y is
F(a.b) = Fx(a)Fy (b) V-a.b.
i  =  1
m+n-k
Example: Suppose that n + m independent trials have a common probability of p q
success P If X is the number o f success in the first n trials and Y. the number of where
success in the final n trials. Show that X and Y are independent. c n - i o r a
i=o
Solution
and (y )  =  0 when j  > r
P(X — x, Y = y )  = Q p V ~ *  (y  ) Py  (1 -  P)m~y 0 < X < n
=  P(X -  x)  P(Y = y)  0 <  y  < m
80
8 1
UNIVERSITY OF IBADAN LIBRARY
4.3 Distribution Function
Solution
Distribution function forms the foundation o f the theory of probability and statistics. (/) p{X  =  -  l)=  1 -  p\  3 a jump discontinuity at x = 1
If the value of .Vobserved in n-experiment is less than or equal to a k-times, then (ii) p ( X  = 0) = 0: F is contains at x = 0
(/,/) P( x >  i) v , o
F ,(at) = /> (*< *) = -  
n = F(I) -  ^0 ,
If A' is discrete and m is the number of times X  is observed in n trial, then 
= \ - p + ^ p - ( \ - p ) = ^ p
n Theorem
f x(x) is the (cumulative) distribution function The distribution function Fx(x) o f  a random variable is non-degreasing, continuous 
/,,, is the probability density function. on the right with Fx(-oo) = 0 and Ft(-oo) = 1. Conversely every function F, with the 
Let A' be a real random variable on the probability space(n, A J }). For x e  93, we above properties is the different from a random variable on some probability space.
define
(/) P ( X < x ) = F x(x) Proof: For x < x '
(ii) Px(a.b) = P(a < x  < b) [ X  <  x ' ] = [ x  <  x ] + [ x  <  X  <  x ' ]  
=  f x ( b ) ~ f x ( a ) ( b > a ) p \ x  < .y 1 ] = p \ X  <  a-] +  p\x <  X  < a 1 ]
Kxuinplc: Since p\x < X  < a 1 ] > 0 
Let X  have the distribution function Fx( x ' ) -F x(x)>0
0; 1/ A <-1 I his implies that F  (a ) is monotone non-decreasing in a 
(a ) 1 -  p \  \t -1 <A<CF  = 1 Consider {a ', j; a,1, a 
1- P + - P M  0 < A < 21- 1/ a > 2 Since [a  < X  <> dlt] -> <j> as a,1, l  x 
) -  K(x) 0 asx), i  a
Find
Since this is true for every sequence {a,1,} then Fx (a) is continuous from the right.
W p ( X  = -i)
(») p ( x  = 0) For a continuous random variable X. the c.d.f is defined as 
(m) p ( X > )
= P U X  < x )  =  £ . f (c)dt  
If X assumes a value between and b
Pr{a < X <} =  P(X < b) -  P{X < a)
= F(t» “  F(a)
82 83
UNIVERSITY OF IBADAN LIBRARY
F(0 =  £  'X *  =  *i)
=  £ f t * ) dx i =0
From (5) we obtain Example:
_  dh\x) Let the probability space by ('Jt./?.^)and Xb e  the identity mapping o f SJ?to R. wherep  
I (*) clx is the normal probability distribution. Then
From (4) we can also define
Pr{X > x} = Pr[x < X < + c o }
=  Pr{ - o o  <  X  <  + 00} -  Pr{ - c o  <  X < * }
It can be noted that Fy increases continuously. Since /*’(•) represented the cumulative 
probability at an event, its maximum value is unity and non-negative. 
1 ~  P(.X)
This is often referred to as the survivor function i.e. F ( -  00) = 0  = tim F[y)
=  1 “  F(x) F(i * ) =  1 = fim F(y)
Hazard function is a related quantity defined by ~ lim F[s) => lim/7 from the left 
For a discrete random variable X. the equivalence o f pdf is probability mass Uni /•[,, => lim/7 from the left
(pmf) defined as
P(x) = P(X = x,) 4.3.1 Distribution Function for Discrete Random Variables
Let us define the distribution function for the discrete random variable as 
t t
F(t,  =  £  P(X =  x t )  =  £  Pr(X =  xt ) /•,' (.v)= /*(.\’ < .v). then.
t=l C=1 P(.x < .V < .v1 )= 7>(.V < -v‘) -  P(X  < x)
Example: which tends to zero as .v t  .v1 and F * is continuous from the left. 
Let X be the number of success in single trial of an experiment with constant f {x ' )h /*'i (.v + 0) is the limit from the right 
probability P. When the trial is repeated n times then
l'{x )-  F (v - 0) is the limit from the left
P{X = x)  =  Q )  pxqn~x where q =  1 — P
It is known that f\.{x) for discrete random variable increases by jumps, and is called
n t the step-function. F ,(.q
1=1 1=1
=(P  +  <0 " = 1
Recall
£(X ) =  np, Var (X) = npq = a2 -X| x, x.
85
84
UNIVERSITY OF IBADAN LIBRARY
K .vain pic:
Consider a random variable ..V with distribution function given by
0: x  < 0 («) 4 ' - = > < ) = 4 v  = X ) - 4 < >
1 0 < * < 1 (iii) Pr(.V ~-\)=Pr(x* -  l ) -  Pr(.V < 1) = 1 .1  = 0
4 ’
1/ .
/ 4 ’ 1 < x  < 2 M  Fr(.V = 2)= Pr(A" £ 2 )- /> (^ < 2 )  = ̂  + i j - ^ i + l j
1/ -
/ y 2 < x  < 3 = 16 - 12 =1: x > 3  ~
( i) Sketch the distribution function and hence or otherwise (v) P(X > 2)J(X > l) = = = “
^ > 1 )  \-F(\) 3/ 9
(//) ( 'alculate Pr|,V = j/y |
(vi) F ( i ) - F ( 2) = \ ~ y ?i= y }
(iii) Calculate Pr{.V -  lj
(«) f (2-) -F(v ) = / 4 - / 4  = 0
(iv) Calculate Pr{A' = 2}
(v) Calculate the conditional probability that X  is greater than 2. given that X  is (vii) f ( i  ) - 4 o ' ) = % - o  = X
greater than (iv) l - F ( r ) = l - %  = %
(vi) Pr[2< X  <3}; (vii) P r { \ < X < l } (x) I - f (3')=1-1=0.
(viii) l*rJO < X  < ij ( a )  Pr {X>2} ; 0  P r k > 3 }
4.4 Jointly Distributed random variables
Solution If the occurrence of event X that affects event Y we require the concept of conditional 
probability.
The conditional probability distribution function of X given Y for discrete random 
variable is given by: P(X/Y) = P(y)>  0
P(Y/X) = P(X) PW  > 0
While for continuous random variable: f (x /Y )=
1 fylY)
r ( r / n - j & § g p
86 8 7
UNIVERSITY OF IBADAN LIBRARY
Definition: Lei (£2,e, P ) be a probability spaceandlet B be an event with P(/l) > 
O.Then the conditional probability of B given A is defined by P(B/A) = CO
P(A) > 0 /*(*) = j  f(x,y)dy X, Y, continues
P{A)
But P{AB) = P(B/A) P(A) = P(A/B) P(S) 
Recall the Baye’s theorem
Px0 0  =  ^  P{X =  XJ = ^  pij X, Y discrete
P(Bk/A)  =
ZHs l P(A/BiP(.Bi) j
the m.d.f. for random variable Y is
Two random variable. X and Y are jointly and continuously distributed if there exist a fy(y) = X, Y, continues
function /(^d e fin ed  for all real x and y and a two dimensional plane C such that:
J  J  Py (y) = Y,jP{Y = y i )X,Y discreteP{{x,y)e C} = x ,y  e C f {x,y)dx dy
Example: The joint d.f. at X and Y is given by 2 e~xe~2y
{P{X = x t, y  = y{] = pu > 0 0 < x  <  CO
ZXpij = 1 Compute (i) P{x < l ,y  < l / 2}
The function / (x y)is called the joint of X and Y. Satisfying the following conditions (ii) P{x < y)
(0 /o ,y) > =  1, V x . y e C ( i t t ) R ( X  <  a )
Example 2:
(u) ^  = 1, for  X, Y discrete Given f {xy) =
j 2(x + y  -  3xy2)
(0 elsewhere 0 < y  <  1 ' 0 <  x < 1
x y
f  x J y  Find (0  Pr{0 < X < 3/ 4}(tv) P[X/Y < l / ^f (x ,y) = 1, f o r  X, Y contiunes
For discrete random variable. GO Pi1/ io  < y < 3/ 4}(^) Pr[X < 3/ 4 / k < V 2)
The joint distribution function of X and Y is given by: (iii) P(*.y)
j  J  4.5.1 Conditional Distribution of Jointly Distributed Random VariablesP(X,Y) = f{x,Y)dx dy =  P(y) > 0
x y x y
t= l)=0 1 j
and the marginal distribution d.f for X. is defined as
89
UNIVERSITY OF IBADAN LIBRARY
Exercise: Proof:
If X and Y are independent Poission random variable with respective parameters A, Let F''[t)~ P{l) then
and A2. Compute the distribution of X + Y.
Solution
= / W
Let P(x +  y  = n) = Pr(X =  k, Y = n -  k)
for 0 < k < n 
and disjoint events Two random variable’s .Vand Y are said to be stochastically independent different: 
(X =  k,Y = n - k ) /(■'■• )') = f  (x)fi (.')■ < x  < co
Exercise -co < y  < 00
Given the following probability distribution function
2 4 Where /(.v,_y)is the joint density function of X  and Land f ( x )  and /,(y )  are the x/y 1 3 marginal p.d.f of X  and Y respectively.
1 V24 Vl6 1/q
2 V Vs V 24 lU Theorem: Two random variables are stochastically independent if and only if the12
joint p.d.f can be written as product of a non-negative function ofx alone and a non- 
3 5/24 5/16 5/8 negative function ofy  alone.
v 3 V2 v6 1 Proof:
Find p(XY) 11 /[<., ) - gix ) h{)’) where g(x)aii(l li(y) are non-negative function of.r and;-alone 
respectively, then the marginal pdf at X  is given by
4.6 Independence of Functions of Random Variables / M = £ / t , . , i * -
Two random variable’s X and Y are said to be stochastically independent iff: 
hx.y) = A 00/ 200; -co <  x < co = J  _ s(x)h(y) dy, where g(x)is a function of  x alone
—co < y < 00 • • AW  = ^(-v)J h ( y ) d y  
where f (x.y) >s the joint density function of X and Y and / a(x) and / 2(y) are the 
= c, g(.t)
marginal pdf of X and Y respectively. Similarly, the marginal p.d.f of Y is given by
Theorem: Two random variables are stochastically independent if and only if the 
joint p.d.f can be written as a product of a negative function of x alone and a with 
negative function of y alone.
= J  s[x )h{y) dx, where h(y)is a function o f  y  alone
Where / (.v) is the pdf and random variable ̂ Vand f(x)  is the distribution function. A W  = h ( y ) \ " g ( y ) d x  
-  c, h(v)
90
91
UNIVERSITY OF IBADAN LIBRARY
CO CO
-3-
rH LO
The last two integrals being iterated integrals w.r.t. two measures respectively and the 
But since f lt y) is the joint pdf of X  and first then integral w.r.t. product o f two measure.
OR I f  f [x,y)= g(x) h(y) for  some function gaud h
J  |  fix. v fo  dx ~ 1» by hypothesis then J g[.x\h |  h(y)dy = J f ( x , y ) d  (.v,y)
1 B AXB
j” £  fu. x)(lx dy = [ 'S ,sM hb ) cLx dy Where A is a unique-infinite measure A, X A2 
Applying Fubini’s theorem to the finite integer we have:
Fubini Theorem (2): If h is a non-negative function on X  Q, then 
dy { \ j ' ^ dy) ^  ltd A = ̂  It dp, dp2
Letting C, * C3 = 1 = \\h dp2 dp,
l l a ... , = r x ^ ^ K v ) = i The above reduces to Theorem (2) above in the case of indicator function of 
rectangles.
■'■fx.x)=ClSlX) C2h(y)
= J \ ( x ) f 2(y) Lemma
which implies that X and Y are stochastically independent. Let X  and Y be stochastically independent random variable the pdf of Z  = X  + Y is 
Fubini’s Theorem: (1) A necessary and sufficient condition that a measurable subset given by g(2-)= [  f(x)h(j>) = j ‘ f(x)h(Z -  x) dx 
A of Q, X Q, has measure zero is that almost every w, -  section (or almost even Where /(.v) is the p.d.f of X and 
iv, sec lion has //, -  measure (or //, -  measure ), zero. h(v) is the p.d.f of Y.
I f  A = A, X A 2, A(a ) = j p f A w J d p ,  (vV|)
Proof:
= jp,(Aw,)d/i2(w2)
Let X have the p.d.f f (x)  and y  has p.d.f h[y\
= p (a , ) p (a 2)
pdf of 2? .  p(\z  < z}) = P{X Y < Z) the joint pdf of X  and Y  is f ( x )  h(y) since ,Y 
Fubini’s theorem gives condition under which it is possible to compute double 
and fare stochastically independent.
integral using iterated integrals. It allows the order of integration to be changed in
iterated integrals. •.c(z)=££7MMv)<fcrf>-
=i»rn dxTheorem
Suppose A and B are complete measure spaces. Suppose f sy)‘s A X  B measurable if Since < G(z)< I, by Fubini’s theorem
I.YS
Then Jf (x ,y)d{x ,y )=\  J/(.t,y)</y dx = j [  \ f (x ,y )dx  \dy
A X B  A fl B \ A  J
93
92
UNIVERSITY OF IBADAN LIBRARY
Exercise
(7(2) = \ j \ x ) H ( Z - x ) d x Find g(&) if A' and Y are independent with parameters A, and A, respectively show
that the random variable)' are stochastically independent
(lit * ' + x ' ao d l' - ^ 7 Z
JG(Z) ■ H (2 - x ) \d x from an exponential distribution.
- \ : m d i Solution: Since Agind A; are independent, the joint p.d.f of v, and v,.//(.vr  .v,) is 
By continuing of distribution function given
*.*(£) - J‘ f (x )h (Z -  x)dx tx  Y x {/(•'.)/('■-’) 0 < < 0
Example 1: 1 ’ |0 elsewhere
L.ct .V and Y be stochastically independent random variables, each having the poission j 1 0 < A", < 00,0 < x2 < 00
distribution with parameter A. Find the distribution of Z  = X  + Y1 [0 elsewhere
Nw r, =/./(.v, .Y,) = .r, + .v,
Solution
e'AA' r 2=//(.vI.xJ) = - L
/  (v )  = h(y)= •v.+*2
which defines a mapping (l -  l) transfonnation from the space
. ^ ) = Zy> / ( v) / 'Wl*U A -  {(.v, t- x, ),0 < .v, < x , 0 < x, < =c}unto the space
= Y j /  (v) -  x l  S'nce y  = z -  -x
> <ii B = (O’i + J’; ),0 < r , < x ,  0 < >*, < l}.
I he inverse transformations are given by
•v, = .»i v.; x i = y t >2 = .»',(> - ) :i)
Applying Binominal expansion, we have dxi _ dx= v,; i =_ V.
dy2
g( *  S -v!(?-.r)! d\\ 5v,
= - y ,
= — (A >Af  d>\ dy.
2! V '
(2A f  ~ P(2A)
2!
94 95
UNIVERSITY OF IBADAN LIBRARY
A . , . I  ~ A x ,) fU s )—A .i.)
1 he Jacobian o f  the transformation
By stochastic independence, since the marginal pdfs 
civ, dx2
Vj V, /(■*,) = /('■ ,)./=  I. 2.....ndv, dv:
f a l-.v , - y The random variables are said to be a random sample of size n from a distribution fa
dy2 which has pdf / ( v).dX
3 -CV, -  V, V,) Exercise>f2>\
l et .V, mill X.  he two stochastically independent random variables with p.d.f
. J  a 0. since v, is not ilentically zero —— c II < x. <cc
l<tf)
. = o < 7 , < o o , 0 < y , < I -----1 --- a*;/ /-I e  - »- - ,  a  <  .v , <  co respective l»y.
|0  '  elsewhere r\ P )  - F
 ̂  ̂ [e- '1 - v, o < yy < co, 0 < y , < 1 Where r(-) is the gamma function.
' 1 ’ 10 elsewhere XDefine Y = By defining a suitable Y: , a function of X, and X ,.
A, — A i
o < < co. 0 < v\ < 1 Calculate:
e ls e w h e re (a) thcjoinl pdf of }j and )\ and 
hence (b) the marginal pdf of }j.
Kxercise: hence or otherwise show that
i inel the marginal pdf of y, and marginal p.d.f of vv 4.7 Functions of Random Variables
Suppose .X is a characteristic of interest, the p.d.f fx (X) may refer to the pdf of a 
\ and \ are stochastically independent. given population. Another characteristic V (which may be a function of X may be of 
)'i e" '  /»(>■;) = 1, 0 < v = < l interest Therefore, there is need to obtain the distribution of the later variable.
It should noted that I bus. given the pdf or c.d.f. of therandom variableX. the pdf or c.d.f. of 
), has the gamma pdf with parameter a  = 2./? ■= 1 anotherrandom variable Y may be obtained as a function of X.
>. has the uniform distribution over (0.1) There are two given major technique to achieve this. They are CDF technique and the 
\, mill W have exponential distribution with parameter 1 . Frans formation technique.
Definition (lor more than two variable) 4.7.1 The C DF Technique
I el A',,A'.......V,,// mutually stochastically independent random variable s. each of Given the CDFofY (Fr (A)with some function of interest (say) Y = g(x)  is of 
which has the same p.d.f f(x) which may or may not be known. Then interest.
97
96
UNIVERSITY OF IBADAN LIBRARY
I he idea is to express the CDF of Y in terms of the distribution of X. Define set Example 2: 
Ay = {X/g(x) < y) It follows that [{Y < y}J and X 6 Ay \ l.el/*(x) = Lx 0  <  a: <  1
i.e. Fy(y)  =  Pr(g(x) < y) and y = 3a: + 1 
In the continuous case find the distribution of g{y) 
Solution
Myl = /'7 ,W  dx
y = 3Ar + y = ^ X  = ^  
=  FX(x2)  -  f x M
Pyiy) -  P(Y < y )
and p.d.f of fy(y)  =  df dy  Fy(y)
=  P[ 3 * +  l  <y|
Example 1:
Given Fv(.r) = 1 -  e~2x, 0 < x < oo (y-i/= f0 3 2x dx
Find the pdf of Y = ex
Solution
/y(y) = p O' *  y)
= P[ex < y]
= P[X <  Iny]  
= PxUny)
but fx M  “
= —ax ( l - e ~ 2x)
= 2 e~2x 4.7.2 Transformation Method
and Fy(y) = J0/n> 2e 2xdx Let X be a continuous random variable with pdf / (V) > 0 for a < X < b and y 
Iny
Fy(y) =  e ~ 2x r?(x). If there is a one-to-one transformation from A = {x/ fy (x) > 0} on to II 
|n |K//y(y) >  0) with inverse transformation.
==   e1 lny~2 + 1 1 <  y  <  CO X = w(y) if the derivative d/ dy  w(y) exist, then-  y~ 2, 
/y (y ) = / > ( y )  | ^ |  y  e / i
/y(y) = ^ /v(y)
a Where lIt—/y II is the Jacobian of the transformation 
Y could be monotone increasing or decreasing fx(y)  = Hx(y) |^ ;|
9 S 99
UNIVERSITY OF IBADAN LIBRARY
Example 3: Using the last example Example
f (x) = Zx 0| <  X < 1, and y  -  3x
l c l /U ) - 5 7 ( V 2). * =  -2 ,-1 ,0 .1 .2 .
Find the distribution of Y = |X|
clx _  1 Solution
cly 3
3(y) = 2 (t 1) * j 
= |C y -D , 1 < y < 4 / y ( l ) = « - l )  +  / i ( l ) = £  +  i  =
Example 3: / , ( 2) =  A ( - 2) + A ( 2) = i r  o
0  <  30 <  CO 4
Given = j1Q9
elsewhere 77 y = 0
determine the pdi of y  — X~
Solution
f (x) = 2x e ~*2 Exercise
y  = X2 =>  X  = -y1 /X/i 1. Let X have a Poisson distribution with p.d.f f(xy = - ~ a a *dy 1 x!
x == 0, 1, 2,...
sOO = fx(y) \~\ I .et Y = 4X, derive the pdfof Y.
2. A random variable X has pdf
= 2 y ' / i e - y  * V 2 y ' 1/2 f(x) = 1 0 <  X < 1
= e~y 0 < y < co Find the pdf of Y = - 2 In X
3. If the random variableX~ N (0,1), find the pdfof Y ■ A'2
4.7.3 Transformation that are not one-to-one 4. Use the transformation method to solve the problem i
If 5 0 )  is not one-to-one over A = [x/fx(x) >0}; then thee is no unique solution to example 1.. j4X' 0 < X < 1  
equation y = g(x).  It is usually possible to partition A into disjoint subsets 10 elsewhere
Ay ,Al , A3 ... such that f.i(x) is one-to-one over each Aj Use the C 'L)F technique to derive the pdfof
f y ( y ) = ^ / x ( ‘V/(y)) (i) Y -  X \  (ii) w = ex(iii) Z = InX (iv)p =(,Y -0.5)2
i In the above example
i. e. fy {y) = ^  fx (xy)where the sum is over Xysuch that /tt(xy) = y
P{x = 2 / y  = 3) = - ^ 1
17 P{Y= 3 )
—
-  v 6
100
101
UNIVERSITY OF IBADAN LIBRARY
CHAPTER 5 5.2 Binomial Distribution
SOM E DISCRETE PROBABILITY DISTRIBUTIONS In Bernoulli distribution, there is just one trial that can result in either success or 
failure. But, in Binomial distribution, we have repeated and independent trials of an 
experiment with two outcomes resulting in either success or failure, yes or no etc.
5.0 Introduction The probability of exactly .t successes in /; repeated trials is given by:
In this chapter, we will be studying some discrete probability distributions with a view 
to obtaining their men and variances. p'q"~ ; x  = 0, 1, 2, .... ...., n
\-x)
5.1. Bernoulli Random Variable 0 elsewhere
A random variable X, that assumes only the value 0 or 1 is known as a Bernoulli where p is the probability of success
random variable. The values 0. or I can be interpreted as events of failure and success q = 1 -p is the probability of failure 
respectively in an experiment usually referred to as Bernoulli trial. x is the number of successes in repealed trials. 
Definition 1: A random variable X is defined to have a Bernoulli distribution if the f(.v) is the probability density function (p.d.f).
discrete density function of X is given by
(p * (l -  p )1_*forx  = 0 or 1 )I ~ - M —v .  ,• \ 5.2.1 Properties of Binomial distribution
/ M  = = p*( 1 -  p)1 */{o, 1}0 ) (i) It has n independent trials
0 otherwise J (ii) It has constant probability of success p and probability of failure
Where the p satisfies 0 < p < l . l  — pis usually denoted by q q =  1 “ P-
Theorem 1:If X has a Bernoulli distribution, then (iii) There is assigned probability to non-occurrence of events.
H(X) = p. Vcir(X) = pq (iv) Each trial can result in one of only two possible outcomes called 
success or failure.
Proof:
E(X) =0.q  + l .p  = p 5.2.2 Mean and Variance of a Binomial Distribution
Var{X) = E(X2) -  ( E[X])2 f(x) - p'q"-' x  = 0, 1, 2, ......... ,n
= 0 2.q + l 2. p - p z = pq
Bernoulli distribution is a special type of discrete distribution sometimes referred to as (i) Mean:
Indie tor function. This implies that for a given arbitrary probability E(X) = z  * f {x)
space[S, A, P(.  )], let A belong to A , define the random variable Xto be the 
indicator function of A; that is x(w)  = then A has a Bernoulli distribution with
parameter p = P[X = 1| = P[A],
//!
(/7—at)! p  qjc!
102 103
UNIVERSITY OF IBADAN LIBRARY
- z.r=0 * ------------------ P q £ [* (* - ! ) ]  = « ( /i- l) /r(//-.r)!,v(,v-l)! 
/. £(JTJ) = 4 V (^ '-1 ) ]  + E(X)
(» -!)!
- • S : (ii-x)l(x-l)! p'  p x l q"~’ = n(n-l)p2 + np
n-1 /. V(X)  = E { X 1) -  [E{X)]2 
-  np ("-D I
frl («-x)!(x-l)! p q = n(n-l) p2 + np -  n2p2 
= n2p2 -  np2 + np -  n2p2 
Let s = .r -  1, .v = s + 1 = np -  np2 
= np £  (""I)!L  -------- xr~,  q = np (1 -p)j.o (/»-.s-l)!jr! p = npq
'/ l - P Remark: The binomial distribution reduces to the Bernoulli distribution when n =  1 
~ "p z*=o P'q'-’-'S Example 1:
= np (p +q).nn‘- l =_  np It is known that screw produced by a certain company will be defective with 
(ii) Variance: probability 0.02 independently of each other. The company sells the screws in 
Var(X) = E(X2) -  [E(X)]2 packages of 10 and offer a money back guarantee that at most 1 of 10 screws is 
E[X2] = E[X(X-1)] + E(X) defective. What proportion of packages sold must the company replace?
Solutio n
Let X be the number of defective screws thus n = 1=, p = 0.02
//
-  z  * (" “ !) Pr (at most one defective) = 1 -  P(X = 0) -  P(X = 1)
/7! /> (^> i) = i - m ' < i )
= ^ x ( x - l ) — t p V " ' 10N
(/j - x)x! = 1 (0.2 )u(0.8)' (0.2)' (0.8)9
0
n(n-\)(n  - 2 ) !
= 2 > ( * - l ) p„ 2  p_   t=2 q n -.T What is the final answer?(»i -  x)!x(x -  l)(x -  2)!
Example 2:
= n ( n - \ ) p 2 T ( « - 2)! „.t=2 n -  v
£  (« -* ) ! (* -2)! Z7 7
A communication system consist of// components each of which will, independently 
function with probability p. The total system will be able to operate effectively it at 
Let s -  x-2 x -  s + 2 least one-half of its components function.
= « (« -  D^ 2 (if -  2)! For what value of p is the 7-components system were likely to operate more 
U  (#i -  j2 -  2)!s! p q effectively than a 5 components system.-  //(// -  1) p 2 £ ( a -  >
.v P’q"-’- 2 Solution
A 7-component system will be effective
104 105
UNIVERSITY OF IBADAN LIBRARY
5.3 Poisson Distribution
If  P(E 7 > 3) = P(E = 4) + P(E = 5) + (P(E = 6) + P(E = 7).
= 1 - P(E < 3) = 1 - P(E = 0) - P(E = 1) - P(E = 2) - P(E = 3) When n becomes large and p is fairly small, the use of the binomial distribution in 
calculating the various probabilities becomes cumbersome. To overcome this 
P V + p ' qs + 3 ^ , + p ’ problem, we use another probability function which approximates the binomial 
distribution. This probability function is known as the Poisson probability function 
A 5- component will be effective if which we shall be considering in this lecture.
P (£ s > 2) = P{E = 3) + P{E = 4) + P(E = 5) A random variable closely related to the binomial random variable is one whose
P 'q 2 + P'q'  + P S possible values 0, 1,2, 3,.....represent the number of occurrences of some outcomes
not in a given number of trials but in a given period of time or region of space. This 
The 7-component will be better if variable is called the Poisson variable.
P(E1 > 3) > P(E5 > 2); for q = 1 - p.
Complete this 5.4 Properties of a Poisson Experiment
Try for 5 and 3. A Poisson experiment is a statistical experiment that has the following properties:
1. The experiment results in outcomes that can be classified as success or 
failures.
Example 3:
2. The average number of success(A) that occur in a specified region is known.
For what value of K will p (x  = K/ P) /( X  = K  - 1) .b e ^8r eatter or less than 1 if X is a 3. The probability that a success will occur is proportional to the size of the 
b (n, p) and 0<P<1. region.
Solution 4. The probability that a success will occur in an extremely small region is 
virtually zero.
P(X = K) Note:
P(X = K -  1) } p k- ' ( \ - p y - k" (The specified region may take many forms e.g. length, an area, a period of time, 
k - \ j
volume etc)
(n - k - l)P A Poisson random variable is the number of successes that result from a Poisson 
k ( \~ P) experiment.
: .P(X = k)>P(X  = k - 1) iff The probability distribution of a Poisson random variable is called a Poisson 
(n - k + l)P>A:(l - P )  distribution.
i.e.K <(n + l)P Given the mean number of successes Athat occur in a specified region, the probability 
This implies that for the binomial distribution b (n, p), as k  goes from 0 to n, P (x=k) density function (pdf) of Poisson distribution is given by
first increases monotonically and then decreases monotonically, reaching its largest e-*U')
value when k is the maximum. P(x: A) = x\
106 107
UNIVERSITY OF IBADAN LIBRARY
where .t is the actual number of successes that result from the experiment. = t
A ~ np (// is the total number of observation in the experiment and p  is the probability « = | x ( x -  1)!
of success). = a y  Z Z fZ
Note that mean A and variance are equal i.e. A = mean = variance. Also A is the (x -])!
parameter of the distribution, with e= 2. 71828 L e ts= .r-  1
Some examples of random variables that obey the Poisson probability law are:
1. The number of customers entering a post office on a given day = * t .0  si
2. The number of misprints on a page (or a group of pages) of a book. = A
3. The number of packages of instant noodles sold in a particular store on a given
day. (ii) Variance
Var(X) = E(X2) - [£(.r)f 
Identities: E(x2) = E[x(x-1)]+E(x)
E[x(x-l)]= £  t ( x - l ) e ’xA*
,v=o • x\
— 1,  + A,  H--A--2- - 1- -A--'- K... r. _ --xX   <ij..rr - 2
2! 3! = A2 2  At - i  ( r - 2)! r  = r ~ 2 A2
Using the result, we have I-et s=x - 2
e x A' r. — A
2 X , = I x\ = y  L - L
s . O  S\
<r. It
= e y - = A2
&  *
Var(X) = A2 + A - [ A ]2
-  e e "1 =1
= A2 + A -  A2 
5.5 Mean and Variance of a Poisson Distribution = A
(i) Mean
Example 1:
E(X) -  Y jXP(x )
T = 0 The average number of homes sold by Assurance Homes Company is 2 homes per 
z'  e Ar day. What is the probability that exactly 3 homes will be sold tomorrow?r  = a '-'.ax\
Solution: A - 2 since 2 homes are sold per day on the average
z - e ’ '■ A' x = 3, e = 2.71828x\
108 109
UNIVERSITY OF IBADAN LIBRARY
, . e’ *As _ n ( n - l ) ( » - 2)......(n - x  + 1)P(x; A = np) = ----— 1 - -  1 - -
x! nx x!
= 2.71828"2 (2)* but lim , - * ?
3! n
P(3; 2) = 0.180
Thus the probability of selling 3 homes is 0.180 lim 1 -
n(n - l ) ( n - 2)....(/i - x  + 1)
5.6 The Poisson Distribution as an Approximation to the Binomial 
Distribution e~xX'
Let n and p be the parameters of a binomial distribution. Therefore, we have---- 7- ,  x = 0, 1,2....
X !
Therefore mean A = np Thus, the binomial pdf approaches the Poisson as n increases and p  tends to zero.
Variance a 2 =np(l -p )
If n -»oo and p -» 0 simultaneously, in such a way that A= np is fixed, then we can 5.7 Hypergeometric Distribution
say that p = where A is a fixed value. Consider a lot consisting o fm  + n items of which m  of them are defective and the 
remaining n  of them are non-defective. A sample of r  items is drawn randomly 
Then as n increases, the binomial probabilities. without replacement. Let x denote the number of defective items that is observed in 
' n
P(x) = p s (1 x = 0, 1, 2,.... Get closer and closer to the Poisson the sample. The random variable x  is the hypergeometric random variable with 
parameters m  + n and m. Then, the number of ways selecting x  defective items from
probabilities. m defective items i s ^ ) ;  the number of ways of selectingr -  xnon-defective items 
Proof: Given that from n non-defective items is(r  ” Therefore, total number of ways of selecting r 
e~xX'
P (x )= ------- where A =np,x = 0, 1,2,.... items with x defective and r  -  x non defective items is. ^
x! Finally, the number of ways one can select r  different items from a collection 
P(x: n, p) = p(x:«, ^ / )
of m +  n  different items is (m *  n). Thus, the probability of observing x defective 
x = 0, 1,2 ...... n items in a sam ple o f„r  i,t emo s (probabili^ty density function) is (~\mx )(  n?TJn\+~ri-vx )\ J '  frn(n -  1) (n - 2 )  (/;- x + l)(//-x)! A1 or x = 0 ,1 ,2 ,...... r, x < m and r  -  x < nr )
x!(/; -  x)!
n ( n - \ )  (n -  2). ...(/i-x + 1) ^  
x! nx
1 1 0 111
UNIVERSITY OF IBADAN LIBRARY
Example 1:
In lottery, a player selects 6 different numbers from 1,2,.. .,44 by buying a ticket lor 2 8 -6 ]
t naira (N1.00). Later in the week, the winning numbers will be drawn randomly by a T  am, we have U J l  8 - 0  J
device. If the player matches all six winning numbers, then he or site will win the '28]
jackpot of the week. If the player matches 4 or 5 numbers, he or site will receive a U J
lesser cash prize. If  a player buys one ticket, what are the chances of matching, (a) all 
6 numbers (b) 4 numbers. C ompiate this
Solution: Let x  denote the number cf winning numbers in the ticket. If we regard 
winning numbers as defective, then is a hypergeometric random variable with 7.» Mean and Variance of ElypergeoriA rk  DistritosiJf/j n
m + n  = 44, m  -  6 and n = 38. (§) Mean:
(  38? H(x) = t  xf(x )
f 6
uI i U J 1 6138! 1 .!=»(a) P(X = 6) =
44^ f44? 44! 7059052 /nc nc i'Uqn>2Ki rt-v.mn'Vxo-
6
S f f c . b n x ;  o r i i c r i r o b  : 6 j O' 7 •' ' . . m + n 
dtnobrm  nw sv ; U 38)f 6 fo siqrrtpa . A wfLeh arc
Cr
■ b) ?(X -  4) -  Iu 1 2 ...
ml II
-  ").QC. t-933 >.
■ 4 4 \ mo! (jn -  ;;)i.v!
C
A 6 J.
- s ni+it
Example 2 :. c
As part of a health survey, a researcher decides to investigate prevalence of cholera in 
S sub-urban areas but of a city’s 28 sub -urban areas. If 6 cf the sub-urban areas have a x  m(-m - 1)! c
very high prevalence rate, what is the probability that none of them wh! be included in I (m-A-)!.v(.r-l)! r - x
the researcher’s sample? c
Solution:
. rmy ?■ \ 
Retail that we have f i x )  =-   i~ ; .r-x:- as the p. d. f  for hypergeometric distribution c- r awt* nt+n
Here, x  =  0, n  = 22, m + n = 28 and m = 6 . -  c
M = * -  1 =  s + 1
,.i> —1-----
this implies _ ~  (W -J-l)!*!
C
12
113
UNIVERSITY OF IBADAN LIBRARY
m(m - 1) ^ ( * -  2)! . c
= —m+n . c c lll+rt S~> /  J^  i r-j-l W i-2 (m -x ) !(x -2)!
let s = x-2 x = s + 2
m
= —mn+n  . C
m(m - 1) ( m - 2)!
C r—I
Hl+hf' L—i (m - i1 - 2)!j ! f'J“2
m ( m - l )  x
m m+n ̂ *Ca "Cr , 2
m + n\ (m + n - 1)1
{m + n -  r)\r\ [(m + n - l ) - ( r - l ) )  ( r - 1)! m(m - 1) b+m. 2m+n /-*
m r
m + n! (in + n -l)! m(m  - 1) 
_ (m + n -r )!r!  (m + n - r ) ! ( r - l ) ! m  + n\ (m + n - 2)!
(m +  n - r ) ! r !  [(m +  n - 2 ) - ( r - 2 ) ) ( r  — 2)!
Simplifying gives 
m ( m - l ) ( m  +  n -  r) !r! (m + / i - 2 ) !
mr
E(x) = (m + n)! (m + n - r ) \ ( r -  2)!
m  + n
(ii) Variance: m(m - l ) r ( r - l ) ( r  - 2 ) 1  {m + n -  2)!
(m + n){m + n - l ) (m  + n - 2 ) !  (m + n - 2 ) !
£ (x 3) = e \x (x  — X) +  x]
rm(m -  l)(r -  1) 
= £[x(x-l)]  + E{x)
(m + n)(m + n - 1)
E[x(x -1)] = £ x ( x - l ) / ( x ) E(x')  = £[x(x-l)] + E{x)
= Z ' IHrc-.  nc,. x(x ~ ^  —in1 m ( m  - l)r(r- 1) r mm + n (m +  n ) ( m + n - 1) m  + n 
c. Therefore, K(.v) = E(x2) -  [£(.t)]3
Continuing gives m(m — 1) r(i— 1) rm ( rm
(m + n){m + n -  1) m + n \ in  + n
v -  , 14 (/i i -mx(m) ! x- l()x(-m1 ) -( x 2-)!2 ) ! "CW - . t Simplify this last expression to obtain rm ' m  +  n - r  (Post-Test 2)
= 2 ^ u - i ) -------------  ------ m + n —m+n ) , m + n -  ]
i=2 c
Note: If the sampling was with replacement, r and p = would be the appropriate
binomial parameter and its respective variance would be r ( 1 ----— ).
,s ^  (w -x )!(x -2)! m +n V m + n /
= m (m -l) 2  ̂ ------------ ^ T
114 115
UNIVERSITY OF IBADAN LIBRARY
M [ " ' ml ft! (r-x)!
The binomial variance is slightly greater than the hypergeometric variance because of LJ [ r -x ;
the factor ( ~ “ ) >n the hypergeometric variance. m + n (m + /Q!
r (m + n -r)!rl
Asm + n becomes very large compared to r, the hypergeo metric distribution tends to 
the binomial distribution. ml  ______ n\______ ( m  +  n - r ) l r \
( m - x ) l x \  { n - r  +  ;c)!(r — jc)! (m- \  n)\
5.9 Binomial Distribution as an approximation to the Hypergeometric _ r 'j w!w! (m -f n - r )i 
Distribution Kx )  ( m  -  x ) \ ( n  -  r  +  x ) \ ( m  /?)!
Suppose the p.d.f of a hypergeometric distribution is given by _ f  r ̂  ni(in - 1)...... (/>? -  x +1) (m -  cc)! n(n -  l)....(n -  r + x + l)(;t -  r  x)!
VV (m-x)\(n-r-Tx)l(m + n).... {(m + n ) -  (/•-!))
f  lw(/w-l)..... jm -(x - l)}  n ( n - l ) ......[ n - ( r - x - 1)]
_  v -V _______________________________________
(m + + n) -  (r - 1)]
then, we have the following theorem. Divide through by m+n
Theorem:
Let m, n ->ao and suppose that m x - \ 1
f\ m"  1
(  - _ 1 ) f  ® /--X-:.]
+n) \m+n m+n J rn +■ n m+n J)  * f[ jfj"+ n ))(y nin+n .'jj+n, \m+n i j +n
—in + n  = rm, , - + p , o < p < \ m+n m+n r - 1 )
n .m+n in+n m+n)
r - x
then PxqH~\  x = 0,1,2....... /-
in + n 
r ) f  m  V  m i }) \ m  + n ) \ m  + n m + n ) f\ m m x - n +  n m  +  n  J
Proof:
(  " ) > ) r - x 1
We have [  n, m  +  n ) . m +  n m + n  J f\  m  n+  n m \  ( r - 1)
'm) n ) 1... ( m  +  n)
... m 
( m + Since------ P, h
„e nc,e  --------n--- => 1
m + n in -h II
l r ; I here fore
UNIVERSITY OF IBADAN ilg 1L &U_IBRARY
P(r) =  prob o f  (k -  1) successes in (x  + k -  1) trials 
lim x prob o f  (x  + k)th success »»■*-«—»»> l X = r  +  k - l Ck_lPk- 1qT.p
m + n = r + k -  r  = 0, 1 , 2 ....................... eqn. (1 )
P'c,' Pk(fe +  r  -  l)(/c +  r  -  2) .... [k + r + 1 -  ( r  +  1)] ,
= ------------------------------ H------------------------------ 9r
m n Pk(k +  r  — l)(/c + r -  2) ....(/c +  l)fc ,
r - x = -----------------------ri---------------------- ’
This result implies that we can approximate the probabilities —y by
m + n 
r
=  Pk( - i y - k c rqr
-v P by setting p =  mf   provided m, n are large. This is true for all x = 0, 1, =m n No -tek cthTaPt:k( - q ) r................. eqn. (2)
2, ..... r. (i) r  + k — lck_l Pkqr> r  = 0, 1,2 ........
If m, n, are large, approximate the hypergeometric distribution by an appropriate 
binomial distribution. If the need arises, we may also go a step further in =  r  +  k -  l CrPkqr. r =  0,1,2.........
approximating the binomial distribution by the appropriate Poisson distribution.
(ii) 2 Z P ( r ) = P k Z?=0- k Cr( - q r
5.10 Negative Binomial and Geometric Distributions = Pk[ l - q ] ' k
Negative binomial and Geometric distributions are two families o f discrete = pkp~k = 1
distributions that are very important in Statistics. The Geometric distribution is so Equations (1) and (2) for k > 0 are known as negative binomial distribution.
named because the values of the Geometric density are the terms of a geometric series 
while the Negative binomial distribution is sometimes also referred to as the Pascal’s 5.11.1 Mean and Variance of the Negative Binomial Distribution 
distribution. (i) Mean
Recall that the moment generating function (MGF) of a random variable A',
M(t) = E(etx),
using the moment generating function approach, therefore, from equation ( 1), the 
5.11 Negative Binomial Distribution MGF of/? is
Consider a succession of Bernoulli trials, let P(r) denote the probability that 
exactlyr + k (k > 0), trials are needed to produce k successes. This will so happen M(t) = E(etr) = Y JC° 0etr(r  + r ~  ^  P*<? •
when the last trial, that is, (r + /c)th trials is a success with probability p and the But ( 1  -  * ) - »  =  Z ” 0  ( ~ n )  ( - * ) '  =  E,” o ( n + j j ~  !) * J  for - 1 < x < 1
previous (r + k -  1 ) trials must have (k -  1) successes with probability 
r  +- k -  l C(c i Pk~l qr , where q = 1 — p
119
118
UNIVERSITY OF IBADAN LIBRARY
Therefore Some authors define the geometric distribution by assuming 1 (instead of 0) is the 
M M = ^ e " ( ~ rk) p k(.-q y . smallest mass point. The p.d.f then has the form
=EK.„ ( “ * )/> * (-* « ')’■ « r ) ={op(L 7 l l ' r = 1 -2-3 ............... (2)
=  p ‘ ( i - 9ctr *
Now, M'(f) =  k qecPk(1 -  qe1)- *-1 5.12.1 Mean and Variance of a Geometric Distribution
kq Consider equation (2)
E(R) = = 0 =  —
V
E(R)2 = M"(t) (i) Mean
= k qetPk(1 -  gt?1)- *-1  + (fc + l)g ecP*(l -  q e T ^ k q e 1 
Complete the solution using V(/?) = E(/?)2 — (E(/?))2 (see Post-test 4) E ( R ) = Z ” /  p ( i “ p)r_1
etl -  p = q
5.12 Geometric Distribution *  E(R) =  Y ” r p ( q y - '
If in equation (1), we put k = 1, we have
r  + k - l Ck_lPkqr v-*°° d 
= rc0P qr
=  q r p , r  = 0,1 , 2, .... d v 100
and q = 1 -  p, we have geometric distribution. - * * i l j * r
The following describes the Geometric distribution.
Consider a sequence of Bernoulli trials with probability p  of success. This sequence is = p Zdq7 ^  + g2 + <?3 + ' " )
observed until the first success occurs. Let R denote the number of failures before this But (q + q2 + q3 + — ) =  q(l  +  q +  q2 +  q3 + — )
first success. For instance, if the sequence starts with F representing failure and S
success, with F, F, F, S ........ then R=3. i.e. this distribution describes the event of - f e )
first success after nlh independent trials with probability p, 0 < p  < 1 . Therefore, £(/?) = P ^ ( ^ )
Moreover, the probability of such a sequence is P ' ( l - q ) C l ) - q ( - l )
[/; - 3]= (</)('/)('/)(/>) = q' p = 0 -  pY = Pp (1 ~ q ) 2
Generally, the p.d.f, f(r) =■ P[R = r] of R is given by ( ( l - q  + q)\
f ( .r)  = (1 - pYp.  r = p \ o T q y ~ )   = 0, 1 , 2  ......
/ (
f ( r ) = qrp, r  = 0,1,2 ............  ( 1)
E(/?) = -  
V
120 121
UNIVERSITY OF IBADAN LIBRARY
(ii) Variance Thus, the mean and variance of this form geometric distribution are ^ and I r ­
£(«2) = Y "  r 2 p ( l -  p)r_1 respectively.
4—'r= l
Example 1: A fair die is cast on successive independent trials until second six is 
= Y °°  r 2 p ( r observed. What is the probability of observing exactly 10 non-sixes before the second 
*—Jr=l
“  d six is cast.Z Solution: This is a negative binomial distribution problem. So,d v ' ,co 
p k ( \ - p Y r  =  0 ,1 ,2 ......
U - l  ,
d
= P j j ( q  + 2 q 2 + 3 q 3 + - ' )  ClO + 2-1^
Therefore, we have 0.049
d
= p - q ( l  + 2q + 3q2 + 4q3 + - ) Example 2:
Recall that 1 + 2x + 3x2 + 4x3 + ••• = Team A plays team B in a seven game with series. That is the series is over (1-X)2 when either of the teams wins four games. For each game, p(A wins) = 0.6 and the 
Therefore, we have p —dq  q (V•■( l--fl)2/ games are assumed to be independent. What is the probability that the series will end 
( l - q m )  + 2 q ( l - q ) in exactly six games.
= P
(1 ~ q Y Solution:
[Cl — qr)][(l - q  + 2q)] The game will end is either A or B wins the game series.
= P
(1 - q )4 p(game ends) = p (A wins series in 6 games) + p (B wins series in 6 games)
r n1  +  q
= p l((1T-3 9)3 ^ ] ( 0 .6 ) ‘ (0.4)’ + [^ 0 .4 ) ‘ (0.4)2
r 1 + 9 = 0.207 + 0.092
P i ( l - 9 ) : = 0.299
Note: that
(P)2 J
p( A wins series in 6 games) = p [A looses 2 games before 4 wins]
f(-p2) = 1̂ ]  since q = 1 -  p = P(Y = 2)
Therefore,!/(/?) = £(/?)2 -  (f(/? )):
2 - p
(p):Hi)' = 0.207
1 ~ P Example 3:
r,2 In a sequence of independent rolls of a fair die;
122 1 2 3
UNIVERSITY OF IBADAN LIBRARY
• Each trial has a discrete number of possible outcomes
i. Whai is the probability that the first four is observed in the sixth trial. • The probability that a particular outcome will occur is constant for any 
Solution: This is geometric distribution problem given trial
P(R = 5) = j  = 0.067 where R denotes the number of non-fours before the • The trials are independent
A multinomial distribution is the probability distribution of outcomes from a 
occurrence of the first four. multinomial experiment.
i. What is the probability that at least six trials are required to observe a four. Definition: Suppose a multinomial experiment consists of n trials, and each trial can
Solution: P[* > 5] = \ -  P[R £ 4] result in any of k possible outcomes £1,£ 2^ 3. .....»£*• Suppose, also, that each
= !-[/>[/? = 0']+ P[R = l]+ P[R = 2]]-t- P[R =  3+/>[/? = 4 j possible outcome can occur with probabilities pa, p2, P3, ..... , pk . Then, the probability
p that Ej occurs nx times, E2 occurs n2 times,...... , and Ek occurs nk times is
P =  [(n ^ ln fc !)] fr1"1 P2"2 .....Pk"*] where n =  na +  n2 +  n 3 + -  4- nk
Example 1:
A bowl consists of 2 red marbles, 3 green marbles and 5 blue marbles. 4 marbles are 
randomly selected from the bowl with replacement. What is the probability of 
selecting 2 green marbles and 2 blue marbles?
Solution:
The experiment consists of 4 trials, so n = 4.
The 4 trials produce 0 red marbles, 2 green marbles and 2 blue marbles; so
nred ~  0 ,  Kgreen ~  2 # W-blue ~  2
7776 On any particular trial, the probability of drawing a red, green or blue marble is 0.2, 0.3 and 0.5 respectively.
Complete the solution Using the multinomial formula, we have
= f._____2!_____ [Pi "1 Pz"2.....Pic"*]
5.13 Multinomial Distribution l(na!n2! .....nk!).
We know from binomial distribution that each trial of a binomial experiment can 
result in two and only two possible outcomes. In the multinomial experiment, 
however, each trial can have two or more possible outcomes. So, a binomial '(oT^Tii)] [(0-2)°(0-3)zCo.5)z)
experiment is a special case of a multinomial experiment. Therefore p = 0.135.
A multinomial experiment is a statistical experiment that has the following properties: 
• The experiment consists of n repeated trials
1 2 5
1 2 4
UNIVERSITY OF IBADAN LIBRARY
E xam ple  2:
Suppose a card is drawn randomly from an ordinary deck ofplaying'cards and then C H A P T E R  6
put back in the deck. This exercise is repeated five times. What is the probability of SO M E C O N TIN U O U S PR O B A B IL IT Y  D ISTRIBU TIO N S
drawing 1 spade, 1 heart, 1 diamond and 2 clubs?
6.0 Introduction
Solution: Having studied some discrete probability distributions in the last chapter, this chapter 
The experiment consists of 5 trials, n=5 now deals with the study of some commonly used continuous probability 
The 5 trials produce 1 spade, 1 heart, 1 diamond and 2 clubs; so rij = 1 ,n 2 =  1 ,n 3 = distributions.
1 , n4 = 2
On any particular trial, the probability of drawing a spade, heart, diamond or club is 6.1 Normal Distribution
A random variable X is said to have come from the normal distribution if its 
0.25, 0.25, 0.25 and 0.25 respectively. Thus, p1 = 0.25, p2 = 0.25, p3 = 0.25, p4 = 
probability density function (pdf) f i x ) i s define as:
0.25
Using the multinomial formula, we have / w =  1 a M 2.-co .* < o o
V2^  v ;
P =  [(nj n,r.'....nt l)]^ ’,lp a ’" .....P*"*] With p >  0 and a 2 > 0
The mean and variance of the normal distribution can be obtained as follows:
[(1! i n !  2!)] [(° z5)1(0.25)1(0.25)1(O.25)2]
E(x2) = f  xr f{x)dx
p = 0.05859 J-CO
1 -   _i2 /x — i#\ 2P.
Practice Questions - i V27T(7Z V °
1. Suppose that a fair die is rolled 9 times. Find the probability that 1 appears 3 dx
times, 2 and 3 twice each, 4 and 5 once each.
2. In a city on a particular night, television channels 4, 3 and 1 have the rt£ 1
following audiences: channel 4 has 25 percent of the viewing audience, Let Z = —a ' d x a
channel 3 has 20 percent of the viewing audience and channel 1 has 50 percent X =  H + 8 Z
of the viewing audience. Find the probability that among ten television E (x7) = — —  f  (p +  aZ)r e J 2odZ 
viewers randomly chosen in that city on that particular night, 4 will be a s 2n J-oo
watching channel 4, 3 will be watching channel 3 and 1 will be watching 1 f "  _z*
= ~ ^ j=  J (p + crZ)r e 2 dZ 
channel 1.
When r = 1
1  r  z2
E(X) = — = (p + aZ) e z dZ 
\ 2u J-m
126
127
UNIVERSITY OF IBADAN LIBRARY
i r r  _£i r  -
Z e ~ d Z key property o f being memoryless. In addition to being used for the analysis of 
= v f ? l " L e z2l + a L Poisson processes, it is found in various other contexts.r* i  *  r  1 JL
= H ~T=e 2 +cr Z - — e * dZ
J - m V 2 n  J .  co \ f 2 n The exponential distribution is not the same as the class of exponential families of 
distributions, which is a large class of probability distributions that includes the 
Recall that ~  e 2 is a standardized normal distribution with 0 and variance 1. exponential distribution as the baseline distribution
Therefore
E(X) = /i( l)  + o-(O) A random variable X is said to have an exponential distribution if is probability 
density function is defined as 
Since - j=e~~  =  1 and therefore 
f i x )’ = X e-^ .X  > 0
•• E(X) = n Its corresponding moment about the origin is derived using
r* 1 z2
E{Z) = Z - = e ~ T d Z
■'—co yj2 n 4  = E(xr) = C x ' m d xJ — 00
To obtain the variance, set r to 2 in equation ( 1 ) and use = /*  xrXe~Xxdx 
Var(X) =  E(X2) — [£,(A')]2, we proceed as follows
= A r ^ e - ^ d x
J  —  CO
E(X2) = ~  [ Qr -f c r Z ye '^ d Z
1 f 0v0 2n J— co z2 dyLety = Xx, —  = e • dx
= - =  (p2 + 2/i(JZ +  cr2Z2) e ~ d Z
V27T J dx =  y  andx = j—co
r  1 r® 1 z2= ^ T = e ^2 dz + 2 nd r Ze —i dZ + o2 \ —= Z 2e z dZ
=  /^2J C- ol )o   V+ 2 2tT/icr(0) +  cr2 ( l ) J-,,V2d J-my/2n = ^ i o <V e vdy
/T(A') = /72 + cr2 = ^ i is in c e r a  =  y a~1e~ydy
Therefore
Therefore, the r h moment about origin of an exponential; distribution is
Var{X) =  (n2 +  a 2') — p 2 
= <r2
6.2 Exponential Distribution The first four moments can be demanded as follows 
The exponential distribution (also known as negative exponential distribution) is the When r =  1, we have the mean
probability distribution that describes the time between events in a Poisson process, 
i e. a process in which events occur continuously and independently at a constant 
SinceVa =  (a -  1)} 
average rate. It is the continuous analogue of the geometric distribution, and it has the
When r = 2
128 129
UNIVERSITY OF IBADAN LIBRARY
T3 (3 -  l ) j  2 
^  X2 X2 ~A 2 
From which variance can be obtained as follows Ifw elety  = P7 => dx = (Sdy 
VarQ0 = EV(2)-[E tX )]2
= ^2 “  O l1")\2 E(,Xr) = Y ^ f “(fiy)T+‘" le-rt)dy 
= F - ( l ) 2
1 V E(.Xr) = I / -  yr*a~2e-yp iy
7 !
When r  = 3
_  T 4_  ( 4 - l ) j  6
%  A3 A3 A3 Recall from Gamma function that 
and similarly with r  = 4
Ta = e~xx ~1 dx, then
i _  r5  _  (5 “  D i _  2A o r
" 4 A4 A4 “ A4 E{Xr) = — T(r 4- a) This gives the rth moment about the origin from which the 
first four moments can be derived.
6.3 Gamma Distribution When r  =  1, we have
line gamma distribution is a two-parameter family of continuous probability 
distributions. The common exponential distribution and chi-squared distribution are £ t f )  =  ^ r ( r + a )
special cases of the gamma distribution.
£ (* ) =  ^ « ra  
In each of these three forms, both parameters are positive real numbers.
The parameterization with k and G appears to be more common in econometrics and =  ccp 
certain other applied fields, where e.g. the gamma distribution is frequently used to When r  = 2
model waiting times. For instance, in life testing, the waiting time until death is a E{X„2 ) = JP-2T 2  + a 
random variable that is frequently modeled with a gamma distribution. Ta
A continuous random variable X  is said to have a Gamma distribution if its probability 
density function is defined as follow /?2(1 + a ) r ( l  + a) 
_ X ra 
f M  = -‘ ^ p , x > o .c c > o ,p > o
/?2(1 + a )a ra  
= To 
6.3.1 Moments of Gamma Distribution = a ( l  + a )/?2
x re ~ i x a_1 Therefore, we obtain the variance of X using the fact that 
dx
Vcc Pa Var(X) = E(X2) -  [E(X)]2
130 131
UNIVERSITY OF IBADAN LIBRARY
= a( 1 + a )/?2 -  (a/?)2
= a /?2 + a 2/?2 -  a 2/?2 
1/a rp O  = a /?2 
When r  = 3
£ (* 3) =  ^ra  T(3 + a ) and finally = f a l "  l “ ery"~I ^
with r  =  4, we have 
Since Vx =  / “ e5'}/®-1  dy as before. 
£ (**) =  F r r (4  +  a) Then,
6.3.2 Moment Generation Function of Gamma Distribution Mx( t )  =  (1  -  / ? t ) - ‘
The moment generating function of a random variable X distributed as Gamma i.e. Differentiating the above and setting t to zero, we obtain the first four 
X~GA(aP) is derived as follows: moments about the origin as follows
Mx(t) =  E(etx) = f  elIf { x ) d x  Mxl (t) = ap a - p t r a- 1
j  — 00 E(X) = M](0) =  ap 
/"or — h <  t  < h
M]1 = - a p 2( - a  -  1)(1 -  p t y a~2 
etx e Pxa~l
M,(0  _= Jo dx E(X2)  = AfJKO) =  a 2p 2 + ap2 Ta p° Var(X) = E(X2) -  [E(X))2 
= a 2p 2 + a p 2 -  (ap)2 
1 C e t x e ~ * x a- X
--------- -̂-----------------  dx = a p 2
ra p a  Va P a The characteristics function, the second characteristic function and the 
cumulate generating function can be obtained respectively as 
<Px(t) = (1 -  Pi t)-a , cj>x(t) = - a  log(l -  pi t)and 
i: dx Kx{t)= -  a \ o g { \ -  pt)r a p 1
6.3.3 Maximum Likelihood Estimation of parameter of the Gamma 
.  1 rv■>(*' ‘) x a dx Distribution
EccPa Jo Let X\,Xi,.. .,Xn be a random sample of size n taking from Gamma distribution, the 
1 r« iF)̂ dx likelihood function is= r a P a J0
Py
* * i-pt
132 133
UNIVERSITY OF IBADAN LIBRARY
L = £-(*+i)+i
(Ta)np an ' K(3K I00
k FTTT+T
The corresponding log-likelihood function is \p
n
Z”= i xt
LogL = ----- - —  + ( a -  1 )^ l o g x t - n logr(tr) - a n  log/? X~K 0 0-K(3K —  . n  
H K  \ P
Differentiating this with respect to a and /? we have
dlogL 1 ra l  , o P K loo _ -0/?* p K—  = ^y\ o g x i - n — -nlog(3 X K \ P  0* p K
i=i
3/o^Z, _  sr=i^/ an 
J
pci
where <p(a) =  —  equation ( ) can be written as 
= 1
dlogL v-1
^  log*i _  n - nlog/? 6.4.1 Moments of the Pareto Distribution
i= 1 The rth moment of the random variable X~PAR(p, K)
6.4 Pareto Distribution
The Pareto distribution, named after the Italian civil engineer and economist 
Vilfredo Pareto, is a power law probability distribution that is used in description of 
social, scientific, geophysical, actuarial, and many other types of observable 
phenomena.
A random variable A'is distributed Pareto with parameters /? and K. if its pdf is given 
as
f W  = % .  K > p , x >  0 
It is interesting to show that
, r -K
/ ” /■(*) dx =  1 , this is as follows = KpK
r - K \ P
U' = ^ — oo
r—K ■ p
dx
dx <PK 1 | co
K -  r x K~r IP
1 3 4
135
UNIVERSITY OF IBADAN LIBRARY
_ KpK /  l 1
K - r  U  K-r p K~r.
Kp2 [K2 - 2 K  + 1 - K 2 + 2K) 
1 \ (K — l ) 2(K — 2)
K - r V  pK-rj
K P 2 C D
KpK (K -  1)20< ~ 2) 
{K -  r)pK~r KP2Par (A') =
( K - D K K - 2 )
k p k
: \k  -  r)p«p-r
Kpr 6.5 Maxwell Distribution
E{Xr) =
K - r In physics, particularly statistical mechanics, the Maxwell-Boltzmann distribution or 
The above is the rth moment of a Pareto distribution. . Maxwell speed distribution describes particle speeds in idealized gases where the 
When r  = 1 , we have particles move freely inside a stationary container without interacting with one 
b w  = J ! L another, except for very brief collisions in which they exchange energy and 
K -  1 momentum with each other or with their thermal environment. Particle in this context 
Similarly, when r  = 2, we have refers to gaseous atoms or molecules, and the system of particles is assumed to have 
K p2
W 2) = reached thermodynamic equilibrium.K - 2
From which variance can be obtained as 
A random variable A" is said to follow Maxwell distribution if its pdf is defined as
VarQ0  = £?(*2) - [ £ ( * ) |2
K P 2 f  K P  \ 2
K - 2  \  K -  1) It is required to show that the above is a time pdf and we proceed as follows. 
Our expectation is that
KP2 K2P2 
K - 2  ( K - \ Y f(x)dx  = 1 ; then
K(K -  1 )2p 2 -  K2P2(K -  2) 
(k  -  1 )2(K -  2) dx
Fn?(.V) KZ2\(K -  l )2 -  K{K - 2 ) |  
( K - l ) H K - 2 ) ax
136
UNIVERSITY OF IBADAN LIBRARY
c
&
1
1
dx a2 x
Since x 2 = 2a2y
a2dy
= (>/2y)‘ dx =
(y[2y )a  = -y[?=2y dy
j  IcJye  dy  When r = 1
2V2a3 f ”  I  -v ,
^ 7 f / 0 >*' * I m - T T r H
_2_ 3 __2_1 1 
J R V2 ~  JR 2 V2 2§ar2
Since P^ = Vir, we have ~V^T
2ia 2?2a
2 1 1 V Z ~ ~ j T
F i 2 r 2 = 1  « E D -
2 When r = 2
This affirms that Maxwell distribution is a true pdf. ?1+in2 3 2
£ « 2) =
Vtt P2 + 2
6.5.1 Moments of the Maxwell Distribution 4a2 5
/•on = ( xvw dx 2
J  —  CO i«h3 J<> 00 2
2T+r- ^ 2~+ra 3 + r e - y
E(XT 2 ?1  y z1 dy
V  dx
■ f  n
J r . 1 1 1f  c o 2 1+J y 1+ J ax2+re 2a7 dx > T - 2 zI y 2I  r e - y dy'0J0
Using the notates earlier, we have
139
UNIVERSITY OF IBADAN LIBRARY
25a32}
m n  = Since fa = (a -  l ) 3
5 _ n
2 2z a3 22a 3 
V7T 7T
21+5a4 3 4
fc’O T  = “ V T ”1 2 + 2
23 a4T j
= ~ V T ~ '
From which the first four moments can be derived
  3 1 1 _  3 1 8a4-  r -  8a4- -  T-Sci- nce Ir -5  = -3r3-  2 2 _  22  22 2 2 2 2 f 2 _ 4 f 2
8a4- - -  r-
s m ~  r n
= 15 a4
The third and fourth moments about the mean, i.e. n3 and ^4 can then be obtained as
ypn 4
= 3a2
Var (X) = E(X2) -  [E(X)]2 /16
/   ̂ \  2 *  = 2a T " 5 J i
= 3 a > - f e l
= a4( 1 5 - -
Finally, the coefficient of Skewness and Kurtosis is thus:
n 5, = ~ 0.48569
When r = 3 (3-i)1
5v - ^ t- 3  ~ 0.10818
141
UNIVERSITY OF IBADAN LIBRARY
dG{t)
G \t)
dt
C H A P T E R  7
= £ * / "  '/>(*)
PR O B A B IL IT Y  G E N E R A T IN G  FU N CTIO N S (P G F )
= Y j x Pw  => G(i>=
7.1 Introduction
4. The variance of X is given by:
The probability generating function (PGF) for a discrete random variable is a power 
series representation (the generating function) of the probability mass function of a Var [X] = g1i,"+ g ;1)-[g ;„]2 
random variable X. Proof:G;„ = Y x P<*>
PGFs are often employed for their succinct description of the sequence of probability X
P[X = /'] and to make available the well-developed theory o f power series with non- 
negative coefficients.
Gtn = I ( x 2-x ) i> , , / '-2
Definition 1: The probability generating function (PGF) of a random variable X is 
defined as:
G,(t) = E[t ‘ ) = J  X
- I.t  f  />[* = *] o,„ =
where:
(7,,,"= E(x!) - E(x)
Gx{l) is defined only when X take values in the non-negative integers 
P(X=x) is the probability mass function of X. G,., = E(x2) - g ;„
The notation Gx is usually used to emphasize the dependence on X. V(x) = E(X2) - [E [X )Y
r ~t
7.2 Properties of PGF V(x) = E(X2)
1. The probability mass function of X is recovered by taking derivatives of G. ■ M
But IE(x2) = <
P(k) = P(X = k)= GlK)( 0) 
K\ Therefore Var [XJ - g„i + g ;„-[
2 . If X and Y have identical PGFs, then they are identically distributed, i.e. if 
there are two random variables X and Y and Gx = Gy, then fx = fy.
3. The expectation of X is given by 
E(X) = G ‘(1)
Proof:
G(t) = F.(t*) = £ C P ( .r )
143
142
UNIVERSITY OF IBADAN LIBRARY
7.3 Probability Generating Functions x  is the random variable
1. Bernoulli Distribution (i) Mean: 
The probability density function (pdf) of a Bernoulli distribution is given by G(/) = E [0
?[X =x]=P'q'-J
= £ f P [ X  = x]
(i) Mean: »»»
Gx(t) = E[t*]
= £ j t 'P[X  = x]
= t°p°ql4) + t ’p 'q1' 1 
G*(t) ^  q + pt
Gl)=P
G;n = P = E(x)
(ii) Variance:
G " « - ^ a a = [p /+ 0 -p )]"
dt1
Therefore, G '(t) — p G(t) -  \pt+q\"
G"(t) = 0 G ' ( . ) = ^ 0
But Var(X) = E(X2) - [E(X) ] 2 dt
~ n \p t+ q Y  
And Var<X) = G "(l) + Gl,„ -  iG'(i)]2 P
This implies that 0 + p - p 2 =p(l-p) G'(t) =n (pl + q)' P
Var(x) = pq G '(l)  =n (p + q ) ‘ P
G1 (1) — np=E(x)=Mean 
2. Binomial Distribution (ii) Variance: 
The p.d.f of a binomial distribution is given by p"(t) -  n(n-l) (pt + q)n‘2 p2 
b " ( l ) ~ n(n-l)P2
•. Var(x) - G 1 '(ij + Gl,u - [C'cuj2
where: - n(n-l )p2 i- np-n2p2
n is the number of observations ~ n 2 p ? *- np- np2  -  n 2 p2 
;> is the probability of success ■- np - np'
ij the probability of failure
144 145
UNIVERSITY OF IBADAN LIBRARY
= np(l-p) C H A P T E R  8
Var(x) = npq
M O M E N T  G E N E R A T IN G  FU N C TIO N S
3. Poisson Distribution 
8.1 Moment Generating Function
(i) Mean :
The moment generating (m.g.f) is one which generates integral moments when these 
G(t) = E (tx)
moments exists.
e~AX* (i) For the univariate random variable X, the mgf is given by
x\
«,(< )= ,-0  ,
Where t is a dummy variable
(ii) For the bivariate case -'e have corresponding
=
G(t) = Where tKand t2 are dummies and the random variables X x, X 2 are jointly 
G‘(t) = Xe~x^ distributed.
G'(t) = /Uf'lw (iii) In general for multivariate case, we have
~Xe° = (<„/„-(.)=  E{e... ......... -■)
G11 (t) = X.Xe-*'* The moment generating function Mx(t) of a random variable X is defined for all real 
= X2e~i+il values of t by
G "(l) = Mx(l) = l£{etx)X2e~i + i  =  X 1 
(ii) Variance : ( YixetXP{x)'> i f  X is discrete 
Var (X) = GII(1) + [GI(1) -  [G'(1)]2] [ / ^  e txf(X) d x ; i f  X is continous
= G"(1) + G '(1 )-[G i(1)]2 Mx(t) is m.g.f. because all the moments of X can be obtained by successively 
differentlaity Mx(t)  and then evaluating the result at t  = 0.
=  X2 +  X  -  X2
=  X
Example: I f / (x) = X = 1 ,2 ,3 ,4
M M  = Z U i e ‘*fw
= -4 e l +  -4 e 2t +  4- e 3c + -4e 4t
If Aj and Xx have the same pdf and Y = X2 + X2 
My(t) = £ [e t(*i+*2)]
146 147
UNIVERSITY OF IBADAN LIBRARY
CU
II
=  E(e“ ‘.e“ *)
« , «  = [M ,(0]2 For the discrete distribution. If X has a pdf / (x) with support {a1( a2, ...) then
=  i6 e »  + i16. e3 .+ ±16 e«  + i16. e« 1i6. c«  +  i16 e« :1l6 e «  + i16 eet 
« ,C 0  = X e" dw
Example: R
i y =  /(•.)«“ ■+ /(a1)«“ , + -
Let Y be a discrete random variable with pdf x  — 0,1,2, Hence, the c.d.f. at effll is = P(X =  aj). Thus, the probability of any value X 
say a f is the coefficient of e tVl.
Example: Let the moments of v. be defined by E{Xy =  0.8, r = 1 ,2 ,3,...)
y=o yi Then oo r  oo ^
Mx(t)  = M(o) +  ^  0.8 (—) = 1 + 0.8 0.8
w
= r et>'(Aet)>' r = l  r = l
y=o yi
= 0.2 + 0.8 ^ 0 . 8  ^  ??
r = 0
= 0.2eot + 0.8e“
y=0
Thus, P • X =  0) = 0.2, P(X = l+ =  0.8
=  =  p A ( e f - l )
m; ( 0  =  ^ e ( O
v ( ^ f)y , t a « r) _ ^ f ( A 0 2 
Smce =  Z “ 7 T - Ae = — = 1 + 1 T —
y=o = *[£«(*“ )]
= E[Xetx)
MJ(t) = Ae(exp{(A(ef -  1)} Since the interchange at the differentiation and Expectation operator is allowed, we 
m;(0) = a can assume that;
My(t) = (Aec)2exp{(Ael -  1)} + Aec exp{A(ef -  1)} 
=A2 4- A
V'ar(y) = m; ( 0) -  (Afy(O)]2 => A2 + A -  A2
= A for discrete case
Obtain the l/a r(r) given that
Yar(x) = M;(0)-fM.;(0)]2
>49
UNIVERSITY OF IBADAN LIBRARY
=  0 x ( O -  0 y ( O
Also, the M.g.f. ol a random variable uniquely determines the distribution.
• " / » * ] - / s « B,/r» * ‘ Example:
for continuous case If X and Y are independent random variable with parameters (n,p) and (m,p) 
Example 3: respectively. What is the distribution of X + Y.
From an Exponential Distribution Mx(t) = (Pet +  <?)n; tfy(t) =  (Pec + <7)m
Mx(t) =  £(«'*) •• t) = A#x(t). My(t)
(Pe‘ +<?)n (Pec +  q)m => (Pef + £?)n+7n
= J  etxXe~^ dx Example
o Calculate the distribution of X + Y when X and Y are independent. Poisson random 
variable with means Aj and A2 respectively.
Solution
= aJ  <T-(<Aj -t)x Mx(t) =  e ^ et~»
A Mx+y(t) = Mx (t)My(t)f o r  t < A
A-t = e Ai(tft- l)eA2(et-l)
The above function is only defined for t < A
=  e W , - A 2 ) ( e ‘ - l )
2A Hence, X + Y is Poisson distributed with mean (A2 +  A2)
m; co) = F
Var(X) = M »  -  [Mi(o)J Example: If X and Y are independent Normal random variables. The distribution of
X + Y is
= ^ - g ) 2
A2 M*+y( 0  =  Mx(t). My(0
Example 4: Given prove lor (i)Normal Distribution
(ii) Standard Normal Distribution
An important property of m.g.f. is that the m.g.f. of the sum of independent random =  exp +  (px + p 2)t)
variables equals the product of the individual m.g.fs.
Let Z = X + X  where Xi and X2 are independent with m.g.fs Mx(t) and My(t). The If X and Y arc independent discrete random variable with non-negative integers 2
{0,1,2,...} as range with geometric. Distribution function 
m g.f of Z is
=  q’P- with
Mx(t) = £[ef(x+y]
= E[etx.ety ] m.g.f Mx(t)  =
= K(etx) E(ecy) What is the distribution of 7. = X + Y
150
UNIVERSITY OF IBADAN LIBRARY
Solution
MAO = MAOMy(.t)
l - 2 qec+ q2e2t
Replace e f by z  we have
M 0  = (i-qz)2
= P2 2_^k + l  qkZk
k=0 00 [(t ^) ~2B2tx
The distribution of Pz  is a negative binominal distribution. 1 f  " «s»—
V S i j i  J  e dx
— 00
e~- i Xv Note that
Examples: Let f a  = -------; x = 0 ,1, 2,
X ! K* -  /O -  ^ 2t]2 = ( X  -  /t)2 -  2(x -  n)a2t + a2t2 
e~l Ax = (x ~ P)2 — 2x62 + 2[i o2 + a3t 2 
M x(t)= 1x L e**0 x\ (x -  n)2 -  2S2tx = [C -  H) -  er2t]2 -  2\i6zt -  62t2
=  e - *v' - Y-01  &jrly u j \[(X-n)~(T\2-2llS2C-lT2C2]
Mx(t) =
, + (Z e 'f / iP----------dx=  e \ +  Ae + ... V 2 t c 8 2
2!
— e -A  e„Ae'
8.1.1 Moment Generating Function for Norm!a(l¥ D)is2tribution A2 n a 21  J / X - U \ 2f w  = V2n52 If we I* f»y =  S 2 ± ± £  6
=  CO dy = ^ => dx = SdyMA O  E(e“ ) =  J  e a f a d x d x
oo w _ylMAO  = e" t+— e 2 fidy
/ 1 -X *z£ \7 / (pVTn= e e b / dx' f i n d 2
15 3
152
UNIVERSITY OF IBADAN LIBRARY
1 00 00
= e*„ t+£2£ i f  —1—  e _zi dy tt VH + r t  At r r r 12
J 0V2tt M*(£) = (2ton/2u i v 2 / • • • / exp [ - 5  "
The function in the integral is a standardized normal distribution. 
Therefore. -A t) ]  dxa ...dx„
If we lety  =  x - y  -  At
Mx(t) =  e^£+ 2
since
e c » y  +.  -1 t.1x A, „t  0r0  «f  i , .
(27r)n/2|A|V2_.
CO -C O
f  —l = e ~ y ^  dy =  1 
—J  (p V 2n  
By examining
00 •
Lei X~Nn(jin, A), then the moment generating function of X is given as 
Mx(t) = e*V  + J  ... J  e ~ 2yi/>~ly dy 1 ...dyn ,
Proof — 00 —00
We know from Alternate Integral that =  (2tt)t1/2|A|1/2 i.e. Ankens inTeyrat
CO  0 0
j   ... J  e 2*lAXdxi. . .dxn = (27T)n/2|>4| 2 Mx( t ) = ------- 77 r - ( 2tr) / 2|A|2(27r)n/2|A|=-00 -00
Where A = I  = variance -  covariance matrix
OO 0 0 M(x) =  e ‘*y + ; t xAt
=  (2 7 T ) 'I |/ i r i  J  ... |  e x p [ ^ - i ( x - / i ) 2i4_1( x - / i ) ]  dx1 ...dxn
-00  -00 8.2 Bivariate Distribution
If wc Icl Let A' and Y be jointly distributed as
L = tx " ( x  — a/)1/! 1( x - / i ) ,  then simplifying this we have = exp {- (*+;>)}
L = -  ̂ ( x  -  yt -  At)x(x -  y -  At) + tV Obtain the joint m.g.f.
Solution: M, ,.(/,,/,)= )
154 155
UNIVERSITY OF IBADAN
LIBRARY
= \ \ e e ^ * r)dydx. =  npq
= JJe-4'-',)-r<l- i ]cJxdy Practice Questions
1. Obtain E X \ E X \  hence Kurtosis and Skewness of X
2. The m.g.f. of a random variable is (/) a(a  -  l)_l (/i')exp{ur + { rY  } 
Obtain the mean and variance of X  for (i) and (ii) above.
Exercise 3. For the bivariate case
(1) O^xb.tra)i n=  tehxep j|o~in^t (mj:.2g+.fy if) |  -co<x,>'<oo ! f e ) = S ^ (ao)
(2) Obtain the joint m.g.f. if K , . ' ,  (°.°)
fu.r) = £ e A' 0 < x < y
8.3 Obtaining Moments from m.g.f For r and s non-negative integers.
Since m.g.f. continuous and differentiable in /, it is easy to obtain r h moments e {x ')  4. ror K „ ,2(tl,t2)= a 1,a 2(at
from m.g.f.
(i) The Univariate case Obtain E(X,),E(X2),e [x x ) E{x\)Var(Xx\  Var(X2)and C o ^X x,X 2)
5. Obtain the m.g.f. if the joint distribution is given by
E I\ X ‘r )\ =  -«j/p,Hr Mx[0\ where r is a posterior integral 
Example v 1 2
The m.g.f. for the binomial distribution is
0 0.2 0.3
w ,( 0 = ( ? + /v ) r 1 0.4 0.1
4 0  = J l MM = j t [q + Pel)
Obtain the estimate of the means as in the above table.
M',[l) = n(q + Pe‘f  '(Pe'j 
(o)= nP (3) For M X) X; (/„ /,)=  expj/.m, + t7m2 + -X (/,2r ,2)+ 2pr, r2 /, t2 + 1] x] 
£'(.v)= M \\ t )=  n(n -  1 fa + Pe')" ~(Pe' )’ + n(q + Pe1)" ' (/V ) Obtain the same mean as in (4) above.
M['(o)= n(n - \ ) P 2 + nP 
r  -
— n(n -  1 )P? + nP —n~P'
156 157
UNIVERSITY OF IBADAN LIBRARY
CHAPTER 9 x / 0 l 3
C H A R A C T E R IS T IC  FU N C TIO N S / y 
l 0.1 0.2 0.3
9.1 The Characteristic Function (c.f.) 2 0.1 0.05 0.25
The characteristic function (px(t) of a random variable X is the expectation of a 
complex function of X. It is defined as <px(t)= E(e'u) ).1.1 Moments from Characteristic Functions
(i) for the bivariate case
(i) for univatiate case: The rlh moment can be obtained as: E (x r)= prrf'fO) 
(ii) for the multivariate case, through characteristic function is given as This is obtained by differentiating tpfyr times w.r. to / and evaluate the result 
.t > (/,./, at 1 = 0 , then divide by i[r).
Unlike the ordinary expectation, the characteristic function always exists. (ii) For the bivariate case
This is because
EE\(Xx r Hi  )=  ,{r1+ I),  f ' r ’jv  Vo*1.0y01
| <PX (/](= \Ee,L'\ = \E(Cost x  + iSintx)j = 1 
r ( y r ) 1 ^ ,r> a (00)
Examples:
The characteristic function for the binomial distribution is given by 1 rrr,0,0)
9.2 Exponential Distribution
The p.d.f is given as
X <  X  <  00
= ( / v + 9 )'
The C.F. is
oo
Exercise eitx-e~X/°d7
Obtain the joint characteristic function for the following: 
(*) /(x ,y )  = exp{-(jc+ v)|
(//) f ( x ,  v ) -  —  expj - —(r" f  y ’\
2 n [ 2 dx- } { • * * >
(/«) P{x,v) =
= 6 ~ \ e - 0~l -it
158 159
UNIVERSITY OF IB :----- --- ' "•  ̂ -f -ADAN LIBRARY
=  (n -  l)(n  -  2) ...3.3 f ( l )  and f ( l )  =  / 0° V xdx =  1
(1-iflt)
01 (t) =  +£0(1 -  i0 t)~2 The Characteristic Function of the Gamma distribution is obtained as: 
m1 = 0 1(O) =  7 = 0 0(0 = E(eicx) =  /  e itxf (x )d x
0*(t) =  2i29 2( l  -  i0t)~3 
0 n (O) =  2 9 2i2. Aeitx-Ax (Ax)K- 1 / e^.A-^CA x)*"1m 2 = 0 l l (O) =  2 6 2 I W r(/c)
Var(r) = m2 -  tn\ =  i l ; oox it-ic - a - io dxr(k)Jo A
= 2d2 -  92 r oo e - ( A - “ ) x  A k x k _ I  ,
= e 2 =  Jo ------ roo------ dx
0 (0  = ' W" ‘t)x <**
9.3 Gamma Distribution
A random variable is said to have a gamma distribution with parameters (t, A), A > 
Using Laplace transformation, we have
0 and t  > 0 its density function is given by
Ak
x  > 0 (A-it)kr(*) Ak(A4)kfix) - K )
0 x  <  0 0 1(t) =  iU fc( A - tO _k" 1
i * AkA-k_1 k 
where mi = 0 l (O) = i ~  A
m2 = 0 1J(O) =  ++ l )£AkA-k-2T(t) =  j  e~y y l l dy o
integration by parts yields Var(x)- =G m)2f ke\ 2 k fk \ 2+ » ~ ( l )
= - e - v 1 | o + J  e_y(t _  1)y t _  fc 
”  A2
= ( t — 1) J0°° e~yy l~2dy 
= ( t - l ) r ( t - ' l )  . If X is a random variable of the discrete type [i.e.x  =  0,1,2,...] with probability 
If follows that function. P(X = x,) =  P(X), then the characteristics function of X is define by
T(n) = (n -  l)r(n -  1) «K0 =  £ (e ltx) = I * P fce‘tXk ....................... (1)
= (n — l)(n  -  2)f(n — 2) If X is a random variable of the continuous type with pdf/(x)
160 161
UNIVERSITY OF IB T!. . - ;------>—>ADAN LIBRARY
M
©  "g
II 1
then 0 (t)  = E(eitx)  =  f*™ f w e ltxdx (2) Example: The moments of a characteristics function can be obtained by continuous 
differentiation of the function (discrete or continuous) r time and dividing the result 
since [eicx] = 1 and T.k Pk = 1 or f ^ f a d x  = 1
by ir
then /_+rA x )fe£trl dx =  1 i.e. pr =  ; r th moment
The summation in (1) and the integral in (2) are absolutely and uniforming converged. Thus, = 1st moment
Thu, the characteristic function 0 ( t)  is a continuous function for every value oft. Second moment p2 =  1 ; P3 =  ■ /3 ;
Properties Since 0 r (t) =  irx rf(X)eltxdx 
( 0  0(0) = E(e°) =  E( 1) =  1 and 0 r (t) =  Z k irx rP(Xk)e itXk 
(ii) [0 (0 1  =  |£ (e ‘“ ) |  S  £-|e‘“ |) =  1 Example 1:
Hence, |0 (t) | <  1
(iii) 0 (—t) = E[e~ltx) =  E(_Cost X -  i S i n t x ) = E(Cost x ) — iE(Sint x) = iPeicp = i2Pelt 
0 ( - t )  =  E (e itx) = E(Cost X +  i Sin t x) = E(Cost x) -  0"(O) =  i2P
iE (Sintx')
E(x2) = $ r  = P
thus, 0 (—t) =  0 ( t) ;  a conjugate to 0 (0  
E(X2) -  (E(X))2 
p2 = Var(x) = P -  P2
All c.f. must satisfy the above condition.
=  P ( 1 - P )
Example: = pq
Let X be a random variable from the Bermoulli distribution. Obtain the Characteristic
function Example 2: Suppose X is from a Poisson distribution the characteristics function is 
given by
Solution
0(t) = V co
*—l k=0 , tCX/lxe •* 
=  Z i = o * =  0.1 0 W = I x!x=0
_  gl'tOpOgl _|_ eiC(l)p1q°
= q + P e il it-*x
- o o Y ^ '  )
= 1 -  P + Peu = e L ~ * -
x=0
= 1 + P (e<£ -  l)
= e-*e* 'U =e*<-e‘l -V  
<p'(t) = \ teu . e ^ eit-V
162 163
UNIVERSITY OF IBADAN LIBRARY
__   * '(to ) _Pi  -
  X£t  -_   A, 
- m  H P r f * ' - '
0  (t ) =  Afe‘t.e A(eit
• = A ^ “ e ^ ' ,- 1>[Ae“  +  l] 0'(O = - t  e t2/2; 0'(O = 0 
0*(o) =  A?[A +  1] e W  = ^  =  £ =  o 
cr2 =  M2 =  £ (* 2) -  £(JQ2 m 2 = E(x2) = * M  = t 2e - t2/ 2 - e- t2/ ,
= A(A + 1) -  A2 
= A i2
Example 3: The characteristics function, and moments of the standard normal £ - 1 = 1  “  i2
distribution is given as: Var(X) =1 m 20-= - =m1l  
00
0 (0  = j  eitrf(x)dx Exercise:
where -00 Obtain m 3 and 77i4whatare your observation(m2,7n3)7n5)ableto equal to zero
00
For Binonial distribution
= 0 (0  = |  e*txe~* dx
— 00 00
1  /■ ”  _  f x 2 - l t x \  , 0 (0  =  ̂  e ltxPx (  1 -  />)»-*
= F ^ J - e (— J dx x=o x
By completing the square in the experiment
=  £x (* )  (P <?ft)x( l  -  />)"-* 
=  {Peu  +  q)n
= J L . J e - U l z ± ) \ M ! . dx
■ F 2 n  J  \  2  )  2
<t>Xt) = n(Pei t+q)n- 1iPelt
00
1 f - i  ( x - i t \ 2 - t 2/ 0 '( 0) =  inp 
= f 2 n J e , (— ) e ' 2dx ma =  —<P2(.o) = np
F ar (A) =  n(n -  1)P2 + np -  n2P 2 
=nP (l -  P)
165
164
UNIVERSITY OF IBADAN LIBRARY
0*(t) =  n(n -  a X P e lt) 2( P e uA-  q)n~ in _ 0 i(° )  _  , .2 +  P e u (P eu  +  q ) n " 1 rrij — ; — /Ij a2
= in ( n -  1)P2 +  in P  02 (0  = t'2(Ai - A 2)2e( 1 
m2 =  n(n -  a)P2 +  np P2 = Ai +  A2
9.4 Characteristics Function of the Sum of Independent Random Variables Example:
Let X and Y be two independent random variables with characteristics function’s Let X, Y and G be two independent random variable with binominal distributions and 
e lCX and e‘Cyrespectively. Let Z = X+Y and let 0 z (t),0*(O ancl 0y(O denote their let the characteristics function of X and Y be respectively.
respective C.F.S. then M O  =  [ i  +  P («“  - 1)]"1 
0 2ct) = [1 +  -  X)]"*
0 2(t) = E(elt2) = E[el«*+y)] w o = [ i + p ( « “ - i r
= E[eltxelty] ‘ Consider the r.y z = X + Y + G 
= E(eitx) E(eity) Because of independent of X, Y and Z
<PxMy(t) 0z (0  =  0 l(O 0 2(O 0 3(O
This can be extended to any arbitrary number of independent random variable’s = fl + P{elt — l ) ] n»+n*+n3
i.e. if Z = Xl + X 2 + - X n with C.F.S. as 0 z(t), 0 1(t), 0 2(t) +  - ,  0„(t) The above is a binominal distribution where the addition theorem for the binominal 
then distribution holds.
02(t) = 0i(t)02(O02(O -  <PnCO
9.5 Some Special Probability Distribution
Example: Suppose two independent random variable Xi and X2 have POl (A*) and These are probability distribution of special importance in either theory or practices.
POI (A2). Determine the characteristics function of Z = Xi - X 2 9.5.1 The One-Point Distribution
Solution 2r „ - A A random variable X has a one-point distribution if there exist a point x0 such that 2 P{X = x0) =  1 (degenerate distribution)
W i = r )  =  ^ r - : P ( X 2 == rr))  == l 2 f _
r! "  J r!
0*x(t) =  0 X2(t) = ^ We say the probability mass is concentrated at a point.
But the C.F. of (-X 2) is The distribution function is given as 
_  fO.x <  x 0
0x: (O =  ^ ~  U.X > x 0
0 Z(O =  e ' * * - ' ) The characteristics function is defined as
e [A,e‘c+Aze lC-/ti-A2J 0t(O — eitx° 
0 1 (0  = a ,  -  A2e(A>e<t+A2e' “- A*-^] m, = 0 ' (o) = x0
m k = 0 (k)(o) = x£k)
167
166
UNIVERSITY OF IBADAN LIBRARY
Var (x) =  m2 -  m* 
xg -  xg  =  0
It can be shown that if the variance of a random variable X equals zero, then X has a Show that /j3 = m3 -  3m1m2 + 2m\
onc-point distribution. = £(1 — P )(l — 2P) and
Proof
Since expression [X -  £(X)]2 > Oi.e. non-negative 
Var (x) = £[(* -  £ (x )]2 =  0 
Iff. P[X -  E(X) = 0] =  1 or Exercise
Obtain the mean and variance function for each of the following: 
P[X = £(*)] = 1
Thus, we find that the random variable X has a one-point distribution. (/) r f '  = e x p j i> / - - i r V j
(/7) f f 1 = a [ a - i t Y
9.5.2 Two-Point Distribution (ui) ( p t h ) = - a . r f a
A random variable X has a two-point distribution of there exist two values xx and x2 
set. 9.6 The Inversion Formula
P(X = xa) =  P, P{X = x2) = l - P  (0 <  P < 1) The characteristic function corresponds to a family at distribution which is obtained 
If we put x, = 1 and x 2 =  0 we have by adding an arbitrary constant to a d.f. o f a random variable. The inversion formula 
P{X = 1) =  P, and P(X = 0) = 1 -  P is a tool that can be used to get back the original distribution function on the entire 
Then the above qualities as a zero-one distribution. real line if the characteristic function is known.
A very good example of a zero-one distribution is the Bernoulli Distribution 
0(0 = Pelt l + (1 -  P)eu 0 Theorem
= Peil +  (1 -  P) Let F{x)and tf>{'] be the cumulative distribution and the characteristic function 
= 1 + P [eu -  l ) of A' respectively, then for given real numbers a and b, the inversion formula is 
0'(O = P defined as
0 (0  = P F(ll] -  F, , -  P.im —  f ------- 4 i ]dt
0'"(O = P ,n| f — 2 tt l  it
For every K
Proof
mk = P c
Pi = Par(x) = m2 -  m2 ‘ 2n  J it
= P -  Pz 
= P ( 1 - P ) First we need to show that |<P,l' ,|^  that e -  e is bounded.
it
168
169
UNIVERSITY OF IB ** ct -tl.ADAN LIBRARY
= — J — j /  Cost ( x - a ) -  Sin t(x -a )+ i  Cost ( x -b ) -S in  t(x -  b)dt}JF^
< E^Costx+ 1S int x\
£ E[Costx+iSint x) [Costx+ i S int jc) Since the integration of a Cosine function gives a Sine function which will later 
< e (Cos21 x + iSin2t x) vanish
i e\<P['] 1 -  J . j  Sin t(x - a ) -  Sin t(x -  b) ̂
< lje ‘*<fc|
a By complex integration [ ̂ ‘n- Vdv = —
{ v 2
Cl
/• and ^  dv = ~SgnP  where
<jdx Sin is |e",uj < 1
<l 1 it P> 1 
< b - a  hence bounded. Sgn P = 0 it F = 0 
Now it is possible to apply the Fubini's theorem to Ic as -1 it P< 1
- J  r e""* -* Corollary: (Modern Probability Theory, (1985)
2/r it i e‘udF{l)dr, (a<b) (1) Distribution function F  of a random variable and its characteristic function 
determines each other.
(2) If FJxJis the d.f. of a random variable then by definition, it determines the 
characteristic function uniquely.
- s i // (x)
Where ey',JC-°>_e"(r '’•can be written as Proof:
Car/(x-  c) + iSin t ( x -  a ) -  Cost(x-b)+ iS in t(x-b ) If Fand F 1 are the two d.f.s. corresponding to a given characteristic function 
then from the above theorem.
. j  _ _J_ j  j  Cos-/(x -  a) +iSin /(x  -  a)-Cost  (x -b )+ iS in  l(x -  a ) d t ) ^
^ , - ^ , = ^ , - ^ 1  (*><■)
multiply numerator and denominator by i At all the common points of F  and F 1.
I r 2i (  Cost (x -  a)+ i Sin t(x - a ) -  Cost (x -  b) + / Sint(x -  a)dt) \  , Allowing b to vary for fixed a
= 2tt}x i [  it y  " K.\ - = = a constant
asC ->x> Bui F ^ - F ( +  qo) — 0. Allowing b to increase infinitely through continuously points 
of F  and F 1. This implies that F ^  -  F{u) -  0 and hence continuity points of both.
170 171
UNIVERSITY OF IBADAN LIBRARY
\ e‘' (g + p e ^
2*-J,
---k- + —k = 0.;  x<a, x< b,
2 2 - £ J ' - £ C K ' ‘*
0 + - j  = \ x = a, x < b = P V  J J c o j f ( x - y ) - i S ,/w /(x -y)rf/
j Sin f( x - a ) -  Sin t(x -  b)^ y ^  + y ^  = x \ a < x < b
Let //  = /( .r-y )
-/*^/ ^_  +/ ^0  ==  y0^;-x,>x <a ’ a,x = bx > b
i
= ̂ Z [ J - p V _/2 J C a y /(x -y ) - /S w /(x -y )*
■ -  Jo-dFu) + j +  j W „ ,  + J - ^ , But
-* -II -* -«
_ rS & i/^ -y )* r C<»/(x-y)
I (( *x--y/ ))  0 J F(xT-y))
= /r -0
= (̂o+Ol “ ^u-0) + I f a )  ~ V . )  + ^  [̂ (o*0) — ̂ *(o-0) ■ - S  " /> V " '* r A j .
“ ^*) “ (̂U) rim
II' (i and /) arc points o f continuity of F .
= iz
/ > V '
*a\J,
Example:
/> V "
If <p\'] = [q + pe" )", calculate the p.d.f of a random variable X. 7 ;
Solution OR / - v (;3= w/ v
7T -> CO
/ / ■ ^  ( f - x )  f i r - *
Given ^J'1 = (<? + pe")’ J ? ( f  - x) /w/s | t “#/#”
,  J O ?  ( C i s L f i a ; -
•#>  > > (/■ -x j /* w li
1 7 3
172
UNIVERSITY OF IBADAN LIBRARY
C H A P T E R  10
IN T R O D U C T IO N  T O  M E A SU R E  T H E O R Y
10.1 Introduction
Probability theory is a part of mathematics which is useful in discovering the regular 
features of random events or phenomenon. In probability theory, the sigma algebra 
(which we shall define later) often represents the set of available information about a 
phenomenon. A function (or a function of a random variable) is measurable if and 
only if it represents an outcome that is knowable based on the available information 
about the experiment, the event to which it belongs and the probability function.
For us to understand how a probability measure can be obtained, let us develop an 
abstract model for the probability of an event particularly for infinite sample space fl 
from a specified experiment.
10.2 Abstract Model for Probability of an Event
I.etfl be the sample space such that H = {w* i =  1,2, 3 ,........ }
w, are called indecomposable outcome or simple events.
The is a decomposable or compound events, that is Ex = {wj i = 1,2,3 }
The elementary definition of probability is
PART TWO r , ( r \  _ No o f  fa v o u r a b le  cases^ '  T o ta l n u m b e r  o f  ca ffles  ...................... '  '
Since events are subset of H , it follows that the union and intersection of a finite 
number of events and the compliments are also events.
(1) For the model of mirror reality, the operation above can be represented by 
A, B, A U B,A n  B, A , B . That is all statements about events can be written in terms 
of u,.n.
(2) A random for defining probability in term of weights is to allow for the fact, 
that some events are more likely to occur than others. The weight of a set is just the 
sum of the weights associated to each point in the set.
174 175
UNIVERSITY OF IBADAN LIBR
RY
Let ft be sure event, the impossible event will be (p. Let A be a non-empty class of 
subset of ft called events. Let P(be the probability) be a real-valued function defined Any collection of events is a class of events. Classes will be denoted by A, B, etc. 
on A. Such that P(E)denote the probability of event E.
The pair A, P is called the probability field and the triplet (ft, A, P) is called the Example
probability space. Let ft be the real lineR containing all the real points w. i. e. ft =  {w: — oo <  w < 
oojalso let
10.4 Axiom for Finite Probability Space A -  {w : we(—co, a)} and 
(i) If Ej 6 A for i = 1,2,..., n then B =  (w:we(c,d)}
Define:
n n (0  A n  B ; (ii) A u  B\ (iii)Ac and Bc and give your assumptions
| J e< G A and f~ | Et G A (iv) Show that the compliment of an interval need not be an interval.
<=i i«i
0 0  If E G A .then E' 6 A Solution
(ill) If E e  A . then P(E) >  0, also P(ft) = 1 A r\B  = <p i f  a < c < d
(iv) If E and F are any two disjoint events, then P(E + F) =  P(E) +  P(F) =  (c,a) i f  c < a < d 
(v) If A, B and C are any events, then: = (c,d) i f  c < d < a
PiA i) + P(A2) +  A3) =  P(Af) + P(A2) +  (PA3) A U B = {w: either WCA o r c < w < d } will not an interval i f  (a < c < d)
= P(AX) + P(.A$A2)  + A \A \A i  +  - Ac =  (w:a <  w <  oo)
Bc = (w: <  wor w > d)
The number of possible outcomes of an experiment (E) may be finite ot infinite.
Ac n B = B i f  a < c < d 
Let w denote a sample point (an outcome) from the experiment. Ac U B = Ac i f  a  < c < d
Let ft denote the totality of outcomes of E i.e. ft =  {w1( w2, ...} BCAc i f  a < c < d
Let event A={w: w< eft}be a subset of ft. e.g
(i) B={Wj -  oo <  w < co); all values on IRL On your own. define the above if c <  a < d ■ or i f  c < d < a 
(ii) C={wi: a < w < b}-, all values in the range (a,b)
Sequences and Limits
(iii) D={w,: w0); a singleton. A sequence of sets is an ordered arrangement of sets in order of magnitude 
(iv) E={w,: Wj, w2, }; a doubleton. .Monotone increasing sequence: A sequence of { sets  {/ln} is said to be monotone 
(v) F={w: iv. = 0); an empty set.
( increasing if An Q An+, for each An.
The class of all subsets of f t  is called the power set of ft such that if f t  contains n If the sequence {/!„} n =  1,2,... is monotone increasing (non-decreasing) if for every 
points, there are 2n subset of ft.
Thus, if f t  is finite, the number of all possible subset is also finite. n, wchave An+1 3  An
Then the limit of (/ln) is the 3mm of the sequence i.e.
The power set of f t  when ft = {w,. w2, w2. w4) => 24 = 16
176
177
UNIVERSITY OF I ADAN LIBRARY
(ii) The limit of {An} is said to exist if limAn =  lim An =  A,
A = Y  An =  lim An (iii) If {A^} is not monotone and A exists then An —» A i.e. An converges to A.
nZ_ilj n—oo (iv) Even if 1im An does not exist, limAn and lim An will always exist.
or
■ n a OO
= Example:( J1' 4* = An; U  a ‘ = a i. e. An T Ak k = l Consider the sequence {4n} where
= {w: iv belonging to all Ak except Av ...
A„ = w: 0 < iv < b +  ̂ ^ " /n ; (b > 1)
CO
Sup
= i > Does the series {An} converge?.
k = T
For any arbitrary monotone increasing sequence {An), the limit is
OO OO
C = linAn = li sup Ak = |~= l|   Ak k[=Jn Ak Solutionk fiv: 0 < w < b + —\ ,‘i f  n  is even,
Monotone decreasing sequence: A sequence of sets {An) is said to be monotone Let Cn =  * nJ
[w: 0 <  w <  b +  ( 7 (n + x)) j ; i f n i s  odd
decreasing if An+1 Q An for each An.
If {An) n: 1, 2 ,...) of events, is monotone decreasing (non - increasing) and for every limAn =  {w: 0 <  iv <  b)
n we have An rj An/+,',1 , then the limit is the product of event [An) i.e. Similarly,A — Final An — limn_=oo ‘A4n or [w: 0 <  w <  b -  (Vn)]l t f  n s = {w: iv belonging to at least one o f  An, An_ i ...)n oo { [iv: 0 <  w < b -  (V (n + X) ) ] ; i f  n is evenr- 1i  i e A n l A limA„ = {0 < w < b]k= \ k Therefore, lim An *  limA„
For any arbitrary monotone decreasing sequence {An},the limit is
Hence. {/!„} does not converge
OO
In f Exercise:
k=n If An = A: n = 1,3,5,...
= B:n = 2,4, 6, ...
Limits: B -  UmAn = lim inf Ak =
k - l k = i Show that lim An = A u  B, limAn =  A n B 
Note that When docs lim An exist.
(0  linAn £  linAn
178 179
UNIVERSITY OF IBADAN LIBRARY
Exercise: Corollary:
Examine the following for convergence, if convergent, derive the limit;
p |  /l, =  i4x +  A\A 2 + ACXAC2A^ + -
W  ^ -  =  (0 ,V 2 n ) .^ " «  =  [ - l .  V(2ntI)] £=1
If
tb) An = | the s e to f  rational in ( l  -  1/^n  +  ^  1 +- */n)j co
(c) An = 2-1/n, 2 +  2/n), n is odd. W 6 P j  Ait then w belongs to some /lf
i=i
Thus w may belong to Ax or Acx or Acx or A2 or A \A \  i.e. w G Ak for some k. 
10.2 Obtaining Countable Class of Disjoint =» iv £ U ?.i establishes equivalent of both sides of (*)
Lemma 1.1: Given a class =  1,2..... n}of n sets there exists a class
{/?,-, i =  1, 2.....n) of disjoint sets such that U”=i At =  Ef=1 Bt
10.4.1 Definition: Additive Set Function
Proof: By induction A set function /u is said to be additive if V A,B,sJ. = <p(A) + <p{B) and by finite
A^ U A2 = Al + AXCA2 induction.
= Bx + B2 = E?=l B( (say)
This is true for n = 2 = ), V-i-t- j  A, fl = (f
Suppose it is taie for all n < m > 2
Note
Then =  ( U ^ / l i )  U /lm+1 •
• Once the value + oc,-oojs not allowed i.e. <f>*-co
• If all the values of <p are finite, then (p is said to be finite i.e. |$o| < oo
• If every set in a given class d is countable union of such in d and which is finite, 
=  Z r . i» i  + ( E ^ M m+i
m then (p is said to be z -  finite.
= ^  Bi +  Bm +1
i=t 10.4.2 Continuity of Additive Set of Function
An additive set function is said to be
Where /im,, and are disjoint. The lemma holds for n =  m +  1. So Bi c (i) Continuous from below if
V-'
for every increasing sequence \En} t
(ii) Continuous from above if 
</\ f E„]=r- (_f (p{E„)\A
181
180
UNIVERSITY OF IB - •‘ •‘m :«?>«?’ 'Sl> vv . a -ADAN LIBRARY
For every decreasing sequence {En} 1
Theorem
s.t. <p(E„) < co for some values n = n0 and hence for all n i> n0.
The probability function Plmi is a set function that has r -  additive property and hence 
A set function is said to be continuous if it is continuous from above and below.
is a measurement space.
Theorem
Let cp be finitely additive and continuous from below, then f i is  -  additive. Krample: Let (O..F) be a measurable space on which a sequence of probability t
measure Pi,Pl ,...Pu... defines a set function.
Proof < j
Show that P [ E )d . / ^ J—  /^,(£)is an additive set function.
Given a sequence of disjoint sets{£„}, then 2" .,
Solution
It is required to show that 
Let N be a finite number, since (p is finite additive, then (0 0 < /? .,< l
(ii) Plm, is counrably additive and is a measure
V.ns| J » “ l (iii) Prove that P(f2)=l
»N» NSi i -VX sn
-fl
I“  iX . . (y) t̂£) = ~  ̂ (E) + ^ r P:(£) + JT PJ(M + ”
Let Sv = En be an increasing sequence but -l P ,m 2  0. - L p 2lE)> 0 ...
l»«l
(p\?im S y J= Cim tp(Sn) and +
2-
y - i - .  s - 2 _ = _ l » i 
n r2 n “lfl * i - r  i _ y 2
■ # ■ ) 0 </>.,< I
(ii) 1.x! 1/i, i be a sequence of disjoint set, it is required to prove
By finite additively
•/ «*  iv .y|  , » ._. \_ >  'i»  )1
nsI ns| from i .1’ c
(pis r -  additive.
I S3
1 8 2
UNIVERSITY OF IBADAN LIBRARY
* 1  ® Proof: (Using mathematical induction on n) 
4
1=1 /  = nm| Uu-i * / for n =  l:  P {E,) =  )
for n =  2: P (E , [ J E 2) =  P f c ) + P (E 2 ) - P ( E xV \E2)
= t ± t r M
nm | ^  1=1 The result is true for n = 2
Since each of /*„ is a measure and 0 < P(t) <  1 » = 3: />(£, U £ 2 U E, )=  />(£,) + P(E2) + P(E,) -  P(E, R E2) -  P(E, R £ , ) -  P(£2 R £ 3)
* +p(E ln £ , n E J)
■■■-I
1  1=1 - Z*-i ' W L, 1«-i ?z !J
= 2 > U ) * i
i-i Assuming it is true for n and also tme for n = m -1, we have
- Z1-1 ^ ) p (£ ,U £! u ...U £.,.i) = / :( u £. 1 = Z />(£ . ) -  i > f e n £ y)+ I f e n ^ n s y )V  i - i  /  <■ ! i< /< y< o«+ i i s i < y < i & n + i
^.)is countably additive. t ( - i r !p (£ ,n £ 3n ...£„ .,)
(Sii) = 1 ^ . ( 9 )
« = m : i { y £ , l = / ( [ j £ , U £ . ] - i p ( £ , n £ y n £ „ ) +  x ^ n ^ n ^ n * . , )
V i - i  y  V  i - i  )  i s < s y s * s o » - i
=*±=l z±  (1) 1 r  £(£, n  e 2 n ... n  )
1 1 1 . Assuming it is true for it = m, we need to prove that the theorem is true for n =  m +1
2 4 . 8 -
>1 • S in 6 S K =.—  = - ^ _  = l 
• -1-  ^  : m
Le tE  =  [ jE n then
/-i.
10.5 The Halley-De-Moivre Theorem ^ Q e ,.J = P ( E U E .J
Theorem: Let { f jb e  a class of events each of which belongs to a r -  field 91, and 
each of which may or may not occur. Then 
= ! > ( £ , ) -  Z p ( E ,n £ , ) + ( - i r + z ^ M ^ n  £,)+ ...
P\at least one o f  the event E\ occur} «■ I ISi</<*> IS/< j<kSm*i
+ ( - i r p [ f V
v .- i  y
= / f O £. ) = ! > ( £ . ) -  Z ^ . n £ j +  Z r a . n £ / n £ ‘ +( - T l« ’(n £ ,) This implies that the result is true for all positive integers n.
V i  = | /  » « l IS  li j& n  IS 1<  7< *S » i
1 8 4 1 8 5
UNIVERSITY OF IBADAN LIBRARY
Erwuupie: Sotuttrn:
Lc-i 11 be events which belongs to a r -  field %  shew that^e  probability (j) Let l:\ denote the event that ihe i h letter and envelope •march
that exactly K events occurred out of n is given by
ir.<O -
n + k \
- I K 6V-I ‘m C:( ’M y V - l ) /  C\ n ( N - 1 ) N - 2 ) ;* '"  ^  >/!
Where S, * fl Ea f | ... D £» ) r y~ r3 X4-■ l-  r 1 Since e 'x = 1 - x  + :----—  +  ----- ...
.. 2! 31 • 4!
From Halley-De Moivie theorem
)E.] = Si - S 2-+Si - . . .  + ( - l} - 'S ll {i.e.Pmi?(q/'l or 2 o r3 or...or N match) envelope ■
<«i J
If k = 0, no event occurred: .v..p(|Ji:, j -  i -er'1 = 0.63212
V<»i .>
-“(A*, U£, U...U £ j  -  1 ■- / ’(ft'U E, U -U £n) .  = 0.6
= 1- S’. r V o . t f - S . , ( n ) Takirfe limit as N  —>x>
, . -  P{nOn o f  the events occurred) _  l r 1 .
______ j i”
L   ___  4I-
Example 2: . . . 2! 3! 4!
Suppose /r letter and corresponding envelopes are typed by a typist. Suppose 1 + 1 1 1 1_____ .1.
further that the messenger, who is in a hurry to leave for the post office, randomly 2! 3! 4!■ft
insert letters into envelopes, thinking erroneously that all the.letters were identical. 
Finch envelope contains one letter, which i.e. equally likely to be any one of the p  10.6.2 Countable Probability Space
letters. : Sometimes it is impossible for all the sample points in a fi to be equally likely. 
Hence, each P, is viewed as unit probability mass among the sample points following 
li) • Calculate the probability that at least one of the letters is inserted into 
a certain rule or law. This law is sometimes referred to as probability distribution.
its correspondence envelope.
(it) Find the limit of this probability as N -» oo
Example: Fora geometric distribution 
Suppose Q = {0.1. 2. ...}and
/ » , , * ( \ - 6 ) 0 \  x  = 0. 1,2...... (0 < ^  < 1)
m 187
U '—NIVERSITY OF IBADAN LIBRARY
Then Pt -  P[x) > 0, £ / | v) = J ,2) From (1) let \  =  {{a}, fb. c}, fd}, fi, 0} and P{a} = P{d} =  V 4 c) =
Pffi} = 1, P {0} = 0.
If ACO^then P(a] = Y j P[a ' The triplet (fi, L P) is not a probability space since  ̂do not form a field.
xO.4
Poisson Distribution Exercise
C-et <fi= fo, 1. 2 ,...} and A is a class of all subsets of £1 If P specifies that (1) If fi = {w1,iv2}and IF =  {fi, 0}
Show that F is a o -  field.
e lX
PU) = x! (X> 0\ x  = 0, 1, 2,... (2a) Is f  = {/Ft, £2» ■*•.£*} afield.
(b) Hence or otherwise obtain all the elements of the a -  field of t.
T^en P(s\ is a Poisson distribution and X  is a Poisson random variable
(3) Consider the sample space
F =  {0{W!, w2}, {w3, w4}, fi}
Definition 3:
\ f  A =  = {w3#w4}
A class of sets A is called a field or a -  field if and only if the following conditions Show that F is a field; •
hold true.
1. If E, 6 A, then U"=i Ei 6 A
Exercise:
2. If E 6 A, then E' G A Let Er,Ei, '..,En. denote an infinite sequence of events in a — field A.
From the above, it follows that Define
3. If Ej 6 A implies U"=i Ei G A , " W . .
Example h m=n . •
OO
A = {fi, 0 }is a field.C B„=IB  ==  [A, A }is a field m - n{A, H, 0} is no t a field, since A g C (a) Prove that BnCEnAn V-n .
G= (A, B . A B . A  U B.A U B ,  a  U B ,A  U B , A B ,A B , A B , A B ,  fi,0 } is a field. (b) Show that {/4n} is monotone decreasing.
The class of all subset of a given set fi is a field. (c.):show that {fln} is monotone increasing.
Example 2: 10.7 Sigma Field (o -  Field)
(1) Let fi =  fa, b, c, d} and 5 = {{a}, fb, c, d}, fi, 0} A non-empty class of sets which is closed under complementation and countable 
i-e. ? a field? Yes P(a) =  ^ , P(b, c, d) =  ^ unions (or countable intersection) is called a field.
//ps-CQ, l;, P) is a probability space. Note:
Yes, since  ̂forms a field. • A field containing an infinite number of sets may not be a 0 -  field.
1 8 9
1 8 8
UNIVERSITY OF IBADAN LIBRARY
M I
Into section oi an arbitrary number of a — fields is a o -  field.
10.8.1 Borcl Set
1 0 .8  B o r c l  H e l d Borel field and Borel sets play a very important role in the study of probability. 
Hus is a subset of the real line. Let C be a class of all intervals of the term Monotone field: A field A is said to be a monotone field if it is closed under 
(--oo, .v).* G IR as subset of the real line !RL Also let ( 0  = Tl be the minimal field monotone operations, i.e. if lim An e IF whenever {IF} is a monotone sequence of set F. 
generated by d. i.c. Ane F./t,, T A => A e F 
f hen 'ft contains the intervals of the form [x, oo) (i.e. compliments of (—«>, a), it also A„e ¥.An l  A => A e F
contains the intervals. Theorem: A a -  field is a monotone field and conversely.
Proof: Let A be a a — field and Ane A. If An T A, then A = U„ An\s a countable 
(-<»,a | = n  ( - 00, a + “ ). by coutable intersection 
union sets of A\. Hence A e A. Similarly, if An l A.A = C\nAn is a countable 
, ' (a.oo) = (-oo, a |“ by complimentation
(a.b) = (—co,/;) n (a, oo),a < b intersection of sets of A.
(a,b\,[a.b).etc fo r  a.b G K. ••• A e A\, hence, A is a monotone field.
Conversely, let A be
I enuna n n a monotone field and let Ax, A2 .... be sets belonging to A.
1 ct be the class of ail intervals of the term ({,b),(a > b)a,b e IK but arbitrary. Then ( J  Ak and j "~j Ak belong to A\ since A is a field.
k=l k=l
Then a ( t \  = V).
Proof: By (*) (overleaf) a.b.cty  for all a, b. Hence, These are monotone sequences whose limits ( J  Ak and p |A k must belong to A.
By definition of minimal field. a ( t x) c  i k= 1 k=1
fo prove inclusion Thus A is a a -  field.
Let x e (a.b) then.
U“  i( - n . x ) e o ( e x),V 10.9Kandom Variable in Measure Space
l et ft be the sample space with sample points w. Interest is usually in the value n*) 
=> (-oo.Jf) a (et) ^ x associated w ith w.
l' c  rr(/',) as defined in the last example. (a) Point function: function on the space ft to a space ft assigns to each point 
If is also possible to prove that die Bore! field is the minimal field containing any one w e ft a unique point in ft denoted by X(W). Thus X(VV) is the image of the argument w 
of the following-
under A' i.e. value of X at w f t  —— » Q'
e, = {(—oo, x |.x  6 IR} i/rnuw n r u n g r
f ;i = ((a.ftl.fl < b.a.b e r- ’ The set Q* = |X(lv): we ft| which is a subset of Q’ is called the strict range of X.
f  , -  ([a. b I, a  < b . a . b  < ■'} If i f  £1" => X is a mapping from ft to ft.
C,, -  {|a.b),a < b.a.b e IK) The symbol X(vv), etc will be used to denote functions even though they denote 
— 11 a c o ) ,  v t- IK) . t c . values of functions.
: mi
191
UNIVERSITY OF IBADAN LIBRARY
Kxample 1: X_1((w}) = {{w c“ft}: X(vv) = w1)
Let n = |0, ±1, ±2,... |; ft' = (0,1,2,... |
Note that for a point w' e ft12 one or more than one points in ft whose image under 
ft = 10,1,4.9...,]; i f X {w) = w2
X is w l. Let /!' c  ft1. The set of all point for which X(W)e /?1 is called the inverse of 
Thus .Vis0 a mappin0g of ft into and onto Q*...i - 1 i /Sunder X denoted by X-1  (H1)- % ,i- 2 2 With every point function X, we associate a set function A”-1 whose domain is a class ±  3 3 s(j of subsets of ft and whose range is a class'^ (say) of subset ft. Then, X-1  is called the ‘inverse function' (or mapping) of X.
X ( B )  = |X(w):w E  B \ . B C a
iv, = w2 => X(wl) = X ^jone -  to — one X ~ l ( y )  = \ B  ( B ■): B e y  I
In this case X(wl) *  X(w2) A w, = w2 X -I( f t)  = [w:X(w)c f t| = f t
X(w) =  w2is not 1 — 1 function Lemma:
Inverse mapping preserves all set relations.
Since w, = 4-2, w2 = — 2 have the same image 
Proof: Let W c  C c  ft’, then
X(w i) = 4 = X(iv2)
X '(« )  = \w:X{w)e B \ c \w.X{w)eC\ = X ~ \C )
If ft is the real line (-co < w < co) and ft = (0 < w < co)then X(w) = exp(w) is a 
(d) Indicator Function 
1-1 onto function from ft to ft and 1-1 from ft to ft. If the range space is & or its A real valued function lA defined on ft as
subset, the function is said to he a .numerical’ or ‘real-valued’ function.
xa _  (=  1 if w e A 
(b) Set Function ' " A (=  0 if w e.Ac
II the arguments of a function are sets of a certain class, then we have a set is called an indicator function (characteristic function by some authors). The strict 
function. Suppose Zl- E A\, we associate a value p(/l), (say) then n is a set function. f.i range /,,is /„(ft) = (/„(w): weft} = {0,1}. If B is a set function and B c  R, the range 
may represent entity such as weight, length, measure, etc. space then lA l(B).= <j>. ifB does not contain '0' or T  
The interval (a,b) may be associated with b -  a\ f(a ,  b) U f (c .d )  = (b -  a) + = A, if U contain 'T  but not '0'
(d -  c), etc. - A(\  if 13 contain '0' but no t'T  
I wo real valued function X and Ton ft are said to be equal iff X(iv) = Y(w) V- w e ft. = ft, if 13 contain both '0' but not'T  
i. c. X = Y Thus IZl(B) = {<p.A.Ac.n} = <x(/l)
Or X ^ Y  iff. X(W)̂ Y lw) Clearly Cl/X lakes value C on A and 0 on Ac 
II X(W) = C F w eft, then X  is degenerate. Hence. (CI.JJU = l^(.A)
(c) Inverse Function Properties
I he set of all points w e ft whose image under X  is w1 is called the inverse of {w1} (i) If A c  H <=> lA <  lB
denoted byX~‘({w}). The A r =  13 <--» l,\ = 1 .m
UNIVERSITY OF IBADAN LIBRARY
/„ = /2(/t) = l A(.A).!n  = 1
Proof:
(ii) l(A(:) = 1 -  [(A ); l(B—A) =  1(B) -  >A 
o n »i 00 l*i.B ...........................................cA
m u . u  = f^ j^1 = I >
j=i t=i
(=i
= min (Iai*-—Un)
0 V)*(AuR) ~ U +n *B ~  IA- ' b “  maX Oa- ' b)*(A+B) =  *A +  1 [i=Ji   A i =  ^i= l  l A i ^ l A i A Ai +  ^i= i  1A i ^ * A i A Aj +  A k 
Let Bk c  fl ,then w eX_1(n Bk) <=> X(W)€ n Bk => P |  B.elB
<=> X(W)€Bk Ft 1Thus, B is closed under countable intersection. Hence, IBis a o -  field.
<=> w €X_1(Bk) f1t 
«  w e n  X_1(Bk) 10.9.1 /(A) as a Measurable Function
Hence, X"'1 Since lA '(B) = {<plA,Ac.Cl} = <t(a)
k ‘ Let A\ be o -  field in fl. If/If A, Acalso belong to A and lA 1(B)eA =* A eA 
Similarly, Thus, /„is A -  measurable iff AeA .
10.9.2 Induced a  -  field
(as above) Let X be a real valued function on ftand Bis a Borcl field. By corollary A’(overleaf) 
(iii) weX~'(Bc) «  X(W)€l3c ~  XM  € B the class of Sets X~ x(B) = B cB} is called the o -  field induced by X.
<=> weX-'(B)  
.■.X~'(BC) = (X -‘( S ))c 10.9.3 Function of Function
Clearly ( f t )  = [w:X(Ml)cft') ^ f t If X is a function from ftto fl and X is a function from fl to fl , then the function 
X~'(<f)) = 0. (prove)? X (X’(w)) from fl to fl denoted by X X or X (X) is said to be a function of function • 
or composition of two functions X and X .
Corollary: (A) Its inverse (X X)~l is a function on the subset of fl to the subset fl such that for any 
If A\ is a field, a class of subset of ft, then the class of IB o f all sets whose inverse BC fl".
images belong to A is also a o -  field. W xX y K B ) = \ w : X l {XM )eB] =
=  [w:XM e X ' - ' t f )  (XlX)~l =  X ^ X 1' 1
194 195
UNIVERSITY OF IBADAN LIBRARY
10.9.4 Measurable Function Lemma: A' is a random variableiff A " '(f) c  A\. where f is any class of subsets of 
l.01 X bo a real valued function on ftto R. R which generates 33.
Dctlnitiun 1: Lei Proof
1H5 is a a field of subset of ft. If X “1 (Ii)e A for II Borel sets B  e IB. then X  is said to Show that X l (f) c  A » X - ' ( S ) c  A .S in ccfc  A and X ' '(33) c  
bo function measurable with respect to A, i A\,X"'(f) c  A\ conversely.
DefinationZ: If ft is also the real line R or its subset, and If A' is measurable w.r.t. the Since A\ is a a -  field and
Morel field 18 on the domain; then X is called a Borel function. X~l(t) c. A  =* a(X_1(-)) c  A 
=> X~l (a (f)) c A  
10.9.5 Random Variable (Economic Definition) => A
Suppose ft be a sample space. Let A\ao — fieldof events associated with a certain 
fixed experiment. Any real value A\ - measurable function defined on ft is called a 10.9.6 Vector Random Variable
random variable. Thus,’X is a random variable iffB~x, the a -  field induced by X is Suppose w eft, the associate X(w) = (X\wy, Kfvv)) a point in the 2-dimensional 
contained in A\.
huclidian R2. The Z define a function from ftto R2. Consider the class of 6 of all 
Suppose we define two non-negative functions rectangles bounded by the lines xx = a.x =  b,y = c,y = d,a < b,c < d arbitrary, 
X(w) = *(w)» '^X(w) ^  0 flic minimal o -  field containing f  in Borel field (332)in R2.
= 0, it X(W) 0 /. is called a 2-dimcntional random variable ifZ "’(332) c  A\.Z~l(®2) «s a a -  field 
and induced by X.
X(W) ~ (̂w)> < 0 Illustration
= 0 if X(wj >  0 QO •
The above are respectively called the positive and negative parts of X. Then S„ = £  X t, E(Sn) = nA.a(Sn) = VTil
A' ’ and X are Borel function of X and will be random variable if X is a random i- I
variable 
Note: flic moment generating function of Znis given as
1 11 These functions play an important role in the theory of integration of M _  „CO ,  (VJ
probability function.
(2) To show whether a function is a random variable, it is not necessary to 
determine whether X~l (B)e A\ for every B in 33. It is sufficient to verity 
X ~  l(f) c  A where C is any class of subsets of R given in sub interval on log/W/(t) =  —t'fnA -  nA -  e
page 8. =  - t V 5 * - .U  ( i - U + j s + s 5 + 5 ^ + - 1  )
1 % 197
UNIVERSITY OF IBADAN LIBRARY
lini log » 1 , *,2 CHAPTER 11
l|-«CX» ' * /.(£) = —2  => My{t) -  e 2 
LIMIT T HEOREMS AND LAW OK LARGE NUMBERS
= m g f  o f  A/(0,1)
Problem 11.1 Introduction
Suppose that S„ has the binomial distribution b(n,p)- show that
distribution The law of large numbers is concerned with the conditions under which the average of 
%n----------- » N (0. 1) a sequence or random variable converges (in some sense) to the expected average as 
the sample si/.e increases.
Theorem: Let Yn, n > 1 be a sequence of real converging to Y0 Then the sequence 
r,+y2 y,+rz+r:, y1»y2+-.y,i
x ' 2  ' i  n 11.2 Concept of Limit s
Also converges to Y0 However, the inverse is not true. Let .v„ be a point in some intervals oflhe real line '.H. Let /  be a function which is 
Proof: delined at every point of /  except possibly at .v„. The limit of the function as x 
Let > 0, we find n, s.t.n > n => (V̂  + ••• /„) — K„| < £ approaches v0 is /, written as
Since Yn -» K0 3 no s. t. |Tn -  K0| <  e/ 2  K > 1 irn /,\, = L ov /( f| > L as x —> x
l ind > n0 s. t. — K0| < e/ 2  for convinence If for any positive number X (no matter how small) there is some 8  greater than zero 
We claim that n > n, => -  '"0| < £ such that
Then iriii ,/ I _  |(yt+yo)+-t(yno+»b) |/ ( i Z.| < £■„ for all 0 < |.v~.v(1| < S1 it K°l “  I n
|(yl+yll)+- + (y,,o+yo) + (y„oM+y0)+--My,t + y0) I rom the above definition, the number e  > 0 is first given, then we try to find a 
it! n number d > 0 which satisfy the definition.
Example 1: Prove that < in i (3.r — 4) = 14
n  0 n
V  \Yi- Y 0\
n i—i n0 + 1 Z_i Solution: Given A > 0 , find 8  > 0 [depending on 1 .1 s.t. 0 < jv - 6) < d. we have 1=1 i=n0+l
\ f  -  14| < e
* * / 2  + '-T *el 2 -> |3.\ 4 14| |3 ( .v - 6 | 3j.v 6| <  38
S £/ 2 + e/ 2 
N o te  th at  | \ 6 j <  A
<e
Thus, (V" “  K(lVn ^  0 i.e. Wl.LAN holds.
Cov (XY)- H(XY) E(X) E(Y)
IR S
UNIVERSITY OF IBADAN LIBRARY
p { x >  0} < -â
Example 2:
x  + 1 ProofProve that ( Suppose X  is continuous with density function
'-*2 3*+ 4 10
Solution: Given £0 > 0, we went to final
0 < |x -  2| < £, we have < = [ V w  * + « £ V ( x) a
> £ x f ( x)dx
/ ( x) ^ _ * + l 3 x + 2 x - 2  • 8
J '  10 3*+ 4 10 ~ 10(3* + 4) 10(3x + 4) '  10(3x + 4) > j~af(x)dx
If x is sufficiently near to 2 so that = o ^ f ( x)dx
3.r + 4 > 10, thus 1 <1 > aP(X £ a)
10(3*+ 4) 10
/. aP[X > a) < E (X ) => P(X  £ a) < -̂ a
Thus l / M — I 10(3.r-4) 100 The above is for a single variable X. Suppose we have a sequence of variable 
8  = 100 £ {X„}, n = l,2,...n, then we have the Markov’s inequality for a sequence of {A',,}as
Theorem: Let /  be the constant function defined by = C where C is a constant
P X .
tint f ( x ) - C
Proof: Given s>0,  find 8  > 0 such that 0< |* - t7|< 5 =>|y|T)-Cj<£- 11.4 Bienayme-Chebyshev’s Inequality
Theorem: If A  is a random variable with mean n  and variance a 2, then for any value
The distribution of certain statistics of interest are too complicated to derive for e> 0 :
differing sample sizes. In many cases, limiting distributions can be obtained as an 
approximation to the exact distribution, when the number of observation N is large. O"
Thus, most important theoretical results in probability theory are limit theorems. Proof:
Let consider some useful limit theorem. Since ( x - r t  is a non-negative random variable applying the Markov’s inequality
11.3 Markov’s Inequality with a = k \ we have P{{X -  /j )> K }< —-—<r~ ~
Markov’s inequality can be used to obtain approximate probability of an event given 
that the mean of the probability distribution is known. but since {X -  ft)" = K '  iff \X - | > &,then the above (*) is equivalent to
Theorem: If X  is a random variable that takes only non-negative values then for any 
value u > 0
2 0 1
200
UNIVERSITY OF IBADAN LIBRARY
lim P{jXM| ><r:}= 0 is satisfied.
Note: It is only the probability of the event tends to zero as n->oo, it does
The above inequalities are important in that they enable us:
(i) derive bounds on probability when only the mean, or mean and variance of not follow that for every e: > 0. we can find a finite n0 such that for all n > n.'; the 
the probability distribution are known. relations \X,\ <e will be satisfied.
(ii) Determine the convergence of a sequence of random variables or sum of 
independent probability distribution. Example 1:
(iii) Prove important results in statistical theory Let {A'J be a sequence of binomial random variable with
Example: =.0 =(")/>" ( i - / > r
(i) Suppose it is known that the number of eggs sold in a poultry farm in a 
month is a random variable with mean 75 crates. To show that lim pfx„|>e}= 0Hf* * ’ '
(ii) What is the probability that the sales for next month is greater than 100 
crates. Solution:
(iii) Ilf the variance of the sales for the month is >5, determine the bounds on By Chebyshev’s inequality we have 
the probability that sales in the coming month will be between 50 and 100 
crates. E(Xh)- n.p\ Var (x)=  or
v  n 
Jnpq
Solution: Let X  be the number of eggs sold in a month But Chebyshev’s inequality states a  = r
(i) by Markov’s inequality ■Jn
a  2 /
P(X>Vto)< —  = -  P \X  - p \ > e } < ^ -  fo re > 0.
v 7 100 4
(ii) by Chebyshev’s inequality a V  ,
or P \X \>  K a  < ^ r - 4  - — 7  
/>).*'-7 5 |>  25 = — 1 n| 1 k2a 2 nK:
' 1 1 252 25
p \X n\ > k o } < \
p \ x  -  75| < 25} > 1 — — = — A
1 1 J 25 25
Lotting 11 — - wc have
So the probability of sales of eggs for this month is at least —24
Definition: The sequence {Xn} of a random variable is said to be stochastically P\X  |> e}< -^ - = n e
convergent to zero if for every e>  0 the relation
2 0 2 203
UNIVERSITY OF IBADAN LIBRARY
PrjX . - n p \ > e } < & 11.6.1 Weak Law of Large Number (WLLN)
Let Xx,X2, ... be a sequence of iid random variable’s each having finite mean E(Xi) = 
Chebyshev’s Inequality H. Then for any G > 0
This theorem is often used as a statistical tool in proving important results in statistics. > 6j _ 0
as n > co
For example: — p
lfVar(x) = 0 prove that 1'his implies that Xn -* fi
PX = B(x)=\ Proof: suppose the random variable has a finite variance a 2
Proof by Chebyshev’s inequality, for any 0 >  1. £  ^  y ar f Xi+X2..Xn\  _  <F_
Pr  (l-K — mI >  — o It follows from the Chebyshev’s inequality that
as n —» oo and using the continuity property of probability and Chebyshev inequality.
Thus, as as n -» oo
n-*oo (. X\ "h X2 ...xnlim P ( -A * > € } =
” P (E2.  {| 1 - " i > ^ }  = 0
x n -* n
=> p[x  *  n\ =  o
This implies strong convergence (Strong convergence) Convergence Almost Surely
X n is said to converge to X  almost surely, almost certainly or almost strongly 
11.5 Convergence of Random Variables
Convergence in law denoted by denoted by X „ —— if Xn(w) —> ^ (M.,for al w except for those belonging to a 
L
Xn -» X if at every continually of X through distribution function F of null set N.
Limn_ 0o /y,(x) ^{x) Thus X„ -^ ± -> X  iff X„(w) Ar„,.) < oo
Where Pn(x) denotes the distribution function of Xn Thus, the set of convergence of {Xn) has probability unity.
Lemma:
11.6 Laws of Large Number X n———>X iff as n ->co
This refers to the weak or strong convergence of sample mean 
X # = %i t0 a corresponding population mean (/j ). -> 0 V an integer
2 0 4
205
UNIVERSITY OF IBADAN LIBRARY
Proof:
Theorem 2:IfX„ —-~>Cimplies that Fn(x) -» 0 fo rx  <G, Fn(x) —> 1 forx>Cand 
Now AyOv)-* .^(w), if for arbitrary r > i ,  there exist some 
conversely.
/?„(»•. r)s.l.V K >n„ (w,r), \X „ (w ) -X (w ]  < / r Proof: If X n ——» C, Fn (x) -> F(x) where F(x) is the d.f. of the degenerate random 
Moreover X m — > X  imp lies that P[Xn -  X] = 0 variable which takes a constant value C since 
Using de-Morgan rules, fO, x < C •
,-'1 {l, x > X
[ » : \ X . ( w ) - X M \ > y $ \  = 0
r  n Conversely, let Fn(x) —> F(x) as defined above.
i.e. for each r Then PrjjT ,-C |S;e] = P[A', £  C+s]+Pr[X„ S C]
[ w . \ x A » ) - x ( .w ] z . y r\\±<i = 1 ~  r [X n < C+ e]+ Pt[ X £ C]
= l-F „ (C + e -0 )+ /; i(C -e ) 
Equivalently for each r which tends to zero as n —> co.
& t f u k - * l * X l ] = 0 Hence A',,--
Suppose X n's  are discrete random variables taking • values 
Replacing the above by the complimenting condition we have
o ,i ,2,...s.t.p (x „= ;)= /> ,.if p„->p, a s n - » 0and S takes value
/{ n k - * i < x ] ) - >i / with probability Pt (i = 0, 1, 2 ,...) and hence = 1 ■ then
Note:
Lemma 1:A sequence of random variable’s converges a.s. to a random variable iff the .ICT
sequence converges mutually almost surely. So that X a converges to X  in distribution.
Lemma 2:If X n — —  >A\then there exist a subsequence{Xnk}of {Xn}which Examplc:Let A^be a binomial random variable with index hand parameter 
converges a.s. to X. P„ s.t.as n -> co, "Pn -> X > 0 and finite. Then we can verify
Convergence in Distribution (K = 0, 1, 2,...)
If F„[x). is the d.f. of a .random variableX,, and F(x) the d.f of random variable 
The binomial random variable leads to Poisson random variable with parameter X in 
X . then }Xn} is said to converge in distribution or in law or weakly. It is denoted as distribution as n-*co.
X" ——-> X,  Fn -> F  weakly or Fn(x) -> F(x).
Theorem 1: If X„ —— >X , then F„ F(x), x e C(f)
206 207
UNIVERSITY OF IBADAN LIBRARY
, , k - n p  ......
Convergence in r,h Mean Now let y p = " r  - ■ ...... .(**)
A sequence of random variables is said to converge to X  in the rlh mean, denoted by Jnpq
Since n and k are large, we can expressed the factorials in (*) above by means f  the 
Xu ~±~>Xif H\XU - X\r -> 0 as n -> « .
Stirlings formula/approximation as
For r = 2, it is called the convergence quadratic mean or mean square.
e*"1
For r = I. it is called convergence in the first mean.
Lemma: If X „ ~ r~> X  => E\Xn\  -> E\Xf b(k, m, p )—  --------------
Proof: For (r < l)put(Xn - X )  and X for X in tyheincqulaity
p k q " ' k e e
4 ' f. r s £ K - * r + 4 * r (2^  (2n Y n k*y> (n-k )r k*Yi
Interchanging X n and X in inequality and combining, we have
» ( nq
E \ X l - E \ x \ < E \ X a - X \ (2n Y ^ k ( n - k )  U  A n - k  
Thus X n — -r-* X  => E\X ,\  => E\X\ Where 0  -O lH) - 0 (k)- 0 ( n -  k)
. I UsingM< l[ I +i +_l_
Then /*{.¥„ -  X\ >e} < from ( . . )  the above can be rewritten in the form
Substituting for k
: . X m- ~ * X  as n .-**>.
1 1
Lemma 1ltf|<---- + —+ P
The binominal distribution b (k ; n. p) approaches the normal distribution as n <- oc 1 1 12n 1 +  X k K q  q ‘ XtV / " ‘IJ
i.e. /?(k; n ,p )~  - = T ‘---- e ^ If we assume t,h a.t  Vr«  ->v  nu  aass  nn -> °°, then 8 —>0 and e* —> 1 ■ 
J i n  npq
Proof
Let A(k;n, p) = n\
K l(n-K ) p
_ k  qn-K for large values of n.
The above represent P[SK -  K ) where S t is a random variable which denote the 
number of successes in n . Bernoulli trials with probability success for success in each which can be approximated by
trial.
11’we let n >xandkeep /’ fixed then p \S n - np\> np) ->0 ¥ e > 0  by the law of J — for large n Jnpq
large number. Accordingly \K -np\jn -» 0.
209
208
UNIVERSITY OF IBADAN LIBRARY
nq 11.6.2 Criterion for Convergence in ProbabilityTo estimate the quantity  ̂̂ J  |̂ — U- • « The following lemma gives the necessary and sufficient condition for convergence in 
T'aking logarithm of the above gives probability.
Lemma
« ' o e[ y ' y ( n - K ) iog { y n _ k)
Which can be rewritten in the form x - W E ( £ i ) ^ 0 a s n *
|x„|1
X >0 iff E 0 as n -> co.
- t i p \ + xk ~ log
\ n p 1 + * J — U W JPJ
f Proof
nq 1 +■**,/—  log IXI ’ • |x  Inp I+X* J ~ For any X , the r.v. is bounded by unity. Taking g(x) = rj-77 1 for € > 0i nP J \V»\
Upon substitution for K.
Since x /̂7 ^  is small, we cam expand the Logarithmic function in power series. J  _ W _ 1 — 1- < e [  - 1 ^ ) /li+KlJ 1+e * i ,+w j/ r+e
Using the Taylor expansion then a reminder
From RHS E M
log(] + x) = x - y + ^ - ;  (o< |e3|< x ) b + w j
(* *) above becomes From LHS p \x ,\ >e]-» 0 => j  0
- | * ;  + C X V i  '
But K is a non -negptive r.v. so
Where C is a constant ■K
If we assume y  -» 0 then can be approximated by E\ >o
—  x: Theorem: Iff [x) is a continuous real valued function and
X,  — X,  then f(x)
Hence (*ii)is asymptotic to e ^ x‘
Gathering the estimates (i), (ii) and (iii) above, we have
b( k; n, p)~ 1 11.6.3 De Moivrc-Laplace Limit Theorem
J i n  npq
If S  is the number of occurrence of an event in n independent Bernoulli trial, with 
The normal approximation to the binominal distribution.
probability E for success in each trial, then
2 1 0 211
UNIVERSITY OF IBADAN LIBRARY
Now using the Chebyshev’s inequality
:.p\x,\>e)<Zr
If wan bvary so that a3n '^2 -»0and C2« '^  -> 0 as n -»oo.
11.6.4 The Weak Law of Large Number = ><=}< ^  ■■, " • It follows that lim P(jXn| > e)= 0  as n-»co.
n s '  n~'*
Let A'l,X ,,...X nbe a sequence of independent and identically distributed random 
variables each having mean E[X,)=/u and finite variance a 1. Then for any e> 0 11.6.6 Strong Law of Large Number (SLLN)
P^X  -  p| > £ j —> 0, as n —»oo This refers to the strong convergence of the sample mean to the population mean, 
Proof i.e. Xn => E{Xi) = p
It follows that
i. e. Jirn P{sup \Xn -  p | > e )  = 0 
e [x ] -  // and Var (x)=  —
n Or
From Chebyshev’s inequality we have P lim [Xn = ix] =  1
n —*on
’ 1 n s"
Note that SLLN holds iff the population mean exist.
/.lim p jx - / / |>e } -0 Theorem: Let /V„X?, .- .X n be a sequence of independent and identically distributed 
This theorem was first proved by Jacob Bemowlli random variables each having a finite mean // = E{X ,). Then with probability 1
X—x- -*- -X- ,-- -+- -..-.- + A,,11.6.5 Bernoulli's Law of Large Number ---- - /j as n -» co
Let fc}  be a sequence at random variable with pdf Or Prjim (X, + X, +... 4- Xn )/n = p  j= I
\
P 'O - P ^ f o r  0 < P < 1 and r = 0, 1, 2,...
Theorem: Let [Xk),k = 1, 2 , ... be an arbitrary sequence of random variables with 
Further let X n = Yn -  P sequence of random variables {A',,} is stochastically various ok and first moment Mk. If the Markov’s condition (i.e.lim n_co ak = 0) 
convergent to 0 for any e> 0. i.c. lim P(jXn|> e) = 0 is satisfied then the sequence [Xk -  Mk) is stochastically convergent to zero. 
Proof: Proof
Suppose Xk arc pairwise untouched (i.e. independent). Consider the r‘h variable 
W c  have E (X J = 0 *, + *2+••+*„
Ym =
n Wc have
212 213
UNIVERSITY OF IBADAN LIBRARY
l  A
C H A P T E R  12
E(Y^  =  n k=Il^
Such Xk are pairwise uncorrelated, we have P R IN C IP L E S  O F  C O N V E R G E N C E  AND C E N T R A L  L IM IT
^ 20 'n ) = T H E O R E Mp4 LV "k=i 12.1 Introduction
The Central limit theorem is concerned with determining conditions under which the 
If  ̂ n sum of a large number of random variables has a probability distribution that is 
Ifl im — V  at -  0 approximately normal.n -00 n 2Z_i 
k<=l
Then by Chebyehev’s inequality (theorem) it follows that 12.2 Convergence of Random Variable
A sequence of random variables {Xn} is said to be converge to a random variable .A" if 
nl-i*moo />[|rn - E ( ) 'n) |> e ] = 0 W  w)} converges to X(w)<co asn —>co for all w e f l  Thus {Xn} is said to 
Thus, the sequence — Mk} is stochastically convergent to zero converge to X  everywhere.
If X n(w) converges to X(w) only for w'EQf* w e  A, then C is called the set of 
convergence of X,. If Ce A, then tim X n is a random variable clearly, C is the set of 
all w e Cl , at which whatever be £ > 0, \Xn (w)~  A'(w)! < e  for all n greater than 
n = N0(w) sufficiently large symbolically for = n + m, m > 1 
C = [w :X m( w ) ^ X ( w ) ]
= *•n>0 u« nni [ ~ | * ~ w - * w H
Equivalently, replacing “for every £->0 by for every — k = \, 2, ...e
k
Since C is obtained from countable operation on measurable set, C is measure from 
C e A.
Now |/ ( .v ) -C | = |C -C | = 10]<£ 
Hence \ f (x )  — C\ <  s V x
2M 215
UNIVERSITY OF IBADAN LIB
ARY
This theorem tells us that the limit of a constant 8  that constant.
This lemma provides us with sufficient evidence/condition for the convergence in 
Remark: probability.
The proof of the above follows from Markov’s inequality.
If f a  has the limit Las x -> x  then f a  is said to converge to /.
OR i.e. P[
If C is the limit of f a  as x a then f a  is said to converge to a constant C written as n —̂ oo
as Ax) ~*C as x->  a Theorem: Let X  be a k-dimensional r. vector and g  > 0  be a valued (measurable) 
Note that the constant Lor C can also be a random variable. function defined on 9?*, so that g(x) is a vector random and let C > 0, then
Convergence in Probability
A sequence of random variables {XK} is said to converge to X  in probability, denoted Poof: Assume X \s  continuous with pdf Then
by — ~->X, If for every e > 0, as n ->oo
equivalently, if for ¥■ e > 0, as n <- oo -- fg{xltXi.... Xt ) f ( x , ..... x t ) d x „ . . . ,  dxk + J (gx,......,xt ) / ( * ....., x t ) d x ...... d x k
A
P[)Xm- X < e ] - ¥ l .
Note:
This concept plays an important role in statistics, i.e. consistency of estimations, weak Where A =
laws of large numbers.
( x ) ]  -  } & ( x i ” — >x k ) / ( * p  »* )A » ~  A
Equivalcntrandom variables: Two random variables X  and X '  are said to be A
equivalent if X  -> X 1 a.s [almost surely] Using the result from Markov inequality
Lemma: X n— and X n——>X' =>X and X '  are equivalent.
A
This lemma shows that a sequence of random variables cannot converge in probability 
= CPfeW e A] = CP\g{x) > C]
to two essentially different random variables.
Lemma: X n -»0, i f  c\Xn\  ~>0 , ^ f W 2 C ] S £ M
Replacing Xn by (.Xn -  X )  we have:
Note:
X n -+X<*> iff X n- X — ^ 0  If A' is of discrete type, the proof is initial analogous. 
1\X" -  X \  0 implies X n — 1 Special Case I: (»)
Let X  be a random variable and take g [x ) - \X ~ L \  ,/j - J i ( X \  r > 0. Then
216
2I7
UNIVERSITY OF IBADAN LIBRARY
p\x-^\>c]=A Proof:
Cr •
The above is known as Markov’s Inequality
Then X, amd Yt are standardized variables hence
Special Case II:
If r in (*) above is replaced by 2(i.e. r = 2) we have i f f - \ < E ( X ,  >',)<!
4*- - 4 a c]= i  c!]s ̂ L A . s Which becomes (replacing X , and Y{ by their itandardized variable)p \x - r f  
This is known as Tchebichev’s Inequality.
In particular, if C = K g  then
orpj\ X - ^ > K a \ i - L Note:
A A more familiar term of Canchy-Schwarz inequality is
e -[x  r ) < £ ( * 2)£ ( r : )
Remark
12.4 Borel-Cantelli Lemma
Let X  be a random variable with mean ^  and variance cr2 = 0. Then the above gives 
In the study of sequences of events A,,A}... with Pk = P[Ak ); a significant role is 
p \ x  -  //| £ c ]=  0 for every C > 0 
played by Borel-Cantelli Lemma.
This implies that p ( X  = /j ) = 1
i.e. (i) If the series converges, then a finite number at events
12.3 Cauchy-Schwarz Inequality Ak occurs with probability 1.
Let X  and Y be two random variables with means u, ,/i2 and positive variance (ii) If the events are (completely) independent, the series diverges,
a' and a 2 respectively. Then then an infinite number of event At occur with probability 1.
Or equivalently, Theorem:
Let {An} n = 1,2,... be a sequence of events and P{A„ )denote the probability of the 
-  o fo f  < E[[X -  n  )(Y-  AA)] £  ofo} 
event An where 0 < P(An)< 1. Then if
and E [ { X - ^ ) { Y - ^ ) ] = g -g \
JC
(i) If £ p (Aii)< oo with probability one only a finite number of event 
iff ( * = * ) = i
An occur.
2 1 8 2 1 9
UNIVERSITY OF IBADAN LIBRARY
i.e. The infinite product of the r.h.s of the above is divergent. Hence, P ( a )=  1 which
x
30
(ii) II the events {An}n = 1,2,...are independent ^ P { A n)=<x> with prooaouuy shows that ^  P(An) = oo 
/=i <i=i
one, an infinite number of event An occur. 12.5 The Central Limit Theorem
Theorem: Let X1,X2, ... be a sequence of independent and identically distributed 
ac
Proof: Suppose ao random variable’s each having mean p  and variance a. Then the distribution o f
i=i X] +X] +,,, + Xj;—7l/i
Let A denote an infinite number of event An occur adn
what intends to the standard normal as n -» co. 'fT at is
' - n i k
=> A C ( J  An as (  > co
n^r In simple language, the theorem states that a large number of independent random 
variables has a distribution that is approximate normal. It provides a simple method 
Meaning that P{a ) < A  ( J a ,, < for computing approximate probability for sum of independent random variable’s and 
explain the fact that many natural populations are normally distributed.
as n oo; P(An) 0
/!=/'
12.6 The Central Limit Theorem
hence, P(a ) = 0 ^  P(A„)< °° Let {Xn,n  >  1} is a sequence at random variable Define 
fi=l Sn = Xi + X2 + " X ni a(Sn) as the standard deviation of Sn and Zn 
(ii) If An are independent and _  Sn-g(Sn)
*(Sn)
£ > ( 4 , )  = c° Then Zn converges in distribution to /V(0,1). This is an example of SLLN.
nor
then A - l - Ax  xi f f  at most finite number of events An occur. Example
hence, A = \ J f ) A n Suppose Xs above are i.i.d each with the Poisson distribution with parameter X. Show 
<•■1 «i-l that the SLLN holds.
In view of the independence of An, we have 12.6.1 Central Limit Theorem for Independent Random Variables
Let Xi,X2, -  be a sequence of independent random variable’s having means p, = 
\ - f (a ) = p {a ) ^ p(\j ( ) a„ E(Xf) and variance a? = Var (XL). If (a) the Xv are uniformly bounded, that is for 
k. r n | «=r
some M',P$\X‘i\. < M) = 1 for aHTaad
= L ^ f m ) = x
r * l  \.«n| /
220
UNIVERSITY OF IBADAN LIBRARY
i\
We will show that with probability I,
uu CO
( h ) ^  of  =  co, then
1=1
Z U C X i- f i ) Let X1,X2l ...Xn be independent random variable’s with E(X[) = 0, Var (X,) =  a 2 
P< <  a • -* 0 (a) as 7i —* co we have for some a > 0 by Kolmogoro’s inequality
ZT=i°?
Kroncker’s Lemma (Proposition)
If a1( a2, ... are real number such that
co n
Z-r  < oo, converges, then lim Z)l n-*oo — i —n  = 0  1 = 1  n-*oo
12.7 Strong Law of Large Numbers for Independent Random Variables
S B .  X % = °
Let Xi,X2, ... be independent random variable’s with E (A",) =  0, Var (Xs) =  a? <  °o. l=j
By Knonecker’s proposition, we have that
If i <  0, then with probability i.
X1+Xa+-+Xn
0 a s  n  —» co = 1
t=i
Note: It can be observed that Kolmogorov’s inequality is a generalization of 
Chebyshev’s inequality. If X has a mean n and variance a 2, then by letting n =  1 in Pr < Max > a }  = 0
Kolmogorov’s inequality we obtain Ijsksn 1 %l l=J
P{\X -  n\ > a) < -jj{which is Chebyshev's inequlaity) This implies that
n
Where Xl ,X2, — Xn are independent random variable’s with E{Xf) =  0, Var (Xs) = im )  * l/ n = 0 or equivalently that
of; then Chebyshev’s inequality yields -«oo Z1 j =  1
n  2
f{|Jfi +  -  +  Xn | > a } s 2 ] ^ lim ) --------- =
n-cc Z_j n
1 =  1 ‘ 1=1 
Kolmogorov’s inequality gives the same bound for the probability of larger set of and that
variables. The theorem (Kolmogorov’s) is used as a basis for the proof of the strong p f  lim Xk = o] =
law of large numbers in the case where the random variable’s are assumed to by V|| •• Li )
independent but not necessarily identically distributed.
Proof: (Of strong law of large numbers for independent random variable’s)
2 2 3
2 2 2
UNIVERSITY OF 
■*T8 7# . . v,..:;. . .---- — c IBA ..............  —DAN LIBRARY
Definition 1: -■> A„(w) -v A'(w) as n  ̂co
Two sequences h{Xn(w)}, {/„(vv)} of random variables are said to be "Trial Proof
Equivalent" a finite number of terms, Given = X„ Pr{X„\ > a„} < co
i.e. for almost all w Q, X n (vv) = Y„ (vv) => PrlEn occur infinitely often} = 0 
for all but a finite number at n Pr jlim sup E n j= 0
Lemma P'1 k fw-!l Umw E- 1 ■<>
if ' Z Pr^ - x , M * y , ( w ) }  < oo i.e. trials x  X >
ml Pr n u ^ = 1 by de - M orgasloa
Then Pr{w: X n (vv) * Yn (vv) inf initely after }= 0 
Proof /.P r = 1
Let En = [Xn(w) *  ^ ( mO) => Pr [En occurs inf initely after) = 0 
Since X P\En} —v oo in converges
=>Prjlim infE„j = l
PrjPirn sup En J = 0 or 1
Pr{A\(w) = y.(vv) V except a finite number of n} 
Definition 2 . . . ; =>(jT.}and{Yn} are Tail Equivalent.
A sequence {y„}is said to be a truncation at the sequence {A'Jat [an)where {a,,} is a 
sequence of positive real number if Examples
Let En = {vv: Yn(w) -> A„(w) as n ->»}
=*P(E) = 1
We know: \X,\ < an => -a n < X n < an i.e. A -  [A*. -  Y„ Vn except finitly inany nj 
P(/l) = I 
Y / / / X Y / / / -\ cut off the {an} at {Aj in order to obtain {T„}
’ 0 »n If we E amd w e A
we E fl A = B = {w: Y„(w) —> A„(w) as n -> co amd X„ = Y„ V except finitrly many n)
Lemma:
Let the sequence {/„} be a truncation of sequence {Xn} at sequence {a„} be finite i.e. r=.weA 'f|A=e>weB 
Yn - XPrlA^I > an }< co (Efl A)< B
then thus P(E) = 1. P( A) = 1 P(B) = 1
Y„ (w) —> X n (w) as n —v oo where V. and A arc defined on the same ample space.
225
224
UNIVERSITY OF IBADAN LIBRARY
Hence, they are tail equivalent.
12.8 Bolzano-Cauchy Criterion for Convergence
Lemma: Let C be a fixed real number. If |C| < AT efor some K  > 0 and every€> 0, it 
follows that C = 0. By Bolzano-Cauchy criterion for convergence 
Proof: suppose not. Then C *  0 £ /> (£ „ )<  °o
0* I
Since € is chosen arbitrary, put e into let e= > 0, since K is given.
2k given any e> 0,3 an N0(e)
Then |C| J = Such that V-n > N 0.K e which is clearly a contradiction except for |C| = 0
hhK
^ / >(E „l< e and lettingK -» co
12.9 First Borel-Cantelli Lemma
Theorem: Let {£„} be a sequence of events each of which is a subset of ft such that 
£ F where F is a a  -  field of sub-events of ft defined on the probability space Now Z
(ft, F.P), then ^ P ( E En)< co => pjlim supEn J= 0.
P(E)< Z p (£. <e—> 0
[iffE .} is a sequence o f events we are often interested in how many of the event occured] Where ecan be taken arbitrarily close to zero,
OR i.e. only finitely many En occurred.
If f c )  is a sequence of events in a a -  field F, w here ft, F, P is a probability Is1 BC-Lemma does not require independence of the event En.
space. Then P({Eon})< 30=5 /4im  supE„ }= 0 12.10 Second Borel-Cantelli Lemma
Let {£„) be a sequence of independent events on the same probability space 
(fl, F, P)then if £  = mli mMt sup En;
Proof:
Let £  = »li-m«,  supr  En !0->1 ( £ . ) = «°
Then £ - f ) | j £ „ ;  clearly E c  (J  E„ V-m e =>£(£„ occur infmtely often) = 1N
i.e. £jiim supEn j= 1 .
226 227
UNIVERSITY OF IBADAN LIBRARY
Proof:
Corollary to 2ml B.C. Lemma
Recall that hm sup En = p | \ J e „ s If A7 .fareindependentandX„ -> 0 (a.s)
, m=l m m
Then 2 >|jr,,|>c]<<x>
,im *UP (En)]  i = P{hm inf Enc}= pjlim (fl )} .rr»l ; . • * . '
Whatever be C>  0, finite 
For any N >  0 and every K > N
Proof:
If Xn's are independent random variables /!„ = [jx„|]> C are independent. Since 
Since E, s are independents E f 's  are independent too. X„ 0 a.s. iff.
 / > C]< oo as n -» 0 and for any C > 0am-N m N
Since J~[(l -  P{En)) < J"J e ,,[K'1 = e "" by exponential property We have / ’(lim SUP A.n J= 0
mN mN
Since PE(A„)< c o .
as K oo; £  P(En) -» oo i.e. £  P(En) = co Note:
" » N n=N
The converse of Borel-Cantelli lemma is not true if An's are not independent.
- I P ( E . )
=> lim e "" -> 0
Af-w:
=> 1 -P {E )  = 0 ^  P(e )= 1. 12.11 The Zcro-Onc-Law
Theorem: Let A„ A-,,..., be events and let A be the smallest c r -  field containing each of these 
< co events. Suppose E  is an event in A with the property that, for any integer j\, j 2, jk pr{lim sup En }= according as J P ( E n) =
ICO that events.
Whenever Et,E 2>..., En,... are independent E  and Aj, f) Aj2, f l ... fl Ajfc are independent.
r
| Then PIE) is either 0 or I.I e- (|) J^2iib]E (A n)<oo, p(iim An)=0
Exercise 1: If P(An) — V n eQ ; ]f^PAn = oo
(ii) 1 J = ooand A/jrare independent 2 ml
*»*l Does [An}converge??
p(!im (A j)= l.
--> /'(lim sup A3) = 1
Bui ±P(.4„ J~rc=> P^lim inf An )■--- 0
228
229
UNIVERSITY OF IBADAN LIBRARY
Thus lim sup An * lim inf An
n-Kt, n-*xj
Hence {An} dos not converge. By independence of E and Aj, f | Aj, D ... fl Ajk 
= jA jl:'Aj,...AjkdPP(E )
n
Exercise 2: Let X  have the uniform distribution (X ~ p (0 ,  l)) consider the sequence But (fl, A, P)is a complete probability space.
o f events {An} .
.■ .p (/fn £ )= .p M ^ p [e )
Where An = jw: X(w)  < — | .  Are {An}independent. For all A e A, in particular A = E.
:.P{E)={P(E)}2 
P ro o f :  f ( t ) = l Then P(e ) = 0 or 10 < x < 1
Completeness
A measure space (H, 0 ,  P) is said to be complete if Acontains all subsets of sets of 
measure zero.
ThCn 5  P Â" ̂  ~ 5  n ~  +°° Harmonic Series Diverges Note
(i) A non-empty event with zero probability is negligible
But En ^ A n+lz>An+l=>... (ii) Every subset of a negligible event have-zero probability;
/ ’Jim sup A )=
I M 7 '  "  ) = ^ W =0
,,,BI n**ni / Lemma
(1) Given a probability space (H.A, P)and a sequence {£„. u - 1.2,...} of event 
T in )- where EnCH and E T V-n
Clearly the above violates the 2nd B-C lemma as the sequence {/*„} of events is 
overlapping and the therefore not independent. Prove
(1) lim inf En C lim sup En
n- v n  n-f«>
Proof: (ii) lim inf P(En)< lim inf P(En)n -* r n-Kc
on Q -A ( (2) Let (Q, F) be a measure space, on which a sequence of probability measure isLet
on AJ » 1defined. The set function P(e ) dA Pn(£)
Then J IAj{, fAl2 .....//!,* dP = P(Aj, f | Aj, f l . - f l  Ajk f |£ )
(i) Show that 0 < /} ,<  1 .
= p(Aj1nA j; n . . . .n A j j / ’(£) (ii. i i )  is countably additive and is therefore a measure
(iii) Prove that Fj(f2) = 1.
230
231
UNIVERSITY OF IBADAN LIBRARY
Solution Now let Au {w: \X,(w)| ̂ e  zTn}, then An l<p 
(i) P(E)=—  Pn(£)>  1 .Since Pn (e )^  1 and 0 lim (En) = 0 and iim P(An) = 0
2 n 2 " n-*w I N - X
(ii) Show l im j ;  P„ (£ )  = P. (£) and d/IMmC.  Jf X]dP = 0
n=l *• n»l ^ Am
This verifies the 4lh L.T. so by Lindeberg’s theorem
(Mi) P ( f i ) = Z j r P . ( n ) = i ; ^ ( i )
n = l ^  ^ X , + X 2 + -  + X n_ _̂  in distribution
tTn
+ . . .
~ 2 + 4 
Lindeberg’s Theorem (The Conditions of Lindcberg Theorem)
5W =
T ~ 2 Let 1 X itX 2, . :X k t
i ->^n >•••^2*3
Be a rectangular array of random variable satisfying the following condition.
12.12 Limit Theorems for Sums of Independent R.V’s 
1. V-n > 1 X n ,X n ,...Xnt_ are independent
Lirideberg-Levy Theorem
Lcl X r .\\... be a sequence of ij.d.r.v each with mean 0 and variance 2. e (* j = 0;
r J (o < r  < co) Then + * 7 + — + X* 71/(0, l) 3- B] =  r ;  + r ; : + ... +  r ; 4. wrlA 5 ;  >  0 
zTn
Proof:
*«0
Consider an array Xi K e > 0 .
Let S„ = X„ + X„ +...+ X .  and N  a random variable with standard normal 
* „ * i . * ,
distribution
Condition 1,2 and 3 of Lindeberg’s Theorem are satisfied. Then ———  —» -j=L= T e ^4//
5, 42n  ■“
We only need to verify condition 4.
The above statement is basic to the central limit theorem 
Let e> 0, Bl = n r :, then
However, i f E(XiiL ) = 0
n
—n  t Z  J  -m p 2, \x.dpn r i* dt
Since X s are i.i.d.
and
232 233
UNIVERSITY OF IBADAN LIBRARY
Lyapunov’s Theorem 1
LetXi be a sequence of independent random variables. If a positive number 8  can be 
found such that as n —» co; / > 1.
eoW-.i
u„ *=i
k = I V2 jr By hypothesis (ii) above
1 i? .+rf = 0
Proof: • /*.•; • • : . - . -, ' «<-” 6d V*
The random variables, define above satisfies 1, 2, and 3 of Lindeberg’s theorem. It 
. jSl q 5* i i _ » #(o, l)
also satisfies the following: "5 ..
(i) for some fixed & >  0 > E \X » k \2+S <  00
(>o (im ■° n i Et.i £ i^-*r+' = ° then
We now need to show that condition 4 of Lindeberg’s theorem is satisfied.
Let Var(X,)=b for  / = 1, 2 then Bn =bTn 
Setting E(X,) = ar, condition 4 becomes
/ \ x a*hVn
Vr,b2\  |,r-a|!>*^fl, - a^ dFM = o+eJ/r+fp -»
which approaches zero since the Var(Xt)<  oo anc/ 6 = 0 
Now wc need to show that condition (2) implies condition (4)
This follows from the inequality
234 235
UNIVERSITY OF IBADAN LIBRARY
Where /?(ll|is the initial distribution 
C H A P T E R  13 //, is the drift vector 
IN T R O D U C T IO N  T O  B RO W N IA N  M O T IO N ^ i s  the diffusion matrix
13.1 Brownian Motion (Weiner Process)
Brownian motion describes the macroscopic picture of a particle emerging in random 13.2 Brownian Process
system defined be a host of microscopic random effects in d-dimensional space, Peter If the drift vector is zero and the diffusion matrix is the identity, then is
& Yuval (2008). At any step on the microscopic level, the particle receives a termed/referred to as the Standard Brownian Motion. Hence, the macroscopic picture 
displacement caused by other particles hitting if or by an external forces so that it’s emerging from a random walk can be fully described by a Standard Brownian
•• ** H Motion.
posterior at time-zero is So, its posterior at time n is given by S„ = S0 + ^ x ,  where
■̂1
the displacements X t,X 3,... are assumed to be independent, identically distributed 13.3 Multinomial Distribution and Gaussian Process
The most important joint distribution is the multivariate normal (or the multinomial) 
random variables with value in TRd. The process {S’,,:/*£ 0}is a random walk, the distribution. It arises in many applications and has some properties that makes its 
displacements represent the microscopic inputs. Thus Brownian motion is a kind of manipulation very simple.
stochastic process. If A is any (/jx»)symmctric matrix, consider the quadratic form 
Any continuous time stochastic process {#(/):/£0} describing the 
= A X
macroscopic feature of a random walk should have the following properties: n n
(i) For all time 0 </,< /, <... <tn. = X 2 > *  x<xj
, . |  J w  I
are independent, we say the process has Where x  e 91 is the point which has coordination xf and a column vector with
independent decrements.
(ii) the distribution of the increment /?((Wi)- /? (,)does not depend on t, we say the transpose/. If A is positive-definite, then (2n) ^  [def A) ^ e x p j - - /  Ax^ is a
9
process has stationary increments. probability density on 91".
(iii) the process {/3{l):/ > 0} has almost surely continuous paths.
Let V = A'1, then V is also positive-definite and symmetric.
(iv) It follows from the CI.T that these feature implies the existence of and
a matrix le '.R ^ su ch  that for every t > Q  and  h > 0 ,  the increment is Definition 1:
A collection (X„ Kjwhich has the joint density. 
multivariate normally distributed with mean h/.i and covariance matrix /?XXr
Any process {.V,} with the above feature seem be represented by (2x)% m(def.V)T
& i = A .i + M + £ / W  J'” t>0 is said to have the multinomial distribution A'fO.K)
237
236
UNIVERSITY OF IBADAN LIBRARY
13.4 Properties of a Brownian motion (B. M)
Definition 2: If //,, are finite real numbers then X  = The following are the properties of a Brownian motion.
C^i+M ^2 + /^2» -> ^ n + A. )joint p.d.f 1. The Brownian motion is a Gaussian process with autocovariance function.
V { s , t ) = E ( x , . X , )
(27r)/^-(det. P ) ^  exp ~ (±J Y~'(x~m)} a°d l  *s sa>d to have the multinomial
= min (5,/)
distribution n (̂ j, v ) 2. The autocovariance function
P(.v,r) = min (i-,/) 
Definition 3: Letr be any set (usually a subset of the real axis). For every t fe r re t  i.c. symmetric for r = (0,co)
A ^ b e  a random variable defined on a probability space (Q, A,P). Then the family 3. Let A ^be a B.M. process and define A'(.s,f)= Xlt) -  X {s), the increment
,w): re  r} at random variables is called a Stochastic process.
process on the interval' (s,t\ Then A(j ,/) ~ A^(0,/-i)
Definition 4: Let V(s,t) = e \ X ^ - ' / u, \ x ^ - J} be the autocovariance function at 4. Given the Brownian motion process
M M  for all relevant values of t and s and pt = E ] x ^ \  p s = ZsjAQ,)] £ ( / ) = 4 K /+A M d f )
Definition 5: A stochastic process Af(r,w) with the property that all its. finite­ = 3/r
5. The Brownian motion process is continuous everywhere but is nowhere 
dimensional distribution are multinomial and E(X,}=0,
differentiable.
E ( X „ X , ) = V { s , t )
Where K(v ) is a positive-definite function on r , is called a .Gaussian Process with Definition 7: Let T be any set (usually infinite) and possibly uncountable) and let 
autocovariancc function P{-, •): TXT -> 9? be a function with the two properties.
Remark: (ii) for any finite subset }er and any real numbers Z2,...,Zn not
• Two Gaussian processes with the same autocovariance function have all zero
the same finite-dimensional distribution
• The most important example of a Gaussian process is the Weiner (or I.-1 2/-iM ' ,  < > , .* ,>  0
Brownian motion) process. then P(v ) is called a positive-definite function on T 
Lemma:
Definition 6: A Gaussian process is said to be a wiener (Brownian) process 
P(/,./,) - min (r,,r,) is a positive definite function on r
i f  (/') r = (0, ao)
(H) -T1(1) =- 0 and
(iii) = m in(j,/)
238 1I
UNIVERSITY OF IBADAN LIBRARY
Proof:
(/) Clearly V(/,, /,) = V (/,, /,)
(/'/') I f  0 < I, < /, then z * . + ('z -O + (/> -':)[ Z * , z * > +(/«/■I
Z<-i /E1 ^  i > t e  *, = Z<-i  Z,-»i  min 0 / . t e • Z f c  ' . - i J /E-I* /
„ i  ‘-i  ,-i Where /„ -  ()./, >
(1 <./.) Clearly the last expression in the “curly bracket] is a positive number. 
•S/rar min(/i./()-/. /dr / = j and J(.v,/) = min (.y./) is positive definite.
Since b> symmetry, we may interchange / and7 to cover cases in which f
Theorem:
- I ' .  T  + 22 . L * , Let V(/)be a Wiener process and let X(.s.t)- ^  lit * A',.,denote the increment of the 
••I /"ni
process on an interval (.v,/), then 
Expanding the square bracket gives
(/) A (.v,/) -  iV(0,/- .v)
= * .-+ 22 , 2 ;* , (//) // (.v,, /,) anil (a,, /.) arc disjouit intervals.then X ( s t. tx)andX(s,, t2)are
stochastically independent.
= *,’ + 22,(2 ,.,)+  2* .!* ,.,)+ ... + 22,(2 . J  + 2(2 ,.2 .)
Proof
(i) A',#,is Gaussian, therefore the joint distribution of A',f, and .Y(#, is
■ f t * . /"I*!
multinomial and so .Yj(J -  Aj,, -  A',, N(o.r’)
I * ,  I - I  I * , where r -  I a r ( X ( s j ))(-/♦I
-4 * 1 .1 - - O '
Writing the expression in full, we have
= 4 ^ i ,  -  2avv, + -v,;,]
- / , k  IZ . (3, + * + . . .  + 2 .  f} ~ r ( t . / ) - 2 l ( s . t ) +  V (.v..v)
1 C |(^; + -  H )' -  (£; *- £ , + ... + } - / -  2.V + .v
• -  • M . , ' * J  -?„■’ } - /  • .V
Hut / w  ,. by hypothesis, rewriting mi) 'A c ;*-<unie without loss of generality that v, < tt < v. < Then A(.v,. /,) and
The above expression we have (•. . i nave nuiltinormal joint distribution with covariance given by
UNIVERSITY OF IBADAN LIBRARY
/ r [A '( v  / , )(A'(.v,, /,)]=-- E [,V (a, ) -  A'(.v,)] [A' ( / , ) -  Xss] Let ,Y,„bc a Wiener process (B.M.. then consider the B.M. process 
-  E [ - '  (/ . K ^ l / ;  | “  ( l ,  ) A V . ;  )  "  X (> , K ^ ( / ;  I  +  ^  ( a ,  1 [0 , t  =  0
“  V[U. ^ (', ^ 1 •?; )+ ^ * *2)
l /
— o Show that is a Brownian process.
Since Cfn>{s,l) = 0, there exist stochastic independence.
Exercise 1: For any real set of number C,,C,.... C,(and real values random variable Hint for Solution
•j.V, J". show that ]T ]T (?,(?,£■(A', - / / J iA ' , -p )  is positive scmi-dcllnite for Calculate !'(•. ) o/
,.i /-i Note that for a B.M. process // = 0
//)(); //)=£[();,,)(>;„))
Hint for Solution: Let >'■ = A', - / / ,  then £ ()') = 0 and V u r \^ C iY, ] = Zij ^ C ,) '
= .v/»n. n{I  i .-i  •,
W s j
i-i  i- i #-ri -- min (.v./)/
Exercise 4:
M 1 cl A'Ufbc a Brownian motion process and let rr,,, -  A„, — / A*(ll; 
0 < / < I
Example 2: Calculate the autocovariancc function of the Gauss-Markov Process Find the or /',(.•>./)
Hint for Solution
Hint for Solution: Assuming /T()7) = 0; Nolo that /:(-/*„,)- 0 nm/
! > ./) • -  A-[(c"A>:- )(e - Ye-'')]
,  )(«•*)]
Cf e:> for l > s
i.'  <‘*'1 t* for l < s
* 1 for 1 = s
- f, i/ l-s.f
Example 3:
242 24?
UNIVERSITY OF IBADAN LIBRARY
^ E [ ^ r s X j X {l)- t ( x J ]
= £[a'\t)X U) -  sX{t)X {,) -  X[t>f X [x) + tsX 
= P M - . sI'(1,0 - /P ( 1,j )+&K(U)
= Min( s j ) - S  min ( i t ) - t  min(l,.?) + / s min (l,$) 
= .v -  st -  is + st; for s < l
- t - t s - s t  + st; for t < s
= M in ( s j ) - s t ;  V-sJ
PART THREE
244 245
UNIVERSITY OF IBADAN LIBRARY
C H A P T E R  14
Definition
IN T R O D U C T IO N  T O  S T O C H A S T IC  P R O C E S S E S A stochastic process is any process that evolves with time. A few examples are data 
on weather, stock market indices, air-pollution data, demographic data, and political 
14.1 Basic Concepts tracking polls. These also have in common that successive observations are typically 
Researchers in science, engineering, computing, business studies and economics quite not independent, such collection of observations is called a stochastic process. 
often need to model real-world situations using stochastic models in order to Therefore, a stochastic process is a collection of random variables that take values in a 
understand, analyze, and make inferences about real-world random phenomena. set S, the state space. The collection is indexed by another setT, the index set.
Finding a model usually begins with fitting some existing simple stochastic process to 
the observed data to see if this process is an adequate approximation to the real-world The two most common index sets are the natural numbers T  = {0,1, 2,...}, and the 
situation. nonnegative real numbers which usually represent discrete time and continuous time, 
*> '• "• '■ i‘ respectively. The first index set thus gives a sequence of random variables 
Stochastic models are used in several fields of research. Some models used in the (X0,XVX2, — )and the second, a collection of random variables {AT,,,, t > 0 j, one 
engineering sciences are models of traffic flow, queuing models, and reliability 
random variable for each time t. In general, the index set does not have to describe 
models, spatial and spatial-temporal models. In the computer sciences, the queuing 
theory issued in performance models to compare the performance of different time but is also commonly used to describe spatial location.
computer systems. The state space can be finite countable infinite, or uncountable, depending on the 
application.
Learning stochastic processes requires a good knowledge o f the probability theory, 
advanced calculus, matrix algebra and a general level o f mathematical maturity. 14.1.1 Applications of Stochastic Processes
Nowadays, however, less probability theory, calculus, matrix algebra and differential The followings are some areas of Stochastic Processes:
equations arc taught in the undergraduate courses. This makes it a little bit difficult to (i) Marketing: To study customers or consumer buying behaviour and forecast.
teach stochastic processes to undergraduate students. (ii) Finance: To study the customer’s account recordable behaviour and forecast.
(iii) Personnel: To study and determine the manpower requirement of an 
The mathematical techniques and the numerical computation used in stochastic organization.
models are not very simple. In an introductory course, the hope is to teach students a (iv) Production: To study and evaluate alternative maintenance policies, 
small number of stochastic models effectively to enable them to start thinking about inventory, and so on, in industries.
the applications of stochastic processes in their area of research. These small numbers (v) Transport: To effectively control flow and congestion in the transport 
of stochastic models are the core topics to be taught in an introductory course on industry.
stochastic processes directed to researchers in the physical sciences, engineering, 
operational research and computing science. These researchers have a stronger 14.2 Discrete-Time Markov Chains
background in mathematics and probability than researchers in the biological You arc playing a lotto, in each round betting N13 on odd. You start with N30 and 
sciences. after each round record your new fortune. Suppose that the first five rounds give the
247
UNIVERSITY OF IBADAN LIBRARY
sequence loss. loss, win, win, win, which gives the sequence of fortunes, 9, 8, 9. 10, For a transformation matrix, a 2-level change of state will produce 2 by a matrix, a 3- lcvel change produces 3 by 3 matrix and so on
1 1, and that you wish to find the distribution of your fortune after the next round,
18
given this information. Your fortune will be 12 if you win which has probability — 14.3 Classification of General Stochastic Processes
The main elements of distinguishing stochastic process are in the nature of the state
and 10 if you lose, with probability —20 . One thing we realize is that this depends
3 8 space, the index parameter T, and the dependence relations among the random variables XL.
only on the fact that the current fortune is N il and not the values prior to that. In 
general, if your fortunes in the first of rounds are the random variables^, ...,Xn. the 
conditional distribution of ^n+l given Xv ...,Xn depends only on Xn. This is a 
14.3.1 State Space 5
fundamental property and we state the following general condition.
This is the space in which the possible values of each <Ytlie. In the case that S = (0, 1,
2. ...). vve refer to the processes as integer valued, or alternatively as a discrete state
Definition process.
Let X0,X1,X2, ... be a sequence of discrete random variables, taking values in some set 
If 5 the real line ( - 00, 00), then we call Xt a real-valued stochastic process. If  S is 
5 and that are such that
p(x„, =j\x„ the each decision K space then X, is said to be a k vector process. .....X..,.-|.X„ 
For all i , j , i \ ..... t„_iand all n, the sequence jAj,} is then called a Markov chain. In Remarks:
ucncral. the probability P(.Y„., /jA'„„) depends on i , j  andn. It is however, often The choice of slate space is not uniquely specified by the physical situation being 
the case that there is no dependence on n. We call such chains time-homogeneous and described, although usually one particular choice may sand out as most appreciate.
restrict or attention to these claims. Since the conditional probability in the definition 
thus depends only t and j,  we use the notation Ph -  P(sY„_, — j \ X „  = /) i, jeSand 14.3.2 Index (Parameter) SetT
call these the transition probabilities of the Markov chain. This, if the chain is in state I IT = (0.1.... )lhen we state that Xt is a discrete lime stochastic process. Often when 
i, the probabilities p,^describe how the chain chooses which state to jump to next. T is discrete we should write Xn instead of X,. If T  = [0, then X, is called a 
Obviously, the transition probabilities have to satisfy the following two criteria: continuous lime process.
(/) P„> 0 1/7) 1 ^ = 1 ,  for/;eS
14.4 Classical Type of Stochastic Processes
for all ( e S
We now describe (first brielly) then in details some of the classical types of stochastic 
processes characterized by different dependence relationships among At . Unless 
14.2.1 The Transition Matrix
In changing from one stale to another in any Markov system, a measure of probability random .staled, we lake T -  [(), -x-] and assume the random variables A", are real
valued
is always attached. Ii is the collection of all such probabilistic measures which are 
arrange din rows and columns that ids called the transition matrix.
UNIVERSITY OF IBADAN LIBRARY
Then for any t and s we have
14.4.1 Process with Stationary Independent Increment / ( /  + *) = £  [X ,„ -X „]
If the random variables X l2>- X,,. X ,j-X l2,...,Xtn — X ln_{ are independent for all = £  [X,.s-A'.v + X5. -X „ ]
= £ [X ,.S- X ,]  + E[XJ.-X „] 
choices of £1( t2, .... ^satisfying £, < /, < ...< /„  then we say that Xt is a process with 
independent increments. = £ [ X , - X „ ] + £ [ X , - X „ ]
If the index set contains a smallest index t0, it is also assumed 
X c - X li,...,Xtn- X ln_l are independent. If the index set is divided, 
Using the property of stationary increments
what is 7 = (0 ,1 ,...), then a process with independent were reduces to a sequence of 
= / M - / W
independent random variables Z0 = Xq,Z{ =  X, -  X ^ . i  = 1,2,3, ...in the sense that 
The only solution to the functional equation / ( /  + s = / ( / )  + f ( s )  =  /( /) / .  
knowing the individual probabilities/distributions of ZQ,ZV ... enable us to determine 
the joint distributions o f any finite set of Xt, in fact that of differentiating with respect to t and independently with respect to s we have 
X, = Z„ + Z, + ... + Z,, 1 =0,1,2,... f ( r  + s) = f ' ( r )  = f \ s ) .
Therefore for 5  = l, we find f ( t)  = constant = f(i) = c. Integrating this elementary
Remarks'Definition differential equation yields f [ t )  = cl + d.
1. 1! the distribution of the increments X(t, + h ) -X ( t t) depends only on the But / ( 0 )  = 2, / ( 0 )  implies / ( 0 )  = 0 and therefore d = 0.
length h of the interval and not on the time t, the process is said to have 
TTh. eref.o re expressi- on / ( 'x)r  = / ( ! >  «s/ f n %
sia/iunun • increment.
2. For a process with stationary increments, the distribution of X(/2 + h ) - X ( t 2), =>E[X,] = M n +M, 1 as requires.
no matter what the values of h, t2 and h.
3. We now state a theorem;
14.5 Markov Processes
If  a process {Xr  t eT}, whereT = [0, oo] or T = (0,1,2,...) has stationary 
A Markov process is a process with the property that, given the value ofXt , the values 
independent increments and has a finite mean, then it true that: of Xs, S > t, do not depend on the value if X u < t; that is, the probability of any 
£ (X ,)= M n + M, where M0 = £ ( x J  and M ,= £ (X ,) -M 0 particular future behaviour of the process, when the present state is known exactly, is 
07 = 07 + 07 where not altered by additional knowledge concerning the past behaviour, (provided our 
£7,; -  £  [(X„ - M„)] and <j'; = £  [(X, - M, f ]- ct; knowledge of the present state is precise).
(4) Both the Brownian motion process and the Poisson process have stationary 
independent increments. Definition 1
(5) We now prove remark 3(a) In formal terms. a process is said to be Markov if 
E(X,) = E(X„)+[E(X,}-E(X0)] Pr {a < X, <b\Xl) = X t, X l2 = X 2.....XIn = Xn }
Lcl f(t) = E(X ,)-E(X „)
250 251
UNIVERSITY OF IBADAN LIBRARY
Whenever /, < < ...jn < t r\a < X, < h\XM = X „} Practice Questions
1. Define and explain the concept of Stochastic Processes, and give three areas of 
Definition 2
application.
Let A he an interval of the real line. The function
/>|.v: s;«.A[ Pr {x, <r A|X5r.v} is called the transition probability function t > s and 2. Explain the concept of a simple Markov Chain.
is basic to the study of the structure of Markov process. We may express the condition 
( 1) as follows: 3. Define the following:
p r{« < x , ^  tyx ,i x,>Ks = x^ - x ,„ = "„) = H x„> tn, t A) where ) £ |a < £ <  b} (a) Slate Space (5)
(b) Index Set (7 )
14.5.1 Martingales (c) Renewal Process
Let (.V,) be a real-valued stochastic process with discrete or count parameter set. We 
say that (A',) is a Martingale if. for all t, and if for any
< /, e (X1i1.,|X 1I r/,.... Xln =o„) = c for all values of ai, a2, ... a„.
14.5.2 Renewal Process
A renewal process is a sequence Tk of independent and identically distributed (i . i .d ) 
positive random variables, repressing the lifetimes of some “units”. The first unit is 
placed at time zero; it falls at lime /', and is immediately replaced a new unit which 
then fails at time 7', + 7'2and so on. the motivating the name “renewal process”. The 
time of the nth renewal is S„ -  7] + 7', t-... + Tn.
A renewal counting process N, counts the number of renewals in the interval [o.tj. 
formally .V, = n for Sn < ( < Sn,t, n = 0 ,1.2 ....
Remarks: I lie Poisson process with parameter A is a renewal counting process for 
which the unit lifetimes have exponential distribution with common parameter A 
Other examples such as Poisson process, birth and death processes and Branching 
Process v\ ill he considered in small details.
UNIVERSITY OF IBADAN LIBRARY
C H A P T E R  15 P {x < i)  = 1 - q ,
G E N E R A T IN G  FU N C TIO N S A N D  M A R K O V  C H A IN S So that the probability generating function follows 
p(x) = Zi=oPi * ' = I: (X1)
15.1 Introduction
Also for the joint probability, we have the generating function as
Generating function is of central importance in the handling of stochastic processes 
involving integral-valued random variables not only in theoretical analysis that also in Q CO =  £«=0 Qi
practical appreciations. Stochastic process involves all process dealing with 
We can see that (?(*) is not the same as P(x)
individuals’ populations, which may be biological organisms, radioactive atoms, or 
telephone calls.
Q(x) do not in general constitute probability distribution despite the fact the 
15.2 Basic Definitions and Tail Probabilities coefficients are probabilities.
Suppose we have a sequence of real numbers a 0, a a...... Involving the doming Note that
variable x, we may define a formula sothatP(i) = 1,
A (x ) = Go*0 +  a ^x1 + a2x 2 +  ••• = £?= Qaix i and /P (x ) /<  ^T /p .xV
If the series converges in some real inference - x 0 <  x < x0, then the function A (x) is 
known as the generating functions of the sequence { a j. We may also see this as a < ^  Pj. if / x / <  1
transformation that carries the sequence unit the function A(x). If the sequence {a,} is < 1
bounded, then a comparison with the geometric series shows that A(x) converge at This means that P(x) is absolutely convergent at least for /x /<  1. But for Q(x), all 
least for f x f x j . coefficients are less than unity, this making Q(x) to converge absolutely at least in the
II the following restriction is introduced open interval / x /<  1 .
n
Converting P(x) andQ(x), we have
t'=0
( l -x )Q O O  = l - P O )
Then the corresponding function A(x) is viewed as a probability-generating function. 
Specifically, consider the probability distribution given by which is easily seen when the coefficient of both sides are compared, 
H x  = i) = Pi for the mean and variance of p,-. we have
Where X is an integral valued random variable assuming the values 0,1,2 .... U = /•(*) = £  ip, =p<( 1)
Consequently, we define the tail probabilities as i = 0
P{x > i} = q,
= q‘ =<?(D
Bui the usual distribution function is i = 0
then E\x (x -  1)] = £  t(t -  l)p< = p " (l)  =  2 Q \l)
254 255
UNIVERSITY OF IBADAN LIBRARY
So that (T2 =  var(x ) = p"( 1) + p '( l)  -  (F 1 ( l ))2
= 2g '( i)  + q ( i ) -  (<?(D)2 , . V N u*(it)r!
In the same vein rth  factorial moment n[r) about the origin to be 0 (0  =  l  +  )  —*->r=i r!
function and continues
fcWx-DO- 2) .... (x - r  + 1)] = £  (i - lXi -  2).... (i - r + l)Pi the characteristics function exist always both for discrete
function.
=  p « (  1) =
0 ,( 0  =  £ > * / ■ «
t=i
From these result, several other generating function could be obtain such as the 
and
moment generating function, characteristics function, cumulative generating function. 30
<t>x =  j  e ltx f ( x )d x
15.3 Moment-Generating Function —  0 3
This is define as where the Fourier transform o f / (x )  is
A1x(t) = E(eCx) 30
for X discrete witth probability p,-, we have / m = T  j W o d w
- 0 0
A range simpler generating function is that of the cumulants. When the natural 
Mx(t) = 'Yj e tipi =  P (e f) logarithm of either the mgfo r the c f  is generated, it results into the cumulant- 
generating function, which is simpler to handle than the former two.
for X continues with frequency function f (a>u ) ,  we have This is given by Kx(t) = logMx(t)
Mx(t) = J  f{u )d u  
— 00
obtaining the Taylor series expansion of My(t) r!
we have whore /fr is the rth  cumulant.
M(t) = 1 + Zr=i V } tv
In handling discrete variables, the functional moment generating-function is also 
r! useful, which is defined as
where is the rth moment assume the original.
Because of the limitation of the moment generation function ( in that it does not Q(a) =  P ( l + y )  =  e[Cl +  y)i]
always exist) the characteristics function become appropriate which is define by = 1 ! Ir=lUr!(r)yr 0 (t)  = E{eitx)
flic Taylor expansion is similar where uir) is the rth factorial moment about the origin.
256
257
UNIVERSITY OF IBADAN LIBRARY
15.4 Convolutions Jusl as the case of two sequences, several sequences can also be combining together. 
Let there be two non-negative independent integral-valued random variables X, Ywith The generating function of the convolution is simply the product of the individual 
generating functions. That is. if we have the sequence {a;) * {£, ) * {c,} * {d,) * .... the
p d f
P(x  = 0  = a, generating function becomes /l(x) B(x) C(x) D (x )....
and Given the sum of several independent random variables,
P(y = /)  = bj the probability of the joint event (x =  y, -  j ) is given as aibj.
Syi =  Xj + X i  + x ?  + ••• + X n
Where Xk have a common probability distribution given by p,-, with pgfP(x), then the 
Let there be a new random variable S = x  4- y  the event (s = k) is made up of the 
pg/ol'5,, is {(P(x)}71. Further, the distribution of 5„ is given by a sequence of 
mutually exclusive events (X  = 0, Y = k), ( X  = 1 ,Y  = k -  1 ) ,. . , (X  = k,Y  =  0)
probabilities which is the n-fold c o(pn.v)o •lu.t.i..o..n* {opf .){ p=*}  with r if its written as {pi) * ipi)
Given the distribution of 5  as
Pis =  k )  =  ck 15.5 Compound Distributions
Suppose the number of random variables contributing to the sum is itself a random 
then it can be shown that
Ck = a0bk + a-i bk. 1 +  — +  arb0 variable. Thai is
When two sequence of numbers which may not be probabilities are compounded, then SN = + x2 + — + *n
it is called a convolution which ca{nC kb}e  =re p{reks}e nted generally as wherea * [bk] P{xk = i} = f i ' 
Given the following general functions p{N = n} = g lx 
> » (* ) -2 5 o « i* <’|
P{Sn = /) =  /i,.
and the corresponding p d f  be given as
C(x) = l i .o Q x 'J F W  -  £ f i * ‘ ^
we can then write
C(x) = A (x)B(x) Q ( * ) = l 9 n * "
this is because, multiplying the two series A{x) and 5(x), and given the coefficients n (x )  = Z /ijX '.
ol'x* as ck. Simple probability consideration show that we can write the probability distribution
When considering probability distribution functions, the probability-function of the ol'S„ as
sum.5, of two independent non-negative integrated-valued random variable X and Kis ^  = p{s„ = /}
simply the product of the letters probability-generating functions. =  £ p {/ V  = r,}P(Sn = l/N  = n)
259
258
UNIVERSITY OF IBADAN LIBRARY
Set L = 0, (i, =  i. and <iy -  /
Pij = ! \X \  - j / X o  = 0
9nP{sn = l/N = n ) The above is known as transition problem. The entire process is defined by [pf/).
>1 = 0
Tor llxcd n. the distribution of Sn is the n-fold convolution of {F,} with itself, that is 15.6.2 Transition Diagram
(/•;}." Thus A transition diagram is a graphical representation o f the process with arrows from 
E<-uF{5(I = l /N  = njx* = {F(x)}n each stale to indicate the possible direction of movement together with the 
Thus the probability generating function//(x) can be expressed as corresponding transition probabilities against the arrow s.
1 =  0 Kxample 15.1
*' ^  gnp{Sn = l /N =  n) Consider a process w ith three possible slates av  a 2, cmcla2. I .el p,,-: i = 1, 2,3, j  =- 
n= 0 1 , 2 , 3 , denote the transition from one state to the other.
S n ^ p t f n  = l /N  = n}*' The corresponding transition diagram is as follows:
n=0 i=0
I h i  =  ' A >
= ny=0 s „ { f ^ ) } n
=  G ( /M )
Thus gives a functionally simple form for thepg/'of the compound distribution {A;} of 
the sum SN
15.6 Markov Chain The diagram above represents a square matrix 
It would be o f interest to define the joint probability o f the entire experiment. This P = (p „) i =  1 .2 .--------n, j =  1.2.----------
will be a very complicated or intricate problem.Early in the 20lh century, a Russian 
Mathematician A.A Markov, provide a simplification of the problem by making the 15.6.3 Transition M atrix
assumption that the outcome of a trial XL depends on the outcome of the immediate To even transition diagram, there exist a transition matrix and vice versa. F or the 
proceeding trial Xt_, (and on if only) and effects Xc+1 (next trial) only. The resulting example 16.1. the transition matrix is as given below :
process is known as Markov Chain.
15.6.1 Transition Problem Pn Pl2 P13'
P21 P22 P-a
If a,- denote the state of the process X, and a,, i not equal toj  denotes the state of the p = .P31 P32 P n
process X, f ,. then there is a problem of going from a, to a, denoted by pi;- define as, 
Pn ~ W o ,  = ci, / X ( =  a,)
2 0 1
260
UNIVERSI Y OF IBADAN LIBRARY
This is a one-step transition matrix for every given i.{p,y} indicate the branch problem 
in a tree diagram. In general. Pou PU, Poi
rPu P12 ...... -  Pin Pn, P>. Pn.
P21 P22 .....
IPnl Pn 2 .....
Pn
and.
□  « / = i Example 15.2 (Forecasting the Weather)
y=i Suppose that the chance of rain tomorrow depends on the previous weather conditions 
For any given t. p,y is the probability of transition to a. given that the process was in only through whether or not it is raining today and not on past weather conditions. 
slate a,-. Suppose also that if it rains today, then it will rain tomorrow with probability o ; and 
if it docs not rain today, then it will rain tomorrow with probability /?.
In this section, we consider a stochastic process {Xn n=  0,1,2,...} that takes in a II we say that the process is in state 0 when it rains and state 1 when it does not rain, 
finite or countable number of possible values unless otherwise mentioned, his set of then the preceding is a two state Markov chain whose transition probabilities are 
possible values of the process will be tested by the set of non-negative integers (0, I, given by
2. If X„ = 1. the process is said to be in state / at time n. We suppose that a l-c r „  ( a 1 - o '
whenever the process is in state /, there is a fixed probability piy-that it will set be in fi l - /? j { P | - / ? J
state j. That is we suppose that
=- ]\XU -■ ..... X, = l,,X„ = /„ }= P„ or all statesi„, il t ..., in- j .
i./and Vn > 0. Such a stochastic process is known as a Markov chain. The value p,y Example 15.3
represents the probability that process will, when in state i, next make a transition into Suppose that company XYZ has three departments a}, a2 and a3. The employees lean 
stale j. Since probabilities arc non-negative and since the process must make a to be transferred to another department at the end of the year as follows:
i) A man who is in a: . must be transferred only to a2
transition into some state, we have that Ptj > 0, i, j > 0 ; P0 = 1, i = 0 ,1,...
h-o ii) A man who is in a2 cannot be transferred to a lt but can be transferred to a2 or a3 
with equal probability.
P denote the matrix of one-step transition probabilities p,y. so that iii) A man who is in a 3 cannot be transferred to a2 but can be transferred either to a3 
with probability “/g  or a I with probability Draw a one state transition diagram, 
and matrix.
UNIVERSITY OF IBADAN LI "  ^BRARY
First problem: Suppose the process state in other 1, what is the probability that after 
n-steps it will be in state j? Consider a process with only three states al, a2 and a3. 
What is the probability that after two steps the process will be in state j . f o r  j  = 1,2,3 
given that the initial state of the process is i . fo r  i = 1,2,3. P{X2 =  a J X ,  = fl,J =  PU .PU = PPu m  
By assuring that i =  1, we obtain a probability tree for the process as follows: P{X2 = a2\X0 =  a ,) = P12.P2, =  P2, m P»m  
P[X2 = a 3|X, = a,) = P.j .Pj, = P3iw P»m
PllPll + P12P21 + P l3P3lP llP l2 + Pl2P22 +  Pl3P32PllPl3 + P12P23 + P13P33 
P2lPl l +■ P22P21 + P23P3lP2lPl2 +  P22P22 +  P23P32P2lPl3 +  P22P23 +  P23P33 
P3lPll + P32P21 + P33P3lP3lP l2 + P32P22 +  P33P32P3lPl3 + P32P23 + P33P33
Assume tliai i = 2. then
P{-f:; — “ l /  — a2) ~  P21 ■ Pi 2 — P2I
. , . , (2)
/  *0 -  a 2) ~  P22 P22 ~  P22
/■'{a'x = a< /  = a2} = P23 P32 = P^J P23
265
264
UNIVERSITY OF IBADAN LIBRARY
P r/3) =  V 4 
Pn(3) = V 6
Assume that i = 3, then
31 In the same veinP { x  2 =  a \ / x 0  =  a 3 )  =  p 3 i . P i 3  =  P
P { X 2 =  a 2 / X 0  =  a 3 )  =  P32-P23 =  P 32
P { x 2 =  a 2 / x 0  =  a 3}  =  P 3 3 . P 33 =  V r s
It could be seen that p (n  ̂ = p"
Example 15.4
Use a probability tree to fmdp(3) in example 16.3
P{x3 = a 2 /  x0 = a2} = X/ 2 l f 2 . V 2 =  V s
P{*3 = as / xn =  a2} = V 2  • V 2  • V2 =  V s  
P{^3 = a l /  xo = a 2) = V 2 ' V 2 • V 3 = V l2
=  aj) = 1. V 2. V  = V P{x3 =  a 3 / x 0 =  a2} =  V 2 • V 2 ■ 2/3 =  * 4  {x3 a 2 / x a = 2 4 P{x3 = a 2 / x 0 = a2} = 1/ 2 . 1/ 3 l  = V 6 
^ { * 3 =   aa ,j //X o  ==   aa ,j)}   ==   11 -. V1 / 2   • aV/ 2   ==  VV4  P{*3 = az /  *0 = ai) = V 2  • V2 • V2 = V 6 PP[x3  ==  x„  a,} = 2 3 = V 6 P{^3 = /  *0 = Qz) = V 2 • 7  3 • V 3 = V 9{*3 a2/ x 0 = 1-a/ 2 2/3 3 p{x3 = a3/x0 = a2) = V2 • 2/ 3 • 2/3 = 2/g
Pl3<3)=
267
266
UNIVERSITY OF IBADAN LIBRARY
^ ] l !' — V 12 + V69 -7 /72/346 Therefore.- % + v  = 
r-(3)
^23" '  = V i 3 + V6 + %  = 3?/  72 Pu Pu pS [V s  v 4 7/ l 2
,j(3) ...
P21 P22 P23 = 36 7h.A 37/ 72
*3^ P S P23 2S/S4
Note that = p (” l5p = p (0)p"
at n = 1 , pll) = p “»p
at 11 = 2, p(2) =  pCilp =  pC0)p 2 
at n = 3, p(:̂ = pt'z)p = pWp*
This implies that
_ V  „(»- ’ ’
Pij ~ / _ P I k Pkj
k
Definition
Let {9fn , n = 0,1, 2 ....} denote a square of real valued variable index by n. The value 
of x  for given n is the state of the process at the n  th step.
=  a , / x0 =  a,} =  V 3 • 1. V 2 =  V 6 P{xn = j  /  *„_•! =  i} is a one-step transition probability matrix. The index n denote 
l>{xi =  a3 / x„ =  a3) =  V 3 • 1. V 2 =  V 6 something close to time and therefore depend on xn. x, xn_2 *0 anc* not on
P{x3 = a 2 /  *„ = a ,j  = 2/ 3 V 3 • 1 = 2/ g -V7l + l'*71 + 2>
The Markov assumption is that
' ( * 3  =  « . / ^  = «3} = 2/ 3.2/ 3 1/3 =  4/ 27 P^xv ~ in /xv- l — jn-l> xn-2 ~ jn-2>—->x 0 = jo) = P{xn = j r / xn- 1  = /n + l<) 
^  * 3  = * 3  / Vo =• a3} 2/ 3.2/3.2/3 =  fl/2? The conditional distribution of xn given the whole past history of the process must 
equal to the conditional distribution of xn given x ,,^  .
*1* « 4/27
=• V 6 + %  =  ?/18 Example 15.5: (A Communication System)
Pn,nJ , = l //6  ++ 8//2 7 _-  2 5 //54 Consider a communications system which transmits in digits 0 and I. Each digit 
transmitted must pass through several stages, at each of which more is a probability p
2<»K 269
UNIVERSITY OF IBADAN LIBRA
a R
-J Y
^vj 
Ci-Oi "
. In order words, we can-say. that the.proccss is in 
that the digit entered will be unchanged when it leaves. Letting d'ndenote the digit 
State 0 if it rained both today and yesterday;
entering the nth stage, then {Af„,n = 0,l...} is a two-state Markov chain having a State 1 if it rained today but not yesterday;
transition probability matrix. State 2 if it rained yesterday but not today;
P 1 - P  (P l - P }  State 3 if it did not rain either yesterday or today.P = P =
1 - P P 1 - P  P
The preceding would then represent a fair-state Markov chain having a transition on 
probability matrix.
Example 15.6 0.7 0 0.3 0
On any given day Gary is either cheerful (C), so-so (5), or glum (G). If he is cheerful 0.5 0 0.5 0
today, then he will be C, S, or G tomorrow with respective probabilities 0.5, 0.4, 0.1. 0 0.4 0 0.6
If he is feeling so-so today, then he will be C, S, or G tomorrow with probabilities 0.3, 0 0.2 0 0.8
0.4, 0.3. If he is glum today, then he will be C, S, G tomorrow with probabilities 0.2, You should carefully check the matrix P, and make sure you understand how it was 
0.3. 05. obtained.
Letting Xn denote Gary's mood on the nth day. then {Xn, n > 0} is a three states
Markov chain (State 0 = C. state 1 = 5., State 2 = G) with transition probability 15.7 Stationarity Assumption
matrix. A Markov chain is stationary if for m =£ n
0.5 0.4
P[xn = j n / xn-\ = jn - 1>} = P{xm = jm /xm-l ~ jm- 3'}
0.1
0.3 0.4 0.3 or simply.
0.2 P{xn = j / x r - 1 = i.) = P {X m = j/xm- 1 = 00.3 0.5
In this case the one-step transition probability does not depend on the step number. It 
Example 15.7: (Transforming a process into a Markov chain) is therefore sufficient
Suppose that whether or not it rains today depends on previous weather conditions For us to state only the one-step transition probabilities.
through the last two days. Specifically, suppose that if it has rained for the past two We therefore set n = 1 and obtain
days. Even it will rain tomorrow with probability 0.7. if it rained today but. not 
yesterday, then it will rain tomorrow with probability 0.5; if it rained yesterday but pjy° = ^ {* 1 = j / x o  =  0
not today, then it will rain tomorrow with probability 0.4; if it has not rained in the P,("J = P{.xn =  j/x o =
past two days, then it will rain tomorrow with probability 0.2 .
II wo lei lhi.> state at time n depend only on whether or not it is raining at time n. then For, n =  0, this leads to 
>he preceding mode! is not a Maikov chain (why not?). However, we can transform p!j) = 1 for j  =  i
ilus model into a Markov chain by saying that the state at any time is determined by 
the weather conditions during both that day and the previous day.
271
UNIVERSITY OF IBADAN LIBRARY
P i j  is the probability of the first event, and that of the second is
P i( i ' =  0 for J f t
Pi"1 = P{xn = j / x o = 0 by definition 
T,kPikakj
= Y,kP{xn =  j , x n.  i  = k / x 0 = i} marginal from joint
= k ,x0 = i}P{xn.  j = /c/x0 = i) Consider a process with the following three states; a1( a2 a3l where afis an absorbing 
state, and others are transient.
^ P A n  =7 An -1 = W A n - l = kAo = 0
k ,
— P22a2l
= Yk j v* ~ ' )pKi
15.8 Absorbing Markov Chain
A stale in a Markov chain is absorbing if it is impossible to move out of that state. 
That is, the process stays there. A Markov chain is absorbing if it can’t least one 
absorbing state. That is,
Pjj = 1.0 — P 23a 3 l
A state in a Markov chain is transient or non-absorbing if it is possible to get out of 
that stale. That is
Pjj =£ 1.0 for  state j.
15.8.1 Probability of a Markov Process ending in a Given Absorbing State
This depend on the given in that state. Let atj denote the probability that an absorbing 
chain will be absorbed in state if it states in the non-absorbing state a,.
Method 1 Then
There arc two possibilities, either the first transition is to state ay (in which case the a U =  p i x < =  a/Ao = )
chain is immediately absorbed) or the first transition is to some transient or non­
absorbing state ak ,k  *  j, and then the process immediately enters states a, f r o m a k. 
These arc two mutual exclusive events.
272 273
UNIVERSITY OF IBADAN LIBRARY
Bv substitution
As an example consider the following transition matrix lor absorbing Markov chain 
a2t ~  P21 +  P a i + P 3O31 with four states. Note that an absorbing state is indicated by probability l2 2 2 2
v 4 v 4 V 2 0
p = V 3 v 3 0 v 3
ciij is a one-linear equation in several unknowns. Construct a corresponding linear 0 0 1 0
equation by using each o f  the other transit state as initial state. 0 0 0 1
In the given example, a2l is a linear equation in two unknowns. Note that Ptj is 
Note that the absorbing state are a3 and a4.
obtained from the given one step transition matrix. The onlyunknown are akJ-, all k =£ Suppose that we want a13, that is the probability starting from a. will get absorbed in 
j slate a3 . In other word, we want the probability that the chain will enter a3 from a1. 
The corresponding a21is given by l
Then aiywill give us
a33 = Pl.3 + P n a13 + Pl2a23 
a23 = P23 + P23a13 + P22a23
Substitute forpiy. noting that akj is unknown. 
~ P?.2a21
ars = V 2 + V 4 a i3 + V 4 fl23
a23 — 0 + 1/g «13 + V 3 a23
~  P33°31 Solving the simultaneous equation, we obtain 
« i3 =  4/ 5 and a23 = 2/ 5
The matrix becomes
| fl13 a l4|
I °23
Alternatively.
l hen Naive all values of’equation ii for all ^ j  t sue ii as a2 1 and u?1) simullnneouslv
! :>> rr Vi, +  2.H Piltakj
274
27?
UNIVERSITY OF IBADAN LIBRARY
Wo can write this matrix form. Let A denotes the matrix of aiy R denotes the matrix
oI'Pij. Q denotes «3 fl2
The matrix o f pik. That is
A =  (af/) =  ia kj} -s x r
r
R = (Pa)  s x r 1 0 0
0  =  i P i k )   S X 5
Then aijcan be written as
0 1 0
A = R + QA fl4
Where
r = number o f  absorbing states v 2 0 V *
s = number o f  transit states
Step I: Arrange the rows and columns of the one-step transition matrix in which a 0 v 3 V s
way that the absorbing states appear first in the rows and first in the columns.
V
Step 2: partition the new one-step transition matrix as follows
r
-  r ~ \ Step 4: Find I-Q and hence ( / - ( ? )  1
absorbing states 0
transient states 5 ■< R 3/ 4 " 'A( /-< ? )  =
J - v 3 2/3
hrx r )>  0 ( rx s )>  N(sxr)> Q (sxs ) u -  «?i =  (3A ) (2/ 3) -  (V 4) (V 3) =  ( 5/ i 2)  
Step 3: Solve for A. the matrix of the unknown, as follow
[2/ 3 V 3
(/ -Q )A  = R C0f(/- c ) = k  3/ J
A = ( / -< ? ) - '/* cofT(/ -  Q) -  Adj(l -  (?) = 2/3 v 3
Since (l-Q) is  non-singular and so has an inverse. (/ -  (?)_1 is known as the V3 3A
fundamental matrix. , Ad](I -  Q) 2/ 3 v 4
l or example, the above 4 x 4  matrix incan be rearranged as follows: since V - Q r  = Je t( , _  Q) 12AV3 V4
277
UNIVERSITY OF IBADAN LIBRARY
r8/ s 3/ s i
i 4/s 9/5j
Therefore /! =  ( / -  Q)_1R =
L4/s  9/ 5J °  V 3.
%  V 5
7s 3/sJ
15.8.2 The Expected Number of Times a Markov Process will be in each Possible 
Starting Transient (Absorbing) State
Lei N = (liij) where Uij is the number of times the chain is in transient state a;- 
given the initial state is at.
Lot n,7 denote the mean number of time that the chain is in transient state a, .
Let N denote the matrix of n (y, which is a square matrix since i and j  range over the 
transient stateds. Consider the state at time 1. That is, the first time interval is spent in state a, (a( is 
Consider a chain with the three states in (a), a a, a2, a3 where aj is the absorbing transient state). If i =£ j  and the transition probability pik given the probability that the 
state. Assume that the initial state process is a2. process will be in aK from at . Then 
nij = T.kPiknkj
nii = Piknki = 1 
=  dii +  'LkPiknki
Which is combined into
n i i  =  d U  + Y j  P ikU ki ' =  l> f° r ‘ “  j
l<
= 0, for i j
279
278
UNIVERSITY OF IBADAN LIBRARY
i----
o
r—i
LD
co
L̂T)
<»
In matrix form this can be written as
Recall
djj = Vij + I,kPikakj
d 1 d2 • dn
dx d2 . dn
r " N
d, 1 0 0 r ^ \
dx i 0 0d2 0 1 0
d2 0 1 0
dn 0 0 1
dn 0 0 1
J J
15.8.3 The Length of Time (Expected Number of Steps) Required before 
Absorbtion Occurs
For any given initial transient state a, the expected number of step required before {p*} = Q
absorption is given by the elements of the rector M  = N
N = 1 + QN
t = Z ” "
i n  = (/ -  o r 1 1 =  (/ -  <?
\)'- i
Thus the element of the fundamental matrix give the expected number of times the 
Let c be a column rector with the same number of elements as the columns of N and 
every element of C is unity. process will spend in given transient states for any given initial transient state.
Then 8/s  3/s = rl.6 0.61
n = (/ -  Q r l =
t  = NC a2 4 / c  9 /c l0.8 1 .8J
Interpretation:
T Starting in state au  the expected number of times in state al before absorption occurs 
where c = 1 is 1.6. Similarly, starting from aa the expected number of times in state a2 before 
1 .
By using the above fundamental matrix we find absorption is 1 .8.
The expected number of transition before absorption is 
aij™ij =  Pijmij +  Sfc Pikakj mij 
summing over/
15.8.4 The Number of Transitions that will occur before a particular Absorbing 
State is reached M = Y j a <im ijk
Letm,y. denote the number of transition that will occur before a particular absorbing Multiply both side of bymiy- 
state j  is entered given the initial state t.
aijmij = Pijmij +  ^  Pikakj mij
k
2 8 0
2 8 1
UNIVERSITY OF IBADAN LIBRARY
a ij T,k P i k a k j  m k j The transition matrix P of a Markov chain is=   +  
Note that mi; = 1 and m y =£ 1
1 2 3 4 5
In the four state process or chain given earlier, suppose we want to compute m13 0
Thus, 0-ijm.ij becomes. 1 1/2 1/4 1/4
2 1/2 0 1/4 1/4 0
a 13m 13 =  a l3  +  P l l a 137 n 13 +  P l 2 a 23m 23 3 0 0 0 1 0
Constructing another equation by using the other initial transient slate,
4 0 0 2/5 1/10 1/2
a 23m 23 =  a 23 +  P 2 1 a l 3 77713 +  P 2 2 a 23m 23 0 0 1/2 0 1/2
this is because k = 1,2 and j  = 3 5
Substitute for the known values, that is thea’s andp’s. V J
(The University of Sydney, 2009)
Note: The p ’s are from the given one-step transition matrix and a's  are earlier 
solutions. Practice Questions
Therefore 1. Define a generating function. A(x).
2. Define the following-
VV 5  mm 2i33  ==   V 5  ++   VV 34 ' V/ 55  7mTl1t33  ++   VV j* -•  VV 5  mm 2233 (a) moment generating function5 4 3 5 (b) characteristic function
(c) cumulant generating function
This result in 3. Given two random variables X and Y. with probabilities P(x = i) = a,- and 
6»l 13 _  ^23 = 8 P{y — j) = hj. Write an expression for their convolution ck where
(i = 0,1,,..., r) and
4m! 3 4-77123 = 6
0  = 0.1..... k )
4. Given the sum of random variables SN, show that the probability generating 
Thus,77T-J3 =  1.9 andm23 = 3.4 
function H(x) is given as G(F(x)}.
So that we have 5. On any given day Bruce is either cheerful (C), or so-so (5), or glum (G). If 
he is cheerful today, then he will be C or S tomorrow with respective 
a 3 probabilities 0.5, 0.4. If he is feeling so-so today, he will be C or 5 tomorrow 
O] r  1 0 ^ "1.9 " with probabilities 0.3, 0.4. If he is glum today, he will be 5 or G tomorrow 
with probabilities 0.3, 0.5.
a2 0 1 3.4
J (a) Write down the transition matrix P which describes Bruce’s mood oscillations 
over time.
282 283
UNIVERSITY OF IBADAN LIBRARY
(b) Bruce is currently in a cheerful mood. What is the probability that he is not in C H A P T E R  16
a glum mood on any of the following two days? ST E A D Y  S T A T E  AND PA SSA G E T IM E  P R O B A B IL IT IE S
(c) Obtain Pn,n  — 3,4,5. (The University o f  Sydney, 2011)
6. Suppose that whether or not it rains today depends on weather conditions of 16.1 Introduction
the last three days. If it has rained in the past three days then it will rain today This chapter introduces the student to the process of determining the equilibrium state 
with probability 0.8; if it did not rain for any of the past three days, then it will of a Markov chain. That is, after a long process, the probability of a process being in 
rain today with probability 0.2; and in any other case the weather today will, a steady situation.
with probability 0.6, be the same as the weather yesterday. Denote the states 
Consider the formula for the rector, P(7l) ,of state probabilities for the time n given by
by triples of the kind RRR, RRF, etc., and write down the transition matrix P 
p ( n )  _  p ( 0 ) p ( n \
of this Markov chain. Obtain Pn,n  = 3,4,5. (The University of
Sydney. 20/1) Where P(0) is a vector of initial slate probabilities and P(7l) is the n-step transition 
7. Define a Markov Chain and state its assumptions. matrix. The interest is to find out what happen to a Markov chain with Pas n becomes 
large.
8. (i) State the stationarity assumption, (ii) Show that p •" * =  £/< p • We will approach this problem by considering the following example.
9. (a) What is the absorbing state?
(b) Assuming the process starts from states {1,2), what is Example 16.1
(i ) the probability that a Markov process will end up in the given absorbing state. An Engineering company has three departments. Engineering (a-j), production (a2\  
(ii) the expected number of times a Markov process will be in each transient state and sales (as).A man in the Engineering dept, cannot be assigned to sales but may be 
lor each possible transient state. transferred to production or Engineering with equal probability. A man in the 
(iii) the length of time that it will take for a Markov process to be absorbed. That production dept, cannot be transferred to engineering but can be transferred to 
is the number of steps required to reach an absorbing state for the first time. production or sales with equal probability.A man in sale can be transferred to 
(iv) the number of transition that will occur before the absorbing state is reached. Engineering or production and his probability of going to Engineering is 3-times that 
of going to production.
The associated P is as follows:
0.5 0.5 0
p = 0 0.5 0.5
.0.75 0.25 0 .
The following transition probabilities can be deduced from P
284 285
UNIVERSITY OF IBADAN LIBRARY
0.250 0.500 0.250 Marginal Distribution of P{n)n
p (2 )  _  p2 0.375 0.375 0.250
0.375 0.500 0.125
0.3125 0.4375 0.2500'
p (3 ) =  p  3 0.3750 0.4375 0.1875
.0.28125 0.46875 0.2500.
0.34375 0.4375 0.21875 ' 
p ( 4) =  p 4  _ 0.328125 0.453125 0.21875 
0.328125 0.4375 0.234375.
0.3359375 0.4453125 0.21875 
p (S ) _  p 5 0.328125 0.4453125 0.2265625 
0.33984375 0.44140625 0.21875 .
Marginal Distribution of P(n)21
We can go on and on, until the n-step is reached. It should be noted that as the steps 
0.5
increase, the 0.375 0.375 0.328125 0.328125 0.328125
0.4 -----------  --------- - -------« s
Probabilities tend to be steady. This can readily be seen in the following graph.
0.3
p<nlZ1 0.2
16.2 Graph of Marginal Distribution of P ^ 0.1 0
For fixed fair (t,/) we can draw the graph of for various values of n as follows. 0 i " V  3 < s 6
Marginal Distribution of P(n)31 Thr function P,n,M
0.6 0.750.8 Summary of Solution to Problem
0.2 1. The function gives the probability of getting to dept. a : , from dept, a,- inpNj! 0.4 step n, n =  0, 1 , 2......
0
1 2 3 4 5 6 2. When n = 0, P ^ ] = 0, since the person will certainly not be transferred.
Thr function P|nl31 3. As n — co, P̂ te n d s  to converge at interpretation of i. similarly, P -^  -»
0.44 or irrespective of i , and P ^  -* 0.22 or 2/9 irrespective o ft.
287
2S6
UNIVERSITY OF IBADAN LIBRARY
•4. All the rows of the matrix P(n) are identical but the columns are not as This represents a dependent set of equation (since each row elements must sum up to 
n —  g o . unity). One of the infinite numbers of solutions can be found to represent a 
5. The state probabilities that satisfy the above criterion are called steady-state probability solution by imposing the condition
probabilities or equilibrium probabilities, or limit value, or stationary 
distribution.
6. A Markov chain is said to approach equilibrium as n tens to infinity if its This is known as a normalizing equation. 
transition probabilities approach limit values. Example 16.1:
From example 15.1 , find lim„-or) P(1XJ 
16.3 Stationary Distribution Solution
The steady-state distribution is define as the set (i/f) where 0.5 0.5 0.0’
Pj — littiji—oo Pij — limT1_ co P[Xn = j) \P\P2P1 1 = [P\PiPl\ 0.0 0.5 0.5
.0.75 0.25 0.0.
and is independent of i. Furthermore,//, > 0.
or
ZjP j P i 0.5 0.0 0.75 P i
Pi = 0.5 0.5 0.25 Pi
0.5 0.0. Pi.
Pk =  I ,  PiPuc Pi. 0.0
A probability distribution which satisfies pk is called invariant or stationary 
=  0.5//! +  0.75//3
distribution (lor a given Markov Chain). In this case row ofP(n) is the probability 
//2 = 0.5//, + 0.5//2 + 0.25//3 
vector// = (/tl t //2, •••)• Hence, given
Pi =  0 .5 //2
pin) _ p ln-l/p Thus
nl i—m on  P(n) = Um P^ -^ P
Pi ~  -SP\ 
71 —• co
P\Pi PlP2 and
PlP2 — Pi Pi \P] P2 = 2a<3
Substituting we have 
4
This can be written as Pi = 3^1
P =  pP IJv imposing the normalizing condition on the sum ut we obtain
or P, + P2 ■+ "  1
p T = Pr pT 4 2
Pi "F -j  « i  + jr/M
I lien:Ibr.:
2X8
2 6 9
UNIVERSITY OF IBADAN LIBRARY
1 D H R
This means therefore that
4 2 D S 2 v 2 0
Pi = and n3 =  -
P = H v 4 v 2 V
Thus n = (jiiPiPz) = (V 3 4/ g 2/ 9) R 0 v 2 V
This gives a sample method of obtainingP(7l) than raising Pto power n. V J
Findlimn_co P(n) and give all possible interpretation of the result.
Interpretation:
can be interpreted as follows: 16.4 First-Passage and First-Return Probabilities
1. Probability of a distant state: if a point in time Is fixed in the distant future , //y We shall approach this topic by way o f asking certain questions.
is the probability that the process will be as state j. Q l : What is the probability that in a process stating from a(. the first entry to a; 
2. As a time average: if the process is operated for a long time, /iy is the fraction occurs at the nthstep?
of time that the process we be at state j. Q2: What is the number of stepsn, required to reach state ay for the first time?
3. As a fraction of process: if many identical processes are operated For Ql. consider the function/?^ which is the probability that the process will enter 
simultaneously, /iy is a fraction of the process that can be found in state j  after slate j  at the nth step given that it is in state i of the initial step. That is,
a long time.
4. Reciprocal of mean number of transition: iij is the reciprocal of the mean P p> = P[Xn-‘ j\Xo = i)
number of transition between recurrence of the state, that is, average (a) In this case, the process would enter state ay, after onlyk, 1 <  k < n — 1 , 
number of transition before a steps.
man inay will come back to a*. (b) After that is called either stay three is ay or change to another state and then 
return to ay. For Q2, the probability/^ that the process will reach state ay 
Example 16.2 for the first time at the nth step given that it stated from a, is called first- 
An individual of unknown genetic character is crossed with a hybrid. The offspring is passage probability and is define as
again crossed with a hybrid, and so on. The states are dominant(D), hybrid (H) and fij * = P{Xn = j-X71—1 ^  J'Xn-2 *  j > —>X\ *  j\Xo — 0
recessive (/?). The transition probabilities are
Definition: First-Passage Probability
This is the probability that the process is in state ay at time nand not before, given that 
it was in state a, at time 0.
290 291
UNIVERSITY OF IBADAN LIBRARY
Tlu> implies that the probability that n steps are required to reach state aj for the first 0.5 0 0
0 0.5 0
time given that the process siartsfroin slate a,. p ! ? * 0 0 0
Clearly
-  0. the process is still at a,- 0.5 0.25 0 0.5 0 O' 0.25 0.25 O'
0 0.5 0.5 0 0.5 0 = 0 0.25 0
/jy11 = Pi,, the one-step transition probability, t =£ j 0.75 0.25 0 . 0 0 0. .0.375 0.125 0.
A Iso. 0.250 0.500 0.250'
P ? JpW = 0.375 0.375 0.250
.0.375 0.500 0.125.
Then, 0.250 0.500 0.250' 0.25 0.25
O'
0.375 0.375 0.250 - 0 0.25 0n
v <n) = V #•(*)„(»-*) 0.375 0.500 0.125. 0.375 0.125 0PU lij Pji
k=1
n- 1 0 0.25 0.25
_  . V *  A k ) (n-k) = 0.375 0.125 0.25
- h i  v ij +  Z hi Pa
k = l 0 0.375 0.125
-  f (n) -I c(k) An-k) F{3) -  Pij _- t c p i r -  Pij >u H)i - h i  I' ll
/  ,>ij P>i 
/<=1 
or.
_ JJ) r<U (2)
/•<») _  _ !» ' v » ~ l  rO f)A n -k i = Pij - J i j  P„ -  Jf iUi )vPiU)'ll Pij “ 2-ik=\lij Pjj
N/B '/j ' " - = joint probability of reaching stale a; in 1 <  Ic < n -  lsteps only, 
16.5 Distribution of Number of Steps for First Passage
given that it started from a,.
(i) For any fixed lair the set n  =  1, 2,... jgives the distribution of the
lij".  can he obtained iteratively if (/>;"’} ore known.
number of steps to get from i to /(the first passage). That is, the number of steps 
required to reach a ;- for the first time.
Kxample 16.2
(ii) The number of steps required to get from i to j  is therefore a random
Consider (lie problem of departmental transfer in chapter 17.
variableA/,,. with
0.5 0.25 0 />{*.) =  "} = / «
r  .. 0 0.5 0 .
0.75 0.25 0
2 9 3
UNIVERSITY OF IBADAN LIBRARY
16.6 First Return (Recurrence) m i l  =  X p t j + ^ ( 1  +  m k j ) p lk 
( i) If; = t, f£ n) gives the probability of the first return to state a f. For example, k * j
the probability that the person transferred from department a £ will return to a, for =  Pi j  +  ^  Pik  +  P i k ™k j
the first time at time n. k * j  k * j
Corresponding to =  ^  Pik  +  ^  Pi k™k j
all k  k * j
fii  ̂ = P[Xn = ^  2 it —,Xi ^tlA'o = i)
=  1 +  Y j VikTrLki
k*J
( iii) The equation relating / J n) to would also be the same.
But since £ kpik = 1, This expressesm^- as a linear function of m kj as the unknowns.
t i n) =  n * i  = zi*0 = 0  =  1 (ii) By using the same relation for other m £y’s a complete set of linear equation 
(ii) Then Nu is a random variable whose value is the recurrence of state a£. (equation to the number of unknowns) can be expressed.
(iii) Since {/it(n)} for fixed i , j  gives the distribution of fyy. the mean first passage (iii) A solution of the linear equation gives the mean first passage time from any 
time from a,- to ay denoted by m,yis given by state into state j .
oo (iv) Mean first recurrence times are obtained in the same way.
m a =  E(Nu )  = Y j n^ /n>
n = ] Example 16.3
Consider the three-department job assignment. How many assignments will occur, in 
(iv) where t =  mu is the mean first recurrence time. the average, before a man who is first assignment to ax (engineering) will be assigned 
to a 3 (sales)? That is, what is m13?
16.6.1 Calculation ofm iy-
( I ) The formula in (6.6) for would required the complete first passage time Solution to Example 16.3
distribution for solution to be obtained. Using the formula for m,y 
( 2) A simplification of the problem is obtained by conditioning the formula for 7 n 13 =  1 +  P n  m 13 +- p 12m 23
m(; on the state at step 1. That is. on one value of i at a time. There are two unknowns. Hence we form a similar equation for m23 as follows.
(3) Given that the process is in state at at time 0, either the next state is a; in m23 = 1 + Pzim l3 +  P22m 23 
which case /V(/ = 1 , or it is in some other state ak afier which it enter state Now. recall that
ar  in which case the passage time will be m kj = 1 + NkJ, the passage time 0.5 0.5 0 '
p = 0 0.5 0.5
from ak to fly. .0.75 0.25 0 .
By substitution we obtain 
(i) Thus m13 = 1 + 0.5m)3 +  0.5m23
2 9 4 2 9 5
UNIVERSITY OF IBADAN LIBRARY
m23 = 1 + 0.5m23
Solving the simultaneous equation, we found that C H A P T E R  19
mi3 = 4- and m 23 = 2 CH APM AN-K O LM O G O R O V EQUATIONS AND 
C LASSIFIC ATIO N OF STATES
Practice Questions
1. Define the term, steady state probability
17.1 Introduction
2. Write an expression for a limiting distribution.
The n,h-step transition probabilities P" is the probability that a process in state / will 
3. Solve completely the problem in example 5.1, and draw all the graphs.
4. Use matrix multiplication and limiting probabilities to solve Problem 5.2. be in state j  after n additional transitions that is,
5. In the post test in lecture four, obtain the stationary probabilities. /7= H * ,,-„ -^ |X „= ;} ,n 20 ,i ,j ;> 0 .
6. Define and write an expression for The Chapman-Kolmogorov equations provide a method for computing these n-step 
(a) First-passage probability. transition probabilities. These equations are:
(b) First-return probability. p;p?kp;' for all n, m > 0, all i, j
7. Using the post test of lecture four, find the mean first passage time from state
5 to state 4 by making state 4 absorbing. (This has nothing to do with states 
and are established by observing that
{1,2}.) (The University o fS ydney, 2009)
8. The transition matrix P of a Markov chain X = (Xn: n  > 0) is fT "  = |x.. = 'l
= Z ^ f r — “ -/ .X. = K |X ,= /}
. 1 2 3 4. 5 • K
r - 1  p \x - .  - J - !x - -  *• x » = ' M x „ = * 1* .  = -I
I •. o 0 0 0: 1 1 K-1'
2 . 0 ■ 0 1/3 1/2 0 - i c c
3 0 0 1 " 0 0 (.=ii
4 .: 0 1/3 0 1/6 1 /2 If wc let P"” denote the matrix of n-step transition probabilities, P”, then it can be 
5 1 /2 0 ■ .0 0 1 / 2 . asserted that
p i n —m) __ p i n I p ir n I
V  ■ ■
(a) Specify the classes of this chain and determine vyhether they are transient, null where the dot represents matrix multiplication.
recurrent or positive recurrent. Hence,
(b) Find all stationary distributions for this chain. p m )  _  p  p i i i - i ) _  p  p  p i " - - i  _  _  p "
(c) Find the mean recurrence time m.jj for all positive recurrent states. and thus P,nl- may be calculated by multiplying the matrix P by itself n times. 
(The University o f  Sydney. 20 JO) pniis said to be Accessible from slate / if for some P  >0. Two states / and j
accessible to each other is said to communicate and wc write / j.
296
297
UNIVERSITY OF IBADAN LIBRARY
Let denotes the one-step transition probabilities, and />! = Pt Solution:
Observe that P"k P" represents the probability that starting in / the process will go to The one-slep transition on probability matrix is given by
'0.7 0.3'|
stated in n + ^transitions through a path which takes it into K at the nth transition. p  =
.0.4 0.6 J
17.1.1 Proof of C -  K Equations 07 fo.i 0.3Hence. Pm = P: °-3)
Using remark (3) above, summing over all intermediate states /(yields the probability 0.4 0.6 1,0.4 0.5
that the process will be in state j  after n + m transitions. We have
r r  = e k «  = j \x ,  =;} 0.61 0.39'
v 0.52 0.48,
T " = L k . . , = ^ x . = * k = ' ' }
f  0.61 0.39'j 0.61 0.39'j
= 5> k,., = V ,K .t .X 0 = /}p {x „ = K\X„ = l} 0.52 0.48 J ,0-52 0.48 J
*«0
p;
*=0 f 0.579 0.4251 \
[ 0.5668 0.4332 J
Matrix ofn-slep transition probabilities: P(nl
Let Plnl denote the matrix of n -step transition probabilities P'j then the C-K Equation Hence, the required probability P*n equal 0.5749.
asserts that
pin-rm) _  p lm ) p(m) Example 17.2
Consider Example 2.4. Given that it rained on Monday and Tuesday, what is the 
By induction
p‘ probability that it will rain on Thursday?p\.»\ _  p i" - \* k )  _  p i i-l _  p n
That is the n -step transition matrix may be obtained by multiplying the matrix P by 
Solution
itself n times.
The two-step transition matrix is given by
Example 17.1
Consider example in which the weather is considered as a two-state Markov chain. If 
a  = 0.7 and p = 0.4, the calculate the probability that it will rain four days from today 
given that it is raining today.
298 299
UNIVERSITY OF IBADAN LIBRARY
Proof: the Is' two parts follow trivially from the definition of communication. To 
0.7 0 0.3 0 '0.7 0 0.3 0 prove (iii) suppose that /<-> /., and j  k then there exists m,- n such that
p{2) = p'- = 0.5 0 0.5 0 0.5 0 0.5 0 P"' > 0, P" > 0. Hence,
0 0.4 0 0.6 0 0.4 0 0.6
0 0.2 0 0.2 0 P',r = t P"' P* - K  p"k > 0
/•=U
Similarly, we may show there exists an S for which Pks. > 0 . Two states that 
'0.49 0.12 0.21 0.1 8n
0.35 communicate are said to be in the same class and by the proposition any two classes 0.20 0.15 0.30
0.20 .arc either disjoint or identical. We say that the Markov chain is Irreducible. If there is 0.12 0.20 0.48
, 0.10 0.16 oniy one class- that is, if all states communicate with each other.0.10 0.64,
Stale is said to have period d i f P ” = 0, whenever n is not divisible by d and d is the 
Since rain on Thursday is equivalent to the process being in either state 0 or state 1 on greatest integer with this property. (If P* = 0, for all n > 0; then define the period of i 
Thursday, the required probability is given by />2„ + = 0.49 + 0.12 = 0.61 to be infinite). A state with period 1 is said to be A periodic. Let d (i) denote the 
period of/, we can show that periodicity is a class property.
17.2 Classification of States
In order to analyze precisely the asymptotic behaviour of the Markov chain process, I7.2.2Recurrcnt (or Persistent) State
we need to introduce some principles of classifying state of a Markov chain. A state f e  S is said to be Recurrent if Pr(7] <oo)= 1 where T; is the number of steps 
Properties to be classified include: Accessible, Communicate, A periodic. Recurrent, it takes for the chain to finally visit /.
Transient, and Irreducible. Definitions of these properties now follow:
17.2.3Transient State
17.2.1 Irreducible Property A state I'e S is said to be transient if Pr(/, <co)< 1 where T  is the number o f stops it 
We say that the Markov chain is irreducible if there is only one class- i.e. if all states lakes for the chain to finally visit i.
communicate with each other.
Example 17.3
Proposition Suppose that the weather on any day depends on the weather condition for the 
Communication is a exultance relation. That is previous two days. To be exact, suppose that if it was sunny today and yesterday, then 
(/') /' <-» /'; it will be sunny tomorrow with probability 0.8; if it was sunny today but cloudy 
(/'/) / / i <-* j, then j <-> i; yesterday, then it will be sunny tomorrow with probability 0.6; if it was cloudy today 
(iii) If  i j, then j <-» i; then i k. but sunny yesterday, then it will be sunny tomorrow with probability 4; if it was 
cloudy for the last two days, then it will be sunny tomorrow with probability 1 . '
301
300
UNIVER ©ooSITY OF IBADAN
LIBRARY
oo->
o
Definitely, the model above is not a Markov chain. However, such a model can be (b) Obtain the transition matrix
transformed into a Markov chain. (c) Find the stationary distribution in terms of p and q where p + q = 1
(a) Transform this into a Markov chain
(b) Obtain the transition probability matrix Solution
(e) Find the stationary distribution of this Markov chain. The state are (2, 0), (1,0), (1, 1), (0, 1). 
The transition matrix is
Solution
To state-* (2,0) (l,0) (l,l) (O.l)
(a) Suppose we say that the state at any time is determined by the weather 
From state 
conditions during both that day and the previous day. We say the process is in: 9 P 0 . 0
State (S. S) if it was sunny both today and yesterday; P = (2,0) 0 0 <7 p
State (S. C) if it was sunny both yesterday but cloudy today; (1 . 0 ) <? P 0 0
Slate (C, S) if it was cloudy yesterday but sunny today; 0.0  0 I 0 0
State (C, C) if it was cloudy both today and yesterday (o.i)
(b) The transition probability matrix is 17.3 Discrete Time Process
Today’s state 1. Consider a series of events E resulting from the repetition of the same 
(S. S) (S, C) (C,S) (C,C) experiment and occurring consecutively. The common examples are telephone Yesterday's staie(S, S) 
.8 .2 0 0 calls, average customers at a service point, chromosomes breakages and (S,C) 
0 0 .4 .6 radiation, and so on(C,S) 
.6 .4 0 0 2. The occurrence are assumed to be of the same kind y.The number n of events (C,C)
0 0 .1 .9 in a given interval t is a random variable.
3. Letz ( t )  denote the total number o f occurrences within an arbitrary time
2. An airline reservation system has two computers only one of which is in interval t.
4 I t lP .U)  = P(*tO  = n).
operation at any given time. A computer may brake down on any given day which 
probability p. there is a single repair facility which takes at least 2 days to restore a Assumptions
computer to normal. The facilities are such that only one computer and a time can be i. Pn(t) depends only on the time interval of duration t, and does not depend on 
dealt with. the initial instant.
ii. The probability that E will occur more than once is the time interval that is 
(a) Fonn a Markov chain by taking as states the pairs (x, y) where x is the number infinitesimally small (that is, negligible).
of machines in operating condition at the end of a day and y is 1 if a day’s labour has iii. The probability that E occur once in the interval dt is proportionaly to that 
been expended on a machine not yet repaired and 0 otherwise. interval and is written
As Xdt.
302 303
UNIVERSITY OF IBA
AN LIBRARY
ii. it is independent of the values of z(t)prior to t0.
Assumptions on z ( t ) Thus if z ( t0) is known, the value z(t) (which is determined by the probability of 
i. the initial z ( t)  is 0. occurrence in the interval z) depends solely on the law of probability that governs the 
ii. z(t)) increases by I when E occurs.
increment nafter t„.
iii. z(t) remains constant when E does not occur.
The random variablez(t) follows a defined poisson process and constitute an example 
of a Markov chain. Jt is defined completely by the probability
(Z7t Y e - O '
Pn(0 = n = 0,1...n!
17.4 The Poisson Process
Under the assumption of the continuity of time we can expand py (At) in Maclaurin 
series
Pi At =  P i(0) +  pi(0)A t +  ^pi'(0)(A t)2 +  -  
= Pi(0) + p i(0 )A t +  o(At)2
But pjCO) =  0
The probability that Ecan occur 0 time or once is
iv. z(t) =  0, 1 , 2,..., n , ... .At random instants ty, t2, ..., t*, ...it jump abruptly from 
0 to 1 , 1 to 2, and 2 to 3, ... . The increment of z(t) at time interval t  is equal to the p{z(At) = 0 orz(At) = 1} = p0At + pyAt 
number n of events that have occurred. Now consider pnAt 
v. If we know the value of z (t0)att0(the initial instant) we can find the value at p0At +  pi At +  ••• = 1
t = t„ + At.
z(t) = z(t0 +  At) which is necessary.
= z (t0) + z(At)
= z(t0) + p0At = 1 -  px At -  p2At -  •••*
The increment z(At) = n is characterized by the properties of the probability of = l - p ,A t
occurrence in the time interval if This is so because of the assumption 1 -  py At -  o(At)2. But our only interest is in 
i. its probability is Pn(At) Pn(t)-
304 305
UNIVERSITY OF IBADAN LIBRARY
Event E can occur precisely n times during the interval t  + At if the following This equation does not hold for n =  0. We can use the forward chapman-kolmogorov 
mutually exclusive events are time. equation.
i. E occurs n times in the interval t, 0 times in the interval At. p0( t+  At) =  Po(t)p0( At)
ii. E occurs n -  1 times in interval t, once in the interval At. =  p0(t)[l-A A t]
iii. E occurs n -  2 times in the interval £, twice in the interval At.
And so on. p0( t +  A t)-p o ( t)  n ^
---------- Tt---------- —
These lead to the following:
pn(t + At) =  pn(t)p 0At +  pn_1(t)PiAt +  pn_2(t)p2At +  -  In the limit asAt -» 0
Pi At =  1 — p[ (0)At — o(At)2 Po(0 = -^Po(t)
sincepi(O) =  0
Letp] (0) =  A, then At the beginning of the interval t, we have 
Po At = 1 -  AAt -  o(At)2
po(0) = 1 andp^CO) = 0 , 7i *  0 
PiAt can be written as 
P i  At = AAt +  0(At)2 Divide the limit result by p0(t), we have 
So that £o(£) _
pn(t +  At) =  pn(t) [ l  -  AAt] +  Prj-i(£)AAt +  o(At)2 Po(0 Po(0PO
=  Pn(0  -  P„(OAAt +  p„_i(t)AAt +  o(At)2 
Pn(t d + At) -  pn(t) =  [pn_t ( 0  -  p„(t)]AAt + o(At)2 or— l° 9ePo^  = _A
Divide both side by At
p„(t + A t ) - p n(Q J^ lO g e P o ( t)  = ~ x j  dt
At =  ^ b n - l ( 0 - P n ( 0 ]  + 0(At) 2 
As At -> 0 logep0(O= - X t  
^ P n ( t)  =  pn'( t )  = APn-l(0  “ Apn(t), 71 = 0, 1,2,...
PoCO =
3 0 6 307
UNIVERSITY OF IBADAN LIBRARY
Ane -Xt An t j e~Xttr
^p„(t) can still be written as N o w p M  = T 7 T  = 1 F ^ W ’ i ~ 0,1,2'
0pn(O = *P n-l(0  “  Apn(0  » n > 0 so that
Ae ' Xl \ t Je~Xtt r _
Atn = 1 P , ( 0 ~Z) + A _  ( r + ; ) ! ! !
0Pi(O = *Po(O -A pi(0 
putr =  1 , j  =  0 
At n = 2 At ° e -xtt1
Pi ( 0  = =  Xte~Xt =  (At)e_Xt
Dp2(t) = Apj(t) -  Ap2(t) (1 + 0) ! 1 !
Atn = 3 Then
Z)p3(t) =  Ap2(t) -A p 3(t) (D + A )p 2( t ) =  X2te~xt
Dpx(t) can be rewritten as A2 te-Xc _  X2t j e~xit r 
p2( 0  =
(D +A)p1(t) =  Ap0(t) D + A “  (r + / ) ! ! !
So that we have put r =  1 , j  =  1 
2^-Xt
(D + A)Pl(t) =  Xe-Xt X2t2e~xi (*Q 'e
P2( 0  = 2 ! 2 !
Divide through by (£) +  A)
Consequently
Xe~xt Xe~Xct r ^  _ X3te~xt X3t j e~Xct r
Pi (0  = D + A ( r +  1 ) ! 1 ! = D + A = (r + » ! ! !
This is a general solution. put r = 1 , j  =  2
^3j3g-At _ O T _ f ‘
p3( 0  =
Xe~Xlt r \ e~ Xtt r 3!
3!
may also be written as
Noticetliat c T T i j r n r! 1 !
309
308
UNIVERSITY OF IBADAN LIBRARY
In general, Suppose that a continuous-lime Markov chain enters state / at some time, say time 0, 
C O  = (At)ne~XtPn ,n  = 0, 1 , 2, and suppose that the process does not leave state / (that is, a transition does not occur) n\ during the next s time units. What is the probability that the process will not have 
state / during the following t time units?
If we fix t, At is a fixed parameter for the distribution and the set P i(t),p2(t).... then To answer this, note that as the process is in state /' at time s, it follows, by the 
gives a probability distribution of the process at the fixed time interval which is a Markovian property, that the probability it remains in that state during the interval 
Poisson distribution. In terms of a counts of events the above results shows that the [s,s + t] is just the (unconditional) probability that it stays in state i for at least t time 
member of events occurring in a fixed time interval t is distributed as a Poisson with units. That is, if we test t; denote the amount of time that the process stays in state i 
parameter At. before making a transition into a different state, then 
Also since the mean of the Poisson distribution is equal to the parameter At, At can be 
P = {ri > 5’ + r|7j > 5 }= /3{7; > t)
interpreted as the expected number of events that can occur in time t. the quantityA is 
the average or mean rate of occurrence of E.
for all s ,t  > 0. Hence, the random variable 7) is memoryless and must thus be 
exponentially distributed.
17.5 Continuous Time Process The above gives us a way of construction a continuous-time Markov chain, namely, it 
A continuous-time Markov chain is a stochastic process having the Markovian 
is a stochastic process having the properties that each time it enters state /:
property that the conditional distribution of the future state at timet +  s, given the 
(i) the amount of time it spends in that state before making a transition into a 
present state at t and all past states depends only on the present state and is 
different state is exponentially distributed with rate say vL \ and
independent of the past. Thus, this lecture establishes the fact that a continuous time 
(ii) when the process leaves state /'. it will next enter state j  with same 
process is also distributed as an exponential probability
probability, call it p,-y, where * i  pfj = 1.
17.5.1 Definition and Properties
Consider a continuous-time stochastic process [x{l), t > o} taking on values in the set A state i lor which u,- = 00 is called an instantaneous state since when entered it is 
of non-negative integers. In analogy with the definition of a discrete-time Markov instantaneously left. Whereas such states are theoretically possible, we shall; assume 
chain, given earlier, we say that the process [x{l), t > o} is a continuous-time Markov throughout that 0 < v>. < co for all /. (If v, = 0, then state /' is called absorbing since 
chain if for all s , t  > 0 and non-negative integers i , j , X 0 <  u < s, once entered it is never life).
Hence, lor our purposes or continuous-time Markov chain is a stochastic process that 
moves from state to slate in accordance with a (discrete-time) Markov chain, but is 
such that the amount of time it spends in each state, before proceeding to the next 
If, in addition P^\X(,t i )= j |.Y(j) = l} is independent of s, then the continuous-time stale is exponentially distributed. In addition, the amount of time one process spends 
in slate t and the next state visited, must be independent random variables. For if the 
Markov chain is said to have stationary or homogeneous transition probabilities. All 
next slate visited were dependent on 7). then information as to how long the process
Markov chains we consider will be assumed to have stationary transition probabilities.
310 311
UNIVERSITY OF IBADAN LIBRARY
has already been in state i would be relevant to the prediction of the next state-and 
this would contradict the Markovian assumption. . A continuous time stochastic process is said to have Markov property and is called a 
A continuous-time Markov chain is said to be regular if, with probability 1, the continuous time Markov process if for all t* > >  -- .ty  > t0 satisfying the
number of transitions in any finite length of time is finite. An example of a non- condition tn > tn > t0.
regular Markov chain is the one having.
Pi, i+  1 =  1, Vi = i2 1. P(X(tn) = jn/X (tn- i )  = jn-l>X(tn- 2) ~  jn-Z' •••••»^(^o) = jo)
It can be shown that this Markov chain-which always goes from state i to i +  1, = P( X(tn) = jn/X ( tn_i) =  j n- i)
spending an exponentially distributed amount of time with mean ' /2 in state i -  will, 
with positive probability, make an infinite number of transitions in any time interval This is the independent probability and it state that all that is needed to predict the 
of length t ,t  > 0. We shall assume from now on that all Markov chains considered state of the process at time n is the state of the process at the immediately preceding 
are regular. time.
Let qtj be defined by 2. A Markov process is said to be time-homogeneous or stationary if
Qij = VtPij, V i *  j
P{X(t2) = y /^ ( t1) = i) = P iX iti -h )  = j /X ( 0) =  i)V i and j, tj < t 2
Since v, is the rate at which the process leaves state i and p,y is the probability that it 
In words, the process is stationary or time homogeneous if the conditional probability 
then goes to j , it follows that qtj is the rate when in state i that the process makes a in (2) depends only on the time interval between the events considered, rather on the 
transition into state j \  and in fact we call qtj  the transition rate from / to j. absolute time. Note that ‘time-homogeneous’ and ‘stationary’ denote sameness in 
Let us denote by Pij(t) the probability that a Markov chain, presently in state i, will be lime. We can also know that a stationary Markov process is defined completely by the 
in state j  after an additional time t transitional probability function which we defined as
P j M = = M m  = '}
PijCO = p M O  = ;'/* ( 0) =  i }
17.6 The Exponential Process The fundamental equation for stationary Markov process is Chapman-Kolmogorov 
Let us consider a finite state but continuous time process. Let X (t) denote a random equation for p,y(t + r). By definition,
variable. The value of X (t) at fixed t is the state of the process at time t.
A time dependent process is the set (/(t)fo r given t > 0. X fo)depends on ^  > t0, fjtl(l + r) = P[XU + t) = j/X (0 )  =  0  
and not on t2* ><£2- The process is continuos if t can take value on the t-axis. = ^  P[X(t + r) = j ,X ( t)  = k /X {0) =  0  Marginal from joint
Definition k
Using Markov assumption
-  ^  P{X(l + t) = j /X { t)  = l(,X(0) = i) P{X(t) = k,X (0) = 0
k
312
313
UNIVERSITY OF IBADAN LIBRARY
But P{X{t) =  k ,X(  0) = /} = P{X(t) = k / X (0 )  = flP fvff)) = i}
Under the assumption that py(t) is a continuous function oft, we can express 
Therefore, Py(At)by the use ofMaclaurin series.
Pij(t + r) = P{X(t + t )  = j /X( t )  = k,X(  0) =  0  W O  =  k / X  (0) =  i}
k P u m  = p u ( o ) + + i p ^ ' t o x ^ o 3 + -
This is because PfA'(0) = t} = 1 
Thus =  PiyCO) +  Pi'yC0)^t +  OĈ Jt)2 
^  P « t  +  r) =  j /X ( t )  =  =  k /X  (0) =  /)
k LetpijiP)  =  Ay
By the stationary assumption in (2) Py(At) =  Pi/ (o) +  Ay At -I- o(At)2 for i *  j  
Plj(t +  r) =  £  P{X(t) =  y/X(0) = 4} P{X(t) = k/XQS) =  £} Py(At) =  Ay A t +  o(At)2 for i =  ;
k
= Pkj(j)Ptktf) (By definition) •
k Also, let p'y(0) =  Xjj
This is the general form of Chapman-Kolmogorov equation. 
Py(At) =  1 +  Py y At + o(At)2
A specified form of this is:
= 1 + Xjj At + o(At)2
P i j ( t  +  A t )  =  Y  P i k ( t ) p k j ( A t )  
k
The above is forward Chapman-Kolmogrov equation. Sincep-y(O) = 0 /o r  i *  j  is a minimum, Ay is positive. Also, since Py(0) 
For i = j  is a maximum, Ay is non-positive.
The forward Chapman-Kolmogrov equation is given as We can unite the forward Chapman-Kolmogorov.
Pij(At + t) =  ]T p ijt(d t)p k;(t) Py(t + At) = ^ P i k  (t)pky(4t) 
k k
= Py (t)pyy(At) +  ^  p ifc (t)pky(At) 
We expect the following to hold k * j
i )  0  <  P i j ( t )  <  1 f o r  a l l  t
ii) Pij{ 0) = W (  0) = j/X {  0) =  t) = l  , i = j
= 0 , i * j
And for any given i
iii) PikCO = I ,  W O  =  j / X (0) = 0  =  1
314 315
UNIVERSITY OF IBADAN LIBRARY
Winch wo can write as
dpjj(t)
P ,-/(t +  A t) =  PiyCOfl +  V c +  uC4t)2] +  ^  pik ( 0 [ 4 ; 4 t  +  0(At)2] dt
k * j i
= Pij(t) + Vij^jjAt + py (t)0 (d t)2 + Y j [ P i M l kjA t +  Pik(t)OOdt)2]
k*i £ p « (0 )  = °
Pij(.t +  At) -  Pij(t) i
At
Z f t = °
v -1 j
Pij(t)Xjj +  2 ^  Pik ( 0 Xkj
k*j X A«v =  xn  + 2 a ‘7 =  0
y y**
+ M £C ft) I t « jP a (t)O(^Q2
4 t d t
Thus since every of A (diagonal element) is non-negative, the diagonal element Ay 
dt must be equal in magnitude and opposite in signal to the sum of the other element in k
the same row. Ay is called the transition rate from i to j  for iit j .  Ay can be 
'I'he limit as d t -» 0
interpreted as the parameter of negative exponential distribution. For each Ay, the 
dPi/(0 V exponential distribution gives the distribution of time spent in a state i, given that j  is 
~ ~ d t~  = Z j Pik the next step. Thus if Ty is the random variable with Ayk
In matrix form.
(a= diagonal element) i ’(Ty) =  f'HJ
d m dp a it) So that Ay can be estimated as the inverse of a sample mean.
dt dt A= f t ; )
/ (* )  = <* c.-te
P M  = (pyM ) or f ( t )  = Ay-c " V .  x >  0 
B u t ,^ p „ ( t )  =  1 with mean
CO CO
d E(Ty)= J  t / ( ( ) d t  =  J  a ljer x‘l‘ dt  
dt ^  P./(0 = 0 0 0
317
UNIVERSITY OF IBADAN LIBRARY
C H A P T E R  20
= j - [ IN T R O D U C T IO N  T O  T H E  T H E O R Y  O F  G A M E S  AND O U E U IN G
M O D E L S
=-j—sinccr(2) = 1
A ij  A ij 1 Games Tkeary
(mines theory is a branch of Stochastic Processes that can be applied to a situation 
Suppose thill we have the likelihood such as business, stock trading, politics, and so on. where the person involved can be 
n
referred to as a player ox simply a gambler
«=1
18.2 Gambler’s Ruin
Consider a gambler who plays a game of chance against an adversary. Suppose that at 
-n lo g X tj- 'Y ^ X ij t the start of the game, the gambler deposits an amount in nairaZ. The adversary deposit 
dlogL _  n s r 1
0 N ■ /. in naira where N is the cumulated initial capital.
a i < 1. 'The role of the game is that if the gambler wins a game he takes N1 from 
-  = - =  t the adversary and loses same to the adversary otherwise.
2. The game terminates. If dither player loses all his deposits. When the 
gambler loses all his deposit, he is said to be ruined:
This implies that, 3. No game is jumped.
We cun pul the money on a number scale.
C  = Vf
Practice Questions
I. Considered a two-state process such as the operation of a loom for weaving I 2 .......... Z ............................................N
cloth. The two-state for the looms are 0, the loom is shut off and the operator is 
repairing it. And 1, the loom is operating and the operator is idle. Consider the 1 lie uaiii in loss is represented by movement along the scale. Gambler’s gain is 
operating and repair time as continuous. Assume that the constant proportionality is 3 represented by movement to the right observed and its loss represented by movement 
lor repair transition and 2 for breakdown transition. Find the probability distribution m the lelty observed.
of the repair and the operation time. No point .ii. tin. scale is jumped. Movement in either direction on the scale is by pure 
Obtain the general form of the Chapman-Kolmogorov (C — K) equation. chance. The movement along the scale can be seen as that of a particle that moves at
3. Show that Aiy transition from i to j ,  V i & j .  is 1/p
319
318
UNIVERSITY OF IBADAN LIBRARY
random forward and backward. Because of that the process is. known as random 
tvalk. The points on the scale represent the state of the process. Since P{R\Zj] — qZj, P[Zk) =  p and P{Z,} = q
Movement- to a point on the scale depends on the point the gambler (or adversary) is 
at currently. It is therefore a Markov Chain. Generally (10.1) can be written as
q-/, = VQxk + qq/(. 1<Z<N-1
We shall approach this problem by attempting the following questions:
Q1 - What is the probability that a gambler with the initial capital Z will be ruined? Systems like these are known as difference equation. We can write the unit factor on 
Q2 What is the expected gain of the gambler? the left asp + q = 1. Thai is.
Q3 - What is the expected duration of the game?
(p + q)qz2 = pq%3 +
18.2.1 Probability of Gambler’s Ruin pq-/2 + qq*2= pq/3 + qqXy 
Let p denote the probability that the gambler will move to the right of Z. That is, the q(qy.> -q/2) = q(q/2-qz3)
probability of winning a game.
Let q = 1 -  p, the probability of moving to the left of Z, that is losing a game (by the This implies that
gambler). Let the points on the scale be denoted by Z0lZv  ...,ZN and qZj, the 
</z, ~ qz2 = r (.qz2 “  ?z3). where r  = p/ q
probability of ruin given the initial capital Z,.
Thus we can have the following system of equation. 
for simplicity, let N = 5 (live naira). Assume that the initial capital by the gambler is q?.n -  qx, = r{q7) -  q/2)
*2
</*, -  q-/.2 = r(q*7 -  q/J
i----------------1--------------- 1--------------- r ------------------- l
z 0 z, z 2 z Z, N
The probability that the gambler will be ruined if his initial capital isZ2is qZ2 -  qz, = r(qz3 -  <?zJ
f>{R\Z2) = P{Zi,R} + P{Zl,R) -  q/< = r(q/A - q*s)
=  P(R\Z,)P{Z.3) +  P{R\7.JP{Z,) To unify these equations we define
q-,N=q^= o
We can write this as These arc boundary conditions on qy . This becomes
q/2 = pqx3 + </<?*,, i < z, < n  - i q-/3 - q/., = Tq/.,
320
UNIVERSITY OF IBADAN LIBRARY
This extends to other equation in the system 1 -  r ‘
~ <1*, = r ^ z , 1 - r  9z4
1 - r
-  </z2 = r 3g*4 =  + r >̂
1 -  r ‘
<7z0 ~ <?z, =  r 49z4 ~ 7 * ) qto
1 - r l
1 -  r 5
<?*« = A * ,
Thus, we have
Adding the equations, the result is gotten 
<?*, = ( l + r  +  r 2)qz<
<?z0 “  £/z4(r°  + r  +  r 2 +  r 3 + r 4)
Now,
This implies that
(1 +  r  +  r 2) ( l  -  r )  = 1 -  r 3
‘?2, = ’-fe .0 + l + r  +  r 2 + r 3)
= <7z4( r4 + r 3 + r 2 + r  + 1) So that.
1 -  r 3 
(1 + r  -I- r 2) =
If we sum up the identity we have 1 - r
l - r 5 = ( l 4 M r 3+ r 3 + r ‘ ) ( l - r ) Substituting for cfc4. the result follows, 
Thus 1 - r 3
I + /• + r  + r 3 + - l - / - 5r* (,Zl ' “ T*
1 - r
Meaning that And also,
</z, = 0  + r  + r 2 + r 3.)£/,„
Solving in the same manna we did tor qXj. we see that 
( 1 1 /  r /•" 1  H )  =  1|  -_   r-4
By addition Sueh that.
Qxy = 0  + r )<7z4 1 — r 1
322 323
UNIVERSITY OF IBADAN LIBRARY
So that 10.15 can be solved by
- I  ± (l -4 p q )l/? 
- I p
To simplify, we multiply the solutions
-1  + (l -4 p q ) !- - 1  + (l - 4pq) '- _  q
• 2p - 2p p
I his is ihe probability that the gambler will be rained, given his initial capilalZ.
Method of Difference ICquntion X  = j  i f  P *  q 
H‘ W/.> i +/*/*•-■ = 1 i f  p*q
Then by substitution, equation 10.12 becomes
I his is the same as
The general solution can be written as
I Tie particular solution o f 10.1 1 can be written as
I his becomes The boundary conditions 
. V 1 ' \c /X ' q0 = 1 and q N = 0 
That is, when Z= 0, qz — 1 
and when Z= /V, qz — 0
I )i vide by XY‘
This implies that 
A + B = 1, Z = 0
/4 + 5( % )  = °’ Z = yV
325
UNIVERSITY OF IBADAN LIBRARY
Solving the system of equations, we obtain In general1' ’ 
qz = A + BZ
Under the boundary conditions
q0 = I,- qN ==0 . at Z  = 0 and N respectively.
Substitute for A and B, we have
Thus,
>4 = 1 at Z -  0
and
A +  B N =  0 '  at Z  =  N
Thus,
Substitute for A and B
18.2.2 Gambler’s Expected Gain(G)
Possible Values q2 = \ - — ZN
Gain N -Z  with probability l - q z
Loss Z with probability q 7 Substituting for qz
e ( G ) = 4 - ( i - Z / n I - z
The expected gain is
£■((7) = [Combined Capital) (Probability of gain) - (initial papital) = n {z/ n ) - z
= N ( \ - q , ) - Z = 0
That is
18.2,3 Expected Duration of the Game
C(C/)= M (l-q z )-  Z
Assume that the expected duration of the game has a known value Dz . If the first trial 
If /;=</= -  ()r q + p * 0 results in a success, the game continues as if the initial position wasZ +  1 .
Now, the initial position is Z, so that 
We can write qz as a function of Z 
Dz = PDZ+, + q D 7,
<//., = /* //,: + </q, = f(% )  a constant
Under the condition that the first trial in a success 
Then the solution from the result of differential equation with constant coefficient is
D z = P D ^ + q D z_i +  \
(b. = Z
32C, 327
UNIVERSITY OF IBADAN LIBRARY
With boundary conditions Uz is given as 
A ,= *  D* = 0 UZ =A + BXZ
But (10.28) is the same as So that,
= P^Z+2+ ^D2+1
The complete solution is
DZ = V Z +YZ
Particular Solution
Where Uz is the general; solution and Vzis the particular solution. Let the particular solution be 
Vz = aZ
General Solution
Any difference Uz between any two solutions can be written as This means that we can write 
U z -p U iM + q U ^ a(Z + 1) = pa(Z + 2 )+ qaZ  +1 
This is the same as So that
___________ 1_________
U z . i = P u i . i + < l u z Z + 1 -  pZ -  2p — qZ 
The denominator becomes 
Let U2 = j r 2 1 -2 p  = q - p  (Since p + q = l) 
So that we have Therefore,
X M =pXM +qXz 1
Dividing through X z a = ------q - p
X  = pX 2 + q Substituting for a
This becomes a quadratic equation, which can be written as 
- p X 7+ X - q  = 0  
v  , -l±> /r-4pq The complete solution is
- 2p Dz = UZ+ Vz
Multiplying the solutions results in
x = l
P
328 329
UNIVERSITY OF IBADAN LIBRARY
The required boundary conditions are- - So that
A + B — 0, Z = 0, Dy — 1 Nq-p
M S w % A = —= —— ; Z = N, D n =0  
q - p 1
For Z — 0, Dz — 0 Substituting for A and B in (10.39) we have
A + B  = 0 J L  J L  (1 V
n _  Q - P  ,. Q - P  ^\Pp J'
L>Z ~  7  ~ 7 7 T  +  7------------- ---- +  '
For Z =  N,D n = 0
q - p N 'W
q - p  q - p
So that ' { %
183 Queuing Theory
The principal pioneer o f queuing system was A.R. Erlang, who began in 1908 to 
This results in study problems of telephone congestion for the Copenhagen Telephone Company. He 
was concerned with problems such as the following: A manually operated telephone 
exchange has a limited number (one or more) of operations when a subscriber 
attempts to make a call, the subscriber must wait if all the operations are already busy 
making connections for other subscribers. It is of interest to study the waiting time of 
subscribers e.g. the average waiting time and the chance that a subscriber will obtain 
service immediately without waiting and to examine how much the waiting times will 
So that we have be affected if the number of operations is affected or conditions are changed in any 
other way. If there are more or if service can be speeded up, subscribers will be 
^ r = 0 pleased because waiting will be reduced, but the improved facility will become 
- o expensive to maintain, therefore, a reasonable balance must be stntck.
Solving for A, 183.1 Applications of Queuing Theory
- N When persons or things needing the services of a facility or persons arrive at a service 
q - p channel or counter on the account that the facility or persons cannot serve all at a 
time, a queue or waiting line is formed. Examples of this include:
330 331
UNIVERSITY OF IBADAN LIBRARY
(i) cars arriving at a fuel station waiting to be served.
(ii) persons waiting at a bus station waiting to be checked in.
(iii) books arriving at a librarians desk.
(iv) patients waiting to see a doctor or community health dispenser.
(v) customers arriving at a departmental store (supermarket).
(vi) clients waiting to see the Customer Service Executive or Officer.
Arriving customers Served customers
Queuing theory is applied into every field of human endeavour. This is because there leaving
is no perfect service or treatment that can be meted out. Below are some of the fields 
of application: '  1
(i) Business -  banks, supermarket, booking offices, and so on. discouraged n
(ii) Industries -  servicing of automatic machines, production lines, storage, and so customers
on. leaving
(iii) Engineering -  telephony, communication networks, electronic computers, and 
so on. A QUEUING SYSTEM 
(iv) Transportation -  airports, harbours, railways, traffic operations in cities, postal (OR by Swarup et al. 1978, p505)
services, and so on.
(v) Others -  elevators, restaurants, barber shops, and so on. 18.3.3 Components of the Queue System 
A queue situation can be divided into five elements. These are:
18.3.2 Concept and Definition (i) 'Arrival mode
Queuing theory is concerned with the design and planning of service facilities to meet (ii) Service mechanism
a randomly fluctuating demand for service in order to minimize congestion and (iii) Service channels
maintain economic balance between service cost and waiting cost. The cost here (iv) System capacity
refers to time. (v) Queue discipline
A queuing system is composed o f customers arriving at a service channel and is 
attended to by any one or more o f the service attendants. If a customer is not served (i) Arrival Mode -  this refers to the rate at which customers arrive at a service 
immediately he may decide to wait. In the process, however, a few customers may centre and the statistical law which governs the pattern of arrival.
leave the line if they cannot wait. At the end of the process, served customers leave Certain definitions pertaining to the arrival of customers:
the system. bulk or batch arrival: more than one arrival allowed to enter into the system
simultaneously.
balk: customers deciding not o enter a queue because it is long or lengthy. 
renege: customer leaving a queue due to impatience.
332 333
UNIVERSITY OF IBADAN LIBRARY
Symbols and Notations
jockey, customer jostling among parallel queues. We shall employ the following symbols and notations this lecture:.
stationary, arrival pattern which does-not change with time. n  = . number of customers in the system, both waiting and in service, 
transient: a time-dependent arrivalj>ro«ess. A = average number o f customers arriving per unit o f time 
The arrival mode is always denoted by M. . average number of customers being served per unit of time
A
• r  4 traffic intensity
(ii) Service Mechanism -  this refers to the number o f  service points that are 
available and the duration o f service. When the service points or servers are infinite, c = number o f parallel service channels (servers)
the service will be instantaneous, which will result in no queue. In case of finite E ( n )  = average number o f customers in the system, both waiting and in 
points, queue is inevitable. Customers can be served according to a specific order, service
which may be in batches o f fixed size or of variable size. This system is called bulk E (m ) = average number o f customers within in the queue
service system. E (v) = average waiting time of customers in the system, both waiting and in
service.
(iii) Service Channels -  where there are more than one channel of service, then £ 0 ) = average waiting time of a customer in the queue
arrangement of service may be in parallel or series, or a combination of both, P n (  0 = probability that there are n  customers in the system at any time t,
depending on the system design. both waiting and in service.
^n = time independent probability that there are n  customers in the system, 
(iv) System Capacity -  most queuing system are limited in such a way that both waiting and in service.
waiting rooms are all accommodating. This gives limit to the number o f customers 
that can be accepted to the waiting line at any given time. Such situation gives rise to 18.4 The Basic Queuing Process
f in ite  source queues, and results in forced balk. The statistical pattern by which customers arrive over a period o f time must be 
specified.
(v) Queue Discipline -  this is a method of customer selection for service when a It is usually assumed that they are generated according to a Poisson process that is, the 
queue has been formed. The different forms of discipline include: number of customer who arrives until any specific time has a Poisson distribution. 
(ai) First Come, First Served (FCFS), or The Poisson distribution involves the probability of occurrence of an arrival and is 
(aii) First In, First-Out (FIFO) independent o f what has occurred in the preceding observation. This Poisson 
(b) First In, LasvOut (FILO) assumption indicates the number of arrivals per unit time(A) (or mean arrival rate), 
(c) Last In, First Out (LIFO) while1/^  on the lengthy o f interval between two consecutive arrivals. This time 
(d) First in. First Out with Priority (FIFOP) between two consecutive arrivals is referred to as "inter-arrival time.”
(e) Selection for Service In Random Order (SIRO) The mean service rate /i is the number of customers served per unit time whole 
average service time (V jt) >s the l'me un‘ts Per customer service time delivered is
335
334
UNIVERSITY OF IBADAN LIBRARY
18.6 Classification of Queuing System
given by an experiment distribution where the servicing of a customer takes place Queuing systems, generally, may be completely specified in the following symbolic 
between the timet andt + At.
forms:
(a|Z>|c):(d|e)
18.5 Poisson Process and Exponential Distribution
Description
In queuing theory, the arrival rate and service rate follow a Poisson distribution. 
First symbol (a) -  type of distribution of inter-arrival times
However, it should be noted that the number of occurrences in some time interval is a 
Second symbol (b) -  type of distribution of inter-service times
Poisson random variate, and the time between successive occurrences is an 
Third symbol (c) -  number o f servers
exponential distribution. Both are equivalent
Fourth symbol (d) -  system capacity
Fifth symbol (e) -  queue discipline
18.5.1 Axioms of the Poisson Process
Given an arrival process [ N ( t ) , t> 0 ] ,  where N (t) denotes the total number of For the first and second symbols, the following letters may be used:
arrivals up to time t, N (0) =  0. an arrival characterized by the following assumptions M =  Poisson arrival or departure distributions
(axioms) can be described as a Poisson process; Ek = Erlangian or Gamma inter-arrival or service distribution
G1 =  General input distribution
AXIOM 1 - the number of arrivals in non-overlapping intervals are statistically 
G = General service time distribution
independent. This means there is independent increment in the process.
An example of a queue system is
AXIOMS 2 - the probability of more than one arrival between time t and time (M\Ek\Cy.(N\SIRO)
t + At is o(At); this means there is negligibility in the probability of two or more 
arrivals during the small time interval At. This implies that Queuing system is classified into
p0(AO +  p1 (At) +  o(At) =  1 (i) Poisson Queues
(ii) Non-Poisson Queues
AXIOMS 3 - the probability that an arrival occurs between time t and time t  + At
isAAt + o(At). This implies that 
Definitions
Pi (At) =  A At + o(At) Transient State: When a queuing system ahs its operating characteristic (e.g. input, 
Where A, a constant, is independent o f /V(t), At is an incremental element, and output, mean queue length, etc) dependent upon time, then it is said to be in transient
o(A t) represents the terms such that state.
Ahtm- 0  —°(-A—t)  = n0 Steady State: This is a queue system that is independent of time.At Assume Pn(t) to be the probability that there are n customers in the system at time i,
then the steady state use becomes
337
336
UNIVERSITY OF IBADAN LIBRARY
lim| -nP*n (/) = Pn (independent of t) This can be re-written asP.(i + A/) = PH( l \  1 -  AA/ + 0(At)][l - //At + 0(At]+ />„(r)[AA/][/zAt]+ P „ ^ h ^ x + °<At)] 
Meaning that [l -  XAt + 0(A/)]+ P„_x(/)[AA/ + 0(A/][l - / jA\ + 0(At]+ o(A/) .
l i m -  Pn(/) = 0 
'-»« dt This leads to
p.(l) = P -. (0M + 0(A/) n > 1
18.7 Poisson Queues
18.7.1 The M\M\1  System
Suppose n = 0, we have
This deals with the process where arrivals and departures occur randomly over time 
P0{i + A/) = P0(t Jl -  XAt + o(At)]+ Px (rXl -  + °(A0]
generally known as birth-death process.
[//A/ + o(A/)] + o[A/]
1. Model 1: (M |A f|l):(o o |FIFO)
In this model, we have Poisson input, exponential service, single channel, infinite = P0(/J l-/lA /]+ P l(t)/^ r + o(A/)
system capacity and first in first out basis.
If P„(t), be the probability that there are ncustomers in the system at time t, then in We can record the difference equation
order to write the difference equation for P„{t), we first consider how the system can
P„ (/ + At) -  P„(/) = - U  + p)A tPn (/) + /iA/ Pntl (/) + XAt Pn,  (/) +o(A/); n > 1
get to state En at time t + At. To be in state En of time / + A t , the system could have and
been in the state £nat time t and have no arrivals or service competitions in Ator be in P0 (/ + A/) •- P0 (l) = -  XAtP0 (t)+ (iAt P, (/) + o (At)
state £n_i of time t  and have, during A,, one service completion and no arrivals. If
Then,
we assume that n > 1 (having arrivals and service independent of each other), it
can be easily seen that lira P|' ('  + ^ '  = - U  + m ) P J ' )  + 0 )  + V>,„(/) + o(At)A t
a n d
Pn (t + At) = P„(t). P(no arrivals in At). P{no service completions in  At)
lim - ^ , ( / ) + / iP ,( 0  + o(A t)
+Pn (0 - P{one arrival in  At).P (one service in  At) Ai—*0 A/
T^n + l (t). P(one service com pleted in  At). P(_no arrivals in  At) 
+Pn-i( t) .P (o n e  arriva l in A t).P (no  service completions in  At) +  o(At) So that we have
n >  1 - P j n = P ( t ) = A A + p ) P jn  + p P J l )  + J.P-i(t) n > I 
dt
and
338 339
UNIVERSITY OF IBADAN LIBRARY
In general we have
~dt p„w  = K ( ' ) = - K ( ' ) + mP M P. = 1 - 1  P, Vn
P.
The above are known as difference equations in n  and t. The steady-state solutions Proof
for Pn in the system at an arbitrary point of time is obtained by taking the limit as By mathematical condition, we have 
/  —► oG..
rp,, *\ n >1 r n ~ - pr n- \ »
If the steady-state exists (/l < /j, as t - » co), then P P
P„(t) —> P„ and Pn (/) —> 0 as t —»oo
A + p '
- i
P P \ P
I f  A = p  there exist no queue X -'+ fiX 1 A"
If  — > I we have an explosive state 
P ’
Using the condition of steady state,we have = -  Pr
0 = -(A + ^ + / / P „ , + ^ _ , ; n > 1
9 .
Using the boundary condition; Z ^ .  = 1. then 6.5 becomes
and
0 = -AP0 + fjPy >=Z P* = ^ o Z
n-o l/'J
Using iterate procedure we have
= P., Sum o f  geometric series where — < 1
'pi  ^ p' ii
P 1 - P
P
' A >
p2 = f x + » V - a k  =
p  J p = />
V - p  J
f - 1 This implies that p ,=
p  ; p l / ' J 1 - P
Resulting in the steady-state
P„ = p" (1 — p \  p  < 1 and n >0
341
340
UNIVERSITY OF IBADAN LIBRARY
This is the probability distribution of queue length. (jii) Average queue length
Characteristics of Model 1
£ ( » . ) = I » p.;
(i) Probability of queue size greater or equal to n.
where m = n -1  (that is number of customers in queue 
/>(* „ ) = ! > , .  = z o - p ) p i minus customer in service)
A' *n K =<i
= (1 - P ) P '± P 1- = i > - i ) P „ = i > r , , - l P ,
t=n r
= 0  - p ) p "  t p ‘ ~ - 2
K - =( n-0
> p . -
i i i
= 0j ^ v = . * 7 ^ — [ i - ( i - p ) ]
1-P 1- P
(ii) Average number of customers in the system ■ r b - ’
. P 2
E ( n )  =  Y j n  PB = £ n (l-p )p " 1 - P
n=0 <i=0
(iv) Average length of non-empty queue
= 0 -  p )£ "  p„ = p O -  p )S «  p*’1
n=0 n*l E(m / m > 0) = ^ m-  ■
V '  P(m > 0
££de 1 P
p - /1
= p ( i - p ) f S p - , Since /? < 1
P ( P - * )
d e ^
= P ( ' - p ) LO-p)-J This is because P (m > 0) = P(n > l) = ]jT P(1 -  /}, -  Px • L"»u
P (v) The fluctuation (variance) of queue length
1 - / 9  ^ - / l
T ( * )  =  I« = o [ n - i r (n ) ] 2P„
= E^=on2P n - [ ^ ( n )]2 
By algebraic transformations,
P(n) = ( l - p ) £ £ - [ £ f
_  P 
( 1 -P )2
342 343
UNIVERSITY OF IBADAN LIBRARY
V'.-(O) :P(w = 0)
=
(n-W = P (No customer on the systemn upon arrival)
Example 18.1 To find y/uin for / > 0,' we suppose there be n customers in the system upon arrival, 
A TV repairman finds that the time spent on his jobs has an exponential distribution l or a customer to go into service at time between 0 mid t, it means all the customers 
with mean 30 minutes. If he repairs sets in the order in which they come in and if the 
must have been served at time t.
arrival of sets is approximately Poisson with an average rate of 10 per day.
Therefore,
(i) What is the repairman’s expected idle time each day?
(ii) How many jobs are ahead o f the average set just brought in? t//i (,)=: p [(n -1) customers are served at timet) P [one customer being served in timedt]
Solution {idt
2 = — = — . setsperhour 
8 ' 4 The waiting time w is therefore 
w < t]
// = ^>V60 = 2 setsperfhour
(i) The probability o f no unit in the queue is asZ»t’i^ J o^ - ( / ) + V'-C«)
Po n 8 8
Hence the idle time for repairman in 8 hour days
= - , 0  = 3 hours „ z l  ( « - i ;
8 i
= (l -  p ) p  - //t (l - /o)dt + (l - p)
E(n) = - V j o b s  o(ii)
2 - /V 4 3 -  \ -  pe I> 0
18.7.2 Waiting Time Distribution for Model 1 The distribution of waiting time in queue is
Waiting time is mostly a continuous random variable and there is a non-zero I " P  / = 0
probability of delay being zero. Denote time spent in queue by w. Let (/„.(/) be the ^ {,) = 1 - / » / > o
cumulative probability distribution so that from a complex randomness of the Poisson, 
we have
345
344
UNIVERSITY OF IBADAN LIBRARY
Characteristics of Waiting Time Distribution for Model 1
( i) Average waiting time of a customer (in the queue) (iv) Average waiting time that a customer spends in the system including service
X
E(v) = |  t.\//(wl w > 0)ir
£ ( h -)  =
0 o
u>
= f  tpp{\ - p)
o 0
I *
= -------[ x e's dx, for (// - X) = x
P - K
1
P _ A p - X
p ( l~ p )  p (p ~ X )
Relation between Average Queue Length and Average Waiting Time
(ii) Average waiting time of an arrival that has to want (Little’s Formula)
E (w /w >  0)= A2
p[w>  0) E(m) p(p-x)
E(w) = A £ (v )= —!— 
p ( p - * ) \ /  p  p(p ~ x) P - X
1
It can be seen that E(n) = A E(v), E(n) = X E(w) and E(v) = E(w) + —
P ~ X P
Example 18.2
We note that P(w  > 0) = 1 -  P(w  = 0)= 1 - (l - p ) = p Amvals at a telephone both are considered to be Poisson with an average time of 10 
minutes between one arrival and the next. The length of a phone call is assumed to be 
(iii) For the busy period distribution, suppose v  is the random variable denoting the distributed exponentially with mean 3 minutes.
total time that a customer had to spend in the system including service. This makes (i) What is the probability that a person arriving at the booth will have to wait?
the cumulative density function to be (ii) The telephone department will initial a record booth when convinced that an 
• arrival would expect waiting far at least 3 minutes for phone. By how much
v{w /w >  0) = — U ; where ^(w ) = [ipw (/)] 
P[w > 0) at should the flow of arrivals increase in order to justify a record booth.
A /
l  P ) / l / 'J
t .> 0
347
346
UNIVERSITY OF IBADAN LIBRARY
Solution
E(v) = \  E(n) = —
We are given A u-A.
This result applies to the FIFO SIRO and LIFO cases. These three queue discipline 
^ = K) = 0, 1®Person Per minute sonly differ in the distribution of waiting time when the probabilities of along and 
and short waiting times change depending upon the discipline used. When the waiting 
time distribution is not required, the symbol GD(general discipline) can be used to 
p  = ̂  = 0.33 person per minute
represents the three queue disciplines above.
(/) P(w > o ) = l - / >u = l -  f l -  —
l  P ) 18.7.4 Modellll (M |M |l):(A f|F /FO )
_ A _  0.01 There is a deviation from the previous model 1 (especially 1) because the number of 
~ M ~  0.33 customers is now finite (W). As long as n < N, the difference equated o f model 
= 0.33 remains valid for this model. If the system is in state Ew, then the probability of an 
(ii) The installation of record booth will be justified if the arrival rate is greater arrival into the system is zero.
than the waiting time. Then the length of queue will go on increasing. Thus, the additional difference equation for n =  N becomes
Now, E(w) = , ^— r = 3 p „ { t + a/ ) = p H ( i )  [i - M 'l+ ^ v - i ( 0 - M i  - H + < > (a 0
MKM-A)
A1 resulting in the differential-difference equation.
0.33 (0.33-A1)
Where E(w) = 3 and A = A'(w) for record booth. On simplification this yields 4at p n( / ) = - / / P n C 0 + ^ ^ - , « )
A1 = 0.16. hence the arrival rate should become 0.16 person per minute to justifies the and gives the resultant steady state difference equation 
record booth. 0 = - / i Pn + A P n, ( O
18.7.3 ModelII(Af |M |1): (oo|S //?0) Given the interval 1 < n < N  -1 , the complete set of steady-state difference equations 
This model is similar to model 1. The only difference is in the service discipline. The for this model is as follows.
first follow the FIFO rule, while this follows the SJRO rule. We recall that the /^ ,= A P 0
derivation of Pn for model I does not depend on any specific queue discipline, it may pP.,.i = (A + ai)P, -  A P„ ,
then be concluded that for the SIRO rule case, we must have. 
p„ =( ] -  p)  p " , n >0 P*3,. = A*\ ,
The average number of customer in the systemv£(n) remains the same irrespective of 
cases, FIFO or SIRO. Provided P„ remains unchanged, £ (n ) remain the same in all 
queue discipline, thus
348 349
UNIVERSITY OF IBADAN LIBRARY
\s in model I, by iterative procedure, the first two difference equations are 
0 -/g )p "
P« = ( j j  P „ '.n < N -l 1 „ N * I P *  1
; 0 < n < N
n (he same manner, the value of Pn holds for the last difference equation if  n = N. 
Thus, we have
N + 1 (p  = 0
= p"  P0; n < N Note that the steady-state solution exists even for p > 1. Intuitively, there is sense in 
Using the boundary condition, we can obtain the value of P0. this since the process is prevented from blowing up by the maximum limit.
N Thus, given N ->■ co, the steady-state solution results in 
Boundary condition is ^  P = P,
n=0 P„ = (l -  p)p" n< co
Thus
Which is the same as that in model 1.
1 = ^. 2 > n
i - p > Characteristics of Model III
{ i - (i) Average number of customers in the system is given by p  
/> (N + j) E(n) = Y inPil =P„YJnp"
n “II n “0
Thus,
t!>dt dp
\ - p N+l
1 - p * «  
P0 = P'>P Tdp  L i1 *
 -P
N  + \
( I - P ) :
Hence p [l-(N  + l ) p N + N p H*'\
( M O V )
350 351
UNIVERSITY OF IBADAN LIBRARY
(ii) Average queue length
We know that Pn =  p(> e", thus 
£("0 = Z l ( ” _1) Pn = E (n) ~d=Y .P'■ n= I (fl) P, =(0.53) (0.5) = 0.27
= £ ( « ) - ( ! -P „ ) P2 = (0.53) (0:5)2 = 0.13
P3 = (0.53) (0.5)3 = 0.07
(b) E(n) = 1(0.27)+ 2(0.12)+ 3(0.07) = 0.74
_ p 2f l - A T p " - , + ( A f - l ) /) K l
V p ) ( i > ) Hence, the coverage number of trains in the queue is 0.74, and each train takes on an 
(iii) Average waiting time. average 'A (0.085) hours for getting service. As the arrival of new train expects to 
Using Little’s formula: find on average of 0.74 trains in the system before it.
E(w) = (0.74) (0.085) hours
E{v) = ~ ^  where A1 is the mean rate of customers entering the system and is equal 
A = 0.0629 hours or 38 minutes
to a ( i -/> ,.)
18.7.5 Model IV (Birth- Death Process)
Thus, E(w) = E(y) -  — =
P X Assume the system to be in date En, the probability of a birth occurring in a small 
Example 18.3 time interval At  is considered as AnAt + o(At); and that of the death is considered as 
At a railway station, only one train is handled at a time. The railway yard is sufficient finAt  + o(At),n > 1. The system being in En at time t  means it will remain in En at 
only for two trains to wait while the other is given signal to leave the station. Trains timet -F At provided there is no birth and no death/on birth and one death, or the 
arrive at the station at an average rate of 6 per hour and the railway station can handle system might have been in E ^ a n d  had a birth, or in En+1and had a death. Thus, this 
them on an average of 12 per hours. Assuming Poisson arrivals and exponential result in
service distribution,
(a) Find the steady-state probabilities for the various numbers of trains in the Pn (t +  At)  =  Pn (t) (1 -  AnA t -  o(At)){\  -  iinAt -  o(At) ) 4- Pn+1(t) (Mn+i^t 
system. + o (A t) ) ( l  -  An+1At -  o{At)  + Pn- i ( t )  (An_aAt + o (A t))(l 
(b) Also, find the average waiting time of a new train coming into the yard. -  lin_xA t - o { A t )  + o(At),  n >  1
Solution P0 (t +  At)  = P0( t ) ( l  — A0A t- o(At)  + P^O C /ijA t -H\o(At)) + o(At), 
n = 0 \
2 = 6 / /  = 12, p  = — = 0.5 
12 Dividing by At, and taking limit as At -*0, the diffential -  difference equations results
Probability of no train in the system (both waiting and in service is
d
P„. = ——-^-7  = — = 0.53 “JT^nCO — ~ (A n +  P)i)Pn(t)+  Pn + l ̂ >1+ 1 (0  T tl >  1
I - / / *  1 -  (0.5)’*
352
353
UNIVERSITY OF IB
DAN LIBRARY
and By mathematical induction, one can prove that this formula is correct 
"h Pn n ^n -l
d ^n+1 — P n - ' ^ 71-1Pn+1 Pn+1
d t P° ^  ~  ~*nPo(t) +
7 1 -1
Since Pn (t)  is independent o f time, the steady-state solution =1 nipi-1=0 +i
^ P n (t) = 0  and the differential-difference equation reduce to
o =  -  (^n "h Pti )  +  P ti+1 n+l +  1* H >  1 Making use of the boundary condition, we obtain PQPn P ^ n - lP n -
and <r. w
0 =  -A 0P0 +  /ijPj '71Y =J 0 Pn = 1 orpo +7 1Y =  0p T , =  1
-1
Consequently by interactive procedure as in model 1 thus Po =  [ l  +  £ n = l  n w c ^ ) ]
p  - h p  
1 ~ p / ° If the R.H.S in a divergent series, p 0 =  0. If the R.H.S in a divergent series, p0 will 
have its value defining on Aj‘s and p ,’s.
Special case
-  A^O n I. WhenA,, =  X fo m  > 0. andpn =  p f o m  > 1 
P2P1 °
then
A _= h ± P 2 n A1
.3
^2^ 1^0 
Po = l - p  
P3P2 Pi Thus
pn = pn(  1 -  p), fo r n  > 0
So that in general
'71 — ' 0 (same as model 1)n _  ^n-l^n -2  —^0 n
. PnPn- i ~Pi II. When Xnn  =  —n+l f’o r n  > 0 ,
andpn = p fo rn  > 1
Then
-1
= n,"=ro #-*7J+-1 7̂0 . » 2  1 V  A" Po = 1 +  Z-,1n ! p nn
355
354
UNIVERSITY OF IBADAN LIBRARY
It is given that the service increases with increase in the number of persons. 
-1 Thus, pn =  np. where there are n  persons.
1  + p  +  | p 2 + j f />3 +  - ]
= e~P X OP
Thus £(7.) =  ^n =n0  p „ = n^=0n . ( - p " ) e -<’
Pn = { ^ . pn) e ~P f o r n ^ °
Here we can see that pn follows the Poisson distribution where p = - .  But, p > 1 or e~p.p .e p = p
p < 1 most be finite. =  5/g persons
III. When An =  Afo rn  > 0 , andpn =  np_ fo rn  > 1 The average solving time is inversely proportional to the number of people solvingon
Then
1 the problem is given by day problem.i-iZ Xn Expected time for a person entering the line jVPo =  + n!
71 =  1 ^ E(?T) = -l  day or 8 hours.
=  e~p
Thus Practice Questions
1. Derive, using both methods, the probability that a gambler will be ruined 
p* m G s pn) e ~p f o r n  ~ 0 given that his initial capital is Z.
Here, service rate increase with increase in queue length. Hence it is known as the 2. Show that gambler's expected gain is given asN (l — qz ) -  N.
queuing problem with infinite number of channels= (M\M\co): (oo|F I F O ) 3. Under what condition can the expected gain be zero?
4. Company A enters into a project deal with another company B. 4 's  initial 
Example 18.4
deposit is /V577t, while f?’s initial deposit is NAm. For every success, A gains 
Problems arrive at a computing center in a Poisson fashion at an average rate of five more naira from 6 , otherwise it loses same to B. If the probability of success 
per day. The rules of the computing center are that any man waiting to get his is 0.7, what is the probability of losing the entire deal?
problem solved must aid the man whose problem is being solved. Tf the time to solve 5. A gambler's initial fortune is t. On each play of the game the gambler wins 1 
a problem with one man has an exponential distribution with mean time of 'A day, 
with probability p, or loses 1 with probability 1 — p. He or she continues 
and if the average solving time is inversely proportional to the number people 
playing until he/she is n ahead (that is, the fortune is t +  ?t). or losing by m. 
working on the problem, approximate the expected time in the center for a person 
Here 0 < i -  m  and i + n < N. What is the probability that the gambler-quits
entering the line.
as a winner?
Solution
A = 5 problem per day, p = 3 problems per day
356
UNIVERSITY OF IBADAN LIBRARY
6 . Given an initial capital, Z, show that expected duration of the game is be the number of customer in the system (serviced and waiting) immediately 
M after an event. Suppose that an event is equally likely to be an arrival or a N 1 - completed service.
q - p  q - p Ml (a) State the transition graph and transition matrix and find the stationary distribution.
Describe the model 1 of the M|Af llqueue discipline, and show that (b) If a customer arrivers. what is the probability that he finds the system empty? 
(a) the average number of customers in the system is given as Full?
^2 (c) If the system is empty, the time until it is empty again is called a “busy 
(b) the average queue length is given as —.
period". During a busy period, what is the expected number of times that the 
8. In the M |M |1 system of a queuing process, show that the system is full?
(a) steady state probability of model 1 is Pn =  pn( l  - p ) ,  where p <  1 and n > (d) Show that a limit distribution is a stationary distribution.
0.
(b) the waiting distribution is given as
( 1  - p . t  =  o
m  =
[ l - p e - r t ' - P * ,  t >  0
9. SAO Super market has one cashier at its counter. The service discipline of the 
cashier is FIFO. It is observed that the supermarket has 18 arrivals on average 
of every 10 minutes while the cashier can serve 12 customers in 6 minutes. If 
the distributions of arrivals and service rates arc poisson and exponential 
respectively. Calculate.
(a) The traffic intensity and interpret the figure obtained
(b) The average number of customers in the system
(c) The average queue length
(d) The average time a customer spends in the system
(e) The average time a customer waits before being served
10. Customers arrive at an ATM where there is room for three customers to wait
in line. Customers arrive alone with probability and in pairs with
probability ^  (but only one can be served at a time). If both cannot join, they 
both leaver call a completed services or an arrival an “event” and let the slate
358
UNIVERSITY OF IBADAN LIBRARY
R E F E R E N C E S
Amah in. u . N.(2()07). STA 211- Probability II. Ibadan Distance Learning Centre 
Rosen. K. H., Michaels. J. G.. Gross. J. L.. Grossman. J. W. and Shier, D. R. (2000). 
Series. Distance Learning Centre. University o f  Ibadan, Handbook o f Discrete and Combinatorial Mathematics. CRC Press.
Alawode. (). A. andShittu 0 . 1. (2011). Probability III. An Unpublished lecture note Odeyinka. J. A. and Oscni, B. A. (2008). Basic Tools in Statistical Theory. Highland 
developed lor the Ibadan Distance Learning Programme. Publishers.
Bhat, B. R. ( 1985). Modern Probability Theory: An Introductory Textbook. 2"'1 Olubusoye, O. F.. (2000). UnpublishedLec/u/v notes on STA 311.
Edition. Wiley Eastern Ltd.. Bombay. Roussas, G. G. (1973). A First Course in Mathematical Statistics. Addison-Wesley 
Bhat U. N. (1971). Elements of Applied Stochastic Processes. John Wiley & Sons Publishing Company.
Ltd.. London. Shittu. O. I. (2011 ).UnpublishcdAG/c.v on Probability 11. 
Attenborough. M. (2003). Mathematics for Electrical Engineering anti Computing. ii'ii'ir. wikinedia.ore
Nevvnes. Elsevier. Linacre House, Jordan Hill, Oxford. Encyclopaedia of Physics (2nd Edition), R.G. Lcmcr. G.L. Trigg. VHC publishers. 
Brualdi, R. A. (1999). Introductory Combinatorics. Pearson Education Asia Limited 1991, ISBN (Verlagsgesellschaft) 3-527-26954-1. ISBN (VHC Inc.) 0-89573-752-3
and China Machine Press.
Bunneheka BMSG. Ekanayake CiEMUPD (2009) "A new point estimator for the Feller W. (1970). An Introduction to Probability theory and its applications. 3rJ 
median of gamma distribution". Viyoihtyu J  Science. 14:95-103 Edition. John Wiley and Sons Inc. New York.
Fisz M. (1963): Probability Theory and Mathematical Statistics. John Wiley and Sons, 
Barry C. Arnold (1983). Pareto Distributions. International Co-operative Publishing New York.
I louse. ISBN 0-N9974-Q12-X.
Fisher Snedccor (19.38). The true characteristic function of the F distribution. 
Dan Musa (2010): http://probabilityandstats.wordpress.com/2010/02/18/the- Biometrika. 69. 261-264.
malching-problem/ Ilori S. A. and Ajayi O. O. (2000). Algebra. University Mathematics Scries. Y-Books 
Grimaldi, R. P. (1999). Discrete and Combinatorial Mathematics. Pearson Addison- (a
Wesley.
Gupta. B. D. (2001). Mathematical Physics. Vikas Publishing House PVT Ltd. Johnson NL, Kolz S. Balakrislinan N (1994) Continuous univariate distributions Vol 
Hogg. R. V. and Craig. A. T. (1970). Introduction to Mathematical Statistics. New 1. Wiley Series in Probability and Statistics.
York: Macmillan Publishing Co.
Krishnamoorlhy, K. (2006). Handbook o f  Statistical Distributions with Applications. Krishnamoorlhy K. (2006): Handbook of Statistical Distributions with Applications. 
Chapman^ llall/C'RC. Chapman A-1 la 11/CRC
Merris. R. (2003). Combinatorics. Wiley-lntcrseienee.
Mood. A. M.. ( iraybill. F. A. and Boos, D. C. (1974). Introduction to theory o/ Lord, Nick. (July 2010). "Binomial averages when the mean is an integer", IJie 
Statistics. McGrow-1 lill Inc. Mathematical Gazelle 94. 331-332. division of Associated Book Makers Nig. Ltd.)
3f> I
UNIVERSITY OF IBADAN LIBRARY
. Muiu.ll I (2(M)S).SVt///.v//a// Physics (2nd Edition), I . Mandl, Manchester Physics, John 
Wiley & Sons. ISBN 97X0471915331 Ross S. (1988). A First Course in Probability. Macmillan Publishing Coy. New York. 
Ross. Sheldon M. (2009). Introduction to probability and statistics fo r  engineers and 
Maxwell. J.C. (I860) Illustrations of the dynamical tneory ot gases. Philosophical scientists (4th ed.). Associated Press, p. 267. ISBN 97S-0-12-370483-2.
Magazine 19. 19-32 and Philosophical Magazine 20. 21-37.
Shangodoyin. D. K... Olubusoye. O. E., Shittu, O.l. and Adepoju. A. A. (2002). 
Mood, A. M., Graybill, F. A. and Boes. D. C. (1963). Introduction to theory o f  Statistical Theory and Methods. Joytal Printing Press, ISBN: 978-2906-23-9 
Statistics. McGraw-Hill Books Coy. Swarup K... Gupta P.K.. and Mohan M. (1978): Operation Research. Sultcm Chant/and 
Moners P. and Peres Y. (2008). Brownian Motion. An Unpublished lecture note on Sons. New Delhi. Second Ed. Reprinted.
the web.
Nadarajah, S. and Kotz, S (2006).The beta-exponential distribution. Reliability Udofia G. (1997): Lecture Series on Stochastic Processes. University' ofUyo, Uvo. 
Engineering and System Safety, 91(1 ):689-697. Nigeria. (Unpublished)
Neumann. P. (1966). "Uber den Median der Binomial- and Poissonverteilung". 
Hissenscluifiliche Zeitschrift tier Technischen Universitat Dresden (in German) 19: Udomboso. C. G. and Shittu O. 1. (2011). Stochastic Processes. A lecture note 
29-33. developed for the Ibadan Distance Learning Programme.
Ugbebor. O. O. and Bassey U. K. (2003). Mathematics for Users. University 
Odeyiiika J.A., and Oseni B.A. (2008): Basic Tools in Statistical Theory. Highland Mathematics Series. Y-Books (a division of Associated Book Makers Nig. Ltd.) 
Publishers Ugbebor, O. O. (2010). Unpublished Lecture notes on Probability in Mathematics and 
Statistics Departments, University of Ibadan.
Ollolsson P. (2005): Probability, Statistics and Stochastic Processes. John Wiley and Wadsworth, G. P. (1960). Introduction to probability and random variables. USA: 
Sons, U.S.A. McGraw-Hill New York. p. 52.
Rogers L. C. G. and Williams D. (1994): Diffusions. Markov processes and 
Martingles, Vol. I . second ed„ Wiley Series in Probability and Mathematical Young. G.A. and Smith R. L. (2005). Essentials of Statistical Inference. Cambridge 
Statistics: Probability and Mathematical Statistics, John Wiley & Sons Ltd.. University Press
Chichester, 1994. Foundations.
Young, H.D and Freedman R.A. (2008): University Physics -  With Modern Physics 
Rogers L C. G. and Williams D. (1994): Diffusions, Markov processes and (12th Edition), (Original edition), Addison-Wesley (Pearson International). 1st 
Martingles. Vol. 2, Cambridge Mathematical Library, Cambridge University Press. Edition: 1949, 12th Edition:, ISBN (10-) 0-321-50130-6, ISBN (13-) 978-0-321- 
Cambridge. 2000, Ito calculus. Reprint of the second (1994) EDITION. 50130-1
Roussas G.G. ( 1973): A First Course in Mathematical Statistics. Addison-Wiley 
Publishing Company
362 363
UNIVERSITY OF IBADAN LIBRARY