Material Type: Notes; Class: ST: Prog Analy &Mechanization; Subject: Computer Science; University: University of New Mexico; Term: Fall 2007;
Typology: Study notes
Marginal probabilities
If you have a joint PDF:
... and want to know about the probability of just one RV (regardless of what happens to the others)
Marginal PDF of X1 or X2:
f(x1) = ∫ f(x1, x2) dx2
f(x2) = ∫ f(x1, x2) dx1
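For a discrete joint PMF, the same idea is just a row or column sum. A minimal sketch (the 2×3 joint table here is hypothetical, chosen only so the sums are easy to check):

```python
import numpy as np

# Hypothetical joint PMF over X1 (rows) and X2 (columns); entries sum to 1.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.15, 0.20]])

# Marginal of X1: sum out X2 (the discrete analogue of integrating over x2).
p_x1 = joint.sum(axis=1)   # [0.40, 0.60]

# Marginal of X2: sum out X1.
p_x2 = joint.sum(axis=0)   # [0.35, 0.35, 0.30]

print(p_x1, p_x2)
```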
Conditional probabilities
Suppose you have a joint PDF, f ( H , W )
Now you get to see one of the values, e.g., H=“ 183cm ”
What’s your probability estimate of W, given this new knowledge?
f(W | H=h) = f(h, W) / f(h), where f(h) = ∫ f(h, w) dw
From the conditional probability rule, it’s 2 steps to Bayes’ rule:
(It often helps algebraically to think of the “given that” operator, “|”, as a division operation)
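The two steps, written out (this is the standard derivation from the conditional probability rule):

```latex
% Step 1: conditional probability rule, applied both ways
f(W \mid H) = \frac{f(H, W)}{f(H)},
\qquad
f(H \mid W) = \frac{f(H, W)}{f(W)}

% Step 2: solve both for the joint, equate, and divide
f(W \mid H)\, f(H) = f(H, W) = f(H \mid W)\, f(W)
\;\Rightarrow\;
f(W \mid H) = \frac{f(H \mid W)\, f(W)}{f(H)}
```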
Maximum likelihood treats parameters as (unknown) constants
Job is just to pick the constants so as to maximize data likelihood
Full-blown Bayesian modeling treats params as random variables
PDF over parameter variables tells us how certain/uncertain we are about the location of that parameter
Also allows us to express prior beliefs (probabilities) about params
Have a “weighted” coin -- want to figure out θ=Pr[heads]
Maximum likelihood:
Flip coin a bunch of times, measure #heads; #tails
Use estimator to return a single value for θ
Bayesian (MAP):
Start w/ distribution over what θ might be
Flip coin a bunch of times, measure #heads; #tails
Update distribution, but never reduce to a single number
[Figure: paired plots of f(θ), θ=Pr[heads] on [0,1], comparing the MAP posterior against the ML point estimate after 1 flip (1 heads, 0 tails), 5 flips (2 heads, 3 tails), 20 flips (8 heads, 12 tails), and 50 flips (16 heads, 34 tails).]
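The same comparison can be run numerically. A minimal sketch: the ML estimate is the empirical head fraction, and the MAP estimate is the posterior mode under a Beta prior. The Beta(2,2) prior here is an assumption (a mild belief that the coin is near-fair); the notes don’t say which prior produced the plots.

```python
# ML vs. MAP estimates of theta = Pr[heads] for a weighted coin.

def ml_estimate(heads, tails):
    """Maximum likelihood: the empirical fraction of heads."""
    return heads / (heads + tails)

def map_estimate(heads, tails, a=2.0, b=2.0):
    """MAP under an assumed Beta(a, b) prior: mode of the
    Beta(a + heads, b + tails) posterior."""
    return (heads + a - 1.0) / (heads + tails + a + b - 2.0)

# Same head/tail counts as the figure panels.
for heads, tails in [(1, 0), (2, 3), (8, 12), (16, 34)]:
    n = heads + tails
    print(f"{n:2d} flips: ML = {ml_estimate(heads, tails):.3f}, "
          f"MAP = {map_estimate(heads, tails):.3f}")
```

Note how after a single heads the ML estimate jumps to 1.0, while the MAP estimate (2/3 here) is pulled toward the prior; with more flips the two converge.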
Think of parameters as just another kind of random variable
Now your data distribution is f(X | Θ), conditioned on the parameter variable Θ
This is the generative distribution
A.k.a. observation distribution, sensor model, etc.
What we want is some model of the parameter as a function of the data, f(Θ | X)
Get there with Bayes’ rule:
f(Θ | X) = f(X | Θ) f(Θ) / f(X)
Let’s look at the parts:
Generative distribution
Describes how data is generated by the underlying process
Usually easy to write down (well, easier than the other parts, anyway)
Same old PDF/PMF we’ve been working with
Can be used to “generate” new samples of data that “look like” your training data
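“Generating” new samples just means drawing from f(X | Θ) with Θ fixed. A minimal sketch for the coin model (a Bernoulli generative distribution; the seed, θ value, and sample count are all arbitrary choices for illustration):

```python
import random

random.seed(0)  # arbitrary seed, for repeatability

def generate(theta, n):
    """Draw n coin flips from the Bernoulli generative
    distribution f(X | theta): 1 = heads, 0 = tails."""
    return [1 if random.random() < theta else 0 for _ in range(n)]

flips = generate(theta=0.4, n=1000)
print(sum(flips) / len(flips))  # empirical head frequency, near theta
```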
The data prior :
Expresses the probability of seeing data set X independent of any particular model
Huh?
Can get it from the joint data/parameter model:
f(X) = ∫ f(X | θ) f(θ) dθ
In practice, often don’t need it explicitly (why?)
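One way to see why it’s often unneeded: for the MAP estimate you maximize over θ, and f(X) doesn’t depend on θ, so it drops out of the argmax:

```latex
\hat{\theta}_{\mathrm{MAP}}
  = \arg\max_{\theta} \frac{f(X \mid \theta)\, f(\theta)}{f(X)}
  = \arg\max_{\theta} f(X \mid \theta)\, f(\theta)
```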