import pandas as pd
gss = pd.read_csv('gss_bayes.csv', index_col=0)
gss.head()
caseid | year | age | sex | polviews | partyid | indus10 |
---|---|---|---|---|---|---|
1 | 1974 | 21.0 | 1 | 4.0 | 2.0 | 4970.0 |
2 | 1974 | 41.0 | 1 | 5.0 | 0.0 | 9160.0 |
5 | 1974 | 58.0 | 2 | 6.0 | 1.0 | 2670.0 |
6 | 1974 | 30.0 | 1 | 5.0 | 4.0 | 6870.0 |
7 | 1974 | 48.0 | 1 | 5.0 | 4.0 | 7860.0 |
Political Views and Parties
The other variables we’ll consider are polviews, which describes the political views of the respondents, and partyid, which describes their affiliation with a political party.
The values of polviews are on a seven-point scale:
1 Extremely liberal
2 Liberal
3 Slightly liberal
4 Moderate
5 Slightly conservative
6 Conservative
7 Extremely conservative
I’ll define liberal to be True for anyone whose response is “Extremely liberal”, “Liberal”, or “Slightly liberal”.
The values of partyid are encoded like this:
0 Strong democrat
1 Not strong democrat
2 Independent, near democrat
3 Independent
4 Independent, near republican
5 Not strong republican
6 Strong republican
7 Other party
I’ll define democrat to include respondents who chose “Strong democrat” or “Not strong democrat”:
liberal = (gss['polviews'] <= 3)
democrat = (gss['partyid'] <= 1)
female = (gss['sex'] == 2)  # 1=male, 2=female
female.value_counts()
True 26511
False 22779
Name: sex, dtype: int64
banker = (gss['indus10'] == 6870)
banker.value_counts()
False 48562
True 728
Name: indus10, dtype: int64
def prob(A):
    """Computes the probability of a proposition, A."""
    return A.mean()
def conditional(proposition, given):
    """Probability of proposition conditioned on given."""
    return prob(proposition[given])
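To see what these two functions do, here is a minimal sketch on a tiny made-up table. The DataFrame toy and the names toy_female and toy_liberal are hypothetical and are not part of the GSS data; they only mirror the column coding described above.

```python
import pandas as pd

def prob(A):
    """Fraction of True values in a Boolean Series."""
    return A.mean()

def conditional(proposition, given):
    """Probability of proposition among the rows where given is True."""
    return prob(proposition[given])

# Hypothetical toy data (six made-up respondents, not from the GSS)
toy = pd.DataFrame({
    'sex':      [1, 2, 2, 1, 2, 2],
    'polviews': [2, 3, 5, 6, 1, 4],
})
toy_female = (toy['sex'] == 2)          # 4 of the 6 rows
toy_liberal = (toy['polviews'] <= 3)    # 3 of the 6 rows

p_female = prob(toy_female)
# Selecting with a Boolean Series keeps only the female rows,
# so the mean is the fraction of females who are liberal (2 of 4).
p_liberal_given_female = conditional(toy_liberal, given=toy_female)
```

Because True counts as 1 and False as 0, the mean of a Boolean Series is the fraction of True values, which is why prob can be a single call to mean.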
Exercises
Exercise: Let’s use the tools in this chapter to solve a variation of the Linda problem.
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. Which is more probable?
1. Linda is a banker.
2. Linda is a banker and considers herself a liberal Democrat.
To answer this question, compute
The probability that Linda is a female banker,
The probability that Linda is a liberal female banker, and
The probability that Linda is a liberal female banker and a Democrat.
# Solution goes here
# Combine the two Boolean Series with logical AND (&), then take the mean to get the probability of the conjunction
(female & banker).mean()
0.011381618989653074
# Solution goes here
(liberal & female & banker).mean()
0.002556299452221546
# Solution goes here
(liberal & female & banker & democrat).mean()
0.0012375735443294787
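The three results above shrink as conditions are added, which is the point of the Linda problem: a conjunction can never be more probable than any of its parts. A small sketch with made-up Boolean Series (a, b, and c are hypothetical, not GSS variables) shows the same pattern.

```python
import pandas as pd

# Hypothetical Boolean Series for eight made-up respondents
a = pd.Series([True, True, False, True, False, False, True, False])
b = pd.Series([True, False, False, True, True, False, False, False])
c = pd.Series([True, False, True, False, True, False, False, True])

p_a = a.mean()              # 4 of 8 rows
p_ab = (a & b).mean()       # rows where both a and b hold: 2 of 8
p_abc = (a & b & c).mean()  # rows where all three hold: 1 of 8

# Each added condition can only remove rows, never add them
shrinks = p_a >= p_ab >= p_abc
```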
# Solution goes here
Exercise: Use conditional to compute the following probabilities:
What is the probability that a respondent is liberal, given that they are a Democrat?
What is the probability that a respondent is a Democrat, given that they are liberal?
Think carefully about the order of the arguments you pass to conditional.
# Solution goes here
len(gss[liberal & democrat]) / len(gss[democrat])
0.3891320002215698
liberal[democrat].mean()
0.3891320002215698
# Solution goes here
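Swapping the arguments of conditional generally changes the answer. The sketch below uses made-up Series (toy_liberal and toy_democrat are hypothetical, not the GSS variables) to show the two orders side by side, along with the identity that links them.

```python
import pandas as pd

def prob(A):
    return A.mean()

def conditional(proposition, given):
    return prob(proposition[given])

# Hypothetical data: 10 made-up respondents, 3 liberals, 5 Democrats, 2 who are both
toy_liberal = pd.Series([True, True, True, False, False,
                         False, False, False, False, False])
toy_democrat = pd.Series([True, True, False, True, True,
                          True, False, False, False, False])

# Fraction of Democrats who are liberal: 2 of 5
p_lib_given_dem = conditional(toy_liberal, given=toy_democrat)
# Fraction of liberals who are Democrats: 2 of 3
p_dem_given_lib = conditional(toy_democrat, given=toy_liberal)

# Both products equal the probability of the conjunction
lhs = p_lib_given_dem * prob(toy_democrat)
rhs = p_dem_given_lib * prob(toy_liberal)
```

The two conditionals differ because they are fractions of different groups, even though the numerator (respondents who are both) is the same.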
Exercise: There’s a famous quote about young people, old people, liberals, and conservatives that goes something like:
If you are not a liberal at 25, you have no heart. If you are not a conservative at 35, you have no brain.
Whether you agree with this proposition or not, it suggests some probabilities we can compute as an exercise. Rather than use the specific ages 25 and 35, let’s define young and old as under 30 and 65 or older, respectively:
young = (gss['age'] < 30)
prob(young)
0.19435991073240008
old = (gss['age'] >= 65)
prob(old)
0.17328058429701765
For these thresholds, I chose round numbers near the 20th and 80th percentiles. Depending on your age, you may or may not agree with these definitions of “young” and “old”.
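One way to find percentile-based cutoffs like these is Series.quantile. The sketch below is illustrative only: the ages Series is made up, and this is not necessarily how the thresholds above were chosen.

```python
import pandas as pd

# Hypothetical ages for 15 made-up respondents (not the GSS 'age' column)
ages = pd.Series([18, 22, 25, 29, 33, 38, 41, 47, 52, 58,
                  63, 67, 72, 80, 89], dtype=float)

# 20th and 80th percentiles, using pandas' default linear interpolation
cutoffs = ages.quantile([0.2, 0.8])

# Round the cutoffs to convenient numbers, then define the groups
toy_young = (ages < 30)
toy_old = (ages >= 65)
```

With the real data, the same call on gss['age'] would show how close 30 and 65 are to the 20th and 80th percentiles.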
I’ll define conservative as someone whose political views are “Slightly conservative”, “Conservative”, or “Extremely conservative”.
conservative = (gss['polviews'] >= 5)
prob(conservative)
0.3419354838709677
Use prob and conditional to compute the following probabilities.
What is the probability that a randomly chosen respondent is a young liberal?
What is the probability that a young person is liberal?
What fraction of respondents are old conservatives?
What fraction of conservatives are old?
For each statement, think about whether it is expressing a conjunction, a conditional probability, or both.
For the conditional probabilities, be careful about the order of the arguments. If your answer to the last question is greater than 30%, you have it backwards!
# Solution goes here
prob(young & liberal)
0.06579427875836884
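# Solution goes here
# The question asks for P(liberal | young): conditional(liberal, given=young),
# which equals prob(young & liberal) / prob(young), about 0.3385 from the values above.
# The expression below is the reverse conditional, P(young | liberal).
young[liberal].mean()
0.24034684651300675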
# Solution goes here
prob(old & conservative)
0.06701156421180766
# Solution goes here
prob(old & conservative) / prob(conservative)
0.19597721609113564
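The last solution divides a conjunction by a marginal probability, which is the definition of conditional probability. As a check, the following sketch uses made-up Series (toy_old and toy_conservative are hypothetical) to confirm that the ratio and the conditional function agree.

```python
import pandas as pd

def prob(A):
    return A.mean()

def conditional(proposition, given):
    return prob(proposition[given])

# Hypothetical data for eight made-up respondents
toy_old = pd.Series([True, False, True, False, False, True, False, False])
toy_conservative = pd.Series([True, True, False, False, True, True, False, False])

# Conjunction: fraction of all respondents who are old AND conservative (2 of 8)
p_both = prob(toy_old & toy_conservative)

# Conditional, computed two ways: fraction of conservatives who are old (2 of 4)
by_ratio = p_both / prob(toy_conservative)
by_selection = conditional(toy_old, given=toy_conservative)
```

The conjunction is a fraction of everyone, while the conditional is a fraction of conservatives only, so the conditional is never smaller than the conjunction.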