import pandas as pd

gss = pd.read_csv('gss_bayes.csv', index_col=0)
gss.head()
        year   age  sex  polviews  partyid  indus10
caseid
1       1974  21.0    1       4.0      2.0   4970.0
2       1974  41.0    1       5.0      0.0   9160.0
5       1974  58.0    2       6.0      1.0   2670.0
6       1974  30.0    1       5.0      4.0   6870.0
7       1974  48.0    1       5.0      4.0   7860.0

Political Views and Parties

The other variables we’ll consider are polviews, which describes the political views of the respondents, and partyid, which describes their affiliation with a political party.

The values of polviews are on a seven-point scale:

1   Extremely liberal
2   Liberal
3   Slightly liberal
4   Moderate
5   Slightly conservative
6   Conservative
7   Extremely conservative

I’ll define liberal to be True for anyone whose response is “Extremely liberal”, “Liberal”, or “Slightly liberal”.
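The table above stores polviews as floats (4.0, 5.0), which in pandas often means some responses are missing (NaN). That is an assumption, not something checked here. If it holds, note that a comparison such as `<= 3` evaluates to False for NaN, so missing responses are counted as "not liberal" instead of being dropped. A minimal sketch on a hypothetical toy Series:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for gss['polviews']; NaN marks a missing response
polviews = pd.Series([1, 3, 4, 7, np.nan, 2])

# The comparison yields a boolean Series; NaN <= 3 evaluates to False
liberal = polviews <= 3
print(liberal.tolist())   # [True, True, False, False, False, True]

# The mean therefore keeps the missing response in the denominator
print(liberal.mean())     # 0.5

# To exclude missing responses instead, condition on the answered rows
answered = polviews.notna()
print(liberal[answered].mean())   # 0.6
```

Which convention is right depends on the question; the definitions in this chapter use the first one.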

The values of partyid are encoded like this:

0   Strong democrat
1   Not strong democrat
2   Independent, near democrat
3   Independent
4   Independent, near republican
5   Not strong republican
6   Strong republican
7   Other party

I’ll define democrat to include respondents who chose “Strong democrat” or “Not strong democrat”:

liberal = (gss['polviews'] <= 3)
democrat = (gss['partyid'] <= 1)
female = (gss['sex'] == 2)  # 1=male, 2=female
female.value_counts()
True     26511
False    22779
Name: sex, dtype: int64
banker = (gss['indus10'] == 6870)
banker.value_counts()
False    48562
True       728
Name: indus10, dtype: int64
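`value_counts` reports raw counts, while the probabilities in this chapter are fractions. As a sketch on hypothetical toy data (not the GSS), `value_counts(normalize=True)` and the mean of a boolean Series give the same fraction, because True counts as 1 and False as 0:

```python
import pandas as pd

# Hypothetical toy data: 2 of 8 respondents have the banking code 6870
indus10 = pd.Series([6870, 4970, 9160, 6870, 2670, 7860, 4970, 9160])
banker = indus10 == 6870

counts = banker.value_counts()                   # absolute counts of True/False
fractions = banker.value_counts(normalize=True)  # counts divided by the total

print(counts.loc[True])     # 2
print(fractions.loc[True])  # 0.25
print(banker.mean())        # 0.25, the same fraction
```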
def prob(A):
    """Computes the probability of a proposition, A."""
    return A.mean()

def conditional(proposition, given):
    """Probability of proposition conditioned on given."""
    return prob(proposition[given])
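In `conditional`, the boolean index `proposition[given]` keeps only the rows where `given` is True, and `prob` averages over those rows. A small sketch on hypothetical toy data shows that this matches the definition P(A | B) = P(A and B) / P(B):

```python
import pandas as pd

def prob(A):
    """Computes the probability of a proposition, A."""
    return A.mean()

def conditional(proposition, given):
    """Probability of proposition conditioned on given."""
    return prob(proposition[given])

# Hypothetical toy data for 10 respondents
female = pd.Series([True, True, False, True, False, True, False, True, True, False])
banker = pd.Series([True, False, False, True, True, False, False, False, True, False])

print(prob(female))                        # 0.6
print(conditional(banker, given=female))   # 0.5: 3 of the 6 women are bankers
# Same value (up to floating-point rounding) from the definition
print(prob(banker & female) / prob(female))
```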

Exercises

Exercise: Let’s use the tools in this chapter to solve a variation of the Linda problem.

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. Which is more probable?

  1. Linda is a banker.

  2. Linda is a banker and considers herself a liberal Democrat.

To answer this question, compute

  • The probability that Linda is a female banker,

  • The probability that Linda is a liberal female banker, and

  • The probability that Linda is a liberal female banker and a Democrat.
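Whatever the data, each added condition in a conjunction can only remove rows, so the probability can never go up. The sketch below checks this on hypothetical random propositions (not GSS data), generated with NumPy:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000

# Hypothetical random propositions, independent of one another
banker = pd.Series(rng.random(n) < 0.05)
liberal = pd.Series(rng.random(n) < 0.3)
democrat = pd.Series(rng.random(n) < 0.35)

p1 = banker.mean()
p2 = (banker & liberal).mean()
p3 = (banker & liberal & democrat).mean()
print(p1, p2, p3)

# Each '&' selects a subset of the previous rows
assert p1 >= p2 >= p3
```

This is why the option with more details attached cannot be the more probable one.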

# P(female and banker): '&' combines the boolean Series element-wise, and prob takes the mean
prob(female & banker)
0.011381618989653074
# P(liberal and female and banker)
prob(liberal & female & banker)
0.002556299452221546
# P(liberal and female and banker and Democrat)
prob(liberal & female & banker & democrat)
0.0012375735443294787

Exercise: Use conditional to compute the following probabilities:

  • What is the probability that a respondent is liberal, given that they are a Democrat?

  • What is the probability that a respondent is a Democrat, given that they are liberal?

Think carefully about the order of the arguments you pass to conditional.
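To see why the order matters, here is a sketch on hypothetical toy data in which 3 respondents are both liberal and Democrats, out of 4 liberals and 5 Democrats. The proposition goes first and the condition goes in `given`:

```python
import pandas as pd

def conditional(proposition, given):
    """Probability of proposition conditioned on given."""
    return proposition[given].mean()

# Hypothetical toy data: 4 liberals, 5 Democrats, 3 who are both
liberal = pd.Series([True, True, True, True, False, False, False, False, False, False])
democrat = pd.Series([True, True, True, False, True, True, False, False, False, False])

print(conditional(liberal, given=democrat))   # 0.6: 3 of the 5 Democrats are liberal
print(conditional(democrat, given=liberal))   # 0.75: 3 of the 4 liberals are Democrats
```

Both results share the same numerator (the 3 respondents in the conjunction) but divide by the size of a different group.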

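# P(liberal | Democrat), computed directly from counts as a check
len(gss[liberal & democrat]) / len(gss[democrat])
0.3891320002215698
# The same probability using conditional: proposition first, condition second
conditional(liberal, given=democrat)
0.3891320002215698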
# P(Democrat | liberal): swap the arguments
conditional(democrat, given=liberal)

Exercise: There’s a famous quote about young people, old people, liberals, and conservatives that goes something like:

If you are not a liberal at 25, you have no heart. If you are not a conservative at 35, you have no brain.

Whether you agree with this proposition or not, it suggests some probabilities we can compute as an exercise. Rather than use the specific ages 25 and 35, let’s define young as under 30 and old as 65 or over:

young = (gss['age'] < 30)
prob(young)
0.19435991073240008
old = (gss['age'] >= 65)
prob(old)
0.17328058429701765

For these thresholds, I chose round numbers near the 20th and 80th percentiles. Depending on your age, you may or may not agree with these definitions of “young” and “old”.
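One way to check how close round-number thresholds are to the 20th and 80th percentiles is `Series.quantile`; with the real data that would be `gss['age'].quantile([0.2, 0.8])`. A sketch on hypothetical toy ages, using pandas' default linear interpolation:

```python
import pandas as pd

# Hypothetical toy ages, standing in for gss['age']
age = pd.Series([19, 23, 27, 31, 35, 42, 48, 55, 61, 67, 72, 80])

q = age.quantile([0.2, 0.8])
print(q.loc[0.2], q.loc[0.8])   # about 27.8 and 65.8

# Fractions below and at-or-above the round-number thresholds
print((age < 30).mean(), (age >= 65).mean())   # 0.25 0.25
```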

I’ll define conservative as someone whose political views are “Slightly conservative”, “Conservative”, or “Extremely conservative”.

conservative = (gss['polviews'] >= 5)
prob(conservative)
0.3419354838709677

Use prob and conditional to compute the following probabilities.

  • What is the probability that a randomly chosen respondent is a young liberal?

  • What is the probability that a young person is liberal?

  • What fraction of respondents are old conservatives?

  • What fraction of conservatives are old?

For each statement, think about whether it is expressing a conjunction, a conditional probability, or both.

For the conditional probabilities, be careful about the order of the arguments. If your answer to the last question is greater than 30%, you have it backwards!

# P(young and liberal): a conjunction
prob(young & liberal)
0.06579427875836884
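# P(liberal | young): condition on young, so young goes in the `given` argument
# (equals prob(young & liberal) / prob(young), about 0.339)
conditional(liberal, given=young)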
# P(old and conservative): a conjunction
prob(old & conservative)
0.06701156421180766
# P(old | conservative): divide the conjunction by the probability of the condition
prob(old & conservative) / prob(conservative)
0.19597721609113564
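The two orders of conditioning are linked by Bayes’s theorem, P(conservative | old) = P(old | conservative) P(conservative) / P(old). Plugging in the values printed above gives the "backwards" answer the hint warns about, and it is indeed above 30%:

```python
# Values copied from the outputs above
p_old = 0.17328058429701765
p_conservative = 0.3419354838709677
p_old_given_conservative = 0.19597721609113564

# Bayes's theorem: reverse the direction of the conditional probability
p_conservative_given_old = p_old_given_conservative * p_conservative / p_old
print(round(p_conservative_given_old, 3))   # 0.387
```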