Comparing Two Categorical Variables - Module 2 | POL 242, Exams of Political Science

Material Type: Exam; Professor: Toppen; Class: Scope and Methods; Subject: Political Science; University: Hope College; Term: Unknown 1989;

Uploaded on 08/07/2009

Module 2: Comparing Two Categorical Variables

In Module 1, we looked at describing a single variable; in Module 2, we will start to compare variables to one another and look for relationships between them. Before we make such a comparison, it is worth stressing the importance of random sampling and saying a few words about what makes a good hypothesis.

Random Sampling

Most of the data used by political scientists is a sample of the population they are trying to measure. A population is every single case of whatever the researcher wants to study. Usually it is not possible to obtain data on every member of a population, so a researcher uses a sample instead: a smaller set of cases drawn from the population. For a sample to be acceptable for statistical testing, it must be representative of the population as a whole. To help ensure that a sample is representative, researchers use a technique known as random sampling. The idea behind random sampling is to arrange all the members of the population in a list, randomly select a number of them, and obtain data from those selected. The key is that every member of the population must have an equal chance of being selected; if this weren't true, the sample wouldn't be truly random, and therefore would not be representative of the population. Most statistical tests, and all of those discussed in this guide, assume that the data you are testing come from a random sample and are truly representative of the population.

Suppose that a researcher wants to do a study on the American public. He obtains a list of the telephone numbers of every person in Alabama, Colorado, Indiana, Maine, and Oregon. He then randomly selects 1000 numbers and obtains the data he needs from them. Is this sample representative of his target population? No, of course not. Only people in those five states have a chance to be surveyed, whereas someone in Michigan has no chance at all. Therefore, the population for his sample is only the public in those five states, not the entire American public.

Now suppose that the same researcher is able to get a list of the phone numbers of everyone in America. To obtain his sample, he selects every 10,000th person on the list and surveys him or her. Is this an acceptable method for the researcher to use? Again, the answer is no. The method is not random: everyone on the list does not have an equal chance of being selected, because only every 10,000th person can ever be chosen. This selection is calculated rather than random.

The most popular way of getting a random sample is to use a table of random numbers (available in most statistics textbooks). Other methods of making selections random include rolling dice or using a computer's random-number generator. The data used in this guide can be assumed to come from random samples. GSS1998.dta and NES2000.dta are prime examples of random samples obtained by professional organizations. STATES.dta and WORLD.dta are examples of cases where it is possible to collect data on an entire population. Thus, we don't have to concern ourselves with whether these samples are representative.

Keep in mind that even representative samples contain some sampling error. The statistical tests that we will be exploring account for this error and let us draw conclusions about a relationship with a stated degree of certainty.
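The contrast between the two selection methods described above can be sketched in code. This is an illustrative Python sketch, not part of the guide; the function names and the stand-in phone list are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members so that every member has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def every_kth_sample(population, k):
    """Select every k-th member. This is calculated, not random: once k is
    fixed, most members have zero chance of ever being selected."""
    return population[::k]

phone_list = list(range(100_000))  # hypothetical stand-in for a national phone list

random_sample = simple_random_sample(phone_list, 1000, seed=42)
systematic = every_kth_sample(phone_list, 10_000)

print(len(random_sample))  # 1000 members, each drawn with equal probability
print(len(systematic))     # 10: only these 10 members could ever be chosen
```

Only the first method satisfies the equal-chance requirement that the statistical tests in this guide assume.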

A Good Hypothesis

Every empirical study starts with a good hypothesis. A hypothesis states the results that a researcher expects to obtain from testing. Usually a hypothesis has been well researched and involves a theory that the researcher hopes to support with data. A good hypothesis provides three important pieces of information about the study: the population, the variables involved, and the expected direction of the relationship. A rough skeleton of a good hypothesis is:

In comparing (insert population), those (cases) with a higher (independent variable) will have a (higher/lower) (dependent variable) than will those with a lower (independent variable).

An independent variable is the cause in the relationship, whereas the dependent variable is the effect of a change in the independent variable. On a graph, the x-axis represents the independent variable and the y-axis the dependent variable. A good way to distinguish them is to remember that the value of the dependent variable depends on the value of the independent variable. A few examples of good hypotheses are:

In comparing the states of the United States, those states with a higher high school graduation rate will have a higher voting rate than will those states with a lower high school graduation rate.

In comparing individuals in the United States, men are more likely to oppose same-sex marriages than are women.

Notice how the hypothesis is different in the second example to accommodate the fact that the independent variable is nominal. With some practice, you will learn to write good hypotheses that identify the population, variables, and direction of the relationship. When performing statistical tests, it is important to note that we do not test the hypothesis directly, but rather the null hypothesis. The null hypothesis is essentially the opposite of your hypothesis (your hypothesis is referred to as the alternative hypothesis and is represented by Ha). The null (represented by Ho) states that there is no relationship between the variables. In statistical testing, we seek not to prove the alternative, but rather to provide evidence against the null.

Comparing Two Categorical Variables

Now that we know how to judge whether our data come from a random sample and how to write a good hypothesis, we can begin to make comparisons between two variables. Depending on the level of measurement of the variables, there are different statistical tests we can employ to test for a relationship. In this module, you will learn to compare two categorical (nominal or ordinal) variables and see if they are related.

Figure 2.2: Gunlaw Opinions by Gender. Histograms of the percent who favor or oppose gun permits, graphed by respondent's sex (male and female).

Figure 2.3

Chi-Square Test of Significance

For two categorical variables, the statistics we need are found with a cross-tabulation procedure known as the chi-square test of significance. A cross-tab procedure creates a table of the frequencies of the data, with the values of the independent variable defining the columns and the values of the dependent variable defining the rows. Each cell of the table represents a particular combination of the variables' categories. Since these variables have two categories each, the table will have four cells (in addition to the row and column totals), one for each possible combination of the categories: men who favor gun laws, men who oppose gun laws, women who favor gun laws, and women who oppose gun laws.

Let's run this procedure so you can visualize the table. Go to Statistics>Summaries, tables & tests>Tables>Two-way tables with measures of association. Enter the independent variable (sex) in the column variable field and the dependent variable (gunlaw) in the row variable field. We want Stata to provide us with several pieces of data, so under Test statistics check the boxes next to Pearson's chi-squared, Kendall's tau-b, and Cramer's V. Under Cell contents, check Pearson's chi-squared, Within-column relative frequencies, and Expected frequencies. The window should now look like Figure 2.4. Click OK and view the output (Figure 2.5).
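The guide performs this procedure through Stata's menus. As a cross-check on what the cross-tab actually computes, here is a small Python sketch using hypothetical counts (not the GSS1998 figures): it builds the expected frequencies under the null hypothesis and sums the chi-square statistic cell by cell.

```python
# Hypothetical 2x2 table: columns = male, female; rows = favor, oppose.
# These counts are illustrative only, not the GSS1998 data used in the guide.
observed = [[60, 80],   # favor gun laws
            [40, 20]]   # oppose gun laws

n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Under the null hypothesis, each cell's expected count is
# (row total * column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Each cell contributes (observed - expected)^2 / expected to the chi-square.
chi_square = sum((o - e) ** 2 / e
                 for obs_row, exp_row in zip(observed, expected)
                 for o, e in zip(obs_row, exp_row))

print(expected)              # [[70.0, 70.0], [30.0, 30.0]]
print(round(chi_square, 3))  # 9.524
```

The within-column relative frequencies Stata reports are just each cell's frequency divided by its column total.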

Figure 2.4

The key tells us that each cell contains four pieces of information: the frequency, the expected frequency, the chi-squared contribution, and the column percentage. Let's examine what these numbers mean by considering the first cell in the table (males who favor gun laws). The frequency is listed first and has a value of 599, which means that 599 of the people surveyed were male and favored gun laws. The last statistic listed is the column percentage (77.29), which means that 77.29% of the column (males, in this case) favor gun laws.

The other two pieces of data in the cell require a little more explanation. As we mentioned earlier, statistical analyses test the null hypothesis rather than the alternative hypothesis. In this example, the null hypothesis is that there is no relationship between "sex" and "gunlaw." Thus, we would expect the percentage of people who favor gun laws to be the same for both males and females. This is where the expected frequency comes in: it is the count each cell would contain if the null hypothesis were true, and the chi-squared contribution measures how far each cell's observed frequency departs from its expected frequency.

To find the significance of the relationship by hand, you need the chi-square value, the degrees of freedom, and a chi-square distribution table. The degrees of freedom for a table are given by (number of rows - 1) * (number of columns - 1). Since our table has two rows and two columns, its degrees of freedom = (2 - 1) * (2 - 1) = 1.

At the end of this guide is a chi-square distribution table. The values in the top row of this table (.25, .20, .15, etc.) are called critical values and serve as measures of statistical significance. To read the chart, find the row that corresponds to your degrees of freedom and read the values from left to right until they pass your chi-square value, then note which critical value that column falls under. For example, suppose we had a chi-square of 10 with 2 degrees of freedom. We read the second row of the table until the values jump from 9.21 (the .01 critical value) to 10.6 (the .005 critical value). We then select the larger of the two critical values, .01, to be cautious, and take that as our measure of statistical significance. In our sex-gunlaw table, the chi-square is 38.362 with one degree of freedom. This value is far off the right end of the chart, so we use the last available critical value (.0005).

Today, most statistical programs (including Stata) can provide the critical value of a chi-square for us. In fact, the programs go a step further and find the exact probability (rather than the static values on the chi-square distribution table). These values are commonly referred to as P-values (probability values). In our Stata output, the P-value for the sex-gunlaw relationship is 0.000, which isn't surprising given how far off the chart the chi-square value fell. P-values are the key to statistical significance.
P-values tell us how confident we can be that the relationship in the sample mirrors the population as a whole; in other words, how likely it is that the relationship in the sample happened "by chance." For a test to be considered statistically significant, the P-value has to be .05 or less. (This threshold is referred to as an alpha level. Some researchers require different alpha levels, but .05 is commonly accepted as the standard for statistical significance.) If we took many samples of the population, the P-value tells us the proportion of the time we would find a relationship as strong as the one we are observing if the null were true. By requiring the P-value to be .05 or less, we are saying that a relationship this strong would appear by chance only 5% of the time or less if the null were true, so the null hypothesis is likely to be wrong. In testing, we never actually prove our hypothesis; instead, we provide evidence against the null.

Thus, only tests with a P-value less than or equal to .05 are considered statistically significant. If a test has a P-value greater than .05, it is not statistically significant, and the results have very little use in the scientific community. (Prof. Toppen's article in the 1996 issue of Peace and Change, "Development and Human Rights: An Alternative Analysis," is an exception: his research found no relationship between development and human rights, whereas most other reports had found a significant relationship.)

Note: P-values are not specific to chi-square testing alone. They are used in most statistical procedures to assure the statistical significance of a relationship.
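The table lookup described above can also be checked numerically. The following Python sketch is not from the guide; it uses the closed-form chi-square tail probabilities that happen to exist for 1 and 2 degrees of freedom (a general-purpose routine would use a statistics library instead).

```python
from math import erfc, exp, sqrt

def chi2_pvalue(chi2, df):
    """P(chi-square > chi2), using closed forms for df = 1 and df = 2."""
    if df == 1:
        # A chi-square(1) variable is the square of a standard normal.
        return erfc(sqrt(chi2 / 2))
    if df == 2:
        # A chi-square(2) variable is exponential with mean 2.
        return exp(-chi2 / 2)
    raise ValueError("closed form implemented only for df = 1 or 2")

# The guide's worked example: chi-square = 10 with 2 degrees of freedom.
# The result lands between the .01 and .005 critical values, matching the table lookup.
print(round(chi2_pvalue(10, 2), 4))   # 0.0067

# The sex-gunlaw result: chi-square = 38.362 with 1 degree of freedom.
# The P-value is tiny, which Stata displays as 0.000.
print(chi2_pvalue(38.362, 1))
```

This also shows why exact P-values are more informative than the static critical values on the printed table.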

Measures of Association

After determining that a chi-square test is statistically significant, we need to describe the strength of the relationship it has identified. The strength of a relationship is given by statistics known as measures of association. The measures of association we are interested in for a chi-square test are Kendall's tau-b and Cramer's V, depending on the variables in the test.

Kendall's tau-b is used when both variables are ordinal and have the same number of categories (a "square" table). If the variables have a different number of categories, Kendall's tau-b changes to Kendall's tau-c in most statistical programs and books, but Stata will still refer to it as Kendall's tau-b. Kendall's tau statistics tell us both the direction and the strength of the association. The statistic has a magnitude between 0 and 1, though any magnitude over .5 is rare. Most statisticians consider a Kendall's tau value over .3 to be "strong" and anything below .1 to be "weak"; values between .1 and .3 are considered "moderate." The statistic can be either positive or negative, indicating the direction of the relationship. (However, the sign of the statistic can be reversed depending on how the variables are coded. Check your table to make sure that what is labeled a positive relationship really is positive before you report any values.)

Cramer's V is used when at least one of the variables is nominal. It measures only the strength, not the direction, of a relationship (because nominal variables have no direction). Since gender is a nominal variable, Cramer's V is appropriate for the sex-gunlaw relationship. Stata reports this statistic as -.1445, but we ignore the negative sign. Thus, there is a moderate relationship between gun law opinions and a person's gender.
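For reference, Cramer's V can be computed directly from the chi-square statistic. The sketch below is not from the guide; in particular, the sample size of 1837 is back-solved from the guide's reported chi-square (38.362) and V (.1445), since the guide does not report n directly.

```python
from math import sqrt

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V = sqrt(chi2 / (n * min(rows - 1, cols - 1))), ranging 0 to 1."""
    return sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))

# For a 2x2 table, min(rows - 1, cols - 1) = 1, so V = sqrt(chi2 / n).
# n = 1837 is a back-solved assumption (roughly chi2 / V**2), not a figure from the guide.
v = cramers_v(38.362, 1837, 2, 2)
print(round(v, 4))  # 0.1445
```

By the thresholds given above, .1445 falls in the moderate (.1 to .3) range.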

Reporting the Results

A good report on your test will include the hypothesis, information on the variables you used, a P-value for statistical significance, the measure of association for any observed relationship, and an explanation in words. For the gunlaw-sex relationship, we might write:

In comparing individuals in the United States, females are more likely to favor laws requiring gun permits than are males. To test this hypothesis, we chose two variables from GSS1998.dta, "sex" and "gunlaw," and performed a chi-square test of significance on the data. "Sex" records a respondent's gender and "gunlaw" records whether or not the person supports gun laws. The P-value for this relationship is 0.000 (chi-square = 38.362 with 1 degree of freedom), and Cramer's V is .1445. Therefore, with great confidence, we can conclude that gender has a moderate relationship with gun law opinions; specifically, women are more likely to favor gun laws than are men.

Of course, there are other pieces of information you could add. For instance, it would be wise to report summary statistics on each variable and the percentage of each gender favoring gun laws. You might also include the histogram we created earlier and a copy of the chi-square table (you can easily copy and paste graphs into Microsoft Word, but you would have to recreate the table in Excel, because information doesn't copy cleanly out of the Results window in Stata). In short, the more information you include, the more valuable your report becomes.

Figure 2.7

Exercises for Module 2

  1. A researcher wants to study a possible relationship between gender and attitudes toward homosexual marriage among the people of the United States.
     a. Write a possible hypothesis for a research paper.
     b. What is the null hypothesis?
     c. Briefly explain how she could gather data suitable for statistical analysis.
  2. A researcher has two variables she is analyzing. One is named "school" and measures the amount of education a person has, from 1 (less than high school) to 4 (graduate school). The other variable is "aaschool," which measures a person's support for affirmative action in college admissions (1 means "oppose" and 2 means "support"). The researcher's hypothesis is: In comparing individuals, people with a higher level of education are more likely to support affirmative action in college admissions than are people with less education.
     a. Identify the dependent and independent variables.
     b. What is the appropriate measure of association for this relationship: Kendall's tau-b, tau-c, or Cramer's V?
  3. In GSS1998.dta, create a histogram and run a chi-square using "homosex" as the dependent variable and "income3" as the independent variable. Write up the results of the analysis. Include a hypothesis for the relationship and provide the P-value, chi-square value, and appropriate measure of association. Explain what the results mean "in plain English."