Analysis of Crime Rates

Economics 201, Spring 2009 Cottrell

This is the final piece of graded work for the half-semester. It is due on Friday, March 6. Only printed copy will be accepted!

1 Outline of the assignment

Download the data file crime2.gdt. This is available in the docs folder on the ECN 201 home page. The full URL is

http://ricardo.ecn.wfu.edu/~cottrell/ecn201/docs/crime2.gdt

The idea is to analyse these data with a view to producing a model that explains as much as possible of the variation in property crime rates across US cities. The basic dependent variable is prop2005, the rate of property crime per 100,000 population in the year 2005. The data are from 244 US cities and “places” with populations of 100,000 or more. Please read the description (menu item /Data/Read info in gretl) for more details.

2 Comments on the data

You will find that the dataset includes several possible explanatory variables, plus some “raw” variables that were used in constructing variables of interest but that probably should not be used in their own right, e.g., genexpend and policefrac, which were used to construct policepc, per-capita expenditure on police protection.

Note that some of the potential explanatory variables are alternatives rather than complements. For example, medhhinc and pcincome are alternative measures of income levels, and fampovpc, totpovpc and povpcu18 are alternative measures of poverty rates. In these cases you should determine which variant has the greatest explanatory power over crime.
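
One simple way to compare alternative measures is to regress the crime rate on each candidate in turn and see which gives the highest R-squared. The sketch below illustrates the idea in Python, using only the standard library. It is not gretl code, the helper names r_squared and best_variant are made up for this illustration, and the numbers are invented rather than taken from crime2.gdt. In gretl itself you would run the corresponding regressions and compare the reported R-squared values.

```python
# Illustrative only: the numbers below are invented, not taken from crime2.gdt.

def r_squared(x, y):
    """R-squared from a simple OLS regression of y on x (with a constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # With a single regressor, R-squared equals the squared correlation.
    return sxy ** 2 / (sxx * syy)


def best_variant(candidates, y):
    """Return the candidate name with the highest R-squared, plus all scores."""
    scores = {name: r_squared(x, y) for name, x in candidates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical values for five cities (made up for illustration).
crime = [3000.0, 4500.0, 5200.0, 6100.0, 7000.0]
candidates = {
    "fampovpc": [8.0, 11.0, 12.5, 15.0, 17.0],
    "totpovpc": [12.0, 13.0, 18.0, 16.0, 21.0],
}

winner, scores = best_variant(candidates, crime)
print(winner)
for name, score in scores.items():
    print(name, round(score, 3))
```

This comparison looks at each variable on its own. The ranking could change once other explanatory variables are included, so it is worth repeating the check inside your fuller model.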

A comment on the education-related variables: nohischool is the percentage of the population aged 25 and over who do not have a high school diploma, and hischool is the percentage with a high school diploma but no higher educational attainment.

And a comment on the age-structure variables such as pop18_24 (population between the ages of 18 and 24): these are in “raw” numerical form, so to get the percentage in each of the age groups you’d need to divide by population and multiply by 100.
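
The conversion from a raw count to a percentage can be sketched as follows. This is a Python illustration, not gretl code. The function name age_share_pct and the variable name population are assumptions made for the example (check the dataset description for the actual name of the total-population variable), and the figures are invented.

```python
def age_share_pct(group_count, population):
    """Percentage of a city's population that falls in one age group."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * group_count / population


# Made-up figures for one city, not taken from crime2.gdt.
pop18_24 = 25_000
population = 200_000  # hypothetical name for the total-population variable

print(age_share_pct(pop18_24, population))  # 12.5
```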

3 “Explore” the data

The first thing to do is explore the data, getting a sense of the numbers involved, and the distributions. This might also tell you whether there are any “odd cases” among the cities that might need special treatment. Your basic tools here are summary statistics (means, medians, standard deviations, etc.) and graphical methods (boxplots, frequency plots, pairwise X-Y scatter plots). You’ll have to be selective here, homing in on what you reckon might be most important: I don’t want 20 pages of plots and summary stats! Also in this context: you might find it helpful to look at some sorted tables (as
