Introduction

The data set used in this research project concerns crime in the small cities of America. The project uses the information in the data file about crime, education and police funding for small American cities in 10 southeastern and eastern states. The states covered include Florida, Georgia, South Carolina, North Carolina, New York, Maine, Rhode Island, Connecticut and New Hampshire (Thomas, 2016).

The data set consists of a total of 100 observations, which represent the 50 small cities in America within these states. The data has been extracted from “Life in America’s Small Cities” by G.S. Thomas. Y is the dependent variable, and all the other variables (X1, X2, X3, X4, X5 and X6) are the regressors and will be used as the independent variables in our final analysis. A description of each variable in the data set follows:

Y = total overall reported crime rate per 1 million residents; X1 = reported violent crime rate per 100,000 residents; X2 = annual police funding in dollars per resident; X3 = % of people aged 25 and over with 4 years of high school; X4 = % of 16 to 19 year-olds not in high school and not high school graduates; X5 = % of 18 to 24 year-olds in college; and X6 = % of people aged 25 and over with at least 4 years of college.
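With these definitions, a multiple regression of Y on the six regressors can be written in the general form below. This is only a sketch of the model form: the coefficient symbols, the error term and the city index i are notation introduced here, not taken from the data source.

```latex
Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i}
      + \beta_4 X_{4i} + \beta_5 X_{5i} + \beta_6 X_{6i} + \varepsilon_i
```

Here each coefficient measures the expected change in the overall crime rate for a one-unit change in the corresponding regressor, holding the other regressors fixed.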

Research Question

The main research question that we want to answer with the selected data set is as follows:

“Do violent crime, police funding and education have a relationship with, and an impact on, the overall rate of crime in small American cities?”

Significance of Research

The research topic for this project centers on crime in the smaller cities of the United States. Crime is a significant issue in the US, and it has been recorded since colonization. Crime rates have varied over time, especially after 1963, and reached peaks between the 1970s and the 1990s. Since then, crime has declined significantly. Over the long run, violent crime in the US has been declining since colonization; however, during the early 20th century, crime rates in the US were higher than those in Europe (FBI, 2016).

Crime rates also started to increase significantly after World War II. Violent crime in the US quadrupled between 1960 and 1991, and property crime more than doubled over the same period. A number of theories have since been proposed to explain the later decline in crime (CIA, 2016). Given the importance of crime to the US economy, we consider the impact of a range of factors on the rate of crime in the smaller US cities.

Basic Diagnosis of Data Problems

It is important that the data set does not contain any problems that might invalidate our later analysis. First, all the variables in the data set were checked for outliers. The calculations were performed in the data set file, and the results can be seen in exhibit 1 in the appendix. All the outliers were removed in deliverable 2, and the computations are shown in exhibit 1. Another important assumption for an exploratory analysis such as regression analysis is that the dependent variable should be approximately normally distributed. This assumption has been tested, and the normal curve in exhibit 2 in the appendix shows that the dependent variable is normally distributed.
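An outlier check of this kind can be sketched in a few lines. The report does not state which rule was used, so the 1.5 x IQR rule below is an assumption chosen for illustration. The function name `iqr_outliers` and the sample values are hypothetical and are not taken from the data set.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: the 1.5 x IQR rule; the report does not name its rule.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up crime rates for illustration only; not from the data set.
sample = [480, 510, 530, 560, 590, 610, 640, 670, 700, 1750]
flagged = iqr_outliers(sample)
cleaned = [v for v in sample if v not in flagged]
print("flagged:", flagged)
print("remaining observations:", len(cleaned))
```

The same function could be applied to each variable in turn, with any flagged observations reviewed before they are removed.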

Finally, we also tested the dependent variable for data entry issues, since this variable also contained outliers. Before the outliers were removed, the histogram showed that the values of the dependent variable were right-skewed, which means that the mean was greater than the median. This skew may have been due to the outliers, and it could have made the validity of certain parametric tests or exploratory analyses questionable. The concern no longer applies because the outliers have been removed from the data set.
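The link between outliers, right skew and a mean above the median can be illustrated with a small sketch. The helper `skew_summary` and the numbers are hypothetical, and the moment-based skewness coefficient used here is one common choice among several.

```python
import statistics


def skew_summary(values):
    """Return (mean, median, skewness) for a list of numbers.

    Skewness is the average cubed z-score (population moment form).
    """
    mean = statistics.fmean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)
    skew = sum(((v - mean) / sd) ** 3 for v in values) / len(values)
    return mean, median, skew


# Made-up values for illustration only; the last one is an extreme point.
with_outlier = [480, 510, 530, 560, 590, 610, 640, 670, 700, 1750]
without_outlier = with_outlier[:-1]

for label, data in [("with outlier", with_outlier),
                    ("without outlier", without_outlier)]:
    mean, median, skew = skew_summary(data)
    print(f"{label}: mean={mean:.1f}, median={median:.1f}, skew={skew:.2f}")
```

In this made-up sample, the single extreme value pulls the mean well above the median and produces a large positive skewness; once it is dropped, the mean and median are close and the skewness is near zero.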

Concept of Selected Methodology, Its Strengths & Weaknesses

The methodology selected for this research paper is multiple regression analysis, because this modeling technique addresses the proposed research question (Armstrong, 2012). We aim to determine the impact of a range of variables, such as the violent crime rate, annual police funding and the education level of different groups of the population, on the rate of crime in small American cities. Multiple regression will tell us whether these factors contribute to the crime rate and whether each one individually increases or decreases it.
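To make the idea concrete, the following is a minimal sketch of ordinary least squares, the usual way of estimating a multiple regression. It is not the analysis performed in this project: the function `fit_ols` is a hypothetical helper, the data are made up, and only two regressors are used to keep the example short.

```python
def fit_ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] for
    y = b0 + b1*x1 + ... + bk*xk, found by solving the normal equations."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # add intercept column
    p = len(rows[0])
    # Augmented matrix for (A^T A) b = A^T y.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("regressors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


# Made-up data generated from y = 100 + 2*x1 - 3*x2 (no noise).
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8], [6, 4]]
y = [100 + 2 * a - 3 * b for a, b in X]

b0, b1, b2 = fit_ols(X, y)
print(f"intercept={b0:.3f}, b1={b1:.3f}, b2={b2:.3f}")
```

The sign of each fitted coefficient indicates whether the corresponding factor is associated with a higher or lower outcome when the other factors are held fixed. In practice a statistics package would also report standard errors and significance tests, which this sketch omits.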

The strengths of this methodology are that it helps us examine the relationship between the predictor variables and the outcome variable, and that it allows multiple variables to be tested in a single model. On the other hand, the model has a number of weaknesses: its outputs can lie outside the range of 0 to 1, which makes prediction difficult when the outcome is a probability or proportion; it is highly sensitive to the presence of outliers in the data; and it often gives only estimates of the unknown parameters (Aldrich, 2005)...
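The sensitivity to outliers can be shown with a small sketch. It uses a single regressor for brevity, and the function `slope` and the data are hypothetical and not drawn from the data set.

```python
import statistics


def slope(x, y):
    """Least-squares slope of y on x for a single regressor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Made-up data lying exactly on the line y = 2x + 1.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y_clean = [2 * v + 1 for v in x]
# Same data with the last observation replaced by an extreme value.
y_outlier = y_clean[:-1] + [60]

print("slope without outlier:", round(slope(x, y_clean), 3))
print("slope with outlier:", round(slope(x, y_outlier), 3))
```

Changing one observation out of eight more than doubles the estimated slope in this example, which is why outliers are checked before a regression is fitted.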

