# Predicting Real Estate Prices Harvard Case Solution & Analysis

STEP 1: Linearity

First, we test whether each independent variable has a linear relationship with the dependent variable, the listing price. To check this, scatter plots of the listing price against each of the seven independent variables have been provided. Each of the seven scatter plots is analyzed below:

1. The first scatter plot shows the listing price against the interior floor space. The plot suggests a positive linear relationship of weak to moderate strength. It also shows that the variance is not constant across the range of floor space.
2. The second scatter plot shows the listing price against the land size. There is a linear relationship, but because the data are highly scattered, it is only a weak positive one. This might create a problem for the regression model from a linearity perspective.
3. The third scatter plot shows the listing price against the number of bedrooms. The relationship appears linear, positive, and strong for properties with up to six bedrooms, and the variance appears roughly constant over that range.
4. The fourth scatter plot shows the listing price against the number of bathrooms. It shows a linear, positive, and strong relationship between the dependent and independent variable, and the variance appears roughly constant.
5. The fifth scatter plot shows the listing price against the age of the property. It shows a weak negative linear relationship: as the age of the property increases, its listing price tends to decrease. Because the relationship is weak, this variable might pose problems for the regression model from a linearity perspective.
6. The sixth scatter plot shows the listing price against the number of fireplaces. It shows a linear, positive, and moderate relationship between the two variables, and the variance appears constant for properties with up to four fireplaces.
7. The seventh scatter plot shows the listing price against the view of the property. It shows no visible relationship between the two variables, so view is likely to be the most problematic variable for the regression model from a linearity perspective.
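The visual checks above can be complemented with Pearson's correlation coefficient r, which measures the strength and direction of a linear relationship. Below is a minimal Python sketch using only the standard library. The column names, the toy numbers, and the 0.1 / 0.3 / 0.5 verbal cut-offs are illustrative assumptions, not data or thresholds from the case.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant column has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def describe_strength(r):
    """Verbal label for r; the 0.1 / 0.3 / 0.5 cut-offs are a rule of thumb (assumption)."""
    size = abs(r)
    if size < 0.1:
        return "none"
    if size < 0.3:
        strength = "weak"
    elif size < 0.5:
        strength = "moderate"
    else:
        strength = "strong"
    return f"{strength} {'positive' if r > 0 else 'negative'}"


def linearity_summary(data, target):
    """Correlate every predictor column in `data` (a dict of lists) with `target`."""
    y = data[target]
    return {name: pearson_r(col, y) for name, col in data.items() if name != target}


# Hypothetical toy data (NOT the case's data): price in $000s, floor space in sq ft.
toy = {
    "price": [200, 250, 310, 330, 400],
    "floor_space": [1000, 1200, 1500, 1600, 2000],
    "age": [40, 35, 20, 25, 5],
}
summary = linearity_summary(toy, "price")
for name, r in summary.items():
    print(f"{name}: r = {r:.3f} ({describe_strength(r)})")
```

Note that r only captures linear association: a pattern that holds only over part of the range, such as the bedroom relationship up to six bedrooms, can be missed or diluted, which is why the scatter plots remain the primary check.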

STEP 2: Multi-Collinearity

Multi-collinearity occurs when the independent variables are correlated with one another rather than independent. This can make the coefficient estimates of the regression model unstable and hard to interpret. The correlation matrix shows multi-collinearity between the number of bedrooms and the interior floor space, with a correlation of +0.838 (83.8%), and between the age of the property and the number of bathrooms, with a negative correlation of -0.711 (-71.1%). These correlations between the x variables might cause problems for the regression model.
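One way to put the two reported correlations in perspective is the variance inflation factor (VIF). The Python sketch below flags predictor pairs whose |r| exceeds a cut-off of 0.7, which is an assumed rule of thumb rather than a threshold stated in the case, and uses only the two correlations quoted above; the remaining entries of the case's correlation matrix are not reproduced. For a single pair, 1 / (1 - r²) is the VIF in a two-predictor model. With all seven predictors, the VIF regresses x_j on every other predictor, whose R² can only be at least r², so the pairwise figure is a lower bound.

```python
def pairwise_vif(r):
    """Lower bound on the VIF implied by one pairwise correlation: 1 / (1 - r**2)."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)


def flag_collinear(corr, threshold=0.7):
    """Return predictor pairs with |r| above `threshold`, strongest first.

    `corr` maps (name_a, name_b) tuples to correlations; 0.7 is an assumed cut-off.
    """
    flagged = [pair for pair, r in corr.items() if abs(r) > threshold]
    return sorted(flagged, key=lambda pair: -abs(corr[pair]))


# Only the two correlations quoted in the text; variable names are illustrative.
reported = {
    ("bedrooms", "floor_space"): 0.838,
    ("age", "bathrooms"): -0.711,
}

for pair in flag_collinear(reported):
    r = reported[pair]
    print(f"{pair}: r = {r:+.3f}, VIF >= {pairwise_vif(r):.2f}")
```

For the reported values the lower bounds are roughly 3.36 and 2.02. The VIFs computed from the case's full seven-predictor data could be higher.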

STEP 3: Analysis

a). The multiple regression model using all seven independent variables to predict the listing price of the property has been fitted in the Excel spreadsheet and is shown below.
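For reference, the full model fitted in this step can be written as below. The labels $x_1$ to $x_7$ are an assumption that follows the order of the scatter plots in Step 1; only $x_4$ (number of bathrooms) is named explicitly in the case. The $b_j$ are the least-squares coefficients reported by Excel.

$$
\widehat{\text{Price}} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4 + b_5 x_5 + b_6 x_6 + b_7 x_7
$$

where $x_1$ is interior floor space, $x_2$ land size, $x_3$ number of bedrooms, $x_4$ number of bathrooms, $x_5$ age, $x_6$ number of fireplaces, and $x_7$ view. Under this labelling, removing the fireplaces and then the bathrooms leaves $\widehat{\text{Price}} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_5 x_5 + b_7 x_7$, with every coefficient re-estimated rather than carried over.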

b). The F statistic and its significance value in the ANOVA table show that the overall model is significant: the p-value of the F-test is reported as 0%, meaning it is effectively zero, which is below the level of significance.
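The overall F-test in part b) can be reproduced from R², the number of observations n, and the number of predictors k. The Python sketch below uses `scipy.stats.f.sf` for the upper-tail probability. The inputs in the usage line are hypothetical, since the case's R² and sample size are not quoted in this section.

```python
from scipy import stats


def overall_f_test(r_squared, n, k):
    """Overall F-test of H0: all k slope coefficients are zero.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), compared with an F(k, n - k - 1)
    distribution; the p-value corresponds to Excel's "Significance F".
    """
    if not 0 <= r_squared < 1:
        raise ValueError("R^2 must be in [0, 1)")
    df_model, df_resid = k, n - k - 1
    if df_resid <= 0:
        raise ValueError("need more observations than predictors + 1")
    f_stat = (r_squared / df_model) / ((1 - r_squared) / df_resid)
    p_value = stats.f.sf(f_stat, df_model, df_resid)
    return f_stat, p_value


# Hypothetical inputs: R^2 = 0.80 from n = 100 listings with the k = 7 predictors.
f_stat, p_value = overall_f_test(r_squared=0.80, n=100, k=7)
print(f"F = {f_stat:.2f}, p = {p_value:.2e}")
```

A p-value shown as 0% is one that is too small to display at the chosen precision, not one that is exactly zero.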

c). The coefficients table shows that the number of fireplaces has a p-value of 88%, the highest of all the insignificant x variables, so it is removed first. Step 2 of the backward elimination, after excluding this independent variable, has been generated in the Excel spreadsheet and is shown below.

d). The coefficients table of the reduced model shows that the number of bathrooms (x4) is still insignificant, with a p-value of 59%. A new multiple regression model has therefore been generated after eliminating x4. This is the final reduced model, shown as the final model in the Excel spreadsheet.
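Parts c) and d) follow a backward elimination: refit, drop the predictor with the largest p-value if it exceeds the significance level, and repeat. The sketch below automates that loop in Python with NumPy and SciPy in place of Excel. The 0.05 level, the function names, and the synthetic data are assumptions for illustration; none of them come from the case.

```python
import numpy as np
from scipy import stats


def ols_pvalues(X, y):
    """Fit OLS with an intercept; return (coefficients, two-sided p-values).

    Index 0 is the intercept; index j + 1 belongs to column j of X.
    """
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    dof = n - design.shape[1]
    sigma2 = resid @ resid / dof                      # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)   # covariance of beta
    t_stats = beta / np.sqrt(np.diag(cov))
    return beta, 2 * stats.t.sf(np.abs(t_stats), dof)


def backward_elimination(X, y, names, alpha=0.05):
    """Drop the highest-p predictor until every remaining p-value is <= alpha.

    Returns {name: p-value} for the predictors kept in the final model.
    """
    cols = list(range(X.shape[1]))
    while cols:
        _, p = ols_pvalues(X[:, cols], y)
        pred_p = p[1:]                                # skip the intercept
        worst = int(np.argmax(pred_p))
        if pred_p[worst] <= alpha:
            return {names[c]: float(pv) for c, pv in zip(cols, pred_p)}
        print(f"dropping {names[cols[worst]]} (p = {pred_p[worst]:.2f})")
        del cols[worst]
    return {}


# Synthetic data: y depends on x1 and x2; "noise" is unrelated to y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3 + 2 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=200)
kept = backward_elimination(X, y, ["x1", "x2", "noise"])
print(kept)
```

Removing one variable per step matters: after fireplaces are dropped, the remaining p-values change, which is why bathrooms are only judged in the refitted model of part d).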
