# Statistical Inference & Linear Regression Harvard Case Solution & Analysis

Question 1

Q1a). Two line graphs were generated in the Excel sheet, and there do seem to be noticeable differences in PC knowledge between customers with and without a PC. The variation in PC knowledge is higher for customers without a PC, and PC knowledge is much higher on average for customers who own a PC. The two graphs are as follows:

Q1b). The calculations were performed in Excel, and the results are as follows:

| | Mean | Confidence Interval Lower Limit | Confidence Interval Upper Limit |
|---|---|---|---|
| PC-Knowledge with PC | 3.57 | 3.26 | 3.89 |
| PC-Knowledge without PC | 2.55 | 2.31 | 2.80 |

Q1c). The confidence intervals based on the equal variance test are as follows:

| CONFIDENCE INTERVAL | Employees with Own PC | Employees with No PC |
|---|---|---|
| Mean | 3.59 | 2.59 |
| Z value at 95% | 1.96 | 1.96 |
| S.E | 0.16 | 0.12 |
| Lower Limit | 3.277 | 2.354 |
| Upper Limit | 3.899 | 2.820 |
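The limits in this table can be reproduced from summary statistics. The following is a minimal sketch, assuming the interval is the mean plus or minus 1.96 standard errors and taking the means, variances and observation counts from the t-test output reported for this question; the function name `z_interval` is illustrative and not part of the original workbook.

```python
import math

def z_interval(mean, variance, n, z=1.96):
    """Return (standard error, lower limit, upper limit) for a z-based interval."""
    se = math.sqrt(variance / n)   # standard error of the sample mean
    margin = z * se                # half-width of the interval
    return se, mean - margin, mean + margin

# Summary statistics as reported in the equal-variance t-test output
groups = {
    "Employees with Own PC": (3.588235294, 0.855614973, 34),
    "Employees with No PC": (2.586956522, 0.647826087, 46),
}

for name, (mean, variance, n) in groups.items():
    se, lower, upper = z_interval(mean, variance, n)
    print(f"{name}: SE={se:.2f}, lower={lower:.3f}, upper={upper:.3f}")
```

Under these assumptions the computed limits agree with the table to three decimal places.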

The results of the equal variance test are as follows:

t-Test: Two-Sample Assuming Equal Variances

| | Employees with Own PC | Employees with No PC |
|---|---|---|
| Mean | 3.588235294 | 2.586956522 |
| Variance | 0.855614973 | 0.647826087 |
| Observations | 34 | 46 |
| Pooled Variance | 0.73573677 | |
| Hypothesized Mean Difference | 1.018 | |
| df | 78 | |
| t Stat | -0.08619465 | |
| P(T<=t) one-tail | 47% | |
| t Critical one-tail | 1.664624645 | |
| P(T<=t) two-tail | 93% | |
| t Critical two-tail | 1.990847036 | |

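As the two-tail p-value is 93%, which is higher than the 5% level of significance, the null hypothesis cannot be rejected. Note that the test was run with a hypothesized mean difference of 1.018, so this result shows that the observed difference between the two means (about 1.00) is not significantly different from 1.018. It does not by itself show that the two means are the same.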
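The t statistic in the output can be recomputed from the summary statistics. The sketch below assumes the standard pooled-variance formula; `pooled_t` is an illustrative helper, not part of the original workbook. It evaluates the statistic at the hypothesized mean difference of 1.018 listed in the table and, for comparison only, at a hypothesized difference of zero.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """Pooled-variance two-sample t statistic from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2 - hypothesized_diff) / se
    return pooled_var, df, t

# Values as reported in the t-test output
stats = (3.588235294, 0.855614973, 34, 2.586956522, 0.647826087, 46)
T_CRIT_TWO_TAIL = 1.990847036  # two-tail critical value from the same output

pooled_var, df, t_reported = pooled_t(*stats, hypothesized_diff=1.018)
_, _, t_zero = pooled_t(*stats)  # comparison case: hypothesized difference of 0

print(f"pooled variance = {pooled_var:.6f}, df = {df}")
print(f"t (difference = 1.018) = {t_reported:.4f}, reject: {abs(t_reported) > T_CRIT_TWO_TAIL}")
print(f"t (difference = 0)     = {t_zero:.4f}, reject: {abs(t_zero) > T_CRIT_TWO_TAIL}")
```

The choice of hypothesized difference drives the conclusion, so it is worth confirming which null hypothesis the question actually calls for.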

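Q1d). The sample size needed would be 82.17 for customers with a PC and 62.22 for customers without a PC. Since a sample size must be a whole number, these round up to 83 and 63 respectively:

| | With PC | Without PC |
|---|---|---|
| Sample Size | 82.17 | 62.22 |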
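The required sample size for estimating a mean is commonly computed as n = (z·s / E)², where E is the desired margin of error. The text does not state the margin of error used, so the sketch below assumes E = 0.2 with z = 1.96 and the sample variances from the Q1c output; that assumed value is chosen because it reproduces the reported figures.

```python
import math

def required_sample_size(variance, margin_of_error, z=1.96):
    """Sample size needed to estimate a mean to within the given margin of error."""
    return (z ** 2) * variance / margin_of_error ** 2

ASSUMED_MARGIN = 0.2  # assumption: the margin of error is not stated in the text

for label, variance in [("with PC", 0.855614973), ("without PC", 0.647826087)]:
    n = required_sample_size(variance, ASSUMED_MARGIN)
    # A sample size must be a whole number, so round up
    print(f"{label}: n = {n:.2f}, rounded up to {math.ceil(n)}")
```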

Q1e). The confidence interval for the true proportion of the PC-savvy customers is:

| One sample t-test | |
|---|---|
| Count | 82 |
| Mean | 2.988 |
| Standard deviation | 1.000 |
| Standard error | 0.110 |
| Hypothetical mean | 4 |
| alpha | 0.05 |
| tails | 1 |
| df | 81 |
| t stat | -9.167 |
| p value | 0% |
| sig | Yes |
| Lower Control Limit | 2.77 |
| Upper Control Limit | 3.20 |

Question 2

Q2a). The mean and standard deviation of total US gross for each distributor are as follows:

| MEAN & SD | Mean | Standard Deviation |
|---|---|---|
| Sony Pictures | 63062074 | 73728582.89 |
| Warner Bros. | 73316434 | 81424660.81 |
| 20th Century Fox | 74272230 | 78079986.82 |
| Fox Searchlight | 12410194 | 14759045.01 |
| Universal | 59017596 | 55201941.51 |

Q2b). The results of the one-sample t-test are:

| One Sample t-test | |
|---|---|
| Count | 103 |
| Mean | 58836846.7 |
| Standard Deviation | 69920396.24 |
| Standard Error | 6889461.356 |
| Hypothetical Mean | 50000000 |
| alpha | 0.05 |
| tails | 1 |
| df | 102 |
| t stat | 1.28266148 |
| p value | 10% |

The mean total US gross for the five largest movie distributors does not significantly exceed \$50 million, as the one-tail p-value of 10% is above the 5% level of significance.
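The standard error and t statistic in this test follow directly from the count, mean and standard deviation. The sketch below recomputes them; `one_sample_t` is an illustrative helper. Because the standard library has no Student t distribution, the one-tail p-value is approximated with the normal distribution, which is an assumption that is reasonable only because df = 102 is large.

```python
import math
from statistics import NormalDist

def one_sample_t(mean, sd, n, hypothesized_mean):
    """Return (standard error, t statistic) for a one-sample t-test."""
    se = sd / math.sqrt(n)
    t = (mean - hypothesized_mean) / se
    return se, t

# Values as reported in the one-sample t-test output
se, t = one_sample_t(58836846.7, 69920396.24, 103, 50_000_000)

# Large-sample normal approximation to the upper-tail p-value (not an exact t p-value)
p_approx = 1 - NormalDist().cdf(t)

print(f"SE = {se:,.0f}, t = {t:.4f}, approximate one-tail p = {p_approx:.2f}")
```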

Q2c). The results of the one-way ANOVA are as follows:

ANOVA

| Source of Variation | SS | df | MS | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Between Groups | 4.738E+16 | 4 | 1.185E+16 | 2.572 | 0.042 | 2.465 |
| Within Groups | 4.513E+17 | 98 | 4.605E+15 | | | |
| Total | 4.987E+17 | 102 | | | | |

As the P-value (0.042) is less than 5% and the F value (2.572) is higher than the F crit value (2.465), it can be concluded that there are significant differences in mean total US gross among the five distributors.
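The F value in the ANOVA table is the ratio of the between-group mean square to the within-group mean square. A minimal sketch of that arithmetic, using the rounded sums of squares shown in the table (so the result matches only to about three decimals); `anova_f` is an illustrative name:

```python
def anova_f(ss_between, df_between, ss_within, df_within):
    """Return (MS between, MS within, F) from sums of squares and degrees of freedom."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within

F_CRIT = 2.465  # critical value as reported in the ANOVA table

ms_between, ms_within, f_value = anova_f(4.738e16, 4, 4.513e17, 98)
print(f"MS between = {ms_between:.3e}, MS within = {ms_within:.3e}")
print(f"F = {f_value:.3f}, exceeds F crit: {f_value > F_CRIT}")
```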

Q2d). The confidence intervals based on the Tukey correction are:

| TUKEY CORRECTION | Total US Gross for Sony | Total US Gross for Warner Bros. | Total US Gross for 20th Century | Total US Gross for Fox Searchlight | Total US Gross for Universal |
|---|---|---|---|---|---|
| Mean | 63062074.04 | 73316434.36 | 74272230 | 12410193.88 | 59017596.18 |
| Count | 23 | 22 | 24 | 17 | 17 |
| Standard Deviation | 73728582.89 | 81424660.81 | 78079986.82 | 14759045.01 | 55201941.51 |
| S.E | 15373472.26 | 17359796.01 | 15938010.57 | 3579594.207 | 13388437.39 |
| Z value at 95% | 1.96 | 1.96 | 1.96 | 1.96 | 1.96 |
| Lower Control Limit | 32930068.41 | 39291234.18 | 43033729.28 | 5394189.237 | 32776258.9 |
| Upper Control Limit | 93194079.68 | 107341634.5 | 105510730.7 | 19426198.53 | 85258933.46 |

The overall confidence intervals for all the distributors are:

| Overall Total US Gross | |
|---|---|
| Mean | 58836846.7 |
| Count | 103 |
| Standard Deviation | 69920396.24 |
| S.E | 6889461.356 |
| Z value at 95% | 1.96 |
| Lower Control Limit | 45333502.44 |
| Upper Control Limit | 72340190.96 |

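The confidence intervals for the individual distributors are much wider than the overall interval because each is based on a smaller sample. The intervals for Sony, Warner Bros., 20th Century Fox and Universal all overlap, so they do not point to clear differences among those four. Only the Fox Searchlight interval (about \$5.4 million to \$19.4 million) lies entirely below the others, which suggests that Fox Searchlight's mean total US gross is significantly lower. Note that these intervals use the z value of 1.96 for each distributor rather than a Tukey critical value.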
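One informal way to read the per-distributor intervals is to check which pairs fail to overlap. The sketch below does this with the limits from the table; it is a rough heuristic only and not a Tukey procedure, since overlapping intervals do not prove that two means are equal.

```python
from itertools import combinations

# Lower and upper limits as reported in the per-distributor table
intervals = {
    "Sony Pictures": (32930068.41, 93194079.68),
    "Warner Bros.": (39291234.18, 107341634.5),
    "20th Century Fox": (43033729.28, 105510730.7),
    "Fox Searchlight": (5394189.237, 19426198.53),
    "Universal": (32776258.9, 85258933.46),
}

def overlaps(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Collect every pair of distributors whose intervals do not overlap
non_overlapping = [
    (x, y)
    for x, y in combinations(intervals, 2)
    if not overlaps(intervals[x], intervals[y])
]

for x, y in non_overlapping:
    print(f"No overlap: {x} vs {y}")
```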


Question 3

Q3a). The results of the regression model are as follows:

| Regression Statistics | |
|---|---|
| Multiple R | 0.208203733 |
| R Square | 0.043348794 |
| Adjusted R Square | 0.034391386 |
| Standard Error | 8.354657572 |
| Observations | 540 |

| ANOVA | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 5 | 1688.970113 | 337.7940226 | 4.839434894 | 0.000244522 |
| Residual | 534 | 37273.36188 | 69.80030314 | | |
| Total | 539 | 38962.33199 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
|---|---|---|---|---|---|---|---|---|
| Intercept | -2.642159211 | 3.346530759 | -0.789521867 | 0.430157473 | -9.216138678 | 3.931820255 | -9.216138678 | 3.931820255 |
| GRI | -2.110460859 | 0.738857893 | -2.85638264 | 0.004451872 | -3.561885324 | -0.659036393 | -3.561885324 | -0.659036393 |
| SAT | 0.005734797 | 0.002659567 | 2.156289466 | 0.031506883 | 0.0005103 | 0.010959295 | 0.0005103 | 0.010959295 |
| MBA | -0.180646966 | 0.756643724 | -0.238747723 | 0.811392803 | -1.667010207 | 1.305716274 | -1.667010207 | 1.305716274 |
| AGE | -0.06889255 | 0.041817798 | -1.647445675 | 0.100054737 | -0.151040112 | 0.013255012 | -0.151040112 | 0.013255012 |
| TEN | -0.11872167 | 0.083502131 | -1.421780125 | 0.155673863 | -0.282754614 | 0.045311274 | -0.282754614 | 0.045311274 |
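The summary figures in the regression output are linked by standard identities: R Square is the regression sum of squares over the total, F is the ratio of the two mean squares, and the Standard Error is the square root of the residual mean square. The sketch below recomputes them from the ANOVA rows and, assuming a 5% significance level, lists the predictors whose reported p-values fall below it. Variable names are illustrative.

```python
import math

# ANOVA rows as reported in the regression output
ss_regression, df_regression = 1688.970113, 5
ss_residual, df_residual = 37273.36188, 534
ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual

r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * df_total / df_residual
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)
std_error = math.sqrt(ss_residual / df_residual)

# (coefficient, standard error, p-value) as reported; intercept omitted
predictors = {
    "GRI": (-2.110460859, 0.738857893, 0.004451872),
    "SAT": (0.005734797, 0.002659567, 0.031506883),
    "MBA": (-0.180646966, 0.756643724, 0.811392803),
    "AGE": (-0.06889255, 0.041817798, 0.100054737),
    "TEN": (-0.11872167, 0.083502131, 0.155673863),
}

ALPHA = 0.05  # assumed significance level
t_stats = {name: coef / se for name, (coef, se, _) in predictors.items()}
significant = [name for name, (_, _, p) in predictors.items() if p < ALPHA]

print(f"R^2 = {r_squared:.6f}, adjusted R^2 = {adj_r_squared:.6f}")
print(f"F = {f_stat:.4f}, standard error = {std_error:.4f}")
print("Significant at 5%:", significant)
```

With these inputs only GRI and SAT have p-values below 0.05, and the R Square of about 0.043 indicates that the five predictors together explain roughly 4% of the variation in the dependent variable.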

................................
