## Statistical Inference & Linear Regression Case Solution

**Question 1**

Q1a). Two line graphs were generated in the Excel sheet, and they show noticeable differences in PC knowledge between customers with and without a PC. The variation in PC knowledge is higher for the customers without a PC, and PC knowledge is much higher on average for the customers who own a PC. The two graphs are as follows:

Q1b). The calculations were performed in Excel and are as follows:

| | Mean | Confidence Interval Lower Limit | Confidence Interval Upper Limit |
|---|---|---|---|
| PC-Knowledge with PC | 3.57 | 3.26 | 3.89 |
| PC-Knowledge without PC | 2.55 | 2.31 | 2.80 |

Q1c). The confidence intervals based upon the equal variance test are as follows:

| Confidence Interval | Employees with Own PC | Employees with No PC |
|---|---|---|
| Mean | 3.59 | 2.59 |
| Z value at 95% | 1.96 | 1.96 |
| S.E | 0.16 | 0.12 |
| Lower Limit | 3.277 | 2.354 |
| Upper Limit | 3.899 | 2.820 |
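
The limits in this table follow from mean ± z × S.E, with S.E taken as the square root of variance divided by sample size. The sketch below reproduces them from the means, variances and observation counts reported in the t-test output of this solution; the function name `z_confidence_interval` is illustrative and not part of the original workbook.

```python
import math

def z_confidence_interval(mean, variance, n, z=1.96):
    """Return (lower, upper) limits of a z-based confidence interval for a mean."""
    standard_error = math.sqrt(variance / n)  # S.E of the sample mean
    margin = z * standard_error
    return mean - margin, mean + margin

# Summary statistics as reported in the t-test output of this solution.
own_pc = z_confidence_interval(3.588235294, 0.855614973, 34)
no_pc = z_confidence_interval(2.586956522, 0.647826087, 46)

print("Own PC:", [round(x, 3) for x in own_pc])
print("No PC: ", [round(x, 3) for x in no_pc])
```

The two intervals do not overlap, which is a quick informal indication that the group means differ.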

The results of the equal variance test are as follows:

t-Test: Two-Sample Assuming Equal Variances

| | Employees with Own PC | Employees with No PC |
|---|---|---|
| Mean | 3.588235294 | 2.586956522 |
| Variance | 0.855614973 | 0.647826087 |
| Observations | 34 | 46 |
| Pooled Variance | 0.73573677 | |
| Hypothesized Mean Difference | 1.018 | |
| df | 78 | |
| t Stat | -0.08619465 | |
| P(T<=t) one-tail | 47% | |
| t Critical one-tail | 1.664624645 | |
| P(T<=t) two-tail | 93% | |
| t Critical two-tail | 1.990847036 | |

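The two-tail p-value of 93% is higher than the level of significance, so the null hypothesis cannot be rejected. Because the test was run with a hypothesized mean difference of 1.018, the null hypothesis here is that the difference between the two means equals 1.018, not that the two means are the same. The observed difference of about 1.00 is therefore not significantly different from 1.018, which is consistent with the higher PC knowledge of PC owners noted in Q1a.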
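
The t statistic in the table can be reproduced from the summary statistics alone. The following is a minimal sketch, assuming the standard pooled-variance formula; the function name `pooled_t_statistic` is illustrative. Note that the hypothesized mean difference enters the numerator, so the value chosen for it changes what the test is asking.

```python
import math

def pooled_t_statistic(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """Two-sample t statistic assuming equal variances.

    Returns (pooled_variance, degrees_of_freedom, t_statistic).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2 - hypothesized_diff) / standard_error
    return pooled_var, df, t

stats = (3.588235294, 0.855614973, 34, 2.586956522, 0.647826087, 46)

# As in the table: hypothesized mean difference of 1.018.
pooled_var, df, t_table = pooled_t_statistic(*stats, hypothesized_diff=1.018)

# For comparison: the usual null hypothesis of no difference between the means.
_, _, t_zero = pooled_t_statistic(*stats, hypothesized_diff=0.0)

print(round(pooled_var, 6), df, round(t_table, 4), round(t_zero, 2))
```

With `hypothesized_diff=0.0` the same summary statistics give a t statistic of about 5.16, well above the two-tail critical value of 1.99 shown in the table.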

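Q1d). The sample size needed would be 82.17 for customer PC knowledge with a PC and 62.22 without a PC, which in practice means rounding up to 83 and 63 customers respectively:

| | With PC | Without PC |
|---|---|---|
| Sample Size | 82.17 | 62.22 |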
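
The solution does not state the margin of error behind these figures. As a hedged sketch, the usual formula n = z² × variance / E² reproduces both numbers when E is assumed to be 0.2 and the variances are taken from the t-test output above; the margin of error and the function name `required_sample_size` are assumptions, not values given in the case.

```python
import math

def required_sample_size(variance, margin_of_error, z=1.96):
    """Sample size needed to estimate a mean within +/- margin_of_error."""
    return (z ** 2) * variance / margin_of_error ** 2

ASSUMED_MARGIN = 0.2  # assumption: not stated in the solution

n_with_pc = required_sample_size(0.855614973, ASSUMED_MARGIN)
n_without_pc = required_sample_size(0.647826087, ASSUMED_MARGIN)

# A sample size must be a whole number, so round up.
print(round(n_with_pc, 2), math.ceil(n_with_pc))
print(round(n_without_pc, 2), math.ceil(n_without_pc))
```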

Q1e). The confidence interval for the true proportion of the PC-savvy customers is:

| One sample t-test | |
|---|---|
| Count | 82 |
| Mean | 2.988 |
| Standard deviation | 1.000 |
| Standard error | 0.110 |
| Hypothetical mean | 4 |
| alpha | 0.05 |
| tails | 1 |
| df | 81 |
| t stat | -9.167 |
| p value | 0% |
| sig | Yes |
| Lower Control Limit | 2.77 |
| Upper Control Limit | 3.20 |

**Question 2**

Q2a). The means and standard deviations are as follows:

| Distributor | Mean | Standard Deviation |
|---|---|---|
| Sony Pictures | 63062074 | 73728582.89 |
| Warner Bros. | 73316434 | 81424660.81 |
| 20th Century Fox | 74272230 | 78079986.82 |
| Fox Searchlight | 12410194 | 14759045.01 |
| Universal | 59017596 | 55201941.51 |

Q2b). The results of the One-Sample t-test are:

| One Sample t-test | |
|---|---|
| Count | 103 |
| Mean | 58836846.7 |
| Standard Deviation | 69920396.24 |
| Standard Error | 6889461.356 |
| Hypothetical Mean | 50000000 |
| alpha | 0.05 |
| tails | 1 |
| df | 102 |
| t stat | 1.28266148 |
| p value | 10% |

The mean total US gross for the five largest movie distributors does not significantly exceed $50 million, as the one-tail p-value of 10% is above the 5% level of significance.
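
The standard error and t statistic in the table follow directly from the count, mean and standard deviation. A minimal sketch of that arithmetic, with the illustrative function name `one_sample_t`:

```python
import math

def one_sample_t(sample_mean, sample_sd, n, hypothesized_mean):
    """Return (standard_error, degrees_of_freedom, t_statistic)."""
    standard_error = sample_sd / math.sqrt(n)
    t = (sample_mean - hypothesized_mean) / standard_error
    return standard_error, n - 1, t

# Values taken from the one-sample t-test table above.
se, df, t_stat = one_sample_t(58836846.7, 69920396.24, 103, 50_000_000)

print(round(se, 3), df, round(t_stat, 5))
```

The p-value in the table is the one-tail probability of a t value at least this large with 102 degrees of freedom; computing it needs a t-distribution function, which Python's standard library does not provide.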

Q2c). The results of the One-Way ANOVA are as follows:

ANOVA

| Source of Variation | SS | df | MS | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Between Groups | 4.738E+16 | 4 | 1.185E+16 | 2.572 | 0.042 | 2.465 |
| Within Groups | 4.513E+17 | 98 | 4.605E+15 | | | |
| Total | 4.987E+17 | 102 | | | | |

As the p-value (0.042) is less than 5% and the F value (2.572) is higher than the F crit value (2.465), it can be concluded that there are significant differences in the mean total US gross among the five popular distributors.
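
The F value is the ratio of the two mean squares, each being a sum of squares divided by its degrees of freedom. The sketch below recomputes it from the rounded SS values in the table, so the result matches only to the precision shown there; the function name `anova_f` is illustrative.

```python
def anova_f(ss_between, df_between, ss_within, df_within):
    """Return (MS between, MS within, F) for a one-way ANOVA table."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within

# SS and df as reported (rounded) in the ANOVA table above.
ms_between, ms_within, f_value = anova_f(4.738e16, 4, 4.513e17, 98)

F_CRIT = 2.465  # critical value as reported in the table
print(f"{ms_between:.3e} {ms_within:.3e} {f_value:.3f}")
print("Reject H0 of equal means:", f_value > F_CRIT)
```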

Q2d). The confidence intervals based on Tukey correction are:

| Tukey Correction | Total US Gross for Sony | Total US Gross for Warner Bros. | Total US Gross for 20th Century | Total US Gross for Fox Searchlight | Total US Gross for Universal |
|---|---|---|---|---|---|
| Mean | 63062074.04 | 73316434.36 | 74272230 | 12410193.88 | 59017596.18 |
| Count | 23 | 22 | 24 | 17 | 17 |
| Standard Deviation | 73728582.89 | 81424660.81 | 78079986.82 | 14759045.01 | 55201941.51 |
| S.E | 15373472.26 | 17359796.01 | 15938010.57 | 3579594.207 | 13388437.39 |
| Z value at 95% | 1.96 | 1.96 | 1.96 | 1.96 | 1.96 |
| Lower Control Limit | 32930068.41 | 39291234.18 | 43033729.28 | 5394189.237 | 32776258.9 |
| Upper Control Limit | 93194079.68 | 107341634.5 | 105510730.7 | 19426198.53 | 85258933.46 |
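
One way to read these intervals is to check which pairs of distributors have intervals that do not overlap. The sketch below does this using the lower and upper limits from the table; the helper name `overlaps` is illustrative.

```python
from itertools import combinations

# (lower limit, upper limit) per distributor, as reported in the table above.
intervals = {
    "Sony Pictures": (32930068.41, 93194079.68),
    "Warner Bros.": (39291234.18, 107341634.5),
    "20th Century Fox": (43033729.28, 105510730.7),
    "Fox Searchlight": (5394189.237, 19426198.53),
    "Universal": (32776258.9, 85258933.46),
}

def overlaps(a, b):
    """True if two closed intervals (low, high) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

non_overlapping = [
    (x, y)
    for x, y in combinations(intervals, 2)
    if not overlaps(intervals[x], intervals[y])
]

for pair in non_overlapping:
    print(pair)
```

This is only an informal reading: non-overlap of individual intervals is not the same as a formal Tukey HSD comparison, which would use the studentized range distribution rather than a z value of 1.96.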

The overall confidence intervals for all the distributors are:

| Overall Total US Gross | |
|---|---|
| Mean | 58836846.7 |
| Count | 103 |
| Standard Deviation | 69920396.24 |
| S.E | 6889461.356 |
| Z value at 95% | 1.96 |
| Lower Control Limit | 45333502.44 |
| Upper Control Limit | 72340190.96 |

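The confidence intervals for the individual distributors are much wider than the confidence interval for the overall total US gross. The interval for Fox Searchlight (about $5.4 million to $19.4 million) lies entirely below the intervals of the other four distributors, whereas the intervals for Sony Pictures, Warner Bros., 20th Century Fox and Universal overlap one another. The significant difference found in the ANOVA therefore appears to be driven mainly by Fox Searchlight and not by differences among all five distributors.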

**Question 3**

Q3a). The results of the regression model are as follows:

| Regression Statistics | |
|---|---|
| Multiple R | 0.208203733 |
| R Square | 0.043348794 |
| Adjusted R Square | 0.034391386 |
| Standard Error | 8.354657572 |
| Observations | 540 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 5 | 1688.970113 | 337.7940226 | 4.839434894 | 0.000244522 |
| Residual | 534 | 37273.36188 | 69.80030314 | | |
| Total | 539 | 38962.33199 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
|---|---|---|---|---|---|---|---|---|
| Intercept | -2.642159211 | 3.346530759 | -0.789521867 | 0.430157473 | -9.216138678 | 3.931820255 | -9.216138678 | 3.931820255 |
| GRI | -2.110460859 | 0.738857893 | -2.85638264 | 0.004451872 | -3.561885324 | -0.659036393 | -3.561885324 | -0.659036393 |
| SAT | 0.005734797 | 0.002659567 | 2.156289466 | 0.031506883 | 0.0005103 | 0.010959295 | 0.0005103 | 0.010959295 |
| MBA | -0.180646966 | 0.756643724 | -0.238747723 | 0.811392803 | -1.667010207 | 1.305716274 | -1.667010207 | 1.305716274 |
| AGE | -0.06889255 | 0.041817798 | -1.647445675 | 0.100054737 | -0.151040112 | 0.013255012 | -0.151040112 | 0.013255012 |
| TEN | -0.11872167 | 0.083502131 | -1.421780125 | 0.155673863 | -0.282754614 | 0.045311274 | -0.282754614 | 0.045311274 |
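
The headline figures of this output are linked by simple identities: R Square is the regression SS divided by the total SS, the F value is the ratio of the mean squares, and each t Stat is a coefficient divided by its standard error. The sketch below checks these relationships using only the numbers reported above; the names `regression_summary` and `coefficients` are illustrative.

```python
def regression_summary(ss_regression, ss_residual, n_obs, n_predictors):
    """Return (R squared, adjusted R squared, F) from the ANOVA sums of squares."""
    ss_total = ss_regression + ss_residual
    df_residual = n_obs - n_predictors - 1
    r_squared = ss_regression / ss_total
    adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual
    f_value = (ss_regression / n_predictors) / (ss_residual / df_residual)
    return r_squared, adj_r_squared, f_value

r2, adj_r2, f_value = regression_summary(1688.970113, 37273.36188, 540, 5)

# (coefficient, standard error, reported p-value) for each predictor.
coefficients = {
    "GRI": (-2.110460859, 0.738857893, 0.004451872),
    "SAT": (0.005734797, 0.002659567, 0.031506883),
    "MBA": (-0.180646966, 0.756643724, 0.811392803),
    "AGE": (-0.06889255, 0.041817798, 0.100054737),
    "TEN": (-0.11872167, 0.083502131, 0.155673863),
}

# t Stat = coefficient / standard error.
t_stats = {name: coef / se for name, (coef, se, _) in coefficients.items()}

# Predictors whose reported p-value is below the 5% level.
significant = [name for name, (_, _, p) in coefficients.items() if p < 0.05]

print(round(r2, 6), round(adj_r2, 6), round(f_value, 4))
print(significant)
```

On the reported p-values, only GRI and SAT are significant at the 5% level, and the R Square of about 0.043 means the five predictors together explain roughly 4% of the variation in the dependent variable.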

