Nested Logit regression model Harvard Case Solution & Analysis

Nested Logit Regression Model Case Study Solution


This assignment examines motor vehicle crashes, which are a significant cause of death every year. The World Health Organization estimates that 20 to 50 million road accidents result in non-fatal injuries, ranging from minor to debilitating, and that about 1.24 million people die in road accidents. The growing number of road accidents also has adverse effects on the economy: it raises the medical costs of victims, reduces organizational productivity, and is a major cause of traffic congestion. In the U.S., highway fatalities occur more frequently on rural highways, which is attributed to the lack of safety measures implemented in those areas.

Crash severity is the primary focus of the analysis, because severity levels can have a large impact on the economic condition and emotional state of the people involved. The main objective of the study is therefore to estimate crash injury severity outcomes using nested logit and probit regression models.

The KABCO injury scale, which classifies crash outcomes into discrete severity levels, directs the model development towards a discrete choice model. This study focuses on binary nested logit regression models of the severity outcomes of road accidents.

Data Description

The data were collected from four datasets in dta format, containing vehicle, crash, roadway and occupant details, which were merged to define specific crash events on rural highways in Illinois. The dataset records the crash severity level of each incident, defined as the worst injury sustained by any driver or occupant involved in the accident. This outcome is modelled using nested logit and probit regression models.
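The merge and the "worst injury" definition can be sketched as follows. This is an illustration only: the records, the key name `crash_id` and the field name `injury` are hypothetical, and in practice the dta files would first be loaded with a Stata-capable reader such as `pandas.read_stata`.

```python
from collections import defaultdict

# Hypothetical records standing in for the crash and occupant datasets.
crashes = [{"crash_id": 1, "aadt1": 3200}, {"crash_id": 2, "aadt1": 5100}]
occupants = [
    {"crash_id": 1, "seat": "driver", "injury": 0},
    {"crash_id": 1, "seat": "front", "injury": 2},
    {"crash_id": 2, "seat": "driver", "injury": 1},
]

def worst_injury_by_crash(occupant_rows):
    """Map each crash id to the highest injury code among its occupants."""
    worst = defaultdict(int)  # a crash starts at 0 (assumed: no injury)
    for row in occupant_rows:
        worst[row["crash_id"]] = max(worst[row["crash_id"]], row["injury"])
    return dict(worst)

def attach_severity(crash_rows, occupant_rows):
    """Left-join the crash-level severity onto the crash records."""
    worst = worst_injury_by_crash(occupant_rows)
    return [{**c, "severity": worst.get(c["crash_id"], 0)} for c in crash_rows]

merged = attach_severity(crashes, occupants)
```

Each crash then carries a single severity value, which is the unit of analysis described above.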

Studies suggest, however, that the recorded data are not fully reliable, because they do not capture subjective information about each incident. Such information, including road defects, weather and driver condition, could also be a relevant cause of crash injuries. Its absence from the analysis could therefore expose the model to omitted-variable bias, causing the parameters to be overestimated or underestimated.
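The effect of an omitted variable can be illustrated with a small simulation. This sketch uses a simple linear model rather than the logit or probit models of the study, and all coefficients and variable names are made up for illustration.

```python
import random

def ols_slope(x, y):
    """Slope of a one-regressor least-squares fit of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]            # observed regressor
z = [0.5 * xi + rng.gauss(0, 1) for xi in x]       # unobserved, correlated with x
# True model: the effect of x is 1.0 and the effect of z is 2.0.
y = [1.0 * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

# Regressing y on x alone absorbs part of the effect of z into the slope.
short_slope = ols_slope(x, y)
```

Here the slope from the regression that omits `z` lands near 1.0 + 2.0 × 0.5 = 2.0 instead of the true 1.0, which is an overestimate. Reversing the sign of either the omitted effect or the correlation would produce an underestimate instead.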

Variable Description

Nested logit and probit regression models were used to estimate crash severity, with “severity” as the dependent variable. The independent variables were divided into four groups according to their features: Roadway Attributes, Traffic, Driver & Passenger, and Crash Type. The variable descriptions are given in Exhibit-1.



Exhibit-1 (Variable Description)

Group               Variable             Description
Traffic             aadt1                Avg annual traffic per day (veh/day)
                    comm_vol             Volume of heavy commercial vehicles per day (trucks/day)
                    large                Indicator variable: 1 = large vehicle (i.e., bus or truck) involved in crash; 0 = no large vehicle involved
Roadway Attributes  lane width           Lane width (ft.)
                    shoulder type        Type of shoulder: 1 = earth/sod; 2 = aggregate; 3 = paved; 4 = composite shoulder type of aggregate and sod; 5 = composite shoulder type of paved with either aggregate or sod
                    shoulder width       Shoulder width (ft.)
Crash Type          other                Indicator variable: 1 = collision type other than specific ones below; 0 = otherwise
                    fixed_object         Indicator variable: 1 = fixed-object collision; 0 = otherwise
                    animal               Indicator variable: 1 = animal collision; 0 = other
                    overturn             Indicator variable: 1 = overturn collision; 0 = other
                    turning_rear end     Indicator variable: 1 = turning or rear-end collision; 0 = other
                    side_same            Indicator variable: 1 = sideswipe same direction; 0 = other
                    side opp_head_angle  Indicator variable: 1 = sideswipe opposite direction, head-on, or angle collision; 0 = other
Driver & Passenger  drv_max_age          Max age of drivers involved in the crash (years)
                    back_max_age         Max age of the back seat occupants involved in the crash
                    front_max_age        Max age of front seat occupants involved in the crash
                    num_male             Number of male occupants involved in the crash (other than the drivers)
                    num_female           Number of female occupants involved in the crash (other than the drivers)
                    backind              Back seat occupant indicator: 1 = there was at least one back seat occupant involved in the crash; 0 = no back seat occupants involved
                    front ind            Indicator variable: 1 = there was at least one front seat occupant involved in the crash; 0 = no front seat occupant involved
                    drvsex_male          Indicator variable: 1 = driver is male; 0 = driver is female
                    drv_no_rest          Indicator variable: 1 = there was at least one driver involved in the crash that did not use restraint or used restraint improperly; 0 = all drivers used restraint properly
                    front_no_rest        Indicator variable: 1 = there was at least one front seat occupant involved in the crash that did not use restraint or used restraint improperly; 0 = all front seat occupants used restraint properly
                    back_no_rest         Indicator variable: 1 = there was at least one back seat occupant involved in the crash that did not use restraint or used restraint improperly; 0 = all back seat occupants used restraint properly
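To make the nested logit structure used for the severity outcome concrete, the choice probabilities can be computed as below. This is a sketch of the standard nested logit formulas with inclusive values; the nesting of outcomes, the utilities and the dissimilarity parameters are all hypothetical and are not taken from the study.

```python
import math

def nested_logit_probs(utilities, nests, lambdas):
    """Nested logit choice probabilities.

    utilities: {alternative: systematic utility V}
    nests:     {nest: [alternatives in the nest]}
    lambdas:   {nest: dissimilarity parameter, usually in (0, 1]}
    """
    # Inclusive value of each nest: log-sum of the scaled utilities.
    inclusive = {
        nest: math.log(sum(math.exp(utilities[a] / lambdas[nest]) for a in alts))
        for nest, alts in nests.items()
    }
    denom = sum(math.exp(lambdas[m] * inclusive[m]) for m in nests)
    probs = {}
    for nest, alts in nests.items():
        lam = lambdas[nest]
        p_nest = math.exp(lam * inclusive[nest]) / denom       # P(nest)
        for a in alts:
            p_within = math.exp(utilities[a] / lam - inclusive[nest])  # P(a | nest)
            probs[a] = p_within * p_nest
    return probs

# Hypothetical structure: injury outcomes share a nest, no-injury stands alone.
utilities = {"no_injury": 1.0, "minor": 0.2, "severe": -0.8}
nests = {"none": ["no_injury"], "injury": ["minor", "severe"]}
probs = nested_logit_probs(utilities, nests, {"none": 1.0, "injury": 0.6})
# With every dissimilarity parameter equal to 1 the model is a plain multinomial logit.
mnl = nested_logit_probs(utilities, nests, {"none": 1.0, "injury": 1.0})
```

A dissimilarity parameter below 1 for the injury nest lets the unobserved factors of the injury outcomes be correlated, which is the reason for nesting them.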


The descriptive statistics of the dependent and independent variables are given below in Exhibit-2.




Exhibit-2 (Descriptive Statistics)


Variable        Observations        Mean   Std. Dev.       Min         Max
severity              24,622        0.38        0.90         -        4.00
aadt1                 24,622    3,609.63    2,189.94    400.00   14,500.00
comm_vol              24,622      422.24      301.12      1.00       14.00
shoulderwi~h          24,622        6.26        2.57      1.00       14.00
drv_max_age           24,536       41.63       16.73     13.00       98.00
back_max_age          23,843        1.91        8.45         -       92.00
front_max_~e          24,467        8.03       17.93         -       97.00
num_male              24,622        0.17        0.52         -       19.00
num_female            24,622        0.24        0.58         -       17.00
backind               24,622        0.09        0.28         -        1.00
drvsex_male           24,548        0.59        0.49         -        1.00
drv_no_rest           21,841        0.03        0.76         -        1.00
front_no_r~t          24,135        0.01        0.08         -        1.00
back_no_rest          23,797        0.01        0.08         -        1.00
other                 24,622        0.04        0.20         -        1.00
animal                24,622        0.58        0.49         -        1.00
overturn              24,622        0.06        0.23         -        1.00
turning_re~d          24,622        0.78        0.27         -        1.00
side_same             24,622        0.11        0.11         -        1.00
side opp_he~e         24,622        0.04        0.20         -        1.00
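The columns of Exhibit-2 can be reproduced for any single variable with a few lines of code. The sketch below is an assumption about how such a summary is computed, not the procedure used in the study; the sample values are invented. Skipping missing entries per variable is consistent with the differing observation counts in the exhibit (for example 24,536 for drv_max_age against 24,622 for severity).

```python
import statistics

def describe(values):
    """Summarise one column, skipping missing entries (None)."""
    present = [v for v in values if v is not None]
    return {
        "obs": len(present),
        "mean": statistics.mean(present),
        "std": statistics.stdev(present),  # sample standard deviation (n - 1)
        "min": min(present),
        "max": max(present),
    }

# Hypothetical driver ages with one missing record.
drv_max_age = [34, 52, None, 19, 71]
summary = describe(drv_max_age)
```

Because each variable is summarised on its own non-missing records, a model estimated on all variables together would use fewer observations than any single row suggests.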

Exhibit-3 (Regression Analysis)

In Exhibit-3, most of the p-values of the independent variables are below 5%, which indicates that most variables are statistically significant at the 5% level. Under the regression model, the coefficients of AADT and animal are negative, which means that as AADT increases, or when the crash is caused by an animal, the accident tends to be in the minor severity category............
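The reading of a negative coefficient can be illustrated with a binary logit link. The intercept and coefficient below are hypothetical and are not the Exhibit-3 estimates; the sketch only shows the direction of the effect.

```python
import math

def logistic(z):
    """Logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values chosen for illustration only.
intercept, beta_aadt = -0.5, -0.0001

# Probability of the more severe outcome at low and high traffic volumes.
p_low_traffic = logistic(intercept + beta_aadt * 1000)
p_high_traffic = logistic(intercept + beta_aadt * 10000)

# Odds ratio for an increase of 1,000 vehicles per day.
odds_ratio_per_1000 = math.exp(beta_aadt * 1000)
```

With a negative coefficient, the probability of the more severe outcome falls as AADT rises and the odds ratio is below 1, which matches the interpretation that such crashes tend towards the minor severity category.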
