Crash Severity by Statistical Techniques

Aki Kapoor
Feb 13, 2021

Grouping data by the License type:-

Double Bar Chart

Stacked Bar Chart

Benefits/Drawbacks of the two types of graph:-

When the goal is to show each category's share of a total (a proportion or a percentage), a stacked bar chart is more convenient. It also makes it easy to spot the largest or the smallest share. However, comparing individual values across groups is harder with a stacked bar chart.

For comparing two values side by side, a double bar chart should be preferred because the differences are easier to see. For showing proportions, it is not as useful as a stacked bar chart.
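To make the trade-off concrete, here is a small Python sketch of the numbers each chart encodes. The counts are invented purely for illustration (they are not the crash data used in this article), and the names `counts`, `side_by_side`, and `shares` are hypothetical.

```python
# Hypothetical crash counts by license type and severity.
# These numbers are made up for illustration; they are NOT the article's data.
counts = {
    "F": {"moderate": 40, "serious": 30},
    "R": {"moderate": 35, "serious": 15},
    "L": {"moderate": 25, "serious": 5},
}

# A double (grouped) bar chart plots the raw counts next to each other,
# so values can be compared directly across license types.
side_by_side = {
    severity: [counts[lic][severity] for lic in counts]
    for severity in ("moderate", "serious")
}

# A stacked bar chart makes each segment's share of its own bar easy to read.
shares = {
    lic: {sev: n / sum(by_sev.values()) for sev, n in by_sev.items()}
    for lic, by_sev in counts.items()
}

print(side_by_side)
for lic, by_sev in shares.items():
    print(lic, {sev: f"{share:.0%}" for sev, share in by_sev.items()})
```

The first structure answers "which group has more?", the second answers "what fraction of each group is serious?", which is the split between the two chart types described above.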

Information displayed with a pie chart

For the different levels of Crash Severity, the average amount of time taken off work can be determined as follows:-

It can be said that the crash severity is most serious for license type ‘F’ and moderate for license type ‘R’. It is least for license type ‘L’.

Scatter plot with linear regression:-

Relevant statistics

The value of Multiple R is 0.16, which indicates a weak correlation between medical expenses and blood alcohol level. The value of R square is 0.02, which means that only about 2% of the variation in the response variable is explained by the regression line.
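As a quick check on how the two statistics relate: in a simple linear regression with one predictor, R square is the square of Multiple R. A minimal Python sketch using the rounded value reported above (the variable names are my own):

```python
multiple_r = 0.16  # rounded correlation reported in the regression output

# With a single predictor, R square equals the squared correlation.
r_squared = multiple_r ** 2

print(f"R square from rounded R: {r_squared:.4f}")
print(f"share of variation explained: {r_squared:.1%}")
print(f"share left unexplained: {1 - r_squared:.1%}")
```

Squaring the rounded 0.16 gives roughly 0.026, in line with the reported figure of about 2% once rounding is taken into account. Either way, almost all of the variation is left unexplained by the line.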

Setting Linear regression

Since the relationship is weak (R square is 0.02), a linear regression would not give accurate predictions here.

For interval 15–24,

For interval 50–59,

Difference in blood alcohol levels

The mean blood alcohol level for 15–24 year olds is 143.65 mg/dl, whereas the mean for 50–59 year olds is 128.38 mg/dl, a difference of 15.27 mg/dl. This means that, on average, the 15–24 year olds involved in motor vehicle crashes had higher blood alcohol levels than the 50–59 year olds.

“serious”

n = 79

sample mean = 150.810

sample standard deviation = 47.365

μ1 = mean blood alcohol level of drivers in crashes that were deemed “serious”

“moderate”

n = 121

sample mean = 131.545

sample standard deviation = 36.168

μ2 = mean blood alcohol level of drivers in crashes that were deemed “moderate”

H0: μ1 = μ2

Ha: μ1 > μ2

Critical value:

=T.INV(1-0.05, 78)

= 1.665 (Critical Value)

t test statistic:

= 3.077

Decision: Reject the null hypothesis (test statistic 3.077 > critical value 1.665)

Conclusion: There is sufficient evidence at α = 0.05 that the population mean blood alcohol level is higher for drivers in crashes that were deemed serious than for drivers in crashes that were deemed moderate.
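The test above can be reproduced from the summary statistics alone. The following is a rough Python sketch, not the original Excel workbook: it assumes the unpooled standard error for the difference in means and 78 degrees of freedom (which appears to be the smaller sample size minus one, as in the T.INV call). Python's standard library has no t quantile function, so the helpers `t_pdf`, `t_cdf`, and `t_quantile` are hypothetical names for a simple numerical approximation.

```python
import math

# Sample statistics quoted above.
n1, mean1, sd1 = 79, 150.810, 47.365    # "serious" crashes
n2, mean2, sd2 = 121, 131.545, 36.168   # "moderate" crashes
alpha = 0.05

# Unpooled standard error of the difference in sample means (an assumption).
std_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
t_stat = (mean1 - mean2) / std_error

# Conservative degrees of freedom: smaller sample size minus one (78).
df = min(n1, n2) - 1


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF for x >= 0: 0.5 plus a Simpson's-rule integral of the density on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df, lo=0.0, hi=10.0):
    """Quantile for p >= 0.5, found by bisection on the CDF."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


critical = t_quantile(1 - alpha, df)
print(f"t = {t_stat:.3f}, critical value = {critical:.3f}")
print("reject H0" if t_stat > critical else "do not reject H0")
```

With a statistics package available, the quantile helpers could be replaced by a library call; they are written out here only to keep the sketch self-contained.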

H0: pM = 0.5

Ha: pM > 0.5

H0: pM = 0.5

Ha: pM ≠ 0.5

Sample n = 200

Number of Male drivers = 107

Z test statistic:

= 0.990

P-value:

Pr(z > 0.990)

= 1 - T.DIST(0.99, 199, TRUE)

= 0.162

Two-tailed p-value = 0.162 x 2

= 0.323

Since the p-value (0.323) is greater than alpha (0.10), we do not reject the null hypothesis.

Decision: Do not reject the null hypothesis.

Conclusion: There is insufficient evidence at α = 0.10 that young drivers involved in car crashes that involved alcohol were more likely to be male than female.
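The proportion test can be checked with a few lines of Python. This sketch uses the standard normal distribution for the tail areas rather than Excel's T.DIST with 199 degrees of freedom, so the third decimal place of the p-value may differ slightly from the figures above. The variable names are my own.

```python
import math
from statistics import NormalDist

n = 200        # sample size
males = 107    # male drivers in the sample
p0 = 0.5       # proportion under the null hypothesis
alpha = 0.10

p_hat = males / n
std_error = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0
z = (p_hat - p0) / std_error

# Tail areas from the standard normal distribution.
p_one_sided = 1 - NormalDist().cdf(z)
p_two_sided = 2 * p_one_sided

print(f"sample proportion = {p_hat:.3f}, z = {z:.3f}")
print(f"one-sided p = {p_one_sided:.3f}, two-sided p = {p_two_sided:.3f}")
print("reject H0" if p_two_sided < alpha else "do not reject H0")
```

Both the one-sided and the two-sided p-values are above 0.10, so the decision not to reject the null hypothesis does not depend on which of the two is used.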

Meaning and implications of Type I & II errors in this context

A Type I error would mean that NZTA’s test concluded that more males than females were involved in crashes after alcohol consumption when, in reality, equal numbers of young males and females are involved in accidents after drinking.

A Type II error would mean that NZTA’s test concluded that equal numbers of males and females were involved in crashes after alcohol consumption when, in reality, more young males than females are involved in accidents after drinking.

Possible error that may have been made in hypothesis test

The possible error made in this hypothesis test is a Type II error. A Type II error is made if the null hypothesis is not rejected when it is false, whereas a Type I error is made if the null hypothesis is rejected when it is true. Often the null hypothesis is the safe position, as it is what we would prefer to assume until we find convincing evidence against it, and rejecting it when it is actually true can have serious consequences.

In this case, the evidence at α = 0.10 was insufficient to show that young drivers involved in car crashes that involved alcohol were more likely to be male than female, so the null hypothesis was not rejected. Because the null hypothesis was not rejected, a Type I error cannot have been made, and only a Type II error is possible. Misclassification is also possible, for example a driver wrongly recorded as female at the time of the accident, but this should be very rare and have no significant effect on the overall outcome. Regardless of a possible Type II error, if NZTA continues to target male drivers in its advertising campaign aimed at young New Zealanders between the ages of 15 and 24 about the dangers of drinking and driving, the campaign should still have a positive outcome.
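To put rough numbers on the two error types, the sketch below computes their probabilities exactly from the binomial distribution. It assumes the one-sided alternative (pM > 0.5) tested at α = 0.10 with n = 200. The "true" proportion of 0.55 used for the Type II error is a hypothetical value chosen only for illustration, and `prob_at_least` is a helper name of my own.

```python
import math
from statistics import NormalDist

n, p0, alpha = 200, 0.5, 0.10

# One-sided test: reject H0 when z exceeds the upper-tail normal critical value.
z_crit = NormalDist().inv_cdf(1 - alpha)
std_error = math.sqrt(p0 * (1 - p0) / n)

# Smallest number of male drivers in the sample that leads to rejection.
min_reject = math.floor(n * (p0 + z_crit * std_error)) + 1


def prob_at_least(k, n, p):
    """Exact binomial probability of observing k or more males out of n."""
    return sum(math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))


# Type I error: rejecting H0 although the true proportion really is 0.5.
type_1 = prob_at_least(min_reject, n, p0)

# Type II error: not rejecting H0 although the true proportion is higher.
# 0.55 is a HYPOTHETICAL true value, not an estimate from the data.
p_true = 0.55
type_2 = 1 - prob_at_least(min_reject, n, p_true)

print(f"reject H0 when at least {min_reject} of {n} drivers are male")
print(f"P(Type I error) = {type_1:.3f}")
print(f"P(Type II error | true p = {p_true}) = {type_2:.3f}")
```

Under these assumptions the Type I error rate stays below the nominal 0.10, while the Type II error rate is large when the true proportion is only slightly above 0.5. This is one reason a non-significant result here should not be read as proof that males and females are equally represented.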

Aki Kapoor

Masters in Applied data science, University of Canterbury, New Zealand. Data scientist who loves to play with the data and make sense from it.