Detailed Results and Tables from
Regression Analysis

The distribution of the dichotomous dependent variable (0, 1) presents a number of challenges when we try to apply a linear model. For one, the actual values of the dependent variable are limited to one of two choices, so for any given combination of predictors the error term can take only two values. This strongly distorts the distribution of the error term, thus violating the assumptions of the ordinary least-squares linear model, which state that the error terms must be normally and independently distributed.
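
To see this violation concretely, the short sketch below fits an ordinary least-squares line to a 0/1 outcome in Python. It is a minimal illustration with simulated data; the variable names and the data-generating process are assumptions for demonstration, not drawn from the analysis reported here.

    import numpy as np

    # Simulated illustration: a dichotomous outcome driven by one predictor.
    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    p = 1 / (1 + np.exp(-(0.5 + 1.5 * x)))   # true event probability
    y = rng.binomial(1, p)                   # dichotomous outcome (0, 1)

    # Ordinary least-squares fit of a straight line to the 0/1 outcome.
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (intercept + slope * x)

    # For any fitted value y_hat, the residual is either -y_hat (when y = 0)
    # or 1 - y_hat (when y = 1): the residuals split into two bands rather
    # than forming a normal distribution around zero.
    print(np.round(np.quantile(residuals, [0.05, 0.5, 0.95]), 2))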

Instead, the likelihood of an event is expressed on a continuum, from complete certainty that the event did not occur to complete certainty that it did. The two certainties are obviously hypothetical, since we can never achieve complete statistical certainty, only an approximation within a given confidence interval. The likelihood of the sample is a product of individual contributions, each built from two mutually exclusive possibilities: the event occurring (probability = p) and the event not occurring (probability = 1-p). The ratio of these two probabilities, p/(1-p), is the odds. The distribution of p follows an S-shaped, non-linear curve bounded between zero and one, while the odds range from zero to + infinity. As p gets closer to one, the converse (1-p) gets closer to zero, drawing the odds toward + infinity. As p gets closer to zero and its converse closer to one, the odds approach zero.

By taking the natural logarithm of the odds, we obtain a logit: ln(p/(1-p)). The values of the logit range from – infinity (as p approaches 0) to + infinity (as p approaches 1).
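
To make this mapping concrete, the sketch below (illustrative values only) prints the odds and the corresponding logit for a few probabilities; the logit equals zero at p = 0.5 and is symmetric around that point.

    import numpy as np

    # Odds p/(1-p) and logit ln(p/(1-p)) for probabilities across (0, 1).
    for p in (0.01, 0.25, 0.50, 0.75, 0.99):
        odds = p / (1 - p)
        print(f"p = {p:4.2f}   odds = {odds:6.2f}   logit = {np.log(odds):6.2f}")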

A logistic regression expresses the logit as a linear function of a set of k independent variables x1, …, xk:

L = b0 + b1x1 + … + bkxk

where L is the logit, that is, ln(p/(1-p)).
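
As a minimal sketch of how the coefficients b0, …, bk are estimated in practice by maximum likelihood (the data are simulated and the choice of the statsmodels library is an assumption for illustration, not a method prescribed by this text):

    import numpy as np
    import statsmodels.api as sm

    # Simulated illustration: two independent variables and a binary
    # outcome generated from known coefficients (b0, b1, b2).
    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 2))
    L = -0.5 + 1.0 * X[:, 0] - 2.0 * X[:, 1]   # the logit: b0 + b1*x1 + b2*x2
    p = 1 / (1 + np.exp(-L))                    # invert: p = exp(L) / (1 + exp(L))
    y = rng.binomial(1, p)

    # Maximum-likelihood fit; the estimates should be close to (-0.5, 1.0, -2.0).
    fit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
    print(fit.params)

The fitted coefficients are read on the logit scale; exponentiating a coefficient gives the multiplicative change in the odds associated with a one-unit increase in the corresponding variable.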