Detailed Results and Tables from Regression Analysis
The distribution of the dichotomous dependent variable (0, 1) presents
a number of challenges when we try to apply a linear model. For one,
the actual values of the dependent variable are limited to one of two
choices. This creates a strong bias in the distribution of the error
term, violating the assumption of the ordinary least-squares linear
model that error terms must be normally and independently distributed.
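As a minimal sketch of this problem (the data and coefficients below are hypothetical), an ordinary least-squares fit to a 0/1 outcome produces residuals that can take only two values at any given x, so they cannot be normally distributed:

    import numpy as np

    # Hypothetical simulated data: a binary outcome driven by one predictor.
    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))  # true event probabilities
    y = rng.binomial(1, p)                   # dichotomous dependent variable (0, 1)

    # Ordinary least-squares fit: y_hat = b0 + b1 * x
    b1, b0 = np.polyfit(x, y, 1)
    residuals = y - (b0 + b1 * x)

    # At any given x the residual is either 1 - (b0 + b1*x) or -(b0 + b1*x),
    # so the error term cannot be normally distributed.
    print(residuals[:5].round(3))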
Instead, the likelihood of an event is expressed on a continuum, from
complete certainty that the event did not occur to complete certainty
that it did. Both certainties are hypothetical, since we can never
achieve complete statistical certainty, only an approximation within a
given confidence interval. The likelihood is a product of individual
contributions, each representing the odds of two mutually exclusive
possibilities: the event occurring (probability = p) and the event not
occurring (probability = 1 - p). The distribution of p follows an
S-shaped non-linear curve bounded between zero and one. The odds,
p/(1 - p), range from zero to +infinity: as p approaches one, its
converse (1 - p) approaches zero and the odds approach +infinity; as p
approaches zero and its converse approaches one, the odds approach
zero.
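A small numeric sketch (the values of p below are arbitrary) shows this behaviour of the odds at both limits:

    # Odds p/(1 - p) for illustrative values of p.
    for p in [0.001, 0.1, 0.5, 0.9, 0.999]:
        print(f"p = {p:5.3f}  ->  odds = {p / (1 - p):8.3f}")
    # As p approaches 1, the odds grow toward +infinity;
    # as p approaches 0, the odds approach zero.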
By taking the natural logarithm of the odds, we obtain a logit:
ln(p/(1 - p)). The values of the logit range from -infinity (as p
approaches 0) to +infinity (as p approaches 1).
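Continuing with the same illustrative values, the log transform stretches the bounded odds onto the whole real line:

    import math

    # Logit ln(p/(1 - p)) for the same illustrative values of p.
    for p in [0.001, 0.1, 0.5, 0.9, 0.999]:
        print(f"p = {p:5.3f}  ->  logit = {math.log(p / (1 - p)):7.3f}")
    # p = 0.5 gives a logit of 0; the logit is symmetric around that point.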
A logistic regression expresses the logit as a linear function of a
set of k independent variables:
L = b0 + b1x1 + … + bkxk
where L is the logit, ln(p/(1 - p)).
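As a minimal sketch of estimating such a model (assuming the statsmodels package is available; the data and true coefficients are hypothetical), maximum-likelihood estimates of b0 and b1 can be recovered from simulated data:

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical simulated data with true coefficients b0 = -1.0, b1 = 2.0.
    rng = np.random.default_rng(1)
    x = rng.normal(size=1000)
    p = 1 / (1 + np.exp(-(-1.0 + 2.0 * x)))  # inverse of the logit transform
    y = rng.binomial(1, p)

    # Fit L = b0 + b1*x1, where L = ln(p/(1 - p)).
    X = sm.add_constant(x)            # adds the intercept column for b0
    result = sm.Logit(y, X).fit(disp=False)
    print(result.params)              # estimates should be close to (-1.0, 2.0)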