How does a cascading feature analysis model work?
I need you to note that the model is entirely mute. It reads visually, making no reference to sounds; it simulates, in other words, direct, or lexical, reading. The model postulates that letters are recognised by analysis of their visual features alone. Letters are analysed in terms of certain properties: a line here, a curve there, a bit below the line here and so on. A feature, once identified, is made to excite the representation of every letter which contains it and to inhibit all those which do not (the + and - symbols on the diagram indicate this). In this paradigm letters are rapidly and efficiently recognised.

Suppose, for example, a letter appears and a downstroke (|) is identified. The representations of all letters which include such a downstroke (BDEFHIJKLMNPRTbdfhlt) will begin to be activated; all other letters will begin to be inhibited. Suppose that this letter also contains a horizontal stroke (-). Representations of all letters containing both a downstroke and this horizontal stroke will begin to be activated, and all others inhibited. With only two features identified we have narrowed it down to a probable B, E, F, H, L, T, t or f. Recognition of a twirl to the right at the top, alongside the two features already identified but no others, would home us in on the letter f. We got there in three, in fact.

If the program were to contain a little battery of, say, a dozen or so features against which letters are checked, we can envisage it building into a comprehensive letter recognition system. And if the program is simply produced by wiring up particular connections in particular circuits or, in biological terms, neural nets, or pattern associator networks, then it is also an automatic and robust system. It will also cope well with imperfect material (and see notes to this chapter).
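The excite-and-inhibit walk-through above can be sketched in a few lines of code. This is a minimal illustration, not the model's actual machinery: the feature inventory and the letter memberships for the horizontal stroke and the twirl are assumptions made to match the worked example in the text, and only the three features discussed there are modelled.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase

# Which letters contain each feature. The downstroke set is taken from the
# text; the other two sets are assumptions chosen to fit the walk-through.
FEATURES = {
    "downstroke": set("BDEFHIJKLMNPRTbdfhlt"),
    "horizontal": set("BEFHLTtf"),   # assumed: bars and crossbars
    "right_twirl_top": set("f"),     # assumed: the twirl of an 'f'
}

def recognise(detected_features):
    """Each detected feature excites (+1) every letter that contains it
    and inhibits (-1) every letter that does not."""
    activation = {ch: 0 for ch in ALPHABET}
    for feat in detected_features:
        members = FEATURES[feat]
        for ch in ALPHABET:
            activation[ch] += 1 if ch in members else -1
    return activation

# Two features in: the field narrows to the most activated letters.
act = recognise(["downstroke", "horizontal"])
top = max(act.values())
candidates = {ch for ch, a in act.items() if a == top}

# Three features in: a single letter wins outright.
act = recognise(["downstroke", "horizontal", "right_twirl_top"])
winner = max(act, key=act.get)
```

With the two-feature input, `candidates` is exactly the probable B, E, F, H, L, T, t or f stage of the text; adding the twirl leaves `f` as the unique winner, in three features.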
Once the circuits, the neural nets, have been built (which is what learning is, in this paradigm) then the ensuing use of them is forever automatic. Letter recognition will have become, as it is in fact, something we ‘do without thinking’.
It is time to interject another short deviation to examine what seems, at first sight, to be a paradox. It would seem to make sense to postulate a feature recognition system of great specificity: a meticulous, pedantic system demanding precision in detail, one which identifies features very precisely and then matches them very precisely against stored patterns to achieve very precise recognition. Such a system would really be a thing to be proud of, would it not?
In fact, it would be a disaster. It would be next to useless because, of course, in the real world hardly anything is ever truly precise. Consider, for example, the dog. We can all recognise a dog for what it is – a dog. We do this, on the strength of just a brief glance, very fast, very accurately, very consistently and very definitely. We do it so easily that it seems deeply unsurprising. But dogs come in a large variety of shapes, sizes, colours, degrees of hairiness, bounciness and so on. You can, nonetheless, easily recognise a dog even if you only get a glimpse of a bit of it for a fragment of time. You can recognise a dog from any angle; from in front, from the side, from behind. You can recognise a dog this way up or that way up. You can even recognise a dog the likes of which you have never seen before.
What are we, probably, doing here, when we identify a dog as a dog? We are probably applying a shortlist of feature ‘questions’ (and, as we have seen with letter identification, quite a short list will probably do). We apply the shortlist to the evidence, to the data coming in, and accept features even if they are only approximate. By accepting approximations, but using a feature list which is long enough, we can swiftly identify even completely new examples of known categories (such as dogs). Maybe we categorise something as a dog if it shows, say, 10 out of our 12 listed features of a typical dog to a degree of, say, 85% satisfaction? Very occasionally we may (very occasionally we do) get it wrong, but it won’t happen very often and is probably an acceptable risk for the enormous gain in flexibility. If we had to learn and carry about exact recognition units for everything we had ever seen, from every angle and in every circumstance, our brains would have to be the size of large watermelons – and we still wouldn’t be able to recognise anything we had not seen before in exactly that situation. (You may pursue the advantages of fuzzy thinking in Kosko 1994.)
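The ‘10 out of 12 features at 85% satisfaction’ idea can be sketched as follows. The feature names and the scores in the examples are invented purely for illustration; only the thresholds come from the text.

```python
# Hypothetical shortlist of 12 features of a typical dog.
DOG_FEATURES = [
    "four_legs", "tail", "fur", "snout", "two_ears", "two_eyes",
    "dog_sized", "quadruped_gait", "paws", "wet_nose", "teeth", "barks",
]

def is_dog(scores, threshold=0.85, needed=10):
    """scores maps a feature name to its degree of satisfaction (0.0-1.0).
    Features out of view simply score low and are tolerated, which is
    where the flexibility comes from: no single feature is essential."""
    satisfied = sum(1 for f in DOG_FEATURES if scores.get(f, 0.0) >= threshold)
    return satisfied >= needed

# A glimpse of an unfamiliar dog from behind: two features unavailable,
# but 10 of the 12 still clear the 85% bar, so it is accepted as a dog.
glimpse = {f: 0.9 for f in DOG_FEATURES}
glimpse["snout"] = 0.2   # hidden from this angle
glimpse["barks"] = 0.0   # it is silent

# A cat satisfies many of the same features, but not enough of them.
cat = {f: 0.9 for f in DOG_FEATURES}
for f in ("dog_sized", "snout", "wet_nose", "barks", "quadruped_gait"):
    cat[f] = 0.3
```

Tightening `threshold` and `needed` trades flexibility for precision; set both to their maxima and you are back at the brittle, watermelon-brained exact-match system the text warns against.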