How neural networks learn to "see" clearly
A familiar kind of visual illusion is the picture of something -- silhouettes of two vases, a bowl of fruit -- that you suddenly realize can also be seen as something else: two facing profiles, or a grotesque face.
The way your perception suddenly flips from one interpretation to another underscores the essential role of the brain in seeing. The retina captures patterns of color, light and shade -- just pixels, each one representing a signal from a rod or cone -- and the brain turns them into an idea. The ability to flip back and forth -- fruit-face, face-fruit -- reveals how face-ness and fruit-ness are products of the brain, not the eye. Only when consciousness projects the idea back onto the image do we subjectively experience what we see as a thing with a discrete identity, rather than as a muddle of color, light and shade.
Recognizing that seeing is a two-stage process makes the ability of computers to identify cats and dogs and individual human faces a little less mysterious. Some process goes on in our brains to convert the retina's pixel information into the sensation of a face. Some other, possibly similar process goes on in the computer. Because the two processes are thought to be broadly analogous, the computer version has been dubbed a "neural network". How precise is the analogy? We don't know enough about the brain, or even about exactly what happens in a neural network, to answer that question. But neural networks learn to derive general "ideas" from repeated exposure to images in a way that very much resembles how human brains turn patterns of color, light and shade into "things". This "machine learning" process is a central component of modern artificial intelligence.
Neural networks are trained much the way babies are: by repeated exposure to images (or sounds, smells, textures, etc.), combined with information about what the image is. Unlike babies, who process the world in great undifferentiated gulps, computers learn one idea at a time. Show the computer 10,000 pictures of cats, each one labeled "cat", and the neural network will call the 10,001st image a cat, even if it's a different cat from all the others. After 100,000 images, its sense of "catness" will rival any human's, notwithstanding the great variety of cats in the world and the infinitude of ways in which they can be depicted.
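To make the idea of learning from labeled examples concrete, here is a minimal sketch in Python. It is not how a real image classifier is built: it trains a single artificial "neuron" (logistic regression, standing in for a full neural network) on a handful of made-up measurements instead of pixels. The two features ("ear pointiness" and "whisker density"), the numbers, and the function names train_neuron and predict are all assumptions invented for illustration.

```python
import math

def predict(weights, bias, x):
    """Return the neuron's estimate (0..1) that example x is a cat."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(examples, labels, epochs=200, lr=0.5):
    """Fit one artificial neuron by repeatedly showing it labeled examples."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            error = predict(weights, bias, x) - y  # how wrong was the guess?
            # Nudge each weight so the same example is judged better next time.
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

# Hypothetical training set: [ear pointiness, whisker density], label 1 = "cat".
examples = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.95], [0.95, 0.7],
            [0.1, 0.2], [0.2, 0.1], [0.3, 0.15], [0.15, 0.3]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train_neuron(examples, labels)

# Two examples the neuron has never seen before.
unseen_cat = predict(weights, bias, [0.85, 0.85])
unseen_other = predict(weights, bias, [0.2, 0.2])
print(f"unseen cat-like example: {unseen_cat:.2f}")
print(f"unseen non-cat example:  {unseen_other:.2f}")
```

The point of the sketch is only the loop: guess, compare the guess with the label, adjust, repeat. A real vision system does the same thing with millions of adjustable weights and raw pixels as input.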
When this ability of computers to parse visual data was first conceived in the 1960s, its potential utility in medical diagnosis was immediately apparent. For decades, however, every proposed application for computer vision – medical included – was frustrated by the limited speed of computer processors required for running machine learning algorithms and the limited availability of digital information required for training them. Those impediments gradually eroded over the last two decades of the 20th century, clearing a path for machine learning systems capable of recognizing objects – cats, for example – in digital imagery.
Beginning about a decade ago, this ability of computers to parse visual data finally began to be applied to medical imagery. In the scheme of visual perception, however, teaching a computer to recognize cats is relatively straightforward. It is easy to build data sets for training a computer to identify cats, because there is little or no dispute among humans as to what is or is not a cat. Teaching a computer to interpret a radiograph is more complex. Different readers may disagree about the meaning of an indication, so a large number of expert readers must be recruited to evaluate and label a large number of sample images. Their judgments cannot all be objectively verified by recourse to the patient, so the majority opinion, in each case, becomes the "ground truth" for training the neural network.
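The majority-opinion step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any particular vendor's labeling pipeline: the image IDs, the label names, the function names, and the choice to leave tied images unlabeled (None) are all hypothetical.

```python
from collections import Counter

def majority_label(reader_labels):
    """Return the label most readers chose, or None if there is no clear winner."""
    if not reader_labels:
        return None
    ranked = Counter(reader_labels).most_common()
    # A tie for first place means the readers did not reach a majority opinion.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]

def build_ground_truth(readings):
    """Map each image ID to the majority opinion of its expert readers."""
    return {image_id: majority_label(labels)
            for image_id, labels in readings.items()}

# Hypothetical readings: three radiographs, each labeled by several readers.
readings = {
    "xray_001": ["caries", "caries", "healthy"],
    "xray_002": ["healthy", "healthy", "healthy"],
    "xray_003": ["caries", "healthy"],  # readers split evenly
}

ground_truth = build_ground_truth(readings)
print(ground_truth)
```

How ties and near-ties are handled is a design choice. This sketch simply leaves them unlabeled; a real project might instead send such images to additional readers.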
With a sufficiently large ground truth radiographic dataset, however, computers can be trained to recognize abnormal growths in lungs and other organs in the same way that they are trained to recognize cats. And it turns out that computer vision systems trained to detect medical abnormalities can perform as well as human radiologists. Since 2010, a diverse array of radiologic computer vision systems has been approved by medical regulatory bodies, and computerized pathology detection software has become an accepted diagnostic aid in oncology.
Computer vision is now being applied to dental x-rays as well.
While it has made a later entry in dentistry than in other fields, computer vision’s impact has proved notably positive: numerous studies and clinical trials have shown computer vision-assisted clinicians to be more accurate in detecting most common conditions, including caries, abscesses, calculus, margin discrepancies, bone loss, and impactions.
Patients are likely to notice this impact, because real-time patient-facing radiologic assessment is standard in dentistry. Indeed, it is in the dentist's chair that most people will probably encounter computer vision in medicine for the first time. While it is relatively (and mercifully) unlikely that anyone will need to visit an oncologist for a radiologic cancer screening within the next year or two, almost everyone will visit the dentist for oral x-rays. This fact establishes dental clinicians as the vanguard of the computer vision revolution in medicine.
Dentists should take this new role seriously. Their use of the technology will no doubt shape patient perceptions, perceptions that could either inspire demand in other areas of medicine for further computer vision innovation, or invite concern that stymies it.
Practitioners should not lose sight of the fact that although the computer can convincingly perform certain mental processes, it does not have a mind. Human judgment is still needed to provide the larger medical context for the computer's perceptions. Computer vision does not do the dentist’s job; it just makes the dentist better equipped to do it.