To the average person, artificial intelligence seems to be making great progress. According to the press releases, and some of the more gushing media accounts, OpenAI’s DALL-E 2 can seemingly create spectacular images from any text; another OpenAI system called GPT-3 can talk about just about anything; and a system called Gato that was released in May by DeepMind, a division of Alphabet, seemingly worked well on every task the company could throw at it. One of DeepMind’s high-level executives even went so far as to brag that in the quest for artificial general intelligence (AGI), AI that has the flexibility and resourcefulness of human intelligence, “The Game is Over!” And Elon Musk said recently that he would be surprised if we didn’t have artificial general intelligence by 2029.
Don’t be fooled. Machines may one day be as smart as people, and perhaps even smarter, but the game is far from over. There is still an immense amount of work to be done in making machines that can truly comprehend and reason about the world around them. We need to do less talking and more basic research.
AI is making progress in some areas, such as synthetic images that look ever more realistic and speech recognition that works in noisy environments, but we are still far from general-purpose, human-level AI capable of genuine understanding and of dealing with unexpected obstacles and interruptions. We remain stuck on precisely the same challenges that academic scientists (including myself) have been pointing out for years: getting AI to be reliable and getting it to cope with unusual circumstances.
Take the recently celebrated Gato, an alleged jack of all trades, and how it captioned an image of a pitcher hurling a baseball. The system returned three answers: “A pitcher throwing a ball at a pitcher on top of a field of baseball,” “A player in baseball pitching a ball at a pitcher on the field of baseball,” and “A player in baseball playing and catching during a game of baseball.” The first answer is correct, but the second and third answers hallucinate players who are not visible in the image. The system does not know what is actually in the image; it knows only what is common in roughly similar images. Any baseball fan would recognize that this is a pitcher who has just thrown the ball, and that although we would expect a catcher and a batter to be nearby, they do not appear in the image.
[Image of a pitcher, with machine-generated captions: “A baseball player pitching a ball on top of a baseball diamond.” “A man throwing a baseball at a pitcher on a baseball field.” “A baseball player at bat and a catcher in the dirt during a…”]
DALL-E 2 couldn’t tell the difference between a red cube placed on top of a blue cube and a blue cube placed on top of a red cube. A newer system, released in May, couldn’t tell the difference between an astronaut riding a horse and a horse riding an astronaut.
When systems like DALL-E 2 make mistakes, the result is amusing, but other AI errors can cause serious problems. To take another example, a Tesla on Autopilot recently drove directly toward a human worker carrying a stop sign in the middle of the road, slowing down only when the human driver intervened. The system could recognize humans on their own and stop signs in their usual locations, as both appeared in its training data, but it failed to slow down when confronted with the unfamiliar combination of the two, which put the stop sign in a new and unusual position.
Unfortunately, the fact that these systems remain unreliable and struggle with novel circumstances is usually buried in the fine print. Gato worked well on all the tasks DeepMind reported, but rarely as well as other contemporary systems. GPT-3 often creates fluent prose but still struggles with basic arithmetic, and it has so little grip on reality that it is prone to creating sentences like “Some experts believe that the act of eating a sock helps the brain to come out of its altered state as a result of meditation,” when no expert ever said any such thing. These problems are not apparent from a cursory glance at the headlines.
The subplot is that the largest AI research teams are no longer found in academia, where peer review is standard practice, but in corporations. And corporations, unlike universities, have no incentive to play fair. Rather than submitting their new papers to academic scrutiny, they have taken to publication by press release, seducing journalists and sidestepping the peer review process. We learn only what the companies want us to know.
In the software industry, there’s a word for this kind of strategy: demoware, software designed to look good in a demo but not necessarily good enough for the real world. Demoware often becomes vaporware: products announced for shock and awe that are never released at all.
Chickens do come home to roost eventually. Cold fusion may have sounded great, but you still can’t get it at the mall. The cost in AI is likely to be a winter of deflated expectations. Too many products, like driverless cars, automated radiologists and all-purpose digital agents, have been demoed and publicized but never delivered. For now the promise of self-driving cars is still being made, and the investment dollars continue to flow in; but if the core problems of reliability and coping with outliers are not resolved, investment will dry up. We will be left with powerful deepfakes, enormous networks that emit immense amounts of carbon, and solid advances in machine translation, speech recognition and object recognition, but too little else to show for all the premature hype.
Deep learning has advanced machines’ ability to recognize patterns in data, but it has three major flaws: the patterns it learns are, ironically, superficial rather than conceptual; the results it creates can be difficult to interpret; and those results are difficult to use in the context of other processes, such as memory and reasoning. As the Harvard computer scientist Leslie Valiant noted, “The central challenge [going forward] is to unify… learning and reasoning.” You cannot deal with a person carrying a stop sign if you do not really understand what a stop sign even is.
For now, we are stuck in a “local minimum” in which companies focus on benchmarks rather than foundational ideas. They eke out small improvements with the technologies they already have rather than pausing to ask more fundamental questions. Instead of pursuing flashy straight-to-the-media demos, we need more people asking basic questions about how to build systems that can learn and reason at the same time. Engineering practice has instead run far ahead of science: it is easier to build new tools than to understand them and to develop better theoretical foundations. That is why basic research remains crucial.
It is heartbreaking that a large portion of the AI research community, like those who shout “Game Over,” doesn’t even see this.
Imagine an extraterrestrial trying to understand human interaction by looking only at shadows on the ground. Perhaps it would notice that the shadows grow and shrink at different times of day, without ever looking up to see the sun or recognizing the three-dimensional world above.
It’s time to get serious about artificial intelligence research. PR alone won’t solve AI’s fundamental problems.
This is an opinion and analysis article, and the views expressed by the author or authors are not necessarily those of Scientific American.