By the time most children are two years old, they can understand about 300 words. By the age of four, the average vocabulary has ballooned to more than 1,000 words. Our species’ incredible capacity to quickly acquire words isn’t fully understood. Some cognitive scientists and linguists have theorized that people are born with built-in expectations and logical constraints that make this possible. Now, however, machine-learning research is showing that preprogrammed assumptions aren’t necessary to swiftly pick up word meanings from minimal data.
A team of cognitive and computer scientists has successfully trained a basic artificial intelligence model to match images to words using just 61 hours of naturalistic footage and sound—previously captured from the perspective of a child named Sam in 2013 and 2014. The study, published on Thursday in Science, used video and transcribed audio recorded by a head-mounted camera that was placed on Sam intermittently when he was six to 25 months old. Although it’s a small slice of a child’s life, it was apparently enough to prompt the AI to figure out what certain nouns mean.
The findings suggest that the recipe for language acquisition could be simpler than previously thought. Maybe children “don’t need a custom-built, fancy-pants language-specific mechanism” to efficiently grasp word meanings, says Jessica Sullivan, an associate professor of psychology at Skidmore College. Sullivan studies language development and was not involved in the new research, though she and others produced the video dataset that was used in the work. “This is a really beautiful study,” she says, because it offers evidence that simple information from a child’s worldview is rich enough to kick-start pattern recognition and word comprehension.