Toddler Science and Big Data
I’ve been spending a lot of time following my son Patrick around, watching him explore the world. I’ve shared a few of his important discoveries on Twitter and with friends, under the tag “Toddler Science”. Key discoveries include that tissue boxes contain a finite supply of tissue and that cat magnets do not stick to cats. I spent the New Year’s weekend with friends, and they too had an opportunity to watch Patrick learning.
The metaphor that keeps coming to mind, particularly when watching the destructive testing of my belongings, is that a toddler is a video gamer playing with a new engine. Gamers get their bearings by testing out the physics model: How far can I throw a grenade? Do grenades bounce? If I shoot up the wall and leave the room, does the damage persist or does the state reset?
Patrick is learning the physics engine of his world. To do it, he makes a large number of observations and then tries to make sense of them, which is quite difficult.
“He needs to figure out if he is in a universe governed by Newtonian mechanics, or just the Quake engine.”
“Aristotle never got past the Quake engine.”
Of course, humans don’t make use of nearly all the data that is available to them. If they did, they would learn a whole lot more, and a whole lot more quickly, as we can learn from Eliezer Yudkowsky’s fable “That Alien Message”. But we do receive a whole lot of data, and somehow filter through it to learn to interact with a complex universe, to empathize and communicate with other humans, to learn and to create new ideas. The amount of data consumed by a child before he learns to speak or throw a ball is staggering.
Computer science is just starting to learn how to manage and analyze data sets at this scale. Today’s big data systems come in many flavors, from Google and Facebook to Walmart and eBay. There is some debate about what big data means, with Curt Monash and Dan Abadi having recent posts on the topic.
Regardless of how you define it, there is a lot of data available that computers are still ignoring. What passes for big data in artificial intelligence is only starting to approach what passes for big data in biological systems. As big data gets truly big, interesting things will happen. It may be that previous generations of AI weren’t bad ideas; they were just data-starved.