With Microsoft expected to announce something called “Avatar Kinect” at CES in Las Vegas tomorrow, the company’s posted a lengthy piece on the Xbox blog about how the camera actually works.
The breakdown takes in the sensor itself, how it finds and recognises humans, gesture recognition and more.
The “brain” part of the equation’s awesome. Apparently Kinect uses something called “multiple centroid proposals” to judge whether or not the body parts the camera’s seeing are the “right” body parts.
“Each pixel of the player segmentation is fed into a machine learning system that’s been trained to recognize parts of the human body,” explained the company.
“This gives us a probability distribution of the likelihood that a given pixel belongs to a given body part. For example, one pixel may have an 80% chance of belonging to a foot, a 60% chance of belonging to a leg, and a 40% chance of belonging to the chest. It might make sense initially to keep the most probable proposal and throw out the rest, but that would be a bit premature.
“Rather, we send all of these possibilities (called multiple centroid proposals) down the rest of the pipeline and delay this judgment call until the very end.”
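The idea in the quote can be sketched in a few lines of code. This is purely illustrative and not Microsoft’s actual pipeline: it assumes each pixel arrives with a probability for each body part, and shows why keeping several ranked “centroid proposals” per pixel, rather than just the single most likely part, defers the judgment call to later stages.

```python
def centroid_proposals(pixel_probs, top_k=3):
    """Return up to top_k (body_part, probability) proposals for one pixel,
    ranked by descending probability, instead of only the single best guess."""
    ranked = sorted(pixel_probs.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]

# The example pixel from the quote: 80% foot, 60% leg, 40% chest.
pixel = {"foot": 0.8, "leg": 0.6, "chest": 0.4}

# Naive approach: keep only the most probable part and discard the rest.
best_only = centroid_proposals(pixel, top_k=1)

# What the quote describes: all candidates travel down the pipeline,
# so a later stage can still pick "leg" if the skeleton fit demands it.
proposals = centroid_proposals(pixel)
```

If later stages only ever saw `best_only`, a momentarily misclassified pixel would be locked in; carrying `proposals` forward lets the full-body model resolve the ambiguity at the end.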