Video Object Recognition

A Haystack Full of Needles

Imagine trying to find a white Toyota pickup truck in the middle of Baghdad and tracking it – or doing a facial recognition match with huge crowds of people entering a football stadium or maybe even a long line of visitors at a customs checkpoint for entry into the country. The task of finding something of interest within a huge pool of data can be daunting. It can be like looking through a haystack full of needles to find the one needle of interest.

Thankfully, there are several machine-based tools that can ease the burden of searching through mountains of information, quickly giving us clues as to where to look and what to look for. The basic idea is to exploit the structure of the data to search for patterns or features within the video. Of course, there is no guarantee of an exact match, so results are often reported as a probability percentage. Two of these exciting and ever-improving technologies are Object Recognition and a variant of it called Facial Recognition.

Object recognition is the process of identifying a specific object in a digital image or video. One of the more common applications is spotting and tracking a specific vehicle among many other vehicles and similar objects. So if you were looking for a silver van in a major downtown area, you would need the ability to distinguish characteristics such as color, size, and shape in order to pick that vehicle out from many others. Humans recognize a multitude of objects in images with little effort, even when the objects are seen from different viewpoints, at different sizes and scales, or when they are translated or rotated. Objects can even be recognized when they are partially obstructed from view. This task is still a challenge, however, for computer vision systems.

Object recognition algorithms rely on matching, learning, or pattern recognition, using appearance-based or feature-based techniques. Common techniques include edges, gradients, histograms, wavelets, and local binary patterns.
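To make one of these techniques concrete, here is a minimal sketch of local binary patterns (LBP) in plain Python. It assumes a grayscale image stored as a list of rows of integer intensities, and the function names are made up for illustration. Production systems use optimized libraries and more refined variants, so treat this as a toy rather than anyone's actual implementation.

```python
def lbp_code(image, row, col):
    """Return the 8-bit local binary pattern code for one interior pixel."""
    center = image[row][col]
    # Neighbours visited clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # A neighbour at least as bright as the center sets its bit.
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Count how often each of the 256 possible codes occurs (border skipped)."""
    hist = [0] * 256
    for row in range(1, len(image) - 1):
        for col in range(1, len(image[0]) - 1):
            hist[lbp_code(image, row, col)] += 1
    return hist


def histogram_intersection(h1, h2):
    """Similarity between 0 and 1; 1.0 means identical normalized histograms."""
    total1, total2 = sum(h1), sum(h2)
    return sum(min(a / total1, b / total2) for a, b in zip(h1, h2))
```

Two image patches with similar texture produce similar histograms, so the intersection score can serve as a rough match percentage of the kind mentioned earlier.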

We at ViON work with a company called DataFission to address the Object Recognition problem. DataFission is a platform that enables searching large video holdings, including forensic data, for previously unsearchable unstructured data, such as metadata-free video. It provides content-based searching, basically “Doing for unstructured data what Google does for text”. It processes large amounts of video and then creates metadata tables based on the objects and features it finds within the video. You can then search the metadata repository for all representations of those particular objects. The tables themselves are minuscule in size relative to the video itself. The scientific techniques are way too complex to detail here, but suffice it to say it uses things like eigenvalues and eigenvectors and other mathematical formulas only an atomic physicist would understand to process the video.
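The idea of searching a small metadata table instead of the video itself can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the record layout, file names, and labels below are hypothetical and do not reflect DataFission's actual schema or API.

```python
from collections import defaultdict


def build_index(detections):
    """Map each object label to the (video, second) pairs where it was seen."""
    index = defaultdict(list)
    for video_id, second, label in detections:
        index[label].append((video_id, second))
    return index


# Hypothetical detector output: (video file, time offset in seconds, label).
detections = [
    ("cam01.mp4", 12, "silver van"),
    ("cam01.mp4", 47, "white pickup"),
    ("cam02.mp4", 5, "silver van"),
]

index = build_index(detections)

# Searching the index touches a few small records, not hours of footage.
hits = index["silver van"]
```

The expensive step (detecting objects in every frame) happens once at ingest time; after that, each query is a quick lookup that returns where to look in the original footage.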

At ViON we also resell a Public Safety solution called Hitachi Visualization. It includes an option for facial recognition developed by a division of Hitachi called Kokusai. Given enough video footage, it can scan and index around 36 million faces in about one second. The scanning technology is quite versatile and very high performance. While it obviously can’t identify people who are facing away from the camera, it can handle faces turned up to 30 degrees from straight on, both horizontally and vertically. It also requires faces to be at least 40 by 40 pixels, but other than that there are no restrictions. So even though you might have 15 or 20 different facial views, such as shown for John Travolta through the years, the system can still detect the correct person based on facial features.
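The two stated limits (40 by 40 pixels and 30 degrees off axis) amount to a simple pre-filter on detected faces. The sketch below is an assumption-laden illustration: the function and parameter names are hypothetical, the limits are treated as inclusive, and nothing here reflects how the Hitachi product is actually implemented.

```python
MIN_FACE_SIZE_PX = 40  # minimum face width and height quoted in the article
MAX_ANGLE_DEG = 30     # maximum turn away from straight on quoted in the article


def is_searchable(width_px, height_px, yaw_deg, pitch_deg):
    """Return True if a detected face meets the stated size and angle limits.

    yaw_deg is the horizontal turn and pitch_deg the vertical tilt,
    both measured from a straight-on view (hypothetical parameter names).
    """
    big_enough = width_px >= MIN_FACE_SIZE_PX and height_px >= MIN_FACE_SIZE_PX
    facing_camera = (abs(yaw_deg) <= MAX_ANGLE_DEG
                     and abs(pitch_deg) <= MAX_ANGLE_DEG)
    return big_enough and facing_camera
```

For example, a 64 by 64 pixel face turned 20 degrees to the side would pass this check, while the same face turned 45 degrees would not.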

There have historically been major challenges in the facial recognition space. Face recognition requires good-quality images of suspects, but most surveillance cameras in use today are decidedly “last” generation and the image quality leaves much to be desired. Typical face recognition algorithms, rather like fingerprint matching, rely on facial metrics. If the image, like a poor fingerprint scan, cannot supply these metrics, the system won’t work. A further problem with face recognition is that it often reports many false positives and false negatives. While it is possible, through analytics, to narrow down the list of suspects, this adds a time delay and requires a lot of human judgment. Searching through various image databases, including video, is also very time-consuming. The technique used by Hitachi’s facial recognition system is called “clustering”, and the idea is to link common characteristics in the database so that when these characteristics are recognized the search engine goes there first. Hitachi adds to this a technology called “edge pattern characteristics”, which is another complementary means of accelerating searching and returning accurate results.
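The "go there first" idea behind clustering can be sketched as follows. This is a generic cluster-first nearest-neighbour search, not Hitachi's algorithm: the feature vectors, names, and fixed cluster centers are all made up for illustration, and a real system would learn its clusters from data (for example with k-means).

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_center(vector, centers):
    """Index of the cluster center closest to the given vector."""
    return min(range(len(centers)), key=lambda i: distance(vector, centers[i]))


def build_clusters(faces, centers):
    """Group each (name, vector) record under its nearest cluster center."""
    clusters = {i: [] for i in range(len(centers))}
    for name, vector in faces:
        clusters[nearest_center(vector, centers)].append((name, vector))
    return clusters


def search(query, centers, clusters):
    """Compare the query only against faces in the most similar cluster."""
    candidates = clusters[nearest_center(query, centers)]
    if not candidates:
        return None
    return min(candidates, key=lambda item: distance(query, item[1]))[0]


# Hypothetical two-dimensional "facial metrics" and two fixed cluster centers.
centers = [(0.0, 0.0), (10.0, 10.0)]
faces = [("alice", (0.5, 0.2)), ("bob", (1.0, 1.5)),
         ("carol", (9.5, 10.2)), ("dave", (11.0, 9.0))]

clusters = build_clusters(faces, centers)
match = search((9.0, 10.0), centers, clusters)
```

The query is compared against two cluster centers and then only the two faces in the winning cluster, rather than against every face in the database. The saving grows with database size, at the cost of occasionally missing a match that sits just across a cluster boundary.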

The applications for this capability in public transit and, of course, law enforcement, among others, are hopefully obvious, especially when you need to quickly identify that one needle in the haystack.
