Abstract | Counting how many people are in an image or video stream is central to many applications, such as video surveillance, traffic control, and emergency management, and is a core part of the methods adopted in crowd analysis. Unfortunately, crowd counting is a very hard problem that requires a lot of labor and time if done manually, as some images can contain thousands of people. There is therefore very high demand for automated methods that make population estimates without physical measurements. There have been many attempts at automated crowd counting through direct feature definition and extraction, but because of the high complexity of crowd images, these methods have not given very accurate results. More recently, Deep Learning techniques have been heavily used for automated crowd counting tasks, with promising results. However, the accuracy of Deep Learning models decreases drastically in real-life situations because the training and testing datasets usually differ significantly in critical features such as camera angle, distance, properties of the environment, and crowd density. In this project, we extend state-of-the-art crowd counting deep learning models with a procedure that tunes a pre-trained neural network on the spot at test time, to maximize performance on specific unseen test images. To this end, we use a Deep Learning based image retrieval system that imitates the role of associative memory in the human brain, labeling the testing images based on previously seen training images. We then use Few-Shot learning techniques to retrain the model on the spot using the newly labeled testing images. Tests on a number of standard benchmark datasets show that our approach can robustly improve the performance of the reference deep learning model.
|
---|
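
The retrieval-based labeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how retrieved training images are turned into a label, so the similarity-weighted average used here is an assumption, as are the function name `retrieve_pseudo_count`, the parameter `k`, and the use of precomputed embedding vectors in place of a real Deep Learning feature extractor.

```python
import numpy as np


def retrieve_pseudo_count(test_emb, train_embs, train_counts, k=3):
    """Estimate a pseudo-label (crowd count) for one test image.

    Hypothetical helper: the k training images most similar to the test
    image are retrieved by cosine similarity, and their known counts are
    averaged with the similarities as weights (an assumed labeling rule).
    """
    test_emb = np.asarray(test_emb, dtype=float)
    train_embs = np.asarray(train_embs, dtype=float)
    train_counts = np.asarray(train_counts, dtype=float)

    # Normalise so that a dot product equals cosine similarity.
    test_unit = test_emb / np.linalg.norm(test_emb)
    train_unit = train_embs / np.linalg.norm(train_embs, axis=1, keepdims=True)
    sims = train_unit @ test_unit

    # Indices of the k most similar training images.
    top = np.argsort(sims)[-k:]

    # Clip weights to stay strictly positive so the average is well defined.
    weights = np.clip(sims[top], 1e-8, None)
    pseudo_count = float(np.average(train_counts[top], weights=weights))
    return pseudo_count, top


# Toy data: three training embeddings with known counts (made-up values).
train_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
train_counts = [10.0, 100.0, 50.0]
count, neighbours = retrieve_pseudo_count([1.0, 0.0], train_embs, train_counts, k=2)
```

In a pipeline of the kind the abstract outlines, pseudo-labels produced this way would then serve as targets for a brief few-shot fine-tuning pass of the pre-trained counting network on the test images; that step is omitted here because the abstract gives no details about it.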