In Computer Vision, one of the most interesting areas of research is obstacle detection using Deep Neural Networks. Many papers have been published, each claiming SOTA (State of the Art) results in detecting obstacles with very high accuracy. The goal of these algorithms is to predict a list of bounding boxes from an input image. Machine Learning has become very good at localising and classifying obstacles in an image in real time. However, none of these algorithms includes the notion of time and continuity. When detecting an obstacle, they assume it’s a new obstacle every time, with no link to what was detected in earlier frames.
I won’t go into the details of the algorithm here, but you can have a look at this video from Siraj Raval that explains it very well.
The output of the algorithm is a list of bounding boxes, each in the format [class, x, y, w, h, confidence]. The class is an id that maps to an entry in a txt file (0 for car, 1 for pedestrian, …). x, y, w and h are the parameters of the bounding box: x and y are the coordinates of its center, while w and h are its size (width and height). The confidence is expressed as a percentage.
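To make this format concrete, here is a minimal Python sketch of how such an output could be read and converted from center/size to corner coordinates, which is what most drawing functions expect. The names `Detection`, `parse_detections` and `CLASS_NAMES`, as well as the sample values, are hypothetical and only there for illustration. I am also assuming pixel units and a hard-coded class mapping; the real mapping comes from the txt file.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical class-id mapping; in practice it is read from the txt file.
CLASS_NAMES = {0: "car", 1: "pedestrian"}


@dataclass
class Detection:
    class_id: int
    x: float           # center x (assumed to be in pixels)
    y: float           # center y
    w: float           # box width
    h: float           # box height
    confidence: float  # expressed in %

    def corners(self) -> Tuple[float, float, float, float]:
        """Return (x_min, y_min, x_max, y_max) from the center/size format."""
        half_w, half_h = self.w / 2, self.h / 2
        return (self.x - half_w, self.y - half_h,
                self.x + half_w, self.y + half_h)


def parse_detections(rows: Sequence[Sequence[float]]) -> List[Detection]:
    """Turn raw [class, x, y, w, h, confidence] rows into Detection objects."""
    return [
        Detection(int(c), float(x), float(y), float(w), float(h), float(conf))
        for c, x, y, w, h, conf in rows
    ]


# Made-up sample output for a single image.
raw = [
    [0, 100, 50, 40, 20, 87.5],
    [1, 30, 60, 10, 30, 65.0],
]
detections = parse_detections(raw)

for det in detections:
    name = CLASS_NAMES.get(det.class_id, "unknown")
    print(name, det.corners(), f"{det.confidence}%")
```

Note that nothing in a `Detection` identifies the obstacle itself: running the detector on the next frame gives a fresh list with no connection to this one.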
Source: https://towardsdatascience.com/computer-vision-for-tracking-8220759eee85?fbclid=IwAR2ksoHRNrwL6r-MzKAvvPycCmPuJqDPb_2MpYZutzjcnxLNkOFDQrf5Smo