Stop reading and solve a real world ML problem

One of the best ways to learn is by solving a real world problem. A machine learning colleague suggested I use machine learning to solve a real world problem I was actually facing. My mind was drawn to a problem on the road where I live: many drivers drive too fast down the road in the evening. I wanted to make the police aware of how fast these vehicles were going.

I created a GitHub repo with the work; please take a look: https://github.com/amplifiedengineering/opencv-radar. The following post describes the process I went through to arrive at that solution.

My first approach was to find a LIDAR sensing device to measure the speed of the vehicles and capture the license plate information through a camera. As I began researching LIDAR devices, I realized some of the less expensive models were custom built for a specific purpose; for example, this device's spec sheet shows it takes a single reading and is built for measuring depth so drones can avoid ground collisions. That wouldn't help me with my tracking, where I potentially need to measure multiple targets (two cars going the same way, or two cars going in different directions).

As I was describing this to a friend who is into robotics, he suggested a different approach: why not measure the speed using video? I liked this approach, as I could record video from my iPhone without buying expensive single-purpose hardware, and then run the calculation in software. As I started thinking about how to solve the speed calculation, it seemed straightforward: have a set of lines on the video that mark a well known distance, then calculate the speed based on the number of frames it takes for the vehicle to transit between the two lines.

Speed (miles per hour) = (distance in feet / 5280 feet per mile) / ((number of frames / frames per second) / 3600 seconds per hour)
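As a rough sketch of that calculation in Python (the 5280 and 3600 constants are just the feet-per-mile and seconds-per-hour conversions; the numbers in the example are made up):

```python
def speed_mph(distance_feet: float, frame_count: int, fps: float) -> float:
    """Speed of a vehicle that covered `distance_feet` in `frame_count` frames.

    distance_feet: real-world distance between the start and end lines.
    frame_count:   frames elapsed while the vehicle crossed between the lines.
    fps:           frame rate of the recorded video.
    """
    miles = distance_feet / 5280.0        # feet -> miles
    hours = (frame_count / fps) / 3600.0  # frames -> seconds -> hours
    return miles / hours

# Example: 60 ft between the lines, crossed in 41 frames at 60 fps ~= 59.9 mph
print(speed_mph(60.0, 41, 60.0))
```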

To be specific, I traded off real-time video processing for post-processing. So this approach wouldn't work for all applications, but I was more interested in grabbing a video file from my phone, post-processing it, and finding how many people were speeding. This way, I didn't have to worry about the frame rate causing a queue to build up in the stream (which can be mitigated by sampling frames rather than processing every frame); a small sketch of that sampling idea follows.
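For illustration only, this is roughly what frame sampling could look like in a post-processing loop; the filename, the sampling interval, and the commented-out detection step are placeholders, not code from the repo:

```python
import cv2

# A sketch of frame sampling during post-processing: only every Nth frame is
# analyzed, which keeps the per-frame detection cost from falling behind.
SAMPLE_EVERY = 3  # assumed value; tune to how fast the detector runs

cap = cv2.VideoCapture("street.mp4")  # placeholder filename
frame_index = 0
sampled = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_index % SAMPLE_EVERY == 0:
        # ...run the vehicle detection / tracking on `frame` here...
        sampled += 1
    frame_index += 1
cap.release()
print(f"processed {sampled} of {frame_index} frames")
```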

The first problem was detecting where the vehicle was in the image. I knew about the OpenCV library, so I started exploring it. I used a Haar classifier to detect the cars and experimented with several different hyperparameters of the detectMultiScale API to trade detection quality against the number of frames per second I could process. Some of the key trade-offs were the size of the video frame to be processed (smaller frames process faster, but detection suffers), the scaleFactor parameter, and the minNeighbors parameter. I played around with how much I would resize the greyscale image of the video frame before passing it into the detectMultiScale API. It was working “ok”, but I noticed a fair amount of choppiness (jumpiness) in the object detection borders, so I looked for a way to improve it.
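For reference, the basic shape of that approach looks roughly like this; the cars.xml cascade file, the 0.5 downscale, and the parameter values are placeholders for illustration, not what the repo actually uses:

```python
import cv2

# Haar-cascade vehicle detection sketch. cars.xml is a stand-in for whichever
# vehicle cascade file is used; it does not ship with OpenCV.
car_cascade = cv2.CascadeClassifier("cars.xml")

cap = cv2.VideoCapture("street.mp4")  # placeholder filename
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Downscale before detection: faster, but small/far cars may be missed.
    small = cv2.resize(gray, None, fx=0.5, fy=0.5)
    cars = car_cascade.detectMultiScale(
        small,
        scaleFactor=1.1,  # how much the image shrinks at each pyramid step
        minNeighbors=4,   # higher = fewer false positives, more missed cars
    )
    for (x, y, w, h) in cars:
        # Scale the boxes back up to the original frame size before drawing.
        cv2.rectangle(frame, (2 * x, 2 * y), (2 * (x + w), 2 * (y + h)), (0, 255, 0), 2)
    cv2.imshow("cars", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```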

I found this blog post informative. It uses Haar classifiers with background averaging to detect the differences in pixels, and then draws bounding boxes around the contours that differ from the background. This again improved the performance slightly. However, I was still seeing very unstable bounding boxes around the cars, which left me wanting a more stable detection mechanism.
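A rough sketch of the background-averaging idea, with assumed threshold, blur, and area values rather than the referenced post's exact settings:

```python
import cv2

# Running-average background subtraction: moving cars show up as differences
# against the slowly updated background, and their contours become boxes.
cap = cv2.VideoCapture("street.mp4")  # placeholder filename
background = None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.GaussianBlur(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), (21, 21), 0)

    if background is None:
        background = gray.astype("float")
    cv2.accumulateWeighted(gray, background, 0.05)  # slowly update background
    delta = cv2.absdiff(gray, cv2.convertScaleAbs(background))
    thresh = cv2.threshold(delta, 25, 255, cv2.THRESH_BINARY)[1]
    thresh = cv2.dilate(thresh, None, iterations=2)

    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) < 1500:  # ignore small specks of motion
            continue
        x, y, w, h = cv2.boundingRect(c)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

    cv2.imshow("motion", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```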

Finally, as I was talking with another principal level developer about an unrelated topic, he shared some work he had done using OpenCV and YOLO. I decided to look up YOLO, and I was impressed with the improved stability in object detection at similar performance. At first, I was a bit disappointed that the YOLO architecture requires a color image; I had previously been using greyscale (with fewer channels used in inference, I assumed it would be faster). I then discovered that the architecture also does all of the resizing for you at inference time, so the resizing I had been doing was superfluous. Finally, I moved from using the v3 predict function to find the objects to the v8 track function.
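A minimal sketch of what that track call looks like with the ultralytics library; the model file, confidence threshold, and video name are assumptions here, not the repo's exact settings:

```python
import cv2
from collections import defaultdict
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # assumed model weights
trails = defaultdict(list)  # track ID -> list of box centers, one per frame

# stream=True yields results frame by frame; persist=True keeps the same
# track ID on the same car across frames.
for result in model.track(source="street.mp4", stream=True, persist=True, conf=0.4):
    if result.boxes.id is not None:
        for box, track_id in zip(result.boxes.xyxy, result.boxes.id.int().tolist()):
            x1, y1, x2, y2 = box.tolist()
            # The per-ID centers are what would feed the line-crossing /
            # frame-counting logic used for the speed calculation.
            trails[track_id].append(((x1 + x2) / 2, (y1 + y2) / 2))
    cv2.imshow("tracking", result.plot())  # boxes + IDs drawn on the frame
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cv2.destroyAllWindows()
```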

Figure 1: Setup screen to verify the start / end lines for the video

Figure 2: Tracking cars using the YOLOv8 track API function, with tracking trails

Hope you enjoyed the ride. A few potential improvements I'm thinking about now: estimating the speed based on well known sizes and distances (specifically, I could use the license plate size), and collating multiple views together to produce a higher resolution view of the license plate.