This project started out as an attempt to figure out whether I had really been spotting an abnormally high number of Teslas--specifically Model Ys--driving past me on my walks for the past two months. The plan was to record footage of my trips, ideally with Meta's Ray-Ban Stories, and then use an object-detection computer-vision model to analyze the footage and gauge whether the frequency of white Teslas was actually significantly greater than that of any other car model I come across regularly. However, after thinking about this for more than two minutes and chalking it up to the Baader-Meinhof phenomenon, I decided to switch my focus to license plate detection and blurring instead, as that has a lot more practical use cases (e.g. editorial images and car rental platforms).
To train a computer to identify objects, you need to feed it data that an algorithm can iterate over and learn from: specifically, data containing tagged examples of the patterns you want it to recognize. The resulting model can then be used to identify those patterns in new data you provide.
That's the general gist of the process, albeit a bit simplified. I knew I would have to feed my model lots of images of license plates. The tricky part was figuring out how I was going to get images (with permissive licenses) to populate the dataset. I considered walking around and taking pictures of cars and license plates in the neighborhood, but that felt weird and invasive. Plus, the idea of a mostly 5'11/6ft-tall (depending on my mood), rarely-smiling African guy taking pictures of random people's cars somehow didn't seem like a viable or safe option for me.
Then it occurred to me: why not try using synthetic data? Unreal Engine recently released a City Sample project, which uses the assets from The Matrix Awakens: An Unreal Engine 5 Experience. With this playground, I could simulate different lighting conditions and capture pictures of cars from any angle.
Although this is a virtual environment, I figured the photorealism of the UE5 assets would translate well enough to real-world scenarios, so I set up different camera positions in the scene at different times of day and recorded footage.
Since all the car assets and license plates were present in the scene as 3D models, I considered setting up a script to automatically generate annotation data, instead of having to tag the data manually afterward. However, I only gave myself a day to do all the capturing and training, so I wanted to avoid coding during this stage, if possible.
In that vein, I decided to use Roboflow to handle the entire training and model generation process. Their platform simplified the process of tagging the license plates in the source data I uploaded, and I was able to breeze through hundreds of images pretty quickly. With this initial dataset, I decided to train the model to see if I was getting anywhere.
After fifteen minutes or so, the model generation process was complete, and the results were looking good, with a mean average precision (mAP) of 81.7%. mAP is a metric used to measure an object-detection model's performance on its dataset; Roboflow explains the concept well here if you want to learn more. Now it was time to test the model on fresh data. I uploaded a few photos of parked cars, and the model not only detected some of the license plates in the images, but it also falsely detected a few objects that weren't license plates.
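As a quick aside on that metric: in its standard (simplified) form, mAP is the mean of each class's average precision, where a class's AP is the area under its precision-recall curve:

```latex
\text{mAP} = \frac{1}{N} \sum_{i=1}^{N} \text{AP}_i,
\qquad
\text{AP}_i = \int_0^1 p_i(r)\, dr
```

Here p_i(r) is class i's precision as a function of recall, and N is the number of classes. Since this model only has one class (license plates), its mAP is effectively just that class's AP.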
I attributed most of the false positives to the low resolution of some of the annotated images in the dataset. I was more concerned about the false negatives, though: the model failing to detect some license plates in the provided images. I believe this was due to the lack of variety in the 3D license plate assets compared to the real world, which has a surprising range of plate types even within the same geographical location. I had used up most of my time budget, and modeling or importing new license plates into the 3D scene was out of the question, so I decided to merge my dataset with an existing U.S. license plate dataset.
This improved the results considerably: while the model sometimes struggled with Canadian license plates, the false positives dropped significantly, and I was happy enough with the state of things to move on to the next stage.
Roboflow provides a hosted REST API for models created on the platform. You can pass an image in base64 format or as a URL in the request body, and get a result back as JSON.
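The values below are invented for illustration, but the shape follows Roboflow's documented object-detection response format:

```json
{
  "predictions": [
    {
      "x": 662.5,
      "y": 412.0,
      "width": 158.0,
      "height": 74.0,
      "class": "license-plate",
      "confidence": 0.917
    }
  ],
  "image": {
    "width": 1280,
    "height": 720
  }
}
```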
Essentially, the API returns the uploaded image's dimensions, an array of bounding boxes (a center point plus a width and height) for each license plate it detected in the image, and how confident the model was about each prediction.
With this, I had all the information I needed to blur the license plates. I knew I would have to work with HTML5 Canvas elements, and I had heard about the fabric.js library, which abstracts away some of the more complicated aspects of canvas manipulation, so I decided to finally use it here. I set up a simple upload form, and on submit, I send the image to the Roboflow API from the server. While I could've made the API call on the client, I wanted to avoid exposing the API key on the front end. Furthermore, I wanted to be able to share the project with others without forcing every recipient to use their own key.
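Here's a rough sketch of that server hop, assuming an Express endpoint and Node 18+ (for the global fetch); the route name and the license-plates/1 model slug are placeholders rather than the real ones, and error handling is omitted:

```typescript
import express from "express";

const app = express();
// Photos get large once base64-encoded, so raise the body limit.
app.use(express.json({ limit: "10mb" }));

// Hypothetical route; the client posts the uploaded image as base64.
app.post("/api/detect", async (req, res) => {
  const { imageBase64 } = req.body;

  // Roboflow's hosted inference endpoint takes the base64 image as the
  // POST body and the API key as a query parameter, so the key never
  // leaves the server.
  const response = await fetch(
    `https://detect.roboflow.com/license-plates/1?api_key=${process.env.ROBOFLOW_API_KEY}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: imageBase64,
    }
  );

  // Pass Roboflow's predictions straight through to the client.
  res.json(await response.json());
});

app.listen(3000);
```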
Once the API call succeeds, I load the uploaded image into the canvas element. When a user clicks the Blur License Plates button, I loop over each prediction in the returned array whose confidence is equal to or greater than the user's minimum confidence value. For each prediction, I add a new image object to the canvas and then create a clipping mask by scaling the prediction's coordinates to match the current image object's resolution and masking out any part of the image outside those bounds. This means only the section of the image within the prediction's bounding box is visible. Because this new image object is identical to the original, you won't notice it at first, but once we apply a blur to it (using the provided blur strength value) the effect becomes visible.
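Here's a condensed sketch of that step, assuming the fabric.js v5 callback API; the function and type names are mine, and it assumes the canvas renders the image at its natural resolution (otherwise the coordinates need the scaling described above):

```typescript
import { fabric } from "fabric";

// One Roboflow prediction (see the JSON example earlier).
type Detection = {
  x: number; // box center x, in pixels
  y: number; // box center y, in pixels
  width: number;
  height: number;
  confidence: number;
};

function blurPlates(
  canvas: fabric.Canvas,
  imageUrl: string,
  predictions: Detection[],
  minConfidence: number, // 0-1, from the user's input
  blurStrength: number   // 0-1, the range fabric's Blur filter expects
) {
  predictions
    .filter((p) => p.confidence >= minConfidence)
    .forEach((p) => {
      // Stack another copy of the image on top of the original...
      fabric.Image.fromURL(imageUrl, (img) => {
        // ...clipped to the prediction's box. Roboflow reports box
        // centers; fabric rects are positioned by their top-left corner.
        img.clipPath = new fabric.Rect({
          left: p.x - p.width / 2,
          top: p.y - p.height / 2,
          width: p.width,
          height: p.height,
          absolutePositioned: true, // clip in canvas coordinates
        });
        // Blur only this clipped copy; the rest of the photo shows
        // through from the original underneath.
        img.filters = [new fabric.Image.filters.Blur({ blur: blurStrength })];
        img.applyFilters();
        img.selectable = false; // keep the overlay from being dragged
        canvas.add(img);
        canvas.renderAll();
      });
    });
}
```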
Is this the most performant way to achieve this result? Probably not, but for this experiment, it did the job. If I were to turn this into a complete tool, I would also add the ability to manually blur any license plates the model might have missed and to add the current image to the dataset (with the user's permission). For now, I have no plans to expand on or release the project in this form, as it is a bit barebones at the moment, but if it would be helpful, I might turn it into a public GitHub repo and share it here.