Mask Wearing Classifier
published April 14, 2022
Goal
To create a computer vision model to detect if a person is wearing their surgical mask correctly.
Preamble
Many computer vision models have been created to detect non-mask wearers so that shop staff can be alerted. We decided to extend this idea to include mask wearers who were wearing their mask incorrectly, and therefore ineffectively.
Data
Employees at my company (Sela) took hundreds of ‘selfies’ with their own mobile phone cameras, each in one of four mask-related poses:
Correct mask position
Mask over nose only
Mask over mouth only
No mask
They were also photographed at different angles – from the sides, from the front, from above and from below.
Method
Around 175 training images were loaded into an Azure customvision.ai classification project, where they were labelled according to the four mask-wearing poses. A compact domain was selected so that the trained model could later be exported and downloaded, and the model was trained.
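For reference, the same project setup can also be scripted with the Custom Vision training SDK rather than the web portal. The following is only a minimal sketch: the endpoint, key and project name are placeholders, not the actual values used.

```python
# pip install azure-cognitiveservices-vision-customvision
from azure.cognitiveservices.vision.customvision.training import (
    CustomVisionTrainingClient,
)
from msrest.authentication import ApiKeyCredentials

# Placeholder endpoint and key for a Custom Vision training resource.
ENDPOINT = "https://<resource>.cognitiveservices.azure.com/"
credentials = ApiKeyCredentials(in_headers={"Training-key": "<training-key>"})
trainer = CustomVisionTrainingClient(ENDPOINT, credentials)

# Pick a compact classification domain so the model can be exported later.
compact = next(
    d for d in trainer.get_domains()
    if d.type == "Classification" and "Compact" in d.name
)
project = trainer.create_project("Mask Wearing Classifier",
                                 domain_id=compact.id)

# One tag per mask-wearing pose.
tags = {
    name: trainer.create_tag(project.id, name)
    for name in ["Correct mask position", "Mask over nose only",
                 "Mask over mouth only", "No mask"]
}

# Images would then be uploaded against these tags
# (e.g. with create_images_from_files) before training.
iteration = trainer.train_project(project.id)
```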
Results
Once training was completed, the initial model performance was assessed.
A precision score of 84% appears rather good; however, from experience we should really expect much higher, in the high 90s in fact. This is because the images represent quite distinct differences in mask pose, and just as the human eye can distinguish between them easily, so too should the neural network that underlies the model.
One reason for the low score is undoubtedly the low image count. Another is the imbalance in image counts among the four classifications: when one classification is more highly represented than the others, the model develops a bias towards it, and precision and recall fall.
In fact, when you analyse the Performance Per Tag output, it is readily seen that the ‘No Mask’ classification, which is over-represented in the data, has perfect accuracy.
The red bars above also highlight the relatively low image count for the under-represented classes, and the yellow exclamation mark warns us that the images are unbalanced across the data set.
So, to rectify this situation, the model should be trained on around 100–150 images of each mask pose, which should bring us close to perfect accuracy.
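As a quick sanity check before retraining, the per-class image counts can be verified locally. This sketch assumes the selfies are sorted into one folder per pose, which is a hypothetical layout rather than how the data was necessarily stored:

```python
from collections import Counter
from pathlib import Path

# Hypothetical layout: data/<pose>/<image>.jpg, one folder per mask pose.
DATA_DIR = Path("data")

counts = Counter()
for pose_dir in DATA_DIR.iterdir():
    if pose_dir.is_dir():
        counts[pose_dir.name] = sum(1 for _ in pose_dir.glob("*.jpg"))

# Print each class with its share of the data set to spot imbalance.
total = sum(counts.values())
for pose, n in counts.most_common():
    print(f"{pose:25s} {n:4d} ({n / total:.0%})")
```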
Once a satisfactory model was generated, it was exported as a Docker container and run locally on a laptop connected to a camera. A Python script was written to ingest the images from the camera and present them to the model via the Docker container's endpoint.
The script then retrieved the output from the endpoint and displayed it on a dashboard for consumption.
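The script itself is not reproduced here, but a minimal sketch of the same ingest-and-predict loop follows, using OpenCV and requests. It assumes the exported container is running locally and mapped to port 80, and a simple on-frame overlay stands in for the dashboard:

```python
import cv2        # pip install opencv-python
import requests   # pip install requests

# Assumed endpoint of the exported Custom Vision container; the port
# depends on how the container was started, e.g.
#   docker run -p 80:80 maskclassifier
PREDICT_URL = "http://localhost:80/image"

cap = cv2.VideoCapture(0)  # default laptop camera
try:
    while True:
        ok, frame = cap.read()
        if not ok:
            break

        # Encode the frame as JPEG and send the raw bytes to the endpoint.
        _, jpg = cv2.imencode(".jpg", frame)
        resp = requests.post(
            PREDICT_URL,
            data=jpg.tobytes(),
            headers={"Content-Type": "application/octet-stream"},
        )
        predictions = resp.json()["predictions"]

        # Take the highest-probability tag and overlay it on the frame.
        best = max(predictions, key=lambda p: p["probability"])
        label = f'{best["tagName"]}: {best["probability"]:.0%}'
        cv2.putText(frame, label, (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
        cv2.imshow("Mask Wearing Classifier", frame)

        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
finally:
    cap.release()
    cv2.destroyAllWindows()
```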