HTML UI element extraction

[PwC US] Prasang Gupta, Swayambodha Mohapatra

Novel Multi-Pass Inference Technique

We placed 3rd in the hackathon.

AIM

The aim of this hackathon was to localise and identify several different HTML UI elements in hand-drawn wireframe drawings of websites.

DETAILS

The dataset provided for the hackathon contained about 3,000 wireframe drawings of websites. The goal was to locate and identify the different HTML UI elements present in each image, such as “Text Box”, “Button” and “Image”. The task therefore reduced to an object detection problem.

We tried several object detection algorithms, including YOLO, R-CNN and Mask R-CNN, and chose Mask R-CNN because it gave us the best results. However, we observed that while our precision was good, the model's recall was low, which dragged down the overall F1 score. To address this, we devised a novel technique, “Multi-Pass Inference”, that boosted our recall.
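Low recall hurts F1 so much because F1 is the harmonic mean of precision and recall, which is pulled towards the smaller of the two. The short Python sketch below illustrates this with hypothetical detection counts (not the hackathon's actual numbers); the function name `precision_recall_f1` is ours, chosen for illustration.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Return (precision, recall, F1) from raw detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean: dominated by whichever of precision/recall is lower.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical counts: 90 correct detections, 10 false alarms, 60 missed elements.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=60)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# prints "precision=0.90 recall=0.60 f1=0.72"
```

Even with 90% precision, the 60% recall holds F1 at 0.72, so finding the missed elements (reducing `fn`) is the most direct way to raise the score.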

The technique runs the image through the model multiple times. After each pass, the objects already detected are noted and removed from the image before the next pass, which forces the model to find additional instances of elements it previously missed. We then combined the detections from all passes, which raised the overall recall of our model and helped us take a podium spot on the leaderboard.
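The loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the source does not say how detected objects were "removed", so here we assume they are painted over with the blank-paper colour, and the per-pass results are simply concatenated. The image representation (a grayscale list of lists), `mask_boxes`, `multi_pass_inference` and `toy_detector` are all hypothetical; `toy_detector` stands in for the Mask R-CNN model and deliberately reports only one element per pass to mimic a low-recall detector.

```python
from typing import Callable, List, Tuple

# Hypothetical types for illustration; a real pipeline would use the model's own outputs.
Box = Tuple[int, int, int, int]     # (x1, y1, x2, y2) in pixels, x2/y2 exclusive
Detection = Tuple[Box, str, float]  # (box, label, confidence)
Image = List[List[int]]             # grayscale rows, 255 = blank paper


def mask_boxes(image: Image, boxes: List[Box], fill: int = 255) -> Image:
    """Return a copy of `image` with every box painted over in `fill`."""
    out = [row[:] for row in image]
    for x1, y1, x2, y2 in boxes:
        for y in range(y1, y2):
            for x in range(x1, x2):
                out[y][x] = fill
    return out


def multi_pass_inference(
    image: Image,
    detect: Callable[[Image], List[Detection]],
    max_passes: int = 3,
) -> List[Detection]:
    """Run `detect` repeatedly, blanking out what was found before each new pass."""
    all_detections: List[Detection] = []
    current = image
    for _ in range(max_passes):
        detections = detect(current)
        if not detections:
            break  # the model finds nothing new, so further passes are pointless
        all_detections.extend(detections)
        # Hide the elements found so far so the next pass must look elsewhere.
        current = mask_boxes(current, [box for box, _, _ in detections])
    return all_detections


def toy_detector(image: Image) -> List[Detection]:
    """Hypothetical low-recall stand-in for the real model: reports only the
    first solid dark rectangle it encounters in raster order."""
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value < 128:
                x2 = x
                while x2 < len(row) and row[x2] < 128:
                    x2 += 1
                y2 = y
                while y2 < len(image) and image[y2][x] < 128:
                    y2 += 1
                return [((x, y, x2, y2), "Button", 0.9)]
    return []


# A 10x10 "drawing" containing two solid elements.
blank = [[255] * 10 for _ in range(10)]
drawing = mask_boxes(blank, [(1, 1, 4, 3), (5, 6, 9, 9)], fill=0)

print(len(toy_detector(drawing)))                        # single pass: 1
print(len(multi_pass_inference(drawing, toy_detector)))  # multi-pass: 2
```

A single pass finds one of the two elements; the second pass, run on the masked image, finds the other. With a real detector the passes can produce overlapping boxes, so a practical merge step would likely also deduplicate them (for example with non-maximum suppression) rather than just concatenating; how the authors combined their passes is not specified.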

IMPACT

The solution performed well on unseen test images, achieving an mAP (IoU > 0.5) of 64.12. It was later integrated into a pipeline to allow rapid prototyping of websites and dashboards.
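An mAP reported at IoU > 0.5 depends on first deciding which predictions count as correct. The Python sketch below shows only that matching step, assuming a common greedy scheme (highest-confidence prediction first, each ground-truth box used at most once, labels must agree); the hackathon's exact scorer is not described, and `iou` and `count_true_positives` are hypothetical names. A full mAP additionally builds a precision-recall curve per class from these matches and averages the resulting average precisions across classes.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_true_positives(
    preds: List[Tuple[Box, str, float]],
    truths: List[Tuple[Box, str]],
    threshold: float = 0.5,
) -> int:
    """Greedily match predictions (highest confidence first) to unused
    ground-truth boxes of the same label whose IoU exceeds `threshold`."""
    matched = set()
    tp = 0
    for box, label, _score in sorted(preds, key=lambda p: p[2], reverse=True):
        best_idx, best_iou = None, threshold
        for i, (gt_box, gt_label) in enumerate(truths):
            if i in matched or gt_label != label:
                continue
            overlap = iou(box, gt_box)
            if overlap > best_iou:  # strictly greater, matching "IoU > 0.5"
                best_idx, best_iou = i, overlap
        if best_idx is not None:
            matched.add(best_idx)
            tp += 1
    return tp


truths = [((0, 0, 10, 10), "Button"), ((20, 20, 30, 30), "Image")]
preds = [((1, 0, 11, 10), "Button", 0.9), ((25, 20, 35, 30), "Image", 0.8)]
print(count_true_positives(preds, truths))  # 1: the shifted "Image" box has IoU 1/3
```

The slightly shifted "Button" prediction overlaps its ground truth with IoU 90/110 (about 0.82) and counts, while the "Image" prediction, shifted by half its width, only reaches 1/3 and is treated as a false positive.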

Prasang Gupta
Senior Associate, Emerging Technologies

My research interests include distributed robotics, mobile computing and programmable matter.
