Vision Framework

In the previous post, we completed a first version of our app, which can spice up your visits to a zoo by identifying the animals you see. If, like me, you took it for a spin at your local zoo, you might have noticed that the app is not perfect. It can sometimes misidentify animals, and it can also fail to assign a label at the correct level of specificity. For example, it might identify a zebra as a horse, or it might not recognize a Labrador as a “canine” or even a “mammal”. Why does this happen? We will explore the answer to that question in this post.

Over the course of the past few posts (see the overview here), we’ve introduced the ZooScan app and developed its UI using SwiftUI. In this fourth part, we will focus on integrating Apple’s Vision framework to classify animals in the images captured by the app.

Creating a Protocol to Define Image Classifiers #

The first step is defining a protocol for our animal classification model. By using a standardized interface, we can easily switch between different models in the future if needed. Here’s how we can define the protocol:
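For orientation, here is a minimal sketch of what such a protocol could look like. The names `AnimalClassifier` and `ClassificationResult` are illustrative placeholders rather than the post’s actual definitions, and a production implementation would likely make the classification call asynchronous:

```swift
import CoreGraphics

// Illustrative result type: one candidate label with its confidence.
struct ClassificationResult {
    /// A label such as "zebra" or "mammal".
    let label: String
    /// The model's confidence in this label, from 0.0 to 1.0.
    let confidence: Float
}

// Illustrative protocol: any conforming type can serve as the app's
// classification backend, so models can be swapped without UI changes.
protocol AnimalClassifier {
    /// Classifies the given image and returns candidate labels,
    /// ordered from most to least confident.
    func classify(_ image: CGImage) throws -> [ClassificationResult]
}
```

Because the app only depends on the protocol, a Vision-backed model, a Core ML model, or even a stub for unit tests can all conform to it interchangeably.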