ZooScan - Part 4: Using the Swift Vision Framework to Classify Animals
Over the course of the past few posts (see the overview here), we’ve introduced the ZooScan app and developed its UI using SwiftUI. In this fourth part, we will focus on integrating the Swift Vision framework to classify animals based on images captured by the app.
Creating a Protocol to Define Image Classifiers #
The first step is defining a protocol for our animal classification model. By using a standardized interface, we can easily switch between different models in the future if needed. Here’s how we can define the protocol:
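(The exact names below are just a sketch; all the app really needs is a label and a confidence value.)

```swift
import UIKit

/// The result of classifying a single image.
struct ClassificationResult {
    /// A human-readable label, for example "zebra".
    let label: String
    /// The model's confidence in the label, between 0 and 1.
    let confidence: Float
}

/// A common interface for image classification models.
protocol ImageClassificationModel {
    /// Classifies the given image, or returns nil when no classification can be made.
    func classify(image: UIImage) async -> ClassificationResult?
}
```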
In the code snippet above, we define two components. The `ClassificationResult` struct represents the result of an image classification, containing a label and a confidence score. The `ImageClassificationModel` protocol declares a method `classify(image:)`, which takes a `UIImage` and returns an optional `ClassificationResult`. Note that the result of `classify(image:)` is optional. This allows us to return `nil` when it is not possible to make a classification. This provides us with a flexible way to implement different image classification models in the future.
Implementing a concrete model using the Swift Vision framework #
Using this protocol, we can now implement a concrete model that utilizes the Swift Vision framework for image classification. Here’s an example implementation:
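(The snippet below is a sketch of such an implementation; the exact Vision property and method names, the filter thresholds, and the error logging are assumptions based on the Swift Vision API.)

```swift
import UIKit
import Vision

struct VisionClassificationModel: ImageClassificationModel {

    func classify(image: UIImage) async -> ClassificationResult? {
        // Create the classification request and let Vision center-crop and
        // scale the image to the input size expected by the underlying model.
        var request = ClassifyImageRequest()
        request.cropAndScaleAction = .centerCrop

        // Convert the UIImage to PNG data; bail out if the conversion fails.
        guard let imageData = image.pngData() else {
            return nil
        }

        do {
            // Perform the classification and keep only observations that meet
            // a minimum precision/recall threshold (the values are placeholders).
            let results = try await request.perform(on: imageData)
                .filter { $0.hasMinimumPrecision(0.1, forRecall: 0.8) }

            // The first remaining observation is the best classification.
            guard let best = results.first else {
                return nil
            }
            return ClassificationResult(label: best.identifier, confidence: best.confidence)
        } catch {
            print("Error classifying image: \(error)")
            return nil
        }
    }
}
```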
This shows how easy it is to implement a concrete model using the Vision framework. The `VisionClassificationModel` struct conforms to the `ImageClassificationModel` protocol and implements the `classify(image:)` method. Let's break down the key parts of this implementation. The first thing we do is create a `ClassifyImageRequest` instance, which is part of the Vision framework:
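(Repeated from the sketch above; the `cropAndScaleAction` property name is our assumption.)

```swift
// Create the request and have Vision center-crop and scale the input image.
var request = ClassifyImageRequest()
request.cropAndScaleAction = .centerCrop
```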
This request is configured to crop and scale the image to the center, ensuring that the most relevant part of the image is analyzed. This step prepares the image for classification. The underlying image models, most often Convolutional Neural Networks (CNNs), are designed to work with images of a specific size and aspect ratio, often around 224x224 pixels. Most camera images are a lot larger than that. Therefore, the image needs to be resized and cropped to fit the model’s requirements. By also centering the crop, we ensure that the most important part of the image is retained, which is especially useful for animal classification where the subject is typically centered in the frame.
After configuring the request, we need to convert the `UIImage` to PNG data:
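(Repeated from the sketch above.)

```swift
// Encode the UIImage as PNG data; give up if the conversion fails.
guard let imageData = image.pngData() else {
    return nil
}
```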
The `ClassifyImageRequest` can take the image data in various formats. To classify the image, we use an overload of the `perform(on:)` method that accepts image data. The `pngData()` method converts the `UIImage` to PNG format, which is suitable to pass to this method. The `pngData()` method returns an optional `Data` object, so we use a guard statement to ensure that the conversion was successful. If it fails, we return `nil`, indicating that the classification could not be performed.
To perform the classification, we call the `perform(on:)` method of the `ClassifyImageRequest`, passing in the image data:
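(Repeated from the sketch above; in the full method this sits inside a do-catch block, and the threshold values are placeholders.)

```swift
let results = try await request.perform(on: imageData)
    .filter { $0.hasMinimumPrecision(0.1, forRecall: 0.8) }
```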
This method is asynchronous, so we use `await` to wait for the results. The `perform(on:)` method returns an array of classification results, which we then filter to include only classifications meeting a minimum precision and recall threshold. We'll discuss what these terms mean in a future post. For now, just think of it as a way to ensure that we only keep the most relevant classifications. Note that the `perform(on:)` method is also throwing, so we need to wrap the call in a `do-catch` block to handle any errors that may occur during the classification process.
Finally, we pick the first result from the filtered results, which is the best classification based on the confidence score:
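(Repeated from the sketch above.)

```swift
// The first remaining observation is the most confident classification.
guard let best = results.first else {
    return nil
}
return ClassificationResult(label: best.identifier, confidence: best.confidence)
```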
We return a `ClassificationResult` containing the label and confidence score of the best classification. If no results are found, we return `nil`, indicating that the classification was not successful.
Adding the Classification Model to the AnimalStore #
With the `VisionClassificationModel` implemented, we can now integrate it into our app. We will use this model to classify images captured by the user and display the results in the UI. Go to `AnimalStore.swift` and change the top of the `AnimalStore` class to look like this:
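(A sketch; it only assumes that `AnimalStore` is an `@Observable` class holding the `animals` array set up in the earlier parts.)

```swift
import Observation
import UIKit

@Observable
class AnimalStore {
    // Animals scanned so far (ScannedAnimal is defined in the earlier parts).
    var animals: [ScannedAnimal] = []

    // The model used to classify captured images.
    private let classificationModel: ImageClassificationModel = VisionClassificationModel()

    // ... the rest of the class stays unchanged ...
}
```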
Here we create an instance of the `VisionClassificationModel` and store it in a property called `classificationModel`. To classify the images that we capture, change the `addAnimal(image:)` method so that it uses the classification model and creates a `ScannedAnimal` instance from the classification result. The updated method looks like this:
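(A sketch; the `ScannedAnimal` initializer is a guess, so use whatever properties your `ScannedAnimal` from the earlier parts defines.)

```swift
func addAnimal(image: UIImage) {
    // Classification is asynchronous, so wrap the work in a Task.
    Task {
        // Bail out if the model could not classify the image.
        guard let result = await classificationModel.classify(image: image) else {
            return
        }

        // Create a ScannedAnimal from the classification result and store it.
        // (The initializer below is a guess; adapt it to your ScannedAnimal type.)
        let animal = ScannedAnimal(image: image, name: result.label)
        animals.append(animal)
    }
}
```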
The updated method now uses the `classificationModel` to classify the image asynchronously. If the classification is successful, it creates a new `ScannedAnimal` instance with the classification result and adds it to the `animals` array. If the classification fails (i.e., returns `nil`), it simply exits without modifying the array. This way, we ensure that only successfully classified animals are added to the store. All of this code is wrapped in a `Task` so that we can wait for the asynchronous classification to complete before proceeding. With this, we've successfully integrated the Vision framework into our ZooScan app, allowing us to classify animals based on images captured by the user. You can now run the app on a device and test the classification functionality.
Implementing a Dummy Model for the Simulator #
Unfortunately, the code we just wrote only works on a physical device; the Vision framework does not work in the simulator. When you run the app in the simulator and try to classify an image, you will get an error that looks something like this:
Error classifying image: internalError("Error Domain=NSOSStatusErrorDomain Code=-1 \"Failed to create espresso context.\" UserInfo={NSLocalizedDescription=Failed to create espresso context.}")
To keep the app testable in the simulator, we need to work around this limitation. We can implement a dummy classification model that simulates the behavior of an image classifier. This dummy model returns classifications picked from a fixed array of four animal names (giraffe, penguin, watusi, zebra), cycling through them in order and starting over when it reaches the end of the array. This way, we can test the app in the simulator without running into errors. Here's how we can implement this dummy model:
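(A sketch of such a dummy model; the fixed confidence value is made up.)

```swift
import UIKit

/// A stand-in classifier for the simulator that cycles through a fixed list of animals.
final class DummyClassificationModel: ImageClassificationModel {
    private let labels = ["giraffe", "penguin", "watusi", "zebra"]
    private var nextIndex = 0

    func classify(image: UIImage) async -> ClassificationResult? {
        // Pick the next label in order, wrapping around at the end of the array.
        let label = labels[nextIndex]
        nextIndex = (nextIndex + 1) % labels.count
        // Return a fixed, obviously fake confidence score.
        return ClassificationResult(label: label, confidence: 1.0)
    }
}
```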
The last thing we need to do is add the dummy model to the `AnimalStore` class. We can use a conditional compilation directive to check whether we are compiling for the simulator or for a physical device. If we compile for the simulator, we use the `DummyClassificationModel`; otherwise, we use the `VisionClassificationModel`:
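(A sketch, replacing the `classificationModel` property we added earlier.)

```swift
// In AnimalStore: pick the model based on the compilation target.
#if targetEnvironment(simulator)
private let classificationModel: ImageClassificationModel = DummyClassificationModel()
#else
private let classificationModel: ImageClassificationModel = VisionClassificationModel()
#endif
```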
We’ve now finished a basic version of the ZooScan app that we can use in the simulator and on a physical device. You can see that Apple really made it simple for us to add complex functionality to our apps. The image classification only took us a few lines of code to add! Now it’s time to test the app. Take the app for a spin during your next visit to the zoo! Hope you enjoy!
Conclusion #
In this post, we have successfully integrated the Swift Vision framework into our ZooScan app to classify animals based on images captured by the user. We defined a protocol for image classification models, implemented a concrete model using the Vision framework, and integrated it into our app. Additionally, we created a dummy model to allow testing in the simulator. In the next post, we will have a look at how the app performs in real life and look at the type of model that the Vision framework uses under the hood. Stay tuned for more! 😃