ZooScan - Part 4: Using the Swift Vision Framework to Classify Animals
Over the course of the past few posts (see the overview here), we’ve introduced the ZooScan app and developed its UI using SwiftUI. In this fourth part, we will focus on integrating the Swift Vision framework to classify animals based on images captured by the app.
Creating a Protocol to Define Image Classifiers #
The first step is defining a protocol for our animal classification model. By using a standardized interface, we can easily switch between different models in the future if needed. Here’s how we can define the protocol:
import Foundation
import UIKit

struct ClassificationResult: Equatable, Hashable {
    let label: String
    let confidence: Double
}

protocol ImageClassificationModel {
    func classify(image: UIImage) async -> ClassificationResult?
}
In the code snippet above, we define two components. The ClassificationResult struct represents the result of an image classification, containing a label and a confidence score. The ImageClassificationModel protocol declares a single method, classify(image:), which takes a UIImage and returns an optional ClassificationResult. The result is optional so that we can return nil when no classification could be made. This gives us a flexible way to swap in different image classification models in the future.
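To see what calling such a model looks like from the call site, here is a minimal usage sketch; the function and parameter names are just for illustration:

// A minimal usage sketch: `model` can be any type conforming to
// ImageClassificationModel, and `capturedImage` is a UIImage you already
// have. Both names are illustrative.
func describeClassification(of capturedImage: UIImage, using model: any ImageClassificationModel) async {
    if let result = await model.classify(image: capturedImage) {
        print("Classified as \(result.label) (confidence: \(result.confidence))")
    } else {
        print("No classification could be made")
    }
}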
Implementing a concrete model using the Swift Vision framework #
Using this protocol, we can now implement a concrete model that utilizes the Swift Vision framework for image classification. Here’s an example implementation:
import UIKit
import Vision

struct VisionClassificationModel: ImageClassificationModel {
    func classify(image: UIImage) async -> ClassificationResult? {
        var request = ClassifyImageRequest()
        request.cropAndScaleAction = .centerCrop

        guard let data = image.pngData() else {
            return nil
        }

        do {
            let results = try await request.perform(on: data)
                .filter { $0.hasMinimumPrecision(0.1, forRecall: 0.8) }
            guard let bestResult = results.first else {
                return nil
            }
            return ClassificationResult(label: bestResult.identifier, confidence: Double(bestResult.confidence))
        } catch {
            print("Error classifying image: \(error)")
        }
        return nil
    }
}
This shows how easy it is to implement a concrete model using the Vision framework. The VisionClassificationModel struct conforms to the ImageClassificationModel protocol and implements the classify(image:) method. Let’s break down the key parts of this implementation. The first thing we do is create a ClassifyImageRequest instance, which is part of the Vision framework:
var request = ClassifyImageRequest()
request.cropAndScaleAction = .centerCrop
The request is configured to center-crop and scale the image, ensuring that the most relevant part of the image is analyzed. This step prepares the image for classification. The underlying image models, most often convolutional neural networks (CNNs), are designed to work with images of a specific size and aspect ratio, often around 224x224 pixels. Most camera images are a lot larger than that, so the image needs to be resized and cropped to fit the model’s requirements. By centering the crop, we make sure the most important part of the image is retained, which works well for animal classification, where the subject is typically centered in the frame.
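We don’t have to implement any of this cropping ourselves, since the request handles it for us. But to make the idea more concrete, here is a rough sketch of what a centered square crop boils down to, assuming the UIImage has a backing CGImage:

// Illustration only (not used by the app): trim a UIImage to a centered
// square by cropping its underlying CGImage in pixel space.
func centerCroppedSquare(of image: UIImage) -> UIImage? {
    guard let cgImage = image.cgImage else { return nil }
    let side = min(cgImage.width, cgImage.height)
    let cropRect = CGRect(
        x: (cgImage.width - side) / 2,
        y: (cgImage.height - side) / 2,
        width: side,
        height: side
    )
    guard let cropped = cgImage.cropping(to: cropRect) else { return nil }
    return UIImage(cgImage: cropped, scale: image.scale, orientation: image.imageOrientation)
}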
After configuring the request, we need to convert the UIImage to PNG data:
guard let data = image.pngData() else {
    return nil
}
The ClassifyImageRequest can take the image data in various formats. To classify the image, we use an overload of the perform(on:) method that accepts image data. The pngData() method converts the UIImage to PNG format, which is suitable to pass to this method. Because pngData() returns an optional Data object, we use a guard statement to make sure the conversion succeeded. If it fails, we return nil, indicating that the classification could not be performed.
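As a side note, PNG encoding of a full camera photo can produce fairly large data. A sketch of an alternative is to use JPEG encoding instead, which is typically much smaller; this assumes that perform(on:) also accepts JPEG-encoded data, which is worth verifying on your target OS version:

// Possible alternative (assumption: the request also accepts JPEG data):
guard let data = image.jpegData(compressionQuality: 0.9) else {
    return nil
}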
To perform the classification, we call the perform(on:) method of the ClassifyImageRequest, passing in the image data:
let results = try await request.perform(on: data)
    .filter { $0.hasMinimumPrecision(0.1, forRecall: 0.8) }
This method is asynchronous, so we use await to wait for the results. The perform(on:) method returns an array of classification results, which we then filter to keep only classifications that meet a minimum precision and recall threshold. We’ll discuss what these terms mean in a future post; for now, think of the filter as a way to keep only the most relevant classifications. Note that perform(on:) is also a throwing method, so we wrap the call in a do-catch block to handle any errors that may occur during the classification process.
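If you want to experiment with these thresholds, a quick debugging sketch placed right after the filter call can help you see what gets through:

// Debugging sketch: log every observation that survives the filter,
// so you can tune the precision/recall thresholds by eye.
for observation in results {
    print("\(observation.identifier): \(observation.confidence)")
}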
Finally, we pick the first result from the filtered results, which is the best classification based on the confidence score:
guard let bestResult = results.first else {
    return nil
}
return ClassificationResult(label: bestResult.identifier, confidence: Double(bestResult.confidence))
We return a ClassificationResult containing the label and confidence score of the best classification. If no results are found, we return nil, indicating that the classification was not successful.
Adding the Classification Model to the AnimalStore #
With the VisionClassificationModel implemented, we can now integrate it into our app. We will use this model to classify images captured by the user and display the results in the UI. Go to AnimalStore.swift and change the top of the AnimalStore class to look like this:
import Foundation
import UIKit

@Observable
class AnimalStore {
    var animals: [ScannedAnimal] = []
    let classificationModel = VisionClassificationModel()
    ...
}
Here we create an instance of the VisionClassificationModel and store it in a property called classificationModel. To classify the images we capture, change the addAnimal(image:) method so that it runs the image through the classification model and creates a ScannedAnimal instance from the result. The updated method looks like this:
func addAnimal(image: UIImage) {
    Task {
        //let animal = ScannedAnimal(image: image, classification: "Unknown", confidence: 0, isLoved: false)
        guard let animalClassification = await classificationModel.classify(image: image) else {
            return
        }

        let description = descriptions[animalClassification.label] ?? "Description not available."
        let animal = ScannedAnimal(
            image: image,
            classification: animalClassification,
            description: description,
            isLoved: false
        )
        animals.insert(animal, at: 0)
    }
}
The updated method uses the classificationModel to classify the image asynchronously. If the classification succeeds, it creates a new ScannedAnimal instance with the classification result and inserts it at the front of the animals array. If the classification fails (i.e., returns nil), the method simply exits without modifying the array, so only successfully classified animals end up in the store. The work is wrapped in a Task because addAnimal(image:) itself is synchronous and we need an asynchronous context in which to await the classification. With this, we’ve successfully integrated the Vision framework into our ZooScan app, allowing us to classify animals based on images captured by the user. You can now run the app on a physical device and test the classification functionality.
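For reference, the ScannedAnimal model was introduced in the earlier posts of this series. Inferred purely from the initializer we just used, it looks roughly like the sketch below; the Identifiable conformance and the id property are assumptions, and the real definition may differ:

// Rough sketch of ScannedAnimal, inferred from the initializer above.
// The actual definition lives in an earlier post and may differ.
struct ScannedAnimal: Identifiable {
    let id = UUID()                          // assumption: handy for SwiftUI lists
    let image: UIImage
    let classification: ClassificationResult
    let description: String
    var isLoved: Bool
}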
Implementing a Dummy Model for the Simulator #
Unfortunately, the code we just implemented only works on a physical device; image classification with the Vision framework does not work in the simulator. When you run the app in the simulator and try to classify an image, you will get an error that looks something like this:
Error classifying image: internalError("Error Domain=NSOSStatusErrorDomain Code=-1 \"Failed to create espresso context.\" UserInfo={NSLocalizedDescription=Failed to create espresso context.}")
To keep the app testable in the simulator, we need to work around this limitation. We can implement a dummy classification model that simulates the behavior of an image classifier. The dummy model cycles through a fixed array of four animal names (giraffe, penguin, watusi, zebra), returning them in order and starting over when it reaches the end of the array. This way, we can test the app in the simulator without running into errors. Here’s how we can implement this dummy model:
import Foundation
import UIKit

class DummyClassificationModel: ImageClassificationModel {
    private let animals = ["giraffe", "penguin", "watusi", "zebra"]
    private var currentIndex = 0

    func classify(image: UIImage) async -> ClassificationResult? {
        let classification = ClassificationResult(
            label: animals[currentIndex],
            confidence: Double.random(in: 0.7...1.0)
        )
        // Move to the next index, wrapping back to 0 at the end of the array
        currentIndex = (currentIndex + 1) % animals.count
        return classification
    }
}
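To convince yourself of the cycling behavior, a small sketch like this (run from a unit test or any other async context) shows the labels coming back in order:

// The UIImage argument is ignored by the dummy model; any image will do.
let dummy = DummyClassificationModel()
let first = await dummy.classify(image: UIImage())   // label "giraffe"
let second = await dummy.classify(image: UIImage())  // label "penguin"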
The last thing we need to do is add the dummy model to the AnimalStore class. We can use a conditional compilation directive to check whether we are compiling for the simulator or for a physical device. If we are compiling for the simulator, we use the DummyClassificationModel; otherwise, we use the VisionClassificationModel:
#if targetEnvironment(simulator)
let classificationModel = DummyClassificationModel()
#else
let classificationModel = VisionClassificationModel()
#endif
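As a small design note: because both models conform to ImageClassificationModel, you could also type the property as the protocol, so that the rest of AnimalStore never depends on a concrete model. A sketch of that variation:

// Sketch of an alternative: store the model behind the protocol type.
#if targetEnvironment(simulator)
let classificationModel: any ImageClassificationModel = DummyClassificationModel()
#else
let classificationModel: any ImageClassificationModel = VisionClassificationModel()
#endif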
We’ve now finished a basic version of the ZooScan app that we can use in the simulator and on a physical device. You can see that Apple really made it simple for us to add complex functionality to our apps. The image classification only took us a few lines of code to add! Now it’s time to test the app. Take the app for a spin during your next visit to the zoo! Hope you enjoy!
Conclusion #
In this post, we have successfully integrated the Swift Vision framework into our ZooScan app to classify animals based on images captured by the user. We defined a protocol for image classification models, implemented a concrete model using the Vision framework, and integrated it into our app. Additionally, we created a dummy model to allow testing in the simulator. In the next post, we will look at how the app performs in real life and examine the type of model that the Vision framework uses under the hood. Stay tuned for more! 😃