Shopify is the leading omni-channel commerce platform. Merchants use Shopify to design, set up, and manage their stores across multiple sales channels, including mobile, web, social media, marketplaces, brick-and-mortar locations, and pop-up shops. The Shopify platform was engineered for reliability and scale, making enterprise-level technology available to businesses of all sizes. Headquartered in Ottawa, Canada, Shopify currently powers over 600,000 businesses in approximately 175 countries.
Shopify has a growing number of software developers working on mobile apps such as Shopify, Shopify POS and Frenzy. As a result, their demand for a scalable and stable build system has increased. Shopify's Developer Acceleration team decided to invest in creating a single unified build system for all continuous integration and delivery (CI/CD) pipelines across Shopify, which includes support for Android and iOS. They want their developers to build and test code in a reliable way, as often as they want. Having a CI system that makes this effortless results in delivering new features quickly and with confidence, without sacrificing the stability of Shopify products.
Below is a summary of how Shopify built their iOS CI system on MacStadium infrastructure. Shopify's full case study covers the details.
The Shopify team has built their own CI system, which they call Shopify Build. It’s based on Buildkite, and they run it on their own infrastructure. They've deployed their own version of the job bootstrap script that sets up the CI environment, rather than the one that ships with Buildkite. This allows Shopify to accomplish the following goals:
1. Provide a standard way to define general purpose build pipelines
2. Ensure the build environment integrates well with Shopify's other developer tools and is consistent with the production environment
3. Ensure builds are resilient against infrastructure failures and flakiness of third-party dependencies
4. Provide disposable build environments so that subsequent jobs can’t interfere with each other
5. Support selective builds for monorepos, or repositories with multiple projects in them
Initially, Shopify Build supported only Linux environments, using Docker to provide disposable environments, an approach that works extremely well for backend and Android projects. The team previously ran separate CI systems for iOS projects, but wanted to give their iOS developers the same benefits as their other developers by integrating iOS into Shopify Build.
Building infrastructure for iOS comes with its own unique set of challenges. It’s the only piece of infrastructure at Shopify that doesn’t run on top of Linux. For their Android build nodes, the Shopify team can leverage the same Google Cloud infrastructure that they already use in production. Unfortunately, at the time, cloud providers such as Amazon Web Services (AWS) and Google Cloud Platform (GCP) didn’t provide infrastructure that could run macOS. The only feasible option for Shopify was to use a non-cloud provider like MacStadium, with the tradeoff that they couldn't auto-scale the infrastructure based on demand.
In 2018, the Shopify team decided to implement a new virtualization technology called Anka from Veertu. Anka provides a Docker-like command line interface for spinning up lightweight macOS virtual machines, built on top of Apple’s Hypervisor.framework. Anka has the concept of a container registry similar to Docker with push and pull functionality, fast boot times, and easy provisioning provided through a command line interface. With Anka, Shopify can quickly provision a virtual machine with the preferred macOS version, disk, memory, CPU configuration and Xcode version.
Shopify's original VMware-based setup ran a small cluster of 12-core Mac Pros with MacStadium. The Mac Pros provided high bandwidth to the shared storage and ran multiple VMs in parallel. For that reason, they were the only viable choice for a SAN-based setup. However, Anka runs on local storage, and therefore it doesn’t require a SAN.
After further experimentation, the Shopify team realized a cluster of Core i7 Mac minis would be a better fit to run with Anka. They are more cost-effective than Mac Pros while providing the same or higher per-core CPU performance. For the price of a single Mac Pro, they could run about 6 Mac minis. Mac minis don’t provide 10 Gbit networking, but that isn’t a deal breaker in Shopify's Anka setup as they no longer need a SAN. They are running only one Anka VM per Mac mini, giving them four cores and up to 16 GB memory per build node. Running a single VM also avoids the performance degradation that the Shopify team found when running multiple VMs on the same host, as they need to share resources.
Shopify uses a separate Mac mini as a controller node that provisions an Anka VM with all dependencies, such as Xcode, iOS simulators, and Ruby. The command anka create generates the base macOS image in about 30 minutes and needs only a macOS installer (.app) from the Mac App Store as input. Anka’s VM image management optimizes disk space usage and data transfer times when pushing and pulling the VMs on the Mac nodes. After the provisioning completes, the controller node suspends the VM and pushes it to the Anka registries. The final step of the image distribution is a parallel pull on the Mac minis, each pulling only the new layers from the images available in its respective Anka Registry to speed up the process.
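The parallel pull step can be illustrated with a minimal Python sketch. It is not Shopify's implementation: the article does not say how pulls are triggered on each Mac mini, so the ssh transport, the exact arguments passed to the Anka registry pull, the host names, and the helper names pull_over_ssh and pull_on_all_hosts are all assumptions made for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable

# Hypothetical signature: pull one image on one host and report success.
PullFn = Callable[[str, str], bool]


def pull_over_ssh(host: str, image: str) -> bool:
    """Assumed transport: trigger the registry pull on the host via ssh.

    The ssh call and the exact `anka registry pull` arguments are
    assumptions; the source only says each Mac mini pulls new layers.
    """
    result = subprocess.run(["ssh", host, "anka", "registry", "pull", image])
    return result.returncode == 0


def pull_on_all_hosts(hosts: Iterable[str], image: str,
                      pull: PullFn = pull_over_ssh,
                      max_workers: int = 8) -> Dict[str, bool]:
    """Pull `image` on every host concurrently and map host -> success."""
    host_list = list(hosts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with host_list.
        outcomes = list(pool.map(lambda h: pull(h, image), host_list))
    return dict(zip(host_list, outcomes))
```

Passing the pull function as a parameter keeps the fan-out logic separate from the transport, so a host that fails to pull shows up as False in the result rather than stopping the other pulls.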
The team uses Buildkite as the scheduler and front-end for CI at Shopify. It allows for fine-grained customization of pipelines and build scripts, which makes it a good fit for their needs. They run a single Buildkite Agent on each Mac mini and keep their git repositories cached on each of the hosts, for a fast git checkout. As part of running a build, a sequence of Anka commands is invoked. First, the base image is cloned to a temporary snapshot using anka clone. Next, the VM is started and, once it has booted, volumes are mounted to expose artifacts. With anka run, Shopify executes the command corresponding to the Buildkite step and waits for it to finish. Finally, artifacts are uploaded to cloud storage and the Anka VM is deleted with anka delete.
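The per-job sequence described above can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not Shopify's bootstrap script: the subcommands clone, run, and delete are named in the article, but the explicit start call, the --yes flag on delete, the argument order, and the generated VM name are assumptions. Volume mounting and artifact upload are left out because the article gives no details on them.

```python
import subprocess
import uuid
from typing import Callable, Optional, Sequence

# A runner executes one command line and returns its exit code.
Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(argv)).returncode


def run_step_in_disposable_vm(base_image: str,
                              step_command: Sequence[str],
                              runner: Runner = run_command,
                              vm_name: Optional[str] = None) -> int:
    """Clone the base image, run one build step in the clone, then delete it.

    Exact Anka arguments below are assumptions made for illustration.
    """
    vm = vm_name or f"build-{uuid.uuid4().hex[:8]}"  # hypothetical naming
    if runner(["anka", "clone", base_image, vm]) != 0:
        raise RuntimeError(f"could not clone {base_image}")
    try:
        if runner(["anka", "start", vm]) != 0:
            raise RuntimeError(f"could not start {vm}")
        # The exit code of the build step becomes the job result.
        return runner(["anka", "run", vm, *step_command])
    finally:
        # Always discard the clone so later jobs start from a clean image.
        runner(["anka", "delete", "--yes", vm])
```

Putting the delete in a finally block is what makes the environment disposable in this sketch: the clone is removed whether the build step passes, fails, or the VM never starts.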
Shopify monitors the demand for build nodes and works with MacStadium to scale the number of Mac minis in both data centers. Their workload is quite spiky, with high load exceeding capacity at moments during the day. During those moments, their queue time will increase. The Shopify team expects to add more Mac minis to their cluster as they grow their developer teams to keep queue times under control.
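A rough way to reason about how many build nodes such a workload needs is to multiply the job arrival rate by the average job duration. The Python sketch below does this under stated assumptions. The article reports about 350 iOS build jobs per hour and one build node per Mac mini, but it gives no average job duration, so the 10-minute figure in the usage example is purely hypothetical, as are the function name nodes_needed and the utilization target.

```python
import math


def nodes_needed(jobs_per_hour: float, avg_job_minutes: float,
                 target_utilization: float = 1.0) -> int:
    """Estimate build nodes required to keep up with a steady job rate.

    busy nodes = arrival rate x average duration; dividing by a target
    utilization below 1.0 leaves headroom for spikes.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    busy_nodes = jobs_per_hour * avg_job_minutes / 60.0
    return math.ceil(busy_nodes / target_utilization)


# 350 jobs/hour is from the article; 10 minutes per job is hypothetical.
print(nodes_needed(350, 10))       # nodes kept fully busy on average
print(nodes_needed(350, 10, 0.8))  # same load with 20% headroom
```

Because the workload is spiky, an average-based estimate like this is only a lower bound: queue times still grow whenever the momentary arrival rate exceeds what the cluster can absorb.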
It took Shopify about four months to implement the new infrastructure on top of Anka with a small team. Building your own CI system requires an investment in engineering time and infrastructure, and at Shopify, they believe it’s worth it for companies that plan to scale while continuing to iterate at a high pace on their iOS apps.
By using Anka, Shopify substantially improved the maintainability and scalability of their iOS build infrastructure, and the team recommends it to anyone looking for macOS virtualization in a Docker-like fashion. During the day, their team of about 60 iOS developers runs about 350 iOS build jobs per hour. Anka’s fast boot times reduce the setup time of a build step. Upgrading to new versions of macOS and Xcode is easier than before. They have eliminated shared storage as a single point of failure, thereby increasing the reliability of their CI system. Without shared storage, the system is also horizontally scalable, so they can easily scale with the growth of their engineering team. Finally, the system is easier for their developers to use because it is part of Shopify Build and shares the same interface they use for CI across Shopify.