In our Outlook for 2022, we posed the question of whether data clouds – or cloud computing in general – get easier this year. Our question was directed at the bewildering array of cloud services. There’s lots of choice for the customer, but could too much choice be too much of a good thing?
There’s another side of the equation: picking your cloud computing footprint. Serverless is supposed to address that. You subscribe to the service, and the cloud (or service) provider then autoscales the cluster based on the default instance types for the service. But a startup that just won seed financing makes the case that serverless is more about convenience than efficiency.
Sync Computing has just emerged from stealth with $6.1 million in seed financing and is now offering a cloud-based Autotuner service that introspects the logs of your Spark workloads and recommends the optimal instance footprint. Sync Computing chose Spark because it is popular and therefore a logical first target.
Let’s get more specific. Autotuner factors in the cloud on which the Spark workloads have been running, taking into account the types of compute instances available and the relevant pricing deals.
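To make the idea concrete, here is a toy sketch of the kind of calculation involved. It is not Sync’s algorithm, which the company hasn’t published. It sums the "Executor Run Time" task metric from a Spark event log (one JSON event per line) as a rough measure of total work, then prices that work across candidate instance types using an assumed Amdahl-style runtime model. The instance names, prices, serial-time figure and scaling model are all hypothetical placeholders.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str              # hypothetical label, not a real cloud instance type
    vcpus: int
    memory_gib: float
    price_per_hour: float  # placeholder USD price, not a quoted rate


def executor_run_hours(event_log_lines):
    """Sum the 'Executor Run Time' task metric (milliseconds) over task-end events."""
    total_ms = 0
    for line in event_log_lines:
        event = json.loads(line)
        if event.get("Event") == "SparkListenerTaskEnd":
            metrics = event.get("Task Metrics") or {}
            total_ms += metrics.get("Executor Run Time", 0)
    return total_ms / 3_600_000  # milliseconds -> hours of task time


def estimate(work_hours, serial_hours, itype, nodes):
    """Assumed Amdahl-style model: a fixed serial part plus work spread over all vCPUs.

    work_hours is task time, treated as vCPU-hours (assumes one task per vCPU).
    """
    runtime = serial_hours + work_hours / (nodes * itype.vcpus)
    cost = runtime * nodes * itype.price_per_hour
    return runtime, cost


def cheapest(work_hours, serial_hours, min_memory_gib, catalog, max_nodes=32):
    """Return (name, nodes, runtime_hours, cost) for the cheapest config that fits in memory."""
    best = None
    for itype in catalog:
        for nodes in range(1, max_nodes + 1):
            if nodes * itype.memory_gib < min_memory_gib:
                continue  # cluster too small to hold the assumed working set
            runtime, cost = estimate(work_hours, serial_hours, itype, nodes)
            if best is None or cost < best[3]:
                best = (itype.name, nodes, runtime, cost)
    return best


catalog = [
    InstanceType("general-4x16", vcpus=4, memory_gib=16, price_per_hour=0.20),
    InstanceType("memory-4x32", vcpus=4, memory_gib=32, price_per_hour=0.26),
]
print(cheapest(work_hours=8, serial_hours=0.1, min_memory_gib=64, catalog=catalog))
```

Under this simple model, adding nodes shortens runtime but adds to the bill, so the cheapest pick is the smallest cluster that fits in memory. A real tuner has to capture effects such as shuffle, spill and data skew that this model ignores.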
The natural question to ask is: doesn’t serverless compute already address this issue by letting the cloud service provider run the autoscaling? The answer is, of course, quite subjective. According to CEO and cofounder Jeff Chou, serverless is more about convenience than efficiency.
But another part of the answer is objective: not all cloud computing services are serverless, and Spark, Sync’s initial target, is in most cases currently offered only as a provisioned service. A few months back, Google Cloud launched serverless Spark, while Microsoft introduced serverless SQL pools for Azure Synapse (which allow queries against external Spark tables), and Databricks offers a serverless option in public preview.
We’ve railed about the issue of juggling cloud compute instances in the past. For instance, when we last counted a few years back, AWS had five categories of instances, 16 instance families, and 44 instance types, and we’re sure those numbers are larger now. A couple of years ago, AWS launched Compute Optimizer, which uses machine learning to identify workload patterns and suggest configurations. We haven’t come across similar offerings for other clouds, at least not yet.
There’s an interesting back story to how Sync came up with Autotuner. It was an outgrowth of applying the Ising model to optimizing the design of circuitry on a chip. The Ising model looks at the phase changes that occur within a system, which can apply to anything having to do with changing state: the thermal state, the phase change of a material, or the changes that occur at various stages of a computation. And that’s where optimization of the cloud compute footprint comes in, for a specific problem: in this case, Spark compute runs.
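Sync hasn’t described how it formulates its problems, but the general recipe for Ising-style optimization is well established: encode each yes/no decision as a spin s_i ∈ {-1, +1}, express pairwise interactions as couplings J_ij and individual preferences as fields h_i, and search for the configuration with the lowest energy E(s) = -Σ J_ij s_i s_j - Σ h_i s_i. The sketch below uses exhaustive search for tiny instances and simulated annealing, a common general-purpose heuristic swapped in here for illustration, for larger ones. Neither is presented as Sync’s method, and the example couplings are made up.

```python
import itertools
import math
import random


def energy(spins, J, h):
    """Ising energy E(s) = -sum_{i<j} J[i][j]*s_i*s_j - sum_i h[i]*s_i, with s_i in {-1, +1}."""
    n = len(spins)
    pair = sum(J[i][j] * spins[i] * spins[j] for i in range(n) for j in range(i + 1, n))
    field = sum(h[i] * spins[i] for i in range(n))
    return -pair - field


def ground_state_brute_force(J, h):
    """Exhaustive search over all 2^n spin configurations (only viable for tiny n)."""
    n = len(h)
    best = min(itertools.product((-1, 1), repeat=n), key=lambda s: energy(s, J, h))
    return list(best), energy(best, J, h)


def anneal(J, h, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Simulated annealing: flip one spin at a time, accept uphill moves with prob exp(-dE/T)."""
    rng = random.Random(seed)
    n = len(h)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    current = energy(spins, J, h)
    best, best_e = spins[:], current
    for step in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.randrange(n)
        spins[i] = -spins[i]
        proposed = energy(spins, J, h)
        delta = proposed - current
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = proposed
            if current < best_e:
                best, best_e = spins[:], current
        else:
            spins[i] = -spins[i]  # reject: undo the flip
    return best, best_e


J = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # ferromagnetic couplings: aligned spins lower the energy
h = [0.5, 0.0, 0.0]                     # small field favoring spin 0 = +1
print(ground_state_brute_force(J, h))   # ([1, 1, 1], -3.5)
```

Mapping a real problem such as instance selection onto J and h is the hard part, and it is exactly the step this sketch leaves out.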
With the company coming out of stealth, its offerings are a work in progress. The basic pieces of Autotuner are in place: a customer can submit logs of its previous Spark compute runs, and the algorithm will perform optimizations, offering a choice of options (optimize for cost or optimize for performance); then it is up to the customer to go back and apply the recommendation. In many ways, it is akin to classic query optimization for SQL. It currently supports EMR and Databricks on AWS. A reference customer, Duolingo, was able to cut its job cluster size by 4x and its job costs in half.
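Offering a choice between optimizing for cost and optimizing for performance amounts to picking a point on a trade-off curve. The sketch below, built on made-up candidate configurations with predicted runtimes and costs, first drops any configuration that another one beats on both counts (the Pareto front), then returns the cheapest option within an optional deadline or the fastest within an optional budget. The Candidate type and recommend() function are hypothetical illustrations, not Autotuner’s interface.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    label: str            # hypothetical cluster description
    runtime_hours: float  # predicted, not measured
    cost_usd: float       # predicted, not measured


def pareto_front(candidates):
    """Keep configurations that no other configuration beats on both runtime and cost."""
    front = []
    for c in candidates:
        dominated = any(
            o.runtime_hours <= c.runtime_hours
            and o.cost_usd <= c.cost_usd
            and (o.runtime_hours < c.runtime_hours or o.cost_usd < c.cost_usd)
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c.cost_usd)


def recommend(candidates, objective, max_cost=None, max_runtime=None):
    """'cost': cheapest within max_runtime; 'performance': fastest within max_cost."""
    front = pareto_front(candidates)
    if objective == "cost":
        ok = [c for c in front if max_runtime is None or c.runtime_hours <= max_runtime]
        return min(ok, key=lambda c: c.cost_usd, default=None)
    if objective == "performance":
        ok = [c for c in front if max_cost is None or c.cost_usd <= max_cost]
        return min(ok, key=lambda c: c.runtime_hours, default=None)
    raise ValueError("objective must be 'cost' or 'performance'")


candidates = [
    Candidate("4 x general", 0.60, 0.48),
    Candidate("8 x general", 0.35, 0.56),
    Candidate("16 x general", 0.225, 0.72),
    Candidate("2 x memory", 1.10, 0.572),  # slower and pricier than "4 x general"
]
print(recommend(candidates, "cost").label)
print(recommend(candidates, "performance", max_cost=0.6).label)
```

Filtering to the Pareto front first means neither objective can ever return a configuration that is both slower and more expensive than some alternative.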
Going forward, Sync Computing intends to upgrade Autotuner into an API that can work automatically: based on customer preferences, it would resize the cluster on its own. It then intends to extend this to job scheduling and orchestration. Just as there are optimizations for compute instances, there are optimizations for scheduling a series of jobs, such as chaining together jobs that require the same compute footprint.
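The job-chaining idea can be sketched as a simple grouping step, assuming the jobs are independent (no job waits on another’s output) and that each job’s required footprint is already known. Grouping jobs by footprint means each cluster shape is provisioned once instead of being torn down and rebuilt between jobs. The job names, footprints and helper functions are hypothetical.

```python
def chain_by_footprint(jobs):
    """Group independent jobs by required footprint, in order of first appearance.

    jobs: list of (job_name, footprint) pairs; footprint is any hashable value,
    e.g. (instance_type, node_count).
    """
    groups = {}  # dicts preserve insertion order (Python 3.7+)
    for name, footprint in jobs:
        groups.setdefault(footprint, []).append(name)
    return list(groups.items())


def ordered_jobs(groups):
    """Flatten grouped jobs back into a run order of (job_name, footprint) pairs."""
    return [(name, footprint) for footprint, names in groups for name in names]


def reconfigurations(sequence):
    """Count footprint changes in a run order (the initial provisioning counts as one)."""
    changes = 0
    previous = None
    for _, footprint in sequence:
        if footprint != previous:
            changes += 1
            previous = footprint
    return changes


general = ("general-4x16", 4)  # hypothetical footprints
memory = ("memory-4x32", 2)
jobs = [("ingest", general), ("report", memory), ("clean", general), ("score", memory)]

chained = chain_by_footprint(jobs)
print(reconfigurations(jobs), "->", reconfigurations(ordered_jobs(chained)))
```

Real orchestration would also have to respect job dependencies and deadlines, which is where the scheduling problem becomes genuinely hard.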
Of course, with anything related to data, compute is not the only variable; the form of storage also factors in. But at this point, Sync Computing is targeting compute. And for now it is targeting Spark compute jobs on AWS, but there is no reason that the approach couldn’t be extended to Azure or Google Cloud, or applied to other compute engines, such as those used for neural networks, deep learning, or HPC. It’s a start.