January 10, 2024
In November 2023, we founded Argmax to empower developers and enterprises who are eager to deploy commercial-scale inference workloads on user devices. To turbocharge our ambitious roadmap, we raised capital from General Catalyst and industry leaders. After the launch of our first open-source project, with several more in the pipeline, we are now hiring and starting to work with early customers. Read on for details or drop us a note!
Our market research and survey of industry leaders convinced us that on-device deployment is highly desirable compared to server-side in many production settings for reasons such as:
To be clear, we do not claim that all inference workloads will move on-device; rather, we intend to transform the market so that most do. Today, the priority of on-device inference varies widely across market segments, so we are working first with customers for whom it is business critical. In the meantime, we are building tools (open-source) and conducting research (open science) to drive broader adoption across the remaining market segments within the next two years.
Our founding team spent the last six years at Apple building on-device inference algorithms and software with industry-leading performance. Notable recent projects include Transformers for the Apple Neural Engine, Fastest Stable Diffusion on iPhone, and Mixed-bit Model Compression. We are also core contributors to the private inference engine behind Core ML.
We have identified the “showstoppers” that are holding back on-device deployment from becoming the industry standard for most inference workloads and we are tackling them one by one:
If you think we missed an important one, we are eager to hear from you!
A common cause underlying these showstoppers is model size. Applying yesterday's compression techniques to today's large foundation models created the perception that user devices are simply not yet capable of executing them in a production setting. Argmax conducts applied and fundamental compression research to invent and deliver the next generation of model compression techniques and break this perception.
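To make the compression discussion concrete: one of the basic primitives that model compression builds on is low-bit weight quantization. The sketch below is a generic illustration in plain Python, not Argmax's method; the function names and the symmetric per-tensor scheme are assumptions chosen for simplicity.

```python
# Minimal sketch of symmetric 8-bit linear quantization (a generic
# illustration, not Argmax's technique). Weights are mapped to signed
# integers via a single per-tensor scale, then mapped back.

def quantize(weights, bits=8):
    """Map floats to signed integers with a per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 127 for 8 bits
    scale = max(abs(w) for w in weights) / qmax or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate floats from the integers."""
    return [q * scale for q in quantized]

weights = [0.12, -0.53, 0.97, -0.08]
q, scale = quantize(weights)
approx = dequantize(q, scale)

# Each reconstructed weight lies within half a quantization step
# of the original value.
max_err = max(abs(a - b) for a, b in zip(weights, approx))
assert max_err <= scale / 2
```

Storing 8-bit integers instead of 32-bit floats cuts weight memory roughly 4x; the research challenge this sketch glosses over is pushing below 8 bits (and mixing bit widths per layer) without degrading model quality.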
We benefited immensely from open-source projects such as PyTorch, coremltools, transformers, diffusers and so on. In turn, we bolstered projects such as Mochi Diffusion and whisper.cpp.
We are eager to sustain this virtuous cycle between open-source projects towards running state-of-the-art foundation models on user devices.
Towards that end, we are committed to open-sourcing most of our core R&D output while building our business on licensing bleeding-edge inference performance products, customer-requested features and customer-level quality-of-inference SLAs.
We announced WhisperKit, our first project, two months after founding Argmax. It is a collection of tools and libraries optimized for real-time performance and extensibility, built to deploy billion-parameter-scale Transformers compatible with the Whisper speech recognition model on Apple devices as small as the Apple Watch and as old as the iPhone 12!
WhisperKit transcribing a YouTube video in real-time on an Apple Watch Ultra 2
This is the first in a series of projects with vertically integrated software to deliver popular foundation models with industry-leading performance while handling the entire model lifecycle: From over-the-air model delivery to prediction post-processing. Each framework we build addresses a canonical inference workload for a market segment that is poised to leverage on-device inference as a competitive advantage.
WhisperKit is a case study in Argmax’s approach:
We are hiring! See the Careers page for details, then email us at hireme@takeargmax.com with the role you are interested in and (optionally) a link to a related project you are proud of.
Argmax is an open-source inference optimization company building the next generation of compression techniques and on-device inference software for developers and enterprises. For press inquiries: press@takeargmax.com