
AI Inference Engineer

Overview
Job Description
Quadric has created an innovative general-purpose neural processing unit (GPNPU) architecture. Quadric's co-optimized software and hardware are targeted to run neural network (NN) inference workloads in a wide variety of edge and endpoint devices, ranging from battery-operated smart-sensor systems to high-performance automotive and autonomous vehicle systems. Unlike other NPUs and neural network accelerators in the industry today, which can only accelerate a portion of a machine learning graph, the Quadric GPNPU executes both NN graph code and conventional C++ DSP and control code.
Role
The AI Inference Engineer at Quadric is the key bridge between the world of AI/LLM models and Quadric's unique platforms. The AI Inference Engineer will (1) port AI models to the Quadric platform; (2) optimize model deployment for efficient inference; and (3) profile and benchmark model performance. This senior technical role demands deep knowledge of AI model algorithms, system architecture and AI toolchains/frameworks.
Responsibilities
- Quantize, prune and convert models for deployment
- Port models to the Quadric platform using the Quadric toolchain
- Optimize inference deployments for latency and speed
- Benchmark and profile model performance and accuracy
- Collaborate across related areas of the AI inference stack to support team and business priorities
- Develop tools to scale and speed up deployment
- Make improvements to the SDK and runtime
- Provide technical support and documentation to customers and the developer community
