AI Researcher — Inference Optimization
About This Gig
Role Overview
We are seeking an AI Researcher with deep experience in inference optimization to design, evaluate, and deploy high-performance inference systems for large-scale machine learning models. You will work at the intersection of model architecture, systems engineering, and hardware-aware optimization, improving latency, throughput, and cost efficiency across real-world production environments.
Key Responsibilities
Research and develop techniques to optimize inference performance for large neural networks.
Improve latency, throughput, memory efficiency, and cost per inference.
Design and evaluate model-level optimizations (quantization, pruning, KV-cache optimization, architecture-aware simplifications).
Implement systems-level optimizations (dynamic batching, kernel fusion, multi-GPU inference, prefill vs. decode optimization).
Benchmark inference workloads across hardware accelerators.
Collaborate with engineering teams to deploy optimized inference pipelines.
Translate re
About the Seller
Featherless AI
on Himalayas