# Yotta Labs

> Yotta Labs provides on-demand, elastic GPU compute built for AI at scale. Products include GPU Pods (Compute), Serverless auto-scaling inference, AI Gateway (unified multi-model API for 50+ models across 15+ providers), and cloud-native Model Quantization. Yotta Labs serves 50,000+ developers across 20+ global regions with SOC 2 Type I certified infrastructure.

## Products

- [GPU Compute](https://www.yottalabs.ai/compute.md): On-demand GPU pods and virtual machines including H100, H200, B200, B300, RTX 4090, RTX 5090, and RTX PRO 6000. Per-second billing, no commitments, from $0.38/hr.
- [Serverless](https://www.yottalabs.ai/serverless.md): Auto-scaling GPU inference and training with zero infrastructure management. Best for production inference services, batch workloads, and large-scale training pipelines.
- [AI Gateway](https://www.yottalabs.ai/ai-gateway.md): Single OpenAI-compatible API endpoint (`https://gateway.yottalabs.ai/v1`) for 50+ LLM, image, and video generation models. Intelligent cost/latency/quality routing, automatic fallback, 99.9% uptime SLA.
- [Quantization](https://www.yottalabs.ai/quantization.md): Cloud-native LLM quantization service supporting INT4 and NVFP4 precision. Cuts inference costs by up to 60% and reduces VRAM usage by up to 75%. No local setup required.
- [Launch Templates](https://www.yottalabs.ai/launch-templates.md): Preconfigured GPU deployment templates (PyTorch 2.9, ComfyUI, Unsloth, SkyRL). Bundles Docker image, CUDA drivers, and framework config for one-click GPU Pod deployment.

## Pricing & Resources

- [Pricing](https://www.yottalabs.ai/pricing.md): GPU and storage pricing. Per-second billing, no commitments. GPU on-demand rates from $0.38/hr (RTX 4090) to $5.43/hr (B300). Storage from $0.036/GB/mo.

## Research

- [Our Research](https://www.yottalabs.ai/our-research.md): Peer-reviewed publications on efficient ML, distributed GPU orchestration, and inference optimization. Papers published at USENIX ATC, HPCA, SC, ASPLOS, EuroSys, IPDPS, and MICRO.
- [Research Credit Program](https://www.yottalabs.ai/research-credit.md): Academic GPU credit program offering $1,000 in free compute credits and 6 months of discounted usage for approved researchers, faculty, and graduate students.

## Optional

- [Mini-SGLang-Neuron: Bringing Lightweight LLM Inference to AWS Trainium and Inferentia](https://www.yottalabs.ai/post/mini-sglang-neuron-bringing-lightweight-llm-inference-to-aws-trainium-and-inferentia): Yotta Labs' open-source project enabling lightweight, high-performance LLM inference on AWS Trainium and Inferentia hardware.
- [Yotta Labs Welcomes Jack Dongarra: A Signal for the Next Era of AI Infrastructure](https://www.yottalabs.ai/post/yotta-labs-welcomes-dr-jack-dongarra): Yotta Labs welcomes Dr. Jack Dongarra (Turing Award laureate) as an advisor to advance AI infrastructure through intelligent software orchestration.
- [Academic Research Credit Support Program Launch](https://www.yottalabs.ai/post/academic-research-credit-support-program-launch): Announcement of the Yotta Labs Academic Research Credit Support Program providing researchers with access to modern AI infrastructure and flexible pricing.
- [NeuronMM: High-Performance Matrix Multiplication for LLM Inference on AWS Trainium](https://www.yottalabs.ai/post/neuronmm-high-performance-matrix-multiplication-for-llm-inference-on-aws-trainium): Yotta Labs' NeuronMM accelerates LLM inference on AWS Trainium through optimized matrix multiplication kernels, achieving significant speedups.
- [Optimizing Distributed Inference Kernels for AMD Developer Challenge 2025](https://www.yottalabs.ai/post/optimizing-distributed-inference-kernels-for-amd-developer-challenge-2025): Technical report on optimizing All-to-All, GEMM-ReduceScatter, and AllGather-GEMM kernels for distributed LLM inference on AMD GPUs.
- [NSF SBIR: Decentralized AI Computing Operating System](https://www.yottalabs.ai/post/nsf-sbir-decentralized-artificial-intelligence-os): Yotta Labs received an NSF SBIR grant to develop a Decentralized AI OS for accessible, cost-effective AI computing across heterogeneous hardware.
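The AI Gateway entry above names an OpenAI-compatible endpoint. A minimal sketch of what a chat-completions request to that endpoint could look like follows; the model name (`llama-3.1-8b-instruct`) and the `YOTTA_API_KEY` environment variable are assumptions for illustration, not documented values — check the AI Gateway page for the supported model list and auth details.

```python
# Hypothetical sketch: building an OpenAI-style chat-completions request
# for the gateway endpoint. Model name and env-var name are assumptions.
import json
import os
import urllib.request

GATEWAY_URL = "https://gateway.yottalabs.ai/v1/chat/completions"

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Assemble (but do not send) an OpenAI-compatible POST request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('YOTTA_API_KEY', '')}",
        },
        method="POST",
    )

req = build_request("llama-3.1-8b-instruct", "Hello!")
# Sending is left to the caller, e.g.: urllib.request.urlopen(req)
```

Because the endpoint follows the OpenAI wire format, existing OpenAI client libraries should also work by pointing their base URL at `https://gateway.yottalabs.ai/v1`.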