{"schema_version":"onlylabs.public_analysis.v1","url":"https://onlylabs.fyi/analysis/wafer","json_url":"https://onlylabs.fyi/analysis/wafer/analysis.json","evidence_json_url":"https://onlylabs.fyi/analysis/wafer/evidence.json","generated_at":"2026-06-27T22:18:58.434Z","analysis":{"org_slug":"wafer","url":"https://onlylabs.fyi/analysis/wafer","json_url":"https://onlylabs.fyi/analysis/wafer/analysis.json","evidence_json_url":"https://onlylabs.fyi/analysis/wafer/evidence.json","dossier_url":"https://onlylabs.fyi/labs/wafer","org":{"slug":"wafer","name":"Wafer","category":"neocloud","category_label":"Neocloud","homepage_url":"https://www.wafer.ai"},"title":"Wafer analysis","summary":"Wafer is a hardware-centric AI inference platform building competitive advantage through GPU kernel optimization expertise, with a distinctive multi-vendor strategy spanning NVIDIA and AMD accelerators. The evidence depicts a company vertically integrated from low-level kernel engineering up to a serverless inference product, using public benchmarks and developer education content as both recruiting and go-to-market…","markdown":"## Thesis\n\nWafer is a hardware-centric AI inference platform building competitive advantage through GPU kernel optimization expertise, with a distinctive multi-vendor strategy spanning NVIDIA and AMD accelerators. The evidence depicts a company vertically integrated from low-level kernel engineering up to a serverless inference product, using public benchmarks and developer education content as both recruiting and go-to-market instruments. 
The firm's public positioning centers on price/performance leadership and a privacy guarantee (zero data retention), while its technical footprint reveals deep engagement with AMD's ROCm ecosystem alongside NVIDIA's latest Blackwell hardware.\n\n## Signal desks\n\n### Hiring\n\n- The `gpu-perf-engineering-resources` curriculum README embeds an explicit hiring call: \"If you're interested in GPU performance engineering — we're hiring at Wafer\" [P2](https://github.com/wafer-ai/gpu-perf-engineering-resources). The curriculum itself — covering fundamentals through Blackwell-specific Tensor Core programming, FlashAttention, PagedAttention, KV cache optimization, Triton, CUTLASS, CuTe, ROCm, and profiling — maps the expected competence profile for candidates [P2](https://github.com/wafer-ai/gpu-perf-engineering-resources).\n- No other hiring signals (job listings, team pages, headcount announcements) appear in the evidence pack; the recruiting signal is thin and mediated entirely through developer content [P2](https://github.com/wafer-ai/gpu-perf-engineering-resources).\n\n### Forks\n\n- **ROCm/composable_kernel** — Forked 2026-01-22; AMD's performance-portable kernel programming model for ML tensor operators across GPU/CPU architectures. Uses HIP C++ with tile-based programming and Tensor Coordinate Transformation [P6, E9].\n- **modular/modular** — Forked 2026-01-22; The Modular Platform including MAX serving framework and Mojo language. README emphasizes hardware-abstracted model serving with \"industry-leading GPU and CPU performance\" [P7, E8].\n- **ROCm/aiter** — Forked 2026-01-26; AMD's centralized AI Tensor Engine for ROCm, providing high-performance AI operators (C++ and Python APIs) with kernels from Triton, CK, and assembly. 
Covers inference, training, and GEMM+communication kernels; includes Triton-based GPU-initiated communication via Iris [P8, E7].\n- All three forks cluster in a 4-day window (Jan 22–26, 2026); two of three target the AMD ROCm ecosystem, the third targets a hardware-agnostic serving stack. No fork activity beyond this cluster is cited in the evidence [E7, E8, E9].\n\n### Releases\n\n- **Kernel Arena benchmark results** — Published 2026-03-10; two benchmark suites: WaferBench NVFP4 (NVIDIA B200, CUDA 12.8, 6 fused NVFP4 inference kernels evaluated against GPT-5.4, Claude-4.6-Opus, Composer-1.5, Gemini-3.1-Pro) and KernelBench HIP (AMD MI300X, ROCm 7.0, 41 kernels across 4 difficulty levels, 11 models from Anthropic, OpenAI, Google, xAI, Moonshot, Z.AI). Leaderboard, methodology, and reward-hacking catalog linked [P5, E3].\n- **Wafer Serverless with DeepSeek v4** — Announced 2026-06-12 via CEO LinkedIn; DeepSeek v4 Pro and Flash running \"fully optimized\" with zero data retention, 33% price reduction, positioned as \"the provider with the best speed-to-price ratio in the market\" [W5](https://www.linkedin.com/posts/emi-andere_deepseek-v4-pro-and-flash-now-run-fully-optimized-activity-7471314087323312128-D5xy).\n- **Wafer Docs** — Public Mintlify documentation site launched 2026-05-07, active through 2026-06-08; 7 open issues indicate ongoing iteration [P9, E1].\n- **GPU performance engineering curriculum** — Published 2026-01-12, last updated 2026-04-27; 819 stars, 98 forks [P2, E2].\n- **chipbenchmark** — \"A platform for monitoring the chip situation\" (Shell), created 2025-07-13; 17 stars, 3 forks [P1, E4].\n- **HIP-Benchmarks-Results** — \"Traces and Kernels of our LLM generated HIP benchmarks\" (Python), created 2026-01-23; 2 stars [P3, E5].\n- No model weights, model cards, or paper artifacts are cited in this evidence pack.\n\n### Talking\n\n- **CEO Emilio Andere — Magnitude partnership** (2026-06-19): Wafer partnering with Magnitude (YC S25) to power their 
coding agent with open source models, claiming 60% cost reduction while maintaining quality. Frames open source models as \"the closest to frontier LLMs they've ever been\" [W4](https://www.linkedin.com/posts/emi-andere_excited-for-wafer-to-partner-with-magnitude-activity-7473785115635261441-1yUV).\n- **CEO Emilio Andere — DeepSeek v4 on Wafer Serverless** (2026-06-12): Emphasizes zero data retention privacy guarantee (\"nothing logged, nothing retained, prompts and outputs never leave hardware we control\"), 33% price cut, and the model being the most requested on the platform for months [W5](https://www.linkedin.com/posts/emi-andere_deepseek-v4-pro-and-flash-now-run-fully-optimized-activity-7471314087323312128-D5xy).\n- **CEO Emilio Andere — Podcast** (2026-05-24): \"Intelligence Per Watt with Emilio Andere\" on Alexa's Input (AI); discusses AI infrastructure, inference optimization, economics of the AI compute race, lessons from founding Wafer, open-source AI infrastructure, and the thesis that \"optimizing intelligence itself could become one of the most important engineering problems\" [W6](https://poddtoppen.se/podcast/1548244933/alexas-input-ai/intelligence-per-watt-with-emilio-andere).\n- **Wafer Serverless in oh-my-pi** (2026-06-27): Listed as a frontier API provider alongside Anthropic, OpenAI, Google Gemini, xAI, Mistral, Groq, Cerebras, Together, Hugging Face, NVIDIA, and others — indicating developer ecosystem presence [W2](https://github.com/can1357/oh-my-pi).\n\n## Shipping\n\nWafer's shipping surface is anchored by **Wafer Serverless**, a production inference platform with at least two marquee models, DeepSeek v4 Pro and Flash, delivered with a privacy guarantee (zero data retention) and a claimed 33% price reduction [W5](https://www.linkedin.com/posts/emi-andere_deepseek-v4-pro-and-flash-now-run-fully-optimized-activity-7471314087323312128-D5xy). 
A partnership with Magnitude (YC S25) puts Wafer Serverless behind a coding agent product, targeting 60% cost reduction via open source model serving [W4](https://www.linkedin.com/posts/emi-andere_excited-for-wafer-to-partner-with-magnitude-activity-7473785115635261441-1yUV).\n\nOn the benchmarking/transparency front, **Kernel Arena** shipped with two benchmark suites: WaferBench NVFP4 on NVIDIA B200 (evaluating frontier models on fused NVFP4 inference kernels) and KernelBench HIP on AMD MI300X (41 kernels, 11 models). Public leaderboard, methodology docs, and a reward hacking catalog are live [P5](https://github.com/wafer-ai/kernel-arena).\n\nDeveloper assets include the **GPU performance engineering curriculum** (819 stars, maintained through April 2026) [P2](https://github.com/wafer-ai/gpu-perf-engineering-resources), **wafer-docs** (Mintlify, live since May 2026, 7 open issues indicating active iteration) [P9](https://github.com/wafer-ai/wafer-docs), **chipbenchmark** (chip monitoring platform, since July 2025) [P1](https://github.com/wafer-ai/chipbenchmark), and **HIP-Benchmarks-Results** (traces/kernels from LLM-generated HIP benchmarks) [P3](https://github.com/wafer-ai/HIP-Benchmarks-Results).\n\nNotable absence: no model weights, fine-tuned checkpoints, or research papers are cited in this evidence pack. Wafer ships infrastructure and benchmarks, not models.\n\n## Research themes\n\n1. **LLM-generated accelerator kernels** — Kernel Arena evaluates frontier LLMs (GPT-5.4, Claude-4.6-Opus, Composer-1.5, Gemini-3.1-Pro) on their ability to generate correct, performant GPU kernels for both NVIDIA B200 (NVFP4 fused inference) and AMD MI300X (HIP, 4 difficulty levels). The public reward hacking catalog indicates awareness of — and effort to measure — benchmark gaming by LLMs [P5](https://github.com/wafer-ai/kernel-arena).\n\n2. 
**AMD ROCm ecosystem integration** — Forks of composable_kernel and aiter, plus the HIP-Benchmarks-Results repo and KernelBench HIP suite, reveal sustained research into AMD GPU kernel optimization, HIP code generation, and ROCm operator performance [P3, P5, P6, P8]. The composable_kernel fork inspects a tile-based, performance-portable programming model; the aiter fork inspects AMD's centralized operator repository spanning Triton, CK, and assembly backends [P6, P8].\n\n3. **GPU performance engineering at frontier depth** — The curriculum [P2](https://github.com/wafer-ai/gpu-perf-engineering-resources) is detailed enough to function as a research map: Blackwell-specific Tensor Core content, FlashAttention through PagedAttention and KV cache optimization, Triton through CUTLASS/CuTe, and AMD ROCm fundamentals. This maps the research surface Wafer's own engineers navigate.\n\n4. **Hardware-agnostic serving abstractions** — The Modular platform fork [P7](https://github.com/wafer-ai/modular) suggests investigation of MAX and/or Mojo as potential components in a performance-portable serving stack, complementing the hand-tuned kernel work.\n\n## Hiring & scaling\n\nThe evidence contains **one hiring signal**: the GPU performance engineering curriculum README explicitly states \"we're hiring at Wafer\" [P2](https://github.com/wafer-ai/gpu-perf-engineering-resources). The curriculum's scope — from fundamentals through Blackwell-specific optimization, FlashAttention, Triton, CUTLASS, and ROCm — acts as a de facto job description for GPU kernel engineers [P2](https://github.com/wafer-ai/gpu-perf-engineering-resources). No job listings, team headcounts, office locations, or non-engineering role descriptions appear in this evidence pack. The hiring picture is thin and inferred entirely from developer content strategy.\n\n## Category implications\n\n**Strategy** — Wafer is not a model builder; it is an inference infrastructure company competing on kernel-level performance. 
The dual-vendor (NVIDIA + AMD) kernel benchmarking strategy [P5](https://github.com/wafer-ai/kernel-arena), combined with AMD-focused forks [P6, P8], signals a bet that the inference market will diversify beyond NVIDIA — and that owning AMD optimization creates a first-mover pricing and availability advantage. The \"intelligence per watt\" framing [W6](https://poddtoppen.se/podcast/1548244933/alexas-input-ai/intelligence-per-watt-with-emilio-andere) positions Wafer for a world where inference cost, not training capability, is the binding constraint.\n\n**Infrastructure** — The composable_kernel and aiter forks [P6, P8] indicate Wafer is building or adapting AMD ROCm kernel infrastructure, likely to serve models on MI300X-class hardware. The Modular fork [P7](https://github.com/wafer-ai/modular) suggests exploration of MAX/Mojo as a higher-level serving abstraction. The tight fork cluster (4 days in January 2026) implies a deliberate technical survey of available kernel and serving stacks, not passive mirroring [E7, E8, E9].\n\n**Product** — Wafer Serverless is the visible product surface, differentiated on three axes: price (a claimed 33% price reduction for DeepSeek v4) [W5](https://www.linkedin.com/posts/emi-andere_deepseek-v4-pro-and-flash-now-run-fully-optimized-activity-7471314087323312128-D5xy), privacy (zero data retention, hardware-controlled) [W5](https://www.linkedin.com/posts/emi-andere_deepseek-v4-pro-and-flash-now-run-fully-optimized-activity-7471314087323312128-D5xy), and performance (kernel-level optimization) [P5](https://github.com/wafer-ai/kernel-arena). The Magnitude coding-agent partnership suggests early product-market fit in the agent infrastructure layer [W4](https://www.linkedin.com/posts/emi-andere_excited-for-wafer-to-partner-with-magnitude-activity-7473785115635261441-1yUV). 
The oh-my-pi integration listing Wafer Serverless alongside Anthropic, OpenAI, and Google suggests API compatibility and growing developer distribution [W2](https://github.com/can1357/oh-my-pi).\n\n**GTM** — Developer-content-led go-to-market: the GPU curriculum (819 stars) [P2](https://github.com/wafer-ai/gpu-perf-engineering-resources) and Kernel Arena leaderboard [P5](https://github.com/wafer-ai/kernel-arena) serve as top-of-funnel developer magnets. CEO LinkedIn presence drives partnership and product announcements [W4, W5]. Podcast appearances build the \"intelligence per watt\" narrative for technical and investor audiences [W6](https://poddtoppen.se/podcast/1548244933/alexas-input-ai/intelligence-per-watt-with-emilio-andere). The strategy mirrors neocloud GTM patterns (developer love → API adoption → enterprise conversion) but with a hardware-performance rather than model-access value proposition.\n\n**Research** — Wafer's research appears applied and benchmark-driven rather than paper-driven: no papers are cited in this evidence pack. The Kernel Arena methodology and reward hacking catalog [P5](https://github.com/wafer-ai/kernel-arena) represent the most systematic research artifact, treating LLM kernel generation as an eval problem with measurable quality and gaming dimensions.\n\n**Hiring** — The single hiring signal [P2](https://github.com/wafer-ai/gpu-perf-engineering-resources) targets GPU performance engineers capable of working at the kernel level across NVIDIA (CUDA, Tensor Cores, Blackwell) and AMD (ROCm, HIP, composable_kernel, aiter) stacks. 
This is a narrow, deep talent pool; the curriculum-as-recruiting strategy is a rational response to scarcity.\n\n## Traction highlights\n\n- **GPU performance engineering curriculum**: 819 stars, 98 forks — strong developer interest for a niche technical repo [P2, E2].\n- **Wafer Serverless API distribution**: Listed as a frontier provider in oh-my-pi alongside Anthropic, OpenAI, Google, xAI, and others — indicating real API availability and developer integration [W2](https://github.com/can1357/oh-my-pi).\n- **Kernel Arena**: Evaluating frontier LLMs (GPT-5.4, Claude-4.6-Opus, Gemini-3.1-Pro, Composer-1.5) on kernel generation — these labs' models participating (even passively) signals recognition of the benchmark [P5](https://github.com/wafer-ai/kernel-arena).\n- **Magnitude (YC S25) partnership**: Production coding agent powered by Wafer's inference, claiming 60% cost reduction [W4](https://www.linkedin.com/posts/emi-andere_excited-for-wafer-to-partner-with-magnitude-activity-7473785115635261441-1yUV).\n- **DeepSeek v4 on Wafer Serverless**: Described as \"the most requested model family on the platform for months,\" suggesting sustained developer demand [W5](https://www.linkedin.com/posts/emi-andere_deepseek-v4-pro-and-flash-now-run-fully-optimized-activity-7471314087323312128-D5xy).\n- **chipbenchmark**: 17 stars, 3 forks — modest interest in chip monitoring platform [P1](https://github.com/wafer-ai/chipbenchmark).\n- **wafer-docs**: 7 open issues indicate active user feedback loop [P9](https://github.com/wafer-ai/wafer-docs).\n\nNote: evidence pack contains no revenue, user count, inference volume, or funding data. 
Traction is inferred from developer signals and partnership announcements only.\n\n## Sources\n\n- [P1](https://github.com/wafer-ai/chipbenchmark) wafer-ai/chipbenchmark repo metadata\n- [P2](https://github.com/wafer-ai/gpu-perf-engineering-resources) wafer-ai/gpu-perf-engineering-resources repo metadata\n- [P3](https://github.com/wafer-ai/HIP-Benchmarks-Results) wafer-ai/HIP-Benchmarks-Results repo metadata\n- [P4](https://github.com/wafer-ai/skills) wafer-ai/skills repo metadata\n- [P5](https://github.com/wafer-ai/kernel-arena) wafer-ai/kernel-arena repo metadata\n- [P6](https://github.com/wafer-ai/composable_kernel) wafer-ai/composable_kernel repo metadata (fork)\n- [P7](https://github.com/wafer-ai/modular) wafer-ai/modular repo metadata (fork)\n- [P8](https://github.com/wafer-ai/aiter) wafer-ai/aiter repo metadata (fork)\n- [P9](https://github.com/wafer-ai/wafer-docs) wafer-ai/wafer-docs repo metadata\n- [E1](https://github.com/wafer-ai/wafer-docs) wafer-ai/wafer-docs repo_new event\n- [E2](https://github.com/wafer-ai/gpu-perf-engineering-resources) wafer-ai/gpu-perf-engineering-resources repo_new event\n- [E3](https://github.com/wafer-ai/kernel-arena) wafer-ai/kernel-arena repo_new event\n- [E4](https://github.com/wafer-ai/chipbenchmark) wafer-ai/chipbenchmark repo_new event\n- [E5](https://github.com/wafer-ai/HIP-Benchmarks-Results) wafer-ai/HIP-Benchmarks-Results repo_new event\n- [E6](https://github.com/wafer-ai/skills) wafer-ai/skills repo_new event\n- [E7](https://github.com/wafer-ai/aiter) wafer-ai/aiter repo_forked event\n- [E8](https://github.com/wafer-ai/modular) wafer-ai/modular repo_forked event\n- [E9](https://github.com/wafer-ai/composable_kernel) wafer-ai/composable_kernel repo_forked event\n- [W2](https://github.com/can1357/oh-my-pi) GitHub oh-my-pi repo (Wafer Serverless listed as frontier API provider)\n- [W4](https://www.linkedin.com/posts/emi-andere_excited-for-wafer-to-partner-with-magnitude-activity-7473785115635261441-1yUV) Emilio Andere 
LinkedIn — Magnitude partnership announcement\n- [W5](https://www.linkedin.com/posts/emi-andere_deepseek-v4-pro-and-flash-now-run-fully-optimized-activity-7471314087323312128-D5xy) Emilio Andere LinkedIn — DeepSeek v4 on Wafer Serverless announcement\n- [W6](https://poddtoppen.se/podcast/1548244933/alexas-input-ai/intelligence-per-watt-with-emilio-andere) Podcast — \"Intelligence Per Watt with Emilio Andere\"","generated_at":"2026-06-27T19:19:39.602+00:00","citations":[{"url":"https://github.com/wafer-ai/chipbenchmark","path":null,"label":"wafer-ai/chipbenchmark","type":"external"},{"url":"https://github.com/wafer-ai/gpu-perf-engineering-resources","path":null,"label":"wafer-ai/gpu-perf-engineering-resources","type":"external"},{"url":"https://github.com/wafer-ai/HIP-Benchmarks-Results","path":null,"label":"wafer-ai/HIP-Benchmarks-Results","type":"external"},{"url":"https://github.com/wafer-ai/skills","path":null,"label":"wafer-ai/skills","type":"external"},{"url":"https://github.com/wafer-ai/kernel-arena","path":null,"label":"wafer-ai/kernel-arena","type":"external"},{"url":"https://github.com/wafer-ai/composable_kernel","path":null,"label":"wafer-ai/composable_kernel","type":"external"},{"url":"https://github.com/wafer-ai/modular","path":null,"label":"wafer-ai/modular","type":"external"},{"url":"https://github.com/wafer-ai/aiter","path":null,"label":"wafer-ai/aiter","type":"external"},{"url":"https://github.com/wafer-ai/wafer-docs","path":null,"label":"wafer-ai/wafer-docs","type":"external"},{"url":"https://github.com/wafer-ai/wafer-docs","path":null,"label":"wafer-ai/wafer-docs","type":"external"},{"url":"https://github.com/wafer-ai/gpu-perf-engineering-resources","path":null,"label":"wafer-ai/gpu-perf-engineering-resources","type":"external"},{"url":"https://github.com/wafer-ai/kernel-arena","path":null,"label":"wafer-ai/kernel-arena","type":"external"},{"url":"https://github.com/wafer-ai/chipbenchmark","path":null,"label":"wafer-ai/chipbenchmark","type":"extern
al"},{"url":"https://github.com/wafer-ai/HIP-Benchmarks-Results","path":null,"label":"wafer-ai/HIP-Benchmarks-Results","type":"external"},{"url":"https://github.com/wafer-ai/skills","path":null,"label":"wafer-ai/skills","type":"external"},{"url":"https://github.com/wafer-ai/aiter","path":null,"label":"wafer-ai/aiter","type":"external"},{"url":"https://github.com/wafer-ai/modular","path":null,"label":"wafer-ai/modular","type":"external"},{"url":"https://github.com/wafer-ai/composable_kernel","path":null,"label":"wafer-ai/composable_kernel","type":"external"},{"url":"https://github.com/can1357/oh-my-pi","path":null,"label":"can1357/oh-my-pi","type":"external"},{"url":"https://www.linkedin.com/posts/emi-andere_excited-for-wafer-to-partner-with-magnitude-activity-7473785115635261441-1yUV","path":null,"label":"linkedin.com/posts","type":"external"},{"url":"https://www.linkedin.com/posts/emi-andere_deepseek-v4-pro-and-flash-now-run-fully-optimized-activity-7471314087323312128-D5xy","path":null,"label":"linkedin.com/posts","type":"external"},{"url":"https://poddtoppen.se/podcast/1548244933/alexas-input-ai/intelligence-per-watt-with-emilio-andere","path":null,"label":"poddtoppen.se/podcast","type":"external"}],"provenance":{"provider":"deepseek","model":"deepseek-v4-pro","workflow":"onlylabs-deepagents-analysis-v3","agent":"deepagents"},"evidence":{"total":24,"pages":9,"events":9,"web":6,"signal_desks":{"forks":3,"repos":6,"hiring":0,"talking":0,"releases":0},"data_radar_lanes":null,"data_radar_matches":null}},"signal_counts":{"total":9,"model_released":0,"release":0,"repo_new":6,"repo_forked":3,"post_published":0,"job_opened":0}}