MNIST from scratch in Metal (C++)

> I built a simple 2-layer MNIST MLP that trains and runs inference from scratch, using only Apple's metal-cpp library. The goal was to learn GPU programming "for real" and see what actually moves the needle on Apple Silicon: not just a highly optimized matmul kernel, but also understanding Metal's API for buffer residency, command buffer structure, and CPU/GPU synchronization. It was fun (and humbling).

Sector: Electronic Labour | Confidence: 95%
Source: https://www.reddit.com/r/MachineLearning/comments/1rf3qb7/p_mnist_from_scratch_in_metal_c/

---

Council (3 models): The signal shows developers achieving performance gains in AI workloads on Apple Silicon through low-level Metal API programming, in some cases surpassing optimized high-level frameworks like MLX for specific tasks. This highlights growing friction between high-level ML frameworks and low-level GPU programming, revealing latent demand for specialized low-level expertise in electronic labour. The rise of Apple Silicon and the need for optimized workloads are driving increased open-source contributions in low-level GPU programming for ML. Cross-sector impacts include faster, custom GPU implementations enabling real-time risk modeling in finance and reduced latency in industrial IoT predictive-maintenance systems.

Cross-sector: Finance, Real Infrastructure

? How does the performance gap between custom Metal implementations and MLX scale with larger models or batch sizes?
? Are other hardware vendors seeing similar demand for low-level GPU programming expertise in their ecosystems?
? What barriers keep developers from adopting Metal or similar low-level frameworks in production environments?

#FIRE #Circle #ai
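For context on the workload in the signal above: the "2-layer MLP" boils down to two matrix multiplies with a ReLU between them, which is what the post's Metal matmul kernel would compute on the GPU. Below is a minimal CPU-side C++ sketch of that forward pass. The layer sizes are assumptions (the post does not state its hidden width; 784 inputs and 10 outputs follow from MNIST itself):

```cpp
#include <algorithm>
#include <vector>

// Assumed sizes for a 2-layer MNIST MLP: 784 -> hidden -> 10.
// The hidden width of 128 is a guess; the post does not specify it.
constexpr int kIn = 784, kHidden = 128, kOut = 10;

// y = relu(x * W1 + b1) * W2 + b2 for a single input image x.
// W1 is kIn x kHidden (row-major), W2 is kHidden x kOut (row-major).
std::vector<float> forward(const std::vector<float>& x,
                           const std::vector<float>& W1,
                           const std::vector<float>& b1,
                           const std::vector<float>& W2,
                           const std::vector<float>& b2) {
    // First layer: matmul + bias, then ReLU.
    std::vector<float> h(kHidden);
    for (int j = 0; j < kHidden; ++j) {
        float acc = b1[j];
        for (int i = 0; i < kIn; ++i) acc += x[i] * W1[i * kHidden + j];
        h[j] = std::max(acc, 0.0f);
    }
    // Second layer: matmul + bias, producing the 10 class logits.
    std::vector<float> y(kOut);
    for (int k = 0; k < kOut; ++k) {
        float acc = b2[k];
        for (int j = 0; j < kHidden; ++j) acc += h[j] * W2[j * kOut + k];
        y[k] = acc;
    }
    return y;
}
```

On the GPU, each of these loops becomes a threadgroup of parallel dot products, and the CPU/GPU synchronization the post mentions is about knowing when the buffers holding `h` and `y` are safe to read back.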