FP8 Inference on Ampere GPUs Achieved Without Native Hardware Support

Researchers have emulated FP8 inference on Ampere GPUs, which lack native FP8 hardware, using custom Triton kernels. They report a 1.5x speedup over the Hugging Face (HF) FP32 baseline with minimal accuracy loss. The approach could extend the useful lifespan of older hardware and speed up model inference on it.

Sector: Electronic Labour | Confidence: 95%
Source: https://www.reddit.com/r/MachineLearning/comments/1rfbbe5/p_fp8_inference_on_ampere_without_native_hardware/

#FIRE #Circle #ai
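
The summary does not describe how the kernels work, so the following is only a minimal sketch of the numeric idea behind software FP8 emulation, not the project's Triton code. It assumes the common E4M3 layout (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, maximum finite value 448) and per-tensor scaling; the function names `decode_e4m3`, `encode_e4m3`, `quantize`, and `dot_fp8` are hypothetical. Weights are stored as one byte each and expanded back to a wider float only at compute time, which is roughly what a kernel on hardware without FP8 units would have to do.

```python
import math

E4M3_MAX = 448.0  # largest finite E4M3 value under the assumed layout


def decode_e4m3(code: int) -> float:
    """Expand one 8-bit E4M3 code to a Python float."""
    sign = -1.0 if code & 0x80 else 1.0
    exp = (code >> 3) & 0xF
    man = code & 0x7
    if exp == 0xF and man == 0x7:
        return math.nan  # the only NaN pattern; E4M3 has no infinities
    if exp == 0:
        return sign * (man / 8.0) * 2.0 ** -6  # subnormal range
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)


# Every finite code with its value, used for brute-force nearest rounding.
_TABLE = [(c, decode_e4m3(c)) for c in range(256) if (c & 0x7F) != 0x7F]


def encode_e4m3(x: float) -> int:
    """Round x to the nearest E4M3 code, saturating at +/-448."""
    if math.isnan(x):
        return 0x7F
    x = max(-E4M3_MAX, min(E4M3_MAX, x))
    # Ties go to the code with an even mantissa bit (round-to-nearest-even).
    return min(_TABLE, key=lambda cv: (abs(cv[1] - x), cv[0] & 1))[0]


def quantize(values):
    """Per-tensor scaling: map the largest magnitude onto E4M3_MAX."""
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / E4M3_MAX
    return bytes(encode_e4m3(v / scale) for v in values), scale


def dot_fp8(codes, scale, activations):
    """Dot product that decodes the 8-bit weights on the fly."""
    return scale * sum(decode_e4m3(c) * a for c, a in zip(codes, activations))


weights = [0.5, -1.0, 0.25, 2.0]
codes, scale = quantize(weights)  # 4 bytes instead of 16 bytes in FP32
result = dot_fp8(codes, scale, [1.0, 1.0, 1.0, 1.0])
```

In a real GPU kernel the decode step would run in registers and feed FP16 or BF16 math, so any speedup would come mainly from moving a quarter of the weight bytes through memory rather than from faster arithmetic; that reading is an assumption here, not something the summary states.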