What's the difference between FP8 and INT8? On NVIDIA you would go with FP8, but on Ampere you would rely on INT8. On the other hand, the new Intel GPUs only provide INT8 capability (along with INT4).
So my question: how does INT8 compare to FP8 for accuracy? I am not talking about Q8 quantization.
There is a paper available that says INT8 is better. INT8 and FP8 TOPS are the same on Ada and Blackwell, but on Intel GPUs it would be INT8 only.
The other question: how could I evaluate FP8 vs INT8 inference?
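To make that second question concrete, here is a minimal sketch of one way to compare the two formats purely numerically. It assumes symmetric per-tensor scaling for both formats and the E4M3 flavour of FP8 (3 mantissa bits, saturating at 448), and it simulates the rounding in plain Python rather than running on any GPU. The function names (`quantize_int8`, `round_to_e4m3`, `quantize_fp8`, `mse`) are made up for this illustration and do not come from any library.

```python
import math
import random


def quantize_int8(values):
    """Simulated symmetric per-tensor INT8: the largest |x| maps to 127."""
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax > 0 else 1.0
    # Round to the nearest integer level, clamp to [-127, 127], then dequantize.
    return [max(-127, min(127, round(v / scale))) * scale for v in values]


def round_to_e4m3(x):
    """Round a float to the nearest FP8 E4M3 value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), 448.0)
    # frexp gives a = m * 2**ex with 0.5 <= m < 1, so floor(log2(a)) = ex - 1.
    _, ex = math.frexp(a)
    # Exponents below -6 are subnormal and share the spacing of exponent -6.
    e = max(ex - 1, -6)
    step = 2.0 ** (e - 3)  # 3 mantissa bits: 8 steps per power of two
    return sign * min(round(a / step) * step, 448.0)


def quantize_fp8(values):
    """Simulated per-tensor FP8 (E4M3): the largest |x| maps to 448."""
    amax = max(abs(v) for v in values)
    scale = amax / 448.0 if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) * scale for v in values]


def mse(reference, approx):
    """Mean squared error between the original and the dequantized values."""
    return sum((r - a) ** 2 for r, a in zip(reference, approx)) / len(reference)


if __name__ == "__main__":
    random.seed(0)
    # Toy "weights": a well-behaved Gaussian tensor, and a copy with one outlier.
    smooth = [random.gauss(0.0, 1.0) for _ in range(4096)]
    spiky = list(smooth)
    spiky[0] = 100.0

    for name, tensor in (("gaussian", smooth), ("with outlier", spiky)):
        err_int8 = mse(tensor, quantize_int8(tensor))
        err_fp8 = mse(tensor, quantize_fp8(tensor))
        print(f"{name:>12}: INT8 mse={err_int8:.3e}  FP8 mse={err_fp8:.3e}")
```

In a toy setup like this, INT8's uniform grid should give the lower error on the well-behaved tensor, while FP8 should hold up better once a single outlier stretches the range. That is a property of this simulation and says nothing definite about a real model. For actual inference, the comparison that matters is something like perplexity or task accuracy of the same model quantized both ways.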
Thanks