FP8 and INT8 Performance

Hey, do you happen to know why FP8 and INT8 have different performance in image analysis?
Compare FP8 and INT8 Performance in Vision Models
The user is asking in Croatian/Serbian/Bosnian: "Do you happen to know why FP8 and INT8 have different performance in image analysis?"
This is a technical question about deep learning quantization formats - specifically comparing FP8 (8-bit floating point) vs INT8 (8-bit integer) performance in image analysis tasks (likely computer vision models like CNNs, Vision Transformers, etc.).
Key aspects to cover:
  1. What are FP8 and INT8 quantization formats
  2. Why they perform differently in image analysis tasks
  3. Technical differences (dynamic range, precision, hardware support)
  4. Impact on vision models specifically (computation vs memory bandwidth, activation distributions)
I should search for current information about FP8 vs INT8 performance comparison, specifically in computer vision/image analysis contexts.
You asked about the performance differences between FP8 (8-bit floating point) and INT8 (8-bit integer) in image analysis. It's a fair question, given that recent GPU generations (such as NVIDIA Hopper/Ada and AMD MI300) introduce FP8 support, while INT8 has long been the standard for quantization.
I'll look up recent information comparing these formats in the context of computer vision.
Search
FP8 vs INT8 performance comparison computer vision image analysis
FP8 quantization vision transformers vs CNN performance
why FP8 better than INT8 deep learning accuracy
23 results
Comparing Dynamic Range and Quantization in Vision CNNs vs Transformers
This is a comprehensive research paper (Qualcomm) comparing FP8 vs INT8 for deep learning inference. Let me extract the key findings relevant to the user's question about why FP8 and INT8 perform differently in image analysis.
Key findings from the paper:
  1. For image-related networks (CNNs like ResNet, MobileNet, DeepLabV3, HRNet), INT8 generally performs BETTER than FP8 formats
  2. For transformers (Vision Transformers, BERT), FP8-E4 can be better due to outliers, but there are workarounds for INT8
  3. Hardware efficiency: FP8 is 40-50% less efficient in terms of area and energy than INT8
  4. The difference comes down to distributions:
    • CNNs/vision models: Gaussian-like distributions without significant outliers → INT8 is better (more precision in the dense region)
    • Transformers: Have significant outliers in specific layers → FP8-E4 handles outliers better due to larger dynamic range
So the answer should explain:
  1. Distribution differences: Vision CNNs typically have well-behaved Gaussian distributions while transformers have outliers
  2. Hardware efficiency differences (FP8 requires more complex logic)
  3. Precision vs range trade-off (INT8 has uniform precision, FP8 has precision near zero but less at extremes)
  4. Post-training quantization vs quantization-aware training effects
  5. Specific results showing INT8 is often better for computer vision tasks
The user asked in Croatian/Serbian, so I should answer in the same language.
An interesting question. According to recent research (in particular a Qualcomm paper that compares these formats in detail), INT8 often outperforms FP8 in image analysis, which may sound counterintuitive given that FP8 is the "newer" format.
Here are the key reasons why the performance differs:

1. Different value distributions in vision networks

In classic CNNs for image analysis (ResNet, MobileNet, DeepLabV3, HRNet):
  • Weights and activations follow a Gaussian distribution without extreme outliers
  • INT8 (fixed point) has uniform precision across its whole range, which suits these "well-behaved" distributions
  • FP8 (floating point) spends bits on the exponent, leaving less precision for the bulk of values, which sit near zero
In transformer models (ViT, BERT):
  • Significant outliers appear in specific layers (the attention mechanism)
  • FP8-E4 (4 exponent bits) handles outliers better thanks to its larger dynamic range
  • Without adaptation, INT8 loses accuracy because the outliers get clipped (see the sketch below)
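The precision-versus-range trade-off is easy to see numerically. Below is a minimal NumPy sketch (not from the paper; the helper approximates FP8-E4M3 by clipping to ±448 and keeping 4 significant bits, ignoring subnormals and per-tensor FP8 scaling) comparing quantization error on Gaussian data versus data with a few large outliers:

```python
import numpy as np

def quant_int8(x):
    # Symmetric per-tensor INT8: one uniform step derived from max |x|,
    # so a single outlier stretches the step for every value.
    scale = np.abs(x).max() / 127.0
    return np.clip(np.round(x / scale), -127, 127) * scale

def quant_fp8_e4m3(x):
    # Rough FP8-E4M3 rounding: clip to the format's max (+-448) and keep
    # 4 significant binary digits (1 implicit + 3 mantissa bits).
    x = np.clip(x, -448.0, 448.0)
    mant, exp = np.frexp(x)              # x = mant * 2**exp, |mant| in [0.5, 1)
    return np.ldexp(np.round(mant * 16.0) / 16.0, exp)

rng = np.random.default_rng(0)
gaussian = rng.normal(0.0, 1.0, 100_000)                        # CNN-like values
spiky = np.concatenate([gaussian, rng.normal(0.0, 40.0, 100)])  # ViT-like values

for name, data in [("Gaussian (CNN-like)", gaussian), ("outliers (ViT-like)", spiky)]:
    mse_i = np.mean((data - quant_int8(data)) ** 2)
    mse_f = np.mean((data - quant_fp8_e4m3(data)) ** 2)
    print(f"{name:22s} INT8 MSE={mse_i:.1e}  FP8-E4M3 MSE={mse_f:.1e}")
```

On the Gaussian tensor INT8 wins (its uniform step is finer than FP8's 4 significant bits), while the handful of outliers inflates the INT8 step by an order of magnitude and flips the ranking, mirroring the CNN-versus-transformer split above.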

2. Hardware efficiency (roughly a 50% difference)

FP8 requires significantly more complex logic in hardware:
  • FP8 multiplication and accumulation need more logic gates in silicon
  • Hardware synthesis results show that FP8 MAC (multiply-accumulate) units are 40-50% less efficient in area and energy than INT8 units
  • This means the same chip will run slower with FP8 than with INT8 when the workload is compute-bound

3. Results on real models

In the aforementioned study, conducted on ImageNet and other datasets:

| Model | INT8 QAT | FP8-E4 QAT | Reason |
|---|---|---|---|
| ResNet18 | 70.43% | 69.35% | Gaussian distribution |
| MobileNetV2 | 71.82% | 70.89% | Regularized weights |
| DeepLabV3 | 73.99% | 73.22% | Semantic segmentation |
| HRNet | 81.27% | 81.06% | High resolution |
| ViT (Vision Transformer) | 83.26% | 83.91% | Outliers in attention |

4. Quantization-Aware Training (QAT) effects

When networks are trained with quantization in the loop (a sketch follows below):
  • INT8 benefits significantly because the outliers get trained away (the network learns to work without them)
  • FP8 also works, but shows no advantage over INT8 on vision tasks
  • Even when a network is trained in FP8-E4, converting it back to INT8 often improves results for CNNs because the data is represented better
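For intuition, here is a minimal PyTorch sketch of QAT's core trick: fake quantization with a straight-through estimator (STE). The class name and tensor shapes are illustrative, not from the paper:

```python
import torch

class FakeQuantINT8(torch.autograd.Function):
    """Forward rounds values to the INT8 grid; backward pretends the
    rounding was the identity (STE), so gradients keep flowing."""

    @staticmethod
    def forward(ctx, x):
        scale = x.abs().max() / 127.0
        return torch.clamp(torch.round(x / scale), -127, 127) * scale

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # STE: pass the gradient straight through

# The weights see INT8 rounding noise at every step, so the optimizer
# converges to weights that survive real INT8 quantization at deploy time.
w = torch.randn(64, 64, requires_grad=True)
loss = (FakeQuantINT8.apply(w) ** 2).sum()
loss.backward()  # gradients reach w thanks to the STE
```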

5. Memory bandwidth

For image analysis at high resolutions:
  • INT8 enables more efficient data transfer (lower bandwidth demand)
  • FP8 often requires keeping activations in FP16 to preserve accuracy, which erodes the format's size advantage (see the estimate below)
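A back-of-the-envelope estimate (with a hypothetical feature-map size) shows how the bandwidth advantage evaporates once FP8 activations fall back to FP16:

```python
# Bytes moved for one hypothetical 224x224x64 activation tensor.
h, w, c = 224, 224, 64
for fmt, bytes_per_value in [("INT8 activations", 1),
                             ("FP8 activations", 1),
                             ("FP16 fallback in FP8 pipelines", 2)]:
    print(f"{fmt:32s} {h * w * c * bytes_per_value / 1e6:5.1f} MB")
```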

Conclusion

For classic image analysis (CNNs): INT8 is superior because it:
  1. Represents Gaussian, outlier-free distributions better
  2. Is significantly more efficient in hardware (faster inference)
  3. Is easier to quantize without accuracy loss
For Vision Transformers: FP8-E4 can be better in a "quick" post-training quantization scenario because of the outliers, but with techniques such as W8A16 (8-bit weights, 16-bit activations) for the problematic layers, or with QAT, INT8 can reach the same accuracy at higher efficiency.
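As a rough illustration of the W8A16 idea, here is a small PyTorch sketch (the function name and shapes are made up): weights are quantized to INT8 per output channel, while activations stay in higher precision so their outliers are never clipped:

```python
import torch

def w8a16_linear(x, weight):
    # Per-output-channel INT8 weight quantization; activations stay in
    # 16/32-bit, so outliers in them are not clipped to an INT8 grid.
    scale = weight.abs().amax(dim=1, keepdim=True) / 127.0
    w_int8 = torch.clamp(torch.round(weight / scale), -127, 127)
    w_deq = w_int8 * scale       # dequantize on the fly (real kernels fuse this)
    return x @ w_deq.t()

x = torch.randn(8, 512)          # higher-precision activations
weight = torch.randn(256, 512)
y = w8a16_linear(x, weight)      # only the weights carry INT8 rounding error
```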
The same study concludes that for edge/embedded scenarios (phones, cameras, IoT), INT8 (and even INT4 for weights) remains the better choice over FP8 for computer vision tasks.
Reference: van Baalen et al., "FP8 versus INT8 for efficient deep learning inference", 2023. The paper analyzes in detail the hardware implementation and accuracy of both formats across dozens of models.