Figure 3. Using gradual pruning and dynamic quantization to control the accuracy-efficiency trade-off (IMAGE)
Caption
The trained model was pruned gradually, with each round removing the lowest weight in each channel. After 8 rounds of pruning, only one element remains per channel (pruned to 1/9 of the original weights). Each of the pruned models is then subjected to dynamic quantization.
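The pruning schedule in the caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes each channel holds 9 weights (for example a flattened 3x3 kernel, which is what "pruned to 1/9" suggests) and that "lowest" means smallest absolute value. The function names `prune_round` and `gradual_prune` are hypothetical, and the dynamic quantization step is not shown.

```python
import numpy as np

def prune_round(weights, mask):
    """Drop the smallest-magnitude surviving weight in each channel.

    weights: array of shape (channels, elements_per_channel)
    mask:    boolean array of the same shape, True where a weight survives
    """
    # Already-pruned positions are set to +inf so argmin never picks them again.
    magnitudes = np.where(mask, np.abs(weights), np.inf)
    drop = np.argmin(magnitudes, axis=1)
    new_mask = mask.copy()
    new_mask[np.arange(weights.shape[0]), drop] = False
    return new_mask

def gradual_prune(weights, rounds):
    """Return the mask after each pruning round (one pruned model per round)."""
    mask = np.ones_like(weights, dtype=bool)
    masks = []
    for _ in range(rounds):
        mask = prune_round(weights, mask)
        masks.append(mask)
    return masks

# Assumed toy layer: 4 channels, 9 weights each (hypothetical sizes).
rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 9))

masks = gradual_prune(weights, rounds=8)
survivors_per_round = [int(m.sum(axis=1)[0]) for m in masks]
pruned_weights = weights * masks[-1]  # final model: one weight left per channel
```

Under these assumptions, each round leaves one fewer weight per channel, so 8 rounds take a 9-element channel down to a single element, and each intermediate mask corresponds to one of the pruned models that would then be quantized.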
Credit
Hot Chips
Usage Restrictions
None
License
Original content