- The performance data were collected at the maximum CPU and NPU frequencies of each platform.
- The script for setting the frequencies is located in the scripts directory.
- All models should be converted with optimization_level set to 0 to enable optimized runtime performance.

| Model | Model Size | Dtype | Seqlen | New_tokens | TTFT(ms) | Tokens/s | Memory(MB) |
|---|---|---|---|---|---|---|---|
| Qwen2 | 0.5B | w8a8 | 128 | 64 | 143.83 | 42.58 | 654.26 |
| MiniCPM4 | 0.5B | w8a8 | 128 | 64 | 128.46 | 45.13 | 524.55 |
| Qwen3 | 0.6B | w8a8 | 128 | 64 | 213.50 | 32.16 | 773.77 |
| TinyLLAMA | 1.1B | w8a8 | 128 | 64 | 239.00 | 24.49 | 1085.21 |
| Qwen2.5 | 1.5B | w8a8 | 128 | 64 | 412.27 | 16.32 | 1659.15 |
| RWKV7 | 1.5B | w8a8 | 128 | 64 | 788.00 | 13.33 | 1450.29 |
| InternLM2 | 1.8B | w8a8 | 128 | 64 | 374.00 | 15.58 | 1765.71 |
| Gemma2 | 2B | w8a8 | 128 | 64 | 679.90 | 9.80 | 2765.30 |
| Gemma3n | 2B | w8a8 | 128 | 64 | 1220.40 | 9.46 | 2709.25 |
| TeleChat2 | 3B | w8a8 | 128 | 64 | 649.60 | 10.22 | 2777.00 |
| Phi3 | 3.8B | w8a8 | 128 | 64 | 1022.00 | 7.50 | 3747.73 |
| MiniCPM3 | 4B | w8a8 | 128 | 64 | 1385.92 | 5.99 | 4339.61 |
| ChatGLM3 | 6B | w8a8 | 128 | 64 | 1395.34 | 4.94 | 5976.43 |
| Qwen3-VL | 2B | w8a8 | 128 | 64 | 391 | 15.12 | 1892.13 |
| DeepSeekOCR | 3B(A570M) | w8a8 | 128 | 64 | 696.21 | 31.81 | 3028.66 |
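
The TTFT and Tokens/s columns can be combined into a rough end-to-end generation time. The sketch below is only an illustration and is not part of the benchmark tooling: the function name `estimate_total_latency_s` is hypothetical, and it assumes the first token arrives at TTFT and the remaining `New_tokens - 1` tokens are produced at the reported decode rate.

```python
def estimate_total_latency_s(ttft_ms: float, tokens_per_s: float, new_tokens: int) -> float:
    """Estimate seconds needed to generate `new_tokens` tokens.

    Assumption (not stated in the tables): the first token is emitted at
    TTFT, and every later token is emitted at the steady decode rate.
    """
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    if new_tokens < 1:
        raise ValueError("new_tokens must be at least 1")
    decode_s = (new_tokens - 1) / tokens_per_s  # time spent on tokens after the first
    return ttft_ms / 1000.0 + decode_s


if __name__ == "__main__":
    # Values taken from the Qwen2 0.5B w8a8 row above.
    total = estimate_total_latency_s(ttft_ms=143.83, tokens_per_s=42.58, new_tokens=64)
    print(f"Estimated time for 64 new tokens: {total:.2f} s")
```

Under these assumptions, the Qwen2 0.5B w8a8 row works out to roughly 1.6 s for 64 new tokens. Treat this as a back-of-the-envelope figure, since real runs also include tokenization and sampling overhead.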

| Model | Model Size | Dtype | Seqlen | New_tokens | TTFT(ms) | Tokens/s | Memory(MB) |
|---|---|---|---|---|---|---|---|
| Qwen2 | 0.5B | w4a16 | 128 | 64 | 327.72 | 34.24 | 426.24 |
| | 0.5B | w4a16_g128 | 128 | 64 | 363.58 | 33.22 | 445.95 |
| | 0.5B | w8a8 | 128 | 64 | 334.26 | 22.95 | 661.1 |
| MiniCPM4 | 0.5B | w4a16 | 128 | 64 | 348.87 | 35.8 | 322.41 |
| | 0.5B | w4a16_g128 | 128 | 64 | 371.96 | 32.88 | 362.23 |
| | 0.5B | w8a8 | 128 | 64 | 337.52 | 23.71 | 528.96 |
| Qwen3 | 0.6B | w4a16 | 128 | 64 | 482.82 | 25.16 | 495.99 |
| | 0.6B | w4a16_g128 | 128 | 64 | 512.36 | 24.3 | 528.48 |
| | 0.6B | w8a8 | 128 | 64 | 448.94 | 17.09 | 779.62 |
| TinyLLAMA | 1.1B | w4a16 | 128 | 64 | 517.82 | 21.32 | 591 |
| | 1.1B | w4a16_g128 | 128 | 64 | 658.78 | 18.89 | 681 |
| | 1.1B | w8a8 | 128 | 64 | 537.82 | 12.63 | 1082.83 |
| RWKV7 | 1.5B | w4a16 | 128 | 64 | 1779.65 | 9.96 | 799.89 |
| | 1.5B | w4a16_g128 | 128 | 64 | 1877.95 | 9.37 | 890.16 |
| | 1.5B | w8a8 | 128 | 64 | 1718.8 | 6.96 | 1458.48 |
| InternLM2 | 1.8B | w4a16 | 128 | 64 | 771.6 | 13.65 | 966.12 |
| | 1.8B | w4a16_g128 | 128 | 64 | 1001.23 | 12.18 | 1061.57 |
| | 1.8B | w8a8 | 128 | 64 | 777.86 | 7.91 | 1773.23 |
| Gemma2 | 2B | w4a16 | 128 | 64 | 1119.51 | 8.45 | 1529.03 |
| | 2B | w4a16_g128 | 128 | 64 | 1407.31 | 7.76 | 1616.45 |
| | 2B | w8a8 | 128 | 64 | 1052.77 | 5.01 | 2771.54 |
| Gemma-3n | 2B | w4a16 | 128 | 64 | 3187 | 7.38 | 1574.34 |
| | 2B | w8a8 | 128 | 64 | 3229.16 | 4.75 | 2722.76 |
| TeleChat2 | 3B | w4a16 | 128 | 64 | 1143.73 | 9.05 | 1514.98 |
| | 3B | w4a16_g128 | 128 | 64 | 1422.38 | 7.91 | 1633.54 |
| | 3B | w8a8 | 128 | 64 | 1035.37 | 5.15 | 2783.73 |
| Phi3 | 3.8B | w4a16 | 128 | 64 | 1800.92 | 6.52 | 1985.75 |
| | 3.8B | w4a16_g128 | 128 | 64 | 2236.9 | 5.96 | 2141.89 |
| | 3.8B | w8a8 | 128 | 64 | 1591.59 | 3.76 | 3757.22 |
| MiniCPM3 | 4B | w4a16 | 128 | 64 | 2484.63 | 4.94 | 2336.73 |
| | 4B | w4a16_g128 | 128 | 64 | 3053.52 | 4.49 | 2618.14 |
| | 4B | w8a8 | 128 | 64 | 2509.27 | 3.04 | 4366.85 |
| ChatGLM3 | 6B | w4a16 | 128 | 64 | 2121.26 | 4.7 | 3014.38 |
| | 6B | w4a16_g128 | 128 | 64 | 2958.88 | 4.03 | 3244.15 |
| | 6B | w8a8 | 128 | 64 | 1920.97 | 2.5 | 5958.65 |
| Qwen3-VL | 2B | w4a16 | 128 | 64 | 791.20 | 12.88 | 1082.65 |
| | 2B | w4a16_g128 | 128 | 64 | 1026.31 | 11.62 | 1170.89 |
| | 2B | w8a8 | 128 | 64 | 799.09 | 7.67 | 1900.80 |
| DeepSeekOCR | 3B(A570M) | w4a16 | 128 | 64 | 1010.15 | 24.85 | 1756.13 |
| | 3B(A570M) | w8a8 | 128 | 64 | 1312.00 | 16.21 | 3072.33 |

| Model | Model Size | Dtype | Seqlen | New_tokens | TTFT(ms) | Tokens/s |
|---|---|---|---|---|---|---|
| Qwen2 | 0.5B | w4a16 | 128 | 64 | 650.69 | 21.43 |
| | 0.5B | w4a16_g128 | 128 | 64 | 679.78 | 18.18 |
| | 0.5B | w8a8 | 128 | 64 | 636.90 | 13.91 |
| MiniCPM4 | 0.5B | w4a16 | 128 | 64 | 654.20 | 22.97 |
| | 0.5B | w4a16_g128 | 128 | 64 | 691.57 | 18.78 |
| | 0.5B | w8a8 | 128 | 64 | 663.41 | 15.12 |
| Qwen3 | 0.6B | w4a16 | 128 | 64 | 955.94 | 15.41 |
| | 0.6B | w4a16_g128 | 128 | 64 | 1019.94 | 12.60 |
| | 0.6B | w8a8 | 128 | 64 | 945.18 | 10.55 |

| Model | Stage | RK3588(w8a8) | RK3576(w4a16) |
|---|---|---|---|
| Qwen2-VL-2B | img-encoder(392*392) | 3.28s | 3.55s |
| | Prefill(len=196) | 632.6ms | 1234.9ms |
| | Decode | 16.6 tokens/s | 14.57 tokens/s |
| Qwen2.5-VL-3B | img-encoder(392*392) | 2.93s | 2.87s |
| | Prefill(len=196) | 1120ms | 2130ms |
| | Decode | 8.66 tokens/s | 7.87 tokens/s |
| MiniCPM-V-2_6 | img-encoder(448*448) | 3.27s | 2.4s |
| | Prefill(len=64) | 826ms | 1230ms |
| | Decode | 4.18 tokens/s | 3.85 tokens/s |
| SmolVLM-256M | img-encoder(512*512) | 842ms | 768ms |
| | Prefill(len=128) | 77.3ms | 180ms |
| | Decode | 78 tokens/s | 57.73 tokens/s |
| Qwen3-VL-2B | img-encoder(448*448) | 2.08s | 1.61s |
| | Prefill(len=196) | 649ms | 1587ms |
| | Decode | 14.91 tokens/s | 10.36 tokens/s |
| DeepSeekOCR-3B(A570M) | img-encoder(448*448) | 2.09s | 2.27ms |
| | Prefill(len=128) | 696ms | 1010ms |
| | Decode | 31.8 tokens/s | 22.3 tokens/s |
- The img-encoder runs inference on RKNN with FP16, tested using all NPU cores.

The benchmark and inference data in this section are sourced from the official Rockchip RKNN Model Zoo. They show the performance of various LLMs and VLMs on Rockchip NPU platforms using RKNN Toolkit2.
Source: