DiDi Cloud A100 40G + TensorFlow 1.15.2 + Ubuntu 18.04 Performance Test
Today I got access to the beta A100 instance on DiDi Cloud (referral code: 8888) and ran the TensorFlow benchmarks. The results are recorded below.
Runtime environment
Platform: DiDi Cloud
OS: Ubuntu 18.04
GPU: A100-SXM4-40GB
Python version: 3.6
TensorFlow version: 1.15.2 (NVIDIA build)
Test method
All tests use the tf_cnn_benchmarks script from the TensorFlow benchmarks repository:
https://github.com/tensorflow/benchmarks
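The runs below cover seven models, each once in FP32 and once with --use_fp16, always on a single GPU. A minimal Python sketch that loops over the same (model, batch size) pairs is shown here. It assumes it is launched from the directory that contains tf_cnn_benchmarks.py (scripts/tf_cnn_benchmarks/ in the repository above); the helper names build_commands and run_all are hypothetical, not part of the benchmarks repo.

```python
import subprocess
import sys

# (model, batch_size) pairs as used in this post.
CONFIGS = [
    ("resnet50_v1.5", 64),
    ("resnet50", 64),
    ("alexnet", 512),
    ("inception3", 64),
    ("vgg16", 64),
    ("googlenet", 128),
    ("resnet152", 32),
]


def build_commands(script="tf_cnn_benchmarks.py", num_gpus=1):
    """Hypothetical helper: one argv list per run, FP32 first, then --use_fp16."""
    commands = []
    for model, batch_size in CONFIGS:
        base = [
            sys.executable, script,
            f"--num_gpus={num_gpus}",
            f"--batch_size={batch_size}",
            f"--model={model}",
        ]
        commands.append(base)
        commands.append(base + ["--use_fp16"])  # new list, base is not modified
    return commands


def run_all(script="tf_cnn_benchmarks.py"):
    """Hypothetical helper: run every configuration in order, stopping on failure."""
    for argv in build_commands(script):
        print("$", " ".join(argv))
        subprocess.run(argv, check=True)
```

Calling run_all() executes the 14 runs one after another; build_commands() alone is handy for checking the exact flags before spending GPU time.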
ResNet50 v1.5 BS64
python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50_v1.5
Step Img/sec total_loss
1 images/sec: 602.4 +/- 0.0 (jitter = 0.0) 7.847
10 images/sec: 606.8 +/- 1.2 (jitter = 5.4) 8.053
20 images/sec: 606.3 +/- 0.8 (jitter = 4.4) 8.102
30 images/sec: 605.8 +/- 0.8 (jitter = 3.8) 8.117
40 images/sec: 606.2 +/- 0.7 (jitter = 3.8) 7.893
50 images/sec: 606.1 +/- 0.5 (jitter = 3.0) 7.919
60 images/sec: 606.2 +/- 0.5 (jitter = 2.9) 8.104
70 images/sec: 606.6 +/- 0.5 (jitter = 2.9) 7.985
80 images/sec: 606.6 +/- 0.4 (jitter = 2.8) 7.805
90 images/sec: 606.6 +/- 0.4 (jitter = 2.8) 7.973
100 images/sec: 606.7 +/- 0.4 (jitter = 2.8) 7.644
----------------------------------------------------------------
total images/sec: 606.23
----------------------------------------------------------------
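Each run prints a line at step 1 and then every 10 steps, followed by a final total. To collect the numbers from many runs, a small parser for this text format can help. The sketch below is written against the output shown in this post; the regular expressions and the parse_log name are my own, not something tf_cnn_benchmarks provides.

```python
import re

# Per-step line, e.g. "10 images/sec: 606.8 +/- 1.2 (jitter = 5.4) 8.053".
# The loss field may be "nan" (as in the FP32 AlexNet run).
STEP_RE = re.compile(
    r"^(\d+)\s+images/sec:\s+([\d.]+)\s+\+/-\s+([\d.]+)\s+"
    r"\(jitter = ([\d.]+)\)\s+(\S+)$"
)
TOTAL_RE = re.compile(r"total images/sec:\s+([\d.]+)")


def parse_log(text):
    """Return (steps, total) parsed from tf_cnn_benchmarks console output.

    steps is a list of dicts, one per reported step; total is the final
    'total images/sec' value, or None if the run did not finish.
    """
    steps = []
    for line in text.splitlines():
        m = STEP_RE.match(line.strip())
        if m:
            steps.append({
                "step": int(m.group(1)),
                "mean": float(m.group(2)),
                "uncertainty": float(m.group(3)),
                "jitter": float(m.group(4)),
                "total_loss": float(m.group(5)),  # float("nan") accepts "nan"
            })
    t = TOTAL_RE.search(text)
    total = float(t.group(1)) if t else None
    return steps, total
```

Both tab- and space-separated output are accepted, since the pattern uses \s+ between fields.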
--use_fp16
python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50_v1.5 --use_fp16
Step Img/sec total_loss
1 images/sec: 1327.1 +/- 0.0 (jitter = 0.0) 7.972
10 images/sec: 1321.2 +/- 5.7 (jitter = 27.6) 7.885
20 images/sec: 1323.5 +/- 4.4 (jitter = 25.9) 8.073
30 images/sec: 1323.6 +/- 3.7 (jitter = 27.3) 7.934
40 images/sec: 1322.1 +/- 3.3 (jitter = 32.9) 8.102
50 images/sec: 1321.4 +/- 3.0 (jitter = 27.7) 7.876
60 images/sec: 1322.2 +/- 2.8 (jitter = 32.3) 7.883
70 images/sec: 1322.3 +/- 2.5 (jitter = 32.6) 7.962
80 images/sec: 1324.0 +/- 2.4 (jitter = 32.2) 8.049
90 images/sec: 1324.2 +/- 2.2 (jitter = 31.2) 7.909
100 images/sec: 1325.1 +/- 2.1 (jitter = 29.6) 7.874
----------------------------------------------------------------
total images/sec: 1322.76
----------------------------------------------------------------
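In this FP16 run, jitter is around 30 images/sec while the +/- value is only about 2. The post does not say how these columns are computed. As far as I can tell from get_perf_timing in benchmark_cnn.py (treat this as an assumption and check the source), the mean is the batch size divided by the mean step time, +/- is the standard error of the per-step throughputs (population standard deviation divided by sqrt(n)), and jitter is 1.4826 times their median absolute deviation. Under that reading, +/- shrinks as more steps are recorded, while jitter measures the step-to-step spread and does not shrink. The standard-library sketch below illustrates this; perf_timing is a hypothetical name.

```python
import math
import statistics


def perf_timing(batch_size, step_times):
    """Sketch of the throughput statistics tf_cnn_benchmarks appears to print.

    step_times: wall-clock seconds per training step (warm-up excluded).
    Returns (mean images/sec, uncertainty, jitter).
    """
    speeds = [batch_size / t for t in step_times]
    # Mean throughput: batch size over the mean step time,
    # not the mean of the per-step speeds.
    speed_mean = batch_size / statistics.mean(step_times)
    # Standard error of the per-step speeds (population std / sqrt(n)).
    uncertainty = statistics.pstdev(speeds) / math.sqrt(len(speeds))
    # Scaled median absolute deviation: a robust estimate of spread.
    med = statistics.median(speeds)
    jitter = 1.4826 * statistics.median(abs(s - med) for s in speeds)
    return speed_mean, uncertainty, jitter
```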
ResNet50 BS64
python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50
Step Img/sec total_loss
1 images/sec: 653.5 +/- 0.0 (jitter = 0.0) 8.219
10 images/sec: 646.2 +/- 2.0 (jitter = 6.0) 7.879
20 images/sec: 646.1 +/- 1.4 (jitter = 7.2) 7.909
30 images/sec: 646.0 +/- 1.2 (jitter = 6.0) 7.820
40 images/sec: 646.2 +/- 1.0 (jitter = 6.3) 8.006
50 images/sec: 646.0 +/- 1.0 (jitter = 8.6) 7.769
60 images/sec: 646.0 +/- 0.9 (jitter = 8.6) 8.114
70 images/sec: 645.7 +/- 0.9 (jitter = 9.5) 7.811
80 images/sec: 645.8 +/- 0.8 (jitter = 9.5) 7.979
90 images/sec: 645.8 +/- 0.8 (jitter = 8.0) 8.095
100 images/sec: 645.8 +/- 0.7 (jitter = 6.4) 8.038
----------------------------------------------------------------
total images/sec: 645.26
----------------------------------------------------------------
--use_fp16
python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50 --use_fp16
Step Img/sec total_loss
1 images/sec: 1300.1 +/- 0.0 (jitter = 0.0) 8.101
10 images/sec: 1310.1 +/- 7.5 (jitter = 7.4) 7.758
20 images/sec: 1309.7 +/- 8.0 (jitter = 42.3) 7.912
30 images/sec: 1315.0 +/- 5.9 (jitter = 32.1) 7.776
40 images/sec: 1315.5 +/- 4.7 (jitter = 28.2) 7.918
50 images/sec: 1317.5 +/- 3.9 (jitter = 27.7) 7.895
60 images/sec: 1316.5 +/- 3.4 (jitter = 18.6) 7.711
70 images/sec: 1317.3 +/- 3.1 (jitter = 16.1) 8.008
80 images/sec: 1316.9 +/- 2.8 (jitter = 11.4) 7.777
90 images/sec: 1317.7 +/- 2.6 (jitter = 11.8) 7.808
100 images/sec: 1317.1 +/- 2.4 (jitter = 9.9) 8.036
----------------------------------------------------------------
total images/sec: 1315.11
----------------------------------------------------------------
AlexNet BS512
python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=512 --model=alexnet
Step Img/sec total_loss
1 images/sec: 8294.2 +/- 0.0 (jitter = 0.0) nan
10 images/sec: 8290.2 +/- 1.6 (jitter = 5.3) nan
20 images/sec: 8290.6 +/- 1.0 (jitter = 3.7) nan
30 images/sec: 8290.8 +/- 0.7 (jitter = 2.8) nan
40 images/sec: 8291.3 +/- 0.6 (jitter = 2.7) nan
50 images/sec: 8289.8 +/- 1.4 (jitter = 2.9) nan
60 images/sec: 8290.2 +/- 1.2 (jitter = 2.9) nan
70 images/sec: 8290.4 +/- 1.3 (jitter = 3.6) nan
80 images/sec: 8291.1 +/- 1.1 (jitter = 3.5) nan
90 images/sec: 8291.9 +/- 1.0 (jitter = 4.4) nan
100 images/sec: 8291.9 +/- 1.1 (jitter = 5.2) nan
----------------------------------------------------------------
total images/sec: 8282.46
----------------------------------------------------------------
--use_fp16
python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=512 --model=alexnet --use_fp16
Step Img/sec total_loss
1 images/sec: 10618.6 +/- 0.0 (jitter = 0.0) 7.250
10 images/sec: 10607.7 +/- 4.4 (jitter = 16.3) 7.251
20 images/sec: 10602.5 +/- 3.0 (jitter = 13.1) 7.251
30 images/sec: 10604.1 +/- 2.3 (jitter = 11.2) 7.251
40 images/sec: 10601.0 +/- 2.5 (jitter = 13.4) 7.251
50 images/sec: 10601.7 +/- 2.5 (jitter = 13.8) 7.251
60 images/sec: 10603.0 +/- 2.2 (jitter = 14.0) 7.250
70 images/sec: 10605.1 +/- 2.1 (jitter = 12.5) 7.251
80 images/sec: 10605.4 +/- 1.9 (jitter = 12.2) 7.251
90 images/sec: 10605.4 +/- 1.7 (jitter = 12.1) 7.251
100 images/sec: 10605.8 +/- 1.7 (jitter = 12.3) 7.251
----------------------------------------------------------------
total images/sec: 10587.67
----------------------------------------------------------------
Inception v3 BS64
python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=inception3
Step Img/sec total_loss
1 images/sec: 436.8 +/- 0.0 (jitter = 0.0) 7.276
10 images/sec: 437.9 +/- 1.2 (jitter = 0.8) 7.337
20 images/sec: 437.8 +/- 1.0 (jitter = 2.2) 7.269
30 images/sec: 437.9 +/- 0.8 (jitter = 2.2) 7.422
40 images/sec: 437.9 +/- 0.6 (jitter = 3.5) 7.299
50 images/sec: 438.6 +/- 0.6 (jitter = 4.1) 7.277
60 images/sec: 439.2 +/- 0.5 (jitter = 3.7) 7.363
70 images/sec: 439.5 +/- 0.5 (jitter = 4.8) 7.347
80 images/sec: 440.3 +/- 0.5 (jitter = 5.3) 7.410
90 images/sec: 440.3 +/- 0.5 (jitter = 5.2) 7.325
100 images/sec: 440.3 +/- 0.4 (jitter = 5.0) 7.346
----------------------------------------------------------------
total images/sec: 440.01
----------------------------------------------------------------
--use_fp16
python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=inception3 --use_fp16
Step Img/sec total_loss
1 images/sec: 901.5 +/- 0.0 (jitter = 0.0) 7.305
10 images/sec: 945.5 +/- 7.0 (jitter = 5.0) 7.354
20 images/sec: 945.6 +/- 4.9 (jitter = 7.1) 7.330
30 images/sec: 945.3 +/- 3.9 (jitter = 6.9) 7.382
40 images/sec: 946.3 +/- 3.2 (jitter = 7.3) 7.278
50 images/sec: 946.6 +/- 2.8 (jitter = 7.5) 7.373
60 images/sec: 946.3 +/- 2.5 (jitter = 7.6) 7.299
70 images/sec: 946.8 +/- 2.3 (jitter = 7.5) 7.323
80 images/sec: 946.5 +/- 2.1 (jitter = 7.6) 7.317
90 images/sec: 946.6 +/- 2.0 (jitter = 7.6) 7.357
100 images/sec: 947.2 +/- 1.8 (jitter = 7.3) 7.327
----------------------------------------------------------------
total images/sec: 946.03
----------------------------------------------------------------
VGG16 BS64
python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=vgg16
Step Img/sec total_loss
1 images/sec: 442.1 +/- 0.0 (jitter = 0.0) 7.321
10 images/sec: 442.4 +/- 0.1 (jitter = 0.4) 7.315
20 images/sec: 442.4 +/- 0.1 (jitter = 0.3) 7.269
30 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.271
40 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.282
50 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.291
60 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.250
70 images/sec: 442.4 +/- 0.1 (jitter = 0.2) 7.278
80 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.274
90 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.286
100 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.283
----------------------------------------------------------------
total images/sec: 442.20
----------------------------------------------------------------
--use_fp16
python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=vgg16 --use_fp16
Step Img/sec total_loss
1 images/sec: 687.4 +/- 0.0 (jitter = 0.0) 7.279
10 images/sec: 688.2 +/- 0.2 (jitter = 0.5) 7.255
20 images/sec: 688.0 +/- 0.1 (jitter = 0.5) 7.283
30 images/sec: 688.0 +/- 0.1 (jitter = 0.7) 7.254
40 images/sec: 687.9 +/- 0.1 (jitter = 0.7) 7.283
50 images/sec: 687.8 +/- 0.1 (jitter = 0.7) 7.249
60 images/sec: 687.7 +/- 0.1 (jitter = 0.8) 7.294
70 images/sec: 687.6 +/- 0.1 (jitter = 0.9) 7.278
80 images/sec: 687.6 +/- 0.1 (jitter = 0.9) 7.268
90 images/sec: 687.7 +/- 0.1 (jitter = 0.9) 7.264
100 images/sec: 687.6 +/- 0.1 (jitter = 0.9) 7.268
----------------------------------------------------------------
total images/sec: 687.07
----------------------------------------------------------------
GoogLeNet BS128
python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=128 --model=googlenet
Step Img/sec total_loss
1 images/sec: 1577.4 +/- 0.0 (jitter = 0.0) 7.104
10 images/sec: 1565.9 +/- 4.1 (jitter = 12.5) 7.105
20 images/sec: 1561.7 +/- 3.1 (jitter = 20.4) 7.094
30 images/sec: 1562.3 +/- 2.5 (jitter = 15.1) 7.087
40 images/sec: 1561.5 +/- 2.2 (jitter = 16.1) 7.067
50 images/sec: 1561.6 +/- 2.0 (jitter = 15.6) 7.091
60 images/sec: 1561.5 +/- 1.8 (jitter = 15.7) 7.049
70 images/sec: 1560.3 +/- 1.9 (jitter = 15.3) 7.074
80 images/sec: 1558.8 +/- 1.9 (jitter = 17.2) 7.077
90 images/sec: 1558.2 +/- 1.8 (jitter = 17.2) 7.079
100 images/sec: 1557.5 +/- 1.8 (jitter = 17.6) 7.066
----------------------------------------------------------------
total images/sec: 1556.06
----------------------------------------------------------------
--use_fp16
python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=128 --model=googlenet --use_fp16
Step Img/sec total_loss
1 images/sec: 2690.1 +/- 0.0 (jitter = 0.0) 7.173
10 images/sec: 2675.3 +/- 13.9 (jitter = 35.5) 7.068
20 images/sec: 2682.4 +/- 9.9 (jitter = 55.4) 7.086
30 images/sec: 2686.6 +/- 8.3 (jitter = 36.6) 7.075
40 images/sec: 2687.8 +/- 6.9 (jitter = 30.6) 7.084
50 images/sec: 2686.7 +/- 6.0 (jitter = 36.4) 7.076
60 images/sec: 2687.5 +/- 5.4 (jitter = 36.4) 7.075
70 images/sec: 2681.0 +/- 6.8 (jitter = 41.6) 7.075
80 images/sec: 2683.2 +/- 6.1 (jitter = 34.0) 7.065
90 images/sec: 2684.1 +/- 5.6 (jitter = 35.6) 7.092
100 images/sec: 2683.9 +/- 5.2 (jitter = 36.1) 7.052
----------------------------------------------------------------
total images/sec: 2680.27
----------------------------------------------------------------
ResNet152 BS32
python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=resnet152
Step Img/sec total_loss
1 images/sec: 225.6 +/- 0.0 (jitter = 0.0) 9.060
10 images/sec: 228.3 +/- 1.0 (jitter = 2.0) 8.594
20 images/sec: 228.3 +/- 0.6 (jitter = 2.0) 8.635
30 images/sec: 228.2 +/- 0.5 (jitter = 2.5) 8.719
40 images/sec: 227.9 +/- 0.5 (jitter = 2.8) 8.599
50 images/sec: 228.1 +/- 0.5 (jitter = 2.9) 8.791
60 images/sec: 228.3 +/- 0.4 (jitter = 3.6) 8.668
70 images/sec: 228.3 +/- 0.4 (jitter = 3.3) 9.072
80 images/sec: 228.3 +/- 0.4 (jitter = 3.5) 8.874
90 images/sec: 228.4 +/- 0.3 (jitter = 3.7) 9.030
100 images/sec: 228.4 +/- 0.3 (jitter = 3.7) 8.839
----------------------------------------------------------------
total images/sec: 228.29
----------------------------------------------------------------
--use_fp16
python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=resnet152 --use_fp16
Step Img/sec total_loss
1 images/sec: 392.9 +/- 0.0 (jitter = 0.0) 9.147
10 images/sec: 397.9 +/- 2.8 (jitter = 6.0) 9.000
20 images/sec: 399.0 +/- 2.1 (jitter = 8.6) 8.842
30 images/sec: 393.7 +/- 2.9 (jitter = 14.7) 8.813
40 images/sec: 394.4 +/- 2.3 (jitter = 15.2) 8.984
50 images/sec: 394.9 +/- 2.0 (jitter = 13.9) 8.647
60 images/sec: 395.7 +/- 1.8 (jitter = 13.9) 8.838
70 images/sec: 396.5 +/- 1.6 (jitter = 15.3) 8.941
80 images/sec: 395.9 +/- 1.4 (jitter = 13.4) 8.913
90 images/sec: 396.2 +/- 1.3 (jitter = 14.1) 8.807
100 images/sec: 395.7 +/- 1.3 (jitter = 14.5) 8.729
----------------------------------------------------------------
total images/sec: 395.34
----------------------------------------------------------------
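To compare the FP32 and --use_fp16 runs side by side, the short script below divides each FP16 "total images/sec" by the matching FP32 value. The numbers are copied from the runs above; the RESULTS dictionary and the fp16_speedups name are just for illustration.

```python
# "total images/sec" from the runs above: (FP32, --use_fp16).
RESULTS = {
    "resnet50_v1.5 BS64": (606.23, 1322.76),
    "resnet50 BS64": (645.26, 1315.11),
    "alexnet BS512": (8282.46, 10587.67),
    "inception3 BS64": (440.01, 946.03),
    "vgg16 BS64": (442.20, 687.07),
    "googlenet BS128": (1556.06, 2680.27),
    "resnet152 BS32": (228.29, 395.34),
}


def fp16_speedups(results):
    """Return {run name: FP16 throughput / FP32 throughput}."""
    return {name: fp16 / fp32 for name, (fp32, fp16) in results.items()}


for name, ratio in fp16_speedups(RESULTS).items():
    print(f"{name:20s} {ratio:.2f}x")
```

On these numbers, --use_fp16 roughly doubles throughput for ResNet50 v1.5 (about 2.18x), ResNet50 and Inception v3, while AlexNet gains only about 1.28x.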
Performance comparison
For a performance comparison of the A100, V100 and RTX 2080 Ti, see:
https://www.tonyisstark.com/383.html