Benchmark test with ResNet50

These experiments use the ResNet50 sample source code from Apple's website, https://developer.apple.com/metal/tensorflow-plugin/.
As the name suggests, the network is deep, so the difference between CPU and GPU may show up clearly.

Code used

Recorded for each machine: the computation time for one epoch, and screenshots of CPU and GPU utilization.

TensorFlow version: tensorflow 2.10.1 on the Windows native environment, 2.12 or 2.13 everywhere else.
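Because the results mix TensorFlow versions, it may help to record which distribution is actually installed on each machine. The following is a minimal sketch using only the standard library. The list of pip distribution names is an assumption (the package name differs between platforms), and `installed_tensorflow` is a hypothetical helper, not part of the benchmark script.

```python
from importlib import metadata

# Assumed pip distribution names; which one is present depends on the platform.
CANDIDATES = (
    "tensorflow",
    "tensorflow-macos",
    "tensorflow-metal",
    "tensorflow-cpu",
    "tensorflow-intel",
)


def installed_tensorflow(names=CANDIDATES):
    """Return {distribution name: version} for each candidate that is installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # not installed under this name
    return found


print(installed_tensorflow() or "no TensorFlow distribution found")
```

Saving this output next to each log would make it easier to tell later whether a timing difference comes from the hardware or from the TensorFlow version.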

import tensorflow as tf

# Load CIFAR-100 (32x32 RGB images, 100 classes).
cifar = tf.keras.datasets.cifar100
(x_train, y_train), (x_test, y_test) = cifar.load_data()

# ResNet50 trained from scratch (no pretrained weights), sized for CIFAR inputs.
model = tf.keras.applications.ResNet50(
    include_top=True,
    weights=None,
    input_shape=(32, 32, 3),
    classes=100,
)

# Labels are integer class IDs and the model outputs softmax probabilities.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=64)
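The `782/782` that appears in every training log in these notes is the number of batches per epoch. A quick check, assuming the standard CIFAR-100 split of 50,000 training images (a property of the dataset, not something stated in the script):

```python
import math

TRAIN_IMAGES = 50_000  # assumption: size of the standard CIFAR-100 training split
BATCH_SIZE = 64        # from model.fit(..., batch_size=64)

# The last, partially filled batch still counts as a step, so round up.
steps_per_epoch = math.ceil(TRAIN_IMAGES / BATCH_SIZE)
print(steps_per_epoch)  # 782
```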

NVIDIA GPU

GTX 980, Windows native

60 sec. GPU utilization was around 80%.

GTX 980, Windows WSL (Ubuntu 22.04)

Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6666 MB memory:
 -> device: 0, name: NVIDIA GeForce GTX 980, pci bus id: 0000:01:00.0, compute capability: 5.2
Epoch 1/5
Loaded cuDNN version 8801
StreamExecutor device (0): NVIDIA GeForce GTX 980, Compute Capability 5.2
disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.

782/782 [==============================] - 83s 69ms/step - loss: 4.8694 - accuracy: 0.0718   
Epoch 2/5
782/782 [==============================] - 53s 67ms/step - loss: 4.0766 - accuracy: 0.1277
Epoch 3/5
782/782 [==============================] - 53s 68ms/step - loss: 3.7435 - accuracy: 0.1677
Epoch 4/5
782/782 [==============================] - 54s 69ms/step - loss: 3.8044 - accuracy: 0.1817
Epoch 5/5
782/782 [==============================] - 54s 69ms/step - loss: 3.4434 - accuracy: 0.2208
54 sec. GPU utilization was around 90%.
Slightly faster than Windows native. Possibly because the TensorFlow version differs?
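The per-epoch figures quoted in these notes were read off the logs by eye. As a rough sketch of how the same number could be pulled out automatically, the snippet below parses the Keras summary lines and takes the median while skipping the first epoch, which is visibly slower in every log here (presumably because of one-time start-up costs). The helper names `epoch_seconds` and `steady_state` are hypothetical, and the median is just one possible convention, so the result can differ slightly from the rounded value quoted in the text.

```python
import re
from statistics import median

# Matches the per-epoch summary lines printed by Keras, e.g.
# "782/782 [=====] - 53s 67ms/step - loss: ..."
LINE = re.compile(r"^\d+/\d+ \[=*\] - (\d+)s (\d+)ms/step")


def epoch_seconds(log_text):
    """Return the seconds reported for each epoch, in order of appearance."""
    times = []
    for line in log_text.splitlines():
        m = LINE.match(line.strip())
        if m:
            times.append(int(m.group(1)))
    return times


def steady_state(times):
    """Median epoch time, skipping the first (slower) epoch when possible."""
    return median(times[1:] if len(times) > 1 else times)


# The GTX 980 / WSL log from this section.
log = """\
782/782 [==============================] - 83s 69ms/step - loss: 4.8694 - accuracy: 0.0718
Epoch 2/5
782/782 [==============================] - 53s 67ms/step - loss: 4.0766 - accuracy: 0.1277
Epoch 3/5
782/782 [==============================] - 53s 68ms/step - loss: 3.7435 - accuracy: 0.1677
Epoch 4/5
782/782 [==============================] - 54s 69ms/step - loss: 3.8044 - accuracy: 0.1817
Epoch 5/5
782/782 [==============================] - 54s 69ms/step - loss: 3.4434 - accuracy: 0.2208
"""

times = epoch_seconds(log)
print(times)                # [83, 53, 53, 54, 54]
print(steady_state(times))  # 53.5
```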

Titan X (Pascal), Windows native

70 sec. GPU utilization was around 50%.

Titan X (Pascal), WSL

uya/ResNet50.py 
TF-TRT Warning: Could not find TensorRT
GPU:0 with 10600 MB memory: NVIDIA TITAN X (Pascal), compute capability: 6.1
Epoch 1/5
Loaded cuDNN version 8801
StreamExecutor device (0): NVIDIA TITAN X (Pascal), Compute Capability 6.1

782/782 [==============================] - 92s 72ms/step - loss: 4.7130 - accuracy: 0.0721
Epoch 2/5
782/782 [==============================] - 54s 70ms/step - loss: 4.0493 - accuracy: 0.1363
Epoch 3/5
782/782 [==============================] - 54s 69ms/step - loss: 4.1746 - accuracy: 0.1366
Epoch 4/5
782/782 [==============================] - 54s 70ms/step - loss: 3.9994 - accuracy: 0.1588
Epoch 5/5
782/782 [==============================] - 54s 69ms/step - loss: 3.9037 - accuracy: 0.1532
54 sec. Faster than the native environment. GPU utilization was around 50%.

Titan V, Windows native

/device:GPU:0 with 9826 MB memory: device: 0, name: NVIDIA TITAN V, compute capability: 7.0
Epoch 1/5
Loaded cuDNN version 8800
782/782 [==============================] - 65s 66ms/step - loss: 4.8289 - accuracy: 0.0671
Epoch 2/5
782/782 [==============================] - 53s 67ms/step - loss: 4.3591 - accuracy: 0.0976
Epoch 3/5
782/782 [==============================] - 54s 69ms/step - loss: 4.1728 - accuracy: 0.1067
Epoch 4/5
782/782 [==============================] - 52s 67ms/step - loss: 3.7613 - accuracy: 0.1540
Epoch 5/5
782/782 [==============================] - 55s 70ms/step - loss: 3.8035 - accuracy: 0.1512
53 sec. GPU utilization was around 60%.

Titan V, WSL

/device:GPU:0 with 9764 MB memory: device: 0, name: NVIDIA TITAN V, compute capability: 7.0
Epoch 1/5
Loaded cuDNN version 8801
StreamExecutor device (0): NVIDIA TITAN V, Compute Capability 7.0
782/782 [==============================] - 91s 67ms/step - loss: 4.6153 - accuracy: 0.0900
Epoch 2/5
782/782 [==============================] - 51s 66ms/step - loss: 4.5570 - accuracy: 0.0846
Epoch 3/5
782/782 [==============================] - 50s 64ms/step - loss: 4.0112 - accuracy: 0.1204
Epoch 4/5
782/782 [==============================] - 50s 64ms/step - loss: 3.6474 - accuracy: 0.1689
Epoch 5/5
782/782 [==============================] - 50s 64ms/step - loss: 4.0726 - accuracy: 0.1295

About 50 sec. Perhaps slightly faster than Windows native. The TensorFlow version is slightly newer.

RTX 3050 Laptop, WSL

Even a laptop GPU manages this much. Impressive.

GPU:0 with 1611 MB memory: device: 0, name: NVIDIA GeForce RTX 3050 Laptop GPU, compute capability: 8.6
Epoch 1/5
Loaded cuDNN version 8801
TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
StreamExecutor device (0): NVIDIA GeForce RTX 3050 Laptop GPU, Compute Capability 8.6
782/782 [==============================] - 75s 58ms/step - loss: 4.9869 - accuracy: 0.0536
Epoch 2/5
782/782 [==============================] - 42s 54ms/step - loss: 4.5081 - accuracy: 0.0774
Epoch 3/5
782/782 [==============================] - 42s 54ms/step - loss: 4.5099 - accuracy: 0.0677
Epoch 4/5
782/782 [==============================] - 42s 54ms/step - loss: 4.3222 - accuracy: 0.0631
Epoch 5/5
782/782 [==============================] - 42s 54ms/step - loss: 4.5077 - accuracy: 0.0550
42 sec. GPU utilization was around 86%.

MacBook

M1 Max (MacBook Pro 16 inch)

Metal device set to: Apple M1 Max
systemMemory: 64.00 GB
maxCacheSize: 24.00 GB
Epoch 1/5
Plugin optimizer for device_type GPU is enabled.
782/782 [==============================] - 66s 71ms/step - loss: 4.9320 - accuracy: 0.0548
Epoch 2/5
782/782 [==============================] - 57s 72ms/step - loss: 4.2894 - accuracy: 0.0907
Epoch 3/5
782/782 [==============================] - 56s 72ms/step - loss: 3.8890 - accuracy: 0.1406
Epoch 4/5
782/782 [==============================] - 57s 72ms/step - loss: 3.6129 - accuracy: 0.1742
Epoch 5/5
782/782 [==============================] - 55s 71ms/step - loss: 3.4545 - accuracy: 0.2035
	
57 sec. On par with a reasonably capable NVIDIA GPU.

M1 (MacBook Pro 13 inch)

Metal device set to: Apple M1
systemMemory: 16.00 GB
maxCacheSize: 5.33 GB
782/782 [==============================] - 116s 143ms/step - loss: 1.9598 - accuracy: 0.3815
Epoch 2/5
782/782 [==============================] - 111s 142ms/step - loss: 1.7507 - accuracy: 0.4290
Epoch 3/5
782/782 [==============================] - 111s 142ms/step - loss: 1.8120 - accuracy: 0.4301
Epoch 4/5
782/782 [==============================] - 111s 142ms/step - loss: 1.5029 - accuracy: 0.5215
Epoch 5/5
782/782 [==============================] - 111s 142ms/step - loss: 1.7785 - accuracy: 0.4335
111 sec. Not as fast as the NVIDIA GPUs, but far faster than the CPU. GPU utilization was above 95%.
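The two numbers on each log line are consistent with each other: the epoch time is roughly the step count times the per-step latency. A small sketch of that arithmetic for the M1 run, which also converts the epoch time into an approximate training throughput. The 50,000-image figure is an assumption (the standard CIFAR-100 training split), and both helper names are hypothetical.

```python
STEPS = 782            # batches per epoch, as shown in the logs
TRAIN_IMAGES = 50_000  # assumption: standard CIFAR-100 training split


def epoch_time_from_step(ms_per_step, steps=STEPS):
    """Epoch time in seconds implied by the reported per-step latency."""
    return steps * ms_per_step / 1000.0


def images_per_second(epoch_seconds, images=TRAIN_IMAGES):
    """Approximate training throughput implied by an epoch time."""
    return images / epoch_seconds


print(round(epoch_time_from_step(142), 1))  # 111.0
print(round(images_per_second(111)))        # 450
```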

Radeon Pro 560, MacBook Pro (Intel Core i5)

Metal device set to: AMD Radeon Pro 560
systemMemory: 16.00 GB
Epoch 1/5
Plugin optimizer for device_type GPU is enabled.
782/782 [==============================] - 205s 214ms/step - loss: 4.5386 - accuracy: 0.0804 
Epoch 2/5
782/782 [==============================] - 167s 213ms/step - loss: 4.7394 - accuracy: 0.0634
Epoch 3/5
782/782 [==============================] - 154s 197ms/step - loss: 4.1199 - accuracy: 0.0959
Epoch 4/5
782/782 [==============================] - 151s 193ms/step - loss: 4.0015 - accuracy: 0.1285
Epoch 5/5
782/782 [==============================] - 150s 192ms/step - loss: 3.6907 - accuracy: 0.1623
150 sec. Not as fast as NVIDIA, but far faster than the CPU.

CPU

Intel Core i7-6770

1170 sec. Slow!

Intel Xeon E5-2687W

Even with 48 threads, it takes about 610 sec.
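To put all of the measurements side by side, the per-epoch times quoted in these notes can be turned into speedups relative to a CPU baseline. This is only a sketch: the labels are shorthand for the section titles, the Titan V (WSL) value is read from its log rather than from a stated summary, and `speedups` is a hypothetical helper.

```python
# Seconds per epoch as quoted in the notes above.
EPOCH_SECONDS = {
    "GTX 980 (Windows native)": 60,
    "GTX 980 (WSL)": 54,
    "Titan X Pascal (Windows native)": 70,
    "Titan X Pascal (WSL)": 54,
    "Titan V (Windows native)": 53,
    "Titan V (WSL)": 50,  # read from the log; no summary line was given
    "RTX 3050 Laptop (WSL)": 42,
    "M1 Max": 57,
    "M1": 111,
    "Radeon Pro 560": 150,
    "Core i7-6770 (CPU)": 1170,
    "Xeon E5-2687W (CPU)": 610,
}


def speedups(times, baseline):
    """Return (name, speedup vs. baseline) pairs, fastest first."""
    base = times[baseline]
    ranked = sorted(times.items(), key=lambda item: item[1])
    return [(name, base / seconds) for name, seconds in ranked]


for name, factor in speedups(EPOCH_SECONDS, "Core i7-6770 (CPU)"):
    print(f"{name:34s} {factor:5.1f}x")
```

With the Core i7 as the baseline, the RTX 3050 Laptop comes out at roughly 28x and the M1 Max at roughly 21x. Keep in mind that the runs used different TensorFlow versions, so these ratios are indicative rather than a controlled comparison.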