While trying to compare matrix operations on the GPU against the CPU, my experiments gave an unexpected result: the CPU outperformed the GPU, which confused me.
I ran matrix multiplication on the CPU and the GPU respectively. The programming environment is MXNet with CUDA 10.1.
With the GPU:
import mxnet as mx
from mxnet import nd
x = nd.random.normal(shape=(100000,100000),ctx=mx.gpu())
y = nd.random.normal(shape=(100000,100000),ctx=mx.gpu())
%timeit nd.dot(x, y)  # note: MXNet is asynchronous, so this mostly times the operation being queued
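A likely explanation for the surprising timings, assuming the truncated benchmark above was `nd.dot(x, y)`: MXNet executes operations asynchronously, so timing the call alone measures how fast work is enqueued, not how long the GPU takes to finish it. A fair benchmark does a warm-up run and forces synchronization (in MXNet, `z.wait_to_read()` or `nd.waitall()`) before stopping the clock. A minimal sketch of that warm-up-and-sync pattern, shown here with NumPy and a hypothetical matrix size so it runs anywhere:

```python
import time
import numpy as np

def bench_matmul(n=512, repeats=5):
    """Time an n-by-n matrix multiply, averaging over several runs."""
    x = np.random.normal(size=(n, n)).astype(np.float32)
    y = np.random.normal(size=(n, n)).astype(np.float32)
    x @ y  # warm-up: exclude one-time allocation/initialization cost
    start = time.perf_counter()
    for _ in range(repeats):
        z = x @ y
        # With MXNet you would call z.wait_to_read() (or nd.waitall())
        # here; without it, the loop only measures launch overhead.
    elapsed = time.perf_counter() - start
    return z, elapsed / repeats

z, avg = bench_matmul()
print(f"avg matmul time: {avg:.6f} s, result shape: {z.shape}")
```

NumPy is synchronous, so no explicit wait is needed there; the comment marks where the synchronization call belongs in the MXNet version.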
I also noticed a recent model warning that 2.37GiB of memory could not be allocated:
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 2.37GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
But my GPU utilization is almost 100% (and in this case the input is small compared with the model size).
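For context, the warning itself is informational: the BFC allocator wanted a larger block but continued with what it had, which is consistent with seeing near-100% GPU utilization. If the up-front allocation behavior is a concern, TensorFlow 1.x (which this log format suggests) exposes it through the session config; a sketch, assuming a TF1 session-based setup:

```python
import tensorflow as tf

# TF1-style: let TensorFlow grow its GPU allocation on demand
# instead of reserving one large block up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
```

In TensorFlow 2.x the equivalent knob is `tf.config.experimental.set_memory_growth(gpu, True)` applied to each physical GPU device.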
Suppose I have a Keras model like this:
import tensorflow as tf

with tf.device("/CPU:0"):
    model = tf.keras.Sequential([
        # Adds a densely-connected layer with 64 units to the model:
        tf.keras.layers.Dense(64, activation='relu', input_shape=(32,)),
        # Add another:
        tf.keras.layers.Dense(64, activation='relu'),
    ])