python - GPU is not improving (cudamat)
The following code calculates the Euclidean distance between x points of n dimensions and w fixed points (in the same space of n dimensions).
There are 10,000 x points and 14 w points.
The distance method is called in batches (of 2,000 points, 200 points, or 20 points).
The distance operation involves a matrix multiplication of size (batch_size x n) x (n x 14), plus element-wise pow operations and column/row sums.
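For reference, the code below relies on the standard expansion of the squared Euclidean distance, ||x - w||^2 = ||x||^2 - 2*x.w + ||w||^2, which reduces the pairwise distances to one matrix product plus row/column sums. A minimal NumPy check of that identity (the sizes here are illustrative, not the actual batch sizes):

import numpy as np

x = np.random.randn(5, 3)   # 5 points in 3 dimensions
w = np.random.randn(4, 3)   # 4 fixed points in the same space

# direct pairwise squared distances, shape (5, 4)
direct = ((x[:, None, :] - w[None, :, :]) ** 2).sum(-1)

# same values via the expansion ||x||^2 - 2 x.w^T + ||w||^2
expanded = (x ** 2).sum(1)[:, None] - 2 * np.dot(x, w.T) + (w ** 2).sum(1)[None, :]

assert np.allclose(direct, expanded)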
Using the GPU has the extra cost of copying both matrices from PC RAM to NVIDIA RAM. I expected that at high dimensionality (n = 1,000 dimensions) the execution-time improvement would outweigh that copy overhead.
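To get a feel for that copy cost on its own, one can time just the host-to-device transfer (a sketch; the batch size and n below are assumptions):

import time
import numpy as np
import cudamat as cm

cm.cublas_init()

x = np.random.randn(2000, 1000)  # one batch of 2,000 points, n = 1000

t0 = time.time()
gpu_x = cm.CUDAMatrix(x)         # converts to float32 and copies to GPU RAM
print("host-to-device copy: %.4f s" % (time.time() - t0))

cm.cublas_shutdown()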
But the attached code gave poor results:
- with n=64 dimensions the GPU is 0.2 times as fast (the CPU is 5 times faster than the GPU)
- with n=1000 dimensions the GPU and CPU take the same time
- with n=10000 dimensions the GPU is 1.2 times faster than the CPU
- with n=100000 dimensions the GPU is 0.85 times as fast as the CPU
The CPU is an Intel i7 and the GPU is an NVIDIA 750 Ti. I'm using a dedicated Ubuntu server and connect via VNC/SSH, so I think X Windows is not using the graphics card.
The best improvement is 1.2x versus 1 thread on 1 i7 core.
I'm a newbie with GPUs, so any suggestions for getting better results are welcome.
Thanks in advance.
import numpy as np
import cudamat as cm

def distance(self, x, usegpu=False):
    if usegpu:
        print("using gpu...")
        w = self.w
        gpu_w = cm.CUDAMatrix(w)            # copy w to GPU RAM
        gpu_x = cm.CUDAMatrix(x)            # copy x to GPU RAM
        gpu_x2 = cm.empty(x.shape)
        gpu_w2 = cm.empty(w.shape)
        cm.pow(gpu_x, 2, target=gpu_x2)     # element-wise square of x
        gpu_x2 = gpu_x2.sum(axis=1)         # row sums -> ||x_i||^2
        cm.pow(gpu_w, 2, target=gpu_w2)     # element-wise square of w
        gpu_w2 = gpu_w2.sum(axis=1)         # row sums -> ||w_j||^2
        gpu_d = cm.dot(gpu_w, gpu_x.T)      # w . x^T
        gpu_d = gpu_d.mult(-2)              # -2 w . x^T
        gpu_d = gpu_d.add_col_vec(gpu_w2)   # + ||w_j||^2 on each column
        gpu_d = gpu_d.add_row_vec(gpu_x2.transpose())  # + ||x_i||^2 on each row
        d_t = gpu_d.transpose().asarray()   # copy result back to host
        return d_t
    else:
        w = self.w
        x2 = (x ** 2).sum(1)[:, None]
        d = -2 * np.dot(w, x.T) + (w ** 2).sum(1)[:, None] + x2.T
        return d.T
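For anyone wanting to reproduce the timings, a minimal harness along these lines should work (a sketch: the Dist wrapper class, sizes, and repeat count are my assumptions, not part of the original code; cudamat needs cublas_init() before the first GPU call):

import time

class Dist(object):
    def __init__(self, w):
        self.w = w
    distance = distance  # reuse the method defined above

cm.cublas_init()

n = 1000                              # dimensionality
model = Dist(np.random.randn(14, n))  # 14 fixed points
x = np.random.randn(2000, n)          # one batch of 2,000 points

for usegpu in (False, True):
    t0 = time.time()
    for _ in range(10):               # 10 batches
        d = model.distance(x, usegpu=usegpu)
    print("gpu" if usegpu else "cpu", time.time() - t0)

cm.cublas_shutdown()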