Parallelizing Monte Carlo on the CPU using Thrust OMP
Stack Overflow user

Asked 2014-08-14 12:21:57

1 answer · 202 views · 0 followers · 0 votes

The goal is to parallelize a Monte Carlo process using thrust::omp.

int main()
{
  unsigned Nsimulations = 1000;
  // construct some objects here that will be required for Monte Carlo
  A a;
  B b;
  C c;

  /*
   * use thrust::omp to run the simulation Nsimulation times
   * and write a custom reduction function (a sum for objects of type R)
   */
}

// this is the Monte Carlo function - it needs the global variables a, b, c
// passed by reference because they are very large; the return type is R
R monteCarlo(A& a, B& b, C& c)
{
   // something supercomplicated here
   return r;
}

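I need to know:

Whether and how the global variables a, b, c can be accessed (they are only read, so there is no race condition here).
How to set the number of threads (Nsimulations is in the thousands, maybe more, so I don't want to overdo it).
I expect to run the monteCarlo function Nsimulations times, possibly store the results in a vector, and finally combine them with a Thrust reduction or a serial reduction, since that is not the time-consuming part.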

multithreading

thrust

montecarlo

openmp

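1 Answer

Stack Overflow user

Accepted answer

Posted 2014-08-14 12:41:45

As I said in your previous question, you will need to learn more about Thrust before the answers to questions like this will make sense to you.

You could do: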

thrust::generate(...);

to generate your initial random numbers. The length of the vector here is the number of simulations you want.

You could then do:

thrust::for_each(...);

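passing the previously generated random number vector as input, perhaps zipped with a result vector. The for_each operation uses a custom functor to call your monteCarlo routine individually on each random number in the input vector. The output of monteCarlo for each input element goes into the corresponding position in the result vector. The parameters/data in your A, B, C globals can be passed as initialization parameters to the functor used by for_each.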

You could then do:

thrust::reduce(...);

on the previously generated result vector, to produce your final reduction.

I wouldn't worry about how the simulations are mapped onto OMP threads. Thrust will handle that, just as #pragma omp parallel for would in the plain OMP case.

Here is a fully worked example:

$ cat t536.cpp
#include <iostream>
#include <stdlib.h>
#include <thrust/system/omp/execution_policy.h>
#include <thrust/system/omp/vector.h>
#include <thrust/generate.h>
#include <thrust/reduce.h>
#include <thrust/for_each.h>
#include <thrust/tuple.h>
#include <thrust/iterator/zip_iterator.h>


struct A {
  unsigned a;
};

struct B {

  int b;
};

struct C {

  float c;
};

A a;
B b;
C c;

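// toy stand-in for the real simulation; reads the globals a, b, c
float monteCarlo(int rn){

  return ((rn % a.a)+ b.b)/c.c;
}

// applied to each (input, output) pair produced by the zip iterator
struct my_functor
{
  template <typename Tuple>
  void operator()(const Tuple &data) const{

    // element 0 is the random input, element 1 is the result slot
    thrust::get<1>(data) = monteCarlo(thrust::get<0>(data));
   }
};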


int main(int argc, char *argv[]){
  a.a = 10;
  b.b = 2;
  c.c = 4.5f;
  unsigned N = 10;
  if (argc > 1) N = atoi(argv[1]);
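  thrust::omp::vector<unsigned> rands(N);   // one random input per simulation
  thrust::omp::vector<float> result(N);     // one output per simulation
  // note: rand() is not guaranteed to be thread-safe when called in parallel
  thrust::generate(thrust::omp::par, rands.begin(), rands.end(), rand);
  // zip inputs with outputs so the functor can write each result in place
  thrust::for_each(thrust::omp::par,
                   thrust::make_zip_iterator(thrust::make_tuple(rands.begin(), result.begin())),
                   thrust::make_zip_iterator(thrust::make_tuple(rands.end(), result.end())),
                   my_functor());
  // default reduction: sum of all results
  float answer = thrust::reduce(thrust::omp::par, result.begin(), result.end());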
  std::cout << answer << std::endl;
  return 0;
}
$ g++ -O2 -I/usr/local/cuda/include -o t536 t536.cpp -fopenmp -lgomp
$ ./t536 10
14.8889
$