Hi all, I've posted this over at the NVIDIA forums as well, but thought I'd try my luck here too.
Original Post:
I've just started learning CUDA and have written a simple program that creates a 2D array of int on the host, allocates memory on the device, and then copies the array to the device. Eventually I want to expand this into a graph-searching algorithm. However, when using an array with 1,000 vertices (indices), it simply crashes. As far as I can tell, it's populating the array on the host that causes the crash.
Call me a noob but I thought that an array of this size was perfectly acceptable?
Here's my code anyway:
#include <stdio.h>
#include <cuda.h>

// Empty placeholder kernel; pitch is the device row pitch in bytes
__global__ void myKernel(int* deviceArrayPtr, int pitch)
{
}

int main()
{
    int* deviceArrayPtr;
    size_t devicePitch, hostPitch, width, height;
    int hostArray[1000][1000];
    width = 1000;
    height = 1000;
    for(int i = 0; i < 1000; i ++)
        for(int j = 0; j < 1000; j ++)
            hostArray[i][j] = 20; //20 is an arbitrary number
    //Allocates memory on the device
    cudaMallocPitch((void**)&deviceArrayPtr, &devicePitch, width * sizeof(int), height);
    //Host rows are tightly packed, so the host pitch is the row width in bytes, not the device pitch
    hostPitch = width * sizeof(int);
    //Copies hostArray onto the pre-allocated device memory
    cudaMemcpy2D(deviceArrayPtr, devicePitch, hostArray, hostPitch, width * sizeof(int), height, cudaMemcpyHostToDevice);
    myKernel <<< 100, 512 >>> (deviceArrayPtr, (int)devicePitch);
    cudaFree(deviceArrayPtr);
    return 0;
}
Anyone have any ideas about this?


