I’m working on a MATLAB project that requires FFT and IFFT computations on large images using GPUs. Currently, the image size is
32000×25600 with double-precision data. We need to process even larger images, but we have reduced the size because of GPU memory constraints.
We’re using two NVIDIA GeForce RTX 3080 GPUs (each with 9.51 GB of available memory) to accelerate the process. Below is a simplified version of our code:
**parpool('Processes', 2);
object = ones(32000, 25600, 'gpuArray');
objectFT = fftshift(fft2(object));**
However, this approach runs into GPU memory issues. I attempted to split the image into smaller chunks to manage memory better and wrote a function to divide the computation across two GPUs using spmd as shown below:
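**function result = dualGPUFFT(image, chunkSize, inverse)
% Splits IMAGE into top and bottom halves, one per worker/GPU, and
% transforms each half in blocks of CHUNKSIZE rows.
% Requires an open parallel pool with two workers (one per GPU).
m = size(image, 1);
half = floor(m / 2);
spmd
    gpuDevice(labindex); % bind this worker to its own GPU
    disp(['Worker ', num2str(labindex), ' using GPU ', num2str(gpuDevice().Index)]);
    if labindex == 1
        localData = gpuArray(image(1:half, :)); % Top half
    else
        localData = gpuArray(image(half + 1:end, :)); % Bottom half
    end
    % Complex output buffer on the GPU (the transform of real data is complex)
    localResult = complex(zeros(size(localData), 'like', localData));
    numChunks = ceil(size(localData, 1) / chunkSize);
    for chunkIdx = 1:numChunks
        rowStart = (chunkIdx - 1) * chunkSize + 1;
        rowEnd = min(chunkIdx * chunkSize, size(localData, 1));
        chunk = localData(rowStart:rowEnd, :);
        % Note: each chunk is transformed independently, so the stitched
        % result is a set of per-chunk transforms, not the 2-D transform
        % of the full image.
        if inverse
            chunkResult = ifft2(ifftshift(chunk));
        else
            chunkResult = fftshift(fft2(chunk));
        end
        localResult(rowStart:rowEnd, :) = chunkResult;
    end
    resultLocal = gather(localResult); % copy this worker's half back to host memory
end
% Variables assigned inside spmd come back to the client as Composite objects,
% so the two halves are concatenated here rather than inside the block.
result = [resultLocal{1}; resultLocal{2}];
end**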
Unfortunately, this approach also failed to handle the memory issue effectively.
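A back-of-the-envelope estimate may explain why both versions run out of memory. The sketch below only does arithmetic. It assumes the 9.51 GB figure quoted above is in decimal gigabytes, and that the transform needs at least a complex output array in addition to its real input; any extra workspace the FFT library allocates is not counted.

```matlab
% Rough memory estimate for a 32000-by-25600 double-precision image.
rows = 32000;
cols = 25600;
bytesPerDouble = 8;

inputGB  = rows * cols * bytesPerDouble / 1e9;  % real input: 6.5536 GB
outputGB = 2 * inputGB;                         % complex output: 13.1072 GB
availableGB = 9.51;                             % per-GPU figure quoted above (assumed decimal GB)

fprintf('Real input:     %.2f GB\n', inputGB);
fprintf('Complex output: %.2f GB\n', outputGB);
fprintf('Output alone fits on one GPU: %d\n', outputGB <= availableGB);

% Half of the image per GPU, as in dualGPUFFT: real half plus complex result.
halfFootprintGB = inputGB / 2 + outputGB / 2;   % 9.8304 GB
fprintf('Per-GPU footprint of one half: %.2f GB\n', halfFootprintGB);
```

Under these assumptions the complex result alone is larger than one GPU, and even half of the image plus its complex result leaves no headroom for the chunk temporaries.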
Here are my key questions:
How can I efficiently manage GPU memory for FFT/IFFT computations on such large images? Are there other chunking strategies or libraries that could handle this better in MATLAB?
Does performing FFT/IFFT on GPUs for large images generally offer significant speed improvements over CPUs?
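Whether the GPU wins probably depends on the card and on the precision, so measuring on a smaller image may be more reliable than a general rule. The sketch below uses `timeit` and `gputimeit`; the 4096×4096 test size is an arbitrary assumption chosen to fit comfortably in memory. Consumer GeForce cards have much lower double-precision than single-precision throughput, so it may be worth timing both.

```matlab
% Compare CPU and GPU fft2 timings on a test image that fits in memory.
N = 4096;                       % assumed test size, not from the original code
A = rand(N, N);                 % double-precision host array

tCpu = timeit(@() fft2(A));
fprintf('CPU fft2, double: %.3f s\n', tCpu);

if canUseGPU()
    G = gpuArray(A);
    tGpu = gputimeit(@() fft2(G));       % waits for the GPU to finish before stopping the clock
    fprintf('GPU fft2, double: %.3f s (host/device transfer not included)\n', tGpu);

    Gs = single(G);                      % same data in single precision
    tGpuSingle = gputimeit(@() fft2(Gs));
    fprintf('GPU fft2, single: %.3f s\n', tGpuSingle);
end
```

These timings exclude the cost of moving data between host and GPU, which matters more once the image has to be processed in chunks.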
I’d greatly appreciate any insights, recommendations, or alternative approaches for handling such large-scale FFT/IFFT computations.
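One chunking strategy that might be worth trying relies on the separability of the 2-D DFT: `fft2(X)` equals `fft(fft(X, [], 1), [], 2)`, so the image can be transformed along rows in row blocks and then along columns in column blocks, with only one block on the GPU at a time. This is a minimal sketch, not a tuned implementation: the variable names and the chunk size are hypothetical, a small random matrix stands in for the image, and the full complex result is assumed to fit in host memory. `fftshift` is applied once to the assembled result, not per chunk.

```matlab
% Chunked 2-D FFT via separability: 1-D FFTs along rows, then along columns.
% Small sizes are used here so the result can be checked against fft2.
X = rand(64, 48);          % stand-in for the large image (host memory)
chunkSize = 16;            % rows/columns per block; an assumed tuning parameter
useGPU = canUseGPU();      % fall back to the CPU when no GPU is available

[m, n] = size(X);
Y = complex(zeros(m, n));  % complex result kept in host memory

% Pass 1: FFT along dimension 2 (each row is independent), in row blocks.
for r0 = 1:chunkSize:m
    rows = r0:min(r0 + chunkSize - 1, m);
    block = X(rows, :);
    if useGPU, block = gpuArray(block); end
    Y(rows, :) = gather(fft(block, [], 2));
end

% Pass 2: FFT along dimension 1 (each column is independent), in column blocks.
for c0 = 1:chunkSize:n
    cols = c0:min(c0 + chunkSize - 1, n);
    block = Y(:, cols);
    if useGPU, block = gpuArray(block); end
    Y(:, cols) = gather(fft(block, [], 1));
end

Y = fftshift(Y);           % shift once, on the assembled result

% Check against the direct transform (only feasible at this small size).
reference = fftshift(fft2(X));
relErr = norm(Y - reference, 'fro') / norm(reference, 'fro');
fprintf('Relative error vs. fft2: %.2e\n', relErr);
```

Blocks within a pass are independent, so with two GPUs they could in principle be divided between workers. The inverse would apply `ifftshift` to the full array first and then run the same two passes with `ifft`.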