I am trying to move away from OpenCL and CUDA to ArrayFire. One of my functions uses the GPU's popcount() to make pre-processing data easier, but I can't find it anywhere in ArrayFire's list of functions.
OpenCL has popcount(), CUDA has __popc(), and GCC/Clang provide __builtin_popcount() for CPU work. Where the heck is the equivalent in ArrayFire? I see count() and count_all(), but as far as I can tell those count elements of the array, not the set bits within an element.
Am I missing something, or is this feature simply not implemented in the library? It seems like a pretty important function, and I expected to find it with the bitwise manipulation functions.
I was expecting a function that returns the number of 1 bits in each integer. I would honestly like to leverage the library's optimization features, but without this function I can't port this code.
Yes, I can write my own. No, I don't want to. I want to use the architecture-optimized implementations provided by the hardware vendors for their CPUs/GPUs.
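To make the distinction concrete, here is a plain C++ sketch with no ArrayFire involved. All function names are made up for illustration, and the "element count" function is only my reading of what an element-level count reduction reports (non-zero elements), not ArrayFire's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of non-zero elements in the array: one number for the whole
// array (my assumption of what an element-level count reduction does).
std::size_t nonzero_elements(const std::vector<std::uint32_t>& data) {
    std::size_t n = 0;
    for (std::uint32_t v : data) {
        if (v != 0) ++n;
    }
    return n;
}

// Number of set bits in one value (reference loop, one bit per iteration).
unsigned bits_set(std::uint32_t v) {
    unsigned n = 0;
    while (v != 0) {
        n += v & 1u;
        v >>= 1;
    }
    return n;
}

// Per-element popcount: one result per input element. This is the
// operation I am looking for.
std::vector<unsigned> bits_set_per_element(const std::vector<std::uint32_t>& data) {
    std::vector<unsigned> out;
    out.reserve(data.size());
    for (std::uint32_t v : data) out.push_back(bits_set(v));
    return out;
}

// Small self-check showing that the two results differ.
void example() {
    const std::vector<std::uint32_t> data = {0u, 1u, 3u, 255u};
    assert(nonzero_elements(data) == 3);
    assert((bits_set_per_element(data) == std::vector<unsigned>{0u, 1u, 2u, 8u}));
}
```

The reference loop is only there to pin down the semantics. On real hardware I would want this mapped to the vendor's popcount instruction, not a bit-by-bit loop.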
asked Nov 21, 2024 at 3:50 by Ben H
1 Answer
Well, I will answer my own question.
There is not one.
I looked through the repo. Yes, they use __popc and popcount (CUDA and OpenCL) internally for nearest_neighbour, but nowhere else, and it is not exposed as a library function. So it is not implemented.
Now I have a few choices: use a custom kernel, fork their code and add it myself, or abandon this folly and move on.
I will probably try the custom kernel. If it fails I will switch back to OpenCL and CUDA.
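One more fallback that needs neither a custom kernel nor a fork is the classic SWAR bit trick, which computes popcount using only shifts, masks, adds, a subtract and one multiply. The sketch below is plain scalar C++ with made-up function names. Applying the same expression to a u32 array assumes that ArrayFire's elementwise operators (>>, &, +, -, *) behave like their scalar counterparts, which I have not verified. It also does not use the hardware popcount instruction, so it is a workaround, not the vendor-optimized path I was after.

```cpp
#include <cassert>
#include <cstdint>

// Branch-free popcount for a 32-bit value using only shifts, masks,
// adds, a subtract and one multiply (the classic SWAR bit trick).
std::uint32_t popcount_swar(std::uint32_t v) {
    v = v - ((v >> 1) & 0x55555555u);                 // 2-bit partial sums
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); // 4-bit partial sums
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 // 8-bit partial sums
    return (v * 0x01010101u) >> 24;                   // add the four bytes
}

// Bit-by-bit reference used only to check the trick above.
std::uint32_t popcount_naive(std::uint32_t v) {
    std::uint32_t n = 0;
    while (v != 0) {
        n += v & 1u;
        v >>= 1;
    }
    return n;
}

// Compare both versions on a spread of values, including the extremes.
void self_check() {
    const std::uint32_t samples[] = {0u, 1u, 3u, 255u, 0x80000000u,
                                     0xDEADBEEFu, 0xFFFFFFFFu};
    for (std::uint32_t s : samples) {
        assert(popcount_swar(s) == popcount_naive(s));
    }
}
```

Every step is an elementwise operation with constant masks, so in principle the body of popcount_swar could be written once against an array type instead of a scalar.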