Is there an efficient method to convert from a Vector{String} to String in Julia?
I attempted a naive implementation using a for loop, but the performance is terrible (even when type information is available).
In this particular case, I have a data structure which is a Vector of single-character strings.
This is effectively a concatenation / reduce operation. I'm not sure how to express it in Julia.
The fact that a for loop gives poor performance makes me suspect there is no efficient implementation for this.
I think the poor performance is most likely due to repeated memory allocations.
I tried sizehint!, but this doesn't appear to have an implementation for String. So I am not sure if there is a way to avoid the repeated memory allocation.
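The original loop was not posted, so the following is only a guess at what such a naive implementation might look like (concat_naive is a hypothetical name). It illustrates where the repeated allocations would come from: Julia strings are immutable, so each *= builds a brand-new String containing everything accumulated so far.

```julia
# Hypothetical reconstruction of a naive loop; the asker's actual code is unknown.
function concat_naive(v::Vector{String})
    result = ""
    for s in v
        # `result * s` allocates a new String and copies all previous content,
        # so the total work grows quadratically with the output length.
        result *= s
    end
    return result
end

v = ["h", "e", "l", "l", "o"]
concat_naive(v)  # "hello"
```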
3 Answers
You're likely going to get the best mileage out of join(iterator):
using BenchmarkTools
v = [String(rand('a':'z', 10)) for _ in 1:1000];
@btime reduce(*, $v)
#   376.600 μs (999 allocations: 4.87 MiB)
@btime prod($v)
#   367.800 μs (999 allocations: 4.87 MiB)
@btime join($v)
#   13.000 μs (17 allocations: 41.55 KiB)
@assert reduce(*, v) == prod(v) == join(v)
The timing of reduce and prod being equal makes sense: they both ultimately call mapreduce(identity, *, v).
The speedup from join occurs because join(::Vector{String}) calls sprint(join, ::Vector{String}), then join(::IOBuffer, ::Vector{String}), then ultimately String(_unsafe_take!(::IOBuffer)). The call therefore benefits from allocating one buffer to hold the concatenated string, then wrapping the String around that buffer without checking and, if your compiler allows it, eliding the extra Array allocation and performing an equivalent move operation instead. If you want it to go even faster and you know the size of the resulting string, you can do something like this:
@btime sprint(join, $v, sizehint=length($v)*10)
#   11.500 μs (3 allocations: 9.97 KiB)
(I played around with the headroom needed to maximize performance, so your mileage may vary).
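The idea behind that speedup, writing every piece into one growable buffer and converting to a String once at the end, can also be sketched by hand. This is a minimal illustration and not how Base implements join; concat_buffered is a hypothetical name, and the size hint here is simply the total byte count of the inputs.

```julia
# A hand-rolled sketch of buffer-based concatenation (not Base's implementation).
function concat_buffered(v::Vector{String})
    # Total number of bytes in the result; `init=0` handles an empty vector.
    total = mapreduce(sizeof, +, v; init=0)
    io = IOBuffer(sizehint=total)   # reserve space up front
    for s in v
        write(io, s)                # append bytes to the buffer; no new String per step
    end
    return String(take!(io))        # convert the buffer to a String once
end

v = ["a", "b", "c"]
@assert concat_buffered(v) == join(v)
```

In practice join(v) or sprint(join, v, sizehint=...) is likely the better choice; the sketch is only meant to show why a single buffer avoids the per-step allocations.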
prod(vector_of_string) seems both fastest (twice as fast as reduce on my machine) and semantically clearest.
One possible implementation
vector_of_string::Vector{String} = ...
reduce(*, vector_of_string)
I don't know how this performs compared to other alternatives, but it seems much faster than my existing implementation.
The reverse operation, if you require it
a_string::String = ...
collect(a_string) # returns Vector{Char}
string.(collect(a_string)) # returns Vector{String}
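As a quick sanity check of the two directions, here is a small round trip on an arbitrary sample string (the value of s is just an illustration). It also shows that a Vector{Char} can be turned back into a String directly with the String constructor.

```julia
s = "héllo"                      # sample input, chosen to include a non-ASCII character
chars = collect(s)               # Vector{Char}
pieces = string.(chars)          # Vector{String}, one character per element

@assert length(pieces) == 5
@assert join(pieces) == s        # Vector{String} back to String
@assert String(chars) == s       # Vector{Char} back to String
```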