Nice investigation. The concat loop is effectively O(n^2), whereas the push loop is O(n). Creating a new array means copying the existing one, as you showed, and that happens on every iteration of the loop*. An array push, however, is (usually) amortized constant time, so each element is copied only once (well, probably three times).
Part of the issue with the increased copying is the increased memory access. Your array probably exceeds the bounds of the L1 cache on the CPU, possibly even the L2 cache. This means that when concat copies, it has to repeatedly load data from a more distant cache. The push version never touches the data already in the list, so it avoids that extra memory traffic.
(*Before somebody gets pedantic, it's more like a K*N loop, where K is the number of loop iterations. However, since the size of the resulting array is linearly related to K, we have c * K = N, thus ~N^2.) This is also known as a Schlemiel the Painter's algorithm.
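To make the two shapes concrete, here's a minimal sketch of the loops being compared; the iteration count and element values are invented for illustration:

    // Quadratic-ish: every iteration allocates a fresh array and copies
    // everything accumulated so far before appending the new chunk.
    let viaConcat = [];
    for (let i = 0; i < 1000; i++) {
      viaConcat = viaConcat.concat([i, i + 1, i + 2]);
    }

    // Linear: push appends in place; existing elements only move when the
    // backing store occasionally grows (amortized O(1) per push).
    const viaPush = [];
    for (let i = 0; i < 1000; i++) {
      viaPush.push(i, i + 1, i + 2);
    }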
Talking strictly in terms of the performance of push versus concat (albeit naively as it relates to VM implementation details), this wouldn't really apply.
A meaningful comparison of the two would take two large pre-allocated arrays and compare a single concat against pushing the contents of one onto the other. There I'd expect concat to perform better for sufficiently large arrays, since it's (theoretically) a single allocation followed by two memcpys, whereas push copies elements over one at a time, causing multiple reallocations along the way.
To be pedantic, the context in which we're talking about the performance characteristics of push and concat is merging a large number of arrays, where I'd wager garbage collection is the dominating factor in the concat solution, even after taking the runtime characteristics into account.
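Something like the following is the comparison I have in mind; the array sizes and the use of console.time are only for illustration, and a real measurement would need warm-up and repetition:

    // Two large pre-allocated arrays; the values themselves don't matter.
    const a = new Array(1_000_000).fill(1);
    const b = new Array(1_000_000).fill(2);

    // One concat: (theoretically) a single allocation plus two bulk copies.
    console.time('concat');
    const merged = a.concat(b);
    console.timeEnd('concat');

    // Pushing b's elements onto a copy of a, one at a time, growing as needed.
    console.time('push');
    const target = a.slice();
    for (let i = 0; i < b.length; i++) {
      target.push(b[i]);
    }
    console.timeEnd('push');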
I'm not following how this means what I said doesn't apply?
push is highly unlikely to allocate memory one bit at a time. If it's anything like a C++ vector, or a contiguous array in any other language, it has amortized O(1) push time by allocating geometrically rather than linearly.
We're also dealing with sizes that are definitely large enough to spill out of the CPU caches, especially in the concat case, where we allocate new data.
GC is potentially involved, but for an optimized VM (which I assume this is), allocating and deallocating memory in such tight loops is likely also linear time.
There's no reason a memcpy is faster than a loop copying items over one at a time. memcpy is not a CPU construct; it gets compiled down to a loop that copies a bit of memory at a time.
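Roughly what I mean by geometric growth, as a toy model rather than what any particular VM actually does (the starting capacity and the growth factor of 2 are assumptions):

    // Count how many element copies a doubling strategy performs for n pushes.
    function copiesForPushes(n) {
      let capacity = 4;     // assumed initial capacity
      let length = 0;
      let copies = 0;
      for (let i = 0; i < n; i++) {
        if (length === capacity) {
          capacity *= 2;    // grow geometrically, not by a fixed amount
          copies += length; // existing elements move to the new buffer
        }
        length += 1;
      }
      return copies;
    }

    console.log(copiesForPushes(1_000_000)); // ~1.05 million: a small constant factor of n, not n^2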
"I'm not following how this means what I said doesn't apply?"
I should clarify: what you said applies to merging multiple arrays, where the input size is the number of arrays being merged. If we're talking strictly about the performance of push vs concat for concatenating one array onto another (which the headline sort of implies), then by that measure the input size is always one.
"push is highly unlikely to allocate memory one bit at a time"
As you said, it's going to re-allocate similarly to a C++ vector, which for the purpose of runtime analysis we just disregard. I choose not to disregard it since I don't believe runtime analysis tells the whole story here. Or perhaps more appropriately, merging multiple arrays is a specific use of push and concat that doesn't speak to their performance characteristics in a generalized way.
The point is, concat may perform better in scenarios where one is likely to use it, and it would be misleading to advocate always using push instead.
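For example, here is the kind of one-off call site I have in mind, where concat is the natural fit; the variable names are made up:

    // Merging two existing arrays once, into a new array.
    const defaults = ['small', 'fast'];
    const overrides = ['verbose'];
    const options = defaults.concat(overrides);

    // The push-based equivalent has to copy (or mutate) one side first.
    const optionsViaPush = defaults.slice();
    for (const item of overrides) {
      optionsViaPush.push(item);
    }

    console.log(options, optionsViaPush); // same contents either way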