Quick point: This isn't memoization, or at least it's not of a form that I know of. Memoization is where you compute a result, store it, and reuse it when requested for that result again. As you use each of the intermediate results exactly once, this isn't memoization. It is of course quite a clever transformation of a problem from something that is naively O(n2 ) to O(n), and on a technical level it likely introduces quite a lot of SIMD parallelism as well - but it's not memoization.
Thanks for reading and commenting. I was definitely thinking exact same thing as I was writing the post, and I was somewhat hesitant to use the term "memoization" here. The code does seem to be slightly outside the exact definition of memoization, insofar as I'm not caching the results of a pure function for given inputs. However, I do think what's happening here is still fundamentally the same—speeding up a program by caching the results of expensive operations, and then using those later rather than calculating the same value again. I suppose due to how I was approaching the problem, with the total product being calculated incrementally across a for-loop, the cache also had to be calculated incrementally, so it didn't really lend itself to a more functional approach, where you would call a function with a single input, and cache the result for that input.
I think we could meet the exact formal definition of memoization by doing essentially the same thing, but doing it recursively?:
Then later in the problem, when we're at, for example, myArray[4] we could call something like:
getTotalProduct(myArray, 3) * getReverseTotalProduct(myArray, 5)
and we would just get values from our cache, because we've already calculated results for those particular inputs...
This isn't about a "formal definition" of memoization, or indeed caching, as such. The basic misunderstanding is that memoisation cannot be applied to a problem where each request is unique. Your second version isn't memoization - it could be counted as pre-computing your arrays, but that's it. Fundamentally, you still only read each element of the cache once.
As I said before, the speedup here is from the way the problem has been transformed to being O(n) rather than O(n2 ). But for this particular problem, memoization isn't useful, because each answer is required exactly once. Because each query is unique, you cannot memoize.
Memoization might be of use in a slightly different problem specification, where there's a user of the system who asks for different indicies and you cannot precompute the entire array; in that case, as the user may request the same index multiple times memoization is viable. But for this one, there just isn't anything that can be memoized.
Not sure I follow, to be honest. In the problem, we need to calculate, for example, the product of 1*2*3*4 two separate times, meaning that that particular result is required exactly twice. The second time, we don't have to recalculate it because we've already cached it. Each request is not unique—it happens twice.
It's true that we only read from the cache once per item, but does it make a difference whether we're reading each element from the cache once or 500 times?
Well, yes, I think it kind of does, because it governs what's an appropriate technique. Certainly in the case of the forward direction, all you've actually implemented is an accumulator, and in a particularly inefficient way as far as memory is concerned. And perhaps I got a bit muddled because you only presented the forward direction - the backwards direction probably does require memoisation (Edit: No, it doesn't).
The real point to this interview question, I believe, is that the most efficient solution requires different approaches for the forward and backward directions.
OK, I've had a proper chance to think about this (when I'm not posting from work/on the move), and Memoization is definitely not an ideal optimisation technique here. The problem is that what you really want is to not have to recompute things all the time, and so you want to have partial results. However, because of the form of these partial results, there's no requirement to actually store them in either a precomputed array or by memoization. It's trivial to use a pair of accumulators, as in the following code (Python, because that's what I normally use)
This version is likely to be as fast as you can get the algorithm, is pretty simple, and uses the bare minimum of additional memory (for clarity the two accummulators are seperate; they can be combined, meaning that the overhead needed for this algorithm is a single integer + the output array).
If you don't object, I might write this up into an object lesson on how to tell if the optimizations people choose to apply are actually appropriate optimizations. As I was saying, if you're only reading your memoized value once, it's unlikely to be the correct technique.
Thanks for all your insights, David. This is a really awesome solution. It didn't quite occur to me that I could have two separate accumulator loops (forward and reverse), and then combine the results like this.
You're definitely welcome to write this up in a post. That would be super interesting.
For further actions, you may consider blocking this person and/or reporting abuse
We're a place where coders share, stay up-to-date and grow their careers.
Quick point: This isn't memoization, or at least it's not of a form that I know of. Memoization is where you compute a result, store it, and reuse it when requested for that result again. As you use each of the intermediate results exactly once, this isn't memoization. It is of course quite a clever transformation of a problem from something that is naively O(n2 ) to O(n), and on a technical level it likely introduces quite a lot of SIMD parallelism as well - but it's not memoization.
Thanks for reading and commenting. I was definitely thinking exact same thing as I was writing the post, and I was somewhat hesitant to use the term "memoization" here. The code does seem to be slightly outside the exact definition of memoization, insofar as I'm not caching the results of a pure function for given inputs. However, I do think what's happening here is still fundamentally the same—speeding up a program by caching the results of expensive operations, and then using those later rather than calculating the same value again. I suppose due to how I was approaching the problem, with the total product being calculated incrementally across a for-loop, the cache also had to be calculated incrementally, so it didn't really lend itself to a more functional approach, where you would call a function with a single input, and cache the result for that input.
I think we could meet the exact formal definition of memoization by doing essentially the same thing, but doing it recursively?:
Then later in the problem, when we're at, for example, myArray[4] we could call something like:
getTotalProduct(myArray, 3) * getReverseTotalProduct(myArray, 5)
and we would just get values from our cache, because we've already calculated results for those particular inputs...
This isn't about a "formal definition" of memoization, or indeed caching, as such. The basic misunderstanding is that memoisation cannot be applied to a problem where each request is unique. Your second version isn't memoization - it could be counted as pre-computing your arrays, but that's it. Fundamentally, you still only read each element of the cache once.
As I said before, the speedup here is from the way the problem has been transformed to being O(n) rather than O(n2 ). But for this particular problem, memoization isn't useful, because each answer is required exactly once. Because each query is unique, you cannot memoize.
Memoization might be of use in a slightly different problem specification, where there's a user of the system who asks for different indicies and you cannot precompute the entire array; in that case, as the user may request the same index multiple times memoization is viable. But for this one, there just isn't anything that can be memoized.
Not sure I follow, to be honest. In the problem, we need to calculate, for example, the product of 1*2*3*4 two separate times, meaning that that particular result is required exactly twice. The second time, we don't have to recalculate it because we've already cached it. Each request is not unique—it happens twice.
It's true that we only read from the cache once per item, but does it make a difference whether we're reading each element from the cache once or 500 times?
Well, yes, I think it kind of does, because it governs what's an appropriate technique. Certainly in the case of the forward direction, all you've actually implemented is an accumulator, and in a particularly inefficient way as far as memory is concerned. And perhaps I got a bit muddled because you only presented the forward direction - the backwards direction probably does require memoisation (Edit: No, it doesn't).
The real point to this interview question, I believe, is that the most efficient solution requires different approaches for the forward and backward directions.
OK, I've had a proper chance to think about this (when I'm not posting from work/on the move), and Memoization is definitely not an ideal optimisation technique here. The problem is that what you really want is to not have to recompute things all the time, and so you want to have partial results. However, because of the form of these partial results, there's no requirement to actually store them in either a precomputed array or by memoization. It's trivial to use a pair of accumulators, as in the following code (Python, because that's what I normally use)
This version is likely to be as fast as you can get the algorithm, is pretty simple, and uses the bare minimum of additional memory (for clarity the two accummulators are seperate; they can be combined, meaning that the overhead needed for this algorithm is a single integer + the output array).
If you don't object, I might write this up into an object lesson on how to tell if the optimizations people choose to apply are actually appropriate optimizations. As I was saying, if you're only reading your memoized value once, it's unlikely to be the correct technique.
Thanks for all your insights, David. This is a really awesome solution. It didn't quite occur to me that I could have two separate accumulator loops (forward and reverse), and then combine the results like this.
You're definitely welcome to write this up in a post. That would be super interesting.