For quite some time I found my time procastinating after getting home from work. Untill a few days ago I stumbled upon the Daily Coding Problem (DC...
[Read Full]

there was actually an article recently that touched on the subject mentioned by Serhii Sakal dev.to/_codingblocks/when-is-an-ar...
I was wondering, why not just use an object instead of an array?
let's take a simple example, where you have something like this

consta=[1,2];a[1000]=1000;console.log(a);

and look at the size of a.
even with your example, the created lookup array is bigger than the given array, so you're using more memory than needed.

a proper hash function (for integers their own value is a very good hash code) and

a hash set implementation that resizes once a certain percentage of buckets have been filled (which all standard collection libraries do)

...then not only is the cost of a hash collision amortized, but also the cost of re-hashing in case that the hash set runs out of space. There's a whole lot of theory behind this (look up "amortized complexity analysis"), but generally speaking, provided a proper implementation of both the set collection itself and the hash function of the objects to be stored, hash set lookups are always amortized O(1).

I think yiu might be misunderstanding how hash tables work.
The idea is that there exists a "bounded" table for hashes. Hence, hases of ints are not themselves, but themselves modulo some number. Due to that, it might be the case that the numbers in the given list are all divisible into that number, hence they will all fall into the same hash bucket, hence the hash becomes a simple linked list, hence it's O(n). Moreover, there is no hash function dealing with thia problem, unless you assume an infinite storage space LOL

No, I understand hash tables perfectly fine. Every bucket in ha hash table can be implemented as a list, that's true (however, alternatives do exist as well). That list has O(n) complexity for "contains", that's true as well. However, each hash table implementation worth its salt will perform a procedure called "rehashing". For example, lets say you have a hash table with 10 buckets. Now entries 1 through 7 are being inserted, no problem. If you have a decent hash function, you will end up with 7 buckets of size 1, and 3 empty buckets. Collissions MAY occur, but if your hash function is any good they will be extremely rare. Now, we have 70% fill ratio in our table. Entry #8 wants to be inserted. Instead of just putting it in (risking more collisions), we create a NEW list of buckets which is larger than the old one (typically twice as large), so now we have 20 buckets. We reinsert (rehash) our first 7 entries into the new 20-bucket-table, then insert entry #8. We continue inserting until we are again 70% full, then we resize to 40 buckets, and so on. No infinite storage needed. Amortized cost of contains is O(1), also for insertion O(1).

I studied all of that long enough, but you don't have to take my word for it - just check it on wikipedia or Stackoverflow.

Question: What happens if the hash of the values being inserted happens to be the same for all values? (Hint: Remember that you have to first check if the value already exists in the set before you can add it).

Is amortized cost for n inserts still O(n) in that scenario?

Or an even easier "proof": If you're doing open hashing as most implementations do, a hashset that contains values with the same hash degrades to a simple linked list. What is the amount of time to insert n values into a linked list of you have to check for duplicates first?

The amortized performance you learned for hashsets in school only works under the assumption of a reasonably well distributed hash function - in the worst case it degrades.

If all values deliver the same hash code then I'd argue that you have a pretty useless hash function... as I said, all of the argumentation I provided is based on the assumption of a proper hash function.

No, you don't have to check first if something is in a hash set before you insert it. Simply look up the bucket and the position where the element would be, then put it there if absent, or don't if it's not there. If your buckets are of size 0, 1 or 2 (as they should be) then the algorithm is O(1).

Bad hash codes will result in bad hash performance. No rocket science. If you take an object from a standard library, be it a number, a string, you name it - they are bound to have proper hash functions with good distributions. If a real-world hash set degenerates into a list (which is possible), I would question the programmer - not the theory.

The problem is that even the best hash function will always have pathological cases. In this case with integers the problem isn't the hash function of the integer, but the necessary transformation to fit into the set, which always requires some kind of modulo computation.

If hash mod bucket_size is always the same number you have the problem. If you know the used algorithm to convert the hash to the bucket number, the starting size and the growth algorithm of the set you can always pick your numbers in such a way that you'll get collisions. This has nothing to do with whether the hash algorithm of the data type is bad or good (integers have a pretty perfect naive hash algorithm after all).

"No, you don't have to check first if something is in a hash set before you insert it. Simply look up the bucket and the position where the element would be, then put it there if absent, or don't if it's not there."
Good explanation of how you check to see if the element exists - that's exactly how you'd do it. And since we've already established that for certain inputs all values will be in the same bucket, this degenerates to O(N) in the worst case.

Consequently the worst case performance here is O(n^{2)} and not O(n). That's simply how hashsets work. Is that important? Depends on your use case. If you aren't worrying about an attacker, or the input is outside the control of any attacker you're good. If not, you've just opened yourself up to a classical DDOS attack by not paying attention to those * next to the complexity numbers.

And before you disregard this as purely theory: This issue has plagued web frameworks and their handling of POST form data amongst many other examples I can think of. You can read bugs.python.org/issue13703 for one classic example.

Hold your horses. We went from a discussion with somebody who clearly was unfamiliar with hashing techniques to malicious attack vectors as an argument to prove a generally accepted theory wrong? Good grief, that's a stretch. Sorry, this is getting ridiculous - I just wanted to clarify a misunderstanding, that's all. If we all explained things this way, we would have an army of students running about, firmly believing that the common access complexity for a hash set is O(n). I highly doubt that this is useful to anybody.

To each their own. I find "Hashsets have O(1) amortized lookup and insert performance assuming a relatively uniform hash distribution, otherwise O(n)" a perfectly reasonable explanation even if it is marginally more complex than "Hashsets always have O(1) amortized lookup and insert performance". I doubt that most students would fail to understand the first explanation.

It's also a valuable lesson to students to teach them that you have to specify whether you mean best case, average or worst case performance when analysing algorithms.

The first is that I don't think you want to do k-=n, rather just k-n. As written, this will only work with pairs with the first number in the array.

Second, you need to limit the includes check to the remainder of the array, or you will fail in cases such as doTheyAddTo([5], 10), because 5 will be considered as a valid addend of itself.

To fix the latter, refactor into a for (let i = 0, n; i < a.length; n = a[i++]) loop to get access to the index, or the much much more readable find, e.g...

functiondoTheAddTo(a,k){a.find((n,i)=>{// because we're in a closure instead of a loop iteration,// we also don't have to worry about prematurely returning false// and can instead just return the value of our checkreturna.includes(k-n,i+1)})}

Note, using find instead of reduce or filter will maintain the short circuit you had.

Best way for 1 pass would be to as you iterate through the array, store the number pair that would make that index total k into an array. And then on each subsequent number check it it exists in the check array before storing its pair

This page lists the complexities of the major data structures in Python. Sets and dicts have a similar implementation. The lookup is O(1) in average case, the hash function can lead to O(n) lookups in pathological cases.
Also, amortized O(n) is possible when dicts/sets are resized when adding new elements

PS: For this specific problem, we do not need to use a set as an array will do i.e keys are integers. So if the range of numbers is limited, array index provides a good hash function.

I do not think there is an upper range of numbers and they are also not subsequent, so using an array like a hashset is not usefull in this particular situation

This solution is, unfortunately, not correct. Let's say we have the set {1,2,3,10,16} with k=20. Clearly, there are no two numbers that sum up to k. Your code, and all the ones I've seen in the comments, would return true, because on the 3rd iteration (number=10) we find an entry in the set that matches, but it is the number itself.
Also, while your solution is O(n), I'm pretty sure it does two passes. If I remember correctly, building a hash set is itself O(n).

In the given example, it would not return true, because the current number is added to the set after checking.
Also I tried it and it returns false.
I am not sure about the hashset building, could also use a dictionary, but it is wasted space.

Right, I am sorry, I misread. I thought you had built the hash set before you started the comparison pass, not during it. On closer inspection, good solution.

I would sort the list and create a break condition in the innermost loop. When a sum bigger than k is observed there is no point in continuing on testing said number of the outer loop.

If we sort the list first, I can do you one better: have two iterators/indices, i at the start and j at the end. If the sum of the pointees is k, return true. If it is less, increment i. If it is more, decrement j. If i and j meet (and the sum did not equal k), return false. This is O(n).

Thanks for sharing this Daily coding challenge. Is the re a way to get a daily email with a new challenge?
For this problem, solution is simple and its O(n).
If we assume that the list is sorted. we can get the sum pair in O(logn)

namespace PlayGround
{
class DailyProblems
{
int[] numArray;
int arraySize = 0;
int sumValue = 0;

public void ElementsSum()
{
Console.Write("Enter sum value: ");
sumValue = Convert.ToInt32(Console.ReadLine());
Console.Write("Enter the size of the array :");
arraySize = Convert.ToInt32(Console.ReadLine());
// Initialize array
numArray = new int[arraySize];
// Get input for the array elements
for (int i = 0; i < arraySize; i++)
{
Console.Write("Element " + (i + 1) + ": ");
numArray[i] = Convert.ToInt32(Console.ReadLine());
}
Console.WriteLine("Moment of truth...: " + MyEvaluate());
Console.ReadLine();
}
private bool MyEvaluate()
{
bool flag = false;
int difValue = 0;
for (int i = 0; i < arraySize; i++)
{
if (sumValue > numArray[i])
{
difValue = sumValue - numArray[i];
for (int k = i+1; k < arraySize; k++)
{
if (difValue == numArray[k])
return true;
}
}
}
return flag;
}
}

def num(k,a,b,c):
assert a + b or b + c or c + a == k
if a + b == k:
print("True")
elif b + c == k:
print("True")
elif c + a == k:
print("True")
else:
print("False")

I'm trying to understand this better.
How come you didn't do numbers.contains() instead of set.contains()? Does using the original list of numbers violate one pass?

Because numbers.contains() is a list and to see if it contains the element would loop through the whole list (basically O(n). That makes the solution to have two nested loops and going full O(n^2)

I thought the catch for this problem was you cant have the same number like 3 + 3 = 6. It has to be either 2 + 4 = 6 or any combination that doesn't use the same number twice.

## Daily Coding Problem #1

## Ivan on November 24, 2018

One pass in Python using built-in list and set types

Can I suggest you use more explicit variable names?

Although I prefer a more concise implementation, although probably not O(N)

This is should be the shortest version of your code

I've just realised that my algorithm is wrong, it would fail for something like 10 and [5]. Guess I need to put a bit more thought in my comments.

ATM I see 2 proposals for JS, here's another one taking advantage of the fact that JS arrays are sparse:

No need for fancy hash structures (because arrays in javascript are already hashes).

Yes, you could,

but if you look into

lookuparray, and imagen`k = -1`

,you will create an array with a lot of empty

lookup.length === 16now, for example, we have input some array with big numbers

lookup.length > max.number in inListSo… you are saying that having very big sequence of empty space is a problem.

Can you explain why is that a problem?

There no such a big problem now,

it just my opinion,

but as suggested my friend

anyway, it's better using

`new Map || new Set`

for better performance.jsperf.com/hashtable-vs-array

Here is my O(N) solution using Set:

You should move line:

visitedNumbers.add(number);

after the if clause (otherwise the sums like 4+4=8 will be counted)

A compare i made for the Set vs HashTable

jsperf.com/add-to-k

there was actually an article recently that touched on the subject mentioned by Serhii Sakal dev.to/_codingblocks/when-is-an-ar...

I was wondering, why not just use an object instead of an array?

let's take a simple example, where you have something like this

and look at the size of a.

even with your example, the created lookup array is bigger than the given array, so you're using more memory than needed.

Hash lookup is O(n) due to collisions. You cannot assume no collisions, so it's O(n

^{2)}That's only partially true. If we asume:

a proper hash function (for integers their own value is a very good hash code) and

a hash set implementation that resizes once a certain percentage of buckets have been filled (which all standard collection libraries do)

...then not only is the cost of a hash collision amortized, but also the cost of re-hashing in case that the hash set runs out of space. There's a whole lot of theory behind this (look up "amortized complexity analysis"), but generally speaking, provided a proper implementation of both the set collection itself and the hash function of the objects to be stored, hash set lookups are always amortized O(1).

I think yiu might be misunderstanding how hash tables work.

The idea is that there exists a "bounded" table for hashes. Hence, hases of ints are not themselves, but themselves modulo some number. Due to that, it might be the case that the numbers in the given list are all divisible into that number, hence they will all fall into the same hash bucket, hence the hash becomes a simple linked list, hence it's O(n). Moreover, there is no hash function dealing with thia problem, unless you assume an infinite storage space LOL

No, I understand hash tables perfectly fine. Every bucket in ha hash table

canbe implemented as a list, that's true (however, alternatives do exist as well). That list has O(n) complexity for "contains", that's true as well. However, each hash table implementation worth its salt will perform a procedure called "rehashing". For example, lets say you have a hash table with 10 buckets. Now entries 1 through 7 are being inserted, no problem. If you have a decent hash function, you will end up with 7 buckets of size 1, and 3 empty buckets. Collissions MAY occur, but if your hash function is any good they will be extremely rare. Now, we have 70% fill ratio in our table. Entry #8 wants to be inserted. Instead of just putting it in (risking more collisions), we create a NEW list of buckets which is larger than the old one (typically twice as large), so now we have 20 buckets. We reinsert (rehash) our first 7 entries into the new 20-bucket-table, then insert entry #8. We continue inserting until we are again 70% full, then we resize to 40 buckets, and so on. No infinite storage needed. Amortized cost of contains is O(1), also for insertion O(1).I studied all of that long enough, but you don't have to take my word for it - just check it on wikipedia or Stackoverflow.

Question: What happens if the hash of the values being inserted happens to be the same for all values? (Hint: Remember that you have to first check if the value already exists in the set before you can add it).

Is amortized cost for n inserts still O(n) in that scenario?

Or an even easier "proof": If you're doing open hashing as most implementations do, a hashset that contains values with the same hash degrades to a simple linked list. What is the amount of time to insert n values into a linked list of you have to check for duplicates first?

The amortized performance you learned for hashsets in school only works under the assumption of a reasonably well distributed hash function - in the worst case it degrades.

If all values deliver the same hash code then I'd argue that you have a pretty useless hash function... as I said, all of the argumentation I provided is based on the assumption of a proper hash function.

No, you don't have to check first if something is in a hash set before you insert it. Simply look up the bucket and the position where the element would be, then put it there if absent, or don't if it's not there. If your buckets are of size 0, 1 or 2 (as they should be) then the algorithm is O(1).

Bad hash codes will result in bad hash performance. No rocket science. If you take an object from a standard library, be it a number, a string, you name it - they are bound to have proper hash functions with good distributions. If a real-world hash set degenerates into a list (which is possible), I would question the programmer - not the theory.

The problem is that even the best hash function will always have pathological cases. In this case with integers the problem isn't the hash function of the integer, but the necessary transformation to fit into the set, which always requires some kind of modulo computation.

If hash mod bucket_size is always the same number you have the problem. If you know the used algorithm to convert the hash to the bucket number, the starting size and the growth algorithm of the set you can always pick your numbers in such a way that you'll get collisions. This has nothing to do with whether the hash algorithm of the data type is bad or good (integers have a pretty perfect naive hash algorithm after all).

"No, you don't have to check first if something is in a hash set before you insert it. Simply look up the bucket and the position where the element would be, then put it there if absent, or don't if it's not there."

Good explanation of how you check to see if the element exists - that's exactly how you'd do it. And since we've already established that for certain inputs all values will be in the same bucket, this degenerates to O(N) in the worst case.

Consequently the worst case performance here is O(n

^{2)}and not O(n). That's simply how hashsets work. Is that important? Depends on your use case. If you aren't worrying about an attacker, or the input is outside the control of any attacker you're good. If not, you've just opened yourself up to a classical DDOS attack by not paying attention to those * next to the complexity numbers.And before you disregard this as purely theory:

This issue has plagued web frameworks and their handling of POST form dataamongst many other examples I can think of. You can read bugs.python.org/issue13703 for one classic example.Hold your horses. We went from a discussion with somebody who clearly was unfamiliar with hashing techniques to malicious

attack vectorsas an argument to prove a generally accepted theory wrong? Good grief, that's a stretch. Sorry, this is getting ridiculous - I just wanted to clarify a misunderstanding, that's all. If we all explained things this way, we would have an army of students running about, firmly believing that the common access complexity for a hash set is O(n). I highly doubt that this is useful to anybody.To each their own. I find "Hashsets have O(1) amortized lookup and insert performance

assuming a relatively uniform hash distribution, otherwise O(n)" a perfectly reasonable explanation even if it is marginally more complex than "Hashsets always have O(1) amortized lookup and insert performance". I doubt that most students would fail to understand the first explanation.It's also a valuable lesson to students to teach them that you have to specify whether you mean best case, average or worst case performance when analysing algorithms.

I have come up with this

Here's another solution in Haskell

In JavaScript you could do

looks good, but when you use

`includes`

is itO(log n)?jsperf.com/hashtable-vs-includes

Two problems I can find.

The first is that I don't think you want to do

`k-=n`

, rather just`k-n`

. As written, this will only work with pairs with the first number in the array.Second, you need to limit the

`includes`

check to the remainder of the array, or you will fail in cases such as`doTheyAddTo([5], 10)`

, because 5 will be considered as a valid addend of itself.To fix the latter, refactor into a

`for (let i = 0, n; i < a.length; n = a[i++])`

loop to get access to the index, or the much much more readable`find`

, e.g...Note, using

`find`

instead of`reduce`

or`filter`

will maintain the short circuit you had.Your function does not return anything though. Even if you do return the value of the initial find it won't be a Boolean.

whoops. too much REPL, not enough real code. Should be

`return typeof a.find(...) !== 'undefined'`

.If you enjoy this, you may also like Project Euler

Rust solution:

Made a sized Set to overcome the reallocation and copying of data. This increases the Size requirement to O(N).

Probably the shortest

slightimprovement with a short circuit (`find`

vs`filter`

):I just realized both of these solutions fail in the case of

`twoNumbersEqual([5], 10)`

. This can be fixed by passing the index to`includes`

, as such...This should also theoretically be more performant.

Best way for 1 pass would be to as you iterate through the array, store the number pair that would make that index total k into an array. And then on each subsequent number check it it exists in the check array before storing its pair

This page lists the complexities of the major data structures in Python. Sets and dicts have a similar implementation. The lookup is O(1) in average case, the hash function can lead to O(n) lookups in pathological cases.

Also, amortized O(n) is possible when dicts/sets are resized when adding new elements

PS: For this specific problem, we do not need to use a set as an array will do i.e keys are integers. So if the range of numbers is limited, array index provides a good hash function.

I do not think there is an upper range of numbers and they are also not subsequent, so using an array like a hashset is not usefull in this particular situation

Yes. I was not sure of that constraint. In that case, set is better.

This solution is, unfortunately, not correct. Let's say we have the set {1,2,3,10,16} with k=20. Clearly, there are no two numbers that sum up to k. Your code, and all the ones I've seen in the comments, would return true, because on the 3rd iteration (number=10) we find an entry in the set that matches, but it is the number itself.

Also, while your solution is O(n), I'm pretty sure it does two passes. If I remember correctly, building a hash set is itself O(n).

In the given example, it would not return true, because the current number is added to the set after checking.

Also I tried it and it returns false.

I am not sure about the hashset building, could also use a dictionary, but it is wasted space.

Right, I am sorry, I misread. I thought you had built the hash set before you started the comparison pass, not during it. On closer inspection, good solution.

JavaScript:

returning all possible matches:

I would sort the list and create a break condition in the innermost loop. When a sum bigger than k is observed there is no point in continuing on testing said number of the outer loop.

If we sort the list first, I can do you one better: have two iterators/indices, i at the start and j at the end. If the sum of the pointees is k, return true. If it is less, increment i. If it is more, decrement j. If i and j meet (and the sum did not equal k), return false. This is O(n).

Unfortunately sorting the list is O(n log(n)).

Good point. I forgot that sorting is expensive

Don't know if it suits the problem given the fact that it'll output true/false on each item in the HashSet but here's my take on the problem

I know Count() maybe adds unneeded complexity and it may not be a one pass.

I'm looking for feedback here.

Thanks for sharing this Daily coding challenge. Is the re a way to get a daily email with a new challenge?

For this problem, solution is simple and its O(n).

If we assume that the list is sorted. we can get the sum pair in O(logn)

You just have to subscribe on their site.

I think we cannot make any assumptions about the input data and solve the task with random input

using System;

namespace PlayGround

{

class DailyProblems

{

int[] numArray;

int arraySize = 0;

int sumValue = 0;

}

Incorrect.

`s`

is a set. The`in`

(contains) for`s`

is O(1). It is traversing through the list exactly once. How are you getting O(N²)?Isn't this something like mine solution?

The task description says to do it in one pass, which I assume is

`O(N)`

This is a pythonic code:-

## Only taking 3 factors for this problem

def num(k,a,b,c):

assert a + b or b + c or c + a == k

if a + b == k:

print("True")

elif b + c == k:

print("True")

elif c + a == k:

print("True")

else:

print("False")

Well as a student here is a less elegant attempt. But I think it gets the job done

github.com/subsr97/daily-coding-pr...

This is my git repo which contains solutions in Python! :)

Hi Ivan,

I'm trying to understand this better.

How come you didn't do numbers.contains() instead of set.contains()? Does using the original list of numbers violate one pass?

Because

`numbers.contains()`

is a list and to see if it contains the element would loop through the whole list (basically`O(n)`

. That makes the solution to have two nested loops and going full`O(n^2)`

I thought the catch for this problem was you cant have the same number like 3 + 3 = 6. It has to be either 2 + 4 = 6 or any combination that doesn't use the same number twice.

Here is my solution in C++