Grew up in Russia, lived in the States, moved to Germany, sometimes live in Spain. I program since I was 13. I used to program games, maps and now I reverse engineer password managers and other stuff
Location
Berlin and Málaga
Education
MS in CS from State Polytechnic University of St. Petersburg
Maybe I'm misunderstanding you, but I would assume that creating one temp array is generally going to be faster than creating N strings, for large N. And less memory, since the strings will need O(n2).
Since the strings are immutable, it's possible to optimize the substring storage. It's possible to make it very efficient, actually. The original string is already occupying the memory, so the parts of it would have to just refer to character ranges in the original buffer. I'm quite certain split and join are implemented in the core in C or C++ and are much more efficient.
And it shouldn't be O(n^2), since the sum of all substrings is the original string, which is n.
I'll contend that unless the performance issue is staring you in the face, that cleaner version is the right one. When there's a bottleneck, you can profile and find it.
That is true to an extent. Sometimes the performance degradation is like a death by thousand cuts, when it's impossible to find a bottleneck. Just things become slower and slower. I agree that premature optimization is not a good thing, but understanding the impact of your code and potential scaling problems is very important. It's quite easy to introduce a naive O(n^2) algorithm and not noice the problems because n is very small. And then the system hits production and things start to time out in weird places.
Full stack web dev.
Studying FP web development approaches, while helping Mission Bit create paths to programming for underserved public school kids.
Previously @ Gradescope.
the sum of all substrings is the original string, which is n.
Ah, I meant the intermediate strings. The sum of "a", "ab", "abc", ... is triangular, so the memory usage depends on whether they're stored. (I'd originally written something like "language optimization aside" but took it out for some reason.)
But your comment led me to read about "cons strings" in V8; thanks for that. That's pretty cool.
And then the system hits production and things start to time out in weird places.
Regarding UniformlySlowCode: FWIW, my experience is like that of the folks who said there that they hadn't seen it firsthand, or only saw it when the architecture was a bad fit. It does happen. Twitter rewrote their RoR back end in Scala. But they do real-time messaging with 250m users. They're an exceptional case. Who knows if they'd have gotten to that size if they hadn't started with RoR?
I worried about writing tight loops when I did contest problems (in Java, so buffered string building was necessary). I also might have a different take if I wrote DOS games. But what I see in my web application work is stuff like this:
transaction wraparound ID / vacuuming made an important db table unusable
big parallel image processing jobs done by a microservice had to be batched because having them each call back with results DDOSed us
CSS transitions + scaling of large images on mobile retina devices quickly hit the per-tab memory limit
not being able to pass continuous mouse input, or even fast keystrokes, through a redux store
The causes vary, but they're never amenable to a quick fix involving array indices. I guess something that is similar is when you have to replace ORM-generated queries with hand-tuned ones. You can (and people do) advocate ditching the ORM. But there are always trade-offs, and I've seen some real dumpster-fire data access layers on projects that lack them.
Anyway, if I somehow had to regularly reverse 1 GB things, I'd put them on a job queue, and might also have the app shell out to a C program that first wrote the reversed thing to disk. Probably not the right answer for an interview question, though 😂😂
Grew up in Russia, lived in the States, moved to Germany, sometimes live in Spain. I program since I was 13. I used to program games, maps and now I reverse engineer password managers and other stuff
Location
Berlin and Málaga
Education
MS in CS from State Polytechnic University of St. Petersburg
Ah, I meant the intermediate strings. The sum of "a", "ab", "abc", ... is triangular
That's triangular, but In don't really see why would that be needed in split.
Anyway, if I somehow had to regularly reverse 1 GB things, I'd put them on a job queue, and might also have the app shell out to a C program that first wrote the reversed thing to disk. Probably not the right answer for an interview question, though
I understand that it doesn't make sense to reverse 1 gb stings and this will never happen in real life. As never will happen a task to reverse a string or find the longest word on your day job after you passed that interview. It's just a made up example to talk about the code. And to talk about performance of that code. Which might be very well a follow up question to this answer. When you write any code you should be able to reason about it's performance to some extent, both memory and CPU impact. That's why talking about very big numbers becomes important, for n = 3 any algorithm works fine ;-)
For further actions, you may consider blocking this person and/or reporting abuse
We're a place where coders share, stay up-to-date and grow their careers.
Since the strings are immutable, it's possible to optimize the substring storage. It's possible to make it very efficient, actually. The original string is already occupying the memory, so the parts of it would have to just refer to character ranges in the original buffer. I'm quite certain
split
andjoin
are implemented in the core in C or C++ and are much more efficient.And it shouldn't be
O(n^2)
, since the sum of all substrings is the original string, which isn
.That is true to an extent. Sometimes the performance degradation is like a death by thousand cuts, when it's impossible to find a bottleneck. Just things become slower and slower. I agree that premature optimization is not a good thing, but understanding the impact of your code and potential scaling problems is very important. It's quite easy to introduce a naive
O(n^2)
algorithm and not noice the problems becausen
is very small. And then the system hits production and things start to time out in weird places.Thanks a lot =)
Ah, I meant the intermediate strings. The sum of "a", "ab", "abc", ... is triangular, so the memory usage depends on whether they're stored. (I'd originally written something like "language optimization aside" but took it out for some reason.)
But your comment led me to read about "cons strings" in V8; thanks for that. That's pretty cool.
Regarding UniformlySlowCode: FWIW, my experience is like that of the folks who said there that they hadn't seen it firsthand, or only saw it when the architecture was a bad fit. It does happen. Twitter rewrote their RoR back end in Scala. But they do real-time messaging with 250m users. They're an exceptional case. Who knows if they'd have gotten to that size if they hadn't started with RoR?
I worried about writing tight loops when I did contest problems (in Java, so buffered string building was necessary). I also might have a different take if I wrote DOS games. But what I see in my web application work is stuff like this:
The causes vary, but they're never amenable to a quick fix involving array indices. I guess something that is similar is when you have to replace ORM-generated queries with hand-tuned ones. You can (and people do) advocate ditching the ORM. But there are always trade-offs, and I've seen some real dumpster-fire data access layers on projects that lack them.
Anyway, if I somehow had to regularly reverse 1 GB things, I'd put them on a job queue, and might also have the app shell out to a C program that first wrote the reversed thing to disk. Probably not the right answer for an interview question, though 😂😂
That's triangular, but In don't really see why would that be needed in
split
.I understand that it doesn't make sense to reverse 1 gb stings and this will never happen in real life. As never will happen a task to reverse a string or find the longest word on your day job after you passed that interview. It's just a made up example to talk about the code. And to talk about performance of that code. Which might be very well a follow up question to this answer. When you write any code you should be able to reason about it's performance to some extent, both memory and CPU impact. That's why talking about very big numbers becomes important, for n = 3 any algorithm works fine ;-)