Discussion on: Rust futures: an uneducated, short and hopefully not boring tutorial - Part 4 - A "real" future from scratch

Thanks for writing this tutorial. I'm wondering about something, though. Creating a separate thread for unparking seems like killing the point of futures altogether. I might as well store my result in a cross-thread variable as an Option and check from time to time whether it holds Some(xxx) or None.

What seems appropriate is that, since the future is, say, calculating something or querying a db (either way it's running code), the code should probably unpark the future when it's done. That way you have no overhead and no latency. However, in that case, why poll (even though futures don't actually poll)? In other words, why even have a poll method if you can replace it with an event? Maybe it's just a misnomer, but if you unpark, why not pass the result of the future along at once, saving the call to poll?

Maybe I'm just confused and I missed something fundamental, but I have an allergy to polling anyway.

Francesco Cogno • Nov 29 '17 • Edited

Creating a separate thread for unparking seems like killing the point of futures altogether.

Yes, of course you are right! This is a contrived example: I'm just using a separate thread to simulate an "external event coming to completion". You are not supposed to do this in reality (as you correctly point out, it would defeat the purpose of having a future :) ).

What seems appropriate is that, since the future is, say, calculating something or querying a db (either way it's running code), the code should probably unpark the future when it's done.

Exactly! Futures are great when waiting for an "external resource". You can block your thread waiting for it to complete, or you can do something else in the meantime. Apologies if the example above is misleading: I just wanted to simulate an "external blocking resource" and I thought a sleeping thread would be simple to understand.

However, in that case, why poll (even though futures don't actually poll)? In other words, why even have a poll method if you can replace it with an event? Maybe it's just a misnomer, but if you unpark, why not pass the result of the future along at once, saving the call to poll?

Rust futures support both approaches. If the "external resource" can raise an event upon completion (or progress) you can definitely use the event route. Just park the task and let the completion event unpark the task (as we did above).
If not, you can poll the "external resource" instead.
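
A minimal sketch of the event route, with one big assumption: it uses today's std::future API rather than the futures 0.1 park/unpark API from the tutorial, so `Waker::wake` stands in for unpark. The names `Shared`, `ExternalEvent`, `ThreadWaker` and `block_on` are made up for illustration, and the sleeping thread only simulates the external resource.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;
use std::time::Duration;

// State shared between the future and the simulated "external resource".
#[derive(Default)]
struct Shared {
    result: Option<u32>,
    waker: Option<Waker>,
}

// Hypothetical future that completes when the external event fires.
struct ExternalEvent {
    shared: Arc<Mutex<Shared>>,
}

impl ExternalEvent {
    fn start(value: u32, delay: Duration) -> Self {
        let shared = Arc::new(Mutex::new(Shared::default()));
        let producer = Arc::clone(&shared);
        // This thread only simulates a resource that finishes later.
        thread::spawn(move || {
            thread::sleep(delay);
            let mut guard = producer.lock().unwrap();
            guard.result = Some(value);
            // The completion event wakes the task; nobody checks in a loop.
            if let Some(waker) = guard.waker.take() {
                waker.wake();
            }
        });
        ExternalEvent { shared }
    }
}

impl Future for ExternalEvent {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        let mut guard = self.shared.lock().unwrap();
        match guard.result.take() {
            Some(value) => Poll::Ready(value),
            None => {
                // Not ready yet: remember whom to wake, then yield.
                guard.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

// Tiny executor for a single future: the thread sleeps until it is woken.
struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = Box::pin(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            // Sleep until the completion event wakes us, then poll again.
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    let answer = block_on(ExternalEvent::start(42, Duration::from_millis(50)));
    println!("external event completed with {}", answer);
}
```

In this sketch poll is not a busy loop: it normally runs once to register interest and once more, after the wake-up, to collect the result.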

This difference is visible only to the crate implementer. The user consuming the future does not need to care about it. It's just a future: they can chain it, join it, and so on, regardless of whether it is "event based" or "polling based". It will work.

I have an allergy to polling anyway.

That made me laugh :D!

Naja Melan • Nov 29 '17 • Edited

Ok, thanks. That clarifies some things. If I get it right, to fully benefit from async, any external resource still has to support async as well, since otherwise some thread will still have to block (e.g. if I use the file system API in std?). And any computational work is best put in separate threads to benefit from concurrency.

I write this because the Alex Crichton tutorial starts out by making the point that you don't need so much multithreading when using async, which is true to some extent, but also a bit confusing if you are just trying to understand how it all fits together.

When I tried to understand how to use hyper, I ran into stuff like this. It did help me because it seems like the example with the least boilerplate (and it shows how to use futures-await with hyper), but it makes all methods async even though they're all running in the same thread. If I understand correctly, this will just add overhead and give no benefit at all. As far as I can tell, futures-await does not make your code multithreaded. Am I getting it right?

ps: I found the explanation of how futures get unparked here: tokio.rs/docs/going-deeper-futures...

Francesco Cogno • Nov 30 '17 • Edited

Ok, thanks. That clarifies some things. If I get it right, to fully benefit from async, any external resource still has to support async as well, since otherwise some thread will still have to block (e.g. if I use the file system API in std?). And any computational work is best put in separate threads to benefit from concurrency.

Yes. In general, operating systems offer some support for async I/O and leave it to developers to optimize CPU-bound tasks via, for example, thread pools. Take a look at these pages, Synchronous and Asynchronous I/O and I/O Completion Ports: they show how Windows offers async I/O to its developers.

I write this because the Alex Crichton tutorial starts out by making the point that you don't need so much multithreading when using async, which is true to some extent, but also a bit confusing if you are just trying to understand how it all fits together.

I felt the same way. It's not that Alex's tutorial is bad. It's actually very, very good. But it takes many things for granted, meaning mortal developers such as myself have a hard time following it. That's why I wrote this tutorial in the first place. I hope it helps someone :)

When I tried to understand how to use hyper, I ran into stuff like this. It did help me because it seems like the example with the least boilerplate (and it shows how to use futures-await with hyper), but it makes all methods async even though they're all running in the same thread. If I understand correctly, this will just add overhead and give no benefit at all. As far as I can tell, futures-await does not make your code multithreaded. Am I getting it right?

TL;DR

Futures are more efficient than threads.

Long answer

The whole point of futures is to multiplex more tasks onto a single thread. In the case of the Hyper web server, this lets you handle multiple connections concurrently in the same thread. While one task is sending data to the network, for example, you can prepare the next answer (in the same thread).
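
To make the multiplexing idea concrete, here is a rough sketch using std::future (not the futures 0.1 API of the tutorial, and nothing like what Hyper or tokio really do internally). `Connection`, `NoopWaker` and `run_all` are hypothetical names; each `Connection` just pretends to need a few more polls before its "network" is ready. The executor simply polls every pending task in turn, where a real reactor would wait for readiness events instead.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical task standing in for a connection that is still waiting
// on the network: it needs `remaining` more polls before it completes.
struct Connection {
    id: u32,
    remaining: u32,
}

impl Future for Connection {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        if self.remaining == 0 {
            return Poll::Ready(self.id);
        }
        self.remaining -= 1;
        Poll::Pending
    }
}

// Waker that does nothing: this toy executor re-polls everything anyway.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Drives all tasks on the calling thread, round-robin, and returns the
// ids in the order in which the tasks completed.
fn run_all(tasks: Vec<Connection>) -> Vec<u32> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut pending: Vec<Pin<Box<Connection>>> =
        tasks.into_iter().map(Box::pin).collect();
    let mut finished = Vec::new();

    while !pending.is_empty() {
        // Give every pending task one turn; keep only the unfinished ones.
        pending.retain_mut(|task| match task.as_mut().poll(&mut cx) {
            Poll::Ready(id) => {
                finished.push(id);
                false
            }
            Poll::Pending => true,
        });
    }
    finished
}

fn main() {
    let tasks = vec![
        Connection { id: 1, remaining: 2 },
        Connection { id: 2, remaining: 0 },
        Connection { id: 3, remaining: 1 },
    ];
    println!("completion order: {:?}", run_all(tasks));
}
```

All three tasks make progress on one thread, and the ones that are ready sooner finish first (here the order is 2, 3, 1) without anyone spawning a thread per connection.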

You can have the same effect using multiple threads of course (the classic educational approach: "listen to a port, accept a connection, fork the process to handle the connection") but it's less efficient.
Threads are expensive to create (both in terms of CPU and memory), so with many short-lived connections (such as HTTP) spawning a thread for each connection is a poor fit: you can end up spending more time creating threads than doing actual work. Threads also make it hard to surface errors/exceptions (Rust helps here, to an extent). Generally speaking, resource cleanup in threads is hard.

You could avoid the thread creation cost by using a thread pool, but the burden of managing it would be on your shoulders. Hence futures: they are a convenient way of hiding the complexity of multiplexing tasks on one or a few threads.
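
To give an idea of what "managing it yourself" means, here is a bare-bones thread pool sketch using only std. `ThreadPool`, `Job` and `sum_of_squares` are hypothetical names, and a production pool would need more than this (panic handling, back-pressure, graceful shutdown on drop).

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

// Hypothetical fixed-size pool: workers pull jobs from a shared queue.
struct ThreadPool {
    sender: Option<mpsc::Sender<Job>>,
    workers: Vec<thread::JoinHandle<()>>,
}

impl ThreadPool {
    fn new(size: usize) -> Self {
        assert!(size > 0, "a pool needs at least one worker");
        let (sender, receiver) = mpsc::channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers = (0..size)
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                thread::spawn(move || loop {
                    // Hold the lock only while taking a job off the queue.
                    let job = receiver.lock().unwrap().recv();
                    match job {
                        Ok(job) => job(),
                        Err(_) => break, // queue closed: worker exits
                    }
                })
            })
            .collect();
        ThreadPool { sender: Some(sender), workers }
    }

    fn execute<F: FnOnce() + Send + 'static>(&self, job: F) {
        self.sender.as_ref().unwrap().send(Box::new(job)).unwrap();
    }

    // Closes the queue and waits for the workers to drain it.
    fn join(mut self) {
        drop(self.sender.take());
        for worker in self.workers.drain(..) {
            worker.join().unwrap();
        }
    }
}

// Example workload: square each input on the pool and add up the results.
fn sum_of_squares(inputs: Vec<u64>, threads: usize) -> u64 {
    let pool = ThreadPool::new(threads);
    let (tx, rx) = mpsc::channel();
    for n in inputs {
        let tx = tx.clone();
        pool.execute(move || tx.send(n * n).unwrap());
    }
    drop(tx);
    pool.join();
    rx.iter().sum()
}

fn main() {
    println!("sum of squares: {}", sum_of_squares(vec![1, 2, 3, 4], 2));
}
```

The threads are created once and reused, but queueing, shutdown and result collection are all your problem, which is the bookkeeping a futures-based runtime takes off your hands.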

Alex's futures are particularly elegant because, off the top of my head:

They handle the "failure" gracefully using the standard Rust Result
They allow zero cost abstractions (if you don't Box of course)
They support both poll based and event based tasks
They let you pick and choose when to move values into closures
The combinators are very ergonomic to use (once you understand how they work)
Streams map nicely onto iterators

To make my point, look at the performance of tokio-minihttp (a minimal web server based on futures) compared to hyper: