I was reading an excellent series of articles about TPL Dataflow by Jack Vanlightly the other day. He drew my attention to the difference between Post and SendAsync when we want to send something to a block. I didn't think much of it at the time, but as I kept reading I realized what I had been missing and how important this distinction really is. So I thought it would be worth turning it into a post and sharing it with my readers.
The Difference Between Post and SendAsync
Some blocks in TPL Dataflow can buffer incoming and outgoing messages. They do this because a block may still be busy when a new message arrives, so it puts the message into a queue that we call its buffer. Sometimes, though, the incoming buffer is already full. This can only happen when the block has been given a limit, for example through BoundedCapacity in its options; by default the buffer is unbounded. A full buffer is only one of many reasons a block might not take a message right away, but it is the situation we'll use to see how Post and SendAsync behave. It's important to note that a block can postpone its decision about how to deal with the current message. This is where the main difference between the two methods lies.
When we use Post, control returns immediately if the block wants to postpone its decision about the message. For example, if the incoming buffer is full, Post returns false. In other words, the block has rejected the current item, so we need to be aware of that and put some mechanism in place to resend it. Let's make this clearer with a unit test.
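Here is a minimal sketch of such a test. It is written as a plain C# top-level program rather than as a test-framework method. It assumes the buffer is limited with DataflowBlockOptions.BoundedCapacity = 1, and the value 12 for the first item is also an assumption. It needs the System.Threading.Tasks.Dataflow package if your target framework doesn't already include it.

```csharp
using System;
using System.Threading.Tasks.Dataflow;

// Assumption: the buffer is limited to a single item through BoundedCapacity.
var block = new BufferBlock<int>(new DataflowBlockOptions { BoundedCapacity = 1 });

// The first item takes the only buffer slot, so Post accepts it.
bool firstPost = block.Post(12);

// The buffer is full and Post cannot wait for a slot to free up,
// so the item is not accepted and Post returns false straight away.
bool secondPost = block.Post(13);

Console.WriteLine($"Post(12) accepted: {firstPost}");  // True
Console.WriteLine($"Post(13) accepted: {secondPost}"); // False

// 13 was never stored: the buffer still holds only 12.
Console.WriteLine($"Items buffered: {block.Count}");   // 1
```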
In the test, our BufferBlock has room for only one buffered item. So when we send one item to it and then try to send another, block.Post(13) returns false. Our second item is not accepted, and we need a mechanism to try sending it again.
SendAsync is different: if a block decides to postpone accepting a message, it doesn't return false immediately. Instead it returns a Task<bool> that represents the block's decision in the future. In other words, we get to wait for the block to decide whether or not it accepts our message. Let's clarify this with two unit tests.
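Here is a sketch of the two tests, again written as a C# top-level program instead of test-framework methods. The single-slot buffer (BoundedCapacity = 1) and the first item 12 are assumptions. The first part checks that the task is still pending while the buffer is full. The second part frees the slot with ReceiveAsync and then awaits the task.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Assumption: a BufferBlock limited to one buffered item.
var block = new BufferBlock<int>(new DataflowBlockOptions { BoundedCapacity = 1 });
block.Post(12); // fills the only slot

// First test: the buffer is full, so the block postpones its decision
// and the Task<bool> returned by SendAsync is still pending.
Task<bool> sendTask = block.SendAsync(13);
bool completedWhileFull = sendTask.IsCompleted;
Console.WriteLine($"Completed while buffer full: {completedWhileFull}"); // False

// Second test: taking 12 out frees the slot, the block consumes the
// postponed 13, and the task completes with true.
int first = await block.ReceiveAsync();
bool accepted = await sendTask;
int second = await block.ReceiveAsync();

Console.WriteLine($"Received {first}, then {second}; accepted: {accepted}"); // Received 12, then 13; accepted: True
```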
In the first test, when we send the second item into the block using SendAsync and then check the state of the returned task, we see that it is not complete. So when the buffer is full, SendAsync returns a task that waits for the block to accept our item. Post, by contrast, returns false immediately in the same situation.
Why The Difference Is Important
So we've seen how differently these two methods behave with dataflow blocks. But when should we use each of them, and what are the implications? The main implication is that if we use Post carelessly, we might lose data that we cannot afford to lose. Keep in mind that an item can also be rejected when we use SendAsync, for other reasons, so we need a mechanism in place for both.
What Can We Do About This?
IMO we should use SendAsync most of the time, unless we are OK with losing an item in the process. For example, in some kind of statistics app with a wide tolerance for error, losing an input may be acceptable, so it may be fine to use Post there. It's also worth noting that even with SendAsync our block might still reject the item. That doesn't happen because of a limit on the buffer, though; it's more of a special case.
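One such special case is a block that has already been marked as complete. It permanently declines new messages, so the task returned by SendAsync completes right away with false, and Post returns false too. The following minimal sketch uses an unbounded BufferBlock so that the buffer limit plays no part.

```csharp
using System;
using System.Threading.Tasks.Dataflow;

// An unbounded BufferBlock: there is no buffer limit to run into here.
var block = new BufferBlock<int>();

// Signal that no more items will be sent; from now on the block
// declines every new message permanently.
block.Complete();

// SendAsync has nothing to wait for in this case; its task completes with false.
bool accepted = await block.SendAsync(42);
bool posted = block.Post(43);

Console.WriteLine($"SendAsync accepted: {accepted}"); // False
Console.WriteLine($"Post accepted: {posted}");        // False
```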
There are many solutions to this problem, but the most straightforward one, for both Post and SendAsync, is to dequeue an item and send it to the block, and if it is not accepted, enqueue it again. Another solution is a while loop that keeps sending the same item to the block until it is accepted, and only then moves on to the next item.
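Both ideas can be sketched with Post. The example assumes a slow ActionBlock<int> consumer with BoundedCapacity = 1, so that Post is rejected often. The back-off delays and the helper name PostUntilAcceptedAsync are hypothetical choices for illustration, not part of TPL Dataflow.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// A slow consumer that holds only one item at a time (counting the one
// being processed), so Post is frequently rejected.
var processed = new ConcurrentQueue<int>();
var target = new ActionBlock<int>(async item =>
{
    await Task.Delay(10); // simulate work
    processed.Enqueue(item);
}, new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });

// Strategy 1: dequeue an item, Post it, and enqueue it again if rejected.
// Re-queued items go to the back, so the order they reach the block can change.
var pending = new Queue<int>(Enumerable.Range(1, 5));
while (pending.Count > 0)
{
    int item = pending.Dequeue();
    if (!target.Post(item))
    {
        pending.Enqueue(item);
        await Task.Delay(5); // hypothetical back-off before the next attempt
    }
}

// Strategy 2: retry the same item in a loop until it is accepted,
// then move on to the next one.
for (int item = 6; item <= 10; item++)
{
    await PostUntilAcceptedAsync(target, item);
}

target.Complete();
await target.Completion;
Console.WriteLine($"Processed {processed.Count} items"); // Processed 10 items

// Hypothetical helper: keeps calling Post until the block accepts the item.
static async Task<bool> PostUntilAcceptedAsync(ITargetBlock<int> targetBlock, int item)
{
    while (!targetBlock.Post(item))
    {
        // A completed block will never accept the item, so stop retrying.
        if (targetBlock.Completion.IsCompleted) return false;
        await Task.Delay(5); // hypothetical back-off
    }
    return true;
}
```

As written, strategy 1 would spin forever if the block were completed while items were still pending. A real version would need the same kind of completion check that the helper has.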
In this post I described the difference between Post and SendAsync and showed how they behave under different circumstances. I also proposed some simple solutions for situations where we can't afford to lose data. If you need more information, I suggest you also read this post and this question.