In my previous post, I explained what TPL Dataflow is and when we should and shouldn't use it, but I didn't go into much detail about how it can help us solve real-world scenarios. Since then, thanks to a comment on that post, I went through different Stack Overflow questions and saw how the mainstream concurrency constructs in .NET can get cumbersome and keep us from solving the core issue. So in this post I'm going to go into more detail about TPL Dataflow, and I'm going to use ActionBlock to do that.
In my last post I said we can build pipelines with TPL Dataflow. Since then I have realized that it can be used to solve all kinds of other concurrency problems too. So I'll go through a couple of these scenarios and show you how easily we can solve them with TPL Dataflow. In my next post I'll show you a real-world example of building a processing pipeline and how it can make development easier.
Asynchronous Data Streaming
Problem: Data Stream With Asynchronous Iterator Block
As you might know, the await and yield keywords don't mix well together (at least before C# 8 introduced async streams with IAsyncEnumerable<T>); you can see what I mean in this question. Now imagine we have a method like this and we want to make it asynchronous.
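The original listing is not reproduced here, so the following is only a minimal sketch of what such a method might look like. The name Read comes from the error message quoted below; the Data type, the SyncReader class, and the TextReader source are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical item type; the post does not show its real shape.
public sealed class Data
{
    public string Value { get; set; }
}

public static class SyncReader
{
    // A synchronous iterator block: each line is turned into a Data item
    // and handed to the caller lazily, so nothing is buffered.
    public static IEnumerable<Data> Read(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return new Data { Value = line };
        }
    }
}
```

The streaming is lazy, but every ReadLine call blocks the calling thread, which is what we would like to get rid of.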
If we try to make this method async, we get this error:
The body of ‘Read’ cannot be an iterator block because ‘Task<IEnumerable<Data>>’ is not an iterator interface type.
One solution, though not an ideal one, is to accumulate the data into a list and then return it.
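As a rough sketch of that workaround (again with a hypothetical Data type and a TextReader standing in for the real data source, neither of which comes from the original listing):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical item type, assumed for illustration.
public sealed class Data
{
    public string Value { get; set; }
}

public static class BufferedReader
{
    // Asynchronous, but the caller receives nothing until every line
    // has been read and stored in the list.
    public static async Task<IEnumerable<Data>> ReadAsync(TextReader reader)
    {
        var result = new List<Data>();
        string line;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            result.Add(new Data { Value = line });
        }
        return result;
    }
}
```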
But this solution can waste a lot of resources on the server, since we have to accumulate all the data before returning it. The ideal solution is a construct that not only does things asynchronously but also streams the data as it becomes available. That way we don't need to accumulate the data and waste a lot of RAM. We'll see that in the next solution.
Solution: Data Stream With TPL Dataflow ActionBlock
One easy solution for this kind of problem is to use ActionBlock. We can write something like this instead of the previous code.
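The original listing is missing from this copy, so here is a minimal sketch under stated assumptions: the Data type, the StreamingReader class, the TextReader source, and the process delegate are hypothetical, and using Environment.ProcessorCount for the parallelism is my choice rather than the author's. The ActionBlock, ExecutionDataflowBlockOptions, and SendAsync calls are the real TPL Dataflow API.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical item type, assumed for illustration.
public sealed class Data
{
    public string Value { get; set; }
}

public static class StreamingReader
{
    public static async Task ReadAsync(TextReader reader, Func<Data, Task> process)
    {
        var block = new ActionBlock<Data>(process, new ExecutionDataflowBlockOptions
        {
            BoundedCapacity = 1000,                               // at most 1000 items held by the block
            MaxDegreeOfParallelism = Environment.ProcessorCount   // assumed value
        });

        string line;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            // SendAsync waits without blocking a thread while the block is full;
            // it returns false if the block no longer accepts items (e.g. it faulted).
            if (!await block.SendAsync(new Data { Value = line }))
            {
                break;
            }
        }

        block.Complete();          // no more items will be sent
        await block.Completion;    // wait for in-flight items; rethrows processing errors
    }
}
```

Because the producer awaits SendAsync, a slow consumer slows the reader down instead of letting unprocessed items pile up in memory.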
With this simple solution, not only have we solved the problem of accumulating data, but we also have an asynchronous program. In the code above, we first create an ActionBlock and pass in a delegate that is responsible for processing the data. Then we specify the ExecutionDataflowBlockOptions and tell the block to hold at most 1000 items at a time by setting BoundedCapacity. We also set MaxDegreeOfParallelism, which determines how many items this ActionBlock is allowed to process concurrently.
After creating the block, we send our items asynchronously into the block for processing. So now we have a more flexible and robust way of streaming in an asynchronous fashion. You can read the full discussion around this problem in this question.
Asynchronous Batch Processing
Problem: Batch Processing With Task.Run
Suppose we have to do some kind of batch processing. We have a queue, and in a loop we dequeue one item at a time and need to process it asynchronously. Imagine we have written code like this.
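The original listing is not available here; a minimal sketch of the pattern being described might look like the following. The NaiveBatch class, the generic signature, and the process delegate are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class NaiveBatch
{
    public static Task ProcessAllAsync<T>(Queue<T> queue, Func<T, Task> process)
    {
        var tasks = new List<Task>();
        while (queue.Count > 0)
        {
            T item = queue.Dequeue();
            // One Task.Run per item: nothing limits how many run at once.
            tasks.Add(Task.Run(() => process(item)));
        }
        return Task.WhenAll(tasks);
    }
}
```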
The problem with this code is that it starts a separate piece of thread-pool work for every item in the queue, with nothing limiting how many run at once. With a large queue that can tie up a lot of worker threads and hurt our performance. Our goal here is a program that is asynchronous, non-blocking, and parallel at the same time, but with the current solution we waste a lot of resources to achieve this. You can see the solution next.
Solution: Batch Processing With TPL Dataflow ActionBlock
With the help of the TPL Dataflow ActionBlock, we could change our previous code to something like this.
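Since the original listing is missing, the following is a sketch of one way this could be written, not the author's exact code. The BoundedBatch class, the generic signature, and the maxParallelism parameter are assumptions; ActionBlock, Post, Complete, and Completion are the real TPL Dataflow members.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BoundedBatch
{
    public static async Task ProcessAllAsync<T>(
        Queue<T> queue, Func<T, Task> process, int maxParallelism)
    {
        var block = new ActionBlock<T>(process, new ExecutionDataflowBlockOptions
        {
            // Upper bound on how many items are processed concurrently.
            MaxDegreeOfParallelism = maxParallelism
        });

        while (queue.Count > 0)
        {
            // The block is unbounded by default, so Post accepts the item
            // immediately unless the block has completed or faulted.
            block.Post(queue.Dequeue());
        }

        block.Complete();          // signal that no more items are coming
        await block.Completion;    // wait until every queued item is processed
    }
}
```

A call such as `await BoundedBatch.ProcessAllAsync(queue, HandleAsync, 4)` (with a hypothetical HandleAsync method) would then process at most four items at a time, however long the queue is.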
Now we not only have a program that is both parallel and asynchronous, but we also control how many resources we dedicate to the task. I know this example is contrived, but what I want to show is that a lot of bad and verbose async and parallel code can be changed to use TPL Dataflow constructs. There are many other constructs in the TPL Dataflow library, which are often used for more advanced data processing networks. But we can also use something as simple as ActionBlock to create asynchronous and parallel programs that are not only more flexible but also easier to write and maintain.
Summary
In this post we went through a couple of examples of real-world problems from Stack Overflow related to streaming and batch processing. We saw how using the ordinary asynchronous constructs in .NET can make things more difficult. Finally, we saw how we can solve these kinds of problems more easily with what TPL Dataflow makes available to us. If you want to know more about TPL Dataflow, here's a good place to start.