Introduction
When working with large data sets, such as files, databases, or API responses, loading everything into memory can cause significant performance issues and unnecessary memory usage. In C#, the IAsyncEnumerable<T> interface allows us to process data in chunks, or as a stream, rather than loading it all at once.
This blog will focus on chunking, a memory-efficient technique, and how it works with IAsyncEnumerable to avoid memory overload. We'll walk through an example that demonstrates streaming a large file in chunks, processing each chunk asynchronously.
The Problem: Loading All Data into Memory
In typical scenarios, when dealing with large data sets, a naive approach would load the entire file or database result set into memory at once. This approach can easily lead to out-of-memory exceptions or severe performance degradation, especially when the data is large or the application runs on resource-constrained environments.
Here’s an example of how data is inefficiently loaded all at once:
public async Task<List<string>> LoadFileAsync(string filePath)
{
    var lines = new List<string>();
    var allLines = await File.ReadAllLinesAsync(filePath); // Loads everything at once
    lines.AddRange(allLines);
    return lines;
}
In this example, the ReadAllLinesAsync method reads the entire file into memory. This approach is not ideal for large files, as it can consume memory proportional to the size of the file.
The Solution: Streaming with IAsyncEnumerable and Chunking
Using IAsyncEnumerable, we can stream the data in manageable chunks, processing each chunk as it becomes available. This approach saves memory and improves performance, especially for large data sets or files.
Let’s walk through an example where we read a large file line by line in chunks, without loading the entire file into memory. We’ll process a chunk of lines asynchronously and continue reading as more lines become available.
Example: Streaming Large Files in Chunks
In this example, we read a file in chunks of lines and process each chunk asynchronously:
public async IAsyncEnumerable<List<string>> ReadFileInChunksAsync(string filePath, int chunkSize)
{
    using var reader = new StreamReader(filePath);
    while (!reader.EndOfStream)
    {
        var lines = new List<string>(chunkSize);
        while (lines.Count < chunkSize && !reader.EndOfStream)
        {
            var line = await reader.ReadLineAsync();
            if (line != null)
                lines.Add(line);
        }
        // Yield the chunk; a fresh list is allocated per chunk so the
        // caller can keep a reference to it without it being cleared later
        yield return lines;
    }
}
In this example:
- chunkSize controls how many lines are read before yielding the next batch.
- yield return sends each chunk back to the caller without loading the entire file into memory.
- A fresh list is allocated for each chunk, so the next chunk can be collected without memory bloat and callers can safely hold on to chunks they have already received (see the sketch below).
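Because each yielded chunk is its own list, callers can safely hold on to chunks after the loop advances. Here is a short sketch of a consumer that collects every chunk; note that materializing all chunks like this gives up the memory savings and is shown only to illustrate the aliasing pitfall:
public async Task<List<List<string>>> CollectAllChunksAsync(string filePath, int chunkSize)
{
    var chunks = new List<List<string>>();
    await foreach (var chunk in ReadFileInChunksAsync(filePath, chunkSize))
    {
        // Safe only because the iterator yields a fresh list per chunk;
        // if it reused and cleared a single list, every entry here would end up empty
        chunks.Add(chunk);
    }
    return chunks;
}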
Consuming the Chunks
To consume the chunks asynchronously, you can use an await foreach loop. Here's an example of how to process each chunk:
public async Task ProcessFileChunksAsync(string filePath, int chunkSize)
{
    await foreach (var chunk in ReadFileInChunksAsync(filePath, chunkSize))
    {
        Console.WriteLine($"Processing chunk of {chunk.Count} lines...");
        // Process each line in the chunk
        foreach (var line in chunk)
        {
            Console.WriteLine(line);
        }
    }
}
In this example:
- Each chunk is processed as it becomes available.
- Memory is conserved since we only hold one chunk of data in memory at any time, rather than the entire file.
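The consumer above only prints each line. In a real pipeline the per-chunk work is usually asynchronous itself; here is a minimal sketch, assuming the goal is to copy transformed lines to an output file (the uppercase transform is just a stand-in for real work):
public async Task TransformFileChunksAsync(string inputPath, string outputPath, int chunkSize)
{
    await using var writer = new StreamWriter(outputPath);

    await foreach (var chunk in ReadFileInChunksAsync(inputPath, chunkSize))
    {
        // Asynchronous work per chunk: transform each line and write it out
        foreach (var line in chunk)
        {
            await writer.WriteLineAsync(line.ToUpperInvariant());
        }
    }
}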
Why This Saves Memory
With the chunking approach using IAsyncEnumerable, we stream data in small, manageable batches instead of loading everything into memory at once. The key advantages include:
- Lower memory usage: Only a portion of the data (a chunk) is loaded into memory, preventing the application from over-consuming memory resources.
- Increased scalability: This approach works well for processing large files, logs, or database result sets, especially when the data set size exceeds available memory (a generic helper for arbitrary async streams is sketched after this list).
- Reduced latency: Data can be processed as soon as the first chunk is ready, without waiting for the entire data set to be loaded.
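The same pattern generalizes beyond files. As a minimal sketch (ChunkAsync is a name of my own, not a built-in API), here is a helper that batches any IAsyncEnumerable<T>, such as database rows exposed as an async stream, into fixed-size chunks:
using System.Runtime.CompilerServices;

public static class AsyncChunkingExtensions
{
    // Batches any async stream into lists of up to chunkSize items
    public static async IAsyncEnumerable<List<T>> ChunkAsync<T>(
        this IAsyncEnumerable<T> source,
        int chunkSize,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        var batch = new List<T>(chunkSize);
        await foreach (var item in source.WithCancellation(cancellationToken))
        {
            batch.Add(item);
            if (batch.Count == chunkSize)
            {
                yield return batch;
                batch = new List<T>(chunkSize); // fresh list per chunk, as above
            }
        }
        if (batch.Count > 0)
            yield return batch; // emit the final, partially filled chunk
    }
}
With a helper like this, a query exposed as an async stream (for example, EF Core's AsAsyncEnumerable()) can be processed in fixed-size batches with the same memory profile as the file example.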
Comparing Memory Usage
Here’s a simple comparison to illustrate how memory is managed in both approaches:
- Without chunking (entire file in memory): If the file is 500 MB, the entire 500 MB is loaded into memory.
- With chunking (streaming): If each chunk holds roughly 1 MB of lines, only about 1 MB is held in memory at a time, reducing peak memory usage significantly.
Best Practices
- Adjust chunk size: Find a chunk size that balances memory usage against processing efficiency. Smaller chunks use less memory but incur more iteration and I/O overhead; larger chunks do the opposite.
- Use cancellation tokens: Implement cancellation tokens to allow the stream to be cancelled during long-running operations.
- Dispose resources properly: Always ensure streams and readers are disposed of correctly to avoid memory leaks.
Example with CancellationToken
Let’s add a cancellation token to handle long-running operations gracefully:
using System.Runtime.CompilerServices;

public async IAsyncEnumerable<List<string>> ReadFileInChunksAsync(
    string filePath,
    int chunkSize,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    using var reader = new StreamReader(filePath);
    while (!reader.EndOfStream)
    {
        // Throw if cancellation has been requested before reading the next chunk
        cancellationToken.ThrowIfCancellationRequested();

        var lines = new List<string>(chunkSize);
        while (lines.Count < chunkSize && !reader.EndOfStream)
        {
            // The CancellationToken overload of ReadLineAsync requires .NET 7 or later
            var line = await reader.ReadLineAsync(cancellationToken);
            if (line != null)
                lines.Add(line);
        }
        yield return lines;
    }
}
Now the method checks for cancellation before reading each chunk and throws OperationCanceledException if the token has been signalled. The [EnumeratorCancellation] attribute also lets consumers supply a token through WithCancellation on the returned stream.
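To see cancellation in action, the consumer passes a token through WithCancellation. Here is a minimal sketch, using an arbitrary two-second timeout purely for illustration:
public async Task ProcessWithTimeoutAsync(string filePath, int chunkSize)
{
    // Arbitrary timeout, chosen only for this example
    using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(2));

    try
    {
        // WithCancellation flows the token into the iterator's
        // [EnumeratorCancellation] parameter
        await foreach (var chunk in ReadFileInChunksAsync(filePath, chunkSize)
                           .WithCancellation(cts.Token))
        {
            Console.WriteLine($"Processing chunk of {chunk.Count} lines...");
        }
    }
    catch (OperationCanceledException)
    {
        Console.WriteLine("Processing was cancelled before the file was fully read.");
    }
}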
Conclusion
Using IAsyncEnumerable with chunking is an efficient way to handle large data sets in C#. By streaming data in chunks, you save memory, improve scalability, and keep your applications responsive.
Start implementing chunking with IAsyncEnumerable in your projects to optimize memory usage, especially when dealing with large files or databases. Stay tuned for more performance optimization techniques in our upcoming blogs.