Learn how to zip and unzip compressed files with C#. Beware: it’s not as obvious as it might seem!
Just a second! 🫷 If you are here, it means that you are a software developer.
So, you know that storage, networking, and domain management have a cost.
If you want to support this blog, please ensure that you have disabled the adblocker for this site. I configured Google AdSense to show as few ADS as possible – I don’t want to bother you with lots of ads, but I still need to add some to pay for the resources for my site.
Thank you for your understanding. – Davide
When working with local files, you might need to open, create, or update Zip files.
In this article, we will learn how to work with Zip files in C#. We will learn how to perform basic operations such as opening, extracting, and creating a Zip file.
The main class we will use is named ZipFile, and it comes from the System.IO.Compression namespace. It has been available since .NET Framework 4.5, so we can say it’s pretty stable 😉 Nevertheless, there are some tricky points you need to know before using this class. Let’s learn!
Using C# to list all items in a Zip file
Once you have a Zip file, you can access the internal items without extracting the whole Zip.
You can use the ZipFile.Open method.
using ZipArchive archive = ZipFile.Open(zipFilePath, ZipArchiveMode.Read);
System.Collections.ObjectModel.ReadOnlyCollection<ZipArchiveEntry> entries = archive.Entries;
Notice that I specified the ZipArchiveMode. This is an Enum whose values are Read, Create, and Update.
Using the Entries property of the ZipArchive, you can access the whole list of files stored within the Zip folder, each represented by a ZipArchiveEntry instance.
The ZipArchiveEntry object contains several fields, like the file’s name and the full path from the root archive.
There are a few key points to remember about the entries exposed by the ZipArchive.
It is a ReadOnlyCollection<ZipArchiveEntry>: it means that even if you find a way to add or update the items in memory, the changes are not applied to the actual files;
It lists all files and folders, not only those at the root level. As you can see from the image above, it lists both the files at the root level, like File.txt, and those in inner folders, such as TestZip/InnerFolder/presentation.pptx;
Each file is characterized by two similar but different properties: Name is the actual file name (like presentation.pptx), while FullName contains the path from the root of the archive (e.g. TestZip/InnerFolder/presentation.pptx);
It lists folders as if they were files: in the image above, you can see TestZip/InnerFolder. You can recognize them because their Name property is empty and their Length is 0;
Lastly, remember that ZipFile.Open returns an IDisposable, so you should place the operations within a using statement.
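Putting those points together, a minimal, self-contained sketch might look like the following. The file and folder names here are placeholders of my own, not the ones from the article; the helper that builds a sample archive is only there to make the snippet runnable end to end.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class ZipListing
{
    // Opens the archive in Read mode and projects each entry's FullName
    // (the path from the root of the archive, e.g. "InnerFolder/notes.txt").
    public static string[] ListEntryFullNames(string zipFilePath)
    {
        using ZipArchive archive = ZipFile.Open(zipFilePath, ZipArchiveMode.Read);
        // Entries is a ReadOnlyCollection<ZipArchiveEntry>: modifying it
        // in memory would not touch the archive on disk.
        return archive.Entries.Select(e => e.FullName).ToArray();
    }

    // Builds a small sample Zip in the temp folder, just for the demo:
    // one file at the root and one inside a subfolder.
    public static string CreateSampleZip()
    {
        string sourceDir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(Path.Combine(sourceDir, "InnerFolder"));
        File.WriteAllText(Path.Combine(sourceDir, "File.txt"), "hello");
        File.WriteAllText(Path.Combine(sourceDir, "InnerFolder", "notes.txt"), "world");

        string zipPath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".zip");
        ZipFile.CreateFromDirectory(sourceDir, zipPath);
        return zipPath;
    }
}
```

Note that entry paths always use forward slashes, regardless of the operating system the archive was created on.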
❓❓A question for you! Why do we see an item for the TestZip/InnerFolder folder, but there is no reference to the TestZip folder? Drop a comment below 📩
Using C# to extract the content of a Zip file
Extracting a Zip folder is easy, but not obvious.
We have only one way to do that: by calling the ZipFile.ExtractToDirectory method.
It accepts as mandatory parameters the path of the Zip file to be extracted and the path to the destination:
var zipPath = @"C:\Users\d.bellone\Desktop\TestZip.zip";
var destinationPath = @"C:\Users\d.bellone\Desktop\MyDestination";
ZipFile.ExtractToDirectory(zipPath, destinationPath);
Once you run it, you will see the content of the Zip copied and extracted to the MyDestination folder.
Note that this method creates the destination folder if it does not exist.
This method accepts two more parameters:
entryNameEncoding, by which you can specify the encoding. The default value is UTF-8.
overwriteFiles allows you to specify whether it must overwrite existing files. The default value is false. If set to false and the destination files already exist, this method throws a System.IO.IOException saying that the file already exists.
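A small sketch of the overwrite behaviour: extracting the same archive twice into the same folder, where the second call succeeds only because overwriteFiles is set to true. The paths are placeholders, and the three-parameter overload shown here is available in recent .NET versions.

```csharp
using System.IO.Compression;

public static class ZipExtraction
{
    public static void ExtractTwice(string zipPath, string destinationPath)
    {
        // First extraction: creates the destination folder if needed.
        ZipFile.ExtractToDirectory(zipPath, destinationPath);

        // Without overwriteFiles: true, this second call would throw
        // a System.IO.IOException because the files already exist.
        ZipFile.ExtractToDirectory(zipPath, destinationPath, overwriteFiles: true);
    }
}
```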
Using C# to create a Zip from a folder
The key method here is ZipFile.CreateFromDirectory, which allows you to create Zip files in a flexible way.
The first mandatory value is, of course, the source directory path.
The second mandatory parameter is the destination of the resulting Zip file.
Or it can be a Stream that you can use later for other operations:
using (MemoryStream memStream = new MemoryStream())
{
    string sourceFolderPath = @"\Desktop\myFolder";

    // the Stream overload of CreateFromDirectory is available in newer .NET versions
    ZipFile.CreateFromDirectory(sourceFolderPath, memStream);

    var length = memStream.Length; // here the Stream is populated
}
You can finally add some optional parameters:
compressionLevel, whose values are Optimal, Fastest, NoCompression, SmallestSize.
includeBaseDirectory: a flag that defines if you have to copy only the first-level files or also the root folder.
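Both optional parameters together could be used like this — a minimal sketch with placeholder paths, where the helper only exists to make the example verifiable:

```csharp
using System.IO;
using System.IO.Compression;

public static class ZipCreation
{
    public static void ZipFolder(string sourceFolderPath, string destinationZipPath)
    {
        ZipFile.CreateFromDirectory(
            sourceFolderPath,
            destinationZipPath,
            CompressionLevel.Optimal,
            includeBaseDirectory: true); // entry paths are prefixed with the folder's name
    }
}
```

With includeBaseDirectory set to true, a file stored at myFolder\a.txt appears in the archive as myFolder/a.txt; with false, it appears simply as a.txt.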
A quick comparison of the four Compression Levels
As we just saw, we have four compression levels: Optimal, Fastest, NoCompression, and SmallestSize.
What happens if I use the different values to zip all the photos and videos of my latest trip?
Fastest compression generates a smaller file than Smallest compression.
Fastest compression is way slower than Smallest compression.
Optimal lies in the middle.
This is to say: don’t trust the names; remember to benchmark the parts where you need performance, even with a test as simple as this.
Wrapping up
This was a quick article about one specific class in the .NET ecosystem.
As we saw, even though the class is simple and it’s all about three methods, there are some things you should keep in mind before using this class in your code.
I hope you enjoyed this article! Let’s keep in touch on Twitter or LinkedIn! 🤜🤛
In my opinion, Unit tests should be well structured and written even better than production code.
In fact, Unit Tests act as a first level of documentation of what your code does and, if written properly, can be the key to fixing bugs quickly and without adding regressions.
One way to improve readability is by grouping similar tests that only differ by the initial input but whose behaviour is the same.
Let’s use a dummy example: some tests on a simple Calculator class that only performs sums on int values.
public static class Calculator
{
    public static int Sum(int first, int second) => first + second;
}
One way to create tests is by creating one test for each possible combination of values:
public class SumTests
{
    [Test]
    public void SumPositiveNumbers()
    {
        var result = Calculator.Sum(1, 5);
        Assert.That(result, Is.EqualTo(6));
    }

    [Test]
    public void SumNegativeNumbers()
    {
        var result = Calculator.Sum(-1, -5);
        Assert.That(result, Is.EqualTo(-6));
    }

    [Test]
    public void SumWithZero()
    {
        var result = Calculator.Sum(1, 0);
        Assert.That(result, Is.EqualTo(1));
    }
}
However, it’s not a good idea: you’ll end up with lots of identical tests (DRY, remember?) that add little to no value to the test suite. Also, this approach forces you to add a new test method to every new kind of test that pops into your mind.
When possible, we should generalize it. With NUnit, we can use the TestCase attribute to specify the list of parameters passed in input to our test method, including the expected result.
We can then simplify the whole test class by creating only one method that accepts the different cases in input and runs tests on those values.
[Test]
[TestCase(1, 5, 6)]
[TestCase(-1, -5, -6)]
[TestCase(1, 0, 1)]
public void SumWorksCorrectly(int first, int second, int expected)
{
    var result = Calculator.Sum(first, second);
    Assert.That(result, Is.EqualTo(expected));
}
By using TestCase, you can cover different cases by simply adding a new case without creating new methods.
Clearly, don’t abuse it: use it only to group methods with similar behaviour – and don’t add if statements in the test method!
There is a more advanced way to create a TestCase in NUnit, named TestCaseSource – but we will talk about it in a future C# tip 😉
Further readings
If you are using NUnit, I suggest you read this article about custom equality checks – you might find it handy in your code!
C# devs have the bad habit of creating interfaces for every non-DTO class because «we need them for mocking!». Are you sure it’s the only way?
One of the most common traits of C# developers is the excessive usage of interfaces.
For every non-DTO class we define, we usually also create the related interface. Most of the time, we don’t need it, because we rarely have multiple implementations of the same interface. Instead, we justify it by saying that we need an interface to enable mocking.
That’s true; it’s pretty straightforward to mock an interface: lots of libraries, like Moq and NSubstitute, allow you to create mocks and pass them to the class under test. What if there were another way?
In this article, we will learn how to have complete control over a dependency while having the concrete class, and not the related interface, injected in the constructor.
C# devs always add interfaces, just in case
If you’re a developer like me, you’ve been taught something like this:
One of the SOLID principles is Dependency Inversion; to achieve it, you need Dependency Injection. The best way to do that is by creating an interface, injecting it in the consumer’s constructor, and then mapping the interface and the concrete class.
Sometimes, somebody explains that we don’t need interfaces to achieve Dependency Injection. However, there are generally two arguments proposed by those who keep using interfaces everywhere: the “in case I need to change the database” argument and, even more often, the “without interfaces, I cannot create mocks”.
Are we sure?
The “Just in case I need to change the database” argument
One phrase that I often hear is:
Injecting interfaces allows me to change the concrete implementation of a class without worrying about the caller. You know, just in case I had to change the database engine…
Yes, that’s totally right – using interfaces, you can change the internal implementation in the blink of an eye.
Let’s be honest: in all your career, how many times have you changed the underlying database? In my whole career, it happened just once: we tried to build a solution using Gremlin for CosmosDB, but it turned out to be too expensive – so we switched to a simpler MongoDB.
But, all in all, it wasn’t only thanks to the interfaces that we managed to switch easily; it was because we strictly separated the classes and did not leak the models related to Gremlin into the core code. We structured the code with a sort of Hexagonal Architecture, way before this term became a trend in the tech community.
Still, interfaces can be helpful, especially when dealing with multiple implementations of the same methods or when you want to wrap your head around the methods, inputs, and outputs exposed by a module.
The “I need to mock” argument
Another one I like is this:
Interfaces are necessary for mocking dependencies! Otherwise, how can I create Unit Tests?
Well, I used to agree with this argument. I was used to mocking interfaces by using libraries like Moq and defining the behaviour of the dependency using the SetUp method.
It’s still a valid way, but my point here is that that’s not the only one!
One of the simplest tricks is to mark your classes as abstract. But… this means you’ll end up with every single class marked as abstract. Not the best idea.
We have other tools in our belt!
A realistic example: Dependency Injection without interfaces
Let’s start with a real-ish example.
We have a NumbersRepository that just exposes one method: GetNumbers().
public class NumbersRepository
{
    private readonly int[] _allNumbers;

    public NumbersRepository()
    {
        _allNumbers = Enumerable.Range(0, int.MaxValue).ToArray();
    }

    public IEnumerable<int> GetNumbers() => Random.Shared.GetItems(_allNumbers, 50);
}
Generally, one would be tempted to add an interface with the same name as the class, INumbersRepository, and include the GetNumbers method in the interface definition.
We are not going to do that – the interface is not necessary, so why clutter the code with something like that?
Now, for the consumer. We have a simple NumbersSearchService that accepts, via Dependency Injection, an instance of NumbersRepository (yes, the concrete class!) and uses it to perform a simple search:
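The article’s snippets for the service and its stub are not reproduced here, so what follows is a minimal sketch consistent with the description. Two assumptions of mine: GetNumbers is marked virtual on the repository (otherwise the override below would not compile), and the repository body is simplified to a fixed range to keep the sketch runnable.

```csharp
using System.Collections.Generic;
using System.Linq;

public class NumbersRepository
{
    // virtual, so a test stub can override it (assumption: the article
    // marks it virtual for this approach)
    public virtual IEnumerable<int> GetNumbers() => Enumerable.Range(0, 100);
}

public class NumbersSearchService
{
    private readonly NumbersRepository _repository;

    // the concrete class, not an interface, is injected here
    public NumbersSearchService(NumbersRepository repository)
    {
        _repository = repository;
    }

    public bool Contains(int number) => _repository.GetNumbers().Contains(number);
}

// The stub subclasses the concrete repository: SetNumbers defines
// the values that the overridden GetNumbers will return.
internal class StubNumberRepo : NumbersRepository
{
    private IEnumerable<int> _numbers = Enumerable.Empty<int>();

    public void SetNumbers(params int[] numbers) => _numbers = numbers.ToArray();

    public override IEnumerable<int> GetNumbers() => _numbers;
}
```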
We have overridden the GetNumbers method, but to do so, we had to include a new method, SetNumbers, to define the expected result of the former method.
We then can use it in our tests like this:
[Test]
public void Should_WorkWithStubRepo()
{
    // Arrange
    var repository = new StubNumberRepo();
    repository.SetNumbers(1, 2, 3);
    var service = new NumbersSearchService(repository);

    // Act
    var result = service.Contains(3);

    // Assert
    Assert.That(result, Is.True);
}
You now have full control over the subclass. But this approach comes with a problem: if you have multiple methods marked as virtual, and you are going to use all of them in your test classes, then you will need to override every single method (to have control over them) and work out how to decide whether to use the concrete method or the stub implementation.
For example, we can update the StubNumberRepo to let the consumer choose if we need the dummy values or the base implementation:
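The updated stub is not shown in the text above; a sketch of what it might look like follows. A minimal base class with a virtual GetNumbers is included here only to keep the snippet self-contained — its body is my simplification, not the article’s.

```csharp
using System.Collections.Generic;
using System.Linq;

public class NumbersRepository
{
    public virtual IEnumerable<int> GetNumbers() => Enumerable.Range(0, 100);
}

internal class StubNumberRepo : NumbersRepository
{
    private IEnumerable<int> _numbers = Enumerable.Empty<int>();
    private bool _useStubNumbers;

    public void SetNumbers(params int[] numbers)
    {
        _numbers = numbers.ToArray();
        _useStubNumbers = true; // from now on, return the dummy values
    }

    // Falls back to the real implementation until SetNumbers is called.
    public override IEnumerable<int> GetNumbers() =>
        _useStubNumbers ? _numbers : base.GetNumbers();
}
```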
With this approach, by default, we use the concrete implementation of NumbersRepository because _useStubNumbers is false. If we call the SetNumbers method, we also specify that we don’t want to use the original implementation.
Way 2: Use the virtual keyword in the service to avoid calling the dependency
Similar to the previous approach, we can mark some methods of the caller as virtual to allow us to change parts of our class while keeping everything else as it was.
To achieve it, we have to refactor our Service class a little:
public class NumbersSearchService
{
private readonly NumbersRepository _repository;
public NumbersSearchService(NumbersRepository repository)
{
_repository = repository;
}
public bool Contains(int number)
{
- var numbers = _repository.GetNumbers();
+ var numbers = GetNumbers();
return numbers.Contains(number);
}
+ public virtual IEnumerable<int> GetNumbers() => _repository.GetNumbers();
}
The key is that we moved the calls to the external references to a separate method, marking it as virtual.
This way, we can create a stub class of the Service itself without the need to stub its dependencies:
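The stub-of-the-service snippet is not shown above; a sketch consistent with the tests that follow might look like this. The repository and service bodies are simplified here just to make the example self-contained; note that the stub passes null to the base constructor, which only works as long as the constructor does not validate its parameters.

```csharp
using System.Collections.Generic;
using System.Linq;

public class NumbersRepository
{
    public IEnumerable<int> GetNumbers() => Enumerable.Range(0, 100);
}

public class NumbersSearchService
{
    private readonly NumbersRepository _repository;

    public NumbersSearchService(NumbersRepository repository)
    {
        _repository = repository;
    }

    public bool Contains(int number) => GetNumbers().Contains(number);

    // The call to the dependency lives in a virtual method, so a stub
    // can replace it without touching the repository at all.
    public virtual IEnumerable<int> GetNumbers() => _repository.GetNumbers();
}

internal class StubNumberSearch : NumbersSearchService
{
    private IEnumerable<int> _numbers = Enumerable.Empty<int>();
    private bool _useStubNumbers;

    // null is acceptable only while the base constructor performs no checks
    public StubNumberSearch() : base(null) { }

    public void SetNumbers(params int[] numbers)
    {
        _numbers = numbers.ToArray();
        _useStubNumbers = true;
    }

    public override IEnumerable<int> GetNumbers() =>
        _useStubNumbers ? _numbers : base.GetNumbers();
}
```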
The approach is almost identical to the one we saw before. The difference can be seen in your tests:
[Test]
public void Should_UseStubService()
{
    // Arrange
    var service = new StubNumberSearch();
    service.SetNumbers(12, 15, 30);

    // Act
    var result = service.Contains(15);

    // Assert
    Assert.That(result, Is.True);
}
There is a problem with this approach: many devs (correctly) add null checks in the constructor to ensure that the dependencies are not null:
public NumbersSearchService(NumbersRepository repository)
{
ArgumentNullException.ThrowIfNull(repository);
_repository = repository;
}
While this approach makes it safe to use the injected NumbersRepository reference within the class’s methods, it also stops us from creating a StubNumberSearch. Since we want to create an instance of NumbersSearchService without the burden of injecting all the dependencies, we call the base constructor passing null as a value for the dependencies. If we validate against null, the stub class becomes unusable.
There’s a simple solution: adding a protected empty constructor:
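The snippet with the protected constructor is not reproduced above, so here is a sketch of the idea (the repository and the method bodies are simplified by me to keep it self-contained):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class NumbersRepository
{
    public IEnumerable<int> GetNumbers() => Enumerable.Range(0, 100);
}

public class NumbersSearchService
{
    private readonly NumbersRepository _repository;

    // Only subclasses (such as test stubs) can call this constructor,
    // skipping both the dependency and the null check below.
    protected NumbersSearchService() { }

    public NumbersSearchService(NumbersRepository repository)
    {
        ArgumentNullException.ThrowIfNull(repository);
        _repository = repository;
    }

    public bool Contains(int number) => GetNumbers().Contains(number);

    public virtual IEnumerable<int> GetNumbers() => _repository.GetNumbers();
}
```

A stub can now derive from NumbersSearchService, use the parameterless constructor, and override GetNumbers, while regular callers still get the null check.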
We mark it as protected because we want only subclasses to be able to access it.
Way 3: Use the “new” keyword in methods to hide the base implementation
Similar to the virtual keyword is the new keyword, which can be applied to methods.
We can then remove the virtual keyword from the base class and hide its implementation by marking the overriding method as new.
public class NumbersSearchService
{
private readonly NumbersRepository _repository;
public NumbersSearchService(NumbersRepository repository)
{
ArgumentNullException.ThrowIfNull(repository);
_repository = repository;
}
public bool Contains(int number)
{
var numbers = _repository.GetNumbers();
return numbers.Contains(number);
}
- public virtual IEnumerable<int> GetNumbers() => _repository.GetNumbers();
+ public IEnumerable<int> GetNumbers() => _repository.GetNumbers();
}
We have restored the original implementation of the Repository.
Now, we can update the stub by adding the new keyword.
internal class StubNumberSearch : NumbersSearchService
{
private IEnumerable<int> _numbers;
private bool _useStubNumbers;
public void SetNumbers(params int[] numbers)
{
_numbers = numbers.ToArray();
_useStubNumbers = true;
}
- public override IEnumerable<int> GetNumbers() => _useStubNumbers ? _numbers : base.GetNumbers();
+ public new IEnumerable<int> GetNumbers() => _useStubNumbers ? _numbers : base.GetNumbers();
}
We haven’t actually solved any problem except for one: we can now avoid cluttering all our classes with the virtual keyword.
A question for you! Is there any difference between using the new and the virtual keyword? When should you pick one over the other? Let me know in the comments section! 📩
Way 4: Mock concrete classes by marking a method as virtual
Sometimes, I hear developers say that mocks are the absolute evil, and you should never use them.
Oh, come on! Don’t be so silly!
It’s true that, when using mocks, you are writing tests in an unrealistic environment. But, well, that’s exactly the point of having mocks!
If you think about it, at school, during Science lessons, we were taught to do our scientific calculations using approximations: ignore the air resistance, ignore friction, and so on. We knew that that world did not exist, but we removed some parts to make it easier to validate our hypothesis.
In my opinion, it’s the same for testing. Mocks are useful to have full control of a specific behaviour. Still, only relying on mocks makes your tests pretty brittle: you cannot be sure that your system is working under real conditions.
That’s why, as I explained in a previous article, I prefer the Testing Diamond over the Testing Pyramid. In many real cases, five Integration Tests are more valuable than fifty Unit Tests.
But still, mocks can be useful. How can we use them if we don’t have interfaces?
If we try to use Moq to create a mock of NumbersRepository (again, the concrete class) like this:
[Test]
public void Should_WorkWithMockRepo()
{
    // Arrange
    var repository = new Moq.Mock<NumbersRepository>();
    repository.Setup(_ => _.GetNumbers()).Returns(new int[] { 1, 2, 3 });
    var service = new NumbersSearchService(repository.Object);

    // Act
    var result = service.Contains(3);

    // Assert
    Assert.That(result, Is.True);
}
It will fail with this error:
System.NotSupportedException : Unsupported expression: _ => _.GetNumbers()
Non-overridable members (here: NumbersRepository.GetNumbers) may not be used in setup / verification expressions.
This error occurs because the implementation GetNumbers is fixed as defined in the NumbersRepository class and cannot be overridden.
Unless you mark it as virtual, as we did before.
public class NumbersRepository
{
private readonly int[] _allNumbers;
public NumbersRepository()
{
_allNumbers = Enumerable.Range(0, 100).ToArray();
}
- public IEnumerable<int> GetNumbers() => Random.Shared.GetItems(_allNumbers, 50);
+ public virtual IEnumerable<int> GetNumbers() => Random.Shared.GetItems(_allNumbers, 50);
}
Now the test passes: we have successfully mocked a concrete class!
Further readings
Testing is a crucial part of any software application. I personally write Unit Tests even for throwaway software – this way, I can ensure that I’m doing the correct thing without the need for manual debugging.
However, one part that is often underestimated is the code quality of tests. Tests should be written even better than production code. You can find more about this topic here:
Also, Unit Tests are not enough. You should probably write more Integration Tests than Unit Tests. This one is a testing strategy called Testing Diamond.
In this article, we learned that it’s not necessary to create interfaces for the sake of having mocks.
We have several other options.
Honestly speaking, I’m still used to creating interfaces and using them with mocks.
I find it easy to do, and this approach provides a quick way to create tests and drive the behaviour of the dependencies.
Also, I recognize that interfaces created for the sole purpose of mocking are quite pointless: we have learned that there are other ways, and we should consider trying out these solutions.
Still, interfaces are quite handy for two “non-technical” reasons:
using interfaces, you can understand at a glance which operations you can call, in a clean and concise way;
interfaces and mocks allow you to easily use TDD: while writing the test cases, you also define what methods you need and the expected behaviour. I know you can do that using stubs, but I find it easier with interfaces.
I know, this is a controversial topic – I’m not saying that you should remove all your interfaces (I think it’s a matter of personal taste, somehow!), but with this article, I want to highlight that you can avoid interfaces.
I hope you enjoyed this article! Let’s keep in touch on Twitter or LinkedIn! 🤜🤛
Asynchronous programming enables you to execute multiple operations without blocking the main thread.
In general, we often think of the Happy Scenario, when all the operations go smoothly, but we rarely consider what to do when an error occurs.
In this article, we will explore how Task.WaitAll and Task.WhenAll behave when an error is thrown in one of the awaited Tasks.
Prepare the tasks to be executed
For the sake of this article, we are going to use a silly method that returns the same number passed in input, but throws an exception when the input number is divisible by 3:
public Task<int> Echo(int value) => Task.Factory.StartNew(
    () =>
    {
        if (value % 3 == 0)
        {
            Console.WriteLine($"[LOG] You cannot use {value}!");
            throw new Exception($"[EXCEPTION] Value cannot be {value}");
        }

        Console.WriteLine($"[LOG] {value} is a valid value!");
        return value;
    }
);
Those Console.WriteLine instructions will allow us to see what’s happening “live”.
We prepare the collection of tasks to be awaited using a simple Enumerable.Range:
var tasks = Enumerable.Range(1, 11).Select(Echo);
And then, we use a try-catch block with some logs to showcase what happens when we run the application.
try
{
    Console.WriteLine("START");
    // await all the tasks
    Console.WriteLine("END");
}
catch (Exception ex)
{
    Console.WriteLine("The exception message is: {0}", ex.Message);
    Console.WriteLine("The exception type is: {0}", ex.GetType().FullName);

    if (ex.InnerException is not null)
    {
        Console.WriteLine("Inner exception: {0}", ex.InnerException.Message);
    }
}
finally
{
    Console.WriteLine("FINALLY!");
}
If we run it all together, we can notice that nothing really happened:
In fact, we only defined the collection of tasks: since LINQ’s Select is evaluated lazily, no task has actually been created or started yet.
We can, then, call WaitAll and WhenAll to see what happens when an error occurs.
Error handling when using Task.WaitAll
It’s time to execute the tasks stored in the tasks collection, like this:
try
{
    Console.WriteLine("START");

    // await all the tasks
    Task.WaitAll(tasks.ToArray());

    Console.WriteLine("END");
}
Task.WaitAll accepts an array of tasks to be awaited and does not return anything.
The execution goes like this:
START
1 is a valid value!
2 is a valid value!
:( You cannot use 6!
5 is a valid value!
:( You cannot use 3!
4 is a valid value!
8 is a valid value!
10 is a valid value!
:( You cannot use 9!
7 is a valid value!
11 is a valid value!
The exception message is: One or more errors occurred. ([EXCEPTION] Value cannot be 3) ([EXCEPTION] Value cannot be 6) ([EXCEPTION] Value cannot be 9)
The exception type is: System.AggregateException
Inner exception: [EXCEPTION] Value cannot be 3
FINALLY!
There are a few things to notice:
the tasks are not executed in sequence: for example, 6 was printed before 4. Well, to be honest, we can say that Console.WriteLine printed the messages in that sequence, but maybe the tasks were executed in another different order (as you can deduce from the order of the error messages);
all the tasks are executed before jumping to the catch block;
the exception caught in the catch block is of type System.AggregateException; we’ll come back to it later;
the InnerException property of the exception being caught contains the info for the first exception that was thrown.
Error handling when using Task.WhenAll
There are two main differences to notice when comparing Task.WaitAll and Task.WhenAll:
Task.WhenAll accepts any collection of tasks in input (as long as it is an IEnumerable<Task>);
it returns a Task that you have to await.
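The WhenAll call itself is not shown above; a self-contained sketch of the same experiment (condensed, without the log messages) could look like this. Note that only the first faulted task’s exception surfaces in the catch block when you await Task.WhenAll.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllDemo
{
    // Same idea as the Echo method above: throws when value is divisible by 3.
    private static Task<int> Echo(int value) => Task.Run(() =>
    {
        if (value % 3 == 0)
            throw new Exception($"[EXCEPTION] Value cannot be {value}");
        return value;
    });

    // Returns the message of the exception rethrown by awaiting WhenAll,
    // or "no exception" if everything succeeded.
    public static async Task<string> RunAsync()
    {
        IEnumerable<Task<int>> tasks = Enumerable.Range(1, 11).Select(Echo);
        try
        {
            await Task.WhenAll(tasks); // any IEnumerable<Task> works here
            return "no exception";
        }
        catch (Exception ex)
        {
            // ex is a plain Exception here, not an AggregateException
            return ex.Message;
        }
    }
}
```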
And what happens when we run the program?
START
2 is a valid value!
1 is a valid value!
4 is a valid value!
:( You cannot use 3!
7 is a valid value!
5 is a valid value!
:( You cannot use 6!
8 is a valid value!
10 is a valid value!
11 is a valid value!
:( You cannot use 9!
The exception message is: [EXCEPTION] Value cannot be 3
The exception type is: System.Exception
FINALLY!
Again, there are a few things to notice:
just as before, the messages are not printed in order;
the exception message contains the message for the first exception thrown;
the exception is of type System.Exception, and not System.AggregateException as we saw before.
This means that the first exception breaks everything, and you lose the info about the other exceptions that were thrown.
📩 but now, a question for you: we learned that, when using Task.WhenAll, only the first exception gets caught by the catch block. What happens to the other exceptions? How can we retrieve them? Drop a message in the comment below ⬇️
Comparing Task.WaitAll and Task.WhenAll
Task.WaitAll and Task.WhenAll are similar but not identical.
Task.WaitAll should be used when you are in a synchronous context and need to block the current thread until all tasks are complete. This is common in simple old-style console applications or scenarios where asynchronous programming is not required. However, it is not recommended in UI or modern ASP.NET applications because it can cause deadlocks or freeze the UI.
Task.WhenAll is preferred in modern C# code, especially in asynchronous methods (where you can use async Task). It allows you to await the completion of multiple tasks without blocking the calling thread, making it suitable for environments where responsiveness is important. It also enables easier composition of continuations and better exception handling.
Let’s wrap it up in a table:
| Feature | Task.WaitAll | Task.WhenAll |
| --- | --- | --- |
| Return Type | void | Task or Task&lt;TResult[]&gt; |
| Blocking/Non-blocking | Blocking (waits synchronously) | Non-blocking (returns a Task) |
| Exception Handling | Throws AggregateException immediately | Exceptions observed when awaited |
| Usage Context | Synchronous code (e.g., console apps) | Asynchronous code (e.g., async methods) |
| Continuation | Not possible (since it blocks) | Possible (use .ContinueWith or await) |
| Deadlock Risk | Higher in UI contexts | Lower (if properly awaited) |
Bonus tip: get the best out of AggregateException
We can expand a bit on the AggregateException type.
That specific type of exception acts as a container for all the exceptions thrown when using Task.WaitAll.
It exposes a property named InnerExceptions, which holds all the exceptions thrown, so you can enumerate them one by one.
A common example is this:
if (ex is AggregateException aggEx)
{
Console.WriteLine("There are {0} exceptions in the aggregate exception.", aggEx.InnerExceptions.Count);
foreach (var innerEx in aggEx.InnerExceptions)
{
Console.WriteLine("Inner exception: {0}", innerEx.Message);
}
}
Further readings
This article is all about handling the unhappy path.
If you want to learn more about Task.WaitAll and Task.WhenAll, I’d suggest you read the following two articles that I find totally interesting and well-written:
Small changes sometimes make a huge difference. Learn these 6 tips to improve the performance of your application just by handling strings correctly.
Sometimes, just a minor change makes a huge difference. Maybe you won’t notice it when performing the same operation a few times. Still, the improvement is significant when repeating the operation thousands of times.
In this article, we will learn five simple tricks to improve the performance of your application when dealing with strings.
Note: this article is part of C# Advent Calendar 2023, organized by Matthew D. Groves: it’s maybe the only Christmas tradition I like (yes, I’m kind of a Grinch 😂).
Benchmark structure, with dependencies
Before jumping to the benchmarks, I want to spend a few words on the tools I used for this article.
The project is a .NET 8 class library running on a laptop with an i5 processor.
Running benchmarks with BenchmarkDotNet
I’m using BenchmarkDotNet to create benchmarks for my code. BenchmarkDotNet is a library that runs your methods several times, captures some metrics, and generates a report of the executions. If you follow my blog, you might know I’ve used it several times – for example, in my old article “Enum.HasFlag performance with BenchmarkDotNet”.
All the benchmarks I created follow the same structure:
the class is marked with the [MemoryDiagnoser] attribute: the benchmark will retrieve info for both time and memory usage;
there is a property named Size with the attribute [Params]: this attribute lists the possible values for the Size property;
there is a method marked as [IterationSetup]: this method runs before every single execution, takes the value from the Size property, and initializes the AllStrings array;
the methods that are parts of the benchmark are marked with the [Benchmark] attribute.
Generating strings with Bogus
I relied on Bogus to create dummy values. This NuGet library allows you to generate realistic values for your objects with a great level of customization.
The string array generation strategy is shared across all the benchmarks, so I moved it to a static method:
Here I have a default set of predefined values ([string.Empty, " ", "\n \t", null]), which can be expanded with the values coming from the additionalStrings array. These values are then placed in random positions of the array.
In most cases, though, the value of the string is defined by Bogus.
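The generator itself isn't shown in this extract. Here's a simplified, dependency-free stand-in: the real version uses Bogus to produce realistic words, while this sketch uses a trimmed Guid in their place, and the method name mirrors the StringArrayGenerator.Generate call used later in the benchmarks.

```csharp
using System;
using System.Linq;

// Simplified stand-in for StringArrayGenerator.Generate: the real version uses
// Bogus for realistic words; a trimmed Guid plays that role here so the sketch
// has no external dependencies.
string[] Generate(int size, params string[] additionalStrings)
{
    var rnd = new Random();
    string[] predefined = [string.Empty, " ", "\n \t", null];
    string[] specials = [.. predefined, .. additionalStrings];

    string[] result = Enumerable.Range(0, size)
        .Select(_ => Guid.NewGuid().ToString("N")[..8])
        .ToArray();

    // Scatter the edge-case values at random positions of the array.
    foreach (string special in specials)
        result[rnd.Next(size)] = special;

    return result;
}

string[] allStrings = Generate(100);
Console.WriteLine(allStrings.Length); // 100
```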
Generating plots with chartbenchmark.net
To generate the plots you will see in this article, I relied on chartbenchmark.net, a fantastic tool that transforms the output generated by BenchmarkDotNet on the console into a dynamic, customizable plot. This tool created by Carlos Villegas is available on GitHub, and it surely deserves a star!
Please note that all the plots in this article have a Log10 scale: this scale allows me to show you the performance values of all the executions in the same plot. If I used the Linear scale, you would be able to see only the biggest values.
We are ready. It’s time to run some benchmarks!
Tip #1: StringBuilder is (almost always) better than String Concatenation
Let’s start with a simple trick: if you need to concatenate strings, using a StringBuilder is generally more efficient than concatenating them with the + operator.
Whenever you concatenate strings with the + sign, you create a new string instance. This takes some time and allocates memory on every single concatenation.
On the contrary, with a StringBuilder object you append the strings to an internal buffer and generate the final string only once, with a single, efficient operation.
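The benchmark bodies aren't shown in this extract; here's a minimal sketch of the two strategies being compared (the input values are arbitrary):

```csharp
using System;
using System.Text;

string[] allStrings = ["uno", "due", "tre", "quattro"];

// String concatenation: every += allocates a brand-new string instance.
string withConcatenation = string.Empty;
foreach (string s in allStrings)
    withConcatenation += s;

// StringBuilder: appends into an internal buffer; the final string
// is materialized only once, by ToString().
var sb = new StringBuilder();
foreach (string s in allStrings)
    sb.Append(s);
string withStringBuilder = sb.ToString();

Console.WriteLine(withConcatenation == withStringBuilder); // True
```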
Here’s the result table:
| Method | Size | Mean | Error | StdDev | Median | Ratio | RatioSD | Allocated | Alloc Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| WithStringBuilder | 4 | 4.891 us | 0.5568 us | 1.607 us | 4.750 us | 1.00 | 0.00 | 1016 B | 1.00 |
| WithConcatenation | 4 | 3.130 us | 0.4517 us | 1.318 us | 2.800 us | 0.72 | 0.39 | 776 B | 0.76 |
| WithStringBuilder | 100 | 7.649 us | 0.6596 us | 1.924 us | 7.650 us | 1.00 | 0.00 | 4376 B | 1.00 |
| WithConcatenation | 100 | 13.804 us | 1.1970 us | 3.473 us | 13.800 us | 1.96 | 0.82 | 51192 B | 11.70 |
| WithStringBuilder | 10000 | 113.091 us | 4.2106 us | 12.081 us | 111.000 us | 1.00 | 0.00 | 217200 B | 1.00 |
| WithConcatenation | 10000 | 74,512.259 us | 2,111.4213 us | 6,058.064 us | 72,593.050 us | 666.43 | 91.44 | 466990336 B | 2,150.05 |
| WithStringBuilder | 100000 | 1,037.523 us | 37.1009 us | 108.225 us | 1,012.350 us | 1.00 | 0.00 | 2052376 B | 1.00 |
| WithConcatenation | 100000 | 7,469,344.914 us | 69,720.9843 us | 61,805.837 us | 7,465,779.900 us | 7,335.08 | 787.44 | 46925872520 B | 22,864.17 |
Let’s see it as a plot.
Beware of the scale in the diagram: it’s a Log10 scale, so you’d better have a look at the values displayed on the Y-axis.
As you can see, there is a considerable performance improvement.
There are some remarkable points:
When there are just a few strings to concatenate, the + operator is more performant, both on timing and allocated memory;
When you need to concatenate 100000 strings, the concatenation is ~7000 times slower than the string builder.
In conclusion, use the StringBuilder to concatenate more than 5 or 6 strings. Use the string concatenation for smaller operations.
Edit 2024-01-08: it turns out that string.Concat has an overload that accepts an array of strings. string.Concat(string[]) is actually faster than using the StringBuilder. Read more in this article by Robin Choffardet.
Tip #2: EndsWith(string) vs EndsWith(char): pick the right overload
One simple improvement can be made if you use StartsWith or EndsWith, passing a single character.
There are two similar overloads: one that accepts a string, and one that accepts a char.
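The difference is just which overload you pick; a quick sketch (the file name is arbitrary):

```csharp
using System;

string fileName = "Resume.pdf";

// string overload: performs a culture-sensitive comparison under the hood.
bool viaString = fileName.EndsWith("pdf");

// char overload: a plain check on the last character, much cheaper.
bool viaChar = fileName.EndsWith('f');

Console.WriteLine(viaString && viaChar); // True
```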
Again, let’s generate the plot using the Log10 scale:
They appear to be almost identical, but look closely: based on this benchmark, when the array has 10,000 items, using EndsWith(string) is 10x slower than EndsWith(char).
Also, the duration ratio on the 1,000,000-item array is ~3.5. At first, I thought there was an error in the benchmark, but the ratio did not change when I reran it.
It looks like you get the best improvement ratio when the array has ~10,000 items.
Tip #3: IsNullOrEmpty vs IsNullOrWhitespace vs IsNullOrEmpty + Trim
As you might know, string.IsNullOrWhiteSpace performs stricter checks than string.IsNullOrEmpty.
To demonstrate it, I have created three benchmarks: one for string.IsNullOrEmpty, one for string.IsNullOrWhiteSpace, and another one that lays in between: it first calls Trim() on the string, and then calls string.IsNullOrEmpty.
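Stripped of the BenchmarkDotNet scaffolding, the three checks boil down to this (the sample values are illustrative; note the null-conditional call, since Trim() would throw on a null string):

```csharp
using System;

string?[] samples = ["", "   ", "\n \t", null, "hello"];

foreach (string? s in samples)
{
    bool nullOrEmpty = string.IsNullOrEmpty(s);
    bool nullOrWhiteSpace = string.IsNullOrWhiteSpace(s);
    // Trim() would throw on null, hence the null-conditional operator.
    bool trimThenEmpty = string.IsNullOrEmpty(s?.Trim());

    Console.WriteLine($"'{s}': {nullOrEmpty} | {nullOrWhiteSpace} | {trimThenEmpty}");
}
```

Notice how a whitespace-only string is "empty" only for the stricter checks: IsNullOrEmpty returns false for it, while IsNullOrWhiteSpace and the trim-then-check variant return true.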
As you can see from the Log10 table, the results are pretty similar:
On average, StringIsNullOrWhitespace is ~2 times slower than StringIsNullOrEmpty.
So, what should we do? Here’s my two cents:
For all the data coming from the outside (passed as input to your system, received from an API call, read from the database), use string.IsNullOrWhiteSpace: this way you can ensure that you are not receiving unexpected data;
If you read data from an external API, customize your JSON deserializer to convert whitespace strings as empty values;
Needless to say, choose the proper method depending on the use case. If a string like “\n \n \t” is a valid value for you, use string.IsNullOrEmpty.
Tip #4: ToUpper vs ToUpperInvariant vs ToLower vs ToLowerInvariant: they look similar, but they are not
Even though they look similar, there is a difference in terms of performance between these four methods.
[MemoryDiagnoser]
public class ToUpperVsToLower()
{
    [Params(100, 1000, 10_000, 100_000, 1_000_000)]
    public int Size;

    public string[] AllStrings { get; set; }

    [IterationSetup]
    public void Setup()
    {
        AllStrings = StringArrayGenerator.Generate(Size);
    }

    [Benchmark]
    public void WithToUpper()
    {
        foreach (string s in AllStrings)
        {
            _ = s?.ToUpper();
        }
    }

    [Benchmark]
    public void WithToUpperInvariant()
    {
        foreach (string s in AllStrings)
        {
            _ = s?.ToUpperInvariant();
        }
    }

    [Benchmark]
    public void WithToLower()
    {
        foreach (string s in AllStrings)
        {
            _ = s?.ToLower();
        }
    }

    [Benchmark]
    public void WithToLowerInvariant()
    {
        foreach (string s in AllStrings)
        {
            _ = s?.ToLowerInvariant();
        }
    }
}
What will this benchmark generate?
| Method | Size | Mean | Error | StdDev | Median | P95 | Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- |
| WithToUpper | 100 | 9.153 us | 0.9720 us | 2.789 us | 8.200 us | 14.980 us | 1.57 |
| WithToUpperInvariant | 100 | 6.572 us | 0.5650 us | 1.639 us | 6.200 us | 9.400 us | 1.14 |
| WithToLower | 100 | 6.881 us | 0.5076 us | 1.489 us | 7.100 us | 9.220 us | 1.19 |
| WithToLowerInvariant | 100 | 6.143 us | 0.5212 us | 1.529 us | 6.100 us | 8.400 us | 1.00 |
| WithToUpper | 1000 | 69.776 us | 9.5416 us | 27.833 us | 68.650 us | 108.815 us | 2.60 |
| WithToUpperInvariant | 1000 | 51.284 us | 7.7945 us | 22.860 us | 38.700 us | 89.290 us | 1.85 |
| WithToLower | 1000 | 49.520 us | 5.6085 us | 16.449 us | 48.100 us | 79.110 us | 1.85 |
| WithToLowerInvariant | 1000 | 27.000 us | 0.7370 us | 2.103 us | 26.850 us | 30.375 us | 1.00 |
| WithToUpper | 10000 | 241.221 us | 4.0480 us | 3.588 us | 240.900 us | 246.560 us | 1.68 |
| WithToUpperInvariant | 10000 | 339.370 us | 42.4036 us | 125.028 us | 381.950 us | 594.760 us | 1.48 |
| WithToLower | 10000 | 246.861 us | 15.7924 us | 45.565 us | 257.250 us | 302.875 us | 1.12 |
| WithToLowerInvariant | 10000 | 143.529 us | 2.1542 us | 1.910 us | 143.500 us | 146.105 us | 1.00 |
| WithToUpper | 100000 | 2,165.838 us | 84.7013 us | 223.137 us | 2,118.900 us | 2,875.800 us | 1.66 |
| WithToUpperInvariant | 100000 | 1,885.329 us | 36.8408 us | 63.548 us | 1,894.500 us | 1,967.020 us | 1.41 |
| WithToLower | 100000 | 1,478.696 us | 23.7192 us | 50.547 us | 1,472.100 us | 1,571.330 us | 1.10 |
| WithToLowerInvariant | 100000 | 1,335.950 us | 18.2716 us | 35.203 us | 1,330.100 us | 1,404.175 us | 1.00 |
| WithToUpper | 1000000 | 20,936.247 us | 414.7538 us | 1,163.014 us | 20,905.150 us | 22,928.350 us | 1.64 |
| WithToUpperInvariant | 1000000 | 19,056.983 us | 368.7473 us | 287.894 us | 19,085.400 us | 19,422.880 us | 1.41 |
| WithToLower | 1000000 | 14,266.714 us | 204.2906 us | 181.098 us | 14,236.500 us | 14,593.035 us | 1.06 |
| WithToLowerInvariant | 1000000 | 13,464.127 us | 266.7547 us | 327.599 us | 13,511.450 us | 13,926.495 us | 1.00 |
Let’s see it as the usual Log10 plot:
We can notice a few points:
The ToUpper family is generally slower than the ToLower family;
The Invariant family is faster than the non-Invariant one; we will see more below;
So, if you have to normalize strings using the same casing, ToLowerInvariant is the best choice.
Tip #5: OrdinalIgnoreCase vs InvariantCultureIgnoreCase: logically (almost) equivalent, but with different performance
Comparing strings is trivial: the string.Compare method is all you need.
There are several modes to compare strings: you can specify the comparison rules by setting the comparisonType parameter, which accepts a StringComparison value.
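As a quick illustration of the comparisonType parameter (the input strings are arbitrary):

```csharp
using System;

// Same inputs, different rules: Ordinal compares the raw char values,
// while InvariantCulture applies linguistic rules, independently of the
// current culture of the running thread.
int byOrdinal = string.Compare("apple", "APPLE", StringComparison.OrdinalIgnoreCase);
int byInvariant = string.Compare("apple", "APPLE", StringComparison.InvariantCultureIgnoreCase);

Console.WriteLine(byOrdinal);   // 0: considered equal
Console.WriteLine(byInvariant); // 0: considered equal
```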
As you can see, there’s a HUGE difference between Ordinal and Invariant.
When dealing with 100,000 items, StringComparison.InvariantCultureIgnoreCase is 12 times slower than StringComparison.OrdinalIgnoreCase!
Why? Also, why should we use one instead of the other?
Have a look at this code snippet:
var s1 = "Aa";
var s2 = "A" + new string('\u0000', 3) + "a";

string.Equals(s1, s2, StringComparison.InvariantCultureIgnoreCase); // True
string.Equals(s1, s2, StringComparison.OrdinalIgnoreCase); // False
As you can see, s1 and s2 represent equivalent, but not equal, strings. We can then deduce that OrdinalIgnoreCase checks for the exact values of the characters, while InvariantCultureIgnoreCase checks the string’s “meaning”.
So, in most cases, you might want to use OrdinalIgnoreCase (as always, it depends on your use case!).
Tip #6: Newtonsoft vs System.Text.Json: it’s a matter of memory allocation, not time
For the last benchmark, I created the exact same model used as an example in the official documentation.
This benchmark aims to see which JSON serialization library is faster: Newtonsoft or System.Text.Json?
As you might know, the .NET team has added lots of performance improvements to the JSON Serialization functionalities, and you can really see the difference!
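The benchmark bodies aren't shown in this extract; conceptually, each iteration serializes the same model with one library or the other. A minimal sketch using System.Text.Json (the model shape is illustrative; the Newtonsoft counterpart appears as a comment):

```csharp
using System;
using System.Text.Json;

// A model similar in spirit to the one used in the benchmark.
var forecast = new
{
    Date = new DateTime(2023, 12, 1),
    TemperatureCelsius = 25,
    Summary = "Hot"
};

// System.Text.Json
string json = JsonSerializer.Serialize(forecast);

// The Newtonsoft.Json counterpart would be:
// string json = JsonConvert.SerializeObject(forecast);

Console.WriteLine(json.Contains("TemperatureCelsius")); // True
```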
| Method | Size | Mean | Error | StdDev | Median | Ratio | RatioSD | Gen0 | Gen1 | Allocated | Alloc Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| WithJson | 100 | 2.063 ms | 0.1409 ms | 0.3927 ms | 1.924 ms | 1.00 | 0.00 | – | – | 292.87 KB | 1.00 |
| WithNewtonsoft | 100 | 4.452 ms | 0.1185 ms | 0.3243 ms | 4.391 ms | 2.21 | 0.39 | – | – | 882.71 KB | 3.01 |
| WithJson | 10000 | 44.237 ms | 0.8787 ms | 1.3936 ms | 43.873 ms | 1.00 | 0.00 | 4000.0000 | 1000.0000 | 29374.98 KB | 1.00 |
| WithNewtonsoft | 10000 | 78.661 ms | 1.3542 ms | 2.6090 ms | 78.865 ms | 1.77 | 0.08 | 14000.0000 | 1000.0000 | 88440.99 KB | 3.01 |
| WithJson | 1000000 | 4,233.583 ms | 82.5804 ms | 113.0369 ms | 4,202.359 ms | 1.00 | 0.00 | 484000.0000 | 1000.0000 | 2965741.56 KB | 1.00 |
| WithNewtonsoft | 1000000 | 5,260.680 ms | 101.6941 ms | 108.8116 ms | 5,219.955 ms | 1.24 | 0.04 | 1448000.0000 | 1000.0000 | 8872031.8 KB | 2.99 |
As you can see, Newtonsoft is 2x slower than System.Text.Json, and it allocates 3x the memory compared with the other library.
So, well, if you don’t use library-specific functionalities, I suggest you replace Newtonsoft with System.Text.Json.
Wrapping up
In this article, we learned that even tiny changes can make a difference in the long run.
Let’s recap some:
Using StringBuilder is generally WAY faster than using string concatenation unless you need to concatenate 2 to 4 strings;
Sometimes, the difference is not about execution time but memory usage;
EndsWith and StartsWith perform better if you look for a char instead of a string. If you think of it, it totally makes sense!
More often than not, string.IsNullOrWhiteSpace performs stricter checks than string.IsNullOrEmpty; however, it is also roughly twice as slow, so you should pick the correct method depending on the usage;
ToUpper and ToLower look similar; however, ToLower is quite faster than ToUpper;
Ordinal and Invariant comparison return the same value for almost every input; but Ordinal is faster than Invariant;
Newtonsoft performs similarly to System.Text.Json, but it allocates way more memory.
My suggestion is always the same: take your time to explore the possibilities! Toy with your code, try to break it, benchmark it. You’ll find interesting takes!
I hope you enjoyed this article! Let’s keep in touch on Twitter or LinkedIn! 🤜🤛
Now you can’t run your application because another process already uses the port. How can you find that process? How to kill it?
Table of Contents
Just a second! 🫷 If you are here, it means that you are a software developer.
So, you know that storage, networking, and domain management have a cost .
If you want to support this blog, please ensure that you have disabled the adblocker for this site. I configured Google AdSense to show as few ADS as possible – I don’t want to bother you with lots of ads, but I still need to add some to pay for the resources for my site.
Thank you for your understanding. – Davide
Sometimes, when trying to run your ASP.NET application, there’s something stopping you.
Have you ever found a message like this?
Failed to bind to address https://127.0.0.1:7261: address already in use.
You can try over and over again, you can also restart the application, but the port still appears to be used by another process.
How can you find the process that is running on a local port? How can you kill it to free up the port and, eventually, be able to run your application?
In this article, we will learn how to find the blocking port in Windows 10 and Windows 11, and then we will learn how to kill that process given its PID.
How to find the process running on a port on Windows 11 using PowerShell
Let’s see how to identify the process that is running on port 7261.
Open a PowerShell and run the netstat command:
NETSTAT is a command that shows info about the active TCP/IP network connections. It accepts several options. In this case, we will use:
-n: displays addresses and port numbers in numerical form;
-o: displays the owning process ID associated with each connection;
-a: displays all connections and listening ports;
-p: filters for a specific protocol (TCP or UDP).
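Putting those options together, the unfiltered command looks like this (the output row below is illustrative, not taken from a real machine):

```shell
netstat -noa -p TCP

#  Proto  Local Address      Foreign Address    State        PID
#  TCP    127.0.0.1:7261     0.0.0.0:0          LISTENING    19160
```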
Notice that the last column lists the PID (Process ID) bound to each connection.
From here, we can use the findstr command to get only the rows with a specific string (the searched port number).
netstat -noa -p TCP | findstr 7261
Now, by looking at the last column, we can identify the Process ID: 19160.
How to kill a process given its PID on Windows or PowerShell
Now that we have the Process ID (PID), we can open the Task Manager, paste the PID value in the topmost textbox, and find the related application.
In our case, it was an instance of Visual Studio running an API application. We can now kill the process by hitting End Task.
If you prefer working with PowerShell, you can find the details of the related process by using the Get-Process command:
Then, you can use the taskkill command by specifying the PID, using the /PID flag, and adding the /F flag to force the killing of the process.
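Assuming the PID found above (19160 — your value will differ), the two PowerShell commands would be:

```shell
# Inspect the process bound to the port
Get-Process -Id 19160

# Force-kill it by PID
taskkill /PID 19160 /F
```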
We have killed the process related to the running application. Visual Studio is still working, of course.
Further readings
Hey, what are these fancy colours on the PowerShell?
It’s a customization I added to show the current folder and the info about the associated GIT repository. It’s incredibly useful while developing and navigating the file system with PowerShell.
Just a second! 🫷 If you are here, it means that you are a software developer.
So, you know that storage, networking, and domain management have a cost .
If you want to support this blog, please ensure that you have disabled the adblocker for this site. I configured Google AdSense to show as few ADS as possible – I don’t want to bother you with lots of ads, but I still need to add some to pay for the resources for my site.
Thank you for your understanding. – Davide
Imagine you need a way to raise events whenever an item is added or removed from a collection.
Instead of building a new class from scratch, you can use ObservableCollection<T> to store items, raise events, and act when the internal state of the collection changes.
In this article, we will learn how to use ObservableCollection<T>, an out-of-the-box collection available in .NET.
Introducing the ObservableCollection type
ObservableCollection<T> is a generic collection coming from the System.Collections.ObjectModel namespace.
It allows the most common operations, such as Add<T>(T item) and Remove<T>(T item), as you can expect from most of the collections in .NET.
Moreover, it implements two interfaces:
INotifyCollectionChanged can be used to raise events when the internal collection is changed.
INotifyPropertyChanged can be used to raise events when one of the properties of the collection changes.
Let’s see a simple example of the usage:
var collection = new ObservableCollection<string>();
collection.Add("Mario");
collection.Add("Luigi");
collection.Add("Peach");
collection.Add("Bowser");
collection.Remove("Luigi");
collection.Add("Waluigi");
_ = collection.Contains("Peach");
collection.Move(1, 2);
As you can see, we can do all the basic operations: add, remove, swap items (with the Move method), and check if the collection contains a specific value.
You can simplify the initialization by passing a collection in the constructor:
var collection = new ObservableCollection<string>(new string[] { "Mario", "Luigi", "Peach" });
collection.Add("Bowser");
collection.Remove("Luigi");
collection.Add("Waluigi");
_ = collection.Contains("Peach");
collection.Move(1, 2);
How to intercept changes to the underlying collection
As we said, this data type implements INotifyCollectionChanged. Thanks to this interface, we can add event handlers to the CollectionChanged event and see what happens.
var collection = new ObservableCollection<string>(new string[] { "Mario", "Luigi", "Peach" });
collection.CollectionChanged += WhenCollectionChanges;
Console.WriteLine("Adding Bowser...");
collection.Add("Bowser");
Console.WriteLine("");
Console.WriteLine("Removing Luigi...");
collection.Remove("Luigi");
Console.WriteLine("");
Console.WriteLine("Adding Waluigi...");
collection.Add("Waluigi");
Console.WriteLine("");
Console.WriteLine("Searching for Peach...");
var containsPeach = collection.Contains("Peach");
Console.WriteLine("");
Console.WriteLine("Swapping items...");
collection.Move(1, 2);
The WhenCollectionChanges method accepts a NotifyCollectionChangedEventArgs that gives you info about the intercepted changes:
private void WhenCollectionChanges(object? sender, NotifyCollectionChangedEventArgs e)
{
    var allItems = ((IEnumerable<object>)sender)?.Cast<string>().ToArray() ?? new string[] { "<empty>" };
    Console.WriteLine($"> Currently, the collection is {string.Join(',', allItems)}");
    Console.WriteLine($"> The operation is {e.Action}");

    var previousItems = e.OldItems?.Cast<string>()?.ToArray() ?? new string[] { "<empty>" };
    Console.WriteLine($"> Before the operation it was {string.Join(',', previousItems)}");

    var currentItems = e.NewItems?.Cast<string>()?.ToArray() ?? new string[] { "<empty>" };
    Console.WriteLine($"> Now, it is {string.Join(',', currentItems)}");
}
Every time an operation occurs, we write some logs.
The result is:
Adding Bowser...
> Currently, the collection is Mario,Luigi,Peach,Bowser
> The operation is Add
> Before the operation it was <empty>
> Now, it is Bowser
Removing Luigi...
> Currently, the collection is Mario,Peach,Bowser
> The operation is Remove
> Before the operation it was Luigi
> Now, it is <empty>
Adding Waluigi...
> Currently, the collection is Mario,Peach,Bowser,Waluigi
> The operation is Add
> Before the operation it was <empty>
> Now, it is Waluigi
Searching for Peach...
Swapping items...
> Currently, the collection is Mario,Bowser,Peach,Waluigi
> The operation is Move
> Before the operation it was Peach
> Now, it is Peach
Notice a few points:
the sender property holds the current items in the collection. It’s an object?, so you have to cast it to another type to use it.
the NotifyCollectionChangedEventArgs has different meanings depending on the operation:
when adding a value, OldItems is null and NewItems contains the items added during the operation;
when removing an item, OldItems contains the value just removed, and NewItems is null.
when swapping two items, both OldItems and NewItems contain the item you are moving.
How to intercept when a collection property has changed
To execute events when a property changes, we need to add a delegate to the PropertyChanged event. However, it’s not available directly on the ObservableCollection type: you first have to cast it to an INotifyPropertyChanged:
var collection = new ObservableCollection<string>(new string[] { "Mario", "Luigi", "Peach" });
(collection as INotifyPropertyChanged).PropertyChanged += WhenPropertyChanges;
Console.WriteLine("Adding Bowser...");
collection.Add("Bowser");
Console.WriteLine("");
Console.WriteLine("Removing Luigi...");
collection.Remove("Luigi");
Console.WriteLine("");
Console.WriteLine("Adding Waluigi...");
collection.Add("Waluigi");
Console.WriteLine("");
Console.WriteLine("Searching for Peach...");
var containsPeach = collection.Contains("Peach");
Console.WriteLine("");
Console.WriteLine("Swapping items...");
collection.Move(1, 2);
We can now specify the WhenPropertyChanges method as such:
private void WhenPropertyChanges(object? sender, PropertyChangedEventArgs e)
{
    var allItems = ((IEnumerable<object>)sender)?.Cast<string>().ToArray() ?? new string[] { "<empty>" };
    Console.WriteLine($"> Currently, the collection is {string.Join(',', allItems)}");
    Console.WriteLine($"> Property {e.PropertyName} has changed");
}
As you can see, we have again the sender parameter that contains the collection of items.
Then, we have a parameter of type PropertyChangedEventArgs that we can use to get the name of the property that has changed, using the PropertyName property.
Let’s run it.
Adding Bowser...
> Currently, the collection is Mario,Luigi,Peach,Bowser
> Property Count has changed
> Currently, the collection is Mario,Luigi,Peach,Bowser
> Property Item[] has changed
Removing Luigi...
> Currently, the collection is Mario,Peach,Bowser
> Property Count has changed
> Currently, the collection is Mario,Peach,Bowser
> Property Item[] has changed
Adding Waluigi...
> Currently, the collection is Mario,Peach,Bowser,Waluigi
> Property Count has changed
> Currently, the collection is Mario,Peach,Bowser,Waluigi
> Property Item[] has changed
Searching for Peach...
Swapping items...
> Currently, the collection is Mario,Bowser,Peach,Waluigi
> Property Item[] has changed
As you can see, for every add/remove operation, we have two events raised: one to say that the Count has changed, and one to say that the internal Item[] is changed.
However, notice what happens in the Swapping section: since you just change the order of the items, the Count property does not change.
As you probably noticed, events are fired after the collection has been initialized. Clearly, it considers the items passed in the constructor as the initial state, and all the subsequent operations that mutate the state can raise events.
Also, notice that events are fired only if the reference to the value changes. If the collection holds more complex classes, like:
public class User
{
    public string Name { get; set; }
}
No event is fired if you change the value of the Name property of an object already part of the collection:
var me = new User { Name = "Davide" };
var collection = new ObservableCollection<User>(new User[] { me });
collection.CollectionChanged += WhenCollectionChanges;
(collection as INotifyPropertyChanged).PropertyChanged += WhenPropertyChanges;
me.Name = "Updated"; // It does not fire any event!
Notice that ObservableCollection<T> is not thread-safe! You can find an interesting article by Gérald Barré (aka Meziantou) where he explains a thread-safe version of ObservableCollection<T> he created. Check it out!
As always, I suggest exploring the language and toying with the parameters, properties, data types, etc.
You’ll find lots of exciting things that may come in handy.
I hope you enjoyed this article! Let’s keep in touch on Twitter or LinkedIn! 🤜🤛
A simple way to improve efficiency is knowing your IDE shortcuts. Let’s learn how to create custom ones to generate code automatically.
Table of Contents
Just a second! 🫷 If you are here, it means that you are a software developer.
So, you know that storage, networking, and domain management have a cost .
If you want to support this blog, please ensure that you have disabled the adblocker for this site. I configured Google AdSense to show as few ADS as possible – I don’t want to bother you with lots of ads, but I still need to add some to pay for the resources for my site.
Thank you for your understanding. – Davide
One of the best tricks to boost productivity is knowing your tools.
I’m pretty sure you’ve already used some predefined snippets in Visual Studio. For example, when you type ctor and hit Tab twice, VS automatically creates an empty constructor for the current class.
In this article, we will learn how to create custom snippets: in particular, we will design a snippet that automatically creates a C# Unit Test method with some placeholders and predefined Arrange-Act-Assert blocks.
Snippet Designer: a Visual Studio 2022 extension to add a UI to your placeholders
Snippets are defined in XML-like files with .snippet extension. But we all know that working with XMLs can be cumbersome, especially if you don’t have a clear idea of the expected structure.
Therefore, even if not strictly necessary, I suggest installing a VS2022 extension called Snippet Designer 2022.
This extension, developed by Matthew Manela, can be found on GitHub, where you can view the source code.
This extension gives you a UI to customize the snippet instead of manually editing the XML nodes. It allows you to customize the snippet, the related metadata, and even the placeholders.
Create a basic snippet in VS2022 using a .snippet file
As we saw, snippets are defined in a simple XML.
In order to have your snippets immediately available in Visual Studio, I suggest you create those files in a specific VS2022 folder under the path \Documents\Visual Studio 2022\Code Snippets\Visual C#\My Code Snippets\.
So, create an empty file, change its extension to .snippet, and save it to that location.
Now, you can open Visual Studio (it’s not necessary to open a project, but I’d recommend you to do so). Then, head to File > Open, and open the file you saved under the My Code Snippets directory.
Thanks to Snippet Designer, you will be able to see a nice UI instead of plain XML content.
Have a look at how I filled in the several parts to create a snippet that generates a variable named x, assigns to it a value, and then calls x++;
Have a look at the main parts:
the body, which contains the snippet to be generated;
the top layer, where we specified:
the Snippet name: Int100; it’s the display name of the shortcut
the code language: C#;
the shortcut: int100; it’s the string you’ll type in that allows you to generate the expected snippet;
the bottom table, which contains the placeholders used in the snippet; more on this later;
the properties tab, on the sidebar: here is where you specify some additional metadata, such as:
Author, Description, and Help Url of the snippet, in case you want to export it;
the kind of snippet: possible values are MethodBody, MethodDecl and TypeDecl. However, this value is supported only in Visual Basic.
Now, hit save and be ready to import it!
Just for completeness, here’s the resulting XML:
<?xml version="1.0" encoding="utf-8"?>
<CodeSnippets xmlns="http://schemas.microsoft.com/VisualStudio/2005/CodeSnippet">
  <CodeSnippet Format="1.0.0">
    <Header>
      <SnippetTypes>
        <SnippetType>Expansion</SnippetType>
      </SnippetTypes>
      <Title>Int100</Title>
      <Author></Author>
      <Description></Description>
      <HelpUrl></HelpUrl>
      <Shortcut>int100</Shortcut>
    </Header>
    <Snippet>
      <Code Kind="method decl" Language="csharp" Delimiter="$"><![CDATA[int x = 100;
x++;]]></Code>
    </Snippet>
  </CodeSnippet>
</CodeSnippets>
Notice that the actual content of the snippet is defined in the CDATA block.
Import the snippet in Visual Studio
It’s time to import the snippet. Open the Tools menu item and click on Code Snippets Manager.
From here, you can import a snippet by clicking the Import… button. Given that we’ve already saved our snippet in the correct folder, we’ll find it under the My Code Snippets folder.
Now it’s ready! Open a C# class, and start typing int100. You’ll see our snippet in the autocomplete list.
By hitting Tab twice, you’ll see the snippet’s content being generated.
How to use placeholders when defining snippets in Visual Studio
Wouldn’t it be nice to have the possibility to define customizable parts of your snippets?
Let’s see a real example: I want to create a snippet to create the structure of a Unit Tests method with these characteristics:
it already contains the AAA (Arrange, Act, Assert) sections;
the method name should follow the pattern “SOMETHING should DO STUFF when CONDITION”. I want to be able to replace the different parts of the method name by using placeholders.
You can define placeholders using the $ symbol. You will then see the placeholders in the table at the bottom of the UI. In this example, the placeholders are $TestMethod$, $DoSomething$, and $Condition$. I also added a description to explain the purpose of each placeholder better.
The XML looks like this:
<?xml version="1.0" encoding="utf-8"?>
<CodeSnippets xmlns="http://schemas.microsoft.com/VisualStudio/2005/CodeSnippet">
  <CodeSnippet Format="1.0.0">
    <Header>
      <SnippetTypes>
        <SnippetType>Expansion</SnippetType>
      </SnippetTypes>
      <Title>Test Sync</Title>
      <Author>Davide Bellone</Author>
      <Description>Scaffold the AAA structure for synchronous NUnit tests</Description>
      <HelpUrl></HelpUrl>
      <Shortcut>testsync</Shortcut>
    </Header>
    <Snippet>
      <Declarations>
        <Literal Editable="true">
          <ID>TestMethod</ID>
          <ToolTip>Name of the method to be tested</ToolTip>
          <Default>TestMethod</Default>
          <Function></Function>
        </Literal>
        <Literal Editable="true">
          <ID>DoSomething</ID>
          <ToolTip>Expected behavior or result</ToolTip>
          <Default>DoSomething</Default>
          <Function></Function>
        </Literal>
        <Literal Editable="true">
          <ID>Condition</ID>
          <ToolTip>Initial conditions</ToolTip>
          <Default>Condition</Default>
          <Function></Function>
        </Literal>
      </Declarations>
      <Code Language="csharp" Delimiter="$" Kind="method decl"><![CDATA[[Test]
public void $TestMethod$_Should_$DoSomething$_When_$Condition$()
{
    // Arrange

    // Act

    // Assert
}]]></Code>
    </Snippet>
  </CodeSnippet>
</CodeSnippets>
Now, import it as we already did before.
Then, head to your code, start typing testsync, and you’ll see the snippet come to life. The placeholders we defined are highlighted. You can then fill in these placeholders, hit tab, and move to the next one.
Bonus: how to view all the snippets defined in VS
If you want to learn more about your IDE and the available snippets, you can have a look at the Snippet Explorer table.
You can find it under View > Tools > Snippet Explorer.
Here, you can see all the snippets, their shortcuts, and the content of each snippet. You can also see the placeholders highlighted in green.
It’s always an excellent place to learn more about Visual Studio.
Further readings
As always, you can read more on Microsoft Docs. It’s a valuable resource, although I find it difficult to follow.
There are some tips that may improve both the code quality and the developer productivity.
If you want to enforce some structures or rules, add such snippets in your repository; when somebody joins your team, teach them how to import those snippets.
I hope you enjoyed this article! Let’s keep in touch on Twitter or LinkedIn! 🤜🤛
You have a collection of items. You want to retrieve N elements randomly. Which alternatives do we have?
Table of Contents
Just a second! 🫷 If you are here, it means that you are a software developer.
So, you know that storage, networking, and domain management have a cost .
If you want to support this blog, please ensure that you have disabled the adblocker for this site. I configured Google AdSense to show as few ADS as possible – I don’t want to bother you with lots of ads, but I still need to add some to pay for the resources for my site.
Thank you for your understanding. – Davide
One of the most common operations when dealing with collections of items is to retrieve a subset of these elements taken randomly.
Before .NET 8, the most common way to retrieve random items was to order the collection using a random value and then take the first N items of the now sorted collection.
From .NET 8 on, we have a new method in the Random class: GetItems.
So, should we use this method or stick to the previous version? Are there other alternatives?
For the sake of this article, I created a simple record type, CustomRecord, which just contains two properties.
public record CustomRecord(int Id, string Name);
I then stored a collection of such elements in an array. This article’s final goal is to find the best way to retrieve a random subset of such items. Spoiler alert: it all depends on your definition of best!
Method #1: get random items with Random.GetItems
Starting from .NET 8, released in 2023, we now have a new method belonging to the Random class: GetItems.
There are three overloads:
public T[] GetItems<T>(T[] choices, int length);
public T[] GetItems<T>(ReadOnlySpan<T> choices, int length);
public void GetItems<T>(ReadOnlySpan<T> choices, Span<T> destination);
We will focus on the first overload, which accepts an array of items (choices) in input and returns an array of size length.
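Here's a minimal sketch of that first overload in action. For brevity, I'm using an int array instead of the CustomRecord collection; the behaviour is the same.

```csharp
using System;
using System.Linq;

// For brevity, an int array stands in for the article's CustomRecord collection.
int[] items = Enumerable.Range(1, 100).ToArray();

// GetItems samples `length` elements *with replacement* (.NET 8+).
int[] picked = Random.Shared.GetItems(items, 5);

Console.WriteLine(picked.Length); // 5
```

Note that the source array is never modified: GetItems only reads from it.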
Method #2: shuffle the array, then take N elements
Another method introduced with .NET 8 is Random.Shuffle, which shuffles an array in place. If you need to preserve the initial order of the items, you should create a copy of the initial array and shuffle only the copy. You can do this by using this syntax:
CustomRecord[] copy = [.. Items];
If you just need some random items and don’t care about the initial array, you can shuffle it without making a copy.
Once we’ve shuffled the array, we can pick the first N items to get a subset of random elements.
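Putting the copy-shuffle-take steps together, a sketch might look like this (again using an int array for brevity):

```csharp
using System;
using System.Linq;

int[] items = Enumerable.Range(1, 100).ToArray();

// Copy first, so the original array keeps its order.
int[] copy = [.. items];
Random.Shared.Shuffle(copy);           // in-place shuffle (.NET 8+)
int[] subset = copy.Take(5).ToArray();

Console.WriteLine(subset.Distinct().Count()); // 5: no duplicates
```

Unlike GetItems, this approach can never return the same source element twice.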
Method #3: order by Guid, then take N elements
Before .NET 8, one of the most used approaches was to order the whole collection by a random value, usually a newly generated Guid, and then take the first N items.
This approach works fine but has the disadvantage that it instantiates a new Guid value for every item in the collection, which is an expensive memory-wise operation.
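A sketch of the Guid-based approach (int array for brevity):

```csharp
using System;
using System.Linq;

int[] items = Enumerable.Range(1, 100).ToArray();

// A new Guid is allocated for every element, just to act as a sort key.
int[] subset = items
    .OrderBy(_ => Guid.NewGuid())
    .Take(5)
    .ToArray();

Console.WriteLine(subset.Length); // 5, all distinct
```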
Method #4: order by Number, then take N elements
Another approach was to generate a random number used as a discriminator to order the collection; then, again, we used to get the first N items.
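The random-number variant is almost identical, but the sort key is a plain int, which is much cheaper to produce than a Guid:

```csharp
using System;
using System.Linq;

int[] items = Enumerable.Range(1, 100).ToArray();

// A random int per element acts as the sort key: cheaper than a Guid.
int[] subset = items
    .OrderBy(_ => Random.Shared.Next())
    .Take(5)
    .ToArray();

Console.WriteLine(subset.Length); // 5, all distinct
```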
We are going to run the benchmarks on arrays with different sizes. We will start with a smaller array with 100 items and move to a bigger one with one million items.
We generate the initial array of CustomRecord instances for every iteration and store it in the Items property. Then, we randomly choose the number of items to get from the Items array and store it in the TotalItemsToBeRetrieved property.
We also generate a copy of the initial array at every iteration; this way, we can run Random.Shuffle without modifying the original array.
Finally, we define the body of the benchmarks using the implementations we saw before.
Notice: I marked the benchmark for the GetItems method as a baseline, using [Benchmark(Baseline = true)]. This way, we can easily see the results ratio for the other methods compared to this specific method.
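The setup described above could be sketched as a BenchmarkDotNet class like the following. The class name and the exact method bodies are illustrative assumptions (the full benchmark includes all five methods), and it requires the BenchmarkDotNet NuGet package:

```csharp
using System;
using System.Linq;
using BenchmarkDotNet.Attributes;

public record CustomRecord(int Id, string Name);

public class RandomItemsBenchmark
{
    [Params(100, 10_000, 1_000_000)]
    public int Size;

    public CustomRecord[] Items;
    public CustomRecord[] Copy;
    public int TotalItemsToBeRetrieved;

    [IterationSetup]
    public void Setup()
    {
        // Regenerate the source array, a working copy, and a random N at every iteration.
        Items = Enumerable.Range(0, Size)
            .Select(i => new CustomRecord(i, $"Name-{i}"))
            .ToArray();
        Copy = [.. Items];
        TotalItemsToBeRetrieved = Random.Shared.Next(Size);
    }

    [Benchmark(Baseline = true)]
    public CustomRecord[] WithRandomGetItems() =>
        Random.Shared.GetItems(Items, TotalItemsToBeRetrieved);

    [Benchmark]
    public CustomRecord[] WithRandomGuid() =>
        Items.OrderBy(_ => Guid.NewGuid())
             .Take(TotalItemsToBeRetrieved)
             .ToArray();
}
```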
When we run the benchmark, we can see this final result (for simplicity, I removed the Error, StdDev, and Median columns):
| Method | Size | Mean | Ratio | Allocated | Alloc Ratio |
| --- | --- | --- | --- | --- | --- |
| WithRandomGetItems | 100 | 6.442 us | 1.00 | 424 B | 1.00 |
| WithRandomGuid | 100 | 39.481 us | 6.64 | 3576 B | 8.43 |
| WithRandomNumber | 100 | 22.219 us | 3.67 | 2256 B | 5.32 |
| WithShuffle | 100 | 7.038 us | 1.16 | 1464 B | 3.45 |
| WithShuffleNoCopy | 100 | 4.254 us | 0.73 | 624 B | 1.47 |
| WithRandomGetItems | 10000 | 58.401 us | 1.00 | 5152 B | 1.00 |
| WithRandomGuid | 10000 | 2,369.693 us | 65.73 | 305072 B | 59.21 |
| WithRandomNumber | 10000 | 1,828.325 us | 56.47 | 217680 B | 42.25 |
| WithShuffle | 10000 | 180.978 us | 4.74 | 84312 B | 16.36 |
| WithShuffleNoCopy | 10000 | 156.607 us | 4.41 | 3472 B | 0.67 |
| WithRandomGetItems | 1000000 | 15,069.781 us | 1.00 | 4391616 B | 1.00 |
| WithRandomGuid | 1000000 | 319,088.446 us | 42.79 | 29434720 B | 6.70 |
| WithRandomNumber | 1000000 | 166,111.193 us | 22.90 | 21512408 B | 4.90 |
| WithShuffle | 1000000 | 48,533.527 us | 6.44 | 11575304 B | 2.64 |
| WithShuffleNoCopy | 1000000 | 37,166.068 us | 4.57 | 6881080 B | 1.57 |
By looking at the numbers, we can notice that:
GetItems is the most performant method, both for time and memory allocation; only the in-place shuffle beats it, and only on the smallest array;
using Guid.NewGuid is the worst approach: it’s 7 to 66 times slower than GetItems, and it allocates up to 59x the memory;
sorting by a random number is a bit better: it’s 4 to 56 times slower than GetItems, and it allocates 5 to 42 times more memory;
shuffling the array in place and taking the first N elements is up to ~6x slower than GetItems; if you also have to preserve the original array, you’ll lose some memory performance, because you must allocate extra memory for the cloned array.
Here’s the chart with the performance values. Notice that, for better readability, I used a Log10 scale.
If we move our focus to the array with one million items, we can better understand the impact of choosing one approach instead of the other. Notice that here I used a linear scale since values are on the same magnitude order.
The purple line represents the memory allocation in bytes.
So, should we use GetItems all over the place? Well, no! Let me tell you why.
The problem with Random.GetItems: repeated elements
There’s a huge problem with the GetItems method: it can return duplicate items, because it samples with replacement. So, if you need to get N distinct items, GetItems is not the right choice.
Here’s how you can demonstrate it.
First, create an array of 100 distinct items. Then, using Random.Shared.GetItems, retrieve 100 items.
The final array will have 100 items; the array may or may not contain duplicates.
int[] source = Enumerable.Range(0, 100).ToArray();
StringBuilder sb = new StringBuilder();
for (int i = 1; i <= 200; i++)
{
HashSet<int> ints = Random.Shared.GetItems(source, 100).ToHashSet();
sb.AppendLine($"run-{i}, {ints.Count}");
}
var finalCsv = sb.ToString();
To check the number of distinct elements, I put the resulting array in a HashSet<int>. The final size of the HashSet will give us the exact percentage of unique values.
If the HashSet size is exactly 100, it means that GetItems retrieved each element from the original array exactly once.
For simplicity, I formatted the result in CSV format so that I could generate plots with it.
As you can see, on average, we have 65% of unique items and 35% of duplicate items.
Further readings
I used the Enumerable.Range method to generate the initial items.
I wrote an article to explain how to use it, which are some parts to consider when using it, and more.
Even when the internal data is the same, sometimes you can represent it in different ways. Think of the DateTime structure: by using different modifiers, you can represent the same date in different formats.
We can make this class implement the IFormattable interface so that we can define and use the advanced ToString overload:
public class Person : IFormattable
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public DateTime BirthDate { get; set; }

    public string ToString(string? format, IFormatProvider? formatProvider)
    {
        // Here, you define how to work with different formats
    }
}
Now, we can define the different formats. Since I like to keep the available formats close to the main class, I added a nested class that only exposes the names of the formats.
public class Person : IFormattable
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public DateTime BirthDate { get; set; }

    public string ToString(string? format, IFormatProvider? formatProvider)
    {
        // Here, you define how to work with different formats
    }

    public static class StringFormats
    {
        public const string FirstAndLastName = "FL";
        public const string Mini = "Mini";
        public const string Full = "Full";
    }
}
Finally, we can implement the ToString(string? format, IFormatProvider? formatProvider) method, taking care of all the different formats we support (remember to handle the case when the format is not recognised!)
public string ToString(string? format, IFormatProvider? formatProvider)
{
    switch (format)
    {
        case StringFormats.FirstAndLastName:
            return string.Format("{0} {1}", FirstName, LastName);
        case StringFormats.Full:
        {
            FormattableString fs = $"{FirstName} {LastName} ({BirthDate:D})";
            return fs.ToString(formatProvider);
        }
        case StringFormats.Mini:
            return $"{FirstName.Substring(0, 1)}.{LastName.Substring(0, 1)}";
        default:
            return this.ToString();
    }
}
A few things to notice:
I use a switch statement based on the values defined in the StringFormats subclass. If the format is empty or unrecognised, this method returns the default implementation of ToString.
You can generate the string in whichever way you prefer, like string interpolation or more complex approaches;
In the StringFormats.Full branch, I stored the string format in a FormattableString instance to apply the input formatProvider to the final result.
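That last point is worth a quick, self-contained demonstration: a FormattableString captures the interpolated values and defers the actual formatting until you pass an IFormatProvider.

```csharp
using System;
using System.Globalization;

// FormattableString defers formatting until you supply a culture.
DateTime birthDate = new DateTime(1879, 3, 14);
FormattableString fs = $"{birthDate:D}";

Console.WriteLine(fs.ToString(new CultureInfo("it-IT")));     // venerdì 14 marzo 1879
Console.WriteLine(fs.ToString(CultureInfo.InvariantCulture)); // Friday, 14 March 1879
```

Had we used a plain interpolated string instead, the date would have been formatted immediately with the current thread culture, and the formatProvider parameter would have been ignored.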
Getting a custom string representation of an object
We can try the different formatting options now that we have implemented them all.
Look at how the behaviour changes based on the formatting and input culture (Hint: venerdì is the Italian for Friday.).
Person person = new Person
{
FirstName = "Albert",
LastName = "Einstein",
BirthDate = new DateTime(1879, 3, 14)
};
System.Globalization.CultureInfo italianCulture = new System.Globalization.CultureInfo("it-IT");
Console.WriteLine(person.ToString(Person.StringFormats.FirstAndLastName, italianCulture)); // Albert Einstein
Console.WriteLine(person.ToString(Person.StringFormats.Mini, italianCulture)); // A.E
Console.WriteLine(person.ToString(Person.StringFormats.Full, italianCulture)); // Albert Einstein (venerdì 14 marzo 1879)
Console.WriteLine(person.ToString(Person.StringFormats.Full, null)); // Albert Einstein (Friday, March 14, 1879)
Console.WriteLine(person.ToString(Person.StringFormats.Full, CultureInfo.InvariantCulture)); // Albert Einstein (Friday, 14 March 1879)
Console.WriteLine(person.ToString("INVALID FORMAT", CultureInfo.InvariantCulture)); // Scripts.General.IFormattableTest+Person
Console.WriteLine(string.Format("I am {0:Mini}", person)); // I am A.E
Console.WriteLine($"I am not {person:Full}"); // I am not Albert Einstein (Friday, March 14, 1879)
Not only that, but now the result can also depend on the Culture related to the current thread:
CultureInfo germanCulture = new CultureInfo("de-DE");

using (new TemporaryThreadCulture(italianCulture))
{
    Console.WriteLine(person.ToString(Person.StringFormats.Full, CultureInfo.CurrentCulture)); // Albert Einstein (venerdì 14 marzo 1879)
}

using (new TemporaryThreadCulture(germanCulture))
{
    Console.WriteLine(person.ToString(Person.StringFormats.Full, CultureInfo.CurrentCulture)); // Albert Einstein (Freitag, 14. März 1879)
}
(note: TemporaryThreadCulture is a custom class that I explained in a previous article – see below)
Further readings
You might be thinking «wow, somebody still uses String.Format? Weird!»
Well, even though it seems an old-style method to generate strings, it’s still valid, as I explain here: