In my last post about Signal I talked about adding a background task to receive messages while the app was closed. Earlier this week the other contributor to the project noticed that the message count of some of their conversations was different from the number of messages in the database for that conversation. This causes strange bugs and is definitely a problem. Here's the issue on GitHub. They created a proof of concept (PoC) (available here) that narrowed the problem down to how we were using semaphores. However, as I played around with the PoC it looked like semaphores weren't the issue, but I did discover some "interesting" things.

Here's the setup: a conversation between you and another person is stored in a local database on your device. The conversation database record keeps a count of all the messages between you and the person you're talking to. There's also a reference to the actual list of messages in the conversation. The message list and the message count (the separate number) are updated one after another: first we add a message to the message list for that conversation, then we add 1 to the message count, and then we save all that information to the database. If you're wondering why we use a separate number to keep track of the number of messages when we could use the message list count, I wouldn't be able to tell you. I'm wondering the same thing.
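
To make the failure mode concrete, here is a minimal sketch of how a separately stored counter can drift away from the real number of messages. It is a toy in-memory model, not the app's actual storage code: the `db` dictionary, `load_count`, and `save_message` are hypothetical names, and it assumes message rows are inserted independently while the counter is written back from a value read earlier.

```python
# Toy in-memory "database": message rows are inserted one by one,
# while the conversation row carries a separately stored counter.
db = {"messages": ["m1", "m2"], "message_count": 2}


def load_count(db):
    # A writer reads the current counter into its own memory.
    return db["message_count"]


def save_message(db, text, count_seen):
    # Insert the new row, then write back the counter this writer
    # computed from the value it read earlier.
    db["messages"].append(text)
    db["message_count"] = count_seen + 1


# Both writers read the counter before either one saves.
seen_by_app = load_count(db)
seen_by_task = load_count(db)

save_message(db, "from the app", seen_by_app)
save_message(db, "from the background task", seen_by_task)

# Four rows exist, but the counter only advanced once.
print(len(db["messages"]), db["message_count"])
```

Under these assumptions both messages are stored, but the second write of the counter overwrites the first, so the count ends up one short of the number of rows.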

We want to make sure that only one process updates the database at a time. If more than one process tries to update the database at the same time, things get out of sync. The PoC demonstrated this. It contained two apps, one Universal Windows Platform (UWP) app and one console app. Each app would create a named semaphore, try to take a lock on that semaphore, and then update the database. After it was done updating the database it would release the lock on the semaphore. Wikipedia has a good analogy for what a semaphore is. Our semaphore is set to allow only one process at a time, so if the UWP app gets the semaphore first, the console app has to wait until the UWP app is done with it. Giving a semaphore a name is supposed to make it global, meaning that any other process on the system can see it and try to get a lock on it. However, after putting in log statements it looked like both apps were getting a lock on the semaphore. The logs are below.

[2018-02-18T11:12:04.4472573-08:00][UWP][12]: Started place order
[2018-02-18T11:12:04.4482559-08:00][UWP][12]: Attempting to create lock
[2018-02-18T11:12:04.4492575-08:00][UWP][12]: Lock creation status: False. Attempting to get lock.
[2018-02-18T11:12:04.4502565-08:00][UWP][12]: Lock got
[2018-02-18T11:12:04.4622692-08:00][UWP][12]: Customer order count BEFORE addition: 47
[2018-02-18T11:12:04.4623182-08:00][Console][1]: Started place order
[2018-02-18T11:12:04.4632612-08:00][UWP][12]: Customer order number BEFORE addition: 47
[2018-02-18T11:12:04.4635083-08:00][Console][1]: Attempting to create lock
[2018-02-18T11:12:04.4642569-08:00][UWP][12]: Customer order count AFTER addition: 47
[2018-02-18T11:12:04.4647733-08:00][Console][1]: Lock creation status: False. Attempting to get lock.
[2018-02-18T11:12:04.4659480-08:00][Console][1]: Lock got
[2018-02-18T11:12:04.4662562-08:00][UWP][12]: Customer order number AFTER addition: 48
[2018-02-18T11:12:04.4672709-08:00][UWP][12]: Customer order count AFTER increment: 48
[2018-02-18T11:12:04.4697093-08:00][Console][1]: Customer order count BEFORE addition: 47
[2018-02-18T11:12:04.4708298-08:00][Console][1]: Customer order number BEFORE addition: 47
[2018-02-18T11:12:04.4719322-08:00][Console][1]: Customer order count AFTER addition: 47
[2018-02-18T11:12:04.4729332-08:00][Console][1]: Customer order number AFTER addition: 48
[2018-02-18T11:12:04.4739024-08:00][Console][1]: Customer order count AFTER increment: 48
[2018-02-18T11:12:04.4934393-08:00][UWP][12]: Unlocking lock
[2018-02-18T11:12:04.4944382-08:00][UWP][12]: Unlocked lock
[2018-02-18T11:12:04.5002894-08:00][Console][1]: Unlocking lock
[2018-02-18T11:12:04.5014012-08:00][Console][1]: Unlocked lock

So on line 11 of the log, the console app was able to get the lock. However, the UWP app supposedly got the lock first on line 4, and it doesn't release it until line 19. This shouldn't happen. Both apps were trying to create the semaphore and then get a lock on it. There's another method you can use to get a reference to an already created semaphore, Semaphore.OpenExisting(name). So if there's a global semaphore called locking_across_the_globe, you'll use that one instead of creating a new semaphore with the same name. If you're diligent and read the documentation, however, you'll see that you can't create two different global semaphores with the same name. So what's going on?
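
One way to read the log is that each app was holding a different semaphore. The sketch below illustrates that difference with Python's in-process `threading.Semaphore`, purely as an analogy. It is not the .NET API and it does not cross process boundaries; the variable names are made up for illustration.

```python
import threading

# Case 1: both sides share one semaphore that admits a single holder.
shared = threading.Semaphore(1)
uwp_got_lock = shared.acquire(blocking=False)
console_got_lock = shared.acquire(blocking=False)  # refused: already held
shared_result = (uwp_got_lock, console_got_lock)
shared.release()  # release the one successful acquire

# Case 2: each side ends up with its own semaphore, even though both
# were meant to be "the same" lock.
uwp_sem = threading.Semaphore(1)
console_sem = threading.Semaphore(1)
separate_result = (
    uwp_sem.acquire(blocking=False),
    console_sem.acquire(blocking=False),
)
uwp_sem.release()
console_sem.release()

print("shared:  ", shared_result)
print("separate:", separate_result)
```

With a shared semaphore the second acquire is refused. With two separate semaphores both acquires succeed, which matches the two "Lock got" lines in the log.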

So instead of creating two semaphores with the same name, I tried creating the semaphore in the console app and opening it in the UWP app. Surprisingly, I got this exception in the UWP app on the Semaphore.OpenExisting call.

System.UnauthorizedAccessException: Access to the path is denied.
   at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.Threading.Semaphore.OpenExistingWorker(String name, Semaphore& result)
   at System.Threading.Semaphore.OpenExisting(String name)
   at DatabaseManager.FooDBContext.Lock(String name)
   at DatabaseManager.FooDBContext.PlaceOrder(String name)
   at UWPClient.MainPage.<Button_Click>b__4_0()
   at System.Threading.Tasks.Task.Execute()

So I can create the semaphore but I can't open it? I then tried the other way around: create the semaphore in the UWP app and try to open it in the console app. This time I got a different exception.

System.Threading.WaitHandleCannotBeOpenedException: No handle of the given name exists.
   at System.Threading.Semaphore.OpenExisting(String name)
   at DatabaseManager.FooDBContext.Lock(String name) in C:\Users\Sanders\Documents\GitHub\EFCoreSqliteTest\ConsoleClient\DB.cs:line 44
   at DatabaseManager.FooDBContext.PlaceOrder(String name) in C:\Users\Sanders\Documents\GitHub\EFCoreSqliteTest\ConsoleClient\DB.cs:line 67
   at ConsoleClient.Program.Main(String[] args) in C:\Users\Sanders\Documents\GitHub\EFCoreSqliteTest\ConsoleClient\Program.cs:line 24

Now the semaphore doesn't exist? This led me to believe that global semaphores in UWP apps aren't actually global. To test this further I created another PoC. This time I had two UWP apps but the same setup: one would create the semaphore and the other would try to open it. This led to the same WaitHandleCannotBeOpenedException, meaning that the second app can't see the first app's semaphore. This confirmed what I feared: UWP semaphores aren't actually global. So if they aren't global, what are they? When I was testing the background task with the main Signal app I was able to create a named semaphore and have everything work correctly. Either the main app or the background task could get the lock on the semaphore, but not both. So I was fairly certain named semaphores worked within the same app, even across different processes, but to confirm this I created a third PoC.

This PoC was almost identical to the real app: the app creates a semaphore and takes a lock on it, and then a background task for the app tries to open and lock the semaphore. This worked correctly. The background task wasn't able to get a lock on the semaphore since the app already had it.
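
The behavior across the three PoCs is consistent with each app having its own table of named semaphores. Here is a rough model of that idea. It is a guess at the observable behavior, not a description of Windows internals; `NamedSemaphoreTable`, the app ids, and the name `db_lock` are all hypothetical.

```python
import threading


class NamedSemaphoreTable:
    """Toy registry: one table of named semaphores per app (an assumption
    made for illustration, not how Windows implements it)."""

    def __init__(self):
        self._tables = {}  # app id -> {name -> semaphore}

    def create(self, app_id, name):
        # Create the semaphore in the caller's table, or reuse it if present.
        table = self._tables.setdefault(app_id, {})
        return table.setdefault(name, threading.Semaphore(1))

    def open_existing(self, app_id, name):
        # Only the caller's own table is searched.
        try:
            return self._tables[app_id][name]
        except KeyError:
            raise LookupError(f"no semaphore named {name!r} for {app_id!r}") from None


registry = NamedSemaphoreTable()

# The main app creates the semaphore and takes the lock.
app_sem = registry.create("signal_app", "db_lock")
app_has_lock = app_sem.acquire(blocking=False)

# A background task of the same app can open it but cannot take the lock.
task_sem = registry.open_existing("signal_app", "db_lock")
task_has_lock = task_sem.acquire(blocking=False)

# A different app cannot even see it.
try:
    registry.open_existing("other_app", "db_lock")
    other_app_sees_it = True
except LookupError:
    other_app_sees_it = False

print(app_has_lock, task_has_lock, other_app_sees_it)
```

In this model the background task is correctly locked out, while a second app gets a "does not exist" style error, the same pattern the PoCs showed.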

So where does this leave me? I haven't figured out the source of the original bug yet, but now I know that named semaphores in UWP apps are only visible to the app that created them. This isn't documented anywhere I've looked so far. I did find the following warning in the documentation, which leads me to believe UWP semaphores aren't global for security reasons.

Be careful when you use named semaphores. Because they are system wide, another process that uses the same name can enter your semaphore unexpectedly. Malicious code executing on the same computer could use this as the basis of a denial-of-service attack.

If your global semaphore isn't actually global then your app can't get hacked 🤷🏾‍♂️.