Improving the performance of the C# binding for ZeroMQ (clrzmq)

In one of my projects, I used ZeroMQ for inter-process communication. It is extremely fast, allows asynchronous I/O, offers several messaging patterns, and is supported on multiple platforms and languages.

I used the following three messaging patterns.

  1. Publish/Subscribe: A client subscribes to specific types of messages. When the server reads these messages from the hardware, it publishes them to the subscribed clients.
  2. Request/Response: A client sends a request to the server, which executes it, interacts with the hardware, and sends the response back. E.g., a client can request to open a serial port or play audio.
  3. Push/Pull: All clients push their logs to the central logging server, which pulls the messages and writes them to a file.

As the development is done in C# in a Windows Embedded environment, I use clrzmq, which is a C# binding for ZeroMQ. In my initial performance tests, I found that clrzmq was using a lot more CPU than I expected.

I used RedGate's ANTS performance profiler for .NET, which gives a detailed analysis of how many CPU cycles are spent in each function and how many times it is called.

What I found is that the ZmqSocket.Receive() method spent its time on:
  • SpinWait: 17.1%
  • Stopwatch.GetElapsedDateTimeTicks: 4.1%
  • Stopwatch.StartNew: 2.4%
  • Receive: 73.3%
Within that inner Receive() call, 64.4% of the time was spent in SocketProxy.Receive():
  • ErrorProxy.get_ShouldTryAgain: 5.1%
  • SocketProxy.Receive: 64.4%
The CPU usage inside SocketProxy.Receive() broke down as:
  • DisposableIntPtr.Dispose: 11.1%
  • ZmqMsgT.Init: 7.1%
  • ZmqMsgT.Close: 5.8%
  • SocketProxy.RetryIfInterrupted: 20.8%
See the attached picture, where SocketProxy.Receive() uses 13142.42 CPU ticks per request.
Average CPU ticks per request = 191,196,025 / 14,548 = 13142.42

As part of the optimization, I used a pre-allocated raw buffer to send and receive data instead of a ZmqMsg object, and I moved the Stopwatch and SpinWait code into a limited scope that runs only when a timeout is defined and is longer than a certain value.
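The timeout change can be sketched roughly as follows. This is only an illustration of the idea, not the actual clrzmq patch: `TimedReceive`, the `TryReceive` delegate, and `SpinThreshold` are hypothetical names, and the delegate stands in for a non-blocking receive that fills a caller-supplied, pre-allocated buffer. The point is that `Stopwatch` and `SpinWait` are touched only when the caller asked for a timeout long enough to justify them.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical stand-in for a non-blocking receive: fills the caller's
// pre-allocated buffer and returns the byte count, or -1 if nothing is ready.
public delegate int TryReceive(byte[] buffer);

public static class TimedReceive
{
    // Assumed cut-off: at or below this, one attempt is made and no timing is done.
    public static readonly TimeSpan SpinThreshold = TimeSpan.FromMilliseconds(1);

    public static int Receive(TryReceive tryReceive, byte[] buffer, TimeSpan timeout)
    {
        // Fast path: no Stopwatch or SpinWait when the timeout is absent or very short.
        if (timeout <= SpinThreshold)
        {
            return tryReceive(buffer);
        }

        // Slow path: retry until a message arrives or the timeout elapses.
        Stopwatch timer = Stopwatch.StartNew();
        SpinWait spin = new SpinWait();
        int received;
        while ((received = tryReceive(buffer)) < 0 && timer.Elapsed < timeout)
        {
            spin.SpinOnce();
        }
        return received;
    }
}
```

A caller would allocate the buffer once (for example `var buffer = new byte[1024];`) and pass the same array to every `TimedReceive.Receive(...)` call, so no message object is created and disposed per request.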

After these optimizations, SocketProxy.Receive() uses only 2696.54 CPU ticks per request, which is almost 1/5 of the original CPU usage. See the attached picture below.
Average CPU ticks per request = 5,132,725,522 / 1,903,448 = 2696.54
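For anyone who wants to re-check the arithmetic, here is a small sketch that recomputes both averages and the resulting ratio from the profiler totals quoted in this post. The class and method names (`ProfileMath`, `AverageTicks`) are made up for the example.

```csharp
using System;
using System.Globalization;

public static class ProfileMath
{
    // Average CPU ticks per call: total ticks divided by the number of calls.
    public static double AverageTicks(double totalTicks, double calls)
    {
        return totalTicks / calls;
    }

    public static void Main()
    {
        double before = AverageTicks(191196025, 14548);     // original clrzmq
        double after = AverageTicks(5132725522, 1903448);   // optimized build
        CultureInfo inv = CultureInfo.InvariantCulture;

        Console.WriteLine(before.ToString("F1", inv));            // 13142.4
        Console.WriteLine(after.ToString("F1", inv));             // 2696.5
        Console.WriteLine((before / after).ToString("F1", inv));  // 4.9
    }
}
```

The last line is the speed-up factor: the original code spends roughly 4.9 times as many ticks per request, which matches the "almost 1/5" figure.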

Here is the GitHub link for the optimized ZeroMQ library.

I am happy to say that my patch was accepted by the clrzmq author and merged into the mainline clrzmq library.



2 comments:

Arkadiusz Szulakiewicz said...

I have written a nice article about caching objects in C#. You can find it here: http://blog.szulak.net/programming/caching-objects-in-c/

Anonymous said...

I wish I'd seen this article before I made a similar discovery manually. My short-term solution was to remove the timeout in the call to Receive, as I noticed the WithTimeout extension was where all the SpinWait and Stopwatch action was happening. Sounds like I need the newer version :)