Improving the Performance of the C# Binding for ZeroMQ (clrzmq)

In one of my projects, I used ZeroMQ for inter-process communication. ZeroMQ is extremely fast, supports asynchronous I/O and several messaging patterns, and is available on multiple platforms and languages.

I used the following three messaging patterns:

  1. Publish/Subscribe: Clients subscribe to specific types of messages. When the server reads these messages from the hardware, it publishes them to the subscribed clients.
  2. Request/Response: A client sends a request to the server, which executes it, interacts with the hardware, and sends the response back. For example, a client can request that the server open a serial port or play an audio file.
  3. Push/Pull: All clients push their logs to a central logging server, which pulls the messages and writes them to a file.

Since development was done in C# on Windows Embedded, I used clrzmq, the C# binding for ZeroMQ. My initial performance tests showed that clrzmq was using a lot more CPU than I expected.

I used Red Gate's ANTS Performance Profiler for .NET, which gives a detailed breakdown of how many CPU cycles are spent in each function and how many times each function is called.

I found that the ZmqSocket.Receive() method spent its time on:
  • SpinWait: 17.1%
  • Stopwatch.GetElapsedDateTimeTicks: 4.1%
  • Stopwatch.StartNew: 2.4%
  • Receive: 73.3%
Of this, the inner Receive() call spent 64.4% of the time in SocketProxy.Receive():
  • ErrorProxy.get_ShouldTryAgain: 5.1%
  • SocketProxy.Receive: 64.4%
Inside SocketProxy.Receive(), the CPU usage broke down as:
  • DisposableIntPtr.Dispose: 11.1%
  • ZmqMsgT.Init: 7.1%
  • ZmqMsgT.Close: 5.8%
  • SocketProxy.RetryIfInterrupted: 20.8%
The attached picture shows SocketProxy.Receive() using about 13,142 CPU ticks per request on average:
Average CPU ticks per request = 191,196,025 / 14,548 ≈ 13,142.43

To optimize this, I used pre-allocated raw buffers to send and receive data instead of ZmqMsg objects, and moved the Stopwatch and SpinWait code into a limited scope that runs only when a timeout is defined and is longer than a certain value.
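The two changes can be sketched roughly as follows. This is not the actual clrzmq patch or API: `RawReceive`, `ScopedTimeoutReceiver`, and the 1 ms `TimingThreshold` are hypothetical names and values chosen for illustration, and the "single non-blocking attempt for short timeouts" behavior is an assumption. The caller allocates one byte buffer and reuses it on every call, and `Stopwatch`/`SpinWait` are created only on the long-timeout path.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical stand-in for the native receive call (NOT the clrzmq API).
// Copies the next message into `buffer` and returns its length, or returns -1
// when dontWait is true and no message is ready yet (the EAGAIN case).
public delegate int RawReceive(byte[] buffer, bool dontWait);

public static class ScopedTimeoutReceiver
{
    // Hypothetical cutoff: timeouts at or below this skip the timing machinery.
    public static readonly TimeSpan TimingThreshold = TimeSpan.FromMilliseconds(1);

    // `buffer` is pre-allocated by the caller and reused across calls,
    // so no per-message object is created or disposed here.
    public static int Receive(RawReceive receive, byte[] buffer, TimeSpan timeout)
    {
        // No timeout: one blocking call, no Stopwatch, no SpinWait.
        if (timeout == Timeout.InfiniteTimeSpan)
            return receive(buffer, false);

        // Short timeout (assumed behavior): a single non-blocking attempt,
        // still without allocating any timing objects.
        if (timeout <= TimingThreshold)
            return receive(buffer, true);

        // Only an explicit, long timeout pays for Stopwatch + SpinWait polling.
        var stopwatch = Stopwatch.StartNew();
        var spinner = new SpinWait();
        while (true)
        {
            int length = receive(buffer, true);
            if (length >= 0 || stopwatch.Elapsed >= timeout)
                return length;
            spinner.SpinOnce();
        }
    }
}
```

In a receive loop, the caller would allocate something like `var buffer = new byte[4096];` once and pass the same array to every `Receive` call; the common no-timeout path then does no allocation and no timing work beyond the underlying receive itself.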

After these optimizations, SocketProxy.Receive() uses only about 2,697 CPU ticks per request, roughly 1/5 of the original CPU usage. See the attached picture below.
Average CPU ticks per request = 5,132,725,522 / 1,903,448 ≈ 2,696.54
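Each average is the total ticks reported by the profiler divided by the number of calls. Putting the two measurements side by side shows where the "roughly 1/5" figure comes from:

```latex
\begin{aligned}
\bar{t}_{\text{before}} &= \frac{191{,}196{,}025}{14{,}548} \approx 13{,}142.43 \text{ ticks} \\
\bar{t}_{\text{after}}  &= \frac{5{,}132{,}725{,}522}{1{,}903{,}448} \approx 2{,}696.54 \text{ ticks} \\
\frac{\bar{t}_{\text{after}}}{\bar{t}_{\text{before}}} &\approx 0.205 \approx \frac{1}{4.87}
\end{aligned}
```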

Here is the GitHub link for the optimized ZeroMQ library.

I am happy to say that my patch was accepted by the clrzmq author and merged into the mainline clrzmq library.

Comments:

Anonymous said...

I have write nice article about caching object in C#. You can find it here: http://blog.szulak.net/programming/caching-objects-in-c/

Anonymous said...

I wish I'd seen this article before I made a similar discovery manually.. My short term solution was to remove the timeout in the call to receive as I noticed the WithTimeout extension was where all the SpinWait and StopWatch action was happening. Sounds like I need the newer version :)