Fibonacci(50) performance : Java > C > C++ > D > Go > Terra (Lua) > Lua-JIT (Lua)

Recently, I have been spending some time learning the D and Go languages. D is an evolution of C++, whereas Go is claimed to be an evolution of C, though I think it is Google's attempt to reduce its dependency on Java.

I really like the simplicity of Go, whose learning curve is gentle, but accepting it as a systems programming language would be a tough sell. I think it is aimed more at Java/Python developers building enterprise software than at hardware, device-driver, or embedded programming. But you never know, as Google has been very successful at selling Android on low-powered smart devices.

I ran a small performance benchmark for the C, C++, D and Go languages using the Fibonacci algorithm.
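The actual benchmark sources are in the GitHub project and are not reproduced here. As a rough illustration only, assuming the benchmark uses the naive doubly recursive definition (the long Fibonacci(50) run times suggest this, but it is an assumption), the workload looks like this in Python:

```python
def fib(n):
    # Naive doubly recursive Fibonacci. The number of calls grows
    # exponentially with n, which is what makes it a CPU benchmark.
    # Assumption: the benchmarked sources use this same definition.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


if __name__ == "__main__":
    # 75025 matches the value printed by the fib.lua runs for n = 25.
    print(fib(25))
```

Because almost all of the time goes into function calls, a test like this mostly measures call overhead and how well the compiler or JIT optimizes recursion.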

Results: (see updated results at the end)

For Fibonacci(25), C++ >= Go > C >= Lua-JIT > D > Lua-Terra > Java 1.6

For Fibonacci(50), Java > C > C++ > D-ldc > D-dmd > Go > Lua-Terra > Lua-JIT

Surprisingly, Java outperformed C/C++ for Fibonacci(50), which hurts my ego :) !!

Language    % C++ Speed   Compiler/VM Flags

FIBONACCI-25
C++         100.0000      Apple LLVM version 6.0 -O3
GO          100.0000      go version go1.3.3 darwin/amd64
C            77.7778      Apple LLVM version 6.0 -O3
LUA          77.7778      LuaJIT 2.0.3
D            63.6364      dmd -m64 -O -inline -noboundscheck
D            63.6364      ldc -m64 -O -inline
LUA          43.7500      Terra
JAVA 1.6     43.4783      1.6.0_65-b14-462-11M4609

FIBONACCI-50
JAVA 1.6    169.6710      1.6.0_65-b14-462-11M4609
C           101.9846      Apple LLVM version 6.0 -O3
C++         100.0000      Apple LLVM version 6.0 -O3
D            92.6376      ldc -m64 -O -inline
D            81.7197      dmd -m64 -O -inline -noboundscheck
GO           76.7760      go version go1.3.3 darwin/amd64
LUA          43.9684      Terra
LUA          38.9649      LuaJIT 2.0.3





For source code and results, check out my GitHub project.

Update:
It seems Clang on Mac OS has some issue. I ran the same test on my Linux virtual machine with GNU g++, and there g++ outperforms Java. My ego is intact :)

$g++ --version
g++ (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4)
$g++ -O3 fib.cpp
$time ./a.out 50
real 0m47.991s
user 0m47.981s
sys 0m0.000s

$java -version
java version "1.7.0_51"
$javac fib.java
$time java fib 50
real 0m51.897s
user 0m51.815s
sys 0m0.113s
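
The "% C++ Speed" figures in the table are consistent with dividing the C++ run time by the other language's run time. The sketch below applies that to the two Linux timings above. The function names are mine, and the formula is inferred from the table values rather than taken from the benchmark scripts.

```python
import re


def parse_time(text):
    # Convert a `time` value such as "0m47.991s" into seconds.
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text.strip())
    if match is None:
        raise ValueError("unexpected time format: %r" % text)
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def percent_of_baseline_speed(baseline_seconds, seconds):
    # 100 means "as fast as the baseline"; below 100 means slower.
    return 100.0 * baseline_seconds / seconds


if __name__ == "__main__":
    cpp = parse_time("0m47.991s")   # g++ -O3, real time
    java = parse_time("0m51.897s")  # Java 1.7.0_51, real time
    print("Java at %.2f%% of C++ speed" % percent_of_baseline_speed(cpp, java))
```

For these two runs Java comes out at roughly 92.5% of the g++ speed.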

Fibonacci numbers: LuaJIT vs Terra

Recently I came across the Terra language, a new low-level system programming language designed to interoperate seamlessly with the Lua programming language.

I thought of comparing the performance of LuaJIT and Terra. We all know LuaJIT is extremely fast, thanks to Mike Pall.

You can find the fib.lua source code in the lang-compare project on my GitHub.

FIBONACCI - 25

>time luajit-2.0.3 fib.lua 25

Running LUA-JIT 2.0.3 test
LANGUAGE LUA: 75025
real 0m0.009s
user 0m0.002s
sys 0m0.006s

Running fib.lua with Terra :
>time terra  fib.lua 25

Running Terra test
LANGUAGE LUA: 75025
real 0m0.016s
user 0m0.005s
sys 0m0.010s

Here LuaJIT is 77% faster than Terra for the same fib.lua file.

FIBONACCI - 50

>time luajit-2.0.3 fib.lua 50
Running LUA-JIT test
LANGUAGE LUA: 12586269025
real 3m8.326s
user 3m8.157s
sys 0m0.045s

>time terra  fib.lua 50
Running Terra test
LANGUAGE LUA: 12586269025
real 2m46.895s
user 2m46.734s
sys 0m0.043s

Here LuaJIT takes about 13% longer than Terra (188.3 s versus 166.9 s real time). I ran the same tests multiple times and the results were consistent: Terra outperforms LuaJIT when the test runs for a longer duration.



Apple Push Notification Service (APNS) Simulator

The APNS simulator implements the APNS specs for simple and enhanced push notifications.
Prerequisite:
Once you have downloaded and installed LuaJIT and luarocks, the other dependencies can be installed using luarocks
e.g.
luarocks install copas

luarocks install LuaSec

luarocks install LuaLogging
Usage:   apns-sim.lua -t ssl_enabled [ -k ssl_key -c ssl_cert] [ -s server -p port -l loglevel ]

Here the ssl_key and ssl_cert fields are mandatory if ssl_enabled is set to true.

ssl_enabled : default value is false

server : default value is 127.0.0.1

port : default value is 8080

loglevel : default value is 'warn'

e.g. for ssl connection:
lua  apns-sim.lua -t true -k ./key.pem -c ./cert.pem

for non-ssl connection:
lua  apns-sim.lua

When a client connects to this simulator and sends a push notification, you will see log entries on the console.
Wed Oct 15 09:16:13 2014 INFO Received client connection  from '127.0.0.1:53444':
Wed Oct 15 09:16:13 2014 INFO Received notification: command=1; id=21; expiry=1413382573; token=adf3b210e7adf35f540f45b2697760d9d41081569dc4509ee98bb4d4c92a72ae; payload={"aps":{"alert":{"body":"Hello World"}}}
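
For reference, the fields in the log line above map onto the legacy APNS binary frame: a command byte (0 for simple, 1 for enhanced), then for the enhanced format a 4-byte identifier and a 4-byte expiry, then a 2-byte token length, the token, a 2-byte payload length, and the payload, all in network byte order. The Python sketch below builds and parses such a frame. It is only an illustration of the wire format, not the simulator's Lua code, and the function names are hypothetical.

```python
import struct


def pack_enhanced(identifier, expiry, token_hex, payload):
    # Build an enhanced-format (command 1) notification frame.
    token = bytes.fromhex(token_hex)
    body = payload.encode("utf-8")
    header = struct.pack("!BIIH", 1, identifier, expiry, len(token))
    return header + token + struct.pack("!H", len(body)) + body


def parse_notification(frame):
    # Parse a simple (command 0) or enhanced (command 1) frame into a dict.
    command = frame[0]
    fields = {"command": command}
    offset = 1
    if command == 1:
        fields["id"], fields["expiry"] = struct.unpack_from("!II", frame, offset)
        offset += 8
    elif command != 0:
        raise ValueError("unsupported command: %d" % command)
    (token_len,) = struct.unpack_from("!H", frame, offset)
    offset += 2
    fields["token"] = frame[offset:offset + token_len].hex()
    offset += token_len
    (payload_len,) = struct.unpack_from("!H", frame, offset)
    offset += 2
    fields["payload"] = frame[offset:offset + payload_len].decode("utf-8")
    return fields


if __name__ == "__main__":
    token = "adf3b210e7adf35f540f45b2697760d9d41081569dc4509ee98bb4d4c92a72ae"
    frame = pack_enhanced(21, 1413382573, token,
                          '{"aps":{"alert":{"body":"Hello World"}}}')
    print(parse_notification(frame))
```

A client written this way could send the frame over a plain TCP socket to the simulator's non-SSL port and compare the logged fields with what it sent.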

Improving performance of C# Binding for ZeroMQ (clrzmq)

In one of my projects, I used ZeroMQ for inter-process communication. It is extremely fast, allows asynchronous I/O, offers different messaging patterns, and is supported on multiple platforms and languages.

I used the following three messaging patterns.

  1. Publish/Subscribe: The client subscribes to specific types of messages. When the server reads these messages from the hardware, it publishes them to the subscribed clients.
  2. Request/Response: The client sends a request to the server, which executes the request, interacts with the hardware, and sends the response back. E.g. the client can request to open a serial port or play an audio file.
  3. Push/Pull: All clients push their logs to the central logging server, which pulls the messages and writes them to the file.

As the development was done in C# on a Windows Embedded environment, I used clrzmq, which is a C# binding for ZeroMQ. Based on my initial performance test, I realized that clrzmq was taking a lot more CPU than I expected.

I used RedGate's ANTS Performance Profiler for .NET, which gives a detailed analysis of how many CPU cycles are spent in each function and how many times it is called.

What I found is that the ZmqSocket.Receive() method spent its time on:
  • SpinWait: 17.1%
  • Stopwatch.GetElapsedDateTimeTicks: 4.1%
  • Stopwatch.StartNew: 2.4%
  • Receive: 73.3%
Within that, the Receive() function spent 64.4% of the time in SocketProxy.Receive():
  • ErrorProxy.get_ShouldTryAgain: 5.1%
  • SocketProxy.Receive: 64.4%
CPU usage within SocketProxy.Receive():
  • DisposableIntPtr.Dispose: 11.1%
  • ZmqMsgT.Init: 7.1%
  • ZmqMsgT.Close: 5.8%
  • SocketProxy.RetryIfInterrupted: 20.8%
See the attached picture, where SocketProxy.Receive() uses 13142.42 CPU ticks per request:
Average CPU ticks per request = 191,196,025 / 14,548 = 13142.42

As part of the optimization, I used a pre-allocated raw buffer to send and receive data instead of a ZmqMsg object, and moved the Stopwatch and SpinWait code into a limited scope that is entered only when a timeout is defined and is longer than a certain value.
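
The two ideas can be sketched on a plain TCP-style socket in Python. This is an analogy under my own assumptions, not the clrzmq patch: the names `BUFFER` and `receive` are hypothetical, and a real ZeroMQ binding would call into libzmq instead of `recv_into`.

```python
import socket
import time

# Allocated once and reused for every receive, instead of creating
# and disposing a message object per call.
BUFFER = bytearray(4096)


def receive(sock, timeout=None):
    # Returns a view into BUFFER; consume it before the next call.
    view = memoryview(BUFFER)
    if timeout is None:
        # Fast path: no timer and no polling, just a blocking receive.
        sock.setblocking(True)
        return view[:sock.recv_into(view)]
    # Slow path: the clock is only touched when a timeout was requested.
    sock.setblocking(False)
    deadline = time.monotonic() + timeout
    while True:
        try:
            return view[:sock.recv_into(view)]
        except BlockingIOError:
            if time.monotonic() >= deadline:
                return None  # timed out without data
            time.sleep(0)    # yield to other threads before retrying


if __name__ == "__main__":
    a, b = socket.socketpair()
    a.sendall(b"hello")
    print(bytes(receive(b)))
    a.close()
    b.close()
```

The point of the structure is that the common no-timeout call pays for neither the timer nor a fresh allocation.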

After these optimizations, SocketProxy.Receive() uses only 2696.54 CPU ticks per request, which is about 1/5 of the original CPU usage. See the attached picture below.
Average CPU ticks per request = 5,132,725,522 / 1,903,448 = 2696.54

Here is the GitHub link for the optimized ZeroMQ library.

I am happy to say that my patch was accepted by the clrzmq author and merged into the mainline clrzmq library.



Accessing ZeroMQ from Windows PowerShell


In one of our projects, we wanted to write logs from a Windows PowerShell script to our central logging server whenever the script is executed to update packages and restart the Windows service.

On the server side, we implemented a ZeroMQ server (C#) which pulls data from multiple processes and writes it to the log file using NLog.

On the client side, I used Windows PowerShell with a ZeroMQ push socket to push the messages.

Windows PowerShell supports loading and accessing functionality from any CLR assembly.

To load the assembly, use the LoadFile function.
[System.Reflection.Assembly]::LoadFile("C:\zmqclr\bin\clrzmq.dll");


To create the ZeroMQ context, call the static Create function.
$zmqCtx = [ZeroMQ.ZmqContext]::Create()


To create a (push) socket from the ZeroMQ context:
$zmqSocket = ([ZeroMQ.ZmqSocket]$zmqCtx.CreateSocket([ZeroMQ.SocketType]::PUSH));


Now connect to the central logging server on port 10000.
$zmqSocket.Connect("tcp://localhost:10000");


Once the socket is connected, you can send messages using the Send function.
$zmqSocket.Send($data, $data.Length, [ZeroMQ.SocketFlags]::None)

Below is the complete script.

#=======================================================================
# Purpose: Send logs to logging server using ZeroMQ
# Author: Rohit Joshi
# Date: 10/31/2012
#=======================================================================
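#Load the clrzmq assembly (same path as in the LoadFile step above)
[System.Reflection.Assembly]::LoadFile("C:\zmqclr\bin\clrzmq.dll") | Out-Null

#Global variables
$global:zmqSocket=$null
$enc = [system.Text.Encoding]::Unicode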

#=======================================================================
# Function: Init ZeroMQ
# Arguments:
#=======================================================================
# Purpose: Initialize the zeromq
#=======================================================================
Function InitZeroMQ()
{

  $zmqCtx= [ZeroMQ.ZmqContext]::Create()
  $global:zmqSocket =([ZeroMQ.ZmqSocket]$zmqCtx.CreateSocket([ZeroMQ.SocketType]::PUSH));
  $global:zmqSocket.SendHighWatermark = 5000;
  $global:zmqSocket.Connect("tcp://localhost:10000");
}

#=======================================================================
# Function: SendMessage
# Arguments: string 
#=======================================================================
# Purpose:Send Message
#=======================================================================
Function SendMessage($msg)
{
 $data=$enc.GetBytes($msg) 
 $global:zmqSocket.Send($data, $data.Length, [ZeroMQ.SocketFlags]::None)
}

InitZeroMQ();
SendMessage("Sending data from PowerShell script");



Map-Reduce implementation in Lua


lua-mapreduce is a fast and easy MapReduce implementation for Lua, inspired by other map-reduce implementations, particularly octopy in Python.
It doesn't aim to meet all your distributed computing needs, but its simple approach is amenable to a large proportion of parallelizable tasks. If your code has a for-loop, there's a good chance that you can make it distributed with just a few small changes.
It uses the following Lua modules.
  1. luasocket: tcp client-server connectivity
  2. copas: Coroutine Oriented Portable Asynchronous Services for Lua
  3. lualogging
  4. serialize (included in this project)
  5. luafilesystem: Used only in the task-file example to list files in the directory. The lua-mapreduce client/server doesn't depend on this module
For Windows, you can install luaforwindows, which includes these modules.
For Linux/Unix/MacOS and Windows, you can use LuaDist

Directory structure:

  1. lua-mapreduce-server.lua : The map-reduce server. It accepts connections from clients, sends them the task-file, and then sends them tasks to perform the map/reduce functionality.
  2. lua-mapreduce-client.lua : It connects to the server, receives tasks, and executes the map/reduce functions defined in the task-file
  3. utils/utils.lua : Provides utility functionality
  4. utils/serialize.lua : Provides table serialization functionality
  5. example/word-count-taskfile.lua : Example task-file for counting words in all .lua files in the current directory. More details on how to create a task-file are given on the word-count example page of the wiki.

Usage:

  1. Start Server: lua-mapreduce-server.lua -t task-file.lua [-s server-ip -p port -l loglevel]
  2. Start Client: lua-mapreduce-client.lua [-s server-ip -p port -l loglevel]
example/word-count-taskfile.lua is a sample task-file for word count which implements the following functions. Each of these functions is invoked using coroutines, both to avoid blocking calls and to return results one entry at a time to save memory.
All these functions must be defined as part of a table, and this table must be returned by the function mapreduce()

Server required functions:

  1. taskfn: It reads the source and creates a map of tasks. E.g. for word count, it reads all files with the .lua extension from the current directory and creates a map with the file name as the key and the file content as the value.
    mr.taskfn = function()
        --logger:debug("Getting map task")
        local tasks = read_source()  -- read source is utility function defined to read data source
        for key, value in pairs(tasks) do
            coroutine.yield(key, value)
        end
    end
Here read_source() is a local function defined as below. NOTE: it uses luafilesystem (lfs) module to read the files
    local function read_source()
      --local file_path = system.pathForFile( "*.lua", lfs.currentdir() )
      local file_path = lfs.currentdir()
      --logger:debug("Current directory path:" .. file_path)
      local source_table = {}
      for file in lfs.dir(file_path) do
        if(string.find(file, "%.lua$") ~= nil) then  -- "." is a pattern wildcard: escape it and anchor at the end of the name
        --  logger:debug("File name:" .. file_path .. "/" .. file)
            local c = read_file(file_path .. "/" .. file)
        --  logger:debug("file:" .. file .. ", length:" .. #c)

            if( c ~= nil) then
                source_table[file]=c
            end
        end
      end
      return source_table
    end
  2. finalfn : Defines how to output the final result. Here it prints to the console.
    mr.finalfn = function (results)
        print("Final results of the task:")
        for key, value in pairs(results) do
            print( key .. ":" .. value)
            coroutine.yield()
        end
    end
    

Client required functions:

  1. mapfn: Map function which processes the task (here, the content of a file), splitting it into lines and each line into words
    -- Map function : Here it splits the content of the file into lines and each line into words
    mr.mapfn = function(key, value)
        --logger:debug("mapfn with key:" .. key .. ", value :" .. value .. "\r\n\r\n")
        local file_words = {}

        local lines = value:split("[^\r\n%s]+")

        -- logger:debug("Number of lines in " .. #lines .. " in the file " .. key)
        for k, w in ipairs(lines) do
            if(w ~= nil) then
                local words = {}
                string.gsub(w, "(%a+)", function (word)
                    table.insert(words, string.lower(word))
                end)

                --local words = w:split("[^ %s]+")

                if(words ~= nil) then
                    --logger:debug("Number of words in line " .. k .. " are " .. #words)
                    for j=1, #words do
                        --logger:debug("mapfn:yielding " .. words[j])
                        coroutine.yield(words[j], 1)
                    end
                end
            end
        end
    end
  2. reducefn : Reduce function which computes the number of occurrences of a word. Here it returns the size of the values array.
    ---Reduce function: It returns the number of entries for the values

    mr.reducefn = function (key, value)
        --logger:debug("reducefn: for key:" .. key ..  ", number of words  :" .. #value)
        coroutine.yield(key, #value)
    end
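
To see how the four functions fit together without a server and clients, here is a single-process sketch in Python, with generators standing in for Lua coroutines. It only mirrors the data flow described above. The `run` driver and the in-memory `sources` dict are my own stand-ins for the network layer and for read_source().

```python
import re
from collections import defaultdict


def taskfn(sources):
    # Yield one (key, value) task at a time, like coroutine.yield(key, value).
    for name, content in sources.items():
        yield name, content


def mapfn(key, value):
    # Emit (word, 1) for every alphabetic word, lowercased.
    for word in re.findall(r"[A-Za-z]+", value):
        yield word.lower(), 1


def reducefn(key, values):
    # The count is the number of collected entries, as in the Lua reducefn.
    yield key, len(values)


def run(sources):
    # Stand-in for the server: group mapped values by key, then reduce.
    grouped = defaultdict(list)
    for key, value in taskfn(sources):
        for word, one in mapfn(key, value):
            grouped[word].append(one)
    results = {}
    for word, values in grouped.items():
        for reduced_key, count in reducefn(word, values):
            results[reduced_key] = count
    return results


if __name__ == "__main__":
    print(run({"a.lua": "local x = x\nprint(x)", "b.lua": "Local"}))
```

In the real project the grouping step happens on the server, while mapfn and reducefn run on the clients.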

 Todo

  1. Add support for handling failed tasks. Currently, if a client disconnects, the task handled by that client is lost
  2. Support for multiple client connections based on the number of cores available on the computer. Use copas for async
  3. Ability to send multiple task-files to the server.
  4. Add more examples of task-files
  5. Possibly integrate with apache-mesos

Performance: Nginx, RestExpress, CppCMS, Playframework, Gwan

A quick and unscientific performance comparison. The goal of this test is to find out how fast the engine of my car is. No car is faster than its engine capacity. Of course the engine is not everything, and we can't ride on the engine alone, but it's good to know.

Concurrency Level : 200
Total Request: 100,000
OS: Linux ubuntu 2.6.38-13-generic #57-Ubuntu SMP Mon Mar 5 18:29:54 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

NOTE: I have updated the Play Framework results by using the "play start" command, as someone mentioned in a comment, which has improved performance drastically.

Summary:

                         Nginx     CppCMS    RestExpress   Play2.0   Gwan
Time taken (sec)         14.17     14.724    19.653        20.896    25.348
Mean request time (ms)   28.34     29.448    39.307        41.793    50.697
Total requests/sec       7057.28   6791.69   5088.19       4785.52   3945.03
Transfer rate (KB/s)     2494.86   1883.63   1157.76       1065.53   1587.46
Failed requests          0         0         0             0         2843

With Nginx frontend      CppCMS    RestExpress   Play2.0
Time taken (sec)         14.672    23.64         26.810
Mean request time (ms)   29.345    47.281        53.621
Requests/sec             6815.00   4230.05       3729.90
Transfer rate (KB/s)     2076.62   1206.23       1114.60
Failed requests          18        0             0

                         Nginx     CppCMS    RestExpress   Play2.0   Gwan
CPU%                     90        237       254           210       360
Memory (RES)             10088     34m       404m          577m      30m
Memory (SHR)             3424      4444      9996          14m       7704

CppCMS + Nginx front end:
CPU%  : 247% ( CppCMS + Nginx)
CPU%:  149% (CppCMS)
Memory (RES) : 17252 (CppCMS + Nginx)
Memory (SHR):  8260 (CppCMS + Nginx)
Memory (RES) : 6700 (CppCMS)
Memory (SHR): 4444 (CppCMS)
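
The summary rows are tied together by simple arithmetic: requests per second is the total request count divided by the time taken, and ab's mean time per request is concurrency × time taken / total requests. A small Python sketch with my own function names follows. Note that ab computes from the unrounded elapsed time, so its printed values can differ slightly in the last digits.

```python
def requests_per_second(total_requests, seconds):
    # Throughput over the whole run.
    return total_requests / seconds


def mean_time_per_request_ms(total_requests, seconds, concurrency):
    # Mean latency seen by one of the concurrent clients, in milliseconds.
    return concurrency * seconds * 1000.0 / total_requests


if __name__ == "__main__":
    # Nginx run: 100,000 requests, concurrency 200, 14.170 seconds.
    print("%.2f req/s" % requests_per_second(100000, 14.170))
    print("%.2f ms" % mean_time_per_request_ms(100000, 14.170, 200))
```

For the Nginx run this gives about 7057 requests per second and 28.34 ms per request, in line with the ab report.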

1.  Nginx (0.8.54)

$ ab -n 100000 -c 200 http://localhost:80/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests


Server Software:        nginx/0.8.54
Server Hostname:        localhost
Server Port:            80

Document Path:          /
Document Length:        151 bytes

Concurrency Level:      200
Time taken for tests:   14.170 seconds
Complete requests:      100000
Failed requests:        0
Write errors:           0
Total transferred:      36200000 bytes
HTML transferred:       15100000 bytes
Requests per second:    7057.28 [#/sec] (mean)
Time per request:       28.340 [ms] (mean)
Time per request:       0.142 [ms] (mean, across all concurrent requests)
Transfer rate:          2494.86 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        5   13   6.5     13     141
Processing:     3   15   7.5     14     145
Waiting:        2   11   6.5     10     142
Total:         16   28   9.8     26     157

Percentage of the requests served within a certain time (ms)
  50%     26
  66%     27
  75%     28
  80%     28
  90%     30
  95%     31
  98%     56
  99%     75
 100%    157 (longest request)


top - 05:18:20 up 12:14,  4 users,  load average: 1.96, 1.65, 1.38
Tasks:   5 total,   0 running,   5 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.0%us, 16.7%sy,  0.0%ni, 77.2%id,  0.3%wa,  0.0%hi,  3.9%si,  0.0%st
Mem:   4056556k total,  3267276k used,   789280k free,   295200k buffers
Swap:   916476k total,     2732k used,   913744k free,  1360740k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                  
20974 www-data  20   0 71608 2212  796 S   46  0.1   0:37.47 nginx                                                                                                                    
20971 www-data  20   0 71608 2212  796 S   17  0.1   0:33.82 nginx                                                                                                                    
20973 www-data  20   0 71608 2212  796 S   14  0.1   0:31.63 nginx                                                                                                                    
20972 www-data  20   0 71608 2212  796 S   13  0.1   0:28.03 nginx                                                                                                                    
20970 root      20   0 71288 1240  240 S    0  0.0   0:00.00 nginx

2. CppCMS (1.0.1)


$ ab -n 100000 -c 200 http://localhost:10000/hello
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests


Server Software:        CppCMS-Embedded/1.0.1
Server Hostname:        localhost
Server Port:            10000

Document Path:          /hello
Document Length:        147 bytes

Concurrency Level:      200
Time taken for tests:   14.724 seconds
Complete requests:      100000
Failed requests:        0
Write errors:           0
Total transferred:      28400000 bytes
HTML transferred:       14700000 bytes
Requests per second:    6791.69 [#/sec] (mean)
Time per request:       29.448 [ms] (mean)
Time per request:       0.147 [ms] (mean, across all concurrent requests)
Transfer rate:          1883.63 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        8   13   2.0     13      55
Processing:     5   16   2.5     15      57
Waiting:        3   12   2.4     11      51
Total:         17   29   2.9     29     108

Percentage of the requests served within a certain time (ms)
  50%     29
  66%     29
  75%     30
  80%     31
  90%     32
  95%     33
  98%     34
  99%     35
 100%    108 (longest request)


top - 05:14:50 up 12:11,  4 users,  load average: 1.84, 1.51, 1.28
Tasks:   2 total,   2 running,   0 sleeping,   0 stopped,   0 zombie
Cpu(s):  4.5%us, 26.5%sy,  0.0%ni, 63.6%id,  0.6%wa,  0.0%hi,  4.8%si,  0.0%st
Mem:   4056556k total,  3186752k used,   869804k free,   295088k buffers
Swap:   916476k total,     2732k used,   913744k free,  1284760k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                  
21182 rjoshi    20   0  701m  34m 4444 R  237  0.9  12:13.20 hello                                                                                                                    


3. RestExpress (0.7.1)

Java settings:
/usr/lib/jvm/jdk1.7.0/jre/bin/java -classpath /Downloads/kickstart/build/classes/main:/Downloads/kickstart/lib/DateAdapterJ-1.0.0.jar:/Downloads/kickstart/lib/RestExpress-0.7.1-build-40.jar:/home/rjoshi/Downloads/kickstart/lib/gson-1.6.jar:/Downloads/kickstart/lib/junit-4.4.jar:/home/rjoshi/Downloads/kickstart/lib/netty-3.2.5.Final.jar:/Downloads/kickstart/lib/xpp3_min-1.1.4c.jar:/Downloads/kickstart/lib/xstream-1.3.1.jar com.kickstart.Main

ab -n 100000 -c 200 http://localhost:8081/hello
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests


Server Software:      
Server Hostname:        localhost
Server Port:            8081

Document Path:          /hello
Document Length:        148 bytes

Concurrency Level:      200
Time taken for tests:   19.653 seconds
Complete requests:      100000
Failed requests:        0
Write errors:           0
Total transferred:      23300000 bytes
HTML transferred:       14800000 bytes
Requests per second:    5088.19 [#/sec] (mean)
Time per request:       39.307 [ms] (mean)
Time per request:       0.197 [ms] (mean, across all concurrent requests)
Transfer rate:          1157.76 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   24 178.9     15    3027
Processing:     1   15   9.4     16     390
Waiting:        1   12   9.2     12     389
Total:          1   39 180.5     31    3326

Percentage of the requests served within a certain time (ms)
  50%     31
  66%     33
  75%     34
  80%     35
  90%     39
  95%     43
  98%     45
  99%     48
 100%   3326 (longest request)



top - 05:31:32 up 12:27,  4 users,  load average: 1.92, 1.54, 1.39
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s):  5.5%us, 24.0%sy,  0.0%ni, 65.4%id,  0.2%wa,  0.0%hi,  5.0%si,  0.0%st
Mem:   4056556k total,  3748332k used,   308224k free,   297168k buffers
Swap:   916476k total,     2732k used,   913744k free,  1395984k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                  
10207 rjoshi    20   0 1355m 404m 9996 S  254 10.2   8:51.39 java

4. Playframework (2.0)

Config:
%production.application.mode=prod
logger.root=ERROR
logger.play=ERROR
logger.application=ERROR

Started using: play start

Java settings:

java -Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=384M -Dfile.encoding=UTF8 -Dplay.version=2.0 -Dsbt.ivy.home=/home/rjoshi/Downloads/play-2.0/framework/../repository -Dplay.home=/home/rjoshi/Downloads/play-2.0/framework -Dsbt.boot.properties=/home/rjoshi/Downloads/play-2.0/framework/sbt/sbt.boot.properties -jar /home/rjoshi/Downloads/play-2.0/framework/sbt/sbt-launch.jar start
rjoshi    3051  2983 99 13:35 pts/0    00:19:59 java -Dsbt.ivy.home=/home/rjoshi/Downloads/play-2.0/framework/../repository -Djava.runtime.name=Java(TM) SE Runtime Environment -Dsun.boot.library.path=/usr/lib/jvm/jdk1.7.0/jre/lib/amd64 -Djava.vm.version=21.0-b17 -Djava.vm.vendor=Oracle Corporation -Djava.vendor.url=http://java.oracle.com/ -Dpath.separator=: -Djava.vm.name=Java HotSpot(TM) 64-Bit Server VM -Dfile.encoding.pkg=sun.io -Duser.country=US -Dsun.java.launcher=SUN_STANDARD -Dsun.os.patch.level=unknown -Djava.vm.specification.name=Java Virtual Machine Specification -Duser.dir=/home/rjoshi/Downloads/play-2.0/samples/java/helloworld -Djava.runtime.version=1.7.0-b147 -Dsbt.boot.properties=/home/rjoshi/Downloads/play-2.0/framework/sbt/sbt.boot.properties -Djava.awt.graphicsenv=sun.awt.X11GraphicsEnvironment -Djava.endorsed.dirs=/usr/lib/jvm/jdk1.7.0/jre/lib/endorsed -Dos.arch=amd64 -Djava.io.tmpdir=/tmp -Dline.separator=? -Djava.vm.specification.vendor=Oracle Corporation -Dos.name=Linux -Dsun.jnu.encoding=UTF-8 -Djava.library.path=:/usr/local/lib:/usr/lib:/usr/share/lib:.:/lib://opt/arawat/lib:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib -Djava.specification.name=Java Platform API Specification -Djava.class.version=51.0 -Dplay.home=/home/rjoshi/Downloads/play-2.0/framework -Dsun.management.compiler=HotSpot 64-Bit Tiered Compilers -Dos.version=2.6.38-13-generic -Duser.home=/home/rjoshi -Duser.timezone=America/New_York -Djava.awt.printerjob=sun.print.PSPrinterJob -Dfile.encoding=UTF8 -Djava.specification.version=1.7 -Djava.class.path=/home/rjoshi/Downloads/play-2.0/framework/sbt/sbt-launch.jar -Duser.name=rjoshi -Dplay.version=2.0 -Djava.vm.specification.version=1.7 -Dsun.java.command=/home/rjoshi/Downloads/play-2.0/framework/sbt/sbt-launch.jar start -Djava.home=/usr/lib/jvm/jdk1.7.0/jre -Dsun.arch.data.model=64 -Duser.language=en -Djava.specification.vendor=Oracle Corporation -Dawt.toolkit=sun.awt.X11.XToolkit -Djava.vm.info=mixed mode 
-Djava.version=1.7.0 -Djava.ext.dirs=/usr/lib/jvm/jdk1.7.0/jre/lib/ext:/usr/java/packages/lib/ext -Dsun.boot.class.path=/usr/lib/jvm/jdk1.7.0/jre/lib/resources.jar:/usr/lib/jvm/jdk1.7.0/jre/lib/rt.jar:/usr/lib/jvm/jdk1.7.0/jre/lib/sunrsasign.jar:/usr/lib/jvm/jdk1.7.0/jre/lib/jsse.jar:/usr/lib/jvm/jdk1.7.0/jre/lib/jce.jar:/usr/lib/jvm/jdk1.7.0/jre/lib/charsets.jar:/usr/lib/jvm/jdk1.7.0/jre/classes -Djava.vendor=Oracle Corporation -Dfile.separator=/ -Djava.vendor.url.bug=http://bugreport.sun.com/bugreport/ -Dsun.io.unicode.encoding=UnicodeLittle -Dsun.cpu.endian=little -Dsun.desktop=gnome -Dsun.cpu.isalist= -Dhttp.port=9000 -cp /home/rjoshi/Downloads/play-2.0/samples/java/helloworld/target/scala-2.9.1/classes:/home/rjoshi/Downloads/play-2.0/framework/sbt/boot/scala-2.9.1/lib/scala-library.jar:/home/rjoshi/Downloads/play-2.0/repository/local/play/play_2.9.1/2.0/jars/play_2.9.1.jar:/home/rjoshi/Downloads/play-2.0/repository/local/play/templates_2.9.1/2.0/jars/templates_2.9.1.jar:/home/rjoshi/Downloads/play-2.0/repository/local/com.github.scala-incubator.io/scala-io-file_2.9.1/0.2.0/jars/scala-io-file_2.9.1.jar:/home/rjoshi/Downloads/play-2.0/repository/local/com.github.scala-incubator.io/scala-io-core_2.9.1/0.2.0/jars/scala-io-core_2.9.1.jar:/home/rjoshi/Downloads/play-2.0/repository/local/com.github.jsuereth.scala-arm/scala-arm_2.9.1/0.3/jars/scala-arm_2.9.1.jar:/home/rjoshi/Downloads/play-2.0/repository/local/play/anorm_2.9.1/2.0/jars/anorm_2.9.1.jar:/home/rjoshi/Downloads/play-2.0/repository/local/io.netty/netty/3.3.0.Final/bundles/netty.jar:/home/rjoshi/Downloads/play-2.0/repository/local/org.slf4j/slf4j-api/1.6.4/jars/slf4j-api.jar:/home/rjoshi/Downloads/play-2.0/repository/local/org.slf4j/jul-to-slf4j/1.6.4/jars/jul-to-slf4j.jar:/home/rjoshi/Downloads/play-2.0/repository/local/org.slf4j/jcl-over-slf4j/1.6.4/jars/jcl-over-slf4j.jar:/home/rjoshi/Downloads/play-2.0/repository/local/ch.qos.logback/logback-core/1.0.0/jars/logback-core.jar:/home/rjoshi/Downloads/play-2.
0/repository/local/ch.qos.logback/logback-classic/1.0.0/jars/logback-classic.jar:/home/rjoshi/Downloads/play-2.0/repository/local/com.typesafe.akka/akka-ac



rjoshi@ubuntu:~$ ab -n 100000 -c 200 http://localhost:9000/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests


Server Software:      
Server Hostname:        localhost
Server Port:            9000

Document Path:          /
Document Length:        147 bytes

Concurrency Level:      200
Time taken for tests:   20.896 seconds
Complete requests:      100000
Failed requests:        0
Write errors:           0
Total transferred:      22800000 bytes
HTML transferred:       14700000 bytes
Requests per second:    4785.52 [#/sec] (mean)
Time per request:       41.793 [ms] (mean)
Time per request:       0.209 [ms] (mean, across all concurrent requests)
Transfer rate:          1065.53 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   25 197.9     11    3032
Processing:     3   16   5.5     15     267
Waiting:        2   12   4.8     11     263
Total:          3   40 198.2     26    3060

Percentage of the requests served within a certain time (ms)
  50%     26
  66%     30
  75%     32
  80%     33
  90%     37
  95%     39
  98%     44
  99%     46
 100%   3060 (longest request)



top - 13:44:18 up 22 min,  4 users,  load average: 0.41, 0.67, 0.51
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s): 20.0%us, 38.3%sy,  0.0%ni, 37.8%id,  0.2%wa,  0.0%hi,  3.7%si,  0.0%st
Mem:   4056556k total,  2775780k used,  1280776k free,   276848k buffers
Swap:   916476k total,        0k used,   916476k free,   933780k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                                                    
 3051 rjoshi    20   0 1527m 408m  12m S  430 10.3  22:16.51 java



5. CppCMS with Nginx frontend

top - 05:52:23 up 12:48,  4 users,  load average: 1.60, 1.32, 1.36
Tasks:   6 total,   1 running,   5 sleeping,   0 stopped,   0 zombie
Cpu(s):  5.0%us, 25.8%sy,  0.0%ni, 65.6%id,  0.1%wa,  0.0%hi,  3.5%si,  0.0%st
Mem:   4056556k total,  3281736k used,   774820k free,   298500k buffers
Swap:   916476k total,     2732k used,   913744k free,  1429692k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
10425 rjoshi    20   0  568m 6700 4444 S  149  0.2   1:16.31 hello
10366 www-data  20   0 71608 2308  892 S   37  0.1   0:14.01 nginx
10367 www-data  20   0 71696 2380  892 R   31  0.1   0:15.32 nginx
10368 www-data  20   0 71608 2308  892 S   29  0.1   0:09.18 nginx
10365 www-data  20   0 71608 2312  896 S    1  0.1   0:11.01 nginx
10364 root      20   0 71288 1244  244 S    0  0.0   0:00.00 nginx

$ ab -n 100000 -c 200 http://localhost:9999/hello
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests


Server Software:        nginx/0.8.54
Server Hostname:        localhost
Server Port:            9999

Document Path:          /hello
Document Length:        147 bytes

Concurrency Level:      200
Time taken for tests:   14.672 seconds
Complete requests:      100000
Failed requests:        18
   (Connect: 0, Receive: 0, Length: 18, Exceptions: 0)
Write errors:           0
Non-2xx responses:      18
Total transferred:      31200234 bytes
HTML transferred:       14700468 bytes
Requests per second:    6815.54 [#/sec] (mean)
Time per request:       29.345 [ms] (mean)
Time per request:       0.147 [ms] (mean, across all concurrent requests)
Transfer rate:          2076.62 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        6   12   3.0     12     107
Processing:     3   17   2.8     17      55
Waiting:        3   13   2.7     14      54
Total:         17   29   3.8     29     131

Percentage of the requests served within a certain time (ms)
  50%     29
  66%     30
  75%     31
  80%     31
  90%     32
  95%     33
  98%     34
  99%     35
 100%    131 (longest request)
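
The derived figures in the report above follow from three raw numbers: completed requests, concurrency level, and elapsed time. Here is a minimal sketch of that arithmetic in Python, using the values from this CppCMS run. The variable names are my own, and since ab prints the elapsed time rounded to milliseconds, the results match the report only approximately.

```python
# Figures copied from the ab run above (CppCMS behind Nginx).
complete_requests = 100_000
concurrency = 200
time_taken_s = 14.672        # ab prints this rounded to milliseconds
total_bytes = 31_200_234     # "Total transferred" (headers plus body)

# Throughput: completed requests divided by wall-clock time.
requests_per_second = complete_requests / time_taken_s

# Mean time per request as seen by one of the 200 concurrent clients.
time_per_request_ms = concurrency * time_taken_s * 1000 / complete_requests

# Mean time per request across all concurrent clients (inverse of throughput).
time_per_request_all_ms = time_taken_s * 1000 / complete_requests

# Transfer rate in Kbytes/sec, where 1 Kbyte = 1024 bytes.
transfer_rate_kb_s = total_bytes / 1024 / time_taken_s

print(f"Requests per second: {requests_per_second:.2f}")
print(f"Time per request:    {time_per_request_ms:.3f} ms")
print(f"Time per request:    {time_per_request_all_ms:.3f} ms (across all)")
print(f"Transfer rate:       {transfer_rate_kb_s:.2f} Kbytes/sec")
```

The small differences from the reported 6815.54 req/s and 2076.62 Kbytes/sec come from the rounding of the elapsed time.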

6. Play Framework with Nginx frontend

Configuration:
%production.application.mode=prod
logger.root=ERROR
logger.play=ERROR
logger.application=ERROR


rjoshi@ubuntu:~$ ab -n 100000 -c 200 http://localhost:9999/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests


Server Software:        nginx/0.8.54
Server Hostname:        localhost
Server Port:            9999

Document Path:          /
Document Length:        147 bytes

Concurrency Level:      200
Time taken for tests:   26.810 seconds
Complete requests:      100000
Failed requests:        0
Write errors:           0
Total transferred:      30600000 bytes
HTML transferred:       14700000 bytes
Requests per second:    3729.90 [#/sec] (mean)
Time per request:       53.621 [ms] (mean)
Time per request:       0.268 [ms] (mean, across all concurrent requests)
Transfer rate:          1114.60 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    5  53.1      4    3025
Processing:     3   44 297.0     14    3369
Waiting:        3   43 297.0     12    3368
Total:          3   50 301.6     18    3379

Percentage of the requests served within a certain time (ms)
  50%     18
  66%     22
  75%     24
  80%     25
  90%     29
  95%     32
  98%     38
  99%    284
 100%   3379 (longest request)



top - 13:50:05 up 28 min,  5 users,  load average: 0.32, 0.32, 0.39
Tasks:   6 total,   2 running,   4 sleeping,   0 stopped,   0 zombie
Cpu(s): 18.7%us, 46.5%sy,  0.0%ni, 27.9%id,  0.0%wa,  0.0%hi,  6.9%si,  0.0%st
Mem:   4056556k total,  2817740k used,  1238816k free,   277404k buffers
Swap:   916476k total,        0k used,   916476k free,   944256k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3051 rjoshi    20   0 1527m 422m  12m S  381 10.7  24:22.37 java
 5561 www-data  20   0 72152 2760  796 R   45  0.1   0:06.70 nginx
 5559 www-data  20   0 72664 3004  796 S   39  0.1   0:04.60 nginx
 5560 www-data  20   0 72636 3244  796 S   28  0.1   0:09.14 nginx
 5558 www-data  20   0 71608 2284  864 R   26  0.1   0:07.15 nginx
 5557 root      20   0 71288 1248  244 S    0  0.0   0:00.00 nginx

7. RestExpress with Nginx frontend

$ ab -n 100000 -c 200 http://localhost:9999/hello
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests


Server Software:        nginx/0.8.54
Server Hostname:        localhost
Server Port:            9999

Document Path:          /hello
Document Length:        148 bytes

Concurrency Level:      200
Time taken for tests:   23.640 seconds
Complete requests:      100000
Failed requests:        0
Write errors:           0
Total transferred:      29200000 bytes
HTML transferred:       14800000 bytes
Requests per second:    4230.05 [#/sec] (mean)
Time per request:       47.281 [ms] (mean)
Time per request:       0.236 [ms] (mean, across all concurrent requests)
Transfer rate:          1206.23 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    6 103.7      1    3014
Processing:     3   36 310.7     13    9320
Waiting:        2   35 310.7     13    9320
Total:          3   42 328.2     16    9320

Percentage of the requests served within a certain time (ms)
  50%     16
  66%     20
  75%     22
  80%     24
  90%     28
  95%     29
  98%     31
  99%     40
 100%   9320 (longest request)
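
Copying numbers out of these reports by hand gets tedious when comparing several runs. As a rough sketch, the headline values can be pulled out with a few regular expressions. The function name `parse_ab_summary` and the field names are hypothetical, and the sample text is a trimmed copy of the RestExpress report above.

```python
import re

# A trimmed copy of the ab summary above; only these lines are parsed.
AB_OUTPUT = """\
Concurrency Level:      200
Time taken for tests:   23.640 seconds
Complete requests:      100000
Failed requests:        0
Requests per second:    4230.05 [#/sec] (mean)
"""

# Hypothetical field names mapped to the labels that ab prints.
FIELDS = {
    "concurrency": r"Concurrency Level:\s+(\d+)",
    "seconds": r"Time taken for tests:\s+([\d.]+) seconds",
    "complete": r"Complete requests:\s+(\d+)",
    "failed": r"Failed requests:\s+(\d+)",
    "rps": r"Requests per second:\s+([\d.]+)",
}

def parse_ab_summary(text):
    """Return the headline numbers of an ab report as floats."""
    result = {}
    for name, pattern in FIELDS.items():
        match = re.search(pattern, text)
        if match:  # skip labels that are missing from the report
            result[name] = float(match.group(1))
    return result

summary = parse_ab_summary(AB_OUTPUT)
print(summary)
```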

8. G-WAN (version 3.3.28, 64-bit, built Mar 28 2012 11:24:16)

$ ab -n 100000 -c 200 http://localhost:9090/?hello
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests


Server Software:        G-WAN
Server Hostname:        localhost
Server Port:            9090

Document Path:          /?hello
Document Length:        146 bytes

Concurrency Level:      200
Time taken for tests:   25.348 seconds
Complete requests:      100000
Failed requests:        2843
   (Connect: 0, Receive: 0, Length: 1423, Exceptions: 1420)
Write errors:           0
Total transferred:      41205186 bytes
HTML transferred:       14392242 bytes
Requests per second:    3945.03 [#/sec] (mean)
Time per request:       50.697 [ms] (mean)
Time per request:       0.253 [ms] (mean, across all concurrent requests)
Transfer rate:          1587.46 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       10   25   4.5     25      69
Processing:     6   26   4.8     26      71
Waiting:        0   18   4.6     18      71
Total:         25   50   3.7     50      96

Percentage of the requests served within a certain time (ms)
  50%     50
  66%     52
  75%     52
  80%     53
  90%     54
  95%     56
  98%     58
  99%     61
 100%     96 (longest request)

top - 10:52:11 up 17:48,  4 users,  load average: 2.17, 1.53, 1.30
Tasks:   2 total,   0 running,   2 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.0%us, 24.5%sy,  0.0%ni, 64.9%id,  0.2%wa,  0.0%hi,  7.4%si,  0.0%st
Mem:   4056556k total,  3577880k used,   478676k free,   405480k buffers
Swap:   916476k total,     2640k used,   913836k free,  1642436k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8161 root      RT   0 1662m  30m 7704 S  360  0.8   2:06.54 gwan
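
The runs in sections 5 to 8 are easier to compare side by side, especially since two of them report failed requests. The sketch below ranks them by requests per second after subtracting the failures. It treats every request ab flagged as failed as unsuccessful, which is a pessimistic assumption of mine: ab also counts a response whose length differs from the first one as a failure. The numbers are copied from the reports above, and the labels are my own shorthand.

```python
# (label, complete requests, failed requests, seconds, reported req/s)
runs = [
    ("CppCMS + Nginx", 100_000, 18, 14.672, 6815.54),
    ("Play + Nginx", 100_000, 0, 26.810, 3729.90),
    ("RestExpress + Nginx", 100_000, 0, 23.640, 4230.05),
    ("G-WAN", 100_000, 2843, 25.348, 3945.03),
]

def successful_rps(complete, failed, seconds):
    """Requests per second, counting only requests ab did not flag as failed."""
    return (complete - failed) / seconds

# Highest successful throughput first.
ranked = sorted(
    runs,
    key=lambda run: successful_rps(run[1], run[2], run[3]),
    reverse=True,
)

for label, complete, failed, seconds, reported in ranked:
    ok = successful_rps(complete, failed, seconds)
    print(f"{label:20s} reported {reported:8.2f}  successful {ok:8.2f}  failed {failed}")
```

Under this assumption G-WAN falls from about 3945 to about 3833 requests per second, which still puts it ahead of Play behind Nginx and behind RestExpress.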