Let me explain the scenario. We have n (tens, hundreds or thousands of) Akka actors listening to a queue on a RabbitMQ server. The scenario looks something like this:
As you can see, each actor takes a message from the queue, invokes a plugin to do the processing, and returns the result to the result queue.
Now the fun happens when I execute the same code on my machine and on my competitor's (Meetu's) machine.
Meetu has a MacBook Pro – 2 cores – 2.3 GHz Intel Core i5 – RAM 4 GB – 64 bit
Mine is a Dell Vostro – 4 cores – 2.4 GHz Intel Core i5 – RAM 3 GB – 32 bit
And here are the results
We start with 10 actors getting 100 messages and go all the way to 1,000 actors getting 10,000 messages. Everything in the code is the same; only the hardware on which the tests are executed differs. I start off badly: on my machine 10 actors process their messages in 5.298 s, compared to 3.039 s on Meetu's Mac. This prompts me to skip to the other end of the spectrum, where I feel I can show him the power of my 4 cores. I mistakenly assume that with few messages and actors the 4 cores mostly add overhead, and that they will perform well under the heavier load.
Sadly, with 1,000 actors and 10,000 messages, the tiny Mac with 2 cores outperforms the hulking Dell with 4 cores by a margin of more than 10 s.
What could be going wrong? There is no other heavy processing happening, on either machine, when we are running the tests. I doubt 64 bit vs 32 bit would cause such an upset. What else?
Another data point: my Dell is running Ubuntu 11.04.
What does your akka.conf look like? Which version are you using?
Viktor, Akka version is 1.1.3 and we are using the default configuration for now (akka-reference), nothing overridden.
If your core-count is correct (including things like HT) then you’re using different settings on your machines (factor): https://github.com/jboner/akka/blob/release-1.1.3/config/akka-reference.conf#L43
2 cores for your mac = 2 * 1.0 == 2 threads in default dispatcher on OSX
4 cores for your Dell = 4 * 1.0 == 4 threads in the default dispatcher on Linux
Also, are you using exactly the same JVM options for both runs (-server, -Xmx/-Xms, etc.)?
Of course this is if you’re not using the “GlobalExecutorBasedEventDriven” or if you’re using your own dispatcher.
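To make the arithmetic above concrete, here is a minimal Java sketch of how a dispatcher thread count could be derived from the core count and a factor. The class and method names (PoolSizeSketch, poolSize) are hypothetical and not part of Akka, and rounding up with ceil is an assumption about how the factor is applied, not a confirmed detail of the 1.1.3 implementation.

```java
// Hypothetical helper, not Akka code: derives a thread count from
// a core count and a pool-size factor.
class PoolSizeSketch {

    // Assumption: the factor is multiplied by the core count and rounded up.
    static int poolSize(int cores, double factor) {
        return (int) Math.ceil(cores * factor);
    }

    public static void main(String[] args) {
        double factor = 1.0; // the factor value quoted in the comment above

        // What the JVM reports on the machine running this sketch.
        int cores = Runtime.getRuntime().availableProcessors();

        System.out.println("cores reported by JVM = " + cores);
        System.out.println("threads for this machine = " + poolSize(cores, factor));
        System.out.println("threads for 2 cores = " + poolSize(2, factor));
        System.out.println("threads for 4 cores = " + poolSize(4, factor));
    }
}
```

Running this on both machines shows whether the two JVMs would really end up with different thread counts under the same factor.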
Viktor, sorry for the late reply, was out of action for a few days.
The core count is correct and there is no HT. What you suggest is also correct: we do end up using different settings on our machines in terms of the number of threads. But would having 4 threads on my Linux Dell make it slower than 2 threads on OS X?
By the way, I just figured out that all MacBook Pros come with HT, so ideally Runtime.getRuntime.availableProcessors would return 4 on both machines.
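One way to settle what each JVM actually sees is to print it on both machines and compare. The following is only a rough diagnostic sketch; the class name JvmSnapshot is made up for illustration, and the set of properties printed is a guess at what might be relevant here (processor count, VM name, heap limit, JVM arguments).

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical diagnostic, not part of the benchmark code.
class JvmSnapshot {

    // Number of processors the JVM reports (logical cores, so HT counts).
    static int cores() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Arguments passed to the JVM itself, e.g. -server or -Xmx settings.
    static List<String> jvmArgs() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("availableProcessors = " + cores());
        System.out.println("os.name = " + System.getProperty("os.name"));
        System.out.println("os.arch = " + System.getProperty("os.arch"));
        System.out.println("java.vm.name = " + System.getProperty("java.vm.name"));
        System.out.println("max heap (MB) = "
                + Runtime.getRuntime().maxMemory() / (1024 * 1024));
        System.out.println("JVM args = " + jvmArgs());
    }
}
```

If the two outputs differ in anything other than the operating system, that difference is a candidate explanation worth ruling out before blaming the core count.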