====== Performance and Capacity Testing for V3.0 ======
===== Summary =====
As part of a research project into [[http://janakj.org/papers/green_voip.pdf|energy efficiency of VoIP systems]], we performed a number
of performance tests of a SIP-Router based SIP server. Our goal was to figure out the
maximum number of User Agents (UA) that can be handled by a single SIP server with a
fully-featured configuration and with signaling traffic that is similar to what ITSPs
would see in the public Internet.
To generate realistic traffic patterns, we surveyed three major European ITSPs. We created a
model of signaling traffic based on the data obtained from those ITSPs and set up a test
bed to generate traffic with similar patterns for a variable number of user agents.
We found that a single full-featured SIP server can sustain the signaling traffic generated
by 0.5 million subscribers. The aggregate volume of signaling traffic generated by the load
generators during the test was 210 Mbit/s. We also learned that the SIP server consumes 4.5kB
of memory per TCP connection and that OpenSSL consumes an additional 61kB of memory for each TLS
connection. That makes OpenSSL memory consumption a major bottleneck in TLS-based setups.
We ran all the tests described below in February 2010 with a SIP-Router snapshot very similar
to the source code released as version **3.0.0**. The results are applicable to both SER and
Kamailio SIP server series **v3.0.x**.
Jan Janak and Salman Baset created the testing scenarios, designed the test bed, administered
the tests, and compiled this report.
Daniel-Constantin Mierla contributed the section "Enhancements in v3.1.x", which describes
possible performance improvements implemented in later releases.
===== Goals =====
We wanted to estimate the size of the subscriber population that a single SIP server
with a fully-featured configuration file can handle. We were not interested
in performing aggressive optimizations or simplifying the configuration file.
Our goal was to measure the performance of our SIP server in a default,
out-of-the-box configuration. In particular, we wanted to use all the features
a modern Internet Telephony Service Provider (ITSP) would need in the
public Internet.
We wanted to determine the maximum number of User Agents (UA) the SIP server
can support on a single machine. All the tests were performed as part of an effort
to estimate the power consumption of VoIP-based services. Although we primarily
focused on UDP as the transport protocol for signaling, we also performed a number of
simpler tests for TCP and TLS with the goal of determining the bottlenecks in those
scenarios.
===== Testbed Overview =====
The testbed consisted of 8 PCs connected with gigabit Ethernet. One
machine ran the SIP server and the rest of the machines were used
as SIP load generators. All of the load generator machines were connected
to the SIP server machine via gigabit Ethernet. The SIP server machine
had two gigabit Ethernet cards and the traffic generated by the load
generators was evenly split between the two Ethernet segments.
The SIP server machine contained two Intel Xeon CPUs clocked at 2.33GHz.
Each CPU contained 4 cores. The machine had 4GB of memory and two Intel
82545GM Gigabit Ethernet controllers. The operating system was Debian
Squeeze with the Linux kernel 2.6.32. The Linux kernel had been compiled
with Physical Address Extensions (PAE) enabled.
All load generators ran Debian Squeeze booted off a live CD. We used sipp
version 3.1.r590-1, installed from a Debian package, to generate SIP
traffic. Two load generators were 1U HP servers with single-core Intel Xeon CPUs
running at 3.06GHz. The rest of
the load generators were off-the-shelf desktop PCs.
To make sure that the testbed itself would not become a bottleneck, we first tried
to establish the maximum number of calls per second (CPS) sipp can generate
on our load generators, i.e., without a proxy server in between. We were
able to generate and process about 5000 CPS with sipp in our testbed, which
corresponds to about 97 Mbit/s of network traffic. A single sipp process cannot
handle this amount of traffic, so we started 5 sipp processes
(5x UAS and 5x UAC) and let each process generate or process only 1000 CPS.
At a higher number of CPS per process, sipp started dropping calls.
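As an illustration of how the calibration runs were started, the sketch below launches five UAS/UAC sipp pairs at 1000 CPS each using sipp's built-in scenarios. The IP address and port numbers are placeholders, not the values used in our testbed, and only standard sipp options (-sn, -r, -p and -bg) are assumed.

<code python>
#!/usr/bin/env python
# Illustrative only: start 5 sipp UAS processes and 5 sipp UAC processes,
# each UAC generating 1000 CPS against its own UAS instance (no proxy in between).
import subprocess

UAS_HOST = "192.0.2.10"   # placeholder address of the machine running the UAS instances
BASE_PORT = 6000          # placeholder base port
PAIRS = 5
RATE = 1000               # calls per second per sipp process

for i in range(PAIRS):
    port = BASE_PORT + i
    # Server side: built-in UAS scenario listening on its own port, in background mode.
    subprocess.check_call(["sipp", "-bg", "-sn", "uas", "-p", str(port)])
    # Client side: built-in UAC scenario sending RATE calls per second to that UAS.
    subprocess.check_call(["sipp", "-bg", "-sn", "uac", "%s:%d" % (UAS_HOST, port),
                           "-r", str(RATE), "-p", str(port + 1000)])
</code>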
===== SIP Server Configuration =====
The SIP server was based on the publicly available source code from the
[[http://git.sip-router.org|repository]] of the [[http://sip-router.org|sip-router project]].
We retrieved a snapshot of the source code on 3rd February 2010.
Our SIP server implementation is multi-process based and we configured the
server to create 16 processes to handle the SIP traffic. We configured the SIP
server to use at most 2 GB of memory (this memory is mainly used for SIP
transactions and a cache of user location database records).
All the data related to subscribers was stored in a MySQL database. The MySQL
database ran on the same machine as the SIP server. We used MySQL version
5.1.41 from a Debian package and configured the MySQL server to use 2 GB of
memory for the query cache. In a previous performance test we learned that the
query cache can have a profound effect on the SIP server's performance because the
server repeatedly emits simple, read-only, easy-to-cache queries.
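To monitor how well the query cache is being used while the server is under load, a small script along the following lines can poll the MySQL status counters. The status variable names and mysql command-line flags are standard for MySQL 5.1; the credentials are placeholders.

<code python>
#!/usr/bin/env python
# Illustrative sketch: print MySQL query cache counters so the cache hit rate
# can be watched during a test run. The user name and password are placeholders.
import subprocess

out = subprocess.check_output(
    ["mysql", "-u", "monitor", "-psecret", "-N", "-B",
     "-e", "SHOW GLOBAL STATUS LIKE 'Qcache%'"])

stats = dict(line.split("\t") for line in out.decode().splitlines())
hits = float(stats.get("Qcache_hits", 0))
inserts = float(stats.get("Qcache_inserts", 0))
if hits + inserts > 0:
    print("query cache hit rate: %.1f%%" % (100.0 * hits / (hits + inserts)))
print("queries currently cached: %s" % stats.get("Qcache_queries_in_cache", "n/a"))
</code>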
We provisioned the system with data for one million subscribers. The data
includes user names and passwords for digest authentication, SIP URIs users can
use in SIP messages, and various other configuration-related data. The size of
all the data on the local hard disk was about 1.7 GB; because that is less
than the size of the query cache, all the data can fit into the cache.
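To give an idea of what provisioning one million test subscribers involves, the sketch below generates digest-authentication credentials, computing HA1 = MD5(username:realm:password) as defined by RFC 2617. The realm, the user naming scheme and the subscriber table columns are assumptions made for illustration, not the actual schema of our setup.

<code python>
#!/usr/bin/env python
# Illustrative sketch: emit SQL that provisions test subscribers with digest
# credentials. HA1 = MD5(username:realm:password), RFC 2617. The table and
# column names below are assumptions; adjust them to the real database schema.
import hashlib

REALM = "example.org"   # placeholder SIP domain used as the digest realm

def ha1(username, realm, password):
    return hashlib.md5(("%s:%s:%s" % (username, realm, password)).encode()).hexdigest()

for i in range(1000000):
    user = "user%07d" % i
    password = "pw%07d" % i     # test passwords only
    print("INSERT INTO subscriber (username, domain, password, ha1) "
          "VALUES ('%s', '%s', '%s', '%s');" % (user, REALM, password,
                                                ha1(user, REALM, password)))
</code>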
We also installed the RTP relay known as rtpproxy on the same server. Although
we did not generate RTP traffic in our tests, we wanted to account for the
communication overhead of creating and destroying RTP relaying sessions between
the SIP server and rtpproxy. We used rtpproxy 1.2.1 installed from a Debian
package.
For our tests we configured the SIP server with the most advanced
configuration file which is [[http://git.sip-router.org/cgi-bin/gitweb.cgi?p=sip-router;a=blob_plain;f=etc/sip-router-oob.cfg;hb=HEAD|available]] from the source code repository.
We believe that the configuration file implements all the features typically
implemented by ITSPs operating in the public Internet. In particular the following
features were important for our tests:
* Digest authentication of all REGISTER and INVITE messages.
* User location database look-ups for incoming INVITEs.
* NAT traversal detection.
* Support for NAT-binding keep-alives.
* The possibility to relay calls through an RTP relay (rtpproxy).
With this configuration file, NOTIFY messages generated by user agents to keep
NAT bindings open were answered statelessly. REGISTER messages were
authenticated and also answered statelessly. The processing of INVITE, ACK,
and BYE requests was always transaction stateful and all INVITE requests
originating from one of the local subscribers were subject to digest
authentication.
The SIP server performed NAT detection of incoming SIP messages by inspecting
the source IP address of UDP datagrams, the IP addresses in Via and Contact headers, and the
IP address in SDP bodies. The IP address in the SDP was rewritten if the server
detected that an RTP relay was needed.
The SIP server inserted Record-Route headers into all SIP messages. This made all
in-dialog requests, such as end-to-end ACKs and BYEs, always pass through the SIP
server.
===== SIP Traffic =====
We wanted to generate SIP traffic patterns similar to those found in existing
real-world ITSP setups. Therefore, we collected traffic statistics from three
European ITSPs and created our traffic model based on that data. The sizes of
their subscriber populations ranged from 100k to the low millions. We designed our test
scenarios based on the information about the actual signaling traffic from one
of the ITSPs. Our load generators emulated a UA population of a given size and
for each UA generated:
* One NOTIFY request every 15 seconds. The server sends a 200 OK response back. The purpose of the request is to keep UDP bindings in NATs open. We observed that the ITSP used this technique because it was the most reliable NAT keep-alive method, even though it generates a lot of network traffic.
* One registration refresh per 50 minutes. This included two REGISTER messages and two responses because the SIP server would challenge the first request with digest authentication.
* Call setups with the following SIP message sequence:
INVITE-407-ACK-INVITE-100-180-200-ACK-BYE-200.
During the test we kept increasing the rate of call setups as long as the system
remained stable, with only occasional retransmissions and no dropped calls.
We tried to generate the traffic for 0.5 million users. This resulted in 33k
NOTIFY requests per second, 166 registration refreshes per second and 75 call
setups per second.
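The NOTIFY and registration rates above follow directly from the per-UA traffic model; the short calculation below reproduces them for an arbitrary population size (the call setup rate came from the ITSP survey and was scaled separately).

<code python>
# Derive aggregate NOTIFY and registration-refresh rates from the per-UA model.
def signaling_rates(subscribers):
    notify_per_sec = subscribers / 15.0           # one NOTIFY every 15 seconds per UA
    register_per_sec = subscribers / (50 * 60.0)  # one registration refresh every 50 minutes
    return notify_per_sec, register_per_sec

print(signaling_rates(500000))   # roughly 33k NOTIFYs/s and 166 registration refreshes/s
print(signaling_rates(1000000))  # roughly 66k NOTIFYs/s and ~330 registration refreshes/s
</code>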
===== Test Scenarios =====
Our first goal was to see if the server could support 0.5 million users. Before
running the main test, we populated the user location database with contacts
for all the 0.5 million users. We did that by sending REGISTER requests for
each of the users at a high rate. After that we slowed the registration rate
down to 166 updates per second (each update generated two REGISTER messages,
one with authentication and one without). In addition to populating the user
location database with contacts, this initial test also fetched data into the
memory query cache in MySQL.
UAC Registrar
| REGISTER |
|--------------------->|
| 401 |
|<---------------------|
| REGISTER w/ digest |
|--------------------->|
| 200 OK |
|<---------------------|
Next, we added NOTIFY requests for NAT keep-alives. A population of 0.5 million
users, where each user sends a NOTIFY request every 15 seconds, generates 33k NOTIFY
requests per second. We needed to start a number of sipp processes on multiple
machines to achieve that rate.
UAC Proxy
| NOTIFY |
|---------------->|
| 200 OK |
|<----------------|
Finally, we added INVITE-ACK-BYE transactions to the mix. From the ITSP survey
we know that 100k subscribers generate 20 INVITE transactions per second during busy hours.
Therefore, we started additional sipp instances and configured them to generate
100 calls per second. We also configured sipp to wait for 4 seconds before
sending the final 200 OK for an INVITE. The following diagram shows the call
flow of this scenario.
UAC Proxy UAS
| INVITE | |
|--------------------->| |
| 407 | |
|<---------------------| |
| | |
| INVITE w/ digest | |
|--------------------->| |
| 100 Trying | |
|<---------------------| INVITE |
| |-------------------->|
| | 180 Ringing |
| 180 Ringing |<--------------------|
|<---------------------| |
(Pause for 4s)
| | 200 OK |
| 200 OK |<--------------------|
|<---------------------| |
| ACK | |
|--------------------->| ACK |
| |-------------------->|
| BYE | |
|--------------------->| BYE |
| |-------------------->|
| | 200 OK |
| 200 OK |<--------------------|
|<---------------------| |
===== Results =====
We found that the SIP server can handle the signaling traffic generated by
0.5 million user agents without any problems. During the test, all CPU cores
were utilized to less than 10%. The signaling traffic generated by all sipp
instances on both network interfaces of the server was about 210 Mbit/s
(as measured by iftop). The MySQL server consumed the most CPU time and under
this load the whole server consumed about 210W. Because the load of the
SIP server was low, we repeated the test and tried to simulate 1 million
subscribers.
For 1 million subscribers we needed to generate 66k NOTIFY requests per second,
332 registration refreshes per second and 200 calls per second. Unfortunately
our sipp instances could not generate 66k NOTIFY requests per second and we had
no other machines we could use as additional load generators.
To determine how much CPU load NOTIFY requests alone would generate, we stopped
all other sipp instances and kept only those that were generating NOTIFY requests.
The CPU load in this scenario was about 1-2% per CPU core. The amount of network
traffic generated by the keep-alives alone was about 200 Mbit/s.
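As a rough cross-check, 200 Mbit/s is consistent with the expected on-wire size of a compact NOTIFY/200 OK exchange; the back-of-the-envelope calculation below illustrates this (the per-message size is an implied estimate, not a measured value).

<code python>
# Back-of-the-envelope check of the keep-alive traffic volume.
# 0.5 million UAs, one NOTIFY every 15 seconds, each answered with a 200 OK.
notify_per_sec = 500000 / 15.0          # ~33.3k requests per second
messages_per_sec = 2 * notify_per_sec   # request + response

measured_bits_per_sec = 200e6           # ~200 Mbit/s as observed with iftop
bytes_per_message = measured_bits_per_sec / 8 / messages_per_sec
print("implied average on-wire size: %.0f bytes per message" % bytes_per_message)
# -> roughly 375 bytes, a plausible size for a compact NOTIFY or 200 OK
#    once UDP/IP overhead is included.
</code>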
Finally, we stopped all the sipp instances generating NOTIFYs and repeated the
registration and call setup tests at a rate that would be generated by 1 million
users, i.e., 332 registration refreshes per second and 200 calls per second. This
amount of traffic generated from 10% to 20% of load per CPU core. The busiest
processes were again those of MySQL. During this test (without NOTIFYs) the server
consumed 190W. The load generators produced about 25.5 Mbit/s of signaling traffic.
During the test there were 42000 active transactions on the SIP server.
Even during this test the system was stable, with no dropped calls and a negligible
number of retransmissions.
===== Conclusions =====
We were able to simulate all the signaling traffic generated by a population of
0.5 million user agents and verified that a single SIP server can handle that
amount of traffic.
Because of the low load on the SIP server, we repeated the test for 1 million
user agents. Although we didn't have enough load generators to generate NOTIFY
messages for 1 million users, we learned that the load generated by NOTIFY
requests alone is very low. Furthermore, we verified that a single SIP server
can handle the user location updates and call setups generated by 1 million users.
===== Short-Lived TLS Connections =====
The goal of this test was to stress-test the TLS connection establishment phase
and see the impact of the TLS handshakes on CPU utilization. Therefore, we
configured sipp to create a new TLS connection for each registration and for
each new call. Below are some preliminary numbers. The testing scenario was the
same as in the previous tests (333 registration refreshes per second, 150 CPS,
1000000 users), except the following:
* There were no NOTIFY keep-alives (they are not needed in this scenario).
* All traffic was encrypted with TLS.
All averages were calculated from 5 consecutive measurements. With the following
numbers the system was stable and could run for hours:
* **Maximum CPU utilization**: 800% (2 CPUs, 4 cores per CPU)
* **Average CPU utilization by MySQL**: 27% (as reported by top)
* **Average CPU utilization by SIP server processes**: 176% (as reported by top)
* **Average number of established TLS connections**: 617 (as reported by SIP server and verified with netstat)
* **Average number of SIP transactions**: 3605 (as reported by sercmd tm.stats)
* **Call rate**: 150 CPS (inv-407-ack-inv-180-4_sec_pause-200-ack-bye-200)
* **Maximum call rate**: 200 CPS
* **Registered contacts**: 1 million
* **OpenSSL compression**: disabled
* **TLS version**: TLSv1
* **Certificate verification**: disabled everywhere
* **SIP server processes**: 16
* **Maximum power consumption under load**: 209W
* **Traffic volume with encryption**: 27 Mbit/s (TX+RX)
* **SIP server shared memory**: 2048MB
* **Average TLS connection setup rate**: 478 new TLS connections per second
We could increase the call rate all the way up to 200 CPS. Above that rate the sipp
instances could not keep up and the whole system became unstable: sipp started
generating traffic spikes (as a result of the variable call rate) and the spikes
would eventually overload the server.
We also determined that the SIP server needs 61 kB of memory per TLS connection.
On a 32-bit machine with 4GB of memory and with 2.5GB reserved for the SIP server, the
server could support no more than about 43k simultaneous TLS connections.
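The 43k figure is simple arithmetic on the measured per-connection memory cost. The snippet below reproduces it and, for illustration, also applies Little's law to the measured connection setup rate and the average number of established connections from the list above.

<code python>
# Memory headroom for TLS connections at 61 kB per connection.
per_conn_kb = 61
reserved_kb = 2.5 * 1024 * 1024               # 2.5 GB reserved for the SIP server
print("max TLS connections: %d" % (reserved_kb / per_conn_kb))   # -> ~43k

# Little's law: established connections = setup rate * average lifetime.
established = 617.0       # average established TLS connections (measured)
setup_rate = 478.0        # new TLS connections per second (measured)
print("implied average connection lifetime: %.1f s" % (established / setup_rate))
# -> about 1.3 seconds per short-lived connection
</code>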
===== TCP Connections =====
To stress test SIP over TCP, we configured sipp to send all signaling traffic over TCP and ran
a series of very simple TCP tests. The load-generators generated 332 registration refreshes
per second and all messages were sent over TCP. The CPU load generated by the SIP server was
from 6% to 8%. With 80k permanent TCP connections, the SIP server could still handle at least
1000 requests per second and a connection arrival rate of 1000 new connections per second
(20k new connections were added in total).
The SIP server consumed about 4.5 kB of memory per TCP connection. That is in addition to any
memory consumed by the kernel and the OS for those connections.
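For comparison with the TLS case, the measured per-connection cost implies that the 80k permanent connections account for only a few hundred megabytes of the SIP server's memory. The quick calculation below also gives an upper bound on the number of connections that would fit into the 2 GB of shared memory configured earlier; this bound is illustrative only and ignores all other memory consumers.

<code python>
# Rough TCP connection memory footprint at 4.5 kB per connection
# (SIP server memory only, excluding kernel socket buffers).
per_conn_kb = 4.5
connections = 80000
print("80k connections use about %.0f MB" % (connections * per_conn_kb / 1024))  # ~352 MB

shared_mem_kb = 2048 * 1024   # the 2 GB of shared memory configured for the SIP server
print("memory-only upper bound: ~%dk connections" % (shared_mem_kb / per_conn_kb / 1000))
# -> roughly 466k connections, ignoring transactions, the user location cache, etc.
</code>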
===== Enhancements in v3.1.x =====
We used a source code snapshot made on 3rd February 2010 for all the tests
described on this page. The source code in that snapshot is practically
identical to the source code included in release 3.0.0, the first
release that integrated both the SER and Kamailio SIP servers in the same application.
A new major version was released in October 2010. That release includes
improvements that could increase the performance of the SIP server
even further, such as:
* **Support for raw UDP sockets in Linux**. A 30% increase in performance of the SIP server over UDP was reported on the mailing lists in some scenarios.
* **New options for memory tuning of OpenSSL**. The TLS tests revealed that OpenSSL's memory consumption can be a major bottleneck. New options implemented in the tls module may be used to bring the memory consumption down somewhat.
* **Use of asynchronous API for TLS connections**. This feature could potentially increase the overall processing capacity of the SIP server over TLS/TCP connections in scenarios where memory consumption is not the bottleneck.
For a full list of features and improvements in v3.1.x, see:
* http://www.kamailio.org/dokuwiki/doku.php/features:new-in-3.1.x