sip-router

SIP Router Project

Tasklist

FS#136 - tcp_async=yes breaks connect_timeout (no effect) and TCP reject detection

Attached to Project: sip-router
Opened by Iñaki Baz Castillo (ibc) - Wednesday, 15 June 2011, 09:47 GMT
Last edited by Daniel-Constantin Mierla (miconda) - Friday, 01 June 2012, 07:18 GMT
Task Type Bug Report
Category Core → tcp
Status Assigned
Assigned To Andrei Pelinescu-Onciul (andrei)
Operating System Linux
Severity Very Low
Priority Normal
Reported Version Development
Due in Version Undecided
Due Date Undecided
Percent Complete 0%
Votes 0
Private No

Details

When using tcp_async=yes, tcp_connect_timeout has no effect at all. I set it to 3 seconds and try sending a request with RURI “sip:lalalala@91.121.79.216:7777;transport=tcp”. The server 91.121.79.216 refuses the TCP connection (REJECT action in iptables rather than DROP), try yourself:

<pre>

~# telnet 91.121.79.216 7777
Trying 91.121.79.216...
telnet: Unable to connect to remote host: Connection refused

</pre>

However when my Kamailio tries to open a TCP connection with such destination, it produces a local 408 after 30 seconds (fr_timer = 30000)!

Since Kamailio should detect that the TCP connection is rejected, I expect that it should generate an inmediate local 503 for the client transaction as RFC 3261 states.

I attach a file kamailio-tcp-reject.log which shows the issue. It clearly shows that tcp connection attemp fails inmediately (due to TCP reject) but it does not terminate the client transaction and instead it takes 30 seconds until it generates local 408:


Jun 15 12:40:51 kamailio[7275]: ERROR: <core> [tcp_main.c:4137]: connect 91.121.79.216:7777 failed
Jun 15 12:41:21 kamailio[7270]: DEBUG: tm [t_reply.c:1132]: →»»»» T_code=100, new_code=408


I’ve also tested sending a TCP INVITE to a host which is not reachable (1.2.3.4) so TCP timeouts occurs (after 5 seconds rather than 3 however...) but as in the other case, such TCP timeout does not terminate the client transaction and instead local 408 is generated after 30 seconds (I attach kamailio-tcp-timeout.log):


Jun 15 12:42:23 kamailio[7338]: DEBUG: <core> [io_wait.h:390]: DBG: io_watch_add(0x890f60, 18, 2, 0x7fae6c97d278), fd_no=10
Jun 15 12:42:28 kamailio[7338]: DEBUG: <core> [tcp_main.c:4304]: tcp_main: entering timer for 0x7fae6c97d278 (ticks=1376168512, timeout=1376170384 (117 s), wr_timeout=1376168512 (0 s)), write queue: 940 bytes
Jun 15 12:42:28 kamailio[7338]: ERROR: <core> [tcp_main.c:4322]: connect 1.2.3.4:5060 failed (timeout)
Jun 15 12:42:28 kamailio[7338]: DEBUG: <core> [tcp_main.c:4346]: tcp_main: timeout for 0x7fae6c97d278
Jun 15 12:42:28 kamailio[7338]: DEBUG: <core> [io_wait.h:617]: DBG: io_watch_del (0x890f60, 18, -1, 0×10) fd_no=11 called
Jun 15 12:42:53 kamailio[7333]: DEBUG: tm [t_reply.c:1132]: →»»»» T_code=100, new_code=408


If I set tcp_async=no, then the above issues don’t occur at all.

Kamailio info:

version: kamailio 3.2.0-dev5 (x86_64/linux) 552425
flags: STATS: Off, USE_IPV6, USE_TCP, USE_TLS, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, DBG_QM_MALLOC, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 4MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: 552425
compiled on 09:46:21 Jun 15 2011 with gcc 4.5.2

This task depends upon

Comment by Iñaki Baz Castillo (ibc) - Wednesday, 15 June 2011, 19:03 GMT

I assume what is happening:

When tcp_async=no, the worker itself open the TCP connection (which of course is bad as it gets blocked for a while) and can detect TCP failures or timeouts.

When tcp_async=yes, the worker just sends the message to some TCP process and inmediately returns so can handle other TCP requests. Then it just expects a SIP response associated to the client transaction from the transactiin layer, and it’s not aware of TCP rejection or timeouts.

If I’m not wrong in the above, could the tcp_async=yes mode be improved so some worker is notified about TCP rejection or timeout?

Comment by Iñaki Baz Castillo (ibc) - Wednesday, 06 July 2011, 12:03 GMT

Comment by Andrei Pelinescu in the devel maillist:

Yes, it’s a known limitation. Basically when async it’s own, tm has no way of knowing that a connect() has failed and would have to rely on sip timeout.
Of course these could be changed, but it would have both performance and memory usage impact and it would be very hard to integrate with tls. I would rather not do it in the near future.

The tcp_connect_timeout refers to how long the tcp connect will be attempted, but it’s not linked to tm. The value is not 100% exact, the tcp timers are executed on a best effort basis, at most at 5s intervals and at minimum 1/16 seconds, so you should expect a 5s error If it’s too much for you, change TCP_MAIN_SELECT_TIMEOUT and TCP_CHILD_SELECT_TIMEOUT in tcp_conn.h (btw. we don’t use select() anymore, the names where not updated when we switched to
epoll/kqueue/dev_poll).

Loading...