Voice over IP security - SIP and RTP protocols


Tobias Glemser, Reto Lorenz

Voice over IP (VoIP) is one of the hottest buzzwords in contemporary IT, even more so since the last CeBIT in March 2005, and a new hope for both service providers and device manufacturers. Countries with good network infrastructure typically have several VoIP bundles on offer, consisting of a hardware router with VoIP functionality and attractive pricing for both Internet access and telephony. VoIP is set to displace fixed-line telephony sooner or later, but serious security issues tend to go unnoticed in all the hype.

Today, VoIP technology is a common component of broadband Internet access offers, with free calls between users of the same provider and cheap all-inclusive rates for calls into classic telephone networks spurring its popularity. What's more, it is not only SOHO (Small Office Home Office) users who are embracing VoIP - larger companies are also increasingly recognising the technology's potential for consolidating their communications infrastructure. They can now connect branch offices with a single fibre-optic cable and use it to transmit both voice and data. Employees can always be reached at the same phone numbers, regardless of where they physically are, while the dual use of network infrastructure sharply cuts the costs of purchasing, installing and maintaining active and passive network components. As usual, problems only appear after a system has been bought and deployed, as manufacturers are not too forthcoming in this matter, preferring to push their brilliant migration strategies and overvalued services instead.

One of these shortcomings received a lot of media attention recently, when a thirteen-year-old girl died because calls to the US emergency number (911) had not been routed in the VoIP network her mother used. In most countries, legal regulations concerning the routing of emergency calls in VoIP networks simply don't exist yet, and the issue has only recently begun to be discussed.

Besides organisational deficiencies, there are several attacks against the technical VoIP infrastructure. Before examining them, we need to understand the basics of SIP (Session Initiation Protocol) security. We will stick to SIP, as current trends clearly indicate a migration away from H.323 and towards SIP.

The purpose of this article is not to introduce SIP itself (see Frame SIP - Simply bare necessities for some background information), but rather to see how attacks against VoIP can be conducted and what can be done to guard against them. The attacks described here target a typical VoIP environment which uses SIP as the signalling protocol, and are based on commonly used methods, as implementation-specific attack methods are beyond the scope of this article.

SIP - Simply bare necessities

SIP packets contain initial call setup parameters. All other parameters - such as RTP connection attributes - are sent using the Session Description Protocol (SDP), which is embedded into SIP messages as the message body. SIP packets can be divided into request and response packets. Messages are encoded using the UTF-8 standard, so they are directly readable if no other security measures are employed.

SIP messages are very similar to HTTP messages - Table 1 shows the required request header fields. A glance at the protocol elements reveals that the protocol definitions actually provide for contextual (stateful) communication, even if data is sent using a stateless transport protocol such as UDP.

Now that we know the basic SIP components, let's have a look at the request strings for several different request methods (see Table 2). SIP can be extended with new request methods, so we will only refer to the basic ones (see the relevant RFCs for specifications of other methods). The request methods and their related request strings already indicate that several types of attack can be conducted (a discussion of response classes and their uses is beyond the scope of this article).

Messages are integrated into a communication context, which may contain two types of components: dialogues and transactions, with each dialogue potentially including multiple transactions. For example, any VoIP call is an SIP dialogue consisting of the INVITE, ACK and BYE transactions. User agents must be capable of storing dialogue state for an extended period in order to generate messages with the correct parameters.

The use of dialogues means that there are several other connection parameters besides Call-ID - two of these are tag and branch. It must be noted that the correspondence between context-specific values and user-agent behaviour is not as clear-cut as other SIP definitions, which is one reason for the existence of buggy, unreliable and insecure implementations.
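Because SIP is plain text, the dialogue identifiers (Call-ID plus the From and To tags) and the transaction identifier (the Via branch) can be pulled out of a captured message with a few lines of code. The following Python sketch is a simplified parser of our own, not a complete RFC 3261 implementation: it assumes CRLF line endings, keeps only the last occurrence of repeated headers such as Via, and ignores compact header names.

```python
import re


def parse_sip(message: str):
    """Split a SIP message into start line, headers (lower-cased names) and body."""
    head, _, body = message.partition("\r\n\r\n")
    lines = head.split("\r\n")
    headers = {}
    last = None
    for line in lines[1:]:
        if line[:1] in (" ", "\t") and last is not None:
            # Folded header: a line starting with whitespace continues the previous one.
            headers[last] += " " + line.strip()
            continue
        name, _, value = line.partition(":")
        last = name.strip().lower()
        headers[last] = value.strip()  # simplification: repeated headers overwrite
    return lines[0], headers, body


def header_param(value: str, name: str):
    """Return the value of a ;name=value parameter in a header, or None."""
    match = re.search(r";\s*" + re.escape(name) + r"=([^;\s,]+)", value)
    return match.group(1) if match else None


def identifiers(message: str):
    """Dialogue ID (Call-ID, From tag, To tag) and transaction ID (Via branch)."""
    _, headers, _ = parse_sip(message)
    dialogue = (
        headers.get("call-id"),
        header_param(headers.get("from", ""), "tag"),
        header_param(headers.get("to", ""), "tag"),
    )
    return dialogue, header_param(headers.get("via", ""), "branch")


# Usage with the 401 response from Listing 2 (header folding shown with a leading space).
SAMPLE = (
    "SIP/2.0 401 Unauthorized\r\n"
    "Via: SIP/2.0/UDP 10.10.10.1:5060;rport=58949;\r\n"
    " branch=z9hG4bKBA66B9816CE44C848BC1DEDF0C52F1FD\r\n"
    "From: Tobias Glemser <sip:123456@sip.example.com>;tag=1304509056\r\n"
    "To: Tobias Glemser <sip:123456@sip.example.com>;tag=b11cb9bb270104b49a99a995b8c68544.a415\r\n"
    "Call-ID: 2FB73E1760144FC0978876D9D69AE254@sip.example.com\r\n"
    "CSeq: 20187 REGISTER\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)
print(identifiers(SAMPLE))
```

Anyone who can read this single message therefore holds every value needed to address the dialogue, which is why sniffing matters so much in the attacks that follow.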

After a call has been successfully switched through an SIP proxy, the actual voice communication proceeds using RTP. Using the codecs and addresses negotiated via SDP, voice data is transferred directly between the communicating parties (provided direct IP communication is possible), and the SIP proxy is only needed again to end the call.
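The SDP body carried in the INVITE and its response tells each side where to send RTP and which codecs are acceptable. As a rough sketch, these parameters could be extracted as shown below. The helper is our own; it handles only the c=, m= and a=rtpmap lines, assumes numeric RTP/AVP payload types, and ignores the distinction between session-level and media-level connection lines. The SDP body itself is a made-up example.

```python
def parse_sdp(body: str):
    """Return (connection address, media descriptions, rtpmap codecs) from an SDP body."""
    address = None
    media = []
    codecs = {}
    for line in body.splitlines():
        kind, _, value = line.partition("=")
        if kind == "c":
            # c=<nettype> <addrtype> <connection-address>, e.g. "IN IP4 10.10.10.1"
            address = value.split()[2]
        elif kind == "m":
            # m=<media> <port> <proto> <fmt list>, e.g. "audio 8000 RTP/AVP 0 8 101"
            fields = value.split()
            media.append({
                "media": fields[0],
                "port": int(fields[1]),
                "proto": fields[2],
                "payload_types": [int(pt) for pt in fields[3:]],
            })
        elif kind == "a" and value.startswith("rtpmap:"):
            # a=rtpmap:<payload type> <encoding name>/<clock rate>
            pt, _, encoding = value[len("rtpmap:"):].partition(" ")
            codecs[int(pt)] = encoding
    return address, media, codecs


SDP_EXAMPLE = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 10.10.10.1",
    "s=-",
    "c=IN IP4 10.10.10.1",
    "t=0 0",
    "m=audio 8000 RTP/AVP 0 8 101",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
    "a=rtpmap:101 telephone-event/8000",
])
print(parse_sdp(SDP_EXAMPLE))
```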

Table 1. SIP request header fields

Header - Description

Request-URI - Part of the request line, which contains the method, the request URI and the SIP version used. The request URI is typically the same address as in the To field (except for the REGISTER method).

To - Target for the message and its associated method. The target is a logical recipient, because it is not clear from the outset whether the message will reach the named recipient. Depending on the communication context, a tag value may also be attached.

From - Logical identifier of the request sender. The From field must contain a tag value, which is chosen by the client.

CSeq - Short for Command Sequence. Used to identify and order transactions within a dialogue. Consists of an integer sequence number and the request method.

Call-ID - Unique value identifying all the messages within a dialogue. It should be generated using cryptographically random values.

Max-Forwards - Used to avoid forwarding loops. If no external criteria call for a specific value, it should be set to 70.

Via - Shows the forwarding path and the location to which responses are sent. The field must contain a branch value, which must be unique for each transaction sent by the user agent. In RFC 3261-compliant implementations, the branch value always starts with the magic cookie z9hG4bK and identifies the transaction begun by the request.
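To show how the fields of Table 1 fit together, here is a minimal Python sketch that assembles a REGISTER request. The function and its parameters are hypothetical. It uses Python's secrets module for the random Call-ID, tag and branch values, sets Max-Forwards to the recommended 70 and omits authentication entirely, so a proxy that requires digest authentication would answer it with 401 Unauthorized.

```python
import secrets


def build_register(user: str, domain: str, local_ip: str,
                   local_port: int = 5060, cseq: int = 1, expires: int = 1800) -> str:
    """Assemble an unauthenticated SIP REGISTER request (sketch, UDP transport)."""
    aor = f"sip:{user}@{domain}"                        # address of record
    branch = "z9hG4bK" + secrets.token_hex(16)          # magic cookie + random part
    call_id = f"{secrets.token_hex(16)}@{domain}"       # cryptographically random Call-ID
    from_tag = secrets.token_hex(8)                     # tag chosen by the client
    lines = [
        f"REGISTER sip:{domain} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <{aor}>;tag={from_tag}",
        f"To: <{aor}>",                                 # no To tag outside a dialogue
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} REGISTER",
        f"Contact: <sip:{user}@{local_ip}:{local_port}>",
        f"Expires: {expires}",                          # 0 would deregister the contact
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


print(build_register("123456", "sip.example.com", "10.10.10.1"))
```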

Table 2. SIP request methods

Method - Description

REGISTER - Method for registering a client with (and deregistering it from) a proxy. Registering is required to prepare for VoIP communication. Deregistering is done by setting the Expires value to 0.

INVITE - The most important method, and the reason we need SIP. All subsequent methods are subordinate to it, even if they are used in isolation. INVITE is used to set up new calls.

ACK - Once a call (such as a video conference) has been set up, the caller confirms the final response by sending a separate ACK request. The media stream follows immediately.

BYE - Used to end calls normally. Sending it terminates the dialogue established using INVITE. A BYE message will not be processed without the appropriate dialogue parameters (Call-ID and tags).

CANCEL - Used for cancelling a call setup before the call is established. Also used in error situations.

OPTIONS - Used to query the capabilities of a server or user agent, such as the supported request methods or media types.

NOTIFY - Additional request method defined in RFC 3265, allowing a client to be notified of the status of a resource it has subscribed to (for example, receiving notification of new voice messages).

SIP and family

Understanding VoIP communication requires a discussion of several protocols used for setting up and ending a call. The communication is split between these protocols, which carry signalling, voice data or gateway messages. Unlike traditional telephony, where - from a user's point of view - communication requires only a single cable, VoIP involves split communication paths. Here are the most important protocols:

  • signalling - SIP and SDP (to establish streaming properties),

  • transport - UDP, TCP, SCTP,

  • streaming - RTP, SRTP, RTCP,

  • gateways - SIP, MGCP.

These protocols provide core VoIP functionality and are used in a growing number of implementations. Other protocols also exist, but here we will focus on the ones listed above.

To appreciate how attacks can be approached, we will go through the process of setting up a basic call, using just one SIP proxy for all examples. The proxy is part of the signalling and call switching infrastructure. In practice, there are usually two or more switching SIP proxies, especially if the call participants are not within the same network environment. If several proxies are used, they also exchange SIP messages, which results in extra layers of communication. Before we go into more detail, Figure 1 provides an overview of the basic mechanism. The protocols themselves contain no ground-breaking features: SIP, for instance, uses some very typical techniques, including elements of HTTP, while RTP was defined almost 10 years ago and last updated in 2003.

Figure 1. Overview of setting up a call using SIP

 

SIP/ARP attacks against VoIP

Several attack vectors exist, each requiring different activity on the part of the attacker. We will look at seven of the most popular, most effective and most widely discussed attacks, and see how they can be used in practice.

The main reason why VoIP is more vulnerable than the Plain Old Telephone Service (POTS) is its use of a shared medium. No dedicated line exists for call transactions, just a network used by lots of users and lots of different applications. This makes it much easier for an attacker to tap into communication - all he needs is a suitably positioned computer.

Eavesdropping on telephone calls and playing them back to the communicating parties is definitely one of the most impressive attacks on VoIP. As outlined earlier, signalling is done via an SIP proxy, while the actual communication between the parties uses the peer-to-peer model. In our scenario, we want to listen in on the conversation between Alice and Bob. To achieve this, we launch a man-in-the-middle (MITM) attack using ARP poisoning (see Frame ARP poisoning attack) to make the proxy and Alice's and Bob's VoIP phones send their traffic to us, while believing they are communicating directly with each other.

ARP poisoning attack

The attacker poisons the ARP tables of the systems to be attacked. The purpose of the ARP table is to map logical IP addresses to physical addresses in Layer 2 of the OSI reference model (Ethernet MAC addresses). Almost every non-hardened operating system accepts unsolicited ARP replies, so the attacker first learns the MAC addresses of all the hosts he wants to get between and then, by sending such unsolicited ARP replies, associates his own MAC address with each of their IP addresses. Each packet received is duly forwarded to the original recipient, who is also being poisoned. Communication keeps working, and the interception will not be noticed by the communicating parties unless they use cryptographic mechanisms like TLS/SSL.
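The weakness described in this frame can be illustrated with a toy model of an ARP cache. The class below is a hypothetical simplification (no timeouts, no real packets, made-up addresses): a cache that accepts unsolicited replies simply lets the last reply win, while a stricter cache only accepts replies to requests it has actually sent. Note that even the stricter variant would be fooled if the forged reply arrived before the genuine one.

```python
class ArpCache:
    """Toy model of an ARP cache mapping IP addresses to MAC addresses."""

    def __init__(self, accept_unsolicited: bool = True):
        self.table = {}
        self.pending = set()                 # IPs we have sent ARP requests for
        self.accept_unsolicited = accept_unsolicited

    def request(self, ip: str) -> None:
        self.pending.add(ip)

    def on_reply(self, ip: str, mac: str) -> bool:
        """Process an ARP reply; return True if the cache entry was updated."""
        if ip in self.pending or self.accept_unsolicited:
            self.table[ip] = mac             # last accepted reply wins
            self.pending.discard(ip)
            return True
        return False


PROXY_IP = "192.168.5.25"
REAL_MAC, ATTACKER_MAC = "00:11:22:33:44:55", "de:ad:be:ef:00:01"

for cache in (ArpCache(), ArpCache(accept_unsolicited=False)):
    cache.request(PROXY_IP)
    cache.on_reply(PROXY_IP, REAL_MAC)       # genuine answer from the proxy
    cache.on_reply(PROXY_IP, ATTACKER_MAC)   # unsolicited, forged reply
    print(cache.accept_unsolicited, cache.table[PROXY_IP])
```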

Figure 2 presents an outline of VoIP transmission sniffing. First, the call is set up. Alice sends the SIP proxy a request to call Bob. The message is intercepted and forwarded by the attacker. The SIP proxy now tries to reach Bob to tell him that Alice wants to communicate with him - this message is intercepted and forwarded, too. After successful call initialisation, the actual call between Alice and Bob begins (using the RTP protocol), and this RTP communication is also intercepted and forwarded by the attacker.
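Once the attacker sits in the middle, every intercepted RTP packet consists of a 12-byte fixed header (RFC 3550), optional CSRC and extension fields, and the voice payload. A minimal Python sketch of how a sniffer could separate the two is shown below. The function name is our own; it handles CSRC lists, header extensions and padding but performs no further validation.

```python
import struct


def parse_rtp(packet: bytes) -> dict:
    """Split an RTP packet (RFC 3550) into header fields and payload."""
    if len(packet) < 12:
        raise ValueError("packet too short for an RTP header")
    b0, b1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    offset = 12 + 4 * (b0 & 0x0F)            # skip CSRC identifiers (CC field)
    if b0 & 0x10:                            # X bit: header extension present
        _profile, words = struct.unpack("!HH", packet[offset:offset + 4])
        offset += 4 + 4 * words              # extension length counts 32-bit words
    end = len(packet)
    if b0 & 0x20:                            # P bit: last octet holds padding count
        end -= packet[-1]
    return {
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,           # static types: 0 = PCMU, 8 = PCMA
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[offset:end],
    }


# A hand-built packet: version 2, payload type 0 (PCMU), 160 bytes = 20 ms at 8 kHz.
fake = struct.pack("!BBHII", 0x80, 0x00, 1234, 160, 0xDEADBEEF) + bytes(160)
info = parse_rtp(fake)
print(info["payload_type"], info["sequence"], len(info["payload"]))
```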

If you use a tool like Ethereal to sniff the communication, you will also receive the RTP stream payload. To listen to it, you can load the sniffed data into a voice decoder like the Firebird DND-323 Analyzer or use Ethereal itself, provided the G.711 U-law (PCMU) or G.711 A-law (PCMA) codecs are used (these are the international standards for coding and decoding telephony transmissions).
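Decoding a PCMU payload is simple enough to do by hand. The sketch below follows the classic G.711 µ-law expansion found in the widely used public-domain g711.c reference routine, and writes the 16-bit samples to a mono 8 kHz WAV file with Python's standard wave module (the older audioop.ulaw2lin helper was removed in Python 3.13). The function names are our own, and A-law (PCMA) payloads would need a different expansion function.

```python
import struct
import wave

BIAS = 0x84  # bias added to the magnitude during encoding (132)


def ulaw_to_linear(code: int) -> int:
    """Expand one G.711 µ-law byte to a 16-bit linear PCM sample."""
    code = ~code & 0xFF                      # µ-law bytes are transmitted inverted
    magnitude = ((code & 0x0F) << 3) + BIAS  # 4-bit mantissa, re-biased
    magnitude <<= (code & 0x70) >> 4         # 3-bit segment (exponent)
    return BIAS - magnitude if code & 0x80 else magnitude - BIAS


def pcmu_to_wav(payloads, path: str) -> int:
    """Decode a sequence of PCMU payloads into a mono 8 kHz WAV file."""
    samples = [ulaw_to_linear(b) for payload in payloads for b in payload]
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)                  # 16-bit samples
        wav.setframerate(8000)               # G.711 sampling rate
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return len(samples)


# Usage: pcmu_to_wav(list_of_rtp_payloads_in_sequence_order, "call.wav")
```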

 

Figure 2. VoIP sniffing

 

A very clever tool for performing both voice decoding and ARP poisoning is called Cain & Abel (see Frame On the Net). Once you have it up and running, you should scan for all hosts in your subnet (using ARP requests) by clicking the plus symbol. These hosts can now be seen under the Sniffer tab and can be chosen as victims in the ARP sub-tab. For our attack, we will select the IP addresses of Alice, Bob and the SIP proxy. After clicking the Start/Stop ARP button, ARP poisoning begins and the attacker has only one thing left to do - sit and wait. The rest is done by Cain & Abel (see Figure 3). Once a call between Alice and Bob has been established and concluded, it will automatically be stored as a WAV file and shown in the VoIP tab - you can listen to the conversation using any audio player. By the way, if the communicating parties happened to transmit any passwords in the meantime (for POP3, for example), the attacker might want to have a look at them using the Passwords tab.

 

Figure 3. Voice decoding with Cain & Abel

 

As you can see, if no additional security measures are employed, an attacker within the local network can easily sniff the communication and then simply listen to it.

Identity theft and registration hijacking

Registering with an SIP proxy is normally done by submitting a username and password. As already mentioned, SIP messages are unencrypted. If an attacker is sniffing the authentication process (for example using ARP spoofing), he can use the username and password combination to authenticate himself on the SIP proxy.

However, such attacks do not work against implementations that use digest authentication, as contemporary VoIP implementations typically do. The authentication process (see Frame Security measures within VoIP protocols) and other secured operations make use of digest authentication. The client starts by sending a REGISTER request without credentials (see Listing 1). The proxy rejects it with the status code 401 Unauthorized (Listing 2) and demands that the client log on using digest authentication. The line beginning with WWW-Authenticate contains a random nonce value.

Security measures within VoIP protocols

Apart from mechanisms for protecting contextual communication, SIP features a number of other security measures (though these are not obligatory for SIP implementations), dealing mainly with authentication and cryptographic security of communication.

Several authentication methods are available. A common one is called digest authentication - a simple challenge-response mechanism which can be used for any request.

Another way of securing SIP packets is to use the well-known S/MIME protocol, which allows the SIP message body to be secured with S/MIME certificates. Using S/MIME assumes that a PKI and the necessary certificate verification mechanisms are available. In the case of SIP, S/MIME is typically used to secure SDP message bodies, but using it in practice can be arduous and time-consuming if the necessary infrastructure is not in place.

Other security mechanisms require additional protocol elements. For example, TLS can be used to protect SIP signalling (SRTP being the corresponding option for RTP streams), but in the case of SIP the protection is only hop-by-hop, so it cannot be automatically assumed that every hop up to the other party, including their phone, is TLS-enabled.

Listing 1. SIP registration phase 1 (client to SIP proxy)

 
REGISTER sip:sip.example.com SIP/2.0
Via: SIP/2.0/UDP 10.10.10.1:5060;rport;
 branch=z9hG4bKBA66B9816CE44C848BC1DEDF0C52F1FD
From: Tobias Glemser <sip:123456@sip.example.com>;tag=1304509056
To: Tobias Glemser <sip:123456@sip.example.com>
Contact: "Tobias Glemser" <sip:123456@10.10.10.1:5060>
Call-ID: 2FB73E1760144FC0978876D9D69AE254@sip.example.com
CSeq: 20187 REGISTER
Expires: 1800
Max-Forwards: 70
User-Agent: X-Lite
Content-Length: 0

 

Listing 2. SIP registration phase 2 (proxy to client) - rejection

 
SIP/2.0 401 Unauthorized
Via: SIP/2.0/UDP 10.10.10.1:5060;rport=58949;
 branch=z9hG4bKBA66B9816CE44C848BC1DEDF0C52F1FD
From: Tobias Glemser <sip:123456@sip.example.com>;tag=1304509056
To: Tobias Glemser <sip:123456@sip.example.com>;
 tag=b11cb9bb270104b49a99a995b8c68544.a415
Call-ID: 2FB73E1760144FC0978876D9D69AE254@sip.example.com
CSeq: 20187 REGISTER
WWW-Authenticate: Digest realm="sip.example.com",
 nonce="42b17a71cf370bb10e0e2b42dec314e65fd2c2c0"
Server: sip.example.com ser
Content-Length: 0

In the third step (see Listing 3), the client sends the REGISTER request again, this time with an Authorization header containing the username, the appropriate realm and the nonce value previously sent by the server. The most important part is the response value, which is usually an MD5 hash calculated from the username, the realm, the password, the nonce sent by the server, the request method and the request URI. The server builds its own MD5 hash from the same data. If the two hashes are equal, authentication has been successful, and the server acknowledges this with a status message (Listing 4).
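The response calculation described above can be written down in a few lines. The sketch below follows the MD5 digest scheme of RFC 2617, including the older RFC 2069-compatible form that applies when no qop parameter is present, as in Listing 3. The function names are our own, and the password in the usage line is a placeholder (the real one is of course not shown), so it does not reproduce the response value of Listing 3.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce,
                    qop=None, nc=None, cnonce=None) -> str:
    """Compute the Authorization 'response' value (RFC 2617, MD5 algorithm)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")   # contains the secret, never sent
    ha2 = md5_hex(f"{method}:{uri}")
    if qop is None:
        # No qop parameter (as in Listing 3): RFC 2069-compatible form.
        return md5_hex(f"{ha1}:{nonce}:{ha2}")
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


# Placeholder password; the proxy performs the same calculation and compares results.
print(digest_response("123456", "sip.example.com", "placeholder-password",
                      "REGISTER", "sip:sip.example.com",
                      "42b17a71cf370bb10e0e2b42dec314e65fd2c2c0"))
```

Because the nonce enters the hash, a captured response is useless for a later registration that uses a fresh nonce.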

Listing 3. SIP registration phase 3 (client to proxy) - re-authentication

 
REGISTER sip:sip.example.com SIP/2.0
Via: SIP/2.0/UDP 10.10.10.1:5060;rport;
 branch=z9hG4bK913D93CF77A5425D9822FB1E47DF7792
From: Tobias Glemser <sip:123456@sip.example.com>;tag=1304509056
To: Tobias Glemser <sip:123456@sip.example.com>
Contact: "Tobias Glemser" <sip:123456@10.10.10.1:5060>
Call-ID: 2FB73E1760144FC0978876D9D69AE254@sip.example.com
CSeq: 20188 REGISTER
Expires: 1800
Authorization: Digest username="123456",realm="sip.example.com",
 nonce="42b17a71cf370bb10e0e2b42dec314e65fd2c2c0",
 response="bef6c7346eb181ad8b46949eba5c16b8",uri="sip:sip.example.com"
Max-Forwards: 70
User-Agent: X-Lite
Content-Length: 0

 

Listing 4. SIP registration phase 4 (proxy to client) - success

 
SIP/2.0 200 OK
Via: SIP/2.0/UDP 10.10.10.1:5060;rport=58949;
 branch=z9hG4bK913D93CF77A5425D9822FB1E47DF7792
From: Tobias Glemser <sip:123456@sip.example.com>;tag=1304509056
To: Tobias Glemser <sip:123456@sip.example.com>;
 tag=b11cb9bb270104b49a99a995b8c68544.017a
Call-ID: 2FB73E1760144FC0978876D9D69AE254@sip.example.com
CSeq: 20188 REGISTER
Contact: <sip:123456@10.10.10.1:5060>;q=0.00;expires=1800
Server: sip.example.com ser
Content-Length: 0

 

The hash sent in step 3 has two features that prevent forged authentication and the replay of previously intercepted user data: it is bound to the server's random nonce value, and it incorporates the username and password without revealing them. This means that, provided the password is strong, it is practically impossible for an attacker to break the password and tap into communication in a realistic amount of time (a weak password can still be found by an offline dictionary attack on a sniffed response).

DoS - Denial of Service

As with any other service, it is always possible to bring down a VoIP service if you have enough bandwidth available. In the case of an SIP proxy, this could be done by using a register-storm attack (flooding the proxy with REGISTER requests) to overload the service. Implementation vulnerabilities can also make DoS attacks against the service itself possible. It might even be possible to gain access to the server using buffer overflow attacks - one such vulnerability was discovered in 2003 in the open source Asterisk PBX server (CAN-2003-0761). By exploiting flawed parameter processing in MESSAGE and INFO requests, an attacker could execute commands in the context of the Asterisk service, which typically runs as root.

Whether an SIP service can be brought down by invalid SIP messages depends on the implementation - if a specific server has no mechanisms for handling (or at least ignoring) invalid messages, it might eventually go down. The Java-based PROTOS Test Suite is available for testing server behaviour, and any PBX (Private Branch Exchange) owner would be well advised to run it against his box (see Frame On the Net).

A different type of DoS is user-supported DoS. Figure 4 shows a UDP message sent from the SIP proxy 192.168.5.25 to an SIP phone with login 14 and IP address 192.168.5.84. With this message, the proxy (or an attacker) signals that the user has new voice mail in their inbox, as indicated by the Messages-Waiting: yes and Voice-Message: 1/0 entries in the message body. The same kind of notification is used for other message types, such as faxes. The first digit (1) indicates how many new messages are stored, while the second (0) shows the number of old messages.
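The body of such a notification uses the simple text format of the message-summary event package (RFC 3842). A hypothetical parser like the one below shows how little a phone has to go on: nothing in the body identifies or authenticates the sender, so a phone that does not verify where the message came from will simply believe the counts it contains.

```python
import re


def parse_message_summary(body: str):
    """Parse a message-summary body into (messages_waiting, {class: (new, old)})."""
    waiting = False
    counts = {}
    for line in body.splitlines():
        name, _, value = line.partition(":")
        name, value = name.strip().lower(), value.strip()
        if name == "messages-waiting":
            waiting = value.lower() == "yes"
        elif name.endswith("-message"):
            # e.g. "Voice-Message: 1/0" or "Voice-Message: 1/0 (0/0)" with urgent counts
            match = re.match(r"(\d+)/(\d+)", value)
            if match:
                counts[name] = (int(match.group(1)), int(match.group(2)))
    return waiting, counts


# Body as in Figure 4: one new and no old voice messages.
print(parse_message_summary("Messages-Waiting: yes\r\nVoice-Message: 1/0\r\n"))
```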

 

Figure 4. A modified SIP packet

 

As you can see, we have edited this packet. This can easily be done using the Packetyzer utility for Windows (see Frame On the Net), which is based on Ethereal. Any packet can be edited, and incorrect checksums are flagged and can be corrected. We can send our message to arbitrary recipients; all we need is the user's IP address and login ID, which is usually the same as their phone number. To illustrate that no further information is necessary, we have filled all other fields with 0 values (fields such as User-Agent don't matter, of course).
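Correcting a checksum after editing a packet is purely mechanical. The UDP checksum (RFC 768) is the 16-bit one's complement of the one's complement sum over a pseudo-header (source and destination IP address, protocol number, UDP length), the UDP header and the payload. The Python sketch below is our own illustration for IPv4, not Packetyzer's code, and the payload is a truncated placeholder.

```python
import ipaddress
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum with end-around carry (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                           # pad odd-length data with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    # Source IP, destination IP, a zero byte, protocol 17 (UDP) and the UDP length.
    return (ipaddress.IPv4Address(src_ip).packed + ipaddress.IPv4Address(dst_ip).packed
            + struct.pack("!BBH", 0, 17, udp_length))


def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> int:
    length = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)  # checksum field = 0
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, length) + header + payload)
    checksum = ~total & 0xFFFF
    return checksum or 0xFFFF                     # a computed 0 is transmitted as 0xFFFF


def udp_checksum_ok(src_ip, dst_ip, segment: bytes) -> bool:
    """A correct segment sums to 0xFFFF (a zero 'no checksum' field is not handled)."""
    return ones_complement_sum(pseudo_header(src_ip, dst_ip, len(segment)) + segment) == 0xFFFF


payload = b"NOTIFY sip:14@192.168.5.84 SIP/2.0\r\n\r\n"   # truncated placeholder payload
csum = udp_checksum("192.168.5.25", "192.168.5.84", 5060, 5060, payload)
segment = struct.pack("!HHHH", 5060, 5060, 8 + len(payload), csum) + payload
print(hex(csum), udp_checksum_ok("192.168.5.25", "192.168.5.84", segment))
```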

Faking such a message shouldn't be a problem - after all, it doesn't contain any sensitive information, does it? Most phones (we tested a Cisco 9750 and a Grandstream BT-100) process such messages (even ones with incorrect checksums) and show them to the user. Usually, a notification icon or the whole display starts to blink. The user then calls their mailbox to listen to the non-existent new message. Because there is no new message, the user might think this is just a bug and ignore it. Shortly afterwards, the display starts blinking again. Now our user calls technical support, who will busily set about locating the error (which could actually be quite amusing to watch, considering that there is no error).

If an attacker starts sending such messages to all the users in a network, both the users and the support staff will waste a great deal of time trying to track down the error. Sending the message to many users at once will also result in everyone calling their mailbox, potentially leading to service congestion or even a server breakdown.

Call interruption

Many papers report that sending a simple BYE message to a call participant is enough to terminate a call immediately. Well, it isn't quite that easy. First of all, as we already know, the attacker has to know the Call-ID (and the tags) of the call dialogue. RFC 3261 says: "The Call-ID header field acts as a unique identifier to group together a series of messages. It MUST be the same for all requests and responses sent by either UA [User-Agent] in a dialogue."

RFC 3261 does not strictly require the Call-ID to be generated by hashing or to be non-incremental (it only recommends cryptographically random identifiers), but most implementations do exactly this, using randomly chosen Call-IDs. This means that in order to end the call using the Call-ID, the attacker would need to sniff the call initialization phase, and if he's in a position to do so, the content of the call would presumably be of much more interest than the ability to simply end it.
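Why guessing does not work can be seen from how a user agent matches an incoming BYE to a dialogue. In RFC 3261, a dialogue is identified by the Call-ID together with the local and remote tags, and a BYE that matches no existing dialogue is answered with 481 Call/Transaction Does Not Exist. The class below is a simplified, hypothetical model of that lookup, with randomly generated identifiers.

```python
import secrets


class DialogueTable:
    """Simplified model of how a user agent matches in-dialogue requests."""

    def __init__(self):
        self._dialogues = set()

    def establish(self, call_id: str, local_tag: str, remote_tag: str) -> None:
        self._dialogues.add((call_id, local_tag, remote_tag))

    def handle_bye(self, call_id: str, from_tag: str, to_tag: str) -> int:
        """Return the SIP status code a UAS would send for an incoming BYE."""
        # For a request we receive, the To tag is our tag and the From tag is the peer's.
        key = (call_id, to_tag, from_tag)
        if key in self._dialogues:
            self._dialogues.remove(key)
            return 200          # OK: dialogue terminated
        return 481              # Call/Transaction Does Not Exist


# Bob's phone after accepting Alice's call; all three identifiers are random.
call_id = secrets.token_hex(16) + "@10.10.10.1"
bob_tag, alice_tag = secrets.token_hex(8), secrets.token_hex(8)
bob = DialogueTable()
bob.establish(call_id, bob_tag, alice_tag)
print(bob.handle_bye("guessed-id@10.10.10.1", alice_tag, bob_tag))  # forged BYE
print(bob.handle_bye(call_id, alice_tag, bob_tag))                  # genuine BYE
```

With 128 random bits in the Call-ID alone, blind guessing is hopeless; the attacker has to see the dialogue's messages.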

Phreaking

Phreaking, the fraudulent use of telephony services traditionally accomplished by sending special system tones in public call boxes, could well experience a revival. Due to the decoupling of payload (RTP voice stream) and signalling (SIP), the phreaking scenario outlined below seems quite plausible, though it is not yet possible.

A prepared client sets up a new call to another prepared client. Both connect via an SIP proxy and behave in a normal manner. Directly after the call has been established, the proxy receives a signal to end the call, which both clients acknowledge, but without actually quitting the RTP streaming. The call has not ended, but the SIP server doesn't notice it.

If both clients are located within the same subnet, the call would not end in any case, as the voice stream is P2P. If there's a breakout through the SIP proxy (for example if connecting to another network), RTP communication is routed via the proxy, which now has to end the RTP stream itself. The proxy would therefore have to recognize that call termination has been signalled via SIP and transfer this information directly to RTP communication control.

Another phreaking attack might also be possible, depending on the SIP proxy implementation. Some implementations, like the current version of Asterisk, require re-authentication using digest authentication (as presented in Listings 1-4) for almost every single client-server exchange. However, other implementations only require re-authentication after a certain period of time, and the following scenario demonstrates how this could be exploited to generate costs for the provider.

An attacker sends a valid INVITE message to the SIP proxy using the credentials of a successfully authenticated user. The SIP proxy now initializes the call, and the remaining packets required for successful call initialization can be sent by the attacker after a specific time, without waiting for the response packets from the server. Some special service number operators charge enormous amounts for a call, regardless of call duration. Using this scenario, an attacker could cause other users to be charged high rates for short special service calls.

SPIT (SPam over IP Telephone)

SPIT is one of the most commonly mentioned dangers of establishing VoIP services - attackers can send junk voice messages just like e-mail spam. Unlike calls from robots in the world of traditional telephony, VoIP calls don't generate initial costs. Like spammers, a spitter uses the victim's address, except in this case it is not their e-mail, but their SIP address. With the increasing popularity of IP telephony, it's only a matter of time before spitters will be able to easily obtain a great many valid SIP addresses, especially if central address books are indeed going to be introduced.

The spitter calls an SIP number, the victim's SIP proxy processes the call and the victim now has to listen to junk such as the required minimum size of one's manhood. Just like a spammer, a spitter needs just one thing - bandwidth. Voice messages require considerably more resources than e-mails. Assuming a 15 second message (as few victims could handle listening to more), one piece of spit would be 120 kB in size (if using a 64 kbps codec). The activity of trojan horses - just as with spam once again - could cause any unprotected Internet user to unwittingly send SPIT using their own bandwidth.
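The 120 kB figure is easy to check. Counting only the voice payload of a 64 kbps codec such as G.711 (RTP, UDP and IP headers would add to this):

```latex
\frac{15\,\mathrm{s} \times 64\,000\,\mathrm{bit/s}}{8\,\mathrm{bit/byte}}
  = \frac{960\,000\,\mathrm{bit}}{8\,\mathrm{bit/byte}}
  = 120\,000\,\mathrm{bytes} \approx 120\,\mathrm{kB}
```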

Diallers

A revival in the use of diallers, which were declared dead when non-dial-up technologies like DSL and cable modems became popular, may pose another threat. Because of the way an SIP client connects, we have the same scenario as with ordinary diallers which use modems or ISDN lines to call premium numbers. For example, a dialler could infect an SIP client and install a certain number as the standard call prefix or specify a new and very expensive SIP proxy. Calls would then be made through these costly numbers unknown to the user - at least until the first bill arrived.

No such diallers have yet been seen in the wild, but it's probably just a matter of time before we hear the first stories of VoIP dialler success.

Conclusion

There is no doubt that VoIP is one of the most thrilling IT innovations of the past few years and is set to become another widespread use for the Internet and dominate both corporate and private phone networks. Judging by the media attention given to VoIP security problems, it might seem that the combination of SIP and RTP protocols is rather a feeble coupling. Whatever the truth, security problems should always be carefully considered before migrating to a new technology.

As this article has shown, numerous attack vectors have been known for years - most are just slightly modified attacks on the IP protocol. Successful attacks against SIP/RTP are typically possible in LAN structures with unencrypted communications, for example by sniffing RTP streams. This attack is no different from sniffing data communications in TCP/IP. Most of the other attacks can only succeed if the SIP proxy or the UAC (User Agent Client) doesn't process the Call-ID correctly or if the attacker sniffs out the Call-ID. Security is also at risk if digest authentication is not demanded for every single action which requires it. However, SPIT is likely to be the biggest problem - when it comes to money, we can be sure that no evil advertiser will hesitate to make use of the new medium.