Guide to WhatsApp Voice (Inbound) & SIP Integration via Gupshup

Introduction

WhatsApp Voice lets end-customers place and receive calls through WhatsApp, creating a unified messaging and voice experience. Gupshup provides this capability through SIP integrations. This guide combines product documentation and technical specifications for implementing incoming (user-initiated) WhatsApp voice calls.


📘

If you are not actively testing voice, disable the voice icon. Leaving it enabled without a working setup can lead to Meta flagging or banning the WABA.


Prerequisites

From Tech Provider & Business:

  • Active WhatsApp phone number on Gupshup.
  • SIP Voice setup with PSTN calling support.
  • Audio Codec: Opus/48000 (mandatory for compatibility).
  • Call Routing: Ensure seamless forwarding with no failures.
  • Firewall Configuration: No blocking of SIP or RTP traffic.
  • IP Whitelisting:
    • Brazil: 54.207.112.105
    • India: 35.154.159.246
    • US: 3.225.218.184

    Ensure your firewall allows traffic to/from ephemeral ports (1024–65535).

    Media will come from the following IPs: 166.117.124.43 and 76.223.74.4.
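The allowlist above could also be mirrored in an application-level ingress check. The Python sketch below is illustrative only: it assumes the regional IPs carry SIP signalling, the helper names are hypothetical, and in practice this filtering normally belongs in the firewall or security group rather than in application code.

```python
import ipaddress

# Regional IPs from the prerequisites (assumed here to be SIP signalling sources).
SIGNALLING_IPS = {
    "Brazil": "54.207.112.105",
    "India": "35.154.159.246",
    "US": "3.225.218.184",
}
# Media (RTP) source IPs from the prerequisites.
MEDIA_IPS = ("166.117.124.43", "76.223.74.4")

# Normalise everything to ip_address objects once, so comparisons are exact.
ALLOWED_IPS = {ipaddress.ip_address(ip) for ip in (*SIGNALLING_IPS.values(), *MEDIA_IPS)}
EPHEMERAL_PORTS = range(1024, 65536)  # 1024-65535 inclusive


def is_allowed_source(src_ip: str) -> bool:
    """Return True if the source IP is on the Gupshup allowlist."""
    try:
        return ipaddress.ip_address(src_ip) in ALLOWED_IPS
    except ValueError:
        # Not a valid IP literal (for example a hostname): reject it.
        return False


def is_allowed(src_ip: str, src_port: int) -> bool:
    """Allowlisted source IP and a port inside the ephemeral range."""
    return is_allowed_source(src_ip) and src_port in EPHEMERAL_PORTS
```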


📘

Support:

For Beta issues, write directly to [email protected] for faster resolution.


Enablement Flow

Step 1: The partner initiates enablement through the Partner API, including the SIP configuration.
Step 2: Gupshup forwards the request to Meta.
Step 3: Meta responds with one of the following:

  • ✅ Success: "Enable Voice - OK"
  • ❌ Failure: Error message with reason.

SIP Configuration Options

Basic SIP Call (No Registration)

  • Scenario:
    • This configuration is used for direct SIP calling without registration. The SIP endpoint sends INVITE messages directly to the destination without establishing a prior registration.
  • Use cases:
    • Direct SIP trunking between providers
    • Anonymous SIP calls
    • IP-to-IP direct calling
    • Some wholesale VoIP scenarios
  • Example:
{
  "host": "ip/domain",
  "port": "50X1"
}
  • No REGISTER messages are sent
  • INVITE messages go directly to the destination
  • Often used with IP-based authentication instead of user credentials
  • May require specific firewall/NAT traversal handling


SIP Call With Authentication

  • Scenario:
    • Registration is disabled, but authentication is still required. This hybrid configuration suits cases where registration isn't needed but outgoing calls must still be authenticated: the SIP endpoint authenticates each INVITE rather than maintaining a registration.
  • Use Cases:
    • SIP peering arrangements
    • Some SIP trunking scenarios
    • Cases where registration is undesirable but security is needed
    • Load balancer or proxy scenarios
  • Example SIP Data:
{
  "host": "ip/domain",
  "port": "50X1",
  "secret_key": "XXXXXXXX18fc94ac1488b86c0XXXXX"
}
  • No REGISTER messages are sent and there is no periodic re-registration; credentials are presented with each INVITE instead
  • Authentication uses SIP digest authentication (usually MD5)
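For reference, the digest computation itself is standardized (RFC 2617, used by SIP per RFC 3261 §22.4). The sketch below computes the `response` value a client returns after a 401 or 407 challenge. Mapping `username` and `secret_key` from the SIP configuration onto the digest username and password is an assumption, and `digest_response` and `authorization_header` are hypothetical helper names.

```python
import hashlib
from typing import Optional


def _md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str, method: str,
                    uri: str, nonce: str, qop: Optional[str] = None,
                    nc: str = "00000001", cnonce: str = "") -> str:
    """Compute the MD5 digest 'response' value (RFC 2617, as used by RFC 3261)."""
    ha1 = _md5_hex(f"{username}:{realm}:{password}")
    ha2 = _md5_hex(f"{method}:{uri}")
    if qop == "auth":
        # qop=auth mixes in the nonce count and the client nonce.
        return _md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
    # Without qop, the RFC 2069-compatible form is used.
    return _md5_hex(f"{ha1}:{nonce}:{ha2}")


def authorization_header(username: str, realm: str, password: str,
                         uri: str, nonce: str, method: str = "INVITE") -> str:
    """Value for Authorization (after 401) or Proxy-Authorization (after 407), no qop."""
    resp = digest_response(username, realm, password, method, uri, nonce)
    return (f'Digest username="{username}", realm="{realm}", nonce="{nonce}", '
            f'uri="{uri}", response="{resp}", algorithm=MD5')
```

The realm and nonce come from the challenge's WWW-Authenticate or Proxy-Authenticate header; the INVITE is then re-sent with the computed header.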

TCP Transport Protocol Required

  • Scenario:
    • This configuration forces TCP as the transport protocol instead of the default UDP. TCP is used when message reliability or larger message sizes are required.
  • Use Cases:
    • Large SIP messages (many headers or SDP)
    • Networks with UDP reliability issues
    • Certain security requirements
    • NAT traversal scenarios where TCP works better
  • Example SIP Data:
{
  "username": "911XXXX258312",
  "host": "ip/domain",
  "port": "50X1",
  "secret_key": "XXXXXXXX18fc94ac1488b86c0XXXXX",
  "force_tcp": true
}
  • Uses SIP over TCP (RFC 3261)
  • TCP provides reliable delivery, but with higher overhead
  • The default SIP port remains 5060 (5061 for TLS)
  • May require different keepalive mechanisms than UDP
  • Often used as a fallback when UDP fails
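Over TCP there are no datagram boundaries, so a receiver has to frame SIP messages itself using the Content-Length header (RFC 3261 §18.3), and keepalives are typically double-CRLF pings answered by a single CRLF (RFC 5626). A minimal framing sketch under those assumptions follows; `split_sip_stream` is a hypothetical helper, and it does not count a ping that is split across two socket reads.

```python
def split_sip_stream(buffer: bytes):
    """Split a SIP-over-TCP byte buffer into complete messages.

    Returns (messages, remainder, pings). Message boundaries come from
    Content-Length; a double CRLF between messages is counted as a keepalive
    ping. Prepend `remainder` to the next read from the socket.
    """
    messages, pings = [], 0
    while True:
        # Skip keepalives sitting between messages.
        while buffer.startswith(b"\r\n"):
            if buffer.startswith(b"\r\n\r\n"):
                pings += 1
                buffer = buffer[4:]
            else:
                buffer = buffer[2:]  # single CRLF (a pong)
        header_end = buffer.find(b"\r\n\r\n")
        if header_end == -1:
            break  # headers not complete yet
        head = buffer[:header_end].decode("utf-8", errors="replace")
        length = 0
        for line in head.split("\r\n")[1:]:
            name, _, value = line.partition(":")
            # "l" is the compact form of Content-Length.
            if name.strip().lower() in ("content-length", "l"):
                length = int(value.strip())
        total = header_end + 4 + length
        if len(buffer) < total:
            break  # body not fully received yet
        messages.append(buffer[:total])
        buffer = buffer[total:]
    return messages, buffer, pings
```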

SIP Configuration Metadata Format

[
  {
    "user": "911XXXX258312",
    "username": "911XXXX258312",
    "host": "ip-XX-232-3-XX.ap-XXX-1.compute.internal",
    "port": "50X1",
    "secret_key": "XXXXXXXX18fc94ac1488b86c0XXXXX",
    "force_tcp": false
  }
]
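| Parameter | Description | Type | Required | Default |
|---|---|---|---|---|
| user | Virtual SIP endpoint (user portion of the SIP URI) | String | No | Dialed Number |
| username | SIP user or extension | String | No | Dialer Number |
| host | SIP server IP/domain | String | Yes | – |
| port | SIP URI forward port | String | Yes | – |
| secret_key | Password used for SIP digest authentication (registration or per-INVITE) | String | No | Blank string |
| force_tcp | Whether to use TCP instead of UDP | Bool | No | false |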
📘

SIP URI format: username@host:port, for example “sip:[email protected]:5071”.
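The parameter table can be expressed as a small validation step before the configuration is submitted. This Python sketch is not a Gupshup SDK: `SipConfig`, `parse_sip_config` and `sip_uri` are hypothetical names, the numeric-port check is an added assumption, and the fallback number stands in for the Dialed/Dialer Number defaults listed in the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SipConfig:
    host: str
    port: str
    user: Optional[str] = None       # None: use the dialed number (table default)
    username: Optional[str] = None   # None: use the dialer number (table default)
    secret_key: str = ""             # blank string by default
    force_tcp: bool = False          # UDP unless TCP is forced


def parse_sip_config(entry: dict) -> SipConfig:
    """Validate one entry of the SIP metadata array against the parameter table."""
    for key in ("host", "port"):
        if not entry.get(key):
            raise ValueError(f"missing required field: {key}")
    port = str(entry["port"])
    # Assumption: a live config uses a numeric port (the docs show a masked "50X1").
    if not port.isdigit() or not 0 < int(port) < 65536:
        raise ValueError(f"invalid port: {port}")
    force_tcp = entry.get("force_tcp", False)
    if not isinstance(force_tcp, bool):
        raise ValueError("force_tcp must be a boolean")
    return SipConfig(
        host=entry["host"],
        port=port,
        user=entry.get("user"),
        username=entry.get("username"),
        secret_key=entry.get("secret_key", ""),
        force_tcp=force_tcp,
    )


def sip_uri(cfg: SipConfig, fallback_username: str) -> str:
    """Build sip:username@host:port, falling back to a number from the call."""
    return f"sip:{cfg.username or fallback_username}@{cfg.host}:{cfg.port}"
```

Since the metadata is a JSON array, each element would be passed through `parse_sip_config` individually.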



SIP INVITE Flow (For Incoming Calls)

 +--------+         +----------+         +--------+
 | Caller |         |  SIP     |         | Callee |
 | Agent  |         |  Server  |         | Agent  |
 +--------+         +----------+         +--------+
     |                  |                     |
     |---- INVITE ----->|                     |
     |<-- 100 Trying ---|                     |
     |                  |---- INVITE -------> |
     |                  |<-- 180 Ringing ---- |
     |<-- 180 Ringing --|                     |

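SIP signalling uses Symmetric Response Routing (RFC 3581): the INVITE's Via header carries the rport parameter, so responses must be sent back to the source IP and port the request arrived from.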
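On the receiving side, RFC 3581 works like this: when the top Via carries a bare `rport`, the server fills in `rport` with the request's source port, adds `received` with the source IP, and sends the response to that IP and port. A Python sketch of that step follows; the function name is hypothetical, and normal RFC 3261 response routing for requests without `rport` is not shown.

```python
def apply_rport(via: str, src_ip: str, src_port: int):
    """Apply RFC 3581 to the top Via of a received request.

    Returns (updated_via, response_destination). The destination is None when
    the Via has no bare 'rport', in which case normal RFC 3261 routing applies.
    """
    first, *params = [p.strip() for p in via.split(";")]
    if "rport" not in params:
        return via, None
    updated = []
    for p in params:
        if p == "rport":
            updated.append(f"rport={src_port}")  # fill in the observed source port
        elif not p.startswith("received="):
            updated.append(p)
    # 'received' is added even when it equals the sent-by address.
    updated.append(f"received={src_ip}")
    return ";".join([first, *updated]), (src_ip, src_port)
```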


Sample SIP INVITE Payload

INVITE sip:[email protected]:5071 SIP/2.0
Via: SIP/2.0/UDP XXX.XXX.XXX.XXX:XXXXX;rport;branch=z9hG4bKXXXXXXXXXXX
Max-Forwards: 70
From: "+91XXXXXXXXXX" <sip:[email protected]:5071>;tag=XXXXXX
To: <sip:[email protected]:5071>
Call-ID: wacid.XXXXXXXXXXXXXXXXXX
CSeq: XXXXXXXXX INVITE
Contact: <sip:[email protected]:XXXXX;transport=udp>
User-Agent: Gupshup WebRTC Gateway
Allow: INVITE, ACK, BYE, CANCEL, OPTIONS, REFER, MESSAGE, INFO, NOTIFY
Supported: replaces
Content-Type: application/sdp
Content-Disposition: session
Content-Length: 451
X-WV-CALLID: wacid.XXXXXXXXXXXXXXX
X-WV-FROM: +91XXXXXXXXXX
X-WV-TO: +91XXXXXXXX

v=0
o=- XXXXXXX IN IP4 XXX.XXX.XXX.XXX
s=-
t=0 0
a=ice-lite
m=audio XXXXX RTP/AVP 111 126
c=IN IP4 XXX.XXX.XXX.XXX
a=rtpmap:111 opus/48000/2
a=fmtp:111 maxaveragebitrate=20000;maxplaybackrate=16000;minptime=20;sprop-maxcapturerate=16000;useinbandfec=1
a=rtpmap:126 telephone-event/8000
a=mid:audio
a=msid:XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX WhatsAppTrack1
a=rtcp-fb:111 transport-cc
a=maxptime:20
a=ptime:20
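A partner endpoint typically needs to pull the X-WV-* headers and the offered codecs out of an INVITE like the one above. The following sketch assumes the message has already been received as text, with headers and SDP separated by a blank line as RFC 3261 requires. `parse_invite` and `summarize_offer` are hypothetical helpers, and a repeated header (such as multiple Via lines) keeps only its last value.

```python
def parse_invite(raw: str):
    """Split a SIP request into (request_line, headers, body).

    Header names are lower-cased; a repeated header keeps its last value.
    """
    head, _, body = raw.replace("\r\n", "\n").partition("\n\n")
    lines = head.split("\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


def summarize_offer(sdp: str) -> dict:
    """Collect the codec map and media address from an SDP offer."""
    summary = {"rtpmap": {}, "media_ip": None, "media_port": None}
    for line in sdp.split("\n"):
        line = line.strip()
        if line.startswith("a=rtpmap:"):
            pt, _, encoding = line[len("a=rtpmap:"):].partition(" ")
            summary["rtpmap"][int(pt)] = encoding
        elif line.startswith("c=IN IP4 "):
            summary["media_ip"] = line.split()[-1]
        elif line.startswith("m=audio "):
            summary["media_port"] = int(line.split()[1])
    encodings = [e.lower() for e in summary["rtpmap"].values()]
    summary["has_opus_48k"] = any(e.startswith("opus/48000") for e in encodings)
    summary["has_dtmf"] = any(e.startswith("telephone-event/8000") for e in encodings)
    return summary
```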

Media & RTP Requirements

  • Codec: Opus at 48000 Hz. Per RFC 7587, the Opus RTP clock rate is always signalled as 48000 in SDP (a=rtpmap:111 opus/48000/2), whatever the actual audio bandwidth.
    • Partners using Twilio may run into issues, because Twilio's default codec support is G.711.
    • The fmtp line may request lower-bandwidth operation (maxaveragebitrate=20000;maxplaybackrate=16000;minptime=20;sprop-maxcapturerate=16000;useinbandfec=1), but the rtpmap must still declare opus/48000.
  • DTMF is carried in the RTP stream as telephone events.
    • RFC 4733; a=rtpmap:126 telephone-event/8000
  • Symmetric RTP (RFC 4961)
    • Gupshup sends and receives RTP using symmetric RTP, and the partner's infrastructure must allow it.
    • RTP packets are expected to be sent from the same IP/port combination that was negotiated in SDP.
    • This keeps media flowing through the same NAT and firewall bindings in both directions.
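To make the DTMF and symmetric RTP points concrete, the sketch below parses an RTP packet header (RFC 3550), decodes an RFC 4733 telephone-event payload, and compares a packet's source address with the one negotiated in SDP. Payload type 126 comes from the sample SDP; the function names are hypothetical, and socket handling is left out.

```python
import struct

# RFC 4733 DTMF event codes 0-15: digits, then *, #, A-D.
DTMF_EVENTS = "0123456789*#ABCD"


def parse_rtp(packet: bytes):
    """Parse an RTP packet (RFC 3550); return (payload_type, seq, timestamp, payload)."""
    if len(packet) < 12:
        raise ValueError("packet too short for RTP")
    b0, b1, seq, timestamp, _ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    offset = 12 + 4 * (b0 & 0x0F)          # skip CSRC identifiers
    if b0 & 0x10:                          # header extension, if present
        _profile, words = struct.unpack("!HH", packet[offset:offset + 4])
        offset += 4 + 4 * words
    payload = packet[offset:]
    if b0 & 0x20 and payload:              # padding: last byte is the pad length
        payload = payload[:-payload[-1]]
    return b1 & 0x7F, seq, timestamp, payload


def parse_dtmf(payload: bytes):
    """Decode an RFC 4733 telephone-event payload: (digit, end, volume, duration)."""
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    digit = DTMF_EVENTS[event] if event < len(DTMF_EVENTS) else None
    return digit, bool(flags & 0x80), flags & 0x3F, duration


def is_symmetric_source(src: tuple, negotiated: tuple) -> bool:
    """Symmetric RTP (RFC 4961): media should come from the SDP-negotiated IP/port."""
    return src == negotiated
```

A receiver would call `parse_dtmf` only for packets whose payload type is 126, and treat a False result from `is_symmetric_source` as a sign of a NAT or routing problem on the media path.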

Billing Events API (Sample Payload)

{
  "call": {
    "id": "wacid.XXXXXXXXXXXXX",
    "to": "16315XXX601",
    "from": "16315XX3602",
    "event": "terminate",
    "direction": "incoming",
    "timestamp": 1671644824,
    "status": "Completed",
    "start_time": 1671644824,
    "end_time": 1671644944,
    "duration": 120
  },
  "phone": "780XXX7021",
  "conversationType": "CALL",
  "isGsBilling": true,
  "appId": "269a4153-XXX-4590-XXXX-6e2e937836e4",
  "billable": true
}
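A consumer of this payload might map it onto a typed record. The sketch assumes the JSON arrives as a string (how it is delivered is not covered here), uses only fields present in the sample, and treats duration == end_time - start_time as a consistency check because it holds in the sample, not because it is a documented guarantee. `CallBillingEvent` and `parse_billing_event` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class CallBillingEvent:
    call_id: str
    event: str
    direction: str
    status: str
    duration: int
    billable: bool
    timestamps_consistent: bool


def parse_billing_event(raw: str) -> CallBillingEvent:
    """Map a billing event payload onto a typed record."""
    data = json.loads(raw)
    call = data["call"]
    return CallBillingEvent(
        call_id=call["id"],
        event=call["event"],
        direction=call["direction"],
        status=call["status"],
        duration=int(call["duration"]),
        billable=bool(data.get("billable", False)),
        # Assumption: duration equals end_time - start_time, as in the sample.
        timestamps_consistent=call["end_time"] - call["start_time"] == call["duration"],
    )
```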

WhatsApp Media Flow

A common media-setup issue is that the WhatsApp user hears the business audio with a 1–2 second delay. For example, if an IVR plays “1-2-3”, the user may hear it 1–2 seconds after answering the call.

To reduce this delay, we expect the customer system (partner SIP) to send the Answer SDP (200 OK) as quickly as possible after receiving the initial INVITE.

Gupshup uses an “optimistic accept” strategy: media setup starts before the 200 OK is received, so by the time the 200 OK arrives the media path is already ready and audio starts immediately.

This behaviour follows RFC 8839.

Without Optimistic Accept

+---------+           +----------+           +---------+
| WhatsApp|           | Gupshup  |           | Partner |
| Agent   |           | Server   |           |  SIP    |
+---------+           +----------+           +---------+
     |                     |                     |
     | INCOMING CALL       |                     |
     |-------------------->|  INVITE             |
     |                     | ------------------->|
     |                     |                     |
     |                     | 100 Trying          |
     |                     |<--------------------|
     |    Initialize MEDIA |                     |
     |<--------------------|                     |
     |--------MEDIA------->|                     |  // Call timer starts
     |                     |                     |  // but no audio yet
     |                     |    (late) 200OK     |
     |                     |<--------------------|
     |    START MEDIA      |                     |
     |<--------------------|                     |
     |                     |                     |
     |                     | -------MEDIA------->|

With Optimistic Accept (Recommended)

+---------+           +----------+           +---------+
| WhatsApp|           | Gupshup  |           | Partner |
| Agent   |           | Server   |           |  SIP    |
+---------+           +----------+           +---------+
     |                     |                     |
     | INCOMING CALL       |                     |
     |-------------------->|  INVITE             |
     |                     | ------------------->|
     |                     |                     |
     |    Initialize MEDIA |                     |
     |<--------------------|                     |
     |                     |                     |
     |                     |   (early) 200OK     |
     |                     |<--------------------|
     |    START MEDIA      |                     |
     |<--------------------|                     |
     |--------MEDIA------->|                     | // No delay
     |                     | -------MEDIA------->|
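On the partner side, the practical takeaway is to answer with 200 OK as soon as the INVITE is processed, before starting any IVR logic. The sketch below builds a minimal 200 OK following RFC 3261 §8.2.6.2 (copy Via, From, Call-ID and CSeq; add a tag to To) and also copies Record-Route, as dialog-creating responses do. `build_200_ok`, the Contact URI and the SDP answer passed in are hypothetical; the answer must select opus/48000, and compact header forms are not handled.

```python
def build_200_ok(invite: str, to_tag: str, contact_uri: str, sdp_answer: str) -> str:
    """Build a minimal 200 OK for a received INVITE (RFC 3261 section 8.2.6.2)."""
    head = invite.replace("\r\n", "\n").split("\n\n", 1)[0]
    copied = []
    for line in head.split("\n")[1:]:
        name = line.split(":", 1)[0].strip().lower()
        if name in ("via", "record-route", "from", "call-id", "cseq"):
            copied.append(line)  # echoed unchanged, in the original order
        elif name == "to":
            # The answering side adds its own tag to To, unless one is present.
            copied.append(line if ";tag=" in line else f"{line};tag={to_tag}")
    body = sdp_answer.replace("\r\n", "\n").replace("\n", "\r\n")
    return "\r\n".join([
        "SIP/2.0 200 OK",
        *copied,
        f"Contact: <{contact_uri}>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(body.encode('utf-8'))}",
        "",
        body,
    ])
```

Sending this response first and only then starting IVR playback matches the “early 200 OK” path in the second diagram.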

Media IPs

Media will originate from these IPs:
- 166.117.124.43
- 76.223.74.4

Voice Analytics

Voice analytics are available on the Partner Analytics Dashboard. Only incoming call events and their duration (call minutes) are shown.

📘

The caller can be from any region, but latency may vary based on the caller’s location.

📘

Meta does not charge for inbound voice calls.

📘

Gupshup supports up to 100 concurrent calls per WABA (WhatsApp Business Account).

Outgoing Calls (Business-Initiated)

Meta also offers tier-based pricing for volumes above 50,000 call minutes per month. Refer to Meta's published WhatsApp voice calling rates for current pricing.