The MessageGateway Engine will bring voice to text-based chatbots. The Gateway accepts SIP VoIP calls and communicates with a chatbot over a JSON/REST interface. It also communicates with Google over gRPC for Speech to Text and with Amazon Polly over JSON/REST for Text to Speech.
Interfaces:
SIP Interface - SIP signalling interface.
RTP Interface - G.711 and G.722 RTP media interface.
JSON/REST API - Chatbot textual interface.
Speech to Text - Google gRPC streaming recognition interface.
Text to Speech - Amazon Polly TTS interface over JSON/REST.
ChatAPI Fields:
Version: Version of the API
Call-ID: SIP Call-ID
Orig/Term: Originating and Terminating numbers
User-Agent: Device and Version
Voice: The Amazon TTS Voice to use.
Text: The Message
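To illustrate the fields above, here is a minimal sketch of how one ChatAPI message might be assembled as JSON. The key names simply reuse the field names listed above, Orig/Term is split into two keys, and the function name, the version string "1.0" and the rule that Voice and Text are optional are all assumptions made for illustration, not part of this specification.

```python
import json


def build_chat_message(call_id, orig, term, user_agent,
                       voice=None, text=None, version="1.0"):
    """Assemble one ChatAPI message as a JSON string (hypothetical layout)."""
    message = {
        "Version": version,        # Version of the API (value is assumed)
        "Call-ID": call_id,        # SIP Call-ID
        "Orig": orig,              # Originating number
        "Term": term,              # Terminating number
        "User-Agent": user_agent,  # Device and version
    }
    # Voice and Text are only included when supplied, so the same
    # helper can build the session-start ping (no Text yet).
    if voice is not None:
        message["Voice"] = voice
    if text is not None:
        message["Text"] = text
    return json.dumps(message)


if __name__ == "__main__":
    print(build_chat_message("abc123@gateway", "15551230001", "15551230002",
                             "ExamplePhone/1.0", text="What time is it?"))
```

The same helper covers both directions: the Gateway would omit Voice when sending recognized text, and the chatbot would set both Voice and Text in its reply.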
Suggested Third-party Libraries:
libFAAD
libFAAC
libjrtp
libgrpc
libprotobuf
openssl (required for libgrpc)
reSIProcate
Boost
libevent
Other third-party libraries may be used as long as they are not GPL-licensed.
Operation:
1) Incoming SIP call to Gateway
2) Gateway sends chatbot a ping message with Call-ID, Orig/Term numbers, and User-Agent; this lets the chatbot know that a new session is starting.
3) 2-way media is set up between SIP User and Gateway
4) Incoming media is streamed to Google over gRPC
5) Chatbot sends "Hello" in Text: with Voice: set to the desired Amazon Polly Voice.
6) Gateway sends "Hello" text to Amazon Polly for TTS
7) Response from Amazon Polly contains "Hello" speech
8) "Hello" speech is sent to SIP User
9) "What time is it?" speech comes in from SIP User
10) Google responds with Text "What time is it?"
11) Text: "What time is it?" is sent to chatbot
12) Chatbot sends Text: "7:28 PM" to Gateway
13) Gateway sends "7:28 PM" text to Amazon Polly for TTS
14) Response from Amazon Polly contains "7:28 PM" speech
15) "7:28 PM" speech is sent to SIP User
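
The call flow above can be sketched as a small simulation. This is only an illustration of the message ordering: SIP, RTP, Google Speech to Text and Amazon Polly are replaced by in-process stubs, and every class and function name here is hypothetical. It also assumes the chatbot returns its "Hello" greeting as the reply to the session-start ping, which the steps above do not state explicitly. "Joanna" is used as an example Polly voice.

```python
def stub_speech_to_text(audio: bytes) -> str:
    """Stands in for Google streaming recognition: the 'audio' is just UTF-8 text."""
    return audio.decode("utf-8")


def stub_text_to_speech(text: str, voice: str) -> bytes:
    """Stands in for Amazon Polly: returns tagged bytes instead of real audio."""
    return f"speech[{voice}]:{text}".encode("utf-8")


class StubChatbot:
    """Stands in for the chatbot's JSON/REST endpoint."""

    def __init__(self, voice="Joanna"):
        self.voice = voice
        self.replies = {"What time is it?": "7:28 PM"}

    def on_session_start(self, ping):
        # Step 5: the chatbot greets the caller and selects the TTS voice.
        return {"Voice": self.voice, "Text": "Hello"}

    def on_text(self, message):
        # Step 12: the chatbot answers the recognized text.
        answer = self.replies.get(message["Text"], "Sorry?")
        return {"Voice": self.voice, "Text": answer}


class Gateway:
    """Glue between caller media, speech services and the chatbot."""

    def __init__(self, chatbot, speech_to_text, text_to_speech):
        self.chatbot = chatbot
        self.speech_to_text = speech_to_text
        self.text_to_speech = text_to_speech
        self.session = {}
        self.sent_to_caller = []  # speech "played" to the SIP user

    def on_incoming_call(self, call_id, orig, term, user_agent):
        # Steps 1-2: remember the session and ping the chatbot.
        self.session = {"Call-ID": call_id, "Orig": orig,
                        "Term": term, "User-Agent": user_agent}
        reply = self.chatbot.on_session_start(dict(self.session))
        self._speak(reply)  # steps 6-8

    def on_caller_audio(self, audio):
        # Steps 9-11: recognize the speech and forward the text to the chatbot.
        text = self.speech_to_text(audio)
        reply = self.chatbot.on_text(dict(self.session, Text=text))
        self._speak(reply)  # steps 13-15

    def _speak(self, reply):
        speech = self.text_to_speech(reply["Text"], reply["Voice"])
        self.sent_to_caller.append(speech)


if __name__ == "__main__":
    gateway = Gateway(StubChatbot(), stub_speech_to_text, stub_text_to_speech)
    gateway.on_incoming_call("abc123@gateway", "15551230001",
                             "15551230002", "ExamplePhone/1.0")
    gateway.on_caller_audio(b"What time is it?")
    for chunk in gateway.sent_to_caller:
        print(chunk)
```

Passing the speech services in as plain callables keeps the sequencing logic separate from the network clients, so the real gRPC and REST calls could be substituted later without changing the flow.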