Internet Engineering Task Force                                   SIP WG
Internet Draft                                     Rosenberg,Mataga,Ladd
draft-rosenberg-sip-vxml-00.txt                              dynamicsoft
July 13, 2001
Expires: February 2002

               A SIP Interface to VoiceXML Dialog Servers

STATUS OF THIS MEMO

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   To view the list Internet-Draft Shadow Directories, see
   http://www.ietf.org/shadow.html.

Abstract

   VoiceXML is an XML based scripting language for describing voice
   dialogs. VoiceXML interpreters run within an interpreter context
   that, among other tasks, provides a call control interface for
   accessing the interpreter. It is very natural to provide a
   VoIP-based interpreter context that uses SIP and RTP to communicate
   with the outside world. In this document, we provide detailed
   specifications for a SIP/RTP based interpreter context.

1 Introduction

   VoiceXML [1] is an XML based scripting language for describing voice
   dialogs. It supports user input through speech recognition and DTMF,
   and can communicate with the user through text-to-speech or recorded
   files. VoiceXML scripts are interpreted by a VoiceXML interpreter.
   This interpreter, in turn, runs within an interpreter context. The
   interpreter context is the interface between the outside world and
   the interpreter.
   It typically handles the mechanisms by which the script execution
   begins, and by which it is fed media to drive it. It also provides
   the means for fetching documents from some form of document server.

   It is very natural to provide a VoiceXML interpreter context based
   purely on IP: specifically, one based on VoIP using SIP [2] and
   RTP [3], along with HTTP for document access. An incoming VoIP call
   triggers the execution of the script, which is fetched from a server
   using HTTP. The incoming RTP stream for the call is passed to the
   interpreter for processing, and speech generated by the interpreter
   is sent over RTP to the called party.

   We call a pure IP-based VoiceXML system an "IP dialog server", or
   just "dialog server". Dialog servers are a key part of the
   application story for SIP-based networks, as described in the SIP
   application component architecture [4]. That document describes
   SIP-based dialog servers, and provides a high level overview of how
   the SIP interface works. This document provides a stand-alone,
   self-contained, more thorough description of a SIP-based VoIP
   VoiceXML interpreter context.

2 Script Initiation

   The script execution begins when a session is established using an
   INVITE request.

2.1 Script Naming

   In SIP, the Request-URI identifies the user or service that the call
   is destined for. In the case of a dialog server, the dialog itself
   is the target for the call. As such, the Request-URI should contain
   the identifier for this dialog. This is consistent with the
   Request-URI service invocation model of RFC 3087 [5].

   The Request-URI can be in one of two formats. In the first, the
   VoiceXML script is identified directly by an HTTP URL. In the
   second, the script is not specified. Rather, the dialog server uses
   its configuration to map the incoming request to a specific script.

   The format for the Request-URI in either case is:

      Request-URI     = "sip:" service-ID "." dialog-type
                        ["." dialog-specific] "@" hostport
                        url-parameters [headers]
      service-ID      = "dialog" | extension-token
      dialog-type     = "vxml" | service-token
      dialog-specific = vxml-specific | service-token
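
   As an illustration only, the Request-URI format above can be split
   into its components with a few lines of Python. This is a minimal
   sketch under stated assumptions, not part of the specification: the
   names DialogRequestURI and parse_dialog_request_uri are hypothetical,
   the dialog-specific part is treated as an opaque string because its
   vxml-specific syntax is not defined at this point, and url-parameters
   and headers are returned unparsed. The host name in the usage note is
   a placeholder.

```python
from typing import NamedTuple, Optional


class DialogRequestURI(NamedTuple):
    """Components of a dialog server Request-URI (hypothetical helper type)."""
    service_id: str                 # e.g. "dialog"
    dialog_type: str                # e.g. "vxml"
    dialog_specific: Optional[str]  # opaque here; None when absent
    hostport: str                   # host with optional ":port"
    rest: str                       # url-parameters and headers, left unparsed


def parse_dialog_request_uri(uri: str) -> DialogRequestURI:
    """Split a Request-URI of the form
    sip:service-ID.dialog-type[.dialog-specific]@hostport...
    Assumption: service-ID and dialog-type contain no "." characters.
    """
    scheme, colon, remainder = uri.partition(":")
    if not colon or scheme.lower() != "sip":
        raise ValueError("not a sip: URI")

    user, at, after_at = remainder.partition("@")
    if not at:
        raise ValueError("missing '@' before hostport")

    # Split at most twice so that any further dots stay inside the
    # opaque dialog-specific part.
    parts = user.split(".", 2)
    if len(parts) < 2 or not all(parts):
        raise ValueError("expected service-ID '.' dialog-type in user part")
    service_id, dialog_type = parts[0], parts[1]
    dialog_specific = parts[2] if len(parts) == 3 else None

    # hostport ends where url-parameters (";") or headers ("?") begin.
    cut = len(after_at)
    for marker in (";", "?"):
        idx = after_at.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    hostport, rest = after_at[:cut], after_at[cut:]
    if not hostport:
        raise ValueError("empty hostport")

    return DialogRequestURI(service_id, dialog_type, dialog_specific,
                            hostport, rest)
```

   For example, parse_dialog_request_uri("sip:dialog.vxml@vxmlserver.example.com")
   yields service_id "dialog" and dialog_type "vxml" with no
   dialog-specific part, which corresponds to the second format, where
   the dialog server selects the script from its own configuration.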