Features and User Guide: WinRTP

Overview of this Document

Introduction

What's New

Features

Components and Installation

Source Code Distribution
Binary Distribution
Installation
Test Program

Interface Description/Programmer’s Guide

WINRTP COM Interface
Details of WINRTP Interface (Interface ICCNMediaTerm)
COM GUIDs, etc

Configurable Parameters

Static/Dynamic Jitter Buffer
Jitter Buffer Length
Type of Service Value (DiffServ Byte) of Outgoing RTP Packets
Fixed Transmit Port for UDP Packets
Pre-emphasis of Transmitted Audio
Post-emphasis of Received Audio
Volume Limiting

Sample Code

Future Improvements

Overview of this Document

This document describes WinRTP for the Cisco IP SoftPhone from a programmer’s point of view. It covers the COM interface that WinRTP provides, as well as installation and configuration of the component.

Introduction

WinRTP (written WINRTP in the rest of this document) was developed as part of the Cisco IP SoftPhone product. Cisco IP SoftPhone is a PC-based telephone that is integrated with AVVID and works with the Cisco Call Manager. The primary focus of the WINRTP is to work well with other AVVID products, including desktop IP phones, gateways, etc. It can also be used as an independent component.

The basic job of the WINRTP is to source and sink audio streams to/from the network. So if an application needs the ability to do audio endpointing for real time voice (especially one that is integrated with Cisco’s AVVID), it can use this component.

What's New

Here are the improvements since the last version

Features

WINRTP consists of two independent parts. One part captures the user’s voice (using the system’s microphone), encodes it, and sends it as an RTP (Real-time Transport Protocol) stream to a configurable destination. The other part listens for an RTP stream from the network, extracts the audio from it, and plays it using the PC’s speaker. When both parts are used together, WINRTP acts like an IP-based voice endpoint. Here are the features in short

Components and Installation

The distribution comes with both source code and binaries. Extract the ZIP file to obtain everything. It will create a directory called WinRTP which will contain everything. 

Source Code Distribution

Test Program

There is also a test program (both source code and binary) that is available. It is a simple program that does not exercise all the features. It just connects your default microphone to your default speaker for 5 seconds so that you can hear yourself, and then exits. The source code is in the “WinRTP/TestWinRTP” folder and the binary is WinRTP/TestWinRTP.exe.

Interface Description/Programmer’s Guide

The WINRTP’s main DLL is CCNSMT.dll which exposes a COM interface that can be used to make calls to the WINRTP.

WINRTP COM Interface

The interface of WINRTP (ICCNMediaTerm) consists of the following functions

HRESULT Initialize() 
HRESULT UnInitialize() 
HRESULT StartMicrophone() - not implemented/needed in this version. 
HRESULT StopMicrophone() - not implemented/needed in this version. 
HRESULT StartAudioReceive() - not implemented/needed in this version. 
HRESULT StopAudioReceive() - not implemented/needed in this version. 
HRESULT SetAudioCodecRX(
                      [in] long CompressionType,
                      [in] long MillisecPacketSize, 
                      [in] long EchoCancellationValue,
                      [in] long G723BitRate
                      ) 
HRESULT SetAudioCodecTX(
                      [in] long CompressionType,
                      [in] long MillisecPacketSize,
                      [in] long PrecedenceValue,
                      [in] long SilenceSuppression,
                      [in] unsigned short MaxFramesPerPacket,
                      [in] long G723BitRate
                      ) 
HRESULT SetAudioDestination(
                      [in] BSTR strHostName,
                      [in] long nUDPortNumber
                      ) 
HRESULT SetAudioReceivePort(
                      [in] long nUDPPortNumber
                      ) 
HRESULT StartPlayingFileTX(
                      [in] BSTR Filename,
                      [in] unsigned long Mode,
                      [in] unsigned long StartPosition,
                      [in] unsigned long StopPosition,
                      [in, out] long * Cookie
                      ) 
HRESULT StartPlayingFileRX(
                      [in] BSTR Filename,
                      [in] unsigned long Mode,
                      [in] unsigned long StartPosition,
                      [in] unsigned long StopPosition,
                      [in] unsigned long waveoutDeviceID,
                      [in, out] long * Cookie
                      ) 
HRESULT StopPlayingFileTX(
                      [in] unsigned long Cookie
                      ) 
HRESULT StopPlayingFileRX(
                      [in] unsigned long Cookie
                      ) 
HRESULT StartTX(unsigned long waveinDeviceID) 
HRESULT StopTX() 
HRESULT StartRX(unsigned long waveoutDeviceID) 
HRESULT StopRX() 
HRESULT SetSpeakerVolume(
                      [in] unsigned long volume
                      ) 
HRESULT SetMicrophoneVolume(
                      [in] unsigned long volume
                      ) 
HRESULT SetFilePlayVolume(
                      [in] unsigned long cookie,
                      [in] unsigned long volume
                      ) 

Events From WINRTP

WINRTP not only exposes a COM interface; it can also fire events to the component that is using WINRTP. This is done through the standard Connection Point mechanism. For information on connection points, read a book on COM and ATL (Active Template Library). The basic idea is that WINRTP describes a COM interface for receiving events. If a component implements that COM interface, it can subscribe itself as a listener of events generated by the WINRTP.

The events interface (ICCNMediaTermEvents) is as follows

HRESULT EndOfFileEventRX(
                      [in] long Cookie
                      )
HRESULT EndOfFileEventTX(
                      [in] long Cookie
                      )

Details of WINRTP Interface (Interface ICCNMediaTerm)

All methods in the interface return an HRESULT value. If a method succeeds, it returns 0; otherwise it returns a negative number. The return values are still changing, so the recommended way to debug a function failure is to use the trace mechanism (i.e. turn on tracing for the WINRTP and look at the trace file, which includes a description of the error that caused the negative return value). If problems persist, contact the developer of WINRTP for details/help. Important return values are discussed for some functions, but not for all.
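
The return-value convention above can be wrapped in a small helper so that every call is checked in one place. The following is only a sketch: HResult is a stand-in for the Windows HRESULT type so that the code compiles without windows.h, and callSucceeded and checkCall are hypothetical names that are not part of WINRTP.

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Stand-in for the Windows HRESULT type (a 32-bit signed integer),
// so this sketch compiles without <windows.h>.
using HResult = std::int32_t;

// Convention described above: 0 means success, a negative value means failure.
inline bool callSucceeded(HResult hr) { return hr >= 0; }

// Hypothetical helper: throws with the method name and the raw code in hex,
// which can then be matched against the entries in the trace file.
inline void checkCall(const char* method, HResult hr) {
    if (callSucceeded(hr)) return;
    std::ostringstream msg;
    msg << method << " failed, HRESULT=0x" << std::hex << std::uppercase
        << static_cast<std::uint32_t>(hr);
    throw std::runtime_error(msg.str());
}
```

With the real component, a call could then be written as checkCall("StartRX", pICCNMediaTerm->StartRX(-1)), assuming pICCNMediaTerm is an ICCNMediaTerm pointer.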

HRESULT Initialize()

Description:

Initializes the WINRTP. Instantiates all components. Also sets up default codecs using the following calls:

This function must be called before any other calls to WINRTP

Parameters:

None

HRESULT UnInitialize()

Description:

Uninitializes the WINRTP and releases all allocated resources. This must be the last call made to the WINRTP

Parameters:

None

HRESULT StartMicrophone()

Description:

Not implemented/needed in this version

Parameters:

None

HRESULT StopMicrophone()

Description:

Not implemented/needed in this version

Parameters:

None

HRESULT StartAudioReceive()

Description:

Not implemented/needed in this version

Parameters:

None

HRESULT StopAudioReceive()

Description:

Not implemented/needed in this version

Parameters:

None

HRESULT SetAudioCodecRX

Description:

Call this function to inform WINRTP of the audio codec used to encode the incoming RTP stream. This function should be called before StartRX is called (so if audio is already being received, you may need to call StopRX before making this call). If called before StartRX, it sets the codec for the next invocation of StartRX. If it is called while receiving audio (i.e. after StartRX), it may return an error.

Parameters:

[in] long CompressionType: The following values are supported

[in] long MillisecPacketSize: Specifies the length of audio, in milliseconds, in each incoming RTP audio packet

[in] long EchoCancellationValue: Ignored. Put any value here. Echo cancellation is not supported in the WINRTP

[in] long G723BitRate: Ignored
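
To get a feel for what a MillisecPacketSize value means in practice, the sketch below converts a packet length in milliseconds into samples and payload bytes. The 8000 Hz sample rate and the 8 bits per sample (as in G.711) are assumptions made for illustration; this document does not state them, and the function names are hypothetical.

```cpp
#include <cstdint>

// Number of audio samples carried by one packet of `packetMs` milliseconds.
// The default of 8000 Hz is an assumption (narrowband telephony audio).
constexpr std::uint32_t samplesPerPacket(std::uint32_t packetMs,
                                         std::uint32_t sampleRateHz = 8000) {
    return sampleRateHz * packetMs / 1000;
}

// Payload bytes per packet for a codec that uses `bitsPerSample` bits per
// sample. The default of 8 bits (G.711-style) is an assumption.
constexpr std::uint32_t payloadBytesPerPacket(std::uint32_t packetMs,
                                              std::uint32_t bitsPerSample = 8,
                                              std::uint32_t sampleRateHz = 8000) {
    return samplesPerPacket(packetMs, sampleRateHz) * bitsPerSample / 8;
}
```

Under these assumptions, a MillisecPacketSize of 20 corresponds to 160 samples and 160 payload bytes per packet.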

HRESULT SetAudioCodecTX

Description:

Sets the audio codec for the transmit stream (outgoing stream). Should be called while NOT streaming (i.e. before StartTX/after StopTX)

Parameters:

[in] long CompressionType: See SetAudioCodecRX

[in] long MillisecPacketSize: See SetAudioCodecRX

[in] long PrecedenceValue: Ignored

[in] long SilenceSuppression: Specifies whether to do silence suppression in the transmit stream

[in] unsigned short MaxFramesPerPacket: Ignored

[in] long G723BitRate: See SetAudioCodecRX

HRESULT SetAudioDestination

Description:

Sets the destination [IP Address, UDP Port] where the send side audio stream should be transmitted. Must be called while not streaming (i.e. before StartTX/after StopTX).

Parameters:

[in] BSTR strHostName: IP address of the destination. E.g. “171.69.12.34”

[in] long nUDPortNumber: UDP port number to which the stream is sent

HRESULT SetAudioReceivePort

Description:

Informs the WINRTP of the UDP port number where it should listen for the incoming RTP audio stream. Note: StartRX must be called before any audio from the incoming stream is played to the speaker.

Parameters:

[in] long nUDPPortNumber: UDP port number

HRESULT StartPlayingFileTX

Description:

This method should be used when a WAV file needs to be transmitted. The audio from the file is mixed in with the outgoing audio stream (the user’s voice). The WINRTP fires an event to let the caller know when the file has finished playing, so that another file may be played. When the file finishes playing, the WINRTP automatically calls StopPlayingFileTX, so the caller need not call it. Only one file may be playing at a time. If this function is called while another file is already playing, an error is returned and the original file keeps playing. The function returns a unique identifier (cookie) that may be used in later calls related to this file play (to set the volume, or to stop the file). This method can also play the file continuously in a loop. By default, files start playing at 25% volume.

Parameters:

[in] BSTR Filename: the location (path) of the file to be played

[in] unsigned long Mode: specifies whether to play the file once or in a loop

[in] unsigned long StartPosition: unimplemented/ignored

[in] unsigned long StopPosition: unimplemented/ignored

[in, out] long * Cookie: WINRTP returns a unique ID for this instance of the file being played. This cookie should be used in later calls pertaining to the instance of the file playing

HRESULT StartPlayingFileRX

Description:

This function starts mixing audio from the specified file into the received audio stream, so that the user hears audio from both the incoming audio stream and the file. It works like StartPlayingFileTX; the only difference is that two files can play simultaneously on the receive side instead of one. By default, files start playing at 25% volume.

Parameters:

Exactly the same as StartPlayingFileTX, with one extra parameter, [in] unsigned long waveoutDeviceID, which specifies which speaker device to play the file to. WinRTP now allows you to play the file using the wave/speaker device opened for audio (with StartRX) or another wave/speaker device. Sometimes it may be useful to play a file locally to another audio device (e.g. if you are using a USB headset for speech, you may want to play ring tones for incoming calls through the speakers connected to the sound card so that they are heard loudly). See StartRX for a discussion of waveoutDeviceID.

HRESULT StartTX

Description:

Starts streaming on the transmit side. This method must be called before StartPlayingFileTX is called. Calling this method starts transmitting the user’s voice

Parameters:

unsigned long waveinDeviceID: specifies which audio device to use for audio capture/recording. Device IDs are numbered 0 ... (# of recording devices - 1), and -1 means use the default recording device for Windows. See waveInOpen() and waveInGetDevCaps() in the Windows API. Because the parameter is unsigned, -1 here means (unsigned long) -1.
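
The remark about (unsigned long) -1 can be made concrete with a short sketch. DeviceId, kDefaultDevice and isUsableDeviceId are hypothetical names introduced here; std::uint32_t stands in for the 32-bit unsigned long used on Windows so that the result is the same on any platform.

```cpp
#include <cstdint>

// Windows' unsigned long is 32 bits wide; std::uint32_t stands in for it
// here so the sketch behaves the same on any platform.
using DeviceId = std::uint32_t;

// "-1" converted to an unsigned value: all bits set (0xFFFFFFFF).
constexpr DeviceId kDefaultDevice = static_cast<DeviceId>(-1);

// Hypothetical validity check: an ID is usable if it selects the default
// device or indexes one of the `deviceCount` installed devices.
constexpr bool isUsableDeviceId(DeviceId id, std::uint32_t deviceCount) {
    return id == kDefaultDevice || id < deviceCount;
}
```

On Windows, the device count for recording devices can be obtained from waveInGetNumDevs(), and the all-bits-set value matches the WAVE_MAPPER constant of the wave API.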

HRESULT StartRX

Description:

Sets up WINRTP to start the receive side. It also starts playing the received audio to the speaker. 

Parameters:

unsigned long waveoutDeviceID: specifies which audio device to use for playback/speaker. Device IDs are numbered 0 ... (# of playback devices - 1), and -1 means use the default playback device. See the waveOutOpen() and waveOutGetDevCaps() functions in the Windows API. Because the parameter is unsigned, -1 here means (unsigned long) -1.

HRESULT StopTX

Description:

Stops transmitting audio. Stops transmitting the user’s voice and files.

Parameters:

None

HRESULT StopRX

Description:

Stops receiving and playing audio. Stops playing the received audio stream and the files

Parameters:

None

HRESULT SetSpeakerVolume

Description:

Sets the speaker volume on the PC. This setting sets the WAVEOUT volume of the system (not the master volume).

Parameters:

[in] unsigned long volume: value between 0 and 100 where 0 = silence, and 100 = max volume. The scale is linear, so 50 = half volume
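
As an illustration of the linear 0 to 100 scale, the sketch below maps a percentage to the 32-bit value format used by the Windows waveOutSetVolume() call (left channel in the low 16 bits, right channel in the high 16 bits, 0xFFFF = full volume). Whether WINRTP performs exactly this mapping internally is an assumption, and percentToWaveOutVolume is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cstdint>

// Map a 0..100 volume to a packed waveOut-style volume value.
// Values above 100 are clamped to 100 (an assumption about desired behavior).
std::uint32_t percentToWaveOutVolume(unsigned long percent) {
    const std::uint32_t clamped =
        static_cast<std::uint32_t>(std::min<unsigned long>(percent, 100));
    const std::uint32_t level = clamped * 0xFFFFu / 100u;  // 0..0xFFFF, linear
    return (level << 16) | level;                           // right | left
}
```

With this mapping, 50 gives 0x7FFF on each channel, i.e. half of full scale.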

HRESULT SetMicrophoneVolume

Description:

Sets the microphone volume. This setting changes the PC’s microphone volume or audio capture volume.

Parameters:

[in] unsigned long volume: value between 0 and 100 where 0 = silence, and 100 = max volume. The scale is linear, so 50 = half volume

HRESULT SetFilePlayVolume

Description:

Sets the volume of a file being played by the WINRTP.

Parameters:

[in] unsigned long cookie: the cookie that pertains to this instance of the file play. The cookie is obtained when StartPlayingFileTX(or RX) is called

[in] unsigned long volume: Volume setting, from 0 (silence) to 100 (max volume)

HRESULT StopPlayingFileTX

Description:

Stops a file being played in the transmit side

Parameters:

[in] unsigned long Cookie: Cookie that was returned when the file started playing.

HRESULT StopPlayingFileRX

Description:

Stops a file being played in the receive side

Parameters:

[in] unsigned long Cookie: Cookie that was returned when the file started playing.

COM GUIDs, etc

The important GUIDs are

The following code snippet may be useful for more information

WINRTP Interface

// CCNSMT.idl : IDL source for CCNSMT.dll
//

// This file will be processed by the MIDL tool to
// produce the type library (CCNSMT.tlb) and marshalling code.

import "oaidl.idl";
import "ocidl.idl";
[
object,
uuid(94221C4D-00F1-11D4-9D59-0060B0FC246C),

helpstring("ICCNMediaTerm Interface"),
pointer_default(unique)
]
interface ICCNMediaTerm : IUnknown
{
[helpstring("method Initialize")]
HRESULT Initialize();
[helpstring("method UnInitialize")]
HRESULT UnInitialize();
[helpstring("method StartMicrophone")] 
HRESULT StartMicrophone();
[helpstring("method StopMicrophone")] 
HRESULT StopMicrophone();
[helpstring("method StartAudioReceive")] 
HRESULT StartAudioReceive();
[helpstring("method StopAudioReceive")] 
HRESULT StopAudioReceive();
[helpstring("method StopDtmfTone")] 
HRESULT StopDtmfTone();
[helpstring("method SetAudioCodecRX")] 
HRESULT SetAudioCodecRX([in] long CompressionType, [in] long MillisecPacketSize, [in] long EchoCancellationValue, [in] long G723BitRate);
[helpstring("method SetAudioCodecTX")] 
HRESULT SetAudioCodecTX([in] long CompressionType, [in] long MillisecPacketSize, [in] long PrecedenceValue, [in] long SilenceSuppression, [in] unsigned short MaxFramesPerPacket, [in] long G723BitRate);
[helpstring("method SetAudioDestination")] 
HRESULT SetAudioDestination([in] BSTR strHostName, [in] long nUDPortNumber);
[helpstring("method SetAudioReceivePort")] 
HRESULT SetAudioReceivePort([in] long nUDPPortNumber);
[helpstring("method StartDtmfTone")] 
HRESULT StartDtmfTone([in] long cToneAsChar, [in] long OnTime, [in] long OffTime);
[helpstring("method StartPlayingFileTX")] 
HRESULT StartPlayingFileTX([in] BSTR Filename, [in] unsigned long Mode, [in, out] long * Cookie);
[helpstring("method StartPlayingFileRX")] 
HRESULT StartPlayingFileRX([in] BSTR Filename, [in] unsigned long Mode, [in] unsigned long waveoutDeviceID, [in, out] long * Cookie);
[helpstring("method StopPlayingFileTX")] 
HRESULT StopPlayingFileTX([in] unsigned long Cookie);
[helpstring("method StopPlayingFileRX")] 
HRESULT StopPlayingFileRX([in] unsigned long Cookie);
[helpstring("method StartTX")] 
HRESULT StartTX([in] unsigned long waveinDeviceID);
[helpstring("method StopTX")] 
HRESULT StopTX();
[helpstring("method StartRX")] 
HRESULT StartRX([in] unsigned long waveoutDeviceID);
[helpstring("method StopRX")] 
HRESULT StopRX();
[helpstring("method SetSpeakerVolume")] 
HRESULT SetSpeakerVolume([in] unsigned long deviceID, [in] unsigned long volume);
[helpstring("method SetMicrophoneVolume")] 
HRESULT SetMicrophoneVolume([in] unsigned long deviceID, [in] unsigned long volume);
[helpstring("method SetFilePlayVolume")] 
HRESULT SetFilePlayVolume([in] unsigned long cookie, [in] unsigned long volume);
[helpstring("method NetworkMonitor")] 
HRESULT NetworkMonitor([in] unsigned long Enable, [in] unsigned long DurationMillisec);
};


[
uuid(94221C4F-00F1-11D4-9D59-0060B0FC246C),
helpstring("_ICCNMediaTermEvents Interface")
]
interface _ICCNMediaTermEvents : IUnknown
{
[helpstring("method EndOfFileEventRX")] 
HRESULT EndOfFileEventRX([in] long Cookie);
[helpstring("method EndOfFileEventTX")] 
HRESULT EndOfFileEventTX([in] long Cookie);
[helpstring("method NetworkMonitorEventRX")] 
HRESULT NetworkMonitorEventRX([in] double RXMean, [in] double RXVariance);
[helpstring("method NetworkMonitorEventTX")] 
HRESULT NetworkMonitorEventTX([in] double TXMean, [in] double TXVariance);
};


[
uuid(94221C40-00F1-11D4-9D59-0060B0FC246C),
version(1.0),
helpstring("CCNSMT 1.0 Type Library")
]
library CCNSMTLib
{
importlib("stdole32.tlb");
importlib("stdole2.tlb");

[
uuid(94221C4E-00F1-11D4-9D59-0060B0FC246C),
helpstring("CCNMediaTerm Class")
]
coclass CCNMediaTerm
{
[default] interface ICCNMediaTerm;
[default, source] interface _ICCNMediaTermEvents;
};
};

Type Library

[
uuid(94221C40-00F1-11D4-9D59-0060B0FC246C),
version(1.0),
helpstring("CCNSMT 1.0 Type Library")
]
library CCNSMTLib
{
importlib("stdole32.tlb");
importlib("stdole2.tlb");

[
uuid(94221C4E-00F1-11D4-9D59-0060B0FC246C),
helpstring("CCNMediaTerm Class")
]
coclass CCNMediaTerm
{
[default] interface ICCNMediaTerm;
[default, source] interface _ICCNMediaTermEvents;
};
};

Sample C++ Code

Using the type library generated while compiling WinRTP (CCNSMT.tlb), one can easily use WinRTP in code. Visual C++ 6.0 can import a type library with the #import directive, as the following sample code shows. Note that you cannot import WinRTP as a COM object into your project because it is NOT an ActiveX control, nor does it support IDispatch.

#import "../CCNMediaTerm/CCNSMT/CCNSMT.tlb" no_namespace, raw_interfaces_only

int main()
{
HRESULT hr;

// Initialize COM
hr = CoInitialize(NULL);

// Get Interface ICCNMediaTerm from the WinRTP COM Object using smart pointer defined by the #import command above. 
// Automatically calls IUnknown::AddRef();
ICCNMediaTermPtr pICCNMediaTerm(__uuidof(CCNMediaTerm));

// Initialize WinRTP. Must be the first call
pICCNMediaTerm->Initialize();

// Set parameters for receive side
pICCNMediaTerm->SetAudioCodecRX(4, 20, 0, 0);
pICCNMediaTerm->SetAudioReceivePort(8500);

// Set parameters for transmit side
pICCNMediaTerm->SetAudioCodecTX(4, 20, 0, 0, 0, 0);
// _bstr_t (from comdef.h, which #import pulls in) allocates a real BSTR for the call
pICCNMediaTerm->SetAudioDestination(_bstr_t(L"127.0.0.1"), 8500);

// Start reception side. we will use the default (-1) playback device
pICCNMediaTerm->StartRX(-1);

// Start transmit side. we will use the default (-1) recording device
pICCNMediaTerm->StartTX(-1);

// Set the speaker volume to 50%
pICCNMediaTerm->SetSpeakerVolume(-1, 50);

// Set the microphone volume to 50%
pICCNMediaTerm->SetMicrophoneVolume(-1, 50);

// Hear yourself for 5 seconds
Sleep(5000);

// Stop reception & transmission
pICCNMediaTerm->StopRX();
pICCNMediaTerm->StopTX();

// Uninitialize WinRTP. Must be the last call
pICCNMediaTerm->UnInitialize();
// Let go of the reference to the ICCNMediaTerm interface. Automatically calls IUnknown::Release()
pICCNMediaTerm = 0;

// Uninitialize COM
CoUninitialize();

return 0;
}

Configurable Parameters

The configurable parameters of WINRTP are mostly set using the registry. The registry key for these settings is HKEY_LOCAL_USER\Software\Cisco Systems\CCNMediaTerm\1.0. If these entries do not exist in the registry, then the WINRTP creates them automatically with the default values the first time it needs to use them.

Static/Dynamic Jitter Buffer

Set the UseDynamicJitterBuffer registry entry to “true” to use the dynamic jitter buffer algorithm for audio reception. Set it to “false” to use the static jitter buffer (as in the old version of WinRTP).

Jitter Buffer Length

This value is relevant only if the static jitter buffer is being used. The length of the jitter buffer is specified with the JitterBufferTime registry setting, in milliseconds. At the beginning of each talk spurt, the WINRTP fills x milliseconds of audio in the jitter buffer (where x is the value of the JitterBufferTime registry setting) before it starts playing it to the speaker. A longer jitter buffer provides smoother audio and more immunity to network problems, but increases the latency in a two-way conversation. Lowering the value too much can lead to bad-quality audio (stuttering or jittery audio), in which case the user should try increasing this setting. The optimal value depends heavily on the configuration of the PC (sound card and drivers, operating system, etc.), so it should be set on a per-computer basis. The default value of 180 works on the majority of computers, and lower values often work too.

Try the following starting values: Windows 2000/XP: 60 ms, WinNT 4.0: 120 ms, Win 95/98/ME: 180 ms.
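
The prefill behavior described above (buffer JitterBufferTime milliseconds of audio at the start of a talk spurt, then start playing) can be sketched as follows. This is a simplified model written for this guide, not the WINRTP implementation: StaticJitterBuffer and Packet are hypothetical names, packets are assumed to arrive in order, and an empty queue is treated as the end of a talk spurt.

```cpp
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

using Packet = std::vector<unsigned char>;

class StaticJitterBuffer {
public:
    // jitterBufferTimeMs mirrors the JitterBufferTime registry setting;
    // packetMs is the audio length of one RTP packet (must be > 0).
    StaticJitterBuffer(unsigned jitterBufferTimeMs, unsigned packetMs)
        : prefillPackets_((jitterBufferTimeMs + packetMs - 1) / packetMs) {}

    std::size_t prefillPackets() const { return prefillPackets_; }

    // Queue a received packet; playback is released once enough audio is buffered.
    void push(Packet p) {
        queue_.push_back(std::move(p));
        if (queue_.size() >= prefillPackets_) playing_ = true;
    }

    // Fetch the next packet for the speaker. Returns false while still
    // prefilling, or on underrun (which starts a new prefill phase).
    bool pop(Packet& out) {
        if (!playing_) return false;
        if (queue_.empty()) { playing_ = false; return false; }
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

private:
    std::size_t prefillPackets_;
    std::deque<Packet> queue_;
    bool playing_ = false;
};
```

With the default of 180 ms and 20 ms packets, nine packets are buffered before the first one is played, which is where the added latency comes from.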

Type of Service Value (DiffServ Byte) of Outgoing RTP Packets

The WINRTP can stamp outgoing RTP packets with an IP TOS (type of service) value in the IP header. This is important for QoS purposes where packets of a certain TOS may be given priority in the network to reduce delay. To do this, you need to change the value in the RtpOut filter project (RtpOut.cpp)
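
If your network is configured in terms of DiffServ code points (DSCP) rather than raw TOS bytes, the conversion is a two-bit shift. The sketch below shows only that arithmetic; dscpToTosByte and kTosExpeditedForwarding are hypothetical names, and how RtpOut.cpp stores or applies the value is not shown here.

```cpp
#include <cstdint>

// The DSCP occupies the upper six bits of the TOS/DiffServ byte;
// the lower two bits are left at zero here.
constexpr std::uint8_t dscpToTosByte(std::uint8_t dscp) {
    return static_cast<std::uint8_t>((dscp & 0x3F) << 2);
}

// Expedited Forwarding (DSCP 46), a class commonly used for voice traffic.
constexpr std::uint8_t kTosExpeditedForwarding = dscpToTosByte(46);  // 0xB8
```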

Fixed Transmit Port for UDP Packets

If you want to use a particular local UDP port to transmit RTP streams, set the UseFixedTransmitPort to “true” and set the TransmitPort registry entry to the port number you want to use. Otherwise, set UseFixedTransmitPort to “false”. Note the receive and transmit port cannot be the same. Make sure transmit port != receive port, and transmit port != (receive port + 1)
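
The port rule above is easy to check before writing the registry entries. The following sketch uses a hypothetical helper name, isUsableTransmitPort; the additional check that the port lies in the range 1 to 65535 is my own addition and is not a rule stated by WINRTP.

```cpp
#include <cstdint>

// Returns true if `transmitPort` may be used as the fixed local transmit port
// given the port used for receiving: it must differ from the receive port and
// from receive port + 1. The 1..65535 range check is an added sanity check.
constexpr bool isUsableTransmitPort(std::uint32_t transmitPort,
                                    std::uint32_t receivePort) {
    return transmitPort >= 1 && transmitPort <= 65535
        && transmitPort != receivePort
        && transmitPort != receivePort + 1;
}
```

For example, with a receive port of 8500, the transmit ports 8500 and 8501 are rejected while 9000 is accepted.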

Pre-emphasis of Transmitted Audio

To do pre-emphasis of transmitted audio to make it sound sharper, set the MicrophonePreprocess registry entry to “true” (“false” otherwise) and then set the TxFIRFilter registry entry to either “1” or “2”. This chooses between a set of parameters to set up an FIR filter to do pre-emphasis of the audio. Experiment to see which setting sounds best

Post-emphasis of Received Audio

To do post-emphasis of received audio to make it sound sharper, set the SpeakerPostprocess registry entry to “true” (“false” otherwise) and then set the RxFIRFilter registry entry to either “1” or “2”. This chooses between a set of parameters to set up an FIR filter to do post-emphasis of the audio. Experiment to see which setting sounds best

Volume Limiting

Sometimes the received audio may be too loud, and you may want to limit its maximum volume. In that case, set the LimitVolume registry entry to “true” (“false” otherwise) to turn on the volume limiting feature. Three registry settings control the behavior of the limiter: LimiterThreshold (default -8.0), LimiterLossIncrement (default 0.075), and LimiterLossDecrement (default -0.00075). Setting the threshold lower (e.g. to -25.0 instead of -8.0) limits audio to a lower volume. I recommend against changing the other two parameters.
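
To give an intuition for how three such parameters could interact, here is a sketch of a simple limiter. It is NOT taken from the WINRTP source. It assumes that the threshold is in dB relative to 16-bit full scale, that the loss is an attenuation in dB which grows by the increment for every sample above the threshold and shrinks by the (negative) decrement otherwise, and that samples are 16-bit PCM. All names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

struct LimiterSettings {
    double thresholdDb = -8.0;          // LimiterThreshold (assumed to be dB)
    double lossIncrementDb = 0.075;     // LimiterLossIncrement
    double lossDecrementDb = -0.00075;  // LimiterLossDecrement (negative)
};

class VolumeLimiter {
public:
    explicit VolumeLimiter(LimiterSettings s = LimiterSettings()) : s_(s) {}

    // Attenuate one 16-bit sample by the current loss, then adapt the loss.
    std::int16_t process(std::int16_t sample) {
        const double gain = std::pow(10.0, -lossDb_ / 20.0);
        const double out = sample * gain;
        const double magnitude = std::max(std::fabs(out), 1.0);  // avoid log10(0)
        const double levelDb = 20.0 * std::log10(magnitude / 32768.0);
        if (levelDb > s_.thresholdDb) {
            lossDb_ += s_.lossIncrementDb;                         // too loud: attenuate more
        } else {
            lossDb_ = std::max(0.0, lossDb_ + s_.lossDecrementDb); // recover slowly
        }
        return static_cast<std::int16_t>(std::lround(out));
    }

    double lossDb() const { return lossDb_; }

private:
    LimiterSettings s_;
    double lossDb_ = 0.0;
};
```

Under these assumptions, the large increment and the tiny decrement mean the limiter clamps down quickly on loud audio and releases slowly, which is one plausible reason to leave those two values alone.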

Sample Code

The following section describes, through an example, how to use the Media Term component. Here are the basic steps

  1. Initialize COM (CoInitializeEx)
  2. Instantiate the Media Term Component and get the ICCNMediaTerm COM interface (Say the variable is CCNMediaTerm) using
  3. Initialize the WINRTP
  4. Transmit Side
  5. Receive Side
  6. Uninitialize the WINRTP and release all resources
  7. Uninitialize COM if needed using CoUninitialize

I plan to release some sample C++ code to show how to use this component soon

Future Improvements

Some of the future improvements that are being considered are

