As I have mentioned on this blog before, I help do some work for a fairly large enterprise call center. One of the recent initiatives was to move from our archaic IBM TTS to Nuance TTS and ASR. Everything tested out well. We had 40 licenses on the old TTS, so we bought 40 licenses for the new one thinking that would be that.

We cut over and calls were failing everywhere! The Nuance logs showed that we were running over the license limit by a lot, and it wasn’t even our busy hour. A little research and we found out that CVP holds a TTS session open for as long as the call is parked on the VXML gateway, which covers the entire time in the self-service app plus any queuing. That means you need a license for every concurrent call, not just for the calls where TTS is actually playing. That’s not how our old system worked, and it’s not how Nuance works with UCCX.

TAC seemed to agree that it was supposed to work this way, and I found the following document which confirmed our concerns: Read it here

It looked like we were screwed. This would mean buying hundreds of extra Nuance licenses at ~$2000 a pop. No small order.

We at Cloverhound aren’t ones to take things lying down, so we set out to solve the problem without the customer spending six figures.

Technical Deep Dive

Nuance (and pretty much any TTS/ASR product) runs on a protocol named MRCP, which comes in two flavors: MRCPv1 and MRCPv2. MRCPv1 works on top of RTSP, while MRCPv2 works on top of SIP. I’m going to focus on MRCPv1 since that’s the default and it’s what we’re using for this customer.

Ideally an MRCP session is started when TTS is needed, and as soon as TTS is done the client (VXML gateway in this case) sends an RTSP TEARDOWN request to end the session. A license is only used for the lifetime of the session, not the lifetime of the call. That’s the theory anyways. Let’s see what’s actually happening by taking a look at Nuance now.
We’re running Nuance Vocalizer 6 on Windows, which by default places its log files in the folder C:\ProgramData\Nuance. From here we can dig down to the individual call logs. Note that each call gets a separate log file.

You can see the folder here

log dir

If we open a call log, we see this:

TIME=20150312115928435|CHAN=FFABLHOAAAAFLKMFAAAAAADN|EVNT=SWIclst|VALU=Session FFABLHOAAAAFLKMFAAAAAADN started|SRC=NSS|UCPU=0|SCPU=0
TIME=20150312115928435|CHAN=FFABLHOAAAAFLKMFAAAAAADN|EVNT=SWIfrmt|ENCD=UTF-8|UCPU=0|SCPU=0
TIME=20150312115928442|CHAN=FFABLHOAAAAFLKMFAAAAAADN|EVNT=NVOCliss|LUSED=83|LMAX=200|OMAX=200|LFEAT=tts
TIME=20150312115928499|CHAN=FFABLHOAAAAFLKMFAAAAAADN|EVNT=NVOCsyst|LANG=American English|VOIC=Ava|VMDL=full_encryptf8|FREQ=8000|PVER=5.2.3|LVER=5.2.3.0000|VVER=5.2.3.12291|APNM=
TIME=20150312115928505|CHAN=FFABLHOAAAAFLKMFAAAAAADN|EVNT=NVOCaudf|SAMP=8026|FREQ=8
TIME=20150312115928506|CHAN=FFABLHOAAAAFLKMFAAAAAADN|EVNT=NVOCsynd|INPT=106|DURS=1003|RSTT=ok
TIME=20150312115928506|CHAN=FFABLHOAAAAFLKMFAAAAAADN|EVNT=NVOCinpt|MIME=application/synthesis+ssml|TXSZ=106|TEXT=
TIME=20150312120047321|CHAN=FFABLHOAAAAFLKMFAAAAAADN|EVNT=NVOClise|LUSED=60|LMAX=200|OMAX=200|LFEAT=tts
TIME=20150312120047420|CHAN=FFABLHOAAAAFLKMFAAAAAADN|EVNT=SWIclnd|VALU=Session FFABLHOAAAAFLKMFAAAAAADN ended|SRC=NSS|UCPU=0|SCPU=0
TIME=20150312120047420|CHAN=FFABLHOAAAAFLKMFAAAAAADN|EVNT=NUANtnat|TNAT=Nuance|UCPU=0|SCPU=0

Here we see a few things. The first thing that confused me was that we see two license counts. There is a good explanation:

  • EVENT NVOCliss = the count at the start of the call
  • EVENT NVOClise = the count at the end of the call
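To measure how long a license was actually held, the two events can be paired up by channel. What follows is only a minimal sketch in Python, not anything Nuance ships: the function names are made up for illustration, and it assumes the TIME field is YYYYMMDDhhmmss followed by milliseconds, which is how the lines above appear to be laid out. The two sample lines are copied from the log above.

```python
from datetime import datetime, timedelta

# Two lines copied from the call log shown in this post.
SAMPLE = [
    "TIME=20150312115928442|CHAN=FFABLHOAAAAFLKMFAAAAAADN|EVNT=NVOCliss|LUSED=83|LMAX=200|OMAX=200|LFEAT=tts",
    "TIME=20150312120047321|CHAN=FFABLHOAAAAFLKMFAAAAAADN|EVNT=NVOClise|LUSED=60|LMAX=200|OMAX=200|LFEAT=tts",
]


def parse_line(line):
    """Split one pipe-delimited log line into a dict of KEY=value fields."""
    fields = {}
    for part in line.strip().split("|"):
        key, sep, value = part.partition("=")
        if sep:
            fields[key] = value
    return fields


def parse_time(stamp):
    """Assumed layout: 14 digits of date and time, then milliseconds."""
    base = datetime.strptime(stamp[:14], "%Y%m%d%H%M%S")
    return base + timedelta(milliseconds=int(stamp[14:] or 0))


def license_hold_seconds(lines):
    """Return {channel: seconds between NVOCliss and NVOClise}."""
    started, held = {}, {}
    for line in lines:
        f = parse_line(line)
        if "TIME" not in f or "CHAN" not in f:
            continue
        when = parse_time(f["TIME"])
        if f.get("EVNT") == "NVOCliss":
            started[f["CHAN"]] = when
        elif f.get("EVNT") == "NVOClise" and f["CHAN"] in started:
            held[f["CHAN"]] = (when - started[f["CHAN"]]).total_seconds()
    return held


if __name__ == "__main__":
    for chan, seconds in license_hold_seconds(SAMPLE).items():
        print(chan, round(seconds, 1), "seconds")
```

Pointing the same function at the lines of a real call log (read with open(...).readlines()) should give the hold time for that call.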

Now if we look at the timestamps of the messages, the session lasted about 79 seconds (11:59:28 to 12:00:47). It was not speaking TTS for 79 seconds. What it is doing is holding the license for the length of the VXML leg. Cisco’s MRCPv1 deep dive below gives us a step-by-step of what’s going on here, and confirms it is actually expected:

Read it here

Call Flow Example
This section describes the call flow that results from this configuration example.

1. An ISDN call arrives at the PSTN / VXML Gateway across T1 PRI 3/0.
2. IOS Gateway matches POTS dial-peer 1 as the inbound dial-peer for this call.
3. IOS Gateway hands off the call control to the Pharmacy service that is associated to dial-peer 1.
4. The CVP VXML / TCL script associated with the Pharmacy service sends an HTTP GET request to the VXML server.
5. The VXML server returns a 200 OK response. This response contains a VXML document or page.
6. IOS Gateway executes the VXML document.
7. If the VXML document specifies a URL for an audio prompt, IOS Gateway downloads the audio file and plays the prompt.
8. If the VXML document specifies a Text for an audio prompt, IOS Gateway establishes a RTSP session with rtsp://10.86.177.39/synthesizer (TTS Server). After the RTSP session is established, the Gateway and TTS Server exchange MRCP messages such as SPEAK, SPEAK-COMPLETE by using the RTSP ANNOUNCE request.
The TTS Server sends the G.711ulaw RTP audio stream to the IP address and UDP port number provided by the Gateway in the “Transport” Header of the RTSP SETUP request.

9. If the VXML document specifies the Gateway to recognize DTMF digits and spoken words, IOS Gateway establishes a RTSP session with rtsp://10.86.177.39/recognizer (ASR server). After the RTSP session is established, the Gateway and ASR server exchange MRCP messages such as DEFINE GRAMMAR, COMPLETE, RECOGNIZE, RECOGNITION-COMPLETE by using the RTSP ANNOUNCE request.
The IOS VXML Gateway sends the G.711ulaw RTP audio stream to the IP address and UDP port number provided by the ASR in the SDP of the RTSP 200 OK response. The IOS VXML Gateway sends the digits entered by the PSTN user as RTP-NTE events to the ASR server.
10. After the execution of the VXML document, the Gateway sends an HTTP POST request (with a set of parameters) as specified in the tag of the VXML document or page.
11. Steps 6 – 10 occur for each VXML document sent by the server.
12. When the VXML Application finishes the service provided to the caller, it sends a VXML document with just a tag within the element.
13. IOS Gateway disconnects the MRCPv1 sessions established with the TTS and ASR servers.
14. IOS Gateway disconnects the call on the ISDN side.

If you have a 5 minute self service app that says some name like “William Jones” at the beginning, exits the app and sits waiting in queue for 10 minutes – that license will be held a total of 15 minutes. Complete absurdity.
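To put rough numbers on that, the average number of licenses in use is approximately the call arrival rate multiplied by how long each call holds a license (Little's law). The sketch below is a back-of-the-envelope estimate only: the 600 calls per hour is a made-up volume, not our customer's, and averages understate the busy-hour peak.

```python
def concurrent_licenses(calls_per_hour, hold_seconds):
    """Average licenses in use = arrival rate x hold time (Little's law)."""
    return calls_per_hour * hold_seconds / 3600.0


if __name__ == "__main__":
    calls_per_hour = 600  # hypothetical volume, for illustration only

    # License held for the whole VXML leg: 5 min app + 10 min queue.
    whole_leg = concurrent_licenses(calls_per_hour, 15 * 60)

    # License held only while roughly 5 seconds of TTS is playing.
    tts_only = concurrent_licenses(calls_per_hour, 5)

    print("held for the whole leg:", whole_leg)
    print("held only during TTS:", round(tts_only, 2))
```

Same call volume, same prompts; the only thing that changes is how long each call sits on a license.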

Cisco does publish a workaround to make this slightly better:

Workaround to Release ASR/TTS License in Queuing
You can, however, force the licenses to be released by causing the call to be removed from the VoiceXML gateway and then redelivered there as a new VRU leg call.
Removing it from the VoiceXML gateway releases the ASR and TTS licenses, and redelivering the call makes it immediately available to play queue prompts again, but this time without ASR and TTS licenses.

You can accomplish this result by transferring to a bogus label causing a re-query to ICM and placing an explicit SendToVRU node or TranslationRouteToVRU node ahead of the Queue node in ICM scripting to release the ASR/TTS licenses.

The issue for us is that all of our TTS is used in large self-service CVP Studio apps, and this would mean the call would need to leave the Studio app, which is impossible for us.

A workaround from the Nuance side

One thing to know is by default Nuance ships with a 60 second idle session timeout. If a TTS session is up for 60 seconds with no activity, Nuance kills the connection. Could this solve our problem?

We tried setting this down to 5 seconds, and it sorta works. Nuance does indeed kill the session after 5 seconds and clear the license. The catch is the VXML gateway is not cool with this and any subsequent TTS request fails. Ouch.

Worse, this means we would need to increase that timeout to cover the entire length of our calls. Seems like we’re moving in the wrong direction here …

The key to an actual workaround that works around the “workarounds”

I hope you all enjoy the title of this section, because we have to work around the workaround. Cisco’s workaround helps of course, but not nearly enough. If a script plays 5 seconds of TTS, it shouldn’t need more than ~5 seconds of TTS licensing. Hopefully Cisco and Nuance will come to the table with a real fix.

Anyways I was searching the web, and found this gem on the excellent CVP developers forum:
https://communities.cisco.com/docs/DOC-50194

(Incidentally Janine Graves is a really smart woman, and everyone can learn from the goodness going on over there. Better yet, take her CVP class and learn from her directly.)

Now the post is actually about how to achieve redundancy with CVP Studio apps and ASR/TTS, but little did they know they were unlocking the key to this 8 year old problem.

The Hypothesis

Janine talks about how to switch ASR servers in the middle of a Studio app. She says you can set a VXML property on the Audio TTS element in order to switch to a new TTS server. This was the key: you could have more than one RTSP session per app!! From Janine…

“You can set the Vxml Properties either in the root document (as you mention) so it lasts during the entire application. Or if necessary, in the Settings tab of any Voice Element (but this will only last for the duration of that element) – so if you need to change ASR servers in the middle of an application, then you’ll need to set this property in EVERY voice element’s Settings tab. Look in the IOS VXML Programming Guide for the information to list in the URL. It’ll match what is usually entered into the Gateway Configuration. For example: Name: com.cisco.asr-server Value: rtsp://NuanceServer/recognizer Perhaps it’ll be best to do it within each Voice Element and use a variable within the value rtsp://{Data.Session.NuanceServer}/recognizer – then you can just change the value of your Session variable when necessary”

So what if we could just trick the CVP app to think every TTS request was going to a new server, and have Nuance terminate the old sessions using Nuance session timers? Since the MRCP URL can use hostnames, I figured I could just change the hostname to something new each time TTS plays, while still pointing each of these hostnames to the same IP address.

The Plan

To get this working, first we create multiple ip host entries in the VXML gateways, all pointing to the same Nuance server. Like so:

ip host nuance1 10.1.1.1
ip host nuance2 10.1.1.1
ip host nuance3 10.1.1.1
ip host nuance4 10.1.1.1

We then use an audio element with the com.cisco.tts-server VXML property set to rtsp://nuance:4900/synthesizer (we have port 4900 here because we are using the Nuance default rather than the Cisco default). We use a counter in the script to increment the server number every time we finish a TTS step. This way every request for TTS has a unique hostname.
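The real logic lives in the Studio app, not in a script, but the idea can be sketched in a few lines of Python. This is purely illustrative: the function names are made up, and the nuance<counter> hostname pattern is assumed from the ip host entries above.

```python
def tts_server_url(counter, port=4900, prefix="nuance"):
    """Build the value for com.cisco.tts-server for a given counter.

    The prefix + counter naming is an assumption taken from the
    ip host entries; 4900 is the port used in this post.
    """
    return f"rtsp://{prefix}{counter}:{port}/synthesizer"


def ip_host_lines(count, address, prefix="nuance"):
    """Generate matching 'ip host' lines for the VXML gateway."""
    return [f"ip host {prefix}{n} {address}" for n in range(1, count + 1)]


if __name__ == "__main__":
    # Every name resolves to the same Nuance server.
    for line in ip_host_lines(4, "10.1.1.1"):
        print(line)

    # Each TTS step bumps the counter, so each request uses a new name.
    for counter in range(1, 5):
        print(tts_server_url(counter))
```

The gateway needs an ip host entry for every hostname the counter can produce, so the number of entries should cover the most TTS steps a single call can hit.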

We then set the Nuance MRCP session idle timeout to 3 seconds instead of 60, to let Nuance do the cleanup for us, since we would only ever use an MRCP session once. If we didn’t do this, we would just be opening up several MRCP sessions for every call and making things much, much worse.

The configuration file with the timeout setting is NSSserver.cfg on the Nuance server, located as seen below.

Nuance-SS-Config-Folder

We will be changing the timeouts to the following values. By default everything will use MRCPv1, but we set the MRCPv2 value too, just in case ☺

server.mrcp1.rtsp.sessionTimeout VXIInteger 3000
server.mrcp2.rtsp.sessionTimeout VXIInteger 3000
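A quick sanity check after editing the file can save a round of test calls. The sketch below is a hypothetical helper, not a Nuance tool: it assumes each setting is a whitespace-separated name, type, and value, as in the two lines above, and that the value is in milliseconds (inferred from 3000 standing for our 3 second timeout).

```python
# The two lines from this post; in practice you would read NSSserver.cfg.
NSS_SNIPPET = """\
server.mrcp1.rtsp.sessionTimeout VXIInteger 3000
server.mrcp2.rtsp.sessionTimeout VXIInteger 3000
"""


def parse_integer_settings(text):
    """Collect 'name VXIInteger value' lines into a {name: int} dict."""
    settings = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[1] == "VXIInteger":
            settings[parts[0]] = int(parts[2])
    return settings


def timeouts_in_seconds(settings):
    """Convert sessionTimeout values, assumed to be milliseconds, to seconds."""
    return {
        name: value / 1000.0
        for name, value in settings.items()
        if name.endswith("sessionTimeout")
    }


if __name__ == "__main__":
    for name, seconds in timeouts_in_seconds(parse_integer_settings(NSS_SNIPPET)).items():
        print(name, "=", seconds, "seconds")
```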

Creating

Below is the outline of the CVP Studio app I created to test our hypothesis.

App-TTS

As you can see here, we have used the value builder to easily create the dynamic hostname using the value of the counter.

App-TTS-VXML-Property

Lastly, we have a .25 second silence wav file, since it looks like the first audio element following a TTS step is also sent to Nuance (not sure exactly why). If we skip this step, the license could be held as long as the next audio file is playing. Not a major hit on licensing, but better to avoid it.

App-TTS-Release

Here is a copy of our studio app for testing. DOWNLOAD HERE

Verification

To see the difference, just copy the app we built and remove the VXML property (com.cisco.tts-server) from the Studio elements. The app will fail on the second loop.

If we upload the app we built above, the TTS will play as we expect on every loop. If we look at a call log, we can see a vast difference in how long the license is held:

TIME=20150312161756317|CHAN=FFACACIEAAAENJAIAAAAAAAA|EVNT=SWIclst|VALU=Session FFACACIEAAAENJAIAAAAAAAA started|SRC=NSS|UCPU=0|SCPU=0
TIME=20150312161756317|CHAN=FFACACIEAAAENJAIAAAAAAAA|EVNT=SWIfrmt|ENCD=UTF-8|UCPU=0|SCPU=0
TIME=20150312161756333|CHAN=FFACACIEAAAENJAIAAAAAAAA|EVNT=NVOCliss|LUSED=1|LMAX=40|OMAX=40|LFEAT=tts
TIME=20150312161756461|CHAN=FFACACIEAAAENJAIAAAAAAAA|EVNT=NVOCsyst|LANG=American English|VOIC=Ava|VMDL=full_encryptf8|FREQ=8000|PVER=5.2.3|LVER=5.2.3.0000|VVER=5.2.3.12291|APNM=
TIME=20150312161756480|CHAN=FFACACIEAAAENJAIAAAAAAAA|EVNT=NVOCaudf|SAMP=8192|FREQ=8
TIME=20150312161756481|CHAN=FFACACIEAAAENJAIAAAAAAAA|EVNT=NVOCaudn|SAMP=548|FREQ=8
TIME=20150312161756481|CHAN=FFACACIEAAAENJAIAAAAAAAA|EVNT=NVOCaudn|SAMP=8192|FREQ=8
TIME=20150312161757452|CHAN=FFACACIEAAAENJAIAAAAAAAA|EVNT=NVOCsynd|INPT=100|DURS=2116|RSTT=ok
TIME=20150312161757452|CHAN=FFACACIEAAAENJAIAAAAAAAA|EVNT=NVOCinpt|MIME=application/synthesis+ssml|TXSZ=100|TEXT= Test TTS Loop 3
TIME=20150312161802231|CHAN=FFACACIEAAAENJAIAAAAAAAA|EVNT=NVOClise|LUSED=0|LMAX=40|OMAX=40|LFEAT=tts
TIME=20150312161802292|CHAN=FFACACIEAAAENJAIAAAAAAAA|EVNT=SWIclnd|VALU=Session FFACACIEAAAENJAIAAAAAAAA ended|SRC=NSS|UCPU=0|SCPU=0
TIME=20150312161802292|CHAN=FFACACIEAAAENJAIAAAAAAAA|EVNT=NUANtnat|TNAT=Nuance|UCPU=0|SCPU=0

The license is only being held 6 seconds in this call! ☺

Conclusion

I wish I had nice things to say, but I don’t. It’s a mess to configure, but hey, it can save you a ton on licensing costs. We’re also building some custom CVP Studio elements and applications to deal with this in a more streamlined way. Drop us a line if you’re interested, or feel free to send us any questions at contact@cloverhound.com

I will be interested to see the response. I was posed this problem 2 days ago, and in my investigation I’ve seen posts dating back 8 years with no solutions. Has anyone else seen this issue with CVP? How did you solve it?

Chad