Understanding the Lync Video Interoperability Server (VIS)

March 3rd, 2014

Last year at Lync Conference 2013, Microsoft announced a new video interoperability component for Lync vNext: the “Video Interoperability Server” (VIS). During the conference Mike Stacy, Jeff Schertz and I presented a session on Lync 2013 video interoperability, in which we covered these three pillars:

Lync 2013 new video functionality
Microsoft’s SVC implementation
Modes leveraged for interop (Gateways vs. MCUs vs. native integration)

At this year’s Lync Conference the keynote revisited the VIS component and Microsoft provided a public demo.

[Image: VIS demo at Lync Conference 2014]

During the demo we can see a traditional room system (in this case a Tandberg/Cisco EX60) being added to a Lync multi-party conference, with its presence reflected within the Lync client.

So architecturally, how does this work, and how does it fit into the overall Lync video interop landscape? The biggest question I hear is “does this replace the need for Gateways, MCUs and native Lync support?”

These are good questions, and as much as I’d love to give you a one-size-fits-all answer, I can’t. Ultimately this is going to depend on how you use Lync, your investment in video endpoints, the dependencies you have on H.323 and standard SIP calling, the architecture of your environment, and the size and geographic location of your end-users. Dustin Hannifin and I revisited Lync 2013 video interoperability at this year’s Lync Conference, going back over the pillars above and including VIS as one of the options. Here I’m going to go over what was discussed at that session in finer detail, cognizant that VIS is still under wraps and some lower-level detail has yet to be disclosed by Microsoft.

First let’s start with the problem. When Lync 2013 was announced, so was support for H.264 SVC, which led to some confusion, specifically “so now all my traditional video just works with Lync?” Of course this isn’t the case. Whilst the H.264 SVC codec is far more closely aligned with traditional video systems from a media perspective, there’s still some outstanding work required to make things work on the signaling side.

Now we know that there’s not really such a thing as “standard SIP”: all UC call control platforms have their own flavors and SIP extensions, and Microsoft’s SVC implementation follows suit.
Jeff Schertz wrote an article on Lync 2013 video interoperability that explains this in greater detail, but for those who have not read it (or don’t have a few hours set aside :-)) the conclusion is that Microsoft’s implementation of H.264 SVC isn’t interoperable with AVC VTCs without the following:

a) Re-packetization of media – Microsoft’s SVC is Mode 1 (Temporal Scalability with Hierarchical P), i.e. a single video stream is sent for each requested resolution, and that stream can comprise different frame rates. Typically a stream with two layers is sent (one layer @ 15fps and another @ 15fps), which can deliver either 15fps or a cumulative 30fps.

However, traditional video systems utilize H.264 AVC, which is Mode 0. Folks will argue that Mode 1 contains streams that are AVC compliant, but a non-SVC or AVC-only room system isn’t going to be able to handle this Mode 1 stream without some modifications.

b) Signaling translation – as stated above, there’s a common misconception around “standard” signaling. Lync 2013 and the new world of SVC compound this further, given that even more work is offloaded to the endpoints, i.e. dynamic layout composition and multi-stream video intelligence.
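The two-layer arithmetic in item a) can be illustrated with a small Python sketch. This is a toy model only: the `Frame` class, the `temporal_id` field and the even/odd layer assignment are assumptions made for illustration, not Microsoft’s actual packetization.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Frame:
    index: int        # position within a one-second, 30fps capture
    temporal_id: int  # 0 = base layer, 1 = enhancement layer (assumed labels)


def encode_two_layers(frame_count: int) -> List[Frame]:
    """Tag even frames as the base layer and odd frames as the enhancement layer."""
    return [Frame(i, i % 2) for i in range(frame_count)]


def select_layers(stream: List[Frame], max_temporal_id: int) -> List[Frame]:
    """Keep only the frames at or below the requested temporal layer."""
    return [f for f in stream if f.temporal_id <= max_temporal_id]


one_second = encode_two_layers(30)
base_only = select_layers(one_second, 0)    # receiver that wants 15fps
both_layers = select_layers(one_second, 1)  # receiver that wants 30fps
print(len(base_only), len(both_layers))
```

A receiver that only takes the base layer ends up with half of the frames, while one that takes both layers gets the cumulative frame rate, all from the same single stream.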

Another interim puzzle that some early Lync 2013 adopters experienced arose where H.263 was being leveraged for point-to-point Lync calls: Lync 2013 dropped support for this dated, CIF-resolution-capable codec. So the compatibility items above have become a high priority on the UC agenda.

The answer to this problem hasn’t changed, largely speaking: we’re still in the world of Gateways, MCUs and endpoints that natively support Lync. VIS fits neatly into the Gateway category. It should be regarded as a Back-to-Back User Agent (B2BUA), either co-located on your Lync Front End or deployed standalone in environments where additional scale is required.

B2BUAs aren’t new to Lync. In fact, Session Border Controllers have effectively been acting in this capacity with Lync for voice, specifically performing signaling translation. In this case, however, more than just signaling is altered: bit stream information also needs to be modified to set up calls effectively. It’s still lightweight work, i.e. there’s no transcoding, but because of this requirement both signaling and media need to flow via the B2BUA in all scenarios.
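To make the B2BUA idea a little more concrete, here is a toy Python model of two call legs bridged by an intermediary. Every name in it (`CallLeg`, `ToyB2BUA`, the `dialect` labels) is hypothetical and has nothing to do with real Lync or VIS interfaces; rewriting a dictionary field simply stands in for signaling translation, and the media relay forwards bytes untouched to reflect the “no transcoding” point.

```python
from dataclasses import dataclass, field


@dataclass
class CallLeg:
    """One independent dialog terminated at the B2BUA (hypothetical model)."""
    peer: str
    dialect: str
    received: list = field(default_factory=list)


class ToyB2BUA:
    """Bridges two legs; every message and media packet passes through it."""

    def __init__(self, vtc_leg: CallLeg, lync_leg: CallLeg):
        self.vtc_leg = vtc_leg
        self.lync_leg = lync_leg
        self.media_packets_relayed = 0

    def _other(self, leg: CallLeg) -> CallLeg:
        return self.lync_leg if leg is self.vtc_leg else self.vtc_leg

    def relay_signaling(self, from_leg: CallLeg, message: dict) -> dict:
        # Stand-in for signaling translation: re-label for the far side.
        target = self._other(from_leg)
        translated = dict(message, dialect=target.dialect)
        target.received.append(translated)
        return translated

    def relay_media(self, from_leg: CallLeg, payload: bytes) -> bytes:
        # No transcoding in this toy: the payload bytes are forwarded as-is.
        self.media_packets_relayed += 1
        self._other(from_leg).received.append(payload)
        return payload


vtc = CallLeg("room-system", "standard-sip")
lync = CallLeg("lync-front-end", "lync-sip")
b2bua = ToyB2BUA(vtc, lync)
invite = b2bua.relay_signaling(vtc, {"method": "INVITE", "dialect": "standard-sip"})
b2bua.relay_media(vtc, b"\x00\x01")
print(invite["dialect"], b2bua.media_packets_relayed)
```

The point of the sketch is simply that neither leg ever talks to the other directly, which is why both signaling and media end up anchored on the B2BUA.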

[Image: VIS Blog Post]

A B2BUA-registered VTC has no Lync intelligence: it’s completely unaware of Lync, and the B2BUA component acts as a SIP proxy, tunneling all payloads. A Lync client behaves differently: where possible in point-to-point scenarios it will always seek out the best media path, which offers lower latency and in many cases gives more back to the network folks by avoiding WAN traversal.

As per the demo at Lync Conference, other Lync clients will now see presence and be able to add the VTC to a Lync multi-party call, but single click-to-join from the VTC isn’t possible, as Lync Online Meeting translation/presentation is a whole other story. Without significant heavy lifting a B2BUA isn’t going to add Gallery View (we’re entering MCU territory here), so the VTC will be offered a Lync 2010 style experience (single speaker, voice switched). In this brave new SVC world DSP-driven MCUs are becoming less relevant, but a non-SVC VTC isn’t going to be able to handle multiple simulcast video streams, so no snazzy layouts here.
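For anyone unfamiliar with the term, a voice switched experience can be sketched as “send the receiver one stream, from whoever is currently loudest”. The Python below is a minimal illustration under that assumption; the function names and audio level values are made up, and how Lync actually picks the dominant speaker isn’t covered here.

```python
from typing import Dict


def active_speaker(audio_levels: Dict[str, float]) -> str:
    """Return the participant with the highest audio level (assumed metric)."""
    return max(audio_levels, key=audio_levels.get)


def stream_for_receiver(audio_levels: Dict[str, float], receiver: str) -> str:
    """Pick the single video stream a voice switched receiver would be sent."""
    others = {name: level for name, level in audio_levels.items() if name != receiver}
    return active_speaker(others)


levels = {"vtc-room": 0.9, "alice": 0.2, "bob": 0.7}
shown_on_vtc = stream_for_receiver(levels, "vtc-room")
shown_to_alice = stream_for_receiver(levels, "alice")
print(shown_on_vtc, shown_to_alice)
```

The receiver is excluded from its own selection so that a room system never ends up watching itself, and it only ever gets one stream at a time, in contrast to the multiple simultaneous streams behind Gallery View.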

Another consideration is content. VTCs have standardized on H.239/BFCP; it’s not clever stuff and is nowhere near as sophisticated or collaborative as what Lync has on offer (RDP, PSOM & WAC), but unlike the video payload there is no way today to make these interoperable without transcoding.

In conclusion, seeing Microsoft’s investment in VIS is a really good thing, but as always there are other mechanisms for interoperability, which either negate the need for this work to be performed by the backend or facilitate easier ways of joining Lync conference calls whilst preserving the Gallery View experience.

For more information on this I’d encourage you to watch our presentation (for now it’s only open to Lync Conference attendees, but this will go big bang in the coming weeks…)