HD (High Definition) video conferencing depends on the ability to send and receive data at higher speeds, and on the cameras and displays that make up the video conferencing endpoints capturing and displaying audio and visual data within HD video standards.
Since it is important that all video conferencing system components match each other’s capabilities and specifications, a brief overview of standards and nomenclature, and how they evolved, may help.
As technology improves, video monitor manufacturers have been able to control increasing numbers of pixels per square inch of display surface. Video formats that could exploit these advances followed suit.
Although the terms “HD Video,” “HD Television,” or “HD Video Conferencing” primarily refer to image resolution, there are emerging standards that also feature increased frame rates.
Video Frames Per Second or Frame Rates
Early analog film and theater used frame rates between 16 and 24 frames per second (fps), eventually settling on the 24 fps as a cinematic standard.
Digital video enables the use of higher frame rates, which reduce or eliminate blur in fast-changing pixels. Higher frame rate videos retain visual clarity and convey more realistic experiences and visual detail to viewers.
Analog television and transmission systems set standards for video formats. In an analog television, electron beams paint every other horizontal line with each scan and cover the entire surface of the screen twice in the display of one frame. Each “half image” is a “field.” This is “interlaced” video. It works well with the optical illusion of fluid motion that human vision and perception create when viewing a series of still images in rapid succession.
Interlacing allows for data compression: a monitor displays each image as two fields, each with half the lines of the full frame and each shown for 1/60th of a second (assuming a 30 fps frame rate). Each sweep of the display surface only has to reproduce half the data for a complete image, and our sense of vision perceives the full 30 fps.
However, interlacing produces a flicker effect, which can lead to eye strain for some. The flicker is very noticeable when old televisions or CRT monitors appear in video recordings, such as old NASA mission control footage. Also, the actual frame is never displayed in its entirety at one time; only the “fields” are. If you were to freeze motion on an interlaced display, you would not see the crystal clear image that would be ideal for a screen capture. You would only have a half-resolution field displayed.
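To make the idea of fields concrete, here is a minimal Python sketch that splits a frame into its two fields and weaves them back together. The function names and the toy six-line frame are my own illustration, not part of any broadcast standard, and the sketch assumes an even number of scan lines.

```python
def split_fields(frame):
    """Split a frame (a list of scan lines) into its two interlaced fields."""
    top_field = frame[0::2]     # lines 0, 2, 4, ...
    bottom_field = frame[1::2]  # lines 1, 3, 5, ...
    return top_field, bottom_field


def weave(top_field, bottom_field):
    """Interleave two fields back into one full frame (even line count assumed)."""
    frame = []
    for top_line, bottom_line in zip(top_field, bottom_field):
        frame.extend([top_line, bottom_line])
    return frame


# A toy 6-line "frame"; each string stands in for one row of pixels.
frame = [f"line {n}" for n in range(6)]
top, bottom = split_fields(frame)

print(top)     # every other line, starting with the first
print(bottom)  # the remaining lines
```

Each field carries only half the lines, which is why a frozen interlaced image shows half the vertical resolution.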
Interlacing is relevant because some of the largest format HD monitors use it. Also, early large format plasma screens used it, and you might still find a couple of these kicking around in conference rooms.
The lowercase “i” in modern video specifications indicates the use of interlaced frame painting for that monitor.
60i has been part of the US NTSC television standard since 1941. The 60i nomenclature means that the device scans the screen 60 times per second in an interlaced fashion, transmitting 30 fps.
Progressive Video Scan
Progressive video displays create the image in one “progressive” scan of the entire screen, configuring every pixel with each sequential pass. The television and digital cinema industries primarily use three frame rate standards for recording and display: 24p, 25p, and 30p. The standards indicate the frames per second and the lowercase “p” indicates the progressive nature of the display. The lowercase “p” is also used to denote the number of vertical pixels on higher resolution displays, but we will cover that later in this article.
The 24p format matches the longtime standard 35mm film camera frame exposure rate of 24 frames per second. It is used to reproduce the “feel” of a movie shot on film. The 25p format matched the European PAL and French SECAM analog broadcast systems, and 30p matches the 60i NTSC US broadcast television standard. (I’m skipping irrelevant and distracting technical detail when I say the frame rates “match” the broadcast standards. They used to, but were modified slightly for color and sound signals.)
A new standard of 48p has been tested by some studios, but widespread adoption does not seem imminent.
Standard Definition Video
Standard Definition (SD), or “what we were incredibly used to” video resolution, is 480i for NTSC and 576i for PAL and SECAM systems. In other words, video was broadcast with either 480 or 576 interlaced horizontal lines of resolution in the US and Europe, respectively, so screens were always 480 or 576 pixels “tall.”
The aspect ratio of 4:3 for NTSC implies that there are 640 columns of pixel width (480 x 4/3 = 640) in a standard frame. Standard Definition also has a “wide screen” variation with a 16:9 aspect ratio, which results in an 854 pixel width frame (480 x 16/9 = 853.3, rounded up to 854).
Screen dimensions are customarily listed with the width presented first (in pixels):
640 x 480 Standard Definition
854 x 480 Standard Definition wide screen.
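The width arithmetic can be sketched in a few lines of Python. The function name is my own, and rounding up to the next even number is an assumption I chose because it reproduces the customary 854 pixel width (480 x 16/9 is not a whole number).

```python
import math
from fractions import Fraction


def frame_width(height, aspect_w, aspect_h):
    """Return the pixel width for a given height and aspect ratio.

    Assumption: fractional widths are rounded up to the next even number,
    which matches the customary 854 x 480 wide screen SD size.
    """
    exact = Fraction(height * aspect_w, aspect_h)
    return math.ceil(exact / 2) * 2


sd_width = frame_width(480, 4, 3)        # 4:3 Standard Definition
sd_wide_width = frame_width(480, 16, 9)  # 16:9 wide screen Standard Definition

print(sd_width, sd_wide_width)
```

The same function gives the familiar widths for 16:9 frames that are 720, 1080, or 2160 pixels tall.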
SD video incorporates the refresh rates of 24, 25, or 30 frames per second as discussed.
HD and Ultra HD Video Screen Resolution
Pixel density for video monitors is now expressed in terms of the number of horizontal pixel lines, and is also designated with a lowercase “p,” not to be confused with progressive scan indicators. Aspect ratios or the number of pixel columns are used to express the full screen size.
The screen resolution will matter most to the video conference participants who are sitting closest to large displays. More pixels per given screen size avoids image graininess when viewed up close. With the exception of purpose-built rooms, most videoconferencing displays are mounted on a narrow wall of a rectangular conference room, relegating some participants to sitting close, adjacent to the screen, and some at the other end of a long conference room table.
The most common HD screen dimensions, 720p and 1080p, are described by their number of vertical pixels, with an assumed 16:9 wide screen aspect ratio. This results in the following “native” image sizes:
720p = 1280 x 720
1080p = 1920 x 1080
The recent Ultra HD standard, however, is known as “4K” for its nearly four thousand pixel width.
4K Ultra HD = 3840 x 2160
Various technologies have been used to increase the pixel density, and the physical size of the monitor you choose for your videoconferencing display(s) may or may not be able to accommodate your desired standard without extra video processing for HD content.
Bandwidth for HD Videoconferencing
The higher resolutions imply more data is required to create each frame. Let’s compare the number of pixels for the screen sizes discussed:
854 x 480 = 409,920 pixels
1280 x 720 = 921,600 pixels
1920 x 1080 = 2,073,600 pixels
3840 x 2160 = 8,294,400 pixels
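These totals are easy to check with a short Python sketch. The dictionary and function names are my own; the dimensions are the ones listed in this article.

```python
# Width and height in pixels for the screen sizes discussed in this article.
RESOLUTIONS = {
    "SD wide screen": (854, 480),
    "720p": (1280, 720),
    "1080p": (1920, 1080),
    "4K Ultra HD": (3840, 2160),
}


def pixel_count(width, height):
    """Return the number of pixels in one frame."""
    return width * height


pixel_counts = {name: pixel_count(w, h) for name, (w, h) in RESOLUTIONS.items()}

for name, count in pixel_counts.items():
    print(f"{name}: {count:,} pixels")
```

A 4K Ultra HD frame works out to exactly four times the pixels of a 1080p frame, since both dimensions double.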
Looking at the number of pixels comprising one frame makes it obvious that higher data rates are required to transmit and receive higher resolution video, especially considering a 24, 30, 60, or 120 frame per second frame rate for your video. However, the relationship between pixels per frame and the frame rate and the bandwidth required isn’t linear.
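For a sense of scale, the sketch below estimates the uncompressed data rate of a video stream. It assumes 24 bits per pixel with no chroma subsampling, which is my assumption for illustration only. Real video conferencing traffic is encoded and uses far less bandwidth than this upper bound.

```python
def raw_bitrate_mbps(width, height, fps, bits_per_pixel=24):
    """Estimate the uncompressed bit rate in megabits per second.

    Assumption: every pixel is sent in full (24 bits by default) for every
    frame, which no practical video conferencing system actually does.
    """
    return width * height * bits_per_pixel * fps / 1_000_000


raw_720p30 = raw_bitrate_mbps(1280, 720, 30)
raw_1080p30 = raw_bitrate_mbps(1920, 1080, 30)

print(f"720p at 30 fps:  {raw_720p30:,.1f} Mbps uncompressed")
print(f"1080p at 30 fps: {raw_1080p30:,.1f} Mbps uncompressed")
```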
Video encoding algorithms reduce the volume of data transmitted for each frame by predicting the next values of blocks of pixels. This works to the advantage of HD videoconferencing in that the volume of data can be lower than that of a movie streamed in HD if there is not a lot of motion in the camera’s field of view.
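The toy Python sketch below shows why a mostly static scene needs less data. It simply counts which blocks of pixels changed between two frames. This is not how any real encoder is implemented (real codecs use far more sophisticated motion-compensated prediction), and the function name, block size, and 8 x 8 frames are all hypothetical.

```python
def changed_blocks(previous, current, block_size=2):
    """Count blocks of pixels that differ between two same-sized frames.

    Returns (changed, total). Frames are lists of rows of pixel values.
    """
    rows, cols = len(current), len(current[0])
    changed = 0
    total = 0
    for top in range(0, rows, block_size):
        for left in range(0, cols, block_size):
            total += 1
            prev_block = [row[left:left + block_size]
                          for row in previous[top:top + block_size]]
            curr_block = [row[left:left + block_size]
                          for row in current[top:top + block_size]]
            if prev_block != curr_block:
                changed += 1
    return changed, total


# Two 8 x 8 frames that are identical except for a single pixel.
previous = [[0] * 8 for _ in range(8)]
current = [row[:] for row in previous]
current[3][5] = 255

changed, total = changed_blocks(previous, current)
print(f"{changed} of {total} blocks changed")
```

In a scheme like this, only the changed blocks would need to be described in full, so a quiet conference room costs much less than an action scene.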
H.264 has become the standard encoding and compression algorithm used to create MPEG-4 formatted video, and videoconferencing systems almost universally employ it. It achieved about a 50% reduction in bandwidth usage over the earlier MPEG-2 standard. There are different extensions to the standard, though, and if any marketing materials claim advantages of using different extensions in their video conferencing systems, consider that all endpoints would have to uniformly employ them for their advantages to be realized. One example is Polycom’s H.264 High Profile capabilities, available on Polycom video conferencing endpoints. By contrast, most systems are built to the H.264 Baseline Profile.
Video conferencing differs from regular video content such as movies and video clips in that the majority of the camera’s field of view is static, so there is usually not a lot of motion in a frame. The bandwidth rates quoted for a given resolution reflect the maximum bandwidth the endpoint will generate.
When validating your infrastructure for HD videoconferencing, the additional real-time bandwidth demands of video conferencing have to be considered. In the simplest configuration, you only have to design for one endpoint in each location. In central locations or where you have multiple endpoints, the aggregate of additional bandwidth demand must be built into your network for the instance of simultaneous conferences.
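As a rough planning aid, the aggregate demand at a site can be sketched as the sum of the maximum rates of every endpoint that might be in a call at the same time. The call rates below are hypothetical, and the 20% allowance for network overhead is my own assumption; substitute the figures published for your equipment.

```python
def aggregate_bandwidth_kbps(endpoint_rates_kbps, overhead_percent=20):
    """Sum the maximum call rates of simultaneously active endpoints.

    Assumption: a flat percentage is added for packet overhead. The default
    of 20 is a placeholder, not a vendor recommendation.
    """
    return sum(endpoint_rates_kbps) * (100 + overhead_percent) / 100


# Hypothetical site: two rooms at 1024 kbps and one desktop unit at 512 kbps.
site_rates_kbps = [1024, 1024, 512]
site_demand_kbps = aggregate_bandwidth_kbps(site_rates_kbps)

print(f"Plan for about {site_demand_kbps:,.0f} kbps of additional capacity")
```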
Polycom published bandwidth recommendations for their video conferencing systems, along with the following chart illustrating the difference in possible video quality between their High Profile and Baseline Profile encoding for video conferencing.
Click here for the Video Conferencing Bandwidth chart image in a new tab.
As a point of comparison of video conferencing bandwidth to streaming video, the following download speeds are recommended by Netflix for playing movies and TV shows:
3.0 Megabits per second – Recommended for SD quality
5.0 Megabits per second – Recommended for HD quality
25 Megabits per second – Recommended for Ultra HD quality
Remember that your Internet connection’s upload and download speeds may differ from each other if you depend on the Hybrid Fiber Coax infrastructure of Cable TV systems for your Internet access.
As noted in a previous post about Internet Video Conferencing, CATV Internet upload and download speeds differ due to the available bandwidth for channels. As a matter of fact, you can look at this page for Comcast’s different rate packages of upload and download speeds for business Internet for an example of this.
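Because a video conference sends as well as receives, the slower direction of an asymmetric connection is the one that matters. The Python sketch below captures that check. The speeds and call rate are hypothetical, and it assumes the call uses the same rate in both directions.

```python
def link_supports_call(download_mbps, upload_mbps, call_rate_mbps):
    """Return True if both directions of the link can carry the call.

    Assumption: the call needs the same bandwidth up and down.
    """
    return min(download_mbps, upload_mbps) >= call_rate_mbps


# Hypothetical cable connection: 50 Mbps down, 5 Mbps up.
cable_ok = link_supports_call(download_mbps=50, upload_mbps=5, call_rate_mbps=6)

# Hypothetical symmetric connection: 20 Mbps in both directions.
symmetric_ok = link_supports_call(download_mbps=20, upload_mbps=20, call_rate_mbps=6)

print(cable_ok, symmetric_ok)
```

A connection that streams movies comfortably can still fall short for a call if its upload speed is the bottleneck.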
Videoconferencing Monitor/Display Screen Refresh Rates
Television monitors refresh the content on the screen at fixed rates, irrespective of the frame rate of the source video signal, whether from a television broadcaster or an IP video conference. You will achieve the highest quality if the refresh rate, given in cycles per second, or Hertz (Hz), is a multiple of the video frame rate. No extra video processing to shift from one rate to another would be required.
Consider this scenario: Your source video conference camera sends 720p video at 24p to the remote site. The display is refreshed at 60Hz, a common refresh rate. The monitor would display a frame 2.5 times before needing to display the next frame. Halfway through the third progressive scan, the values of all the pixels change (60 cycles per second / 24 fps = 2.5). Displays have the ability to calculate and adapt to this difference in rates, but it can be perceived by some people.
While this mostly applies to viewing content captured on film, it is another parameter that can be aligned between endpoint components and video conference servers. In the previous example, if the monitor refresh rate were 120Hz, the display pixels would remain the same through five complete refresh cycles (120 / 24 = 5). The sixth refresh would display the next frame of the 24p content. No extra image processing would be required. Again, most people will not notice if there is any extra image processing “behind the scenes,” but if it can be optimized in the specification process, it may be worth considering.
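The refresh arithmetic can be sketched in Python. The function names are my own, and the sketch assumes whole-number refresh and frame rates.

```python
from fractions import Fraction


def refreshes_per_frame(refresh_hz, frame_rate_fps):
    """Return how many display refreshes each video frame spans.

    Assumption: both rates are whole numbers.
    """
    return Fraction(refresh_hz, frame_rate_fps)


def is_even_cadence(refresh_hz, frame_rate_fps):
    """Return True if the refresh rate is a whole multiple of the frame rate."""
    return refreshes_per_frame(refresh_hz, frame_rate_fps).denominator == 1


for refresh_hz in (60, 120):
    ratio = refreshes_per_frame(refresh_hz, 24)
    print(f"{refresh_hz}Hz with 24p: {float(ratio)} refreshes per frame, "
          f"even cadence: {is_even_cadence(refresh_hz, 24)}")
```

The same check can be run for any camera and display pairing you are considering, such as 30p content on a 60Hz monitor.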
References helpful in the creation of this article:
Netflix re: Encoding
Wikipedia re: H.264
Techradar about 4k TVs
4k.com on monitor resolution
Wikipedia frame rate
Netflix recommendations on Internet Bandwidth
Howstuffworks.com on Refresh Rate