Friday, May 16, 2008

Synchronizing Video, Text, and Graphics with SMIL

Today, video is all over the Web. Top networks and media companies now display your favorite shows online, from nail-biting dramas, high-scoring sports, and almost-real reality TV shows to classic feature films.

Apple TV and iTunes stream 720p high-definition (HD) video, and the video Web site has started to add high-definition videos that play in Adobe Flash Player 9.0 with H.264 encoding.

Synchronized Multimedia Integration Language (SMIL) is the W3C standard language for streaming media. It provides a time-based, synchronized environment for streaming audio, video, text, images, and Flash files. The key to SMIL is its use of blocks of XML (eXtensible Markup Language).

Pronounced "smile," SMIL is an XML-compliant markup language that coordinates when and how multimedia files play. Using SMIL, you can:

* describe the temporal behavior of the presentation
* describe the layout of the presentation on a screen
* associate hyperlinks with media objects

SMIL players are client applications that receive and display integrated multimedia presentations. SMIL servers are responsible for providing content channels and serving presentations to clients. Although SMIL itself is an open technology, some of the players and servers use proprietary techniques to handle multimedia streaming and encoding.

A SMIL file (extension .smil) can be created with a text editor and saved as plain text. In its simplest form, a SMIL file lists multiple media clips that play in sequence:

<video src="rtsp://"/>
<video src="rtsp://"/>

The master SMIL file is a container for the other media types. It sets the positions where the RealPix graphics files appear, and it starts and stops the video.

The master file is divided into three sections:

* Head: The head element contains information that is not related to the temporal behavior of the presentation. It may contain any number of "meta" elements and either a "layout" element or a "switch" element. The meta information includes the copyright notice, the author of the page, and the title.

* Regions: The regions, which are defined by the REGION tags, control the layout in the RealPlayer window.

* Body: The body of the SMIL file describes the order in which the presentations will appear. The PAR tags mean that the VideoChannel, PixChannel, and TextChannel will be displayed in parallel.
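
To make the three sections concrete, here is a minimal sketch in Python that holds a small master file as a string and reads it back with the standard xml.etree.ElementTree parser. The clip names, sizes, and meta values are invented for illustration, and using the three channel names as region ids is an assumption; treat it as one plausible shape for such a file, not as the original example.

```python
import xml.etree.ElementTree as ET

# Hypothetical master file: clip names, sizes, and meta values are invented.
MASTER_SMIL = """\
<smil>
  <head>
    <meta name="title" content="Sample presentation"/>
    <meta name="author" content="Example Author"/>
    <meta name="copyright" content="(c) 2008 Example"/>
    <layout>
      <root-layout width="500" height="300"/>
      <region id="VideoChannel" left="0" top="0" width="320" height="240"/>
      <region id="PixChannel" left="320" top="0" width="180" height="240"/>
      <region id="TextChannel" left="0" top="240" width="500" height="60"/>
    </layout>
  </head>
  <body>
    <par>
      <video src="videos/video1.rm" region="VideoChannel"/>
      <img src="slides/show.rp" region="PixChannel"/>
      <textstream src="lyrics/words.rt" region="TextChannel"/>
    </par>
  </body>
</smil>
"""

def summarize(smil_text):
    """Return the meta values, region ids, and parallel clips of a SMIL file."""
    root = ET.fromstring(smil_text)
    # Head: meta information such as title, author, and copyright.
    meta = {m.get("name"): m.get("content") for m in root.findall("head/meta")}
    # Regions: the named rectangles declared in the layout.
    regions = [r.get("id") for r in root.findall("head/layout/region")]
    # Body: the clips inside the par group and the region each one plays in.
    clips = [(c.tag, c.get("region")) for c in root.findall("body/par/*")]
    return meta, regions, clips

meta, regions, clips = summarize(MASTER_SMIL)
print(meta["title"])
print(regions)
print(clips)
```

Each clip in the body names the region it plays in, which is how the layout in the head and the timing in the body are tied together.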

The regions are arranged in a layout similar to the cells in a table. The LEFT and TOP attributes control the position of each region, and the HEIGHT and WIDTH attributes specify its size.

SMIL has many similarities to HTML, but also some important differences. SMIL markup must start with a smil tag and end with the closing smil tag. All other markup appears between these two tags.

A SMIL file can include an optional header section defined by head tags. It requires a body section defined by body tags. Attribute values must be enclosed in double quotation marks. File names in SMIL must match the names on the server exactly. They can use upper, lower, or mixed case, but the case must be identical to how the name appears on the server. SMIL files are saved with the extension .smi or .smil.
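
Several of these rules can be checked mechanically. The following is a rough sketch in Python, using only the standard XML parser, that tests the rules as stated here: the file is well-formed XML (which rejects unquoted attribute values), the outermost tag is smil, and a body section is present. The function name check_smil and the sample strings are hypothetical. The sketch does not check file-name case, since that depends on what is actually stored on the server.

```python
import xml.etree.ElementTree as ET

def check_smil(smil_text):
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        root = ET.fromstring(smil_text)
    except ET.ParseError as err:
        # Unquoted attribute values and unclosed tags end up here.
        return [f"not well-formed XML: {err}"]
    problems = []
    if root.tag != "smil":
        problems.append("root element must be smil")
    if root.find("body") is None:
        problems.append("missing required body section")
    return problems

# Hypothetical sample inputs.
good = '<smil><body><video src="videos/video1.rm"/></body></smil>'
unquoted = '<smil><body><video src=videos/video1.rm/></body></smil>'
no_body = '<smil><head/></smil>'

for sample in (good, unquoted, no_body):
    print(check_smil(sample))
```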

The SMIL sequential (seq) and parallel (par) tags allow you to structure your media. Use the seq tag to play various clips in sequence. In the following example, the second video clip begins when the first video clip finishes.

<seq>
  <video src="videos/video1.rm"/>
  <video src="videos/video2.rm"/>
</seq>

To play two or more clips at the same time, use the par tag. Here the video clip plays while the text of the lyrics scrolls in synchronization.

<par>
  <video src="videos/video1.rm"/>
  <textstream src="lyrics/words.rt"/>
</par>
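
The difference between the two groups can be sketched as a small scheduling routine. The Python below walks nested seq and par groups and works out when each clip would start. The clip durations are invented, because a SMIL file does not state how long each clip runs. The sketch also assumes a par group ends when its longest clip ends and ignores any timing attributes. It illustrates the idea and is not how RealPlayer itself schedules clips.

```python
import xml.etree.ElementTree as ET

def schedule(node, start, durations, out):
    """Record (src, start) for each clip under node and return node's end time."""
    if node.tag == "seq":
        t = start
        for child in node:
            # Each child starts when the previous one has finished.
            t = schedule(child, t, durations, out)
        return t
    if node.tag == "par":
        # All children start together; the group ends with its longest child.
        ends = [schedule(child, start, durations, out) for child in node]
        return max(ends, default=start)
    # Anything else is treated as a media clip.
    src = node.get("src")
    out.append((src, start))
    return start + durations[src]

SMIL_BODY = """\
<seq>
  <par>
    <video src="videos/video1.rm"/>
    <textstream src="lyrics/words.rt"/>
  </par>
  <video src="videos/video2.rm"/>
</seq>
"""

# Assumed clip lengths in seconds, for illustration only.
DURATIONS = {
    "videos/video1.rm": 30.0,
    "lyrics/words.rt": 25.0,
    "videos/video2.rm": 45.0,
}

timeline = []
total = schedule(ET.fromstring(SMIL_BODY), 0.0, DURATIONS, timeline)
print(timeline)  # video1 and the lyrics start at 0.0, video2 at 30.0
print(total)     # 75.0
```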

When RealServer G2 streams parallel groups, it ensures that the clips stay synchronized. If some video frames don't arrive, RealServer either drops those frames or halts playback until the frames do arrive.

SMIL timing elements let you specify when a clip starts playing and how long it plays. If you do not set any timing values, the clips start and stop according to their normal timelines and their positions within par and seq groups. The easiest way to designate a time is with the shorthand markers h, min, s, and ms (hours, minutes, seconds, and milliseconds).
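
As a rough illustration of the shorthand markers, the Python sketch below converts a value such as 5s or 1.5min to milliseconds. The function name to_milliseconds is hypothetical, and the sketch handles only a number followed by one of the four markers, not any other time format a player may accept.

```python
import re

# Milliseconds per unit for the four shorthand markers.
UNIT_MS = {"h": 3_600_000, "min": 60_000, "s": 1_000, "ms": 1}

# A whole or decimal number followed directly by one marker.
_SHORTHAND = re.compile(r"(\d+(?:\.\d+)?)(h|min|s|ms)")

def to_milliseconds(value):
    """Convert a shorthand time value such as '5s' to milliseconds."""
    match = _SHORTHAND.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a shorthand time value: {value!r}")
    number, unit = match.groups()
    return float(number) * UNIT_MS[unit]

for sample in ("2h", "1.5min", "5s", "500ms"):
    print(sample, to_milliseconds(sample))
```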

For more information about technology innovations and Web video, see the following references.


Alesso, H. P. and Smith, C. F., Connections: Patterns of Discovery, John Wiley & Sons, Inc., 2008.

Alesso, H. P. and Smith, C. F., e-Video: Producing Internet Video as Broadband Technologies Converge (with CD-ROM), Addison-Wesley, 2000.

Web Site:
Video Software Lab
