Collaborative Virtual Environments Contributions to MPEG-4 SNHC
Fabrice Vergnengre, Tolga K. Capin, Daniel Thalmann
EPFL, Computer Graphics Lab.
Introduction - MPEG-4 Body Animation - Work Summary - Animating different models - Coding - Deformations - Reference software
Looking back over the past ten years, one can see a rapid convergence of many diverse domains:
The MPEG-4 standard, initiated in 1995, aims to provide tools for the efficient coding of multimedia scenes. Its first version is already standardized, and MPEG-4 version 2 will become an International Standard in December 1999. The first version proposed efficient coding of diverse kinds of data:
The second version adds Body Objects.
An MPEG Body must be viewed as a collection of nodes. The top-level node, BodyNode, contains essentially two nodes.
The first one, BAP (Body Animation Parameters), contains 296 parameters describing the topology of the skeleton. These BAPs can be applied to any MPEG-4 compliant body and produce the same animation. If we view a skeleton as a hierarchy of joints related to each other, each of the three possible rotational degrees of freedom of a joint has a BAP assigned to it. A node called renderedBody, belonging to BodyNode, contains the scene sub-graph describing the rendered body. A default body model is created by the decoder, so that an animation containing only BAPs can still be played.
If the source wants the decoder to display a specific body, that body must be sent in the BDP (Body Definition Parameters) node. The specific body's scene sub-graph then replaces the default body model in the renderedBody node. MPEG-4 also allows deformation tables to be sent so that intersecting body parts are rendered properly.
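The relationship between these nodes can be sketched in a few lines of Python. This is only an illustration of the description above, not the node syntax of the standard: the class names `SceneSubgraph` and `BodyNode`, the field names, and the `set_bdp` method are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

NUM_BAPS = 296  # number of body animation parameters quoted in the text


@dataclass
class SceneSubgraph:
    """Hypothetical stand-in for a scene sub-graph describing a body."""
    name: str


@dataclass
class BodyNode:
    """Illustrative top-level body node holding BAP, BDP and renderedBody."""
    bap: List[int] = field(default_factory=lambda: [0] * NUM_BAPS)
    bdp: Optional[SceneSubgraph] = None
    # The decoder creates a default model, so BAP-only streams can be played.
    rendered_body: SceneSubgraph = field(
        default_factory=lambda: SceneSubgraph("default body"))

    def set_bdp(self, body: SceneSubgraph) -> None:
        """A body received in the BDP node replaces the default model."""
        self.bdp = body
        self.rendered_body = body
```

With this sketch, a freshly created `BodyNode()` renders the default body, and calling `set_bdp(SceneSubgraph("custom body"))` swaps in the body sent by the source.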

"A body model is a representation of a virtual human or human-like character that allows portraying body movements adequate to achieve nonverbal communication and general actions.
A body model is animated by a stream of body animation parameters (BAP) encoded for low-bitrate transmission in broadcast and dedicated interactive communications. The BAPs manipulate independent degrees of freedom in the skeleton model of the body to produce animation of the body parts. The BAPs are quantized considering the joint limitations, and prediction errors are calculated and coded arithmetically. Similar to the face, the remote manipulation of a body model in a terminal with BAPs can accomplish lifelike visual scenes of the body in real-time without sending pictorial and video details of the body every frame.
The BAPs, if correctly interpreted, will produce reasonably similar high level results in terms of body posture and animation on different body models, also without the need to initialize or calibrate the model. The BDP set defines the set of parameters to transform the default body to a customized body optionally with its body surface, body dimensions, and texture.
The body definition parameters (BDP) allow the encoder to replace the local model of a more capable terminal. BDP parameters include body geometry, calibration of body parts, degrees of freedom, and optionally deformation information."
Contribution to the MPEG-4 SNHC (Synthetic-Natural Hybrid Coding) and VRML standardization efforts has also been of fundamental importance to COVEN's activities. From the very start, the COVEN project has had standards embedded in its workplan as an integral part of its long-term vision. Virtual human avatar representation plays the key role in this activity, as low-bitrate streaming of virtual human avatar data is an important requirement for large-scale CVEs.
The major technical issues for MPEG-4 body animation work within COVEN have been:
Animating different H-Anim models
An MPEG-4 FBA-compliant decoder is assumed to have a default body model. Body definition parameters (BDPs) allow this local model at the receiver site to be customized with a particular body model. The BDP set contains the following items:
* Body surface geometry (with texture coordinates if texture is used)
* The body surface geometry is downloaded using the 3D mesh transmission mechanism.
* Joint centre locations
* Texture images (optional)
After the FBA activities started, a working group called H-Anim (Humanoid Animation) was formed within the VRML consortium to describe a standard representation of a virtual human. The goal of this working group was related to the MPEG-4 BDP specification, so good coordination was important to produce consistent standards. Considerable effort went into joint meetings between the FBA and H-Anim groups to solve the technical problems that arose.
Body animation parameters (BAPs) are used to modify the posture of the virtual body during animation. The same BAPs can be applied to different models in order to produce reasonably similar high-level results in terms of body posture and animation, without the need to initialize and calibrate the model. The BAP set contains the joint angles connecting different body parts; these include toe, ankle, knee, hip, spine (C1-C7, T1-T12, L1-L5), shoulder, clavicle, elbow, wrist and the fingers. The body contains a total of 186 degrees of freedom, including 25 degrees of freedom for each hand. In particular, the spine contains five levels of detail, and models can be constructed with varying complexity: a total of 9, 24, 42, 60, 72 degrees of freedom. Thus, the models can have variable complexity depending on the target application.
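The spine figures quoted above can be checked with a small sketch. The vertebra counts follow from the ranges C1-C7, T1-T12 and L1-L5, and three rotations per joint is the figure given earlier for a joint. The names `SPINE_DOF_BY_LEVEL`, `full_spine_dof` and `pick_spine_dof` are hypothetical, and selecting a level from a degree-of-freedom budget is an assumed use case, not something the standard prescribes.

```python
# Degrees of freedom of the five spine levels of detail, as quoted above.
SPINE_DOF_BY_LEVEL = (9, 24, 42, 60, 72)

# Vertebrae per spine region: C1-C7, T1-T12, L1-L5.
VERTEBRAE = {"cervical": 7, "thoracic": 12, "lumbar": 5}


def full_spine_dof(rotations_per_joint=3):
    """DOF count when every vertebra carries all of its rotations."""
    return sum(VERTEBRAE.values()) * rotations_per_joint


def pick_spine_dof(dof_budget):
    """Largest listed spine DOF count within the budget, or None if none fits."""
    candidates = [dof for dof in SPINE_DOF_BY_LEVEL if dof <= dof_budget]
    return max(candidates) if candidates else None
```

Under these assumptions the most detailed level (72) coincides with 24 vertebrae times three rotations, while a target application with a smaller budget would fall back to one of the coarser levels.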
EPFL has exchanged models and animation sequences to verify the results of animations on different models. One of the core experiment sequences was an alphabet sequence for sign language.
After the standard parameters were discussed, the next step was to decide on a scheme to code these parameters efficiently. For BDP transmission, the syntax of a number of new body nodes was defined. For BAP parameters, a linear quantization scheme with arithmetic coding was chosen. The same elementary bitstream is shared by face and body models.
The BAPs are quantized and coded by a predictive coding scheme, similar to the FAPs. For each parameter to be coded in the current frame, the decoded value of this parameter in the previous frame is used as the prediction. Then the prediction error, i.e., the difference between the current parameter and its prediction, is computed and coded by arithmetic coding. Because the prediction is the decoded value rather than the original one, this scheme prevents the coding error from accumulating. The arithmetic decoding process is described in detail in MPEG-4 FDIS Version 2.
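The prediction loop for a single parameter can be sketched as follows. This is a simplified illustration, not the reference implementation: the arithmetic coder is omitted (the integer symbols are what it would receive), the prediction for the first frame is assumed to be 0, and the function names are hypothetical.

```python
def encode_predictive(values, step):
    """Quantized prediction errors for one parameter over successive frames."""
    prev_decoded = 0  # assumption: prediction for the first frame is 0
    symbols = []
    for value in values:
        error = value - prev_decoded      # prediction error
        symbol = round(error / step)      # linear quantization of the error
        symbols.append(symbol)            # would be passed to the arithmetic coder
        # Track what the decoder will reconstruct, not the original value,
        # so the quantization error of one frame is corrected in the next.
        prev_decoded += symbol * step
    return symbols


def decode_predictive(symbols, step):
    """Rebuild the parameter values from the quantized prediction errors."""
    prev_decoded = 0
    decoded = []
    for symbol in symbols:
        prev_decoded += symbol * step
        decoded.append(prev_decoded)
    return decoded
```

Because the encoder predicts from the decoded value, the reconstruction error in every frame stays within half a quantization step instead of growing over time.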
Similar to FAPs, each BAP has a different precision requirement. Therefore different quantisation step sizes are applied to the BAPs. The base quantisation step size for each BAP is defined in the tables below. The bit rate is controlled by adjusting the quantisation step via the use of a quantisation parameter (scaling factor). The magnitude of the quantisation parameter ranges from 0 to 31 (0 denotes no quantisation). Alternatively, the FAPs and BAPs can also be coded by a DCT coding scheme, which is more suitable for offline animation sequences and works on blocks of 16 frames.
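A minimal sketch of the scaling described above, assuming that the effective step is the base step multiplied by the quantisation parameter. That multiplication is one reading of "scaling factor", and the base step of 64 used in the usage note is a made-up value, not one taken from the standard's tables.

```python
def effective_step(base_step, quant_param):
    """Effective quantisation step, or None when quantisation is disabled."""
    if not 0 <= quant_param <= 31:
        raise ValueError("quantisation parameter must be in the range 0..31")
    if quant_param == 0:
        return None  # 0 denotes no quantisation
    return base_step * quant_param  # assumed: the parameter scales the base step


def quantise(value, base_step, quant_param):
    """Round a value to the nearest multiple of the effective step."""
    step = effective_step(base_step, quant_param)
    if step is None:
        return value
    return round(value / step) * step
```

For example, with a hypothetical base step of 64, a quantisation parameter of 8 gives an effective step of 512, so a value of 1000 is represented as 1024. A larger parameter gives a coarser step and fewer bits.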
| Sequence | Quantization | Bitrate (Kbits/s @ 30 fps) |
|---|---|---|
| Gesture | 8 | 6 |
| Gesture | 16 | 3 |
| Gesture | 31 | 1 |
| Talking | 8 | 7 |
| Talking | 16 | 3 |
| Talking | 31 | 2 |
| Tennis | 1 | 12 |
| Tennis | 16 | 4 |
| Tennis | 31 | 2 |
Upper arm deformation test results (shaded)
Upper arm deformation test results (wireframe)
Upper leg deformation test results (shaded)
Upper leg deformation test results (wireframe)
EPFL is involved in the implementation of the MPEG-4 reference software, integrating the Body Animation part into IM1 Player 3D. The Body Animation reference software is able to:
The IM1 Core creates a default body through a parse-and-construct process (for the Joints and Segments) that builds an IM1 hierarchy. The hierarchy obtained from this process is shown in the following diagram:

A dynamically created table of pointers to every Transform node (each defining a Joint) lets us use the output of the EPFLBODY module to animate the body, i.e. to overwrite the values (rotations and translations) of each Transform node with the EPFLBODY values. This table must be rebuilt whenever a new body (contained in the BDP node) is specified.
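The lookup-table idea can be illustrated with a short sketch. It does not reflect the actual IM1 or EPFLBODY interfaces: `TransformNode`, `build_joint_table` and `apply_pose` are hypothetical names, and a dictionary keyed by joint name stands in for the table of pointers.

```python
class TransformNode:
    """Hypothetical stand-in for a Transform node that defines a joint."""

    def __init__(self, name, children=()):
        self.name = name
        self.rotation = (0.0, 0.0, 0.0)
        self.translation = (0.0, 0.0, 0.0)
        self.children = list(children)


def build_joint_table(root):
    """Walk the hierarchy once and index every Transform node by joint name."""
    table = {}
    stack = [root]
    while stack:
        node = stack.pop()
        table[node.name] = node
        stack.extend(node.children)
    return table


def apply_pose(table, pose):
    """Write decoded (rotation, translation) pairs into the matching nodes."""
    for joint_name, (rotation, translation) in pose.items():
        node = table.get(joint_name)
        if node is not None:  # joints missing from this body are skipped
            node.rotation = rotation
            node.translation = translation
```

Each frame then costs one lookup per animated joint rather than a traversal of the hierarchy. When a new body arrives, `build_joint_table` is simply called again on the new root.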
EPFL is releasing the implementation of an MPEG-4 compliant decoder that can decode as well as other MPEG-4 tools. Here are some screenshots of the very first results of the Player3d playing an MPEG-4 stream containing a Body Object. The body used is the default one (EPFL model), since the stream contains only BAPs.
Here is a screenshot of a more complete scene, containing a ground, an object (the shelf), a video-textured object (the TV) and a body, plus a one-minute-long sound source. This stream is a good example of the effectiveness of MPEG-4 for coding multimedia scenes: the size of the complete scene (audio + video + synthetic objects + a body and its animation) is approximately 130 Kbytes.