Difference: CADiscussMay19 (2 vs. 3)

Revision 3 2006-05-26 - PhilippeBeaudoin

Line: 1 to 1
 
META TOPICPARENT name="CharacterAnimationGroup"

Discussion from the May 19th, 2006 reading group meeting

Paper presented:

Changed:
<
<
Arikan, Okan. Compression of Motion Capture Databases, to appear in Siggraph 2006 proceedings
>
>
Kang Hoon Lee, Myung Geol Choi and Jehee Lee. Motion Patches: Building Blocks for Virtual Environments Annotated with Motion Data, to appear in Siggraph 2006 proceedings
 
 
Line: 14 to 14
 

Paper Overview

Changed:
<
<
They present a method to compress a large database of skeletal motion data. Their method has two parts:
  1. Global motion compression
  2. Precise compression of joints in contact with the environment (the feet, in their case)

Global Motion Compression

  • Each joint is associated with 3 virtual markers that are tracked through time. The marker positions are said to behave more linearly than the DOF angles, and the DOF angles can easily be re-extracted from the markers, even after compression (using a least-squares fit).
  • Split the motion into clips of 16-32 frames.
  • For each clip, fit a cubic 3D Bezier curve to each moving marker (4 control points, hence 12 numbers per marker).
  • Each clip is therefore a vector in a space of dimension d = 12 x 3 x number of joints (e.g. d = 1080 for a skeleton with 30 joints).
  • Reduce the dimension using Clustered PCA (CPCA) over the whole database:
    • Spectral clustering using the Nystrom approximation (ref given)
    • Typically 1 to 20 clusters
    • Randomly draw 10000 frames from the database before performing the CPCA
    • A parameter decides how many dimensions are kept
  • Quantize the elements of the reduced vector to 16 bits. (A rough sketch of the whole pipeline follows this list.)
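A minimal Python sketch of this pipeline, assuming invented clip data and a plain PCA basis standing in for the clustered PCA (dimensions, basis and quantization range are all made up here):

<verbatim>
import numpy as np

def fit_cubic_bezier(points):
    """Least-squares fit of a cubic Bezier (4 control points) to one
    marker trajectory of shape (T, 3), at uniform parameter values."""
    t = np.linspace(0.0, 1.0, len(points))
    B = np.stack([(1 - t) ** 3,
                  3 * t * (1 - t) ** 2,
                  3 * t ** 2 * (1 - t),
                  t ** 3], axis=1)              # Bernstein basis, (T, 4)
    ctrl, *_ = np.linalg.lstsq(B, points, rcond=None)
    return ctrl                                 # (4, 3): 12 numbers per marker

def clip_vector(markers):
    """Stack the control points of every marker trajectory of a 16-32
    frame clip (markers has shape (T, M, 3)) into one vector of size 12*M."""
    return np.concatenate([fit_cubic_bezier(markers[:, m]).ravel()
                           for m in range(markers.shape[1])])

def quantize16(x, lo, hi):
    """Uniformly quantize the reduced coefficients to 16-bit integers."""
    scaled = (x - lo) / (hi - lo) * 65535.0
    return np.clip(np.round(scaled), 0, 65535).astype(np.uint16)

clip = np.random.randn(32, 30, 3).cumsum(axis=0)      # fake clip: 10 joints x 3 markers
v = clip_vector(clip)                                 # d = 12 * 3 * 10 = 360
basis = np.linalg.qr(np.random.randn(v.size, 20))[0]  # stand-in for a CPCA basis
reduced = basis.T @ v                                 # keep 20 dimensions
code = quantize16(reduced, reduced.min(), reduced.max())
</verbatim>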

Specific Joint Compression

  • Ground reaction forces are quite significant and apply over a very short time ==> high frequencies in the motion
  • Sliding feet are a perceptually important artifact
  • Consider the x,y,z coordinates of the virtual markers on the feet (or other contact joints) as separate 1D signals
  • For each clip, apply a DCT to these signals, then quantize, then entropy-encode (Huffman codes); see the sketch after this list.
  • During decompression, use IK (Tolani et al. 2000) to plant the foot at the reconstructed position.
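A hedged sketch of the per-channel foot compression using SciPy's type-II DCT; the quantization step and the toy signal are assumptions, and the Huffman stage is only indicated by a comment:

<verbatim>
import numpy as np
from scipy.fft import dct, idct

def compress_channel(signal, step=0.01):
    """DCT one 1D marker-coordinate signal and quantize the coefficients.
    The integer coefficients would then be Huffman-coded."""
    return np.round(dct(signal, norm='ortho') / step).astype(np.int32)

def decompress_channel(qcoeffs, step=0.01):
    """Invert the quantization and the DCT."""
    return idct(qcoeffs * step, norm='ortho')

foot_x = np.sin(np.linspace(0, 2 * np.pi, 32))   # fake foot-marker channel
rec = decompress_channel(compress_channel(foot_x))
# rec is the reconstructed foot position handed to the IK solver.
</verbatim>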

Features and results

  • To access an individual frame, one must decompress an entire 16-32 frame clip
  • Compresses at 1 ms/frame, decompresses at 1.2 ms/frame (7 times real-time)
  • Any clip can be randomly accessed for decompression
  • CPCA is performed offline on a random 10000 frames of animation; clips can be processed independently
  • After CPCA, clips can be compressed independently and incrementally. If the statistical distribution changes, the CPCA can be performed again.
  • Clip-to-clip transitions can be discontinuous. This is fixed by solving a sparse linear system over the clip, a step called Continuous Merge (???)
>
>
The basic idea is to create an environment by assembling a large number of regular building blocks called unit objects, such as a slide, a square ground tile or a cubic block. Some configuration of these unit objects is physically built in a motion-capture studio in order to capture a long sequence of a human navigating this environment.

A model of the physical environment is then used to build motion patches. These are groups of 2 unit blocks (similar up to a rigid transform) together with all the frames of motion that pass through them. Note: the authors say that motion patches can be built with 1 or more unit blocks, although their implementation only uses 2-unit-block patches, except for a single 1-unit-block patch. The following discussion is easier to follow if we think of 2-unit-block patches.

As a pre-process, all the acquired poses are clustered so that two poses fall in the same cluster if they are close enough according to a simple distance metric. Each cluster is assigned an index p. (Incidentally, this clustering is performed in a roughly 200-dimensional space; they use the agglomerative hierarchical k-means algorithm [ref given]. A stand-in sketch follows.)
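As a rough illustration, a plain k-means pass standing in for the agglomerative hierarchical k-means the authors cite (pose count, dimension and k are invented):

<verbatim>
import numpy as np

def kmeans(poses, k, iters=20, seed=0):
    """Assign each pose (one row) a cluster index p via Lloyd's algorithm."""
    rng = np.random.default_rng(seed)
    centers = poses[rng.choice(len(poses), k, replace=False)]
    for _ in range(iters):
        dist = np.linalg.norm(poses[:, None] - centers[None], axis=2)
        labels = dist.argmin(axis=1)             # cluster index p per pose
        for j in range(k):
            if np.any(labels == j):
                centers[j] = poses[labels == j].mean(axis=0)
    return labels, centers

labels, _ = kmeans(np.random.randn(500, 200), k=20)   # ~200-dim pose space
</verbatim>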

In order to interactively branch between animations, each pose of each motion contained in a patch is binned into a regular grid of cells. A cell occupies a projected square area of about 10cm x 10cm and has 4 dimensions (x, y, theta, p). A motion frame is said to be in the cell with coordinates (x, y) if its root node falls within the projected square at position (x, y), in the cell with angle theta if the yaw orientation of the character at that frame is equal to theta (within a threshold?), and in the cell with index p if the cluster index of its pose is p.
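A sketch of the binning, with an assumed yaw discretization (the text above only fixes the ~10cm cell footprint; the example frames are invented):

<verbatim>
import math
from collections import defaultdict, namedtuple

Frame = namedtuple('Frame', 'x y yaw p motion_id index')

CELL = 0.10          # metres, ~10cm x 10cm projected footprint
THETA_BINS = 16      # assumed yaw discretization

def cell_key(x, y, theta, p):
    tbin = int((theta % (2 * math.pi)) / (2 * math.pi) * THETA_BINS)
    return (math.floor(x / CELL), math.floor(y / CELL), tbin % THETA_BINS, p)

frames = [Frame(0.03, 0.07, 0.10, 5, motion_id=0, index=0),   # invented
          Frame(0.05, 0.02, 0.12, 5, motion_id=1, index=40)]
cells = defaultdict(list)
for f in frames:
    cells[cell_key(f.x, f.y, f.yaw, f.p)].append(f)
# Both frames land in the same (x, y, theta, p) cell, so the animation
# can branch between the two motions there.
</verbatim>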

After each frame of each motion has been binned, the system builds a directed graph where each cell is a node. A link is added between two cells if there exists a segment of motion starting in one cell and ending in the other. When the same cell is occupied by 2 or more frames, the animation can interactively branch at that cell. In order to perform this branching quickly at run-time, the different motions present in the cell are warped to create a smooth branching transition [ref given].
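A minimal sketch of the graph construction, with invented cell keys and motion segments:

<verbatim>
from collections import defaultdict

# Each motion segment contributes a directed edge from the cell it starts
# in to the cell it ends in; the keys here are invented placeholders.
segments = [('cellA', 'cellB'), ('cellB', 'cellC'), ('cellB', 'cellA')]

graph = defaultdict(set)
for start, end in segments:
    graph[start].add(end)

# A cell with more than one outgoing link is a point where the animation
# can branch; the motions meeting there are warped for a smooth transition.
branch_cells = [c for c, outs in graph.items() if len(outs) > 1]
</verbatim>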

To create a novel environment, motion patches are automatically fitted to the different unit blocks that build the environment. Since each motion patch occupies two building blocks, patches will often overlap. When this happens, the cells in the overlapping region are scanned to create links between the motion patches. (The authors don't mention whether the motions are warped or whether motion blending is used in this case.)

A final type of patch is introduced: a large flat square that can be tiled to produce arbitrarily large interactive walking motions. The idea here is to create a number of regularly spaced entry/exit points along the sides of the square tile. The motion of a subject randomly walking around for about 10 minutes is then captured and analysed to find paths that connect each entry point to each exit point. An entry/exit point is annotated with (x, y, theta, p), similarly to the previously introduced cells. A technique to prune the motion graph when obstacles are present on the tile is presented, and path planning and collision avoidance techniques are also discussed.
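A toy sketch of harvesting entry-to-exit paths from the random-walk capture; the frames, the annotations and the matching tolerance are all invented:

<verbatim>
# Frames and entry/exit points are annotated with (x, y, theta, p).
frames  = [(0.0, 0.0, 0.0, 1), (0.5, 0.0, 0.0, 2), (1.0, 0.0, 0.0, 3)]
entries = [(0.0, 0.0, 0.0, 1)]
exits   = [(1.0, 0.0, 0.0, 3)]

def matches(frame, point, tol=0.05):
    """A frame matches an annotation if it is close in position and has
    the same pose-cluster index (yaw check omitted for brevity)."""
    return (abs(frame[0] - point[0]) < tol and
            abs(frame[1] - point[1]) < tol and
            frame[3] == point[3])

# Collect subsequences [i..j] that start at an entry and end at an exit.
paths = [(e, x, i, j)
         for i, f in enumerate(frames) for e in entries if matches(f, e)
         for j in range(i + 1, len(frames)) for x in exits
         if matches(frames[j], x)]
# -> [((0.0, 0.0, 0.0, 1), (1.0, 0.0, 0.0, 3), 0, 2)]
</verbatim>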

 

Paper Discussion

Changed:
<
<
Here's what we think is missing in the paper:
  • Progressive compression / Generating various animation LOD
  • The technique doesn't take into account that the database is made of multiple sequences.
  • Good results only if the database is large enough
  • The reported frame rates are not realistic for applications such as real-time games
  • Incrementally compressing motion is efficient as long as the statistical properties do not change. When does that happen?
  • Baseline comparison methods are probably too simple
  • Decompression could exhibit cache issues since large chunks of data (PCA matrices) must be randomly accessed
  • Compression is faster than decompression? Sounds weird...
  • Justification for not using angular data is kind of weak

Here are some ideas that came out:

  • Check how receptive field weighted regression would perform for temporal compression
  • Use progressive compression technique for large mocap database exploration over a slow channel (see PhilippeBeaudoin)

Some links to papers that are not referred to but are related:

>
>
Is this the right way to go?
  • Is it such a good idea to have so many very similar motions?
  • Wouldn't it be better to do a directed mocap session where each animation performed by the subject would be useful?
  • In real life, animations are smoother and squishier (I think that's what Kevin said) than the authors assume. Here, only minimal modifications are applied in order to stitch motions together.
  • The tilable patch is probably not the best way to create long walking sequences. A nice interactive walking technique would seem more appropriate.
  • Although fast and simple, path planning at the tile level still lets the character move erratically within each tile, which is not a very convincing behaviour.
  -- PhilippeBeaudoin - 19 May 2006
 