1:00pm - 1:20pm
Real-Time 3D Visual Singing Synthesis: From Appearance to Internal Articulators
University of Science and Technology of China, People's Republic of China
A real-time facial animation system driven by a musical score and lyrics creates song-synchronized 3D articulatory animation at the phoneme level. A 3D head mesh model, including meshes of the appearance and the internal articulators, is constructed by fusing data from visible images and Magnetic Resonance Imaging. Both the finite element method and an anatomical model are used to simulate the deformation of the articulators and to synthesize the articulatory animation corresponding to each phoneme with its musical note. To obtain a visual co-articulation model, musical notes, phonemes, and articulatory movements are modeled simultaneously in the framework of a context-dependent Hidden Semi-Markov Model trained on an articulatory song corpus collected by Electro-Magnetic Articulography. The articulatory animations corresponding to all phonemes are concatenated by the visual co-articulation model to produce the song-synchronized articulatory animation. The experiments show that the singing ability significantly increases the human-computer interaction capability of the talking head, and quantitative improvements over the output of other talking head systems are demonstrated in objective evaluation and user studies.
1:20pm - 1:40pm
Visual robotic object grasping through combining RGB-D data and 3D mesh
1National Key Lab for Novel Software Technology, Nanjing University, Nanjing, China; 2Riseauto Intelligent Tech., Beijing, China
Robotic grasping driven by cameras is challenging because of the many inherent difficulties of vision-based systems, especially the inaccuracy in determining the sizes, distances, and orientations of visual objects. In this paper, we present a novel framework that drives automatic robotic grasping by matching camera-captured RGB-D data with 3D meshes, on which prior knowledge for grasping is pre-defined for each object type. The proposed framework consists of two modules: pre-defining grasping knowledge for each type of object shape on 3D meshes, and automatic robotic grasping by matching RGB-D data with the pre-defined 3D meshes. In the first module, we scan 3D meshes for typical object shapes and pre-define grasping regions on each 3D shape surface, which serve as the prior knowledge for guiding automatic robotic grasping. In the second module, for each RGB-D image captured by a depth camera, we recognize the 2D shape of the object in it with an SVM classifier and then segment it from the background using depth data. Next, we propose a new algorithm that matches the segmented RGB-D shape with the pre-defined 3D meshes to guide robotic self-localization and grasping automatically. Our experimental results show that the proposed framework is particularly useful for guiding camera-based robotic grasping.
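The depth-based segmentation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes the object lies within a known depth range in front of the camera, and the function names (segment_by_depth, bounding_box), the range limits, and the sample depth values are all hypothetical.

```python
from typing import List, Optional, Tuple

Mask = List[List[bool]]


def segment_by_depth(depth: List[List[float]], near: float, far: float) -> Mask:
    """Mark pixels whose depth (in meters) lies in [near, far] as foreground.

    Assumption: a depth of 0.0 means "no measurement", so it falls outside
    any positive range and is treated as background.
    """
    return [[near <= d <= far for d in row] for row in depth]


def bounding_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) of the foreground, or None if empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, is_fg in enumerate(row) if is_fg]
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))


# Hypothetical 3x3 depth image: an object at about 0.8-0.9 m in front of
# a background at 2.0 m, with one missing reading (0.0).
depth_image = [
    [0.0, 2.0, 2.0],
    [2.0, 0.8, 0.9],
    [2.0, 0.85, 2.0],
]
object_mask = segment_by_depth(depth_image, near=0.5, far=1.0)
object_box = bounding_box(object_mask)
```

A fixed depth range is the simplest possible choice; a real system would more likely derive the range from the recognized object's location or from a background model.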
1:40pm - 2:00pm
ReMagicMirror: Action Learning Using Human Reenactment with the Mirror Metaphor
1Nara Institute of Science and Technology, Japan; 2Microsoft Research Asia, China; 3Kagoshima University, Japan
We propose ReMagicMirror, a system to help people learn actions (e.g., martial arts, dances). We first capture the motions of a teacher performing the action to learn, using two RGB-D cameras. Next, we fit a parametric human body model to the depth data and texture it using the color data, reconstructing the teacher's motion and appearance. The learner is then shown the ReMagicMirror system, which acts as a mirror. We overlay the teacher's reconstructed body on top of this mirror in an augmented reality fashion. The learner is able to intuitively manipulate the reconstruction's viewpoint by simply rotating her body, allowing for easy comparisons between the learner and the teacher. We perform a user study to evaluate our system's ease of use, effectiveness, quality, and appeal.
2:00pm - 2:20pm
Augmented Telemedicine Platform for Real-time Remote Medical Consultation
University of California at Berkeley, Berkeley, CA, USA
Current telemedicine systems for remote medical consultation are based on decades-old video-conferencing technology. Their primary role is to deliver video and voice communication between medical providers and to transmit the patient's vital signs. This technology, however, does not provide the expert physician with the same hands-on experience as examining a patient in person. Virtual and augmented reality (VR and AR), on the other hand, have the capacity to enhance the experience and communication between healthcare professionals in geographically distributed locations. When RGB+D video (texture and depth) of the patient is transmitted in real time, the expert physician can interact with this 3D representation in novel ways. Furthermore, the use of AR technology on the patient side has the potential to improve communication by providing clear visual instructions to the caregiver. In this paper, we propose a framework for 3D real-time communication that combines interaction via VR and AR. We demonstrate the capabilities of our framework on a prototype system consisting of a depth camera, a projector, and a 3D display. The system is used to analyze the network performance and data transmission quality of the multimodal streaming in a remote scenario.
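As an illustration of how RGB+D video yields a 3D representation, the sketch below back-projects depth pixels into 3D camera coordinates with the standard pinhole camera model. It is not taken from the paper: the intrinsic parameters (fx, fy, cx, cy), the function names, and the sample values are assumptions chosen for the example.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def backproject(u: float, v: float, z: float,
                fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Map pixel (u, v) with depth z to a 3D point in the camera frame.

    Pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    """
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


def depth_to_points(depth: List[List[float]],
                    fx: float, fy: float, cx: float, cy: float) -> List[Point3D]:
    """Convert a depth image (row-major, meters) to a list of 3D points.

    Assumption: a depth of 0.0 marks a missing measurement and is skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0.0:
                points.append(backproject(u, v, z, fx, fy, cx, cy))
    return points


# Hypothetical intrinsics for a 640x480 depth camera.
FX, FY, CX, CY = 525.0, 525.0, 320.0, 240.0

center_point = backproject(320, 240, 1.5, FX, FY, CX, CY)
offset_point = backproject(845, 240, 2.0, FX, FY, CX, CY)
cloud = depth_to_points([[0.0, 1.0], [1.0, 0.0]], FX, FY, CX, CY)
```

Pairing each resulting point with the color of the same pixel gives a textured point cloud, which is one common way to present depth video as an interactive 3D view.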
2:20pm - 2:40pm
What are Good Design Gestures? -Towards user- and machine-friendly interface-
Kyushu University, Japan
This paper discusses gesture design for man-machine interfaces. Traditionally, gesture-interface studies have focused on improving performance: high recognition accuracy, real-time recognition, and fewer misclassifications. Many studies overlook an important issue: how to design good gestures from the viewpoints of both machine friendliness and user friendliness. The former is, of course, to guarantee higher recognition accuracy by a machine. The latter is to provide gestures that are easy for a user to use and easy to remember. In this paper, we investigate what kinds of gestures are desirable for both a machine and a user, and we present concepts for gesture design obtained from our crowdsourced experiments involving 351 participants.