Date/Time: February 22 (Thu), 13:15–14:45
Venue: Research Bldg. No. 7, Seminar Room 1 (Room 127)
During this talk I will present the approaches we have developed to reduce the need for labelled data in several domains of machine learning applications. First I will discuss histopathological images and the difficulties they pose for deep learning segmentation algorithms. I will present several domain-invariant approaches that we have developed to overcome these difficulties, and some interesting findings that have arisen from them. Next I will discuss the difficulty of performing domain adaptation and learning domain-invariant features with multi-modal imagery data (i.e. image domains with different numbers of bands), with application to remote sensing data. Finally, I will present our work on constrained clustering for time series, a semi-supervised approach that reduces the need for labelled data using must-link and cannot-link constraints, and I will extend this to propose an approach that explains the clustering result in an easily interpretable manner.
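To give a flavour of the last idea, here is a minimal COP-k-means-style sketch in which must-link and cannot-link constraints veto cluster assignments inside a k-means loop; the toy vectors standing in for time series and the plain Euclidean distance are illustrative assumptions, not the method presented in the talk.

```python
# Minimal COP-k-means-style sketch (illustrative only; not the talk's method).
# Must-link / cannot-link constraints veto cluster assignments during k-means.
import numpy as np

def violates(i, k, labels, must_link, cannot_link):
    """Check whether assigning sample i to cluster k breaks a constraint."""
    for a, b in must_link:
        j = b if a == i else a if b == i else None
        if j is not None and labels[j] != -1 and labels[j] != k:
            return True
    for a, b in cannot_link:
        j = b if a == i else a if b == i else None
        if j is not None and labels[j] == k:
            return True
    return False

def cop_kmeans(X, n_clusters, must_link, cannot_link, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(n_iter):
        labels = np.full(len(X), -1)
        for i, x in enumerate(X):
            # Try clusters from nearest to farthest, skipping constraint violations.
            for k in np.argsort(np.linalg.norm(centers - x, axis=1)):
                if not violates(i, k, labels, must_link, cannot_link):
                    labels[i] = k
                    break
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels

# Toy usage: six short "time series" as flat vectors, one must-link and one cannot-link pair.
X = np.array([[0, 0, 1], [0, 1, 1], [5, 5, 6], [5, 6, 6], [0, 0, 2], [5, 5, 5]], float)
print(cop_kmeans(X, 2, must_link=[(0, 4)], cannot_link=[(0, 2)]))
```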
Date/Time: December 27 (Wed), 13:30–14:45
Venue: Research Bldg. No. 12, Room 316
Do I experience the world in the “same” way as you do? How similar is my consciousness to that of other humans, animals, insects, octopuses, and artificial intelligence? In this talk, I will discuss this issue focusing on the content of consciousness, what-it-feels-like, or in short “qualia”. Can we scientifically investigate whether my red qualia are similar to your red qualia? Traditionally, the answer has been no: because qualia are intrinsic and purely subjective, they have often been considered outside the realm of scientific inquiry. Recently, our group has proposed a method of characterising a quale in terms of its relations to all other qualia, inspired by a mathematical theorem called the “Yoneda lemma” in the field of category theory. Based on this idea, we conducted experiments in which we asked a large number (>500) of neurotypical and colorblind subjects to report the similarity of a subset of ~5000 colour combinations in an online setting. Using the similarity structures estimated from these data, we quantified whether it is possible to “align” the colour qualia structures between different populations using an unsupervised method called “optimal transport”, known from the field of machine translation (https://psyarxiv.com/h3pqm). Our qualia-structure approach generalizes to qualia in other domains (such as the similarity of emotional experiences evoked by short movies), or even to structures between qualia structures. The relationships between qualia structures may eventually provide an opportunity to address questions such as, “Why are colour qualia perceived as colour qualia?”
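For readers unfamiliar with the unsupervised alignment step, here is a minimal sketch of aligning two dissimilarity structures with Gromov-Wasserstein optimal transport via the POT library; the random matrices, the number of items, and the noise level are placeholders standing in for the colour-similarity data, not the study's actual setup.

```python
# Minimal sketch of unsupervised alignment of two similarity structures with
# optimal transport (Gromov-Wasserstein), using the POT library. The random
# dissimilarity matrices below are placeholders for the colour data.
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
n = 50                                        # placeholder number of colours
D1 = rng.random((n, n)); D1 = (D1 + D1.T) / 2; np.fill_diagonal(D1, 0.0)
perm = rng.permutation(n)                     # hidden correspondence
noise = 0.01 * rng.random((n, n))
D2 = D1[perm][:, perm] + (noise + noise.T) / 2  # noisy, relabelled copy
np.fill_diagonal(D2, 0.0)

p, q = ot.unif(n), ot.unif(n)                 # uniform weights over items
T = ot.gromov.gromov_wasserstein(D1, D2, p, q, loss_fun='square_loss')

# If the structures are alignable, row i of the transport plan T should
# ideally put most of its mass on the column matching the hidden permutation.
inv = np.argsort(perm)                        # inverse of the hidden permutation
print("fraction correctly matched:", np.mean(T.argmax(axis=1) == inv))
```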
Date/Time: December 4 (Mon), 13:30–14:30
Venue: Research Bldg. No. 7, Information Lecture Room 1 (Room 107, 1st floor)
In this talk, I will first review our work on learning to synthesize image and video content from image data. The underlying theme is to exploit different priors to synthesize diverse content with robust formulations. Time permitting, I will also discuss other recent results on image editing, video generation, feature learning, surface normal estimation, and deformable 3D reconstruction.
Date/Time: October 20 (Fri), 13:30–14:30
Venue: Research Bldg. No. 7, Information Lecture Room 1 (Room 107, 1st floor)
Geometric Deep Learning has made striking progress with the advent of Deep Implicit Fields. They allow for detailed modeling of surfaces of arbitrary topology while not relying on a 3D Euclidean grid, resulting in a learnable 3D surface parameterization that is not limited in resolution. Unfortunately, they have not yet reached their full potential for applications that require an explicit surface representation in terms of vertices and facets, because converting the implicit representation to such an explicit representation requires a marching-cubes algorithm, whose output cannot be easily differentiated with respect to the implicit surface parameters.
In this talk, I will present our approach to overcoming this limitation and implementing convolutional neural nets that output complex 3D surface meshes while remaining fully differentiable and end-to-end trainable. I will also present applications to single-view reconstruction, physically driven shape optimization, and bio-medical image segmentation.
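To make the limitation concrete, a minimal non-differentiable baseline is sketched below: a toy MLP is treated as an implicit signed distance field, sampled on a grid, and meshed with skimage's marching cubes, at which point the vertices are plain NumPy arrays and no gradient can flow back to the network weights. The network size, grid resolution, and level-set guard are illustrative choices only.

```python
# Non-differentiable baseline: sample an implicit field on a grid and run
# marching cubes. The vertices come out as plain NumPy arrays, so no gradient
# flows back to the network parameters (the limitation discussed above).
import torch
from skimage.measure import marching_cubes

sdf = torch.nn.Sequential(             # toy signed-distance network
    torch.nn.Linear(3, 64), torch.nn.Softplus(),
    torch.nn.Linear(64, 1))

res = 32
axis = torch.linspace(-1.0, 1.0, res)
grid = torch.stack(torch.meshgrid(axis, axis, axis, indexing="ij"), dim=-1)

with torch.no_grad():                   # the meshing step is outside autograd anyway
    values = sdf(grid.reshape(-1, 3)).reshape(res, res, res).numpy()

# Guard against a field with no zero crossing (an untrained net may have none).
level = 0.0 if values.min() < 0.0 < values.max() else float(values.mean())
verts, faces, normals, _ = marching_cubes(values, level=level)
print(verts.shape, faces.shape)         # explicit mesh, but detached from the net
```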
Date/Time: September 22 (Fri), 13:30–14:30
Venue: Research Bldg. No. 7, Information Lecture Room 1 (Room 107, 1st floor)
Generation and prediction of time-series data have a variety of real-world applications. In particular, we focus on human-like motion generation and prediction. In this presentation, I will cover (1) super-fast, task-agnostic probabilistic prediction, (2) physically-constrained human motion generation from a limited number of motion samples, and (3) robot motion planning, with the initial state given by an image, from a limited number of motion samples. For (1), previous prediction methods suffer from one of two problems: (i) non-probabilistic (i.e., deterministic) prediction methods cannot represent the stochasticity and diversity of possible motions, and (ii) probabilistic methods are computationally too slow for real-time applications. Our method extends normalizing flows, which enable probabilistic prediction, so that the transformations of past motion, whose computational cost is the bottleneck, are reused for real-time processing (a simplified sketch follows below). In (2), a policy controlling a human kinematic model is trained in a physics simulator so that its motion approaches real human motion. For (3), robot motion learning from a limited number of motion samples, we propose the following methods: (i) a Transformer-based network in which high-resolution yet efficient temporal features are extracted and spatially aligned with sample motions, in order to represent precise robotic motions such as grasping an object, and (ii) a diffusion-based network that retrieves the sample motion most likely to achieve a given task and rectifies the retrieved motion to improve achievability.
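As a heavily simplified illustration of the caching idea in (1), the snippet below encodes the observed past motion once and reuses the resulting parameters to draw many future samples as cheap transforms of Gaussian noise; the single affine transform, the dimensions, and the encoder are illustrative stand-ins, not the actual normalizing-flow architecture from the talk.

```python
# Simplified illustration: a conditional affine transform of a base Gaussian,
# where the costly encoding of the observed past motion is computed once and
# then reused to sample many possible futures (the caching idea, not the model).
import torch

past_dim, future_dim, hidden = 30, 10, 128

encoder = torch.nn.Sequential(               # "expensive" past-motion encoder
    torch.nn.Linear(past_dim, hidden), torch.nn.ReLU(),
    torch.nn.Linear(hidden, 2 * future_dim))  # -> scale and shift parameters

def sample_futures(past, n_samples):
    # Encode the observed past ONCE; this is the part worth caching.
    log_scale, shift = encoder(past).chunk(2, dim=-1)
    # Reuse the cached parameters for every sample: each draw is just an
    # affine transform of Gaussian noise, so sampling stays cheap.
    z = torch.randn(n_samples, future_dim)
    return z * torch.exp(log_scale) + shift

past_motion = torch.randn(past_dim)           # placeholder observed trajectory
futures = sample_futures(past_motion, n_samples=100)
print(futures.shape)                          # (100, future_dim): diverse predictions
```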
Date/Time: August 25 (Fri), 13:30–14:30
Venue: Research Bldg. No. 7, Information Lecture Room 1 (Room 107, 1st floor)
The world underlying images and videos is 3-dimensional and dynamic, with people interacting with each other, objects, and the underlying scene. Even in videos of a static scene, there is always the camera moving about in the 4D world. However, disentangling this 4D world from a video is a challenging inverse problem due to fundamental ambiguities of depth and scale. Yet, accurately recovering this information is essential for building systems that can reason about and interact with the underlying scene, and has immediate applications in visual effects and the creation of immersive digital worlds. In this talk, I will discuss recent advances in 4D human perception, including disentangling the camera and the human motion from challenging in-the-wild videos with multiple people. Our approach takes advantage of background pixels as cues for camera motion, which, when combined with motion priors and inferred ground planes, can resolve scene scale and depth ambiguities up to an "anthropometric" scale. I will also talk about nerf.studio, a modular open-source framework for easily creating photorealistic 3D scenes and accelerating NeRF development. I will introduce two new works that highlight how language can be incorporated for editing and interacting with the recovered 3D scenes. These works leverage large-scale vision and language models, demonstrating the potential for multi-modal exploration and manipulation of 3D scenes.
Date/Time: August 23 (Wed), 13:30–14:30
Venue: Research Bldg. No. 7, Information Lecture Room 2 (Room 101, 1st floor)
Using the latest image generation tools, we study how generation and recognition models are related. First, we explore amodal completion with text-to-image guidance: we iteratively inpaint occluder regions for a given query object, eliminating the need for an intermediate amodal mask prediction. We identify a crucial need to quantify object completeness, to prevent regenerating common co-occurring occluders and attached objects. Our amodal completion algorithm works in tandem with the dataset creation: the inpainting algorithm creates a large set of realistic images for human experts to label. We developed a human-curated Amodal Completion of Common Objects (ACCO) dataset containing 80 common object categories from the COCO dataset.
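For intuition about the iterative text-guided inpainting loop, here is a rough sketch built on the diffusers inpainting pipeline; the model checkpoint, the occluder-mask source, and the completeness check (`get_occluder_mask`, `is_complete`) are hypothetical placeholders, not the actual pipeline behind ACCO.

```python
# Rough sketch of text-guided iterative inpainting for amodal completion:
# repeatedly inpaint whatever currently occludes the query object until it
# looks complete. Model checkpoint, mask source, and completeness check are
# hypothetical placeholders, not the actual ACCO pipeline.
import torch
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16).to("cuda")

def amodal_complete(image, query_label, get_occluder_mask, is_complete, max_steps=3):
    """image: PIL image cropped around the query object.
    get_occluder_mask(image, label): PIL mask of regions hiding the object.
    is_complete(image, label): heuristic deciding whether occlusion remains."""
    for _ in range(max_steps):
        if is_complete(image, query_label):
            break
        mask = get_occluder_mask(image, query_label)
        image = pipe(prompt=f"a complete {query_label}, photorealistic",
                     image=image, mask_image=mask).images[0]
    return image
```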
We then study how generation models enhance visual recognition. We propose a diverse outpainting model to synthesize and comprehend potential background interactions with an object. We utilize a pretrained inpainting model to generate labels for object-context spatial relationships, enabling the training of models to predict plausible object placement and affordance in a scene. Using the learned object placement model, we demonstrate the effectiveness of compositing objects into different compatible contexts as a data augmentation technique for object detection and instance segmentation.
Combining these two studies, we observe a nested, structured encoding of object-context, similar to that of an AND-OR graph, emerging from self-trained image generation models.
Date/Time: August 9 (Wed), 14:00–15:00
Venue: Research Bldg. No. 7, Seminar Room 2 (Room 131, 1st floor)
Generative AI has seen a surge in popularity, with Diffusion Models (DMs) being a crucial component of many successful visual content generation techniques. Examples include DALL-E by OpenAI, Imagen by Google, and Stable Diffusion by Stability AI. While DMs are commonly known for their ability to generate content, our research group has discovered that DMs are also highly effective general problem solvers. Specifically, we focus on structured geometry modeling (e.g., CAD models). We have recently made significant strides in vector graphics, floorplan generation, HD map reconstruction, and spatial arrangement estimation. DM-based approaches consistently achieve the best performance across all tasks, surpassing existing state-of-the-art methods tailored to specific tasks.
Date/Time: August 2 (Wed), 13:30–14:30
Venue: Research Bldg. No. 7, Information Lecture Room 2 (Room 101, 1st floor)
I will present our recent work on how a general AI algorithm can be used for 3D scene understanding to reduce the need for training data. More specifically, we propose several modifications of the Monte Carlo Tree Search (MCTS) algorithm to retrieve objects and room layouts from noisy RGB-D scans. While MCTS was developed as a game-playing algorithm, we show it can also be used for complex perception problems. Our adapted MCTS algorithm has only a few, easy-to-tune hyperparameters and can optimise general losses. We use it to optimise the posterior probability of object and room layout hypotheses given the RGB-D data. This results in a render-and-compare method that explores the solution space efficiently. I will then show that the same algorithm can be applied to other scene understanding problems.
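For readers who have only seen MCTS in game playing, here is a generic skeleton of the search loop (UCB selection, expansion, random rollout, backpropagation) over a sequence of hypothesis choices; the `candidates` and `score` callbacks are hypothetical stand-ins where something like a render-and-compare posterior over object and layout hypotheses could plug in, and are not the adapted algorithm from the talk.

```python
# Generic MCTS skeleton (UCB selection, expansion, random rollout, backup).
# A "state" is the tuple of hypothesis choices made so far; candidates(state)
# lists the next possible choices and score(state) evaluates a complete
# solution (e.g. a render-and-compare posterior). Both are illustrative.
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def ucb(child, parent, c=1.4):
    if child.visits == 0:
        return float("inf")
    exploit = child.value / child.visits
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore

def rollout(state, candidates, score):
    # Complete the partial solution with random choices, then score it.
    while candidates(state):
        state = state + (random.choice(candidates(state)),)
    return state, score(state)

def mcts(candidates, score, n_iter=300):
    root, best = Node(()), (float("-inf"), ())
    for _ in range(n_iter):
        node = root
        # 1) Selection: descend with UCB while the node is fully expanded.
        while node.children and len(node.children) == len(candidates(node.state)):
            node = max(node.children, key=lambda ch: ucb(ch, node))
        # 2) Expansion: add one untried child, if any remain.
        tried = {ch.state[-1] for ch in node.children}
        untried = [c for c in candidates(node.state) if c not in tried]
        if untried:
            node = Node(node.state + (random.choice(untried),), parent=node)
            node.parent.children.append(node)
        # 3) Rollout and bookkeeping of the best complete solution seen so far.
        full_state, reward = rollout(node.state, candidates, score)
        best = max(best, (reward, full_state))
        # 4) Backpropagation.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return best[1]

# Toy usage: pick 3 hypothesis indices (0-4) maximizing a dummy score.
cands = lambda s: list(range(5)) if len(s) < 3 else []
dummy_score = lambda s: -abs(sum(s) - 7)     # best when the indices sum to 7
print(mcts(cands, dummy_score))
```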
Date/Time: July 7 (Fri), 13:30–14:30
Venue: Research Bldg. No. 7, Information Lecture Room 2 (Room 101, 1st floor)
Terahertz (THz) computational imaging has recently attracted significant attention thanks to its non-invasive, non-destructive, non-ionizing, and ultra-fast nature, as well as its material-classification capability, for 3D object exploration and inspection. However, strong water absorption and low noise tolerance lead to undesired blurs and distortions in reconstructed THz images. The performance of existing methods is highly constrained by the diffraction-limited THz signals. In this talk, we will introduce the characteristics of THz imaging and its applications. We will also show how to break the limitations of THz imaging with the aid of complementary information between the THz amplitude and phase images sampled at prominent frequencies (i.e., along the water absorption profile of the THz signal) for THz image restoration. To this end, we propose a novel physics-guided deep neural network design, namely the Subspace-Attention-guided Restoration Network (SARNet), that fuses such multi-spectral features of THz images for effective restoration. Furthermore, we experimentally construct an ultra-fast THz time-domain spectroscopy system covering a broad frequency range from 0.1 THz to 4 THz to build up a temporal/spectral/spatial/phase/material THz database of hidden 3D objects.
Date/Time: June 2 (Fri), 13:30–14:30
Venue: Research Bldg. No. 7, Information Lecture Room 2 (Room 101, 1st floor)
Photometric stereo is a longstanding computer vision problem focused on recovering a detailed surface normal map of objects in a scene using images captured under varying illumination. Despite the simplicity of the basic problem statement, the underlying problem formulations and image acquisition setups are incredibly complex. To apply a photometric stereo method, the appropriate algorithm must be selected and images carefully captured in a controlled environment, taking into consideration assumptions about surface geometry, material, camera, and lighting. In this talk, I share my journey toward overcoming the fundamental challenges in photometric stereo by introducing a learning-based photometric stereo method, named "universal" photometric stereo, which aims to remove these complicated assumptions and acquisition setups. When performing deep learning on photometric stereo tasks, managing varying numbers of unordered input images under different lighting conditions is a critical challenge. I will first discuss my previous attempts to address this issue, including the development of an "observation map" (ECCV 2018, ICIP 2021) and a "light-axis transformer" (BMVC 2021), and how these ideas were extended to universal photometric stereo networks.
The core of my talk presents my recent universal photometric stereo networks (UniPS, CVPR 2022 and SDM-UniPS, CVPR 2023), which can recover impressively detailed surface normal maps even when images are captured under unknown, spatially-varying lighting conditions in uncontrolled environments. This advancement enables the application of photometric stereo in everyday settings, effectively allowing for Real-World Photometric Stereo.
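For context, the classical calibrated, Lambertian formulation that "universal" photometric stereo generalizes fits in a few lines: with known light directions L and observed intensities I, the per-pixel albedo-scaled normals follow from the least-squares solution of L n = i. The synthetic, shadow-free data below is purely illustrative.

```python
# Classical calibrated Lambertian photometric stereo (the baseline that
# "universal" photometric stereo generalizes): with known light directions L
# and per-pixel intensities i, solve L n = i in the least-squares sense; the
# norm of n gives the albedo and its direction the surface normal.
import numpy as np

rng = np.random.default_rng(0)
n_lights, n_pixels = 8, 1000

L = rng.normal(size=(n_lights, 3))
L /= np.linalg.norm(L, axis=1, keepdims=True)          # unit light directions

true_n = rng.normal(size=(n_pixels, 3))
true_n /= np.linalg.norm(true_n, axis=1, keepdims=True)
albedo = rng.uniform(0.2, 1.0, size=(n_pixels, 1))
I = L @ (albedo * true_n).T        # idealized intensities (no shadows or noise)

# Least-squares for all pixels at once: N_scaled = pinv(L) @ I.
N_scaled = np.linalg.pinv(L) @ I                        # 3 x n_pixels
est_albedo = np.linalg.norm(N_scaled, axis=0)
est_n = (N_scaled / est_albedo).T

err = np.degrees(np.arccos(np.clip(np.sum(est_n * true_n, axis=1), -1, 1)))
print("mean angular error (deg):", err.mean())
```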
Date/Time: May 12 (Fri), 13:30–14:30
Venue: Research Bldg. No. 7, Information Lecture Room 2 (Room 101, 1st floor)
As mobile platforms equipped with cameras, such as autonomous vehicles, drones, service robots, and smartphones, become widespread, 3D maps have become far more important than before. At the same time, wide-area data can now be collected relatively easily, and research is actively pursued not only on building 3D maps but also on analyzing real-world phenomena. Furthermore, scene privacy in videos collected with mobile cameras is attracting growing attention. In this talk, I will review recent trends in spatial modeling technologies and my own work, focusing on 3D modeling, scene recognition, and scene privacy protection.