IST COLLOQUIUM 2023

Reducing the Need for Labelled Data in Machine Learning


Thomas Lampert

Université de Strasbourg
Speaker biography: Thomas Lampert is a Computer Science researcher in the general field of Artificial Intelligence, and more specifically in Machine Learning and Image and Time-Series Analysis, with various fields of application (most recently medical imaging and remote sensing). Dr. Lampert currently holds the Chair of Data Science and Artificial Intelligence at Télécom Physique Strasbourg and ICube, University of Strasbourg.
Date & Time: Thursday, February 22, 13:15-14:45

Venue: Research Bldg. No. 7, Seminar Room 1 (Room 127)

During this talk I will present the approaches that we have developed to reduce the need for labelled data in several domains of machine learning applications. First I will discuss histopathological images and the difficulties they pose for deep learning segmentation algorithms. I will present several domain-invariant approaches that we have developed to overcome these difficulties, and some interesting findings that have arisen from them. Next I will discuss the difficulty of performing domain adaptation and learning domain-invariant features with multi-modal imagery (i.e. image domains with different numbers of bands), with application to remote sensing data. Finally, I will present our work on constrained clustering for time series, a semi-supervised approach that reduces the need for labelled data using must-link and cannot-link constraints. I will extend this to propose an approach that explains the clustering result in an easily interpretable manner.
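
To make the must-link/cannot-link idea concrete, here is a minimal COP-KMeans-style sketch in Python. It is not the speaker's time-series method (which would typically pair such constraints with a time-series dissimilarity such as DTW and a dedicated clustering model); the function names, the Euclidean distance, and the simple feasibility check are illustrative only.

```python
import numpy as np

def violates(i, cluster_id, assignment, must_link, cannot_link):
    """Check whether assigning point i to cluster_id breaks a constraint."""
    for a, b in must_link:
        other = b if a == i else a if b == i else None
        if other is not None and assignment[other] not in (-1, cluster_id):
            return True
    for a, b in cannot_link:
        other = b if a == i else a if b == i else None
        if other is not None and assignment[other] == cluster_id:
            return True
    return False

def cop_kmeans(X, k, must_link, cannot_link, n_iter=20, seed=0):
    """Toy constrained k-means: assign each point to the nearest feasible centre."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        assignment = np.full(len(X), -1)
        for i in np.argsort(rng.random(len(X))):          # visit points in random order
            for c in np.argsort(((X[i] - centers) ** 2).sum(axis=1)):
                if not violates(i, c, assignment, must_link, cannot_link):
                    assignment[i] = c
                    break
            if assignment[i] == -1:
                raise ValueError(f"no feasible assignment for point {i}")
        for c in range(k):
            if (assignment == c).any():
                centers[c] = X[assignment == c].mean(axis=0)
    return assignment, centers

# Toy usage; for time series one would substitute a DTW-style dissimilarity.
X = np.random.rand(20, 2)
labels, _ = cop_kmeans(X, k=3, must_link=[(0, 1)], cannot_link=[(0, 2)])
```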

Is my “red” your “red”? A structural approach to the issue of qualia


Nao (Naotsugu) Tsuchiya

Professor, School of Psychological Sciences, Turner Institute for Brain and Mental Health, Monash University, Australia
Speaker biography: Dr Tsuchiya was awarded a PhD at the California Institute of Technology (Caltech) in 2006 and underwent postdoctoral training at Caltech until 2010. Receiving a PRESTO grant from the Japan Science and Technology (JST) Agency, Dr Tsuchiya returned to Japan in 2010. In January 2012, he joined the School of Psychological Sciences at Monash University as an Associate Professor. Since 2013, he has been an ARC Future Fellow. His main research interest is to uncover the neuronal basis of consciousness. Specifically, he focuses on 1) the scope and limits of non-conscious processing, 2) the relationship between attention and consciousness, 3) the neuronal correlates of consciousness, analysed from multi-channel neuronal recordings obtained in animals and humans, and 4) testing theories of consciousness, in particular the integrated information theory of consciousness.
Date & Time: Wednesday, December 27, 13:30-14:45

Venue: Research Bldg. No. 12, Room 316

Do I experience the world in the “same” way as you do? How similar is my consciousness to that of other humans, animals, insects, octopuses, and artificial intelligence? In this talk, I will discuss this issue focusing on the content of consciousness, what-it-feels-like, or in short “qualia”. Can we scientifically investigate whether my red qualia are similar to your red qualia? Traditionally, the answer has been no: because qualia are so intrinsic and purely subjective, they have often been considered outside the realm of scientific inquiry. Recently, our group has proposed a method of characterising a quale in terms of its relations to all other qualia, inspired by a mathematical theorem called the “Yoneda lemma” in the field of category theory. Based on this idea, we conducted experiments in which we asked a large number (>500) of neurotypical and colorblind subjects to report the similarity of a subset of ~5000 colour combinations in an online setting. Using the similarity structures estimated from these data, we quantified whether it is possible to “align” the colour qualia structures of different populations, using an unsupervised method based on “optimal transport”, as used in the field of machine translation (https://psyarxiv.com/h3pqm). Our qualia-structure approach is generalizable to qualia in other domains (such as the similarity of emotional experiences evoked by short movies), or even to structures between qualia structures. The relationships between qualia structures may eventually provide an opportunity to address questions such as, “Why are colour qualia perceived as colour qualia?”
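
As a rough illustration of the unsupervised alignment step, the sketch below uses the POT (Python Optimal Transport) library's Gromov-Wasserstein solver to match two relational (dissimilarity) structures without any correspondence labels. The toy data, uniform weights, and loss choice are assumptions made for the example; the study's actual estimation and evaluation procedure may differ.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

# Toy stand-ins for two groups' colour-dissimilarity matrices (n items each).
rng = np.random.default_rng(0)
n = 8
points_a = rng.random((n, 3))                           # items embedded in some space
points_b = points_a[::-1] + 0.01 * rng.random((n, 3))   # same structure, order reversed

C1 = ot.dist(points_a, points_a)   # within-group dissimilarities, group A
C2 = ot.dist(points_b, points_b)   # within-group dissimilarities, group B
p = ot.unif(n)                     # uniform weights over items
q = ot.unif(n)

# Gromov-Wasserstein finds a coupling that matches the two relational structures
# using only the within-group dissimilarities, i.e. no cross-group labels.
T = ot.gromov.gromov_wasserstein(C1, C2, p, q, loss_fun="square_loss")

# If the structures align well, each item in A is transported mostly to the
# structurally corresponding item in B (here, the index reversal).
print(T.argmax(axis=1))
```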

Learning to Synthesize Image and Video Contents


Ming-Hsuan Yang

Professor, University of California, Merced, USA
Speaker biography: Ming-Hsuan Yang is a Professor in Electrical Engineering and Computer Science at the University of California, Merced, and a Research Scientist at Google. He received a Ph.D. degree in Computer Science from the University of Illinois at Urbana-Champaign in 2000. He served as a Program Chair for the IEEE International Conference on Computer Vision (ICCV) in 2019 and the Asian Conference on Computer Vision (ACCV) in 2014. He has been an Associate Editor-in-Chief of the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) since 2023, was co-Editor-in-Chief of Computer Vision and Image Understanding (CVIU) from 2022 to 2023, and is an Associate Editor of the International Journal of Computer Vision (IJCV). Yang received the Longuet-Higgins (Test-of-Time) Prize at CVPR in 2023, a CAREER award from the National Science Foundation in 2012, and a Google Faculty Award in 2009. He is a Fellow of the IEEE and the ACM.
Date & Time: Monday, December 4, 13:30-14:30

Venue: Research Bldg. No. 7, Information Lecture Room 1 (1st floor, Room 107)

In this talk, I will first review our work on learning to synthesize image and video content from image data. The underlying theme is to exploit different priors to synthesize diverse content with robust formulations. Time permitting, I will also discuss other recent results on image editing, video generation, feature learning, surface normal estimation, and deformable 3D reconstruction.

Deep Surface Meshes


Pascal Fua

Professor, École Polytechnique Fédérale de Lausanne, Switzerland
Speaker biography: Pascal Fua received an engineering degree from Ecole Polytechnique, Paris, in 1984 and a Ph.D. in Computer Science from the University of Orsay in 1989. He joined EPFL (Swiss Federal Institute of Technology) in 1996, where he is a Professor in the School of Computer and Communication Science and head of the Computer Vision Lab. Before that, he worked at SRI International and at INRIA Sophia-Antipolis as a Computer Scientist.

His research interests include shape modeling and motion recovery from images, analysis of microscopy images, and machine learning. He has (co)authored over 300 publications in refereed journals and conferences. He has received several ERC grants. He is an IEEE Fellow and has been an Associate Editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence. He often serves as a program committee member, area chair, and program chair of major vision conferences and has co-founded three spin-off companies.
Date & Time: Friday, October 20, 13:30-14:30

Venue: Research Bldg. No. 7, Information Lecture Room 1 (1st floor, Room 107)

Geometric Deep Learning has made striking progress with the advent of Deep Implicit Fields. They allow for detailed modeling of surfaces of arbitrary topology while not relying on a 3D Euclidean grid, resulting in a learnable 3D surface parameterization that is not limited in resolution. Unfortunately, they have not yet reached their full potential for applications that require an explicit surface representation in terms of vertices and facets, because converting the implicit representation to such an explicit representation requires a marching-cubes algorithm, whose output cannot be easily differentiated with respect to the implicit surface parameters.

In this talk, I will present our approach to overcoming this limitation and implementing convolutional neural nets that output complex 3D surface meshes while remaining fully differentiable and end-to-end trainable. I will also present applications to single-view reconstruction, physically-driven shape optimization, and bio-medical image segmentation.
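
The non-differentiability issue described in the first paragraph can be seen in a few lines of code: below, a toy implicit field is evaluated on a grid and meshed with skimage's marching cubes, at which point the gradient path back to the network parameters is cut. This only illustrates the problem, not the speaker's solution; the tiny MLP, grid size, and iso-level choice are placeholders.

```python
import torch
from skimage import measure

# A toy implicit field: an MLP mapping 3D points to signed distances.
sdf = torch.nn.Sequential(
    torch.nn.Linear(3, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)

# Evaluate the field on a dense 32^3 grid.
r = torch.linspace(-1, 1, 32)
grid = torch.stack(torch.meshgrid(r, r, r, indexing="ij"), dim=-1).reshape(-1, 3)
values = sdf(grid).reshape(32, 32, 32)

# Marching cubes runs on plain NumPy values, so the computation graph is cut here:
# the extracted vertices carry no gradient back to the MLP parameters.
# (The median is used as iso-level so this random toy field always has a surface.)
verts, faces, _, _ = measure.marching_cubes(
    values.detach().numpy(), level=float(values.median()))
print(verts.shape, faces.shape)   # an explicit mesh, but non-differentiable
```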

Human-like Motion Generation and Prediction


Norimichi Ukita (浮田 宗伯)

Professor, Toyota Technological Institute, Japan
Speaker biography: Norimichi Ukita (Member, IEEE) received the BE and ME degrees in information engineering from Okayama University, Japan, in 1996 and 1998, respectively, and the PhD degree in informatics from Kyoto University, Japan, in 2001. He is a professor in the Graduate School of Engineering, Toyota Technological Institute, Japan (TTI-J). After working for five years as an assistant professor at NAIST, he became an associate professor in 2007 and moved to TTI-J in 2016. He was a research scientist of Precursory Research for Embryonic Science and Technology, Japan Science and Technology Agency (JST), during 2002-2006. He was a visiting research scientist at Carnegie Mellon University during 2007-2009. He also currently works at the Toyota Technological Institute at Chicago (TTI-C) as an adjunct professor. His main research interests are low-level vision, object detection/tracking, and human pose estimation and action recognition.
Date & Time: Friday, September 22, 13:30-14:30

Venue: Research Bldg. No. 7, Information Lecture Room 1 (1st floor, Room 107)

Generation and prediction of time-series data have a variety of real-world applications. In particular, we focus on human-like motion generation and prediction. In this presentation, I will cover (1) super-fast task-agnostic probabilistic prediction, (2) physically-constrained human motion generation from a limited number of motion samples, and (3) robot motion planning, with the initial state given by an image, from a limited number of motion samples. For (1) prediction, previous methods suffer from one of the following two problems: (i) non-probabilistic (i.e., deterministic) prediction methods cannot represent the stochasticity and diversity of possible motions, and (ii) probabilistic methods are computationally too slow for real-time applications. Our method extends normalizing flows, which enable probabilistic prediction, so that the transformations of past motions, whose computational cost is the bottleneck, are reused for real-time processing. In (2), the policy controlling a human kinematic model is trained so that its motion approaches real human motions in a physics simulator. For (3) robot motion learning with a limited number of motion samples, we propose the following methods: (i) a Transformer-based network in which high-resolution but efficient temporal features are extracted so that they are spatially aligned with sample motions, for representing precise robotic motions such as grasping an object, and (ii) a diffusion-based network for retrieving the sample motion most likely to achieve a given task and for rectifying the retrieved motion to improve its achievability.
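
The following PyTorch sketch shows the general pattern behind point (1): encode the observed past motion once, cache that context, and then draw many diverse future samples through a cheap conditional invertible (flow-style) transform. It is a single toy coupling layer, not the speaker's architecture; all dimensions and module names are invented for illustration.

```python
import torch
import torch.nn as nn

class PastEncoder(nn.Module):
    """Encodes an observed motion history once; the result is cached and reused."""
    def __init__(self, joint_dim=34, hidden=128):
        super().__init__()
        self.gru = nn.GRU(joint_dim, hidden, batch_first=True)

    def forward(self, past):               # past: (B, T_past, joint_dim)
        _, h = self.gru(past)
        return h[-1]                       # (B, hidden)

class ConditionalAffineFlow(nn.Module):
    """One conditional affine transform: x = z * exp(s(c)) + t(c).
    Real normalizing flows stack many invertible layers; one suffices for a sketch."""
    def __init__(self, motion_dim=34 * 10, hidden=128):
        super().__init__()
        self.scale = nn.Linear(hidden, motion_dim)
        self.shift = nn.Linear(hidden, motion_dim)

    def sample(self, context, n_samples):
        s, t = self.scale(context), self.shift(context)
        z = torch.randn(n_samples, context.shape[0], s.shape[-1])
        return z * torch.exp(s) + t        # (n_samples, B, motion_dim)

encoder, flow = PastEncoder(), ConditionalAffineFlow()
past = torch.randn(2, 25, 34)              # 2 sequences, 25 past frames, 34 dims
context = encoder(past)                    # expensive part, computed once and cached
futures = flow.sample(context, n_samples=50)  # cheap, diverse future samples
print(futures.shape)                       # (50, 2, 340)
```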

From Videos to 4D Worlds and Beyond


Angjoo Kanazawa

Assistant Professor, University of California, Berkeley, USA
Speaker biography: Angjoo Kanazawa is an Assistant Professor in the Department of Electrical Engineering and Computer Science at the University of California, Berkeley. Her research is at the intersection of Computer Vision, Computer Graphics, and Machine Learning, focusing on the visual perception of the dynamic 3D world behind everyday photographs and video. Previously, she was a research scientist at Google NYC, and prior to that she was a BAIR postdoc at UC Berkeley. She completed her PhD in Computer Science at the University of Maryland, College Park, where she also spent time at the Max Planck Institute for Intelligent Systems. She has been named a Rising Star in EECS and has been honored with the Google Research Scholar Award and, most recently, a Sloan Fellowship in 2023.
Date & Time: Friday, August 25, 13:30-14:30

Venue: Research Bldg. No. 7, Information Lecture Room 1 (1st floor, Room 107)

The world underlying images and videos is 3-dimensional and dynamic, with people interacting with each other, objects, and the underlying scene. Even in videos of a static scene, there is always the camera moving about in the 4D world. However, disentangling this 4D world from a video is a challenging inverse problem due to fundamental ambiguities of depth and scale. Yet, accurately recovering this information is essential for building systems that can reason about and interact with the underlying scene, and has immediate applications in visual effects and the creation of immersive digital worlds. In this talk, I will discuss recent advances in 4D human perception, including disentangling the camera and the human motion from challenging in-the-wild videos with multiple people. Our approach takes advantage of background pixels as cues for camera motion, which, when combined with motion priors and inferred ground planes, can resolve scene scale and depth ambiguities up to an "anthropometric" scale. I will also talk about nerf.studio, a modular open-source framework for easily creating photorealistic 3D scenes and accelerating NeRF development. I will introduce two new works that highlight how language can be incorporated for editing and interacting with the recovered 3D scenes. These works leverage large-scale vision and language models, demonstrating the potential for multi-modal exploration and manipulation of 3D scenes.

Visual Generation and Recognition via Object Completion


Jianbo Shi

Professor, University of Pennsylvania, USA
Speaker biography: Jianbo Shi studied Computer Science and Mathematics as an undergraduate at Cornell University, where he received his B.A. in 1994. He received his Ph.D. degree in Computer Science from the University of California at Berkeley in 1998 for his thesis on the Normalized Cuts image segmentation algorithm. He joined The Robotics Institute at Carnegie Mellon University in 1999 as research faculty. Since 2003, he has been with the Department of Computer & Information Science at the University of Pennsylvania. His group is developing vision algorithms for both human and image recognition. Their ultimate goal is to develop computational algorithms to understand human behavior and interaction with objects, and to do so at multiple levels of abstraction: from basic body limb tracking, to human identification, gesture recognition, and activity inference. His group is developing a visual thinking model that allows computers to understand their surroundings and achieve higher-level cognitive abilities such as machine memory and learning.
Date & Time: Wednesday, August 23, 13:30-14:30

Venue: Research Bldg. No. 7, Information Lecture Room 2 (1st floor, Room 101)

Using the latest image generation tools, we study how generation and recognition models are related. First, we explore amodal completion with text-to-image guidance: iteratively inpainting occluder regions for a given query object, eliminating the need for intermediate amodal mask prediction. We identify a crucial need to quantify object completeness in order to prevent regenerating common co-occurring occluders and attached objects. Our amodal completion algorithm works in tandem with dataset creation: the inpainting algorithm creates a large set of realistic images for human experts to label. We developed a human-curated Amodal Completion of Common Objects (ACCO) dataset covering the 80 common object categories of the COCO dataset.
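
Below is a hedged sketch of the kind of iterative text-to-image inpainting loop described above, written with the Hugging Face diffusers inpainting pipeline. The checkpoint name, prompt, fixed round count, and the omission of the completeness check and occluder-mask re-estimation are simplifications for illustration, not the authors' actual pipeline.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Assumed checkpoint; any diffusion inpainting model would do for this sketch.
# Requires a GPU.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

def amodal_complete(image: Image.Image, occluder_mask: Image.Image,
                    category: str, rounds: int = 3) -> Image.Image:
    """Repeatedly inpaint the occluder region with a prompt asking for a complete
    object. A real pipeline would also re-estimate the occluder mask and stop once
    a completeness score says the object is whole; both are omitted here."""
    prompt = f"a complete {category}, unoccluded, photo"
    for _ in range(rounds):
        image = pipe(prompt=prompt, image=image, mask_image=occluder_mask).images[0]
    return image

# Example call (file paths are placeholders):
# result = amodal_complete(Image.open("query.png"), Image.open("occluder_mask.png"), "dog")
```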

We then study how generation models enhance visual recognition. We propose a diverse outpainting model to synthesize and comprehend potential background interactions with an object. We utilize a pretrained inpainting model to generate labels for object-context spatial relationships, enabling the training of models to predict plausible object placement and affordance in a scene. Using the learned object placement model, we demonstrate the effectiveness of compositing objects into different compatible contexts as a data augmentation technique for object detection and instance segmentation.
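
To illustrate the augmentation use mentioned above, here is a minimal compositing sketch with PIL: an RGBA object crop is pasted into a background at whichever candidate placement a (learned) placement model scores highest. The placement model is passed in as a stand-in for the learned component, and its interface here is an assumption rather than the authors' API.

```python
from PIL import Image

def composite(background: Image.Image, obj_rgba: Image.Image, xy) -> Image.Image:
    """Paste an RGBA object crop onto a background at position xy (top-left corner)."""
    out = background.copy()
    out.paste(obj_rgba, xy, mask=obj_rgba)   # the alpha channel acts as the paste mask
    return out

def augment(background, obj_rgba, placement_model, n_candidates=10):
    """Keep the candidate placement the learned model scores as most plausible.
    `placement_model(background, obj, n)` is assumed to return (score, (x, y)) pairs;
    it stands in for the learned object-placement component described in the talk."""
    best = max(placement_model(background, obj_rgba, n_candidates),
               key=lambda s_xy: s_xy[0])
    return composite(background, obj_rgba, best[1])
```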

Combining these two studies, we observe a nested, structured encoding of object-context relationships, similar to that of an AND-OR graph, emerging from self-trained image generation models.

Generation Meets Reconstruction: Looking at 3D Computer Vision through the Lens of Generative AI


Yasutaka Furukawa

Associate Professor, Simon Fraser University, Canada
Speaker biography: Dr. Yasutaka Furukawa is an associate professor in the School of Computing Science at Simon Fraser University (SFU). Dr. Furukawa's group has made fundamental and practical contributions to 3D reconstruction algorithms, improved localization techniques, and computational architectural modeling. Their open-source software has been widely adopted by tech companies and used in surprising applications such as 3D printing of turtle shells and archaeological reconstruction. Dr. Furukawa received the best student paper award at ECCV 2012, the NSF CAREER Award in 2015, the CS-CAN Outstanding Young CS Researcher Award in 2018, Google Faculty Research Awards in 2016, 2017, and 2018, and the PAMI Longuet-Higgins Prize in 2020.
Date & Time: Wednesday, August 9, 14:00-15:00

Venue: Research Bldg. No. 7, Seminar Room 2 (1st floor, Room 131)

Generative AI has seen a surge in popularity, with Diffusion Models (DMs) being a crucial component of many successful visual content generation techniques. Examples include DALL-E by OpenAI, Imagen by Google, and Stable Diffusion by Stability AI. While DMs are commonly known for their ability to generate content, our research group has discovered that they are also highly effective general problem solvers. Specifically, we focus on structured geometry modeling (e.g., CAD models). We have recently made significant strides in vector graphics: floorplan generation, HD map reconstruction, and spatial arrangement estimation. DM-based approaches consistently achieve the best performance across all of these tasks, surpassing existing state-of-the-art methods tailored to specific tasks.
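
The sketch below shows the generic mechanism being exploited: treating structured geometry, here a flattened set of 2D corner coordinates, as the data a diffusion model learns to denoise. It is a bare-bones DDPM-style training step with an invented toy MLP, not any of the speaker's task-specific models, and the dimensions are arbitrary.

```python
import torch
import torch.nn as nn

T = 100
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

# x0: a "floorplan" flattened to a vector of 2D corner coordinates (8 corners here).
corner_dim = 8 * 2
denoiser = nn.Sequential(nn.Linear(corner_dim + 1, 128), nn.ReLU(),
                         nn.Linear(128, corner_dim))    # predicts the added noise

def training_step(x0, optimizer):
    """One denoising training step on a batch of corner sets."""
    t = torch.randint(0, T, (x0.shape[0],))
    noise = torch.randn_like(x0)
    a = alphas_bar[t].unsqueeze(1)
    xt = a.sqrt() * x0 + (1 - a).sqrt() * noise           # forward noising of the geometry
    pred = denoiser(torch.cat([xt, t.unsqueeze(1) / T], dim=1))
    loss = ((pred - noise) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

optimizer = torch.optim.Adam(denoiser.parameters(), lr=1e-3)
x0 = torch.rand(16, corner_dim)    # toy batch of 16 corner sets
print(training_step(x0, optimizer))
```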

Self-Supervised 3D Scene Understanding


Vincent Lepetit

Professor, ENPC ParisTech, France
Speaker biography: Vincent Lepetit is a professor at ENPC ParisTech, France. Prior to this position, he was a full professor at the Institute for Computer Graphics and Vision, Graz University of Technology (TU Graz), Austria, and before that a senior researcher at CVLab, Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland. His current research focuses on 3D scene understanding, especially on reducing the supervision needed by a system to learn new 3D objects and new 3D environments. In 2020, he received with colleagues the Koenderink "test-of-time" award for "BRIEF: Binary Robust Independent Elementary Features". He often serves as an area chair of major computer vision conferences (CVPR, ICCV, ECCV) and as an editor for the Pattern Analysis and Machine Intelligence (PAMI) and International Journal of Computer Vision (IJCV) journals. In 2023, he was awarded an ERC Advanced Grant for the 'explorer' project on creating digital twins of large-scale sites.
Date & Time: Wednesday, August 2, 13:30-14:30

Venue: Research Bldg. No. 7, Information Lecture Room 2 (1st floor, Room 101)

I will present our recent work on how a general AI algorithm can be used for 3D scene understanding to reduce the need for training data. More precisely, we propose several modifications of the Monte Carlo Tree Search (MCTS) algorithm to retrieve objects and room layouts from noisy RGB-D scans. While MCTS was developed as a game-playing algorithm, we show that it can also be used for complex perception problems. Our adapted MCTS algorithm has only a few easy-to-tune hyperparameters and can optimise general losses. We use it to optimise the posterior probability of object and room-layout hypotheses given the RGB-D data. This results in a render-and-compare method that explores the solution space efficiently. I will then show that the same algorithm can be applied to other scene understanding problems.
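
For readers unfamiliar with MCTS, the following heavily simplified UCT-style loop shows what "searching over scene hypotheses while optimising a posterior" can look like. The expand and score callbacks (for example, adding one object hypothesis and running render-and-compare against the RGB-D scan) are placeholders; the speaker's adapted algorithm differs in its details.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def uct(node, c=1.4):
    """Upper-confidence score used to choose which child to descend into."""
    return (node.value / (node.visits + 1e-9)
            + c * math.sqrt(math.log(node.parent.visits + 1) / (node.visits + 1e-9)))

def mcts(root_state, expand, score, n_iter=200):
    """expand(state) -> candidate next states (e.g., add one object hypothesis);
    score(state)  -> posterior-like score, e.g., from render-and-compare (placeholder)."""
    root = Node(root_state)
    for _ in range(n_iter):
        node = root
        while node.children:                       # 1. selection
            node = max(node.children, key=uct)
        for s in expand(node.state):               # 2. expansion
            node.children.append(Node(s, parent=node))
        leaf = random.choice(node.children) if node.children else node
        reward = score(leaf.state)                 # 3. evaluate the hypothesis
        while leaf:                                # 4. backpropagation
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    return (max(root.children, key=lambda n: n.visits).state
            if root.children else root_state)
```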

Making the Invisible Visible: Toward High-Quality Deep THz Computational Imaging


Chia-Wen Lin

Professor, National Tsing Hua University, Taiwan
Speaker biography: Prof. Chia-Wen Lin is currently a Professor with the Department of Electrical Engineering, National Tsing Hua University (NTHU), Taiwan. He also serves as Deputy Director of the AI Research Center of NTHU. He is currently a Visiting Professor at the Graduate School of Informatics, Kyoto University, from July 2023 to December 2023. He served as a Visiting Professor at Nagoya University and the National Institute of Informatics, Japan, in 2019 and 2015, respectively. His research interests include image/video processing, computer vision, and video networking. Dr. Lin is an IEEE Fellow and has been serving on the IEEE Circuits and Systems Society (CASS) Fellow Evaluating Committee since 2021. He serves as an IEEE CASS Board of Governors Member-at-Large during 2022-2024. He was Steering Committee Chair of IEEE ICME (2020-2021), an IEEE CASS Distinguished Lecturer (2018-2019), and President of the Chinese Image Processing and Pattern Recognition (IPPR) Association, Taiwan (2019-2020). He has served as Associate Editor of the IEEE Transactions on Image Processing, IEEE Transactions on Multimedia, IEEE Transactions on Circuits and Systems for Video Technology, and IEEE Multimedia. He served as TPC Chair of IEEE ICME in 2010 and IEEE ICIP in 2019, and as Conference Chair of IEEE VCIP in 2018.
Date & Time: Friday, July 7, 13:30-14:30

Venue: Research Bldg. No. 7, Information Lecture Room 2 (1st floor, Room 101)

Terahertz (THz) computational imaging has recently attracted significant attention thanks to its non-invasive, non-destructive, non-ionizing, and ultra-fast nature and its material-classification capability for 3D object exploration and inspection. However, the strong water absorption of THz waves and their low noise tolerance lead to undesired blurs and distortions in reconstructed THz images. The performance of existing methods is highly constrained by the diffraction-limited THz signals. In this talk, we will introduce the characteristics of THz imaging and its applications. We will also show how to break the limitations of THz imaging with the aid of complementary information between the THz amplitude and phase images sampled at prominent frequencies (i.e., on the water-absorption profile of the THz signal) for THz image restoration. To this end, we propose a novel physics-guided deep neural network, the Subspace-Attention-guided Restoration Network (SARNet), that fuses such multi-spectral features of THz images for effective restoration. Furthermore, we experimentally construct an ultra-fast THz time-domain spectroscopy system covering a broad frequency range from 0.1 THz to 4 THz to build up a temporal/spectral/spatial/phase/material THz database of hidden 3D objects.
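
As a rough sketch of the data layout the abstract implies, the code below stacks amplitude and phase images sampled at several frequencies into one multi-channel tensor and feeds it to a small convolutional restoration network. This is only a stand-in to show the multi-spectral fusion idea; it is not SARNet, whose subspace-attention design is considerably more involved, and the channel counts and image size are invented.

```python
import torch
import torch.nn as nn

n_freqs = 6                    # e.g., frequencies picked on the water-absorption profile
in_ch = 2 * n_freqs           # one amplitude image and one phase image per frequency

restorer = nn.Sequential(      # toy stand-in for the real restoration network
    nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 1, 3, padding=1),    # restored (de-blurred) THz image
)

amplitude = torch.rand(1, n_freqs, 128, 128)   # toy multi-frequency amplitude stack
phase = torch.rand(1, n_freqs, 128, 128)       # toy multi-frequency phase stack
restored = restorer(torch.cat([amplitude, phase], dim=1))
print(restored.shape)                           # (1, 1, 128, 128)
```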

Toward Real-World Photometric Stereo: Why Must Photometric Stereo Be Universal?


Satoshi Ikehata (池畑 諭)

Assistant Professor, National Institute of Informatics, Japan
Speaker biography: He received a B.A. in Psychology in 2009, and an M.S. and Ph.D. in Information Studies in 2011 and 2014, respectively, from the University of Tokyo. He worked as a postdoctoral researcher at Washington University in St. Louis from 2014 to 2016. Currently, he is an assistant professor at the National Institute of Informatics, a specially appointed associate professor at Tokyo Tech, and a visiting researcher at the University of Tokyo. His main interests lie in 3D computer vision, with a particular focus on the use of photometric stereo to achieve professional-grade 3D reconstruction in real-world scenarios.
Date & Time: Friday, June 2, 13:30-14:30

Venue: Research Bldg. No. 7, Information Lecture Room 2 (1st floor, Room 101)

Photometric stereo is a longstanding computer vision problem focused on recovering a detailed surface normal map of objects in a scene using images captured under varying illumination. Despite the simplicity of the basic problem statement, the underlying problem formulations and image acquisition setups are incredibly complex. To apply a photometric stereo method, the appropriate algorithm must be selected and images carefully captured in a controlled environment, taking into consideration assumptions about surface geometry, material, camera, and lighting. In this talk, I share my journey toward overcoming the fundamental challenges in photometric stereo by introducing a learning-based photometric stereo method, named "universal" photometric stereo, which aims to remove these complicated assumptions and acquisition setups. When performing deep learning on photometric stereo tasks, managing varying numbers of unordered input images under different lighting conditions is a critical challenge. I will first discuss my previous attempts to address this issue, including the development of an "observation map" (ECCV 2018, ICIP 2021) and a "light-axis transformer" (BMVC 2021), and how these ideas were extended to universal photometric stereo networks.
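
For context on the basic problem statement, here is the classical calibrated Lambertian baseline, which recovers per-pixel albedo-scaled normals by least squares from K images with known light directions. The speaker's universal networks remove exactly these calibration and reflectance assumptions; this sketch is only the textbook starting point, and the toy data are random placeholders.

```python
import numpy as np

def lambertian_photometric_stereo(images, lights):
    """images: (K, H, W) intensities; lights: (K, 3) known unit light directions.
    Solves I = L @ (rho * n) per pixel, the classical calibrated Lambertian model."""
    K, H, W = images.shape
    I = images.reshape(K, -1)                        # (K, H*W)
    G = np.linalg.lstsq(lights, I, rcond=None)[0]    # (3, H*W): albedo-scaled normals
    albedo = np.linalg.norm(G, axis=0)
    normals = G / (albedo + 1e-8)
    return normals.reshape(3, H, W), albedo.reshape(H, W)

# Toy usage with random data; real inputs are photos taken under varying illumination.
imgs = np.random.rand(10, 64, 64)
L = np.random.randn(10, 3)
L /= np.linalg.norm(L, axis=1, keepdims=True)
n, rho = lambertian_photometric_stereo(imgs, L)
print(n.shape, rho.shape)   # (3, 64, 64) (64, 64)
```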

The core of my talk presents my recent universal photometric stereo networks (UniPS, CVPR 2022 and SDM-UniPS, CVPR 2023), which can recover impressively detailed surface normal maps even when images are captured under unknown, spatially-varying lighting conditions in uncontrolled environments. This advancement enables the application of photometric stereo in everyday settings, effectively allowing for Real-World Photometric Stereo.

Spatial AI and Scene Privacy


Ken Sakurada (櫻田 健)

Senior Researcher, National Institute of Advanced Industrial Science and Technology (AIST), Japan
Speaker biography: He completed the doctoral program at the Graduate School of Information Sciences, Tohoku University, in 2015. After serving as a JSPS Research Fellow (DC2), a visiting researcher at Carnegie Mellon University, a postdoctoral researcher at Tokyo Institute of Technology, an assistant professor at Nagoya University, and a visiting researcher at AIST, he has been working as a senior researcher at AIST since April 2018.
Date & Time: Friday, May 12, 13:30-14:30

Venue: Research Bldg. No. 7, Information Lecture Room 2 (1st floor, Room 101)

As camera-equipped mobile platforms such as autonomous vehicles, drones, service robots, and smartphones become widespread, 3D maps have become far more important than before. At the same time, it has become relatively easy to collect data over wide areas, and research is being actively conducted not only on building 3D maps but also on analyzing real-world phenomena. Furthermore, the scene privacy of video captured with mobile cameras is also attracting attention. In this talk, I will review recent trends in spatial modeling technology and my own work, focusing on 3D modeling, scene recognition, and scene privacy protection.
