PhD, Research Associate Professor at Ritsumeikan University, President & Founder of Coarobo GK, Husband & Father of 3, based in Japan.
[Last updated: June 2025.]
Extended Bio
Lotfi El Hafi, PhD, is a Research Associate Professor at Ritsumeikan University, Japan, and the President & Founder of Coarobo GK, Japan.
He received his MScEng in Mechatronics from the Université catholique de Louvain (UCLouvain), Belgium, in 2013, and his PhD in Engineering from the Nara Institute of Science and Technology (NAIST), Japan, in 2017. His master’s thesis, carried out during an internship at intoPIX SA, Belgium, contributed to TICO compression, now standardized as JPEG XS, for Ultra HD 4K/8K video distribution; he continued this work as a Sales Engineer while completing the AWEX International Business & Management EXPLORT Program. He then joined the Robotics Laboratory at NAIST as a recipient of the MEXT Scholarship in 2014. His PhD thesis proposed a novel eye-tracking method, “Simultaneous Tracking and Attention Recognition from Eyes (STARE),” which leveraged deep learning to extract behaviors from scene images reflected in the eyes, earning him the “Best of IET and IBC 2016–2017” award.
He joined the Ritsumeikan Global Innovation Research Organization (R-GIRO) of Ritsumeikan University, Japan, for the “International and Interdisciplinary Research Center for the Next-Generation Artificial Intelligence and Semiotics (AI+Semiotics)” project as a Senior Researcher in 2017, later becoming a Research Assistant Professor in 2019. His research activities included service robotics, AI, and system integration, with a particular interest in multimodal interaction in extended reality (XR). In this regard, he received two research awards from JST CREST: the “Best Award for Research Proposal Breaking Hagita-CREST Shell: Visualization of Emergent Reality in Robotics for Future Harmonious Interaction” and the “Best Award for Forecasting Research Proposal: Toward Early Realization of Harmonious Human-Robot Interaction.”
He has also won numerous international robotics competition prizes, including the “1st Place” in the Airbus Shopfloor Challenge at IEEE ICRA 2016, the “Finalist Prize” in the Amazon Robotics Challenge 2017, the “NEDO Chairman’s Award for Excellence in World Robot Summit 2018,” the “SICE Award for WRS Future Convenience Store Challenge 2018,” the “METI Minister’s Award for Excellence in World Robot Summit 2020,” the “2nd Place Regional Prize in Eastern Asia + South Eastern Asia” in the OpenCV AI Competition 2021, and multiple “1st Places” in the WRS Future Convenience Store Challenge since 2018, including one at IEEE/RSJ IROS 2022. These contributions addressed both the technological and theoretical challenges of super-smart societies (Society 5.0), in collaboration with the Robotics Hub of Panasonic Corporation, Japan, where he has served as a Research Advisor for Robotics Competitions since 2019. There, he contributed to the development of AMIS, or “Autonomous Manipulation for Intelligence Services,” a human-centered robotic platform, also in collaboration with NAIST.
His recognized achievements in competitions also led to his appointment as a Specially Appointed Researcher of HSR Community by Toyota Motor Corporation, Japan, from 2020 to 2022, during which he developed a containerized Software Development Environment (SDE) using cloud-first, open-source technologies to accelerate cross-institutional collaborative R&D. More than 20 institutions across Japan and the EU are known to have tested the SDE, with half adopting it for research. This endeavor earned him the inaugural “HSR Community Research Encouragement Award” from RSJ in 2022.
He has held the position of Research Associate Professor at the Research Organization of Science and Technology of Ritsumeikan University, Japan, since 2023. His contributions to the development of large-scale cognitive architectures for the JST CREST project “Symbol Emergence in Robotics for Future Human-Machine Collaboration” and the JST Moonshot R&D Program “The Realization of an Avatar-Symbiotic Society where Everyone Can Perform Active Roles without Constraint” have led him to investigate the mechanisms of multimodal robot learning in XR. His recent work on probabilistic inference and contrastive learning for service robots received the “Best Paper Awards” at IEEE/SICE SII 2023 and IEEE IRC 2024, respectively.
Over the years, he has authored over 80 academic publications and holds an h-index of 13 on Google Scholar, with more than 450 citations. He has helped build a vibrant research environment at Ritsumeikan University through successful grant applications as a Principal Investigator, including a competitive JSPS KAKENHI Grant-in-Aid for Early-Career Scientists titled “Emergent Reality: Knowledge Formation from Multimodal Learning through Human-Robot Interaction in Extended Reality,” securing over 15 million JPY in cumulative internal and external funding. He also informally supervises graduate students and international interns in the Emergent Systems Laboratory at Ritsumeikan University and has formally supervised four international master’s theses with UCLouvain.
He is an active member of IEEE (since 2015) and RSJ (since 2017), and a former member of JSME (2017–2022). He has peer-reviewed over 35 manuscripts for prestigious journals and conferences and served as Associate Editor for IEEE RO-MAN 2021 and Publicity Chair for IEEE/SICE SII 2026. Finally, he is the President & Founder of Coarobo GK, Japan, which provides containerized SDE solutions for researchers and developers in the fields of service robotics and AI.
[Last updated: June 2025.]
Publication List
-
L. El Hafi, K. Onishi, S. Hasegawa, A. Oyama, T. Ishikawa, M. Osada, C. Tornberg, R. Kado, K. Murata, S. Hashimoto, S. Carrera Villalobos, A. Taniguchi, G. A. Garcia Ricardez, Y. Hagiwara, T. Aoki, K. Iwata, T. Horii, Y. Horikawa, T. Miyashita, T. Taniguchi, and H. Ishiguro, "Public Evaluation on Potential Social Impacts of Fully Autonomous Cybernetic Avatars for Physical Support in Daily-Life Environments: Large-Scale Demonstration and Survey at Avatar Land", in Proceedings of 2025 IEEE International Conference on Advanced Robotics and its Social Impacts (ARSO 2025), Osaka, Japan, Jul. 17, 2025. [International conference article, peer-reviewed.][Accepted for presentation.]
< Abstract >
Cybernetic avatars (CAs) are key components of an avatar-symbiotic society, enabling individuals to overcome physical limitations through virtual agents and robotic assistants. While semi-autonomous CAs intermittently require human teleoperation and supervision, the deployment of fully autonomous CAs remains a challenge. This study evaluates public perception and potential social impacts of fully autonomous CAs for physical support in daily life. To this end, we conducted a large-scale demonstration and survey during Avatar Land, a 19-day public event in Osaka, Japan, where fully autonomous robotic CAs, alongside semi-autonomous CAs, performed daily object retrieval tasks. Specifically, we analyzed responses from 2,285 visitors who engaged with various CAs, including a subset of 333 participants who interacted with fully autonomous CAs and shared their perceptions and concerns through a survey questionnaire. The survey results indicate interest in CAs for physical support in daily life and at work. However, concerns were raised regarding task execution reliability. In contrast, cost and human-like interaction were not dominant concerns. Project page: https://lotfielhafi.github.io/FACA-Survey/.
-
A. Silahli, J. P. De la Rosa, J. Solis, G. A. Garcia Ricardez, L. El Hafi, J. Håkansson, A. S. Sørensen, and T. Rocha Silva, "Gesture-based Behaviour-driven Development Approach for End-User Cobot Programming", in Robotica, Special Issue on Recent Advances in Parallel and Service Robotics, from Development to Applications, 2025. [International journal article, peer-reviewed.][Accepted for publication.]
< Abstract >
This study presents an innovative framework aimed at improving the accessibility and usability of collaborative robot programming. Building on previous research that evaluated the feasibility of using a Domain-Specific Language (DSL) based on Behaviour-Driven Development (BDD), this paper addresses the limitations of earlier work by integrating additional features like a drag-and-drop Blockly web interface. The system enables end users to define and execute robot actions with minimal technical knowledge, making it more adaptable and intuitive. Additionally, a gesture-recognition module facilitates multimodal interaction, allowing users to control robots through natural gestures. The system was evaluated through a user study involving participants with varying levels of professional experience and little-to-no programming background. Results indicate significant improvements in user satisfaction, with the System Usability Scale (SUS) overall score increasing from 7.50 to 8.67 out of a maximum of 10 and integration ratings rising from 4.42 to 4.58 out of 5. Participants completed tasks using a manageable number of blocks (5 to 8) and reported low frustration levels (mean: 8.75 out of 100) alongside moderate mental demand (mean: 38.33 out of 100). These findings demonstrate the tool's effectiveness in reducing cognitive load, enhancing user engagement and supporting intuitive, efficient programming of collaborative robots for industrial applications.
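As a rough illustration of the behaviour-driven idea described above (a minimal sketch, not the paper's implementation), a Given/When/Then scenario can be reduced to a mapping from step phrases to robot primitives; every step phrase and primitive below is hypothetical.

```python
# Hypothetical sketch of mapping BDD-style steps to robot primitives.
# Step phrases and primitive names are illustrative, not from the paper.

def move_to(pose):
    # Placeholder robot primitive.
    print(f"moving to {pose}")

def close_gripper():
    # Placeholder robot primitive.
    print("closing gripper")

STEPS = {
    "Given the robot is at the home position": lambda: move_to("home"),
    "When the operator points at a part": lambda: move_to("pointed_part"),
    "Then the robot grasps the part": close_gripper,
}

scenario = [
    "Given the robot is at the home position",
    "When the operator points at a part",
    "Then the robot grasps the part",
]

for step in scenario:
    STEPS[step]()  # execute each step in order
```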
-
B. Bastin, S. Hasegawa, J. Solis, R. Ronsse, B. Macq, L. El Hafi, G. A. Garcia Ricardez, and T. Taniguchi, "GPTAlly: A Safety-oriented System for Human-Robot Collaboration based on Foundation Models", in Proceedings of 2025 IEEE/SICE International Symposium on System Integration (SII 2025), pp. 878-884, Munich, Germany, Jan. 21, 2025. DOI: 10.1109/SII59315.2025.10870936 [International conference article, peer-reviewed.]
< Abstract >
As robots integrate further into the workplace, Human-Robot Collaboration (HRC) has become increasingly important. However, most HRC solutions are based on pre-programmed tasks and use fixed safety parameters, which keeps humans out of the loop. To overcome this, HRC solutions are needed that can adapt to human preferences during operation and adjust their safety precautions according to the user's familiarity with robots. In this paper, we introduce GPTAlly, a novel safety-oriented system for HRC that leverages the emerging capabilities of Large Language Models (LLMs). GPTAlly uses LLMs to 1) infer users' subjective safety perceptions to modify the parameters of a Safety Index algorithm; 2) decide on subsequent actions when the robot stops to prevent unwanted collisions; and 3) re-shape the robot arm trajectories based on user instructions. We subjectively evaluate the robot's behavior by comparing the safety perception of GPT-4 to that of the participants. We also evaluate the accuracy of natural language-based robot programming of decision-making requests. The results show that GPTAlly infers safety perception similarly to humans and achieves an average decision-making accuracy of 80%, with few instances below 50%. Code available at: https://axtiop.github.io/GPTAlly
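A minimal sketch of the general mechanism, assuming a hypothetical distance-based safety index and an LLM-provided scaling factor; it is not the GPTAlly code, and all names and numbers are illustrative.

```python
# Hypothetical sketch: scale a distance-based safety index by a factor that an
# LLM is assumed to infer from the user's stated comfort around robots.

def safety_index(distance_m, speed_mps, scale=1.0):
    # Larger values mean safer; the scale tightens or relaxes the index.
    return scale * distance_m / (1.0 + speed_mps)

def decide_action(distance_m, speed_mps, llm_scale, threshold=0.5):
    idx = safety_index(distance_m, speed_mps, llm_scale)
    return "continue" if idx >= threshold else "stop_and_ask_user"

# A cautious user (scale < 1) makes the robot stop earlier than a confident one.
print(decide_action(0.6, 0.3, llm_scale=0.7))  # -> stop_and_ask_user
print(decide_action(0.6, 0.3, llm_scale=1.2))  # -> continue
```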
-
E. Martin, S. Hasegawa, J. Solis, B. Macq, R. Ronsse, G. A. Garcia Ricardez, L. El Hafi*, and T. Taniguchi, "Integrating Multimodal Communication and Comprehension Evaluation during Human-Robot Collaboration for Increased Reliability of Foundation Model-based Task Planning Systems", in Proceedings of 2025 IEEE/SICE International Symposium on System Integration (SII 2025), pp. 1053-1059, Munich, Germany, Jan. 21, 2025. DOI: 10.1109/SII59315.2025.10871045 [International conference article, peer-reviewed.][*Corresponding author.]
< Abstract >
Foundation models provide the adaptability needed in robotics but often require explicit tasks or human verification due to potential unreliability in their responses, complicating human-robot collaboration (HRC). To enhance the reliability of such task-planning systems, we propose 1) an adaptive task-planning system for HRC that reliably performs non-predefined tasks implicitly instructed through HRC, and 2) an integrated system combining multimodal large language model (LLM)-based task planning with multimodal communication of human intention to increase the HRC success rate and comfort. The proposed system integrates GPT-4V for adaptive task planning and comprehension evaluation during HRC with multimodal communication of human intention through speech and deictic gestures. Four pick-and-place tasks of gradually increasing difficulty were used in three experiments, each evaluating a key aspect of the proposed system: task planning, comprehension evaluation, and multimodal communication. The quantitative results show that the proposed system can interpret implicitly instructed tabletop pick-and-place tasks through HRC, providing the next object to pick and the correct position to place it, achieving a mean success rate of 0.80. Additionally, the system can evaluate its comprehension of three of the four tasks with an average precision of 0.87. The qualitative results show that multimodal communication not only significantly enhances the success rate but also the feelings of trust and control, willingness to use again, and sense of collaboration during HRC.
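The following sketch illustrates, under broad assumptions, the loop described above: query a vision-language model for the next step together with a comprehension score and fall back to multimodal human clarification when the score is low. query_vlm() and the JSON fields are placeholders, not the paper's interface.

```python
# Hypothetical sketch of the planning loop: ask a vision-language model for the
# next pick-and-place step plus a 0-10 comprehension score, and fall back to
# multimodal human clarification when the score is low. query_vlm() is a stub.
import json

def query_vlm(prompt, image_path):
    # Placeholder for a multimodal LLM call (e.g., GPT-4V); returns JSON text.
    return '{"pick": "red_cup", "place": "tray", "confidence": 4}'

def next_step(image_path, threshold=7):
    reply = json.loads(query_vlm("What is the next pick-and-place step?", image_path))
    if reply["confidence"] < threshold:
        return "ask_human", reply  # trigger speech/gesture clarification
    return ("pick", reply["pick"], "place", reply["place"]), reply

print(next_step("tabletop.png"))
```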
-
S. Hasegawa, K. Murata, T. Ishikawa, Y. Hagiwara, A. Taniguchi, L. El Hafi, G. A. Garcia Ricardez, and T. Taniguchi, "大規模言語モデルによる複数ロボットの知識統合とタスク割当を用いた現場学習のコスト削減 (Reducing Cost of On-Site Learning by Multi-Robot Knowledge Integration and Task Decomposition via Large Language Models)", in Journal of the Robotics Society of Japan (JRSJ), 2025. [Domestic journal article, peer-reviewed.][Published in Japanese.][Accepted for publication.]
< Abstract >
When robots are deployed in large environments such as hospitals and offices, they must learn place-object relationships in a short period of time. However, the amount of observational data required for multiple robots to perform object search and tidy-up tasks satisfactorily is often unclear a priori, making rapid knowledge acquisition necessary. Therefore, we propose a method in which each robot inputs its knowledge based on on-site learning of a spatial concept model into a large language model, GPT-4, to infer probabilistic action planning based on its predictions. We conducted simulations of object search tasks with multiple robots according to user instructions and evaluated the success score of each task for each iteration of spatial concept learning. As a result of the experiment, the proposed method achieved a high success score while reducing the amount of observational data by more than half compared to the baseline.
-
T. Sakaguchi, A. Taniguchi, Y. Hagiwara, L. El Hafi, S. Hasegawa, and T. Taniguchi, "Real-World Instance-specific Image Goal Navigation: Bridging Domain Gaps via Contrastive Learning", in Proceedings of 2024 IEEE International Conference on Robotic Computing (IRC 2024), pp. 139-146, Tokyo, Japan, Dec. 11, 2024. DOI: 10.1109/IRC63610.2024.00032 [International conference article, peer-reviewed.]
< Abstract >
Improving instance-specific image goal navigation (InstanceImageNav), which involves locating an object in the real world that is identical to a query image, is essential for enabling robots to help users find desired objects. The challenge lies in the domain gap between the low-quality images observed by the moving robot, characterized by motion blur and low resolution, and the high-quality query images provided by the user. These domain gaps can significantly reduce the task success rate, yet previous work has not adequately addressed them. To tackle this issue, we propose a novel method: few-shot cross-quality instance-aware adaptation (CrossIA). This approach employs contrastive learning with an instance classifier to align features between a large set of low-quality images and a small set of high-quality images. We fine-tuned the SimSiam model, pretrained on ImageNet, using CrossIA with instance labels based on a 3D semantic map. Additionally, our system integrates object image collection with a pretrained deblurring model to enhance the quality of the observed images. Evaluated on an InstanceImageNav task with 20 different instance types, our method improved the task success rate by up to three-fold compared to a baseline based on SuperGlue. These findings highlight the potential of contrastive learning and image enhancement techniques in improving object localization in robotic applications. The project website is https://emergentsystemlabstudent.github.io/DomainBridgingNav/.
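As an illustration of the contrastive idea (a simplified sketch assuming PyTorch, not the CrossIA code), a SimSiam-style objective aligns a low-quality and a high-quality view of the same instance using a stop-gradient negative cosine loss:

```python
# Hypothetical SimSiam-style fine-tuning step in PyTorch: align features of a
# low-quality and a high-quality view of the same instance using a stop-gradient
# negative cosine loss. Backbone, shapes, and data are stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256))      # stand-in backbone
predictor = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 256))
optimizer = torch.optim.SGD(list(encoder.parameters()) + list(predictor.parameters()), lr=0.05)

def neg_cosine(p, z):
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()           # stop-gradient on z

low_q = torch.randn(8, 3, 64, 64)    # e.g., blurred robot observations (fake batch)
high_q = torch.randn(8, 3, 64, 64)   # e.g., user-provided query images (fake batch)

z1, z2 = encoder(low_q), encoder(high_q)
loss = 0.5 * (neg_cosine(predictor(z1), z2) + neg_cosine(predictor(z2), z1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))
```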
-
T. Sakaguchi, A. Taniguchi, Y. Hagiwara, L. El Hafi, S. Hasegawa, and T. Taniguchi, "Object Instance Retrieval in Assistive Robotics: Leveraging Fine-tuned SimSiam with Multi-View Images based on 3D Semantic Map", in Proceedings of 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024), pp. 7817-7824, Abu Dhabi, United Arab Emirates, Oct. 14, 2024. DOI: 10.1109/IROS58592.2024.10802697 [International conference article, peer-reviewed.]
< Abstract >
Robots that assist humans in their daily lives should be able to locate specific instances of objects in an environment that match a user’s desired objects. This task is known as instance-specific image goal navigation (InstanceImageNav), which requires a model that can distinguish different instances of an object within the same class. A significant challenge in robotics is that when a robot observes the same object from various 3D viewpoints, its appearance may differ significantly, making it difficult to recognize and locate accurately. In this paper, we introduce a method called SimView, which leverages multi-view images based on a 3D semantic map of an environment and self-supervised learning using SimSiam to train an instance-identification model on-site. The effectiveness of our approach was validated using a photorealistic simulator, Habitat Matterport 3D, created by scanning actual home environments. Our results demonstrate a 1.7-fold improvement in task accuracy compared with contrastive language-image pre-training (CLIP), a pre-trained multimodal contrastive learning method for object searching. This improvement highlights the benefits of our proposed fine-tuning method in enhancing the performance of assistive robots in InstanceImageNav tasks. The project website is https://emergentsystemlabstudent.github.io/MultiViewRetrieve/.
-
S. Hashimoto, T. Ishikawa, S. Hasegawa, A. Taniguchi, Y. Hagiwara, L. El Hafi, and T. Taniguchi, "Ownership Information Acquisition of Objects in the Environment by Active Question Generation with Multimodal Large Language Models and Probabilistic Generative Models", in Proceedings of 2024 SIGDIAL Workshop on Spoken Dialogue Systems for Cybernetic Avatars (SDS4CA 2024), pp. 1-1, Kyoto, Japan, Sep. 17, 2024. [International workshop abstract, non-peer-reviewed.]
< Abstract >
In daily life environments, such as a home or office, a robot that coexists with the user is required to perform various tasks through interaction with the environment and the user. Ownership information is important for tasks such as the robot bringing an object specified by the user. For example, if there are two identical looking cups in the living room, each cup may have its own owner. Knowing the owner of each cup in advance, the robot can identify "Bob's cup" and perform the task when the user gives the robot the verbal instruction "Bring me Bob's cup". Interaction with the user is effective for the robot to acquire invisible ownership information of objects in the environment. For example, when the robot finds an object, it may be able to acquire the visible attributes of the object, such as "the red cup" from its appearance. On the other hand, ownership information of the object, which depends on the user and the environment, such as whose cup it is, can be acquired through interaction with the user. However, when the robot learns such ownership information, it is burdensome for the user to interact passively with all objects in the environment, such as unilaterally instructing the robot about ownership information of each object. We propose a method that reduces the user's teaching burden and enables the robot to efficiently acquire ownership information of the object. The robot selects whether or not an object should be asked about its ownership information by utilizing the common sense knowledge of the multimodal large language model, GPT-4. It also selects objects to ask questions based on active inference that minimizes the expected free energy for ownership information. Then, a probabilistic generative model is constructed to learn ownership information of the object based on the location and attributes of the object in the environment and the user's answers obtained by question generation. We conduct experiments in a field simulating an actual laboratory to verify whether the robot can accurately learn ownership information of each object placed in the environment. We will also verify whether the proposed method using GPT-4 and active inference improves the learning efficiency and reduces the burden on the user compared to the comparison method.
-
T. Matsushima, R. Takanami, M. Kambara, Y. Noguchi, J. Arima, Y. Ikeda, K. Yanagida, K. Iwata, S. Hasegawa, L. El Hafi, K. Yamao, K. Isomoto, N. Yamaguchi, R. Kobayashi, T. Shiba, Y. Yano, A. Mizutani, H. Tamukoh, T. Horii, K. Sugiura, T. Taniguchi, Y. Matsuo, and Y. Iwasawa, "HSRT-X: コミュニティを活用したロボット基盤モデルの構築", in Proceedings of 2024 Annual Conference of the Robotics Society of Japan (RSJ 2024), ref. RSJ2024AC1D2-01, pp. 1-4, Osaka, Japan, Sep. 3, 2024. [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
< Abstract >
We introduce the current status of the HSRT-X project, which leverages the HSR Community, the user community of the mobile manipulator HSR, to collect datasets at multiple sites and train models in order to build robot foundation models, including large end-to-end robot policy models applicable to diverse environments and tasks.
-
S. Hasegawa, K. Murata, T. Ishikawa, Y. Hagiwara, A. Taniguchi, L. El Hafi, G. A. Garcia Ricardez, and T. Taniguchi, "マルチモーダル大規模言語モデルによる複数ロボットの知識統合とタスク割当を用いた現場学習のコスト削減", in Proceedings of 2024 Annual Conference of the Robotics Society of Japan (RSJ 2024), ref. RSJ2024AC1D2-02, pp. 1-4, Osaka, Japan, Sep. 3, 2024. [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
< Abstract >
When robots are deployed in large-scale environments such as hospitals and offices, it is important that they quickly learn the relationships between objects and places. The amount of observational data required when multiple robots perform object search and tidy-up tasks is not clear in advance, so rapid knowledge acquisition is required. We therefore propose a method in which each robot feeds its on-site knowledge, based on a spatial concept model, into GPT-4 and plans probabilistic actions based on its predictions. In a simulator, multiple robots performed object search according to user instructions, and we evaluated the task success score for each number of place-learning iterations. The experimental results show that the proposed method achieved a high success score while reducing the amount of observational data by more than half compared to the baseline.
-
K. Murata, S. Hasegawa, T. Ishikawa, Y. Hagiwara, A. Taniguchi, L. El Hafi, and T. Taniguchi, "ロボット間の現場知識の差を考慮した基盤モデルによる物体探索の言語指示におけるタスク分解と割当", in Proceedings of 2024 Annual Conference of the Robotics Society of Japan (RSJ 2024), ref. RSJ2024AC3D2-05, pp. 1-4, Osaka, Japan, Sep. 3, 2024. [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
< Abstract >
In home environments, a robot may be given language instructions that include multiple target objects, such as "Find the banana and the cup." For multiple robots to divide up and execute such instructions, it is important to decompose the task into subtasks and assign them correctly to each robot. In this work, the proposed method decomposes a task into subtasks and assigns the subtasks to multiple robots. We propose a method in which GPT-4 performs the task decomposition and assignment using a spatial concept model that can infer place names and object arrangements, and we verify whether the subtasks can be assigned appropriately. The experimental results show that performing task decomposition and assignment with GPT-4 and the spatial concept model achieved nearly twice as many successful task assignments as the baseline method.
-
T. Nakashima, S. Otake, A. Taniguchi, K. Maeyama, L. El Hafi, T. Taniguchi, and H. Yamakawa, "Hippocampal Formation-inspired Global Self-Localization: Quick Recovery from the Kidnapped Robot Problem from an Egocentric Perspective", in Frontiers in Computational Neuroscience, Research Topic on Brain-Inspired Intelligence: The Deep Integration of Brain Science and Artificial Intelligence, vol. 18, pp. 1-15, Jul. 18, 2024. DOI: 10.3389/fncom.2024.1398851 [International journal article, peer-reviewed.]
< Abstract >
It remains difficult for mobile robots to continue accurate self-localization when they are suddenly teleported to a location that is different from their beliefs during navigation. Incorporating insights from neuroscience into developing a spatial cognition model for mobile robots may make it possible to acquire the ability to respond appropriately to changing situations, similar to living organisms. Recent neuroscience research has shown that during teleportation in rat navigation, neural populations of place cells in the cornu ammonis-3 region of the hippocampus, which are sparse representations of each other, switch discretely. In this study, we construct a spatial cognition model using brain reference architecture-driven development, a method for developing brain-inspired software that is functionally and structurally consistent with the brain. The spatial cognition model was realized by integrating the recurrent state-space model, a world model, with Monte Carlo localization to infer allocentric self-positions within the framework of neuro-symbol emergence in the robotics toolkit. The spatial cognition model, which models the cornu ammonis-1 and -3 regions with each latent variable, demonstrated improved self-localization performance of mobile robots during teleportation in a simulation environment. Moreover, it was confirmed that sparse neural activity could be obtained for the latent variables corresponding to cornu ammonis-3. These results suggest that spatial cognition models incorporating neuroscience insights can contribute to improving the self-localization technology for mobile robots. The project website is https://nakashimatakeshi.github.io/HF-IGL/.
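For readers unfamiliar with the localization component, the sketch below shows a generic Monte Carlo localization update (weighting and resampling particles against a range observation); it is illustrative only and omits the paper's recurrent state-space model and hippocampal mapping.

```python
# Generic Monte Carlo localization update (illustrative only; the paper couples
# this with a recurrent state-space model, which is not reproduced here).
import numpy as np

rng = np.random.default_rng(0)
particles = rng.uniform(0, 10, size=(500, 2))    # candidate (x, y) poses
weights = np.ones(500) / 500

def update(particles, weights, observed_range, landmark=(5.0, 5.0), sigma=0.5):
    dists = np.linalg.norm(particles - np.array(landmark), axis=1)
    likelihood = np.exp(-0.5 * ((dists - observed_range) / sigma) ** 2)
    weights = weights * likelihood
    weights /= weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=weights)    # resample
    return particles[idx], np.ones(len(particles)) / len(particles)

particles, weights = update(particles, weights, observed_range=2.0)
print(particles.mean(axis=0))   # crude pose estimate after one observation
```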
-
B. Bastin, "GPTAlly: A Safety-oriented System for Human-Robot Collaboration based on Foundation Models", in Master's thesis, Université catholique de Louvain (UCLouvain), Louvain-la-Neuve, Belgium, Jun. 2024. [Master's thesis.][Supervised by B. Macq, R. Ronsse, G. A. Garcia Ricardez, L. El Hafi, and J. Solis.]
< Abstract >
We are aiming for Society 5.0, which emphasizes improving workplace quality of life through AI and robotics. However, current robots lack human-like situational understanding and often rely on pre-programmed tasks or supervised learning. Additionally, there is a need for safety metrics that consider users' subjective safety perceptions. This thesis introduces GPTAlly, a system for safe human-robot collaboration using Large Language Models (LLMs) and Visual Language Models (VLMs). LLMs help infer users' subjective safety perceptions in collaborative tasks, influencing a Safety Index algorithm that adjusts safety evaluations. The system ensures robots stop to prevent harmful collisions and uses an LLM-based coding paradigm to determine subsequent actions, either autonomously or as per user preferences. The actions are implemented by an LLM, which shapes robotic arm trajectories by interpreting the user's natural language instructions to suggest 3D poses. A user study compares safety perception scaling factors from GPT-4 with participants' estimates. The study also evaluates user satisfaction with the changes in robot behavior. The accuracy of the streamlined coding paradigm is evaluated through contextual experiments by varying the number of conditions processed by the LLM and paraphrasing the conditions. The satisfaction with the trajectories shaped from 3D poses is assessed through another user study. The study finds that LLMs effectively integrate human safety perceptions. GPT-4's estimations of the scaling factors closely match the user responses, and participants express satisfaction with behavior changes. However, the coding paradigm's contextual accuracy can be below 50%. Finally, the evaluation of robotic arm trajectories found that users preferred trajectories shaped by their natural language inputs over uninfluenced ones. Codebase available at: https://axtiop.github.io/GPTAlly
-
E. Martin, "Task Planning System using Foundation Models in Multimodal Human-Robot Collaboration", in Master's thesis, Université catholique de Louvain (UCLouvain), Louvain-la-Neuve, Belgium, Jun. 2024. [Master's thesis.][Supervised by L. El Hafi, G. A. Garcia Ricardez, B. Macq, R. Ronsse, and J. Solis.]
< Abstract >
Society 5.0, the society Japan aspires to, aims to create a cyber-physical system where humans and robots collaborate. To this end, both should be able to work together on the same tasks. In conventional robotics, robots are trained and specialized to perform specific tasks. While they perform well on this pre-defined set of tasks, these models require extensive data gathering and a time-consuming process. Moreover, when facing unknown environments, they experience a decrease in performance due to their non-adaptability to unforeseen situations. Additionally, if they are part of the same working team, the robots must understand and interpret human intentions. However, most of the past proposed intention recognition methods also lack flexibility and contextualization capability. To tackle this, this thesis proposes 1) a dynamic task planning system capable of performing non-predefined tasks, and 2) a framework that combines automatic task planning with human multimodal intention communication, enhancing the success of the task and human well-being (e.g., trust, willingness to use the system again). In this regard, there have been recent improvements in zero-shot learning in Human-Robot Collaboration using large pre-trained models. Because they were trained on large amounts of data, these models can apply their knowledge to tasks beyond their training data. Visual Language Models have recently demonstrated their ability to understand and analyze images. For this reason, these models are widely used as the robot’s reasoning module. Therefore, the system proposed in this thesis is divided into three modules: 1) automatic task planning computed using GPT-4V, 2) use of GPT-4V to compute a confidence level that reflects its comprehension of the task, and 3) a multimodal communication module to correct the automatic task planning in case of failure. Firstly, automatic task planning is achieved by feeding the Visual Language Model with an image of the task currently being performed. The VLM is then asked to determine the next step to pursue the task. The confidence level is defined as a number between 0 and 10, reflecting the robot’s comprehension of the task. Multimodal communication is achieved using deictic movements and speech communication. The results show that: 1) GPT-4V is able to understand simple tabletop pick-and-place tasks and provide the next object to pick and the corresponding placement position, 2) GPT-4V is able to evaluate its comprehension for three of the four implemented tasks, and 3) multimodal communication integrated into the automatic system enhances, in the tested task, both the success rate and human well-being.
-
C. Tornberg, L. El Hafi*, P. M. Uriguen Eljuri, M. Yamamoto, G. A. Garcia Ricardez, J. Solis, and T. Taniguchi, "Mixed Reality-based 6D-Pose Annotation System for Robot Manipulation in Retail Environments", in Proceedings of 2024 IEEE/SICE International Symposium on System Integration (SII 2024), pp. 1425-1432, Ha Long, Vietnam, Jan. 8, 2024. DOI: 10.1109/SII58957.2024.10417443 [International conference article, peer-reviewed.][*Corresponding author.]
< Abstract >
Robot manipulation in retail environments is a challenging task due to the need for large amounts of annotated data for accurate 6D-pose estimation of items. Onsite data collection, additional manual annotation, and model fine-tuning are often required when deploying robots in new environments, as varying lighting conditions, clutter, and occlusions can significantly diminish performance. Therefore, we propose a system to annotate the 6D pose of items using mixed reality (MR) to enhance the robustness of robot manipulation in retail environments. Our main contribution is a system that can display 6D-pose estimation results of a trained model from multiple perspectives in MR, and enable onsite (re-)annotation of incorrectly inferred item poses using hand gestures. The proposed system is compared to a PC-based annotation system using a mouse and the robot camera's point cloud in an extensive quantitative experiment. Our experimental results indicate that MR can increase the accuracy of pose annotation, especially by reducing position errors.
-
P. Zhu, L. El Hafi*, and T. Taniguchi, "Visual-Language Decision System through Integration of Foundation Models for Service Robot Navigation", in Proceedings of 2024 IEEE/SICE International Symposium on System Integration (SII 2024), pp. 1288-1295, Ha Long, Vietnam, Jan. 8, 2024. DOI: 10.1109/SII58957.2024.10417171 [International conference article, peer-reviewed.][*Corresponding author.]
< Abstract >
This study aims to build a system that bridges the gap between robotics and environmental understanding by integrating various foundation models. While current visual-language models (VLMs) and large language models (LLMs) have demonstrated robust capabilities in image recognition and language comprehension, challenges remain in integrating them into practical robotic applications. Therefore, we propose a visual-language decision (VLD) system that allows a robot to autonomously analyze its surroundings using three VLMs (CLIP, OFA, and PaddleOCR) to generate semantic information. This information is further processed using the GPT-3 LLM, which allows the robot to make judgments during autonomous navigation. The contribution is twofold: 1) We show that integrating CLIP, OFA, and PaddleOCR into a robotic system can generate task-critical information in unexplored environments; 2) We explore how to effectively use GPT-3 to match the results generated by specific VLMs and make navigation decisions based on environmental information. We also implement a photorealistic training environment using Isaac Sim to test and validate the proposed VLD system in simulation. Finally, we demonstrate VLD-based real-world navigation in an unexplored environment using a TurtleBot3 robot equipped with a lidar and an RGB camera.
-
A. Kanechika, L. El Hafi*, A. Taniguchi, Y. Hagiwara, and T. Taniguchi, "Interactive Learning System for 3D Semantic Segmentation with Autonomous Mobile Robots", in Proceedings of 2024 IEEE/SICE International Symposium on System Integration (SII 2024), pp. 1274-1281, Ha Long, Vietnam, Jan. 8, 2024. DOI: 10.1109/SII58957.2024.10417237 [International conference article, peer-reviewed.][*Corresponding author.]
< Abstract >
Service robots operating in unfamiliar environments require capabilities for autonomous object recognition and learning from user interactions. However, present semantic segmentation methods, crucial for such tasks, often demand large datasets and costly annotations to achieve accurate inference. In addition, they cannot handle all possible objects or environmental variations without a large additional number of images and annotations. Therefore, this study introduces a learning system for semantic segmentation that combines 3D semantic mapping with interactions between an autonomous mobile robot and a user. We show that the proposed system can: 1) autonomously construct 3D semantic maps using an autonomous mobile robot, 2) improve the prediction accuracy of models pre-trained by supervised and weakly supervised learning in new environments, even without interaction, and 3) more accurately predict new classes of objects with a small number of additional coarse annotations obtained through interaction. Results obtained from experiments conducted in a real-world setting using models pre-trained on the NYU, VOC, and COCO datasets demonstrated an improvement in semantic segmentation accuracy when using our proposed system.
-
S. Hasegawa, A. Taniguchi, Y. Hagiwara, L. El Hafi, and T. Taniguchi, "Integrating Probabilistic Logic and Multimodal Spatial Concepts for Efficient Robotic Object Search in Home Environments", in SICE Journal of Control, Measurement, and System Integration (JCMSI), Virtual Issue on IEEE/SICE SII 2023, vol. 16, no. 1, pp. 400-422, Dec. 26, 2023. DOI: 10.1080/18824889.2023.2283954 [International journal article, peer-reviewed.]
< Abstract >
Our study introduces a novel approach that combined probabilistic logic and multimodal spatial concepts to enable a robot to efficiently acquire place-object relationships in a new home environment with few learning iterations. By leveraging probabilistic logic, which employs predicate logic with probability values, we represent common-sense knowledge of the place-object relationships. The integration of logical inference and cross-modal inference to calculate conditional probabilities across different modalities enables the robot to infer object locations even when their likely locations are undefined. To evaluate the effectiveness of our method, we conducted simulation experiments and compared the results with three baselines: multimodal spatial concepts only, common-sense knowledge only, and common-sense knowledge and multimodal spatial concepts combined. By comparing the number of room visits required by the robot to locate 24 objects, we demonstrated the improved performance of our approach. For search tasks including objects whose locations were undefined, the findings demonstrate that our method reduced the learning cost by a factor of 1.6 compared to the baseline methods. Additionally, we conducted a qualitative analysis in a real-world environment to examine the impact of integrating the two inferences and identified the scenarios that influence changes in the task success rate.
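A highly simplified illustration of the underlying fusion (not the paper's generative model): combine a common-sense prior over places with an on-site learned place-object likelihood via Bayes' rule to rank rooms; all place names and probabilities below are invented for the example.

```python
# Simplified illustration (not the paper's model): fuse a common-sense prior
# over places with an on-site learned place-object likelihood to rank rooms.
places = ["kitchen", "living_room", "bedroom"]
commonsense_prior = {"kitchen": 0.6, "living_room": 0.3, "bedroom": 0.1}    # e.g., from probabilistic logic
learned_likelihood = {"kitchen": 0.2, "living_room": 0.7, "bedroom": 0.1}   # e.g., P(object | place) learned on site

posterior = {p: commonsense_prior[p] * learned_likelihood[p] for p in places}
total = sum(posterior.values())
posterior = {p: v / total for p, v in posterior.items()}

print(max(posterior, key=posterior.get), posterior)   # visit the most probable room first
```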
-
G. A. Garcia Ricardez, C. Tornberg, L. El Hafi, J. Solis, and T. Taniguchi, "Toward Safe and Efficient Human-Robot Teams: Mixed Reality-based Robot Motion and Safety Index Visualization", in Abstract Booklet of 16th IFToMM World Congress (WC 2023), pp. 53-54, Tokyo, Japan, Nov. 5, 2023. [International conference abstract, peer-reviewed.]
-
Y. Hagiwara, S. Hasegawa, A. Oyama, A. Taniguchi, L. El Hafi, and T. Taniguchi, "現場環境で学習した知識に基づく曖昧な発話からの生活物理支援タスク", in Proceedings of 2023 Annual Conference of the Robotics Society of Japan (RSJ 2023), ref. RSJ2023AC1J1-05, pp. 1-4, Sendai, Japan, Sep. 9, 2023. [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
< Abstract >
In home environments, ambiguous language instructions such as "Get me that" or "Bring me the cup" are commonly used. Such instructions do not explicitly contain information about the object to bring or the place to fetch it from. This paper describes two methods that allow a robot to complete the missing information based on knowledge learned in the field environment and to perform daily-life physical support tasks from ambiguous language instructions. One is a method for exophora resolution of language instructions containing demonstratives using on-site multimodal information. The other is a planning method that leverages on-site knowledge acquired with a spatial concept model together with a large language model.
-
S. Hasegawa, M. Ito, R. Yamaki, T. Sakaguchi, Y. Hagiwara, A. Taniguchi, L. El Hafi, and T. Taniguchi, "生活支援ロボットの行動計画のための大規模言語モデルと場所概念モデルの活用", in Proceedings of 2023 Annual Conference of the Robotics Society of Japan (RSJ 2023), ref. RSJ2023AC1K3-06, pp. 1-4, Sendai, Japan, Sep. 9, 2023. [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
< Abstract >
When a daily-life support robot assists a user, it is important that the robot understands the user's language instructions and takes actions appropriate to the situation. To enable the robot to handle diverse instructions, we propose a system that provides ChatGPT with on-site knowledge constructed with a spatial concept model and plans actions based on that knowledge. In a simulator, we conducted experiments in which the robot searched for objects based on user instructions, and we evaluated metrics such as the number of rooms the robot visited before finding the object. The experiments show that the proposed system can reduce the number of room visits during search compared to the baseline.
-
G. A. Garcia Ricardez, T. Wakayama, S. Ikemura, E. Fujiura, P. M. Uriguen Eljuri, H. Ikeuchi, M. Yamamoto, L. El Hafi, and T. Taniguchi, "Toward Resilient Manipulation of Food Products: Analysis of 6D-Pose Estimation at the Future Convenience Store Challenge 2022", in Proceedings of 2023 IEEE International Conference on Automation Science and Engineering (CASE 2023), pp. 1-6, Auckland, New Zealand, Aug. 26, 2023. DOI: 10.1109/CASE56687.2023.10260506 [International conference article, peer-reviewed.]
< Abstract >
Service robots, the class of robots that are designed to assist humans in their daily lives, are needed in the retail industry to compensate for the labor shortage. To foster innovation, the Future Convenience Store Challenge was created, where robotic systems for the manipulation of food products are tasked to dispose of expired products and replenish the shelves. We, as team NAIST-RITS-Panasonic, have developed a mobile manipulator with which we have obtained 1st place in the past three editions of the challenge. In the last edition, we manipulated the five types of items without fiducial markers or customized packaging using a suction-based end effector. In this paper, we evaluate the accuracy of the 6D-pose estimation as well as its effect on the grasping success rate by 1) comparing the 6D-pose estimation results with the ground truth, and 2) evaluating the grasping success rate with the estimated pose during and after the competition. The results show that the 6D-pose estimation error has a significant effect on the grasping success rate.
-
A. Taniguchi, Y. Tabuchi, T. Ishikawa, L. El Hafi, Y. Hagiwara, and T. Taniguchi, "Active Exploration based on Information Gain by Particle Filter for Efficient Spatial Concept Formation", in RSJ Advanced Robotics (AR), Special Issue on World Models and Predictive Coding in Robotics (Part I), vol. 37, no. 13, pp. 840-870, Jul. 3, 2023. DOI: 10.1080/01691864.2023.2225175 [International journal article, peer-reviewed.]
< Abstract >
Autonomous robots need to learn the categories of various places by exploring their environments and interacting with users. However, preparing training datasets with linguistic instructions from users is time-consuming and labor-intensive. Moreover, effective exploration is essential for appropriate concept formation and rapid environmental coverage. To address this issue, we propose an active inference method, referred to as spatial concept formation with information gain-based active exploration (SpCoAE) that combines sequential Bayesian inference using particle filters and information gain-based destination determination in a probabilistic generative model. This study interprets the robot's action as a selection of destinations to ask the user, "What kind of place is this?" in the context of active inference. This study provides insights into the technical aspects of the proposed method, including active perception and exploration by the robot, and how the method can enable mobile robots to learn spatial concepts through active exploration. Our experiment demonstrated the effectiveness of the SpCoAE in efficiently determining a destination for learning appropriate spatial concepts in home environments.
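As a toy illustration of information gain-based destination selection (not SpCoAE itself, which performs this computation with particle filters over a full spatial concept model), the expected reduction in entropy from asking "What kind of place is this?" at a candidate destination can be estimated as follows; the observation model and probabilities are invented for the example.

```python
# Toy information-gain computation for choosing where to ask "What kind of
# place is this?" (illustrative only; the paper computes this with particle
# filters over a full spatial concept model).
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def expected_info_gain(prior, answer_accuracy=0.9):
    n = len(prior)
    # Observation model: the user names the true category with prob `answer_accuracy`.
    obs_model = np.full((n, n), (1 - answer_accuracy) / (n - 1))
    np.fill_diagonal(obs_model, answer_accuracy)        # obs_model[c, o] = P(answer o | category c)
    p_obs = prior @ obs_model                           # P(answer o)
    expected_posterior_entropy = 0.0
    for o in range(n):
        post = prior * obs_model[:, o] / p_obs[o]       # P(category | answer o) by Bayes' rule
        expected_posterior_entropy += p_obs[o] * entropy(post)
    return entropy(prior) - expected_posterior_entropy

candidates = {"hallway": np.array([0.4, 0.4, 0.2]),     # uncertain place -> high expected gain
              "kitchen": np.array([0.9, 0.05, 0.05])}   # well-known place -> low expected gain
gains = {name: expected_info_gain(p) for name, p in candidates.items()}
print(max(gains, key=gains.get), gains)
```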
-
S. Hasegawa, R. Yamaki, A. Taniguchi, Y. Hagiwara, L. El Hafi, and T. Taniguchi, "大規模言語モデルと場所概念モデルの統合による未観測物体の語彙を含んだ言語指示理解 (Understanding Language Instructions that Include the Vocabulary of Unobserved Objects by Integrating a Large Language Model and a Spatial Concept Model)", in Proceedings of 2023 Annual Conference of the Japanese Society for Artificial Intelligence (JSAI 2023), ref. 1Q4-OS-7b-03, pp. 1-4, Kumamoto, Japan, Jun. 7, 2023. DOI: 10.11517/pjsai.JSAI2023.0_1Q4OS7b03 [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
< Abstract >
For a robot to assist people in home environments, it is important to handle the vocabulary of unobserved objects while learning the knowledge of places. It is assumed that there exist objects that the robot did not observe through its sensors during learning. For such a case, the robot is expected to perform household tasks on language instructions that include the vocabulary of these objects. We propose a method that integrates a large language model and a spatial concept model to enable the robot to understand language instructions that include the vocabulary of unobserved objects while learning places. Even if the objects that the user instructed the robot to search for are not included in a training dataset during learning, the number of room visits during object search can be expected to reduce by combining the inference of these models. We validated our method in an experiment in which a robot searched for unobserved objects in a simulated environment. The results showed that our proposed method could reduce the number of room visits during the search compared to the baseline method.
-
L. El Hafi, Y. Zheng, H. Shirouzu, T. Nakamura, and T. Taniguchi, "Serket-SDE: A Containerized Software Development Environment for the Symbol Emergence in Robotics Toolkit", in Proceedings of 2023 IEEE/SICE International Symposium on System Integration (SII 2023), pp. 1-6, Atlanta, United States, Jan. 17, 2023. DOI: 10.1109/SII55687.2023.10039424 [International conference article, peer-reviewed.]
< Abstract >
The rapid deployment of intelligent robots to perform service tasks has become an increasingly complex challenge for researchers due to the number of disciplines and skills involved. Therefore, this paper introduces Serket-SDE, a containerized Software Development Environment (SDE) for the Symbol Emergence in Robotics Toolkit (Serket) that relies on open-source technologies to build cognitive robotic systems from multimodal sensor observations. The main contribution of Serket-SDE is an integrated framework that allows users to rapidly compose, scale, and deploy probabilistic generative models with robots. The description of Serket-SDE is accompanied by demonstrations of unsupervised multimodal categorizations using a mobile robot in various simulation environments. Further extensions of the Serket-SDE framework are discussed in conclusion based on the demonstrated results.
-
S. Hasegawa, A. Taniguchi, Y. Hagiwara, L. El Hafi, and T. Taniguchi, "Inferring Place-Object Relationships by Integrating Probabilistic Logic and Multimodal Spatial Concepts", in Proceedings of 2023 IEEE/SICE International Symposium on System Integration (SII 2023), pp. 1-8, Atlanta, United States, Jan. 17, 2023. DOI: 10.1109/SII55687.2023.10039318 [International conference article, peer-reviewed.]
< Abstract >
We propose a novel method that integrates probabilistic logic and multimodal spatial concepts to enable a robot to acquire the relationships between places and objects in a new environment with a few learning times. Using predicate logic with probability values (i.e., probabilistic logic) to represent commonsense knowledge of place-object relationships, we combine logical inference using probabilistic logic with the cross-modal inference that can calculate the conditional probabilities of other modalities given one modality. This allows the robot to infer the place of the object to find even when it does not know the likely place of the object in the home environment. We conducted experiments in which a robot searched for daily objects, including objects with undefined places, in a simulated home environment using four approaches: 1) multimodal spatial concepts only, 2) commonsense knowledge only, 3) commonsense knowledge and multimodal spatial concepts, and 4) probabilistic logic and multimodal spatial concepts (proposed). We confirmed the effectiveness of the proposed method by comparing the number of place visits it took for the robot to find all the objects. We also observed that our proposed approach reduces the on-site learning cost by a factor of 1.6 over the three baseline methods when the robot performs the task of finding objects with undefined places in a new home environment.
-
H. Nakamura, L. El Hafi*, A. Taniguchi, Y. Hagiwara, and T. Taniguchi, "Multimodal Object Categorization with Reduced User Load through Human-Robot Interaction in Mixed Reality", in Proceedings of 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022), pp. 2143-2150, Kyoto, Japan, Oct. 23, 2022. DOI: 10.1109/IROS47612.2022.9981374 [International conference article, peer-reviewed.][*Corresponding author.]
< Abstract >
Enabling robots to learn from interactions with users is essential to perform service tasks. However, as a robot categorizes objects from multimodal information obtained by its sensors during interactive onsite teaching, the inferred names of unknown objects do not always match the human user's expectation, especially when the robot is introduced to new environments. Confirming the learning results through natural speech interaction with the robot often puts an additional burden on the user who can only listen to the robot to validate the results. Therefore, we propose a human-robot interface to reduce the burden on the user by visualizing the inferred results in mixed reality (MR). In particular, we evaluate the proposed interface on the system usability scale (SUS) and the NASA task load index (NASA-TLX) with three experimental object categorization scenarios based on multimodal latent Dirichlet allocation (MLDA) in which the robot: 1) does not share the inferred results with the user at all, 2) shares the inferred results through speech interaction with the user (baseline), and 3) shares the inferred results with the user through an MR interface (proposed). We show that providing feedback through an MR interface significantly reduces the temporal, physical, and mental burden on the human user compared to speech interaction with the robot.
-
G. A. Garcia Ricardez, P. M. Uriguen Eljuri, Y. Kamemura, S. Yokota, N. Kugou, Y. Asama, Z. Wang, H. Kumamoto, K. Yoshimoto, W. Y. Chan, T. Nagatani, P. Tulathum, B. Usawalertkamol, L. El Hafi, H. Ikeuchi, M. Yamamoto, J. Takamatsu, T. Taniguchi, and T. Ogasawara, "Autonomous Service Robot for Human-aware Restock, Straightening and Disposal Tasks in Retail Automation", in RSJ Advanced Robotics (AR), Special Issue on Service Robot Technology: Selected Papers from WRS 2020 (Part I), vol. 36, no. 17-18, pp. 936-950, Sep. 17, 2022. DOI: 10.1080/01691864.2022.2109429 [International journal article, peer-reviewed.]
< Abstract >
The workforce shortage in the service industry, recently highlighted by the pandemic, has increased the need for automation. We propose an autonomous robot to fulfill this purpose. Our mobile manipulator includes an extendable and compliant end effector design, as well as a custom-made automated shelf, and it is capable of manipulating food products such as lunch boxes, while traversing narrow spaces and reacting to human interventions. We benchmarked the solution in the international robotics competition Future Convenience Store Challenge (FCSC) where we obtained the first place in the 2020 edition, as well as in a laboratory setting, both situated in a convenience store scenario. We reported the results evaluated in terms of the score of the FCSC 2020 and further discussed the real-world applicability of the current system and open challenges.
-
T. Wakayama, E. Fujiura, M. Yamaguchi, N. Yoshida, T. Inoue, H. Ikeuchi, M. Yamamoto, L. El Hafi, G. A. Garcia Ricardez, J. Takamatsu, T. Taniguchi, and T. Ogasawara, "Versatile Cleaning Service Robot based on a Mobile Manipulator with Tool Switching for Liquids and Garbage Removal in Restrooms", in RSJ Advanced Robotics (AR), Special Issue on Service Robot Technology: Selected Papers from WRS 2020 (Part I), vol. 36, no. 17-18, pp. 967-981, Sep. 17, 2022. DOI: 10.1080/01691864.2022.2109430 [International journal article, peer-reviewed.]
< Abstract >
In recent years, the labor shortage has become a significant problem in Japan and other countries due to aging societies. However, service robots can play a decisive role in relieving human workers by performing various household and assistive tasks. Restroom cleaning is one such challenging task, involving motion planning in a constrained restroom setting. In this study, we propose a mobile manipulator to perform various tasks related to restroom cleaning. Our key contributions include system integration of multiple tools on an arm with high DoF mounted on a mobile, omni-directional platform capable of versatile service cleaning and with extended reachability. We evaluate the performance of our system with the competition setting used for the restroom cleaning task of the Future Convenience Store Challenge at the World Robot Summit 2020, where we obtained the 1st Place. The proposed system successfully completed all the competition tasks within the time limit and could remove the liquid with a removal rate of 96%. The proposed system could also dispose of most garbage, achieving an average garbage disposal rate of 90%. Further experiments confirmed the scores obtained in the competition with an even higher liquid removal rate of 98%.
-
P. M. Uriguen Eljuri, Y. Toramatsu, K. Maeyama, L. El Hafi, and T. Taniguchi, "Software Development Environment to Collect Sensor and Robot Data for Imitation Learning of a Pseudo Cranial Window Task", in Proceedings of 2022 Annual Conference of the Robotics Society of Japan (RSJ 2022), ref. RSJ2022AC4A2-07, pp. 1-4, Tokyo, Japan, Sep. 5, 2022. [Domestic conference article, non-peer-reviewed.]
< Abstract >
The use of AI in robotics has become more common, aiming to make robots able to learn and execute tasks similarly to humans. To teach the robot how to do a task, we must record multiple samples from an expert. This paper introduces our containerized software development environment that can be quickly deployed to collect and extract data from a robot during a teleoperation task. This environment can be deployed on multiple computers, so a user can extract and process the collected data while the expert keeps recording sample data. The environment was tested by recording data from multiple sensors while an expert performed a pseudo cranial window task.
-
S. Hasegawa, Y. Hagiwara, A. Taniguchi, L. El Hafi, and T. Taniguchi, "確率論理と場所概念を結合したモデルによる場所の学習コストの削減 (Reducing the Cost of Learning Places via a Model that Integrates Probabilistic Logic and Spatial Concept)", in Proceedings of 2022 Annual Conference of the Japanese Society for Artificial Intelligence (JSAI 2022), ref. 1N5-OS-10b-04, pp. 1-4, Kyoto, Japan, Jun. 14, 2022. DOI: 10.11517/pjsai.JSAI2022.0_1N5OS10b04 [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
< Abstract >
We propose a method that integrates probabilistic logic and spatial concept to enable a robot to acquire knowledge of the relationships between objects and places in a new environment with a few learning times. By combining logical inference with prior knowledge and cross-modal inference within spatial concept, the robot can infer the place of an object even when the probability of its existence is a priori unknown. We conducted experiments in which a robot searched for objects in a simulation environment using four methods: 1) spatial concept only, 2) prior knowledge only, 3) spatial concept and prior knowledge, and 4) probabilistic logic and spatial concept (proposed). We confirmed the effectiveness of the proposed method by comparing the number of place visits it took for the robot to find all the objects. We observed that the robot could find the objects faster using the proposed method.
-
G. A. Garcia Ricardez, L. El Hafi, H. Ikeuchi, M. Yamamoto, J. Takamatsu, T. Taniguchi, and T. Ogasawara, "Team NAIST-RITS-Panasonic at the Future Convenience Store Challenge: Our Approach from 2018 to 2021", in Journal of the Society of Instrument and Control Engineers (SICE), Special Issue on WRS Future Convenience Store Challenge, vol. 61, no. 6, pp. 422-425, Jun. 10, 2022. DOI: 10.11499/sicejl.61.422 [Domestic journal article, non-peer-reviewed.]
< Abstract >
The paper describes the system development approach followed by researchers and engineers of the team NAIST-RITS-Panasonic (Japan) for their participation in the Future Convenience Store Challenge 2018, 2019 trials and 2020 (held in 2021). This international competition is about the development of robotic capabilities to execute complex tasks for retail automation. The team built four different robots with multiple end effectors, as well as different technologies for mobile manipulation, vision, and HRI. The diversity of the tasks and the competitiveness of the challenge allowed us to template our philosophy and to delineate our path to innovation.
-
L. El Hafi, G. A. Garcia Ricardez, F. von Drigalski, Y. Inoue, M. Yamamoto, and T. Yamamoto, "Software Development Environment for Collaborative Research Workflow in Robotic System Integration", in RSJ Advanced Robotics (AR), Special Issue on Software Framework for Robot System Integration, vol. 36, no. 11, pp. 533-547, Jun. 3, 2022. DOI: 10.1080/01691864.2022.2068353 [International journal article, peer-reviewed.]
< Abstract >
Today's robotics involves a large range of knowledge and skills across many disciplines. This issue has recently come to light as robotics competitions attract more talented teams to tackle unsolved problems. Although the tasks are challenging, the preparation cycles are usually short. The teams involved, ranging from academic institutions to small startups and large companies, need to develop and deploy their solutions with agility. Therefore, this paper introduces a containerized Software Development Environment (SDE) based on a collaborative workflow relying on open-source technologies for robotic system integration and deployment. The proposed SDE enables the collaborators to focus on their individual expertise and rely on automated tests and unattended simulations. The analysis of the adoption of the proposed SDE shows that several research institutions successfully deployed it in multiple international competitions with various robotic platforms.
-
T. Wakayama, E. Fujiura, M. Yamaguchi, H. Ikeuchi, M. Yamamoto, L. El Hafi, and G. A. Garcia Ricardez, "掃除ツール取り換え機能を有する多種類ゴミ廃棄可能なトイレ清掃ロボットの開発 World Robot Summit 2020 Future Convenience Store Challengeを活用した実用システム開発の試み (Development of the Restroom Cleaning Robot that Can Dispose of Various Types of Garbage with a Cleaning Tool Change Function: Attempt to Develop a Practical System utilizing World Robot Summit 2020 Future Convenience Store Challenge)", in Proceedings of 2022 JSME Conference on Robotics and Mechatronics (ROBOMECH 2022), no. 22-2, ref. 2P2-T03, pp. 1-4, Sapporo, Japan, Jun. 1, 2022. DOI: 10.1299/jsmermd.2022.2P2-T03 [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
< Abstract >
We have developed a ROS-based mobile manipulator for restroom cleaning with the capability to recognize garbage of various types (pieces of paper, cups, and liquids) and to select the most appropriate tool from three cleaning tools (suction to hold, vacuuming, and mopping) to effectively clean a restroom. Upon deployment at the Future Convenience Store Challenge of the World Robot Summit 2020, we obtained 1st Place in the Restroom Cleaning Task with an almost perfect score (96%).
-
A. Taniguchi, L. El Hafi, Y. Hagiwara, and T. Taniguchi, "記号創発ロボティクスにおける場所概念の形成と応用 (Spatial Concept Formation for Symbol Emergence in Robotics and its Application)", in ISCIE Systems, Control and Information, Special Issue on Spatial Cognition and Semantic Understanding in Mobile Robots, vol. 66, no. 4, pp. 133-138, Apr. 15, 2022. DOI: 10.11509/isciesci.66.4_133 [Domestic journal article, non-peer-reviewed.][Published in Japanese.]
-
Y. Katsumata, A. Kanechika, A. Taniguchi, L. El Hafi, Y. Hagiwara, and T. Taniguchi, "Map Completion from Partial Observation using the Global Structure of Multiple Environmental Maps", in RSJ Advanced Robotics (AR), Special Issue on Symbol Emergence in Robotics and Cognitive Systems (II), vol. 36, no. 5-6, pp. 279-290, Mar. 19, 2022. DOI: 10.1080/01691864.2022.2029762 [International journal article, peer-reviewed.]
< Abstract >
By using the spatial structure of various indoor environments as prior knowledge, a robot could construct environment maps more efficiently. Autonomous mobile robots generally apply simultaneous localization and mapping (SLAM) methods to understand the reachable area in newly visited environments. However, conventional mapping approaches are limited by only considering sensor observations and control signals to estimate the current environment map. This paper proposes a novel SLAM method, map completion network-based SLAM (MCN-SLAM), based on a probabilistic generative model incorporating deep neural networks for map completion. These map completion networks are primarily trained in the framework of generative adversarial networks (GANs) to extract the global structure of large amounts of existing map data. We show in experiments that the proposed method can estimate the environment map 1.3 times better than previous SLAM methods under partial observation.
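As a rough illustration of the map-completion idea, the following PyTorch sketch shows a small encoder-decoder generator that turns a partially observed occupancy grid into a completed one; the architecture, grid size, and omitted GAN training loop are assumptions and are not taken from the paper.
```python
# Illustrative PyTorch sketch of a map-completion generator (architecture and
# grid size are assumptions, not the paper's): a partially observed occupancy
# grid goes in, a completed occupancy-probability grid comes out.
import torch
import torch.nn as nn

class MapCompletionGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                        # 1x64x64 partial map in
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )                                                    # 1x64x64 completed map out

    def forward(self, partial_map):
        return self.decoder(self.encoder(partial_map))

partial = torch.rand(1, 1, 64, 64)                           # dummy partial observation
completed = MapCompletionGenerator()(partial)                # occupancy probabilities
print(completed.shape)                                       # torch.Size([1, 1, 64, 64])
```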
-
T. Fukumori, C. Cai, Y. Zhang, L. El Hafi, Y. Hagiwara, T. Nishiura, and T. Taniguchi, "Optical Laser Microphone for Human-Robot Interaction: Speech Recognition in Extremely Noisy Service Environments", in RSJ Advanced Robotics (AR), Special Issue on Symbol Emergence in Robotics and Cognitive Systems (II), vol. 36, no. 5-6, pp. 304-317, Mar. 19, 2022. DOI: 10.1080/01691864.2021.2023629 [International journal article, peer-reviewed.]
< Abstract >
Domestic robots are often required to understand spoken commands in noisy environments, including service appliances' operating sounds. Most conventional domestic robots use electret condenser microphones (ECMs) to record the sound. However, the ECMs are known to be sensitive to the noise in the direction of sound arrival. The laser Doppler vibrometer (LDV), which has been widely used in the research field of measurement, has the potential to work as a new speech-input device to solve this problem. The aim of this paper is to investigate the effectiveness of using the LDV as an optical laser microphone for human-robot interaction in extremely noisy service environments. Our robot irradiates an object near a speaker with a laser and measures the vibration of the object to record the sound. We conducted three experiments to assess the performance of speech recognition using the optical laser microphone in various settings and showed stable performance in extremely noisy conditions compared with a conventional ECM.
-
J. Wang, L. El Hafi*, A. Taniguchi, Y. Hagiwara, and T. Taniguchi, "Extending HoloGAN by Embedding Image Content into Latent Vectors for Novel View Synthesis", in Proceedings of 2022 IEEE/SICE International Symposium on System Integration (SII 2022), pp. 383-389, Narvik, Norway (Virtual), Jan. 9, 2022. DOI: 10.1109/SII52469.2022.9708823 [International conference article, peer-reviewed.][*Corresponding author.]
< Abstract >
This study aims to further develop the task of novel view synthesis by generative adversarial networks (GAN). The goal of novel view synthesis is to, given one or more input images, synthesize images of the same target content but from different viewpoints. Previous research showed that the unsupervised learning model HoloGAN achieved high performance in generating images from different viewpoints. However, HoloGAN is less capable of specifying the target content to generate and is difficult to train due to high data requirements. Therefore, this study proposes two approaches to improve the current limitations of HoloGAN and make it suitable for the task of novel view synthesis. The first approach reuses the encoder network of HoloGAN to get the corresponding latent vectors of the image contents to specify the target content of the generated images. The second approach introduces an auto-encoder architecture to HoloGAN so that more viewpoints can be generated correctly. The experimental results indicate that the first approach is efficient in specifying a target content. Meanwhile, the second approach helps HoloGAN to learn a richer range of viewpoints but is not compatible with the first approach. The combination of these two approaches and their application to service robotics are discussed in conclusion.
-
A. S. Rathore, L. El Hafi*, G. A. Garcia Ricardez, and T. Taniguchi, "Human Action Categorization System using Body Pose Estimation for Multimodal Observations from Single Camera", in Proceedings of 2022 IEEE/SICE International Symposium on System Integration (SII 2022), pp. 914-920, Narvik, Norway (Virtual), Jan. 9, 2022. DOI: 10.1109/SII52469.2022.9708816 [International conference article, peer-reviewed.][*Corresponding author.]
< Abstract >
We propose a system using a multimodal probabilistic approach to solve the human action recognition challenge. This is achieved by extracting the human pose from an ongoing activity from a single camera. This pose is used to capture additional body information using generalized features such as location, time, distances, and angles. A probabilistic model, multimodal latent Dirichlet allocation (MLDA), which uses this multimodal information, is then used to recognize actions through topic modeling. We also investigate the influence of each modality and their combinations to recognize human actions from multimodal observations. The experiments show that the proposed generalized features captured significant information that enabled the classification of various daily activities without requiring prior labeled data.
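The following sketch illustrates the topic-modeling step in a simplified, unimodal form, with scikit-learn's plain LDA standing in for MLDA (an assumption, not the paper's implementation): discretized pose-derived features become "words", frame windows become "documents", and the inferred topics play the role of action categories. Bin counts and data are synthetic.
```python
# Simplified, unimodal stand-in for MLDA (assumption: scikit-learn's plain LDA):
# discretized pose features are "words", frame windows are "documents", and the
# inferred topics play the role of action categories. Counts are synthetic.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
n_windows, n_symbols = 200, 30                 # pooled location/time/distance/angle bins
counts = rng.poisson(1.0, size=(n_windows, n_symbols))   # bag-of-words per window

lda = LatentDirichletAllocation(n_components=5, random_state=0)  # 5 candidate "actions"
theta = lda.fit_transform(counts)              # per-window topic (action) proportions
print(int(theta[0].argmax()))                  # most probable action category of window 0
```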
-
P. M. Uriguen Eljuri, L. El Hafi, G. A. Garcia Ricardez, A. Taniguchi, and T. Taniguchi, "Neural Network-based Motion Feasibility Checker to Validate Instructions in Rearrangement Tasks before Execution by Robots", in Proceedings of 2022 IEEE/SICE International Symposium on System Integration (SII 2022), pp. 1058-1063, Narvik, Norway (Virtual), Jan. 9, 2022. DOI: 10.1109/SII52469.2022.9708602 [International conference article, peer-reviewed.]
< Abstract >
In this paper, we address the task of rearranging items with a robot. A rearrangement task is challenging because it requires solving the following issues: determining how to pick the items and planning how and where to place them. In our previous work, we proposed to solve a rearrangement task by combining symbolic and motion planners using a Motion Feasibility Checker (MFC) and a Monte Carlo Tree Search (MCTS). The MCTS searches for the goal while collaborating with the MFC to accept or reject instructions. We could solve the rearrangement task, but one drawback is the time it takes to find a solution. In this study, we focus on quickly accepting or rejecting tentative instructions obtained from an MCTS. We propose a Neural Network-based Motion Feasibility Checker (NN-MFC), a fully connected neural network trained with data obtained from the MFC. The NN-MFC quickly decides whether an instruction is valid, reducing the time the MCTS needs to find a solution to the task. The NN-MFC determines the validity of an instruction based on the initial and target poses of the item. Before the final execution of the instructions, we re-validate them with the MFC as a confirmation. We tested the proposed method in a simulation environment by performing an item rearrangement task in a convenience store setup.
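A minimal sketch of an NN-MFC-style checker follows, assuming PyTorch and a 7-D position-plus-quaternion pose encoding (both assumptions, not taken from the paper): a fully connected network maps the initial and target item poses to a probability that the motion is feasible, which an MCTS could use to quickly accept or reject a tentative instruction.
```python
# Sketch of an NN-MFC-style feasibility classifier (assumed 7-D pose encoding
# and layer sizes): initial and target item poses in, P(motion is feasible) out.
import torch
import torch.nn as nn

class FeasibilityChecker(nn.Module):
    def __init__(self, pose_dim=7):               # position (3) + quaternion (4)
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * pose_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),        # probability of feasibility
        )

    def forward(self, initial_pose, target_pose):
        return self.net(torch.cat([initial_pose, target_pose], dim=-1))

checker = FeasibilityChecker()                     # would be trained on MFC-labeled data
init_pose, target_pose = torch.rand(1, 7), torch.rand(1, 7)   # dummy poses
accepted = checker(init_pose, target_pose).item() > 0.5       # quick accept/reject for MCTS
print("instruction tentatively accepted" if accepted else "instruction rejected")
```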
-
T. Wakayama, G. A. Garcia Ricardez, L. El Hafi, and J. Takamatsu, "6D-Pose Estimation for Manipulation in Retail Robotics using the Inference-embedded OAK-D Camera", in Proceedings of 2022 IEEE/SICE International Symposium on System Integration (SII 2022), pp. 1046-1051, Narvik, Norway (Virtual), Jan. 9, 2022. DOI: 10.1109/SII52469.2022.9708910 [International conference article, peer-reviewed.]
< Abstract >
The socio-economic need for service robots has become more evident during the ongoing pandemic. To boost their deployment, robots need to improve their manipulation capabilities, which includes solving one of the biggest challenges: determining the position and orientation of the target objects. While conventional approaches use markers, which require constant maintenance, deep-learning-based approaches require a host computer with high specifications. In this paper, we propose a 6D-pose estimation system whose segmentation algorithm is embedded into OAK-D, a camera capable of running neural networks on-board, which reduces the host requirements. Furthermore, we propose a point cloud selection method to increase the accuracy of the 6D-pose estimation. We test our solution in a convenience store setup where we mount the OAK-D camera on a mobile robot developed for straightening and disposing of items, and whose manipulation success depends on 6D-pose estimation. We evaluate the accuracy of our solution by comparing the estimated 6D-pose of eight items to the ground truth. Finally, we discuss technical challenges faced during the integration of the proposed solution into a fully autonomous robot.
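For illustration only, the snippet below shows a crude way to obtain a 6D pose from an already-segmented object point cloud (centroid for position, PCA axes for orientation); the actual OAK-D segmentation, point cloud selection, and pose refinement described in the paper are not reproduced here.
```python
# Crude 6D-pose estimate from an already-segmented object point cloud: centroid
# gives the position, PCA axes give the orientation (illustrative sketch only).
import numpy as np

cloud = np.random.rand(500, 3) * [0.05, 0.10, 0.20]   # dummy segmented points [m]
position = cloud.mean(axis=0)                          # translation part of the pose
_, _, Vt = np.linalg.svd(cloud - position)             # principal axes of the object
rotation = Vt.T                                        # columns = estimated object axes
if np.linalg.det(rotation) < 0:                        # keep a right-handed frame
    rotation[:, -1] *= -1
print(position, rotation)
```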
-
P. M. Uriguen Eljuri, Y. Toramatsu, L. El Hafi, G. A. Garcia Ricardez, A. Taniguchi, and T. Taniguchi, "物体の再配置タスクにおける動作実行可能性判定器のニューラルネットワークを用いた高速化 (Neural Network Acceleration of Motion Feasibility for Object Arrangement Task)", in Proceedings of 2021 SICE System Integration Division Annual Conference (SI 2021), pp. 3422-3425, (Virtual), Dec. 15, 2021. [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
< Abstract >
We focus on the task of arranging objects using a robot. In our previous work, we proposed to use a Monte Carlo Tree Search (MCTS) and a Motion Feasibility Checker (MFC) to solve the task. However, the problem with the existing method is that it is time-consuming. In this paper, we propose to use a Neural Network-based MFC (NN-MFC). This NN-MFC can quickly determine the motion feasibility of the robot and reduce the time used by the MCTS to find a solution. We tested the proposed method in a simulation environment by doing an item rearrangement task in a convenience store setup.
-
S. Hasegawa, A. Taniguchi, Y. Hagiwara, L. El Hafi, T. Nakashima, and T. Taniguchi, "確率論理と場所概念モデルの結合による確率的プランニング (Probabilistic Planning by Integrating Probabilistic Logic and a Spatial Concept Model)", in Proceedings of 2021 Annual Conference of the Robotics Society of Japan (RSJ 2021), ref. RSJ2021AC2H2-01, pp. 1-4, (Virtual), Sep. 8, 2021. [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
-
T. Nakashima, A. Taniguchi, L. El Hafi, Y. Hagiwara, and T. Taniguchi, "場所概念獲得がLoop Closure性能に及ぼす影響評価 (Evaluation of the Effect of Spatial Concept Acquisition on Loop Closure Performance)", in Proceedings of 2021 Annual Conference of the Robotics Society of Japan (RSJ 2021), ref. RSJ2021AC1I4-06, pp. 1-3, (Virtual), Sep. 8, 2021. [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
-
A. Kanechika, L. El Hafi, A. Taniguchi, Y. Hagiwara, and T. Taniguchi, "自然な発話文教示に基づく弱教師あり物体領域分割の検証 (Verification of Weakly Supervised Object Region Segmentation based on Teaching with Natural Spoken Sentences)", in Proceedings of 2021 Annual Conference of the Robotics Society of Japan (RSJ 2021), ref. RSJ2021AC1I2-02, pp. 1-4, (Virtual), Sep. 8, 2021. [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
-
T. Taniguchi, L. El Hafi, Y. Hagiwara, A. Taniguchi, N. Shimada, and T. Nishiura, "The Necessity of Semiotically Adaptive Cognition for Realizing Remotely-Operated Service Robots in the New Normal Society", in Proceedings of 2021 IEEE International Conference on Advanced Robotics and its Social Impacts (ARSO 2021), pp. 266-267, Tokoname, Japan (Virtual), Jul. 8, 2021. [International conference article, peer-reviewed.]
< Abstract >
In this study, we argue that the development of semiotically adaptive cognition is indispensable for realizing remotely-operated service robots to enhance the quality of the new normal society. To enable a wide range of people to work from home in a pandemic like the current COVID-19 situation, the installation of remotely-operated service robots into the work environment is crucial. However, it is evident that remotely-operated robots must have partial autonomy. The capability of learning local semiotic knowledge for improving autonomous decision making and language understanding is crucial to reduce the workload of people working from home. To achieve this goal, we identify three challenges: the learning of local semiotic knowledge from daily human activities, the acceleration of local knowledge learning with transfer learning and active exploration, and the augmentation of human-robot interactions.
-
T. Taniguchi, L. El Hafi, Y. Hagiwara, A. Taniguchi, N. Shimada, and T. Nishiura, "Semiotically Adaptive Cognition: Toward the Realization of Remotely-Operated Service Robots for the New Normal Symbiotic Society", in RSJ Advanced Robotics (AR), Extra Special Issue on Soft/Social/Systemic (3S) Robot Technologies for Enhancing Quality of New Normal (QoNN), vol. 35, no. 11, pp. 664-674, Jun. 3, 2021. DOI: 10.1080/01691864.2021.1928552 [International journal article, peer-reviewed.]
< Abstract >
The installation of remotely-operated service robots in the environments of our daily life (including offices, homes, and hospitals) can improve work-from-home policies and enhance the quality of the so-called new normal. However, it is evident that remotely-operated robots must have partial autonomy and the capability to learn and use local semiotic knowledge. In this paper, we argue that the development of semiotically adaptive cognitive systems is key to the installation of service robotics technologies in our service environments. To achieve this goal, we describe three challenges: the learning of local knowledge, the acceleration of onsite and online learning, and the augmentation of human-robot interactions.
-
A. Taniguchi, S. Isobe, L. El Hafi, Y. Hagiwara, and T. Taniguchi, "Autonomous Planning based on Spatial Concepts to Tidy Up Home Environments with Service Robots", in RSJ Advanced Robotics (AR), vol. 35, no. 8, pp. 471-489, Apr. 18, 2021. DOI: 10.1080/01691864.2021.1890212 [International journal article, peer-reviewed.]
< Abstract >
Tidy-up tasks by service robots in home environments are challenging in robotics applications because they involve various interactions with the environment. In particular, robots are required not only to grasp, move, and release various home objects but also to plan the order and positions for placing the objects. In this paper, we propose a novel planning method that can efficiently estimate the order and positions of the objects to be tidied up by learning the parameters of a probabilistic generative model. The model allows a robot to learn the distributions of the co-occurrence probability of the objects and places to tidy up using the multimodal sensor information collected in a tidied environment. Additionally, we develop an autonomous robotic system to perform the tidy-up operation. We evaluate the effectiveness of the proposed method by an experimental simulation that reproduces the conditions of the Tidy Up Here task of the World Robot Summit 2018 international robotics competition. The simulation results show that the proposed method enables the robot to successively tidy up several objects and achieves the best task score among the considered baseline tidy-up methods.
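The toy sketch below illustrates, with invented numbers, how learned object-place co-occurrence probabilities can drive a greedy tidy-up plan (which object first, and where to put it); it is a stand-in for the paper's probabilistic generative model, not the model itself.
```python
# Toy illustration (invented numbers, not the paper's generative model): learned
# co-occurrence probabilities P(place | object) drive a greedy tidy-up plan.
cooccurrence = {
    "cup":    {"shelf": 0.8, "table": 0.2},
    "toy":    {"toy_box": 0.9, "shelf": 0.1},
    "remote": {"table": 0.6, "shelf": 0.4},
}

def plan_tidy_up(objects):
    """Tidy the most confidently placeable objects first, each to its most probable place."""
    ordered = sorted(objects, key=lambda o: -max(cooccurrence[o].values()))
    return [(o, max(cooccurrence[o], key=cooccurrence[o].get)) for o in ordered]

print(plan_tidy_up(["remote", "cup", "toy"]))
# [('toy', 'toy_box'), ('cup', 'shelf'), ('remote', 'table')]
```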
-
L. El Hafi, H. Nakamura, A. Taniguchi, Y. Hagiwara, and T. Taniguchi, "Teaching System for Multimodal Object Categorization by Human-Robot Interaction in Mixed Reality", in Proceedings of 2021 IEEE/SICE International Symposium on System Integration (SII 2021), pp. 320-324, Iwaki, Japan (Virtual), Jan. 11, 2021. DOI: 10.1109/IEEECONF49454.2021.9382607 [International conference article, peer-reviewed.]
< Abstract >
As service robots are becoming essential to support aging societies, teaching them how to perform general service tasks remains a major challenge preventing their deployment in daily-life environments. In addition, developing an artificial intelligence for general service tasks requires bottom-up, unsupervised approaches that let the robots learn from their own observations and interactions with the users. However, unlike top-down, supervised approaches such as deep learning, where the extent of the learning is directly related to the amount and variety of the pre-existing data provided to the robots and is thus relatively easy to understand from a human perspective, the learning status in bottom-up approaches is by nature much harder to appreciate and visualize. To address these issues, we propose a teaching system for multimodal object categorization by human-robot interaction through Mixed Reality (MR) visualization. In particular, our proposed system enables a user to monitor and intervene in the robot's object categorization process based on Multimodal Latent Dirichlet Allocation (MLDA) to solve unexpected results and accelerate the learning. Our contribution is twofold: 1) describing the integration of a service robot, MR interactions, and MLDA object categorization in a unified system, and 2) proposing an MR user interface to teach robots through intuitive visualization and interactions.
-
K. Hayashi, W. Zheng, L. El Hafi, Y. Hagiwara, and T. Taniguchi, "Bidirectional Generation of Object Images and Positions using Deep Generative Models for Service Robotics Applications", in Proceedings of 2021 IEEE/SICE International Symposium on System Integration (SII 2021), pp. 325-329, Iwaki, Japan (Virtual), Jan. 11, 2021. DOI: 10.1109/IEEECONF49454.2021.9382768 [International conference article, peer-reviewed.]
< Abstract >
The introduction of systems and robots for automated services is important for reducing running costs and improving operational efficiency in the retail industry. To this end, we develop a system that enables robot agents to display products in stores. The main problem in automating product display using common supervised methods with robot agents is the huge amount of data required to recognize product categories and arrangements in a variety of different store layouts. To solve this problem, we propose a crossmodal inference system based on a joint multimodal variational autoencoder (JMVAE) that learns the relationship between object image information and location information observed on site by robot agents. In our experiments, we created a simulation environment replicating a convenience store that allows a robot agent to observe an object image and its 3D coordinate information, and confirmed whether JMVAE can learn and generate a shared representation of an object image and 3D coordinates in a bidirectional manner.
-
Y. Katsumata, A. Taniguchi, L. El Hafi, Y. Hagiwara, and T. Taniguchi, "SpCoMapGAN: Spatial Concept Formation-based Semantic Mapping with Generative Adversarial Networks", in Proceedings of 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2020), pp. 7927-7934, Las Vegas, United States (Virtual), Oct. 24, 2020. DOI: 10.1109/IROS45743.2020.9341456 [International conference article, peer-reviewed.]
< Abstract >
In semantic mapping, which connects semantic information to an environment map, it is a challenging task for robots to deal with both local and global information of environments. In addition, it is important to estimate semantic information of unobserved areas from already acquired partial observations in a newly visited environment. On the other hand, previous studies on spatial concept formation enabled a robot to relate multiple words to places from bottom-up observations even when the vocabulary was not provided beforehand. However, the robot could not transfer global information related to the room arrangement between semantic maps from other environments. In this paper, we propose SpCoMapGAN, which generates the semantic map in a newly visited environment by training an inference model using previously estimated semantic maps. SpCoMapGAN uses generative adversarial networks (GANs) to transfer semantic information based on room arrangements to a newly visited environment. Our proposed method assigns semantics to the map of an unknown environment using the prior distribution of the map trained in known environments and the multimodal observations made in the unknown environment. We experimentally show in simulation that SpCoMapGAN can use global information for estimating the semantic map and is superior to previous methods. Finally, we also demonstrate in a real environment that SpCoMapGAN can accurately 1) deal with local information, and 2) acquire the semantic information of real places.
-
L. El Hafi and T. Yamamoto, "Toward the Public Release of a Software Development Environment for Human Support Robots", in Proceedings of 2020 Annual Conference of the Robotics Society of Japan (RSJ 2020), ref. RSJ2020AC3E1-01, pp. 1-2, (Virtual), Oct. 9, 2020. [Domestic conference article, non-peer-reviewed.]
< Abstract >
This paper describes the latest developments of the ongoing effort to bring a shared Software Development Environment (SDE) to the Toyota Human Support Robot (HSR) Community to collaborate on large robotics projects. The SDE described in this paper is developed and maintained by the HSR Software Development Environment Working Group (SDE-WG) officially endorsed by Toyota Motor Corporation (TMC). The source code and documentation for deployment are available to all HSR Community members upon request at: https://gitlab.com/hsr-sde-wg/HSR.
-
Y. Katsumata, A. Kanechika, A. Taniguchi, L. El Hafi, Y. Hagiwara, and T. Taniguchi, "深層生成モデルを用いた地図補完とSLAMの統合 (Integration of Map Completion using Deep Generative Models and SLAM)", in Proceedings of 2020 Annual Conference of the Robotics Society of Japan (RSJ 2020), ref. RSJ2020AC2C1-01, pp. 1-4, (Virtual), Oct. 9, 2020. [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
-
H. Nakamura, L. El Hafi, A. Taniguchi, Y. Hagiwara, and T. Taniguchi, "拡張現実を用いたロボットの物体カテゴリ分類教示システムの提案 (Proposal of a Teaching System for Robot Object Categorization using Augmented Reality)", in Proceedings of 2020 Annual Conference of the Robotics Society of Japan (RSJ 2020), ref. RSJ2020AC2E3-02, pp. 1-4, (Virtual), Oct. 9, 2020. [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
-
A. Taniguchi, Y. Tabuchi, L. El Hafi, Y. Hagiwara, and T. Taniguchi, "環境の能動的な探索による効率的な場所概念の形成 (Efficient Spatial Concept Formation by Active Exploration of the Environment)", in Proceedings of 2020 Annual Conference of the Japanese Society for Artificial Intelligence (JSAI 2020), ref. 2M4-OS-3a-05, pp. 1-4, Kumamoto, Japan (Virtual), Jun. 9, 2020. DOI: 10.11517/pjsai.JSAI2020.0_2M4OS3a05 [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
< Abstract >
Autonomous service robots are required to adaptively learn the categories and names of various places through the exploration of the surrounding environment and interactions with users. In this study, we aim to realize the efficient learning of spatial concepts by autonomous active exploration with a mobile robot. Therefore, we propose an active learning algorithm that combines sequential Bayesian inference by a particle filter and position determination based on information gain in probabilistic generative models. Our experiment shows that the proposed method can efficiently determine the positions from which to form spatial concepts in simulated home environments.
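The following minimal sketch illustrates the information-gain criterion in isolation: among candidate observation positions, pick the one that most reduces the entropy of the belief over place categories. The candidate beliefs below are invented; the paper's method obtains them through particle-filter-based Bayesian inference.
```python
# Minimal illustration of the information-gain criterion: pick the candidate
# observation position that most reduces the entropy of the belief over place
# categories. Beliefs are invented, not inferred as in the paper.
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

prior_belief = [0.25, 0.25, 0.25, 0.25]            # current belief over 4 categories
expected_posteriors = {                             # expected belief after observing at x
    "x1": [0.70, 0.10, 0.10, 0.10],
    "x2": [0.40, 0.30, 0.20, 0.10],
}
gains = {x: entropy(prior_belief) - entropy(q) for x, q in expected_posteriors.items()}
print(max(gains, key=gains.get))                    # position with the largest gain: x1
```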
-
Y. Katsumata, A. Taniguchi, L. El Hafi, Y. Hagiwara, and T. Taniguchi, "Generative Adversarial Networksと場所概念形成の確率モデルの融合に基づくSemantic Mapping (Probabilistic Model of Spatial Concepts Integrating Generative Adversarial Networks for Semantic Mapping)", in Proceedings of 2020 Annual Conference of the Japanese Society for Artificial Intelligence (JSAI 2020), ref. 2M6-GS-13-01, pp. 1-4, Kumamoto, Japan (Virtual), Jun. 9, 2020. DOI: 10.11517/pjsai.JSAI2020.0_2M6GS1301 [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
< Abstract >
This paper proposes SpCoMapGAN, a method to generate the semantic map in a newly visited environment by training an inference model using previously estimated semantic maps. SpCoMapGAN uses generative adversarial networks (GANs) to transfer semantic information based on room arrangements to the newly visited environment. We experimentally show in simulation that SpCoMapGAN can use global information for estimating the semantic map and is superior to previous related methods.
-
G. A. Garcia Ricardez*, L. El Hafi*, and F. von Drigalski*, "Standing on Giant's Shoulders: Newcomer's Experience from the Amazon Robotics Challenge 2017", in Book chapter, Advances on Robotic Item Picking: Applications in Warehousing & E-Commerce Fulfillment, pp. 87-100, May 9, 2020. DOI: 10.1007/978-3-030-35679-8_8 [Book chapter.][*Authors contributed equally.]
< Abstract >
International competitions have fostered innovation in fields such as artificial intelligence, robotic manipulation, and computer vision, and incited teams to push the state of the art. In this chapter, we present the approach, design philosophy and development strategy that we followed during our participation in the Amazon Robotics Challenge 2017, a competition focused on warehouse automation. After introducing our solution, we detail the development of two of its key features: the suction tool and storage system. A systematic analysis of the suction force and details of the end effector features, such as suction force control, grasping, and collision detection, are also presented. Finally, this chapter reflects on the lessons we learned from our participation in the competition, which we believe are valuable to future robot challenge participants, as well as warehouse automation system designers.
-
G. A. Garcia Ricardez, S. Okada, N. Koganti, A. Yasuda, P. M. Uriguen Eljuri, T. Sano, P.-C. Yang, L. El Hafi, M. Yamamoto, J. Takamatsu, and T. Ogasawara, "Restock and Straightening System for Retail Automation using Compliant and Mobile Manipulation", in RSJ Advanced Robotics (AR), Special Issue on Service Robot Technology: Selected Papers from WRS 2018, vol. 34, no. 3-4, pp. 235-249, Feb. 16, 2020. DOI: 10.1080/01691864.2019.1698460 [International journal article, peer-reviewed.]
< Abstract >
As the retail industry keeps expanding and the shortage of workers keeps increasing, there is a need for the autonomous manipulation of products to support retail operations. The increasing number of products and customers in establishments such as convenience stores requires the automation of restocking, disposing, and straightening of products. The manipulation of products needs to be time-efficient, avoid damaging products, and beautify the display of products. In this paper, we propose a robotic system to restock shelves, dispose of expired products, and straighten products in retail environments. The proposed mobile manipulator features a custom-made end effector with a compact and compliant design to safely and effectively manipulate products in retail stores. Through experiments in a convenience store scenario, we verify the effectiveness of our system to restock, dispose of, and rearrange items.
-
G. A. Garcia Ricardez, N. Koganti, P.-C. Yang, S. Okada, P. M. Uriguen Eljuri, A. Yasuda, L. El Hafi, M. Yamamoto, J. Takamatsu, and T. Ogasawara, "Adaptive Motion Generation using Imitation Learning and Highly-Compliant End Effector for Autonomous Cleaning", in RSJ Advanced Robotics (AR), Special Issue on Service Robot Technology: Selected Papers from WRS 2018, vol. 34, no. 3-4, pp. 189-201, Feb. 16, 2020. DOI: 10.1080/01691864.2019.1698461 [International journal article, peer-reviewed.]
< Abstract >
Recent demographic trends in super-aging societies, such as Japan, are leading to severe worker shortages. Service robots can play a promising role in augmenting human workers for performing various household and assistive tasks. Toilet cleanup is one such challenging task that involves performing compliant motion planning in a constrained toilet setting. In this study, we propose an end-to-end robotic framework to perform various tasks related to toilet cleanup. Our key contributions include the design of a compliant and multipurpose end effector, an adaptive motion generation algorithm, and an autonomous mobile manipulator capable of garbage detection, garbage disposal, and liquid removal. We evaluate the performance of our framework with the competition setting used for toilet cleanup in the Future Convenience Store Challenge at the World Robot Summit 2018. We demonstrate that our proposed framework is capable of successfully completing all the tasks of the competition within the time limit.
-
L. El Hafi, S. Isobe, Y. Tabuchi, Y. Katsumata, H. Nakamura, T. Fukui, T. Matsuo, G. A. Garcia Ricardez, M. Yamamoto, A. Taniguchi, Y. Hagiwara, and T. Taniguchi, "System for Augmented Human-Robot Interaction through Mixed Reality and Robot Training by Non-Experts in Customer Service Environments", in RSJ Advanced Robotics (AR), Special Issue on Service Robot Technology: Selected Papers from WRS 2018, vol. 34, no. 3-4, pp. 157-172, Feb. 16, 2020. DOI: 10.1080/01691864.2019.1694068 [International journal article, peer-reviewed.]
< Abstract >
Human-robot interaction during general service tasks in home or retail environments has proven challenging, partly because (1) robots lack high-level context-based cognition and (2) humans cannot intuit the perception state of robots as they can for other humans. To solve these two problems, we present a complete robot system that received the highest evaluation score at the Customer Interaction Task of the Future Convenience Store Challenge at the World Robot Summit 2018 and that implements several key technologies: (1) hierarchical spatial concept formation for general robot task planning and (2) a mixed reality interface that enables users to intuitively visualize the current state of the robot's perception and naturally interact with it. The results obtained during the competition indicate that the proposed system allows both non-expert operators and end users to achieve human-robot interactions in customer service environments. Furthermore, we describe a detailed scenario, including employee operation and customer interaction, which serves as a set of requirements for service robots and a road map for development. The system integration and task scenario described in this paper should be helpful for groups facing customer interaction challenges and looking for a successfully deployed base to build on.
-
Y. Katsumata, L. El Hafi, A. Taniguchi, Y. Hagiwara, and T. Taniguchi, "Integrating Simultaneous Localization and Mapping with Map Completion using Generative Adversarial Networks", in Proceedings of 2019 IEEE/RSJ IROS Workshop on Deep Probabilistic Generative Models for Cognitive Architecture in Robotics (DPGM-CAR 2019), Macau, China, Nov. 8, 2019. [International conference article, peer-reviewed.]
< Abstract >
When autonomous robots perform tasks that involve moving in daily human environments, they need to generate environment maps. In this research, we propose a simultaneous localization and mapping method that integrates a prior probability distribution for map completion trained with a generative model architecture. The contribution of this research is that the method can estimate the environment map efficiently thanks to pre-training in other environments. We show in a simulator experiment that the proposed method estimates environment maps from observations, without moving, better than other classic methods.
-
L. El Hafi, S. Matsuzaki, S. Itadera, and T. Yamamoto, "Deployment of a Containerized Software Development Environment for Human Support Robots", in Proceedings of 2019 Annual Conference of the Robotics Society of Japan (RSJ 2019), ref. RSJ2019AC3K1-03, pp. 1-2, Tokyo, Japan, Sep. 3, 2019. [Domestic conference article, non-peer-reviewed.]
< Abstract >
This paper introduces a containerized Software Development Environment (SDE) for the Toyota Human Support Robot (HSR) to collaborate on large robotics projects. The objective is twofold: 1) enable interdisciplinary teams to quickly start research and development with the HSR by sharing a containerized SDE, and 2) accelerate research implementation and integration within the Toyota HSR Community by deploying a common SDE across its members. The SDE described in this paper is developed and maintained by the HSR Software Development Environment Working Group (SDE-WG) following a solution originally proposed by Ritsumeikan University and endorsed by Toyota Motor Corporation (TMC). The source code and documentation required to deploy the SDE are available to all HSR Community members upon request at: https://gitlab.com/hsr-sde-wg/HSR.
-
H. Nakamura, L. El Hafi, Y. Hagiwara, and T. Taniguchi, "複合現実によるロボットの空間認識可視化のためのSemantic-ICPを用いたキャリブレーション (Calibration System using Semantic-ICP for Visualization of Robot Spatial Perception through Mixed Reality)", in Proceedings of 2019 Annual Conference of the Japanese Society for Artificial Intelligence (JSAI 2019), ref. 1L3-J-11-02, pp. 1-4, Niigata, Japan, Jun. 4, 2019. DOI: 10.11517/pjsai.JSAI2019.0_1L3J1102 [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
< Abstract >
To achieve symbiosis between humans and robots, it is important to know what the robots recognize in their environment. Such information can be displayed using a Mixed Reality (MR) head-mounted device to provide an intuitive understanding of a robot perception. However, a robust calibration system is required because the robot and head-mounted MR device have different coordinate systems. In this paper, we develop a semantic-based calibration system for human-robot interactions in MR using Semantic-ICP. We show that the calibration system using Semantic-ICP is better than using GICP SE(3) when the accuracy of the semantic labels is high.
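As a simplified illustration of the label-aware registration idea, the sketch below performs one alignment iteration between two labeled point clouds using hard label gating and a Kabsch solve (an assumption for brevity; the actual Semantic-ICP formulation weights correspondences probabilistically), with numpy and scipy only.
```python
# One alignment iteration with hard label gating and a Kabsch solve (a simplified
# stand-in for Semantic-ICP's probabilistic weighting), using numpy and scipy only.
import numpy as np
from scipy.spatial import cKDTree

def semantic_icp_step(src_pts, src_labels, dst_pts, dst_labels):
    _, idx = cKDTree(dst_pts).query(src_pts)        # nearest-neighbor matches
    keep = src_labels == dst_labels[idx]            # keep label-consistent pairs only
    p, q = src_pts[keep], dst_pts[idx[keep]]
    H = (p - p.mean(0)).T @ (q - q.mean(0))         # cross-covariance of the pairs
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T         # rotation (Kabsch/Umeyama)
    t = q.mean(0) - R @ p.mean(0)                   # translation
    return R, t

src = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]])
labels = np.array([0, 1, 1, 2, 2])
R, t = semantic_icp_step(src, labels, src + [0.05, 0.0, 0.0], labels)
print(np.round(R, 2), np.round(t, 2))               # ~identity rotation and the 0.05 shift
```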
-
L. El Hafi, Y. Hagiwara, and T. Taniguchi, "Abstraction-Rich Workflow for Agile Collaborative Development and Deployment of Robotic Solutions", in Proceedings of 2018 Annual Conference of the Robotics Society of Japan (RSJ 2018), ref. RSJ2018AC3D3-02, pp. 1-3, Kasugai, Japan, Sep. 5, 2018. [Domestic conference article, non-peer-reviewed.]
< Abstract >
This paper introduces a collaborative workflow for development and deployment of robotic solutions. The main contribution lies in the introduction of multiple layers of abstraction between the different components and processes. These layers enable the collaborators to focus on their individual expertise and rely on automated tests and simulations from the system. The ultimate goal is to help interdisciplinary teams to work together efficiently on robotics projects.
-
J. Takamatsu, L. El Hafi, K. Takemura, and T. Ogasawara, "角膜反射画像を用いた視線追跡と物体認識 (Gaze Estimation and Object Recognition using Corneal Images)", in Proceedings of 149th MOC/JSAP Microoptics Meeting on Recognition and Authentication, vol. 36, no. 3, pp. 13-18, Tokyo, Japan, Sep. 5, 2018. [Domestic workshop article, non-peer-reviewed.][Published in Japanese.]
< Abstract >
We introduce a method for simultaneously estimating gaze directions and types of the gazed objects from corneal images captured by an eye camera embedded in an eye tracker. The proposed method is useful for simplifying the inherent mechanisms of eye trackers. Since the target objects are distorted on the corneal images, we use two approaches: one is to undistort the images and then apply conventional object detection, and the other is to apply deep learning-based object detection directly to the distorted images. In the latter approach, we describe a method to collect a large amount of data to train the detection with little effort.
-
G. A. Garcia Ricardez, F. von Drigalski, L. El Hafi, S. Okada, P.-C. Yang, W. Yamazaki, V. G. Hoerig, A. Delmotte, A. Yuguchi, M. Gall, C. Shiogama, K. Toyoshima, P. M. Uriguen Eljuri, R. Elizalde Zapata, M. Ding, J. Takamatsu, and T. Ogasawara, "Warehouse Picking Automation System with Learning- and Feature-based Object Recognition and Grasping Point Estimation", in Proceedings of 2017 SICE System Integration Division Annual Conference (SI 2017), pp. 2249-2253, Sendai, Japan, Dec. 20, 2017. [Domestic conference article, non-peer-reviewed.]
< Abstract >
The Amazon Robotics Challenge (ARC) has become one of the biggest robotic competitions in the field of warehouse automation and manipulation. In this paper, we present our solution to the ARC 2017, which uses both learning-based and feature-based techniques for object recognition and grasp point estimation in unstructured collections of objects and a partially controlled space. Our solution proved effective both for previously unknown items, even with little data acquisition, and for items from the training set, obtaining the 6th place out of 16 contestants.
-
G. A. Garcia Ricardez, F. von Drigalski, L. El Hafi, M. Ding, J. Takamatsu, and T. Ogasawara, "Lessons from the Airbus Shopfloor Challenge 2016 and the Amazon Robotics Challenge 2017", in Proceedings of 2017 SICE System Integration Division Annual Conference (SI 2017), pp. 572-575, Sendai, Japan, Dec. 20, 2017. [Domestic conference article, non-peer-reviewed.]
< Abstract >
International robotics competitions bring together the research community to solve real-world, current problems such as drilling in aircraft manufacturing (Airbus Shopfloor Challenge) and warehouse automation (Amazon Robotics Challenge). In this paper, we discuss our approaches to these competitions and describe the technical difficulties, design philosophy, development, lessons learned and remaining challenges.
-
F. von Drigalski*, L. El Hafi*, P. M. Uriguen Eljuri*, G. A. Garcia Ricardez*, J. Takamatsu, and T. Ogasawara, "Vibration-Reducing End Effector for Automation of Drilling Tasks in Aircraft Manufacturing", in IEEE Robotics and Automation Letters (RA-L), vol. 2, no. 4, pp. 2316-2321, Oct. 2017. DOI: 10.1109/LRA.2017.2715398 [International journal article, peer-reviewed.][Presented at 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017), Vancouver, Canada, Sep. 2017.][*Authors contributed equally.]
< Abstract >
In this letter, we present an end effector that can drill holes compliant to aeronautic standards while mounted on a lightweight robot arm. There is an unmet demand for a robotic solution capable of drilling inside an aircraft fuselage, as size, weight, and space constraints disqualify current commercial solutions for this task. Our main contribution is the mechanical design of the end effector with high-friction, vibration-reducing feet that are pressed against the workpiece during the drilling process to increase stability, and a separate linear actuator to advance the drill. This relieves the robot arm of the task of advancing and stabilizing the drill, and leaves it with the task of positioning and holding the end effector. The stabilizing properties of the end effector are confirmed experimentally. The solution took first place at the Airbus Shopfloor Challenge, an international robotics competition held at ICRA 2016 that modeled the in-fuselage drilling task.
-
L. El Hafi, "STARE: Real-Time, Wearable, Simultaneous Gaze Tracking and Object Recognition from Eye Images (STARE: 眼球画像を用いた実時間処理可能な装着型デバイスによる視線追跡と物体認識)", in PhD thesis, Nara Institute of Science and Technology (NAIST), ref. NAIST-IS-DD1461207, Ikoma, Japan, Sep. 25, 2017. DOI: 10.34413/dr.01472 [PhD thesis.][Supervised by T. Ogasawara, H. Kato, J. Takamatsu, M. Ding, and K. Takemura.]
< Abstract >
This thesis proposes STARE, a wearable system to perform real-time, simultaneous eye tracking and focused object recognition for daily-life applications in varied illumination environments. The proposed system extracts both the gaze direction and scene information using eye images captured by a single RGB camera facing the user's eye. In particular, the method requires neither infrared sensors nor a front-facing camera to capture the scene, making it more socially acceptable when embedded in a wearable device. This approach is made possible by recent technological advances in increased resolution and reduced size of camera sensors, as well as significantly more powerful image processing techniques based on deep learning. First, a model-based approach is used to estimate the gaze direction using RGB eye images. A 3D eye model is constructed from an image of the eye by fitting an ellipse onto the iris. The gaze direction is then continuously tracked by rotating the model to simulate projections of the iris area for different eye poses and matching the iris area of the subsequent images with the corresponding projections obtained from the model. By using an additional one-time calibration, the point of regard (POR) is computed, which makes it possible to identify where a user is looking in the scene image reflected on the cornea. Next, objects in the scene reflected on the cornea are recognized in real time using the gaze direction information. Deep learning algorithms are applied to classify and then recognize the focused object in the area surrounding the reflected POR on the eye image. Additional processing using High Dynamic Range (HDR) imaging demonstrates that the proposed method can perform in varied illumination conditions. Finally, the validity of the approach is verified experimentally with a 3D-printable prototype of a wearable device equipped with dual cameras, and with a high-sensitivity camera in extreme illumination conditions. Further, a proof-of-concept implementation of a state-of-the-art neural network shows that the focused object recognition can be performed in real time. To summarize, the proposed method and prototype contribute a novel, complete framework to 1) simultaneously perform eye tracking and focused object analysis in real time, 2) automatically generate datasets of focused objects by using the reflected POR, 3) reduce the number of sensors in current gaze trackers to a single RGB camera, and 4) enable daily-life applications in all kinds of illumination. The combination of these features makes it an attractive choice for eye-based human behavior analysis, as well as for creating large datasets of objects focused on by the user during daily tasks.
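A minimal sketch of the first step described above, assuming OpenCV is available: fit an ellipse to the iris boundary and derive a coarse gaze tilt from how much the circular iris appears foreshortened (arccos of the axis ratio). The boundary points are synthetic; a real pipeline would first segment the iris in the eye image.
```python
# Synthetic iris boundary: a circle of radius 40 px viewed at a ~30 degree tilt,
# used to illustrate ellipse fitting and a coarse gaze-tilt estimate.
import numpy as np
import cv2

angles = np.linspace(0.0, 2.0 * np.pi, 100)
pts = np.stack([160.0 + 40.0 * np.cos(angles),
                120.0 + 40.0 * np.cos(np.radians(30.0)) * np.sin(angles)], axis=1)
contour = pts.astype(np.float32).reshape(-1, 1, 2)

(cx, cy), (axis1, axis2), _ = cv2.fitEllipse(contour)   # center, axes, rotation
tilt = np.degrees(np.arccos(min(axis1, axis2) / max(axis1, axis2)))
print(f"iris center = ({cx:.0f}, {cy:.0f}), estimated gaze tilt = {tilt:.1f} deg")
```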
-
L. El Hafi, M. Ding, J. Takamatsu, and T. Ogasawara, "STARE: Realtime, Wearable, Simultaneous Gaze Tracking and Object Recognition from Eye Images", in SMPTE Motion Imaging Journal, vol. 126, no. 6, pp. 37-46, Aug. 9, 2017. DOI: 10.5594/JMI.2017.2711899 [International journal article, peer-reviewed.]
< Abstract >
We propose STARE, a wearable system to perform realtime, simultaneous eye tracking and focused object recognition for daily-life applications in varied illumination environments. Our proposed method uses a single camera sensor to evaluate the gaze direction and requires neither a front-facing camera nor infrared sensors. To achieve this, we describe: 1) a model-based approach to estimate the gaze direction using red-green-blue (RGB) eye images; 2) a method to recognize objects in the scene reflected on the cornea in real time; and 3) a 3D-printable prototype of a wearable gaze-tracking device. We verify the validity of our approach experimentally with different types of cameras in different illumination settings, and with a proof-of-concept implementation of a state-of-the-art neural network. The proposed system can be used as a framework for RGB-based eye tracking and human behavior analysis.
-
G. A. Garcia Ricardez*, L. El Hafi*, F. von Drigalski*, R. Elizalde Zapata, C. Shiogama, K. Toyoshima, P. M. Uriguen Eljuri, M. Gall, A. Yuguchi, A. Delmotte, V. G. Hoerig, W. Yamazaki, S. Okada, Y. Kato, R. Futakuchi, K. Inoue, K. Asai, Y. Okazaki, M. Yamamoto, M. Ding, J. Takamatsu, and T. Ogasawara, "Climbing on Giant's Shoulders: Newcomer's Road into the Amazon Robotics Challenge 2017", in Proceedings of 2017 IEEE ICRA Warehouse Picking Automation Workshop (WPAW 2017), Singapore, Singapore, May 29, 2017. [International workshop article, non-peer-reviewed.][*Authors contributed equally.]
< Abstract >
The Amazon Robotics Challenge has become one of the biggest robotic challenges in the field of warehouse automation and manipulation. In this paper, we present an overview of materials available for newcomers to the challenge, what we learned from the previous editions and discuss the new challenges within the Amazon Robotics Challenge 2017. We also outline how we developed our solution, the results of an investigation on suction cup size and some notable difficulties we encountered along the way. Our aim is to speed up development for those who come after and, as first-time contenders like us, have to develop a solution from zero.
-
L. El Hafi, M. Ding, J. Takamatsu, and T. Ogasawara, "眼球画像を用いた視線追跡と物体認識 日常生活のための装着型デバイス (Gaze Tracking and Object Recognition from Eye Images: Wearable Device for Daily Life)", in Proceedings of 2017 JSME Conference on Robotics and Mechatronics (ROBOMECH 2017), no. 17-2, ref. 2A1-I12, pp. 1-2, Fukushima, Japan, May 10, 2017. DOI: 10.1299/jsmermd.2017.2A1-I12 [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
< Abstract >
This paper introduces a method to identify the focused object in eye images captured from a single camera in order to enable intuitive eye-based interactions using wearable devices. The proposed method relies on a 3D eye model reconstruction to evaluate the gaze direction from the eye images. The gaze direction is then used in combination with deep learning algorithms to classify the focused object reflected on the cornea. Experimental results using a wearable prototype demonstrate the potential of the proposed method.
-
L. El Hafi, M. Ding, J. Takamatsu, and T. Ogasawara, "Gaze Tracking and Object Recognition from Eye Images", in Proceedings of 2017 IEEE International Conference on Robotic Computing (IRC 2017), pp. 310-315, Taichung, Taiwan, Apr. 10, 2017. DOI: 10.1109/IRC.2017.44 [International conference article, peer-reviewed.]
< Abstract >
This paper introduces a method to identify the focused object in eye images captured from a single camera in order to enable intuitive eye-based interactions using wearable devices. Indeed, eye images allow not only obtaining natural user responses from eye movements, but also capturing the scene reflected on the cornea without the need for additional sensors such as a frontal camera, thus making the approach more socially acceptable. The proposed method relies on a 3D eye model reconstruction to evaluate the gaze direction from the eye images. The gaze direction is then used in combination with deep learning algorithms to classify the focused object reflected on the cornea. Finally, the experimental results using a wearable prototype demonstrate the potential of the proposed method solely based on eye images captured from a single camera.
-
L. El Hafi, M. Ding, J. Takamatsu, and T. Ogasawara, "Gaze Tracking using Corneal Images Captured by a Single High-Sensitivity Camera", in The Best of IET and IBC 2016-2017, vol. 8, pp. 19-24, Sep. 8, 2016. [International journal article, peer-reviewed.][Also in Proceedings of 2016 International Broadcasting Convention (IBC 2016), pp. 33-43, Amsterdam, Netherlands, Sep. 2016.]
< Abstract >
This paper introduces a method to estimate gaze direction using images of the eye captured by a single high-sensitivity camera. The purpose is to develop wearable devices that enable intuitive eye-based interactions and applications. Indeed, camera-based solutions, as opposed to commercially available infrared-based ones, allow wearable devices to not only obtain natural user responses from eye movements, but also scene images reflected on the cornea, without the need for additional sensors. The proposed method relies on a model approach to evaluate the gaze direction and does not require a frontal camera to capture scene information, making it more socially acceptable if embedded in a glasses-shaped device. Moreover, recent developments in high-sensitivity camera sensors allow us to consider the proposed method even in low-light conditions. Finally, experimental results using a prototype wearable device demonstrate the potential of the proposed method solely based on cornea images captured from a single camera.
-
L. El Hafi, M. Ding, J. Takamatsu, and T. Ogasawara, "Gaze Tracking using Corneal Images Captured by a Single High-Sensitivity Camera", in Proceedings of 2016 International Broadcasting Convention (IBC 2016), pp. 33-43, Amsterdam, Netherlands, Sep. 8, 2016. DOI: 10.1049/ibc.2016.0033 [International conference article, peer-reviewed.][Also in The Best of IET and IBC 2016-2017, vol. 8, pp. 19-24, Sep. 2016.]
< Abstract >
This paper introduces a method to estimate gaze direction using images of the eye captured by a single high-sensitivity camera. The purpose is to develop wearable devices that enable intuitive eye-based interactions and applications. Indeed, camera-based solutions, as opposed to commercially available infrared-based ones, allow wearable devices to not only obtain natural user responses from eye movements, but also scene images reflected on the cornea, without the need for additional sensors. The proposed method relies on a model approach to evaluate the gaze direction and does not require a frontal camera to capture scene information, making it more socially acceptable if embedded in a glasses-shaped device. Moreover, recent developments in high-sensitivity camera sensors allow us to consider the proposed method even in low-light conditions. Finally, experimental results using a prototype wearable device demonstrate the potential of the proposed method solely based on cornea images captured from a single camera.
-
F. von Drigalski*, L. El Hafi*, P. M. Uriguen Eljuri*, G. A. Garcia Ricardez*, J. Takamatsu, and T. Ogasawara, "NAIST Drillbot: Drilling Robot at the Airbus Shopfloor Challenge", in Proceedings of 2016 Annual Conference of the Robotics Society of Japan (RSJ 2016), ref. RSJ2016AC3X2-03, pp. 1-2, Yamagata, Japan, Sep. 7, 2016. [Domestic conference article, non-peer-reviewed.][*Authors contributed equally.]
< Abstract >
We propose a complete, modular robotic solution for industrial drilling tasks in an aircraft fuselage. The main contribution is a custom-made end effector with vibration-reducing feet that rest on the workpiece during the drilling process to increase stability. The solution took 1st place at the Airbus Shopfloor Challenge, an international robotics competition held at ICRA 2016.
-
L. El Hafi, P. M. Uriguen Eljuri, M. Ding, J. Takamatsu, and T. Ogasawara, "Wearable Device for Camera-based Eye Tracking: Model Approach using Cornea Images (カメラを用いた視線追跡のための装着型デバイス 角膜画像によるモデルアプローチ)", in Proceedings of 2016 JSME Conference on Robotics and Mechatronics (ROBOMECH 2016), no. 16-2, ref. 1A2-14a4, pp. 1-4, Yokohama, Japan, Jun. 8, 2016. DOI: 10.1299/jsmermd.2016.1A2-14a4 [Domestic conference article, non-peer-reviewed.]
< Abstract >
The industry's recent growing interest in virtual reality, augmented reality and smart wearable devices has created a new momentum for eye tracking. Eye movements in particular are viewed as a way to obtain natural user responses from wearable devices alongside gaze information used to analyze interests and behaviors. This paper extends our previous work by introducing a wearable eye-tracking device that enables the reconstruction of 3D eye models of each eye from two RGB cameras. The proposed device is built using high-resolution cameras and a 3D-printed frame attached to a pair of JINS MEME glasses. The 3D eye models reconstructed from the proposed device can be used with any model-based eye-tracking approach. The proposed device is also capable of extracting scene information from the cornea reflections captured by the cameras, detecting blinks from an electrooculography sensor as well as tracking head movements from an accelerometer combined with a gyroscope.
-
A. Yuguchi, R. Matsura, R. Baba, Y. Hakamata, W. Yamazaki, F. von Drigalski, L. El Hafi, S. Tsuichihara, M. Ding, J. Takamatsu, and T. Ogasawara, "モーションキャプチャによるボールキャッチ可能なロボット制御コンポーネント群の開発 (Development of Robot Control Components for Ball-Catching Task using Motion Capture Device)", in Proceedings of 2015 SICE System Integration Division Annual Conference (SI 2015), pp. 1067-1068, Nagoya, Japan, Dec. 14, 2015. [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
< Abstract >
This paper describes the design and implementation of RT-middleware components for a ball-catching task by humanoid robots. We create a component to get the position of a thrown reflective ball from a motion capture device. We also create a component to estimate the ball's trajectory and the point where it will fall. The estimation is then used by the control component to catch the ball with an HRP-4 humanoid robot.
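A small numpy sketch of the trajectory-estimation step follows, under illustrative assumptions (a simple ballistic model and synthetic motion-capture samples): fit polynomials to the captured ball positions and predict where the ball crosses a chosen catch height.
```python
# Fit a ballistic model to the captured ball positions and predict where the
# ball crosses a chosen catch height (synthetic samples, numpy only).
import numpy as np

t = np.linspace(0.0, 0.4, 20)                        # mocap timestamps [s]
g = 9.81
x = 0.2 + 3.0 * t                                    # synthetic measurements [m]
y = 0.1 + 1.0 * t
z = 1.0 + 4.0 * t - 0.5 * g * t**2

cz = np.polyfit(t, z, 2)                             # z(t) is quadratic under gravity
cx, cy = np.polyfit(t, x, 1), np.polyfit(t, y, 1)    # x(t), y(t) are linear

catch_height = 0.8
t_catch = max(np.roots(cz - [0, 0, catch_height]))   # later time at which z(t) == 0.8
print("catch point:", np.polyval(cx, t_catch), np.polyval(cy, t_catch))
```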
-
L. El Hafi, K. Takemura, J. Takamatsu, and T. Ogasawara, "Model-based Approach for Gaze Estimation from Corneal Imaging using a Single Camera", in Proceedings of 2015 IEEE/SICE International Symposium on System Integration (SII 2015), pp. 88-93, Nagoya, Japan, Dec. 11, 2015. DOI: 10.1109/SII.2015.7404959 [International conference article, peer-reviewed.]
< Abstract >
This paper describes a method to estimate the gaze direction using cornea images captured by a single camera. The purpose is to develop wearable devices capable of obtaining natural user responses, such as interests and behaviors, from eye movements and scene images reflected on the cornea. From an image of the eye, an ellipse is fitted on the colored iris area. A 3D eye model is reconstructed from the ellipse and rotated to simulate projections of the iris area for different eye poses. The gaze direction is then evaluated by matching the iris area of the current image with the corresponding projection obtained from the model. We finally conducted an experiment using a head-mounted prototype to demonstrate the potential of such an eye-tracking method solely based on cornea images captured from a single camera.
-
L. El Hafi, J.-B. Lorent, and G. Rouvroy, "Mapping SDI with a Light-Weight Compression for High Frame Rates and Ultra-HD 4K Transport over SMPTE 2022-5/6", in Proceedings of 2014 VSF Content in Motion, Annual Technical Conference and Exposition (VidTrans14), Arlington, United States, Feb. 26, 2014. [International workshop article, non-peer-reviewed.]
< Abstract >
Considering the necessary bandwidth for the next generation of television, with higher resolution video and higher frame rates, live uncompressed transport across a 10 Gb Ethernet network is not always possible. Indeed, uncompressed 4K video at 60 fps requires 12 Gbps or more. A light-weight compression can be optimal to address this challenge. A purely lossless codec would be best; however, it is in general difficult to predict the compression ratio achievable by a lossless codec. Therefore, a light-weight, visually lossless compression guaranteeing a very low compression ratio with no impact on latency seems optimal to perfectly map SDI links over SMPTE 2022-5/6.
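A quick back-of-the-envelope check of the bandwidth figures above (assuming 10-bit 4:2:2 sampling, typical for SDI): the active video of UHD 4K at 60 fps alone approaches 10 Gbps, and the corresponding 12G-SDI link runs at about 11.88 Gbps, so it cannot be carried uncompressed over a 10 Gb Ethernet link, whereas a roughly 2:1 visually lossless compression fits comfortably.
```python
# Illustrative arithmetic only (assumed 10-bit 4:2:2 sampling).
width, height, fps = 3840, 2160, 60
bits_per_pixel = 10 * 2                        # 10 bits/sample, 2 samples/pixel in 4:2:2
active_gbps = width * height * fps * bits_per_pixel / 1e9
print(f"active video: {active_gbps:.2f} Gbps")         # ~9.95 Gbps, before SDI overhead
print("12G-SDI link rate: 11.88 Gbps")                 # hence '12 Gbps or more' above
print(f"after ~2:1 visually lossless compression: {active_gbps / 2:.2f} Gbps")
```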
-
L. El Hafi* and T. Denison*, "TICO : Étude d'un système de compression vidéo de faible complexité sur FPGA (TICO: Study of a Low-Complexity Video Compression Scheme for FPGA)", in Master's thesis, Université catholique de Louvain (UCLouvain) & intoPIX, Louvain-la-Neuve, Belgium, Jun. 2013. [Master's thesis.][*Authors contributed equally.][Supervised by J.-D. Legat, B. Macq, and G. Rouvroy.][Published in French.]
< Abstract >
The evolution of display technologies, in terms of screen resolution, frame rate, and color depth, calls for new compression systems, notably to reduce the power consumed at video interfaces. To address this issue, the Video Electronics Standards Association (VESA) consortium issued a call for proposals in January 2013 for the creation of a new compression standard, Display Stream Compression (DSC). This document, produced in collaboration with intoPIX, answers VESA's call and proposes Tiny Codec (TICO), a video compression scheme of low hardware complexity. It describes, on the one hand, the algorithmic study of an entropy coder inspired by Universal Variable Length Coding (UVLC) that achieves an efficiency of 85% on filmed content and, on the other hand, the FPGA implementation of a horizontal 5:3 discrete wavelet transform processing 4K video streams at up to 120 frames per second. The resulting implementation consumes 340 slices per color component on Xilinx's low-power Artix-7 platforms.
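For context, the sketch below implements the classic LeGall 5/3 lifting wavelet (the reversible transform also used in JPEG 2000) on one row of samples in numpy, with periodic boundary extension for brevity; it only illustrates the kind of horizontal 5:3 transform mentioned above and is unrelated to the actual FPGA data path.
```python
# One-level horizontal LeGall 5/3 lifting wavelet on a row of samples (periodic
# boundary extension for brevity); an illustration of the transform family only.
import numpy as np

def dwt53_forward(row):
    x = np.asarray(row, dtype=np.int64)
    even, odd = x[0::2].copy(), x[1::2].copy()
    odd -= (even + np.roll(even, -1)) >> 1            # predict: high-pass (detail)
    even += (odd + np.roll(odd, 1) + 2) >> 2          # update: low-pass (approximation)
    return even, odd

low, high = dwt53_forward([10, 12, 14, 20, 30, 28, 26, 24])
print(low, high)
```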