Summary
-
Assistant Professor at Ritsumeikan
-
Appointed Researcher for Toyota
-
Research Advisor for Panasonic
-
Founder of Coarobo
Lotfi El Hafi, PhD, is a Research Assistant Professor at the Ritsumeikan Global Innovation Research Organization (R-GIRO) for the International and Interdisciplinary Research Center for the Next-Generation Artificial Intelligence and Semiotics (AI+Semiotics) Project. He received his MScEng in Mechatronics in 2013 from the Université catholique de Louvain (UCLouvain), Belgium, where his Master's thesis explored TICO compression, now JPEG XS, for Ultra HD 4K/8K video distribution in collaboration with intoPIX SA. After working for intoPIX SA as a Sales Engineer while training in the AWEX International Business & Management EXPLORT Program, he joined the Robotics Laboratory of the Nara Institute of Science and Technology (NAIST), Japan, in 2014 under the MEXT Scholarship Program, where his doctoral thesis proposed Simultaneous Tracking and Attention Recognition from Eyes (STARE), a novel eye-tracking approach that leveraged deep learning to extract behavioral information from scenes reflected in the eyes. Today, his research interests include service robotics and artificial intelligence, with a particular focus on human-robot interaction. In this regard, he is the recipient of multiple research awards, such as the Best of IET and IBC 2016-2017 and the JST CREST Best Award for Forecasting Research Proposal 2019, as well as international robotics competition prizes, including the 1st Place at the Airbus Shopfloor Challenge 2016, the Finalist Prize at the Amazon Robotics Challenge 2017, the NEDO Chairman's Award for Excellence in World Robot Summit 2018, the SICE Award for the WRS Future Convenience Store Challenge 2018, and the 1st Places at the WRS Future Convenience Store Challenge 2018 & 2019, which led him to become a Specially Appointed Researcher for the Toyota HSR Community and a Research Advisor for Robotics Competitions at the Robotics Hub of Panasonic Corporation since 2019. Finally, he is an active member of the RSJ, JSME, and IEEE, and the President & Founder of Coarobo GK.
He completed the Master's program at the Université catholique de Louvain (UCLouvain), Belgium, in 2013, and the doctoral program at the Nara Institute of Science and Technology in 2017 (Doctor of Engineering). That same year, he joined the Ritsumeikan Global Innovation Research Organization (R-GIRO) as a Senior Researcher, before becoming a Research Assistant Professor there in 2019, a position he holds today. Also since 2019, he has served as a Specially Appointed Researcher of the Toyota HSR development community and as a Research Advisor for Robotics Competitions at the Panasonic Robotics Hub. His research focuses mainly on service robotics and artificial intelligence, and he has received research awards from METI, NEDO, JST, SICE, IBC, and IET. He is a member of the RSJ, JSME, and IEEE, among others, and is the founder and representative partner of Coarobo GK.
Skills
Service Robotics
Artificial Intelligence
Information Science
Project Management
Academic Research
Experiences
-
President & Founder
Oct. 2019 - Present at Coarobo GK
in Kyoto, Japan
- Providing consulting services for innovative robotics solutions powered by state-of-the-art artificial intelligence.
-
Research Assistant Professor
Oct. 2019 - Present at Ritsumeikan University, Ritsumeikan Global Innovation Research Organization (R-GIRO)
in Kusatsu (Kyoto Area), Japan
- Pursuing research on developing novel model architectures for multimodal unsupervised learning in service robotics contexts.
- Daily supervision of graduate students to help them produce higher-quality research in the fields of service robotics and artificial intelligence.
-
Research Advisor for Robotics Competitions
Apr. 2019 - Present at Panasonic Corporation, Panasonic Robotics Hub
in Osaka, Japan
- Working as a research advisor for successful participation in international robotics competitions.
- Developing robotics solutions for novel human-robot interactions in service contexts.
- Supervising Ritsumeikan University's research and development contributions to Team NAIST-RITS-Panasonic.
-
Specially Appointed Researcher of HSR Community
Apr. 2019 - Present at Toyota Motor Corporation, Toyota HSR Community
in Nagoya, Japan
- Developing a containerized Software Development Environment (SDE) for the Toyota Human Support Robot (HSR).
- Accelerating research integration within the Toyota HSR Community by deploying a common SDE across its 100+ members.
- Formally invited by Toyota to lead the HSR Software Development Environment Working Group (SDE-WG).
-
Moderator
Oct. 2018 - Nov. 2018 at German Centre for Research and Innovation Tokyo (DWIH Tokyo)
in Tokyo, Japan
- Chosen by DWIH Tokyo to moderate the First Japanese-German-French Symposium for International Research and Applications on Artificial Intelligence (AI) that took place on November 21-22, 2018, and boasted a prestigious audience of about 350 participants and 65 speakers.
- Coordinated the guests on stage, moderated their discussions, and highlighted their inputs to foster new trilateral strategic collaborations on AI technology and policy to address future key challenges in academic research, business development, and state sovereignty.
- Met with state representatives such as His Excellency Dr. Hans Carl von Werthern (Ambassador of the Federal Republic of Germany to Japan), His Excellency Mr. Laurent Pic (Ambassador of France to Japan), and His Excellency Mr. Takuya Hirai (Minister of State for Science and Technology Policy in Japan) to promote AI research activities.
-
Senior Researcher
Oct. 2017 - Sep. 2019 at Ritsumeikan University, Ritsumeikan Global Innovation Research Organization (R-GIRO)
in Kusatsu (Kyoto Area), Japan
- Joined the International and Interdisciplinary Research Center for the Next-Generation Artificial Intelligence and Semiotics (AI+Semiotics) project.
- Coordinated cross-laboratory efforts to participate in World Robot Summit (WRS) and RoboCup@Home international robotics competitions.
- Developed unsupervised learning methods that enable robots to learn a variety of knowledge through daily human-robot interactions.
-
Sales Engineer
Sep. 2013 - Mar. 2014 at intoPIX SA, Marketing & Sales Department (M&S)
in Mont-Saint-Guibert (Brussels Area), Belgium
- Met with business partners to introduce new product lines: TICO, JPEG 2000 GPU SDK.
- Demonstrated intoPIX's products at multiple exhibitions across Europe and the United States: IBC, ISE, VSF VidTrans.
- Introduced intoPIX's newest technologies to industrial standardization committees such as SMPTE.
- Contributed to commercial and technical documentation.
-
Research Intern
Sep. 2012 - Jun. 2013 at intoPIX SA, Research & Development Department (R&D)
in Louvain-la-Neuve (Brussels Area), Belgium
- Carried out Master's thesis research within a professional environment as an intern in R&D.
- Explored the FPGA implementation of TICO compression, now JPEG XS, for Ultra HD 4K/8K video distribution.
- Introduced to Scrum, an iterative and incremental agile framework for collaborative software development.
-
System Administrator
Jul. 2005 - Present at Les Mercredis du Multisport asbl
in Louvain-la-Neuve (Brussels Area), Belgium
- Automated daily administrative tasks to increase the overall productivity of the staff.
- Set up an online presence that managed to increase the enrolment figures of participants.
- Expanded the network infrastructure while performing maintenance on existing hardware.
- Started as a student part-time job by managing groups of children during sports days.
Education
-
Doctor of Engineering (DrEng), Robotics
Oct. 2014 - Sep. 2017 at Nara Institute of Science and Technology (NAIST), Graduate School of Information Science (IS)
in Ikoma (Osaka Area), Japan
- Doctoral thesis explored Simultaneous Tracking and Attention Recognition from Eyes (STARE), a novel eye-tracking approach that leveraged deep learning to extract behavioral information from scenes reflected in the eyes.
- Published the results in international journals and conferences, and was awarded Best of IET and IBC 2016-2017.
- Research sponsored by the Japan Ministry of Education, Culture, Sports, Science and Technology (MEXT) Scholarship Program.
- Co-founded Team NAIST, now Team NAIST-RITS-Panasonic, and successfully participated in the Airbus Shopfloor Challenge (ASC) and the Amazon Robotics Challenge (ARC).
-
Research Internship
Apr. 2014 - Sep. 2014 at Nara Institute of Science and Technology (NAIST), Graduate School of Information Science (IS)
in Ikoma (Osaka Area), Japan
- Selected to conduct research on robotics in Japan through recommendation by the Japanese diplomatic mission in Belgium.
-
Training in International Business & Management EXPLORT
Sep. 2013 - Mar. 2014 at Wallonia Foreign Trade and Investment Agency (AWEX), Centre de Compétence Management & Commerce
in Charleroi (Brussels Area), Belgium
- Intensive training in international business and management as a part-time trainee in a company.
- Strict candidate selection process supervised by the AWEX regional public authority for foreign trade and investment.
- Training put to practice with a mandatory mission abroad to represent the commercial interests of a Belgian company.
- Stay in the United States to expand intoPIX's core market and product line outside Belgium.
-
Master of Science in Electromechanical Engineering (MScEng), Professional Focus in Mechatronics: Cum Laude
Sep. 2011 - Jun. 2013 at Université catholique de Louvain (UCLouvain), École polytechnique de Louvain (EPL)
in Louvain-la-Neuve (Brussels Area), Belgium
- Graduated from the Master's program with Distinction (Cum Laude), with High Distinction (Magna Cum Laude) in the final year.
- Major in electromechanical science with a professional focus in mechatronics, robotics and hardware video processing.
- Master's thesis explored the FPGA implementation of TICO compression, now JPEG XS, in collaboration with intoPIX for Ultra HD 4K/8K video distribution.
- Participated in the international robotics contest Eurobot 2012 as a member of Team Kraken and reached the quarterfinals of the Belgian national qualifications.
-
Language Course, Japanese
Sep. 2009 - Mar. 2014 at Institut libre Marie Haps (ILMH)
in Brussels, Belgium
-
Bachelor of Science in Engineering (BScEng), Focus in Mechanics and Electricity
Sep. 2007 - Sep. 2011 at Université catholique de Louvain (UCLouvain), École polytechnique de Louvain (EPL)
in Louvain-la-Neuve (Brussels Area), Belgium
- Introduced to a wide range of engineering disciplines with a major in mechanics science and a minor in electricity science.
- Mandatory entrance examination in mathematics for all aspiring engineers, as dictated by Belgian law.
- Followed a two-year introduction to Japanese language and culture as an extracurricular program.
-
7th Year of Preparation for Higher Education, Mathematics
Sep. 2006 - Jun. 2007 at Lycée Martin V
in Louvain-la-Neuve (Brussels Area), Belgium
-
General Secondary Education, Science
Sep. 2000 - Jun. 2006 at Athénée royal Paul Delvaux d'Ottignies (ARO), Antenne de Lauzelle
in Louvain-la-Neuve (Brussels Area), Belgium
-
Primary Education, Science
Sep. 1994 - Jun. 2000 at École communale de Blocry
in Louvain-la-Neuve (Brussels Area), Belgium
Awards
-
1st Place, Restock & Disposal Task, Future Convenience Store Challenge Trial Competition 2019, World Robot Summit 2020
Dec. 2019 by World Robot Summit (WRS)
in Tokyo, Japan
Ranked 1st with Team NAIST-RITS-Panasonic in one of the three main tasks of the Future Convenience Store Challenge at the Trial Competition 2019 of World Robot Summit 2020.
-
2nd Place, Restroom Cleaning Task, Future Convenience Store Challenge Trial Competition 2019, World Robot Summit 2020
Dec. 2019 by World Robot Summit (WRS)
in Tokyo, Japan
Ranked 2nd with Team NAIST-RITS-Panasonic in one of the three main tasks of the Future Convenience Store Challenge at the Trial Competition 2019 of World Robot Summit 2020.
-
Best Award for Forecasting Research Proposal, CREST Research Area Meeting 2019
Oct. 2019 by Japan Science and Technology Agency (JST) with 3,000,000 JPY
in Osaka, Japan
Awarded 3,000,000 JPY by JST for representing the Ambient Assisted Living Services Group in forecasting novel "Intelligent information processing systems creating co-experience knowledge and wisdom with human-machine harmonious collaboration".
-
Best Award for Research Proposal Breaking Hagita-CREST Shell, CREST Research Area Meeting 2019
Oct. 2019 by Japan Science and Technology Agency (JST) with 1,000,000 JPY
in Osaka, Japan
Awarded 1,000,000 JPY by JST for proposing and developing novel "Intelligent information processing systems creating co-experience knowledge and wisdom with human-machine harmonious collaboration".
-
NEDO Chairman's Award for Excellence in World Robot Summit, World Robot Summit 2018
Oct. 2018 by New Energy and Industrial Technology Development Organization (NEDO)
in Tokyo, Japan
Awarded by the NEDO office for overall excellence in competition with Team NAIST-RITS-Panasonic during World Robot Summit 2018.
-
SICE Award for Future Convenience Store Challenge, World Robot Summit 2018
Oct. 2018 by Society of Instrument and Control Engineers (SICE)
in Tokyo, Japan
Awarded by the SICE society for displaying advanced research integration with Team NAIST-RITS-Panasonic during World Robot Summit 2018.
-
1st Place, Customer Interaction Task, Future Convenience Store Challenge 2018, World Robot Summit 2018
Oct. 2018 by World Robot Summit (WRS) with 3,000,000 JPY
in Tokyo, Japan
Ranked 1st and awarded 3,000,000 JPY with Team NAIST-RITS-Panasonic in one of the three main tasks of the Future Convenience Store Challenge at the World Robot Summit 2018.
-
Finalist Prize, Amazon Robotics Challenge 2017
Jul. 2017 by Amazon with 10,000 USD
in Nagoya, Japan
Ranked 6th and awarded 10,000 USD with Team NAIST-Panasonic in the finals of the 2017 Amazon Robotics Challenge (formerly Amazon Picking Challenge) among 16 top international teams and 27 entries worldwide.
-
Best of IET and IBC 2016-2017
Sep. 2016 by Institution of Engineering and Technology (IET) & International Broadcasting Convention (IBC)
in Amsterdam, Netherlands
Selected among more than 360 submissions as one of the top 8 papers at the 2016 International Broadcasting Convention (IBC 2016) for "Outstanding research in broadcasting and entertainment and very best professional excellence in media technology".
-
1st Place, Airbus Shopfloor Challenge 2016
May 2016 by Airbus Group with 20,000 EUR
in Stockholm, Sweden
Winner of the largest robotics challenge held at the 2016 IEEE International Conference on Robotics and Automation (ICRA 2016), with a cash prize of 20,000 EUR.
-
Japan Tent Ambassador, Japan Tent 2014
Aug. 2014 by Japan Tent Steering Committee
in Kanazawa, Japan
Selected among 365 international students representing 102 countries who gathered in Ishikawa Prefecture to foster friendly relations between Japan and my home country, Belgium.
Certifications
-
Japanese Driving License, Category Ordinary Motorcycle
Oct. 2019 by Nara Prefecture
in Nara, Japan
Driving license issued by the Japanese authorities for motorcycles with an engine displacement not exceeding 399 cc.
-
Highly Skilled Professional (i)(a): Advanced Academic Research Activities
Oct. 2017 by Japan Ministry of Justice (MOJ)
in Tokyo, Japan
Received the residence status of Highly Skilled Professional in advanced academic research activities for being among "The quality, unsubstitutable human resources who have a complementary relationship with domestic capital and labor, and who are expected to bring innovation to the Japanese industries, to promote development of specialized/technical labor markets through friendly competition with Japanese people and to increase efficiency of the Japanese labor markets".
-
Doctor of Engineering (DrEng), Robotics
Sep. 2017 by Nara Institute of Science and Technology (NAIST), Graduate School of Information Science (IS)
in Ikoma (Osaka Area), Japan
Received the academic title of Doctor after 3 years of research and contribution to the state of the art at the Robotics Laboratory of the Nara Institute of Science and Technology (NAIST).
-
Japanese-Language Proficiency Test (JLPT): N2
Dec. 2016 by Japan Educational Exchanges and Services (JEES)
in Nara, Japan
The JLPT is a standardized criterion-referenced test administered by the Japan Ministry of Education, Culture, Sports, Science and Technology (MEXT) to evaluate and certify Japanese language proficiency for non-native speakers. The N2 level certifies "the ability to understand Japanese used in everyday situations, and in a variety of circumstances to a certain degree".
-
Japan Kanji Aptitude Test (Kanken): 9
Oct. 2015 by Japan Kanji Aptitude Testing Foundation
in Nara, Japan
The Kanken is a standard test aimed at native Japanese speakers that evaluates one's knowledge of kanji, and especially one's ability to write them without computer assistance. Level 9 covers the 240 kanji learned up to the second grade of elementary school.
-
Japanese Driving License, Category Semi-Medium Vehicle
Aug. 2014 by Nara Prefecture
in Nara, Japan
Driving license issued by the Japanese authorities for most common motor vehicles not exceeding 3.5 t.
-
Test of English for International Communication, Institutional Program (TOEIC IP): 955/990
Jun. 2014 by Educational Testing Service (ETS)
in Ikoma (Osaka Area), Japan
The TOEIC IP is a standard test administered in Japan by the Institute for International Business Communication (IIBC) to measure the English reading and listening skills of people working in international environments. A score above 800 means an advanced command of the language.
-
Training in International Business & Management EXPLORT
Mar. 2014 by Wallonia Foreign Trade and Investment Agency (AWEX)
in Brussels, Belgium
Certificate of successful completion of the EXPLORT Program, in which top candidates selected by the AWEX are intensively trained in international business and management and dispatched abroad to represent the commercial interests of a Belgian company.
-
European Driving License, Category B
Jan. 2014 by Wavre City
in Wavre (Brussels Area), Belgium
Driving license issued by the Belgian authorities and valid in all member states of the European Economic Area (EEA) for most common motor vehicles not exceeding 3.5 t.
-
Test of English as a Foreign Language, Internet-Based Test (TOEFL iBT): 102/120
Sep. 2013 by Educational Testing Service (ETS)
in Brussels, Belgium
The TOEFL iBT test measures the ability to use and understand English at the university level by evaluating reading, listening, speaking, and writing skills in performing academic tasks. A score above 95 means an advanced command of the language.
-
Ingénieur civil (Ir)
Jun. 2013 by Université catholique de Louvain (UCLouvain), École polytechnique de Louvain (EPL)
in Louvain-la-Neuve (Brussels Area), Belgium
Legally protected title under Belgian law, applicable to graduates of the 5-year engineering programs of the top national universities.
-
Master of Science in Electromechanical Engineering (MScEng), Professional Focus in Mechatronics: Cum Laude
Jun. 2013 by Université catholique de Louvain (UCLouvain), École polytechnique de Louvain (EPL)
in Louvain-la-Neuve (Brussels Area), Belgium
Received the academic title of Master after 5 years of higher education in engineering at the Université catholique de Louvain (UCLouvain).
-
Bachelor of Science in Engineering (BScEng), Focus in Mechanics and Electricity
Sep. 2011 by Université catholique de Louvain (UCLouvain), École polytechnique de Louvain (EPL)
in Louvain-la-Neuve (Brussels Area), Belgium
Received the academic title of Bachelor after 3 years of higher education in engineering at the Université catholique de Louvain (UCLouvain).
Publications
-
B. Bastin, S. Hasegawa, J. Solis, R. Ronsse, B. Macq, L. El Hafi, G. A. Garcia Ricardez, and T. Taniguchi, "GPTAlly: A Safety-oriented System for Human-Robot Collaboration based on Foundation Models", in Proceedings of 2025 IEEE/SICE International Symposium on System Integration (SII 2025), Munich, Germany, Jan. 1, 2025. [International conference article, peer-reviewed.][Accepted for presentation.]
Abstract
As robots increasingly integrate into the workplace, Human-Robot Collaboration (HRC) has become increasingly important. However, most HRC solutions are based on pre-programmed tasks and use fixed safety parameters, which keeps humans out of the loop. To overcome this, HRC solutions that can easily adapt to human preferences during the operation as well as their safety precautions considering the familiarity with robots are necessary. In this paper, we introduce GPTAlly, a novel safety-oriented system for HRC that leverages the emerging capabilities of Large Language Models (LLMs). GPTAlly uses LLMs to 1) infer users' subjective safety perceptions to modify the parameters of a Safety Index algorithm; 2) decide on subsequent actions when the robot stops to prevent unwanted collisions; and 3) re-shape the robot arm trajectories based on user instructions. We subjectively evaluate the robot's behavior by comparing the safety perception of GPT-4 to the participants. We also evaluate the accuracy of natural language-based robot programming of decision-making requests. The results show that GPTAlly infers safety perception similarly to humans, and achieves an average of 80% of accuracy in decision-making, with few instances under 50%. Code available at: https://axtiop.github.io/GPTAlly/
-
E. Martin, S. Hasegawa, J. Solis, B. Macq, R. Ronsse, G. A. Garcia Ricardez, L. El Hafi*, and T. Taniguchi, "Integrating Multimodal Communication and Comprehension Evaluation during Human-Robot Collaboration for Increased Reliability of Foundation Model-based Task Planning Systems", in Proceedings of 2025 IEEE/SICE International Symposium on System Integration (SII 2025), Munich, Germany, Jan. 1, 2025. [International conference article, peer-reviewed.][*Corresponding author.][Accepted for presentation.]
Abstract
Foundation models provide the adaptability needed in robotics but often require explicit tasks or human verification due to potential unreliability in their responses, complicating human-robot collaboration (HRC). To enhance the reliability of such task-planning systems, we propose 1) an adaptive task-planning system for HRC that reliably performs non-predefined tasks implicitly instructed through HRC, and 2) an integrated system combining multimodal large language model (LLM)-based task planning with multimodal communication of human intention to increase the HRC success rate and comfort. The proposed system integrates GPT-4V for adaptive task planning and comprehension evaluation during HRC with multimodal communication of human intention through speech and deictic gestures. Four pick-and-place tasks of gradually increasing difficulty were used in three experiments, each evaluating a key aspect of the proposed system: task planning, comprehension evaluation, and multimodal communication. The quantitative results show that the proposed system can interpret implicitly instructed tabletop pick-and-place tasks through HRC, providing the next object to pick and the correct position to place it, achieving a mean success rate of 0.80. Additionally, the system can evaluate its comprehension of three of the four tasks with an average precision of 0.87. The qualitative results show that multimodal communication not only significantly enhances the success rate but also the feelings of trust and control, willingness to use again, and sense of collaboration during HRC.
-
S. Hasegawa, K. Murata, T. Ishikawa, Y. Hagiwara, A. Taniguchi, L. El Hafi, G. A. Garcia Ricardez, and T. Taniguchi, "大規模言語モデルによる複数ロボットの知識統合とタスク割当を用いた現場学習のコスト削減 (Reducing Cost of On-Site Learning by Multi-Robot Knowledge Integration and Task Decomposition via Large Language Models)", in Journal of the Robotics Society of Japan (JRSJ), ?. [Domestic journal article, peer-reviewed.][Published in Japanese.][Accepted for publication.]
Abstract
When robots are deployed in large environments such as hospitals and offices, they must learn place-object relationships in a short period of time. However, the amount of observational data required for multiple robots to perform object search and tidy-up tasks satisfactorily is often unclear a priori, making rapid knowledge acquisition necessary. Therefore, we propose a method in which each robot inputs its knowledge based on on-site learning of a spatial concept model into a large language model, GPT-4, to infer probabilistic action planning based on its predictions. We conducted simulations of object search tasks with multiple robots according to user instructions and evaluated the success score of each task for each iteration of spatial concept learning. As a result of the experiment, the proposed method achieved a high success score while reducing the amount of observational data by more than half compared to the baseline.
-
T. Sakaguchi, A. Taniguchi, Y. Hagiwara, L. El Hafi, S. Hasegawa, and T. Taniguchi, "Real-World Instance-specific Image Goal Navigation: Bridging Domain Gaps via Contrastive Learning", in Proceedings of 2024 IEEE International Conference on Robotic Computing (IRC 2024), Tokyo, Japan, Dec. 11, 2024. [International conference article, peer-reviewed.][Accepted for presentation.]
Abstract
Improving instance-specific image goal navigation (InstanceImageNav), which locates the identical object in a real-world environment from a query image, is essential for robotic systems to assist users in finding desired objects. The challenge lies in the domain gap between low-quality images observed by the moving robot, characterized by motion blur and low-resolution, and high-quality query images provided by the user. Such domain gaps could significantly reduce the task success rate but have not been the focus of previous work. To address this, we propose a novel method called Few-shot Cross-quality Instance-aware Adaptation (CrossIA), which employs contrastive learning with an instance classifier to align features between massive low- and few high-quality images. This approach effectively reduces the domain gap by bringing the latent representations of cross-quality images closer on an instance basis. Additionally, the system integrates an object image collection with a pre-trained deblurring model to enhance the observed image quality. Our method fine-tunes the SimSiam model, pre-trained on ImageNet, using CrossIA. We evaluated our method's effectiveness through an InstanceImageNav task with 20 different types of instances, where the robot identifies the same instance in a real-world environment as a high-quality query image. Our experiments showed that our method improves the task success rate by up to three times compared to the baseline, a conventional approach based on SuperGlue. These findings highlight the potential of leveraging contrastive learning and image enhancement techniques to bridge the domain gap and improve object localization in robotic applications. The project website is https://emergentsystemlabstudent.github.io/DomainBridgingNav/.
-
T. Sakaguchi, A. Taniguchi, Y. Hagiwara, L. El Hafi, S. Hasegawa, and T. Taniguchi, "Object Instance Retrieval in Assistive Robotics: Leveraging Fine-tuned SimSiam with Multi-View Images based on 3D Semantic Map", in Proceedings of 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024), Abu Dhabi, United Arab Emirates, Oct. 14, 2024. [International conference article, peer-reviewed.]
Abstract
Robots that assist humans in their daily lives should be able to locate specific instances of objects in an environment that match a user's desired objects. This task is known as instance-specific image goal navigation (InstanceImageNav), which requires a model that can distinguish different instances of an object within the same class. A significant challenge in robotics is that when a robot observes the same object from various 3D viewpoints, its appearance may differ significantly, making it difficult to recognize and locate accurately. In this paper, we introduce a method called SimView, which leverages multi-view images based on a 3D semantic map of an environment and self-supervised learning using SimSiam to train an instance-identification model on-site. The effectiveness of our approach was validated using a photorealistic simulator, Habitat Matterport 3D, created by scanning actual home environments. Our results demonstrate a 1.7-fold improvement in task accuracy compared with contrastive language-image pre-training (CLIP), a pre-trained multimodal contrastive learning method for object searching. This improvement highlights the benefits of our proposed fine-tuning method in enhancing the performance of assistive robots in InstanceImageNav tasks. The project website is https://emergentsystemlabstudent.github.io/MultiViewRetrieve/.
-
S. Hashimoto, T. Ishikawa, S. Hasegawa, A. Taniguchi, Y. Hagiwara, L. El Hafi, and T. Taniguchi, "Ownership Information Acquisition of Objects in the Environment by Active Question Generation with Multimodal Large Language Models and Probabilistic Generative Models", in Proceedings of 2024 SIGDIAL Workshop on Spoken Dialogue Systems for Cybernetic Avatars (SDS4CA 2024), pp. 1-1, Kyoto, Japan, Sep. 17, 2024. [International workshop abstract, non-peer-reviewed.]
Abstract
In daily life environments, such as a home or office, a robot that coexists with the user is required to perform various tasks through interaction with the environment and the user. Ownership information is important for tasks such as the robot bringing an object specified by the user. For example, if there are two identical looking cups in the living room, each cup may have its own owner. Knowing the owner of each cup in advance, the robot can identify "Bob's cup" and perform the task when the user gives the robot the verbal instruction "Bring me Bob's cup". Interaction with the user is effective for the robot to acquire invisible ownership information of objects in the environment. For example, when the robot finds an object, it may be able to acquire the visible attributes of the object, such as "the red cup" from its appearance. On the other hand, ownership information of the object, which depends on the user and the environment, such as whose cup it is, can be acquired through interaction with the user. However, when the robot learns such ownership information, it is burdensome for the user to interact passively with all objects in the environment, such as unilaterally instructing the robot about ownership information of each object. We propose a method that reduces the user's teaching burden and enables the robot to efficiently acquire ownership information of the object. The robot selects whether or not an object should be asked about its ownership information by utilizing the common sense knowledge of the multimodal large language model, GPT-4. It also selects objects to ask questions based on active inference that minimizes the expected free energy for ownership information. Then, a probabilistic generative model is constructed to learn ownership information of the object based on the location and attributes of the object in the environment and the user's answers obtained by question generation. We conduct experiments in a field simulating an actual laboratory to verify whether the robot can accurately learn ownership information of each object placed in the environment. We will also verify whether the proposed method using GPT-4 and active inference improves the learning efficiency and reduces the burden on the user compared to the comparison method.
-
T. Matsushima, R. Takanami, M. Kambara, Y. Noguchi, J. Arima, Y. Ikeda, K. Yanagida, K. Iwata, S. Hasegawa, L. El Hafi, K. Yamao, K. Isomoto, N. Yamaguchi, R. Kobayashi, T. Shiba, Y. Yano, A. Mizutani, H. Tamukoh, T. Horii, K. Sugiura, T. Taniguchi, Y. Matsuo, and Y. Iwasawa, "HSRT-X: コミュニティを活用したロボット基盤モデルの構築", in Proceedings of 2024 Annual Conference of the Robotics Society of Japan (RSJ 2024), ref. RSJ2024AC1D2-01, pp. 1-4, Osaka, Japan, Sep. 3, 2024. [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
Abstract
To build robot foundation models, including large end-to-end robot policy models usable across diverse environments and tasks, the HSRT-X project leverages the HSR Community, the user community of the HSR mobile manipulator, to collect datasets at multiple sites and train models. This paper introduces the current status of the project.
-
S. Hasegawa, K. Murata, T. Ishikawa, Y. Hagiwara, A. Taniguchi, L. El Hafi, G. A. Garcia Ricardez, and T. Taniguchi, "マルチモーダル大規模言語モデルによる複数ロボットの知識統合とタスク割当を用いた現場学習のコスト削減", in Proceedings of 2024 Annual Conference of the Robotics Society of Japan (RSJ 2024), ref. RSJ2024AC1D2-02, pp. 1-4, Osaka, Japan, Sep. 3, 2024. [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
Abstract
When deploying robots in large environments such as hospitals and offices, it is important for the robots to quickly learn the relationships between objects and places. The amount of observational data required for multiple robots to perform object search and tidy-up tasks is unclear, so rapid knowledge acquisition is needed. We therefore propose a method in which each robot inputs its on-site knowledge, based on a spatial concept model, into GPT-4 and forms probabilistic action plans based on its predictions. In a simulator, multiple robots performed object search according to user instructions, and the task success score was evaluated for each number of place-learning iterations. The experimental results show that the proposed method achieved a high success score while reducing the amount of observational data by more than half compared to the baseline.
-
K. Murata, S. Hasegawa, T. Ishikawa, Y. Hagiwara, A. Taniguchi, L. El Hafi, and T. Taniguchi, "ロボット間の現場知識の差を考慮した基盤モデルによる物体探索の言語指示におけるタスク分解と割当", in Proceedings of 2024 Annual Conference of the Robotics Society of Japan (RSJ 2024), ref. RSJ2024AC3D2-05, pp. 1-4, Osaka, Japan, Sep. 3, 2024. [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
Abstract
In home environments, a robot may be given language instructions that include multiple target objects, such as "Find the banana and the cup." For multiple robots to divide the work and execute such instructions, it is important to decompose the task into subtasks and correctly assign them to each robot. In this work, the proposed method decomposes tasks into subtasks and assigns the subtasks to multiple robots. We propose a method in which GPT-4 performs task decomposition and assignment using a spatial concept model that can infer place names and object arrangements, and we verified whether subtasks could be assigned appropriately. The experimental results show that by leveraging GPT-4 and the spatial concept model for task decomposition and assignment, the proposed method achieved nearly twice as many successful task assignments as the baseline method.
-
T. Nakashima, S. Otake, A. Taniguchi, K. Maeyama, L. El Hafi, T. Taniguchi, and H. Yamakawa, "Hippocampal Formation-inspired Global Self-Localization: Quick Recovery from the Kidnapped Robot Problem from an Egocentric Perspective", in Frontiers in Computational Neuroscience, Research Topic on Brain-Inspired Intelligence: The Deep Integration of Brain Science and Artificial Intelligence, vol. 18, pp. 1-15, Jul. 18, 2024. DOI: 10.3389/fncom.2024.1398851 [International journal article, peer-reviewed.]
Abstract
It remains difficult for mobile robots to continue accurate self-localization when they are suddenly teleported to a location that is different from their beliefs during navigation. Incorporating insights from neuroscience into developing a spatial cognition model for mobile robots may make it possible to acquire the ability to respond appropriately to changing situations, similar to living organisms. Recent neuroscience research has shown that during teleportation in rat navigation, neural populations of place cells in the cornu ammonis-3 region of the hippocampus, which are sparse representations of each other, switch discretely. In this study, we construct a spatial cognition model using brain reference architecture-driven development, a method for developing brain-inspired software that is functionally and structurally consistent with the brain. The spatial cognition model was realized by integrating the recurrent state—space model, a world model, with Monte Carlo localization to infer allocentric self-positions within the framework of neuro-symbol emergence in the robotics toolkit. The spatial cognition model, which models the cornu ammonis-1 and -3 regions with each latent variable, demonstrated improved self-localization performance of mobile robots during teleportation in a simulation environment. Moreover, it was confirmed that sparse neural activity could be obtained for the latent variables corresponding to cornu ammonis-3. These results suggest that spatial cognition models incorporating neuroscience insights can contribute to improving the self-localization technology for mobile robots. The project website is https://nakashimatakeshi.github.io/HF-IGL/.
-
B. Bastin, "GPTAlly: A Safety-oriented System for Human-Robot Collaboration based on Foundation Models", in Master's thesis, Université catholique de Louvain (UCLouvain), Louvain-la-Neuve, Belgium, Jun. 2024. [Master's thesis.][Supervised by B. Macq, R. Ronsse, G. A. Garcia Ricardez, L. El Hafi, and J. Solis.]
Abstract
We are aiming for Society 5.0, which emphasizes improving workplace quality of life through AI and robotics. However, current robots lack human-like situational understanding and often rely on pre-programmed tasks or supervised learning. Additionally, there is a need for safety metrics that consider users' subjective safety perceptions. This thesis introduces GPTAlly, a system for safe human-robot collaboration using Large Language Models (LLMs) and Visual Language Models (VLMs). LLMs help infer users' subjective safety perceptions in collaborative tasks, influencing a Safety Index algorithm that adjusts safety evaluations. The system ensures robots stop to prevent harmful collisions and uses an LLM-based coding paradigm to determine subsequent actions, either autonomously or as per user preferences. The actions are implemented by an LLM, which shapes robotic arm trajectories by interpreting the user's natural language instructions to suggest 3D poses. A user study compares safety perception scaling factors from GPT-4 with participants' estimates. The study also evaluates user satisfaction with the changes in robot behavior. The accuracy of the streamlined coding paradigm is evaluated through contextual experiments by varying the number of conditions processed by the LLM and paraphrasing the conditions. The satisfaction with the trajectories shaped from 3D poses is assessed through another user study. The study finds that LLMs effectively integrate human safety perceptions. GPT-4's estimations of the scaling factors closely match the user responses, and participants express satisfaction with behavior changes. However, the coding paradigm's contextual accuracy can be below 50%. Finally, the robotic arm trajectories found that users preferred trajectories shaped by their natural language inputs over uninfluenced ones. Codebase available at: https://axtiop.github.io/GPTAlly
-
E. Martin, "Task Planning System using Foundation Models in Multimodal Human-Robot Collaboration", in Master's thesis, Université catholique de Louvain (UCLouvain), Louvain-la-Neuve, Belgium, Jun. 2024. [Master's thesis.][Supervised by L. El Hafi, G. A. Garcia Ricardez, B. Macq, R. Ronsse, and J. Solis.]
Abstract
Society 5.0, the society Japan aspires to, aims to create a cyber-physical system where humans and robots collaborate. To this end, both should be able to work together on the same tasks. In conventional robotics, robots are trained and specialized to perform specific tasks. While they perform well on this pre-defined set of tasks, these models require extensive data gathering and a time-consuming process. Moreover, when facing unknown environments, they experience a decrease in performance due to their non-adaptability to unforeseen situations. Additionally, if they are part of the same working team, the robots must understand and interpret human intentions. However, most of the past proposed intention recognition methods also lack flexibility and contextualization capability. To tackle this, this thesis proposes 1) a dynamic task planning system capable of performing non-predefined tasks, and 2) a framework that combines automatic task planning with human multimodal intention communication, enhancing the success of the task and human well-being (e.g., trust, willingness to use the system again). In this regard, there have been recent improvements in zero-shot learning in Human-Robot Collaboration using large pre-trained models. Because they were trained on large amounts of data, these models can apply their knowledge to tasks beyond their training data. Visual Language Models have recently demonstrated their ability to understand and analyze images. For this reason, these models are widely used as the robot’s reasoning module. Therefore, the system proposed in this thesis is divided into three modules: 1) automatic task planning computed using GPT-4V, 2) use of GPT-4V to compute a confidence level that reflects its comprehension of the task, and 3) a multimodal communication module to correct the automatic task planning in case of failure. Firstly, automatic task planning is achieved by feeding the Visual Language Model with an image of the task currently being performed. The VLM is then asked to determine the next step to pursue the task. The confidence level is defined as a number between 0 and 10, reflecting the robot’s comprehension of the task. Multimodal communication is achieved using deictic movements and speech communication. The results show that: 1) GPT-4V is able to understand simple tabletop pick-and-place tasks and provide the next object to pick and the corresponding placement position, 2) GPT-4V is able to evaluate its comprehension for three of the four implemented tasks, and 3) multimodal communication integrated into the automatic system enhances, in the tested task, both the success rate and human well-being.
-
C. Tornberg, L. El Hafi*, P. M. Uriguen Eljuri, M. Yamamoto, G. A. Garcia Ricardez, J. Solis, and T. Taniguchi, "Mixed Reality-based 6D-Pose Annotation System for Robot Manipulation in Retail Environments", in Proceedings of 2024 IEEE/SICE International Symposium on System Integration (SII 2024), pp. 1425-1432, Ha Long, Vietnam, Jan. 8, 2024. DOI: 10.1109/SII58957.2024.10417443 [International conference article, peer-reviewed.][*Corresponding author.]
Abstract
Robot manipulation in retail environments is a challenging task due to the need for large amounts of annotated data for accurate 6D-pose estimation of items. Onsite data collection, additional manual annotation, and model fine-tuning are often required when deploying robots in new environments, as varying lighting conditions, clutter, and occlusions can significantly diminish performance. Therefore, we propose a system to annotate the 6D pose of items using mixed reality (MR) to enhance the robustness of robot manipulation in retail environments. Our main contribution is a system that can display 6D-pose estimation results of a trained model from multiple perspectives in MR, and enable onsite (re-)annotation of incorrectly inferred item poses using hand gestures. The proposed system is compared to a PC-based annotation system using a mouse and the robot camera's point cloud in an extensive quantitative experiment. Our experimental results indicate that MR can increase the accuracy of pose annotation, especially by reducing position errors.
-
P. Zhu, L. El Hafi*, and T. Taniguchi, "Visual-Language Decision System through Integration of Foundation Models for Service Robot Navigation", in Proceedings of 2024 IEEE/SICE International Symposium on System Integration (SII 2024), pp. 1288-1295, Ha Long, Vietnam, Jan. 8, 2024. DOI: 10.1109/SII58957.2024.10417171 [International conference article, peer-reviewed.][*Corresponding author.]
Abstract
This study aims to build a system that bridges the gap between robotics and environmental understanding by integrating various foundation models. While current visual-language models (VLMs) and large language models (LLMs) have demonstrated robust capabilities in image recognition and language comprehension, challenges remain in integrating them into practical robotic applications. Therefore, we propose a visual-language decision (VLD) system that allows a robot to autonomously analyze its surroundings using three VLMs (CLIP, OFA, and PaddleOCR) to generate semantic information. This information is further processed using the GPT-3 LLM, which allows the robot to make judgments during autonomous navigation. The contribution is twofold: 1) We show that integrating CLIP, OFA, and PaddleOCR into a robotic system can generate task-critical information in unexplored environments; 2) We explore how to effectively use GPT-3 to match the results generated by specific VLMs and make navigation decisions based on environmental information. We also implement a photorealistic training environment using Isaac Sim to test and validate the proposed VLD system in simulation. Finally, we demonstrate VLD-based real-world navigation in an unexplored environment using a TurtleBot3 robot equipped with a lidar and an RGB camera.
-
A. Kanechika, L. El Hafi*, A. Taniguchi, Y. Hagiwara, and T. Taniguchi, "Interactive Learning System for 3D Semantic Segmentation with Autonomous Mobile Robots", in Proceedings of 2024 IEEE/SICE International Symposium on System Integration (SII 2024), pp. 1274-1281, Ha Long, Vietnam, Jan. 8, 2024. DOI: 10.1109/SII58957.2024.10417237 [International conference article, peer-reviewed.][*Corresponding author.]
Abstract
Service robots operating in unfamiliar environments require capabilities for autonomous object recognition and learning from user interactions. However, present semantic segmentation methods, crucial for such tasks, often demand large datasets and costly annotations to achieve accurate inference. In addition, they cannot handle all possible objects or environmental variations without a large additional number of images and annotations. Therefore, this study introduces a learning system for semantic segmentation that combines 3D semantic mapping with interactions between an autonomous mobile robot and a user. We show that the proposed system can: 1) autonomously construct 3D semantic maps using an autonomous mobile robot, 2) improve the prediction accuracy of models pre-trained by supervised and weakly supervised learning in new environments, even without interaction, and 3) more accurately predict new classes of objects with a small number of additional coarse annotations obtained through interaction. Results obtained from experiments conducted in a real-world setting using models pre-trained on the NYU, VOC, and COCO datasets demonstrated an improvement in semantic segmentation accuracy when using our proposed system.
-
S. Hasegawa, A. Taniguchi, Y. Hagiwara, L. El Hafi, and T. Taniguchi, "Integrating Probabilistic Logic and Multimodal Spatial Concepts for Efficient Robotic Object Search in Home Environments", in SICE Journal of Control, Measurement, and System Integration (JCMSI), Virtual Issue on IEEE/SICE SII 2023, vol. 16, no. 1, pp. 400-422, Dec. 26, 2023. DOI: 10.1080/18824889.2023.2283954 [International journal article, peer-reviewed.]
Abstract
Our study introduces a novel approach that combined probabilistic logic and multimodal spatial concepts to enable a robot to efficiently acquire place-object relationships in a new home environment with few learning iterations. By leveraging probabilistic logic, which employs predicate logic with probability values, we represent common-sense knowledge of the place-object relationships. The integration of logical inference and cross-modal inference to calculate conditional probabilities across different modalities enables the robot to infer object locations even when their likely locations are undefined. To evaluate the effectiveness of our method, we conducted simulation experiments and compared the results with three baselines: multimodal spatial concepts only, common-sense knowledge only, and common-sense knowledge and multimodal spatial concepts combined. By comparing the number of room visits required by the robot to locate 24 objects, we demonstrated the improved performance of our approach. For search tasks including objects whose locations were undefined, the findings demonstrate that our method reduced the learning cost by a factor of 1.6 compared to the baseline methods. Additionally, we conducted a qualitative analysis in a real-world environment to examine the impact of integrating the two inferences and identified the scenarios that influence changes in the task success rate.
-
G. A. Garcia Ricardez, C. Tornberg, L. El Hafi, J. Solis, and T. Taniguchi, "Toward Safe and Efficient Human-Robot Teams: Mixed Reality-based Robot Motion and Safety Index Visualization", in Abstract Booklet of 16th IFToMM World Congress (WC 2023), pp. 53-54, Tokyo, Japan, Nov. 5, 2023. [International conference abstract, peer-reviewed.]
-
Y. Hagiwara, S. Hasegawa, A. Oyama, A. Taniguchi, L. El Hafi, and T. Taniguchi, "現場環境で学習した知識に基づく曖昧な発話からの生活物理支援タスク", in Proceedings of 2023 Annual Conference of the Robotics Society of Japan (RSJ 2023), ref. RSJ2023AC1J1-05, pp. 1-4, Sendai, Japan, Sep. 9, 2023. [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
Abstract
In home environments, ambiguous language instructions such as "Get that" or "Bring me the cup" are commonly used. These instructions do not explicitly include information about which object to bring or where to get it from. This paper describes two methods that supplement the missing information based on knowledge the robot has learned in its deployment environment, realizing physical daily-life support tasks from ambiguous language instructions. One is a method for exophora resolution of language instructions containing demonstratives using on-site multimodal information. The other is a planning method that leverages on-site knowledge acquired through a spatial concept model together with a large language model.
-
S. Hasegawa, M. Ito, R. Yamaki, T. Sakaguchi, Y. Hagiwara, A. Taniguchi, L. El Hafi, and T. Taniguchi, "生活支援ロボットの行動計画のための大規模言語モデルと場所概念モデルの活用", in Proceedings of 2023 Annual Conference of the Robotics Society of Japan (RSJ 2023), ref. RSJ2023AC1K3-06, pp. 1-4, Sendai, Japan, Sep. 9, 2023. [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
Abstract
When a domestic service robot assists a user, it is important for the robot to understand the user's language instructions and take actions appropriate to the situation. To enable the robot to handle diverse instructions, we propose a system that provides ChatGPT with on-site knowledge constructed by a spatial concept model and plans actions based on that knowledge. In a simulator, we conducted experiments in which the robot searched for objects from user instructions, evaluating the number of room visits required until the robot found each object. The experiments showed that the proposed system can reduce the number of room visits during search compared to the baseline.
-
G. A. Garcia Ricardez, T. Wakayama, S. Ikemura, E. Fujiura, P. M. Uriguen Eljuri, H. Ikeuchi, M. Yamamoto, L. El Hafi, and T. Taniguchi, "Toward Resilient Manipulation of Food Products: Analysis of 6D-Pose Estimation at the Future Convenience Store Challenge 2022", in Proceedings of 2023 IEEE International Conference on Automation Science and Engineering (CASE 2023), pp. 1-6, Auckland, New Zealand, Aug. 26, 2023. DOI: 10.1109/CASE56687.2023.10260506 [International conference article, peer-reviewed.]
Abstract
Service robots, the class of robots that are designed to assist humans in their daily lives, are needed in the retail industry to compensate for the labor shortage. To foster innovation, the Future Convenience Store Challenge was created, where robotic systems for the manipulation of food products are tasked to dispose of expired products and replenish the shelves. We, as team NAIST-RITS-Panasonic, have developed a mobile manipulator with which we have obtained 1st place in the past three editions of the challenge. In the last edition, we manipulated the five types of items without fiducial markers or customized packaging using a suction-based end effector. In this paper, we evaluate the accuracy of the 6D-pose estimation as well as its effect on the grasping success rate by 1) comparing the 6D-pose estimation results with the ground truth, and 2) evaluating the grasping success rate with the estimated pose during and after the competition. The results show that the 6D-pose estimation error has a significant effect on the grasping success rate.
-
A. Taniguchi, Y. Tabuchi, T. Ishikawa, L. El Hafi, Y. Hagiwara, and T. Taniguchi, "Active Exploration based on Information Gain by Particle Filter for Efficient Spatial Concept Formation", in RSJ Advanced Robotics (AR), Special Issue on World Models and Predictive Coding in Robotics (Part I), vol. 37, no. 13, pp. 840-870, Jul. 3, 2023. DOI: 10.1080/01691864.2023.2225175 [International journal article, peer-reviewed.]
Abstract
Autonomous robots need to learn the categories of various places by exploring their environments and interacting with users. However, preparing training datasets with linguistic instructions from users is time-consuming and labor-intensive. Moreover, effective exploration is essential for appropriate concept formation and rapid environmental coverage. To address this issue, we propose an active inference method, referred to as spatial concept formation with information gain-based active exploration (SpCoAE) that combines sequential Bayesian inference using particle filters and information gain-based destination determination in a probabilistic generative model. This study interprets the robot's action as a selection of destinations to ask the user, "What kind of place is this?" in the context of active inference. This study provides insights into the technical aspects of the proposed method, including active perception and exploration by the robot, and how the method can enable mobile robots to learn spatial concepts through active exploration. Our experiment demonstrated the effectiveness of the SpCoAE in efficiently determining a destination for learning appropriate spatial concepts in home environments.
-
S. Hasegawa, R. Yamaki, A. Taniguchi, Y. Hagiwara, L. El Hafi, and T. Taniguchi, "大規模言語モデルと場所概念モデルの統合による未観測物体の語彙を含んだ言語指示理解 (Understanding Language Instructions that Include the Vocabulary of Unobserved Objects by Integrating a Large Language Model and a Spatial Concept Model)", in Proceedings of 2023 Annual Conference of the Japanese Society for Artificial Intelligence (JSAI 2023), ref. 1Q4-OS-7b-03, pp. 1-4, Kumamoto, Japan, Jun. 7, 2023. DOI: 10.11517/pjsai.JSAI2023.0_1Q4OS7b03 [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
Abstract
For a robot to assist people in home environments, it is important to handle the vocabulary of unobserved objects while learning the knowledge of places. It is assumed that there exist objects that the robot did not observe through its sensors during learning. For such a case, the robot is expected to perform household tasks on language instructions that include the vocabulary of these objects. We propose a method that integrates a large language model and a spatial concept model to enable the robot to understand language instructions that include the vocabulary of unobserved objects while learning places. Even if the objects that the user instructed the robot to search for are not included in a training dataset during learning, the number of room visits during object search can be expected to reduce by combining the inference of these models. We validated our method in an experiment in which a robot searched for unobserved objects in a simulated environment. The results showed that our proposed method could reduce the number of room visits during the search compared to the baseline method.
-
L. El Hafi, Y. Zheng, H. Shirouzu, T. Nakamura, and T. Taniguchi, "Serket-SDE: A Containerized Software Development Environment for the Symbol Emergence in Robotics Toolkit", in Proceedings of 2023 IEEE/SICE International Symposium on System Integration (SII 2023), pp. 1-6, Atlanta, United States, Jan. 17, 2023. DOI: 10.1109/SII55687.2023.10039424 [International conference article, peer-reviewed.]
Abstract
The rapid deployment of intelligent robots to perform service tasks has become an increasingly complex challenge for researchers due to the number of disciplines and skills involved. Therefore, this paper introduces Serket-SDE, a containerized Software Development Environment (SDE) for the Symbol Emergence in Robotics Toolkit (Serket) that relies on open-source technologies to build cognitive robotic systems from multimodal sensor observations. The main contribution of Serket-SDE is an integrated framework that allows users to rapidly compose, scale, and deploy probabilistic generative models with robots. The description of Serket-SDE is accompanied by demonstrations of unsupervised multimodal categorizations using a mobile robot in various simulation environments. Further extensions of the Serket-SDE framework are discussed in conclusion based on the demonstrated results.
-
S. Hasegawa, A. Taniguchi, Y. Hagiwara, L. El Hafi, and T. Taniguchi, "Inferring Place-Object Relationships by Integrating Probabilistic Logic and Multimodal Spatial Concepts", in Proceedings of 2023 IEEE/SICE International Symposium on System Integration (SII 2023), pp. 1-8, Atlanta, United States, Jan. 17, 2023. DOI: 10.1109/SII55687.2023.10039318 [International conference article, peer-reviewed.]
Abstract
We propose a novel method that integrates probabilistic logic and multimodal spatial concepts to enable a robot to acquire the relationships between places and objects in a new environment with a few learning times. Using predicate logic with probability values (i.e., probabilistic logic) to represent commonsense knowledge of place-object relationships, we combine logical inference using probabilistic logic with the cross-modal inference that can calculate the conditional probabilities of other modalities given one modality. This allows the robot to infer the place of the object to find even when it does not know the likely place of the object in the home environment. We conducted experiments in which a robot searched for daily objects, including objects with undefined places, in a simulated home environment using four approaches: 1) multimodal spatial concepts only, 2) commonsense knowledge only, 3) commonsense knowledge and multimodal spatial concepts, and 4) probabilistic logic and multimodal spatial concepts (proposed). We confirmed the effectiveness of the proposed method by comparing the number of place visits it took for the robot to find all the objects. We also observed that our proposed approach reduces the on-site learning cost by a factor of 1.6 over the three baseline methods when the robot performs the task of finding objects with undefined places in a new home environment.
-
H. Nakamura, L. El Hafi*, A. Taniguchi, Y. Hagiwara, and T. Taniguchi, "Multimodal Object Categorization with Reduced User Load through Human-Robot Interaction in Mixed Reality", in Proceedings of 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022), pp. 2143-2150, Kyoto, Japan, Oct. 23, 2022. DOI: 10.1109/IROS47612.2022.9981374 [International conference article, peer-reviewed.][*Corresponding author.]
Abstract
Enabling robots to learn from interactions with users is essential to perform service tasks. However, as a robot categorizes objects from multimodal information obtained by its sensors during interactive onsite teaching, the inferred names of unknown objects do not always match the human user's expectation, especially when the robot is introduced to new environments. Confirming the learning results through natural speech interaction with the robot often puts an additional burden on the user who can only listen to the robot to validate the results. Therefore, we propose a human-robot interface to reduce the burden on the user by visualizing the inferred results in mixed reality (MR). In particular, we evaluate the proposed interface on the system usability scale (SUS) and the NASA task load index (NASA-TLX) with three experimental object categorization scenarios based on multimodal latent Dirichlet allocation (MLDA) in which the robot: 1) does not share the inferred results with the user at all, 2) shares the inferred results through speech interaction with the user (baseline), and 3) shares the inferred results with the user through an MR interface (proposed). We show that providing feedback through an MR interface significantly reduces the temporal, physical, and mental burden on the human user compared to speech interaction with the robot.
-
G. A. Garcia Ricardez, P. M. Uriguen Eljuri, Y. Kamemura, S. Yokota, N. Kugou, Y. Asama, Z. Wang, H. Kumamoto, K. Yoshimoto, W. Y. Chan, T. Nagatani, P. Tulathum, B. Usawalertkamol, L. El Hafi, H. Ikeuchi, M. Yamamoto, J. Takamatsu, T. Taniguchi, and T. Ogasawara, "Autonomous Service Robot for Human-aware Restock, Straightening and Disposal Tasks in Retail Automation", in RSJ Advanced Robotics (AR), Special Issue on Service Robot Technology: Selected Papers from WRS 2020 (Part I), vol. 36, no. 17-18, pp. 936-950, Sep. 17, 2022. DOI: 10.1080/01691864.2022.2109429 [International journal article, peer-reviewed.]
Abstract
The workforce shortage in the service industry, recently highlighted by the pandemic, has increased the need for automation. We propose an autonomous robot to fulfill this purpose. Our mobile manipulator includes an extendable and compliant end effector design, as well as a custom-made automated shelf, and it is capable of manipulating food products such as lunch boxes while traversing narrow spaces and reacting to human interventions. We benchmarked the solution in the international robotics competition Future Convenience Store Challenge (FCSC), where we obtained first place in the 2020 edition, as well as in a laboratory setting, both situated in a convenience store scenario. We report the results evaluated in terms of the FCSC 2020 score and further discuss the real-world applicability of the current system and open challenges.
-
T. Wakayama, E. Fujiura, M. Yamaguchi, N. Yoshida, T. Inoue, H. Ikeuchi, M. Yamamoto, L. El Hafi, G. A. Garcia Ricardez, J. Takamatsu, T. Taniguchi, and T. Ogasawara, "Versatile Cleaning Service Robot based on a Mobile Manipulator with Tool Switching for Liquids and Garbage Removal in Restrooms", in RSJ Advanced Robotics (AR), Special Issue on Service Robot Technology: Selected Papers from WRS 2020 (Part I), vol. 36, no. 17-18, pp. 967-981, Sep. 17, 2022. DOI: 10.1080/01691864.2022.2109430 [International journal article, peer-reviewed.]
Abstract
In recent years, the labor shortage has become a significant problem in Japan and other countries due to aging societies. However, service robots can play a decisive role in relieving human workers by performing various household and assistive tasks. Restroom cleaning is one such challenging task that involves motion planning in a constrained restroom setting. In this study, we propose a mobile manipulator to perform various tasks related to restroom cleaning. Our key contributions include the system integration of multiple tools on a high-DoF arm mounted on a mobile, omni-directional platform capable of versatile service cleaning with extended reachability. We evaluate the performance of our system with the competition setting used for the restroom cleaning task of the Future Convenience Store Challenge at the World Robot Summit 2020, where we obtained the 1st Place. The proposed system successfully completed all the competition tasks within the time limit and could remove the liquid with a removal rate of 96%. The proposed system could also dispose of most garbage, achieving an average garbage disposal rate of 90%. Further experiments confirmed the scores obtained in the competition with an even higher liquid removal rate of 98%.
-
P. M. Uriguen Eljuri, Y. Toramatsu, K. Maeyama, L. El Hafi, and T. Taniguchi, "Software Development Environment to Collect Sensor and Robot Data for Imitation Learning of a Pseudo Cranial Window Task", in Proceedings of 2022 Annual Conference of the Robotics Society of Japan (RSJ 2022), ref. RSJ2022AC4A2-07, pp. 1-4, Tokyo, Japan, Sep. 5, 2022. [Domestic conference article, non-peer-reviewed.]
Abstract
The use of AI in robotics has become more common in efforts to enable robots to learn and execute tasks in a manner similar to humans. To teach the robot how to do a task, we must record multiple samples from an expert. This paper introduces our containerized software development environment that can be quickly deployed to collect and extract data from a robot during a teleoperation task. This environment can be deployed on multiple computers, so a user can extract and process the collected data while the expert keeps recording sample data. The environment was tested by recording multiple streams of sensor information while an expert performed a pseudo cranial window task.
-
S. Hasegawa, Y. Hagiwara, A. Taniguchi, L. El Hafi, and T. Taniguchi, "確率論理と場所概念を結合したモデルによる場所の学習コストの削減 (Reducing the Cost of Learning Places via a Model that Integrates Probabilistic Logic and Spatial Concept)", in Proceedings of 2022 Annual Conference of the Japanese Society for Artificial Intelligence (JSAI 2022), ref. 1N5-OS-10b-04, pp. 1-4, Kyoto, Japan, Jun. 14, 2022. DOI: 10.11517/pjsai.JSAI2022.0_1N5OS10b04 [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
Abstract
We propose a method that integrates probabilistic logic and spatial concepts to enable a robot to acquire knowledge of the relationships between objects and places in a new environment with a small number of learning trials. By combining logical inference with prior knowledge and cross-modal inference within the spatial concept model, the robot can infer the place of an object even when the probability of its existence is a priori unknown. We conducted experiments in which a robot searched for objects in a simulation environment using four methods: 1) spatial concepts only, 2) prior knowledge only, 3) spatial concepts and prior knowledge, and 4) probabilistic logic and spatial concepts (proposed). We confirmed the effectiveness of the proposed method by comparing the number of place visits it took for the robot to find all the objects. We observed that the robot could find the objects faster using the proposed method.
-
G. A. Garcia Ricardez, L. El Hafi, H. Ikeuchi, M. Yamamoto, J. Takamatsu, T. Taniguchi, and T. Ogasawara, "Team NAIST-RITS-Panasonic at the Future Convenience Store Challenge: Our Approach from 2018 to 2021", in Journal of the Society of Instrument and Control Engineers (SICE), Special Issue on WRS Future Convenience Store Challenge, vol. 61, no. 6, pp. 422-425, Jun. 10, 2022. DOI: 10.11499/sicejl.61.422 [Domestic journal article, non-peer-reviewed.]
Abstract
The paper describes the system development approach followed by the researchers and engineers of team NAIST-RITS-Panasonic (Japan) during their participation in the Future Convenience Store Challenge 2018, 2019 trials and 2020 (held in 2021). This international competition focuses on the development of robotic capabilities to execute complex tasks for retail automation. The team built four different robots with multiple end effectors, as well as different technologies for mobile manipulation, vision, and HRI. The diversity of the tasks and the competitiveness of the challenge allowed us to shape our philosophy and to delineate our path to innovation.
-
L. El Hafi, G. A. Garcia Ricardez, F. von Drigalski, Y. Inoue, M. Yamamoto, and T. Yamamoto, "Software Development Environment for Collaborative Research Workflow in Robotic System Integration", in RSJ Advanced Robotics (AR), Special Issue on Software Framework for Robot System Integration, vol. 36, no. 11, pp. 533-547, Jun. 3, 2022. DOI: 10.1080/01691864.2022.2068353 [International journal article, peer-reviewed.]
Abstract
Today's robotics involves a large range of knowledge and skills across many disciplines. This issue has recently come to light as robotics competitions attract more talented teams to tackle unsolved problems. Although the tasks are challenging, the preparation cycles are usually short. The teams involved, ranging from academic institutions to small startups and large companies, need to develop and deploy their solutions with agility. Therefore, this paper introduces a containerized Software Development Environment (SDE) based on a collaborative workflow relying on open-source technologies for robotic system integration and deployment. The proposed SDE enables the collaborators to focus on their individual expertise and rely on automated tests and unattended simulations. The analysis of the adoption of the proposed SDE shows that several research institutions successfully deployed it in multiple international competitions with various robotic platforms.
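As a hedged illustration of how such a containerized environment might be driven programmatically, the snippet below starts a development container with the Docker SDK for Python; the image name, launch command, and mounted paths are placeholders and not the actual SDE configuration described in the paper.

```python
# Illustration only: launching a containerized robotics development
# environment with the Docker SDK for Python (docker-py). The image name,
# command, volume paths, and environment variables are hypothetical.
import docker

client = docker.from_env()
container = client.containers.run(
    image="example/robotics-sde:latest",               # hypothetical image
    command="roslaunch example_bringup sim.launch",     # hypothetical launch file
    detach=True,
    environment={"ROS_DOMAIN_ID": "42"},
    volumes={"/home/user/catkin_ws": {"bind": "/root/catkin_ws", "mode": "rw"}},
    network_mode="host",
)
print(container.short_id)  # handle to the running development container
```

In a collaborative workflow like the one described above, the same container image would typically also be used by the automated tests and unattended simulations, so every collaborator runs against an identical environment.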
-
T. Wakayama, E. Fujiura, M. Yamaguchi, H. Ikeuchi, M. Yamamoto, L. El Hafi, and G. A. Garcia Ricardez, "掃除ツール取り換え機能を有する多種類ゴミ廃棄可能なトイレ清掃ロボットの開発 World Robot Summit 2020 Future Convenience Store Challengeを活用した実用システム開発の試み (Development of the Restroom Cleaning Robot that Can Dispose of Various Types of Garbage with a Cleaning Tool Change Function: Attempt to Develop a Practical System utilizing World Robot Summit 2020 Future Convenience Store Challenge)", in Proceedings of 2022 JSME Conference on Robotics and Mechatronics (ROBOMECH 2022), no. 22-2, ref. 2P2-T03, pp. 1-4, Sapporo, Japan, Jun. 1, 2022. DOI: 10.1299/jsmermd.2022.2P2-T03 [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
Abstract
We have developed a ROS-based mobile manipulator for restroom cleaning with the capability to recognize garbage of various types (pieces of paper, cups, and liquids) and to select the most appropriate of three cleaning tools (suction to hold, vacuuming, and mopping) to effectively clean a restroom. Upon deployment at the Future Convenience Store Challenge of the World Robot Summit 2020, we obtained the 1st Place in the Restroom Cleaning Task with an almost perfect score (96%).
-
A. Taniguchi, L. El Hafi, Y. Hagiwara, and T. Taniguchi, "記号創発ロボティクスにおける場所概念の形成と応用 (Spatial Concept Formation for Symbol Emergence in Robotics and its Application)", in ISCIE Systems, Control and Information, Special Issue on Spatial Cognition and Semantic Understanding in Mobile Robots, vol. 66 no. 4, pp. 133-138, Apr. 15, 2022. DOI: 10.11509/isciesci.66.4_133 [Domestic journal article, non-peer-reviewed.][Published in Japanese.]
-
Y. Katsumata, A. Kanechika, A. Taniguchi, L. El Hafi, Y. Hagiwara, and T. Taniguchi, "Map Completion from Partial Observation using the Global Structure of Multiple Environmental Maps", in RSJ Advanced Robotics (AR), Special Issue on Symbol Emergence in Robotics and Cognitive Systems (II), vol. 36, no. 5-6, pp. 279-290, Mar. 19, 2022. DOI: 10.1080/01691864.2022.2029762 [International journal article, peer-reviewed.]
Abstract
By using the spatial structure of various indoor environments as prior knowledge, a robot can construct environment maps more efficiently. Autonomous mobile robots generally apply simultaneous localization and mapping (SLAM) methods to understand the reachable area in newly visited environments. However, conventional mapping approaches are limited in that they only consider sensor observations and control signals to estimate the current environment map. This paper proposes a novel SLAM method, map completion network-based SLAM (MCN-SLAM), based on a probabilistic generative model incorporating deep neural networks for map completion. These map completion networks are primarily trained in the framework of generative adversarial networks (GANs) to extract the global structure of large amounts of existing map data. We show in experiments that the proposed method can estimate the environment map 1.3 times better than previous SLAM methods in the situation of partial observation.
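The sketch below shows, purely as an illustration, what a generator/discriminator pair for occupancy-grid map completion could look like in PyTorch; the architecture, grid size, and layer widths are assumptions and do not reproduce the networks used in MCN-SLAM.

```python
# Minimal, illustrative GAN components for occupancy-grid map completion.
# The layer sizes and 64x64 grid below are arbitrary choices for the example.
import torch
import torch.nn as nn

class MapCompletionGenerator(nn.Module):
    """Takes a partially observed occupancy grid and predicts a completed one."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, partial_map):
        return self.net(partial_map)

class MapDiscriminator(nn.Module):
    """Scores whether a completed map looks like a real environment map."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Flatten(),
            nn.LazyLinear(1),
        )

    def forward(self, grid):
        return self.net(grid)

# Example forward pass on a 64x64 partial occupancy grid.
partial = torch.rand(1, 1, 64, 64)
completed = MapCompletionGenerator()(partial)
score = MapDiscriminator()(completed)
print(completed.shape, score.shape)
```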
-
T. Fukumori, C. Cai, Y. Zhang, L. El Hafi, Y. Hagiwara, T. Nishiura, and T. Taniguchi, "Optical Laser Microphone for Human-Robot Interaction: Speech Recognition in Extremely Noisy Service Environments", in RSJ Advanced Robotics (AR), Special Issue on Symbol Emergence in Robotics and Cognitive Systems (II), vol. 36, no. 5-6, pp. 304-317, Mar. 19, 2022. DOI: 10.1080/01691864.2021.2023629 [International journal article, peer-reviewed.]
Abstract
Domestic robots are often required to understand spoken commands in noisy environments, including service appliances' operating sounds. Most conventional domestic robots use electret condenser microphones (ECMs) to record the sound. However, the ECMs are known to be sensitive to the noise in the direction of sound arrival. The laser Doppler vibrometer (LDV), which has been widely used in the research field of measurement, has the potential to work as a new speech-input device to solve this problem. The aim of this paper is to investigate the effectiveness of using the LDV as an optical laser microphone for human-robot interaction in extremely noisy service environments. Our robot irradiates an object near a speaker with a laser and measures the vibration of the object to record the sound. We conducted three experiments to assess the performance of speech recognition using the optical laser microphone in various settings and showed stable performance in extremely noisy conditions compared with a conventional ECM.
-
J. Wang, L. El Hafi*, A. Taniguchi, Y. Hagiwara, and T. Taniguchi, "Extending HoloGAN by Embedding Image Content into Latent Vectors for Novel View Synthesis", in Proceedings of 2022 IEEE/SICE International Symposium on System Integration (SII 2022), pp. 383-389, Narvik, Norway (Virtual), Jan. 9, 2022. DOI: 10.1109/SII52469.2022.9708823 [International conference article, peer-reviewed.][*Corresponding author.]
Abstract
This study aims to further develop the task of novel view synthesis with generative adversarial networks (GAN). The goal of novel view synthesis is to synthesize, given one or more input images, images of the same target content but from different viewpoints. Previous research showed that the unsupervised learning model HoloGAN achieved high performance in generating images from different viewpoints. However, HoloGAN is less capable of specifying the target content to generate and is difficult to train due to high data requirements. Therefore, this study proposes two approaches to improve the current limitations of HoloGAN and make it suitable for the task of novel view synthesis. The first approach reuses the encoder network of HoloGAN to obtain the latent vectors corresponding to the image contents and thereby specify the target content of the generated images. The second approach introduces an auto-encoder architecture to HoloGAN so that more viewpoints can be generated correctly. The experiment results indicate that the first approach is efficient in specifying a target content. Meanwhile, the second approach helps HoloGAN to learn a richer range of viewpoints but is not compatible with the first approach. The combination of these two approaches and their application to service robotics are discussed in conclusion.
-
A. S. Rathore, L. El Hafi*, G. A. Garcia Ricardez, and T. Taniguchi, "Human Action Categorization System using Body Pose Estimation for Multimodal Observations from Single Camera", in Proceedings of 2022 IEEE/SICE International Symposium on System Integration (SII 2022), pp. 914-920, Narvik, Norway (Virtual), Jan. 9, 2022. DOI: 10.1109/SII52469.2022.9708816 [International conference article, peer-reviewed.][*Corresponding author.]
Abstract
We propose a system using a multimodal probabilistic approach to solve the human action recognition challenge. This is achieved by extracting the human pose from an ongoing activity from a single camera. This pose is used to capture additional body information using generalized features such as location, time, distances, and angles. A probabilistic model, multimodal latent Dirichlet allocation (MLDA), which uses this multimodal information, is then used to recognize actions through topic modeling. We also investigate the influence of each modality and their combinations to recognize human actions from multimodal observations. The experiments show that the proposed generalized features captured significant information that enabled the classification of various daily activities without requiring prior labeled data.
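To make the feature-extraction and topic-modeling steps concrete, the sketch below derives distance and angle features from 2D keypoints and categorizes activities with scikit-learn's standard LDA as a unimodal stand-in for the multimodal latent Dirichlet allocation (MLDA) used in the paper; the keypoint data, bin count, and topic count are made up for the example.

```python
# Illustrative sketch: turning body-pose keypoints into generalized features
# (distances and angles) and categorizing action sequences with a topic model.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

def pose_features(keypoints):
    """keypoints: (N, 2) array of 2D joint positions for one frame."""
    center = keypoints.mean(axis=0)
    dists = np.linalg.norm(keypoints - center, axis=1)   # joint-to-center distances
    vecs = keypoints - center
    angles = np.arctan2(vecs[:, 1], vecs[:, 0])          # joint angles around the center
    return np.concatenate([dists, angles])

def discretize(features, n_bins=10):
    """Quantize continuous features into 'words' for topic modeling."""
    bins = np.linspace(features.min(), features.max(), n_bins + 1)
    return np.digitize(features, bins[1:-1])

# Toy corpus: each "document" is a bag of discretized pose-feature words
# accumulated over the frames of one observed activity.
rng = np.random.default_rng(0)
documents = []
for _ in range(20):
    frames = [pose_features(rng.random((15, 2))) for _ in range(30)]
    words = np.concatenate([discretize(f) for f in frames])
    documents.append(np.bincount(words, minlength=10))

lda = LatentDirichletAllocation(n_components=3, random_state=0)
topics = lda.fit_transform(np.array(documents))
print(topics.argmax(axis=1))  # most likely action category per observation
```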
-
P. M. Uriguen Eljuri, L. El Hafi, G. A. Garcia Ricardez, A. Taniguchi, and T. Taniguchi, "Neural Network-based Motion Feasibility Checker to Validate Instructions in Rearrangement Tasks before Execution by Robots", in Proceedings of 2022 IEEE/SICE International Symposium on System Integration (SII 2022), pp. 1058-1063, Narvik, Norway (Virtual), Jan. 9, 2022. DOI: 10.1109/SII52469.2022.9708602 [International conference article, peer-reviewed.]
Abstract
In this paper, we address the task of rearranging items with a robot. A rearrangement task is challenging because it requires solving the following issues: determining how to pick the items and planning how and where to place them. In our previous work, we proposed to solve a rearrangement task by combining symbolic and motion planners using a Motion Feasibility Checker (MFC) and a Monte Carlo Tree Search (MCTS). The MCTS searches for the goal while it collaborates with the MFC to accept or reject instructions. We could solve the rearrangement task, but one drawback is the time it takes to find a solution. In this study, we focus on quickly accepting or rejecting tentative instructions obtained from the MCTS. We propose using a Neural Network-based Motion Feasibility Checker (NN-MFC), a fully connected neural network trained with data obtained from the MFC. The NN-MFC quickly decides whether an instruction is valid, reducing the time the MCTS needs to find a solution to the task. The NN-MFC determines the validity of the instruction based on the initial and target poses of the item. Before the final execution, we re-validate the instructions with the MFC as a confirmation. We tested the proposed method in a simulation environment by performing an item rearrangement task in a convenience store setup.
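A minimal sketch of what an NN-MFC-style classifier could look like is given below, assuming the item's initial and target poses are each encoded as an (x, y, z, yaw) tuple; the layer sizes and pose encoding are illustrative choices, not the trained network from the paper.

```python
# Hedged sketch of a neural-network motion feasibility checker (NN-MFC):
# a small fully connected classifier that maps an item's initial and target
# poses to a feasibility score. Sizes and the pose encoding are assumptions.
import torch
import torch.nn as nn

class FeasibilityChecker(nn.Module):
    def __init__(self, pose_dim=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * pose_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),  # logit: feasible vs. infeasible
        )

    def forward(self, initial_pose, target_pose):
        x = torch.cat([initial_pose, target_pose], dim=-1)
        return torch.sigmoid(self.net(x))

# During tree search, tentative instructions from the MCTS would be screened
# quickly by a classifier like this; accepted ones are re-validated by the
# full motion feasibility checker before execution, as the abstract describes.
checker = FeasibilityChecker()
initial = torch.tensor([[0.10, 0.30, 0.05, 0.00]])
target = torch.tensor([[0.25, 0.10, 0.05, 1.57]])
print(checker(initial, target))  # untrained output, illustration only
```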
-
T. Wakayama, G. A. Garcia Ricardez, L. El Hafi, and J. Takamatsu, "6D-Pose Estimation for Manipulation in Retail Robotics using the Inference-embedded OAK-D Camera", in Proceedings of 2022 IEEE/SICE International Symposium on System Integration (SII 2022), pp. 1046-1051, Narvik, Norway (Virtual), Jan. 9, 2022. DOI: 10.1109/SII52469.2022.9708910 [International conference article, peer-reviewed.]
Abstract
The socio-economic need for service robots has become more evident during the ongoing pandemic. To boost their deployment, robots need to improve their manipulation capabilities, which includes solving one of the biggest challenges: determining the position and orientation of the target objects. While conventional approaches use markers, which require constant maintenance, deep-learning-based approaches require a host computer with high specifications. In this paper, we propose a 6D-pose estimation system whose segmentation algorithm is embedded into OAK-D, a camera capable of running neural networks on-board, which reduces the host requirements. Furthermore, we propose a point cloud selection method to increase the accuracy of the 6D-pose estimation. We test our solution in a convenience store setup where we mount the OAK-D camera on a mobile robot developed for straightening and disposing of items, and whose manipulation success depends on 6D-pose estimation. We evaluate the accuracy of our solution by comparing the estimated 6D-pose of eight items to the ground truth. Finally, we discuss technical challenges faced during the integration of the proposed solution into a fully autonomous robot.
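The snippet below sketches the back-projection and registration steps under simplified assumptions: a segmentation mask selects object pixels from a depth image, and the resulting points are aligned to a model point cloud with ICP in Open3D. The mask, intrinsics, and point clouds are synthetic stand-ins; in the paper, the segmentation runs on-board the OAK-D camera.

```python
# Illustration only: masked back-projection followed by ICP pose estimation.
import numpy as np
import open3d as o3d

def depth_to_points(depth, mask, fx, fy, cx, cy):
    """Back-project masked depth pixels to 3D points in the camera frame."""
    v, u = np.nonzero(mask)
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Synthetic data standing in for a real depth frame and segmentation mask.
depth = np.full((480, 640), 0.6, dtype=np.float64)
mask = np.zeros((480, 640), dtype=bool)
mask[200:280, 300:380] = True
scene_points = depth_to_points(depth, mask, fx=600, fy=600, cx=320, cy=240)

scene = o3d.geometry.PointCloud()
scene.points = o3d.utility.Vector3dVector(scene_points)
model = o3d.geometry.PointCloud()
model.points = o3d.utility.Vector3dVector(scene_points + np.array([0.01, 0.0, 0.0]))

result = o3d.pipelines.registration.registration_icp(
    model, scene, max_correspondence_distance=0.02,
    init=np.eye(4),
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(),
)
print(result.transformation)  # estimated 6D pose of the model in the scene
```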
-
P. M. Uriguen Eljuri, Y. Toramatsu, L. El Hafi, G. A. Garcia Ricardez, A. Taniguchi, and T. Taniguchi, "物体の再配置タスクにおける動作実行可能性判定器のニューラルネットワークを用いた高速化 (Neural Network Acceleration of Motion Feasibility for Object Arrangement Task)", in Proceedings of 2021 SICE System Integration Division Annual Conference (SI 2021), pp. 3422-3425, (Virtual), Dec. 15, 2021. [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
Abstract
We focus on the task of arranging objects using a robot. In our previous work, we proposed to use a Monte Carlo Tree Search (MCTS) and a Motion Feasibility Checker (MFC) to solve the task. However, the problem with the existing method is that it is time-consuming. In this paper, we propose to use a Neural Network-based MFC (NN-MFC). The NN-MFC can quickly determine the motion feasibility of the robot and reduce the time used by the MCTS to find a solution. We tested the proposed method in a simulation environment by performing an item rearrangement task in a convenience store setup.
-
S. Hasegawa, A. Taniguchi, Y. Hagiwara, L. El Hafi, T. Nakashima, and T. Taniguchi, "確率論理と場所概念モデルの結合による確率的プランニング (Probabilistic Planning by Integrating Probabilistic Logic and a Spatial Concept Model)", in Proceedings of 2021 Annual Conference of the Robotics Society of Japan (RSJ 2021), ref. RSJ2021AC2H2-01, pp. 1-4, (Virtual), Sep. 8, 2021. [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
-
T. Nakashima, A. Taniguchi, L. El Hafi, Y. Hagiwara, and T. Taniguchi, "場所概念獲得がLoop Closure性能に及ぼす影響評価 (Evaluating the Effect of Spatial Concept Acquisition on Loop Closure Performance)", in Proceedings of 2021 Annual Conference of the Robotics Society of Japan (RSJ 2021), ref. RSJ2021AC1I4-06, pp. 1-3, (Virtual), Sep. 8, 2021. [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
-
A. Kanechika, L. El Hafi, A. Taniguchi, Y. Hagiwara, and T. Taniguchi, "自然な発話文教示に基づく弱教師あり物体領域分割の検証 (Verification of Weakly Supervised Object Region Segmentation based on Natural Spoken Sentence Teaching)", in Proceedings of 2021 Annual Conference of the Robotics Society of Japan (RSJ 2021), ref. RSJ2021AC1I2-02, pp. 1-4, (Virtual), Sep. 8, 2021. [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
-
T. Taniguchi, L. El Hafi, Y. Hagiwara, A. Taniguchi, N. Shimada, and T. Nishiura, "The Necessity of Semiotically Adaptive Cognition for Realizing Remotely-Operated Service Robots in the New Normal Society", in Proceedings of 2021 IEEE International Conference on Advanced Robotics and its Social Impacts (ARSO 2021), pp. 266-267, Tokoname, Japan (Virtual), Jul. 8, 2021. [International conference article, peer-reviewed.]
Abstract
In this study, we argue that the development of semiotically adaptive cognition is indispensable for realizing remotely-operated service robots to enhance the quality of the new normal society. To enable a wide range of people to work from home in a pandemic like the current COVID-19 situation, the installation of remotely-operated service robots into the work environment is crucial. However, it is evident that remotely-operated robots must have partial autonomy. The capability of learning local semiotic knowledge for improving autonomous decision making and language understanding is crucial to reduce the workload of people working from home. To achieve this goal, we refer to three challenges: the learning of local semiotic knowledge from daily human activities, the acceleration of local knowledge learning with transfer learning and active exploration, and the augmentation of human-robot interactions.
-
T. Taniguchi, L. El Hafi, Y. Hagiwara, A. Taniguchi, N. Shimada, and T. Nishiura, "Semiotically Adaptive Cognition: Toward the Realization of Remotely-Operated Service Robots for the New Normal Symbiotic Society", in RSJ Advanced Robotics (AR), Extra Special Issue on Soft/Social/Systemic (3S) Robot Technologies for Enhancing Quality of New Normal (QoNN), vol. 35, no. 11, pp. 664-674, Jun. 3, 2021. DOI: 10.1080/01691864.2021.1928552 [International journal article, peer-reviewed.]
Abstract
The installation of remotely-operated service robots in the environments of our daily life (including offices, homes, and hospitals) can improve work-from-home policies and enhance the quality of the so-called new normal. However, it is evident that remotely-operated robots must have partial autonomy and the capability to learn and use local semiotic knowledge. In this paper, we argue that the development of semiotically adaptive cognitive systems is key to the installation of service robotics technologies in our service environments. To achieve this goal, we describe three challenges: the learning of local knowledge, the acceleration of onsite and online learning, and the augmentation of human-robot interactions.
-
A. Taniguchi, S. Isobe, L. El Hafi, Y. Hagiwara, and T. Taniguchi, "Autonomous Planning based on Spatial Concepts to Tidy Up Home Environments with Service Robots", in RSJ Advanced Robotics (AR), vol. 35, no. 8, pp. 471-489, Apr. 18, 2021. DOI: 10.1080/01691864.2021.1890212 [International journal article, peer-reviewed.]
Abstract
Tidy-up tasks by service robots in home environments are challenging in robotics applications because they involve various interactions with the environment. In particular, robots are required not only to grasp, move, and release various home objects but also to plan the order and positions for placing the objects. In this paper, we propose a novel planning method that can efficiently estimate the order and positions of the objects to be tidied up by learning the parameters of a probabilistic generative model. The model allows a robot to learn the distributions of the co-occurrence probability of the objects and places to tidy up using the multimodal sensor information collected in a tidied environment. Additionally, we develop an autonomous robotic system to perform the tidy-up operation. We evaluate the effectiveness of the proposed method by an experimental simulation that reproduces the conditions of the Tidy Up Here task of the World Robot Summit 2018 international robotics competition. The simulation results show that the proposed method enables the robot to successively tidy up several objects and achieves the best task score among the considered baseline tidy-up methods.
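As a simplified illustration of using learned object-place co-occurrence for tidy-up planning, the sketch below picks the most probable place for each out-of-place object and orders the placements by confidence; the probabilities and names are invented, and the paper's full probabilistic generative model is not reproduced here.

```python
# Simplified illustration (not the paper's generative model): using learned
# object-place co-occurrence probabilities from a tidied environment to decide
# where each out-of-place object should go, and ordering placements greedily.
co_occurrence = {
    "cup":      {"kitchen_shelf": 0.8, "toy_box": 0.1, "bookshelf": 0.1},
    "toy_car":  {"kitchen_shelf": 0.1, "toy_box": 0.8, "bookshelf": 0.1},
    "notebook": {"kitchen_shelf": 0.2, "toy_box": 0.1, "bookshelf": 0.7},
}

def plan_tidy_up(objects_out_of_place):
    """Return (object, target place, confidence) tuples, most confident first."""
    plan = []
    for obj in objects_out_of_place:
        place, prob = max(co_occurrence[obj].items(), key=lambda kv: kv[1])
        plan.append((obj, place, prob))
    return sorted(plan, key=lambda item: item[2], reverse=True)

for obj, place, prob in plan_tidy_up(["notebook", "cup", "toy_car"]):
    print(f"move {obj} to {place} (p={prob:.1f})")
```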
-
L. El Hafi, H. Nakamura, A. Taniguchi, Y. Hagiwara, and T. Taniguchi, "Teaching System for Multimodal Object Categorization by Human-Robot Interaction in Mixed Reality", in Proceedings of 2021 IEEE/SICE International Symposium on System Integration (SII 2021), pp. 320-324, Iwaki, Japan (Virtual), Jan. 11, 2021. DOI: 10.1109/IEEECONF49454.2021.9382607 [International conference article, peer-reviewed.]
Abstract
As service robots are becoming essential to support aging societies, teaching them how to perform general service tasks remains a major challenge preventing their deployment in daily-life environments. In addition, developing an artificial intelligence for general service tasks requires bottom-up, unsupervised approaches that let the robots learn from their own observations and interactions with the users. However, whereas in top-down, supervised approaches such as deep learning the extent of the learning is directly related to the amount and variety of the pre-existing data provided to the robots, and is thus relatively easy to understand from a human perspective, the learning status in bottom-up approaches is by nature much harder to appreciate and visualize. To address these issues, we propose a teaching system for multimodal object categorization by human-robot interaction through Mixed Reality (MR) visualization. In particular, our proposed system enables a user to monitor and intervene in the robot's object categorization process based on Multimodal Latent Dirichlet Allocation (MLDA) to resolve unexpected results and accelerate the learning. Our contribution is twofold: 1) describing the integration of a service robot, MR interactions, and MLDA object categorization in a unified system, and 2) proposing an MR user interface to teach robots through intuitive visualization and interactions.
-
K. Hayashi, W. Zheng, L. El Hafi, Y. Hagiwara, and T. Taniguchi, "Bidirectional Generation of Object Images and Positions using Deep Generative Models for Service Robotics Applications", in Proceedings of 2021 IEEE/SICE International Symposium on System Integration (SII 2021), pp. 325-329, Iwaki, Japan (Virtual), Jan. 11, 2021. DOI: 10.1109/IEEECONF49454.2021.9382768 [International conference article, peer-reviewed.]
Abstract
The introduction of systems and robots for automated services is important for reducing running costs and improving operational efficiency in the retail industry. To this end, we develop a system that enables robot agents to display products in stores. The main problem in automating product display with robot agents using common supervised methods is the huge amount of data required to recognize product categories and arrangements in a variety of different store layouts. To solve this problem, we propose a crossmodal inference system based on a joint multimodal variational autoencoder (JMVAE) that learns the relationship between object image information and location information observed on site by robot agents. In our experiments, we created a simulation environment replicating a convenience store that allows a robot agent to observe an object image and its 3D coordinate information, and verified whether JMVAE can learn and generate a shared representation of an object image and 3D coordinates in a bidirectional manner.
-
Y. Katsumata, A. Taniguchi, L. El Hafi, Y. Hagiwara, and T. Taniguchi, "SpCoMapGAN: Spatial Concept Formation-based Semantic Mapping with Generative Adversarial Networks", in Proceedings of 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2020), pp. 7927-7934, Las Vegas, United States (Virtual), Oct. 24, 2020. DOI: 10.1109/IROS45743.2020.9341456 [International conference article, peer-reviewed.]
Abstract
In semantic mapping, which connects semantic information to an environment map, it is a challenging task for robots to deal with both local and global information of environments. In addition, it is important to estimate semantic information of unobserved areas from already acquired partial observations in a newly visited environment. On the other hand, previous studies on spatial concept formation enabled a robot to relate multiple words to places from bottom-up observations even when the vocabulary was not provided beforehand. However, the robot could not transfer global information related to the room arrangement between semantic maps from other environments. In this paper, we propose SpCoMapGAN, which generates the semantic map in a newly visited environment by training an inference model using previously estimated semantic maps. SpCoMapGAN uses generative adversarial networks (GANs) to transfer semantic information based on room arrangements to a newly visited environment. Our proposed method assigns semantics to the map of an unknown environment using the prior distribution of the map trained in known environments and the multimodal observations made in the unknown environment. We experimentally show in simulation that SpCoMapGAN can use global information for estimating the semantic map and is superior to previous methods. Finally, we also demonstrate in a real environment that SpCoMapGAN can accurately 1) deal with local information, and 2) acquire the semantic information of real places.
-
L. El Hafi and T. Yamamoto, "Toward the Public Release of a Software Development Environment for Human Support Robots", in Proceedings of 2020 Annual Conference of the Robotics Society of Japan (RSJ 2020), ref. RSJ2020AC3E1-01, pp. 1-2, (Virtual), Oct. 9, 2020. [Domestic conference article, non-peer-reviewed.]
Abstract
This paper describes the latest developments of the ongoing effort to bring a shared Software Development Environment (SDE) to the Toyota Human Support Robot (HSR) Community to collaborate on large robotics projects. The SDE described in this paper is developed and maintained by the HSR Software Development Environment Working Group (SDE-WG) officially endorsed by Toyota Motor Corporation (TMC). The source code and documentation for deployment are available to all HSR Community members upon request at: https://gitlab.com/hsr-sde-wg/HSR.
-
Y. Katsumata, A. Kanechika, A. Taniguchi, L. El Hafi, Y. Hagiwara, and T. Taniguchi, "深層生成モデルを用いた地図補完とSLAMの統合 (Integration of SLAM and Map Completion using Deep Generative Models)", in Proceedings of 2020 Annual Conference of the Robotics Society of Japan (RSJ 2020), ref. RSJ2020AC2C1-01, pp. 1-4, (Virtual), Oct. 9, 2020. [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
-
H. Nakamura, L. El Hafi, A. Taniguchi, Y. Hagiwara, and T. Taniguchi, "拡張現実を用いたロボットの物体カテゴリ分類教示システムの提案 (Proposal of a Teaching System for Robot Object Categorization using Augmented Reality)", in Proceedings of 2020 Annual Conference of the Robotics Society of Japan (RSJ 2020), ref. RSJ2020AC2E3-02, pp. 1-4, (Virtual), Oct. 9, 2020. [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
-
A. Taniguchi, Y. Tabuchi, L. El Hafi, Y. Hagiwara, and T. Taniguchi, "環境の能動的な探索による効率的な場所概念の形成 (Efficient Spatial Concept Formation by Active Exploration of the Environment)", in Proceedings of 2020 Annual Conference of the Japanese Society for Artificial Intelligence (JSAI 2020), ref. 2M4-OS-3a-05, pp. 1-4, Kumamoto, Japan (Virtual), Jun. 9, 2020. DOI: 10.11517/pjsai.JSAI2020.0_2M4OS3a05 [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
Abstract
Autonomous service robots are required to adaptively learn the categories and names of various places through the exploration of the surrounding environment and interactions with users. In this study, we aim to realize the efficient learning of spatial concepts by autonomous active exploration with a mobile robot. Therefore, we propose an active learning algorithm that combines sequential Bayesian inference by a particle filter and position determination based on information-gain in probabilistic generative models. Our experiment shows that the proposed method can efficiently determine the position to form spatial concepts in simulated home environments.
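The sketch below illustrates information-gain-based position selection in its simplest discrete form, assuming a categorical belief over place categories at each candidate position; it is a simplification of the particle-filter-based active learning proposed in the paper, with all numbers invented for the example.

```python
# Minimal sketch of information-gain-based position selection.
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def expected_information_gain(prior, likelihoods):
    """IG = H(prior) - E_obs[H(posterior)] for one candidate position.

    prior: current belief over place categories at that position.
    likelihoods: P(observation | category) for each possible observation (rows).
    """
    prior = np.asarray(prior, dtype=float)
    ig = entropy(prior)
    for lik in likelihoods:
        joint = np.asarray(lik) * prior
        p_obs = joint.sum()
        if p_obs > 0:
            ig -= p_obs * entropy(joint / p_obs)
    return ig

# Candidate positions with different belief states; the robot moves to the
# one whose observation is expected to reduce uncertainty the most.
candidates = {
    "hallway": ([0.4, 0.3, 0.3], [[0.7, 0.2, 0.1], [0.3, 0.8, 0.9]]),
    "doorway": ([0.9, 0.05, 0.05], [[0.7, 0.2, 0.1], [0.3, 0.8, 0.9]]),
}
best = max(candidates, key=lambda c: expected_information_gain(*candidates[c]))
print("next position:", best)
```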
-
Y. Katsumata, A. Taniguchi, L. El Hafi, Y. Hagiwara, and T. Taniguchi, "Generative Adversarial Networksと場所概念形成の確率モデルの融合に基づくSemantic Mapping (Probabilistic Model of Spatial Concepts Integrating Generative Adversarial Networks for Semantic Mapping)", in Proceedings of 2020 Annual Conference of the Japanese Society for Artificial Intelligence (JSAI 2020), ref. 2M6-GS-13-01, pp. 1-4, Kumamoto, Japan (Virtual), Jun. 9, 2020. DOI: 10.11517/pjsai.JSAI2020.0_2M6GS1301 [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
Abstract
This paper proposes SpCoMapGAN, a method to generate the semantic map in a newly visited environment by training an inference model using previously estimated semantic maps. SpCoMapGAN uses generative adversarial networks (GANs) to transfer semantic information based on room arrangements to the newly visited environment. We experimentally show in simulation that SpCoMapGAN can use global information for estimating the semantic map and is superior to previous related methods.
-
G. A. Garcia Ricardez*, L. El Hafi*, and F. von Drigalski*, "Standing on Giant's Shoulders: Newcomer's Experience from the Amazon Robotics Challenge 2017", in Book chapter, Advances on Robotic Item Picking: Applications in Warehousing & E-Commerce Fulfillment, pp. 87-100, May 9, 2020. DOI: 10.1007/978-3-030-35679-8_8 [Book chapter.][*Authors contributed equally.]
Abstract
International competitions have fostered innovation in fields such as artificial intelligence, robotic manipulation, and computer vision, and incited teams to push the state of the art. In this chapter, we present the approach, design philosophy and development strategy that we followed during our participation in the Amazon Robotics Challenge 2017, a competition focused on warehouse automation. After introducing our solution, we detail the development of two of its key features: the suction tool and storage system. A systematic analysis of the suction force and details of the end effector features, such as suction force control, grasping, and collision detection, are also presented. Finally, this chapter reflects on the lessons we learned from our participation in the competition, which we believe are valuable to future robot challenge participants, as well as warehouse automation system designers.
-
G. A. Garcia Ricardez, S. Okada, N. Koganti, A. Yasuda, P. M. Uriguen Eljuri, T. Sano, P.-C. Yang, L. El Hafi, M. Yamamoto, J. Takamatsu, and T. Ogasawara, "Restock and Straightening System for Retail Automation using Compliant and Mobile Manipulation", in RSJ Advanced Robotics (AR), Special Issue on Service Robot Technology: Selected Papers from WRS 2018, vol. 34, no. 3-4, pp. 235-249, Feb. 16, 2020. DOI: 10.1080/01691864.2019.1698460 [International journal article, peer-reviewed.]
Abstract
As the retail industry keeps expanding and the shortage of workers keeps increasing, there is a need for autonomous manipulation of products to support retail operations. The increasing amount of products and customers in establishments such as convenience stores requires the automation of restocking, disposing of, and straightening products. The manipulation of products needs to be time-efficient, avoid damaging products, and beautify the display of products. In this paper, we propose a robotic system to restock shelves, dispose of expired products, and straighten products in retail environments. The proposed mobile manipulator features a custom-made end effector with a compact and compliant design to safely and effectively manipulate products in retail stores. Through experiments in a convenience store scenario, we verify the effectiveness of our system to restock, dispose of, and rearrange items.
-
G. A. Garcia Ricardez, N. Koganti, P.-C. Yang, S. Okada, P. M. Uriguen Eljuri, A. Yasuda, L. El Hafi, M. Yamamoto, J. Takamatsu, and T. Ogasawara, "Adaptive Motion Generation using Imitation Learning and Highly-Compliant End Effector for Autonomous Cleaning", in RSJ Advanced Robotics (AR), Special Issue on Service Robot Technology: Selected Papers from WRS 2018, vol. 34, no. 3-4, pp. 189-201, Feb. 16, 2020. DOI: 10.1080/01691864.2019.1698461 [International journal article, peer-reviewed.]
Abstract
Recent demographic trends in super-aging societies, such as Japan, are leading to a severe worker shortage. Service robots can play a promising role in augmenting human workers for various household and assistive tasks. Toilet cleanup is one such challenging task that involves performing compliant motion planning in a constrained toilet setting. In this study, we propose an end-to-end robotic framework to perform various tasks related to toilet cleanup. Our key contributions include the design of a compliant and multipurpose end effector, an adaptive motion generation algorithm, and an autonomous mobile manipulator capable of garbage detection, garbage disposal, and liquid removal. We evaluate the performance of our framework with the competition setting used for toilet cleanup in the Future Convenience Store Challenge at the World Robot Summit 2018. We demonstrate that our proposed framework is capable of successfully completing all the tasks of the competition within the time limit.
-
L. El Hafi, S. Isobe, Y. Tabuchi, Y. Katsumata, H. Nakamura, T. Fukui, T. Matsuo, G. A. Garcia Ricardez, M. Yamamoto, A. Taniguchi, Y. Hagiwara, and T. Taniguchi, "System for Augmented Human-Robot Interaction through Mixed Reality and Robot Training by Non-Experts in Customer Service Environments", in RSJ Advanced Robotics (AR), Special Issue on Service Robot Technology: Selected Papers from WRS 2018, vol. 34, no. 3-4, pp. 157-172, Feb. 16, 2020. DOI: 10.1080/01691864.2019.1694068 [International journal article, peer-reviewed.]
Abstract
Human-robot interaction during general service tasks in home or retail environments has proven challenging, partly because (1) robots lack high-level context-based cognition and (2) humans cannot intuit the perception state of robots as they can for other humans. To solve these two problems, we present a complete robot system that was given the highest evaluation score at the Customer Interaction Task of the Future Convenience Store Challenge at the World Robot Summit 2018, and which implements several key technologies: (1) hierarchical spatial concept formation for general robot task planning and (2) a mixed reality interface that enables users to intuitively visualize the current state of the robot's perception and naturally interact with it. The results obtained during the competition indicate that the proposed system allows both non-expert operators and end users to achieve human-robot interactions in customer service environments. Furthermore, we describe a detailed scenario including employee operation and customer interaction, which serves as a set of requirements for service robots and a road map for development. The system integration and task scenario described in this paper should be helpful for groups facing customer interaction challenges and looking for a successfully deployed base to build on.
-
Y. Katsumata, L. El Hafi, A. Taniguchi, Y. Hagiwara, and T. Taniguchi, "Integrating Simultaneous Localization and Mapping with Map Completion using Generative Adversarial Networks", in Proceedings of 2019 IEEE/RSJ IROS Workshop on Deep Probabilistic Generative Models for Cognitive Architecture in Robotics (DPGM-CAR 2019), Macau, China, Nov. 8, 2019. [International conference article, peer-reviewed.]
Abstract
When autonomous robots perform tasks that include moving in daily human environments, they need to generate environment maps. In this research, we propose a simultaneous localization and mapping method that integrates a prior probability distribution over map completions trained with a generative model architecture. The contribution of this research is that the method can estimate the environment map efficiently from pre-training in other environments. We show with an experiment in a simulator that the proposed method performs better than other classic methods at estimating environment maps from observations without moving.
-
L. El Hafi, S. Matsuzaki, S. Itadera, and T. Yamamoto, "Deployment of a Containerized Software Development Environment for Human Support Robots", in Proceedings of 2019 Annual Conference of the Robotics Society of Japan (RSJ 2019), ref. RSJ2019AC3K1-03, pp. 1-2, Tokyo, Japan, Sep. 3, 2019. [Domestic conference article, non-peer-reviewed.]
Abstract
This paper introduces a containerized Software Development Environment (SDE) for the Toyota Human Support Robot (HSR) to collaborate on large robotics projects. The objective is twofold: 1) enable interdisciplinary teams to quickly start research and development with the HSR by sharing a containerized SDE, and 2) accelerate research implementation and integration within the Toyota HSR Community by deploying a common SDE across its members. The SDE described in this paper is developed and maintained by the HSR Software Development Environment Working Group (SDE-WG) following a solution originally proposed by Ritsumeikan University and endorsed by Toyota Motor Corporation (TMC). The source code and documentation required to deploy the SDE are available to all HSR Community members upon request at: https://gitlab.com/hsr-sde-wg/HSR.
-
H. Nakamura, L. El Hafi, Y. Hagiwara, and T. Taniguchi, "複合現実によるロボットの空間認識可視化のためのSemantic-ICPを用いたキャリブレーション (Calibration System using Semantic-ICP for Visualization of Robot Spatial Perception through Mixed Reality)", in Proceedings of 2019 Annual Conference of the Japanese Society for Artificial Intelligence (JSAI 2019), ref. 1L3-J-11-02, pp. 1-4, Niigata, Japan, Jun. 4, 2019. DOI: 10.11517/pjsai.JSAI2019.0_1L3J1102 [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
Abstract
To achieve symbiosis between humans and robots, it is important to know what the robots recognize in their environment. Such information can be displayed using a Mixed Reality (MR) head-mounted device to provide an intuitive understanding of the robot's perception. However, a robust calibration system is required because the robot and the head-mounted MR device have different coordinate systems. In this paper, we develop a semantic-based calibration system for human-robot interactions in MR using Semantic-ICP. We show that the calibration system using Semantic-ICP is better than using GICP SE(3) when the accuracy of the semantic labels is high.
-
L. El Hafi, Y. Hagiwara, and T. Taniguchi, "Abstraction-Rich Workflow for Agile Collaborative Development and Deployment of Robotic Solutions", in Proceedings of 2018 Annual Conference of the Robotics Society of Japan (RSJ 2018), ref. RSJ2018AC3D3-02, pp. 1-3, Kasugai, Japan, Sep. 5, 2018. [Domestic conference article, non-peer-reviewed.]
Abstract
This paper introduces a collaborative workflow for development and deployment of robotic solutions. The main contribution lies in the introduction of multiple layers of abstraction between the different components and processes. These layers enable the collaborators to focus on their individual expertise and rely on automated tests and simulations from the system. The ultimate goal is to help interdisciplinary teams to work together efficiently on robotics projects.
-
J. Takamatsu, L. El Hafi, K. Takemura, and T. Ogasawara, "角膜反射画像を用いた視線追跡と物体認識 (Gaze Estimation and Object Recognition using Corneal Images)", in Proceedings of 149th MOC/JSAP Microoptics Meeting on Recognition and Authentication, vol. 36, no. 3, pp. 13-18, Tokyo, Japan, Sep. 5, 2018. [Domestic workshop article, non-peer-reviewed.][Published in Japanese.]
Abstract
We introduce a method for simultaneously estimating gaze directions and types of the gazed objects from corneal images captured by an eye camera embedded in an eye tracker. The proposed method is useful for simplifying the inherent mechanisms of eye trackers. Since the target objects are distorted on the corneal images, we use two approaches: one is to undistort the images and then apply conventional object detection, and the other is to apply deep learning-based object detection directly to the distorted images. In the latter approach, we describe a method to collect a large amount of data to train the detection with little effort.
-
G. A. Garcia Ricardez, F. von Drigalski, L. El Hafi, S. Okada, P.-C. Yang, W. Yamazaki, V. G. Hoerig, A. Delmotte, A. Yuguchi, M. Gall, C. Shiogama, K. Toyoshima, P. M. Uriguen Eljuri, R. Elizalde Zapata, M. Ding, J. Takamatsu, and T. Ogasawara, "Warehouse Picking Automation System with Learning- and Feature-based Object Recognition and Grasping Point Estimation", in Proceedings of 2017 SICE System Integration Division Annual Conference (SI 2017), pp. 2249-2253, Sendai, Japan, Dec. 20, 2017. [Domestic conference article, non-peer-reviewed.]
Abstract
The Amazon Robotics Challenge (ARC) has become one of the biggest robotic competitions in the field of warehouse automation and manipulation. In this paper, we present our solution to the ARC 2017, which uses both learning-based and feature-based techniques for object recognition and grasp point estimation in unstructured collections of objects and a partially controlled space. Our solution proved effective both for previously unknown items, even with little data acquisition, and for items from the training set, obtaining the 6th place out of 16 contestants.
-
G. A. Garcia Ricardez, F. von Drigalski, L. El Hafi, M. Ding, J. Takamatsu, and T. Ogasawara, "Lessons from the Airbus Shopfloor Challenge 2016 and the Amazon Robotics Challenge 2017", in Proceedings of 2017 SICE System Integration Division Annual Conference (SI 2017), pp. 572-575, Sendai, Japan, Dec. 20, 2017. [Domestic conference article, non-peer-reviewed.]
Abstract
International robotics competitions bring together the research community to solve real-world, current problems such as drilling in aircraft manufacturing (Airbus Shopfloor Challenge) and warehouse automation (Amazon Robotics Challenge). In this paper, we discuss our approaches to these competitions and describe the technical difficulties, design philosophy, development, lessons learned and remaining challenges.
-
F. von Drigalski*, L. El Hafi*, P. M. Uriguen Eljuri*, G. A. Garcia Ricardez*, J. Takamatsu, and T. Ogasawara, "Vibration-Reducing End Effector for Automation of Drilling Tasks in Aircraft Manufacturing", in IEEE Robotics and Automation Letters (RA-L), vol. 2, no. 4, pp. 2316-2321, Oct. 2017. DOI: 10.1109/LRA.2017.2715398 [International journal article, peer-reviewed.][Presented at 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017), Vancouver, Canada, Sep. 2017.][*Authors contributed equally.]
Abstract
In this letter, we present an end effector that can drill holes compliant to aeronautic standards while mounted on a lightweight robot arm. There is an unmet demand for a robotic solution capable of drilling inside an aircraft fuselage, as size, weight, and space constraints disqualify current commercial solutions for this task. Our main contribution is the mechanical design of the end effector with high-friction, vibration-reducing feet that are pressed against the workpiece during the drilling process to increase stability, and a separate linear actuator to advance the drill. This relieves the robot arm of the task of advancing and stabilizing the drill, and leaves it with the task of positioning and holding the end effector. The stabilizing properties of the end effector are confirmed experimentally. The solution took first place at the Airbus Shopfloor Challenge, an international robotics competition held at ICRA 2016 that modeled the in-fuselage drilling task.
-
L. El Hafi, "STARE: Real-Time, Wearable, Simultaneous Gaze Tracking and Object Recognition from Eye Images (STARE: 眼球画像を用いた実時間処理可能な装着型デバイスによる視線追跡と物体認識)", in PhD thesis, Nara Institute of Science and Technology (NAIST), ref. NAIST-IS-DD1461207, Ikoma, Japan, Sep. 25, 2017. DOI: 10.34413/dr.01472 [PhD thesis.][Supervised by T. Ogasawara, H. Kato, J. Takamatsu, M. Ding, and K. Takemura.]
Abstract
This thesis proposes STARE, a wearable system to perform real-time, simultaneous eye tracking and focused object recognition for daily-life applications in varied illumination environments. The proposed system extracts both the gaze direction and scene information using eye images captured by a single RGB camera facing the user's eye. In particular, the method requires neither infrared sensors nor a front-facing camera to capture the scene, making it more socially acceptable when embedded in a wearable device. This approach is made possible by recent technological advances in increased resolution and reduced size of camera sensors, as well as significantly more powerful image treatment techniques based on deep learning. First, a model-based approach is used to estimate the gaze direction using RGB eye images. A 3D eye model is constructed from an image of the eye by fitting an ellipse onto the iris. The gaze direction is then continuously tracked by rotating the model to simulate projections of the iris area for different eye poses and matching the iris area of the subsequent images with the corresponding projections obtained from the model. By using an additional one-time calibration, the point of regard (POR) is computed, which makes it possible to identify where the user is looking in the scene image reflected on the cornea. Next, objects in the scene reflected on the cornea are recognized in real time using the gaze direction information. Deep learning algorithms are applied to classify and then recognize the focused object in the area surrounding the reflected POR on the eye image. Additional processing using High Dynamic Range (HDR) imaging demonstrates that the proposed method can perform in varied illumination conditions. Finally, the validity of the approach is verified experimentally with a 3D-printable prototype of a wearable device equipped with dual cameras, and with a high-sensitivity camera in extreme illumination conditions. Further, a proof-of-concept implementation of a state-of-the-art neural network shows that the focused object recognition can be performed in real time. To summarize, the proposed method and prototype contribute a novel, complete framework to 1) simultaneously perform eye tracking and focused object analysis in real time, 2) automatically generate datasets of focused objects by using the reflected POR, 3) reduce the number of sensors in current gaze trackers to a single RGB camera, and 4) enable daily-life applications in all kinds of illumination. The combination of these features makes it an attractive choice for eye-based human behavior analysis, as well as for creating large datasets of objects focused on by the user during daily tasks.
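To illustrate the first step of the model-based approach, the sketch below fits an ellipse to a synthetic iris region with OpenCV and derives the tilt implied by the axis ratio; the segmentation, camera, and eye geometry are placeholders for what the thesis obtains from real RGB eye images (OpenCV 4 is assumed).

```python
# Illustrative first step of model-based gaze estimation: fitting an ellipse
# to the iris region of an eye image. The iris mask here is synthetic; in
# practice it would come from segmenting the RGB eye image.
import cv2
import numpy as np

# Synthetic eye image with an elliptical "iris" standing in for a real frame.
eye = np.zeros((240, 320), dtype=np.uint8)
cv2.ellipse(eye, (160, 120), (40, 30), 20, 0, 360, 255, -1)

# Extract the iris contour and fit an ellipse to it.
contours, _ = cv2.findContours(eye, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
iris_contour = max(contours, key=cv2.contourArea)
(cx, cy), axes, angle = cv2.fitEllipse(iris_contour)
minor, major = sorted(axes)

# A circular iris seen at an angle projects to an ellipse whose minor/major
# axis ratio is cos(theta), which constrains the eyeball rotation.
tilt = np.degrees(np.arccos(np.clip(minor / major, 0.0, 1.0)))
print(f"iris center=({cx:.1f}, {cy:.1f}), tilt approx. {tilt:.1f} deg")
```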
-
L. El Hafi, M. Ding, J. Takamatsu, and T. Ogasawara, "STARE: Realtime, Wearable, Simultaneous Gaze Tracking and Object Recognition from Eye Images", in SMPTE Motion Imaging Journal, vol. 126, no. 6, pp. 37-46, Aug. 9, 2017. DOI: 10.5594/JMI.2017.2711899 [International journal article, peer-reviewed.]
Abstract
We propose STARE, a wearable system to perform realtime, simultaneous eye tracking and focused object recognition for daily-life applications in varied illumination environments. Our proposed method uses a single camera sensor to evaluate the gaze direction and requires neither a front-facing camera nor infrared sensors. To achieve this, we describe: 1) a model-based approach to estimate the gaze direction using red-green-blue (RGB) eye images; 2) a method to recognize objects in the scene reflected on the cornea in real time; and 3) a 3D-printable prototype of a wearable gaze-tracking device. We verify the validity of our approach experimentally with different types of cameras in different illumination settings, and with a proof-of-concept implementation of a state-of-the-art neural network. The proposed system can be used as a framework for RGB-based eye tracking and human behavior analysis.
-
G. A. Garcia Ricardez*, L. El Hafi*, F. von Drigalski*, R. Elizalde Zapata, C. Shiogama, K. Toyoshima, P. M. Uriguen Eljuri, M. Gall, A. Yuguchi, A. Delmotte, V. G. Hoerig, W. Yamazaki, S. Okada, Y. Kato, R. Futakuchi, K. Inoue, K. Asai, Y. Okazaki, M. Yamamoto, M. Ding, J. Takamatsu, and T. Ogasawara, "Climbing on Giant's Shoulders: Newcomer's Road into the Amazon Robotics Challenge 2017", in Proceedings of 2017 IEEE ICRA Warehouse Picking Automation Workshop (WPAW 2017), Singapore, Singapore, May 29, 2017. [International workshop article, non-peer-reviewed.][*Authors contributed equally.]
Abstract
The Amazon Robotics Challenge has become one of the biggest robotic challenges in the field of warehouse automation and manipulation. In this paper, we present an overview of materials available for newcomers to the challenge, what we learned from the previous editions and discuss the new challenges within the Amazon Robotics Challenge 2017. We also outline how we developed our solution, the results of an investigation on suction cup size and some notable difficulties we encountered along the way. Our aim is to speed up development for those who come after and, as first-time contenders like us, have to develop a solution from zero.
-
L. El Hafi, M. Ding, J. Takamatsu, and T. Ogasawara, "眼球画像を用いた視線追跡と物体認識 日常生活のための装着型デバイス (Gaze Tracking and Object Recognition from Eye Images: Wearable Device for Daily Life)", in Proceedings of 2017 JSME Conference on Robotics and Mechatronics (ROBOMECH 2017), no. 17-2, ref. 2A1-I12, pp. 1-2, Fukushima, Japan, May 10, 2017. DOI: 10.1299/jsmermd.2017.2A1-I12 [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
Abstract
This paper introduces a method to identify the focused object in eye images captured from a single camera in order to enable intuitive eye-based interactions using wearable devices. The proposed method relies on a 3D eye model reconstruction to evaluate the gaze direction from the eye images. The gaze direction is then used in combination with deep learning algorithms to classify the focused object reflected on the cornea. Experimental results using a wearable prototype demonstrate the potential of the proposed method.
-
L. El Hafi, M. Ding, J. Takamatsu, and T. Ogasawara, "Gaze Tracking and Object Recognition from Eye Images", in Proceedings of 2017 IEEE International Conference on Robotic Computing (IRC 2017), pp. 310-315, Taichung, Taiwan, Apr. 10, 2017. DOI: 10.1109/IRC.2017.44 [International conference article, peer-reviewed.]
Abstract
This paper introduces a method to identify the focused object in eye images captured from a single camera in order to enable intuitive eye-based interactions using wearable devices. Indeed, eye images make it possible not only to obtain natural user responses from eye movements, but also to capture the scene reflected on the cornea without the need for additional sensors such as a frontal camera, thus making the approach more socially acceptable. The proposed method relies on a 3D eye model reconstruction to evaluate the gaze direction from the eye images. The gaze direction is then used in combination with deep learning algorithms to classify the focused object reflected on the cornea. Finally, the experimental results using a wearable prototype demonstrate the potential of the proposed method based solely on eye images captured from a single camera.
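The classification step can be pictured as cropping a patch around the reflected point of regard (POR) and feeding it to a convolutional network. The sketch below is purely illustrative: the actual network, patch size, and class set used in the paper are not specified here, and ResNet-18 from a recent torchvision release is only a stand-in.

# Illustrative sketch (assumes PyTorch and torchvision >= 0.13): classify the
# region around the reflected POR with an off-the-shelf CNN.
import torch
from PIL import Image
from torchvision import models, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Stand-in network; the paper's actual model is not specified here.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1).eval()

def classify_focused_object(eye_image: Image.Image, por_xy, patch=96):
    """Crop a patch around the reflected POR and return the top class index."""
    x, y = por_xy
    crop = eye_image.crop((x - patch // 2, y - patch // 2,
                           x + patch // 2, y + patch // 2))
    with torch.no_grad():
        logits = model(preprocess(crop).unsqueeze(0))
    return int(logits.argmax(dim=1))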
-
L. El Hafi, M. Ding, J. Takamatsu, and T. Ogasawara, "Gaze Tracking using Corneal Images Captured by a Single High-Sensitivity Camera", in The Best of IET and IBC 2016-2017, vol. 8, pp. 19-24, Sep. 8, 2016. [International journal article, peer-reviewed.][Also in Proceedings of 2016 International Broadcasting Convention (IBC 2016), pp. 33-43, Amsterdam, Netherlands, Sep. 2016.]
Abstract
This paper introduces a method to estimate the gaze direction using images of the eye captured by a single high-sensitivity camera. The purpose is to develop wearable devices that enable intuitive eye-based interactions and applications. Indeed, camera-based solutions, as opposed to commercially available infrared-based ones, allow wearable devices not only to obtain natural user responses from eye movements, but also to capture scene images reflected on the cornea, without the need for additional sensors. The proposed method relies on a model-based approach to evaluate the gaze direction and does not require a frontal camera to capture scene information, making it more socially acceptable if embedded in a glasses-shaped device. Moreover, recent developments in high-sensitivity camera sensors allow the proposed method to be used even in low-light conditions. Finally, experimental results using a prototype wearable device demonstrate the potential of the proposed method based solely on cornea images captured from a single camera.
-
L. El Hafi, M. Ding, J. Takamatsu, and T. Ogasawara, "Gaze Tracking using Corneal Images Captured by a Single High-Sensitivity Camera", in Proceedings of 2016 International Broadcasting Convention (IBC 2016), pp. 33-43, Amsterdam, Netherlands, Sep. 8, 2016. DOI: 10.1049/ibc.2016.0033 [International conference article, peer-reviewed.][Also in The Best of IET and IBC 2016-2017, vol. 8, pp. 19-24, Sep. 2016.]
Abstract
This paper introduces a method to estimate the gaze direction using images of the eye captured by a single high-sensitivity camera. The purpose is to develop wearable devices that enable intuitive eye-based interactions and applications. Indeed, camera-based solutions, as opposed to commercially available infrared-based ones, allow wearable devices not only to obtain natural user responses from eye movements, but also to capture scene images reflected on the cornea, without the need for additional sensors. The proposed method relies on a model-based approach to evaluate the gaze direction and does not require a frontal camera to capture scene information, making it more socially acceptable if embedded in a glasses-shaped device. Moreover, recent developments in high-sensitivity camera sensors allow the proposed method to be used even in low-light conditions. Finally, experimental results using a prototype wearable device demonstrate the potential of the proposed method based solely on cornea images captured from a single camera.
-
F. von Drigalski*, L. El Hafi*, P. M. Uriguen Eljuri*, G. A. Garcia Ricardez*, J. Takamatsu, and T. Ogasawara, "NAIST Drillbot: Drilling Robot at the Airbus Shopfloor Challenge", in Proceedings of 2016 Annual Conference of the Robotics Society of Japan (RSJ 2016), ref. RSJ2016AC3X2-03, pp. 1-2, Yamagata, Japan, Sep. 7, 2016. [Domestic conference article, non-peer-reviewed.][*Authors contributed equally.]
Abstract
We propose a complete, modular robotic solution for industrial drilling tasks in an aircraft fuselage. The main contribution is a custom-made end effector with vibration-reducing feet that rest on the workpiece during the drilling process to increase stability. The solution took 1st place at the Airbus Shopfloor Challenge, an international robotics competition held at ICRA 2016.
-
L. El Hafi, P. M. Uriguen Eljuri, M. Ding, J. Takamatsu, and T. Ogasawara, "Wearable Device for Camera-based Eye Tracking: Model Approach using Cornea Images (カメラを用いた視線追跡のための装着型デバイス 角膜画像によるモデルアプローチ)", in Proceedings of 2016 JSME Conference on Robotics and Mechatronics (ROBOMECH 2016), no. 16-2, ref. 1A2-14a4, pp. 1-4, Yokohama, Japan, Jun. 8, 2016. DOI: 10.1299/jsmermd.2016.1A2-14a4 [Domestic conference article, non-peer-reviewed.]
Abstract
The industry's growing interest in virtual reality, augmented reality, and smart wearable devices has created new momentum for eye tracking. Eye movements in particular are viewed as a way to obtain natural user responses from wearable devices, alongside gaze information used to analyze interests and behaviors. This paper extends our previous work by introducing a wearable eye-tracking device that enables the reconstruction of a 3D eye model of each eye from two RGB cameras. The proposed device is built using high-resolution cameras and a 3D-printed frame attached to a pair of JINS MEME glasses. The 3D eye models reconstructed from the proposed device can be used with any model-based eye-tracking approach. The proposed device is also capable of extracting scene information from the cornea reflections captured by the cameras, detecting blinks with an electrooculography sensor, and tracking head movements with an accelerometer combined with a gyroscope.
-
A. Yuguchi, R. Matsura, R. Baba, Y. Hakamata, W. Yamazaki, F. von Drigalski, L. El Hafi, S. Tsuichihara, M. Ding, J. Takamatsu, and T. Ogasawara, "モーションキャプチャによるボールキャッチ可能なロボット制御コンポーネント群の開発 (Development of Robot Control Components for Ball-Catching Task using Motion Capture Device)", in Proceedings of 2015 SICE System Integration Division Annual Conference (SI 2015), pp. 1067-1068, Nagoya, Japan, Dec. 14, 2015. [Domestic conference article, non-peer-reviewed.][Published in Japanese.]
Abstract
This paper describes the design and implementation of RT-middleware components for a ball-catching task by humanoid robots. We create a component that obtains the position of a thrown reflective ball from a motion capture device. We also create a component that estimates the ball's trajectory and the point where it will fall. The estimate is used to catch the ball with an HRP-4 humanoid robot via the control component.
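The trajectory-estimation idea can be sketched as fitting a ballistic model to the motion-capture samples and predicting where the ball crosses a catch height. The snippet below is only an illustration under those assumptions (vertical axis z, quadratic fall under gravity, hypothetical catch_height parameter); it is not the actual RT-middleware component.

# Rough sketch (NumPy): fit a ballistic model to mocap samples of the ball and
# predict the (x, y) point where it reaches a given catch height.
import numpy as np

def predict_landing(times, positions, catch_height=0.8):
    """times: (N,) seconds; positions: (N, 3) metres; returns (x, y) at catch_height."""
    t = np.asarray(times, dtype=float)
    p = np.asarray(positions, dtype=float)
    # Horizontal motion modelled as linear, vertical as quadratic (gravity).
    cx = np.polyfit(t, p[:, 0], 1)
    cy = np.polyfit(t, p[:, 1], 1)
    cz = np.polyfit(t, p[:, 2], 2)
    # Solve z(t) = catch_height and keep the later root (descending branch).
    roots = np.roots(cz - np.array([0.0, 0.0, catch_height]))
    t_hit = max(r.real for r in roots if abs(r.imag) < 1e-9)
    return np.polyval(cx, t_hit), np.polyval(cy, t_hit)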
-
L. El Hafi, K. Takemura, J. Takamatsu, and T. Ogasawara, "Model-based Approach for Gaze Estimation from Corneal Imaging using a Single Camera", in Proceedings of 2015 IEEE/SICE International Symposium on System Integration (SII 2015), pp. 88-93, Nagoya, Japan, Dec. 11, 2015. DOI: 10.1109/SII.2015.7404959 [International conference article, peer-reviewed.]
Abstract
This paper describes a method to estimate the gaze direction using cornea images captured by a single camera. The purpose is to develop wearable devices capable of obtaining natural user responses, such as interests and behaviors, from eye movements and scene images reflected on the cornea. From an image of the eye, an ellipse is fitted on the colored iris area. A 3D eye model is reconstructed from the ellipse and rotated to simulate projections of the iris area for different eye poses. The gaze direction is then evaluated by matching the iris area of the current image with the corresponding projection obtained from the model. We finally conducted an experiment using a head-mounted prototype to demonstrate the potential of such an eye-tracking method solely based on cornea images captured from a single camera.
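As a complement to the model-matching procedure summarized above, a simplified geometric relation often used in limbus-based gaze estimation is that, under weak perspective, the circular iris contour of the 3D eye model projects to an ellipse whose minor-to-major axis ratio encodes the eye's tilt. The sketch below illustrates only that back-of-the-envelope relation; it is not the paper's actual matching optimization.

# Simplified relation (NumPy): approximate gaze direction from a fitted iris
# ellipse, up to the usual two-fold ambiguity of the limbus normal.
import numpy as np

def gaze_from_ellipse(major_axis, minor_axis, ellipse_angle_deg):
    """Return an approximate unit gaze vector in camera coordinates."""
    tilt = np.arccos(np.clip(minor_axis / major_axis, -1.0, 1.0))
    phi = np.deg2rad(ellipse_angle_deg)
    g = np.array([np.sin(tilt) * np.cos(phi),
                  np.sin(tilt) * np.sin(phi),
                  -np.cos(tilt)])
    return g / np.linalg.norm(g)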
-
L. El Hafi, J.-B. Lorent, and G. Rouvroy, "Mapping SDI with a Light-Weight Compression for High Frame Rates and Ultra-HD 4K Transport over SMPTE 2022-5/6", in Proceedings of 2014 VSF Content in Motion, Annual Technical Conference and Exposition (VidTrans14), Arlington, United States, Feb. 26, 2014. [International workshop article, non-peer-reviewed.]
Abstract
Considering the bandwidth required for the next generation of television, with higher-resolution video and higher frame rates, live uncompressed transport across a 10 Gigabit Ethernet (10 GbE) network is not always possible. Indeed, uncompressed 4K video at 60 fps requires 12 Gbps or more. A lightweight compression scheme can address this challenge. A purely lossless codec would be ideal; however, it is in general difficult to predict the compression ratio achievable by a lossless codec. Therefore, a lightweight, visually lossless compression operating at a very low compression ratio with no impact on latency seems optimal to map SDI links over SMPTE 2022-5/6.
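The bandwidth figure can be checked with back-of-the-envelope arithmetic; the sketch below assumes 3840x2160 at 60 fps with 10-bit 4:2:2 sampling (an assumption consistent with broadcast practice, not a figure taken from the paper).

# Back-of-the-envelope check of the bandwidth claim (active pixels only;
# SDI blanking and SMPTE 2022-5/6 encapsulation add further overhead).
width, height, fps = 3840, 2160, 60
bits_per_pixel = 10 * 2          # 10-bit samples, 4:2:2 -> Y plus alternating Cb/Cr
active_rate = width * height * fps * bits_per_pixel   # bits per second
print(f"Active video: {active_rate / 1e9:.2f} Gbps")  # ~9.95 Gbps
# With blanking and overhead the stream fills a 12G-SDI link, which exceeds a
# single 10 GbE interface unless compression is applied.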
-
L. El Hafi* and T. Denison*, "TICO : Étude d'un système de compression vidéo de faible complexité sur FPGA (TICO: Study of a Low-Complexity Video Compression Scheme for FPGA)", in Master's thesis, Université catholique de Louvain (UCLouvain) & intoPIX, Louvain-la-Neuve, Belgium, Jun. 2013. [Master's thesis.][*Authors contributed equally.][Supervised by J.-D. Legat, B. Macq, and G. Rouvroy.][Published in French.]
Abstract
The evolution of display technologies, in terms of screen resolution, frame rate, and color depth, calls for new compression systems, in particular to reduce the power consumed at video interfaces. To address this issue, the Video Electronics Standards Association (VESA) consortium issued a call for proposals in January 2013 for the creation of a new compression standard, Display Stream Compression (DSC). This document, produced in collaboration with intoPIX, responds to VESA's call and proposes Tiny Codec (TICO), a video compression scheme of low hardware complexity. It describes, on the one hand, the algorithmic study of an entropy coder, inspired by Universal Variable Length Coding (UVLC), achieving an efficiency of 85% on filmed content and, on the other hand, the FPGA implementation of a horizontal 5:3 discrete wavelet transform processing 4K video streams at up to 120 frames per second. The implementation consumes 340 slices per color component on Xilinx's low-power Artix-7 platforms. [Translated from French.]
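The horizontal 5:3 transform mentioned above is presumably the reversible LeGall 5/3 filter pair used in JPEG 2000; its integer lifting steps can be sketched in a few lines. The snippet uses periodic boundary handling for brevity (JPEG 2000 itself uses symmetric extension), and the hardware pipeline of the thesis is of course very different.

# Minimal sketch (NumPy): one level of the integer LeGall 5/3 lifting DWT
# applied horizontally to a single row of even length.
import numpy as np

def dwt53_horizontal(row):
    """Return (low-pass, high-pass) coefficients of a 1D integer 5/3 DWT."""
    x = np.asarray(row, dtype=np.int64)
    even, odd = x[0::2].copy(), x[1::2].copy()
    # Predict step: high-pass coefficients (periodic wrap at the boundary).
    odd -= (even + np.roll(even, -1)) // 2
    # Update step: low-pass coefficients.
    even += (odd + np.roll(odd, 1) + 2) // 4
    return even, odd

# Example on an 8-sample row:
# low, high = dwt53_horizontal([10, 12, 11, 13, 40, 42, 41, 43])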
Lectures
-
"Getting Started with the HSR Software Development Environment for Team Collaboration on Robotics Projects and Competitions", at 2nd Session of EU HSR Community Webinar Series, Online, Jan. 21, 2021. [Invited lecture.]
Abstract
The presentation describes the motivations, tools, and strategies to successfully get started with the HSR Software Development Environment (SDE) collectively developed within the HSR Community by the members of the HSR SDE Working Group with the support of Toyota Motor Corporation. The HSR SDE aims to 1) enable interdisciplinary teams to quickly start using the HSR in their projects, and 2) accelerate research implementation and integration between the members of the HSR Community by containerizing the whole HSR development, simulation, and operation process in a virtual environment that can be conveniently shared between users. This presentation should be of particular interest to teams collaborating on large research projects or preparing for international robotics competitions with the HSR.
-
"Getting Started with the HSR Software Development Environment for Team Collaboration on Robotics Projects and Competitions", at 5th HSR Annual Users' Conference, Online, Nov. 23, 2020. [Invited lecture.]
Abstract
The presentation describes the motivations, tools, and strategies to successfully get started with the HSR Software Development Environment (SDE) collectively developed within the HSR Community by the members of the HSR SDE Working Group with the support of Toyota Motor Corporation. The HSR SDE aims to 1) enable interdisciplinary teams to quickly start using the HSR in their projects, and 2) accelerate research implementation and integration between the members of the HSR Community by containerizing the whole HSR development, simulation, and operation process in a virtual environment that can be conveniently shared between users. The presentation also marks the initial public release of the SDE, which has been evaluated in a private testing phase through a small-scale deployment among the core members of the HSR Community since 2019. Finally, this presentation should be of particular interest to teams collaborating on large research projects or preparing for international robotics competitions with the HSR.
-
"Introduction to Efficient Source Code Management for Team Collaboration on Robotics Projects", at 10th Symposium of the Intelligent Home Robotics Research Committee (iHR), Online, Sep. 19, 2020. [Invited lecture.][Conducted in Japanese.]
Abstract
The presentation describes the strategies and tools successfully deployed by the members of the HSR Software Development Environment Working Group (Toyota HSR Community) and Team NAIST-RITS-Panasonic (Nara Institute of Science and Technology, Ritsumeikan University, Panasonic Corporation) to 1) collaborate on large robotics projects in academia, and 2) participate in international robotics competitions. In particular, the presentation describes how to introduce multiple layers of abstraction between the different components and processes so that collaborators can focus on their individual expertise and rely on automated tests and simulations provided by the system. The ultimate goal is to share our proven experience in international competitions with other interdisciplinary teams to help them work together efficiently on large robotics projects.
Fundings
-
Grant Recipient, Research Promotion Program for Acquiring Grants-in-Aid for Scientific Research (KAKENHI)
Sep. 2019 by Ritsumeikan University with 200,000 JPY
in Kyoto, Japan
Details
Awarded a grant of 200,000 JPY to stimulate the acquisition of additional competitive funds through applications for Grants-in-Aid for Scientific Research (KAKENHI).
-
Grant Recipient, Research Promotion Program for Acquiring Grants-in-Aid for Scientific Research (KAKENHI)
Sep. 2018 by Ritsumeikan University with 200,000 JPY
in Kyoto, Japan
Details
Awarded a grant of 200,000 JPY to stimulate the acquisition of additional competitive funds through applications for Grants-in-Aid for Scientific Research (KAKENHI).
-
Grant Recipient, CREST AIP Challenge Program
Jun. 2018 by Japan Science and Technology Agency (JST) with 1,000,000 JPY
in Tokyo, Japan
Details
Awarded a research grant of 1,000,000 JPY as a young researcher belonging to a CREST team under the AIP Network Laboratory, in order to explore and develop original research related to the CREST project objectives.
-
Grant Recipient, MEXT Scholarship Program
Feb. 2014 by Japan Ministry of Education, Culture, Sports, Science and Technology (MEXT) with 6,060,000 JPY
in Tokyo, Japan
Details
Selected to conduct research in Japan through recommendation by the Japanese diplomatic mission in Belgium, and awarded a grant totaling 6,060,000 JPY over 3 years and 6 months.
-
Grant Recipient, EXPLORT Program
Sep. 2013 by Wallonia Foreign Trade and Investment Agency (AWEX) with 3,000 EUR
in Brussels, Belgium
Details
Selected among more than 70 candidates for an intensive training in international business and management, and awarded 3,000 EUR for a mission in the United States.
Projects
-
World Robot Summit 2020
Jun. 2019 - Present as Group Leader
-
HSR Software Development Environment Working Group (SDE-WG)
Apr. 2019 - Present as Group Leader
-
CREST: Symbol Emergence in Robotics for Future Human-Machine Collaboration
Oct. 2017 - Present as Team Member
-
R-GIRO: International and Interdisciplinary Research Center for the Next-Generation Artificial Intelligence and Semiotics (AI+Semiotics)
Oct. 2017 - Present as Research Assistant Professor
-
First Japanese-German-French Symposium for International Research and Applications on Artificial Intelligence
Oct. 2018 - Nov. 2018 as Moderator
-
World Robot Summit 2018
Oct. 2017 - Oct. 2018 as Group Leader
-
Amazon Robotics Challenge 2017
Oct. 2016 - Jul. 2017 as Team Member
-
Airbus Shopfloor Challenge 2016
Mar. 2016 - May 2016 as Team Member
-
STARE: Simultaneous Tracking and Attention Recognition from Eyes
Apr. 2014 - Sep. 2017 as Doctoral Student
-
TICO: Tiny intoPIX Codec
Sep. 2012 - Mar. 2014 as Research Intern
-
Eurobot 2012: Treasure Island
Sep. 2011 - Apr. 2012 as Team Member