Atuhurra Jesse

Ph.D. Candidate, Natural Language Processing (NLP)

Nara Institute of Science and Technology (NAIST)

Hi There 👋

I am a PhD student at the Natural Language Processing Lab of NAIST, working with Prof. Taro Watanabe. I am also grateful to Hidetaka Kamigaito, Hiroyuki Shindo, and Hiroki Ouchi for their guidance.

My NLP research interests lie in Information Extraction (named entity recognition, entity linking), Knowledge Graphs, Multimodal AI, prompting in Large Language Models (LLMs), and Low-resource NLP. Broadly speaking, I am passionate about applying deep learning approaches to enable machines to understand human language and to facilitate communication between humans and social robots.

I’m interning with the Deutsches Forschungszentrum für Künstliche Intelligenz GmbH (DFKI), specifically the DFKI Lab Berlin, to construct language resources for Swahili and North European languages.

I’m working on social robots under the Guardian Robot Project at RIKEN, where I am contributing to First-person Multimodal Perception through attribute collection and Vision Language Models (VLMs). I work with Koichiro Yoshino.

I undertook a research internship at the Fujitsu AI Lab, where I worked on Multimodal Information Extraction with Prof. Tomoya Iwakura and Tatsuya Hiraoka.

I was affiliated with HONDA Research Institute Japan (HRI-JP) as a Part-time Researcher, and my work primarily focused on Intent Recognition in Language for a Social Robot. I collaborated with Eric Nichols and Anton de la Fuente.

Previously, during my master’s degree, I worked on Intrusion Detection in IoT networks in the Large-scale Systems Management Lab of NAIST, advised by Prof. Shoji Kasahara.

I spent time as a Research Student in the Yoshikawa Lab at the Graduate School of Informatics, Kyoto University. While there, I was supervised and mentored by Prof. Masatoshi Yoshikawa on methods related to Information Retrieval, Databases, Human-Computer Interface design, and Artificial Intelligence.

My graduate studies are fully funded by the Japanese government’s MEXT scholarship for which I am incredibly grateful.

Activities:
  • [Dec. 2024] Commenced research internship at DFKI Berlin 🇩🇪.
  • [Jan. 2024] Commenced work on multimodal perception for Robots, at RIKEN 🇯🇵.
  • [Sept. 2023] Started research internship at Fujitsu AI Lab 🇯🇵.
  • [Oct. 2021] Completed research internship, starting a new role at Honda 🇯🇵.
  • [Sept. 2021] Selected to participate in the AllenNLP Hacks 2021 🇺🇸.
Interests
  • Natural Language Processing
  • Multimodal Foundation Models
  • Social Robotics
  • Human-Robot Interaction
  • Representation Learning
Education
  • PhD in Information Science and Engineering (Expected), 2025

    Nara Institute of Science and Technology, Japan

  • MEng in Information Science and Engineering, 2022

    Nara Institute of Science and Technology, Japan

  • Research Student, 2020

    Kyoto University, Japan

  • BEng in Telecommunications Engineering, Jan 2016

    Kyambogo University, Uganda

Experience

 
 
 
 
 
German Research Center for Artificial Intelligence (DFKI)
Research Intern
Dec 2024 – Present Berlin, Germany
Research: Construct language resources for Swahili and North European languages.

RIKEN (R-IH)
Research Assistant
Jan 2024 – Present Sorakugun, Kyoto, Japan
Research: Multimodal Perception in Social Robots.

Fujitsu AI Lab
Research Intern
Sep 2023 – Dec 2023 Kawasaki, Kanagawa, Japan
Research: Multimodal Information Extraction, Vision Language Models (VLMs).

HONDA
Part-time Researcher
Nov 2021 – Mar 2023 Wako-shi, Saitama, Japan
Research: Named Entity Recognition, Knowledge Bases, Knowledge Graphs.

HONDA
Research Intern
Jul 2021 – Oct 2021 Wako-shi, Saitama, Japan
Research: Intent Recognition in Language for HARU.

NAIST
Graduate Student (MEng)
Apr 2020 – Mar 2022 Ikoma, Nara, Japan
I completed my Master’s degree in the Large-scale Systems Management Lab, where I worked on Intrusion Detection in IoT networks with Prof. Shoji Kasahara.

Gaba Corporation
English Language Instructor
Aug 2018 – Mar 2022 Kyoto/Osaka, Japan

Kyoto University
Research Student
Apr 2018 – Mar 2020 Kyoto, Japan
As a Research Student, I was mentored and supervised by Prof. Masatoshi Yoshikawa on methods in Information Retrieval, Databases, Human-Computer Interface design, and Artificial Intelligence.

United Nations Global Pulse Lab
Junior Researcher
Feb 2017 – Jul 2017 Kampala, Uganda
My work mainly included Big Data Analysis and the collection of GIS data.

Publications

Conferences & Preprints

Please find all my publications on Google Scholar.

Thesis

Dealing with Imbalanced Classes in Bot-IoT Dataset
Jesse Atuhurra
M.Eng Information Science and Engineering

Conferences

HLU: Human vs. LLM Generated Text Detection Dataset for Urdu at Multiple Granularities
Iqra Ali, Jesse Atuhurra, Hidetaka Kamigaito, Taro Watanabe
COLING 2025. Abu Dhabi, UAE. January 19–24, 2025.
Zero-shot Retrieval of User Intent in Human-Robot Interaction with Large Language Models
Jesse Atuhurra
IEEE MIPR 2024. San Jose, CA, USA. August 7–9, 2024.
The Impact of Large Language Models on Social Robots: Potential Benefits and Challenges
Jesse Atuhurra
Assistive Robots @ RSS 2024. Delft, Netherlands. July 15–19, 2024.

Preprints

NERsocial: Efficient Named Entity Recognition Dataset Construction for Human-Robot Interaction Utilizing RapidNER
Jesse Atuhurra, Hidetaka Kamigaito, Hiroki Ouchi, Hiroyuki Shindo, Taro Watanabe
arXiv: 6029236 (submission on hold)
Leveraging Large Language Models in Human-Robot Interaction: A Critical Analysis of Potential and Pitfalls
Jesse Atuhurra
arXiv:2405.00693
Revealing Trends in Datasets from the 2022 ACL and EMNLP Conferences
Jesse Atuhurra, Hidetaka Kamigaito
arXiv:2404.08666
Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models
Jesse Atuhurra, Iqra Ali, Tatsuya Hiraoka, Hidetaka Kamigaito, Tomoya Iwakura, Taro Watanabe
arXiv:2406.15359
Introducing Syllable Tokenization for Low-resource Languages: A Case Study with Swahili
Jesse Atuhurra, Hiroyuki Shindo, Hidetaka Kamigaito, Taro Watanabe
arXiv:2406.15358
Domain Adaptation in Intent Classification Systems: A Review
Jesse Atuhurra, Hidetaka Kamigaito, Taro Watanabe, Eric Nichols
arXiv:2404.14415
Image Classification for CSSVD Detection in Cacao Plants
Jesse Atuhurra, N'guessan Yves-Roland Douha, Pabitra Lenka
arXiv:2405.04535
Enrich Robots with Updated Knowledge in the Wild via Large Language Models
Jesse Atuhurra
RG.2.2.15798.31048
Distilling Named Entity Recognition Models for Endangered Species from Large Language Models
Jesse Atuhurra, Seiveright Cargill Dujohn, Hidetaka Kamigaito, Hiroyuki Shindo, Taro Watanabe
arXiv:2403.15430

Projects

Descriptions & Datasets

Distillation of Bio-species Information from LLMs

This project focuses on creating datasets for Named Entity Recognition (NER) and Relation Extraction (RE) in the domain of endangered species by distilling knowledge from GPT-4. We generated synthetic data about four classes of endangered species (amphibians, arthropods, birds, and fishes) using GPT-4, which was then verified by humans using external knowledge bases like IUCN and Wikipedia. The final dataset contains 3.6K sentences evenly split between NER and RE tasks, with annotations for species, habitat, feeding, and breeding entities, along with their relationships. We fine-tuned various BERT models (standard BERT, BioBERT, and PubMedBERT) on this dataset, with PubMedBERT achieving the best performance at 94.14% F1-score. Lastly, we demonstrated that GPT-4 performs better than UniversalNER-7B in zero-shot NER tasks on both easy and hard examples, confirming GPT-4's effectiveness as a teacher model for knowledge distillation in this domain.
[PDF] [Code]

Large-scale NER Dataset Construction

We introduce RapidNER, a framework for efficiently creating named entity recognition (NER) datasets for new domains, with a focus on human-robot interaction. The framework operates through three key steps: 1) extracting domain-specific knowledge from Wikidata using instance-of and subclass-of relations, 2) collecting diverse texts from Wikipedia, Reddit, and Stack Exchange, and 3) implementing an efficient annotation scheme using Elasticsearch. We demonstrate the framework by creating NERsocial, a new dataset containing 153K tokens, 134K entities, and 99.4K sentences across six entity types relevant for social interactions: drinks, foods, hobbies, jobs, pets, and sports. When fine-tuned on NERsocial, transformer models like BERT, RoBERTa, and DeBERTa-v3 achieve F1-scores above 95%. The framework significantly reduces dataset creation time and effort while maintaining high quality, as evidenced by a 90.6% inter-annotator agreement.
[PDF] [Code] [Data (HF)] [Website]
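The annotation step above (step 3) can be sketched roughly as a longest-match lookup against a gazetteer of entity surface forms. RapidNER itself uses Elasticsearch for this lookup; the plain Python dictionary, the entity types, and the example surface forms below are simplified illustrations, not the actual implementation.

```python
# Simplified sketch of gazetteer-based BIO annotation. RapidNER performs the
# lookup with Elasticsearch; a dict stands in here for illustration only.

def annotate(tokens, gazetteer):
    """Tag tokens with BIO labels via longest-match gazetteer lookup."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest span first so multi-word entities win.
        for j in range(len(tokens), i, -1):
            span = " ".join(tokens[i:j]).lower()
            if span in gazetteer:
                tags[i] = "B-" + gazetteer[span]
                for k in range(i + 1, j):
                    tags[k] = "I-" + gazetteer[span]
                i = j
                matched = True
                break
        if not matched:
            i += 1
    return tags

# Hypothetical gazetteer; in RapidNER, surface forms come from Wikidata
# via instance-of and subclass-of relations.
gazetteer = {"green tea": "DRINK", "sushi": "FOOD", "tennis": "SPORT"}
tokens = "I drink green tea after tennis".split()
print(annotate(tokens, gazetteer))
# → ['O', 'O', 'B-DRINK', 'I-DRINK', 'O', 'B-SPORT']
```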

Open-world Object Recognition

One of the hardest tasks for robots is recognizing unseen objects, which is essential for executing tasks in the robot's vicinity. In this project we leverage state-of-the-art foundation models to enhance the robot's ability to perceive the world around it, and then perform complex reasoning tasks.

Visual Semantic Understanding

We introduce nine vision-and-language (VL) tasks and construct multilingual datasets in English, Japanese, Swahili, and Urdu to evaluate vision language models (VLMs), with a particular focus on GPT-4V. The tasks include object recognition, scene understanding, relationship understanding, semantic segmentation, image captioning, image-text matching, unrelatedness (a newly introduced task), entity extraction, and visual question answering. We collected 1,000 image-text pairs from Wikinews and Wikipedia across 10 categories, selected 200 pairs for detailed analysis, and used GPT-4V to generate answers and rationales for each task in all four languages. Native speakers evaluated the quality of translations and model outputs using a 5-point Likert scale. The results showed that GPT-4V performed best in English (94.81% accuracy), followed by Urdu (90.56%), Japanese (88.09%), and Swahili (83.57%), with better performance on image-only tasks compared to tasks requiring both image and text understanding. This project contributes to the new frontier of comprehensive VL analysis in Swahili and Urdu languages.
[PDF] [Code]

Zero-shot Intent Recognition

This research project investigates zero-shot user intent classification in human-robot interaction using large language models (LLMs). We created a new dataset containing 33,812 sentences across four languages (English, Japanese, Swahili, and Urdu) and six intent classes (pet, food, job, hobby, sport, and drink). We leveraged Wikidata knowledge graphs to extract sentences from Wikipedia articles and tested six different prompting methods with various LLMs including GPT-4, Claude 3, and Gemma. The experiments demonstrated that well-crafted prompts, utilizing advanced prompting methods, enabled LLMs to achieve high accuracy in intent classification without requiring fine-tuning or example data, with GPT-4 and Claude 3 achieving nearly 95% accuracy across all languages. The study also showed that retrieval-augmented generation (RAG) improved classification performance, and simple zero-shot prompting was sufficient for achieving competitive results, especially with more capable LLMs like GPT-4 and Claude 3 Opus.
[PDF] [Code]
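The zero-shot setup above can be sketched as a prompt that lists the six intent labels with no examples, plus a parser that maps the model's reply back onto a label. The intent classes are from the project; the prompt wording is an illustrative assumption, and the LLM call itself is stubbed out since it requires an API key.

```python
# Minimal sketch of zero-shot intent classification via LLM prompting.
# The prompt text is illustrative; the six labels come from the dataset.

INTENTS = ["pet", "food", "job", "hobby", "sport", "drink"]

def build_prompt(utterance):
    """Compose a zero-shot prompt: label set only, no in-context examples."""
    labels = ", ".join(INTENTS)
    return (
        f"Classify the user's intent into exactly one of: {labels}.\n"
        f"Utterance: {utterance}\n"
        "Answer with the label only."
    )

def parse_intent(llm_output):
    """Map a raw model reply back onto a known intent label."""
    reply = llm_output.strip().lower()
    for intent in INTENTS:
        if intent in reply:
            return intent
    return None  # model produced no valid label

prompt = build_prompt("Could you recommend a good matcha shop nearby?")
# response = call_llm(prompt)  # e.g. GPT-4 or Claude 3 via their APIs
print(parse_intent("Drink."))
# → drink
```

Because the prompt constrains the answer to a closed label set, the parser stays trivial; more capable models rarely stray from the listed labels, which is what makes the zero-shot setting competitive.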

Contact

Email(s): atuhurra.jesse.ag2 [at] is.naist.jp OR atuhurrajesse [at] gmail.com