Golsa Tahmasebzadeh

Hannover, Germany

Research Assistant at Leibniz Information Centre for Science and Technology (TIB) and L3S Research Center

PhD Candidate at the Gottfried Wilhelm Leibniz University of Hannover


Email me: golsatahm@gmail.com

My PhD focuses on multimodal analytics of multilingual news articles, with a specific emphasis on inferring and distinguishing geospatial, event, and temporal information within news narratives. My objective is to employ advanced multimodal machine learning techniques to effectively bridge the gap between visual and textual modalities. Furthermore, I aim to enrich news documents by integrating external knowledge derived from knowledge graphs, thereby enhancing the contextualization of news content. Such contextualization is important for organizing and analyzing news documents in applications such as fake news detection, fact-checking, and news retrieval.

As an Early Stage Researcher (ESR) in CLEOPATRA, a Marie Skłodowska-Curie Innovative Training Network (ITN), I was responsible for facilitating advanced cross-lingual processing of event-centric textual and visual information through the development of novel methods for the extraction, verification, and contextualization of multilingual information. Additionally, my role in the FakeNarratives project, which aims to understand narratives of disinformation in public and alternative news videos, has allowed me to directly confront a significant real-world challenge.

Education

PhD Candidate
Aug. 2019 - present
Gottfried Wilhelm Leibniz University of Hannover
MSc in Computer Engineering - Artificial Intelligence and Robotics
Sep. 2016 - Feb. 2019
Faculty of Electrical and Computer Engineering, University of Tabriz
BSc in Information Technology Engineering
Sep. 2012 - Aug. 2016
Faculty of Electrical and Computer Engineering, University of Tabriz

Projects Overview

Few-Shot Event Classification in Images using Knowledge Graphs for Prompting


Event classification in images plays a vital role in multimedia analysis, especially given the prevalence of fake news on social media and the Web. The majority of approaches for event classification rely on large sets of labeled training data. However, image labels for fine-grained event instances (e.g., the 2016 Summer Olympics) can be sparse, incorrect, or ambiguous. A few approaches have addressed the lack of labeled data for event classification, but they cover only a few events. Moreover, vision-language models that allow for zero-shot and few-shot classification with prompting have not yet been extensively exploited. In this paper, we propose four different techniques to create hard prompts incorporating knowledge graph information from Wikidata and Wikipedia, as well as an ensemble approach for zero-shot event classification. We also integrate prompt learning with state-of-the-art vision-language models to address few-shot event classification. Experimental results on six benchmarks, including a new dataset comprising event instances from various domains such as politics and natural disasters, show that our proposed approaches require far fewer training images than supervised baselines and the state of the art while achieving better results.

Paper link: Few-Shot Event Classification in Images using Knowledge Graphs for Prompting
Github link: https://github.com/TIBHannover/PromptImageEvent
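As a rough illustration of the hard-prompting idea, the sketch below builds prompt variants for an event class from knowledge-graph context (a Wikidata description and a Wikipedia summary). The field names and prompt templates here are illustrative assumptions, not the templates used in the paper.

```python
def build_prompts(event_label, wd_description=None, wiki_summary=None):
    """Construct hard-prompt variants for an event class, optionally
    enriched with knowledge-graph context (illustrative templates)."""
    prompts = [f"a photo of the event {event_label}."]
    if wd_description:
        # Wikidata description gives a short disambiguating gloss
        prompts.append(f"a photo of {event_label}, {wd_description}.")
    if wiki_summary:
        # keep only the first sentence to stay within typical token limits
        first = wiki_summary.split(". ")[0].rstrip(".")
        prompts.append(f"a photo of {event_label}. {first}.")
    return prompts

# Hypothetical example for a fine-grained event instance
prompts = build_prompts(
    "2016 Summer Olympics",
    wd_description="games of the XXXI Olympiad, held in Rio de Janeiro in 2016",
    wiki_summary=(
        "The 2016 Summer Olympics were an international multi-sport event. "
        "They were held in Rio de Janeiro."
    ),
)
```

In a zero-shot setup, each prompt would be embedded by a vision-language model's text encoder and the resulting similarities to the image embedding could be averaged, which is one simple way to realize the ensemble idea mentioned above.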

GeoWINE: Geolocation based Wiki, Image, News and Event Retrieval


In the context of social media, geolocation inference for news and events has become an important task. In this paper, we present the GeoWINE (Geolocation-based Wiki-Image-News-Event retrieval) demonstrator, an effective modular system for multimodal retrieval that expects only a single image as input. The GeoWINE system consists of five modules for retrieving related information from various sources. The first module is a state-of-the-art model for geolocation estimation of images. The second module performs a geospatial query for entity retrieval using the Wikidata knowledge graph. The third module exploits four different image embedding representations, which are used to retrieve the entities most similar to the input image. The last two modules perform news and event retrieval from EventRegistry and the Open Event Knowledge Graph (OEKG). GeoWINE provides an intuitive interface for end users and is insightful for experts to reconfigure for individual setups. GeoWINE achieves promising results in entity label prediction for images on the Google Landmarks dataset.

Paper link: GeoWINE: Geolocation based Wiki, Image, News and Event Retrieval
Github link: https://github.com/cleopatra-itn/GeoWINE
Demo link: http://cleopatra.ijs.si/geowine/
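As an illustration of the geospatial entity-retrieval step, the sketch below constructs a SPARQL query for the Wikidata endpoint using the `wikibase:around` service to find entities with coordinates (property P625) within a given radius of a point. The query shape is a generic example of such geospatial queries, not GeoWINE's actual implementation.

```python
def geospatial_query(lat, lon, radius_km=25, limit=100):
    """Build a SPARQL query for Wikidata entities with a coordinate
    location (P625) within radius_km of the given point."""
    return f"""
SELECT ?item ?itemLabel ?loc WHERE {{
  SERVICE wikibase:around {{
    ?item wdt:P625 ?loc .
    bd:serviceParam wikibase:center "Point({lon} {lat})"^^geo:wktLiteral .
    bd:serviceParam wikibase:radius "{radius_km}" .
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {limit}"""

# e.g., entities within 25 km of Hannover (WKT points are "lon lat")
query = geospatial_query(52.37, 9.73)
```

The coordinates predicted by the geolocation-estimation module would serve as the query's center point, and the retrieved entities then feed the image-similarity module.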

Multimodal Geolocation Estimation of News Photos


The widespread growth of multimodal news requires sophisticated approaches to interpret the content and relations of different modalities. Images are of utmost importance since they represent the visual gist of a whole news article. For example, it is essential to identify the locations of natural disasters for crisis management, or to analyze political and social events across the world. In some cases, verifying the location(s) claimed in a news article can help human assessors or fact-checking efforts to detect misinformation, i.e., fake news. Existing methods for geolocation estimation typically consider only a single modality, e.g., images or text. However, news images can lack sufficient geographical cues to estimate their locations, and the text can refer to various possible locations. In this paper, we propose a novel multimodal approach to predict the geolocation of news photos. To enable this approach, we introduce a novel dataset called Multimodal Geolocation Estimation of News Photos (MMG-NewsPhoto). MMG-NewsPhoto is, so far, the largest dataset for this task and contains more than half a million news texts with corresponding images, out of which 3000 photos were manually labeled for photo geolocation based on information from the image-text pairs. For a fair comparison, we optimize and assess state-of-the-art methods using the new benchmark dataset. Experimental results show the superiority of the multimodal models compared to the unimodal approaches.

Paper link: Multimodal Geolocation Estimation of News Photos
Github link: https://github.com/TIBHannover/mmg-newsphoto
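Geolocation estimation is commonly framed as classification over a partition of the earth's surface into cells. The sketch below uses a simple uniform latitude/longitude grid to convey the idea; this is a didactic simplification, since real systems typically use adaptive partitions, and the multimodal model would predict a distribution over such cells from fused image and text features.

```python
def cell_id(lat, lon, cell_deg=5.0):
    """Map a coordinate to the id of a fixed lat/lon grid cell."""
    cols = int(360 // cell_deg)
    row = int((lat + 90) // cell_deg)   # 0 at the south pole
    col = int((lon + 180) // cell_deg)  # 0 at the antimeridian
    return row * cols + col

def cell_center(cid, cell_deg=5.0):
    """Recover the (lat, lon) center of a grid cell, used as the
    predicted location when the classifier picks that cell."""
    cols = int(360 // cell_deg)
    row, col = divmod(cid, cols)
    return (row * cell_deg - 90 + cell_deg / 2,
            col * cell_deg - 180 + cell_deg / 2)
```

Under this framing, the classifier's output over cells is converted back to coordinates via the cell center, so the grid resolution bounds the localization error.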

MLM: A Benchmark Dataset for Multitask Learning with Multiple Languages and Modalities


In this paper, we introduce the MLM (Multiple Languages and Modalities) dataset, a new resource to train and evaluate multitask systems on samples in multiple modalities and three languages. The generation process and the inclusion of semantic data provide a resource that further tests the ability of multitask systems to learn relationships between entities. The dataset is designed for researchers and developers who build applications that perform multiple tasks on data encountered on the web and in digital archives. A second version of MLM provides a geo-representative subset of the data with weighted samples for countries of the European Union. We demonstrate the value of the resource in developing novel applications in the digital humanities with a motivating use case, and specify a benchmark set of tasks to retrieve modalities and locate entities in the dataset. Evaluation of baseline multitask and single-task systems on the full and geo-representative versions of MLM demonstrates the challenges of generalizing on diverse data. In addition to the digital humanities, we expect the resource to contribute to research in multimodal representation learning, location estimation, and scene understanding.

Paper link: MLM: A Benchmark Dataset for Multitask Learning with Multiple Languages and Modalities
Github link: https://github.com/GOALCLEOPATRA/MLM
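Cross-modal retrieval tasks like those in this benchmark are typically scored by ranking gallery embeddings from one modality against a query embedding from the other. A minimal sketch of that ranking step, assuming the embeddings have already been projected into a shared space (the embedding values below are placeholders):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank(query_vec, gallery):
    """Rank gallery items (id, vector) by similarity to the query,
    most similar first."""
    return sorted(gallery, key=lambda item: cosine(query_vec, item[1]),
                  reverse=True)

# e.g., retrieving captions for an image embedding (toy 2-d vectors)
gallery = [("caption_1", [1.0, 0.0]),
           ("caption_2", [0.9, 0.1]),
           ("caption_3", [0.0, 1.0])]
ranked = rank([1.0, 0.0], gallery)
```

Standard retrieval metrics such as recall@k are then computed over these rankings.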

Publications
