Cultural heritage occupies an important place in the history of humankind, as it is one of the most powerful vehicles for the transfer and preservation of moral identity. As a result, these cultural assets are considered highly valuable and sometimes priceless. Digital technologies provide multiple tools that address challenges related to promotion and information access in the cultural context. Moreover, with recent progress in artificial intelligence (AI), deep learning, and data mining, large collections of cultural data hold even greater potential to add value and address current challenges in this field. In the present paper, we investigate several approaches that are used, or can potentially be used, to promote, curate, preserve, and add value to cultural heritage through new and evolving techniques based on deep learning. The deep learning approaches, entirely developed by our team, are intended to classify and annotate cultural data, complete missing data, and map existing data schemes and information to standardized schemes using language processing tools.
Keywords: Cultural Heritage, Digital Heritage, Deep Learning, CEPROQHA Project, Artificial Intelligence
Due to the high costs and risks involved in the physical preservation of cultural heritage assets, digital technologies present an effective alternative that optimizes their long-term preservation as well as their promotion in multiple ways. New digital technologies for management, curation, and conservation have proven reliable and effective at managing and promoting cultural heritage. This is mainly due to the increasing reliability and dropping costs of IT infrastructure, as well as the ease of use of information technology solutions. The concept of Digital Heritage thus emerged, and many new related applications were introduced. Heritage institutions such as museums and art galleries began digitizing their collections using photography and 3D scanning tools. Collection management systems were used to store and manage these assets. Other applications aimed at end users were also designed or adapted for cultural data, such as virtual museums, augmented reality (AR), and virtual reality (VR). This wide use of digital heritage resulted in large collections of heritage data, consisting mainly of visual assets associated with metadata. Several researchers recently saw the opportunity to leverage this data to tackle many challenges related to cultural heritage, such as the retrieval of metadata, the discovery of links between assets, and the digital curation of assets. These challenges have often been addressed with machine learning and semantic web technologies.
Since the rise of deep learning technologies in visual classification, natural language processing, and generative models such as Generative Adversarial Networks, many research contributions have been developed to address current challenges in digital heritage. The overall focus is on how to successfully leverage these tools to effectively manage, curate, preserve, and share cultural heritage. In this paper, we mainly cover deep learning-based approaches studied and developed by the CEPROQHA project team. These approaches target data-oriented challenges such as the annotation of assets, their curation, and their data management.
Data Analytics for cultural data enrichment and curation
Cultural data annotation
The CEPROQHA team designed several annotation and classification approaches for multiple scenarios. These approaches focus on either the full or the partial annotation of metadata. They rely mostly on the visual features and characteristics of cultural assets to predict the desired labels, using a combination of deep learning-based techniques such as convolutional neural networks (CNNs) and transfer learning. The frameworks we designed achieved excellent performance and were validated on several datasets of paintings collected from multiple museums and institutions, such as the Museum of Islamic Art in Doha, Qatar, WikiArt, the Rijksmuseum, and the Metropolitan Museum of Art in New York [REF4].
Multimodal classification
Convolutional neural networks are nowadays among the most powerful tools for visual classification. These networks leverage the visual features of images to predict textual labels, and several contributions have used CNNs for the classification and annotation of cultural data, typically training on datasets of paintings such as WikiArt and the Rijksmuseum collection. In the paintings context, for example, the trained models classify paintings to retrieve metadata such as the artist or the creation year. However, through analysis of these approaches, and after several consultations with art and culture specialists, it became clear that they do not reflect a realistic scenario: in practice, some information is often readily available alongside the visual captures of assets and can be used as additional input when predicting the missing data. As a result, our team designed and implemented a multimodal classification approach that leverages the textual data found alongside assets, in addition to their visual data, to perform an efficient multitask prediction of the missing metadata. After tests and evaluations against a single-input approach (visual input only), we found that our multimodal classification approach outperforms the traditional one in terms of validation accuracy on a paintings classification model (data from WikiArt), mainly because additional input data helps deep learning models achieve higher accuracy [REF5] (see Figure 2).
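The fusion step at the heart of such a multimodal, multitask model can be sketched as follows. This is a minimal forward-pass illustration only: the feature dimensions, the two task heads (artist, period) with their class counts, and the random weights are illustrative assumptions standing in for a trained CNN backbone and text encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Illustrative dimensions: precomputed image and text feature sizes,
# and a shared fused representation.
IMG_DIM, TXT_DIM, FUSED = 64, 32, 48
W_img = rng.normal(size=(IMG_DIM, FUSED)) * 0.1   # stands in for a CNN branch
W_txt = rng.normal(size=(TXT_DIM, FUSED)) * 0.1   # stands in for a text branch
# Two multitask heads (hypothetical label spaces): 10 artists, 5 periods.
W_artist = rng.normal(size=(2 * FUSED, 10)) * 0.1
W_period = rng.normal(size=(2 * FUSED, 5)) * 0.1

def multimodal_predict(img_feat, txt_feat):
    """Fuse visual and textual features, then predict both tasks at once."""
    h = np.concatenate([np.tanh(img_feat @ W_img),
                        np.tanh(txt_feat @ W_txt)], axis=-1)
    return softmax(h @ W_artist), softmax(h @ W_period)

img = rng.normal(size=(1, IMG_DIM))   # e.g. features of a painting photo
txt = rng.normal(size=(1, TXT_DIM))   # e.g. encoded accompanying text
p_artist, p_period = multimodal_predict(img, txt)
```

The design point is that both modalities contribute to one shared representation, from which each task head predicts its own label distribution.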
Multitask Hierarchical classification
Analysis of the structure of cultural datasets made it clear that metadata is not structured in the same way across asset types. For example, the metadata of a painting differs from that of a carpet or a piece of pottery. It is inefficient to design a single classifier that handles the completion or annotation of multiple types of assets with different visual features and different metadata structures. As a result, we designed a hierarchical classification framework that splits the classification of multiple types of cultural assets into two stages. In the first stage, a classifier (a CNN in this case) predicts the cultural category of the input. In the second stage, a set of multitask classifiers (each predicting multiple outputs) predicts the metadata of each type as assigned by the first-stage classifier. The results confirm that this approach is better optimized and more efficient than raw classification of data that is sparse and not well structured (see Figure 3 and Figure 4) [REF4, REF16].
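The two-stage routing described above can be sketched as follows. The category names, metadata fields, and stub classifiers here are hypothetical placeholders for trained models; the point is only the structure: one first-stage categorizer, then one multitask classifier per category, each with its own metadata schema.

```python
import numpy as np

# Stage 1: predict the cultural category (toy stand-in for a CNN).
CATEGORIES = ["painting", "carpet", "pottery"]  # illustrative categories

def stage1_category(features):
    return CATEGORIES[int(np.argmax(features[:3]))]

# Stage 2: one multitask classifier per category. Each returns a
# category-specific metadata schema (stub values for illustration).
def classify_painting(features):
    return {"artist": "unknown", "creation_year": "unknown"}

def classify_carpet(features):
    return {"region": "unknown", "material": "unknown"}

def classify_pottery(features):
    return {"technique": "unknown", "period": "unknown"}

STAGE2 = {"painting": classify_painting,
          "carpet": classify_carpet,
          "pottery": classify_pottery}

def hierarchical_classify(features):
    """Route the input to the specialist classifier for its category."""
    category = stage1_category(features)
    return category, STAGE2[category](features)
```

Because each second-stage model only ever sees assets of one type, its output heads can match that type's metadata structure exactly instead of covering a sparse union of all fields.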
Cultural Image Inpainting (completion)
Metadata is not the only information that can be missing from a cultural asset. Some assets may be physically damaged, have missing parts, or exist in a scattered form: for example, scattered pottery shards, a painting with a hole, or a statue with a missing limb [REF7]. Completing such assets is a challenge that often requires the involvement of experienced art specialists, which is both costly and impractical. Several techniques based on machine learning tools have tried to address these challenges with computer-based approaches. These methods were, however, not suitable in many scenarios, as they are difficult to use and often require very large visual databases (millions of photographs) to work efficiently. Newer deep learning models called Generative Adversarial Networks (GANs) have demonstrated superior performance in visual completion tasks [REF17-19]. The GAN model is inspired by game theory and represents a minimax game between two neural networks; GANs can be used to generate new data samples from a known context. The first network, called the generator, has to produce data samples from a random input. The second network, called the discriminator, is trained to differentiate between real data and samples produced by the generator. The training process consists of making both networks effective at their tasks: the generator has to generate samples that look nearly real, while the discriminator needs to identify real and fake samples accurately.
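The minimax game between the two networks is commonly written as the following objective, where G is the generator, D the discriminator, p_data the real-data distribution, and p_z the distribution of the random input:

```latex
\min_G \max_D \;
\mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
+ \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator maximizes this value by scoring real samples high and generated samples low, while the generator minimizes it by producing samples the discriminator cannot tell apart from real data.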
To use GANs for image completion, a process called semantic inpainting [REF18] was recently proposed; it consists of training a GAN to generate samples from a known visual domain. For completion, the output of the generator, which receives a random seed as input, is constrained so that it matches the context of the image we want to complete. Through our experiments, we observed that training a generator on data from entirely different visual contexts makes the completion process inaccurate. We thus used a divide-and-conquer strategy to group visual data from similar contexts into clusters, and for each of these clusters we trained a separate GAN, so that each GAN was trained on data with a similar visual context (see Figures 5 and 6). The inpainting results show that our approach improves on the original semantic inpainting method: instead of relying on a single GAN for completion, which can be inaccurate, we trained several GANs on restricted visual contexts. We tested this approach on paintings, and the results were clearly superior [REF7].
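The core idea of constraining the generator to the known context can be illustrated with a deliberately simplified toy. Here the "generator" is a fixed linear map rather than a trained GAN, so fitting the latent code to the unmasked pixels reduces to a masked least-squares problem; real semantic inpainting instead optimizes the latent code by gradient descent through a deep generator, typically adding a prior loss from the discriminator. All sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
PIXELS, Z_DIM = 20, 5
W = rng.normal(size=(PIXELS, Z_DIM))    # toy linear "generator": G(z) = W @ z

z_true = rng.normal(size=Z_DIM)
image = W @ z_true                      # the intact artwork (flattened)
mask = np.ones(PIXELS, dtype=bool)
mask[8:12] = False                      # a "hole" of missing pixels

# Constrain the generator output to agree with the known context:
# find z minimizing ||G(z)[mask] - image[mask]||^2. With a linear toy
# generator this masked fit has a closed-form least-squares solution.
z_hat, *_ = np.linalg.lstsq(W[mask], image[mask], rcond=None)

# The full generator output then fills in the hole consistently
# with the surrounding context.
completed = W @ z_hat
```

Because only the unmasked pixels enter the fit, the hole is filled purely by what the generator considers plausible given the context, which is why a generator trained on the right visual cluster matters so much.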
Cultural Ontology Learning and Information Extraction
Presenting cultural information in a standardized data scheme is essential to facilitate its exchange between institutions and to preserve the semantic links already established between assets, or between assets and events. Some standards already exist, such as the CIDOC-CRM model [REF20], which is intended for sharing and exchanging cultural information. However, implementing these standards across all cultural institutions is a real challenge because, in the current state, mapping already established data schemes is labor-intensive and time-consuming. Many research efforts have thus been undertaken to design and develop tools that can perform this mapping automatically. Natural Language Processing (NLP) technologies, increasingly based on deep learning methods, are among the most widely used tools for processing text in artificial intelligence.
These NLP tools can be adapted to perform the text processing required for data scheme mapping. They may also encourage data managers in museums and heritage organizations to shift from traditional databases to ontologies, since ontologies are known to be difficult to populate with data: the process is manual, time-consuming, labor-intensive, and only feasible for small knowledge domains. To be effective, the intended system needs to automate or semi-automate the extraction of metadata from different source types such as descriptions, stories, metadata records, websites, and titles. Most of these resources can be explored freely, and the goal is to spot relevant entities and the relationships between them. The whole process of building these blocks is referred to as Ontology Learning [REF21].
The NLP techniques that our team plans to use to tackle this challenge are Named Entity Recognition (NER) and Relation Extraction (RE). NER focuses on extracting domain entities, such as dates, places, and people's names, from unstructured text. The definitions of entities are either taken from freely accessible knowledge sources such as Wikipedia or discovered through NLP techniques [REF22, REF23] (see Figure 7). RE, on the other hand, is concerned with extracting occurrences of relations in the text, which facilitates the discovery of relationships between the domain entities mentioned there [REF21].
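A deliberately simplified, rule-based sketch of the NER step is shown below. The gazetteers and the date pattern are toy assumptions; a production system would instead use trained sequence models or knowledge sources such as Wikipedia, as noted above.

```python
import re

# Tiny gazetteers standing in for external knowledge sources.
PLACES = {"Doha", "Qatar", "New York"}
PEOPLE = {"Rembrandt", "Vermeer"}

# Four-digit years (1000-2099) as a crude stand-in for date detection.
DATE_RE = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")

def extract_entities(text):
    """Return (surface form, entity type) pairs found in the text."""
    entities = []
    for year in DATE_RE.findall(text):
        entities.append((year, "DATE"))
    # Capitalized token sequences are candidate proper nouns.
    for token in re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", text):
        if token in PLACES:
            entities.append((token, "PLACE"))
        elif token in PEOPLE:
            entities.append((token, "PERSON"))
    return entities
```

For example, a sentence mentioning an artist, a year, and a city would yield one tagged entity of each type, which the RE step could then link (e.g. artist "created in" year).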
As an example, Figure 8 illustrates a cultural asset from the collection of the Metropolitan Museum of Art in New York. Our goal is to extract relevant information from such assets and automatically transform it, using these NLP tools, into a common data scheme based on ontologies. Representing these assets digitally with an ontology opens the way to a wider spectrum of applications such as semantic search, browsing, and link discovery.
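The intended mapping from extracted metadata to an ontology-based scheme can be sketched as follows. The property names are simplified, CIDOC-CRM-flavoured illustrations only (the actual CRM models production events explicitly rather than attaching the artist directly to the object), and the field names are hypothetical.

```python
# Simplified mapping from flat metadata fields to CRM-style properties.
# These property choices are illustrative, not a faithful CRM encoding.
PROPERTY_MAP = {
    "artist": "crm:P14_carried_out_by",
    "creation_year": "crm:P4_has_time-span",
    "place": "crm:P7_took_place_at",
}

def to_triples(asset_uri, metadata):
    """Turn extracted metadata into (subject, predicate, object) triples."""
    triples = [(asset_uri, "rdf:type", "crm:E22_Man-Made_Object")]
    for field, value in metadata.items():
        if field in PROPERTY_MAP:   # unmapped fields are simply skipped
            triples.append((asset_uri, PROPERTY_MAP[field], value))
    return triples
```

Once assets are expressed as triples like these, standard semantic web tooling (SPARQL queries, reasoners, linked-data browsers) becomes directly applicable, enabling the semantic search and link discovery mentioned above.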