KB Research enrichment infrastructure
The KB Research enrichment infrastructure consists of an enrichment database, dumps from DBpedia, Solr indexes, and a number of applications and services for accessing the enrichment database, for example to request, add, or delete enrichments. The main process is the automatic enrichment of newspaper articles. This process uses a disambiguation service that finds the most probable DBpedia article for each named entity.
The enrichments can be links to external resource descriptions, images, videos, and web pages, but also extracted features such as image type, genre, or topic. Although the focus is on enriching newspaper articles, the infrastructure allows enrichment of almost anything that has an identifier, whether internal or external. The results are demonstrated by means of an experimental portal (http://www.kbresearch.nl/xportal). This portal offers a range of functionality built on the enrichments, such as semantic search in the historical newspapers, manually adding or correcting enrichments, and geographical search.
The enrichments are mainly links to context information for a name in the article, or to a video or image, and are shown on top of the article display.
There is also a browser extension available for Delpher with limited functionality.
There are two basic types of resources that can be enriched:
Information resources (IR)
These are digital content objects such as newspaper articles and book pages. They can be requested and presented by means of a URL. Enrichments have the same identifier as the objects they enrich and can also be requested via a URL containing the identifier, for example:
Non-information resources (NIR)
These are links to resources such as things, concepts, and persons. A NIR record can contain links to more than one description at different locations. A description can be requested via a URL using one of the identifiers, for example http://www.kbresearch.nl/get_nir?identifier=DBP:Albert_Einstein or http://www.kbresearch.nl/get_nir?identifier=WD:Q937.
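As an illustration of the identifier scheme in the two example URLs, the following is a minimal Python sketch that builds a get_nir request URL from a prefix and a local identifier. The function names build_nir_url and split_identifier are hypothetical and not part of the KB Research services, and the reading of "DBP" as DBpedia and "WD" as Wikidata is an assumption based on the examples. The sketch only constructs the URL; it does not call the service or assume anything about the response format.

```python
from urllib.parse import quote

# Endpoint taken from the example URLs in the text.
NIR_ENDPOINT = "http://www.kbresearch.nl/get_nir"


def build_nir_url(prefix, local_id):
    """Build a get_nir URL for an identifier such as DBP:Albert_Einstein.

    Assumption: identifiers have the form "<prefix>:<local id>", as in the
    two examples in the text (DBP presumably DBpedia, WD presumably Wikidata).
    """
    identifier = f"{prefix}:{local_id}"
    # Keep the colon readable, percent-encode anything else that is unsafe.
    return f"{NIR_ENDPOINT}?identifier={quote(identifier, safe=':')}"


def split_identifier(identifier):
    """Split "WD:Q937" into ("WD", "Q937"), splitting on the first colon only."""
    prefix, _, local_id = identifier.partition(":")
    return prefix, local_id


if __name__ == "__main__":
    print(build_nir_url("DBP", "Albert_Einstein"))
    print(build_nir_url(*split_identifier("WD:Q937")))
```

Splitting on the first colon only keeps any colons inside the local part of an identifier intact.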
There are many different types of enrichments possible for the information resources. In some cases these enrichments refer to only a fragment of the content. Some examples:
Related web pages, videos etc.
Links to resource descriptions for named entities in text objects
Geographical coordinates of a location or an event in the text
Place and street name where an event took place
Extracted features from a text like sentiment
Different types of enrichments require different data fields. One of the fields is "reference", which identifies the algorithm used to obtain or generate the enrichment. Most references start with "ls-", for example "ls-2017-03-7", meaning that the enrichment is an automatically generated linked named entity, produced with the algorithm that was in use from that date. When the reference is "user", the enrichment was added by a user.
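The convention for the "reference" field can be illustrated with a small Python sketch that sorts a reference value into "user", "automatic", or "other". The function name classify_reference and the returned dictionary layout are hypothetical. The date format (year, month, day separated by hyphens after "ls-") is an assumption inferred from the single example "ls-2017-03-7"; references that do not match it are still treated as automatic but without a date.

```python
from datetime import date


def classify_reference(reference):
    """Classify the value of the "reference" field of an enrichment.

    Assumption: automatic references look like "ls-<year>-<month>-<day>",
    inferred from the example "ls-2017-03-7" in the text.
    """
    if reference == "user":
        # Enrichment added manually by a user.
        return {"source": "user", "algorithm_date": None}
    if reference.startswith("ls-"):
        try:
            year, month, day = (int(part) for part in reference[3:].split("-"))
            algorithm_date = date(year, month, day)
        except ValueError:
            # Wrong number of parts, non-numeric parts, or an invalid date.
            algorithm_date = None
        return {"source": "automatic", "algorithm_date": algorithm_date}
    # The text says "most" references start with "ls-", so others may exist.
    return {"source": "other", "algorithm_date": None}


if __name__ == "__main__":
    for value in ("ls-2017-03-7", "user", "something-else"):
        print(value, classify_reference(value))
```

Returning a date object rather than the raw string makes it straightforward to compare enrichments produced by different versions of the algorithm.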