Enrichment references
There are several enrichment processes and algorithms. Each enrichment has a reference field that identifies the enrichment process or algorithm that produced it. These references are listed below, each with a short description, primarily for internal administration. The disambiguation algorithm for named entities in newspaper articles is continuously improved while the enrichment process keeps running. Enrichments for newspaper articles may therefore refer to different algorithms, depending on the date and time of enrichment.
An example reference can be seen at: http://tomcat.kbresearch.nl/links/ir?id=http://resolver.kb.nl/resolve?urn=ddd:010135660:mpeg21:a0026:ocr.
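As an illustration, the example URL above can be assembled from an article identifier. The sketch below assumes that the id parameter is simply the resolver URL of the article, passed without percent-encoding as in the example. The function name and constants are hypothetical, and the service may no longer be reachable.

```python
# Base URLs copied from the example reference above.
LINKS_SERVICE = "http://tomcat.kbresearch.nl/links/ir"
RESOLVER = "http://resolver.kb.nl/resolve"


def enrichment_url(urn: str) -> str:
    """Return the links-service URL for an article URN (hypothetical helper)."""
    # The example passes the resolver URL unencoded, so plain concatenation is used.
    return f"{LINKS_SERVICE}?id={RESOLVER}?urn={urn}"


print(enrichment_url("ddd:010135660:mpeg21:a0026:ocr"))
```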
  1. user
    Indicates that an enrichment is added by a user. The email address of the user is stored, but not shown.
  2. anp-entities-1
    First version: named entity recognition, followed by a search for complete NEs in a DBpedia dump.
  3. kranten-entities-1
    Search DBpedia entries in newspaper articles (0-10000 hits)
  4. kranten-entities-2
    Search DBpedia entries in newspaper articles (10000-20000 hits)
  5. kranten-entities-3
    Some improvements
  6. ls-2014-10
    1. Named entity recognition, 2. search in new DBpedia dumps and calculation of probability score
  7. ls-2014-11
    1. Named entity recognition, 2. search in new DBpedia dumps and calculation of probability score
    Bug fix: names and initials in the NE and in the DBpedia title must not conflict.
  8. ls-2015-03
    Now using new DBpedia dumps and Solr for searching these dumps. Processing moved to HPC cluster at SURFsara.
    In Solr, titles and redirects in Dutch and English are indexed both as string and as text. The current algorithm applied to a Solr query (q=title_str:""&sort=inlinks desc) is:
    Problems with NER: the system was switched from Tomcat 6 to 7, and Tomcat 6 and 7 give different NER results.
  9. ls-2015-03-2
    Improved distribution of CPU load: spread over time instead of in bursts.
    Still problems with NER and switching from Tomcat 6 to 7. NER results not reliable.
  10. ls-2015-03-3
    Still problems with switching from Tomcat 6 to 7 and unreliable NER results.
  11. ls-2015-04-7
    Links to the English DBpedia are sometimes incorrectly preferred over the Dutch DBpedia.
  12. ls-2015-05-18
    Above problem solved by first checking whether the DBpedia title corresponds to the named entity. Language code now correctly available in Solr.
    Initials are not yet handled, so A. Einstein is currently not connected to Albert Einstein.
  13. ls-2015-06-23
    Dump files with redirects are not complete.
  14. freebase_links_nl.nt.bz2
    Addition of Freebase entries to NIRs.
  15. freebase_links_en.nt.bz2
    Addition of Dutch Freebase entries to NIRs.
  16. geonames_links_en_en.nt.bz2
    Addition of Geonames to NIRs.
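
The Solr query mentioned under ls-2015-03 (q=title_str:"" with sort=inlinks desc) can be sketched as follows. This is a minimal sketch that only builds the query string for an exact title match, sorted by number of inlinks. The function name and the rows limit are assumptions, and the Solr endpoint is not given in this document.

```python
from urllib.parse import urlencode


def build_title_query(entity: str, rows: int = 10) -> str:
    """Build a Solr query string for an exact title match (hypothetical helper)."""
    # Escape backslashes and double quotes so the entity fits inside a quoted phrase.
    escaped = entity.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        "q": f'title_str:"{escaped}"',  # exact match on the string-indexed title
        "sort": "inlinks desc",          # entries with the most inlinks first
        "rows": rows,                    # assumption: limit on returned candidates
    }
    return urlencode(params)


print(build_title_query("Albert Einstein"))
```

The result can be appended to the select URL of whichever Solr core holds the DBpedia titles and redirects.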

Overview of all references (3-4-2018)

Spotlight: 17177021
bng-2015-10-27: 1
ls-2014-09: 89
ls-2014-10: 9804
ls-2014-11: 42627
ls-2015-03: 814
ls-2015-03-11: 40801915
ls-2015-03-2: 750
ls-2015-03-3: 1286
ls-2015-04-18: 7004
ls-2015-04-7: 99226
ls-2015-05-21: 257145
ls-2015-06-19: 24727
ls-2015-06-23: 327093
ls-2015-07-07: 138516
ls-2015-07-16: 115171
ls-2015-08-3: 19047991
ls-2016-03-21: 7700420
ls-2016-06-20: 3301277
ls-2016-12-22: 14820180
ls-2017-02-27: 32
ls-2017-07-24: 745154
ls-2017-10-16: 4042534
ls-2018-02-27: 227938
ls-2018-03-27: 39372
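
The overview above uses one "reference: count" pair per line, so it can be tallied with a few lines of Python. A minimal sketch, using three lines from the overview as sample input; the function name is hypothetical.

```python
def parse_overview(text: str) -> dict[str, int]:
    """Parse 'reference: count' lines into a dictionary."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the last colon; the count is always the final field.
        name, _, value = line.rpartition(":")
        counts[name.strip()] = int(value)
    return counts


sample = """Spotlight: 17177021
bng-2015-10-27: 1
ls-2014-09: 89"""

counts = parse_overview(sample)
total = sum(counts.values())
for name, count in counts.items():
    print(f"{name}: {count} ({count / total:.4%} of sample)")
```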