id: sbdevel-wordpress-com-8472
author:
title: Software Development at Royal Danish Library | A peekhole into the life of the software development department at the Royal Danish Library
date:
pages:
extension: .html
mime: text/html
words: 14413
sentences: 1184
flesch: 74
summary: It was custom tailored for the Solr index created with WARC-indexer and had features such as Trend analysis (n-gram) visualization of search results over time. Besides full-text search, Solr provides multiple ways of aggregating data, moving common net archive statistics tasks from slow batch processing to interactive requests. SolrWayback relies on real time access to WARC files and a Solr index populated by the UKWA webarchive-discovery tool. Whenever the content of a field is to be used for grouping, faceting, sorting, stats or streaming in Solr (or Elasticsearch or Lucene, where applicable), it is advisable to store it using DocValues. The linear access time was not a problem for small indexes or requests for values for a lot of documents, where most blocks need to be visited anyway. Our netarchive search contains 89 Solr collections, each holding 300M documents in 900GB of index data.
cache: ./cache/sbdevel-wordpress-com-8472.html
txt: ./txt/sbdevel-wordpress-com-8472.txt