id: bagga-hathi-2022
author: bagga
title: bagga-hathi-2022
date: 2022
pages: 10
extension: .pdf
mime: application/pdf
words: 5327
sentence: 293
flesch: 51
summary: The distribution of four features from our Enriched Feature set – average sentence length, Tuldava score, NRC positive score, and VADER positive score – across our dataset of fiction pages (red) and non-fiction pages (blue) sampled from 1800 to 1999. Studying long time scales necessarily requires large data collections as each time unit (year/decade) becomes sparser the less data one has.
cache: cache/bagga-hathi-2022.pdf
txt: txt/bagga-hathi-2022.txt