What is love?
A graduate student with whom I've been working assembled the full text of 600 Victorian novels. I used the Distant Reader Toolbox to model the corpus, which turned out to be close to 93 million words long. (By comparison, the Bible is about 0.8 million words long.)
I then applied concordancing to the corpus to answer the question, "What is love?", and some of the returned snippets are listed below.
- a beautiful thing." "i can not say. not experienced in beau
- a horror to every pure mind; it was to the minister the mos
- a pathological condition. i am painfully aware of the objec
- cheated every day in this way by offenders much more seriou
- indestructible! its holy flame for ever burneth-from heaven
- not so necessary to us women as people think. fine writers
- so splendid, it is the greatest pity it should be impossibl
- surely the early home of the heart. it came upon him so ple
- the greatest of all hypocrites." "perhaps that is true," sa
- the soul of an irish dragoon!' by jove, i am as delighted t
- worth the whole broad earth; give that, you give us all!" "
(The complete list is linked at ./love-is.txt.)
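For readers who have not seen concordancing before, here is a minimal sketch of the idea in plain Python. It is not the Distant Reader Toolbox's implementation. The function name `concordance`, the `width` parameter, and the sample text are all hypothetical, and it assumes the query was the phrase "love is", with each snippet being the text that follows a match.

```python
import re

def concordance(text, phrase, width=60):
    """Return the text that follows each occurrence of `phrase` (hypothetical helper)."""
    # Lowercase and collapse whitespace so line wraps do not break a match.
    normalized = re.sub(r"\s+", " ", text.lower())
    pattern = re.compile(r"\b" + re.escape(phrase.lower()) + r"\b")
    snippets = []
    for match in pattern.finditer(normalized):
        # Keep up to `width` characters after the matched phrase.
        following = normalized[match.end():match.end() + width].strip()
        snippets.append(following)
    # Sorting alphabetically groups similar continuations together.
    return sorted(snippets)

# Made-up sample text, standing in for the corpus.
sample = "Love is blind and lovers cannot see. She said that love is\nnot love which alters."
for snippet in concordance(sample, "love is", width=20):
    print("- " + snippet)
```

Against a 93-million-word corpus one would more likely stream the files one at a time than hold everything in a single string, but the matching logic stays the same.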
A colleague (Ben Companjen) then asked, "I [wonder] if certain literal expressions were more common, not necessarily if there was a classification?"
I then said to myself, "Literal expressions? Such are ngrams, and the attached Python script outputs a frequency list of ngrams from a configured file." I applied the script to the definitions, and some of the more interesting results include the following phrases and their frequencies:
- a man (22)
- the world (16)
- stronger than (15)
- the heart (11)
- a thing (10)
- a woman (9)
- not love (9)
- more than (8)
- a passion (8)
- the first (6)
- my heart (6)
- blind and (4)
- creative energy (2)
- one god (2)
- his religion (2)
- immortal and (2)
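The script itself is not reproduced here, so what follows is only a rough sketch of how such a frequency list might be computed. The function name `ngram_frequencies`, the tokenizing pattern, and the sample snippets are assumptions for illustration. It counts two-word ngrams within each snippet, so phrases never span two different definitions.

```python
import re
from collections import Counter

def ngram_frequencies(lines, n=2):
    """Count n-word phrases across a list of text snippets (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        # Very simple tokenizer: runs of letters and apostrophes, lowercased.
        tokens = re.findall(r"[a-z']+", line.lower())
        # Slide a window of n tokens across the snippet.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

# Made-up snippets, standing in for the lines of love-is.txt.
definitions = [
    "a man of honour",
    "stronger than death",
    "a man and the world",
    "stronger than fear",
]
for phrase, count in ngram_frequencies(definitions).most_common(2):
    print(f"- {phrase} ({count})")
```

Changing `n` to 3 would produce three-word phrases instead. Whether to drop stop words such as "a" and "the" before counting is a separate choice this sketch leaves out.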
Fun with text mining, natural language processing, and data science.
Creator: Eric Lease Morgan <firstname.lastname@example.org>
Source: This was originally posted to the Code4Lib Slack channel (October 24, 2022)
Date created: 2022-11-14
Date updated: 2022-11-14