This is a table of quadgrams (four-word phrases) and their frequencies. Use it to search and browse the list to learn more about your study carrel.
quadgram | frequency |
---|---|
td td td td | 476 |
span a href http | 414 |
li li a href | 316 |
li a href http | 300 |
cmd wordsearch phrase zzzz | 298 |
zzzz word zzzz bookcode | 298 |
phrase zzzz word zzzz | 298 |
wordsearch phrase zzzz word | 298 |
life of a librarian | 224 |
li b source b | 217 |
li li b source | 217 |
when it comes to | 215 |
on the other hand | 212 |
td tr tr td | 208 |
word zzzz bookcode etext | 199 |
bookcode etext segsize usepre | 199 |
etext segsize usepre word | 199 |
zzzz bookcode etext segsize | 199 |
td img src http | 197 |
li b keywords b | 196 |
the a href http | 190 |
university of notre dame | 167 |
in the form of | 166 |
ul li b keywords | 165 |
through the use of | 155 |
li li b date | 154 |
tr align center td | 149 |
at the same time | 143 |
was never formally published | 141 |
a li li a | 139 |
align center td img | 134 |
at the university of | 134 |
as well as the | 133 |
center td img src | 132 |
the university of notre | 128 |
br span a href | 119 |
this text was never | 108 |
a a href http | 104 |
p a href http | 103 |
p img src http | 103 |
word zzzz bookcode thoreau | 99 |
span td tr table | 98 |
table align right tr | 98 |
td tr table p | 98 |
cmd term id themes | 97 |
a span td tr | 96 |
with the advent of | 93 |
it is possible to | 92 |
br a href http | 89 |
the number of times | 88 |
as well as a | 87 |
text was never formally | 87 |
p table align right | 84 |
for the purposes of | 84 |
align right tr align | 82 |
right tr align center | 82 |
called a href http | 82 |
a part of the | 80 |
cmd term id formats | 78 |
catalogue of electronic texts | 78 |
li a href https | 77 |
li b facet terms | 77 |
local annotated a li | 77 |
annotated a li ul | 77 |
ul li b creator | 77 |
li li b facet | 77 |
pdf local annotated a | 77 |
li b rights b | 77 |
b date read b | 77 |
li b date created | 77 |
li b date read | 77 |
li li b versions | 77 |
li li b rights | 77 |
b facet terms b | 77 |
b date created b | 77 |
quite a number of | 72 |
alex catalogue of electronic | 72 |
jpg br span a | 71 |
id p table align | 71 |
it a span td | 71 |
td td align right | 71 |
getwater id p table | 71 |
map it a span | 71 |
the use of the | 71 |
i li li strong | 71 |
cmd getwater id p | 71 |
p p a href | 69 |
can be used to | 68 |
my experiences at the | 67 |
the purpose of the | 67 |
p p img src | 67 |
open li li b | 66 |
td td align center | 66 |
table align center tr | 65 |
on the topic of | 65 |
td td img src | 65 |
the problem of find | 64 |
as a part of | 63 |
p table align center | 63 |
may or may not | 62 |
is a list of | 61 |
the use of a | 60 |
the result will be | 60 |
next generation library catalogs | 58 |
the library of congress | 58 |
of open source software | 57 |
the form of a | 57 |
td align right td | 57 |
of a href http | 56 |
will be able to | 56 |
digital humanities computing techniques | 55 |
in an effort to | 55 |
number of words in | 54 |
available on the web | 54 |
you may or may | 53 |
png img src http | 53 |
a list of all | 53 |
the content of the | 53 |
div id attachment class | 53 |
com sandbox liam sparql | 53 |
id attachment class wp | 53 |
width height class size | 52 |
ul li a href | 52 |
align center img src | 51 |
center img src http | 51 |
of words in a | 50 |
of the great books | 50 |
td align center img | 50 |
or may not know | 49 |
is a type of | 49 |
had the opportunity to | 48 |
the most frequently used | 48 |
frequency inverse document frequency | 48 |
jpg img src http | 48 |
enabling the reader to | 47 |
a part of a | 47 |
of the western world | 46 |
the result ought to | 46 |
as you may or | 46 |
great books of the | 46 |
list of all the | 46 |
of a number of | 46 |
books of the western | 46 |
p ul li strong | 45 |
for the most part | 45 |
for a good time | 45 |
each item in the | 45 |
conference on digital libraries | 45 |
gov authorities names n | 45 |
this is a pre | 45 |
and at the same | 44 |
are a number of | 44 |
center tr align center | 44 |
p ul li a | 44 |
align center tr align | 44 |
more things in common | 43 |
the distant reader is | 43 |
alt width height class | 43 |
here at notre dame | 43 |
a target blank href | 43 |
enable the reader to | 43 |
the number of words | 42 |
term frequency inverse document | 42 |
txt plain text a | 42 |
with a number of | 41 |
words in a document | 41 |
open source software and | 41 |
result ought to be | 40 |
things in common than | 39 |
the whole of the | 39 |
p div id attachment | 39 |
digital public library of | 39 |
library association annual meeting | 39 |
public library of america | 39 |
result will be a | 38 |
but not limited to | 38 |
documents my experiences at | 38 |
of the great ideas | 38 |
european conference on digital | 38 |
and a href http | 38 |
at the very least | 37 |
is intended to be | 37 |
i had the opportunity | 37 |
target blank href http | 37 |
there are a number | 36 |
apache tika release apache | 36 |
tika release apache tika | 36 |
library collections and services | 36 |
total number of documents | 36 |
are the great books | 36 |
i was able to | 36 |
in the united states | 36 |
the hathitrust research center | 36 |
this article was originally | 36 |
in common than differences | 35 |
the full text of | 35 |
a p id caption | 35 |
it comes to the | 35 |
the download page for | 35 |
create a list of | 35 |
download page for more | 35 |
com sandbox liam id | 35 |
not intended to be | 34 |
a number of things | 34 |
all of these things | 34 |
the great ideas coefficient | 34 |
contains a set of | 34 |
is a set of | 34 |
height class aligncenter size | 34 |
open source software in | 34 |
width height class aligncenter | 34 |
make it easier to | 34 |
open source software is | 33 |
allowing the reader to | 33 |
caption aligncenter a href | 33 |
is one of the | 33 |
html main sandbox liam | 33 |
www html main sandbox | 33 |
the alex catalogue of | 33 |
to read and write | 33 |
disk www html main | 33 |
an overview of the | 33 |
and open source software | 33 |
the next step is | 33 |
aligncenter a href http | 33 |
can be applied to | 33 |
next step is to | 32 |
a type of http | 32 |
ought to be a | 32 |
are intended to be | 32 |
in the same breath | 32 |
intended to be read | 32 |
as opposed to the | 32 |
the person next to | 32 |
in a number of | 32 |
the end of the | 32 |
to the use of | 32 |
center for digital scholarship | 32 |
total number of words | 32 |
as if it were | 32 |
have more things in | 32 |
a plain text file | 31 |
a look at the | 31 |
linked data is a | 31 |
a number of ways | 31 |
catholic research resources alliance | 31 |
source software in libraries | 31 |
in the public domain | 31 |
i am able to | 31 |
li li b keywords | 31 |
for each of the | 31 |
the american library association | 30 |
li ul p the | 30 |
a span violent span | 30 |
see the list of | 30 |
the problem of use | 30 |
align right src http | 30 |
will need to be | 30 |
the great books of | 30 |
i collected this water | 30 |
the items in the | 30 |
this text documents my | 30 |
the world wide web | 30 |
to be able to | 30 |
was originally published in | 30 |
of plain text files | 30 |
is an acronym for | 29 |
center for research computing | 29 |
if you want to | 29 |
old is new again | 29 |
you will need to | 29 |
each of the great | 29 |
for more information on | 29 |
span man is span | 29 |
is old is new | 29 |
the code lib community | 29 |
align right td td | 29 |
in the triple store | 28 |
align right td tr | 28 |
catholic youth literature project | 28 |
formats web articles a | 28 |
number of documents in | 28 |
person next to you | 28 |
plot on a timeline | 28 |
the frequency of words | 28 |
described in this proposal | 28 |
the good work of | 28 |
id formats web articles | 28 |
jpg alt width height | 28 |
of the semantic web | 28 |
the use of computers | 28 |
number of times the | 28 |
is a part of | 28 |
most frequently used words | 28 |
the balance of the | 28 |
enables the reader to | 28 |
img width src http | 28 |
term id formats web | 28 |
a chttp a f | 28 |
have been able to | 27 |
speech and named entities | 27 |
north carolina state university | 27 |
of these things are | 27 |
article was originally published | 27 |
intended to be used | 27 |
libraries of notre dame | 27 |
page for more information | 27 |
img align right src | 27 |
the names of people | 27 |
a greater number of | 27 |
in a href http | 27 |
text documents my experiences | 27 |
have a look at | 27 |
p blockquote p code | 27 |
text mining is a | 27 |
p img align right | 27 |
this is a list | 27 |
plot on a map | 26 |
more information on how | 26 |
the other end of | 26 |
putting it on the | 26 |
more like this one | 26 |
look at the download | 26 |
are some of the | 26 |
are not limited to | 26 |
the idea of the | 26 |
the university of michigan | 26 |
top tech trends for | 26 |
linked data is not | 26 |
wiki declaration of independence | 26 |
is expected to be | 26 |
are expected to be | 26 |
information on how to | 26 |
org wiki declaration of | 26 |
height br a href | 26 |
and the number of | 26 |
it on the web | 26 |
the result is a | 26 |
of full text content | 26 |
a span td td | 26 |
at the download page | 26 |
h summary h p | 26 |
width height br a | 26 |
but at the same | 25 |
at the other end | 25 |
here are a few | 25 |
next generation library catalog | 25 |
list of changes in | 25 |
plain text versions of | 25 |
a td tr table | 25 |
rdf and linked data | 25 |
libraries at the university | 25 |
full list of changes | 25 |
my a href http | 25 |
as a set of | 25 |
the advent of the | 25 |
on how to obtain | 25 |
please see the changes | 25 |
right td tr tr | 25 |
how to obtain apache | 25 |
the complete works of | 25 |
of documents in a | 25 |
the file named data | 25 |
to obtain apache tika | 25 |
and natural language processing | 25 |
in order to be | 25 |
american library association annual | 24 |
is not so much | 24 |
a number of years | 24 |
provide access to the | 24 |
open community knowledge hypermedia | 24 |
all of this is | 24 |
computer programs and scripts | 24 |
the value of the | 24 |
find more like this | 24 |
this presentation was given | 24 |
knowledge hypermedia administration and | 24 |
be used as a | 24 |
denoting the number of | 24 |
hypermedia administration and metadata | 24 |
code p blockquote p | 24 |
the lita blog at | 24 |
number of times each | 24 |
figure out how to | 24 |
the whole thing is | 24 |
lita blog at http | 24 |
and have a look | 24 |
documents in a corpus | 24 |
in the life of | 24 |
to learn how to | 24 |
eric lease morgan eric | 24 |
and the digital humanities | 24 |
a digital library framework | 24 |
community knowledge hypermedia administration | 24 |
frequency li li a | 24 |
make it easier for | 23 |
morgan eric morgan infomotions | 23 |
from the internet archive | 23 |
this is a good | 23 |
the north carolina state | 23 |
is made up of | 23 |
what is old is | 23 |
hesburgh libraries at the | 23 |
a perl module called | 23 |
lease morgan eric morgan | 23 |
of the digital humanities | 23 |
in the right direction | 23 |
is a good thing | 23 |
code pre blockquote p | 23 |
to answer the question | 23 |
the hesburgh libraries at | 23 |
there are a few | 23 |
is akin to a | 23 |
p blockquote pre code | 23 |
to accomplish this goal | 23 |
it is important to | 23 |
problem to be solved | 22 |
to make a book | 22 |
to what degree is | 22 |
the content of a | 22 |
the ala annual meeting | 22 |
the opportunity to visit | 22 |
with a set of | 22 |
and why should i | 22 |
this subdirectory contains a | 22 |
the use of these | 22 |
as but not limited | 22 |
the catholic research resources | 22 |
of globally networked computers | 22 |
p ol li strong | 22 |
tr tr align center | 22 |
word in a text | 22 |
the creation of a | 22 |
of the things i | 22 |
id formats journal articles | 22 |
how to use the | 22 |
p blockquote p the | 22 |
to figure out how | 22 |
are not intended to | 22 |
the context of the | 22 |
of publishing linked data | 22 |
themes libraries and librarianship | 22 |
whether or not the | 22 |
the total number of | 22 |
a violent fit of | 22 |
the purpose of this | 22 |
introduction to the nltk | 22 |
and or their frequency | 22 |
of the hesburgh libraries | 22 |
why should i care | 22 |
some of the things | 22 |
of one or more | 22 |
in the end i | 22 |
formats journal articles a | 22 |
but are not limited | 22 |
include but are not | 22 |
not seem to be | 22 |
such as but not | 22 |
eric lease morgan lt | 22 |
cite a href http | 22 |
subdirectory contains a set | 22 |
term id formats journal | 22 |
word cloud illustrating the | 22 |
with the exception of | 22 |
the national library of | 22 |
available as linked data | 21 |
of the human condition | 21 |
themes data curation a | 21 |
this was originally a | 21 |
p this is a | 21 |
is a collection of | 21 |
document was never formally | 21 |
in the first place | 21 |
li go to step | 21 |
a few years ago | 21 |
in this repository all | 21 |
analytics cookies to understand | 21 |
this repository all github | 21 |
to understand how you | 21 |
this document was never | 21 |
edited edited copy for | 21 |
cookies to understand how | 21 |
id themes data curation | 21 |
this travel log documents | 21 |
a and a href | 21 |
copy for eric lease | 21 |
understand how you use | 21 |
term id themes data | 21 |
articulating a research question | 21 |
a presentation at the | 21 |
a number of other | 21 |
this essay was originally | 21 |
td a href http | 21 |
from a set of | 21 |
edited copy for eric | 21 |
for eric lease morgan | 21 |
of the a href | 21 |
td tr tr align | 21 |
what it means to | 21 |
the catholic pamphlets project | 21 |
li li go to | 21 |
university libraries of notre | 21 |
p h summary h | 21 |
in any number of | 20 |
a td td img | 20 |
and putting it on | 20 |
is no such thing | 20 |
p h a id | 20 |
this is where the | 20 |
the use of rdf | 20 |
to what degree do | 20 |
plain text version of | 20 |
a great deal of | 20 |
it seems as if | 20 |
from the command line | 20 |
water and putting it | 20 |
into a coherent whole | 20 |
locations in a text | 20 |
is not the problem | 20 |
the contents of the | 20 |
is an introduction to | 20 |
a number of us | 20 |
advocate the use of | 20 |
for feature in features | 20 |
words phrases in a | 20 |
a plain text version | 20 |
a growing number of | 20 |
founding date ad http | 20 |
we do not advocate | 20 |
cloud illustrating the most | 20 |
in a text services | 20 |
the bulk of the | 20 |
p this is the | 20 |
the idea of a | 20 |
my point of view | 20 |
at least a couple | 20 |
the services against texts | 20 |
release and have a | 20 |
from my point of | 20 |
the work of the | 20 |
in the file named | 20 |
frequently used words in | 20 |
on the good work | 20 |
on a web server | 20 |
org oclc worldcat a | 20 |
has founding date ad | 20 |
in the case of | 20 |
it is time to | 20 |
through the creation of | 20 |
is probably the most | 20 |
number of documents containing | 20 |
and locations in a | 20 |
very similar to the | 20 |
take the form of | 20 |
li ul li li | 20 |
there is no such | 20 |
here at the university | 20 |
a word cloud illustrating | 20 |
at a href http | 20 |
back a list of | 20 |
a long time ago | 20 |
have a number of | 19 |
to a href http | 19 |
used to denote the | 19 |
code li li find | 19 |
how to make a | 19 |
tech trends for ala | 19 |
an analysis of the | 19 |
step in the right | 19 |
a number of different | 19 |
the name of a | 19 |
a code li li | 19 |
archival descriptions as linked | 19 |
last of the mohicans | 19 |
tools described in this | 19 |
in a presentation called | 19 |
in order to make | 19 |
be a list of | 19 |
return a list of | 19 |
descriptions as linked data | 19 |
p p in the | 19 |
open source software for | 19 |
freely available on the | 19 |
this posting describes how | 19 |
one of the more | 19 |
in the same way | 19 |
of the top ten | 19 |
a wide variety of | 19 |
you will find a | 19 |
more than ten years | 19 |
answer questions such as | 19 |
online public access catalogs | 19 |
on the lita blog | 19 |
intended to be a | 19 |
code lib mailing list | 19 |
i gave a presentation | 19 |
carolina state university libraries | 19 |
content uploads c l | 19 |
the code lib mailing | 19 |
the reader to do | 19 |
an integrated library system | 19 |
p blockquote p a | 19 |
a set of perl | 19 |
would not have been | 19 |
early english books online | 19 |
from a number of | 19 |
is located in the | 18 |
a better understanding of | 18 |
to what degree are | 18 |
be applied to the | 18 |
a a a a | 18 |
published in computers in | 18 |
the results of text | 18 |
is a travel log | 18 |
in computers in libraries | 18 |
is a good example | 18 |
least a couple of | 18 |
item in the corpus | 18 |
the reader can see | 18 |
the collection as a | 18 |
as well as some | 18 |
make sense of the | 18 |
on a world map | 18 |
a word of interest | 18 |
use of a concordance | 18 |
org library virtue htm | 18 |
this blog posting describes | 18 |
a whole lot of | 18 |
get back a list | 18 |
not advocate the use | 18 |
counting the number of | 18 |
as if they were | 18 |
it used to be | 18 |
the answers to these | 18 |
of the items in | 18 |
ought to be able | 18 |
if it were a | 18 |
each record in the | 18 |
for the long haul | 18 |
can be used in | 18 |
the content they find | 18 |
can be imported into | 18 |
the increasing availability of | 18 |
to make sense of | 18 |
collection as a whole | 18 |
list of a href | 18 |
of books and journals | 18 |
the university libraries of | 18 |
people who work in | 18 |
a whole lot like | 18 |
the number of documents | 18 |
this is the first | 18 |
illustrating the most frequent | 18 |
to participate in the | 18 |
a set of books | 18 |
the reader to use | 18 |
was not able to | 18 |
do not advocate the | 18 |
use of the levenshtein | 18 |
at the a href | 18 |
the problem to be | 18 |
text version of a | 18 |
what are some of | 18 |
copies keep stuff safe | 18 |
he went on to | 17 |
to take advantage of | 17 |
it is more than | 17 |
tr td align center | 17 |
words in a text | 17 |
through the application of | 17 |
web and linked data | 17 |
worth a thousand words | 17 |
in the digital humanities | 17 |
they can be used | 17 |
text as well as | 17 |
ask and answer questions | 17 |
a li ul p | 17 |
picture is worth a | 17 |
the list of a | 17 |
mining and natural language | 17 |
outlines some of my | 17 |
width height src http | 17 |
advent of the internet | 17 |
is worth a thousand | 17 |
this posting documents my | 17 |
is one way to | 17 |
collecting water and putting | 17 |
provide the means for | 17 |
it is used to | 17 |
concord and merrimack rivers | 17 |
version of an article | 17 |
university of illinois at | 17 |
through a set of | 17 |
p p img align | 17 |
the google books project | 17 |
text mining and natural | 17 |
in the process of | 17 |
the concord and merrimack | 17 |
how they can be | 17 |
for the library profession | 17 |
to come up with | 17 |
on the concord and | 17 |
part ii of iii | 17 |
a list of urls | 17 |
li ol p the | 17 |
when it came to | 17 |
is a sort of | 17 |
it means to be | 17 |
close and distant reading | 17 |
td tr table h | 17 |
semantic web and linked | 17 |
by eric lease morgan | 17 |
to count and tabulate | 17 |
in a given text | 17 |
img width height src | 17 |
com alex alex catalogue | 17 |
provides an overview of | 17 |
as illustrated by the | 17 |
dpla beta sprint submission | 17 |
the scholarly communications process | 17 |
the center for research | 17 |
after a bit of | 17 |
digital library collections and | 17 |
this text is a | 17 |
a large number of | 17 |
different types of input | 17 |
td img width src | 17 |
target blank img src | 16 |
the opportunity to attend | 16 |
no such thing as | 16 |
a stop word list | 16 |
photos infomotions in set | 16 |
com images img src | 16 |
or just about any | 16 |
problem of find amp | 16 |
the distant reader can | 16 |
to step for each | 16 |
corpus li li a | 16 |
a report against the | 16 |
day configure use constant | 16 |
some of my experiences | 16 |
main sandbox liam etc | 16 |
may need to be | 16 |
sanity check my output | 16 |
is akin to the | 16 |
images img src http | 16 |
reader ought to be | 16 |
is a lot like | 16 |
by the number of | 16 |
authorities names n gt | 16 |
reader is intended to | 16 |
com photos infomotions in | 16 |
the heart of the | 16 |
to do the work | 16 |
after wrestling with wilson | 16 |
in this proposal are | 16 |
into a search box | 16 |
the day configure use | 16 |
with the mobile web | 16 |
process each record in | 16 |
search retrieve url service | 16 |
are plain text files | 16 |
outlines some of the | 16 |
of data and information | 16 |
this is not a | 16 |
goes a long way | 16 |
for a limited period | 16 |
the digital public library | 16 |
the traditional reading process | 16 |
and sanity check my | 16 |
for most of the | 16 |
the idea of love | 16 |
this posting describes a | 16 |
configure use constant etc | 16 |
the goal of the | 16 |
initialize and sanity check | 16 |
the second document is | 16 |
wilson for most of | 16 |
libraries are expected to | 16 |
results of text mining | 16 |
a good time was | 16 |
problem of find get | 16 |
number of times a | 16 |
a p p the | 16 |
border vspace hspace a | 16 |
the great books are | 16 |
comes to the idea | 16 |
to interact with the | 16 |
a means to an | 16 |
time of the reader | 16 |
world wide web servers | 16 |
from the university of | 16 |
one and only one | 16 |
this sort of work | 16 |
limited period of time | 16 |
wrestling with wilson for | 16 |
find is not the | 16 |
the author of the | 16 |
the principles of librarianship | 16 |
of the day configure | 16 |
is based on the | 16 |
the home page for | 16 |
a list of the | 16 |
is a tool for | 16 |
of the alex catalogue | 16 |
updated name and id | 16 |
in the hopes of | 16 |
p h links h | 16 |
most of the day | 16 |
org net c dm | 16 |
in the realm of | 16 |
the semantic web and | 16 |
they are intended to | 16 |
in the corpus li | 16 |
blank img src http | 16 |
posting outlines some of | 16 |
means to an end | 16 |
in the middle of | 16 |
some of the more | 16 |
is to figure out | 16 |
with wilson for most | 16 |
library and information science | 16 |
including but not limited | 16 |
the reader ought to | 16 |
the reader wanted to | 16 |
border br span a | 16 |
a limited period of | 16 |
provide services against the | 16 |
org library virtue tsv | 16 |
to the idea of | 16 |
the time of the | 16 |
png colors a li | 16 |
com sandbox liam tmp | 16 |
to a greater degree | 16 |
allows the reader to | 16 |
the center for digital | 16 |
some of them are | 16 |
of copies keep stuff | 16 |
spread full text indexing | 16 |
upon and visualize parts | 16 |
an introduction to the | 16 |
what words are used | 16 |
does not seem to | 16 |
subsets of the collection | 15 |
search retrieve via url | 15 |
of electronic texts a | 15 |
span good man span | 15 |
this does not mean | 15 |
it is relatively easy | 15 |
where in the world | 15 |
this is the briefest | 15 |
originally a blog posting | 15 |
blockquote p a href | 15 |
length of a book | 15 |
into other plain text | 15 |
denoting the location of | 15 |
set of perl modules | 15 |
one or more words | 15 |
of a corpus with | 15 |
is the briefest of | 15 |
not have been able | 15 |
publishing archival descriptions as | 15 |
called the great books | 15 |
i would not have | 15 |
chttp a f fsimile | 15 |
version of a corpus | 15 |
for more information about | 15 |
span td td align | 15 |
five different types of | 15 |
identifying themes and clustering | 15 |
of the functionality of | 15 |
clustering documents using mallet | 15 |
see these sorts of | 15 |
the open content alliance | 15 |
creating a plain text | 15 |
to the a href | 15 |
org library virtue figures | 15 |
this text was originally | 15 |
hspace vspace align right | 15 |
in conjunction with the | 15 |
seven of the top | 15 |
li li click the | 15 |
edu sandbox reader hackaton | 15 |
good work of others | 15 |
to share some of | 15 |
getting started with xml | 15 |
with the internet archive | 15 |
is in the form | 15 |
themes and clustering documents | 15 |
considering the fact that | 15 |
and clustering documents using | 15 |
a li ul h | 15 |
width br a href | 15 |
number of unique words | 15 |
used in conjunction with | 15 |
in the field of | 15 |
a corpus with tika | 15 |
creation and maintenance of | 15 |
edited version of an | 15 |
part i of iii | 15 |
as well as an | 15 |
of open access journals | 15 |
part iii of iii | 15 |
was originally a blog | 15 |
essay was originally published | 15 |
to any number of | 15 |
presentation was given at | 15 |
eye candy by eric | 15 |
documents some of my | 15 |
a picture is worth | 15 |
is just one example | 15 |
not be able to | 15 |
was originally given at | 15 |
a step in the | 15 |
set of books called | 15 |
the corpus li li | 15 |
and a set of | 15 |
written by the author | 14 |
set of marc records | 14 |
something to do with | 14 |
issues pull requests actions | 14 |
what degree do these | 14 |
was originally published on | 14 |
the university of chicago | 14 |
be as simple as | 14 |
when things were published | 14 |
how you use github | 14 |
you will be able | 14 |
does this have to | 14 |
d total number of | 14 |
td justice td td | 14 |
and a list of | 14 |
of text mining and | 14 |
the atom publishing protocol | 14 |
once and at the | 14 |
a data management plan | 14 |
hspace a a href | 14 |
td rose td td | 14 |
imported into for favorite | 14 |
it is a process | 14 |
github desktop if nothing | 14 |
word and phrase frequencies | 14 |
the answer lies in | 14 |
files that can be | 14 |
stanford named entity recognizer | 14 |
the size of the | 14 |
number of years ago | 14 |
desktop if nothing happens | 14 |
make it easy to | 14 |
great idea tfidf scores | 14 |
at the north carolina | 14 |
a set of documents | 14 |
was had by all | 14 |
into for favorite spreadsheet | 14 |
party analytics cookies to | 14 |
they used to be | 14 |
desktop and try again | 14 |
collection predicts the future | 14 |
the advantages and disadvantages | 14 |
along the way i | 14 |
changes in this release | 14 |
pull requests actions projects | 14 |
in a set of | 14 |
begin to see how | 14 |
services against the index | 14 |
is not only about | 14 |
determine whether or not | 14 |
term id formats magazine | 14 |
unable to create model | 14 |
vspace hspace a a | 14 |
td td td tr | 14 |
the files in the | 14 |
adventures of huckleberry finn | 14 |
download github desktop and | 14 |
words in each item | 14 |
good time was had | 14 |
planet eric lease morgan | 14 |
in the current environment | 14 |
code issues pull requests | 14 |
the result as a | 14 |
txt file for a | 14 |
a number of library | 14 |
as a data structure | 14 |
we use optional third | 14 |
distant reader is intended | 14 |
reload to refresh your | 14 |
elaborate upon and visualize | 14 |
many of the things | 14 |
the size of a | 14 |
i participated in a | 14 |
id formats magazine articles | 14 |
into a number of | 14 |
the advent of ubiquitous | 14 |
com so we can | 14 |
none none none none | 14 |
making it easier for | 14 |
in a galaxy far | 14 |
for a full list | 14 |
the body of the | 14 |
is left up to | 14 |
t total number of | 14 |
at once and at | 14 |
for better or for | 14 |
it is a good | 14 |
as the number of | 14 |
with a script called | 14 |
can build better products | 14 |
my water collection predicts | 14 |
with a violent fit | 14 |
the gnu public license | 14 |
of the levenshtein algorithm | 14 |
of the more interesting | 14 |
in order to get | 14 |
launching github desktop if | 14 |
get a list of | 14 |
there is a need | 14 |
definitions of these columns | 14 |
the life of a | 14 |
text versions of the | 14 |
to refresh your session | 14 |
be imported into for | 14 |
the distant reader and | 14 |
of interest to you | 14 |
musings on information and | 14 |
answers to these questions | 14 |
its purpose is to | 14 |
png alt width height | 14 |
edu f f fontologies | 14 |
another tab or window | 14 |
for a presentation at | 14 |
documenting my experiences at | 14 |
this have to do | 14 |
indexed true stored true | 14 |
a full list of | 14 |
publishing linked data is | 14 |
that needs to be | 14 |
of cultural heritage institutions | 14 |
the good folks at | 14 |
the type of content | 14 |
number of times they | 14 |
the declaration of independence | 14 |
i believe it is | 14 |
the definitions of these | 14 |
the totality of the | 14 |
to be read by | 14 |
a set of marc | 14 |
we can build better | 14 |
upon a sort of | 14 |
at the american library | 14 |
td chair td td | 14 |
of changes in this | 14 |
of rdf and linked | 14 |
alex alex catalogue of | 14 |
sort the result by | 14 |
integer denoting the number | 14 |
in the near future | 14 |
td word td td | 14 |
of services against texts | 14 |
file for a full | 14 |
time was had by | 14 |
the first part of | 14 |
the catholic youth literature | 14 |
on information and librarianship | 14 |
whether or not it | 14 |
when compared to the | 14 |
have to do with | 14 |
a reference to a | 14 |
left up to the | 14 |
an integer denoting the | 14 |
problem to solve is | 14 |
this being the case | 14 |
reference to a hash | 14 |
go to step until | 14 |
of books called the | 14 |
what does this have | 14 |
better or for worse | 14 |
imaginative or intellectual content | 14 |
galaxy far far away | 14 |
of king henry the | 14 |
formats magazine articles a | 14 |
as it used to | 14 |
a number of people | 14 |
c number of times | 14 |
a list of words | 14 |
a galaxy far far | 14 |
of a marc record | 14 |
f f fontologies fmods | 14 |
globally networked computers and | 14 |
github desktop and try | 14 |
water collection predicts the | 14 |
commenced upon a sort | 14 |
a presentation to the | 14 |
the tools described in | 14 |
an rdf triple store | 14 |
access to the materials | 14 |
p ol li a | 14 |
this is the tiniest | 14 |
in the linked data | 14 |
in cultural heritage institutions | 14 |
the middle of the | 14 |
become a part of | 14 |
advantages and disadvantages of | 14 |
items in a corpus | 14 |
so we can build | 14 |
a number of my | 14 |
each of the items | 13 |
align center td align | 13 |
submission will describe and | 13 |
in a single word | 13 |
tcp baxter graphs catalog | 13 |
with the help of | 13 |
related open source software | 13 |
requires the skills of | 13 |
how it can be | 13 |
given a corpus of | 13 |
the reader to the | 13 |
advent of globally networked | 13 |
a next generation library | 13 |
copyright and the digital | 13 |
the creation and maintenance | 13 |
more important than the | 13 |
similarities and differences between | 13 |
mods a chttp a | 13 |
and a number of | 13 |
p the a href | 13 |
the same time we | 13 |
colors a li li | 13 |
f fontologies fmods e | 13 |
the same time it | 13 |
and provide access to | 13 |
the advent of globally | 13 |
and the distant reader | 13 |
at the end of | 13 |
the majority of the | 13 |
set of plain text | 13 |
an a href http | 13 |
by henry david thoreau | 13 |
h p img src | 13 |
linkcode as camp creative | 13 |
i am in the | 13 |
browser emerson graphs catalog | 13 |
was originally written for | 13 |
is relatively easy to | 13 |
within the context of | 13 |
a couple of the | 13 |
li li use the | 13 |
using a set of | 13 |
week on the concord | 13 |
it is nice to | 13 |
ie utf tag infomotions | 13 |
td td a href | 13 |
with the use of | 13 |
one of the oldest | 13 |
not as important as | 13 |
will describe and demonstrate | 13 |
ol li a href | 13 |
is the tiniest of | 13 |
a couple of years | 13 |
p this posting describes | 13 |
might be able to | 13 |
tcp love graphs catalog | 13 |
com sandbox liam data | 13 |
of what it means | 13 |
build on the good | 13 |
absence from my employer | 13 |
the functionality of the | 13 |
seems to be an | 13 |
query prefix mods a | 13 |
on a google map | 13 |
td tr table this | 13 |
distant reader is a | 13 |
tr table p the | 13 |
enables a person to | 13 |
semantic web in libraries | 13 |
items in the collection | 13 |
it is not about | 13 |
share some of my | 13 |
an open source software | 13 |
some automated analysis of | 13 |
used to create the | 13 |
as camp creative creativeasin | 13 |
browser thoreau graphs catalog | 13 |
you can download the | 13 |
books when it comes | 13 |
it can be used | 13 |
in the world of | 13 |
on a regular basis | 13 |
it seems to be | 13 |
a set of tab | 13 |
the output of the | 13 |
how to use a | 13 |
but it is not | 13 |
a week on the | 13 |
of eric lease morgan | 13 |
directory of open access | 13 |
prefix mods a chttp | 13 |
may be able to | 13 |
could be associated with | 12 |
and answer questions of | 12 |
a travel log this | 12 |
exploit the use of | 12 |
count the number of | 12 |
the process of publishing | 12 |
relationships between subjects and | 12 |
this proposal are not | 12 |
versions of the pamphlets | 12 |
of these files are | 12 |
in the etc directory | 12 |
a few words into | 12 |
address the problem of | 12 |
outlined in this proposal | 12 |
these sorts of services | 12 |
tables of contents and | 12 |
declaration of independence gt | 12 |
amount of unstructured data | 12 |
a travel log documenting | 12 |
to be in the | 12 |
to do text mining | 12 |
triple store eric lease | 12 |
query language of relational | 12 |
containing the query terms | 12 |
the help of a | 12 |
need to have been | 12 |
to the file system | 12 |
open source software development | 12 |
a description of how | 12 |
this can be done | 12 |
imagine being able to | 12 |
not suppose to do | 12 |
enter a few words | 12 |
strong text mining strong | 12 |
on one hand there | 12 |
read and write marc | 12 |
full text of the | 12 |
in the world is | 12 |
the other way around | 12 |
of the works of | 12 |
by counting and tabulating | 12 |
this was done with | 12 |
the form of an | 12 |
there were a number | 12 |
for each item in | 12 |
configure use constant index | 12 |
services against texts outlined | 12 |
po s file not | 12 |
form of a uri | 12 |
what degree is the | 12 |
it is not only | 12 |
distribution comes with a | 12 |
to be used as | 12 |
here is a list | 12 |
what to do with | 12 |
describes how to use | 12 |
text was never published | 12 |
has founding date bc | 12 |
speed of four records | 12 |
sense of all the | 12 |
the local file system | 12 |
would be possible to | 12 |
only one root element | 12 |
thumbnail alt a a | 12 |
such a thing is | 12 |
into a triple store | 12 |
in the body of | 12 |
problem of find is | 12 |
uses of these files | 12 |
it is difficult to | 12 |
do a search in | 12 |
of linked open data | 12 |
metadata as linked data | 12 |
search engines of google | 12 |
the appearance of the | 12 |
average number of words | 12 |
in your text editor | 12 |
people want to do | 12 |
content uploads london img | 12 |
code li li code | 12 |
i am interested in | 12 |
library of congress authority | 12 |
eric lease morgan emorgan | 12 |
of a given word | 12 |
the form of plain | 12 |
in a relational database | 12 |
allow the reader to | 12 |
because libraries are expected | 12 |
for the reader to | 12 |
was attended by approximately | 12 |
ideas behind the semantic | 12 |
file contains a set | 12 |
store my etc etc | 12 |
of a serial nature | 12 |
of four records per | 12 |
this is a travel | 12 |
word to other words | 12 |
it does this by | 12 |
the value of marc | 12 |
list of words and | 12 |
of word in a | 12 |
number of words per | 12 |
originally given at the | 12 |
the proximity of a | 12 |
provide a means to | 12 |
the sru interface to | 12 |
few words into a | 12 |
marc file in your | 12 |
the results of the | 12 |
the problem is addressed | 12 |
create a word cloud | 12 |
range of imaginative or | 12 |
the speed of four | 12 |
marrying close and distant | 12 |
the semantics of xml | 12 |
text files that can | 12 |
these files are described | 12 |
and open access publishing | 12 |
store eric lease morgan | 12 |
to take full advantage | 12 |
the way to go | 12 |
mining is a process | 12 |
tr valign top td | 12 |
an overview of what | 12 |
a number of times | 12 |
behind the semantic web | 12 |
is a whole lot | 12 |
and select items of | 12 |
the hypertext transfer protocol | 12 |
arbitrary amount of unstructured | 12 |
blog posting on the | 12 |
to do things with | 12 |
location of word in | 12 |
notice how the word | 12 |
that can be imported | 12 |
is far from perfect | 12 |
of congress authority record | 12 |
unable to open store | 12 |
published on techessence at | 12 |
sp o file not | 12 |
who work in libraries | 12 |
were not limited to | 12 |
at the speed of | 12 |
with the linked data | 12 |
live in a world | 12 |
td tr table i | 12 |
and possible uses of | 12 |
log documenting my experiences | 12 |
td td tr tr | 12 |
the number of topics | 12 |
the store my etc | 12 |
the reader can run | 12 |
of words are used | 12 |
is not too difficult | 12 |
while the problem of | 12 |
they make it easier | 12 |
as well as from | 12 |
li ul h a | 12 |
collections as a whole | 12 |
and write marc records | 12 |
originally published in computers | 12 |
save the time of | 12 |
each file contains a | 12 |
taken with a violent | 12 |
i have to do | 12 |
such thing as a | 12 |
against the triple store | 12 |
documents my experience at | 12 |
given word to other | 12 |
size of library collections | 12 |
the size of library | 12 |
so p file not | 12 |
and plot on a | 12 |
l as o a | 12 |
travel log documents my | 12 |
edited version of a | 12 |
makes it easier to | 12 |
they allow you to | 12 |
type text indexed true | 12 |
day in the life | 12 |
the root of the | 12 |
as much as they | 12 |
this process is called | 12 |
is the mail going | 12 |
books called the great | 12 |
documents containing the query | 12 |
the most common words | 12 |
a file of marc | 12 |
world is the mail | 12 |
to enable the reader | 12 |
all the words in | 12 |
was done with a | 12 |
a few special characters | 12 |
to the distant reader | 12 |
form of plain text | 12 |
described in the file | 12 |
of traditional library principles | 12 |
an irish catholic layman | 12 |
and microsoft a reality | 12 |
of illinois at urbana | 12 |
files ought to be | 12 |
content uploads ngc lib | 12 |
has something to do | 12 |
select items of interest | 12 |
was taken with a | 12 |
and get back a | 12 |
information technology and libraries | 12 |
a given word to | 12 |
plain text files that | 12 |
allow you to define | 12 |
until you get tired | 12 |
this posting outlines some | 12 |
this presentation was originally | 12 |
open the store my | 12 |
this proposal assumes the | 12 |
a diverse set of | 12 |
possible uses of these | 12 |
it is not as | 12 |
suppose the reader wanted | 12 |
originally published on techessence | 12 |
and the frequency of | 12 |
of these columns and | 12 |
this travel log was | 12 |
i went to the | 12 |
top td align center | 12 |
in a document and | 12 |
the value of db | 12 |
the hesburgh libraries of | 12 |
and the result will | 12 |
i get many hits | 12 |
uk xslt ead rdf | 12 |
the problem to solve | 12 |
have the computer programmer | 12 |
days in the life | 12 |
words in the corpus | 12 |
center td align right | 12 |
to be manifested in | 12 |
it is intended to | 12 |
posting on the lita | 12 |
than ten years ago | 12 |
may be transformed into | 12 |
language of relational databases | 12 |
the topic of the | 12 |
presentation was originally given | 12 |
harvested data has been | 12 |
one of the most | 12 |
p file not found | 12 |
appears in a document | 12 |
includes a number of | 12 |
sort search results by | 12 |
div div id attachment | 12 |
the results in a | 12 |
next to you is | 12 |
all of the works | 12 |
blockquote p code curl | 12 |
of imaginative or intellectual | 12 |
align right tr td | 12 |
primary purpose was to | 12 |
travel log documenting my | 12 |
while i was there | 12 |
the search engines of | 12 |
letters of an irish | 12 |
at the present time | 12 |
presentation of the day | 12 |
of no more than | 12 |
width height border alt | 12 |
do this sort of | 12 |
columns and possible uses | 12 |
what sorts of things | 12 |
arguments my db argv | 12 |
are uniquely positioned to | 12 |
a document or corpus | 12 |
of the sru interface | 12 |
the creation of an | 12 |
it would be possible | 12 |
this posting documents some | 12 |
this analysis can be | 12 |
of an irish catholic | 12 |
this is because the | 12 |
files are described elsewhere | 12 |
line arguments my db | 12 |
and only one root | 12 |
the average length of | 12 |
if i get many | 12 |
the words in each | 12 |
p div div id | 12 |
my alex catalogue of | 12 |
be transformed into a | 12 |
lease morgan emorgan nd | 12 |
in a networked environment | 12 |
of contents and back | 12 |
words into a search | 12 |
text indexed true stored | 12 |
four records per minute | 12 |
these columns and possible | 12 |
actions projects security insights | 12 |
founding date bc http | 12 |
my experience at the | 12 |
o a code li | 12 |
report against the database | 12 |
the following query will | 12 |
of each of these | 12 |
the reader to see | 12 |
mentioned in the text | 12 |
authorities at the speed | 12 |
just about any other | 12 |
on techessence at http | 12 |
o file not found | 12 |
proximity of a given | 12 |
command line arguments my | 12 |
take full advantage of | 12 |
td colspan document td | 12 |
was going to be | 12 |
you can see the | 12 |
on a set of | 12 |
is very important to | 12 |
they are plain text | 12 |
s file not found | 12 |
using a number of | 12 |
the world is the | 12 |
can also be used | 12 |
there is too much | 12 |
are described in the | 12 |
some sort of database | 12 |
f total number of | 12 |
in a given document | 12 |
up to the reader | 12 |
requests actions projects security | 12 |
as many of the | 12 |
of all the content | 12 |
set or get the | 12 |
valign top td align | 12 |
chatter at code lib | 12 |
libraries are uniquely positioned | 12 |
on open source software | 12 |
a tool for reading | 12 |
the result was a | 12 |
use of the sru | 12 |
of notre dame is | 12 |
sets of words are | 12 |
a blog posting on | 12 |
total number of unique | 12 |
of documents containing the | 12 |
alt a a href | 12 |
process of publishing linked | 12 |
in the semantic web | 12 |
personal tei publishing system | 12 |
it does this in | 12 |
such a thing would | 12 |
a combination of the | 12 |
of perl modules called | 11 |
text and data mining | 11 |
is the home page | 11 |
in conjunction with other | 11 |
entities with stanford tools | 11 |
themes digital humanities a | 11 |
a christmas carol cite | 11 |
book describes how to | 11 |
id themes digital humanities | 11 |
named entities with stanford | 11 |
the better part of | 11 |
by the end of | 11 |
to facilitate searching keywords | 11 |
tr th align right | 11 |
enumerated a number of | 11 |
give it a whirl | 11 |
libraries and librarianship a | 11 |
of the project was | 11 |
all words in the | 11 |
fs fp fo d | 11 |
have a better understanding | 11 |
the goals of librarianship | 11 |
i have commenced upon | 11 |
with a bit of | 11 |
in the th century | 11 |
to learn about the | 11 |
wrote seven of the | 11 |
the full list of | 11 |
but in order to | 11 |
span td td img | 11 |
as a person who | 11 |
on the shoulders of | 11 |
of henry david thoreau | 11 |
p for a good | 11 |
shakespeare wrote seven of | 11 |
as a means of | 11 |
source software for libraries | 11 |
than one way to | 11 |
ul h a id | 11 |
digital humanities and the | 11 |
the goals of the | 11 |
as well as to | 11 |
term id themes digital | 11 |
and describes how they | 11 |
of changes in the | 11 |
the length of a | 11 |
outputs sets of structured | 11 |
voyant tools to do | 11 |
cite a christmas carol | 11 |
about open source software | 11 |
in no priority order | 11 |
the answer is not | 11 |
is a matter of | 11 |
then it would be | 11 |
things can be done | 11 |
of my experiences at | 11 |
the profession needs to | 11 |
of the code lib | 11 |
the integrated library system | 11 |
graphing with tableau public | 11 |
the shoulders of giants | 11 |
constant chatter at code | 11 |
other end of a | 11 |
take better advantage of | 11 |
the top ten books | 11 |
the other day i | 11 |
other plain text files | 11 |
metadata and full text | 11 |
of the collection is | 11 |
other end of the | 11 |
software is never done | 11 |
a part of this | 11 |
a person needs to | 11 |
in the release and | 11 |
with open source software | 11 |
i think it is | 11 |
here you will find | 11 |
of the archival community | 11 |
or their frequency li | 11 |
information as well as | 11 |
the vast majority of | 11 |
the proverbial fire hose | 11 |
takes an arbitrary amount | 11 |
tr td img width | 11 |
p blockquote p i | 11 |
would be able to | 11 |
of the university of | 11 |
where b fs fp | 11 |
restricted li li b | 11 |
posting documents some of | 11 |
more than one way | 11 |
american and english literature | 11 |
used to describe the | 11 |
some of my ideas | 11 |
wondering whether or not | 11 |
file for the full | 11 |
of term frequency inverse | 11 |
p table tr td | 11 |
searching keywords in context | 11 |
to a set of | 11 |
have commenced upon a | 11 |
in the set was | 11 |
a mailing list called | 11 |
it is interesting to | 11 |
for the full list | 11 |
align center tr td | 11 |
is a step in | 11 |
the creation of your | 11 |
term id themes libraries | 11 |
s of copies keep | 11 |
text mining and the | 11 |
ten books when it | 11 |
in a previous posting | 11 |
does not have to | 11 |
will need to have | 11 |
to the source code | 11 |
can be used as | 11 |
some of the possibilities | 11 |
a few weeks ago | 11 |
to do this work | 11 |
the advancement of learning | 11 |
is the first of | 11 |
in the context of | 11 |
b fs fp fo | 11 |
the role of the | 11 |
and named entities with | 11 |
id themes libraries and | 11 |
your milage may vary | 11 |
be a part of | 11 |
as a href http | 11 |
in a way that | 11 |
to be transformed into | 11 |
primary purpose is to | 11 |
sets of structured data | 11 |
given a set of | 11 |
the archival community has | 11 |
to the code lib | 11 |
top ten books when | 11 |
called the a href | 11 |
means to be human | 11 |
the similarities and differences | 11 |
changes in the release | 11 |
click a word to | 11 |
to rule them all | 11 |
their frequency li li | 11 |
tiny text mining tools | 11 |
for the advancement of | 11 |
to answer this question | 11 |
are we there yet | 11 |
save the result as | 11 |
and enable the reader | 11 |
facilitate searching keywords in | 11 |
of words and associated | 11 |
txt file for the | 11 |
p p there are | 11 |
an arbitrary amount of | 11 |
together for the advancement | 11 |
program called a href | 11 |
about a number of | 11 |
at the bottom of | 11 |
to be a list | 11 |
using voyant tools to | 11 |
working together for the | 11 |
of interest from the | 11 |
a good idea to | 11 |
align center tr valign | 11 |
of computers in libraries | 11 |
the release and have | 11 |
the time and effort | 11 |
tools to do some | 11 |
availability of full text | 11 |
needs to be a | 11 |
of the library catalog | 11 |
thank you for the | 10 |
to begin to see | 10 |
really intended to be | 10 |
the results can be | 10 |
given the full text | 10 |
p on the other | 10 |
were done against the | 10 |
outlined a number of | 10 |
this is the readme | 10 |
numeric characteristics of records | 10 |
is almost always a | 10 |
examples include but are | 10 |
their frequencies are listed | 10 |
myriad of reports enabling | 10 |
is the readme file | 10 |
is the way to | 10 |
many things it contains | 10 |
the values of the | 10 |
the works in the | 10 |
to include links to | 10 |
subjects and the objects | 10 |
li ul p in | 10 |
group of technical services | 10 |
in the traditional manner | 10 |
notre dame journal of | 10 |
expensive in terms of | 10 |
phrases in a corpus | 10 |
and sort the result | 10 |
of words per item | 10 |
and a sparql endpoint | 10 |
all have more things | 10 |
rdf as linked data | 10 |
this particular corpus employs | 10 |
lists connoting an idea | 10 |
of times the query | 10 |
counting and tabulating the | 10 |
one of a number | 10 |
the sizes of its | 10 |
average number of pages | 10 |
each book in the | 10 |
together can be illustrated | 10 |
library federation annual meeting | 10 |
the readme file for | 10 |
can be put into | 10 |
how many records are | 10 |
communication is the key | 10 |
how many things it | 10 |
t get me wrong | 10 |
application of computer science | 10 |
the processes of librarianship | 10 |
categories uncategorized comment on | 10 |
item in the collection | 10 |
in the afternoon i | 10 |
are there one or | 10 |
methods in the humanities | 10 |
in the midst of | 10 |
the characteristics of the | 10 |
o a camp creative | 10 |
occur in specific items | 10 |
of each of the | 10 |
written by the same | 10 |
can see there are | 10 |
tiny list of part | 10 |
the move with the | 10 |
phrases in a text | 10 |
the conference was a | 10 |
given the opportunity to | 10 |
a p blockquote p | 10 |
these sorts of questions | 10 |
to do some of | 10 |
advocated the creation of | 10 |
tr tr th align | 10 |
in it he described | 10 |
making it easy to | 10 |
sizes of its items | 10 |
the exception of the | 10 |
a set of words | 10 |
data into plain text | 10 |
washington university in st | 10 |
of the collection as | 10 |
choosing occur in specific | 10 |
and how it can | 10 |
of pages per item | 10 |
idea of interest to | 10 |
lib open source software | 10 |
can prove to be | 10 |
and put the result | 10 |
from a given document | 10 |
vocabularies used to describe | 10 |
well as a few | 10 |
p p the second | 10 |
of items in words | 10 |
cite td td img | 10 |
went on to describe | 10 |
it has something to | 10 |
tabulating the words in | 10 |
use your text editor | 10 |
occur in the corpus | 10 |
of records in the | 10 |
in the catalog can | 10 |
the past couple of | 10 |
as opposed to a | 10 |
the result into a | 10 |
services against the texts | 10 |
are expected to know | 10 |
to be quite insightful | 10 |
given document in your | 10 |
in the united kingdom | 10 |
be illustrated through a | 10 |
of what and how | 10 |
illustrated by the following | 10 |
to the materials through | 10 |
words are used across | 10 |
of linked data publishing | 10 |
display location of word | 10 |
the united states is | 10 |
is very similar to | 10 |
li li strong most | 10 |
in a corpus and | 10 |
and the sizes of | 10 |
documents in the collection | 10 |
has been saved in | 10 |
once this is done | 10 |
used across a corpus | 10 |
the linked data of | 10 |
select one or more | 10 |
list of top tech | 10 |
creative width height border | 10 |
all i have to | 10 |
my goal is to | 10 |
only game in town | 10 |
who is mentioned in | 10 |
and their associated links | 10 |
code lib open source | 10 |
new and different ways | 10 |
or more words in | 10 |
and in the end | 10 |
the great idea tfidf | 10 |
advances in information retrieval | 10 |
com behas oai lod | 10 |
number of pages per | 10 |
based on the information | 10 |
content available on the | 10 |
the directory of open | 10 |
or not it is | 10 |
gp product ref as | 10 |
of congress subject headings | 10 |
be able to use | 10 |
of the number of | 10 |
tcp love html a | 10 |
for creating and maintaining | 10 |
product ref as li | 10 |
the process is not | 10 |
h links h p | 10 |
is a process for | 10 |
and tabulating the words | 10 |
click the start button | 10 |
the future of library | 10 |
been saved in the | 10 |
any number of ways | 10 |
item of the corpus | 10 |
chttp a f fdata | 10 |
more relevant than the | 10 |
of words or phrases | 10 |
correlation between pages and | 10 |
move with the mobile | 10 |
each item of the | 10 |
in each item of | 10 |
provide a means for | 10 |
i was there i | 10 |
posting describes how i | 10 |
as o a camp | 10 |
not the problem to | 10 |
width height class alignright | 10 |
of linked data is | 10 |
the only game in | 10 |
you will want to | 10 |
open source software award | 10 |
your choosing occur in | 10 |
would be a good | 10 |
if the answer is | 10 |
in each of the | 10 |
the distant reader will | 10 |
learning how to use | 10 |
url pointing to the | 10 |
humanities computing techniques to | 10 |
services against the result | 10 |
the beginning of the | 10 |
catalog can be illustrated | 10 |
it outputs sets of | 10 |
things into a single | 10 |
originally written for a | 10 |
records in the catalog | 10 |
into the search box | 10 |
describes some of the | 10 |
the result in a | 10 |
increasing availability of full | 10 |
zip file with a | 10 |
width height hspace vspace | 10 |
do everything you would | 10 |
of times each occurs | 10 |
i learned about the | 10 |
the location of the | 10 |
your milage will vary | 10 |
and it outputs sets | 10 |
it needs to be | 10 |
tools of the trade | 10 |
will become a part | 10 |
collected this water while | 10 |
associated with the given | 10 |
of documents in the | 10 |
the metaphysics of morals | 10 |
through the process i | 10 |
as well as in | 10 |
structured data for analysis | 10 |
a given document in | 10 |
on the world wide | 10 |
possible to count and | 10 |
items from a corpus | 10 |
there are so many | 10 |
the use of words | 10 |
without the use of | 10 |
to count the number | 10 |
is the key to | 10 |
possible to measure additional | 10 |
connoting an idea of | 10 |
d aselect where b | 10 |
height class alignright size | 10 |
edu emorgan files img | 10 |
on the information above | 10 |
employs three such dictionaries | 10 |
texas library association annual | 10 |
are akin to the | 10 |
you would do in | 10 |
a corpus of documents | 10 |
is possible to measure | 10 |
part of king henry | 10 |
such as the one | 10 |
the open archives initiative | 10 |
as well as computers | 10 |
for a number of | 10 |
a zip file with | 10 |
and the application of | 10 |
specific sets of words | 10 |
the catalog can be | 10 |
located in the text | 10 |
prove to be quite | 10 |
to the principles of | 10 |
to what degree does | 10 |
in order to learn | 10 |
much as it is | 10 |
it is better to | 10 |
texts in ways that | 10 |
word appears in a | 10 |
document in your corpus | 10 |
as well as all | 10 |
in a text and | 10 |
one of the original | 10 |
what is text mining | 10 |
the application of computer | 10 |
these lists connoting an | 10 |
is the heart of | 10 |
of the day was | 10 |
is possible to count | 10 |
computational methods in the | 10 |
a word cloud of | 10 |
words are used in | 10 |
display the proximity of | 10 |
in the spirit of | 10 |
d printing working group | 10 |
to the wider community | 10 |
center tr valign top | 10 |
particular corpus employs three | 10 |
advocated the use of | 10 |
of text mining are | 10 |
the creation of locally | 10 |
metadata provides an overview | 10 |
have a mindset of | 10 |
written for a presentation | 10 |
analysis of the corpus | 10 |
a word or phrase | 10 |
the tools of the | 10 |
can be illustrated through | 10 |
all of the files | 10 |
is possible to create | 10 |
not the end itself | 10 |
do these words occur | 10 |
valley group of technical | 10 |
brought to my attention | 10 |
is not the only | 10 |
words of your choosing | 10 |
to build on the | 10 |
a few months ago | 10 |
aselect where b fs | 10 |
number of topic words | 10 |
in a library catalog | 10 |
these words occur in | 10 |
and tabulate how specific | 10 |
the ideas behind the | 10 |
of a triple store | 10 |
the frequency of the | 10 |
how specific sets of | 10 |
a limited number of | 10 |
perusing the list of | 10 |
ohio valley group of | 10 |
home page for the | 10 |
almost always a correlation | 10 |
the whys and hows | 10 |
ead into rdf xml | 10 |
overview of what and | 10 |
are not really about | 10 |
is a need for | 10 |
sizes of items in | 10 |
we will have to | 10 |
a member of the | 10 |
the university of toronto | 10 |
how words of your | 10 |
in order to keep | 10 |
tcp baxter html a | 10 |
with the availability of | 10 |
characteristics of records in | 10 |
as a librarian i | 10 |
words occur in the | 10 |
more than a few | 10 |
by the same author | 10 |
png width height alt | 10 |
and dissemination of data | 10 |
an idea of interest | 10 |
file with a companion | 10 |
with an overview of | 10 |
as you would expect | 10 |
ref as li tf | 10 |
to provide services against | 10 |
we have a mindset | 10 |
there are at least | 10 |
of the library profession | 10 |
the name of the | 10 |
pages and number of | 10 |
between pages and number | 10 |
given the increasing availability | 10 |
these files help answer | 10 |
seem to be the | 10 |
but they are not | 10 |
allow the user to | 10 |
makes a lot of | 10 |
is the creation of | 10 |
use any number of | 10 |
there is almost always | 10 |
see how words of | 10 |
provides a means for | 10 |
will continue to be | 10 |
camp creative width height | 10 |
are of possible interest | 10 |
always a correlation between | 10 |
count word and phrase | 10 |
a myriad of reports | 10 |
page for more details | 10 |
are a part of | 10 |
the purposes of this | 10 |
com sandbox bibframe data | 10 |
sandbox bibframe data data | 10 |
wonder whether or not | 10 |
all the works of | 10 |
you may want to | 10 |
whys and hows of | 10 |
a correlation between pages | 10 |
and how many things | 10 |
td thesis td td | 10 |
the right of the | 10 |
collected this water on | 10 |
what is the average | 10 |
in regards to the | 10 |
the process of find | 10 |
they are expected to | 10 |
cite td tr table | 10 |
the first is to | 10 |
will not be able | 10 |
at the beginning of | 10 |
can find plenty of | 10 |
content as well as | 10 |
collection as well as | 10 |
believe it or not | 10 |
to do with librarianship | 10 |
my first epub file | 10 |
searched the library of | 10 |
with a presentation by | 10 |
tcp baxter xml a | 10 |
as much as it | 10 |
to measure additional characteristics | 10 |
edit the value of | 10 |
what and how many | 10 |
its primary purpose was | 10 |
done with a script | 10 |
described and demonstrated a | 10 |
tfidf score for each | 10 |
such as text mining | 10 |
total number of pages | 10 |
of natural language processing | 10 |
matrix of scatter plots | 10 |
some of the characteristics | 10 |
primary purpose of the | 10 |
of open access publishing | 10 |
it is very important | 10 |
and it is a | 10 |
it is not easy | 10 |
code br a href | 10 |
of structured data for | 10 |
are used across a | 10 |
tabulate how specific sets | 10 |
you to go to | 10 |
library of congress subject | 10 |
frequencies are listed below | 10 |
as long as the | 10 |
gave a presentation called | 10 |
you are going to | 10 |
amount of full text | 10 |
there are many ways | 10 |
the same time they | 10 |
corpus employs three such | 10 |
this essay was written | 10 |
on the move with | 10 |
for items of interest | 10 |
a greater amount of | 10 |
the length of the | 10 |
sets of loosely defined | 10 |
everything you would do | 10 |
more words in these | 10 |
is a href http | 10 |
com gp product ref | 10 |
possible correlations between numeric | 10 |
tcp baxter text a | 10 |
is a form of | 10 |
against the database to | 10 |
creation of locally defined | 10 |
surrounding the topic of | 10 |
enter a word of | 10 |
a camp creative width | 10 |
alignright a href http | 10 |
caption alignright a href | 10 |
between numeric characteristics of | 10 |
how this can be | 10 |
the primary purpose of | 10 |
degree do these words | 10 |
org resource walt disney | 10 |
it is now possible | 10 |
be able to read | 10 |
to get a list | 10 |
e d aselect where | 10 |
the reader will be | 10 |
of top tech trends | 10 |
and how these frequencies | 10 |
a leave of absence | 10 |
not a whole lot | 10 |
and through the use | 10 |
were a number of | 10 |
linked data is about | 10 |
in these lists connoting | 10 |
li li create a | 10 |
com codeforkjeff refine viaf | 10 |
the reader to select | 10 |
pl configure use constant | 10 |
the results of step | 10 |
the library profession has | 10 |
a triple store and | 10 |
right of the query | 10 |
count and tabulate how | 10 |
every item in the | 10 |
i learned a lot | 10 |
is mentioned in the | 10 |
to answer questions like | 10 |
the whole thing into | 10 |
queries can be applied | 10 |
the humanities and sciences | 10 |
transformed into plain text | 10 |
the result on a | 10 |
interface to the index | 10 |
indiana library federation annual | 10 |
access control in libraries | 10 |
just like any other | 10 |
to see how words | 10 |
digital versions of books | 10 |
be able to do | 10 |
the semantic web is | 10 |
in the humanities and | 10 |
correlations between numeric characteristics | 10 |
the distribution of words | 10 |
then to what degree | 10 |
words in these lists | 10 |
in order to find | 10 |
day of the conference | 10 |
of your choosing occur | 10 |
there one or more | 10 |
notes on word usage | 10 |
and number of words | 10 |
with the content they | 10 |
the levenshtein distance algorithm | 10 |
li code br a | 10 |
on the web is | 10 |
a huge number of | 10 |
great ideas coefficient is | 10 |
a role in the | 9 |
from all over the | 9 |
is not easy to | 9 |
end of the workshop | 9 |
based on personal experience | 9 |
edited version of eric | 9 |
files need to be | 9 |
see how easy it | 9 |
some of my take | 9 |
as a way of | 9 |
sort of leave of | 9 |
the a href https | 9 |
preservationists have the most | 9 |
i wrote a perl | 9 |
via linked data eric | 9 |
configure use constant root | 9 |
the great books survey | 9 |
this posting describes the | 9 |
my personal tei publishing | 9 |
in the recent past | 9 |
as of this writing | 9 |
this is a set | 9 |
it comes to love | 9 |
interactive map pie chart | 9 |
find all triples with | 9 |
interface allowing the reader | 9 |
with optical character recognition | 9 |
the current environment where | 9 |
to take better advantage | 9 |
new dog old tricks | 9 |
the result to a | 9 |
go to step on | 9 |
outlines my experiences there | 9 |
tr table h day | 9 |
i was happy to | 9 |
know about literary history | 9 |
a set of plain | 9 |
i was not able | 9 |
will probably happen because | 9 |
to investigate how to | 9 |
content of the hathitrust | 9 |
also be used as | 9 |
p p i then | 9 |
this is a tiny | 9 |
the profession does not | 9 |
in the current directory | 9 |
hspace vspace br a | 9 |
posting outlines my experiences | 9 |
be used to facilitate | 9 |
fun with elasticsearch and | 9 |
did my best to | 9 |
the last two weeks | 9 |
id formats technical report | 9 |
rss and the rss | 9 |
is more than possible | 9 |
number of open source | 9 |
the center of the | 9 |
linked data eric lease | 9 |
all over the world | 9 |
p p for the | 9 |
on the processes of | 9 |
accessible via linked data | 9 |
td newton td td | 9 |
are a few sample | 9 |
examples of how the | 9 |
to know how to | 9 |
p the distant reader | 9 |
page describes a corpus | 9 |
used to determine the | 9 |
to see how easy | 9 |
one of the things | 9 |
and the use of | 9 |
somewhere along the line | 9 |
because of sparql syntax | 9 |
they will want to | 9 |
term id formats technical | 9 |
bottom of the page | 9 |
fruits of my labors | 9 |
was given by strong | 9 |
for the creation of | 9 |
viaf identifiers for more | 9 |
order to be useful | 9 |
al li li b | 9 |
p the purpose of | 9 |
version of eric lease | 9 |
it ought to be | 9 |
there is the disclaimer | 9 |
the topic of mass | 9 |
the current state of | 9 |
p blockquote p where | 9 |
it is simply not | 9 |
a subset of liam | 9 |
to create and maintain | 9 |
org target blank http | 9 |
fun with rss and | 9 |
the content of books | 9 |
endpoint to a subset | 9 |
is simply not possible | 9 |
to evolve in order | 9 |
think of it as | 9 |
but not necessarily limited | 9 |
tarzan of the apes | 9 |
the forest from the | 9 |
for a set of | 9 |
compass by andrew sutton | 9 |
essay about my water | 9 |
leave of absence from | 9 |
the most challenging job | 9 |
output will be saved | 9 |
probably happen because of | 9 |
script called a href | 9 |
gallery valencia pages img | 9 |
of leave of absence | 9 |
the structure of the | 9 |
a number of open | 9 |
i learned a number | 9 |
the western world cite | 9 |
the beginnings of the | 9 |
exploiting the content of | 9 |
posting documents my experience | 9 |
files in the current | 9 |
library of america beta | 9 |
opportunities for future study | 9 |
provide a way to | 9 |
source software and libraries | 9 |
opportunity to visit the | 9 |
article was originally written | 9 |
museum and library services | 9 |
full text of all | 9 |
edu sandbox hathi downloadable | 9 |
script to rule them | 9 |
some of the challenges | 9 |
png width a br | 9 |
to create your own | 9 |
td tr tr th | 9 |
sparql endpoint to a | 9 |
formats technical report a | 9 |
sandbox liam tmp guidebook | 9 |
forest from the trees | 9 |
happens to be a | 9 |
been processed with optical | 9 |
shared with the audience | 9 |
using bibframe for bibliographic | 9 |
a document li li | 9 |
library school graduate students | 9 |
into a set of | 9 |
and this is my | 9 |
for the a href | 9 |
of absence from my | 9 |
for the purpose of | 9 |
of sparql syntax errors | 9 |
can be implemented through | 9 |
vspace br a href | 9 |
where the output will | 9 |
one way to skin | 9 |
fontologies fmods e d | 9 |
all triples with rdf | 9 |
specific types of nouns | 9 |
is not a panacea | 9 |
a br code li | 9 |
into a data structure | 9 |
in this release and | 9 |
log documents my experiences | 9 |
this is a simple | 9 |
topic modeling is an | 9 |
of the types of | 9 |
this page describes a | 9 |
align center a href | 9 |
provide the means to | 9 |
a few of my | 9 |
come up with a | 9 |
number of ways the | 9 |
creation of your own | 9 |
this travel log outlines | 9 |
php fm article view | 9 |
in order to remain | 9 |
not as necessary as | 9 |
be able to understand | 9 |
then what might that | 9 |
once in a while | 9 |
goal of the project | 9 |
with rss and the | 9 |
notre dame digital humanities | 9 |
and the rss aggregator | 9 |
not necessarily limited to | 9 |
the core principles of | 9 |
subset of liam linked | 9 |
happen because of sparql | 9 |
what open source software | 9 |
python natural language toolkit | 9 |
how the process of | 9 |
and demonstrates how the | 9 |
new york technical services | 9 |
way to skin a | 9 |
a special issue of | 9 |
more information about the | 9 |
identifiers for more than | 9 |
on the use of | 9 |
of liam linked data | 9 |
mining and the digital | 9 |
combined with primo central | 9 |
this was a presentation | 9 |
of my alex catalogue | 9 |
institute of museum and | 9 |
but there is the | 9 |
this posting outlines how | 9 |
software and open access | 9 |
in the last century | 9 |
was written for the | 9 |
was given at the | 9 |
presentation was given to | 9 |
just about any type | 9 |
p p in a | 9 |
a few of us | 9 |
michael hart in roanoke | 9 |
p ul li what | 9 |
et al li li | 9 |
one way to accomplish | 9 |
describes a corpus named | 9 |
the tennessee library association | 9 |
been able to do | 9 |
a number of rudimentary | 9 |
td align center a | 9 |
a new dog old | 9 |
it is easier to | 9 |
this posting outlines my | 9 |
states agriculture information network | 9 |
provide better library service | 9 |
tr table p i | 9 |
of the conference was | 9 |
edu emorgan files model | 9 |
coercing the corpus into | 9 |
bibframe for bibliographic description | 9 |
the corpus as a | 9 |
to make it easier | 9 |
of top technology trends | 9 |
on the content of | 9 |
with elasticsearch and marc | 9 |
very much like the | 9 |
is more important than | 9 |
text of all the | 9 |
td car td td | 9 |
triples with rdf schema | 9 |
this article describes the | 9 |
at the national library | 9 |
a few sample queries | 9 |
the existence of a | 9 |
of years ago i | 9 |
and links to the | 9 |
source software and open | 9 |
a travel log http | 9 |
its primary purpose is | 9 |
to skin a cat | 9 |
about my water collection | 9 |
the skills of many | 9 |
outlines my experiences at | 9 |
characteristics of a text | 9 |
reader is a tool | 9 |
the state of the | 9 |
order to remain relevant | 9 |
teaching a new dog | 9 |
topic of mass digitization | 9 |
tcp love xml a | 9 |
of museum and library | 9 |
given a text and | 9 |
i wish i could | 9 |
written by pete johnston | 9 |
as well as others | 9 |
and english literature as | 9 |
the same time i | 9 |
it is imperative to | 9 |
this release and have | 9 |
linked data is the | 9 |
a sort of leave | 9 |
and saving the result | 9 |
to a subset of | 9 |
an essay about my | 9 |
there is more than | 9 |
please see the download | 9 |
a person to do | 9 |