DATA PAPER

The Cybernetics Thought Collective: Machine-Generated Data Using Computational Methods

Bethany G. Anderson
University of Illinois at Urbana-Champaign, US
bgandrsn@illinois.edu

This dataset comprises machine-generated data from the research records and personal archives of four founding members of the transdisciplinary field of cybernetics: W. Ross Ashby, Warren S. McCulloch, Heinz von Foerster, and Norbert Wiener. These archives (or fonds) are held by the British Library, the American Philosophical Society, the University of Illinois at Urbana-Champaign, and MIT, respectively. The data were created for “The Cybernetics Thought Collective: A History of Science and Technology Portal Project” (2017–2019), a pilot project funded by the National Endowment for the Humanities (NEH). The data were generated by applying computational methods and tools (machine learning, named entity recognition, and natural language processing) to digitized archival records, in order to enhance archival access in three distinct but interrelated ways: as archival metadata for the digitized records, as reusable data to facilitate digital scholarly analyses, and as the basis for a series of test visualizations. The data represent entities associated with cybernetic concepts and the main actors attached to the cybernetics movement and the exchange of its ideas. The dataset is stored along with the digitized records in the University of Illinois (U of I) Library’s multi-tiered repository, a replicated preservation service based on PREMIS (Preservation Metadata: Implementation Strategies). Reuse potential for this dataset includes historical/archival, linguistic, and artistic analyses of the data to examine connections between the cybernetic entities.

Keywords: Archive records; Science and technology; Social networks
Funding statement: “The Cybernetics Thought Collective: A History of Science and Technology Portal Project” (NEH PW-253912-17) was funded by the National Endowment for the Humanities’ Humanities Collections and Reference Resources program (US).

Anderson BG 2020 The Cybernetics Thought Collective: Machine-Generated Data Using Computational Methods. Journal of Open Humanities Data, 6: 7. DOI: https://doi.org/10.5334/johd.19

(1) Overview
Repository location
https://digital.library.illinois.edu/collections/38ec6eb0-18c3-0135-242c-0050569601ca-1

Context
Cybernetics was a transdisciplinary scientific movement in the mid-twentieth century that emerged from the Macy Conferences on “Circular Causal and Feedback Mechanisms in Biological and Social Systems” (1946–1953), as well as the publication of Norbert Wiener’s Cybernetics: Or, Control and Communication in the Animal and the Machine [34]. As a postwar movement that explored the possibilities and implications of “thinking machines” amid broader social currents, cybernetics inspired questions about what it means to be a human being, a machine, or a social system. Discussions radiating from the Macy Conferences evolved into correspondence networks, publications, and the establishment of centers for systems and cybernetics.

The data were created for “The Cybernetics Thought Collective: A History of Science and Technology Portal Project,” a grant project funded by the National Endowment for the Humanities’ Humanities Collections and Reference Resources program (US) [29]. The project was led by the University of Illinois at Urbana-Champaign Library in collaboration with the American Philosophical Society, the British Library, and the MIT Distinctive Collections. By making the data available, the project seeks to reveal insights about the cybernetics phenomenon through the “thought collective” [11] that exchanged and interrogated ideas through correspondence and other records.

(2) Methods
Steps
Digitization and OCR
The four participating institutions identified correspondence, scientific journals, and publications from the personal archives (or fonds) of W. Ross Ashby, Warren S. McCulloch, Heinz von Foerster, and Norbert Wiener for digitization. In total, 61,067 pages of archival records were digitized, resulting in 615 digital objects (which represent folder-level or multi-page item-level aggregations of digitized records). The project created PDFs for archival access purposes as well as high-resolution preservation TIFF files. The former were processed by optical character recognition (OCR) software to make the records machine-readable. Some materials are handwritten and were transcribed as time allowed.

Normalization and Input Creation
PDFMiner [20] was used to extract text from the OCR-ed records into plaintext files. Before testing entity extraction, natural language processing, and machine learning software, text remediation and normalization were needed both to address OCR errors and to translate some of the fonds’ Italian, Spanish, French, and German texts into English. Translation was completed with the aid of N-grams and Googletrans [17], while Wolfram Text Analysis tools [35] were used to remove stopwords.

Concurrent with this step, the project team created 
inputs, or a cybernetics vocabulary. The project sought 
to specifically identify and extract cybernetic entities; 
fortunately, cybernetics has a distinct set of core con-
cepts related to behavior, self-organization, and feedback 
mechanisms, from which a vocabulary could be derived [2, 
25]. Identifying this vocabulary was especially important 
for connecting concepts and agents to each other in the 
cybernetics network. The project team used Cybernetics of 
Cybernetics: Or, the Control of Control and the Communication 
of Communication as a source for generating a cybernetics 
vocabulary [32]. Cybernetics of Cybernetics is a compilation 
that prominently features Ashby, McCulloch, von Foerster, 
Wiener, and key cybernetic ideas at the time that they 
were active in the transdiscipline. A digital version of the 
text was run through Voyant Tools [28] to generate a list 
of keywords based on frequency. This list was narrowed to 
include the most frequently occurring terms (about 200 
total). Members of the project’s advisory board (who com-
prised technologists and subject-experts in cybernetics) 
reviewed this list and offered additional suggestions.
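
The frequency-based keyword list described above can be approximated with a short Python sketch. This is not the Voyant Tools procedure the project used; the stopword set, the sample sentence, and the function name `top_terms` are illustrative assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword set (an assumption; real lists are far longer).
STOPWORDS = {"the", "of", "and", "a", "to", "in", "is", "that", "it", "as"}

def top_terms(text, n=200):
    """Return the n most frequent non-stopword terms with their counts."""
    # Lowercase, then keep alphabetic tokens, allowing internal hyphens.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)

# Made-up sample text standing in for the source compilation.
sample = (
    "Feedback is the control of a system by its output. "
    "Self-organization and feedback shape the behavior of the system."
)
vocabulary = top_terms(sample, n=3)
```

A list produced this way would still need the kind of expert review the advisory board provided, since raw frequency does not distinguish core concepts from merely common words.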

Entity Extraction, Natural Language Processing, 
and Classification 
Using this cybernetic vocabulary as inputs, one of the 
project’s programmers experimented with a number 
of Python libraries for natural language processing and 
named entity extraction (e.g., NLTK [6] and spaCy [24]) 
and the University of Illinois Cognitive Computation 
Group’s NLP pipeline software [7]. Following entity identi-
fication and extraction, the project team decided to adopt 
a supervised machine learning approach to classify the 
records into four broad categories: Mathematics/Logic, 
Computers/Machines, Psychology/Neuroscience, and 
Personal. Naïve Bayes [8] and Weka [14] were used for 
the machine learning portion of the project. Percentages 
of certainty for the classifications were also generated 
through this process. Additional testing was performed 
with sentiment analysis using NLTK and VADER [19].
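
To illustrate how a supervised classifier can yield both a category and a percentage of certainty, the following is a minimal multinomial Naïve Bayes sketch in Python. It is not the project's implementation (which relied on the tools cited above); the class name, the training tokens, and the use of Laplace smoothing are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> term counts
        self.doc_counts = Counter()              # category -> training documents
        self.vocab = set()

    def train(self, tokens, category):
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocab.update(tokens)

    def classify(self, tokens):
        """Return (best category, certainty as a percentage)."""
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for cat, n_docs in self.doc_counts.items():
            total_terms = sum(self.word_counts[cat].values())
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                smoothed = (self.word_counts[cat][t] + 1) / (total_terms + len(self.vocab))
                score += math.log(smoothed)        # log likelihood
            log_scores[cat] = score
        # Convert log scores to normalized posterior probabilities.
        top = max(log_scores.values())
        weights = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(weights.values())
        best = max(weights, key=weights.get)
        return best, 100.0 * weights[best] / norm

# Hypothetical one-document-per-category training set.
model = NaiveBayes()
model.train(["theorem", "logic", "proof"], "Mathematics/Logic")
model.train(["computer", "machine", "circuit"], "Computers/Machines")
model.train(["neuron", "brain", "behavior"], "Psychology/Neuroscience")
model.train(["dear", "family", "holiday"], "Personal")

label, certainty = model.classify(["logic", "proof", "machine"])
```

The certainty here is the normalized posterior probability of the winning category, which is one plausible reading of "percentage of certainty"; the project's own figures may have been derived differently.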

Text Processing and Remediation Pipeline
After testing various software, a Python-based pipeline 
was developed (see Figure 1). Following the pipeline, the 
project team imported text from the PDF files into plain-
text; normalized the files; removed files that contained a 
significant amount of noise that could not be easily reme-
diated with existing tools in the allotted timeframe; iden-
tified the language of the documents and translated into 
English where necessary; extracted entities; classified the 
documents into categories; and estimated the percentage 
of certainty for each category per document. The pipeline 
is documented in the project’s GitHub Repository [30]. All 
of the data resulting from the entity extraction, sentiment 
analysis, and machine learning steps were imported into a 
CSV file that was made available for research use and used 
as metadata for the digital collection. A more detailed 
overview of the methodology and the steps employed is 
delineated in the project’s white paper [4]. It is important 
to note that the creation of this pipeline was not a linear 
process and involved retesting tools and revisiting several 
steps.

Preservation and Access
The PDFs of the digital surrogates and files containing the inputs, classification data, percentages of certainty for those classifications, and extracted entities were ingested into the University of Illinois Library’s digital repository service for preservation and access. The digital repository (known as Medusa [31]) is a replicated multi-tiered Fedora-based repository that uses PREMIS [26]. The classification data, percentages of certainty, and entities also populate the metadata application profile for each PDF in the repository’s access interface. The data were made available as a dataset via a CSV file for users to download, along with a CSV file containing the original inputs and a readme file that provides additional information about the data and the process that created them [3]. The dataset includes file-level metadata, some of which is human-created (e.g., level of description and title) and some of which is collection-level metadata that applies to all digital objects in the same fonds (e.g., scope and contents, parent collection, collection identifier). This metadata provides the original archival context for the machine-generated data (e.g., machine-extracted feature, cybernetic classification, certainty). These fields are described more fully in the data dictionary in the readme file. A selection of the data was also used to create test visualizations, which are available on the project site.
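
As a sketch of how a downloaded CSV of this kind might be explored, the Python example below filters rows by certainty and builds an index from extracted entities to document titles. The column names, the semicolon separator, the percentage format, and the sample rows are all assumptions made for illustration; the actual field names are given in the data dictionary in the readme file.

```python
import csv
import io
from collections import defaultdict

# Made-up rows with hypothetical column names; not taken from the dataset.
SAMPLE_CSV = """title,cybernetic_classification,certainty,machine_extracted_features
Example letter A,Mathematics/Logic,91,feedback; homeostasis
Example letter B,Personal,55,
Example report C,Computers/Machines,78,automata; feedback
"""

def load_rows(handle, min_certainty=0.0):
    """Read rows, keep those at or above min_certainty, and split entities."""
    rows = []
    for row in csv.DictReader(handle):
        if float(row["certainty"]) >= min_certainty:
            raw = row["machine_extracted_features"]
            row["entities"] = [e.strip() for e in raw.split(";") if e.strip()]
            rows.append(row)
    return rows

def titles_by_entity(rows):
    """Map each extracted entity to the titles of documents containing it."""
    index = defaultdict(list)
    for row in rows:
        for entity in row["entities"]:
            index[entity].append(row["title"])
    return dict(index)

rows = load_rows(io.StringIO(SAMPLE_CSV), min_certainty=60)
index = titles_by_entity(rows)
```

To work with the real file, the `io.StringIO` handle would be replaced by an opened copy of the downloaded CSV, with the column names adjusted to match the data dictionary.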

Sampling strategy
This pilot project aimed to produce a proof-of-concept machine learning, named entity recognition, and natural language processing pipeline for meta/data generation and classification of archival records; through this process, a representative sample of documents that illustrate prominent cybernetic concepts and consist of letters between von Foerster, Ashby, McCulloch, Wiener, and other known cyberneticians was selected from across the four fonds. However, statistical sampling techniques were employed at various stages of the natural language processing and entity-extraction workflow. For example, to translate texts into English, a test set of approximately 200 documents in English, German, French, and Italian was created in order to employ an N-gram approach to language identification. The Python library Googletrans was then used to translate the texts into English. Additionally, a training set of 154 documents from all fonds was manually annotated and prepared for the supervised classification model.
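
An N-gram approach to language identification can be sketched as follows. This minimal Python example compares character-trigram profiles by cosine similarity; the one-sentence reference samples and the function names are assumptions, and a real profile would be built from a much larger test set such as the roughly 200 documents mentioned above.

```python
import math
from collections import Counter

def trigram_profile(text):
    """Count character trigrams in lowercased, whitespace-normalized text."""
    padded = f" {' '.join(text.lower().split())} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a, b):
    """Cosine similarity between two trigram count profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical one-sentence reference samples per language.
SAMPLES = {
    "English": "feedback is a basic concept of cybernetics and of control in the machine",
    "German": "die rückkopplung ist ein grundbegriff der kybernetik und der regelung",
    "French": "la rétroaction est un concept fondamental de la cybernétique et du contrôle",
    "Italian": "la retroazione è un concetto fondamentale della cibernetica e del controllo nella macchina",
}
PROFILES = {lang: trigram_profile(text) for lang, text in SAMPLES.items()}

def identify_language(text):
    """Return the language whose trigram profile is most similar to the text."""
    profile = trigram_profile(text)
    return max(PROFILES, key=lambda lang: cosine(profile, PROFILES[lang]))

guess_de = identify_language("der begriff der regelung in der kybernetik")
guess_en = identify_language("the concept of control in cybernetics")
```

Documents identified as non-English would then be passed to a translation step; that step is omitted here because it depends on an external service.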

Quality Control
The majority of the records from the four fonds are type-
written; these records were processed with OCR software, 
and, as time allowed, handwritten documents were tran-
scribed. The texts also required “normalization” in order 
to be machine-ready. After extracting the text from the 
OCR-ed records to import into plaintext files, character 
errors that resulted from OCR were remediated (e.g., extra 
spaces between letters in a word, or alpha-numeric char-
acters that were misread as non-ASCII characters). 
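
The two kinds of OCR error mentioned above (letters separated by stray spaces, and characters misread as non-ASCII symbols) can be remediated along the following lines. This is a heuristic Python sketch rather than the project's code; the misread-character table and the function name `remediate` are assumptions.

```python
import re
import unicodedata

# Runs of three or more single letters separated by single spaces,
# e.g. "f e e d b a c k". Heuristic: it would also merge a genuine
# sequence of one-letter words or initials.
SPACED_WORD = re.compile(r"\b(?:[A-Za-z] ){2,}[A-Za-z]\b")

# Hypothetical examples of symbols an OCR engine might emit for letters.
MISREADS = {"¢": "c", "¡": "i"}

def remediate(text):
    """Apply simple normalization to OCR output."""
    # Fold compatibility characters such as the "fi" ligature into ASCII letters.
    text = unicodedata.normalize("NFKC", text)
    # Replace known misread symbols with the intended letters.
    text = text.translate(str.maketrans(MISREADS))
    # Collapse letter-by-letter spacing into a single word.
    text = SPACED_WORD.sub(lambda m: m.group(0).replace(" ", ""), text)
    # Squeeze repeated spaces and tabs.
    return re.sub(r"[ \t]+", " ", text).strip()

cleaned = remediate("The f e e d b a c k  loop of the ma¢hine")
```

Rules like these trade recall for safety: anything not covered by the table or the pattern is left untouched, which suits a corpus where noisy files can be set aside for manual review.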

Statistical analysis was performed on the extracted entities to identify which entities surface the most frequently in the corpus, as a means of determining which entities appear most significant. We tested this through N-grams and Term Frequency-Inverse Document Frequency (TF-IDF) to determine the frequency of an entity in each document and thus its importance throughout the entire corpus. Using TF-IDF in an archival context has precedent ([10], pp. 109–110), so we hoped that it would have utility for the project. The team felt this would be useful for comparison against the original cybernetic inputs. However, despite removing “noise” such as stopwords (i.e., commonly used words like “the,” “of,” or “but”), TF-IDF proved less reliable than an N-gram approach for determining entity relevancy within the corpus. For TF-IDF to produce more useful results, document length would need to be normalized. Given the overall nonuniformity of archival records in this particular corpus (and in archival fonds in general), it is difficult to normalize records for length.
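
For readers unfamiliar with TF-IDF, the following Python sketch shows one common formulation, in which the term frequency is divided by document length as a simple form of length normalization. The toy documents and the function name are assumptions, and this variant is not claimed to resolve the reliability problem the project encountered.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF scores.

    docs maps a document name to its list of tokens. The result maps each
    document name to a dict of term -> score, where
    score = (count / document length) * ln(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per document
    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        scores[name] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores

# Made-up token lists standing in for extracted entities per document.
docs = {
    "letter": ["feedback", "homeostasis", "feedback"],
    "report": ["feedback", "automata", "automata", "logic"],
    "memo": ["logic", "homeostasis"],
}
scores = tf_idf(docs)
top_report_term = max(scores["report"], key=scores["report"].get)
```

In this toy corpus "automata" scores highest in the report because it is frequent there and absent elsewhere, whereas "feedback" is discounted for appearing in two of the three documents.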

Figure 1: Illustration of extraction, classification, and preservation/access pipeline.

To assess the accuracy of the machine learning results, the project team used Weka to perform a chi-squared analysis to help us better understand the accuracy of the training set in the classification process. The results revealed 71.1% “true positives” and 4% “false positives,” indicating that the majority of the entities were useful in informing which documents were classified into specific categories. However, a manual analysis revealed more false positives (i.e., a few inaccurately classified documents). This assessment enabled the project team to perform a degree of quality control on the dataset and understand how we might improve the machine learning results in the future, especially by creating a larger training set.
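
True-positive and false-positive rates of the kind reported above can be computed by comparing predicted categories against manually assigned ones. The Python sketch below shows the calculation for a single category; the label lists are made up, and it does not reproduce the Weka analysis or the project's figures.

```python
def rates(gold, predicted, category):
    """Return (true positive rate, false positive rate) for one category."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == category and p == category for g, p in pairs)
    fn = sum(g == category and p != category for g, p in pairs)
    fp = sum(g != category and p == category for g, p in pairs)
    tn = sum(g != category and p != category for g, p in pairs)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr

# Hypothetical manual annotations and classifier output.
gold = ["Personal", "Personal", "Mathematics/Logic", "Mathematics/Logic", "Computers/Machines"]
predicted = ["Personal", "Mathematics/Logic", "Mathematics/Logic", "Personal", "Computers/Machines"]

tpr, fpr = rates(gold, predicted, "Personal")
```

A manual review like the one described above effectively extends the `gold` list, which is why it can surface misclassifications that an evaluation limited to the training set does not show.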

Assessment
As a proof of concept, the Cybernetics Thought Collective project opened up the possibility of applying computational methods to archival records. But it also opened up questions about how to develop and streamline computational workflows in an archival setting, how best to document those workflows to facilitate data reuse and reproducibility, and how to provide transparency so that users can understand the “computational provenance” of the results.

While the results of the project did reveal connections 
between documents across the four fonds through the 
extracted entities, the machine learning results indicated 
a need for additional refinement. For example, some of 
the documents which almost exclusively consisted of dis-
cussions of a technical nature were classified as “Personal.” 
Thus, we will need larger training sets in tandem with bet-
ter quality control mechanisms to produce more reliable 
results. Participants in computational archival projects 
need to be able to anticipate the labor necessary for creat-
ing viable inputs and training sets and for verifying the 
trustworthiness of the results. 

Computational archival projects require close collabo-
ration between archivists, programmers, data curators, 
and digital preservationists, who each provide vital input 
and expertise at different decision points. Likewise, in the 
future more engagement with potential users will be vital 
for determining the utility of the results and their impli-
cations for archival research, which should also inform 
the creation and refining of processes that generate these 
datasets.

The project raised questions about the relationship between machine-generated archival datasets and the original archival records, especially how that relationship is represented in both archival systems and visualization interfaces so that the original “archival provenance” of the data, and of the materials from which they are derived, is clearly described and decontextualization is prevented. Digital records, and the data generated from them, can provide greater context for and enhance access to each other. Therefore, it is important to find ways to make them mutually discoverable in archival access systems.

(3) Dataset description
Object name
CTC_Machine-Generated-Data.csv

Format names and versions
CSV

Creation dates
2017-10-25 to 2018-05-07

Dataset Creators
1. Bethany Anderson, University of Illinois at Urbana-Champaign (conceptualization, data curation, funding acquisition, methodology, project administration, resources, supervision)
2. Christopher J. Prom, University of Illinois at Urbana-Champaign (conceptualization, data curation, funding acquisition, methodology, project administration, resources)
3. Anirudh Chandrashekhar (data curation, formal analysis, investigation, methodology, software, validation)
4. Saumye Kaushik (data curation, formal analysis, investigation, software)
5. Alex Dolski, University of Illinois at Urbana-Champaign (investigation, methodology, software)
6. James A. Hutchinson, University of Illinois at Urbana-Champaign (conceptualization, data curation, funding acquisition)
7. Mark Sammons, University of Illinois at Urbana-Champaign (methodology, resources, software)
8. Kevin Hamilton, University of Illinois at Urbana-Champaign (conceptualization, funding acquisition)
9. Charles Greifenstein, American Philosophical Society (funding acquisition, methodology, resources)
10. Jonathan Pledge, British Library (funding acquisition, methodology, resources)
11. Thomas Rosko and Beverly Turner, MIT (funding acquisition, methodology, resources)

Language
English, French, Italian, German, and Spanish.

License
CC BY 4.0

Repository name 
University of Illinois Digital Collections repository

Publication date
2020-01-24

(4) Reuse potential
While cybernetics experienced a heyday that spanned the 
mid-late twentieth century, its philosophical influences 
are widespread. Indeed, vestiges of cybernetics continue 
to surface in modern computing, information theory, and 
cognitive science, as just a few examples [21, 9]. Because of 
this intellectual omnipresence, the data have the poten-
tial to shed light on the etymology of concepts and disci-
plinary areas of specialization. For example, the data may 
be useful for contributing to discussions about artificial 
intelligence and its relationship to cybernetics [12, 27]. 
However, these data do not provide insight into the evo-
lution of the terms themselves or how the relationships 
between entities shifted and changed over time. 

From a historical perspective, the data can be reused to reveal additional connections between cybernetic entities and the scientists who formed the cybernetics movement. Cybernetics continues to attract interest from historians and science and technology studies scholars (for example [21, 1]). Since this was a pilot project that resulted (in part) in several test visualizations, the data are also available for bulk download to facilitate use in other digital scholarship projects. We hope that this opens the data up to new questions and explorations of the boundaries of the “thought collective,” while also serving as a step toward meeting emergent research needs within a digital scholarship framework (for example [18]).

An important aspect of data reuse is providing sufficient 
contextual information to enable a variety of reuse(s). 
Because the dataset includes information about the origi-
nal digital records from which the entities are generated, 
and thus the original fonds, this may lead to new pathways 
to the digitized records themselves that are in line with 
FAIR data reuse principles for archival materials [22]. At 
the same time, the relative success with which research-
ers are able to reuse the data and gain new insights can 
inform the project’s future phases as it refines its software 
pipeline and methods for assessing quality control. It is 
hoped that this data paper provides additional informa-
tion about the process that generated the data, so that 
others may test its reproducibility and assess the results. 

It is worth noting that reuse should also logically extend 
to the digital records themselves; all digitized materials 
have been made machine-readable and are accessible 
through the University of Illinois’ repository/digital col-
lections portal. Users can download the OCR-ed records, 
process them through different software pipelines, and 
perform their own computational analyses. While the methods employed by this project sought to extract data from the records, the distinction between the reuse potential of the records themselves and that of the data generated from them is somewhat blurry, because the data depend on the records to elucidate their context(s) and make them reusable [15]. It is thus important to emphasize that the digital records themselves are (re)usable. A
future phase of this project will seek to engage research-
ers and the archival community in identifying additional 
reuse cases for both the data and the digitized records 
themselves, and investigate the possibility of interactive 
interfaces that open up explorations of records as data 
and user-driven reorderings of content [23, 36].

The data also have potential reuse value in a visual cul-
ture space. Cybernetics (especially second-order cybernetics) 
invoked visual and art historical references to interrogate 
and illustrate many of its ideas. For example, to peruse the 
publications that emerged from the Biological Computer 
Laboratory—the center for cybernetics at the University of 
Illinois directed by Heinz von Foerster—is to become simul-
taneously immersed in scientific diagrams and esoteric 
imagery of ouroboros and art historical iconography (see, for 
example [33]). Cybernetics has inspired “cybernetic art” and 
explorations of media culture through a cybernetic lens [5, 
13]. Examples of cybernetic data either informing or becom-
ing artistic works themselves also have precedent, indicating 
that such reuses are not unimaginable [16]. The data result-
ing from the project can contribute to cybernetic explora-
tions at the intersection of art, technology, and new media. 

Additional File
The additional file for this article can be found as follows:

• Readme for the Cybernetics Thought Collective Data. This readme file contains a brief description of the dataset, metadata fields, and the process of data creation. https://digital.library.illinois.edu/items/3cd33c50-8c95-0138-729a-02d0d7bfd6e4-8.

Acknowledgements
Thank you to the National Endowment for the Humanities 
for providing funding that supported this project. I would 
like to express my deep gratitude to my collaborators on 
this project—Christopher J. Prom, James A. Hutchinson, 
Kevin Hamilton, Alex Dolski, Mark Sammons, Charles 
Greifenstein, Jonathan Pledge, Thomas Rosko, and 
Beverly Turner. I am also grateful for the project’s advi-
sory board members who generously shared their time 
and insights. Special thanks to Stephen Wolfram and 
Jesús V. Hernández of Wolfram Research for donating 
time and technology resources to the project. I especially 
want to thank Anirudh Chandrashekhar, whose work 
was crucial to the project’s success. Many colleagues 
offered advice and guidance during the project, espe-
cially William J. Maher, MJ Han, Patricia Lampron, Angela 
Waarala, Tom Habing, Kyle Rimkus, Jennifer Hain Teper, 
and Kathie Veach. In addition, there were many other con-
tributors to the project: Christine Pallon, Shreya Udhani, 
Brinna Michael, Alicia Hopkins, Tanairy Delgado, Saumye 
Kaushik, and Meghna Shrivastava. Lastly, many thanks to 
Heidi Imker and Kelli Trei for providing invaluable feed-
back on this paper.

Competing Interests
The author has no competing interests to declare.

Author Contributions
Conceptualization; Data curation; Funding acquisi-
tion; Methodology; Project administration; Supervision; 
Writing – original draft. 

References
1. Abraham TH. Rebel Genius: Warren S. McCulloch’s Transdisciplinary Life in Science. The MIT Press. 2016. DOI: https://doi.org/10.7551/mitpress/9780262035095.001.0001.

2. American Society for Cybernetics. (n.d.). ASC Glossary. Retrieved May 28, 2020, from http://www.asc-cybernetics.org/foundations/ASCGlossary.htm.

3. Anderson BG. Data from the Cybernetics Thought Collective [Data set]. University of Illinois Digital Library. 2020. https://digital.library.illinois.edu/items/3c80ad40-8c95-0138-729a-02d0d7bfd6e4-b.

4. Anderson BG, Prom CJ, Hutchinson JA, Chandrashekhar A, Michael B, Udhani S, Sammons M, Dolski A, Hamilton K, Kaushik S, Shrivastava M. The Cybernetics Thought Collective: A History of Science and Technology Portal Project [White paper]. University of Illinois at Urbana-Champaign. 2019. https://www.ideals.illinois.edu/handle/2142/106050.

5. Archive of Digital Art. (n.d.). Roy Ascott. Retrieved May 28, 2020, from https://www.digitalartarchive.at/database/artists/general/artist/ascott.html.

6. Bird S, Loper E, Klein E. NLTK (Version 3.2.5) [Computer software]. NLTK Documentation. 2017. https://www.nltk.org/.

7. Cognitive Computation Group. CogComp NLP Pipeline [Computer software]. GitHub. 2017. https://github.com/CogComp/cogcomp-nlp/tree/master/pipeline.

8. Dawson R. Bayesian Classifier [Computer software]. GitHub. 2016. https://github.com/codebox/bayesian-classifier.

9. Dupuy J. On the Origins of Cognitive Science: The Mechanization of the Mind. The MIT Press. 2009.

10. Esteva M. The Aleph in the Archive: Appraisal and Preservation of a Natural Electronic Archive [Doctoral dissertation, University of Texas at Austin]. Texas ScholarWorks. 2008. https://repositories.lib.utexas.edu/handle/2152/3840.

11. Fleck L. Genesis and Development of a Scientific Fact. Trenn TJ, Merton RK (eds.). University of Chicago Press. 1979. (Original work published 1935).

12. Franchi S, Guezeldere G, Minch E. Interview with Heinz von Foerster. Stanford Humanities Review. 1995; 4(2): 288–307.

13. Fuller M. Media Ecologies: Materialist Energies in Art and Technoculture. The MIT Press. 2007.

14. Frank E, Hall MA, Witten IH. Weka (Version 3.8) [Computer software]. University of Waikato. 2016. https://www.cs.waikato.ac.nz/ml/weka/.

15. Grant R. Recordkeeping and Research Data Management: A Review of Perspectives. Records Management Journal. 2017; 27(2): 159–174. DOI: https://doi.org/10.1108/RMJ-10-2016-0036.

16. Hamilton K. (n.d.). BCL/IGB Mural. KevinHamilton.org. http://www.kevinhamilton.org/bcl_igb/.

17. Han S. Googletrans (Version 2.1.4) [Computer software]. Python Package Index. 2017. https://pypi.org/project/googletrans/.

18. Harris G, Potter A, Zwaard K. Digital Scholarship at the Library of Congress. Library of Congress. 2020. https://labs.loc.gov/static/labs/work/reports/DHWorkingGroupPaper-v1.0.pdf.

19. Hutto CJ, Gilbert EE. VADER Sentiment Analysis [Computer software]. GitHub. 2014. https://github.com/cjhutto/vaderSentiment.

20. Jeong J. PDFMiner [Computer software]. GitHub. 2016. https://github.com/jaepil/pdfminer3k.

21. Kline RR. The Cybernetics Moment: Or Why We Call our Age the Information Age. The MIT Press. 2015.

22. Koster L, Woutersen-Windhouwer S. FAIR Principles for Library, Archive and Museum Collections: A Proposal for Standards for Reusable Collections. Code4Lib. 2018; 48. https://journal.code4lib.org/articles/13427.

23. Lemieux VL. Toward a ‘Third-Order’ Archival Interface: Research Notes on Some Theoretical and Practical Implications of Visual Explorations in the Canadian Context of Financial Electronic Records. Archivaria. 2014; 78: 53–93. https://archivaria.ca/index.php/archivaria/article/view/13721.

24. Montani I. spaCy (Version 2.0) [Computer software]. GitHub. 2017. https://github.com/explosion/spaCy.

25. Principia Cybernetica Web. Web Dictionary of Cybernetics and Systems. 2002. Retrieved May 28, 2020, from http://pespmc1.vub.ac.be/ASC/INDEXASC.html.

26. Rimkus K, Habing T. Medusa at the University of Illinois at Urbana-Champaign: A Digital Preservation Service Based on PREMIS. Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries. 2013. DOI: https://doi.org/10.1145/2467696.2467725.

27. Sato K. From AI to Cybernetics. AI & Society. 1991; 5: 155–161. DOI: https://doi.org/10.1007/BF01891721.

28. Sinclair S, Rockwell G. Voyant Tools [Computer software]. 2020. https://voyant-tools.org/.

29. University of Illinois Archives. Cybernetics Thought Collective Project. 2019. Retrieved May 27, 2020, from https://archives.library.illinois.edu/thought-collective/.

30. University of Illinois Archives. cybernetics-thought-collective [Computer software]. GitHub. 2018. https://github.com/cybernetics-thought-collective.

31. University of Illinois Library. Medusa. 2020. Retrieved October 8, 2020, from https://medusa.library.illinois.edu/static_pages/technology.

32. Von Foerster H. (Ed.). Cybernetics of Cybernetics, or the Control of Control and the Communication of Communication. Biological Computer Laboratory. 1974.

33. Von Foerster H. On Constructing a Reality (Report No. 234). BCL Publication, University of Illinois at Urbana-Champaign. 1973. https://digital.library.illinois.edu/items/3f260d50-29ac-0136-4d81-0050569601ca-0.

34. Wiener N. Cybernetics: Or, Control and Communication in the Animal and the Machine. J. Wiley. 1948.

35. Wolfram. (n.d.). Text Analysis [Computer software]. Wolfram Language and System Documentation Center. https://reference.wolfram.com/language/guide/TextAnalysis.html.

36. Yeo G. Bringing Things Together: Aggregate Records in a Digital Age. Archivaria. 2012; 74: 43–91. https://archivaria.ca/index.php/archivaria/article/view/13407.

How to cite this article: Anderson BG 2020 The Cybernetics Thought Collective: Machine-Generated Data Using 
Computational Methods. Journal of Open Humanities Data, 6: 7. DOI: https://doi.org/10.5334/johd.19

Published: 27 October 2020

Copyright: © 2020 The Author(s). This is an open-access article distributed under the terms of the Creative Commons 
Attribution 4.0 Unported License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, 
provided the original author and source are credited. See http://creativecommons.org/licenses/by/4.0/.

Journal of Open Humanities Data is a peer-reviewed open access journal published by Ubiquity Press.
