LlamaIndex Sketches

This directory contains a set of files used to: 1) index content, and 2) query the index. The files are sketches against a Python framework called LlamaIndex, which is designed to exploit large-language models. [1] In the end, this directory is intended to demonstrate yet another way to model text for the purposes of use & understanding.


Setup

First, to set up the files in this directory, you MUST have an OpenAI API key. I'm not sure, but I think you can get one for free. From the OpenAI home page, create an account, and then generate a key. [2, 3] The resulting key will look something like this:

  sk-Y8bnK8DasdfygmJdEDMLTtWABwzBCUX

Next, you MUST create an environment variable called OPENAI_API_KEY containing the value of your key. On Linux/Macintosh computers you can enter the following command from the terminal:

  export OPENAI_API_KEY=sk-Y8bnK8DasdfygmJdEDMLTtWABwzBCUX

On Windows you can do the same thing from the Command Prompt like this:

  set OPENAI_API_KEY=sk-Y8bnK8DasdfygmJdEDMLTtWABwzBCUX


Index

To interact with the files, you will need to place content in the folder/directory called "corpus". This distribution comes pre-installed with the chapters from Jane Austen's Emma. Alternatively, you can put your own plain text or PDF files into the corpus directory.

Next, you need to actually index the files; run ./bin/index.py. Initially the program will probably croak because you have not yet installed the necessary Python modules. Install them now, and repeat this step. If all goes well, zero output will be returned, and a bunch o' files will appear in the folder/directory called "index". These files include snippets of the original content as well as vectors ("embeddings") denoting where those snippets exist in an n-dimensional space.


Chat

Once content has been indexed, you can interact with the index; run ./bin/chat.py. The index files will be loaded into RAM, and you will be prompted to enter a query.
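The index and chat steps above boil down to two moves: cut the corpus into snippets and map each one to a vector, then map a query to a vector and fetch the snippets lying nearest to it. What follows is a toy, standard-library-only Python sketch of that idea, not the contents of ./bin/index.py or ./bin/chat.py. It assumes a plain word-count vector can stand in for OpenAI embeddings, uses cosine similarity as the measure of "nearness", and its function names (split_into_snippets, vectorize, build_index, query) are hypothetical.

  ```python
  import math
  import re
  from collections import Counter

  # A toy stand-in for the index/chat pipeline. The real scripts use LlamaIndex
  # and OpenAI embeddings; here a bag-of-words count vector plays the role of an
  # "embedding", and cosine similarity plays the role of nearness in space.
  # All function names below are hypothetical.


  def split_into_snippets(text, size=40):
      """Index step 1: cut the text into snippets of at most `size` words."""
      words = text.split()
      return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


  def vectorize(text):
      """Index step 2: map text to a sparse vector of lowercase word counts."""
      return Counter(re.findall(r"[a-z']+", text.lower()))


  def cosine(a, b):
      """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
      dot = sum(count * b[word] for word, count in a.items())
      norm_a = math.sqrt(sum(c * c for c in a.values()))
      norm_b = math.sqrt(sum(c * c for c in b.values()))
      if norm_a == 0 or norm_b == 0:
          return 0.0
      return dot / (norm_a * norm_b)


  def build_index(text, size=40):
      """Pair every snippet with its vector (snippets plus vectors, as in "index")."""
      return [(snippet, vectorize(snippet)) for snippet in split_into_snippets(text, size)]


  def query(index, question, k=1):
      """Chat step: return the k snippets whose vectors lie closest to the question's."""
      q = vectorize(question)
      ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
      return [snippet for snippet, _ in ranked[:k]]


  if __name__ == "__main__":
      sample = "Emma is clever and rich. Knightley is a sensible friend. Harriet is easily frightened."
      index = build_index(sample, size=5)
      print(query(index, "Tell me about Knightley"))  # ['Knightley is a sensible friend.']
  ```

Real embeddings differ from word counts in one important way: they place snippets near a question when they share meaning, not merely the same words. In a LlamaIndex pipeline, the retrieved snippets are then typically passed to the large-language model, which composes the prose answers.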
You can enter anything, but try to enter detailed and grammatically correct sentences/questions. If all goes well, the system will return responses. Repeat this step using words and/or phrases found in the responses. Repeat this whole process for about five minutes. This will exercise your brain, and I'm sure you will go away with a better understanding of the indexed corpus.

Compare this process to the process of traditional reading; if you were to spend five minutes using the traditional reading process against the whole of Emma, do you believe you would acquire the same level of understanding? Don't get me wrong; traditional reading and distant reading complement each other.


Example

Here is an example interaction with the system:

  $ ./bin/index.py
  $ ./bin/chat.py

  Ask anything: What are the over-arching themes in the story?

  The over-arching themes in the story include chance encounters and unexpected connections, the power of first impressions, and the complexities of human emotions.

  Ask anything: Tell me about the human emotions.

  The characters in the given context go through a range of human emotions. These emotions include surprise, disbelief, happiness, amusement, indifference, attachment, eagerness, regret, affection, tiredness, concern, reflection, doubt, envy, misery, and longing. The characters' emotional states are shaped by their situations and interactions with each other.

  Ask anything: What are the names of some of the people who are surprised, eager, indifferent, or concerned?

  In the given context, there are several people who display different emotions. These individuals include Mr. Weston, who is eager and concerned, Emma, who is indifferent, Mr. Woodhouse, who is concerned, Harriet, who is frightened, Frank Churchill, who is concerned and eager, Miss Bates, who is concerned, Mrs. Goddard, who is concerned, Mr. Knightley, who is concerned, and Miss Fairfax, who is indifferent.

  Ask anything: Tell me about Frank Churchill.

  Frank Churchill is mentioned in the given context. He is described as someone who has the ability to leave the Churchills and is often seen at various social events and locations. There are suggestions that he enjoys pleasure and is not always consistent in his actions. However, his behavior is also seen as a result of his circumstances and the expectations placed upon him. Overall, Frank Churchill is a character who is of interest to the other characters in the context.

  Ask anything: What are the relationships between Emma and Frank?

  Emma and Frank Churchill have a friendly relationship. They enjoy each other's company and have conversations about various topics. They are seen walking together and spending time together at social events. However, there is no indication of a romantic relationship between them in the given context.

  Ask anything: Is there a moral to the story, and if so, then what is it?

  The story in the given context does not explicitly convey a moral.

  Ask anything: ^C

  Okay, bye bye, and thank you.


Extra credit

For extra credit, delete all the files in the corpus directory and replace them with your own. For example, do some sort of literature search, download about a dozen PDF files from the results, save them in the corpus directory, and lastly, index them. Next, use the chat program to interact with the newly created index. Alternatively, download a few files from Project Gutenberg (say, all of the things written by Mark Twain) and "read" them too. [4]


Librarianship

These sketches exploit some of the latest developments in natural language processing and machine learning, but more importantly, these sketches can be seen as an opportunity for librarianship. Here in libraries we collect, organize, archive, and disseminate data, information, and knowledge. But with the advent of computers and the Internet, such processes can be done sans libraries and librarians.
Yet, libraries and librarians still have vast quantities of data, information, and knowledge at their disposal. Moreover, the associated cataloging ("metadata") is all but indispensable when it comes to thorough analysis. Merely enabling people to search & get library content is not as useful as it once was. On the other hand, if libraries and librarians were not only to curate collections but also to provide services akin to the services in this sketch, then libraries and librarians would both evolve and enable people to comprehend more deeply the things they need to read.


Links

  [1] LlamaIndex - https://www.llamaindex.ai
  [2] OpenAI home page - https://openai.com
  [3] Generate an OpenAI API key - https://platform.openai.com/account/api-keys
  [4] Project Gutenberg - https://gutenberg.org


--
Eric Lease Morgan
Navari Family Center for Digital Scholarship
University of Notre Dame

January 2, 2024