id: work_k2iffxium5g75pmy23wvbgl3pm
author: William Y. Arms
title: EScience in Practice
date: 2009
pages:
extension: .htm
mime: text/html
words: 5042
sentences: 413
flesch: 63
summary: In this article we describe our experience in developing the Cornell Web Lab, a large-scale framework for eScience based on the collections of the Internet Archive, and discuss the lessons that we have learned in doing so. As an example of how research interests change, when we began work on the Web Lab, we interviewed fifteen people to find how they might use the collections and services. The full pages of the four complete crawls are being loaded onto the cluster, together with several large sets of link data extracted from the Web Lab database. In the Web Lab, tasks that need high levels of expertise include the transfer of data from the Internet Archive to Cornell, extraction of metadata, removal of duplicates, the construction of the relational database, and the tools for extracting groups of pages from complete web crawls.
cache: ./cache/work_k2iffxium5g75pmy23wvbgl3pm.htm
txt: ./txt/work_k2iffxium5g75pmy23wvbgl3pm.txt