Project Title | Commonplace Cultures: Mining Shared Passages in the 18th Century using Sequence Alignment and Visual Analytics |
Project Website | www.oerc.ox.ac.uk/projects/commonplace-cultures |
Start Date | 1 April 2014 |
End Date | 31 March 2015 |
UK Project Manager | Professor Min Chen, University of Oxford, Oxford e-Research Centre, 01865 610633, min.chen@oerc.ox.ac.uk |
Project Team | Dr. Alfie Abdul-Rahman, University of Oxford, Oxford e-Research Centre, alfie.abdulrahman@oerc.ox.ac.uk; Professor Nicholas Cronk, University of Oxford, Faculty of Medieval and Modern Languages, nicholas.cronk@voltaire.ox.ac.uk |
Lead Institution | University of Oxford in the UK (University of Chicago in the USA) |
Project Partners | University of Chicago |
Project Plan | http://repository.jisc.ac.uk/5650/ |
Progress Report | http://repository.jisc.ac.uk/6047/ |
Summary
Recent scholarship has demonstrated that the various practices associated with Early Modern commonplacing – the extraction and organisation of quotations and other passages for later recall and reuse – were highly effective strategies for dealing with the perceived “information overload” of the period. Yet the 18th century was also a crucial moment in the modern construction of a new sense of self-identity.
Our goal is to examine this paradigm shift in 18th-century culture from the perspective of commonplaces and their textual and historical deployment in the contexts of collecting, reading, writing, classifying, and learning. These practices allowed individuals to master a collective literary culture through the art of commonplacing, a nexus of intertextual activities that we aim to explore through the concerted application of sequence alignment algorithms for shared passage detection and large-scale visual analytics on the largest collection of 18th-century works ever assembled.
However, identifying commonplace passages among hundreds of documents is a non-trivial intellectual undertaking. Technical challenges arise not only from the size of the data, but also from the multi-faceted nature of the computational rules designed to capture the diverse aspects associated with commonplaces. These aspects may include, but are not limited to, word occurrence and co-occurrence, grammatical structure, language, temporal context, geographical context, book genres, and styles of writing. In addition, to compile a global commonplace book computationally, one needs efficient and effective means for identifying appropriate rules, determining appropriate parameters, verifying text mining results, and, most importantly, understanding the relationship between different algorithmic choices (rules and parameters) and the quality of the corresponding text mining results (e.g., in terms of recall and precision). In this project, we will step beyond conventional text alignment techniques for identifying commonplaces by adopting the methodology of visual analytics, which supports complex analytical tasks through an iterative and integrated process of automated text mining, multivariate and model visualization, and human-computer interaction.
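To make the link between algorithmic choices and result quality concrete, the following is a minimal Python sketch, not the project's alignment software. It assumes a much simpler stand-in technique (matching shared word n-grams between documents) and shows how one parameter, the n-gram length `n`, changes precision and recall against a hand-labelled set of document pairs. The function names, the toy texts, and the labelled pair are all invented for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def shingles(tokens, n):
    """Return the set of all contiguous word n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_passage_pairs(docs, n=4, min_shared=1):
    """Return the pairs of document ids that share at least min_shared n-grams."""
    index = defaultdict(set)  # n-gram -> ids of the documents containing it
    for doc_id, text in docs.items():
        for gram in shingles(tokenize(text), n):
            index[gram].add(doc_id)

    counts = defaultdict(int)  # (id_a, id_b) -> number of shared n-grams
    for ids in index.values():
        ordered = sorted(ids)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                counts[(a, b)] += 1
    return {pair for pair, c in counts.items() if c >= min_shared}


def precision_recall(predicted, relevant):
    """Compute precision and recall of predicted pairs against labelled pairs."""
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Toy data (hypothetical): documents "a" and "b" reuse the same passage.
docs = {
    "a": "To be, or not to be, that is the question.",
    "b": "He asked whether to be or not to be was the question.",
    "c": "The question of taxes was debated at length.",
}
relevant = {("a", "b")}  # hand-labelled ground truth for this toy corpus

for n in (2, 4):
    predicted = shared_passage_pairs(docs, n=n)
    print(n, sorted(predicted), precision_recall(predicted, relevant))
```

On this toy corpus the shorter n-gram length also matches the incidental phrase "the question", so it flags every pair and precision falls while recall is unchanged. A visual analytics workflow would let a scholar inspect this kind of trade-off across many rules and parameters rather than a single one.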
Objectives
- to develop progressively an online database of 18th-century commonplaces, which will become a public‐domain resource for scholars of literary and cultural history;
- to equip the database with visualization capabilities that enable users to explore the relationships between different commonplace groupings (e.g., authors, books, periods, languages), and that encourage users to report errors and suggest alternative text mining rules through scholarly crowdsourcing;
- to provide a visual analytics system for creating and collecting new text mining rules, analyzing their performance in different contexts, studying different configurations of composite rules, and managing the progressive development of the commonplace database.
Anticipated Outputs and Outcomes
- The final project report(s) to Jisc and DiD3 as required.
- A minimum of two paper submissions to publication venues in digital humanities and visualization.
- Web-based software for general dissemination to a broad audience.
- Integrated software for large-scale text mining using the HPC infrastructure at Chicago.
- A long-term collaborative partnership between the two teams.