Text and data mining (TDM) technologies read large amounts of digital data and are used to explore, dissect and understand texts. As technology advances and data is made available through projects such as The Old Bailey Online, readers are given more opportunity to conduct efficient research on open sets of data (Michelle Brook, Peter Murray-Rust and Charles Oppenheim, 2014). Text analysis tools use quantitative data to support qualitative research, measuring the text to provide evidence for subjective observations.
The Old Bailey Online is a collection of all the court proceedings of the Old Bailey from 1674 to 1913, digitised to enable researchers to see the content of trials in depth. Its users might include historians of punishment or of the justice system, or a casual family historian looking for a particular name, all of whom can mine the digitised texts.
Considering how valuable textual analysis might be when approaching this kind of traditionally unstructured data, I conducted some searches in the collection. The Old Bailey API (below right) is a more concise version of the general Old Bailey search tool (below left).
The original search tool allows more freedom for free-text searches. I searched for the keyword ‘insanity’ in murder cases, over the entire span of cases from 1674 to 1913. The list of results came back with links to individual cases, encouraging close reading of all relevant trials. The API differs in that it’s structured for more specific searches, with only one keyword field and more concise drop-down menu options. The option to search by gender was useful; it enabled me to get more accurate results (searching for the keyword ‘woman’ in the general search wouldn’t have helped me much, I’m sure).*
The results from the API search were much easier to work with. I was able to break down my results by keyword and, interestingly, saw that ‘pleasure’ was ranked top alongside ‘murder’. I ‘drilled’ the word ‘pleasure’ and it refined the results instantly. The results came to me in a form that made further textual analysis easy, and at this point I attempted to export them to Voyant using the ‘send to Voyant’ option. This didn’t work, however, either with the 100 results I tried at first or with the 10 I tried the second time. The site didn’t seem to be able to handle exporting this way, but after saving the zip file using the ‘Zip URL’ option I could upload the file into Voyant separately.
I was surprised to find that ‘pleasure’ wasn’t a prominent word in the word cloud, and neither was ‘insanity’, despite both being listed as top keywords in my original results via the API. I went through the same process again with the next few sets of 10 trials from my Old Bailey results and found that the words ‘mother’ and ‘child’ featured heavily, which is telling. It seems as though the use of Voyant is still limited to smaller data sets, and therefore text analysis of only part of the results from a collection like the Old Bailey might not take into account enough data to create a fair, or useful, visualisation.
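To show why a batch of 10 trials can look so different from the full set of results, here is a minimal Python sketch of the kind of word counting that sits behind a word cloud. It is not Voyant's actual method, only an illustration: the function names, the tiny stopword list and the two sample "batches" are all made up for the example and contain no real trial text.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; real tools use much longer ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "was",
             "she", "he", "her", "his", "it", "that", "with", "for"}


def word_counts(text):
    """Count lowercase word tokens in a text, skipping stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def corpus_counts(batches):
    """Add up the word counts of several batches of text."""
    total = Counter()
    for batch in batches:
        total.update(word_counts(batch))
    return total


# Invented placeholder sentences standing in for two batches of trials.
batch_one = "The mother said the child was ill. The child cried."
batch_two = "She took no pleasure in it. The mother wept for the child."

# Top words in one batch, then across both batches together.
print(word_counts(batch_one).most_common(3))
print(corpus_counts([batch_one, batch_two]).most_common(3))
```

In this toy example ‘pleasure’ never appears in the first batch at all, so a cloud built from that batch alone could not show it, however the word ranks across the whole result set. Counting over every batch and comparing the totals is one way to check whether a small sample is representative.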
Qualitative research methods rely on interpretation, which is definitely needed when using tools like Voyant on unstructured data. Quantitative and qualitative data work together in these circumstances, with the former supporting the latter. Observations can be made across multiple sets of data, hunches followed and the tones of individuals in the trials analysed, all supported and facilitated by TDM technologies that depend on this kind of digitisation.
*Old Bailey Online tweeted about the gender search function: it does exist, but in the ‘custom search’ tool.