The criminals went to the place of execution in the following order, Morgan, Webb, and Wolf, in the first cart; Moore in a mourning coach; Wareham and Burk in the second cart; Tilley, Green, and Howell in the third; Lloyd on a sledge; on their arrival at Tyburn they were all put into one cart. They all behaved with seriousness and decency. Mary Green professed her innocence to the last moment of the fact for which she died, cleared Ann Basket, and accused the woman who lodged in the room where the fact was committed. As Judith Tilley appeared under terrible agonies, Mary Green applied herself to her, and said, do not be concerned at this death because it is shameful, for I hope God will have mercy upon our souls; Catharine Howell likewise appeared much dejected, trembled and was under very fearful apprehensions; all the rest seemed to observe an equal conduct, except Moore, who, when near dying, shed a flood of tears. In this manner they took their leave of this transitory life, and are gone to be disposed of as shall seem best pleasing to that all-wise Being who first gave them existence.*
In my research sources before I came to Sheffield, capital punishment appeared fairly infrequently, briefly and usually in the future tense: typically, the marginal note ‘suspendatur’ (abbreviated to sur’ or sr’), ‘to be hanged’. Even those terse notes of an event 300 years old, which quite possibly didn’t happen anyway (as many of those sentenced were reprieved), always disturbed me slightly.
I read the records of homicides and coroners’ inquests – murders, gruesome accidents, negligence and cruelty – and they are distressing and disturbing, yet they don’t evoke quite the same sense of culture shock as do the accounts of executions and ‘Last Dying Speeches’. We aren’t simply talking about the execution of murderers here: in the 18th century burglars, robbers, pickpockets, horse thieves, sheep- and cattle-rustlers, forgers and counterfeiters could all face slow, horrible deaths, in most cases public strangulation, and this was regarded by most people as perfectly normal and civilised. (Indeed, there were those who thought that hanging was not punishment enough.)
In my new job, I’ve spent some time reading Ordinary’s Accounts, which are one of the many sources we’re digitising. These are rich and fascinating sources, full of stories of the lives of common people. But they are also stories of death, and they give me the willies – not least because ordinary, decent, intelligent people in the 18th century had no problem with the idea of pickpockets, shoplifters, burglars, sheep rustlers, forgers and counterfeiters receiving exactly the same punishment as murderers.
In fact, at the very beginning of the book she mentions some of the bemused reactions she received from people learning what her research topic was, including the gentleman who suggested that she should study “something pleasant, like great battles”.
On the evening before execution, a respite of 14 days was brought for George Chippendale, and to be continued, if within that time he shall submit to suffer the amputation of a limb, in order to try the efficacy of a new-invented styptic for stopping the blood-vessels, instead of the present more painful practice in such cases. For this indulgence, he, together with his brother and his uncle, had joined in a petition to his Majesty, and thankfully accepted it, appearing in good health and spirits, ready and chearful to undergo the experiment. (Ordinary’s Account, May 1763.)
Well, I got at least one important thing wrong, anyway. It wasn’t George’s arm that was, er, on the block. It was his leg.
How do I know this? Well, by sheer chance, a few weeks after I posted that, I got an email query at work, from a family historian who was searching for a George Clippingdale in the Old Bailey Proceedings. The problem was that the OBP reporters (unlike most other sources the researcher had consulted) spelt his surname Chippendale. (Spelling variations are not an uncommon problem in 18th-century sources, as I’ve mentioned here before.)
So, we got that sorted out, and that would normally have been the end of it. But then the researcher happened to mention that his George was reprieved from a death sentence because a surgeon wanted to use him in an experiment.
At which point, I thought ‘Hang on a minute… that sounds familiar’, and came over here and checked my earlier post. And it’s the same man!
Naturally, of course, I had to write back with a barrage of questions. And the researcher was kind and generous enough to send me his write-up of everything he’d found out about George – and to agree to let me tell you lot about it.
The view from my seat at the DP data visualisation workshop
Yesterday, I went to All Souls College, Oxford, for a data visualisation workshop organised by the Digital Panopticon project.
The project – a collaboration between the Universities of Liverpool, Sheffield, Oxford, Sussex and Tasmania – is studying the lives of over 60,000 people sentenced at the Old Bailey between 1780 and 1875, to look at the impact of different penal punishments on their lives.
It aims to draw together genealogical, biometric and criminal justice datasets held by a variety of different organisations in Britain and Australia to create a searchable website that is aimed at anyone interested in criminal history – from genealogists to students and teachers, to academics.
This is a huge undertaking, and it is no wonder that the project aims to harness digital technologies in making the material accessible to a wide audience. But how could…
I’ll do more later and add a form for people to submit more blogs and so on, but I wanted to get the basics up and running this weekend. (If it sounds at all familiar, it’s because there has been a page there with that name for a while – but this is New New!)
I’m as guilty as anyone of holding on to my old research data (databases, transcriptions, abstracts, calendars, etc of primary sources), so this is pulling together some stuff to prod me into action this summer. I have material going right back to my MA thesis that I keep not getting round to sharing. My resolution is that I’m going to try to do better this year. This post is partly to help me (and maybe you too?) to get there.
A couple of thoughts first. I think there are two slightly different meanings of ‘messy’ to be teased apart here.
One is about errors and gaps in the content of the material – we all make mistakes in transcription, especially when we’re learning the ropes; we leave “???” where we couldn’t quite make out what the source said; etc. I’m not talking about cleaning all of that up to spare your embarrassment. You know you’ll never get round to sorting that crap out (especially if it would require going back to the archives); so it’s time to try to get over your shame and be prepared to share it warts and all (and include appropriate caveats).
I’m thinking more in terms of tidying the structure, format, labelling and documentation of the data. So, for example, someone else can import it into a database without columns ending up all over the place, and without running into abbreviations and codes that no one but you can interpret.
Ensure it includes the archive references for the originals!
Add documentation that explains the dataset (and its limitations) accurately. If you’ve used codes in any fields on a spreadsheet or database, for convenience in data entry, make sure you add a list of what they actually stand for (or use Find & Replace to change them to something more transparent!).
Be clear about what it represents (eg full transcriptions, partial transcriptions, just summaries, etc)
Ideally, write the documentation itself in a structured data format to make it machine-readable (“metadata”; also, this intro).
Convert it to a non-proprietary format (eg .csv text files); or use one that has good inter-operability for people who don’t have the software you used to create it. Eg, Excel spreadsheets (.xls, .xlsx) can be opened in quite a lot of different packages and converted to other formats quite easily, but Access databases (.mdb) are much more difficult to share. Plain text files (.txt) are preferable to Word docs.
This last point isn’t just about data sharing, by the way, but also about preservation for your own use in the longer term. Do you want to find yourself unable to access all that hard work you did in the archives in just a few years because a manufacturer stopped making that software you used, or the version of it that you were using isn’t supported any more?
And don’t forget, if you share data, you can get credit back too. (Make sure your documentation includes clear citation guidance…) Use a Creative Commons licence or something similar.
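To make a couple of those tidying steps concrete, here is a minimal Python sketch. Everything in it is invented for illustration: the column names, the archive references, the codes and their meanings are hypothetical and not taken from any real dataset. It expands data-entry codes into readable labels, writes the result as plain CSV, and produces a small machine-readable codebook as JSON.

```python
import csv
import io
import json

# Hypothetical data-entry codes and what they stand for.
OUTCOME_CODES = {"G": "guilty", "NG": "not guilty", "?": "unclear in source"}

# Hypothetical rows as they might sit in a private spreadsheet.
raw_rows = [
    {"archive_ref": "EXAMPLE/1/23", "name": "Jane Doe", "outcome": "G"},
    {"archive_ref": "EXAMPLE/1/24", "name": "John ???", "outcome": "?"},
]


def expand_codes(rows, column, codes):
    """Return copies of rows with coded values replaced by readable labels."""
    expanded = []
    for row in rows:
        new_row = dict(row)
        # Leave unrecognised values untouched rather than guessing.
        new_row[column] = codes.get(row[column], row[column])
        expanded.append(new_row)
    return expanded


def to_csv_text(rows):
    """Serialise rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def build_codebook(codes, note):
    """Describe the dataset and its codes in a machine-readable form."""
    return json.dumps({"note": note, "outcome_codes": codes}, indent=2)


clean_rows = expand_codes(raw_rows, "outcome", OUTCOME_CODES)
csv_text = to_csv_text(clean_rows)
codebook = build_codebook(
    OUTCOME_CODES, "Partial transcriptions; '???' marks illegible text."
)
print(csv_text)
print(codebook)
```

Note that the “???” in the name column is left exactly as it was: the point is to make the structure legible to other people, not to hide the gaps.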
At the same time, this isn’t about trying to achieve perfection. (I don’t really advise going and plunging into the data management guidelines at the UK Data Service unless you have a lot of time on your hands…)
There are two resources I plan to try out (hey, I might even blog about the experience):
My academic apprenticeship, in Aberystwyth, was spent engrossed in two things: first, early modern Welsh and northern English crime archives, and second, the potential of the Internet for research and teaching and simply opening up early modern history to as many people as possible. That wasn’t a completely respectable interest back in 1999, and I’m still amazed sometimes that I’ve been able to spend the last 7 years indulging shamelessly in that obsession and get paid for it.
But what about the first of my obsessions? A couple of weeks ago, the Financial Times told us that more cranes have been erected in London in the past 3 years than everywhere else in the UK put together. I have a nagging worry that I’ve unwittingly contributed to a similar situation in the digital sphere.
I’ve found at least 300 scholarly publications citing OBO, so it’s certainly made its mark on academic research. Beyond academia, it’s directly generated family histories, novels, radio and TV dramas and documentaries. But what impact has it had on digitising crime history? 10 years on, vast swathes of our criminal records remain untouched by the digital. And while there has been large-scale digitisation of sources that crime historians use, not much of it is freely accessible, and little of it has been done by or for us.
A number of historians over the years have worried that OBO skews attention – and resources – disproportionately towards London and the higher courts, representing a tiny minority of prosecuted crimes and policing. As the digital historian and project manager, I’m thrilled to learn of young researchers who chose history of crime because of OBO. But my other half, the archives researcher, is more ambivalent.
Early modern court archives aren’t like our neatly packaged, readable trial reports. They’re unwieldy, often dirty, fragmentary, intimidating in overall scale. Documents vary hugely in size, structure, handwriting, materials used and condition, defying any ‘one size fits all’ approach to digitising. They’re frequently written in heavily abbreviated Latin, or ponderously legalese-d English, or an unholy mix of both.
Who would want to struggle with that if they can use something like OBO instead? Would I, if I were a PhD student now? And how much easier is it to turn to OBO for immediate digital rewards than to start new digitisation projects with such awkward and intractable material?
I was asked to introduce themes and challenges that I think are important for the future of digital crime history. So here’s the first challenge: improving digital access to documents like this, and the hundreds of thousands like them in our archives. A second challenge: as always, how to pay for it and sustain it in the long term. And a third is the digital skills we need: I don’t necessarily mean programming, but understanding something about various kinds of code, how to work with digital data, how to work with people who do program.
And then there are two themes I want to emphasise, that can help us to face the challenges: the need to re-use, recycle and share digital content; and the importance of collaboration and partnerships.
I’ve blogged recently about the dual identity of the Proceedings: ideal for quantitative analysis, which needs a structured database, but also containing many rich, engaging witness narratives that demanded full text. The solution found in OBO’s case was to transcribe using a double-rekeying process that’s less accurate than traditional standards for scholarly editions, but far more accurate than OCR, and then mark up transcriptions with XML tags to create structure that can be extracted and turned into a database.
There are certainly downsides to this: time-consuming, expensive, unwieldy. [Both Tim and I agreed in the subsequent discussion that we wouldn’t try to do it quite like that today, though I’m not sure we’d be in complete agreement on exactly what we would do instead…]
But the upsides: accuracy, completeness, versatility.
Once we have created our digital data, it can be manipulated and re-used in many ways. Convert it into other formats. Index it in different ways for different kinds of search. And even transform it with new markup for different purposes, as in Magnus Huber’s Old Bailey Corpus Online. There have been uses of OBO that no one could have predicted.
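For anyone who hasn’t seen this kind of thing in practice, here is a small Python sketch of the general idea: one marked-up text serving both as a source of structured records and as readable full text. The tag and attribute names are made up for the example and are not the actual OBO markup scheme; the trial itself is also invented.

```python
import xml.etree.ElementTree as ET

# Invented markup for illustration only; not the actual Old Bailey Online schema.
SAMPLE = """
<trial id="t-example-1">
  <p><defendant>Mary Example</defendant> was indicted for
  <offence category="theft">stealing a silk handkerchief</offence>.
  <verdict category="guilty">Guilty</verdict>.
  <punishment category="transportation">Transported</punishment>.</p>
</trial>
"""


def trial_to_record(xml_text):
    """Pull the tagged fields out of one marked-up trial into a flat record."""
    root = ET.fromstring(xml_text)
    record = {"trial_id": root.get("id")}
    for tag in ("offence", "verdict", "punishment"):
        element = root.find(f".//{tag}")
        # A missing tag becomes None rather than raising an error.
        record[tag] = element.get("category") if element is not None else None
    defendant = root.find(".//defendant")
    record["defendant"] = defendant.text.strip() if defendant is not None else None
    return record


def full_text(xml_text):
    """Recover the running prose, with whitespace normalised, for reading or search."""
    root = ET.fromstring(xml_text)
    return " ".join("".join(root.itertext()).split())


record = trial_to_record(SAMPLE)
text = full_text(SAMPLE)
print(record)
print(text)
```

The record is the sort of row that could go into a database table for counting, while the second function gives back the narrative untouched, which is the dual use described above.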
Bridget: Searching in London Lives, Connected Histories, 18th-Connect (are those other results the same person?); a dot somewhere in this graph from Datamining with Criminal Intent. Same data in four places: making new connections, seeing trials in different ways.
I’d argue there are two lessons from OBO for everyone, whatever kind of project or source they have in mind:
digitise in a way that best captures the information in a source;
& which facilitates future re-use and collaboration
Not the specifics of transcription or markup or any particular search engine. Given that many crime history sources are heavily formulaic, or in Latin before 1733, sometimes verbatim transcription can hide as much as it reveals, and make it harder to find the useful stuff. Some – many – of our sources simply don’t have rich stories to tell like OBO.
Creating data that is clean and consistent, well-structured and accurately documented may cost more at the beginning, and it may require more of an investment in technical skills and management, but it will make your efforts worth more in the long run.
What kind of collaborations and partnerships do we need? Let’s start with the vital one: the relationship between the historians and the keepers of the archival documents. Well, I admit I’m worried about that relationship. Here’s one reason why.
Why does this resource trouble me? It’s not just because it’s behind a paywall.
OK, in an ideal world, all these resources would be freely accessible to all. But I know all too well how expensive digitisation is. Someone has to pay; it’s just a question of how. The grim reality is that archives and libraries are under intense financial pressure and it’s only going to get worse: and that one of the few reliable paying audiences outside academia is family historians.
And findmypast have made a great, affordable, resource for family historians. But it’s a terribly limited one for crime historians. It’s a name search; as far as I can tell, no separate keyword search or browse. (If I’m wrong, there’s nothing telling me so.)* The needs and priorities of family historians and academics overlap, but they’re not close enough that creating resources that can serve them both well just happens.
Then you have, say, Eighteenth Century Collections Online or British Newspapers 1600-1900, which are designed more for academic audiences, but virtually inaccessible to individuals outside academic institutions, and those whose institutions can’t afford the price tag. And even then pretty much all you can do with those is keyword search and hope what you want isn’t lost in garbled OCR text.
Both kinds of resource are black boxes that make it impossible for a researcher to evaluate the quality of the data or search results; and hinder any kind of use other than those the platform was specifically built for. And if the data is locked away in a box it can never be corrected or improved or enhanced – even though the technology to enable that is continually developing. So publishers lose out, in the end, too.
Are there alternatives to the black box?
The Text Creation Partnership is funded by a group of libraries led by Michigan. It’s transcribing content from major commercial page-image digitised collections. The resulting texts are restricted to partnership members and resource subscribers for 4 or 5 years and then released into the public domain.
The images continue to be behind the paywall, and not all texts are transcribed. For ECCO the proportion is small, but EEBO’s goal is much more ambitious: one transcription for each unique text (usually first editions). In January 2015, 25,000 phase 1 EEBO texts will become available to everyone for search and to download for textmining or whatever else we can think of by then. (Phase 2 in ?2019 will be something like another 40,000 texts.)
It surely is not beyond our wit to translate that kind of public-private collaborative model to crime records [suggestion here], and for that matter, other archival records with overlapping academic/family history user groups. But to do so, I think we need to build partnerships between historians, archivists and publishers much more than we’ve been doing. And if what you want is totally free-to-access resources, you still need to work with archivists to find answers to the ‘who pays?’ question. I hope today can be a good starting place.
But it’s not enough simply to think about institutional collaboration.
A lot of smart people are thinking very hard about ways to facilitate collaborative user participation in digital resources – transcription, indexing, correction, tagging, annotation, linking, and they’re building tools whose usefulness often isn’t confined to volunteer projects and ‘crowd sourcing’. (The re-use maxim applies here too: don’t build from scratch if other people have already done the hard work building and testing good tools.)**
However, don’t imagine ‘the crowd’ is an easy option. OBO has been trying it out for a while and we’re only sort of getting there.
Part of our problem, on reflection, has been adding these things near the end of a project when we launch a website and then hope for something to magically happen while we go off to the next project.
A second issue has been user interfaces and design. We took a while to learn that we have to make participation easy, really easy, and we have to build the design in from early on. It’s no good building something that needs a separate login from the rest of the site, with a flaky user database that was tacked on as an afterthought. Third, and related: understand the limits of what most users are willing to do.
Two years ago, we added to OBO a simple form for registered users to report errors: one click on the trial page itself. People have been using that form not simply to report errors in our transcriptions but to add information and tell us about mistakes in the original. The desire on the part of our site users to contribute what they know is there. Just don’t think that means there’s a ready-made ‘crowd’ waiting to turn up and help you out without plenty of effort on your part.
Historians and our dirty data
And what about us, the historians who have been or are working in the archives? We’re all digitisers now, and have been for a long time. Well, sort of. My computer has folders of databases, transcriptions and (to use the technical term) “stuff” that is kept from public view because, well, it’s a mess and I never get around to the data cleaning needed and it would be embarrassing to let people see my mistakes. I’m sure I’m not alone. [There were nods and sheepish grins all round at this point. You know who you are.]
Increasingly in future, there are going to be requirements from funders to share research data in institutional repositories and the like. We should not be assuming that means just scientists! We shouldn’t in any case be doing this just because someone demanded it; it should become a habit, the right thing to do to help each other.
But we need to get the right training in digital skills for students, so they know how to make good, shareable data, and how best to re-use data shared by others. (Full disclosure: I’m working on a project to create an online data management course for historians at the moment…)
Digitisation, digital history and re-usability don’t have to be all about big funded projects. It can start with personal decisions and actions: clean up your old data, put it in your institutional repository, share it with a Creative Commons licence, tell your colleagues and students it’s there. Relinquish some control. [more thoughts on this]
If we digitise for re-use, and re-use to digitise, we can share and collaborate, and build partnerships that can make some of the challenges of digitisation less intimidating. Digital history should be an iterative, accumulative, learning process rather than one-off ‘projects’ to be ‘launched’ and then left to gather dust.
* The findmypast resource has only gone up recently and the content isn’t complete yet. Keyword search functionality is apparently supposed to be included in the resource and it’s possible that will become available as it rolls out. But it should be noted that even a keyword search is unlikely to fulfil the needs that crime historians often have to crunch numbers in complex ways.
** The tools and projects in this slide really are a tiny sample of what’s happening now. They are:
The Digging into Data challenge is an international grant competition (UK, US, Canada), which announced its first eight winners yesterday.
What is the “challenge” we speak of? The idea behind the Digging into Data Challenge is to answer the question “what do you do with a million books?” Or a million pages of newspaper? Or a million photographs of artwork? That is, how does the notion of scale affect humanities and social science research? Now that scholars have access to huge repositories of digitized data — far more than they could read in a lifetime — what does that mean for research?
The most exciting bit for historians of crime and fans of the Old Bailey Proceedings Online, and of Zotero and TAPoR, is that the Old Bailey Online is one of the eight:
Using Zotero and TAPoR on the Old Bailey Proceedings: Data Mining with Criminal Intent
Awardees: Dan Cohen, George Mason University, NEH; Tim Hitchcock, University of Hertfordshire, JISC; Geoffrey Rockwell, University of Alberta, SSHRC.
Additional Key Participants: The National Archives (United Kingdom), McMaster University, the Open University, Amherst College, University of Sheffield, Trent University, and the University of Western Ontario.
Description: This project will create an intellectual exemplar for the role of data mining in an important historical discipline – the history of crime – and illustrate how the tools of digital humanities can be used to wrest new knowledge from one of the largest humanities data sets currently available: the Old Bailey Online.