Old Bailey Voices: gender, speech and outcomes in the Old Bailey, part 1

The Old Bailey Voices data is the result of work I’ve done for the Voices of Authority research theme for the Digital Panopticon project. This will be the first of a few blog posts in which I start to dig deeper into the data. First I’ll review the general trends in trials, verdicts and speech, and then I’ll look a bit more closely at defendants’ gender. …

Posted at In Her Mind’s Eye

Defendants’ voices and silences in the Old Bailey courtroom, 1781-1880

This is a version of the paper I gave at the Digital Panopticon launch conference at Liverpool in September 2017.

In the interests of fostering reproducible research in the humanities, I’ve put all the data and R code underlying this paper online on Github – details of where to find them are at the end.

Defendant speech and verdicts in the Old Bailey

Defendants’ voices are at the heart of the Digital Panopticon Voices of Authority research theme I’ve been working on with Tim Hitchcock. We know that defendants were speaking less in Old Bailey Online trials as the 19th century went on; we’ve tended to put this in the context of growing bureaucratisation and the rise of plea bargaining.

I want to think about it slightly differently in this paper, though. The graph above compares conviction/acquittal for defendants who spoke and those who remained silent, in trials containing direct speech between 1781 and 1880. It suggests that for defendants themselves, their voices were a liability. This won't surprise those who've read historians' depictions of the plight in which defendants found themselves in 18th-century courtrooms without defence lawyers, under the "Accused Speaks" model of the criminal trial (eg Langbein, Beattie).

But this isn’t a story of bureaucrats silencing defendants (or lawyers riding to the rescue). I want to suggest that, once defendants had alternatives to speaking for themselves (ie, representation by lawyers and/or plea bargaining), they chose to fall silent because it was often in their best interests.

About the “Old Bailey Voices” Data

  • Brings together Old Bailey Online and Old Bailey Corpus (with some additional tagging, explained in more detail in the documentation on Github)
  • Combines linguistic tagging (direct speech, speaker roles) and structured trials tagging (including verdicts and sentences)
  • Single defendant trials only, 1781-1880
  • 20,700 trials in 227 OBO sessions
  • 15,850 of the trials contain first-person speech tagged by OBC

The Old Bailey Corpus, created by Magnus Huber, enhanced a large sample of the OBP 1720-1913 for linguistic analysis, including tagging of direct speech and tagging about speakers. [In total: 407 Proceedings, ca. 14 million spoken words, ca. 750,000 spoken words/decade.]

Trials with multiple defendants have been excluded from the dataset because of the added complexity of matching the right speaker to utterances (and defendants aren’t always individually named in any case). [But of course this raises the question of whether the dynamics and outcomes of multi-defendant trials might be different…]

Trial outcomes have also been simplified: if there are multiple verdicts or sentences, only the most “serious” is retained. And for this paper I include only trials ending in guilty/not guilty verdicts, omitting a handful of ‘special verdicts’ etc.
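In R, that simplification step might look something like this (a minimal sketch, assuming a data frame `verdicts` with hypothetical columns `trial_id` and `verdict`; the actual data and code are in the Github repository):

```r
# Keep only guilty/not guilty verdicts and, where a trial has several,
# retain the most "serious" one. Column and category names are hypothetical.
library(dplyr)

verdict_rank <- c(notguilty = 1, guilty = 2)  # higher = more serious

simplified <- verdicts %>%
  filter(verdict %in% names(verdict_rank)) %>%   # drop 'special verdicts' etc
  group_by(trial_id) %>%
  slice_max(verdict_rank[verdict], with_ties = FALSE) %>%
  ungroup()
```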

Caveat!

The working assumption is that nearly all silent defendants had a lawyer, and that the majority of defendants who spoke did not.

Sometimes, especially in the early decades, defendants had a lawyer and also spoke. Unfortunately, the OBC tagging doesn’t distinguish between prosecution and defence lawyers, and not all lawyer speech was actually reported.

But, more seriously, is it safe to assume that ‘silent’ defendants were really silent? Occasionally defendant speech was actually censored in the Proceedings (in trials where other speech was reported), eg a man on trial for seditious libel in 1822 whose defence “was of such a nature as to shock the ears of every person present, and is of course unfit for publication”. But that was a very unusual, political, case. (See t18220522-82 and Google Books, Trial of Humphrey Boyle)

[However, it was suggested in questions after the presentation that maybe the issue isn’t so much total censorship, as in the case above, but that the words of convicted defendants might be more likely to be partially censored, which would problematise analyses centring on the extent and content of their words. This could be a particular problem in the 1780s and 1790s; maybe less so later on.]

So there is work to be done here – eg, looking specifically at trials with alternative reports to consider defendants’ words.

Distribution of trials by decade 1781-1880

Start with some broad context.

The number of cases peaked during the 1840s and fell dramatically in the 1850s. (Following the Criminal Justice Act 1855, many simple larceny cases were transferred to magistrates’ courts.)

Percentage of trials containing speech, annually

The percentage climbs from the 1780s (in 1778 the Proceedings became a near-official record of the court) and peaks in the early 19th century; then, after the major criminal justice reforms of the late 1820s (shown by the red line) swept away most of the Bloody Code, there is a substantial fall in the proportion of trials containing speech.

This was primarily due to an increase in guilty pleas, which had previously been rare. After the reforms, two-thirds of trials without speech are guilty pleas.

Conviction rates annually, including guilty pleas

(Ignore the spike around 1792, which is due to censorship of acquittals.) There is a gradual increase in conviction rates, which declines again after the mid 19th century.

But if we exclude guilty pleas and look only at jury trials, the pattern is rather different.

Conviction rates annually, excluding guilty pleas

Conviction rates in jury trials decrease rapidly after the 1820s – to not much over 60% by the end of the 1870s. That’s much closer to 18th-century conviction rates (when nearly all defendants pleaded not guilty), in spite of all the transformations inside and outside the courtroom in between.

Percentage of trials in which the defendant speaks, annually

Here the green line is the Prisoners’ Counsel Act of 1836, which afforded all prisoners the right to full legal representation. But the smoothed trend line indicates that it had no significant impact on defendant speech. Defendants had, at the judge’s discretion, been permitted defence counsel to examine and cross-examine witnesses since the 1730s. Legal historians emphasise the transformative effect of the Act; but from defendants’ point of view it seems less important: for them it was already a done deal, and the Bloody Code reforms were much more significant.

Defendant speech/silence and verdicts, by decade

This breaks down the first graph by decade, showing that the general pattern is consistent throughout the period, though the exact percentages and proportions do vary.

Defendant speech/silence/guilty pleas and sentences

Moreover, the harsher outcomes for defendants who spoke continue into sentencing. Pleading guilty (though bear in mind that this only really applies to c.1830-1880, whereas the silent/speaks bars cover the whole period) was most likely to result in imprisonment, much less likely in a transportation sentence, and hardly ever in death. Defendants who spoke were the most likely to face the tougher sentences – death or transportation – more so than the silent.

(Don’t yet have actual punishments – the next big job is getting the linked Digital Panopticon life archives…)

Defendant word counts (all words spoken in a trial)

How much did defendants say? Not a lot. The largest single group of defendants is the silent (ie, 0 words). But even those who spoke usually didn’t say very much [the overall average was 55 words]. Eloquent, articulate defendants were few and far between!

Defendant word counts and verdicts

So if you did speak, it was better to say plenty!? Or in other words, more articulate defendants had a better chance of acquittal (though they were still slightly worse off than the silent).

Defences: average word counts and verdicts

I’ll finish with a focus on defendants’ defence statements – made by nearly all defendants who spoke, and for the majority the only thing they did say (a minority questioned witnesses or made statements at other points in the trial).

Overall word counts of defence statements:

  • guilty (n=7,696): average 44.97 words
  • not guilty (n=1,414): average 65.15 words

On average, defence statements by the acquitted were longer. Again highlights that more articulate defendants do better.

Also, there is more variety (less repetition) in the statements of acquitted defendants: 97.2% (1,374) of their 1,414 defence statements are unique (crudely measured, as text strings), whereas 93.17% (7,170) of the 7,696 statements by convicted defendants are unique.
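That crude measure is easy to reproduce – a sketch in R, reading “unique” as a statement that occurs exactly once in its group (the object names here are hypothetical):

```r
# Percentage of defence statements occurring exactly once, compared
# as raw text strings (no normalisation of spelling or punctuation).
pct_unique <- function(statements) {
  freq <- table(statements)
  100 * sum(freq == 1) / length(statements)
}

pct_unique(defences_acquitted)  # acquitted: ~97% of 1,414 statements
pct_unique(defences_convicted)  # convicted: ~93% of 7,696 statements
```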

Can we start to look more closely at what they said? It isn’t yet possible to investigate in depth, but some simple linguistic measures offer a start.

Defences: Words least associated with acquittal

mercy, picked, man, i, distress, carry, along, them, beg, stop, up, young

In linguistics, keywords are “items of unusual frequency in comparison with a reference corpus”. I compared the larger set of defence statements by defendants who were convicted with the defence statements by defendants who were acquitted.
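One common keyness measure is Dunning’s log-likelihood; here is a minimal R sketch for a single word, given its frequency in each group of statements and the groups’ total word counts (this is the standard corpus-linguistics formula, not necessarily the exact statistic used for the list above, and the numbers in the usage line are purely illustrative):

```r
# Log-likelihood keyness for one word.
# a, b: occurrences of the word in the convicted and acquitted statements;
# total_a, total_b: total words in each group.
keyness_ll <- function(a, b, total_a, total_b) {
  e1 <- total_a * (a + b) / (total_a + total_b)
  e2 <- total_b * (a + b) / (total_a + total_b)
  ll <- 0
  if (a > 0) ll <- ll + a * log(a / e1)
  if (b > 0) ll <- ll + b * log(b / e2)
  2 * ll
}

keyness_ll(120, 5, 350000, 90000)  # illustrative numbers only
```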

The list above shows the words least likely to be associated with acquittal – ie, the hallmarks of the least successful defence statements…

I want to highlight:

  • mercy + beg
  • picked (+ carry might be related)
  • i
  • distress

Remember that many defence statements were not really ‘defences’ at all; they were more an appeal to the judges’ clemency after sentencing (‘I beg for mercy’) or a claim of extenuating circumstances (‘I was in distress’) in particular. There was also playing down of the offence – ‘I picked up the things’.

And in general many short bare statements beginning with “I” rather than more complex narratives.

Four hopeless short defences

So I picked four of the most frequent short (non-)defences that are heavily associated with conviction, to explore a bit further. (This excludes uses of any of these phrases within longer defences.)

defence             frequency   % convicted
nothing to say          109        98.17
mercy                   125        98.40
picked up/found         223        93.72
distress                 82        97.56

Main variants:

  • I have nothing to say
  • I beg for mercy/leave it to the mercy of the court/throw myself on the mercy of the court
  • I picked it (them) up/found it
  • I was in (great) distress/I was distressed/I did it through distress

The next four graphs show the percentage of defendant speakers who use each phrase in short defence statements in each decade.
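The underlying measure is simple – a sketch in R, assuming a data frame `speakers` with one row per defendant who spoke, and hypothetical columns `decade` and `short_defence` (the categorised short defence, if any):

```r
# % of defendant speakers in each decade whose short defence
# falls into a given category (eg the 'mercy' variants).
library(dplyr)

speakers %>%
  group_by(decade) %>%
  summarise(pct_mercy = 100 * mean(short_defence == "mercy", na.rm = TRUE))
```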

I have nothing to say

This was very popular before the 1810s – peaking at use by 4% of defendants who spoke in the decade 1801-10 – and then rapidly disappears.

I beg for mercy/leave to the mercy of the court

This gained popularity slightly later, with a slower decline after the 1810s.

I picked it up/found it

A less dramatic decline after the 1820s.

I was in distress/did it through distress

Curiously, this doesn’t appear at all in the 1780s; it peaks in the 1810s.

Conclusions

So there are variations in the timing and speed of decline, but broadly, these hopeless ‘non-defence’ statements, which were almost certain to be followed by conviction, all declined in use and were rarely heard in the courtroom after the 1820s. That fits, it seems to me, with both the gradual decline in defendant speech and the more rapid rise of plea bargaining from the late 1820s.

First, the defence lawyer option meant that defendants were better off finding the money for a lawyer who could try to undermine the prosecution case through aggressively examining witnesses. This was happening from the 1780s onwards.

And second, the plea bargaining option from the late 1820s meant that if defendants really had no viable defence, had been caught red-handed, they were better off pleading guilty in return for a less harsh punishment.

And so: for defendants who wanted to walk free or at least lessen their punishment, if not for later historians trying to hear their voices and understand what made them tick, silence was golden.

More stuff

Repost: Tyburn’s Martyrs

[Originally posted here, November 2007]

The criminals went to the place of execution in the following order, Morgan, Webb, and Wolf, in the first cart; Moore in a mourning coach; Wareham and Burk in the second cart; Tilley, Green, and Howell in the third; Lloyd on a sledge; on their arrival at Tyburn they were all put into one cart. They all behaved with seriousness and decency. Mary Green professed her innocence to the last moment of the fact for which she died, cleared Ann Basket, and accused the woman who lodged in the room where the fact was committed. As Judith Tilley appeared under terrible agonies, Mary Green applied herself to her, and said, do not be concerned at this death because it is shameful, for I hope God will have mercy upon our souls; Catharine Howell likewise appeared much dejected, trembled and was under very fearful apprehensions; all the rest seemed to observe an equal conduct, except Moore, who, when near dying, shed a flood of tears. In this manner they took their leave of this transitory life, and are gone to be disposed of as shall seem best pleasing to that all-wise Being who first gave them existence.*

In my research sources before I came to Sheffield, capital punishment appeared fairly infrequently, briefly and usually in the future tense: typically, the marginal note ‘suspendatur’ (abbreviated to sur’ or sr’), ‘to be hanged’. Even those terse notes of an event 300 years old, which quite possibly didn’t happen anyway (as many of those sentenced were reprieved), always disturbed me slightly.

I read the records of homicides and coroners’ inquests – murders, gruesome accidents, negligence and cruelty – and they are distressing and disturbing, yet they don’t evoke quite the same sense of culture shock as do the accounts of executions and ‘Last Dying Speeches’. We aren’t simply talking about the execution of murderers here: in the 18th century burglars, robbers, pickpockets, horse thieves, sheep- and cattle-rustlers, forgers and counterfeiters could all face slow, horrible deaths, in most cases public strangulation, and this was regarded by most people as perfectly normal and civilised. (Indeed, there were those who thought that hanging was not punishment enough.)

In my new job, I’ve spent some time reading Ordinary’s Accounts, which are one of the many sources we’re digitising. These are rich and fascinating sources, full of stories of the lives of common people. But they are also stories of death, and they give me the willies – not least because ordinary, decent, intelligent people in the 18th century had no problem with the idea of pickpockets, shoplifters, burglars, sheep rustlers, forgers and counterfeiters receiving exactly the same punishment as murderers.

So, I wasn’t quite sure what to make of Andrea McKenzie, since she has written an entire, densely detailed book about the subject and the source: Tyburn’s Martyrs: Execution in England 1675-1775. She must be a tougher soul than me.

In fact, at the very beginning of the book she mentions some of the bemused reactions she received from people learning what her research topic was, including the gentleman who suggested that she should study “something pleasant, like great battles”.

Repost: George’s choice: an 18th-century convict and a medical experiment

Originally posted here (February 2008)

Last November, I dashed off a quick post about someone I’d encountered in an Ordinary’s Account: It’s Your Neck or Your Arm

On the evening before execution, a respite of 14 days was brought for George Chippendale, and to be continued, if within that time he shall submit to suffer the amputation of a limb, in order to try the efficacy of a new-invented styptic for stopping the blood-vessels, instead of the present more painful practice in such cases. For this indulgence, he, together with his brother and his uncle, had joined in a petition to his Majesty, and thankfully accepted it, appearing in good health and spirits, ready and chearful to undergo the experiment. (Ordinary’s Account, May 1763.)

Well, I got at least one important thing wrong, anyway. It wasn’t George’s arm that was, er, on the block. It was his leg.

How do I know this? Well, by sheer chance, a few weeks after I posted that, I got an email query at work, from a family historian who was searching for a George Clippingdale in the Old Bailey Proceedings. The problem was that the OBP reporters (unlike most other sources the researcher had consulted) spelt his surname Chippendale. (Spelling variations are not an uncommon problem in 18th-century sources, as I’ve mentioned here before.)

So, we got that sorted out, and that would normally have been the end of it. But then the researcher happened to mention that his George was reprieved from a death sentence because a surgeon wanted to use him in an experiment.

At which point, I thought ‘Hang on a minute… that sounds familiar’, and came over here and checked my earlier post. And it’s the same man!

Naturally, of course, I had to write back with a barrage of questions. And the researcher was kind and generous enough to send me his write-up of everything he’d found out about George – and to agree to let me tell you lot about it.

(But I warn you, there’s a sad ending.)


Data And The Digital Panopticon

Criminal Historian

The view from my seat at the DP data visualisation workshop

Yesterday, I went to All Souls College, Oxford, for a data visualisation workshop organised by the Digital Panopticon project.

The project – a collaboration between the Universities of Liverpool, Sheffield, Oxford, Sussex and Tasmania – is studying the lives of over 60,000 people sentenced at the Old Bailey between 1780 and 1875, to look at the impact of different penal punishments on them.

It aims to draw together genealogical, biometric and criminal justice datasets held by a variety of different organisations in Britain and Australia to create a searchable website that is aimed at anyone interested in criminal history – from genealogists to students and teachers, to academics.

This is a huge undertaking, and it is no wonder that the project aims to harness digital technologies in making the material accessible to a wide audience. But how could…


New project, new people: the Digital Panopticon

Starting a new project is exciting and intensely busy (which is also my excuse for taking a month to blog about it). And the Digital Panopticon is the biggest one we’ve done yet.

‘The Digital Panopticon: The Global Impact of London Punishments, 1780-1925’ is a four-year international project that will use digital technologies to bring together existing and new genealogical, biometric and criminal justice datasets held by different organisations in the UK and Australia. The aim is to explore the impact of different types of penal punishment on the lives of 66,000 people sentenced at the Old Bailey between 1780 and 1925, and to create a searchable website.

The Panopticon, for anyone who doesn’t know, was a model prison proposed by the philosopher Jeremy Bentham (1748-1832): “a round-the-clock surveillance machine” in which prisoners could never know when they were being watched. In Bentham’s own words: “a new mode of obtaining power of mind over mind, in a quantity hitherto without example”. Although Bentham’s plan was rejected by the British government at the time, there were later prisons built along those lines (Wikipedia), and the panopticon has become a modern symbol of oppressive state surveillance and social control.

Bentham criticised the penal policy of transportation and argued that confinement under surveillance would prove a more effective system of preventing future offending. One of DP’s basic themes is to test his argument empirically by comparing re-offending patterns of those transported and imprisoned at the Old Bailey. But it will go further, comparing the wider social, health and generational impacts of the two penal regimes into the 20th century.

Technically, DP brings together a number of different methods/techniques we’ve worked on in various projects over the years: digitisation, record linkage, data mining and visualisation, impact, connecting and enhancing resources, with the goal of developing “new and transferable methodologies for understanding and exploiting complex bodies of genealogical, biometric, and criminal justice data”.

However, it’s a much more research-intensive project than the ones we’ve done recently, and that’s reflected in the depth and breadth of the seven research themes. These are based on three central research questions/areas:

  • How can new digital methodologies enhance understandings of existing electronic datasets and the construction of knowledge?
  • What were the long- and short-term impacts of incarceration or convict transportation on the lives of offenders, their families and their offspring?
  • What are the implications of online digital research on ethics, public history, and ‘impact’?

What’s also exciting (and new for us) is that we’ll have PhD students as well as postdoc researchers (adverts coming soon). Lots of PhD students! Two are part of the AHRC funding package – one at Liverpool and one at Sheffield – and the partner universities have put up funding for several more (two each at Liverpool and Sheffield and one at Tasmania, I think).

The first at Sheffield has just been advertised and the deadline is 2 December (to start work in February 2014):

The Social and Spatial Worlds of Old Bailey Convicts 1785-1875

The studentship will investigate the social and geographical origins and destinations of men and women convicted at the Old Bailey between 1785 and 1875, in order to shed light on patterns of mobility and understandings of identity in early industrial Britain. Using evidence of origins from convict registers and social/occupational and place labels in the Proceedings, the project will trace convicts from their places of origin through residence and work in London before their arrests, to places of imprisonment and subsequent life histories. Analysis of the language they used in trial testimonies will provide an indication of how identities were shaped by complex backgrounds.

Spread the word – and watch this space (and the project website) for more announcements soon!

PS: the project is on Twitter: follow at @digipanoptic

Collaboration and crowdsourcing for Old Bailey Online and London Lives

My digital crime history talk included some mention of ‘crowd sourcing’ and our stuttering efforts in this direction (on various projects) over the last five years or so. This post is intended as a marker to get down some further thoughts on the subject that I’ve been mulling over recently, to start to move towards more concrete proposals for action.

Two years ago, we added to OBO a simple form for registered users to report errors: one click on the trial page itself. People have been using that form not simply to report errors in our transcriptions but to add information and tell us about mistakes in the original. The desire on the part of our site users to contribute what they know is there.

We now have a (small but growing) database of these corrections and additions, which we rather failed to foresee and currently have no way of using. There is some good stuff! Examples:

t18431211-307    The surname of the defendents are incorrect. They should be Jacob Allwood and Joseph Allwood and not ALLGOOD

t18340220-125    The text gives an address Edden St-Regent St. I believe the correct street name is Heddon St, which crosses Regent St. There is no Edden St nearby. There is an Eden St and an Emden St nearby, but neither meet Regent St.

t18730113-138    The surname of the defendant in this case was Zacharoff, not Bacharoff as the original printed Proceedings show. He was the man later internationally notorious as Sir Basil Zaharoff, the great arms dealer and General Representative abroad of the Vickers armaments combine. See DNB for Zaharoff.

t18941022-797    Correct surname of prisoner is Crowder see Morning Post 25.10.1894. Charged with attempted murder  not murder see previous citation.

It also bothers me, I’d add, that there’s no way of providing any feedback (let alone encouragement or thanks). If I disagree with a proposed correction, I don’t have a way to let the person reporting the issue know that I’ve even looked at it, let alone explain my reasoning (someone suggested, for example, that the murder of a two-year-old child ought to be categorised as ‘infanticide’, but we use that term only for a specific form of newborn infant killing that was prosecuted under a particular statute during the period of the Proceedings).

On top of which, I think it’s going to become an increasing struggle to keep up even with straightforward transcription corrections because the method we’ve always used for doing this now has considerably more friction built in than the method for reporting problems!

So, the first set of problems includes:

  • finding ways to enable site users to post the information they have so that it can be added to the site in a useful way (not forgetting that this would create issues around security, spam, moderation, etc)
  • improving our own workflow for manual corrections to the data
  • solving a long-standing issue of what to do about names that were wrongly spelt by the reporters or have variant spellings and alternatives, which makes it hard for users to search for those people
  • maybe also some way of providing feedback

A possible solution, then, would be a browser-based collaborative interface (for both Old Bailey Online and London Lives), with the facility to view text against image and post contributions.

  • It should be multi-purpose, with differing permissions levels for project staff and registered users.
  • Corrections from users would have to be verified by admin staff, but this would still be much quicker and simpler than the current set-up.
  • But it would be able to do more than just corrections – there would be a way of adding comments/connections/annotations to trials or documents (and to individual people).

A rather different and more programmatic approach to (some of) the errors in the OBO texts – as opposed to our individualised (ie, random…) manual procedures – was raised recently by Adam Crymble.

For such a large corpus, the OBO is remarkably accurate. The 51 million words in the set of records between 1674 and 1834 were transcribed entirely manually by two independent typists. The transcriptions of each typist were then compared and any discrepancies were corrected by a third person. Since it is unlikely that two independent professional typists would make the same mistakes, this process known as “double rekeying” ensures the accuracy of the finished text.

But typists do make mistakes, as do we all. How often? By my best guess, about once every 4,000 words, or about 15,000-20,000 total transcription errors across 51 million words. How do I know that, and what can we do about it?

… I ran each unique string of characters in the corpus through a series of four English language dictionaries containing roughly 80,000 words, as well as a list of 60,000 surnames known to be present in the London area by the mid-nineteenth century. Any word in neither of these lists has been put into a third list (which I’ve called the “unidentified list”). This unidentified list contains 43,000 unique “words” and I believe is the best place to look for transcription errors.

Adam notes that this is complicated by the fact that many of the ‘errors’ are not really errors; some are archaisms or foreign words that don’t appear in the dictionaries, and some (again) are typos in the original.
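A rough reconstruction of that dictionary check (Adam worked in Python; this R sketch uses hypothetical object names):

```r
# Words found in neither the dictionaries nor the surname list go onto
# the "unidentified list" as candidate transcription errors.
known <- tolower(union(dictionary_words, london_surnames))
unidentified <- setdiff(unique(tolower(corpus_words)), known)
length(unidentified)  # ~43,000 unique "words" in Adam's run
```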

Certain types of error that he identified could potentially be addressed with an automated process, such as the notorious confusion of the long ‘S’ with ‘f’: “By writing a Python program that changed the letter F to an S and vise versa, I was able to check if making such a change created a word that was in fact an English word.”
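That check might look like this in R (again a sketch; Adam’s original was a Python program):

```r
# Swap f <-> s in a word and test whether the result is a dictionary word.
swap_fs <- function(word) chartr("fs", "sf", word)
fixable <- function(word, dictionary) swap_fs(word) %in% dictionary

fixable("fmall", c("small", "speak"))  # TRUE: "fmall" -> "small"
```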

But any entirely automated procedure would inevitably introduce some new errors, which we’re obviously reluctant to do (being pedantic historians and all that). So what to do?

Well, why not combine the power of the computer and the ‘crowd’? We could take Adam’s ‘unidentified list’ as a starting point, so that we’re narrowing down the scale of the task, and design around it a specific simplified and streamlined corrections process, within the framework of the main user interface. My initial thoughts are:

  • this interface would only show small snippets of trials (enough text around the problem word to give some context) and highlight the problem word itself alongside the page image (unfortunately, one thing we probably couldn’t do is to highlight the word in the page image itself)
  • it would provide simple buttons to confirm either a) the dictionary version or a letter switch, or b) that the transcription is correct as it stands; with c) a text input field as a fallback if the correction needed is more complex; hopefully most answers would be a or b!
  • if at least two (maybe three?) people provide the same checkbox answer for a problem word, it would be treated as a verified correction (though this could be overruled by a project admin), while text answers would have to go to admins for checking and verification in the same way as additions/corrections submitted in the main interface – see the sketch after this list
  • we should be able to group problems by different types to some extent (eg, so people who wanted to fix long S problems could focus on those)
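As a sketch of that verification rule (the threshold and behaviour here are my assumptions, not a spec):

```r
# A checkbox answer is auto-verified once `threshold` users agree;
# anything unresolved goes to the admin review queue.
verify <- function(answers, threshold = 2) {
  counts <- table(answers)
  if (length(counts) > 0 && max(counts) >= threshold) {
    names(counts)[which.max(counts)]  # verified correction
  } else {
    NA_character_                     # needs admin review
  }
}

verify(c("dictionary", "dictionary", "correct-as-is"))  # "dictionary"
```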

Suggestions from people who know more than me about both the computing and the crowdsourcing issues would be very welcome!