The Old Bailey Voices data is the result of work I’ve done for the Voices of Authority research theme for the Digital Panopticon project. This will be the first of a few blog posts in which I start to dig deeper into the data. First I’ll review the general trends in trials, verdicts and speech, and then I’ll look a bit more closely at defendants’ gender. …
This post takes a look at an open dataset available through the University of Pennsylvania’s open access repository. The dataset, Indentures and Apprentices made by Philadelphia Overseers of the Poor, 1751-1799 (created by Billy G. Smith), is one of an interesting collection of datasets on 18th- and 19th-century history which I may return to in the future. …
If you know me, the topic of this first post may come as unsurprising but also a bit eyebrow-raising. “Sharon, you’ve been working on the Old Bailey Online project (OBO) since forever. Aren’t you bored with it yet?” …
This dataset makes accessible the uniquely comprehensive records of vagrant removal from, through, and back to Middlesex, encompassing the details of some 14,789 removals (either forcibly or voluntarily) of people as vagrants between 1777 and 1786. It includes people ejected from London as vagrants, and those sent back to London from counties beyond
They’ve already written about this data in an excellent article (open access) and Crymble has blogged further about his ongoing research. (They have better visualisations too, so you could skip this post entirely and go to the real thing. Think of this as a taster.)
I want to focus on ways of visualising multiple categories of qualitative information – the more categories you want to compare at the same time, the more complex a dataviz has to be. In this case, I’ve got four categories to play with: gender, dates, countries of origin, and vagrant ‘types’. That’s to say, there are three types of individual in the dataset: leaders of family groups, their dependents, and single vagrants. The gender of the majority of dependents is unknown (most are children), so for most of this post, I decided to simplify things by filtering out all of the dependents to focus on the group leaders and singles. (As a result, because I’m ignoring about 500 wives who were counted as dependents, the following will differ somewhat from the work referenced above.) This resulted in 10963 individuals.
Overall, the gender ratio of the vagrants looks almost perfectly balanced (5438 female to 5525 male). But this hides some interesting variations.
Firstly let’s break it down by the year of the case. (There are some missing records, and the very small numbers in 1777 and 1779 in particular are due to these gaps.) Two things stand out: the numbers of both female and male vagrants rise rapidly in the mid-1780s; and women are in the majority each year until 1782, after which they’re overtaken by men.
Now looking at vagrant type. As soon as you have multiple categories, you can split up the data in different ways – the “best” can depend on the data and exactly what it is you want to show. So graph 3a compares the percentages of male and female vagrants for each vagrant type, whereas graph 3b shows the percentages of group and single for each gender. 3b highlights that the majority were single individuals – something you wouldn’t know at all from 3a. It also makes it clear that vagrant type was gendered – considerably more men than women were singles. 3a, on the other hand, is better if you want to know exactly what the proportions of men and women were in each type. Most often, if I had to pick just one of these, it’s likely that I’d plump for 3b, because I’ve already seen that overall there are very similar numbers of men and women. But it might be a harder choice if that weren’t the case.
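The difference between the two graphs comes down to which total each count is divided by. Here is a minimal Python sketch of that choice. The counts are invented purely for illustration (they are not the real figures), and the function name is hypothetical.

```python
# Hypothetical counts, invented for illustration only (not the real data).
# Keys are (vagrant type, gender).
counts = {
    ("group", "F"): 300,
    ("group", "M"): 100,
    ("single", "F"): 200,
    ("single", "M"): 400,
}

def percentages_within(counts, position):
    """Express each count as a percentage of the total for one part of its key.

    position=0 divides by the vagrant-type total (the view in graph 3a);
    position=1 divides by the gender total (the view in graph 3b).
    """
    totals = {}
    for key, n in counts.items():
        totals[key[position]] = totals.get(key[position], 0) + n
    return {key: 100 * n / totals[key[position]] for key, n in counts.items()}

by_type = percentages_within(counts, 0)    # gender split within each type
by_gender = percentages_within(counts, 1)  # type split within each gender

print(by_type[("group", "F")])     # 75.0: share of group leaders who are women
print(by_gender[("single", "M")])  # 80.0: share of men who are single
```

The same four numbers give two quite different pictures, which is why the choice of denominator deserves a moment's thought before plotting.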
Now, looking at country of origin (British and Irish vagrants only, as there were only a few from other countries), further striking differences emerge. It’s hardly surprising that the majority of the vagrants came from England, but much more noteworthy that there was such a large disparity between Irish men and women.
Adam Crymble discusses what’s most likely going on, and it ties in with the particularly rapid increase in the numbers of male vagrants from 1783 shown in graph 1 – it’s probably the result of demobilisation after the American wars.
This says ‘demobilisation’ to me, and the male nature of most Irish vagrants suggests that this may have been a strategy for getting home after the war. Demobilisation was heavily centralized in London. Soldiers and sailors weren’t taken home; they were dropped off and left to find their own way.
Finally, I want to visualise the relationships between three categories in the data: gender, country and vagrant type. Mosaic plots are a more complex and less commonly used type of visualisation that can cram a lot more information into a single chart than you can with a bar chart. But, as with boxplots, that makes them a bit harder to interpret.
Imagine that you start with a single large rectangular block. For your first category, you divide it horizontally, and put the labels for each “level” (in this case there are two, F and M, for gender) on the left hand Y axis. As in the very first bar chart, we can see that the proportions of men and women are close to equal.
Then you sub-divide the two blocks vertically for your second category (country) and put the labels along the top X axis. So reading left to right along each gender block, the first vertical block = English, the second = Irish, third = Scottish and fourth = Welsh. Again, we can see that English vagrants are in the majority for both genders, and at the same time, how a much higher proportion of the men are Irish.
Finally you sub-divide the blocks once again, horizontally, for the third category (vagrant type), and the labels for these (group and single) go on the right hand Y axis. The biggest single category, then, is women from England who are single (Hitchcock et al argue that short-distance female migration to London in search of domestic service accounts for much of this). The smallest category is men from Wales who lead a group.
Male Irish and Welsh vagrants are more likely to be single than are men from England and Scotland, whereas a higher proportion of Irish and (even more so) Scottish women were heading groups. (Crymble has also emphasised how different the Irish and Scottish vagrants were.)
The use of colour and shading adds one final dimension, but it’s harder to interpret on first sight. The idea is to show statistical significance. What it boils down to is that blue means the square is bigger than would be expected by the statistical model; red means it’s smaller than the model would expect (and the darker the colour, the bigger the significance). The fact that the group-Irish-male box is coloured dark red (ie, smaller than “expected”) pretty much seems to reinforce what we’ve already observed. The group-Scottish-female box also stands out among the smaller blocks – suggesting that this is significant and might be further investigated.
However, it’s important to understand whether what the statistical model “expects” is appropriate for the data we have. In medical research, where data collection is conducted according to carefully defined rules, it may be possible to be confident that a statistical significance means a “real” difference. For a historian it might simply be pointing to imperfections in the data! So it’s essential for historians doing data analysis and visualisation to get to grips with both the original sources and the statistics. I’m still grappling with the second part…
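To make the idea of “bigger or smaller than expected” concrete, here is a minimal Python sketch. It rests on two assumptions that the post does not confirm: that the model is simple independence between the categories, and that the shading follows Pearson residuals (a common convention for shaded mosaic plots). It uses a two-way table rather than three categories to keep things short, and the counts are invented for illustration.

```python
from math import sqrt

# Hypothetical counts (rows: gender, columns: country), invented for illustration.
table = {
    "F": {"England": 60, "Ireland": 40},
    "M": {"England": 40, "Ireland": 60},
}

def pearson_residuals(table):
    """Return (observed - expected) / sqrt(expected) for every cell.

    'Expected' is the count each cell would have if rows and columns were
    independent: row total * column total / grand total.
    """
    row_totals = {row: sum(cols.values()) for row, cols in table.items()}
    col_totals = {}
    for cols in table.values():
        for col, n in cols.items():
            col_totals[col] = col_totals.get(col, 0) + n
    grand_total = sum(row_totals.values())

    residuals = {}
    for row, cols in table.items():
        for col, observed in cols.items():
            expected = row_totals[row] * col_totals[col] / grand_total
            residuals[(row, col)] = (observed - expected) / sqrt(expected)
    return residuals

residuals = pearson_residuals(table)
for cell, value in residuals.items():
    # Positive values would be shaded blue, negative values red.
    print(cell, round(value, 2))
```

A positive residual means the cell holds more cases than independence would predict, and a negative one means fewer. Whether independence is a sensible baseline for a given historical source is exactly the question raised above.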
The petition of Geelien Cowley ‘a poore widdow and mother of three smale fatherlesse children’:
that your petitioners late husband by name E[dward] Birien of Ruthin a souldier that served in his majestys service in Ireland neare upon three yeares & afterward he retorned to England he served in his majestys service there sixe or seaven yeares where in all these tymes he suffered many ympriso[nments] wounds & brueses wch made him unable to earn his liveliehoode & more especiallie this two yeares last past then he was allowed one of the majestys pensioners to receave a share of his majestys allo[wance] for maymed souldiers provided. Nowe may it please [your] worships to be advertised that the said Edward Birien your petitioners late husband, had a longe sicknesse, beeinge vearie poore & nowe called to gods mercie caused your petitioner to goe upon the credit with her neighbours to suplie her said husbands wants in confidence to receave his share & alloweance of pension as afore is set forth, but it was gods will to take hime to his mercie afore this generall sessions.
Most humbly prayeinge your worships to allowe your petitioner the pencion allotted her late husband for to paye to her creditors what she is engaged for & your worships further help & succours in such sort as your worships thinke meete without your worships comisseracion hearein your petitioner shall not be able to goe amonge good & charitable people for releefe to her & her smale children for feare of arrest or lawsuite. this I humblie bege for gods sacke…
The treasurer of the maimed soldiers’ fund was ordered to pay her the whole quarterly allowance due to her husband.
[NLW Chirk Castle Quarter Sessions files October 1665 B21/d7]
I’ve recently been working on the Digital Panopticon, a digital history project that has brought together (and created) massive amounts of data about British prisoners and convicts in the long 19th century, including several datasets which include heights for women. Adult height is strongly influenced by environmental factors in childhood, one of the most important being nutrition. So,
The height of past populations can thus tell historians much about the conditions that individuals encountered in their formative years. Given sufficient data it is possible to glimpse inside households in order to piece together a history of the impact that declining wages, rising prices, improvements in sanitation and diminishing family size had on mean adult stature.
However, many studies of height and nutrition in 18th- and 19th-century Britain focused on military records and therefore had little to say about women. The turn to using the rich records of heights for men and women (and children) in 19th-century penal records has been more recent.
Today’s post is going to look at height patterns in four Digital Panopticon datasets, mainly using a kind of visualisation that many historians aren’t familiar with: box plots. If you’ve seen them and not really understood them, it’s OK – I didn’t have a clue until quite recently either! And so, I’ll start by attempting to explain what I learned before I move on to the actual data.
A box plot, or box and whisker plot, is a really concentrated way of visualising what statisticians call the “five figure summary” of a dataset: 1. the median (the middle value when the data are sorted); 2. upper quartile (the value below which three-quarters of the data fall); 3. lower quartile (the value below which one quarter of the data fall); 4. minimum value; and 5. maximum value.
Here’s a diagram:
The thick green middle bar marks the median value. The two blue lines parallel to that (aka “hinges”) show the upper and lower quartiles. The pink horizontal lines extending from the box are the whiskers. In this version of a box plot, the whiskers don’t necessarily extend right to the minimum and maximum values. Instead, they’re calculated to exclude outliers which are then plotted as individual dots beyond the end of the whiskers.
So what’s the point of all this? Imagine two datasets: one contains the values 4,4,4,4,4,4,4,4 and the other 1,3,3,4,4,4,6,7. The two datasets have the same averages, but the distribution of the values is very different. A boxplot is useful for looking more closely at such variations within a dataset, or for comparing different datasets, which might look pretty much the same if you only considered averages.
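The two example datasets can be checked with a few lines of Python. This is only a sketch: quartiles can be calculated in several slightly different ways, so other tools may give marginally different hinges, and the 1.5 × interquartile-range rule used here to flag outliers is an assumption (it is a common convention, but the post does not say which rule its charts use).

```python
from statistics import mean, median, quantiles

def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile and maximum."""
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return {"min": data[0], "q1": q1, "median": median(data),
            "q3": q3, "max": data[-1]}

def outliers(values, k=1.5):
    """Values lying more than k * IQR beyond the quartiles (assumed rule)."""
    summary = five_number_summary(values)
    iqr = summary["q3"] - summary["q1"]
    low = summary["q1"] - k * iqr
    high = summary["q3"] + k * iqr
    return [v for v in values if v < low or v > high]

flat = [4, 4, 4, 4, 4, 4, 4, 4]
spread = [1, 3, 3, 4, 4, 4, 6, 7]

print(mean(flat), mean(spread))     # both means are 4
print(five_number_summary(flat))    # every figure is 4
print(five_number_summary(spread))  # min 1, q1 3.0, median 4.0, q3 4.5, max 7
print(outliers(spread))             # [7]
```

The means and medians are identical, but the summaries are not: the second dataset has a box running from 3 to 4.5, and under the assumed rule its largest value would be drawn as a separate dot.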
These are the four datasets:
HCR, Home Office Criminal Registers 1790-1801, prisoners held in Newgate awaiting trial (1226 heights total, 1061 aged over 19)
CIN, Convict Indents 1820-1853, convicts transported to Australia (17183 heights, 14181 over 19)
PLF, Female prison licences 1853-1884, female convicts sentenced to penal servitude (571 heights, 535 over 19)
RHC, Registers of Habitual Criminals 1881-1925, recidivists who were under police supervision following release from prison (12599 heights, 12118 over 19)
For each dataset, I only included women who had a year of birth, or whose year of birth could be calculated using an age and date, as well as a height. (I say “heights” above because I can’t guarantee that they are all unique individuals; but nearly all of them should be.) In all the following charts I’m including only adult women aged over 19.
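As a rough illustration of that selection step, here is a minimal Python sketch. The records and the field layout are hypothetical, the birth year is approximated as record year minus age, and “aged over 19” is read as an age of 20 or more at the date of the record; none of these details is confirmed by the datasets themselves.

```python
from statistics import median

# Hypothetical records, invented for illustration:
# (year of record, age at record, height in inches)
records = [
    (1885, 34, 61.5),
    (1885, 17, 59.0),    # under 20, so excluded
    (1890, 42, 60.0),
    (1892, None, 62.0),  # no age, so no birth year can be calculated
    (1895, 41, 63.0),
]

def adult_heights_by_birth_decade(records, min_age=20):
    """Group adult heights by approximate decade of birth."""
    by_decade = {}
    for record_year, age, height in records:
        if age is None or height is None or age < min_age:
            continue
        birth_year = record_year - age  # approximate: could be a year out
        decade = birth_year // 10 * 10  # 1851 -> 1850
        by_decade.setdefault(decade, []).append(height)
    return by_decade

by_decade = adult_heights_by_birth_decade(records)
medians = {decade: median(heights) for decade, heights in sorted(by_decade.items())}
print(medians)  # {1840: 60.0, 1850: 62.25}
```

Each list of heights per decade is what a single box in a box plot would summarise.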
Here’s what happens when you plot the heights for each birth decade in RHC.
(This is generated using the R package ggplot2, and it looks a little bit different from many examples you’ll see online because ggplot has a nice feature to vary the width of the boxes according to the size of the data group.)
The first thing I look for is incongruities that might suggest problems with the data, and on the whole it looks good – the boxes are mostly quite symmetrical and none of the outliers is outside the realms of possibility (the tallest woman is 74.5 inches, or 6 foot 2 1/2, and the shortest is 48 inches), though I’m slightly doubtful that there were women born in the 1800s in this dataset, which gets going in the 1880s; still, they’re a very small number so unlikely to skew things much overall. Since the data seems to be OK on first sight, the interesting thing to note here is that from the 1850s onwards, the women are getting taller, and those born in the 1890s are quite a lot taller than the 1880s cohort. This is fairly consistent with Deb Oxley’s (more fine-grained) observations of the same data.
Again, we have a reasonable spread of heights and, fortunately, only a very small number of slightly questionable early births. (It happens to be the case that this data was manually transcribed, whereas RHC was created using Optical Character Recognition – but on the other hand, the source for RHC was printed and much more legible than the handwritten indents.) Ignoring for now the very small groups before the 1770s, the tallest decade cohort of women in this data is those born in the 1790s and thereafter they get consistently shorter.
Let’s put all four datasets together!
I’ve filtered out women born before 1750 and after 1899, because the numbers were very small, and some extreme outliers (more about those later…). Then I added a guideline at the median for the 1820s (the mid-point), as I think it helps in seeing the trends.
It might seem surprising at first that the late 18th-century women of HCR are taller than any subsequent cohorts until the 1890s. Yet the trends here are broadly consistent with the pioneering research by Roderick Floud et al on British men and boys between 1740 and 1914. They argued “that the average heights of successive birth cohorts of British males increased between 1740 and 1840, fell back between 1840 and 1850, and increased once again from the 1850s onwards” (Harris, ‘Health, Height and History’). The British population was less well-fed for much of the 19th century (as food resources struggled to keep up with rapid population growth), and people got shorter as a result. Our women’s growth after 1850 may be slower than for the men (until the 1890s) though; perhaps it took longer for women than men to start growing again.
Finally, though, I have to put in a big caveat about the HCR data. I mentioned that I excluded some extreme outliers from the chart above. HCR was by far the worst offender, and if you look closely at the 18th-century cohorts covered by HCR, the boxes aren’t quite as symmetrical as the 19th-century ones. If we visualise it using a histogram (another handy one for examining the distribution of values in a dataset), we can see more clearly that there’s something up. A ‘normal’ height distribution in a population should look like a “bell curve” – quite tightly and symmetrically clustered around the average. CIN and RHC are close:
But this is what HCR looks like. This is not good.
If we’re lucky, much of the problem could turn out to be errors in the data which can be fixed. After all, it’s at least roughly the right kind of shape! The big spike at 60 inches (5 feet) rings plenty of alarm bells though. It looks reminiscent of a problem we have with much of the age data in the Digital Panopticon, known as “heaping”, a tendency to round ages to the nearest 0 or 5 (people often didn’t know their exact dates of birth). The age heaping is very mild in comparison to this spike, so I think it could well be another issue with either the transcription or the method used to extract heights. But if it turns out that’s not the case, this could be pretty problematic. We’re assuming the prisoners were properly measured, but we don’t know anything about the equipment used. For all we know, it might often have been largely guesswork. In the end, we might find that HCR simply isn’t reliable enough to use for demographic analysis. There’s very little height data for women born in the 18th century, so this is a potentially really important source. But what if it’s not up to the job?
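One simple way to put a number on heaping is to compare the share of ages ending in 0 or 5 with the one in five you would expect if ages were spread evenly. The Python below is a minimal sketch of that idea, not a method taken from the project; the function name and the sample ages are invented for illustration.

```python
def heaping_ratio(ages):
    """Share of ages divisible by 5, relative to the expected share of 1 in 5.

    1.0 suggests no heaping; 5.0 would mean every age ends in 0 or 5.
    """
    heaped = sum(1 for age in ages if age % 5 == 0)
    return 5 * heaped / len(ages)

# Hypothetical samples, invented for illustration.
evenly_spread = list(range(20, 40))                    # one of each age, 20 to 39
rounded = [20, 25, 30, 30, 30, 35, 40, 40, 31, 27]     # mostly ending in 0 or 5

print(heaping_ratio(evenly_spread))  # 1.0
print(heaping_ratio(rounded))        # 4.0
```

The same kind of check could be adapted to heights, for example by asking what share of measurements fall on whole feet or whole inches.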
Today I want to go on an excursion in “catalogues as data”. The UK National Archives’ Discovery catalogue is an excellent resource for this activity, because a) it has a lot of records that have document descriptions at ‘item’ or ‘piece’ level in the catalogue, containing quite structured information (like dates, places, occupations) that can be quantified and visualised; and b) even more importantly, it has an export function that allows you to download up to 10,000 records in CSV format. (It also has a full API for those with some programming skills, but 10,000 records will get you a long way, and you can often break up larger collections into chunks, eg with date filters).
You’ll need to use the Discovery advanced search quite carefully to get the right set of search results (it enables specification of particular records, dates, catalogue level, etc) – there are some useful tips here. Then you’re quite likely to need to use a tool like OpenRefine to separate out pieces of information into separate data fields and clean/normalise dates etc (check out this tutorial).
the service records of more than 7,000 women who joined the Women’s Army Auxiliary Corps (WAAC) between 1917 and 1920… The WAAC became the QMAAC in April 1918 and was disbanded in September 1921
At 7000 records, this sounded like a good size set to play around with, well within the download limits. And a look at a catalogue entry showed that it has some nice information beyond women’s names (unlike a similar and larger series, WO399, which has only transcribed names). Given just a few hours’ work extracting and cleaning the data, what could I learn?
Aaron, Sarah Ann nee Phillips
Place of Birth:
High Street Cefn Mawr, North Wales
Date of Birth:
22 August 1894
First, what does this actually offer in terms of usable data? The date of birth is an obvious one: closer inspection shows that it’s in a consistent format where there’s a full date (the majority); at least a year is provided in almost every case, and that can be extracted into a standard year of birth field quite easily. Place of birth also has potential, but it’s more varied and needs more cleaning, so I haven’t done anything with that yet; but it could make for an interesting mapping exercise. Less obviously perhaps, “nee Phillips” suggests that – if you can safely assume women always gave this information! – it’s possible to also infer something about whether a woman was (or had been) married. Another nice little thing you could also potentially do, given birth dates and first names, is to look for patterns in baby naming (although this might really need a larger dataset).
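Both extractions can be sketched with regular expressions in Python. This is an illustration only: the post describes this kind of cleaning as a job for a tool like OpenRefine, the patterns are assumptions based on the single example entry above, and the function names are hypothetical.

```python
import re

# A four-digit year beginning 18 or 19, standing alone as a word.
YEAR = re.compile(r"\b(1[89]\d{2})\b")
# "nee" (or "née") as a separate word, followed by the maiden name.
NEE = re.compile(r"\bn[eé]e\s+(\S.*)$", re.IGNORECASE)

def birth_year(date_text):
    """Pull a year out of a full date or a bare year; None if there isn't one."""
    match = YEAR.search(date_text or "")
    return int(match.group(1)) if match else None

def maiden_name(name_text):
    """Return the name after 'nee', or None if no maiden name is recorded."""
    match = NEE.search(name_text or "")
    return match.group(1).strip() if match else None

print(birth_year("22 August 1894"))                  # 1894
print(birth_year("1894"))                            # 1894
print(maiden_name("Aaron, Sarah Ann nee Phillips"))  # Phillips
print(maiden_name("Aaron, Sarah Ann"))               # None
```

A missing maiden name only means none was recorded, so treating it as “unmarried” relies on the assumption flagged above that women always gave this information.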
Two caveats, one major and one more minor:
The online guide makes it clear that these 7000 records are only a small minority of the original collection (57000 records), as many were destroyed in a WW2 air raid. So it might not be representative of the women recruited.
Errors in the data – which you always have to look out for, even in the best quality material. In this case, there were a few obvious transcription errors in the birth dates. We can be 100% certain that birth years of 1822, 1917-18 and 1988 are just wrong. But actually more problematic are outliers that look unlikely but not quite impossible: 1844? 1903? Fortunately, they account for a tiny number of records. There were also 278 recorded as numbers like 18880 or 18930: I concluded that these were actually meant to be year dates to which somehow an extra zero had been added and corrected them accordingly.
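The extra-zero correction is simple enough to express as a rule. Here is a minimal Python sketch of one way to do it; the function name is hypothetical, and the rule (a five-digit number ending in zero is a year with a stray zero added) is an interpretation of the description above rather than the exact procedure used.

```python
def fix_extra_zero(year):
    """Turn values like 18880 into 1888; leave everything else untouched."""
    if 10000 <= year <= 99999 and year % 10 == 0:
        return year // 10
    return year

print(fix_extra_zero(18880))  # 1888
print(fix_extra_zero(18930))  # 1893
print(fix_extra_zero(1894))   # 1894 (already a plausible year)
print(fix_extra_zero(1988))   # 1988 (still wrong, but not fixable by this rule)
```

Values such as 1822 or 1988 pass through unchanged on purpose: they need a human decision, not an automatic correction.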
Visualisation is often particularly useful for highlighting errors and problems in your data. But it’s the researcher who has to decide what to do about such anomalies (and whether they might even be serious enough to make the whole dataset too unreliable to be worth using).
I initially hoped that the record dates would represent specific dates when women joined up, but as it turned out there was only a covering date for the series as a whole. Since it only covers 4 years, that’s not really an issue; instead I simply worked out their ages in 1918 (assuming that there wouldn’t have been new recruits after the war ended anyway), and filtered out the half-dozen supposedly born before 1860 or after 1903.
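The age calculation and filter described here amount to a subtraction and a range check. The Python below is a minimal sketch using the 1860 and 1903 cut-offs from the text; the birth years in the example are invented for illustration, and the function names are hypothetical.

```python
def ages_in(birth_years, reference_year=1918, earliest=1860, latest=1903):
    """Ages in the reference year, dropping implausible birth years."""
    return [reference_year - year for year in birth_years
            if earliest <= year <= latest]

def share_at_most(ages, limit):
    """Percentage of ages that are less than or equal to the limit."""
    return 100 * sum(1 for age in ages if age <= limit) / len(ages)

# Hypothetical birth years, invented for illustration.
years = [1822, 1894, 1896, 1899, 1885, 1988]

ages = ages_in(years)
print(ages)                     # [24, 22, 19, 33]
print(share_at_most(ages, 25))  # 75.0
```

Because only a year of birth is used, each age could be out by one depending on when in 1918 the birthday fell, which matters little for broad age bands.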
And so the thing I learned today is that, gosh, they were so young.
As visualisations, tables may be less eye-catching than graphs, but they have the virtue of presenting a lot of precise information in a relatively small space; the table at the bottom of this post shows that more than 60% of the women were aged 25 or under in 1918 and about 90% were under 30. Very few of them were old enough to take advantage of the limited extension of voting rights to women at the end of the war.
This is confirmed by a bit of background reading – according to Lucy Noakes on Women’s Mobilization for War (Great Britain and Ireland), “the majority of recruits to the WAAC were young working class women”. If we can reasonably assume that the information given about maiden names is a complete record, or anywhere near it, the vast majority of the women were also unmarried – nearly 95% of them overall. I suspect that very few married women would have volunteered for this type of service (which was likely to take them overseas and close to combat), and as a result it might be expected that the majority would be young – very likely younger, on average, than male soldiers. You can also see that a considerably higher proportion of the women aged over 25 were/had been married – but it still looks a very low proportion compared to what you might expect in the general population (and I wonder if quite a lot of these were widows).
I’m not exactly surprised to learn from Noakes that their youth (and, no doubt, class) resulted in some negative perceptions:
In the public mind however, they were sometimes perceived as thrill seekers, drawn by a desire for adventure and romance, and recruitment to the service suffered from fears that women were finding opportunities for sexual liaisons with the soldiers. So worried was the government by these rumours that a Commission of Enquiry was formed, which included figures showing the number of pregnancies amongst unmarried members of the WAAC was lower than among unmarried civilians…