My final data visualisation post for this Women’s History Month is back in the 18th century and takes a look at an open dataset, Vagrant Lives: 14,789 Vagrants Processed by the County of Middlesex, 1777–1786, which was created by Adam Crymble, Louise Falcini and Tim Hitchcock, using data from London Lives.
This dataset makes accessible the uniquely comprehensive records of vagrant removal from, through, and back to Middlesex, encompassing the details of some 14,789 removals (either forcibly or voluntarily) of people as vagrants between 1777 and 1786. It includes people ejected from London as vagrants, and those sent back to London from counties beyond.
They’ve already written about this data in an excellent article (open access) and Crymble has blogged further about his ongoing research. (They have better visualisations too, so you could skip this post entirely and go to the real thing. Think of this as a taster.)
I want to focus on ways of visualising multiple categories of qualitative information – the more categories you want to compare at the same time, the more complex a dataviz has to be. In this case, I’ve got four categories to play with: gender, dates, countries of origin, and vagrant ‘types’. That’s to say, there are three types of individual in the dataset: leaders of family groups, their dependents, and single vagrants. The gender of the majority of dependents is unknown (most are children), so for most of this post, I decided to simplify things by filtering out all of the dependents to focus on the group leaders and singles. (As a result, because I’m ignoring about 500 wives who were counted as dependents, the following will differ somewhat from the work referenced above.) This resulted in 10963 individuals.
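As a rough sketch of that filtering step: the snippet below drops the dependents and then counts the remaining individuals by gender. It is written in Python, which is not necessarily what was used for the graphs in this post, and both the field names (`vagrant_type`, `gender`) and the five records are invented for illustration rather than taken from the dataset.

```python
from collections import Counter

# Toy records invented for illustration; the field names are assumptions,
# not the real column names in the Vagrant Lives dataset.
records = [
    {"vagrant_type": "group leader", "gender": "F"},
    {"vagrant_type": "dependent", "gender": None},
    {"vagrant_type": "dependent", "gender": "F"},
    {"vagrant_type": "single", "gender": "M"},
    {"vagrant_type": "single", "gender": "F"},
]


def drop_dependents(rows):
    """Keep only group leaders and singles."""
    return [r for r in rows if r["vagrant_type"] != "dependent"]


kept = drop_dependents(records)
gender_counts = Counter(r["gender"] for r in kept)
print(len(kept), dict(gender_counts))
```

Note that the dependent wife in the toy data disappears along with the children, which is the same trade-off described above.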
Overall, the gender ratio of the vagrants looks almost perfectly balanced (5438 female to 5525 male). But this hides some interesting variations.
Firstly, let’s break it down by the year of the case. (There are some missing records, and the very small numbers in 1777 and 1779 in particular are due to these gaps.) Two things stand out: the numbers of both female and male vagrants rise rapidly in the mid-1780s; and women are in the majority each year until 1782, after which they’re overtaken by men.
Now looking at vagrant type. As soon as you have multiple categories, you can split up the data in different ways – the “best” can depend on the data and exactly what it is you want to show. So graph 3a compares the percentages of male and female vagrants for each vagrant type, whereas graph 3b shows the percentages of group and single for each gender. 3b highlights that the majority were single individuals – something you wouldn’t know at all from 3a. It also makes it clear that vagrant type was gendered – considerably more men than women were singles. 3a, on the other hand, is better if you want to know exactly what the proportions of men and women were in each type. Most often, if I had to pick just one of these, it’s likely that I’d plump for 3b, because I’ve already seen that overall there are very similar numbers of men and women. But it might be a harder choice if that weren’t the case.
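The difference between 3a and 3b comes down to which total each count is divided by. Here is a minimal sketch of that, using made-up counts (not the real figures) and a hypothetical helper called `percentages`:

```python
# Toy counts invented for illustration: (gender, vagrant type) -> number of people.
counts = {
    ("F", "group"): 30, ("F", "single"): 70,
    ("M", "group"): 10, ("M", "single"): 90,
}


def percentages(table, within):
    """Percentage of each cell within its gender (within=0) or its type (within=1)."""
    totals = {}
    for key, n in table.items():
        totals[key[within]] = totals.get(key[within], 0) + n
    return {key: 100 * n / totals[key[within]] for key, n in table.items()}


by_type = percentages(counts, within=1)    # like graph 3a: gender split inside each type
by_gender = percentages(counts, within=0)  # like graph 3b: type split inside each gender

print(by_type[("F", "group")])     # share of group leaders who are women
print(by_gender[("F", "single")])  # share of women who are single
```

With these toy numbers, the first view says that three quarters of group leaders are women, while the second says that most women (and most men) are single. Both are true of the same table, but neither can be read off the other's chart.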
Now, looking at country of origin (British and Irish vagrants only, as there were only a few from other countries), further striking differences emerge. It’s hardly surprising that the majority of the vagrants came from England, but much more noteworthy that there was such a large disparity between Irish men and women.
Adam Crymble discusses what’s most likely going on, and it ties in with the particularly rapid increase in the numbers of male vagrants from 1783 shown in graph 1 – it’s probably the result of demobilisation after the American wars.
This says ‘demobilisation’ to me, and the male nature of most Irish vagrants suggests that this may have been a strategy for getting home after the war. Demobilisation was heavily centralized in London. Soldiers and sailors weren’t taken home; they were dropped off and left to find their own way.
Finally, I want to visualise the relationships between three categories in the data: gender, country and vagrant type. Mosaic plots are a more complex and less commonly used type of visualisation that can cram a lot more information into a single chart than you can with a bar chart. But, as with boxplots, that makes them a bit harder to interpret.
Imagine that you start with a single large rectangular block. For your first category, you divide it horizontally, and put the labels for each “level” (in this case there are two, F and M, for gender) on the left hand Y axis. As in the very first bar chart, we can see that the proportions of men and women are close to equal.
Then you sub-divide the two blocks vertically for your second category (country) and put the labels along the top X axis. So reading left to right along each gender block, the first vertical block = English, the second = Irish, third = Scottish and fourth = Welsh. Again, we can see that English vagrants are in the majority for both genders, and at the same time, how a much higher proportion of the men are Irish.
Finally you sub-divide the blocks once again, horizontally, for the third category (vagrant type), and the labels for these (group and single) go on the right hand Y axis. The biggest single category, then, is women from England who are single (Hitchcock et al argue the importance of short-distance female migration to London to find domestic service for making up much of this). The smallest category is men from Wales who lead a group.
Male Irish and Welsh vagrants are more likely to be single than are men from England and Scotland, whereas a higher proportion of Irish and (even more so) Scottish women were heading groups. (Crymble has also emphasised how different the Irish and Scottish vagrants were.)
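The three nested splits described above can be sketched as a small calculation that turns a table of counts into rectangles inside a 1 x 1 square. This is only an illustration of the idea, not the code behind the plot in this post: the counts are invented, only two countries are included to keep it short, and `mosaic_tiles` is a hypothetical helper.

```python
from collections import defaultdict

# Toy counts invented for illustration: (gender, country, vagrant type) -> n
counts = {
    ("F", "England", "group"): 20, ("F", "England", "single"): 20,
    ("F", "Ireland", "group"): 5,  ("F", "Ireland", "single"): 5,
    ("M", "England", "group"): 5,  ("M", "England", "single"): 20,
    ("M", "Ireland", "group"): 5,  ("M", "Ireland", "single"): 20,
}


def mosaic_tiles(table):
    """Return {(gender, country, type): (x, y, width, height)} in a unit square."""
    total = sum(table.values())
    by_gender = defaultdict(int)
    by_gender_country = defaultdict(int)
    for (g, c, t), n in table.items():
        by_gender[g] += n
        by_gender_country[(g, c)] += n

    tiles = {}
    y = 0.0
    for g in sorted(by_gender):
        row_h = by_gender[g] / total  # first split: one horizontal band per gender
        x = 0.0
        for c in sorted(k[1] for k in by_gender_country if k[0] == g):
            col_w = by_gender_country[(g, c)] / by_gender[g]  # second split: width per country
            sub_y = y
            for t in sorted(k[2] for k in table if k[:2] == (g, c)):
                # third split: divide the country block's height by vagrant type
                h = row_h * table[(g, c, t)] / by_gender_country[(g, c)]
                tiles[(g, c, t)] = (x, sub_y, col_w, h)
                sub_y += h
            x += col_w
        y += row_h
    return tiles


tiles = mosaic_tiles(counts)
for key, (x, y, w, h) in tiles.items():
    print(key, round(w * h, 3))  # area of each tile
```

The useful property is that each tile's area (width times height) equals its share of the total count, which is why the biggest and smallest categories can be picked out by eye.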
The use of colour and shading adds one final dimension, but it’s harder to interpret on first sight. The idea is to show statistical significance. What it boils down to is that blue means the square is bigger than would be expected by the statistical model; red means it’s smaller than the model would expect (and the darker the colour, the bigger the significance). The fact that the group-Irish-male box is coloured dark red (ie, smaller than “expected”) pretty much seems to reinforce what we’ve already observed. The group-Scottish-female box also stands out among the smaller blocks – suggesting that this is significant and might be further investigated.
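To make "bigger or smaller than the model would expect" a bit more concrete, here is a hedged sketch of one common way such shading is calculated. It assumes the model is that gender, country and vagrant type are all independent of one another, and it assumes the colours are driven by Pearson residuals with a cut-off of 2. Both are conventions in mosaic plotting tools rather than anything confirmed for the plot above, and the counts are again invented.

```python
from math import sqrt

# Toy counts invented for illustration: (gender, country, vagrant type) -> n
counts = {
    ("F", "England", "group"): 20, ("F", "England", "single"): 20,
    ("F", "Ireland", "group"): 5,  ("F", "Ireland", "single"): 5,
    ("M", "England", "group"): 5,  ("M", "England", "single"): 20,
    ("M", "Ireland", "group"): 5,  ("M", "Ireland", "single"): 20,
}


def pearson_residuals(table):
    """(observed - expected) / sqrt(expected), assuming all three categories are independent."""
    total = sum(table.values())
    margins = [{}, {}, {}]  # totals per gender, per country, per type
    for key, n in table.items():
        for axis, level in enumerate(key):
            margins[axis][level] = margins[axis].get(level, 0) + n
    residuals = {}
    for key, n in table.items():
        expected = total
        for axis, level in enumerate(key):
            expected *= margins[axis][level] / total
        residuals[key] = (n - expected) / sqrt(expected)
    return residuals


def shade(residual, cutoff=2.0):
    """Assumed colour rule: blue above +cutoff, red below -cutoff, otherwise unshaded."""
    if residual > cutoff:
        return "blue"
    if residual < -cutoff:
        return "red"
    return "none"


residuals = pearson_residuals(counts)
for key, r in residuals.items():
    print(key, round(r, 2), shade(r))
```

A different choice of model would give different "expected" counts, and so different colours, for exactly the same data.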
However, it’s important to understand whether what the statistical model “expects” is appropriate for the data we have. In medical research, where data collection is conducted according to carefully defined rules, it may be possible to be confident that a statistical significance means a “real” difference. For a historian it might simply be pointing to imperfections in the data! So it’s essential for historians doing data analysis and visualisation to get to grips with both the original sources and the statistics. I’m still grappling with the second part…
More about Mosaic plots and their interpretation:
- Visualising contingency tables
- Understanding area-based plots: mosaic plots
- how to determine significant associations in a mosaic plot (and search stats.stackexchange.com for more questions and answers on this subject)
- Are Mosaic Plots Worthwhile? (a sceptical view)