Today I want to go on an excursion in “catalogues as data”. The UK National Archives’ Discovery catalogue is an excellent resource for this activity, because a) it has a lot of records that have document descriptions at ‘item’ or ‘piece’ level in the catalogue, containing quite structured information (like dates, places, occupations) that can be quantified and visualised; and b) even more importantly, it has an export function that allows you to download up to 10,000 records in CSV format. (It also has a full API for those with some programming skills, but 10,000 records will get you a long way, and you can often break up larger collections into chunks, e.g. with date filters.)
You’ll need to use the Discovery advanced search quite carefully to get the right set of search results (it lets you specify particular records, dates, catalogue level, etc) – there are some useful tips here. Then you’re quite likely to need a tool like OpenRefine to split pieces of information into separate data fields and clean/normalise dates etc (check out this tutorial).
the service records of more than 7,000 women who joined the Women’s Army Auxiliary Corps (WAAC) between 1917 and 1920… The WAAC became the QMAAC in April 1918 and was disbanded in September 1921
At 7,000 records, this sounded like a good size set to play around with, well within the download limits. And a look at a catalogue entry showed that it has some nice information beyond women’s names (unlike a similar and larger series, WO399, which has only transcribed names). Given just a few hours’ work extracting and cleaning the data, what could I learn?
|Record for||Aaron, Sarah Ann nee Phillips|
|Place of Birth:||High Street Cefn Mawr, North Wales|
|Date of Birth:||22 August 1894|
First, what does this actually offer in terms of usable data? The date of birth is an obvious one: closer inspection shows that where there’s a full date (the majority) it’s in a consistent format, and at least a year is provided in almost every case, so a standard year of birth field can be extracted quite easily. Place of birth also has potential, but it’s more varied and needs more cleaning, so I haven’t done anything with that yet; it could make for an interesting mapping exercise. Less obviously perhaps, “nee Phillips” suggests that – if you can safely assume women always gave this information! – it’s possible to also infer something about whether a woman was (or had been) married. Another nice little thing you could potentially do, given birth dates and first names, is to look for patterns in baby naming (although this might really need a larger dataset).
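A minimal Python sketch of this kind of extraction, for anyone who would rather script it than use OpenRefine. It assumes the name and date of birth have already been split into separate text fields shaped like the sample entry above; the function names are made up for illustration, and real exports may well need more forgiving patterns.

```python
import re

# A four-digit year starting 18 or 19, standing alone as a "word".
YEAR_RE = re.compile(r"\b(1[89]\d{2})\b")
# "nee" (or "née") as a separate word, followed by the maiden name.
NEE_RE = re.compile(r"\bn[eé]e\s+(.+)$", re.IGNORECASE)


def birth_year(date_text):
    """Return the year found in a date-of-birth string, or None if there is none."""
    match = YEAR_RE.search(date_text or "")
    return int(match.group(1)) if match else None


def maiden_name(name_text):
    """Return the name given after 'nee', or None if no maiden name is recorded."""
    match = NEE_RE.search(name_text or "")
    return match.group(1).strip() if match else None


print(birth_year("22 August 1894"))
print(maiden_name("Aaron, Sarah Ann nee Phillips"))
```

Note that a missing maiden name only means none was recorded, which is exactly the assumption flagged above.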
Two caveats, one major and one more minor:
- The online guide makes it clear that these 7,000 records are only a small minority of the original collection (57,000 records), as many were destroyed in a WW2 air raid. So the surviving records might not be representative of the women recruited.
- Errors in the data – which you always have to look out for, even in the best quality material. In this case, there were a few obvious transcription errors in the birth dates. We can be 100% certain that birth years of 1822, 1917-18 and 1988 are just wrong. But actually more problematic are outliers that look unlikely but not quite impossible: 1844? 1903? Fortunately, they account for a tiny number of records. There were also 278 recorded as numbers like 18880 or 18930: I concluded that these were meant to be years to which an extra zero had somehow been added, and corrected them accordingly.
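The second caveat can be sketched in Python along these lines. This is an illustration under stated assumptions rather than the exact procedure used: the function name and status labels are invented, the "stray zero" rule simply drops a trailing zero from five-digit values, and the 1860 to 1903 range is borrowed from the cut-offs used later in this post.

```python
EARLIEST, LATEST = 1860, 1903  # assumed plausible range of birth years


def clean_year(raw):
    """Return (year, status) for a raw birth-year value.

    status is 'ok', 'corrected' (stray trailing zero removed),
    'out_of_range' (a year, but outside the plausible range) or 'invalid'.
    """
    text = str(raw).strip()
    if not text.isdigit():
        return None, "invalid"
    status = "ok"
    # Values like 18880: treat as a four-digit year with an extra zero added.
    if len(text) == 5 and text.endswith("0"):
        text = text[:-1]
        status = "corrected"
    if len(text) != 4:
        return None, "invalid"
    year = int(text)
    if not EARLIEST <= year <= LATEST:
        return year, "out_of_range"
    return year, status


for value in ["1894", "18880", "1988", "1844"]:
    print(value, clean_year(value))
```

Returning a status alongside the year keeps the corrections visible, so the decision about what to do with each anomaly stays with the researcher.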
Visualisation is often particularly useful for highlighting errors and problems in your data. But it’s the researcher who has to decide what to do about such anomalies (and whether they might even be serious enough to make the whole dataset too unreliable to be worth using).
I initially hoped that the record dates would represent specific dates when women joined up, but as it turned out there was only a covering date for the series as a whole. Since it only covers 4 years, that’s not really an issue; instead I simply worked out their ages in 1918 (assuming that there wouldn’t have been new recruits after the war ended anyway), and filtered out the half-dozen supposedly born before 1860 or after 1903.
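In Python, the age calculation and filter described here might look like the sketch below. The function name is hypothetical, and because only the year of birth is used, the result is really "age reached during 1918", not an exact age on a given day.

```python
def ages_in_1918(birth_years, earliest=1860, latest=1903):
    """Return ages in 1918 for birth years inside the plausible range.

    Missing values (None) and years before `earliest` or after `latest`
    are dropped, not corrected.
    """
    return [
        1918 - year
        for year in birth_years
        if year is not None and earliest <= year <= latest
    ]


# Illustrative values only, not taken from the real dataset.
print(ages_in_1918([1894, 1822, 1903, 1988, None, 1860]))
```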
And so the thing I learned today is that, gosh, they were so young.
As visualisations, tables may be less eye-catching than graphs, but they have the virtue of presenting a lot of precise information in a relatively small space; the table at the bottom of this post shows that more than 60% of the women were aged 25 or under in 1918 and about 90% were under 30. Very few of them were old enough to take advantage of the limited extension of voting rights to women at the end of the war.
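A table of this kind (age band, number of women, % of total) is easy to build once you have a list of ages. The sketch below is only an illustration: the age bands are my assumption and may not match the ones in the actual table, and the sample ages are invented.

```python
from collections import Counter

# Assumed age bands: (lowest age, highest age, label).
BANDS = [
    (0, 19, "under 20"),
    (20, 25, "20-25"),
    (26, 29, "26-29"),
    (30, 39, "30-39"),
    (40, 200, "40 and over"),
]


def band_for(age):
    """Return the label of the band that contains this age."""
    for low, high, label in BANDS:
        if low <= age <= high:
            return label
    raise ValueError(f"age out of range: {age}")


def age_table(ages):
    """Return rows of (band, count, percentage of total) in band order."""
    total = len(ages)
    if total == 0:
        return []
    counts = Counter(band_for(age) for age in ages)
    return [
        (label, counts[label], round(100 * counts[label] / total, 1))
        for _, _, label in BANDS
    ]


# Invented sample ages, purely to show the shape of the output.
for row in age_table([18, 22, 24, 25, 27, 31, 19, 21, 45, 23]):
    print(row)
```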
This is confirmed by a bit of background reading – according to Lucy Noakes on Women’s Mobilization for War (Great Britain and Ireland), “the majority of recruits to the WAAC were young working class women”. If we can reasonably assume that the information given about maiden names is a complete record, or anywhere near it, the vast majority of the women were also unmarried – nearly 95% of them overall. I suspect that very few married women would have volunteered for this type of service (which was likely to take them overseas and close to combat), and as a result it might be expected that the majority would be young – very likely younger, on average, than male soldiers. You can also see that a considerably higher proportion of the women aged over 25 were/had been married – but it still looks a very low proportion compared to what you might expect in the general population (and I wonder if quite a lot of these were widows).
I’m not exactly surprised to learn from Noakes that their youth (and, no doubt, class) resulted in some negative perceptions:
In the public mind however, they were sometimes perceived as thrill seekers, drawn by a desire for adventure and romance, and recruitment to the service suffered from fears that women were finding opportunities for sexual liaisons with the soldiers. So worried was the government by these rumours that a Commission of Enquiry was formed, which included figures showing the number of pregnancies amongst unmarried members of the WAAC was lower than among unmarried civilians…
The ages of women recruited to the WAAC/QMAAC, 1917-18
|Age in 1918||number of women||% of total|