A quick post, just to expand on what I said about the Text Creation Partnership (TCP) in my talk. How might this model work in practice for crime (and other) archives, in partnership with institutions like TNA or local record offices and publishers like Ancestry or Findmypast?
The indexing done by family-history-oriented publishers like Ancestry and Findmypast is often very limited: a researcher from TNA mentioned a series of criminal registers indexed by Ancestry in which only names and counties can be searched.
And they guard this data, however thin it is, jealously. (The researcher could get access to the Ancestry data but only by signing strict confidentiality agreements.)
So imagine that a group of historians gets some funding together to enrich the indexing that’s been done: capturing offence categories, outcomes, places, dates, information about individuals, etc., depending on the source.
The agreement with TNA and the publisher could look something like this:
- For a set period of time (it’s 4 years for the TCP if I remember rightly), only the project members and resource subscribers (including users at TNA) can have direct access to the enriched data.
- The project can publish work based on analysis of the data (with aggregate graphs, tables etc) and small extracts (akin to the ‘snippets’ of text for context that we show in Connected Histories search results).
- Once the time is up, the project can freely distribute the enhanced textual data, eg by posting it as linked open data in a data repository, and can make it searchable at their website (linking to the images at the publisher’s site for people who have subscriptions, as we do at Connected Histories).
- The publisher retains its exclusive control of the source images and it gets a much improved search.
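For anyone who finds it easier to think about the terms as rules, here is a rough sketch of the embargo logic in Python. Everything in it is hypothetical and for illustration only: the role names, the `Agreement` class, the function names and the 4-year default (which is just my recollection of the TCP period) are assumptions, not anything agreed with TNA or a publisher.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical roles that get direct access during the embargo.
PRIVILEGED_ROLES = {"project_member", "subscriber"}


@dataclass
class Agreement:
    """Illustrative model of the proposed terms (all names are assumptions)."""
    start: date
    embargo_years: int = 4  # assumed period, not a confirmed figure

    def embargo_end(self) -> date:
        target_year = self.start.year + self.embargo_years
        try:
            return self.start.replace(year=target_year)
        except ValueError:
            # Start date was 29 February and the target year is not a leap year.
            return self.start.replace(year=target_year, day=28)


def can_access_enriched_data(agreement: Agreement, role: str, today: date) -> bool:
    """Direct access: restricted during the embargo, open to all afterwards."""
    if today >= agreement.embargo_end():
        return True
    return role in PRIVILEGED_ROLES


def may_publish(agreement: Agreement, kind: str, today: date) -> bool:
    """Aggregates and snippets are always allowed; full data only after the embargo."""
    if kind in {"aggregate", "snippet"}:
        return True
    if kind == "full_data":
        return today >= agreement.embargo_end()
    return False
```

The images never appear in these rules at all, which is the point: the publisher keeps exclusive control of them throughout, and only the enriched index data changes status when the period ends.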
What do people think?