A “Data Creation Partnership”?

A quick post, just to expand on my thoughts about the Text Creation Partnership in my talk. How might this model work in practice for crime (and other) archives, in partnership with institutions like TNA or local record offices and publishers like Ancestry or Findmypast?

The indexing done by family-history-oriented publishers like Ancestry and Findmypast is often very limited – a researcher from TNA mentioned a series of criminal registers indexed by Ancestry in which only names and counties are searchable.

And they guard this data, however thin it is, jealously. (The researcher could get access to the Ancestry data but only by signing strict confidentiality agreements.)

So imagine that a group of historians gets some funding together to enrich the indexing that’s been done – capture offence categories, outcomes, places, dates, information about individuals, etc, depending on the source.

The agreement with TNA and the publisher could look something like this:

  • For a set period of time (it’s 4 years for the TCP if I remember rightly), only the project members and resource subscribers (including users at TNA) can have direct access to the enriched data.
  • The project can publish work based on analysis of the data (with aggregate graphs, tables etc) and small extracts (akin to the ‘snippets’ of text for context that we show in Connected Histories search results).
  • Once the time is up, the project can freely distribute the enhanced textual data, eg by posting it as linked open data in a data repository, and can make it searchable at their website (linking to the images at the publisher’s site for people who have subscriptions, as we do at Connected Histories).
  • The publisher retains its exclusive control of the source images and it gets a much improved search.
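To make the terms above a bit more concrete, here is a minimal sketch in Python of what an enriched record and the embargo rule might look like. Everything in it is an assumption for illustration: the field names, the role labels and the `embargo_end` and `can_access` helpers are hypothetical, and the embargo length is left as a parameter, since it would be whatever the partners actually agreed.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical role labels for those with direct access during the embargo.
EMBARGO_ROLES = {"project_member", "subscriber"}


@dataclass
class EnrichedRecord:
    # Fields the publisher has already indexed (as in the criminal registers example).
    name: str
    county: str
    # Hypothetical enriched fields captured by the project, depending on the source.
    offence_category: Optional[str] = None
    outcome: Optional[str] = None
    place: Optional[str] = None
    event_date: Optional[date] = None
    # Link back to the image on the publisher's site, which the publisher still controls.
    image_url: Optional[str] = None


def embargo_end(start: date, years: int) -> date:
    """Date from which the enriched data could be freely distributed."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # The start date was 29 February and the target year is not a leap year.
        return start.replace(year=start.year + years, day=28)


def can_access(role: str, today: date, start: date, years: int) -> bool:
    """Anyone has access once the embargo is over; before that, only the agreed roles."""
    if today >= embargo_end(start, years):
        return True
    return role in EMBARGO_ROLES
```

So `can_access("public", ...)` would be false until the agreed period has passed, while project members and subscribers have access throughout. The source images are deliberately not modelled beyond the link, since they would stay with the publisher.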

What do people think?

One thought on “A “Data Creation Partnership”?”

  1. I should add that my experience so far of working with commercial publishers (like Gale-Cengage, ProQuest or Origins.net) on Connected Histories is that they’re not at all hostile to negotiating more open access to their content as long as they’re persuaded they’ll get some benefit from it. For example, we initially expected them to be very resistant to allowing text snippets in search results, but they happily accepted our argument that it would enable users to view results in context and work out whether they were relevant (and so wouldn’t get pissed off at clicking through to lots of irrelevant material…).
