Which and What Data, When?!

Updated April 15, with thanks to Georgina Ibarra for proofreading and edits, and to David Anning for links to the UN and Forbes.

I’ve noticed a common reaction to the word “data” when watching commentators deliver news stories or politicians evangelise the benefits of open data initiatives. Some of us implicitly understand data use in context, thanks to our domain expertise and regular exposure to the varying types of data (including how hard it can be to get at times). Generally speaking, though, people get freaked out because they assume the worst.

Granted, there are nefarious types out there collecting and selling personal details that they shouldn’t, and this is sort of the point: we need to educate people about the data in use in a way they can grasp easily. Once we remove this knee-jerk reaction to the word “data”, people can focus on what they can do with data rather than what someone else might do to them with it.

I was at the KnowledgeNation event at the ATP yesterday and this hit home when Angus Taylor (Assistant Minister for Cities and Digital Transformation) talked about the “open data” initiative underway. After he finished his speech, the first question from a member of the audience was about citizens’ personal details being released. He of course answered it expertly, but at first I was quite astonished at the leap the audience member made from “open data” to “personal data”. Afterwards I thought: well, should it be that astonishing, considering the vast ocean of “data” out there and how little most of us know about it?

So that got me thinking – how can we provide clearer descriptors for data that set an expectation of use and immediately set the tone for the ongoing discussion? As a user experience professional I see this as a responsibility, and I am now trying out a proposed solution.

Just as Eskimos are said to have many words for snow, we might need more words for data, or at least be more conscious of the type of data we are referencing when we talk about it (and when we talk about the stories we tell with data).

I think we’re all in agreement that the term “big data” is vague and unhelpful, so I’m suggesting a common vernacular for different types of data:

  • Private data – the citizen owns it, gives permissions, expiration times, and it’s protected from any other use
  • Secure data – sharable but with mind-blowing encryption
  • Market data – anything used to sell products to you
  • Hybrid data – some kind of private and non-private mix
  • Triangulated data – those seemingly harmless sets that are used to identify people
  • Near-to-Real Time data – because real-time is rarely actually real-time
  • Real Time data
  • Legacy data – old stuff
  • Curated data – deliberately created data sets serving a single purpose
  • Active: Photos, Videos, Searches (search terms), Communications (email, text, comments, blogs)
  • Passive: Health, Financial, Spending, External environmental, Domestic environmental, Location, Logs

Examples in use could be –

“Google are tracking your Real Time Location data when you use maps”

“The Australian Open Data initiative makes Curated data from the ABS available”

“Private and Secure Financial data will not be shared with any third parties”
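For anyone who wants to play with the idea, here is a minimal Python sketch of how these descriptors might be combined into the kind of phrases used in the examples above. The descriptor names come from the list in this post; everything else (the `Descriptor` enum, the `DataLabel` class and its `phrase` method) is hypothetical and only there for illustration, not a proposed standard.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Descriptor(Enum):
    """Descriptors proposed in this post; the enum itself is a hypothetical encoding."""
    PRIVATE = "Private"
    SECURE = "Secure"
    MARKET = "Market"
    HYBRID = "Hybrid"
    TRIANGULATED = "Triangulated"
    NEAR_TO_REAL_TIME = "Near-to-Real Time"
    REAL_TIME = "Real Time"
    LEGACY = "Legacy"
    CURATED = "Curated"


@dataclass(frozen=True)
class DataLabel:
    """One or more descriptors plus an optional subject such as "Location"."""
    descriptors: Tuple[Descriptor, ...]
    subject: str = ""

    def phrase(self) -> str:
        # Join the descriptors with "and", then append the subject (if any) and "data".
        words = " and ".join(d.value for d in self.descriptors)
        parts = [words, self.subject, "data"]
        return " ".join(p for p in parts if p)


print(DataLabel((Descriptor.REAL_TIME,), "Location").phrase())
# Real Time Location data
print(DataLabel((Descriptor.CURATED,)).phrase())
# Curated data
print(DataLabel((Descriptor.PRIVATE, Descriptor.SECURE), "Financial").phrase())
# Private and Secure Financial data
```

Keeping the descriptor separate from the subject is just one possible design choice: it lets the same small vocabulary describe Location, Financial or Health data without needing a new term for every combination.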

At Data61 we are face to face with this too, so part of our UX work will be to discover patterns in attitudes and communication.

I’m currently investigating this idea, and I’d love your thoughts! Is there anything published, academically or otherwise, that might have attempted to do this already?

Refs: http://www1.unece.org/stat/platform/display/bigdata/Classification+of+Types+of+Big+Data

2 Replies to “Which and What Data, When?!”

  1. A common vocab for discussing different kinds of data? Great idea – could be bloody hard work though.

    I had this idea a while ago of even just listing what the word “data” means in dozens of contexts. Examples:
    – researchers think of the individual numbers that comprise the experimental results
    – system administrators think of volume and properties like necessity to back up
    – mobile phone users think of it like fuel

    What’s kind of fascinating is that any two uses of the word “data” are ultimately connectable in a way that other homonyms aren’t. That is, a researcher’s experimental results could be stored on disk and then downloaded to a phone, temporarily becoming part of each of the three perspectives…
