Information Density is in the Eye of the Beholder

A current trend holds that “there is value in all data, so keep everything, because you don’t know now what is valuable and what isn’t”. This is undoubtedly correct, but is it useful?

Basically, the question is “what is the information density of my data?” Journal entry data is very information dense: a small amount of data indicates when, in which account, and how much real money was transacted as part of business operations. This data is not only information dense, it is also frequently used, either at the atomic level or as part of an aggregation.

The value of other data is not as easily recognizable. Is the record of readings taken every second from a temperature sensor valuable? Or are only the times and values at which a reading changes of value? And how valuable is that data during the 99.9% of the time that the machine being monitored is running normally?
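To make the sensor question concrete, here is a minimal sketch of keeping only the readings where the value changes. The function name changes_only, the tolerance parameter, and the sample data are all hypothetical, made up for illustration; a real sensor feed would be noisier and would need a tolerance chosen for the equipment in question.

```python
from typing import Iterable, List, Tuple

# A reading is (timestamp in seconds, temperature). This layout is an assumption.
Reading = Tuple[int, float]


def changes_only(readings: Iterable[Reading], tolerance: float = 0.0) -> List[Reading]:
    """Keep the first reading, then every reading whose value differs
    from the last kept value by more than `tolerance`."""
    kept: List[Reading] = []
    for timestamp, value in readings:
        if not kept or abs(value - kept[-1][1]) > tolerance:
            kept.append((timestamp, value))
    return kept


# Hypothetical data: 1,000 one-second readings, steady at 70.0
# except for a single spike to 75.0 at t = 500.
readings = [(t, 75.0 if t == 500 else 70.0) for t in range(1000)]

kept = changes_only(readings)
retained_fraction = len(kept) / len(readings)
print(kept)
print(retained_fraction)
```

On this made-up data only three of the 1,000 readings survive: the first one, the spike, and the return to normal. Whether the discarded readings carried any information is exactly the judgment call described above.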

The inability to measure the information density of a data set used to mean that most data that had not been “proven” valuable was discarded. But in a post-Google world, choosing to discard data has become a more uncomfortable decision. Google seems to have proven that there is value in just about all data. You might not be interested in who scored the goals in the 1950 English FA Cup Final (Reg Lewis of Arsenal scored both, Go Arsenal!), but somebody is.

The Difference between Your Organization and Google
This highlights the difference between Google and most organizations. Google opens its data to the world, whereas most organizations limit access to their data to a few hundred data scientists and business analysts, most of whom are focused on pertinent business goals rather than a sudden interest in soccer history. So the value realized from the large amounts of data stored by Google, or more specifically the value delivered to Google’s users, is unlikely to be replicated in most organizations, because the number, scope, and range of the queries directed at their data are minuscule in comparison.
