Thursday, February 13, 2020

There's gold in the data - but have we learned the lessons?

For some reason I can't get the song Clementine out of my head.  Perhaps it's the line - dwelt a miner, a 49er - that keeps the song playing in my mind.  We are in a significant transition.  One breed of miner - the hardhat-wearing, coal-digging operator of heavy equipment - is leaving the scene, displaced by larger machines and robots, and by concerns about safety, the environment and of course greenhouse gases.  In his or her place a new kind of miner is emerging - a data miner, a process miner, a bitcoin miner.  These are people who are expert at exploiting a new but somewhat unnatural resource:  all the data that we create.

The "49" in the song, by the way, refers to the California gold rush of 1849.  Where there are a lot of valuable and unclaimed opportunities up for grabs, there will be a gold rush.  We are in the very beginning of one now - a gold rush to find, mine and refine data and turn the insights into value.  There are some interesting parallels between the gold rush miners of the 1800s and the people seeking to capitalize on all the data available today.

You may have heard it said data is the new oil

Let's get this out of the way - there must be at least one cliche in every blog post, so there it is.  Still, oil is a good analogy for data in a lot of ways.  At the beginning of the oil boom, there were lots of oil deposits that were easy to reach, close to the surface.  Oil as we know it today became popular as other types of fuel (whale oil in particular) were becoming more difficult to obtain and more expensive.  Plus, the oil was more plentiful and in most cases had more value.

Data is increasingly the same way.  Now there is more of it, since there are more devices and people generating more data (and more kinds of data).  Data is more valuable, because as the volume grows you can do interesting things and get interesting insights from the data.  And we are just entering an age where we have the compute power and memory access to really dig deeply into the data and find all that is interesting in there.

Of course we can mine our current databases - the CRM and ERP databases that contain so many transactions.  These should be gold mines after years of collecting data.  Your company probably has other significant master databases that can be mined as well.

Just recently I became aware of process mining - which is just data mining based on process records.  So your order-to-cash process or your procure-to-pay process can be mined and evaluated for gaps or inefficiencies, or to discover new insights.  Even blockchain has its own miners, but they are in a different category.
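For readers who want a feel for what mining a process looks like, here is a minimal sketch in Python.  It is an illustration under stated assumptions, not how any particular process mining product works:  the event log, the activity names and the documented path are all invented, and a real log would come out of an ERP or order management system.  The sketch rebuilds the path each order actually took and flags the orders that stray from the documented process.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical order-to-cash event log: (case id, activity, timestamp).
# All values are invented for illustration.
EVENT_LOG = [
    ("order-1", "receive order", "2020-01-06 09:00"),
    ("order-1", "ship goods", "2020-01-07 14:00"),
    ("order-1", "send invoice", "2020-01-08 10:00"),
    ("order-1", "receive payment", "2020-02-05 16:00"),
    ("order-2", "receive order", "2020-01-06 11:00"),
    ("order-2", "send invoice", "2020-01-06 15:00"),
    ("order-2", "ship goods", "2020-01-09 08:00"),
    ("order-2", "receive payment", "2020-01-20 12:00"),
    ("order-3", "receive order", "2020-01-07 10:00"),
    ("order-3", "ship goods", "2020-01-08 09:00"),
    ("order-3", "send invoice", "2020-01-09 13:00"),
]

# The process as written in the (hypothetical) procedure manual.
DOCUMENTED_PATH = ("receive order", "ship goods", "send invoice", "receive payment")


def trace_by_case(event_log):
    """Return {case id: tuple of activities in time order}."""
    cases = defaultdict(list)
    for case_id, activity, stamp in event_log:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        cases[case_id].append((when, activity))
    return {
        case_id: tuple(activity for _, activity in sorted(events))
        for case_id, events in cases.items()
    }


def variant_counts(traces):
    """Count how many cases followed each distinct path (variant)."""
    return Counter(traces.values())


def deviating_cases(traces, documented_path):
    """List the cases whose actual path differs from the documented one."""
    return sorted(c for c, path in traces.items() if path != documented_path)


traces = trace_by_case(EVENT_LOG)
for variant, count in variant_counts(traces).most_common():
    print(count, " -> ".join(variant))
print("Deviating cases:", deviating_cases(traces, DOCUMENTED_PATH))
```

In this made-up log only one of the three orders follows the documented path.  One is invoiced before it ships and one never records a payment, which is exactly the kind of gap this sort of mining is meant to surface.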

Why data is unlike oil or gold

In the past, miners have been extractive, doing difficult and dangerous work to extract rapidly depleting natural resources.  These future miners are doing something quite the opposite - working in pristine conditions with the latest technologies, but more importantly working on a resource that is replenishing faster than it can be mined and understood.

The total amount of gold, coal or oil worth extracting is fixed, and the inexpensive stuff has already been extracted.  In contrast, we are only beginning to learn how to make sense of data, and the amount of data we create will keep growing exponentially.  The problem data miners will face is not where to dig, but how to assess the potential value of data before digging.  If anything, data miners and data scientists risk becoming overwhelmed with data, as companies increasingly understand the potential value of the data in their organizations and make plans to capture and store more of it.

What these miners have in common

What all of these data miners have in common is that they are trying to find, collect, assess and refine the data and information in large databases within corporations.  The potential to do this well emerges from a couple of new concepts:  data scientists, who are interested in what the data contains, and new tools like predictive analytics, machine learning and artificial intelligence, as well as natural language processing.  These tools help identify interesting trends or relationships in the data in ways that humans can't do at scale.

What these miners and their companies lack

While there is great promise in data mining and process mining, many companies lack the basic infrastructure to benefit in significant ways from their data.  This is true for several reasons:

First, the depth and range of data.  While many companies have a lot of data, it often isn't reflective of a long period of history over a consistent set of customers or vendors.  Just having a lot of data does not mean much if the contents vary a lot.

Second, the quality of the data.  While some data is of relatively high quality (most likely the data in your ERP system) other data may be suspect (your CRM data is more suspect than your ERP data) or questionable (data you acquire versus data that you create).  Differences in quality can create significant errors in analysis later.

Third, the validity of your processes.  When a friend suggested I write about process mapping, I almost laughed out loud.  There's nothing wrong with the idea - in fact it's a good idea - but it assumes that 1) there are defined processes, 2) people follow them to the letter and 3) all the data within each process is captured effectively.  Since we know that SOPs are often written but rarely reviewed, many processes don't do what we think they do.  Be careful about believing that your documented processes reflect reality.

Fourth, the ability and capability of your data scientists.  I don't know about the market where you live, but if you want a job in Raleigh, all you need to do is claim to be a data scientist.  There are far more jobs than experienced people, which is another problem with the gold rush.  Lots of people who went to California had little or no experience mining for gold.  Turns out, it isn't simply lying there on the ground waiting to be picked up.  You needed experience to find it, and even then there were no guarantees.

I suspect that there will be some isolated successes in every company that has good data and good processes, and is patient with the people who are seeking value in the data.  There are a number of caveats there.  Few companies have good, consistent data, and even fewer have the patience necessary to allow a lot of experiments and learning in the data.  It could be another few years before we see really interesting data mining results at firms that are not pure data plays.

One small warning

It will be interesting to see who becomes the Levi Strauss of this particular mining rush.  Strauss became rich not from mining operations but by selling clothing and equipment to miners.  There's a lesson here for current-day data and process miners.  Very few of the miners in the California gold rush got wealthy.  The people who reaped most of the benefits were the people who sold them supplies and entertainment.  The unexpected outcome of the gold rush was a growing, thriving and eventually diversifying economy, which I think might happen in this data mining rush as well.  I wonder how much of that history will repeat itself.  It remains to be seen who ultimately profits from all the data mining that will happen.

posted by Jeffrey Phillips at 6:52 AM