The digital dilemma: putting data to work
Over the history of mankind, we've constantly sought out tools and capital to make us more productive. From the formation of basic tools to assist in farming to real cultivation and shaping of the land for greater yields, humankind learned to grow food. Further research into genetics, fertilizers and pesticides enabled us to rapidly scale food production. From early sweatshops to almost fully automated factories, we've learned how to scale manufacturing and get far more productivity from fewer workers and more machinery and automation.
In this manner, we've learned to improve the deployment of human labor, land, tools, machinery and other capital to improve our quality of life. Now, we must fully engage the asset that we have the most of and that is producing the least for us: data. It's time to put our data to work.
Does it strike you as odd that cybersecurity experts speak of needing to protect "data at rest"? In fact it would appear that data in the cybersecurity world only has two states: data at rest and data in motion. Does data ever "work"? Of course, I'm being a bit pedantic but you get the point. For far too long we've thought about data as a by-product of our work. We've collected data and stored it, first in data centers on site and eventually in data centers off-site, or what we now call "the cloud". But the real question is: what is all that data doing for us? Is it an under-utilized asset that can be put to work for more effective gains?
The new land, or the new oil?
Lately, people are talking about data as the new oil - a cheap and easily accessible resource. The clear difference between oil and data is that oil is a rapidly diminishing resource that gets more difficult to obtain and extract the more we use it. The reverse is true for data. We will never run out of data - the sources may change, but as long as the internet exists and people are engaged on the internet, we will create data.
I think the analogy is wrong. I think we should be thinking about data as the new land. I realize it's not a perfect analogy, because we also have a finite amount of land. But land comes in different shapes, sizes and configurations, good for different things. If we take the US as an example, in the 1700s New York City was limited to the southern tip of Manhattan, and the rest was used to grow crops. Long Island was basically a farming community. Over time, as the city expanded, other uses for the land emerged - for manufacturing, for banking, for dwelling and other uses. Today, Manhattan is mostly developed, and governments have developed regulations regarding how land can be used.
In other words, we put the land to work for us, and over subsequent generations the land creates value in different ways, whether by growing crops or by housing factories or apartments. Land increases in value based on its scarcity and on what you can do with it or place on it. Eventually it comes full circle, and land becomes valuable when you place nothing on it.
We don't have a perfect analogy for data
Since we don't have experience with infinitely replenishable assets, there is no good analogy for data, but the point remains. As we've done with other assets - labor, land, equipment - we need to put the data we have and are generating to work: get it up off the couch, out of the air-conditioned data centers and out working for us. Oh, but you might say, we are doing that - the artificial intelligence guys and the machine learning folks are working on that now.
This is true, but misleading. While AI and ML teams are working with data, they are focused on highly specific use cases in narrow niches, and are working with only tiny subsets of data. For the most part, AI and ML teams can't work with a lot of the data that exists, because the data is too...well, it's too messy, too discontinuous, covers too short a time horizon, and so on. A massive amount of the data we have on hand is not working for us, and we need it to become productive or move out of the basement.
Needing a mnemonic
Now, and in the near future, we need to be asking ourselves some interesting questions as we create and store data.
- First, why are we collecting this data - what near-term or longer-term benefit do we hope to achieve?
- Second, what about this data makes it interesting or useful? What "meta-data" should we be attaching to the data?
- Third, if we hope to put this data to work, what conditions must exist for the data to be useful and valuable?
- Fourth, is this data "enough" to be useful? Does it need to be augmented with other data? What lifespan of the data is necessary in order to gain more value?
- Fifth, how might we put this data to work, either in a statistical analysis, a predictive model, to drive or automate a process, or to provide insight into new or emerging opportunities?
- Sixth, what kinds of people do we need to curate, clean and consolidate the data, turn it into useful information, and analyze and act on the results?
Putting data to work is paramount, if our history of leveraging labor, land and equipment is any guide. This is far too important to leave to the IT folks, or even the AI and ML folks. It requires that everyone in an organization work to get the most out of the data the organization has - which I suspect is the least utilized asset in the company. More importantly, what is your plan for the deluge of data that will be generated as we go through a digital transformation?
Farmers, Ranchers, Machinists, Data-ists?
If farmers and ranchers are people who seek to get value out of land, and machinists are people who seek to optimize machines, what do you call someone who is seeking to get the most value from data? This is a job for data scientists, but not for them alone. Perhaps over time data scientists and others will be responsible for getting the absolute most value out of data, but until then it is the responsibility of anyone with access to the data, and therein lies the rub.
In the oil boom, basically anyone could drill a well as long as they owned the rights to drill (and often in the early days the legalities happened post facto). Today, the real question is: who owns the data, and who has access to the data? In corporations, the company owns the data and has traditionally limited access to it. One must proceed through the cybersecurity guys, the IT guys and the data center guys to even get access to the data. We've hidden, protected and partitioned the data so very few people can access it. Further, most organizations have arcane rules and challenging tools for accessing and manipulating data. No wonder data is at rest. We've given it a comfy place to reside, little responsibility and limited access to people who need it.
Doesn't "data want to be free"?
Strange that the rallying cry for many at the start of the internet was that "data wants to be free", but just as we are gathering enough of it to matter, and just as we are starting to develop the tools and people to begin to make sense of it, data seems more isolated, more locked down than ever before.
As digital transformation unfolds, the real value in the economy will lie not in labor, land or equipment but in putting data to work. That means we need the right data, the right people, the right access to the data, and the right questions to ask. This is a job that is far larger than the IT department and the data scientists. It is a job, and a responsibility, for everyone in the organization.