Data Clarity
Data clarity is something that data scientists, analysts, application developers and indeed businesses as a whole aspire to, but not all achieve. It is something we were not really taught at university (at least not as a separate subject), but without which companies falter, crash and sometimes burn, technology breaks down and governments lose elections.
But what exactly is data clarity?
I see it as a combination of knowing what type of information you have available, where it is stored, how up to date it is, at what level of detail it is stored, and what quality it is. In short, it is about being able to trust the output of the applications and analytics that use your data. Without this knowledge it is virtually impossible to use your data effectively and confidently to inform the activities of your business.
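To make those five facets a little more concrete, here is a minimal sketch of how a single dataset might be described in a catalogue. The `DatasetRecord` class, its field names, the example values and the `is_stale` helper are all hypothetical, made up for illustration rather than taken from any particular tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DatasetRecord:
    """One hypothetical catalogue entry covering the five facets of data clarity."""
    content: str          # what type of information it holds
    location: str         # where it is stored
    last_updated: date    # how up to date it is
    grain: str            # the level of detail stored
    quality_score: float  # assumed scale: 0.0 (unusable) to 1.0 (fully trusted)


def is_stale(record: DatasetRecord, today: date, max_age_days: int) -> bool:
    """Return True if the dataset is older than the agreed maximum age."""
    return (today - record.last_updated).days > max_age_days


# Illustrative values only.
sales = DatasetRecord(
    content="Retail sales transactions",
    location="warehouse.sales.transactions",
    last_updated=date(2024, 1, 1),
    grain="one row per order line",
    quality_score=0.92,
)

stale_after_a_week = is_stale(sales, today=date(2024, 1, 31), max_age_days=7)
```

Even a record this small answers the basic questions an analyst would ask before trusting a report built on the data.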
Maintaining organisational data clarity relies on a number of skill sets, and requires constant effort and revision. From data architects, data stewards and DBAs to ETL developers, analysts, modellers, application designers and business subject matter experts, all are involved to some degree in the enterprise of ensuring the business can use its data effectively.
Businesses use a number of methods to manage the ever-increasing burden of maintaining data integrity. These include data quality dashboards that track the state of the data; data dictionaries, catalogued and published on the company intranet, which need constant updating; graphical ETL tools that simplify repetitive data movement and transformation tasks across environments; and of course company-wide standards for data naming, formatting, storage, updating, movement and modelling, as well as for data security. These methods apply to metadata, operational data, data marts, dimensional data and data warehouses, as well as to unstructured data, streaming data and any other form of data that may live in some sort of database or file system. If it is potentially useful, then it needs to be maintained.
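As a rough illustration of what might sit behind a data quality dashboard or a naming standard, the sketch below computes one simple metric (completeness of a column) and checks column names against an assumed snake_case convention. The function names, the sample rows and the convention itself are assumptions for the sake of the example; a real business would substitute its own rules.

```python
import re

# Assumed standard: lower-case words separated by single underscores.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def completeness(rows, column):
    """Share of rows in which `column` is present and not empty (0.0 to 1.0)."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) not in (None, ""))
    return filled / len(rows)


def follows_naming_standard(column_name):
    """Return True if the column name matches the assumed snake_case standard."""
    return SNAKE_CASE.fullmatch(column_name) is not None


# Made-up sample data: one customer has no email recorded.
customers = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": ""},
    {"customer_id": 3, "email": "c@example.com"},
    {"customer_id": 4, "email": "d@example.com"},
]

email_completeness = completeness(customers, "email")
bad_names = [n for n in ("customer_id", "CustomerID") if not follows_naming_standard(n)]
```

Checks like these are cheap to run on a schedule, and the trend over time is usually more informative than any single reading.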
Some tend to focus on data clarity issues almost to the exclusion of producing outcomes with the data they are striving to maintain. Others ignore data clarity issues altogether and just keep hoping for the best. These of course are the extremes, and the right balance lies somewhere between the two. A maturity model seeks to place businesses on a spectrum, rating how effectively they carry out the activities required to maintain control over their data. This is known as a data management maturity (DMM) model, and a quick search on the web will lead to plenty of resources on the topic.
Any business should aim for excellence in as many facets of the DMM model as possible, so as to maximise the potential of its data assets.
I encourage you to investigate this a little more if you are concerned that your business is not providing the data environment that it could or should. After all, knowing what information your business maintains and understanding its limitations will give you confidence in engaging with this crazy, information-rich world we live in.