Is your data a liability?

Data should support knowledge workers and empower executives to make the best decisions. In this post we will cover the key characteristics you can use to decide if your data is an asset or liability.

Data Asset versus Liability: The Key Indicators of Data Value

The value of a data set is measured by much more than the information itself: value is determined by the data's ability to address a need. Data quality is the foundation of information's ability to fulfill needs and serve knowledge workers. The table below outlines the characteristics, or indicators, of data value. A substantial deficiency in any one indicator can render the data useless. Worse, some deficiencies go unidentified, causing unknown expenses and problems. In this worst-case and all-too-common scenario, the data is perceived as an asset when it is actually a liability. Understanding the indicators that drive value will allow you to assess whether your data is an asset, a liability, or a risk.

Relevance: In demand and related to the desired subject matter.
Completeness: Provides all desired pieces of information.
Timeliness: Delivered by the requested time.
Accuracy: How close a value is to the actual value.
Precision: The level of detail in which a value is expressed, such as its granularity or unit of measure.
Consistency: All values computed or captured with the same method, granularity, or unit of measure; typically applies to summarizations and aggregations.
Uniqueness: Free of duplicate representations of the same piece of information.
Accessibility: Both discoverable and easy to access.
Understandability: Degree of available metadata, documentation, and lineage.
Interoperability: Degree to which the distribution format is easy to consume, query, and integrate with different data management systems.
Community: Number of contributors and frequency of contributions (an indicator of maintenance versus obsolescence).

Relevance

Data must be relevant to the need of the data consumer. Relevance is the degree to which the information is related to the desired subject matter. Even data that does not cover the desired subject matter directly can provide value if it is correlated with it. Consider a custom tailor who produces clothing and recently decided to add shoes to his line. He would like to choose the sizes and quantities of shoes to keep in inventory. He does not know the shoe sizes of his customers, but he has their height measurements. Since height is correlated with shoe size, he can leverage this information to make a better inventory decision.

Completeness

Completeness means providing all needed details. For example, tabular data may lack values in some columns for some or all rows, or entire rows might be missing. A picture might capture only a portion of the required object. A video recording of an event might be cut short.
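For tabular data, completeness can be quantified as the fraction of rows that actually carry a value for each field. A minimal sketch, assuming hypothetical rows where `None` marks a missing value:

```python
# Hypothetical customer rows; None marks a missing field value.
rows = [
    {"name": "Ada",   "email": "ada@example.com", "phone": None},
    {"name": "Grace", "email": None,              "phone": "555-0100"},
]

def completeness(rows, fields):
    """Return the fraction of rows with a non-missing value for each field."""
    total = len(rows)
    return {f: sum(r.get(f) is not None for r in rows) / total for f in fields}

completeness(rows, ["name", "email", "phone"])
# name is fully populated; email and phone are each half missing
```

Scores like these make gaps visible before a consumer discovers them the hard way.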

Timeliness

Timeliness is a matter of response time, or of delivering data by a required date. Many enterprise information systems rely on batch processes with lengthy execution times, or on data preparation that occurs only each month or quarter, which delays the delivery of critical information. Other systems provide real-time results from live data streams.

Accuracy

Either your data is accurate, or it is not. Inaccuracies can reside as dormant risk, which can spread to other data sets and downstream systems. One inaccuracy might be used in numerous calculations, summarizations, or data science models. The output from these calculations and models can be used as input to other models. As a result, inaccuracies can reach unsuspecting consumers in a variety of ways.

Unknown inaccuracies create an illusion that your data is an asset. As a result, knowledge workers and executives make misinformed decisions that create problems and expenses.  What you thought was an asset is actually a liability. Due to this common and costly misperception, accuracy is considered the most important indicator.

Precision

Measurements provided in inches are more precise than those provided in feet. More precise data can be used in a wider variety of applications than its less precise counterparts. For example, higher-resolution images and video are required for zooming and large-format printing with little distortion.

Consistency

Users expect a consistent or better-than-before data experience. Data types and precision should not vary for the same field, and the formula used to compute a field's value should not change from one row to the next. The angle and orientation of photography might need to be the same throughout a set of pictures. Backwards compatibility between releases eases the impact of change as data evolves. Changes to data types, distribution formats, precision, or computation formulas force consumers to incur the expense of supporting those changes in dependent reports and systems.
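One of the checks above, that a field's type does not vary from row to row, is easy to automate. A minimal sketch over hypothetical rows:

```python
# Hypothetical rows where one "price" value drifted to a different type.
rows = [
    {"id": 1, "price": 9.99},
    {"id": 2, "price": 12.50},
    {"id": 3, "price": "15.00"},  # inconsistent: stored as a string
]

def inconsistent_rows(rows, field):
    """Return the ids of rows whose field type differs from the first row's."""
    expected = type(rows[0][field])
    return [r["id"] for r in rows if type(r[field]) is not expected]

inconsistent_rows(rows, "price")  # -> [3]
```

Flagging such drift at the producer's side is far cheaper than forcing every consumer to discover and handle it independently.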

Uniqueness

Uniqueness is the degree to which a data set is free of duplicate representations of the same information. For example, multiple entries for the same person are confusing and costly for consumers to resolve. (Creating data that is differentiated, or unique compared to other sources, falls under the relevance indicator.)
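A first pass at spotting duplicate entries is to normalize values and count occurrences. A minimal sketch with hypothetical names, folding case and whitespace only (real deduplication usually needs fuzzier matching):

```python
from collections import Counter

# Hypothetical contact names; the same person appears under two spellings.
names = ["Jane Doe", "John Smith", "jane  doe", "Ada Lovelace"]

def duplicates(names):
    """Return normalized names that occur more than once."""
    counts = Counter(" ".join(n.lower().split()) for n in names)
    return [name for name, count in counts.items() if count > 1]

duplicates(names)  # -> ["jane doe"]
```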

Accessibility

Accessibility includes the ability of a user to discover and access the information. Providing a seamless access experience includes features such as semantic search for data sources, workflow for a user to attain authorization, and an interface to query or download data.

Understandability

Complex data is unusable if the consumer cannot understand what it represents. Data sets are easier to use when accompanied by metadata, detailed documentation, and lineage describing the sources of the data and its history of changes. Incorrect documentation can be as damaging as inaccurate data: applying data inappropriately leads to faulty business intelligence and bad decisions. These problems will recur until the misinformation is corrected or the credibility of the data or documentation is questioned.

Interoperability

The data distribution format affects the consumer's ability to easily use the data with a preferred technology. Use can range from running simple queries to integrating the data into an enterprise knowledge graph with ontologies relevant to the distributed data. Using industry-standard formats helps ensure high interoperability.
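Distributing the same records in standard formats is usually a few lines of code, which makes interoperability one of the cheapest indicators to improve. A minimal sketch with hypothetical records, using only the standard library:

```python
import csv
import io
import json

# Hypothetical records to distribute in widely supported standard formats.
records = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]

# JSON: consumable by virtually any language or data management system.
as_json = json.dumps(records)

# CSV: loads directly into spreadsheets, databases, and query tools.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "name"])
writer.writeheader()
writer.writerows(records)
as_csv = buf.getvalue()
```

Offering more than one standard format lets each consumer pick whichever their tooling handles best.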

Community

Collaboration between data contributors tends to create more valuable data. An active community of contributors providing support, and of users providing feedback, results in data that is more reliable, comprehensive, and well understood. This factor is a key driver of value for open data.

A synergistic relationship exists between accessibility, understandability, interoperability, and community. Modern data catalogs that utilize knowledge graphs and semantic search, like data.world, leverage this relationship to empower data-driven organizations. Knowledge workers are able to find, understand, use, and share data assets.