Who's Minding the (Data) Store?

It is certainly nothing new that data privacy, and questions over how organizations collect, maintain and use an individual’s data, are hot topics among consumers, lawmakers and companies alike. Past data breaches at several well-known companies have raised the antennae on the topic, and ongoing questions about what data other high-profile corporations store and how they use it continue to draw attention. In 2018, California began to take the issue into its own hands by passing the California Consumer Privacy Act (CCPA), which goes into effect January 1, 2020. Among other things, the CCPA gives California consumers, in certain situations, the right to be informed about the personal information that companies collect on them, to request that their information be deleted (subject to certain exceptions), and to prevent companies from selling or disclosing their personal information. It also provides consumers a limited private right of action against firms whose consumers’ personal data is breached. The California law follows the European Union’s General Data Protection Regulation (GDPR), adopted in 2016, which likewise contains several provisions on the handling of personal data. Now the United States federal government is getting into the “act” as well: the Senate is working to draft a comprehensive data privacy and protection bill of its own, and many other bills addressing portions of consumer data privacy and protection have been introduced in Congress over the past several months. Many other states, such as Illinois, New York, Pennsylvania, Texas and Washington (to name a few), are also considering comprehensive privacy laws of their own or expanding consumer data privacy laws already on the books.

Clearly, data privacy is important, and few would seriously dispute that point. Given both the evolution of technology and the importance of competing on data (including the ever-increasing types and volume of data that companies collect), the focus on protecting both personal and corporate data will only intensify. Cybersecurity (protecting systems, networks and data from digital attacks) is all the rage, and organizations continue to devote more resources to this important task as they try to stay ahead of those who would seek to compromise their systems and data. In addition, cyber insurance, which protects against financial losses from cyber-related events, has continued to grow in importance and premium volume.

But what about the quality of the data being protected? Who is looking after that, and are resources being devoted to ensuring that the data is of high quality? Often referred to as “data hygiene,” this is the much less glamorous side of the big data equation. While it may seem obvious, if the data feeding those increasingly sophisticated algorithms and important purposes is not of good quality, then the results can be questionable at best and flat-out wrong at worst. Garbage in, garbage out. The best predictive modeler can apply the most advanced techniques that technology supports, but he or she cannot overcome bad data. Beyond producing less-than-optimal analytics results, bad data can also lead to other (and possibly more severe) consequences, such as poor customer service or reputational damage.

No data is perfect, so considerable effort usually goes into any analytics project to make the data as useful and accurate as is reasonably possible. Ideally, companies take steps to ensure high-quality data well before any analytics work begins. These steps can include a strategy for how data is stored and retrieved. Can data from various sources be connected efficiently and effectively? Are controls in place both while the data store is being built (allowed-value checks, data cleaning procedures, a single source of truth, etc.) and after it has been established (auditing and balancing of information, periodic surveys of the data, regular data updates, etc.)? Finally, is useful, up-to-date metadata (information about the data) being developed and made readily available to the data’s end users?
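As a rough illustration of two of the controls listed above, here is a minimal Python sketch of an allowed-value check (applied as data is built) and a simple balancing step (applied after the fact) that compares a loaded total against a control total from a source system. The field names, allowed values, sample records, control total and tolerance are all hypothetical assumptions chosen for illustration, not taken from any particular system or standard.

```python
# Hypothetical allowed-value rules: field name -> set of permitted values.
ALLOWED_VALUES = {
    "state": {"CA", "IL", "NY", "PA", "TX", "WA"},
    "policy_status": {"active", "lapsed", "cancelled"},
}


def check_allowed_values(records, rules):
    """Return (row_index, field, value) for every value outside its allowed set."""
    errors = []
    for i, record in enumerate(records):
        for field_name, allowed in rules.items():
            value = record.get(field_name)  # a missing field yields None, which is flagged
            if value not in allowed:
                errors.append((i, field_name, value))
    return errors


def balance_totals(records, amount_field, control_total, tolerance=0.01):
    """Compare the summed amount in the data store to a control total from the source.

    Returns (is_balanced, difference), where difference = loaded total - control total.
    """
    loaded_total = sum(r.get(amount_field, 0) for r in records)
    difference = loaded_total - control_total
    return abs(difference) <= tolerance, difference


# Hypothetical sample data: one lowercase state code, one unexpected status,
# and a control total that does not match what was loaded.
records = [
    {"state": "CA", "policy_status": "active", "premium": 1200.0},
    {"state": "ca", "policy_status": "active", "premium": 800.0},
    {"state": "NY", "policy_status": "pending", "premium": 500.0},
]

errors = check_allowed_values(records, ALLOWED_VALUES)
balanced, diff = balance_totals(records, "premium", control_total=2600.0)
print(errors)            # [(1, 'state', 'ca'), (2, 'policy_status', 'pending')]
print(balanced, diff)    # False -100.0
```

In practice, checks like these would typically run inside a load process or a scheduled audit job, with failures routed to someone who can correct the source data rather than silently patched downstream.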

While it may not get the glory, those charged with developing the processes around, and maintaining the quality of, a company’s data store play a critical role in the success or failure of analytics (big data or otherwise). The value of the results, and indeed a company’s ability to compete now and into the future, depends on it. Data management practices must not be overlooked; the investment of both financial and human capital will most certainly be worth it!