This is part of Solutions Review’s Premium Content Series, a collection of reviews written by industry experts in maturing software categories. In this presentation, Monte Carlo’s Customer Success Manager, Will Robins, shares some examples of bad data, with insights into why bad data is becoming a major problem for businesses.
We’ve all known the pain of bad data. It could be an urgent call from the C-suite about missing data or duplicate tables in your data environment. When data is erroneous, missing, or otherwise outdated, it’s called data downtime, and as data becomes more valuable, that downtime becomes more of a problem. In this article, we’ll cover eight reasons why bad data is becoming a bigger problem.
Data flows downstream
When bad data is identified and corrected by a data engineer early in the pipeline, there is little harm done. However, if it’s caught in the wild by executives, customers, or the general public, the consequences can be significant.
Each pipeline stage can also filter out bad data before it moves downstream. However, several current trends are accelerating the pace at which data moves downstream, including data democratization, reverse ETL, and more.
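The idea of filtering bad data at each stage before it moves downstream can be sketched as a simple validation gate. This is a minimal, hypothetical example; the field names, rules, and freshness window are illustrative, not a real pipeline’s schema:

```python
from datetime import datetime, timezone

# Hypothetical rules for one pipeline stage: quarantine records
# that are missing required fields, carry impossible values,
# or are older than the stage's freshness window.
REQUIRED_FIELDS = {"order_id", "amount", "updated_at"}

def is_valid(record: dict, max_age_days: int = 7) -> bool:
    if not REQUIRED_FIELDS.issubset(record):
        return False
    if record["amount"] < 0:  # negative order amounts are impossible here
        return False
    age = datetime.now(timezone.utc) - record["updated_at"]
    return age.days <= max_age_days

def filter_stage(records):
    """Pass only valid records downstream; quarantine the rest for review."""
    good = [r for r in records if is_valid(r)]
    bad = [r for r in records if not is_valid(r)]
    return good, bad
```

Quarantining (rather than silently dropping) the failing records is what lets an engineer fix the issue before anyone downstream ever sees it.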
Data platforms are getting complicated
The further downstream bad data travels, the harder it is to mitigate. You’d much rather have a data engineer fixing a broken pipeline than a data scientist untangling a machine learning model that drifted because of poor data quality.
It is not just mitigating these issues that is problematic. As organizations depend on complex data platforms to make decisions, the opportunity costs of bad data also increase.
An example would be a financial organization with ML models that buy bonds when they reach certain price thresholds. A schema error could take this model offline for weeks and prevent those trades from running at all.
With increasingly complicated data platforms, there are more moving parts, which can create more opportunities for problems to arise.
More data adoption
Chances are data adoption has increased in your organization. Businesses have recognized that to be data-driven, they need to drive data adoption.
According to a report by Google Cloud and Harvard Business Review, 97% of industry leaders surveyed believe that access to organization-wide data and analytics is critical to the success of their business.
That’s not a bad thing, but as data adoption increases in your organization, it also means more professionals are left idle, or making the wrong call, when your data is bad.
Rising data expectations
Your organization has high expectations for the quality of its data. People expect data to be as reliable as the SaaS products they use, which never seem to fail. Unfortunately, few data teams can say their data platforms are at a SaaS level of reliability.
Many data teams today are measured qualitatively rather than quantitatively, which means making life difficult for your data consumers can have real consequences. Keeping data quality high and regularly evangelizing your quality metrics can help.
It’s hard to hire a data engineer
Data teams constantly tell me how hard it is to hire data engineers.
Data engineering was one of the fastest-growing jobs in the Dice 2020 Tech Job Report, and the 2022 report puts its average salary at over $115,000.
So if data engineering time is scarce, it’s better to automate data quality monitoring than to let engineers spend their time (some studies suggest 30 to 50 percent of it) fixing broken pipelines.
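Even a basic automated freshness-and-volume check can catch many issues before data consumers do. Here is a minimal sketch; the SLA thresholds are hypothetical and would be tuned per table in practice:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical per-table thresholds; real monitors tune these.
MAX_STALENESS = timedelta(hours=6)
MIN_ROWS = 1000

def check_table_health(last_loaded_at: datetime, row_count: int) -> list:
    """Return a list of alert messages; an empty list means healthy."""
    alerts = []
    if datetime.now(timezone.utc) - last_loaded_at > MAX_STALENESS:
        alerts.append("stale: last load exceeds the 6-hour SLA")
    if row_count < MIN_ROWS:
        alerts.append(f"volume anomaly: {row_count} rows, expected >= {MIN_ROWS}")
    return alerts
```

Running a check like this on a schedule and routing alerts to the owning team turns pipeline breakage from something consumers discover into something engineers are paged about.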
Distributed data quality responsibilities
The data mesh, which federates data ownership among domain data owners who are held accountable for providing their data as products, is a big trend in data right now.
This brings teams closer to the business, but it also distributes responsibility for data quality. To prevent problems from arising, teams must communicate constantly and proactively. Otherwise, the time it takes to resolve data issues that span domains can become problematic.
The death of the third-party cookie
By now, most data professionals are aware that the third-party cookie is going away, thanks in part to GDPR. This means companies that have outsourced their ad targeting to third parties will once again have to rely on first-party data.
And that first-party data will need to be reliable.
It’s a competitive market for data products
It’s amazing to see all the innovative data products data teams are producing. In some industries, especially media and advertising, the space is getting extremely competitive.
Consumers expect more data, more interactivity, and less latency. These data products have gone from hourly batches, to every 15 minutes, to every 5 minutes, and are now starting to stream.
There’s no room for bad data in these products, not if you want to be competitive.
Prevention is the best solution
Most organizations underestimate the extent of their data quality problem and therefore underinvest time and resources to fix the problem.
A proactive, multi-pronged approach between teams, organizational structure, and tools can help reduce the rising cost of data downtime.