To improve data quality for better AI, stop fixing it

In the rapidly evolving landscape of modern business, data stands as a cornerstone for informed decision-making, strategic planning, and overall organizational success. Yet, the quality of data often comes under scrutiny due to inherent inaccuracies introduced during its creation and capture. While the natural inclination might be to rectify these data flaws directly, a more effective approach would involve addressing the root causes by refining the processes responsible for data generation and capture. This article advocates for a shift in perspective—rather than fixating on perfecting data, organizations should prioritize enhancing the underlying business processes. By doing so, the integrity of data as a factual representation of reality can be preserved, while the flaws in data can be harnessed for improving the overall business processes.

Preserving Data Integrity

Data serves as the bedrock upon which organizations build their strategies and navigate their journeys. It acts as a bridge between past occurrences and future aspirations, enabling businesses to discern patterns, identify opportunities, and mitigate potential risks. In the pursuit of maintaining data integrity as a true reflection of reality, the emphasis should be on upholding the authenticity of data rather than attempting to correct every blemish.

The act of post hoc data correction risks distorting the historical narrative, potentially leading to misguided decisions based on altered information. For instance, correcting logistics movement data to complete the data chain might offer short-term relief in the form of consistent reports, but it hides valuable insights: that manual workarounds are being used to compensate for process disturbances, or that there are underlying product performance issues to address.

Leveraging Flaws for Process Enhancement

The inconsistencies and inaccuracies that reside within data are not mere obstacles; they are signposts pointing towards areas of improvement within business processes. Rather than viewing these flaws as hindrances, organizations should perceive them as guideposts directing the way toward operational refinement.

Imagine a scenario where a retail company routinely records variations in inventory figures. Instead of simply modifying the data to match desired outcomes, the organization should seize this opportunity to delve into the processes leading to these inconsistencies. By dissecting the supply chain, identifying weak links, and implementing corrective measures, the company not only rectifies data accuracy but also optimizes its operations, potentially resulting in cost savings, heightened customer satisfaction, and enhanced product quality.
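One way to treat inventory discrepancies as signposts rather than as errors to overwrite is to leave the recorded figures untouched and log each mismatch against the process step where it surfaced. The sketch below is a minimal illustration of that idea; the record fields (`sku`, `process_step`, `recorded`, `counted`) and the function name are hypothetical and not taken from any specific system.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryCheck:
    """One comparison between a system record and a physical count (hypothetical fields)."""
    sku: str
    process_step: str  # e.g. "goods receipt", "store transfer"
    recorded: int      # quantity according to the system
    counted: int       # quantity found during the physical count


def discrepancies_by_step(checks):
    """Count mismatches per process step without modifying any record."""
    return Counter(c.process_step for c in checks if c.recorded != c.counted)


checks = [
    InventoryCheck("A-100", "goods receipt", recorded=50, counted=50),
    InventoryCheck("A-200", "store transfer", recorded=20, counted=17),
    InventoryCheck("A-300", "store transfer", recorded=8, counted=10),
    InventoryCheck("A-400", "returns", recorded=5, counted=4),
]

hotspots = discrepancies_by_step(checks)
# The step with the most mismatches is the first candidate for a process review.
print(hotspots.most_common(1))
```

The original figures stay as they were captured; only the pattern of mismatches is used to decide which process deserves attention first.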

Connect the dots: From Data to Processes

The inclination to correct data discrepancies often stems from a desire for immaculate datasets that appear to ensure more accurate decision-making. However, this mindset overlooks the fact that data is a mirror reflecting real-world occurrences, which inherently contain imperfections. Rather than chasing an unrealistic data ideal, organizations should pivot towards process improvement as the bedrock of data quality enhancement.

Through meticulous examination of the processes responsible for data creation and capture, organizations can unearth systemic issues that might be undermining their operations. This shift in focus embodies the essence of continuous improvement—a philosophy that emphasizes identifying and rectifying systemic shortcomings over superficial data adjustments.

Empowering Technology

Embracing process enhancement over data rectification does not negate the significance of technology. Indeed, technology can play a pivotal role in automating and streamlining processes, minimizing the introduction of errors in the first place. Automation reduces the likelihood of human fallibility, a significant contributor to data inaccuracies. Furthermore, technology can be harnessed to integrate checks and balances within data capture systems, ensuring accurate and consistent data entry.

Nevertheless, even the most sophisticated technology cannot entirely eliminate flaws from data. The primary objective remains enhancing the processes feeding into these technologies, establishing a cycle of refinement and growth. And beware of the downside of too much control through technology. When there are too many constraints on capturing the variance of data, people get creative and misuse the limitations the technology enforces.


In the age of data-driven insights, the temptation of spotless data can be alluring. Nevertheless, the pursuit of data perfection should not overshadow the essence of reality. Data is an embodiment of genuine events, and its imperfections are key indicators of areas demanding attention and improvement.

Rather than expending energy on rectifying data flaws, organizations should prioritize the enhancement of processes responsible for generating and capturing this data. In doing so, they not only elevate data quality but also nurture a culture of continuous improvement and operational excellence. Every flaw transforms into an opportunity, every discrepancy a potential breakthrough. This approach enables the organization to evolve holistically, guided by the wisdom extracted from its imperfect yet invaluable data.

Stop sticking plasters on data and technology

Do you want to discover a new data revolution?  

Data service organisation - to enable business requirements

Central for de-central data teams: start acting as a (temporary) data owner to service business requirements.

Set up data service desks to be flexible for business requirements.

Data is often seen as the raw material to produce new products, frequently with analytics and AI as the innovative machinery enabling the end result. Recent years have shown that the value of data more often lies in enabling multiple business results, leading to efficiency savings, profits, and the ability to maintain existing markets while expanding into new ones.

Data as fundamental oil

Whether it is automated payments and invoicing, online customer interactions, or digital manufacturing, data is the underlying oil that can make your business operations run smoothly.

Or does it? Why is it that, despite the abundance of data, businesses often run sub-optimally, sometimes even relying on manual activities, in this digital age?

“So, if data is the new oil, why isn’t everything running smoothly?”

All companies generate, receive, and process data to some extent. Data is abundant these days. So, if data is the new oil, why isn’t everything running smoothly?

Here is why: the data itself is complex, and the usage of data is complex. Many companies have tried to resolve this combined complexity through centralized standardization. Many projects aimed at establishing a single data model have become famous, often leading to disappointments. Alternatively, data solutions seek refuge in technology, often resulting in an increase in applications, which can add to the complexities instead of alleviating them. Above all, centralized standardization requires control, which does not adequately serve the business.

Move from control to empower

The very essence of any business is flexibility, the ability to innovate and develop new products and markets. The business needs to be facilitated by data. So, move from controlling towards empowering.

Empowering means understanding that there is no one-size-fits-all when it comes to data, given the complexities mentioned above. Data is very similar to the oil that keeps machinery running. Just compare the oil for ball bearings to petrol. They share the same raw material but differ in volume, characteristics, substance, and processing for different purposes.

How do we see the solution?

With the extensive rise in data volume, complexity, and velocity, a central data team supported by data stewards and architects is no longer sufficient. It requires more decentralized teams that can facilitate specific business needs while adhering to central requirements. Use the motto: control only where needed – for example, using one standard product or client ID across systems, and facilitate where possible, such as adding an additional product ID to support a regional process. Of course, this requires more effort. It is evident that any additional data requires more maintenance. However, the benefits for the business are immediate and significant. There is no need for the business to change processes, systems, or reporting. Immediate possibilities emerge to make more local variations of products and insights, facilitating specific market requirements. This approach maintains the possibility of working with central initiatives and the option to upgrade or downscale data where possible without affecting central requirements.

“The answer is easy, the deployment is more complex”

How do you facilitate this? The answer is easy; the deployment is much more complex. Have dedicated data teams in place with a close relationship to the business. That data team should consist of senior, well-trained data management experts, data analysts, and data engineers to facilitate and guide the local solutions, including the link to central platforms. The team should be able to answer business questions through a so-called service desk. Such a service desk requires a thorough understanding of the business processes and systems and the translation of data requirements into existing (or missing) data within systems. Preferably, the service desk should have the capability to identify, flag, and resolve regulatory questions on privacy, financial legislation, and health legislation. Make sure that the service desk is enabled by a ticketing workflow, including a dashboard displaying their effort and impact. Finally, that data team should be able to guide the business stakeholders in the best approaches and solutions. Don’t expect business stakeholders to deliver data requirements; they will have business requirements. If you didn’t know better, this team almost acts like a data owner.

Of course, some data needs to be strictly governed and controlled. There are overarching business requirements (e.g., insights into sales volumes) that require consistency and quality of trusted data. Identify these key data elements and manage them with a strict and tight regime. These key data elements can, for example, be linked to key reporting, be identified as being used for most processes, or be the primary key within multiple systems. Current data volumes can make this identification a tough job. A good start can be using the Dublin Core standard to identify the right regime. The standard uses the following guidance:
– Which data is related to which process, system, product and report?
– Where is data being used?
– Why is the data needed (purpose)?
– Who uses the data?
– How is data labelled and referenced?
– What is the relevance of the data (e.g., static or dynamic)?
– How is data related to other data?
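The three criteria named above (linked to key reporting, used for most processes, primary key within multiple systems) can be written down as a simple check. The following is a minimal sketch, not a prescribed method: the `DataElement` fields, the "more than half of all processes" threshold, and the example element names are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataElement:
    """Metadata about one data element (hypothetical structure)."""
    name: str
    processes: set = field(default_factory=set)               # processes that use it
    systems_as_primary_key: set = field(default_factory=set)  # systems keyed on it
    used_in_key_reporting: bool = False


def is_key_data_element(element, total_processes, majority=0.5):
    """Apply the three criteria from the text; the majority threshold is an assumption."""
    used_in_most_processes = len(element.processes) > majority * total_processes
    primary_key_in_multiple_systems = len(element.systems_as_primary_key) >= 2
    return (
        element.used_in_key_reporting
        or used_in_most_processes
        or primary_key_in_multiple_systems
    )


client_id = DataElement(
    name="client_id",
    processes={"onboarding", "invoicing", "support"},
    systems_as_primary_key={"CRM", "billing"},
)
regional_product_code = DataElement(
    name="regional_product_code",
    processes={"regional pricing"},
)

TOTAL_PROCESSES = 4  # assumed number of business processes in scope
print(is_key_data_element(client_id, TOTAL_PROCESSES))              # strict regime
print(is_key_data_element(regional_product_code, TOTAL_PROCESSES))  # facilitate locally
```

Elements that pass the check are candidates for the strict regime; the rest can be left to the decentralized teams.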

New way

Teams acting as (temporary) data owners is a new, almost revolutionary way of looking at data. The traditional view, based on data standards (e.g., DAMA, DCAM, ISO), all revolves around governance and ownership. That view is based on having data ownership within the business. If you step away from that theoretical view and fall back on lessons learned, then don’t expect business stakeholders to take up sufficient data ownership. For decades, they have perceived data as a by-product.

Most business stakeholders will stay away from data ownership simply because it is unknown territory for them. It is up to the data team to translate do’s and don’ts regarding data and take up data ownership for the business. In theory, this might even support the embracing of data ownership by business stakeholders through the principle of show and tell.

So, a team of experts is required. Such a team goes beyond the effort of some companies to “simply” assign a data steward who reports on the content of specific fields within systems. Companies should build dedicated teams across the organization, which will often need to invent the wheel themselves. The way of working will differ per objective. For project goals, make sure you can act fast, agile, and dedicated. For sustainable solutions, ensure that you stay completely aligned with company standards (and enhance a few where needed) to avoid the “not invented here syndrome.” For any purpose, make sure you take the time to understand and align data, data requirements, and business requirements. And actually build solutions – not just on paper, but within apps, databases, data pipelines, and systems.

All of this will require a solid, robust, and senior data leadership team which can manage, sustain, guide and facilitate data responsibilities. Invest in that team.

For examples on data standards, visit: DAMA, DCAM or ISO

Material Science with Materials Zone

Ori Yudilevich (Chief Technology Officer at Materials Zone) on: the history of Materials Zone, the company and its product. Ori explains how Materials Zone’s Materials Informatics platform applies material science techniques to save costs. He also explains which data challenges they usually come across.

Tailor-made D8A Academy trainings

Members and partners of D8A Directors can bring their years of experience as hands-on training to your organisation.

Are you looking for inspiration, guidance and practical how-to’s in building your data driven organisation?

Some inspiration on topics:

  • Guiding change through enterprise data architecture practice
  • How metadata management enables you to find, understand, govern, trust and share your data
  • Growing a federated data governance in your organisation made practical
  • Where and how to start with data quality management with immediate cost reduction in business operations
  • How to transition the perspectives of data security, privacy & compliance from ‘cost center’ to ‘profit center’ when done right
  • How to organise data products in data mesh
  • How to realize data observability embedded into your data pipelines with Databricks and Delta Lake
  • and others

Don’t hesitate to get in touch with your case for a tailor-made training!

Trusted data awareness

Arjan Pepping (Corporate Data Manager at MN) on: creating awareness around trusted data and the role of data in control for a pension provider. Listen for the golden tip on implementing data awareness.

Good design teams embrace data quality

Marinka Voorhout (Director at Philips) on: how data quality in design is becoming a prerequisite for innovations on data. Listen for practical tips and ideas on taking data quality into account in user interfaces.

Data quality to resolve 3rd party cookies ban


Marketeers and dedicated advertising benefit from good data quality

Google announced its intention to kill off tracking cookies (so-called 3rd party cookies) within its Chrome browser: the cookies that advertisers use to track users around the web and target them with dedicated ads. Google is not the only major player altering the digital ad landscape. Apple has already made changes to restrict 3rd party cookies, along with changes to mobile identifiers and email permissions. Big Tech is altering 3rd party cookies in response to growing consumer awareness of data privacy. Most consumers don’t like the feeling of being tracked across the internet (70% of U.S. adults want data regulation reform and 63% of internet users indicate that companies should delete their online data completely).

For most marketeers, this paradigm change presents huge challenges to enable customer acquisition by tracking users and targeting them with dedicated digital advertising.

On the other hand, 3rd party cookies are inherently problematic, from limited targeting capabilities and inaccurate attribution to the personalization & privacy paradox. Their loss presents an opportunity to provide a smaller group of high-value customers with higher-caliber and increasingly personalized experiences. In other words, losing these cookies might become a blessing in disguise.

Confronting data acquisition challenges in a cookie-less future

For all the shortcomings of 3rd party cookies, the marketing industry does not yet have a perfect answer for how to acquire customers without them. Marketeers are waking up to the impactful change they are facing. One potential answer to the loss of 3rd party cookies is to replace them with 1st & 2nd* party data, i.e., data shared directly by the customer, such as email addresses, phone numbers and customer authenticators (see below). This data can become the mutual currency for the advertising business. First party data can be hard to obtain: you need to “earn” it, and you need solutions that ensure the 1st party data you gain is of good quality.

Technology Section

Some solutions focus on technology, e.g., Google’s Federated Learning of Cohorts (FLoC), a type of web tracking that groups people into “cohorts” based on their browsing history for interest-based advertising. Other technology solutions include building a 1st and 2nd party data* pool, i.e., a Customer Data Platform (CDP). CDPs are built as a complete data solution across multiple sources. By integrating all customer data into individual profiles, a full view of each customer can be built. Another solution is a private identity graph, which holds all the identifiers that correlate with individuals. Private identity graphs can unify digital and offline first-party data to inform the single customer view and manage the changes that occur over time. This helps companies generate a consistent, persistent 360-degree view of individuals and their relationship with the company, e.g., per product brand, all to enable stronger relationships with new and existing customers.
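To make the idea of an identity graph more concrete, the sketch below groups identifiers that were observed together into one profile, using a union-find (disjoint-set) structure. It is an illustrative simplification and not how any particular CDP or identity-graph product works; the class name, method names and example identifiers are all hypothetical, and real identity resolution also has to deal with consent, matching confidence and changes over time.

```python
from collections import defaultdict


class IdentityGraph:
    """Groups identifiers that were observed together into one profile (illustrative)."""

    def __init__(self):
        self._parent = {}

    def _find(self, identifier):
        """Return the representative identifier of the group, shortening paths on the way."""
        self._parent.setdefault(identifier, identifier)
        while self._parent[identifier] != identifier:
            self._parent[identifier] = self._parent[self._parent[identifier]]
            identifier = self._parent[identifier]
        return identifier

    def link(self, a, b):
        """Record that identifiers a and b belong to the same individual."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def profiles(self):
        """Return one set of identifiers per individual."""
        groups = defaultdict(set)
        for identifier in list(self._parent):
            groups[self._find(identifier)].add(identifier)
        return list(groups.values())


graph = IdentityGraph()
# Online sign-up links an email address to a loyalty number.
graph.link("email:ann@example.com", "loyalty:12345")
# An in-store purchase links the same loyalty number to a phone number.
graph.link("loyalty:12345", "phone:+31600000000")
# A second customer with unrelated identifiers.
graph.link("email:bob@example.com", "loyalty:67890")

profiles = graph.profiles()
print(len(profiles))
```

The online and offline identifiers of the first customer end up in a single profile, which is the "single customer view" the text refers to.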

Earning good quality data will increase the need for standardized, good quality customer journeys, and therefore the need for standardized, good quality data.
Where design and data quality were previously not closely connected, the vanishing 3rd party cookies now act as a catalyst to integrate both.

Data quality is usually an unknown phenomenon for most designers**, design companies, front- & back-end software developers and marketeers. It requires a combined understanding of multiple domains, i.e., the user interface where data will be captured, the underlying processes which the captured data will facilitate, data storage & database structures and marketing (analyses) purposes.

Finding an expert who has all this combined knowledge is like finding a real gem. If you do, handle with care!
It is more likely that each domain will need at least an understanding of how it enhances and impacts the other domains.

For the (UI/UX) designer:

  • Have a good knowledge of data quality rule types. What is the difference between a format rule and an accuracy rule? Is timeliness of data relevant? What are the pitfalls for data quality rules? How do you integrate multiple purposes (e.g. processes, data integration & analytics) into a dedicated data quality rule?
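The difference between these rule types can be illustrated with an email field captured in a sign-up form. The sketch below is a simplified example under stated assumptions: the regular expression is a deliberately loose format check, the set of verified addresses stands in for whatever trusted reference an organization has (for instance, addresses confirmed through an opt-in link), and the one-year freshness window is arbitrary.

```python
import re
from datetime import date, timedelta

# Deliberately loose pattern: something@something.something, no spaces.
EMAIL_FORMAT = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def format_rule(email):
    """Format: does the value look like an email address?"""
    return bool(EMAIL_FORMAT.match(email))


def accuracy_rule(email, verified_emails):
    """Accuracy: does the value match a trusted reference (assumed to exist)?"""
    return email.lower() in verified_emails


def timeliness_rule(last_confirmed, today, max_age_days=365):
    """Timeliness: was the value confirmed recently enough (window is an assumption)?"""
    return (today - last_confirmed) <= timedelta(days=max_age_days)


verified = {"ann@example.com"}
typo = "ann@exmaple.com"  # well-formed, but not the address the customer owns

print(format_rule(typo))              # passes the format rule
print(accuracy_rule(typo, verified))  # fails the accuracy rule
print(timeliness_rule(date(2022, 1, 1), today=date(2024, 6, 1)))
```

A value can pass the format rule and still be inaccurate or out of date, which is why a form that only validates format does not guarantee good quality data.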

For product owners:

  • Ensure that expertise on data entry, and on how data is used within processes at a granular level (i.e., at data field level), is available. Onboard a so-called data steward who can facilitate the correct input for data quality. Let the data steward cooperate with front-end developers and designers.
  • Keep your data fresh. Data doesn’t last forever. Make sure data stewards support data updates and cleansing.
  • Data stewards should work with designers and front-end developers to determine which fields are considered as critical. These fields should be governed by a strict regime, e.g., for the quality and timeliness of data as well as for access to the data and usage purposes.
  • Personal authentication is a separate topic that needs to be addressed as such. Relying on big tech firms such as Facebook or Google can seem an easy solution, but it increases the risk of being dependent on an external party. Authentication needs to be earned in order to build authentic customer relationships. When customers give a company verifiable, durable pieces of identity data, they are considered authenticated (e.g. signing up for a newsletter or new account via email address). This will be a new way of working for most companies. Therefore, data stewards need to up their game and not only know existing processes but also extend their view, understanding and knowledge towards new developments.
  • Data stewards must align with the Data Privacy Officer on how to capture, store and process data. When it comes to privacy, compliance and ethics, you can never play it too safe.

For data storage & databases:

  • Ensure that data architecture (or at least a business analyst) is involved in the design process. This is sometimes resolved by the back-end developer (who cannot work without aligning with the architect office on data integration, models for databases and data definitions).
  • If standardized data models and/or data definitions exist within the organization, these should be part of the database development. Refer to authoritative source systems where possible.
  • If the application is made via low-code, standardization of existing data models/architecture, data definitions and data quality rules is often part of the approach. Yet, data quality checks should always take place as a separate activity.

For marketeers:

  • Understand how customer journeys can facilitate 1st and 2nd party cookies. Determine which data is needed for insights. Gather insights requirements and work together with the data steward to define data quality rules that facilitate your insights. Now that the 3rd party source is limited, the value of the customer journey for marketing increases!
  • Privacy is one of the catalysts making 3rd party cookies disappear. This requires a new approach for acquiring personal data for marketing and ad targeting. These new developments require new skills and, more importantly, a new cooperation between existing domains. Companies that enable this will lead this new way of working.


* Data from 1st party cookies occurs only within a company’s own domain; data from 2nd party cookies can be used within and outside a company’s own domain. This article takes mostly 1st party data into account. For 2nd party data, you can further investigate e.g., ‘data co-ops’, complementary companies that share data. Each member of the co-op should relate to the others in a meaningful way because outside of your own web domain, you’ll be able to reach customers only on your partner sites, and this reflects on your brand.

** Of course, there are designers who work with data-enabled design. In the view of this article, this is a different topic, more focused on tracking & logging data, which is then analyzed to improve the design. This article is about good data quality when data is entered via a UI, e.g., as part of a customer journey.