To improve data quality for better AI, stop fixing it

In the rapidly evolving landscape of modern business, data stands as a cornerstone for informed decision-making, strategic planning, and overall organizational success. Yet, the quality of data often comes under scrutiny due to inherent inaccuracies introduced during its creation and capture. While the natural inclination might be to rectify these data flaws directly, a more effective approach would involve addressing the root causes by refining the processes responsible for data generation and capture. This article advocates for a shift in perspective—rather than fixating on perfecting data, organizations should prioritize enhancing the underlying business processes. By doing so, the integrity of data as a factual representation of reality can be preserved, while the flaws in data can be harnessed for improving the overall business processes.

Preserving Data Integrity

Data serves as the bedrock upon which organizations build their strategies and navigate their journeys. It acts as a bridge between past occurrences and future aspirations, enabling businesses to discern patterns, identify opportunities, and mitigate potential risks. In the pursuit of maintaining data integrity as a true reflection of reality, the emphasis should be on upholding the authenticity of data rather than attempting to correct every blemish.

The act of post hoc data correction risks distorting the historical narrative, potentially leading to misguided decisions based on altered information. For instance, correcting logistics movement data to complete the data chain might offer short-term relief in the form of consistent reports, but it sidesteps the valuable insight that manual workarounds exist to compensate for process disturbances, and it avoids addressing underlying product performance issues.

Leveraging Flaws for Process Enhancement

The inconsistencies and inaccuracies that reside within data are not mere obstacles; they are signposts pointing towards areas of improvement within business processes. Rather than viewing these flaws as hindrances, organizations should perceive them as guideposts directing the way toward operational refinement.

Imagine a scenario where a retail company routinely records variations in inventory figures. Instead of simply modifying the data to match desired outcomes, the organization should seize this opportunity to delve into the processes leading to these inconsistencies. By dissecting the supply chain, identifying weak links, and implementing corrective measures, the company not only rectifies data accuracy but also optimizes its operations, potentially resulting in cost savings, heightened customer satisfaction, and enhanced product quality.

Connect the dots: From Data to Processes

The inclination to correct data discrepancies often stems from a desire for immaculate datasets that appear to ensure more accurate decision-making. However, this mindset overlooks the fact that data is a mirror reflecting real-world occurrences, which inherently contain imperfections. Rather than chasing an unrealistic data ideal, organizations should pivot towards process improvement as the bedrock of data quality enhancement.

Through meticulous examination of the processes responsible for data creation and capture, organizations can unearth systemic issues that might be undermining their operations. This shift in focus embodies the essence of continuous improvement—a philosophy that emphasizes identifying and rectifying systemic shortcomings over superficial data adjustments.

Empowering Technology

Embracing process enhancement over data rectification does not negate the significance of technology. Indeed, technology can play a pivotal role in automating and streamlining processes, minimizing the introduction of errors in the first place. Automation reduces the likelihood of human fallibility, a significant contributor to data inaccuracies. Furthermore, technology can be harnessed to integrate checks and balances within data capture systems, ensuring accurate and consistent data entry.

Nevertheless, even the most sophisticated technology cannot entirely eliminate flaws from data. The primary objective remains enhancing the processes feeding into these technologies, establishing a cycle of refinement and growth. And beware of the downside of too much control through technology: when there are too many constraints on capturing the variance in data, people get creative and work around the limitations the technology enforces.

Conclusion

In the age of data-driven insights, the temptation of spotless data can be alluring. Nevertheless, the pursuit of data perfection should not overshadow the essence of reality. Data is an embodiment of genuine events, and its imperfections are key indicators of areas demanding attention and improvement.

Rather than expending energy on rectifying data flaws, organizations should prioritize the enhancement of processes responsible for generating and capturing this data. In doing so, they not only elevate data quality but also nurture a culture of continuous improvement and operational excellence. Every flaw transforms into an opportunity, every discrepancy a potential breakthrough. This approach enables the organization to evolve holistically, guided by the wisdom extracted from its imperfect yet invaluable data.

Stop sticking plasters on data and technology

Do you want to discover a new data revolution?  

Data service organisation - to enable business requirements

Central and de-central data teams: start acting as a (temporary) data owner to service business requirements.

Set up data service desks to be flexible for business requirements.


Data is often seen as the raw material for producing new products, frequently with analytics and AI as the innovative machinery enabling the end result. Recent years have shown that data more often serves as an enabler of multiple business results, leading to efficiency savings, profits, and the ability to maintain existing markets while expanding into new ones.

Data as fundamental oil

Whether it is automated payments and invoicing, online customer interactions, or digital manufacturing, data is the underlying oil that can make your business operations run smoothly.

Or does it? Why is it that, despite the abundance of data, businesses often run sub-optimally, sometimes even relying on manual activities, in this digital age?

“So, if data is the new oil,

why isn’t everything running smoothly?”

All companies generate, receive, and process data to some extent. Data is abundant these days. So, if data is the new oil, why isn’t everything running smoothly?

Here is why: the data itself is complex, and the usage of data is complex. Many companies have tried to resolve this combined complexity through centralized standardization. Many projects aimed at establishing a single data model have become infamous, often ending in disappointment. Alternatively, data solutions seek refuge in technology, often resulting in an increase in applications, which can add to the complexities instead of alleviating them. Above all, centralized standardization requires control, which does not adequately serve the business.

Move from control to empower

The very essence of any business is flexibility, the ability to innovate and develop new products and markets. The business needs to be facilitated by data. So, move from controlling towards empowering.

Empowering means understanding that there is no one-size-fits-all when it comes to data, given the above-mentioned complexities. Data is much like oil for machinery: just compare the oil for ball bearings to petrol. They share the same raw material but differ in volume, characteristics, substance, and processing for different purposes.

How do we see the solution?

With the extensive rise in data volume, complexity, and velocity, a central data team supported by data stewards and architects is no longer sufficient. It requires more decentralized teams that can facilitate specific business needs while adhering to central requirements. Use the motto: control only where needed (for example, using one standard product or client ID across systems) and facilitate where possible (such as adding an additional product ID to support a regional process). Of course, this requires more effort: any additional data requires more maintenance. However, the benefits for the business are immediate and significant. There is no need for the business to change processes, systems, or reporting. Immediate possibilities emerge to make more local variations of products and insights, facilitating specific market requirements. This approach maintains the possibility of working with central initiatives and the option to upgrade or downscale data where possible without affecting central requirements.
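The "control only where needed, facilitate where possible" motto can be sketched as a data structure: one governed central identifier that every system shares, with room for locally added identifiers that never touch the central key. A minimal sketch (all names are hypothetical, not from any specific system):

```python
from dataclasses import dataclass, field

@dataclass
class ProductRecord:
    """One product as seen across systems."""
    central_id: str  # controlled: one standard ID, identical in every system
    local_ids: dict = field(default_factory=dict)  # facilitated: regional/system-specific IDs

    def add_local_id(self, scope: str, value: str) -> None:
        """A region or team adds its own identifier without touching the central key."""
        self.local_ids[scope] = value

# Central reporting keys on central_id only; a regional process adds its own ID freely.
product = ProductRecord(central_id="PRD-000123")
product.add_local_id("region-emea", "EMEA-98-A")

assert product.central_id == "PRD-000123"            # central requirement untouched
assert product.local_ids["region-emea"] == "EMEA-98-A"
```

The point of the split is that central initiatives only ever read `central_id`, so local additions can be upgraded or removed without affecting central requirements.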

“The answer is easy, the deployment is more complex”

How do you facilitate this? The answer is easy; the deployment is much more complex. Have dedicated data teams in place with a close relationship to the business. That data team should consist of senior, well-trained data management experts, data analysts, and data engineers to facilitate and guide the local solutions, including the link to central platforms. The team should be able to answer business questions through a so-called service desk. Such a service desk requires a thorough understanding of the business processes and systems and the translation of data requirements into existing (or missing) data within systems. Preferably, the service desk should have the capability to identify, flag, and resolve regulatory questions on privacy, financial legislation, and health legislation. Make sure that the service desk is enabled by a ticketing workflow, including a dashboard displaying its effort and impact. Finally, that data team should be able to guide the business stakeholders towards the best approaches and solutions. Don’t expect business stakeholders to deliver data requirements; they will have business requirements. If you didn’t know better, this team almost acts like a data owner.


Of course, some data needs to be strictly governed and controlled. There are overarching business requirements (e.g., insight into sales volumes) that require consistency and quality of trusted data. Identify these key data elements and manage them with a strict and tight regime. These key data elements can, for example, be linked to key reporting, identified as being used in most processes, or serve as the primary key within multiple systems. Current data volumes can make this identification a tough job. A good start can be using the Dublin Core standard to identify the right regime. The standard uses the following guidance:
– Which data is related to which process, system, product and report?
– Where is the data being used?
– Why is the data needed (purpose)?
– Who uses the data?
– How is the data labelled and referenced?
– What is the relevance of the data (e.g., static or dynamic)?
– How is the data related to other data?
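The guidance questions above can be captured per data element and then used to decide which regime applies. A minimal sketch of that idea follows; the field names are illustrative answers to the questions, not official Dublin Core element names, and the "strict if used in three or more processes or in key reporting" rule is a hypothetical rule of thumb:

```python
from dataclasses import dataclass, field

@dataclass
class DataElementProfile:
    """Answers to the guidance questions for one data element (fields illustrative)."""
    name: str
    processes: list        # which processes, systems, products and reports use it
    where_used: list       # where the data is being used
    purpose: str           # why the data is needed
    users: list            # who uses the data
    label: str             # how the data is labelled and referenced
    relevance: str         # e.g., "static" or "dynamic"
    related_to: list = field(default_factory=list)  # how it relates to other data

def regime(profile: DataElementProfile, process_threshold: int = 3) -> str:
    """Hypothetical rule of thumb: widely used elements get the strict regime."""
    if len(profile.processes) >= process_threshold or "key reporting" in profile.where_used:
        return "strict"
    return "facilitated"

client_id = DataElementProfile(
    name="client_id", processes=["sales", "invoicing", "CRM"],
    where_used=["key reporting"], purpose="identify clients across systems",
    users=["finance", "marketing"], label="CLIENT_ID", relevance="static")

assert regime(client_id) == "strict"
```

In practice the threshold and criteria would come from the organisation's own key-data-element policy; the value of the sketch is that the regime decision becomes explicit and repeatable instead of ad hoc.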

New way

Having teams act as (temporary) data owners is a new, almost revolutionary way of looking at data. The traditional view, based on data standards (e.g., DAMA, DCAM, ISO), revolves around governance and ownership. That view is based on having data ownership within the business. If you step away from that theoretical view and fall back on lessons learned, then don’t expect business stakeholders to take up sufficient data ownership. For decades, they have perceived data as a by-product.

Most business stakeholders will stay away from data ownership simply because it is unknown territory for them. It is up to the data team to translate the do’s and don’ts regarding data and take up data ownership for the business. In theory, this might even encourage business stakeholders to embrace data ownership through the principle of show and tell.

So, a team of experts is required. Such a team goes beyond the effort of some companies to “simply” assign a data steward who reports on the content of specific fields within systems. Companies should build dedicated teams across the organization, which will often need to invent their own way of working. The way of working will differ per objective. For project goals, make sure you can act fast, agile, and dedicated. For sustainable solutions, ensure that you stay completely aligned with company standards (and enhance a few where needed) to avoid the “not invented here” syndrome. For any purpose, make sure you take the time to understand and align data, data requirements, and business requirements. And actually build solutions – not just on paper, but within apps, databases, data pipelines, and systems.

All of this will require a solid, robust, and senior data leadership team which can manage, sustain, guide and facilitate data responsibilities. Invest in that team.

For examples on data standards, visit: DAMA, DCAM or ISO

Using Data Mesh to Organise Data Management

We recently visited Sander Kerstens to talk about his Data Mesh implementation at Vanderlande. Data Mesh is a new approach to organising enterprise data. It aims to make managing and using data easier for everyone involved. 

Traditionally, data management is organised through a centralised team that is responsible for all enterprise data. Data Mesh decentralises this by distributing the responsibility across smaller teams (called domains) within the enterprise. Instead of central policies and standards applied across enterprise teams, the teams define their own policies and standards.

In Data Mesh, each domain is responsible for the data it generates: the domain decides how its data is managed, processed and shared with other domains. All domains work together in a networked architecture, in turn allowing for greater collaboration and agility.

A core principle behind the Data Mesh philosophy is one that we often write about: treating data as a product. As a product, there should be clear documentation and standards that describe how the data is used and maintained. Much like in traditional data management, Data Mesh stresses the importance of good metadata.

By empowering smaller teams to take ownership of their data and work more closely with other domains, Data Mesh can help organizations to scale and innovate more quickly and efficiently. It bases data management hygiene factors on its principles, rather than having a central data governance team dictate how teams should act. 

This introduces a different way of thinking, which may be more suited to modern enterprises. This depends on the culture of the enterprise, though. One approach is not necessarily better than the other, both have their own strengths and weaknesses which are outlined below. 

Traditional Data Management

Pros:
– Greater control and consistency
– Close alignment with business strategy
– Close alignment with regulatory requirements

Cons:
– Potentially slow and inflexible
– May not meet team/domain-specific requirements

Data Mesh

Pros:
– Agile and responsive to changing business needs
– Can help foster innovation and collaboration between teams

Cons:
– May present challenges around data quality and consistency
– Complex to implement in terms of culture and technical debt

Leading through ownership of personal data: this is what you should know!


Enabling or blocking? Sovereignty of personal data.

Within the digital world, individuals are mostly viewed as potential consumers (obviously already a high share) or patients (currently a growing share). The data of individuals needs to comply with the regulations of the country or region where the data is collected, i.e., it needs to meet privacy and security requirements.

Companies are building views of individuals based on the name, address, email, etc. provided through every registration for an online service, as well as on online behaviour, e.g., through tracking cookies. These centralised views, or centralised identities, are stored within silo-based platforms. Neither personal data nor individual behaviour is easily portable. This means that your digital identity exists in many small pieces, with several companies knowing different information about you. It also means that you have to create a unique password for every profile you make, which can be cumbersome, and many tend to use the same password more than once. All of this creates security risks, since your personal data is being stored and managed by many entities and because a password breach might give access to several of your accounts.

An attempt to address these issues is federated identities. Individual identities are managed in a company or government centralised system, which then distributes the data from the individual to a digital service. Examples where this is in use include banks, insurers, retail and health. A federated identity enables easier digital activities through a single sign-on solution. However, a federated identity is still silo-based, since it can only be used with web services that accept this solution.

“That’s right, SSI sets data ownership at the individual level.”

A next generation of identity solutions that is currently being developed and taken into use is self-sovereign identities (SSI). This type of digital identity is a user-centric identity solution that allows you to be in control of your data and to share only the strictly relevant information. An example would be a situation where you need to prove that you are of age. With an SSI you can document that you are over 18 without disclosing your exact age. Or you can document that you have received a specific vaccine without disclosing all the vaccines you have ever gotten or other sensitive health data. Other examples are sharing your graduation with a (future) employer, your medical record with a hospital, or your bank account with a store. You own and manage your data in your own personal vault (also called a ‘holder’ or ‘wallet’). That’s right, SSI sets data ownership at the individual level. Data ownership would resolve a large topic that often proves to be a blocker for companies fulfilling their digital ambitions. From this vault, you decide with which companies and organisations you want to share your personal data, defined per specific purpose. For this purpose, personal data needs to be classified (e.g., in accordance with privacy & security regulations) into which data is open to all, which is private, and which is secure. The vault provider needs to have good technical solutions (e.g., with verifiers and encryption), a sufficient governance regime, and controls in place to support this.
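The selective-disclosure flow can be sketched in miniature. Real SSI solutions use verifiable credentials with public-key signatures and often zero-knowledge proofs; the toy model below only illustrates the flow (an issuer signs a derived claim, the holder's wallet presents just that claim, and the verifier checks the signature without ever seeing the birth date). The HMAC shared-key shortcut and all names are simplifications, not an actual SSI protocol:

```python
import hmac
import hashlib
from datetime import date

ISSUER_KEY = b"issuer-demo-key"  # stand-in for the issuer's signing key

def sign(claim: str) -> str:
    # Real credentials use public-key signatures; HMAC keeps the sketch short.
    return hmac.new(ISSUER_KEY, claim.encode(), hashlib.sha256).hexdigest()

# Issuer derives a minimal claim from the full attribute and signs it.
birth_date = date(1990, 5, 1)            # stays with the issuer and the holder
claim = "age_over_18=true"
credential = {"claim": claim, "signature": sign(claim)}

# Holder's wallet presents ONLY the derived claim, never the birth date.
presentation = credential

def verify(presentation: dict) -> bool:
    """Verifier checks the signature; it never learns the exact age."""
    expected = sign(presentation["claim"])
    return hmac.compare_digest(expected, presentation["signature"])

assert verify(presentation)
assert "birth_date" not in presentation
```

The design point is that the verifier learns exactly one predicate ("over 18") and nothing else, which is what distinguishes SSI from handing over a full identity document.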

SSI will mean that individuals need to understand what ownership comprises, what the potential risks are, and what good practices for sharing data look like. Data literacy should be extended from mostly companies to more individuals. And companies should prevent technical, legal, ethical, fairness and security pitfalls (see also: 10 principles for SSI), e.g., regarding transparency of systems & algorithms as well as data monetization.

Owning the use of data

D8A

Why your company needs a Chief Data Officer.

It is time to increase acknowledgement of the importance of a chief data officer.

As companies move towards working data-driven, monetizing data in new and enhanced services and products is essential. Traditionally heavily regulated industries, e.g., financial and health, first focused on bringing their data under control. Their efforts concentrated mainly on data quality management, data privacy, data governance and E2E trusted data lineage. These efforts are often led, or owned, by a Chief Data Officer (CDO).

In this article, we advocate shifting or extending this focus of the chief data officer towards data in control AND data in use.

CDOs define and communicate the company’s vision on data management and data use. Through this vision, the CDO gives direction and guidance, advocates for change, and sets priorities for running projects. Most companies’ CDOs have to some extent achieved this for data management. The extended focus of Chief Data Officers, which we advocate in this article, contains standard processes for the design, prototyping, development, productizing and use of data & insights products & services. Furthermore, it is the CDO who defines a standard set of technology to be used to support these processes and create these solutions. Where needed, this is based on the data management foundation as implemented by the CDO in previous years.

The Chief Data Officer ideally combines business expertise, a technology background, and analytics/BI, extended with common commercial sense, an understanding of production processes, and knowledge of relevant 3rd-party partners to cooperate with. Organisations without an ‘extended CDO’ will experience difficulties and potential delays in reaching their data-driven goals in line with new developments in the market. Without strategic guidance and steering, there is an increased risk that departments and units will define their own standard processes, sets of technology, and data-driven products and services, making it harder to leverage pre-existing data foundations as well as cross-unit collaboration to enable effective market penetration. Teams will struggle to escalate and address growing concerns as sufficient C-level representation is missing.

In conclusion, companies benefit from a Chief Data Officer with a focus on data in control and data in use. Top-down ownership and alignment of data initiatives, standardisation of processes and data tooling, and a clear escalation path for growing concerns are necessary to succeed as a data-driven company.

Data quality to resolve 3rd party cookies ban


Marketeers and dedicated advertising benefit from good data quality

Google announced its intention to kill off tracking cookies (so-called 3rd-party cookies) within its Chrome browser: cookies which advertisers use to track users around the web and target them with dedicated ads. Google is not the only major player altering the digital ad landscape. Apple has already made changes to restrict 3rd-party cookies, along with changes to mobile identifiers and email permissions. Big Tech is altering 3rd-party cookies out of the need to respect growing data privacy consciousness. Most consumers don’t like the feeling of being tracked across the internet (70% of U.S. adults want data regulation reform and 63% of internet users indicate that companies should delete their online data completely).

For most marketeers, this paradigm change presents huge challenges to enable customer acquisition by tracking users and targeting them with dedicated digital advertising.

On the other hand, 3rd party cookies are inherently problematic, from limited targeting capabilities, inaccurate attribution to the personalization & privacy paradox. Their loss presents an opportunity to provide a smaller group of high-value customers with higher-caliber and increasingly personalized experiences. In other words, losing these cookies might become a blessing in disguise.

Confronting data acquisition challenges in a cookie-less future

For all the shortcomings of 3rd-party cookies, the marketing industry does not yet have a perfect answer for how to acquire customers without them. Marketeers are waking up to the impactful change they are facing. One potential answer to the loss of 3rd-party cookies is replacing them with 1st & 2nd* party data, i.e., gathering data shared directly by the customer, such as an email address, phone numbers, and customer authenticators (see below). This data can become the mutual currency for the advertising business. First-party data can be hard to obtain; you need to “earn” it, which calls for solutions for gaining good-quality 1st-party data.

Technology

Some solutions focus on technology, e.g., Google’s Federated Learning of Cohorts (FLoC), a type of web tracking that groups people into “cohorts” based on their browsing history for interest-based advertising. Other technology solutions include building a 1st- and 2nd-party data* pool, i.e., a Customer Data Platform (CDP). CDPs are built as a complete data solution across multiple sources. By integrating all customer data into individual profiles, a full view of customers can be built. Another solution is private identity graphs, which hold all the identifiers that correlate with individuals. Private identity graphs can unify digital and offline first-party data to inform the single customer view and manage the changes that occur over time. This helps companies generate a consistent, persistent 360-degree view of individuals and their relationship with the company, e.g., per product brand. All to enable stronger relationships with new and existing customers.
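The unification step a CDP or identity graph performs can be sketched as merging per-source customer records on a shared identifier. This is a deliberately simplified illustration; a real CDP or identity graph also handles fuzzy matching, multiple identifiers, consent, and identity changes over time:

```python
from collections import defaultdict

def unify(sources: dict) -> dict:
    """Merge per-source customer records into one profile per email address."""
    profiles = defaultdict(dict)
    for source, records in sources.items():
        for record in records:
            key = record["email"].lower()  # naive match key; real graphs use many identifiers
            profiles[key].setdefault("sources", set()).add(source)
            for field_name, value in record.items():
                # First value seen wins; real systems apply survivorship rules.
                profiles[key].setdefault(field_name, value)
    return dict(profiles)

sources = {
    "webshop": [{"email": "Ada@example.com", "name": "Ada"}],
    "newsletter": [{"email": "ada@example.com", "topic": "sales"}],
}
profiles = unify(sources)

assert profiles["ada@example.com"]["sources"] == {"webshop", "newsletter"}
assert profiles["ada@example.com"]["name"] == "Ada"
```

Even this toy version shows why data quality matters for the single customer view: a typo in the match key splits one individual into two profiles.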

Earning good-quality data will increase the need for standardized, good-quality customer journeys, and therefore the need for standardized, good-quality data.
Where previously design and data quality were not closely connected, the vanishing 3rd-party cookie now acts as a catalyst to integrate both.

Data quality is usually an unknown phenomenon to most designers**, design companies, front- & back-end software developers, and marketeers. It requires a combined understanding of multiple domains, i.e., the user interface where data will be captured, the underlying processes which the captured data will facilitate, data storage & database structures, and marketing (analysis) purposes.

Finding an expert who has all this combined knowledge is like finding a real gem. If you do, handle with care!
More likely, each domain will need at least an understanding of how it enhances and impacts the other domains.

For the (UI/UX) designer:

  • Have a good knowledge of data quality rule types. What is the difference between a format and an accuracy type? Is timeliness of data relevant? What are the pitfalls for data quality rules? How do you integrate multiple purposes (e.g., processes, data integration & analytics) into a dedicated data quality rule?
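The distinction between rule types matters in practice: a format rule checks only the shape of a value, an accuracy rule checks it against reality (e.g., a trusted reference list), and a timeliness rule checks whether it is recent enough. A minimal sketch of the three, where the postcode pattern, reference list, and one-year threshold are made-up examples:

```python
import re
from datetime import date, timedelta

VALID_COUNTRY_CODES = {"NL", "DE", "BE"}  # illustrative reference data

def format_ok(postcode: str) -> bool:
    """Format rule: Dutch postcode shape '1234 AB'; says nothing about whether it exists."""
    return re.fullmatch(r"\d{4} [A-Z]{2}", postcode) is not None

def accuracy_ok(country_code: str) -> bool:
    """Accuracy rule: the value must match a trusted reference list."""
    return country_code in VALID_COUNTRY_CODES

def timely_ok(last_updated: date, max_age_days: int = 365) -> bool:
    """Timeliness rule: the value must have been refreshed recently enough."""
    return date.today() - last_updated <= timedelta(days=max_age_days)

assert format_ok("1234 AB") and not format_ok("12345")
assert accuracy_ok("NL") and not accuracy_ok("XX")
assert timely_ok(date.today())
```

A value like "9999 ZZ" passes the format rule but could still fail an accuracy rule against a postcode register, which is exactly the pitfall a designer should understand before wiring validation into a UI.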

For product owners:

  • Ensure that expertise of data entry, and of how data is used within processes, is available at a granular level (i.e., at data field level). Onboard a so-called data steward who can facilitate the correct input for data quality. Let the data steward cooperate with front-end developers and designers.
  • Keep your data fresh. Data doesn’t last forever. Make sure data stewards support data updates and cleansing.
  • Data stewards should work with designers and front-end developers to determine which fields are considered critical. These fields should be governed by a strict regime, e.g., for the quality and timeliness of data as well as for access to the data and usage purposes.
  • Personal authentication is a separate topic that needs to be addressed as such. Relying on big tech firms such as Facebook or Google can seem an easy solution, but it increases the risk of being dependent on an external party. Yet authentication needs to be earned to build authentic customer relationships. When customers give a company verifiable, durable pieces of identity data, they are considered authenticated (e.g., signing up for a newsletter or new account via email address). This will be a new way of working for most companies. Therefore, data stewards need to up their game and not only know existing processes but extend their view, understanding and knowledge towards new developments.
  • Data stewards must align with the Data Privacy Officer on how to capture, store and process data. When it comes to privacy, compliance and ethics, you can never play it too safe.

For data storage & databases:

  • Ensure that a data architect (or at least a business analyst) is involved in the design process. This is sometimes resolved by the back-end developer (who cannot work without aligning with the architecture office on data integration, models for databases, and data definitions).
  • If standardized data models and/or data definitions reside within the organization, this should be part of the database development. Refer to authoritative source systems where possible.
  • If the application is made via low-code, standardization of existing data models/architecture, data definitions and data quality rules is often part of the approach. Yet data quality checks should always take place as a separate activity.

For marketeers:

  • Understand how customer journeys can facilitate 1st- and 2nd-party cookies. Determine which data is needed for insights. Gather insights requirements and work together with the data steward to define data quality rules that facilitate your insights. Now that the 3rd-party source is limited, the value of the customer journey for marketing increases!
  • Privacy is one of the catalysts making 3rd-party cookies disappear. This requires a new approach to acquiring personal data for marketing and ad targeting: new developments that require new skills and, more importantly, a new cooperation between existing domains. Companies that enable this will lead this new way of working.

Footnotes:

* Data from 1st-party cookies: occurs only within a company’s own domain. Data from 2nd-party cookies: can be used within and outside a company’s own domain. This article takes mostly 1st-party data into account. For 2nd-party data, you can further investigate, e.g., ‘data co-ops’: complementary companies that share data. Each member of the co-op should relate to the others in a meaningful way, because outside of your own web domain you’ll be able to reach customers only on your partner sites, and this reflects on your brand.

** Of course, there are designers who work with data-enabled design. In the view of this article, this is a different topic, more focused on tracking & logging data, which is then analyzed to improve the design. This article is about good data quality when data is entered via a UI, e.g., as part of a customer journey.