To improve data quality for better AI, stop fixing it

In the rapidly evolving landscape of modern business, data stands as a cornerstone for informed decision-making, strategic planning, and overall organizational success. Yet, the quality of data often comes under scrutiny due to inherent inaccuracies introduced during its creation and capture. While the natural inclination might be to rectify these data flaws directly, a more effective approach would involve addressing the root causes by refining the processes responsible for data generation and capture. This article advocates for a shift in perspective—rather than fixating on perfecting data, organizations should prioritize enhancing the underlying business processes. By doing so, the integrity of data as a factual representation of reality can be preserved, while the flaws in data can be harnessed for improving the overall business processes.

Preserving Data Integrity

Data serves as the bedrock upon which organizations build their strategies and navigate their journeys. It acts as a bridge between past occurrences and future aspirations, enabling businesses to discern patterns, identify opportunities, and mitigate potential risks. In the pursuit of maintaining data integrity as a true reflection of reality, the emphasis should be on upholding the authenticity of data rather than attempting to correct every blemish.

The act of post hoc data correction risks distorting the historical narrative, potentially leading to misguided decisions based on altered information. For instance, correcting logistics movement data to complete the data chain might offer short-term relief in the form of consistent reports, but it sidesteps valuable insights: that manual workarounds are being used to compensate for process disturbances, or that underlying product performance issues need addressing.

Leveraging Flaws for Process Enhancement

The inconsistencies and inaccuracies that reside within data are not mere obstacles; they are signposts pointing towards areas of improvement within business processes. Rather than viewing these flaws as hindrances, organizations should perceive them as guideposts directing the way toward operational refinement.

Imagine a scenario where a retail company routinely records variations in inventory figures. Instead of simply modifying the data to match desired outcomes, the organization should seize this opportunity to delve into the processes leading to these inconsistencies. By dissecting the supply chain, identifying weak links, and implementing corrective measures, the company not only rectifies data accuracy but also optimizes its operations, potentially resulting in cost savings, heightened customer satisfaction, and enhanced product quality.

Connect the dots: From Data to Processes

The inclination to correct data discrepancies often stems from a desire for immaculate datasets that appear to ensure more accurate decision-making. However, this mindset overlooks the fact that data is a mirror reflecting real-world occurrences, which inherently contain imperfections. Rather than chasing an unrealistic data ideal, organizations should pivot towards process improvement as the bedrock of data quality enhancement.

Through meticulous examination of the processes responsible for data creation and capture, organizations can unearth systemic issues that might be undermining their operations. This shift in focus embodies the essence of continuous improvement—a philosophy that emphasizes identifying and rectifying systemic shortcomings over superficial data adjustments.

Empowering Technology

Embracing process enhancement over data rectification does not negate the significance of technology. Indeed, technology can play a pivotal role in automating and streamlining processes, minimizing the introduction of errors in the first place. Automation reduces the likelihood of human fallibility, a significant contributor to data inaccuracies. Furthermore, technology can be harnessed to integrate checks and balances within data capture systems, ensuring accurate and consistent data entry.

Nevertheless, even the most sophisticated technology cannot entirely eliminate flaws from data. The primary objective remains enhancing the processes feeding into these technologies, establishing a cycle of refinement and growth. And beware of the downside of too much control through technology: when there are too many constraints on capturing the natural variance in data, people get creative and work around the limitations the technology enforces.

Conclusion

In the age of data-driven insights, the temptation of spotless data can be alluring. Nevertheless, the pursuit of data perfection should not overshadow the essence of reality. Data is an embodiment of genuine events, and its imperfections are key indicators of areas demanding attention and improvement.

Rather than expending energy on rectifying data flaws, organizations should prioritize the enhancement of processes responsible for generating and capturing this data. In doing so, they not only elevate data quality but also nurture a culture of continuous improvement and operational excellence. Every flaw transforms into an opportunity, every discrepancy a potential breakthrough. This approach enables the organization to evolve holistically, guided by the wisdom extracted from its imperfect yet invaluable data.

Stop sticking plasters on data and technology

The Metaverse: new frontier that re-imagines retail & health. And data re-imagines the Metaverse.

D8A vision on metaverse

Avoid mistakes. How data dependent is the newest digital development?

The Metaverse, which originated in science fiction and is frequently applied to gaming platforms, is now trending in e-commerce and increasingly in healthcare.

Simply stated, the Metaverse is a connection between the physical and virtual worlds and is seen as the successor to the mobile internet.

Purchasing a digital product in one ecosystem in the Metaverse (e.g., Facebook) allows you to use it in another (like TikTok). Or buy a physical product that includes a digital twin (a digital representation, as well as a statistic, for your online persona). And vice versa: the digital twin can be used to increase sales at a physical location; if a physical product is not available in a shop, the digital twin can be shown as an example. Want to model how a car would behave under the same conditions as in the physical world you are in right now (weather, population, other vehicles on the road)? You can in the Metaverse. Or just google Fortnite’s Ariana Grande concert in the Metaverse.

Meta-commerce, if you like: beyond e-commerce. The Metaverse is also beyond virtual reality. It is a hybrid of VR, AR and mixed reality, and it can interact with real life.

So interoperability between ecosystems is key. This is where blockchain's timestamps can show their worth beyond crypto! The same goes for data. To understand the value of good data, you need to think beyond the structured data that usually comes to mind when we talk about data, AI and innovation.

The Metaverse is all about unstructured data, or files: images, videos, music, SEO content. This is often a neglected area when it comes to data quality concepts such as accuracy (high, medium, low quality), timeliness, versioning (which originates from archiving principles and is now becoming directly related to core business and product life cycle management), format and completeness. Each file needs sufficient data quality rules, definitions and other metadata to enable the interoperability mentioned above. It needs to be totally clear, for instance, whether the digital twin you are buying is from this season or last season. And don't forget hygiene factors such as ownership (who owns the digital twin? who can re-sell it?), customization possibilities, portability, sharing agreements, security and, most of all, privacy. Privacy and the Metaverse is a hot topic; being in accordance with legislation is key in a highly digital world.
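To make these quality concepts concrete, here is a minimal sketch in Python of how quality metadata for a single unstructured asset might be recorded and checked. The schema, field names and rules are illustrative assumptions for this article, not a standard or an existing product.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical metadata record for a digital asset (e.g., a digital twin of a garment).
# Field names are illustrative, not a standard schema.
@dataclass
class AssetMetadata:
    asset_id: str
    file_format: str            # e.g., "glTF", "png", "mp4"
    resolution: str             # accuracy proxy: "high", "medium", "low"
    season: str                 # versioning tied to the product life cycle, e.g., "SS24"
    created_on: date            # timeliness
    owner: str                  # who owns the twin and may re-sell it
    sharing_agreement: str = "" # portability / usage terms

def quality_issues(meta: AssetMetadata, current_season: str) -> list[str]:
    """Apply simple data quality rules to one asset's metadata."""
    issues = []
    if meta.file_format.lower() not in {"gltf", "png", "mp4"}:   # format rule
        issues.append("unsupported file format")
    if meta.resolution not in {"high", "medium", "low"}:          # accuracy rule
        issues.append("unknown resolution class")
    if meta.season != current_season:                             # versioning / timeliness
        issues.append("asset belongs to a previous season")
    if not meta.owner:                                            # completeness / ownership
        issues.append("ownership not recorded")
    return issues

meta = AssetMetadata("twin-001", "glTF", "high", "SS23", date(2023, 3, 1), "BrandCo")
print(quality_issues(meta, current_season="SS24"))   # ['asset belongs to a previous season']
```

Even a lightweight check like this makes explicit which hygiene factors (ownership, versioning, format) travel with the file when it crosses ecosystem boundaries.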

From an analytics perspective, the Metaverse is similar to AR & VR: it needs high-quality training data. That means data sets need to be accurate and fit for use, with bias removed and good data labeling in place, based on standard classifications.

The Metaverse within the healthcare sector seems a logical next move. Here the ownership, portability and privacy are even more significant. Further increasing the value of good data quality, governed by a fitting regime.

In short, the Metaverse is the upcoming opportunity to increase the value of good data. And for businesses to become further data driven.

Leading through ownership of personal data, this is what you should know!

#innovation #data

Enabling or blocking? Sovereignty of personal data.

Within the digital world, individuals are mostly viewed as (potential) consumers (obviously already a high share) or patients (a currently growing share). The data of individuals needs to comply with the regulations of the country or region where the data is collected, i.e., it needs to satisfy privacy and security requirements.

Companies are building views of individuals based on the name, address, email etc. provided through every registration to an online service, as well as on online behaviour, e.g., through tracking cookies. These centralised views or centralised identities are stored within silo-based platforms. Neither personal data nor individual behaviour is easily portable. This means that your digital identity exists in many small pieces, with several companies knowing different things about you. It also means that you have to create a unique password for every profile you make, which is cumbersome, so many people tend to reuse the same password. All of this creates security risks, since your personal data is being stored and managed by many entities and a password breach might give access to several of your accounts.

An attempt to address these issues is federated identities. Individual identities are managed in a centralized system run by a company or government, which then distributes the individual's data to a digital service. Examples of where this is in use include banks, insurers, retail and health. A federated identity enables easier digital activities through a single sign-on solution. However, a federated identity is still silo-based, since it can only be used with web services that accept this solution.

“………That’s right, SSI sets data ownership at the individual level.”

A next generation of identity solutions that is currently being developed and taken into use is self-sovereign identities (SSI). This type of digital identity is a user-centric identity solution that allows you to be in control of your data and share only the strictly relevant information. An example would be a situation where you need to prove that you are of age: with an SSI you can document that you are over 18 without disclosing your exact age. Or you can document that you have received a specific vaccine without disclosing all the vaccines you have ever received or other sensitive health data. Other examples are sharing that you have graduated with your (future) employer, your medical record with a hospital and your bank account with a store. In your own personal vault, if you like (also called a 'holder' or 'wallet'), you own and manage your data. That's right, SSI sets data ownership at the individual level. Individual data ownership would resolve a large topic that often proves to be a blocker for companies fulfilling their digital ambitions. From this vault, you decide with which companies and organisations you want to share your personal data, defined per specific purpose. For this purpose, personal data needs to be classified (e.g., in accordance with privacy and security regulations) into which data is open to all, which is private and which is secured. The vault provider needs good technical solutions (e.g., verifiers and encryption), a sufficient governance regime and controls in place to support this.
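To illustrate the selective-disclosure idea, here is a minimal, non-cryptographic sketch in Python: the wallet holds the full credential, but only a derived yes/no claim leaves it. The data structures and function names are hypothetical; real SSI stacks add verifiable credentials, signatures and zero-knowledge proofs on top of this idea.

```python
from datetime import date

# Conceptual sketch of selective disclosure: the holder derives only the claim the
# verifier needs ("over 18: yes/no") from a credential stored in their wallet, and
# never shares the underlying birth date. The cryptographic machinery that makes
# such claims verifiable in real SSI solutions is deliberately omitted.

credential = {            # stored in the individual's vault/wallet (illustrative)
    "birth_date": date(1990, 5, 17),
    "vaccinations": ["measles", "covid-19"],
}

def prove_over_18(cred: dict, today: date) -> dict:
    """Return a minimal presentation containing only the derived predicate."""
    age = (today - cred["birth_date"]).days // 365   # approximate age in years
    return {"claim": "over_18", "value": age >= 18}  # no birth date disclosed

def prove_vaccination(cred: dict, vaccine: str) -> dict:
    """Disclose whether one specific vaccine was received, nothing else."""
    return {"claim": f"vaccinated_{vaccine}", "value": vaccine in cred["vaccinations"]}

print(prove_over_18(credential, date.today()))    # {'claim': 'over_18', 'value': True}
print(prove_vaccination(credential, "covid-19"))  # {'claim': 'vaccinated_covid-19', 'value': True}
```

The design point is that the verifier receives an answer to its question, not the raw personal data behind it.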

SSI will mean that individuals need to understand what ownership comprises, what the potential risks are and what good practices for sharing data look like. Data literacy should be extended from mostly companies to more individuals. And companies should prevent technical, legal, ethical, fairness and security pitfalls (see also: 10 principles for SSI), e.g., regarding transparency of systems and algorithms as well as data monetization.

Data quality to resolve the 3rd party cookie ban

D8A

Marketeers and dedicated advertising benefit from good data quality

Google announced its intention to kill off tracking cookies (so-called 3rd party cookies) within its Chrome browser: cookies that advertisers use to track users around the web and target them with dedicated ads. Google is not the only major player altering the digital ad landscape. Apple has already made changes to restrict 3rd party cookies, along with changes to mobile identifiers and email permissions. Big Tech is altering 3rd party cookies out of the need to respect growing data privacy consciousness. Most consumers don't like the feeling of being tracked across the internet (70% of U.S. adults want data regulation reform and 63% of internet users indicate that companies should delete their online data completely).

For most marketeers, this paradigm change presents huge challenges, as customer acquisition has relied on tracking users and targeting them with dedicated digital advertising.

On the other hand, 3rd party cookies are inherently problematic, from limited targeting capabilities and inaccurate attribution to the personalization & privacy paradox. Their loss presents an opportunity to provide a smaller group of high-value customers with higher-caliber and increasingly personalized experiences. In other words, losing these cookies might turn out to be a blessing in disguise.

Confronting data acquisition challenges in a cookie-less future

For all the shortcomings of 3rd party cookies, the marketing industry does not yet have a perfect answer for how to acquire customers without them. Marketeers are waking up to the impactful change they are facing. One potential answer to the loss of 3rd party cookies is to replace them with 1st & 2nd party data*, i.e., data shared directly by the customer, such as an email address, phone numbers and customer authenticators (see below). This data can become the mutual currency of the advertising business. First party data can be hard to obtain: you need to “earn” it, which requires solutions for gaining good-quality 1st party data.

Technology Section

Some solutions focus on technology, e.g., Google’s Federated Learning of Cohorts (FLoC), a type of web tracking that groups people into “cohorts” based on their browsing history for interest-based advertising. Other technology solutions include building a 1st and 2nd party data* pool, i.e., a Customer Data Platform (CDP). CDPs are built as a complete data solution across multiple sources. By integrating all customer data into individual profiles, a full view of each customer can be built. Another solution is private identity graphs, which hold all the identifiers that correlate with an individual. Private identity graphs can unify digital and offline first-party data to inform the single customer view and manage the changes that occur over time. This helps companies generate a consistent, persistent 360-degree view of individuals and their relationship with the company, e.g., per product brand. All to enable stronger relationships with new and existing customers.
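As a rough illustration of what such profile unification does, the following Python sketch links first-party records from several sources on a shared identifier. The record fields, source names and matching rule are assumptions for the example, not a description of any specific CDP or identity graph product; production systems match on many identifiers and apply richer survivorship rules.

```python
from collections import defaultdict

# Link records from different first-party sources into one customer profile,
# matching on a normalised email address (a deliberate simplification).
records = [
    {"source": "webshop",    "email": "Jane@Example.com", "phone": None,          "last_purchase": "2024-03-01"},
    {"source": "newsletter", "email": "jane@example.com", "phone": "+3161234567", "last_purchase": None},
    {"source": "store_pos",  "email": "jane@example.com", "phone": None,          "last_purchase": "2024-05-12"},
]

profiles = defaultdict(dict)
for rec in records:
    key = rec["email"].strip().lower()                 # normalise the matching identifier
    profile = profiles[key]
    for field_name, value in rec.items():
        if field_name == "source":
            continue
        if value and not profile.get(field_name):      # keep the first non-empty value
            profile[field_name] = value
    profile.setdefault("sources", []).append(rec["source"])

print(profiles["jane@example.com"])                    # one unified, 360-degree-style profile
```

The quality of the resulting profile depends directly on the quality of the identifiers captured in each journey, which is exactly where the data quality practices below come in.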

Earning good-quality data will increase the need for standardized, good-quality customer journeys, and therefore the need for standardized, good-quality data.
Where previously design and data quality were not closely connected, the vanishing 3rd party cookies now act as a catalyst to integrate both.

Data quality is usually an unknown phenomenon for most designers**, design companies, front- and back-end software developers and marketeers. It requires a combined understanding of multiple domains, i.e., the user interface where data will be captured, the underlying processes that the captured data will facilitate, data storage & database structures, and marketing (analysis) purposes.

Finding an expert who has all this combined knowledge is like finding a real gem. If you do, handle with care!
More likely, each domain will need at least an understanding of how it enhances and impacts the other domains.

For the (UI/UX) designer:

  • Have a good knowledge of data quality rule types. What is the difference between a format and an accuracy rule? Is timeliness of data relevant? What are the pitfalls for data quality rules? How do you integrate multiple purposes (e.g., processes, data integration & analytics) into a dedicated data quality rule? See the sketch below for an illustration of the main rule types.
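For readers less familiar with these rule types, the sketch below (Python, with illustrative field names, patterns and thresholds that are assumptions of this example) shows how a format rule, an accuracy rule and a timeliness rule differ when applied to the same captured record.

```python
import re
from datetime import date, timedelta

# One captured record from a hypothetical sign-up form.
record = {"email": "jane@example", "country": "Netherland", "captured_on": date(2023, 1, 2)}

VALID_COUNTRIES = {"Netherlands", "Belgium", "Germany"}   # authoritative reference list

def format_rule(value: str) -> bool:
    # Format: does the value match the expected pattern, regardless of whether it is true?
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value) is not None

def accuracy_rule(value: str) -> bool:
    # Accuracy: does the value agree with an authoritative reference?
    return value in VALID_COUNTRIES

def timeliness_rule(captured_on: date, max_age_days: int = 365) -> bool:
    # Timeliness: is the value recent enough for its intended use?
    return date.today() - captured_on <= timedelta(days=max_age_days)

print(format_rule(record["email"]))        # False: missing a top-level domain
print(accuracy_rule(record["country"]))    # False: not in the reference list
print(timeliness_rule(record["captured_on"]))
```

A single field can pass one rule type and fail another, which is why the purposes behind each rule need to be agreed up front.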

For product owners:

  • Ensure that expertise on data entry, and on how data is used within processes, is available at a granular level (i.e., at data field level). Onboard a so-called data steward who can facilitate the correct input for data quality. Let the data steward cooperate with front-end developers and designers.
  • Keep your data fresh. Data doesn’t last forever. Make sure data stewards support data updates and cleansing.
  • Data stewards should work with designers and front-end developers to determine which fields are considered as critical. These fields should be governed by a strict regime, e.g., for the quality and timeliness of data as well as for access to the data and usage purposes.
  • Personal authentication is a separate topic that needs to be addressed as such. Relying on big tech firms such as Facebook or Google can seem an easy solution, but it increases the risk of being dependent on an external party. Authentication needs to be earned to build authentic customer relationships. When customers give a company a verifiable, durable piece of identity data, they are considered authenticated (e.g., signing up for a newsletter or a new account via an email address). This will be a new way of working for most companies. Therefore, data stewards need to up their game and not only know existing processes but extend their view, understanding and knowledge towards new developments.
  • Data stewards must align with the Data Privacy Officer on how to capture, store and process data. When it comes to privacy, compliance and ethics, you can never play it too safe.

For data storage & databases:

  • Ensure that a data architect (or at least a business analyst) is involved in the design process. This is sometimes covered by the back-end developer (who cannot work without aligning with the architecture office on data integration, database models and data definitions).
  • If standardized data models and/or data definitions reside within the organization, this should be part of the database development. Refer to authoritative source systems where possible.
  • If the application is made via low-code, standardization of existing data models/architecture, data definitions and data quality rules is often part of the approach. Yet data quality checks should always take place as a separate activity.

For marketeers:

  • Understand how customer journeys can facilitate 1st and 2nd party cookies. Determine which data is needed for insights. Gather insights requirements and work together with the data steward to define data quality rules that facilitate your insights. Now that the 3rd party source is limited, the value of the customer journey for marketing increases!
  • Privacy is one of the catalysts making 3rd party cookies disappear. This requires a new approach to acquiring personal data for marketing and ad targeting: new developments that require new skills and, more importantly, a new cooperation between existing domains. Companies that enable this will lead this new way of working.

Footnotes:

* Data from 1st party cookies = occurs only within a company’s own domain. Data from 2nd party cookies = can be used within and outside a company’s own domain. This article takes mostly 1st party data into account. For 2nd party data, you can further investigate, e.g., ‘data co-ops’: complementary companies that share data. Each member of the co-op should relate to the others in a meaningful way, because outside of your own web domain you’ll only be able to reach customers on your partner sites, and this reflects on your brand.

** Of course, there are designers who work with data-enabled design. In the view of this article, this is a different topic, more focused on tracking & logging data, which is then analyzed to improve the design. This article is about good data quality when data is entered via a UI, e.g., as part of a customer journey.