
D8A directors

Posted on 31/05/2023

How to make data governance practical using datasets

Nowadays, even when you have implemented a data catalog that scans your data, a search for the word ‘customer’ returns far too many results, often far from relevant. Scrolling through the first page of results makes you wonder which of them would actually meet your data needs. Is the data properly described? Do you know whom to ask when you have questions? And are you even allowed to use privacy-sensitive data for your use case?

Documenting and maintaining these basic properties of your data is challenging and time-consuming in itself. And at what granularity should you do it? If you listen to legal and compliance, every single attribute of every table in every system must be properly protected. This seems like an impossible task. And it is!

How can you practically approach this?

Start organising your data into datasets!

Classifying and defining all your data attributes is a cumbersome and time-consuming task. At some point you should do it, at least for the data you share. But waiting until all of this is done is not an option. Business must go on. So how can we deal with this big pile of work in the meantime? The answer is to start organising your data into datasets.

What do we mean by dataset?

There are varying opinions on what a dataset is. At D8A Directors we consider a dataset to be a collection of physical data objects that need to be managed (governed) together. This can be a single table or a selection of tables that make up the customer master data, or the collection of tables that hold your inbound shipments. The same concept applies to unstructured data, such as videos or images collected during a particular event.
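To make the definition concrete, here is a minimal Python sketch of a dataset as a named collection of physical objects (tables, folders of files) that share one owner and one retention term. The class, its field names and the example locations are hypothetical illustrations, not a D8A Directors schema or any catalog product's API.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A named group of physical data objects that are governed together.

    Hypothetical model: field names are illustrative, not a standard schema.
    """
    name: str
    physical_objects: list[str] = field(default_factory=list)  # tables, folders, files
    owner: str = ""          # single accountable data owner
    retention_days: int = 0  # retention term that applies to every member object

    def add(self, physical_object: str) -> None:
        # Adding an object means it falls under this dataset's governance metadata.
        if physical_object not in self.physical_objects:
            self.physical_objects.append(physical_object)


# Structured data: a selection of tables forming the customer master data.
customer_master = Dataset("customer_master", owner="Head of Sales", retention_days=7 * 365)
customer_master.add("crm.customer")
customer_master.add("crm.customer_address")

# Unstructured data: images collected during one (hypothetical) event.
event_media = Dataset("trade_fair_2023_media", owner="Marketing lead", retention_days=365)
event_media.add("media-store/trade-fair-2023/")
```

The owner and retention term are set once on the dataset, however many tables or files it contains.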

Governance is then applied to all the data in a dataset at once. When you combine physical data that needs to be managed together into datasets, you no longer need to manage every individual attribute. The question then becomes: which governance requirements should you consider when deciding how to combine data?
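A minimal sketch of what "not managing every individual attribute" can look like, assuming a simple mapping from physical tables to datasets: governance metadata is stored once per dataset, and the owner and classification of any attribute are resolved through the table it belongs to. All dataset, table and owner names are hypothetical.

```python
# Hypothetical governance metadata: one record per dataset, not per attribute.
DATASET_GOVERNANCE = {
    "customer_master": {"owner": "Head of Sales", "classification": "Confidential"},
    "inbound_shipments": {"owner": "Logistics manager", "classification": "Internal"},
}

# Hypothetical membership: which dataset each physical table belongs to.
TABLE_TO_DATASET = {
    "crm.customer": "customer_master",
    "crm.customer_address": "customer_master",
    "wms.inbound_shipment": "inbound_shipments",
    "wms.inbound_shipment_line": "inbound_shipments",
}


def governance_for(attribute: str) -> dict:
    """Resolve governance for a fully qualified attribute 'schema.table.column'."""
    table = attribute.rsplit(".", 1)[0]  # drop the column name, keep 'schema.table'
    return DATASET_GOVERNANCE[TABLE_TO_DATASET[table]]


print(governance_for("crm.customer_address.street")["owner"])
```

Any new column added to `crm.customer_address` is covered automatically, because nothing is recorded at column level.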

Dimensions to separate data into datasets

To be honest, the dimensions to take into account should be defined in your organisation's policies by the Data Governance department. The legal grounds that allow you to process data vary per organisation, so there is no single set of dimensions. However, to get you started, here are some good-practice dimensions:

  • Source system
  • Data product
  • Geographical region (because of regional legislation)
  • Responsible data owner (single ownership for accountability)
  • Purpose limitations
  • Retention term
  • Business & privacy sensitivity classification
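One way to apply these dimensions, sketched in Python under the assumption that each table has already been profiled on them: tables that agree on every dimension fall into the same candidate dataset, and a difference in any single dimension (a different region, owner or retention term, for instance) splits them apart. The `TableProfile` fields, table names and values below are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class TableProfile(NamedTuple):
    """Governance-relevant facts about one physical table (hypothetical fields)."""
    table: str
    source_system: str
    data_product: str
    region: str
    owner: str
    purpose: str
    retention_years: int
    classification: str


def propose_datasets(tables: list[TableProfile]) -> dict[tuple, list[str]]:
    """Group tables that share the same value on every dimension."""
    groups: dict[tuple, list[str]] = defaultdict(list)
    for t in tables:
        key = (t.source_system, t.data_product, t.region, t.owner,
               t.purpose, t.retention_years, t.classification)
        groups[key].append(t.table)
    return dict(groups)


tables = [
    TableProfile("hr_de.employee", "hr_system", "HR", "DE", "HR director DE", "payroll", 10, "Confidential"),
    TableProfile("hr_de.contract", "hr_system", "HR", "DE", "HR director DE", "payroll", 10, "Confidential"),
    TableProfile("hr_nl.employee", "hr_system", "HR", "NL", "HR director NL", "payroll", 7, "Confidential"),
    TableProfile("hr_de.perf_review", "hr_system", "HR", "DE", "HR director DE",
                 "talent management", 5, "Strictly confidential"),
]

candidates = propose_datasets(tables)
for key, members in candidates.items():
    print(key, members)
```

The result is only a proposal; the Data Governance department still decides which dimensions are mandatory in your organisation.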

The goal of a dataset is to group physical data for the purposes of:

  • Logically organising data
  • Data ownership and data governance
  • Requesting and approving access against purpose limitations
  • Granting and managing security and data access rights
  • Managing data retention and disposal
  • Classifying data for privacy
  • Classifying data for geographical regions
  • Making reuse of data easier
  • Archiving data to cheaper storage tiers
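As an illustration of managing retention, disposal and archiving at dataset level, the sketch below assumes a hypothetical policy record per dataset with an archiving age and a retention term. The thresholds and dataset names are made up for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionPolicy:
    archive_after_days: int  # move to a cheaper storage tier after this age
    dispose_after_days: int  # retention term: delete after this age


# Hypothetical policies, defined once per dataset rather than per attribute.
POLICIES = {
    "inbound_shipments": RetentionPolicy(archive_after_days=365, dispose_after_days=7 * 365),
    "perf_reviews": RetentionPolicy(archive_after_days=180, dispose_after_days=2 * 365),
}


def retention_action(dataset: str, created: date, today: date) -> str:
    """Return 'keep', 'archive' or 'dispose' for data created on a given date."""
    policy = POLICIES[dataset]
    age_days = (today - created).days
    if age_days > policy.dispose_after_days:
        return "dispose"
    if age_days > policy.archive_after_days:
        return "archive"
    return "keep"


print(retention_action("inbound_shipments", date(2022, 1, 1), date(2023, 5, 31)))
```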

Let’s make it practical

Decomposing datasets distinguishes conceptual data entities, data objects and data attributes (the latter two from a logical model). These should then be linked to the actual physical data.

[Figure: Decomposition of datasets, ranging from business users to technology users. Datasets group data objects on sensitivity, retention, ownership, regulatory region, etc. to enable governance and access management. Conceptual data entities are the data representations understood by business users and relate to business objects affected by business processes. Data objects detail data concepts to the level of tables and attributes. Data attributes describe the data that is captured, like the columns of a table.]
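The decomposition can be modelled as a simple hierarchy that is linked to physical data at the attribute level. This Python sketch uses hypothetical class names and physical column names; it only shows how a business concept can be traced down to the columns that implement it.

```python
from dataclasses import dataclass, field


@dataclass
class DataAttribute:
    name: str             # logical attribute, e.g. "street"
    physical_column: str  # where it actually lives (hypothetical location)


@dataclass
class DataObject:
    name: str  # table-level object, e.g. "Employee address"
    attributes: list[DataAttribute] = field(default_factory=list)


@dataclass
class ConceptualEntity:
    name: str  # business term, e.g. "Employee"
    data_objects: list[DataObject] = field(default_factory=list)


def physical_columns(entity: ConceptualEntity) -> list[str]:
    """Trace a business concept down to the physical columns that implement it."""
    return [a.physical_column for obj in entity.data_objects for a in obj.attributes]


employee = ConceptualEntity("Employee", [
    DataObject("Employee address", [
        DataAttribute("street", "hr_db.emp_addr.street_nm"),
        DataAttribute("postal_code", "hr_db.emp_addr.zip"),
    ]),
    DataObject("Employee contract", [
        DataAttribute("start_date", "hr_db.contract.start_dt"),
    ]),
])

print(physical_columns(employee))
```

Business users can reason about `Employee`, while technology users see the physical columns behind it.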

The example below shows how the privacy classification of attributes affects datasets. Classifying every attribute is a lot of work, whereas classifying at dataset level already lets you manage your data to a sufficient extent:

[Figure: Example of how the privacy classification of attributes affects datasets. The conceptual model (Legal entity, Legal structure, Organisational unit, Person, Employee, Home address, Employee contract, Performance evaluation, Absence) and logical-model attributes (e.g. Street, City, Postal codes and Country for Employee address) are classified as Public, Internal, Confidential or Strictly confidential. These are grouped into datasets such as Person, Employee address, Employee contract, Perf. reviews and Org. structure, shown for German Employees and Dutch Employees.]
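When some attributes have already been classified, one common rule (an assumption here, not necessarily the rule used in the example) is the high-water mark: the dataset is classified at least as high as its most sensitive known attribute, starting from a floor chosen by the data owner. Attributes that have not been classified yet simply inherit the dataset level. The attribute classifications below are hypothetical.

```python
from enum import IntEnum


class Classification(IntEnum):
    # Levels from the example; a higher value means more sensitive.
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    STRICTLY_CONFIDENTIAL = 4


def dataset_classification(known: dict[str, Classification],
                           floor: Classification) -> Classification:
    """High-water mark: the owner's floor, raised to the most sensitive known attribute."""
    return max([floor, *known.values()])


def attribute_classification(attribute: str, known: dict[str, Classification],
                             dataset_level: Classification) -> Classification:
    """Unclassified attributes inherit the dataset level."""
    return known.get(attribute, dataset_level)


# Only two attributes of the Employee address dataset have been classified so far.
employee_address_known = {
    "street": Classification.CONFIDENTIAL,
    "country": Classification.PUBLIC,
}
level = dataset_classification(employee_address_known, Classification.INTERNAL)
print(level.name)  # CONFIDENTIAL
```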

The dataset is an abstraction at a logical level that makes it easier to govern a collection of physical data. With the right information at dataset level, such as business owner, purpose limitation, business sensitivity, privacy sensitivity and retention term, you can start managing access to this data.
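A minimal sketch of how dataset-level metadata can drive an access request, assuming a hypothetical catalog that records the owner, allowed purposes and privacy sensitivity per dataset. Requests for a purpose outside the purpose limitation are rejected, requests for privacy-sensitive datasets are routed to the owner for approval, and everything else is granted. Dataset names, purposes and the decision rules are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetMetadata:
    owner: str
    allowed_purposes: frozenset[str]
    privacy_sensitive: bool


@dataclass(frozen=True)
class AccessRequest:
    requester: str
    dataset: str
    purpose: str


# Hypothetical catalog entries, one per dataset.
CATALOG = {
    "perf_reviews_nl": DatasetMetadata("HR director NL", frozenset({"talent management"}), True),
    "org_structure": DatasetMetadata("HR director", frozenset({"reporting", "talent management"}), False),
}


def evaluate(request: AccessRequest) -> str:
    """Decide on a request using only dataset-level metadata."""
    meta = CATALOG[request.dataset]
    if request.purpose not in meta.allowed_purposes:
        return "rejected: purpose not allowed"
    if meta.privacy_sensitive:
        return f"pending: approval by {meta.owner}"
    return "granted"


print(evaluate(AccessRequest("analyst", "org_structure", "reporting")))
```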

Note that a dataset is a logical grouping of data. This does not necessarily mean you must physically separate different datasets to comply with governance requirements. One case where you must physically separate data is a regional requirement to store data within country borders.
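Because most datasets are only logical groupings, a residency check needs to look only at the datasets that carry a regional storage requirement. The sketch below assumes two hypothetical lookup tables, one with the required region per dataset and one with where each dataset is actually stored, and lists the datasets stored outside their required country.

```python
# Hypothetical residency rules: only datasets that must stay within a country.
RESIDENCY = {"german_employees": "DE"}

# Hypothetical storage locations for every dataset.
STORAGE_REGION = {"german_employees": "DE", "dutch_employees": "NL", "org_structure": "NL"}


def residency_violations(residency: dict[str, str], storage: dict[str, str]) -> list[str]:
    """Return datasets whose storage region differs from the required region."""
    return [ds for ds, required in residency.items() if storage.get(ds) != required]


print(residency_violations(RESIDENCY, STORAGE_REGION))
```

Datasets without a residency rule, such as `org_structure` here, are free to share storage with others.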

Data products & datasets

Taking it one step further: how do datasets relate to data products?

Take the HR example: you can consider it a single data product. A data product can expose access to the different datasets it holds.

To optimise storage, you could expose each dataset as a (virtual) view on the actual data rather than as a physical copy.
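A small, runnable illustration using the `sqlite3` module from the Python standard library: one physical table holds the data, and two datasets with different sensitivity are exposed as views over different columns, so nothing is copied. The schema, view names and values are hypothetical. SQLite itself has no GRANT statement; in a database with role-based access control you would grant access per view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One physical table held by the HR data product (hypothetical schema).
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department TEXT,
        salary INTEGER
    );
    INSERT INTO employee (name, department, salary) VALUES
        ('Anna', 'Finance', 5000),
        ('Bram', 'HR', 4800);

    -- Dataset 1: a directory without pay data (e.g. classified Internal).
    CREATE VIEW employee_directory AS
        SELECT id, name, department FROM employee;

    -- Dataset 2: compensation data (e.g. classified Strictly confidential).
    CREATE VIEW employee_compensation AS
        SELECT id, name, salary FROM employee;
""")

# The views expose different columns of the same stored rows.
directory_columns = [d[0] for d in conn.execute("SELECT * FROM employee_directory").description]
print(directory_columns)
```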

© Coöperatie D8A Directors U.A.
Chamber of Commerce 85811815
VAT number NL863751143B01