Data quality dimensions: Uniqueness
In this latest post, we look at one of the DAMA six dimensions of data quality – uniqueness.
No thing will be recorded more than once based upon how that thing is identified.
Analysis of the number of things as assessed in the ‘real world’ compared to the number of records of things in the data set. The real world number of things could be either determined from a different and perhaps more reliable data set or a relevant external comparator.
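The measure described above can be sketched as a simple ratio. This is a minimal illustration, not an official formula: the function name `uniqueness_ratio` and the asset figures are hypothetical, and it assumes you already have a trusted real-world count from a more reliable data set or an external comparator.

```python
def uniqueness_ratio(real_world_count: int, record_count: int) -> float:
    """Return real-world things divided by records held.

    A value of 1.0 suggests one record per thing; a value below 1.0
    suggests there are more records than things, i.e. likely duplicates.
    """
    if record_count <= 0:
        raise ValueError("record_count must be positive")
    return real_world_count / record_count


# Hypothetical figures: 950 assets counted on site, 1,000 asset records held.
real_world_assets = 950
asset_records = 1000

ratio = uniqueness_ratio(real_world_assets, asset_records)
surplus_records = asset_records - real_world_assets

print(f"Uniqueness: {ratio:.1%}, surplus records: {surplus_records}")
```

A ratio below 100% only tells you that duplicates probably exist; it does not tell you which records they are.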
Essentially, when looking at our data we are looking for duplicates. Does this entity (e.g. asset or person) have more than one record in our data set to describe it?
A lack of uniqueness in data can lead to expensive and dangerous mistakes: miscounted or misdescribed assets, fraud, under- or over-budgeting and supply, and compliance or safety failures. Having multiple records for the same entity may also mean that the wrong record gets updated. What would the impact of that be at your organisation?
Identifying duplicates may not be as easy as you think. For example, there could be several people legitimately called “John Smith”. If, however, you also held their date of birth (DOB) in the data set, you could check their name and DOB together as a “potentially” unique identifier. Indeed, you could continue to add distinguishing attributes, such as initials and other personal details, to build up as close to a unique identifier as possible. Of course, if you hold National Insurance numbers for every person in your data, the duplicate check is made a lot easier.
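To make the “John Smith” example concrete, here is a minimal sketch of a multi-attribute duplicate check. The record layout, field names (`name`, `dob`) and helper functions are all hypothetical, and the normalisation is deliberately crude (whitespace and letter case only); real data would usually need more careful matching.

```python
from collections import defaultdict


def normalise(value: str) -> str:
    """Collapse repeated whitespace and ignore letter case."""
    return " ".join(value.split()).casefold()


def find_potential_duplicates(records, key_fields):
    """Group records sharing the same normalised values for key_fields.

    Returns only the groups holding more than one record. These are
    potential duplicates and still need review by the data owner.
    """
    groups = defaultdict(list)
    for record in records:
        key = tuple(normalise(str(record[field])) for field in key_fields)
        groups[key].append(record)
    return {key: group for key, group in groups.items() if len(group) > 1}


# Hypothetical sample data: ids 1 and 2 look like the same person,
# id 3 is a different John Smith with a different date of birth.
people = [
    {"id": 1, "name": "John Smith", "dob": "1980-01-15"},
    {"id": 2, "name": "john  smith", "dob": "1980-01-15"},
    {"id": 3, "name": "John Smith", "dob": "1975-06-30"},
]

by_name = find_potential_duplicates(people, ["name"])
by_name_and_dob = find_potential_duplicates(people, ["name", "dob"])

for key, group in by_name_and_dob.items():
    print(key, [record["id"] for record in group])
```

Checking on name alone flags all three records, including the legitimately different John Smith. Adding DOB to the key narrows the result to records 1 and 2, which is the effect of building up a closer-to-unique identifier from several attributes.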
Infoboss empowers the data owner to define the multi-attribute detection rules that identify potential duplicates in your data.
To discover more about how infoboss can help support your data quality and data protection initiatives, please get in touch.