There are many myths about data quality. One of them is the belief that advanced automatic validation and dictionary-based input will solve the problem of low data quality. In our experience, introducing ever more restrictive validation can sometimes make data quality worse rather than better. For instance, one of our local customers reported its highest policy sales in… Afghanistan, simply because that country appeared first in the list of countries offered to operators of the application registration system. The 00-000 postcode, frequently encountered in customers’ databases, is another example: a value that passes a format check yet carries no real information.
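The two examples above share a pattern: the value is formally valid, so validation lets it through, yet it is almost certainly a default or a placeholder. A minimal sketch of how such values could be flagged is shown below. The function names, the placeholder list and the sample records are hypothetical and chosen for illustration only; they do not describe any particular product.

```python
from collections import Counter

# Hypothetical placeholder values; a real list would come from profiling the data.
PLACEHOLDER_POSTCODES = {"00-000"}


def is_placeholder_postcode(postcode: str) -> bool:
    """Return True for a postcode that is well-formed but carries no information."""
    return postcode.strip() in PLACEHOLDER_POSTCODES


def default_option_share(values: list[str], first_option: str) -> float:
    """Share of records whose value equals the first option of the pick list."""
    if not values:
        return 0.0
    return Counter(values)[first_option] / len(values)


# Made-up sample records, for illustration only.
records = [
    {"country": "Afghanistan", "postcode": "00-000"},
    {"country": "Afghanistan", "postcode": "02-495"},
    {"country": "Poland", "postcode": "00-000"},
    {"country": "Afghanistan", "postcode": "31-101"},
]

suspect_postcodes = [r for r in records if is_placeholder_postcode(r["postcode"])]
share = default_option_share([r["country"] for r in records], "Afghanistan")
```

An unusually high share for the first option in a pick list does not prove that the data is wrong, but it is a reasonable signal that the field deserves a closer look.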
We know well that even the most sophisticated validation rules cannot ensure that data is fully correct. We also know that it is better to have no data than incorrect data. At the same time, our projects to date show that even organizations with institutionally implemented data management programs do not achieve 100 percent data quality. Even where there is an advanced data management culture, it is sometimes necessary to cleanse or deduplicate data. In such cases manual verification is not only expensive; it also cannot guarantee that no new errors are introduced during the correction process.
Using proven algorithms backed by many years of experience, Sanmargar provides automatic data cleansing services for postal addresses, first names and surnames, business names, e-mail addresses, telephone numbers, and identification information of individuals and businesses. We also carry out data deduplication projects for our customers, including the provision of a solution that lets them cleanse and deduplicate data regularly on their own, because it is yet another myth that a one-off cleansing process solves the problem once and for all. Data cleansing cannot be a substitute for a comprehensive data quality management process, but it should complement it, and it needs to be repeated regularly to keep quality from degrading.
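To illustrate the general idea of deduplication, the sketch below normalizes names and e-mail addresses and groups records that become identical after normalization. It is a deliberately simplified example under stated assumptions: the field names, the helper functions and the sample data are hypothetical, and exact matching on a normalized key is far cruder than the algorithms used in real cleansing projects.

```python
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, strip combining accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.lower().split())


def find_duplicates(records: list[dict]) -> list[list[dict]]:
    """Group records that share the same normalized name and e-mail."""
    groups = defaultdict(list)
    for record in records:
        key = (normalize(record["name"]), normalize(record["email"]))
        groups[key].append(record)
    return [group for group in groups.values() if len(group) > 1]


# Made-up sample records, for illustration only.
customers = [
    {"name": "Józef  Nowak", "email": "J.Nowak@Example.com "},
    {"name": "jozef nowak", "email": "j.nowak@example.com"},
    {"name": "Anna Kowalska", "email": "anna@example.com"},
]

duplicates = find_duplicates(customers)
```

Here the first two records fall into one group, so `duplicates` contains a single group of two records. Letters without a Unicode decomposition, such as ł, are left unchanged by this normalization and would need separate handling.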