One of the main challenges our Client, a leading electricity distributor in Poland, faced when introducing a new billing system was insufficient customer data quality. The master data on electricity payers and customers that was to be migrated to the new system contained incorrectly entered customer addresses and other customer attributes, including the NIP tax identification number, the REGON statistical identification number, the PESEL personal identification number, name, surname, phone number and e-mail address. In addition, the completeness of this data varied across the source systems. Together, these factors undermined confidence in the data.
Given the very strict data quality requirements for the new system, the data first had to be verified and a common data model developed to enable effective migration. With over six million customer records, manual verification was out of the question, all the more so because manual data correction does not guarantee full correctness. Automatic data quality improvement mechanisms had to be developed.
Drawing on many years of experience with data cleansing, Sanmargar Team consultants defined automatic cleansing rules and then optimized them over several iterations. Implementing these rules made it possible not only to meet the strict address data correctness criteria set at the beginning of the project, but also to standardize customer data and enrich it with additional information (such as landline area codes). The proprietary Sanmargar DQS and Metastudio DRM solutions were used in the implementation. The data cleansing algorithms developed and implemented with these tools support repeated, regular data cleansing as part of the migration to the new system, a process spread over many months.
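The individual cleansing rules are not listed here, so the following is only an illustrative sketch of what an automatic rule for identifier attributes such as PESEL and NIP might look like. It is not the Sanmargar DQS or Metastudio DRM implementation. It assumes the publicly documented check-digit schemes for both numbers, and the function names (`normalize_id`, `is_valid_pesel`, `is_valid_nip`) are hypothetical.

```python
import re

# Publicly documented check-digit weights (assumed here, not taken from the project).
PESEL_WEIGHTS = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)
NIP_WEIGHTS = (6, 5, 7, 2, 3, 4, 5, 6, 7)


def normalize_id(raw: str) -> str:
    """Standardize an identifier by removing whitespace and dashes."""
    return re.sub(r"[\s-]", "", raw)


def is_valid_pesel(raw: str) -> bool:
    """Check length, character set and check digit of a PESEL number."""
    value = normalize_id(raw)
    if not re.fullmatch(r"[0-9]{11}", value):
        return False
    total = sum(w * int(d) for w, d in zip(PESEL_WEIGHTS, value))
    return (10 - total % 10) % 10 == int(value[10])


def is_valid_nip(raw: str) -> bool:
    """Check length, character set and check digit of a NIP number."""
    value = normalize_id(raw)
    if not re.fullmatch(r"[0-9]{10}", value):
        return False
    remainder = sum(w * int(d) for w, d in zip(NIP_WEIGHTS, value)) % 11
    # A remainder of 10 cannot equal a single digit, so such numbers are rejected.
    return remainder == int(value[9])
```

In a cleansing pipeline, records that fail such a check would more plausibly be flagged for correction or matched against another source system than silently altered. That handling is an assumption and is not described in the case study.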
#dataquality #dqs #metastudio #postgresql #smartdata