Data Cleansing and Standardization

Sanmargar DQS (Data Quality Studio) is a solution developed by the Sanmargar Team to support the validation, standardization and cleansing of customer addresses and other customer attributes.

The Sanmargar DQS solution provides clients with comprehensive data validation, standardization and correction services. Predefined Sanmargar DQS algorithms cleanse postal addresses, first names and surnames, business names, e-mail addresses, telephone numbers, and identification data of individuals and businesses. Depending on the data type, cleansing relies on, for instance, soft pattern matching techniques, reference data dictionaries and regular expressions.
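To make the three techniques named above more concrete, the following is a minimal Python sketch of each: a regular expression for e-mail shape, digit extraction for telephone numbers, and approximate matching of a first name against a reference dictionary. It is an illustration only and does not reflect how Sanmargar DQS is implemented; the function names, the tiny `FIRST_NAMES` list and the `min_confidence` threshold are all assumptions made for this example.

```python
import re
from difflib import SequenceMatcher

# Hypothetical reference dictionary; a real one would be far larger.
FIRST_NAMES = ["Anna", "Grzegorz", "Katarzyna", "Piotr"]

# Deliberately loose shape check: something@something.something
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def clean_email(raw):
    """Trim and lower-case an address; return None if the shape is wrong."""
    value = raw.strip().lower()
    return value if EMAIL_RE.match(value) else None


def clean_phone(raw):
    """Keep digits only, preserving a leading '+' if present."""
    value = raw.strip()
    prefix = "+" if value.startswith("+") else ""
    return prefix + re.sub(r"\D", "", value)


def match_first_name(raw, min_confidence=0.8):
    """Approximate ("soft") match against the reference dictionary.

    Returns (dictionary entry, similarity) when the best similarity
    reaches min_confidence, otherwise (None, similarity).
    """
    candidate = raw.strip().lower()
    best, best_score = None, 0.0
    for name in FIRST_NAMES:
        score = SequenceMatcher(None, candidate, name.lower()).ratio()
        if score > best_score:
            best, best_score = name, score
    if best_score >= min_confidence:
        return best, best_score
    return None, best_score
```

In this sketch, a misspelling such as "grzegosz" is close enough to the dictionary entry "Grzegorz" to be corrected, while an unrelated string falls below the threshold and is left unmatched. Raising or lowering `min_confidence` trades fewer wrong corrections against more values left unresolved.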

Sanmargar DQS is highly configurable: cleansing algorithms and result confidence levels can be controlled, and non-standard, dedicated solutions can be created for cleansing or classifying other attributes of any data set, e.g. product names and codes.

A reference data library, comprising Polish address dictionaries and international dictionaries of first names and surnames, is maintained and developed for the Sanmargar DQS solution. The reference databases rely not only on official sources (databases administered by government agencies) but also on our own experience, which allows us to eliminate inconsistencies and errors found in those databases.

Sanmargar DQS can work with many data sources, databases, and any ETL software. The use of Web Services also makes it possible to validate data entered by users online. These features enable easy integration of Sanmargar DQS with other solutions, including CRM systems, CCF (Central Customer File) systems and e-commerce platforms.

Sanmargar DQS has been used in several projects involving the migration of data to new ERP/CRM-class systems. Using Sanmargar DQS, we integrated, cleansed and deduplicated data obtained from several installations of legacy systems before loading it into the target system. In those projects, Sanmargar DQS made data deduplication many times more effective.


Grzegorz Orłowski

Product manager