Maintaining high data quality is paramount for Big Data

It has long been known that data-driven marketing is only as effective as the data on which it is based. The old adage ‘rubbish in, rubbish out’ still holds.

The much-vaunted statistic that customer data degrades at around 30 per cent per year also holds. With the influx of unstructured customer data, however, keeping customer records up to date becomes even more crucial. Gartner estimates that over 80 per cent of business data today is unstructured, including web logs, multimedia content, email, customer service interactions, sales automation and social media data. This means that the vast majority of information held by organisations is neither easily searchable nor held in secure sources, unlike its structured cousin, which sits neatly in databases with prescribed fields. The problem with unstructured data is that, whilst it is incredibly useful, it is difficult to understand and prepare for analytic use.

Beyond issues of structure, there is the sheer volume of it. Every time someone clicks on your website or responds to a Facebook post, another piece of data is added to the vast repository. It is estimated that 2.5 quintillion bytes of unstructured data are created every day. Add to this the 20 per cent year-on-year growth of structured data and the term Big Data really does earn its name!

There are already hundreds of estimates of the opportunity that big data affords businesses. All of them, however, rest on one very simple and fundamental principle: data hygiene. If the underlying structured data is not sound, then any predictive model built on it is flawed. Having a clean database was important when the data was largely used for marketing communications campaigns, but when it is combined with unstructured data to build predictive models its accuracy becomes even more important, because errors in the records risk biasing the models. Consequently, quality control methods such as deceased suppression, which flags any customers who have passed away, and deduping, which removes duplicate records, are no longer just best practice but a fundamental part of the wider data management process.
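
To make the two hygiene steps named above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a production method: the `Customer` fields, the choice of a normalised email address as the duplicate key, and the `deceased_ids` set (standing in for whatever suppression file an organisation licenses) are all hypothetical. Real-world deduping usually needs fuzzy matching on names and addresses as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    # Hypothetical record layout, for illustration only.
    customer_id: int
    name: str
    email: str


def normalise_email(email: str) -> str:
    """Trim whitespace and lower-case so trivially different emails match."""
    return email.strip().lower()


def dedupe(records):
    """Keep the first record seen for each normalised email address."""
    seen = set()
    unique = []
    for record in records:
        key = normalise_email(record.email)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def split_deceased(records, deceased_ids):
    """Separate records into (contactable, suppressed).

    `deceased_ids` is an assumed set of customer IDs taken from a
    suppression file; flagged records are returned, not discarded.
    """
    contactable = [r for r in records if r.customer_id not in deceased_ids]
    suppressed = [r for r in records if r.customer_id in deceased_ids]
    return contactable, suppressed


# Example run with made-up records.
records = [
    Customer(1, "Ann Lee", "ann@example.com"),
    Customer(2, "Ann Lee", " ANN@example.com "),  # duplicate of record 1
    Customer(3, "Bob Ray", "bob@example.com"),
]
unique = dedupe(records)
contactable, suppressed = split_deceased(unique, deceased_ids={3})
```

Running the deduping step first means the suppression check is applied to each customer only once, and keeping the suppressed records in a separate list preserves an audit trail rather than silently deleting data.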