Incorrect data is an acute problem for online businesses. While most businesses have come to realize the importance of data mining and performance-based marketing, some of that effort seems to be going in the wrong direction. The point is not to have as much data as possible, but to have data you can actually work with – structured, complete and correct.
Below I will outline some common sources of erroneous or incomplete data and give some advice on how to avoid them.
Unclear or imprecise definitions of KPIs.
Even if your company is not very large, make sure that everyone dealing with data understands precisely what each statistic includes and how it is calculated. It even makes sense to create a written document defining the KPIs, no matter how simple or self-explanatory they might seem.
For example, if you want to calculate the CTR (click-through rate) for banner advertising – in essence, the number of clicks on a banner divided by the number of banner impressions – you can ask yourself a range of questions. Do you want to count unique banner impressions and unique clicks (that is, per user)? At what intervals do you want to measure the CTR (per hour, per day, per week)? How do you handle natural fluctuations in CTR (e.g. during the day vs. at night) – do you want to make them part of your statistics, or ignore them and take the average? How do you group the data: by region, by traffic source, by banner type? If you run banner tests, do you want to exclude the test data from the overall statistics?
As you can see, the answers to these questions might greatly influence the outcome, i.e. the CTR your data analyst will produce at the end of the day.
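A minimal sketch of this point, using an invented click log: the very same events produce noticeably different CTRs depending on whether impressions and clicks are deduplicated per user.

```python
# Hypothetical click log (all values invented for illustration).
impressions = [  # (user_id, banner_id)
    ("u1", "b1"), ("u1", "b1"), ("u2", "b1"), ("u3", "b1"),
]
clicks = [  # (user_id, banner_id)
    ("u1", "b1"), ("u1", "b1"),
]

# Raw CTR: every impression and every click counts.
raw_ctr = len(clicks) / len(impressions)               # 2 / 4 = 0.5

# Unique CTR: deduplicated per (user, banner) pair.
unique_ctr = len(set(clicks)) / len(set(impressions))  # 1 / 3 ≈ 0.33

print(f"raw CTR:    {raw_ctr:.2f}")
print(f"unique CTR: {unique_ctr:.2f}")
```

Two analysts working from the same log could thus report a CTR of 50% or 33% – both "correct" under their respective definitions, which is exactly why the definition needs to be written down.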
Technical issues.
The more complex a system is, the more likely it is to malfunction. Always check for technical issues when investigating data inconsistencies. For this, it is best to work with statistical and technical benchmarks based on the system's normal behavior in the past.
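One possible form of such a benchmark is a simple deviation check against historical values. The function name, the sample data, and the three-sigma threshold below are my own illustrative choices, not a standard:

```python
import statistics

def outside_benchmark(history, today, n_sigma=3.0):
    """Flag a value that deviates from the historical mean by more than
    n_sigma standard deviations -- a crude but useful technical benchmark."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return abs(today - mean) > n_sigma * stdev

# Invented daily CTR history hovering around 3%.
daily_ctr = [0.031, 0.029, 0.030, 0.032, 0.028, 0.030, 0.031]

print(outside_benchmark(daily_ctr, 0.004))  # flagged: possible serving outage
print(outside_benchmark(daily_ctr, 0.030))  # within the normal range
```

A flagged day does not prove a technical failure, but it tells you where to start looking before trusting the numbers.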
Technical problems affecting data consistency may, firstly, be caused by the underperformance of the system itself (for example, banners are not served for a period of time, or are not displayed correctly in some browsers). Secondly, even if the system functions well, there might be problems in capturing or storing the data (e.g. the database server crashes because it runs out of memory mid-operation). And thirdly, if you do not query the database directly but let the data flow through a business intelligence system, there might be all kinds of compatibility problems between the systems.
Human error.
This is the least predictable source of data inaccuracy. It starts with how the data is collected and aggregated in the system. Even if there is enough clarity on how the KPIs are constructed, there is always a chance that the setup of the analytics system will diverge from the desired parameters. Furthermore, if data processing and analysis are largely done manually, the probability of a mistake rises with every step. Merging data from several sources can also hurt consistency, especially if some parameters have to be converted or rounded before the merge.
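A small sketch of the rounding problem, with invented numbers: if one of two sources rounds its values before export, the merged totals quietly stop matching.

```python
# Daily revenue from source A, exported with full precision (values invented).
source_a = {"2024-01-01": 10.4, "2024-01-02": 10.4, "2024-01-03": 10.4}

# Source B reports the same days but rounds each value to whole units first.
source_b = {day: round(value) for day, value in source_a.items()}

total_a = sum(source_a.values())  # 31.2
total_b = sum(source_b.values())  # 30

# The per-day rounding has silently shifted the aggregate.
print(f"discrepancy: {total_a - total_b:.1f}")
```

Neither source is "wrong" on its own; the discrepancy only appears once the two are merged, which makes this class of error particularly hard to trace.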
Thus, the human factor should not be underestimated. The only way to minimize the number of errors is to double-check the calculations or, better still, to automate as much of the data processing as possible.
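One way to automate such a double check is a reconciliation step that verifies an aggregate against the sum of its parts. The function and the figures below are an illustrative sketch, not a prescribed implementation:

```python
def reconcile(total, parts, tolerance=1e-6):
    """Automated sanity check: an aggregate should equal the sum of
    its parts within a small tolerance."""
    return abs(total - sum(parts)) <= tolerance

# Invented example: regional click counts should add up to the overall figure.
overall_clicks = 1200
regional_clicks = [480, 390, 320]  # sums to 1190 -- something is off

print(reconcile(overall_clicks, regional_clicks))
```

Run automatically after each import or aggregation job, a check like this catches manual slips and merge artifacts long before the numbers reach a report.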
In conclusion, if you find that the data does not “seem right”, even after you have excluded every possibility of a mistake, you might need to look for the reason outside of your analytics system. One of my previous blog posts looks at this issue in more detail.