The amount of data we are now producing is astounding. According to the sixth edition of DOMO's “Data Never Sleeps” report, over 2.5 quintillion bytes of data are created every single day, and by 2020, approximately 1.7MB of data will be created every second for every person on Earth.
Given this scale, the overused adage “data is the new oil” is no exaggeration. Data can give organisations a wealth of valuable insights, so today’s businesses are under increasing pressure to use it to gain a competitive advantage.
However, just like oil, raw data needs to be refined before it can be used effectively, because the insights a business can draw depend heavily on the quality of the data it uses. As they say, “garbage in, garbage out”. Bad data (data that is inaccurate, ‘polluted’, unsecured, or even non-compliant) makes it much harder to derive insights, which leads to poor decisions and judgements and can even cost a business its reputation and income.
Ensuring data quality should, therefore, be a top priority for any data-driven organisation. Data quality can be measured along the following six core dimensions, each of equal importance:
Accuracy – The degree to which the data correctly describes and represents what it should.
Completeness – The data must not be missing any vital elements. However, what counts as complete depends on specific business rules or expectations: data can be ‘complete’ even if optional fields are missing.
Consistency – Wherever the data resides, it must reflect the same information and stay in sync across the enterprise.
Timeliness – The data must be available when it is expected and needed so that the information can be used efficiently. However, the acceptable time frame varies with the use case and user expectations: some uses require real-time data, while for others a certain amount of delay is acceptable.
Uniqueness – In simple terms, there should be no duplicate records, since duplicates increase risks such as accessing outdated information.
Validity – The data must conform to specific definitions or formats. For example, the date of a product delivery might be required in the format “dd/mm/yyyy”, and every recorded delivery date must follow this format to be valid.
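To make three of these dimensions (completeness, uniqueness and validity) more concrete, here is a minimal Python sketch of rule-based checks on a handful of order records. The field names (`order_id`, `customer`, `delivery_date`), the list of vital fields and the sample records are all hypothetical, chosen only for illustration; accuracy, consistency and timeliness are not covered because checking them requires trusted reference data, other systems or timestamps.

```python
import re
from datetime import datetime

# Hypothetical business rules: which fields are vital, and the required date format.
REQUIRED_FIELDS = ("order_id", "customer", "delivery_date")
DATE_PATTERN = r"\d{2}/\d{2}/\d{4}"  # strict "dd/mm/yyyy" shape
DATE_FORMAT = "%d/%m/%Y"


def is_complete(record):
    """Completeness: every vital field is present and non-empty; optional fields may be missing."""
    return all(record.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def is_valid_date(value):
    """Validity: the value has the dd/mm/yyyy shape and is a real calendar date."""
    if not isinstance(value, str) or not re.fullmatch(DATE_PATTERN, value):
        return False
    try:
        datetime.strptime(value, DATE_FORMAT)  # rejects impossible dates such as 31/02
    except ValueError:
        return False
    return True


def duplicate_ids(records):
    """Uniqueness: return the order IDs that appear in more than one record."""
    seen, dupes = set(), set()
    for record in records:
        key = record.get("order_id")
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return sorted(dupes, key=str)


def quality_report(records):
    """Summarise which records fail each check."""
    return {
        "incomplete": [r.get("order_id") for r in records if not is_complete(r)],
        "invalid_dates": [r["order_id"] for r in records
                          if is_complete(r) and not is_valid_date(r["delivery_date"])],
        "duplicates": duplicate_ids(records),
    }


# Hypothetical sample data with one problem of each kind.
sample = [
    {"order_id": "A1", "customer": "Acme", "delivery_date": "05/03/2024"},
    {"order_id": "A2", "customer": "", "delivery_date": "06/03/2024"},        # missing customer
    {"order_id": "A3", "customer": "Birch", "delivery_date": "2024-03-07"},   # wrong format
    {"order_id": "A1", "customer": "Acme", "delivery_date": "05/03/2024"},    # duplicate
]

if __name__ == "__main__":
    print(quality_report(sample))
    # {'incomplete': ['A2'], 'invalid_dates': ['A3'], 'duplicates': ['A1']}
```

The regular expression is there because `strptime` on its own would also accept unpadded dates such as “1/5/2024”; which rules are strict enough is, as the article notes, a business decision.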
Ultimately, data quality is about conditioning data to meet the specific needs of business users, so the desired standards may differ from one company to another.
What’s certain is that improving data quality adds too much value for businesses to pass up: better outcomes bring increased revenue, less time spent reconciling data reduces costs, and the business gains greater confidence in its data and the analytics built on it.
To help businesses, Talend, a company specialising in cloud data integration and data integrity, has released a white paper that explores in depth the approaches, tools and collaborative effort needed to meet modern data quality requirements.
Talend’s Definitive Guide to Data Quality is available to download from Talend.