What is a data lifecycle?
A data lifecycle is a process that helps organizations manage the different phases in the life of critical data objects such as Vendor, Customer, Employee or Material. It spans from the object's initial creation through to its end of life.
It is necessary for organizations to understand that each stage of a lifecycle comes with its own set of defined activities. By clearly identifying the phases, stages or statuses that data objects pass through in their lifecycle, which form the foundation of the data quality lifecycle, organizations are able to manage these objects efficiently and effectively. Defining this basis is crucial to any data quality uplift activities that your organization may plan for its data objects.
The phases in a data lifecycle
While there are many interpretations of, and requirements for, the stages or phases of a typical data lifecycle, they can be summarized as follows, with each stage triggering its own set of activities:
Creation: phase-in stage
This first phase focuses on the initial acquisition, entry, creation or capture of the data object. For example, the creation of an employee as a data object begins when a legal contract binds a potential candidate to a firm, transforming the candidate into an ‘employee’ with a job start date.
In another example, the creation of a product begins with activities such as defining and designing the product. These activities only pertain to the creation stage and are no longer applicable once the product becomes an active product.
Usage: active stage
This phase focuses on the usage of data and takes into consideration how the data is made available for use, processed, modified or shared.
In our example, a product in the active stage could be produced, purchased, sold, utilized, etc. An employee in the active stage, on the other hand, receives a salary, must maintain working hours, can sign up for training, is granted access to certain environments and has an active badge.
End of life: phase-out stage
This phase focuses on the data object’s end of life. In our example, once an employment contract is terminated, a concise set of activities is followed to change an active employee record to an inactive record. This status is followed by activities such as an exit interview or completion of leave forms.
Similarly, a product in the phase-out stage has limited stock. This status allows existing stock to be reduced but does not allow new stock to be created.
Archival: obsolete stage
This phase focuses on the obsolete stage of the data object.
In our example, once an employee’s job termination date is in the past, this triggers the release of end-of-term benefits, closure of work email ID and deactivation of their employee ID.
Similarly, a product in the obsolete stage has run out of stock. This status does not allow any operations on this stock.
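The stock behavior described for the product example can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `Stage`, `ALLOWED_STOCK_OPS` and `apply_stock_change` are hypothetical. The phase-out and obsolete rows follow the descriptions above, while the phase-in and active rows are our assumptions.

```python
from enum import Enum


class Stage(Enum):
    PHASE_IN = "phase-in"
    ACTIVE = "active"
    PHASE_OUT = "phase-out"
    OBSOLETE = "obsolete"


# Stock operations permitted in each stage.
# Phase-out (reduce only) and obsolete (nothing) follow the article;
# phase-in (nothing) and active (both) are assumptions for illustration.
ALLOWED_STOCK_OPS = {
    Stage.PHASE_IN: set(),
    Stage.ACTIVE: {"increase", "decrease"},
    Stage.PHASE_OUT: {"decrease"},
    Stage.OBSOLETE: set(),
}


def apply_stock_change(stage: Stage, stock: int, delta: int) -> int:
    """Return the new stock level, or raise ValueError if the stage forbids the change."""
    if delta == 0:
        return stock
    operation = "increase" if delta > 0 else "decrease"
    if operation not in ALLOWED_STOCK_OPS[stage]:
        raise ValueError(f"{operation} is not allowed in the {stage.value} stage")
    if stock + delta < 0:
        raise ValueError("stock cannot drop below zero")
    return stock + delta
```

With this table, `apply_stock_change(Stage.PHASE_OUT, 10, -3)` returns the reduced stock, while any attempt to add stock in the phase-out stage, or to change stock at all in the obsolete stage, raises an error.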
What are the benefits of a data lifecycle?
We have observed that organizations struggle the most with clearly defining the data lifecycle stages. Too many stages lead to inconsistencies while too few lead to inaccuracies. A good rule to follow is to ensure each stage comes with a defined list of activities that cannot overlap with another stage.
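One way to test the non-overlap rule is to list the activities per stage and look for any activity that appears in more than one stage. The sketch below assumes a simple mapping from stage name to a set of activity names. The activity lists are hypothetical, loosely based on the employee example, and `find_overlaps` is an illustrative helper rather than part of any tool.

```python
from itertools import combinations

# Hypothetical activity lists for the employee object, one set per stage.
STAGE_ACTIVITIES = {
    "phase-in": {"sign contract", "set job start date"},
    "active": {"pay salary", "record working hours", "grant access", "activate badge"},
    "phase-out": {"conduct exit interview", "complete leave forms"},
    "obsolete": {"release end-of-term benefits", "close work email", "deactivate employee ID"},
}


def find_overlaps(stage_activities):
    """Return {(stage_a, stage_b): shared_activities} for every pair of stages that overlap."""
    overlaps = {}
    for (stage_a, acts_a), (stage_b, acts_b) in combinations(stage_activities.items(), 2):
        shared = acts_a & acts_b
        if shared:
            overlaps[(stage_a, stage_b)] = shared
    return overlaps
```

An empty result means every activity belongs to exactly one stage. A non-empty result points to the stage pairs whose definitions need to be sharpened.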
Additionally, all data lifecycle phases must meet certain standards to ensure that data is complete, concise, understandable, trusted, accurate, relevant and secure. This should be the basis to determine the data object's data quality.
In our example, a data quality rule can be created to ensure that an employee's date of birth is within a range of acceptable values. However, this rule is only relevant if applied to employees in the active stage. Clearly understanding your data’s lifecycle helps define data quality rules, which then must be adapted to match the lifecycle stage.
In our product example, we can consider a data quality rule that checks whether a product’s net weight is within an acceptable range for the active stage. This rule cannot be applied to a product in the phase-in stage, as the product is not yet finalized. Hence, the following steps should be taken into consideration when defining data quality rules:
- Define the key object
- Define the data lifecycle stage of the key object
- Define the data quality rule based on the data object's stage within the lifecycle
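The three steps above can be sketched as a small rule registry in Python, where each rule names its key object, the lifecycle stage it applies to, and the check itself. Everything here is illustrative: `QualityRule`, `RULES`, `failed_rules`, the field names and the numeric and date bounds are assumptions, not values taken from a real system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class QualityRule:
    object_type: str                # step 1: the key object
    stage: str                      # step 2: the lifecycle stage the rule applies to
    name: str
    check: Callable[[dict], bool]   # step 3: the rule itself, True when the record passes


# Hypothetical rules; the acceptable ranges are placeholders.
RULES = [
    QualityRule("product", "active", "net weight in range",
                lambda r: 0.01 <= r["net_weight_kg"] <= 1000.0),
    QualityRule("employee", "active", "date of birth in range",
                lambda r: date(1940, 1, 1) <= r["date_of_birth"] <= date(2010, 12, 31)),
]


def failed_rules(object_type: str, record: dict) -> list:
    """Return the names of rules that apply to the record's stage and that it fails."""
    return [
        rule.name
        for rule in RULES
        if rule.object_type == object_type
        and rule.stage == record["stage"]
        and not rule.check(record)
    ]
```

Because rules are matched on stage before they are evaluated, a phase-in product with no net weight yet is simply skipped rather than reported as a data quality failure.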
Organizations should review their data quality practices, ensuring the right perspective is considered. A data lifecycle structures the life of an object into certain stages. However, appropriate data quality measures are only possible if the stages of your data’s lifecycle are clearly defined, as these stages are the foundation of your data quality.
We would like to thank Marryam Khawaja for her valuable contribution to the article. If you have any questions regarding this topic or other related matters, please contact Marryam Khawaja, Evgenia Rüdisüli or Francesco Guidi.