
Why you need a Data Inventory to secure your company


One of the most obvious but undervalued ways to improve data protection is to manage data properly, and that starts with knowing what you have. Most companies do not take the time to thoroughly inventory their data. To protect your data effectively, you need to know where your valuable and sensitive data resides.


Why you need a Data Inventory to secure your data

It is crucial that every organization establish a comprehensive data management process that encompasses a data management framework, data classification guidelines, and requirements for the protection, handling, retention, and disposal of data. A successful process depends on an accurate and thorough data inventory. A data inventory is a comprehensive record of an organization's data assets, designed to make those assets easier to manage, protect, and keep compliant.

Some examples of industries that rely heavily on data protection and significantly benefit from developing a data inventory include:

  1. Healthcare: Healthcare providers, insurers, and other entities in this industry handle sensitive patient data, such as electronic health records (EHRs) and personally identifiable information (PII). They must adhere to strict regulations like the Health Insurance Portability and Accountability Act (HIPAA) in the United States.

  2. Financial services: Banks, credit unions, investment firms, and insurance companies handle sensitive financial data and PII. They must comply with regulations such as the Gramm-Leach-Bliley Act (GLBA) in the United States, and those that process card payments must also meet the Payment Card Industry Data Security Standard (PCI DSS), an industry standard that applies globally.

  3. E-commerce and retail: Online retailers and brick-and-mortar stores that process credit card transactions must adhere to PCI DSS to protect customer payment data.

  4. Telecommunications: Telecommunication companies, internet service providers, and mobile network operators handle large volumes of sensitive customer data and communication records. They must comply with various privacy and data retention regulations, depending on the jurisdiction.

  5. Technology companies: Software vendors, cloud service providers, and other technology companies often manage large amounts of user data, including PII. They must comply with data protection regulations such as the General Data Protection Regulation (GDPR) in the European Union and the California Consumer Privacy Act (CCPA) in California.

  6. Government agencies: Public sector organizations manage various types of sensitive data, including PII, national security information, and other classified data. They must comply with numerous government-specific data protection regulations and standards.

  7. Education: Schools, colleges, and universities handle student and employee data, including education records, personal information, and research data. They must comply with regulations such as the Family Educational Rights and Privacy Act (FERPA) in the United States.

  8. Legal services: Law firms and other legal service providers manage sensitive client data and confidential case information. They must adhere to legal professional privilege and data protection regulations in their respective jurisdictions.

Organizations across these industries must maintain a robust data inventory to ensure the proper identification, protection, and management of their data assets, helping them comply with regulatory requirements and safeguard the sensitive information they handle.

How to properly plan and develop your data inventory:

  1. Define the scope: Determine which data types, systems, and processes will be included in the inventory. Consider including all data assets, both structured and unstructured, across various departments, applications, and storage locations.

  2. Assemble a team: Form a cross-functional team with representatives from different departments, including IT, legal, compliance, and business units. This team will help ensure that all relevant perspectives are considered during the inventory process.

  3. Identify data sources: Catalog all data sources within the organization, such as databases, file systems, cloud storage, third-party services, and external data feeds. Be sure to include both on-premises and off-premises data storage.

  4. Collect data attributes: For each data source, gather information on data attributes, such as data type, format, sensitivity, owner, and usage. This information will help in understanding the data's importance and the risks associated with it.

  5. Classify data: Categorize the data based on its sensitivity and criticality, using labels such as "Public," "Confidential," and "Sensitive." This classification will help in determining the appropriate security measures and handling procedures for each data type.

    1. To determine data sensitivity levels, organizations must catalog their primary data types and assess the criticality of each, considering the potential impact of data loss or corruption. This assessment informs a classification scheme tailored to the organization, whether it uses the common "Public," "Confidential," and "Sensitive" labels or a different set.

    2. Each company can define its own set of labels. It is usually best to have as few labels as possible and to keep the definitions straightforward. Military and government classification schemes are a well-known point of reference.

  6. Map data flows: Document how data moves within the organization, including data entry points, processing steps, storage locations, and sharing with external parties. This mapping will help identify potential risks and vulnerabilities in the data lifecycle.

  7. Assess data quality: Evaluate the accuracy, completeness, and consistency of the data. Identify any data quality issues and take steps to address them.

  8. Determine data ownership and responsibilities: Assign data owners and data stewards who will be responsible for maintaining and managing the data in the inventory. Clearly define their roles and responsibilities.

  9. Establish a data inventory management process: Develop a process for maintaining and updating the data inventory regularly. This may include periodic reviews, audits, and updates to reflect changes in data sources, classifications, or usage.

  10. Implement data security measures: Based on the data classification and mapping, implement appropriate security controls to protect the data throughout its lifecycle. This may include access controls, encryption, and data retention policies.

  11. Document and communicate the inventory: Share the data inventory with relevant stakeholders and ensure they understand its purpose and their responsibilities. Regularly update and distribute the inventory to maintain awareness and compliance.
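
Steps 5 and 6 above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the three-level `Label` ordering, the `classify` rule, the `Flow` record, and every data set name are assumptions made up for the example. It labels each data set from two attributes collected in step 4, then flags documented flows that send non-public data to an external party.

```python
from dataclasses import dataclass
from enum import IntEnum


class Label(IntEnum):
    """Hypothetical three-level scheme, ordered from least to most restrictive."""
    PUBLIC = 1
    CONFIDENTIAL = 2
    SENSITIVE = 3


def classify(contains_pii: bool, impact_if_lost: str) -> Label:
    """Assign a label from two illustrative attributes gathered in step 4.

    impact_if_lost is an assumed rating of "low", "medium", or "high".
    """
    if contains_pii or impact_if_lost == "high":
        return Label.SENSITIVE
    if impact_if_lost == "medium":
        return Label.CONFIDENTIAL
    return Label.PUBLIC


@dataclass(frozen=True)
class Flow:
    """One documented movement of a data set (step 6)."""
    dataset: str
    source: str
    destination: str
    external: bool  # True when the destination is a third party


def risky_flows(flows, labels, threshold=Label.CONFIDENTIAL):
    """Return flows that send data labeled at or above `threshold` to an external party."""
    return [f for f in flows if f.external and labels[f.dataset] >= threshold]


# Example data sets and flows (all names are invented for illustration).
labels = {
    "marketing_brochures": classify(contains_pii=False, impact_if_lost="low"),
    "customer_accounts": classify(contains_pii=True, impact_if_lost="high"),
    "internal_wiki": classify(contains_pii=False, impact_if_lost="medium"),
}
flows = [
    Flow("customer_accounts", "crm_db", "billing_vendor", external=True),
    Flow("marketing_brochures", "cms", "public_website", external=True),
    Flow("internal_wiki", "wiki_server", "backup_nas", external=False),
]
flagged = risky_flows(flows, labels)
for f in flagged:
    print(f"{f.dataset}: {f.source} -> {f.destination} ({labels[f.dataset].name})")
```

Keeping the labels ordered makes "at least Confidential" a simple comparison, which is one reason to prefer a small, strictly ranked label set.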

Details that should be identified and documented within the inventory:

An effective data inventory should capture all the key details needed to understand what the data is and how to protect it.

  1. Data owner: The person or entity responsible for the data's accuracy, integrity, and security, often a senior executive or department head.

  2. Data custodian: The individual or team responsible for the day-to-day management, storage, and maintenance of the data.

  3. Data location: The physical or virtual location where the data is stored, such as servers, databases, or cloud services.

  4. System storing or processing the data: The hardware, software, or service used for storing, processing, or transmitting the data.

  5. Sensitivity label: The classification of the data based on its sensitivity, such as "Public," "Confidential," or "Sensitive."

  6. Encryption status: Whether the data is encrypted, both at rest and in transit, to protect it from unauthorized access.

  7. Data Loss Prevention (DLP) solution: Whether a DLP solution is in place to monitor and protect the data from unauthorized access, exfiltration, or leakage.

  8. Data backup: Whether the data is regularly backed up to ensure its availability in case of hardware failure, accidental deletion, or other incidents.

  9. Recovery Point Objective (RPO) and Recovery Time Objective (RTO): The maximum acceptable amount of data loss, measured as a span of time before a disruption (RPO), and the maximum acceptable downtime before the data is restored (RTO).

  10. Data retention period: The minimum and maximum periods for which the data must be retained, as dictated by legal, regulatory, or business requirements.

  11. Archive and/or journaling solution: Whether an archiving or journaling solution is in place to store historical data for long-term preservation, compliance, or other purposes.

  12. Data sharing and access: Information on any third parties with whom the data is shared, as well as the processes and controls governing data access.

  13. Compliance requirements: Any legal, regulatory, or contractual obligations related to the data, such as GDPR, HIPAA, or industry-specific regulations.

  14. Data flow documentation: Whether a data flow diagram or document exists that illustrates how the data moves within and outside the organization.

  15. Data lifecycle documented: Documentation outlining the complete data lifecycle, from creation and storage to archiving and disposal.

By capturing this information, organizations gain a better understanding of their data landscape, which lets them manage and secure their data assets more effectively and comply with the regulations that apply to them.
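
As a rough sketch, an inventory entry holding a subset of the details above could be modeled as a Python dataclass. The field names, the example values, and the rules inside `gaps()` are assumptions chosen for illustration, not a standard schema. The one piece of reasoning worth noting is the backup check: in the worst case everything since the last backup is lost, so a backup interval longer than the RPO cannot meet it.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import List, Optional


@dataclass
class InventoryRecord:
    """One inventory entry; field names are illustrative, not a standard schema."""
    name: str
    owner: str
    custodian: str
    location: str
    system: str
    sensitivity: str                      # "Public", "Confidential", or "Sensitive"
    encrypted_at_rest: bool
    encrypted_in_transit: bool
    dlp_enabled: bool
    backup_interval: Optional[timedelta]  # None means no regular backup
    rpo: timedelta
    rto: timedelta
    retention_min: timedelta
    retention_max: timedelta
    shared_with: List[str] = field(default_factory=list)
    compliance: List[str] = field(default_factory=list)

    def gaps(self) -> List[str]:
        """Return human-readable findings for this record (assumed rules)."""
        findings = []
        fully_encrypted = self.encrypted_at_rest and self.encrypted_in_transit
        if self.sensitivity != "Public" and not fully_encrypted:
            findings.append("encryption incomplete for non-public data")
        if self.backup_interval is None:
            findings.append("no regular backup")
        elif self.backup_interval > self.rpo:
            # Worst case, everything since the last backup is lost.
            findings.append("backup interval exceeds RPO")
        if self.retention_min > self.retention_max:
            findings.append("retention minimum exceeds maximum")
        return findings


# Hypothetical entry: daily backups cannot satisfy a four-hour RPO.
record = InventoryRecord(
    name="customer_accounts", owner="VP Sales", custodian="Database team",
    location="eu-west cloud region", system="CRM database",
    sensitivity="Sensitive", encrypted_at_rest=True, encrypted_in_transit=False,
    dlp_enabled=True, backup_interval=timedelta(hours=24),
    rpo=timedelta(hours=4), rto=timedelta(hours=8),
    retention_min=timedelta(days=365), retention_max=timedelta(days=7 * 365),
    shared_with=["billing_vendor"], compliance=["GDPR"],
)
print(record.gaps())
```

Storing the entries as structured records rather than free text is what makes checks like these possible during the periodic reviews described earlier.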

Tools can help you:

It's important to understand what information you need to create a data inventory, but identifying, locating, and organizing all of that information can be challenging. Organizations can have millions of folders and files, as well as numerous applications and databases. Many tools are available that can conduct data discovery, aggregate metadata, and produce helpful reports. Organizations with large amounts of data spread across a network can save valuable time and effort by using tools to automate the process of identifying, classifying, and cataloging data assets. Some popular types of tools and solutions include:

  1. Data Catalog Tools: Data catalog tools create a centralized inventory of data assets and provide a searchable interface for users to discover and understand the data. Examples of data catalog tools include Alation, Collibra, and Informatica Enterprise Data Catalog.

  2. Data Classification Tools: These tools automatically discover, classify, and label sensitive data based on predefined classification policies. Examples include Microsoft Azure Information Protection, Boldon James Classifier, and Titus Classification Suite.

  3. Data Discovery and Mapping Tools: These tools help organizations discover and visualize their data landscape by mapping data flows and relationships between data assets. Examples include IBM InfoSphere Information Governance Catalog, SAS Data Management, and Talend Data Catalog.

  4. Data Loss Prevention (DLP) Solutions: Many DLP solutions offer data discovery and classification features, helping organizations identify sensitive data and monitor its usage. Examples of DLP solutions with data discovery capabilities include Symantec Data Loss Prevention, McAfee Total Protection for Data Loss Prevention, and Forcepoint DLP.

When choosing a tool or solution for creating a data inventory and conducting data discovery, organizations should consider factors such as the types of data they manage, their specific industry requirements, the complexity of their data landscape, and their existing IT infrastructure. By implementing a suitable tool, organizations can gain better visibility into their data assets, streamline data management processes, and ensure compliance with data protection regulations.
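
To give a feel for what such tools automate, here is a toy discovery pass written with the Python standard library. It is only a sketch under stated assumptions: the two regular expressions in `PATTERNS`, the `scan_tree` function, and the sample files are invented for the example, and the commercial products named above use far richer detection than pattern matching on plain text.

```python
import os
import re
import tempfile

# Illustrative patterns only; real discovery tools use much richer detection.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_tree(root, max_chars=1_000_000):
    """Walk `root` and return one metadata dict per readable file."""
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
                with open(path, "r", encoding="utf-8", errors="ignore") as fh:
                    text = fh.read(max_chars)
            except OSError:
                continue  # unreadable files are skipped, not assumed clean
            counts = {k: len(p.findall(text)) for k, p in PATTERNS.items()}
            results.append({
                "path": path,
                "extension": os.path.splitext(name)[1].lower(),
                "size_bytes": size,
                "hits": {k: v for k, v in counts.items() if v},
            })
    return results


# Demo against a throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "contacts.csv"), "w", encoding="utf-8") as fh:
        fh.write("name,email\nAda,ada@example.com\nBob,bob@example.org\n")
    with open(os.path.join(root, "notes.txt"), "w", encoding="utf-8") as fh:
        fh.write("Quarterly planning notes, nothing personal here.\n")
    report = scan_tree(root)

flagged = sorted(os.path.basename(r["path"]) for r in report if r["hits"])
print(flagged)
```

Even this small version shows why dedicated tooling pays off at scale: binary formats, databases, and cloud storage all need their own connectors, which a simple file walk does not provide.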

Notable security frameworks that cover data inventories:

Many security frameworks cover data inventories as a part of their guidelines or controls and emphasize the importance of identifying, classifying, and managing data assets to ensure effective data protection.

  1. NIST Cybersecurity Framework (CSF): Developed by the National Institute of Standards and Technology (NIST), the CSF provides a set of best practices for organizations to manage and reduce cybersecurity risks. The framework's Identify function, through its Asset Management category, emphasizes the need for an organization to understand its data assets and includes guidelines for creating and maintaining a data inventory.

  2. ISO/IEC 27001: This international standard for information security management systems (ISMS) provides a systematic approach to managing sensitive information to ensure it remains secure. The standard includes controls related to data asset management, such as inventorying, classifying, and handling data.

  3. CIS Critical Security Controls: Developed by the Center for Internet Security (CIS), these controls provide a prioritized set of actions to improve an organization's cybersecurity posture. In version 8 of the controls, Control 3 focuses on Data Protection and includes requirements to create a data classification scheme and data inventory.

  4. Cloud Security Alliance (CSA) Cloud Controls Matrix (CCM): The CCM is a comprehensive security framework specifically designed for cloud computing environments. It provides a set of security controls, grouped into control domains, that organizations can use to assess the overall security of their cloud-based systems and services. The CCM covers various aspects of data protection and management, including data inventory. Two areas in particular emphasize the importance of data inventory and classification:

    1. Data Security & Information Lifecycle Management (DSI): This domain focuses on ensuring the confidentiality, integrity, and availability of data in the cloud environment. It includes controls related to data classification, data handling, data retention, and disposal, which all rely on having an accurate data inventory.

    2. Asset management: In CCM v3.0.1, asset management is addressed by a control within the Datacenter Security (DCS) domain rather than by a domain of its own. It covers the identification, classification, and management of assets, and underscores the need for a comprehensive, regularly updated inventory with assigned ownership so that proper security measures are in place.

  5. FAIR (Factor Analysis of Information Risk): The FAIR framework is a quantitative risk analysis model that helps organizations understand, analyze, and quantify information risk. To effectively use the FAIR model, organizations need to have a comprehensive understanding of their data assets, which is facilitated by maintaining a data inventory.

These security frameworks highlight the importance of creating and maintaining a data inventory as a critical component of an organization's data protection and risk management strategy. By following the guidelines and controls provided by these frameworks, organizations can better understand and manage their data assets, ultimately improving their overall security posture.
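
To show how an inventory feeds quantitative analysis of the kind FAIR describes, the sketch below multiplies an estimated loss event frequency by an estimated loss magnitude for each asset. This is a deliberately simplified single-point calculation, not the full FAIR method, which works with ranges and distributions. The function name `annualized_loss` and every figure are assumptions invented for the example.

```python
def annualized_loss(events_per_year: float, loss_per_event: float) -> float:
    """Single-point estimate: loss event frequency times loss magnitude."""
    return events_per_year * loss_per_event


# Hypothetical estimates attached to two inventory entries.
estimates = {
    "customer_accounts": {"events_per_year": 0.25, "loss_per_event": 400_000},
    "internal_wiki": {"events_per_year": 0.5, "loss_per_event": 20_000},
}

exposure = {name: annualized_loss(**e) for name, e in estimates.items()}

# Rank assets so protection effort goes to the largest exposure first.
ranked = sorted(exposure, key=exposure.get, reverse=True)
for name in ranked:
    print(f"{name}: {exposure[name]:,.0f} per year")
```

Without an inventory there is no list of assets to attach these estimates to, which is the dependency the FAIR entry above points out.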

What next

Use this guide, along with the frameworks and other online resources available to you, to construct your data inventory. Once the inventory is complete, you can take further action to protect the data. One recommendation is to segment the network so that assets with the same sensitivity level are grouped together and isolated from assets with different sensitivity levels. Segmentation enhances security by reducing the potential attack surface, and it lets you focus time and money on the most valuable and sensitive data segments.

Network hardware, including firewalls and switches, should be used to control access to each network segment. User access rules must be implemented so that only individuals with a legitimate business need can access the data. Software-based solutions, such as Zero-Trust solutions, can also help control access to data. These security configurations are difficult to implement effectively without first identifying and understanding the data.
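
The grouping idea can be sketched directly from inventory records. Everything here is hypothetical: the asset names, the `ALLOWED_ROLES` table, and the `plan_segments` and `may_access` helpers stand in for decisions that firewalls, switches, and identity systems would enforce in practice.

```python
from collections import defaultdict

# Hypothetical inventory entries reduced to the two fields needed here.
inventory = [
    {"asset": "hr_database", "sensitivity": "Sensitive"},
    {"asset": "payments_api", "sensitivity": "Sensitive"},
    {"asset": "intranet_wiki", "sensitivity": "Confidential"},
    {"asset": "public_website", "sensitivity": "Public"},
]

# Hypothetical allow-list: roles with a business need for each segment.
ALLOWED_ROLES = {
    "Sensitive": {"hr", "finance"},
    "Confidential": {"hr", "finance", "engineering"},
    "Public": {"hr", "finance", "engineering", "guest"},
}


def plan_segments(records):
    """Group assets so each segment holds a single sensitivity level."""
    segments = defaultdict(list)
    for record in records:
        segments[record["sensitivity"]].append(record["asset"])
    return dict(segments)


def may_access(role, asset, segments):
    """Allow access only when the role is listed for the asset's segment."""
    for label, assets in segments.items():
        if asset in assets:
            return role in ALLOWED_ROLES.get(label, set())
    return False  # default deny: assets missing from the inventory get no access


segments = plan_segments(inventory)
print(segments)
print(may_access("engineering", "hr_database", segments))
print(may_access("engineering", "intranet_wiki", segments))
```

The default-deny branch reflects the point made above: an asset that never made it into the inventory cannot be placed in a segment, so no sensible access rule can be written for it.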


Additional resources:

  1. CIS Critical Security Controls

  2. NIST Cybersecurity Framework

  3. Cloud Security Alliance - Cloud Controls Matrix

