Data loss prevention (DLP) software detects potential data breaches or data exfiltration transmissions and prevents them by monitoring,[1] detecting and blocking sensitive data while in use (endpoint actions), in motion (network traffic), and at rest (data storage).[2]
The terms "data loss" and "data leak" are related and are often used interchangeably.[3] Data loss incidents turn into data leak incidents in cases where media containing sensitive information are lost and subsequently acquired by an unauthorized party. However, a data leak is possible without losing the data on the originating side. Other terms associated with data leakage prevention are information leak detection and prevention (ILDP), information leak prevention (ILP), content monitoring and filtering (CMF), information protection and control (IPC) and extrusion prevention system (EPS), as opposed to an intrusion prevention system.
The technological means employed for dealing with data leakage incidents can be divided into four categories: standard security measures, advanced/intelligent security measures, access control and encryption, and designated DLP systems, although only the last category is thought of as DLP today.[4] Common DLP methods for spotting malicious or otherwise unwanted activity and responding to it mechanically are automatic detection and response. Most DLP systems rely on predefined rules to identify and categorize sensitive information, which in turn helps system administrators zero in on vulnerable spots. Extra safeguards can then be installed in those areas.
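As an illustration of rule-based identification and categorization, the following is a minimal sketch rather than the method of any particular product. The rule names, the patterns and the `classify` function are assumptions made for this example. Candidate payment card numbers are additionally checked with the Luhn checksum to reduce false positives.

```python
import re

# Hypothetical rule set for illustration: category name -> compiled pattern.
RULES = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13 to 19 digits, optionally separated by single spaces or hyphens.
    "payment_card": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
    "confidential_marker": re.compile(r"\bconfidential\b", re.IGNORECASE),
}


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(text: str) -> set:
    """Return the set of rule categories that match somewhere in the text."""
    found = set()
    for category, pattern in RULES.items():
        for match in pattern.finditer(text):
            if category == "payment_card":
                digits = re.sub(r"\D", "", match.group())
                if not luhn_valid(digits):
                    continue  # digit run that is not a plausible card number
            found.add(category)
    return found
```

A call such as `classify("SSN 123-45-6789, card 4111 1111 1111 1111")` reports both the `us_ssn` and `payment_card` categories, which an administrator could then map to a monitoring or blocking action.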
Standard security measures, such as firewalls, intrusion detection systems (IDSs) and antivirus software, are commonly available products that guard computers against outsider and insider attacks.[5] A firewall, for example, prevents outsiders from accessing the internal network, and an intrusion detection system detects intrusion attempts by outsiders. Inside attacks can be averted through antivirus scans that detect Trojan horses that send confidential information, and by the use of thin clients that operate in a client-server architecture with no personal or sensitive data stored on a client device.
Advanced security measures employ machine learning and temporal reasoning algorithms to detect abnormal access to data (e.g., databases or information retrieval systems) or abnormal email exchange, honeypots for detecting authorized personnel with malicious intentions, and activity-based verification (e.g., recognition of keystroke dynamics) and user activity monitoring for detecting abnormal data access.
Designated systems detect and prevent unauthorized attempts to copy or send sensitive data, intentionally or unintentionally, mainly by personnel who are authorized to access the sensitive information. To classify certain information as sensitive, these systems use mechanisms such as exact data matching, structured data fingerprinting, statistical methods, rule and regular expression matching, published lexicons, conceptual definitions, keywords and contextual information such as the source of the data.[6]
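Exact data matching against a structured source can be sketched as follows. This is a simplified illustration under stated assumptions, not a description of how any specific DLP product works: the function names, the record layout and the whitespace tokenizer are hypothetical, and only single-token values are handled. Protected values are stored as SHA-256 hashes so that the index itself does not hold the sensitive data in clear text.

```python
import hashlib


def fingerprint(value: str) -> str:
    """Hash a normalized value (trimmed, lower-cased) with SHA-256."""
    normalized = value.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_index(records) -> set:
    """Fingerprint every field value of every record (a list of dicts)."""
    return {fingerprint(v) for record in records for v in record.values()}


def find_exact_matches(text: str, index: set) -> list:
    """Return the tokens in text whose fingerprint appears in the index."""
    # Naive tokenizer (an assumption): split on whitespace and strip
    # common trailing or leading punctuation from each token.
    tokens = [t.strip(".,;:!?") for t in text.split()]
    return [t for t in tokens if t and fingerprint(t) in index]


# Hypothetical protected records, e.g. exported from a customer database.
records = [{"email": "alice@example.com", "account": "AC-10442"}]
index = build_index(records)
hits = find_exact_matches("Please wire to ac-10442 and cc Alice@Example.com.", index)
```

Here `hits` contains the two tokens that correspond to protected field values, regardless of letter case. Values that span several words would need a sliding window over tokens, which this sketch omits.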
Network (data in motion) technology is typically installed at network egress points near the perimeter. It analyzes network traffic to detect sensitive data that is being sent in violation of information security policies. Multiple security control points may report activity to be analyzed by a central management server.[3] Next-generation firewalls (NGFWs) and intrusion detection systems (IDSs) are common examples of technology that can be leveraged to perform DLP capabilities on the network.[7][8] Network DLP capabilities can usually be undermined by a sophisticated threat actor through the use of data masking techniques such as encryption or compression.[9]
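The effect of compression on content inspection can be demonstrated with a short sketch. It assumes a naive inspector that only pattern-matches the raw payload bytes; the pattern and the `payload_contains_ssn` function are hypothetical, and real network DLP products may unpack common archive and compression formats before scanning.

```python
import re
import zlib

# Hypothetical content rule: a US Social Security number pattern, in bytes.
SSN_PATTERN = re.compile(rb"\d{3}-\d{2}-\d{4}")


def payload_contains_ssn(payload: bytes) -> bool:
    """Naive inspection: search the raw payload bytes for the pattern."""
    return SSN_PATTERN.search(payload) is not None


plain = b"employee record: name=J. Doe ssn=123-45-6789"
compressed = zlib.compress(plain)

seen_in_plain = payload_contains_ssn(plain)
seen_in_compressed = payload_contains_ssn(compressed)
seen_after_unpacking = payload_contains_ssn(zlib.decompress(compressed))
```

The pattern is found in the plain payload and again after decompression, but not in the compressed bytes, because DEFLATE re-encodes the text as a packed bit stream. Encryption has a similar effect, with the difference that the inspector cannot reverse it without the key.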
Endpoint (data in use) systems run on internal end-user workstations or servers. Like network-based systems, endpoint-based systems can address internal as well as external communications. They can therefore be used to control information flow between groups or types of users (e.g. 'Chinese walls'). They can also control email and instant messaging communications before they reach the corporate archive, such that a blocked communication (i.e., one that was never sent, and therefore not subject to retention rules) will not be identified in a subsequent legal discovery situation. Endpoint systems have the advantage that they can monitor and control access to physical devices (such as mobile devices with data storage capabilities) and in some cases can access information before it is encrypted. Endpoint systems also have access to the information needed to provide contextual classification, for example the source or author generating content. Some endpoint-based systems provide application controls to block attempted transmissions of confidential information and provide immediate user feedback. They must be installed on every workstation in the network (typically via a DLP agent), and cannot be used on mobile devices (e.g., cell phones and PDAs) or where they cannot be practically installed (for example on a workstation in an Internet café).[10]
The cloud now contains a lot of critical data as organizations move to cloud-native technologies to accelerate virtual team collaboration. Data held in the cloud needs to be protected as well, since it is susceptible to cyberattacks, accidental leakage and insider threats. Cloud DLP monitors and audits the data, while providing access and usage control of data using policies. It establishes greater end-to-end visibility for all the data stored in the cloud.[11]
DLP includes techniques for identifying confidential or sensitive information. Sometimes confused with discovery, data identification is a process by which organizations use a DLP technology to determine what to look for.
Data is classified as either structured or unstructured. Structured data resides in fixed fields within a file such as a spreadsheet, while unstructured data refers to free-form text or media in text documents, PDF files and video.[12] An estimated 80% of all data is unstructured and 20% structured.[13]
Sometimes a data distributor, inadvertently or deliberately, gives sensitive data to one or more third parties, or uses it themselves in an authorized fashion. Some time later, some of the data is found in an unauthorized place (e.g., on the web or on a user's laptop). The distributor must then investigate the source of the loss.
"Data at rest" specifically refers to information that is not moving, i.e. that exists in a database or a file share. This information is of great concern to businesses and government institutions simply because the longer data is left unused in storage, the more likely it is to be retrieved by unauthorized individuals. Protecting such data involves methods such as access control, data encryption and data retention policies.[3]
"Data in use" refers to data that the user is currently interacting with. DLP systems that protect data in use may monitor and flag unauthorized activities.[3] These activities include screen-capture, copy/paste, print and fax operations involving sensitive data. They may be intentional or unintentional attempts to transmit sensitive data over communication channels.
"Data in motion" is data that is traversing a network to an endpoint. Networks can be internal or external. DLP systems that protect data in motion monitor sensitive data traveling across a network through various communication channels.[3]