“Data Loss Prevention (DLP) is a critical aspect of data security, aimed at identifying and preventing the unauthorized access, use, disclosure, disruption, modification, inspection, recording or destruction of sensitive data. With the increasing amount of data being generated and stored, the need for effective DLP solutions is more important than ever. Machine learning, with its ability to analyze patterns and behaviors in data usage, has emerged as a powerful tool for DLP.”
Machine learning (ML) has come a long way. While long used extensively in Natural Language Processing, image recognition, and playing games like chess, ML’s reach is expanding. It is now being used to assist in water conservation, identify drugs that could help smokers quit, and even to study how bird roosting habits are changing with climate.
Machine learning has become so popular that it is now an obligatory buzzword. Organizations, including security vendors, use it liberally, sometimes without context or explanation, and sometimes without any actual machine learning behind the claim. We think how Next DLP uses it is pretty special and want to shed some light on it.
What is Machine Learning (ML)?
Machine learning is a set of algorithms and statistical models that allow computer systems to learn and improve from experience. At its heart, ML is looking at large data sets and extracting patterns. Systems can use these patterns to make predictions, inform decisions, and automate tasks without relying on explicit instructions from programmers and without human intervention.
ML in the Cloud
Machine learning has obvious applications in DLP. By analyzing activity over time, an ML system can determine a baseline of “normal” behavior. Once the platform establishes that baseline, it can identify activities that deviate from it and generate an alert. This could include someone from HR accessing large amounts of engineering data or vice versa.
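To make the idea of a baseline concrete, here is a minimal sketch in Python. It is not how any particular DLP product works; it assumes a baseline can be summarized as the mean and standard deviation of a daily activity count, and it treats anything more than three standard deviations away as a deviation. The function names, the threshold, and the sample data are all hypothetical.

```python
from statistics import mean, stdev

def build_baseline(history):
    """Summarize past daily activity counts as (mean, standard deviation)."""
    return mean(history), stdev(history)

def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        # No variation in the history: any different value is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical data: engineering files an HR user opened on each of ten days.
history = [2, 0, 1, 3, 2, 1, 0, 2, 1, 2]
baseline = build_baseline(history)

typical_day = is_anomalous(2, baseline)    # within the user's usual range
bulk_access = is_anomalous(150, baseline)  # far outside the usual range
print(typical_day, bulk_access)
```

A real system would track many more signals than a single count, but the principle is the same: learn what is usual, then measure how far new activity falls from it.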
The problem with most approaches to ML in DLP lies in the data sets used for training. Legacy vendors designed their DLP solutions to follow granular rules to identify risky behavior. This means they need to pre-classify all data in the organization and determine which activities each set of users may perform with each class of data. They took a similar approach when applying ML: they record every action by every user and upload it to an engine in the cloud for analysis. This presents two problems. First, the “training” period required to establish a baseline of normal activity can take months, delaying the point at which a customer receives value. Second, it is prone to false positives because the data is not clean; not every user in HR, engineering, sales, or finance operates in the same way, so a baseline pooled across them reports ordinary individual variation as a deviation.
ML on the Endpoint is Smarter
Next Reveal takes a different approach. We designed Reveal using today’s technology for today’s work environment, where cloud applications, BYOD, and remote work are the norm. That means intelligence and machine learning are required on the endpoint.
By putting ML on each endpoint, Reveal solves the problem of poor data sets. Reveal observes each user individually and tailors a baseline to that user in a few weeks, not several months. It does not require a cloud or network connection to collect data, establish patterns, analyze behavior, and enforce controls. Data analyzed on the endpoint remains on the endpoint.
By individualizing each profile, Reveal allows organizations to move away from the requirement for granular policies for the organization at large and the resulting false positives. Individual baselines surface individual anomalies, isolating risks to each device and user.
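The difference between a pooled baseline and an individual one can be illustrated with a small sketch. This is an assumption-laden toy, not Reveal's implementation: it uses a simple z-score with a threshold of 3, and the user names and daily upload counts are invented. Eleven hypothetical analysts upload a handful of files a day, while one hypothetical release engineer routinely uploads around forty.

```python
from statistics import mean, stdev

def zscore(value, history):
    """Distance of `value` from the history's mean, in standard deviations."""
    return abs(value - mean(history)) / stdev(history)

def flagged(value, history, threshold=3.0):
    """True when `value` deviates from `history` by more than `threshold`."""
    return zscore(value, history) > threshold

# Hypothetical daily upload counts for one department.
light = [1, 2, 1, 3, 2, 1, 2, 2]
heavy = [40, 45, 38, 50, 42, 47, 44, 41]
histories = {f"analyst_{i}": list(light) for i in range(11)}
histories["release_engineer"] = heavy

# Departmental baseline: pool every user's history together.
pooled = [count for h in histories.values() for count in h]

# A routine day (44 uploads) for the release engineer.
dept_flags_heavy = flagged(44, pooled)                         # True: false positive
user_flags_heavy = flagged(44, histories["release_engineer"])  # False

# An unusual day (20 uploads) for an analyst.
dept_flags_light = flagged(20, pooled)                  # False: missed
user_flags_light = flagged(20, histories["analyst_0"])  # True

print(dept_flags_heavy, user_flags_heavy, dept_flags_light, user_flags_light)
```

Against the pooled baseline, the engineer's ordinary workload looks alarming and the analyst's sudden spike looks unremarkable. Against each person's own history, the results reverse.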
For example, an ML solution looking at the organization at large will generate alerts based on deviations from the organizational or departmental baseline. Multiple failed logins aren’t normal; therefore, a legacy DLP solution will generate an alert. With ML on the endpoint, Reveal can look at individual deviations. Was the keystroke pattern consistent with that user’s normal pattern, or was it more rapid and indicative of credential stuffing? After login, was the individual’s behavior normal or did they do something else unusual, such as launch new software or visit uncommon IP addresses? By stacking and correlating these activities as they occur – and against an individual’s baseline – Reveal can establish patterns, analyze behavior, and enforce controls quickly on and off the corporate network and without a connection to a cloud-based ML engine.
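One way to picture “stacking and correlating” is as a running risk score built from several weak signals that occur close together in time. The sketch below is purely illustrative: the `Signal` type, the ten-minute window, the per-signal scores, and the alert and block thresholds are all assumptions made for the example, not details of Reveal.

```python
from dataclasses import dataclass

@dataclass
class Signal:
    name: str
    timestamp: float  # seconds since the session started
    score: float      # 0.0 (normal for this user) to 1.0 (highly unusual)

def correlated_risk(signals, window_seconds=600):
    """Sum the scores of signals within `window_seconds` of the latest one."""
    if not signals:
        return 0.0
    latest = max(s.timestamp for s in signals)
    return sum(s.score for s in signals if latest - s.timestamp <= window_seconds)

def decide(risk, alert_at=1.0, block_at=2.0):
    """Map a combined risk score to an action (hypothetical thresholds)."""
    if risk >= block_at:
        return "block"
    if risk >= alert_at:
        return "alert"
    return "allow"

# Failed logins alone are a weak signal for this user.
lone = [Signal("failed_logins", 0, 0.6)]

# The same failed logins, followed by atypical typing and new software.
stacked = lone + [
    Signal("keystroke_pattern_mismatch", 30, 0.8),
    Signal("new_software_launched", 120, 0.7),
]

lone_action = decide(correlated_risk(lone))
stacked_action = decide(correlated_risk(stacked))
print(lone_action, stacked_action)
```

In this toy model, no single event triggers enforcement on its own; the combination does. Because every input is local to the device, nothing in the calculation depends on a round trip to a cloud service.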
Next Reveal’s approach to ML provides faster baselining, more accurate enforcement, and more rapid time to value. You can learn more in our Fireside Chat with Next DLP ML team lead Alan Brown.