What is URL classification and why does it matter?

Despite our perfectly good intentions, we have all found ourselves lost at least once in a “dark corner” of the Internet. Each of these dark places has a precise location, and one way to identify that location is a URL, or “Uniform Resource Locator.”

Why do you need to track URLs in the first place? The answer has to do with cyber forensics and threat intelligence—the art of tracking anomalous activity and investigating cyber incidents.

In a nutshell, say you are an analyst, and you want to know where one of your users picked up some malware, or say you are looking to reconstruct a data exfiltration pattern. You are likely to start by looking at the list of URLs associated with the user's traffic at a particular point in time.

So URL information is crucial to security and IT teams. But is picking out some unusual URL enough? Not really. Today, there are more than 1.8 billion websites on the Internet, making for oodles of dark corners. So how do cyber analysts go from identifying a URL to preventing threats, detecting insider risk, or mitigating a breach? How do they use URL information to go all the way from gathering threat context and qualifying the malicious cyber actor’s intent to identifying the particular type of data breach and taking remediation action?

And can the security team go even further and use URL data to fend off cyber threats? Will they be able to use this contextual data (from the URL) and generalize to other breaches of the same kind—to build effective acceptable-use policies and prevent cyber incidents?

What is URL classification?

Not all URLs are born equal. And to make sense of billions of URLs, you need something called “website categorization” or “URL classification.”

Website categorization reduces a vast number of websites into a limited number of categories, which helps security teams monitor user activity. It can be a pretty powerful feature in the hands of an advanced data loss protection solution. URL categorization insights now supercharge our Reveal solution, which relies on smart policies and machine learning.

Reveal’s new URL classification feature provides enhanced context to the browsing activity of your organization’s users. For example, besides flagging that a user visited a site like dropbox.com, Reveal now provides additional context that the site falls under the category type CLOUD_STORAGE. This insight gives analysts immediate context as to whether the activity in question is benign or one that needs deeper investigation.

Our URL classification for DLP relies on a third-party URL categorization database that stores and organizes billions of web domains according to generic attributes like language, specific security-related URL categories, and custom URL categories. This implementation strategy gives the advantage of a centralized URL categorization service that receives frequent updates and can be accessed in near-real-time.
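To make the dropbox.com example concrete, here is a minimal Python sketch of a category lookup. It is not Reveal's implementation: the `CATEGORY_DB` table, the `classify_url` function, the `WEBMAIL` and `UNCATEGORIZED` labels, and the `example-mail.test` domain are all assumptions made for illustration. Only the `CLOUD_STORAGE` label for dropbox.com comes from the text above.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for a third-party categorization database.
# Only the dropbox.com -> CLOUD_STORAGE pair comes from the article.
CATEGORY_DB = {
    "dropbox.com": "CLOUD_STORAGE",
    "example-mail.test": "WEBMAIL",
}


def classify_url(url: str) -> str:
    """Return the category for a URL, or UNCATEGORIZED if it is unknown."""
    # URLs without a scheme have no parsed hostname and end up UNCATEGORIZED.
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Try the full host first, then progressively shorter parent domains,
    # so www.dropbox.com falls back to dropbox.com.
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in CATEGORY_DB:
            return CATEGORY_DB[candidate]
    return "UNCATEGORIZED"


print(classify_url("https://www.dropbox.com/home"))
```

A real categorization service would hold billions of entries and update frequently, so the dictionary here only stands in for the lookup step.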

Why do you need to classify URLs?

URLs are powerful indicators of compromise. Identifying them systematically is a key cyber threat intelligence task, especially for technologies and security operations teams that need to process massive data streams. But what would you use this information for?

First, URL categorization helps to run cyber hygiene assessments by uncovering the use of adult content, phishing or gambling sites, VPN service websites, peer-to-peer services, or social media. Knowing whether users are abusing company IT resources, or simply taking missteps in how they use them, leads to an improved cyber security posture.

Second, for cyber forensics and investigation, URL classification can help track malicious actors, identify sites that promote malicious software distribution, or spot attempts to circumvent security solutions, like using a remote proxy.

In essence, if you want to answer questions like "Is anyone trying to bypass my security controls?", "Where is my data going?", or "Are my users sticking to good security practices?", you will greatly benefit from URL classification.
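As a rough illustration of the hygiene assessment described above, the following Python sketch tallies visits to categories an organization might consider risky. The event shape (a dict with `user` and `category` keys), the `RISKY` set, and the category labels are hypothetical and are not taken from Reveal.

```python
from collections import Counter

# Hypothetical category labels an organization might treat as hygiene concerns.
RISKY = {"PHISHING", "GAMBLING", "PEER_TO_PEER", "VPN"}


def hygiene_summary(events):
    """Count visits per (user, category), keeping only risky categories."""
    counts = Counter()
    for event in events:
        if event["category"] in RISKY:
            counts[(event["user"], event["category"])] += 1
    return counts


sample_events = [
    {"user": "alice", "category": "GAMBLING"},
    {"user": "alice", "category": "GAMBLING"},
    {"user": "bob", "category": "NEWS"},
    {"user": "bob", "category": "VPN"},
]

for (user, category), visits in hygiene_summary(sample_events).items():
    print(user, category, visits)
```

A summary like this only points to where a closer look may be worthwhile; it does not say whether any single visit was malicious.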

How do you do URL classification?

The new URL classification feature from Reveal is fast, secure, and easy to integrate with your security stack. Reveal’s URL classification system takes browser and sensor event URLs received by the cloud infrastructure, compares them against a third-party URL classification database, and classifies them. Since the third-party database resides in the cloud infrastructure, all URLs are processed internally and not shared externally. The classified events are stored in the database and are accessible via the Reveal UI and API.
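The flow described above (events in, database lookup, classified events out) can be sketched in a few lines of Python. This is an illustration under assumptions, not Reveal's code: the `Event` fields, `classify_events`, and the toy `lookup_category` function are all hypothetical, and the lookup recognizes only the dropbox.com example used in this article.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Iterator
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Event:
    """Hypothetical browser or sensor event carrying a URL."""
    user: str
    url: str
    category: str = "UNCLASSIFIED"


def lookup_category(url: str) -> str:
    """Toy stand-in for a query against a categorization database."""
    host = (urlsplit(url).hostname or "").lower()
    if host == "dropbox.com" or host.endswith(".dropbox.com"):
        return "CLOUD_STORAGE"
    return "UNCATEGORIZED"


def classify_events(
    events: Iterable[Event], lookup: Callable[[str], str]
) -> Iterator[Event]:
    """Yield a classified copy of each event; the inputs are left unchanged."""
    for event in events:
        yield replace(event, category=lookup(event.url))


raw = [
    Event("alice", "https://www.dropbox.com/upload"),
    Event("bob", "https://intranet.example/wiki"),
]
for classified in classify_events(raw, lookup_category):
    print(classified.user, classified.category)
```

Passing the lookup in as a function keeps the enrichment step independent of where the database lives, which matches the idea that the database sits inside the same infrastructure as the events.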

For example, if the organization’s authorized file share is OneDrive, a visit to Dropbox would warrant greater scrutiny to determine if the user uploaded files to Dropbox and which files. Were there any sensors triggered on the upload of that file that alerted to the fact that it contained sensitive information?
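The OneDrive versus Dropbox scenario amounts to a simple rule: a visit is worth a closer look when its category is cloud storage and its host is not on the organization's approved list. The Python sketch below shows that rule under stated assumptions: the event dict shape, the `AUTHORIZED_CLOUD_STORAGE` set (with `onedrive.live.com` as an illustrative entry), and `needs_scrutiny` are hypothetical, not part of Reveal.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of the organization's approved file-sharing hosts.
AUTHORIZED_CLOUD_STORAGE = {"onedrive.live.com"}


def needs_scrutiny(event: dict) -> bool:
    """True for cloud-storage visits to hosts outside the allowlist."""
    if event["category"] != "CLOUD_STORAGE":
        return False
    host = (urlsplit(event["url"]).hostname or "").lower()
    # A host is authorized if it equals an allowlisted domain or is a subdomain of one.
    authorized = any(
        host == domain or host.endswith("." + domain)
        for domain in AUTHORIZED_CLOUD_STORAGE
    )
    return not authorized


visit = {"url": "https://www.dropbox.com/upload", "category": "CLOUD_STORAGE"}
print(needs_scrutiny(visit))
```

Flagged visits would then be matched against any upload sensors that fired around the same time to answer the question of which files, if any, left the organization.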

Reveal’s Investigate module, with this enhanced URL classification feature, makes this kind of contextualized investigation simple.

The Reveal agent allows analysts to filter through millions of events collected across the organization’s endpoints within a few steps, without going to a separate site to categorize URLs or downloading data from Reveal and writing custom scripts to merge different datasets for categorization and analysis. Imagine looking at a sensor triggered by a user entering their credentials on a site and immediately seeing whether that site is a phishing site or a newly registered domain. Or imagine searching for all sites categorized as “Webmail” and seeing whether the sites visited are company authorized, or whether users have been sending files to unauthorized webmail domains. This simplification saves valuable time in the investigation process.

Reveal’s URL classification categories include: