Silent Push HTML Content Similarity Search: One-Click Discovery of Linked Threat Domains 

Adversaries often use multiple domains that feature the same visual characteristics, created in a ‘copy and paste’ fashion that makes it easier to deploy malicious content at scale. 

Whether it's infostealer control panels, command and control infrastructure, phishing sites, or malware delivery portals, manually discovering and traversing malicious clusters of infrastructure can be a tedious and error-prone process.

Release 4.9 introduces HTML Content Similarity search – a feature that makes it quick and easy to shine a light on groups of malicious domains that share the same look and feel. 

HTML Content Similarity Search is available as a pivot control on Silent Push Web Scanner results and as a standalone menu option. Analysts can use it to uncover clusters of websites that share between 50% and 100% similarity.

HTML Content Similarity is almost like a separate product in itself – it’s that powerful – but we provide it as a feature, bundled in with a Silent Push Enterprise subscription. 

This blog introduces the concept, but there’s a LOT more to discuss. 

Over the next few weeks, we’ll be publishing more on how to ingest the data gained from a similarity search into your security stack, including SOAR and SIEM integration, and the tangible benefits that process provides to security teams.

How Does It Work? 

HTML Content Similarity search uses ssdeep fuzzy hashing – a method designed to measure similarity between files, even if they are not exact copies.

We use an algorithm that hashes visible content on scanned domains and identifies matching websites, scoring each match as a similarity percentage.

For HTML pages, this means we can detect sites with slight variations (e.g., minor visual and textual changes) that would otherwise be missed by exact matching.
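
To make the idea of fuzzy matching concrete, here is a minimal Python sketch. It is not Silent Push's implementation and it does not use ssdeep. As a stand-in, it scores similarity with the standard library's difflib.SequenceMatcher, purely to show how two near-identical pages can score highly without being exact copies. The names VisibleTextExtractor, visible_text and similarity_percent are hypothetical, and skipping only script and style content is a simplifying assumption about what counts as "visible".

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the page's text content as one space-joined string."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def similarity_percent(html_a, html_b):
    """Return a 0-100 score comparing the visible text of two pages.

    Stand-in for a fuzzy-hash comparison: this uses difflib, not ssdeep.
    """
    matcher = SequenceMatcher(None, visible_text(html_a), visible_text(html_b))
    return round(matcher.ratio() * 100)


if __name__ == "__main__":
    template = "<body><h1>Big Sale</h1><p>Up to 70% off all jackets today</p></body>"
    variant = "<body><h1>Big Sale</h1><p>Up to 80% off all jackets today</p></body>"
    print(similarity_percent(template, variant))
```

An exact-match comparison would treat the two example pages as unrelated because one character differs, whereas a similarity score keeps them grouped together.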

HTML Content Similarity searches are conducted in two ways:

  1. As a dedicated menu item under ‘Web Data’ 
  2. As a contextual pivot control on Web Scanner results

Both options let analysts adjust the match percentage of the results, from 50% up to a full 100% match.
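
For teams that post-process results outside the platform, the same idea can be sketched in a few lines of Python. This assumes the percentage acts as a minimum threshold and that each result carries a domain and a similarity score. The SimilarityMatch record, its field names, and filter_matches are hypothetical and do not reflect the Silent Push API or export format. The domains are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimilarityMatch:
    domain: str      # hypothetical field: the matched domain
    similarity: int  # hypothetical field: match percentage, 0-100


def filter_matches(matches, min_similarity=50):
    """Keep matches at or above the threshold, highest similarity first."""
    if not 50 <= min_similarity <= 100:
        raise ValueError("min_similarity must be between 50 and 100")
    kept = [m for m in matches if m.similarity >= min_similarity]
    return sorted(kept, key=lambda m: m.similarity, reverse=True)


if __name__ == "__main__":
    results = [
        SimilarityMatch("shop-one.example", 62),
        SimilarityMatch("shop-two.example", 97),
        SimilarityMatch("shop-three.example", 85),
    ]
    for match in filter_matches(results, min_similarity=80):
        print(match.domain, match.similarity)
```

Raising the threshold narrows the list to near-identical copies, while lowering it surfaces looser variants of the same template at the cost of more noise.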

Why This Matters to Security Teams

Malicious actors don’t typically create every single domain or website they use from scratch. Instead, adversaries rely on templates to build out large networks of related infrastructure that share the same underlying characteristics. 

This tactic helps them scale their operation in several ways – whether it's delivering malware across hundreds of domains and IPs, or applying the same look and feel to phishing infrastructure targeting multiple brands and organizations.

Without automated tracking methods, identifying these clusters requires a significant amount of manual work – checking domain by domain, hunting for visual or textual similarities, and cross-referencing disparate intel sources – that quickly eats into threat discovery and resolution times. 

HTML Content Similarity search facilitates fast discovery of threat infrastructure by:

  • Quickly uncovering related domains from a single suspicious site or hash 
  • Providing a focused list of connected websites, reducing noise and guesswork 
  • Facilitating rapid investigation by pivoting directly from Silent Push Web Scanner results 
  • Helping to prioritize blocking actions based on cluster size and similarity
  • Integrating with our industry-leading DNS and content datasets to provide additional context for each domain 
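
As an illustration of prioritizing by cluster size and similarity, the following Python sketch ranks clusters by the number of matched domains, then by average similarity. The input shape (a mapping of cluster name to a list of domain and percentage pairs), the ranking rule, and the name rank_clusters are all assumptions made for this example, not a description of how the platform orders results.

```python
from statistics import mean


def rank_clusters(clusters):
    """Order cluster names by size, then by mean similarity, largest first.

    `clusters` maps a cluster name to a list of (domain, similarity) pairs.
    """
    def score(item):
        _name, matches = item
        average = mean(sim for _domain, sim in matches) if matches else 0
        return (len(matches), average)

    ranked = sorted(clusters.items(), key=score, reverse=True)
    return [name for name, _matches in ranked]


if __name__ == "__main__":
    # Placeholder data using reserved .example domains.
    clusters = {
        "template-a": [("a1.example", 90), ("a2.example", 80)],
        "template-b": [("b1.example", 100)],
        "template-c": [("c1.example", 95), ("c2.example", 95)],
    }
    print(rank_clusters(clusters))
```

Sorting on size first favors blocking the widest campaigns. A team more concerned with false positives might instead sort on average similarity first.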

Detecting Similar Content: Chinese Retail Scam Campaign 

Let’s look at an example.

tommyilfigershop[.]com is part of a larger campaign involving thousands of websites deployed by Chinese threat actors spoofing well-known brands. 

Retail scam site

Running a content similarity search on the domain immediately returns numerous lookalike websites, with high correlation scores of 80-100% similarity:

Once a set of results has been obtained, users can perform a live lookup of any returned domains using the Live Scan pivot, and get a real-time snapshot of live visual content.

Here’s a live lookup of wattlea[.]com from the above results, confirming visual similarity with tommyilfigershop[.]com, indicating deployment using the same template: 

From one known malicious site, analysts can quickly pivot and generate results listing thousands of domains sharing similar HTML patterns, revealing additional campaign infrastructure in minutes instead of hours. 

Feature Recap 

Web Scanner pivot 

  • Run a Web Scanner query using 200+ input parameters 
  • Identify a suspicious or known malicious site
  • Click the pivot icon to view similar HTML results

Standalone query 

  • Use the Web Data menu to launch an HTML Content Similarity search directly
  • Perform additional contextual pivots on returned domains

Silent Push Contextual Data 

We present this data alongside everything else we know about each domain – which is significantly more than any other cybersecurity vendor can offer. 

Our platform breaks down each website into over 200 pivotable categories, using a proprietary scanning and aggregation engine that’s all our own work. 

We don’t rely on stale lists of publicly known IOCs. We collect and deliver our own infrastructure-level intelligence, which allows us to be infinitely flexible in how we present actionable data to our customers. 

These data points enable teams to drill down into the underlying TTPs that govern how a threat actor is managing their infrastructure, and uncover additional connections or infrastructure components that are otherwise difficult to get at. 

For example, after discovering a cluster of phishing domains via a similarity search, analysts can pivot on common certificate issuers or matching JavaScript code to find more suspicious sites, or look at similarities in HTML titles to confirm grouping. 

Outcomes 

By simplifying and accelerating the discovery of related malicious domains, our HTML Content Similarity search gives security teams the ability to: 

  • Identify attacker domains faster, and with a greater degree of accuracy
  • Reduce manual overhead and investigation fatigue 
  • Proactively block clusters of malicious sites before they cause harm 
  • Uncover previously hidden relationships between threat campaigns 

Book A Demo 

Ready to transform your threat intelligence workflows and massively improve detection times? Use the form below for a customised walkthrough of our HTML Content Similarity functionality, and everything else the platform has to offer.