Security / October 24, 2022

An OSINT Analysis of x509 Certificates, Part Two: Digging Into the SAN

Editor’s note: This article is the second of two blog posts on analyzing x509 certificates using open-source intelligence. Read part one here.

Part one of this two-part series on OSINT analysis of x509 certificates dove into a handful of fields within x509 certificates that are useful when trying to determine if those certs are used during phishing, malware, or just general business operations.

In part two, we focus on one section of x509 certificates: the Subject Alternative Name (SAN). There is a wealth of information buried in the SAN, especially when combined with the x509 issuer and subject.

Subject Alternative Name x509 Extension

When it comes to inferring context about a domain from its x509 certificate, the real treasure trove is the subject alternative name (SAN). The SAN is a way to indicate a list of host names that are covered by the certificate. Different organizations populate the SAN in different ways that can tell you a bit about how the organization manages its domain(s).

Legitimate domains run by small organizations usually list two hosts in their SAN: the domain name and either a wildcard host to cover all possible subdomains (Figure 1) or a “www.” subdomain.

Figure 1. Example of SAN with domain and wildcard domain.

Phishing domains almost always follow the above pattern. Rarely do they include other domains, as this would leak information about their operations.
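The two-host pattern described above is simple enough to check in code. Here is a minimal sketch, assuming the SAN has already been extracted into a list of host strings; the function name `is_simple_san` is hypothetical and not taken from any library.

```python
def is_simple_san(san_hosts):
    """Return True if the SAN is a base domain plus a wildcard or "www." host."""
    # Normalize case and drop any trailing dot before comparing.
    hosts = {h.lower().rstrip(".") for h in san_hosts}
    if len(hosts) != 2:
        return False
    # Treat the shorter entry as the base domain and test the other against it.
    base = min(hosts, key=len)
    other = (hosts - {base}).pop()
    return other in (f"*.{base}", f"www.{base}")


print(is_simple_san(["example.com", "*.example.com"]))
print(is_simple_san(["example.com", "mail.example.com"]))
```

A match on its own says nothing about intent, since small legitimate sites and phishing sites share this shape. It is only useful as one feature among several.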

One of the funniest phishing certificates I’ve come across tried to take a shotgun approach to the subject alternative name:

Only around 50 percent of the certificates used for malware that we’ve observed even included a SAN. Those that do usually only have a single domain name, though those issued by Let’s Encrypt have either a “www.” or wildcard subdomain prefix. Like phishing certificates, rarely do malware certificates include other domains in the SAN.

Larger corporations tend to have much more complex SANs that can include dozens of hosts. A common pattern is to see the domain name plus an explicit set of wildcarded subdomains (Figure 2). This protects known subdomains but not subdomains that might be created for malicious use if the site were compromised, a tactic known as domain shadowing.

Figure 2. Multiple subdomain hosts in SAN.

The other pattern commonly seen is a SAN that includes multiple different domain names all owned and operated by the same parent corporation. These are used as a single catch-all certificate for all owned assets.

Tip: Look for newly issued certificates that only contain a single host with a subdomain but not the registered domain name. Also, look to see if the certificate was issued by a different CA than the primary domain’s certificate issuer.
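The tip above can be turned into two boolean signals. The sketch below is illustrative only: `registered_domain` is a hypothetical helper that naively takes the last two labels of a host, which is wrong for suffixes such as "co.uk", so a real implementation would want the Public Suffix List. Certificate age is not handled here and would need to be checked separately.

```python
def registered_domain(host):
    """Naive guess at the registered domain: the last two labels of the host."""
    # ASSUMPTION: two-label rule only; multi-label suffixes (e.g. "co.uk")
    # need the Public Suffix List to be handled correctly.
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def shadowing_signals(san_hosts, issuer, primary_issuer):
    """Return the two signals from the tip as a dict of booleans."""
    hosts = [h.lower().rstrip(".") for h in san_hosts]
    # Signal 1: exactly one host, and it is a subdomain, not the registered domain.
    lone_subdomain = len(hosts) == 1 and hosts[0] != registered_domain(hosts[0])
    # Signal 2: the issuer differs from the one on the primary domain's certificate.
    return {
        "lone_subdomain": lone_subdomain,
        "different_issuer": issuer != primary_issuer,
    }


print(shadowing_signals(["login.example.com"], "CN=Example CA A", "CN=Example CA B"))
```

Both signals being true is a reason to look closer, not a verdict.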

Hosting Platforms

One of the easiest ways to tell if a domain is being hosted on a hosting platform such as WordPress or cPanel is to inspect its certificate. Every platform has its own signature, but you can usually identify it by looking at the issuer, subject, and SAN values.

WordPress, for example, creates catch-all certificates for 25 different domains at a time and has the following signature:

  • Signature algorithm: SHA-256 with RSA Encryption
  • Subject: CN=tls.automattic.com
  • SAN: 25 different domain names, plus hosts for those with “www.” subdomains

cPanel, a hosting platform that is popular with both small companies and phishing sites, is another example that is easy to identify. The SAN will include the domain name, a host with “cpanel.” as a subdomain, and any combination of the subdomains shown in Figure 3.

Figure 3. SAN with cPanel subdomain hosts.

This is a useful signature when combined with the age of the domain and passive DNS data. Any cPanel-hosted domain that’s less than three to four months old and has passive DNS telemetry for the last couple of weeks is most likely a phishing domain.
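As a rough illustration of combining the cPanel signature with domain age and passive DNS recency, consider the sketch below. It only checks for the “cpanel.” host alongside its parent domain, because the additional subdomains from the figure are not reproduced here. The function names, and the thresholds of 120 days and 14 days, are assumptions chosen to approximate "three to four months" and "the last couple of weeks".

```python
def has_cpanel_signature(san_hosts):
    """True if the SAN holds some domain together with 'cpanel.<that domain>'."""
    hosts = {h.lower().rstrip(".") for h in san_hosts}
    prefix = "cpanel."
    return any(h.startswith(prefix) and h[len(prefix):] in hosts for h in hosts)


def likely_cpanel_phish(san_hosts, domain_age_days, days_since_last_pdns_hit,
                        max_age_days=120, recent_days=14):
    """Combine the SAN signature with domain age and passive DNS recency."""
    # ASSUMPTION: both thresholds are illustrative defaults, not measured values.
    return (
        has_cpanel_signature(san_hosts)
        and domain_age_days < max_age_days
        and days_since_last_pdns_hit <= recent_days
    )


san = ["example.com", "cpanel.example.com", "www.example.com"]
print(likely_cpanel_phish(san, domain_age_days=30, days_since_last_pdns_hit=2))
```

The age and passive DNS inputs would come from whatever WHOIS and passive DNS sources you already have; this sketch does not query anything.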

There is probably a slew of other hosting platforms that can be identified by sussing out their individual x509 signatures.

Reverse Proxies

Reverse proxies sit in front of domains and forward client requests to those domains, but to the client, it looks like it is talking directly to the target domain. One of the easiest ways to tell if a domain is sitting behind a reverse proxy is to look at the domain’s x509 certificate.

In most cases, the subject’s common name will not be the domain being browsed to, but the domain owned by the reverse proxy service. The SAN will contain the domain owned by the reverse proxy service as well as the domain being browsed to and any included subdomains.

CloudFlare, for example, has the following signature:

  • Issuer contains “CN=Cloudflare Inc ECC CA-3”
  • Subject contains “CN=sni.cloudflaressl.com”
  • SAN contains “sni.cloudflaressl.com”, the target domain name, and sometimes a wildcard prefix on the target domain
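The three-part signature above maps directly onto substring and membership checks. This is a minimal sketch, assuming the issuer and subject are available as distinguished-name strings and the SAN as a list of hosts; the function name is hypothetical, and the example issuer string is only there to show the shape of the input.

```python
def matches_cloudflare_signature(issuer, subject, san_hosts):
    """Check the issuer, subject, and SAN against the signature listed above."""
    hosts = {h.lower().rstrip(".") for h in san_hosts}
    return (
        "CN=Cloudflare Inc ECC CA-3" in issuer
        and "CN=sni.cloudflaressl.com" in subject
        and "sni.cloudflaressl.com" in hosts
    )


print(matches_cloudflare_signature(
    issuer="O=Cloudflare, CN=Cloudflare Inc ECC CA-3",
    subject="CN=sni.cloudflaressl.com",
    san_hosts=["sni.cloudflaressl.com", "example.com", "*.example.com"],
))
```

Signatures like this drift as providers rotate issuing CAs, so the strings are best kept in configuration rather than hard-coded.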

Interestingly, if you search the certificate transparency log for a domain that uses CloudFlare as a reverse proxy, you will probably see two valid certificates: one that was issued by the original certificate authority to the domain and one issued by CloudFlare for the reverse proxy server. So, if you want to know the actual CA used by the domain owner, you must search the certificate transparency log.

Microsoft Defender also has a reverse proxy service that can be detected in a similar fashion.

TLS Inspection Proxies

TLS inspection proxies are another interesting case when it comes to inspecting x509 certificates. TLS inspection proxies establish separate TLS tunnels with the client and the destination server, which allows them to decrypt and inspect HTTPS traffic. To do this, the proxy server needs to automatically generate x509 certificates for the destination domain.

Consider this scenario: A client behind a TLS inspection proxy browses to “some-domain.com.” The proxy server establishes a TLS session with “some-domain.com,” inspects the certificate returned in the TLS handshake, and issues a new certificate with the Subject and SAN from the destination certificate. The proxy server then continues with the TLS handshake with the client, returning the newly issued certificate.

So, in the case of a Zscaler TLS inspection server, the x509 certificate the client receives will have the following signature:

  • Issuer contains “CN=Zscaler Intermediate Root CA” or something similar
  • Subject and SAN match the values from the certificate originally issued to “some-domain.com”
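One way to spot this in practice is to compare the certificate seen on the client side with a reference copy of the destination's certificate, for example one pulled from the certificate transparency log. The sketch below assumes each certificate has been summarized as a dict with `issuer`, `subject`, and `san` keys; that layout, the function name, and the `proxy_marker` parameter are all hypothetical.

```python
def looks_tls_inspected(observed, original, proxy_marker="zscaler"):
    """True if `observed` copies the identity of `original` under a proxy issuer."""
    # The proxy copies the Subject and SAN from the destination certificate...
    same_identity = (
        observed["subject"] == original["subject"]
        and set(observed["san"]) == set(original["san"])
    )
    # ...but signs the new certificate with its own CA.
    reissued = observed["issuer"] != original["issuer"]
    # ASSUMPTION: a case-insensitive substring match stands in for
    # "CN=Zscaler Intermediate Root CA or something similar".
    return same_identity and reissued and proxy_marker in observed["issuer"].lower()


original = {"issuer": "CN=Example Public CA", "subject": "CN=some-domain.com",
            "san": ["some-domain.com", "www.some-domain.com"]}
observed = dict(original, issuer="CN=Zscaler Intermediate Root CA")
print(looks_tls_inspected(observed, original))
```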

This becomes even more fun if “some-domain.com” is sitting behind a reverse proxy like CloudFlare. In that case, the certificate returned to the client has the issuer from the TLS inspection proxy, the subject from the reverse proxy service, and the SAN from the original certificate of the target domain.

One thing to note: such certificates are only observed in traffic behind the TLS inspection proxy. If you aren’t sure whether your company uses TLS inspection, check the x509 certificates in your browser.

Subject Alternative Name Host Diversity

Sometimes x509 certificates have a SAN that contains many hosts. When visually inspecting a certificate, it’s easy to see if these hosts are all subdomains on the same domain, similarly named domains with possibly different TLDs, or totally different domains. Each of these scenarios tells you something different about the certificate and the goals of the organization that requested the certificate.

But when dealing with thousands of certificates, it’s much easier to do this programmatically. There are a few metrics I use to understand the demographics of a collection of x509 certificates:

  • Count of hosts in the SAN
  • Ratio of distinct domain names in the SAN
  • Ratio of distinct TLDs in the SAN

These three metrics together will give you a good sense of how diverse the set of hosts in each SAN is.
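The three metrics can be computed with a few lines of code. In this sketch, "ratio" is assumed to mean the number of distinct values divided by the number of hosts, and the registered domain is approximated by the last two labels of each host (a real pipeline should use the Public Suffix List). The function name `san_diversity` is hypothetical.

```python
def san_diversity(san_hosts):
    """Return host count plus distinct-domain and distinct-TLD ratios for a SAN."""
    hosts = [h.lower().rstrip(".") for h in san_hosts]
    n = len(hosts)
    if n == 0:
        return {"host_count": 0, "domain_ratio": 0.0, "tld_ratio": 0.0}
    # ASSUMPTION: registered domain is approximated by the last two labels.
    domains = {".".join(h.split(".")[-2:]) for h in hosts}
    tlds = {h.split(".")[-1] for h in hosts}
    return {
        "host_count": n,
        "domain_ratio": len(domains) / n,
        "tld_ratio": len(tlds) / n,
    }


print(san_diversity(["example.com", "*.example.com", "www.example.com", "example.net"]))
```

Under this definition, a low domain ratio with a high host count suggests many subdomains of one domain, while ratios near 1.0 suggest a catch-all certificate covering unrelated domains.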

Another metric I like to use is the entropy of each host in the SAN. It makes it easy to quickly scan the full set of hosts and see whether their entropy values are similar or all over the map. This metric can tell you whether the host names are closely related to each other. For a good primer on entropy and how to interpret the entropy value for domain names, check out this article.
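For illustration, here is one common way to compute it: Shannon entropy over the characters of each host name, in bits per character. This is an assumption about which entropy measure is meant, and the helper names are hypothetical. The spread between the lowest and highest value gives a quick sense of whether the hosts look alike.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of the character distribution of `text`, in bits."""
    if not text:
        return 0.0
    n = len(text)
    counts = Counter(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def san_entropy_spread(san_hosts):
    """Return per-host entropies and the gap between the highest and lowest."""
    entropies = {h: shannon_entropy(h.lower()) for h in san_hosts}
    if not entropies:
        return entropies, 0.0
    values = list(entropies.values())
    return entropies, max(values) - min(values)


per_host, spread = san_entropy_spread(["example.com", "mail.example.com", "x7f9q2kd.biz"])
for host, value in per_host.items():
    print(f"{host}: {value:.2f}")
print(f"spread: {spread:.2f}")
```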

Other x509 Extensions

There are dozens of other well-known x509 extensions, but none of them are as useful as the SAN when trying to infer any OSINT information about a domain. But that doesn’t mean these should be ignored, as extensions are a great place to hide data. Each extension has a byte format specification, and if someone adheres to that specification, they can inject data into their certificate. Someone could even define their own x509 extension and put anything they want in it.

Depending on what you are investigating, it could be important to look for infrequent or deprecated extensions, extensions that are unrecognized, and extensions that have an unusually large byte size.
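A simple triage pass over extensions might look like the sketch below. It assumes each extension has already been parsed into an (OID string, raw value bytes) pair. The allow-list holds a handful of widely used extension OIDs and is deliberately incomplete, and the 1024-byte threshold is an arbitrary illustrative default that a SAN with many hosts could legitimately exceed.

```python
# A small, incomplete allow-list of common x509 extension OIDs.
KNOWN_EXTENSIONS = {
    "2.5.29.14": "subjectKeyIdentifier",
    "2.5.29.15": "keyUsage",
    "2.5.29.17": "subjectAltName",
    "2.5.29.19": "basicConstraints",
    "2.5.29.31": "cRLDistributionPoints",
    "2.5.29.32": "certificatePolicies",
    "2.5.29.35": "authorityKeyIdentifier",
    "2.5.29.37": "extKeyUsage",
    "1.3.6.1.5.5.7.1.1": "authorityInfoAccess",
    "1.3.6.1.4.1.11129.2.4.2": "signedCertificateTimestampList",
}


def flag_extensions(extensions, max_bytes=1024):
    """Return (oid, reason) pairs for unrecognized or oversized extensions."""
    # ASSUMPTION: max_bytes is an illustrative threshold, not a standard value.
    flags = []
    for oid, value in extensions:
        if oid not in KNOWN_EXTENSIONS:
            flags.append((oid, "unrecognized"))
        if len(value) > max_bytes:
            flags.append((oid, "oversized"))
    return flags


print(flag_extensions([("2.5.29.17", b"\x00" * 40), ("1.2.3.4", b"\x00" * 2000)]))
```

Tracking how often each OID appears across your own certificate corpus would give a better baseline for "infrequent" than any fixed list.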

Conclusion

x509 certificates can contain insightful information, but when staring at tens of thousands of certificates in your logs, it can seem impossible to infer anything meaningful. One way to approach this is to build mental profiles of the different types of organizations mentioned in this article and the patterns that exist in their certificates. From this information, you can start to categorize traffic based on the context found within these certificates. Hopefully, these profiles can help you make sense of, and bring a bit of order to, the ever-increasing volume of certificate data in your network logs.
