Cloud / August 31, 2021

The Three Rs of Visibility for Any Cloud Journey

This article was previously published on Help Net Security.

Dealing with an incident requires more than prompt notification. Responders must also be able to triage the cause; carry out forensics; identify what other systems, users, devices, and applications have been compromised or impacted; and determine the magnitude of the incident, the duration of the activity that led to it, and many other factors. In other words, notification is simply the first step in a complex journey that could end in unearthing a major cyberbreach or in writing off a completely benign non-incident. And while Security Orchestration, Automation, and Response (SOAR) solutions help automate and structure these activities, the activities themselves require telemetry data that provides the breadcrumbs to help scope, identify, and potentially remedy the situation. This takes on increasing significance in the cloud for a few reasons:

  1. The public cloud shared security model may lead to gaps in the telemetry — for example, lack of telemetry from the underlying infrastructure that could help correlate breadcrumbs at the infrastructure level to the application level
  2. Lack of consistency in telemetry information as applications increasingly segment into microservices, containers, and Platform-as-a-Service, and as various modules come from different sources such as internal development, open source, commercial modules, and outsourced development
  3. Misconfigurations and misunderstandings as control shifts between DevOps, CloudOps, and SecOps
  4. All of the above, coupled with a significant expansion of the attack surface as monolithic applications are decomposed into microservices

When incidents occur, the ability to quickly determine the scope, impact, and root cause of the incident depends directly on the availability of quality data and on how easily that data can be queried, analyzed, and dissected. As companies migrate to the cloud, logs have become the de facto standard for gathering telemetry. However, relying almost exclusively on logs for telemetry poses a few challenges.

  • The first issue is that attackers commonly turn off logging on a compromised system to cloak their activity and footprint. This creates gaps in telemetry that can significantly delay incident response and recovery initiatives. On occasion, DevOps teams may also reduce logging on end systems and applications to cut CPU usage (and associated costs in the cloud), leading to additional gaps in telemetry data.
  • A second issue is that logs tend to be voluminous and, in many cases, written by developers for developers, leading to too much and perhaps irrelevant telemetry data. This drives up costs of storing and indexing that data, and also leads to longer query times and more effort on the part of the incident responder sifting through that data.
  • Finally, log levels can be increased or decreased, but ultimately the logs themselves are pre-defined as they are embedded into code. Changing what information logs put out is not something that can be done in real time or near real time in response to an incident but may require code changes, leading to significant delays and impaired incident response capability.
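
The last point can be illustrated with a minimal sketch using Python's standard logging module. The logger name, function, and messages are hypothetical and only serve the illustration: the log level can be raised at runtime, but the fields each statement emits are fixed in code.

```python
import io
import logging

# Hypothetical application logger; names and messages are illustrative only.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("payments")
logger.addHandler(handler)
logger.propagate = False  # keep output confined to the in-memory stream
logger.setLevel(logging.WARNING)


def handle_login(user: str, source_ip: str) -> None:
    # These statements are fixed in code. At runtime we can only control
    # whether each one fires, not which fields it writes.
    logger.debug("login attempt user=%s", user)
    logger.warning("login failed user=%s", user)
    # source_ip is never logged; adding it needs a code change and a redeploy.


handle_login("alice", "203.0.113.7")   # only the WARNING line is written
logger.setLevel(logging.DEBUG)         # verbosity can be raised at runtime
handle_login("alice", "203.0.113.7")   # DEBUG and WARNING lines are written

output = stream.getvalue()
print(output)
```

Even at the most verbose level, the source address never appears in the output, because no log statement was written to include it.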

This leads us to the three Rs of telemetry — Reliable, Relevant, and Real time. To serve the needs of rapid response, telemetry data needs to be reliable in that it is available when needed, without gaps introduced by malicious actors or even inadvertently by various operators due to misconfiguration or miscommunication. It needs to be relevant in that it should provide meaningful actionable insights without significantly driving up costs or query times due to excessive, duplicate, and irrelevant information. And finally, it needs to be real time in the sense that the stream of telemetry data can be changed, and new telemetry data or additional telemetry data can be derived at the click of a button.
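
One way to make the reliability requirement concrete is to check a telemetry stream for unexpected silences. The following is a minimal sketch under stated assumptions: the timestamps, the ten-minute threshold, and the function name `find_gaps` are all invented for illustration.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_silence):
    """Return (start, end) pairs where consecutive events are further
    apart than max_silence."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_silence]


# Hypothetical event times from one host: steady logging, then a long silence.
base = datetime(2021, 8, 31, 12, 0, 0)
events = [base + timedelta(minutes=m) for m in (0, 1, 2, 3, 45, 46)]

gaps = find_gaps(events, timedelta(minutes=10))
for start, end in gaps:
    print(f"no telemetry between {start:%H:%M} and {end:%H:%M}")
```

A check like this cannot say why a source went quiet, only that it did, which is why a second telemetry source is useful for filling the gap.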

A great way to complement logs in the cloud and address the three Rs is with telemetry data derived from observing network traffic. After all, command and control activity, lateral movement of malware, and data exfiltration all happen over the network. If end systems or applications are compromised and logging is turned off at the server or application, the network activity continues, and observing it keeps capturing breadcrumbs that identify the malicious activity. In that sense, network-based telemetry can provide a reliable stream of information even when endpoints or end systems are compromised or impacted. Metadata generated from network traffic can be surgically tuned to provide a highly relevant and targeted telemetry feed. Security operations teams can select from thousands of metadata elements specific to their use case (for example, focusing on DNS metadata or metadata associated with Remote Desktop activity) and discard other network metadata that may not be relevant. This reduces cost and, equally important, makes it possible to write targeted queries. And should the need arise to expand or change what telemetry data is being acquired, it can be changed at the network level without requiring any change to the application. A simple API call can change what network metadata is being captured in near real time.
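
The idea of selecting a few metadata elements and changing that selection on the fly can be sketched as follows. This is a hypothetical in-memory stand-in, not any vendor's API: the `MetadataSelector` class, its `update` method, and the field names such as `dns.query` and `rdp.cookie` are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataSelector:
    """Hypothetical stand-in for the field selection of a network metadata feed."""
    selected: set = field(default_factory=set)

    def update(self, fields):
        # Stands in for the "simple API call": changes what is captured
        # from now on, with no change to the monitored application.
        self.selected = set(fields)

    def apply(self, record: dict) -> dict:
        # Keep only the selected metadata elements and discard the rest.
        return {k: v for k, v in record.items() if k in self.selected}


# Illustrative record; the field names and values are invented.
record = {
    "src_ip": "10.0.0.5",
    "dns.query": "example.com",
    "rdp.cookie": "mstshash=admin",
    "http.user_agent": "curl/7.79.1",
}

selector = MetadataSelector({"src_ip", "dns.query"})
before = selector.apply(record)   # DNS-focused feed

# During an incident, widen the feed to include Remote Desktop metadata.
selector.update({"src_ip", "dns.query", "rdp.cookie"})
after = selector.apply(record)

print(before)
print(after)
```

Keeping the selection outside the application is the design point: the feed can be narrowed to control cost or widened during an investigation without touching application code.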

As organizations move to the cloud, complementing their log sources with network-based telemetry will prove invaluable in bolstering their security and compliance posture, making it an essential component in securing that move. Gigamon Hawk provides a cloud-agnostic platform for achieving precisely this and hitting the three Rs described above.
