The Haystack Problem
Organizations lack sufficient collection and compute resources to reach accurate conclusions about whether observed events are good or bad. That is to say, simply passing data to a SIEM will not, by itself, identify a breach.
To make matters worse, what one organization or business unit deems “bad” may not be “bad” for another.
Let’s look at a few examples. At ACME dot Com, PCI data must be encrypted in transit unless it is on an isolated network segment. An observation of unencrypted credit card information is neither good nor bad unless a rule exists that correlates that data with the address space in which it was seen.
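As a minimal sketch of such a rule, the Python below flags plaintext card numbers only when an endpoint falls outside an isolated segment. The segment `10.50.0.0/16`, the `Observation` record, and the regex-plus-Luhn detector are all assumptions made for illustration, not ACME’s actual controls; a real PCI detector would need far more care to keep false positives down.

```python
import ipaddress
import re
from dataclasses import dataclass

# Hypothetical isolated cardholder-data segment (an assumption, not a real range).
ISOLATED_PCI_SEGMENTS = [ipaddress.ip_network("10.50.0.0/16")]

# Runs of 13-19 digits, optionally separated by single spaces or dashes.
CANDIDATE_PAN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def contains_pan(payload: str) -> bool:
    """Return True if the payload contains a Luhn-valid card-number candidate."""
    for match in CANDIDATE_PAN.finditer(payload):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            return True
    return False


@dataclass
class Observation:
    src: str
    dst: str
    encrypted: bool
    payload: str


def in_isolated_segment(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in ISOLATED_PCI_SEGMENTS)


def evaluate(obs: Observation) -> str:
    """Classify an observation as 'no finding', 'permitted', or 'violation'."""
    if obs.encrypted or not contains_pan(obs.payload):
        return "no finding"
    # Plaintext card data is acceptable only if both endpoints stay inside the isolated segment.
    if in_isolated_segment(obs.src) and in_isolated_segment(obs.dst):
        return "permitted"
    return "violation"
```

The same plaintext payload yields "permitted" between two `10.50.x.x` hosts and "violation" when one endpoint is elsewhere: the verdict comes from the address-space context, not from the data alone.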
Next, at ExampleTech, the rules governing inbound traffic may differ from one network to another. In some companies, inbound rules may even differ from one desk to another.
Finally, at Security Foundation, a malware detection engine alerts that a host is compromised without knowing that the host is a malware research PC or has some other mitigation in place.
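One way to represent per-network and per-desk inbound rules is a policy keyed by destination prefix, where the most specific matching prefix wins. This is a hypothetical sketch: the prefixes, ports, and the `INBOUND_POLICY` table are invented for illustration.

```python
import ipaddress

# Hypothetical inbound policy: destination prefix -> TCP ports allowed inbound.
INBOUND_POLICY = {
    ipaddress.ip_network("10.0.0.0/8"): {443},               # corporate default
    ipaddress.ip_network("10.20.0.0/16"): {443, 22},         # one network that also permits SSH
    ipaddress.ip_network("10.20.5.17/32"): {443, 22, 3389},  # one desk with a remote-assist exception
}


def allowed_ports(dst: str) -> set:
    """Return the ports allowed inbound to dst, using the most specific matching prefix."""
    ip = ipaddress.ip_address(dst)
    matches = [net for net in INBOUND_POLICY if ip in net]
    if not matches:
        return set()  # default deny
    most_specific = max(matches, key=lambda net: net.prefixlen)
    return INBOUND_POLICY[most_specific]


def inbound_permitted(dst: str, port: int) -> bool:
    return port in allowed_ports(dst)
```

Because the /32 entry outranks its /16 and /8 parents, one desk can be granted an exception without touching the rest of the network. The catch is that such an entry silently goes stale when the person at that desk moves.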
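A common way to supply the missing context is to enrich each alert with asset inventory data before anyone acts on it. The sketch below assumes a hypothetical `ASSET_CONTEXT` lookup and hypothetical host names; it downgrades rather than drops alerts, so the event is still recorded for forensics.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical asset inventory: the context the detection engine lacks on its own.
ASSET_CONTEXT = {
    "lab-re-07": {"role": "malware-research"},
    "fin-ws-112": {"role": "workstation", "mitigations": ["network-isolated"]},
}


@dataclass
class Alert:
    host: str
    signature: str
    severity: str = "high"
    notes: list[str] = field(default_factory=list)


def enrich(alert: Alert) -> Alert:
    """Adjust an alert's severity using whatever is known about the host."""
    ctx = ASSET_CONTEXT.get(alert.host, {})
    if ctx.get("role") == "malware-research":
        alert.severity = "info"
        alert.notes.append("host is a malware research PC; detection is expected")
    elif ctx.get("mitigations"):
        alert.severity = "medium"
        alert.notes.append("mitigations in place: " + ", ".join(ctx["mitigations"]))
    # Hosts with no recorded context keep their original severity.
    return alert
```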
Although the rules that govern PCI, remote assist, and research machines can be fairly static, the rate at which innovation and technology change is not. Nor are companies.
Controls that govern permitted use are generally rigid, and they erode over time: people move desks, companies buy new services, firewall rules change, and companies evolve while the controls stay fixed. As the rate of change increases, so does the rate at which the controls decay. And in a world struggling to become more secure, the question of whether an observed event is good or bad becomes even more pressing.
Answering the seemingly simple question “is this good or bad?” often requires sophisticated context. Building and managing context requires more than the accumulation of data. Drawing conclusions and creating context require that the tools using the data provide persistence, reference, and inference.
The volume of data that must be accumulated and the speed at which it must become contextually aware are almost always in tension. To achieve the fidelity needed to make good decisions, the volume of data required, at the rates it is required, overwhelms correlation systems and confounds anomaly detection engines. Detection systems spend an enormous amount of time matching patterns and correlating very basic events, and very little time analyzing complex relationships. This doesn’t render the collection of data from networks and systems useless; in many cases, collected event data is valuable forensically or for predicting future events.
Stopping all but the most common, static, or predictable attacks in near real time is still exceedingly difficult and fraught with false positives. Apart from very sophisticated deployments, high-volume data analytics systems rely on humans to maintain persistence and to correlate seemingly disparate events. Obviously, relying on humans as analytic state engines is problematic.
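Keeping that state in software instead of in an analyst’s head can be as simple as remembering recent events per source. The hypothetical `FailedThenSuccess` correlator below (its threshold and window values are assumptions) flags a successful login that follows a burst of failures from the same source: two kinds of events that look unremarkable when viewed one at a time.

```python
from __future__ import annotations

from collections import defaultdict, deque


class FailedThenSuccess:
    """Remembers recent failed logins per source so a later success can be judged in context."""

    def __init__(self, threshold: int = 5, window_s: float = 300.0):
        self.threshold = threshold
        self.window_s = window_s
        self.failures: defaultdict[str, deque[float]] = defaultdict(deque)

    def observe(self, ts: float, src: str, outcome: str) -> str | None:
        """Feed one login event (in timestamp order); return a finding or None."""
        recent = self.failures[src]
        # Forget failures that have aged out of the window.
        while recent and ts - recent[0] > self.window_s:
            recent.popleft()
        if outcome == "failure":
            recent.append(ts)
            return None
        if outcome == "success" and len(recent) >= self.threshold:
            count = len(recent)
            recent.clear()
            return f"{src}: login success after {count} failures within {self.window_s:.0f}s"
        return None
```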
The opportunity is to solve the collect-and-compute problem. Improving the quality of the pre-processed data improves the fidelity of the inputs, which in turn improves the quality of the context data against which questions are asked. At a minimum, because the pre-processed data is better, the quality of SIEM correlation improves by several orders of magnitude.
Letting customers choose which patterns they need to match is, for us, relatively easy: each is a simple pattern with very little state, and we deliver the result in a single summary record. Yet even this basic level of pre-processing frees up cycles on SIEMs, enabling them to reach much higher-quality conclusions.
And here’s the best part: once SIEMs no longer have to spend cycles stitching together HTTP, SSL, DNS, and SMB conversations, they are free to manage persistence, reference, and inference. Those qualities improve with more data and more structure, which gives analytic tools a holistic, high-fidelity view against which to pose questions. The customer’s problem inverts: instead of searching for a needle in a haystack, they need more hay, because they are now finding the needles in context.
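A rough sketch of that kind of pre-processing: raw packets are collapsed into one summary record per conversation, and each customer-chosen pattern is tested as packets arrive. The `Packet` and `ConversationSummary` types, the conversation key (addresses and ports, ignoring protocol), and the pattern names are all hypothetical. This sketch also matches within single packets only, which keeps state minimal but misses patterns split across packet boundaries.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Packet:
    src: str
    sport: int
    dst: str
    dport: int
    payload: bytes


@dataclass
class ConversationSummary:
    key: tuple
    packets: int = 0
    byte_count: int = 0
    matched: set[str] = field(default_factory=set)


def summarize(packets: list[Packet], patterns: dict[str, re.Pattern]) -> list[ConversationSummary]:
    """Collapse packets into one summary record per conversation, noting which patterns matched."""
    conversations: dict[tuple, ConversationSummary] = {}
    for p in packets:
        # Direction-independent key, so both sides of a conversation share one record.
        a, b = (p.src, p.sport), (p.dst, p.dport)
        key = (min(a, b), max(a, b))
        summary = conversations.get(key)
        if summary is None:
            summary = conversations[key] = ConversationSummary(key)
        summary.packets += 1
        summary.byte_count += len(p.payload)
        # Each pattern is tested against one packet at a time: little state, one record out.
        for name, rx in patterns.items():
            if name not in summary.matched and rx.search(p.payload):
                summary.matched.add(name)
    return list(conversations.values())
```

The SIEM then receives one record per conversation with the matches already attached, instead of every packet.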
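Stitching protocols together is the kind of work that can move out of the SIEM. As an illustrative sketch, assuming hypothetical `DnsAnswer` and `Connection` records, the code below labels each HTTP, SSL, or SMB connection with the hostname its client most recently resolved to that server address. It ignores DNS TTLs and shared IP addresses, both of which a production implementation would have to handle.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DnsAnswer:
    ts: float
    client: str
    qname: str
    answer_ip: str


@dataclass
class Connection:
    ts: float
    client: str
    server_ip: str
    protocol: str  # e.g. "HTTP", "SSL", "SMB"
    hostname: str | None = None


def stitch(answers: list[DnsAnswer], conns: list[Connection]) -> list[Connection]:
    """Label each connection with the name its client most recently resolved to that server IP."""
    # Merge both streams in time order; on equal timestamps, DNS (kind 0) sorts before connections (kind 1).
    events = [(a.ts, 0, a) for a in answers] + [(c.ts, 1, c) for c in conns]
    events.sort(key=lambda e: (e[0], e[1]))
    last_name: dict[tuple[str, str], str] = {}
    for _, kind, ev in events:
        if kind == 0:
            last_name[(ev.client, ev.answer_ip)] = ev.qname
        else:
            ev.hostname = last_name.get((ev.client, ev.server_ip))
    return conns
```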
To learn more about this topic, see these blog posts on context relevance:
http://jeffjonas.typepad.com/jeff_jonas/2006/08/accumulating_co.html
http://jeffjonas.typepad.com/jeff_jonas/2008/02/algorithms-at-d.html