A-Tisket, A-Tasket: Your IR Team Is Hungry — Feed Them Some Packets
Don’t you hate the phrase “I wish”? It so often speaks to wild pipe dreams or past desires that never came to fruition. I wish I’d asked my crush out when I had the chance. I wish I’d taken that job offer. I wish I’d thought of the Cyber Kill Chain first. Do you know when wishing really stings? During a breach. I really wish our data weren’t so incomplete. I really, really wish we had visibility over those servers. If only wishing made it so. While we can’t change the past, I’m pretty excited to help you along with your future.
One of the most interesting and useful parts of the Gigamon acquisition of ICEBRG is that we became the beneficiaries of Gigamon employees’ collective networking wizardry. As it pertains to this blog, we learned a ton about SPANs and, much to our surprise, how they fall short. Unfortunately, we believe we weren’t alone and that this problem is largely unknown among security practitioners and network engineers alike, hence this blog!
In this blog, I’m going to discuss a couple of the major ways we get the data we need to catch bad guys: SPANs and TAPs. I’ll cover why we use them, how they function and why SPANs are, at best, marginally effective for feeding security tools and should not be a long-term strategy.
What Problem Are We Trying to Solve?
When I was just a wee baby analyst, I never really took the time to wonder where all the data I was seeing came from. I just took for granted that it was always there. I had little idea about the intricacies of acquiring it or even the technologies used to acquire it. I guess I assumed the Metadata Fairy would come down from high atop SIEM Mountain and do a few fancy wand tricks and boom, data. Lakes of data. Clouds filled with data. Is there a stork involved here? Turns out that it’s nothing as magical as what my brain can concoct.
As security professionals, we need to see the data. Not just some of it; certainly not all of it. We need the right data. We need data for network security monitoring (NSM), incident response (IR), and threat hunting and detection (no swanky acronym, sorry). What is the right data, though? It depends on a whole host of things. These could include:
- Your use cases for the data
- How good your current equipment is
- How much money you have for better or more equipment
- The problems important to you right now, what can wait until later, and how you ultimately bridge that gap to get the best picture you can at this very moment
Any Port in a Packet Storm
A SPAN is a software-configured function in a switch, creating duplicate copies of traffic and egressing them through a switch port configured for this purpose. A TAP is a standalone purpose-built hardware device that sits inline, generates duplicate copies of traffic that pass through it and egresses those packets through dedicated monitor ports. Understood? Cool, let’s keep going.
Before network switches with gigabit speeds, we had 10Mbit Ethernet and Ethernet hubs. A hub was how we connected multiple devices in the days of shared media. If a packet was destined for Bob’s computer, it went out to all the ports in the hub, regardless of which port Bob was on. If Bob had a problem with his connection, you could plug in a port analyzer anywhere on the network and see all the packets destined for all the things. Noisy, but it got the job done.
Then, the world got itself in a great big hurry and Fast Ethernet was born. Each workstation would now be connected by its own card, with a line straight to a switch port. Everyone got their own express train instead of riding the local with all its broadcast stops. However, this put the network troubleshooters in a hell of a predicament, because the whole concept of shared media went away. Switches forward things based on Layer 2 (MAC) addressing. So, if the packet isn’t destined for you, you don’t get a copy. Great for business as usual, annoying for sniffing all the flows.
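To make the hub-versus-switch difference concrete, here is a minimal Python sketch. The names (hub_forward, LearningSwitch, the port numbers and the MAC strings) are invented for illustration, and the model is deliberately simplified: no VLANs, no table aging, no broadcast frames. It only shows why an analyzer plugged into a spare switch port stops seeing other people's conversations.

```python
from typing import Dict, List


def hub_forward(in_port: int, ports: List[int]) -> List[int]:
    """A hub repeats every frame out of every port except the one it came in on."""
    return [p for p in ports if p != in_port]


class LearningSwitch:
    """Toy Layer 2 switch: learns source MACs, forwards by destination MAC."""

    def __init__(self, ports: List[int]) -> None:
        self.ports = list(ports)
        self.mac_table: Dict[str, int] = {}  # MAC address -> port it was last seen on

    def forward(self, in_port: int, src_mac: str, dst_mac: str) -> List[int]:
        # Learn (or refresh) where the sender lives.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood, just like a hub would.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination is on the port the frame arrived on: nothing to forward.
            return []
        return [out_port]


if __name__ == "__main__":
    ports = [1, 2, 3, 4]  # hypothetical: Alice on 1, Bob on 2, analyzer on 4
    print("hub:", hub_forward(1, ports))
    switch = LearningSwitch(ports)
    print("switch, first frame:", switch.forward(1, "alice", "bob"))
    print("switch, Bob replies:", switch.forward(2, "bob", "alice"))
    print("switch, Alice again:", switch.forward(1, "alice", "bob"))
```

In this toy model the analyzer on port 4 sees every frame on the hub, but on the switch it only catches the first flooded frame. Once both MAC addresses are learned, the conversation never touches port 4 again.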
So, we’d evolved in our networking prowess and moved from hubs to switches, but we still needed to fix the problem of troubleshooting. Networks are crazy places. Stuff goes wrong all the time. Is the web slow because the ISP sucks or is it slow because Karen in Accounting brought her labradoodle in and Princess likes to chew on cables? These are things we needed to know. These new switches were cool, but there was no way to see all the traffic in one place.
Enter the SPAN
We have a couple of ways to get these packets. The first way we’ll talk about is asking the switch for a copy of those packets. When they realized they could no longer stick an analyzer on the wire and see everything, grizzled network admins everywhere collectively growled — which, we all know, is one of the signs of the apocalypse. This impending doom pressured companies like Cisco to come up with a workaround. “What if we gave you this really cool thing called a mirror? Copy packets from lots of ports and copy them all to a single port. Please don’t hurt us.” And so, SPAN (Catalyst Switched Port ANalyzer) was born. And, yes, spanning and mirroring are the same thing. Cisco just got to take credit for the snazzy acronym. To avoid confusion, I’ll use “SPAN” exclusively in this blog.
The grizzled network admins got their troubleshooting capability, but there were shortcomings in the realm of security monitoring.
SPAN-asaurus Wrecked
I have a 4-year-old son and he’s the apple of my eye. I have gotten very good at piecing together stories with a few garbled bits here and there, or the concept of coherent thought being dropped altogether. As long as I can figure out exactly which of the cats he tried to ride, I can go make sure the cat isn’t dead and make a grocery store run for Band-Aids®.
A SPAN was meant to be a temporary fix to infrequent problems. While it was a way to get the packets, there wasn’t any indication of performance, nor did there really need to be. You were after the bigger picture and if a few packets got dropped or missed it wasn’t the end of the world.
When it comes to spanning and switches, we’ve never gone back to revisit the problems they can pose in an enterprise:
- A misconfigured SPAN can shut the entire link down
- Packet loss, while not guaranteed, is highly likely
- SPAN traffic has the lowest priority in the switch
- Timelining may become an issue because a SPAN port may see packets after they reach the production network
- Packets are often mis-ordered and duplicated
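One rough way to check whether a mirrored feed shows these symptoms is to compare what the sensor received against a per-packet counter. The sketch below is a toy under stated assumptions: audit_capture is a hypothetical helper, and it assumes every packet in a flow carries a simple incrementing number. Real traffic is messier (TCP sequence numbers count bytes, not packets), so treat this as an illustration of the idea rather than a working detector.

```python
from typing import Dict, List, Union


def audit_capture(arrival_order: List[int]) -> Dict[str, Union[int, List[int]]]:
    """Count duplicates, late arrivals and gaps in one flow's packet numbers.

    arrival_order holds the (hypothetical) per-packet counters in the order
    the monitoring tool received them.
    """
    seen = set()
    highest = None
    duplicates = 0
    out_of_order = 0

    for number in arrival_order:
        if number in seen:
            duplicates += 1  # the same packet was delivered more than once
            continue
        if highest is not None and number < highest:
            out_of_order += 1  # arrived after a packet that was sent later
        seen.add(number)
        highest = number if highest is None else max(highest, number)

    # Anything between the lowest and highest number that never showed up.
    missing = sorted(set(range(min(seen), max(seen) + 1)) - seen) if seen else []
    return {"duplicates": duplicates, "out_of_order": out_of_order, "missing": missing}


if __name__ == "__main__":
    # Packet 3 arrives late and twice; packet 5 never arrives.
    print(audit_capture([1, 2, 4, 3, 3, 6]))
```

For the sample input, the audit reports one duplicate, one out-of-order packet, and packet 5 as missing.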
In the context of security, a few dropped packets could mean entire dropped transactions. This could be a big problem because, as analysts, we try to piece together as complete a picture of a security event as possible. It’s maddening to get 97 percent of the way through a 1,000-piece puzzle and realize you’re missing 3 of the last pieces. In both scenarios you won’t have the full picture. It gets worse, though. You then have to take extra time to make guesses filling in the blanks. What should have taken hours could take days, and you could be wrong. Giving management an answer that’s late and wrong then reflects poorly on the analyst, when the culprit is the “garbage in” situation.
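A little arithmetic shows how quickly "a few dropped packets" turns into "entire dropped transactions." The sketch below assumes each packet is lost independently with the same probability, which is a simplification (real SPAN drops tend to come in bursts), and the function name is made up for illustration.

```python
def chance_transaction_is_complete(packet_loss_rate: float, packets_in_transaction: int) -> float:
    """Probability that every packet of a transaction reaches the sensor.

    Assumes independent, identically likely loss per packet:
    P(complete) = (1 - loss_rate) ** packet_count
    """
    return (1.0 - packet_loss_rate) ** packets_in_transaction


if __name__ == "__main__":
    for packets in (10, 100):
        chance = chance_transaction_is_complete(0.01, packets)
        print(f"{packets} packets at 1% loss: {chance:.1%} chance of a complete capture")
```

Under that assumption, a 1 percent loss rate leaves a 10-packet exchange intact about 90 percent of the time, but a 100-packet transfer survives whole only about 37 percent of the time.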
It sucks enough to know you’re missing log sources, but we can work around that because it’s a known entity. One hundred percent visibility is a fairy tale. But imagine relying on a technology that you think is giving you 100 percent of what you need or want only to learn you’re getting, at best, 50 percent. And you’re probably going to learn that in the worst scenarios possible — not when you see the bad guys in the house, but long after they’ve gone with all your worldly possessions. Security experts are depending on a Band-Aid solution to the problem.
If we had all these problems with SPAN on 10Mbit/s networks, they were only amplified when the industry evolved up to and past 10Gbit/s. Now there was a silicon problem. The switch hardware itself could not handle these increased speeds when it came to spanning traffic. This was now an entire industry and vendor problem.
Pipe Dream Nightmare
One of the problems with spanning is that there is just so much data being sent. A 100Gbit/s switch already has a whole bunch of traffic running around. An already overutilized switch will never be 100 percent reliable because of that. When it comes to security monitoring, we need all the reliability we can get. To illustrate my point, say you have an e-commerce web server using all of its bandwidth, connected to the switch at 1Gbit/s. You are mirroring that traffic over a 1Gbit/s SPAN port to a web monitoring solution. 1Gbit/s to 1Gbit/s: seems fine, right? Only we’re not just sending 1Gbit/s. A full-duplex 1Gbit/s link can carry 1Gbit/s in each direction, so mirroring both directions can put up to 2Gbit/s onto that single 1Gbit/s SPAN port.
Network traffic is inherently “bursty,” meaning it’s not always steady. At times you could be far below an aggregate allotment of 1Gbit, but at other times it bursts to full capacity. In a SPAN scenario, if you have five ports all sending traffic to a single 1Gbit SPAN port and two of them start bursting larger than normal amounts of traffic, packets are going to get dropped.
Aggregating more traffic than the output port can carry is called “oversubscription,” and once the excess outlasts the switch’s very limited buffers, you are guaranteed to drop packets. Ever gone to try on clothes right after Thanksgiving, Christmas and New Year’s dinners have settled into your waistline? That’s oversubscription.
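Here is a small Python sketch of the five-ports-into-one scenario above. Everything in it is an assumption made for illustration: simulate_span is a hypothetical function, traffic is measured in abstract units per tick, and the buffer is modeled as a single queue in front of the SPAN port. Real switch buffering is more complicated, but the bookkeeping is the same idea.

```python
from typing import Dict, List


def simulate_span(offered_per_tick: List[List[int]], egress_capacity: int, buffer_size: int) -> Dict[str, int]:
    """Aggregate several source ports onto one SPAN port and count the drops.

    offered_per_tick: one list per tick with the units each source port offers.
    egress_capacity:  units the SPAN port can transmit per tick.
    buffer_size:      units that can wait in the queue between ticks.
    """
    queued = delivered = dropped = offered = 0

    for port_loads in offered_per_tick:
        arriving = sum(port_loads)
        offered += arriving
        backlog = queued + arriving

        sent = min(backlog, egress_capacity)  # the SPAN port drains what it can
        delivered += sent
        backlog -= sent

        if backlog > buffer_size:  # whatever does not fit in the buffer is lost
            dropped += backlog - buffer_size
            backlog = buffer_size
        queued = backlog

    return {"offered": offered, "delivered": delivered, "dropped": dropped, "still_queued": queued}


if __name__ == "__main__":
    steady = [150] * 5                  # five ports, 750 units in total
    burst = [600, 600, 150, 150, 150]   # two ports burst, 1650 units in total
    print(simulate_span([steady, burst, burst, steady], egress_capacity=1000, buffer_size=200))
```

With these made-up numbers, the two burst ticks overrun both the port and the buffer, and 1,100 of the 4,800 offered units (roughly 23 percent) never reach the monitoring tool, even though the steady-state load fits comfortably.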
On what planet is one-tenth of your traffic good enough for any kind of real security analysis? Then weigh that in an incident response scenario, relying on a SPAN that doesn’t care if your beloved packet lives or dies. To summarize, SPANs will always break your heart, just a little bit.
Fan of the SPAN?
Using SPANs clearly isn’t all bad because we keep using and needing them. If you have enterprise-grade switches, you can easily configure SPAN ports. Google “How to configure SPAN” and you can be up and running in about ten minutes. That ease of setup is hard to beat. So, we have that going for us. Additionally, since we’re talking in the context of security analysis, SPANs are the only way to capture intra-switch traffic (traffic between two hosts on the same switch) without putting a TAP on every single switch port. A SPAN is still almost guaranteed to drop packets due to oversubscription, though.
SPANs are easy and cost-effective but aren’t the best way to provide visibility. It’s a free solution, so why bother burning capital if you don’t have to — right? Except, that’s a 100 percent false statement. Everything costs something. In this case, it costs us packet loss, incomplete analysis, potential network failures, and violent sysadmins at your doorstep.
TAP, TAP, Hooray!
So, what makes a TAP so great? This blog is less about saying one technology is better than the other; in any real-world scenario you need both TAP and SPAN to flourish. That being said, TAP is still way more relevant to a long-term, solid monitoring solution.
I learned a long time ago not to buy the combination shampoo and conditioner stuff at the grocery store. On its face, it seems like a win all around. One bottle takes up half the space, you expend half the energy to use it and you spend less money. The problem is my hair is never as silky smooth as I want it to be and I need to look pretty in the Gigamon booth at security conferences. I don’t want one thing that does a half-ass job. I want one thing to do a whole-ass job. TAP is a technology built to run the distance. It’s scalable, dedicated and configurable. Using a TAP is playing the long game.
A TAP sends a separate traffic stream out for each direction of traffic flow. Because this gets done with physical hardware, the packet doesn’t get dropped or mis-ordered. A SPAN, on the other hand, is heartless when it comes to the data you want it to copy. SPAN data will always have lowest priority, while a TAP will have a separate channel for both transmit and receive going each way. That is perfect fidelity.
A TAP also helps alleviate another very real problem when it comes to security analysis — building a timeline. Unlike a SPAN, TAP traffic isn’t going to production first then coming back to the TAP, potentially screwing up timestamps and mis-ordering packets. With TAPs you’re looking at the traffic as it is happening in real time.
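Because a TAP hands you one stream per direction, building the timeline comes down to interleaving two already-ordered streams. The Python sketch below shows one way that could look. The record layout (timestamp, direction, summary) and the build_timeline name are hypothetical, and it assumes both monitor ports are timestamped against the same clock by whatever capture tool sits behind the TAP.

```python
import heapq
from typing import List, Tuple

# (capture timestamp in seconds, direction label, short description)
Record = Tuple[float, str, str]


def build_timeline(a_to_b: List[Record], b_to_a: List[Record]) -> List[Record]:
    """Interleave two per-direction streams into one timeline by timestamp.

    Each input must already be sorted by timestamp, which is what you would
    expect from a single monitor port.
    """
    return list(heapq.merge(a_to_b, b_to_a, key=lambda record: record[0]))


if __name__ == "__main__":
    client_to_server = [
        (0.001, "client->server", "SYN"),
        (0.003, "client->server", "ACK"),
        (0.004, "client->server", "GET /"),
    ]
    server_to_client = [
        (0.002, "server->client", "SYN-ACK"),
        (0.006, "server->client", "200 OK"),
    ]
    for timestamp, direction, summary in build_timeline(client_to_server, server_to_client):
        print(f"{timestamp:.3f}  {direction:<15} {summary}")
```

The merge only works because neither stream is reordered on its way to the tool. Feed it mis-ordered input and the resulting timeline is wrong without any warning, which is exactly the risk with a lossy mirror.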
To TAP or SPAN, That Is the Question
Packets tell the truth. I don’t care what the GUI, %device%, tea leaves, witch doctor, or marketing flyer told you. If it isn’t a packet on the wire, it isn’t the truth. Unfortunately, you can’t TAP every link in every switch in all your locations — that way lies madness. So, what do you do?
If it’s where everything goes out and everything comes back in, TAP it. TAPs are not out of the question for east-west traffic either. You can, and should, identify what your crown jewels are and determine what your monitoring plan looks like for those assets. Maybe you need visibility over a critical e-commerce server facing the internet that’s a huge chunk of your business model. TAP it. In the beginning, you might only be capable of covering 90 percent of your north-south traffic and 10 percent of your east-west with TAPs, but you’ll see 100 percent of that data and that’s a damn good start. There will always be something else you want complete visibility over — say your domain controllers — but it’s not the end of the world if you don’t TAP it all. That’s when you could utilize a SPAN to send that targeted traffic to whatever tool you’re using to monitor your network.
A bad worker always blames their tools. We can be so quick to evaluate the usefulness of a tool by what it does for us without ever stopping to think about what we’re doing for it. I can’t sit here and say I hate all Kias because I chose to put shampoo in my gas tank thinking it would make the car run cleaner. If we feed our tools the wrong fuel, of course they’re going to perform badly. We need to get our own house in order, making sure our own processes are on point and humming along, then we can blame the vendor and get lots of free “I’m sorry” lunches. There’s a framework for everything.
While it is true that some security products are trash from start to finish, let that gauntlet be run with complete care and complete data. TAP where you can, SPAN where you must. A TAP is a long-term monitoring solution, while a SPAN is meant to be a Band-Aid at best. You will need both in your environment but it’s important to know when and how to use each of them to live your best data life.
Epilogue: We’ve Read This Whole Blog — What Do We Do Now?
You’re in luck, because as I was writing this piece, Baseer Balazadeh, Sr. Technical Marketing Engineer for Gigamon, wrote a series of blog posts outlining the mechanics of TAP versus SPAN, described the mechanics of active versus passive TAPs, and also did a great writeup of TAP best practices. I highly recommend checking out his series plus these additional resources:
- GigaVUE TA series Aggregator
- Gigamon Whitepaper: Understanding Network TAPs, The First Step to Visibility
- Gigamon Whitepaper: To TAP or SPAN?
- Gigamon Video: Network TAPs
- What’s the Difference Between a Hub, a Switch and a Router?
A special thank you to my Gigamon colleagues who taught me more about SPAN and TAP than I wanted to know:
Neal Allan
Gigamon Senior Training Specialist
Dale Smith
Gigamon Principal Professional Services Engineer
Michael Valladao
Gigamon Senior Sales Engineer