Friday, April 14, 2006

How to bring your company network down

Today at work my network started acting crazy and none of the websites would load. I thought maybe it was one of those Friday afternoons where Windows needed a hard kick in the butt. So I restarted my machine and could hear the warm purring of the hard drive booting up Windows. I got the login prompt, it took almost 10 minutes to log me in, and then boom - nothing worked. Outlook crashed, "My Computer" would not open, and IE went bonkers trying to open a web page. So I got up from my desk and asked my colleagues if they were facing a similar problem - YES, they were. Sometimes it helps to ask - so all of us started playing darts :) as the entire network was down.

Ethereal revealed some of what was happening inside - some TCP packets kept showing up as retransmissions, roughly one every 1/5000th of a second. Wow - the network quickly got overloaded with a zillion retransmissions from a particular source to a particular destination. We tracked down the source and asked the person to shut off the computer. Then some other source->destination pair started retransmitting, and all this was happening so fast that, one machine after another, the entire network was brought down.

Eventually all the computers were shut off and there was still something fishy happening. How could something be happening when all the computers had been powered off? Funny, right - wait till you hear what the cause was: one of my colleagues had shorted 2 ports of a network switch by mistake. By short I mean connecting an Ethernet cable from one port of the switch back into another port of the same switch. In short and sweet words - a good ol' switching loop that just went bonkers and kept the same traffic going around in an infinite loop that would not break on its own.

Finally the cable was unplugged, the loop was broken and voila - the network made perfect sense again. All the computers purred through their usual startup sounds and Windows worked like a charm. I sometimes wonder - is it really this easy to bring an entire network down? Why isn't there something that can detect such infinite loops, some sort of network sniffer that can break them or at least warn the admin what the heck is going on, rather than someone having to go around to everyone's computer to see if any faulty network connection has been made? Come on guys - it is the 21st century, where we have robots collecting information for us from Mars. Can't we design a fault-tolerant network that will not allow itself to be shorted into an infinite loop? Maybe there is something, I don't know.
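
For what it's worth, a loop like this can be spotted automatically if you know which port is cabled to which, and managed switches typically guard against it with the Spanning Tree Protocol (IEEE 802.1D), which blocks redundant links. Below is only a minimal Python sketch of the underlying idea, not STP itself: treat every cable as a link between two devices and flag any link that joins devices which are already connected. The device names and the `find_redundant_links` function are made up for illustration, and the sketch assumes you already have the list of cables.

```python
def find_redundant_links(links):
    """Return the links that would close a loop, in the order they are given."""
    parent = {}  # union-find forest: device -> parent device

    def find(node):
        # Follow parents up to the root of the group this device belongs to.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # shorten the path as we go
            node = parent[node]
        return node

    redundant = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            # Both ends are already connected, so this cable creates a loop.
            redundant.append((a, b))
        else:
            parent[root_a] = root_b  # merge the two groups
    return redundant


# Hypothetical office wiring, including the cable from the story.
links = [
    ("switch1", "pc1"),
    ("switch1", "pc2"),
    ("switch1", "switch2"),
    ("switch2", "pc3"),
    ("switch1", "switch1"),  # one port plugged back into another port
]
print(find_redundant_links(links))
```

With this made-up wiring, the only link reported is the cable that goes from the switch back to itself, which is exactly the one that had to be unplugged.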
