The Artima Developer Community
Sponsored Link

Web Buzz Forum
Black magic causing database server timeouts

0 replies on 1 page.

Welcome Guest
  Sign In

Go back to the topic listing  Back to Topic List Click to reply to this topic  Reply to this Topic Click to search messages in this forum  Search Forum Click for a threaded view of the topic  Threaded View   
Previous Topic   Next Topic
Flat View: This topic has 0 replies on 1 page
Chirag Mehta

Posts: 80
Nickname: chime
Registered: Jun, 2005

Chirag Mehta is IT Systems Manager for Formulated Solutions and owner of Chime Softwares
Black magic causing database server timeouts Posted: Dec 23, 2006 9:22 PM
Reply to this message Reply

This post originated from an RSS feed registered with Web Buzz by Chirag Mehta.
Original Post: Black magic causing database server timeouts
Feed Title: chir.ag/tech
Feed URL: http://chir.ag/tech/rss.xml
Feed Description: Chirag Mehta - Tech Web Log: I discuss pretty much anything related to technology that comes to my mind, from the nitty-gritties of string parsing in some language to the overall big picture of the software world.
Latest Web Buzz Posts
Latest Web Buzz Posts by Chirag Mehta
Latest Posts From chir.ag/tech

Advertisement
After coming across the case of the 500-mile email again, I was reminded of a pretty severe problem just as unexplainable that we had at our company last year.

The day after an unscheduled closing (hurricane), I started getting calls from users complaining about database connection timeouts. Since I had a very simple network with less than 32 nodes and barely any bandwidth in use, it was quite scary that I could ping to the database server for 15-20 minutes and then get "request timed out" for about 2 minutes. I had performance monitors etc. running on the server and was pinging the server from multiple sources. Pretty much every machine except the server was able to talk to the others constantly. I tried to isolate a faulty switch or a bad connection but there was no way to explain the random yet periodic failures.

I asked my coworker to observe the lights on a switch in the warehouse while I ran trace routes and unplugged different devices. After 45-50 minutes on the walkie-talkie with him saying "ya it's down, ok it's back up," I asked if he noticed any patterns. He said, "Yeah... I did. But you're going to think I'm nuts. Every time the shipper takes away a pallet from the shipping room, the server times out within 2 seconds." I said "WHAT???" He said "Yeah. And the server comes back up once he starts processing the next order."

I ran down to see the shipper and was certain that he was plugging in a giant magnetomaxonizer to celebrate the successful completion of an order. Surely the electromagnetic waves from the flux capacitor were causing rip in the space-time continuum and temporarily shorting out the server's NIC card 150 feet away in another room. Nope. All he was doing was loading up the bigger boxes on the pallet first and then gradually the smaller ones on top, while scanning every box with the wireless barcode scanner. Aha! It must be the barcode scanner's wireless features that probably latch on to the database server and cause all other requests to fail. Nope. Few tests later I realized it wasn't the barcode scanner since it was behaving pretty nicely. The wireless router and it's UPS in the shipping room were configured right and seemed to be functioning normally too. It had to be something else, especially since everything was working fine just before the hurricane closing.

As soon as the next time out started, I ran into the shipping room and watched the guy load the next pallet. The moment he placed four big boxes of shampoo on the bottom row of the pallet, the database server stopped timing out! This had to be black magic! I asked him to remove the boxes and the database server began to time out again! I did not believe the absurdity of this and spent five more minutes loading and unloading the boxes of shampoo with the same exact result. I was about to fall down on my knees and start begging for mercy from the God of Ethernet when I noticed that the height at which the wireless router was placed in the shipping room was about a foot lower than the top of the four big boxes when placed on the pallet. We were finally on to something!

The wireless router lost the line-of-sight to the outside warehouse anytime a pallet was loaded with the big boxes. Ten minutes later I had the problem solved. Here is what happened. During the hurricane, there was a power failure that reset the only device in our building that wasn't connected to a UPS - a test wireless router I had in my office. The default settings on the test router somehow made it a repeater for the only other wireless router we had, the one in the shipping room. The two wireless nodes were only able to talk to each other when there were no pallets placed between them and even then the signal wasn't too strong. Every time the two wireless routers managed to talk, they created a loop in my tiny network and as a result, all packets to the database server were lost. The database server had it's own switch from the main router and hence was pretty much the furthest node. Most other PC's were on the same 16-port switch so I had no problems pinging constantly between them.

The 1-second solution to this four-hour troubleshooting nightmare was me yanking off the power to the test router. And the database server never timed out again.

Read: Black magic causing database server timeouts

Topic: Why Web developers shouldn't ignore older technologies Previous Topic   Next Topic Topic: Does your Website Need W3C Validation?

Sponsored Links



Google
  Web Artima.com   

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use