Interestingly, this story actually helped me once. I was debugging an issue where some people in the multinational company I work for (many offices) could connect to the server and some could not, and it was thanks to this story that I seriously entertained the possibility that it had something to do with timing, given the geographical distribution of the problem reports. Subsequent investigation revealed that, for various uninteresting technical reasons, the server was setting a 200ms timeout on the SSL negotiation part of the connection, which closely matched the ping times of the participants, including the people who could sometimes connect and sometimes not. And I discovered this far faster by knowing I should be poking around on timing issues, rather than searching for something deeper.
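For intuition, here is a minimal back-of-the-envelope sketch of why a 200ms negotiation timeout fails geographically (the two-round-trip handshake cost and the sample RTTs are assumptions for illustration, not details of the real system):

```python
# Illustrative sketch: a TLS handshake needs roughly two network round
# trips (before TLS 1.3), so a hard 200 ms negotiation timeout starts
# failing once the round-trip time to an office approaches 100 ms --
# right in the range of intercontinental pings.
TIMEOUT_MS = 200
HANDSHAKE_ROUND_TRIPS = 2  # assumed handshake cost in RTTs

def negotiation_succeeds(rtt_ms: float) -> bool:
    """True if the handshake can finish inside the timeout."""
    return rtt_ms * HANDSHAKE_ROUND_TRIPS < TIMEOUT_MS

print(negotiation_succeeds(20))   # nearby office -> True
print(negotiation_succeeds(95))   # borderline: "sometimes connects" in practice
print(negotiation_succeeds(130))  # distant office -> False
```

The borderline case is the interesting one: real RTTs jitter, so offices near the threshold connect intermittently, which is exactly the confusing symptom pattern.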
(I am deliberately eliding other details not relevant to this aspect of the story. My point is merely that I am glad I had encountered it. Could easily have saved me a day or two.)
Take a look at the data file (/usr/share/misc/units.dat) some time. It's an interesting little declarative non-Turing-complete DSL, in addition to being a handy reference manual.
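That file is what powers the units(1) calculation at the end of the story; the headline number is easy to check by hand (using the vacuum speed of light, as units does):

```python
# Reproducing the story's units(1) calculation: how far does light
# travel in 3 milliseconds?
C_KM_PER_S = 299_792.458   # speed of light in vacuum
KM_PER_MILE = 1.609344

distance_miles = C_KM_PER_S * 0.003 / KM_PER_MILE
print(round(distance_miles, 1))  # ~558.8 -- "a little over 500 miles"
```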
This one has made the rounds on reddit a few times. I shared this personal story last time it came around:
In 2005, at my job, we had a pretty severe problem that was just as inexplicable. The day after an unscheduled closing (hurricane), I started getting calls from users complaining about database connection timeouts. Since I had a very simple network with fewer than 32 nodes and barely any bandwidth in use, it was quite scary that I could ping the database server for 15-20 minutes and then get "request timed out" for about 2 minutes. I had performance monitors etc. running on the server and was pinging it from multiple sources. Pretty much every machine except the server was able to talk to the others constantly. I tried to isolate a faulty switch or a bad connection, but there was no way to explain the random yet periodic failures.
I asked my coworker to observe the lights on a switch in the warehouse while I ran trace routes and unplugged different devices. After 45-50 minutes on the walkie-talkie with him saying "ya it's down, ok it's back up," I asked if he noticed any patterns. He said, "Yeah... I did. But you're going to think I'm nuts. Every time the shipper takes away a pallet from the shipping room, the server times out within 2 seconds." I said "WHAT???" He said "Yeah. And the server comes back up once he starts processing the next order."
I ran down to see the shipper and was certain that he was plugging in a giant magnetomaxonizer to celebrate the successful completion of an order. Surely the electromagnetic waves from the flux capacitor were causing a rip in the space-time continuum and temporarily shorting out the server's NIC 150 feet away in another room. Nope. All he was doing was loading up the bigger boxes on the pallet first and then gradually the smaller ones on top, while scanning every box with the wireless barcode scanner. Aha! It must be the barcode scanner's wireless features that probably latch on to the database server and cause all other requests to fail. Nope. A few tests later I realized it wasn't the barcode scanner, since it was behaving pretty nicely. The wireless router and its UPS in the shipping room were configured right and seemed to be functioning normally too. It had to be something else, especially since everything was working fine just before the hurricane closing.
As soon as the next timeout started, I ran into the shipping room and watched the guy load the next pallet. The moment he placed four big boxes of shampoo on the bottom row of the pallet, the database server stopped timing out! This had to be black magic! I asked him to remove the boxes and the database server began to time out again! I could not believe the absurdity of this and spent five more minutes loading and unloading the boxes of shampoo with the same exact result. I was about to fall down on my knees and start begging for mercy from the God of Ethernet when I noticed that the wireless router in the shipping room was mounted about a foot lower than the tops of the four big boxes when they were placed on the pallet. We were finally on to something!
The wireless router lost line-of-sight to the outside warehouse anytime a pallet was loaded with the big boxes. Ten minutes later I had the problem solved. Here is what happened. During the hurricane, there was a power failure that reset the only device in our building that wasn't connected to a UPS: a test wireless router I had in my office. The default settings on the test router somehow made it a repeater for the only other wireless router we had, the one in the shipping room. The two wireless nodes were only able to talk to each other when there were no pallets placed between them, and even then the signal wasn't too strong. Every time the two wireless routers managed to talk, they created a loop in my tiny network and, as a result, all packets to the database server were lost. The database server had its own switch off the main router and hence was pretty much the furthest node. Most other PCs were on the same 16-port switch, so I had no problems pinging constantly between them.
The 1-second solution to this four-hour troubleshooting nightmare was me yanking off the power to the test router. And the database server never timed out again.
I had a vaguely similar problem with a node at the end of a long, but in-spec, cat5e cable periodically experiencing huge packet loss, typically at night, and it would last for about 5-10 minutes then work fine for about 10-20. The problem happened right after I added a switch at the end of the line, but the switch had been in service elsewhere and never had any problems.
It turned out the Ethernet cable was placed in a conduit that ran close to a conduit carrying a high-current two-phase 240V power line. Most of the devices on that line were running and pulling huge amounts of current all the time, so I didn't originally suspect that the line was the problem. But, there was a furnace on that line whose variable-speed blower was connected to a single 120V phase, and the unbalanced current draw on the power line created enough common-mode noise in the cat5e cable to throw off the switch. The furnace came on periodically at night to heat the building, causing the packet loss. The solution was to put a device at the end of the Ethernet cable with better common-mode rejection.
A "normal" person might have simply "turned it off and on again" and thus fixed it. Interesting how you wanted to understand what was going on and went through a lot of trouble to get there. I'd have done the same.
Well the first problem is to find the thing to turn off and on again. It's not always clear what device is misbehaving.
Also, frequently you'll need to report to the "higher ups" what went wrong. Saying "I dunno, I turned it off and on again and it works now. shrug" makes you look like you don't know what's going on. It's much better to be able to explain what happened.
A lovely story (which I urge anyone reading this to go and read; it's much geekier than it probably sounds). But wait: isn't there a factor-of-2 error right at the end, and doesn't that rather spoil it? Or am I missing something?
Fascinating, although I'm guessing the author isn't an electrical engineer, because for me, the timeout on the messages would have been the absolute first thing that I checked as soon as I'd confirmed the 500 mile limit. Although that's probably because I spent a certain amount of time at the start of my career detecting faults in fibre optic and copper installations using reflectometers to work out how far down the line the problem was...
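The reflectometer arithmetic mentioned above is simple enough to sketch (the velocity factor and echo delay here are assumed, typical values, not measurements from any specific installation): a pulse is sent down the cable, reflects off the fault, and the echo delay gives the distance.

```python
# Time-domain reflectometry, back-of-the-envelope version. The pulse
# travels at the cable's velocity factor times c, and covers the
# distance to the fault twice (out and back) -- hence the divide by two.
C_M_PER_US = 299.792458   # metres per microsecond, speed of light in vacuum
VELOCITY_FACTOR = 0.66    # assumed; typical copper cables fall roughly in 0.6-0.8

def fault_distance_m(echo_delay_us: float) -> float:
    """Distance to the fault given the round-trip echo delay."""
    return C_M_PER_US * VELOCITY_FACTOR * echo_delay_us / 2

print(round(fault_distance_m(1.0), 1))  # ~98.9 m for a 1 microsecond echo
```

The same "propagation delay maps to distance" intuition is exactly what the 500-mile story turns on, which is presumably why it jumped out at the commenter.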
It depends on your definition of linear. In some algebra contexts you would call it a linear function because it's a line, but in linear algebra you would not consider it a linear transformation because it doesn't pass through the origin[1]. It would be more accurate to call it "affine."
[1] More generally, being "linear" means that f(ax+y) = af(x) + f(y) for constant a and variables x, y. From this, it follows that f(0) = 0 for linear transformations.
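The footnote's condition is easy to spot-check numerically. A single sample point can refute linearity but not prove it; the functions below are just illustrative examples:

```python
# Check f(a*x + y) == a*f(x) + f(y) at one sample point.
def satisfies_linearity(f, a=3.0, x=2.0, y=5.0):
    return abs(f(a * x + y) - (a * f(x) + f(y))) < 1e-9

line_through_origin = lambda t: 2 * t      # linear (and affine)
line_with_offset   = lambda t: 2 * t + 1   # affine but NOT linear

print(satisfies_linearity(line_through_origin))  # True
print(satisfies_linearity(line_with_offset))     # False: f(0) = 1 != 0
```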
Recalling older versions of sendmail literally sent a shiver down my spine; we had something akin to a voodoo ritual whenever we needed to change some configuration.
Me too. Though in this case the "boss" was a statistician.
The way the professor initially gathered data, instead of immediately asking for help like a normal user, made me think this was going to turn into a joke. Like the one that begins with, "a manager, an engineer, and a programmer are driving through the mountains..."