hooooboy! what a day!

avi

W3RK3R
Aug 21, 2002
Oly, WA
www.itsatrap.com
looks like our main server that runs pretty much every single little thing this company does took a big, fat shit just now. we could be totally fucked.
 
ID^^^^^$$$$10#####$$$$$T(*(*(*($

MY SYMPATH1ES T0 TH# P00R 0L L4SS. MAY SH3 RES7 IN PIECE.

1F'N Y0U NEED FUR7HER ASSIST. PLEAS3 DI4L +0.1.999.328.7448


x (.)_(.)>>>>>>>>>>> (|)(|)(|) >>>>>>>>>>>>>>>>>.>.>.>
 
I'm almost positive the box was running redhat. I'm pretty sure the only Windows server stuff we do is for boxes running actual Windows Media streams.
 
RedHat is a crappy distro out of the box. You have to do some major tweaking to get it running the way you need. The default installation is even worse than a vanilla Windows install... I'm sure that's been done, though, if it doesn't crash much... Ever done a risk analysis in your shop? How's the business recovery/contingency plan?
 
I was told it was the box itself, not the system, that died.

I'd like to say there are plans for all that stuff in case something dies, but knowing the way things go around here, I can't be sure. too many of the engineers need to talk shit to death before anything actually gets done. i just want to scream "FUCKING DO IT ALREADY" and strangle them. seriously - in the middle of a bunch of live streaming events today, I had to practically yell at my frickin' boss to get him to help me. anytime I said something he'd babble "so you want me to do so-and-so...?" jeez! don't question it, just get it done. I ended up just rushing past him and doing it myself.
 
full description of the problem from the head of IT:

The primary data store, the storage array which supports all the production requirements, experienced a fatal hardware failure at approximately 12:04 PM. It attempted to reboot itself, but the attempt failed. That left the web site completely out of service. We went to the data center and activated the "hot spare" filer, which assumed the processing load and brought the site back to life. No data was lost, and processing was restored at approximately 1:30 PM. The bulk of the elapsed time was the hour it takes to drive to Sunnyvale, where the data center is located.
 
Well, that's not much of an explanation. All we know is that some hardware in the storage array failed. What was it: a failing hard disk, a card, a cable, what? It would have been a good idea to have the "hot spare" online already instead of having to drive out there to bring it up.
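FWIW, "having the spare online" basically means something watches the primary and fails over to the spare automatically instead of waiting for someone to drive an hour to the data center. Here is a rough sketch of the idea in Python; the hostnames, the NFS port check, and the "promote" command are all made up, and a real filer would have its own takeover mechanism, so treat this as hand-waving rather than a description of how their setup actually works:

# Hypothetical sketch only: the hostnames, the NFS port check, and the
# promote command are made up; a real filer has its own takeover mechanism.
import socket
import subprocess
import time

PRIMARY = "filer1.example.com"        # hypothetical primary filer
SPARE_PROMOTE_CMD = ["ssh", "filer2.example.com", "promote-to-primary"]  # placeholder command
CHECK_PORT = 2049                     # NFS port, assuming the filer serves NFS
FAILURES_BEFORE_FAILOVER = 3          # require a few misses before acting
CHECK_INTERVAL_SECONDS = 60


def primary_is_up(timeout=5):
    """Return True if the primary filer accepts a TCP connection on the NFS port."""
    try:
        with socket.create_connection((PRIMARY, CHECK_PORT), timeout=timeout):
            return True
    except OSError:
        return False


def main():
    failures = 0
    while True:
        if primary_is_up():
            failures = 0
        else:
            failures += 1
            print(f"primary check failed ({failures}/{FAILURES_BEFORE_FAILOVER})")
            if failures >= FAILURES_BEFORE_FAILOVER:
                # Promote the hot spare instead of waiting for someone to drive out.
                subprocess.run(SPARE_PROMOTE_CMD, check=False)
                print("spare promoted; this is where you would page the on-call admin")
                break
        time.sleep(CHECK_INTERVAL_SECONDS)


if __name__ == "__main__":
    main()

In real life you would page somebody before auto-promoting anything, but that is the general idea.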
 
don't lecture me, it's not my deal. my understanding is that the box itself failed, b/c all they did was switch the drives over to the spare and turn it on. I believe keeping the spare online and all that is primarily a budget constraint.
 
I am not lecturing you, I am commenting on their work, not yours. Also, you deal with this crap every day and have to answer to clients, so you should be entitled to an explanation that's a little more thorough than "huh! the box failed," which is exactly what you got there....
 
well, there was a bit more to the note, but I'm not posting it. I guess there was also a cable-length issue that complicated bringing up the spare. anyhoo, it's fixed - that's what matters. and this msg went out to everyone, so it had to be dumbed down to salesperson level.