Tom Allensworth

So, What Happened to AVSIM?


First... if someone tells you that a RAID system never fails, tell them they don't have a clue as to what they are talking about. I have it on good authority that they can, and do.

 

Were we attacked? No, the outage was not caused by any attempt to shut our forums down.

 

We had indications that a disk was unstable; much of what you have seen in 503 errors we now attribute to that and to what follows in this story. The "disk" finally did fail last week, sometime after 3:40 a.m. on Thursday. The whole system crashed. That’s not supposed to happen in a RAID array. The system should continue to operate with a failed disk or, depending upon configuration, multiple failed disks. We replaced the “bad” disk on Saturday, and only then did the remaining three disks start showing indications that they too were on the road to failure. Importantly, those three disks have not failed - yet. We'll come back to them below. The answer to why the RAID array did not do what it should have done is that all four disks became corrupted. Disk 1 was just the first to fail.

 

We decided to go ahead and start the IDERA backup process Tuesday night in spite of the wobbly three disks and we were encouraged that no errors were showing as late as 20 hours into the process. Nearly 30 hours later, the system came back online after we did some additional dialing and tweaking.

 

Okay, now about those three questionable disks in the RAID array today… We have ordered and have now received a total of five disks… We will replace the three existing failing drives, and we will then expand our system from a total of four drives to six. By doing that, we will have a hot spare and a full RAID 5 configuration, which will allow the system to keep operating through a disk failure and, once the hot spare has rebuilt in its place, a second one. The replacement of the three “bad” disks should be pretty straightforward. We will replace one on the first day, let it stabilize and let the RAID process do its thing, then replace the second the following day and the third the day after that. In other words, it is going to take three days and six trips to and from the colo to get that job done, but short of shutting the system down and reloading the best backup, that is the quickest and safest way to proceed.
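
For anyone curious about the arithmetic behind a six-drive layout, here is a rough sketch. It uses only the textbook definitions: single-parity RAID 5 rides through one failed member at a time (a hot spare helps with a later failure only after it has finished rebuilding), while dual-parity RAID 6 rides through two at once. The function name and the 1 TB drive size are made up for illustration; the actual drive sizes and controller settings are not stated in this post.

```python
def array_summary(total_drives, hot_spares, parity_drives, drive_tb):
    """Return (usable_tb, simultaneous_failures_tolerated) for a parity array.

    parity_drives is 1 for RAID 5 and 2 for RAID 6. Hot spares sit idle
    until a member fails, so they add no capacity.
    """
    members = total_drives - hot_spares
    if members <= parity_drives:
        raise ValueError("not enough member drives for this parity level")
    usable_tb = (members - parity_drives) * drive_tb
    return usable_tb, parity_drives


# Six drives with one hot spare and single parity (RAID 5); 1 TB is assumed.
print(array_summary(6, 1, 1, 1.0))
# Six drives, no spare, dual parity (RAID 6), shown only for comparison.
print(array_summary(6, 0, 2, 1.0))
```

Under these assumptions both layouts give the same usable space; the difference is whether the second failure has to wait for a rebuild to finish.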

 

The system is now going to show some sporadic slowness over the next day or two. The reason for that is that the RAID system will be doing its thing with the "parity" process (and I never thought of myself as an I.T. guy - now I is... Parity... Love that word!)
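
If you want to see what that "parity" business amounts to, here is a minimal sketch of the XOR idea that single-parity RAID is built on. It is a toy with three made-up data blocks, not what the HP controller actually runs: the parity block is the XOR of the data blocks, so any one missing block can be recomputed from the rest. Recomputing every block on a replacement disk this way is why the array runs slow during a rebuild.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks, as if striped across three disks.
data = [b"AVSI", b"M is", b"back"]

# The parity block that would live on a fourth disk.
parity = xor_blocks(data)

# Pretend the second disk died: rebuild its block from the survivors.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

The same trick cannot recover two missing blocks at once, which is why a second failure in the middle of a rebuild is the thing to worry about.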

 

As for all the time it has taken… I can’t begin to tell you how much of a goat rope this whole thing has been. Support delays from HP, Idera, NETWORK2000, PCCW, and Equinix or confusion about “who’s on first” have contributed hours of lost time.  We had our tech support people sitting in the cage at our colo Tuesday from 11 a.m. until nearly midnight, mostly waiting for support responses.  We had the colo accept our first replacement drive on Friday. Monday they refused to accept the two expansion disks and they were shipped back to HP. HP had to resend us drives yesterday. We have been forced to have all of our hardware shipped to the tech support’s office some miles away, and he now has to make sure the hardware gets to the colo – time and travel that we pay for.

 

By the end of next week, if not sooner, we expect to have a fully populated RAID array, operating to spec and with a hot spare if things go catawampus again. More important to you, as a member, is that we believe the 503 errors will finally go away (yes, there was one about 20 minutes ago, but that was my doing, not the system's). When all is said and done, we think the system will be back to good health and operating at its peak performance.

 


Tom,

Thank you for all the hard work that you and everyone on Team AVSIM have been performing to get everything back online.


Nice to see you getting your parity bits in order Tom :)

 

If there's one thing that I have learnt in 30 years of IT support, it is never say never - closely followed by "it will only go in one way"


Scott


Glad you are back Tom, and interesting read although after I read these technical things I wonder why I bother....it mostly sounded like blah blah blah we are back up blah blah blah.  I guess that is why we have IT guys.


Mark W   CYYZ      

My Sim: https://goo.gl/photos/oic45LSoaHKEgU8E9

My Concorde Tutorial Videos available here:  https://www.youtube.com/user/UPS1000
 

 


Great news, I feel the forums a bit speedier, many thanks to Tom and all the AVSIM Team. Just wondering if support for TapaTalk will be back someday.

 

My bad! I see TapaTalk support is working!


Alexander Colka


It is very nice to have AVSIM back, I was quite concerned.

 

João Alfredo


It is impossible to please Greeks and Trojans

É impossivel agradar Gregos e Troianos


Thanks a lot Tom, glad everything got solved. And yes, I was also afraid....


Best regards,
Luis Hernández

Main rig: self built, AMD Ryzen 5 5600X with PBO enabled (but default settings, CO -15 mV, and SMT ON), 2x16 GB DDR4-3200 RAM, Nvidia RTX3060 Ti 8GB, 256 GB M.2 SSD (OS+apps) + 2x1 TB SATA III SSD (sims) + 1 TB 7200 rpm HDD (storage), Viewsonic VX2458-MHD 1920x1080@120 Hz, Windows 10 Pro. Running FSX-SE, MSFS and P3D v5.4 (with v4.5 default airports).

Mobile rig: ASUS Zenbook UM425QA (AMD Ryzen 7 5800H APU @3.2 GHz and boost disabled, 1 TB M.2 SSD, 16 GB RAM, Windows 11 Pro). Running FS9 there... sometimes on just battery! FSX-SE also installed, just in case. 

VKB Gladiator NXT Premium Left + GNX THQ as primary controllers. Xbox Series X|S wireless controller as standby/travel.


Super great to have you back online! :Applause: :Party:


 

Staffan


Nice work, Tom! Thank you,  and to all those who worked hard at getting her back up, thank you as well!

 

Don


Great news, I feel the forums a bit speedier, many thanks to Tom and all the AVSIM Team. Just wondering if support for TapaTalk will be back someday.

 

My bad! I see TapaTalk support is working!

Yes, it is working, and take a look at the number of users being served.


Whoa, that was quite the headache for you and the team to say the least. Strange not having avsim for a week. Like a whole community disappeared without a trace, something like that MH370 flight. 

 

One thing I've learned working with computers and electronics over 25 years.... There are no guarantees. The best redundancy can still fail. The odds are greatly reduced, but it can still fail.

 

The only guarantees in life are Death and Taxes. 

 

Glad it's all up and running now. Welcome back.


CYVR LSZH 

 


Glad to see you're back up again!  I, too, have noticed a performance increase.  Hope it holds.

 

I've seen a failure like that before on standalone servers with onboard RAID controllers.  One disk goes bad, and on replacement the whole array goes south.  Not fun.  Thankfully you have good backups!


Tony


  • Tom Allensworth,
    Founder of AVSIM Online

