787 electric jet needs regular reboots

But only once every 120 days...

Is it realistic to think that these airplanes stay powered up for even close to that length of time?
 
Why would you shut it down? They don't make any money sitting on the ramp!

Seriously though, thanks a lot, software engineer douche! Wish I could make mistakes of this magnitude and still keep my job.
 
What makes you think he kept his job?
 
It's 248 days. And if you do the math, the "engineer" used a signed integer to store number of milliseconds from power up.

Total rookie mistake but one the software "engineering" field makes over and over and over and over and over...
 
If they don't just handle it with a "reboot the jet" warning and leave it at that, I bet the "fix to make it better" is to make that an unsigned 32-bit integer. Time rarely goes backward. :)

No need to retest memory leaks, etc. Same number of bits, twice the time before it rolls over.
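To make the "same number of bits, twice the time" point concrete, here is a minimal sketch. It assumes the counter ticks at 100 Hz (the rate suggested elsewhere in this thread, not a confirmed detail of the 787 software) and uses Python's `ctypes` fixed-width types only to imitate 32-bit storage.

```python
import ctypes

INT32_MAX = 2**31 - 1    # largest signed 32-bit value
UINT32_MAX = 2**32 - 1   # largest unsigned 32-bit value
TICKS_PER_SECOND = 100   # assumption: 0.01 s per tick
SECONDS_PER_DAY = 86_400

# One tick past the signed maximum wraps to a large negative number...
signed_after = ctypes.c_int32(INT32_MAX + 1).value
# ...while the same bit pattern read as unsigned just keeps counting.
unsigned_after = ctypes.c_uint32(INT32_MAX + 1).value
# Unsigned still rolls over eventually, back to zero.
unsigned_wrapped = ctypes.c_uint32(UINT32_MAX + 1).value

days_signed = (INT32_MAX + 1) / TICKS_PER_SECOND / SECONDS_PER_DAY
days_unsigned = (UINT32_MAX + 1) / TICKS_PER_SECOND / SECONDS_PER_DAY

print(signed_after, unsigned_after, unsigned_wrapped)
print(f"signed: {days_signed:.1f} days, unsigned: {days_unsigned:.1f} days")
```

Going unsigned only postpones the rollover; anything that compares timestamps still has to cope with the wrap to zero.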

Depends on how many things need to read it. Another electronics geek friend of mine pointed out that if only one process reads the thing, reset it on each cycle if there's no need to know time from power up and it was just being used to fire regular events.

Better yet would have been to let the hardware handle it and fire an interrupt and just handle the interrupt per normal handling procedures.
 
How long does a reboot take? Are we talking it should just be a preflight item where the pilot flicks a switch off and back on or are we talking this is gonna have to be a regularly scheduled maintenance item where it goes into the shop for a couple hours?
 
A mistake, yes. A big mistake? Not really. I seriously doubt any aircraft in the fleet has ever stayed powered on that long or ever will.

It's really easy to say how incompetent software developers are and light them up over every bug. However, it's really damn hard to be a software developer and write things that don't have bugs; it's basically impossible. We're human, after all. That's why I'm not a huge fan of fully electronic fly-by-wire systems that crash if all the computers crash.
 
It's 248 days. And if you do the math, the "engineer" used a signed integer to store number of milliseconds from power up.

Total rookie mistake but one the software "engineering" field makes over and over and over and over and over...

Not milliseconds. 0.01 seconds. Probably counting loops through a cyclic executive running at 100 Hz or somesuch. All of us RT guys do that sort of thing, but most of us have the sense to keep it explicitly bounded.
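The arithmetic behind the two guesses is easy to check. This is a small sketch that assumes a signed 32-bit counter starting at zero; the tick rates are the two proposed in this thread, and `days_until_overflow` is a name made up for illustration.

```python
SECONDS_PER_DAY = 86_400
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def days_until_overflow(max_count: int, ticks_per_second: int) -> float:
    """Days of uptime before a counter that started at 0 exceeds max_count."""
    return (max_count + 1) / ticks_per_second / SECONDS_PER_DAY


days_at_100hz = days_until_overflow(INT32_MAX, 100)   # 0.01 s ticks
days_at_1khz = days_until_overflow(INT32_MAX, 1000)   # 1 ms ticks

print(f"100 Hz ticks: {days_at_100hz:.2f} days")
print(f"1 kHz ticks:  {days_at_1khz:.2f} days")
```

Only the 100 Hz rate lands on roughly 248 days; millisecond ticks would overflow in under a month.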

If I had to guess, it found a really, really bad voltage transient or a very long loss of communication in the future.

I doubt it will be changed to unsigned. It will have a test added for negative values, and this particular fault will have the hell tested out of it. The many thousands of other moving parts will have to wait….
 
However, it's really damn hard to be a software developer and write things that don't have bugs; it's basically impossible. We're human, after all.

Not "basically."

You can't test anything much more complex than a "hello world" program exhaustively in bounded time. It's a so-called "NP-complete" problem that has little to do with "being human" and much more to do with the universe only being 13 billion years old and the desire to release products in less time than that.

Less-than-exhaustive testing is a practical necessity, and it means in no uncertain terms that there WILL be bugs.
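A back-of-the-envelope sketch of why exhaustive testing runs out of universe. The test rate of one billion cases per second is an assumption picked to be generous, and the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 86_400
AGE_OF_UNIVERSE_YEARS = 13.8e9   # approximate
TESTS_PER_SECOND = 1e9           # assumption: a very fast test rig


def years_to_test_all_inputs(input_bits: int,
                             tests_per_second: float = TESTS_PER_SECOND) -> float:
    """Years needed to try every possible value of `input_bits` bits of input."""
    return 2**input_bits / tests_per_second / SECONDS_PER_YEAR


one_arg = years_to_test_all_inputs(64)     # a function of one 64-bit argument
two_args = years_to_test_all_inputs(128)   # a function of two 64-bit arguments

print(f"one 64-bit argument:  {one_arg:,.0f} years")
print(f"two 64-bit arguments: {two_args:.2e} years")
print(f"universes needed:     {two_args / AGE_OF_UNIVERSE_YEARS:.2e}")
```

And that is for a pure function of its arguments, with no internal state, timing, or hardware involved.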
 
There's a Windows/Mac/*nix joke in here someplace.


The OS is probably an embedded RT system. Takes all the fun out of it.
 
Not "basically."

You can't test anything much more complex than a "hello world" program exhaustively in bounded time. It's a so-called "NP-complete" problem that has little to do with "being human" and much more to do with the universe only being 13 billion years old and the desire to release products in less time than that.

Even "hello world" is pretty complex. Way back when I worked at IBM there was a bug report on IEFBR14. It's a null program that simply returns, used as the body of a JCL "program". The problem was that it didn't zero Register 15, the return-code register, so sometimes it returned an error indication.
 
How long does a reboot take? Are we talking it should just be a preflight item where the pilot flicks a switch off and back on or are we talking this is gonna have to be a regularly scheduled maintenance item where it goes into the shop for a couple hours?


The planes are very rarely powered down between routine flights.

None of them are going months without being shut down, however.
 
There's a Windows/Mac/*nix joke in here someplace.


The OS is probably an embedded RT system. Takes all the fun out of it.


OS-9, VxWorks, or Green Hills?

My money is on VxWorks. Probably on an FPGA.

And yeah, I slipped a decimal point. I even thought it was weird I kept having to add a zero when I was checking the seconds to days conversions. Haha.

Oh well. I was talking to the drywall contractor and attempting to restart the buggy centralized logging software at the same time as going "oooh cool! I know what they screwed up!" after I saw a FB post about the bug announcement go by.

Oh and as for bug hunting... I love that the industry lie marches on. Code is hard. It's so hard we code in the exact same bugs in new languages every single year. LOL.

Sure keeps a lot of us employed. I love bugs. You guys writing bugs and saying it's because it's hard, keep me pretty damn well fed. Haha.

What's either worrisome or very cool is realizing instantly when you see "248 days" what the bug was from experience. I'm going to go with worrisome. I need to get out of this stupidity before it implodes.

But then the paycheck shows up and I go back for more. Ha.
 
The planes are very rarely powered down between routine flights.

None of them are going months without being shut down, however.


A friend mentioned that some airports don't have the special power needs for the aircraft done right, but I haven't asked him to elaborate. I guess an outstation blew up a bunch of his avionics and he got an extra night there while they flew in parts.

I assume the electric jet doesn't need anything special for voltages or frequency, but needs the power to be a wee bit cleaner than some places are able to provide.

It'll bug me enough that I'll eventually ask him, but there are probably folks here who know, too.

Not supposed to let the smoke out. Once you do that the toys don't work anymore. ;-)
 
If you can write bug free code, you might as well stop being a sysadmin and join the dark side. Since you'll be the only developer in the world that won't make those same mistakes we keep making you'll be able to command a salary high enough to buy yourself quite a fleet of aircraft :)
 
Naw, he should just be a manager. :)

If I had a nickel for every manager that was surprised a big software system had a bug….

Locally, we have problems with people assuming that following the process means all bugs get found. Sure, it finds quite a few bugs if it's a good process, but there is no process in the universe that will find them all in finite time.
 
If you can write bug free code, you might as well stop being a sysadmin and join the dark side. Since you'll be the only developer in the world that won't make those same mistakes we keep making you'll be able to command a salary high enough to buy yourself quite a fleet of aircraft :)


Still a cop out. Stop writing in twenty different languages and write some libraries that get re-used and a set of hard business rules that grow and must be met over time.

Making stuff like "Thou shalt NOT use a 32-bit counter for time without a mechanism for handling counter rollover" a serious offense worthy of demotion, pay cuts, whatever you like... Because it's too dumb to allow to happen anymore in an industry that needs to grow up after four decades of existence...
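One common "mechanism for handling counter rollover" is to let the counter wrap and compute elapsed time with modular subtraction. The sketch below imitates an unsigned 32-bit tick counter with a mask; the function names are made up for illustration, and the trick only holds while the true elapsed time is shorter than one full wrap (2**32 ticks).

```python
MASK32 = 0xFFFFFFFF  # keep values within 32 unsigned bits


def elapsed_ticks(now: int, start: int) -> int:
    """Ticks from start to now, correct across a single counter rollover."""
    return (now - start) & MASK32


def deadline_passed(now: int, start: int, timeout_ticks: int) -> bool:
    """True once at least timeout_ticks have elapsed since start."""
    return elapsed_ticks(now, start) >= timeout_ticks


start = 0xFFFFFFF0               # just before the counter wraps
now = (start + 0x20) & MASK32    # 0x20 ticks later, the counter has wrapped

naive = now - start              # plain subtraction goes hugely negative
safe = elapsed_ticks(now, start)

print(hex(now), naive, safe)
```

Comparing raw timestamps (`now >= deadline`) is what breaks at rollover; comparing elapsed intervals does not.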

You see where I'm going. And no. Not going to join the ranks of coders writing the same bugs over and over. I'm a sysadmin because I see patterns. System problems that manifest themselves in ways where you get just a hint that they're going on because you have that "Deja Fu" sense.

You know Deja Fu, right? "Somewhere, sometime, someone has kicked me in the head like this before."

Make no mistake, even though I'm joking about Deja Fu, the personal nature of the kick to the head is real. It's really damn annoying to tell the twentieth coder in your life that they wrote the same damn bug you first saw in 1995. (Or for some of us here long before that.)

But even that isn't so bad as realizing their bosses haven't stopped, figured out how to stop the tool churn, and often even encourage it (here I'll make you laugh Jesse... "oh goodie! Let's all use Ruby now!") and haven't figured out how to make the tools and methods used smart enough to catch the most basic bugs ever known to man... Shoving a timer counter onto an integer type that's not big enough for the possible uptime of the machine.

The industry needs a makeover. I don't know what it is or what will trigger it, but I suspect it'll happen in banking and it'll be a bug so damn well understood by everyone, even laymen, who'll say "what retard did THAT?" and it'll have to pull real money out of lots of bank accounts. We're already seeing hints of it during trading halts on Wall Street for glitches in high speed trading code.

It's pretty much a little game of Russian Roulette as we put computers in charge of important stuff. A co-worker had "all braking" fail on his goofy Chevy Volt. He's a software engineer and knows better. I suspect he had all regenerative braking fail. Not all. But nevertheless, I asked him what the dealer said... "They just replaced the computer." And that's a system that's actually been designed for either fail-safe or fail-neutral.

Anyway. I always chuckle when I mention that there could be building codes and standards for software and the excuse is always "you try writing code - it's hard!" I'm sure the first house builder who got a failed inspection by the code enforcement inspectors said the same thing.
 
Nate, it's possible, even common, for code reuse to be a source of bugs.

The usual case study is the Ariane 5 maiden voyage.

And all those Unix buffer overruns everyone hates so much on the security side came from the same poorly designed libc still in use today.

Some of us don't translate just for the heck of it. Real time guys still favor C, and occasionally C++. Every once in a while some total idiot tries out Java and then gets handed his butt by garbage collection.
 
Jesse here's an example of knowing the pattern is wrong, and it's even from one of your lines of work.

Ordered about $8500 worth of stuff from a well known online IT vendor yesterday. Used the boss' credit card because he likes getting the perks.

Put in the order. No problem.

Order gets split in their system into two orders for whatever silly business reasons internally. (Dumb enough as it is, but anyway...)

One order processes and ships.

The other order fails and sends an email that either the credit card address or the phone number associated with the card is incorrect.

Hmm. Since I entered the same data for the original one order and they split the order... See the pattern / problem here?

It's easy for the user to see and a massively difficult problem for the computer which is handling the two orders as if they were separate with no initial relationship.

And the engineer won't call it a bug and will use the more common case of "what if the card limit is hit on the second one? We still want the first one to go through and make our money on that one, right?"

Heh. Maybe or maybe not. Customer might need all that stuff on the original order you split to complete a project and having half of it process is not going to work. Or... Maybe they were ordering for multiple projects.

Either way, if you could successfully process one and not the other, and they were based on the exact same data input, something is wrong.

What do you bet that when I call and ask "How did order number X go through and not order number Y?" they have absolutely no way to flag that as a possible bug and they don't have a report that triggers when two transactions driven from the same input data, have one succeed and one fail?
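The kind of report being asked for is not hard to sketch. Everything here is hypothetical: the `SubOrder` record, its field names, and the idea that the vendor keeps a parent order ID and some fingerprint of the payment data entered.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SubOrder:
    order_id: str
    parent_id: str             # the original order before it was split
    payment_fingerprint: str   # hash of the card/address/phone data entered
    succeeded: bool


def mixed_outcome_parents(sub_orders):
    """Parent orders where identical payment input both succeeded and failed."""
    outcomes = defaultdict(set)
    for so in sub_orders:
        outcomes[(so.parent_id, so.payment_fingerprint)].add(so.succeeded)
    return sorted({parent for (parent, _), seen in outcomes.items()
                   if seen == {True, False}})


orders = [
    SubOrder("X", "P1", "abc123", True),    # shipped
    SubOrder("Y", "P1", "abc123", False),   # "address or phone incorrect"
    SubOrder("A", "P2", "def456", True),
    SubOrder("B", "P2", "def456", True),
    SubOrder("C", "P3", "ghi789", False),
    SubOrder("D", "P3", "ghi789", False),
]

flagged = mixed_outcome_parents(orders)
print(flagged)
```

A mixed result is not proof of a bug (a card limit could explain it), but it is a cheap signal that a human should look.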

That bug will be in their system... forever.

In my case, I only saved $300 using them over a better vendor on an $8500 purchase. (Technically it was $12,500 worth of stuff and the other vendor had a better price on part of it.)

Think I'll bother shopping with them again for $300 off and the waste of an hour of my time following up on whatever happened? Unlikely. I'll remember their system is buggy and avoid them whenever possible in the future.
 
Some of us don't translate just for the heck of it. Real time guys still favor C, and occasionally C++. Every once in a while some total idiot tries out Java and then gets handed his butt by garbage collection.


LOL! I know. I've never seen a significant Java project of any size not have their butts handed to them by garbage collection. Doesn't even have to be real-time stuff.

The whole damn company ran on Java at the last place I was at. And I got all the four AM phone calls saying it was down again. My favorite is Java's handling of database thread pools. Look that disaster up sometime if you're bored. Haha.

One developer walked out never to be heard from again trying to fix that one. I'm not kidding. He even left the country. His boss finally had to fire him when he heard from him weeks later on an international phone call where he said he wouldn't be back. LOL
 
The biggest problem I have seen in my career as an engineer is lack of testing. Management often pushes hard on the development engineers to get the product done and then they also expect them to test. That is wrong on three basic levels:
- the author of the system should not be allowed to test due to a bias
- developers are paid for being creative, not destructive
- management is too pressed to release that they don't give the developers enough time to test anything properly
This is a systemic problem due to the nature of the business (yes, writing SW is business).
I always push for more testing but you can see a lot of developers who are too lazy to test or too arrogant ("my code doesn't have bugs").
We are all human.
Anything wrong with technology in the world can be attributed to the FHF (Freaking Human Factor). We are our own worst enemy. Think about it. :)
 
My favorite is the "hard release schedule". Code releases on X day, QA *must finish* testing by Y day, Deployment is always on Z day.

That above leads to total stupidity eventually. People like routine and won't stop that process even if the code is crap and shouldn't have even made it to QA.

Plus, as an old boss said, "Quality Assurance is not a department. It's a series of decisions and a state of mind."
 
Which is why I feature-branch everything: what is ready and we feel confident about gets deployed, and what isn't ready waits. And some things never go out, not because we don't need them, but because we never got them where they needed to be.

Also have never committed to anything being done by any date. Not sure I'll get away with that forever, but I'll do it as long as I can :)
 
Nate, it's possible, even common, for code reuse to be a source of bugs.

The usual case study is the Ariane 5 maiden voyage.

And all those Unix buffer overruns everyone hates so much on the security side came from the same poorly designed libc still in use today.

Some of us don't translate just for the heck of it. Real time guys still favor C, and occasionally C++. Every once in a while some total idiot tries out Java and then gets handed his butt by garbage collection.

I keep getting calls & emails about Java gigs and keep saying no thanks. Might say yes to one just to get enough $$$ to pay for the GPS and ADSB bills.

Nah, don't think so. I think I'd do COBOL again before Java. Anyone here need a really good C/C++ or FORTRAN programmer?
 
BTW, for those that don't understand what's so hard about software development, this article describes it pretty well. Worth the read, and actually scary, scary accurate:

http://www.stilldrinking.org/programming-sucks


It does describe how much of the problem is self-inflicted, though. Especially that part where no one gets good at anything because they constantly go hunting for new tools. I call that the "ooh shiny!" syndrome.

(And I thought we figured out that the Perl had a bug in it the last time this was posted? LOL.)

All the really good, and I mean scary good, coders -- nay, engineers... Because they behave like real engineers -- I've met all have a couple of traits in common. They have a lot of grey hair, and the discipline to pick one language and know it inside and out.

I'm not being dishonest when I say that every young coder I've seen who doesn't do that, eventually leaves the biz. They can't keep up with "ooh shiny" kids with more energy and no life outside of work for long, once they get married, have kids, do stuff.

But those engineers. Man are they good. As a sysadmin I can walk over and explain whatever odd behavior I'm seeing (and I agree with the article, all code is always broken, it's just a matter of degree and how), and they'll not only have a good idea where it is and what it is, but they'll often open the editor and jump right to it in a few hundred thousand lines of code.

They have that trait also. They know their code and everyone else's in the one big moneymaking project and they'll completely ignore the never ending new stuff that isn't making the company a dime. They stay close to the revenue stream. The important stuff. They have an innate ability to ignore the stuff that'll be gone in six months and the team disbanded.

The key word is that they're disciplined. They really pay attention only to the important stuff and let the rest pass them by.

At one place it was the C guy who had started life writing a commercial RTOS and went into telecom. At another it was the Java guy who's written Java since the day it was first released. Both knew the annoyances and limitations of their particular toolset, but they didn't bail to go write in another language.

In fact the C guy looked lost when we mentioned that a database was going to have to be loaded from a floppy disk and a REXX script. He just stared and said, "Yeah. Um. Can you guys in support write that? I'll look it over for any big mistakes and check it in." He knew it wasn't his thing and didn't go trying to make it his thing.

The kids around these guys would flail around if brought a set of symptoms and say they'd find a tool on the Internet to hunt the bug, and generally couldn't be nailed down on anything like an estimate of how long anything would take, because they always worked from the reactionary mode and had no depth in any particular toolset, language, or anything really. They always looked frustrated and stressed out. Then there were some mid-level folks who were workhorses and knew a few languages but had started down the path to guru status on one or another. They could code anything in those couple of languages but hadn't yet developed the sense of "big picture" that the grey hairs had. The grey hairs were obviously mentoring them and they knew it. They enjoyed it and had very little stress.

That guy in the above post... Hasn't figured it out yet. He can't be a world class expert in anything if he tries to do all of that stuff. He can't even become a mid-level expert until he lets some of it go. Some people just crave the chaos and won't ever be that good at one thing. They're limiting themselves and don't know it.

As systems guys, we kinda have to at least be familiar with all the goofy tools and languages and their operational quirks. Otherwise we can't see when the devs addicted to "ooh shiny" are about to drive the bus off the cliff. So it's a little different for you and me.

The very top of his article is also hilarious. Another common self-inflicted wound set. The devs in his story are writing both the design spec and choosing the materials for the bridge? I've never seen a shop that allowed that for every component of "the bridge" that wasn't totally out of control. Of course, with the demise of waterfall and top-down design, the design team choosing tools and standards, and the team sticking to them, went away too. Agile methodology was never intended to be the excuse for "I found this tool that's at version 0.05 on the Internet, only supported by one guy in his basement, and I want to install it on Production because I'm so agile I can support it". Heh. But it turns into that everywhere I've ever been.

I'm not kidding. I had this conversation today with a young coder...

"I knew a guy who wrote code in aerospace stuff! He said it was CRAZY how much testing they had to do!"

LOL... Gee, I wonder why. :)
 
I agree that it's self-inflicted, but it's the state of the industry, and it's not likely to change any time soon. It's the reality of what most developers have to face on a daily basis and there is little they can do to stop it.

As an industry we build software incredibly fast, we do it under intense pressure, we make lots of mistakes, and if we try to change any of that it's very difficult to stay competitive. The reality behind why it is like this is that it works, but damn, it's a headache sometimes.
 
Why would you shut it down? They don't make any money sitting on the ramp!

Seriously though, thanks a lot, software engineer douche! Wish I could make mistakes of this magnitude and still keep my job.

True, but any man-made machine needs maintenance. Seems an awful long stretch for an aircraft to be run with NOTHING requiring a shutdown.
 
BTW, for those that don't understand what's so hard about software development, this article describes it pretty well. Worth the read, and actually scary, scary accurate:
http://www.stilldrinking.org/programming-sucks

Cool.

Now let's have 100 million self-driving cars on the roads and self-piloting airliners flying the skies run by software written by "idiots and dicks" in the flavor-of-the-month language; to keep us all "safe."
 
We typically don't shut down the airplanes even for station level maintenance. If the plane will be out of service and parked remotely, then they are shut down.

Twice I have had a weird maintenance glitch that required shutting down the entire airplane and restarting to clear the problem.
 