Case Study 1:

What’s in a beat?

This is a study in creativity. It centers on a minor failure during the rollout of a new system with many complex applications and markets. The setting is the first deployment of the MarkVIe control platform for a GE Aero Energy LM-series turbine in which the auxiliary systems were also controlled by MarkVIe hardware.

A control system upgrade had been sold for a site that was under GE management. The sold systems had been tested with all hardware in the loop on simulators in the laboratories in Salem, VA, and I witnessed that testing. While we had to work out some bugs in the configurations and control software, the platform itself (hardware and firmware) performed flawlessly.

When the two systems, one for each dual-fuel water-injected LM6000PC, were installed at site, the commissioning team faced a challenge. Digital I/O cards controlling valves in the auxiliary systems were intermittently and inexplicably rebooting themselves, driving all outputs safe. We had to find a solution to avoid delaying the startup and prolonging the outage.

Figure: LM6000PC Aeroderivative Turbine (source: GE Vernova)*

The installation was in India, so experts from the GE office in Hyderabad were dispatched to site. They brought the cards back to the labs in Hyderabad. In that testing, the cards exhibited the same erratic behavior seen at site, which we had been unable to replicate in Salem, VA. The standard diagnostic data in the logs stored on the I/O cards gave little useful information, as the problem lay deeper than those diagnostics were designed to reach.

All critical electronic controls have safety features designed to ensure that no failure results in unsafe operation. On electronic boards this mechanism is sometimes referred to as a suicide. When a processor controls multiple loops, one must monitor the processor to ensure it is behaving as expected. These monitors, called ‘watchdogs’, detect an anomalous condition and cause an immediate suicide.

Crucially, the LM6000PC turbine requires a hyper-responsive control system with a frame rate of 10ms (100Hz). That is, one hundred times every second the control system monitors its inputs and determines how to adjust its outputs. The MarkVIe platform has two base frame rates: a 40ms rate used for most applications, including heavy-duty gas and steam turbines, and a 10ms rate needed for the core engine controls of the lighter, higher-speed aeroderivatives.

The watchdogs must operate much faster than the base frame rate since some serve as timers for the higher-level functions. One of these timers made sure that the controllers completed their operations within the expected amount of time each frame, like a lap timer. There was a lower limit due to the nature of the timing mechanism and an upper limit set to twice the base frame rate, or 20ms for a 10ms frame.
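The lap-timer check described above can be sketched in a few lines. This is only an illustration, not the MarkVIe firmware: the function names are hypothetical, the lower limit of 1ms is a placeholder because the text gives no value, and treating the limits as inclusive is an assumption.

```python
BASE_FRAME_MS = 10.0                # 10ms base frame, from the text
UPPER_LIMIT_MS = 2 * BASE_FRAME_MS  # upper limit: twice the base frame, 20ms
LOWER_LIMIT_MS = 1.0                # hypothetical placeholder; not given in the text


def frame_time_ok(elapsed_ms, lower_ms=LOWER_LIMIT_MS, upper_ms=UPPER_LIMIT_MS):
    """Return True if one frame's measured duration is inside the allowed window."""
    return lower_ms <= elapsed_ms <= upper_ms


def first_trip(frame_times_ms):
    """Return the index of the first frame that would trip the watchdog, or None."""
    for index, elapsed_ms in enumerate(frame_times_ms):
        if not frame_time_ok(elapsed_ms):
            return index
    return None


if __name__ == "__main__":
    # Hypothetical measured frame times in milliseconds.
    print(first_trip([9.8, 10.1, 10.0]))   # every frame inside the window
    print(first_trip([10.0, 20.5, 10.0]))  # second frame overruns the 20ms limit
```

In a real card, a trip would drive the outputs safe rather than return an index; the sketch only shows where the two limits sit relative to the frame.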

All of the misbehaving cards were digital output controllers mounted on boards that included 220VAC 50Hz wetting voltage for solenoid valves. So fifty times per second, or every 20ms, there would be a voltage spike on the motherboard, at the same interval as the timer’s upper limit. I pointed this out.

I couldn’t explain, and might not understand, the precise nature of the failure mechanism, but when one of my colleagues participating in the brainstorm said the words “beat frequency”, tensions relaxed and everyone looked at each other. Some unspoken plan was apparent. The team developed, tested and deployed a solution in just a couple of days.
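For readers unfamiliar with the term, a beat frequency is the difference between two nearby frequencies: the rate at which two periodic signals drift into and out of alignment. The sketch below is not a reconstruction of the actual failure mechanism, which this case study does not document. It only compares the 20ms timer limit (50Hz) with a 50Hz supply that wanders slightly, as real grids do. The grid frequencies and function names are hypothetical.

```python
import math


def frequency_hz(period_ms):
    """Convert a period in milliseconds to a frequency in hertz."""
    return 1000.0 / period_ms


def beat_frequency_hz(f1_hz, f2_hz):
    """Beat frequency of two periodic signals: the absolute difference."""
    return abs(f1_hz - f2_hz)


def beat_period_s(f1_hz, f2_hz):
    """Seconds between successive alignments; infinite if the rates are equal."""
    beat_hz = beat_frequency_hz(f1_hz, f2_hz)
    return math.inf if beat_hz == 0 else 1.0 / beat_hz


if __name__ == "__main__":
    timer_hz = frequency_hz(20.0)  # the 20ms upper limit expressed as a rate
    # Hypothetical grid frequencies around the nominal 50Hz.
    for mains_hz in (50.0, 49.9, 50.2):
        period_s = beat_period_s(timer_hz, mains_hz)
        print(f"mains {mains_hz}Hz: alignment repeats every {period_s:.1f}s")
```

Under these assumptions, two rates that differ by a tenth of a hertz line up only once every ten seconds or so, which is one way an interaction between them could look intermittent rather than constant.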

It wasn’t the end of the project, but it is the end of this case study.