Cray Research and Cray Inc Stories
Hold issue failure
A Cray analyst once told me about a strange problem reported by an oil company in Texas. The seismic job that they ran every day on their X-MP, which normally took many hours, had started running to completion in half the usual time. The customer confirmed that the results being produced were correct; the machine was simply running the job in half the time. Diagnostics were run and quickly confirmed a problem with the “hold issue” logic. When an instruction wants to access memory that a previous instruction has not yet finished writing data into, a “hold issue condition” prevents that instruction from issuing until the data has been written, after which the “hold” is released. In this case data was being read out of memory before the previous instruction had finished writing it. This should have caused the job to produce wrong answers, but by pure luck, because of the way the code was written, the data that the code would normally have waited for was not used in any calculations by the next instruction. I never understood why the code was written to read data that it would do nothing with, but it meant that in this case it didn’t matter that the instruction didn’t wait for the previous one to finish, so the code just pushed on without waiting. The site engineer quickly located the failing module, replaced it and confirmed that all was now running okay and that the customer’s job was no longer running in half the time. The analyst finished the story by saying that the customer had mixed feelings about this “fix”, as they liked having a system that ran their code in half the time.
The overheated X-MP system
I think everyone at Cray knows the story of the X-MP system somewhere in Saudi Arabia that suffered a double failure and ended up cooking itself. The first failure, which went undetected, was in the monitoring system that would normally have shut the X-MP down if any of the column temperatures got too high. Some time later the cooling system also failed, and the X-MP got hotter and hotter until it simply stopped working. An inspection showed that the system had become so hot that the solder used to mount chips on the circuit boards had melted, and that chips on the underside of circuit boards had fallen out of their sockets and were piled up on the circuit boards below. A SWAT team from Chippewa went to the site and started resoldering chips onto the modules, and after a heroic effort eventually got the entire system back up and running again.
Beryllium copper EMI finger strips and Y-MP 2E & 4E doors
The Y-MP 2E and 4E systems had two very annoying features, both related to the two large doors located on either side of the cabinet. The doors swung outwards from the centre of the cabinet to allow access to power supplies and cable connectors. During normal operation the doors were held closed by a metal locking bar that had a locating pin at the base and a hex head cap screw at the top.
The door opening procedure was …
1) While holding pressure against the locking bar, remove the hex head cap screw.
2) Lift the locking bar up and out so that the locating pin disengages.
3) Swing the doors open.
When the bar was removed the doors would usually spring out a few inches, because all around the perimeter on the inside of each door were beryllium copper EMI finger strips that were compressed when the doors were closed. The job of the strips was to seal the doors and prevent any electromagnetic interference (EMI) from leaking out.
Beryllium Copper EMI Finger strips
To close the doors you had to push them both in far enough to get the locking bar’s locating pin back into its hole, after which you could lever the locking bar up, which closed the doors enough for the hex head cap screw to engage. The trouble was that the beryllium copper EMI finger strips were so springy that it was difficult to close the doors far enough to get the locating pin into the hole. This was made even more difficult by the fact that the average computer room floor tile has a smooth vinyl surface that our work shoes would slide on. Usually you would have to ask someone to come and help lean on the doors while you struggled to secure the locking bar.
So, why were the doors designed this way? One of the instructors in Chippewa told me that, because so many engineers had complained about how difficult it was to close these doors, one day he went over to the development building and found the guy responsible for the design. The designer couldn’t understand why anyone would have difficulty getting the doors closed and proceeded to show him how simple it was. He stood in front of the doors, bent his knees forward and, pushing the doors in with his knees, inserted the locking bar and tightened the screw. That was when the instructor realized that the designer was wearing a pair of hiking boots with huge ripple soles. It turned out that this guy was a keen hiker and always wore hiking boots at work. The huge ripple soles of his boots gave him enormous traction, so he could easily push against the doors and compress the beryllium copper EMI finger strips. It had never occurred to him that Cray engineers in the field didn’t wear hiking boots to work like he did!
The second really annoying thing about this setup was that the beryllium copper EMI finger strips were very sharp. If you happened to reach too far around while you were struggling with the doors, they would slice into your fingers and give you painful cuts.
Milking machine parts used in Crays
Sometime during the 1990s a story appeared in a local Wisconsin newspaper claiming that Cray supercomputers were being made out of old milking machines! As with most of these Cray stories there was a germ of truth behind it, but no, they were not made out of old milking machines. When the Y-MP series of systems was being developed the decision was made to cool the modules by circulating Fluorinert through them. Once the Fluorinert had carried the heat away from the modules it passed into a heat exchanger, where it was cooled by chilled water supplied by the customer.
Initially Cray approached a hydraulics company to supply the plumbing components for the cooling system. These components needed to work under high pressure, not contaminate the Fluorinert with any oils or lubricants, and be easy to assemble and disassemble when needed. The vendor informed Cray that what they were asking for was a custom design and quoted a cost way beyond what anyone at Cray expected. Fortunately, at this point someone suggested that they visit the factory up the road that manufactured milking machines. It turns out that milk is handled under high pressure and all the fittings are made of food grade stainless steel to ensure that the milk does not get contaminated as it is collected. The fittings are all designed to clamp together with rubber seals, without any cutting or welding needed. Also, pretty much all the plumbing Cray needed could be ordered straight out of the manufacturer’s standard catalogue. As can be seen in the photograph below, the pipe fittings all had flanged ends that were held together by easily disconnected coupling clamps, and even the shutoff valves had large plastic handles so that no matter how much milk you spilled on your hands they were easy to grip and operate!
The only item that had to be sourced elsewhere was the Heat Exchange Unit (HEU). This consisted of a tank that had chilled water pumped through it while hot Fluorinert flowed through a series of tubes within the tank. The solution, I was told by one of the refrigeration engineers, was provided by a company that made speedboat radiators. They modified one of their designs to meet the required flow rate and heat transfer rate, and it worked quite well.
Russian agents in Chippewa
On my first ever day in Chippewa Falls to attend an X-MP training class, the very first person who came in to speak with us was a lady from security, who warned us all to be careful about whom we spoke with and what we told people about Cray’s business. She said that the FBI had tracked Russian KGB agents to the front door of the building we were sitting in and that there were certain individuals in Chippewa Falls whose allegiance did not lie with the US. She said that if we met someone in a bar and they showed too much interest in what we were doing at Cray, we should excuse ourselves and move away from them. She told us that every so often a foreigner would pass through the town looking around and showing too much interest in Cray, but Chippewa Falls being so small, everybody knew when someone new and suspicious showed up and avoided them. I heard that in the early days, when Seymour was working at the Hallie lab, the odd farmer would tell stories of someone turning up at their gate and asking in a thick accent where the computer factory was. The farmers would usually send them off on a wild goose chase in a direction away from the Hallie lab.
The missing Cray 2 boolean
In the training classrooms we had bookshelves full of folders containing copies of system boolean. We needed these in order to learn how to diagnose and repair modules. In our classroom was a complete set of X-MP boolean. Some of our homework assignments required us to take one or two of these folders back to our apartments to study. To remove boolean from the training building, a special form had to be filled out and filed at reception before the folder could leave the building, and the folder had to be returned the next day. The reason the forms and the folders were policed so closely, we were told, was that an entire set of Cray 2 boolean had once disappeared over a weekend and mysteriously reappeared on Monday morning. They never found out who took the boolean, or who returned it, although everyone assumed that somewhere that weekend a photocopier probably ran hot.
Getting bailed out of jail
After the security lady finished we were briefed on the facilities of the training building and then on life in Chippewa Falls and what to expect during our stay over the next couple of months. One of the warnings that I particularly remember being given was “if you get arrested by the police, do NOT ring your instructor to bail you out!” Apparently several of the visiting Cray engineers had gotten into trouble and been thrown in jail after a heavy night of drinking and then attempting to drive back to their apartment. This was something they said they did all the time, and they could not understand why the Chippewa Falls police had an issue with it! When the police asked them if they wanted to ring someone to bail them out, the only person they knew in Chippewa Falls was their instructor, so at three in the morning the instructor would get a phone call from some guy with a heavy accent asking the instructor to come down to the police station and bail him out. There was a memo written about this, and it was famous in Cray as every new hire received a copy of it. Below is a scan of my copy.