List of Archived Posts

1996 Newsgroup Postings

360/370
Cache
Hypothetical performance question
360/370
360/370
360 "channels" and "multiplexers"?
John Hartmann's Birthday Party
360 "channels" and "multiplexers"?
Why Do Mainframes Exist ???
Why Do Mainframes Exist ???
cics
Caches, (Random and LRU strategies)
Caches, (Random and LRU strategies)
IBM song
PC reliability
mainframe tcp/ip
tcp/ip
middle layer
middle layer
IBM 4381 (finger-check)
IBM 4381 (finger-check)
1401 series emulation still running?
IBM 1403 printer
Old IBM's
Old IBM's
old manuals
SGI O2 and Origin system announcements
System/360 Model 30
Mainframes & Unix
Mainframes & Unix
Mainframes & Unix
interdata and perkin/elmer
Mainframes & Unix
Mainframes & Unix
Mainframes & Unix
Mainframes & Unix
Mainframes & Unix (and TPF)
Mainframes & Unix (and TPF)
interdata & perkin/elmer machines
Mainframes & Unix
Mainframes & Unix
what happened to the 286?
IBM 4361 CPU technology

360/370

From: Lynn Wheeler <lynn@garlic.com>
Date: 1996/01/06
Subject: 360/370
Newsgroups: comp.arch
typically circuits/chip have significantly increased and the mis-match between processor cycle and memory latency has also increased. So there are two approaches to processor stall ... go to weak memory consistency or support concurrent threads.

assuming a cache machine & all other being equal ... concurrent threads would tend to increase the cache size requirements over weak memory consistency i.e. executing double the instructions per unit time ... isn't likely to double the cache line requirements ... while concurrent threads would tend to double the cache requirements (except possibly in degenerate case of the two threads executing the same code) ...

To some extent the workstation & PCs tend to single threaded uses ... chips for that market have optimized for single thread. Given that the volumes drive to commodity price ... then they become the chip of choice for other uses also. That could change if useful multi-threaded systems/applications appear that provide perceived/tangible added value to the end-user ... in which case the volume chip market might start supporting concurrent multi-threaded instruction streams.

Cache

From: lynn@netcom23.netcom.com
Date: Sun, 28 Jan 1996 18:58:49 GMT
Subject: re: Cache
Newsgroups: comp.arch
David Mayhew, ibm wrote:
Suppose you held cache size constant. As caches grow, they have a decreasing marginal return in hit rate. A 4K cache might yield a 60% hit ratio while a 16K cache might not be much better than 90%. Statistically 4 threads each of which only have 60% hit ratios, but that can "instantly" context switch amongst themselves will have a 96.5% hit ratio, compared to the original single thread of only 90%. As the disparity grows between processor speed and memory speed the multithreading seems to be advantageous.

multiple hardware threads on a chip should handle processor-stall masking associated with cache misses and better utilization of shared resource (cache size).

current point on the curve seems to be that there isn't a big over abundance of cache.

i know of one software vendor that (with detailed simulator) has convinced at least some hardware vendors that 1mbyte cache is bare minimum and 4mbyte L2 is much better.

also 2/4/etc way set associative also impacts the ability to utilize efficiently what is there. There have been some pathological mis-matches between software and bits indexing sets where some sets don't get utilized at all.
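
A minimal sketch of the kind of set-index mismatch described above. The cache geometry (64-byte lines, 128 sets, simple modulo indexing) and the helper names are assumptions chosen for illustration, not figures from the post. It counts how many sets a strided walk through memory actually touches.

```python
# Hypothetical cache geometry (assumed for illustration, not from the post).
LINE_SIZE = 64      # bytes per cache line
NUM_SETS = 128      # number of sets selected by the index bits

def set_index(addr: int) -> int:
    """Return the cache set an address maps to (simple modulo indexing)."""
    return (addr // LINE_SIZE) % NUM_SETS

def sets_touched(stride: int, accesses: int = 4096) -> int:
    """Count the distinct sets used by a strided walk through memory."""
    return len({set_index(i * stride) for i in range(accesses)})

for stride in (64, 256, 1024, 8192):
    used = sets_touched(stride)
    print(f"stride {stride:5d} bytes uses {used:3d} of {NUM_SETS} sets")
```

With these assumed numbers, a stride equal to the line size spreads over every set, while larger power-of-two strides fall on progressively fewer sets, leaving the rest of the cache idle no matter how many ways each set has.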

assuming a moderate amount of cache ... then there is some trade-offs with increase in cache miss (assuming short of saturating memory bus) for hardware multi-thread ... and processor-stall masking. However, it would seem at this point in the curve, weak-memory and out-of-order execution provides better trade-off in processor-stall masking and any cache size increase requirements.

one of the DEC people mentioned that optimizing for alpha cache starts to look like some of the techniques from the late 60s & early 70s for virtual memory optimization.

in late '60s & early '70s i could beat global LRU with full competition for all pages for all slots, super fast path replacement, fetch & context switch as well as dynamic adaptive based on behavior, latency, and saturation for selecting number and which mix to run together (and later on an adaptive switch between local/global LRU on per thread basis).

given that you were on the right point in the curve ... 4 programs could run faster in shared "4*X" than individually in fixed "X" each (i.e. the argument from the late 60s with denning and global/local lru ... if i remember right at sigops circa '79 somebody had rediscovered global clock for his phd thesis & it was being held up because somebody apparently thot that the local/global LRU argument still raged).

In any case, on the wrong side of the curve (& w/o dynamic adaptive controls) things could get very pathological in terms of mean-time between miss and bus saturation.

Hypothetical performance question

From: lynn@netcom19.netcom.com
Date: Sat, 10 Feb 1996 02:17:11 GMT
Subject: Re: Hypothetical performance question
Newsgroups: comp.arch
cache protection can help since LRU algorithms aren't necessarily good for everything. i did some 2-way SMP kernel support (quite a while ago) where some workloads ran better than twice as fast as uni ... because the effective MIP rate of both processors increased (because of decreased cache miss rate). The scenario involved some intelligent application processor affinity ... which because of non-shared cache resulted in cache affinity.

this is the counter argument to global LRU outperforming local LRU (i.e. partitioning the environment and forcing replacements from cache lines specific to an application).

circumstances were even more remarkable because the SMP hardware at the time slowed the uniprocessor cycle time by 15% to allow for cross-cache invalidation signals ... i.e. the improved cache hit rate, in order to be better than twice as fast, had to also compensate for the 15% reduced cycle time.
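
The arithmetic behind that remark can be sketched as follows. The one assumption is how to read "slowed by 15%": here each SMP processor is taken to run at 0.85 of the uniprocessor clock rate. `per_cycle_gain` is a hypothetical name for the improvement in work done per cycle that comes from the better cache hit rate.

```python
# Assumption: "slowed ... by 15%" is read as each SMP processor running at
# 0.85 of the uniprocessor clock rate.
CLOCK_FACTOR = 0.85
PROCESSORS = 2

def smp_speedup(per_cycle_gain: float) -> float:
    """Throughput of the 2-way SMP relative to a single uniprocessor."""
    return PROCESSORS * CLOCK_FACTOR * per_cycle_gain

# Per-cycle improvement needed just to reach twice the uniprocessor rate.
break_even = 1 / CLOCK_FACTOR

print(f"speedup with no cache benefit: {smp_speedup(1.0):.2f}x")
print(f"per-cycle gain needed for 2x:  {break_even:.3f}")
```

Under this reading, two processors with no cache benefit deliver about 1.7 times the uniprocessor, so each processor has to do roughly 18% more work per cycle before the pair exceeds twice as fast.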

360/370

From: Lynn Wheeler <lynn@garlic.com>
Date: 1996/02/11
Subject: 360/370
Newsgroups: alt.folklore.computers
i had access to the 3rd or 4th 4341 built ... and had a friend in the 4341 product group in endicott who couldn't get access to one for running performance and evaluation tests ... so I would run tests out on the west coast for them on their own product.

360/370

From: Lynn Wheeler <lynn@garlic.com>
Date: 1996/02/15
Subject: 360/370
Newsgroups: alt.folklore.computers
... except for i/o 360m67 with 768kbyte memory would be somewhere between 8086pc and 286pc.

360 "channels" and "multiplexers"?

From: lynn@garlic.com
Subject: Re: 360 "channels" and "multiplexers"?
Newsgroups: alt.folklore.computers
Date: 09 Mar 1996 10:40:23 -0800
360 typically had 1-6 selector channels and 1 multiplexor channel.

in some sense channel was independent processing unit which executed "channel programs" which was this funny language. basically each channel had the equivalent of multiple "instruction streams" that it would hardware multiplex thru a single instruction stream decoder.

the multiplexor channel could support up to 256 of these simultaneous instruction streams ... but was limited to attachment of relatively slow speed devices (i.e. data transfer typically <1mbit/sec).

selector channels supported higher speed devices (multi mbyte/sec data transfer) but were limited in the number of concurrent programs that they could support ... with a further restriction that only one I/O program at a time could have an active data transfer section.

there were other funny rules about how the channels shared the memory bus with each other and with the processor ... and various processor functions.

when i was an undergraduate there were four of us that put together a hardware 360 control unit (i think there is a write-up someplace blaming us for having originated the ibm oem control unit business). the 360 that we worked with was a m65 which had a 13.? microsecond timer in real memory ... basically every 13 microseconds the real storage timer location "tic'ed" requiring a storage update. One of our first hardware bugs was that we raised bus-in on the channel to be able to do data transfer to memory ... and then the processor died. turns out that if the memory bus was held for two timer tics (i.e. timer hardware was held off updating memory for the first timer tic ... and then a 2nd timer tic occurred still w/o updating memory) the processor would figure it was in a hardware error condition and stop.

--

John Hartmann's Birthday Party

From: lynn@garlic.com
Subject: John Hartmann's Birthday Party
Date: 6 Mar 1996
attached was posted to
http://vm.marist.edu/~piper/party/jph-12.html#wheeler

Lynn Wheeler

lynn@garlic.com

I've reproduced John's abstract from an advanced technology conference I held on March 4th and 5th, 1982 at San Jose Research (I think it was the first adtech conference since the one held in the mid-70s in POK where we presented a 16-way processor design ... and the 801 group presented 801 & CPr ... i.e. precursor to RIOS & PC/RT).

Also on the agenda was Cary Campbell talking about DataHub (a pc fileserver; Cary was traveling to Provo nearly once a week to work with a group of students on parts of the implementation ... which eventually spawned a new network PC company in Provo).

Bob Selinger gave a talk about 925 work station. When the original Sun people went to IBM Palo Alto Science Center to ask IBM about building the machine ... PASC brought in the 925 group, the Boca PC people and a YKT workstation group to review the proposal. All three groups claimed to be doing something better and so IBM declined (not) to produce the Sun workstation.

Barry Goldstein gave a talk on running CMS applications under MVS.

Rip Parmelee and Peter Capak gave talks on VM/370 support for Unix as well as the TSS/370 UNIX PRPQ for Bell Labs.

The CFP and the agenda were not IBM Confidential ... but some of the other presentations were.

I have fond memories of John taking us on a sight-seeing tour of the (Hamlet's) castle at Helsinger(?) and ferry ride across to Malmo (and the castle on the other side).

======================================================================

From the CFP ..
TOPICS

High level system programming language
Software development tools
Distributed software development
Migration of CP functions to virtual address spaces
Migration to non-370 architectures
370 simulators
Dedicated, end-user system

A possible project which would utilize extensions in all the before mentioned areas is a relatively inexpensive, relatively fast non-370 CPU. A VM kernel (many CP functions having been migrated to virtual address spaces) is coded in a high level system programming language. The kernel will initially be compiled into 370 code and executed using the 370 simulator. Eventually the kernel (and possibly some of the virtual address space code) will be recompiled into the native machine language and execute along side the 370 simulator (providing both native mode and 370 virtual machines).

Although a definite pilot project is envisioned, nearly all work will be beneficial to all current VM/370 environments.


=======================================================================
THE TOY PROGRAM

by: John Hartmann

ABSTRACT

The TOY Program for receiving and processing messages was originally run in a reserved storage area like the Yorktown nucleus extensions, but the instability of the CMS system in use led to a desire to have the TOY Program run "outside" CMS. Also, some kind of full screen support was felt to be desirable.

The objectives set for the current TOY Program were:

Investigate the use of CMS as a slave operating system for applications running mostly in a stand alone mode like RSCS and PVM.

Investigate the implementation of shared segments in CP.

Investigate the possibility of restarting CMS after an error without disturbing information outside VMSIZE.

Investigate the feasibility of using storage protection for such an operating system where the use of key zero is restricted to handling CMS data areas.

Create an environment for programs where the components of a large package can be plugged in and out dynamically, so that software changes can be applied to "outlying" components without requiring a re-initialization of the whole system.

To present RSCS conversational messages on a full screen and keep conversations separate.

To be able to log such conversations on a disk file.

Make the 1052 line mode console support more usable by allowing the user to delete lines from the virtual console listing selectively instead of the all-or-nothing-at-all approach used by CP.

The objectives have been met, and the TOY Program has proven to increase the productivity of my terminal sessions considerably.


360 "channels" and "multiplexers"?

From: lynn@garlic.com
Subject: Re: 360 "channels" and "multiplexers"?
Newsgroups: alt.folklore.computers
Date: 10 Mar 1996 08:54:18 -0800
byte mux & selector basically transferred one byte per bus in/out.

selector could start a channel program ... and when it received "channel end" status ... it could start another channel program, the first one may have completed the data transfer portion ... but still could be active (i.e. channel program running in the UCW, basically the equivalent of an i-stream in a multi-thread chip design).

byte mux could have several concurrent channel programs w/o the selector data transfer restriction ... i.e. multiples doing active data transfer

block mux introduced the concept of channel program disconnect/reconnect for active channel programs ... supporting multiple concurrent channel programs with pseudo active data transfer.

It also introduced the concept of RPS-miss for disk technology ... i.e. when it was introduced in the early '70s ... the disk drives didn't have buffers ... the disconnect/reconnect was used to disconnect the disk from the channel until specific record was in position for transfer and at that point attempt to reconnect. if the channel was busy at the reconnect ... there was a RPS-miss and the disk had to rotate a full revolution and try again.
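
A rough sketch of what an RPS-miss costs, under a simplifying assumption that is not in the post: each reconnect attempt finds the channel busy independently with the same probability. The function name is hypothetical, and the 16.7 ms default rotation time (a 3600 rpm drive) is an assumed figure for illustration. Under that model the number of misses before a successful reconnect is geometric.

```python
def expected_rps_delay_ms(channel_busy: float, rotation_ms: float = 16.7) -> float:
    """Mean extra delay from RPS misses.

    Assumes every reconnect attempt independently finds the channel busy
    with probability `channel_busy`; each miss costs one full revolution.
    """
    if not 0.0 <= channel_busy < 1.0:
        raise ValueError("channel_busy must be in [0, 1)")
    expected_misses = channel_busy / (1.0 - channel_busy)
    return expected_misses * rotation_ms

for busy in (0.1, 0.3, 0.5, 0.7):
    print(f"channel {busy:.0%} busy: {expected_rps_delay_ms(busy):5.1f} ms extra per transfer")
```

The delay grows much faster than channel utilization does, which is why the effect matters once several disks share a busy channel.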

around the '82 time-frame when I wrote the paper that system performance of disk technology had slowed down by a factor of 5-10 over a 15 year period ... there were a lot of people that got red under the collar. I had just done a rough approx. based on processor and memory increasing by a factor of 50 in thruput during the period while disk only increased by a factor of 10 or less (i.e. the relative system thruput therefore had declined by a factor of at least five since the disk performance curve didn't track the processor/memory curve). When they did a more detailed analysis ... including RPS-miss in the equation ... it indicated that relative system thruput of disks had declined by an even larger factor. In the '60s there were system configurations that had E/B ratios with multiple bytes transferred for every instruction executed. Now E/B ratios tend to be discussed in instructions per bit ... not instructions per byte.
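
The rough approximation above is just a ratio of the two growth factors. A small sketch using the figures from the post (the function name is hypothetical):

```python
def relative_disk_decline(cpu_gain: float, disk_gain: float) -> float:
    """Factor by which disk thruput fell behind processor/memory thruput."""
    return cpu_gain / disk_gain

# Processor/memory up by 50; disk up by "10 or less" over the same period.
for disk_gain in (10, 5):
    decline = relative_disk_decline(50, disk_gain)
    print(f"disk gain {disk_gain:2d}x -> relative decline of {decline:.0f}x")
```

A disk gain of 10 gives the "at least five" figure, and a gain of 5 gives the upper end of the 5-10 range.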

data streaming was introduced initially because the mip rate of the i/o processors declined ... data streaming allowed for eight bytes to be transferred per bus in/out ... rather than a single byte ... effectively the data path got wider ... but the control path slowed down (needed only 1/8th the i/o processor path length for every byte transferred). for latency sensitive operations it was actually possible to demonstrate the reduction in thruput (even tho the data rate went up).

later on data streaming was able to show additional benefits for things like extended-length (escon) channels where signal propagation was becoming a factor (as opposed to processing latency ... i.e. not required to do a synchronous end-to-end bus in/out operation for every byte transferred)

--
Lynn & Anne Wheeler

Why Do Mainframes Exist ???

From: lynn@garlic.com
Subject: Re: Why Do Mainframes Exist ???
Newsgroups: comp.arch
Date: 25 Mar 1996 09:11:11 -0800
in the commercial mainframe environment there is a lot of 7x24 procedural and automation infrastructure that has somewhat grown up thru trial and error over the past 30 years ... only some of it represented by hardware technology.

there is also a fundamental difference in a basically "interactive" environment design point (where default tends to be having a person handle situations) and "batch" environment design point (where default tends to be automated applications handling situations and the corresponding instrumented infrastructure to support it).

a trivial example of infrastructure/design-point is what happens when a sort program runs out of temp space ... current os/mvs (as well as os/mft circa 1968) generates a specific error code for just about every condition ... and frequently there is an automated infrastructure that can handle each of the return codes and can take corrective/recovery action.

for how many of the newer operating systems is it possible to create a batch-procedure that based on sort utility return code ... recognize out of space condition and take automated corrective action?
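
A minimal sketch of the batch-procedure idea: a job step reports a specific return code and the surrounding procedure maps each known code to an automated recovery action before retrying. The return-code values, function names, and the simulated sort step are all hypothetical, made up for this illustration; they are not taken from os/mvs or any real sort utility.

```python
RC_OK = 0
RC_OUT_OF_SPACE = 16   # hypothetical code, chosen only for this sketch

def run_with_recovery(step, recovery_actions, max_attempts=3):
    """Run a job step; on a known return code, apply its recovery action and retry."""
    rc = step()
    for _ in range(max_attempts - 1):
        if rc == RC_OK or rc not in recovery_actions:
            break                   # success, or a condition nobody automated
        recovery_actions[rc]()      # e.g. free or allocate more temp space
        rc = step()
    return rc

# Simulated sort step that needs two units of temp space to complete.
state = {"temp_space": 0}

def sort_step():
    return RC_OK if state["temp_space"] >= 2 else RC_OUT_OF_SPACE

def add_temp_space():
    state["temp_space"] += 1

final_rc = run_with_recovery(sort_step, {RC_OUT_OF_SPACE: add_temp_space})
print("final return code:", final_rc)
```

The point of the design is that the recovery table lives outside the application, so operations staff can add handlers for new conditions without anyone watching the job run.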

there are some number of current commercial situations where the cost of a daily application failing once a year to not complete on schedule exceeds the cost of the hardware. more common is where the delta people costs that are involved in attempting to utilize an "interactive" operating system for a "batch" environment exceeds the cost of the hardware.

--
Lynn & Anne Wheeler

Why Do Mainframes Exist ???

From: lynn@garlic.com
Subject: Re: Why Do Mainframes Exist ???
Date: 1996/03/26
Newsgroups: comp.arch
not shared memory ... these are sorts that not only exceed virtual memory but also potentially disk ... and/or in some situations multi-volume tape. little things like corporate payrolls, stock-exchange settlement, etc.

how 'bout when /var/tmp fills up ... sort fails ... and then you try and backtrack to find out why.

currently mvs, sms and some other stuff intercepts the space fill condition ... temporarily suspends things while space can be re-arranged so that enuf is available .. and then resume operation for completion (pro-active prevention). MVS JCL has the option of pre-allocating space ... but extents are possible. sophisticated sort applications can do their own JFCB operations ... but when all available space on all available disks fill up ... what else is there to do?

none of the primitive stuff of mapping files to specific filesystems (and frequently to specific disk drives).

i don't see anything wrong with translating batch-paradigm processing to distributed commodity priced hardware ... at some abstraction hardware is orthogonal to the system-paradigm issue. however, frequently the case is commodity priced hardware operated with systems that started out with interactive design-point. there is significant infrastructure difference between interactive and batch design-point.

infrastructure extends past system structure issues. for instance, is by default your system configured and operated for disaster recovery ... i.e. if the bldg. containing your system collapsed tonight ... would all applications be running on schedule tomorrow (including current/up-to-date data ... i.e. not is it possible to provide for such a contingency, but as a matter of default operation, disaster/recovery is provided for).

if you are interested in shared memory ... SCI & various directory cache consistency. also there is work that looks more like the precursor proposal before SCI went thru the standards process (i.e. joke about starting with race horse requirements and producing a camel).

--
Lynn & Anne Wheeler

cics

From: lynn@garlic.com
Subject: cics
Date: 1996/03/31
Newsgroups: alt.folklore.computers
university i was at ... was beta-test for prerelease cics product in fall of '69. first bug i remember shooting in cics was bdam open ... code as shipped only supported a specific set of bdam ... and we were using something different (developing library project on grant from navy research).

in some sense, cics was similar to watfor ... it wasn't that facilities weren't there to do the job ... but the overhead was too expensive, large part of cics was lightweight thread multitasker ... with lightweight file i/o (open files at start-up ... and pretty much leave them open). result was significant reduction in processing overhead.

about the same time i also put 2741 & tty support into HASP ... in order to get interactive job support.

there were also four of us about the same time that were credited with building 2702-replacement control unit (blamed for originating the ibm oem control unit business). part of the reason was that 2702 didn't work as originally designed. among other things we implemented dynamic type recognition (at least 2741, 1052, & tty) as well as dynamic speed recognition (old hat these days with modern modems).

we had also been running cp/67 since jan. of 1968 for interactive.

--
Lynn & Anne Wheeler

Caches, (Random and LRU strategies)

From: lynn@garlic.com
Subject: Re: Caches, (Random and LRU strategies)
Date: 1996/04/26
Newsgroups: comp.arch
i did clock in '68/'69 ... but in '71 came up with version of >1bit (i.e. instead of clear does a shift ... assuming shift right, then hardware always sets the left most bit, clock does a shift right one bit ... with left most bit being zero'ed; only take entry if all bits are zero) and two-handed clock that would beat straight LRU.
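
A sketch of the multi-bit clock variation as described in the parenthetical, not the original implementation. The class and method names, the 4-bit default, and the choice to test an entry for all-zero before shifting it are assumptions of this sketch. A reference sets the leftmost history bit (standing in for the hardware), and the clock hand shifts each history right by one bit as it passes, taking an entry only when all of its bits are zero.

```python
class MultiBitClock:
    """Clock replacement with a k-bit reference history per frame (a sketch)."""

    def __init__(self, frames: int, bits: int = 4):
        self.top = 1 << (bits - 1)      # leftmost bit of the history
        self.pages = [None] * frames    # page held by each frame
        self.history = [0] * frames     # k-bit reference history per frame
        self.where = {}                 # page -> frame index
        self.hand = 0

    def reference(self, page) -> bool:
        """Touch a page; return True on a hit, False on a fault."""
        if page in self.where:
            # Stand-in for the hardware setting the leftmost bit on reference.
            self.history[self.where[page]] |= self.top
            return True
        frame = self._select_victim()
        old = self.pages[frame]
        if old is not None:
            del self.where[old]
        self.pages[frame] = page
        self.where[page] = frame
        self.history[frame] = self.top
        return False

    def _select_victim(self) -> int:
        """Advance the hand, aging entries, until one has all bits zero."""
        while True:
            i = self.hand
            self.hand = (self.hand + 1) % len(self.pages)
            if self.history[i] == 0:
                return i
            self.history[i] >>= 1       # shift right; leftmost bit becomes zero

clock = MultiBitClock(frames=2, bits=2)
results = [clock.reference(p) for p in ("a", "b", "a", "c")]
print(results)
```

Compared with one-bit clock, a recently referenced entry survives several passes of the hand instead of one, so the history bits give a coarse ordering of how recently each entry was used.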

standard clock tended to be 10-15% worse than LRU ... variation tended to be 10-15% better than LRU. detailed analysis was that it tended to approximate LRU ... when locality was good ... but tended to approximate random ... when locality was poor ... or when patterns started to look like MRU (i.e. program requirements slightly larger than cache size and straight LRU would constantly replace the block that was going to be needed next, whereas random would seldom be selecting block needed next).

--
Lynn & Anne Wheeler

Caches, (Random and LRU strategies)

From: lynn@garlic.com
Subject: Re: Caches, (Random and LRU strategies)
Date: 1996/04/26
Newsgroups: comp.arch
(pseudo) random solves nasty problems with pathological behavior when programs don't operate according to assumed behavior (i.e. LRU algorithms are done based on implied assumption that the things referenced most recently will have the highest probability of being referenced in the near future).

presumably predictability refers to some series of events running at the microscopic level ... generalized program behavior across wide-range of applications tends to have more uniform macroscopic behavior with random (although not necessarily optimal).
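
One pathological case for LRU can be sketched with a program that loops over one more block than the cache holds. The cache size, trace length, and function names are assumptions for illustration; the random policy uses a fixed seed only to make the run repeatable.

```python
import random
from collections import OrderedDict

def lru_hits(trace, slots):
    """Count hits for strict LRU replacement."""
    cache = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)        # mark as most recently used
        else:
            if len(cache) >= slots:
                cache.popitem(last=False)   # evict least recently used
            cache[block] = True
    return hits

def random_hits(trace, slots, seed=0):
    """Count hits when the victim slot is picked pseudo-randomly."""
    rng = random.Random(seed)
    cache, resident, hits = [], set(), 0
    for block in trace:
        if block in resident:
            hits += 1
            continue
        if len(cache) >= slots:
            victim = rng.randrange(slots)
            resident.discard(cache[victim])
            cache[victim] = block
        else:
            cache.append(block)
        resident.add(block)
    return hits

slots = 8
trace = [i % (slots + 1) for i in range(9000)]   # loop over slots+1 blocks
print("LRU hits:   ", lru_hits(trace, slots))
print("random hits:", random_hits(trace, slots))
```

On this trace LRU always evicts exactly the block that is needed next and never gets a hit, while pseudo-random replacement usually picks some other block and so keeps most of the loop resident.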

--
Lynn & Anne Wheeler

IBM song

From: lynn@garlic.com (Lynn Wheeler)
Subject: Re: IBM song
Date: 1996/06/15
Newsgroups: alt.folklore.computers
'31 book had 45 pages and 106 songs ... but i like the hasp songbook better.

denver '7? share .. i believe one of the people that did the song bony fingers ... was relative of one of the jes committee members and he got her to do a parody version about jes ..although it isn't listed in my version of the book. Thursday nights at share used to be especially hard, trying to help the head of share not have to lug any cases home on the plane the next day.

(I guess I shouldn't say too much bad about jes ... I worked on HASP in school; implementing crje support in HASP-III/mvt18 with CMS edit syntax and drivers for 2741, 1052, and tty ... borrowed from my work on cp/67 ... but my wife worked in the jes2 group for awhile in g'burg).


If I had a HASP, I'd be SPOOL.in' in the morning,
I'd be SPOOL.in' in the evening, all over this land.
I'd SPOOL all the SYSIN, I'd SPOOL all the SYSOUT,
I'd SPOOL the jobs between
the remotes and the local
Aaa-aahh, all over this land.

--
Lynn & Anne Wheeler

PC reliability

From: lynn@garlic.com (Lynn Wheeler)
Subject: PC reliability
Date: 1996/07/08
Newsgroups: alt.folklore.computers
in fact it looks as if pc reliability may be heading in the opposite direction; instead of having PC parity memory (9bits, 1parity bit for every 8data bits, can detect one bit errors) ... direction for pcs are things like EDO (can't detect any errors).
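
A small sketch of what the ninth bit buys in parity memory. The function names and the use of even parity are assumptions of this sketch. One parity bit per eight data bits detects any single-bit error, but a two-bit error leaves the parity unchanged and goes unnoticed.

```python
def parity_bit(byte: int) -> int:
    """Even-parity bit for 8 data bits: 1 if the count of set bits is odd."""
    return bin(byte & 0xFF).count("1") % 2

def check(data: int, parity: int) -> bool:
    """Return True if the stored parity still matches the data bits."""
    return parity_bit(data) == parity

data = 0b10110010
stored_parity = parity_bit(data)

one_bit_error = data ^ 0b00000001    # flip a single bit
two_bit_error = data ^ 0b00000011    # flip two bits

print("clean word ok:     ", check(data, stored_parity))
print("1-bit error caught:", not check(one_bit_error, stored_parity))
print("2-bit error caught:", not check(two_bit_error, stored_parity))
```

Memory without the parity bit has nothing to compare against, so neither kind of error would be reported.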

workstations now frequently have ECC (10bit, 2 error correcting bits for every eight data, detect all two bit errors, correct any single bit errors)

mainframes have things like 80bit ecc (16bit correcting for 64bit data, same ratio as 10bit ecc, but can detect combinations of 16bit errors in 64bit data, can correct 15bit errors).

--
Lynn & Anne Wheeler

mainframe tcp/ip

From: lynn@garlic.com (Lynn Wheeler)
Subject: mainframe tcp/ip
Date: 1996/07/11
Newsgroups: alt.folklore.computers
i wrote hyperchannel ibm mainframe drivers in '81 allowing the IMS group to relocate several hundred people and all their local channel devices several miles from the center. interesting thing was that remoting 327x local channel attach controllers ... improved overall system thruput by 15% (mostly by reducing channel busy; w/o degrading response).

i also did rfc1044 for tcp/ip product ... while rest of the product limped along at 44kbyte/sec ... was doing transfer between 4381 and cray at channel speeds.

--
Lynn & Anne Wheeler

tcp/ip

From: lynn@garlic.com (Lynn Wheeler)
Subject: tcp/ip
Date: 1996/07/12
Newsgroups: alt.folklore.computers
more recently ... late '80s ... it took me several months to get the engineer that did 6000 version of escon (sla) to stop working on 800mbit version and come to the FCS meetings. however we got sidetracked as the attached indicates

also for index to rfcs (i.e. rfc1044 reference) see

http://www.garlic.com/~lynn/rfcietf.htm

.... a posting i did last year to comp.arch.storage
grump ....

a large number of the 9333 systems were for ha/cmp and we heavily backed the project. however we were also doing cluster scaleup using fcs.

during san fran usenix, jan. 1992, Hester, my wife, and I had a meeting with Ellison, Baker, Shaw, and Puri in Ellison's conference room. We proposed having 16-way fcs pilot clusters in customer shops with parallel oracle by summer of 1992 ... upgraded to 128-way by ye92.

unfortunately the kingston group were out trolling for technology and found cluster scaleup the very next week. in something like 10 weeks, the project was transferred to kingston, announced as a supercomputer, and we were instructed to not work on anything involving more than 4 processors.

in the elephant's dance to do the supercomputer subset of cluster scaleup ... the device interconnect strategy got obliterated. so instead of 9334->interoperable family(1/8 fcs on serial copper, 1/4 speed fcs on fiber, & full speed fcs on fiber) ... in the resulting confusion, 9334->ssa.

while ssa is quite good technology (especially compared to scsi), the interoperable family strategy is better.

various posts mentioning our ha/cmp product:
http://www.garlic.com/~lynn/subtopic.html#hacmp

various posts mentioning original sql/relational implementation
http://www.garlic.com/~lynn/submain.html#systemr

--
Lynn & Anne Wheeler

middle layer

From: lynn@quake.garlic.com (Anne & Lynn Wheeler)
Subject: middle layer
Date: 1996/07/14
Newsgroups: comp.infosystems.www.authoring.cgi,comp.infosystems.www.servers.unix,comp.databases,comp.client-Server
i have a foil presentation laying around that i put together in dec '88 describing what i called 'middle layer' ... which i presented to a number of companies. large focus was operation, management, and coordination of distributed environment i.e. legacy systems didn't know how to ... and the desktops didn't either. departmental servers started to fit into middle layer ... but didn't have the ops c&c characteristics.

presentation included aggregation ... lots of LAN description with peer-to-peer operation seemed to also imply symmetrical traffic ... by definition lots of concentration points exhibit asymmetrical traffic flows (i.e. concentration point sees the aggregate bandwidth of the desktops).

operation of concentration points imply requirements that start to look a lot like 7x24 (even if coordination achieves availability via replication ... which also needs to be managed and operated). there were also thruput efficiencies where dedicated box's provided concentration/batching for backends (especially legacy backends).

one of the most interesting (violent) reactions was that single t/r segment should support unlimited large number of clients ... since presumably no single client required more than t/r worth of bandwidth. one marketing group wanted my head after presenting such heresy to a particularly large electronics company. because of niche in the technology & market cycle ... concentrators and short e-net segments with aggregate bandwidth delivery hundreds of times single-segment was less expensive than single-segment t/r (with no concentrators).

concentrators then provided focal point for added-value function/feature delivery opportunities.

strong reactions regarding middle layer even being responsible for functions and/or various types of added-value features and protocol translations (disintermediation?). other aspect was that the desktops were on generation & deployment cycles significantly shorter than backend/legacy. desktop software delivery/maint. still wasn't solved ... so middle layer provided a compromise between new feature delivery and support overhead.

--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com | finger lynn@garlic.com for public key

middle layer

From: lynn@garlic.com (Lynn Wheeler)
Subject: middle layer
Date: 1996/07/14
Newsgroups: comp.infosystems.www.authoring.cgi,comp.infosystems.www.servers.unix,comp.databases,comp.client-Server
oh yes, some of the characteristics of the economic niche in '88 creating middle layer:

... 10baset enet cards were going for $300 or less, station-to-station burst was running at 95+% media ... free-for-all was running at 85+% media for clusters of 20-40 stations.

... 16mbit t/r cards with effective aggregate media thruput of 50%-60% media were going for $1000

i had done the tcp/ip mainframe product support for rfc1044 ... and was getting channel thruput at the mainframe ... while the product using other means was around <50mbytes/sec thruput (>order magnitude difference).

For reasonable corporate configuration of >300 stations ... could put in multiple small 10baseT configurations ... necessary backbone routers with direct channel attach to the mainframe and dedicated enet &/or fddi to middle layer servers ... in addition to some of the middle layer servers for the same cost as single segment 16mbit t/r (adapter card cost spread at 300 was >$210,000)

small enet configs operating at 9mbit/sec thruput (aggregated by backbone routers with direct channel attach and dedicated server lan attach, avg bandwidth per station >300kbit/sec). or 300 stations sharing single t/r operating at 9mbit/sec thruput (avg. bandwidth per station about 30kbit/sec).

cost per avg. bit/sec to the desktop then was also orders of magnitude difference.
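
The per-station figures above can be checked with a little arithmetic. The following is only a back-of-envelope sketch: the card prices, the 300 stations and the 9mbit/sec delivered thruput come from the post, while the cluster size of 30 (midpoint of the 20-40 range) and all the names are assumptions for illustration. It counts adapter cards only, not the routers and servers that the cost spread paid for.

```python
# Back-of-envelope check of the figures quoted in the post.
# Assumed for illustration: 30 stations per ethernet cluster (the post says
# 20-40) and ~9 mbit/sec delivered on each small ethernet segment.

STATIONS = 300
ENET_CARD_USD = 300        # 10baseT card price from the post
TR_CARD_USD = 1000         # 16mbit t/r card price from the post
DELIVERED_BPS = 9_000_000  # ~9 mbit/sec effective thruput in both cases
CLUSTER_SIZE = 30          # assumption: midpoint of the 20-40 range


def per_station_bps(delivered_bps, stations_sharing):
    """Average bandwidth each station sees on a shared segment."""
    return delivered_bps / stations_sharing


def card_cost_per_kbit(card_usd, station_bps):
    """Adapter-card dollars per kbit/sec delivered to the desktop."""
    return card_usd / (station_bps / 1000)


card_cost_spread = STATIONS * (TR_CARD_USD - ENET_CARD_USD)
enet_bps = per_station_bps(DELIVERED_BPS, CLUSTER_SIZE)
tr_bps = per_station_bps(DELIVERED_BPS, STATIONS)
enet_cost = card_cost_per_kbit(ENET_CARD_USD, enet_bps)
tr_cost = card_cost_per_kbit(TR_CARD_USD, tr_bps)

print(f"card cost spread: ${card_cost_spread:,}")
print(f"enet: {enet_bps / 1000:.0f} kbit/sec per station, ${enet_cost:.2f} per kbit/sec")
print(f"t/r:  {tr_bps / 1000:.0f} kbit/sec per station, ${tr_cost:.2f} per kbit/sec")
```

Under these assumptions the card cost per delivered kbit/sec differs by a factor of about 33 on adapter cards alone, and the per-station bandwidth by a factor of 10.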

feature/function/ops management, maintenance, etc ... at the middle layer servers was also significantly less than doing at each of the individual desktops.

in any case, lots of corporate dollars easily justified for middle layer infrastructure.

--
Lynn & Anne Wheeler

IBM 4381 (finger-check)

From: Lynn Wheeler 
Subject: Re: IBM 4381 (finger-check)
Newsgroups: alt.folklore.computers
Date: 24 Jul 1996 09:07:38 -0700
finger check with directory reference ... correct version is:

ftp.netcom.com/pub/ly/lynn/
MOVED to:
http://www.garlic.com/~lynn/

i usually had early access to several models of 43xx and 370s. in the late '70s the disk development lab and product test/assurance lab were organized in big machine rooms with the disk & controller products in "test cells" (limited access campus, controlled access building, controlled access machine room, within each machine room, numerous heavy mesh steel cages with combination locks on the doors ... each cage was "test cell" containing product under development &/or test).

At the time attempting to operate a single "test cell" connecting to a mainframe running a standard product operating system ... would typically crash the operating system within 10-15 minutes. the result was that mainframe tests were run with special "stand-alone" dedicated testing programs ... on a scheduled basis (one test cell at a time). To help the situation, I did a bullet-proof rewrite of the operating system's I/O support and failure routines ... so that 6-10 test cells could be operated concurrently running under mainframe operating system. I also wrote up internal document describing some of the failure modes (for which i got the 2nd line manager of RAS at meyers corner after me ... not for fixing the problems, but documenting that they existed).

In any case, in return for creating and supporting such a beast ... I got quite a bit of latitude in being able to use the machines as long as I didn't impact development and test (i.e. the "heavy" i/o testing typically resulted in less than 5% cpu busy). The first one or two mainframe machines produced typically went to CPU product test ... then next couple machines went to disk product test ... as a result I frequently had much better access to the machines than some of my cpu engineering friends back at the cpu manufacturing sites.

the downside was that whenever something wasn't working correctly ... i got called. the first weekend they replaced 3830 running 16 drives with 3880 ... i got called. They suspected that I had done some change in the operating system that caused severe performance degradation. Turned out to be idiosyncrasy in the 3880 which hadn't been tested in multiple drive configuration up until then. Had to go in and identify the 3880 internal hardware bottleneck and get it fixed before first customer ship. spent a lot of time browbeating engineers when they would do something that violated performance, cpu, and/or channel architecture (self-defense ... because I would get the initial blame supposedly as having done something to the software).

--
Anne & Lynn Wheeler lynn@{netcom|garlic}.com finger for public key

IBM 4381 (finger-check)

From: Lynn Wheeler 
Subject: Re: IBM 4381 (finger-check)
Newsgroups: alt.folklore.computers
Date: 24 Jul 1996 09:31:38 -0700
... i also somewhat aggravated the performance situation ... when i rewrote the mainframe i/o supervisor, i also redid the channel balancing, channel redrive and disk redrive algorithms ... which tended to come back and hit a controller much faster and harder (increasing thruput, but it exacerbated this early 3880 multi-drive bug ... which had to get fixed before first customer ship).

they also got my page-mapped filesystem ... which I had done in the early '70s (at 545 tech sq) but never shipped externally as a product. the up-side to the page-mapped filesystem was that it was totally transparent to the users and applications and nobody knew anything about it (except the performance thruput was up to 3* that of standard filesystem). the down-side to the page-mapped filesystem was that almost nobody knew about it (it ran on a large number of machines ... but since it was so transparent ... it apparently never occurred to anybody to make it a product).

--
Anne & Lynn Wheeler lynn@{netcom|garlic}.com finger for public key

1401 series emulation still running?

From: Lynn Wheeler <lynn@garlic.com>
Subject: Re: 1401 series emulation still running?
Newsgroups: comp.society.folklore
Date: 06 Sep 1996 09:12:23 -0700
there is actually this story regarding testimony at some anti-trust trial. one of the competitors that got out of the business testified at the trial that EVERYBODY in the computing business in the late '50s realized that the SINGLE MOST IMPORTANT CRITERION TO BE SUCCESSFUL in the computer business was to have a single compatible machine architecture across the product line. He testified that everybody tried to pull it off ... but only IBM succeeded. The other vendors had problems controlling local plant managers ... the manager in charge of the low-end wanted to tweak things for their technology, the manager in charge of the mid-range wanted to tweak things for the mid-range technology and the manager for the high-end wanted to tweak things for the high-end. Only ibm hdqtrs was able to control all their plant managers and force them to implement the same, single architecture. The hdqtrs of all the other companies failed in one way or another.

Presumably the justification was this was a period of rapid business expansion. A business would write an application and it would become critical to the operation of the business. As their business expanded they needed a bigger machine to run the application. Could they simply bring in a larger machine ... and run the same application or would they have to wait while the application (and/or some other aspect of the environment) was ported. The cost of machine/hardware became far less than lost business associated with delays associated with porting/converting software.

Note the perception of this is as important ... or more important than the reality.

The 360 1401 and 7094 hardware emulators were a concession to the transition from the old environment to the new.

--
Lynn & Anne Wheeler

IBM 1403 printer

From: Lynn Wheeler <lynn@garlic.com>
Subject: Re: IBM 1403 printer
Newsgroups: comp.society.folklore
Date: 18 Sep 1996 08:29:34 -0700
1403N1 ... faster (1100 lpm), super noise insulation in the cover (louder noise because of faster operation), and mechanical cover lift (because of the weight of the insulation)

had interesting feature that cover would automatically lift (paper jam, etc) ... spilling everything placed on top.

--
Lynn & Anne Wheeler | lynn@garlic.com, lynn@netcom.com | finger for pgp key

Old IBM's

From: Lynn Wheeler <lynn@garlic.com>
Subject: Re: Old IBM's
Newsgroups: alt.folklore.computers
Date: 19 Sep 1996 09:22:23 -0700
given that many s/360 were implemented in some m'code or another it is a 360 ... except for the fact that I/O and some other supervisor instructions weren't implemented.

there was also a bit-slice implementation that one of the labs did that was sufficient 360 to execute FORT-H binaries. they were placed at data gathering stations along the accelerator for doing initial data reduction.

--
Lynn & Anne Wheeler | lynn@garlic.com, lynn@netcom.com | finger for pgp key

Old IBM's

From: Lynn Wheeler <lynn@garlic.com>
Subject: Re: Old IBM's
Newsgroups: alt.folklore.computers
Date: 20 Sep 1996 20:35:29 -0700
the 360/370 princ. of ops ... was a (s?)gml subset/extract of the "red book" which specified what was required for something to be 360/370 from standpoint of operating system, application programs, software. most things (including i/o instructions) were required to be present including specification on how they worked. some things were required ... but some details of how they worked were model dependent (like diag/83 instruction).

xt/370 & at/370 executed highly modified version of vm/370 because the implementation was non-conformant in many (supervisor instruction) areas with regard to the red book (i.e. various processor lines had all sorts of microprocessors as core ... so m68000 core wasn't the issue ... issue was that it was only partial implementation).

typically the lower end of the 370 range tended to be various m'processor cores with 10:1 instruction execution ratio between m'processor (vertical) instruction and 370 instruction.

higher end of the 370 line tended to be horizontal (wide) instruction machines ... where measure was cycles per instruction rather than (vertical) m'code instructions per 370 instruction. 165 avg. about 2.1 machine cycles per 370 instruction. one of the enhancements going from 165->168 was avg.cycle/instruction was reduced from 2.1 to about 1.6.

--
Lynn & Anne Wheeler | lynn@garlic.com, lynn@netcom.com | finger for pgp key

old manuals

From: Lynn Wheeler <lynn@garlic.com>
Subject: Re: old manuals
Newsgroups: comp.society.folklore
Date: 07 Oct 1996 08:24:18 -0700
i got into various trouble by referring to FS as the inmates in charge of the institution ... reference to cult movie that had been playing down the street (from 545 tech sq) in central sq for over a decade. FS seemed to include every blue-sky concept that had ever been dreamed up. one thing that killed it was projection that FS machine built of hardware faster than currently existed (370/195) would have application thruput less than 370/145. simple hardware add instruction could go thru five levels of memory access for each parameter to determine what kind of object it was and how the add should be executed (length, type, integer, float, character, etc).

course i also thot that what i had running & distributing was better than what they were promising in the resource management chapter (8?, does somebody remember); little things like running avg utilization for concepts like dynamic adaptive and fair share rather than just previous delta, total to-date, pretty fixed dispatch ordering.

one might consider John's effort on RISC in the mid-70s to be at least partial FS-backlash

with respect to FS manuals ... they were softcopy and kept online under special access procedures. one weekend when i had test time in machine room where a copy was kept ... they claimed proudly that only authorized people had access to the documents and even i couldn't get at them. took me about five minutes at the console and demonstrated unlimited access to everything on the machine. i suggested possibly development of an encrypted filesystem so that even if i had physical access to all the components ... i wouldn't be able to recover the stuff.

as to how little things change ... this was nearly 25 years ago and the manuals ... as well as things like the 370 "red book" (i.e. complete 370 architecture manual from which the 370 principles of ops manual was produced as subset) ... were gml (precursor to sgml ... from which html and lots of web stuff are derived).

how many people have been doing gml for over 25 years?

--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com | finger for pgp key

SGI O2 and Origin system announcements

From: Lynn Wheeler <lynn@garlic.com>
Subject: Re: SGI O2 and Origin system announcements
Newsgroups: comp.arch,comp.sys.sgi.misc
Date: 09 Oct 1996 07:45:21 -0700
does anybody know whether or not interconnect is SCI (ala convex, sequent, dg, etc) for distributed shared memory???

2way seems to be two processors on the same board ... 64-way SCI would yield 128 processor configuration.

convex(hp) with two processors on the same board ... & 64-way SCI yield 128 processor configuration

sequent using intel's quad-board and 64-way SCI achieves 256 processor config.

--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com | finger for pgp key

System/360 Model 30

From: Lynn Wheeler <lynn@garlic.com>
Subject: Re: System/360 Model 30
Newsgroups: alt.folklore.computers
Date: 03 Nov 1996 07:45:05 +0000
some of the early 360 documents predate shipment of the actual hardware &/or software products. there tended to be minimum memory requirement inflation as software approached FCS (first customer ship).

i've got a 360 document that describes the models 60 and 62 ... with the 62 available in 1cpu, 2cpu, and 4cpu configurations. What actually shipped were the models 65 and 67.

standard 67 only came in one and two processor configuration. the two processor configuration had channel controller and dual-ported memory bus. the "channel controller" was able to route all i/o bus/channels and memory banks ... including being able to partition the machine into two uniprocessor units. the channel controller was configured via switches on the front panel (later two processor 370 models were actually a step back from the 67).

i know of one custom triplex 67 built which had a much more sophisticated channel controller ... including the ability to reset all the configuration switches under program control.

original 67 documentation described entry level memory configuration of 256k. As TSS got closer and closer to ship ... entry level memory requirements grew to 512k ... and ibm had to retrofit 67s in customer locations.

--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com | finger for pgp key

Mainframes & Unix

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Mainframes & Unix
Newsgroups: comp.arch
Date: 05 Nov 1996 21:43:40 +0000
some of the "bulletproof" issues aren't the obvious RAS (reliability, availability, serviceability) ... i.e. lots of work to eliminate bugs, lots of work done to handle situations programmatically when there were failures/problems, and lots of stuff put into the system for visibility, recording, and management of problems.

one simple comparison from a cultural viewpoint is lots of "production" NT machines running on pentium pros with EDO memory (i.e. fake parity, no protection at all) compared to modern mainframes with 80/64 ECC (i.e. 16 bits of ecc for every eight bytes ... capable of detecting 16 bit errors and correcting 15 bit errors).

another is some simple philosophical background. MVS heritage is 30 years or so that a program was run under batch control ... and all aspects of its operation needed to be handled programmatically (including things like being able to intercept b37 error condition, effectively filespace full, and recover). In prior discussion, it was noted that common business application is sort ... and that SV4 sort out of the box didn't even check for write error on temp file. Some vendors had upgraded to recognize write error ... and at least fail.

compare unix heritage, for nearly as long, has been that a program is a user command ... i.e. if sort command failed ... the user could figure out how to reconfigure things to get it to work.

I've looked at taking some 7x24 business critical mainframe environments and migrating them to open systems ... and there are still quite a few business rules (embodied in places like operational command & control centers) that I haven't figured out how to translate to unix or nt platforms ... just because the necessary system infrastructure doesn't exist.

On the other hand ... when my wife and I were starting our unix high availability cluster stuff in the late '80s (based on some experience; having done some of the mainframe stuff 10-15 years earlier), there were large number of people around the (open system) industry in total opposition to the effort ... who, somewhat surprisingly, now are some of the biggest proponents.

MVS isn't perfect. I was involved in a somewhat atypical installation in the late 70s and early 80s where the standard MVS product (if run) would crash & burn regularly ... typically within 15 minutes of being brought up ... because of various types of I/O errors and failures (things that normal installations might see less than once a year). I did a bullet proof IOS rewrite for this environment which eliminated all the failures ... hard, as well as various soft failures involving loops and resource starvations.

Another scenario is in '81 I did redesign and re-implementation of the HYPERChannel remote device support. As part of that redo ... I chose to reflect various sorts of transmit failures, involving channel programs loaded into the A51x remote device adapters, as channel checks. Nearly nine years later, I got a call from some IBM quality control expert. There is this industry service that collects mainframe erep (error recording) logs from customer locations and produces industry reports regarding various production machine operational RAS characteristics. The (then) current generation of IBM mainframes had reduced channel error rates to well under 10**-20. As a result nearly all of the reported channel check errors ... across a large percentage of all operational production mainframes in existence ... were not real ... but were being generated by various installations running HYPERChannel remote device support (some for "channel extenders" at the end of telco lines ... which had BER no better than 10**-9). After some review ... I determined that the transmission error condition could be reflected as interface control check (i.e. IFCC instead of CC) ... because the same exact software recovery operations were executed for IFCC and CC errors. The advantage is that IFCC errors are reported differently than CC errors in the industry RAS reports.

To put this in some perspective ... imagine a vendor of large production unix systems has a customer base well in excess of several thousand machines ... and that every error for all customers was being captured and reported by an industry standard RAS organization (i.e. things like every SCSI bus error). Imagine the vendor becoming quite concerned when it finds that there were a total of ten such (SCSI bus) errors reported across all the machines in the whole customer base in a period of 12 months ... when they believed there should be no more than one.

--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com | finger for pgp key

Mainframes & Unix

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Mainframes & Unix
Newsgroups: comp.arch
Date: 06 Nov 1996 08:40:11 +0000
... oh yes, somewhat implicit in my previous post was the concept of instrumentation and measurement. to get out of the art stage and at least into the engineering stage (not even necessarily science) ... requires instrumentation and measurement. RAS (reliability, availability, serviceability) "engineering" requires not just the pieces that implement it ... but also the instrumentation and reporting structure that support it.

the hypothetical RAS case of all large production unix systems (fast cpu, 100+ mbytes of memory, at least a dozen disk drives, etc) includes extensive measurement and reporting structure that is not only used at the respective data centers ... but reports are fed into national service that can categorize RAS profiles for all such systems (somewhat analogous would be frequency of repair for all trucks in the US broken down by types of failure) ... with lots of people really concerned about things like whether or not even a single SCSI bus error occurred on any of the systems anytime in the past 12 months.

--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com | finger for pgp key

Mainframes & Unix

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Mainframes & Unix
Newsgroups: comp.arch
Date: 09 Nov 1996 16:00:56 +0000
there seems to be a variety of apps on TPF res systems. I looked at some of them a couple years ago. One of them was "routes" which accounted for approximately 25percent of total processing in a complex. I redesigned with a paradigm shift and got about a factor of 1000* thruput improvement ... and then implemented the top 10 (impossible) things on the wish list (which required about factor of 100* ... cutting aggregate thruput improvement back to about 10* ... one item was eliminating any carrier bias). It was a query mostly app ... so relatively straightforward to move to unix clusters (turns out to be cultural problem, more difficult than technical issues).

--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com | finger for pgp key

interdata and perkin/elmer

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: interdata and perkin/elmer
Newsgroups: comp.society.folklore
Date: 10 Nov 1996 15:47:46 +0000
does anybody remember the interdata & perkin/elmer lineage???

when I was an undergraduate ... I had rewritten much of the mainframe driver software for 2702 ... in theory using the SAD command and some hacks ... to reconfigure which line-scanner was associated with which port. Objective was to allow various terminals (tty/ascii, 2741, 1052s, etc) to dial into common modem subpool ... and dynamically recognize the incoming terminal type. During tests ... everything seemed to work. Finally, one day the IBM hardware engineer explained to me that it really wouldn't work reliably because ... while it was possible to specify under program control which line scanner was associated with which port/line ... that 2702 implementation took some short cuts and hard wired specific oscillators to each line (fixing the baud rate on each line).

In reaction to the problem we started a project to build a 2702 replacement (that could do dynamic baud detection as well as dynamic terminal type identification). We started with Interdata 3 and built our own hard wired channel attachment card. Somewhere I believe there is an article blaming four of us for originating the IBM OEM plug-compatible controller business. The project eventually grew into an Interdata4 with multiple embedded Interdata3 processors. About that time, I graduated and went on to other things.

I still remember two problems debugging the interface. One was watching the mainframe "red-light" ... because the 65/67 was still locked out on the 2nd timer tic from updating location 80 timer (can't hold bus-in continuously on the channel for two timer tics ... because it also locks up the memory bus). The other problem was when it was identified that ascii/tty bits were going into the mainframe "backwards" (actually the 2702 tty line-scanner was loading leading bits into lower-order bit ... before transmitting to mainframe memory ... effectively reversing bit order). To be "plug-compatible" we had to also reverse ascii bit sequence in each byte before sending to mainframe memory.
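
For illustration only, reversing the bit sequence within each byte can be sketched as below. This is not the Interdata code; the function and table names are made up for the example, and the table lookup is just one common way of doing it.

```python
def reverse_bits(byte):
    """Return the 8-bit value with its bit order reversed."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (byte & 1)  # move lowest bit of input in
        byte >>= 1
    return result


# Precomputed table: one lookup per byte instead of eight shifts.
REVERSED = bytes(reverse_bits(b) for b in range(256))


def reverse_each_byte(data):
    """Reverse the bit order inside every byte; byte order is unchanged."""
    return data.translate(REVERSED)


line_data = b"LOGON"
to_mainframe = reverse_each_byte(line_data)
print(to_mainframe.hex())
```

Applying the reversal twice gives back the original bytes, so the same routine serves in both directions.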

I believe I've seen some writeup that the first non-DEC Unix port was to an Interdata 7/32(?).

Also at some point Perkin-Elmer bought up Interdata. In any case, I ran into somebody a couple weeks ago that said that he sold large number of Perkin-Elmer boxes in the early 80s with wire-wrapped mainframe channel attached boards ... he implied that the wire-wrapped board possibly had changed little from our original.

In any case, there still seem to be shops still running perkin-elmer boxes as terminal/line controllers.

--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com | finger for pgp key

Mainframes & Unix

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Mainframes & Unix
Newsgroups: comp.arch
Date: 11 Nov 1996 08:58:19 +0000
for routes and fares ... it would be possible to distribute workload across all machines in cluster ... then outage only represents capacity issue not availability.

harder issue becomes the other 98 (out of 100) issues involved in operating production, commercial system ... say command&control operational center. what automated processes are in place to make sure the daily updates for routes and fares are guaranteed to run every time and on schedule?

one example from similar thread here last march is with sort. typical business critical mainframe shop will have procedures that intercept temp-space full condition for specific applications ... reconfigure space and then successfully run application to completion. lots of business apps rely on sort. it was noted that 5.4 sort out of the box doesn't even check for write-error so if /tmp fills, output is truncated w/o error indication (also kernel error reflected isn't even specific to condition ... but is lumped into generic catch all ... assuming sort is even modified to check for an error).
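
A minimal sketch of the kind of check being described, assuming a program that writes a sorted run to a temp file. The function names and the fallback list of temp directories are hypothetical; this is not how any particular sort or mainframe procedure is implemented, only an illustration of detecting space-full and retrying somewhere else instead of silently truncating.

```python
import errno
import os
import tempfile


def write_run(lines, tmp):
    """Write one sorted run to an open temp file; any write error propagates."""
    for line in sorted(lines):
        tmp.write(line)
    tmp.flush()              # push buffered data to the kernel
    os.fsync(tmp.fileno())   # surface deferred write errors now, not at close


def sort_with_recovery(lines, temp_dirs):
    """Try each candidate temp directory until the run is written in full."""
    for directory in temp_dirs:
        try:
            with tempfile.TemporaryFile("w+", dir=directory) as tmp:
                write_run(lines, tmp)
                tmp.seek(0)
                return tmp.read().splitlines(keepends=True)
        except OSError as exc:
            if exc.errno != errno.ENOSPC:
                raise        # not a space problem: do not mask it
            # space full: fall through and retry in the next directory
    raise OSError(errno.ENOSPC, "no temp directory had enough space")


print(sort_with_recovery(["pear\n", "apple\n"], [tempfile.gettempdir()]))
```

The retry list stands in for "reconfigure space"; a production procedure would also record which condition was hit and what was done about it.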

Catching space full can be one out of 1000 business rules defined for business critical application and most of them could require custom hack to translate to unix (code that isn't part of the application ... but part of the operation of the application).

it is not that i don't believe in unix clusters ... my wife and I originated ha/cmp as something of skunk-works and despite various opposition saw it thru to first customer ship. ha/cmp typically refers to no-single-point-of-failure.

lots of the mainframe isn't science but lore. it is the thousands of outages that have occurred over the past 30 years ... some of which were weird combinations of multiple failures ... which a solution was then devised for. there are some large business-critical complex apps that have hundreds of thousands of business rules that have evolved over the past 30 years.

--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com | finger for pgp key

Mainframes & Unix

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Mainframes & Unix
Newsgroups: comp.arch
Date: 11 Nov 1996 10:19:59 +0000
... another aspect of business critical apps is failure mode analysis.

sometime last year ... i did a review of a relatively modest unix business critical application for operation on tcp/ip and the internet. it had been thru development, test, stress-test and q&a.

we put together a failure mode matrix involving system components, networking components, internet components, isp, firewalls, routers, client components, etc. the application was analysed from the standpoint did it handle &/or recover from each specific failure mode. in the case where it didn't handle and/or recover ... was there at least sufficient information being logged to diagnose what the failure was. first pass thru the majority of items weren't even diagnosable.
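
A failure mode matrix of this sort could be kept as a simple table keyed by component and failure mode. The sketch below is an assumption about one possible layout, not the matrix used in the review; the component names come from the list above, while the specific failure modes and outcomes are invented for illustration.

```python
from enum import Enum


class Outcome(Enum):
    RECOVERS = "handles or recovers"
    DIAGNOSABLE = "fails, but logs enough to diagnose"
    UNDIAGNOSABLE = "fails without enough information"


# Hypothetical entries: (component, failure mode) -> outcome.
matrix = {
    ("router", "link down"): Outcome.RECOVERS,
    ("firewall", "connection silently dropped"): Outcome.UNDIAGNOSABLE,
    ("isp", "dns lookup timeout"): Outcome.UNDIAGNOSABLE,
    ("client", "request abandoned mid-transaction"): Outcome.DIAGNOSABLE,
    ("system", "log filesystem full"): Outcome.UNDIAGNOSABLE,
}


def summarize(matrix):
    """Count how many failure modes fall under each outcome."""
    counts = {outcome: 0 for outcome in Outcome}
    for outcome in matrix.values():
        counts[outcome] += 1
    return counts


def work_list(matrix):
    """Failure modes that cannot even be diagnosed, in sorted order."""
    return sorted(key for key, outcome in matrix.items()
                  if outcome is Outcome.UNDIAGNOSABLE)


counts = summarize(matrix)
for outcome, n in counts.items():
    print(f"{n:2d}  {outcome.value}")
for component, mode in work_list(matrix):
    print(f"needs logging: {component}: {mode}")
```

The work list is the useful output: every entry in it is a failure that would show up in production with nothing recorded to explain it.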

the original effort was somewhat the expected straight-forward quality application development. to support business critical involved about four times as much (more) effort ... as went into the initial straight-forward quality effort.

a number of mainframe applications have acquired the additional 4* effort (or more) ... not necessarily thru fore-sight but thru evolution. many times the people currently involved might not even be able to catalog everything.

--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com | finger for pgp key

Mainframes & Unix

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Mainframes & Unix
Newsgroups: comp.arch
Date: 11 Nov 1996 10:35:18 +0000
That is a rather small "production unix system" nowadays. The large ones (from Sun at least) have tens of CPUs (up to 30), many Gigabytes of memory (up to 30GB) and many hundreds to thousands of disk drives. The "interesting" installations in the last year or so have been multi-terabyte. Sun's in-house maximum config tests are for tens of terabytes. Over the last few years Sun has learned (the hard way) what you have to do to make this size of system run well, and what kind of RAS features and service are needed.

i recognize it as being small ... but there are mainframe systems that small also. point was to establish lower bound (not average or upper) on all system configurations that were consistently monitored and reviewed.

one fundamental issue is culture & people that are focused on even things as minor as all the soft failures that have no observable external impact. even in cases where the software monitoring exists ... is it being extensively reviewed. are there morning departmental meetings to cover all operational characteristics that occurred in the last 24 hrs. are there weekly meetings. do all the installations provide information for ranking across the industry.

make it slightly simpler, for the top ten solstice installations, what is the total number of scsi bus errors that occurred at each installation in the past 12 months (including distribution). how does that compare to the top ten sgi installations and the top ten hp installations.

for large oracle clusters ... are they raw devices or filesystems? is there a difference in the way errors are logged and reported between raw devices, filesystems, and hardware outboard controllers that provide error masking with raid? are all errors (kernel, dbms, outboard controllers, etc) ... recorded consistently in the same place?

after having done ha/cmp for unix ... a lot of the gotchas are the nits and the details ... not the high-level stuff.

--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com | finger for pgp key

Mainframes & Unix

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Mainframes & Unix
Newsgroups: comp.arch
Date: 11 Nov 1996 10:49:07 +0000
a "gotcha" detail ... for ha/cmp i wanted to do ip-address take-over to mask various types of system, network, adapter, etc. failures.

the server software worked correctly, the tcp/ip architecture works correctly, most environments worked correctly ... until ...

there is this function in tcp/ip that uses ARP (address resolution protocol) ... for detailed reference look at
http://www.garlic.com/~lynn/rfcietf.htm in the term index.

Most systems implement an ARP cache. The ARP cache maintains the most recently resolved IP->mac addresses. The ARP cache architecture has well defined rules that entries in the ARP cache time out after a specified period. It is also possible to issue an ARP clear command to wipe everything from the ARP cache.

The bottom line is that in theory, client-side tcp/ip support should allow servers to do ip-address take-over with a different LAN card ... and the clients will discover the new mapping between the ip-address and the mac address ... loading it into the ARP cache.

The problem was that the 4.3 tahoe and reno tcp/ip source used by a large number of vendors had a glitch. The tcp/ip routine that called the ARP routine to provide the IP->mac mapping ... kept a one-deep most recently used mapping in its hip-pocket. This was a performance enhancement ... but the hip-pocket mapping wasn't subject to the ARP cache time-out rules and/or the ARP clear command.

There are some environments where effectively all of a client's tcp/ip activity is with a single server ... using the same tcp/ip address. The problem is that in such an environment ... the ip-address take-over event will never be noticed by the client ... since it never will get into a situation that invalidates the hip-pocket mapping.
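
The hip-pocket problem can be sketched in a few lines of Python. This is a toy model, not the 4.3 tahoe/reno source: the class names (ArpCache, IpOutput), the 60-second time-out, and the addresses are all made up for illustration. It only tries to show how a one-deep mapping kept in front of the ARP cache survives both the time-out and a clear.

```python
class ArpCache:
    """Toy ARP cache: ip -> (mac, time learned); entries expire after ttl seconds."""

    def __init__(self, resolver, ttl=60.0):
        self.resolver = resolver      # stands in for an ARP request on the wire
        self.ttl = ttl
        self.entries = {}

    def lookup(self, ip, now):
        hit = self.entries.get(ip)
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]
        mac = self.resolver(ip)       # expired or missing: resolve again
        self.entries[ip] = (mac, now)
        return mac

    def clear(self):
        self.entries.clear()


class IpOutput:
    """Toy send path with a one-deep "hip-pocket" in front of the ARP cache."""

    def __init__(self, arp, hip_pocket=True):
        self.arp = arp
        self.hip_pocket = hip_pocket
        self.last = None              # (ip, mac) of the most recent destination

    def mac_for(self, ip, now):
        if self.hip_pocket and self.last is not None and self.last[0] == ip:
            return self.last[1]       # never consults the ARP cache, so never expires
        mac = self.arp.lookup(ip, now)
        self.last = (ip, mac)
        return mac


def demo():
    """Simulate an ip-address take-over with and without the hip-pocket."""
    owner = {"10.0.0.1": "mac-A", "10.0.0.2": "mac-C"}
    results = {}
    for hip_pocket in (True, False):
        owner["10.0.0.1"] = "mac-A"
        arp = ArpCache(owner.__getitem__, ttl=60.0)
        out = IpOutput(arp, hip_pocket=hip_pocket)
        out.mac_for("10.0.0.1", now=0.0)
        owner["10.0.0.1"] = "mac-B"   # take-over: same ip, different LAN card
        arp.clear()                   # even an explicit clear does not help
        results[hip_pocket] = out.mac_for("10.0.0.1", now=120.0)
    return results
```

In this model demo() hands back the old mac for the hip-pocket case and the new mac otherwise. A client that also sends to a second address replaces the hip-pocket entry and picks up the new mapping on its next send to the first address, which is why the problem only shows up for single-server clients.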

...

Another example is trying to put up ha/cmp like servers on the internet with multiple diverse routings into the internet backbone (including diverse routing to different central exchanges, different cable points of entry into the building etc) ... and then banging my head on a brick wall trying to get all the browser vendors to support multiple A-records (i.e. the same DNS name mapping to multiple ip addresses).
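
Client-side support for multiple A-records amounts to something like the following sketch, assuming Python's standard socket module. The function names resolve_all and connect_any are hypothetical, and a real client would also have to think about address ordering and caching of the answer.

```python
import socket


def resolve_all(host, port):
    """Return every IPv4 (address, port) pair the resolver reports for host."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    seen = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        if sockaddr not in seen:
            seen.append(sockaddr)
    return seen


def connect_any(addresses, connect=socket.create_connection, timeout=3.0):
    """Try each address in turn; return the first connection that succeeds."""
    last_error = None
    for addr in addresses:
        try:
            return connect(addr, timeout)
        except OSError as exc:
            last_error = exc          # this path is down, move on to the next
    raise last_error or OSError("no addresses to try")
```

Typical use would be connect_any(resolve_all(name, 80)): a client that stops at the first address never gets to use the diverse routings behind the other addresses.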

--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com | finger for pgp key

Mainframes & Unix (and TPF)

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Mainframes & Unix (and TPF)
Newsgroups: comp.arch
Date: 12 Nov 1996 21:51:03 +0000
i don't know all the details ... but it seems that different parts of at&t ran different things for different purposes. around '74 a somewhat standard mainframe system was provided to at&t longlines that included about 30k lines of kernel code modifications and enhancements that I had generated.

that was pretty much the last i heard ... until 10 years later when the local salesman tracked me down ... to say that longlines was still running the same kernel (had migrated to newer models of mainframes as they came out over the years). concern was that finally after 10 years ... the evolutionary changes of the mainframe models had progressed to where it was not possible to run the '74 kernel on the next generation of mainframes coming out (i.e. mainframe binary application program compatibility has been preserved over the years ... but the hardware interfaces that require kernel support didn't stay completely compatible over multiple hardware generations).

fortunately the 30k modifications did include some advanced dynamic adaptive resource and workload management ... so it was able to adapt to the 20* increase in resource capacity and workload that occurred during the period.

--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com | finger for pgp key

Mainframes & Unix (and TPF)

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Mainframes & Unix (and TPF)
Newsgroups: comp.arch
Date: 13 Nov 1996 06:58:21 +0000
the last time we looked at ss7 ... for at least 800-number support, it was a non-unix redundant hardware box. problem was the availability criteria of 5 minutes of downtime per year ... and doing any kernel maintenance at all tended to blow 20 to 30 years' worth of availability budget.
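
the arithmetic behind those numbers, as a small python sketch. the function names are made up for illustration; the only inputs are the 5 minutes per year and the 20 to 30 years from above, and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600, ignoring leap years


def availability(downtime_min_per_year):
    """Fraction of the year the service is up for a given downtime budget."""
    return 1.0 - downtime_min_per_year / MINUTES_PER_YEAR


def years_of_budget(outage_min, budget_min_per_year=5.0):
    """How many years of downtime budget a single outage uses up."""
    return outage_min / budget_min_per_year


def outage_for_years(years, budget_min_per_year=5.0):
    """Length of an outage (minutes) that uses up the given number of years."""
    return years * budget_min_per_year
```

5 minutes per year works out to a little better than 99.999% availability, and 20 to 30 years of budget corresponds to a single outage of 100 to 150 minutes, i.e. around two hours for one maintenance window.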

my wife and I proposed an ha/cmp cluster. some response was that with a little work it would be possible to create clusters of the current system (i.e. always smop ... small matter of programming).

then it became an issue of price for availability.

given that the ss7 defined availability as being able to get the 800-number mapping back from either of the redundant T1 lines ... there was effectively no difference in availability between a unix cluster solution and a cluster of redundant hardware systems (the probability that all boxes in either cluster would be down at the same time was statistically the same ... given the measurement criteria).
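
one way to read that claim, as a python sketch. everything numeric here is an assumption for illustration: the per-box availabilities in the usage note are invented, and failures are treated as independent, which real clusters only approximate.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600, ignoring leap years


def cluster_downtime_min_per_year(node_availabilities):
    """Expected minutes/year with every box down at once (independent failures)."""
    all_down = prod(1.0 - a for a in node_availabilities)
    return all_down * MINUTES_PER_YEAR
```

with two boxes that are each up 99.9% of the time (an invented figure), cluster_downtime_min_per_year([0.999, 0.999]) comes to roughly half a minute per year, well inside a 5 minute budget. two boxes at 99.999% each come out far smaller still, but when the measurement only asks whether either box answered, both configurations pass.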

--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com | finger for pgp key

interdata & perkin/elmer machines

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: interdata & perkin/elmer machines
Newsgroups: alt.folklore.computers
Date: 13 Nov 1996 07:19:34 +0000
does anybody remember the interdata & perkin/elmer lineage???

when I was an undergraduate ... I had rewritten much of the mainframe driver software for 2702 ... in theory using the SAD command and some hacks ... to reconfigure which line-scanner was associated with which port. Objective was to allow various terminals (tty/ascii, 2741, 1052s, etc) to dial into common modem subpool ... and dynamically recognize the incoming terminal type. During tests ... everything seemed to work. Finally, one day the IBM hardware engineer explained to me that it really wouldn't work reliably because ... while it was possible to specify under program control which line scanner was associated with which port/line ... the 2702 implementation took some short cuts and hard wired specific oscillators to each line (fixing the baud rate on each line).

In reaction to the problem we started a project to build a 2702 replacement (that could do dynamic baud detection as well as dynamic terminal type identification). We started with Interdata 3 and built our own wire-wrap channel attachment card. Somewhere I believe there is an article blaming four of us for originating the IBM OEM plug-compatible controller business. The project eventually grew into an Interdata 4 with multiple embedded Interdata 3 processors. About that time, I graduated and went on to other things.

I still remember two problems debugging the interface. One was watching the mainframe "red-light" ... because the 65/67 was still locked out on the 2nd timer tick from updating location 80 timer (can't hold bus-in continuously on the channel for two timer ticks ... because it also locks up the memory bus). The other problem was when it was identified that ascii/tty bits were going into the mainframe "backwards" (actually the 2702 tty line-scanner was loading leading bits into lower-order bit ... before transmitting to mainframe memory ... effectively reversing bit order). To be "plug-compatible" we had to also reverse ascii bit sequence in each byte before sending to mainframe memory.
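
The per-byte bit reversal is easy to show in modern terms. The following is a Python sketch of the operation only, not of anything that ran on the Interdata; the function and table names are made up for illustration.

```python
def reverse_bits(byte):
    """Reverse the order of the 8 bits in one byte (0..255)."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (byte & 1)   # move the low bit to the other end
        byte >>= 1
    return result


# Precomputed table, so translating a buffer is one lookup per byte.
REVERSE_TABLE = bytes(reverse_bits(b) for b in range(256))


def reverse_each_byte(data):
    """Return a copy of data with the bit order reversed inside every byte."""
    return bytes(data).translate(REVERSE_TABLE)
```

Applying reverse_each_byte twice gives back the original data, so the same routine serves for both directions of transfer.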

I believe I've seen some writeup that the first non-DEC Unix port was to an Interdata 7/32(? ... something like 10 years later).

Also at some point Perkin-Elmer bought up Interdata. In any case, I ran into somebody a couple weeks ago who said that he sold a large number of Perkin-Elmer boxes in the early 80s with wire-wrapped mainframe channel attached boards ... he implied that the wire-wrapped board possibly had changed little from our original.

In any case, there seem to be shops still running perkin-elmer boxes as terminal/line controllers.

--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com | finger for pgp key

Mainframes & Unix

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Mainframes & Unix
Newsgroups: comp.arch
Date: 16 Nov 1996 08:24:32 +0000
there is minor mention at
http://vm.marist.edu/~piper/party/jph-12.html#wheeler

also check unix system implementation for system/370, bell labs tech journal v63n8p2 p1751, oct '84.

--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com | finger for pgp key

Mainframes & Unix

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Mainframes & Unix
Newsgroups: comp.arch
Date: 17 Nov 1996 16:07:32 +0000
mainframe handler of terminal devices ... from a post I made recently to alt.folklore.computers. ... I was corrected that the first "unix" port was to 8/32 (not 7/32 as per attached).

the "problem" was/is that mainframe "channel" interface is half-duplex ... although in the past have used pairs of sub-channel addresses for dual-simplex simulating full-duplex.

does anybody remember the interdata & perkin/elmer lineage???

when I was an undergraduate ... I had rewritten much of the mainframe driver software for 2702 ... in theory using the SAD command and some hacks ... to reconfigure which line-scanner was associated with which port. Objective was to allow various terminals (tty/ascii, 2741, 1052s, etc) to dial into common modem subpool ... and dynamically recognize the incoming terminal type. During tests ... everything seemed to work. Finally, one day the IBM hardware engineer explained to me that it really wouldn't work reliably because ... while it was possible to specify under program control which line scanner was associated with which port/line ... the 2702 implementation took some short cuts and hard wired specific oscillators to each line (fixing the baud rate on each line).

In reaction to the problem we started a project to build a 2702 replacement (that could do dynamic baud detection as well as dynamic terminal type identification). We started with Interdata 3 and built our own wire-wrap channel attachment card. Somewhere I believe there is an article blaming four of us for originating the IBM OEM plug-compatible controller business. The project eventually grew into an Interdata 4 with multiple embedded Interdata 3 processors. About that time, I graduated and went on to other things.

I still remember two problems debugging the interface. One was watching the mainframe "red-light" ... because the 65/67 was still locked out on the 2nd timer tick from updating location 80 timer (can't hold bus-in continuously on the channel for two timer ticks ... because it also locks up the memory bus). The other problem was when it was identified that ascii/tty bits were going into the mainframe "backwards" (actually the 2702 tty line-scanner was loading leading bits into lower-order bit ... before transmitting to mainframe memory ... effectively reversing bit order). To be "plug-compatible" we had to also reverse ascii bit sequence in each byte before sending to mainframe memory.

I believe I've seen some writeup that the first non-DEC Unix port was to an Interdata 7/32(? ... something like 10 years later).

Also at some point Perkin-Elmer bought up Interdata. In any case, I ran into somebody a couple weeks ago who said that he sold a large number of Perkin-Elmer boxes in the early 80s with wire-wrapped mainframe channel attached boards ... he implied that the wire-wrapped board possibly had changed little from our original.

In any case, there seem to be shops still running perkin-elmer boxes as terminal/line controllers.


--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com | finger for pgp key

what happened to the 286?

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: what happened to the 286?
Newsgroups: alt.folklore.computers
Date: 28 Nov 1996 13:17:14 +0000
the bottom dropped out from under memory chip prices after the 386 machines hit the market (and drove 286 machines off the scene; there was a huge stockpile of 286 clones from pacific rim countries getting ready for fall buying season ... and the 386 just blew things to pieces ... there was something like a couple month period where 286 clone prices dropped from around 800 to under 300).

i had a 286 system configured with 6mbyte ... but I don't believe that was very common ... given memory prices (typically <10-20% system thruput increase for double system cost).

--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com | finger for pgp key

IBM 4361 CPU technology

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: IBM 4361 CPU technology
Newsgroups: comp.arch
Date: 08 Dec 1996 09:12:39 +0000
... somewhat related; ibm service procedure requires the ability to field bootstrap diagnose failing systems. 15 years or so ago, larger machines got so complex that it was no longer possible to meet the field bootstrap diagnose requirement ... so a "service" processor was created. Various types of probes were built into the machine and connected to the service processor; it was possible to field bootstrap diagnose the service processor ... and then use the service processor to diagnose the mainframe complex. Early service processors eventually graduated to 4331s ... and then to a pair of replicated 4361s (embedded inside the larger mainframe). The 4331 and 4361 service processors ran a modified version of VM/370 release 6 using ios3270 to drive the service panels.

--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com | finger for pgp key
