2004 Newsgroup postings (05/15 - 05/27) Lynn Wheeler

2004 Newsgroup Postings (05/15 - 05/27)

c.d.theory glossary (repost)
c.d.theory glossary (repost)
Quote of the Week
Infiniband - practicalities for small clusters
Infiniband - practicalities for small clusters
Infiniband - practicalities for small clusters
Infiniband - practicalities for small clusters
The Network Data Model, foundation for Relational Model
racf
racf
racf
command line switches [Re: [REALLY OT!] Overuse of symbolic
1.7rc2 & developer's build/install
racf
command line switches [Re: [REALLY OT!] Overuse of symbolic
Infiniband - practicalities for small clusters
Infiniband - practicalities for small clusters
IBM 7094 Emulator - An historic moment?
layered approach
Infiniband - practicalities for small clusters
Why does Windows allow Worms?
Infiniband - practicalities for small clusters
Infiniband - practicalities for small clusters
command line switches [Re: [REALLY OT!] Overuse of symbolic
command line switches [Re: [REALLY OT!] Overuse of symbolic
Infiniband - practicalities for small clusters
command line switches [Re: [REALLY OT!] Overuse of symbolic
[Meta] Marketplace argument
[Meta] Marketplace argument
[Meta] Marketplace argument
vm
MITM attacks
Usenet invented 30 years ago by a Swede?
MITM attacks
Usenet invented 30 years ago by a Swede?
Questions of IP
MITM attacks
Why doesn't Infiniband supports RDMA multicast
Infiniband - practicalities for small clusters
Who said "The Mainframe is dead"?
Infiniband - practicalities for small clusters
Infiniband - practicalities for small clusters
Infiniband - practicalities for small clusters
can a program be run withour main memory ?
Infiniband - practicalities for small clusters
Infiniband - practicalities for small clusters
Finites State Machine (OT?)
Infiniband - practicalities for small clusters
Random signatures
can a program be run withour main memory?
Blinkenlights?
before execution does it require whole program 2 b loaded in
Leaf routines
Infiniband - practicalities for small clusters
[HTTP/1.0] Content-Type Header
Infiniband - practicalities for small clusters
Phone # layout vs Computer keypad # layout???!???!!!
Text Adventures (which computer was first?)
Infiniband - practicalities for small clusters
Infiniband - practicalities for small clusters
Infiniband - practicalities for small clusters
Infiniband - practicalities for small clusters
before execution does it require whole program 2 b loaded in
before execution does it require whole program 2 b loaded in
before execution does it require whole program 2 b loaded in

c.d.theory glossary (repost)

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: c.d.theory glossary (repost)
Newsgroups: comp.databases.theory
Date: Sat, 15 May 2004 09:14:38 -0600

mAsterdam writes:
Narrowing this down:

The glossary is a list of items that led to mutual misunderstandings in the c.d.theory newsgroup. It is built from contributions. The newsgroup uses terms from database design, implementation, operation and change management, cost sharing, productivity research, indexing and cataloging database literature, and/or basic database research.

The glossary's purpose is to limit lengthy misunderstandings. It consists of signposts: watch out! You may think the OP means A but she might mean B. Alternative names and views of the same concept are only introduced when the danger of mutual misunderstandings is apparent. When context matters, it is provided. The glossary is a highly biased list of problematic concepts.

slight drift ... the nlm has books, articles, papers ... there is essentially an online (card?) catalog for the library. umls is sort of the structured set of words used for the catalog. it is sort of structured into somewhat of a hierarchy of concepts, terms, and word sequences. however there is also a mesh of complex many-to-many relationships between concepts. there are tens of thousands of concepts, hundreds of thousands of terms, and millions of word sequences.

this is sort of independent of having any definitions for the concepts, terms, and/or word sequences. if you have a set of words that you might want to look for an article with ... umls gives other related words, terms, and/or concepts that might also be used to search for articles.

it is also used by the people cataloging the library ... lots of listed terms and word sequences have preferred relationships, i.e. if an article abstract contains a certain set of terms and/or word sequences, there are guidelines about preferred terms to be used for indexing/cataloging. this structure of preferred/nonpreferred relationships can also be used for people looking up entries in the catalog.

at this level, umls is effectively the structure used for understanding the cataloging of the articles (as opposed to understanding the articles themselves).

there was some statement that nlm reached the state of many current search engines possibly by the late '70s. a boolean term search would be quite bimodal, at six to seven terms there could still be hundred thousand hits ... but adding one more term dropped the number of hits to zero. the holy grail was finding magic combination of five to eight terms that resulted in 50-100 hits. in the early 80s, an online interface (grateful med) was developed that by default didn't ask for the hits but just the number of hits. then a 2-3 day task might be to discover the magic query combination that resulted in a reasonable hit result (say greater than zero but less than several hundred).
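
The bimodal behaviour of a boolean term search, and the idea of asking only for the number of hits, can be illustrated with a minimal sketch. The catalog contents, `build_index`, and `hit_count` below are hypothetical names made up for illustration; nothing here reflects how nlm or grateful med was actually implemented.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def hit_count(index, terms):
    """Return only the number of documents matching ALL terms (boolean AND)."""
    if not terms:
        return 0
    sets = [index.get(term.lower(), set()) for term in terms]
    return len(set.intersection(*sets))

# hypothetical toy catalog standing in for article abstracts
docs = {
    1: "heart valve surgery outcome",
    2: "heart valve infection",
    3: "heart transplant outcome",
}
index = build_index(docs)

# each added term can only shrink the result, and one term too many gives zero
for query in (["heart"], ["heart", "valve"],
              ["heart", "valve", "outcome"],
              ["heart", "valve", "outcome", "infection"]):
    print(query, hit_count(index, query))
```

Returning just the count lets the searcher keep adjusting terms until the number is in a usable range before fetching any actual entries.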

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

c.d.theory glossary (repost)

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: c.d.theory glossary (repost)
Newsgroups: comp.databases.theory
Date: Sat, 15 May 2004 09:41:04 -0600

at one point, i heard that there were something like 40,000 medical librarians around the world that specialized in assisting doctors and medical researchers in doing nlm lookups.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Quote of the Week

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Quote of the Week
Newsgroups: comp.databases.theory
Date: Sat, 15 May 2004 13:55:18 -0600

jcelko212@earthlink.net (--CELKO--) writes:
1) Someone sent Chris Date one of my very old SQL puzzles and wanted help with it. Chris answered it with his personal programming language rather than Standard SQL. The problem involved displaying the hire date and last promotion date for each employee in a personnel. The specs were to use a NULL, if the employee was a new hire. Since Chris Date's version of the Relational Model does not have NULLs, he used an arbitrary dummy date instead and thus created false information.

posting a year ago in this n.g.
https://www.garlic.com/~lynn/2003g.html#40 How to cope with missing values - NULLS?

mentioning a long ago and far away article by date (1992) titled "An Explanation of why three-valued logic is a mistake" ... related to the handling of NULLS in SQL.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Infiniband - practicalities for small clusters

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Infiniband - practicalities for small clusters
Newsgroups: comp.arch
Date: Sun, 16 May 2004 09:58:50 -0600

Anne & Lynn Wheeler writes:
Several hundred instructions in the kernel were SMP'ed for concurrent operation ... basically a relatively thin layer in all the interrupt interfaces and the dispatcher. On entry to the kernel, an attempt was made to obtain the kernel lock, and if it couldn't ... rather than spinning, it queued an extremely lightweight thread request and went to the dispatcher looking for something else to do (aka rather than spinning on the kernel lock, it "bounced" off the kernel lock and went looking for non-kernel work to do).
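
The "bounce" idea can be sketched roughly as follows. This is only an illustration in Python threads, not the VAMPS or vm/370 code; `kernel_lock`, `pending`, and `enter_kernel` are hypothetical names, and a real kernel would also have to close the window between the holder's last queue check and the lock release.

```python
import threading
from collections import deque

kernel_lock = threading.Lock()
pending = deque()   # queued lightweight kernel requests (append/popleft are thread-safe)

def enter_kernel(request, other_work):
    """Run `request` under the kernel lock, or queue it and bounce off the lock."""
    if not kernel_lock.acquire(blocking=False):
        pending.append(request)   # leave a lightweight request behind
        other_work()              # go find non-kernel work instead of spinning
        return "bounced"
    try:
        request()
        while pending:            # the lock holder drains requests queued by others
            pending.popleft()()
    finally:
        kernel_lock.release()
    return "ran"
```

The caller that bounces never waits; its request is picked up by whichever processor currently holds (or next obtains) the kernel lock.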

a side issue of SMP support was pricing for software. somewhat with the unbundling of 6/23/69, services and application software started being priced/charged ... as opposed to free. however, kernel software was still free, somewhat under the theory that it was necessary for the operation of the hardware.

i was somewhat working on the resource manager, ecps (extensive microcode performance enhancements), and VAMPS all at the same time. the resource manager was a big package of software that nominally improved the resource allocation algorithms. however it also had a bunch of structural changes to the kernel.

one of the things was that an infrastructure for automated benchmarks was created (as part of calibrating the resource manager algorithms). in preparation for the release, over 2000 benchmarks were run, taking something like 3 months elapsed time. some of the benchmarks were extreme stress tests that were possibly a factor of ten times outside the normally observed operating situations. at the start these stress tests were guaranteed to crash the kernel. eventually as part of the resource manager, the whole kernel serialization structure was completely rewritten ... which eliminated all observed crashes under heavy stress load and also all known situations that involved hung/zombie processes.

previously, in the cp/67 to vm/370 (port from 360/67 to 370) there had been some kernel restructuring; some of the cp/67 that had been done in support of multiprocessing was eliminated. for the resource manager, i re-introduced the dispatching and paging related structures that had been there from cp/67.

the other characteristic was that it was decided to make the resource manager the guinea pig for charged/priced kernel code ... under the revised guidelines that kernel code that wasn't directly required for using some hardware could be charged for.

So the resource manager and ecps ships ... and VAMPS is canceled. The decision is then made to turn out a real vm/370 multiprocessor kernel and to adopt the thin-layer multiprocessor design from VAMPS with bounce lock and extremely lightweight kernel requests (rather than the traditional kernel spin lock).

The only problem was that SMP support needed something like 80 percent of the code that was in the resource manager (involved in various restructuring). The issue was that they couldn't just make the resource manager a prerequisite for SMP support ... since the resource manager was charged for ... and the business guidelines required that kernel software directly needed to support hardware (in this case multiprocessor operation) was still free (and free software with a prerequisite for priced software wasn't an option).

The solution was to remove the 80 percent or so of the code from the resource manager needed for SMP support and place it in the base (free) kernel. They then continued to charge for the five times smaller resource manager (in terms of lines of code) at the same price as the original resource manager.

all of this seems trivial now that it has evolved that all (mainframe) software is charged for (including all kernel software) .... but at the time it represented an enormous issue.

misc. past resource manager related posts
https://www.garlic.com/~lynn/subtopic.html#fairshare
and paging system posts
https://www.garlic.com/~lynn/subtopic.html#wsclock
and smp posts
https://www.garlic.com/~lynn/subtopic.html#smp
and microcode posts
https://www.garlic.com/~lynn/submain.html#mcode
and benchmarking (in support of resource manager) posts
https://www.garlic.com/~lynn/submain.html#bench

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Infiniband - practicalities for small clusters

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Infiniband - practicalities for small clusters
Newsgroups: comp.arch
Date: Sun, 16 May 2004 10:11:32 -0600

Joe Seigh writes:
For simple synchronous kernel calls that return without exiting to the dispatcher/scheduler, there is some additional complications. You might need checkpointing for error handling. If you can avoid blindly doing a pipeline flush and maybe use some mainframe like commit logic, you might be able to implement leaf kernel calls that are fairly efficient.

i created something that I called fastpath for various cp/67 kernel calls when i was an undergraduate ... doing detailed feature/function kernel path analysis of a bunch of paths thru the kernel and identifying those that needed the minimum of kernel services ... and

a) handling them completely in the interrupt handler (and directly resuming application from the interrupt handler) or

b) recognizing them in the dispatcher and providing special path thru the dispatcher for application resume or

c) dispatcher recognizing that while it was longer kernel processing it was still resuming the same application and being able to still optimize some pathlength.

one of the remaining timing issues was just that the svc interrupt/call from application state to kernel state also changed the machine state from enabled for i/o & timer interrupts to disabled for i/o & timer interrupts. on resume, the state of the machine then changed back from disabled for interrupts to enabled for interrupts. The change in machine state between enabled/disabled for i/o & timer interrupts was an extremely expensive process by itself.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Infiniband - practicalities for small clusters

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Infiniband - practicalities for small clusters
Newsgroups: comp.arch
Date: Sun, 16 May 2004 10:15:52 -0600

Anne & Lynn Wheeler writes:
a) handling them completely in the interrupt handler (and directly resuming application from the interrupt handler) or

a trivial example in this scenario was avoiding the save/restore of the floating point registers ... since the brief kernel processing done in the interrupt handler never touched them.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Infiniband - practicalities for small clusters

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Infiniband - practicalities for small clusters
Newsgroups: comp.arch
Date: Sun, 16 May 2004 10:50:31 -0600

this is a report i gave at an ibm user group meeting while an undergraduate on operating system/kernel speed up.
https://www.garlic.com/~lynn/94.html#18 CP/67 & OS MFT14
https://www.garlic.com/~lynn/94.html#20 CP/67 & OS MFT14

MFT14 was the main batch operating system that was used at the university for the bulk of the work. a characteristic was that it was heavily disk bound ... design point that assumed everything was heavily real memory constrained and so everything had to be done using disks. I speeded up nominal university workload thruput by a factor of three times by carefully positioning data & files on disk to optimize disk arm.

CP/67 was virtual machine operating system from cambridge science center:
https://www.garlic.com/~lynn/subtopic.html#545tech

some people from cambridge had installed it at the university january of 1968 ... and the above referenced presentation was made fall of '68.

when running MFT14 in virtual machine the main issues for CP/67 were in the

1) SVC interrupt handler.

MFT application were using SVC interrupts to call the MFT kernel; under CP/67 they interrupted into the CP/67 SVC interrupt handler and a SVC interrupt into the virtual machine had to be simulated, resuming the virtual machine at the virtual svc interrupt address.

2) program interrupt handler

MFT kernel would execute a large number of "privileged" 360 instructions. CP/67 ran all virtual machines in non-privileged mode which resulted in privileged instructions interrupting into the CP/67 kernel. The CP/67 kernel then had to simulate the privileged instruction and resume virtual machine execution.

3) dispatcher

standard process for resuming virtual machine execution was the dispatcher which all kernel processes would eventually converge on. the dispatcher in addition to actually activating the execution of virtual machine, also managed the selection of virtual machine to run as well as managing all the kernel thread infrastructure and misc. other tasks.

...

The fastpath work and misc. other kernel work that I did between january and fall of '68 ... reduced the kernel cpu utilization by nearly 80 percent:

before: 534 cpu secs in the cp/67 kernel
after: 113 cpu secs in the cp/67 kernel.
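
As a quick check of the "nearly 80 percent" figure, the arithmetic on the two numbers above works out as follows (the variable names are just for illustration):

```python
before_secs = 534   # cpu secs in the cp/67 kernel before the changes
after_secs = 113    # cpu secs in the cp/67 kernel after the changes

saved = before_secs - after_secs      # 421 cpu secs eliminated
reduction = saved / before_secs       # fraction of kernel cpu removed
ratio = before_secs / after_secs      # how many times less kernel cpu

print(f"{reduction:.1%} less kernel cpu, a factor of {ratio:.1f}")
# prints "78.8% less kernel cpu, a factor of 4.7"
```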

the measurements were by running the MFT14 on the real machine and clocking real time and then repeating under CP/67 and clocking real time. The MFT14 workload was run in a virtual machine environment where all of the virtual machine's pages were resident in real memory and CP/67 performed no paging operations during the period.

The increase in real time between real machine operation and virtual machine operation is totally attributable to CPU utilization by the CP/67 kernel operation ... not necessarily solely instruction pathlength, since it would also include the overhead for things like the privileged instructions from the virtual machine interrupting into the CP/67 kernel.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

The Network Data Model, foundation for Relational Model

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: The Network Data Model, foundation for Relational Model
Newsgroups: comp.databases.theory
Date: Sun, 16 May 2004 14:58:09 -0600

"Ken North" writes:
I've never heard the term "network model" DBMS in any context other than the CODASYL DBTG standard.

i was writing code for a "semantic network" DBMS (a network model with some additional characteristics) out at the los gatos VLSI group about the same time I was writing code for System/R ... the original RDBMS at research (research lab, bldg 28 was about 10 miles from the Los Gatos VLSI lab, bldg 29 ... the bldgs were consecutively numbered i would guess based on when they were built, rather than strict physical proximity).

The "semantic network" DBMS done by the VLSI tools group drew a lot from Sowa's work on semantic networks. There have also been various efforts over the years to map semantic networks into RDBMS technology ... frequently the intermediate layer represents a 10:1 performance overhead handling the semantic network representation to relational representation. however, quick use of a search engine turns up some number of explicit implementations.

a random reference turned up by search engine that happens to also mention medical informatics and UMLS semantic networks ... even tho there has been significant effort mapping UMLS semantic networks to rdbms over the years:
http://courses.mbl.edu/Medical_Informatics/2001/outlinesFall2001.html

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

racf

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: racf
Newsgroups: bit.listserv.ibm-main
Date: Mon, 17 May 2004 13:36:21 -0600

tom.schmidt@ibm-main.lst (Tom Schmidt) writes:
DES uses a public key and a private key to verify authority. In RACF's case the public key is the 8-character userid and the private key is the 8- character password. (RACF pads both fields to 8-characters with blanks.)

DES uses symmetric, shared, secret key for encryption and decryption.

warning: long authentication topic drift.

typically a userid/password scenario is that you assert something ... aka: assert you are authorized for the "userid" and you prove the assertion by knowing the corresponding password. the "userid" is used for the authorization process, and the "password" is used for the authentication process.

in 3-factor authentication paradigm:
• something you know
• something you have
• something you are

a password is a single factor, shared-secret, something you know authentication. some security infrastructures may also depend on the "userid" (used for authorization) also being kept totally secret and therefore also becoming part of the something you know authentication process.

asymmetric encryption has a pair of keys and uses different keys for encryption and decryption. business processes establish convention that a specific key is to be treated as public and the other half of the key pair is to be treated as private and never divulged.

In DES, the shared-secret key isn't private (in the sense of the asymmetric encryption business process of public/private keys) since both the encryptor and the decrypter have to share knowledge of the same key (although they may keep it secret from everybody else).

the use of "private" in the business application of asymmetric encryption is, in effect, intended to be more restrictive control of the key than just secret (aka public/private key systems don't represent technology, asymmetric encryption represents technology; public/private keys represent business process application of asymmetric encryption).

it is possible to have two factor authentication:
• something you know
• something you have

where the something you know authentication can be private as opposed to shared-secret.

the something you have can be some form of chip token that possibly contains a private key that can never (practically) be removed from the token. The private key in the token is used to "digitally sign" (aka encrypt a hash) which is transmitted. If the receiver can verify the digital signature with a recorded public key, then something you have authentication can be demonstrated (since only the person in possession of the hardware token could have generated the correct digital signature). This is the scenario where somebody walks in and says instead of recording a password for my userid ... record a public key ... and instead of the system doing password compare ... uses the public key to perform digital signature verification.

Now, if the hardware token has been certified to only operate in a specific way when the correct PIN has been entered ... the authentication of the digital signature then not only implies something you have authentication, but also something you know authentication ... i.e. the recipient doesn't know the PIN ... but knows the correct PIN was entered since the token appears to be operating correctly.

In this way, something you know authentication can be demonstrated w/o having to use shared-secrets between the sender and the recipient.
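
A toy sketch of recording a public key in place of a password might look like the following. It uses textbook RSA with deliberately tiny numbers (p=61, q=53) purely so that it runs with the standard library; it is not secure and is not how any real token or RACF works. The `Token` class, the `registry` dictionary, the userid, and the random challenge are all assumptions made for illustration.

```python
import hashlib
import secrets

# toy key pair: far too small for real use, chosen only so the arithmetic is visible
P, Q = 61, 53
N = P * Q        # 3233, public modulus
E = 17           # public exponent
D = 2753         # private exponent: (E * D) % ((P - 1) * (Q - 1)) == 1

def digest(message: bytes) -> int:
    """Hash the message and reduce it into the toy modulus range."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

class Token:
    """Stands in for the hardware token; the private exponent never leaves it."""
    def __init__(self, private_exponent: int, modulus: int):
        self._d = private_exponent
        self._n = modulus

    def sign(self, message: bytes) -> int:
        # "encrypt a hash" with the private key
        return pow(digest(message), self._d, self._n)

# the system records a public key for the userid instead of a password
registry = {"someuser": (E, N)}

def verify(userid: str, message: bytes, signature: int) -> bool:
    """Signature verification replaces the password compare."""
    e, n = registry[userid]
    return pow(signature, e, n) == digest(message)

challenge = secrets.token_bytes(16)      # hypothetical per-login challenge
signature = Token(D, N).sign(challenge)
print(verify("someuser", challenge, signature))
```

Nothing stored in `registry` is secret, so copying it does not let anybody produce a valid signature.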

There are horrendous problems with the shared-secret paradigm.

First off, since it is shared-secret, there is security requirement that every unique security domain use a unique shared-secret. Specific security domains, then in addition ask that the shared-secrets used be very hard to guess ... and many of these security domains seem to operate under the assumption that they are the only security operation in the whole world. However, the reality is that people may participate in scores of different security domains ... with the result they have large tens of very hard to guess and remember unique shared-secret passwords. Individuals then are forced to create repositories of all their shared-secrets which become attractive targets for criminals.

Second, institutions also tend to have repositories of significant amounts of shared-secrets, which become attractive targets. In the shared-secret scenario, criminals skimming/harvesting shared-secrets repositories, are able to use the information to impersonate people. small side drift on security proportional to risk:
https://www.garlic.com/~lynn/2001h.html#61

if passwords/shared-secrets were replaced in these repositories with public keys .... criminals could harvest the repositories all they wanted and still not be able to impersonate somebody by knowing their public key (in the way they can impersonate individuals by knowing their shared-secrets).

Finally, people have to know their shared-secrets and therefore are prone to phishing (con-artists convincing people to divulge their secrets in one way or another). hardware tokens using private keys (that can't be extracted from the token) help address the problem. If nobody can know the private key, then the owner of the hardware token also won't know the private key .... and a con artist can't get somebody to tell them something that they don't know.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

racf

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: racf
Newsgroups: bit.listserv.ibm-main
Date: Mon, 17 May 2004 13:57:28 -0600

tom.schmidt@ibm-main.lst (Tom Schmidt) writes:
The EFF took most of the fun out of DES several years ago though.

i've got one of the souvenir chips from the machine ... they didn't take the fun out of DES ... they reduced the cost of doing a brute force attack on a specific DES key ... showing that you can do a brute force attack and find a specific key in small tens of hours.

this becomes a problem for using specific DES key for extended periods of time, especially over a large domain, which might involve lots of value.

so there are two countermeasures used in the financial world

1) dukpt ... derived unique key per transaction. basically things like ATM machines that continue to use existing DES hardware ... but wrapper it so every transaction over the ATM network uses a unique DES key. The individual transactions are valued in the tens to low hundreds of dollars and lifetime is typically measured in seconds to minutes. the lifetime of the key is much shorter than existing techniques to find a key ... and besides the value of the transactions are much less than the cost to find the unique key for each transaction.
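
The general idea of a fresh key for every transaction can be sketched as below. This is NOT the actual dukpt derivation (which is specified separately and is DES-based); HMAC-SHA256 is swapped in here only because it is in the standard library, and `transaction_key`, the base key, and the device id are hypothetical.

```python
import hashlib
import hmac

def transaction_key(base_key: bytes, device_id: bytes, counter: int) -> bytes:
    """Derive a one-time 8-byte key from a base key, a device id and a transaction counter."""
    message = device_id + counter.to_bytes(8, "big")
    # truncate to 8 bytes, the size of a DES key block
    return hmac.new(base_key, message, hashlib.sha256).digest()[:8]

base_key = b"hypothetical-base-derivation-key"
key_1 = transaction_key(base_key, b"ATM-0001", 1)
key_2 = transaction_key(base_key, b"ATM-0001", 2)
print(key_1.hex(), key_2.hex())
```

Finding one transaction's key by brute force would say nothing about the next transaction, which has a different key.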

2) triple DES ... basically an encrypt/decrypt/encrypt sequence using different keys ... achieving 112bit strength with two different DES keys. Each additional bit doubles the elapsed time for the brute force attack on specific key. so if specific 56bit DES key can be found in say ten hours ... then it takes either 2**56 times as long to find a 112bit DES value (and/or cost 2**56 times as much)
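
The doubling argument can be worked through numerically. The ten-hour figure is the illustrative number from the text above, not a measurement, and `brute_force_hours` is a hypothetical helper:

```python
HOURS_PER_YEAR = 24 * 365.25

def brute_force_hours(base_bits: int, base_hours: float, target_bits: int) -> float:
    """Scale a known brute-force time: each additional key bit doubles the work."""
    return base_hours * 2 ** (target_bits - base_bits)

hours = brute_force_hours(56, 10, 112)    # 10 hours * 2**56
years = hours / HOURS_PER_YEAR
print(f"{hours:.3e} hours, roughly {years:.1e} years")
```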

the following recent RFC has some discussion of various key strength issues
https://www.garlic.com/~lynn/rfcidx12.htm#3766

3766 I
Determining Strengths For Public Keys Used For Exchanging Symmetric Keys, Hoffman P., Orman H., 2004/04/26 (23pp) (.txt=55939) (BCP-86) (was draft-orman-public-key-lengths-08.txt)

if you click on the above URL ... then clicking on the ".txt=55939" field retrieves the actual RFC.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

racf

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: racf
Newsgroups: bit.listserv.ibm-main
Date: Mon, 17 May 2004 14:39:43 -0600

Rick.Fochtman@ibm-main.lst (Rick Fochtman) writes:
Chuck DES and go to RSA, which uses both public and private keys. Copy of original working paper on request. <G>

if you look at determining strengths for public keys in previous referenced RFC ... also reference
https://www.garlic.com/~lynn/2004e.html#18 RFC 3766
and recent previous post in this thread:
https://www.garlic.com/~lynn/2004f.html#9 racf

based on the tables, you are starting to talk about RSA key sizes well over 2048 bits ... to give comparable strength to triple-DES (discussed also in previous post in this thread) or even minimum AES key size.

a thread from sci.crypt
https://www.garlic.com/~lynn/2004d.html#58 How secure is 2048 bit RSA?

for pure authentication, you might be much better off with one of the FIPS186-2 implementations, dsa or ecdsa (see strengths from referenced RFC or table extracted in above referenced URL on how secure is 2048 bit RSA):
http://csrc.nist.gov/cryptval/dss.htm

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

command line switches [Re: [REALLY OT!] Overuse of symbolic

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: command line switches [Re: [REALLY OT!] Overuse of symbolic
constants]
Newsgroups: alt.folklore.computers
Date: Mon, 17 May 2004 19:49:51 -0600

jmfbahciv writes:
[puzzled emoticon here] The last sentence doesn't make sense. Why would there be output?

os/360 compilers/assemblers produce object decks (80 byte images) that had ESDs (external symbols), "relocatable adcons", external program references, etc.

various past posts about os/360 object deck formats
https://www.garlic.com/~lynn/2001.html#8 finding object decks with multiple entry points
https://www.garlic.com/~lynn/2001.html#14 IBM Model Numbers (was: First video terminal?)
https://www.garlic.com/~lynn/2001.html#60 Text (was: Review of Steve McConnell's AFTER THE GOLD RUSH)
https://www.garlic.com/~lynn/2002n.html#62 PLX
https://www.garlic.com/~lynn/2002n.html#71 bps loader, was PLX
https://www.garlic.com/~lynn/2002o.html#25 Early computer games
https://www.garlic.com/~lynn/2002o.html#26 Relocation, was Re: Early computer games
https://www.garlic.com/~lynn/2003d.html#47 IBM says AMD dead in 5yrs ... -- Microsoft Monopoly vs. IBM
https://www.garlic.com/~lynn/2003f.html#26 Alpha performance, why?

the link-editor could take several object decks, combine them together, resolve what was possible to resolve and emit the result as a series of disk records ... that had some amount of the object deck overhead eliminated (it still had entry points, any remaining unresolved external references and relocatable address constants). there were some number of link-edit control commands that had control over things like which libraries to search for resolving external program references.
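
A very rough sketch of that combine-and-resolve step is shown below. The dictionary layout for an "object deck" (name, length, entries, externals) and the `link_edit` function are assumptions made for illustration and bear no relation to the real os/360 ESD/TXT/RLD card formats.

```python
def link_edit(decks):
    """Lay decks out one after another and resolve the external references that can be resolved."""
    entries = {}
    offset = 0
    for deck in decks:
        # each entry point becomes an offset within the combined image
        for symbol, deck_offset in deck["entries"].items():
            entries[symbol] = offset + deck_offset
        offset += deck["length"]

    resolved, unresolved = {}, set()
    for deck in decks:
        for symbol in deck["externals"]:
            if symbol in entries:
                resolved[symbol] = entries[symbol]
            else:
                unresolved.add(symbol)   # left for a library search or a later step

    return {"length": offset, "entries": entries,
            "resolved": resolved, "unresolved": sorted(unresolved)}

# hypothetical decks: MAIN calls SUBA (supplied) and SQRT (not supplied)
decks = [
    {"name": "MAIN", "length": 100, "entries": {"MAIN": 0}, "externals": ["SUBA", "SQRT"]},
    {"name": "SUBA", "length": 40, "entries": {"SUBA": 0}, "externals": []},
]
module = link_edit(decks)
print(module)
```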

the information about internal program address constants had to be kept around until it was actually decided to run the program ... because address constants were absolute ... once the program image was loaded into specific address location at runtime ... one of the last things that had to be done (before starting program execution) was go thru the "loaded" memory image of the program and adjust all the absolute address constants.
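
The last-step adjustment of address constants can be sketched as follows. The image is modelled as a plain list of words, and `relocate`, the word values, and the load addresses are all hypothetical:

```python
def relocate(image, adcon_offsets, load_address):
    """Return a loaded copy of the image with every address constant adjusted by the load address."""
    loaded = list(image)                 # leave the on-disk image untouched
    for offset in adcon_offsets:
        loaded[offset] += load_address   # address constants are absolute once loaded
    return loaded

# words 1 and 3 hold address constants relative to the start of the program
image = [0x0000, 0x0010, 0x47F0, 0x0004]
adcons = [1, 3]

at_a = relocate(image, adcons, 0x20000)
at_b = relocate(image, adcons, 0x50000)
print([hex(word) for word in at_a])
print([hex(word) for word in at_b])
```

The two loaded copies differ wherever there is an address constant, which is why the same loaded image cannot simply appear at different virtual addresses in different address spaces.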

this os/360 convention for absolute address constants gave me lots of problems attempting to create shared executables ... where the same object appeared simultaneously in multiple different address spaces at potentially different virtual addresses. some past discussions of location/address independent code
https://www.garlic.com/~lynn/submain.html#adcon

the os/360 loader and the os/360 link-edit had a lot of common features in terms of processing input ... the loader targeted the results to a specific memory address/image ... and the linkage editor just wrote an image to disk that could be more efficiently loaded at a later time (than if it had to repeat a lot of the steps every time). the link-edit addressed the scenario where there was large number of repeated executions of an unchanged program.

this didn't do a whole lot for a university student environment where the same, unchanged program was very rarely executed more than once. to some extent the os/360 loader was introduced to address environments like student programs ... which would compile and then (maybe) run ... and never have a repeat execution w/o a compile.

and now there is the "binder" ... ibm FAQ ... what is the difference between the binder and linkage editor ...
http://www.developer.ibm.com/tech/faq/individual?oid=2:22397

the following url
http://www-306.ibm.com/software/awdtools/hlasm/share.html
is "high level assembler and toolkit feature" ... it has a number of descriptions ... including (pdf file) tutorial/overviews:

• how the linkage editor works: a tutorial on object/load modules; link editors, loaders, and what they do for (and to) you.

• the DFSMS/MVS binder and its "program object" format: what the new program model will mean to you.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

1.7rc2 & developer's build/install

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: 1.7rc2 & developer's build/install
Newsgroups: netscape.public.mozilla.general
Date: Mon, 17 May 2004 18:54:12 -0600

yesterday (sunday, 5/16) developer's build would install and run ... wasn't too bad, but could crash under really heavy load.

today both 1.7rc2 and today's developer's build repeat the following three times:

nsNativeComponentLoader: SelfRegisterDll(libnegotiateauth.so) Load FAILED with error: libcom_err.so.3: cannot open shared object file: No such file or directory

and won't load/run

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

racf

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: racf
Newsgroups: bit.listserv.ibm-main
Date: Tue, 18 May 2004 09:38:02 -0600

wfarrell@ibm-main.lst (Walt Farrell) writes:
It could work with smart-cards and readers to supply your digital certificate, and the cost of the cards and readers is getting into the reasonable range, for small deployments. I'm not sure they're inexpensive enough for large-scale deployment yet (but I haven't checked the market lately, either).

note that hardware tokens (smartcards) just supplying your digital certificate .... and nothing else is just another form of static data and shared-secret. some set of hardware tokens have tried that and found out that they were prone to eavesdropping and replay ... somebody just listens and captures the digital certificate ... installs it into a counterfeit token ... and voila they have their own "something you have" token to impersonate you.

The issue of using public/private key, asymmetric encryption, authentication is almost totally independent of whether or not a digital certificate exists.

the early x.509 identity digital certificates were targeted at a totally unconnected environment ... and the x.509 identity digital certificate supposedly contained all the necessary information for the receiving/relying party to both authenticate as well as authorize you; aka these digital certificates were designed so that the system receiving the digital certificates would not have to look up anything on any system to determine whether you were a valid user and/or what permissions you might have .... everything was carried in the certificate. there was also quite a bit of fud generated portraying public/private key authentication as equivalent to digital certificates.

so some of the problems with x.509 digital certificate take-up ... is that it has very little relationship to most real live business operations.

one of the first things that x.509 identity digital certificates encountered was that the overloading of the digital certificates with enormous amounts of identity information tended to create a severe privacy problem. so in the mid-90s you started to see appear something called a relying-party-only certificate.

In this case, you register your public key (in place of a password) in the receivers/relying account database (say the userid database). They then issue you a relying-party-only digital certificate that contains (two things) your userid and your public key (a drastic subset of the information that is contained in your userid database).

when you go to logon ...

1) you contact the server/relying party with your userid.

2) the server sends back some unique, random data.

3) you digitally sign the random data with your private key

4) you transmit the digital signature and your certificate to the server

5) the server looks up the userid information using "1" and retrieves the userid record (giving your authentication and authorization information). It then uses the public key from the userid record to validate the digital signature.
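
The logon steps above can be sketched roughly as follows. This is a toy illustration, not anybody's production protocol: it uses textbook RSA built from two Mersenne primes with no padding (not secure), and the names `USERID_DB`, `server_challenge`, `client_sign` and `server_verify` are hypothetical. What it tries to show is that the server validates the signature with the public key already on file in the userid record, so the certificate sent in step 4 is never consulted.

```python
import hashlib
import secrets

# Toy textbook RSA key (NOT secure: tiny modulus, no padding); illustration only.
P = 2**61 - 1                        # Mersenne prime
Q = 2**89 - 1                        # Mersenne prime
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent, held only by the user


def digest(data: bytes) -> int:
    """Hash the challenge and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


# Registration: the public key is stored in the userid record in place of a password.
USERID_DB = {"lynn": {"public_key": (N, E), "permissions": ["logon"]}}


def server_challenge() -> bytes:
    """Step 2: unique, random data for this logon attempt."""
    return secrets.token_bytes(16)


def client_sign(challenge: bytes, private_exponent: int = D) -> int:
    """Step 3: sign the random data with the private key."""
    return pow(digest(challenge), private_exponent, N)


def server_verify(userid, challenge, signature, certificate=None) -> bool:
    """Step 5: look up the userid record and check the signature against it.

    The certificate argument is accepted and deliberately ignored.
    """
    record = USERID_DB.get(userid)
    if record is None:
        return False
    n, e = record["public_key"]
    return pow(signature, e, n) == digest(challenge)
```

Because each challenge is fresh random data, a captured signature is of no use for a later logon, which is the difference from a token that only hands over a static certificate.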

The digital certificate is basically a stale, static subset, copy of information from your userid/account record. The only time that a relying/receiving party would ever need to resort to a stale, static subset of information contained in a digital certificate is when it otherwise doesn't have access to the real information (allowing a random user to logon to a system based purely on the contents of a digital certificate and there is no local definition for that user and there is no means of making online contact to a system that might have a definition for that user).

The usefulness of x.509 digital certificates was severely reduced in the mid-90s when it was realized that overloading them with enormous amounts of identity and/or authorization detail created huge privacy problems for the individual and/or the corporation.

Furthermore, the original design point for digital certificates was something out of the early 80s ... that of offline email. In those days, email was handled by electronic post-offices. PCs would dial-up and create a temporary connection to their post-office, exchange email, and then hangup. This was long before the days of ubiquitous internet online connectivity. The person then was sitting there with some amount of email, possibly from individuals with whom there had been no previous contact. The issue was how to perform any validation on the original sender of the email when there had not been any previous communication, and there was no recourse to an online environment.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

command line switches [Re: [REALLY OT!] Overuse of symbolic

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: command line switches [Re: [REALLY OT!] Overuse of symbolic
constants]
Newsgroups: alt.folklore.computers
Date: Tue, 18 May 2004 08:54:21 -0600

jmfbahciv writes:
If the segment is sharable, each address space had to use the same absolute address of the sharable segment. The whole point of sharable is to need only one copy in core. When did the virtual to physical address calculations get done by you (meaning the monitor)?

OS/360 had a convention that "address constants" were absolute ... they had something called relocatable adcons that were additional data structures for the image on disk that when loaded in memory ... the loader could run thru and swizzle/fix all the storage areas with absolute address constants. programs could load values from these absolute address constants and directly address areas of memory.

in the shared segment world .... if a shared thing (data or instructions) were to appear at different addresses in different virtual address spaces ... it is obvious that absolute address constants aren't going to work. so the common solution is to use relative address constants. each virtual address space has some private dictionary area and/or carries some local absolute address in a general purpose register. for relative address constants a convention is established that the address constant is added to some base register (where the base register is local to the specific address space and has the local address of the shared object). The convention is that the relative address constant is some offset from a base ... and the absolute value for the base in any specific address space is in a known register.

a trivial analogy is the whole 360 base+displacement addressing convention. nearly all 360 instructions address storage with a base register plus 12bit displacement convention (or base register plus index register plus 12bit displacement). In the assembler world, all the symbolic locations are tracked by the assembler ... and the assembler generates instructions with the correct base register and appropriate 12bit displacement.

So a not uncommon 360 convention is:


         BALR   R15,0
         USING  *,R15

which is branch and link register ... i.e. used for branching to subroutines and saving the return address of the next instruction; except in the case of register zero for a branch-to address ... it doesn't actually go anywhere ... just loads the address of the next instruction into R15. The "using" statement doesn't generate any instructions, it just tells the assembler to use the relative location of what is in R15 as the base ... and generate subsequent storage references with 12bit displacements off R15.

So a real standard calling sequence is something like:


         L     R15,=a(subroutine)
         BALR  R14,R15

which uses absolute adcon of the subroutine. as a replacement, I had to use something like:


         L     R15,=a(subroutine-base)
         AR    R15,R12
         BALR  R14,R15

where "subroutine-base" becomes a relative or offset address ... and the location of "base" for the specific virtual address space is known to be in r12.
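
A minimal sketch of why the relative form works, assuming a made-up offset and made-up base addresses (none of these numbers come from a real system). The same shared segment is mapped at a different virtual address in each of two address spaces; the constant stored in the shared code is only an offset, and each address space supplies its own base in R12.

```python
# Hypothetical model: one shared segment, two address spaces, different mappings.
SUBROUTINE_OFFSET = 0x800      # stands in for =a(subroutine-base); made-up value


def absolute_adcon_target(load_address: int) -> int:
    """Absolute adcon: the loader fixes it up once, for one load address only."""
    return load_address + SUBROUTINE_OFFSET


def relative_adcon_target(r12_base: int) -> int:
    """Relative adcon: resolved at run time against this address space's base."""
    r15 = SUBROUTINE_OFFSET    # L    R15,=a(subroutine-base)
    r15 = r15 + r12_base       # AR   R15,R12
    return r15                 # BALR R14,R15 would branch here


SPACE_A_BASE = 0x20000         # where address space A maps the shared segment
SPACE_B_BASE = 0x74000         # where address space B maps the same segment

# The single fixed-up absolute value is only right for the space it was built for.
fixed_up_once = absolute_adcon_target(SPACE_A_BASE)
works_in_a = fixed_up_once == relative_adcon_target(SPACE_A_BASE)
works_in_b = fixed_up_once == relative_adcon_target(SPACE_B_BASE)
```

The shared bytes are identical in both address spaces; only the register contents differ, which is what lets one physical copy serve both.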

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Infiniband - practicalities for small clusters

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Infiniband - practicalities for small clusters
Newsgroups: comp.arch
Date: Tue, 18 May 2004 09:07:18 -0600

Eric writes:
The VAX had the PROBER and PROBEW instructions to check for access. However this may not be much help unless the page table cannot change. Otherwise the table might change right after you check and kaboom. Whether this can happen or not depends on the OS and the exact spot in the kernel code.

the instruction on 360/67 was LRA ... load real address. the condition code from the instruction gave whether it was valid or another kernel routine had to be called to make it valid.

note however that for the cp/67 kernel most entries weren't exactly the virtual machine calling the kernel but a privilege instruction interrupt ... which the kernel then had to decode and simulate.

the program interrupt gave the starting address of the offending instruction; so the first thing that the cp/67 kernel had to do was a LRA of the instruction and access the first two bytes. the first two bits of the instruction gave the length of the instruction. instructions were half-word aligned so instructions longer than two bytes could cross page boundaries ... and therefore require the LRA to be repeated for every two bytes in the instruction.
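
A rough sketch of that decode step, assuming 4096-byte pages; the function names are hypothetical. The length comes from the top two bits of the first opcode byte (00 is 2 bytes, 01 and 10 are 4 bytes, 11 is 6 bytes). As a simplification, this version works out which pages the instruction touches rather than repeating the translation for every halfword.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def instruction_length(first_byte: int) -> int:
    """Length in bytes, taken from the top two bits of the opcode byte."""
    top_two_bits = (first_byte >> 6) & 0b11
    return {0b00: 2, 0b01: 4, 0b10: 4, 0b11: 6}[top_two_bits]


def pages_touched(virtual_address: int, first_byte: int) -> list:
    """Page numbers that need a valid translation to fetch the whole instruction."""
    if virtual_address % 2 != 0:
        raise ValueError("instructions are half-word aligned")
    length = instruction_length(first_byte)
    first_page = virtual_address // PAGE_SIZE
    last_page = (virtual_address + length - 1) // PAGE_SIZE
    return list(range(first_page, last_page + 1))
```

For example, a four-byte load (opcode x'58') starting two bytes before the end of a page needs both that page and the next one to be valid before it can be simulated.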

there were interlocks in the cp/67 kernel which allowed it to reliably access virtual memory for the purpose of instruction retrieval and emulation. start i/o instruction got especially complex since all of the channel command words and many of the arguments were effectively part of the instruction execution that needed to be accessed for simulation.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Infiniband - practicalities for small clusters

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Infiniband - practicalities for small clusters
Newsgroups: comp.arch
Date: Tue, 18 May 2004 11:44:38 -0600

Terje Mathisen writes:
OTOH, Netware was designed from the bottom up on the idea that you should never have to touch anything you didn't actually intend to do something specific with, i.e. all IO was async using scatter/gather extension lists.

there was a precursor project called DataHub (sort of departmental PC LAN-based network server) done by GPD san jose. some of the coding work was subcontracted out to an organization in provo (one of the DataHub people used to commute between san jose and provo almost every week). at some point GPD san jose decided to walk away from the effort and allowed the group in provo to retain all code from the project.

all of the GPD san jose people had mainframe background where async i/o with scatter/gather had been the norm for some time.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

IBM 7094 Emulator - An historic moment?

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: IBM 7094 Emulator - An historic moment?
Newsgroups: alt.folklore.computers
Date: Tue, 18 May 2004 11:37:15 -0600

Tom Van Vleck writes:
The CTSS machines also had a Chronolog clock attached to unit A9. This device appeared to be a tape drive to the channel: reading it returned the month, day, hour, minute, and tenth of minute in BCD. Notice that the year was not returned: we had to reassemble the CTSS supervisor every year to change the constant.

it was carried over when the guys did cp/40 and then cp/67 ... and there was a virtual chronolog device at unit 0FF sort of defined by default (at least in every cms virtual machine) ... except it was extended to return


byte   value
0-7    mm/dd/yy
8-15   hh.mm.ss
16-19  total kernel/supervisor CPU time since logon
20-23  total virtual CPU time since logon
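
A small sketch of picking that 24-byte record apart. Only the byte offsets come from the layout above; everything else is an assumption made for illustration: that the two text fields are EBCDIC (Python code page `cp037`), that the two CPU-time fields are big-endian unsigned 32-bit integers, and the function name `parse_chronolog`. The units of the CPU-time fields are left unspecified.

```python
import struct


def parse_chronolog(buf: bytes) -> dict:
    """Split a 24-byte record into the four fields of the layout above."""
    if len(buf) != 24:
        raise ValueError("expected exactly 24 bytes")
    # Assumption: text fields are EBCDIC; cp037 is one common EBCDIC code page.
    date_text = buf[0:8].decode("cp037")     # bytes 0-7,  mm/dd/yy
    time_text = buf[8:16].decode("cp037")    # bytes 8-15, hh.mm.ss
    # Assumption: counters are big-endian unsigned 32-bit values, units unknown.
    kernel_cpu, virtual_cpu = struct.unpack(">II", buf[16:24])
    return {
        "date": date_text,
        "time": time_text,
        "kernel_cpu": kernel_cpu,
        "virtual_cpu": virtual_cpu,
    }
```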

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

layered approach

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: layered approach
Newsgroups: comp.protocols.tcp-ip
Date: Tue, 18 May 2004 12:19:54 -0600

Barry Margolin writes:
>why do we need layered approach (OSI model)?

I'm sure this is explained in the first chapter of your textbook.

if it is the OSI model specifically ... one could consider that one might need OSI if they wished to outlaw LANs, WANs, and the Internet.

ISO international standards body had a rule that ISO and its national chartered bodies couldn't consider standards that violated OSI.

I was involved in HSP (high-speed protocol) attempt to work on it as standard in X3S3.3 (the us body chartered by ISO for working on level 3 & 4 standards in the OSI model ... aka network layer and transport layer).

HSP had a specification that would go directly from the level 4/5 interface to the LAN MAC interface. This violated the OSI model and therefore X3S3.3 couldn't work on it for standards.

The problem was (at least) two fold:

1) going directly from level 4/5 interface to the LAN MAC interface skipped the level 3/4 interface and therefore violated the OSI model

2) the LAN MAC interface sits logically somewhere in the middle of the OSI level 3 network layer (at least in part because it includes some network addressing function). by definition anything that interfaces to a LAN MAC interface is in violation of OSI model ... because LANs are in violation of the OSI model.

The Internet issue is that the Internet defines an "internetworking layer" (internetworking protocol layer or IP for short) that doesn't exist in the OSI model ... the osi model only has level 4, transport and level 3, network ... there is no such thing as an "internetworking layer". Therefore anything that involves IP and/or the internet is also in violation of the OSI model .. and by the ISO rules could not be considered.

basically the OSI model reflects somewhat a pre-70s communication state-of-the-art with point-to-point copper wires and disallows the invention of LANs, WANs, and the Internet (or at least the standards body for OSI disallowed inventions that violated the OSI model).

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Infiniband - practicalities for small clusters

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Infiniband - practicalities for small clusters
Newsgroups: comp.arch
Date: Tue, 18 May 2004 12:04:09 -0600

glen herrmannsfeldt writes:
I thought it was the next instruction, and you use the instruction length code in the OPSW to correct it. I think not all exceptions do it the same way, though.

yes, i was obfuscating too much detail ...

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Why does Windows allow Worms?

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Why does Windows allow Worms?
Newsgroups: comp.os.ms-windows.misc,comp.os.ms-windows.networking.misc,comp.security.misc,comp.windows.misc
Date: Tue, 18 May 2004 14:43:29 -0600

Bruce Barnett <spamhater95+U040518143634@grymoire.com> writes:
Perl has a mechanism to tag data that came from non-trusted sources. Certain operations cannot be executed because the data is tainted. It's not foolproof, and user errors in Perl coding can occur, but I suspect it's easier for a beginner to write safe Perl code than it is to write secure C code.

the converse is true ... anybody (beginner or not) can write unsafe C code ... detailed analysis at least in the 80s identified the traditional string handling library functions with implicit lengths as creating an extraordinarily unsafe environment. somewhat like when HEW came into the farming environment and mandated that all farm equipment needed protection because even experienced farmers were getting caught in one thing or another. the standard C string library functions and string handling paradigm are hazardous equipment.

detailed vulnerability analysis in the late 80s predicted that the C language environment would have a factor of ten times to a hundred times more buffer length related problems than other program environments with better length handling paradigms (because of the standard length handling paradigm that was part of the standard C environment).

something like 30 years ago, mid-70s, a situation was analysed and documented regarding the vulnerability of allowing executable code of any kind to arrive over a network interface.

computing paradigm from the 60s was systems where all software and programs were relatively carefully vetted and installed by experienced and trained staff. ordinary people might be able to use such systems, but didn't actually write code &/or install executables. the problem that started appearing with various time-sharing systems in the 70s that supported personal computing ... was that some of them actually allowed end-users to introduce executable code.

The other characteristic is that most of these time-sharing systems from the 70s (or earlier) at least started with the basic premise in their design that they had to protect different users from each other. That permeated the basic design through-out the system.

The stand-alone, dedicated personal computers from the 80s had none of these problems ... they didn't require partitioning that protected a very large number of different users from each other ... and they didn't have to worry about foreign and possibly hostile executables arriving over any network.

There are two somewhat different vulnerabilities:

1) huge number of compromises because of a fundamental flaw in the length paradigm used in C language

2) partitioning and security features that needed to a) isolate different local users from each other and eventually b) isolate a user from a foreign and hostile network environment

so there is some analogy to automobiles. long ago and far away ... automobiles required drivers that were trained in all the quirks and mechanics of an automobile. the problem was eventually that somebody wanted to sell everybody a car ... but there weren't enuf personal chauffeur/mechanics for everybody to have a car. they had to come up with cars that people could operate themselves w/o requiring a personal chauffeur/mechanic.

so if everybody was going to operate their own car ... they eventually had to

1) require a minimum of expertise ... so there were mandated training and licensing programs

2) require a huge amount of mandated safety features

3) have institutionalized vehicle safety checks

4) have a bunch of laws that could charge people with a) reckless operation of a vehicle and/or b) operation of an unsafe vehicle. These could lead to confiscating their vehicle and taking away their driving rights. It didn't matter whether people knew anything about the mechanics of a car or not ... they were still liable for operating an unsafe vehicle.

now these are applicable for operation of a vehicle in a public environment. if you have a vehicle that you will only operate in your backyard and never bring into a public environment you aren't subject to most of the regulations.

recent post about looking at entries in CVE database:
https://www.garlic.com/~lynn/2004e.html#43 security taxonomy and CVE

collection of past posts about all kinds of vulnerabilities, exploits, and fraud:
https://www.garlic.com/~lynn/subintegrity.html#fraud
collection of past posts related somewhat to the reverse ... assurance
https://www.garlic.com/~lynn/subintegrity.html#assurance

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Infiniband - practicalities for small clusters

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Infiniband - practicalities for small clusters
Newsgroups: comp.arch
Date: Tue, 18 May 2004 15:53:09 -0600

Brian Inglis writes:
That was done before SMP came along: the CP (central processor) handled I/O, and APs (attached processors) ran app code, and requested services on the CP. ISTR Lynn Wheeler's VM resource manager/HPO dynamic adaptive code switched into non-SMP mode to avoid cache thrashing when the interrupt rate or system CPU went over some threshold. ISTM it would be easier to design a single architecture and drop some hardware to customize a core, similar to the split of 370/158? CPU and I/O microcode onto separate cores to produce the next faster model (168?/3033?)

not quite ... the resource manager dynamically switched into disabled interrupt mode when the i/o interrupt rate exceeded some limit.

nominally most of the kernel ran disabled for external & i/o interrupts .. but all virtual machine execution was fully enabled for interrupts. the virtual machine was run with a time-slice ... and if nothing else interrupted the virtual machine ... there would at least be an external timer interrupt into the kernel. the kernel would then at least update some dispatching priority and possibly re-arrange things so a different virtual machine would be chosen to run.

there were two parts. the most frequently executed instructions were in the dispatcher loading up all the stuff for dispatching a virtual machine. you can see that in the CPU hotspot measurements we took as part of the ECPS effort (looking to see where the kernel spent most of its time):
https://www.garlic.com/~lynn/94.html#21 370 ECPS VM microcode assist

so frequently, if it had been in the kernel for some time ... there were some I/O interrupts that had arrived and were queued up ... and would interrupt as soon as the virtual machine was dispatched. that wasted all of those instructions "dsp+8d2" to "dsp+c84" referenced in the above ECPS URL.

So, there is an instruction "SSM" ... set system mask that can change whether you are enabled or disabled for i/o interrupts. I placed a pair of such instructions effectively just before all the work to load up a virtual machine that opened an I/O interrupt window and then immediately shut it. If there was a queued interrupt(s), the processor would go off to the interrupt handling routine and not get to the 2nd SSM instruction in the dispatcher. There was still a small possibility that an I/O interrupt would arrive during the load-up ... but that period was very short.

So the interrupt window effectively eliminated wasting the work of loading up a virtual machine if there were already any queued i/o interrupts.

So the next piece was the dynamic adaptive stuff ... if you ran the virtual machine disabled for I/O interrupts ... and only took interrupts with the dispatch interrupt window ... it would increase the latency processing for handling the interrupt. The problem was that if the environment was a very high interrupt rate ... the constant switching back & forth between application/virtual machine execution and interrupt handling could totally destroy any cache locality and system thruput. So the dynamic adaptive stuff had to make a trade-off decision between interrupt latency processing and cache hit rate associated with interrupt rate. Furthermore, if you slightly delayed interrupts ... and processed multiple in a batch ... the improved cache hit rate of the i/o interrupt processing might not only improve thruput ... but actually reduce avg. latency.

So the default was to run the virtual machine enabled for I/O interrupts which allowed for latency optimization. However, the i/o interrupt rate was monitored; at low rates it was considered to have no significant effect on cache hit ratio. It was at high i/o interrupt rates ... that the dynamic adaptive stuff would switch the system back & forth between free-for-all i/o interrupts and "structured" interrupts only thru the interrupt window in the dispatcher.
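
A very rough sketch of that kind of switching, not the actual resource manager logic. The two threshold values, the use of separate high and low thresholds (so the mode doesn't flap when the rate hovers near the limit), and the names `InterruptPolicy` and `update` are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical thresholds in interrupts per second; not taken from any real system.
HIGH_RATE = 1000.0   # above this, stop taking interrupts during virtual machine execution
LOW_RATE = 600.0     # below this, go back to free-for-all interrupts


@dataclass
class InterruptPolicy:
    # Default: virtual machine runs enabled for i/o interrupts (best latency).
    run_vm_enabled: bool = True

    def update(self, interrupts_per_second: float) -> bool:
        """Re-evaluate the mode from the measured i/o interrupt rate."""
        if self.run_vm_enabled and interrupts_per_second > HIGH_RATE:
            # High rate: take interrupts only thru the dispatcher window,
            # trading some latency for cache locality.
            self.run_vm_enabled = False
        elif not self.run_vm_enabled and interrupts_per_second < LOW_RATE:
            # Rate has dropped: latency matters more than cache effects again.
            self.run_vm_enabled = True
        return self.run_vm_enabled
```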

Part of the latency issue in a heavily loaded system was effective device utilization ... since the I/O interrupt routine was also responsible for redriving the device with any pending i/o requests. On 2305 fixed head disks ... this was compensated for by having a "multiple exposure" feature. The basic channel infrastructure only supported a single request at a time for a device. Multiple exposures created multiple logical devices for the purpose of the channel program structure ... which were all mapped to the same physical device. With a half-dozen requests constantly queued in the hardware for a device, any redrive latency in the i/o interrupt handler was less of an issue. I tried w/o much success to get the multiple exposure feature added to other high utilization devices.

Later working in the non-mainframe environment ... I did get some stuff done for multiple command queueing ... that helps address the device redrive latency.

The problem in the (370) SMP environment is somewhat more complicated with shared controllers being used to simulate shared channels ... and the original (370) SMP only had a very thin layer that had been SMP'ed ... and the rest was behind a global kernel lock (although a bounce lock with queueing rather than a spinlock). If you've turned off i/o interrupts while running in virtual machine mode ... and one processor monopolizes kernel execution (and the other is busily churning away with virtual machine execution) .... it is fantastic for cache hit ratio ... but terrible if there are queued I/O interrupts on channels dedicated to the processor not getting into the kernel. So you have to play some games about making sure that a processor always gets some window to drain pending i/o interrupts on dedicated channels (again you are trading off cache efficiency against i/o service latency).

For 370 158 & 168s ... IBM announced both SMP configurations and what they called "attached processor" configuration. An "attached processor" configuration was fully shared-memory SMP ... but one of the processors had no (dedicated or otherwise) channels at all. In this case, the dynamic adaptive code didn't have to worry about a processor getting i/o interrupt starved ... because one of the processors didn't have any channels. In this scenario the dynamic cache efficiency optimization wasn't to manage just the bad effects of i/o interrupts but also to optimize cache from the effects of cross processor migration (execution of same code constantly changing between processors and losing cache locality).

Now, 370s, 303x, 3081s, etc ... in two processor configurations ran caches at .90 times that of a uniprocessor (to allow for all the cross-cache chatter); therefore a two-processor system was only 1.8 times a single processor system. An SMP kernel might throw in a lot of other overhead ... so a two-processor SMP system might have only 1.3-1.5 times the effective thruput of the same workload on a uniprocessor.

With all sorts of tricks and magic in the SMP code ... and the dynamic adaptive stuff doing some more magic ... there were 370 "attached processor" configurations that were running at more than twice that of a uniprocessor ... the magic with cache locality more than offset the hardware only being 1.8 times (and whatever minimum magic additional kernel SMP pathlength there was).

For SP1 there was a rewrite to make it a less magic and more elegant, traditional SMP implementation. It had the adverse downside that most of the cache locality optimization was lost and typical customers were finding that in a two processor system ... both processors were spending ten percent of elapsed time executing new SMP kernel overhead that they hadn't been executing before.

... oops, longer than i thot it was going to be, guess i got carried away again.

The 370/158 supported integrated channels ... the processor engine inside the 158 ran microcode that implemented both the channel function as well as the 370 processor instructions. The 370/158 was also on the knee of the manufacturing cost/performance curve ... something like some automobile assembly lines.

For the 303x line of computers ... they decided to create a channel director ... which was actually a repackaged 370/158 engine running only the integrated channel microcode (supporting six channels). The 3031 was a 370/158 repackaged to only have the 370 microcode and to use external channels (in the channel director) rather than integrated channels (time-shared in the processor). The 3032 was a 370/168 repackaged to use the (ersatz 370/158) channel director. The 3033 started out being the 370/168 wiring diagram remapped to newer technology chips that were 20% faster. Along the way, they decided that the 3033 needed to be more than 20% faster than the 370/168/3032. The newer chip technology had something like ten times the circuit density of the 168 chips ... but wasn't being utilized because of the simple remapping of the 168 wiring diagram. Some amount of the 3033 performance boost project was redoing the wiring so that it did a lot more on-chip operations. The resulting 3033 was more like 50% faster than the 168 instead of only 20% faster.

I had previously mentioned VAMPS, a 5-way shared-memory processor project where I sort of originally created the whole structure for bounce lock and really lightweight queued kernel requests. It didn't get announced. Somewhat in that timeframe there was also work on something called logical machines .... a 16-way 370/158 shared memory multiprocessor w/o cache consistency (which also didn't get announced). This was sort of the tail-end of the VAMPS effort and before the official product VM/370 SMP support got underway.

For logical machines, the VM kernel would reliably use compare&swap ... which had cache specific semantics on everything it needed to in the kernel ... and there was feature for cache flush. So the CP kernel would understand about shared memory across the 16 processors ... and would dispatch different virtual machines on specific processors ... which were essentially all private, non-writeable shared virtual address spaces. R/O shared virtual memory was fine and non-shared virtual memory was fine. Anything that might want to operate in this machine with read/write shared memory had to faithfully use compare&swap for all such memory operations ... since not only did it do the compare&swap semantics ... but it was also the only instruction that had any cross-cache semantics.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Infiniband - practicalities for small clusters

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Infiniband - practicalities for small clusters
Newsgroups: comp.arch
Date: Tue, 18 May 2004 17:35:04 -0600

slight digression ... there were some other systems and kernels that had SMP support for these machines.

a specific one had really, really bad thruput and performance on both uniprocessor and two-processor. however, they went around highlighting the fact that they had really, really great SMP support (support was carefully used in place of thruput) because their two-processor thruput was 3.8 times their single processor thruput. it was sort of meant to confuse customers between the issue about really great SMP thruput and really great SMP support.

actually their claim to great SMP support was that their two-processor thruput was 3.8 times their single processor thruput.

what they didn't say was that the single processor benchmark was on a 1mbyte machine and their two-processor benchmark was on a two mbyte machine (if you have twice the machine ... then you double both the number of processors as well as the amount of memory).

so the real issue was that they also had a really bloated kernel with large fixed storage requirements and there was barely any storage left for applications with only one mbyte of real memory. on a two processor machine with only a single copy of the kernel and twice the real storage, they had nearly ten times the amount of real storage available for applications ... and that was the real reason the two processor benchmarks had 3.8 times the thruput of the single processor benchmarks.
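
The arithmetic behind that can be worked backwards. The post gives no actual kernel size, so this is only a sketch under the assumption of a fixed kernel footprint K that is the same on both machines: applications get 1-K mbytes on the small machine and 2-K on the large one. The function names are hypothetical.

```python
from fractions import Fraction


def app_storage_ratio(kernel_mb, small_mb=1, large_mb=2):
    """Application storage on the large machine relative to the small one."""
    return (large_mb - kernel_mb) / (small_mb - kernel_mb)


def kernel_size_for_ratio(ratio, small_mb=1, large_mb=2):
    """Solve (large - K) / (small - K) = ratio for the fixed kernel size K."""
    return (ratio * small_mb - large_mb) / (ratio - 1)


# A ratio of ten corresponds to a fixed kernel footprint of 8/9 mbyte.
kernel_for_tenfold = kernel_size_for_ratio(Fraction(10))
```

Under that assumption, "nearly ten times" means the fixed kernel was taking a bit under 8/9 of the one mbyte machine, leaving only a little over a tenth of real storage for applications.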

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

command line switches [Re: [REALLY OT!] Overuse of symbolic

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: command line switches [Re: [REALLY OT!] Overuse of symbolic
 constants]
Newsgroups: alt.folklore.computers
Date: Tue, 18 May 2004 21:34:49 -0600

Peter Flass writes:
This is the way VM's DCSS's (Discontinuous Saved Segments) work, and it's a mess. The creator has to assign the address when the DCSS gets built. When you get a lot of them you get overlaps or segments that want to occupy the same virtual address space as other segments, and you've got trouble.

I think what Lynn was talking about are programs that contain no absolute addresses, so they can be loaded anywhere and, better yet, mapped to different virtual addresses in different processes while still occupying only one set of physical addresses. With the 360, etc. architecture it's potentially possible, since the instructions only contain base/displacement addresses, but it's not easy. I think it's also possible on a PC. Is this the meaning of PIC (position independent code) that Linux uses for shared objects?

i had done the precursor to DCSS .. and called it virtual memory management ... and it was part of some other stuff i did for a cms page mapped filesystem
https://www.garlic.com/~lynn/submain.html#mmap

the issue in 360 & 370 is you have a program


         balr  r12,r0
         using *,r12
         .....
         l     r3,=a(abcd)
         tm    0(r3),x'ff'
         ...
         bunch of instructions
         ...
abcd     dc    f'0'

         ltorg
         =a(abcd)

... now lets say that abcd is at location x'800' within the program and the literal pool is at x'a00' in the program ... then the assembler will generate something like


5830C9fe

i.e.
58   load opcode
3    into register 3 from storage
0    no index register
C    base register 12
9fe  displacement added to contents
     of reg 12 to form address from
     which storage is to be loaded.
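The field breakdown can be checked mechanically. A minimal sketch, assuming the balr sits at offset 0 of the program (so register 12 holds program start plus 2, since balr is a 2-byte instruction); `decode_rx` is a name made up for illustration.

```python
def decode_rx(word):
    """Split a 32-bit RX-format instruction into its fields."""
    return {
        "opcode": (word >> 24) & 0xFF,  # 8-bit operation code
        "r1": (word >> 20) & 0xF,       # target register
        "x2": (word >> 16) & 0xF,       # index register (0 = none)
        "b2": (word >> 12) & 0xF,       # base register
        "d2": word & 0xFFF,             # 12-bit displacement
    }

fields = decode_rx(0x5830C9FE)

# Assumption: balr is at program offset 0, so r12 = program start + 2
base_offset = 2
literal_offset = base_offset + fields["d2"]
print(fields, hex(literal_offset))
```

With that assumption the displacement x'9fe' plus the base offset of 2 lands on x'a00', the literal pool location used in the example.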

now the standard for os/370 and cms was that the storage location =a(abcd)/a00 got stuck with a value something like 800 and some dictionary stuff that said that storage location a00 contained a value relative to the beginning of the program (aka relocatable adcon). when the program was loaded into memory ... the loader would run thru all the relocatable adcons listed by the dictionary and physically adjust them so that they were absolute addresses ... i.e. if the program got loaded at x'1000000' ... then the loader would add x'1000000' to the contents of storage location x'a00' (resulting in x'1000800') before starting the program. at the time the program is running all "relocatable adcons" were now absolute values ... not relative to anything.
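The loader step described above can be sketched as follows. This is an illustration only: the image is a plain byte array, the dictionary is reduced to a list of adcon offsets, and `load_and_relocate` is a hypothetical name, not the real os/360 or cms loader interface.

```python
import struct

def load_and_relocate(image, adcon_offsets, load_address):
    """Return a copy of the image with each listed 4-byte adcon bumped by load_address."""
    loaded = bytearray(image)
    for off in adcon_offsets:
        (value,) = struct.unpack_from(">I", loaded, off)   # big-endian fullword
        struct.pack_into(">I", loaded, off, value + load_address)
    return loaded

# Program image on disk: the adcon at x'a00' holds the offset of abcd (x'800')
image = bytearray(0xA04)
struct.pack_into(">I", image, 0xA00, 0x800)

loaded = load_and_relocate(image, [0xA00], 0x1000000)
adcon = struct.unpack_from(">I", loaded, 0xA00)[0]
print(hex(adcon))
```

After the swizzle the in-memory image differs from the disk image, and the adcon is only valid for that one load address.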

so if I had a program that occupied the same r/o shared segment at different virtual addresses in different virtual address spaces ... all the (absolute) relocatable adcons ... could at best take on a single value ... which means it would be limited to working at one specific (virtual) address location. so it doesn't work ... at least not until we get quantum computing and the adcon will know what address it is supposed to be based on what address it is being used from.

So the issue is how to make address constants work in an address free environment ... more so than just the 12bit displacement paradigm supported by instruction storage address calculation.

so to make it work .... i take a page out of the instruction decoding and force address constants (which can be 24bits ... or later 31bits) to be 24bit displacements ... rather than 24bit absolute addresses. first off, you have to turn off the whole relocatable adcon dictionary infrastructure.

if i specified something like =a(abcd), the assembler would store the displacement and leave around the information for the loader to add the BASE address to it at load time before the program started running. however, if I specified =a(abcd-base), it would treat the value as a plain displacement (an absolute, non-relocatable value) and not leave any work orders for the loader. The unloaded value would be the same .... however, the first form is fixed into an absolute address by the loader when the program is brought into memory. the second form the loader leaves alone and it remains a displacement.

now to get the real usable value for the second (displacement) form, there has to be some inline code at run time that adds the base register to the displacement:


l     r3,=a(abcd-base)
ar    r3,r12
tm    0(r3),x'ff'

the issue is that the original 360 displacement addressing only allowed for 12bit displacements (that are physically part of the instruction) to be automatically handled by hardware address calculation.
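The effect of the displacement form can be sketched as follows. It assumes the base register holds whatever address the label `base` ended up at in the current address space, and `address_of_abcd` is a hypothetical stand-in for the load/add-register sequence.

```python
ABCD_OFFSET = 0x800  # value of =a(abcd-base) in the image; no loader ever changes it

def address_of_abcd(base_register):
    """Mimic  l r3,=a(abcd-base) / ar r3,r12 : displacement plus base register."""
    r3 = ABCD_OFFSET            # l   r3,=a(abcd-base)
    r3 += base_register         # ar  r3,r12
    return r3 & 0xFFFFFF        # 24-bit addressing

# Same unmodified image, seen at two different virtual addresses
addr_a = address_of_abcd(0x100000)
addr_b = address_of_abcd(0x200000)
print(hex(addr_a), hex(addr_b))
```

The stored constant is identical in both cases; only the register contents, which are private to each address space, differ.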

So I went thru some amount of CMS kernel code, the original CMS editor, some number of other programs and converted them from traditional os/360 "relocatable adcons" to fixed displacements with inline, runtime code that calculates the real address. I also later did something similar to iso3270, browse and fulist package. Some amount of this code wasn't "read only" ... so I had to do various restructures for it to reside in read-only shared segments.

The DCSS shared segment code was based on having captured a virtual memory snapshot of the code and putting it away in a special VM/370 kernel structure. The original code included support from the CMS page mapped filesystem ... instead of having to record the shared segments in special VM/370 control structures ... I just put virtual memory images out into the CMS filesystem ... and applications could load virtual memory images directly out of the page mapped filesystem ... and optionally specify whether they were shared or not shared.

So originally there was a bunch of VM/370 kernel code to support the "shared segment" extensions, the page mapped file system for cms. In addition, there was a bunch of cms kernel and application code converted to be "read only" and be address free ... as well as the support for the page mapped filesystem ... and the CMS support to load application images from the filesystem and specify whether they were "shared" or "not shared".

The CMS people wanted to pick up all the code for DCSS ... but the VM/370 kernel people only wanted to add a drastic subset of the changes to the VM kernel. So the VM/370 kernel people didn't pick up the page mapped filesystem support and picked up only a very restricted subset of the shared segment changes ... and then they cribbed the DCSS diagnose instruction to interface to that drastic subset.

The CMS people picked up most of the shared segment code ... but left out all the page mapped filesystem support (since the necessary vm/370 kernel support had been dropped). Now, this did create something of an anomaly ... all of the initial CMS changes for DCSS shared segments had code changes for both read-only code as well as eliminating absolute adcons (allowing the same exact memory image to execute simultaneously in different virtual address spaces at different virtual addresses).

So one of the anomalies in the CMS DCSS code was the SVC$202 in page zero. Normal calls in CMS involved loading the address of a parameter list in register zero (which also specified the function) and executing SVC 202. This could be of the form:


         SVC   202
         instructions

or

         SVC   202
         DC    AL4(*+4)
         instructions

the cms kernel svc handler would look at the first byte after the svc instruction; if it was zero (this is 24bit addressing ... so a 32bit adcon would always have the high byte zero), it would treat the next four bytes as an adcon and, for a normal return, skip the 4byte adcon. The address following the svc call was for error returns. If there was an error in the processing, and there was an adcon, the kernel would load the address in the adcon to return to. If there was an error in the processing, and there was no adcon, the kernel would go off to some standard kernel error handler and never return to the program. I could go thru all the applications and remove the DC AL4(*+4) which eliminates the relocatable address problem ... but all application specific error handling and recovery is lost. As an aside, while the interface allowed any address to go into DC AL4(), common usage would have both error returns and non-error returns come to the following instruction which would then check the return code in the register for normal/error.
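The return-address decision just described can be sketched as below. This models only the choice of where to resume, not the real CMS handler; `svc202_return_address` and `KERNEL_ERROR_HANDLER` are hypothetical names, and memory is a plain byte array.

```python
import struct

KERNEL_ERROR_HANDLER = None  # stand-in for "never returns to the program"

def svc202_return_address(memory, after_svc, error):
    """Pick where to resume, given the address of the byte following the SVC."""
    has_adcon = memory[after_svc] == 0          # high byte of a 24-bit adcon is zero
    if has_adcon:
        if error:
            (target,) = struct.unpack_from(">I", memory, after_svc)
            return target                       # error return goes thru the adcon
        return after_svc + 4                    # normal return skips the adcon
    if error:
        return KERNEL_ERROR_HANDLER             # no adcon: kernel handles the error
    return after_svc                            # normal return to the next instruction

# SVC 202 (x'0ACA') at x'10', followed by DC AL4(*+4) at x'12'
with_adcon = bytearray(0x20)
with_adcon[0x10:0x12] = bytes([0x0A, 0xCA])
struct.pack_into(">I", with_adcon, 0x12, 0x12 + 4)

# SVC 202 at x'10', followed directly by an instruction (nonzero opcode byte)
without_adcon = bytearray(0x20)
without_adcon[0x10:0x12] = bytes([0x0A, 0xCA])
without_adcon[0x12] = 0x47
```

In the common DC AL4(*+4) usage both the normal and the error return end up at the same instruction, which then checks the return code.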

In any case, I did a hack ... I put a dummy svc in CMS kernel page zero ... in CMS NUCON and called it:


SVC$202            SVC   202
ERR$202            DC    AL4(*+4)
BR14$202           BR    R14

and then the inline application code I changed from


         SVC   202
         DC    AL4(*+4)
         instructions

to

         BAL   R14,SVC$202
         instructions

aka ... go off to the NUCON svc instruction ... which would return to the branch on register 14 ... which had the value of the next instruction from the branch and link operation.

that was part of the distributed CMS DCSS ... which wasn't required for fixed address shared-segments ... I had only invented it to handle the case for "floating" shared segments.

Later, the SVC202 processing was redefined ... and allowed for


SVC  202
DC   AL4(1)
instructions

the "1" is easily recognizable as not a valid address ... and the kernel should return to *+4 after the svc whether there is an error or not.

a couple past posts on shared segments and some nucon SVC$202 hack that I had done for address free code.
https://www.garlic.com/~lynn/2001f.html#9 Theo Alkema
https://www.garlic.com/~lynn/2002o.html#25 Early computer games
https://www.garlic.com/~lynn/2003f.html#32 Alpha performance, why?
https://www.garlic.com/~lynn/2003g.html#27 SYSPROF and the 190 disk

somebody want to see some general cms application assembler code with the use of adcons, svc 202s, and other stuff ... this is a version of Kermit for CMS dated 1982:
http://www.ibiblio.org/pub/academic/computer-science/history/pdp-11/rsx/decus/rsx83b/356040/cmskermit.asm

over the years people have found other uses for svc$202; various old postings in the vmshare archive referencing svc$202
http://vm.marist.edu/~vmshare/read.cgi?fn=INTRANS&ft=MEMO&line=116
http://vm.marist.edu/~vmshare/read.cgi?fn=STRANGE&ft=MEMO&line=125
http://vm.marist.edu/~vmshare/read.cgi?fn=STRANGE&ft=MEMO&line=171
http://vm.marist.edu/~vmshare/read.cgi?fn=STRANGE&ft=MEMO&line=236
http://vm.marist.edu/~vmshare/read.cgi?fn=TT&ft=NOTE&line=17
http://vm.marist.edu/~vmshare/read.cgi?fn=CMSSPR2&ft=MEMO&line=147
http://vm.marist.edu/~vmshare/read.cgi?fn=DVF&ft=MEMO&line=225

... how 'bout source for program originally written 1976 .. but posted version in the archive has some changes dated in mid-80s.
http://vm.marist.edu/~vmshare/read.cgi?fn=PRTRCVR&ft=NOTE&line=1

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

command line switches [Re: [REALLY OT!] Overuse of symbolic

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: command line switches [Re: [REALLY OT!] Overuse of symbolic
constants]
Newsgroups: alt.folklore.computers
Date: Wed, 19 May 2004 09:54:39 -0600

jmfbahciv writes:
How one views the mapping depends on which hat you have on.

From the kernel's POV the mapping is NOT physical-->virtual; it is exactly the other way around. The user sees a virtual; the monitor has to see a physical (CPUs do not execute virtual code).

For a user program to be "sharable" it can't have hardwired absolute addresses. If it does, then the monitor has to put that code at exactly that physical location.

the issue is not physical addresses; it is fixed addresses ... if they are virtual fixed addresses ... the virtual->real translation hardware converts them to some physical address.

a standard 360/370 address constant was typically a 32bit word containing a 24bit address ... absolute/fixed ... relative to "zero". if it was running real ... it was relative to real location zero. if it was running virtual, it was relative to virtual location zero.

these were called "relocatable adcons" ... in the sense they were stored as relative/displacement addresses ... and an administrative dictionary was kept for the loader. when the loader/binder/etc brought the image into memory ... it would access the administrative dictionary and run thru the program image converting all the displacements to absolute (relative to zero) before starting the program.

the 360/370 had a convention in instructions for base+displacement that the hardware automatically resolved to an absolute address at instruction decode. the standard form was


BDDD

B  ... 4bit/16 register
DDD .. 12bit displacement

the contents of the register was combined with the displacement to form the effective address (relative to zero) at instruction decode time. each dynamic image of a program had its own private copy of registers ... so, from an instruction decode standpoint, (most of) the program image was not tied to a specific address (relative to zero, whether virtual or real).
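A small sketch of that address resolution, assuming 24bit addressing and ignoring any index register; `effective_address` is an illustrative name. The same BDDD bits yield different addresses purely because the private register contents differ.

```python
def effective_address(regs, bddd):
    """Resolve a 16-bit BDDD field: 4-bit base register number, 12-bit displacement."""
    base = (bddd >> 12) & 0xF
    disp = bddd & 0xFFF
    base_value = regs[base] if base != 0 else 0   # register 0 as base means "no base"
    return (base_value + disp) & 0xFFFFFF         # 24-bit addressing

# Two executions of the same image, each with its own register contents
regs_a = [0] * 16
regs_a[12] = 0x100000
regs_b = [0] * 16
regs_b[12] = 0x200000

print(hex(effective_address(regs_a, 0xC9FE)), hex(effective_address(regs_b, 0xC9FE)))
```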

the thing that tied a program image to a specific absolute address (relative to zero, whether virtual or real) was the os/360 convention of relocatable adcons ... which in the executable program image was always converted from a displacement to an absolute address (relative to zero).

the issue that I was grappling with was that I wanted the program image to be completely location free ... not only all the address processing in instructions (which had convention of register plus 12bit displacement that was resolved dynamically to an absolute address at instruction execution) ... but also all the "relocatable adcons" (which were being converted to absolute by the standard os/360 loader process).

each program image (address space) had its own private copy of registers. the instructions in the program image were mostly location free because of the instruction convention of addresses being register contents plus 12bit displacement. the issue that prevented a program image from being truly location independent was that they tended to be sprinkled with these 32bit words containing absolute addresses (rather than displacements, they were relative to zero, whether virtual or real).

so i had to go thru code that i wanted to make totally location independent/free and fixup all uses of address constants ... by forcing them to be displacements and modifying the code to do some inline code sequence that added the contents of a register to the displacement value to form the absolute address value. From a programming point of view, absolute address is the same whether running virtual or real. When executing virtually, the hardware translation takes a virtual absolute address (relative to virtual zero) and converts it to an absolute real address (relative to real zero).

so if i have a program image that is part of a read-only, shared, virtual memory segment ... then if the program image starts at x'100000' in one virtual address space and the same program image starts at x'200000' in a different virtual address space ... then all the instruction storage addresses will work correctly because there will be some register with x'100000' (or x'200000' depending on the address space) ... and all the 12bit displacements will be added to the register value and converted to the correct absolute address for the virtual address space that it is executing in.

however, the program image contains a number of different modules, say each 4kbytes in size or larger (larger than 12bit displacement). to branch between such modules the conventional instruction sequence picks up an absolute address contained in the program image and branches to that location. Since it is an absolute address ... it will need to be relative to zero ... and can either be address of the form x'1nnnnn' or x'2nnnnn' ... but it can't be both (at least w/o having something like quantum effects).

In conventional os/360, these are referred to as relocatable adcons ... supposedly giving the program-image location independence ... but they are only stored as a displacement with respect to some base before they have been loaded into memory. as part of the loader bringing the program-image into memory, the loader runs thru a dictionary of all adcons and converts them from relative to something within the program ... to absolute (relative to location zero). during normal execution, all of the relocatable adcons have become absolute.

For my purposes, to have a read-only program image occupy different (virtual) addresses in different virtual address spaces simultaneously ... the program image had to be totally location independent (even at run time). All the relocatable adcons had to still be in displacement form at runtime ... not absolute form.

One possible solution that some other technologies have used is totally separate instructions and data (not allowing the intermixing of data and instructions that can occur in 360 programs). Address constants are placed in a special part of the address space that is not-shared and private to the virtual address space. The programming convention has a process where the program can find the location of its address constants. These address constants can be absolute, but since they aren't part of the program image, they can have the value that is specific to that address space.
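That alternative can be sketched roughly as follows. The names, the offsets and the dictionary layout are all made up for illustration; the point is only that the absolute values live outside the shared image, one private table per address space.

```python
# Shared, read-only image: holds offsets only, identical bits for every address space
SHARED_IMAGE = {"abcd": 0x800, "entry2": 0x1400}

def build_private_adcons(load_address):
    """Build one address space's private table of absolute address constants."""
    return {name: load_address + off for name, off in SHARED_IMAGE.items()}

# The same image mapped at different virtual addresses in two address spaces
space_a = build_private_adcons(0x100000)
space_b = build_private_adcons(0x200000)
print(hex(space_a["abcd"]), hex(space_b["abcd"]))
```

Each table can hold ordinary absolute addresses because nothing in it is shared.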

the problem that i was having in 360/370 with location independent read-only, shared program image was a combination of the convention of having absolute address constants (at least at runtime) and the address constants were part of the program image.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Infiniband - practicalities for small clusters

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Infiniband - practicalities for small clusters
Newsgroups: comp.arch
Date: Wed, 19 May 2004 14:51:40 -0600

hack@watson.ibm.com (hack) writes:
No, because it is in processor-private real memory (z/Series page 0).

360/370 architecture (and later) used real page zero for a lot of processor specific operations ... for example on an interrupt to a specific processor ... the current processor status is stored in real page zero.

so for 360 SMP ... there are multiple real processors ... sharing the same flat real address space ... where there are multiple processors (each needing their own page zero) and there is only a single page zero.

the solution was a processor specific page zero "prefix register" ... where the processor loaded a specific (real) page address which was used to modify all absolute real addresses for page zero. of course processors in the same smp complex all agreed to choose different real page addresses for their page zero prefix registers.

On 360 SMP ... any real addresses that mapped to real page zero were rerouted to the page specified by the processor's page zero prefix register. Any real address that mapped to the value in the prefix register ... also went to that real address.

The operation of the SMP page zero prefix register was changed in 370. Real addresses that mapped to real page zero were rerouted to the page specified by the processor's page zero prefix register (just like 360) ... however 370 introduced reverse prefix register mapping. In 370, for a real address that mapped to the page specified in the page zero prefix register ... instead of going to that address ... it went to the "real, real" page zero ... aka the page zero that is the same for all processors in the complex.

In the 360 SMP scenario ... the real, real page zero was somewhat lost. The enhancement for 370 SMP ... was the real, real page zero could become a convenient place to store common, cross-complex information.
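The two prefixing behaviours described above can be sketched as address-mapping functions. This assumes a 4kbyte page zero and a page-aligned prefix value; the function names are illustrative and the code follows the description in this post rather than the principles of operation text.

```python
PAGE = 4096

def prefix_360(addr, prefix):
    """360 SMP as described: page zero is rerouted, the prefix page maps to itself."""
    if addr < PAGE:
        return prefix + addr
    return addr

def prefix_370(addr, prefix):
    """370: page zero is rerouted and the prefix page is reverse-mapped to real page zero."""
    if addr < PAGE:
        return prefix + addr
    if prefix <= addr < prefix + PAGE:
        return addr - prefix
    return addr

cpu_prefix = 0x5000  # hypothetical prefix register value for one processor
print(hex(prefix_360(0x10, cpu_prefix)), hex(prefix_370(0x5010, cpu_prefix)))
```

In the 360 version nothing ever maps to the real, real page zero, which is the "somewhat lost" page mentioned above; in the 370 version it is reachable thru the processor's own prefix page.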

here is specific section in esa/390 principles of operation discussing prefixing
http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/DZ9AR004/3.7?SHELF=EZ2HW125&DT=19970613131822

here is the search page from esa/390 principles of operation for words prefix register
http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/SEARCH?Book=dz9ar004&searchRequest=prefix+register&SEARCH=Search&Type=FUZZY&SHELF=EZ2HW125&DT=19970613131822&searchTopic=TOPIC&searchText=TEXT&searchIndex=INDEX&rank=RANK

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

command line switches [Re: [REALLY OT!] Overuse of symbolic

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: command line switches [Re: [REALLY OT!] Overuse of symbolic
constants]
Newsgroups: alt.folklore.computers
Date: Thu, 20 May 2004 09:39:00 -0600

jmfbahciv writes:
Then that code isn't a sharable segment. A sharable segment means that the only thing JMF had to do was update the users's PMP (page map page) with the PMP of the sharable segment and that the PMP of the sharable segment could be put into more than one user's PMP. I can't recall how it was done on the KA (before paging hardware). But a similar thing happened. The sharable segment was in core at ADDRfoo; all users who had that high seg "in core" pointed to the same physical address. Note that these segments were read-only.

360/370 sort of had location independent code .... in 360, the kernel could bring code into an arbitrary location in real memory (for the most part because instructions were location independent, having addresses of the form register+displacement) ... and some of it was even "shareable" by all applications running in real memory. the issue that prevented these program images from being totally location independent was these things called "relocatable adcons" which were given an absolute address at load time.

the issue for cp/67 & vm/370 ... was using the virtual memory hardware and segment sharing to share between different virtual address spaces. on 370, vm/370 used 64kbyte segments & in 24bit addressing that provided up to 255 segment "objects".

The problem with relocatable adcons ... and the subset of virtual memory management released in the product as DCSS ... was that at installation time, a fixed/absolute virtual address had to be chosen for a shareable program. Now any program image might be one or more 64kbyte shared segments ... and there were lots more program images to be defined as shared than there were unique virtual addresses. For a specific user, CMS tended to have all applications that person needed running in a single virtual address space.

The issue became when a person needed combinations of shared program images simultaneously in the same address space. For the first dozen or so ... the system installation process could fabricate unique virtual addresses for the desired combinations. However, past a certain point there weren't sufficient system-wide unique virtual addresses (combination of one or more 64kbyte contiguous segments in 24bit address space) to satisfy all possible user requirements at the installation. No specific user might want more simultaneous shared objects than could fit in a single 24bit address space ... but for the aggregate of all users wanting some combination subset of all possible shared objects ... being able to assign a unique system-wide virtual address for every possible shared object became impossible.
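The exhaustion problem can be sketched with a simple sequential assignment. A 24bit address space divides into 256 segments of 64kbytes; the sketch assumes the first one is not available for shared objects, which would match the 255 figure (that reason is an assumption, as are the function name and the object sizes).

```python
SEGMENT = 64 * 1024
ADDRESS_SPACE = 1 << 24
TOTAL_SEGMENTS = ADDRESS_SPACE // SEGMENT   # 256 segment slots in 24bit addressing

def assign_fixed_addresses(sizes_in_segments, first_free=1):
    """Give every shared object its own system-wide contiguous segment range, or fail."""
    assigned, next_seg = {}, first_free
    for name, nsegs in sizes_in_segments.items():
        if next_seg + nsegs > TOTAL_SEGMENTS:
            raise ValueError(f"no unique virtual address left for {name}")
        assigned[name] = next_seg * SEGMENT
        next_seg += nsegs
    return assigned

# A handful of shared objects fits easily
small = assign_fixed_addresses({"edit": 2, "browse": 1})

# 100 shared objects of 3 segments each cannot all get unique addresses,
# even though any one user only wants a few of them at a time
many = {f"app{i}": 3 for i in range(100)}
try:
    assign_fixed_addresses(many)
    exhausted = False
except ValueError:
    exhausted = True
```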

So the ways the original virtual memory management addressed the limitation of having at most 255 possible shared objects were twofold:

1) shared-objects could be normal program images in the cms filesystem ... rather than in a global, system-wide kernel defined facility. subsets of the user community could have their own set of program images that were defined as shared ... and didn't need to follow a single, system-wide convention defined for all users at the original installation of the application

2) it was possible to go thru various applications and modify the relocatable adcon convention ... to use address constants that were displacements/offsets/relative to some value that would be in a register ... and have inline code to add the displacement value occupying the program image to some value in a register. the net result was that the program image became totally location independent ... and the same program image could occupy a read-only shared segment and that shared-segment could be defined at arbitrary different virtual addresses in different virtual address spaces.

The whole relocatable adcon philosophy was invented in the days of os/360 and real memory ... where a single real memory was shared by all simultaneously running applications. The program image on disk could be loaded at an arbitrary location ... so any arbitrary combination of programs could occupy whatever (real) addresses were available. The installation didn't have to go thru every installed program at installation time and assign an arbitrary unique address ... which could prevent some combinations of applications from running simultaneously if they were forced to pre-assign a fixed, absolute address to every program.

However, the os/360 people hadn't yet considered virtual address spaces and shared segments ... where different virtual address spaces might want to have arbitrary different combinations of program images ... but still want to have the physical space taken up by such program images be shared across all simultaneous users. While the instruction architecture provided for program images that were location independent ... the os/360 designers took some short cuts in defining the convention for location related data (i.e. the relocatable adcons).

So, I spent a fair amount of time trying to overcome the short cut taken with the relocatable adcons convention actually being absolute addresses at execution time ... and being the chink preventing executing program images from being totally location independent.

Note that the relocatable adcon convention also prevented the program image on disk being exactly the same as the executing program image (since there was the little matter of doing the relocatable adcon swizzle from offset to absolute at load time). This then created opportunities with the page mapped filesystem and simply pointing the paging system at a program image in the filesystem and depending on the paging system to do all the work (not requiring the loader to swizzle the adcons when the page was brought into storage for execution): other posts about page mapped filesystem
https://www.garlic.com/~lynn/submain.html#mmap
past posts about location independent program images
https://www.garlic.com/~lynn/submain.html#adcon

A sort of side story ... somewhere along the way I was doing some work on logical machines (part of a recent thread in comp.arch on symmetric multiprocessing) ... and there was a corporate advanced technology conference ... where it was asked that logical machines be presented. Also on the agenda was a presentation of 801 and cp.r.

so during the logical machine presentation, there was some heckling by somebody in the 801 group ... who stated that they had examined the product vm/370 code at the time and it contained no SMP support ... so they didn't believe that we could run a vm/370 system on a logical machine (a non-cache coherent, 16-way smp ... with 16 370/158 engines all tied together). the response was that the code in the vm/370 could be modified to support multiprocessor (the heckler from the 801 group was basically expressing the opinion that they didn't believe that somebody could write a couple thousand lines of code modifications to the vm/370 kernel).

so it came time for the 801 group to give their presentation ... and they said that the machine was 32bit virtual address and it had 16 256mbyte segments ... which were implemented as 16 segment registers (rather than a table of 16 segment table pointers). I returned the courtesy of heckling them about 1) the huge number of lines of code being proposed for cp.r ... sure seemed like a lot bigger effort than the rather modest amount of changes needed to enhance vm/370 to support 16-way multiprocessor (and which they sort of had implied disbelief that it could be done) and 2) having only the limited 16 possible segments .... seemed to severely restrict the total number of useful different shared objects that could be defined in such an environment (which came from trying to grapple with the problem of trying to manage an environment with a maximum of 255 unique shared objects in the 370 world).

the reply was that there were no protection domains in 801 at runtime; that all application, runtime code could change the contents of segment registers as easily as it could change the contents of general registers. the convention allowing free access to any location in a 32bit (virtual) address space by simply changing the contents of a general register to address the location ... was extended to the segment registers to give application inline code access to any of the possible system segments (i.e. applications could as easily do cross address space addressing as they could do intra-address space addressing).
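The segment arithmetic in the 801 description can be sketched as below: with 16 segments in a 32bit address, four bits select the segment register and the remaining 28 bits give the 256mbyte offset. Using the high-order four bits as the selector is an assumption of this sketch, and `split_801_address` is an illustrative name.

```python
SEGMENT_SIZE = 1 << 28   # 256 mbytes per segment

def split_801_address(addr):
    """Split a 32-bit virtual address into (segment register number, offset)."""
    return (addr >> 28) & 0xF, addr & (SEGMENT_SIZE - 1)

seg, offset = split_801_address(0x30001234)
print(seg, hex(offset))
```

With only 16 register slots, reaching a 17th shared object means reloading a segment register, which is what the reply above says application code was free to do.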

this was somewhat the target for the office product division displaywriter follow-on product; 16bit 801 processor called ROMP running with cp.r ... written in pl.8. A problem did start to show up when the displaywriter project was canceled and the decision was to retarget the machine to the unix workstation environment. They would get the company that did the AT&T port for PC/IX to do one for what was to become the PC/RT. An issue tho with unix was that there was something of an assumption about having a hardware protection domain between what applications could do and what the kernel could do ... and you didn't provide every arbitrary application free wheeling access to all privileged protected security features.

recent post in comp.arch mentioning logical machines
https://www.garlic.com/~lynn/2004f.html#21 Infiniband - practicalities for small clusters

other posts about 801, romp, etc.
https://www.garlic.com/~lynn/subtopic.html#801

other posts about smp, compare&swap, etc
https://www.garlic.com/~lynn/subtopic.html#smp

misc. other posts in the recent comp.arch thread
https://www.garlic.com/~lynn/2004e.html#40 Infiniband - practicalities for small clusters
https://www.garlic.com/~lynn/2004e.html#41 Infiniband - practicalities for small clusters
https://www.garlic.com/~lynn/2004e.html#42 Infiniband - practicalities for small clusters
https://www.garlic.com/~lynn/2004e.html#44 Infiniband - practicalities for small clusters
https://www.garlic.com/~lynn/2004e.html#51 Infiniband - practicalities for small clusters
https://www.garlic.com/~lynn/2004e.html#52 Infiniband - practicalities for small clusters
https://www.garlic.com/~lynn/2004f.html#3 Infiniband - practicalities for small clusters
https://www.garlic.com/~lynn/2004f.html#4 Infiniband - practicalities for small clusters
https://www.garlic.com/~lynn/2004f.html#5 Infiniband - practicalities for small clusters
https://www.garlic.com/~lynn/2004f.html#6 Infiniband - practicalities for small clusters
https://www.garlic.com/~lynn/2004f.html#15 Infiniband - practicalities for small clusters
https://www.garlic.com/~lynn/2004f.html#16 Infiniband - practicalities for small clusters
https://www.garlic.com/~lynn/2004f.html#19 Infiniband - practicalities for small clusters
https://www.garlic.com/~lynn/2004f.html#22 Infiniband - practicalities for small clusters
https://www.garlic.com/~lynn/2004f.html#25 Infiniband - practicalities for small clusters

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

[Meta] Marketplace argument

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: [Meta] Marketplace argument
Newsgroups: comp.arch
Date: Fri, 21 May 2004 13:34:14 -0600

Rick Jones writes:
The computing industry isn't as much a game of "Follow The Leader" as it is one of "Ring Around the Rosy" or perhaps "Duck Duck Goose." - Rick Jones

there was a joke in the valley during the early 80s ... and possibly still that there were actually only (at most) 200 people in the industry ... it isn't either "follow the leader" or "ring around the rosy" ... it is just the same people in different disguises.

there used to be a page on some hp.com url that described the work of somebody who had done the dual address space 370 architecture (i.e. having software that needed to have addressing to multiple, different virtual address spaces), worked on 801 fort knox (replace all the micro-engines in various controllers, misc. general purpose computers as well as the low&mid-range 370s), and snake (and talked within the past couple years about having done a lot of the ia-64 architecture).

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

[Meta] Marketplace argument

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: [Meta] Marketplace argument
Newsgroups: comp.arch
Date: Fri, 21 May 2004 21:18:07 -0600

"del cecchi" writes:
Wasn't bernbaum or something like that the name of the guy that left IBM and went to HP and did HPPA? Long about Iliad time?

i have to go back and check some dates ... the person i'm talking about was 6-18? months after joel ... there is some folklore he spent the last two weeks (after giving notice) finishing up some work on blue iliad.

... hint
http://www.hpl.hp.com/news/2001/apr-jun/2worley.html

some of the other hp.com references that went into more detail are now 404

lets see if the way-back machine works
https://web.archive.org/web/20000816002838/http://www.hpl.hp.com/features/bill_worley_interview.html

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

[Meta] Marketplace argument

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: [Meta] Marketplace argument
Newsgroups: comp.arch
Date: Fri, 21 May 2004 22:17:17 -0600

Anne & Lynn Wheeler writes:
lets see if the way-back machine works
https://web.archive.org/web/20000816002838/http://www.hpl.hp.com/features/bill_worley_interview.html

in the endicott/801 reference above from the way-back machine ... this was in the 4381 time-frame. endicott had been a whole lot of converted one story warehouse and manufacturing bldgs ... however, a brand-spanking new, multi-story bldg. with brick facing was built for the 801 effort (or at least built in that time-frame and most of the 801 people got offices there). i contributed to the justification that killed the endicott 801 effort based on implementing 370 directly in hardware (aka 4381) ... rather than use 801 for a micro-engine and implement 370 as microcode running on the 801 engine.

that whole 801 thing in the very late 70s and early 80s was about the large number of different micro-processors all over the corporation ... all requiring their own unique programming. the low and mid-range 370s had all been microcode engines ... going back to 360 days. the issue going into the 4381 was that chip technology was starting to get to the point where you could consider actually doing a mid-range 370 directly in hardware. the advances in chip technology were happening at the same time as the push for using 801 to replace the wide variety of micro-engines with a common 801 architecture. The 4381 issue wasn't so much that the idea for replacing all the micro-engines was bad ... but that 4381 could get much better price/performance by implementing directly in hardware (no longer using the micro-engine approach for mid-range 370).

by comparison, the 4381 predecessors; 4341 & before that 148 ... were heavily microcoded micro-processor engines ... avg. something like ten microcode instructions per 370 instruction i.e. the mip rate of the microprocessor engine had to be ten times faster than the 370 mip rate. this is akin to several of the current day 370/390 emulators running on intel platforms.

in fact, ecps for 148 ...
https://www.garlic.com/~lynn/94.html#21 370 ECPS VM microcode assist
https://www.garlic.com/~lynn/94.html#27 370 ECPS VM microcode assist
https://www.garlic.com/~lynn/94.html#28 370 ECPS VM microcode assist

there was effectively a ten times speed up for straight kernel code dropped into microcode ... on nearly a byte-for-byte basis i.e. took 6k bytes of kernel 370 code and dropped it into 6k bytes of microcode for a 10:1 speedup. the issue here was that while overall the microcode engine was simpler than 370 ... the straightline kernel code made little use of any 370 complexities.

there were two categories of ECPS speed-up ... straight kernel code dropped into microcode on nearly 1:1 basis (giving 10 times speedup) ... and modifying the virtual machine execution of privileged instructions. Enhancing privileged instruction operation in virtual machine mode could bring a 40-100 times speed up ... since the change was done directly in the native microcode instruction processing routine w/o having to interrupt at all into the kernel for instruction simulation.

I would have met bill when he was share chair of the hasp group (again see wayback reference) but can't remember ... since as an undergraduate I had done a lot of changes to hasp system and was given the opportunity to present at share a number of times.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

vm

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: vm
Newsgroups: bit.listserv.ibm-main
Date: Sun, 23 May 2004 13:48:12 -0600

richgr writes:
To answer your original question the answer is "yes, but . . .". The but is that while there is some VM here, "ibm-main" is mostly MVS & OS/390 oriented.

If you want VM, I would suggest joining the VM-ESA list which is primarily VM. There is no corresponding newsgroup (that I know of). To join the VM-ESA list:

send email to the listserv:

listserv@listserv.uark.edu

the whole bit newsgroup hierarchy originated from mailing lists on (vm-based) bitnet/earn somewhat starting back in the early to mid-80s. the mailing list processor was somewhat borrowed from an earlier computer conferencing mailing list processor from the internal corporate network.

at some point there were gateways established between the bitnet/earn mailing lists and the usenet news distribution facility. usenet which had been for the most part uucp ... over time, mostly migrated to tcp/ip and the internet.

many current ISPs don't carry all of the "bitnet" gatewayed mailing list newsgroups ... and for the places i've run across bit.listserv.vmesa-l ... there has been almost no activity (i.e. little of the vmesa traffic is actually being gatewayed). I count over 400 newsgroups defined in the bit newsgroup hierarchy ... but the majority don't seem to be active &/or actually have a functioning gateway.

misc. past postings related to bitnet/earn:
https://www.garlic.com/~lynn/subnetwork.html#bitnet

the internal corporate network was almost totally vm-based (as well as all the networking related tools) and larger than the arpanet/internet from just about the beginning until sometime mid-85.

a big issue for the arpanet came with the big switch-over from homogeneous networking to internetworking with technology like gateways on 1/1/83. one of the big issues for the vm-based internal corporate network was that it effectively had gateway function from its origins (and that significantly contributed to the internal network's ease of growth).

In various respects the JES & other mainframe families of networking shared a number of the arpanet homogeneous limitations. The stereotypical problem with homogeneous networking was the requirement for synchronized conversion of all nodes. The internal corporate example was an mvs system being upgraded to the latest release of JES ... which was generating file-formats that were causing MVS systems in hursley to crash ... which was in turn blamed on the VM-network infrastructure. One of the tasks that had been given the corporate vm-based networking infrastructure was to maintain canonical JES header formats, and the vm-gateways were given the responsibility for knowing which version of JES might be on the other end of a wire and appropriately converting JES headers to keep the MVS systems from crashing and burning. The problem with polluted JES headers between different releases causing the respective MVS systems to crash ... was just one of the features that restricted MVS systems to isolated end-nodes in the internal network.

Another similarity between the JES networking support and arpanet was the use of a one byte field for node addressing. JES was slightly more restricted since the one byte field was also used to address all the local hasp pseudo-devices ... and JES had a nasty habit of trashing anything on the network that had an origin and/or a destination that wasn't in the local one-byte device table (which would have been a real disaster for the internal network to even consider using JES as any sort of intermediate node in the internal network; aka it would arbitrarily trash traffic that had an origin or destination that wasn't in its local node table .... and it could arbitrarily crash the whole machine if some traffic from a JES system at a different level happened to come by).
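A minimal sketch of the forwarding behavior described above, purely for illustration. The class, the node names, and the table layout are hypothetical; nothing here reflects actual JES data structures. It only shows why a node that discards traffic whose origin or destination is missing from its one-byte table cannot serve as an intermediate node.

```python
from dataclasses import dataclass

# A one-byte field can distinguish at most 256 entries.
MAX_ENTRIES = 256


@dataclass(frozen=True)
class Message:
    origin: str
    destination: str
    payload: str


class OneByteTableNode:
    """Hypothetical node whose single table holds local devices AND network nodes."""

    def __init__(self, local_devices, network_nodes):
        names = list(local_devices) + list(network_nodes)
        if len(names) > MAX_ENTRIES:
            raise ValueError("one-byte table cannot hold more than 256 entries")
        # Local pseudo-devices use up slots that could otherwise name network nodes.
        self.table = {name: index for index, name in enumerate(names)}

    def forward(self, msg):
        """Return the message if both ends are known locally, else None (discarded)."""
        if msg.origin in self.table and msg.destination in self.table:
            return msg
        return None


node = OneByteTableNode(["RDR1", "PRT1"], ["SJR", "HURSLEY"])
kept = node.forward(Message("SJR", "HURSLEY", "file"))
dropped = node.forward(Message("SJR", "CAMBRIDGE", "file"))
```

In this sketch `kept` is the original message and `dropped` is `None`: traffic for any node outside the local table simply disappears, which is the failure mode that kept such systems at the edge of the network.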

At the time of the 1/1/83 switchover from arpanet to internet, arpanet had approx. 250 nodes (at the limit of its addressing infrastructure) and the internal network had almost a thousand nodes. misc. related posts on the subject:
https://www.garlic.com/~lynn/internet.htm

minor reference on the web: A Social History of Bitnet and Listserv
http://www.computer.org/annals/articles/bitnet.htm

in the above, it mentions that bitnet grew slowly in the early '80s, having only 157 nodes at the beginning of 1984. The summer of 1983, the internal network had passed 1000 nodes and was well on its way to doubling that; specific reference:
https://www.garlic.com/~lynn/internet.htm#22

and here is a note from somewhere early 1984 sort of outlining earn
https://www.garlic.com/~lynn/2001h.html#65
the above somewhat refers to the fact that I took a lot of heat and blame for computer conferencing in the 80/81 timeframe.

the corporation contributed heavily to all the bitnet links inside the US ... and possibly paid for nearly all the "earn" (i.e. non-US bitnet) links outside the US

in my desk i have a clear plastic ball (about same size of baseball) commemorating the 1000th node on the internal corporate network.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

MITM attacks

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: MITM attacks
Newsgroups: sci.crypt
Date: Sun, 23 May 2004 16:44:52 -0600

Guy Macon <http://www.guymacon.com> writes:
O.K. I will byte. :) (Note that I am not an expert, so feel free to correct the misunderstandings I almost certainly have.)

If I communicate with someone using PGP, and both of us have our keys verified with a high degree of confidence through the web of trust, and you are the man in the middle, how would you break our defense?

Enquiring minds want to know! ;)

you've sort of been sucker punched. basically all MITM countermeasures involve some (trusted) out-of-band communication ... that isn't subject to the MITM attacks of the communication channel in question.

you walk into your bank and something is exchanged that can uniquely provide unique mutual authentication. from then on, you and the bank can exchange messages based on the mutual authentication technology.

somebody else walks into the bank and something else is exchanged that also enables unique mutual authentication. you then want to communicate to this other entity ... you can securely send the message to your bank and have them securely forward to the destination.

the security business process of trust effectively works the same if the two of you exchange messages directly and the bank just acts as a public key server (somewhat akin to the yahoo ietf draft submitted last week). a vulnerability is if you are using the same exact infrastructure to establish trust with the web-of-trust keyserver ... then MITM could be attacking that also. the countermeasure is again some out-of-band information that isn't vulnerable to the MITM attacker. sometimes web-of-trust assumes that it might be able to use a suspect communication channel (prone to MITM) in multiple different ways ... in the hopes that the MITM isn't your ISP and therefore constantly operating.

However, if it is possible to demonstrate trusted mutual authentication between two different parties and a trusted 3rd party ... then it is possible to leverage that to extend mutual authentication directly between the two parties. The level of business trust isn't directly affected by having the bank be the intermediate transmission or by just having them provide the authentication infrastructure.
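A minimal sketch of the "send it securely to the bank and have them securely forward it" flow, under stated assumptions. It uses per-customer HMAC keys only to keep the example short; the posting does not say what authentication technology is exchanged at the bank, and the `Bank` class, its method names, and the message framing are all hypothetical. The one property it illustrates is that each key is established out-of-band (the walk-in visit), so an attacker on the communication channel never sees it.

```python
import hashlib
import hmac


def tag(key: bytes, message: bytes) -> bytes:
    """Authentication tag over a message (HMAC-SHA256)."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, mac: bytes) -> bool:
    return hmac.compare_digest(tag(key, message), mac)


class Bank:
    """Hypothetical trusted third party holding one key per enrolled customer."""

    def __init__(self):
        self._keys = {}

    def enroll(self, name: str, key: bytes) -> None:
        # Assumed to happen in person, i.e. out-of-band from the network channel.
        self._keys[name] = key

    def relay(self, sender: str, recipient: str, message: bytes, mac: bytes):
        """Check the sender's tag, then re-tag for the recipient, binding the sender's name."""
        if not verify(self._keys[sender], message, mac):
            raise ValueError("sender authentication failed")
        bound = sender.encode() + b"|" + message
        return bound, tag(self._keys[recipient], bound)


bank = Bank()
alice_key, bob_key = b"alice-walk-in-secret", b"bob-walk-in-secret"
bank.enroll("alice", alice_key)
bank.enroll("bob", bob_key)

msg = b"meet at noon"
forwarded, forwarded_mac = bank.relay("alice", "bob", msg, tag(alice_key, msg))
bob_accepts = verify(bob_key, forwarded, forwarded_mac)
```

Note that this sketch relies on shared secrets, which is exactly the kind of static data an insider could harvest; it demonstrates the trust relationships, not a recommended mechanism.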

now attacks on this infrastructure wouldn't be MITM ... but they might be insider. a lot of existing authentication infrastructure is based on various kinds of shared-secrets, aka something you know static data. a lot of current fraud is harvesting such static data and using it to impersonate other entities. a lot of phishing email is making the email sound official enuf that consumers are tricked into believing it w/o having actual proof. So is this a MITM attack? ... or social engineering? Are all insider attacks, MITM?

minor drift, reference to recent news note about a study that is about to be published:
https://www.garlic.com/~lynn/aadsm17.htm#38 Study: ID theft usually an inside job

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Usenet invented 30 years ago by a Swede?

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Usenet invented 30 years ago by a Swede?
Newsgroups: sci.archaeology,soc.culture.nordic,soc.history.medieval,soc.history.science,comp.protocols.tcp-ip,alt.folklore.computers
Date: Sun, 23 May 2004 15:57:07 -0600

Doug Weller writes:
Somehow, on newsgroups devoted to history and archaeology, we were sidetracked to a discussion of who invented Usenet and whether Usenet rules insisted that top posting was correct (I thought I knew the answers to both).

sort of hard to have discussion of usenet w/o also having discussion of uucp ... usenet having grown up in somewhat different genre from arpanet ... lots of store&forward dial-up network (w/o end-to-end connectivity). 1980 or so, somebody tried to make a scathing comparison of the arpanet and store&forward networks .... by claiming that store&forward networks were like the postal system ... and if the postal system was run like the arpanet ... if you were to send a letter from tokyo to london ... you could only post it when it was simultaneously first shift for the origin post office in tokyo, the destination post office in london, as well as every post office between tokyo and london.
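The postal comparison can be made concrete with a small sketch. Everything in it is an illustrative assumption: the opening hours, the one-hop-per-hour rule, and the function names are invented for the example. It contrasts a scheme that needs every office open at once with one where each office holds the letter until the next one opens.

```python
def end_to_end_hours(offices):
    """Hours of the day (0-23) when every office on the path is open simultaneously."""
    common = set(range(24))
    for open_hours in offices:
        common &= set(open_hours)
    return sorted(common)


def store_and_forward_delivery(offices, start_hour=0, max_hours=24 * 7):
    """Hour at which the letter clears the last office, moving at most one hop per hour.

    Each office holds the letter until it is open; returns None if the
    letter is not delivered within max_hours.
    """
    hop = 0
    for t in range(start_hour, start_hour + max_hours):
        if (t % 24) in offices[hop]:
            hop += 1
            if hop == len(offices):
                return t
    return None


# Hypothetical first-shift hours on a common clock.
tokyo = range(0, 8)
midpoint = range(4, 12)
london = range(9, 17)
path = [tokyo, midpoint, london]

simultaneous = end_to_end_hours(path)         # no hour when all three are open
delivered_at = store_and_forward_delivery(path)
```

With these made-up hours there is no moment at which the whole path is up, yet the store&forward letter still arrives the same day.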

old posting in alt.folklore.computers on the uucp subject:
https://www.garlic.com/~lynn/2001b.html#57

the above has reference to the UUCP web site:
http://www.uucp.org/

misc refs from search engine using: usenet, uucp, history
http://www.vrx.net/usenet/history/hardy/
http://www.uucp.org/history/index.shtml
http://www.cs.uu.nl/wais/html/na-dir/usenet/software/part1.html
http://www.tldp.org/LDP/nag/node256.html
http://livinginternet.com/u/ui_old.htm

usenet & uucp both predate the big switch-over of the arpanet to tcp/ip on 1/1/83. somewhat an aside, 5-6 years after the switch-over to tcp/ip, the federal gov. had gosip which was mandating the elimination of tcp/ip with total conversion to osi.

recent posting today touching on some early networking
https://www.garlic.com/~lynn/2004f.html#30

the internal network (mentioned above) started in somewhat the same way as the description (in the usenet/uucp references) of the origins of usenet ... except it was quite a bit earlier and between cambridge and endicott. there was going to be this new mainframe generation, the 370s ... that hadn't been announced yet and had some differences from that of 360 ... especially the virtual memory hardware being different from the 360/67. cambridge had done the original virtual machine operating system ... which included simulation of quite a few instructions. the endicott proposal was to modify the cambridge virtual machine supervisor to provide simulation of 370 defined instructions & tables rather than the 360/67 defined instructions and tables. so there was this distributed development project between endicott and cambridge.

i believe an earlier version of the software had been used to communicate between a 2250mod4 and cambridge's mainframe. 2250s were large vector graphics displays with keyboards ... most of them with hardware configuration to interface to a mainframe. the 2250mod4 was somewhat different, rather than a special purpose dedicated controller box ... the 2250mod4 had an 1130 computer as a controller. for this particular 2250mod4 ... somebody also had ported the pdp1 spacewar game.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

MITM attacks

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: MITM attacks
Newsgroups: sci.crypt
Date: Sun, 23 May 2004 22:47:57 -0600

Tom St Denis writes:
For example, at Crypto'03 Greg signed all of our keys. From that we "trust" that all of the keys of the participants are legit. But what if that wasn't Greg? That could have been his evil twin brother for all I know.

or maybe it was really greg ... but somebody paid him to lie or brainwashed him or was blackmailing him ... or installed viruses in all of greg's computers.

or maybe greg was clean ... but some of the people that got their keys signed ... lied and weren't really there at all. you may have been able to vet greg ... but how well did greg vet the people for every key he signed ... those people may have had evil twins.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Usenet invented 30 years ago by a Swede?

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Usenet invented 30 years ago by a Swede?
Newsgroups: comp.protocols.tcp-ip,alt.folklore.computers
Date: Mon, 24 May 2004 07:27:10 -0600

"Inger E Johansson" writes:
Never said I was the inventor, only told when I was among those who either programmed it directly or worked with it on testing bases. As one of our more distinguished participants in the group Prof Steve Glines already noted "The concept of USENET has been around since the 1960's even if the actual implementation came about ~1980." news:qcjr8c.pki.ln@mail.is-cs.com...

i actually posted a reply to this thread yesterday ... but it is possible that my ISP has some filter on posts with too many n.g. ... however a copy is here:
https://www.garlic.com/~lynn/2004f.html#32

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Questions of IP

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Questions of IP
Newsgroups: comp.protocols.tcp-ip
Date: Mon, 24 May 2004 07:38:47 -0600

"Aashish Manocha" writes:
TCP controls your end - to -end communication. IP is at network layer, and the main job is to route the packets. In a packet switched network, the packets in the same context can take any or one path to reach the final destination, the reason may be network congestion or best route selection, or QoS. So, for IP its connection less, as it is only hop to hop.

ip is at the internetworking layer which doesn't exist in the OSI model ... in the OSI model .. it corresponds to a non-existent place between transport/layer4 and networking/layer3. ISO has actually had mandates to standards bodies to not work on networking things that didn't conform to the OSI model ... ip and LANs as examples; ip because it doesn't exist in OSI and therefore doesn't conform; LANs because the MAC-layer interface sits somewhere in the middle of network/layer3 ... not at an interface defined by OSI. misc. past comments about the OSI issue:
https://www.garlic.com/~lynn/subnetwork.html#xtphsp

there was the big switch-over in arpanet on 1/1/83 to ip protocol. a big feature of ip, or internetworking ... was gateways between networks .. or the inter-network protocol. prior to 1/1/83, arpanet had been packet-switched ... but a homogeneous network w/o gateways. one of the assertions about the internal network being so much larger than the arpanet from just about the start until sometime mid-85 ... was that the internal network effectively had a gateway-type facility in every node.

anyway in the internetworking sense, ip provides the routing between networks ... or the gateway function ... something that doesn't exist in the OSI model.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

MITM attacks

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: MITM attacks
Newsgroups: sci.crypt
Date: Mon, 24 May 2004 07:22:04 -0600

Lassi Hippeläinen writes:
Or maybe nobody was there. Tom's brain was hooked to a stimulator.

Next time you meet someone who claims to be Greg, apply the Samuel Johnson Test to be sure...

note that security can be viewed from two sides ...

1) is there anything absolutely sure at all in the world?

2) many insecurity issues are financially related, a significant amount can be addressed with security proportional to risk ... old post on the subject:
https://www.garlic.com/~lynn/2001h.html#61

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Why doesn't Infiniband supports RDMA multicast

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Why doesn't Infiniband supports RDMA multicast
Newsgroups: comp.arch
Date: Mon, 24 May 2004 09:45:11 -0600

Terje Mathisen writes:
There was an interesting paper about this recently, possible from here or slashdot:

Multicast using forward error correction, in the form of one (i.e. RAID style parity) or more (Reed-Solomon or similar) extra blocks transmitted along with (after) every N data blocks.

This meant that up to a given level of packet loss, every client was able to reconstruct all the original data, without having to ask for retransmissions.

At the same time, multiple clients who had all suffered non-recoverable data loss could be helped out by sending a single extra parity block as part of the regular stream.
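A minimal sketch of the simplest case mentioned in the quoted text: one RAID-style XOR parity block sent after every N data blocks. The function names are hypothetical, equal-length blocks are assumed, and Reed-Solomon (needed to repair more than one loss per group) is not shown.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def add_parity(data_blocks):
    """Return the N data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(received):
    """Rebuild the N data blocks from N+1 received blocks, where None marks a lost block.

    A single parity block can repair at most one loss per group.
    """
    missing = [i for i, block in enumerate(received) if block is None]
    if len(missing) > 1:
        raise ValueError("a single parity block can repair only one lost block")
    repaired = list(received)
    if missing:
        present = [block for block in received if block is not None]
        # XOR of all surviving blocks (data and parity) equals the lost block.
        repaired[missing[0]] = xor_blocks(present)
    return repaired[:-1]


data = [b"abcd", b"efgh", b"ijkl"]
sent = add_parity(data)
damaged = [sent[0], None, sent[2], sent[3]]   # second data block lost in transit
restored = recover(damaged)
```

Because the same parity block repairs whichever single block a given receiver lost, one extra multicast block can serve many receivers that each dropped a different packet.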

another thing that xtp had was rate-based pacing ... about the time slow-start was published there was also a paper about windowing and slow-start being non-stable in real world environments ... minor post:
https://www.garlic.com/~lynn/2003p.html#13 packetloss bad for sliding window protocol ?

a little before the xtp time-frame there was a FEC hack (back about the time kodak bought cyclotomics, which was a company doing reed-solomon products; although we did have a guy that had been a graduate student at jpl/caltech and had done a lot of work on reed-solomon). everything transmitted was sent reed-solomon encoded (aka 15/16ths reed-solomon encoding getting about six orders of magnitude BER improvement; on a link that typically ran 10**-9, 15/16ths reed-solomon encoding would give about 10**-15 effective).

if the received block couldn't be recovered and a NAK was transmitted, then rather than retransmitting the original block ... the half-rate viterbi of the original block was transmitted. in this scenario, it was possible to have errors on both the original and the retransmission and still obtain valid contents. if the unrecoverable errors exceeded some threshold, the protocol switched to transmitting the half-rate viterbi as part of the original transmission (cutting the effective transmission rate in half ... but eliminating the latency when there was high probability of needing it anyway).
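The switching policy described above can be sketched as a small sender-side state machine. This is an illustration under stated assumptions, not the original protocol: the encoders are left out entirely (frames are just labeled tuples), blocks are handled one at a time, and the window size and threshold values are invented. Only the policy is shown: answer a NAK with the redundant encoding instead of the original block, and once NAKs become frequent enough, send the redundancy up front.

```python
class AdaptiveSender:
    """Hypothetical sender that moves from on-demand to up-front redundancy."""

    def __init__(self, threshold=0.2, window=10):
        self.threshold = threshold   # assumed NAK fraction that triggers the switch
        self.window = window         # assumed number of recent blocks considered
        self.history = []            # 1 = block was NAKed, 0 = decoded first time
        self.proactive = False

    def frames_for(self, block_id):
        """Frames sent for a new block in the current mode."""
        frames = [("data", block_id)]
        if self.proactive:
            # Halves the effective rate but removes the NAK round trip.
            frames.append(("redundancy", block_id))
        return frames

    def on_result(self, block_id, nak):
        """Record the receiver's verdict; return any frames to send in response."""
        sent_with_redundancy = self.proactive
        self.history = (self.history + [1 if nak else 0])[-self.window:]
        full = len(self.history) == self.window
        if full and sum(self.history) / self.window > self.threshold:
            self.proactive = True
        if nak and not sent_with_redundancy:
            # Reply with the redundant encoding rather than the original block.
            return [("redundancy", block_id)]
        return []


sender = AdaptiveSender(threshold=0.2, window=5)
first = sender.frames_for(0)
reply = sender.on_result(0, nak=True)
for block_id, nak in [(1, False), (2, True), (3, False), (4, False)]:
    sender.on_result(block_id, nak)
later = sender.frames_for(5)
```

The sketch never switches back to on-demand mode, since the text does not say whether the original did.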

misc. past posts on the subject
https://www.garlic.com/~lynn/93.html#28 Log Structured filesystems -- think twice
https://www.garlic.com/~lynn/99.html#210 AES cyphers leak information like sieves
https://www.garlic.com/~lynn/2001.html#1 4M pages are a bad idea (was Re: AMD 64bit Hammer CPU and VM)
https://www.garlic.com/~lynn/2001k.html#71 Encryption + Error Correction
https://www.garlic.com/~lynn/2002p.html#53 Free Desktop Cyber emulation on PC before Christmas

various past posts on xtp (at one time, i was corporate rep to the xtp TAB), intermixed with comments about hsp (high-speed protocol proposal out of the xtp organization) and osi:
https://www.garlic.com/~lynn/subnetwork.html#xtphsp

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Infiniband - practicalities for small clusters

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Infiniband - practicalities for small clusters
Newsgroups: comp.arch
Date: Mon, 24 May 2004 09:18:47 -0600

hack@watson.ibm.com (hack) writes:
I'm not sure what you mean here. The 360 mod 20 had a 64K limit, but I doubt it ever ran the OS/360 line -- if it did, I'll be amazed (this was a 16-bit entry-level aberration). Segment size was 64K in late 370 engines (early ones had a choice of 64K or 1M); XA had a single 1M segment size -- but that was not visible to user-level programs. Same deal for physical page size (2K or 4K on early 370s, 4K only since about 1980), and again not a user-level issue. Base-displacement addressing has been limited to 4K up until very recently, when second-generation 64-bit z/Series added signed 20-bit displacements.

the os kernel sort of had a 64k barrier associated with number of UCBs (i.e. number of devices) because they used 16bit address. I have some recollection that somewhere along the way they did a hack (sorry) ... where the 16bit number became a UCB number ... multiply it times the size of the UCB and add it to a base. Then there was something later when somebody ran into a problem with having more than 64k devices.
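The index-instead-of-address trick recalled above amounts to one line of arithmetic. A minimal sketch, where the function name, the 48-byte control block size, and the base address are all made-up values for illustration:

```python
def ucb_address(ucb_number, ucb_size, base):
    """Turn a 16-bit UCB number into an address: base + number * size."""
    if not 0 <= ucb_number <= 0xFFFF:
        raise ValueError("UCB number must fit in 16 bits")
    return base + ucb_number * ucb_size


UCB_SIZE = 48          # hypothetical control block size in bytes
BASE = 0x100000        # hypothetical start of the UCB array

# Used directly as an address, a 16-bit field reaches only the first 64k bytes.
direct_reach = 0xFFFF

# Used as an index, the same field reaches far beyond that ...
last_ucb = ucb_address(0xFFFF, UCB_SIZE, BASE)

# ... but the number of devices is still capped at 64k.
device_limit = 0xFFFF + 1
```

This moves the barrier from "64k bytes of control blocks" to "64k devices", which matches the later problem mentioned in the post.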

there is still some 16mb stuff ... around ... a couple posts in the past couple days in bit.listserv.ibm-main on some qsam (file i/o) option.

there have been below the line (16mb) and above the line issues where CCWs were kept below the line because of the 24bit address field in CCWs. They could do above the line I/O ... by having the 24bit CCW address field point to IDAL (below the line) ... which supported 31bit address fields.
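A rough sketch of the indirection described above. It deliberately ignores the real CCW layout, flag bits, and the rules for splitting a buffer across multiple indirect-address words; the function name and the returned tuple are hypothetical. It only models the addressing constraint: the CCW's own address field holds 24 bits, so data above the 16mb line is reached through a list that itself sits below the line and holds 31-bit addresses.

```python
LINE_24 = 1 << 24     # the 16mb "line": limit of a 24-bit address field
LIMIT_31 = 1 << 31    # limit of a 31-bit address field


def build_ccw(data_addr, idal_addr):
    """Return (ccw_address_field, uses_idal, idal_entries) for one data buffer.

    idal_addr is where the indirect list would be placed if one is needed.
    """
    if not 0 <= data_addr < LIMIT_31:
        raise ValueError("data address must fit in 31 bits")
    if data_addr < LINE_24:
        # Below the line: the CCW can point at the data directly.
        return data_addr, False, []
    if not 0 <= idal_addr < LINE_24:
        raise ValueError("the indirect list itself must be below the line")
    # Above the line: the CCW points at the list, the list points at the data.
    return idal_addr, True, [data_addr]


below = build_ccw(0x123456, idal_addr=0x1000)
above = build_ccw(0x2000000, idal_addr=0x1000)
```

In both cases the value stored in the CCW address field stays under 16mb, which is the point of the technique.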

total aside ... 360/67 had both 24bit and 32bit virtual memory options. for 370, the 32bit virtual memory option was dropped and it only had 24bit. XA introduced 31bit virtual addressing.

while not impacting program correctness there was a problem with 2k/4k pages and the introduction of the 370/168-3 ... which doubled the cache of the 370/168-1 from 32kbytes to 64kbytes. with 32kbyte cache they had used bits in the page displacement part of the address ... allowing cache indexing in parallel with TLB virtual->real address lookup. For 168-3 they co-opted the 2k bit for cache line indexing ... as long as the machine was working in 4k page mode. If the machine was working in 2k page mode ... they disabled half of the cache (since the 2k bit might magically switch values in the virtual->real conversion). What was even worse was if you were running VM and had a mix of 4k page virtual machines and 2k page virtual machines ... the machine had to stop and wipe the cache any time there was a transition between 2k & 4k page modes. There were some customers that had been running DOS/VS (2k page operating system) under VM on 370/168-1 that saw significant thruput degradation when they upgraded the 168-1 to a 168-3 (with double the cache).
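Why the 2k bit is safe to use for cache indexing only in 4k page mode can be shown with a short sketch. The page table contents and function names are hypothetical, and bits are numbered from the low-order end (bit 11 has value 2048), which is not the 370 bit-numbering convention.

```python
def translate(vaddr, page_size, page_table):
    """Virtual to real: replace the page number, keep the displacement."""
    page, displacement = divmod(vaddr, page_size)
    return page_table[page] * page_size + displacement


def bit_survives_translation(bit, page_size):
    """True if the bit lies inside the page displacement, so translation cannot change it."""
    return (1 << bit) < page_size


TWO_K_BIT = 11  # the bit with value 2048

# Hypothetical mapping of virtual page numbers to real page frames.
page_table = {0: 5, 1: 2}

vaddr = 2048 + 100
real_2k = translate(vaddr, 2048, page_table)   # page 1 -> frame 2
virtual_bit = (vaddr >> TWO_K_BIT) & 1
real_bit = (real_2k >> TWO_K_BIT) & 1
```

With 4k pages the bit is part of the displacement and can index the cache before the TLB lookup finishes; with 2k pages it belongs to the page number, and in this example it flips from 1 to 0 during translation.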

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Who said "The Mainframe is dead"?

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Who said "The Mainframe is dead"?
Newsgroups: bit.listserv.ibm-main
Date: Mon, 24 May 2004 15:37:38 -0600

jsullivan@ibm-main.lst (Sullivan, John) writes:
Without starting trouble, I ask the above. I'm trying to determine when, and who, made this declaration. I've Google'd for hours and haven't gotten to the who/when.

I know that IBM used to mention analyst research from the early 1990's in their presentations demonstrating the opposite. But I can't seem to find that or the research they quoted.

big explosion in "mainframes" in the late 70s and early 80s was in the mid-range 4341, vaxes, etc. these were somewhat departmental and/or distributed machines. at one point, STL was taking over a room on every floor in every tower to put in a "local" 4341. at least one customer ordered 800 4341s, and there were a number of couple hundred 4341 orders.

by the mid-80s, workstations and larger PCs were starting to take over the mid-range marketplace. some assumptions about every 4341 being upgraded to 4381 ... and selling additional 4381s didn't materialize. i perceived there to have been similar circumstance for vax in pretty much the same market place.

going into the mid and late 90s, some IT management seeing what had happened to the mid-range .... looked at relatively large and expensive central IT staff and wondered about whether every person and department could become their own system administrator. one of the issues was that some of the glass house industry strength had become so institutionalized that it looked easy from the outside. some executives figured that it couldn't be that difficult when there weren't more disasters and problems ... so how hard could it be to let everybody perform their own sysadm duties.

with the appearance of Y2k remediation ... there were some mainframe operations that couldn't fill their mainframe staffing positions. some of the high-end financial institutions were paying top dollar to lure away experienced mainframe staff to help with their remediation efforts. other experienced staff were retiring having been doing nearly the same kind of job for 30 years or more. there wasn't any backfill from younger generations ... the universities had been turning out graduates for some time on non-mainframe platforms ... and they were getting significantly better offers from the internet boom (bubble).

some institutions put in strategic conversion off of mainframe because they had staff openings for five years or more that couldn't be filled. I've mentioned in the past one gov installation that turned off the mainframe when the last mainframe support person (they had) retired.

another significant factor in the late 80s thru early 90s was terminal emulation on IBM/PCs. in the early to mid 80s ... terminal emulation gave IBM/PCs a nearly instant major market. You could replace a 3270 with an ibm/pc ... and in the same desk footprint and price range ... have both a 3270 and some local computer capability. As applications on the PCs proliferated in the late 80s ... there was a larger and larger requirement for the PCs to have high-speed access to data ... which was no longer being met by the terminal emulation infrastructure.

There were some significant internal battles where the disk division wanted to offer a far superior infrastructure offering to allow PC & workstation applications to access glass house data. This would have had a severe downside effect on the communication division's installed terminal emulation infrastructure. At one point there was a major presentation by the disk division claiming that the communication division's intransigent position would result in the death of the disk division. One of the results was that starting in the late 80s there was an enormous increase in demand for disk capacity outside of the glass house .... accompanied by a migration of data from the glass house to local & distributed environments. This also showed up in a decline in mainframe processor and disk revenue. In 1992, IBM went into the red.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Infiniband - practicalities for small clusters

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Infiniband - practicalities for small clusters
Newsgroups: comp.arch
Date: Tue, 25 May 2004 07:58:21 -0600

Brian Inglis writes:
System calls are normally charged to the process, dispatching and interrupt overhead are not. System call setup is normally interruptable unless the work requires a lock and is trivial, so no DoS unless you can arrange for a deadlock between processes passing bad system call parameters, which would be an OS bug, and the time would in any case be charged to the process, plus probably an extra portion of overhead time due to the system call spinning its wheels while the process is current.

cp/67 and vm/370 always had a timer running that was charged to somebody ... even kernel threads. i/o interrupt of an application would switch from the application timer to a kernel timer ... the i/o interrupt would eventually be associated with a specific process that initiated it ... and the kernel time would be charged off to that process. when it was not possible to attribute which specific process to charge off time to ... it went to the "system" process ... an abstraction that I originally created to have a virtual address space to manage kernel program image. the "system" process typically has less than 1-2 percent of total elapsed time. by contrast some systems calculate "system" time as that time that isn't otherwise accounted for.

scheduling is basically calculating some sort of priority for a process and ordering the process into a list ... all of that time is charged to the process.

since the various lists are kept in order (and charged to specific process) ... selecting a kernel thread basically involves pulling the first off the list ... which gets charged to that kernel thread. selecting a virtual machine is basically pulling the first off the list ... which gets charged to that virtual machine.
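A minimal sketch of the "timer always charged to somebody" idea, in Python. This is not cp/67 or vm/370 code; the class, the method names and the `_pending` bucket are all hypothetical, chosen only to show time moving from an application timer to a kernel timer and then being charged off to the process that caused the work (or to "system" when no owner can be found).

```python
SYSTEM = "system"      # catch-all owner for time that cannot be attributed
_PENDING = "_pending"  # hypothetical bucket for kernel time awaiting an owner


class CpuAccounting:
    """Toy model: every tick of elapsed time is charged to exactly one owner."""

    def __init__(self, now=0):
        self.charged = {SYSTEM: 0}
        self._owner = SYSTEM
        self._last = now

    def _settle(self, now):
        # Charge the time since the last event to whoever owned the timer.
        elapsed = now - self._last
        self.charged[self._owner] = self.charged.get(self._owner, 0) + elapsed
        self._last = now

    def dispatch(self, now, process):
        """Switch the running timer to `process`."""
        self._settle(now)
        self._owner = process

    def interrupt(self, now):
        """Kernel takes over; the owner of this time is not yet known."""
        self._settle(now)
        self._owner = _PENDING

    def attribute(self, now, process=None):
        """Charge pending kernel time to `process`, or to SYSTEM if unknown."""
        self._settle(now)
        target = SYSTEM if process is None else process
        pending = self.charged.pop(_PENDING, 0)
        self.charged[target] = self.charged.get(target, 0) + pending
        self._owner = target
```

Because every event first settles the running timer, the per-owner totals always add up to the elapsed time, and "system" only collects what genuinely could not be attributed.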

the original cp/67 kernel that i worked with in '68 did have some dispatching structure that sporadically did all sorts of re-arranging things based on timer events ... that was independent of any requirement ... and was accounted for to "system overhead". It was basically proportional to the number of processes and independent of any work being done. For 30 virtual machines ... it was consuming ten percent of total elapsed time. It was not attractive for two reasons 1) it was unrelated to any actual work going on in the system ... and i have a principle that activity should be proportional to the work and 2) it was an extremely heavy-handed amount of overhead. I made a joke at the time that I had an objective to be able to accomplish the same effect in zero instructions.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Infiniband - practicalities for small clusters

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Infiniband - practicalities for small clusters
Newsgroups: comp.arch
Date: Tue, 25 May 2004 08:12:19 -0600

Brian Inglis writes:
The 12 bit offset is a fixed constant < 4K assembled into an instruction, with a base and often an index register also available. The main use for the fixed constant offset is for field displacements in structures; other variable offsets can be handled in the base and index registers.

as per recent extended postings in a.f.c. regarding location free program images ... the 12 bit offsets are relative displacement values that are combined with base register value (and possibly index register value for some instructions) to form effective address.
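A minimal sketch of that address arithmetic, in Python. The function and parameter names are illustrative only; the 24-bit mask and the rule that register number 0 in a base or index field contributes zero follow the 360 addressing rules.

```python
ADDRESS_MASK = 0xFFFFFF  # 24-bit addressing on the original 360


def effective_address(regs, base, index, displacement):
    """Combine base register, index register and 12-bit displacement.

    Register number 0 in the base or index field means "no register",
    so it contributes zero rather than the contents of register 0.
    """
    if not 0 <= displacement < 4096:
        raise ValueError("displacement must fit in 12 bits")
    b = regs[base] if base != 0 else 0
    x = regs[index] if index != 0 else 0
    return (b + x + displacement) & ADDRESS_MASK
```

The same instruction (same displacement, same register numbers) yields a correct address wherever the image is loaded, as long as the base register is set to match the load address.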

original 360 architecture for instructions went to some amount of trouble so that nearly all instructions in a program image would operate regardless of the location of the program image.

the problem that i had in doing some read-only shared segment code involving program images ... was early 360 operating system convention of "relocatable adcons". these were address data carried in the program image that pointed to module entry points, data structures, etc. ... and were relative displacements in the image out on disk ... but as part of bringing the disk image into memory for execution, the loader swizzled the relative values into absolute values based on the address location where the program image was loaded.

I had done this enhancement to the cms filesystem to support page mapped operation ... and also added a lot of bells & whistles to support allowing arbitrary objects to occupy read-only shared segments common to multiple different virtual address spaces. The problem was to eliminate the loading swizzle that occurred moving program image from disk to memory because 1) with trivial page-mapping between a virtual address space and a program image in the filesystem, i didn't want the page interrupt handler to have to worry about whether there was data in a page that had to be swizzled or not and 2) different processes might choose arbitrary virtual address space locations to load the program image ... and if it was loaded as a read-only shared segment ... the address constants couldn't manage to have arbitrary different values simultaneously.

lots of previous postings (some recent) on location independent program images:
https://www.garlic.com/~lynn/submain.html#adcon

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Infiniband - practicalities for small clusters

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Infiniband - practicalities for small clusters
Newsgroups: comp.arch
Date: Tue, 25 May 2004 08:21:24 -0600

Terje Mathisen writes:
Many years ago I wrote some Pascal-VS, plus a tiny bit of asm afair, for a file transfer program that ran between our IBM mainframe and PCs connected via a 3270 protocol emulator.

that pascal compiler was originally developed by pickens and weber in the los gatos vlsi lab using metaware's tws. pickens later left and went to work for metaware ... which supplied the c-compiler for aos (bsd on the pc/rt) and some number of other projects. weber left for a startup and then to mips and then sun.

several years ago ... i had an assignment to port a 50,000 vs/pascal application to another vendor's platform. there were a couple of issues

1) vs/pascal had some number of extensions ... that didn't exist in the other vendor's platform,

2) i don't believe the other vendor's pascal had ever been used with a reasonably sized application before

3) the vendor had outsourced their pascal work to an operation that was something like 12 time-zones away; i couldn't just drive by and drop in on them to discuss problems.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

can a program be run withour main memory ?

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: can a program be run withour main memory ?
Newsgroups: bit.listserv.ibm-main
Date: Tue, 25 May 2004 08:40:31 -0600

ibmlax@ibm-main.lst (Lax TS) writes:
I wanted to know can we run a program directly from a secondr storage device like a hard disk. When a load module is loaded into memory , the processor picks up the instruction one after the other and executes it. The same can take place using a hard disk. I am NOT BOTHERED about the speed ad response time of the running of the progam. I just want to know if this is possible.

almost anything is possible ... in programming, it is just SMOP ... small matter of programming.

i've read stories about some early computers having rotating drums as memory and that was where programs were executed from ... so not only is it possible ... that was how some of the early computers worked.

however, the question sounds either like a really bad troll or homework ... since a little use of a search engine should turn up the descriptions of early computers that had rotating drum memory.

however, there is another issue ... about relocatable adcons ... which i periodically rant about. i had done this page-mapped enhancement for the cms file system and support to have arbitrary objects loaded as read-only shared segments in different virtual address spaces ... potentially at different virtual addresses.

for the most part the 360 genre has gone to a great deal of trouble so that program images are location independent ... i.e. the base+displacement convention has no dependency on program location for execution. the problem is that the convention with relocatable adcons is that they are relative in the program image out on disk ... but get swizzled into a location specific constant when the loader brings them off disk into memory.

the issue was actually two fold ...

1) with the page mapped filesystem it would be convenient to just set up the virtual memory tables and point to the object in the filesystem ... and not have the paging supervisor have to worry about swizzling random locations in the pages when they are brought into memory

2) if the same program image happens to occupy a read-only shared segment ... and be mapped to arbitrary different locations in different address spaces ... it would require that the swizzled absolute address constants be able to take on multiple different values simultaneously (possibly a requirement for a little quantum computing).

so to have program images that could be loaded from page mapped filesystem by the paging supervisor (w/o requiring loader swizzle of relocatable adcons into absolute adcons) and to have the mapped program image be truly location independent (potentially mapped simultaneously to different locations in different address spaces) .. I had to hack the code to use relative adcons ... adcons that were used as displacements analogous to the base+displacement paradigm used in instructions.
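The difference between the two conventions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the image is modelled as a list of words, and `loader_swizzle` and `resolve_relative` are hypothetical names, not routines from os/360 or cms.

```python
def loader_swizzle(image, adcon_offsets, load_address):
    """Conventional loader: rewrite each relocatable adcon in place.

    `image` is a list of words as stored on disk; the words at
    `adcon_offsets` hold displacements from the start of the image.
    The returned copy is only valid at `load_address`.
    """
    loaded = list(image)
    for off in adcon_offsets:
        loaded[off] = load_address + image[off]
    return loaded


def resolve_relative(image, adcon_offset, base_register):
    """Relative adcon: the stored word stays a displacement and is
    combined with a base register at the point of use."""
    return base_register + image[adcon_offset]
```

With the swizzling loader, the copies loaded at two different addresses hold different words, so one read-only copy cannot serve both address spaces. With relative adcons the image is never modified; each address space just supplies its own base register value.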

random past postings about page mapped filesystem
https://www.garlic.com/~lynn/submain.html#mmap
random past postings about location free program images
https://www.garlic.com/~lynn/submain.html#adcon

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Infiniband - practicalities for small clusters

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Infiniband - practicalities for small clusters
Newsgroups: comp.arch
Date: Tue, 25 May 2004 09:35:23 -0600

"Stephen Fuld" writes:
Yes. Since S/360 (except for "aberations" like the /67) lacked any form of address translation hardware. So when a program was loaded, the non-relocatable addresses had to be "adjusted" (as you point out below) to reflect where in real memory the program was loaded. That meant that once a program was loaded, it couldn't be moved. So MFT had "rollout/rollin" which moved an active program to disk so that you could run something else, but when it was rolled in, it had to go to the same physical place as it came out of. Minimizing the number of places where the loader had to do the adjustment was a performance winner.

boeing huntsville had a two processor 360/67 ... which they hacked a version of MVT release 13 to use the translation hardware. MVT didn't actually do any paging ... it was just used to re-arrange addresses. One of the problems was not only needing to have fixed addresses but to have contiguous addresses. They had a lot of "long running" applications ... basically interactive stuff with people at 2250 displays (large vector graphics with keyboard and other things frequently used for complex design projects). They weren't trying to use more application memory than they had real memory (so no paging) ... but MVT had a problem with storage fragmentation with long running programs ... and the translation hardware was used to simulate contiguous memory in environments where real storage was getting severely fragmented.
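As a rough illustration of using translation purely to hide fragmentation (no paging to disk), here is a small Python sketch. It is not a description of the Boeing MVT modifications; `map_contiguous` and `translate` are hypothetical names, and the 4096-byte page size is used simply as a convenient unit.

```python
PAGE_SIZE = 4096


def map_contiguous(free_frames, pages_needed):
    """Build a page table giving a contiguous virtual region
    backed by whatever scattered real frames happen to be free."""
    if pages_needed > len(free_frames):
        raise MemoryError("not enough real storage (no paging in this model)")
    return {vpage: frame
            for vpage, frame in zip(range(pages_needed), free_frames)}


def translate(table, vaddr):
    """Translate a virtual address to a real address via the page table."""
    vpage, offset = divmod(vaddr, PAGE_SIZE)
    return table[vpage] * PAGE_SIZE + offset
```

The application sees addresses 0 through `pages_needed * PAGE_SIZE - 1` as one contiguous region even though the real frames behind them are scattered.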

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Infiniband - practicalities for small clusters

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Infiniband - practicalities for small clusters
Newsgroups: comp.arch
Date: Tue, 25 May 2004 11:45:31 -0600

the 360 instruction architecture allowed for displacement+register(s) for generating effective address ... and therefore had location independence. it is possible to have a convention on 360 where "addresses" that are in storage locations (rather than part of instructions) are displacements rather than absolute. instead of having a convention where an address was loaded from storage location and used immediately as an effective address ... have a convention where the storage location value is treated as a displacement ... that needs to be combined with some register value to form an effective address (in a manner analogous to what instructions do).

then you can have an image on disk that is location independent and can be moved into memory w/o any swizzling of storage values. it is then also possible to have shared, r/o images occupy virtual memory segments ... where the same program image segment doesn't have to be bound to a fixed address location in every virtual memory address space.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Finites State Machine (OT?)

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Finites State Machine (OT?)
Newsgroups: comp.arch,sci.electronics.basics,sci.electronics.design
Date: Tue, 25 May 2004 14:42:40 -0600

Casper H.S. Dik writes:
"leaf subroutine" is also used on SPARC (leaf subroutines don't bother to use a register window and they return with "retl", return from leaf subroutine)

the original cp/67 kernel that i got in jan '68 had all intra-kernel linkages via 360 supervisor call. the kernel convention was that the svc call interrupt routine would dynamically allocate a savearea for the called routine ... also do a little bookkeeping and trace entry for debugging. the svc return would unallocate the savearea and return.

one of the pathlength things i did was go thru all of the kernel and identify all leaf routines. i identified these and modified the kernel call macro to recognize a leaf routine was being called and do a BALR in place of an svc8. i then defined a fixed (unused/reserved) location in page zero for temporary register save ... this then eventually came to be called "balrsave" (so the leaf routine saved caller's registers in page zero temporary area rather than in a passed save area).

the next thing was to go thru and identify all non-leaf routines that only made calls to leaf routines ... these then became sort of 2nd order leaf routines. these were also modified so that the caller used BALR in place of svc8. however, these routines instead of using "balrsave" for temporary save of the caller's registers used an adjacent area that came to be called "freesave".

For various reasons, the svc8/svc12 calling convention originally took approx 275 microseconds on 360/67 (per call) ... it was possible to optimize that down to about 100 microseconds by recoding some of the stuff used for debugging purposes. Several of the leaf routines were high frequency calls and performed operations on the order of a hundred microseconds or less ... and therefore the svc8/svc12 calling convention was on the order of half that processing time.

The svc call to BALR change picked up something like 20-30 percent of (remaining) kernel time ... on a kernel that I had already optimized to pick up something like 80percent with fastpath changes described in previous posts
https://www.garlic.com/~lynn/2004f.html#6 Infiniband - practicalities for small clusters

the earlier 80percent kernel overhead optimization (presented at fall '68 Atlantic City share) had included various interrupt and dispatching fastpath as well as special case fastpath for various virtual machine simulation operations. It also included the reduction in the svc8/12 call/return overhead from 275mics to around 100mics ... but didn't include the BALR call changes for leaf routines.

The BALR call changes were done the following summer (of 69), when I got con'ed into going to Boeing (student summer employee with a fulltime management job classification level and a badge that let me park in the management parking lot at corporate hdqtrs next to boeing field) to help get BCS setup and operational. That summer, I also did the first version of dynamic adaptive fair share scheduler, the global LRU page replacement, and the hack that allowed portions of the cp kernel to be non-resident and pageable.

scheduler refs:
https://www.garlic.com/~lynn/subtopic.html#fairshare
page replacement refs:
https://www.garlic.com/~lynn/subtopic.html#wsclock

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Infiniband - practicalities for small clusters

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Infiniband - practicalities for small clusters
Newsgroups: comp.arch
Date: Tue, 25 May 2004 15:09:21 -0600

"Stephen Fuld" writes:
But virtual address translation and having the ability to relocate programs in physical memory aren't the same thing. Some (non-S/360) systems had a (OS only manipulatable) hardware base address register to which all "program" addresses were added. Then the OS could relocate the program in physical memory by changing the contents of the base register. This allowed swapping but didn't allow programs larger than physical memory.

several of the 360 & 370 machines had it also ... it just wasn't part of the 360 architecture and wasn't used for general applications.

basically for the past 15-20(?) years the mainframes have had a subset of the virtual machine facility built into the microcode of the hardware called logical partitions (LPARs, something that is currently the rage for being duplicated on a lot of other hardware platforms). they are virtual machines ... except they aren't paged/swapped ... and the memory of these virtual machines is contiguously allocated with base/bound hardware specification (and all the special case virtual machine handling is done by the machine microcode).

however the "do-relocate" (or "do-interpretive"? ... DIL?) instruction (which provided base/bound) from the 360/370 days was used exclusively by emulator applications. It was supervisor diagnose x'83' opcode with parameter that selected the DIL(?) function (most of my information on this is 2nd or 3rd hand).

The only exception that i know of was an IBM system engineer (on the boeing account) who got a highly modified version of CP/67 running on the 360/50 at the Seattle IBM datacenter (which occupied the glass front room on the first floor that currently has ???? jeffory, jasper ... some company that starts with a J, at least the last time i drove by). His work was early 70s before virtual memory was announced on 370s.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Random signatures

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Random signatures
Newsgroups: gnu.emacs.gnus
Date: Tue, 25 May 2004 16:05:10 -0600

Max Quordlepleen writes:
Last night I went googling for "random signature" in gnu.emacs.gnus, looking for a ready-made few lines I could steal and drop into my .gnus. I found a nice simple looking phrase that apparently worked for the person who posted it, but I can't get it work for me. I would be interested to learn what I've screwed up with this little snippet:

been a couple years ... i posted one that picked up signature from yow/zippy file.

example of past post with zippy signature
https://www.garlic.com/~lynn/93.html#5 360/67, was re: IBM's project F/S?

and post to gnu.emacs.gnus, 2001
https://www.garlic.com/~lynn/2001b.html#77 Inserting autom. random signature
https://www.garlic.com/~lynn/2001b.html#78 Inserting autom. random signature

yow had a problem in that it used a 16bit random number to select a byte position in the file ... and then moved to a saying boundary. this only worked for files that were 64kbytes or less ... it had to be patched if you had more than 64kbytes of random sayings.
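The limitation can be illustrated with a short Python sketch. This is not the yow code itself; the NUL separator between sayings, the wrap-around to the first saying, and the names `pick_saying` and `offset_bits` are assumptions made for the example.

```python
import random

SEPARATOR = b"\0"  # assumed delimiter between sayings


def pick_saying(data, offset_bits=16, rng=random):
    """Pick a random byte offset, then advance to the next saying boundary.

    With offset_bits=16 the offset is at most 65535, so sayings that
    start well beyond the first 64 KiB of `data` can never be selected.
    """
    limit = min(len(data), 1 << offset_bits)
    pos = rng.randrange(limit)
    sep = data.find(SEPARATOR, pos)
    start = sep + len(SEPARATOR) if sep != -1 else 0
    if start >= len(data):   # ran off the end: wrap to the first saying
        start = 0
    end = data.find(SEPARATOR, start)
    if end == -1:
        end = len(data)
    return data[start:end]
```

Widening `offset_bits` (the equivalent of the patch mentioned above) lets the random position range over the whole file, so sayings past the 64kbyte mark become reachable.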

could also check other postings to the thread in this n.g. from 2001

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

can a program be run withour main memory?

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: can a program be run withour main memory?
Newsgroups: bit.listserv.ibm-main
Date: Tue, 25 May 2004 23:14:50 -0600

Efinnell15 writes:
Their word length (and that of their successors the 709x's) was 36 bits, which held 6 BCD (sic) characters. Somewhere along the line the 709x's came out with 360 tape format. We got a reel of transistor models from Sandia for ECAP or PCAP and somebody had their bits backwards and they wouldn't load from the original tape. So we're down at the Krystal comiserating our plight and the best programmer in town walks in and says "you guys look like death." He'd gone over to the Dark Side with Burroughs but had made RAX stand up and talk when he had it. So we told our tale and he starts writing out the code on those crispy Krystal napkins. Few minutes later we're back at the keypunch and first run we got models!

the first student job i had was to implement the 1401 mpio (front-end to the university's 709) program on 360/30. the university 709 had something like 20,000(?) tubes ... and i have vague recollection of somebody saying that the air conditioning unit was enormous ... i believe something like "20 ton"(?) capacity.

1401 mpio did unit record<->tape front end for the 709 which was primarily used to compile/execute student fortran jobs under ibsys (running tape-to-tape).

they moved the unit record and seven track tapes to 360/30 .. and ran it in 1401 emulation ... when I wasn't trying to get the 360/30 version of mpio running. I got to invent my own interrupt handler, device drivers, task handler, storage management, console interface, etc. I eventually had a program of approximately a box of cards (2000); maybe 1500 instructions or around 5k bytes.

on 360/30, the 2400 tape control CCW op-code (this reading off my handy gx20-1703-7 green card, not from memory):


        DDMMM011

DD   density
00   200
01   556
10   800
11   ---

MMM   mode modifier
000   no-op
001   not used
010   reset condition, set density, set odd parity, data converter on,
      translator off
011   ... 9track only ...
100   density, even parity, data converter off, translator off
101   density, even parity, data converter off, translator on
110   reset condition, density, odd parity, data converter off, translator off
111   density, odd parity, data converter off, translator on

typically if you couldn't read a seven track tape ... it was an issue of trying different combinations of density (200, 556, 800 bits per inch), parity (even/odd), and possibly data converter and translator options.

in straight bcd mode, 80 byte "card" images, the six bits of each byte (7 tracks, one parity track) basically was straight character ... with bcd a subset of ebcdic. for binary, there were 160 byte "card" images on tape.
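The op-code layout in the table above can be decoded mechanically. The following Python sketch only transcribes that table; the function names and the dictionary keys are made up for the example, and the "reset condition" part of modes 010 and 110 is not modelled.

```python
DENSITY = {0b00: 200, 0b01: 556, 0b10: 800}  # bits per inch; 0b11 unused

# mode modifier -> (parity, data converter on, translator on)
MODE = {
    0b010: ("odd", True, False),
    0b100: ("even", False, False),
    0b101: ("even", False, True),
    0b110: ("odd", False, False),
    0b111: ("odd", False, True),
}


def decode_mode_set(opcode):
    """Decode a 7-track mode-set op-code of the form DDMMM011."""
    if not 0 <= opcode <= 0xFF or opcode & 0b111 != 0b011:
        raise ValueError("not a DDMMM011 op-code")
    density = DENSITY.get(opcode >> 6)
    mode = MODE.get((opcode >> 3) & 0b111)
    if density is None or mode is None:
        raise ValueError("unused density or mode modifier")
    parity, converter, translator = mode
    return {"density_bpi": density, "parity": parity,
            "data_converter": converter, "translator": translator}


def candidate_opcodes():
    """Every density/mode combination worth trying on an unreadable tape."""
    return [(dd << 6) | (mmm << 3) | 0b011 for dd in DENSITY for mmm in MODE]
```

For example, `decode_mode_set(0x93)` describes 800 bpi, odd parity, data converter on, translator off, and `candidate_opcodes()` enumerates the fifteen combinations to cycle through when a tape of unknown origin won't read.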

reading cards (for card->tape) was a little harder ... basically, the 2540 reader on the 360 read cards in ebcdic mode, never getting more than 8bits out of a column with 12rows (and bcd wouldn't punch more than 6bits out of a column with 12rows).

instead of doing a read, feed, select stacker ccw op-code ... you did a read ebcdic (11000010/C2 ccw opcode). 2540 ebcdic reads all 12 rows of each column but then has rules for mapping between 8bit ebcdic and 12 rows ... basically no more than six rows are punched in any one column (and the rules for mapping of six out of twelve rows provides for 256 possible values).

If there are punch hole combinations that violate the ebcdic mapping, then an error is generated. I then reread the card with column binary ccw (11100010/E2 ccw opcode). For this there are two six bit values mapped to each 12row column and are read into 160 bytes of 360 memory. I've talked to some people since then who wrote programs that read all cards as binary ... and then looked at the resulting 160 bytes in memory to see if it was really bcd or binary ... and could have the program reconstruct an 80byte bcd image from a 160byte column binary image.
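A small Python sketch of the column binary layout described above: each 12-row column becomes two six-bit values, so an 80 column card occupies 160 bytes. Which six rows land in the first byte, and the bit order within each byte, are assumptions here rather than a statement of what the 2540 does; the function names are hypothetical.

```python
def column_binary(columns):
    """Split 80 twelve-bit column values into 160 six-bit bytes."""
    if len(columns) != 80:
        raise ValueError("a card has 80 columns")
    out = bytearray()
    for col in columns:
        if not 0 <= col < 4096:
            raise ValueError("a column has 12 rows")
        out.append(col >> 6)     # upper six rows (assumed order)
        out.append(col & 0x3F)   # lower six rows
    return bytes(out)


def rebuild_columns(image):
    """Inverse: recover the 80 twelve-bit columns from a 160 byte image."""
    if len(image) != 160:
        raise ValueError("a column binary image is 160 bytes")
    return [(image[i] << 6) | image[i + 1] for i in range(0, 160, 2)]
```

Since the split loses nothing, a program that reads every card as binary can always get back to the full hole pattern and decide afterwards whether the card was really bcd.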

Then I did a separate feed/select stacker once the card was successfully read.

Then the card record was written to tape ... either 80byte bcd record or 160byte binary record.

tapes coming off the 709 could have a mixture of card output (both binary and bcd), but mostly printer (bcd) character output.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Blinkenlights?

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Blinkenlights?
Newsgroups: alt.folklore.computers
Date: Wed, 26 May 2004 07:18:27 -0600

"Charlie Gibbs" writes:
One of the displays available on the Univac 90/30's rollers was the length of the current seek on the currently-selected disk (if it was attached to the integrated drive adapter). Other LEDs on the same display indicated which disk was selected. It was great for spotting disk thrashers, especially after the 8418 drives came out (which replaced the 8416's glass cover with a solid one so you couldn't see the heads moving anymore).

one of the famous light indicators was on one of the 370/145 rollers. with the roller set to display the current PSW, there were little mnemonics on the roller for each light corresponding to a bit in the PSW.

this was before virtual memory had been announced for 370 ... but all the 370/145 rollers at customers had a roller which carried the legend "xlate" for one of the PSW bits.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

before execution does it require whole program 2 b loaded in

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: before execution does it require whole program 2 b loaded in memory
Newsgroups: bit.listserv.ibm-main
Date: Wed, 26 May 2004 07:43:09 -0600

ibmlax@ibm-main.lst (Lax TS) writes:
can a program start executing even before it is fully loaded into memory. For example, if program A consists of 10 pages, then does it start execution even if only the first 2 pages are loaded in memory ?

I remember some one telling me that while execution of a program a part of the program can be in memory and the program is executed by paging etc, but before execution it requires the whole program to be loaded in memory. If enough memory not available to load the whole program then it fails. Is this true and why do you have to load the whole program before execution ?

is this another one of those homework questions?

there are several different aspects to the question.

to load a program requires sufficient available contiguous address space. back before os supported virtual memory ... that required contiguous real storage. with initial vs2/svs ... they used virtual memory to simulate a single 16mbyte real machine ... with the kernel laid out in the first 8mbytes and the remaining 8mbytes of the (only) virtual address space supposedly available to all concurrent executing applications.

in the transition to vs2/mvs ... each program got its own 16mbyte virtual address space ... however the os convention has always been that the kernel and various other support services occupy the application's address space. that still required the kernel taking up 8mbytes of the 16mbyte address space available to the application. However, there were these other support services that still needed addressability. For this they defined a common area (in addition to the 8mbyte kernel) that occupied each address space that was needed by various support services. for many installations this could be 4mbytes ... leaving only 4mbytes in each 16mbyte address space for application. an application might consist of lots of different independently loaded programs ... and there could be a situation where an application totally filled the available 4mbyte area ... and additional programs couldn't be loaded.

this has gotten to be less of a problem with introduction of 31bit (and then 64bit) virtual address spaces (although there may still be application situations that have limitations to only running below the 24bit/16mbyte line ... in which case there could still be contiguous address space issues).

i've recently made a lot of lengthy postings in other n.g. on the issue of program images on disk being directly loadable by the paging system and/or program images being truly location independent. I had done some filesystem extensions that supported page mapped capability ... collection of past postings on page mapped filesystem:
https://www.garlic.com/~lynn/submain.html#mmap

the original 360 architects did some fine work such that instructions (with base+displacement architecture) in program image were position/location independent. however, the original os/360 implementation took some short cuts and defined for address pointer convention something called relocatable adcons. In the program image out on disk, these storage locations all have values that are effectively displacements (and are position/location independent) ... but a requirement of mapping the program image to address space is a "loader" process that brings the program image into memory and then runs around converting all the relocatable adcon storage locations to absolute pointers (before program execution can begin).

this has a downside that the actual running program image isn't really position independent and that there has to be this preloader process. lots of past posts on the relocatable adcon problem with simple page mapping program images in a page mapped filesystem:
https://www.garlic.com/~lynn/submain.html#adcon

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Leaf routines

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Leaf routines
Newsgroups: comp.arch
Date: Wed, 26 May 2004 14:00:00 -0600

glen herrmannsfeldt writes:
All OS/360 compilers I knew always saved all registers and chained save areas.

and you could get pli to dynamically allocate save areas (instead of using static area) by declaring recursive ... aka
XADD: PROC(XBPTR,XTPTR) RECURSIVE REORDER;

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Infiniband - practicalities for small clusters

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Infiniband - practicalities for small clusters
Newsgroups: comp.arch
Date: Thu, 27 May 2004 08:05:34 -0600

"Stephen Fuld" writes:
Either we are talking about different things or one of us is wrong (could be me). At some level, you have to get to a physical address. You can use displacements from some base, but you have to then load the base into a register to add the offsets to. In many programs this was accomplished by a BALR at the beginning of the program. So far, OK. This buys you position independence at program startup. But now consider what happens when the program has run half way through and the OS wants to swap it out and reload it into a different physical location. Now the contents of that arbitrary register, when added to the same displacement will produce an address that would have been correct before, but now is incorrect and the wrong data is loaded. What am I missing?

original relocatable adcons were designed for a (single) physical address machine that had (almost) no swapping (some stuff called roll-out/roll-in ... but required same physical address). the kernel and applications all occupied the same physical address space ... and paradigm extensively used pointer passing. in any case, there was no reloading to a different location.

the first pass at mapping this particular operating system to a virtual memory hardware was to define a single virtual address space that was full 24bit/16mbytes (maximum) that ran on a machine with much smaller real memory. effectively much of the kernel and applications were operating as if they were on a real 16mbyte machine. virtual memory hardware was used to arbitrarily map virtual address space pages to real storage pages ... but that was transparent to both the application and quite a bit of the kernel. most of the kernel and the applications operated as if they were still in physical address machine, but the physical storage available was equal to maximum addressable storage (16mbytes).

the 2nd pass enhanced this implementation so that the virtual address spaces were replicated one per application. from the application standpoint it looked as if it was the only application running on a 16mbyte real machine with the first 8mbytes occupied by the kernel. shared segments were used so that same kernel image was mapped to the first 8mbytes of each address space. there were some transition problems tho. there were some amount of system subsystem functions that were sort of treated like unique applications but there was still the convention that "real" applications passed them pointers as part of service requests.

However, with subsystem applications in their own address spaces ... it was difficult to make use of a pointer that originated in a totally different address space. So they created a kludge glue interface, called the common area ... a portion of "application" space (above the 8mbyte kernel line) that was common to all application address spaces. You could temporarily put some data in a common area, initialize a pointer to it ... and make a request to a service running in a different address space ... preserving the pointer passing paradigm from the original (single) physical address implementation.

A problem with this implementation was that the common area kept getting bigger and bigger as you tried to multiplex a large number of different applications and subsystem services. It got to a point that the kernel occupied 8mbytes (of 16mbytes), the common area needed to be 4mbytes, and the maximum application space was now down to 4mbytes ... and they were still stuck in a 24bit/16mbyte address space.

So the next generation machine introduced dual-address space ... special instruction that allowed a semi-privileged subsystem running in one virtual address space to reach across and access data in a different virtual address space (preserving the pointer passing paradigm). Also on this generation machine, they introduced physical storage larger than 16mbytes (& 24bit addressing).

The dual-address space was later generalized with access registers and program calls. basically hardware support for fastpath call through the kernel. The number of subsystem services operating in different address spaces continued to grow ... but they wanted to preserve the efficiency of the original direct BALR calling convention ... where application code could directly call library subroutines and other functions as if they were in the same address space ... using pointer passing convention. Arbitrary address space switching nominally was a kernel call function because of misc. privilege checking that has to occur along the way.

The enhancement was program table supported by hardware features and access registers ... along with a program call instruction. The program call instruction operated at nearly branch&link performance (w/o requiring an interrupt into the kernel) but attempted to instantiate the privilege checking rules for address space switching in tables & hardware. So you load up some number of different address space table pointers in control registers as well as table semantics for program call. You now can put all sorts of subsystem services and library routines in arbitrary address spaces and have them invokable by applications in other address spaces using pointer passing convention.

So this is a long winded pre-amble to saying that the definition of what an address is has gotten a lot more complex than the original definition that there was a one-to-one mapping between what the application thought was an address, what the kernel thought was an address, and what the hardware thought was an address. There are multiple kinds of addresses. The principles of operation
http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/DZ9AR004/CCONTENTS?SHELF=EZ2HW125&DN=SA22-7201-04&DT=19970613131822

now has a whole section talking about it, address types and formats:
http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/DZ9AR004/3.2?SHELF=EZ2HW125&DT=19970613131822

that covers ten different addresses:


    3.2.1.1       Absolute Address
    3.2.1.2       Real Address
    3.2.1.3       Virtual Address
    3.2.1.4       Primary Virtual Address
    3.2.1.5       Secondary Virtual Address
    3.2.1.6       AR-Specified Virtual Address
    3.2.1.7       Home Virtual Address
    3.2.1.8       Logical Address
    3.2.1.9       Instruction Address
    3.2.1.10      Effective Address

... so my original issue wasn't so much about what does an address mean ... but that I wanted the exact program image out on disk to not have to change when it was loaded into memory for execution (regardless of the reason or cause). I wanted to implement a pagemapped file system where i could just map virtual memory to a program image on disk and let the paging system (simply) do the rest. The problem was that the relocatable adcons that were part of program image out on disk had to be physically modified when the program image was brought into storage before execution.

Physically changing the contents of the program image also had secondary effect of also changing the logical meaning of those storage contents ... but the first order problem was the requirement that the program image was being physically changed (at all) for execution (resulting in it being different than the program image on disk).

the second issue was that the storage contents representing the relocation adcons became specific to the virtual memory location of the executing program image. Independent of what the meaning might be for that storage location ... the physical changes had the secondary effect of binding that specific executable program image to a specific virtual address space location.
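A rough sketch of the two issues above, in python rather than anything from the original systems. The image layout, the adcon offset list, and the names `load_with_adcons` and `resolve` are all hypothetical; the only things taken from the discussion are that a relocatable adcon is a stored 4-byte address the loader has to adjust, and that base-plus-displacement addressing keeps the address arithmetic in registers instead of in the image.

```python
import struct

def load_with_adcons(image: bytes, adcon_offsets, load_address: int) -> bytes:
    """Return the in-memory copy after a loader has added load_address
    to every 4-byte big-endian address constant in the image."""
    memory = bytearray(image)
    for off in adcon_offsets:
        (relative,) = struct.unpack_from(">I", memory, off)
        struct.pack_into(">I", memory, off, relative + load_address)
    return bytes(memory)

# hypothetical 16-byte image: one adcon at offset 8 that refers to offset 12
disk_image = bytes(8) + struct.pack(">I", 12) + b"DATA"

at_a = load_with_adcons(disk_image, [8], 0x20000)
at_b = load_with_adcons(disk_image, [8], 0x50000)

# issue 1: the executable copy no longer matches the image on disk
# issue 2: each copy is now bound to the address it was loaded at
print(at_a != disk_image, at_a != at_b)

def resolve(base: int, displacement: int) -> int:
    """Base+displacement: the image only holds the displacement,
    the load address lives in a register."""
    return base + displacement

print(hex(resolve(0x20000, 12)), hex(resolve(0x50000, 12)))
```

With the displacement form the bytes of the image are identical at every load address, which is the property a page-mapped load needs.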

So the infrastructure goes to a great deal of trouble of allowing an application to be able to load any program image at any (initially real, but now virtual) address location. However, in large real storage infrastructures and virtual memory, with multiple concurrent applications, there are some throughput advantages of allowing different applications that want to make use of the same program to be able to share the same program image in real storage (saving having to create unique copies of the same program image for each address space). After going to all of the infrastructure and paradigm complexity to have relocatable adcons ... there is now a situation ... that the first application to load a program image will "fix" the location for that program image ... and if any other application were going to be able to share the same program image ... it can now only use it at the address fixed by the first application to load that program image.

So we get into some inter-application and inter-address space complexities. So two applications in private address spaces start loading program images in arbitrary and possibly different order ... but still want to have a significant common physical sharing of the program images. Either there has to be a common application convention pre-reserving specific addresses for every program image (for every possible program image across all possible applications and address spaces) ... or the applications have to use the same exact program loading sequence ... or some other convention ... so the address selected in each address space by each application for every program image somehow comes out to be the same.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

[HTTP/1.0] Content-Type Header


From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: [HTTP/1.0] Content-Type Header
Newsgroups: comp.protocols.tcp-ip
Date: Thu, 27 May 2004 08:16:59 -0600

Barry Margolin writes:
BTW, I wouldn't expect to find this type of information in the RFC. The RFC specifies the protocol for communicating between the client and server. How the server decides what documents to serve, or determines their content types, is outside the scope of the protocol.

there are some RFCs that define MIME protocol/convention ... somewhat independent of conventions of what specific applications are expected to do with specific content-types. then there are some RFCs covering what might be expected for applications to do with some specific content-types.

so
https://www.garlic.com/~lynn/rfcietff.htm

and under RFC's listed by select Term (term->RFC#)

and scroll down to
Content Distribution Internetworking
    3570 3568 3466
Content Negotiation
    see also content-type
    2738 2703 2534 2533
content-type
    see also multipurpose internet mail extensions
    3798 3730 3676 3625 3560 3534 3464 3462 3420 3391 3335 3274 3261 3236 3204 3156 3023 3003 2987 2958 2957 2938 2913 2912 2910 2876 2854 2738 2703 2646 2633 2586 2565 2543 2534 2533 2530 2518 2506 2442 2426 2425 2392 2388 2387 2376 2318 2301 2298 2231 2220 2184 2112 2111 2083 2077 2048 2046 2015 1896 1895 1894 1892 1874 1873 1872 1741 1590 1563 1523 1496 1437 1049

clicking on rfc number brings up the RFC summary in the lower frame.

clicking on the ".txt=nnnn" in the RFC summary retrieves the actual RFC.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Infiniband - practicalities for small clusters


From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Infiniband - practicalities for small clusters
Newsgroups: comp.arch
Date: Thu, 27 May 2004 08:58:11 -0600

... addenda ...

in the earlier implementations (first physical/real ... and then single virtual address space) all concurrent applications and program images occupied the same address space (whether real or virtual). there was no issue that different applications could choose different addresses for the same program image, precluding being able to share the same program image. because there was a single address space, all application use and all program image locations were coordinated (in the same address space).

it was the move to giving different applications their unique address space that things got into trouble. the paradigm was still left over from the single address environment that different applications could arbitrarily load program images in any order and at any time ... and any sharing issues would automagically be taken care of.

however with applications running in their unique address spaces ... you sort of lost any inter-address space coordination of address selected for program image loading. applications in one address space were willy-nilly loading program images at first available address ... just like they had in the single address space implementation; however there was no longer the implicit fall-out that the location chosen to load a program image by one application would necessarily always be the same address used by all applications (which was an implicit side-effect of running everything in a single address space).

so we go back to the early 70s. The cp67/cms environment had always given each application in the CMS environment its own address space. Somewhat because for a long time the total CP67 and CMS architecture, design, development, implementation, test and support organization numbered around 15 people ... and that there was a lot of similarity between the virtual memory 360/67 and the rest of the 360 line ... CMS heavily plagiarized compilers, applications, programs, etc, written for os/360. CMS even included an OS/360 simulator. There was even a later joke ... that given OS/360 had a documented API ... that the 40kbyte CMS implementation of the OS/360 API didn't suffer all that badly in comparison to the 8mbyte MVS implementation of the OS/360 API (there were some parts of the OS/360 API that CMS wasn't quite able to do in 40kbyte implementation that MVS managed to accomplish in its 8mbyte implementation of the OS/360 API).
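To make the loss of coordination concrete, here is a small python sketch under made-up assumptions: the image names, sizes, starting address, and the `load_in_order` helper are all hypothetical, and "first available address" is modeled as a simple bump allocator.

```python
def load_in_order(images, sizes, start=0x400000):
    """Give each image the first available address, in the order loaded."""
    placed = {}
    next_free = start
    for name in images:
        placed[name] = next_free
        next_free += sizes[name]
    return placed

# hypothetical program images and their sizes
sizes = {"editor": 0x10000, "compiler": 0x30000, "sort": 0x8000}

# two private address spaces loading the same images in different order
space_a = load_in_order(["editor", "compiler", "sort"], sizes)
space_b = load_in_order(["sort", "editor", "compiler"], sizes)

# an image with address-bound adcons can only be shared where both
# address spaces happened to pick the same location
shareable = [name for name in sizes if space_a[name] == space_b[name]]
print(shareable)
```

In a single address space there is only one `placed` table, so the question never comes up; with one table per address space the addresses only agree if the load sequences happen to agree.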

Now, I did both a pagemapped filesystem implementation for CMS and an infrastructure that allowed arbitrary specification of images to be shared across different CMS virtual address spaces (in principle, any object in the page mapped filesystem could be invoked as shared across multiple different virtual address spaces). So I ran into the problem thirty some years ago with relocatable adcon paradigm both

1) greatly complicating the page mapping of program image on disk to the executable program image in memory

2) greatly complicating coordinating sharing the same executable program image across multiple different address spaces

Somewhat side-track ... the official virtual memory operating system for 360/67 was TSS/360, which didn't survive. For several years of the early development period (mid-60s) the combined CP67/CMS group was 12 people and the TSS/360 group was something like 1200 people. TSS/360 did implement support for program image on disk being the same as the executable program image in memory ... and that neither the program image on disk nor the executable program image was address location fixed.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Phone # layout vs Computer keypad # layout???!???!!!

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Phone # layout vs Computer keypad # layout???!???!!!
Newsgroups: alt.folklore.computers
Date: Thu, 27 May 2004 14:51:57 -0600

Roland Hutchinson writes:
When I were a lad in La-la Land (area code 213), there were regional toll- free numbers with the prefix ZEnith. I can't recall ever having seen or used a phone with the letter Z over the zero, but you did have to dial the operator (zero) to place a call to a ZEnith number.

when i was a lad ... the area that i was in had one of the last human telephone operators around ... you picked up the phone and told the operator a three digit number (or just who you wanted to talk to) ... or the other way of stating it was it was rather late in getting automated CO switch. then we had to dial three digit prefix followed by a four digit number .... aka had to go to a seven digit number from a three digit number (better than double) ... although my mother still has that same number (so i don't often forget it).

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Text Adventures (which computer was first?)


From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Text Adventures (which computer was first?)
Newsgroups: alt.folklore.computers
Date: Thu, 27 May 2004 14:54:17 -0600

stanb45@dial.pipex.com (Stan Barr) writes:
One of the best emulators I've seen is Bernhard Baer's PDP-8E emulator, gives you a real looking graphical front panel with blinkenlites and working switches - the only thing missing is the sound of the ASR-33s! Needs a Mac (traditional or modern) though...

and the exercises needed to keep your fingers strong enouf to type on the keyboard (although lots of typing in itself was pretty good strength training).

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Infiniband - practicalities for small clusters


From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Infiniband - practicalities for small clusters
Newsgroups: comp.arch
Date: Thu, 27 May 2004 16:01:07 -0600

"Stephen Fuld" writes:
Yes MVS. I snipped the rest of this post that gives the long contorted history of how IBM took many steps to deal with trying to overcome what I consider to be some bad initial decisions. It worked, but became very complex and hard to understand.

the other way of looking at it ... was that there were some systems actually designed for virtual memory. the problem with the os/360 genre was that they sort of slowly and somewhat ... (dare i say it?) ... bumbled .... their way into it over an extended period of years (couple decades) ... always trying to maintain compatibility with their real-storage memory heritage from the mid-60s.

sometimes in the past there have been comments about why didn't ibm have a virtual memory system like somebody or another ... and in a couple cases, i would point out that not only did ibm have such systems ... they might have had significantly more customer installations than whatever vendor was being cited. the problem was that the combined number of all such systems was totally dwarfed by the sheer magnitude of the number of ibm batch real-memory oriented systems.

it is hard to say that the real-memory batch orientation was truly bad ... since so many customers appeared to have successfully used it. while i may criticize them for not having done a perfect job of adapting from real storage to virtual storage ... it is hard to claim that it was truly bad ... from the standpoint that customers' applications continued to run ... all of those machines that did things like payroll checks and airline reservations and random other things didn't stop. it would have been really bad if things stopped working and people didn't get their checks or some number of other bad things. the tortuous path of getting from mid-60s to now ... with the ability for some of the binaries from the period to be able to continue to execute nearly forty years later ... wasn't perfect but it worked. some large part of the complexity is due to trying to maintain compatibility with the past.

the official mainstream was supposed to have gone 360, new generation of memory technology in real-memory 370s (relatively minor cosmetic differences from the real-memory 360s) and then FS or future systems.

As i've mentioned before, my take on FS at the time was it had similarities to the film playing down at central sq and the inmates being in charge of the asylum ... every thing that anybody had ever thot of was thrown in ... with total lack of any practicality. One of the things that helped get FS killed was a study that calculated that if the absolute highest performance technology of the day was used to build an FS machine ... the FS technology would cause something like at least a 10:1 performance degradation; an FS machine made out of 370/195 technology would run applications at 370/145 thruput.

I've claimed several times that one of the things that was prime motivation for RISC in the 70s was the strong reaction to the FS debacle ... and swinging the pendulum from the absolutely most complex CISC, one-level store, hardware, to the opposite extreme ... as KISS-simple as possible.

One could claim that the whole os/360 stumbling into virtual memory wasn't exactly anticipated ... because officially there was going to have been a complete replacement with FS (much more radical change than the change made by the 360 introduction). The canceling of FS left a lot of the company scrambling (or for some, transferring to rochester).

not every idea in FS was totally impractical ... however many were, while others were just totally impractical for the time ... but there didn't seem to be any judgement about what was actually do'able and what wasn't.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Infiniband - practicalities for small clusters

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Infiniband - practicalities for small clusters
Newsgroups: comp.arch
Date: Thu, 27 May 2004 16:25:16 -0600

"Stephen Fuld" writes:
I think we are in agreement here. I think Brian agrees also that the techniques Lynn mentioned won't work if you allow relocating the program after it has started execution.

the techniques i used worked for the program image ... whether it had started executing or not ... i.e. those techniques are used to allow the same program image to exist simultaneously in a number of different execution contexts. the issue is that the techniques don't allow a specific execution context ... which includes the program image as well as the general purpose registers ... to take on a different address location once execution has started.

at the moment there isn't much of a requirement to allow a specific execution context to be able to have shifting, changing address locations (outside of the virtual memory scenario) ... which would require changing not only the program image paradigm but also the register addressing conventions. the technique i describe, though, is necessary to allow the same exact program image to occupy several different places concurrently ... i.e. the image on disk is exactly the same as the image in an address space and the image in any specific address space is exactly the same as the image in all address spaces.

the technique i describe would also be used if you were to allow randomly shifting address locations for specific execution context ... and while it could be necessary for supporting a specific execution context that randomly changes location ... it wouldn't be sufficient ... since a specific execution context is also made up of other items which would also have to be adapted to allowing randomly changing address location.

now, address translation hardware is one technique for allowing a specific execution context to have randomly changing physical location ... by imposing a hardware layer between the logical address generated by instruction and a physical effective address.
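As a minimal illustration of that hardware layer (a python sketch, with an assumed 4096-byte page size and a hypothetical one-level `page_table` dict standing in for the real segment and page tables), the virtual address the program uses stays fixed while the real location underneath it changes:

```python
PAGE = 4096  # assumed page size for the sketch

def translate(page_table, vaddr):
    """Map a virtual address to a real address through a one-level table."""
    frame = page_table[vaddr // PAGE]
    return frame * PAGE + vaddr % PAGE

# hypothetical mapping: virtual page 5 currently sits in real frame 17
page_table = {5: 17}
vaddr = 5 * PAGE + 0x10

before = translate(page_table, vaddr)

# the page is paged out and later brought back into a different frame
page_table[5] = 3
after = translate(page_table, vaddr)

# same virtual address (and same register contents), different real address
print(hex(vaddr), hex(before), hex(after))
```

Registers and adcons in the execution context hold `vaddr`, so nothing in the context has to be touched when the frame changes.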

however, the address translation is necessary for allowing a specific execution context to have randomly changing real address locations ... but it isn't sufficient for allowing the same exact (not changed) program image on disk to be used for execution in an address space (at least not with the relocatable adcon paradigm). The address translation hardware is also not sufficient to allow the same exact program image from disk to be arbitrarily loaded by multiple different applications running concurrently in multiple different address spaces. One possible work around would be to pre-establish a unique fixed address location for all possible program images that all applications can agree to beforehand. Then whenever an application is loading a specific program image into their address space ... they won't preclude the same exact program image from being used concurrently by some other address space.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Infiniband - practicalities for small clusters


From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Infiniband - practicalities for small clusters
Newsgroups: comp.arch
Date: Thu, 27 May 2004 16:50:15 -0600

addenda to techniques

... another way of describing it is that address translation hardware allows a specific execution context to occupy randomly changing physical memory locations.

... the relocatable adcon scenario allowed multiple different applications to share a common program image when there was a single address space ... either a single real address space from real-memory os/360 days or from single virtual memory address space from os/vs2 svs (single virtual storage ... as opposed to the later mvs ... multiple virtual storage).

the relocatable adcon scenario didn't allow that the program image on disk be the same as the executable program image ... but that didn't matter as much for a non-pagemapped filesystem ... you had to do fiddling to get the program image into the address space (real or virtual) in any case.

the scenario that comes up with relocatable adcons is

1) having the same exact program image on disk that executes in an address space (makes pagemapped filesystem program loading somewhat simpler/KISS)

2) and having the same exact program image loaded into arbitrarily different virtual address spaces (in the os/360 genre introduced with os/vs2 mvs).

real-memory os/360 and svs allowed that a (reentrant) program image could be loaded into memory (real or virtual) and the same exact program image could be simultaneously used concurrently by multiple different applications.

going to multiple different virtual address spaces ... either in the mvs scenario or the cms scenario that has existed from the start ... the ability to share the same exact program image (aka have a single physical copy in real storage) becomes problematic unless the program image has no position dependencies ... or some convention has been pre-established assigning a unique position to every possible program image.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

Infiniband - practicalities for small clusters


From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Infiniband - practicalities for small clusters
Newsgroups: comp.arch
Date: Thu, 27 May 2004 17:30:22 -0600

"Stephen Fuld" writes:
I think we are in agreement here. I think Brian agrees also that the techniques Lynn mentioned won't work if you allow relocating the program after it has started execution.

oh, and if you are interested in more discussion ... there is actually a longer, more detailed thread currently running in bit.listserv.ibm-main.

to get the same program images to exist concurrently in multiple address spaces ... they are loaded into the "linkpack" area. The linkpack area originated in the real-memory os/360s days as sort of a cross between dynamically loadable kernel extensions and static program image caching. In mvs, in the same way that the kernel image occupies every address space ... linkpack is present in every address space. since there is a common kernel interface that serializes the loading of program images into linkpack ... every linkpack loaded program image is assigned a unique address and also is allowed to occupy every address space (as opposed to just those address spaces that were specifically interested in loading that program image).
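The effect of one serialized loader for a commonly mapped area can be sketched in python. This is only a model of the idea, not of the actual mvs interfaces: the `SharedArea` class, its addresses, and the image names are hypothetical, and a lock stands in for the kernel serialization.

```python
import threading

class SharedArea:
    """Hypothetical stand-in for an area mapped at the same place in
    every address space, with a single serialized loader."""

    def __init__(self, start, size):
        self._lock = threading.Lock()
        self.next_free = start
        self.end = start + size
        self.directory = {}  # image name -> the one address it was given

    def load(self, name, length):
        with self._lock:
            if name in self.directory:  # already loaded: same address for everyone
                return self.directory[name]
            if self.next_free + length > self.end:
                raise MemoryError("shared area full")
            addr = self.next_free
            self.directory[name] = addr
            self.next_free += length
            return addr

area = SharedArea(start=0x800000, size=0x100000)

# two address spaces asking for the same images in different order
seen_by_a = {n: area.load(n, l) for n, l in [("sort", 0x8000), ("editor", 0x10000)]}
seen_by_b = {n: area.load(n, l) for n, l in [("editor", 0x10000), ("sort", 0x8000)]}

print(seen_by_a == seen_by_b)
```

Because there is one directory rather than one per address space, whoever loads first fixes the address and everybody else simply inherits it, at the cost of the image taking up room in every address space.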

if you are interested the thread you are looking for appeared to start out as a troll or a homework question ... but has generated quite a bit of posting ... if you use google search you would be looking for the
"before execution does it require whole program 2 b loaded in"

thread.

The answer might be implied if the loader has to run thru arbitrary and random locations in the program image swizzling all relocatable adcons before allowing program execution to start.

part of most recent thread just took a slight detour discussing whether all possible relocatable adcons were full-word aligned and therefore couldn't cross a page-boundary ... possibly affecting what a loader might or might not have to do with respect to processing relocatable adcons before starting program execution.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

before execution does it require whole program 2 b loaded in


From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: before execution does it require whole program 2 b loaded in
Newsgroups: bit.listserv.ibm-main
Date: Thu, 27 May 2004 16:59:57 -0600

Chris_Craddock@ibm-main.lst (Craddock, Chris) writes:
Can't be. ADCONS are always word aligned and will never "cross" a page boundary.

CMS convention has violated that since possibly 1965(?). the original kernel call mechanism in cms was an svc 202 with a pointer in register one that pointed to a parameter list. the first token in the parameter list was the requested function. it turns out it is the same interface used for typed commands and shell commands (one of the things that made cms so easily extendable was the consistency of its execution interface). in any case a cms kernel call was either

    svc 202
    dc  al4(address)
    instructions

or

    svc 202
    instructions

if there was an error, the return would check for a byte of zeros following the svc and assume it was an address and branch to the address location. if there wasn't a zero, it would assume there was no error handling and go off to a default system provided one.

if there was no error, it would check for a zero following the svc, and if it found one it would return to the svc interrupt address plus four (i.e. skipping the presumed adcon). if the byte following the svc wasn't zero, it would just return to the address of the svc interrupt.
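The return logic described above can be sketched as a small python function. The function name, the tuple results, and the sample byte strings are made up for illustration; what comes from the description is the two-byte svc, the test of the byte following it, and the four outcomes. (With 24bit addresses the high byte of an al4 adcon is zero, which is what the test relies on.)

```python
def svc202_return(memory: bytes, interrupt_addr: int, error: bool):
    """Decide where control goes after an svc 202.

    interrupt_addr is the address just past the two-byte svc.
    Returns ('resume', addr), ('branch', addr) or ('default_handler', None).
    """
    has_adcon = memory[interrupt_addr] == 0
    if has_adcon:
        target = int.from_bytes(memory[interrupt_addr:interrupt_addr + 4], "big")
        if error:
            return ("branch", target)          # caller supplied an error address
        return ("resume", interrupt_addr + 4)  # skip over the adcon
    if error:
        return ("default_handler", None)       # no adcon: system default handling
    return ("resume", interrupt_addr)

def build(code_at_0x100: bytes) -> bytes:
    """Hypothetical storage image with the call sequence placed at 0x100."""
    return bytes(0x100) + code_at_0x100 + bytes(0x100)

# svc 202 (0A CA), dc al4(x'200'), then an instruction (07 FE)
with_adcon = build(bytes([0x0A, 0xCA, 0x00, 0x00, 0x02, 0x00, 0x07, 0xFE]))
# svc 202 followed directly by an instruction
without_adcon = build(bytes([0x0A, 0xCA, 0x07, 0xFE]))

print(svc202_return(with_adcon, 0x102, error=False))
print(svc202_return(with_adcon, 0x102, error=True))
print(svc202_return(without_adcon, 0x102, error=False))
print(svc202_return(without_adcon, 0x102, error=True))
```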

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

before execution does it require whole program 2 b loaded in


From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: before execution does it require whole program 2 b loaded in
Newsgroups: bit.listserv.ibm-main
Date: Thu, 27 May 2004 17:04:34 -0600

etech@ibm-main.lst (Eric Chevalier) writes:
Gil then posted a bit of sample code showing an ADCON that DID cross a page boundary. This was followed by a bit of discussion to the effect that word aligment of his ADCON was suppressed by an explicit length specifier.

note that the original virtual machine system was circa 1965 with a custom modified 360/40 with virtual memory translation hardware. cp/40 and cms were built for this machine. when ibm started shipping the standard virtual memory 360 product, 360/67, cp and cms were ported to it as cp/67 (360/67 address translation had a lot of similarities to 370 address translation, however the custom 360/40 hardware translation control was quite a bit different).

in any case, cms has been using the dc-al4() convention following a two byte svc instruction for just short of 40 years ... and probably half of them are half-word aligned ... and some percentage happen to cross page boundaries.
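The alignment arithmetic can be checked with a few lines of python. The 4096-byte page size is an assumption for the sketch (the point holds for any page size that is a multiple of four), and `crosses_page` is a hypothetical helper.

```python
PAGE_SIZE = 4096  # assumed page size
ADCON_LEN = 4     # dc al4(...) is four bytes

def crosses_page(offset, length=ADCON_LEN, page=PAGE_SIZE):
    """True if the field's first and last byte fall in different pages."""
    return offset // page != (offset + length - 1) // page

# every possible placement in the first two pages, by alignment
fullword = [o for o in range(0, 2 * PAGE_SIZE, 4) if crosses_page(o)]
halfword = [o for o in range(0, 2 * PAGE_SIZE, 2) if crosses_page(o)]

print(fullword)  # fullword-aligned adcons never straddle a page
print(halfword)  # halfword-aligned ones do, at the last halfword of a page
```

A two-byte svc sitting at a fullword boundary leaves the following al4 on a halfword boundary, so an svc in the last fullword of a page puts its adcon across the page boundary.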

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/

before execution does it require whole program 2 b loaded in

From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: before execution does it require whole program 2 b loaded in
Newsgroups: bit.listserv.ibm-main
Date: Thu, 27 May 2004 17:09:17 -0600

john_w_gilmore@ibm-main.lst (john gilmore) writes:
Moreover, as Paul well knows, Chris knows these things. There is thus more empty rhetoric than substance in Paul's arguments: Ordinary A-type ADCONs do not cross page boundaries, and the production of pathological counter-examples does not change this.

the cms standard scenario couldn't really be considered pathological ... the dc-al4() convention following a two-byte svc202 instruction was totally embedded in nearly every kernel module and all application code. at one time it was the ONLY way that programs could be invoked in cms.

--
Anne & Lynn Wheeler | https://www.garlic.com/~lynn/
