From: Noel Chiappa <jnc@ginger.lcs.mit.edu>
Date: Mon, 3 Jan 94 19:08:02 -0500
To: nimrod-wg@BBN.COM
Subject: New datagram mode

I know everyone's got a lot of back mail, but I'd appreciate it if you
didn't all file that (admittedly long, sigh) message from last week about
the new datagram mode in the round file.... I'd like to think that this has
plugged a major leak in the Nimrod bottle, and I'd like to make this a
major forwarding mode, but so far, I've not gotten any feedback from the WG
at large at all. I don't want to take a major jump like this without *some*
reaction... :-)

	Noel

------------------------------

From: Noel Chiappa <jnc@ginger.lcs.mit.edu>
Date: Tue, 4 Jan 94 00:12:58 -0500
To: nimrod-wg@BBN.COM
Subject: "Virtual Link" flows

One minor thing to beware of: when sending a packet down a virtual link
using source routing, if that virtual link is itself composed of virtual
links, you more or less have to have a real flow for the top-level virtual
link; otherwise you could run into problems with needing a stack of "flow
identifiers" in the packet. (I have said this poorly, I just wanted to note
it down before I forgot it again... I thought of it some months ago and
lost it until just now.)

	Noel

------------------------------

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Date: Tue, 4 Jan 94 14:16:44 JST
To: Noel Chiappa
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
Subject: Re: mobility and NIMROD

A happy new year. I'm back.

> > > Yes, but it's an ungainly mechanism which is not the best way to
> > > accomplish what I want.
> >
> > It is gainful so that no area ID is necessary.
>
> There's no free lunch. If you get an area ID by either i) using the set
> of ID's of all constituents, or ii) using the ID of one of its
> constituents, e.g. the numerically lowest one, you have the problem
> "what do you do when the constituent leaves the area".

That's why I think that mobility should be separated from Nimrod.

> I'd like areas to have identities which are longer-lasting than that of
> any particular constituent. I mean, turning off one router at the top
> level could change the "locator" for millions of hosts underneath it.
> This is not workable.
As I have shown with DNS registration, hosts need to register in DNS only
the direct upper-level routers (except for glue information, in which case
some, though not all, globally reachable paths should be provided).
Moreover, the registration does not have to be changed immediately after
some configuration change, as long as some paths registered in DNS are
still actually usable.

> > I think areas should have identities which are somewhat more durable
> > than the transient list of what border routers it has
> >
> > I think your Internet Draft ... mentions that an area ID may vary when
> > the area is subdivided.
>
> There are two problems with this. First, what do you do when a level 2
> area partitions (I know, I know, you assign the UID's to the networks :-),

This is the case where an area ID made of the set of all the border
routers behaves better than an area ID made of the least EID. With the set
ID, you should try to reach one of the level 2 routers whose EID is listed
in the routing table of the sender. If the routing table contains
separated sets of level 2 border routers, the sending host can locally
recognize that a partitioning has occurred. Then, if the sending host
chooses a router at random, the level 2 router may reply with ICMP (or
something new like that) that some level 1 router is unreachable because
of partitioning, in which case other level 1 routers should be tried. The
sending host may, instead, consult the level 2 routers for the list of
reachable level 1 routers.

> and second (and this is the one that makes me look at it with
> disfavour), all the hosts inside the "new" K level area have new
> locators, which is probably not practical.

Thus, the locator must change dynamically, which is taken care of, quite
naturally, by routing information exchange.

> Probably we'll have to use some other partition repair method, like
> tunnels (a la IS-IS).
>
> I don't know though, now that I think about it: there are probably some
> partitions you can't fix with tunnels. In that case, you almost *have*
> to accept that the topology change will have visible consequences, i.e.
> new locators. Still, that's not so bad; if we have a mobile-host
> mechanism where we can find out that an EID has a new locator.... :-)

I don't think anyone in this WG has worked out a workable mobility scheme.
Please don't expect too much of DNS TNG, unless you engineer the details
of it.

> > Suppose a router crashes, or is removed from service, etc. Is the
> > "area" still the same, or not?
> >
> > Nooo! If an area is split into two areas, it can't be the same, of
> > course.
>
> Sorry, I happen to think it's not reasonable to say that a top-level
> area has fundamentally changed (affecting the millions of hosts inside
> it) because a minor detail of the topology has changed.... Even in the
> case where an area is split into two (perhaps by a permanent partition),
> part of the area ought to be able to get on with life without a big
> upheaval.

I don't think partitioning of an area could be handled with a compact area
ID without a lot of static configuration.

> > If a host H inside Q is using R3 as part of its "locator" for certain
> > conversations, and R3 is turned off, it would be nice if the traffic
> > could flow around the failed router without H having to get a new
> > "locator" and send it along to the hosts which are in communication
> > with it.
> >
> > No problem. We should give all the possible paths including R4 to the
> > lower layer.
>
> Ah, but it's easy to find scenarios in which this doesn't work! Consider
> this one: H is part of an area which is connected to the rest of the
> world through R1. H starts a conversation, using only R1 as the lowest
> layer in its locator; then R2 is brought up and connects the area to the
> rest of the world, after which R1 is shut down.

So, H should register both R1 and R2 in DNS. DNS provides the information
on all the possible paths to H. The selection of the best path is done by
routing tables, ICMP and such, perhaps in the kernel.

> Again, I'd like things to have locators which transcend (to some degree)
> changes in the topology. Obviously, as I have mentioned at length
> previously, as the topology changes, the particular abstraction
> hierarchy you have chosen will become non-optimal (in terms of
> minimizing the sum of costs of the routing), and eventually you will
> want to modify the abstraction hierarchy. However:
>
> >>> I'd like to make the binding between the topology and the abstraction <<<
> >>> hierarchy a little looser than in your scheme, so that the change to  <<<
> >>> the latter to match the former is *controllable*. In system architect <<<
> >>> terms, this is the *fundamental* problem with your idea.              <<<
>
> > > And you have unsolvable problems of EID->locator mapping
> >
> > I keep hearing about this "problem". It's a problem which does not
> > exist except during a deployment phase (when we will see traffic from
> > unmodified hosts which contains only the original 32-bit IP
> > "address").
>
> It does exist forever.
>
> The percentage of traffic for which I will have to perform this
> operation will decline over time to a very small share of the traffic.
> Since it's a small share, it doesn't matter if the mechanism to do it
> isn't very good. Anyway, it's not "unsolvable", it's simply a
> translation directory problem. It's a close match to the "white pages"
> problem, since entities which are neighbours in the directory may be
> otherwise unrelated. The directory must thus be maintained by an
> organization which is equally trusted by both entities.

I'm afraid "white pages" is another bad example of yours. White pages are
books. They are broadcast globally, but quite slowly. Moreover, to
broadcast the white pages information, routing information must be
available along the broadcast paths. Thus, routing information can not be
like "white pages".

> > From where, do you think, can the EID and the locator be obtained?
> >
> > Where means "the EID and the locator" of the information source, of
> > course.
>
> In the same way that you have to have the IP address of some DNS server
> before you can start using the DNS to translate host names to IP
> addresses, you'll have to have the locator (the EID is not strictly
> speaking needed; you can use the "all IP endpoints" broadcast EID) of a
> translation server before you can start looking up (EID, locator)
> tuples.

Good. That's one of the reasons why mobility can not be handled by dynamic
DNS TNG. The address of the translation server must be configured
statically by hand, and can not change dynamically (purely theoretically,
just as with root name servers, it could be configured dynamically,
but...). Broadcasting is not useful except in a very local environment, in
which case you anyway need static configuration somewhere. So, with
broadcasting, I think, you are only replicating the current CATENET model
(or the subnet model).
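A minimal Python sketch of the translation step being argued about may
help; the directory contents, EIDs and locators are invented for
illustration, and nothing here is specified by Nimrod. A flat EID maps to
the list of locators registered for it, so a multi-homed host simply has
several entries:

# Hypothetical translation directory: flat EID -> registered locators.
TRANSLATION_SERVER = {
    "eid-H": ["A.B.R1", "A.B.R2"],   # dual-homed: via borders R1 and R2
    "eid-G": ["A.X.Y"],              # single-homed
}

def lookup_locators(eid):
    """Return all registered locators for an EID ([] if unknown)."""
    return TRANSLATION_SERVER.get(eid, [])

# The directory only enumerates the possibilities; *selecting* among
# them (routing tables, ICMP feedback, timeouts) is a routing matter.
print(lookup_locators("eid-H"))      # ['A.B.R1', 'A.B.R2']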
> Actually, I forgot something in my last message; for multi-homed hosts,
> you will get back an (EID, interface-0-locator, i-1-locator, ...
> i-N-locator) list. (Gee, Dave, S-expressions again! :-)

I'm afraid the expression is O(level^2) long at best. BTW, how, do you
think, could the information be registered in DNS or DNS TNG?

> > That is, EID->locator mapping must be static, which excludes explicit
> > support for mobility.
>
> No. If any host wishes to modify its EID->locator binding (i.e. go
> mobile), the host has to notify all entities with which it is in
> communication of its new locator;

The problem is that, if, by some fault, a mobile host can not notify its
new location before it moves, it can't do so forever, as the new location
is not registered and thus is unreachable.

> i.e. tell them to update their EID->locator binding (I don't really
> think of it as a "mapping", since to me that term implies a more general
> mechanism than is available here.)

So, it is NOT the solution adopted by the mobility WG.

> > The EID and locator are a tuple, and should be carried around
> > together; the binding between them is only broken in special cases
> > (such as mobile hosts).
> >
> > As I have shown in my solution, the binding can not be broken even for
> > mobility.
>
> I don't understand why you can't change the binding? That's the *whole
> point* of a binding; to allow the relationship of the two objects to be
> changed. (Have you read Prof. Saltzer's paper, "On the Naming and
> Binding of Network Destinations"? It is available as RFC-1498. Everyone
> on this mailing list should read it! :-)

So, a name server should provide information for the mapping from a name
to all the possible paths. The best available path should be selected by
routing information.

> > > multi-homed area and mapping for area names
> >
> > I didn't catch what you referred to; could you give a few more
> > details?
> >
> > How, do you think, are area names assigned?
>
> I'm still not sure what you mean, but let me see if this is what you are
> referring to. To use a real example, if an area such as "MIT" is
> connected to several different long-haul carriers (let's call them "A",
> "M" and "S" :-), if you make the MIT area part of just one of those
> areas, it may bias incoming traffic through that long-haul carrier? To
> solve this, you want to make the MIT area appear in all three of the A,
> M and S areas?

Correct.

> If this is what you are referring to, it is a bit of an issue, but as
> you yourself pointed out, solving this by assigning multiple locators
> (which is effectively what you are doing, although I admit you have
> picked a notation which is fairly compressed) can be an exponential
> problem. If each area has R border routers, and is connected to N areas
> above it, the K'th layer will contain (N^K)*R EID's (if my mental math
> hasn't gone wacky).

Correct. That's why multi-homing should be discouraged.
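Noel's mental math does check out; here is a quick sketch of the growth,
with R, N and the depth of course being made-up numbers:

# If each area has R border routers and is connected to N areas above it,
# naming areas by border-router EIDs costs (N^K)*R EIDs at the K'th layer.
R, N = 2, 3                      # hypothetical: 2 borders, 3 parents

for K in range(1, 5):
    print(K, (N ** K) * R)       # 6, 18, 54, 162: exponential in K,
                                 # hence "multi-homing should be
                                 # discouraged"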
> It can also take a tree-structured notation to solve; if carriers A and
> M belong to different global consortia, each will have parent "areas",
> but *none in common*. So, your simple notation won't work any more; each
> K level area will have a disjoint set of higher level areas above it.

My notation still works. If a level K area is multi-homed, level K border
routers should put DNS information about the EIDs of all the border
routers at the directly upper level, at least. For example, for the name
of a triply-homed router at the K-th level:

	. . . . . .

should, at least, be registered in DNS.

> The "multi-homed" site problem is a *routing* problem, and I am
> convinced that any attempt to solve it purely in the addressing will
> fail. The right solution is to use the locator to find out where the
> thing is on the map, and then make sure the map has enough info in it to
> *show* the multi-homing; it's then up to the entity picking the route to
> make use of that multi-homing.

Again, all the possible routes should be registered in DNS (hopefully in
compact fashion, as I have exemplified); selecting the best one among them
is the routing problem.

> > The best border routers will be known through routing information, of
> > course, and the selection will be made from all the possible border
> > routers in the lower-level routing at the source.
>
> Any time you are selecting a set of routers, or a single router, I have
> to ask the question "how do you recover when that router fails".

Rely on ICMP, ICMP TNG or time-out.

> > So, a packet contains only 8 EIDs for the source address.
>
> Well, actually K, where K is the number of layers in the hierarchy. This
> is still a fair amount of data, though... it's a factor of 4 to 8 times
> longer than locators built up out of locally assigned numbers.

True, but only when you enforce a static hierarchy of area IDs, which
should, IMHO, be avoided at any cost. Anyway, I don't think we need more
than 4 layers. With a thousand lower-level areas at each level, we can
accommodate 1 Tera hosts.

> > Also, as noted above, I do think that using a list of the border
> > routers as the identification of an area is not the best choice
> > either.
> >
> > Then, what is the alternative?
>
> You have to assign a label for the identifier from some other namespace,
> and make the area somewhat independent of the constituent elements. As I
> pointed out, for fundamental reasons, this is the *only* viable choice,
> since you need to decouple the abstraction hierarchy from the physical
> topology to some degree.

You might have decoupled the entire issue, but then you must solve all the
issues in all the decoupled parts. So, you should engineer how DNS TNG
could be.

> > Also, I get very uneasy every time people try and stuff this kind of
> > thing into ARP. I'll pass on the ARP lecture for now...
> >
> > I think the solution is neat enough. If some network does not have
> > ARP, it should have some alternative, which should also be simulated
> > by HR.
>
> Why? ARP is a kludge. It's there simply because other things are wrong.
> If physical addresses could be carried in locators, it would never have
> been invented. Why should we not carry physical addresses around in
> locators?

ARP is THE solution within a small area where broadcasting is allowed.

> ARP is popular because *it is another layer of binding*,

Correct. It is a binding at the datalink layer.

> which can be used to provide flexibility which is needed, but not
> otherwise provided! (The Butler Lampson quote about "All problems in
> computer science can be solved by another level of indirection" comes to
> mind!)

So, the important point is that, with globally unique 48-bit MAC addresses
of Ethernet, absolutely NO static settings are necessary at the datalink
layer. That's why I think NBMA, which requires a lot of static
configuration, is nothing. The other important factor of datalink layering
is that selection of alternative paths is not present at that layer, which
simplifies everything.

> However, it is deficient in that it only operates over a limited scope.
> Separate EID and locator namespaces, and a more flexible EID->locator
> binding, would do everything that ARP does, and a lot more besides.
>
> **DOWN WITH ARP**!

As long as we need a datalink layer, we need broadcast ARP.
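The sense in which ARP is "THE solution within a small area" can be put in
a few lines of Python; the segment table and addresses are invented, and
the only point is that resolution is a broadcast query answered by the
owner, which cannot reach past the broadcast domain:

# One datalink segment: every station hears a broadcast query.
SEGMENT = {
    "10.0.0.1": "02:00:00:00:00:01",
    "10.0.0.2": "02:00:00:00:00:02",
}

def arp_resolve(ip):
    """Broadcast "who-has ip"; the owner, if on-segment, replies."""
    for station_ip, mac in SEGMENT.items():   # the broadcast reaches all
        if station_ip == ip:
            return mac
    return None   # off-segment: the broadcast never gets there; a
                  # router (and more configuration) is needed

print(arp_resolve("10.0.0.2"))   # 02:00:00:00:00:02
print(arp_resolve("10.9.9.9"))   # None -- beyond the broadcast scope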
> > Classical SH should be allowed with triangulation, of course.
>
> Apparently there are some problems with this. Current TCP
> implementations do not seem to work well with mobile hosts; as it was
> explained to me, there are problems with various timing and
> retransmission algorithms getting confused by lost packets, timing
> variances, etc.

The issue, if any, must be solved with policy-rich routing anyway. With a
sender-initiated policy, quadrilateral routing will be common.

> > A solution like smip has been the only engineeringly plausible
> > solution to me as a DNS engineer
>
> There are some DNS experts (Rob Austein, Dave Bridgham, etc) who are
> part of this WG who don't think this is all totally crazy, so I'm a bit
> surprised to hear this, although I'm quite ready to believe you.

I think that in the DNS WG, after the Amsterdam IETF, I have presented
enough evidence on why, contrary to common belief, mobility can not be
handled with a conceptual DNS TNG.

> However, we built DNS, and if it's wrong, we'll change it. If there are
> fundamental problems, we do need to think about them, of course, and
> perhaps change course as a result. So, I'd like to hear what you think
> the problems are.

The fundamental reason why DNS (or DNS TNG) can not be so useful is that
it can, at best, provide information on the name->EID/locator binding at
the time of the query (caching makes the matter worse, but that is not the
essential point).

On the other hand, mobility and routing must be able to handle changes of
the name->EID/locator binding even after the name lookup.

						Masataka Ohta

------------------------------

From: John Curran <jcurran@nic.near.net>
Date: Tue, 04 Jan 1994 01:34:27 -0500
To: Masataka Ohta
Cc: Noel Chiappa, nimrod-wg@nic.near.net
Subject: Re: mobility and NIMROD

] From: Masataka Ohta
] Subject: Re: mobility and NIMROD
] Date: Tue, 4 Jan 94 14:16:44 JST
] ...
] Anyway, I don't think we need more than 4 layers. With a thousand
] lower-level areas at each level, we can accommodate 1 Tera hosts.

Please do not assume uniform distributions. Utilizations of .1 % to 10 %
are much more realistic, depending upon the administrative and operational
backpressure applied via policies.

] > [Noel]
] > You have to assign a label for the identifier from some other
] > namespace, and make the area somewhat independent of the constituent
] > elements. As I pointed out, for fundamental reasons, this is the
] > *only* viable choice, since you need to decouple the abstraction
] > hierarchy from the physical topology to some degree.
]
] You might have decoupled the entire issue, but, then, you must solve
] all the issues in all the decoupled parts.
]
] So, you should engineer how DNS TNG could be.

Functional separation of the entire issue allows consideration of several
different approaches to the problem in side-by-side comparison. There is
no reason why home server, DNS TNG, hierarchical multicast, and
out-of-band communication models cannot be considered for the binding
function. Please do not presume that the identification of EID and locator
elements mandates DNS TNG for binding.
] The fundamental reason why DNS (or DNS TNG) can not be so useful is
] that it can, at best, provide information on the name->EID/locator
] binding at the time of the query (caching makes the matter worse, but
] that is not the essential point).
]
] On the other hand, mobility and routing must be able to handle changes
] of the name->EID/locator binding even after the name lookup.

I am not a fan of DNS for the role of locator binding, but I will point
out that there is a recent DNS Dynamic Update ID which addresses many of
these issues. If you'd prefer doing it with home mobility servers, then
simply replace the home address with an EID and the current address with
the current locator, etc. Do you see any reason why the separation of EIDs
and locators prevents use of a mobility-server based scheme for binding?

/John

------------------------------

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Date: Tue, 4 Jan 94 15:47:13 JST
To: John Curran
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@nic.near.net
Subject: Re: mobility and NIMROD

> ] Anyway, I don't think we need more than 4 layers. With a thousand
> ] lower-level areas at each level, we can accommodate 1 Tera hosts.
>
> Please do not assume uniform distributions. Utilizations of .1 % to
> 10 % are much more realistic, depending upon the administrative and
> operational backpressure applied via policies.

So? 0.1% of 1 Tera is 1 Giga.

> ] You might have decoupled the entire issue, but, then, you must solve
> ] all the issues in all the decoupled parts.
> ]
> ] So, you should engineer how DNS TNG could be.
>
> Functional separation of the entire issue allows consideration of
> several different approaches to the problem in side-by-side comparison.
> There is no reason why home server, DNS TNG, hierarchical multicast, and
> out-of-band communication models cannot be considered for the binding
> function. Please do not presume that the identification of EID and
> locator elements mandates DNS TNG for binding.

I don't presume so. Any solution which actually works is OK. So, what is
the workable alternative?

> ] On the other hand, mobility and routing must be able to handle changes
> ] of the name->EID/locator binding even after the name lookup.
>
> I am not a fan of DNS for the role of locator binding, but I will point
> out that there is a recent DNS Dynamic Update ID which addresses many of
> these issues.

The ID is mainly on incremental transfer. So far, so good. But it poorly
addresses the issue of dynamic update, which, I think, should be removed.
Though I have requested the definition of "dynamic update" several times,
none has been given. So, recently, in the DNS WG, I have myself worked out
what the definition (and engineering solution) should be, and why it can't
be applicable to mobility.

> If you'd prefer doing it with home mobility servers, then

Doing what?

> simply replace the home address with an EID and the current address with
> the current locator, etc.

That's what is virtually done with smip.
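John's substitution is easy to sketch; a hypothetical mobility server
keyed by EID rather than home address (all names invented, and no claim
that this is how smip or any mobility WG scheme actually works):

# Mobile-IP style home server, with the home address replaced by an EID
# and the care-of address by the current locator.
class MobilityServer:
    def __init__(self):
        self.binding = {}                 # EID -> current locator

    def register(self, eid, locator):
        """Mobile host reports its new locator after each move."""
        self.binding[eid] = locator

    def resolve(self, eid):
        return self.binding.get(eid)      # None if never registered

server = MobilityServer()
server.register("eid-42", "A.B.C")        # host at home
server.register("eid-42", "X.Y.Z")        # host moved; binding updated
print(server.resolve("eid-42"))           # X.Y.Z

# Ohta's objection survives the sketch: if the host loses connectivity
# *before* re-registering, the stored locator is stale and the host is
# unreachable until it manages to register again.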
> Do you see any reason why the separation of EIDs and locators prevents
> use of a mobility-server based scheme for binding?

"Prevents"? Since Nimrod and mobility are orthogonal and totally
unrelated, nothing will be prevented.

						Masataka Ohta

------------------------------

From: John Curran <jcurran@nic.near.net>
Date: Tue, 04 Jan 1994 02:02:52 -0500
To: Masataka Ohta
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@nic.near.net
Subject: Re: mobility and NIMROD

] From: Masataka Ohta
] Subject: Re: mobility and NIMROD
] Date: Tue, 4 Jan 94 15:47:13 JST
]
] > ] Anyway, I don't think we need more than 4 layers. With a thousand
] > ] lower-level areas at each level, we can accommodate 1 Tera hosts.
] >
] > Please do not assume uniform distributions. Utilizations of .1 % to
] > 10 % are much more realistic, depending upon the administrative and
] > operational backpressure applied via policies.
]
] So? 0.1% of 1 Tera is 1 Giga.

My apologies, I was not sufficiently clear. At _each_
allocation/delegation level, you can expect utilizations in the .1 % to
10 % range (depending on the policies which are followed). I would not
underestimate the waste of functional identifier space that results from
political and administrative loss.

/John
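The arithmetic behind this caution, assuming (purely for illustration)
four levels of a thousand slots each:

# Nominal capacity is 1000^4 = 10^12 ("1 Tera hosts"), but utilization
# compounds per level: with fraction u usable at each of the 4 levels,
# the effective capacity is (1000*u)^4, not 10^12 * u.
for u in (0.10, 0.01, 0.001):
    print(u, int((1000 * u) ** 4))
# 0.10  -> 100000000   (10% per level leaves 10^8 hosts)
# 0.01  -> 10000       (1% per level leaves 10^4 hosts)
# 0.001 -> 1           (0.1% per level exhausts the space)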
------------------------------

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Date: Tue, 4 Jan 94 18:04:53 JST
To: Noel Chiappa
Cc: nimrod-wg@BBN.COM
Subject: Re: New datagram mode

> I mentioned in some mail to the IETF list that I have a new idea for
> how to do the datagram (i.e. non-flow) mode in Nimrod.

I thought loose source routing is the way to go.

> I think it produces a more efficient datagram mode than I had hitherto
> imagined. In fact, it may be even more efficient than the existing
> "hop-by-hop" model!

Strange. Why, do you think, is next-hop decision by flow ID more efficient
than next-hop decision by the EID of the destination (or, in the source
routing case, of the next intermediate router)?

> Source routes using these virtual entities can be guaranteed to be
> non-looping overall by a recursion process: the top-level source route
> does not contain loops (at least, unless the source is stupid, and
> chooses a path with one :-); each entity which advertises a virtual link
> is required to make sure that the implementation of that entity does not
> create a loop.

I think you are assuming that all the routers in the world are producing
correct information about the topology with which a source route path is
selected, and that all the routers on the path are implemented without any
fault.

> On thinking about the details, an obvious mechanism suggested itself.
> If all packets contain a "flow-id" field, *which will be unused in
> datagram packets*, the obvious thing is to store the flow-id of the VLF
> in that field.

As a virtual link is merely loosely source routed and has hierarchical
structure, you can override the flow-id field only at the top level.
Anyway,

> I realized that one minor bug is that if we define the global flow-id in
> the packet to be the source EID plus a source-local flow-id, this idea
> doesn't work (since you'd need to bash the source EID to the router at
> the start of the VLF, or something), so perhaps the flow-id field can't
> overload the source EID.

I think the bug is fatal.

> (It turns out that flow-setup can even provide "denial of service"
> protection,

It can't. An attacker can advertise false routing information with which a
source route could be constructed. The attacker will then drop the actual
data packets on the advertised route.

						Masataka Ohta

------------------------------

From: Christian Huitema
Date: Tue, 04 Jan 1994 14:40:03 +0100
To: Noel Chiappa
Cc: nimrod-wg@BBN.COM
Subject: Re: Maps and meshes in the real world

Noel,

I have some sympathy for your "mesh" approach, but I believe we should be
a bit careful with the advances in technology. I was recently conducting a
review of the state of the art in IGPs. Well, both RIP and OSPF have one
weak point in common -- they don't handle asymmetric links. So what? Well,
what is a mesh in 10 years? Something like fibers and colors, I guess.
Chances are it will be widely asymmetric, e.g. you get a laser which is
tuned to exactly one emit wavelength, and a set of filters that listen to
another set. Not the complete set, that would be too expensive; some form
of routing can be used to relay from one wavelength to another. The bad
news is that neither RIP nor OSPF can handle this kind of map!

Christian Huitema
------------------------------

From: Noel Chiappa <jnc@ginger.lcs.mit.edu>
Date: Tue, 4 Jan 94 10:04:26 -0500
To: mohta@necom830.cc.titech.ac.jp
Cc: nimrod-wg@BBN.COM
Subject: Re: New datagram mode

    I thought loose source routing is the way to go.

Well, I reckon we should still have source routing (strict and loose),
since it has capabilities this new mode lacks, but as I pointed out in the
note, source routing does have real disadvantages, and this mode is an
attempt to avoid them.

    > it may be even more efficient than the existing "hop-by-hop" model!

    Strange. Why, do you think, is next-hop decision by flow ID more
    efficient than next-hop decision by the EID of the destination (or,
    in the source routing case, of the next intermediate router)?

First, I didn't think anyone was proposing doing routing based on the EID.
Since there is no topological information in the EID, that would require
either i) some setup in the routers which are expected to handle packets
to that EID, which would effectively be a flow setup with the flow-id
being the destination EID, or ii) a translation step (from EID to
locator), which would be very inefficient.

Even with shortish fixed-length locators (a la SIP), which are simply
looked up in a routing table, this new mode would *still* produce more
efficient forwarding (in the non-active routers, once the DMF has actually
been set up). Such hop-by-hop routing uses a "longest match" lookup in the
routing table, whereas if the flow-id is at a fixed offset in the packet,
a more efficient lookup and forwarding, perhaps even purely in hardware,
is possible.

Source routing requires looking around in a variable-length object to
locate the next thing to route to, and once you have done that you still
have to go through the equivalent of either of the two options above, so I
can't see how it's any improvement.

    > Source routes using these virtual entities can be guaranteed to be
    > non-looping overall by a recursion process

    I think you are assuming that all the routers in the world are
    producing correct information about the topology with which a source
    route path is selected, and that all the routers on the path are
    implemented without any fault.

The first problem will be detected when the active router goes to set up
the DMF flow; if it tries to use a link which doesn't exist, etc, it
should get back an error message. The second problem can to some degree be
avoided; as I said:

    It turns out that flow-setup can even provide "denial of service"
    protection, and although each individual active router along the path
    can provide denial of service protection on its part of the path,
    this still does not give complete end-to-end denial of service
    protection, since a compromised active router can deny service.

"Denial of service" attacks refer to some intermediate router failing,
either accidentally or deliberately, to forward packets as it claims it
can and will. The "deliberate nastiness" model is useful to think about,
since you don't have to think about failure probabilities; it's 100%
certain that the worst case humans can think of will happen. Note,
however, that systems which have good denial-of-service protection can
still fail, since the real world almost always creates scenarios which
even the most rugged systems have not protected against...

A way to avoid denial of service is for the entity which set up the flow
to send a certain % of traffic which is "test" traffic. This is traffic
which *looks* to intermediate nodes like real user traffic, but by
prearrangement between the ends is not. (If you're clever, you can even
use real traffic for this, but that intertwines network-level
denial-of-service-detection mechanisms in with the application....) The
ends use this to make sure their traffic is getting through. If more than
a certain % is lost (or all of it, if the d-o-s guy is being dumb :-), you
set up a new flow which takes a different path. There are strategies
(binary search type things) to quickly discover a single node which is the
problem, etc.

As I said, these techniques can be applied by active routers to discover
malfunctioning non-active routers, and set up a new DMF which bypasses the
trouble, but a broken/compromised active router will still cause failures.
That's the inevitable price of not "doing it all at the source".
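One of those "binary search type things", sketched in Python under the
assumption that the flow endpoint, having noticed end-to-end loss, can
source-route test traffic through an arbitrary prefix of the path; the
path and the failure model are invented:

# Localize a silently-dropping hop with O(log n) probes instead of n.
PATH = ["r1", "r2", "r3-bad", "r4", "r5"]

def probe(prefix_len):
    """True if a test packet survives the first prefix_len hops."""
    return all(not r.endswith("-bad") for r in PATH[:prefix_len])

def first_bad_hop():
    lo, hi = 0, len(PATH)       # invariant: prefix lo works, prefix hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if probe(mid):
            lo = mid
        else:
            hi = mid
    return PATH[hi - 1]

print(first_bad_hop())          # r3-bad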
Even this level of protection still costs (in complexity and packet
traffic), but that's a local cost/benefit tradeoff, not a system
architecture question. It is also better than what you get with
hop-by-hop, though...

However, all this said, if you start thinking about arbitrary
implementation errors, almost any scheme can be made to fail. The most
resilient scheme against errors appears to be end-end flow setup, which
can even work in the face of fairly arbitrary denial-of-service attacks in
the middle, although obviously not ones which leave no viable path between
the source and destination.

    > If all packets contain a "flow-id" field, *which will be unused in
    > datagram packets*, the obvious thing is to store the flow-id of the
    > VLF in that field.

    As a virtual link is merely loosely source routed and has
    hierarchical structure, you can override the flow-id field only at
    the top level.

I'm not sure I quite followed this? If I understand your meaning, you
refer to the problem I alluded to in that brief message recently, where I
mentioned that unless high-level virtual links are actually instantiated
as flows, you could require a "stack" of flow-ids (or virtual link id's in
the source route) in the packets. Also, when I say "LSR", I don't mean
that there are parts of the path which are not specified, only that the
path is specified in terms of high-level virtual entities, not actual
physical resources.

    > one minor bug is that if we define the global flow-id in the packet
    > to be the source EID plus a source-local flow-id, this idea doesn't
    > work (since you'd need to bash the source EID to the router at the
    > start of the VLF, or something), so perhaps the flow-id field can't
    > overload the source EID.

    I think the bug is fatal.

No, the only thing that happens is your packet format has to not include
the "source EID" field in the "flow-id" semantics. Globally unique
flow-ids are nice, since they avoid either i) having to agree over some
scope on a flow-id allocation, or ii) purely local flow-ids which get
changed in the packet at each hop, a la X.25. If you still want packets to
contain globally unique flow-ids, and you decide the easiest way to create
those globally unique flow-ids is to concatenate some globally unique
label (e.g. an EID) with a local flow-id, that just means the "flow-id"
field in the packet has to include space for an EID as well as a local
flow-id. So the packets get a little bigger. Bandwidth is going to be
cheap...
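A sketch of that layout, with invented field sizes: the flow-id field
simply grows to hold an EID plus a local id, and sits at a fixed offset so
a forwarder never parses a variable-length object:

# Hypothetical packet header: 8-byte EID + 32-bit source-local flow-id.
import struct

HEADER = struct.Struct("!8sI")

def make_flow_id(source_eid: bytes, local_id: int) -> bytes:
    return HEADER.pack(source_eid, local_id)

def classify(packet: bytes):
    """A forwarder slices the fixed-offset field; no variable parsing."""
    eid, local_id = HEADER.unpack_from(packet, 0)
    return eid, local_id

pkt = make_flow_id(b"\x02" * 8, 7) + b"payload"
print(classify(pkt))    # (b'\x02...\x02', 7)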
    > (It turns out that flow-setup can even provide "denial of service"
    > protection,

    It can't. An attacker can advertise false routing information with
    which a source route could be constructed. The attacker will then
    drop the actual data packets on the advertised route.

As I pointed out, the ends can detect this and, using the topology map,
create a new path which avoids the attacker. See Radia Perlman's PhD
thesis for all sorts of other ways to secure map-distribution based
routing systems against hostile attack. Flow-setup/MD systems can be made
extraordinarily resistant to attack, which is one of their chief
attractions....

	Noel

------------------------------

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Date: Wed, 5 Jan 94 2:40:07 JST
To: Noel Chiappa
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
Subject: Re: New datagram mode

> > I thought loose source routing is the way to go.
>
> Well, I reckon we should still have source routing (strict and loose),
> since it has capabilities this new mode lacks, but as I pointed out in
> the note, source routing does have real disadvantages, and this mode is
> an attempt to avoid them.

Really?

> > > it may be even more efficient than the existing "hop-by-hop" model!
> >
> > Strange. Why, do you think, is next-hop decision by flow ID more
> > efficient than next-hop decision by the EID of the destination (or, in
> > the source routing case, of the next intermediate router)?
>
> First, I didn't think anyone was proposing doing routing based on the
> EID.

I think I am. If a source route is like A.B.C.D..., A should know the
route to B, B should know the route to C, and so on. In general, border
routers at level K should know the paths to border routers at level K-1.

> Since there is no topological information in the EID, that would require
> either i) some setup in the routers which are expected to handle packets
> to that EID, which would effectively be a flow setup with the flow-id
> being the destination EID, or ii) a translation step (from EID to
> locator), which would be very inefficient.

As the routing table gives topological information on EIDs, hosts can
construct a source route.

> Even with shortish fixed-length locators (a la SIP), which are simply
> looked up in a routing table, this new mode would *still* produce more
> efficient forwarding (in the non-active routers, once the DMF has
> actually been set up). Such hop-by-hop routing uses a "longest match"
> lookup in the routing table, whereas if the flow-id is at a fixed offset
> in the packet, a more efficient lookup and forwarding, perhaps even
> purely in hardware, is possible.

So, simple routing with a plain EID is just as fast as routing with a flow
ID.

> Source routing requires looking around in a variable-length object

I don't think it hurts performance.

> A way to avoid denial of service is for the entity which set up the flow
> to send a certain % of traffic which is "test" traffic.

Aren't you assuming that there is a certain amount of traffic between all
the pairs of routers? With connectionless communication, packet exchange
along a certain link could be quite infrequent.
> > > If all packets contain a "flow-id" field, *which will be unused in
> > > datagram packets*, the obvious thing is to store the flow-id of the
> > > VLF in that field.
> >
> > As a virtual link is merely loosely source routed and has hierarchical
> > structure, you can override the flow-id field only at the top level.
>
> I'm not sure I quite followed this? If I understand your meaning, you
> refer to the problem I alluded to in that brief message recently, where
> I mentioned that unless high-level virtual links are actually
> instantiated as flows, you could require a "stack" of flow-ids (or
> virtual link id's in the source route) in the packets.

Oops, I failed to note your short mail, sorry.

Anyway, as the "stack" of flow-ids is just as bad (not so bad, I think) as
variable-length source routing, it can not be your choice. Then, the
question is: how many instantiated flows will there be in the entire net?

> > (It turns out that flow-setup can even provide "denial of service"
> > protection,
> >
> > It can't. An attacker can advertise false routing information with
> > which a source route could be constructed. The attacker will then drop
> > the actual data packets on the advertised route.
>
> As I pointed out, the ends can detect this and, using the topology map,
> create a new path which avoids the attacker.

I think the end can detect the service denial with connection-oriented
communication only.

						Masataka Ohta

------------------------------

From: Noel Chiappa <jnc@ginger.lcs.mit.edu>
Date: Wed, 5 Jan 94 02:09:51 -0500
To: mohta@necom830.cc.titech.ac.jp
Cc: nimrod-wg@BBN.COM
Subject: Re: New datagram mode

    > Well, I reckon we should still have source routing (strict and
    > loose), since it has capabilities this new mode lacks, but as I
    > pointed out in the note, source routing does have real
    > disadvantages, and this mode is an attempt to avoid them.

    Really?

I'm a little confused as to exactly what in that paragraph caused the
"Really?" response. You don't believe there are disadvantages? I listed
the disadvantages in the original note. They were:

    First, the source (or some agent thereof) has to compute such source
    routes, so even if they are cached, there is some inefficiency there.
    Second, they have to be carried in the packets, making them more
    expensive to construct at the source, as well as bulkier. Third, the
    processing of [source-routed] packets in network switches is almost
    inevitably more inefficient.

You don't believe this new mode solves them? The route calculation is now
distributed, which takes care of the first; there are no source routes in
packets, which fixes the second; etc.

    > First, I didn't think anyone was proposing doing routing based on
    > the EID.

    I think I am. If a source route is like A.B.C.D..., A should know the
    route to B, B should know the route to C, and so on.

If I understand you correctly, this refers to this idea of yours of
identifying areas with the EID's of border routers. I'm speaking of
routing on the destination EID only.

    > Since there is no topological information in the EID

    As the routing table gives topological information on EIDs, hosts can
    construct a source route.

You must be working with a very different definition of "EID" from the
rest of us. It is a flat, effectively random (from the point of view of
the network) number, like an Ethernet 48-bit hardware address.
There is *no way* for the "routing table" (I detest the term, since it
smacks of the DV and hop-by-hop view of the world, which I reckon is
headed for the junk-heap of history) to give topological information about
an EID *directly*. (If the EID is first mapped into a topologically
significant name, the locator, the routing table can tell you something
about the locator, but that doesn't count.)

    So, simple routing with a plain EID is just as fast as routing with a
    flow ID.

You can't "route" from a plain EID. The EID cannot be looked up in a
routing table. (Also, you don't "route" with flow ID's, in the sense of
computing a route; you only "forward" based on them. The IETF has started
to use the two terms to distinguish between the process of deciding on
routes, and actually passing user data packets through. Of course, in the
hop-by-hop model, forwarding does involve a routing step.)

    > A way to avoid denial of service is for the entity which set up the
    > flow to send a certain % of traffic which is "test" traffic.

    Aren't you assuming that there is a certain amount of traffic between
    all the pairs of routers?

Only between pairs of routers which wish to protect themselves against
denial-of-service attacks. Like much else in Nimrod, this is a
cost/benefit knob that is set locally, to allow the users to make the
decision on what quality of service they want.

    Then, the question is: how many instantiated flows will there be in
    the entire net?

Actually, the important question is the number of simultaneously active
(i.e. set up) flows in a given router. If you look at the system as a
whole, there is a curve, with the number of active flows on one axis, and
the number of routers with that many flows on the other. The shape of the
curve, and the way it is changing over time, is as important as numbers
like the average and the worst case.

This question was discussed on the Big-Internet mailing list some time
back. Space doesn't permit retelling the whole thing here, but as you will
no doubt recall, I mentioned the result that O(N^2) growth does not seem
feasible.

    > the ends [of the flow] can detect this, and, using the topology
    > map, create a new path which avoids the attacker.

    I think the end can detect the service denial with
    connection-oriented communication only.

It all depends on how much overhead you are willing to accept. For things
which "have to work", you can send test traffic. But that's silly in most
cases... If your application wants to send one packet, and get no
acknowledgement, then even a random packet loss could disrupt the
application. If it's a single request and response, the lack of a response
should tell you something's wrong. If you retry several times, and nothing
comes back, then perhaps it's time to try something more robust, like a
source-routed packet (at least, if the application is important).

	Noel
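That retry-then-escalate policy can be put in a toy Python sketch
(send_datagram and send_source_routed are placeholders, not real Nimrod
primitives, and the loss model is just random):

import random

def send_datagram(request):
    """Cheap datagram path; may silently lose the packet."""
    return "reply" if random.random() > 0.5 else None

def send_source_routed(request):
    """Costlier source-routed path which avoids the broken spot."""
    return "reply"

def robust_request(request, retries=3):
    for _ in range(retries):
        reply = send_datagram(request)
        if reply is not None:       # lack of a response is the only
            return reply            # failure signal available
    return send_source_routed(request)   # escalate only if it matters

print(robust_request("ping"))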
------------------------------

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Date: Wed, 5 Jan 94 17:08:16 JST
To: Noel Chiappa
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
Subject: Re: New datagram mode

> > First, the source (or some agent thereof) has to compute such source
> > routes, so even if they are cached, there is some inefficiency there.
> > Second, they have to be carried in the packets, making them more
> > expensive to construct at the source, as well as bulkier. Third, the
> > processing of [source-routed] packets in network switches is almost
> > inevitably more inefficient.
>
> You don't believe this new mode solves them? The route calculation is
> now distributed, which takes care of the first; there are no source
> routes in packets, which fixes the second; etc.

The first one, the distributed calculation of routes, only makes the load
of intermediate routers heavier.

The second one I can't understand. How can a route be determined with a
single flat EID? Doesn't your packet contain something like layers of area
IDs, which makes the packet bulky?

> > > First, I didn't think anyone was proposing doing routing based on
> > > the EID.
> >
> > I think I am. If a source route is like A.B.C.D..., A should know the
> > route to B, B should know the route to C, and so on.
>
> If I understand you correctly, this refers to this idea of yours of
> identifying areas with the EID's of border routers.

Yes, I'm speaking of loose source routing by EIDs of intermediate routers.

> I'm speaking of routing on the destination EID only.

I don't think you can forward any packet with the EID only. Aren't you
assuming some structure in the EID?

> You must be working with a very different definition of "EID" from the
> rest of us. It is a flat, effectively random (from the point of view of
> the network) number, like an Ethernet 48-bit hardware address.

To me, EID is flat, of course.

> There is *no way* for the "routing table" (I detest the term, since it
> smacks of the DV and hop-by-hop view of the world, which I reckon is
> headed for the junk-heap of history) to give topological information
> about an EID *directly*. (If the EID is first mapped into a
> topologically significant name, the locator, the routing table can tell
> you something about the locator, but that doesn't count.)

Just as the current routing table gives information for next-hop
determination to various networks, a Nimrod routing table could give
information for next-hop determination to various areas. I think areas
should be represented with the EIDs of their border routers; thus, the
routing table could be indexed with EIDs of border routers, though that is
not essential. So, more generally,

> As the routing table gives topological information on EIDs, hosts can
> construct a source route.

could be rephrased as: as the routing table gives topological information
on area IDs, hosts can construct a source route consisting of area IDs.

> > So, simple routing with a plain EID is just as fast as routing with a
> > flow ID.
>
> You can't "route" from a plain EID. The EID cannot be looked up in a
> routing table.

I'm assuming that routers have a routing table indexed by the EIDs of
certain border routers (those of the upper-level areas and the direct
lower-level areas) and the EIDs of directly reachable endhosts (that is,
endhosts on the same datalink layer). No, the table does not contain the
EIDs of all the hosts on the net, of course. So, the table size is not so
large.
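A sketch of the kind of table being assumed here (topology and names all
invented): entries exist only for certain border routers and directly
attached endhosts, so a loose source route of border-router EIDs can be
forwarded EID by EID:

# Next-hop entries indexed by border-router / neighbor EIDs -- not by
# every EID on the net.
ROUTING_TABLE = {
    "eid-B1": "if0",        # our level-1 border
    "eid-B2": "if1",        # a level-2 border reachable from here
    "eid-H9": "direct",     # endhost on the same datalink
}

def next_hop(eid):
    return ROUTING_TABLE.get(eid)   # None: not a border we track

# Each router along the way needs only the next EID in the chain.
source_route = ["eid-B1", "eid-B2", "eid-H9"]
print([next_hop(e) for e in source_route])   # ['if0', 'if1', 'direct']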
> > A way to avoid denial of service is for the entity which set up the
> > flow to send a certain % of traffic which is "test" traffic.
> >
> > Aren't you assuming that there is a certain amount of traffic between
> > all the pairs of routers?
>
> Only between pairs of routers which wish to protect themselves against
> denial-of-service attacks. Like much else in Nimrod, this is a
> cost/benefit knob that is set locally, to allow the users to make the
> decision on what quality of service they want.

What? Then how can intermediate routers forward packets without a lot of
effort for flow setup?

Aren't you suggesting a trade-off between:

	connectionless communication through routers fully connected at
	run time

and

	connectionless communication through routers fully connected in
	advance?

I think both will result in O(N^2) behaviour.

> > Then, the question is: how many instantiated flows will there be in
> > the entire net?
>
> Actually, the important question is the number of simultaneously active
> (i.e. set up) flows in a given router. If you look at the system as a
> whole, there is a curve, with the number of active flows on one axis,
> and the number of routers with that many flows on the other. The shape
> of the curve, and the way it is changing over time, is as important as
> numbers like the average and the worst case.

I don't think you can assume any "flow" for connectionless communication.

> This question was discussed on the Big-Internet mailing list some time
> back. Space doesn't permit retelling the whole thing here, but as you
> will no doubt recall, I mentioned the result that O(N^2) growth does not
> seem feasible.

At least, even with your logic:

> Only between pairs of routers which wish to protect themselves against
> denial-of-service attacks. Like much else in Nimrod, this is a
> cost/benefit

if a user wants full protection against service denial, it is O(N^2).

> > the ends [of the flow] can detect this, and, using the topology map,
> > create a new path which avoids the attacker.
> >
> > I think the end can detect the service denial with
> > connection-oriented communication only.
>
> It all depends on how much overhead you are willing to accept. For
> things which "have to work", you can send test traffic. But that's
> silly in most cases...

It's much more reasonable to expect end-end protection within each
application on the individual hosts. Aggregated protection through
intermediate routers can't be so meaningful.

> If your application wants to send one packet, and get no
> acknowledgement, then even a random packet loss could disrupt the
> application. If it's a single request and response, the lack of a
> response should tell you something's wrong. If you retry several times,
> and nothing comes back, then perhaps it's time to try something more
> robust, like a source-routed packet (at least, if the application is
> important).

So, the protection, if any, should be application-wise.
						Masataka Ohta

------------------------------

From: Frank Kastenholz <kasten@ftp.com>
Date: Wed, 5 Jan 94 09:30:50 -0500
To: mohta@necom830.cc.titech.ac.jp
Cc: jcurran@nic.near.net, jnc@ginger.lcs.mit.edu, nimrod-wg@nic.near.net
Subject: Re: mobility and NIMROD

> > ] Anyway, I don't think we need more than 4 layers. With a thousand
> > ] lower-level areas at each level, we can accommodate 1 Tera hosts.
> >
> > Please do not assume uniform distributions. Utilizations of .1 % to
> > 10 % are much more realistic, depending upon the administrative and
> > operational backpressure applied via policies.
>
> So? 0.1% of 1 Tera is 1 Giga.

Please do not assume a fixed number of hierarchies. If a protocol assumes
a fixed number of hierarchies then, within 10 years, we will have to fix
it. I do not want to go through this all again.

--
Frank Kastenholz
FTP Software
2 High Street
North Andover, Mass. USA 01845
(508)685-4000

------------------------------

From: Frank Kastenholz <kasten@ftp.com>
Date: Wed, 5 Jan 94 09:30:52 -0500
To: mohta@necom830.cc.titech.ac.jp
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
Subject: Re: New datagram mode

> > I think it produces a more efficient datagram mode than I had
> > hitherto imagined. In fact, it may be even more efficient than the
> > existing "hop-by-hop" model!
>
> Strange.
>
> Why, do you think, is next-hop decision by flow ID more efficient than
> next-hop decision by the EID of the destination (or, in the source
> routing case, of the next intermediate router)?

There probably will be >1 destination EID 'reachable' for a given flow. If
there are packets going to N different EIDs that all can use the same
flow, then you can achieve an N:1 improvement in the resources required to
store the routing information to those EIDs.

--
Frank Kastenholz
FTP Software
2 High Street
North Andover, Mass. USA 01845
(508)685-4000
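Frank's N:1 point in a few lines of illustrative Python: many destination
EIDs keyed to one flow leave the forwarder with a single entry of state
(all values invented):

# N destination EIDs share one flow; the forwarder needs one flow entry
# instead of N per-EID routes.
flow_of_eid = {f"eid-{i}": "flow-7" for i in range(1000)}   # N -> 1
flow_table = {"flow-7": "if3"}        # the only forwarding state needed

print(len(flow_table), "flow entry serves", len(flow_of_eid), "EIDs")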
> It was for this reason that I was thinking that perhaps the only non-flow mode in Nimrod would be a "source-route in packet" mode. This guaranteed non-looping paths, but without having to guarantee the near-global consistency in databases, etc, etc.

I'm not sure exactly what Noel means by "global" in this paragraph. Does it mean global in that _all_ routers _everywhere_ must be consistent? Or does it mean that all routers in any one area must be consistent?

I tend to believe the latter. Routers in a given area need only understand how to get to the border-routers of their area, and to the border-routers of contained areas.

> For efficient handling of SRD's, I had imagined that i) the packet would contain a pointer into the source route, and ii) routers would maintain, for each virtual link, a pre-setup flow which instantiates the virtual link (hereinafter the "VLF", for 'virtual link flow'). When an SRD shows up at the router at the start of the VLF, it is somehow "associated" with the VLF until it gets to the end of it, at which time the source route is consulted, and the packet is routed onto the next VLF. (Obviously, physical links are just done as a one-shot, without the necessity of a flow.)

Just to be sure that I understand: I'd specify a source route of the form A-B-C-D... and there would be a 'pre-set-up' flow in the various areas, one between A and B, another between B and C and so on?

> The packet contains, in addition to a mungable flow-id field, the source and destination locators, and a pointer into the locator. The idea (borrowed from PIP :-) is that the pointer starts out at the lowest level of the source locator, and moves up that locator, then across to the destination locator, and then down. In addition to these extra fields in the packet, all routers have to contain a minimal set of "pre-setup" flows to certain routers which are at critical places in the abstraction hierarchy.

Doesn't this conflict with your earlier note about maps and meshes? By allowing the notion of "critical places" you then allow the notion of "single point of failure". If a router at a critical place fails then, by definition (since the place/router is critical) you have a failure with major impact on the network.

> While going up the source locator, each "active" router (i.e. one that actually makes a decision about where to send the packet, as opposed to handling it as part of a flow) selects a DMF which will take the packet to the "next higher" level object in the source locator, advances the pointer, and sends the packet off along that flow. When it gets to the end of that flow, the process repeats, until the packet reaches a router which is at the least common intersection of the two locators (i.e. for A.P.Q.R and A.X.Y.Z, this would be when the packet reaches A).
>
> The process then inverts, with each active router selecting a DMF which takes the packet to the next lower object in the destination locator. So, A would select a flow to A.X, and once it got to A.X, A.X would select a flow to A.X.Y, etc.
>
> This mode would have almost none of the disadvantages of SRD, since the source doesn't have to compute a route, and there is no source route in the packet, just the source and destination locator (and the source locator is useful to have anyway when the packet gets to the ultimate destination, to allow a reply to be sent easily).
> Again, in a world with resource-allocation going on, that DMF would have a resource limit associated with it, which would prevent pure datagram traffic from interfering with other resource allocations.

I read these paragraphs as saying that the source route is "deduced" from the source and destination locators. If the source locator is A.B.C and the destination locator is X.Y.Z then the source route between the two is deduced to be A.B.C - A.B - A - X - X.Y - X.Y.Z. No? Isn't this "no brainer routing"?

> It might be possible to remove the "hop count" field, since there are now some fairly strong guarantees that traffic will not loop, but it might be useful to leave it in as an additional safety measure against unforeseen failure modes. Currently, the hop count is forwarding state which is retained in the packet to prevent loops, and that retained state is in some ways made redundant by some slightly more complex state which is retained by this method. Removing the hop count would slightly increase the efficiency of forwarding, obviously.

Can the set of routers comprising a flow change in "mid-flow"? Can the set be temporarily inconsistent? Might a loop exist as a transient of some form during a repair of the flow? If so, the hop-count might be deemed necessary.

=====================================================================
(aside) However, it occurs to me that if there is no hop-count, even in today's network: If a loop occurs, then the loop will eventually be fixed. When it is fixed, all existing packets in the loop will be forwarded on towards their destinations, just as if nothing had happened. Some packets that were sent into the loop will be lost due to lack of buffers. I suspect that this loss will be somewhat random (though it might hit "new" packets disproportionately). So maybe a hop-count is not necessary in any event. (I have not thought all this through...)
=====================================================================

Now, what I want to know is: what does this new datagram model really buy that the current model does not?

It seems to me that what you are doing is taking a network that is fundamentally a flow network and trying to superimpose the appearance of datagrams on it. The DMFs seem to me to be really nothing more than a way for routers to know how to get to the borders of their areas.

In short, Noel, I think that you are taking something and making it overly complex. Why not assume that within an area, all routers have a reasonably consistent database? Their databases may not be complete, but what they have is correct. If this assumption recurses, up and down the hierarchy, then you can do real hop-by-hop forwarding. Granted that this turns the routing into No-Brainer (NB), but you seem to have reduced datagram mode to NB anyway.

-- Frank Kastenholz FTP Software 2 High Street North Andover, Mass.
USA 01845 (508)685-4000

Received: from PIZZA.BBN.COM by BBN.COM id aa26171; 5 Jan 94 10:41 EST Received: from pizza by PIZZA.BBN.COM id aa08706; 5 Jan 94 10:27 EST Received: from nic.near.net by PIZZA.BBN.COM id aa08702; 5 Jan 94 10:26 EST Received: from necom830.cc.titech.ac.jp by nic.near.net id aa21719; 5 Jan 94 10:27 EST Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Thu, 6 Jan 94 00:23:18 +0900 From: Masataka Ohta Return-Path: Message-Id: <9401051523.AA23502@necom830.cc.titech.ac.jp> Subject: Re: mobility and NIMROD To: kasten@ftp.com Date: Thu, 6 Jan 94 0:23:16 JST Cc: jcurran@nic.near.net, jnc@ginger.lcs.mit.edu, nimrod-wg@nic.near.net In-Reply-To: <9401051430.AA26777@ftp.com>; from "Frank Kastenholz" at Jan 5, 94 9:30 am X-Mailer: ELM [version 2.3 PL11]

> > > ] Anyway, I don't think we need more than 4 layers. With a thousand lower level areas at each level, we can accommodate 1 Tera hosts.
> > >
> > > Please do not assume uniform distributions. Utilizations of .1 % to 10 % are much more realistic, depending upon the administrative and operational backpressure applied via policies.
> >
> > So? 0.1% of 1 Tera is 1 Giga.
>
> Please do not assume a fixed number of hierarchies.

No one has assumed anything. It is just an estimate, with which no hard decision is made.

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa01150; 5 Jan 94 11:49 EST Received: from pizza by PIZZA.BBN.COM id aa09086; 5 Jan 94 11:32 EST Received: from BBN.COM by PIZZA.BBN.COM id aa09082; 5 Jan 94 11:30 EST Received: from Princeton.EDU by BBN.COM id aa29668; 5 Jan 94 11:29 EST Received: from clytemnestra.Princeton.EDU by Princeton.EDU (5.65b/2.103/princeton) id AA26051; Wed, 5 Jan 94 11:28:59 -0500 Received: by clytemnestra.princeton.edu (4.1/1.113) id AA20256; Wed, 5 Jan 94 11:28:58 EST Message-Id: <9401051628.AA20256@clytemnestra.princeton.edu> To: kasten@ftp.com Cc: nimrod-wg@BBN.COM Subject: Re: New datagram mode In-Reply-To: Your message of "Wed, 05 Jan 1994 10:02:21 EST." <9401051502.AA27982@ftp.com> X-Mailer: exmh version 1.2gamma 12/21/93 Date: Wed, 05 Jan 1994 11:28:55 EST From: John Wagner

> I read Noel's note and have a couple of comments.
>
> > As a result of all this, I prefer systems which do not have to make this guarantee of "near-global consistency"; they are both simpler and more robust. I would, as an architect, *very* desperately like to avoid any such design *if at all possible*. It was for this reason that I was thinking that perhaps the only non-flow mode in Nimrod would be a "source-route in packet" mode. This guaranteed non-looping paths, but without having to guarantee the near-global consistency in databases, etc, etc.
>
> I'm not sure exactly what Noel means by "global" in this paragraph. Does it mean global in that _all_ routers _everywhere_ must be consistent? Or does it mean that all routers in any one area must be consistent?
>
> I tend to believe the latter. Routers in a given area need only understand how to get to the border-routers of their area, and to the border-routers of contained areas.

Isn't this simply a restatement of the fact that Nimrod is using a hierarchical view of the network? The border routers (conceptually) sit at the interfaces between hierarchy levels. The problem comes in determining which routers are at the border (since in a dynamic network what was A.B may suddenly be A.B.A and A.B.B).
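As an aside, the prefix relationships involved here are easy to state mechanically. A small sketch in Python (dot-separated locator strings are an assumed syntax), anticipating the strict-prefix rule Frank spells out in the next message:

    # Classify a router by the locators it advertises, relative to my
    # area's locator, using the strict-prefix test discussed below.
    def classify(my_area, advertised):
        mine, adv = my_area.split("."), advertised.split(".")
        if adv == mine[:len(adv)] and len(adv) < len(mine):
            return "border router to a containing area"
        if mine == adv[:len(mine)] and len(mine) < len(adv):
            return "border router to a contained area"
        return "not a border router (interior or unrelated locator)"

    print(classify("A.B.C.D", "A.B.C"))      # containing area
    print(classify("A.B.C.D", "A.B.C.D.E"))  # contained area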
John Wagner

Received: from PIZZA.BBN.COM by BBN.COM id aa07620; 5 Jan 94 13:39 EST Received: from pizza by PIZZA.BBN.COM id aa00333; 5 Jan 94 13:23 EST Received: from BBN.COM by PIZZA.BBN.COM id ae00313; 5 Jan 94 13:19 EST Received: from babyoil.ftp.com by BBN.COM id aa04644; 5 Jan 94 12:49 EST Received: by ftp.com id AA05977; Wed, 5 Jan 94 12:49:04 -0500 Date: Wed, 5 Jan 94 12:49:04 -0500 Message-Id: <9401051749.AA05977@ftp.com> To: jwagner@princeton.edu Subject: Re: New datagram mode From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: nimrod-wg@BBN.COM

>> consistent? Or does it mean that all routers in any one area must be consistent?
>>
>> I tend to believe the latter. Routers in a given area need only understand how to get to the border-routers of their area, and to the border-routers of contained areas.
>
> Isn't this simply a restatement of the fact that Nimrod is using a hierarchical view of the network? The border routers (conceptually) sit at the interfaces between hierarchy levels. The problem comes in determining which routers are at the border (since in a dynamic network what was A.B may suddenly be A.B.A and A.B.B).

Well, first, I was not quite sure what Noel meant by 'global' -- did he mean global as in the entire earth/universe/... or did he really mean within a given area.

Making the routing database be "area-wide" consistent is a much easier problem. If an area grows too big, so that an inordinate amount of resources is required to keep the database consistent, then I can "simply" partition the area or change the area from being one level to being several levels (I can grow the hierarchy "horizontally" or "vertically"). In the extreme, I could keep my areas small enough so that I could use RIP, or even static routes to do the routing :-)

I think that determining border routers is "simple" -- if a router advertises that it can get to locators that are all within the area then the router is not a border router. If a router advertises that it can get to locators which are strict prefixes of the area's locator, then that router is a border router to a "containing" area. If a router advertises that it can get to a locator of which the area's locator is a strict prefix, then that router is a border router to a "contained" area. For instance, if I am in area A.B.C.D and a router advertises it can get to A.B.C then that router can get to my "containing" area; if a router advertises that it can get to A.B.C.D.E then that router can get to a contained area.

-- Frank Kastenholz FTP Software 2 High Street North Andover, Mass. USA 01845 (508)685-4000

Received: from PIZZA.BBN.COM by BBN.COM id aa29002; 6 Jan 94 2:33 EST Received: from pizza by PIZZA.BBN.COM id aa00884; 6 Jan 94 2:16 EST Received: from BBN.COM by PIZZA.BBN.COM id aa00880; 6 Jan 94 2:14 EST Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa28541; 6 Jan 94 2:14 EST Received: by ginger.lcs.mit.edu id AA18784; Thu, 6 Jan 94 02:14:43 -0500 Date: Thu, 6 Jan 94 02:14:43 -0500 From: Noel Chiappa Message-Id: <9401060714.AA18784@ginger.lcs.mit.edu> To: kasten@ftp.com, nimrod-wg@BBN.COM Subject: Re: New datagram mode Cc: jnc@ginger.lcs.mit.edu

> I prefer systems which do not have to make this guarantee of "near-global consistency"

I'm not sure exactly what Noel means by "global" in this paragraph. Does it mean global in that _all_ routers _everywhere_ must be consistent?

What I mean by "near-global" is that in a hop-by-hop system, all the routers in the routing scope of an object X (i.e.
routers which contain hop-by-hop routing table entries for X) have to have a consistent idea of how to get to X, otherwise loops can develop. It's clearly not *all* routers; that's why I used the "near-global" terminology...

Or does it mean that all routers in any one area must be consistent? I tend to believe the latter. Routers in a given area need only understand how to get to the border-routers of their area, and to the border-routers of contained areas.

It actually doesn't have anything to do with area boundaries. The information you mention is the theoretical minimum necessary to make a hierarchical routing architecture work, but that set is the same no matter which routing/forwarding model is used. You also may be falling into the "abstraction action boundaries are the same as abstraction naming boundaries" trap. Making the two the same produces strictly hierarchical routing, which is inefficient.

Just to be sure that I understand: I'd specify a source route of the form A-B-C-D... and there would be a 'pre-set-up' flow in the various areas, one between A and B, another between B and C and so on?

Pretty much, except that a source route is probably specified as a list of (virtual) links, not switches. If A and B have several alternate paths between them, with different attributes, how does A decide which one to pick? (To put it in graph-theory terms: if two nodes can have more than one arc between them, a list of nodes does not specify a unique path through the graph, but a list of arcs does.) There would be a pre-setup flow for each (virtual) link, yes.

> all routers have to contain a minimal set of "pre-setup" flows to certain routers which are at critical places in the abstraction hierarchy.

Doesn't this conflict with your earlier note about maps and meshes? By allowing the notion of "critical places" you then allow the notion of "single point of failure". If a router at a critical place fails then, by definition (since the place/router is critical) you have a failure with major impact in the network.

Ah, sorry, I'm trying to put 5 pounds of idea into 2 pounds of words, again. (Jeez, the noelgram was *already* 24KB! :-) Yes, reliance on individual routers would produce SPOF's. However, the "critical places" are more of a concept than an actual physical thing; remember, they are "places" in the abstraction hierarchy, not the topology! In reality, any one of a set of routers with property X (i.e. any border router into a lower level area, or any border router out of this area) would do. Clearly, we'll have to have some mechanism to detect failed routers, etc, etc, etc, but that's all pretty much grind-the-crank competent engineering.

I read these paragraphs as saying that the source route is "deduced" from the source and destination locators. If the source locator is A.B.C and the destination locator is X.Y.Z then the source route between the two is deduced to be A.B.C - A.B - A - X - X.Y - X.Y.Z. No? Isn't this "no brainer routing"?

There is no "source route", even a deduced one. (I've heard Paul Francis' argument that a hierarchical locator is a source route, and I think it's confused.) The packet is routed in an incremental fashion, and the optimality of the resulting route is dependent on the amount of detail about the topology (above the theoretical minimum) you are willing to pay the price of distributing. (As mentioned above, this tradeoff between the amount of info, and the optimality of routes, is present in all routing architectures.)
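For illustration, the sequence of decision points in the strictly hierarchical case (i.e. with only the minimal DMF set, and none of the short-cuts discussed next) can be computed mechanically. A sketch in Python, with dotted locator syntax assumed:

    # Decision points for a packet: climb the source locator to the least
    # common ancestor, then descend the destination locator, as in the
    # quoted design note.
    def active_path(src, dst):
        s, d = src.split("."), dst.split(".")
        i = 0
        while i < min(len(s), len(d)) and s[i] == d[i]:
            i += 1                      # depth of least common ancestor
        up = [".".join(s[:k]) for k in range(len(s) - 1, i - 1, -1)]
        down = [".".join(d[:k]) for k in range(i + 1, len(d) + 1)]
        return up + down

    print(active_path("A.P.Q.R", "A.X.Y.Z"))
    # ['A.P.Q', 'A.P', 'A', 'A.X', 'A.X.Y', 'A.X.Y.Z']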
Later on in the note I said:

    This level of state provides strictly hierarchical routing. There are pretty obvious optimizations... For example, if you keep DMF's to more than the minimal set ..., and keep your table sorted for efficient lookups (probably much the same as the current routing table for hop-by-hop datagrams), you may be able to short-cut. For example, using the case above (a packet from A.P.Q.R to A.X.Y.Z), if A.P.Q is actually a neighbour to A.X.Y, and maintains a flow directly from A.P.Q to A.X.Y, then when the packet reaches A.P.Q, instead of going the rest of the way up and down, the pointer can be set into the destination locator at A.X.Y, and the packet sent there directly.

So, if only the minimum necessary set of DMF's were available, yes, it would be strictly hierarchical routing. ("No-brainer" routing refers to something subtly different, the bias to one long-haul network over another when several equally good ones are available, caused by the abstraction boundaries including the destination with one of the long-haul networks.)

> It might be possible to remove the "hop count" field

Can the set of routers comprising a flow change in "mid-flow"?

The set of resources (physical and virtual) which are the path of a flow cannot change, but the actual set of physical resources (and routers) which make up a virtual resource might change due to local repair/load-adjustment, so yes.

Can the set be temporarily inconsistent? Might a loop exist as a transient of some form during a repair of the flow?

I don't yet know all the details of flow-repair, so I can't answer this for certain, but I don't think so. At least, I'm having a hard time coming up with a counter-example, assuming some mildly intelligent flow-repair algorithms. Maybe if you have bits of even older versions of the flow left around in various switches... hmm.

If so, the hop-count might be deemed necessary. (aside) However, it occurs to me that if there is no hop-count, even in today's network: If a loop occurs, then the loop will eventually be fixed. When it is fixed, all existing packets in the loop will be forwarded on towards their destinations, just as if nothing had happened. Some packets that were sent into the loop will be lost due to lack of buffers. I suspect that this loss will be somewhat random (though it might hit "new" packets disproportionately). So maybe a hop-count is not necessary in any event. (I have not thought all this through...)

It turns out that John Curran has convinced me that we probably ought to keep the hop-count (as belt and suspenders engineering), but let me go think this through. You have an interesting argument...

Now, what I want to know is what does this new datagram model really buy that the current model does not do.

Good question. By "current model", do you mean hop-by-hop, or source-routed?

If the former, I claim it's more robust, and probably slightly more efficient. It's also less complex to provide, since you don't have the consistency requirements to meet (especially in the Nimrod environment; you don't have to make sure people compute their hop-by-hop routing tables from the same topology map). It may also be a little more flexible in terms of allowing local control of the overhead/optimality tradeoff knob, but I have to think about that.
If the latter, I think the original note answers it:

    This mode would have almost none of the disadvantages of SRD, since the source doesn't have to compute a route, and there is no source route in the packet, just the source and destination locator ... Again, in a world with resource-allocation going on, that DMF would have a resource limit associated with it, which would prevent pure datagram traffic from interfering with other resource allocations.

(The latter advantage applies to the comparison with hop-by-hop as well, obviously.) The new mode is also less complex to process in most routers than SRD.

It seems to me that what you are doing is taking a network that is fundamentally a flow network and trying to superimpose the appearance of datagrams on it.

There is some truth to this. However, it was *not* done to get an "all-flow" network. That's just a nice (and interesting, but I'll talk about that in a second) byproduct. The use of flows to get the packets from each active point to the next is there because it provides for more loop-resistant *forwarding* (there are no routing decisions at all being made) between active points.

However, flows aren't there just to have flows, either! Remember, in a way, flow setup is nothing but an efficient way of doing "source" routing. (All through Nimrod, it's not actually necessarily generated by the true source, which is why I don't like the term "source". I prefer the term "unitary", since the path is picked by one entity.) I say "in a way", since flows also happen to interact well with some resource allocation ideas, etc. Again, nice, and interesting, and more in a sec on that. What's most important, all through Nimrod as a *routing architecture*, is the unitary routing, which is what underlies the whole thing, and *that's* there for a variety of reasons: robustness, and the ability to support source policies, etc, etc. So, that's why the flows are there.

So why are flows so interesting? What is interesting to me about all this is the way that all sorts of seemingly unconnected things are all leading down the same path. Physicists have this deal where theories which have unexpected scopes and synergies have a peculiar "beauty", which they take to mean that the theory is more likely true. I have much the same feeling. It is just Too Weird the way all these various things fit together so nicely. I take it to mean that we have discovered something very "right". An interesting topic, but not to be explored in detail right here!

The DMFs seem to me to be really nothing more than a way for routers to know how to get to the borders of their areas. ... Why not assume that within an area, all routers have a reasonably consistent database. Their databases may not be complete, but what they have is correct. If this assumption recurses, up and down the hierarchy, then you can do real hop-by-hop forwarding.

"Reasonably consistent" doesn't cut it. Unless they *are* consistent, you *will* get routing loops. Hop-by-hop forwarding will not work without this consistency, and that consistency, together with the way the actual decisions on what the path is are distributed, represent real weaknesses.

The key difference between this scheme and previous schemes is *not* in the kind of path which is constructed (at the broadest view), or in the information which is needed in the various routers. It is true, in these areas it is basically the same.
The key differences are in the consistency requirements, and the distributed decision making on the path, and these differences are for reasons of robustness. The actual mechanisms resulting from these considerations are i) the use of flows to get the packets from each active point to the next, and ii) the monotonically increasing pointer into the locators carried with each packet. The former is important since it provides for more loop-resistant *forwarding* (there are no routing decisions at all being made) between active points. The latter is important since it provides for more loop-resistant *routing* at active points.

So, to restate, the overall goal is greater robustness in the routing (while retaining efficiency in packet creation, size, and processing), and the mechanism chosen is, at a high level, to increase the forwarding state in the packets, in the form of the locator pointer, and the locally chosen flow (not just a locally chosen *flow id*).

Granted that this turns the routing into No-Brainer (NB), but you seem to have reduced datagram mode to NB anyway.

Again, NB is something slightly different from "pure hierarchical", which is what I think you mean here. Also again, how far from pure hierarchical, and how close to optimal you are is a function of the amount of information you distribute (i.e. routing overhead), and not of the particular routing architecture. *All* routing architectures (whether hop-by-hop or unitary, and destination-vector or map-distribution) have the same basic tradeoff. Think of it as one of the "Laws of Thermodynamics" of routing!

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa29070; 6 Jan 94 2:40 EST Received: from pizza by PIZZA.BBN.COM id aa00963; 6 Jan 94 2:29 EST Received: from BBN.COM by PIZZA.BBN.COM id ab00959; 6 Jan 94 2:28 EST Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa28778; 6 Jan 94 2:28 EST Received: by ginger.lcs.mit.edu id AA18847; Thu, 6 Jan 94 02:28:20 -0500 Date: Thu, 6 Jan 94 02:28:20 -0500 From: Noel Chiappa Message-Id: <9401060728.AA18847@ginger.lcs.mit.edu> To: jwagner@princeton.edu, kasten@ftp.com Subject: Re: New datagram mode Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM

> Routers in a given area need only understand how to get to the border-routers of their area, and to the border-routers of contained areas.

Isn't this simply a restatement of the fact that Nimrod is using a hierarchical view of the network? The border routers (conceptually) sit at the interfaces between hierarchy levels.

Again, don't think that abstraction action boundaries (i.e. the scopes over which individual sub-components of an entity are visible) have to match abstraction naming boundaries. (I.e., A.* may be visible as individual destinations outside A.) That way lies pure hierarchical routing...

The problem comes in determining which routers are at the border (since in a dynamic network what was A.B may suddenly be A.B.A and A.B.B).

Yes. Getting the boundaries configured consistently is going to be a big issue. There's a paper by Seeger and Khanna (Josh Seeger and Atul Khanna, "Reducing Routing Overhead in a Growing DDN", MILCOMM '86, IEEE, 1986) which has a nice scheme for doing this in a way that can't be badly confused. However, I've run out of brain cells for technical stuff, so I can't get into that. Guess I'll go do some political flaming on the IETF list with what few I have left at this point!
:-) Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa29850; 6 Jan 94 3:11 EST Received: from pizza by PIZZA.BBN.COM id aa01061; 6 Jan 94 3:01 EST Received: from nic.near.net by PIZZA.BBN.COM id aa01057; 6 Jan 94 2:59 EST Received: from nic.near.net by nic.near.net id aa03445; 6 Jan 94 3:00 EST To: Noel Chiappa cc: kasten@ftp.com, nimrod-wg@nic.near.net Subject: Re: New datagram mode In-reply-to: Your message of Thu, 06 Jan 1994 02:14:43 -0500. <9401060714.AA18784@ginger.lcs.mit.edu> Date: Thu, 06 Jan 1994 03:00:25 -0500 From: John Curran

--------

] From: Noel Chiappa
] Subject: Re: New datagram mode
] Date: Thu, 6 Jan 94 02:14:43 -0500
] ...
] (Frank) If so, the hop-count might be deemed necessary. (aside) However, it occurs to me that if there is no hop-count, even in today's network: If a loop occurs, then the loop will eventually be fixed. When it is fixed, all existing packets in the loop will be forwarded on towards their destinations, just as if nothing had happened. Some packets that were sent into the loop will be lost due to lack of buffers. I suspect that this loss will be somewhat random (though it might hit "new" packets disproportionately). So maybe a hop-count is not necessary in any event. (I have not thought all this through...)
]
] It turns out that John Curran has convinced me that we probably ought to keep the hop-count (as belt and suspenders engineering), but let me go think this through. You have an interesting argument...

Given a routing loop (due to, for instance, a software failure which results in conflicting area boundaries), all of the datagrams sent to the destination will enter the loop and occupy a permanent portion of the capacity. Without a hop-count, there exists a finite number of datagrams that can be sent until congestion is assured. This congestion would remain even after the source had ceased transmission, and could prevent both management operations and propagation of correct maps.

/John

Received: from PIZZA.BBN.COM by BBN.COM id aa04289; 6 Jan 94 4:14 EST Received: from pizza by PIZZA.BBN.COM id aa01308; 6 Jan 94 4:02 EST Received: from BBN.COM by PIZZA.BBN.COM id aa01304; 6 Jan 94 4:00 EST Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa02573; 6 Jan 94 4:00 EST Received: by ginger.lcs.mit.edu id AA18966; Thu, 6 Jan 94 04:00:00 -0500 Date: Thu, 6 Jan 94 04:00:00 -0500 From: Noel Chiappa Message-Id: <9401060900.AA18966@ginger.lcs.mit.edu> To: jnc@ginger.lcs.mit.edu, mohta@necom830.cc.titech.ac.jp Subject: Re: New datagram mode Cc: nimrod-wg@BBN.COM

> The route calculation is now distributed, which takes care of the first; there are no source routes in packets, which fixes the second, etc.

The first one, the distributed calculation of route, only makes the load on intermediate routers heavier.

The calculation happens in two steps, though the second one isn't really a step; it's just a lookup in a precomputed table. And remember, this calculation doesn't happen in *all* intermediate routers, just "active" routers.

The first step, the calculation of the DMF path, is obviously the expensive one, but that can happen before the packet actually arrives. In fact, I'd probably make calculating DMF paths the "idle task" for the router, so that all the DMF's (even ones which are not currently instantiated in installed flows) have paths available if you need to set one up (this is assuming that the set of DMF's which are set up on demand is non-null). The second step is the selection of the appropriate DMF when a packet arrives.
This is a "longest match" table lookup, exactly the same as the current routing table lookup. I don't hear anyone (even Steve D :-) complaining that that is too expensive! The second one, I can't understand. How can route be determined with a single flat EID? Doesn't your packet contain something like layers of Area IDs which makes the packet bulky? The route is not determined directly from the EID. The EID has to be translated to a locator, which is a hierarchically organized name, suitable for use by the routing. The locator does in fact contain a sequence of area names. As to whether the inclusion of locators makes the packet bulky, I guess that's a matter of opinion. I don't really know how long locators will be, for a start, nor what their representation will look like. For example, purely as an illustration, if we get by for the moment with 5 levels, 2 of 2 bytes and 3 of one byte, and one byte of "total length", that gives us 8 bytes, the same length as SIP addresses. I don't think you can forward any packet with EID only. Aren't you assuming some structure in EID? I agree that you can't forward (or route) only on the EID. No, I don't assume any structure in the EID. That's what locators are for; they are topologically significant, hierarchically structured, names. Locators and EID's *together* provide the functionality of current IPv4 "addresses" (and a little more besides, obviously). Just as the current routing table gives information for the next hop determination to various networks, nimrod routing table could give information for the next hop determination to various areas. What "nimrod routing table"? Up until the invention of DMF's, Nimrod had no routing tables, at least in anything like the classical sense. There were databases set up and used by the routing, but they were i) topology maps, and ii) the database of active flows. Even with DMF's, the only database which will contain "next hop" information will be the flow database. I think areas should be represented with EID of border routers, thus, routing table could be indexed with EIDs of border routers As I have already said, this scheme has what I regard as a *fatal* disadvantage, in that it ties the abstraction hierarchy too closely to the physical topology. I think that the binding (i.e. linkage) between the two should be something we can control, so that so that the change to the former to match the latter is *controllable*. You need to explain either i) why this goal is a bad goal, or ii) why some other advantage of your scheme outweighs the disadvantage of not meeting this goal. If you can't do either one, I don't think your scheme for areas will be very useful. I'm assuming that routers have a routing table indexed by EIDs of certain border routers (those at the upper level areas and the direct lower level areas) and EIDs of directly reachable endhosts (that is, endhosts in the same datalink layer). Ah, now I understand. Your use of "EID's" is only in the context of your planned use of router EID's as area identifiers. Yes, this would work (if I liked your scheme for area identifiers :-). > Only between pairs of routers which wish to protect themselves against > denial-of-service attacks. Like much else in Nimrod, this is a > cost/benfit knob that is set locally, to allow the users to make the > decision on what quality of service they want.. What? Then, how intermediate routers can forward packets without a lot of effort for flow setup? 
There is a certain amount of effort involved in getting ready to forward packets, even in a hop-by-hop system. The routing has to distribute information, and state has to be set up, etc, etc. It's just a different kind of state; routing tables instead of topology databases, flow databases, etc. I don't really know what you characterize as "a lot of effort". Nimrod may have somewhat more setup overhead, both in state and computing, than alternative schemes, but it also has advantages and capabilities they lack, and I reckon those advantages and capabilities are worth the price in setup overhead. Notice that I carefully say "setup overhead", since the actual processing of user data packets (i.e. "forwarding overhead") is as efficient, if not more so. Thus, the real-time operational characteristics are the same, if not superior.

Aren't you suggesting a trade-off between: connectionless communication through routers fully connected at run time, and connectionless communication through routers fully connected in advance?

Well, I'm not sure about the "fully connected", but yes, there is a tradeoff between the amount you do in advance, and the amount that is done in response to actual traffic. Again, just like everywhere in Nimrod, this is a cost/benefit knob that can be set locally, to allow the users to make the decision on what quality of service they feel like paying for.

I think both will result in O(N^2) behaviour.

I'm not sure I quite understand what you are getting at here, but if you are talking about the amount of state required to hold the DMF's (assuming we use point-point flows, and not trees as I suggested; trees would make it plain O(N), just like hop-by-hop routing) it is a more complex calculation than that. There is unlikely to be a single router through which all the flows will pass. In addition, the "N" here is a function of the size of the area, etc, so the amount of state can be controlled by controlling the area size, etc.

> Actually, the important question is the number of simultaneously active (i.e. set up) flows in a given router.

I don't think you can assume any "flow" for connectionless communication.

If this datagram scheme is adopted, then there will be a subset of the flows (or tree-flows) which will be used for datagrams (i.e. connectionless communication).

if a user wants full protection against service denial, it is O(N^2).

I can't see where this comes from at all. Denial-of-service protection is given by the use of traffic monitoring, along with source routing. There is nothing here that will cause O(N^2) growth. The only way to have O(N^2) growth is to have O(N^2) growth in the total number of *actual* connections (not *potential* connections) in the network. However, the total number of hosts is only growing at O(N)! Since the number of connections per host is clearly *not* growing at O(N), but is basically a constant, claims of O(N^2) growth are clearly not well supported.

It's much more reasonable to expect end-end protection with each application on individual hosts. Aggregated protection through intermediate routers can't be so meaningful. ... So, the protection, if any, should be application-wise.

True. I merely indicated that *some* denial-of-service protection was an option (i.e. not a necessity).
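The "longest match" selection of a DMF, argued about above, is the same operation as a classical routing-table lookup. A minimal sketch in Python (the DMF table contents and flow names are invented for illustration):

    # Pick the DMF whose target locator is the longest prefix of the
    # destination locator; return None (e.g. to go further up) if none.
    dmf_table = {"A": "flow-1", "A.X": "flow-2", "A.X.Y": "flow-3"}

    def select_dmf(dest_locator):
        parts = dest_locator.split(".")
        for k in range(len(parts), 0, -1):       # longest prefix first
            flow = dmf_table.get(".".join(parts[:k]))
            if flow is not None:
                return flow
        return None

    print(select_dmf("A.X.Y.Z"))   # flow-3: longest match is A.X.Y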
Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa05956; 6 Jan 94 4:48 EST Received: from pizza by PIZZA.BBN.COM id aa01460; 6 Jan 94 4:39 EST Received: from BBN.COM by PIZZA.BBN.COM id aa01456; 6 Jan 94 4:37 EST Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa05620; 6 Jan 94 4:31 EST Received: by ginger.lcs.mit.edu id AA19074; Thu, 6 Jan 94 04:30:21 -0500 Date: Thu, 6 Jan 94 04:30:21 -0500 From: Noel Chiappa Message-Id: <9401060930.AA19074@ginger.lcs.mit.edu> To: Christian.Huitema@sophia.inria.fr, jmoy@proteon.com, nimrod-wg@BBN.COM Subject: Re: Maps and meshes in the real world Cc: jnc@ginger.lcs.mit.edu

OSPF [does] allow you to specify different link costs in each direction... you can make the cost in one direction so prohibitively large that the link will in practice only be used in one direction. Ahh, perhaps what you are talking about is control traffic. Indeed, that must be bidirectional in OSPF.

Hmm. Interesting. Hard to see how to run a routing protocol when information can only flow in one direction from router A to router B (I mean in general, not along a particular link). I mean, if you never hear anything back, how do you even know it's up? Unidirectional links per se aren't a theoretical problem, as long as there's *some* way back. Still, it's a good point, and one we should remember.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa01479; 6 Jan 94 6:53 EST Received: from pizza by PIZZA.BBN.COM id ab00330; 6 Jan 94 6:36 EST Received: from BBN.COM by PIZZA.BBN.COM id ac00318; 6 Jan 94 6:32 EST Received: from necom830.cc.titech.ac.jp by BBN.COM id aa01032; 6 Jan 94 6:31 EST Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Thu, 6 Jan 94 19:19:17 +0900 From: Masataka Ohta Return-Path: Message-Id: <9401061019.AA29004@necom830.cc.titech.ac.jp> Subject: Re: New datagram mode To: Noel Chiappa Date: Thu, 6 Jan 94 19:19:15 JST Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM In-Reply-To: <9401060900.AA18966@ginger.lcs.mit.edu>; from "Noel Chiappa" at Jan 6, 94 4:00 am X-Mailer: ELM [version 2.3 PL11]

> > The route calculation is now distributed, which takes care of the first; there are no source routes in packets, which fixes the second, etc.
> >
> > The first one, the distributed calculation of route, only makes the load on intermediate routers heavier.
>
> The calculation happens in two steps, though the second one isn't really a step; it's just a lookup in a precomputed table.

OK. Call the table the "nimrod routing table", because that is what it is.

> And remember, this calculation doesn't happen in *all* intermediate routers, just "active" routers.

All routers are either active or down. So?

> The first step, the calculation of the DMF path, is obviously the expensive one, but that can happen before the packet actually arrives. In fact, I'd probably make calculating DMF paths the "idle task" for the router,

Then, the network will melt down upon congestion. Why don't you try to avoid "obviously expensive" things, instead?

> The second step is the selection of the appropriate DMF when a packet arrives. This is a "longest match" table lookup,

Then, the lookup will be slow.

> As to whether the inclusion of locators makes the packet bulky, I guess that's a matter of opinion. I don't really know how long locators will be, for a start, nor what their representation will look like.
> For example, purely as an illustration, if we get by for the moment with 5 levels, 2 of 2 bytes and 3 of one byte, and one byte of "total length", that gives us 8 bytes, the same length as SIP addresses.

Matter of opinion? It's *YOU* who hate variable-length thingies.

> What "nimrod routing table"? Up until the invention of DMF's, Nimrod had no routing tables, at least in anything like the classical sense.

I'm not constrained by any classical sense.

> > I think areas should be represented with the EIDs of border routers; thus, the routing table could be indexed with the EIDs of border routers
>
> As I have already said, this scheme has what I regard as a *fatal* disadvantage,

No, I don't think you have shown any disadvantage.

> in that it ties the abstraction hierarchy too closely to the physical topology.

Representing an area with its border routers is the mathematically exact representation. The representation is the minimal necessary one. Anything less than that lacks information, so it needs some amount of hand configuration and can't work against dynamic area subdivision or merge.

> I think that the binding (i.e. linkage) between the two should be something we can control, so that the change to the former to match the latter is *controllable*.

IMHO, the binding SHOULD NOT need any control. The binding should be automatic. No hand configuration, please.

> You need to explain either i) why this goal is a bad goal, or ii) why some other advantage of your scheme outweighs the disadvantage of not meeting this goal. If you can't do either one, I don't think your scheme for areas will be very useful.

i) Your goal is bad because the primary goal is "NO NEED OF CONTROL".

i) Your goal is bad because the definition of "controllable" is not given. With an arbitrary definition of "controllable", any scheme, including mine, is "controllable".

It should also be noted that quite versatile control of my scheme is possible through DNS. Moreover, with any scheme, including mine, you can, at least, control the configuration of the area hierarchy, anyway. Enough?

> > What? Then, how can intermediate routers forward packets without a lot of effort for flow setup?
>
> There is a certain amount of effort involved in getting ready to forward packets, even in a hop-by-hop system. The routing has to distribute information, and state has to be set up, etc, etc. It's just a different kind of state; routing tables instead of topology databases, flow databases, etc. I don't really know what you characterize as "a lot of effort".

If you want to say flow setup does not need much effort, you should use flows set up by end hosts even for connectionless communications.

> Well, I'm not sure about the "fully connected", but yes, there is a tradeoff between the amount you do in advance, and the amount that is done in response to actual traffic. Again, just like everywhere in Nimrod, this is a cost/benefit knob that can be set locally, to allow the users to make the decision on what quality of service they feel like paying for.

I'm afraid your knob is a cost/cost knob, without any benefit.

> > I think both will result in O(N^2) behaviour.
> I'm not sure I quite understand what you are getting at here, but if you are talking about the amount of state required to hold the DMF's (assuming we use point-point flows, and not trees as I suggested; trees would make it plain O(N), just like hop-by-hop routing) it is a more complex calculation than that. There is unlikely to be a single router through which all the flows will pass.

Within a single area, concentration is likely to happen.

> In addition, the "N" here is a function of the size of the area, etc, so the amount of state can be controlled by controlling the area size, etc.

So, if the area size is 1,000, you will get 1,000,000 connections.

> > Actually, the important question is the number of simultaneously active (i.e. set up) flows in a given router.
> >
> > I don't think you can assume any "flow" for connectionless communication.
>
> If this datagram scheme is adopted, then there will be a subset of the flows (or tree-flows) which will be used for datagrams (i.e. connectionless communication).

Don't you misunderstand the meaning of "connectionless"? UDP could be "connected", in which case flow IDs should be assigned. Still, there is a need for "connectionless" UDP.

> > if a user wants full protection against service denial, it is O(N^2).
>
> The only way to have O(N^2) growth is to have O(N^2) growth in the total number of *actual* connections (not *potential* connections) in the network. However, the total number of hosts is only growing at O(N)! Since the number of connections per host is clearly *not* growing at O(N), but is basically a constant, claims of O(N^2) growth are clearly not well supported.

With connectionless communication, by definition, there are NO end-end *actual* connections. So, the number of hosts is irrelevant. Still, with your scheme, all the routers must be connected. Thus, it is O(N^2).

> > It's much more reasonable to expect end-end protection with each application on individual hosts. Aggregated protection through intermediate routers can't be so meaningful. ... So, the protection, if any, should be application-wise.
>
> True. I merely indicated that *some* denial-of-service protection was an option (i.e. not a necessity).

I think you are mixing up a lot of unrelated things.

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa13273; 6 Jan 94 11:21 EST Received: from pizza by PIZZA.BBN.COM id aa01551; 6 Jan 94 10:52 EST Received: from BBN.COM by PIZZA.BBN.COM id aa01547; 6 Jan 94 10:50 EST Received: from Princeton.EDU by BBN.COM id aa09337; 6 Jan 94 10:16 EST Received: from ponyexpress.Princeton.EDU by Princeton.EDU (5.65b/2.103/princeton) id AA12754; Thu, 6 Jan 94 09:53:41 -0500 Received: from flagstaff.Princeton.EDU by ponyexpress.princeton.edu (5.65c/1.113/newPE) id AA14460; Thu, 6 Jan 1994 09:53:40 -0500 Received: by flagstaff.Princeton.EDU (4.1/Phoenix_Cluster_Client) id AA22699; Thu, 6 Jan 94 09:53:39 EST Message-Id: <9401061453.AA22699@flagstaff.Princeton.EDU> To: Masataka Ohta Cc: Noel Chiappa , nimrod-wg@BBN.COM Subject: Re: New datagram mode In-Reply-To: Your message of "Thu, 06 Jan 1994 19:19:15 +0200." <9401061019.AA29004@necom830.cc.titech.ac.jp> X-Mailer: exmh version 1.2gamma 12/21/93 Date: Thu, 06 Jan 1994 09:53:39 EST From: John Wagner

Masataka,

> Still, with your scheme, all the routers must be connected. Thus, it is O(N^2).

I think you are misunderstanding Noel's scheme.
It requires that all routers have at least 2 connections (this level and the next level up) but does not require more than 3 connections (this level, the next level up, and the next level down). It does not require that all routers be connected to all other routers, although it allows for more than the 3 connections I described. The assumption is a loose mesh.

Noel,

The funny part of your scheme is that you've created a variation on NJE routing as practiced in Bitnet. There are parts of Bitnet (the core/backbone systems) that are connected in a (fully connected) mesh. But the routing through those parts is performed by sending the files over the pre-defined "flows" (NJE routes defined in the routing tables). These "flows" are computed globally and the maps distributed as flat files. But there are still occasional occurrences of loops when maps don't get updated in synch. The time to heal the loops is more important than the fact they occur. Should Nimrod perform dynamic loop detection?

Using your scheme, is the following possible? I have packets following a predefined flow, but there is congestion in a part of the path that flow follows. Can I add a different path through the physical topology and split the flow over the two paths to dynamically increase the size of the pipe but still use the same flow identification?

John Wagner

Received: from PIZZA.BBN.COM by BBN.COM id aa04854; 8 Jan 94 5:15 EST Received: from pizza by PIZZA.BBN.COM id aa12926; 8 Jan 94 5:04 EST Received: from BBN.COM by PIZZA.BBN.COM id aa12922; 8 Jan 94 5:01 EST Received: from necom830.cc.titech.ac.jp by BBN.COM id aa04661; 8 Jan 94 5:00 EST Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Sat, 8 Jan 94 18:55:51 +0859 From: Masataka Ohta Return-Path: Message-Id: <9401080956.AA11834@necom830.cc.titech.ac.jp> Subject: Re: New datagram mode To: John Wagner Date: Sat, 8 Jan 94 18:55:50 JST Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM In-Reply-To: <9401061453.AA22699@flagstaff.Princeton.EDU>; from "John Wagner" at Jan 6, 94 9:53 am X-Mailer: ELM [version 2.3 PL11]

> Masataka,
>
> > Still, with your scheme, all the routers must be connected. Thus, it is O(N^2).
>
> I think you are misunderstanding Noel's scheme. It requires that all routers have at least 2 connections (this level and the next level up)

In which case, packets are relayed hop-by-hop.

> but does not require more than 3 connections (this level, the next level up, and the next level down).

No. So?

> It does not require that all routers be connected to all other routers although it allows for more than the 3 connections I described. The assumption is a loose mesh.

Even if the physical topology is a mesh of routers at the same level, all routers must be directly connected by flows, if you want to route packets through a flow without hop-by-hop relaying.
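The arithmetic behind the O(N^2) claim, for what it is worth, is just the pair count; the sketch below assumes, as Ohta does here and Noel disputes, that a full flow mesh among same-level routers is actually required:

    # Flows needed for a full mesh of N same-level routers:
    # N*(N-1)/2 bidirectional flows, or N*(N-1) unidirectional ones.
    def full_mesh_flows(n, unidirectional=False):
        return n * (n - 1) if unidirectional else n * (n - 1) // 2

    print(full_mesh_flows(1000))                       # 499500
    print(full_mesh_flows(1000, unidirectional=True))  # 999000, i.e.
    # roughly the "1,000,000 connections" figure used in this thread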
mohta@cc.titech.ac.jp

Received: from PIZZA.BBN.COM by BBN.COM id aa21100; 8 Jan 94 20:56 EST Received: from pizza by PIZZA.BBN.COM id aa15307; 8 Jan 94 20:45 EST Received: from nic.near.net by PIZZA.BBN.COM id aa15303; 8 Jan 94 20:42 EST Received: from GINGER.LCS.MIT.EDU by nic.near.net id aa12264; 8 Jan 94 20:43 EST Received: by ginger.lcs.mit.edu id AA15497; Sat, 8 Jan 94 20:43:41 -0500 Date: Sat, 8 Jan 94 20:43:41 -0500 From: Noel Chiappa Message-Id: <9401090143.AA15497@ginger.lcs.mit.edu> To: jcurran@nic.near.net, jnc@ginger.lcs.mit.edu Subject: Re: New datagram mode Cc: jnc@ginger.lcs.mit.edu, kasten@ftp.com, nimrod-wg@nic.near.net

> It turns out that John Curran has convinced me that we probably ought to keep the hop-count (as belt and suspenders engineering)

Given a routing loop ... all of the datagrams sent to the destination will enter the loop and occupy a permanent portion of the capacity.

This is a real issue, and I think he's right, we probably need to keep the hop count (although see below for a counter-argument). My reasoning is that preventing looping data traffic is very desirable, since the side-effects are pretty bad. In the best engineered robust systems, there is redundancy for critical functions, and if the redundancy consists of two entirely separate systems, so much the better (provided the cost is not excessive). The nicest thing about the hop count, from my point of view, is that it represents an *entirely* separate mechanism for dealing with the issue of looping packets. It is a simple, very effective mechanism for catching and killing looping packets. Of course, if a loop forms among a group of routers which do not properly decrement the hop count, this mechanism would fail too. Looping packets can still consume a fair amount of resources if they loop around a short loop at the start of a long path, until they are caught and killed. Finally, it is a mechanism which does add a certain expense to the forwarding of packets.

This congestion would remain even after the source had ceased transmission

Yes and no. If a certain amount of bandwidth were allocated to datagrams, presumably the loop would cause there to be more offered load than capacity. This would cause packets to be dropped (via whatever drop algorithm), so over time these packets would in all probability decay, so I don't think they'd be there forever. This could be seen as an argument as to why a hop count is not really necessary; if routing loops are very, very infrequent, even a very poor and inefficient mechanism for dispensing with the resulting packets would be acceptable. However, a permanent loop could still cause severe problems, so I'm not sure I believe this argument.

could prevent both management operations and propagation of correct maps.

I'm not sure about this. I'd hope that a future network, as an aid to robustness, gives priority to management, operations and routing traffic over normal user traffic. So this shouldn't be an issue.

Keep the hop-count; people make mistakes, ergo maps can be inconsistent.

A map which was inconsistent with reality would only lead to a failure when someone tried to set up the flow to instantiate the DMF, so that's not a viable failure mode. However, there are probably potential misconfigurations which can result in loops, I just can't think of any right now. There are other failure modes (involving implementation bugs) which do result in loops though, so they aren't impossible.
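The "entirely separate mechanism" point is easy to see in a toy simulation; with a hop count, a packet caught in a forwarding loop dies after a bounded number of hops no matter why the loop formed (the topology and names below are invented for illustration):

    # B <-> C form a loop; the hop count kills the packet regardless.
    next_hop = {"A": "B", "B": "C", "C": "B"}

    def forward(src, dst, hop_limit=8):
        node, hops = src, 0
        while node != dst:
            if hops >= hop_limit:
                return "dropped: hop count expired"
            node, hops = next_hop[node], hops + 1
        return "delivered"

    print(forward("A", "D"))   # dropped: hop count expired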
(It's important to define the hop count in terms of all hops, active and non-active, lest someone try to define it as a VLF counter...)

Hmmm. Assuming each VLF is not a loop, then a count of VLF's should detect looping just as well as a count of actual hops. However, it's probably just as likely (due to code bugs, etc) to have a loop in the VLF, so a *completely independent* loop-detection mechanism at the lowest level (i.e. the hop count) is probably a good idea...

In short, a hop count may not be necessary, or even the most efficient engineering, but it is certainly very robust, since it provides an entirely separate redundant method of preventing looping packets. This alone may make it worth it for a global critical communication substrate.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa22559; 8 Jan 94 22:04 EST Received: from pizza by PIZZA.BBN.COM id aa15463; 8 Jan 94 21:53 EST Received: from BBN.COM by PIZZA.BBN.COM id aa15459; 8 Jan 94 21:51 EST Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa22413; 8 Jan 94 21:51 EST Received: by ginger.lcs.mit.edu id AA15671; Sat, 8 Jan 94 21:51:38 -0500 Date: Sat, 8 Jan 94 21:51:38 -0500 From: Noel Chiappa Message-Id: <9401090251.AA15671@ginger.lcs.mit.edu> To: jnc@ginger.lcs.mit.edu, mohta@necom830.cc.titech.ac.jp Subject: Re: New datagram mode Cc: nimrod-wg@BBN.COM

> And remember, this calculation doesn't happen in *all* intermediate routers, just "active" routers.

All routers are either active or down. So?

No. The term "active router" was defined in the original message as one that actually makes a decision about where to send the packet (as opposed to handling it as part of a flow). I.e., it's the router at the end of one DMF which chooses the next DMF to send the packet down. Perhaps a different term would be better, but I don't know what.

> The first step, the calculation of the DMF path, is obviously the expensive one, but that can happen before the packet actually arrives. In fact, I'd probably make calculating DMF paths the "idle task" for the router,

Then, the network will melt down upon congestion.

Again, no. Even if a router was fully busy handling traffic, it could still load-shed enough to calculate those routes, *if it had to*. However, I expect this will almost never happen. I don't think there is a *single* router in the entire Internet which is busy 100% of the time. In the usual case, which is what will almost always happen, these calculations will be performed with cycles which would have otherwise gone unused, so they are effectively "free" most of the time.

Why don't you try to avoid "obviously expensive" things, instead?

This statement represents extremely simplistic design thinking. The goal here is not to absolutely minimize the cost, since to do so usually involves getting rid of some benefit as well. The important question about a given feature is not just "how much does it cost", but "what does that cost buy you". For something as complex as the underlying data layer, it's difficult to assess exactly the benefits. I reckon the increased robustness is well worth it, particularly as the resource being consumed (processing cycles) is one which is increasing at fantastic rates. Cycles are cheap, and getting cheaper. There are also a number of lesser issues, such as the side-benefit that this can provide an easy way to create resource limits on datagram traffic, to prevent datagram traffic interfering with the resources allocated to other traffic. So, it's not that simple.
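One way to picture "calculating DMF paths as the idle task": user traffic always preempts the path computations, which are done with otherwise-unused cycles. A sketch in Python (the scheduling discipline and all names are assumptions, not a worked-out design):

    import collections

    packet_queue = collections.deque()                  # user traffic
    paths_to_compute = collections.deque(["dmf-1", "dmf-2"])
    precomputed_paths = {}

    def compute_path(dmf):     # stand-in for the expensive route calculation
        return ["hop-a", "hop-b"]

    def handle(pkt):           # stand-in for normal packet forwarding
        print("forwarded", pkt)

    def router_step():
        if packet_queue:           # real traffic always wins
            handle(packet_queue.popleft())
        elif paths_to_compute:     # idle cycles go to DMF path calculation
            dmf = paths_to_compute.popleft()
            precomputed_paths[dmf] = compute_path(dmf)

    router_step(); router_step()
    print(precomputed_paths)   # paths ready before any setup request arrives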
> The second step is the selection of the appropriate DMF when a packet
> arrives. This is a "longest match" table lookup,

Then, the lookup will be slow.

This is exactly the same as the table lookup done now in the Internet. I haven't noticed that it's amazingly slow. In fact, I'll bet Steve Deering will have a fit when you tell him that SIP forwarding is going to be "slow" because of it.

> As I have already said, this scheme has what I regard as a *fatal*
> disadvantage,

No, I don't think you have shown any disadvantage.

I regard the phrase immediately below as a severe disadvantage. We obviously disagree.

> in that it ties the abstraction hierarchy too closely to the
> physical topology.

Representing an area with border routers is the mathematically exact representation. The representation is the minimal necessary.

No, it's not the minimal representation. It's the minimal *definition*. If we assign unique names to these definitions, the unique names (i.e. the representation) may still be considerably shorter than the definitions.

Anything less than that lacks information, so it needs some amount of hand configuration and can't work against dynamic area subdivision or merge.

This has disadvantages too. Assigning new locators to everything due to a temporary partition, or a router outage, is a real pain.

The binding should be automatic. No hand configuration, please.

Things which are purely automatic are very inflexible. I don't believe a mandatory automatic algorithm of the form you suggest is good design for a system of this size. In addition, the setting of area boundaries is something which is almost certainly going to involve some configuration, or do you propose to automate that as well, in a way which utterly removes humans from the loop?

> You need to explain either i) why this goal is a bad goal, or ii) why
> some other advantage of your scheme outweighs the disadvantage of not
> meeting this goal.

i) Your goal is bad because the primary goal is "NO NEED OF CONTROL".

I'm not sure I agree that this is *the* primary goal, but I can meet this goal (which is essentially the goal of automatic configuration) without throwing away the goal of flexible control. Your scheme removes the possibility of flexible control.

All throughout Nimrod there are "necessary" algorithms, i.e. algorithms which must be there, but for which a particular algorithm is not a fundamental part of the architecture. Examples are route selection, etc, etc. Particular algorithms are not fundamental precisely to allow local control, experimentation, replacement, etc. If we make the algorithm for naming of areas a similar local option, we can certainly define a "default" algorithm for picking area names which *is* autoconfiguring, i.e. your goal. If people then don't like the results, they are free to substitute some other algorithm. By making the process of assigning names to areas a mechanical one which precludes any possibility of changing it, you are removing that choice.

Your goal is bad because the definition of "controllable" is not given. With an arbitrary definition of "controllable", any scheme, including mine, is "controllable".

OK, I have a very simple definition of "controllable" for you, then. An object's locator should not change *automatically* when a particular router (e.g. any of the border routers of an enclosing area) is taken out of service, or a similar minor topology change is made. Your scheme does not have this property.
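The difference can be made concrete with a toy Python example (all values invented): a name derived from the border routers' EIDs changes when one of them is withdrawn, while an opaque, assigned name does not:

    border_eids = {"eid-17", "eid-29", "eid-44"}
    derived_name = min(border_eids)    # e.g. a "lowest EID" naming scheme
    opaque_name = "area-Q"             # assigned name, not derived

    border_eids.discard("eid-17")      # a border router is taken out of service
    assert min(border_eids) != derived_name   # the derived locator changed...
    assert opaque_name == "area-Q"            # ...the opaque name did not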
Moreover, with any scheme including mine, you can, at least, control the configuration of the area hierarchy, anyway.

This is irrelevant, since what I object to is the close tie between the configuration and the resulting locators, not the process of configuration.

>> how intermediate routers can forward packets without a lot of effort
>> for flow setup?

> There is a certain amount of effort involved in getting ready to forward
> packets, even in a hop-by-hop system.

If you want to say flow setup does not need much effort, you should use flows set up by end hosts even for connectionless communications.

Overhead of a flow set up by a host on an end-end basis for a datagram is all borne by that single datagram, whereas the overhead for path calculation, setup, etc. for a DMF between two routers is shared among all the datagrams which go from one router to another. I would have thought that the difference, and thus the advantage of DMF's for datagrams, would have been obvious.

>> I think both will result in O(N^2) behaviour.

> if you are talking about the amount of state required to hold the DMF's
> it is a more complex calculation than that. There is unlikely to be a
> single router through which all the flows will pass.

Within a single area, concentration is likely to happen.

This is an assertion for which you provide no evidence, and which does not seem at all correct to me.

> In addition, the "N" here is a function of the size of the area, etc, so
> the amount of state can be controlled by controlling the area size, etc.

So, if the area size is 1,000, you will get 1,000,000 connections.

No, your calculation is completely wrong. You don't appear to understand the mechanism at all. I will perform the detailed calculation in a reply to your message to John Wagner, since this message is already quite long.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa27025; 8 Jan 94 23:15 EST
Received: from pizza by PIZZA.BBN.COM id aa15662; 8 Jan 94 23:06 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa15658; 8 Jan 94 23:03 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa26820; 8 Jan 94 23:04 EST
Received: by ginger.lcs.mit.edu id AA15816; Sat, 8 Jan 94 23:04:19 -0500
Date: Sat, 8 Jan 94 23:04:19 -0500
From: Noel Chiappa
Message-Id: <9401090404.AA15816@ginger.lcs.mit.edu>
To: jwagner@princeton.edu, mohta@necom830.cc.titech.ac.jp, nimrod-wg@BBN.COM
Subject: Re: New datagram mode
Cc: jnc@ginger.lcs.mit.edu

> Still, with your scheme, all the routers must be connected. Thus, it is
> O(N^2).

I think you are misunderstanding Noel's scheme. ... It does not require that all routers be connected to all other routers

It's not clear to me exactly what he thinks is O(N^2): whether it is the total number of flows needed (which is a relatively uninteresting number, actually), or the number of flows passing through any particular router (which *is* the important number), or what. But you're right, it does not require complete connectivity, although the actual math is rather complex, and depends on a number of parameters.

It requires that all routers have at least 2 connections (this level and the next level up) but does not require more than 3 connections (this level, the next level up, and the next level down).

The latter number is incorrect, since there can be more than one "next level down"; i.e. if I'm a border router for A, and A contains A.1 .. A.N, I need N flows (people using the word "connection" on this mailing list will be forced to wash off their keyboards with soap :-), one to each A.i in A.
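A toy Python rendering of that counting rule (the function and argument names are invented for illustration): an interior router needs a single DMF up to a border router, while a border router of A needs one DMF per constituent object it is not already inside:

    def minimal_dmf_count(is_border: bool, n_constituents: int = 0,
                          n_already_inside: int = 0) -> int:
        if not is_border:
            return 1    # one DMF up to a border router, and that's it
        return n_constituents - n_already_inside

    assert minimal_dmf_count(is_border=False) == 1
    assert minimal_dmf_count(is_border=True, n_constituents=10) == 10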
The funny part of your scheme is that you've created a variation on NJE routing as practiced in Bitnet.

This is interesting. Can you provide a reference?

These "flows" are computed globally and the maps distributed as flat files.

If I understand you correctly, these routes/paths are computed at one central location? Yes, in that way it is similar, but there are important differences; the "next hop" selection still seems to be hop-by-hop, since you refer to the failure mode "when maps don't get updated in synch". Use of set-up flows (where the state is installed in a reliable way, over the network) avoids this.

Should Nimrod perform dynamic loop detection?

As far as I can tell, loops could only form in the presence of malfunctioning software; simple timing errors, packet loss, etc. can't do it. I need to think about this for a while, though. Obviously, it's a lot trickier to handle faults when arbitrarily malfunctioning switches can be involved...

Using your scheme, is the following possible? I have packets following a predefined flow, but there is congestion in a part of the path that flow follows. Can I add a different path through the physical topology and split the flow over the two paths to dynamically increase the size of the pipe but still use the same flow identification?

Hmm, good question. I think it depends on how widely you wish to reroute, and how the flow was set up originally. Nimrod does allow local repair in some circumstances, and this could be considered as a "repair". For instance, if you have a virtual link which is built out of a collection of physical links, you can move the flow from one link to another without violating the "contract" which was made in setting up the flow, since the traffic continues to flow over the virtual link which the source specified; the internal details are hidden. The original application for this was if one of the links failed, but it is equally applicable if one of the links becomes congested. Splitting the load across several links in a circumstance like this is somewhat different, but would still be fine.

Obviously, if the source specifically called for physical link X, and that link becomes congested, it would be a violation of the "contract" to move it to another link without letting the source know.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa05764; 9 Jan 94 4:24 EST
Received: from pizza by PIZZA.BBN.COM id aa16600; 9 Jan 94 4:09 EST
Received: from nic.near.net by PIZZA.BBN.COM id aa16596; 9 Jan 94 4:06 EST
Received: from lager.cisco.com by nic.near.net id aa16823; 9 Jan 94 4:07 EST
Received: by lager.cisco.com id AA11605 (5.67a/IDA-1.5 for nimrod-wg@nic.near.net); Sun, 9 Jan 1994 01:07:31 -0800
Date: Sun, 9 Jan 1994 01:07:31 -0800
From: Tony Li
Message-Id: <199401090907.AA11605@lager.cisco.com>
To: Noel Chiappa
Cc: jnc@ginger.lcs.mit.edu, kasten@ftp.com, nimrod-wg@nic.near.net
Subject: Re: New datagram mode

could prevent both management operations and propagation of correct maps.

I'm not sure about this. I'd hope that a future network, as an aid to robustness, gives priority to management, operations and routing traffic over normal user traffic. So this shouldn't be an issue.

From an entirely pragmatic point of view, this turns out to be very difficult to do today. Classification of what's "important" is challenging, and can only happen after the packet is already in the box, consuming buffer memory. It's a nice idea, but not something that I think will be pervasive anytime soon.
Tony

Received: from PIZZA.BBN.COM by BBN.COM id aa08821; 9 Jan 94 7:31 EST
Received: from pizza by PIZZA.BBN.COM id aa17134; 9 Jan 94 7:20 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa17130; 9 Jan 94 7:18 EST
Received: from necom830.cc.titech.ac.jp by BBN.COM id aa08665; 9 Jan 94 7:18 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Sun, 9 Jan 94 21:14:11 +0900
From: Masataka Ohta
Return-Path:
Message-Id: <9401091214.AA14974@necom830.cc.titech.ac.jp>
Subject: Re: New datagram mode
To: Noel Chiappa
Date: Sun, 9 Jan 94 21:14:09 JST
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
In-Reply-To: <9401090251.AA15671@ginger.lcs.mit.edu>; from "Noel Chiappa" at Jan 8, 94 9:51 pm
X-Mailer: ELM [version 2.3 PL11]

> > And remember, this calculation doesn't happen in *all* intermediate
> > routers, just "active" routers.
>
> All routers are either active or down. So?
>
> No. The term "active router" was defined in the original message as one that
> actually makes a decision about where to send the packet (as opposed to
> handling it as part of a flow).

That's a router. The other, which operates only in the data link layer, is not.

> > The first step, the calculation of the DMF path, is obviously the
> > expensive one, but that can happen before the packet actually arrives.
> > In fact, I'd probably make calculating DMF paths the "idle task" for the
> > router,

> Again, no. Even if a router was fully busy handling traffic, it could still
> load-shed enough to calculate those routes, *if it had to*. However, I expect
> this will almost never happen. I don't think there is a *single* router in the
> entire Internet which is busy 100% of the time. In the usual case, which is
> what will almost always happen, these calculations will be performed with
> cycles which would have otherwise gone unused, so they are effectively "free"
> most of the time.

If you don't think the CPU will be 100% loaded, it is a waste of bandwidth to say something should be an "idle task".

> Why don't you try to avoid "obviously expensive" things, instead?
>
> This statement represents extremely simplistic design thinking. The goal here
> is not to absolutely minimize the cost, since to do so usually involves
> getting rid of some benefit as well. The important question about a given
> feature is not just "how much does it cost", but "what does that cost buy
> you".

But, with your vague description, I can't evaluate the cost. What, for example, is the cost of your imaginary DNS TNG?

> For something as complex as the underlying data layer, it's difficult to
> assess exactly the benefits.

That's one of the reasons why we should avoid a complex underlying data layer.

> I reckon the increased robustness is well worth it,

The simpler, the more robust.

> particularly as the resource being consumed (processing cycles) is one
> which is increasing at fantastic rates. Cycles are cheap, and getting cheaper.

Do you remember that you also said bandwidth will be cheaper? The amount of processing has nothing to do with the complexity of processing. For example, instead of always sending all the data, you can send a delta against the previous data, which decreases CPU load and link load and increases the complexity of the protocol.

Why did you object to source routing, then? Wasn't your reason that it will consume more bandwidth and CPU? While I don't think source routing is at all costly, your reasoning contains too many contradictions.
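The delta point above can be illustrated with a toy Python fragment (the dictionaries are invented map data, not any protocol's format): only the entries that changed since the last update cross the wire, at the price of a protocol that must now track previous state:

    prev = {"A": 1, "B": 2, "C": 3}            # last update sent
    curr = {"A": 1, "B": 5, "C": 3, "D": 7}    # current state

    delta = {k: v for k, v in curr.items() if prev.get(k) != v}
    removed = [k for k in prev if k not in curr]

    assert delta == {"B": 5, "D": 7}   # only the changes are transmitted
    assert removed == []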
> There are also a number of lesser issues, such as the side-benefit that this
> can provide an easy way to create resource limits on datagram traffic, to
> prevent datagram traffic interfering with the resources allocated to other
> traffic. So, it's not that simple.

Here, your terminology is incorrect. Seemingly, you don't understand the difference between "connectionless communication" and "datagram communication", which should be the source of your confusion. Perhaps your scheme is merely a scheme to handle connected datagrams only.

There does exist connected datagram communication, to which some resource is pre-allocated along a flow to ensure maximum bandwidth or minimum latency. In this case, all datagrams will have a pre-assigned flow ID.

So, your statement should be:

< There are also a number of lesser issues, such as the side-benefit that this
< can provide an easy way to create resource limits on connectionless
< traffic, to prevent connectionless traffic interfering with the
< resources allocated to connected traffic. So, it's not that simple.

BTW, it is absurd to impose resource limits on connectionless communications. If the actual connected traffic is less than the pre-allocation, the extra bandwidth should be used for connectionless traffic. Anyway, bandwidth allocation is not so complex.

> > The second step is the selection of the appropriate DMF when a packet
> > arrives. This is a "longest match" table lookup,
>
> Then, the lookup will be slow.
>
> This is exactly the same as the table lookup done now in the Internet. I
> haven't noticed that it's amazingly slow. In fact, I'll bet Steve Deering will
> have a fit when you tell him that SIP forwarding is going to be "slow" because
> of it.

So, let's use my exact match scheme with source routing by EIDs of border routers.

> > As I have already said, this scheme has what I regard as a *fatal*
> > disadvantage,
>
> No, I don't think you have shown any disadvantage.
>
> I regard the phrase immediately below as a severe disadvantage. We obviously
> disagree.

Severe? Why? Didn't you think CPU and bandwidth are cheap and getting cheaper?

Moreover, I don't think packets should have the full representation of an area. Packets should have a source route of border routers, instead, which is a lot shorter.

> > in that it ties the abstraction hierarchy too closely to the
> > physical topology.
>
> Representing an area with border routers is the mathematically exact
> representation. The representation is the minimal necessary.
>
> No, it's not the minimal representation. It's the minimal *definition*. If we
> assign unique names to these definitions, the unique names (i.e. the
> representation) may still be considerably shorter than the definitions.

It's the minimal representation of an area. Anything shorter needs external information such as a mapping between Area_ID and the EIDs of border routers. The lack of such a mapping is the fatal defect of your scheme.

> Anything less than that lacks information, so it needs some
> amount of hand configuration and can't work against dynamic area
> subdivision or merge.
>
> This has disadvantages too. Assigning new locators to everything due to a
> temporary partition, or a router outage, is a real pain.

That is the disadvantage of your scheme, not mine. Your locator won't work against area subdivision. I don't think we need any locator, actually. With my scheme, slightly old but stable locator information is stored in DNS.
On the other hand, routing information exchange will provide the up-to-date real topology of areas.

> The binding should be automatic. No hand configuration, please.
>
> Things which are purely automatic are very inflexible. I don't believe a
> mandatory automatic algorithm of the form you suggest is good design for
> a system of this size.

I don't think a lot of hand configuration is possible for a system of this size.

> In addition, the setting of area boundaries is something which is almost
> certainly going to involve some configuration,

So? That's what I wrote.

> Moreover, with any scheme including mine, you can, at least, control
> the configuration of the area hierarchy, anyway.
>
> This is irrelevant, since what I object to is the close tie between the
> configuration and the resulting locators, not the process of configuration.

> Your goal is bad because the definition of "controllable" is not given.
> With an arbitrary definition of "controllable", any scheme, including
> mine, is "controllable".
>
> OK, I have a very simple definition of "controllable" for you, then. An
> object's locator should not change *automatically* when a particular router
> (e.g. any of the border routers of an enclosing area) is taken out of service,
> or a similar minor topology change is made. Your scheme does not have this
> property.

As I have written several times, the locator information of an object is accessible through DNS, which should not, cannot, and thus does not change *automatically*. DNS gives information on "all the possible paths".

On the other hand, topology information on configuration within areas and connectivity between areas MUST change *automatically*. Routing information gives information on "all the available paths".

An end system should/can/does construct a source routing path combining information on "all the possible paths", "all the available paths" and preferred policy.

On the other hand, if you use shorthanded area IDs, if an area is subdivided, you must change objects' locators *automatically*. That is, such a scheme is NOT controllable.

> If you want to say flow setup does not need much effort, you should
> use flows set up by end hosts even for connectionless communications.
>
> Overhead of a flow set up by a host on an end-end basis for a datagram is
> all borne by that single datagram, whereas the overhead for path calculation,
> setup, etc. for a DMF between two routers is shared among all the datagrams
> which go from one router to another.

Thus, you must assume all the routers are connected, here, which is O(N^2).

> I would have thought that the difference, and thus the advantage of DMF's
> for datagrams, would have been obvious.

One reason for your confusion is that you think of connected datagrams only.

> >> I think both will result in O(N^2) behaviour.
>
> > if you are talking about the amount of state required to hold the DMF's
> > it is a more complex calculation than that. There is unlikely to be a
> > single router through which all the flows will pass.
>
> Within a single area, concentration is likely to happen.
>
> This is an assertion for which you provide no evidence, and which does not
> seem at all correct to me.

That's the topology of most network providers today.

> > In addition, the "N" here is a function of the size of the area, etc, so
> > the amount of state can be controlled by controlling the area size, etc.
>
> So, if the area size is 1,000, you will get 1,000,000 connections.
>
> No, your calculation is completely wrong.
> You don't appear to understand the mechanism at all. I will perform the
> detailed calculation in a reply to your message to John Wagner, since this
> message is already quite long.

The problem is that you wrongly think there is some "real flow" with connectionless communication.

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa18802; 9 Jan 94 16:34 EST
Received: from pizza by PIZZA.BBN.COM id aa18517; 9 Jan 94 16:17 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa18513; 9 Jan 94 16:15 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa18320; 9 Jan 94 16:15 EST
Received: by ginger.lcs.mit.edu id AA17145; Sun, 9 Jan 94 16:15:15 -0500
Date: Sun, 9 Jan 94 16:15:15 -0500
From: Noel Chiappa
Message-Id: <9401092115.AA17145@ginger.lcs.mit.edu>
To: jwagner@princeton.edu, mohta@necom830.cc.titech.ac.jp
Subject: Analysis of DMF's in new datagram mode
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM

> It requires that all routers have at least 2 connections (this level and
> the next level up)

In which case, packets are relayed hop-by-hop.

His statement about "2 connections (sic, grrr :-) - this level and the next level up" is a little confusing to me. How many DMF's a router needs *in the minimum possible configuration* depends on whether or not it is a border router.

For routers which are not border routers, they need one DMF out to a border router (i.e. up), and that's it. Traffic to other objects inside the area can be handled by sending it to the border router, which will send it back to the correct object. (Hey, it's not very optimal routing, but I'm talking about the *minimal* set, right?)

Border routers require one DMF to each constituent object in their area, other than ones which the border router is already a constituent of. If an object is in itself an area (i.e. a sub-area of the area in question), with multiple border routers, they only need one DMF to that object, to any one of the border routers of the sub-area.

Obviously, routers which are border routers for a K level area are also either border or interior routers in a K+1 level area; if interior, they will need only the 1 DMF, but if they are border routers at that level as well they will need DMF's for all objects in that K+1 level area which they are not constituents of.

Even if the physical topology is a mesh of routers at the same level, all routers must be directly connected by flows, if you want to route packets through a flow without hop-by-hop relaying.

No. Let's do the calculation; I'm tired of these incorrect assertions.

Let's talk about an area which has a total of B border routers, and I interior objects. A percentage P of the objects are themselves sub-areas, so there are (1-P)*I interior non-sub-area objects, and P*I sub-areas. For the sub-areas, each has an average of S border routers. For the sake of simplicity, let's assume that *none* of the border routers is part of any sub-area. So, the resulting numbers are worst-case, since any border router which *is* part of a sub-area removes the need for a DMF to that sub-area; the packet is already in that sub-area.

Each of the B border routers has I DMF's associated with this area, for a total of B*I. Each of the interior routers has a single DMF. The non-area routers contribute (1-P)*I, and border routers of sub-areas contribute P*I*S.
So, the total number of DMF's, Ft, associated with that area is:

Ft = I * (B + (1-P) + P*S)

Exactly how fast this will grow as the area grows is thus somewhat tricky, and can't be computed without some assumptions about the relationship between B and I, etc. However, we can take a crack at it. We can pretty well assume that P is a constant, and we can assume either i) that S is a constant, since the area grows by creating more sub-areas, not making the existing sub-areas larger, or ii) that S grows at the same rate as B (since they both represent the count of border routers, just at different levels). Let's take the worst case, which is that O(B) = O(S). The growth rate of the total number of DMF's, O(Ft), is:

O(Ft) = O(I) * O((1 + P)*B)

or:

O(Ft) = O(I) * k*O(B)

I don't know the exact formula for the number of border nodes on a graph with a constant degree of connection between the nodes, but I'll hazard the guess that the number of border nodes will grow as the square root of the number of total nodes. (Handwave justification for this is that it's probably the same as the ratio of the circumference of a circle to the area of a circle, since you can model a graph with a fixed degree of connectivity among the nodes as a geometrical figure where each node takes a fixed area, and thus the area is proportional to the number of nodes.) So, that gives us that O(B) is O(sqrt(I)). So, finally we get that:

O(Ft) = k*O(I^1.5)

or just plain O(I^1.5) for short.

Now, that's for one level. Note that if we grow the system by adding more levels of area (a likely happening), rather than increasing I, then I is a constant, so O(I) = 1, so there is no growth in Ft at all!

Now, that was the total number of DMF's in the area. That's a relatively uninteresting number, for reasons which should be intuitively obvious to everyone. What's *important* are the number of DMF's which end in a router, and the number of DMF's which go through (i.e. require state for storage) interior routers.

Let's look at the number of DMF's which end in a router, Fe, first. We can ignore interior routers, since they are a simple case; Fe = 1. The hard case is the border routers. There, Fe = I, but that only counts DMF's for this area. If a router is a border router for a number of levels of area, it will have:

Fe = sum(Ii), i=l...m

where Ii is the value of I for the area at level i, and l and m are the bounds on the levels of area for which that router is a border router. In the worst case, l=0, and m=L, where L is the maximum number of levels in the system. So, for that worst case:

Fe = sum(Ii), i=0..L

If, for simplicity's sake, we assume that all Ii average I, then:

Fe = I*L

and:

O(Fe) = O(I) * O(L)

Exactly what O(I) and O(L) are remains to be seen, but we can grow the system by holding I constant, and growing L. As a matter of fact, if N is the total number of nodes in the system, and I is constant, then O(L) = logN. So, in this worst case scenario:

O(Fe) = O(logN)

if we don't grow areas, but instead grow the number of levels.

To turn to the average number of DMF's through a router, Fa, we can calculate that if we know the average path length for a DMF. The average number of DMF's is given by:

Fa = (total number of DMF's * average DMF length) / (total number of routers)

The average path length, A, for a graph of fixed degree (i.e.
one in which nodes have the same average number of arcs to neighbouring nodes, independent of the size of the graph) is logN (where N is the number of nodes). [Chen 86] This seems like a reasonable model to use, since physical routers have a small, relatively constant average number of interfaces.

We can use the number of interior objects (I), plus the number of border routers (B), as the number of nodes in calculating the path length. So, that gives us:

A = log(I + B)

However, if we assume that O(B) = O(sqrt(I)), as in the calculation of Ft above, then B tends to become irrelevant, and:

O(A) = O(logI)

We already deduced that the total number of flows, Ft, was:

Ft = I * (B + (1-P) + P*S)

and that:

O(Ft) = O(I^1.5)

What exactly to call the "total number of nodes" is a little tricky, and here's where I have to do some even more energetic handwaving than usual. Strictly speaking, we can't simply use the number of interior objects (I), plus the number of border routers (B). The problem is that for the cases where a DMF traverses a sub-area, it will traverse two border routers on that sub-area, as well as some interior routers in that sub-area. However, I will make the simplifying assumption that the "path length" calculation above already assumed that each sub-area counted as one "node", and errors above and below the division will cancel out. So, that gives us that the total number of routers, R, is:

R = (I + B)

and again, assuming that B becomes irrelevant:

O(R) = O(I)

So, since:

O(Fa) = (O(Ft) * O(A)) / O(R)

substituting in for O(Ft), etc., gives us that:

O(Fa) = (O(I^1.5) * O(logI)) / O(I)

or:

O(Fa) = O(sqrt(I) * logI)

Again, of course, this only counts the flows from one level. I could make some assumptions based on the idea that we will use an "interstate" physical topology, where long-distance connectivity is provided by a physical mesh at the high levels, not by using the "local" meshes down an arbitrarily large number of levels, but only a few levels down; but my head started to hurt at that point; let's just do the simple thing for now. If we assume the worst case, that we have flows from all levels L, giving us a new count of DMF's, Fat, then we make the above quantity worse by a factor of O(L), i.e.:

O(Fat) = O(sqrt(I) * logI) * O(L)

As stated above, exactly what O(I) and O(L) are remains to be seen, but again, we can grow the system by holding I constant, and growing L. Again, I is constant, and O(L) = logN, so, again in this worst case scenario:

O(Fat) = O(logN)

if we don't grow areas, but instead grow the number of levels. I suppose I could sit and think about what the relationship between N, L and I would be, if we decide to grow areas, and not the number of levels, but I don't think it's worth it. However you slice it, these are *not* major growth rates.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa18810; 9 Jan 94 16:34 EST
Received: from pizza by PIZZA.BBN.COM id aa18541; 9 Jan 94 16:19 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa18537; 9 Jan 94 16:18 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa18404; 9 Jan 94 16:18 EST
Received: by ginger.lcs.mit.edu id AA17165; Sun, 9 Jan 94 16:18:22 -0500
Date: Sun, 9 Jan 94 16:18:22 -0500
From: Noel Chiappa
Message-Id: <9401092118.AA17165@ginger.lcs.mit.edu>
To: nimrod-wg@BBN.COM
Subject: My math
Cc: jnc@ginger.lcs.mit.edu

I'll bet there are at least Z math errors in that last long message about the number of DMF's needed.
I just typed it in, in one fell swoop, and nobody has checked it, so I'd be amazed if there weren't any errors! If anyone sees any, I *am* interested in hearing about them.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa02480; 10 Jan 94 1:41 EST
Received: from pizza by PIZZA.BBN.COM id aa20062; 10 Jan 94 1:23 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa20058; 10 Jan 94 1:21 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa01788; 10 Jan 94 1:04 EST
Received: by ginger.lcs.mit.edu id AA19705; Mon, 10 Jan 94 01:04:07 -0500
Date: Mon, 10 Jan 94 01:04:07 -0500
From: Noel Chiappa
Message-Id: <9401100604.AA19705@ginger.lcs.mit.edu>
To: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
Subject: Re: My math

I'll bet there are at least Z math errors in that last long message about the number of DMF's needed.

Well, I just found one error, but it wasn't just math. In the calculation of the number of DMF's which end in a router, Fe, I counted the origins of DMF's, but not the terminations. Here's a replacement for that section of the note, with that error corrected. It turns out that the results are the same, except that if you want the *exact* value for O(Fe) there is a constant (1 + P) factor. Big deal.

Noel

--------

Let's look at the number of DMF's which end in a router, Fe, first. For interior routers, the count is the 1 outgoing DMF, plus incoming DMF's from the border routers. There are two subcases: for routers which are not border routers of sub-areas, all B border routers have a DMF to the router, so:

Fei = (1 + B)

For routers which are border routers of sub-areas, the incoming DMF's are shared among the border routers of the sub-area; assuming the distribution is even, this gives us:

Fes = (1 + B/S)

Note that in the worst case distribution, this is only as bad as the case above, so we can ignore this case.

The border routers are a slightly harder case. There are I outgoing DMF's, and the incoming DMF's from the interior routers are shared among all the border routers. Again, assuming an even distribution, this gives us:

Feb = (I + ((1-P)*I + P*I*S)/B)

or:

Feb = I * (1 + ((1-P) + P*S)/B)

Again, we can assume that P is a constant, and we can assume the worst case, which is that O(B) = O(S). This gives us that:

O(Feb) = O(I) * O(1 + (c1 + P*B)/B)

or:

O(Feb) = O(I) * O(1 + P*B/B)

or:

O(Feb) = (1 + P) * O(I)

This makes intuitive sense, since we know that the DMF's from interior routers which aren't border routers of sub-areas are shared among all border routers of that area, whereas each border router has to maintain a DMF in to each of those routers, so that term drops off in importance. On the other hand, since the number of interior routers which are border routers of sub-areas is growing as fast as the number of border nodes, that term will remain.

Since we know that O(B) = O(sqrt(I)), in the long run, as areas get large, the particular Fe which will experience the highest growth rate is Feb, i.e. the number of DMF's which end in a border router. This also makes good sense. That growth rate is just plain old O(I) for short.

The analysis above only counts DMF's for this area. If a router is a border router for a number of levels of area, it will have:

Febm = (1 + ((1-P) + P*S)/B) * sum(Ii), i=l...m

where Ii is the value of I for the area at level i, and l and m are the bounds on the levels of area for which that router is a border router. In the worst case, l=0, and m=L, where L is the maximum number of levels in the system.
So, for that worst case:

Febm = (1 + ((1-P) + P*S)/B) * sum(Ii), i=0..L

If, for simplicity's sake, we assume that all Ii average I, then:

Febm = (1 + ((1-P) + P*S)/B) * I*L

and, using the same analysis as above:

O(Febm) = O(I) * O(L)

Exactly what O(I) and O(L) are remains to be seen, but we can grow the system by holding I constant, and growing L. As a matter of fact, if N is the total number of nodes in the system, and I is constant, then O(L) = logN. So, in this worst case scenario:

O(Febm) = O(logN)

if we don't grow areas, but instead grow the number of levels.

Received: from PIZZA.BBN.COM by BBN.COM id aa03421; 10 Jan 94 12:27 EST
Received: from pizza by PIZZA.BBN.COM id aa22308; 10 Jan 94 12:12 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa22304; 10 Jan 94 12:09 EST
Received: from necom830.cc.titech.ac.jp by BBN.COM id aa02133; 10 Jan 94 12:08 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Tue, 11 Jan 94 02:03:54 +0859
From: Masataka Ohta
Return-Path:
Message-Id: <9401101704.AA23795@necom830.cc.titech.ac.jp>
Subject: Re: Analysis of DMF's in new datagram mode
To: Noel Chiappa
Date: Tue, 11 Jan 94 2:03:53 JST
Cc: jwagner@princeton.edu, jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
In-Reply-To: <9401092115.AA17145@ginger.lcs.mit.edu>; from "Noel Chiappa" at Jan 9, 94 4:15 pm
X-Mailer: ELM [version 2.3 PL11]

> For routers which are not border routers, they need one DMF out to a border
> router (i.e. up), and that's it. Traffic to other objects inside the area can
> be handled by sending it to the border router, which will send it back to the
> correct object. (Hey, it's not very optimal routing, but I'm talking about
> the *minimal* set, right?)

Wrong. There will be load concentration at border routers, then.

> What's *important* are the number of DMF's which end in a router,
> and the number of DMF's which go through (i.e. require state for storage)
> interior routers.

The trick here is that not all DMFs are equal. That is, a higher level DMF means a lot more traffic through it. That's why you can't increase the number of levels at will. Your model is not scalable.

> The average path length, A, for a graph of fixed degree (i.e. one in which
> nodes have the same average number of arcs to neighbouring nodes, independent
> of the size of the graph) is logN (where N is the number of nodes). [Chen 86]

Your reasoning is already broken... But...

While such an average over all the possible graphs will be so, as your topology is a mesh, you should average only over planar graphs. Thus, the average path length will be sqrt(N).

It should have been obvious from the beginning when you said "MESH".

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa05130; 10 Jan 94 12:57 EST
Received: from pizza by PIZZA.BBN.COM id aa22371; 10 Jan 94 12:34 EST
Received: from nic.near.net by PIZZA.BBN.COM id aa22367; 10 Jan 94 12:32 EST
Received: from GINGER.LCS.MIT.EDU by nic.near.net id aa08593; 10 Jan 94 12:33 EST
Received: by ginger.lcs.mit.edu id AA21215; Mon, 10 Jan 94 12:32:40 -0500
Date: Mon, 10 Jan 94 12:32:40 -0500
From: Noel Chiappa
Message-Id: <9401101732.AA21215@ginger.lcs.mit.edu>
To: jnc@ginger.lcs.mit.edu, tli@cisco.com
Subject: Re: New datagram mode
Cc: kasten@ftp.com, nimrod-wg@nic.near.net

> I'd hope that a future network, as an aid to robustness, gives priority
> to management, operations and routing traffic over normal user traffic.

this turns out to be very difficult to do today.
Classification of what's "important" is challenging, and can only happen after the packet is already in the box, consuming buffer memory.

Yeah, as processing of packets gets faster and faster, it gets harder and harder to do anything complicated while you have your hands on them!

I would expect that part of the solution to the conundrum you raise is the deployment of a "real" resource allocation and congestion control system at the internetwork layer, together with flows. For instance, in a fast hardware router (such as my "Faswitch" device), this would enable you to segregate different packet streams into separate buffer pools in hardware, partially avoiding the problem you mention.

A general strategy for handling this problem is to have a small "incoming" pool, and divide up the main pool into "transit" and "operation and maintenance". When you go congested, you only allow packets out of the incoming pool if there is space in the O&M pool. If not, transit packets get flushed as soon as they hit the box. The users won't like it, but, hey, tough, the network has to protect itself.

Of course, you can always think up peculiar traffic patterns that will make even this not work; e.g. an overload of O&M traffic. However, I don't think there's anything *fundamental* that says we can't offer priority to O&M traffic, and I think a robust design ought to make every effort to do so...

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa08039; 10 Jan 94 13:46 EST
Received: from pizza by PIZZA.BBN.COM id aa22578; 10 Jan 94 13:28 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa22574; 10 Jan 94 13:26 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa06734; 10 Jan 94 13:26 EST
Received: by ginger.lcs.mit.edu id AA21447; Mon, 10 Jan 94 13:25:45 -0500
Date: Mon, 10 Jan 94 13:25:45 -0500
From: Noel Chiappa
Message-Id: <9401101825.AA21447@ginger.lcs.mit.edu>
To: jnc@ginger.lcs.mit.edu, mohta@necom830.cc.titech.ac.jp
Subject: Re: New datagram mode
Cc: nimrod-wg@BBN.COM

> No. The term "active router" was defined in the original message as one
> that actually makes a decision about where to send the packet (as
> opposed to handling it as part of a flow).

That's a router. The other, which operates only in the data link layer, is not.

The other ones are *not* operating only in the data link layer. They throw away the physical network header, look at the internetwork header to figure out what to do with the packet, and create a new physical network header. Sure sounds like a "router" to me. The only difference is they don't get to make decisions about which path this particular packet stream will use. I.e. the selector is a flow-identifier, not a locator.

What, for example, is the cost of your imaginary DNS TNG?

The DNS is not involved in any way with the new datagram mode, other than providing the locator to the host before the packet is sent, *exactly* the way the DNS works now. Since this cost is currently deemed acceptable, I would assume it will be acceptable in the future. Perhaps I do not understand your question?

> I reckon the increased robustness is well worth it

The simpler, the more robust.

Excuse me while I roll on the floor laughing. A common technique of building engineers is to provide *redundant* load paths, so that if one fails, the building will not collapse. Redundancy is *not* simpler, but it is more robust. Your assertion is completely wrong.

Why did you object to source routing, then? Wasn't your reason that it will consume more bandwidth and CPU?
I don't like source routing because the costs of source routing must be paid on *each* packet, whereas the overhead of calculating and setting up DMF's can be amortized over many datagrams.

> this can provide an easy way to create resource limits on datagram
> traffic, to prevent datagram traffic interfering with the resources
> allocated to other traffic.

Perhaps your scheme is merely a scheme to handle connected datagrams only.

No, it is a scheme *precisely* to handle datagrams used in "one-shot" applications such as DNS lookups, etc.

So, your statement should be:

< the side-benefit that this can provide an easy way to create resource
< limits on connectionless traffic, to prevent connectionless traffic
< interfering with the resources allocated to connected traffic.

If I understand what you mean correctly, this says what I thought I was saying. I don't like to use the term "connections" because then everyone starts thinking I mean something like X.25, where critical state is in the switches. I think the problem is that we don't have good terminology for "datagrams which are part of ongoing data streams" (e.g. the IP datagrams which are part of a TCP connection), and "datagrams which are *not* part of ongoing data streams" (e.g. UDP DNS traffic). Anyone have any suggestions?

BTW, it is absurd to impose resource limits on connectionless communications. If the actual connected traffic is less than the pre-allocation, the extra bandwidth should be used for connectionless traffic.

In general, I think most people would like the extra bandwidth to be "up for grabs" among all clients, not just the datagram clients. I'm not saying that datagrams ought to have a resource limit. I'm just thinking of the practical difficulties in ensuring that datagram traffic does not interfere with resources allocated to flows. The "trick" of assigning datagrams to a flow, and then dividing up the bandwidth among flows, allows us a simpler bandwidth allocation mechanism, as a practical matter.

>> Representing an area with border routers is the mathematically exact
>> representation. The representation is the minimal necessary.

> No, it's not the minimal representation. It's the minimal *definition*.
> If we assign unique names to these definitions, the unique names (i.e.
> the representation) may still be considerably shorter than the
> definitions.

It's the minimal representation of an area.

I see. So the minimal representation of a book is the entire book?

Anything shorter needs external information such as a mapping between Area_ID and the EIDs of border routers. The lack of such a mapping is the fatal defect of your scheme.

I'm unclear as to exactly why this mapping is needed. We assign topology aggregates certain names, and use those names consistently i) when distributing topology information, and ii) when specifying routes. Everything is done in terms of those names. When is the mapping needed?

Your locator won't work against area subdivision.

I see. So I assume IS-IS won't work if a level 1 area is partitioned, since it uses arbitrary labels for level 1 areas as well?

I don't think a lot of hand configuration is possible for a system of this size.

I see. Well, I'm glad to know that the world telephone system doesn't use a lot of hand-configuration.

As I have written several times, the locator information of an object is accessible through DNS, which should not, cannot, and thus does not change *automatically*. DNS gives information on "all the possible paths".
As the topology changes, the set of "all possible paths" will change. If the information in the DNS is bound so tightly to the actual topology (by including the EID's of all the border routers), sooner or later the DNS will have to be updated as the topology changes. I don't think this is a good idea. However, we obviously disagree.

if you use shorthanded area IDs, if an area is subdivided, you must change objects' locators *automatically*. That is, such a scheme is NOT controllable.

Not necessarily. There are other techniques for handling partition, as shown by IS-IS.

> Overhead of a flow set up by a host on an end-end basis for a datagram
> is all borne by that single datagram, whereas the overhead for path
> calculation, setup, etc. for a DMF between two routers is shared among
> all the datagrams which go from one router to another.

Thus, you must assume all the routers are connected, here, which is O(N^2).

No, I am *not* assuming all the routers are connected. In fact, there are *far* fewer DMF's than full connectivity (which would imply non-hierarchical routing). This does produce non-optimal routes, but as Kleinrock and Kamoun showed, you can use hierarchical routing, and get reasonably good routing, at a vast reduction in overhead.

This scheme has the *added* tweak that you can increase the number of DMF's above the theoretical minimum to get better routes, but in a way which can be locally controlled, so the users get to make the extra-overhead/better-routes tradeoff decision. However, in practice you will get into diminishing returns fairly quickly (again, see Kleinrock and Kamoun), so the number of additional DMF's in the actual network will be fairly small.

> I would have thought that the difference, and thus the advantage of DMF's
> for datagrams, would have been obvious.

One reason for your confusion is that you think of connected datagrams only.

I am not thinking of "datagrams which are part of ongoing traffic streams" only. This *entire* scheme is *precisely* for those datagrams which are *not* part of such ongoing streams.

The problem is that you wrongly think there is some "real flow" with connectionless communication.

No, I am showing a way to move pure datagram traffic along predefined paths.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa15519; 10 Jan 94 15:25 EST
Received: from pizza by PIZZA.BBN.COM id aa23180; 10 Jan 94 15:02 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa23176; 10 Jan 94 14:58 EST
Received: from necom830.cc.titech.ac.jp by BBN.COM id aa13404; 10 Jan 94 14:55 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Tue, 11 Jan 94 04:51:17 +0900
From: Masataka Ohta
Return-Path:
Message-Id: <9401101951.AA24361@necom830.cc.titech.ac.jp>
Subject: Re: New datagram mode
To: Noel Chiappa
Date: Tue, 11 Jan 94 4:51:15 JST
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
In-Reply-To: <9401101825.AA21447@ginger.lcs.mit.edu>; from "Noel Chiappa" at Jan 10, 94 1:25 pm
X-Mailer: ELM [version 2.3 PL11]

> > No. The term "active router" was defined in the original message as one
> > that actually makes a decision about where to send the packet (as
> > opposed to handling it as part of a flow).
>
> That's a router. The other, which operates only in the data link layer, is not.
>
> The other ones are *not* operating only in the data link layer. They throw
> away the physical network header, look at the internetwork header to figure
> out what to do with the packet, and create a new physical network header.
> Sure sounds like a "router" to me.
> The only difference is they don't get to
> make decisions about which path this particular packet stream will use. I.e.
> the selector is a flow-identifier, not a locator.

Things that operate on flow identifiers, which do not actively change a packet's route, are not routers. You should consult people in the phone company. They say something like "C-plane" and will explain why a flow-based forwarder is not a router.

> What, for example, is the cost of your imaginary DNS TNG?
>
> The DNS is not involved in any way with the new datagram mode, other than
> providing the locator to the host before the packet is sent, *exactly* the way
> the DNS works now. Since this cost is currently deemed acceptable, I would
> assume it will be acceptable in the future. Perhaps I do not understand your
> question?

As your locator is the result of local coordination, I don't think it can be static.

> > I reckon the increased robustness is well worth it
>
> The simpler, the more robust.
>
> Excuse me while I roll on the floor laughing.

I might have wrongly assumed that you have the sense of a good engineer, able to detect the point of diminishing returns. The reasons for simple engineering still stand.

> Why did you object to source routing, then? Wasn't your reason that it will
> consume more bandwidth and CPU?
>
> I don't like source routing because the costs of source routing must be paid
> on *each* packet, whereas the overhead of calculating and setting up DMF's
> can be amortized over many datagrams.

You want to forward packets with locators using several masks, which means several table lookups, which is slow. That is the cost to be paid on each packet at each router. You also use flow lookups several times, which is even slower, paid at some routers. And the flow setup is the worst.

So, I think we should forward packets with a FULL EID match without any mask, which means only one (hashed) table lookup. That is the cost to be paid on each packet at each router.

Which, do you think, is faster?

> can be amortized over many datagrams.

If there are many datagrams along the setup path.

> If I understand what you mean correctly, this says what I thought I was
> saying. I don't like to use the term "connections" because then everyone
> starts thinking I mean something like X.25, where critical state is in the
> switches.
>
> I think the problem is that we don't have good terminology for "datagrams
> which are part of ongoing data streams" (e.g. the IP datagrams which are part
> of a TCP connection), and "datagrams which are *not* part of ongoing data
> streams" (e.g. UDP DNS traffic). Anyone have any suggestions?

"Network layer, end-end connection" and "datalink layer, along-the-path connection" should be the proper distinction.

> BTW, it is absurd to impose resource limits on connectionless
> communications. If the actual connected traffic is less than
> the pre-allocation, the extra bandwidth should be used for connectionless
> traffic.
>
> In general, I think most people would like the extra bandwidth to be "up for
> grabs" among all clients, not just the datagram clients. I'm not saying that
> datagrams ought to have a resource limit. I'm just thinking of the practical
> difficulties in ensuring that datagram traffic does not interfere with
> resources allocated to flows.

The proper terminology here is "best effort". Datagrams with a QoS of "best effort" should be buffered upon contention and should be dropped upon buffer overflow.
For reasonable dropping of various datagrams, multiple buffering, as you mentioned, could be useful.

> The "trick" of assigning datagrams to a flow, and
> then dividing up the bandwidth among flows, allows us a simpler bandwidth
> allocation mechanism, as a practical matter.

With your scheme, a datagram travels through, in general, several flows with variable bandwidth. So, datagrams will be dropped at narrowing points of the path, anyway.

> It's the minimal representation of an area.
>
> I see. So the minimal representation of a book is the entire book?

No, I can eliminate a lot of lines from your mail to get the minimal representation without changing the meaning.

> As I have written several times, the locator information of an object is
> accessible through DNS, which should not, cannot, and thus does not change
> *automatically*. DNS gives information on "all the possible paths".
>
> As the topology changes, the set of "all possible paths" will change. If the
> information in the DNS is bound so tightly to the actual topology (by including
> the EID's of all the border routers), sooner or later the DNS will have to be
> updated as the topology changes. I don't think this is a good idea. However,
> we obviously disagree.

If an address changes, the DNS should change, of course.

> if you use shorthanded area IDs, if an area is subdivided, you must change
> objects' locators *automatically*. That is, such a scheme is NOT
> controllable.
>
> Not necessarily. There are other techniques for handling partition, as shown
> by IS-IS.

That is, the structured EID, NSAP.

> but as Kleinrock and Kamoun

What's that?

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa20810; 10 Jan 94 23:38 EST
Received: from pizza by PIZZA.BBN.COM id aa25419; 10 Jan 94 23:26 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa25415; 10 Jan 94 23:24 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa13250; 10 Jan 94 23:24 EST
Received: by ginger.lcs.mit.edu id AA26598; Mon, 10 Jan 94 23:24:46 -0500
Date: Mon, 10 Jan 94 23:24:46 -0500
From: Noel Chiappa
Message-Id: <9401110424.AA26598@ginger.lcs.mit.edu>
To: jcurran@nic.near.net, jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
Subject: Re: New datagram mode
Cc: jnc@ginger.lcs.mit.edu

>> This congestion would remain even after the source had ceased
>> transmission

> Yes and no. If a certain amount of bandwidth were allocated to
> datagrams, presumably the loop would cause there to be more offered load
> than capacity. This would cause packets to be dropped (via whatever drop
> algorithm), so over time these packets would in all probability decay,
> so I don't think they'd be there forever.

The looping packets (and non-loop-destined packets which just happened to be at the wrong place at the wrong time) would be dropped, but only until the load requirements were met. At such time, the system would be in equilibrium: any additional traffic is almost assured of pushing the link utilization up, and thereby the drop rate. This will act as a resilient barrier to link utilization: given 60% link utilization due to looping datagrams, a sudden "real traffic" burst of similar load would result in 20% link over-utilization (and thereby 20% loss for both data streams). Now, when the valid traffic ends, there is a net 50% link utilization due to looping traffic (i.e. after accounting for those dropped during the load.) It may take a while before the effects of the loop have subsided, particularly since a looping traffic flow is almost always going to start at 100% link utilization.
True...

Another colorful side-effect is that network engineering becomes more difficult; routing difficulties have second-order effects on packet loss, and hence make problem diagnosis and capacity planning so much fun...

Yup. All good arguments as to why a hop count is still useful. I guess what it really comes down to is "how common are loops going to be, and how much will they cost if we don't have a hop count", versus "how much is the hop count going to cost us". If you expect loops to be very, very rare, it might make sense to drop the hop count.

Mind, I'm not sure I want to get rid of the hop count; I still like the idea of having two completely redundant mechanisms to deal with the issue of loops. I guess the way to look at it is that it is useful to sit back and examine "obvious" preconceptions like "there will be a hop count", to see if in fact it really is obvious. Even if you decide to retain the "obvious" mechanism, it's nice to know you really do need it....

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa26630; 11 Jan 94 14:51 EST
Received: from pizza by PIZZA.BBN.COM id aa28675; 11 Jan 94 14:35 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa28671; 11 Jan 94 14:31 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa25140; 11 Jan 94 14:28 EST
Received: by ginger.lcs.mit.edu id AA02738; Tue, 11 Jan 94 14:28:34 -0500
Date: Tue, 11 Jan 94 14:28:34 -0500
From: Noel Chiappa
Message-Id: <9401111928.AA02738@ginger.lcs.mit.edu>
To: nimrod-wg@BBN.COM
Subject: Robustness
Cc: jnc@ginger.lcs.mit.edu

Martha and I were just talking about robustness, and how hard it is to quantify. It's a different issue from *proving* programs correct (a field which I know has received a lot of attention), because a truly robust system should function correctly even in the face of incorrect engineering and coding in components, as well as unforeseen failure modes. Program proving is clearly not much help with these.

One thing we decided would be useful is to have a file of system failures in communication networks, to see what we can learn from them. I can think of three offhand:

- The ARPANet failure caused by the IMP memory failure which caused three updates 120 degrees apart in the sequence space.
- The ATT failure where a timing glitch provoked a bug.
- The recent NSFnet failure where the Ethernet card would not accept transit packets.

There are valuable lessons to be learned here. For instance, the latter one tells us we might want to have our protocol check not just connectivity to neighbour routers, but *through* neighbour routers to routers one hop away.

So, can people please send in descriptions of system failures that they know of? I'll put them into an organized file. Send the complete description to me only, and a one-liner to the whole mailing list so I don't get N descriptions of the same failure. (I've forgotten the details of the ATT bug, so I could use a description of that one.)
Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa28459; 11 Jan 94 15:24 EST
Received: from pizza by PIZZA.BBN.COM id aa28955; 11 Jan 94 15:08 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa28951; 11 Jan 94 15:06 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa27205; 11 Jan 94 15:02 EST
Received: by ginger.lcs.mit.edu id AA04026; Tue, 11 Jan 94 15:01:45 -0500
Date: Tue, 11 Jan 94 15:01:45 -0500
From: Noel Chiappa
Message-Id: <9401112001.AA04026@ginger.lcs.mit.edu>
To: jnc@ginger.lcs.mit.edu, mohta@necom830.cc.titech.ac.jp
Subject: Re: Analysis of DMF's in new datagram mode
Cc: nimrod-wg@BBN.COM

> For routers which are not border routers, they need one DMF out to a
> border router (i.e. up), and that's it. Traffic to other objects inside
> the area can be handled by sending it to the border router, which will
> send it back to the correct object.

There will be load concentration at border routers, then.

I was describing the minimal functional configuration. Interior routers will be free to augment their set of DMF's to produce more optimal routing. I hadn't included this option in my analysis, to keep the analysis reasonably simple, but it will not produce an unacceptable number of DMF's.

An interior router which had an instantiated DMF (not just a potential DMF) to every other object in the area would still have only as many DMF's ending at it as a border router of the area, and that case has been analyzed as O(I), which is reasonable. If every interior router had an instantiated DMF to all the objects in the area, the number of flows through each router in the area would be O(I log I), which is a little worse, but not terribly so, since I is unlikely to grow large; we won't have enormous areas. In addition, I doubt we will see full mesh connectivity; traffic X-Y graphs always show hot-spots, not an even distribution, at any scale.

> What's *important* are the number of DMF's which end in a router, and
> the number of DMF's which go through (i.e. require state for storage)
> interior routers.

Not all DMF's are equal. That is, a higher level DMF means a lot more traffic through it. That's why you can't increase the number of levels at will. Your model is not scalable.

This is an interesting point, but it impacts a lot more than just DMF's. If top level links don't have enough bandwidth to handle the traffic, this is going to be true whether you use DMF's for datagram traffic, or hop-by-hop, or whether the user traffic is all in end-end flows!

To grow the network, we will have to do one of two things. Either the top level links (i.e. the express highways) will have to have more bandwidth, in which case this is not a problem, or we are going to have to distribute the traffic over a number of smaller, parallel, links. Since this increases the number of arcs in the graph, *not* the number of nodes, it doesn't impact the growth of DMF's either. I reckon the second is better (since it uses parallel, less expensive technology to get the performance), but I'm not sure which we will see.

One other thing to note is that *all* routing algorithms have limits on the size of the graph over which you can run them in practice. To build a larger network, you need to introduce levels, and to scale to arbitrary sizes, you need an arbitrary number of levels.

> The average path length, A, for a graph of fixed degree (i.e. one in
> which nodes have the same average number of arcs to neighbouring nodes,
> independent of the size of the graph) is log N (where N is the number of
> nodes). [Chen 86]
While such an average over all possible graphs may be so, since your topology is a mesh, you should average only over planar graphs. Thus, the average path length will be sqrt(N).

This doesn't make sense to me. Planar graphs are not a good model for the connectivity of the network. For instance, in a planar graph, you cannot have full interconnection between 5 nodes (this is similar to the famous "three utilities and three houses" problem, if anyone wants to play with a piece of paper to verify this). However, I don't think that the real network will display such constraints. So, planar graphs are not the applicable area of graph theory, but rather normal graphs, and there it is log(N).

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa02550; 11 Jan 94 16:14 EST
Received: from pizza by PIZZA.BBN.COM id aa29263; 11 Jan 94 15:53 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa29259; 11 Jan 94 15:52 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa00476; 11 Jan 94 15:44 EST
Received: by ginger.lcs.mit.edu id AA05198; Tue, 11 Jan 94 15:44:23 -0500
Date: Tue, 11 Jan 94 15:44:23 -0500
From: Noel Chiappa
Message-Id: <9401112044.AA05198@ginger.lcs.mit.edu>
To: jnc@ginger.lcs.mit.edu, mohta@necom830.cc.titech.ac.jp
Subject: Re: New datagram mode
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM

> The other ones are *not* operating only in the data link layer. They
> throw away the physical network header, look at the internetwork header
> to figure out what to do with the packet, and create a new physical
> network header. ... The only difference is they don't get to make
> decisions about which path this particular packet stream will use. I.e.
> the selector is a flow-identifier, not a locator.

A thing which operates on flow-identifiers, and does not actively change a packet's route, is not a router. You should consult people in the phone company. They say something like "C-plane", and will explain why a flow based forwarder is not a router.

I reckon that most of us are happy calling devices which forward packets based on the contents of the internetwork layer header a "router". It's just a name, so it doesn't really matter that much.

Remember also that a forwarding device which is forwarding some packets based on their flow id's may also be forwarding *other* packets by looking at their locators. Also, all have to contain flow-setup code, etc., so it's not like you can separate out one group of devices and say "these don't need to contain code for this function". They all have to be functionally identical, even if they don't operate identically on all packets.

As your locator is the result of local coordination, I don't think it can be static.

You are correct, it is not; as I have mentioned before, as the topology changes we will want to change the abstraction hierarchy to match. However, I want that change in the abstraction hierarchy to be controllable, not fully and unavoidably automatic.

>> Why did you object to source routing, then? Wasn't your reason that it will
>> consume more bandwidth and CPU?

> I don't like source routing because the costs of source routing must be
> paid on *each* packet, whereas the overhead of calculating and
> setting up DMF's can be amortized over many datagrams.

You want to forward packets with locators using several masks, which means several table lookups, which is slow. That is the cost to be paid on each packet at each router.

Depending on what locators look like, there are not necessarily any masks involved.
If the locators have syntax of the form A.B...P.Q, then the routing table can be *logically* structured as a tree, with either a multi-way branch or a routing entry leaf at each "."; locating the correct branch in a tree is obviously less efficient, but it can be sped up by hashing techniques, etc. Remember, this cost is not paid at every "router" (my definition), only at "active routers" (again, my definition).

You also use flow lookup several times, which is even slower, paid at some routers.

Lookup of a shortish, fixed-length quantity at a fixed offset has to be the easiest single operation, whether it is done in hardware or software. I don't understand why you think it will be slow.

And the flow setup is the worst.

But it is only performed once (so it does not even add any delay), and the cost is shared between any number of packets.

So, I think we should forward packets with FULL EID match without any mask, which means only one (hashed) table lookup. That is the cost to be paid on each packet at each router. Which, do you think, is faster?

How is looking up one of a number of EID's (since your concept is to use a source route consisting of a list of router EID's) any cheaper than looking up a flow-id? Remember, most routers will only be doing a flow-lookup in this scheme, not looking at the locator.

> can be amortized over many datagrams.

If there are many datagrams along the setup path.

That's the whole point of having the minimal set of DMF's necessary to do pure hierarchical routing, augmented *as necessary* where the amount of traffic justifies the cost of extra DMF's. You only go beyond the minimal set (which has been shown to be quite small) if there *are* many datagrams; i.e. if the actual traffic justifies it. Note also that if you think a DMF is unlikely to have any traffic across it, set it up on demand, not in advance. That way, the only DMF's that get set up are the ones that get used.

The proper terminology here is "best effort".

Yes, I'd forgotten that term.

Datagrams with QoS of "best effort" should be buffered upon contention and should be dropped upon buffer overflow.

> I'm just thinking of the practical difficulties in ensuring that
> datagram traffic does not interfere with resources allocated to flows.
> The "trick" of assigning datagrams to a flow, and then dividing up the
> bandwidth among flows, allows us a simpler bandwidth allocation
> mechanism, as a practical matter.

With your scheme, a datagram travels through, in general, several flows with variable bandwidth. So, datagrams will be dropped at narrowing points of the path, anyway.

I must have missed something. If we do what you suggest, and buffer "best effort" datagrams, won't they be dropped at precisely the same points on congestion? The actual effects of the DMF scheme ought to be the same; it's just a single uniform mechanism for all packets, rather than one for flows and another for datagrams.

> There are other techniques for handling partition, as shown by IS-IS.

That is, the structured EID, NSAP.

You seem to have a private definition for "EID" that the rest of us do not share. To us, an *EID has no structure*. So, an NSAP is not an EID, although it does name *approximately* the same class of things as an EID does.

> but as Kleinrock and Kamoun

What's that?

Leonard Kleinrock and Farouk Kamoun, "Hierarchical Routing for Large Networks: Performance Evaluation and Optimization", Computer Networks 1, North-Holland, 1977, pp. 155-174.
It's the fundamental work on hierarchical routing; it mathematically quantifies how inefficient routing becomes when hierarchies are used, etc.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa11610; 12 Jan 94 0:18 EST
Received: from pizza by PIZZA.BBN.COM id aa01425; 12 Jan 94 0:05 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa01421; 12 Jan 94 0:02 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa11197; 12 Jan 94 0:03 EST
Received: by ginger.lcs.mit.edu id AA08589; Wed, 12 Jan 94 00:03:22 -0500
Date: Wed, 12 Jan 94 00:03:22 -0500
From: Noel Chiappa
Message-Id: <9401120503.AA08589@ginger.lcs.mit.edu>
To: nimrod-wg@BBN.COM
Subject: Architecture document draft outline
Cc: jnc@ginger.lcs.mit.edu

As discussed at the last IETF, the BBN crew are going to try and crank out a draft "architectural outline" document for Nimrod. Here's a draft outline; any comments, etc. are welcome.

Noel

--------

Nimrod Architecture

1. Introduction and Overview
   - The current Internet
   - Assumptions about the future Internet
   - Why we need a new routing and addressing architecture
   - A brief overview of this new architecture
   - How the new architecture can be introduced into the Internet

2. Internetwork Organization and Representation
   - Basic entities and how they are clustered
   - Hierarchical organization of clusters
   - Locators and identifiers
   - Representation of cluster attributes:
     - Maps: connectivity
     - Policies: offered services and restrictions on use
   - Abstraction

3. Routing and Addressing Functions
   - Entity and attribute discovery and configuration
   - Routing information (connectivity and policy) distribution
   - Route generation
   - Mapping between endpoint identifiers and locators
   - Flow setup
   - Packet forwarding

4. Auxiliary Functions
   - Network management
   - Security
   - Multicast
   - Mobility support

5. Deployment
   - How Nimrod fits with IP and with each of the new proposed Internet packet formats
   - Migrating to Nimrod routing and addressing
   - Router and host functionality required during transition to Nimrod

Received: from PIZZA.BBN.COM by BBN.COM id aa12156; 12 Jan 94 0:35 EST
Received: from pizza by PIZZA.BBN.COM id aa01491; 12 Jan 94 0:20 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa01487; 12 Jan 94 0:18 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa11621; 12 Jan 94 0:18 EST
Received: by ginger.lcs.mit.edu id AA08636; Wed, 12 Jan 94 00:18:33 -0500
Date: Wed, 12 Jan 94 00:18:33 -0500
From: Noel Chiappa
Message-Id: <9401120518.AA08636@ginger.lcs.mit.edu>
To: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
Subject: Re: Architecture document draft outline

Here are my suggestions:

In "1. Introduction and Overview", after "Why we need a new routing and addressing architecture", insert sections about:

- Design philosophy of the new architecture (which would talk about maximizing the system lifetime, making algorithms local wherever possible, minimizing the size and complexity of the "common core" system, robustness, and stuff like that)
- Functional goals of the new architecture (which would talk about the capabilities we want to provide, such as mechanisms for flexible abstraction, an efficient datagram mode, interacting well with new resource allocation mechanisms, etc.)
Routing and Addressing Functions" into three sections (or subsections), one on "Topology discovery and distribution", one on "Route Generation", and one on "User traffic handling", to emphasized that these are three separate functional subsystems, only two of which (the first and last) are "system-wide". Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa19409; 12 Jan 94 12:23 EST Received: from pizza by PIZZA.BBN.COM id aa04217; 12 Jan 94 12:04 EST Received: from BBN.COM by PIZZA.BBN.COM id aa04213; 12 Jan 94 12:00 EST Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa17663; 12 Jan 94 11:56 EST Received: by ginger.lcs.mit.edu id AA13725; Wed, 12 Jan 94 11:56:38 -0500 Date: Wed, 12 Jan 94 11:56:38 -0500 From: Noel Chiappa Message-Id: <9401121656.AA13725@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Analysis of DMF's in new datagram mode Cc: jnc@ginger.lcs.mit.edu I don't know the exact formula for the number of border nodes on a graph with a constant degree of connection between the nodes, but I'll hazard the guess that the number of border nodes will grow as the square root of the number of total nodes. (Handwave justification for this is that it's probably the same as the ratio of the circumference of a circle to the area of a circle, since you can model a graph with a fixed degree of connectivity among the nodes as a geometrical figure where each node takes a fixed area, and thus the area is proportional to the number of nodes.) I've just realized I may have been off-base here. This is probably a good model for planar graphs, but perhaps not for arbitrary graphs. If so, and a three-dimensional analogy is needed, that would mean the correct geometrical analogy is one where each nodes takes a fixed volume, and the number of border nodes is proportional to the surface area of the cube. That would make the number of border nodes grow as the cube root of the number of total nodes. On the other hand, you can represent any graph in two dimensions (if you allow the arcs to cross), so perhaps my original thought is correct. Does anyone know the correct answer? (I've ordered up some graph theory books to augment my meagre stock of works on the topic, but what I have doesn't give this one...) Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa25437; 12 Jan 94 14:03 EST Received: from pizza by PIZZA.BBN.COM id aa04688; 12 Jan 94 13:36 EST Received: from BBN.COM by PIZZA.BBN.COM id aa04684; 12 Jan 94 13:34 EST Received: from babyoil.ftp.com by BBN.COM id aa23091; 12 Jan 94 13:27 EST Received: from tri-flow.ftp.com by ftp.com with SMTP id AA00236; Wed, 12 Jan 94 13:27:20 -0500 Received: by tri-flow.ftp.com.ftp.com (5.0/SMI-SVR4) id AA25845; Wed, 12 Jan 94 13:27:28 EST Date: Wed, 12 Jan 94 13:27:28 EST Message-Id: <9401121827.AA25845@tri-flow.ftp.com.ftp.com> To: jnc@ginger.lcs.mit.edu Subject: Re: Architecture document draft outline From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: nimrod-wg@BBN.COM Content-Length: 3291 > 1. Introduction and Overview > - The current Internet > - Assumptions about the future Internet > - Why we need a new routing and addressing architecture Move the following two someplace else.... > - A brief overview of this new architecture > - How the new architecture can be introduced into the Internet 2. Requirements What the requirements are that the new architecture is supposed to satisfy, various design points, what things it will not do, what limits the proposed architecture has, and so on. 3. 
3. Overview
   - A brief overview of this new architecture
   - How the new architecture can be introduced into the Internet (or get rid of this bullet, leaving this all in the later chapter on deployment)

This chapter should be pretty much limited to description of the various components of the Nimrod Architecture, and descriptions of the relationships between those components. Which is what it looks like you do.

> 2. Internetwork Organization and Representation
>    - Basic entities and how they are clustered
>    - Hierarchical organization of clusters
>    - Locators and identifiers
>    - Representation of cluster attributes:
>      - Maps: connectivity
>      - Policies: offered services and restrictions on use
>    - Abstraction

The following three (I suggest splitting routing/addressing and forwarding into two separate chapters) discuss functions and operations.

> 3. Routing and Addressing Functions
>    - Entity and attribute discovery and configuration
>    - Routing information (connectivity and policy) distribution
>    - Route generation
>    - Mapping between endpoint identifiers and locators

Put these in a separate chapter. They are conceptually separate elements; they ought to be in a separate chapter, stressing that separation. Given the volume of discussion that's gone on about the datagram mode, there might be a fair amount to say. Note also that in this chapter you should clearly state your reasons for not supporting a "classic hop-by-hop" forwarding model, a la IPv4.

>    - Flow setup
>    - Packet forwarding
     - datagram mode

> 4. Auxiliary Functions
>    - Network management
>    - Security

What are your requirements for security?

>    - Multicast
>    - Mobility support

What about robustness? (You posted on it yesterday.) Byzantine failure protection (you really wanted this as a part of the IPng requirements that Partridge and I did a while ago...)

> 5. Deployment
>    - How Nimrod fits with IP and with each of the new proposed Internet packet formats
>    - Migrating to Nimrod routing and addressing
>    - Router and host functionality required during transition to Nimrod

Finally, in general, please put the "whys" in the document as well as the "whats" -- it is important to understand why certain things are done or are not done. Also, as the document changes while being written and reviewed, I'd suggest that a log be kept in the document of what changes were made and why -- it helps to avoid going over old ground unnecessarily. I've done this in some drafts that I've written and it's proved to be tremendously useful.

--
Frank Kastenholz
FTP Software
2 High Street
North Andover, Mass. USA 01845
(508)685-4000

Received: from PIZZA.BBN.COM by BBN.COM id aa12180; 12 Jan 94 19:34 EST
Received: from pizza by PIZZA.BBN.COM id aa06785; 12 Jan 94 19:19 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa06781; 12 Jan 94 19:17 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa11664; 12 Jan 94 19:16 EST
Received: by ginger.lcs.mit.edu id AA19079; Wed, 12 Jan 94 19:16:25 -0500
Date: Wed, 12 Jan 94 19:16:25 -0500
From: Noel Chiappa
Message-Id: <9401130016.AA19079@ginger.lcs.mit.edu>
To: nimrod-wg@BBN.COM
Subject: Arch doc, packet functionality
Cc: jnc@ginger.lcs.mit.edu

One other topic we might talk about in this document is what functionality we'd like to see in an inter-router packet format. Right at the moment, I'm not thrilled with any of the existing off-the-shelf alternatives. This section would serve two functions.
First, it would be a guide to people working on internetwork packet formats as to what functionality the Nimrod stuff would like to see. Second, should we decide to do a specialized one inside Nimrod, this list would basically translate directly into a packet format in what I hope (famous last words :-) would be a simple process. (I know, I know, "amateurs design packet formats, professionals etc.")

To that end, when we discuss various functional items (e.g. flow-id, hop-count, etc.), we should give an idea of the minimum size that we think fits our needs. Giving the maximum that we think is useful/cost-efficient probably wouldn't be a bad idea either.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa15197; 12 Jan 94 21:25 EST
Received: from pizza by PIZZA.BBN.COM id aa07136; 12 Jan 94 21:11 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa07132; 12 Jan 94 21:09 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa14851; 12 Jan 94 21:09 EST
Received: by ginger.lcs.mit.edu id AA19383; Wed, 12 Jan 94 21:09:06 -0500
Date: Wed, 12 Jan 94 21:09:06 -0500
From: Noel Chiappa
Message-Id: <9401130209.AA19383@ginger.lcs.mit.edu>
To: kasten@ftp.com, nimrod-wg@BBN.COM
Subject: Re: Architecture document draft outline
Cc: jnc@ginger.lcs.mit.edu

The following three (I suggest splitting routing/addressing and forwarding into two separate chapters) discuss functions and operations.

> - Route generation

Actually, now that I think about it, I'd almost prefer to leave this section out of the architecture spec altogether, at least in any detail. A discussion of how they are local, and perhaps a general discussion of strategies, but nothing more. Route generation algorithms are not part of the Nimrod spec, so I don't think it's appropriate to include them in an architecture specification.

I know what you'll all say; without feasible, worked examples, people will say the architecture "won't work". So? First, there are *plenty* of other critical algorithms (such as incoming and outgoing abstraction control) which aren't handled either. Second, if we have some good ideas on route generation, fine, we can make them a separate RFC. Route generation is not part of the core architecture; I would very strongly prefer to leave it out of the architecture document.

Given the volume of discussion that's gone on about the datagram mode, there might be a fair amount to say.

Say what? Except for the extended traffic from Masataka Ohta, and that message from you, I haven't seen that much... maybe you're all so stunned with the perfection and necessity of it all, you've nothing to say, but somehow I doubt it! :-)

Note also that in this chapter you should clearly state your reasons for not supporting a "classic hop-by-hop" forwarding model, a la IPv4.

Yup; good point.

What about robustness? (You posted on it yesterday.) Byzantine failure protection (you really wanted this as a part of the IPng requirements that Partridge and I did a while ago...)

Yah, I was getting ready to say something about this in a note to the IETF as a whole, in response to the NSFNet problems with the failed Ethernet card. I think we have to have protection throughout the network against active hostile attack. This will not serve to protect against *all* unforeseen bugs (time and nature seem far more clever at finding holes than puny human intelligence), but it will help.

Finally, in general, please put the "whys" in the document as well as the "whats" -- it is important to understand why certain things are done or are not done.
Yes, and the architecture document is a good place for this. My only worry is that the resulting tome would be too long for mere mortals. Maybe we could have two versions, one with, and one without, "DISCUSSION" sections? A simple matter of changing text-processor macro definitions...

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa05380; 13 Jan 94 8:35 EST
Received: from pizza by PIZZA.BBN.COM id aa09049; 13 Jan 94 8:19 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa09045; 13 Jan 94 8:16 EST
Received: from necom830.cc.titech.ac.jp by BBN.COM id aa04609; 13 Jan 94 8:15 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Thu, 13 Jan 94 22:10:16 +0900
From: Masataka Ohta
Return-Path:
Message-Id: <9401131310.AA08203@necom830.cc.titech.ac.jp>
Subject: Re: Analysis of DMF's in new datagram mode
To: Noel Chiappa
Date: Thu, 13 Jan 94 22:10:15 JST
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
In-Reply-To: <9401112001.AA04026@ginger.lcs.mit.edu>; from "Noel Chiappa" at Jan 11, 94 3:01 pm
X-Mailer: ELM [version 2.3 PL11]

> > For routers which are not border routers, they need one DMF out to a
> > border router (i.e. up), and that's it. Traffic to other objects inside
> > the area can be handled by sending it to the border router, which will
> > send it back to the correct object.
>
> There will be load concentration at border routers, then.
>
> I was describing the minimal functional configuration.

Because of the load concentration, I don't think your configuration functions.

> An interior router which had an instantiated DMF (not just a potential DMF) to
> every other object in the area would still have only as many DMF's ending at
> it as a border router of the area, and that case has been analyzed as O(I),
> which is reasonable.

Your configuration is wrong as to the configuration within an area, of course. But your configuration is also wrong as to the configuration of the area hierarchy. The number of levels must be limited.

> If every interior router had an instantiated DMF to all the objects in the
> area, the number of flows through each router in the area would be O(I log I),
> which is a little worse, but not terribly so, since I is unlikely to grow
> large; we won't have enormous areas.

Because of planarity, it is O(I^1.5), where I is not constant.

> In addition, I doubt we will see full
> mesh connectivity; traffic X-Y graphs always show hot-spots, not an even
> distribution, at any scale.

Hot spots make the load concentration worse.

> > What's *important* are the number of DMF's which end in a router, and
> > the number of DMF's which go through (i.e. require state for storage)
> > interior routers.
>
> Not all DMF's are equal. That is, a higher level DMF means a lot more traffic
> through it. That's why you can't increase the number of levels at will.
> Your model is not scalable.
>
> This is an interesting point, but it impacts a lot more than just DMF's. If
> top level links don't have enough bandwidth to handle the traffic, this is
> going to be true whether you use DMF's for datagram traffic, or hop-by-hop,
> or whether the user traffic is all in end-end flows!

Top level links MUST have enough bandwidth. That is, there should be a lot of second level areas and they should be connected with a lot of links. Your configuration, which assumes small areas, does not allow such a configuration.
> To grow the network, we will have to do one of two things. Either the top
> level links (i.e. the express highways) will have to have more bandwidth, in
> which case this is not a problem, or we are going to have to distribute the
> traffic over a number of smaller, parallel, links. Since this increases the
> number of arcs in the graph, *not* the number of nodes,

If you increase the number of links without increasing the number of nodes, load will concentrate not in links but on border routers.

> it doesn't impact the
> growth of DMF's either. I reckon the second is better (since it uses parallel,
> less expensive technology to get the performance), but I'm not sure which
> we will see.

We need a lot of top level areas with a lot of border routers.

> One other thing to note is that *all* routing algorithms have limits on
> the size of the graph over which you can run them in practice. To build a
> larger network, you need to introduce levels, and to scale to arbitrary
> sizes, you need an arbitrary number of levels.

Everything has its own limitation. Still, we can expect such limitations to scale as time goes by. For example, the allowable size of routing information is expected to scale as the link speed increases. So, though it is obvious that there should be levels, you don't have to assume the area size is constant.

> > The average path length, A, for a graph of fixed degree (i.e. one in
> > which nodes have the same average number of arcs to neighbouring nodes,
> > independent of the size of the graph) is log N (where N is the number of
> > nodes). [Chen 86]
>
> While such an average over all possible graphs may be so, since your
> topology is a mesh, you should average only over planar graphs. Thus,
> the average path length will be sqrt(N).
>
> This doesn't make sense to me. Planar graphs are not a good model for the
> connectivity of the network.

As the Earth's surface is planar, and as the routers are placed on the Earth, it is the model.

> For instance, in a planar graph, you cannot have
> full interconnection between 5 nodes

A small number of crossings does not matter. But if you allow arbitrary crossings, it means most links are lengthy.

> However, I don't think that the real network will
> display such constraints. So, planar graphs are not the applicable area of
> graph theory, but rather normal graphs, and there it is log(N).

Assume top level routers are distributed all over the surface of the Earth. Then how long, do you think, will the average link between connected nodes be? If you assume full randomness, the average is something of transcontinental scale, which is quite costly.

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa15508; 13 Jan 94 11:21 EST
Received: from pizza by PIZZA.BBN.COM id aa09956; 13 Jan 94 11:07 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa09952; 13 Jan 94 11:05 EST
Received: from ns.Novell.COM by BBN.COM id aa14065; 13 Jan 94 11:00 EST
Received: from WC.Novell.COM (optics.wc.novell.com) by ns.Novell.COM (4.1/SMI-4.1) id AA09951; Thu, 13 Jan 94 09:00:23 MST
Received: from [130.57.64.148] by WC.Novell.COM (4.1/SMI-4.1) id AA15978; Thu, 13 Jan 94 07:55:28 PST
Date: Thu, 13 Jan 94 07:55:27 PST
Message-Id: <9401131555.AA15978@WC.Novell.COM>
X-Sender: minshall@optics.wc.novell.com (Unverified)
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
To: Noel Chiappa
From: Greg Minshall
Subject: Re: New datagram mode
Cc: nimrod-wg@BBN.COM

Noel,

> The forwarding of these packets is, as already noted, quite
> efficient, and in non-active routers, is maximally efficient (perhaps more
> so than even standard hop-by-hop).
Out of curiosity, why are you thinking that DMF forwarding may possibly be more efficient than standard hop-by-hop? Because you are assuming that lots of destinations will be "multiplexed" over a given datagram flow?

Greg

Received: from PIZZA.BBN.COM by BBN.COM id aa17684; 13 Jan 94 11:54 EST
Received: from pizza by PIZZA.BBN.COM id aa10121; 13 Jan 94 11:36 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa10115; 13 Jan 94 11:34 EST
Received: from ftp.com by BBN.COM id aa16073; 13 Jan 94 11:30 EST
Received: from tri-flow.ftp.com by ftp.com with SMTP id AA21421; Thu, 13 Jan 94 11:30:59 -0500
Received: by tri-flow.ftp.com.ftp.com (5.0/SMI-SVR4) id AA03056; Thu, 13 Jan 94 11:31:03 EST
Date: Thu, 13 Jan 94 11:31:03 EST
Message-Id: <9401131631.AA03056@tri-flow.ftp.com.ftp.com>
To: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
Subject: Re: Architecture document draft outline
From: Frank Kastenholz
Reply-To: kasten@ftp.com
Content-Length: 3049

> I know what you'll all say; without feasible, worked examples, people will
> say the architecture "won't work". So? First, there are *plenty* of other
> critical algorithms (such as incoming and outgoing abstraction control) which
> aren't handled either. Second, if we have some good ideas on route generation,
> fine, we can make them a separate RFC. Route generation is not part of the
> core architecture; I would very strongly prefer to leave it out of the
> architecture document.

First, you might have to enumerate the requirements that Nimrod has on the algorithms, whatever they are. Second, you have to identify what the required algorithms and protocols are that are not being specified as a part of the architecture. Finally, by having samples you make it much easier for people to understand what is going on; perhaps these could simply be parts of examples, or appendices that present outlines of some sample algorithms.

> Given the volume of discussion that's gone on about the datagram mode,
> there might be a fair amount to say.
>
> Say what? Except for the extended traffic from Masataka Ohta, and that message
> from you, I haven't seen that much... maybe you're all so stunned with the
> perfection and necessity of it all, you've nothing to say, but somehow I doubt
> it! :-)

Well, perhaps I am mistaking volume of postings for breadth of discussion. While I confess that I have not been following all of the details of Masataka Ohta's postings and your responses, my impression is that a volume of postings, even from one person, may indicate a need for additional material in the way of discussion, explanation, background, examples, reasoning and so forth.

> Yes, and the architecture document is a good place for this. My only worry
> is that the resulting tome would be too long for mere mortals. Maybe we could
> have two versions, one with, and one without, "DISCUSSION" sections? A simple
> matter of changing text-processor macro definitions...

Not two documents. Maybe putting the whys and wherefores into separate appendices or DISCUSSION sections a la the router requirements (in other words, make it easy for an experienced reader to skip over the explanatory material). I prefer in-line discussion sections wherever possible; it means less skipping back and forth to appendices when reading.

Also, I just thought of it now: a description of what is required to configure a host and a router, and how auto-configuration might work (at a high level), would be useful.
Again, I realize that these may not be a part of the architecture per se, but this is all needed in order to make a realistic assessment of whether the architecture is useful as an IPng. OR -- as an alternative, do not release the architecture document to the general public (i.e. off of this list) until there are additional documents available that, at least at a high level, describe how these implementation-level things _could_ be done.

--
Frank Kastenholz
FTP Software
2 High Street
North Andover, Mass. USA 01845
(508)685-4000

Received: from PIZZA.BBN.COM by BBN.COM id aa04867; 13 Jan 94 15:59 EST
Received: from pizza by PIZZA.BBN.COM id aa11532; 13 Jan 94 15:41 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa11528; 13 Jan 94 15:37 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa03204; 13 Jan 94 15:35 EST
Received: by ginger.lcs.mit.edu id AA24833; Thu, 13 Jan 94 15:35:34 -0500
Date: Thu, 13 Jan 94 15:35:34 -0500
From: Noel Chiappa
Message-Id: <9401132035.AA24833@ginger.lcs.mit.edu>
To: Greg_Minshall@novell.com, jnc@ginger.lcs.mit.edu
Subject: Re: New datagram mode
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM

Out of curiosity, why are you thinking that DMF forwarding may possibly be more efficient than standard hop-by-hop? Because you are assuming that lots of destinations will be "multiplexed" over a given datagram flow?

No, it's because in most of the routers along the path, the forwarding will be a "flow lookup" forwarding, rather than a "locator lookup" forwarding. I expect that the former will be more efficient, since you are looking up a forwarding entry based on a shortish, fixed-length UID tag.

Designs for locators vary, from Tuba, which is not fixed length (and longish), to SIP, which on the surface has this fixed characteristic, but in fact is not really so. SIP assumes that SIP routing tables will be using the "longest match" method, so in fact it's not a simple lookup. The obvious reply, "well, the result of the lookup will be cached, so from then on it is a simple UID tag lookup", is invalid, since we are talking about datagram *applications*; i.e. there is no next packet. Traffic consisting of a stream of packets is best carried in Nimrod via a flow.

However, pure datagram applications should be as efficient, and maybe more so; it depends on i) the ratio of "active" routers to flow-forwarding routers, and ii) the cost ratios among "active" forwarding, "flow" forwarding, and "hop-by-hop" forwarding. Not knowing any of these exactly, I can't say for sure. Also, when I said "efficient", I meant "forwarding time" efficient, since this seems to be the dominant concern. Efficiency along other axes (such as state) is, of course, a different matter.

Still, this new datagram mode should enable us to provide very efficient "pure" datagram service in Nimrod, while staying within the Nimrod design philosophy; i.e. without having to include a hop-by-hop mode.
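To make that cost comparison concrete, here is a minimal sketch; the table contents and function names are invented for illustration, not taken from any Nimrod specification. Flow forwarding is a single hash probe on a short fixed-length tag, while forwarding on a structured locator needs a longest-match walk over its components:

    # Flow forwarding: one hash probe on a short, fixed-length tag.
    flow_table = {0x2A7F: "if0", 0x9C03: "if1"}     # flow-id -> interface

    def forward_by_flow(flow_id):
        return flow_table[flow_id]                  # exactly one lookup

    # Locator forwarding: longest match over structured A.B.C components.
    locator_table = {("US",): "if0",
                     ("US", "MA"): "if1",
                     ("US", "MA", "MIT"): "if2"}    # prefix -> interface

    def forward_by_locator(locator):
        """Longest-match lookup, e.g. for ('US', 'MA', 'MIT', 'LCS')."""
        for n in range(len(locator), 0, -1):        # longest prefix first
            entry = locator_table.get(locator[:n])  # one probe per level
            if entry is not None:
                return entry
        raise KeyError("no route")

    print(forward_by_flow(0x2A7F))                        # if0
    print(forward_by_locator(("US", "MA", "MIT", "LCS"))) # if2

Even with hashing at each level, the locator case can cost one probe per component; the flow case is always exactly one, which is the basis of the efficiency claim above.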
Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa04892; 13 Jan 94 15:59 EST
Received: from pizza by PIZZA.BBN.COM id aa11545; 13 Jan 94 15:41 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa11541; 13 Jan 94 15:39 EST
Received: from [131.112.4.4] by BBN.COM id aa03542; 13 Jan 94 15:39 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Fri, 14 Jan 94 05:34:42 +0900
From: Masataka Ohta
Return-Path:
Message-Id: <9401132034.AA09359@necom830.cc.titech.ac.jp>
Subject: Re: New datagram mode
To: Noel Chiappa
Date: Fri, 14 Jan 94 5:34:40 JST
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
In-Reply-To: <9401112044.AA05198@ginger.lcs.mit.edu>; from "Noel Chiappa" at Jan 11, 94 3:44 pm
X-Mailer: ELM [version 2.3 PL11]

> A thing which operates on flow-identifiers, and does not actively change a
> packet's route, is not a router. You should consult people in the phone
> company. They say something like "C-plane", and will explain why a flow
> based forwarder is not a router.
>
> I reckon that most of us are happy calling devices which forward packets based
> on the contents of the internetwork layer header a "router". It's just a name,
> so it doesn't really matter that much.

A flow ID is not considered to be internetwork layer.

> Remember also that a forwarding device which is forwarding some packets based
> on their flow id's may also be forwarding *other* packets by looking at their
> locators.

That is, it seems to me, what you referred to as an "active router".

> They all have to be functionally identical, even if
> they don't operate identically on all packets.

But you mentioned some static configuration. Though all brouters are functionally identical, some are routers and some are bridges depending on the static configuration.

> As your locator is the result of local coordination, I don't think it can
> be static.
>
> You are correct, it is not; as I have mentioned before, as the topology
> changes we will want to change the abstraction hierarchy to match. However, I
> want that change in the abstraction hierarchy to be controllable, not fully
> and unavoidably automatic.

So, my question is, how can you globally propagate the information? I don't think you can use DNS here.

> Remember, this cost is not paid at every "router" (my definition), only at
> "active routers" (again, my definition).

Oops, my error.

> And the flow setup is the worst.
>
> But it is only performed once (so it does not even add any delay), and the
> cost is shared between any number of packets.

You should be assuming connected UDP, then.

> So, I think we should forward packets with FULL EID match without any
> mask, which means only one (hashed) table lookup. That is the cost to be
> paid on each packet at each router. Which, do you think, is faster?
>
> How is looking up one of a number of EID's (since your concept is to use a
> source route consisting of a list of router EID's) any cheaper than looking
> up a flow-id?

They are the same.

> Remember, most routers will only be doing a flow-lookup in
> this scheme, not looking at the locator.

With my scheme, only the source cares about the locator.

> > can be amortized over many datagrams.
>
> If there are many datagrams along the setup path.
>
> That's the whole point of having the minimal set of DMF's necessary to do pure
> hierarchical routing, augmented *as necessary* where the amount of traffic
> justifies the cost of extra DMF's. You only go beyond the minimal set (which
> has been shown to be quite small) if there *are* many datagrams; i.e. if
> the actual traffic justifies it.
OK. Suppose the traffic needs the maximum set. How large is the maximum?

> Note also that if you think a DMF is unlikely to have any traffic across it,
> set it up on demand, not in advance. That way, the only DMF's that get set up
> are the ones that get used.

As the communication is connectionless, you can't expect much usage pattern. Especially, packets which travel long distances, which load the top level routers, tend to have less pattern, because the end organizations are less related.

> Datagrams with QoS of "best effort" should be buffered upon contention
> and should be dropped upon buffer overflow.
>
> > I'm just thinking of the practical difficulties in ensuring that
> > datagram traffic does not interfere with resources allocated to flows.
> > The "trick" of assigning datagrams to a flow, and then dividing up the
> > bandwidth among flows, allows us a simpler bandwidth allocation
> > mechanism, as a practical matter.
>
> With your scheme, a datagram travels through, in general, several flows
> with variable bandwidth. So, datagrams will be dropped at narrowing points
> of the path, anyway.
>
> I must have missed something. If we do what you suggest, and buffer "best
> effort" datagrams, won't they be dropped at precisely the same points on
> congestion? The actual effects of the DMF scheme ought to be the same,

If you try to have QoS along each flow, some of the bandwidth is reserved and wasted, which is the difference. If you don't try to have QoS, there is no difference. So, why do you bother to have flows?

> it's
> just a single uniform mechanism for all packets, rather than one for flows
> and another for datagrams.

Packets with a flow will have end-end flow setup. Others will not. So, they are not the same.

> > There are other techniques for handling partition, as shown by IS-IS.
>
> That is, the structured EID, NSAP.
>
> You seem to have a private definition for "EID" that the rest of us do not
> share. To us, an *EID has no structure*. So, an NSAP is not an EID, although
> it does name *approximately* the same class of things as an EID does.

It's you who wrongly said my EIDs have structure. Anyway, how can you handle partitioning?

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa13309; 15 Jan 94 0:05 EST
Received: from pizza by PIZZA.BBN.COM id aa19502; 14 Jan 94 23:52 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa19498; 14 Jan 94 23:49 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id ab13000; 14 Jan 94 23:50 EST
Received: by ginger.lcs.mit.edu id AA06439; Fri, 14 Jan 94 23:50:16 -0500
Date: Fri, 14 Jan 94 23:50:16 -0500
From: Noel Chiappa
Message-Id: <9401150450.AA06439@ginger.lcs.mit.edu>
To: kasten@ftp.com, nimrod-wg@BBN.COM
Subject: Re: Architecture document draft outline
Cc: jnc@ginger.lcs.mit.edu

First, you might have to enumerate the requirements that Nimrod has on the algorithms ... Second, you have to identify what the required algorithms and protocols are that are not being specified as a part of the architecture.

Good points.

> My only worry is that the resulting tome would be too long ...
> Maybe we could have two versions, one with, and one without, "DISCUSSION"
> sections?

Not two documents. Maybe putting the whys and wherefores into separate appendices or DISCUSSION sections... I prefer in-line discussion sections wherever possible.

Err, one document would be a total subset of the other. I suggested the short version since long documents tend to put people off from reading them... I agree, any discussion as to why things are done a given way is best done inline.
Also, I just thought of it now: a description of what is required to configure a host and a router, and how auto-configuration might work (at a high level), would be useful.

I spoke with Martha on the phone about the outline (I'll report in a separate message), and I have this bit set that she said configuration would be covered...

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa22605; 17 Jan 94 7:41 EST
Received: from pizza by PIZZA.BBN.COM id aa26866; 17 Jan 94 7:26 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa26862; 17 Jan 94 7:23 EST
Received: from necom830.cc.titech.ac.jp by BBN.COM id aa22107; 17 Jan 94 7:23 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Mon, 17 Jan 94 21:18:44 +0900
From: Masataka Ohta
Return-Path:
Message-Id: <9401171218.AA21893@necom830.cc.titech.ac.jp>
Subject: Re: Architecture document draft outline
To: Noel Chiappa
Date: Mon, 17 Jan 94 21:18:43 JST
Cc: nimrod-wg@BBN.COM, jnc@ginger.lcs.mit.edu
In-Reply-To: <9401120503.AA08589@ginger.lcs.mit.edu>; from "Noel Chiappa" at Jan 12, 94 12:03 am
X-Mailer: ELM [version 2.3 PL11]

> As discussed at the last IETF, the BBN crew are going to try and
> crank out a draft "architectural outline" document for Nimrod. Here's a
> draft outline; any comments, etc. are welcome.

I'm interested in the content, not the architecture:

> 3. Routing and Addressing Functions
>    - Mapping between endpoint identifiers and locators

How will it be?

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa13699; 26 Jan 94 12:23 EST
Received: from pizza by PIZZA.BBN.COM id aa16839; 26 Jan 94 12:04 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa16835; 26 Jan 94 12:01 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa12235; 26 Jan 94 12:00 EST
Received: by ginger.lcs.mit.edu id AA11456; Wed, 26 Jan 94 12:00:03 -0500
Date: Wed, 26 Jan 94 12:00:03 -0500
From: Noel Chiappa
Message-Id: <9401261700.AA11456@ginger.lcs.mit.edu>
To: nimrod-wg@BBN.COM
Subject: Outline
Cc: jnc@ginger.lcs.mit.edu

I talked with Martha about the points about the outline brought up on the mailing list. She made the general point that the outline was very high-level, and in fact topics were covered that were not listed in that outline. On specific issues:

- Nimrod design philosophy - this is already being covered in the Introduction.
- Nimrod design goals - Ditto.
- Route selection algorithm - specific algorithms aren't covered, but a discussion of the required functional attributes (i.e. in terms of what data goes in, and what has to come out) is.
- Inter-router packet formats - functional requirements for the inter-router packet format will be covered.

That's all my hasty notes of the (now long-ago :-) conversation reveal. If there is something important I've missed, please let me know.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa16499; 26 Jan 94 13:11 EST
Received: from pizza by PIZZA.BBN.COM id aa17084; 26 Jan 94 12:46 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa17080; 26 Jan 94 12:45 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa14828; 26 Jan 94 12:43 EST
Received: by ginger.lcs.mit.edu id AA11761; Wed, 26 Jan 94 12:43:33 -0500
Date: Wed, 26 Jan 94 12:43:33 -0500
From: Noel Chiappa
Message-Id: <9401261743.AA11761@ginger.lcs.mit.edu>
To: jnc@ginger.lcs.mit.edu, mohta@necom830.cc.titech.ac.jp, nimrod-wg@BBN.COM
Subject: Re: New datagram mode
Cc: jnc@ginger.lcs.mit.edu

>> A thing which operates on flow-identifiers, and does not actively change
>> a packet's route, is not a router. ... will explain why a flow based
>> forwarder is not a router.
> I reckon that most of us are happy calling devices which forward packets
> based on the contents of the internetwork layer header a "router". It's
> just a name, so it doesn't really matter that much.

A flow ID is not considered to be internetwork layer.

Perhaps we are using the same term ("flow ID") to refer to two different things, since in my conception a "flow ID" is just the name (label, identifier) of an internetwork layer object, the "flow".

To me, a flow is a series of packets which belong together (e.g. the TCP packets of a file being transferred via FTP, or a multi-cast video-conference). They may be associated for fundamental reasons (e.g. resource allocation, where the allocation covers a number of packets, not just one), or for efficiency (e.g. in routing/forwarding and access control). This relationship between the packets is made across the entire internetwork system (i.e. across many networks), on an end-end basis, and is thus visible to the internetwork layer. It is thus absolutely an internetwork layer concept.

> Remember also that a forwarding device which is forwarding some packets
> based on their flow id's may also be forwarding *other* packets by
> looking at their locators.

That is, it seems to me, what you referred to as an "active router".

Yes, the forwarding based on locators would happen only in "active routers". (The term "active" is not a particularly good one, just something I picked in a hurry because I needed a term to distinguish that set of routers.)

> They all have to be functionally identical, even if they don't operate
> identically on all packets.

But you mentioned some static configuration. Though all brouters are functionally identical, some are routers and some are bridges depending on the static configuration.

Again, you seem to have an unusual definition of "bridge". To me, a bridge is a device which forwards packets based on the local physical network header (i.e. 802.*, or whatever). All of the devices I am talking about would be forwarding packets based on the internetwork header, which is why I call them all routers.

>> As your locator is the result of local coordination, I don't think it
>> can be static.

> You are correct, it is not; as I have mentioned before, as the topology
> changes we will want to change the abstraction hierarchy to match.
> However, I want that change in the abstraction hierarchy to be
> controllable, not fully and unavoidably automatic.

So, my question is, how can you globally propagate the information? I don't think you can use DNS here.

Why not? That's part of the reason for controlling the rate of change, to bring it within the rate that the DNS can handle.

Also, there has been some discussion about the need to allow things to have multiple locators, to make this "renumbering" easier in practice; we can't have a "flag moment" when every locator within the scope of the change gets updated. We need a mechanism which allows the process to happen over some reasonable time period, with interoperation with the rest of the system continuing while the change happens. Allowing (temporary) multiple locators allows this.

>> And the flow setup is the worst.

> But it is only performed once (so it does not even add any delay), and
> the cost [of setting up a DMF is] shared between any number of packets.

You should be assuming connected UDP, then.

I didn't follow this?

> Remember, most routers will only be doing a flow-lookup in this scheme,
> not looking at the locator.

With my scheme, only the source cares about the locator.
I thought that your scheme involved intermediate routers making routing decisions for packets based on the EID of the next border router; this sequence of EID's is the locator in your scheme. Did I not understand something?

Your intermediate routers have to look at the current EID in the locator (i.e. a non-fixed offset in the packet), unless you have copied the "current" EID to some other location in the packet. I am assuming that you are not looking at all of them to find the rightmost one that any given router has in its table; this, unfortunately, is what you need to do to find the "optimal" (within the amount of routing data that you have passed around to the routers in your system) path.

You could combine the two, and have the intermediate border router (above) set the next border router to aim for to be not just the next one in the list, but the rightmost one it has in its routing table, which will get you a somewhat optimized route. It still probably won't be as good as the new datagram mode, since you will have to head for the particular border router (named by its EID) in your locator, not the closest one into that area (which may or may not be the optimal entry router for the ultimate destination, sigh, another complication).

> That's the whole point of having the minimal set of DMF's necessary to
> do pure hierarchical routing, augmented *as necessary* where the amount
> of traffic justifies the cost of extra DMF's. You only go beyond the
> minimal set (which has been shown to be quite small) if there *are* many
> datagrams; i.e. if the actual traffic justifies it.

OK. Suppose the traffic needs the maximum set. How large is the maximum?

Impossibly large, but this is true of *any* routing architecture. The "maximal" set of routing information would be that set which provides the maximally optimal route *at all times*. In any routing architecture, this would effectively mean that everyone would have to have a complete database of the entire system; i.e. track each individual destination separately, i.e. no hierarchy. The overhead of maintaining that database is far larger than the savings.

You reach a point of diminishing returns. The trick is to identify the point at which further detail in the routing database doesn't pay off. I don't know how to do this, and I suspect we may never get a simple, guaranteed optimal algorithm (it feels NP-complete), but as we get better and better practical approximations, the "algorithm-independent" nature of Nimrod will allow us to deploy it incrementally, with no global coordination.

> Note also that if you think a DMF is unlikely to have any traffic across
> it, set it up on demand, not in advance. That way, the only DMF's that
> get set up are the ones that get used.

As the communication is connectionless, you can't expect much usage pattern.

This is a conjecture which I suspect is wrong, but I can't prove it right at the moment. However, I can hand-wave. For instance, cars on roads have a lot of the characteristics of datagrams, but there are definitely usage patterns. You can also look at phone networks; individual calls have a lot of the same characteristics as datagrams, and there, too, there are usage patterns.

Especially, packets which travel long distances, which load the top level routers, tend to have less pattern, because the end organizations are less related.

This is true, but I suspect we'll have to monitor a real network to know what the actual patterns are. I don't think we can predict them.

> I must have missed something.
> If we do what you suggest, and buffer "best
> effort" datagrams, won't they be dropped at precisely the same points on
> congestion? The actual effects of the DMF scheme ought to be the same,

If you try to have QoS along each flow, some of the bandwidth is reserved and wasted, which is the difference.

It depends on your resource allocation system. As far as I know, most proposed resource allocation architectures allow reserved, but unused, bandwidth to be given to "capacity available" traffic.

Anyway, how can you handle partitioning?

This is an open point; there are a number of potential schemes, and a decision as to which set (since I don't think one alone will do it) has not yet been made. We will be discussing it soon, I expect.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa20591; 26 Jan 94 14:17 EST
Received: from pizza by PIZZA.BBN.COM id aa17748; 26 Jan 94 13:53 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa17744; 26 Jan 94 13:51 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa19050; 26 Jan 94 13:51 EST
Received: by ginger.lcs.mit.edu id AA12674; Wed, 26 Jan 94 13:49:04 -0500
Date: Wed, 26 Jan 94 13:49:04 -0500
From: Noel Chiappa
Message-Id: <9401261849.AA12674@ginger.lcs.mit.edu>
To: jnc@ginger.lcs.mit.edu, mohta@necom830.cc.titech.ac.jp, nimrod-wg@BBN.COM
Subject: Re: Analysis of DMF's in new datagram mode
Cc: jnc@ginger.lcs.mit.edu

>>> For routers which are not border routers, they need one DMF out to a
>>> border router (i.e. up), and that's it. Traffic to other objects inside
>>> the area can be handled by sending it to the border router, which will
>>> send it back to the correct object.

>> There will be load concentration at border routers, then.

> I was describing the minimal functional configuration.

Because of the load concentration, I don't think your configuration functions.

The answer to this has two parts. First, I don't think it would take a *lot* more state to provide better routing. The argument is given below, in the original message. Second, to the extent that your physical network configuration provides load concentrations, this is a problem with the configuration which routing alone cannot solve. I prefer meshes, with lots of smaller routers, for the simple reason that the load is spread over more paths.

> An interior router which had an instantiated
> DMF (not just a potential DMF) to every other object in the area would
> still have only as many DMF's ending at it as a border router of the
> area, and that case has been analyzed as O(I), which is reasonable.

Your configuration is wrong as to the configuration within an area, of course.

How is it wrong? It's not at all obvious to me...

But your configuration is also wrong as to the configuration of the area hierarchy. The number of levels must be limited.

Why? Everyone seems to agree that the only way to scale the system is to increase the number of levels, and in fact, in general *all* routing architectures seem to have the characteristic that increasing the number of levels increases the overhead a lot more slowly than increasing the size of existing levels.

> If every interior router had an instantiated DMF to all the objects in
> the area, the number of flows through each router in the area would be
> O(I log I), which is a little worse, but not terribly so, since I is
> unlikely to grow large; we won't have enormous areas.

Because of planarity, it is O(I^1.5), where I is not constant.
I will discuss the planarity issue below, but even if it *were* O(I sqrt I), that would still not be impossible, since I don't expect to see massive growth in I (the number of one-level-down objects in the average area).

> In addition, I doubt we will see full mesh connectivity; traffic X-Y
> graphs always show hot-spots, not an even distribution, at any scale.

    Hot spots make the load concentration worse.

These "hot spots" are not *physical* hot-spots, but source-destination traffic matrix entries which show larger counts than average. They would only cause *physical* hot-spots if the topology has limited connectivity, and the sources and destinations are scattered across that topology in a special way. Of course, it would be loony to design your physical topology like that, if you had those traffic patterns; you'd modify it to get rid of the hot spots.

>> not all DMFs are equal. That is, higher level DMF means a lot more
>> traffic through it. That's why you can't increase the number of levels
>> at will. Your model is not scalable.

> This is an interesting point, but it impacts a lot more than just DMF's.
> If top level links don't have enough bandwidth to handle the traffic,
> this is going to be true whether you use DMF's for datagram traffic, or
> hop-by-hop, or whether the user traffic is all in end-end flows!

    Top level links MUST have enough bandwidth.

True, but this is a physical topology design point, not a routing architecture design point.

    That is, there should be a lot of second level areas and they should be connected with a lot of links. Your configuration, which assumes small areas, does not allow such a configuration.

I'm not quite sure I follow this. I have said that any area ought to contain a relatively small (where "relatively" may be O(1,000), or perhaps even more) number of next-level-down objects. This does not mean that a top level area will cover a small part of the physical topology; it may in fact cover a *huge* area of the topology. Was this what you meant, or did you have some reason to think that practical designs would need areas far larger than this?

> To grow the network, we will have to do one of two things. Either the top
> level links (i.e. the express highways) will have to have more
> bandwidth, in which case this is not a problem, or we are going to have
> to distribute the traffic over a number of smaller, parallel, links.
> Since this increases the number of arcs in the graph, *not* the number
> of nodes,

    If you increase the number of links without increasing the number of nodes, load will concentrate not in links but on border routers.

Good point; we will need to increase the number of border nodes too. This obviously makes hash of my next statement, but it turns out that DMF growth is more correlated to the number of *interior* objects, and not to the number of border routers, so we can up the number of border routers too (say from sqrt(I), the assumption I used in my calculations, to some higher function) without serious problem.

> it doesn't impact the growth of DMF's either. I reckon the second is
> better (since it uses parallel, less expensive, technology, to get the
> performance), but I'm not sure which we will see.

    We need a lot of top level areas with a lot of border routers.

I need to go away and think about this; I'm convinced that a mesh *is* the answer, but I need to understand that there are no pitfalls (in terms of load concentration or routing overhead) to making it work.
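(Meanwhile, just to put rough numbers on the flow-count growth rates we have been trading, here is a back-of-the-envelope sketch; the sample values of I, and the growth rates themselves, are purely illustrative, taken from the discussion above rather than from any measurement:)

    import math

    # Purely illustrative: per-router flow counts for an area with I
    # interior objects, under the three growth rates discussed above.
    def flows_per_router(I):
        return {
            "minimal set, O(I)": I,
            "full mesh, O(I log I)": int(I * math.log2(I)),
            "planar claim, O(I^1.5)": int(I * math.sqrt(I)),
        }

    for I in (100, 1000, 10000):
        print(I, flows_per_router(I))

Even at I = 10,000, the gap between the three is a factor of a hundred, not a difference in kind.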
Note that this is a problem with routing/physical topology *in general*, not just this specific architecture.

> One other thing to note is that *all* routing algorithms have limits on
> the size of the graph over which you can run them in practise. To build a
> larger network, you need to introduce levels, and to scale to arbitrary
> sizes, you need an arbitrary number of levels.

    Everything has its own limitation. Still, we can expect the limits themselves to scale as time goes by. For example, the allowable size of routing information is expected to scale as the link speed increases.

True, but I think the limit at the moment is memory, not bandwidth, although as I explained (with memory capacities going as the square of feature size, whereas device speed goes linearly with feature size) I expect this balance to shift.

    So, though it is obvious that there should be levels, you don't have to assume the area size is constant.

Oh, I'm not. I'm just assuming that i) growth in the size of the network will be *faster* than technology for some years to come (look how fast the Internet is growing now), so we can't accommodate that growth purely with growth in technology (line speeds and memory sizes). Also, as I explained, you get more "bang for your dollar" out of increasing the number of levels than you do out of increasing the size of each level. As a very simplified example, let's assume you have a 24-bit address. You can either make it two 12-bit fields, or three 8-bit fields. Either gives you the same number of total destination addresses available - 2^24. However, the former would take 2*(2^12) routing table entries in a router, or 8K, whereas the latter would take 3*(2^8), or 768; i.e. an order of magnitude less state! Similar calculations hold for *all* routing architectures. The Kleinrock and Kamoun paper shows that you don't wind up with much worse routes (in fact, the difference is usually infinitesimal), and the reduction in routing overhead is substantial.

>>> The average path length, A, for a graph of fixed degree (i.e. one in
>>> which nodes have the same average number of arcs to neighbouring nodes,
>>> independent of the size of the graph) is logN (where N is the number of
>>> nodes). [Chen 86]

>> While such an average over all the possible graphs will be so, as your
>> topology is a mesh, you should average only over planar graphs. Thus,
>> the average path length will be sqrt(N).

> This doesn't make sense to me. Planar graphs are not a good model for the
> connectivity of the network. For instance, in a planar graph, you cannot
> have full interconnection between 5 nodes

    As the Earth's surface is planar, and the routers are placed on the Earth, it is the model.

Well, not exactly (we aren't two-dimensional beings :-), but you do have a good point below, so I'll skip this.

> However, I don't think that the real network will display such
> constraints. So, planar graphs are not the applicable area of graph
> theory, but rather normal graphs, and there it is log(N).

    A small number of crossings does not matter. But, if you allow arbitrary crossings, it means most links are lengthy. Assume top level routers are distributed all over the surface of the Earth. Then how long, do you think, will the average link between connected nodes be? If you assume full randomness, the average is something of transcontinental scale, which is quite costly.

This is a good point, actually. I doubt that a normal (i.e. non-planar) graph of fixed degree (i.e.
one in which nodes have the same average number of arcs to neighbouring nodes, independent of the size of the graph) is really an optimal model for the network either. The problem, as you have pointed out, is that not all links are equally likely. I.e., if Pij is the probability of a link between nodes i and j (thanks for the notation, Yakov :-), in a real network, Pij is *not* a constant over all j for a given i. Rather, nodes which are "closer" (in the physical space geometry) are more likely to have links than those further away. So, even if the average node does have a constant number of arcs, they are not distributed randomly across the graph.

This will move us off the O(logN) point, and toward the O(sqrtN) point. However, without a probability model, and a lot of math (or simulation), neither of which I have time for, it's impossible to say how far. My guess, based on looking at real-world networks like the ARPANet, is that it will be pretty close to the results for true fully random graphs, in future real networks. The reason is simple; long path lengths are a *bad thing*. People will put in enough non-local links to bring the path length down, but my thinking (based on recollection, again) is that it's an asymptotic, diminishing-returns type of thing. It doesn't take a lot of long links to really whack down the diameter (and thus the average path length). I know BBN did a lot of work in this area, modelling the ARPANet to see where to add new links. Perhaps someone there can report briefly on what they recall? Again, my specific recollection of that work is that it doesn't take many non-local links to really help. Of course, then you have load issues on those links, but that's another story...

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa03502; 28 Jan 94 12:51 EST
Received: from pizza by PIZZA.BBN.COM id aa29954; 28 Jan 94 12:18 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa29950; 28 Jan 94 12:15 EST
Received: from nsco.network.com by BBN.COM id aa01595; 28 Jan 94 12:14 EST
Received: from anubis.network.com by nsco.network.com (5.61/1.34) id AA21410; Fri, 28 Jan 94 11:18:02 -0600
Received: from blefscu.network.com by anubis.network.com (4.1/SMI-4.1) id AA01003; Fri, 28 Jan 94 11:13:12 CST
Date: Fri, 28 Jan 94 11:13:12 CST
From: Andrew Molitor
Message-Id: <9401281713.AA01003@anubis.network.com>
To: nimrod-wg@BBN.COM
Subject: Re: New Datagram Mode

Let's first see if I have this right, then I will assume that I have it right, and draw up a little example, to see if it's as useful as Noel says. As I understand it, the New Datagram Mode treats the locator of the source, massaged and glued to the locator of the destination, as a sort of crude source route. I guess the things in it are different things than in a real source route, but it sure smells like a source route to me.

[ As a side note, I expect it'd be a neat tweak to have the source find the lowest common object in the two locators, turn the source locator around, and paste the two together to form something linear and more source route-like. This just tidies up the router code a little, and saves a wee bit of per-router forwarding cost. ]

Now, playing fast and loose with terminology, I am going to call this massaged and glued pair of locators a 'crude source route', or CSR. Whether or not it actually exists doesn't matter, since it exists logically.
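In code, I imagine the glueing looks something like this sketch (the dot-separated locator syntax is invented purely for illustration):

    # Locators here are invented dot-separated toys, e.g. "A.B.C.D".
    def make_csr(src_locator, dst_locator):
        src = src_locator.split(".")
        dst = dst_locator.split(".")
        # find the lowest common object (longest common prefix)
        common = 0
        while common < min(len(src), len(dst)) and src[common] == dst[common]:
            common += 1
        # turn the source locator around, then paste on the way back
        # down to the destination
        return list(reversed(src[common:])) + dst[common:]

    # make_csr("A.B.C.D", "A.P.Q.R") -> ['D', 'C', 'B', 'P', 'Q', 'R']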
As I understand it, the New Datagram Mode uses this CSR to do forwarding, but is permitted to look ahead in it to see if it has a Datagram Mode Flow already set up to somewhere further up in the CSR. If it does, over the flow it goes. If it doesn't, we just do the usual source route thang.

So, here's the example. One of the things Nimrod will do a lot of is looking up locators. Lest I be told 'Wrong. DNS not magic.' I will refer to this as 'Locator Location Service', or LLS, provided by 'Locator Location Servers', also LLSs. I am guessing that you'd wind up with DMFs as follows:

- everyone's got a DMF to the root LLS (this is icky, you'd need to surround the root LLS with a wad of routers to carry the load? How do the root nameservers deal with it, anyways?)

- regional network core routers will typically have a lot of DMFs to local LLSs.

Is that it? Anyways, to do a Locator Lookup you go right over the flow to the root (and maybe your answer returns the same way? These flows are bidirectional?). Then subsequent lookups wobble through the hierarchy via Crude Source Routing, until they hit the regional net of the destination, which short-circuits it to the right place over the DMF to the local LLSs. If network.com spends most of its bandwidth doing Locator Lookups to nmsu.edu, a new DMF would/might magically appear from network.com's gateway router to nmsu.edu's gateway router.

I think that this thing will work. It remains to be seen if this thing is the same as Noel's thing, though.

Andrew Molitor

Received: from PIZZA.BBN.COM by BBN.COM id aa07120; 28 Jan 94 14:01 EST
Received: from pizza by PIZZA.BBN.COM id aa00575; 28 Jan 94 13:41 EST
Received: from MARENGO.BBN.COM by PIZZA.BBN.COM id aa00569; 28 Jan 94 13:39 EST
Date: Fri, 28 Jan 94 13:33:35 EST
From: Karen Seo
To: nimrod-wg@BBN.COM
Subject: apologies

Sorry for mistakenly redistributing Andrew Molitor's message.

Received: from PIZZA.BBN.COM by BBN.COM id aa25427; 30 Jan 94 18:57 EST
Received: from pizza by PIZZA.BBN.COM id aa09997; 30 Jan 94 18:42 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa09993; 30 Jan 94 18:40 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa25170; 30 Jan 94 18:40 EST
Received: by ginger.lcs.mit.edu id AA21359; Sun, 30 Jan 94 18:40:30 -0500
Date: Sun, 30 Jan 94 18:40:30 -0500
From: Noel Chiappa
Message-Id: <9401302340.AA21359@ginger.lcs.mit.edu>
To: amolitor@anubis.network.com, nimrod-wg@BBN.COM
Subject: Re: New Datagram Mode
Cc: jnc@ginger.lcs.mit.edu

    As I understand it, the New Datagram Mode treats the locator of the source, massaged and glued to the locator of the destination

There is actually no massaging of either locator, just the pointer.

    as a sort of crude source route. I guess the things in it are different things than in a real source route, but it sure smells like a source route to me.

I've had this argument before! No, a hierarchical locator is *not* a source route (at least for any reasonable definition of the term "source route" :-), in any of the varied ways of using it to route traffic, including traditional hop-by-hop, as well as the NDM. I hope that all those who think it *is* will have the time to read this message, wherein I will show conclusively, absolutely and utterly that such a conception is misguided. (Yes, Paul, this means you!
:-)

First, you need to realize that there is a continuum which stretches from the one end of a strict source route (in which every single physical asset to be used to carry the packet is explicitly identified), through loose source routes, to the other end, which is pure hop-by-hop routing on non-hierarchical addresses. (I hope that everyone can see that this represents the other extreme, since clearly, even in a system which was as "anti-source-route" as possible, the source would still have *some* influence, since it has to decide where the traffic is going!) The thing which is present in different measures, all along this continuum, is the *amount of influence* the *source* has in *picking the path* for the traffic.

Now, let's look at whether or not routing on hierarchical locators, whether done in a hop-by-hop system or the NDM, can in any way be described as "source routing". The usual argument seems to be that you can consider the list of hierarchically related objects in the locator a "source route" of a sort, since one routes first to the largest object, then to the next, etc. However, it is fallacious to think that this means the locator is a source route. Go back to the characteristic of a source route: it allows the source *some control* over the path taken by the traffic. No such thing happens here!

In fact, if you think more deeply about what is happening in hierarchical routing systems, the point becomes clear. A hierarchical address structure is simply a way to reduce the overhead of the routing; this *usually* entails a certain amount of non-optimal route selection. However, it is *theoretically* possible to have a system in which, although the addresses are hierarchical (so presumably the routing overhead is somewhat smaller), there *is* no loss of routing optimality; i.e. all traffic takes *exactly the same paths it would have taken had addresses been assigned "flat"*. In such a system, you'd be hard pressed to say, with a straight face, that conversion from "flat" addresses to hierarchical had introduced "source routes".

As an aside, such a routing system is basically impossible in the real world. To explain why would make this Noelgram impossibly long; some other time, perhaps. Rather, like the "optimal page replacement algorithm" (which requires knowledge of *future* memory reference patterns, clearly not a possible input to a real-world algorithm :-), it is a useful theoretical measuring-stick sometimes. Since real hierarchical routing can get asymptotically close to this state, depending on the amount of overhead you are willing to pay, I don't think it's unreasonable to use it here to help make it clear why hierarchical locators are *not* source routes.

You *could* make an argument that in a system where destinations have many, many locators, which are differentiated on the basis of something about how you get to the destination, e.g. by the long-haul carrier through which the destination is reached, such locators *are* source-routes in a way, since *they allow the source to say something about which route it prefers*. This is an entirely different thing, however. The correct way to look at that is that in such a system, for each destination there is a different "tree" of routes through the network (from each possible source, to that destination) for each locator the destination has. Thus, selection of one locator over another basically picks one "tree" over another.
However, note that this selection process would be the same *regardless of whether the locators were flat or hierarchical*. The hierarchical stuff is simply an optimization. It's the multiple locators that provide *all* the source routing function, *not* the hierarchical nature.

I hope everyone will now be convinced that any *appearance* of "source routing" in the use of hierarchical locators is simply an illusion. The source has no more control over the route taken by the traffic with hierarchical locators than with flat ones. The hierarchical locators may produce slightly non-optimal routes, but that's a whole different ball of wax.

    As a side note, I expect it'd be a neat tweak to have the source find the lowest common object in the two locators, turn the source locator around, and paste the two together to form something linear and more source route-like. This just tidies up the router code a little, and saves a wee bit of per-router forwarding cost.

I don't think so; as far as I can tell without actually doing the code out in detail, it probably does not in fact make the job of the routers much easier. If the routers have optimizations in the database of DMF's (e.g. a packet from A.B.C.D to P.Q.R.S might get to a router on the border of A.B.C and find that it had a direct DMF to P.Q.R), you still have to do "longest match" on the destination locator to look for these optimized DMF's. If we have truncated locators in the packets, that would probably just make the search harder, and ambiguous: if you have DMF's to A.P.X.Y and A.Q.X.Y and you see a packet for X.Y, how do you know which one to pick? Sure, you can backtrack and figure it out, but is it really worth it? I dunno, I expect that it will prove to be non-optimal, but we can probably look at it when the mechanism is designed in detail.

    As I understand it, the New Datagram Mode uses this CSR to do forwarding, but is permitted to look ahead in it to see if it has a Datagram Mode Flow already set up to somewhere further up in the CSR. If it does, over the flow it goes. If it doesn't, we just do the usual source route thang.

Yes, except that the thing you do isn't "the usual source route thang", but rather "the usual hierarchical routing thing" (modulo the temporary assignment to a DMF, which is hardly the usual hierarchical routing thing :-).

    So, here's the example. One of the things Nimrod will do a lot of is looking up locators. ... I will refer to this as 'Locator Location Service', or LLS, provided by 'Locator Location Servers', also LLSs. I am guessing that you'd wind up with DMFs as follows:

    - everyone's got a DMF to the root LLS (this is icky, you'd need to surround the root LLS with a wad of routers to carry the load? How do the root nameservers deal with it, anyways?)

First, we didn't design the DNS so that everyone has to go to the same root, and I imagine the LLS would be the same way; you have multiple roots, you accept local caches of stuff from the root table, etc, etc. But this is not the main reply..

Second, you do *not* have DMF's directly from sources to destinations. You have a limited set of DMF's, from each hierarchical object to a set of "nearby" hierarchical objects. This mesh of DMF's is sufficient to get packets from any place in the network to any other, traversing a number of *different* DMF's on the way.
(The set of DMF's each node has to have has a certain minimal theoretical size if the system is to function at all, and you can add extra ones to get rid of non-optimal routing caused by the pure hierarchical routing of the minimum DMF set.)

    - regional network core routers will typically have a lot of DMFs to local LLSs.

Again, no, because the routers generally wouldn't have DMF's directly to destinations.

    Is that it? Anyways, to do a Locator Lookup you go right over the flow to the root (and maybe your answer returns the same way? These flows are bidirectional?).

You couldn't get to the root LLS unless you had its locator. If you had its locator, you didn't need to use a flow to get to it, you could send out a datagram using the NDM, and it would be forwarded over a set of DMF's to get to the root LLS.

    Then subsequent lookups wobble through the hierarchy via Crude Source Routing, until they hit the regional net of the destination, which short-circuits it to the right place over the DMF to the local LLSs.

Again, probably not a DMF directly to the local LLS. (You *might* put this in as an optimization, see above, but only if there were enough traffic to warrant it.)

    If network.com spends most of its bandwidth doing Locator Lookups to nmsu.edu, a new DMF would/might magically appear from network.com's gateway router to nmsu.edu's gateway router.

If enough NDM traffic went from network.com to nmsu.edu, the relevant routers might set up a DMF between themselves, to optimize the routing of NDM traffic.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa05902; 31 Jan 94 0:30 EST
Received: from pizza by PIZZA.BBN.COM id aa11032; 31 Jan 94 0:13 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa11022; 31 Jan 94 0:11 EST
Received: from nsco.network.com by BBN.COM id aa05482; 31 Jan 94 0:11 EST
Received: from anubis.network.com by nsco.network.com (5.61/1.34) id AA03106; Sun, 30 Jan 94 23:14:55 -0600
Received: from blefscu.network.com by anubis.network.com (4.1/SMI-4.1) id AA12585; Sun, 30 Jan 94 23:10:02 CST
Date: Sun, 30 Jan 94 23:10:02 CST
From: Andrew Molitor
Message-Id: <9401310510.AA12585@anubis.network.com>
To: nimrod-wg@BBN.COM
Subject: Re: New Datagram Mode

>I've had this argument before! No, a hierarchical locator is *not* a source
>route

I confess that this was partly Noelbait. However, I did not previously, but do now, see why it's reasonable to say that this locator pair is not a source route. I will concede that point, and try to be a good little trooper and not call these things source routes.

>Second, you do *not* have DMF's directly from sources to destinations. You
>have a limited set of DMF's, from each hierarchical object to a set of
>"nearby" hierarchical objects. This mesh of DMF's is sufficient to get packets
>from any place in the network to any other, traversing a number of *different*
>DMF's on the way. (The set of DMF's each node has to have has a certain
>minimal theoretical size if the system is to function at all, and you can add
>extra ones to get rid of non-optimal routing caused by the pure hierarchical
>routing of the minimum DMF set.)

Oh dear, I failed to make myself clear. I didn't mean that DMFs went from sources to destinations, but that there might well wind up being one set up between an Important Router at one's site and an Important Router near a Locator Location Server, and that NDM packets would tend to wend their way to the first Important Router, which would then say to itself 'Ah HA! An LLS packet!
I know how to get this thing a good long ways toward that there root LLS!' and zap it over the DMF.

>You couldn't get to the root LLS unless you had its locator. If you had its
>locator, you didn't need to use a flow to get to it, you could send out a
>datagram using the NDM, and it would be forwarded over a set of DMF's to
>get to the root LLS.

Yeah, this is what I was trying to say. Let's club an analogy to death here. Let's pretend we're trying to get from Middletown CT to Las Cruces, NM by automobile, using NDM. The locators look like: USA.Connecticut.Dead-Center.Ask-a-Native and USA.New-Mexico.South-East.Ask-a-Native and I do indeed see that these two things do not really constitute a route, source or otherwise, from the one to the other, though they do (assuming a little intelligence here and there, at USA-level entities, mostly) have sufficient information to get from one to the other.

NDM proposes, essentially, that we have superhighways, cleverly renamed 'Datagram Mode Flows.' Well, not quite, since the topological restrictions are not as strict -- if there's a great deal of traffic from Middletown to Las Cruces, then by gum, we run 16 lanes each way from one to the other, and no, there is no Tulsa exit. It occurs to me that Noel said almost exactly this not too long ago, but I have to say it myself, just like I'd invented it, before I can get my itty bitty brain around it. Thanks for bearing with me, folks!

Andrew Molitor

Received: from PIZZA.BBN.COM by BBN.COM id aa20721; 1 Feb 94 11:23 EST
Received: from pizza by PIZZA.BBN.COM id aa21006; 1 Feb 94 10:48 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa21002; 1 Feb 94 10:44 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa17733; 1 Feb 94 10:39 EST
Received: by ginger.lcs.mit.edu id AA01152; Tue, 1 Feb 94 10:39:35 -0500
Date: Tue, 1 Feb 94 10:39:35 -0500
From: Noel Chiappa
Message-Id: <9402011539.AA01152@ginger.lcs.mit.edu>
To: nimrod-wg@BBN.COM
Subject: Locators, and physical topology
Cc: jnc@ginger.lcs.mit.edu

For those of you who aren't on ROLC, you might find the following message interesting. It points out the kind of difficulty you can have when i) you rely on your locators to tell you something about the underlying physical connectivity, and ii) your locators do not accurately depict that connectivity. I realize we aren't sure quite yet what, if anything, Nimrod locators will tell you, but we should note two things: i) *some* mechanism is going to be needed to find out about connectivity, and ii) that mechanism had better accurately do so. If locators do tell us anything, we need to keep the kind of problems seen here in mind, especially in light of the stuff I talked about a while back where there are forces pulling locators toward different needs; i.e. policy, abstraction, and representation of the physical topology.

Noel

--------

    for the bigger problem of next hop resolution i suggest that we use either directed arp or tony's protocol. these are ok if the next hop can be resolved in several steps and also can cope with the problem of a better route becoming available dynamically.

I've been thinking about this whole issue of routing since the last IETF, and the meeting where the "loop" scenario was presented. There are two separate fundamental problems.
For the case of trying to optimize paths from one router to another, I think that there is unfortunately a fundamental clash between the current "hop-by-hop" routing architecture used by the Internet, and the (effectively) flow-setup used by most attempts to perform this optimization across multiple logical NBMA's on a single physical NBMA (i.e. getting rid of an intermediate IP router step which turns out not to be needed). As such, there are no *real* fixes to this problem. It's like trying to put wheels on one side of a car, and paddles on the other; the two ain't never gonna get it together, no matter how hard you try. The potential problems caused by this include fatal failures such as the creation of routing loops.

There is another problem, not quite as serious (the effects aren't fatal), which affects the process of a host picking an exit router (when the ultimate destination is not on the NBMA mesh), and noticing when a given exit router is no longer the optimal exit router. This has to do with the fact that routing tables are usually organized and calculated to find the best route from *here* to a destination, and usually cannot tell you if *here* is on the best path from the *source* to the destination. Again, there are no real solutions to this problem.

It's important to realize that neither of these means you are utterly screwed; they just mean that there are limits on what you can achieve within the current architectural framework, and those limits have to be respected. Attempting to go beyond them will land you in the soup...

In more technical terms, here's a description of the first problem. The hop-by-hop model is susceptible to a particularly painful failure mode, which is the formation of routing loops. Various mechanisms are usually used to prevent them, but they all boil down to the same thing: the databases used to make routing decisions (i.e. routing tables, etc) have to be maintained in a *consistent* state. In other words, when a change happens to the topology, the effects of that change must be *reliably* propagated to *every* router which is affected. More technically, every node in the network topology which is *potentially* part of a routing loop must be reliably updated to a consistent state, lest a routing loop be formed.

The limitation to nodes which could become part of routing loops is because "leaf" nodes like hosts, which cannot be the destination of any traffic except that which is directly for them, cannot be part of any loop. In other words, if a host has the first hop "wrong", it's not the end of the world; the traffic will take a less efficient path from the source to the destination, but it will get there. However, if a router gets confused (i.e. it's not consistent with all the other routers), all sorts of trouble may *potentially* break out.

With this in hand, let's look at the kind of failure mode which is basically *inescapable* with the kind of optimization process that has been discussed. The problem is that the optimization represents state about the routing, and state which is *not* under the control of the carefully arranged mechanisms of routing which make sure that that state is consistent across the network. As an example, let's say that router A gets to X via the path B,C,D, all of which are on the same physical NBMA. A contains B as the "next hop" entry for X in its routing table. A goes to look up the physical address of B, and gets the PA of D via an optimization process.
Now, some change happens in the topology *outside* the NBMA which makes D decide that *A* is the place to send traffic for X. Bingo, routing loop. You can say "Oh, that's easy to fix, we make A notice that something has changed in the routing, and redo its setup for X", but that's not so easy to guarantee. Maybe the change resulted in A's best next hop no longer being B, in which case A will have noticed something. However, it might have left it at B, in which case you are screwed. I don't recall the details of the example from the IETF any more, but it was something like this case. The point is that there are fundamental reasons why this kind of failure mode is going to be there, and they have to do with attempts to bypass the severe consistency requirements of the "hop-by-hop" model; you're creating routing state which is not under the control of the mechanisms which are intended to maintain that required consistency.

Are there ways to do this optimization? Yes, but they all boil down to recognition of the real topology, and *doing so at the internetwork routing layer*. In other words, if A, B, C and D are all attached to the same physical network, and can talk directly, *the internetwork routing has to know about it explicitly*. Right now, routers know that P and Q are directly connected if their interface addresses are the same when masked by the "IP network mask" for that physical interface. Since the desire is to allow (effectively) random IP addresses for different parts of a physical NBMA mesh, an alternative mechanism would need to be found to convey this information, and the routing algorithm/protocol would have to be looked at to make sure it works with this kind of thing. (For instance, I don't know if OSPF would work if the topology map showed router 1.2.3.4 connected directly to 10.11.12.13, but it might.)

The problem with noticing when a given exit router is no longer the optimal exit router is a little more intractable. It turns out we have this problem now, but because of the way the Internet is addressed, and physically put together, the way the Redirect mechanism fails is usually not so noticeable. For an example of a currently non-working scenario, assume that net N has two routers off it, R1 and R2, both of which are connected to the "rest of the world" cloud. S, a host on N, is getting to D (out there in the RotW) via R1. Now, the routing changes such that the best path from S to D is via R2, and *R1's next hop for D is not on N*. Most current router implementations will *not* notice this; S will never get a redirect from R1 to R2 for D.

The problem is simple: as stated, routing tables are usually organized and calculated to find the best route from *here* to a destination, and usually cannot tell you if *here* is on the best path from the *source* to the destination. Fixing this would require substantial changes to the routing tables, and their calculation; you'd basically have to maintain a routing table *per interface* which listed what the best exit router from that network was to each destination. The overhead of all this is so high that most vendors don't bother. In fact, if you look at the *specification* of most routing protocols, the algorithm *in the spec* does not do this.

Now, as stated, this problem is not the end of the world; the packets are going to get there, and if there is a *serious* non-optimality in the path (e.g. R1 lists its best next hop as R2, on the same net N as the source S), you usually find out about it, and the source gets a Redirect.
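To make the shape of that per-interface fix concrete, here is a rough sketch; all of the names, and the table layout itself, are invented:

    # For each attached network, the best exit router from *that* network
    # to each destination -- the per-interface table described above.
    best_exit = {
        "net-N": {"D": "R2"},  # from net N, R2 is now the better exit for D
        "net-M": {"D": "R1"},
    }

    # A router 'me' receiving a packet for 'destination' from a host on
    # 'arrival_net' should redirect iff some *other* router on that net
    # is the better exit.
    def should_redirect(me, arrival_net, destination):
        best = best_exit[arrival_net].get(destination)
        return best is not None and best != me

    # should_redirect("R1", "net-N", "D") -> True: R1 can redirect S to R2.

The point is not the lookup, which is trivial, but that every entry in that table has to be computed and kept up to date per interface, which is exactly the overhead vendors skip.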
As can be imagined, this "failure mode" becomes i) much more likely when a much larger net, and pool of directly connected hosts and routers, is present, and ii) much harder to detect when all sorts of source addresses turn out to be directly connected to the router. The router is going to have to look at each packet that comes in, and say "am I on the best path to that destination", and if not, try and figure out if it is directly connected to the source of the packet. Alternatively, if it knows all the NBMA connections to it, when a topology change happens, it can try and find the ones which would be affected, and send them a Redirect, but this is subject to the kind of problems raised above, although any failures are not fatal in this case.

Of course, the whole thing is considerably easier if the router knows the *real* physical topology, as opposed to some logical topology which is laid on top of it, and through which it can only dimly see the underlying physical topology. Obviously, if *all* it could see was the logical topology, it wouldn't have to worry about it at all. It's this in-between state, neither fish nor fowl, which is so hard to deal with.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa01285; 1 Feb 94 14:27 EST
Received: from pizza by PIZZA.BBN.COM id aa22494; 1 Feb 94 14:03 EST
Received: from BBN.COM by PIZZA.BBN.COM id ab22487; 1 Feb 94 14:01 EST
Received: from inet-gw-1.pa.dec.com by BBN.COM id aa29688; 1 Feb 94 14:00 EST
Received: from nacto1.nacto.lkg.dec.com by inet-gw-1.pa.dec.com (5.65/13Jan94) id AA12378; Tue, 1 Feb 94 10:35:14 -0800
Received: from sneezy.nacto.lkg.dec.com by nacto1.nacto.lkg.dec.com with SMTP id AA03899; Tue, 1 Feb 1994 13:35:13 -0500
Received: by sneezy.nacto.lkg.dec.com (5.65/4.7) id AA13541; Tue, 1 Feb 1994 13:35:12 -0500
To: Noel Chiappa
Subject: Re: Locators, and physical topology
In-Reply-To: <9402011539.AA01152@ginger.lcs.mit.edu>
References: <9402011539.AA01152@ginger.lcs.mit.edu>
Cc: nimrod-wg@BBN.COM
X-Mailer: Poste 2.1
From: David R Oran
Date: Tue, 1 Feb 94 13:35:11 -0500
Message-Id: <940201133511.5819@sneezy.nacto.lkg.dec.com>

> exit router is no longer the optimal exit router. This has to do with the
> fact that routing tables are usually organized and calculated to find the
> best route from *here* to a destination, and usually cannot tell you if
> *here* is on the best path from the *source* to the destination. Again,
> there are no real solutions to this problem.

I know what you're getting at here Noel, but be careful, because you overstate the case. Link-State routing does in fact organize routing tables such that *here* can know if it is on the best path from the source to the destination. All you need to do is run an SPF picking the source as the root of the SPF tree instead of *here*, and halting if *here* is placed onto the tree. If the algorithm places the destination on the tree before *here*, then, conversely, *here* is not on the best path.

The objection to link-state routing in the NBMA case is that while the SPF calculation will find optimal routes and avoid extra packet hops, it still requires control traffic to flow over all possible NBMA router pairings to maintain the topology. This can be optimized somewhat by using "designated router" techniques (as OSPF does), but the background traffic is still O(N), where N is the number of routers on the NBMA net.

Dave.

BTW: my silence on the list doesn't mean I haven't been reading stuff.
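In code, the check described above might look something like this sketch; the link-state map representation, and all of the names, are invented for illustration:

    import heapq

    # Run SPF (Dijkstra) rooted at the *source*, halting as soon as 'here'
    # is placed on the tree. 'linkstate' maps node -> [(neighbour, cost)].
    def may_be_on_best_path(linkstate, source, here, destination):
        dist = {source: 0}
        placed = set()
        heap = [(0, source)]
        while heap:
            d, node = heapq.heappop(heap)
            if node in placed:
                continue
            placed.add(node)
            if node == here:
                # 'here' was placed before the destination; it is not ruled
                # out (a full check would trace the tree back from the
                # destination).
                return True
            if node == destination:
                # the destination was placed before 'here', so 'here' is
                # not on the best source-to-destination path
                return False
            for neighbour, cost in linkstate.get(node, []):
                nd = d + cost
                if nd < dist.get(neighbour, float("inf")):
                    dist[neighbour] = nd
                    heapq.heappush(heap, (nd, neighbour))
        return False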
Received: from PIZZA.BBN.COM by BBN.COM id aa04357; 1 Feb 94 15:14 EST
Received: from pizza by PIZZA.BBN.COM id aa22711; 1 Feb 94 14:51 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa22707; 1 Feb 94 14:48 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa02519; 1 Feb 94 14:43 EST
Received: by ginger.lcs.mit.edu id AA04917; Tue, 1 Feb 94 14:43:18 -0500
Date: Tue, 1 Feb 94 14:43:18 -0500
From: Noel Chiappa
Message-Id: <9402011943.AA04917@ginger.lcs.mit.edu>
To: jnc@ginger.lcs.mit.edu, oran@nacto.lkg.dec.com
Subject: Re: Locators, and physical topology
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM

> This has to do with the fact that routing tables are usually organized
> and calculated to find the best route from *here* to a destination, and
> usually cannot tell you if *here* is on the best path from the *source*
> to the destination.

    be careful because you overstate the case. Link-State routing does in fact organize routing tables such that *here* can know if it is on the best path from the source to the destination. All you need to do is run an SPF picking the source as the root of the SPF tree instead of *here* and halting if *here* is placed onto the tree.

I sort of alluded to this later on, when I said:

> Fixing this would require substantial changes to the routing tables, and
> their calculation; you'd basically have to maintain a routing table *per
> interface* which listed what the best exit router from that network was
> to each destination. The overhead of all this is so high that most
> vendors don't bother. In fact, if you look at the *specification* of most
> routing protocols, the algorithm *in the spec* does not do this.

The fix you outlined works for SPF; there's an equivalent one for DV, which involves only entering the updates received over interface X in the routing table for interface X. In other words, in both SPF and DV, all the data you need is already there in the packets you get, but the algorithm as per the spec doesn't calculate it for you.

    The objection to link-state routing in the NBMA case is that while the SPF calculation will find optimal routes and avoid extra packet hops, it still requires control traffic to flow over all possible NBMA router pairings to maintain the topology. This can be optimized somewhat by using "designated router" techniques (as OSPF does), but the background traffic is still O(N), where N is the number of routers on the NBMA net.

Right. It also requires the routers to all know that they (and the hosts which are associated with them) can all directly communicate, which is something that it's not possible to tell purely by inspection of IP addresses, in most schemes. Then there is all the additional complexity of "how do you handle connectivity restrictions at the NBMA layer"; i.e. A can connect directly to B, and B to C (over the same interface that B gets to A with), but not A to C....
Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa23444; 1 Feb 94 22:00 EST
Received: from pizza by PIZZA.BBN.COM id aa25370; 1 Feb 94 21:43 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa25365; 1 Feb 94 21:41 EST
Received: from necom830.cc.titech.ac.jp by BBN.COM id aa22522; 1 Feb 94 21:36 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Wed, 2 Feb 94 10:55:52 +0859
From: Masataka Ohta
Return-Path:
Message-Id: <9402020156.AA16436@necom830.cc.titech.ac.jp>
Subject: Re: Locators, and physical topology
To: Noel Chiappa
Date: Wed, 2 Feb 94 10:55:50 JST
Cc: nimrod-wg@BBN.COM, jnc@ginger.lcs.mit.edu
In-Reply-To: <9402011539.AA01152@ginger.lcs.mit.edu>; from "Noel Chiappa" at Feb 1, 94 10:39 am
X-Mailer: ELM [version 2.3 PL11]

> For those of you who aren't on ROLC, you might find the following
> message interesting. It points out the kind of difficulty you can have when i)
> you rely on your locators to tell you something about the underlying physical
> connectivity, and ii) your locators do not accurately depict that
> connectivity.

If the WAN is a conventional PDN, routing packets are charged to the end user, which makes periodic flooding of routing information costly. But if, as with Nimrod, the WAN is operated directly with IP, there is no cloud, and there is no routing over a large cloud. Routing packets will be handled directly by the WAN operators, and their cost will be included in the basic maintenance fee.

BTW, at the physical layer, there is no such thing as NBMA. Trying to build a link-level NBMA is just a waste of information, which creates the cloud.

Am I completely misunderstanding something?

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa02567; 4 Feb 94 11:27 EST
Received: from pizza by PIZZA.BBN.COM id aa10245; 4 Feb 94 11:07 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa10241; 4 Feb 94 11:04 EST
Received: from mitsou.inria.fr by BBN.COM id aa00688; 4 Feb 94 10:54 EST
Received: by mitsou.inria.fr (5.65c8/IDA-1.2.8) id AA09872; Fri, 4 Feb 1994 16:56:51 +0100
Message-Id: <199402041556.AA09872@mitsou.inria.fr>
To: David R Oran
Cc: Noel Chiappa , nimrod-wg@BBN.COM
Subject: Re: Locators, and physical topology
In-Reply-To: Your message of "Tue, 01 Feb 1994 13:35:11 EST." <940201133511.5819@sneezy.nacto.lkg.dec.com>
Date: Fri, 04 Feb 1994 16:56:51 +0100
From: Christian Huitema

=> > exit router is no longer the optimal exit router. This has to do with the
=> > fact that routing tables are usually organized and calculated to find the
=> > best route from *here* to a destination, and usually cannot tell you if
=> > *here* is on the best path from the *source* to the destination. Again,
=> > there are no real solutions to this problem.
=> >
=> I know what you're getting at here Noel, but be careful because you
=> overstate the case. Link-State routing does in fact organize routing
=> tables such that *here* can know if it is on the best path from
=> the source to the destination. All you need to do is run an
=> SPF picking the source as the root of the SPF tree instead of
=> *here* and halting if *here* is placed onto the tree. If the
=> algorithm places the destination on the tree before *here*,
=> then conversely, *here* is not on the tree.
=>
=> The objection to link-state routing in the NBMA case is that
=> while the SPF calculation will find optimal routes and avoid
=> extra packet hops, it still requires control traffic to flow over
=> all possible NBMA router pairings to maintain the topology.
=> This can be optimized somewhat by using "designated router"
=> techniques (as OSPF does), but the background traffic is still
=> O(N), where N is the number of routers on the NBMA net.

Dave,

The algorithm you describe is exactly that implemented by MOSPF. I objected strongly to it when I first saw the proposal, for:

1) Running SPF for *here* has a computational cost of O(N log N)
2) So has running SPF from *there*
3) and the number of *there*s is O(N)
4) so the whole thing is O(N^2 log N)

John Moy got his way through the "computers are cheap, links cost" argument. Note that if you just do RPF in "flood and prune" mode, you don't need to flood the group membership to all routers, and you only need an SPF from *here* (using the reverse metrics). Arguably, flooding packets from various sources and sending back prunes has the same network cost as flooding membership-link-state updates through an acknowledged protocol. This has been heavily debated in IDMR.

Christian Huitema

Received: from PIZZA.BBN.COM by BBN.COM id aa06453; 8 Feb 94 16:31 EST
Received: from pizza by PIZZA.BBN.COM id aa02170; 8 Feb 94 16:07 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa02166; 8 Feb 94 16:04 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa03940; 8 Feb 94 16:00 EST
Received: by ginger.lcs.mit.edu id AA22740; Tue, 8 Feb 94 15:59:59 -0500
Date: Tue, 8 Feb 94 15:59:59 -0500
From: Noel Chiappa
Message-Id: <9402082059.AA22740@ginger.lcs.mit.edu>
To: nimrod-wg@BBN.COM
Subject: Security...
Cc: jnc@ginger.lcs.mit.edu

One nice thing about the NDM is that it makes it a lot harder to put a totally bogus source locator in the packet, and have the routers handle it correctly (as is now the case). This may be important down the road, from a security angle.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa16019; 13 Feb 94 9:50 EST
Received: from pizza by PIZZA.BBN.COM id aa24536; 13 Feb 94 9:30 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa24532; 13 Feb 94 9:27 EST
Received: from necom830.cc.titech.ac.jp by BBN.COM id aa15501; 13 Feb 94 9:27 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Sun, 13 Feb 94 23:17:42 +0900
From: Masataka Ohta
Return-Path:
Message-Id: <9402131417.AA25530@necom830.cc.titech.ac.jp>
Subject: Re: New datagram mode
To: Noel Chiappa
Date: Sun, 13 Feb 94 23:17:40 JST
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
In-Reply-To: <9401261743.AA11761@ginger.lcs.mit.edu>; from "Noel Chiappa" at Jan 26, 94 12:43 pm
X-Mailer: ELM [version 2.3 PL11]

> Don't mind.

I've been skiing in Switzerland last week, which delayed my reply. Is Seattle in March good for skiing?

> Flow ID is not considered to be internetwork layer.
>
> Perhaps we are using the same term ("flow ID") to refer to two different
> things, since in my conception a "flow ID" is just the name (label,
> identifier) of an internetwork layer object, the "flow".

I think we share the common definition.

> To me, a flow is a series of packets which belong together

A flow is a unit of resource reservation.

> This relationship between the packets is made across the entire internetwork
> system (i.e. across many networks), on an end-end basis, and is thus visible
> to the internetwork layer. It is thus absolutely an internetwork layer concept.

Just as there are connections in different layers, there are flows in different layers.

> Yes, the forwarding based on locators would happen only in "active routers".
> (The term "active" is not a particularly good one, just something I picked > in a hurry because I needed a term to distinguish that set of routers.) So my proposal is to call your "active router" just "router". The rest is just bridges. > Again, you seem to have a unusual definition of "bridge". To me, a bridge is a > device which forwards packets based on the local physical network header (i.e. > 802.*, or whatever). All of the devices I am talking about would be forwarding > packets based on the internetwork header, which is why I call them all > routers. Flow ID is a transport layer entity used at several lower layers. > So, my question is, how can you globally propergate the information? > I don't think you can use DNS here. > > Why not? That's part of the reason for controlling the rate of change, to > bring it within the rate that the DNS can handle. Why not? > Anyway, how can you handle partitioning? > > This is an open point; there are a number of potential scheme, and a decision > as to which set (since I don't think one alone will do it) has not yet been > made. We will be discussing it soon, I expect. As the partitioning being the open point, I don't think you can do anything with DNS. > Also, there has been some discussion about the need to allow things to have > multiple locators to make this "renumbering" easier in practise; we can't have > a "flag moment" when every locator within the scope of the change gets updated. > We need a mechanism which allows the process to happen over some reasonable > time period, with interoperation with the rest of the system continuing while > the change happens. Allowing (temporary) multiple locators allows this. You merely repeated the issue I raised. The question is "how" can you do so, not "what" you should do. > >> And the flow setup is the worst. > > > But it is only performed once (so it does not even add any delay), and > > the cost [of setting up a DMF is] shared between any number of packets. > > You should be assuming connected UDP, then. > > I didn't follow this? Unless you assume connected UPD or things like that, the connection won't be reused frequently enough. > > Remember, most routers will only be doing a flow-lookup in this scheme, > not looking at the locator. > > With my scheme, only the source cares locator. > > I thought that your scheme involved intermediate routers making routing > decisions for packets based on the EID of the next border router; this > sequence of EID's is the locator in your scheme. Exactly. It should be noted that exact match is enough. No best match necessary. > Your intermediate routers have to look at the current EID in the > locator (i.e. a non-fixed offset in the packet), No. A packet does not contain any locator. It contains list of EIDs. The list will be made by the source from a locator, at random or with some policy. > unless you have copied the > "current" EID to some other location in the packet. Loose source route mechanism of IPv4 and SIPP will be directly applicable. > I am assuming that you are not looking at all of them to find the rightmost > one that any given router has in its table; this, unfortunately, is what you > need to do to find the "optimal" (within the amount of routing data that you > have passed around to the routers in your system) path. I don't think any scheme with thining can find the "optimal" path. With my scheme, the source can choose the best path using the available routing data. 
> You could combine the
> two, and have the intermediate border router (above) set the next border
> router to aim for to be not just the next one in the list, but the rightmost
> one it has in its routing table, which will get you a somewhat optimized
> route.

I think it is complex, and consumes a considerable amount of router processing power.

> It still probably won't be as good as the new datagram mode, since you
> will have to head for the particular border router (named by its EID) in your
> locator, not the closest one into that area (which may or may not be the
> optimal entry router for the ultimate destination, sigh, another
> complication).

If you want to have some preference, it is easy to do with DNS, completely statically. That is:

    PEID PEID PEID

where PEID is an RR name for the EID of a border router of the parent area.

> > That's the whole point of having the minimal set of DMF's necessary to
> > do pure hierarchical routing, augmented *as necessary* where the amount
> > of traffic justifies the cost of extra DMF's. You only go beyond the
> > minimal set (which has been shown to be quite small) if there *are* many
> > datagrams; i.e. if the actual traffic justifies it.
>
> OK. Suppose the traffic needs the maximum set. How large is the maximum?
>
> Impossibly large, but this is true of *any* routing architecture.

Mine is not. Without a reasonable maximum, you can't meaningfully measure the efficiency of your scheme.

> I don't know how to do this, and I suspect we may never get a simple,
> guaranteed optimal algorithm (it feels NP-complete), but as we get better and
> better practical approximations, the "algorithm-independent" nature of Nimrod
> will allow us to deploy it incrementally, with no global coordination.

I'm afraid that, without an impossibly large amount of routing information, the performance of your scheme is poor.

> > Note also that if you think a DMF is unlikely to have any traffic across
> > it, set it up on demand, not in advance. That way, the only DMF's that
> > get set up are the ones that get used.
>
> As the communication is connectionless, you can't expect much of a usage
> pattern.
>
> This is a conjecture which I suspect is wrong, but I can't prove it right at
> the moment.

You can't, of course.

> However, I can hand-wave. For instance, cars on roads have a lot
> of the characteristics of datagrams, but there are definitely usage patterns.
> You can also look at phone networks; individual calls have a lot of the
> same characteristics as datagrams, and there, too, there are usage patterns.

What faulty hand-waving. You can see a usage pattern on a single link, of course. You can't see a usage pattern on a DMF connection, unless the number of DMF connections is O(number of links) or something as small as that, in which case traffic concentrates.

> Especially, packets which travel long distances, which load top level
> routers, tend to have less pattern, because end organizations are less
> related.
>
> This is true, but I suspect we'll have to monitor a real network to know
> what the actual patterns are. I don't think we can predict them.

You don't have to predict them. Excessive loading will occur. We should just avoid it as much as possible.

> > I must have missed something. If we do what you suggest, and buffer "best
> > effort" datagrams, won't they be dropped at precisely the same points on
> > congestion?
> > The actual effects of the DMF scheme ought to be the same,
>
> If you try to have QoS along each flow, some of the bandwidth is reserved
> and wasted, which is the difference.
>
> It depends on your resource allocation system. As far as I know, most proposed
> resource allocation architectures allow reserved, but unused, bandwidth to be
> given to "capacity available" traffic.

If you reserve resources only to forward connectionless packets, there will eventually be NO traffic which uses the "best effort" strategy. Connectionless packets should be sent as "capacity available" traffic.

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa17586; 13 Feb 94 11:26 EST
Received: from pizza by PIZZA.BBN.COM id aa24734; 13 Feb 94 10:59 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa24730; 13 Feb 94 10:57 EST
Received: from necom830.cc.titech.ac.jp by BBN.COM id aa17041; 13 Feb 94 10:57 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Mon, 14 Feb 94 00:48:22 +0900
From: Masataka Ohta
Return-Path:
Message-Id: <9402131548.AA25716@necom830.cc.titech.ac.jp>
Subject: Re: Analysis of DMF's in new datagram mode
To: Noel Chiappa
Date: Mon, 14 Feb 94 0:48:20 JST
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
In-Reply-To: <9401261849.AA12674@ginger.lcs.mit.edu>; from "Noel Chiappa" at Jan 26, 94 1:49 pm
X-Mailer: ELM [version 2.3 PL11]

> > I was describing the minimal functional configuration.
>
> Because of the load concentration, I don't think your configuration will
> function.
>
> The answer to this has two parts. First, I don't think it would take a *lot*
> more state to provide better routing. The argument is given below, in the
> original message.
>
> Second, to the extent that your physical network configuration provides load
> concentrations, this is a problem with the configuration which routing alone
> cannot solve. I prefer meshes, with lots of smaller routers, for the simple
> reason that the load is spread over more paths.
>
> > An interior router which had an instantiated DMF (not just a potential
> > DMF) [to] every other object in the area would still have only as many
> > DMF's ending at it as a border router of the area, and that case has been
> > analyzed as O(I), which is reasonable.
>
> Your configuration is wrong as to the configuration within an area,
> of course.
>
> How is it wrong? It's not at all obvious to me..

There is no such thing as "a potential DMF".

> But, your configuration is also wrong as to the configuration of area
> hierarchy. The number of levels must be limited,
>
> Why?

Increasing the number of levels means shrinking the areas, which reduces the amount of area-local traffic. If areas are shrunk beyond some threshold, which is determined by the traffic pattern, the global traffic will explode exponentially.

> Everyone seems to agree that the only way to scale the system is to
> increase the number of levels,

Yes, but the problem is that we can't have arbitrarily many levels.

> and in fact, in general *all* routing
> architectures seem to have the characteristic that increasing the number of
> levels increases the overhead a lot more slowly than increasing the size of
> existing levels.

Unless the size of a level is large enough, there won't be enough locality of traffic within the level.

> > If every interior router had an instantiated DMF to all the objects in
> > the area, the number of flows through each router in the area would be
> > O(IlogI), which is a little worse, but not terribly so, since I is
> > unlikely to grow large; we won't have enormous areas.
> > Because of planarity, it is O(I^1.5), where I is not constant.
>
> I will discuss the planarity issue below, but even if it *were*
> O(I sqrt I), that would still not be impossible, since I don't expect
> to see massive growth in I (the number of one-level-down objects in the
> average area).

At the top level, I think "I" may be as large as 10,000 or 100,000.

> > In addition, I doubt we will see full mesh connectivity; traffic X-Y
> > graphs always show hot-spots, not an even distribution, at any scale.
> >
> > Hot spots make the load concentration worse.
>
> These "hot spots" are not *physical* hot-spots, but source-destination
> traffic matrix entries which show larger counts than average.

Hot spots make the load concentration worse, anyway.

> you'd modify it to get rid of the hot spots.

That's the solution against the hot spot problem.

> > Top level links MUST have enough bandwidth.
>
> True, but this is a physical topology design point, not a routing
> architecture design point.

The routing architecture must be designed so that the commonly available
top level link bandwidth is wide enough.

> > That is, there should be a lot of second level areas and they should
> > be connected with a lot of links. Your configuration, which assumes
> > small areas, does not allow such a configuration.
>
> I'm not quite sure I follow this.

Perhaps, I should have written:

> That is, there should be a lot of first level areas and they should be
                                    ^^^^^
> connected with a lot of links. Your configuration, which assumes small
> areas, does not allow such a configuration.

I have assumed the first level area is the entire world, which you would
be referring to as the zero-th level area. Is that your terminology?

> > If you increase the number of links without increasing the number of
> > nodes, load will concentrate not in links but on border routers.
>
> Good point; we will need to increase the number of border nodes too.
> This obviously makes hash of my next statement, but it turns out that
> DMF growth is more correlated to the number of *interior* objects, and
> not to the number of border routers,

What? If you don't make use of the increased number of border routers,
the increase is meaningless.

> > Everything has its own limitation. Still, we can expect the
> > limitations to scale as time goes by. For example, the allowable size
> > of routing information is expected to scale as the link speed
> > increases.
>
> True, but I think the limit at the moment is memory, not bandwidth,
> although as I explained (with memory capacities going as the square of
> feature size, whereas device speed goes linearly with feature size) I
> expect this balance to shift.

Memory is not at all an issue, already. 100MB of memory is much cheaper
than a 100Mbps link.

> Oh, I'm not. I'm just assuming that i) growth in the size of the
> network will be *faster* than technology for some years to come (look
> at how fast the Internet is growing now), so we can't accommodate that
> growth purely with growth in technology (line speeds and memory sizes).

Agreed. To me, the most serious problem is the top-level traffic.

> Also, as I explained, you get more "bang for your dollar" out of
> increasing the number of levels than you do out of increasing the size
> of each level. As a very simplified example, let's assume you have a
> 24-bit address. You can either make it two 12-bit fields, or three
> 8-bit fields. Either gives you the same number of total destination
> addresses available - 2^24. However, the former would take 2*(2^12)
> routing table entries in a router, or 8K, whereas the latter would take
> 3*(2^8), or 768; i.e. an order of magnitude less state!
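(Illustration, not from the original exchange: the quoted arithmetic as
a small runnable Python sketch. The one-entry-per-possible-field-value
model is the simplification used in the quoted text.)

    # Routing-table state when a fixed-size address is split into
    # hierarchical fields: one table entry per possible value of each field.
    def table_entries(field_widths):
        """Total routing-table entries for the given field split."""
        assert sum(field_widths) == 24, "toy example uses a 24-bit address"
        return sum(2 ** width for width in field_widths)

    print(table_entries([12, 12]))   # two 12-bit levels  -> 8192 (the "8K")
    print(table_entries([8, 8, 8]))  # three 8-bit levels ->  768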
What? Do you think memory for 16M is so significant?

> I doubt that a normal (i.e. non-planar) graph of fixed degree (i.e. one
> in which nodes have the same average number of arcs to neighbouring
> nodes, independent of the size of the graph) is really an optimal model
> for the network either. The problem, as you have pointed out, is that
> not all links are equally likely.

OK.

> I.e., if Pij is the probability of a link between nodes i and j (thanks
> for the notation, Yakov :-), in a real network, Pij is *not* a constant
> over all j for a given i. Rather, nodes which are "closer" (in the
> physical space geometry) are more likely to have links than those
> further away. So, even if the average node does have a constant number
> of arcs, they are not distributed randomly across the graph.
>
> This will move us off the O(log N) point, and toward the O(sqrt N)
> point. However, without a probability model, and a lot of math (or
> simulation), neither of which I have time for, it's impossible to say
> how far.

It is O(sqrt N). Isn't it obvious?

> My guess, based on looking at real-world networks like the ARPANet, is
> that it will be pretty close to the results for true fully random
> graphs, in future real networks.

That's only true if there were a single, high-bandwidth backbone, which
is NOT our favourite model of MESH. Though you may think a T3 backbone
is fast enough forever, a single gigabit backbone won't have enough
capacity in the near future.

> The reason is simple; long path lengths are a *bad thing*. People will
> put in enough non-local links to bring the path length down, but my
> thinking (based on recollection, again) is that it's an asymptotic,
> diminishing-returns type of thing. It doesn't take a lot of long links
> to really whack down the diameter (and thus the average path length).

You completely misunderstand the issue. The diminishing return makes the
diameter larger, not smaller.

Only to make the average diameter O(N^(1/3)), that is, to have a
topology related to three-dimensional space, you need a lot of links
(the number depends on the locality of traffic) with length

	(size of the Earth) * O(N^(1/6))

which is transcontinentally lengthy. So, practically speaking, the
diameter will be O(N^(1/2)).

> I know BBN did a lot of work in this area, modelling the ARPANet to see
> where to add new links. Perhaps someone there can report briefly on
> what they recall?

Perhaps, they think T3 is fast enough.

> Again, my specific recollection of that work is that it doesn't take
> many non-local links to really help. Of course, then you have load
> issues on those links, but that's another story...

The load on the top level link is the issue.

						Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa23839; 13 Feb 94 16:22 EST
Received: from pizza by PIZZA.BBN.COM id aa25676; 13 Feb 94 16:02 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa25672; 13 Feb 94 16:00 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa22892; 13 Feb 94 15:58 EST
Received: by ginger.lcs.mit.edu id AA19785; Sun, 13 Feb 94 15:53:15 -0500
Date: Sun, 13 Feb 94 15:53:15 -0500
From: Noel Chiappa
Message-Id: <9402132053.AA19785@ginger.lcs.mit.edu>
To: jnc@ginger.lcs.mit.edu, mohta@necom830.cc.titech.ac.jp
Subject: Re: Analysis of DMF's in new datagram mode
Cc: nimrod-wg@BBN.COM

    There is no such thing as "a potential DMF".
A "potential DMF" is a DMF which can be set up (presumably in reponse to demand), but has not been. (The difference is important since there is not state in the routers along the path of the potential DMF.) > even if it *were* O(IsqrtI), that would still not be impossible, since I > don't expect to see massive growth in I (the number of one-level-down > -objects in the average area). At the top level, I think "I" be as large as 10,000 or 100,000. I seriously doubt it would be that large; there aren't that many country codes in the world phone system, and it seems to work just fine. If it does get that large, you need to introduce another layer of hierarchy; Kleinrock/Kamoun type analysis will show that the resulting routing inefficiency is minimal. Perhaps, I should have written: > That is, there should be a lot of first level areas and they should be ^^^^^ > connected with a lot of links. Your conffiguration, which assumes small > areas, do not allow such a coniguration. I have assumed the first level area is the entire world, which you should be refering as the zero-th level area. Is that your terminology? No. This point is not completely decided (we may not number the layers at all) but my personal bias would be to number the layers from the bottom up, so that the top layer would not have a fixed number; we could add another one if the system gets too large. I just refer to it as the "top" layer. > This will move us off the O(logN) point, and toward the O(sqrtN) point. > However, without a probability model, and lot of math (or simulation), > neither of which I have time for, it's impossible to say how far. It is O(sqrtN). Isn't it obvious? No, it's not. As soon as you start getting *some* non-local links, the diameter of the graph is greatly reduced, and I expect you will find there is a relationship between the diameter, and the average path length. Do you in fact have probability model which are you using for Pij, and a simulation based on it to show average path length, which leads you to the O(sqrtN) answer, for graphs which are neither planar, nor fully random? > My guess, based on looking at real-world* networks like the ARPANet, is > that it will be pretty close to the results for true fully random > graphs, in future real networks. That's only true if there is were single, high-bandwidth backbone, which is NOT our favourite model of MESH. Though you may think T3 bacbone is fast enough forever, a single giga-bit backbone won't have enough capacity in the near future. The ARPANet did not have a single backbone. It *was* a mesh. Only to make the average diameter O(N^(1/3)), that is, to have topology related to three dimensional space, you need a lot of links (the number depends on locality of traffic) with length (size of the Earth)*O(N^(1/6)) which is transcontinentally lengthy. So, practically speaking, the diameter will be O(N^(1/2)). This result, which you state without an analysis, does not sound plausible. I agree that in a *fully planar* graph, the diameter will be O(sqrt(N)). (For those who don't follow this, let's look at a very simplistic model, of a two-dimensional representation of a graph, with a square lattice of nodes at fixed spacing, each connected to its four nearest neighbours. To further simplify things, lets make this graph circular, and the physical (i.e. representational) and arc (i.e. graph metric) distance between each node one. The physical area of this graph will N, so the physical diameter of this graph will be sqrt(N/pi); i.e. 
Anyway, what does the traffic model (the "locality of traffic") have to
do with the diameter of the graph, a fixed property of the graph? The
diameter is the diameter. If you're talking about the average path
length for a given traffic pattern, fine, but then you need not only a
model of the link probability (i.e. the probability of there being a
link between nodes i and j), but the traffic density (i.e. the number of
packets from node i to node j).

	Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa24537; 13 Feb 94 16:57 EST
Received: from pizza by PIZZA.BBN.COM id aa25822; 13 Feb 94 16:34 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa25818; 13 Feb 94 16:32 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa23976; 13 Feb 94 16:29 EST
Received: by ginger.lcs.mit.edu id AA19911; Sun, 13 Feb 94 16:24:03 -0500
Date: Sun, 13 Feb 94 16:24:03 -0500
From: Noel Chiappa
Message-Id: <9402132124.AA19911@ginger.lcs.mit.edu>
To: jnc@ginger.lcs.mit.edu, mohta@necom830.cc.titech.ac.jp
Subject: Re: New datagram mode
Cc: nimrod-wg@BBN.COM

    > To me, a flow is a series of packets which belong together

    Flow is a unit of resource reservation.

That's one aspect of a flow, but not the only one. The path of the flow
is another, one that Nimrod deals with.

    > Yes, the forwarding based on locators would happen only in "active
    > routers".

    So my proposal is to call your "active router" just "router". The
    rest is just bridges.

The problem is that for most of the world, the term "bridge" already has
a meaning; i.e. something that forwards a packet based on the *local
network header*. Since all the devices here are looking at the
*internetwork* header, use of the term "bridge" will, I feel, prove
confusing.

    > But it is only performed once (so it does not even add any delay),
    > and the cost [of setting up a DMF is] shared between any number of
    > packets.

    Unless you assume connected UDP or things like that, the connection
    won't be reused frequently enough.

A DMF can be thought of as approximately equivalent to a routing table
entry. Its use is not limited to a single source host, or anything like
that. Once set up, it will be used as often as a routing table entry
will, with the same sharing of overhead.

    I don't think any scheme with thinning can find the "optimal" path.
    With my scheme, the source can choose the best path using the
    available routing data.

That "available routing data" will have been thinned, inevitably.

    It still probably won't be as good as the new datagram mode, since
    you will have to head for the particular border router (named by its
    EID) in your locator, not the closest one into that area (which may
    or may not be the optimal entry router for the ultimate destination,
    sigh, another complication).
    If you want to have some preference, it is easy to do with DNS
    completely statically.

Such a static ordering does not work, since the optimality of a given
border router (from a set of border routers) is dynamic, and depends on
the location of the source of the traffic.

    > As far as I know, most proposed resource allocation architectures
    > allow reserved, but unused, bandwidth to be given to "capacity
    > available" traffic.

    If you reserve resources only to forward connectionless packets,
    there will eventually be NO traffic which uses the "best effort"
    strategy.

I'm not proposing that we reserve resources only for connectionless
packets. Use of DMF's has the side benefit of being a simple mechanism
to see that connectionless packets get included in a resource allocation
architecture without a lot of special mechanism.

    Connectionless packets should be sent as the "capacity available"
    traffic.

Perhaps, but if the available bandwidth is all allocated, and used, by
user flows, the datagram traffic will get dropped. That's why I'd prefer
to see the datagrams guaranteed a certain minimum % of the link.

	Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa14965; 14 Feb 94 3:42 EST
Received: from pizza by PIZZA.BBN.COM id aa27703; 14 Feb 94 3:22 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa27699; 14 Feb 94 3:20 EST
Received: from necom830.cc.titech.ac.jp by BBN.COM id aa12060; 14 Feb 94 3:19 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Mon, 14 Feb 94 17:14:03 +0900
From: Masataka Ohta
Return-Path:
Message-Id: <9402140814.AA28681@necom830.cc.titech.ac.jp>
Subject: Re: New datagram mode
To: Noel Chiappa
Date: Mon, 14 Feb 94 17:14:01 JST
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
In-Reply-To: <9402132124.AA19911@ginger.lcs.mit.edu>; from "Noel Chiappa" at Feb 13, 94 4:24 pm
X-Mailer: ELM [version 2.3 PL11]

> > > To me, a flow is a series of packets which belong together
> >
> > Flow is a unit of resource reservation.
>
> That's one aspect of a flow, but not the only one. The path of the flow
> is another, one that Nimrod deals with.

DMF is not the path of the flow.

> > So my proposal is to call your "active router" just "router". The
> > rest is just bridges.
>
> The problem is that for most of the world, the term "bridge" already
> has a meaning;

I think your "inactive router" matches your definition of "bridge".

> i.e. something that forwards a packet based on the *local network
> header*. Since all the devices here are looking at the *internetwork*
> header, use of the term "bridge" will, I feel, prove confusing.

The problem is that your flow ID for DMF is pretty much like a *local
network header*. As you say:

> A DMF can be thought of as approximately equivalent to a routing table
> entry.

it's like an interface ID.

> > the cost [of setting up a DMF is] shared between any number of
> > packets.
> >
> > Unless you assume connected UDP or things like that, the connection
> > won't be reused frequently enough.
>
> A DMF can be thought of as approximately equivalent to a routing table
> entry. Its use is not limited to a single source host, or anything like
> that. Once set up, it will be used as often as a routing table entry
> will, with the same sharing of overhead.

"Once set up", it will be OK. So, my point is that, if you don't provide
a large number of DMFs, load will concentrate. If you provide a large
number of DMFs, they won't be reused frequently enough, so that the
set-up cost will be significant.

> > With my scheme, the source can choose the best path using the
> > available routing data.
> > That "available routing data" will have been thinned, inevitably. Of course. > If you want to have some preference, it is easy to do with DNS completely > statically. > > Such a static ordering does not work, since the optimality of a given border > router (from a set of border routers) is dynamic, It's merely a preference, optimization and does not have to work accurately. > and depends on the loction of the source of the traffic. Such location dependence is completely detectable by the routing data availale to the source dynamically. > If you reserve resources only to forward connectionless packets, there > will be eventually NO traffic which use "best effort" strategy. > > I'm not proposing that we reserve resources only for connectionless packets. No, of course. You are proposing that you reserve resources even for ^^^^ DMFs for connectionless packets. > Use of DMF's has the side benefit of being a simple mechanism to see that > connectionless packets get included in a resource allocation architecture > without a lot of special mechanism. What? Didn't you assume DMF shared? Don't you think there are a lot of resource requirements? Do you want to provide thousnads of DMFs between the same two routers to accomodate various resource requirements? > Connectionless packets should be sent as the "capacity availale" traffic. > > Perhaps, but if the avilable bandwith is all allocated, and used, by user > flows, the datagram traffic will get dropped. That's why I'd prefer to see > the datagrams guaranteed a certain minimum % of the link. OK, you think no bandwidth could be reserved for "capacity available" packets, which is a common misunderstanding. If there is 100Mbps link and 10 requests for 10Mbps communication, you think all of the requueests should be granted. But, no, you don't have to allocate all the bandwidth for bandwidth assured communication. Some of the bandwidth, say 50%, should always be reserved for "capacity availale" traffic. Still, it is possible for some carriers to have several grades of "capacity availale" communication. That is, reserve 20% of bandwidth for grade 1 and 2 communication. If the band is full, grade 1 communication is allowed to use another 20 %. But, still, no minimum bandwidth is reserved even for grade 1 "capacity available" communication, which is completely different from assigning resource to DMFs. That is, some resource should be reserved for connectionless communication, which does not mean each DMF reserve bandwidth.. Masataka Ohta   Received: from PIZZA.BBN.COM by BBN.COM id aa18519; 14 Feb 94 4:38 EST Received: from pizza by PIZZA.BBN.COM id aa27853; 14 Feb 94 4:08 EST Received: from BBN.COM by PIZZA.BBN.COM id aa27849; 14 Feb 94 4:04 EST Received: from necom830.cc.titech.ac.jp by BBN.COM id aa15419; 14 Feb 94 4:02 EST Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Mon, 14 Feb 94 17:56:44 +0900 From: Masataka Ohta Return-Path: Message-Id: <9402140856.AA28765@necom830.cc.titech.ac.jp> Subject: Re: Analysis of DMF's in new datagram mode To: Noel Chiappa Date: Mon, 14 Feb 94 17:56:42 JST Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM In-Reply-To: <9402132053.AA19785@ginger.lcs.mit.edu>; from "Noel Chiappa" at Feb 13, 94 3:53 pm X-Mailer: ELM [version 2.3 PL11] > There is no such thing as "a potential DMF". > > A "potential DMF" is a DMF which can be set up (presumably in reponse to > demand), but has not been. (The difference is important since there is not > state in the routers along the path of the potential DMF.) 
Received: from PIZZA.BBN.COM by BBN.COM id aa18519; 14 Feb 94 4:38 EST
Received: from pizza by PIZZA.BBN.COM id aa27853; 14 Feb 94 4:08 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa27849; 14 Feb 94 4:04 EST
Received: from necom830.cc.titech.ac.jp by BBN.COM id aa15419; 14 Feb 94 4:02 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Mon, 14 Feb 94 17:56:44 +0900
From: Masataka Ohta
Return-Path:
Message-Id: <9402140856.AA28765@necom830.cc.titech.ac.jp>
Subject: Re: Analysis of DMF's in new datagram mode
To: Noel Chiappa
Date: Mon, 14 Feb 94 17:56:42 JST
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
In-Reply-To: <9402132053.AA19785@ginger.lcs.mit.edu>; from "Noel Chiappa" at Feb 13, 94 3:53 pm
X-Mailer: ELM [version 2.3 PL11]

> > There is no such thing as "a potential DMF".
>
> A "potential DMF" is a DMF which can be set up (presumably in response
> to demand), but has not been. (The difference is important since there
> is no state in the routers along the path of the potential DMF.)

Suppose that 50% (any nonzero value is OK) of DMFs are potential. Then,
50% of the connectionless traffic (if all DMFs are equally likely to be
used; if not, the figure changes, but it is unlikely to be negligibly
low) needs to instantiate a potential DMF first, which means that the
average latency of connectionless traffic is as slow as connection
setup. (If a fraction p of the traffic must trigger setup, the average
latency is roughly p * T_setup + T_forward; for any fixed p > 0 the
setup term dominates.) Thus, there is no such thing as "a potential
DMF". Q.E.D.

BTW, for the potential DMF to be meaningful for order analysis, that is,
beyond a constant factor, almost 100% of DMFs must be potential.

> > At the top level, I think "I" may be as large as 10,000 or 100,000.
>
> I seriously doubt it would be that large; there aren't that many
> country codes in the world phone system, and it seems to work just
> fine.

So are DNS top-level domains. So what?

> If it does get that large, you need to introduce another layer of
> hierarchy; Kleinrock/Kamoun type analysis will show that the resulting
> routing inefficiency is minimal.

Then, Kleinrock/Kamoun type analysis is broken.

> system gets too large. I just refer to it as the "top" layer.

OK. Then, my statement should be stated as:

	That is, there should be a lot of top level areas and they
	                                 ^^^
	should be connected with a lot of links. Your configuration,
	which assumes small areas, does not allow such a configuration.

> > > This will move us off the O(log N) point, and toward the O(sqrt N)
> > > point. However, without a probability model, and a lot of math (or
> > > simulation), neither of which I have time for, it's impossible to
> > > say how far.
> >
> > It is O(sqrt N). Isn't it obvious?
>
> No, it's not. As soon as you start getting *some* non-local links, the
> diameter of the graph is greatly reduced,

As I stated, yes.

> and I expect you will find there is a relationship between the
> diameter and the average path length.

The problem is that the number of such links depends on the locality of
traffic, which affects the average path length.

> Do you in fact have a probability model which you are using for Pij,
> and a simulation based on it to show average path length, which leads
> you to the O(sqrt N) answer, for graphs which are neither planar nor
> fully random?

It depends on the traffic pattern. With the assumption that the
percentage of truly global traffic is nonzero, it is O(sqrt(N)).

> > > My guess, based on looking at real-world networks like the
> > > ARPANet, is that it will be pretty close to the results for true
> > > fully random graphs, in future real networks.
> >
> > That's only true if there were a single, high-bandwidth backbone,
> > which is NOT our favourite model of MESH. Though you may think a T3
> > backbone is fast enough forever, a single gigabit backbone won't have
> > enough capacity in the near future.
>
> The ARPANet did not have a single backbone. It *was* a mesh.

Then, it should have had the O(sqrt(N)) property. But it means nothing
unless N is large enough.

> > Only to make the average diameter O(N^(1/3)), that is, to have a
> > topology related to three-dimensional space, you need a lot of links
> > (the number depends on the locality of traffic) with length
> >
> >	(size of the Earth) * O(N^(1/6))
> >
> > which is transcontinentally lengthy. So, practically speaking, the
> > diameter will be O(N^(1/2)).
>
> This result, which you state without an analysis, does not sound
> plausible.
It is obvious to me, but I don't mind if you can show some other
result...

> I agree that in a *fully planar* graph, the diameter will be
> O(sqrt(N)).

You can convert a locally non-planar, globally planar graph into a fully
planar graph without changing the number of vertices and links beyond a
constant factor. Simply replace each non-planar part with a single
vertex and that's it. Unless a non-planar part contains more than O(1)
vertices or links, which would mean that the non-planarity is not local,
the O(sqrt(N)) result won't be affected.

> This simple analysis doesn't prove it's true for all planar graphs, of
> course,

You can't, as the planarity assumption is not enough. You also need the
distribution pattern of vertices on the plane and the maximum allowable
distance between vertices.

> However, I don't agree that it will take "a lot of links" to improve
> it, but don't have the mathematical tools, or the time to do a
> simulation, to prove it. I am satisfied with results gained in
> operation of a real mesh network, the ARPAnet.

It means nothing unless N is large enough.

> If you're talking about the average path length for a given traffic
> pattern, fine,

Then, I'm fine, too.

> but then you need not only a model of the link probability (i.e. the
> probability of there being a link between nodes i and j), but the
> traffic density (i.e. the number of packets from node i to node j).

I think we have agreed on the point.

						Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa03630; 15 Mar 94 15:16 EST
Received: from pizza by PIZZA.BBN.COM id aa01886; 15 Mar 94 14:01 EST
Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa01882; 15 Mar 94 13:57 EST
To: nimrod-wg@BBN.COM
Subject: Draft Nimrod Architecture Document
Date: Tue, 15 Mar 94 13:54:50 -0500
From: Isidro Castineyra

I am enclosing below for your comment the draft of the architecture
document we (Noel and the three of us at BBN) have been working on. The
document is very much a working draft: everything in it is open for
discussion. Some time next week, after people have had a chance to take
a look at it, we need to decide how we want to spend the two sessions we
have at IETF.

Regards,

Isidro

Isidro Castineyra (isidro@bbn.com)   Bolt Beranek and Newman, Incorporated
(617) 873-6233                       10 Moulton Street, Cambridge, MA 02138 USA

-----------------

Nimrod Working Group                                       I. Castineyra
Internet Draft                                             J. N. Chiappa
March 1994                                                       C. Lynn
                                                           R. Ramanathan
                                                           M. Steenstrup
Expires 1 September 1994

                   The Nimrod Routing Architecture

Status of this Memo

This document is an Internet Draft. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas, and
its Working Groups. Note that other groups may also distribute working
documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months.
Internet Drafts may be updated, replaced, or obsoleted by other
documents at any time. It is not appropriate to use Internet Drafts as
reference material or to cite them other than as a ``working draft'' or
``work in progress''. Please check the 1id-abstracts.txt listing
contained in the internet-drafts Shadow Directories on ds.internic.net,
nic.nordu.net, ftp.nisc.sri.com, or munnari.oz.au to learn the current
status of any Internet Draft.

This Internet Draft will be submitted to the RFC editor as an
architecture specification. Distribution of this Internet Draft is
unlimited. Please send comments to nimrod-wg@bbn.com.

Abstract

We present a scalable internetwork routing architecture, called Nimrod.
The Nimrod architecture is designed to accommodate an internetwork of
arbitrary size and with heterogeneous service requirements and
restrictions, and to admit incremental deployment throughout an
internetwork. The key to Nimrod's scalability is its ability to
represent and manipulate routing-related information at multiple levels
of abstraction.

Contents

1 Introduction
  1.1 Constraints of the Internetworking Environment
  1.2 The Basic Routing Functions
  1.3 Scalability Features
      1.3.1 Clustering
      1.3.2 Restricting Information Distribution
      1.3.3 Selecting Feasible Routes
      1.3.4 Caching
      1.3.5 Limiting Forwarding Information
  1.4 The Internet
      1.4.1 Deployment
2 Architectural Overview
  2.1 Endpoints
  2.2 Maps
      2.2.1 Connectivity Specifications
  2.3 Nodes and Arcs
      2.3.1 Internal Maps
  2.4 Locators
3 Physical Realization
  3.1 Contiguity
  3.2 Multiple Locator Assignment
  3.3 Non-Nimrod Physical Elements
4 Forwarding
  4.1 Indicating Policy
  4.2 Trust
  4.3 Flow Mode
  4.4 Basic Topology Entity Chain (BTEC) Mode
  4.5 Datagram Mode
5 Renumbering
6 Auxiliary Functionality
  6.1 Mobility
      6.1.1 Effects of Mobility
      6.1.2 Approaches
      6.1.3 Summary
  6.2 Multicasting
      6.2.1 Goals and Requirements
      6.2.2 Approaches
      6.2.3 Summary
  6.3 Network Management
  6.4 Security

1 Introduction

Nimrod is a scalable routing architecture designed to accommodate a
continually expanding and diversifying internetwork. First suggested by
Chiappa in [1], the Nimrod architecture has undergone revision and
refinement through the efforts of the Nimrod working group of the IETF.

The goals of Nimrod are as follows:
1. Nimrod should support an internetwork of arbitrary size by providing
   mechanisms to control the amount of routing-related information that
   must be known globally throughout an internetwork.

2. Nimrod should provide service-specific routing in the presence of
   multiple constraints imposed by both service providers and users.

3. Nimrod should be incrementally deployable throughout an internetwork
   and should not require modifications to the existing IP packet
   format.

We have designed the Nimrod architecture to meet these goals. The key
features of this architecture include:

1. Representation of internetwork connectivity and services in the form
   of maps at multiple levels of abstraction.

2. Localized route generation and selection based on maps and session
   service requirements.

3. Source-directed as well as destination-directed packet forwarding.

We describe these features in more detail in sections 1.2 and 1.3 below.

Nimrod is a general routing architecture that can be applied to routing
both within a single routing domain and among multiple routing domains.
As a general internetwork routing architecture designed to deal with
increased internetwork size and diversity, Nimrod is equally applicable
to both the TCP/IP and OSI environments.

In this document, we present the Nimrod architecture. We begin with a
discussion of the requirements of internetworking and a brief overview
of Nimrod. In the second section, we delve into the details of the
Nimrod architecture. In the third section, we describe the routing
functionality supported by the Nimrod architecture. A companion document
is devoted to an analysis of Nimrod deployment strategies and
compatibility with existing internetwork protocols. Parts of the third
and fourth sections are missing in this draft.

1.1 Constraints of the Internetworking Environment

Internetworks are growing and evolving systems, in terms of number,
diversity, and interconnectivity of service providers and users, and
therefore require a routing architecture that can accommodate
internetwork growth and evolution. A complicated mix of factors such as
technological advances, political alliances, and service supply and
demand economics will determine how an internetwork will change over
time. However, correctly predicting all of these factors and all of
their effects on an internetwork may not be possible. Thus, the
flexibility of an internetwork routing architecture is its key to
handling unanticipated requirements.

In developing the Nimrod architecture, we first assembled a list of
internetwork environmental constraints which have implications for
routing. This list, enumerated below, includes observations about the
present Internet; it also includes predictions about internetworks five
to ten years in the future.

1. The Internet will grow to include O(10^9) networks and will retain
   the general organizational structure of backbone, regional, and local
   networks.

2. The number of internetwork users may be unbounded.

3. The capacity of internetwork resources is steadily increasing, but so
   is the demand for these resources.

4. Routers and hosts have finite processing capacity and finite memory,
   and networks have finite transmission capacity.

5. Internetworks comprise different types of communications
   media---including wireline and wireless, terrestrial and satellite,
   shared multiaccess and point-to-point---with different service
   characteristics in terms of throughput, delay, error and loss
   distributions, and privacy.
6. Internetwork elements---networks, routers, hosts, and processes---may
   be mobile.

7. The frequency at which an entity moves is usually inversely
   proportional to the size of the entity, e.g., individual hosts are
   likely to move around more frequently than entire networks.

8. A session may include m sources and n destinations, where m and n are
   greater than one.

9. Service providers will specify offered services and restrictions on
   access to those services. Restrictions may be in terms of when a
   service is available, how much the service costs, which users may
   subscribe to the service and for what purposes, and how the user must
   shape its traffic in order to receive a service guarantee.

10. Users will specify session service requirements which may vary
    widely among sessions. These specifications may be in terms of
    requested qualities of service, what they are willing to pay for
    these services, when they want these services, and which providers
    they wish to use.

11. Service providers and users have a synergistic relationship. That
    is, as users develop more applications with special service
    requirements, service providers will respond with the services to
    meet these demands. Moreover, as service providers deliver more
    services, users will develop more applications that take advantage
    of these services.

12. Support for varied and special services will require more
    processing, memory, and transmission bandwidth on the part of both
    the service providers offering these services and the users
    requesting these services. Hence, many routing-related activities
    will likely be performed not by routers and hosts but rather by
    independent devices acting on their behalf to store, process, and
    distribute routing-related information.

13. Users requiring specialized services (e.g., high guaranteed
    throughput) will usually be willing to incur some delay in obtaining
    these services.

14. Service providers are reluctant to introduce complicated protocols
    into their networks, because they are more difficult to manage.

15. Vendors are reluctant to implement complicated protocols in their
    products, because they take longer to develop.

Collectively, these constraints imply that a successful internetwork
routing architecture must support special features, such as
service-specific routing and component mobility in a large and changing
internetwork, using simple procedures that consume a minimal amount of
internetwork resources. We believe that the Nimrod architecture meets
these goals, and we justify this claim in the remainder of this
document.

1.2 The Basic Routing Functions

Nimrod supports distribution of link-state routing information in the
form of maps, localization of route generation and selection at session
sources and destinations, and specification of packet forwarding at the
sources or destinations.

Link-state routing information distribution permits each service
provider to have control over the services it offers, through both
distributing restrictions in and restricting distribution of its routing
information. It also gives the user (normally acting through an
``agent'') control over the routes generated and selected using the maps
and the session service requirements. Restricting distribution of
routing information serves to reduce the amount of routing information
maintained throughout an internetwork and to keep certain routing
information private.
However, it also leads to inconsistent routing information databases
throughout an internetwork, as not all routing information databases
will be complete or identical. We expect routing information database
inconsistencies to occur often in a large internetwork, regardless of
whether privacy is an issue. The reason is that we expect some devices
to be incapable of maintaining the complete set of routing information
for the internetwork. These devices will select only some of the
distributed routing information for storage in their databases.

With Nimrod, route generation and selection is a local matter under the
control of the users (normally acting through agents) and does not
require global coordination among routers. Thus, we have placed the
responsibility for and the cost of route generation and selection on the
users of a route. Locally-controlled route selection also allows
incremental deployment of and experimentation with new routing
algorithms, as route selection procedures need not be the same at each
location.

Nimrod packet forwarding permits a user (normally acting through an
agent) to exercise control in the forwarding of packets. The user may be
either a session source or destination and may specify the forwarding
path in as much detail as the maps permit. Source- or
destination-controlled packet forwarding enables freedom from forwarding
loops, even in the presence of routing information that is not
consistent throughout the internetwork.

We note that the Nimrod architecture and Inter-Domain Policy Routing
(IDPR) [2] have the above features in common. In developing the Nimrod
architecture, we have drawn upon experience gained with IDPR, and we
expect to be able to make use of portions of the IDPR protocols and
procedures in designing the Nimrod protocols.

1.3 Scalability Features

Nimrod must provide service-specific routing in arbitrarily large
internetworks and hence must employ mechanisms that help to contain the
amount of internetwork resources consumed by the routing functions. We
provide a brief synopsis of each such mechanism below. However, we note
that arbitrary use of these mechanisms does not guarantee that a
scalable routing architecture will result. Rather, when used wisely,
these mechanisms enable one to create a scalable routing architecture.

1.3.1 Clustering

The Nimrod architecture is capable of representing internetwork
connectivity and services at multiple levels of abstraction. Abstraction
of details reduces the amount of information required for routing. The
abstraction hierarchy is formed through iterative clustering of
internetwork entities, beginning with hosts, routers, and networks.
However, Nimrod does not specify a cluster formation algorithm but
instead permits selection of the clustering criteria to apply.
Internetwork entities may be clustered according to relationships among
them, such as ``administered by the same authority'', or so as to
satisfy some objective function, such as ``minimize the expected amount
of forwarding information at each router''. New clusters may be formed
by clustering together existing clusters. However, the same clustering
criteria need not be applied at each level. Repeated clustering of
entities produces a hierarchy of clusters with a unique universal
cluster that contains all others.

All entities within a cluster must satisfy at least one relation, namely
connectivity. That is, if all entities within a cluster are operational,
then any two entities within the cluster must be connected by at least
one path that lies entirely within that cluster. This condition
prohibits the formation of certain types of separated clusters, such as
the following. Consider the clustering relation ``belonging to the same
administrative body''. Suppose that a company has two branches located
at opposite ends of a country and that these two branches must
communicate over a public network not owned by the company. Then the two
branches cannot be members of the same cluster, unless that cluster also
includes the public network connecting them.
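(Illustrative sketch, not part of the draft: the connectivity condition
stated above, checked by breadth-first search on the subgraph induced by
a candidate cluster. All names are invented. Python, standard library
only.)

    from collections import deque

    def valid_cluster(members, links):
        """True if the subgraph induced by `members` is connected."""
        members = set(members)
        adj = {m: set() for m in members}
        for a, b in links:
            if a in members and b in members:   # only intra-cluster links count
                adj[a].add(b)
                adj[b].add(a)
        start = next(iter(members))
        seen = {start}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v] - seen:
                seen.add(w)
                queue.append(w)
        return seen == members

    # The two-branch company of the example: no intra-cluster path between
    # the branches, so the cluster is invalid unless the public network is
    # also a member.
    links = [("branch1", "publicnet"), ("publicnet", "branch2")]
    print(valid_cluster({"branch1", "branch2"}, links))               # False
    print(valid_cluster({"branch1", "branch2", "publicnet"}, links))  # True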
A given clustering applied to an internetwork results in an organization
related to but distinct from the physical organization of the component
hosts, routers, and networks. When the clustering is superimposed over
the physical internetwork entities, the cluster boundaries may not
necessarily coincide with host, router, or network boundaries. Nimrod
performs its routing functions with respect to the abstraction hierarchy
resulting from a clustering, not with respect to the physical
realization of the internetwork. In fact, Nimrod need not even be aware
of the physical components of an internetwork. Network management
functions are the only ones that require knowledge of the physical
components of an internetwork.

1.3.2 Restricting Information Distribution

The Nimrod architecture supports restricted distribution of
routing-related information, both to reduce resource consumption
associated with such distribution and to permit information hiding. Each
cluster determines the portions of its routing information to distribute
and the set of entities to which to distribute this information. We
suggest that each cluster automatically advertise, to its siblings
(i.e., those clusters with a common parent), information that applies to
the cluster as a whole. In response to demand, the cluster may advertise
information about specific portions of the cluster or information that
applies only to specific users. Moreover, recipients of routing-related
information may selectively discard this information. For example, an
entity need not retain information for a cluster that denies it access
to services.

1.3.3 Selecting Feasible Routes

Generating routes that satisfy multiple constraints is usually an
NP-complete problem and hence a computationally intensive process. With
Nimrod, only those users that require routes with special services need
assume the computational load associated with generation and selection
of such routes. Moreover, the Nimrod architecture allows individual
entities to choose their own route generation and selection algorithms.
To reduce the amount of processing required for route generation, one
should choose an algorithm that produces feasible but not necessarily
optimal routes.

1.3.4 Caching

The Nimrod architecture encourages caching of acquired routing-related
information in order to reduce the amount of resources consumed and
delay incurred in obtaining the information in the future. The set of
routes generated as a by-product of generating a particular route is one
example of routing-related information that is amenable to caching;
future requests for any of these routes may be satisfied directly from
the route cache. However, as with any caching scheme, the cached
information may become stale and its use may result in poor quality
routes. Hence, one must consider the expected duration of the usefulness
of different types of routing-related information in determining whether
to cache the information and for how long.
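(Illustrative sketch, not part of the draft: a route cache with
per-entry lifetimes, one simple way to bound how long potentially stale
information is used. The names and the lifetime are invented. Python.)

    import time

    class RouteCache:
        def __init__(self):
            self._entries = {}   # (src, dst) -> (route, expiry time)

        def put(self, src, dst, route, ttl_seconds):
            """Cache a route; the lifetime is chosen per information type."""
            self._entries[(src, dst)] = (route, time.monotonic() + ttl_seconds)

        def get(self, src, dst):
            """Return a cached route, or None if absent or stale."""
            entry = self._entries.get((src, dst))
            if entry is None:
                return None
            route, expiry = entry
            if time.monotonic() > expiry:
                del self._entries[(src, dst)]   # stale: drop, don't reuse
                return None
            return route

    cache = RouteCache()
    cache.put("srcA", "dstB", ["hop1", "hop2", "hop3"], ttl_seconds=300)
    print(cache.get("srcA", "dstB"))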
1.3.5 Limiting Forwarding Information

The Nimrod architecture supports two separate mechanisms for containing
the amount of forwarding information that must be maintained per router.
The first mechanism is the ability to multiplex, over a single path or
tree, multiple traffic flows with similar service requirements
originating in a single source cluster and destined for the same
destination clusters. The second mechanism is the installation and
retention of forwarding information only for active traffic flows.

With Nimrod, the service providers and users share responsibility for
the amount of forwarding information in an internetwork. Users have
control over the establishment of paths, and service providers have
control over the maintenance of paths. This approach is different from
that of the current Internet, in which the routing procedure itself
establishes the forwarding information in routers, based upon entity
reachability and not upon demand for communication with a given entity.

1.4 The Internet

The current IP-based routing architecture has served the Internet well
for many years. However, this architecture places bounds on the size and
structure of the internetwork that it can accommodate. All currently
deployed IP routing procedures express user reachability in terms of IP
address, and all packets are forwarded according to destination IP
address. IP addresses have a fixed internal organization---a portion
containing network number and a portion containing subnet and host
number, whose lengths depend on address class---and a fixed overall
length of 32 bits. Addresses are allocated in blocks by network number.
Together, these characteristics create the following problems in a large
internetwork:

1. Routing and forwarding information explosion. As Internet routing
   information includes reachability at the IP network level and packet
   forwarding is based on destination IP address, the amount of memory
   required to store routing and forwarding information grows at the
   rate at which new networks are added to an internetwork. In the
   Internet, this rate is approximately an annual doubling. Moreover,
   the amount of processing and transmission bandwidth required to
   handle the routing information also increases with this internetwork
   growth.

2. Inefficient use of IP address space. As IP addresses are assigned in
   blocks according to network number, only a small subset of all the
   host addresses associated with a given network number may be in use
   at one time. Nevertheless, the unused host addresses associated with
   that network number are not available to other hosts on the other
   networks in an internetwork and hence constitute wasted IP address
   space.

3. Insufficient IP address space. Regardless of the internal
   organization of the IP address space, there is a hard limit of 2^32
   possible distinct IP addresses. This will not be enough to
   accommodate all of the users associated with the 10^9 networks
   projected for the future Internet.

The IETF community has devoted considerable effort to developing
solutions to the problems associated with IP-based addressing in large
internetworks. In fact, the IETF recently formed a group whose charter
is to specify the requirements for the next generation of the IP
protocol and to review proposed solutions.
The current proposed solutions to the IP addressing problems include but
are not limited to reassignment of existing IP addresses for more
efficient use of IP address space, increasing the size and changing the
structure of the IP address space, and replacing IP addresses with ISO
NSAP addresses. These solutions vary according to the amount of
implementation and deployment effort required and the period over which
they would serve their purpose. None of the proposed solutions has yet
been selected for integration into the Internet.

In developing the Nimrod routing architecture, we have been more
concerned with providing routing in a large and dynamic internetwork
than with restructuring the existing IP address space. We believe that
an excellent solution to the IP addressing problems will emerge from the
IETF effort, and in a companion document we show that the Nimrod routing
architecture is compatible with each of the contending IP addressing
solutions.

1.4.1 Deployment

The protocols based on the Nimrod architecture should be incrementally
deployable, permitting a gradual and manageable introduction over time
and throughout an internetwork. These protocols could be deployed in
isolated areas within the Internet, for example within selected
administrative domains. To reach hosts external to a Nimrod domain,
traffic from internal hosts would use Nimrod forwarding within the
domain and normal IP forwarding from the domain exit point to the
external host. Routing information about such a Nimrod domain would be
distributed outside of the domain using an existing inter-domain routing
protocol such as IDRP [3].

Although the Internet will ultimately require a new IP packet format
that includes new forwarding information as well as larger addresses,
the Nimrod protocols could be deployed in the current Internet without
changing the current IP packet format. Nimrod's packet forwarding
requires packets to carry location information not supplied by the
current IP packet header format. However, this additional forwarding
information could appear in an encapsulating header added by Nimrod
routers acting on behalf of hosts and interpreted only by those routers.
Refer to section 4 for a complete description of the strategies for
deploying Nimrod in an internetwork.

2 Architectural Overview

Nimrod is a hierarchical, map-based routing architecture that has been
designed to support a wide range of user requirements. It is implemented
as a set of protocols and distributed databases. Nimrod's main function
is to manage, in a scalable fashion, how much information about the
network is required to choose a route for a traffic stream, given the
stream's description and requirements (both quality of service
requirements and policy requirements); in other words, to manage the
trade-off between the amount of information about the network and route
quality. The following sections describe the basic architectural
concepts used in Nimrod.

2.1 Endpoints

The basic entity in Nimrod is the endpoint. An endpoint represents a
user of the network layer---for example, a transport layer entity. Each
endpoint has at least one endpoint identifier (EID). Any given EID
corresponds to a single endpoint. EIDs are globally unique, relatively
short bit strings---for example, small multiples of 64 bits. EIDs have
no topological significance whatsoever. For ease of management, EIDs
might be organized hierarchically, but this is not required.

EIDs have a second form, the endpoint label (EL). ELs are ASCII strings
of unlimited length, structured to be used as keys in a distributed
database (much like DNS names). Information about an endpoint---for
example, how to reach it---can be obtained by querying this distributed
database---the Nimrod Locator Server (NLS)---using the endpoint's label
as key.
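(Illustrative sketch, not part of the draft: the EL-keyed lookup just
described, with an in-memory dictionary standing in for the distributed
NLS. All labels, EIDs, and locators here are invented; locators are
explained in section 2.4. Python.)

    NLS = {
        # endpoint label (EL):   EID and current locator(s)
        "example.site-a.host1": {"eid": "0x1F3A", "locators": ["A.B.h1"]},
        "example.site-b.host2": {"eid": "0x2B90", "locators": ["D.E.h2"]},
    }

    def resolve(el):
        """Look up an endpoint by its label, returning its EID and locators."""
        record = NLS.get(el)
        if record is None:
            raise KeyError("no NLS record for " + repr(el))
        return record["eid"], record["locators"]

    print(resolve("example.site-a.host1"))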
2.2 Maps

The basic data structure used for routing is the map. A map expresses
the available connectivity between different points of a network.
Different maps can represent the same region of a physical network at
different levels of detail. A map is a graph composed of nodes and arcs.
Properties of nodes and arcs are contained in attributes associated with
them. Nimrod necessarily includes languages to specify connectivity and
to describe maps.

Maps are used by route servers to generate routes. In general, it is not
required that different route servers have consistent maps. Route
servers can be co-located with routers or be independent entities. Each
host has access to one or more route servers.

2.2.1 Connectivity Specifications

By connectivity between two points we mean the available services and
the restrictions on their use. The following are examples of
connectivity specifications:

o ``Between these two points, there exists best-effort service with no
  restrictions.''

o ``Between these two points, guaranteed 10 ms delay can be arranged for
  traffic streams whose rate is below 1 Mbyte/sec and that have low
  (specified) burstiness.''

o ``Between these two points, best-effort service is offered as long as
  the traffic originates from and is destined to research
  organizations.''

2.3 Nodes and Arcs

A node represents a region of the physical network. The region of the
network represented by a node can be as large or as small as desired: a
node can represent a continent as well as a process running inside a
host. Moreover, as explained in section 3, a region of the network can
simultaneously be represented by more than one node. A node has zero or
more distinguishable border points to which arcs can be attached.

There are two kinds of arcs: unidirectional and multipoint.

Unidirectional Arcs: A unidirectional arc has two distinguishable
connecting points: a head and a tail. The head and tail of a
unidirectional arc are each connected to a border point of a node. The
presence of a unidirectional arc between two given border points
specifies that traffic can flow between those two points in the
direction indicated by the arc (from tail to head). A unidirectional arc
has connectivity attributes that specify the types of service offered by
that arc, and the restrictions associated with the use of these
services. The border points associated with the head and tail of a
unidirectional arc may belong to the same node---such an arc represents
``transit'' traffic through that node.

Multipoint Arcs: A multipoint arc has two or more distinguishable
connecting points. Each connecting point of a multipoint arc is
connected with a border point of a node. A multipoint arc has
connectivity attributes that specify the types of service offered by
that arc. The presence of a multipoint arc indicates that the services
indicated by that arc's connectivity attributes are offered between any
two border points associated with that arc.

Given a map, the border points connected by an arc can belong to
different nodes or to the same node. When all the border points
connected by a given arc belong to the same node, that arc is said to be
a ``transit'' arc of that node.

The distinguishable components of a map are called basic topological
entities (BTEs): nodes, the border points of nodes, arcs, the connecting
points of arcs, and the connectivity specifications of arcs.
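(Illustrative sketch, not part of the draft: one possible in-memory
representation of a map---nodes with border points, and unidirectional
arcs with a head, a tail, and connectivity attributes---plus the
extraction of a node's transit arcs. All names are invented. Python.)

    class Arc:
        def __init__(self, tail, head, services):
            self.tail = tail          # (node, border point) the arc leaves
            self.head = head          # (node, border point) the arc enters
            self.services = services  # connectivity attributes

    class Map:
        def __init__(self):
            self.nodes = {}           # node name -> list of border points
            self.arcs = []

        def transit_arcs(self, node):
            """Arcs with both connecting points on `node`'s own borders."""
            return [a for a in self.arcs
                    if a.tail[0] == node and a.head[0] == node]

    m = Map()
    m.nodes["a"] = ["p1", "p2"]
    m.nodes["b"] = ["q1"]
    m.arcs.append(Arc(("a", "p1"), ("a", "p2"), "best-effort"))  # transit via a
    m.arcs.append(Arc(("a", "p2"), ("b", "q1"), "best-effort"))  # a to b
    print(len(m.transit_arcs("a")))   # -> 1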
2.3.1 Internal Maps

As part of its attributes, a node can have zero or more internal maps. A
route server can obtain a node's internal maps---or any other of the
node's attributes, for that matter---by requesting that information from
a representative of that node; for example, a route server associated
with that node can be such a representative. A node's representative can
in principle reply with different internal maps to different
requests---because, for example, of security concerns. This implies that
different route servers in the network might have different sets of
internal maps for the same node.

Given a map, a route server can obtain a more detailed map of the
network by substituting one of the map's nodes with one of that node's
internal maps. This process can be continued recursively. Presumably, a
route server would expand nodes in the region of the map of current
interest.

Nimrod defines standard internal maps that are intended to be used for
specific purposes. One standard internal map is the ``transit'' map.
This map consists exclusively of the border points of the node and the
unidirectional and multipoint transit arcs that interconnect the node's
border points. This map specifies the services available between the
border points of a node. It is requested and used when a route server
intends to route traffic *through* a given node. The degree to which a
transit map describes the true capabilities of a given node is
determined by the number and types of arcs included in this map. A
transit map---containing no nodes---cannot be further expanded.

A second standard map is the ``detailed'' map. This map consists of both
nodes and arcs. It is intended to give more detail about the region of
the network represented by the original node.

2.4 Locators

A locator is a string of binary digits that identifies a BTE in a map.
Different BTEs necessarily have different locators. A given BTE is
assigned only one locator. A given physical element of the network might
implement more than one BTE---for example, a router that is part of two
different nodes. Though this physical element might therefore be
associated with more than one locator, each of the BTEs that this
physical element implements has only one locator. Locators specify
*where* a BTE is in the network.

A node is said to own those locators that have as a prefix the locator
of the node. In a node that has an internal map, the locators of all
BTEs in this internal map are prefixed by the locator of the original
node. The locators of a node's border points are also prefixed by the
node's locator.

A locator belongs to a node in a map if the node's locator is, of all
nodes in that map, the longest prefix of that locator. For example,
given a node with locator ABCD whose internal map contains a node with
locator ABCDE, locator ABCDEF belongs to the inner node---the node with
locator ABCDE. Given that the nodes in a map have different locators, a
locator can belong to at most one node in any map.
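(Illustrative sketch, not part of the draft: the longest-prefix
ownership rule of this section, applied to the ABCD/ABCDE/ABCDEF example
from the text. Python.)

    def owner(locator, node_locators):
        """Return the node locator that is the longest prefix of `locator`."""
        best = None
        for node in node_locators:
            if locator.startswith(node) and (best is None or len(node) > len(best)):
                best = node
        return best

    # ABCDEF belongs to the inner node ABCDE, not the enclosing node ABCD.
    print(owner("ABCDEF", ["ABCD", "ABCDE"]))   # -> ABCDE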
For any node, any BTE whose locator is prefixed by that node's locator is either one of the node's border points or part of one of the node's internal maps.

All routing map information is expressed in terms of locators, and routing selections are based on locators. EIDs are not used in making routing decisions---see section 4.

3 Physical Realization

We model the network as composed of physical elements: routers and hosts; and communication links. The links can be either point-to-point or multi-point (e.g., ethernets, X.25 networks, IP-only networks, etc.). A Nimrod router implements the set of Nimrod protocols.

The physical representation of a network has associated with it a Nimrod map. This Nimrod map is a function not only of the physical network, but also of the configured associations between elements (locator assignment) and of the configured connectivity (attributes). Nimrod routers and hosts appear as nodes in a map at the right level of detail. Similarly, links appear as arcs in a map at the right level of detail.

3.1 Contiguity

It is required that locators that share a prefix be assigned to a contiguous region of the network. That is, two elements of the network that have been assigned locators that share a prefix should be connected to each other by elements that themselves have been assigned locators with that prefix.

The main consequence of this requirement, and it is not a trivial one, is that ``you cannot take your locator with you.'' As an example (see figure 1), consider two providers x.net and y.net which appear in a Nimrod map as two nodes with locators A and B. Assume that x.net and y.net are not directly connected. Assume that corporation z.com was originally connected to the first provider. Endpoints within z.com have, therefore, been assigned A-prefixed locators. Corporation z.com decides to change providers---severing its physical connection to x.net. The contiguity requirement implies that after the provider change has taken place, endpoints of corporation z.com will have been assigned B-prefixed locators and that it is not possible for them to receive data destined to A-prefixed locators through y.net, as there exists no direct connection between x.net and y.net.

   +++++++++     +++++++++
   +       +     +       +
   + x.net +     + y.net +
   +       +     +       +
   +       +     +       +
   +++++++++     +++++++++
         *         *
          *       *
           *     *
          +++++++++
          +       +
          + z.com +
          +       +
          +++++++++

   Figure 1: Connectivity after switching providers

This implies, among other things, that caching locators must be done carefully.

The contiguity requirement simplifies routing information exchange: if it were permitted for z.com to receive A-prefixed locators through y.net, it would be necessary that a map that contains node B include information about the existence of a group of A-prefixed locators inside node B. Similarly, a map including node A should include information that the set of A-prefixed locators assigned to z.com cannot be found within A. The more situations like this happen, the more the hierarchical nature of Nimrod is subverted toward ``flat routing.''

The contiguity requirement can also be expressed as ``EIDs are stable, locators are ephemeral.''

3.2 Multiple Locator Assignment

Network elements can be assigned more than one locator. Consider the example of figure 2, which shows a physical network composed of routers (RA, RB, RC, and RD), hosts (HA, HB, and HC), and communication links. The figure also shows the locators assigned to hosts and routers.
In this figure, RA and RB have each been assigned one locator. RC has been assigned locators a.y.r1 and b.d.r1. One of these locators shares a prefix with RA's locator; the other shares a prefix with RB's locator. Hosts HA and HB have each been assigned three locators. HC has been assigned one locator.

     a.t.r1          b.t.r1
       ++              ++
      +RA+************+RB+
       ++              ++
         *            *
          *          *
           *        *
            *      *
             *    *
              ++
             +RC+ a.y.r1
              ++  b.d.r1
               *
    ***************************
      *          *           *
     ++         ++          ++
    +HA+       +RD+ c.r1   +HB+
     ++         ++          ++
   a.y.h1        *         a.y.h2
   b.d.h1        *         b.d.h2
   c.h1          *         c.h2
        ********************
                 *
                ++
               +HC+ c.h3
                ++

    Figure 2: Multiple Locators

Many different Nimrod maps for this network are possible. Depending on what communication paths have been set up between points that do not share a prefix, different maps result. A possible Nimrod map for this network is given in figure 3.

        a                  b                 c
 +++++++++++++++    +++++++++++++++    +++++++++++++++++
 +             +    +             +    +               +
 +  a.t        +    +  b.t        +    +               +
 +  ++++       +    +  ++++       +    +               +
 +  +  +*******+****+*+  +        +    +               +
 +  ++++       +    +  ++++       +    +               +
 +   *         +    +   *         +    +               +
 +  ++++       +    +  ++++       +    +               +
 +  +  +       +    +  +  +       +    +               +
 +  ++++ a.y   +    +  ++++ b.d   +    +               +
 +             +    +             +    +               +
 +++++++++++++++    +++++++++++++++    +++++++++++++++++

 Figure 3: Nimrod Map

Notice that even though a.y and b.d are defined on the same hardware, no connection is shown to exist between them. This connection has not been configured. A packet given to a with an associated destination locator prefixed with ``b.d'' would have to travel from a to b via the link joining them before being directed towards its destination. Similarly, there is no connection between the c node and the other two top-level nodes. If desired, these connections could be established. This would involve setting up an exchange of routing information.

In Nimrod, nodes and arcs represent the configured clustering and connectivity of the network. There is no ``lowest level'': it is possible to define and advertise a map that is physically realized inside a CPU, where a node could indicate, for example, a process or a group of processes. The user of this map need not know or care. (``It is turtles all the way down!'', in [4] page 63.)

3.3 Non-Nimrod Physical Elements

A region of the network that is not Nimrod-aware but includes Nimrod-aware routers or hosts connected to it is represented as a link (possibly a multi-point link). An example of this is an IP-only network that is connected to the Nimrod internetwork via Nimrod routers. This network would be modelled as a multi-point link. Nimrod-aware hosts connected to this network are represented as nodes connected to this link. Nimrod packets destined for Nimrod hosts, or for Nimrod routers ``on the other side of the network,'' could be encapsulated inside IP packets.

IP-only hosts connected to this network can be reached from other IP-only clouds by, for example, encapsulating IP packets inside packets of the format being used by Nimrod. Nimrod routers connecting the IP network to the Nimrod internetwork would ``de-capsulate'' packets destined to IP-only hosts. IP-only hosts could, for example, be given locators prefixed by the locator of a Nimrod router that knows how to get packets to them, this way putting them ``inside'' the associated Nimrod router. Other treatments are possible: for example, they could be given locators prefixed with the locator of the arc that represents the IP network.

In the first case, ``within'' a router, only that router needs to know how to forward packets to IP hosts; however, this makes that router a single point of failure. In the second case, all Nimrod routers connected to this arc need to know how to forward IP packets to IP-only hosts. To simplify packet forwarding, the locator for an IP-only host might include the IP address of the host.
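The following sketch illustrates the first treatment above: an IP-only host's locator is built from the locator of the Nimrod router that can reach it, with the IP address embedded. The textual locator syntax and all names here are invented for illustration.

    # Hypothetical sketch: IP-only hosts 'inside' a Nimrod border router.
    def ip_only_host_locator(router_locator, ip_address):
        """Prefix the host's IP address with the locator of the Nimrod
        router that knows how to reach it."""
        return router_locator + "." + ip_address

    def forward_at_border(packet_dest_locator, my_locator):
        """At the border router: if the destination lies 'inside' this
        router, recover the IP address and hand the packet to IP
        forwarding (de-capsulation); otherwise keep Nimrod forwarding."""
        prefix = my_locator + "."
        if packet_dest_locator.startswith(prefix):
            return ("deliver-via-IP", packet_dest_locator[len(prefix):])
        return ("forward-via-Nimrod", packet_dest_locator)

    # Example: a host 10.0.0.7 behind an (invented) router a.r2.
    loc = ip_only_host_locator("a.r2", "10.0.0.7")
    assert forward_at_border(loc, "a.r2") == ("deliver-via-IP", "10.0.0.7")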
4 Forwarding

Nimrod does not specify a packet format. It is possible to use Nimrod with different formats, conceivably simultaneously, in the same network. For example, we anticipate that Nimrod can be used with the packet formats of IPv4, SIPP and TUBA. This section specifies Nimrod's requirements on the packet-forwarding mechanism.

Nimrod supports three forwarding modes:

1. Flow mode: in this mode, the packet header includes a flow-id that maps into state that has been previously set up in routers along the way. Packet forwarding when flow state has been established is relatively simple: follow the instructions in the routers' state. Nimrod includes a mechanism for setting up this state. A more detailed description can be found in section 4.3.

2. BTE chain (BTEC) mode: in this mode, packets carry a list of BTE locators through which the packet is required to go. A more detailed description of the requirements of this mode is given in section 4.4.

3. Datagram mode: in this mode, every packet header carries source and destination locators. Forwarding is done following the procedures indicated in section 4.5.

In all of these modes, the packet header also carries locators and EIDs for the source and destination. In normal operation, forwarding does not take the EIDs into account; only the receiver does. EIDs are carried for demultiplexing at the receiver, and to detect certain error conditions. For example, if the EID is unknown at the receiver, the locator and EID of the source included in the packet could be used to generate an error message (this error message itself should probably not be allowed to be the cause of other error messages). Forwarding can also use the source locator and EID to respond to error conditions; for example, to indicate to the source that the state for a flow-id cannot be found.

Packets can be seen as moving between nodes in a map. A packet's header indicates, implicitly or explicitly, a destination locator. In a packet that uses either the datagram or the BTEC forwarding mode, the destination locator is explicitly indicated in the header. In a packet that uses the flow forwarding mode, the destination locator is implied by the flow-id and the distributed state in the network (it might also be included explicitly). Given a map, a packet moves to the node in this map to which the associated destination locator belongs. If the destination node has a ``detailed'' internal map, the destination locator should belong to one of the nodes in this internal map (otherwise it is an error). The packet goes to this node (and so on, recursively).
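A rough sketch of the header fields implied by the three modes is given below. The field names and the Python representation are invented; a real header would of course be a packed binary format defined by the carrying protocol.

    # Hypothetical sketch of the per-mode header fields described above.
    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class NimrodHeader:
        source_eid: str                    # carried in all three modes
        dest_eid: str
        source_locator: str
        dest_locator: str
        flow_id: Optional[int] = None      # flow mode; also reused while a
                                           # BTEC/datagram packet rides a flow
        btec: Optional[List[str]] = None   # BTEC mode: chain of BTE locators
        pointer: Optional[int] = None      # BTEC/datagram modes: index into
                                           # the chain or into the locators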
4.1 Indicating Policy

A datagram-mode packet can indicate a limited form of policy routing by the choice of destination and source locators. For this choice to exist, the source and destination endpoints must have several locators associated with them. This type of policy routing is capable of, for example, choosing providers. A BTE chain (BTEC) packet indicates policy by specifying the BTEs that the packet should traverse.

Strictly speaking, there is no policy information included in the packet header: in principle, it is not possible to determine what criteria were used to select the route by looking at the header; the packet header only contains the results of the route generation process. Similarly, in a flow mode packet, policy is implicit in the chosen route.

4.2 Trust

A node that does not divulge its internal map can work internally any way its administrators decide, as long as the node satisfies its external characterization as given in its Nimrod map advertisements. Therefore, the advertised Nimrod map should be consistent with a node's actual capabilities.

For example, consider the network shown in figure 4, which shows a physical network and the advertised Nimrod map. The physical network consists of hosts and a router connected together by an ethernet. This node can be subdivided into sub-nodes by assigning locators as shown in the figure and advertising the map shown. The map seems to imply that it is possible to send packets to node a.x without touching node a.y; however, this is actually not enforceable.

          ++
         +RA+ a.r1
          ++
           *
           *
   *******************************
      *                  *
     ++                 ++
    +Ha+ a.x.h1        +Hb+ a.y.h2
     ++                 ++

          Physical Network

                  a
  +++++++++++++++++*++++++++++++++++++++
  +                *                   +
  +             ++++++                 +
  +             +a.r1+                 +
  +             ++++++                 +
  +            *      *                +
  +   a.x     *        *      a.y      +
  + ++++++++ *          * +++++++++    +
  + +      +*            *+       +    +
  + +      +              +       +    +
  + +      +              +       +    +
  + ++++++++              +++++++++    +
  +                                    +
  ++++++++++++++++++++++++++++++++++++++

          Advertised Nimrod Map

  Figure 4: Example of Questionable Hierarchy

More generally, it is reasonable to ask how much trust should be put in the maps obtained by a route server. Even when a node is ``trustworthy,'' and the information received from the node has been authenticated, there is always the possibility of an honest mistake. These are difficult issues that are not unique to Nimrod. Many research and standards groups are addressing them. We plan to incorporate the output of these groups into Nimrod as it becomes available.

4.3 Flow Mode

The header of a flow mode packet includes a flow-id field. This field identifies state that has been established in intermediate routers. This header might also contain locators and EIDs for the source and destination. Nimrod includes protocols to set up and modify flow-related state in intermediate routers. These protocols not only identify the requested route, but also describe the resources requested by the flow---e.g., bandwidth, delay, etc. The result of a set-up attempt might be either confirmation of the set-up or notification of its failure.

4.4 Basic Topology Entity Chain (BTEC) Mode

Routing for a BTEC packet is specified by a list of locators carried in the packet header. The locators correspond to the BTEs that make up the specified path, in the order that they appear along the path. The route indicated by a BTEC packet is ``loose'' because the path is specified in terms of Nimrod BTEs, not physical entities. For example, a locator in the BTEC header could correspond to a type of service between two points of the network without specifying the physical path.

In its most detailed form, the header for a BTEC-mode packet is an alternating list of border-point locators and arc (or connectivity specification) locators. (This list can be abbreviated by omitting, for example, the locator for the head of a unidirectional arc.) Including the locator for a unidirectional Nimrod arc in the header of a BTEC packet specifies that the packet should go from the border point associated with the tail of the arc to the border point associated with the head of that arc. If two consecutive arcs are both multipoint and they intersect at more than one border point, the header of the packet should include the locator for the desired border point.
It is required that any two arcs whose locators appear consecutively in the header of a BTEC packet have at least one border point in common. Given two successive arcs in a BTEC, if the first one is a unidirectional arc, the border point shared by the two arcs should correspond to the head of the first arc; similarly, if the second arc is unidirectional, the border point shared by the two arcs should correspond to the tail of the second arc.

The source-specified routes in both flow mode and BTEC mode are specified in terms of BTEs. In flow setup, state for a flow is instantiated in the switches which provide the BTE, but this is not done in the BTEC case. For efficient handling of BTEC mode packets, i) the packet contains a pointer into the source-specified BTEC, and ii) routers would maintain, for each BTE, a pre-set-up flow which provides connectivity similar to that of the BTE (hereinafter the ``BTEF'', for ``BTE flow''). When a BTEC mode packet shows up at the router at the start of a BTEF, it is ``associated'' with that BTEF until it gets to the end of it, at which time the BTEC is consulted, and the packet is routed onto the next BTEF.

The mechanism is quite simple. All packets contain a ``flow-id'' field, which is not otherwise used in BTEC packets. The flow-id of the BTEF is stored in that field. The packet will then traverse the routers between the start and end of the BTEF, being handled just like any normal packet which is part of a flow, i.e. by the high-efficiency flow-forwarding mechanism. When the packet gets to the router which is the termination of the BTEF, the flow-block will indicate that the packet needs special handling.

This is slightly unusual, in that one doesn't visualize the flow-id field in the packet being modified during transit to refer to different flows. However, provided the flow-id field doesn't overload the source EID (i.e. use the EID as part of the flow-id), everything works quite well.
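The step performed at a BTEF termination can be sketched as follows. This is a minimal illustration under invented names; the BTEF table, packet representation and return values are assumptions, not part of the architecture.

    # Hypothetical sketch of BTEC forwarding over pre-set-up BTE flows.
    # Each router at the start of a BTEF knows the flow-id that carries
    # packets across that BTE (assumed table, keyed by BTE locator).
    btef_table = {"a.arc1": 101, "b.arc7": 102}   # BTE locator -> flow-id

    def at_btef_termination(packet):
        """Invoked where the current BTEF ends: consult the
        source-specified BTE chain, advance the pointer, and write the
        next BTEF's flow-id into the packet's flow-id field."""
        packet["pointer"] += 1
        if packet["pointer"] >= len(packet["btec"]):
            return "deliver-locally"
        next_bte = packet["btec"][packet["pointer"]]
        packet["flow_id"] = btef_table[next_bte]   # packet now rides the next
        return "forward-on-flow"                   # flow like any flow packet

    pkt = {"btec": ["a.arc1", "b.arc7"], "pointer": -1, "flow_id": None}
    at_btef_termination(pkt)   # enters the first BTEF with flow-id 101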
4.5 Datagram Mode

A realistic routing architecture must include an optimization for datagram traffic, by which we mean user transactions which consist of single packets, such as a lookup in a remote translation database. Either of the two previous modes involves unacceptable overhead if much of the network traffic consists of such datagram transactions. A mechanism is needed which is approximately as efficient as the existing ``hop-by-hop'' mechanism. Nimrod has such a mechanism, somewhat novel in the details, and it may be even more efficient than ``hop-by-hop''.

The scheme can be characterized by the way it divides the state in a datagram network between the routers and the packets themselves. Most packets currently contain only a small amount of state associated with the forwarding process (``forwarding state'')---the hop count. Nimrod proposes that enlarging the amount of forwarding state in packets can produce a system with useful properties. The scheme was partially inspired by the efficient source routing mechanism in [SIP] and the locator pointer mechanism in [PIP].

Datagram mode uses something much like the BTEF mechanism. There is a way to guarantee a strictly non-looping path, but without a source route in the packet, using a slight variant of the BTEC mechanism. In the datagram mode, the packet contains, in addition to the locally usable flow-id field:

o the source and destination locators, and

o a pointer into the locators.

The pointer starts out at the lowest level of the source locator, moves up that locator, then to the destination locator, and then down. In addition to these extra fields in the packet, all routers have to contain a minimal set of ``pre-set-up'' flows to certain routers which are at critical places in the abstraction hierarchy.

(The ``pre-set-up'' flows do not actually have to be set up in advance, but can be created on demand. There is a minimum set of flows which do have to be *able* to be set up for the system to operate, however. It is purely a local decision which, if any, of those flows to set up before there is an actual traffic requirement for them. As an efficiency move, when a datagram requires that a flow actually be set up to handle it, the data packet could be sent along with the flow setup request, avoiding the round-trip delay. We call these flows ``datagram mode flows'', or ``DMF's'', realizing that none of them need be created until actually needed.)

The actual operation of the mechanism is fairly simple. While going up the source locator, each ``active'' router (i.e. one that actually makes a decision about where to send the packet, as opposed to handling it as part of a flow) selects a DMF which will take the packet to the ``next higher'' level object in the source locator, advances the pointer, and sends the packet off along that DMF. When it gets to the end of that DMF, the process repeats, until the packet reaches a router which is at the least common intersection of the two locators (e.g., for A.P.Q.R and A.X.Y.Z, this would be when the packet reaches A). The process then inverts, with each active router selecting a DMF which takes the packet to the next lower object in the destination locator. So, A would select a flow to A.X, and once it got to A.X, A.X would select a flow to A.X.Y, etc.

It can easily be seen that the process guarantees that the resulting path is loop-free. Each flow selected must necessarily get the packet closer to its destination (since each flow selection results in the pointer being monotonically advanced through the locators), and the flows themselves are guaranteed not to loop, their paths having been selected prior to being set up.

If the system keeps more than the minimal set of DMF's (which is just up to one border router in internal routers, and down to each object one level down for each border router), and keeps the table sorted for efficient lookups (e.g. in much the same way as the current routing table for hop-by-hop datagrams is), more optimal routing will result. For example, using the case above (a packet from A.P.Q.R to A.X.Y.Z), if A.P.Q is actually a neighbour of A.X.Y, and maintains a flow directly from A.P.Q to A.X.Y, then when the packet reaches A.P.Q, instead of going the rest of the way up and down, the pointer can be set into the destination locator at A.X.Y, and the packet sent there directly.
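The up-then-down pointer walk can be sketched as below. Locators are modelled here as lists of locator elements; the representation and function name are invented for illustration, and the shortcut optimization just described is omitted.

    # Hypothetical sketch of the datagram-mode walk: up the source
    # locator to the least common intersection, then down the
    # destination locator.
    def dmf_path(src, dst):
        """Return the sequence of objects (locator prefixes) traversed
        by the pointer, starting from the source's lowest level."""
        common = 0
        while common < min(len(src), len(dst)) and src[common] == dst[common]:
            common += 1
        up = [src[:i] for i in range(len(src) - 1, common - 1, -1)]   # going up
        down = [dst[:i] for i in range(common + 1, len(dst) + 1)]     # going down
        return up + down

    # For A.P.Q.R -> A.X.Y.Z: up through A.P.Q, A.P, A, then down
    # through A.X, A.X.Y, A.X.Y.Z, exactly as in the text above.
    path = dmf_path(["A", "P", "Q", "R"], ["A", "X", "Y", "Z"])
    assert path == [["A", "P", "Q"], ["A", "P"], ["A"],
                    ["A", "X"], ["A", "X", "Y"], ["A", "X", "Y", "Z"]]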
Traffic monitoring and analysis (again, using purely local algorithms) can result in a database being created over time, which shows which DMF's above and beyond the minimal set are worth keeping around. This traffic monitoring would also show which flows from the required minimal set of DMF's it would be useful to set up in advance of actual traffic which needed them. Again, however, all these sets can be changed in a local, incremental way, without disturbing the operation of the system as a whole.

These new forwarding state fields would not be covered by an end-end authentication system, any more than the existing hop count field (which is also forwarding state) would be. This would prevent problems caused by the fact that the contents of these fields change as the packet traverses the network.

The forwarding of these packets is quite efficient, and in non-active routers, is maximally efficient (perhaps more so than even standard hop-by-hop). In the non-active routers, the packet is associated with a flow in a way that makes hardware processing possible without any software involvement at all. In active routers, the process of looking up the next DMF would be about as expensive as the current routing table lookup, and the main difference would be that the result of that lookup would have to be stored in the packet---not a great expense.

5 Renumbering

This section presents an example of how to ``renumber'' a Nimrod network. Figure 5 shows a network halfway through the process of being renumbered. The figure shows the physical network and the associated locators. The network is formed by router Ra, which is connected to three ethernets. The figure shows five hosts, ``Ha'' to ``He''. To the right of each host two locators are shown. The first locator shown corresponds to the old numbering; the second, to the new numbering. Renumbering has consisted of adding a new level of hierarchy---to simplify the work of Ra, say.

Because it is possible for a network element to have more than one locator, the two sets of locators can be active at the same time. Initially, only the first set of locators is active. This means that the NLS responds with these locators to queries involving endpoints implemented at these hosts. It also means that router Ra knows to which ethernet a packet should be directed, given the locator in the header. (Given a packet destined to one of the hosts, the router would pick one of the three interfaces based on the ``host part'' of the locator---i.e. ``h1'' in locator a.h1.)

When the second set of locators is introduced, for a query involving, for example, an endpoint in Hd, the distributed database would respond with the new locator: for example, a.a.h1. For a time, router Ra would forward based on both sets of locators---because the first set of locators might still be cached by some sources. Eventually, Ra would de-activate the original set of locators. Presumably, Ra would be prepared to forward based on the new set of locators before the NLS is instructed to use them. If a packet containing an old locator is given to Ra after the locator has been de-activated, an error message would be generated. There exists the possibility that the old locators might be re-assigned. If a packet is received by the wrong endpoint, this situation can be detected by looking at the destination EID which is included in the packet header.
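The overlap period in this example can be sketched as a simple sequence of NLS states. The dict-based model and all names below are invented for illustration; a real NLS would be a secured, distributed database.

    # Hypothetical sketch of renumbering with overlapping locator sets.
    nls = {"eid-hd": ["a.h5"]}              # old numbering active

    # Step 1: Ra is prepared to forward on the new locators, then the
    # new set is activated alongside the old one; the NLS answers
    # queries with the new locator first.
    nls["eid-hd"] = ["a.a.h1", "a.h5"]

    # Step 2: after cached copies of the old locators have expired,
    # the old set is de-activated.
    nls["eid-hd"] = ["a.a.h1"]

    def deliver(packet, my_eid):
        """If a re-assigned old locator delivers a packet to the wrong
        endpoint, the destination EID in the header exposes the error."""
        return "accept" if packet["dest_eid"] == my_eid else "error-wrong-endpoint"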
                         ++++
                         +  + a.r1
                         +Ra+
                         ++++
                        *  *  *
                      *    *    *
                    *      *      *
   *****************   *********   ***************************
     *        *            *          *                *
   ++++          ++++          ++++          ++++          ++++
   +  + a.h5     +  + a.h1     +  + a.h2     +  + a.h4     +  + a.h3
   +Hd+ a.a.h1   +Ha+ a.a.h2   +Hb+ a.b.h1   +Hc+ a.c.h1   +He+ a.c.h3
   ++++          ++++          ++++          ++++          ++++

   Figure 5: Renumbering a Network

The renumbering scheme described above implies that it should be possible to update the NLS securely and relatively dynamically. Because renumbering will most likely be infrequent and carefully planned, we expect that the load on this updating mechanism should be manageable. A second implication of this renumbering scheme is a requirement for a secure and simple way to update hosts' and routers' locators.

6 Auxiliary Functionality

We now turn our attention to functionality that must exist in Nimrod, but is not a part of the ``core'' Nimrod architecture. We shall discuss four topics in this context: mobility support, multicasting, network management and security.

Nimrod's approach to auxiliary functionality is as follows. Nimrod does not specify a particular solution to provide the functionality, but requires that the solution have certain characteristics (e.g., scalability). It is the purpose of this section to discuss some of these requirements and evaluate approaches towards meeting them. This attitude towards auxiliary functionality is consistent with Nimrod's general philosophy of flexibility, adaptability and incremental change. Each of these topics is being worked on extensively by the research community and it is not our intention to duplicate those efforts. Instead, we intend to let emerging solutions be grafted onto and used within Nimrod.

For each of the topics mentioned above, we discuss the issues involved, the approaches currently being used or proposed by the research community, and their viability for Nimrod. A summary of the main points of each topic is given at the end of the discussion, so that readers not interested in the somewhat lengthy discussion of the issues may skip to the summary directly.

6.1 Mobility

Nimrod permits some physical devices to be mobile, that is, to change their network attachment points over time. In this section, we discuss the effects of mobility on Nimrod, describe the functionality required to handle mobility and compare some existing approaches.

Nimrod, as a routing and addressing architecture, does not directly concern itself with mobility. That is, Nimrod does not propose a solution to the mobility problem. There are two chief reasons for this. First, mobility is a non-trivial problem whose implications and requirements are still not well understood, and will perhaps be understood only when a mobile internetwork is deployed on a large scale. Second, a number of groups (for instance the mobile-ip working group of the IETF) are studying the problem by itself, and it is not our intention to duplicate those efforts.

The Nimrod architecture carries a functional ``stub'' for mobility, the details of the stub being deferred until later. The stub will be elaborated when a solution that meets the requirements of Nimrod becomes available (for instance, from the IETF mobile-ip effort). We do not, however, preclude the modification of any such solutions to meet the Nimrod requirements, or the development of an independent solution within Nimrod.
Nimrod has a basic feature that helps accommodate mobility in a graceful and natural manner, namely, the separation of the endpoint identifier space from the locator space. Recall from section 2.1 that an endpoint (e.g., a host or a process) has a globally unique endpoint identifier (EID). The location of the endpoint within the topology is given by its locator. When an endpoint moves, its EID remains the same, but its locator might change. Nimrod can route a packet to the endpoint after the move, provided it is able to obtain its new locator. Thus, the problem of providing a solution to mobility in the context of Nimrod may be perceived as one of maintaining a dynamic association between endpoints and locators.

Extending this viewpoint further, one can think of mobility-capable Nimrod as essentially consisting of two ``modules'': the Nimrod routing module and the dynamic association module (DAM). The DAM is an abstraction, embodying the functionality pertinent to maintaining the dynamic association. This is a valuable paradigm because it facilitates the comparison of various mobility schemes from a common viewpoint. Our discussion will be structured based on the DAM abstraction, in two parts, the themes of which are:

o What constitutes mobility for the DAM and Nimrod? Is the realization of mobility as a ``mobility'' module that interacts with Nimrod viable? What are the interactions between Nimrod and such a module? These points will be discussed in section 6.1.1.

o What are some of the approaches one can take in engineering the DAM functionality? We classify some approaches and compare them in section 6.1.2.

A word of caution: the DAM should not be thought of as something equivalent to the current-day DNS - the DAM is a more general concept than that. For instance, consider a mobility solution for Nimrod similar to the scheme described in [5](1). Very roughly, this approach is as follows: every endpoint is associated with a ``home'' locator. If the endpoint moves, it tells a ``home representative'' about its new locator. Packets destined for the endpoint sent to the old locator are picked up by the home representative, and sent to the new locator. In this scheme, the DAM embodies the functionality implemented by all of the home representatives in regard to tracking the mobile hosts. The point is that the association maintenance, while required in some form or other, may not be an explicitly distinct part, but implicit in the way mobility is handled. Thus, the DAM is merely an abstraction useful to our discussion and should not be construed as dictating the design.

(1) This also resembles the current draft proposal of the IETF mobile-ip working group.

6.1.1 Effects of Mobility

One consequence of mobility is a change in the locator of an endpoint. However, not all instances of mobility result in a locator change (for instance, there is no locator change if a host moves within a LAN), and a change in the locator is not the only possible effect of mobility. Mobility might also cause a change in the topology map. This typically happens when entire networks move (e.g., an organization relocates, or a wireless network in a train or plane moves between cells). If the network is a Nimrod network, we might have a change in the connectivity of the node representing the network and hence a change in the map.
In this section, we consider the effects of mobility on the two ``modules'' identified above: Nimrod, which provides routing to a locator, and a hypothetical instantiation of the DAM, which provides a dynamic endpoint-locator association for use by Nimrod. We consider four scenarios, based on whether or not the topology and an endpoint's locator change, and comment on the effect of each scenario on Nimrod and the DAM.

Scenario 1. Neither the locator nor the topology changes. This is the trivial case and affects neither the DAM nor Nimrod. An example of this scenario is when a workstation is moved to a new interface on the same local area network.

Scenario 2. The locator changes but the topology remains the same. This is the case when an endpoint moves from one network to another, thereby changing its locator. The DAM is affected in this case, since it has to note the new EID-locator association and indicate this to Nimrod if necessary. The effect on Nimrod is related to obtaining this change from the DAM. For instance, Nimrod may be informed of this change, or may ask for the association if and when it finds out that the mobile host cannot be reached.

Scenario 3. The locator does not change but the topology changes. One way this could happen is if a network moves and changes its neighbors (a topology change) but remains within the cluster containing it (no locator change). The DAM is not affected, because the EID-locator association has not changed. Nimrod is affected in the sense that the topology map would now have to be updated.

Scenario 4. Both the locator and the topology change. If a network moves out of the cluster containing it, we have a change both in the map and in the locators of the devices in the network. In this case, both Nimrod and the DAM are affected.

In scenarios 3 and 4, it may not be sufficient to simply let Nimrod handle the topological change using the update mechanisms described in section (on RID). These mechanisms are likely to be optimized for relatively slow changes. Mobile wireless networks (in trains and planes, for instance) are likely to produce more frequent changes in topology. Therefore, it might be necessary that topological updates caused by mobility be handled using additional mechanisms. For instance, one might send specific updates to appropriate cluster representatives in the cluster in which the move occurred, so that packets entering that cluster can be routed using the new topology.

We observe that accommodating the mobility of networks, especially fast-moving ones, might require a closer interaction between Nimrod and the DAM than is required for mobility that only causes locator changes. It is beyond the scope of this document to specify the nature of this interaction; however, we note that a solution to mobility should handle the case when a network as a whole moves. Current trends [6] indicate that such situations are likely to be common in the future, when wireless networks will be present in trains, airplanes, ships, etc.

In summary, if we discount the movement of networks, i.e., assume no topology changes, it appears that the mobility solution can be kept fairly independent of Nimrod and can in fact be accommodated by an implementation of the DAM. However, to accommodate network mobility (scenarios 3 and 4), it might be necessary for Nimrod routing/routers to get involved with mobility.
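The four scenarios reduce to a classification over two booleans, sketched below. The function and module names are invented for illustration only.

    # Hypothetical sketch of the four mobility scenarios above.
    def affected_modules(locator_changed, topology_changed):
        """Return which 'modules' must react to a move: the DAM
        (EID-locator association) and/or Nimrod (topology map)."""
        affected = []
        if locator_changed:
            affected.append("DAM")        # scenarios 2 and 4
        if topology_changed:
            affected.append("Nimrod")     # scenarios 3 and 4
        return affected or ["none"]       # scenario 1

    assert affected_modules(False, False) == ["none"]            # scenario 1
    assert affected_modules(True,  False) == ["DAM"]             # scenario 2
    assert affected_modules(False, True)  == ["Nimrod"]          # scenario 3
    assert affected_modules(True,  True)  == ["DAM", "Nimrod"]   # scenario 4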
Beyond the constraints imposed by the interaction with Nimrod, it is desirable that the mobility solution have some ``general'' features. By general, we mean that these are not Nimrod-specific. However, their paramount importance in future applications makes them worth mentioning in this document. The desirable features are:

o Support of both off-line and on-line mobility. Off-line mobility (or portability) refers to the situation in which a session is torn down during the move, while on-line mobility refers to the situation in which the session stays up during the move. While much mobility is currently off-line, trends indicate that a large part of mobility in the future is likely to be on-line. A solution that only supports off-line mobility would probably have limited applications in the future.

o Scalability. One of the primary goals of Nimrod is scalability, and it would be contrary to our design goals if the mobility solution did not scale. There are three directions in which scalability is important: the size of the network, the number of mobile entities and the frequency of movement of the mobile entities. Note that for any given system with a minimum response time (to a move) of t seconds, if the mobile entity changes attachment points faster than 1/t changes per second, the system will fail to track the entity. Augmenting traditional location tracking mechanisms with special techniques such as predictive routing might be necessary in this case. Hooks in the mobility solution for such augmentation are a desirable feature.

o Security. It is likely that in the future there will be increased demand for secure communications. Apart from the non-mobility-specific security mechanisms, the solution should address the following:

-- Authentication. The information sent by a mobile host about its location should be authenticated to prevent impersonation. Additionally, there should be mechanisms to decide whether a mobile user who wishes to join a network has the privileges to do so.

-- Denial of service. The schemes employed for handling mobility could in general be a drain on resources if not controlled carefully. Specifically, the resource-intensive portions of the protocol should be guarded so that inappropriate use of them does not cause excessive load on the network.

6.1.2 Approaches

As mentioned in section 6.1, Nimrod does not provide a solution for mobility, only a functional stub. We require that the protocols comprising the functional stub be independent of the protocols comprising Nimrod routing. This allows for maximum flexibility in designing and developing each set of protocols.

As mentioned earlier, the problem of mobility in the context of Nimrod may be viewed as one of maintaining a dynamic association (DAM) and communicating this association, and changes therein, to Nimrod. Approaches to mobility may be classified based on how different aspects of the DAM are addressed. Our classification identifies two aspects of the mobility solution:

1. How and where to maintain the dynamic association between endpoints and locators? This may be perceived as a problem of database maintenance in a distributed system. The database may be maintained in a centralized fashion, wherein a single entity maintains the association and updates are sent to it by the mobile host, or in a distributed fashion, wherein there are a number of entities that store the associations.
In the distributed case, an entity might either store all of the associations in the network, or store only the associations for some EIDs (for instance, a cluster representative stores associations only for entities within the cluster). We refer to the former distributed method as ``global'' and the latter as ``local''.

A (distributed) database that stores the EID-locator mapping is required by Nimrod even in the absence of mobility. If this service can accommodate dynamic update and retrieval requests at the rate produced by mobility, it is a candidate for a solution. However, we note that the availability of such a system should not be a requirement for the mobility solution.

2. Where to do the remapping between the EID and locator, in case of a change in association? Some candidates are: the source, the ``home'' location of the host that has moved, and any router (say, between the source and the destination) in the network.

Many of the existing approaches, and perhaps some new approaches, to the problem of mobile internetworking may be seen as instantiations of a combination of a dynamic association method and a remapping method. We consider some combinations, as illustrated in Table 1. We discuss four combinations (marked A1 - A4 in the table) and examine their advantages and disadvantages in the context of our requirements. We ignore some approaches (marked X in the table) because they clearly appear to be bad solutions.

   -----------------------------------------
   |             | Source | Home | Routers |
   -----------------------------------------
   |Centralized  |   A1   |  X   |    X    |
   -----------------------------------------
   |Distr. local |   X    |  A2  |   A3    |
   -----------------------------------------
   |Distr. global|   A4   |  X   |    X    |
   -----------------------------------------

   Table 1: Combinations of Association and Remapping Methods

Note that this is but one classification of mobility schemes, and that the remapping and EID-locator maintenance strategies mentioned in the table are not exhaustive. The main intention is to help understand better the kinds of approaches that would be most suitable for Nimrod. In the following, we use the term source to refer to the endpoint that is attempting to communicate with, or sending packets to, a mobile endpoint. The source could be static or mobile. We use the term mobile destination to refer to the endpoint that is the intended destination of the source's packets.

A1. In this approach, all locator-EID mappings are maintained at a centralized location. The source queries the database to get the locator of the mobile destination. Alternatively, the database can send updates to the source when the mobile destination moves. The main advantage of this scheme is its simplicity. Also, no modification to routers is required, and the route from the source to a mobile destination is direct. The main disadvantage is the scheme's vulnerability: if the centralized location goes down, all information is lost. While this scheme may be sufficient for small networks with low mobility, it does not scale adequately to be a long-term solution for Nimrod.

A2. This approach uses locally distributed association maintenance with remapping done at the home. This is the approach being used by the mobile-ip group of the IETF for the draft proposal, and by the Cellular Digital Packet Data (CDPD) consortium.
In this approach, every mobile endpoint is associated with a ``home'', and a ``home representative'' keeps track of the location of every mobile endpoint associated with it. A protocol between a mobile endpoint and the home representative is used to keep the information up-to-date. The source sends the packet using the home locator of the mobile destination, and the home representative forwards it to the mobile destination.

The advantage of this scheme is that it is fairly simple and does not involve either the source or the routers in the network. Furthermore, the mobile destination can keep its location secret (known only to the home representative) - this is likely to be a desirable feature for some mobile hosts in some applications. Finally, most of the control information is confined to the cluster containing the home representative and the mobile host, which is a plus for scalability.

The main disadvantage is a problem often referred to as triangular routing. That is, packets have to go from the source to the home representative before going to the mobile destination. This is especially inefficient if, for instance, both the source and the mobile destination are in, say, England and the home representative is in, say, California. Also, there is still some vulnerability, since if the home representative becomes unreachable, the location of all of the mobile hosts it tracks is lost. Nevertheless, we feel that this approach, or a modification thereof, might be a viable first-cut mobility solution for Nimrod.
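A minimal sketch of A2's remapping step follows. All names are invented, and the real registration protocol (including its authentication) is elided.

    # Hypothetical sketch of the A2 ``home representative'' remapping.
    class HomeRepresentative:
        def __init__(self):
            self.current_locator = {}     # EID -> most recent locator

        def register(self, eid, new_locator):
            """Update sent by the mobile endpoint after it moves
            (in practice this update must be authenticated)."""
            self.current_locator[eid] = new_locator

        def remap(self, packet):
            """A packet arriving at the home locator is forwarded to
            the mobile destination's current locator (the 'triangle')."""
            eid = packet["dest_eid"]
            packet["dest_locator"] = self.current_locator.get(
                eid, packet["dest_locator"])
            return packet

    home = HomeRepresentative()
    home.register("eid-42", "b.d.h9")     # endpoint has moved under b.d
    pkt = {"dest_eid": "eid-42", "dest_locator": "a.y.h9"}   # sent to home
    assert home.remap(pkt)["dest_locator"] == "b.d.h9"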
A3. In each of the previous cases, the routers in the network were not involved in tracking the location of the mobile host. In this approach, state is maintained in the routers. An example is the approach proposed in [7], wherein the packets sent by a mobile host are snooped and state is created. The packets contain the mobile host's home location and its new location. This mapping is maintained at some routers in the network. When a packet intended for the mobile host, addressed to its home location, enters such a router, a translation is made and the packet is redirected to the new location.

An alternate mechanism is to maintain the mapping in all of the border points (e.g., border routers) of the cluster within which the movement took place. A packet from outside the cluster intended for a destination within the cluster would typically enter the cluster through one of the border points. Using the mapping, the border point could determine the most recent locator of the mobile destination and send the packet directly to that locator. If most movements are within low-level clusters, this would scale to large numbers of movements. Furthermore, the packet takes an optimal path (or as optimal as one can get with a hierarchical network) to the new location within the time it takes for the cluster representative to get the new information, which is typically quite small for low-level clusters.

The main disadvantage of this scheme is that routers have to be involved. However, future requirements with regard to scalability and response time might necessitate such an approach. Furthermore, this solution has closer ties to Nimrod routing and is better suited to handle scenarios 3 and 4, where the topology changes as a result of mobility.

A4. In this approach, the locator-EID mapping database is maintained in a distributed manner, perhaps like the present-day Domain Name Service (DNS). The remapping is done as in A1. This reduces the vulnerability problem present in A1. However, because routers are not involved, this approach appears less well-suited to handle mobility that results in topology changes (scenarios 3 and 4). Nevertheless, we do not reject this approach totally, especially if an adequate dynamic EID-locator mapping mechanism is provided independent of Nimrod (i.e., if, for instance, the next-generation DNS can do the mapping maintenance).

All of these approaches seem potentially capable of handling scenarios 1 and 2 of the previous section. Scenarios 3 and 4 are best handled by an approach similar to A3. However, approaches like A3 are more complex and involve more Nimrod entities (e.g., routers) than may be desirable.

We have tried to bring out the various issues governing mobility in Nimrod. In the final analysis, the tradeoffs between the various options will have to be examined vis-a-vis our particular requirements (for instance, the need to support network mobility) in adopting a solution. It is likely that general requirements such as scalability and security will also influence the direction of the approach to mobility in Nimrod.

6.1.3 Summary

o Nimrod permits physical devices to be mobile, but does not specify a particular solution for routing in the face of mobility.

o The fact that the endpoint identifier (EID) space and the locator space are separated in Nimrod helps in accommodating mobility in a graceful and natural manner. Mobility may be perceived, essentially, as dynamism in the endpoint-locator association.

o Nimrod allows two kinds of mobility:

-- Endpoint mobility. For example, when a host in a network moves. This might cause a change in the locator associated with the host, but does not cause a change in the topology map for Nimrod.

-- Network mobility. For example, when a router or an entire network moves. This might cause a change in the topology in addition to the locator.

o Endpoint mobility may be handled by maintaining a dynamic association between endpoints and locators. However, network mobility requires addressing the topology change problem as well.

o Apart from the ability to handle network mobility, it is desirable that the mobility solution be scalable to large networks and large numbers of mobile devices, and that it provide security mechanisms.

o There are a number of existing and emerging solutions to mobility. In particular, adaptation of solutions developed by the IETF is a first-cut possibility for Nimrod.

6.2 Multicasting

Nimrod provides multicast routing and packet forwarding capability. Multicasting is performed by using a multicast delivery tree whose leaves are the multicast destinations. We begin by looking at the similarities and differences between unicast routing and multicast routing.

Both unicast and multicast routing require two phases - route generation and packet forwarding. In the case of unicast routing, Nimrod specifies three modes of packet forwarding - the flow mode, the datagram mode and the BTEC mode; route generation itself is not specified but left to the particular routing agent. In multicasting, Nimrod leaves both the route generation and packet forwarding mechanisms unspecified.
To explain why, we first point out three aspects that make multicasting quite different from unicasting:

o Groups and group dynamism. In multicasting, the destinations are part of a group, whose membership is dynamic. This brings up the following issues:

-- A translation between the group name and the EIDs and locators of the members comprising that group. This is especially relevant in the case of sender-initiated multicasting and policy support.

-- A mechanism to accommodate new group members in the delivery tree in response to the addition of members, and a mechanism to ``prune'' the delivery tree in response to departures.

o State creation. Most solutions to multicasting can essentially be viewed as creating state in routers for multicast packet forwarding. Multicasting solutions differ based on who creates the state; there are several options - e.g., the sender, the receivers or the intermediate routers.

o Route generation. Even more so than in unicast routing, one can choose from a rich spectrum of heuristics with different tradeoffs among a number of parameters (such as cost and delay, algorithmic time complexity and optimality, etc.). For instance, some heuristics produce a low-cost tree with high end-to-end delay, and some produce trees that give the shortest path to each destination but at a higher cost. Heuristics for multicasting are a significant research area today, and we expect advances to result in sophisticated heuristics in the near future.

Noting that there are various possible combinations of route generation, group dynamism handling and state creation for a solution, and that each solution conceivably has applications for which it is the most suitable, we do not specify one particular approach to multicasting in Nimrod. Every implementation of Nimrod is free to use its own multicasting technique, as long as it meets the goals and requirements of Nimrod. Thus, we do not discuss the details of any multicast solution here, only its requirements in the context of Nimrod. Specifically, we structure the discussion in the remainder of this section around the following two themes:

o What are the goals that we want to meet in providing multicasting in Nimrod, and what specific requirements do these goals imply for the multicast solution?

o What are some of the approaches to multicasting in vogue today, and how relevant is each of these approaches to Nimrod?

6.2.1 Goals and Requirements

The chief goals of Nimrod multicasting are as follows:

1. Scalability. Nimrod multicasting must scale in terms of the size of the internetwork, the number of groups supported and the number of members per group. It must also support group dynamism efficiently. This has the following possible implications for the solution:

o Routers not on the direct path to the multicast destinations should not be involved in state management. In a network with a large number of routers, a solution that does involve such routers is unlikely to scale (e.g., the current implementation of mrouted).

o It is likely that there will be a number of applications that have a few members per group (e.g., medical imaging) and a number of applications that have a large number of members per group (e.g., news distribution). Nimrod multicasting should scale for both these situations.
If no single mechanism adequately scales for both sparse and dense group memberships simultaneously, a combination of mechanisms should perhaps be considered.

o In the face of group membership change, there must be a facility for incremental addition or deletion of ``branches'' in the multicast tree. Reconstructing the tree from scratch is not likely to scale.

o It is likely that we will have some well-known groups (i.e., groups which are more or less permanent in existence) and some ephemeral groups. The dynamics of group membership are likely to be different for each class, and the solution should take that into account as appropriate.

2. Policy support. This includes both quality of service and access restrictions, although currently demand is probably higher for QOS. In particular, every path from the source to each destination should satisfy the requested quality of service and conform to the access restrictions. The implications for the multicasting solution are:

o It is likely that many multicasting applications will be cost-conscious in addition to having strict quality-of-service bounds (such as delay and jitter). Balancing these will necessitate dealing with some new parameters - e.g., the tree cost (the sum of the ``cost'' of each link), the tree delay (maximum, mean and variance of end-to-end delay), etc.

o In order to support policy-based routing, we need to know where the destinations are (so that we can decide what route we can take to them). In such a case, a mechanism that provides an association between a group id and a set of destination locators is probably required.

o Some policy constraints are likely to be destination-specific. For instance, a domain might refuse transit service to traffic going to certain destination domains. This presents certain unique problems - in particular, for a single group, multiple trees may need to be built, each tree ``servicing'' disjoint partitions of the multicast destinations. This is illustrated with an example in appendix XX.

3. Resource sharing. Multicasting typically goes hand in hand with large traffic volume or applications with a high demand for resources. This, in turn, implies efficient resource management and sharing where possible. Therefore, it is important that we place an emphasis on interaction with resource reservation. For instance, Nimrod must be able to provide information on which trees are shareable and which are not, so that resource reservation may use it while allocating resources to flows.

6.2.2 Approaches

The approaches to multicasting currently in operation and those being considered by the IETF include the following:

1. Distance Vector Multicast Routing Protocol (DVMRP) [8]. This approach is based upon distance-vector route information distribution and hop-by-hop forwarding. It uses Reverse Path Forwarding (RPF) [9] - a distributed algorithm for constructing an internetwork broadcast tree. DVMRP uses a modified RPF algorithm, essentially a truncated broadcast tree, to build a reverse-shortest-path, sender-based multicast delivery tree. A reverse shortest path from s to d is a path that uses the same intermediate nodes as those in the shortest path from d to s(2). An implementation of DVMRP exists in the current Internet in what is commonly referred to as the MBONE. An improvement to this is in the process of being deployed.

(2) If the paths are symmetric (i.e., cost the same in either direction), the reverse shortest path is the same as the shortest path.
It incorporates ``prune'' messages, which are used to further truncate the tree at routers not on the path to the destinations, and ``graft'' messages, which are employed to undo this truncation if that later becomes necessary. The main advantage of this scheme is that it is simple. The major handicap is scalability. Two issues have been raised in this context. First, if S is the number of active sources and G the number of groups, then the state overhead is O(GS), which might be unacceptable when resources are limited. Second, routers not on a multicast tree are involved (in terms of sending/tracking prune and graft messages) even though they might not be interested in the particular source-group pair. The performance of this scheme is expected to be relatively poor for large networks with sparsely distributed group memberships.

2. Core Based Trees (CBT) [10]. This scheme uses a single tree per group, shared by all sources. This tree has a single router as the core (with additional routers for robustness) from which branches emanate. The chief distinguishing characteristic of CBT is that it is receiver-initiated, i.e., receivers wishing to join a multicast group find the tree (or its core) and attach themselves to it, without any participation from the sources.

The chief motivation behind this scheme is the reduction of the state overhead to O(G), in comparison to DVMRP. Also, only routers on the path between the core and the potential members are involved in the process. Core-based tree formation and packet flow are decoupled from the underlying unicast routing. The main disadvantage is that packets no longer traverse the shortest path from the source to their destinations. The performance in general depends on judicious placement of cores and coordination between them. Traffic concentration on links incident to the core is another problem. There is also a dependence on network entities (in other administrative domains, for instance) for resource reservation and policy routing.

3. Protocol Independent Multicasting (PIM) [11]. Yet another approach based on the distance-vector hop-by-hop combination, this is designed to reap the advantages of both DVMRP and CBT. Using a ``rendezvous point'', a concept similar to the core discussed above, it allows for the simultaneous existence of shared and source-specific multicast trees. In the steady state, data can be delivered over the reverse shortest path from the sender to the receiver (for better end-to-end delay) or over the shared tree. Using two modes of operation, sparse and dense, it provides improved performance both when the group membership in an internetwork is sparse and when it is dense. It is, however, a complex protocol. A limitation of PIM is that the shortest paths are based on the reverse metrics and are therefore truly ``shortest'' only when the links are symmetric.

4. Multicast Open Shortest Path First (MOSPF) [12]. Unlike the abovementioned approaches, this is based on link-state routing information distribution. The packet forwarding mechanism is hop-by-hop. Since every router has complete topology information, every router computes the shortest-path multicast tree from any source to any group using Dijkstra's algorithm. If the router doing the computation falls within the tree computed, it can determine which links it must forward copies onto.
MOSPF inherits the advantages of OSPF and link-state distribution, namely localized route computation (and easy verification of loop-freedom), fast convergence on link-state changes, etc. However, group membership information is sent throughout the network, including over links that are not in the direct path to the multicast destinations. Thus, like DVMRP, this is most suitable for small internetworks, that is, as an intra-domain routing mechanism. 5. Inter-Domain Policy Routing (IDPR)[13]. This approach uses link-state routing information distribution like MOSPF, but uses source-specified packet forwarding. Using the link-state database, the source generates a policy multicast route to the destinations. Using this, the IDPR path-setup procedure sets up state in intermediate entities for packet duplication and forwarding. The state contains information about the next-hop entities for the multicast flow. When a data packet arrives, it is forwarded to each next-hop entity obtained from the state. Among the advantages of this approach are its ability to support policy-based multicast routing with ease, and independence (flexibility) in the choice of the multicasting algorithm used at the source. IDPR also allows resource sharing over multiple multicast trees. One disadvantage is that it makes it relatively more difficult to handle group membership changes (additions and deletions), since such changes must first be communicated to the source of the tree, which will then add branches appropriately. We now discuss the applicability of these approaches to Nimrod. Common to all of the approaches described is the fact that we need to set up state in the intermediate routers for multicast packet forwarding. The approaches differ mainly on who initiates the state creation - the sender (eg. IDPR, PIM), the receiver (eg. CBT, PIM), or the routers themselves, which create state without initiation by the sender or receivers (DVMRP, MOSPF). Nimrod should be able to accommodate both sender-initiated as well as receiver-initiated state creation for multicasting. In the remainder of this section, we discuss the pros and cons of these approaches for Nimrod. Recall that Nimrod uses link-state route information distribution (topology maps) and has three modes of packet forwarding - flow mode, loose source-route mode and datagram mode. An approach similar to that used in IDPR is viable for multicasting using the flow mode. The source can set up state in intermediate routers, which can then appropriately duplicate packets. For the loose source and datagram modes, an approach similar to the one used in MOSPF is applicable. In these situations, the advantages and disadvantages of these approaches in the context of Nimrod are similar to the advantages and disadvantages of IDPR and MOSPF respectively. Sender-based trees can be set up using an approach similar to IDPR, generalizing it to an `n' level hierarchy. A significant advantage of this approach is policy-based routing. The source knows about the policies of clusters/administrative domains that care to advertise them and can choose a route the way it wants (ie, not depend upon other entities to do it, as in some schemes mentioned above). Another advantage is that each source can use the multicast route generation algorithm and packet forwarding scheme that best suits it, instead of being forced to use whatever is implemented elsewhere in the network.
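As a rough illustration of the per-flow state such sender-initiated setup installs, consider the sketch below. Everything here is invented for exposition - neither Nimrod nor IDPR defines these structures - but it shows the essential shape: setup walks a source-computed tree once, recording next hops per flow-id, and data packets thereafter carry only the flow-id:

    # tree: node -> list of child nodes (the source-computed policy tree).
    # routers: node -> {flow_id: [next hops]}, the installed state.
    def install_tree(routers, tree, flow_id, node):
        children = tree.get(node, [])
        routers[node][flow_id] = list(children)
        for child in children:
            install_tree(routers, tree, flow_id, child)

    def forward(routers, node, flow_id, packet, deliver):
        # Data packets carry only the flow-id; each router duplicates
        # them onto the next hops recorded at setup time.
        deliver(node, packet)
        for child in routers[node].get(flow_id, []):
            forward(routers, child, flow_id, packet, deliver)

Note how the tree computation itself is entirely the source's business, which is what makes the approach flexible.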
Further, this approach allows for incrementally deploying new multicast tree generation algorithms as research in that area progresses. CBT-like methods may be used to set up receiver-initiated trees. Nimrod provides link-state maps for generating routes, and a CBT-like method is compatible with this. For instance, a receiver wishing to join a group may generate a (policy) route to the core for that group using its link-state map and attach itself to the tree. A disadvantage of sender-based methods in general seems to be the support of group dynamism. Specifically, if there is a change in the membership of the group, the particular database which contains the group-destination mapping must be updated. In comparison, receiver-oriented approaches seem to be able to accommodate group dynamism more naturally. Nimrod does not preclude the simultaneous existence of multiple approaches to multicasting and the possibility of switching from one to the other depending on the dynamics of group distributions. Interoperability is an issue - that is, the question of whether or not different implementations of Nimrod can participate in the same tree. However, as long as there is agreement in the nature of the state created (ie, the states can be interpreted uniformly for packet forwarding), this should not be a problem. For instance, a receiver wishing to join a source-specified sender-created tree might set up state on a path between itself and a router on the tree, with the sender itself being unaware of it. Packets entering the router would now be additionally forwarded along this new ``branch'' to the new receiver. In conclusion, the architecture of Nimrod can accommodate diverse approaches to multicasting. Each approach has its disadvantages with respect to the requirements mentioned in the previous section. The architecture does not demand that one particular solution be used, and indeed, we expect that a combination of approaches will be employed and engineered in a manner most appropriate to the requirements of the particular application or subscriber. 6.2.3 Summary o Nimrod does not specify a particular multicast route generation algorithm or state creation procedure. Nimrod can accommodate diverse multicast techniques and leaves the choice of the technique to the particular instantiation of Nimrod. o A solution for multicasting within Nimrod should be capable of -- Scaling to large networks, large numbers of multicast groups and large multicast groups. -- Supporting policy, including quality of service and access restrictions. -- Resource sharing. o Multicasting typically requires the setting up of state in intermediate routers for packet forwarding. The state setup may be initiated by the sender (eg. IDPR), by the receiver (eg. CBT), by both (eg. PIM) or by neither (DVMRP??, MOSPF??). The architecture of Nimrod provides sufficient flexibility to accommodate any of these approaches. 6.3 Network Management To Be Specified. 6.4 Security To Be Specified. References [1] J. N. Chiappa, ``A New IP Routing and Addressing Architecture,'' IETF Internet Draft, 1991. [2] M. Steenstrup, ``Inter-Domain Policy Routing Protocol Specification: Version 1,'' RFC 1479, June 1993.
[3] ISO, ``Information Processing Systems--Telecommunications and Information Exchange between Systems--Protocol for Exchange of Inter-Domain Routeing Information among Intermediate Systems to Support Forwarding of ISO 8473 PDUs,'' ISO/IEC DIS 10747, August 1992. [4] R. Wright, Three Scientists and their Gods: Looking for Meaning in an Age of Information. New York: Times Books, first ed., 1988. [5] J. Penners and Y. Rekhter, ``Simple Mobile IP (SMIP),'' Internet Draft, Sep 1993. (draft-penners-mobileip-smip-00.txt). [6] K. A. Wimmer and J. B. Jones, ``Global Development of PCS,'' IEEE Communications Magazine, pp. 22--27, Jun 1992. [7] F. Teraoka, Y. Yokote, and M. Tokoro, ``A Network Architecture Providing Host Migration Transparency,'' in Proceedings of ACM SIGCOMM, 1991. [8] S. Deering and D. Cheriton, ``Multicast routing in datagram internetworks and extended LANs,'' ACM Transactions on Computer Systems, pp. 85--111, May 1990. [9] Y. K. Dalal and R. M. Metcalfe, ``Reverse path forwarding of broadcast packets,'' Communications of the ACM, 21(12), pp. 1040--1048, 1978. [10] A. J. Ballardie, P. F. Francis, and J. Crowcroft, ``Core Based Trees,'' in Proceedings of ACM SIGCOMM, 1993. [11] S. Deering, D. Estrin, D. Farinacci, and V. Jacobson, ``IGMP router extensions for routing to sparse multicast groups,'' Internet Draft, July 1993. [12] J. Moy, ``Multicast extensions to OSPF,'' Internet Draft, Sep 1992. [13] M. Steenstrup, ``Inter-Domain Policy Routing,'' Internet Draft, June 1993.   Received: from PIZZA.BBN.COM by BBN.COM id aa08235; 15 Mar 94 16:06 EST Received: from pizza by PIZZA.BBN.COM id aa02555; 15 Mar 94 15:41 EST Received: from BBN.COM by PIZZA.BBN.COM id aa02551; 15 Mar 94 15:39 EST Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa06370; 15 Mar 94 15:39 EST Received: by ginger.lcs.mit.edu id AA08989; Tue, 15 Mar 94 15:38:54 -0500 Date: Tue, 15 Mar 94 15:38:54 -0500 From: Noel Chiappa Message-Id: <9403152038.AA08989@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Draft Nimrod Architecture Document Cc: jnc@ginger.lcs.mit.edu Before everyone looks at this and goes bonkers, let me emphasize that it's a fairly rough draft, in which different people wrote different sections, and the people who wrote text don't necessarily agree with the contents of each other's sections! :-) It's being put out at an early stage to stimulate discussion. I have a largish set of comments which take exception to various bits and pieces, and I'll be sending them in soon, so fire away! Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa15742; 21 Mar 94 17:35 EST Received: from pizza by PIZZA.BBN.COM id aa05240; 21 Mar 94 17:11 EST Received: from BBN.COM by PIZZA.BBN.COM id aa05236; 21 Mar 94 17:08 EST Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa14158; 21 Mar 94 17:03 EST Received: by ginger.lcs.mit.edu id AA27065; Mon, 21 Mar 94 17:02:25 -0500 Date: Mon, 21 Mar 94 17:02:25 -0500 From: Noel Chiappa Message-Id: <9403212202.AA27065@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: IETF multicast Cc: jnc@ginger.lcs.mit.edu, mwalnut@cnri.reston.va.us Despite our having put the request in shortly after the last IETF, our second WG slot has been denied multi-cast coverage. (The coverage went to "Multiparty Multimedia Session Control WG", which got two multicast slots, and "Transition and Coexistence Including Testing BOF".) Since we will be reviewing the basic architecture document in both sessions, I don't know what will happen on which day.
If anyone who was planning on participating via multi-cast has some specific issue they'd like to cover, please let me know, so we can get to it the first day. Noel PS: If you want to complain, the person to talk to is Megan (mwalnut@cnri).   Received: from PIZZA.BBN.COM by BBN.COM id aa01820; 25 Mar 94 15:37 EST Received: from pizza by PIZZA.BBN.COM id aa00245; 25 Mar 94 15:15 EST Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa00240; 25 Mar 94 15:08 EST To: nimrod-wg@BBN.COM Subject: IETF29 Agenda Date: Fri, 25 Mar 94 15:04:30 -0500 From: Isidro Castineyra Group Name: Nimrod - The New Internet Routing and Addressing Architecture BOF IETF Area: Routing Date/Time: Tuesday, March 29, 1994 0930-1200 PST (multicast) Wednesday, March 31, 1994 0930-1200 PST -------- Proposed Agenda: The main purpose of this meeting is to review the draft Architecture document and to prepare a workplan for the next IETF. 1. Agenda bashing 2. Draft Review & Discussion a. Nimrod Draft Architecture (Isidro Castineyra) 60min b. Discussion 120min 3. Open Issues 60min 4. Workplan 30min 5. Deployment Strategy 30min Chairs: J. Noel Chiappa, Isidro Castineyra (BBN)   Received: from PIZZA.BBN.COM by BBN.COM id aa10140; 29 Mar 94 9:51 EST Received: from pizza by PIZZA.BBN.COM id aa23766; 29 Mar 94 9:23 EST Received: from BBN.COM by PIZZA.BBN.COM id aa23762; 29 Mar 94 9:19 EST Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa07924; 29 Mar 94 9:17 EST Received: by ginger.lcs.mit.edu id AA02566; Tue, 29 Mar 94 09:13:40 -0500 Date: Tue, 29 Mar 94 09:13:40 -0500 From: Noel Chiappa Message-Id: <9403291413.AA02566@ginger.lcs.mit.edu> To: int-serv@isi.edu, nimrod-wg@BBN.COM Subject: A taxonomy of state and tags in datagram networks Cc: big-internet@munnari.oz.au, jnc@ginger.lcs.mit.edu As a result of discussions in the Int-Serv WG sessions yesterday, I'd like to suggest a framework for thinking about, and a set of terms for describing, the state in datagram network systems. I think this framework will both help us understand more clearly what's going on, and the common terminology will allow us to have discussions without getting confused. I want to emphasize that I'm not pushing one design or another, just trying to give us a common framework which we can use to understand, discuss, and evaluate different designs in. I don't distinguish between "state" (which one normally thinks of as data which records something transitory about a given process), and information (which might be seen as a little more permanent), since it's merely a matter of time-scale; there's very transient state (such as the hop-count), intermediate lifetime (such as information about a flow), and long-lived (such as routing table entries). However, it's all just state in the end. I start by observing that even "classic" IPv4 has state in both the routers, and the packets. In the packet, I distinguish between what I call "forwarding state", which records something about the progress of this individual packet through the network (such as the hop count, or the pointer into a source route), and what I call "tags", which are fields which are used, in the routers through which the packet passes, to look up various state stored in the routers. An example of tags in the current IPv4 architecture is the "address", which is used to look up routing table state; a "flow-id" might be a future tag. I further subdivide tags into two sub-classes: "keys" and "hints".
A key is a field without which, or without the state in the router to which it refers, one cannot forward the packet; the "address" is such a field in IPv4. A hint is a field which is not necessary for the forwarding of the packet, but which makes the forwarding more efficient if the hint is correct, and the state in the router to which it refers is present. In the routers, there is a similar distinction between state which must be present for the forwarding process to be successful, which we might call "necessary" state (although I don't like this term, and welcome a better one), such as routing table entries; and cached state, which is not necessary for the correct forwarding of packets. The term "soft state" is, I believe, sometimes used to refer to the latter kind of state. Note that neither of these states is what we refer to as "critical" state, i.e. state which is critical to a given end-end communication, and which, once lost, means the loss of the connection. An example of this is the state of a TCP connection, as stored co-located (i.e. sharing fate) with the application. Having defined this way of looking at state, I'll put some thoughts on how we ought to create and maintain this state in a separate message. Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa15910; 29 Mar 94 11:20 EST Received: from pizza by PIZZA.BBN.COM id aa24445; 29 Mar 94 11:04 EST Received: from BBN.COM by PIZZA.BBN.COM id aa24441; 29 Mar 94 11:02 EST Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa14676; 29 Mar 94 11:00 EST Received: by ginger.lcs.mit.edu id AA03070; Tue, 29 Mar 94 10:54:59 -0500 Date: Tue, 29 Mar 94 10:54:59 -0500 From: Noel Chiappa Message-Id: <9403291554.AA03070@ginger.lcs.mit.edu> To: int-serv@isi.edu, nimrod-wg@BBN.COM Subject: How to create and maintain state in a packet network Cc: big-internet@munnari.oz.au, jnc@ginger.lcs.mit.edu It seems to be an item of rough consensus that in the future internetwork, the routers will need to contain more state about the traffic flowing through them if we want to provide fair sharing of resources etc. A smaller, but influential, group feels that in order to provide certain services we will need to be able to provide service guarantees, and do so in a way which requires explicit reservation of resources. All of this means more state in the routers. The interesting question is how this state gets into the routers; is it inferred from watching the datagrams which flow through the router, or is it installed in some more explicit way? In all of this discussion, I'm passing over the need to retain support for real datagrams, i.e. transactions which require only a single packet. (In fact, I'm going to use the term "datagram" from here on out to refer to such applications, and use the term "packet" to describe the unit of transmission through the network.) There's clearly no point in storing any state about that datagram in the routers; it's come and gone in the same packet, and this discussion is all about state retention, so that's why I don't talk about them. However, don't assume from this discussion of how to handle the packets generated by non-datagram applications that I'm advocating systems which support *only* such packet sequences (which we call "flows"). As another aside, I still think that the unreliable packet ought to be the fundamental building block of the internetwork layer.
I really like the design principle that says that we can take any packet and throw it away with no warning or other action, or take any router and turn it off with no warning, and have the system still work. The component design simplicity (since routers don't have to stand on their heads to retain a packet which they have the only copy of), and overall system robustness, resulting from these two assumptions are absolutely unloseable. Anyway, back to the question of how the state gets into the routers. There is an interesting potential synergy here, because there is thought being given to routing architectures which, for reasons of engineering efficiency, store more state in the routers for long-lived flows. (There may be similar thoughts in the security and billing areas, but I'm not aware of them.) It's important to realize that there is no *fundamental* reason why this state has to be stored in the routers, and looked up via a key in the packets. It could easily be repeated in every packet (as a source route), but we don't plan on doing so for reasons of efficiency in header size (both in terms of bandwidth, and in processing to create and forward the packets). This observation is of some use when thinking about the router state which is used for doing resource allocation. Some of that state might be information about the user's service needs; information which could be sent in each packet, or which can be saved in the router, depending on which makes the most engineering sense. I call such state, which reflects the desires of the user, "user state", even when a copy is cached in the routers. However, other state needed for this cannot be stored in each packet; it's state about the longer-term (i.e. spanning multiple packets) situation. I call this state "server state". There are two schools of thought as to how to proceed. The first says that for reasons of robustness and simplicity, all "user state" (resource class info, source route, etc) ought to be repeated in each packet. For efficiency reasons, the routers may cache such "user state", probably along with precomputed data derived from the user state. (It makes sense to store such cached user state along with any applicable server state, of course.) This school may be subdivided into two subschools, depending on what hint they use in the packet to find this cached state. (It's a hint, not a key, since the state in the router can be discarded at any time without making it impossible to forward the packet.) In one subschool, there's a field (the flow-id) whose sole purpose in life is to be a hint. In the other subschool, a number of other fields (such as source and destination address, port, etc) combine to be the hint. The second school says that it's simply going to be too inefficient to carry all the user state around all the time, and we should just bite the bullet, install it in the routers directly, and include in the packets a key (also called a flow-id, just to be confusing) to find that state. I call this the "installation" school. I'm not sure how much use there is to any intermediate position. It seems to me that to have one internetwork layer subsystem (e.g. resource allocation) carry user state in all the packets, and use a hint in the packets to find it, and have a second (e.g. routing) use a direct installation, and use a key in the packets to find it, makes little sense. We should do one or the other, based on a consideration of the efficiency/robustness tradeoff.
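To make the contrast concrete, here is a toy rendering of the two schools (everything here is invented for illustration; derive_state and route are stand-in helpers, not any real API):

    def derive_state(user_state):      # stand-in for precomputation
        return user_state

    def route(pkt, state):             # stand-in for actual forwarding
        return (pkt, state)

    # Replication school: each packet carries its full user state, and
    # the flow-id is only a *hint* into a cache of derived state.
    def forward_with_hint(router, pkt):
        state = router.cache.get(pkt.flow_id)
        if state is None:
            # A cache miss is harmless: recover from the packet itself
            # and repopulate the cache for subsequent packets.
            state = derive_state(pkt.user_state)
            router.cache[pkt.flow_id] = state
        return route(pkt, state)

    # Installation school: the user state lives only in the routers, and
    # the flow-id is a *key* - forwarding fails if the state is gone.
    def forward_with_key(router, pkt):
        state = router.installed.get(pkt.flow_id)
        if state is None:
            raise LookupError("state lost; the flow must be re-installed")
        return route(pkt, state)

The key/hint distinction from the earlier taxonomy message shows up directly: the first function can always make progress on its own, while the second trades that robustness for smaller packet headers.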
It's a little difficult to make this choice without more information about exactly how much user state the network is likely to have in the future (i.e. we might wind up with 500 byte headers if we include the full source route, resource reservation, etc, etc in every header). It's also difficult without consideration of the actual mechanisms involved. As a general principle, we wish to make recovery from a loss of state as local as possible, to limit the number of entities which have to become involved. For instance, when a router crashes, traffic is rerouted around it without needing to open a new TCP connection. In a similar way, the option of "installation" looks a lot more attractive if it's plausible, and relatively cheap, to reinstall the user state when a router crashes, without otherwise causing a lot of hassle. My intuition tells me that in the long run we're better off just biting the bullet on user state, and going to an installation paradigm with keys, not replicated user state in each packet with hints, but until we see more details it may prove difficult to know for sure which way is the best way. Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa21133; 29 Mar 94 12:43 EST Received: from pizza by PIZZA.BBN.COM id aa24924; 29 Mar 94 12:19 EST Received: from BBN.COM by PIZZA.BBN.COM id aa24920; 29 Mar 94 12:16 EST Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa19273; 29 Mar 94 12:15 EST Received: by ginger.lcs.mit.edu id AA03726; Tue, 29 Mar 94 12:10:36 -0500 Date: Tue, 29 Mar 94 12:10:36 -0500 From: Noel Chiappa Message-Id: <9403291710.AA03726@ginger.lcs.mit.edu> To: int-serv@isi.edu, nimrod-wg@BBN.COM Subject: Re: How to create and maintain state in a packet network Cc: big-internet@munnari.oz.au, jnc@ginger.lcs.mit.edu It seems to me that to have one internetwork layer subsystem (e.g. resource allocation) carry user state in all the packets, and use a hint in the packets to find it, and have a second (e.g. routing) use a direct installation, and use a key in the packets to find it, makes little sense. We should do one or the other... It has been pointed out to me that there are three ways in which to interpret this statement, and it makes sense to take note of the different ways, because the utility of doing this makes different degrees of sense in each. First, there is the meaning I had in mind, where one single flow uses different mechanisms for different subsystems. Second, one flow might use a given technique for all its subsystems, and another flow might use a different technique for all of its; there is potentially some use to this, although I'm not sure the cost in complexity of supporting both mechanisms is worth the benefits. Third, one flow might use one mechanism with one router along its path, and another for a different router. A number of different reasons exist as to why one might do this, including the fact that not all routers may support the same mechanisms simultaneously.
Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa00817; 29 Mar 94 14:01 EST Received: from pizza by PIZZA.BBN.COM id aa25445; 29 Mar 94 13:43 EST Received: from BBN.COM by PIZZA.BBN.COM id aa25441; 29 Mar 94 13:41 EST Received: from zephyr.isi.edu by BBN.COM id aa29412; 29 Mar 94 13:39 EST Received: by zephyr.isi.edu (5.65c/5.61+local-16) id ; Tue, 29 Mar 1994 10:35:03 -0800 Date: Tue, 29 Mar 1994 10:35:03 -0800 From: Bob Braden Message-Id: <199403291835.AA23930@zephyr.isi.edu> To: int-serv@isi.edu, nimrod-wg@BBN.COM, jnc@ginger.lcs.mit.edu Subject: Re: A taxonomy of state and tags in datagram networks Cc: big-internet@munnari.oz.au *> one), such as routing table entries; and cached state which is not necessary *> for the correct forwarding of packets. The term "soft state" is, I believe, *> sometimes used to refer to the latter kind of state. Noel, I would say that cached state is simply an example of soft state. The essential feature of soft state is that the router is allowed to delete it without an explicit deletion request. RSVP state is another example of soft state; if it is deleted, packets cannot be forwarded "correctly" (although they may or may not be forwarded as best-effort datagrams). Yet a router is allowed to time it out and delete the RSVP state that is not refreshed. *> Note that neither of these states is what we refer to as "critical" *> state, i.e. state which is critical to a given end-end communication, and *> which, once lost, means the loss of the connection. An example of this is *> the state of a TCP connection, as stored co-located (i.e. sharing fate) with *> the application. You have to get over this exclusively connectionist view of applications, Noel! Most realtime applications don't have state in that sense. Bob Braden   Received: from PIZZA.BBN.COM by BBN.COM id aa01678; 29 Mar 94 14:18 EST Received: from pizza by PIZZA.BBN.COM id aa25522; 29 Mar 94 13:50 EST Received: from BBN.COM by PIZZA.BBN.COM id aa25518; 29 Mar 94 13:48 EST Received: from pooh.cc.iastate.edu by BBN.COM id aa29792; 29 Mar 94 13:44 EST Received: by iastate.edu with sendmail-5.57/4.7 id ; Tue, 29 Mar 94 12:44:39 -0600 Message-Id: <9403291844.AA18886@iastate.edu> To: Noel Chiappa To: int-serv@isi.edu, nimrod-wg@BBN.COM To: big-internet@munnari.oz.au Subject: Re: How to create and maintain state in a packet network In-Reply-To: Your message of Tue, 29 Mar 94 10:54:59 -0500. <9403291554.AA03070@ginger.lcs.mit.edu> Date: Tue, 29 Mar 94 12:44:39 CST From: John Hascall [...] > Anyway, back to the question of how the state gets into the routers. [...] > There are two schools of thought as to how to proceed. [...school 1) all state in every packet...] [....subschool 1a) dedicated field in packet as hint to state cache...] [....subschool 1b) combine existing fields as hint...] [...school 2) install state in routers, key in packet to find state...] > I'm not sure how much use there is to any intermediate position. It seems to me that there is an intermediate position worth exploring. Have a hint in the packet (preferably a dedicated one in my eyes), and have the initial packet contain the state as an option (a la TCP MSS). Since it is a hint, the router could discard it based on whatever criteria it likes (LRU, timeout, lack of space, crash, etc.). It seems to me you would want a way for a router to request the hint again (if it was discarded, or if the routing changed and a new router wanted the hint).
ICMP or its successor would seem the most obvious mechanism for requesting that it occur in the next convenient packet. John ----------------------------------------------------------------------------- John Hascall An ill-chosen word is the fool's messenger. Systems Software Engineer Project Vincent Iowa State University Computation Center + Ames, IA 50011 + 515/294-9551   Received: from PIZZA.BBN.COM by BBN.COM id aa07565; 29 Mar 94 15:51 EST Received: from pizza by PIZZA.BBN.COM id aa26404; 29 Mar 94 15:34 EST Received: from BBN.COM by PIZZA.BBN.COM id aa26400; 29 Mar 94 15:31 EST Received: from emory.mathcs.emory.edu by BBN.COM id aa05975; 29 Mar 94 15:28 EST Received: by emory.mathcs.emory.edu (5.65/Emory_mathcs.3.4.19) via UUCP id AA05865 ; Tue, 29 Mar 94 15:28:43 -0500 Received: by shlep.sware.com (16.6/2.0) from paradox.sware.com id AA05625; Tue, 29 Mar 94 15:25:28 -0500 Sender: sale@paradox.sware.com MMDF-Warning: Parse error in original version of preceding line at BBN.COM Received: from (localhost) by paradox.sware.com with SMTP (5.59/ CMW+ v2.3-eef) id AA00642; Tue, 29 Mar 94 15:24:32 EST Return-Path: Date: Tue, 29 Mar 94 15:24:32 EST Message-Id: <9403292024.AA00642@paradox.sware.com> From: Ed Sale X-Mailer: InterMail/CMW [1.1alpha] Cc: sale@sware.com To: nimrod-wg@BBN.COM Subject: Re: How to create and maintain state in a packet network In-Reply-To: Your message of Tue, 29 Mar 94 12:10:36 -0500. <9403291710.AA03726@ginger.lcs.mit.edu> Noel, I believe that the endpoints of flows which will require the new services could be responsible for maintaining the extended state in the intermediate hops along the path. Packets exchanged end-end early in the flow's lifetime could contain this state and give the endpoints some assurance that the nodes along the path are able to provide the required service. After this exchange, keys may be used to convey this information until such a time as a node along the path either loses this state for some reason or a new route is established for the flow. At this point the intermediate nodes that need to acquire the state could request it from either their nearest neighbors for the flow or from the endpoints themselves. This allows the packets to carry the extended state only on an as-needed basis. How is it possible to reserve resources for providing *guaranteed* services in an internetwork where routers can and do occasionally fail? In my mind this question boils down to, "How many points of failure do we want to provide redundancy for?" The service-reservation messages would potentially have to be passed to all of the routers which might carry packets on behalf of the flow. In any case, I believe that providing this type of service blurs the line between the network and transport layers to some degree. What are some examples of the kinds of services which are perceived as requiring guaranteed service(s) from the network layer? -- Ed ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Ed Sale SecureWare, Inc. email: sale@sware.com 2957 Clairmont Rd.
#200 phone: (404) 315-6296 x24 Atlanta, GA 30329-1647 fax: (404) 315-0293   Received: from PIZZA.BBN.COM by BBN.COM id aa19819; 30 Mar 94 1:31 EST Received: from pizza by PIZZA.BBN.COM id aa28975; 30 Mar 94 1:10 EST Received: from BBN.COM by PIZZA.BBN.COM id aa28971; 30 Mar 94 1:07 EST Received: from monk.proteon.com by BBN.COM id aa19141; 30 Mar 94 1:07 EST Received: from rockford.proteon.com by monk.proteon.com (4.1/Proteon-1.5) id AA28985; Wed, 30 Mar 94 01:06:59 EST Received: by rockford.proteon.com (4.1/SMI-4.1) id AA04527; Wed, 30 Mar 94 01:06:59 EST Date: Wed, 30 Mar 94 01:06:59 EST From: Avri Doria Message-Id: <9403300606.AA04527@rockford.proteon.com> To: nimrod-wg@BBN.COM Subject: Cosmetics or definitions Reply-To: avri@proteon.com i question whether details in a definition can be called cosmetic and readjusted sometime after those definitions have been deeply embedded into the architecture. some definitions can be left loose, e.g. the nature of the object to which an EID is attached, for it may not really matter what kind of object it is that is atomic or fate sharing. other definitions, e.g. whether a boundary attachment point is a node in the context of the next higher abstraction, seem to me to need to be nailed down. avri   Received: from PIZZA.BBN.COM by BBN.COM id aa19404; 30 Mar 94 12:24 EST Received: from pizza by PIZZA.BBN.COM id aa01569; 30 Mar 94 12:03 EST Received: from BBN.COM by PIZZA.BBN.COM id aa01565; 30 Mar 94 11:58 EST Received: from monk.proteon.com by BBN.COM id aa17581; 30 Mar 94 11:52 EST Received: from rockford.proteon.com by monk.proteon.com (4.1/Proteon-1.5) id AA06554; Wed, 30 Mar 94 11:52:52 EST Received: by rockford.proteon.com (4.1/SMI-4.1) id AA04625; Wed, 30 Mar 94 11:52:51 EST Date: Wed, 30 Mar 94 11:52:51 EST From: Avri Doria Message-Id: <9403301652.AA04625@rockford.proteon.com> To: nimrod-wg@BBN.COM Subject: multicast datagrams (pardon the expression) Reply-To: avri@proteon.com in conversations after yesterday morning's meetings, i felt that it was necessary that there be a way to handle what has been defined as non-existent: a multicast datagram. i envision the following possible case: some service provider (a business that is) sends out a daily packet to a set of subscribers with information they consider vital to their businesses. this would be a sender-originated multicast and new businesses would subscribe in their own good time. the sender would build its multicast list and send it off every morning. it certainly would not make sense for the intervening routers to keep state on such an 'every once in an eternity' type of event. it also would not make sense for state to be created in each router (i forget what you are calling the act of setting up multicast state) just to send the one packet. i am also not sure that it would be reasonable for explorers to be sent out to build a source route (sorry, i forget the acronymic euphemism). avri   Received: from PIZZA.BBN.COM by BBN.COM id aa12892; 30 Mar 94 19:14 EST Received: from pizza by PIZZA.BBN.COM id aa04261; 30 Mar 94 18:54 EST Received: from TTL.BBN.COM by PIZZA.BBN.COM id aa04257; 30 Mar 94 18:51 EST To: avri@proteon.com cc: nimrod-wg@BBN.COM Subject: Re: multicast datagrams (pardon the expression) In-reply-to: Your message of Wed, 30 Mar 94 11:52:51 -0500.
<9403301652.AA04625@rockford.proteon.com> Date: Wed, 30 Mar 94 18:47:55 -0500 From: Ram Ramanathan >in conversations after yesterday morning's meetings, i felt that it was >necessary that there be a way to handle what has been defined as non >existent: a multicast datagram. >i envision the following possible case: some service provider (a >business that is) sends out a daily packet to a set of subscribers with >information they consider vital to their businesses. this would be a >sender originated multicast and new businesses would subscribe in their >own good time. the sender would build its multicast list and send it >off every morning. it certainly would not make sense for the >intervening routers to keep state on such an 'every once in an >eternity' type of event. it also would not make sense for state to be >created in each router (i forget what you are calling the act of >setting up multicast state) just to send the one packet. i am also >not sure that it would be reasonable for explorers to be sent out to >build a source route (sorry, i forget the acronymic euphemism). If we have an 'every once in an eternity' type of scenario, why use multicast at all? Why not send individual unicast datagrams? If there are, for example, n destinations, the difference between n individual datagrams and one multicast datagram is probably insignificant when the event occurs infrequently. The difference in overhead between n unicasts and a multicast to n destinations is significant primarily when you send a *large number* of packets (like in video conf), and then the overhead of n unicasts gets multiplied by that large number and becomes unacceptable. But in this situation, setting up a flow is but a minor price to pay. The point is that "multicasting" itself is simply a more efficient way of sending n unicasts - basically a scalability improvement - and becomes relatively unimportant for infrequent or small volume applications. That said, I do think we should consider this option some and am glad Avri brought it up. However, I am not sure everybody agrees on the definition of a multicast datagram (there was some confusion about this yesterday). The important thing is what kind of state requirements does this imply? No state other than the maps (which are state in a sense)? Some simple state derived from the maps but not explicitly set up by the sender or receiver? Any comments? - Ram.   Received: from PIZZA.BBN.COM by BBN.COM id aa15310; 4 Apr 94 2:38 EDT Received: from pizza by PIZZA.BBN.COM id aa22819; 4 Apr 94 2:15 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa22815; 4 Apr 94 2:10 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa14595; 4 Apr 94 2:09 EDT Received: by ginger.lcs.mit.edu id AA23997; Mon, 4 Apr 94 02:09:47 -0400 Date: Mon, 4 Apr 94 02:09:47 -0400 From: Noel Chiappa Message-Id: <9404040609.AA23997@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: multicast datagrams (pardon the expression) Cc: jnc@ginger.lcs.mit.edu it was necessary that there be a way to handle what has been defined as non existent: a multicast datagram. It turns out that I think there is a way to do this, but first let me point out that in Nimrod "datagram" has the special meaning of an application which sends only one "packet" and then is done, so setup is per se not useful. This presents a conundrum, since you normally can't have a multi-cast group without some setup, so what's the point, really? The answer, obviously, is "source-routed multicast datagrams"!
I.e., the datagram contains the spanning-tree path it should be distributed over. The only bug here is that if the group is a large one, the tree could be a *lot* of data.... (Hmm, maybe IPng needs a 32-bit header length field! :-) The knowledge of who is in the group has to be distributed to all sources, of course... Actually, there's a way to solve that, which is to have regional distribution nodes, who know the members in their area. Then you only have to get the packet to them, and they pass it out further. (This would also limit the size of the tree.) Perhaps such servers would be a general service, one which it is worth connecting up with an installed multi-cast flow? Hmmm... lots of possibilities here. i envision the following possible case: some service provider ... sends out a daily packet to a set of subscribers with information they consider vital to their businesses. ... the sender would build its multicast list and send it off every morning. I was wondering about this application; would it really make sense to do this as multicast, and not a bunch of unicasts? I guess, if the multi-cast group was large enough, it makes sense. Then you're back to the previous problem... not make sense for state to be created in each router (i forget what you are calling the act of setting up multicast state) I think of this as "multicast flow setup". Actually, I've been thinking a rather radical thought, which is that maybe we should ditch separate unicast mechanisms (in flow setup and source routed packets), and only have multicast. Unicast would then be a special case of multicast. (I assume this will make Steve happy! :-) My reasoning involves asking "Is there really enough efficiency advantage in having a unicast flow mechanism, separate from multicast flows, to make it worth the complexity of having two separate mechanisms?" I mean, if multi-cast is used a lot, the mechanism to support it must work reasonably efficiently, right? If so, maybe the answer to the previous question is "no", right? The thing I'm wondering about is scaling; I assume that the complexity of supporting maintenance of large multicast groups, where you have adding and pruning going on, means you need a range of mechanisms to support groups of varying sizes from 3 to 3 million? Perhaps there is some general way of distributing the calculation of the spanning trees, and their maintenance, which will scale? Maybe we will build the trees as collections of "distribution nodes", one at each place the tree branches, and connect up the nodes with unicast flows? Anyway, it's late, and I can't think clearly. Have to think about this... Noel PS: Nimrod is a fairly complex system, so any time I see a feature of dubious utility (such as flow aggregation, separate unicast and multicast mechanisms, etc) my immediate reaction is "oh good, something I can get rid of"! After all, the Nimrod motto *is* "Perfection has been attained, not when there is nothing left to add, but when there is nothing left to take away"!
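For concreteness, the "tree in the packet" idea at the top of this message can be encoded very simply (a toy encoding, invented here; nothing in Nimrod specifies this format):

    # A delivery tree is encoded as a list of (next_hop, subtree) pairs,
    # where each subtree is itself such a list (empty at the leaves).
    def distribute(payload, subtrees, send):
        # Each router strips off its own level of the tree and emits one
        # copy per branch, carrying only that branch's subtree; the
        # receiving router just calls distribute() on what it finds.
        for next_hop, branches in subtrees:
            send(next_hop, (branches, payload))

No router keeps any per-group state at all - which is exactly the point - but the header grows with the size of the tree, which is exactly the bug noted above.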
Received: from PIZZA.BBN.COM by BBN.COM id aa14075; 4 Apr 94 11:34 EDT Received: from pizza by PIZZA.BBN.COM id aa24047; 4 Apr 94 11:13 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa24043; 4 Apr 94 11:11 EDT Received: from nsco.network.com by BBN.COM id aa12374; 4 Apr 94 11:08 EDT Received: from anubis.network.com by nsco.network.com (5.61/1.34) id AA18292; Mon, 4 Apr 94 10:11:33 -0500 Received: from gramarye.network.com by anubis.network.com (4.1/SMI-4.1) id AA00977; Mon, 4 Apr 94 10:07:58 CDT Date: Mon, 4 Apr 94 10:07:58 CDT From: Joel Halpern Message-Id: <9404041507.AA00977@anubis.network.com> To: nimrod-wg@BBN.COM Subject: Re: multicast datagrams (pardon the expression) Noel suggests, in his latest "interesting-gram", that we get rid of separate unicast mechanisms, and consider unicast a special case of multi-cast. I think that the key to evaluating this is how much extra work/state/complexity is involved in the "normal" multicast case. If multicast can be handled, at the internetwork layer, at the same complexity and performance as unicast, then we should seriously consider this. One good thing about this is that it probably allows, with proper thought, for almost any variation on "anycast" that one wants, since that is merely a behavior halfway between unicast and multicast. This should also serve as a warning as to where scaling is likely to be a problem. Not with the protocol/routing, but with the usage we are encouraging. Thank you, Joel M. Halpern jmh@network.com Network Systems Corporation   Received: from PIZZA.BBN.COM by BBN.COM id aa22876; 4 Apr 94 13:56 EDT Received: from pizza by PIZZA.BBN.COM id aa24756; 4 Apr 94 13:39 EDT Received: from TTL.BBN.COM by PIZZA.BBN.COM id aa24752; 4 Apr 94 13:37 EDT To: nimrod-wg@BBN.COM Subject: Re: multicast datagrams (pardon the expression) In-reply-to: Your message of Mon, 04 Apr 94 10:07:58 -0500. <9404041507.AA00977@anubis.network.com> Date: Mon, 04 Apr 94 13:30:49 -0400 From: Ram Ramanathan Joel Halpern writes : >Noel suggests, in his latest "interesting-gram", that we get rid of >separate Unicast mechanisms, and consider unicast a special case of >multi-cast. I think that the key to evaluating this is how much extra >work/state/complexity is involved in the "normal" multicast case. If >multicast can be handled, at the internetwork layer, at the same >complexity and performance as unicast, then we should seriously consider >this. As mentioned in the architecture document, multicasting has the following non-trivial things to consider, that are not a concern in unicast : 1) Groups and group dynamism. 2) Diverse possibilities of state creation - eg. initiated by senders, receivers, both or neither. Each has its advantages and disadvantages and must not be precluded. 3) Consequences of policies in multicasting. If a transit policy precludes some members as destinations and allows the rest of the members, the group is essentially partitioned into two subgroups (an arbitrary number in general), requiring a multicast "forest" not a tree - a rather difficult problem. We have addressed and solved this problem in IDPR, but Nimrod would be more complicated. Etc. I used to be violently for making unicast a special case of multicast a few months ago. But the above and other points have tempered me a bit. I believe we should concentrate on unicast by itself first. Later, when we understand the tradeoffs better, we could combine the two if we agree that the extra baggage for unicast is worth the "uniformity".
>One good thing about this is that it probably allows, with proper >thought, for almost any variation on "anycast" that one wants, since >that is merely a behavior halfway between unicast and multicast. This >should also serve as a warning as to where scaling is likely to be a >problem. Not with the protocol/routing, but with the usage we are >encouraging. Yes, it is powerful. In fact, let us take this one step further in generality. In an ideal world, the network provides an x-y-z-cast. Here, x = nodes in the network. y = a subset of x. z = ANY or ALL (a selector that says if all of y or any of y is the dest). Then, broadcast/unicast/multicast/anycast are special cases of x-y-z-cast: broadcast: y = x, z = ALL. unicast: y = destination, z = ALL. multicast: y = group, z = ALL. anycast: y = group, z = ANY (anycast: communicate with any of a given set of nodes). Of course, the advantage in generalizing particulars is that we can "see" new particulars. A (rather impotent) example is: anyothercast: y = x - group, z = ANY. But other combinations may be interesting. best regards, - Ram. -------------------------------------------------------------- Ram Ramanathan Systems and Technologies Bolt, Beranek and Newman Inc. 10 Moulton Street, Cambridge, MA 02138 Phone : (617) 873-2736 INTERNET : ramanath@bbn.com   Received: from PIZZA.BBN.COM by BBN.COM id aa12640; 4 Apr 94 21:14 EDT Received: from pizza by PIZZA.BBN.COM id aa27433; 4 Apr 94 20:54 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa27429; 4 Apr 94 20:52 EDT Received: from alex.disem.dnd.ca by BBN.COM id aa12030; 4 Apr 94 20:52 EDT Received: by alex.disem.dnd.ca (4.1/SMI-4.1) id AA04461; Mon, 4 Apr 94 20:51:57 EST From: "Capt L. Clement" Message-Id: <9404050151.AA04461@alex.disem.dnd.ca> Subject: Is the NIMROD Proposal available? To: nimrod-wg@BBN.COM Date: Mon, 4 Apr 1994 20:51:57 -0500 (EST) X-Mailer: ELM [version 2.4 PL23] Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Content-Length: 433 Is the NIMROD proposal available for release? If so, where can it be obtained? Thank you for your consideration in this matter. ------------------------------------------------------------------ Capt L Clement, DISEM 3-4-2 clement@disem.dnd.ca National Defence Headquarters 219 Laurier Ave West Ottawa, Canada Tel: 613 992-3851 K1A 0K2 Fax: 613 996-3979 ------------------------------------------------------------------   Received: from PIZZA.BBN.COM by BBN.COM id aa09780; 5 Apr 94 1:13 EDT Received: from pizza by PIZZA.BBN.COM id aa28333; 5 Apr 94 0:45 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa28329; 5 Apr 94 0:42 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa09072; 5 Apr 94 0:42 EDT Received: by ginger.lcs.mit.edu id AA02140; Tue, 5 Apr 94 00:42:23 -0400 Date: Tue, 5 Apr 94 00:42:23 -0400 From: Noel Chiappa Message-Id: <9404050442.AA02140@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: RFC-1609 Cc: jnc@ginger.lcs.mit.edu It might prove somewhat interesting to look at this document. Here's a clip from it: Network related information, referred to as 'network map' in the rest of this paper, should 1. Show the interconnection between the various network elements. This will basically represent the Network as a graph where vertices represent objects like gateways/workstations/subnetworks and edges indicate the connections. 2. Show properties and functions of the various network elements and the interconnections.
Attributes of vertices will represent various properties of the objects e.g., speed, charge, protocol, OS, etc. Functions include services offered by a network element. ... 5. Contain the policy related information, part of which may be private while the other part may be made public. Using this map the following services may be provided ... 2. Route management: - Find alternate routes by referring to the physical and logical configurations. - Generate routing tables considering local policy and policy of transit domains - Check routing tables for routing loops, non-optimality, incorrect paths, etc. 3. Fault management: In case of network failures alternatives may be found and used to bypass the problem node or link. ... 5. Optimization: The information available can be used to carry out various optimizations, for example cost, traffic, response-time, etc. It all sounds familiar, no? However, on reading it, my perception is that they haven't yet gotten their hands around the abstraction problem; i.e. they provide a way to distribute the storage of the map, but provide no way to "simplify" pieces of it. Since that's the really hard one... Still, it's interesting to see someone else going down the map-based road... Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa20674; 5 Apr 94 4:57 EDT Received: from pizza by PIZZA.BBN.COM id aa29174; 5 Apr 94 4:36 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa29170; 5 Apr 94 4:33 EDT Received: from mitsou.inria.fr by BBN.COM id aa20112; 5 Apr 94 4:30 EDT Received: by mitsou.inria.fr (5.65c8/IDA-1.2.8) id AA22174; Tue, 5 Apr 1994 10:34:59 +0200 Message-Id: <199404050834.AA22174@mitsou.inria.fr> To: Ram Ramanathan Cc: nimrod-wg@BBN.COM Subject: Re: multicast datagrams (pardon the expression) In-Reply-To: Your message of "Mon, 04 Apr 1994 13:30:49 EDT." <199404041810.AA02136@sophia.inria.fr> Date: Tue, 05 Apr 1994 10:34:59 +0200 From: Christian Huitema One key point of multicasting is that you often don't know whom you are sending to. In that case, there is no possibility of including a 'delivery tree' in the packet - that technique is only fit for well-controlled conferences where a signalling procedure keeps track of the membership. Christian Huitema   Received: from PIZZA.BBN.COM by BBN.COM id aa18918; 5 Apr 94 12:01 EDT Received: from pizza by PIZZA.BBN.COM id aa01233; 5 Apr 94 11:46 EDT Received: from BBN.COM by PIZZA.BBN.COM id ab01225; 5 Apr 94 11:42 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa17248; 5 Apr 94 11:36 EDT Received: by ginger.lcs.mit.edu id AA07570; Tue, 5 Apr 94 11:35:59 -0400 Date: Tue, 5 Apr 94 11:35:59 -0400 From: Noel Chiappa Message-Id: <9404051535.AA07570@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: multicast datagrams (pardon the expression) Cc: jnc@ginger.lcs.mit.edu One key point of multicasting is that you often don't know whom you are sending to. In that case, there is no possibility of including a 'delivery tree' in the packet - that technique is only fit for well-controlled conferences where a signalling procedure keeps track of the membership. All multicast groups do have two things: a membership list, and a delivery tree; the only questions are whether that list is stored in a distributed fashion, and whether the tree is computed by a distributed algorithm. In most current multicast systems, the answer to both is "yes". (I'm excluding multicast groups which only span a single hardware network; the forwarding infrastructure isn't involved in such groups.)
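The separation being drawn here can be put as a small data-structure observation (a sketch, with invented names; not any structure defined by Nimrod):

    # The two pieces of multicast group state distinguished in the text.
    class MulticastGroup:
        def __init__(self, group_id):
            self.group_id = group_id
            self.members = set()      # membership list: who is in the group
            self.delivery_tree = {}   # node -> next hops: how to reach them
    # The architectural questions are only *where* each piece is stored
    # (centrally or distributed) and *who* computes the delivery tree.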
In particular, one can view the distributed computation and setup of the delivery tree in most current schemes as a "flow setup"; it's just one with no resource allocation or other flow-type things, simply information about packet forwarding paths (like Nimrod). It certainly results in state about that multicast group being stored in routers: if it walks like a flow, and quacks like a flow... Maybe there isn't a use for "real" datagram multicast; i.e. one which has *no* state associated with that flow stored in the routers. However, I for one can't see a way to do it other than i) including the group delivery tree in the packets, or ii) relying on server(s) which know that information, and stick it in the packets (perhaps in a distributed fashion, so that no single server knows the whole tree). Of course, the difference between ii) and multicast flow setup is mostly whether the "servers" are co-located with the routers or not.... Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa25827; 5 Apr 94 13:53 EDT Received: from pizza by PIZZA.BBN.COM id aa01968; 5 Apr 94 13:34 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa01964; 5 Apr 94 13:27 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa24055; 5 Apr 94 13:25 EDT Received: by ginger.lcs.mit.edu id AA09017; Tue, 5 Apr 94 13:25:12 -0400 Date: Tue, 5 Apr 94 13:25:12 -0400 From: Noel Chiappa Message-Id: <9404051725.AA09017@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: How to create and maintain state in a packet network Cc: jnc@ginger.lcs.mit.edu From: Ed Sale I believe that the endpoints of flows which will require the new services could be responsible for maintaining the extended state in the intermediate hops along the path. I think that this is the ultimate fallback position, for reasons of both robustness and simplicity. I.e. if we don't have to engineer the network to *never* punt back to the hosts, it makes the network engineering a lot easier. However, we may still want to do repairs as locally as possible, for a number of reasons... Packets early in the flow's lifetime could be exchanged end-end which contain this state and give the endpoints some assurance that the nodes along the path are able to provide the required service. After this exchange, keys may be used to convey this information You've described what I think of as flow setup, and flow-id's, exactly. until such a time as a node along the path either loses this state for some reason or a new route is established for the flow. At this point the intermediate nodes that need to acquire the state could request it from either their nearest neighbors for the flow or from the endpoints themselves. Right; it's a "simple matter of engineering" (famous last words :-) as to which of the two gets done when, based on cost/benefit tradeoff issues. How is it possible to reserve resources for providing *guaranteed* services in an internetwork where routers can and do occasionally fail? In my mind this question boils down to, "How many points of failure do we want to provide redundancy for?" As Masataka Ohta pointed out, there's no such thing as an absolute guarantee. The best you can do is, as you expend more resources (money :-), you get a higher probability of getting the kind of service you want. The service-reservation messages would potentially have to be passed to all of the routers which might carry packets on behalf of the flow. I don't quite see this.
I can see systems (such as hop-by-hop) where the large degree of local freedom on where packets go, *together with* the strong decoupling between routing and resource allocation, raises issues which can only be solved with this kind of thing. However, if you have an internetwork layer which couples the two more tightly, you can avoid this. Am I missing something? As a semi-worked example, in Nimrod you go to set up a flow with certain resource needs, so you try to do a resource allocation as part of the setup. If it succeeds along the path you specified (which may be specified in terms of high-level entities which actually refer to a number of parallel real physical paths), that means the routers have picked a linearly arranged set of links and switches to carry your traffic, and the resources were allocated. If one of those fails, the routers may be able to select an alternate set, although still within the group named by the high-level entities which you named in your flow path, and allocate the resources you asked for along that path. If so, everything's OK; you may see a slight service interruption. (Actually, you may not wish such automatic local repair; I suppose we could have a switch in the flow setup to disable it.) If not, you get told "sorry, redo the setup, we can't do it any more". In any case, I believe that providing this type of service blurs the line between the network and transport layers to some degree. No. What we are doing is radically enhancing the service model provided by the internetwork layer... What are some examples of the kinds of services which are perceived as requiring guaranteed service(s) from the network layer? Jeez, I'm not an application type, so I have to hand-wave a bit; they can answer better than I. The people doing voice seem to think they have resource floors below which their application just won't work... Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa00992; 6 Apr 94 14:28 EDT Received: from pizza by PIZZA.BBN.COM id aa08990; 6 Apr 94 14:07 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa08986; 6 Apr 94 14:01 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa28584; 6 Apr 94 13:59 EDT Received: by ginger.lcs.mit.edu id AA19774; Wed, 6 Apr 94 13:59:50 -0400 Date: Wed, 6 Apr 94 13:59:50 -0400 From: Noel Chiappa Message-Id: <9404061759.AA19774@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: multicast datagrams (pardon the expression) Cc: jnc@ginger.lcs.mit.edu > regional distribution nodes, who know the members in their area. Then > you only have to get the packet to them, and they pass it out further. > (This would also limit the size of the tree.) Perhaps such servers would > be a general service, one which it is worth connecting up with an > installed multi-cast flow? could you explain how what you propose differs from CBT? Well, depending on what aspect of multicast you're looking at, it may not be very different. I mean, if you're looking at what paths the user data flows along, it may be much the same. As I mentioned in my reply to Christian, I divide multi-cast state into group membership, and distribution (spanning) tree(s). These can basically be handled and calculated separately (with the exception that the former is input to the latter). Most multicast schemes (including CBT) seem to intermix these two functions. In fact, in looking at multicast schemes in general, there are three different important aspects: the creation and distribution of the two classes of state, and the actual paths the data takes.
(For the latter point, CBT uses distribution from centralized point(s); other schemes use separate trees for each source, e.g. MOSPF/DVMRP, or allow either, e.g. PIM.) In each of these three areas, there seem to be different answers that make sense depending on the size of the group, data rate, etc. For instance, if you have a 4 site videoconference, they probably all know who else is in the conference, and it probably makes sense to calculate the spanning tree in a unitary (i.e. non-distributed) way. Separate trees seem to be the way to go in terms of how to distribute the user data for this application, from what I understand. Larry King Live (with call-ins, so it's not all one way, a la HBO), on the other hand, would use something totally different in all these areas. To me, the best way to go for the long term is to provide a framework into which various "local" answers for all of these things fit. Separating out maintenance of group membership and tree calculation from each other, and allowing varying local answers for each, seems like part of the answer. That way, new algorithms for doing either can be deployed incrementally without great upheaval. The question is "what mechanism must be provided in a uniform system-wide way to allow this"; is multi-cast flow setup it? Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa18477; 7 Apr 94 0:04 EDT Received: from pizza by PIZZA.BBN.COM id aa02037; 6 Apr 94 23:42 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa02033; 6 Apr 94 23:40 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa14493; 6 Apr 94 23:39 EDT Received: by ginger.lcs.mit.edu id AA24966; Wed, 6 Apr 94 23:39:20 -0400 Date: Wed, 6 Apr 94 23:39:20 -0400 From: Noel Chiappa Message-Id: <9404070339.AA24966@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Source routes... Cc: jnc@ginger.lcs.mit.edu We've been requested by the IPNg Technical Requirements people to provide a list of the Nimrod requirements for IPng. I'm working on a draft for what we could submit, and I've come across something I'd like to raise. (I don't want to make any assumptions about whether or not the Internet will use Nimrod (although I think something like it will eventually be where the Internet winds up), so I can't tell them exactly what the IPng requirements will be for routing, as other schemes may need different support. However, I can tell them what Nimrod needs.) The issue has to do with source-routed packets; specifically, how one actually forwards such packets. I imagine a mechanism much like the way datagram packets work. I had imagined that if one expressed a source route in terms of, say, a high-level virtual link, the efficient and robust way to actually forward that packet would be for the nodes which are the ends of that virtual link to actually set up a flow which instantiates that virtual link. Any source-routed packet which arrives which specifies that virtual link would be assigned to that flow (i.e. you bash that flow-id into the unused flow-id field in the packet, and fire it down the flow); when it pops out the other end, the router there looks at the next element in the source route. Does this seem like a reasonable model of how to do it? Of course, any Nimrod area could internally do something different, as long as it followed the semantics of the source route, but I imagine this would be the way to go in most cases.
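A minimal sketch of the forwarding model just described, in Python; the class and field names (Packet, VirtualLinkHead, sr_pointer, setup_flow) are illustrative assumptions, not anything the Nimrod spec defines:

    # Sketch: forwarding a source-routed packet by instantiating the
    # virtual link it names as a real flow, per the model above.

    class Packet:
        def __init__(self, source_route, payload):
            self.source_route = source_route  # sequence of virtual-link names
            self.sr_pointer = 0               # forwarding state in the packet
            self.flow_id = None               # unused field, bashed in transit
            self.payload = payload

    class VirtualLinkHead:
        """Router at the near end of a high-level virtual link."""
        def __init__(self):
            self.flows = {}  # virtual-link name -> flow-id of installed flow

        def forward(self, pkt):
            vlink = pkt.source_route[pkt.sr_pointer]
            if vlink not in self.flows:
                # first packet naming this virtual link: set the flow up once
                self.flows[vlink] = self.setup_flow(vlink)
            pkt.flow_id = self.flows[vlink]  # bash flow-id into the packet
            # ...then fire it down the flow; the router at the far end
            # advances sr_pointer and looks at the next element.
            return pkt

        def setup_flow(self, vlink):
            return hash(vlink) & 0xffffffff  # stand-in for real flow setup

The point of the sketch is only that the flow is set up once, by the nodes at the ends of the virtual link, and every later source-routed packet naming that link is mapped onto it.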
Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa21957; 7 Apr 94 11:50 EDT Received: from pizza by PIZZA.BBN.COM id aa04811; 7 Apr 94 11:28 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa04807; 7 Apr 94 11:25 EDT Received: from wd40.ftp.com by BBN.COM id aa20087; 7 Apr 94 11:22 EDT Received: from mailserv-D.ftp.com by ftp.com ; Thu, 7 Apr 1994 11:21:57 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA24901; Thu, 7 Apr 94 11:21:04 EDT Date: Thu, 7 Apr 94 11:21:04 EDT Message-Id: <9404071521.AA24901@mailserv-D.ftp.com> To: jnc@ginger.lcs.mit.edu Subject: Re: Source routes... From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: nimrod-wg@BBN.COM Content-Length: 808 > Does this seem like a reasonable model of how to do it? Yes. But in keeping with Nimrod's model of letting areas do things how they please, I'd word it "the nodes which are the ends of that virtual link take the responsibility for getting the packet across the virtual link in a manner consistent with the Nimrod Architecture". If Nimrod does its best to not specify particular algorithms, etc, then saying that the two nodes set up a flow seems to be specifying an algorithm -- I could imagine that the two nodes could do true hop-by-hop forwarding, ala IPv4, if their local topology was simple enough. Where the underlying 'context' is to be general, keep the wording general, and vice versa... -- Frank Kastenholz FTP Software 2 High Street North Andover, Mass. USA 01845 (508)685-4000   Received: from PIZZA.BBN.COM by BBN.COM id aa29349; 7 Apr 94 13:57 EDT Received: from pizza by PIZZA.BBN.COM id aa05929; 7 Apr 94 13:44 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa05925; 7 Apr 94 13:41 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa28173; 7 Apr 94 13:36 EDT Received: by ginger.lcs.mit.edu id AA03490; Thu, 7 Apr 94 13:36:38 -0400 Date: Thu, 7 Apr 94 13:36:38 -0400 From: Noel Chiappa Message-Id: <9404071736.AA03490@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Nimrod IPng technical requirements text Cc: jnc@ginger.lcs.mit.edu In polling the BBN crew for points to put in this, I came across an interesting point. It was suggested that I include "multicast locators" as a requirement. This caused an immediate fault. Locators are the names of objects in the Nimrod map; you can't *have* a multicast locator! This raises an interesting question. We can have a multi-drop flow, but how do you name the set of things the multi-cast flow is delivering to (i.e. the multi-cast group)? It can't be a locator, right? It has to be an EID, the only other kind of name we've got. This kind of tends to blow a hole in the definition of an "endpoint" as a fate-sharing region, though... That does tie into something Bob Braden said the other day, which is that it's useful to think more about multi-cast applications, where the concept of "critical state" is far less useful. Maybe these are two facets of the same thing. Anyway, that would mean that we have EID's for multicast groups, and, moreover, that a single endpoint can have more than one EID. (It would be useful to be able to tell from looking at an EID whether it names a group, or a single endpoint; we can use the "top bit" hack for that.) So, I think that resolves an old open point about whether endpoints and EID's are in one-one correspondence... of course, there's still the issue of whether a single endpoint can have more than one non-multicast EID. Does this all sound OK?
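As a concrete illustration of the "top bit" hack, a minimal sketch assuming 64-bit EIDs (the width recommended in the requirements note further on); the constant and function names are made up for the example:

    # Sketch: distinguishing group names from endpoint names by the top bit.
    EID_BITS = 64
    GROUP_FLAG = 1 << (EID_BITS - 1)  # top bit set => names a multicast group

    def is_group(eid):
        return bool(eid & GROUP_FLAG)

    def make_group_eid(counter):
        # group EIDs could be handed out by a registry incrementing a counter
        return GROUP_FLAG | counter

    assert not is_group(0x2A)           # ordinary endpoint EID
    assert is_group(make_group_eid(7))  # group EID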
Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa02003; 7 Apr 94 14:38 EDT Received: from pizza by PIZZA.BBN.COM id aa06226; 7 Apr 94 14:19 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa06222; 7 Apr 94 14:18 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa00222; 7 Apr 94 14:10 EDT Received: by ginger.lcs.mit.edu id AA03954; Thu, 7 Apr 94 14:10:36 -0400 Date: Thu, 7 Apr 94 14:10:36 -0400 From: Noel Chiappa Message-Id: <9404071810.AA03954@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Source routes... Cc: jnc@ginger.lcs.mit.edu But in keeping with Nimrod's model of letting areas do things how they please, I'd word it "the nodes which are the ends of that virtual link take the responsibility for getting the packet across the virtual link in a manner consistent with the Nimrod Architecture". I thought that's what I said: "Of course, any Nimrod area could internally do something different, as long as it followed the semantics of the source route". If Nimrod does its best to not specify particular algorithms, etc, then saying that the two nodes set up a flow seems to be specifying an algorithm The reason I bring this sort of stuff up is that to the extent we have "recommended" mechanisms, those mechanisms may find support in the packet format (e.g. a flow-id they can bash locally) useful, as opposed to having to create a new header to wrap the packet for transit across their system. Also, let's be realistic; most people will simply implement what the spec suggests, not go invent some whole new mechanism! Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa02237; 7 Apr 94 14:42 EDT Received: from pizza by PIZZA.BBN.COM id aa06280; 7 Apr 94 14:25 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa06276; 7 Apr 94 14:23 EDT Received: from usc.edu by BBN.COM id aa00396; 7 Apr 94 14:14 EDT Received: from hermosa.usc.edu by usc.edu (4.1/SMI-3.0DEV3-USC+3.1) id AA21256; Thu, 7 Apr 94 11:14:03 PDT Received: by hermosa.usc.edu (4.1/SMI-4.1+ucs-3.6) id AA08573; Thu, 7 Apr 94 11:18:49 PDT Date: Thu, 7 Apr 94 11:18:49 PDT From: "Daniel M. Alexander Zappala" Message-Id: <9404071818.AA08573@hermosa.usc.edu> To: nimrod-wg@BBN.COM, jnc@ginger.lcs.mit.edu In-Reply-To: <9404071736.AA03490@ginger.lcs.mit.edu> (message from Noel Chiappa on Thu, 7 Apr 94 13:36:38 -0400) Subject: Re: Nimrod IPng technical requirements text Reply-To: daniel@catarina.usc.edu >> On Thu, 7 Apr 94 13:36:38 -0400, Noel Chiappa said: > In polling the BBN crew for points to put in this, I came across an > interesting point. It was suggested that I include "multicast locators" as a > requirement. > This caused an immediate fault. Locators are the names of objects in the > Nimrod map; you can't *have* a multicast locator! > This raises an interesting question. We can have a multi-drop flow, but how > do you name the set of things the multi-cast flow is delivering to (i.e. the > multi-cast group)? It can't be a locator, right? It has to be an EID, the only > other kind of name we've got. This kind of tends to blow a hole in the > definition of an "endpoint" as a fate-sharing region, though... > That does tie into something Bob Braden said the other day, which is that it's > useful to think more about multi-cast applications, where the concept of > "critical state" is far less useful. Maybe these are two facets of the same > thing. Why not call it a set of EIDs and give it a set-ID? Set-IDs can refer to the types of non-fatesharing entities that Bob Braden says you need to look into.
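A sketch of the set-ID idea, assuming (purely for illustration) a registry mapping each set-ID to its member EIDs; the join and leave operations capture the fact that the members do not share fate:

    # Sketch: a set-ID (SID) names a set of endpoint EIDs.
    sid_members = {}  # SID -> set of member EIDs

    def join(sid, eid):
        sid_members.setdefault(sid, set()).add(eid)

    def leave(sid, eid):
        sid_members.get(sid, set()).discard(eid)

    def long_form(sid):
        # a set can always be identified by enumerating its elements;
        # the SID is just a short alias for this long form (a point
        # picked up in the exchange which follows)
        return frozenset(sid_members.get(sid, set()))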
Daniel   Received: from PIZZA.BBN.COM by BBN.COM id aa07950; 7 Apr 94 15:55 EDT Received: from pizza by PIZZA.BBN.COM id aa06733; 7 Apr 94 15:38 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa06729; 7 Apr 94 15:35 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa06251; 7 Apr 94 15:26 EDT Received: by ginger.lcs.mit.edu id AA04740; Thu, 7 Apr 94 15:26:46 -0400 Date: Thu, 7 Apr 94 15:26:46 -0400 From: Noel Chiappa Message-Id: <9404071926.AA04740@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Nimrod IPng technical requirements text Cc: jnc@ginger.lcs.mit.edu > This raises an interesting question. We can have a multi-drop flow, but > how do you name the set of things the multi-cast flow is delivering to > (i.e. the multi-cast group)? ... It has to be an EID, the only other > kind of name we've got. ... This kind of tends to blow a hole in the > definition of an "endpoint" as a fate-sharing region, though... Why not call it a set of EIDs and give it a set-ID? Set-IDs can refer to the types of non-fatesharing entities that Bob Braden says you need to look into. Hmm, good idea. I assume there's pretty much a one-one mapping between the concept of "multicast group" and the concept of "set of endpoints", right? Here are some mechanical questions: Does everything work OK if the set-ID's (SID's) come from the same namespace as EID's (perhaps differentiated by the high bit, or something)? Is there any reason to draw them from a different namespace? Also, do packets being sent to SID's look just like packets destined to EID's? I.e., except for the different kind of destination "name", is there any different information which needs to be carried to be useful, etc? Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa12363; 7 Apr 94 17:13 EDT Received: from pizza by PIZZA.BBN.COM id aa07232; 7 Apr 94 16:46 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa07228; 7 Apr 94 16:44 EDT Received: from usc.edu by BBN.COM id aa09634; 7 Apr 94 16:22 EDT Received: from laguna.usc.edu by usc.edu (4.1/SMI-3.0DEV3-USC+3.1) id AA26531; Thu, 7 Apr 94 13:22:52 PDT Received: by laguna.usc.edu (4.1/SMI-4.1+ucs-3.6) id AA05986; Thu, 7 Apr 94 13:29:01 PDT Date: Thu, 7 Apr 94 13:29:01 PDT From: "Daniel M. Alexander Zappala" Message-Id: <9404072029.AA05986@laguna.usc.edu> To: nimrod-wg@BBN.COM, jnc@ginger.lcs.mit.edu In-Reply-To: <9404071926.AA04740@ginger.lcs.mit.edu> (message from Noel Chiappa on Thu, 7 Apr 94 15:26:46 -0400) Subject: Re: Nimrod IPng technical requirements text Reply-To: daniel@catarina.usc.edu >> On Thu, 7 Apr 94 15:26:46 -0400, Noel Chiappa said: >> This raises an interesting question. We can have a multi-drop flow, but >> how do you name the set of things the multi-cast flow is delivering to >> (i.e. the multi-cast group)? ... It has to be an EID, the only other >> kind of name we've got. ... This kind of tends to blow a hole in the >> definition of an "endpoint" as a fate-sharing region, though... > Why not call it a set of EIDs and give it a set-ID? Set-IDs can refer to > the types of non-fatesharing entities that Bob Braden says you need to look > into. > Hmm, good idea. I assume there's pretty much a one-one mapping between the > concept of "multicast group" and the concept of "set of endpoints", > right? Right. Or maybe it's a one-one mapping with a "flow"? I.e. a set of EIDs that do not share a fate but DO share routing and QOS state in the network?
> Here are some mechanical questions: > Does everything work OK if the set-ID's (SID's) come from the same namespace > as EID's (perhaps differentiated by the high bit, or something)? Is there any > reason to draw them from a different namespace? Well, a set can be identified by enumerating its elements, so the long-form of the set-ID is a listing of its constituent EIDs. Same thing as saying you can send a packet to a multicast group by sending a bunch of unicast packets. Of course you prefer to have an alias for this long name, and this alias could be assigned the way you mention. Technically, isn't using the high bit to differentiate an SID from an EID the same thing as splitting the namespace in half? The only concern is keeping enough space for EIDs. > Also, do packets being sent to SID's look just like packets destined to EID's? > I.e., except for the different kind of destination "name", is there any > different information which needs to be carried to be useful, etc? Can't think of any offhand, but it requires more thought. Since you may be treating state in the network differently for SIDs, there may be extra info. Daniel   Received: from PIZZA.BBN.COM by BBN.COM id aa14458; 7 Apr 94 17:51 EDT Received: from pizza by PIZZA.BBN.COM id aa07568; 7 Apr 94 17:33 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa07564; 7 Apr 94 17:32 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa12545; 7 Apr 94 17:16 EDT Received: by ginger.lcs.mit.edu id AA06951; Thu, 7 Apr 94 17:16:38 -0400 Date: Thu, 7 Apr 94 17:16:38 -0400 From: Noel Chiappa Message-Id: <9404072116.AA06951@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Nimrod IPng technical requirements Cc: jnc@ginger.lcs.mit.edu Coming under separate cover is a first crack at the first section of a note on IPng requirements for Nimrod. The first section has to do with packet format issues, and it's pretty well done out; please comment. The second deals with the general interaction with the rest of the internetwork layer, and is still pretty sketchy. Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa14707; 7 Apr 94 17:56 EDT Received: from pizza by PIZZA.BBN.COM id aa07577; 7 Apr 94 17:34 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa07573; 7 Apr 94 17:33 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa12602; 7 Apr 94 17:18 EDT Received: by ginger.lcs.mit.edu id AA06969; Thu, 7 Apr 94 17:18:05 -0400 Date: Thu, 7 Apr 94 17:18:05 -0400 From: Noel Chiappa Message-Id: <9404072118.AA06969@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Nimrod IPng technical requirements (text I) Nimrod and IPng Technical Requirements I don't want to make any assumptions about whether or not the Internet will use Nimrod (although I think something like it will eventually be where the Internet winds up), so I can't tell you exactly what the IPng requirements will be for routing, as other schemes may need different support. However, I can tell you what Nimrod needs. I will tackle the internetwork packet format first (which is simple), and then the whole issue of the interaction with the rest of the internetwork layer, which is a much more difficult topic. In speaking of the packet format, you first need to distinguish between the host-router part of the path, and the router-router part; a format that works OK for one may not do for another. The issue is complicated by the fact that Nimrod can be made to work, albeit not in optimal form, with information/fields missing from the packet in the first host-router hop.
The missing information/fields can be added by the first hop router. (This capability is being used to allow deployment and operation with unmodified IPv4 hosts, although similar techniques could be used with other internetworking protocols.) Access to the full range of Nimrod capabilities will require upgrading of hosts to include the necessary information in the packets they exchange with the routers. Second, Nimrod currently has three planned forwarding modes (flows, datagram, and source-routed packets), and a format that works for one may not work for another; some modes use fields that are not used by other modes. The presence or absence of these fields will make a difference. What Nimrod would like to see in the internetworking packet is: - Source and destination EID fields. These are "shortish", fixed length fields which contain globally unique, topologically insensitive identifiers for endpoints (if you aren't familiar with endpoints, think of them as hosts). A length of at least 48 bits, absolute minimum, is needed for each of these; we would strongly recommend 64. (IPv4 will be able to operate with smaller ones for a while, but will eventually need either a new packet format, or the horrendous kludgery known as Network Address Translators to allow these fields to be only locally unique.) - A globally unique flow-id. This *must not* use one of the two previous EID fields, as in datagram mode (and probably source-routed mode as well) it will be over-written during transit of the network. (The flow is also not identified using the EID's, since, again, datagram mode will not work if you do.) It could most easily be constructed by adding an EID to a locally unique flow-id; the latter should be at least 12 bits absolute minimum (which would be my "out of thin air" guess), but we would strongly recommend a minimum of 16; I would recommend 32. - A hop-count. This has to be more than 8 bits; I would strongly recommend at least 12, and recommend 16 (to make it easy to update). This is not to say that I think networks with diameters larger than 256 are good, or that we should design such nets, but I think limiting the maximum path through the network to 256 hops is likely to bite us down the road the same way making "infinity" 16 in RIP did. When we hit that ceiling, it's going to hurt, and there won't be an easy fix. I will note in passing that we are already seeing path lengths of over 30 hops. - Optional source and destination locators. These are structured, variable length items which are topologically sensitive identifiers for the place in the network to which the traffic is destined. The smallest maximum length supported should be a minimum of 32 bytes per locator, and longer would be even better; I would recommend 256 bytes per locator. - Paired with the above, an optional pointer into the locators. This is "forwarding state" (i.e. state in the packet which records something about its progress across the network) which is used in the datagram forwarding mode to ensure that the packet does not loop. It needs to be large enough to identify locations in either locator; e.g. if locators can be up to 256 bytes, it would need to be 9 bits. - An optional source route. These are used to support the "source routed packet" forwarding mode. Although not designed in detail yet, the syntax will likely look much like source routes in PIP; in Nimrod they will be a sequence of Nimrod entity identifiers, along with clues as to the context in which each identifier is to be interpreted (e.g.
up, down, across, etc). Since those identifiers themselves are variable length (although probably most will be two bytes or less, otherwise the routing overhead inside the named object would be excessive), and the hop count above contemplates the possibility of paths of over 256 hops, it would seem that these might possibly some day exceed 512 bytes, if a lengthy path was specified in terms of the actual physical assets used. - Paired with the above, an optional pointer into the source route. This is also "forwarding state". It needs to be large enough to identify locations anywhere in the source route; e.g. if the source route can be up to 1024 bytes, it would need to be 10 bits. - An internetwork header length. I mention this since the above fields could easily exceed 256 bytes; if they are all to be carried in the internetwork header (see comments below as to where to carry all this information), the header length field needs to be more than 8 bits; I recommend 16 bits. As noted above, it's possible to use Nimrod in a limited mode where needed information/fields are added by the first-hop router. It's thus useful to ask "which of the fields must be present in the host-router header, and which could be added by the router?" The only ones which are absolutely necessary in all packets are the EID's (provided that some means is available to map EID's into locators). As to the others, if the user wishes to use flows, and wants to guarantee that their packets are assigned to the correct flows, the flow-id field is needed. If the user wishes efficient datagram mode, it's probably wise to include the locators in the packet sent to the router. If the user wishes to specify the route for the packets, and does not wish to set up a flow, they need to include the source route. How would additional information/fields be added to the packet? This question is complex, since all the IPng candidates (and in fact, any reasonable inter-networking protocol) are extensible protocols; those extension mechanisms could be used. Also, it would be possible to carry some of the required information as user data in the internetworking packet, with the original user's data encapsulated further inside. Finally, a private inter-router packet format could be defined. It's not clear which path is best, but we can talk about which fields the Nimrod routers need access to, and how often; less used ones could be placed in harder-to-get-to locations (such as in an encapsulated header). The fields to which the routers need access on every hop are the flow-id and the hop-count. The locator/pointer fields are only needed at intervals (in what datagram forwarding mode calls "active" routers), as is the source route (the latter at every object which is named in the source route). Depending on how access control is done, and which forwarding mode is used, the EID's and/or locators might be examined for access control purposes, wherever that function is performed. This is not a complete exploration of the topic, but should give a rough idea of what's going on.
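To make the field budget above concrete, here is a minimal sketch which packs just the fixed-size fields using the recommended widths (64-bit EIDs, a flow-id built from a 64-bit EID plus a 32-bit locally unique part, a 16-bit hop count, a 16-bit header length). The variable-length locators, pointers, and source route are omitted, and this is an illustration of the sizes, not a proposed wire format:

    import struct

    # Network byte order: src EID (64), dst EID (64), flow-id EID part (64),
    # flow-id local part (32), hop count (16), header length (16).
    FIXED_HDR = struct.Struct("!QQQLHH")

    def pack_header(src_eid, dst_eid, flow_eid, flow_local, hops, hdr_len):
        return FIXED_HDR.pack(src_eid, dst_eid, flow_eid, flow_local,
                              hops, hdr_len)

    hdr = pack_header(0x2A, 0x2B, 0x2A, 17, 64, FIXED_HDR.size)
    assert FIXED_HDR.size == 32  # 8 + 8 + 8 + 4 + 2 + 2 bytes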
Received: from PIZZA.BBN.COM by BBN.COM id aa14737; 7 Apr 94 17:57 EDT Received: from pizza by PIZZA.BBN.COM id aa07663; 7 Apr 94 17:41 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa07659; 7 Apr 94 17:39 EDT Received: from inet-gw-3.pa.dec.com by BBN.COM id aa13089; 7 Apr 94 17:27 EDT Received: from nacto1.nacto.lkg.dec.com by inet-gw-3.pa.dec.com (5.65/21Mar94) id AA22750; Thu, 7 Apr 94 14:21:24 -0700 Received: from sneezy.nacto.lkg.dec.com by nacto1.nacto.lkg.dec.com with SMTP id AA01086; Thu, 7 Apr 1994 17:20:59 -0400 Received: by sneezy.nacto.lkg.dec.com (5.65/4.7) id AA06459; Thu, 7 Apr 1994 17:20:58 -0400 To: nimrod-wg@BBN.COM Subject: Re: Nimrod IPng technical requirements text In-Reply-To: <9404071736.AA03490@ginger.lcs.mit.edu> References: <9404071736.AA03490@ginger.lcs.mit.edu> X-Mailer: Poste 2.1 From: David R Oran Date: Thu, 7 Apr 94 17:02:22 -0400 Message-Id: <940407170222.4941@sneezy.nacto.lkg.dec.com.thomas> Encoding: 67 TEXT, 6 TEXT SIGNATURE > This raises an interesting question. We can have a multi-drop flow, but how > do you name the set of things the multi-cast flow is delivering to (i.e. the > multi-cast group)? It can't be a locator, right? It has to be an EID, the only > other kind of name we've got. This kind of tends to blow a hole in the > definition of an "endpoint" as a fate-sharing region, though... > No it doesn't have to be an EID. There's no reason why multicast groups need to (or should) share a namespace with EIDs. There are a few possibilities: a) completely separate semantics AND namespace b) separate semantics with the namespace shared between EIDs and MCGroups (this is what OSI did with multicast NSAPs - useful if the same packet field is likely to carry one or the other, but it isn't often necessary to carry both) c) EIDs and MCGroups are indistinguishable to Routers, but the participating hosts can tell. d) EIDs and MCGroups are identical. See below for more discussion of why it MIGHT matter which of these is chosen. > That does tie into something Bob Braden said the other day, which is that it's > useful to think more about multi-cast applications, where the concept of > "critical state" is far less useful. Maybe these are two facets of the same > thing. > This is true only for multicast applications with minimal "Best effort" delivery semantics. Multicast with causal or total ordering properties certainly DO have critical state! On the other hand, if we very carefully restrict ourselves to only network layer discussions (which may be obvious to you and me, but possibly not to all readers), and further agree that the network layer is not responsible for anything other than best-effort multicast delivery service, then I'm inclined to agree with Bob. > Anyway, that would mean that we have EID's for multicast groups, and, > moreover, that a single endpoint can have more than one EID. (It would be > useful to be able to tell from looking at an EID whether it names a group, or > a single endpoint; we can use the "top bit" hack for that.) > This makes me VERY nervous. If we share EID semantics between individual endpoints and multicast groups, then you can't answer the question "which EIDs are the current receivers for this multicast group". You *could* answer the question "which locators are associated with this multicast group", but then if you consider *mobile* group members then the membership mapping changes when the host machine moves.
Now, from a pragmatic viewpoint I'm not sure any of this matters terribly much since you may never want to know state simultaneously about groups, endpoint-group-participants, and locators of those participants, but history tells me that people err too frequently on the side of collapsing concepts which should be kept separate and later discover that an important degree of freedom or level of indirection has been compromised. I give as an example the discussions around whether EIDs are needed as well as addresses, when the prior architecture used one identifier for both functions! > So, I think that resolves an old open point about whether endpoints and EID's > are in one-one correspondence... of course, there's still the issue of whether > a single endpoint can have more than one non-multicast EID. > > Does this all sound OK? > No. Not yet. Dave. -+-+-+-+-+-+-+ David R. Oran Phone: + 1 508 486-7377 Digital Equipment Corporation Fax: + 1 508 486-5279 LKG 1-2/A19 Email: oran@lkg.dec.com 550 King Street Littleton, MA 01460   Received: from PIZZA.BBN.COM by BBN.COM id aa17981; 7 Apr 94 19:38 EDT Received: from pizza by PIZZA.BBN.COM id aa08471; 7 Apr 94 19:25 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa08467; 7 Apr 94 19:23 EDT Received: from wd40.ftp.com by BBN.COM id aa17197; 7 Apr 94 19:09 EDT Received: from ftp.com by ftp.com ; Thu, 7 Apr 1994 19:09:36 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Thu, 7 Apr 1994 19:09:36 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA01928; Thu, 7 Apr 94 19:08:44 EDT Date: Thu, 7 Apr 94 19:08:44 EDT Message-Id: <9404072308.AA01928@mailserv-D.ftp.com> To: jnc@ginger.lcs.mit.edu Subject: Re: Nimrod IPng technical requirements text From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: nimrod-wg@BBN.COM, jnc@ginger.lcs.mit.edu Content-Length: 1527 > This caused an immediate fault. Locators are the names of objects in the > Nimrod map; you can't *have* a multicast locator! locators name places.... there need not be an object there :-) >This raises an interesting question. We can have a multi-drop flow, but how >do you name the set of things the multi-cast flow is delivering to (i.e. the >multi-cast group)? It can't be a locator, right? It has to be an EID, the only >other kind of name we've got. This kind of tends to blow a hole in the >definition of an "endpoint" as a fate-sharing region, though... how do we name flows? is there another object in nimrodland which needs naming? i.e.
we have names for things (eids), names for places (locators), i would imagine that there also has to be a 'path' which is followed to get from one place to another -- i.e. flows. if you take a somewhat object-oriented approach to things then i would imagine that these paths have many different attributes, among them things like network qos needed, security goop, perhaps the source and destination places. if an attribute is just an attribute, then a path can have many attributes, it can have many destinations, possibly even many sources. there certainly may be optimizations to be made for certain, common, cases. but we should optimize only when we get the general principles right. or is it getting late in the day and are my brain cells starting to go to sleep? -- Frank Kastenholz FTP Software 2 High Street North Andover, Mass. USA 01845 (508)685-4000   Received: from PIZZA.BBN.COM by BBN.COM id aa18316; 7 Apr 94 19:51 EDT Received: from pizza by PIZZA.BBN.COM id aa08567; 7 Apr 94 19:35 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa08563; 7 Apr 94 19:33 EDT Received: from wd40.ftp.com by BBN.COM id aa17507; 7 Apr 94 19:22 EDT Received: from ftp.com by ftp.com ; Thu, 7 Apr 1994 19:22:09 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Thu, 7 Apr 1994 19:22:09 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA01996; Thu, 7 Apr 94 19:21:15 EDT Date: Thu, 7 Apr 94 19:21:15 EDT Message-Id: <9404072321.AA01996@mailserv-D.ftp.com> To: oran@nacto.lkg.dec.com Subject: Re: Nimrod IPng technical requirements text From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: nimrod-wg@BBN.COM Content-Length: 748 small nit...... >>Anyway, that would mean that we have EID's for multicast groups, and, >>moreover, that a single endpoint can have more than one EID. ... > Now, from a pragmatic viewpoint I'm not sure any of this matters terribly > much since you may never want to know state simultaneously about > groups, endpoint-group-participants, and locators of those participants, accounting and security and the 'are you allowed to see this? have you bought it?' lawyer types would definitely want to know this sort of stuff, in some fashion. if we assume that the internet will 'go commercial' then these interests will need to be catered to. grrrrrrrrr. -- Frank Kastenholz FTP Software 2 High Street North Andover, Mass.
USA 01845 (508)685-4000   Received: from PIZZA.BBN.COM by BBN.COM id aa24641; 8 Apr 94 22:06 EDT Received: from pizza by PIZZA.BBN.COM id aa16868; 8 Apr 94 21:50 EDT Received: from lotus.com by PIZZA.BBN.COM id aa16864; 8 Apr 94 21:48 EDT Received: from Mail.Lotus.com (crd.lotus.com) by lotus.com (4.1/SMI-4.1) id AA02062; Fri, 8 Apr 94 21:50:21 EDT Received: by Mail.Lotus.com (4.1/SMI-4.1-DNI) id AA08703; Fri, 8 Apr 94 21:54:24 EDT Date: Fri, 8 Apr 94 21:54:24 EDT From: Robert_Ullmann.LOTUS@crd.lotus.com Message-Id: <9404090154.AA08703@Mail.Lotus.com> Received: by DniMail (v1.0); Fri Apr 8 21:54:21 1994 EDT To: unixml: ;, lotus.com@crd.lotus.com MMDF-Warning: Parse error in original version of preceding line at PIZZA.BBN.COM Subject: hop limit Hi, Keep in mind that the hop limit is a log scale number.
While it is entirely expected to see the number of hops rise from ~16 to ~30 as the number of connected hosts goes from a few hundred to a few million, it isn't reasonable to then expect it to go anywhere near 256. The empirical formula seems to be max hops (the diameter, in some sense) is 2 times base 2 log of number of hosts. The model is that the worst case is a walk all the way up some hierarchy, and then all the way down some other path. The usual routes are always equal to or better than that. To get 256, we would need approximately 10^40 hosts. This is a big number. (10,000,000,000,000,000,000,000,000,000,000,000,000,000 :-) If you assumed each hop offered a branchiness of at least two, 256 hops would let you reach ~10^80 hosts, or something more than the number of neutrons in the observable universe (10^78, if I remember correctly :-) Best Regards, Robert   Received: from PIZZA.BBN.COM by BBN.COM id aa18130; 9 Apr 94 0:10 EDT Received: from pizza by PIZZA.BBN.COM id aa17310; 8 Apr 94 23:56 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa17306; 8 Apr 94 23:55 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa17749; 8 Apr 94 23:54 EDT Received: by ginger.lcs.mit.edu id AA19162; Fri, 8 Apr 94 23:54:28 -0400 Date: Fri, 8 Apr 94 23:54:28 -0400 From: Noel Chiappa Message-Id: <9404090354.AA19162@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: hop limit Cc: jnc@ginger.lcs.mit.edu Keep in mind that the hop limit is a log scale number. With a fairly evenly distributed random graph, this is true. However, I don't think this formula will apply to the network, since people tend to build all kinds of uneven meshes in practice. I think that you may recall the debate in January with Masataka Ohta, in which he proposed that the best model for the network was a planar graph, in which the average path length (and diameter) go as sqrt(N), not log(N). You can't both be right! At that time, I took a position that the average would tend toward log(N). However, my personal guess here is that, due to the non-even nature of real-world networks, while the average will be closer to log(N), the worst case could be pretty bad. I don't think there's necessarily a contradiction between my previous position, and my position here. We have to distinguish between average, and worst case. In fairly even graphs, there's not a lot of variance in the ratio of the average path length to the worst case (i.e. the diameter), for any given size. (As a side-point, my intuition says that in such fairly even graphs, as the graph gets larger, the average path length will asymptote to the diameter, since the longer paths will get you to a far larger % of the large number of total nodes, so the contributions of the shorter paths to the average will diminish. Anyone know if this is right?) Anyway, I think it's reasonable to guess that the worst case will be a lot worse than log(N), since, due to the non-even nature of real world networks, we will probably see a fair amount of variance between the average path length, and the worst case. Real-world experience shows this is accurate, at least so far. E.g., path lengths of more than 16 inside *regionals* (which is why RIP stopped working, and this *did* happen), and a reported path of more than 30 in the Internet about a year ago. In neither case was this anything like the theoretical diameter of a graph with the appropriate number of nodes.
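Along the lines of the model suggested below, here is a small statistical sketch: it compares the worst-case BFS distance in a connected random graph against a planar grid with the same number of nodes, so the log(N)-versus-sqrt(N) behaviour can be seen directly. The construction (random spanning tree plus random chords) is an assumption made just to guarantee connectivity:

    import random
    from collections import deque

    def eccentricity(adj, start):
        """Greatest BFS hop-distance from `start`."""
        dist = {start: 0}
        q = deque([start])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return max(dist.values())

    def random_graph(n, extra=2):
        """Random spanning tree plus extra*n random chords: connected."""
        adj = {i: set() for i in range(n)}
        for i in range(1, n):
            j = random.randrange(i)
            adj[i].add(j); adj[j].add(i)
        for _ in range(extra * n):
            a, b = random.sample(range(n), 2)
            adj[a].add(b); adj[b].add(a)
        return adj

    def grid_graph(side):
        """side x side planar mesh; its diameter is 2*(side-1)."""
        adj = {i: set() for i in range(side * side)}
        for r in range(side):
            for c in range(side):
                i = r * side + c
                if c + 1 < side: adj[i].add(i + 1); adj[i + 1].add(i)
                if r + 1 < side: adj[i].add(i + side); adj[i + side].add(i)
        return adj

    random.seed(1)
    for side in (8, 16, 32):
        n = side * side
        print(n, eccentricity(random_graph(n), 0),  # grows roughly log(N)
              eccentricity(grid_graph(side), 0))    # grows as sqrt(N)

One could extend this by adding a few random long-distance links to the grid and watching the diameter fall, which is exactly the experiment that would show how quickly non-planar links pull a planar graph toward random-graph behaviour.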
While it is entirely expected to see the number of hops rise from ~16 to ~30 as the number of connected hosts goes from a few hundred to a few million, it isn't reasonable to then expect it to go anywhere near 256. The empirical formula seems to be max hops (the diameter, in some sense) is 2 times base 2 log of number of hosts. ... To get 256, we would need approximately 10^40 hosts. ... least two, 256 hops would let you reach ~10^80 hosts Real-world experience, as above, shows that this formula does not apply to the worst case. We may get closer to that as the network gets larger (the variance from the theoretical average is likely to decline), but I think we would be *very* unwise to run that chance; it's one I don't want to take. I'd personally take almost any bet that we'll see path lengths of larger than 256 before 2020, which is within the expected lifetime of IPng. Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa21149; 9 Apr 94 2:10 EDT Received: from pizza by PIZZA.BBN.COM id aa17756; 9 Apr 94 1:58 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa17752; 9 Apr 94 1:56 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa20643; 9 Apr 94 1:55 EDT Received: by ginger.lcs.mit.edu id AA19968; Sat, 9 Apr 94 01:55:00 -0400 Date: Sat, 9 Apr 94 01:55:00 -0400 From: Noel Chiappa Message-Id: <9404090555.AA19968@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Nimrod IPng technical requirements text Cc: jnc@ginger.lcs.mit.edu how do we name flows? With flow-id's... is there another object in nimrodland which needs naming? I can't think of one offhand, but my brain is pretty run down... i would imagine that there also has to be a 'path' which is followed to get from one place to another -- i.e. flows. Hmm. Will we need to name paths, separate from flows? Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa22006; 9 Apr 94 2:44 EDT Received: from pizza by PIZZA.BBN.COM id aa17923; 9 Apr 94 2:33 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa17919; 9 Apr 94 2:31 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa21670; 9 Apr 94 2:31 EDT Received: by ginger.lcs.mit.edu id AA20355; Sat, 9 Apr 94 02:31:10 -0400 Date: Sat, 9 Apr 94 02:31:10 -0400 From: Noel Chiappa Message-Id: <9404090631.AA20355@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Multicast group names Cc: jnc@ginger.lcs.mit.edu >> Why not call it a set of EIDs and give it a set-ID? > I assume there's pretty much a one-one mapping between the > concept of "multicast group" and the concept of "set of endpoints", > right? Right. Or maybe it's a one-one mapping with a "flow"? I.e. a set of EIDs that do not share a fate but DO share routing and QOS state in the network? No, a flow (either unicast or multicast) to me has the general meaning of a path through the network with some user requirement info (all stored as non-critical state in routers) attached to it. This is a whole different thing from simply the destination(s) of a flow (which is what EID's and SID's are); you can have several different flows to the same destination(s), for example. > Does everything work OK if the set-ID's (SID's) come from the same > namespace as EID's (perhaps differentiated by the high bit, or > something)? Is there any reason to draw them from a different namespace? Well, a set can be identified by enumerating its elements, so the long-form of the set-ID is a listing of its constituent EIDs. ... Of course you prefer to have an alias for this long name, and this alias could be assigned the way you mention.
You'd want that alias for sticking in the packet headers; the long form wouldn't be practical for groups of any size at all. Technically, isn't using the high bit to differentiate an SID from an EID the same thing as splitting the namespace in half? The only concern is keeping enough space for EIDs. Yes, we keep the syntax the same, but split the semantics, as Dave Oran pointed out. I dunno, maybe it's a SID if the top N bits are one, or something, but the principle's the same: SID's are drawn from the same namespace as EID's, but you can tell just from looking at one whether it's an SID or an EID. Of course, they name totally different sorts of things, too. Does this sound like the right thing to everyone? > Also, do packets being sent to SID's look just like packets destined to > EID's? I.e., except for the different kind of destination "name", is there > any different information which needs to be carried to be useful, etc? Since you may be treating state in the network differently for SIDs, there may be extra info. Well, I dunno. I'm not sure the routers will know anything about SID's and EID's; they will deal with flows. Perhaps some of the multicast group maintenance mechanisms (and to the extent that we have distributed spanning tree computations, those as well) will need to know about SID's? Maybe not, and they can just work with the multicast flow-ids.... Have to think about that. Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa23848; 11 Apr 94 6:58 EDT Received: from pizza by PIZZA.BBN.COM id aa01851; 11 Apr 94 6:37 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa01847; 11 Apr 94 6:34 EDT Received: from necom830.cc.titech.ac.jp by BBN.COM id aa23166; 11 Apr 94 6:32 EDT Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Mon, 11 Apr 94 19:25:59 +0900 From: Masataka Ohta Return-Path: Message-Id: <9404111026.AA22168@necom830.cc.titech.ac.jp> Subject: Re: hop limit To: Noel Chiappa Date: Mon, 11 Apr 94 19:25:58 JST Cc: nimrod-wg@BBN.COM, jnc@ginger.lcs.mit.edu In-Reply-To: <9404090354.AA19162@ginger.lcs.mit.edu>; from "Noel Chiappa" at Apr 8, 94 11:54 pm X-Mailer: ELM [version 2.3 PL11] > I think that you may recall the debate in January with Masataka Ohta, in which > he proposed that the best model for the network was a planar graph, in which > the average path length (and diameter) go as sqrt(N), not log(N). You can't > both be right! After thinking about the problem, I have noticed that the problem relates to a flat-rate distance. Between nodes located within the flat-rate distance, topology can be (but not necessarily is) truly random, in which case the hop count scales O(log(N)). Beyond the distance, it scales O(sqrt(N)). In Japan, the distance is 15Km. Not so large. Considering that the maximum arc length on the Earth is 20,000Km, we need a hop count of 1300. > At that time, I took a position that the average would tend toward log(N). You are assuming tree-like topology here, not a random graph nor mesh. That is, your position is biased with the current fact that a small number of T3 routers can handle all the backbone traffic and the backbone is mostly tree structured rather than full mesh. Such topology scales O(log(N)). As network traffic increases, we need a lot of routers handling global traffic, in which case we do need a mesh of routers and the hop count will be close to O((size of the Earth)/(flat-rate distance)). Of course, if we pay extra money to connect distant routers, the required maximum hop count becomes smaller.
Such extra connections are also necessary to reduce the load of routers. But, wiring does cost and the improvement is propotional to the cose paid. So, I must concludes that the maximum hop count of 255 is unsafe. Masataka Ohta   Received: from PIZZA.BBN.COM by BBN.COM id aa12607; 11 Apr 94 10:59 EDT Received: from pizza by PIZZA.BBN.COM id aa03059; 11 Apr 94 10:44 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa03055; 11 Apr 94 10:41 EDT Received: from wd40.ftp.com by BBN.COM id aa11319; 11 Apr 94 10:39 EDT Received: from ftp.com by ftp.com ; Mon, 11 Apr 1994 10:39:10 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Mon, 11 Apr 1994 10:39:10 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA29191; Mon, 11 Apr 94 10:38:11 EDT Date: Mon, 11 Apr 94 10:38:11 EDT Message-Id: <9404111438.AA29191@mailserv-D.ftp.com> To: jnc@ginger.lcs.mit.edu Subject: Re: Nimrod IPng technical requirements text From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: nimrod-wg@BBN.COM, jnc@ginger.lcs.mit.edu Content-Length: 758 > how do we name flows? > > With flow-id's... Well, at the end of my note I said: > or is it getting late in the day and are my brain cells starting to go > to sleep? I guess it was late and my brain cells were asleep.... However, being reminded that those things what I was arguing that we needed to name are FLOWS, it would seem that my original thoughts were sort of on line. IF FLOWS have their own name space, i.e. there is a separate, distinct, unique, FLOWID in the packet, and that name is not 'derived' from the source/dest EIDs of the packet, then we could view the endpoints of the flow as attributes of the flow, just as things like qos. -- Frank Kastenholz FTP Software 2 High Street North Andover, Mass. USA 01845 (508)685-4000   Received: from PIZZA.BBN.COM by BBN.COM id aa15343; 11 Apr 94 11:35 EDT Received: from pizza by PIZZA.BBN.COM id aa03294; 11 Apr 94 11:14 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa03290; 11 Apr 94 11:12 EDT Received: from uucp9.netcom.com by BBN.COM id aa13471; 11 Apr 94 11:11 EDT Received: from localhost by netcomsv.netcom.com with UUCP (8.6.4/SMI-4.1) id IAA29839; Mon, 11 Apr 1994 08:11:33 -0700 Received: from cc:Mail UUCPLINK 2.0 by metrico.metricom.com id 9403117660.AA766072762 Mon, 11 Apr 94 06:59:22 Date: Mon, 11 Apr 94 06:59:22 From: Greg_Campbell@metrico.metricom.com Message-Id: <9403117660.AA766072762@metrico.metricom.com> To: kasten@ftp.com Cc: nimrod-wg@BBN.COM, jnc@ginger.lcs.mit.edu Content-Length: 758 Subject: cc:Mail UUCPLINK 2.0 Undeliverable Message User metrico!rfox is not defined Original text follows ----------------------------------------- Received: by ccmail Received: from netcomsv by metricom.com (UUPC/extended 1.11) with UUCP; Mon, 11 Apr 1994 06:57:46 PDT Received: from PIZZA.BBN.COM by netcomsv.netcom.com with SMTP (8.6.4/SMI-4.1) id IAA09628; Mon, 11 Apr 1994 08:02:08 -0700 Received: from pizza by PIZZA.BBN.COM id aa03059; 11 Apr 94 10:44 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa03055; 11 Apr 94 10:41 EDT Received: from wd40.ftp.com by BBN.COM id aa11319; 11 Apr 94 10:39 EDT Received: from ftp.com by ftp.com ; Mon, 11 Apr 1994 10:39:10 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Mon, 11 Apr 1994 10:39:10 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA29191; Mon, 11 Apr 94 10:38:11 EDT Date: Mon, 11 Apr 94 10:38:11 EDT Message-Id: <9404111438.AA29191@mailserv-D.ftp.com> To: jnc@ginger.lcs.mit.edu Subject: Re: Nimrod IPng technical requirements text From: Frank Kastenholz X-ccAdmin: 
Received: from PIZZA.BBN.COM by BBN.COM id aa19073; 11 Apr 94 12:34 EDT Received: from pizza by PIZZA.BBN.COM id aa03658; 11 Apr 94 12:18 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa03654; 11 Apr 94 12:16 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa17711; 11 Apr 94 12:11 EDT Received: by ginger.lcs.mit.edu id AA06931; Mon, 11 Apr 94 12:11:05 -0400 Date: Mon, 11 Apr 94 12:11:05 -0400 From: Noel Chiappa Message-Id: <9404111611.AA06931@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: hop limit Cc: jnc@ginger.lcs.mit.edu

After thinking about the problem, I have noticed that the problem relates to a flat-rate distance. Between nodes located within the flat-rate distance, topology can be (but not necessarily is) truly random, in which case the hop count scales O(log(N)). Beyond the distance, it scales O(sqrt(N)).

If I understand you correctly, you are saying that i) in local regions of the network (e.g. a city), the graph of the network will probably be (to a reasonable approximation) random, but that ii) when you look at the network at a high level (e.g. global), the graph will probably be (to a reasonable approximation) planar?

I think there is something to this; I think the network graph will be more randomly connected locally. However, I'm still not sure that, at the global level, it will be close to the planar end of the random<->planar spectrum. It's just too easy to make non-planar links, and I think they act very quickly to reduce the diameter toward the size predicted for random graphs.

Actually, if someone feels really energetic, they could try creating a model, and see how quickly the diameter of the graph changes, from the value predicted for planar graphs, to the value predicted for random graphs. I think it's probably easier to do this with statistical models than by mathematical analysis.

> At that time, I took a position that the average would tend toward log(N).

You are assuming a tree-like topology here, not a random graph nor a mesh. That is, your position is biased by the current fact that a small number of T3 routers can handle all the backbone traffic and the backbone is mostly tree structured rather than full mesh.

Not really, I don't think. The model I was working with was closer to yours. There would definitely be a hierarchy of carriers and links, but, like the road network, it would still be a mesh.

Of course, if we pay extra money to connect distant routers, the required maximum hop count becomes smaller. Such extra connections are also necessary to reduce the load of routers. But, wiring does cost and the improvement is proportional to the cost paid.

Good point.
Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa21319; 11 Apr 94 13:11 EDT Received: from pizza by PIZZA.BBN.COM id aa03851; 11 Apr 94 12:53 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa03847; 11 Apr 94 12:51 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa19784; 11 Apr 94 12:45 EDT Received: by ginger.lcs.mit.edu id AA07385; Mon, 11 Apr 94 12:45:11 -0400 Date: Mon, 11 Apr 94 12:45:11 -0400 From: Noel Chiappa Message-Id: <9404111645.AA07385@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Nimrod IPng technical requirements text Cc: jnc@ginger.lcs.mit.edu

Move the "flag" that this is a multicast EID out of the EID and make it a real flag bit. Advantages are 1) this doubles the name space rather than halving it, 2) there is a specific flag that specifies the semantics (which has implications for future extension), 3) you can assign multicast ids through some central authority simply by incrementing a counter.

Hmm. 1) I don't think is that important. For 2), I'd assume that we'd assign the same semantics to the top bit if we did it that way, and if there's only one bit, it's kind of hard to do extended semantics. (Maybe the lesson here is that if the top bit is 0, it's an EID, if the top bits are 1111, it's a SID, and other values are reserved for future use?) For 3), can't you do the same even with the encoding scheme I just proposed?

Disadvantages are ... this may have implications for source routing (does it make sense to source route through a multicast?)

I didn't think source routes were going to contain EID's/SID's anyway. I assumed they were going to contain locators (and locator elements, for compact expression; you don't need the whole locator anyway, just an up/across/down bit and the element).

It could be reasonable to have a multicast group that anyone may join or leave at random. The other extreme is the group that is completely controlled from a central location. It would be nice if the creation of a multicast group could carry with it information about the security level of the multicast.

Good point!

My goal would be to allow the router to do EID addition/removal to the multicast group in those situations that are entirely open.

Right, but we need a more general mechanism that will allow users to do the group entry control. Since their policies may not be publicly stateable, it has to be like route selection; capable of being moved totally outside.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa22176; 11 Apr 94 13:25 EDT Received: from pizza by PIZZA.BBN.COM id aa04079; 11 Apr 94 13:13 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa04075; 11 Apr 94 13:11 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa21064; 11 Apr 94 13:06 EDT Received: by ginger.lcs.mit.edu id AA07766; Mon, 11 Apr 94 13:06:31 -0400 Date: Mon, 11 Apr 94 13:06:31 -0400 From: Noel Chiappa Message-Id: <9404111706.AA07766@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Nimrod IPng technical requirements text Cc: jnc@ginger.lcs.mit.edu

we could view the endpoints of the flow as attributes of the flow, just as things like QoS are.

I don't think the internetwork is really going to know *all* the attributes of a flow; there may be policy stuff only the source knows, etc. I'll have to think about whether it's useful to think of the endpoints of a flow as attributes of the flow.
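Returning for a moment to the SID/EID tag rule floated earlier in the day ("top bit 0, it's an EID; top bits 1111, it's a SID; other values reserved"), one possible rendering in code, as a sketch only: the 64-bit width is an assumption, since no identifier size has been settled anywhere in this thread.

    # Sketch of the tag rule: top bit 0 -> EID; top four bits all 1 ->
    # SID; anything else reserved. The 64-bit width is an assumption.
    WIDTH = 64
    TOP_BIT = 1 << (WIDTH - 1)
    TOP4 = 0xF << (WIDTH - 4)

    def classify(ident):
        if ident & TOP_BIT == 0:
            return "EID"
        if ident & TOP4 == TOP4:
            return "SID"
        return "reserved"

    assert classify(42) == "EID"
    assert classify(TOP4 | 7) == "SID"
    assert classify(TOP_BIT | 1) == "reserved"   # 10... patterns

Note that under this rule the EID space gives up one bit (half the namespace) and SID's get a sixteenth of it, which is the space trade-off being debated above.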
Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa23423; 11 Apr 94 13:46 EDT Received: from pizza by PIZZA.BBN.COM id aa04213; 11 Apr 94 13:29 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa04209; 11 Apr 94 13:28 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa22054; 11 Apr 94 13:23 EDT Received: by ginger.lcs.mit.edu id AA08248; Mon, 11 Apr 94 13:23:50 -0400 Date: Mon, 11 Apr 94 13:23:50 -0400 From: Noel Chiappa Message-Id: <9404111723.AA08248@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: hop limit Cc: jnc@ginger.lcs.mit.edu

I read once that the diameter of the human population was pretty small. ... The surprising fact is that according to this metric the distance is said to be pretty small - typically around 5. I have no hard proof, though.

I've heard this too. However, humans are typically very richly connected. So, maybe in the log(N) thing, the base of the log is some function of the average connectivity of the nodes. Thus, a graph with very richly connected nodes (on average) will have a smaller diameter than a graph with sparsely connected nodes. Good point....

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa24500; 11 Apr 94 14:02 EDT Received: from pizza by PIZZA.BBN.COM id aa04327; 11 Apr 94 13:46 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa04323; 11 Apr 94 13:44 EDT Received: from wd40.ftp.com by BBN.COM id aa22972; 11 Apr 94 13:37 EDT Received: from ftp.com by ftp.com ; Mon, 11 Apr 1994 13:37:09 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Mon, 11 Apr 1994 13:37:09 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA01948; Mon, 11 Apr 94 13:36:14 EDT Date: Mon, 11 Apr 94 13:36:14 EDT Message-Id: <9404111736.AA01948@mailserv-D.ftp.com> To: jnc@ginger.lcs.mit.edu Subject: Re: Nimrod IPng technical requirements text From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: nimrod-wg@BBN.COM Content-Length: 859

> I'll have to think about whether it's useful to think of the endpoints of a flow as attributes of the flow.

If you do then multicast and unicast are the same thing.

However, there may be a security aspect here. If flows are set up a la RSVP's 'receiver pull' mechanism, then you and I could be engaged in a one-to-one conversation and any random person could join into the flow and they could read our conversation. Every flow would need some sort of admission control attributes in order to prevent this sort of eavesdropping. If flows are set up in a 'transmitter push' mechanism then one node could be in a position to 'force' a flow onto another node -- but this is not really any worse than a node, today, just sending packets to arbitrary IP addresses.

-- Frank Kastenholz FTP Software 2 High Street North Andover, Mass. USA 01845 (508)685-4000

Received: from PIZZA.BBN.COM by BBN.COM id aa26527; 11 Apr 94 14:33 EDT Received: from pizza by PIZZA.BBN.COM id aa04555; 11 Apr 94 14:13 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa04551; 11 Apr 94 14:11 EDT Received: from bridge2.NSD.3Com.COM by BBN.COM id aa24843; 11 Apr 94 14:07 EDT Received: from remmel.NSD.3Com.COM by bridge2.NSD.3Com.COM with SMTP id AA20943 (5.65c/IDA-1.4.4nsd for ); Mon, 11 Apr 1994 11:07:36 -0700 Received: from localhost.NSD.3Com.COM by remmel.NSD.3Com.COM with SMTP id AA28977 (5.65c/IDA-1.4.4-910725); Mon, 11 Apr 1994 11:07:35 -0700 Message-Id: <199404111807.AA28977@remmel.NSD.3Com.COM> To: Noel Chiappa Cc: nimrod-wg@BBN.COM Subject: Re: hop limit In-Reply-To: Your message of "Mon, 11 Apr 94 13:23:50 EDT."
<9404111723.AA08248@ginger.lcs.mit.edu> Date: Mon, 11 Apr 94 11:07:33 -0700 From: tracym@nsd.3com.com

> I read once that the diameter of the human population was pretty small. ... The surprising fact is that according to this metric the distance is said to be pretty small - typically around 5. I have no hard proof, though.
>
> I've heard this too. However, humans are typically very richly connected. So, maybe in the log(N) thing, the base of the log is some function of the average connectivity of the nodes. Thus, a graph with very richly connected nodes (on average) will have a smaller diameter than a graph with sparsely connected nodes. Good point....

I'd heard a number more like 10 (in grade school), but I'd guess that the "typically" is important. If you step back and only allow connections that have existed in the last month or three, then there may well be strings of outliers that increase the actual maximum diameter greatly. It is easy to imagine strung-out serial topologies of all kinds that shouldn't be precluded.

Tracy

Received: from PIZZA.BBN.COM by BBN.COM id aa28537; 11 Apr 94 15:04 EDT Received: from pizza by PIZZA.BBN.COM id aa04802; 11 Apr 94 14:46 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa04797; 11 Apr 94 14:44 EDT Received: from inet-gw-1.pa.dec.com by BBN.COM id aa26946; 11 Apr 94 14:39 EDT Received: from nacto1.nacto.lkg.dec.com by inet-gw-1.pa.dec.com (5.65/21Mar94) id AA18515; Mon, 11 Apr 94 11:34:09 -0700 Received: from sneezy.nacto.lkg.dec.com by nacto1.nacto.lkg.dec.com with SMTP id AA28799; Mon, 11 Apr 1994 14:33:30 -0400 Received: by sneezy.nacto.lkg.dec.com (5.65/4.7) id AA08497; Mon, 11 Apr 1994 14:31:55 -0400 To: Noel Chiappa Cc: nimrod-wg@BBN.COM Subject: Re: hop limit In-Reply-To: <9404111723.AA08248@ginger.lcs.mit.edu> References: <9404111723.AA08248@ginger.lcs.mit.edu> X-Mailer: Poste 2.1 From: David R Oran Date: Mon, 11 Apr 94 14:31:54 -0400 Message-Id: <940411143154.4941@sneezy.nacto.lkg.dec.com.thomas> Encoding: 52 TEXT, 6 TEXT SIGNATURE

> I read once that the diameter of the human population was pretty small. ... The surprising fact is that according to this metric the distance is said to be pretty small - typically around 5. I have no hard proof, though.
>
> I've heard this too. However, humans are typically very richly connected. So, maybe in the log(N) thing, the base of the log is some function of the average connectivity of the nodes. Thus, a graph with very richly connected nodes (on average) will have a smaller diameter than a graph with sparsely connected nodes. Good point....

There's a famous paper on this, but it wasn't for the whole human population. I wish I remembered the name of the phenomenon... Basically the study started with a mathematician "x" (after whom the metric was named) and looked at who wrote a paper with him. Any of those mathematicians were at distance 1. Then they looked at mathematicians who wrote a paper with someone who wrote a paper with "x". They were at distance 2. etc. At distance 7, all published mathematicians were in the set.

In all this discussion, I'm surprised that nobody has used the international phone system as the analogue. The maximum diameter of the phone system today is (I think) 8 (but maybe still 7), going perhaps to 9 by the year 2010. That's for 2*10**8 phones.
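Those figures can be sanity-checked against the "base of the log is the connectivity" idea above: a roughly tree-shaped switching hierarchy of N nodes with effective fanout k has diameter on the order of log(N)/log(k). A back-of-the-envelope sketch in Python, where the fanout values are arbitrary assumptions:

    # Diameter ~ log(N)/log(k) for N nodes with effective fanout k.
    import math

    N = 2e8                             # the phone count cited above
    for k in (4, 8, 16, 32):
        print(k, round(math.log(N) / math.log(k), 1))
    # prints roughly: 4 -> 13.8 hops, 8 -> 9.2, 16 -> 6.9, 32 -> 5.5

    print(round(N ** (1 / 8), 1))       # ~10.9: fanout implied by diameter 8

By this crude measure, a diameter of 8 for 2*10**8 phones implies an effective fanout of around 11; the log(N) position amounts to betting that the fanout stays well above 2 as the network grows.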
Now, it might not be safe to assume that the "information superhighway", combined with all its dirt roads, will evolve like the phone system, but it seems unlikely that a world-wide internet would function reasonably at all at diameters much larger than 20 or so. For one thing, the store-and-forward delays would completely fry any real-time traffic. If bits were the only problem here, we shouldn't argue about an extra byte for hop count.

On the other hand, by having a large hop dynamic range, you have an interesting conundrum: If you count down, what do you start at? Today it doesn't matter too much since the hop count is really only there to stamp out looping packets and *not* enforce MPL. With a 16 bit hop count, packets can loop for long enough to consume LOTS of resources. Unfortunately, there's no motivation for host system manglers to set the hop count to a reasonable number in a count-down system. A count-up system is even worse - remember DECnet Phase III.

A compromise might be to allocate 16 bits but in host requirements set an absolute upper limit for now of 255. On the other hand, I'm not sure I'd want to work on a network with a diameter over 255. Can you spell "message-switching?"

Dave.

-+-+-+-+-+-+-+ David R. Oran Phone: + 1 508 486-7377 Digital Equipment Corporation Fax: + 1 508 486-5279 LKG 1-2/A19 Email: oran@lkg.dec.com 550 King Street Littleton, MA 01460
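Dave's compromise is easy to state concretely; a minimal sketch in Python, where the constant and function names are made up for illustration and nothing here is a worked-out wire format:

    # Sketch of the compromise: a 16-bit hop-count field on the wire,
    # with host requirements clamping the initial value to 255 for now.
    HOP_FIELD_MAX = 0xFFFF    # what the wire format could carry
    HOP_ADMIN_MAX = 255       # what host requirements would permit today

    def initial_hop_count(requested):
        # hosts may ask for more, but get clamped to the admin limit
        return min(requested, HOP_ADMIN_MAX)

    def forward(hops_remaining):
        # count-down model: decrement, discard at zero to kill loopers
        if hops_remaining <= 1:
            return None           # drop the packet
        return hops_remaining - 1

This keeps the dynamic range for the future while bounding how long a looping packet can live today.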
Received: from PIZZA.BBN.COM by BBN.COM id aa16678; 12 Apr 94 1:09 EDT Received: from pizza by PIZZA.BBN.COM id aa07887; 12 Apr 94 0:56 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa07883; 12 Apr 94 0:54 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa16179; 12 Apr 94 0:54 EDT Received: by ginger.lcs.mit.edu id AA13261; Tue, 12 Apr 94 00:54:22 -0400 Date: Tue, 12 Apr 94 00:54:22 -0400 From: Noel Chiappa Message-Id: <9404120454.AA13261@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: hop limit Cc: jnc@ginger.lcs.mit.edu

it seems unlikely that a world-wide internet would function reasonably at all at diameters much larger than 20 or so. For one thing, the store-and-forward delays would completely fry any real-time traffic.

Hmm, not sure I quite see this. I know how to build a switch which, running at a modest clock rate of 10 Mhz, will switch a 40 byte packet in 4 usec. Assuming a 100 Mbit/sec network, add 4 usec input time, and 4 usec output time (assuming idle interfaces). That gives us 12 usec or so, at least for small packets. (Larger packets get real complex, as depending on relative input and output speeds, you may be able to do "cut through" routing, and overlap input and output times...) Now, the circumference of the earth is about 40K kilometres, so speed of light (300K kilometres per second) delay half way 'round (20K Km) is 65 msec. So, 1000 hops at 12 usec per would add 12 msec, or less than a fifth of the (rather inevitable :-) minimum propagation delay....

On the other hand, by having a large hop dynamic range, you have an interesting conundrum: If you count down, what do you start at? Today it doesn't matter too much since the hop count is really only there to stamp out looping packets ... With a 16 bit hop count, packets can loop for long enough to consume LOTS of resources. Unfortunately, there's no motivation for host system manglers to set the hop count to a reasonable number in a count-down system. A count-up system is even worse

Big-I talked about this topic a while back, and decided the "right" thing was to have the host find out what number to stick in there from the routers. As a first cut, the routers would give back the diameter of the network, times a safety factor. This could still allow a certain amount of looping if the diameter gets big. If you want to really make it tight, have the router return a number which is the path length to the destination, times a safety factor. I'd be happy with the first, at least with Nimrod, since hopefully looping packets would be Really Rare, and if the mechanism to catch them is not really efficient, it's not the end of the world.
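Concretely, the scheme just described might look like the following; a sketch under assumed interfaces, since no such router query exists and both calls on the router object are hypothetical:

    # Sketch: the host asks its first-hop router for a starting value.
    # router.diameter() and router.path_length() are hypothetical calls.
    SAFETY = 2  # arbitrary safety factor

    def starting_hops(router, dest=None):
        if dest is None:
            # first cut: network diameter times a safety factor
            return router.diameter() * SAFETY
        # tighter cut: path length to this destination times the factor
        return router.path_length(dest) * SAFETY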
Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa17209; 12 Apr 94 1:27 EDT Received: from pizza by PIZZA.BBN.COM id aa07976; 12 Apr 94 1:14 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa07972; 12 Apr 94 1:12 EDT Received: from necom830.cc.titech.ac.jp by BBN.COM id aa16780; 12 Apr 94 1:11 EDT Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Tue, 12 Apr 94 14:06:14 +0900 From: Masataka Ohta Return-Path: Message-Id: <9404120506.AA27165@necom830.cc.titech.ac.jp> Subject: Re: hop limit To: Noel Chiappa Date: Tue, 12 Apr 94 14:06:12 JST Cc: nimrod-wg@BBN.COM, jnc@ginger.lcs.mit.edu In-Reply-To: <9404111611.AA06931@ginger.lcs.mit.edu>; from "Noel Chiappa" at Apr 11, 94 12:11 pm X-Mailer: ELM [version 2.3 PL11]

> I think there is something to this; I think the network graph will be more randomly connected locally. However, I'm still not sure that, at the global level, it will be close to the planar end of the random<->planar spectrum. It's just too easy to make non-planar links, and I think they act very quickly to reduce the diameter toward the size predicted for random graphs.

True. It is quite easy if the only purpose is the reduction of the diameter. Add a negligibly small number of long-distance links. That's all. Then, all the long-distance communication will use that small number of links. Thus, the links will be overloaded.

> You are assuming a tree-like topology here, not a random graph nor a mesh. That is, your position is biased by the current fact that a small number of T3 routers can handle all the backbone traffic and the backbone is mostly tree structured rather than full mesh.

This part explains it.

> Not really, I don't think.

You do think so, at least partially. As I have pointed out several times already, you haven't paid enough attention to link load concentration issues everywhere in the NIMROD specification.

> The model I was working with was closer to yours.

I hope so.

> Of course, if we pay extra money to connect distant routers, the required maximum hop count becomes smaller. Such extra connections are also necessary to reduce the load of routers. But, wiring does cost and the improvement is proportional to the cost paid.
>
> Good point.

What? If you can understand this part, you should have been able to understand the whole issue. Strange.

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa17996; 12 Apr 94 1:57 EDT Received: from pizza by PIZZA.BBN.COM id aa08082; 12 Apr 94 1:44 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa08078; 12 Apr 94 1:42 EDT Received: from necom830.cc.titech.ac.jp by BBN.COM id aa17660; 12 Apr 94 1:42 EDT Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Tue, 12 Apr 94 14:35:52 +0859 From: Masataka Ohta Return-Path: Message-Id: <9404120536.AA27352@necom830.cc.titech.ac.jp> Subject: Re: hop limit To: David R Oran Date: Tue, 12 Apr 94 14:35:50 JST Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM In-Reply-To: <940411143154.4941@sneezy.nacto.lkg.dec.com.thomas>; from "David R Oran" at Apr 11, 94 2:31 pm X-Mailer: ELM [version 2.3 PL11]

> was named) and looked at who wrote a paper with him. Any of those mathematicians were at distance 1. Then they looked at mathematicians who wrote a paper with someone who wrote a paper with "x". They were at distance 2. etc.
>
> At distance 7, all published mathematicians were in the set.

I'm afraid you ignore mathematicians who never wrote a co-authored paper.
Anyway, if you want to solve the material mail routing problem between mathematicians, do it with the PTT, not here.

> In all this discussion, I'm surprised that nobody has used the international phone system as the analogue.

Because it can't be an analogue. Moreover, as we have a real model, we don't need any analogue.

> The maximum diameter of the phone system today is (I think) 8 (but maybe still 7), going perhaps to 9 by the year 2010. That's for 2*10**8 phones.

The phone system today aggregates a lot of 64Kbps communication into a 2.4Gbps backbone, which is not the case for the future internet. Or are you satisfied with UUCP over a 14,400 bps modem forever?

> Now, it might not be safe to assume that the "information superhighway",

That's a terrible assumption.

> combined with all its dirt roads, will evolve like the phone system, but it seems unlikely that a world-wide internet would function reasonably at all at diameters much larger than 20 or so. For one thing, the store-and-forward delays would completely fry any real-time traffic.

Use ATM, in a way people in the ATM Forum never imagined. See draft-ohta-ip-over-atm-00.txt on how you can do cell-by-cell relaying on routers.

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa22240; 12 Apr 94 12:12 EDT Received: from pizza by PIZZA.BBN.COM id aa10432; 12 Apr 94 11:53 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa10428; 12 Apr 94 11:51 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa20581; 12 Apr 94 11:49 EDT Received: by ginger.lcs.mit.edu id AA16418; Tue, 12 Apr 94 11:48:55 -0400 Date: Tue, 12 Apr 94 11:48:55 -0400 From: Noel Chiappa Message-Id: <9404121548.AA16418@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: hop limit Cc: jnc@ginger.lcs.mit.edu

> It's just too easy to make non-planar links, and I think they act very quickly to reduce the diameter toward the size predicted for random graphs.

It is quite easy if the only purpose is the reduction of the diameter. Add a negligibly small number of long-distance links. That's all. Then, all the long-distance communication will use that small number of links. Thus, the links will be overloaded.

I have two reactions. First, if the charging policy is at all related to real traffic loads, the extra revenue from all that traffic should enable you to put more capacity in place. Slowly things will stabilize with the number of non-planar long-distance links which are needed to handle the long-distance traffic.

Second, I'm not sure what your model for traffic distribution is, but my model is that there's probably going to be an inverse relationship between the distance between two communicating nodes, and the amount of traffic. This says to me that it's perfectly OK to have a graph which is not as thoroughly connected at the long-distance scale as it is locally, since there will be relatively less long-distance traffic than local.

You haven't paid enough attention to link load concentration issues everywhere in the NIMROD specification.

Load concentrations are things I worry about a lot, but I think there are lots of good reasons to think that the coming information infrastructure will be enough of a mesh to minimize massive hot-spots. The same technology and economic trends that are driving supercomputers toward lots of parallel, relatively slow machines will operate in networking. Also, lots of parallel links and switches will produce a more robust infrastructure.
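The inverse-relationship model a couple of paragraphs up can be made concrete with a toy calculation; a sketch only, under the assumptions that nodes sit at unit spacing and that pairwise traffic falls off as 1/distance:

    # Toy model: traffic between a pair of nodes falls off as 1/distance,
    # so total demand out to reach D grows like the harmonic sum H(D),
    # i.e. only logarithmically. All numbers are illustrative.
    def demand_within(reach):
        return sum(1.0 / d for d in range(1, reach + 1))

    print(demand_within(10))       # ~2.93
    print(demand_within(10000))    # ~9.79: 1000x the reach, ~3.3x the demand

If anything like this holds, the long-haul part of the mesh can be much sparser than the local part without the long-haul links melting, which is the crux of the disagreement here.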
You did have a good point that we have to make sure the routing will scale well in a system that looks like this, but this issue gets looked at a lot now that you have raised it, and I think techniques like high-level virtual links will allow us to reduce the complexity of the high-level map, without losing the ability to spread the load across the multitude of parallel real physical links which make up that high-level virtual link.

>> Of course, if we pay extra money to connect distant routers, the required maximum hop count becomes smaller. Such extra connections are also necessary to reduce the load of routers. But, wiring does cost and the improvement is proportional to the cost paid.

> Good point.

What? If you can understand this part, you should have been able to understand the whole issue. Strange.

I thought that any system in which the improvement is proportional to the cost is a pretty good system. If the users want better service, they pay more money, and what they get for their money is proportional to the money they spend. Sounds good to me...

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa11658; 12 Apr 94 16:36 EDT Received: from pizza by PIZZA.BBN.COM id aa12076; 12 Apr 94 16:19 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa12072; 12 Apr 94 16:15 EDT Received: from quern.epilogue.com by BBN.COM id aa09348; 12 Apr 94 16:08 EDT To: nimrod-wg@BBN.COM Subject: comments on draft routing architecture document Date: Tue, 12 Apr 94 16:08:20 EDT From: dab@epilogue.com Sender: dab@epilogue.com Message-ID: <9404121608.aa23761@quern.epilogue.com>

Now that I'm back home for a day or two it's time to write up some comments I had on the Nimrod Routing Architecture document. Isidro asked me to write up something on bottom up locators; that'll follow along a little later. These are just the easy comments.

In section "1.1 Constraints", constraint 1, you write that the Internet "will retain general organization of backbone, regional, and local networks". One of the big wins of Nimrod to my mind was that it didn't require this sort of organization of the network. I think that if we give people an internetworking layer that can only work in this manner then that's the structure the network will take. I don't believe that's the best structure for the network. I also think that if we start with that assumption then we'll develop an internetworking layer that provides only that ability. Look at the geographic vs provider addressing debates to see what results.

In constraint 7 you write that "the frequency at which an entity moves is usually inversely proportional to the size of the entity, e.g., individual hosts are likely to move around more frequently than entire networks". While this may be true, it's an average over the entire network and I'm not sure it's useful. For a given host or network it may be quite likely to move around. For instance, the probability of the network on an aircraft carrier moving is very high. I believe that we'll need mechanisms that make it as easy as possible for networks to move as well as hosts.

Then in section "5. Renumbering" you say "Because renumbering will, most likely, be infrequent and carefully planned ...". I don't think I believe this premise. For mobile networks I expect renumbering to be quite frequent though perhaps planned. I also expect the network to require renumbering once in a while. Not as a carefully planned thing necessarily but just because the net's getting bigger.
Depending on how the locators are done, this could require renumbering large parts of the network that really have no connection with wherever is forcing the renumbering. In a sufficiently large network, such renumbering requests could be very frequent. I'd suggest, obviously, that we do the numbering in such a way as to avoid this problem if we can, but I'd say that the statement that it will be infrequent is premature until we know how we're doing the numbering.

Dave Bridgham

Received: from PIZZA.BBN.COM by BBN.COM id aa14373; 13 Apr 94 7:07 EDT Received: from pizza by PIZZA.BBN.COM id aa15609; 13 Apr 94 6:49 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa15605; 13 Apr 94 6:48 EDT Received: from mitsou.inria.fr by BBN.COM id aa13637; 13 Apr 94 6:48 EDT Received: by mitsou.inria.fr (5.65c8/IDA-1.2.8) id AA23988; Wed, 13 Apr 1994 12:49:45 +0200 Message-Id: <199404131049.AA23988@mitsou.inria.fr> To: Masataka Ohta Cc: nimrod-wg@BBN.COM Subject: Re: hop limit In-Reply-To: Your message of "Wed, 13 Apr 1994 19:09:22 +0200." <9404131009.AA03675@necom830.cc.titech.ac.jp> Date: Wed, 13 Apr 1994 12:49:44 +0200 From: Christian Huitema

=> My model?
=>
=> Some amount of communication will be within a city.
=>
=> Most communication will be within a single economic unit such as a country, EC or North America.
=>
=> There will be some small, but not negligible, amount of truly global traffic.

This is the classic telecommunication model. But are you sure that it will remain valid in the long run? What strikes all observers of the Internet is the "global village" effect: e.g. I exchange this mail with you, although we are located in different countries, separated by a rather large distance.

On the hop limit per se: the reason why we have a hop limit in the packets at all is loop protection, i.e. making sure that if a loop exists the packet will roll at most "N" times before it is killed. Obviously, this is not very helpful if the maximum number of hops is very large. One would then have to use innovative techniques, e.g. count of traversed networks.

Christian Huitema

Received: from PIZZA.BBN.COM by BBN.COM id aa13143; 13 Apr 94 6:29 EDT Received: from pizza by PIZZA.BBN.COM id aa15441; 13 Apr 94 6:16 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa15437; 13 Apr 94 6:15 EDT Received: from necom830.cc.titech.ac.jp by BBN.COM id aa12716; 13 Apr 94 6:15 EDT Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Wed, 13 Apr 94 19:09:23 +0900 From: Masataka Ohta Return-Path: Message-Id: <9404131009.AA03675@necom830.cc.titech.ac.jp> Subject: Re: hop limit To: Noel Chiappa Date: Wed, 13 Apr 94 19:09:22 JST Cc: nimrod-wg@BBN.COM, jnc@ginger.lcs.mit.edu In-Reply-To: <9404121548.AA16418@ginger.lcs.mit.edu>; from "Noel Chiappa" at Apr 12, 94 11:48 am X-Mailer: ELM [version 2.3 PL11]

> > It's just too easy to make non-planar links, and I think they act very quickly to reduce the diameter toward the size predicted for random graphs.
>
> It is quite easy if the only purpose is the reduction of the diameter. Add a negligibly small number of long-distance links. That's all. Then, all the long-distance communication will use that small number of links. Thus, the links will be overloaded.
>
> I have two reactions. First, if the charging policy is at all related to real traffic loads, the extra revenue from all that traffic should enable you to put more capacity in place.
> Slowly things will stabilize with the number of non-planar long-distance links which are needed to handle the long-distance traffic.

Wrong. It will make the cost so high that networking will die. But, don't mind. It won't be the case. Vendors with larger numbers of routers (and, thus, larger hop counts) can offer cheaper service and others will be kicked out of the market.

> Second, I'm not sure what your model for traffic distribution is, but my model is that there's probably going to be an inverse relationship between the distance between two communicating nodes, and the amount of traffic.

My model?

Some amount of communication will be within a city.

Most communication will be within a single economic unit such as a country, EC or North America.

There will be some small, but not negligible, amount of truly global traffic.

We should investigate what today's telephone traffic pattern is. But, at the same time, we should think that people in the future will behave much more globally.

> This says to me that it's perfectly OK to have a graph which is not as thoroughly connected at the long-distance scale as it is locally, since there will be relatively less long-distance traffic than local.

To me, it is perfectly OK to have a graph which is not directly connected at the long-distance scale.

> You haven't paid enough attention to link load concentration issues everywhere in the NIMROD specification.
>
> Load concentrations are things I worry about a lot, but I think there are lots of good reasons to think that the coming information infrastructure will be enough of a mesh to minimize massive hot-spots.

You don't understand the load concentration issue. Your goal is wrong from the beginning. To minimize the massive hot-spots, we should have a rooted tree topology, where there is only a single, but very hot, spot, which melts the network. The proper goal is to increase the number of hot spots, which makes the spots colder.

> The same technology and economic trends that are driving supercomputers toward lots of parallel, relatively slow machines will operate in networking.

You misunderstand supercomputers. On supercomputers, minimizing latency is an important goal. But making the system large costs superlinearly. MPP costs a lot more than the cost of its components. For communication between distant locations, no one can expect so little latency, because the speed of light is not fast enough. Your approach is economically infeasible.

> Also, lots of parallel links and switches will produce a more robust infrastructure.

Parallel links or parallel processors share their fate too much of the time to be robust. For robustness, we should geographically distribute them.

> You did have a good point that we have to make sure the routing will scale well in a system that looks like this, but this issue gets looked at a lot,

And gets overlooked a lot.

> now that you have raised it, and I think techniques like high-level virtual links will allow us to reduce the complexity of the high-level map, without losing the ability to spread the load across the multitude of parallel real physical links which make up that high-level virtual link.

High-level virtual links will help to reduce hop counts, if they are applicable. But can you imagine some cases when such links are not available? For example, how can you establish such links?

> I thought that any system in which the improvement is proportional to the cost is a pretty good system.
> If the users want better service, they pay more money, and what they get for their money is proportional to the money they spend. Sounds good to me...

People don't want to pay more money when it is avoidable.

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa25150; 13 Apr 94 10:14 EDT Received: from pizza by PIZZA.BBN.COM id aa16444; 13 Apr 94 10:00 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa16440; 13 Apr 94 9:57 EDT Received: from [131.112.4.4] by BBN.COM id aa23952; 13 Apr 94 9:55 EDT Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Wed, 13 Apr 94 22:49:24 +0900 From: Masataka Ohta Return-Path: Message-Id: <9404131349.AA04503@necom830.cc.titech.ac.jp> Subject: Re: hop limit To: Christian Huitema Date: Wed, 13 Apr 94 22:49:22 JST Cc: nimrod-wg@BBN.COM In-Reply-To: <199404131049.AA23988@mitsou.inria.fr>; from "Christian Huitema" at Apr 13, 94 12:49 pm X-Mailer: ELM [version 2.3 PL11]

> => My model?
> =>
> => Some amount of communication will be within a city.
> =>
> => Most communication will be within a single economic unit such as a country, EC or North America.
> =>
> => There will be some small, but not negligible, amount of truly global traffic.
>
> This is the classic telecommunication model. But are you sure that it will remain valid in the long run? What strikes all observers of the Internet is the "global village" effect: e.g. I exchange this mail with you, although we are located in different countries, separated by a rather large distance.

I have completely agreed with you already:

: We should investigate what today's telephone traffic pattern is. But, at the same time, we should think that people in the future will behave much more globally.

> On the hop limit per se: the reason why we have a hop limit in the packets at all is loop protection, i.e. making sure that if a loop exists the packet will roll at most "N" times before it is killed. Obviously, this is not very helpful if the maximum number of hops is very large.

Currently, the maximum allowed is 255. Moreover, the old default value of 30 is now too small. So, I'm using 60 on important mail servers (is there any new IAB recommended value?).

With such a large TTL, if packets loop and multiply exponentially, TTL is meaningless. So, let's assume that packets may loop but do not multiply, and that the effect of a large TTL is only linear.

Moreover, current workstations are powerful enough that even FDDI can be saturated by a pair of them. So, if packets are sent without some handshaking, the current TTL is already large enough that looping packets can saturate the network anyway. Fortunately, most of the protocols, including TCP, won't generate many packets unless handshaking succeeds. So, packets may loop, but the network won't be saturated, because no handshake signal will be returned. Thus, a loop does not consume a lot of network bandwidth. TTL, in this case, is useful only to prevent really infinite looping of packets. Then, isn't a MAXTTL of 4095 acceptable?

> One would then have to use innovative techniques, e.g. count of traversed networks.

TTL these days means exactly that.

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa10022; 14 Apr 94 13:58 EDT Received: from pizza by PIZZA.BBN.COM id aa24831; 14 Apr 94 13:41 EDT Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa24827; 14 Apr 94 13:37 EDT To: nimrod-wg@BBN.COM Subject: Dave's questions Date: Thu, 14 Apr 94 13:37:27 -0400 From: Martha Steenstrup

Hi Dave,

Mea culpa. I'm responsible for the constraints, and so I'll try to justify them.
First, a general statement about the internetworking constraints listed in the draft. These represent predictions of what the Internet will look like over the next ten years and hence the expected environment in which Nimrod would have to operate. These predictions are based on the current state of the Internet, on the growth trends observed over the recent past, and on the predictions of others in the Internet community. However, they are only predictions, not guarantees of the future Internet environment. Please, if these constraints do not constitute a reasonable and complete set, speak up so that we can fix the draft.

We expect to use the constraints in two ways. The first is to define the environment in which Nimrod must work. As we are not going to be perfect predictors of the future Internet, we want Nimrod to be flexible enough to accommodate a variety of network topologies, services, and users. At the very least, Nimrod must be able to handle an internetwork in which the constraints listed in the draft hold. However, the draft does not claim that Nimrod will not work in an environment in which only some of these constraints hold, or that Nimrod will not work in an environment in which additional constraints hold.

The second way we expect to use the constraints is in defining the common cases and hence to help in making engineering tradeoffs when we design the protocol details. For example, if we have two protocols, the first of which is very efficient for the expected common case (like the mobility constraint 7 you mentioned) and not very efficient for the rarer case, and the second of which is less efficient than the first in the common case but more efficient than the first in the rarer case, we may end up opting for the protocol which does the best in the common case.

Do you think constraints 1 and 7 should be removed? or made more strict? Please let us know.

Thanks, m

Received: from PIZZA.BBN.COM by BBN.COM id aa21565; 15 Apr 94 9:41 EDT Received: from pizza by PIZZA.BBN.COM id aa29903; 15 Apr 94 9:15 EDT Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa29899; 15 Apr 94 9:12 EDT To: minutes@cnri.reston.va.us, nimrod-wg@BBN.COM Subject: Nimrod IETF minutes Date: Fri, 15 Apr 94 09:09:26 -0400 From: Isidro Castineyra

Seattle IETF Meeting Proceedings Routing Area Nimrod BOF Current Meeting Report Reported by: J. Noel Chiappa and Isidro Castineyra (BBN)

Minutes of The New Internet Routing And Addressing Architecture BOF (NIMROD)

The objective of this BOF is to design NIMROD: a hierarchical, map-based, routing architecture. Nimrod's stated purpose is to manage in a scalable fashion the trade-off between the amount of information about the network and route quality. A rough draft architecture document was distributed to the group's mailing list in preparation for this meeting. The main purpose of the meeting was the review of the draft architecture document and the preparation of the workplan for the next meeting, scheduled to take place during IETF30. The group met on Tuesday and Wednesday, from 0930 to 1200.

On Tuesday, Isidro Castineyra presented the contents of the draft architecture document. The presentation covered the stated objectives of Nimrod and its main features, and gave an overview of its mechanisms. The following are among the issues raised by the attendees:

1. Mobility

The question that was raised was whether internetworks (nodes in Nimrod parlance) are mobile.
In response to this it was said that in Nimrod nodes are mobile, but that Nimrod does not propose, at this time, a mechanism to support mobility. The draft architecture suggests ways in which Nimrod can support current approaches to mobility.

2. Node expansion

In Nimrod, a node in a map can be expanded, substituting its internal map for the node. The question was raised of when one should look inside a node for more information. This question was added to the open issue list.

3. What is an endpoint

The draft says that an endpoint represents a user of the network layer---a transport layer entity. The question was if this means that TCP/UDP are two endpoints. Chiappa answered that an entity that has an end-to-end connection is an endpoint. It was noted that the concept of entity in the draft should be better defined.

4. EIDs and ELs

The draft proposes two forms of endpoint identifier: the EID (endpoint identifier) and the endpoint labels. The first one is a relatively short bit string, while the second one is more like a DNS name. The question was raised whether both these forms are necessary. It was noted that though the ELs are necessary to perform a distributed look-up, they should not be part of the architecture proper. ELs can be considered a user-interface problem.

5. Multiple EIDs per endpoint

The draft permits an endpoint to have more than one EID. The question was raised whether this was necessary. It was pointed out that there is no apparent way to enforce a single EID per endpoint.

6. Arcs' attributes

The draft defines maps as consisting of arcs and nodes. The arcs are later defined to have attributes. The question is whether it is necessary for an arc to have attributes, as it is more common to have the attributes residing in nodes. It was noted that both models have the same power of representation and that the distinction was cosmetic, but it was agreed that the next version of the draft would try to conform to the more common representation.

7. Connectivity specifications dynamics

Connectivity specifications describe the capabilities of a node. The question was raised whether these specifications are dynamic---that is, whether, for example, they indicate the current load of an element of the network. It was pointed out that dynamic specifications might not scale. It was also pointed out that a specification could have different parts with different degrees of dynamism, and that each part could be distributed differently.

8. Border points

Nodes have border points to which arcs attach. The question was raised of why border points are necessary. It was answered that border points are used to be able to separate the internal description of a node (its internal map) from its connection to the outside.

9. Bidirectional arcs

The architecture uses both unidirectional and multipoint arcs. The question was raised of why bidirectional arcs were not included. It was pointed out that a bidirectional arc can be represented with either unidirectional or multipoint arcs.

On Wednesday a set of issues was chosen for discussion by the group:

A. Arcs and nodes: different representations
B. When to look inside of a node
C. Dynamics of connectivity specifications
D. Workplan

The group decided to continue refining the architecture document using the output of this meeting and discussions on the mailing list. Work on the protocols should also start in this period.
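For readers trying to visualize the map model that items 2, 6, 8, and 9 argue over, one possible rendering in code follows; a sketch only, since Nimrod prescribes none of these types, and keeping the attributes on arcs rather than nodes is exactly the open issue from item 6:

    # Sketch of the map model: nodes with border points and an optional
    # internal map, joined by unidirectional arcs carrying attributes.
    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class Arc:
        src_border: str                 # border point the arc leaves from
        dst_border: str                 # border point the arc attaches to
        attrs: dict = field(default_factory=dict)   # connectivity specs

    @dataclass
    class Node:
        locator: str
        border_points: list = field(default_factory=list)
        internal_map: Optional["Map"] = None    # expanded on demand (item 2)

    @dataclass
    class Map:
        nodes: dict = field(default_factory=dict)   # locator -> Node
        arcs: list = field(default_factory=list)

    def bidirectional(a, b, attrs):
        # Item 9: a bidirectional link is just a pair of unidirectional arcs.
        return [Arc(a, b, dict(attrs)), Arc(b, a, dict(attrs))]

The border points are what let a node's internal map be separated from its external connections (item 8): arcs attach to border points, never directly to the internals.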
Received: from PIZZA.BBN.COM by BBN.COM id aa01667; 15 Apr 94 12:41 EDT Received: from pizza by PIZZA.BBN.COM id aa01300; 15 Apr 94 12:22 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa01294; 15 Apr 94 12:20 EDT Received: from quern.epilogue.com by BBN.COM id aa00354; 15 Apr 94 12:17 EDT To: msteenst@BBN.COM CC: nimrod-wg@BBN.COM In-reply-to: Martha Steenstrup's message of Thu, 14 Apr 94 13:37:27 -0400 <9404141352.aa06677@quern.epilogue.com> Subject: Dave's questions Date: Fri, 15 Apr 94 12:15:22 EDT From: dab@epilogue.com Sender: dab@epilogue.com Message-ID: <9404151215.aa13078@quern.epilogue.com>

Date: Thu, 14 Apr 94 13:37:27 -0400 From: Martha Steenstrup

Do you think constraints 1 and 7 should be removed? or made more strict? Please let us know.

My opinion is that constraint 1 should be removed because I think it makes it too easy to come up with easy answers that later inhibit a mesh network. In fact, I'd prefer to explicitly work towards a network that assumes that cross links at the "leaves" are the norm rather than the exception.

For constraint 7 I can't decide. You didn't say this, but it looks like an awful easy step to go from what you did say to a design decision that makes it more difficult for nets to move than hosts. After all, it's less likely. It could end up that we simply can't do better than making mobile nets harder than mobile hosts. But I wouldn't want to start with that as a target.

Dave

Received: from PIZZA.BBN.COM by BBN.COM id aa16283; 18 Apr 94 11:31 EDT Received: from pizza by PIZZA.BBN.COM id aa15648; 18 Apr 94 11:11 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa15644; 18 Apr 94 11:07 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa14948; 18 Apr 94 11:06 EDT Received: by ginger.lcs.mit.edu id AA27350; Mon, 18 Apr 94 11:06:21 -0400 Date: Mon, 18 Apr 94 11:06:21 -0400 From: Noel Chiappa Message-Id: <9404181506.AA27350@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Dave's questions Cc: jnc@ginger.lcs.mit.edu

My opinion is that constraint 1 ["The Internet ... will retain the general organizational structure of backbone, regional, and local networks"] should be removed because I think it makes it too easy to come up with easy answers that later inhibit a mesh network. In fact, I'd prefer to explicitly work towards a network that assumes that cross links at the "leaves" are the norm rather than the exception.

I agree completely. In my comments (which I need to polish, and put up for review), I said "I expect the general organization style, a loose confederation of autonomous entities, will continue, and the RA must be flexible enough to support this, while still scaling". We have to build something that can support a network constructed like that, as something of a random mesh (although it won't be totally a random mesh, obviously).

For constraint 7 ["The frequency at which an entity moves is usually inversely proportional to the size of the entity"] I can't decide. You didn't say this, but it looks like an awful easy step to go from what you did say to a design decision that makes it more difficult for nets to move than hosts. After all, it's less likely. It could end up that we simply can't do better than making mobile nets harder than mobile hosts. But I wouldn't want to start with that as a target.

I tend to agree that we shouldn't limit ourselves, but two things give me pause. First, it seems like it's very likely to be harder (you need more mechanism to make N things do something, than just one thing).
Second, I'm really wary of "second-system/kitchen-sink" disease; in fact, I'm sure many people think Nimrod already has it Big Time! (I'd agree with them, if we weren't trying to develop a routing architecture that had the flexibility to grow and change without a massive disruption to the underlying network substrate every few years.) Trying to make moving a network as easy, and important, as moving hosts is an ambitious goal that makes me nervous.

Anyway, I'm not saying I disagree with you here; I think we should think about it, but I have to think about it some more.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa18502; 18 Apr 94 12:09 EDT Received: from pizza by PIZZA.BBN.COM id aa15945; 18 Apr 94 11:49 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa15941; 18 Apr 94 11:47 EDT Received: from ftp.com by BBN.COM id aa17116; 18 Apr 94 11:44 EDT Received: from ftp.com by ftp.com ; Mon, 18 Apr 1994 11:44:21 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Mon, 18 Apr 1994 11:44:21 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA21983; Mon, 18 Apr 94 11:43:16 EDT Date: Mon, 18 Apr 94 11:43:16 EDT Message-Id: <9404181543.AA21983@mailserv-D.ftp.com> To: jnc@ginger.lcs.mit.edu Subject: Re: Dave's questions From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: nimrod-wg@BBN.COM, jnc@ginger.lcs.mit.edu Content-Length: 2993

> For constraint 7 ["The frequency at which an entity moves is usually inversely proportional to the size of the entity"] I can't decide. You didn't say this, but it looks like an awful easy step to go from what you did say to a design decision that makes it more difficult for nets to move than hosts. After all, it's less likely. It could end up that we simply can't do better than making mobile nets harder than mobile hosts. But I wouldn't want to start with that as a target.
>
> I tend to agree that we shouldn't limit ourselves, but two things give me pause. First, it seems like it's very likely to be harder (you need more mechanism to make N things do something, than just one thing).
>
> Second, I'm really wary of "second-system/kitchen-sink" disease; in fact, I'm sure many people think Nimrod already has it Big Time!

Consider that things like mobile networks and mobile internetworks are probably going to happen with IPng. I think that Nimrod must be able to allow an entire network or internetwork to move. I do specifically see this as allowing an entire network/internetwork to move, as opposed to just greater than 1 host moving (whatever the difference may be).

For instance, the US Navy is working on developing networks for all of its ships, planes, and shore installations. So, if each plane is a network, and each ship is a network, then all of the planes on an aircraft carrier are each networked, connected to the carrier's net, which would be connected to one (or more) shore installations' networks. As the carrier moves around on the ocean, it will move from one shore base's area/network to another's -- as will all of the planes on the carrier's flight deck. Of course, the planes may also move around, independently of the carrier, so their networks will have to move both as a result of the carrier moving, AND as a result of the plane taking off and flying someplace else. (And it gets even worse when you consider that the Navy tends to operate its ships in battlegroups, which may represent networks, and various ships and planes may be assigned to the bg, entering and later leaving the bg's network...)

This also applies to the civilian world.
Consider, for example, an airliner or a train with a net connection providing services to laptop computers. Or cars (automakers are looking to replace the multitude of point-to-point control wires with a network...).

Finally, one can look at mobility as being an aspect of the same problem as changing providers. In each case, an element of the network's topology is detached from the net's topology and reconnected someplace else. If this problem is solved then you solve the mobility and changing-provider problems. Both physical movement (mobility) and changing provider are merely reasons why the network element changes where in the topology it connects.

-- Frank Kastenholz FTP Software 2 High Street North Andover, Mass. USA 01845 (508)685-4000

Received: from PIZZA.BBN.COM by BBN.COM id aa14453; 19 Apr 94 2:09 EDT Received: from pizza by PIZZA.BBN.COM id aa20616; 19 Apr 94 1:54 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa20608; 19 Apr 94 1:51 EDT Received: from necom830.cc.titech.ac.jp by BBN.COM id aa13814; 19 Apr 94 1:48 EDT Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Tue, 19 Apr 94 14:40:32 +0900 From: Masataka Ohta Return-Path: Message-Id: <9404190540.AA01688@necom830.cc.titech.ac.jp> Subject: Re: Dave's questions To: kasten@ftp.com Date: Tue, 19 Apr 94 14:40:31 JST Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM In-Reply-To: <9404181543.AA21983@mailserv-D.ftp.com>; from "Frank Kastenholz" at Apr 18, 94 11:43 am X-Mailer: ELM [version 2.3 PL11]

> Consider that things like mobile networks and mobile internetworks are probably going to happen with IPng. I think that Nimrod must be able to allow an entire network or internetwork to move. I do specifically see this as allowing an entire network/internetwork to move, as opposed to just greater than 1 host moving (whatever the difference may be).
>
> For instance, the US Navy is working on developing networks for all of its ships, planes, and shore installations. So, if each plane is a network, and each ship is a network, then all of the planes on an aircraft carrier are each networked, connected to the carrier's net, which would be connected to one (or more) shore installations' networks. As the carrier moves around on the ocean, it will move from one shore base's area/network to another's -- as will all of the planes on the carrier's flight deck. Of course, the planes may also move around, independently of the carrier, so their networks will have to move both as a result of the carrier moving, AND as a result of the plane taking off and flying someplace else. (And it gets even worse when you consider that the Navy tends to operate its ships in battlegroups, which may represent networks, and various ships and planes may be assigned to the bg, entering and later leaving the bg's network...)

So? What's wrong with RIP or OSPF?

> This also applies to the civilian world. Consider, for example, an airliner or a train with a net connection providing services to laptop computers. Or cars (automakers are looking to replace the multitude of point-to-point control wires with a network...).

If you try to solve the mobility issue of up to 4G individuals now living on the Earth through the routing mechanism, the routing table instantly explodes.

Forget it.
Masataka Ohta   Received: from PIZZA.BBN.COM by BBN.COM id aa09985; 19 Apr 94 9:42 EDT Received: from pizza by PIZZA.BBN.COM id aa22263; 19 Apr 94 9:29 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa22259; 19 Apr 94 9:27 EDT Received: from wd40.ftp.com by BBN.COM id aa08840; 19 Apr 94 9:23 EDT Received: from ftp.com by ftp.com ; Tue, 19 Apr 1994 09:23:33 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Tue, 19 Apr 1994 09:23:33 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA03534; Tue, 19 Apr 94 09:22:30 EDT Date: Tue, 19 Apr 94 09:22:30 EDT Message-Id: <9404191322.AA03534@mailserv-D.ftp.com> To: mohta@necom830.cc.titech.ac.jp Subject: Re: Dave's questions From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM Content-Length: 2206 > > Consider that things like mobile networks and mobile internetworks > > are probably going to happen with IPng. I think that Nimrod must be > > able to allow an entire network or internetwork to move. I > > specifically see this as allowing an entire network/internetwork to > > move, as opposed to just more than one host moving (whatever the > > difference may be). > So? What's wrong with RIP or OSPF? The routing protocol to use is irrelevant. The real problem is: does the architecture allow only the leaves of the tree to move around on the topology, or does it allow entire subtrees to move around? Also, there is the re-locatoring problem -- if you assume that movement is limited to the leaves of the tree, then it makes sense to allow individual leaves to ask some service "what's my new locator?". If you allow entire subtrees to move around, this might not work -- there might be thousands of leaves in the subtree that just moved -- having them all ask "where am I?" at one time might kill the local network(s) and overload various servers. You'd also have the problem of telling all these nodes that they moved... So, if you allow subtrees to move, you might want to do the relocatoring by (for example) having the routers advertise onto their local networks what the locator prefix for that subnet is, rather than having nodes ask for the prefix when they come up. > > This also applies to the civilian world. Consider, for example, an > > airliner or a train with a net connection providing services to > > laptop computers. Or cars (automakers are looking to replace the > > multitude of point-to-point control wires with a network...). > > If you try to solve the mobility issue for each of the up to 4G people now > living on the Earth through the routing mechanism, the routing table > instantly explodes. > > Forget it. Only if the routing tables have to hold entries for all 4G people. One of the major goals of Nimrod is to allow aggregation of routes, reducing the amount of routing information that has to be kept in the routing tables at any one point in the network. -- Frank Kastenholz FTP Software 2 High Street North Andover, Mass. USA 01845 (508)685-4000
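To make the advertise-the-prefix idea above concrete, here is a toy sketch. Everything in it -- the Node class, names like "carrier.deck", the dotted locator strings -- is hypothetical illustration, not anything specified by Nimrod: the router broadcasts the subtree's current locator prefix, each node derives its own locator from the advertisement, and a subtree move then costs one advertisement instead of thousands of "where am I?" queries.

    # Toy sketch of prefix advertisement for subtree moves (hypothetical names).
    class Node:
        def __init__(self, local_element):
            self.local_element = local_element  # this node's own locator element
            self.locator = None

        def hear_advertisement(self, prefix):
            # Recompute this node's locator from the advertised prefix.
            self.locator = prefix + "." + self.local_element

    nodes = [Node("host%d" % i) for i in range(1000)]

    def router_advertise(prefix):
        # One advertisement onto the local network reaches every node.
        for n in nodes:
            n.hear_advertisement(prefix)

    router_advertise("carrier.deck")    # the subtree as first attached
    router_advertise("shore2.pier")     # the whole subtree "moves"
    assert nodes[0].locator == "shore2.pier.host0"

The point of the sketch is only that the per-move cost is one broadcast, independent of the number of leaves in the subtree.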
Received: from PIZZA.BBN.COM by BBN.COM id aa24370; 19 Apr 94 22:40 EDT Received: from pizza by PIZZA.BBN.COM id aa27181; 19 Apr 94 22:28 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa27177; 19 Apr 94 22:25 EDT Received: from necom830.cc.titech.ac.jp by BBN.COM id aa20555; 19 Apr 94 22:23 EDT Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Wed, 20 Apr 94 11:17:18 +0900 From: Masataka Ohta Return-Path: Message-Id: <9404200217.AA05852@necom830.cc.titech.ac.jp> Subject: Re: Dave's questions To: kasten@ftp.com Date: Wed, 20 Apr 94 11:17:16 JST Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM In-Reply-To: <9404191322.AA03534@mailserv-D.ftp.com>; from "Frank Kastenholz" at Apr 19, 94 9:22 am X-Mailer: ELM [version 2.3 PL11] > > > Consider that things like mobile networks and mobile internetworks > > > are probably going to happen with IPng. I think that Nimrod must be > > > able to allow an entire network or internetwork to move. I > > > specifically see this as allowing an entire network/internetwork to > > > move, as opposed to just more than one host moving (whatever the > > > difference may be). > > > So? What's wrong with RIP or OSPF? > > The routing protocol to use is irrelevant. > > The real problem is: does the architecture allow only the leaves of the > tree to move around on the topology, or does it allow entire subtrees > to move around? Also, there is the re-locatoring problem -- Aha, I agree. Your issue is related to the fact that nimrod cannot support locator change. > if you > assume that movement is limited to the leaves of the tree, then it > makes sense to allow individual leaves to ask some service "what's my > new locator?". The more important question would be "what's someone else's locator?", I think. Even if there exists some protocol to ask it, it will kill the global network. So far, I don't think nimrod works in its proposed form. > Only if the routing tables have to hold entries for all 4G people. One of > the major goals of Nimrod is to allow aggregation of routes, reducing the > amount of routing information that has to be kept in the routing > tables at any one point in the network. In general, people move at random. Around Tokyo, more than 10M people move daily between their homes and offices, with very weak correlation between the locations of home and office. There are a lot of nationwide or international travellers who also travel randomly. Of course there is a backbone of transportation, but the correlation between source and destination is mostly random, and no meaningful aggregation is possible.
Masataka Ohta   Received: from PIZZA.BBN.COM by BBN.COM id aa03918; 20 Apr 94 9:14 EDT Received: from pizza by PIZZA.BBN.COM id aa29539; 20 Apr 94 9:00 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa29535; 20 Apr 94 8:58 EDT Received: from wd40.ftp.com by BBN.COM id aa02486; 20 Apr 94 8:54 EDT Received: from ftp.com by ftp.com ; Wed, 20 Apr 1994 08:54:56 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Wed, 20 Apr 1994 08:54:56 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA17562; Wed, 20 Apr 94 08:53:53 EDT Date: Wed, 20 Apr 94 08:53:53 EDT Message-Id: <9404201253.AA17562@mailserv-D.ftp.com> To: mohta@necom830.cc.titech.ac.jp Subject: Re: Dave's questions From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM Content-Length: 4436 > > The real problem is: does the architecture allow only the leaves of the > > tree to move around on the topology, or does it allow entire subtrees > > to move around? Also, there is the re-locatoring problem -- > > Aha, I agree. Your issue is related to the fact that nimrod cannot > support locator change. I assume by this you mean "Nimrod cannot support changing the locator of a node in the network graph"? True, in the Nimrod architecture document there is no specification of a protocol that does this. But the document specifies an architecture, not protocols. Section 5 of the document discusses renumbering. What is broken about it? > > if you > > assume that movement is limited to the leaves of the tree, then it > > makes sense to allow individual leaves to ask some service "what's my > > new locator?". > > The more important question would be "what's someone else's locator?", > I think. Even if there exists some protocol to ask it, it will kill the > global network. This is true. It is also irrelevant to my discussion. I was talking about what actions have to occur when a node(*) changes its location on the network graph. Specifically, I was dealing with the issue of how a node finds out its locator when it moves. For example, if I take my PC and move it to your network, how does my PC determine its new locator? However, my contention is that this problem is one aspect of a more general problem: how does a node(*) determine its locator under any circumstances? That is, when a node is first created it must determine its locator; when the locator for a node changes (say, I change providers, or a new level is added to the locator hierarchy) the node must learn of its new locator; and when a node moves it must learn its new locator. I believe that one mechanism can be used to solve all these problems. (*) By "node" I mean a node of the Nimrod graph. That is, a node could be a cluster which is composed of other clusters, hosts, routers, networks, etc. You are right in that there is also the problem of getting a 'remote' host's locator when I want to start communicating with that host. But this must happen even if hosts do not move on the network. For example, when "tri-flow.ftp.com" (my mail server) sends this mail message to "necom830.cc.titech.ac.jp" (your machine), it must establish a TCP connection to transfer the message (using SMTP). My machine must find the locator for your machine, even though neither machine is moving and neither machine is likely to move. > > Only if the routing tables have to hold entries for all 4G people.
One of > > the major goals of Nimrod is to allow aggregation of routes, reducing the > > amount of routing information that has to be kept in the routing > > tables at any one point in the network. > > In general, people move at random. Around Tokyo, more than 10M people > move daily between their homes and offices, with very weak correlation > between the locations of home and office. There are a lot of nationwide > or international travellers who also travel randomly. Of course > there is a backbone of transportation, but the correlation between > source and destination is mostly random, and no meaningful aggregation > is possible. The aggregation would occur at some level higher than the individual person. If both your home and your office get service from the same service provider, then aggregation could occur there. For example, if I had connectivity to my home I might get service from Nearnet. Nearnet also provides service to FTP Software. So, a machine at my home might have a locator like nsfnet.nearnet.franks_home.machine. If I move that machine to my office it would then get a locator like nsfnet.nearnet.ftp_software.machine. So the aggregation would occur within Nearnet. If my home service came from PSI instead, then my home locator would be nsfnet.psi.franks_home.machine. When I move the machine to the office, the aggregation would then be at the nsfnet level. In either case, this is invisible to you in Japan since (I assume) Japan's connectivity would be from the Japanese National Backbone to the US Backbone (nsfnet). So the Japanese backbone would need to keep a route to only the nsfnet. It would not need to keep track of routes to nsfnet.nearnet or nsfnet.psi. -- Frank Kastenholz FTP Software 2 High Street North Andover, Mass. USA 01845 (508)685-4000
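A rough sketch of the aggregation point above -- the table entries, link names, and the next_hop helper are all hypothetical illustration, nothing from the Nimrod documents. A routing table far from the destination holds only a high-level locator prefix and matches longest-prefix-first, so both of the example home locators collapse into the single nsfnet entry:

    # Hypothetical sketch: longest-prefix match over dotted locators.
    table = {
        "nsfnet": "link-to-us-backbone",   # one entry covers all of the US
        "jp-backbone": "local-link",       # assumed local entry
    }

    def next_hop(locator):
        parts = locator.split(".")
        # Try the longest matching prefix first, then successively shorter ones.
        for i in range(len(parts), 0, -1):
            prefix = ".".join(parts[:i])
            if prefix in table:
                return table[prefix]
        raise KeyError("no route to " + locator)

    # Both of the example home locators aggregate under "nsfnet":
    assert next_hop("nsfnet.nearnet.franks_home.machine") == "link-to-us-backbone"
    assert next_hop("nsfnet.psi.franks_home.machine") == "link-to-us-backbone"

Whether the machine is reached via nearnet or psi changes only locator elements below the prefix that distant routers actually store.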
Received: from PIZZA.BBN.COM by BBN.COM id aa24000; 22 Apr 94 16:22 EDT Received: from pizza by PIZZA.BBN.COM id aa19211; 22 Apr 94 16:10 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa19207; 22 Apr 94 16:06 EDT Received: from quern.epilogue.com by BBN.COM id aa22706; 22 Apr 94 16:02 EDT To: nimrod-wg@BBN.COM In-reply-to: Noel Chiappa's message of Mon, 18 Apr 94 11:06:21 -0400 <9404181506.AA27350@ginger.lcs.mit.edu> Subject: Dave's questions Date: Fri, 22 Apr 94 16:01:13 EDT From: dab@epilogue.com Sender: dab@epilogue.com Message-ID: <9404221601.aa03047@quern.epilogue.com> Date: Mon, 18 Apr 94 11:06:21 -0400 From: Noel Chiappa Trying to make moving a network as easy, and as important, as moving hosts is an ambitious goal that makes me nervous. This is a goal I'd be willing to punt on if we stumble too hard. But, because of things like what Frank pointed out, I think it's going to be an important requirement of the net in the future. Also, I guess I have this feeling that the design of nimrod is going to make it not quite as hard as we fear. Finally, I think that the way we're going, moving a host is going to be indistinguishable, to nimrod anyway, from moving a network. The design of locators in the architecture document had them growing top down and as far down as you wanted. In other words, the locators didn't have to stop at the interface; they could go inside the machine. Way cool, I've always wanted to do that. Strict bottom-up locators don't necessarily let you do that, but you were pushing for being able to grow both up and down (and I'd pretty much come to that conclusion with my own thinking too). So it looks like we're going to get locators that don't stop at the interface but go inside the machine. Now moving a host looks like moving an entire network. If nimrod can handle one it can handle the other. Dave   Received: from PIZZA.BBN.COM by BBN.COM id aa04263; 23 Apr 94 2:52 EDT Received: from pizza by PIZZA.BBN.COM id aa21848; 23 Apr 94 2:41 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa21844; 23 Apr 94 2:39 EDT Received: from necom830.cc.titech.ac.jp by BBN.COM id aa03991; 23 Apr 94 2:39 EDT Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Sat, 23 Apr 94 15:33:29 +0900 From: Masataka Ohta Return-Path: Message-Id: <9404230633.AA22971@necom830.cc.titech.ac.jp> Subject: Re: Dave's questions To: kasten@ftp.com Date: Sat, 23 Apr 94 15:33:28 JST Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM In-Reply-To: <9404201253.AA17562@mailserv-D.ftp.com>; from "Frank Kastenholz" at Apr 20, 94 8:53 am X-Mailer: ELM [version 2.3 PL11] > I assume by this you mean "Nimrod cannot support changing the locator > of a node in the network graph"? True, in the Nimrod architecture document > there is no specification of a protocol that does this. But the document > specifies an architecture, not protocols. Urrrrr....
> > > if you > > > assume that movement is limited to the leaves of the tree, then it > > > makes sense to allow individual leaves to ask some service "what's my > > > new locator?". Didn't you say "new locator" here? Doesn't it imply locator change? > This is true. It is also irrelevant to my discussion. I was talking > about what actions have to occur when a node(*) changes its location > on the network graph. It depends on what the protocol for locator change looks like. > The aggregation would occur at some level higher than the individual > person. If both your home and your office get service from the same > service provider, then aggregation could occur there. Your assumption is broken. My home will get service from the service providers most convenient to my home. My office will get service from the service providers most convenient to my office. > In either case, this is invisible to you in Japan since (I assume) I didn't say my movement around Tokyo affects something in the US. Masataka Ohta   Received: from PIZZA.BBN.COM by BBN.COM id aa18360; 25 Apr 94 10:43 EDT Received: from pizza by PIZZA.BBN.COM id aa01192; 25 Apr 94 10:27 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa01188; 25 Apr 94 10:24 EDT Received: from wd40.ftp.com by BBN.COM id aa16105; 25 Apr 94 10:09 EDT Received: from ftp.com by ftp.com ; Mon, 25 Apr 1994 10:09:30 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Mon, 25 Apr 1994 10:09:30 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA06713; Mon, 25 Apr 94 10:08:22 EDT Date: Mon, 25 Apr 94 10:08:22 EDT Message-Id: <9404251408.AA06713@mailserv-D.ftp.com> To: nimrod-wg@BBN.COM Subject: bottom-up or top-down From: Frank Kastenholz Reply-To: kasten@ftp.com Content-Length: 7249 On the flight back from Seattle I was tired and suffering from a cold, with the attendant earaches due to the pressurization/depressurization of the cabin. So what else would my brain turn to but Nimrod -- I guess that my sanity was the first casualty of the week. Specifically, I started pondering the discussion started by Dave Bridgham about whether the tree grows from the bottom up or from the top down. The problem that Dave, et al, had was that if the tree grows from the bottom up (that is, the leaves of the tree are assigned level number 0 and the root is assigned level N with N>0) then it must have the same height all over. This is needed so that, given a locator, you can tell where within the tree the locator 'belongs'. Assume that you had a hierarchy of 7 nodes, numbered 1-7, with arcs 1-2, 1-3, 2-4, 2-5, 3-6, and 3-7. Node 1 is the root node and nodes 4, 5, 6, and 7 are the leaf nodes. Further, assume that nodes 1, 2, 4, and 6 have been assigned locator-element "A" and nodes 3, 5, and 7 have element "B" -- that is, the mapping of full locators, starting at the root node, for the individual nodes would be:

    Node   Full Locator
      1    A
      2    A.A
      3    A.B
      4    A.A.A
      5    A.A.B
      6    A.B.A
      7    A.B.B

The graph would look something like:

                     +----+
                     | 1A |
                     +----+
              |-------|  |-----------|
           +----+                 +----+
           | 2A |                 | 3B |
           +----+                 +----+
          |---|  |----|          |--|  |---|
       +----+  +----+        +----+  +----+
       | 4A |  | 5B |        | 6A |  | 7B |
       +----+  +----+        +----+  +----+

(In this note, I'll always use numbers to uniquely name nodes, letters to be the locator elements, and arcs will be named by the two nodes that they connect.) So, suppose that you are at node #4 and have locator A.B -- does it refer to node #5 or to node #3? You might be tempted to say that all locators should be rooted at the root (? :-).
However, if the hierarchy can grow upward, you might not really know where the root is. You cannot root the locator at the leaves, since it would not be unambiguous (e.g., add another node to the above diagram, #8, with locator-element "B", and connect it to the graph via a link 5-8 -- if locators were rooted at the leaves, then the locator "B.B" would identify both nodes 8 and 3 -- oops). So, you have to include some concept of where in the tree the locator "belongs". Dave and others have been assuming that this information is a 'level number' -- in the above picture, the leaves would all be at level 1, the root at level 3. However, this seems to require that all leaf nodes be at the same level, i.e. level 1, which means that adding new levels in a local manner is impossible without some severe graph gyrations (e.g. MAP's notion of using fractions, or sparse number spaces and the like). Now, My Idea... Why not number the levels relative to the node which is generating the locator? For example, in the following locator hierarchy (numbers are the absolute node identifiers [a la EIDs], and letters are the elements of the locators):

                    1A
                   /  \
                  /    \
                2A      3B
               /  \    /  \
             4A   5B  6A   7B
            /  \    \        \
          8A   9B   10A      11A
         /  \                  \
      12A   13B                14A
                              /   \
                           15A    16B

The locator string A.B could refer to nodes 3, 5, 9, 13, or 16. However, if locators are 'qualified' by identifying how many levels up the tree one must go before finding the 'root' of the locator, A.B could then be uniquely used to identify something. For example, node 8 could refer to node 9 as 2.A.B (go up two levels to node 2, and then go to child-node A (node 4) and then child node B (9)). Node 9 could also be referred to as 1.B (up 1 to node 4 and then to child B), or 3.A.A.B. Note that if node 8 thinks that the 'global root' of the locator hierarchy is node 2 (e.g. we've added a new layer at the top but node 8 does not know it yet), then node 8 would not be able to communicate with nodes 1, 3, 6, 7, 11, 14, 15, or 16, because it would not be able to build a well-formed locator for those nodes (that is, node 8 believes that it can go up only two levels, not three, so it cannot build a locator rooted at node 1). If the graph is acyclic (as I've drawn it here), node-to-node forwarding gets real simple. Any given node needs to know only a few things -- which way is 'up' in the graph, and which way it needs to send packets to get to lower, contained, nodes. Obviously, when a packet is passed from one node to another (i.e. across an arc in the graph), things are a bit more complicated, since there might be several links connecting two nodes, each link having different levels/grades of service and so on -- but I do not feel that this is a major problem. More likely, the graph is going to have cycles in it -- people will want to have multiple service providers and so on:

                    1A
                   /  \
                  /    \
                2A      3B
               /  \    / ^ \
              /    \ 6A  |  7B
            4A    B5C<---+    \
           /  \      \         \
         8A   9B     10A       11A
        /  \                     \
     12A   13B                   14A
                                /   \
                             15A    16B

In this example, node 5 is connected to both nodes 2 and 3. Node 5 also has two locator elements assigned to it. From "within" node 2, it has locator element B, and from "within" node 3 it has locator element C; thus, from node 1 there are two, equally valid, locators for node 5: A.A.B and A.B.C (and, obviously, node 10 has locators A.A.B.A and A.B.C.A). This shows that locators really are bound to arcs and not nodes of the map (this was blatantly obvious to me when I drew this graph and, at the same time, observed that node 1 does not need a locator element -- it has no 'containing' node...).
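A small sketch of the relative-level scheme just described. The parent/children encoding and the resolve function are hypothetical illustration (only the fragment of the figure needed for the examples is encoded), not anything from the architecture document:

    # Sketch: resolving "levels-up qualified" locators such as 2.A.B.
    parent = {2: 1, 3: 1, 4: 2, 5: 2, 8: 4, 9: 4}
    children = {1: {"A": 2, "B": 3}, 2: {"A": 4, "B": 5}, 4: {"A": 8, "B": 9}}

    def resolve(start, locator):
        # A locator is "<levels-up>.<element>.<element>...", relative to start.
        parts = locator.split(".")
        node = start
        for _ in range(int(parts[0])):   # first climb the stated number of levels
            node = parent[node]
        for element in parts[1:]:        # then descend, one element per level
            node = children[node][element]
        return node

    # The three names for node 9 given above, all resolved from node 8:
    assert resolve(8, "2.A.B") == 9      # up to node 2, then A (4), then B (9)
    assert resolve(8, "1.B") == 9        # up to node 4, then B
    assert resolve(8, "3.A.A.B") == 9    # up to node 1, then A.A.B

Note that the same suffix resolved from a different start gives a different node -- resolve(1, "0.A.B") is node 5 -- which is exactly the ambiguity the levels-up qualifier pins down.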
===============================================
Now, there is a problem here. What does a host get when it tries to get the locator for some node? (I use the term 'node' here explicitly as a point in the locator graph, and 'host' as the computer that sits on my desk.) There is a mapping service that, given a host-name (probably an FQDN), will return a locator to reach that host (just as we get an IPv4 address out of the DNS today). If a host in node-8 tries to reach a host in node-9, what locator does it get back? There are many locators that refer to node 9. The actual locator will depend on where the asker is. Valid locators to reach node 9 from node 8 are 1.B, 2.A.B and 3.A.A.B. If node 16 is trying to reach node 9, the only valid locator would be 5.A.A.B. This I have not yet figured out. -- Frank Kastenholz FTP Software 2 High Street North Andover, Mass. USA 01845 (508)685-4000   Received: from PIZZA.BBN.COM by BBN.COM id aa18366; 25 Apr 94 10:43 EDT Received: from pizza by PIZZA.BBN.COM id aa01179; 25 Apr 94 10:27 EDT Received: from BBN.COM by PIZZA.BBN.COM id ab01168; 25 Apr 94 10:23 EDT Received: from wd40.ftp.com by BBN.COM id aa16097; 25 Apr 94 10:09 EDT Received: from ftp.com by ftp.com ; Mon, 25 Apr 1994 10:09:26 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Mon, 25 Apr 1994 10:09:26 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA06710; Mon, 25 Apr 94 10:08:18 EDT Date: Mon, 25 Apr 94 10:08:18 EDT Message-Id: <9404251408.AA06710@mailserv-D.ftp.com> To: nimrod-wg@BBN.COM Subject: Architecture Comments From: Frank Kastenholz Reply-To: kasten@ftp.com Content-Length: 14164 I've finally finished reading the architecture document that was mailed out on 15 March. I have a few comments. Some are merely editorial in nature, some are of a bit more substance. Some of these are rather general questions, which I only thought of asking when reading the section indicated.
1. Bullet 1 of section 1.1 (Constraints) says that the Internet will grow to O(10**9) networks. Current thinking is that the number may be as high as 10**12. I don't think that this will really be a problem for Nimrod, but I wanted to point it out.
2. Bullet 7 of 1.1 (constraints still) says that larger entities will move less often than smaller entities (my paraphrasing). While this may be true, I have a feeling that if a large entity (e.g. a network or collection of networks) will move at all, then it will move a lot. Think about networks in planes or cars or ships...
3. In section 1.2 (Basic Routing Functions), it says that Nimrod does distribution using link-state routing. I do understand the advantages of LS over DV routing. My concern is that this may be specification where specification is not required. It seems that a cluster should be free to route within itself in any manner that it wants. The only requirement is that it can truly provide the services/connections that it advertises outside of itself. Or am I missing something?
4. In section 1.3.1 (Clustering), second paragraph. It says that Nimrod does not specify a cluster formation algorithm. It seems to me that we may be sacrificing some simplicity here. I have a feeling (and it is only a feeling, so don't ask me to explain it any more than this) that, since clustering is an essential part of Nimrod, we have to have some description of how clusters are built. I just have this feeling of incompleteness here...
5. In section 1.3.1 (Clustering), third paragraph.
It says that "two branches can not be in the same cluster unless that cluster also contains the network connecting them". I read this as saying that the two PCs on my desk can not be in the same cluster unless the ethernet to which they are both connected is also in the cluster. Yes? What about using virtual links here?
6. In section 1.3.1 (Clustering), last paragraph. It says that clustering is distinct from the physical organization of the components -- cluster boundaries may not coincide with host, router, etc. boundaries. Does this generality add complexity? Would it be simpler to limit clusters to physical entity boundaries? What would we lose by making such a limit (other than generality)?
7. In section 1.4 (The Internet), bullet 1 does not take into account CIDR. Should it?
8. In section 1.4 (The Internet), bullet 3 says that there are 2**32 possible distinct IP addresses and 10**9 networks projected for the future. I think that this may be wrong.
9. Section 2.1, Endpoints. The sentence in the middle of the paragraph probably wants to read "For ease of management, EIDs might be hierarchically administered, but this is not required".
10. Section 2.2, Maps. The third paragraph says that a host has access to one or more route servers. Note that Nimrod must be able to work on an isolated, single ip-network (equivalent) piece of wire with only two PCs on it.
11. In section 2.3 the document talks about border points. What are border points? What use do they serve in the architecture (I don't recall seeing them used anyplace else)? Are border points nodes of the graph? If so, then are the arcs that connect to border points really connecting to border points within the border points? (feel free to recurse as much as desired here :-)
12. In section 2.3, under multi-point arcs. Suppose that I have a mesh network that supports some forms of service allocation or whatever at the datalink level, such as ATM. There are many nodes, all of which can communicate. However, some subset of the nodes is given additional service levels or whatever. How is this dealt with?
13. In 2.3.1, Internal Maps, second paragraph, it says that a "router can obtain more detailed maps ... recursively". At what point does this stop? Is there a logical 'ending' point, or can it be turtles all the way down?
14. In 2.3.1, Internal Maps, third graf, says "A transit map---not containing nodes---cannot be further expanded." This is an odd statement. First of all, a transit map contains some set of nodes and arcs. These nodes are the border-point nodes of the map, and the arcs represent the connectivity services offered by the map. Now, an observation on this: This map contains nodes. Each of these nodes (i.e. the border-point nodes) ought to be expandable (ignoring administrative constraint), showing how that border-point node 'connects' the various arcs which connect to it. Also, the border-point node, being a node, will have arcs connecting to it, and those arcs will have to connect to border-points within the border-point (section 2.3, graf 2). Feel free to recurse at this point as much as you'd like :-) Maybe something extraordinarily brilliant, sublime, and subtle is going on here and I just don't get it. Maybe not.
15. In section 2.4, Locators, graf 1. It says that a BTE is assigned only one locator.
I have always been under the rough impression that the locator hierarchy would closely follow the provider hierarchy, since network providers, backbones, and the like really define the actual topology with which we have to deal. If this impression is correct, the statement in the document says that even if I have two providers, I get only one locator. I do not think that limiting things to a single locator will be acceptable:
- there is the problem of multi-homed hosts (waving your hands and saying 'not allowed' is not allowed).
- when rearranging one's networks, one may wish to do things like assign multiple locators to hosts prior to doing the actual physical changes.
- for mobility, if a concept similar to the base station concept is in use, then the host that moved may be seen as having 2 locators (one for its true position and one for the position of the base station which will 'forward' the traffic).
- anything that gets relocatored should probably appear to be at both the old and the new locators for a while, allowing time for the old locators to drain out of the Internetwork without disrupting 'current' traffic too much.
- dual providers.
16. Section 2.4, Locators, graf 2. You might want to have the sentence that begins "Given that the nodes in a map..." start a new graf.
17. Section 3.1, last graf. The associated diagram shows x.net and y.net as providers and z.com as a service user, and then discusses what happens when z.com changes service providers (from x to y). The paragraph in question says "caching of locators must be done carefully". This brought up a thought. When z.com changes its provider from x.net to y.net, why not have x.net redirect the packets it receives which are destined for z.com to y.net? (Presumably the two (x.net and y.net) are connected via some higher-level entities such as a national backbone.) This may be more 'algorithm' and less 'architecture', but the possibility ought to be mentioned. (Aside: I've found reading the diagrams sometimes confusing, mostly because of the multiple use of the period (.). In some of the diagrams that I've drawn in ascii, I've found the following conventions very helpful:
- periods separate locator elements, only.
- if you want to have things like x.net as the 'name' of a blob in the picture, do it as x-net.
- I generally assign each node a unique id number in the picture, making it easier to show 'which' node the text is referring to.
- I always make locator elements uppercase, single letters and, out of habit, always start at A.
I've found that following rules like this helps make it easy to understand both the text and the diagram.)
18. Section 3.2, Multiple Locator Assignment. How does this fit with section 2.4, Locators, graf 1, where it says that a BTE is assigned only one locator?
19. Fig 2 on page 14. I do not understand how locators are assigned here. We went through this in Seattle so there's probably no need to go into it here.
20. Sect 4, Forwarding, Bullet 2 -- BTEC mode is very similar in general concept to IPv4's source routing, yes?
21. Sect 4, last graf, there are a couple of poorly worded sentences: "Given a map, a packet moves to the node in this map to which the the associated destination locator belongs to" Change the 'belongs to' into 'refers'. And "If the destination node has a ``detailed'' internal map, the destination locator should belong to..." Again, change the 'belong' into 'refer'.
22. Sect 4.4, BTE Chain Mode.
The last sentence of the first graf says "..a locator in the BTEC header could correspond to the type of service...[and not the physical path]" argh! Tilt! Does this say that there are these abstract things in the network called 'types of service' which have locations in the topology? I was sort of under the impression that if you wanted specific types of service, you'd need a flow? The next two grafs of this section seem to be overly complicated by the existence of multi-point arcs. If we get rid of multi-point arcs, aren't things made simpler?
23. Section 4.4, graf 6, it says "ii) routers would maintain, for each BTE, a pre setup flow which provides connectivity similar to that of the BTE." What does "for each BTE" mean? Is it each adjacent BTE? Each BTE in the Internet? Each BTE of which the router is aware? Some randomly selected set of BTEs?
24. Section 4.4, last two grafs. There is an implication here that internet-level headers will (may) change as the packet goes through the internet. This may have ramifications for security and integrity functions of the internet protocols (suppose that a crypto-checksum is run over packets by the source, which the destination uses to authenticate the packet?). Also, obviously the EIDs can not change. I would envision that the EIDs are used by the TCP as a part of the connection identification.
25. Section 4.5, Datagram Mode, graf 4. It says that a packet contains a pointer into the locators. Which locators? There are only two of them. The bullet should include the next paragraph, which describes the use of the pointer (I interpreted the bullet as saying that the pointer is into a chain of locators, much like the source-route pointer in IPv4).
26. Same section, graf 5. The sentence that begins "In addition to these extra fields..." is a non-sequitur with respect to the rest of the graf. Also, this sentence needs to be expanded. What are the 'critical places in the abstraction hierarchy'? How are they defined? How do routers find them?
27. Same section, graf 6. The end of this paragraph ("As an efficiency move...") implies that there is a nimrod packet encapsulation (which can include the flow-setup stuff and the actual datagram).
28. Same section, graf 7. This section seems to imply that routers have presetup flows all the way up the hierarchy (or can set them up on demand). Also, is it possible that the active router would have a shortcut to the destination? The example given discusses a packet going from A.P.Q.R to A.X.Y.Z. It says that a packet must go all the way up to A and then all the way back down. Suppose that a router at Q has a 'private backdoor' to a router at Y (for example, A.P.Q and A.X.Y are two branch offices of the same company, and the company has set up a private link to connect these offices together). Can the router at Q send the packet directly to the router at Y? This is an extremely important ability -- companies may very well wish to route internal traffic over internal links, links which they have control over, to avoid possible provider traffic-based charges, security issues, and so on.
29. Same section, in general. The notion of a border router seems to be introduced here. This appears to be a specific element of the architecture and should be called out (someplace) as such, with its specific qualities and attributes described.
30. Section 5, Renumbering, and its figure. Can the node which gets renumbered be any node of the network, with renumbering occurring recursively downward, until the last turtle is reached?
31.
Section 6.1.1, Effects of Mobility. Observation: If networks can move, and hosts can move, then there might be multiple, redirected hops via multiple home representatives. (I.e., if a network moves, then all packets directed to that network would first go to the network's 'old place' and then be redirected to the network's 'new place'. Now, if a host within that network moves, packets directed to the host are first sent to the network's 'old place', which redirects the packets to the network's 'new place'; the packets then enter the destination host's network, which sees that the destination host has moved and so will redirect the packets again, now to the host's 'new place'.)
32. Section 6.2.1, Approaches (to mobility). This whole section brings up a critical question. How are EIDs structured? Remember, they are here, in effect, defined to be database keys. They cannot be simple, random numbers. If they are, we end up with hosts.txt. There must be hierarchy in the EIDs so that the database which maps EIDs to locators can be distributed and delegated. Otherwise Nimrod does not work...
-- Frank Kastenholz FTP Software 2 High Street North Andover, Mass. USA 01845 (508)685-4000   Received: from PIZZA.BBN.COM by BBN.COM id aa26790; 25 Apr 94 13:02 EDT Received: from pizza by PIZZA.BBN.COM id aa02423; 25 Apr 94 12:41 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa02419; 25 Apr 94 12:39 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa25232; 25 Apr 94 12:34 EDT Received: by ginger.lcs.mit.edu id AA26154; Mon, 25 Apr 94 12:34:26 -0400 Date: Mon, 25 Apr 94 12:34:26 -0400 From: Noel Chiappa Message-Id: <9404251634.AA26154@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Dave's questions Cc: jnc@ginger.lcs.mit.edu From: dab@epilogue.com The design of locators in the architecture document had them growing top down and as far down as you wanted. In other words, the locators didn't have to stop at the interface; they could go inside the machine. Way cool, I've always wanted to do that. Strict bottom-up locators don't necessarily let you do that There are ways to do that with bottom-up; you just have to think of extending "sideways". E.g., if you're running a VM OS, the host becomes a router, and there is a virtual net inside the machine, and each virtual machine gets an interface to that virtual net; the virtual network can have a locator at the same level as the real network the host is attached to. The process continues to work if you have VM OS's running in the VM's. Of course, at a certain point, the area containing all those nets may get so large you want to split it, which brings me to my next point... but you were pushing for being able to grow both up and down Well, "pushing" is a bit too strong; I was contemplating the possibility. The main reason I came to this point was asking myself: suppose area X at level K gets too large, and wants to split; the obvious tack is to become two areas, X1 and X2, at level K. However, suppose this is not possible? Being able to introduce another sub-layer below K, and make X into two things at that level (so instead of K.X.mumble turning into K.X1.mumble, it turns into K.X.1.mumble), is one way out. However, I'm not sure the extra complexity of this is worth it; locators can become even more tricky. Maybe you just have to split into two K-level things. I think it'll become clearer which is the right way to go as more about locators in general, and especially how to find the binding context of a given locator, becomes clear.
E.g., if levels are numbered (both to help with finding binding contexts, and to allow non-unique labels), then if locators can grow in the middle the numbers have to be rationals; uggh. So it looks like we're going to get locators that don't stop at the interface but go inside the machine. Now moving a host looks like moving an entire network. If nimrod can handle one it can handle the other. Yes and no. If you move a single endpoint, you can use simple mechanisms to tell it (and everyone else) it has moved. If you move a group of entities, the mechanisms to notify them all are inevitably more complicated, as they have to scale, etc. Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa14812; 25 Apr 94 15:48 EDT Received: from pizza by PIZZA.BBN.COM id aa03761; 25 Apr 94 15:24 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa03754; 25 Apr 94 15:22 EDT Received: from wd40.ftp.com by BBN.COM id aa07379; 25 Apr 94 15:11 EDT Received: from ftp.com by ftp.com ; Mon, 25 Apr 1994 15:11:27 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Mon, 25 Apr 1994 15:11:27 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA12088; Mon, 25 Apr 94 15:10:20 EDT Date: Mon, 25 Apr 94 15:10:20 EDT Message-Id: <9404251910.AA12088@mailserv-D.ftp.com> To: daniel@catarina.usc.edu Subject: Re: bottom-up or top-down From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: nimrod-wg@BBN.COM Content-Length: 440 > >> On Mon, 25 Apr 94 10:08:22 EDT, Frank Kastenholz said: > > But what happens when you try to ... eeeeek! i sent that out? i didn't mean to! it had bugs (like the one you pointed out). never mind. delete it. expunge it. erase all traces of it from your memory. pretend it never happened. sorry to waste the bandwidth. -- Frank Kastenholz FTP Software 2 High Street North Andover, Mass. USA 01845 (508)685-4000   Received: from PIZZA.BBN.COM by BBN.COM id aa23480; 25 Apr 94 16:29 EDT Received: from pizza by PIZZA.BBN.COM id aa04055; 25 Apr 94 16:03 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa04051; 25 Apr 94 16:01 EDT Received: from usc.edu by BBN.COM id aa15238; 25 Apr 94 15:50 EDT Received: from laguna.usc.edu by usc.edu (4.1/SMI-3.0DEV3-USC+3.1) id AA05567; Mon, 25 Apr 94 12:03:42 PDT Received: by laguna.usc.edu (4.1/SMI-4.1+ucs-3.6) id AA02454; Mon, 25 Apr 94 12:10:15 PDT Date: Mon, 25 Apr 94 12:10:15 PDT From: "Daniel M. Alexander Zappala" Message-Id: <9404251910.AA02454@laguna.usc.edu> To: kasten@ftp.com Cc: nimrod-wg@BBN.COM In-Reply-To: <9404251408.AA06713@mailserv-D.ftp.com> (message from Frank Kastenholz on Mon, 25 Apr 94 10:08:22 EDT) Subject: Re: bottom-up or top-down Reply-To: daniel@catarina.usc.edu >> On Mon, 25 Apr 94 10:08:22 EDT, Frank Kastenholz said: > More likely, the graph is going to have cycles in it -- people > will want to have multiple service providers and so on:

>                     1A
>                    /  \
>                   /    \
>                 2A      3B
>                /  \    / ^ \
>               /    \ 6A  |  7B
>             4A    B5C<---+    \
>            /  \      \         \
>          8A   9B     10A       11A
>         /  \                     \
>      12A   13B                   14A
>                                 /   \
>                              15A    16B

> In this example, node 5 is connected to both nodes 2 and 3. Node 5 > also has two locator elements assigned to it. From "within" node 2, > it has locator element B, and from "within" node 3 it has locator > element C; thus, from node 1 there are two, equally valid, locators > for node 5: A.A.B and A.B.C (and, obviously, node 10 has locators > A.A.B.A and A.B.C.A). But what happens when you try to go up from a node that has several links in the "up" direction? Which path do you take?
Moreover, in your example it is not too confusing, but if node 5 has a link to node 4, then 1.A.A does not uniquely define a path from node 5. Daniel   Received: from PIZZA.BBN.COM by BBN.COM id aa26374; 26 Apr 94 14:24 EDT Received: from pizza by PIZZA.BBN.COM id aa11475; 26 Apr 94 14:06 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa11471; 26 Apr 94 14:03 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa24909; 26 Apr 94 13:59 EDT Received: by ginger.lcs.mit.edu id AA05986; Tue, 26 Apr 94 13:27:03 -0400 Date: Tue, 26 Apr 94 13:27:03 -0400 From: Noel Chiappa Message-Id: <9404261727.AA05986@ginger.lcs.mit.edu> To: int-serv@isi.edu, nimrod-wg@BBN.COM, rsvp@isi.edu, sdrp@catarina.usc.edu Subject: Do we need a 'flows' mailing list? Cc: big-internet@munnari.oz.au, jnc@ginger.lcs.mit.edu During a discussion with Craig, he suggested that I poll people to see if those of us who believe in flows should have a single mailing list for discussing generic flow stuff on. The current situation, with 4 (or more) groups working on flows, means that either i) generic topics cause people to get multiple copies or ii) some people miss useful stuff. For instance, some of the Nimrod people at BBN missed the discussion of multicast flows which happened on the RSVP list. Replies pro and con to me *only*, please (no 'reply-all' :-), and I'll send out a summary. (Volunteers to host same also accepted...) Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa23900; 28 Apr 94 14:56 EDT Received: from pizza by PIZZA.BBN.COM id aa26456; 28 Apr 94 14:40 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa26452; 28 Apr 94 14:35 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa22571; 28 Apr 94 14:34 EDT Received: by ginger.lcs.mit.edu id AA25086; Thu, 28 Apr 94 14:34:24 -0400 Date: Thu, 28 Apr 94 14:34:24 -0400 From: Noel Chiappa Message-Id: <9404281834.AA25086@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: IPng requirements document points.... Cc: jnc@ginger.lcs.mit.edu During a phone conversation with Martha, she had a number of comments on the requirements. I'll note them briefly here, and I'll shortly post a revised version of the requirements. - Instead of a "hop count", substitute a "looping packet detector". It is true that the hop count is per se not part of the forwarding architecture (except inasmuch as we do wish to see a "belt and suspenders" approach to packet looping), but any equivalent mechanism (such as a timestamp) will do. - The pointers into the locator and source route are really not, strictly speaking, necessary, since you should be able to tell what element to process next if things are working correctly. They are thus not necessary, but my engineer's sixth sense says they are well worth it, and will be a big win, on robustness grounds alone (let alone processing efficiency). Wording modified to indicate this. - The kind of source route I have been assuming we were going to have is what you might call "semi-strict", which is to say that the route does not have to name all the individual routers it traverses, but it does have to list topologically contiguous elements, albeit potentially high-level ones. The other possibility is classical "loose" source routing, in which only a few intermediate points through which the route has to pass are named. It has been an open question as to whether Nimrod would provide both, or only one, and this topic will be explored in more detail in a bit.
The wording was modified to indicate the needs of both modes, and to make clear that no choice had been made as to which (or both) to include.

- A header version number is always useful.

- Authentication of some sort is needed. See the recent IAB document from the IAB architecture retreat on security (draft-iab-sec-arch-workshop-00.txt), section 4, and especially section 4.3. There is currently no set way of doing "denial/theft of service" in Nimrod, but this topic is well explored in that document.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa02259; 28 Apr 94 16:41 EDT Received: from pizza by PIZZA.BBN.COM id aa27348; 28 Apr 94 16:25 EDT Received: from BBN.COM by PIZZA.BBN.COM id ab27341; 28 Apr 94 16:23 EDT Received: from ftp.com by BBN.COM id aa01087; 28 Apr 94 16:23 EDT Received: by ftp.com ; Thu, 28 Apr 1994 16:23:12 -0400 Received: by ftp.com ; Thu, 28 Apr 1994 16:23:12 -0400 Date: Thu, 28 Apr 1994 16:23:12 -0400 From: Jim DeMarco Message-Id: <9404282023.AA21551@ftp.com> To: jnc@ginger.lcs.mit.edu Cc: nimrod-wg@BBN.COM In-Reply-To: Noel Chiappa's message of Thu, 28 Apr 94 14:34:24 -0400 <9404281834.AA25086@ginger.lcs.mit.edu> Subject: IPng requirements document points.... Reply-To: jdemarco@ftp.com

> - Instead of a "hop count", substitute a "looping packet detector". It is
>true that the hop count is per se not part of the forwarding architecture
>(except inasmuch as we do wish to see a "belt and suspenders" approach to
>packet looping), but any equivalent mechanism (such as a timestamp) will do.

I believe many current trace-route programs make use of the "hop count" field, and they have proven themselves quite valuable. Are there alternative strategies available to perform the same function that don't require a hop count?

--Jim

Received: from PIZZA.BBN.COM by BBN.COM id aa03117; 28 Apr 94 16:56 EDT Received: from pizza by PIZZA.BBN.COM id aa27523; 28 Apr 94 16:40 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa27518; 28 Apr 94 16:38 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa02007; 28 Apr 94 16:36 EDT Received: by ginger.lcs.mit.edu id AA27140; Thu, 28 Apr 94 16:36:57 -0400 Date: Thu, 28 Apr 94 16:36:57 -0400 From: Noel Chiappa Message-Id: <9404282036.AA27140@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: IPng requirements document points.... Cc: jnc@ginger.lcs.mit.edu

    I believe many current trace-route programs make use of the "hop count"
    field, and they have proven themselves quite valuable.

Good point.

    Are there alternative strategies available to perform the same function
    that don't require a hop count?

Well, for anything source routed, you wouldn't need it, of course. For datagram mode, if we don't use hop count, an explicit inquiry tool could be done. One was almost done for ICMP (you send router X a packet, asking for the next hop it would use for destination Y), before someone figured out the hop-count hack with trace-route. I don't think it's a major issue.

Noel
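The inquiry tool Noel describes would walk a path hop by hop by asking each router for its next hop toward the destination. A toy rendering (the query is simulated with a table lookup; no such ICMP message actually exists, and the names are invented) might be:

    # each "router" answers: what is your next hop toward dest?
    fibs = {
        'R1': {'H': 'R2'},
        'R2': {'H': 'R3'},
        'R3': {'H': 'H'},          # destination is directly attached
    }

    def trace(first_hop, dest, max_hops=32):
        """Collect the path by explicit next-hop queries; a repeated
        router is a directly visible forwarding loop."""
        path, hop = [], first_hop
        while hop != dest and len(path) < max_hops:
            if hop in path:
                return path + [hop], 'loop detected'
            path.append(hop)
            hop = fibs[hop][dest]  # stands in for the per-router inquiry
        return path + [dest], 'ok'

    print(trace('R1', 'H'))        # (['R1', 'R2', 'R3', 'H'], 'ok')

Unlike the TTL hack, this needs no hop count in the header; the cost is one round trip per hop and a new query message type.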
Received: from PIZZA.BBN.COM by BBN.COM id aa03784; 28 Apr 94 17:10 EDT Received: from pizza by PIZZA.BBN.COM id aa27692; 28 Apr 94 16:56 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa27688; 28 Apr 94 16:55 EDT Received: from ftp.com by BBN.COM id aa03007; 28 Apr 94 16:54 EDT Received: from ftp.com by ftp.com ; Thu, 28 Apr 1994 16:54:39 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Thu, 28 Apr 1994 16:54:39 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA25578; Thu, 28 Apr 94 16:53:25 EDT Date: Thu, 28 Apr 94 16:53:25 EDT Message-Id: <9404282053.AA25578@mailserv-D.ftp.com> To: jnc@ginger.lcs.mit.edu Subject: Re: IPng requirements document points.... From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: nimrod-wg@BBN.COM, jnc@ginger.lcs.mit.edu Content-Length: 829

> I believe many current trace-route programs make use of the "hop
> count" field, and they have proven themselves quite valuable.
>
>Good point.
>
> Are there alternative strategies available to perform the same function
> that don't require a hop count?
>
>Well, for anything source routed, you wouldn't need it, of course. For
>datagram mode, if we don't use hop count, an explicit inquiry tool could be
>done. One was almost done for ICMP (you send router X a packet, asking for the
>next hop it would use for destination Y), before someone figured out the hop-
>count hack with trace-route. I don't think it's a major issue.

and, of course, nimrod works by distributing maps. just go get the map and take a look :-)

--
Frank Kastenholz
FTP Software
2 High Street
North Andover, Mass. USA 01845
(508)685-4000

Received: from PIZZA.BBN.COM by BBN.COM id aa03963; 28 Apr 94 17:14 EDT Received: from pizza by PIZZA.BBN.COM id aa27673; 28 Apr 94 16:54 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa27669; 28 Apr 94 16:53 EDT Received: from usc.edu by BBN.COM id aa02859; 28 Apr 94 16:52 EDT Received: from catarina.usc.edu by usc.edu (4.1/SMI-3.0DEV3-USC+3.1) id AA09335; Thu, 28 Apr 94 13:52:13 PDT Received: from catarina.usc.edu by catarina.usc.edu (4.1/SMI-4.1+ucs-3.6) id AA24672; Thu, 28 Apr 94 13:55:20 PDT Message-Id: <9404282055.AA24672@catarina.usc.edu> From: kannan@catarina.usc.edu To: nimrod-wg@BBN.COM Subject: Re: IPng requirements document points.... In-Reply-To: Your message of Thu, 28 Apr 1994 16:36:57 -0400.<9404282036.AA27140@ginger.lcs.mit.edu> Date: Thu, 28 Apr 1994 13:55:20 -0700 Sender: kannan@catarina.usc.edu

> I believe many current trace-route programs make use of the "hop
> count" field, and they have proven themselves quite valuable.

Actually, for most purposes, SNMP should be sufficient in returning the next-hop(s), in a better way, especially when multiple equal cost next hops etc. exist. The only functional difference might be that traceroute works through the forwarding code, while SNMP would only look up a FIB, but the difference to a human being might be insignificant.

> Are there alternative strategies available to perform the same function
> that don't require a hop count?
>
> Well, for anything source routed, you wouldn't need it, of course. For

Noel: Are you asserting that when using source-routed paths, I would not need traceroutes? Why would that be so?
Kannan

Received: from PIZZA.BBN.COM by BBN.COM id aa05800; 28 Apr 94 17:48 EDT Received: from pizza by PIZZA.BBN.COM id aa28076; 28 Apr 94 17:36 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa28072; 28 Apr 94 17:34 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa04987; 28 Apr 94 17:32 EDT Received: by ginger.lcs.mit.edu id AA27802; Thu, 28 Apr 94 17:32:40 -0400 Date: Thu, 28 Apr 94 17:32:40 -0400 From: Noel Chiappa Message-Id: <9404282132.AA27802@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: IPng requirements document points.... Cc: jncjnc@ginger.lcs.mit.edu

    and, of course, nimrod works by distributing maps. just go get the map
    and take a look :-)

Oh, right, yeah; forgot about that! :-)

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa06586; 28 Apr 94 18:08 EDT Received: from pizza by PIZZA.BBN.COM id aa28140; 28 Apr 94 17:43 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa28136; 28 Apr 94 17:41 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa05434; 28 Apr 94 17:40 EDT Received: by ginger.lcs.mit.edu id AA27884; Thu, 28 Apr 94 17:40:36 -0400 Date: Thu, 28 Apr 94 17:40:36 -0400 From: Noel Chiappa Message-Id: <9404282140.AA27884@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: IPng requirements document points.... Cc: jnc@ginger.lcs.mit.edu

    Actually, for most purposes, SNMP should be sufficient in returning the
    next-hop(s)

Well, maybe not. That requires a publicly available SNMP session, something that sites may not want to do. However, specific, limited, ICMP network maintenance things (e.g. Ping) stand a much better chance.

    especially when multiple equal cost next hops etc. exist.

You can fix that by allowing multiple next hops in the ICMP message (something it's hard to do with the hop-count hack). Also, to the extent that the routing decision is taking as input more than just the destination (e.g. TOS), whatever mechanism is doing the lookup should provide all the same info.

    Are you asserting that when using source-routed paths, I would not need
    traceroutes? Why would that be so?

The thinking goes that you *already* know the path. However, there are a few issues. First, we need to have this discussion about the differences between "semi-strict" and "loose" source routing, as alluded to in the notes of Martha's comments. In a loose source routed system, you'd clearly need it. However, even in semi-strict, it might be nice, since if the semi-strict source route was given in terms of high level entities, you might want to see what physical assets it got translated into.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa12213; 29 Apr 94 2:03 EDT Received: from pizza by PIZZA.BBN.COM id aa00886; 29 Apr 94 1:51 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa00882; 29 Apr 94 1:50 EDT Received: from fennel.acc.com by BBN.COM id aa11002; 29 Apr 94 1:50 EDT Received: from by fennel.acc.com (4.1/SMI-4.1) id AB11373; Thu, 28 Apr 94 22:49:30 PDT Message-Id: <9404290549.AB11373@fennel.acc.com> X-Sender: fbaker@fennel.acc.com Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Date: Thu, 28 Apr 1994 22:49:37 -0800 To: kannan@catarina.usc.edu, nimrod-wg@BBN.COM From: Fred Baker Subject: Re: IPng requirements document points....

At 1:55 PM 4/28/94 -0700, kannan@catarina.usc.edu wrote:
>> I believe many current trace-route programs make use of the "hop
>> count" field, and they have proven themselves quite valuable.
>
>Actually, for most purposes, SNMP should be sufficient in returning the
>next-hop(s), in a better way, especially when multiple equal cost next
>hops etc. exist. The only functional difference might be that
>traceroute works through the forwarding code, while SNMP would only look
>up a FIB, but the difference to a human being might be insignificant.

the thing that may be at issue here is that all IP routers send ICMP TIME EXCEEDED, but not all implement "public" as the common community - consider SNMPV2, you may not have the appropriate objects in the general MIB view.

Consider also: if my trace is towards ftp.acc.com, the forwarding code makes the analysis nicely. Out in the net, it will route towards 129.192. At the egress point from CERFNET, it comes to a router which is running OSPF, and routes towards 129.192.64. At SB-8230.acc.com, an ARP table lookup is done, and the packet forwarded. In SNMP, you cannot ask for "whatever arp or route entry most directly relates to 129.192.64.25," you must GET-NEXT through the route entries around the value, and if you find a DIRECT route, check for an ipAddrEntry OR a corresponding ipNetToMediaEntry. If the device exists but no ARP entry is cached, you will be unable to translate the last hop until you first ping the target. This is getting to be a fairly sophisticated application...

>> Are there alternative strategies available to perform the same function
>> that don't require a hop count?
>>
>> Well, for anything source routed, you wouldn't need it, of course. For
>
>Noel: Are you asserting that when using source-routed paths, I would
>not need traceroutes? Why would that be so?

The simple answer is that you already KNOW the route, by virtue of the source-routed path. What you don't necessarily know is: how to calculate the path (traceroute might be useful for this, but then again maybe you want something more sophisticated for flow setup). If the path breaks, you may have difficulty figuring out where it broke. If the path is "loosely" source routed - the path gets you from one AS to the next without telling you how to transit the AS's - you are still lacking a fair bit of information.

_______________________________________________________________________
"There's nothing like hay when you're thirsty!" The White King...

Received: from PIZZA.BBN.COM by BBN.COM id aa22992; 2 May 94 19:02 EDT Received: from pizza by PIZZA.BBN.COM id aa20097; 2 May 94 18:49 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa20089; 2 May 94 18:46 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa22320; 2 May 94 18:44 EDT Received: by ginger.lcs.mit.edu id AA01683; Mon, 2 May 94 18:44:39 -0400 Date: Mon, 2 May 94 18:44:39 -0400 From: Noel Chiappa Message-Id: <9405022244.AA01683@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: New Nimrod document repository... Cc: jnc@ginger.lcs.mit.edu

A new document repository has been set up (and a "thanks" to Frank Kastenholz and FTP Software for the help), at research.ftp.com, in the directory pub/nimrod. It contains the IPng requirements draft, the architecture document, the old JNC I-D, and some other stuff (such as the current jargon list, the list of open architectural points, etc).
Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa23121; 2 May 94 19:06 EDT Received: from pizza by PIZZA.BBN.COM id aa20155; 2 May 94 18:54 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa20151; 2 May 94 18:53 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa22608; 2 May 94 18:52 EDT Received: by ginger.lcs.mit.edu id AA01725; Mon, 2 May 94 18:52:44 -0400 Date: Mon, 2 May 94 18:52:44 -0400 From: Noel Chiappa Message-Id: <9405022252.AA01725@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: New version of IPng requirements Cc: jnc@ginger.lcs.mit.edu

A new draft version of the "Nimrod IPng requirements" document has been prepared. It is available from the Nimrod document repository, on research.ftp.com, in pub/nimrod/ipng_req.txt, as plain ASCII text.

This version has an entire added section which discusses in more general terms the interaction of the routing subsystem with the other subsystems of the internetwork layer. It includes an analysis of the internetwork layer as a collection of subsystems (of which the group of stuff called Nimrod comprises three subsystems), and material on state, flows, and flow setup. It also includes comments from a variety of sources, as discussed on the mailing list.

I need to get comments fairly soon, as I guess we have to have this in by May 10 or so. So, let's say by noon Eastern time next Monday (the 9th).

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa18163; 4 May 94 11:51 EDT Received: from pizza by PIZZA.BBN.COM id aa01685; 4 May 94 11:30 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa01681; 4 May 94 11:26 EDT Received: from research.ftp.com by BBN.COM id aa16616; 4 May 94 11:24 EDT Received: by Research.Ftp.Com (920330.SGI/) for nimrod-wg@bbn.com id AA02549; Wed, 4 May 94 11:20:31 -0400 Received: by Research.Ftp.Com id AA02549; Wed, 4 May 94 11:20:31 -0400 Date: Wed, 4 May 1994 11:20:31 -0400 (EDT) From: Frank Kastenholz Subject: Comments on "Nimrod and IPng Technical Requirements" To: nimrod-wg@BBN.COM Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

> - A looping packet detector. This is any mechanism that will detect a packet
> which is "stuck" in the network; a timeout value in packets, together
> with a check in routers, is an example. If this is a hop-count, it has
> to be more than 8 bits; I would strongly recommend at least 12, and
> recommend 16 (to make it easy to update). This is not to say that I think
> networks with diameters larger than 256 are good, or that we should design
> such nets, but I think limiting the maximum path through the network to
> 256 hops is likely to bite us down the road the same way making
> "infinity" 16 in RIP did (as it did, eventually). When we hit that
> ceiling, it's going to hurt, and there won't be an easy fix. I will
> note in passing that we are already seeing path lengths of over 30 hops.

Is this really needed? Nimrod works by map distribution. So, looking at the map, I can see if there is a loop or not and deal with it. If the map does not show a loop, but there is a loop (i.e. the map is wrong) then I would imagine that there will be enough other problems that killing loopy packets is the least of our worries.

> (e.g. up, down, across, etc).
> Since those identifiers themselves are
> variable length (although probably most will be two bytes or less,
> otherwise the routing overhead inside the named object would be
> excessive), and the hop count above contemplates the possibility of
> paths of over 256 hops, it would seem that these might possibly some
> day exceed 512 bytes, if a lengthy path was specified in terms of the
> actual physical assets used.

In general, I'd suggest that we ask for 'large' length fields, and allow administrative limits to be placed on the max values. Then, as the need arises, we can let the administrative limit grow. This also applies to the hop-count field -- use a big field, but put a 'smallish' admin limit on it.

>3.1.2 The Subsystems of the Internetwork Layer
> The subsystems which are covered by Nimrod are i) routing
>information distribution (in the case of Nimrod, topology map distribution),

Does this include distributing the attributes of the topology? That is, things like link access policies, and the like.

>3.2.2 Flows
>
> A flow, from the user's point of view, is a.....

So, a flow is a defined path through the internetwork which has certain attributes (other than a simple source-to-destination 'connection') associated with it, and these attributes may have been explicitly used in determining, creating, or selecting, the particular path. Yes?

> The packets which belong to a flow could be identified by a tag
>consisting of a number of fields (such as addresses, ports, etc), as opposed
>to a specialized field.

Well,
1. you need more than just the source and destination address/eid/whatever to uniquely id a flow (i.e. there could be >1 flows between machines X and Y, each with its own traits).
2. not all protocols carried over IP have source/dest ports.
3. source/dest port pair might change with each transaction (e.g. SNMP over UDP).

So, the conclusion is that you need more than just the "IP addresses" but you can not rely on any specific information carried in the IP payload; so, therefore, you need an explicit field in the IP header which identifies the flow.

Frank

Received: from PIZZA.BBN.COM by BBN.COM id aa28435; 4 May 94 14:24 EDT Received: from pizza by PIZZA.BBN.COM id aa02822; 4 May 94 14:05 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa02818; 4 May 94 14:03 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa27079; 4 May 94 14:03 EDT Received: by ginger.lcs.mit.edu id AA12882; Wed, 4 May 94 14:03:06 -0400 Date: Wed, 4 May 94 14:03:06 -0400 From: Noel Chiappa Message-Id: <9405041803.AA12882@ginger.lcs.mit.edu> To: kasten@research.ftp.com, nimrod-wg@BBN.COM Subject: Re: Comments on "Nimrod and IPng Technical Requirements" Cc: jnc@ginger.lcs.mit.edu

> - A looping packet detector. This is any mechanism that will detect a
> packet which is "stuck" in the network

    Is this really needed? Nimrod works by map distribution. So, looking at
    the map, I can see if there is a loop or not and deal with it. If the
    map does not show a loop, but there is a loop (i.e. the map is wrong)
    then I would imagine that there will be enough other problems that
    killing loopy packets is the least of our worries.

We thought about this for a while, some time back. (Look in the mailing list archive around 10 Jan 94.) I know that "theoretically" Nimrod should not display routing loops. I'm very suspicious of such statements! Real life seems to produce circumstances which "shouldn't" happen, and we've paid painful prices all along (the 'Titanic', etc) for designs that assumed that.
If nothing else, implementation errors (which do happen) could cause loops. As I said at the time:

    My reasoning is that preventing looping data traffic is very desirable,
    since the side-effects are pretty bad. ... I guess what it really comes
    down to is "how common are loops going to be, and how much will they
    cost if we don't have a [detection mechanism]", versus "how much is the
    [detection mechanism] going to cost us".

Given all this, having a separate mechanism to detect and kill looping packets seems wise. 'Robustness' is a hard thing to quantify, but two completely independent mechanisms to do the same thing seems like it's really robust.

> it would seem that these might possibly some day exceed 512 bytes, if a
> lengthy path was specified in terms of the actual physical assets used.

    In general, I'd suggest that we ask for 'large' length fields, and
    allow administrative limits to be placed on the max values. Then, as
    the need arises, we can let the administrative limit grow.

Good point; I'll make a note to this effect in the introductory section on fields (2.1). I have also gone through and redone the suggested lengths for each field in terms of the new terminology defined in 2.1; if people could please check these lengths, and see what they think of them....

> The subsystems which ... are covered by Nimrod are i) routing
> information distribution (... topology map distribution),

    Does this include distributing the attributes of the topology? That is,
    things like link access policies, and the like.

Yes, I'll note it.

> A flow, from the user's point of view, is a.....

    So, a flow is a defined path through the internetwork which has certain
    attributes (other than a simple source-to-destination 'connection')
    associated with it, and these attributes may have been explicitly used
    in determining, creating, or selecting, the particular path. Yes?

Yes, except that I'd say the "path through the internetwork" is one of the attributes of the flow.

Actually, I guess we need to distinguish between i) a flow as a sequence of packets, and ii) a flow, as the thing which is set up in the routers; they are subtly different! I made a note to this effect in 3.2.3. Do you think we need separate terms (and if so, any suggestions), or do you think it will always be obvious from the context?

> The packets which belong to a flow could be identified by a tag
> consisting of a number of fields (such as addresses, ports, etc), as
> opposed to a specialized field.

    Well ... the conclusion is that you need more than just the "IP
    addresses" but you can not rely on any specific information carried in
    the IP payload; so, therefore, you need an explicit field in the IP
    header which identifies the flow.

I thought I said more or less that:

    Given that you can always find situations where the existing fields
    alone don't do the job, and you *still* need a separate field to do the
    job correctly

Do I need an example, or more of an argument? Remember, this document isn't "Why the internetwork layer needs flows" but "Nimrod IPng tech Rqmts"!
Yes, I do have to explain some stuff, but I can't put *everything* in here (like "Why the internetwork layer needs EID's" :-)

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa29457; 4 May 94 14:39 EDT Received: from pizza by PIZZA.BBN.COM id aa02911; 4 May 94 14:18 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa02907; 4 May 94 14:16 EDT Received: from research.ftp.com by BBN.COM id aa27768; 4 May 94 14:14 EDT Received: by Research.Ftp.Com (920330.SGI/) for nimrod-wg@bbn.com id AA02717; Wed, 4 May 94 14:10:13 -0400 Received: by Research.Ftp.Com id AA02717; Wed, 4 May 94 14:10:13 -0400 Date: Wed, 4 May 1994 14:10:13 -0400 (EDT) From: Frank Kastenholz Subject: Re: Comments on "Nimrod and IPng Technical Requirements" To: Noel Chiappa Cc: nimrod-wg@BBN.COM In-Reply-To: <9405041803.AA12882@ginger.lcs.mit.edu> Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

On Wed, 4 May 1994, Noel Chiappa wrote:

> > A flow, from the user's point of view, is a.....
>
> So, a flow is a defined path through the internetwork which has certain
> attributes (other than a simple source-to-destination 'connection')
> associated with it, and these attributes may have been explicitly used in
> determining, creating, or selecting, the particular path. Yes?
>
> Yes, except that I'd say the "path through the internetwork" is one of the
> attributes of the flow.

I think that this is you saying "tomahtoe" and me saying "tomaytoe"...

> Actually, I guess we need to distinguish between i) a flow as a sequence of
> packets,

Always refer to this as a sequence of packets, or packet sequence or packet flow...

> and ii) a flow, as the thing which is set up in the routers;

And this is just a "flow." Since you describe this flow as "a thing..." it sounds like it needs its own term more than use 'i)' does.

> Do I need an example, or more of an argument?

Neither. I just arrived at the same conclusion via a different path (although, perhaps, the conclusion I came to more strongly requires a specific flow id field in the internetwork header). It tends to validate the conclusion, and it can be filed away and then used if/when required.

Frank

Received: from PIZZA.BBN.COM by BBN.COM id aa17617; 4 May 94 20:32 EDT Received: from pizza by PIZZA.BBN.COM id aa05080; 4 May 94 20:22 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa05076; 4 May 94 20:20 EDT Received: from Princeton.EDU by BBN.COM id aa17189; 4 May 94 20:18 EDT Received: from ponyexpress.Princeton.EDU by Princeton.EDU (5.65b/2.110/princeton) id AA24439; Wed, 4 May 94 20:10:57 -0400 Received: from clytemnestra.Princeton.EDU by ponyexpress.princeton.edu (5.65c/1.113/newPE) id AA21777; Wed, 4 May 1994 20:10:55 -0400 Received: by clytemnestra.Princeton.EDU (4.1/Phoenix_Cluster_Client) id AA12308; Wed, 4 May 94 20:10:52 EDT Message-Id: <9405050010.AA12308@clytemnestra.Princeton.EDU> To: nimrod-wg@BBN.COM Subject: Re: Comments on "Nimrod and IPng Technical Requirements" In-Reply-To: Your message of "Wed, 04 May 1994 14:03:06 EDT." <9405041803.AA12882@ginger.lcs.mit.edu> X-Mailer: exmh version 1.3 4/7/94 Date: Wed, 04 May 1994 20:10:51 EDT From: John Wagner

> > Actually, I guess we need to distinguish between i) a flow as a sequence of
> packets, and ii) a flow, as the thing which is set up in the routers; they are
> subtly different! I made a note to this effect in 3.2.3. Do you think we need
> separate terms (and if so, any suggestions), or do you think it will always be
> obvious from the context?
I think the only thing obvious is that many will misread the meaning no matter what the context. I think these two are not subtly but drastically different. To pull analogies from other fields:

The stream bed (ii) is not the same as the water (i) flowing through it. The storm drain (ii) is not the same as the rain water (i) flowing through it. The wire (ii) is not the same as the electrons (i) flowing through it.

I think all of these are good analogies for what a Nimrod flow is *after the flow setup has been done*. Flow setup builds the pipe network. The flow occurs after the pipes are put together. Using "flow" to represent both the pipes and their internal contents sure seems to lead to problems communicating.

To use another analogy (excuse me while I date myself): back at the '69 World's Fair in New York the GM exhibit showed a nice streamlined car of the future. What was the big selling point? You'd get into the car in front of your house, drive it to the nearest big road *and take your hands off the steering wheel* because you would be able to tell something in your car where you wanted to go and it would worry about the routing (sound familiar?). The goodness of this analogy is that it leads to use of words we want people to think about; road maps, routes, interconnected roads (meshes), etc.

Instead of flows (ii) why not "pre-selected routes", "dynamically defined routes", "network optimized routes", ...? None of these strike me as the right replacement but there has to be something better than "flow".

John Wagner

Received: from PIZZA.BBN.COM by BBN.COM id aa19020; 4 May 94 21:29 EDT Received: from pizza by PIZZA.BBN.COM id aa05325; 4 May 94 21:15 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa05321; 4 May 94 21:14 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa18628; 4 May 94 21:14 EDT Received: by ginger.lcs.mit.edu id AA15703; Wed, 4 May 94 21:07:11 -0400 Date: Wed, 4 May 94 21:07:11 -0400 From: Noel Chiappa Message-Id: <9405050107.AA15703@ginger.lcs.mit.edu> To: big-internet@munnari.oz.au, int-serv@isi.edu, nimrod-wg@BBN.COM, rsvp@isi.edu Subject: "Flows" mailing list. Cc: jnc@ginger.lcs.mit.edu

So, I received a total of 28 replies (including myself) about the question of whether or not we should have a separate flows mailing list. The count was: 5 No, 3 Maybe (as in "I don't know if we need this list, but if you create it add me"), 20 Yes. Since I think that this constitutes rough consensus, the list has been set up.

The list itself is "flows@research.ftp.com", and there's the usual "flows-request@research.ftp.com" for *ALL* requests to be added or deleted (but see below). The kind of things the list should deal with are questions like:

- Do we have a single mechanism across all subsystems (routing, resource allocation, etc) to name the packets which are part of a flow?
- If so, what is it?
- Do we have a single mechanism across all subsystems to install flow state in the routers?
- If so, what is it?
- How do we do multicast flow setup/maintenance, especially for large multicast groups?

All who voted "Yes" or "Maybe" have been added. Everyone on the Nimrod-WG mailing list has also been added; I did that since the "flow" subsystem of the Nimrod group of subsystems will probably be mostly discussed there. If anyone on the Nimrod WG mailing list didn't want on, my apologies in advance, but I thought that would probably also save having most of you write in to say "please add me".
The archives are available for anonymous ftp from research.ftp.com in the directory pub/flow/Archives/ (note the uppercase A!). The 'current' archive file is named 'archive'. Back archives are available as archive-ddMmmyy or archive-ddMmmyy.Z, where ddMmmyy is the date that the archive file was saved. For example, if there are two files, archive-01Mar94.Z and archive-03Mar94.Z, archive-03Mar94.Z will have the traffic from shortly before midnight on 01Mar94 up to shortly before midnight on 03Mar94.

Thanks to Frank Kastenholz for setting this up, and FTP for hosting.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa19927; 5 May 94 10:58 EDT Received: from pizza by PIZZA.BBN.COM id aa08461; 5 May 94 10:39 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa08457; 5 May 94 10:34 EDT Received: from inet-gw-2.pa.dec.com by BBN.COM id aa18376; 5 May 94 10:32 EDT Received: from nacto1.nacto.lkg.dec.com by inet-gw-2.pa.dec.com (5.65/21Mar94) id AA04319; Thu, 5 May 94 07:24:50 -0700 Received: from sneezy.nacto.lkg.dec.com by nacto1.nacto.lkg.dec.com with SMTP id AA04300; Thu, 5 May 1994 10:24:58 -0400 Received: by sneezy.nacto.lkg.dec.com (5.65/4.7) id AA04476; Thu, 5 May 1994 10:25:01 -0400 To: nimrod-wg@BBN.COM Subject: Re: Comments on "Nimrod and IPng Technical Requirements" In-Reply-To: References: X-Mailer: Poste 2.1 From: David R Oran Date: Thu, 5 May 94 10:24:59 -0400 Message-Id: <940505102459.474@sneezy.nacto.lkg.dec.com.thomas> Encoding: 30 TEXT, 6 TEXT SIGNATURE

> Is this really needed? Nimrod works by map distribution. So, looking at
> the map, I can see if there is a loop or not and deal with it. If the
> map does not show a loop, but there is a loop (i.e. the map is wrong)
> then I would imagine that there will be enough other problems that killing
> loopy packets is the least of our worries.
>

If map distribution is done by lower-level flooding, then you might get by without an explicit looping-packet detector (we used to call this "super-macho routing", since if it ever fails, it fails spectacularly). If map distribution is handled on top of the normal IPng forwarding mechanisms, then you can have a serious problem if a misbehaving router ever starts spraying these things around erroneously.

> In general, I'd suggest that we ask for 'large' length fields, and allow
> administrative limits be placed on the max values. Then, as the need arises,
> we can let the administrative limit grow. This also applies to the hop-count
> field -- use a big field, but put a 'smallish' admin limit on it.
>

The problem with punting this to the administrative domain is the difficulty of gracefully changing the value, which might be quite involved and error prone. Based on my experience in setting the count-up limit on DECnet Phase IV networks, and doing fault diagnosis where premature dropping of packets is one of many possible symptoms of non-transitive communication, I would be reluctant to endorse something which does not reasonably auto-configure based on a (conservative) assessment of the actual network diameter. I think Noel was onto the right track when he suggested a procedure similar to MTU or Router Discovery to get the diameter estimate to the hosts.

-+-+-+-+-+-+-+
David R. Oran
Phone: + 1 508 486-7377
Fax: + 1 508 486-5279
Email: oran@lkg.dec.com
Digital Equipment Corporation, LKG 1-2/A19
550 King Street, Littleton, MA 01460

Received: from PIZZA.BBN.COM by BBN.COM id aa01380; 5 May 94 13:52 EDT Received: from pizza by PIZZA.BBN.COM id aa09573; 5 May 94 13:37 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa09569; 5 May 94 13:35 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa00223; 5 May 94 13:34 EDT Received: by ginger.lcs.mit.edu id AA21928; Thu, 5 May 94 13:34:14 -0400 Date: Thu, 5 May 94 13:34:14 -0400 From: Noel Chiappa Message-Id: <9405051734.AA21928@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Comments on "Nimrod and IPng Technical Requirements" Cc: jnc@ginger.lcs.mit.edu

    From: David R Oran

    If map distribution is done by lower-level flooding, then you might get
    by without an explicit looping-packet detector (we used to call this
    "super-macho routing", since if it ever fails, it fails spectacularly).

Yah, the failure mode's what bothers me. I'm worried enough about coding faults, etc, that it seems worth guarding against.

    If map distribution is handled on top of the normal IPng forwarding
    mechanisms

Some map forwarding will have to be; if you ask for a map of some distant location, they aren't going to flood it to you. On the other hand, most "normal" local updating seems like it should be done via flooding.

    Based on my experience in setting the count-up limit on DECnet Phase IV
    networks ... I would be reluctant to endorse something which does not
    reasonably auto-configure ... a procedure similar to MTU or Router
    Discovery to get the diameter estimate to the hosts.

Yah, I agree completely. This is neither expensive, nor difficult, so I think there's no reason not to go this way.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa12897; 6 May 94 9:40 EDT Received: from pizza by PIZZA.BBN.COM id aa14944; 6 May 94 9:30 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa14940; 6 May 94 9:26 EDT Received: from wd40.ftp.com by BBN.COM id aa11914; 6 May 94 9:24 EDT Received: from ftp.com by ftp.com ; Fri, 6 May 1994 09:24:36 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Fri, 6 May 1994 09:24:36 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA11505; Fri, 6 May 94 09:23:20 EDT Date: Fri, 6 May 94 09:23:20 EDT Message-Id: <9405061323.AA11505@mailserv-D.ftp.com> To: oran@nacto.lkg.dec.com Subject: Re: Comments on "Nimrod and IPng Technical Requirements" From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: nimrod-wg@BBN.COM Content-Length: 2326

>> In general, I'd suggest that we ask for 'large' length fields, and allow
>> administrative limits be placed on the max values. Then, as the need arises,
>> we can let the administrative limit grow. This also applies to the hop-count
>> field -- use a big field, but put a 'smallish' admin limit on it.
>>
>The problem with punting this to the administrative domain is the
>difficulty of gracefully changing the value,

Yup. I was imagining a mechanism that would allow, say, 1 or 2 years for a 'phase-in' for new limits. I would hope that we can see, soon enough, when the current admin limits are going to be too small, so that a note can be published to vendors saying 'increase the limit on parameter X' and they would do it as a part of their normal release process and then get fielded in systems as a part of the normal upgrade process. Obviously, we'd need a crystal ball that has a 2 or more year range, at least for these limits.
I'll certainly admit that this might be a bit too optimistic.

> I would be reluctant to endorse something
>which does not reasonably auto-configure based on a (conservative)
>assessment of the actual network diameter. I think Noel was onto the right
>track when he suggested a procedure similar to MTU or Router Discovery to
>get the diameter estimate to the hosts.

Yeah. Would be nice if we could set up a hunk of DNS that contains network-wide configuration parameters (such as max TTL). This could feed into the "local configuration protocol" such as DHCP or BOOTP or the like.

Could also use various other feedback schemes. For example, for TTL, we could use a VJ-like 'slow-start' algorithm. We could start the TTL at a small-ish value (e.g. 16) and then send the packet. A TTL expired message would come back, so we up the TTL. We keep doing this until we either start to get responses from the far node, or the transmitting host decides that there is a loop. I'd try to do loop detection based on the locator of the node (router) sending the TTL expired message. If the locator of the router sending TTL expired message 'N' is topologically closer to the destination than the locator for the router sending TTL expired message 'N-1', then you know that you are 'making progress'.

--
Frank Kastenholz
FTP Software
2 High Street
North Andover, Mass. USA 01845
(508)685-4000

Received: from PIZZA.BBN.COM by BBN.COM id aa08163; 10 May 94 14:58 EDT Received: from pizza by PIZZA.BBN.COM id aa08356; 10 May 94 14:33 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa08352; 10 May 94 14:30 EDT Received: from inet-gw-1.pa.dec.com by BBN.COM id aa06000; 10 May 94 14:28 EDT Received: from xirtlu.zk3.dec.com by inet-gw-1.pa.dec.com (5.65/21Mar94) id AA24519; Tue, 10 May 94 11:22:46 -0700 Received: by xirtlu.zk3.dec.com; id AA29481; Tue, 10 May 1994 11:53:36 -0400 Message-Id: <9405101553.AA29481@xirtlu.zk3.dec.com> To: Noel Chiappa Cc: nimrod-wg@BBN.COM Subject: Re: IPng requirements document points.... In-Reply-To: Your message of "Thu, 28 Apr 94 16:36:57 EDT." <9404282036.AA27140@ginger.lcs.mit.edu> Date: Tue, 10 May 94 11:53:30 -0400 From: bound@zk3.dec.com X-Mts: smtp

Noel and WG,

We have an IPng Directorate Retreat May 19 and 20 in Chicago. The topics are Routing/Addressing, Autoconfig, and Transition. On our Directorate Telechat yesterday I asked if we had absorbed the NIMROD reqs yet. Frank K. was going to check with Noel, but I figured I would probe here too, as I think these requirements are good for the NEXT Internet World, which I believe will be far more complex than today, per all the user types who will want to be on the Internet.

I think the last bottom-line section of the reqs I pulled across from research.ftp.com is what needs to be sent in to the IPng Directorate. At a minimum they should be at the top of any analysis, if readers wish to view them. I say this because the Directorate has now been mandated by the IPng AD's to write up their technical reviews of the IPng proposals. So these folks are really maxed out (like me) and need concise data right now. Kind of like sending something to a high level technical manager or director who you want to read your document: give them the technical gist up front to entice them to read the rest of your technical paper. In this case that's the core requirements. I think stating what will BREAK the core beliefs and abstractions in NIMROD is critical.

/jim
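Frank's 'slow start' TTL probe, above, is concrete enough to sketch. In this toy version (the path table and the 'closeness' judgment are invented stand-ins for real TTL-expired messages and for comparing locators), the probe doubles the TTL until the far node answers, and flags a loop when a TTL-expired report comes from a router no closer to the destination than the previous one:

    def probe(path, closeness, ttl=16):
        """path: the routers a probe traverses, in order; closeness:
        router -> how near its locator is to the destination (bigger is
        closer).  Both stand in for what the real network would report."""
        last = None
        while ttl <= 1024:                  # give up eventually
            if ttl > len(path):             # probe reached the far node
                return 'reached, ttl', ttl
            expired_at = path[ttl - 1]      # hop where the TTL ran out
            if last and closeness[expired_at] <= closeness[last]:
                return 'no progress, loop near', expired_at
            last = expired_at
            ttl *= 2                        # VJ-flavored window growth
        return 'gave up at ttl', ttl

    path = ['R%d' % i for i in range(1, 21)]        # a 20-hop path
    closeness = {r: i for i, r in enumerate(path)}
    print(probe(path, closeness))                   # ('reached, ttl', 32)

Doubling rather than stepping keeps the probe count logarithmic in the path length, at the price of overshooting the diameter estimate.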
Received: from PIZZA.BBN.COM by BBN.COM id aa18482; 11 May 94 17:04 EDT Received: from pizza by PIZZA.BBN.COM id aa15853; 11 May 94 16:43 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa15849; 11 May 94 16:40 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa16888; 11 May 94 16:37 EDT Received: by ginger.lcs.mit.edu id AA09472; Wed, 11 May 94 16:37:18 -0400 Date: Wed, 11 May 94 16:37:18 -0400 From: Noel Chiappa Message-Id: <9405112037.AA09472@ginger.lcs.mit.edu> To: bound@zk3.dec.com Subject: Re: IPng requirements document points.... Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM

    From: bound@zk3.dec.com

    I think these requirements are good for the NEXT Internet World, which
    I believe will be far more complex than today, per all the user types
    who will want to be on the Internet.

By "NEXT", do you mean IPng, or some hypothetical thing after that? If you really think we're soon going to need something beyond IPng, maybe we should wait a bit, and see if we can compress two "product cycles" of the internetwork layer into one. At the very least, we should make it plain to people that IPng is an interim step...

    I think the last bottom-line section of the reqs I pulled across from
    research.ftp.com is what needs to be sent in to the IPng Directorate.

I assume you mean section 3.3, "Specific Interaction Issues"?

What about section 2.2, "Packet Format Fields"? Many of these fields describe information which ought to be provided by the hosts (e.g. the source and destination locators, flow-id, etc). If you don't think lengths ought to be included, I would strongly disagree. This group is working on actual mechanisms, and these are our best thoughts on what size fields we will need to support the designs we are doing.

There are some "world-view" sections that don't need to be in there (such as 3.1 and 3.2), but I'd say all the rest contain stuff which is of direct relevance to what Nimrod thinks it needs...

    At a minimum they should be at the top of any analysis, if readers wish
    to view them. ... So these folks are really maxed out (like me) and
    need concise data right now. Kind of like sending something to a high
    level technical manager or director who you want to read your document:
    give them the technical gist up front to entice them to read the rest
    of your technical paper.

Are you saying you want a shorter document, because this one is too long?

    In this case that's the core requirements. I think stating what will
    BREAK the core beliefs and abstractions in NIMROD is critical.

The core requirements, as best we can codify them, are in the sections I mentioned: 2.2 and 3.3. Leaving out any would tend to do serious damage, the exact nature of which (and our workarounds in response) we could discuss.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa25221; 12 May 94 0:32 EDT Received: from pizza by PIZZA.BBN.COM id aa17907; 12 May 94 0:15 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa17901; 12 May 94 0:14 EDT Received: from inet-gw-3.pa.dec.com by BBN.COM id aa24481; 12 May 94 0:11 EDT Received: from xirtlu.zk3.dec.com by inet-gw-3.pa.dec.com (5.65/21Mar94) id AA27262; Wed, 11 May 94 21:08:41 -0700 Received: by xirtlu.zk3.dec.com; id AA05384; Thu, 12 May 1994 00:08:34 -0400 Message-Id: <9405120408.AA05384@xirtlu.zk3.dec.com> To: Noel Chiappa Cc: bound@zk3.dec.com, nimrod-wg@BBN.COM Subject: Re: IPng requirements document points.... In-Reply-To: Your message of "Wed, 11 May 94 16:37:18 EDT."
<9405112037.AA09472@ginger.lcs.mit.edu> Date: Thu, 12 May 94 00:08:28 -0400 From: bound@zk3.dec.com X-Mts: smtp

Noel,

    From: bound@zk3.dec.com

    I think these requirements are good for the NEXT Internet World, which
    I believe will be far more complex than today, per all the user types
    who will want to be on the Internet.

>By "NEXT", do you mean IPng, or some hypothetical thing after that? If you
>really think we're soon going to need something beyond IPng, maybe we should
>wait a bit, and see if we can compress two "product cycles" of the
>internetwork layer into one. At the very least, we should make it plain to
>people that IPng is an interim step...

I mean IPng; we need to get it right now. As far as IPng being an interim step, that is unacceptable to vendors and most customers in the real world. They will lose faith in the IETF if they (we) cannot figure it out with IPng.

I believe forcing the separation of EIDs and Locators permits us great flexibility for the future, putting my long term architecture hat on. This forces us to live with this model and also provides us discrete components to architect and then engineer into the year 2000. I am positive it's the right thing to do (call it 20 years of this industry's intuition).

    I think the last bottom-line section of the reqs I pulled across from
    research.ftp.com is what needs to be sent in to the IPng Directorate.

>I assume you mean section 3.3, "Specific Interaction Issues"?

>What about section 2.2, "Packet Format Fields"? Many of these fields describe
>information which ought to be provided by the hosts (e.g. the source and
>destination locators, flow-id, etc). If you don't think lengths ought to be
>included, I would strongly disagree. This group is working on actual
>mechanisms, and these are our best thoughts on what size fields we will need
>to support the designs we are doing.

I did mean 3.3, but you're right, 2.2 is required to make 3.3 work.

>There are some "world-view" sections that don't need to be in there (such as
>3.1 and 3.2), but I'd say all the rest contain stuff which is of direct
>relevance to what Nimrod thinks it needs...

    At a minimum they should be at the top of any analysis, if readers wish
    to view them. ... So these folks are really maxed out (like me) and
    need concise data right now. Kind of like sending something to a high
    level technical manager or director who you want to read your document:
    give them the technical gist up front to entice them to read the rest
    of your technical paper.

>Are you saying you want a shorter document, because this one is too long?

No, just put a quick overview and then the actual requirements up front, and then put all the rest behind it as supporting discussion. Now, I have read these discussions; that's why I think the requirements are on target. I just think it's best to give them the requirements up front. Let that be the first discussion point, not the technical philosophy. Most likely, if one dislikes the requirements, the technical philosophy discussion will begin anyway.

    In this case that's the core requirements. I think stating what will
    BREAK the core beliefs and abstractions in NIMROD is critical.

>The core requirements, as best we can codify them, are in the sections I
>mentioned: 2.2 and 3.3. Leaving out any would tend to do serious damage, the
>exact nature of which (and our workarounds in response) we could discuss.

I agree, and I think above I made it more clear what I suggested.
take care,
/jim

Received: from PIZZA.BBN.COM by BBN.COM id aa05699; 17 Jun 94 15:27 EDT Received: from pizza by PIZZA.BBN.COM id aa29321; 17 Jun 94 15:07 EDT Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa29317; 17 Jun 94 15:04 EDT To: nimrod-wg@BBN.COM Subject: New Architecture Draft Date: Fri, 17 Jun 94 15:00:42 -0400 From: Isidro Castineyra

There is a new draft of the Nimrod architecture. The draft incorporates comments and suggestions received during the last IETF and from the mail in this list. The file is in

ftp://bbn.com/pub/nimrod-wg/architecture.draft

It is an ascii file. We are hoping to put this in the Internet Draft archive at the beginning of July. Please send comments to the working group mailing list.

Thanks,
Isidro

Isidro Castineyra (isidro@bbn.com)
Bolt Beranek and Newman, Incorporated (617) 873-6233
10 Moulton Street, Cambridge, MA 02138 USA

Received: from PIZZA.BBN.COM by BBN.COM id aa09403; 20 Jun 94 12:36 EDT Received: from pizza by PIZZA.BBN.COM id aa11709; 20 Jun 94 12:19 EDT Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa11705; 20 Jun 94 12:16 EDT To: nimrod-wg@BBN.COM Subject: more stuff on bbn.com Date: Mon, 20 Jun 94 12:17:13 -0400 From: Martha Steenstrup

Hello,

There is a document describing a take on Nimrod functionality in the pub/nimrod-wg directory of bbn.com. It is a compressed postscript file, func.ps.Z, and is available via anonymous FTP. There will be a straight ASCII version later this week, but that version will lack the figures of the postscript version. So I urge you to try the postscript version first.

This document should get you thinking about how one might go about adding internetwork functionality, based on the Nimrod routing architecture. It's a sort of first step toward actual protocols. Please send your Nimrod protocol ideas and your comments on the document to the nimrod-wg mailing list.

Thanks,
Martha

Received: from PIZZA.BBN.COM by BBN.COM id aa13264; 20 Jun 94 13:36 EDT Received: from pizza by PIZZA.BBN.COM id aa12264; 20 Jun 94 13:21 EDT Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa12260; 20 Jun 94 13:19 EDT To: nimrod-wg@BBN.COM Subject: bbn.com Date: Mon, 20 Jun 94 13:20:29 -0400 From: Martha Steenstrup

Apparently, the new files have not yet been placed on bbn.com. (Security precautions at BBN preclude us from writing directly to that machine. Hence, we rely on certain designated people to place things in that directory for us.) I guess I jumped the gun on this one. I will let you know when the document is REALLY there. Sorry about that.

m

Received: from PIZZA.BBN.COM by BBN.COM id aa13727; 20 Jun 94 13:43 EDT Received: from pizza by PIZZA.BBN.COM id aa12344; 20 Jun 94 13:26 EDT Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa12340; 20 Jun 94 13:25 EDT To: nimrod-wg@BBN.COM Subject: all documents Date: Mon, 20 Jun 94 13:25:49 -0400 From: Martha Steenstrup

Hello again,

Just FTPed to bbn.com and all documents are there: the architecture document, the functionality document, the mobility document, and the multicast document. Please let us know if you have any trouble obtaining these.
Thanks,
m

Received: from PIZZA.BBN.COM by BBN.COM id aa16628; 20 Jun 94 14:28 EDT Received: from pizza by PIZZA.BBN.COM id aa12738; 20 Jun 94 14:12 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa12734; 20 Jun 94 14:09 EDT Received: from quern.epilogue.com by BBN.COM id aa15466; 20 Jun 94 14:10 EDT From: Dave Bridgham Sender: dab@epilogue.com To: nimrod-wg@BBN.COM Subject: comments on architecture draft Date: Mon, 20 Jun 94 14:08:54 -0400 Message-ID: <9406201408.aa05437@quern.epilogue.com>

    The Nimrod approach to providing this routing functionality includes
    map distribution according to the ``link-state'' paradigm,

I would describe the link-state paradigm as: each device broadcasts the state of its links through the net by flooding, which enables any device in the flooding area (the flood plain?) to construct a map of the net. This doesn't come very close to my picture of nimrod.

    A map is a graph composed of nodes and arcs. Properties of nodes and
    arcs are contained in attributes associated with them. Nimrod defines
    languages to specify these attributes and to describe maps.

I thought that one of the few conclusions we came to at the last IETF was that arcs did not have attributes. While I was one of the people who argued that it doesn't matter which way you go, because one is convertible to the other, implementation-wise I'm certainly going to convert any arc with attributes to a node, and that's how I'd usually draw it too.

It also solves the problem of where you're allowed to draw cluster boundaries. You can only cut arcs, which aren't entities; they only connect entities.

    The locator of an arc is prefixed by the locator of the node attached
    to the tail of the arc (the node the arc ``leaves''). The locators of
    all attributes of a node are also prefixed by the node's locator. For
    example, the locators for the connectivity specifications in the
    Transit Connectivity attribute are prefixed by the locator of the node.

If arcs are entities then they don't need locators at all.

    defined at this stage. An alternative would be not to assign locators
    to attributes, but assign an attribute number. The attribute would be
    identified by the locator of the node (or arc) and the attribute
    number. The concatenation of these two starts to look suspiciously
    like a locator.

I had always assumed that attributes of a node would be named by a string. Think property lists on atoms in lisp. Strings have the property that they seem more expandable than integers. Maybe from a theoretic point of view they're not really, but in practice it seems to work out that way. Imagine email header fields being named by numbers instead of things like From:, Subject:, or X-todays-funny-saying:. When I concatenate an attribute name with a locator it doesn't really look like a locator anymore.

    Nimrod has no pre-defined ``lowest level'': for example, it is possible
    to define and advertise a map that is physically realized inside a CPU.
    In this map, a node could represent, for example, a process or a group
    of processes. The user of this map need not necessarily know or care.
    (``It is turtles all the way down!'', in [3] page 63.)

Well, I've never written up anything on bottom-up locators, so I suppose it's reasonable that we're still assuming top-down.

    The main consequence of this requirement, and it is not a trivial one,
    is that ``you cannot take your locator with you.'' As an example of
    this, see figure 1, ...

With a little editing of the whitespace this was too funny.
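Dave's property-list suggestion, above, is easy to picture; a toy rendering (the names, values, and representation are all invented for illustration, and this is not any proposed Nimrod encoding) of string-named node attributes against a numeric registry:

    # String-named attributes, like lisp property lists or mail headers:
    # anyone can coin an 'X-...' name without a central registry.
    node = {'locator': 'a:b:c', 'attrs': {}}
    node['attrs']['Transit-Connectivity'] = ['conn-spec-1']
    node['attrs']['X-local-policy'] = 'no commercial transit'

    # Numbered attributes need an assigned-numbers list somewhere:
    ATTR_TRANSIT_CONNECTIVITY = 1      # hypothetical registry entry
    numeric = {'locator': 'a:b:c',
               'attrs': {ATTR_TRANSIT_CONNECTIVITY: ['conn-spec-1']}}

    # With numbers, "node locator + attribute number" concatenates into
    # something locator-shaped, which is the draft's observation; with
    # strings it plainly does not.
    print(sorted(node['attrs']))

The trade-off is the usual one: strings are self-describing and extensible without coordination, numbers are compact in packets and databases.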
    A datagram-mode packet can indicate a limited form of policy routing
    by the choice of destination and source locators. For this choice to
    exist, the source or destination endpoints must have several locators
    associated with them. This type of policy routing is capable of, for
    example, choosing providers.

I don't think you should even think about suggesting that this is possible. I seriously doubt that people will ever be able to do better than first level provider selection by having multiple locators. I'd much rather discourage policy selection by multiple locators and encourage people to come up with algorithms that use the information provided in the network maps to do it reasonably.

    The renumbering scheme described above implies that it should be
    possible to update the DNS (or its equivalent) securely and,
    relatively, dynamically. However, because renumbering will, most
    likely, be infrequent and carefully planned, we expect that the load
    on this updating mechanism should be manageable.

I suggest that if we end up using top-down locators then renumbering will not necessarily be infrequent. Also, renumbering may involve large pieces of the entire internet at times.

Dave Bridgham

Received: from PIZZA.BBN.COM by BBN.COM id aa25332; 20 Jun 94 16:25 EDT Received: from pizza by PIZZA.BBN.COM id aa13467; 20 Jun 94 16:11 EDT Received: from TTL.BBN.COM by PIZZA.BBN.COM id aa13463; 20 Jun 94 16:09 EDT To: nimrod-wg@BBN.COM Subject: even more stuff on bbn.com Date: Mon, 20 Jun 94 16:03:56 -0400 From: Ram Ramanathan

There are two more documents related to Nimrod available for anonymous FTP from bbn.com. One is on mobility support and the other on multicast support. Each of them is available in both .ps and .txt (ascii) format. Anonymous FTP to bbn.com and go to /pub/nimrod-wg. The files are:

mobility.ps
mobility.txt
multicast.ps
multicast.txt

These documents describe the requirements that a mobility/multicast solution should meet, and approaches to solutions, including examples (Mobile-IP and PIM). The style is somewhat more discussion-oriented than the architecture and functionality documents. Please send your comments and suggestions to this mailing list as soon as you can, so that we can talk about it before making it, if appropriate, an internet-draft.

- Ram.
--------------------------------------------------------------
Ram Ramanathan
Advanced Networking R & D
BBN Systems and Technologies
10 Moulton Street, Cambridge, MA 02138
Phone : (617) 873-2736
INTERNET : ramanath@bbn.com

Received: from PIZZA.BBN.COM by BBN.COM id aa17549; 22 Jun 94 17:08 EDT Received: from pizza by PIZZA.BBN.COM id aa29058; 22 Jun 94 16:54 EDT Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa29054; 22 Jun 94 16:50 EDT To: nimrod-wg@BBN.COM Subject: functionality document Date: Wed, 22 Jun 94 14:40:00 -0400 From: Martha Steenstrup

A straight text version of the Nimrod functionality document, minus the figures, should be available on bbn.com by the end of the day. The name of the file is func.txt.

Received: from PIZZA.BBN.COM by BBN.COM id aa19589; 23 Jun 94 14:45 EDT Received: from pizza by PIZZA.BBN.COM id aa05802; 23 Jun 94 14:31 EDT Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa05798; 23 Jun 94 14:28 EDT To: nimrod-wg@BBN.COM cc: isidro@BBN.COM Subject: Dave's comments to the architecutre draft Date: Thu, 23 Jun 94 14:25:29 -0400 From: Isidro Castineyra

David,

Thanks for your comments. My thoughts are below.
AD> = Architecture Draft
DC> = Dave's Comments

AD> The Nimrod approach to providing this routing functionality
AD> includes map distribution according to the ``link-state'' paradigm,

DC> I would describe the link-state paradigm as each device broadcasts
DC> the state of its links through the net by flooding which enables
DC> any device in the flooding area (the flood plain?) to construct a
DC> map of the net. This doesn't come very close to my picture of
DC> nimrod.

Are you talking about the description "link-state" or about the existence of a flooding protocol? If the former, I am not sure if what I think Nimrod is going to have should be called "link-state" or not. If the latter, I think that there is going to be a "hierarchical flooding" mechanism in operation. After all, there has to be a way of discovering a node's map to start with. Something like the following.

In general, each router participates in implementing one or more nodes. A router participates in reliably flooding updates for the maps of those nodes it implements. Consider the figure below (I hate ascii drawings), which shows only routers and their interconnections. The first figure shows the physical network, the second shows the Nimrod map (you might want to print this).

[Figures: the first shows a physical network of routers R1 (a:b), R2 (a:c), R3 (a:d:1), R4 (a:d:2), R5 (a:d:3), R6 (a:d:4), and R7 (a:e) and their interconnections; the second shows the corresponding Nimrod map of nodes a:b, a:c, a:d, and a:e, with R3 through R6 clustered inside node a:d.]

All routers are part of node a. There will be at least two floodings happening: one associated with node a:d, another associated with node a. Routers R1, R2, R3, R4, R6, R7 (but not R5) participate in the flooding associated with node a. Routers R3 to R6 participate in the flooding associated with node a:d. (R5 participates only in the a:d flooding.) R1 does not see what's happening inside a:d (nor do R2 or R7, for that matter). If R1 needs a:d's map, it would have to expressly request it (from R3, perhaps). R5 does not see what's happening outside a:d.

AD> A map is a graph composed of nodes and arcs. Properties of nodes
AD> and arcs are contained in attributes associated with them. Nimrod
AD> defines languages to specify these attributes and to describe maps.

DC> I thought that one of the few conclusions we came to at the last
DC> IETF was that arcs did not have attributes. While I was one of the
DC> people who argued that it doesn't matter which way you go because
DC> one is convertible to the other, implementation-wise I'm certainly
DC> going to convert any arc with attributes to a node and that's how
DC> I'd usually draw it too.

DC> It also solves the problem of where you're allowed to draw cluster
DC> boundaries. You can only cut arcs, which aren't entities; they only
DC> connect entities.

Ram is going to address that in another message.

AD> The locator of an arc is prefixed by the locator of the node
AD> attached to the tail of the arc (the node the arc ``leaves''). The
AD> locators of all attributes of a node are also prefixed by the
AD> node's locator. For example, the locators for the connectivity
AD> specifications in the Transit Connectivity attribute are prefixed
AD> by the locator of the node.

DC> If arcs are entities then they don't need locators at all.
One needs some way to refer to an arc when you are specifying a "source route". Rather than have other types of names, we thought that giving the arcs locators was easier. AD> defined at this stage. An alternative would be not to assign AD> locators to attributes, but assign an attribute number. The AD> attribute would be identified by the locator of the node (or arc) AD> and the attribute number. The concatenation of these two starts to AD> look suspiciously like a locator. DC> I had always assumed that attributes of a node would be named by a DC> string. Think property lists on atoms in lisp. Strings have the DC> property that they seem more expandable than integers. Maybe from DC> a theoretic point of view they're not really, but in practice it DC> seems to work out that way. Imagine email header fields being DC> named by numbers instead of things like From:, Subject, or DC> X-todays-funny-saying:. When I concatenate an attribute name with DC> a locator it doesn't really look like a locator anymore. We are going to need a way to refer to "well known attributes"; those, I think, will be strings. For example, in some data base it will say "the following are the connectivity specifications associated with this arc" (or the node's transit specifications, if we make the change above), but then you will also need to give them a name so that a packet or flow that wishes to use it can refer to it succinctly. I agree with you that perhaps it should not be called a locator, but I was trying to minimize the number of things. AD> Nimrod has no pre-defined ``lowest level'': for example, it is AD> possible to define and advertise a map that is physically realized AD> inside a CPU. In this map, a node could represent, for example, a AD> process or a group of processes. The user of this map need not AD> necessarily know or care. (``It is turtles all the way down!'', in AD> [3] page 63.) DC> Well, I've never written up anything on bottom-up locators so I DC> suppose it's reasonable that we're still assuming top-down. Actually, I am not assuming how the locators are assigned. I believe that the architecture as it stands supports both ways of assignment. Perhaps we should add an explicit note to that effect. AD> The main consequence of this requirement, and it is not a trivial AD> one, is that ``you cannot take your locator with you.'' As an AD> example of this, see figure 1, ... DC> With a little editing of the whitespace this was too funny. AD> A datagram-mode packet can indicate a limited form of policy AD> routing by the choice of destination and source locators. For this AD> choice to exist, the source or destination endpoints must have AD> several locators associated with them. This type of policy routing AD> is capable of, for example, choosing providers. DC> I don't think you should even think about suggesting that this is DC> possible. I seriously doubt that people will ever be able to do DC> better than first level provider selection by having multiple DC> locators. I'd much rather discourage policy selection by multiple DC> locators and encourage people to come up with algorithms that use DC> the information provided in the network maps to do it reasonably. I have no objection to deleting this. I just thought that our customers (Nimrod's would-be users) want to do this, and will try to do this even if we do not like it. AD> The renumbering scheme described above implies that it should be AD> possible to update the DNS (or its equivalent) securely and, AD> relatively, dynamically.
However, because renumbering will, most AD> likely, be infrequent and carefully planned, we expect that the AD> load on this updating mechanism should be manageable. DC> I suggest that if we end up using top-down locators then DC> renumbering will not necessarily be infrequent. Also, renumbering DC> may involve large pieces of the entire internet at times. Agreed. Isidro   Received: from PIZZA.BBN.COM by BBN.COM id aa26078; 23 Jun 94 15:46 EDT Received: from pizza by PIZZA.BBN.COM id aa06295; 23 Jun 94 15:33 EDT Received: from TTL.BBN.COM by PIZZA.BBN.COM id aa06291; 23 Jun 94 15:31 EDT To: nimrod-wg@BBN.COM Subject: Re: Dave's comments to the architecture draft In-reply-to: Your message of Thu, 23 Jun 94 14:25:29 -0400. Date: Thu, 23 Jun 94 15:30:02 -0400 From: Ram Ramanathan >AD> A map is a graph composed of nodes and arcs. Properties of nodes >AD> and arcs are contained in attributes associated with them. Nimrod >AD> defines languages to specify these attributes and to describe maps. > DC> I thought that one of the few conclusions we came to at the last > DC> IETF was that arcs did not have attributes. While I was one of the > DC> people who argued that it doesn't matter which way you go because > DC> one is convertible to the other, implementationwise I'm certainly > DC> going to convert any arc with attributes to a node and that's how > DC> I'd usually draw it too. > DC> It also solves the problem of where you're allowed to draw cluster > DC> boundaries. You can only cut arcs, which aren't entities; they only > DC> connect entities. >Ram is going to address that in another message. I don't think it is a big problem that needs "addressing", but my opinion is that the architecture only specifies that a node *can* have attributes and an arc *can* have attributes. As an implementor, one may choose to have attributes for arcs or not. As Dave mentions, they are both functionally equivalent. I am not in favor of *precluding* arcs from having attributes. However, perhaps the text should be changed to bring out the point more clearly - something like "A map is a graph composed of nodes and arcs. Nodes and arcs may have attributes associated with them. Nimrod specifies the language ...". Regarding clustering, I consider it as replacement of one map with another. Both nodes and arcs can be aggregated. Aggregation results in a new set of nodes and a new set of arcs, with an associated mapping between the old and the new. How the mapping is stored and processed is a subject that belongs in the protocol document. - Ram.   Received: from PIZZA.BBN.COM by BBN.COM id aa11269; 24 Jun 94 10:32 EDT Received: from pizza by PIZZA.BBN.COM id aa10989; 24 Jun 94 10:19 EDT Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa10985; 24 Jun 94 10:16 EDT To: Ram Ramanathan cc: nimrod-wg@BBN.COM Subject: Re: Dave's comments to the architecture draft In-reply-to: Your message of Thu, 23 Jun 94 15:30:02 -0400. Date: Fri, 24 Jun 94 10:13:14 -0400 From: Isidro Castineyra >>Regarding clustering, I consider it as replacement of one map with another. >>Both nodes and arcs can be aggregated. Aggregation results in a new set of >>nodes and a new set of arcs, with an associated mapping between the old >>and the new. How the mapping is stored and processed is a subject that >>belongs in the protocol document. >> I am not convinced that this mapping should be part of Nimrod. I imagine that implementations will keep such a mapping. But how this mapping would be useful to a user of the map is not clear to me.
Moreover, requiring a mapping might be more of a hindrance than a help. Isidro   Received: from PIZZA.BBN.COM by BBN.COM id aa26835; 24 Jun 94 15:02 EDT Received: from pizza by PIZZA.BBN.COM id aa13527; 24 Jun 94 14:41 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa13523; 24 Jun 94 14:39 EDT Received: from quern.epilogue.com by BBN.COM id aa25678; 24 Jun 94 14:39 EDT From: Dave Bridgham Sender: dab@epilogue.com To: isidro@BBN.COM CC: nimrod-wg@BBN.COM In-reply-to: Isidro Castineyra's message of Thu, 23 Jun 94 14:25:29 -0400 <9406231437.aa04238@quern.epilogue.com> Subject: Dave's comments to the architecture draft Date: Fri, 24 Jun 94 14:39:21 -0400 Message-ID: <9406241439.aa11386@quern.epilogue.com> This response contains rather vast quantities of included message. Sorry about that. Date: Thu, 23 Jun 94 14:25:29 -0400 From: Isidro Castineyra AD> = Architecture Draft DC> = Dave's Comments AD> The Nimrod approach to providing this routing functionality AD> includes map distribution according to the ``link-state'' paradigm, DC> I would describe the link-state paradigm as each device broadcasts DC> the state of its links through the net by flooding which enables DC> any device in the flooding area (the flood plain?) to construct a DC> map of the net. This doesn't come very close to my picture of DC> nimrod. Are you talking about the description "link-state" or about the existence of a flooding protocol? If the former, I am not sure if what I think Nimrod is going to have should be called "link-state" or not. If the latter, I think that there is going to be a "hierarchical flooding" mechanism in operation. After all, there has to be a way of discovering a node's map to start with. Something like the following. In general, each router participates in implementing one or more nodes. A router participates in reliably flooding updates for the maps of those nodes it implements. Consider the figure below (I hate ascii drawings) which shows only routers and their interconnections. The first figure shows a physical network, the second shows the Nimrod map (you might want to print this) Ah, I see, I think. I misunderstood just which part of Nimrod you were talking about. I thought you were talking about the map distribution part where route calculators talk to various map servers in the process of figuring out a route to somewhere. Now I think you were talking about how the maps are generated in the first place. Very likely the lowest level map of a nimrod map hierarchy would be produced by a link state protocol of some sort. Of course something else could work here, but I see link state as the most likely at this time. However, I'm not sure this follows for any layers above that. The map at one layer will get abstracted and passed up. For now I assume the abstraction will be largely very simple with hand tuning; maybe someday we'll get good algorithms for automating this. The layer above I assumed would take maps from below to build its maps. I guess this could be described as being similar to the link-state paradigm. One needs some way to refer to an arc when you are specifying a "source route". Rather than have other types of names, we thought that giving the arcs locators was easier. This is true if you give arcs attributes. It is not true if you don't. You only need to specify an arc in a source route if arcs are somehow related to some entity. If they're not, then they don't get specified in source routes and they don't need names.
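(To make this point concrete, here is a rough sketch -- not from any Nimrod document, all names hypothetical -- of what a source route looks like when arcs are attribute-free: the map is nodes plus a neighbor relation, and a route is just a sequence of node locators, so arcs never need names of their own.)

    # Hypothetical sketch: attribute-free arcs are pure connectivity, so a
    # source route is simply a list of node locators.

    class Node:
        def __init__(self, locator, attributes=None):
            self.locator = locator              # e.g. "a:d:1"
            self.attributes = attributes or {}  # only nodes carry attributes
            self.neighbors = set()

    def link(a, b):
        # An arc is unnamed; at most one is ever needed per node pair.
        a.neighbors.add(b)
        b.neighbors.add(a)

    def valid_source_route(nodes_by_locator, route):
        # A route is valid if consecutive locators name linked nodes.
        hops = [nodes_by_locator[loc] for loc in route]
        return all(b in a.neighbors for a, b in zip(hops, hops[1:]))

(If arcs carried attributes, a node pair might need several distinguishable arcs between them, and a route would then have to say which one it uses -- which is where the question of arc locators comes from.)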
We are going to need a way to refer to "well known attributes"; those, I think, will be strings. For example, in some data base it will say "the following are the connectivity specifications associated with this arc" (or the node's transit specifications, if we make the change above), but then you will also need to give them a name so that a packet or flow that wishes to use it can refer to it succinctly. I agree with you that perhaps it should not be called a locator, but I was trying to minimize the number of things. I thought the idea was to encapsulate all the various choices made by the route generator in the source route or flow spec. You seem to be suggesting here that some of the information used in choosing the route is going to be embedded in the source route or flow spec. I assume so that part of the route choice can be made out in the network somewhere. As for the naming, you say the connectivity spec would have a string name and a, presumably different, name so the flow can refer to it succinctly. From the original message I assume this second name would be the number then. I wouldn't use the number at all, but I'm weird. AD> A datagram-mode packet can indicate a limited form of policy AD> routing by the choice of destination and source locators. For this AD> choice to exist, the source or destination endpoints must have AD> several locators associated with them. This type of policy routing AD> is capable of, for example, choosing providers. DC> I don't think you should even think about suggesting that this is DC> possible. I seriously doubt that people will ever be able to do DC> better than first level provider selection by having multiple DC> locators. I'd much rather discourage policy selection by multiple DC> locators and encourage people to come up with algorithms that use DC> the information provided in the network maps to do it reasonably. I have no objection to deleting this. I just thought that our customers (Nimrod's would-be users) want to do this, and will try to do this even if we do not like it. I'm sure some will; I'd like them to be as few as possible. I'd rather spend the time up front building Nimrod from the beginning such that it does the right thing without such kludges. I believe the architecture has it in it and I'd not like that lost in the rush of people setting up their sites with multiple locators to do first level provider selection. Dave   Received: from PIZZA.BBN.COM by BBN.COM id aa00160; 24 Jun 94 15:36 EDT Received: from pizza by PIZZA.BBN.COM id aa13792; 24 Jun 94 15:20 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa13788; 24 Jun 94 15:18 EDT Received: from quern.epilogue.com by BBN.COM id aa28781; 24 Jun 94 15:16 EDT From: Dave Bridgham Sender: dab@epilogue.com To: ramanath@BBN.COM CC: nimrod-wg@BBN.COM In-reply-to: Ram Ramanathan's message of Thu, 23 Jun 94 15:30:02 -0400 <9406231538.aa04470@quern.epilogue.com> Subject: Dave's comments to the architecture draft Date: Fri, 24 Jun 94 15:15:56 -0400 Message-ID: <9406241515.aa11674@quern.epilogue.com> Date: Thu, 23 Jun 94 15:30:02 -0400 From: Ram Ramanathan >AD> A map is a graph composed of nodes and arcs. Properties of nodes >AD> and arcs are contained in attributes associated with them. Nimrod >AD> defines languages to specify these attributes and to describe maps. > DC> I thought that one of the few conclusions we came to at the last > DC> IETF was that arcs did not have attributes.
While I was one of the > DC> people who argued that it doesn't matter which way you go because > DC> one is convertible to the other, implementationwise I'm certainly > DC> going to convert any arc with attributes to a node and that's how > DC> I'd usually draw it too. > DC> It also solves the problem of where you're allowed to draw cluster > DC> boundaries. You can only cut arcs, which aren't entities; they only > DC> connect entities. I can go either way here, my biggest concern was that I thought we came to an agreement on this at the Seattle IETF. An agreement the other way. Even though I was arguing for the side we didn't agree to, I've come since to believe that Noel was right. I think it really does work better to make anything with attributes a node, with links as non-entities that link the nodes together. I don't think it is a big problem that needs "addressing", but my opinion is that the architecture only specifies that a node *can* have attributes and an arc *can* have attributes. As an implementor, one may choose to have attributes for arcs or not. As Dave mentions, they are both functionally equivalent. I am not in favor of *precluding* arcs from having attributes. However, perhaps the text should be changed to bring out the point more clearly - something like "A map is a graph composed of nodes and arcs. Nodes and arcs may have attributes associated with them. Nimrod specifies the language ...". As an implementor, if arcs can have attributes I'd better implement handling arcs with attributes. Either that or have an input converter to re-write any maps containing arcs with attributes into one with only nodes with attributes. Dave   Received: from PIZZA.BBN.COM by BBN.COM id aa03174; 24 Jun 94 16:28 EDT Received: from pizza by PIZZA.BBN.COM id aa14184; 24 Jun 94 16:12 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa14180; 24 Jun 94 16:10 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa02027; 24 Jun 94 16:07 EDT Received: by ginger.lcs.mit.edu id AA09260; Fri, 24 Jun 94 16:07:46 -0400 Date: Fri, 24 Jun 94 16:07:46 -0400 From: Noel Chiappa Message-Id: <9406242007.AA09260@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Dave's comments to the architecture draft Cc: jnc@ginger.lcs.mit.edu From: Dave Bridgham >> I thought that one of the few conclusions we came to at the last >> IETF was that arcs did not have attributes. I can go either way here, my biggest concern was that I thought we came to an agreement on this at the Seattle IETF. There was a lot of heat (and some light :-) about the node and arc model, but I don't recall the exact outcome on this specific issue (do arcs have attributes). I do remember Isidro sort of capitulating and saying "OK, we'll do it that way", and me being unhappy because I wanted people to go with the node/arc model I suggested only if they were convinced it was the Right Thing, not because I was stubborn. I've come since to believe that Noel was right. :-) I think it really does work better to make anything with attributes a node, with links as non-entities that link the nodes together. It certainly makes the implementation more of a simple step from the formal description. The one thing that still worries me is that that formalism isn't the closest match to the kinds of pictures we all draw (where the interfaces are arcs, and nodes represent nets and boxes).
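(Dave's "input converter" is easy to picture. A minimal, hypothetical sketch: rewrite every arc that carries attributes into an intermediate node, so the rest of the implementation only ever sees attributes on nodes. The naming scheme for the invented middle nodes is purely illustrative -- choosing their locators sensibly is exactly the hard part, as comes up later in the thread.)

    # Hypothetical converter: arcs arrive as (end_a, end_b, attributes)
    # triples, where attributes may be None; the output map has attributes
    # on nodes only, plus attribute-free links as locator pairs.

    def convert(node_attrs, arcs):
        links = []
        for i, (a, b, attrs) in enumerate(arcs):
            if not attrs:
                links.append((a, b))
            else:
                mid = "%s|%s|%d" % (a, b, i)  # illustrative locator only
                node_attrs[mid] = attrs       # the arc becomes a node
                links.append((a, mid))
                links.append((mid, b))
        return links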
On the other hand, I don't know how else to deal with the fact that interfaces are going to have attributes, unless we make the interfaces attributes of the router/host nodes, and then all the interface "attributes" are sub-attributes of the interface attribute... Sigh, I keep promising to go off and think about the node/arc model hard, but I keep not having the time. These stupid arguments on Big-I about "should TLN's be different from TSILN's", and then the fixed/variable stuff, keep wasting valuable time. Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa00978; 24 Jun 94 23:38 EDT Received: from pizza by PIZZA.BBN.COM id aa16096; 24 Jun 94 23:27 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa16092; 24 Jun 94 23:25 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa24637; 24 Jun 94 23:25 EDT Received: by ginger.lcs.mit.edu id AA11059; Fri, 24 Jun 94 23:25:31 -0400 Date: Fri, 24 Jun 94 23:25:31 -0400 From: Noel Chiappa Message-Id: <9406250325.AA11059@ginger.lcs.mit.edu> To: dab@epilogue.com, isidro@BBN.COM Subject: Re: Dave's comments to the architecture draft Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM From: Dave Bridgham AD> The Nimrod approach to providing this routing functionality AD> includes map distribution according to the ``link-state'' paradigm, I must confess a certain amount of unease with the use of "link state" here. To me, LS invokes a mental image of things like the new ARPAnet algorithm, IS-IS, OSPF, etc.; i.e. routing architectures which depend on synchronized databases and identical route calculations to make a hop-by-hop routing paradigm work. I prefer to put IDPR and Nimrod in a class I call "map distribution", of which LS is a subset. > One needs some way to refer to an arc when you are specifying a > "source route". Rather than have other types of names, we thought that > giving the arcs locators was easier. This is true if you give arcs attributes. It is not true if you don't. I'd put it that you only need to name the arcs if you allow more than one arc between a pair of nodes; if you don't, then there's no possible confusion. If arcs just represent connectivity, with no attributes, then you don't need to have more than one between any node pair (the second one contains 0 information), but if arcs have attributes, you need to be able to allow multiple arcs. But then we get back to whether arcs have attributes, and what the data will look like anyway... Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa21037; 29 Jun 94 11:52 EDT Received: from pizza by PIZZA.BBN.COM id aa12122; 29 Jun 94 11:29 EDT Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa12118; 29 Jun 94 11:26 EDT To: nimrod-wg@BBN.COM Subject: Administrivia Date: Wed, 29 Jun 94 11:21:34 -0400 From: Isidro Castineyra A couple of things to get us organized for IETF. 1.- The current draft Nimrod documents have been put in host bbn.com in directory /pub/nimrod-wg. These files are accessible via anonymous ftp. There are four different documents and seven files there:
    architecture.draft   draft architecture
    func.ps.Z            postscript version of the functionality
    func.txt             text version of the functionality
    mobility.ps          postscript version of Nimrod's approach to mobility
    mobility.txt         text version of Nimrod's approach to mobility
    multicast.ps         postscript version of Nimrod's approach to multicast
    multicast.txt        text version of Nimrod's approach to multicast
We plan to submit these documents to the Internet Draft archive by the end of the first week of July unless there are objections.
Please send your comments to the list so we can incorporate them. 2.- I will be sending out a draft agenda soon. Let me know of any items you would like to see in it. Thanks, Isidro   Received: from PIZZA.BBN.COM by BBN.COM id aa25995; 29 Jun 94 13:09 EDT Received: from pizza by PIZZA.BBN.COM id aa12781; 29 Jun 94 12:59 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa12777; 29 Jun 94 12:56 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa25078; 29 Jun 94 12:54 EDT Received: by ginger.lcs.mit.edu id AA12237; Wed, 29 Jun 94 12:54:52 -0400 Date: Wed, 29 Jun 94 12:54:52 -0400 From: Noel Chiappa Message-Id: <9406291654.AA12237@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Locators and EID's Cc: jnc@ginger.lcs.mit.edu Everyone on this WG mailing list who thinks that the internetwork should have transport-level names (e.g. EID's) which are separate from routing names (e.g. locators) needs to respond to the recent query from the IPng AD's on the Big-Internet mailing list about whether people want one "name" or two. I'd guess that many of you have given up reading it, but many people whom I know favor this split haven't responded, so please, take the time to do so. Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa05050; 29 Jun 94 15:16 EDT Received: from pizza by PIZZA.BBN.COM id aa13802; 29 Jun 94 15:04 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa13798; 29 Jun 94 15:00 EDT Received: from inet-gw-2.pa.dec.com by BBN.COM id aa04000; 29 Jun 94 15:00 EDT Received: from xirtlu.zk3.dec.com by inet-gw-2.pa.dec.com (5.65/27May94) id AA03691; Wed, 29 Jun 94 11:52:12 -0700 Received: by xirtlu.zk3.dec.com; id AA26345; Wed, 29 Jun 1994 14:52:03 -0400 Message-Id: <9406291852.AA26345@xirtlu.zk3.dec.com> To: Noel Chiappa Cc: nimrod-wg@BBN.COM, sob@hsdndev.harvard.edu, mankin@cmf.nrl.navy.mil, pvm@isi.edu Subject: Re: Locators and EID's In-Reply-To: Your message of "Wed, 29 Jun 94 12:54:52 EDT." <9406291654.AA12237@ginger.lcs.mit.edu> Date: Wed, 29 Jun 94 14:51:56 -0400 From: bound@zk3.dec.com X-Mts: smtp Noel, > Everyone on this WG mailing list who thinks that the internetwork >should have transport-level names (e.g. EID's) which are separate from >routing names (e.g. locators) needs to respond to the recent query from the >IPng AD's on the Big-Internet mailing list about whether people want one >"name" or two. > I'd guess that many of you have given up reading it, but many people >whom I know favor this split haven't responded, so please, take the time to >do so. Not at all. Just very busy. Also a lot of us answered this on the SIPP list. My response was that I only wanted one name for my system. My new Internet node to test IPng, for example, is called sipper (not available yet to incoming packets; still working on the filters). I don't want to remember any other name as an end user. I want that to all be transparent to me. So what I am saying is that I don't want to have to call my node sipper.jimbo. If other qualifiers like zk3.dec.com constitute a name then that's OK. I hate to see the X.400 naming strings in any address, so it's a taste issue, not a technical issue. On TSNs or EIDs/Locators you know I agree with that split at the network packet level and software that drives that separation. But for IPng I think the best we can do is make sure that IPng does not preclude in the future a simple change to IPng to use EIDs and Locators. This can be accomplished with a carefully defined IPng header, address space, and source route.
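(Jim's "carefully defined IPng header" can be illustrated with a toy layout -- purely hypothetical, not any proposed IPng format -- in which fixed-size EIDs name the endpoints and variable-length locators are separate fields that routing is free to rewrite without disturbing the transport-level names.)

    import struct

    # Toy header, for illustration only: 8-byte EIDs, then length-prefixed
    # source and destination locators. Routing may rewrite the locators;
    # transport binds only to the EID pair.

    def pack_header(src_eid, dst_eid, src_loc, dst_loc):
        assert len(src_eid) == 8 and len(dst_eid) == 8
        return (src_eid + dst_eid
                + struct.pack("!H", len(src_loc)) + src_loc
                + struct.pack("!H", len(dst_loc)) + dst_loc)

    def unpack_header(data):
        src_eid, dst_eid, off = data[:8], data[8:16], 16
        (n,) = struct.unpack_from("!H", data, off); off += 2
        src_loc = data[off:off + n]; off += n
        (n,) = struct.unpack_from("!H", data, off); off += 2
        dst_loc = data[off:off + n]
        return src_eid, dst_eid, src_loc, dst_loc

(A connection bound to the EID pair survives a change of locators, which is the mobility property Valdis raises further on.)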
I also think we need a separate working group to just work on EIDs and what they mean, and get some implementation experience based on a spec. /jim   Received: from PIZZA.BBN.COM by BBN.COM id aa15181; 30 Jun 94 10:51 EDT Received: from pizza by PIZZA.BBN.COM id aa19768; 30 Jun 94 10:35 EDT Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa19764; 30 Jun 94 10:31 EDT To: nimrod-wg@BBN.COM Subject: Draft Nimrod Agenda for Toronto Date: Thu, 30 Jun 94 10:28:33 -0400 From: Isidro Castineyra This is the draft agenda for the Toronto IETF. Please send comments and suggestions. Thanks, Isidro ----------------
Group Name: Nimrod - The New Internet Routing and Addressing Architecture
IETF Area: Routing
Date/Time: Tuesday, July 26, 1994 1600-1800 EST (multicast)
           Wednesday, July 27, 1994 1600-1800 EST
Proposed Agenda -- First Session
1. Agenda bashing
2. Architecture
   a. Update (Isidro Castineyra)   30min
   b. Questions                    30min
4. Nimrod Functionality
   a. Overview (Martha Steenstrup) 30min
   b. Questions                    30min
Proposed Agenda -- Second Session
6. Implementation Sketch
   a. Database Structuring (Isidro Castineyra)          15min
   b. Protocol Mechanisms (Ram Ramanathan)              15min
   c. Mapping Functionality to Databases and Protocols  15min
      (Martha Steenstrup)
   d. Discussion                                        30min
8. Multicast and Mobility
   a. Update (Ram Ramanathan)     15min
   b. Questions                   15min
7. Open Issues and Work Plan      15min
   Received: from PIZZA.BBN.COM by BBN.COM id aa20272; 30 Jun 94 12:00 EDT Received: from pizza by PIZZA.BBN.COM id aa20294; 30 Jun 94 11:41 EDT Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa20290; 30 Jun 94 11:39 EDT To: Noel Chiappa cc: nimrod-wg@BBN.COM Subject: Re: Dave's comments to the architecture draft In-reply-to: Your message of Fri, 24 Jun 94 16:07:46 -0400. <9406242007.AA09260@ginger.lcs.mit.edu> Date: Thu, 30 Jun 94 11:27:23 -0400 From: Isidro Castineyra I think David and Noel are right that we agreed to have arcs with no attributes. I'll re-write the architecture document with that in mind. There are basically two approaches I can think of: 1.- As mentioned by Noel: make the interfaces attributes of the router/host nodes, and then all the interface "attributes" are sub-attributes of the interface attribute... The only problem is that in the current draft "interface" is not used as a concept. (The previous draft used something equivalent, the node connecting point, or something like that, but we got rid of that.) 2.- Represent arcs with attributes as two arcs with a node in the middle. I really do not like this approach. These nodes would have to have locators, which I do not know how to assign reasonably. Isidro   Received: from PIZZA.BBN.COM by BBN.COM id aa22367; 30 Jun 94 12:39 EDT Received: from pizza by PIZZA.BBN.COM id aa20709; 30 Jun 94 12:21 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa20705; 30 Jun 94 12:19 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa21163; 30 Jun 94 12:14 EDT Received: by ginger.lcs.mit.edu id AA19695; Thu, 30 Jun 94 12:14:08 -0400 Date: Thu, 30 Jun 94 12:14:08 -0400 From: Noel Chiappa Message-Id: <9406301614.AA19695@ginger.lcs.mit.edu> To: isidro@BBN.COM, jnc@ginger.lcs.mit.edu Subject: Re: Dave's comments to the architecture draft Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM From: Isidro Castineyra we agreed to have arcs with no attributes. ... There are basically two approaches I can think of: ... make the interfaces attributes of the router/host nodes ... Represent arcs with attributes as two arcs with a node in the middle.
I'll try and find some time to think about this (if this stupid IPng drivel will let up on Big-I)... I really do not like this approach. These nodes would have to have locators, which I do not know how to assign reasonably. Well, maybe it's not so bad as all that. If those nodes represent interfaces, then it's legit for them to have locators, and you can either assign them as subsidiary to the locator of the router node, or the locator of the network node. The Internet has previously always done the second, since that's what makes most sense in a routing-table world, but I can see cases where it's useful to have them associated with the router. (Maybe we do both, with overlapping areas, nasty as that sounds.) Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa22407; 30 Jun 94 12:40 EDT Received: from pizza by PIZZA.BBN.COM id aa20728; 30 Jun 94 12:22 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa20724; 30 Jun 94 12:20 EDT Received: from quern.epilogue.com by BBN.COM id aa21231; 30 Jun 94 12:15 EDT From: Dave Bridgham Sender: dab@epilogue.com To: nimrod-wg@BBN.COM In-reply-to: Isidro Castineyra's message of Thu, 30 Jun 94 11:27:23 -0400 <9406301150.aa25045@quern.epilogue.com> Subject: Dave's comments to the architecture draft Date: Thu, 30 Jun 94 12:15:28 -0400 Message-ID: <9406301215.aa25247@quern.epilogue.com> Date: Thu, 30 Jun 94 11:27:23 -0400 From: Isidro Castineyra 1.- As mentioned by Noel: make the interfaces attributes of the router/host nodes, and then all the interface "attributes" are sub-attributes of the interface attribute... The only problem is that in the current draft "interface" is not used as a concept. (The previous draft used something equivalent, the node connecting point, or something like that, but we got rid of that.) Huh? Some nodes are networks, some are interfaces, some are hosts, some are aggregates, and some were invented as a place to stash some extra attributes. When you build the map (for any value of `you') you just put in map nodes wherever you need attributes. The map distribution language needs to be able to specify nodes, the attributes of each, the links between them, and the links off this map to other maps. Route generators need to be able to read a map in the map distribution language and understand the attributes so that they can pick a sequence of nodes whose attributes are all acceptable. I suggest that attributes are tuples of <tag, value> where both the tag and the value are strings. I don't know if the tags should be hierarchical or flat. We'll spend a lot of time over the next few decades adding new tag types. 2.- Represent arcs with attributes as two arcs with a node in the middle. I really do not like this approach. These nodes would have to have locators, which I do not know how to assign reasonably. Take the map of nodes and arcs. Draw clustering circles. Don't locators just fall out? The top-down vs bottom-up issue is still there, but given a map with clustering circles drawn, the locators shouldn't be that hard to come up with. Oh yeah, this reminds me of something I mentioned to Noel on the phone a while back. I thought it made the whole issue of reasonably assigning locators much simpler but Noel said his head hurt and hung up. I mostly forgot about it until now. Next message.
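(Dave's "don't locators just fall out?" can be sketched directly. A hypothetical example, assuming the clustering circles have already been drawn and expressed as a containment tree: a node's locator is just the path of cluster names from the top down.)

    # Hypothetical sketch: clusters as a child -> parent containment tree;
    # a locator falls out as the name path from the top-level cluster.

    clusters = {
        "a": None,   # top-level cluster
        "d": "a",    # cluster a:d drawn inside a
        "1": "d",    # node a:d:1 inside a:d
        "2": "d",    # node a:d:2 inside a:d
    }

    def locator(name):
        parts = []
        while name is not None:
            parts.append(name)
            name = clusters[name]
        return ":".join(reversed(parts))

    print(locator("1"))   # -> a:d:1

(The top-down vs. bottom-up question is then only about who chooses the cluster names, not about how the locators compose.)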
Dave   Received: from PIZZA.BBN.COM by BBN.COM id aa23159; 30 Jun 94 12:56 EDT Received: from pizza by PIZZA.BBN.COM id aa20901; 30 Jun 94 12:42 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa20897; 30 Jun 94 12:39 EDT Received: from quern.epilogue.com by BBN.COM id aa22276; 30 Jun 94 12:38 EDT From: Dave Bridgham Sender: dab@epilogue.com To: nimrod-wg@BBN.COM Subject: grouping strategies Date: Thu, 30 Jun 94 12:37:57 -0400 Message-ID: <9406301237.aa25449@quern.epilogue.com> One rathole we've fallen in more than once is how do we group things in the maps? Specifically, if we have nodes for hosts, networks, and interfaces, does the interface node group with the host or the network? This provided one of the stronger arguments for why we needed overlapping areas; group the interface both ways. Well, how about another solution? I claim that the above thinking is unnecessarily close to IPv4 routing or hierarchical routing. Nimrod allows more flexibility; use it. In particular, put all the nodes for hosts, networks, and interfaces at a site in the same group. No sub-grouping. If the resulting aggregation is not so large as to be unwieldy, this seems like it would be much easier. In other words, within a single site, say a company with no more than 1000 to 5000 hosts, I'd just have a flat locator hierarchy to the hosts. I'd go to more groupings if the site was too large or if there were policy reasons for more structure in the grouping. Probably administrative reasons would cause more groups too. Implementationwise, I picture something like ES-IS to let the packet forwarders know about all the hosts, and link-state-like flooding between the packet forwarders to create the lowest level map. Dave   Received: from PIZZA.BBN.COM by BBN.COM id aa23853; 30 Jun 94 13:12 EDT Received: from pizza by PIZZA.BBN.COM id aa21052; 30 Jun 94 12:56 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa21048; 30 Jun 94 12:54 EDT Received: from wd40.ftp.com by BBN.COM id aa23073; 30 Jun 94 12:55 EDT Received: from ftp.com by ftp.com ; Thu, 30 Jun 1994 12:54:58 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Thu, 30 Jun 1994 12:54:58 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA13913; Thu, 30 Jun 94 12:53:01 EDT Date: Thu, 30 Jun 94 12:53:01 EDT Message-Id: <9406301653.AA13913@mailserv-D.ftp.com> To: dab@epilogue.com Subject: Re: grouping strategies From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: nimrod-wg@BBN.COM Content-Length: 687 > In other words, > within a single site, say a company with no more than 1000 to 5000 > hosts, I'd just have a flat locator hierarchy to the hosts. > > I'd go to more groupings if the site was too large or if there were > policy reasons for more structure in the grouping. Probably > administrative reasons would cause more groups too. > > Implementationwise, I picture something like ES-IS to let the packet > forwarders know about all the hosts, and link-state-like flooding > between the packet forwarders to create the lowest level map. How is this different from bridging? -- Frank Kastenholz FTP Software 2 High Street North Andover, Mass.
USA 01845 (508)685-4000   Received: from PIZZA.BBN.COM by BBN.COM id aa24640; 30 Jun 94 13:21 EDT Received: from pizza by PIZZA.BBN.COM id aa21162; 30 Jun 94 13:06 EDT Received: from BBN.COM by PIZZA.BBN.COM id ab21155; 30 Jun 94 13:04 EDT Received: from quern.epilogue.com by BBN.COM id aa23487; 30 Jun 94 13:05 EDT From: Dave Bridgham Sender: dab@epilogue.com To: kasten@ftp.com CC: nimrod-wg@BBN.COM In-reply-to: Frank Kastenholz's message of Thu, 30 Jun 94 12:53:01 EDT <9406301653.AA13913@mailserv-D.ftp.com> Subject: grouping strategies Date: Thu, 30 Jun 94 13:05:11 -0400 Message-ID: <9406301305.aa25646@quern.epilogue.com> Date: Thu, 30 Jun 94 12:53:01 EDT From: Frank Kastenholz How is this different from bridging?
- Broadcasts don't pass through packet forwarders.
- You don't flood packets until you've learned where the host lives.
- It isn't media-layer transparent.
- It'll work between different media.
- The maps thus generated could have attributes with policy and performance information.
- It can work with resource reservation systems.
- I like this scheme and I don't like bridging.
- That's what floats immediately to the top of my mind.
- One thing I didn't say in the first message. I'm not saying that I'd require Nimrod users to set up their site as I've suggested. They can do as they like. I'm saying they could, and if they did, life gets easier and this recurring issue about whether we group the interface node with the host or the network goes away.
Dave   Received: from PIZZA.BBN.COM by BBN.COM id aa25250; 30 Jun 94 13:26 EDT Received: from pizza by PIZZA.BBN.COM id aa21239; 30 Jun 94 13:12 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa21235; 30 Jun 94 13:10 EDT Received: from black-ice.cc.vt.edu by BBN.COM id aa23684; 30 Jun 94 13:09 EDT Received: (from valdis@localhost) by black-ice.cc.vt.edu (8.6.9/8.6.9) id NAA21328; Thu, 30 Jun 1994 13:09:18 -0400 Message-Id: <199406301709.NAA21328@black-ice.cc.vt.edu> To: Noel Chiappa cc: nimrod-wg@BBN.COM Subject: Re: Locators and EID's In-reply-to: Your message of "Wed, 29 Jun 1994 12:54:52 EDT." <9406291654.AA12237@ginger.lcs.mit.edu> From: Valdis.Kletnieks@vt.edu Date: Thu, 30 Jun 1994 13:09:18 +22306356 Sender: valdis@black-ice.cc.vt.edu Noel, the AD, and anybody else who cares: EID's are not locators. They address subtly different issues which may need to be dealt with in IPng (the areas where I see the distinction making the most difference are mobility, multicasting, and anything else we devise where we don't want to nail down both ends of the connection with really big spikes (a la the current [srcaddr,srcport,dstaddr,dstport] quads that specify IP V4 connections)). /Valdis   Received: from PIZZA.BBN.COM by BBN.COM id aa27786; 30 Jun 94 13:57 EDT Received: from pizza by PIZZA.BBN.COM id aa21505; 30 Jun 94 13:41 EDT Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa21501; 30 Jun 94 13:39 EDT To: Noel Chiappa cc: nimrod-wg@BBN.COM Subject: Re: Dave's comments to the architecture draft In-reply-to: Your message of Thu, 30 Jun 94 12:14:08 -0400. <9406301614.AA19695@ginger.lcs.mit.edu> Date: Thu, 30 Jun 94 13:35:49 -0400 From: Isidro Castineyra >> I really do not like this approach. These nodes would have to have >> locators, which I do not know how to assign reasonably. >> >>Well, maybe it's not so bad as all that. If those nodes represent interfaces, >>then it's legit for them to have locators, and you can either assign them as >>subsidiary to the locator of the router node, or the locator of the network >>node.
Something with a locator subsidiary to the locator of a node (i.e., having its locator prefixed by the locator of a node) should basically be *inside* the node. I really do not like having it hanging way out, separated by a full arc. Anyway, I would prefer not to talk about routers in this context. It is too much an implementation issue (or node-realization issue). Isidro   Received: from PIZZA.BBN.COM by BBN.COM id aa08311; 30 Jun 94 15:50 EDT Received: from pizza by PIZZA.BBN.COM id aa22510; 30 Jun 94 15:34 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa22506; 30 Jun 94 15:31 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa06450; 30 Jun 94 15:28 EDT Received: by ginger.lcs.mit.edu id AA22121; Thu, 30 Jun 94 15:28:38 -0400 Date: Thu, 30 Jun 94 15:28:38 -0400 From: Noel Chiappa Message-Id: <9406301928.AA22121@ginger.lcs.mit.edu> To: dab@epilogue.com, nimrod-wg@BBN.COM Subject: Re: grouping strategies Cc: jnc@ginger.lcs.mit.edu From: Dave Bridgham In particular, put all the nodes for hosts, networks, and interfaces at a site in the same group. It would be nice to be able to tell, from looking at a locator, what kind of thing it names. Although, perhaps this information is an attribute of the thing which is named? That wouldn't be so bad; you could do things like run consistency checks on the maps to make sure that there was always a node of type "interface" between a node of type "machine" and a node of type "network". In other words, within a single site ... I'd just have a flat locator hierarchy to the hosts. Like IS-IS with a single level 1 area covering multiple physical networks, yes? I'd go to more groupings if the site was too large or if there were policy reasons for more structure in the grouping. Probably administrative reasons would cause more groups too. I think it would be mostly a human factors deal. For instance, it's nice if a locator for a host interface could be assigned automatically (serverless autoconfig) by appending the local physical network address to the locator of the network. (Yes, you could do the same thing by appending the LPNA to the locator of the whole company, if the LPNA is unique within the company.) That way, you can look at an interface locator, and know which network the machine is on, without consulting a map. Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa09687; 30 Jun 94 16:11 EDT Received: from pizza by PIZZA.BBN.COM id aa22738; 30 Jun 94 15:54 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa22733; 30 Jun 94 15:52 EDT Received: from quern.epilogue.com by BBN.COM id aa08349; 30 Jun 94 15:51 EDT From: Dave Bridgham Sender: dab@epilogue.com To: nimrod-wg@BBN.COM In-reply-to: Noel Chiappa's message of Thu, 30 Jun 94 15:28:38 -0400 <9406301928.AA22121@ginger.lcs.mit.edu> Subject: grouping strategies Date: Thu, 30 Jun 94 15:50:04 -0400 Message-ID: <9406301550.aa27373@quern.epilogue.com> Date: Thu, 30 Jun 94 15:28:38 -0400 From: Noel Chiappa It would be nice to be able to tell, from looking at a locator, what kind of thing it names. Although, perhaps this information is an attribute of the thing which is named? Yeah, I'd just make the type of the node another attribute of the node. Why do you wish to infer the node's type from its name? That wouldn't be so bad; you could do things like run consistency checks on the maps to make sure that there was always a node of type "interface" between a node of type "machine" and a node of type "network". I thought I'd successfully gotten you to give up on that idea.
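(For reference, the consistency check being argued about here is cheap to state. A hypothetical sketch, assuming each node carries a type attribute: flag any machine node wired directly to a network node with no interface node in between.)

    # Hypothetical sketch of the disputed check. Each node is a dict:
    # {"type": ..., "neighbors": set-of-locators}.

    def check_map(nodes):
        problems = []
        for loc, node in nodes.items():
            if node["type"] != "machine":
                continue
            for nbr in node["neighbors"]:
                if nodes[nbr]["type"] == "network":
                    problems.append((loc, nbr))  # machine wired straight to net
        return problems

(Whether such typed checks belong in the architecture at all is exactly what the exchange below argues.)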
My choice would be to allow any node to be linked to any other node. If the potential intervening nodes have no attributes, I have no urge to require them in the map. In other words, within a single site ... I'd just have a flat locator hierarchy to the hosts. Like IS-IS with a single level 1 area covering multiple physical networks, yes? Could be. I don't know IS-IS. Dave   Received: from PIZZA.BBN.COM by BBN.COM id aa10232; 30 Jun 94 16:20 EDT Received: from pizza by PIZZA.BBN.COM id aa22879; 30 Jun 94 16:06 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa22875; 30 Jun 94 16:04 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa09303; 30 Jun 94 16:04 EDT Received: by ginger.lcs.mit.edu id AA22547; Thu, 30 Jun 94 16:04:11 -0400 Date: Thu, 30 Jun 94 16:04:11 -0400 From: Noel Chiappa Message-Id: <9406302004.AA22547@ginger.lcs.mit.edu> To: isidro@BBN.COM, jnc@ginger.lcs.mit.edu Subject: Re: Dave's comments to the architecture draft Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM From: Isidro Castineyra > If those nodes represent interfaces, then it's legit for them to have > locators, and you can either assign them as subsidiary to the locator of > the router node, or the locator of the network node. Something with a locator subsidiary to the locator of a node (i.e., having its locator prefixed by the locator of a node) should basically be *inside* the node. I really do not like having it hanging way out separated by a full arc. Oops, yes, when you draw the circle labelled "network with locator A.B.C", that circle includes all the nodes which represent interfaces (with locators of the form A.B.C.). Of course, if you look inside the node A.B.C, in addition to all the enclosed nodes for the interfaces, there has to be a node for the network itself (unless you have N^2 arcs); A.B.C.0 might be the name of that node. When I spoke of "subsidiary to the network node", I was speaking of node A.B.C, but of course there's ambiguity when speaking of "the network node" as to whether you mean A.B.C or A.B.C.0. That's part of why I used to like the model where the lowest level locators all represented interfaces (a network couldn't have a locator on the same level as an interface), but you guys successfully convinced me that the "turtles all the way down" model was more flexible! :-) Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa15581; 30 Jun 94 18:03 EDT Received: from pizza by PIZZA.BBN.COM id aa23676; 30 Jun 94 17:49 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa23672; 30 Jun 94 17:47 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa14858; 30 Jun 94 17:47 EDT Received: by ginger.lcs.mit.edu id AA23628; Thu, 30 Jun 94 17:47:44 -0400 Date: Thu, 30 Jun 94 17:47:44 -0400 From: Noel Chiappa Message-Id: <9406302147.AA23628@ginger.lcs.mit.edu> To: dab@epilogue.com, nimrod-wg@BBN.COM Subject: Re: grouping strategies Cc: jnc@ginger.lcs.mit.edu From: Dave Bridgham > It would be nice to be able to tell, from looking at a locator, > what kind of thing it names. Why do you wish to infer the node's type from its name? Ah, no good reason, really, now that I think about it. It just seemed like a useful thing... > you could do things like run consistency checks on the maps to make sure > that there was always a node of type "interface" between a node of type > "machine" and a node of type "network". My choice would be to allow any node to be linked to any other node. In what circumstances would it be reasonable to have a node of type "network" joined directly to another node of type "network"?
I think consistency checks are important, particularly once we get to the levels where hand-tuned abstractions start to appear. If the potential intervening nodes have no attributes, I have no urge to require them in the map. Sure, but is a locator an attribute or not? If it is (and I would lean to saying it is), then even an interface with no other attributes has an attribute, its locator. Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa18186; 30 Jun 94 18:57 EDT Received: from pizza by PIZZA.BBN.COM id aa24048; 30 Jun 94 18:45 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa24044; 30 Jun 94 18:43 EDT Received: from quern.epilogue.com by BBN.COM id aa17181; 30 Jun 94 18:43 EDT From: Dave Bridgham Sender: dab@epilogue.com To: nimrod-wg@BBN.COM In-reply-to: Noel Chiappa's message of Thu, 30 Jun 94 17:47:44 -0400 <9406302147.AA23628@ginger.lcs.mit.edu> Subject: grouping strategies Date: Thu, 30 Jun 94 18:43:16 -0400 Message-ID: <9406301843.aa29676@quern.epilogue.com> Date: Thu, 30 Jun 94 17:47:44 -0400 From: Noel Chiappa In what circumstances would it be reasonable to have a node of type "network" joined directly to another node of type "network"? I think consistency checks are important, particularly once we get to the levels where hand-tuned abstractions start to appear. Because I can't think of why I'd use that particular case right now, you'd prohibit that and all other weird cases? I can see connecting two interface nodes directly together in the case of a point-to-point network. Two-interface routers could directly link two interface nodes with no intervening host node. If the potential intervening nodes have no attributes, I have no urge to require them in the map. Sure, but is a locator an attribute or not? If it is (and I would lean to saying it is), then even an interface with no other attributes has an attribute, its locator. I don't know if a locator is an attribute. I see attributes as strictly optional. The locator is the node's name. Without it you can't reference the node, so it may be safely garbage collected. I guess I picture nodes like lisp symbols. The attributes are the property list and the locator is the symbol name. The hierarchical locator structure is like having a hierarchical package system (maybe I'm stretching this a bit too far). Do we tag host nodes with their EIDs? Do EIDs even matter at this level? Single-homed hosts don't need a host node; the interface node is sufficient. Multi-homed hosts (I'm talking about non-routers here) need either a host node or the n^2 interconnect. If the n^2 interconnect, who gets the EID? All of them? If the host node, we need to indicate that this node is not a transit node. That's just policy, I suppose. There's really no need for network nodes or host nodes to have locators (for the moment ignoring what I wrote above about locators being inherent in nodes). Interfaces and aggregations of interfaces are named by locators. Network and host nodes are just there to cut down the number of links (reduce N^2 to N). Maybe it is useful to have unlocator'd nodes. I think I'm rambling. Dave
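(Dave's closing N^2-to-N point in miniature -- a hypothetical sketch: with k interfaces on a multi-homed host, a full interface interconnect needs k*(k-1)/2 arcs, while a single, possibly unlocator'd, host node in the middle needs only k.)

    from itertools import combinations

    # Hypothetical sketch: model a multi-homed host's k interfaces either
    # as a full interconnect or via one host node in the middle.

    def full_interconnect(interfaces):
        return list(combinations(interfaces, 2))   # k*(k-1)/2 arcs

    def via_host_node(interfaces, host="H"):
        return [(host, i) for i in interfaces]     # k arcs

    ifs = ["a:1", "a:2", "a:3", "a:4"]
    print(len(full_interconnect(ifs)))   # 6
    print(len(via_host_node(ifs)))       # 4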