From: Noel Chiappa <jnc@ginger.lcs.mit.edu>
Date: Mon, 3 Jan 94 19:08:02 -0500
To: nimrod-wg@BBN.COM
Subject: New datagram mode

I know everyone's got a lot of back mail, but I'd appreciate it if you
didn't all file that (admittedly long, sigh) message from last week about
the new datagram mode in the round file.... I'd like to think that this has
plugged a major leak in the Nimrod bottle, and I'd like to make this a
major forwarding mode, but so far, I've not gotten any feedback from the WG
at large at all. I don't want to take a major jump like this without *some*
reaction... :-)

	Noel

------------------------------

From: Noel Chiappa <jnc@ginger.lcs.mit.edu>
Date: Tue, 4 Jan 94 00:12:58 -0500
To: nimrod-wg@BBN.COM
Subject: "Virtual Link" flows

One minor thing to beware of: when sending a packet down a virtual link
using source routing, if that virtual link is itself composed of virtual
links, you more or less have to have a real flow for the top-level virtual
link; otherwise you could run into problems with needing a stack of "flow
identifiers" in the packet. (I have said this poorly, I just wanted to note
it down before I forgot it again... I thought of it some months ago and
lost it until just now.)

	Noel

------------------------------

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Date: Tue, 4 Jan 94 14:16:44 JST
To: Noel Chiappa
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
Subject: Re: mobility and NIMROD

A happy new year. I'm back.

> > > Yes, but it's an ungainly mechanism which is not the best way to
> > > accomplish what I want.
> >
> > It is gainful so that no area ID is necessary.
>
> There's no free lunch. If you get an area ID by either i) using the set
> of ID's of all constituents, or ii) using the ID of one of its
> constituents, e.g. the numerically lowest one, you have the problem
> "what do you do when the constituent leaves the area".

That's why I think that mobility should be separated from Nimrod.

> I'd like areas to have identities which are longer-lasting than that of
> any particular constituent. I mean, turning off one router at the top
> level could change the "locator" for millions of hosts underneath it.
> This is not workable.
As I have shown with DNS registration, hosts need to register in DNS only
the direct upper-level routers (except for glue information, in which case
some, though not all, globally reachable paths should be provided).
Moreover, the registration does not have to be changed immediately after
some configuration change, as long as some paths registered in DNS are
still actually usable.

> > I think areas should have identities which are somewhat more durable
> > than the transient list of what border routers it has
> >
> > I think your Internet Draft ... mentions that an area ID may vary when
> > the area is subdivided.
>
> There are two problems with this. First, what do you do when a level 2
> area partitions (I know, I know, you assign the UID's to the networks :-),

This is the case where an area ID made of the set of all the border
routers behaves better than an area ID made of the least EID. With the set
ID, you should try to reach one of the level 2 routers whose EID is listed
in the routing table of the sender. If the routing table contains
separated sets of level 2 border routers, the sending host can locally
recognize that a partitioning has occurred. Then, if the sending host
chooses a router at random, the level 2 router may reply with ICMP (or
something new like that) that some level 1 router is unreachable because
of partitioning, in which case other level 1 routers should be tried. The
sending host may, instead, consult the level 2 routers for the list of
reachable level 1 routers.

> and second (and this is the one that makes me look at it with
> disfavour), all the hosts inside the "new" K level area have new
> locators, which is probably not practical.

Thus, the locator must change dynamically, which is taken care of, quite
naturally, by routing information exchange.

> Probably we'll have to use some other partition repair method, like
> tunnels (a la IS-IS).
>
> I don't know though, now that I think about it: there are probably some
> partitions you can't fix with tunnels. In that case, you almost *have*
> to accept that the topology change will have visible consequences, i.e.
> new locators. Still, that's not so bad; if we have a mobile-host
> mechanism where we can find out that an EID has a new locator.... :-)

I don't think anyone in this WG has worked out a workable mobility scheme.
Please don't expect too much of DNS TNG, unless you engineer the details
of it.

> > Suppose a router crashes, or is removed from service, etc. Is the
> > "area" still the same, or not?
> >
> > Nooo! If an area is split into two areas, it can't be the same, of
> > course.
>
> Sorry, I happen to think it's not reasonable to say that a top-level
> area has fundamentally changed (affecting the millions of hosts inside
> it) because a minor detail of the topology has changed.... Even in the
> case where an area is split into two (perhaps by a permanent partition),
> part of the area ought to be able to get on with life without a big
> upheaval.

I don't think partitioning of an area could be handled with a compact area
ID without a lot of static configuration.

> > If a host H inside Q is using R3 as part of its "locator" for certain
> > conversations, and R3 is turned off, it would be nice if the traffic
> > could flow around the failed router without H having to get a new
> > "locator" and send it along to the hosts which are in communication
> > with it.
> >
> > No problem. We should give all the possible paths including R4 to the
> > lower layer.
>
> Ah, but it's easy to find scenarios in which this doesn't work! Consider
> this one: H is part of an area which is connected to the rest of the
> world through R1. H starts a conversation, using only R1 as the lowest
> layer in its locator; then R2 is brought up and connects the area to the
> rest of the world, after which R1 is shut down.

So, H should register both R1 and R2 in DNS. DNS provides the information
on all the possible paths to H. The selection of the best path is done by
routing tables, ICMP and such, perhaps in the kernel.

> Again, I'd like things to have locators which transcend (to some degree)
> changes in the topology. Obviously, as I have mentioned at length
> previously, as the topology changes, the particular abstraction
> hierarchy you have chosen will become non-optimal (in terms of
> minimizing the sum of costs of the routing), and eventually you will
> want to modify the abstraction hierarchy. However:
>
> >>> I'd like to make the binding between the topology and the abstraction <<<
> >>> hierarchy a little looser than in your scheme, so that the change to  <<<
> >>> the latter to match the former is *controllable*. In system architect <<<
> >>> terms, this is the *fundamental* problem with your idea.              <<<
>
> > > And you have unsolvable problems of EID->locator mapping
> >
> > I keep hearing about this "problem". It's a problem which does not
> > exist except during a deployment phase (when we will see traffic from
> > unmodified hosts which contains only the original 32-bit IP
> > "address").
>
> It does exist forever.
>
> The percentage of traffic for which I will have to perform this
> operation will decline over time to a very small share of the traffic.
> Since it's a small share, it doesn't matter if the mechanism to do it
> isn't very good. Anyway, it's not "unsolvable", it's simply a
> translation directory problem. It's a close match to the "white pages"
> problem, since entities which are neighbours in the directory may be
> otherwise unrelated. The directory must thus be maintained by an
> organization which is equally trusted by both entities.

I'm afraid "white pages" is another bad example of yours. White pages are
books. They are broadcast globally, but quite slowly. Moreover, to
broadcast the white pages information, routing information must be
available along the broadcast paths. Thus, routing information can not be
like "white pages".

> > From where, do you think, can the EID and the locator be obtained?
> >
> > Where means "the EID and the locator" of the information source, of
> > course.
>
> In the same way that you have to have the IP address of some DNS server
> before you can start using the DNS to translate host names to IP
> addresses, you'll have to have the locator (the EID is not strictly
> speaking needed; you can use the "all IP endpoints" broadcast EID) of a
> translation server before you can start looking up (EID, locator)
> tuples.

Good. That's one of the reasons why mobility can not be handled by dynamic
DNS TNG. The address of the translation server must be configured
statically by hand, and can not change dynamically (purely theoretically,
just as with root name servers, it could be configured dynamically,
but...). Broadcasting is not useful except in a very local environment, in
which case you anyway need static configuration somewhere. So, with
broadcasting, I think, you are only replicating the current CATENET model
(or the subnet model).
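A minimal Python sketch of the translation step being argued about may
help; the directory contents, EIDs and locators are invented for
illustration, and nothing here is specified by Nimrod. A flat EID maps to
the list of locators registered for it, so a multi-homed host simply has
several entries:

# Hypothetical translation directory: flat EID -> registered locators.
TRANSLATION_SERVER = {
    "eid-H": ["A.B.R1", "A.B.R2"],   # dual-homed: via borders R1 and R2
    "eid-G": ["A.X.Y"],              # single-homed
}

def lookup_locators(eid):
    """Return all registered locators for an EID ([] if unknown)."""
    return TRANSLATION_SERVER.get(eid, [])

# The directory only enumerates the possibilities; *selecting* among
# them (routing tables, ICMP feedback, timeouts) is a routing matter.
print(lookup_locators("eid-H"))      # ['A.B.R1', 'A.B.R2']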
> Actually, I forgot something in my last message; for multi-homed hosts,
> you will get back an (EID, interface-0-locator, i-1-locator, ...
> i-N-locator) list. (Gee, Dave, S-expressions again! :-)

I'm afraid the expression is O(level^2) long at best. BTW, how, do you
think, could the information be registered in DNS or DNS TNG?

> > That is, EID->locator mapping must be static, which excludes explicit
> > support for mobility.
>
> No. If any host wishes to modify its EID->locator binding (i.e. go
> mobile), the host has to notify all entities with which it is in
> communication of its new locator;

The problem is that, if, by some fault, a mobile host can not notify its
new location before it moves, it can't do so forever, as the new location
is not registered and thus is unreachable.

> i.e. tell them to update their EID->locator binding (I don't really
> think of it as a "mapping", since to me that term implies a more general
> mechanism than is available here.)

So, it is NOT the solution adopted by the mobility WG.

> > The EID and locator are a tuple, and should be carried around
> > together; the binding between them is only broken in special cases
> > (such as mobile hosts).
> >
> > As I have shown in my solution, the binding can not be broken even for
> > mobility.
>
> I don't understand why you can't change the binding? That's the *whole
> point* of a binding; to allow the relationship of the two objects to be
> changed. (Have you read Prof. Saltzer's paper, "On the Naming and
> Binding of Network Destinations"? It is available as RFC-1498. Everyone
> on this mailing list should read it! :-)

So, a name server should provide information for the mapping from a name
to all the possible paths. The best available path should be selected by
routing information.

> > > multi-homed area and mapping for area names
> >
> > I didn't catch what you referred to; could you give a few more
> > details?
> >
> > How, do you think, are area names assigned?
>
> I'm still not sure what you mean, but let me see if this is what you are
> referring to. To use a real example, if an area such as "MIT" is
> connected to several different long-haul carriers (let's call them "A",
> "M" and "S" :-), if you make the MIT area part of just one of those
> areas, it may bias incoming traffic through that long-haul carrier? To
> solve this, you want to make the MIT area appear in all three of the A,
> M and S areas?

Correct.

> If this is what you are referring to, it is a bit of an issue, but as
> you yourself pointed out, solving this by assigning multiple locators
> (which is effectively what you are doing, although I admit you have
> picked a notation which is fairly compressed) can be an exponential
> problem. If each area has R border routers, and is connected to N areas
> above it, the K'th layer will contain (N^K)*R EID's (if my mental math
> hasn't gone wacky).

Correct. That's why multi-homing should be discouraged.
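Noel's mental math does check out; here is a quick sketch of the growth,
with R, N and the depth of course being made-up numbers:

# If each area has R border routers and is connected to N areas above it,
# naming areas by border-router EIDs costs (N^K)*R EIDs at the K'th layer.
R, N = 2, 3                      # hypothetical: 2 borders, 3 parents

for K in range(1, 5):
    print(K, (N ** K) * R)       # 6, 18, 54, 162: exponential in K,
                                 # hence "multi-homing should be
                                 # discouraged"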
> It can also take a tree-structured notation to solve; if carriers A and
> M belong to different global consortia, each will have parent "areas",
> but *none in common*. So, your simple notation won't work any more; each
> K level area will have a disjoint set of higher level areas above it.

My notation still works. If a level K area is multi-homed, level K border
routers should put DNS information about the EIDs of all the border
routers at the directly upper level, at least. For example, for the name
of a triply-homed router at the K-th level:

	. . . . . .

should, at least, be registered in DNS.

> The "multi-homed" site problem is a *routing* problem, and I am
> convinced that any attempt to solve it purely in the addressing will
> fail. The right solution is to use the locator to find out where the
> thing is on the map, and then make sure the map has enough info in it to
> *show* the multi-homing; it's then up to the entity picking the route to
> make use of that multi-homing.

Again, all the possible routes should be registered in DNS (hopefully in
compact fashion, as I have exemplified); selecting the best one among them
is the routing problem.

> > The best border routers will be known through routing information, of
> > course, and the selection will be made from all the possible border
> > routers in the lower-level routing at the source.
>
> Any time you are selecting a set of routers, or a single router, I have
> to ask the question "how do you recover when that router fails".

Rely on ICMP, ICMP TNG or time-out.

> > So, a packet contains only 8 EIDs for the source address.
>
> Well, actually K, where K is the number of layers in the hierarchy. This
> is still a fair amount of data, though... it's a factor of 4 to 8 times
> longer than locators built up out of locally assigned numbers.

True, but only when you enforce a static hierarchy of area IDs, which
should, IMHO, be avoided at any cost. Anyway, I don't think we need more
than 4 layers. With a thousand lower-level areas at each level, we can
accommodate 1 Tera hosts.

> > Also, as noted above, I do think that using a list of the border
> > routers as the identification of an area is not the best choice
> > either.
> >
> > Then, what is the alternative?
>
> You have to assign a label for the identifier from some other namespace,
> and make the area somewhat independent of the constituent elements. As I
> pointed out, for fundamental reasons, this is the *only* viable choice,
> since you need to decouple the abstraction hierarchy from the physical
> topology to some degree.

You might have decoupled the entire issue, but then you must solve all the
issues in all the decoupled parts. So, you should engineer how DNS TNG
could be.

> > Also, I get very uneasy every time people try and stuff this kind of
> > thing into ARP. I'll pass on the ARP lecture for now...
> >
> > I think the solution is neat enough. If some network does not have
> > ARP, it should have some alternative, which should also be simulated
> > by HR.
>
> Why? ARP is a kludge. It's there simply because other things are wrong.
> If physical addresses could be carried in locators, it would never have
> been invented. Why should we not carry physical addresses around in
> locators?

ARP is THE solution within a small area where broadcasting is allowed.

> ARP is popular because *it is another layer of binding*,

Correct. It is a binding at the datalink layer.

> which can be used to provide flexibility which is needed, but not
> otherwise provided! (The Butler Lampson quote about "All problems in
> computer science can be solved by another level of indirection" comes to
> mind!)

So, the important point is that, with globally unique 48-bit MAC addresses
of Ethernet, absolutely NO static settings are necessary at the datalink
layer. That's why I think NBMA, which requires a lot of static
configuration, is nothing. The other important factor of datalink layering
is that selection of alternative paths is not present at that layer, which
simplifies everything.

> However, it is deficient in that it only operates over a limited scope.
> Separate EID and locator namespaces, and a more flexible EID->locator
> binding, would do everything that ARP does, and a lot more besides.
>
> **DOWN WITH ARP**!

As long as we need a datalink layer, we need broadcast ARP.
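The sense in which ARP is "THE solution within a small area" can be put in
a few lines of Python; the segment table and addresses are invented, and
the only point is that resolution is a broadcast query answered by the
owner, which cannot reach past the broadcast domain:

# One datalink segment: every station hears a broadcast query.
SEGMENT = {
    "10.0.0.1": "02:00:00:00:00:01",
    "10.0.0.2": "02:00:00:00:00:02",
}

def arp_resolve(ip):
    """Broadcast "who-has ip"; the owner, if on-segment, replies."""
    for station_ip, mac in SEGMENT.items():   # the broadcast reaches all
        if station_ip == ip:
            return mac
    return None   # off-segment: the broadcast never gets there; a
                  # router (and more configuration) is needed

print(arp_resolve("10.0.0.2"))   # 02:00:00:00:00:02
print(arp_resolve("10.9.9.9"))   # None -- beyond the broadcast scope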
> > Classical SH should be allowed with triangulation, of course.
>
> Apparently there are some problems with this. Current TCP
> implementations do not seem to work well with mobile hosts; as it was
> explained to me, there are problems with various timing and
> retransmission algorithms getting confused by lost packets, timing
> variances, etc.

The issue, if any, must be solved with policy-rich routing anyway. With a
sender-initiated policy, quadrilateral routing will be common.

> > A solution like smip has been the only engineeringly plausible
> > solution to me as a DNS engineer
>
> There are some DNS experts (Rob Austein, Dave Bridgham, etc) who are
> part of this WG who don't think this is all totally crazy, so I'm a bit
> surprised to hear this, although I'm quite ready to believe you.

I think that in the DNS WG, after the Amsterdam IETF, I have presented
enough evidence on why, contrary to common belief, mobility can not be
handled with a conceptual DNS TNG.

> However, we built DNS, and if it's wrong, we'll change it. If there are
> fundamental problems, we do need to think about them, of course, and
> perhaps change course as a result. So, I'd like to hear what you think
> the problems are.

The fundamental reason why DNS (or DNS TNG) can not be so useful is that
it can, at best, provide information on the name->EID/locator binding at
the time of the query (caching makes the matter worse, but that is not the
essential point).

On the other hand, mobility and routing must be able to handle changes of
the name->EID/locator binding even after the name lookup.

						Masataka Ohta

------------------------------

From: John Curran <jcurran@nic.near.net>
Date: Tue, 04 Jan 1994 01:34:27 -0500
To: Masataka Ohta
Cc: Noel Chiappa, nimrod-wg@nic.near.net
Subject: Re: mobility and NIMROD

] From: Masataka Ohta
] Subject: Re: mobility and NIMROD
] Date: Tue, 4 Jan 94 14:16:44 JST
] ...
] Anyway, I don't think we need more than 4 layers. With a thousand
] lower-level areas at each level, we can accommodate 1 Tera hosts.

Please do not assume uniform distributions. Utilizations of .1 % to 10 %
are much more realistic, depending upon the administrative and operational
backpressure applied via policies.

] > [Noel]
] > You have to assign a label for the identifier from some other
] > namespace, and make the area somewhat independent of the constituent
] > elements. As I pointed out, for fundamental reasons, this is the
] > *only* viable choice, since you need to decouple the abstraction
] > hierarchy from the physical topology to some degree.
]
] You might have decoupled the entire issue, but, then, you must solve
] all the issues in all the decoupled parts.
]
] So, you should engineer how DNS TNG could be.

Functional separation of the entire issue allows consideration of several
different approaches to the problem in side-by-side comparison. There is
no reason why home server, DNS TNG, hierarchical multicast, and
out-of-band communication models cannot be considered for the binding
function. Please do not presume that the identification of EID and locator
elements mandates DNS TNG for binding.
] The fundamental reason why DNS (or DNS TNG) can not be so useful is
] that it can, at best, provide information on the name->EID/locator
] binding at the time of the query (caching makes the matter worse, but
] that is not the essential point).
]
] On the other hand, mobility and routing must be able to handle changes
] of the name->EID/locator binding even after the name lookup.

I am not a fan of DNS for the role of locator binding, but I will point
out that there is a recent DNS Dynamic Update ID which addresses many of
these issues. If you'd prefer doing it with home mobility servers, then
simply replace the home address with an EID and the current address with
the current locator, etc. Do you see any reason why the separation of EIDs
and locators prevents use of a mobility-server based scheme for binding?

/John

------------------------------

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Date: Tue, 4 Jan 94 15:47:13 JST
To: John Curran
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@nic.near.net
Subject: Re: mobility and NIMROD

> ] Anyway, I don't think we need more than 4 layers. With a thousand
> ] lower-level areas at each level, we can accommodate 1 Tera hosts.
>
> Please do not assume uniform distributions. Utilizations of .1 % to
> 10 % are much more realistic, depending upon the administrative and
> operational backpressure applied via policies.

So? 0.1% of 1 Tera is 1 Giga.

> ] You might have decoupled the entire issue, but, then, you must solve
> ] all the issues in all the decoupled parts.
> ]
> ] So, you should engineer how DNS TNG could be.
>
> Functional separation of the entire issue allows consideration of
> several different approaches to the problem in side-by-side comparison.
> There is no reason why home server, DNS TNG, hierarchical multicast, and
> out-of-band communication models cannot be considered for the binding
> function. Please do not presume that the identification of EID and
> locator elements mandates DNS TNG for binding.

I don't presume so. Any solution which actually works is OK. So, what is
the workable alternative?

> ] On the other hand, mobility and routing must be able to handle changes
> ] of the name->EID/locator binding even after the name lookup.
>
> I am not a fan of DNS for the role of locator binding, but I will point
> out that there is a recent DNS Dynamic Update ID which addresses many of
> these issues.

The ID is mainly on incremental transfer. So far, so good. But it poorly
addresses the issue of dynamic update, which, I think, should be removed.
Though I have requested the definition of "dynamic update" several times,
none has been given. So, recently, in the DNS WG, I have myself worked out
what the definition (and engineering solution) should be, and why it can't
be applicable to mobility.

> If you'd prefer doing it with home mobility servers, then

Doing what?

> simply replace the home address with an EID and the current address with
> the current locator, etc.

That's what is virtually done with smip.
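John's substitution is easy to sketch; a hypothetical mobility server
keyed by EID rather than home address (all names invented, and no claim
that this is how smip or any mobility WG scheme actually works):

# Mobile-IP style home server, with the home address replaced by an EID
# and the care-of address by the current locator.
class MobilityServer:
    def __init__(self):
        self.binding = {}                 # EID -> current locator

    def register(self, eid, locator):
        """Mobile host reports its new locator after each move."""
        self.binding[eid] = locator

    def resolve(self, eid):
        return self.binding.get(eid)      # None if never registered

server = MobilityServer()
server.register("eid-42", "A.B.C")        # host at home
server.register("eid-42", "X.Y.Z")        # host moved; binding updated
print(server.resolve("eid-42"))           # X.Y.Z

# Ohta's objection survives the sketch: if the host loses connectivity
# *before* re-registering, the stored locator is stale and the host is
# unreachable until it manages to register again.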
> Do you see any reason why the separation of EIDs and locators prevents
> use of a mobility-server based scheme for binding?

"Prevents"? Since Nimrod and mobility are orthogonal and totally
unrelated, nothing will be prevented.

						Masataka Ohta

------------------------------

From: John Curran <jcurran@nic.near.net>
Date: Tue, 04 Jan 1994 02:02:52 -0500
To: Masataka Ohta
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@nic.near.net
Subject: Re: mobility and NIMROD

] From: Masataka Ohta
] Subject: Re: mobility and NIMROD
] Date: Tue, 4 Jan 94 15:47:13 JST
]
] > ] Anyway, I don't think we need more than 4 layers. With a thousand
] > ] lower-level areas at each level, we can accommodate 1 Tera hosts.
] >
] > Please do not assume uniform distributions. Utilizations of .1 % to
] > 10 % are much more realistic, depending upon the administrative and
] > operational backpressure applied via policies.
]
] So? 0.1% of 1 Tera is 1 Giga.

My apologies, I was not sufficiently clear. At _each_
allocation/delegation level, you can expect utilizations in the .1 % to
10 % range (depending on the policies which are followed). I would not
underestimate the waste of functional identifier space that results from
political and administrative loss.

/John
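The arithmetic behind this caution, assuming (purely for illustration)
four levels of a thousand slots each:

# Nominal capacity is 1000^4 = 10^12 ("1 Tera hosts"), but utilization
# compounds per level: with fraction u usable at each of the 4 levels,
# the effective capacity is (1000*u)^4, not 10^12 * u.
for u in (0.10, 0.01, 0.001):
    print(u, int((1000 * u) ** 4))
# 0.10  -> 100000000   (10% per level leaves 10^8 hosts)
# 0.01  -> 10000       (1% per level leaves 10^4 hosts)
# 0.001 -> 1           (0.1% per level exhausts the space)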
------------------------------

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Date: Tue, 4 Jan 94 18:04:53 JST
To: Noel Chiappa
Cc: nimrod-wg@BBN.COM
Subject: Re: New datagram mode

> I mentioned in some mail to the IETF list that I have a new idea for
> how to do the datagram (i.e. non-flow) mode in Nimrod.

I thought loose source routing is the way to go.

> I think it produces a more efficient datagram mode than I had hitherto
> imagined. In fact, it may be even more efficient than the existing
> "hop-by-hop" model!

Strange. Why, do you think, is next-hop decision by flow ID more efficient
than next-hop decision by the EID of the destination (or, in the source
routing case, of the next intermediate router)?

> Source routes using these virtual entities can be guaranteed to be
> non-looping overall by a recursion process: the top-level source route
> does not contain loops (at least, unless the source is stupid, and
> chooses a path with one :-); each entity which advertises a virtual link
> is required to make sure that the implementation of that entity does not
> create a loop.

I think you are assuming that all the routers in the world are producing
correct information about the topology with which a source route path is
selected, and that all the routers on the path are implemented without any
fault.

> On thinking about the details, an obvious mechanism suggested itself.
> If all packets contain a "flow-id" field, *which will be unused in
> datagram packets*, the obvious thing is to store the flow-id of the VLF
> in that field.

As a virtual link is merely loosely source routed and has hierarchical
structure, you can override the flow-id field only at the top level.
Anyway,

> I realized that one minor bug is that if we define the global flow-id in
> the packet to be the source EID plus a source-local flow-id, this idea
> doesn't work (since you'd need to bash the source EID to the router at
> the start of the VLF, or something), so perhaps the flow-id field can't
> overload the source EID.

I think the bug is fatal.

> (It turns out that flow-setup can even provide "denial of service"
> protection,

It can't. An attacker can advertise false routing information with which a
source route could be constructed. The attacker will then drop the actual
data packets on the advertised route.

						Masataka Ohta

------------------------------

From: Christian Huitema
Date: Tue, 04 Jan 1994 14:40:03 +0100
To: Noel Chiappa
Cc: nimrod-wg@BBN.COM
Subject: Re: Maps and meshes in the real world

Noel,

I have some sympathy for your "mesh" approach, but I believe we should be
a bit careful with the advances in technology. I was recently conducting a
review of the state of the art in IGPs. Well, both RIP and OSPF have one
weak point in common -- they don't handle asymmetric links. So what? Well,
what is a mesh in 10 years? Something like fibers and colors, I guess.
Chances are it will be widely asymmetric, e.g. you get a laser which is
tuned to exactly one emit wavelength, and a set of filters that listen to
another set. Not the complete set, that would be too expensive; some form
of routing can be used to relay from one wavelength to another. The bad
news is that neither RIP nor OSPF can handle this kind of map!

Christian Huitema
------------------------------

From: Noel Chiappa <jnc@ginger.lcs.mit.edu>
Date: Tue, 4 Jan 94 10:04:26 -0500
To: mohta@necom830.cc.titech.ac.jp
Cc: nimrod-wg@BBN.COM
Subject: Re: New datagram mode

    I thought loose source routing is the way to go.

Well, I reckon we should still have source routing (strict and loose),
since it has capabilities this new mode lacks, but as I pointed out in the
note, source routing does have real disadvantages, and this mode is an
attempt to avoid them.

    > it may be even more efficient than the existing "hop-by-hop" model!

    Strange. Why, do you think, is next-hop decision by flow ID more
    efficient than next-hop decision by the EID of the destination (or,
    in the source routing case, of the next intermediate router)?

First, I didn't think anyone was proposing doing routing based on the EID.
Since there is no topological information in the EID, that would require
either i) some setup in the routers which are expected to handle packets
to that EID, which would effectively be a flow setup with the flow-id
being the destination EID, or ii) a translation step (from EID to
locator), which would be very inefficient.

Even with shortish fixed-length locators (a la SIP), which are simply
looked up in a routing table, this new mode would *still* produce more
efficient forwarding (in the non-active routers, once the DMF has actually
been set up). Such hop-by-hop routing uses a "longest match" lookup in the
routing table, whereas if the flow-id is at a fixed offset in the packet,
a more efficient lookup and forwarding, perhaps even purely in hardware,
is possible.

Source routing requires looking around in a variable-length object to
locate the next thing to route to, and once you have done that you still
have to go through the equivalent of either of the two options above, so I
can't see how it's any improvement.

    > Source routes using these virtual entities can be guaranteed to be
    > non-looping overall by a recursion process

    I think you are assuming that all the routers in the world are
    producing correct information about the topology with which a source
    route path is selected, and that all the routers on the path are
    implemented without any fault.

The first problem will be detected when the active router goes to set up
the DMF flow; if it tries to use a link which doesn't exist, etc, it
should get back an error message. The second problem can to some degree be
avoided; as I said:

    It turns out that flow-setup can even provide "denial of service"
    protection, and although each individual active router along the path
    can provide denial of service protection on its part of the path,
    this still does not give complete end-to-end denial of service
    protection, since a compromised active router can deny service.

"Denial of service" attacks refer to some intermediate router failing,
either accidentally or deliberately, to forward packets as it claims it
can and will. The "deliberate nastiness" model is useful to think about,
since you don't have to think about failure probabilities; it's 100%
certain that the worst case humans can think of will happen. Note,
however, that systems which have good denial-of-service protection can
still fail, since the real world almost always creates scenarios which
even the most rugged systems have not protected against...

A way to avoid denial of service is for the entity which set up the flow
to send a certain % of traffic which is "test" traffic. This is traffic
which *looks* to intermediate nodes like real user traffic, but by
prearrangement between the ends is not. (If you're clever, you can even
use real traffic for this, but that intertwines network-level
denial-of-service-detection mechanisms in with the application....) The
ends use this to make sure their traffic is getting through. If more than
a certain % is lost (or all of it, if the d-o-s guy is being dumb :-), you
set up a new flow which takes a different path. There are strategies
(binary search type things) to quickly discover a single node which is the
problem, etc.

As I said, these techniques can be applied by active routers to discover
malfunctioning non-active routers, and set up a new DMF which bypasses the
trouble, but a broken/compromised active router will still cause failures.
That's the inevitable price of not "doing it all at the source".
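One of those "binary search type things", sketched in Python under the
assumption that the flow endpoint, having noticed end-to-end loss, can
source-route test traffic through an arbitrary prefix of the path; the
path and the failure model are invented:

# Localize a silently-dropping hop with O(log n) probes instead of n.
PATH = ["r1", "r2", "r3-bad", "r4", "r5"]

def probe(prefix_len):
    """True if a test packet survives the first prefix_len hops."""
    return all(not r.endswith("-bad") for r in PATH[:prefix_len])

def first_bad_hop():
    lo, hi = 0, len(PATH)       # invariant: prefix lo works, prefix hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if probe(mid):
            lo = mid
        else:
            hi = mid
    return PATH[hi - 1]

print(first_bad_hop())          # r3-bad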
Even this level of protection still costs (in complexity and packet
traffic), but that's a local cost/benefit tradeoff, not a system
architecture question. It is also better than what you get with
hop-by-hop, though...

However, all this said, if you start thinking about arbitrary
implementation errors, almost any scheme can be made to fail. The most
resilient scheme against errors appears to be end-end flow setup, which
can even work in the face of fairly arbitrary denial-of-service attacks in
the middle, although obviously not ones which leave no viable path between
the source and destination.

    > If all packets contain a "flow-id" field, *which will be unused in
    > datagram packets*, the obvious thing is to store the flow-id of the
    > VLF in that field.

    As a virtual link is merely loosely source routed and has
    hierarchical structure, you can override the flow-id field only at
    the top level.

I'm not sure I quite followed this? If I understand your meaning, you
refer to the problem I alluded to in that brief message recently, where I
mentioned that unless high-level virtual links are actually instantiated
as flows, you could require a "stack" of flow-ids (or virtual link id's in
the source route) in the packets. Also, when I say "LSR", I don't mean
that there are parts of the path which are not specified, only that the
path is specified in terms of high-level virtual entities, not actual
physical resources.

    > one minor bug is that if we define the global flow-id in the packet
    > to be the source EID plus a source-local flow-id, this idea doesn't
    > work (since you'd need to bash the source EID to the router at the
    > start of the VLF, or something), so perhaps the flow-id field can't
    > overload the source EID.

    I think the bug is fatal.

No, the only thing that happens is your packet format has to not include
the "source EID" field in the "flow-id" semantics. Globally unique
flow-ids are nice, since they avoid either i) having to agree over some
scope on a flow-id allocation, or ii) purely local flow-ids which get
changed in the packet at each hop, a la X.25. If you still want packets to
contain globally unique flow-ids, and you decide the easiest way to create
those globally unique flow-ids is to concatenate some globally unique
label (e.g. an EID) with a local flow-id, that just means the "flow-id"
field in the packet has to include space for an EID as well as a local
flow-id. So the packets get a little bigger. Bandwidth is going to be
cheap...
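A sketch of that layout, with invented field sizes: the flow-id field
simply grows to hold an EID plus a local id, and sits at a fixed offset so
a forwarder never parses a variable-length object:

# Hypothetical packet header: 8-byte EID + 32-bit source-local flow-id.
import struct

HEADER = struct.Struct("!8sI")

def make_flow_id(source_eid: bytes, local_id: int) -> bytes:
    return HEADER.pack(source_eid, local_id)

def classify(packet: bytes):
    """A forwarder slices the fixed-offset field; no variable parsing."""
    eid, local_id = HEADER.unpack_from(packet, 0)
    return eid, local_id

pkt = make_flow_id(b"\x02" * 8, 7) + b"payload"
print(classify(pkt))    # (b'\x02...\x02', 7)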
    > (It turns out that flow-setup can even provide "denial of service"
    > protection,

    It can't. An attacker can advertise false routing information with
    which a source route could be constructed. The attacker will then
    drop the actual data packets on the advertised route.

As I pointed out, the ends can detect this and, using the topology map,
create a new path which avoids the attacker. See Radia Perlman's PhD
thesis for all sorts of other ways to secure map-distribution based
routing systems against hostile attack. Flow-setup/MD systems can be made
extraordinarily resistant to attack, which is one of their chief
attractions....

	Noel

------------------------------

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Date: Wed, 5 Jan 94 2:40:07 JST
To: Noel Chiappa
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
Subject: Re: New datagram mode

> > I thought loose source routing is the way to go.
>
> Well, I reckon we should still have source routing (strict and loose),
> since it has capabilities this new mode lacks, but as I pointed out in
> the note, source routing does have real disadvantages, and this mode is
> an attempt to avoid them.

Really?

> > > it may be even more efficient than the existing "hop-by-hop" model!
> >
> > Strange. Why, do you think, is next-hop decision by flow ID more
> > efficient than next-hop decision by the EID of the destination (or, in
> > the source routing case, of the next intermediate router)?
>
> First, I didn't think anyone was proposing doing routing based on the
> EID.

I think I am. If a source route is like A.B.C.D..., A should know the
route to B, B should know the route to C, and so on. In general, border
routers at level K should know the paths to border routers at level K-1.

> Since there is no topological information in the EID, that would require
> either i) some setup in the routers which are expected to handle packets
> to that EID, which would effectively be a flow setup with the flow-id
> being the destination EID, or ii) a translation step (from EID to
> locator), which would be very inefficient.

As the routing table gives topological information on EIDs, hosts can
construct a source route.

> Even with shortish fixed-length locators (a la SIP), which are simply
> looked up in a routing table, this new mode would *still* produce more
> efficient forwarding (in the non-active routers, once the DMF has
> actually been set up). Such hop-by-hop routing uses a "longest match"
> lookup in the routing table, whereas if the flow-id is at a fixed offset
> in the packet, a more efficient lookup and forwarding, perhaps even
> purely in hardware, is possible.

So, simple routing with a plain EID is just as fast as routing with a flow
ID.

> Source routing requires looking around in a variable-length object

I don't think it hurts performance.

> A way to avoid denial of service is for the entity which set up the flow
> to send a certain % of traffic which is "test" traffic.

Aren't you assuming that there is a certain amount of traffic between all
the pairs of routers? With connectionless communication, packet exchange
along a certain link could be quite infrequent.
> > > If all packets contain a "flow-id" field, *which will be unused in
> > > datagram packets*, the obvious thing is to store the flow-id of the
> > > VLF in that field.
> >
> > As a virtual link is merely loosely source routed and has hierarchical
> > structure, you can override the flow-id field only at the top level.
>
> I'm not sure I quite followed this? If I understand your meaning, you
> refer to the problem I alluded to in that brief message recently, where
> I mentioned that unless high-level virtual links are actually
> instantiated as flows, you could require a "stack" of flow-ids (or
> virtual link id's in the source route) in the packets.

Oops, I failed to note your short mail, sorry.

Anyway, as the "stack" of flow-ids is just as bad (not so bad, I think) as
variable-length source routing, it can not be your choice. Then, the
question is: how many instantiated flows will there be in the entire net?

> > (It turns out that flow-setup can even provide "denial of service"
> > protection,
> >
> > It can't. An attacker can advertise false routing information with
> > which a source route could be constructed. The attacker will then drop
> > the actual data packets on the advertised route.
>
> As I pointed out, the ends can detect this and, using the topology map,
> create a new path which avoids the attacker.

I think the end can detect the service denial with connection-oriented
communication only.

						Masataka Ohta

------------------------------

From: Noel Chiappa <jnc@ginger.lcs.mit.edu>
Date: Wed, 5 Jan 94 02:09:51 -0500
To: mohta@necom830.cc.titech.ac.jp
Cc: nimrod-wg@BBN.COM
Subject: Re: New datagram mode

    > Well, I reckon we should still have source routing (strict and
    > loose), since it has capabilities this new mode lacks, but as I
    > pointed out in the note, source routing does have real
    > disadvantages, and this mode is an attempt to avoid them.

    Really?

I'm a little confused as to exactly what in that paragraph caused the
"Really?" response. You don't believe there are disadvantages? I listed
the disadvantages in the original note. They were:

    First, the source (or some agent thereof) has to compute such source
    routes, so even if they are cached, there is some inefficiency there.
    Second, they have to be carried in the packets, making them more
    expensive to construct at the source, as well as bulkier. Third, the
    processing of [source-routed] packets in network switches is almost
    inevitably more inefficient.

You don't believe this new mode solves them? The route calculation is now
distributed, which takes care of the first; there are no source routes in
packets, which fixes the second; etc.

    > First, I didn't think anyone was proposing doing routing based on
    > the EID.

    I think I am. If a source route is like A.B.C.D..., A should know the
    route to B, B should know the route to C, and so on.

If I understand you correctly, this refers to this idea of yours of
identifying areas with the EID's of border routers. I'm speaking of
routing on the destination EID only.

    > Since there is no topological information in the EID

    As the routing table gives topological information on EIDs, hosts can
    construct a source route.

You must be working with a very different definition of "EID" from the
rest of us. It is a flat, effectively random (from the point of view of
the network) number, like an Ethernet 48-bit hardware address.
There is *no way* for the "routing table" (I detest the term, since it
smacks of the DV and hop-by-hop view of the world, which I reckon is
headed for the junk-heap of history) to give topological information about
an EID *directly*. (If the EID is first mapped into a topologically
significant name, the locator, the routing table can tell you something
about the locator, but that doesn't count.)

    So, simple routing with a plain EID is just as fast as routing with a
    flow ID.

You can't "route" from a plain EID. The EID cannot be looked up in a
routing table. (Also, you don't "route" with flow ID's, in the sense of
computing a route; you only "forward" based on them. The IETF has started
to use the two terms to distinguish between the process of deciding on
routes, and actually passing user data packets through. Of course, in the
hop-by-hop model, forwarding does involve a routing step.)

    > A way to avoid denial of service is for the entity which set up the
    > flow to send a certain % of traffic which is "test" traffic.

    Aren't you assuming that there is a certain amount of traffic between
    all the pairs of routers?

Only between pairs of routers which wish to protect themselves against
denial-of-service attacks. Like much else in Nimrod, this is a
cost/benefit knob that is set locally, to allow the users to make the
decision on what quality of service they want.

    Then, the question is: how many instantiated flows will there be in
    the entire net?

Actually, the important question is the number of simultaneously active
(i.e. set up) flows in a given router. If you look at the system as a
whole, there is a curve, with the number of active flows on one axis, and
the number of routers with that many flows on the other. The shape of the
curve, and the way it is changing over time, is as important as numbers
like the average and the worst case.

This question was discussed on the Big-Internet mailing list some time
back. Space doesn't permit retelling the whole thing here, but as you will
no doubt recall, I mentioned the result that O(N^2) growth does not seem
feasible.

    > the ends [of the flow] can detect this, and, using the topology
    > map, create a new path which avoids the attacker.

    I think the end can detect the service denial with
    connection-oriented communication only.

It all depends on how much overhead you are willing to accept. For things
which "have to work", you can send test traffic. But that's silly in most
cases... If your application wants to send one packet, and get no
acknowledgement, then even a random packet loss could disrupt the
application. If it's a single request and response, the lack of a response
should tell you something's wrong. If you retry several times, and nothing
comes back, then perhaps it's time to try something more robust, like a
source-routed packet (at least, if the application is important).

	Noel
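That retry-then-escalate policy can be put in a toy Python sketch
(send_datagram and send_source_routed are placeholders, not real Nimrod
primitives, and the loss model is just random):

import random

def send_datagram(request):
    """Cheap datagram path; may silently lose the packet."""
    return "reply" if random.random() > 0.5 else None

def send_source_routed(request):
    """Costlier source-routed path which avoids the broken spot."""
    return "reply"

def robust_request(request, retries=3):
    for _ in range(retries):
        reply = send_datagram(request)
        if reply is not None:       # lack of a response is the only
            return reply            # failure signal available
    return send_source_routed(request)   # escalate only if it matters

print(robust_request("ping"))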
------------------------------

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Date: Wed, 5 Jan 94 17:08:16 JST
To: Noel Chiappa
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
Subject: Re: New datagram mode

> > First, the source (or some agent thereof) has to compute such source
> > routes, so even if they are cached, there is some inefficiency there.
> > Second, they have to be carried in the packets, making them more
> > expensive to construct at the source, as well as bulkier. Third, the
> > processing of [source-routed] packets in network switches is almost
> > inevitably more inefficient.
>
> You don't believe this new mode solves them? The route calculation is
> now distributed, which takes care of the first; there are no source
> routes in packets, which fixes the second; etc.

The first one, the distributed calculation of routes, only makes the load
of intermediate routers heavier.

The second one I can't understand. How can a route be determined with a
single flat EID? Doesn't your packet contain something like layers of area
IDs, which makes the packet bulky?

> > > First, I didn't think anyone was proposing doing routing based on
> > > the EID.
> >
> > I think I am. If a source route is like A.B.C.D..., A should know the
> > route to B, B should know the route to C, and so on.
>
> If I understand you correctly, this refers to this idea of yours of
> identifying areas with the EID's of border routers.

Yes, I'm speaking of loose source routing by EIDs of intermediate routers.

> I'm speaking of routing on the destination EID only.

I don't think you can forward any packet with the EID only. Aren't you
assuming some structure in the EID?

> You must be working with a very different definition of "EID" from the
> rest of us. It is a flat, effectively random (from the point of view of
> the network) number, like an Ethernet 48-bit hardware address.

To me, EID is flat, of course.

> There is *no way* for the "routing table" (I detest the term, since it
> smacks of the DV and hop-by-hop view of the world, which I reckon is
> headed for the junk-heap of history) to give topological information
> about an EID *directly*. (If the EID is first mapped into a
> topologically significant name, the locator, the routing table can tell
> you something about the locator, but that doesn't count.)

Just as the current routing table gives information for next-hop
determination to various networks, a Nimrod routing table could give
information for next-hop determination to various areas. I think areas
should be represented with the EIDs of their border routers; thus, the
routing table could be indexed with EIDs of border routers, though that is
not essential. So, more generally,

> As the routing table gives topological information on EIDs, hosts can
> construct a source route.

could be rephrased as: as the routing table gives topological information
on area IDs, hosts can construct a source route consisting of area IDs.

> > So, simple routing with a plain EID is just as fast as routing with a
> > flow ID.
>
> You can't "route" from a plain EID. The EID cannot be looked up in a
> routing table.

I'm assuming that routers have a routing table indexed by the EIDs of
certain border routers (those of the upper-level areas and the direct
lower-level areas) and the EIDs of directly reachable endhosts (that is,
endhosts on the same datalink layer). No, the table does not contain the
EIDs of all the hosts on the net, of course. So, the table size is not so
large.
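A sketch of the kind of table being assumed here (topology and names all
invented): entries exist only for certain border routers and directly
attached endhosts, so a loose source route of border-router EIDs can be
forwarded EID by EID:

# Next-hop entries indexed by border-router / neighbor EIDs -- not by
# every EID on the net.
ROUTING_TABLE = {
    "eid-B1": "if0",        # our level-1 border
    "eid-B2": "if1",        # a level-2 border reachable from here
    "eid-H9": "direct",     # endhost on the same datalink
}

def next_hop(eid):
    return ROUTING_TABLE.get(eid)   # None: not a border we track

# Each router along the way needs only the next EID in the chain.
source_route = ["eid-B1", "eid-B2", "eid-H9"]
print([next_hop(e) for e in source_route])   # ['if0', 'if1', 'direct']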
> > A way to avoid denial of service is for the entity which set up the
> > flow to send a certain % of traffic which is "test" traffic.
> >
> > Aren't you assuming that there is a certain amount of traffic between
> > all the pairs of routers?
>
> Only between pairs of routers which wish to protect themselves against
> denial-of-service attacks. Like much else in Nimrod, this is a
> cost/benefit knob that is set locally, to allow the users to make the
> decision on what quality of service they want.

What? Then how can intermediate routers forward packets without a lot of
effort for flow setup?

Aren't you suggesting a trade-off between:

	connectionless communication through routers fully connected at
	run time

and

	connectionless communication through routers fully connected in
	advance?

I think both will result in O(N^2) behaviour.

> > Then, the question is: how many instantiated flows will there be in
> > the entire net?
>
> Actually, the important question is the number of simultaneously active
> (i.e. set up) flows in a given router. If you look at the system as a
> whole, there is a curve, with the number of active flows on one axis,
> and the number of routers with that many flows on the other. The shape
> of the curve, and the way it is changing over time, is as important as
> numbers like the average and the worst case.

I don't think you can assume any "flow" for connectionless communication.

> This question was discussed on the Big-Internet mailing list some time
> back. Space doesn't permit retelling the whole thing here, but as you
> will no doubt recall, I mentioned the result that O(N^2) growth does not
> seem feasible.

At least, even with your logic:

> Only between pairs of routers which wish to protect themselves against
> denial-of-service attacks. Like much else in Nimrod, this is a
> cost/benefit

if a user wants full protection against service denial, it is O(N^2).

> > the ends [of the flow] can detect this, and, using the topology map,
> > create a new path which avoids the attacker.
> >
> > I think the end can detect the service denial with
> > connection-oriented communication only.
>
> It all depends on how much overhead you are willing to accept. For
> things which "have to work", you can send test traffic. But that's
> silly in most cases...

It's much more reasonable to expect end-end protection within each
application on the individual hosts. Aggregated protection through
intermediate routers can't be so meaningful.

> If your application wants to send one packet, and get no
> acknowledgement, then even a random packet loss could disrupt the
> application. If it's a single request and response, the lack of a
> response should tell you something's wrong. If you retry several times,
> and nothing comes back, then perhaps it's time to try something more
> robust, like a source-routed packet (at least, if the application is
> important).

So, the protection, if any, should be application-wise.
						Masataka Ohta

------------------------------

From: Frank Kastenholz <kasten@ftp.com>
Date: Wed, 5 Jan 94 09:30:50 -0500
To: mohta@necom830.cc.titech.ac.jp
Cc: jcurran@nic.near.net, jnc@ginger.lcs.mit.edu, nimrod-wg@nic.near.net
Subject: Re: mobility and NIMROD

> > ] Anyway, I don't think we need more than 4 layers. With a thousand
> > ] lower-level areas at each level, we can accommodate 1 Tera hosts.
> >
> > Please do not assume uniform distributions. Utilizations of .1 % to
> > 10 % are much more realistic, depending upon the administrative and
> > operational backpressure applied via policies.
>
> So? 0.1% of 1 Tera is 1 Giga.

Please do not assume a fixed number of hierarchies. If a protocol assumes
a fixed number of hierarchies then, within 10 years, we will have to fix
it. I do not want to go through this all again.

--
Frank Kastenholz
FTP Software
2 High Street
North Andover, Mass. USA 01845
(508)685-4000

------------------------------

From: Frank Kastenholz <kasten@ftp.com>
Date: Wed, 5 Jan 94 09:30:52 -0500
To: mohta@necom830.cc.titech.ac.jp
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
Subject: Re: New datagram mode

> > I think it produces a more efficient datagram mode than I had
> > hitherto imagined. In fact, it may be even more efficient than the
> > existing "hop-by-hop" model!
>
> Strange.
>
> Why, do you think, is next-hop decision by flow ID more efficient than
> next-hop decision by the EID of the destination (or, in the source
> routing case, of the next intermediate router)?

There probably will be >1 destination EID 'reachable' for a given flow. If
there are packets going to N different EIDs that all can use the same
flow, then you can achieve an N:1 improvement in the resources required to
store the routing information to those EIDs.

--
Frank Kastenholz
FTP Software
2 High Street
North Andover, Mass. USA 01845
(508)685-4000
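Frank's N:1 point in a few lines of illustrative Python: many destination
EIDs keyed to one flow leave the forwarder with a single entry of state
(all values invented):

# N destination EIDs share one flow; the forwarder needs one flow entry
# instead of N per-EID routes.
flow_of_eid = {f"eid-{i}": "flow-7" for i in range(1000)}   # N -> 1
flow_table = {"flow-7": "if3"}        # the only forwarding state needed

print(len(flow_table), "flow entry serves", len(flow_of_eid), "EIDs")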
> It was for this reason that I was thinking that perhaps the only non-flow mode in Nimrod would be a "source-route in packet" mode. This guaranteed non-looping paths, but without having to guarantee the near-global consistency in databases, etc, etc.

I'm not sure exactly what Noel means by "global" in this paragraph. Does it mean global in that _all_ routers _everywhere_ must be consistent? Or does it mean that all routers in any one area must be consistent?

I tend to believe the latter. Routers in a given area need only understand how to get to the border-routers of their area, and to the border-routers of contained areas.

> For efficient handling of SRD's, I had imagined that i) the packet would contain a pointer into the source route, and ii) routers would maintain, for each virtual link, a pre-setup flow which instantiates the virtual link (hereinafter the "VLF", for 'virtual link flow'). When an SRD shows up at the router at the start of the VLF, it is somehow "associated" with the VLF until it gets to the end of it, at which time the source route is consulted, and the packet is routed onto the next VLF. (Obviously, physical links are just done as a one-shot, without the necessity of a flow.)

Just to be sure that I understand: I'd specify a source route of the form A-B-C-D... and there would be a 'pre-set-up' flow in the various areas, one between A and B, another between B and C and so on?

> The packet contains, in addition to a mungable flow-id field, the source and destination locators, and a pointer into the locator. The idea (borrowed from PIP :-) is that the pointer starts out at the lowest level of the source locator, and moves up that locator, then across to the destination locator, and then down. In addition to these extra fields in the packet, all routers have to contain a minimal set of "pre-setup" flows to certain routers which are at critical places in the abstraction hierarchy.

Doesn't this conflict with your earlier note about maps and meshes? By allowing the notion of "critical places" you then allow the notion of "single point of failure". If a router at a critical place fails then, by definition (since the place/router is critical) you have a failure with major impact on the network.

> While going up the source locator, each "active" router (i.e. one that actually makes a decision about where to send the packet, as opposed to handling it as part of a flow) selects a DMF which will take the packet to the "next higher" level object in the source locator, advances the pointer, and sends the packet off along that flow. When it gets to the end of that flow, the process repeats, until the packet reaches a router which is at the least common intersection of the two locators (i.e. for A.P.Q.R and A.X.Y.Z, this would be when the packet reaches A).
>
> The process then inverts, with each active router selecting a DMF which takes the packet to the next lower object in the destination locator. So, A would select a flow to A.X, and once it got to A.X, A.X would select a flow to A.X.Y, etc.
>
> This mode would have almost none of the disadvantages of SRD, since the source doesn't have to compute a route, and there is no source route in the packet, just the source and destination locator (and the source locator is useful to have anyway when the packet gets to the ultimate destination, to allow a reply to be sent easily).
> Again, in a world with resource-allocation going on, that DMF would have a resource limit associated with it, which would prevent pure datagram traffic from interfering with other resource allocations.

I read these paragraphs as saying that the source route is "deduced" from the source and destination locators. If the source locator is A.B.C and the destination locator is X.Y.Z then the source route between the two is deduced to be A.B.C - A.B - A - X - X.Y - X.Y.Z. No? Isn't this "no brainer routing"?

> It might be possible to remove the "hop count" field, since there are now some fairly strong guarantees that traffic will not loop, but it might be useful to leave it in as an additional safety measure against unforeseen failure modes. Currently, the hop count is forwarding state which is retained in the packet to prevent loops, and that retained state is in some ways made redundant by some slightly more complex state which is retained by this method. Removing the hop count would slightly increase the efficiency of forwarding, obviously.

Can the set of routers comprising a flow change in "mid-flow"? Can the set be temporarily inconsistent? Might a loop exist as a transient of some form during a repair of the flow? If so, the hop-count might be deemed necessary.

=====================================================================
(aside) However, it occurs to me that if there is no hop-count, even in today's network: If a loop occurs, then the loop will eventually be fixed. When it is fixed, all existing packets in the loop will be forwarded on towards their destinations, just as if nothing had happened. Some packets that were sent into the loop will be lost due to lack of buffers. I suspect that this loss will be somewhat random (though it might hit "new" packets disproportionately). So maybe a hop-count is not necessary in any event. (I have not thought all this through...)
=====================================================================

Now, what I want to know is: what does this new datagram model really buy that the current model does not?

It seems to me that what you are doing is taking a network that is fundamentally a flow network and trying to superimpose the appearance of datagrams on it. The DMFs seem to me to be really nothing more than a way for routers to know how to get to the borders of their areas.

In short, Noel, I think that you are taking something and making it overly complex. Why not assume that within an area, all routers have a reasonably consistent database? Their databases may not be complete, but what they have is correct. If this assumption recurses, up and down the hierarchy, then you can do real hop-by-hop forwarding. Granted that this turns the routing into No-Brainer (NB), but you seem to have reduced datagram mode to NB anyway.

-- Frank Kastenholz FTP Software 2 High Street North Andover, Mass.
USA 01845 (508)685-4000

Received: from PIZZA.BBN.COM by BBN.COM id aa26171; 5 Jan 94 10:41 EST Received: from pizza by PIZZA.BBN.COM id aa08706; 5 Jan 94 10:27 EST Received: from nic.near.net by PIZZA.BBN.COM id aa08702; 5 Jan 94 10:26 EST Received: from necom830.cc.titech.ac.jp by nic.near.net id aa21719; 5 Jan 94 10:27 EST Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Thu, 6 Jan 94 00:23:18 +0900 From: Masataka Ohta Return-Path: Message-Id: <9401051523.AA23502@necom830.cc.titech.ac.jp> Subject: Re: mobility and NIMROD To: kasten@ftp.com Date: Thu, 6 Jan 94 0:23:16 JST Cc: jcurran@nic.near.net, jnc@ginger.lcs.mit.edu, nimrod-wg@nic.near.net In-Reply-To: <9401051430.AA26777@ftp.com>; from "Frank Kastenholz" at Jan 5, 94 9:30 am X-Mailer: ELM [version 2.3 PL11]

> > > ] Anyway, I don't think we need more than 4 layers. With a thousand lower level areas at each level, we can accommodate 1 Tera hosts.
> > >
> > > Please do not assume uniform distributions. Utilizations of .1 % to 10 % are much more realistic, depending upon the administrative and operational backpressure applied via policies.
> >
> > So? 0.1% of 1 Tera is 1 Giga.
>
> Please do not assume a fixed number of hierarchies.

No one has assumed anything. It is just an estimate, with which no hard decision is made.

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa01150; 5 Jan 94 11:49 EST Received: from pizza by PIZZA.BBN.COM id aa09086; 5 Jan 94 11:32 EST Received: from BBN.COM by PIZZA.BBN.COM id aa09082; 5 Jan 94 11:30 EST Received: from Princeton.EDU by BBN.COM id aa29668; 5 Jan 94 11:29 EST Received: from clytemnestra.Princeton.EDU by Princeton.EDU (5.65b/2.103/princeton) id AA26051; Wed, 5 Jan 94 11:28:59 -0500 Received: by clytemnestra.princeton.edu (4.1/1.113) id AA20256; Wed, 5 Jan 94 11:28:58 EST Message-Id: <9401051628.AA20256@clytemnestra.princeton.edu> To: kasten@ftp.com Cc: nimrod-wg@BBN.COM Subject: Re: New datagram mode In-Reply-To: Your message of "Wed, 05 Jan 1994 10:02:21 EST." <9401051502.AA27982@ftp.com> X-Mailer: exmh version 1.2gamma 12/21/93 Date: Wed, 05 Jan 1994 11:28:55 EST From: John Wagner

> I read Noel's note and have a couple of comments.
>
> > As a result of all this, I prefer systems which do not have to make this guarantee of "near-global consistency"; they are both simpler and more robust. I would, as an architect, *very* desperately like to avoid any such design *if at all possible*. It was for this reason that I was thinking that perhaps the only non-flow mode in Nimrod would be a "source-route in packet" mode. This guaranteed non-looping paths, but without having to guarantee the near-global consistency in databases, etc, etc.
>
> I'm not sure exactly what Noel means by "global" in this paragraph. Does it mean global in that _all_ routers _everywhere_ must be consistent? Or does it mean that all routers in any one area must be consistent?
>
> I tend to believe the latter. Routers in a given area need only understand how to get to the border-routers of their area, and to the border-routers of contained areas.

Isn't this simply a restatement of the fact that Nimrod is using a hierarchical view of the network? The border routers (conceptually) sit at the interfaces between hierarchy levels. The problem comes in determining which routers are at the border (since in a dynamic network what was A.B may suddenly be A.B.A and A.B.B).
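As an aside, the prefix relationships involved here are easy to state mechanically. A small sketch in Python (dot-separated locator strings are an assumed syntax), anticipating the strict-prefix rule Frank spells out in the next message:

    # Classify a router by the locators it advertises, relative to my
    # area's locator, using the strict-prefix test discussed below.
    def classify(my_area, advertised):
        mine, adv = my_area.split("."), advertised.split(".")
        if adv == mine[:len(adv)] and len(adv) < len(mine):
            return "border router to a containing area"
        if mine == adv[:len(mine)] and len(mine) < len(adv):
            return "border router to a contained area"
        return "not a border router (interior or unrelated locator)"

    print(classify("A.B.C.D", "A.B.C"))      # containing area
    print(classify("A.B.C.D", "A.B.C.D.E"))  # contained area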
John Wagner

Received: from PIZZA.BBN.COM by BBN.COM id aa07620; 5 Jan 94 13:39 EST Received: from pizza by PIZZA.BBN.COM id aa00333; 5 Jan 94 13:23 EST Received: from BBN.COM by PIZZA.BBN.COM id ae00313; 5 Jan 94 13:19 EST Received: from babyoil.ftp.com by BBN.COM id aa04644; 5 Jan 94 12:49 EST Received: by ftp.com id AA05977; Wed, 5 Jan 94 12:49:04 -0500 Date: Wed, 5 Jan 94 12:49:04 -0500 Message-Id: <9401051749.AA05977@ftp.com> To: jwagner@princeton.edu Subject: Re: New datagram mode From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: nimrod-wg@BBN.COM

>> consistent? Or does it mean that all routers in any one area must be consistent?
>>
>> I tend to believe the latter. Routers in a given area need only understand how to get to the border-routers of their area, and to the border-routers of contained areas.
>
> Isn't this simply a restatement of the fact that Nimrod is using a hierarchical view of the network? The border routers (conceptually) sit at the interfaces between hierarchy levels. The problem comes in determining which routers are at the border (since in a dynamic network what was A.B may suddenly be A.B.A and A.B.B).

Well, first, I was not quite sure what Noel meant by 'global' -- did he mean global as in the entire earth/universe/... or did he really mean within a given area.

Making the routing database be "area-wide" consistent is a much easier problem. If an area grows too big, so that an inordinate amount of resources is required to keep the database consistent, then I can "simply" partition the area or change the area from being one level to being several levels (I can grow the hierarchy "horizontally" or "vertically"). In the extreme, I could keep my areas small enough so that I could use RIP, or even static routes to do the routing :-)

I think that determining border routers is "simple" -- if a router advertises that it can get to locators that are all within the area then the router is not a border router. If a router advertises that it can get to locators which are strict prefixes of the area's locator, then that router is a border router to a "containing" area. If a router advertises that it can get to a locator of which the area's locator is a strict prefix, then that router is a border router to a "contained" area. For instance, if I am in area A.B.C.D and a router advertises it can get to A.B.C then that router can get to my "containing" area; if a router advertises that it can get to A.B.C.D.E then that router can get to a contained area.

-- Frank Kastenholz FTP Software 2 High Street North Andover, Mass. USA 01845 (508)685-4000

Received: from PIZZA.BBN.COM by BBN.COM id aa29002; 6 Jan 94 2:33 EST Received: from pizza by PIZZA.BBN.COM id aa00884; 6 Jan 94 2:16 EST Received: from BBN.COM by PIZZA.BBN.COM id aa00880; 6 Jan 94 2:14 EST Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa28541; 6 Jan 94 2:14 EST Received: by ginger.lcs.mit.edu id AA18784; Thu, 6 Jan 94 02:14:43 -0500 Date: Thu, 6 Jan 94 02:14:43 -0500 From: Noel Chiappa Message-Id: <9401060714.AA18784@ginger.lcs.mit.edu> To: kasten@ftp.com, nimrod-wg@BBN.COM Subject: Re: New datagram mode Cc: jnc@ginger.lcs.mit.edu

> I prefer systems which do not have to make this guarantee of "near-global consistency"

I'm not sure exactly what Noel means by "global" in this paragraph. Does it mean global in that _all_ routers _everywhere_ must be consistent?

What I mean by "near-global" is that in a hop-by-hop system, all the routers in the routing scope of an object X (i.e.
routers which contain hop-by-hop routing table entries for X) have to have a consistent idea of how to get to X, otherwise loops can develop. It's clearly not *all* routers; that's why I used the "near-global" terminology...

Or does it mean that all routers in any one area must be consistent? I tend to believe the latter. Routers in a given area need only understand how to get to the border-routers of their area, and to the border-routers of contained areas.

It actually doesn't have anything to do with area boundaries. The information you mention is the theoretical minimum necessary to make a hierarchical routing architecture work, but that set is the same no matter which routing/forwarding model is used. You also may be falling into the "abstraction action boundaries are the same as abstraction naming boundaries" trap. Making the two the same produces strictly hierarchical routing, which is inefficient.

Just to be sure that I understand: I'd specify a source route of the form A-B-C-D... and there would be a 'pre-set-up' flow in the various areas, one between A and B, another between B and C and so on?

Pretty much, except that a source route is probably specified as a list of (virtual) links, not switches. If A and B have several alternate paths between them, with different attributes, how does A decide which one to pick? (To put it in graph-theory terms: if two nodes can have more than one arc between them, a list of nodes does not specify a unique path through the graph, but a list of arcs does.) There would be a pre-setup flow for each (virtual) link, yes.

> all routers have to contain a minimal set of "pre-setup" flows to certain routers which are at critical places in the abstraction hierarchy.

Doesn't this conflict with your earlier note about maps and meshes? By allowing the notion of "critical places" you then allow the notion of "single point of failure". If a router at a critical place fails then, by definition (since the place/router is critical) you have a failure with major impact in the network.

Ah, sorry, I'm trying to put 5 pounds of idea into 2 pounds of words, again. (Jeez, the noelgram was *already* 24KB! :-) Yes, reliance on individual routers would produce SPOF's. However, the "critical places" are more of a concept than an actual physical thing; remember, they are "places" in the abstraction hierarchy, not the topology! In reality, any one of a set of routers with property X (i.e. any border router into a lower level area, or any border router out of this area) would do. Clearly, we'll have to have some mechanism to detect failed routers, etc, etc, etc, but that's all pretty much grind-the-crank competent engineering.

I read these paragraphs as saying that the source route is "deduced" from the source and destination locators. If the source locator is A.B.C and the destination locator is X.Y.Z then the source route between the two is deduced to be A.B.C - A.B - A - X - X.Y - X.Y.Z. No? Isn't this "no brainer routing"?

There is no "source route", even a deduced one. (I've heard Paul Francis' argument that a hierarchical locator is a source route, and I think it's confused.) The packet is routed in an incremental fashion, and the optimality of the resulting route is dependent on the amount of detail about the topology (above the theoretical minimum) you are willing to pay the price of distributing. (As mentioned above, this tradeoff between the amount of info, and the optimality of routes, is present in all routing architectures.)
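For illustration, the sequence of decision points in the strictly hierarchical case (i.e. with only the minimal DMF set, and none of the short-cuts discussed next) can be computed mechanically. A sketch in Python, with dotted locator syntax assumed:

    # Decision points for a packet: climb the source locator to the least
    # common ancestor, then descend the destination locator, as in the
    # quoted design note.
    def active_path(src, dst):
        s, d = src.split("."), dst.split(".")
        i = 0
        while i < min(len(s), len(d)) and s[i] == d[i]:
            i += 1                      # depth of least common ancestor
        up = [".".join(s[:k]) for k in range(len(s) - 1, i - 1, -1)]
        down = [".".join(d[:k]) for k in range(i + 1, len(d) + 1)]
        return up + down

    print(active_path("A.P.Q.R", "A.X.Y.Z"))
    # ['A.P.Q', 'A.P', 'A', 'A.X', 'A.X.Y', 'A.X.Y.Z']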
Later on in the note I said:

    This level of state provides strictly hierarchical routing. There are pretty obvious optimizations... For example, if you keep DMF's to more than the minimal set ..., and keep your table sorted for efficient lookups (probably much the same as the current routing table for hop-by-hop datagrams), you may be able to short-cut. For example, using the case above (a packet from A.P.Q.R to A.X.Y.Z), if A.P.Q is actually a neighbour to A.X.Y, and maintains a flow directly from A.P.Q to A.X.Y, then when the packet reaches A.P.Q, instead of going the rest of the way up and down, the pointer can be set into the destination locator at A.X.Y, and the packet sent there directly.

So, if only the minimum necessary set of DMF's were available, yes, it would be strictly hierarchical routing. ("No-brainer" routing refers to something subtly different, the bias to one long-haul network over another when several equally good ones are available, caused by the abstraction boundaries including the destination with one of the long-haul networks.)

> It might be possible to remove the "hop count" field

Can the set of routers comprising a flow change in "mid-flow"?

The set of resources (physical and virtual) which are the path of a flow cannot change, but the actual set of physical resources (and routers) which make up a virtual resource might change due to local repair/load-adjustment, so yes.

Can the set be temporarily inconsistent? Might a loop exist as a transient of some form during a repair of the flow?

I don't yet know all the details of flow-repair, so I can't answer this for certain, but I don't think so. At least, I'm having a hard time coming up with a counter-example, assuming some mildly intelligent flow-repair algorithms. Maybe if you have bits of even older versions of the flow left around in various switches... hmm.

If so, the hop-count might be deemed necessary. (aside) However, it occurs to me that if there is no hop-count, even in today's network: If a loop occurs, then the loop will eventually be fixed. When it is fixed, all existing packets in the loop will be forwarded on towards their destinations, just as if nothing had happened. Some packets that were sent into the loop will be lost due to lack of buffers. I suspect that this loss will be somewhat random (though it might hit "new" packets disproportionately). So maybe a hop-count is not necessary in any event. (I have not thought all this through...)

It turns out that John Curran has convinced me that we probably ought to keep the hop-count (as belt and suspenders engineering), but let me go think this through. You have an interesting argument...

Now, what I want to know is what does this new datagram model really buy that the current model does not do.

Good question. By "current model", do you mean hop-by-hop, or source-routed?

If the former, I claim it's more robust, and probably slightly more efficient. It's also less complex to provide, since you don't have the consistency requirements to meet (especially in the Nimrod environment; you don't have to make sure people compute their hop-by-hop routing tables from the same topology map). It may also be a little more flexible in terms of allowing local control of the overhead/optimality tradeoff knob, but I have to think about that.
If the latter, I think the original note answers it:

    This mode would have almost none of the disadvantages of SRD, since the source doesn't have to compute a route, and there is no source route in the packet, just the source and destination locator ... Again, in a world with resource-allocation going on, that DMF would have a resource limit associated with it, which would prevent pure datagram traffic from interfering with other resource allocations.

(The latter advantage applies to the comparison with hop-by-hop as well, obviously.) The new mode is also less complex to process in most routers than SRD.

It seems to me that what you are doing is taking a network that is fundamentally a flow network and trying to superimpose the appearance of datagrams on it.

There is some truth to this. However, it was *not* done to get an "all-flow" network. That's just a nice (and interesting, but I'll talk about that in a second) byproduct. The use of flows to get the packets from each active point to the next is there because it provides for more loop-resistant *forwarding* (there are no routing decisions at all being made) between active points.

However, flows aren't there just to have flows, either! Remember, in a way, flow setup is nothing but an efficient way of doing "source" routing. (All through Nimrod, it's not actually necessarily generated by the true source, which is why I don't like the term "source". I prefer the term "unitary", since the path is picked by one entity.) I say "in a way", since flows also happen to interact well with some resource allocation ideas, etc. Again, nice, and interesting, and more in a sec on that. What's most important, all through Nimrod as a *routing architecture*, is the unitary routing, which is what underlies the whole thing, and *that's* there for a variety of reasons: robustness, and the ability to support source policies, etc, etc. So, that's why the flows are there.

So why are flows so interesting? What is interesting to me about all this is the way that all sorts of seemingly unconnected things are all leading down the same path. Physicists have this deal where theories which have unexpected scopes and synergies have a peculiar "beauty", which they take to mean that the theory is more likely true. I have much the same feeling. It is just Too Weird the way all these various things fit together so nicely. I take it to mean that we have discovered something very "right". An interesting topic, but not to be explored in detail right here!

The DMFs seem to me to be really nothing more than a way for routers to know how to get to the borders of their areas. ... Why not assume that within an area, all routers have a reasonably consistent database. Their databases may not be complete, but what they have is correct. If this assumption recurses, up and down the hierarchy, then you can do real hop-by-hop forwarding.

"Reasonably consistent" doesn't cut it. Unless they *are* consistent, you *will* get routing loops. Hop-by-hop forwarding will not work without this consistency, and that consistency, together with the way the actual decisions on what the path is are distributed, represent real weaknesses.

The key difference between this scheme and previous schemes is *not* in the kind of path which is constructed (at the broadest view), or in the information which is needed in the various routers. It is true, in these areas it is basically the same.
The key differences are in the consistency requirements, and the distributed decision making on the path, and these differences are for reasons of robustness. The actual mechanisms resulting from these considerations are i) the use of flows to get the packets from each active point to the next, and ii) the monotonically increasing pointer into the locators carried with each packet. The former is important since it provides for more loop-resistant *forwarding* (there are no routing decisions at all being made) between active points. The latter is important since it provides for more loop-resistant *routing* at active points.

So, to restate, the overall goal is greater robustness in the routing (while retaining efficiency in packet creation, size, and processing), and the mechanism chosen is, at a high level, to increase the forwarding state in the packets, in the form of the locator pointer, and the locally chosen flow (not just a locally chosen *flow id*).

Granted that this turns the routing into No-Brainer (NB), but you seem to have reduced datagram mode to NB anyway.

Again, NB is something slightly different from "pure hierarchical", which is what I think you mean here. Also again, how far from pure hierarchical, and how close to optimal you are is a function of the amount of information you distribute (i.e. routing overhead), and not of the particular routing architecture. *All* routing architectures (whether hop-by-hop or unitary, and destination-vector or map-distribution) have the same basic tradeoff. Think of it as one of the "Laws of Thermodynamics" of routing!

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa29070; 6 Jan 94 2:40 EST Received: from pizza by PIZZA.BBN.COM id aa00963; 6 Jan 94 2:29 EST Received: from BBN.COM by PIZZA.BBN.COM id ab00959; 6 Jan 94 2:28 EST Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa28778; 6 Jan 94 2:28 EST Received: by ginger.lcs.mit.edu id AA18847; Thu, 6 Jan 94 02:28:20 -0500 Date: Thu, 6 Jan 94 02:28:20 -0500 From: Noel Chiappa Message-Id: <9401060728.AA18847@ginger.lcs.mit.edu> To: jwagner@princeton.edu, kasten@ftp.com Subject: Re: New datagram mode Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM

> Routers in a given area need only understand how to get to the border-routers of their area, and to the border-routers of contained areas.

Isn't this simply a restatement of the fact that Nimrod is using a hierarchical view of the network? The border routers (conceptually) sit at the interfaces between hierarchy levels.

Again, don't think that abstraction action boundaries (i.e. the scopes over which individual sub-components of an entity are visible) have to match abstraction naming boundaries. (I.e., A.* may be visible as individual destinations outside A.) That way lies pure hierarchical routing...

The problem comes in determining which routers are at the border (since in a dynamic network what was A.B may suddenly be A.B.A and A.B.B).

Yes. Getting the boundaries configured consistently is going to be a big issue. There's a paper by Seeger and Khanna (Josh Seeger and Atul Khanna, "Reducing Routing Overhead in a Growing DDN", MILCOMM '86, IEEE, 1986) which has a nice scheme for doing this in a way that can't be badly confused. However, I've run out of brain cells for technical stuff, so I can't get into that. Guess I'll go do some political flaming on the IETF list with what few I have left at this point!
:-) Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa29850; 6 Jan 94 3:11 EST Received: from pizza by PIZZA.BBN.COM id aa01061; 6 Jan 94 3:01 EST Received: from nic.near.net by PIZZA.BBN.COM id aa01057; 6 Jan 94 2:59 EST Received: from nic.near.net by nic.near.net id aa03445; 6 Jan 94 3:00 EST To: Noel Chiappa cc: kasten@ftp.com, nimrod-wg@nic.near.net Subject: Re: New datagram mode In-reply-to: Your message of Thu, 06 Jan 1994 02:14:43 -0500. <9401060714.AA18784@ginger.lcs.mit.edu> Date: Thu, 06 Jan 1994 03:00:25 -0500 From: John Curran

--------

] From: Noel Chiappa
] Subject: Re: New datagram mode
] Date: Thu, 6 Jan 94 02:14:43 -0500
] ...
] (Frank) If so, the hop-count might be deemed necessary. (aside) However, it occurs to me that if there is no hop-count, even in today's network: If a loop occurs, then the loop will eventually be fixed. When it is fixed, all existing packets in the loop will be forwarded on towards their destinations, just as if nothing had happened. Some packets that were sent into the loop will be lost due to lack of buffers. I suspect that this loss will be somewhat random (though it might hit "new" packets disproportionately). So maybe a hop-count is not necessary in any event. (I have not thought all this through...)
]
] It turns out that John Curran has convinced me that we probably ought to keep the hop-count (as belt and suspenders engineering), but let me go think this through. You have an interesting argument...

Given a routing loop (due to, for instance, a software failure which results in conflicting area boundaries), all of the datagrams sent to the destination will enter the loop and occupy a permanent portion of the capacity. Without a hop-count, there exists a finite number of datagrams that can be sent until congestion is assured. This congestion would remain even after the source had ceased transmission, and could prevent both management operations and propagation of correct maps.

/John

Received: from PIZZA.BBN.COM by BBN.COM id aa04289; 6 Jan 94 4:14 EST Received: from pizza by PIZZA.BBN.COM id aa01308; 6 Jan 94 4:02 EST Received: from BBN.COM by PIZZA.BBN.COM id aa01304; 6 Jan 94 4:00 EST Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa02573; 6 Jan 94 4:00 EST Received: by ginger.lcs.mit.edu id AA18966; Thu, 6 Jan 94 04:00:00 -0500 Date: Thu, 6 Jan 94 04:00:00 -0500 From: Noel Chiappa Message-Id: <9401060900.AA18966@ginger.lcs.mit.edu> To: jnc@ginger.lcs.mit.edu, mohta@necom830.cc.titech.ac.jp Subject: Re: New datagram mode Cc: nimrod-wg@BBN.COM

> The route calculation is now distributed, which takes care of the first; there are no source routes in packets, which fixes the second, etc.

The first one, the distributed calculation of route, only makes the load on intermediate routers heavier.

The calculation happens in two steps, though the second one isn't really a step; it's just a lookup in a precomputed table. And remember, this calculation doesn't happen in *all* intermediate routers, just "active" routers.

The first step, the calculation of the DMF path, is obviously the expensive one, but that can happen before the packet actually arrives. In fact, I'd probably make calculating DMF paths the "idle task" for the router, so that all the DMF's (even ones which are not currently instantiated in installed flows) have paths available if you need to set one up (this is assuming that the set of DMF's which are set up on demand is non-null). The second step is the selection of the appropriate DMF when a packet arrives.
This is a "longest match" table lookup, exactly the same as the current routing table lookup. I don't hear anyone (even Steve D :-) complaining that that is too expensive! The second one, I can't understand. How can route be determined with a single flat EID? Doesn't your packet contain something like layers of Area IDs which makes the packet bulky? The route is not determined directly from the EID. The EID has to be translated to a locator, which is a hierarchically organized name, suitable for use by the routing. The locator does in fact contain a sequence of area names. As to whether the inclusion of locators makes the packet bulky, I guess that's a matter of opinion. I don't really know how long locators will be, for a start, nor what their representation will look like. For example, purely as an illustration, if we get by for the moment with 5 levels, 2 of 2 bytes and 3 of one byte, and one byte of "total length", that gives us 8 bytes, the same length as SIP addresses. I don't think you can forward any packet with EID only. Aren't you assuming some structure in EID? I agree that you can't forward (or route) only on the EID. No, I don't assume any structure in the EID. That's what locators are for; they are topologically significant, hierarchically structured, names. Locators and EID's *together* provide the functionality of current IPv4 "addresses" (and a little more besides, obviously). Just as the current routing table gives information for the next hop determination to various networks, nimrod routing table could give information for the next hop determination to various areas. What "nimrod routing table"? Up until the invention of DMF's, Nimrod had no routing tables, at least in anything like the classical sense. There were databases set up and used by the routing, but they were i) topology maps, and ii) the database of active flows. Even with DMF's, the only database which will contain "next hop" information will be the flow database. I think areas should be represented with EID of border routers, thus, routing table could be indexed with EIDs of border routers As I have already said, this scheme has what I regard as a *fatal* disadvantage, in that it ties the abstraction hierarchy too closely to the physical topology. I think that the binding (i.e. linkage) between the two should be something we can control, so that so that the change to the former to match the latter is *controllable*. You need to explain either i) why this goal is a bad goal, or ii) why some other advantage of your scheme outweighs the disadvantage of not meeting this goal. If you can't do either one, I don't think your scheme for areas will be very useful. I'm assuming that routers have a routing table indexed by EIDs of certain border routers (those at the upper level areas and the direct lower level areas) and EIDs of directly reachable endhosts (that is, endhosts in the same datalink layer). Ah, now I understand. Your use of "EID's" is only in the context of your planned use of router EID's as area identifiers. Yes, this would work (if I liked your scheme for area identifiers :-). > Only between pairs of routers which wish to protect themselves against > denial-of-service attacks. Like much else in Nimrod, this is a > cost/benfit knob that is set locally, to allow the users to make the > decision on what quality of service they want.. What? Then, how intermediate routers can forward packets without a lot of effort for flow setup? 
There is a certain amount of effort involved in getting ready to forward packets, even in a hop-by-hop system. The routing has to distribute information, and state has to be set up, etc, etc. It's just a different kind of state; routing tables instead of topology databases, flow databases, etc. I don't really know what you characterize as "a lot of effort". Nimrod may have somewhat more setup overhead, both in state and computing, than alternative schemes, but it also has advantages and capabilities they lack, and I reckon those advantages and capabilities are worth the price in setup overhead. Notice that I carefully say "setup overhead", since the actual processing of user data packets (i.e. "forwarding overhead") is as efficient, if not more so. Thus, the real-time operational characteristics are the same, if not superior.

Aren't you suggesting a trade-off between: connectionless communication through routers fully connected at run time, and connectionless communication through routers fully connected in advance?

Well, I'm not sure about the "fully connected", but yes, there is a tradeoff between the amount you do in advance, and the amount that is done in response to actual traffic. Again, just like everywhere in Nimrod, this is a cost/benefit knob that can be set locally, to allow the users to make the decision on what quality of service they feel like paying for.

I think both will result in O(N^2) behaviour.

I'm not sure I quite understand what you are getting at here, but if you are talking about the amount of state required to hold the DMF's (assuming we use point-point flows, and not trees as I suggested; trees would make it plain O(N), just like hop-by-hop routing) it is a more complex calculation than that. There is unlikely to be a single router through which all the flows will pass. In addition, the "N" here is a function of the size of the area, etc, so the amount of state can be controlled by controlling the area size, etc.

> Actually, the important question is the number of simultaneously active (i.e. set up) flows in a given router.

I don't think you can assume any "flow" for connectionless communication.

If this datagram scheme is adopted, then there will be a subset of the flows (or tree-flows) which will be used for datagrams (i.e. connectionless communication).

if a user wants full protection against service denial, it is O(N^2).

I can't see where this comes from at all. Denial-of-service protection is given by the use of traffic monitoring, along with source routing. There is nothing here that will cause O(N^2) growth. The only way to have O(N^2) growth is to have O(N^2) growth in the total number of *actual* connections (not *potential* connections) in the network. However, the total number of hosts is only growing at O(N)! Since the number of connections per host is clearly *not* growing at O(N), but is basically a constant, claims of O(N^2) growth are clearly not well supported.

It's much more reasonable to expect end-end protection with each application on individual hosts. Aggregated protection through intermediate routers can't be so meaningful. ... So, the protection, if any, should be application-wise.

True. I merely indicated that *some* denial-of-service protection was an option (i.e. not a necessity).
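The "longest match" selection of a DMF, argued about above, is the same operation as a classical routing-table lookup. A minimal sketch in Python (the DMF table contents and flow names are invented for illustration):

    # Pick the DMF whose target locator is the longest prefix of the
    # destination locator; return None (e.g. to go further up) if none.
    dmf_table = {"A": "flow-1", "A.X": "flow-2", "A.X.Y": "flow-3"}

    def select_dmf(dest_locator):
        parts = dest_locator.split(".")
        for k in range(len(parts), 0, -1):       # longest prefix first
            flow = dmf_table.get(".".join(parts[:k]))
            if flow is not None:
                return flow
        return None

    print(select_dmf("A.X.Y.Z"))   # flow-3: longest match is A.X.Y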
Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa05956; 6 Jan 94 4:48 EST Received: from pizza by PIZZA.BBN.COM id aa01460; 6 Jan 94 4:39 EST Received: from BBN.COM by PIZZA.BBN.COM id aa01456; 6 Jan 94 4:37 EST Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa05620; 6 Jan 94 4:31 EST Received: by ginger.lcs.mit.edu id AA19074; Thu, 6 Jan 94 04:30:21 -0500 Date: Thu, 6 Jan 94 04:30:21 -0500 From: Noel Chiappa Message-Id: <9401060930.AA19074@ginger.lcs.mit.edu> To: Christian.Huitema@sophia.inria.fr, jmoy@proteon.com, nimrod-wg@BBN.COM Subject: Re: Maps and meshes in the real world Cc: jnc@ginger.lcs.mit.edu

OSPF [does] allow you to specify different link costs in each direction... you can make the cost in one direction so prohibitively large that the link will in practice only be used in one direction. Ahh, perhaps what you are talking about is control traffic. Indeed, that must be bidirectional in OSPF.

Hmm. Interesting. Hard to see how to run a routing protocol when information can only flow in one direction from router A to router B (I mean in general, not along a particular link). I mean, if you never hear anything back, how do you even know it's up? Unidirectional links per se aren't a theoretical problem, as long as there's *some* way back. Still, it's a good point, and one we should remember.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa01479; 6 Jan 94 6:53 EST Received: from pizza by PIZZA.BBN.COM id ab00330; 6 Jan 94 6:36 EST Received: from BBN.COM by PIZZA.BBN.COM id ac00318; 6 Jan 94 6:32 EST Received: from necom830.cc.titech.ac.jp by BBN.COM id aa01032; 6 Jan 94 6:31 EST Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Thu, 6 Jan 94 19:19:17 +0900 From: Masataka Ohta Return-Path: Message-Id: <9401061019.AA29004@necom830.cc.titech.ac.jp> Subject: Re: New datagram mode To: Noel Chiappa Date: Thu, 6 Jan 94 19:19:15 JST Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM In-Reply-To: <9401060900.AA18966@ginger.lcs.mit.edu>; from "Noel Chiappa" at Jan 6, 94 4:00 am X-Mailer: ELM [version 2.3 PL11]

> > The route calculation is now distributed, which takes care of the first; there are no source routes in packets, which fixes the second, etc.
> >
> > The first one, the distributed calculation of route, only makes the load on intermediate routers heavier.
>
> The calculation happens in two steps, though the second one isn't really a step; it's just a lookup in a precomputed table.

OK. Call the table the "nimrod routing table", because that is what it is.

> And remember, this calculation doesn't happen in *all* intermediate routers, just "active" routers.

All routers are either active or down. So?

> The first step, the calculation of the DMF path, is obviously the expensive one, but that can happen before the packet actually arrives. In fact, I'd probably make calculating DMF paths the "idle task" for the router,

Then, the network will melt down upon congestion. Why don't you try to avoid "obviously expensive" things, instead?

> The second step is the selection of the appropriate DMF when a packet arrives. This is a "longest match" table lookup,

Then, the lookup will be slow.

> As to whether the inclusion of locators makes the packet bulky, I guess that's a matter of opinion. I don't really know how long locators will be, for a start, nor what their representation will look like.
> For example, purely as an illustration, if we get by for the moment with 5 levels, 2 of 2 bytes and 3 of one byte, and one byte of "total length", that gives us 8 bytes, the same length as SIP addresses.

Matter of opinion? It's *YOU* who hate variable-length thingies.

> What "nimrod routing table"? Up until the invention of DMF's, Nimrod had no routing tables, at least in anything like the classical sense.

I'm not constrained by any classical sense.

> > I think areas should be represented with the EIDs of border routers; thus, the routing table could be indexed with the EIDs of border routers
>
> As I have already said, this scheme has what I regard as a *fatal* disadvantage,

No, I don't think you have shown any disadvantage.

> in that it ties the abstraction hierarchy too closely to the physical topology.

Representing an area with its border routers is the mathematically exact representation. The representation is the minimal necessary one. Anything less than that lacks information, so it needs some amount of hand configuration and can't work against dynamic area subdivision or merge.

> I think that the binding (i.e. linkage) between the two should be something we can control, so that the change to the former to match the latter is *controllable*.

IMHO, the binding SHOULD NOT need any control. The binding should be automatic. No hand configuration, please.

> You need to explain either i) why this goal is a bad goal, or ii) why some other advantage of your scheme outweighs the disadvantage of not meeting this goal. If you can't do either one, I don't think your scheme for areas will be very useful.

i) Your goal is bad because the primary goal is "NO NEED OF CONTROL".

i) Your goal is bad because the definition of "controllable" is not given. With an arbitrary definition of "controllable", any scheme, including mine, is "controllable".

It should also be noted that quite versatile control of my scheme is possible through DNS. Moreover, with any scheme, including mine, you can, at least, control the configuration of the area hierarchy, anyway. Enough?

> > What? Then, how can intermediate routers forward packets without a lot of effort for flow setup?
>
> There is a certain amount of effort involved in getting ready to forward packets, even in a hop-by-hop system. The routing has to distribute information, and state has to be set up, etc, etc. It's just a different kind of state; routing tables instead of topology databases, flow databases, etc. I don't really know what you characterize as "a lot of effort".

If you want to say flow setup does not need much effort, you should use flows set up by end hosts even for connectionless communications.

> Well, I'm not sure about the "fully connected", but yes, there is a tradeoff between the amount you do in advance, and the amount that is done in response to actual traffic. Again, just like everywhere in Nimrod, this is a cost/benefit knob that can be set locally, to allow the users to make the decision on what quality of service they feel like paying for.

I'm afraid your knob is a cost/cost knob, without any benefit.

> > I think both will result in O(N^2) behaviour.
> I'm not sure I quite understand what you are getting at here, but if you are talking about the amount of state required to hold the DMF's (assuming we use point-point flows, and not trees as I suggested; trees would make it plain O(N), just like hop-by-hop routing) it is a more complex calculation than that. There is unlikely to be a single router through which all the flows will pass.

Within a single area, concentration is likely to happen.

> In addition, the "N" here is a function of the size of the area, etc, so the amount of state can be controlled by controlling the area size, etc.

So, if the area size is 1,000, you will get 1,000,000 connections.

> > Actually, the important question is the number of simultaneously active (i.e. set up) flows in a given router.
> >
> > I don't think you can assume any "flow" for connectionless communication.
>
> If this datagram scheme is adopted, then there will be a subset of the flows (or tree-flows) which will be used for datagrams (i.e. connectionless communication).

Don't you misunderstand the meaning of "connectionless"? UDP could be "connected", in which case flow IDs should be assigned. Still, there is a need for "connectionless" UDP.

> > if a user wants full protection against service denial, it is O(N^2).
>
> The only way to have O(N^2) growth is to have O(N^2) growth in the total number of *actual* connections (not *potential* connections) in the network. However, the total number of hosts is only growing at O(N)! Since the number of connections per host is clearly *not* growing at O(N), but is basically a constant, claims of O(N^2) growth are clearly not well supported.

With connectionless communication, by definition, there are NO end-end *actual* connections. So, the number of hosts is irrelevant. Still, with your scheme, all the routers must be connected. Thus, it is O(N^2).

> > It's much more reasonable to expect end-end protection with each application on individual hosts. Aggregated protection through intermediate routers can't be so meaningful. ... So, the protection, if any, should be application-wise.
>
> True. I merely indicated that *some* denial-of-service protection was an option (i.e. not a necessity).

I think you are mixing up a lot of unrelated things.

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa13273; 6 Jan 94 11:21 EST Received: from pizza by PIZZA.BBN.COM id aa01551; 6 Jan 94 10:52 EST Received: from BBN.COM by PIZZA.BBN.COM id aa01547; 6 Jan 94 10:50 EST Received: from Princeton.EDU by BBN.COM id aa09337; 6 Jan 94 10:16 EST Received: from ponyexpress.Princeton.EDU by Princeton.EDU (5.65b/2.103/princeton) id AA12754; Thu, 6 Jan 94 09:53:41 -0500 Received: from flagstaff.Princeton.EDU by ponyexpress.princeton.edu (5.65c/1.113/newPE) id AA14460; Thu, 6 Jan 1994 09:53:40 -0500 Received: by flagstaff.Princeton.EDU (4.1/Phoenix_Cluster_Client) id AA22699; Thu, 6 Jan 94 09:53:39 EST Message-Id: <9401061453.AA22699@flagstaff.Princeton.EDU> To: Masataka Ohta Cc: Noel Chiappa , nimrod-wg@BBN.COM Subject: Re: New datagram mode In-Reply-To: Your message of "Thu, 06 Jan 1994 19:19:15 +0200." <9401061019.AA29004@necom830.cc.titech.ac.jp> X-Mailer: exmh version 1.2gamma 12/21/93 Date: Thu, 06 Jan 1994 09:53:39 EST From: John Wagner

Masataka,

> Still, with your scheme, all the routers must be connected. Thus, it is O(N^2).

I think you are misunderstanding Noel's scheme.
It requires that all routers have at least 2 connections (this level and the next level up) but does not require more than 3 connections (this level, the next level up, and the next level down). It does not require that all routers be connected to all other routers, although it allows for more than the 3 connections I described. The assumption is a loose mesh.

Noel,

The funny part of your scheme is that you've created a variation on NJE routing as practiced in Bitnet. There are parts of Bitnet (the core/backbone systems) that are connected in a (fully connected) mesh. But the routing through those parts is performed by sending the files over the pre-defined "flows" (NJE routes defined in the routing tables). These "flows" are computed globally and the maps distributed as flat files. But there are still occasional occurrences of loops when maps don't get updated in synch. The time to heal the loops is more important than the fact they occur. Should Nimrod perform dynamic loop detection?

Using your scheme, is the following possible? I have packets following a predefined flow, but there is congestion in a part of the path that flow follows. Can I add a different path through the physical topology and split the flow over the two paths to dynamically increase the size of the pipe but still use the same flow identification?

John Wagner

Received: from PIZZA.BBN.COM by BBN.COM id aa04854; 8 Jan 94 5:15 EST Received: from pizza by PIZZA.BBN.COM id aa12926; 8 Jan 94 5:04 EST Received: from BBN.COM by PIZZA.BBN.COM id aa12922; 8 Jan 94 5:01 EST Received: from necom830.cc.titech.ac.jp by BBN.COM id aa04661; 8 Jan 94 5:00 EST Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Sat, 8 Jan 94 18:55:51 +0859 From: Masataka Ohta Return-Path: Message-Id: <9401080956.AA11834@necom830.cc.titech.ac.jp> Subject: Re: New datagram mode To: John Wagner Date: Sat, 8 Jan 94 18:55:50 JST Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM In-Reply-To: <9401061453.AA22699@flagstaff.Princeton.EDU>; from "John Wagner" at Jan 6, 94 9:53 am X-Mailer: ELM [version 2.3 PL11]

> Masataka,
>
> > Still, with your scheme, all the routers must be connected. Thus, it is O(N^2).
>
> I think you are misunderstanding Noel's scheme. It requires that all routers have at least 2 connections (this level and the next level up)

In which case, packets are relayed hop-by-hop.

> but does not require more than 3 connections (this level, the next level up, and the next level down).

No. So?

> It does not require that all routers be connected to all other routers although it allows for more than the 3 connections I described. The assumption is a loose mesh.

Even if the physical topology is a mesh of routers at the same level, all routers must be directly connected by flows, if you want to route packets through a flow without hop-by-hop relaying.
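The arithmetic behind the O(N^2) claim, for what it is worth, is just the pair count; the sketch below assumes, as Ohta does here and Noel disputes, that a full flow mesh among same-level routers is actually required:

    # Flows needed for a full mesh of N same-level routers:
    # N*(N-1)/2 bidirectional flows, or N*(N-1) unidirectional ones.
    def full_mesh_flows(n, unidirectional=False):
        return n * (n - 1) if unidirectional else n * (n - 1) // 2

    print(full_mesh_flows(1000))                       # 499500
    print(full_mesh_flows(1000, unidirectional=True))  # 999000, i.e.
    # roughly the "1,000,000 connections" figure used in this thread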
mohta@cc.titech.ac.jp

Received: from PIZZA.BBN.COM by BBN.COM id aa21100; 8 Jan 94 20:56 EST Received: from pizza by PIZZA.BBN.COM id aa15307; 8 Jan 94 20:45 EST Received: from nic.near.net by PIZZA.BBN.COM id aa15303; 8 Jan 94 20:42 EST Received: from GINGER.LCS.MIT.EDU by nic.near.net id aa12264; 8 Jan 94 20:43 EST Received: by ginger.lcs.mit.edu id AA15497; Sat, 8 Jan 94 20:43:41 -0500 Date: Sat, 8 Jan 94 20:43:41 -0500 From: Noel Chiappa Message-Id: <9401090143.AA15497@ginger.lcs.mit.edu> To: jcurran@nic.near.net, jnc@ginger.lcs.mit.edu Subject: Re: New datagram mode Cc: jnc@ginger.lcs.mit.edu, kasten@ftp.com, nimrod-wg@nic.near.net

> It turns out that John Curran has convinced me that we probably ought to keep the hop-count (as belt and suspenders engineering)

Given a routing loop ... all of the datagrams sent to the destination will enter the loop and occupy a permanent portion of the capacity.

This is a real issue, and I think he's right, we probably need to keep the hop count (although see below for a counter-argument). My reasoning is that preventing looping data traffic is very desirable, since the side-effects are pretty bad. In the best engineered robust systems, there is redundancy for critical functions, and if the redundancy consists of two entirely separate systems, so much the better (provided the cost is not excessive). The nicest thing about the hop count, from my point of view, is that it represents an *entirely* separate mechanism for dealing with the issue of looping packets. It is a simple, very effective mechanism for catching and killing looping packets. Of course, if a loop forms among a group of routers which do not properly decrement the hop count, this mechanism would fail too. Looping packets can still consume a fair amount of resources if they loop around a short loop at the start of a long path, until they are caught and killed. Finally, it is a mechanism which does add a certain expense to the forwarding of packets.

This congestion would remain even after the source had ceased transmission

Yes and no. If a certain amount of bandwidth were allocated to datagrams, presumably the loop would cause there to be more offered load than capacity. This would cause packets to be dropped (via whatever drop algorithm), so over time these packets would in all probability decay, so I don't think they'd be there forever. This could be seen as an argument as to why a hop count is not really necessary; if routing loops are very, very infrequent, even a very poor and inefficient mechanism for dispensing with the resulting packets would be acceptable. However, a permanent loop could still cause severe problems, so I'm not sure I believe this argument.

could prevent both management operations and propagation of correct maps.

I'm not sure about this. I'd hope that a future network, as an aid to robustness, gives priority to management, operations and routing traffic over normal user traffic. So this shouldn't be an issue.

Keep the hop-count; people make mistakes, ergo maps can be inconsistent.

A map which was inconsistent with reality would only lead to a failure when someone tried to set up the flow to instantiate the DMF, so that's not a viable failure mode. However, there are probably potential misconfigurations which can result in loops, I just can't think of any right now. There are other failure modes (involving implementation bugs) which do result in loops though, so they aren't impossible.
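The "entirely separate mechanism" point is easy to see in a toy simulation; with a hop count, a packet caught in a forwarding loop dies after a bounded number of hops no matter why the loop formed (the topology and names below are invented for illustration):

    # B <-> C form a loop; the hop count kills the packet regardless.
    next_hop = {"A": "B", "B": "C", "C": "B"}

    def forward(src, dst, hop_limit=8):
        node, hops = src, 0
        while node != dst:
            if hops >= hop_limit:
                return "dropped: hop count expired"
            node, hops = next_hop[node], hops + 1
        return "delivered"

    print(forward("A", "D"))   # dropped: hop count expired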
(It's important to define the hop count in terms of all hops, active and non-active, lest someone try to define it as a VLF counter...)

Hmmm. Assuming each VLF is not a loop, then a count of VLF's should detect looping just as well as a count of actual hops. However, it's probably just as likely (due to code bugs, etc) to have a loop in the VLF, so a *completely independent* loop-detection mechanism at the lowest level (i.e. the hop count) is probably a good idea...

In short, a hop count may not be necessary, or even the most efficient engineering, but it is certainly very robust, since it provides an entirely separate redundant method of preventing looping packets. This alone may make it worth it for a global critical communication substrate.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa22559; 8 Jan 94 22:04 EST Received: from pizza by PIZZA.BBN.COM id aa15463; 8 Jan 94 21:53 EST Received: from BBN.COM by PIZZA.BBN.COM id aa15459; 8 Jan 94 21:51 EST Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa22413; 8 Jan 94 21:51 EST Received: by ginger.lcs.mit.edu id AA15671; Sat, 8 Jan 94 21:51:38 -0500 Date: Sat, 8 Jan 94 21:51:38 -0500 From: Noel Chiappa Message-Id: <9401090251.AA15671@ginger.lcs.mit.edu> To: jnc@ginger.lcs.mit.edu, mohta@necom830.cc.titech.ac.jp Subject: Re: New datagram mode Cc: nimrod-wg@BBN.COM

> And remember, this calculation doesn't happen in *all* intermediate routers, just "active" routers.

All routers are either active or down. So?

No. The term "active router" was defined in the original message as one that actually makes a decision about where to send the packet (as opposed to handling it as part of a flow). I.e., it's the router at the end of one DMF which chooses the next DMF to send the packet down. Perhaps a different term would be better, but I don't know what.

> The first step, the calculation of the DMF path, is obviously the expensive one, but that can happen before the packet actually arrives. In fact, I'd probably make calculating DMF paths the "idle task" for the router,

Then, the network will melt down upon congestion.

Again, no. Even if a router was fully busy handling traffic, it could still load-shed enough to calculate those routes, *if it had to*. However, I expect this will almost never happen. I don't think there is a *single* router in the entire Internet which is busy 100% of the time. In the usual case, which is what will almost always happen, these calculations will be performed with cycles which would have otherwise gone unused, so they are effectively "free" most of the time.

Why don't you try to avoid "obviously expensive" things, instead?

This statement represents extremely simplistic design thinking. The goal here is not to absolutely minimize the cost, since to do so usually involves getting rid of some benefit as well. The important question about a given feature is not just "how much does it cost", but "what does that cost buy you". For something as complex as the underlying data layer, it's difficult to assess exactly the benefits. I reckon the increased robustness is well worth it, particularly as the resource being consumed (processing cycles) is one which is increasing at fantastic rates. Cycles are cheap, and getting cheaper. There are also a number of lesser issues, such as the side-benefit that this can provide an easy way to create resource limits on datagram traffic, to prevent datagram traffic interfering with the resources allocated to other traffic. So, it's not that simple.
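One way to picture "calculating DMF paths as the idle task": user traffic always preempts the path computations, which are done with otherwise-unused cycles. A sketch in Python (the scheduling discipline and all names are assumptions, not a worked-out design):

    import collections

    packet_queue = collections.deque()                  # user traffic
    paths_to_compute = collections.deque(["dmf-1", "dmf-2"])
    precomputed_paths = {}

    def compute_path(dmf):     # stand-in for the expensive route calculation
        return ["hop-a", "hop-b"]

    def handle(pkt):           # stand-in for normal packet forwarding
        print("forwarded", pkt)

    def router_step():
        if packet_queue:           # real traffic always wins
            handle(packet_queue.popleft())
        elif paths_to_compute:     # idle cycles go to DMF path calculation
            dmf = paths_to_compute.popleft()
            precomputed_paths[dmf] = compute_path(dmf)

    router_step(); router_step()
    print(precomputed_paths)   # paths ready before any setup request arrives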
> The second step is the selection of the appropriate DMF when a packet
> arrives. This is a "longest match" table lookup,

Then, the lookup will be slow.

This is exactly the same as the table lookup done now in the Internet. I haven't noticed that it's amazingly slow. In fact, I'll bet Steve Deering will have a fit when you tell him that SIP forwarding is going to be "slow" because of it.

> As I have already said, this scheme has what I regard as a *fatal*
> disadvantage,

No, I don't think you have shown any disadvantage.

I regard the phrase immediately below as a severe disadvantage. We obviously disagree.

> in that it ties the abstraction hierarchy too closely to the
> physical topology.

Representing an area with border routers is the mathematically exact representation. The representation is the minimal necessary.

No, it's not the minimal representation. It's the minimal *definition*. If we assign unique names to these definitions, the unique names (i.e. the representation) may still be considerably shorter than the definitions.

Anything less than that lacks information, so it needs some amount of hand configuration and can't work against dynamic area subdivision or merge.

This has disadvantages too. Assigning new locators to everything due to a temporary partition, or a router outage, is a real pain.

The binding should be automatic. No hand configuration, please.

Things which are purely automatic are very inflexible. I don't believe a mandatory automatic algorithm of the form you suggest is good design for a system of this size. In addition, the setting of area boundaries is something which is almost certainly going to involve some configuration, or do you propose to automate that as well, in a way which utterly removes humans from the loop?

> You need to explain either i) why this goal is a bad goal, or ii) why
> some other advantage of your scheme outweighs the disadvantage of not
> meeting this goal.

i) Your goal is bad because the primary goal is "NO NEED OF CONTROL".

I'm not sure I agree that this is *the* primary goal, but I can meet this goal (which is essentially the goal of automatic configuration) without throwing away the goal of flexible control. Your scheme removes the possibility of flexible control.

All throughout Nimrod there are "necessary" algorithms, i.e. algorithms which must be there, but for which a particular algorithm is not a fundamental part of the architecture. Examples are route selection, etc, etc. Particular algorithms are not fundamental precisely to allow local control, experimentation, replacement, etc. If we make the algorithm for naming of areas a similar local option, we can certainly define a "default" algorithm for picking area names which *is* autoconfiguring, i.e. your goal. If people then don't like the results, they are free to substitute some other algorithm. By making the process of assigning names to areas a mechanical one which precludes any possibility of changing it, you are removing that choice.

Your goal is bad because the definition of "controllable" is not given. With an arbitrary definition of "controllable", any scheme, including mine, is "controllable".

OK, I have a very simple definition of "controllable" for you, then. An object's locator should not change *automatically* when a particular router (e.g. any of the border routers of an enclosing area) is taken out of service, or a similar minor topology change is made. Your scheme does not have this property.
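The difference can be made concrete with a toy Python example (all values invented): a name derived from the border routers' EIDs changes when one of them is withdrawn, while an opaque, assigned name does not:

    border_eids = {"eid-17", "eid-29", "eid-44"}
    derived_name = min(border_eids)    # e.g. a "lowest EID" naming scheme
    opaque_name = "area-Q"             # assigned name, not derived

    border_eids.discard("eid-17")      # a border router is taken out of service
    assert min(border_eids) != derived_name   # the derived locator changed...
    assert opaque_name == "area-Q"            # ...the opaque name did not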
Moreover, with any scheme including mine, you can, at least, control the configuration of the area hierarchy, anyway.

This is irrelevant, since what I object to is the close tie between the configuration and the resulting locators, not the process of configuration.

>> how intermediate routers can forward packets without a lot of effort
>> for flow setup?

> There is a certain amount of effort involved in getting ready to forward
> packets, even in a hop-by-hop system.

If you want to say flow setup does not need much effort, you should use flows set up by end hosts even for connectionless communications.

Overhead of a flow set up by a host on an end-end basis for a datagram is all borne by that single datagram, whereas the overhead for path calculation, setup, etc. for a DMF between two routers is shared among all the datagrams which go from one router to another. I would have thought that the difference, and thus the advantage of DMF's for datagrams, would have been obvious.

>> I think both will result in O(N^2) behaviour.

> if you are talking about the amount of state required to hold the DMF's
> it is a more complex calculation than that. There is unlikely to be a
> single router through which all the flows will pass.

Within a single area, concentration is likely to happen.

This is an assertion for which you provide no evidence, and which does not seem at all correct to me.

> In addition, the "N" here is a function of the size of the area, etc, so
> the amount of state can be controlled by controlling the area size, etc.

So, if the area size is 1,000, you will get 1,000,000 connections.

No, your calculation is completely wrong. You don't appear to understand the mechanism at all. I will perform the detailed calculation in a reply to your message to John Wagner, since this message is already quite long.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa27025; 8 Jan 94 23:15 EST
Received: from pizza by PIZZA.BBN.COM id aa15662; 8 Jan 94 23:06 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa15658; 8 Jan 94 23:03 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa26820; 8 Jan 94 23:04 EST
Received: by ginger.lcs.mit.edu id AA15816; Sat, 8 Jan 94 23:04:19 -0500
Date: Sat, 8 Jan 94 23:04:19 -0500
From: Noel Chiappa
Message-Id: <9401090404.AA15816@ginger.lcs.mit.edu>
To: jwagner@princeton.edu, mohta@necom830.cc.titech.ac.jp, nimrod-wg@BBN.COM
Subject: Re: New datagram mode
Cc: jnc@ginger.lcs.mit.edu

> Still, with your scheme, all the routers must be connected. Thus, it is
> O(N^2).

I think you are misunderstanding Noel's scheme. ... It does not require that all routers be connected to all other routers

It's not clear to me exactly what he thinks is O(N^2): whether it is the total number of flows needed (which is a relatively uninteresting number, actually), or the number of flows passing through any particular router (which *is* the important number), or what. But you're right, it does not require complete connectivity, although the actual math is rather complex, and depends on a number of parameters.

It requires that all routers have at least 2 connections (this level and the next level up) but does not require more than 3 connections (this level, the next level up, and the next level down).

The latter number is incorrect, since there can be more than one "next level down"; i.e. if I'm a border router for A, and A contains A.1 .. A.N, I need N flows (people using the word "connection" on this mailing list will be forced to wash off their keyboards with soap :-), one to each A.i in A.
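A toy Python rendering of that counting rule (the function and argument names are invented for illustration): an interior router needs a single DMF up to a border router, while a border router of A needs one DMF per constituent object it is not already inside:

    def minimal_dmf_count(is_border: bool, n_constituents: int = 0,
                          n_already_inside: int = 0) -> int:
        if not is_border:
            return 1    # one DMF up to a border router, and that's it
        return n_constituents - n_already_inside

    assert minimal_dmf_count(is_border=False) == 1
    assert minimal_dmf_count(is_border=True, n_constituents=10) == 10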
The funny part of your scheme is that you've created a variation on NJE routing as practiced in Bitnet.

This is interesting. Can you provide a reference?

These "flows" are computed globally and the maps distributed as flat files.

If I understand you correctly, these routes/paths are computed at one central location? Yes, in that way it is similar, but there are important differences; the "next hop" selection still seems to be hop-by-hop, since you refer to the failure mode "when maps don't get updated in synch". Use of set-up flows (where the state is installed in a reliable way, over the network) avoids this.

Should Nimrod perform dynamic loop detection?

As far as I can tell, loops could only form in the presence of malfunctioning software; simple timing errors, packet loss, etc. can't do it. I need to think about this for a while, though. Obviously, it's a lot trickier to handle faults when arbitrarily malfunctioning switches can be involved...

Using your scheme, is the following possible? I have packets following a predefined flow, but there is congestion in a part of the path that flow follows. Can I add a different path through the physical topology and split the flow over the two paths to dynamically increase the size of the pipe but still use the same flow identification?

Hmm, good question. I think it depends on how widely you wish to reroute, and how the flow was set up originally. Nimrod does allow local repair in some circumstances, and this could be considered as a "repair". For instance, if you have a virtual link which is built out of a collection of physical links, you can move the flow from one link to another without violating the "contract" which was made in setting up the flow, since the traffic continues to flow over the virtual link which the source specified; the internal details are hidden. The original application for this was if one of the links failed, but it is equally applicable if one of the links becomes congested. Splitting the load across several links in a circumstance like this is somewhat different, but would still be fine.

Obviously, if the source specifically called for physical link X, and that link becomes congested, it would be a violation of the "contract" to move it to another link without letting the source know.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa05764; 9 Jan 94 4:24 EST
Received: from pizza by PIZZA.BBN.COM id aa16600; 9 Jan 94 4:09 EST
Received: from nic.near.net by PIZZA.BBN.COM id aa16596; 9 Jan 94 4:06 EST
Received: from lager.cisco.com by nic.near.net id aa16823; 9 Jan 94 4:07 EST
Received: by lager.cisco.com id AA11605 (5.67a/IDA-1.5 for nimrod-wg@nic.near.net); Sun, 9 Jan 1994 01:07:31 -0800
Date: Sun, 9 Jan 1994 01:07:31 -0800
From: Tony Li
Message-Id: <199401090907.AA11605@lager.cisco.com>
To: Noel Chiappa
Cc: jnc@ginger.lcs.mit.edu, kasten@ftp.com, nimrod-wg@nic.near.net
Subject: Re: New datagram mode

could prevent both management operations and propagation of correct maps.

I'm not sure about this. I'd hope that a future network, as an aid to robustness, gives priority to management, operations and routing traffic over normal user traffic. So this shouldn't be an issue.

From an entirely pragmatic point of view, this turns out to be very difficult to do today. Classification of what's "important" is challenging, and can only happen after the packet is already in the box, consuming buffer memory. It's a nice idea, but not something that I think will be pervasive anytime soon.
Tony

Received: from PIZZA.BBN.COM by BBN.COM id aa08821; 9 Jan 94 7:31 EST
Received: from pizza by PIZZA.BBN.COM id aa17134; 9 Jan 94 7:20 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa17130; 9 Jan 94 7:18 EST
Received: from necom830.cc.titech.ac.jp by BBN.COM id aa08665; 9 Jan 94 7:18 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Sun, 9 Jan 94 21:14:11 +0900
From: Masataka Ohta
Return-Path:
Message-Id: <9401091214.AA14974@necom830.cc.titech.ac.jp>
Subject: Re: New datagram mode
To: Noel Chiappa
Date: Sun, 9 Jan 94 21:14:09 JST
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
In-Reply-To: <9401090251.AA15671@ginger.lcs.mit.edu>; from "Noel Chiappa" at Jan 8, 94 9:51 pm
X-Mailer: ELM [version 2.3 PL11]

> > And remember, this calculation doesn't happen in *all* intermediate
> > routers, just "active" routers.
>
> All routers are either active or down. So?
>
> No. The term "active router" was defined in the original message as one that
> actually makes a decision about where to send the packet (as opposed to
> handling it as part of a flow).

That's a router. The other, which operates only in the data link layer, is not.

> > The first step, the calculation of the DMF path, is obviously the
> > expensive one, but that can happen before the packet actually arrives.
> > In fact, I'd probably make calculating DMF paths the "idle task" for the
> > router,

> Again, no. Even if a router was fully busy handling traffic, it could still
> load-shed enough to calculate those routes, *if it had to*. However, I expect
> this will almost never happen. I don't think there is a *single* router in the
> entire Internet which is busy 100% of the time. In the usual case, which is
> what will almost always happen, these calculations will be performed with
> cycles which would have otherwise gone unused, so they are effectively "free"
> most of the time.

If you don't think the CPU will be 100% loaded, it is a waste of bandwidth to say something should be an "idle task".

> Why don't you try to avoid "obviously expensive" things, instead?
>
> This statement represents extremely simplistic design thinking. The goal here
> is not to absolutely minimize the cost, since to do so usually involves
> getting rid of some benefit as well. The important question about a given
> feature is not just "how much does it cost", but "what does that cost buy
> you".

But, with your vague description, I can't evaluate the cost. What, for example, is the cost of your imaginary DNS TNG?

> For something as complex as the underlying data layer, it's difficult to
> assess exactly the benefits.

That's one of the reasons why we should avoid a complex underlying data layer.

> I reckon the increased robustness is well worth it,

The simpler, the more robust.

> particularly as the resource being consumed (processing cycles) is one
> which is increasing at fantastic rates. Cycles are cheap, and getting cheaper.

Do you remember that you also said bandwidth will be cheaper? The amount of processing has nothing to do with the complexity of processing. For example, instead of always sending all the data, you can send a delta against the previous data, which decreases CPU load and link load and increases the complexity of the protocol.

Why did you object to source routing, then? Wasn't your reason that it will consume more bandwidth and CPU? While I don't think source routing is at all costly, your reasoning contains too many contradictions.
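The delta point above can be illustrated with a toy Python fragment (the dictionaries are invented map data, not any protocol's format): only the entries that changed since the last update cross the wire, at the price of a protocol that must now track previous state:

    prev = {"A": 1, "B": 2, "C": 3}            # last update sent
    curr = {"A": 1, "B": 5, "C": 3, "D": 7}    # current state

    delta = {k: v for k, v in curr.items() if prev.get(k) != v}
    removed = [k for k in prev if k not in curr]

    assert delta == {"B": 5, "D": 7}   # only the changes are transmitted
    assert removed == []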
> There are also a number of lesser issues, such as the side-benefit that this
> can provide an easy way to create resource limits on datagram traffic, to
> prevent datagram traffic interfering with the resources allocated to other
> traffic. So, it's not that simple.

Here, your terminology is incorrect. Seemingly, you don't understand the difference between "connectionless communication" and "datagram communication", which should be the source of your confusion. Perhaps your scheme is merely a scheme to handle connected datagrams only.

There does exist connected datagram communication, to which some resource is pre-allocated along a flow to ensure maximum bandwidth or minimum latency. In this case, all datagrams will have a pre-assigned flow ID.

So, your statement should be:

< There are also a number of lesser issues, such as the side-benefit that this
< can provide an easy way to create resource limits on connectionless
< traffic, to prevent connectionless traffic interfering with the
< resources allocated to connected traffic. So, it's not that simple.

BTW, it is absurd to impose resource limits on connectionless communications. If the actual connected traffic is less than the pre-allocation, the extra bandwidth should be used for connectionless traffic. Anyway, bandwidth allocation is not so complex.

> > The second step is the selection of the appropriate DMF when a packet
> > arrives. This is a "longest match" table lookup,
>
> Then, the lookup will be slow.
>
> This is exactly the same as the table lookup done now in the Internet. I
> haven't noticed that it's amazingly slow. In fact, I'll bet Steve Deering will
> have a fit when you tell him that SIP forwarding is going to be "slow" because
> of it.

So, let's use my exact match scheme with source routing by EIDs of border routers.

> > As I have already said, this scheme has what I regard as a *fatal*
> > disadvantage,
>
> No, I don't think you have shown any disadvantage.
>
> I regard the phrase immediately below as a severe disadvantage. We obviously
> disagree.

Severe? Why? Didn't you think CPU and bandwidth are cheap and getting cheaper?

Moreover, I don't think packets should have the full representation of an area. Packets should have a source route of border routers, instead, which is a lot shorter.

> > in that it ties the abstraction hierarchy too closely to the
> > physical topology.
>
> Representing an area with border routers is the mathematically exact
> representation. The representation is the minimal necessary.
>
> No, it's not the minimal representation. It's the minimal *definition*. If we
> assign unique names to these definitions, the unique names (i.e. the
> representation) may still be considerably shorter than the definitions.

It's the minimal representation of an area. Anything shorter needs external information such as a mapping between Area_ID and the EIDs of border routers. The lack of such a mapping is the fatal defect of your scheme.

> Anything less than that lacks information, so it needs some
> amount of hand configuration and can't work against dynamic area
> subdivision or merge.
>
> This has disadvantages too. Assigning new locators to everything due to a
> temporary partition, or a router outage, is a real pain.

That is the disadvantage of your scheme, not mine. Your locator won't work against area subdivision. I don't think we need any locator, actually. With my scheme, slightly old but stable locator information is stored in DNS.
On the other hand, routing information exchange will provide the up-to-date real topology of areas.

> The binding should be automatic. No hand configuration, please.
>
> Things which are purely automatic are very inflexible. I don't believe a
> mandatory automatic algorithm of the form you suggest is good design for
> a system of this size.

I don't think a lot of hand configuration is possible for a system of this size.

> In addition, the setting of area boundaries is something which is almost
> certainly going to involve some configuration,

So? That's what I wrote.

> Moreover, with any scheme including mine, you can, at least, control
> the configuration of the area hierarchy, anyway.
>
> This is irrelevant, since what I object to is the close tie between the
> configuration and the resulting locators, not the process of configuration.

> Your goal is bad because the definition of "controllable" is not given.
> With an arbitrary definition of "controllable", any scheme, including
> mine, is "controllable".
>
> OK, I have a very simple definition of "controllable" for you, then. An
> object's locator should not change *automatically* when a particular router
> (e.g. any of the border routers of an enclosing area) is taken out of service,
> or a similar minor topology change is made. Your scheme does not have this
> property.

As I have written several times, the locator information of an object is accessible through DNS, which should not, cannot, and thus does not change *automatically*. DNS gives information on "all the possible paths".

On the other hand, topology information on configuration within areas and connectivity between areas MUST change *automatically*. Routing information gives information on "all the available paths".

An end system should/can/does construct a source routing path combining information on "all the possible paths", "all the available paths" and preferred policy.

On the other hand, if you use shorthanded area IDs, if an area is subdivided, you must change objects' locators *automatically*. That is, such a scheme is NOT controllable.

> If you want to say flow setup does not need much effort, you should
> use flows set up by end hosts even for connectionless communications.
>
> Overhead of a flow set up by a host on an end-end basis for a datagram is
> all borne by that single datagram, whereas the overhead for path calculation,
> setup, etc. for a DMF between two routers is shared among all the datagrams
> which go from one router to another.

Thus, you must assume all the routers are connected, here, which is O(N^2).

> I would have thought that the difference, and thus the advantage of DMF's
> for datagrams, would have been obvious.

One reason for your confusion is that you think of connected datagrams only.

> >> I think both will result in O(N^2) behaviour.
>
> > if you are talking about the amount of state required to hold the DMF's
> > it is a more complex calculation than that. There is unlikely to be a
> > single router through which all the flows will pass.
>
> Within a single area, concentration is likely to happen.
>
> This is an assertion for which you provide no evidence, and which does not
> seem at all correct to me.

That's the topology of most network providers today.

> > In addition, the "N" here is a function of the size of the area, etc, so
> > the amount of state can be controlled by controlling the area size, etc.
>
> So, if the area size is 1,000, you will get 1,000,000 connections.
>
> No, your calculation is completely wrong.
> You don't appear to understand the mechanism at all. I will perform the
> detailed calculation in a reply to your message to John Wagner, since this
> message is already quite long.

The problem is that you wrongly think there is some "real flow" with connectionless communication.

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa18802; 9 Jan 94 16:34 EST
Received: from pizza by PIZZA.BBN.COM id aa18517; 9 Jan 94 16:17 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa18513; 9 Jan 94 16:15 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa18320; 9 Jan 94 16:15 EST
Received: by ginger.lcs.mit.edu id AA17145; Sun, 9 Jan 94 16:15:15 -0500
Date: Sun, 9 Jan 94 16:15:15 -0500
From: Noel Chiappa
Message-Id: <9401092115.AA17145@ginger.lcs.mit.edu>
To: jwagner@princeton.edu, mohta@necom830.cc.titech.ac.jp
Subject: Analysis of DMF's in new datagram mode
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM

> It requires that all routers have at least 2 connections (this level and
> the next level up)

In which case, packets are relayed hop-by-hop.

His statement about "2 connections (sic, grrr :-) - this level and the next level up" is a little confusing to me. How many DMF's a router needs *in the minimum possible configuration* depends on whether or not it is a border router.

For routers which are not border routers, they need one DMF out to a border router (i.e. up), and that's it. Traffic to other objects inside the area can be handled by sending it to the border router, which will send it back to the correct object. (Hey, it's not very optimal routing, but I'm talking about the *minimal* set, right?)

Border routers require one DMF to each constituent object in their area, other than ones which the border router is already a constituent of. If an object is in itself an area (i.e. a sub-area of the area in question), with multiple border routers, they only need one DMF to that object, to any one of the border routers of the sub-area.

Obviously, routers which are border routers for a K level area are also either border or interior routers in a K+1 level area; if interior, they will need only the 1 DMF, but if they are border routers at that level as well they will need DMF's for all objects in that K+1 level area which they are not constituents of.

Even if the physical topology is a mesh of routers at the same level, all routers must be directly connected by flows, if you want to route packets through a flow without hop-by-hop relaying.

No. Let's do the calculation; I'm tired of these incorrect assertions.

Let's talk about an area which has a total of B border routers, and I interior objects. A percentage P of the objects are themselves sub-areas, so there are (1-P)*I interior non-sub-area objects, and P*I sub-areas. For the sub-areas, each has an average of S border routers. For the sake of simplicity, let's assume that *none* of the border routers is part of any sub-area. So, the resulting numbers are worst-case, since any border router which *is* part of a sub-area removes the need for a DMF to that sub-area; the packet is already in that sub-area.

Each of the B border routers has I DMF's associated with this area, for a total of B*I. Each of the interior routers has a single DMF. The non-area routers contribute (1-P)*I, and border routers of sub-areas contribute P*I*S.
So, the total number of DMF's, Ft, associated with that area is:

Ft = I * (B + (1-P) + P*S)

Exactly how fast this will grow as the area grows is thus somewhat tricky, and can't be computed without some assumptions about the relationship between B and I, etc. However, we can take a crack at it. We can pretty well assume that P is a constant, and we can assume either i) that S is a constant, since the area grows by creating more sub-areas, not making the existing sub-areas larger, or ii) that S grows at the same rate as B (since they both represent the count of border routers, just at different levels). Let's take the worst case, which is that O(B) = O(S). The growth rate of the total number of DMF's, O(Ft), is:

O(Ft) = O(I) * O((1 + P)*B)

or:

O(Ft) = O(I) * k*O(B)

I don't know the exact formula for the number of border nodes on a graph with a constant degree of connection between the nodes, but I'll hazard the guess that the number of border nodes will grow as the square root of the number of total nodes. (Handwave justification for this is that it's probably the same as the ratio of the circumference of a circle to the area of a circle, since you can model a graph with a fixed degree of connectivity among the nodes as a geometrical figure where each node takes a fixed area, and thus the area is proportional to the number of nodes.) So, that gives us that O(B) is O(sqrt(I)). So, finally we get that:

O(Ft) = k*O(I^1.5)

or just plain O(I^1.5) for short.

Now, that's for one level. Note that if we grow the system by adding more levels of area (a likely happening), rather than increasing I, then I is a constant, so O(I) = 1, so there is no growth in Ft at all!

Now, that was the total number of DMF's in the area. That's a relatively uninteresting number, for reasons which should be intuitively obvious to everyone. What's *important* are the number of DMF's which end in a router, and the number of DMF's which go through (i.e. require state for storage) interior routers.

Let's look at the number of DMF's which end in a router, Fe, first. We can ignore interior routers, since they are a simple case; Fe = 1. The hard case is the border routers. There, Fe = I, but that only counts DMF's for this area. If a router is a border router for a number of levels of area, it will have:

Fe = sum(Ii), i=l...m

where Ii is the value of I for the area at level i, and l and m are the bounds on the levels of area for which that router is a border router. In the worst case, l=0, and m=L, where L is the maximum number of levels in the system. So, for that worst case:

Fe = sum(Ii), i=0..L

If, for simplicity's sake, we assume that all Ii average I, then:

Fe = I*L

and:

O(Fe) = O(I) * O(L)

Exactly what O(I) and O(L) are remains to be seen, but we can grow the system by holding I constant, and growing L. As a matter of fact, if N is the total number of nodes in the system, and I is constant, then O(L) = logN. So, in this worst case scenario:

O(Fe) = O(logN)

if we don't grow areas, but instead grow the number of levels.

To turn to the average number of DMF's through a router, Fa, we can calculate that if we know the average path length for a DMF. The average number of DMF's is given by:

Fa = (total number of DMF's * average DMF length) / (total number of routers)

The average path length, A, for a graph of fixed degree (i.e.
one in which nodes have the same average number of arcs to neighbouring nodes, independent of the size of the graph) is logN (where N is the number of nodes). [Chen 86] This seems like a reasonable model to use, since physical routers have a small, relatively constant average number of interfaces.

We can use the number of interior objects (I), plus the number of border routers (B), as the number of nodes in calculating the path length. So, that gives us:

A = log(I + B)

However, if we assume that O(B) = O(sqrt(I)), as in the calculation of Ft above, then B tends to become irrelevant, and:

O(A) = O(logI)

We already deduced that the total number of flows, Ft, was:

Ft = I * (B + (1-P) + P*S)

and that:

O(Ft) = O(I^1.5)

What exactly to call the "total number of nodes" is a little tricky, and here's where I have to do some even more energetic handwaving than usual. Strictly speaking, we can't simply use the number of interior objects (I), plus the number of border routers (B). The problem is that for the cases where a DMF traverses a sub-area, it will traverse two border routers on that sub-area, as well as some interior routers in that sub-area. However, I will make the simplifying assumption that the "path length" calculation above already assumed that each sub-area counted as one "node", and errors above and below the division will cancel out. So, that gives us that the total number of routers, R, is:

R = (I + B)

and again, assuming that B becomes irrelevant:

O(R) = O(I)

So, since:

O(Fa) = (O(Ft) * O(A)) / O(R)

substituting in for O(Ft), etc., gives us that:

O(Fa) = (O(I^1.5) * O(logI)) / O(I)

or:

O(Fa) = O(sqrt(I) * logI)

Again, of course, this only counts the flows from one level. I could make some assumptions based on the idea that we will use an "interstate" physical topology, where long-distance connectivity is provided by a physical mesh at the high levels, not by using the "local" meshes down an arbitrarily large number of levels, but only a few levels down; but my head started to hurt at that point; let's just do the simple thing for now. If we assume the worst case, that we have flows from all levels L, giving us a new count of DMF's, Fat, then we make the above quantity worse by a factor of O(L), i.e.:

O(Fat) = O(sqrt(I) * logI) * O(L)

As stated above, exactly what O(I) and O(L) are remains to be seen, but again, we can grow the system by holding I constant, and growing L. Again, I is constant, and O(L) = logN, so, again in this worst case scenario:

O(Fat) = O(logN)

if we don't grow areas, but instead grow the number of levels. I suppose I could sit and think about what the relationship between N, L and I would be, if we decide to grow areas, and not the number of levels, but I don't think it's worth it. However you slice it, these are *not* major growth rates.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa18810; 9 Jan 94 16:34 EST
Received: from pizza by PIZZA.BBN.COM id aa18541; 9 Jan 94 16:19 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa18537; 9 Jan 94 16:18 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa18404; 9 Jan 94 16:18 EST
Received: by ginger.lcs.mit.edu id AA17165; Sun, 9 Jan 94 16:18:22 -0500
Date: Sun, 9 Jan 94 16:18:22 -0500
From: Noel Chiappa
Message-Id: <9401092118.AA17165@ginger.lcs.mit.edu>
To: nimrod-wg@BBN.COM
Subject: My math
Cc: jnc@ginger.lcs.mit.edu

I'll bet there are at least Z math errors in that last long message about the number of DMF's needed.
I just typed it in, in one fell swoop, and nobody has checked it, so I'd be amazed if there weren't any errors! If anyone sees any, I *am* interested in hearing about them.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa02480; 10 Jan 94 1:41 EST
Received: from pizza by PIZZA.BBN.COM id aa20062; 10 Jan 94 1:23 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa20058; 10 Jan 94 1:21 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa01788; 10 Jan 94 1:04 EST
Received: by ginger.lcs.mit.edu id AA19705; Mon, 10 Jan 94 01:04:07 -0500
Date: Mon, 10 Jan 94 01:04:07 -0500
From: Noel Chiappa
Message-Id: <9401100604.AA19705@ginger.lcs.mit.edu>
To: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
Subject: Re: My math

I'll bet there are at least Z math errors in that last long message about the number of DMF's needed.

Well, I just found one error, but it wasn't just math. In the calculation of the number of DMF's which end in a router, Fe, I counted the origins of DMF's, but not the terminations. Here's a replacement for that section of the note, with that error corrected. It turns out that the results are the same, except that if you want the *exact* value for O(Fe) there is a constant (1 + P) factor. Big deal.

Noel

--------

Let's look at the number of DMF's which end in a router, Fe, first. For interior routers, the count is the 1 outgoing DMF, plus incoming DMF's from the border routers. There are two subcases: for routers which are not border routers of sub-areas, all B border routers have a DMF to the router, so:

Fei = (1 + B)

For routers which are border routers of sub-areas, the incoming DMF's are shared among the border routers of the sub-area; assuming the distribution is even, this gives us:

Fes = (1 + B/S)

Note that in the worst case distribution, this is only as bad as the case above, so we can ignore this case.

The border routers are a slightly harder case. There are I outgoing DMF's, and the incoming DMF's from the interior routers are shared among all the border routers. Again, assuming an even distribution, this gives us:

Feb = (I + ((1-P)*I + P*I*S)/B)

or:

Feb = I * (1 + ((1-P) + P*S)/B)

Again, we can assume that P is a constant, and we can assume the worst case, which is that O(B) = O(S). This gives us that:

O(Feb) = O(I) * O(1 + (c1 + P*B)/B)

or:

O(Feb) = O(I) * O(1 + P*B/B)

or:

O(Feb) = (1 + P) * O(I)

This makes intuitive sense, since we know that the DMF's from interior routers which aren't border routers of sub-areas are shared among all border routers of that area, whereas each border router has to maintain a DMF in to each of those routers, so that term drops off in importance. On the other hand, since the number of interior routers which are border routers of sub-areas is growing as fast as the number of border nodes, that term will remain.

Since we know that O(B) = O(sqrt(I)), in the long run, as areas get large, the particular Fe which will experience the highest growth rate is Feb, i.e. the number of DMF's which end in a border router. This also makes good sense. That growth rate is just plain old O(I) for short.

The analysis above only counts DMF's for this area. If a router is a border router for a number of levels of area, it will have:

Febm = (1 + ((1-P) + P*S)/B) * sum(Ii), i=l...m

where Ii is the value of I for the area at level i, and l and m are the bounds on the levels of area for which that router is a border router. In the worst case, l=0, and m=L, where L is the maximum number of levels in the system.
So, for that worst case:

Febm = (1 + ((1-P) + P*S)/B) * sum(Ii), i=0..L

If, for simplicity's sake, we assume that all Ii average I, then:

Febm = (1 + ((1-P) + P*S)/B) * I*L

and, using the same analysis as above:

O(Febm) = O(I) * O(L)

Exactly what O(I) and O(L) are remains to be seen, but we can grow the system by holding I constant, and growing L. As a matter of fact, if N is the total number of nodes in the system, and I is constant, then O(L) = logN. So, in this worst case scenario:

O(Febm) = O(logN)

if we don't grow areas, but instead grow the number of levels.

Received: from PIZZA.BBN.COM by BBN.COM id aa03421; 10 Jan 94 12:27 EST
Received: from pizza by PIZZA.BBN.COM id aa22308; 10 Jan 94 12:12 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa22304; 10 Jan 94 12:09 EST
Received: from necom830.cc.titech.ac.jp by BBN.COM id aa02133; 10 Jan 94 12:08 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Tue, 11 Jan 94 02:03:54 +0859
From: Masataka Ohta
Return-Path:
Message-Id: <9401101704.AA23795@necom830.cc.titech.ac.jp>
Subject: Re: Analysis of DMF's in new datagram mode
To: Noel Chiappa
Date: Tue, 11 Jan 94 2:03:53 JST
Cc: jwagner@princeton.edu, jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
In-Reply-To: <9401092115.AA17145@ginger.lcs.mit.edu>; from "Noel Chiappa" at Jan 9, 94 4:15 pm
X-Mailer: ELM [version 2.3 PL11]

> For routers which are not border routers, they need one DMF out to a border
> router (i.e. up), and that's it. Traffic to other objects inside the area can
> be handled by sending it to the border router, which will send it back to the
> correct object. (Hey, it's not very optimal routing, but I'm talking about
> the *minimal* set, right?)

Wrong. There will be load concentration at border routers, then.

> What's *important* are the number of DMF's which end in a router,
> and the number of DMF's which go through (i.e. require state for storage)
> interior routers.

The trick here is that not all DMFs are equal. That is, a higher level DMF means a lot more traffic through it. That's why you can't increase the number of levels at will. Your model is not scalable.

> The average path length, A, for a graph of fixed degree (i.e. one in which
> nodes have the same average number of arcs to neighbouring nodes, independent
> of the size of the graph) is logN (where N is the number of nodes). [Chen 86]

Your reasoning is already broken... But...

While such an average over all the possible graphs will be so, as your topology is a mesh, you should average only over planar graphs. Thus, the average path length will be sqrt(N).

It should have been obvious from the beginning when you said "MESH".

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa05130; 10 Jan 94 12:57 EST
Received: from pizza by PIZZA.BBN.COM id aa22371; 10 Jan 94 12:34 EST
Received: from nic.near.net by PIZZA.BBN.COM id aa22367; 10 Jan 94 12:32 EST
Received: from GINGER.LCS.MIT.EDU by nic.near.net id aa08593; 10 Jan 94 12:33 EST
Received: by ginger.lcs.mit.edu id AA21215; Mon, 10 Jan 94 12:32:40 -0500
Date: Mon, 10 Jan 94 12:32:40 -0500
From: Noel Chiappa
Message-Id: <9401101732.AA21215@ginger.lcs.mit.edu>
To: jnc@ginger.lcs.mit.edu, tli@cisco.com
Subject: Re: New datagram mode
Cc: kasten@ftp.com, nimrod-wg@nic.near.net

> I'd hope that a future network, as an aid to robustness, gives priority
> to management, operations and routing traffic over normal user traffic.

this turns out to be very difficult to do today.
Classification of what's "important" is challenging, and can only happen after the packet is already in the box, consuming buffer memory.

Yeah, as processing of packets gets faster and faster, it gets harder and harder to do anything complicated while you have your hands on them!

I would expect that part of the solution to the conundrum you raise is the deployment of a "real" resource allocation and congestion control system at the internetwork layer, together with flows. For instance, in a fast hardware router (such as my "Faswitch" device), this would enable you to segregate different packet streams into separate buffer pools in hardware, partially avoiding the problem you mention.

A general strategy for handling this problem is to have a small "incoming" pool, and divide up the main pool into "transit" and "operation and maintenance". When you go congested, you only allow packets out of the incoming pool if there is space in the O&M pool. If not, transit packets get flushed as soon as they hit the box. The users won't like it, but, hey, tough, the network has to protect itself.

Of course, you can always think up peculiar traffic patterns that will make even this not work; e.g. an overload of O&M traffic. However, I don't think there's anything *fundamental* that says we can't offer priority to O&M traffic, and I think a robust design ought to make every effort to do so...

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa08039; 10 Jan 94 13:46 EST
Received: from pizza by PIZZA.BBN.COM id aa22578; 10 Jan 94 13:28 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa22574; 10 Jan 94 13:26 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa06734; 10 Jan 94 13:26 EST
Received: by ginger.lcs.mit.edu id AA21447; Mon, 10 Jan 94 13:25:45 -0500
Date: Mon, 10 Jan 94 13:25:45 -0500
From: Noel Chiappa
Message-Id: <9401101825.AA21447@ginger.lcs.mit.edu>
To: jnc@ginger.lcs.mit.edu, mohta@necom830.cc.titech.ac.jp
Subject: Re: New datagram mode
Cc: nimrod-wg@BBN.COM

> No. The term "active router" was defined in the original message as one
> that actually makes a decision about where to send the packet (as
> opposed to handling it as part of a flow).

That's a router. The other, which operates only in the data link layer, is not.

The other ones are *not* operating only in the data link layer. They throw away the physical network header, look at the internetwork header to figure out what to do with the packet, and create a new physical network header. Sure sounds like a "router" to me. The only difference is they don't get to make decisions about which path this particular packet stream will use. I.e. the selector is a flow-identifier, not a locator.

What, for example, is the cost of your imaginary DNS TNG?

The DNS is not involved in any way with the new datagram mode, other than providing the locator to the host before the packet is sent, *exactly* the way the DNS works now. Since this cost is currently deemed acceptable, I would assume it will be acceptable in the future. Perhaps I do not understand your question?

> I reckon the increased robustness is well worth it

The simpler, the more robust.

Excuse me while I roll on the floor laughing. A common technique of building engineers is to provide *redundant* load paths, so that if one fails, the building will not collapse. Redundancy is *not* simpler, but it is more robust. Your assertion is completely wrong.

Why did you object to source routing, then? Wasn't your reason that it will consume more bandwidth and CPU?
I don't like source routing because the costs of source routing must be paid on *each* packet, whereas the overhead of calculating and setting up DMF's can be amortized over many datagrams.

> this can provide an easy way to create resource limits on datagram
> traffic, to prevent datagram traffic interfering with the resources
> allocated to other traffic.

Perhaps your scheme is merely a scheme to handle connected datagrams only.

No, it is a scheme *precisely* to handle datagrams used in "one-shot" applications such as DNS lookups, etc.

So, your statement should be:

< the side-benefit that this can provide an easy way to create resource
< limits on connectionless traffic, to prevent connectionless traffic
< interfering with the resources allocated to connected traffic.

If I understand what you mean correctly, this says what I thought I was saying. I don't like to use the term "connections" because then everyone starts thinking I mean something like X.25, where critical state is in the switches. I think the problem is that we don't have good terminology for "datagrams which are part of ongoing data streams" (e.g. the IP datagrams which are part of a TCP connection), and "datagrams which are *not* part of ongoing data streams" (e.g. UDP DNS traffic). Anyone have any suggestions?

BTW, it is absurd to impose resource limits on connectionless communications. If the actual connected traffic is less than the pre-allocation, the extra bandwidth should be used for connectionless traffic.

In general, I think most people would like the extra bandwidth to be "up for grabs" among all clients, not just the datagram clients. I'm not saying that datagrams ought to have a resource limit. I'm just thinking of the practical difficulties in ensuring that datagram traffic does not interfere with resources allocated to flows. The "trick" of assigning datagrams to a flow, and then dividing up the bandwidth among flows, allows us a simpler bandwidth allocation mechanism, as a practical matter.

>> Representing an area with border routers is the mathematically exact
>> representation. The representation is the minimal necessary.

> No, it's not the minimal representation. It's the minimal *definition*.
> If we assign unique names to these definitions, the unique names (i.e.
> the representation) may still be considerably shorter than the
> definitions.

It's the minimal representation of an area.

I see. So the minimal representation of a book is the entire book?

Anything shorter needs external information such as a mapping between Area_ID and the EIDs of border routers. The lack of such a mapping is the fatal defect of your scheme.

I'm unclear as to exactly why this mapping is needed. We assign topology aggregates certain names, and use those names consistently i) when distributing topology information, and ii) when specifying routes. Everything is done in terms of those names. When is the mapping needed?

Your locator won't work against area subdivision.

I see. So I assume IS-IS won't work if a level 1 area is partitioned, since it uses arbitrary labels for level 1 areas as well?

I don't think a lot of hand configuration is possible for a system of this size.

I see. Well, I'm glad to know that the world telephone system doesn't use a lot of hand-configuration.

As I have written several times, the locator information of an object is accessible through DNS, which should not, cannot, and thus does not change *automatically*. DNS gives information on "all the possible paths".
As the topology changes, the set of "all possible paths" will change. If the information in the DNS is bound so tightly to the actual topology (by including the EID's of all the border routers), sooner or later the DNS will have to be updated as the topology changes. I don't think this is a good idea. However, we obviously disagree.

if you use shorthanded area IDs, if an area is subdivided, you must change objects' locators *automatically*. That is, such a scheme is NOT controllable.

Not necessarily. There are other techniques for handling partition, as shown by IS-IS.

> Overhead of a flow set up by a host on an end-end basis for a datagram
> is all borne by that single datagram, whereas the overhead for path
> calculation, setup, etc. for a DMF between two routers is shared among
> all the datagrams which go from one router to another.

Thus, you must assume all the routers are connected, here, which is O(N^2).

No, I am *not* assuming all the routers are connected. In fact, there are *far* fewer DMF's than full connectivity (which would imply non-hierarchical routing). This does produce non-optimal routes, but as Kleinrock and Kamoun showed, you can use hierarchical routing, and get reasonably good routing, at a vast reduction in overhead.

This scheme has the *added* tweak that you can increase the number of DMF's above the theoretical minimum to get better routes, but in a way which can be locally controlled, so the users get to make the extra-overhead/better-routes tradeoff decision. However, in practice you will get into diminishing returns fairly quickly (again, see Kleinrock and Kamoun), so the number of additional DMF's in the actual network will be fairly small.

> I would have thought that the difference, and thus the advantage of DMF's
> for datagrams, would have been obvious.

One reason for your confusion is that you think of connected datagrams only.

I am not thinking of "datagrams which are part of ongoing traffic streams" only. This *entire* scheme is *precisely* for those datagrams which are *not* part of such ongoing streams.

The problem is that you wrongly think there is some "real flow" with connectionless communication.

No, I am showing a way to move pure datagram traffic along predefined paths.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa15519; 10 Jan 94 15:25 EST
Received: from pizza by PIZZA.BBN.COM id aa23180; 10 Jan 94 15:02 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa23176; 10 Jan 94 14:58 EST
Received: from necom830.cc.titech.ac.jp by BBN.COM id aa13404; 10 Jan 94 14:55 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Tue, 11 Jan 94 04:51:17 +0900
From: Masataka Ohta
Return-Path:
Message-Id: <9401101951.AA24361@necom830.cc.titech.ac.jp>
Subject: Re: New datagram mode
To: Noel Chiappa
Date: Tue, 11 Jan 94 4:51:15 JST
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
In-Reply-To: <9401101825.AA21447@ginger.lcs.mit.edu>; from "Noel Chiappa" at Jan 10, 94 1:25 pm
X-Mailer: ELM [version 2.3 PL11]

> > No. The term "active router" was defined in the original message as one
> > that actually makes a decision about where to send the packet (as
> > opposed to handling it as part of a flow).
>
> That's a router. The other, which operates only in the data link layer, is not.
>
> The other ones are *not* operating only in the data link layer. They throw
> away the physical network header, look at the internetwork header to figure
> out what to do with the packet, and create a new physical network header.
> Sure sounds like a "router" to me.
> The only difference is they don't get to
> make decisions about which path this particular packet stream will use. I.e.
> the selector is a flow-identifier, not a locator.

Things that operate on flow identifiers, which do not actively change a packet's route, are not routers. You should consult people in the phone company. They say something like "C-plane" and will explain why a flow-based forwarder is not a router.

> What, for example, is the cost of your imaginary DNS TNG?
>
> The DNS is not involved in any way with the new datagram mode, other than
> providing the locator to the host before the packet is sent, *exactly* the way
> the DNS works now. Since this cost is currently deemed acceptable, I would
> assume it will be acceptable in the future. Perhaps I do not understand your
> question?

As your locator is the result of local coordination, I don't think it can be static.

> > I reckon the increased robustness is well worth it
>
> The simpler, the more robust.
>
> Excuse me while I roll on the floor laughing.

I might have wrongly assumed that you have the sense of a good engineer, able to detect the point of diminishing returns. The reasons for simple engineering still stand.

> Why did you object to source routing, then? Wasn't your reason that it will
> consume more bandwidth and CPU?
>
> I don't like source routing because the costs of source routing must be paid
> on *each* packet, whereas the overhead of calculating and setting up DMF's
> can be amortized over many datagrams.

You want to forward packets with locators using several masks, which means several table lookups, which is slow. That is the cost to be paid on each packet at each router. You also use flow lookups several times, which is even slower, paid at some routers. And the flow setup is the worst.

So, I think we should forward packets with a FULL EID match without any mask, which means only one (hashed) table lookup. That is the cost to be paid on each packet at each router.

Which, do you think, is faster?

> can be amortized over many datagrams.

If there are many datagrams along the setup path.

> If I understand what you mean correctly, this says what I thought I was
> saying. I don't like to use the term "connections" because then everyone
> starts thinking I mean something like X.25, where critical state is in the
> switches.
>
> I think the problem is that we don't have good terminology for "datagrams
> which are part of ongoing data streams" (e.g. the IP datagrams which are part
> of a TCP connection), and "datagrams which are *not* part of ongoing data
> streams" (e.g. UDP DNS traffic). Anyone have any suggestions?

"Network layer, end-end connection" and "datalink layer, along-the-path connection" should be the proper distinction.

> BTW, it is absurd to impose resource limits on connectionless
> communications. If the actual connected traffic is less than
> the pre-allocation, the extra bandwidth should be used for connectionless
> traffic.
>
> In general, I think most people would like the extra bandwidth to be "up for
> grabs" among all clients, not just the datagram clients. I'm not saying that
> datagrams ought to have a resource limit. I'm just thinking of the practical
> difficulties in ensuring that datagram traffic does not interfere with
> resources allocated to flows.

The proper terminology here is "best effort". Datagrams with a QoS of "best effort" should be buffered upon contention and should be dropped upon buffer overflow.
For reasonable dropping of various datagrams, multiple buffering, as you mentioned, could be useful.

> The "trick" of assigning datagrams to a flow, and
> then dividing up the bandwidth among flows, allows us a simpler bandwidth
> allocation mechanism, as a practical matter.

With your scheme, a datagram travels through, in general, several flows with variable bandwidth. So, datagrams will be dropped at narrowing points of the path, anyway.

> It's the minimal representation of an area.
>
> I see. So the minimal representation of a book is the entire book?

No, I can eliminate a lot of lines from your mail to get the minimal representation without changing the meaning.

> As I have written several times, the locator information of an object is
> accessible through DNS, which should not, cannot, and thus does not change
> *automatically*. DNS gives information on "all the possible paths".
>
> As the topology changes, the set of "all possible paths" will change. If the
> information in the DNS is bound so tightly to the actual topology (by including
> the EID's of all the border routers), sooner or later the DNS will have to be
> updated as the topology changes. I don't think this is a good idea. However,
> we obviously disagree.

If an address changes, the DNS should change, of course.

> if you use shorthanded area IDs, if an area is subdivided, you must change
> objects' locators *automatically*. That is, such a scheme is NOT
> controllable.
>
> Not necessarily. There are other techniques for handling partition, as shown
> by IS-IS.

That is, the structured EID, NSAP.

> but as Kleinrock and Kamoun

What's that?

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa20810; 10 Jan 94 23:38 EST
Received: from pizza by PIZZA.BBN.COM id aa25419; 10 Jan 94 23:26 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa25415; 10 Jan 94 23:24 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa13250; 10 Jan 94 23:24 EST
Received: by ginger.lcs.mit.edu id AA26598; Mon, 10 Jan 94 23:24:46 -0500
Date: Mon, 10 Jan 94 23:24:46 -0500
From: Noel Chiappa
Message-Id: <9401110424.AA26598@ginger.lcs.mit.edu>
To: jcurran@nic.near.net, jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
Subject: Re: New datagram mode
Cc: jnc@ginger.lcs.mit.edu

>> This congestion would remain even after the source had ceased
>> transmission

> Yes and no. If a certain amount of bandwidth were allocated to
> datagrams, presumably the loop would cause there to be more offered load
> than capacity. This would cause packets to be dropped (via whatever drop
> algorithm), so over time these packets would in all probability decay,
> so I don't think they'd be there forever.

The looping packets (and non-loop-destined packets which just happened to be at the wrong place at the wrong time) would be dropped, but only until the load requirements were met. At such time, the system would be in equilibrium: any additional traffic is almost assured of pushing the link utilization up, and thereby the drop rate. This will act as a resilient barrier to link utilization: given 60% link utilization due to looping datagrams, a sudden "real traffic" burst of similar load would result in 20% link over-utilization (and thereby 20% loss for both data streams). Now, when the valid traffic ends, there is a net 50% link utilization due to looping traffic (i.e. after accounting for those dropped during the load.) It may take a while before the effects of the loop have subsided, particularly since a looping traffic flow is almost always going to start at 100% link utilization.
True...

Another colorful side-effect is that network engineering becomes more difficult; routing difficulties have second-order effects on packet loss, and hence make problem diagnosis and capacity planning so much fun...

Yup. All good arguments as to why a hop count is still useful. I guess what it really comes down to is "how common are loops going to be, and how much will they cost if we don't have a hop count", versus "how much is the hop count going to cost us". If you expect loops to be very, very rare, it might make sense to drop the hop count.

Mind, I'm not sure I want to get rid of the hop count; I still like the idea of having two completely redundant mechanisms to deal with the issue of loops. I guess the way to look at it is that it is useful to sit back and examine "obvious" preconceptions like "there will be a hop count", to see if in fact it really is obvious. Even if you decide to retain the "obvious" mechanism, it's nice to know you really do need it....

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa26630; 11 Jan 94 14:51 EST
Received: from pizza by PIZZA.BBN.COM id aa28675; 11 Jan 94 14:35 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa28671; 11 Jan 94 14:31 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa25140; 11 Jan 94 14:28 EST
Received: by ginger.lcs.mit.edu id AA02738; Tue, 11 Jan 94 14:28:34 -0500
Date: Tue, 11 Jan 94 14:28:34 -0500
From: Noel Chiappa
Message-Id: <9401111928.AA02738@ginger.lcs.mit.edu>
To: nimrod-wg@BBN.COM
Subject: Robustness
Cc: jnc@ginger.lcs.mit.edu

Martha and I were just talking about robustness, and how hard it is to quantify. It's a different issue from *proving* programs correct (a field which I know has received a lot of attention), because a truly robust system should function correctly even in the face of incorrect engineering and coding in components, as well as unforeseen failure modes. Program proving is clearly not much help with these.

One thing we decided would be useful is to have a file of system failures in communication networks, to see what we can learn from them. I can think of three offhand:

- The ARPANet failure caused by the IMP memory failure which caused three updates 120 degrees apart in the sequence space.
- The ATT failure where a timing glitch provoked a bug.
- The recent NSFnet failure where the Ethernet card would not accept transit packets.

There are valuable lessons to be learned here. For instance, the latter one tells us we might want to have our protocol check not just connectivity to neighbour routers, but *through* neighbour routers to routers one hop away.

So, can people please send in descriptions of system failures that they know of? I'll put them into an organized file. Send the complete description to me only, and a one-liner to the whole mailing list so I don't get N descriptions of the same failure. (I've forgotten the details of the ATT bug, so I could use a description of that one.)
Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa28459; 11 Jan 94 15:24 EST
Received: from pizza by PIZZA.BBN.COM id aa28955; 11 Jan 94 15:08 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa28951; 11 Jan 94 15:06 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa27205; 11 Jan 94 15:02 EST
Received: by ginger.lcs.mit.edu id AA04026; Tue, 11 Jan 94 15:01:45 -0500
Date: Tue, 11 Jan 94 15:01:45 -0500
From: Noel Chiappa
Message-Id: <9401112001.AA04026@ginger.lcs.mit.edu>
To: jnc@ginger.lcs.mit.edu, mohta@necom830.cc.titech.ac.jp
Subject: Re: Analysis of DMF's in new datagram mode
Cc: nimrod-wg@BBN.COM

> For routers which are not border routers, they need one DMF out to a
> border router (i.e. up), and that's it. Traffic to other objects inside
> the area can be handled by sending it to the border router, which will
> send it back to the correct object.

There will be load concentration at border routers, then.

I was describing the minimal functional configuration. Interior routers will be free to augment their set of DMF's to produce more optimal routing. I hadn't included this option in my analysis, to keep the analysis reasonably simple, but it will not produce an unacceptable number of DMF's.

An interior router which had an instantiated DMF (not just a potential DMF) to every other object in the area would still have only as many DMF's ending at it as a border router of the area, and that case has been analyzed as O(I), which is reasonable. If every interior router had an instantiated DMF to all the objects in the area, the number of flows through each router in the area would be O(I log I), which is a little worse, but not terribly so, since I is unlikely to grow large; we won't have enormous areas. In addition, I doubt we will see full mesh connectivity; traffic X-Y graphs always show hot-spots, not an even distribution, at any scale.

> What's *important* are the number of DMF's which end in a router, and
> the number of DMF's which go through (i.e. require state for storage)
> interior routers.

Not all DMF's are equal. That is, a higher level DMF means a lot more traffic through it. That's why you can't increase the number of levels at will. Your model is not scalable.

This is an interesting point, but it impacts a lot more than just DMF's. If top level links don't have enough bandwidth to handle the traffic, this is going to be true whether you use DMF's for datagram traffic, or hop-by-hop, or whether the user traffic is all in end-end flows!

To grow the network, we will have to do one of two things. Either the top level links (i.e. the express highways) will have to have more bandwidth, in which case this is not a problem, or we are going to have to distribute the traffic over a number of smaller, parallel, links. Since this increases the number of arcs in the graph, *not* the number of nodes, it doesn't impact the growth of DMF's either. I reckon the second is better (since it uses parallel, less expensive technology to get the performance), but I'm not sure which we will see.

One other thing to note is that *all* routing algorithms have limits on the size of the graph over which you can run them in practice. To build a larger network, you need to introduce levels, and to scale to arbitrary sizes, you need an arbitrary number of levels.

> The average path length, A, for a graph of fixed degree (i.e. one in
> which nodes have the same average number of arcs to neighbouring nodes,
> independent of the size of the graph) is log N (where N is the number of
> nodes). [Chen 86]
While such an average over all possible graphs may be so, since your topology is a mesh, you should average only over planar graphs. Thus, the average path length will be sqrt(N).

This doesn't make sense to me. Planar graphs are not a good model for the connectivity of the network. For instance, in a planar graph, you cannot have full interconnection between 5 nodes (this is similar to the famous "three utilities and three houses" problem, if anyone wants to play with a piece of paper to verify this). However, I don't think that the real network will display such constraints. So, planar graphs are not the applicable area of graph theory, but rather normal graphs, and there it is log(N).

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa02550; 11 Jan 94 16:14 EST
Received: from pizza by PIZZA.BBN.COM id aa29263; 11 Jan 94 15:53 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa29259; 11 Jan 94 15:52 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa00476; 11 Jan 94 15:44 EST
Received: by ginger.lcs.mit.edu id AA05198; Tue, 11 Jan 94 15:44:23 -0500
Date: Tue, 11 Jan 94 15:44:23 -0500
From: Noel Chiappa
Message-Id: <9401112044.AA05198@ginger.lcs.mit.edu>
To: jnc@ginger.lcs.mit.edu, mohta@necom830.cc.titech.ac.jp
Subject: Re: New datagram mode
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM

> The other ones are *not* operating only in the data link layer. They
> throw away the physical network header, look at the internetwork header
> to figure out what to do with the packet, and create a new physical
> network header. ... The only difference is they don't get to make
> decisions about which path this particular packet stream will use. I.e.
> the selector is a flow-identifier, not a locator.

A thing which operates on flow-identifiers, and does not actively change a packet's route, is not a router. You should consult people in the phone company. They say something like "C-plane", and will explain why a flow based forwarder is not a router.

I reckon that most of us are happy calling devices which forward packets based on the contents of the internetwork layer header a "router". It's just a name, so it doesn't really matter that much.

Remember also that a forwarding device which is forwarding some packets based on their flow id's may also be forwarding *other* packets by looking at their locators. Also, all have to contain flow-setup code, etc., so it's not like you can separate out one group of devices and say "these don't need to contain code for this function". They all have to be functionally identical, even if they don't operate identically on all packets.

As your locator is the result of local coordination, I don't think it can be static.

You are correct, it is not; as I have mentioned before, as the topology changes we will want to change the abstraction hierarchy to match. However, I want that change in the abstraction hierarchy to be controllable, not fully and unavoidably automatic.

>> Why did you object to source routing, then? Wasn't your reason that it will
>> consume more bandwidth and CPU?

> I don't like source routing because the costs of source routing must be
> paid on *each* packet, whereas the overhead of calculating and
> setting up DMF's can be amortized over many datagrams.

You want to forward packets with locators using several masks, which means several table lookups, which is slow. That is the cost to be paid on each packet at each router.

Depending on what locators look like, there are not necessarily any masks involved.
If the locators have syntax of the form A.B...P.Q, then the routing table can be *logically* structured as a tree, with either a multi-way branch or a routing entry leaf at each "."; locating the correct branch in a tree is obviously less efficient, but it can be sped up by hashing techniques, etc. Remember, this cost is not paid at every "router" (my definition), only at "active routers" (again, my definition).

You also use flow lookup several times, which is even slower, paid at some routers.

Lookup of a shortish, fixed-length quantity at a fixed offset has to be the easiest single operation, whether it is done in hardware or software. I don't understand why you think it will be slow.

And the flow setup is the worst.

But it is only performed once (so it does not even add any delay), and the cost is shared between any number of packets.

So, I think we should forward packets with FULL EID match without any mask, which means only one (hashed) table lookup. That is the cost to be paid on each packet at each router. Which, do you think, is faster?

How is looking up one of a number of EID's (since your concept is to use a source route consisting of a list of router EID's) any cheaper than looking up a flow-id? Remember, most routers will only be doing a flow-lookup in this scheme, not looking at the locator.

> can be amortized over many datagrams.

If there are many datagrams along the setup path.

That's the whole point of having the minimal set of DMF's necessary to do pure hierarchical routing, augmented *as necessary* where the amount of traffic justifies the cost of extra DMF's. You only go beyond the minimal set (which has been shown to be quite small) if there *are* many datagrams; i.e. if the actual traffic justifies it. Note also that if you think a DMF is unlikely to have any traffic across it, set it up on demand, not in advance. That way, the only DMF's that get set up are the ones that get used.

The proper terminology here is "best effort".

Yes, I'd forgotten that term.

Datagrams with QoS of "best effort" should be buffered upon contention and should be dropped upon buffer overflow.

> I'm just thinking of the practical difficulties in ensuring that
> datagram traffic does not interfere with resources allocated to flows.
> The "trick" of assigning datagrams to a flow, and then dividing up the
> bandwidth among flows, allows us a simpler bandwidth allocation
> mechanism, as a practical matter.

With your scheme, a datagram travels through, in general, several flows with variable bandwidth. So, datagrams will be dropped at narrowing points of the path, anyway.

I must have missed something. If we do what you suggest, and buffer "best effort" datagrams, won't they be dropped at precisely the same points on congestion? The actual effects of the DMF scheme ought to be the same; it's just a single uniform mechanism for all packets, rather than one for flows and another for datagrams.

> There are other techniques for handling partition, as shown by IS-IS.

That is, the structured EID, NSAP.

You seem to have a private definition for "EID" that the rest of us do not share. To us, an *EID has no structure*. So, an NSAP is not an EID, although it does name *approximately* the same class of things as an EID does.

> but as Kleinrock and Kamoun

What's that?

Leonard Kleinrock and Farouk Kamoun, "Hierarchical Routing for Large Networks: Performance Evaluation and Optimization", Computer Networks 1, North-Holland, 1977, pp. 155-174.
It's the fundamental work on hierarchical routing; it mathematically quantifies how inefficient routing becomes when hierarchies are used, etc.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa11610; 12 Jan 94 0:18 EST
Received: from pizza by PIZZA.BBN.COM id aa01425; 12 Jan 94 0:05 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa01421; 12 Jan 94 0:02 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa11197; 12 Jan 94 0:03 EST
Received: by ginger.lcs.mit.edu id AA08589; Wed, 12 Jan 94 00:03:22 -0500
Date: Wed, 12 Jan 94 00:03:22 -0500
From: Noel Chiappa
Message-Id: <9401120503.AA08589@ginger.lcs.mit.edu>
To: nimrod-wg@BBN.COM
Subject: Architecture document draft outline
Cc: jnc@ginger.lcs.mit.edu

As discussed at the last IETF, the BBN crew are going to try and crank out a draft "architectural outline" document for Nimrod. Here's a draft outline; any comments, etc. are welcome.

Noel

--------

Nimrod Architecture

1. Introduction and Overview
   - The current Internet
   - Assumptions about the future Internet
   - Why we need a new routing and addressing architecture
   - A brief overview of this new architecture
   - How the new architecture can be introduced into the Internet

2. Internetwork Organization and Representation
   - Basic entities and how they are clustered
   - Hierarchical organization of clusters
   - Locators and identifiers
   - Representation of cluster attributes:
     - Maps: connectivity
     - Policies: offered services and restrictions on use
   - Abstraction

3. Routing and Addressing Functions
   - Entity and attribute discovery and configuration
   - Routing information (connectivity and policy) distribution
   - Route generation
   - Mapping between endpoint identifiers and locators
   - Flow setup
   - Packet forwarding

4. Auxiliary Functions
   - Network management
   - Security
   - Multicast
   - Mobility support

5. Deployment
   - How Nimrod fits with IP and with each of the new proposed Internet packet formats
   - Migrating to Nimrod routing and addressing
   - Router and host functionality required during transition to Nimrod

Received: from PIZZA.BBN.COM by BBN.COM id aa12156; 12 Jan 94 0:35 EST
Received: from pizza by PIZZA.BBN.COM id aa01491; 12 Jan 94 0:20 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa01487; 12 Jan 94 0:18 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa11621; 12 Jan 94 0:18 EST
Received: by ginger.lcs.mit.edu id AA08636; Wed, 12 Jan 94 00:18:33 -0500
Date: Wed, 12 Jan 94 00:18:33 -0500
From: Noel Chiappa
Message-Id: <9401120518.AA08636@ginger.lcs.mit.edu>
To: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
Subject: Re: Architecture document draft outline

Here are my suggestions:

In "1. Introduction and Overview", after "Why we need a new routing and addressing architecture", insert sections about:

- Design philosophy of the new architecture (which would talk about maximizing the system lifetime, making algorithms local wherever possible, minimizing the size and complexity of the "common core" system, robustness, and stuff like that)
- Functional goals of the new architecture (which would talk about the capabilities we want to provide, such as mechanisms for flexible abstraction, an efficient datagram mode, interacting well with new resource allocation mechanisms, etc.)
Routing and Addressing Functions" into three sections (or subsections), one on "Topology discovery and distribution", one on "Route Generation", and one on "User traffic handling", to emphasized that these are three separate functional subsystems, only two of which (the first and last) are "system-wide". Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa19409; 12 Jan 94 12:23 EST Received: from pizza by PIZZA.BBN.COM id aa04217; 12 Jan 94 12:04 EST Received: from BBN.COM by PIZZA.BBN.COM id aa04213; 12 Jan 94 12:00 EST Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa17663; 12 Jan 94 11:56 EST Received: by ginger.lcs.mit.edu id AA13725; Wed, 12 Jan 94 11:56:38 -0500 Date: Wed, 12 Jan 94 11:56:38 -0500 From: Noel Chiappa Message-Id: <9401121656.AA13725@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Analysis of DMF's in new datagram mode Cc: jnc@ginger.lcs.mit.edu I don't know the exact formula for the number of border nodes on a graph with a constant degree of connection between the nodes, but I'll hazard the guess that the number of border nodes will grow as the square root of the number of total nodes. (Handwave justification for this is that it's probably the same as the ratio of the circumference of a circle to the area of a circle, since you can model a graph with a fixed degree of connectivity among the nodes as a geometrical figure where each node takes a fixed area, and thus the area is proportional to the number of nodes.) I've just realized I may have been off-base here. This is probably a good model for planar graphs, but perhaps not for arbitrary graphs. If so, and a three-dimensional analogy is needed, that would mean the correct geometrical analogy is one where each nodes takes a fixed volume, and the number of border nodes is proportional to the surface area of the cube. That would make the number of border nodes grow as the cube root of the number of total nodes. On the other hand, you can represent any graph in two dimensions (if you allow the arcs to cross), so perhaps my original thought is correct. Does anyone know the correct answer? (I've ordered up some graph theory books to augment my meagre stock of works on the topic, but what I have doesn't give this one...) Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa25437; 12 Jan 94 14:03 EST Received: from pizza by PIZZA.BBN.COM id aa04688; 12 Jan 94 13:36 EST Received: from BBN.COM by PIZZA.BBN.COM id aa04684; 12 Jan 94 13:34 EST Received: from babyoil.ftp.com by BBN.COM id aa23091; 12 Jan 94 13:27 EST Received: from tri-flow.ftp.com by ftp.com with SMTP id AA00236; Wed, 12 Jan 94 13:27:20 -0500 Received: by tri-flow.ftp.com.ftp.com (5.0/SMI-SVR4) id AA25845; Wed, 12 Jan 94 13:27:28 EST Date: Wed, 12 Jan 94 13:27:28 EST Message-Id: <9401121827.AA25845@tri-flow.ftp.com.ftp.com> To: jnc@ginger.lcs.mit.edu Subject: Re: Architecture document draft outline From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: nimrod-wg@BBN.COM Content-Length: 3291 > 1. Introduction and Overview > - The current Internet > - Assumptions about the future Internet > - Why we need a new routing and addressing architecture Move the following two someplace else.... > - A brief overview of this new architecture > - How the new architecture can be introduced into the Internet 2. Requirements What the requirements are that the new architecture is supposed to satisfy, various design points, what things it will not do, what limits the proposed architecture has, and so on. 3. 
3. Overview
   - A brief overview of this new architecture
   - How the new architecture can be introduced into the Internet (or get rid of this bullet, leaving this all in the later chapter on deployment)

This chapter should be pretty much limited to description of the various components of the Nimrod Architecture, and descriptions of the relationships between those components. Which is what it looks like you do.

> 2. Internetwork Organization and Representation
>    - Basic entities and how they are clustered
>    - Hierarchical organization of clusters
>    - Locators and identifiers
>    - Representation of cluster attributes:
>      - Maps: connectivity
>      - Policies: offered services and restrictions on use
>    - Abstraction

The following three (I suggest splitting routing/addressing and forwarding into two separate chapters) discuss functions and operations.

> 3. Routing and Addressing Functions
>    - Entity and attribute discovery and configuration
>    - Routing information (connectivity and policy) distribution
>    - Route generation
>    - Mapping between endpoint identifiers and locators

Put these in a separate chapter. They are conceptually separate elements; they ought to be in a separate chapter, stressing that separation. Given the volume of discussion that's gone on about the datagram mode, there might be a fair amount to say. Note also that in this chapter you should clearly state your reasons for not supporting a "classic hop-by-hop" forwarding model, a la IPv4.

>    - Flow setup
>    - Packet forwarding
     - datagram mode

> 4. Auxiliary Functions
>    - Network management
>    - Security

What are your requirements for security?

>    - Multicast
>    - Mobility support

What about robustness? (You posted on it yesterday.) Byzantine failure protection (you really wanted this as a part of the IPng requirements that Partridge and I did a while ago...)

> 5. Deployment
>    - How Nimrod fits with IP and with each of the new proposed Internet packet formats
>    - Migrating to Nimrod routing and addressing
>    - Router and host functionality required during transition to Nimrod

Finally, in general, please put the "whys" in the document as well as the "whats" -- it is important to understand why certain things are done or are not done. Also, as the document changes while being written and reviewed, I'd suggest that a log be kept in the document of what changes were made and why -- it helps to avoid going over old ground unnecessarily. I've done this in some drafts that I've written and it's proved to be tremendously useful.

--
Frank Kastenholz
FTP Software
2 High Street
North Andover, Mass. USA 01845
(508)685-4000

Received: from PIZZA.BBN.COM by BBN.COM id aa12180; 12 Jan 94 19:34 EST
Received: from pizza by PIZZA.BBN.COM id aa06785; 12 Jan 94 19:19 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa06781; 12 Jan 94 19:17 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa11664; 12 Jan 94 19:16 EST
Received: by ginger.lcs.mit.edu id AA19079; Wed, 12 Jan 94 19:16:25 -0500
Date: Wed, 12 Jan 94 19:16:25 -0500
From: Noel Chiappa
Message-Id: <9401130016.AA19079@ginger.lcs.mit.edu>
To: nimrod-wg@BBN.COM
Subject: Arch doc, packet functionality
Cc: jnc@ginger.lcs.mit.edu

One other topic we might talk about in this document is what functionality we'd like to see in an inter-router packet format. Right at the moment, I'm not thrilled with any of the existing off-the-shelf alternatives. This section would serve two functions.
First, it would be a guide to people working on internetwork packet formats as to what functionality the Nimrod stuff would like to see. Second, should we decide to do a specialized one inside Nimrod, this list would basically translate directly into a packet format in what I hope (famous last words :-) would be a simple process. (I know, I know, "amateurs design packet formats, professionals etc.")

To that end, when we discuss various functional items (e.g. flow-id, hop-count, etc.), we should give an idea of the minimum size that we think fits our needs. Giving the maximum that we think is useful/cost-efficient probably wouldn't be a bad idea either.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa15197; 12 Jan 94 21:25 EST
Received: from pizza by PIZZA.BBN.COM id aa07136; 12 Jan 94 21:11 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa07132; 12 Jan 94 21:09 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa14851; 12 Jan 94 21:09 EST
Received: by ginger.lcs.mit.edu id AA19383; Wed, 12 Jan 94 21:09:06 -0500
Date: Wed, 12 Jan 94 21:09:06 -0500
From: Noel Chiappa
Message-Id: <9401130209.AA19383@ginger.lcs.mit.edu>
To: kasten@ftp.com, nimrod-wg@BBN.COM
Subject: Re: Architecture document draft outline
Cc: jnc@ginger.lcs.mit.edu

The following three (I suggest splitting routing/addressing and forwarding into two separate chapters) discuss functions and operations.

> - Route generation

Actually, now that I think about it, I'd almost prefer to leave this section out of the architecture spec altogether, at least in any detail. A discussion of how they are local, and perhaps a general discussion of strategies, but nothing more. Route generation algorithms are not part of the Nimrod spec, so I don't think it's appropriate to include them in an architecture specification.

I know what you'll all say; without feasible, worked examples, people will say the architecture "won't work". So? First, there are *plenty* of other critical algorithms (such as incoming and outgoing abstraction control) which aren't handled either. Second, if we have some good ideas on route generation, fine, we can make them a separate RFC. Route generation is not part of the core architecture; I would very strongly prefer to leave it out of the architecture document.

Given the volume of discussion that's gone on about the datagram mode, there might be a fair amount to say.

Say what? Except for the extended traffic from Masataka Ohta, and that message from you, I haven't seen that much... maybe you're all so stunned with the perfection and necessity of it all, you've nothing to say, but somehow I doubt it! :-)

Note also that in this chapter you should clearly state your reasons for not supporting a "classic hop-by-hop" forwarding model, a la IPv4.

Yup; good point.

What about robustness? (You posted on it yesterday.) Byzantine failure protection (you really wanted this as a part of the IPng requirements that Partridge and I did a while ago...)

Yah, I was getting ready to say something about this in a note to the IETF as a whole, in response to the NSFNet problems with the failed Ethernet card. I think we have to have protection throughout the network against active hostile attack. This will not serve to protect against *all* unforeseen bugs (time and nature seem far more clever at finding holes than puny human intelligence), but it will help.

Finally, in general, please put the "whys" in the document as well as the "whats" -- it is important to understand why certain things are done or are not done.
Yes, and the architecture document is a good place for this. My only worry is that the resulting tome would be too long for mere mortals. Maybe we could have two versions, one with, and one without, "DISCUSSION" sections? A simple matter of changing text-processor macro definitions...

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa05380; 13 Jan 94 8:35 EST
Received: from pizza by PIZZA.BBN.COM id aa09049; 13 Jan 94 8:19 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa09045; 13 Jan 94 8:16 EST
Received: from necom830.cc.titech.ac.jp by BBN.COM id aa04609; 13 Jan 94 8:15 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Thu, 13 Jan 94 22:10:16 +0900
From: Masataka Ohta
Return-Path:
Message-Id: <9401131310.AA08203@necom830.cc.titech.ac.jp>
Subject: Re: Analysis of DMF's in new datagram mode
To: Noel Chiappa
Date: Thu, 13 Jan 94 22:10:15 JST
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
In-Reply-To: <9401112001.AA04026@ginger.lcs.mit.edu>; from "Noel Chiappa" at Jan 11, 94 3:01 pm
X-Mailer: ELM [version 2.3 PL11]

> > For routers which are not border routers, they need one DMF out to a
> > border router (i.e. up), and that's it. Traffic to other objects inside
> > the area can be handled by sending it to the border router, which will
> > send it back to the correct object.
>
> There will be load concentration at border routers, then.
>
> I was describing the minimal functional configuration.

Because of the load concentration, I don't think your configuration functions.

> An interior router which had an instantiated DMF (not just a potential DMF) to
> every other object in the area would still have only as many DMF's ending at
> it as a border router of the area, and that case has been analyzed as O(I),
> which is reasonable.

Your configuration is wrong as to the configuration within an area, of course. But your configuration is also wrong as to the configuration of the area hierarchy. The number of levels must be limited.

> If every interior router had an instantiated DMF to all the objects in the
> area, the number of flows through each router in the area would be O(I log I),
> which is a little worse, but not terribly so, since I is unlikely to grow
> large; we won't have enormous areas.

Because of planarity, it is O(I^1.5), where I is not constant.

> In addition, I doubt we will see full
> mesh connectivity; traffic X-Y graphs always show hot-spots, not an even
> distribution, at any scale.

Hot spots make the load concentration worse.

> > What's *important* are the number of DMF's which end in a router, and
> > the number of DMF's which go through (i.e. require state for storage)
> > interior routers.
>
> Not all DMF's are equal. That is, a higher level DMF means a lot more traffic
> through it. That's why you can't increase the number of levels at will.
> Your model is not scalable.
>
> This is an interesting point, but it impacts a lot more than just DMF's. If
> top level links don't have enough bandwidth to handle the traffic, this is
> going to be true whether you use DMF's for datagram traffic, or hop-by-hop,
> or whether the user traffic is all in end-end flows!

Top level links MUST have enough bandwidth. That is, there should be a lot of second level areas and they should be connected with a lot of links. Your configuration, which assumes small areas, does not allow such a configuration.
> To grow the network, we will have to do one of two things. Either the top
> level links (i.e. the express highways) will have to have more bandwidth, in
> which case this is not a problem, or we are going to have to distribute the
> traffic over a number of smaller, parallel, links. Since this increases the
> number of arcs in the graph, *not* the number of nodes,

If you increase the number of links without increasing the number of nodes, load will concentrate not in links but on border routers.

> it doesn't impact the
> growth of DMF's either. I reckon the second is better (since it uses parallel,
> less expensive technology to get the performance), but I'm not sure which
> we will see.

We need a lot of top level areas with a lot of border routers.

> One other thing to note is that *all* routing algorithms have limits on
> the size of the graph over which you can run them in practice. To build a
> larger network, you need to introduce levels, and to scale to arbitrary
> sizes, you need an arbitrary number of levels.

Everything has its own limitation. Still, we can expect such limitations to scale as time goes by. For example, the allowable size of routing information is expected to scale as the link speed increases. So, though it is obvious that there should be levels, you don't have to assume the area size is constant.

> > The average path length, A, for a graph of fixed degree (i.e. one in
> > which nodes have the same average number of arcs to neighbouring nodes,
> > independent of the size of the graph) is log N (where N is the number of
> > nodes). [Chen 86]
>
> While such an average over all possible graphs may be so, since your
> topology is a mesh, you should average only over planar graphs. Thus,
> the average path length will be sqrt(N).
>
> This doesn't make sense to me. Planar graphs are not a good model for the
> connectivity of the network.

As the Earth's surface is planar, and as the routers are placed on the Earth, it is the model.

> For instance, in a planar graph, you cannot have
> full interconnection between 5 nodes

A small number of crossings does not matter. But if you allow arbitrary crossings, it means most links are lengthy.

> However, I don't think that the real network will
> display such constraints. So, planar graphs are not the applicable area of
> graph theory, but rather normal graphs, and there it is log(N).

Assume top level routers are distributed all over the surface of the Earth. Then how long, do you think, will the average link between connected nodes be? If you assume full randomness, the average is something of transcontinental scale, which is quite costly.

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa15508; 13 Jan 94 11:21 EST
Received: from pizza by PIZZA.BBN.COM id aa09956; 13 Jan 94 11:07 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa09952; 13 Jan 94 11:05 EST
Received: from ns.Novell.COM by BBN.COM id aa14065; 13 Jan 94 11:00 EST
Received: from WC.Novell.COM (optics.wc.novell.com) by ns.Novell.COM (4.1/SMI-4.1) id AA09951; Thu, 13 Jan 94 09:00:23 MST
Received: from [130.57.64.148] by WC.Novell.COM (4.1/SMI-4.1) id AA15978; Thu, 13 Jan 94 07:55:28 PST
Date: Thu, 13 Jan 94 07:55:27 PST
Message-Id: <9401131555.AA15978@WC.Novell.COM>
X-Sender: minshall@optics.wc.novell.com (Unverified)
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
To: Noel Chiappa
From: Greg Minshall
Subject: Re: New datagram mode
Cc: nimrod-wg@BBN.COM

Noel,

> The forwarding of these packets is, as already noted, quite
> efficient, and in non-active routers, is maximally efficient (perhaps more
> so than even standard hop-by-hop).
Out of curiosity, why are you thinking that DMF forwarding may possibly be more efficient than standard hop-by-hop? Because you are assuming that lots of destinations will be "multiplexed" over a given datagram flow?

Greg

Received: from PIZZA.BBN.COM by BBN.COM id aa17684; 13 Jan 94 11:54 EST
Received: from pizza by PIZZA.BBN.COM id aa10121; 13 Jan 94 11:36 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa10115; 13 Jan 94 11:34 EST
Received: from ftp.com by BBN.COM id aa16073; 13 Jan 94 11:30 EST
Received: from tri-flow.ftp.com by ftp.com with SMTP id AA21421; Thu, 13 Jan 94 11:30:59 -0500
Received: by tri-flow.ftp.com.ftp.com (5.0/SMI-SVR4) id AA03056; Thu, 13 Jan 94 11:31:03 EST
Date: Thu, 13 Jan 94 11:31:03 EST
Message-Id: <9401131631.AA03056@tri-flow.ftp.com.ftp.com>
To: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
Subject: Re: Architecture document draft outline
From: Frank Kastenholz
Reply-To: kasten@ftp.com
Content-Length: 3049

> I know what you'll all say; without feasible, worked examples, people will
> say the architecture "won't work". So? First, there are *plenty* of other
> critical algorithms (such as incoming and outgoing abstraction control) which
> aren't handled either. Second, if we have some good ideas on route generation,
> fine, we can make them a separate RFC. Route generation is not part of the
> core architecture; I would very strongly prefer to leave it out of the
> architecture document.

First, you might have to enumerate the requirements that Nimrod has on the algorithms, whatever they are. Second, you have to identify what the required algorithms and protocols are that are not being specified as a part of the architecture. Finally, by having samples you make it much easier for people to understand what is going on; perhaps these could simply be parts of examples, or appendices that present outlines of some sample algorithms.

> Given the volume of discussion that's gone on about the datagram mode,
> there might be a fair amount to say.
>
> Say what? Except for the extended traffic from Masataka Ohta, and that message
> from you, I haven't seen that much... maybe you're all so stunned with the
> perfection and necessity of it all, you've nothing to say, but somehow I doubt
> it! :-)

Well, perhaps I am mistaking volume of postings for breadth of discussion. While I confess that I have not been following all of the details of Masataka Ohta's postings and your responses, my impression is that a volume of postings, even from one person, may indicate a need for additional material in the way of discussion, explanation, background, examples, reasoning and so forth.

> Yes, and the architecture document is a good place for this. My only worry
> is that the resulting tome would be too long for mere mortals. Maybe we could
> have two versions, one with, and one without, "DISCUSSION" sections? A simple
> matter of changing text-processor macro definitions...

Not two documents. Maybe putting the whys and wherefores into separate appendices or DISCUSSION sections a la the router requirements (in other words, make it easy for an experienced reader to skip over the explanatory material). I prefer in-line discussion sections wherever possible; it means less skipping back and forth to appendices when reading.

Also, I just thought of it now: a description of what is required to configure a host and a router, and how auto-configuration might work (at a high level), would be useful.
Again, I realize that these may not be a part of the architecture per se, but this is all needed in order to make a realistic assessment of whether the architecture is useful as an IPng. OR -- as an alternative, do not release the architecture document to the general public (i.e. off of this list) until there are additional documents available that, at least at a high level, describe how these implementation-level things _could_ be done.

--
Frank Kastenholz
FTP Software
2 High Street
North Andover, Mass. USA 01845
(508)685-4000

Received: from PIZZA.BBN.COM by BBN.COM id aa04867; 13 Jan 94 15:59 EST
Received: from pizza by PIZZA.BBN.COM id aa11532; 13 Jan 94 15:41 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa11528; 13 Jan 94 15:37 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa03204; 13 Jan 94 15:35 EST
Received: by ginger.lcs.mit.edu id AA24833; Thu, 13 Jan 94 15:35:34 -0500
Date: Thu, 13 Jan 94 15:35:34 -0500
From: Noel Chiappa
Message-Id: <9401132035.AA24833@ginger.lcs.mit.edu>
To: Greg_Minshall@novell.com, jnc@ginger.lcs.mit.edu
Subject: Re: New datagram mode
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM

Out of curiosity, why are you thinking that DMF forwarding may possibly be more efficient than standard hop-by-hop? Because you are assuming that lots of destinations will be "multiplexed" over a given datagram flow?

No, it's because in most of the routers along the path, the forwarding will be a "flow lookup" forwarding, rather than a "locator lookup" forwarding. I expect that the former will be more efficient, since you are looking up a forwarding entry based on a shortish, fixed-length UID tag.

Designs for locators vary, from Tuba, which is not fixed length (and longish), to SIP, which on the surface has this fixed characteristic, but in fact is not really so. SIP assumes that SIP routing tables will be using the "longest match" method, so in fact it's not a simple lookup. The obvious reply, "well, the result of the lookup will be cached, so from then on it is a simple UID tag lookup", is invalid, since we are talking about datagram *applications*; i.e. there is no next packet. Traffic consisting of a stream of packets is best carried in Nimrod via a flow.

However, pure datagram applications should be as efficient, and maybe more so; it depends on i) the ratio of "active" routers to flow-forwarding routers, and ii) the cost ratios among "active" forwarding, "flow" forwarding, and "hop-by-hop" forwarding. Not knowing any of these exactly, I can't say for sure. Also, when I said "efficient", I meant "forwarding time" efficient, since this seems to be the dominant concern. Efficiency along other axes (such as state) is, of course, a different matter.

Still, this new datagram mode should enable us to provide very efficient "pure" datagram service in Nimrod, while staying within the Nimrod design philosophy; i.e. without having to include a hop-by-hop mode.
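To make that cost comparison concrete, here is a minimal sketch; the table contents and function names are invented for illustration, not taken from any Nimrod specification. Flow forwarding is a single hash probe on a short fixed-length tag, while forwarding on a structured locator needs a longest-match walk over its components:

    # Flow forwarding: one hash probe on a short, fixed-length tag.
    flow_table = {0x2A7F: "if0", 0x9C03: "if1"}     # flow-id -> interface

    def forward_by_flow(flow_id):
        return flow_table[flow_id]                  # exactly one lookup

    # Locator forwarding: longest match over structured A.B.C components.
    locator_table = {("US",): "if0",
                     ("US", "MA"): "if1",
                     ("US", "MA", "MIT"): "if2"}    # prefix -> interface

    def forward_by_locator(locator):
        """Longest-match lookup, e.g. for ('US', 'MA', 'MIT', 'LCS')."""
        for n in range(len(locator), 0, -1):        # longest prefix first
            entry = locator_table.get(locator[:n])  # one probe per level
            if entry is not None:
                return entry
        raise KeyError("no route")

    print(forward_by_flow(0x2A7F))                        # if0
    print(forward_by_locator(("US", "MA", "MIT", "LCS"))) # if2

Even with hashing at each level, the locator case can cost one probe per component; the flow case is always exactly one, which is the basis of the efficiency claim above.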
Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa04892; 13 Jan 94 15:59 EST
Received: from pizza by PIZZA.BBN.COM id aa11545; 13 Jan 94 15:41 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa11541; 13 Jan 94 15:39 EST
Received: from [131.112.4.4] by BBN.COM id aa03542; 13 Jan 94 15:39 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Fri, 14 Jan 94 05:34:42 +0900
From: Masataka Ohta
Return-Path:
Message-Id: <9401132034.AA09359@necom830.cc.titech.ac.jp>
Subject: Re: New datagram mode
To: Noel Chiappa
Date: Fri, 14 Jan 94 5:34:40 JST
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
In-Reply-To: <9401112044.AA05198@ginger.lcs.mit.edu>; from "Noel Chiappa" at Jan 11, 94 3:44 pm
X-Mailer: ELM [version 2.3 PL11]

> A thing which operates on flow-identifiers, and does not actively change a
> packet's route, is not a router. You should consult people in the phone
> company. They say something like "C-plane", and will explain why a flow
> based forwarder is not a router.
>
> I reckon that most of us are happy calling devices which forward packets based
> on the contents of the internetwork layer header a "router". It's just a name,
> so it doesn't really matter that much.

A flow ID is not considered to be internetwork layer.

> Remember also that a forwarding device which is forwarding some packets based
> on their flow id's may also be forwarding *other* packets by looking at their
> locators.

That is, it seems to me, what you referred to as an "active router".

> They all have to be functionally identical, even if
> they don't operate identically on all packets.

But you mentioned some static configuration. Though all brouters are functionally identical, some are routers and some are bridges depending on the static configuration.

> As your locator is the result of local coordination, I don't think it can
> be static.
>
> You are correct, it is not; as I have mentioned before, as the topology
> changes we will want to change the abstraction hierarchy to match. However, I
> want that change in the abstraction hierarchy to be controllable, not fully
> and unavoidably automatic.

So, my question is, how can you globally propagate the information? I don't think you can use DNS here.

> Remember, this cost is not paid at every "router" (my definition), only at
> "active routers" (again, my definition).

Oops, my error.

> And the flow setup is the worst.
>
> But it is only performed once (so it does not even add any delay), and the
> cost is shared between any number of packets.

You should be assuming connected UDP, then.

> So, I think we should forward packets with FULL EID match without any
> mask, which means only one (hashed) table lookup. That is the cost to be
> paid on each packet at each router. Which, do you think, is faster?
>
> How is looking up one of a number of EID's (since your concept is to use a
> source route consisting of a list of router EID's) any cheaper than looking
> up a flow-id?

They are the same.

> Remember, most routers will only be doing a flow-lookup in
> this scheme, not looking at the locator.

With my scheme, only the source cares about the locator.

> > can be amortized over many datagrams.
>
> If there are many datagrams along the setup path.
>
> That's the whole point of having the minimal set of DMF's necessary to do pure
> hierarchical routing, augmented *as necessary* where the amount of traffic
> justifies the cost of extra DMF's. You only go beyond the minimal set (which
> has been shown to be quite small) if there *are* many datagrams; i.e. if
> the actual traffic justifies it.
OK. Suppose the traffic needs the maximum set. How large is the maximum?

> Note also that if you think a DMF is unlikely to have any traffic across it,
> set it up on demand, not in advance. That way, the only DMF's that get set up
> are the ones that get used.

As the communication is connectionless, you can't expect much usage pattern. Especially, packets which travel long distances, which load the top level routers, tend to have less pattern, because the end organizations are less related.

> Datagrams with QoS of "best effort" should be buffered upon contention
> and should be dropped upon buffer overflow.
>
> > I'm just thinking of the practical difficulties in ensuring that
> > datagram traffic does not interfere with resources allocated to flows.
> > The "trick" of assigning datagrams to a flow, and then dividing up the
> > bandwidth among flows, allows us a simpler bandwidth allocation
> > mechanism, as a practical matter.
>
> With your scheme, a datagram travels through, in general, several flows
> with variable bandwidth. So, datagrams will be dropped at narrowing points
> of the path, anyway.
>
> I must have missed something. If we do what you suggest, and buffer "best
> effort" datagrams, won't they be dropped at precisely the same points on
> congestion? The actual effects of the DMF scheme ought to be the same,

If you try to have QoS along each flow, some of the bandwidth is reserved and wasted, which is the difference. If you don't try to have QoS, there is no difference. So, why do you bother to have flows?

> it's
> just a single uniform mechanism for all packets, rather than one for flows
> and another for datagrams.

Packets with a flow will have end-end flow setup. Others will not. So, they are not the same.

> > There are other techniques for handling partition, as shown by IS-IS.
>
> That is, the structured EID, NSAP.
>
> You seem to have a private definition for "EID" that the rest of us do not
> share. To us, an *EID has no structure*. So, an NSAP is not an EID, although
> it does name *approximately* the same class of things as an EID does.

It's you who wrongly said my EIDs have structure. Anyway, how can you handle partitioning?

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa13309; 15 Jan 94 0:05 EST
Received: from pizza by PIZZA.BBN.COM id aa19502; 14 Jan 94 23:52 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa19498; 14 Jan 94 23:49 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id ab13000; 14 Jan 94 23:50 EST
Received: by ginger.lcs.mit.edu id AA06439; Fri, 14 Jan 94 23:50:16 -0500
Date: Fri, 14 Jan 94 23:50:16 -0500
From: Noel Chiappa
Message-Id: <9401150450.AA06439@ginger.lcs.mit.edu>
To: kasten@ftp.com, nimrod-wg@BBN.COM
Subject: Re: Architecture document draft outline
Cc: jnc@ginger.lcs.mit.edu

First, you might have to enumerate the requirements that Nimrod has on the algorithms ... Second, you have to identify what the required algorithms and protocols are that are not being specified as a part of the architecture.

Good points.

> My only worry is that the resulting tome would be too long ...
> Maybe we could have two versions, one with, and one without, "DISCUSSION"
> sections?

Not two documents. Maybe putting the whys and wherefores into separate appendices or DISCUSSION sections... I prefer in-line discussion sections wherever possible.

Err, one document would be a total subset of the other. I suggested the short version since long documents tend to put people off from reading them... I agree, any discussion as to why things are done a given way is best done inline.
Also, I just thought of it now: a description of what is required to configure a host and a router, and how auto-configuration might work (at a high level), would be useful.

I spoke with Martha on the phone about the outline (I'll report in a separate message), and I have this bit set that she said configuration would be covered...

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa22605; 17 Jan 94 7:41 EST
Received: from pizza by PIZZA.BBN.COM id aa26866; 17 Jan 94 7:26 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa26862; 17 Jan 94 7:23 EST
Received: from necom830.cc.titech.ac.jp by BBN.COM id aa22107; 17 Jan 94 7:23 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Mon, 17 Jan 94 21:18:44 +0900
From: Masataka Ohta
Return-Path:
Message-Id: <9401171218.AA21893@necom830.cc.titech.ac.jp>
Subject: Re: Architecture document draft outline
To: Noel Chiappa
Date: Mon, 17 Jan 94 21:18:43 JST
Cc: nimrod-wg@BBN.COM, jnc@ginger.lcs.mit.edu
In-Reply-To: <9401120503.AA08589@ginger.lcs.mit.edu>; from "Noel Chiappa" at Jan 12, 94 12:03 am
X-Mailer: ELM [version 2.3 PL11]

> As discussed at the last IETF, the BBN crew are going to try and
> crank out a draft "architectural outline" document for Nimrod. Here's a
> draft outline; any comments, etc. are welcome.

I'm interested in the content, not the architecture:

> 3. Routing and Addressing Functions
>    - Mapping between endpoint identifiers and locators

How will it be?

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa13699; 26 Jan 94 12:23 EST
Received: from pizza by PIZZA.BBN.COM id aa16839; 26 Jan 94 12:04 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa16835; 26 Jan 94 12:01 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa12235; 26 Jan 94 12:00 EST
Received: by ginger.lcs.mit.edu id AA11456; Wed, 26 Jan 94 12:00:03 -0500
Date: Wed, 26 Jan 94 12:00:03 -0500
From: Noel Chiappa
Message-Id: <9401261700.AA11456@ginger.lcs.mit.edu>
To: nimrod-wg@BBN.COM
Subject: Outline
Cc: jnc@ginger.lcs.mit.edu

I talked with Martha about the points about the outline brought up on the mailing list. She made the general point that the outline was very high-level, and in fact topics were covered that were not listed in that outline. On specific issues:

- Nimrod design philosophy - this is already being covered in the Introduction.
- Nimrod design goals - Ditto.
- Route selection algorithm - specific algorithms aren't covered, but a discussion of the required functional attributes (i.e. in terms of what data goes in, and what has to come out) is.
- Inter-router packet formats - functional requirements for the inter-router packet format will be covered.

That's all my hasty notes of the (now long-ago :-) conversation reveal. If there is something important I've missed, please let me know.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa16499; 26 Jan 94 13:11 EST
Received: from pizza by PIZZA.BBN.COM id aa17084; 26 Jan 94 12:46 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa17080; 26 Jan 94 12:45 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa14828; 26 Jan 94 12:43 EST
Received: by ginger.lcs.mit.edu id AA11761; Wed, 26 Jan 94 12:43:33 -0500
Date: Wed, 26 Jan 94 12:43:33 -0500
From: Noel Chiappa
Message-Id: <9401261743.AA11761@ginger.lcs.mit.edu>
To: jnc@ginger.lcs.mit.edu, mohta@necom830.cc.titech.ac.jp, nimrod-wg@BBN.COM
Subject: Re: New datagram mode
Cc: jnc@ginger.lcs.mit.edu

>> A thing which operates on flow-identifiers, and does not actively change
>> a packet's route, is not a router. ... will explain why a flow based
>> forwarder is not a router.
> I reckon that most of us are happy calling devices which forward packets
> based on the contents of the internetwork layer header a "router". It's
> just a name, so it doesn't really matter that much.

A flow ID is not considered to be internetwork layer.

Perhaps we are using the same term ("flow ID") to refer to two different things, since in my conception a "flow ID" is just the name (label, identifier) of an internetwork layer object, the "flow".

To me, a flow is a series of packets which belong together (e.g. the TCP packets of a file being transferred via FTP, or a multi-cast video-conference). They may be associated for fundamental reasons (e.g. resource allocation, where the allocation covers a number of packets, not just one), or for efficiency (e.g. in routing/forwarding and access control). This relationship between the packets is made across the entire internetwork system (i.e. across many networks), on an end-end basis, and is thus visible to the internetwork layer. It is thus absolutely an internetwork layer concept.

> Remember also that a forwarding device which is forwarding some packets
> based on their flow id's may also be forwarding *other* packets by
> looking at their locators.

That is, it seems to me, what you referred to as an "active router".

Yes, the forwarding based on locators would happen only in "active routers". (The term "active" is not a particularly good one, just something I picked in a hurry because I needed a term to distinguish that set of routers.)

> They all have to be functionally identical, even if they don't operate
> identically on all packets.

But you mentioned some static configuration. Though all brouters are functionally identical, some are routers and some are bridges depending on the static configuration.

Again, you seem to have an unusual definition of "bridge". To me, a bridge is a device which forwards packets based on the local physical network header (i.e. 802.*, or whatever). All of the devices I am talking about would be forwarding packets based on the internetwork header, which is why I call them all routers.

>> As your locator is the result of local coordination, I don't think it
>> can be static.

> You are correct, it is not; as I have mentioned before, as the topology
> changes we will want to change the abstraction hierarchy to match.
> However, I want that change in the abstraction hierarchy to be
> controllable, not fully and unavoidably automatic.

So, my question is, how can you globally propagate the information? I don't think you can use DNS here.

Why not? That's part of the reason for controlling the rate of change, to bring it within the rate that the DNS can handle.

Also, there has been some discussion about the need to allow things to have multiple locators, to make this "renumbering" easier in practice; we can't have a "flag moment" when every locator within the scope of the change gets updated. We need a mechanism which allows the process to happen over some reasonable time period, with interoperation with the rest of the system continuing while the change happens. Allowing (temporary) multiple locators allows this.

>> And the flow setup is the worst.

> But it is only performed once (so it does not even add any delay), and
> the cost [of setting up a DMF is] shared between any number of packets.

You should be assuming connected UDP, then.

I didn't follow this?

> Remember, most routers will only be doing a flow-lookup in this scheme,
> not looking at the locator.

With my scheme, only the source cares about the locator.
I thought that your scheme involved intermediate routers making routing decisions for packets based on the EID of the next border router; this sequence of EID's is the locator in your scheme. Did I not understand something?

Your intermediate routers have to look at the current EID in the locator (i.e. a non-fixed offset in the packet), unless you have copied the "current" EID to some other location in the packet. I am assuming that you are not looking at all of them to find the rightmost one that any given router has in its table; this, unfortunately, is what you need to do to find the "optimal" (within the amount of routing data that you have passed around to the routers in your system) path.

You could combine the two, and have the intermediate border router (above) set the next border router to aim for to be not just the next one in the list, but the rightmost one it has in its routing table, which will get you a somewhat optimized route. It still probably won't be as good as the new datagram mode, since you will have to head for the particular border router (named by its EID) in your locator, not the closest one into that area (which may or may not be the optimal entry router for the ultimate destination, sigh, another complication).

> That's the whole point of having the minimal set of DMF's necessary to
> do pure hierarchical routing, augmented *as necessary* where the amount
> of traffic justifies the cost of extra DMF's. You only go beyond the
> minimal set (which has been shown to be quite small) if there *are* many
> datagrams; i.e. if the actual traffic justifies it.

OK. Suppose the traffic needs the maximum set. How large is the maximum?

Impossibly large, but this is true of *any* routing architecture. The "maximal" set of routing information would be that set which provides the maximally optimal route *at all times*. In any routing architecture, this would effectively mean that everyone would have to have a complete database of the entire system; i.e. track each individual destination separately, i.e. no hierarchy. The overhead of maintaining that database is far larger than the savings.

You reach a point of diminishing returns. The trick is to identify the point at which further detail in the routing database doesn't pay off. I don't know how to do this, and I suspect we may never get a simple, guaranteed optimal algorithm (it feels NP-complete), but as we get better and better practical approximations, the "algorithm-independent" nature of Nimrod will allow us to deploy it incrementally, with no global coordination.

> Note also that if you think a DMF is unlikely to have any traffic across
> it, set it up on demand, not in advance. That way, the only DMF's that
> get set up are the ones that get used.

As the communication is connectionless, you can't expect much usage pattern.

This is a conjecture which I suspect is wrong, but I can't prove it right at the moment. However, I can hand-wave. For instance, cars on roads have a lot of the characteristics of datagrams, but there are definitely usage patterns. You can also look at phone networks; individual calls have a lot of the same characteristics as datagrams, and there, too, there are usage patterns.

Especially, packets which travel long distances, which load the top level routers, tend to have less pattern, because the end organizations are less related.

This is true, but I suspect we'll have to monitor a real network to know what the actual patterns are. I don't think we can predict them.

> I must have missed something.
> If we do what you suggest, and buffer "best
> effort" datagrams, won't they be dropped at precisely the same points on
> congestion? The actual effects of the DMF scheme ought to be the same,

If you try to have QoS along each flow, some of the bandwidth is reserved and wasted, which is the difference.

It depends on your resource allocation system. As far as I know, most proposed resource allocation architectures allow reserved, but unused, bandwidth to be given to "capacity available" traffic.

Anyway, how can you handle partitioning?

This is an open point; there are a number of potential schemes, and a decision as to which set (since I don't think one alone will do it) has not yet been made. We will be discussing it soon, I expect.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa20591; 26 Jan 94 14:17 EST
Received: from pizza by PIZZA.BBN.COM id aa17748; 26 Jan 94 13:53 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa17744; 26 Jan 94 13:51 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa19050; 26 Jan 94 13:51 EST
Received: by ginger.lcs.mit.edu id AA12674; Wed, 26 Jan 94 13:49:04 -0500
Date: Wed, 26 Jan 94 13:49:04 -0500
From: Noel Chiappa
Message-Id: <9401261849.AA12674@ginger.lcs.mit.edu>
To: jnc@ginger.lcs.mit.edu, mohta@necom830.cc.titech.ac.jp, nimrod-wg@BBN.COM
Subject: Re: Analysis of DMF's in new datagram mode
Cc: jnc@ginger.lcs.mit.edu

>>> For routers which are not border routers, they need one DMF out to a
>>> border router (i.e. up), and that's it. Traffic to other objects inside
>>> the area can be handled by sending it to the border router, which will
>>> send it back to the correct object.

>> There will be load concentration at border routers, then.

> I was describing the minimal functional configuration.

Because of the load concentration, I don't think your configuration functions.

The answer to this has two parts. First, I don't think it would take a *lot* more state to provide better routing. The argument is given below, in the original message. Second, to the extent that your physical network configuration provides load concentrations, this is a problem with the configuration which routing alone cannot solve. I prefer meshes, with lots of smaller routers, for the simple reason that the load is spread over more paths.

> An interior router which had an instantiated
> DMF (not just a potential DMF) to every other object in the area would
> still have only as many DMF's ending at it as a border router of the
> area, and that case has been analyzed as O(I), which is reasonable.

Your configuration is wrong as to the configuration within an area, of course.

How is it wrong? It's not at all obvious to me...

But your configuration is also wrong as to the configuration of the area hierarchy. The number of levels must be limited.

Why? Everyone seems to agree that the only way to scale the system is to increase the number of levels, and in fact, in general *all* routing architectures seem to have the characteristic that increasing the number of levels increases the overhead a lot more slowly than increasing the size of existing levels.

> If every interior router had an instantiated DMF to all the objects in
> the area, the number of flows through each router in the area would be
> O(I log I), which is a little worse, but not terribly so, since I is
> unlikely to grow large; we won't have enormous areas.

Because of planarity, it is O(I^1.5), where I is not constant.
I will discuss the planarity issue below, but even if it *were* O(I sqrt I), that would still not be impossible, since I don't expect to see massive growth in I (the number of one-level-down objects in the average area).

> In addition, I doubt we will see full mesh connectivity; traffic X-Y
> graphs always show hot-spots, not an even distribution, at any scale.

    Hot spots make the load concentration worse.

These "hot spots" are not *physical* hot-spots, but source-destination traffic matrix entries which show larger counts than average. They would only cause *physical* hot-spots if the topology has limited connectivity, and the sources and destinations are scattered across that topology in a special way. Of course, it would be loony to design your physical topology like that, if you had those traffic patterns; you'd modify it to get rid of the hot spots.

>> not all DMFs are equal. That is, higher level DMF means a lot more
>> traffic through it. That's why you can't increase the number of levels
>> at will. Your model is not scalable.

> This is an interesting point, but it impacts a lot more than just DMF's.
> If top level links don't have enough bandwidth to handle the traffic,
> this is going to be true whether you use DMF's for datagram traffic, or
> hop-by-hop, or whether the user traffic is all in end-end flows!

    Top level links MUST have enough bandwidth.

True, but this is a physical topology design point, not a routing architecture design point.

    That is, there should be a lot of second level areas and they should be connected with a lot of links. Your configuration, which assumes small areas, does not allow such a configuration.

I'm not quite sure I follow this. I have said that any area ought to contain a relatively small (where "relatively" may be O(1,000), or perhaps even more) number of next-level-down objects. This does not mean that a top level area will cover a small part of the physical topology; it may in fact cover a *huge* area of the topology. Was this what you meant, or did you have some reason to think that practical designs would need areas far larger than this?

> To grow the network, we will have to do one of two things. Either the top
> level links (i.e. the express highways) will have to have more
> bandwidth, in which case this is not a problem, or we are going to have
> to distribute the traffic over a number of smaller, parallel, links.
> Since this increases the number of arcs in the graph, *not* the number
> of nodes,

    If you increase the number of links without increasing the number of nodes, load will concentrate not in links but on border routers.

Good point; we will need to increase the number of border nodes too. This obviously makes hash of my next statement, but it turns out that DMF growth is more correlated to the number of *interior* objects, and not to the number of border routers, so we can up the number of border routers too (say from sqrt(I), the assumption I used in my calculations, to some higher function) without serious problem.

> it doesn't impact the growth of DMF's either. I reckon the second is
> better (since it uses parallel, less expensive, technology, to get the
> performance), but I'm not sure which we will see.

    We need a lot of top level areas with a lot of border routers.

I need to go away and think about this; I'm convinced that a mesh *is* the answer, but I need to understand that there are no pitfalls (in terms of load concentration or routing overhead) to making it work.
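(Meanwhile, just to put rough numbers on the flow-count growth rates we have been trading, here is a back-of-the-envelope sketch; the sample values of I, and the growth rates themselves, are purely illustrative, taken from the discussion above rather than from any measurement:)

    import math

    # Purely illustrative: per-router flow counts for an area with I
    # interior objects, under the three growth rates discussed above.
    def flows_per_router(I):
        return {
            "minimal set, O(I)": I,
            "full mesh, O(I log I)": int(I * math.log2(I)),
            "planar claim, O(I^1.5)": int(I * math.sqrt(I)),
        }

    for I in (100, 1000, 10000):
        print(I, flows_per_router(I))

Even at I = 10,000, the gap between the three is a factor of a hundred, not a difference in kind.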
Note that this is a problem with routing/physical topology *in general*, not just this specific architecture.

> One other thing to note is that *all* routing algorithms have limits on
> the size of the graph over which you can run them in practise. To build a
> larger network, you need to introduce levels, and to scale to arbitrary
> sizes, you need an arbitrary number of levels.

    Everything has its own limitation. Still, we can expect the limits themselves to scale as time goes by. For example, the allowable size of routing information is expected to scale as the link speed increases.

True, but I think the limit at the moment is memory, not bandwidth, although as I explained (with memory capacities going as the square of feature size, whereas device speed goes linearly with feature size) I expect this balance to shift.

    So, though it is obvious that there should be levels, you don't have to assume the area size is constant.

Oh, I'm not. I'm just assuming that i) growth in the size of the network will be *faster* than technology for some years to come (look how fast the Internet is growing now), so we can't accommodate that growth purely with growth in technology (line speeds and memory sizes). Also, as I explained, you get more "bang for your dollar" out of increasing the number of levels than you do out of increasing the size of each level. As a very simplified example, let's assume you have a 24-bit address. You can either make it two 12-bit fields, or three 8-bit fields. Either gives you the same number of total destination addresses available - 2^24. However, the former would take 2*(2^12) routing table entries in a router, or 8K, whereas the latter would take 3*(2^8), or 768; i.e. an order of magnitude less state! Similar calculations hold for *all* routing architectures. The Kleinrock and Kamoun paper shows that you don't wind up with much worse routes (in fact, the difference is usually infinitesimal), and the reduction in routing overhead is substantial.

>>> The average path length, A, for a graph of fixed degree (i.e. one in
>>> which nodes have the same average number of arcs to neighbouring nodes,
>>> independent of the size of the graph) is logN (where N is the number of
>>> nodes). [Chen 86]

>> While such an average over all the possible graphs will be so, as your
>> topology is a mesh, you should average only over planar graphs. Thus,
>> the average path length will be sqrt(N).

> This doesn't make sense to me. Planar graphs are not a good model for the
> connectivity of the network. For instance, in a planar graph, you cannot
> have full interconnection between 5 nodes

    As the Earth's surface is planar, and the routers are placed on the Earth, it is the model.

Well, not exactly (we aren't two-dimensional beings :-), but you do have a good point below, so I'll skip this.

> However, I don't think that the real network will display such
> constraints. So, planar graphs are not the applicable area of graph
> theory, but rather normal graphs, and there it is log(N).

    A small number of crossings does not matter. But, if you allow arbitrary crossings, it means most links are lengthy. Assume top level routers are distributed all over the surface of the Earth. Then how long, do you think, will the average link between connected nodes be? If you assume full randomness, the average is something of transcontinental scale, which is quite costly.

This is a good point, actually. I doubt that a normal (i.e. non-planar) graph of fixed degree (i.e.
one in which nodes have the same average number of arcs to neighbouring nodes, independent of the size of the graph) is really an optimal model for the network either. The problem, as you have pointed out, is that not all links are equally likely. I.e., if Pij is the probability of a link between nodes i and j (thanks for the notation, Yakov :-), in a real network, Pij is *not* a constant over all j for a given i. Rather, nodes which are "closer" (in the physical space geometry) are more likely to have links than those further away. So, even if the average node does have a constant number of arcs, they are not distributed randomly across the graph.

This will move us off the O(logN) point, and toward the O(sqrtN) point. However, without a probability model, and a lot of math (or simulation), neither of which I have time for, it's impossible to say how far. My guess, based on looking at real-world networks like the ARPANet, is that it will be pretty close to the results for true fully random graphs, in future real networks. The reason is simple; long path lengths are a *bad thing*. People will put in enough non-local links to bring the path length down, but my thinking (based on recollection, again) is that it's an asymptotic, diminishing-returns type of thing. It doesn't take a lot of long links to really whack down the diameter (and thus the average path length). I know BBN did a lot of work in this area, modelling the ARPANet to see where to add new links. Perhaps someone there can report briefly on what they recall? Again, my specific recollection of that work is that it doesn't take many non-local links to really help. Of course, then you have load issues on those links, but that's another story...

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa03502; 28 Jan 94 12:51 EST
Received: from pizza by PIZZA.BBN.COM id aa29954; 28 Jan 94 12:18 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa29950; 28 Jan 94 12:15 EST
Received: from nsco.network.com by BBN.COM id aa01595; 28 Jan 94 12:14 EST
Received: from anubis.network.com by nsco.network.com (5.61/1.34) id AA21410; Fri, 28 Jan 94 11:18:02 -0600
Received: from blefscu.network.com by anubis.network.com (4.1/SMI-4.1) id AA01003; Fri, 28 Jan 94 11:13:12 CST
Date: Fri, 28 Jan 94 11:13:12 CST
From: Andrew Molitor
Message-Id: <9401281713.AA01003@anubis.network.com>
To: nimrod-wg@BBN.COM
Subject: Re: New Datagram Mode

Let's first see if I have this right, then I will assume that I have it right, and draw up a little example, to see if it's as useful as Noel says. As I understand it, the New Datagram Mode treats the locator of the source, massaged and glued to the locator of the destination, as a sort of crude source route. I guess the things in it are different things than in a real source route, but it sure smells like a source route to me.

[ As a side note, I expect it'd be a neat tweak to have the source find the lowest common object in the two locators, turn the source locator around, and paste the two together to form something linear and more source route-like. This just tidies up the router code a little, and saves a wee bit of per-router forwarding cost. ]

Now, playing fast and loose with terminology, I am going to call this massaged and glued pair of locators a 'crude source route', or CSR. Whether or not it actually exists doesn't matter, since it exists logically.
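In code, I imagine the glueing looks something like this sketch (the dot-separated locator syntax is invented purely for illustration):

    # Locators here are invented dot-separated toys, e.g. "A.B.C.D".
    def make_csr(src_locator, dst_locator):
        src = src_locator.split(".")
        dst = dst_locator.split(".")
        # find the lowest common object (longest common prefix)
        common = 0
        while common < min(len(src), len(dst)) and src[common] == dst[common]:
            common += 1
        # turn the source locator around, then paste on the way back
        # down to the destination
        return list(reversed(src[common:])) + dst[common:]

    # make_csr("A.B.C.D", "A.P.Q.R") -> ['D', 'C', 'B', 'P', 'Q', 'R']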
As I understand it, the New Datagram Mode uses this CSR to do forwarding, but is permitted to look ahead in it to see if it has a Datagram Mode Flow already set up to somewhere further up in the CSR. If it does, over the flow it goes. If it doesn't, we just do the usual source route thang.

So, here's the example. One of the things Nimrod will do a lot of is looking up locators. Lest I be told 'Wrong. DNS not magic.' I will refer to this as 'Locator Location Service', or LLS, provided by 'Locator Location Servers', also LLSs. I am guessing that you'd wind up with DMFs as follows:

- everyone's got a DMF to the root LLS (this is icky, you'd need to surround the root LLS with a wad of routers to carry the load? How do the root nameservers deal with it, anyways?)

- regional network core routers will typically have a lot of DMFs to local LLSs.

Is that it? Anyways, to do a Locator Lookup you go right over the flow to the root (and maybe your answer returns the same way? These flows are bidirectional?). Then subsequent lookups wobble through the hierarchy via Crude Source Routing, until they hit the regional net of the destination, which short-circuits it to the right place over the DMF to the local LLSs. If network.com spends most of its bandwidth doing Locator Lookups to nmsu.edu, a new DMF would/might magically appear from network.com's gateway router to nmsu.edu's gateway router.

I think that this thing will work. It remains to be seen if this thing is the same as Noel's thing, though.

Andrew Molitor

Received: from PIZZA.BBN.COM by BBN.COM id aa07120; 28 Jan 94 14:01 EST
Received: from pizza by PIZZA.BBN.COM id aa00575; 28 Jan 94 13:41 EST
Received: from MARENGO.BBN.COM by PIZZA.BBN.COM id aa00569; 28 Jan 94 13:39 EST
Date: Fri, 28 Jan 94 13:33:35 EST
From: Karen Seo
To: nimrod-wg@BBN.COM
Subject: apologies

Sorry for mistakenly redistributing Andrew Molitor's message.

Received: from PIZZA.BBN.COM by BBN.COM id aa25427; 30 Jan 94 18:57 EST
Received: from pizza by PIZZA.BBN.COM id aa09997; 30 Jan 94 18:42 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa09993; 30 Jan 94 18:40 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa25170; 30 Jan 94 18:40 EST
Received: by ginger.lcs.mit.edu id AA21359; Sun, 30 Jan 94 18:40:30 -0500
Date: Sun, 30 Jan 94 18:40:30 -0500
From: Noel Chiappa
Message-Id: <9401302340.AA21359@ginger.lcs.mit.edu>
To: amolitor@anubis.network.com, nimrod-wg@BBN.COM
Subject: Re: New Datagram Mode
Cc: jnc@ginger.lcs.mit.edu

    As I understand it, the New Datagram Mode treats the locator of the source, massaged and glued to the locator of the destination

There is actually no massaging of either locator, just the pointer.

    as a sort of crude source route. I guess the things in it are different things than in a real source route, but it sure smells like a source route to me.

I've had this argument before! No, a hierarchical locator is *not* a source route (at least for any reasonable definition of the term "source route" :-), in any of the varied ways of using it to route traffic, including traditional hop-by-hop, as well as the NDM. I hope that all those who think it *is* will have the time to read this message, wherein I will show conclusively, absolutely and utterly that such a conception is misguided. (Yes, Paul, this means you!
:-)

First, you need to realize that there is a continuum which stretches from the one end of a strict source route (in which every single physical asset to be used to carry the packet is explicitly identified), through loose source routes, to the other end, which is pure hop-by-hop routing on non-hierarchical addresses. (I hope that everyone can see that this represents the other extreme, since clearly, even in a system which was as "anti-source-route" as possible, the source would still have *some* influence, since it has to decide where the traffic is going!) The thing which is present in different measures, all along this continuum, is the *amount of influence* the *source* has in *picking the path* for the traffic.

Now, let's look at whether or not routing on hierarchical locators, whether done in a hop-by-hop system or the NDM, can in any way be described as "source routing". The usual argument seems to be that you can consider the list of hierarchically related objects in the locator a "source route" of a sort, since one routes first to the largest object, then to the next, etc. However, it is fallacious to think that this means the locator is a source route. Go back to the characteristic of a source route: it allows the source *some control* over the path taken by the traffic. No such thing happens here!

In fact, if you think more deeply about what is happening in hierarchical routing systems, the point becomes clear. A hierarchical address structure is simply a way to reduce the overhead of the routing; this *usually* entails a certain amount of non-optimal route selection. However, it is *theoretically* possible to have a system in which, although the addresses are hierarchical (so presumably the routing overhead is somewhat smaller), there *is* no loss of routing optimality; i.e. all traffic takes *exactly the same paths it would have taken had addresses been assigned "flat"*. In such a system, you'd be hard pressed to say, with a straight face, that conversion from "flat" addresses to hierarchical had introduced "source routes".

As an aside, such a routing system is basically impossible in the real world. To explain why would make this Noelgram impossibly long; some other time, perhaps. Rather, like the "optimal page replacement algorithm" (which requires knowledge of *future* memory reference patterns, clearly not a possible input to a real-world algorithm :-), it is a useful theoretical measuring-stick sometimes. Since real hierarchical routing can get asymptotically close to this state, depending on the amount of overhead you are willing to pay, I don't think it's unreasonable to use it here to help make it clear why hierarchical locators are *not* source routes.

You *could* make an argument that in a system where destinations have many, many locators, which are differentiated on the basis of something about how you get to the destination, e.g. by the long-haul carrier through which the destination is reached, such locators *are* source-routes in a way, since *they allow the source to say something about which route it prefers*. This is an entirely different thing, however. The correct way to look at that is that in such a system, for each destination there is a different "tree" of routes through the network (from each possible source, to that destination) for each locator the destination has. Thus, selection of one locator over another basically picks one "tree" over another.
However, note that this selection process would be the same *regardless of whether the locators were flat or hierarchical*. The hierarchical stuff is simply an optimization. It's the multiple locators that provide *all* the source routing function, *not* the hierarchical nature.

I hope everyone will now be convinced that any *appearance* of "source routing" in the use of hierarchical locators is simply an illusion. The source has no more control over the route taken by the traffic with hierarchical locators than with flat ones. The hierarchical locators may produce slightly non-optimal routes, but that's a whole different ball of wax.

    As a side note, I expect it'd be a neat tweak to have the source find the lowest common object in the two locators, turn the source locator around, and paste the two together to form something linear and more source route-like. This just tidies up the router code a little, and saves a wee bit of per-router forwarding cost.

I don't think so; as far as I can tell without actually doing the code out in detail, it probably does not in fact make the job of the routers much easier. If the routers have optimizations in the database of DMF's (e.g. a packet from A.B.C.D to P.Q.R.S might get to a router on the border of A.B.C and find that it had a direct DMF to P.Q.R), you still have to do "longest match" on the destination locator to look for these optimized DMF's. If we have truncated locators in the packets, that would probably just make the search harder, and ambiguous: if you have DMF's to A.P.X.Y and A.Q.X.Y and you see a packet for X.Y, how do you know which one to pick? Sure, you can backtrack and figure it out, but is it really worth it? I dunno, I expect that it will prove to be non-optimal, but we can probably look at it when the mechanism is designed in detail.

    As I understand it, the New Datagram Mode uses this CSR to do forwarding, but is permitted to look ahead in it to see if it has a Datagram Mode Flow already set up to somewhere further up in the CSR. If it does, over the flow it goes. If it doesn't, we just do the usual source route thang.

Yes, except that the thing you do isn't "the usual source route thang", but rather "the usual hierarchical routing thing" (modulo the temporary assignment to a DMF, which is hardly the usual hierarchical routing thing :-).

    So, here's the example. One of the things Nimrod will do a lot of is looking up locators. ... I will refer to this as 'Locator Location Service', or LLS, provided by 'Locator Location Servers', also LLSs. I am guessing that you'd wind up with DMFs as follows:

    - everyone's got a DMF to the root LLS (this is icky, you'd need to surround the root LLS with a wad of routers to carry the load? How do the root nameservers deal with it, anyways?)

First, we didn't design the DNS so that everyone has to go to the same root, and I imagine the LLS would be the same way; you have multiple roots, you accept local caches of stuff from the root table, etc, etc. But this is not the main reply..

Second, you do *not* have DMF's directly from sources to destinations. You have a limited set of DMF's, from each hierarchical object to a set of "nearby" hierarchical objects. This mesh of DMF's is sufficient to get packets from any place in the network to any other, traversing a number of *different* DMF's on the way.
(The set of DMF's each node has to have has a certain minimal theoretical size if the system is to function at all, and you can add extra ones to get rid of non-optimal routing caused by the pure hierarchical routing of the minimum DMF set.)

    - regional network core routers will typically have a lot of DMFs to local LLSs.

Again, no, because the routers generally wouldn't have DMF's directly to destinations.

    Is that it? Anyways, to do a Locator Lookup you go right over the flow to the root (and maybe your answer returns the same way? These flows are bidirectional?).

You couldn't get to the root LLS unless you had its locator. If you had its locator, you didn't need to use a flow to get to it, you could send out a datagram using the NDM, and it would be forwarded over a set of DMF's to get to the root LLS.

    Then subsequent lookups wobble through the hierarchy via Crude Source Routing, until they hit the regional net of the destination, which short-circuits it to the right place over the DMF to the local LLSs.

Again, probably not a DMF directly to the local LLS. (You *might* put this in as an optimization, see above, but only if there were enough traffic to warrant it.)

    If network.com spends most of its bandwidth doing Locator Lookups to nmsu.edu, a new DMF would/might magically appear from network.com's gateway router to nmsu.edu's gateway router.

If enough NDM traffic went from network.com to nmsu.edu, the relevant routers might set up a DMF between themselves, to optimize the routing of NDM traffic.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa05902; 31 Jan 94 0:30 EST
Received: from pizza by PIZZA.BBN.COM id aa11032; 31 Jan 94 0:13 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa11022; 31 Jan 94 0:11 EST
Received: from nsco.network.com by BBN.COM id aa05482; 31 Jan 94 0:11 EST
Received: from anubis.network.com by nsco.network.com (5.61/1.34) id AA03106; Sun, 30 Jan 94 23:14:55 -0600
Received: from blefscu.network.com by anubis.network.com (4.1/SMI-4.1) id AA12585; Sun, 30 Jan 94 23:10:02 CST
Date: Sun, 30 Jan 94 23:10:02 CST
From: Andrew Molitor
Message-Id: <9401310510.AA12585@anubis.network.com>
To: nimrod-wg@BBN.COM
Subject: Re: New Datagram Mode

>I've had this argument before! No, a hierarchical locator is *not* a source
>route

I confess that this was partly Noelbait. However, I did not previously, but do now, see why it's reasonable to say that this locator pair is not a source route. I will concede that point, and try to be a good little trooper and not call these things source routes.

>Second, you do *not* have DMF's directly from sources to destinations. You
>have a limited set of DMF's, from each hierarchical object to a set of
>"nearby" hierarchical objects. This mesh of DMF's is sufficient to get packets
>from any place in the network to any other, traversing a number of *different*
>DMF's on the way. (The set of DMF's each node has to have has a certain
>minimal theoretical size if the system is to function at all, and you can add
>extra ones to get rid of non-optimal routing caused by the pure hierarchical
>routing of the minimum DMF set.)

Oh dear, I failed to make myself clear. I didn't mean that DMFs went from sources to destinations, but that there might well wind up being one set up between an Important Router at one's site and an Important Router near a Locator Location Server, and that NDM packets would tend to wend their way to the first Important Router, which would then say to itself 'Ah HA! An LLS packet!
I know how to get this thing a good long ways toward that there root LLS!' and zap it over the DMF.

>You couldn't get to the root LLS unless you had its locator. If you had its
>locator, you didn't need to use a flow to get to it, you could send out a
>datagram using the NDM, and it would be forwarded over a set of DMF's to
>get to the root LLS.

Yeah, this is what I was trying to say. Let's club an analogy to death here. Let's pretend we're trying to get from Middletown CT to Las Cruces, NM by automobile, using NDM. The locators look like: USA.Connecticut.Dead-Center.Ask-a-Native and USA.New-Mexico.South-East.Ask-a-Native and I do indeed see that these two things do not really constitute a route, source or otherwise, from the one to the other, though they do (assuming a little intelligence here and there, at USA-level entities, mostly) have sufficient information to get from one to the other.

NDM proposes, essentially, that we have superhighways, cleverly renamed 'Datagram Mode Flows.' Well, not quite, since the topological restrictions are not as strict -- if there's a great deal of traffic from Middletown to Las Cruces, then by gum, we run 16 lanes each way from one to the other, and no, there is no Tulsa exit. It occurs to me that Noel said almost exactly this not too long ago, but I have to say it myself, just like I'd invented it, before I can get my itty bitty brain around it. Thanks for bearing with me, folks!

Andrew Molitor

Received: from PIZZA.BBN.COM by BBN.COM id aa20721; 1 Feb 94 11:23 EST
Received: from pizza by PIZZA.BBN.COM id aa21006; 1 Feb 94 10:48 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa21002; 1 Feb 94 10:44 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa17733; 1 Feb 94 10:39 EST
Received: by ginger.lcs.mit.edu id AA01152; Tue, 1 Feb 94 10:39:35 -0500
Date: Tue, 1 Feb 94 10:39:35 -0500
From: Noel Chiappa
Message-Id: <9402011539.AA01152@ginger.lcs.mit.edu>
To: nimrod-wg@BBN.COM
Subject: Locators, and physical topology
Cc: jnc@ginger.lcs.mit.edu

For those of you who aren't on ROLC, you might find the following message interesting. It points out the kind of difficulty you can have when i) you rely on your locators to tell you something about the underlying physical connectivity, and ii) your locators do not accurately depict that connectivity. I realize we aren't sure quite yet what, if anything, Nimrod locators will tell you, but we should note two things: i) *some* mechanism is going to be needed to find out about connectivity, and ii) that mechanism had better accurately do so. If locators do tell us anything, we need to keep the kind of problems seen here in mind, especially in light of the stuff I talked about a while back where there are forces pulling locators toward different needs; i.e. policy, abstraction, and representation of the physical topology.

Noel

--------

    for the bigger problem of next hop resolution i suggest that we use either directed arp or tony's protocol. these are ok if the next hop can be resolved in several steps and also can cope with the problem of a better route becoming available dynamically.

I've been thinking about this whole issue of routing since the last IETF, and the meeting where the "loop" scenario was presented. There are two separate fundamental problems.
For the case of trying to optimize paths from one router to another, I think that there is unfortunately a fundamental clash between the current "hop-by-hop" routing architecture used by the Internet, and the (effectively) flow-setup used by most attempts to perform this optimization across multiple logical NBMA's on a single physical NBMA (i.e. getting rid of an intermediate IP router step which turns out not to be needed). As such, there are no *real* fixes to this problem. It's like trying to put wheels on one side of a car, and paddles on the other; the two ain't never gonna get it together, no matter how hard you try. The potential problems caused by this include fatal failures such as the creation of routing loops.

There is another problem, not quite as serious (the effects aren't fatal), which affects the process of a host picking an exit router (when the ultimate destination is not on the NBMA mesh), and noticing when a given exit router is no longer the optimal exit router. This has to do with the fact that routing tables are usually organized and calculated to find the best route from *here* to a destination, and usually cannot tell you if *here* is on the best path from the *source* to the destination. Again, there are no real solutions to this problem.

It's important to realize that neither of these means you are utterly screwed; they just mean that there are limits on what you can achieve within the current architectural framework, and those limits have to be respected. Attempting to go beyond them will land you in the soup...

In more technical terms, here's a description of the first problem. The hop-by-hop model is susceptible to a particularly painful failure mode, which is the formation of routing loops. Various mechanisms are usually used to prevent them, but they all boil down to the same thing: the databases used to make routing decisions (i.e. routing tables, etc) have to be maintained in a *consistent* state. In other words, when a change happens to the topology, the effects of that change must be *reliably* propagated to *every* router which is affected. More technically, every node in the network topology which is *potentially* part of a routing loop must be reliably updated to a consistent state, lest a routing loop be formed.

The limitation to nodes which could become part of routing loops is because "leaf" nodes like hosts, which cannot be the destination of any traffic except that which is directly for them, cannot be part of any loop. In other words, if a host has the first hop "wrong", it's not the end of the world; the traffic will take a less efficient path from the source to the destination, but it will get there. However, if a router gets confused (i.e. it's not consistent with all the other routers), all sorts of trouble may *potentially* break out.

With this in hand, let's look at the kind of failure mode which is basically *inescapable* with the kind of optimization process that has been discussed. The problem is that the optimization represents state about the routing, and state which is *not* under the control of the carefully arranged mechanisms of routing which make sure that that state is consistent across the network. As an example, let's say that router A gets to X via the path B,C,D, all of which are on the same physical NBMA. A contains B as the "next hop" entry for X in its routing table. A goes to look up the physical address of B, and gets the PA of D via an optimization process.
Now, some change happens in the topology *outside* the NBMA which makes D decide that *A* is the place to send traffic for X. Bingo, routing loop. You can say "Oh, that's easy to fix, we make A notice that something has changed in the routing, and redo its setup for X", but that's not so easy to guarantee. Maybe the change resulted in A's best next hop no longer being B, in which case A will have noticed something. However, it might have left it at B, in which case you are screwed. I don't recall the details of the example from the IETF any more, but it was something like this case. The point is that there are fundamental reasons why this kind of failure mode is going to be there, and they have to do with attempts to bypass the severe consistency requirements of the "hop-by-hop" model; you're creating routing state which is not under the control of the mechanisms which are intended to maintain that required consistency.

Are there ways to do this optimization? Yes, but they all boil down to recognition of the real topology, and *doing so at the internetwork routing layer*. In other words, if A, B, C and D are all attached to the same physical network, and can talk directly, *the internetwork routing has to know about it explicitly*. Right now, routers know that P and Q are directly connected if their interface addresses are the same when masked by the "IP network mask" for that physical interface. Since the desire is to allow (effectively) random IP addresses for different parts of a physical NBMA mesh, an alternative mechanism would need to be found to convey this information, and the routing algorithm/protocol would have to be looked at to make sure it works with this kind of thing. (For instance, I don't know if OSPF would work if the topology map showed router 1.2.3.4 connected directly to 10.11.12.13, but it might.)

The problem with noticing when a given exit router is no longer the optimal exit router is a little more intractable. It turns out we have this problem now, but because of the way the Internet is addressed, and physically put together, the way the Redirect mechanism fails is usually not so noticeable. For an example of a currently non-working scenario, assume that net N has two routers off it, R1 and R2, both of which are connected to the "rest of the world" cloud. S, a host on N, is getting to D (out there in the RotW) via R1. Now, the routing changes such that the best path from S to D is via R2, and *R1's next hop for D is not on N*. Most current router implementations will *not* notice this; S will never get a redirect from R1 to R2 for D.

The problem is simple: as stated, routing tables are usually organized and calculated to find the best route from *here* to a destination, and usually cannot tell you if *here* is on the best path from the *source* to the destination. Fixing this would require substantial changes to the routing tables, and their calculation; you'd basically have to maintain a routing table *per interface* which listed what the best exit router from that network was to each destination. The overhead of all this is so high that most vendors don't bother. In fact, if you look at the *specification* of most routing protocols, the algorithm *in the spec* does not do this.

Now, as stated, this problem is not the end of the world; the packets are going to get there, and if there is a *serious* non-optimality in the path (e.g. R1 lists its best next hop as R2, on the same net N as the source S), you usually find out about it, and the source gets a Redirect.
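To make the shape of that per-interface fix concrete, here is a rough sketch; all of the names, and the table layout itself, are invented:

    # For each attached network, the best exit router from *that* network
    # to each destination -- the per-interface table described above.
    best_exit = {
        "net-N": {"D": "R2"},  # from net N, R2 is now the better exit for D
        "net-M": {"D": "R1"},
    }

    # A router 'me' receiving a packet for 'destination' from a host on
    # 'arrival_net' should redirect iff some *other* router on that net
    # is the better exit.
    def should_redirect(me, arrival_net, destination):
        best = best_exit[arrival_net].get(destination)
        return best is not None and best != me

    # should_redirect("R1", "net-N", "D") -> True: R1 can redirect S to R2.

The point is not the lookup, which is trivial, but that every entry in that table has to be computed and kept up to date per interface, which is exactly the overhead vendors skip.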
As can be imagined, this "failure mode" becomes i) much more likely when a much larger net, and pool of directly connected hosts and routers, is present, and ii) much harder to detect when all sorts of source addresses turn out to be directly connected to the router. The router is going to have to look at each packet that comes in, and say "am I on the best path to that destination", and if not, try and figure out if it is directly connected to the source of the packet. Alternatively, if it knows all the NBMA connections to it, when a topology change happens, it can try and find the ones which would be affected, and send them a Redirect, but this is subject to the kind of problems raised above, although any failures are not fatal in this case.

Of course, the whole thing is considerably easier if the router knows the *real* physical topology, as opposed to some logical topology which is laid on top of it, and through which it can only dimly see the underlying physical topology. Obviously, if *all* it could see was the logical topology, it wouldn't have to worry about it at all. It's this in-between state, neither fish nor fowl, which is so hard to deal with.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa01285; 1 Feb 94 14:27 EST
Received: from pizza by PIZZA.BBN.COM id aa22494; 1 Feb 94 14:03 EST
Received: from BBN.COM by PIZZA.BBN.COM id ab22487; 1 Feb 94 14:01 EST
Received: from inet-gw-1.pa.dec.com by BBN.COM id aa29688; 1 Feb 94 14:00 EST
Received: from nacto1.nacto.lkg.dec.com by inet-gw-1.pa.dec.com (5.65/13Jan94) id AA12378; Tue, 1 Feb 94 10:35:14 -0800
Received: from sneezy.nacto.lkg.dec.com by nacto1.nacto.lkg.dec.com with SMTP id AA03899; Tue, 1 Feb 1994 13:35:13 -0500
Received: by sneezy.nacto.lkg.dec.com (5.65/4.7) id AA13541; Tue, 1 Feb 1994 13:35:12 -0500
To: Noel Chiappa
Subject: Re: Locators, and physical topology
In-Reply-To: <9402011539.AA01152@ginger.lcs.mit.edu>
References: <9402011539.AA01152@ginger.lcs.mit.edu>
Cc: nimrod-wg@BBN.COM
X-Mailer: Poste 2.1
From: David R Oran
Date: Tue, 1 Feb 94 13:35:11 -0500
Message-Id: <940201133511.5819@sneezy.nacto.lkg.dec.com>

> exit router is no longer the optimal exit router. This has to do with the
> fact that routing tables are usually organized and calculated to find the
> best route from *here* to a destination, and usually cannot tell you if
> *here* is on the best path from the *source* to the destination. Again,
> there are no real solutions to this problem.

I know what you're getting at here Noel, but be careful, because you overstate the case. Link-State routing does in fact organize routing tables such that *here* can know if it is on the best path from the source to the destination. All you need to do is run an SPF picking the source as the root of the SPF tree instead of *here*, and halting if *here* is placed onto the tree. If the algorithm places the destination on the tree before *here*, then, conversely, *here* is not on the best path.

The objection to link-state routing in the NBMA case is that while the SPF calculation will find optimal routes and avoid extra packet hops, it still requires control traffic to flow over all possible NBMA router pairings to maintain the topology. This can be optimized somewhat by using "designated router" techniques (as OSPF does), but the background traffic is still O(N), where N is the number of routers on the NBMA net.

Dave.

BTW: my silence on the list doesn't mean I haven't been reading stuff.
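In code, the check described above might look something like this sketch; the link-state map representation, and all of the names, are invented for illustration:

    import heapq

    # Run SPF (Dijkstra) rooted at the *source*, halting as soon as 'here'
    # is placed on the tree. 'linkstate' maps node -> [(neighbour, cost)].
    def may_be_on_best_path(linkstate, source, here, destination):
        dist = {source: 0}
        placed = set()
        heap = [(0, source)]
        while heap:
            d, node = heapq.heappop(heap)
            if node in placed:
                continue
            placed.add(node)
            if node == here:
                # 'here' was placed before the destination; it is not ruled
                # out (a full check would trace the tree back from the
                # destination).
                return True
            if node == destination:
                # the destination was placed before 'here', so 'here' is
                # not on the best source-to-destination path
                return False
            for neighbour, cost in linkstate.get(node, []):
                nd = d + cost
                if nd < dist.get(neighbour, float("inf")):
                    dist[neighbour] = nd
                    heapq.heappush(heap, (nd, neighbour))
        return False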
Received: from PIZZA.BBN.COM by BBN.COM id aa04357; 1 Feb 94 15:14 EST
Received: from pizza by PIZZA.BBN.COM id aa22711; 1 Feb 94 14:51 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa22707; 1 Feb 94 14:48 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa02519; 1 Feb 94 14:43 EST
Received: by ginger.lcs.mit.edu id AA04917; Tue, 1 Feb 94 14:43:18 -0500
Date: Tue, 1 Feb 94 14:43:18 -0500
From: Noel Chiappa
Message-Id: <9402011943.AA04917@ginger.lcs.mit.edu>
To: jnc@ginger.lcs.mit.edu, oran@nacto.lkg.dec.com
Subject: Re: Locators, and physical topology
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM

> This has to do with the fact that routing tables are usually organized
> and calculated to find the best route from *here* to a destination, and
> usually cannot tell you if *here* is on the best path from the *source*
> to the destination.

    be careful because you overstate the case. Link-State routing does in fact organize routing tables such that *here* can know if it is on the best path from the source to the destination. All you need to do is run an SPF picking the source as the root of the SPF tree instead of *here* and halting if *here* is placed onto the tree.

I sort of alluded to this later on, when I said:

> Fixing this would require substantial changes to the routing tables, and
> their calculation; you'd basically have to maintain a routing table *per
> interface* which listed what the best exit router from that network was
> to each destination. The overhead of all this is so high that most
> vendors don't bother. In fact, if you look at the *specification* of most
> routing protocols, the algorithm *in the spec* does not do this.

The fix you outlined works for SPF; there's an equivalent one for DV, which involves only entering the updates received over interface X in the routing table for interface X. In other words, in both SPF and DV, all the data you need is already there in the packets you get, but the algorithm as per the spec doesn't calculate it for you.

    The objection to link-state routing in the NBMA case is that while the SPF calculation will find optimal routes and avoid extra packet hops, it still requires control traffic to flow over all possible NBMA router pairings to maintain the topology. This can be optimized somewhat by using "designated router" techniques (as OSPF does), but the background traffic is still O(N), where N is the number of routers on the NBMA net.

Right. It also requires the routers to all know that they (and the hosts which are associated with them) can all directly communicate, which is something that it's not possible to tell purely by inspection of IP addresses, in most schemes. Then there is all the additional complexity of "how do you handle connectivity restrictions at the NBMA layer"; i.e. A can connect directly to B, and B to C (over the same interface that B gets to A with), but not A to C....
Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa23444; 1 Feb 94 22:00 EST
Received: from pizza by PIZZA.BBN.COM id aa25370; 1 Feb 94 21:43 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa25365; 1 Feb 94 21:41 EST
Received: from necom830.cc.titech.ac.jp by BBN.COM id aa22522; 1 Feb 94 21:36 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Wed, 2 Feb 94 10:55:52 +0859
From: Masataka Ohta
Return-Path:
Message-Id: <9402020156.AA16436@necom830.cc.titech.ac.jp>
Subject: Re: Locators, and physical topology
To: Noel Chiappa
Date: Wed, 2 Feb 94 10:55:50 JST
Cc: nimrod-wg@BBN.COM, jnc@ginger.lcs.mit.edu
In-Reply-To: <9402011539.AA01152@ginger.lcs.mit.edu>; from "Noel Chiappa" at Feb 1, 94 10:39 am
X-Mailer: ELM [version 2.3 PL11]

> For those of you who aren't on ROLC, you might find the following
> message interesting. It points out the kind of difficulty you can have when i)
> you rely on your locators to tell you something about the underlying physical
> connectivity, and ii) your locators do not accurately depict that
> connectivity.

If the WAN is a conventional PDN, routing packets are charged to the end user, which makes periodic flooding of routing information costly. But if, as with Nimrod, the WAN is operated directly with IP, there is no cloud, and there is no routing over a large cloud. Routing packets will be handled directly by the WAN operators, and their cost will be included in the basic maintenance fee.

BTW, at the physical layer, there is no such thing as NBMA. Trying to build a link-level NBMA is just a waste of information, which creates the cloud.

Am I completely misunderstanding something?

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa02567; 4 Feb 94 11:27 EST
Received: from pizza by PIZZA.BBN.COM id aa10245; 4 Feb 94 11:07 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa10241; 4 Feb 94 11:04 EST
Received: from mitsou.inria.fr by BBN.COM id aa00688; 4 Feb 94 10:54 EST
Received: by mitsou.inria.fr (5.65c8/IDA-1.2.8) id AA09872; Fri, 4 Feb 1994 16:56:51 +0100
Message-Id: <199402041556.AA09872@mitsou.inria.fr>
To: David R Oran
Cc: Noel Chiappa , nimrod-wg@BBN.COM
Subject: Re: Locators, and physical topology
In-Reply-To: Your message of "Tue, 01 Feb 1994 13:35:11 EST." <940201133511.5819@sneezy.nacto.lkg.dec.com>
Date: Fri, 04 Feb 1994 16:56:51 +0100
From: Christian Huitema

=> > exit router is no longer the optimal exit router. This has to do with the
=> > fact that routing tables are usually organized and calculated to find the
=> > best route from *here* to a destination, and usually cannot tell you if
=> > *here* is on the best path from the *source* to the destination. Again,
=> > there are no real solutions to this problem.
=> >
=> I know what you're getting at here Noel, but be careful because you
=> overstate the case. Link-State routing does in fact organize routing
=> tables such that *here* can know if it is on the best path from
=> the source to the destination. All you need to do is run an
=> SPF picking the source as the root of the SPF tree instead of
=> *here* and halting if *here* is placed onto the tree. If the
=> algorithm places the destination on the tree before *here*,
=> then conversely, *here* is not on the tree.
=>
=> The objection to link-state routing in the NBMA case is that
=> while the SPF calculation will find optimal routes and avoid
=> extra packet hops, it still requires control traffic to flow over
=> all possible NBMA router pairings to maintain the topology.
=> This can be optimized somewhat by using "designated router"
=> techniques (as OSPF does), but the background traffic is still
=> O(N), where N is the number of routers on the NBMA net.

Dave,

The algorithm you describe is exactly that implemented by MOSPF. I objected strongly to it when I first saw the proposal, for:

1) Running SPF for *here* has a computational cost of O(N log N)
2) So has running SPF from *there*
3) and the number of *there*s is O(N)
4) so the whole thing is O(N^2 log N)

John Moy got his way through the "computers are cheap, links cost" argument. Note that if you just do RPF in "flood and prune" mode, you don't need to flood the group membership to all routers, and you only need an SPF from *here* (using the reverse metrics). Arguably, flooding packets from various sources and sending back prunes has the same network cost as flooding membership-link-state updates through an acknowledged protocol. This has been heavily debated in IDMR.

Christian Huitema

Received: from PIZZA.BBN.COM by BBN.COM id aa06453; 8 Feb 94 16:31 EST
Received: from pizza by PIZZA.BBN.COM id aa02170; 8 Feb 94 16:07 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa02166; 8 Feb 94 16:04 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa03940; 8 Feb 94 16:00 EST
Received: by ginger.lcs.mit.edu id AA22740; Tue, 8 Feb 94 15:59:59 -0500
Date: Tue, 8 Feb 94 15:59:59 -0500
From: Noel Chiappa
Message-Id: <9402082059.AA22740@ginger.lcs.mit.edu>
To: nimrod-wg@BBN.COM
Subject: Security...
Cc: jnc@ginger.lcs.mit.edu

One nice thing about the NDM is that it makes it a lot harder to put a totally bogus source locator in the packet, and have the routers handle it correctly (as is now the case). This may be important down the road, from a security angle.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa16019; 13 Feb 94 9:50 EST
Received: from pizza by PIZZA.BBN.COM id aa24536; 13 Feb 94 9:30 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa24532; 13 Feb 94 9:27 EST
Received: from necom830.cc.titech.ac.jp by BBN.COM id aa15501; 13 Feb 94 9:27 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Sun, 13 Feb 94 23:17:42 +0900
From: Masataka Ohta
Return-Path:
Message-Id: <9402131417.AA25530@necom830.cc.titech.ac.jp>
Subject: Re: New datagram mode
To: Noel Chiappa
Date: Sun, 13 Feb 94 23:17:40 JST
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
In-Reply-To: <9401261743.AA11761@ginger.lcs.mit.edu>; from "Noel Chiappa" at Jan 26, 94 12:43 pm
X-Mailer: ELM [version 2.3 PL11]

> Don't mind.

I've been skiing in Switzerland last week, which delayed my reply. Is Seattle in March good for skiing?

> Flow ID is not considered to be internetwork layer.
>
> Perhaps we are using the same term ("flow ID") to refer to two different
> things, since in my conception a "flow ID" is just the name (label,
> identifier) of an internetwork layer object, the "flow".

I think we share the common definition.

> To me, a flow is a series of packets which belong together

A flow is a unit of resource reservation.

> This relationship between the packets is made across the entire internetwork
> system (i.e. across many networks), on an end-end basis, and is thus visible
> to the internetwork layer. It is thus absolutely an internetwork layer concept.

Just as there are connections in different layers, there are flows in different layers.

> Yes, the forwarding based on locators would happen only in "active routers".
> (The term "active" is not a particularly good one, just something I picked > in a hurry because I needed a term to distinguish that set of routers.) So my proposal is to call your "active router" just "router". The rest is just bridges. > Again, you seem to have a unusual definition of "bridge". To me, a bridge is a > device which forwards packets based on the local physical network header (i.e. > 802.*, or whatever). All of the devices I am talking about would be forwarding > packets based on the internetwork header, which is why I call them all > routers. Flow ID is a transport layer entity used at several lower layers. > So, my question is, how can you globally propergate the information? > I don't think you can use DNS here. > > Why not? That's part of the reason for controlling the rate of change, to > bring it within the rate that the DNS can handle. Why not? > Anyway, how can you handle partitioning? > > This is an open point; there are a number of potential scheme, and a decision > as to which set (since I don't think one alone will do it) has not yet been > made. We will be discussing it soon, I expect. As the partitioning being the open point, I don't think you can do anything with DNS. > Also, there has been some discussion about the need to allow things to have > multiple locators to make this "renumbering" easier in practise; we can't have > a "flag moment" when every locator within the scope of the change gets updated. > We need a mechanism which allows the process to happen over some reasonable > time period, with interoperation with the rest of the system continuing while > the change happens. Allowing (temporary) multiple locators allows this. You merely repeated the issue I raised. The question is "how" can you do so, not "what" you should do. > >> And the flow setup is the worst. > > > But it is only performed once (so it does not even add any delay), and > > the cost [of setting up a DMF is] shared between any number of packets. > > You should be assuming connected UDP, then. > > I didn't follow this? Unless you assume connected UPD or things like that, the connection won't be reused frequently enough. > > Remember, most routers will only be doing a flow-lookup in this scheme, > not looking at the locator. > > With my scheme, only the source cares locator. > > I thought that your scheme involved intermediate routers making routing > decisions for packets based on the EID of the next border router; this > sequence of EID's is the locator in your scheme. Exactly. It should be noted that exact match is enough. No best match necessary. > Your intermediate routers have to look at the current EID in the > locator (i.e. a non-fixed offset in the packet), No. A packet does not contain any locator. It contains list of EIDs. The list will be made by the source from a locator, at random or with some policy. > unless you have copied the > "current" EID to some other location in the packet. Loose source route mechanism of IPv4 and SIPP will be directly applicable. > I am assuming that you are not looking at all of them to find the rightmost > one that any given router has in its table; this, unfortunately, is what you > need to do to find the "optimal" (within the amount of routing data that you > have passed around to the routers in your system) path. I don't think any scheme with thining can find the "optimal" path. With my scheme, the source can choose the best path using the available routing data. 
> You could combine the
> two, and have the intermediate border router (above) set the next border
> router to aim for to be not just the next one in the list, but the rightmost
> one it has in its routing table, which will get you a somewhat optimized
> route.

I think it is complex, and consumes a considerable amount of router processing power.

> It still probably won't be as good as the new datagram mode, since you
> will have to head for the particular border router (named by its EID) in your
> locator, not the closest one into that area (which may or may not be the
> optimal entry router for the ultimate destination, sigh, another
> complication).

If you want to have some preference, it is easy to do with DNS, completely statically. That is:

    PEID PEID PEID

where PEID is an RR name for the EID of a border router of the parent area.

> > That's the whole point of having the minimal set of DMF's necessary to
> > do pure hierarchical routing, augmented *as necessary* where the amount
> > of traffic justifies the cost of extra DMF's. You only go beyond the
> > minimal set (which has been shown to be quite small) if there *are* many
> > datagrams; i.e. if the actual traffic justifies it.
>
> OK. Suppose the traffic needs the maximum set. How large is the maximum?
>
> Impossibly large, but this is true of *any* routing architecture.

Mine is not. Without a reasonable maximum, you can't meaningfully measure the efficiency of your scheme.

> I don't know how to do this, and I suspect we may never get a simple,
> guaranteed optimal algorithm (it feels NP-complete), but as we get better and
> better practical approximations, the "algorithm-independent" nature of Nimrod
> will allow us to deploy it incrementally, with no global coordination.

I'm afraid that, without an impossibly large amount of routing information, the performance of your scheme is poor.

> > Note also that if you think a DMF is unlikely to have any traffic across
> > it, set it up on demand, not in advance. That way, the only DMF's that
> > get set up are the ones that get used.
>
> As the communication is connectionless, you can't expect much of a usage
> pattern.
>
> This is a conjecture which I suspect is wrong, but I can't prove it right at
> the moment.

You can't, of course.

> However, I can hand-wave. For instance, cars on roads have a lot
> of the characteristics of datagrams, but there are definitely usage patterns.
> You can also look at phone networks; individual calls have a lot of the
> same characteristics as datagrams, and there, too, there are usage patterns.

What faulty hand-waving. You can see a usage pattern on a single link, of course. You can't see a usage pattern on a DMF connection, unless the number of DMF connections is O(number of links) or something as small as that, in which case traffic concentrates.

> Especially, packets which travel long distances, which load top level
> routers, tend to have less pattern, because end organizations are less
> related.
>
> This is true, but I suspect we'll have to monitor a real network to know
> what the actual patterns are. I don't think we can predict them.

You don't have to predict them. Excessive loading will occur. We should just avoid it as much as possible.

> > I must have missed something. If we do what you suggest, and buffer "best
> > effort" datagrams, won't they be dropped at precisely the same points on
> > congestion?
> > The actual effects of the DMF scheme ought to be the same,
>
> If you try to have QoS along each flow, some of the bandwidth is reserved
> and wasted, which is the difference.
>
> It depends on your resource allocation system. As far as I know, most proposed
> resource allocation architectures allow reserved, but unused, bandwidth to be
> given to "capacity available" traffic.

If you reserve resources only to forward connectionless packets, there will eventually be NO traffic which uses the "best effort" strategy. Connectionless packets should be sent as "capacity available" traffic.

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa17586; 13 Feb 94 11:26 EST
Received: from pizza by PIZZA.BBN.COM id aa24734; 13 Feb 94 10:59 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa24730; 13 Feb 94 10:57 EST
Received: from necom830.cc.titech.ac.jp by BBN.COM id aa17041; 13 Feb 94 10:57 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Mon, 14 Feb 94 00:48:22 +0900
From: Masataka Ohta
Return-Path:
Message-Id: <9402131548.AA25716@necom830.cc.titech.ac.jp>
Subject: Re: Analysis of DMF's in new datagram mode
To: Noel Chiappa
Date: Mon, 14 Feb 94 0:48:20 JST
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
In-Reply-To: <9401261849.AA12674@ginger.lcs.mit.edu>; from "Noel Chiappa" at Jan 26, 94 1:49 pm
X-Mailer: ELM [version 2.3 PL11]

> > I was describing the minimal functional configuration.
>
> Because of the load concentration, I don't think your configuration will
> function.
>
> The answer to this has two parts. First, I don't think it would take a *lot*
> more state to provide better routing. The argument is given below, in the
> original message.
>
> Second, to the extent that your physical network configuration provides load
> concentrations, this is a problem with the configuration which routing alone
> cannot solve. I prefer meshes, with lots of smaller routers, for the simple
> reason that the load is spread over more paths.
>
> > An interior router which had an instantiated DMF (not just a potential
> > DMF) [to] every other object in the area would still have only as many
> > DMF's ending at it as a border router of the area, and that case has been
> > analyzed as O(I), which is reasonable.
>
> Your configuration is wrong as to the configuration within an area,
> of course.
>
> How is it wrong? It's not at all obvious to me..

There is no such thing as "a potential DMF".

> But, your configuration is also wrong as to the configuration of area
> hierarchy. The number of levels must be limited,
>
> Why?

Increasing the number of levels means shrinking the areas, which reduces the amount of area-local traffic. If areas are shrunk beyond some threshold, which is determined by the traffic pattern, the global traffic will explode exponentially.

> Everyone seems to agree that the only way to scale the system is to
> increase the number of levels,

Yes, but the problem is that we can't have arbitrarily many levels.

> and in fact, in general *all* routing
> architectures seem to have the characteristic that increasing the number of
> levels increases the overhead a lot more slowly than increasing the size of
> existing levels.

Unless the size of a level is large enough, there won't be enough locality of traffic within the level.

> > If every interior router had an instantiated DMF to all the objects in
> > the area, the number of flows through each router in the area would be
> > O(IlogI), which is a little worse, but not terribly so, since I is
> > unlikely to grow large; we won't have enormous areas.
> > Because of planarity, it is O(I^1.5), where I is not constant.
>
> I will discuss the planarity issue below, but even if it *were*
> O(I sqrt I), that would still not be impossible, since I don't expect
> to see massive growth in I (the number of one-level-down objects in the
> average area).

At the top level, I think "I" may be as large as 10,000 or 100,000.

> > In addition, I doubt we will see full mesh connectivity; traffic X-Y
> > graphs always show hot-spots, not an even distribution, at any scale.
> >
> > Hot spots make the load concentration worse.
>
> These "hot spots" are not *physical* hot-spots, but source-destination
> traffic matrix entries which show larger counts than average.

Hot spots make the load concentration worse, anyway.

> you'd modify it to get rid of the hot spots.

That's the solution against the hot spot problem.

> > Top level links MUST have enough bandwidth.
>
> True, but this is a physical topology design point, not a routing
> architecture design point.

The routing architecture must be designed so that the commonly available
top level link bandwidth is wide enough.

> > That is, there should be a lot of second level areas and they should
> > be connected with a lot of links. Your configuration, which assumes
> > small areas, does not allow such a configuration.
>
> I'm not quite sure I follow this.

Perhaps, I should have written:

> That is, there should be a lot of first level areas and they should be
                                    ^^^^^
> connected with a lot of links. Your configuration, which assumes small
> areas, does not allow such a configuration.

I have assumed the first level area is the entire world, which you would
be referring to as the zero-th level area. Is that your terminology?

> > If you increase the number of links without increasing the number of
> > nodes, load will concentrate not in links but on border routers.
>
> Good point; we will need to increase the number of border nodes too.
> This obviously makes hash of my next statement, but it turns out that
> DMF growth is more correlated to the number of *interior* objects, and
> not to the number of border routers,

What? If you don't make use of the increased number of border routers,
the increase is meaningless.

> > Everything has its own limitation. Still, we can expect the
> > limitations to scale as time goes by. For example, the allowable size
> > of routing information is expected to scale as the link speed
> > increases.
>
> True, but I think the limit at the moment is memory, not bandwidth,
> although as I explained (with memory capacities going as the square of
> feature size, whereas device speed goes linearly with feature size) I
> expect this balance to shift.

Memory is not at all an issue, already. 100MB of memory is much cheaper
than a 100Mbps link.

> Oh, I'm not. I'm just assuming that i) growth in the size of the
> network will be *faster* than technology for some years to come (look
> at how fast the Internet is growing now), so we can't accommodate that
> growth purely with growth in technology (line speeds and memory sizes).

Agreed. To me, the most serious problem is the top-level traffic.

> Also, as I explained, you get more "bang for your dollar" out of
> increasing the number of levels than you do out of increasing the size
> of each level. As a very simplified example, let's assume you have a
> 24-bit address. You can either make it two 12-bit fields, or three
> 8-bit fields. Either gives you the same number of total destination
> addresses available - 2^24. However, the former would take 2*(2^12)
> routing table entries in a router, or 8K, whereas the latter would take
> 3*(2^8), or 768; i.e. an order of magnitude less state!
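(Illustration, not from the original exchange: the quoted arithmetic as
a small runnable Python sketch. The one-entry-per-possible-field-value
model is the simplification used in the quoted text.)

    # Routing-table state when a fixed-size address is split into
    # hierarchical fields: one table entry per possible value of each field.
    def table_entries(field_widths):
        """Total routing-table entries for the given field split."""
        assert sum(field_widths) == 24, "toy example uses a 24-bit address"
        return sum(2 ** width for width in field_widths)

    print(table_entries([12, 12]))   # two 12-bit levels  -> 8192 (the "8K")
    print(table_entries([8, 8, 8]))  # three 8-bit levels ->  768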
What? Do you think memory for 16M is so significant?

> I doubt that a normal (i.e. non-planar) graph of fixed degree (i.e. one
> in which nodes have the same average number of arcs to neighbouring
> nodes, independent of the size of the graph) is really an optimal model
> for the network either. The problem, as you have pointed out, is that
> not all links are equally likely.

OK.

> I.e., if Pij is the probability of a link between nodes i and j (thanks
> for the notation, Yakov :-), in a real network, Pij is *not* a constant
> over all j for a given i. Rather, nodes which are "closer" (in the
> physical space geometry) are more likely to have links than those
> further away. So, even if the average node does have a constant number
> of arcs, they are not distributed randomly across the graph.
>
> This will move us off the O(log N) point, and toward the O(sqrt N)
> point. However, without a probability model, and a lot of math (or
> simulation), neither of which I have time for, it's impossible to say
> how far.

It is O(sqrt N). Isn't it obvious?

> My guess, based on looking at real-world networks like the ARPANet, is
> that it will be pretty close to the results for true fully random
> graphs, in future real networks.

That's only true if there were a single, high-bandwidth backbone, which
is NOT our favourite model of MESH. Though you may think a T3 backbone
is fast enough forever, a single gigabit backbone won't have enough
capacity in the near future.

> The reason is simple; long path lengths are a *bad thing*. People will
> put in enough non-local links to bring the path length down, but my
> thinking (based on recollection, again) is that it's an asymptotic,
> diminishing-returns type of thing. It doesn't take a lot of long links
> to really whack down the diameter (and thus the average path length).

You completely misunderstand the issue. The diminishing return makes the
diameter larger, not smaller.

Only to make the average diameter O(N^(1/3)), that is, to have a
topology related to three-dimensional space, you need a lot of links
(the number depends on the locality of traffic) with length

	(size of the Earth) * O(N^(1/6))

which is transcontinentally lengthy. So, practically speaking, the
diameter will be O(N^(1/2)).

> I know BBN did a lot of work in this area, modelling the ARPANet to see
> where to add new links. Perhaps someone there can report briefly on
> what they recall?

Perhaps, they think T3 is fast enough.

> Again, my specific recollection of that work is that it doesn't take
> many non-local links to really help. Of course, then you have load
> issues on those links, but that's another story...

The load on the top level link is the issue.

						Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa23839; 13 Feb 94 16:22 EST
Received: from pizza by PIZZA.BBN.COM id aa25676; 13 Feb 94 16:02 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa25672; 13 Feb 94 16:00 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa22892; 13 Feb 94 15:58 EST
Received: by ginger.lcs.mit.edu id AA19785; Sun, 13 Feb 94 15:53:15 -0500
Date: Sun, 13 Feb 94 15:53:15 -0500
From: Noel Chiappa
Message-Id: <9402132053.AA19785@ginger.lcs.mit.edu>
To: jnc@ginger.lcs.mit.edu, mohta@necom830.cc.titech.ac.jp
Subject: Re: Analysis of DMF's in new datagram mode
Cc: nimrod-wg@BBN.COM

    There is no such thing as "a potential DMF".
A "potential DMF" is a DMF which can be set up (presumably in reponse to demand), but has not been. (The difference is important since there is not state in the routers along the path of the potential DMF.) > even if it *were* O(IsqrtI), that would still not be impossible, since I > don't expect to see massive growth in I (the number of one-level-down > -objects in the average area). At the top level, I think "I" be as large as 10,000 or 100,000. I seriously doubt it would be that large; there aren't that many country codes in the world phone system, and it seems to work just fine. If it does get that large, you need to introduce another layer of hierarchy; Kleinrock/Kamoun type analysis will show that the resulting routing inefficiency is minimal. Perhaps, I should have written: > That is, there should be a lot of first level areas and they should be ^^^^^ > connected with a lot of links. Your conffiguration, which assumes small > areas, do not allow such a coniguration. I have assumed the first level area is the entire world, which you should be refering as the zero-th level area. Is that your terminology? No. This point is not completely decided (we may not number the layers at all) but my personal bias would be to number the layers from the bottom up, so that the top layer would not have a fixed number; we could add another one if the system gets too large. I just refer to it as the "top" layer. > This will move us off the O(logN) point, and toward the O(sqrtN) point. > However, without a probability model, and lot of math (or simulation), > neither of which I have time for, it's impossible to say how far. It is O(sqrtN). Isn't it obvious? No, it's not. As soon as you start getting *some* non-local links, the diameter of the graph is greatly reduced, and I expect you will find there is a relationship between the diameter, and the average path length. Do you in fact have probability model which are you using for Pij, and a simulation based on it to show average path length, which leads you to the O(sqrtN) answer, for graphs which are neither planar, nor fully random? > My guess, based on looking at real-world* networks like the ARPANet, is > that it will be pretty close to the results for true fully random > graphs, in future real networks. That's only true if there is were single, high-bandwidth backbone, which is NOT our favourite model of MESH. Though you may think T3 bacbone is fast enough forever, a single giga-bit backbone won't have enough capacity in the near future. The ARPANet did not have a single backbone. It *was* a mesh. Only to make the average diameter O(N^(1/3)), that is, to have topology related to three dimensional space, you need a lot of links (the number depends on locality of traffic) with length (size of the Earth)*O(N^(1/6)) which is transcontinentally lengthy. So, practically speaking, the diameter will be O(N^(1/2)). This result, which you state without an analysis, does not sound plausible. I agree that in a *fully planar* graph, the diameter will be O(sqrt(N)). (For those who don't follow this, let's look at a very simplistic model, of a two-dimensional representation of a graph, with a square lattice of nodes at fixed spacing, each connected to its four nearest neighbours. To further simplify things, lets make this graph circular, and the physical (i.e. representational) and arc (i.e. graph metric) distance between each node one. The physical area of this graph will N, so the physical diameter of this graph will be sqrt(N/pi); i.e. 
Anyway, what does the traffic model (the "locality of traffic") have to
do with the diameter of the graph, a fixed property of the graph? The
diameter is the diameter. If you're talking about the average path
length for a given traffic pattern, fine, but then you need not only a
model of the link probability (i.e. the probability of there being a
link between nodes i and j), but the traffic density (i.e. the number of
packets from node i to node j).

	Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa24537; 13 Feb 94 16:57 EST
Received: from pizza by PIZZA.BBN.COM id aa25822; 13 Feb 94 16:34 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa25818; 13 Feb 94 16:32 EST
Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa23976; 13 Feb 94 16:29 EST
Received: by ginger.lcs.mit.edu id AA19911; Sun, 13 Feb 94 16:24:03 -0500
Date: Sun, 13 Feb 94 16:24:03 -0500
From: Noel Chiappa
Message-Id: <9402132124.AA19911@ginger.lcs.mit.edu>
To: jnc@ginger.lcs.mit.edu, mohta@necom830.cc.titech.ac.jp
Subject: Re: New datagram mode
Cc: nimrod-wg@BBN.COM

    > To me, a flow is a series of packets which belong together

    Flow is a unit of resource reservation.

That's one aspect of a flow, but not the only one. The path of the flow
is another, one that Nimrod deals with.

    > Yes, the forwarding based on locators would happen only in "active
    > routers".

    So my proposal is to call your "active router" just "router". The
    rest is just bridges.

The problem is that for most of the world, the term "bridge" already has
a meaning; i.e. something that forwards a packet based on the *local
network header*. Since all the devices here are looking at the
*internetwork* header, use of the term "bridge" will, I feel, prove
confusing.

    > But it is only performed once (so it does not even add any delay),
    > and the cost [of setting up a DMF is] shared between any number of
    > packets.

    Unless you assume connected UDP or things like that, the connection
    won't be reused frequently enough.

A DMF can be thought of as approximately equivalent to a routing table
entry. Its use is not limited to a single source host, or anything like
that. Once set up, it will be used as often as a routing table entry
will, with the same sharing of overhead.

    I don't think any scheme with thinning can find the "optimal" path.
    With my scheme, the source can choose the best path using the
    available routing data.

That "available routing data" will have been thinned, inevitably.

    It still probably won't be as good as the new datagram mode, since
    you will have to head for the particular border router (named by its
    EID) in your locator, not the closest one into that area (which may
    or may not be the optimal entry router for the ultimate destination,
    sigh, another complication).
    If you want to have some preference, it is easy to do with DNS
    completely statically.

Such a static ordering does not work, since the optimality of a given
border router (from a set of border routers) is dynamic, and depends on
the location of the source of the traffic.

    > As far as I know, most proposed resource allocation architectures
    > allow reserved, but unused, bandwidth to be given to "capacity
    > available" traffic.

    If you reserve resources only to forward connectionless packets,
    there will eventually be NO traffic which uses the "best effort"
    strategy.

I'm not proposing that we reserve resources only for connectionless
packets. Use of DMF's has the side benefit of being a simple mechanism
to see that connectionless packets get included in a resource allocation
architecture without a lot of special mechanism.

    Connectionless packets should be sent as the "capacity available"
    traffic.

Perhaps, but if the available bandwidth is all allocated, and used, by
user flows, the datagram traffic will get dropped. That's why I'd prefer
to see the datagrams guaranteed a certain minimum % of the link.

	Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa14965; 14 Feb 94 3:42 EST
Received: from pizza by PIZZA.BBN.COM id aa27703; 14 Feb 94 3:22 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa27699; 14 Feb 94 3:20 EST
Received: from necom830.cc.titech.ac.jp by BBN.COM id aa12060; 14 Feb 94 3:19 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Mon, 14 Feb 94 17:14:03 +0900
From: Masataka Ohta
Return-Path:
Message-Id: <9402140814.AA28681@necom830.cc.titech.ac.jp>
Subject: Re: New datagram mode
To: Noel Chiappa
Date: Mon, 14 Feb 94 17:14:01 JST
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
In-Reply-To: <9402132124.AA19911@ginger.lcs.mit.edu>; from "Noel Chiappa" at Feb 13, 94 4:24 pm
X-Mailer: ELM [version 2.3 PL11]

> > > To me, a flow is a series of packets which belong together
> >
> > Flow is a unit of resource reservation.
>
> That's one aspect of a flow, but not the only one. The path of the flow
> is another, one that Nimrod deals with.

DMF is not the path of the flow.

> > So my proposal is to call your "active router" just "router". The
> > rest is just bridges.
>
> The problem is that for most of the world, the term "bridge" already
> has a meaning;

I think your "inactive router" matches your definition of "bridge".

> i.e. something that forwards a packet based on the *local network
> header*. Since all the devices here are looking at the *internetwork*
> header, use of the term "bridge" will, I feel, prove confusing.

The problem is that your flow ID for DMF is pretty much like a *local
network header*. As you say:

> A DMF can be thought of as approximately equivalent to a routing table
> entry.

it's like an interface ID.

> > the cost [of setting up a DMF is] shared between any number of
> > packets.
> >
> > Unless you assume connected UDP or things like that, the connection
> > won't be reused frequently enough.
>
> A DMF can be thought of as approximately equivalent to a routing table
> entry. Its use is not limited to a single source host, or anything like
> that. Once set up, it will be used as often as a routing table entry
> will, with the same sharing of overhead.

"Once set up", it will be OK. So, my point is that, if you don't provide
a large number of DMFs, load will concentrate. If you provide a large
number of DMFs, they won't be reused frequently enough, so that the
set-up cost will be significant.

> > With my scheme, the source can choose the best path using the
> > available routing data.
> > That "available routing data" will have been thinned, inevitably. Of course. > If you want to have some preference, it is easy to do with DNS completely > statically. > > Such a static ordering does not work, since the optimality of a given border > router (from a set of border routers) is dynamic, It's merely a preference, optimization and does not have to work accurately. > and depends on the loction of the source of the traffic. Such location dependence is completely detectable by the routing data availale to the source dynamically. > If you reserve resources only to forward connectionless packets, there > will be eventually NO traffic which use "best effort" strategy. > > I'm not proposing that we reserve resources only for connectionless packets. No, of course. You are proposing that you reserve resources even for ^^^^ DMFs for connectionless packets. > Use of DMF's has the side benefit of being a simple mechanism to see that > connectionless packets get included in a resource allocation architecture > without a lot of special mechanism. What? Didn't you assume DMF shared? Don't you think there are a lot of resource requirements? Do you want to provide thousnads of DMFs between the same two routers to accomodate various resource requirements? > Connectionless packets should be sent as the "capacity availale" traffic. > > Perhaps, but if the avilable bandwith is all allocated, and used, by user > flows, the datagram traffic will get dropped. That's why I'd prefer to see > the datagrams guaranteed a certain minimum % of the link. OK, you think no bandwidth could be reserved for "capacity available" packets, which is a common misunderstanding. If there is 100Mbps link and 10 requests for 10Mbps communication, you think all of the requueests should be granted. But, no, you don't have to allocate all the bandwidth for bandwidth assured communication. Some of the bandwidth, say 50%, should always be reserved for "capacity availale" traffic. Still, it is possible for some carriers to have several grades of "capacity availale" communication. That is, reserve 20% of bandwidth for grade 1 and 2 communication. If the band is full, grade 1 communication is allowed to use another 20 %. But, still, no minimum bandwidth is reserved even for grade 1 "capacity available" communication, which is completely different from assigning resource to DMFs. That is, some resource should be reserved for connectionless communication, which does not mean each DMF reserve bandwidth.. Masataka Ohta   Received: from PIZZA.BBN.COM by BBN.COM id aa18519; 14 Feb 94 4:38 EST Received: from pizza by PIZZA.BBN.COM id aa27853; 14 Feb 94 4:08 EST Received: from BBN.COM by PIZZA.BBN.COM id aa27849; 14 Feb 94 4:04 EST Received: from necom830.cc.titech.ac.jp by BBN.COM id aa15419; 14 Feb 94 4:02 EST Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Mon, 14 Feb 94 17:56:44 +0900 From: Masataka Ohta Return-Path: Message-Id: <9402140856.AA28765@necom830.cc.titech.ac.jp> Subject: Re: Analysis of DMF's in new datagram mode To: Noel Chiappa Date: Mon, 14 Feb 94 17:56:42 JST Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM In-Reply-To: <9402132053.AA19785@ginger.lcs.mit.edu>; from "Noel Chiappa" at Feb 13, 94 3:53 pm X-Mailer: ELM [version 2.3 PL11] > There is no such thing as "a potential DMF". > > A "potential DMF" is a DMF which can be set up (presumably in reponse to > demand), but has not been. (The difference is important since there is not > state in the routers along the path of the potential DMF.) 
Received: from PIZZA.BBN.COM by BBN.COM id aa18519; 14 Feb 94 4:38 EST
Received: from pizza by PIZZA.BBN.COM id aa27853; 14 Feb 94 4:08 EST
Received: from BBN.COM by PIZZA.BBN.COM id aa27849; 14 Feb 94 4:04 EST
Received: from necom830.cc.titech.ac.jp by BBN.COM id aa15419; 14 Feb 94 4:02 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Mon, 14 Feb 94 17:56:44 +0900
From: Masataka Ohta
Return-Path:
Message-Id: <9402140856.AA28765@necom830.cc.titech.ac.jp>
Subject: Re: Analysis of DMF's in new datagram mode
To: Noel Chiappa
Date: Mon, 14 Feb 94 17:56:42 JST
Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM
In-Reply-To: <9402132053.AA19785@ginger.lcs.mit.edu>; from "Noel Chiappa" at Feb 13, 94 3:53 pm
X-Mailer: ELM [version 2.3 PL11]

> > There is no such thing as "a potential DMF".
>
> A "potential DMF" is a DMF which can be set up (presumably in response
> to demand), but has not been. (The difference is important since there
> is no state in the routers along the path of the potential DMF.)

Suppose that 50% (any nonzero value is OK) of DMFs are potential. Then,
50% of the connectionless traffic (if all DMFs are equally likely to be
used; if not, the figure changes, but it is unlikely to be negligibly
low) needs to instantiate a potential DMF first, which means that the
average latency of connectionless traffic is as slow as connection
setup. (If a fraction p of the traffic must trigger setup, the average
latency is roughly p * T_setup + T_forward; for any fixed p > 0 the
setup term dominates.) Thus, there is no such thing as "a potential
DMF". Q.E.D.

BTW, for the potential DMF to be meaningful for order analysis, that is,
beyond a constant factor, almost 100% of DMFs must be potential.

> > At the top level, I think "I" may be as large as 10,000 or 100,000.
>
> I seriously doubt it would be that large; there aren't that many
> country codes in the world phone system, and it seems to work just
> fine.

So are DNS top-level domains. So what?

> If it does get that large, you need to introduce another layer of
> hierarchy; Kleinrock/Kamoun type analysis will show that the resulting
> routing inefficiency is minimal.

Then, Kleinrock/Kamoun type analysis is broken.

> system gets too large. I just refer to it as the "top" layer.

OK. Then, my statement should be stated as:

	That is, there should be a lot of top level areas and they
	                                 ^^^
	should be connected with a lot of links. Your configuration,
	which assumes small areas, does not allow such a configuration.

> > > This will move us off the O(log N) point, and toward the O(sqrt N)
> > > point. However, without a probability model, and a lot of math (or
> > > simulation), neither of which I have time for, it's impossible to
> > > say how far.
> >
> > It is O(sqrt N). Isn't it obvious?
>
> No, it's not. As soon as you start getting *some* non-local links, the
> diameter of the graph is greatly reduced,

As I stated, yes.

> and I expect you will find there is a relationship between the
> diameter and the average path length.

The problem is that the number of such links depends on the locality of
traffic, which affects the average path length.

> Do you in fact have a probability model which you are using for Pij,
> and a simulation based on it to show average path length, which leads
> you to the O(sqrt N) answer, for graphs which are neither planar nor
> fully random?

It depends on the traffic pattern. With the assumption that the
percentage of truly global traffic is nonzero, it is O(sqrt(N)).

> > > My guess, based on looking at real-world networks like the
> > > ARPANet, is that it will be pretty close to the results for true
> > > fully random graphs, in future real networks.
> >
> > That's only true if there were a single, high-bandwidth backbone,
> > which is NOT our favourite model of MESH. Though you may think a T3
> > backbone is fast enough forever, a single gigabit backbone won't have
> > enough capacity in the near future.
>
> The ARPANet did not have a single backbone. It *was* a mesh.

Then, it should have had the O(sqrt(N)) property. But it means nothing
unless N is large enough.

> > Only to make the average diameter O(N^(1/3)), that is, to have a
> > topology related to three-dimensional space, you need a lot of links
> > (the number depends on the locality of traffic) with length
> >
> >	(size of the Earth) * O(N^(1/6))
> >
> > which is transcontinentally lengthy. So, practically speaking, the
> > diameter will be O(N^(1/2)).
>
> This result, which you state without an analysis, does not sound
> plausible.
It is obvious to me, but I don't mind if you can show some other
result...

> I agree that in a *fully planar* graph, the diameter will be
> O(sqrt(N)).

You can convert a locally non-planar, globally planar graph into a fully
planar graph without changing the number of vertices and links beyond a
constant factor. Simply replace each non-planar part with a single
vertex and that's it. Unless a non-planar part contains more than O(1)
vertices or links, which would mean that the non-planarity is not local,
the O(sqrt(N)) result won't be affected.

> This simple analysis doesn't prove it's true for all planar graphs, of
> course,

You can't, as the planarity assumption is not enough. You also need the
distribution pattern of vertices on the plane and the maximum allowable
distance between vertices.

> However, I don't agree that it will take "a lot of links" to improve
> it, but don't have the mathematical tools, or the time to do a
> simulation, to prove it. I am satisfied with results gained in
> operation of a real mesh network, the ARPAnet.

It means nothing unless N is large enough.

> If you're talking about the average path length for a given traffic
> pattern, fine,

Then, I'm fine, too.

> but then you need not only a model of the link probability (i.e. the
> probability of there being a link between nodes i and j), but the
> traffic density (i.e. the number of packets from node i to node j).

I think we have agreed on the point.

						Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa03630; 15 Mar 94 15:16 EST
Received: from pizza by PIZZA.BBN.COM id aa01886; 15 Mar 94 14:01 EST
Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa01882; 15 Mar 94 13:57 EST
To: nimrod-wg@BBN.COM
Subject: Draft Nimrod Architecture Document
Date: Tue, 15 Mar 94 13:54:50 -0500
From: Isidro Castineyra

I am enclosing below for your comment the draft of the architecture
document we (Noel and the three of us at BBN) have been working on. The
document is very much a working draft: everything in it is open for
discussion. Some time next week, after people have had a chance to take
a look at it, we need to decide how we want to spend the two sessions we
have at IETF.

Regards,

Isidro

Isidro Castineyra (isidro@bbn.com)   Bolt Beranek and Newman, Incorporated
(617) 873-6233                       10 Moulton Street, Cambridge, MA 02138 USA

-----------------

Nimrod Working Group                                       I. Castineyra
Internet Draft                                             J. N. Chiappa
March 1994                                                       C. Lynn
                                                           R. Ramanathan
                                                           M. Steenstrup
Expires 1 September 1994

                   The Nimrod Routing Architecture

Status of this Memo

This document is an Internet Draft. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas, and
its Working Groups. Note that other groups may also distribute working
documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months.
Internet Drafts may be updated, replaced, or obsoleted by other
documents at any time. It is not appropriate to use Internet Drafts as
reference material or to cite them other than as a ``working draft'' or
``work in progress''. Please check the 1id-abstracts.txt listing
contained in the internet-drafts Shadow Directories on ds.internic.net,
nic.nordu.net, ftp.nisc.sri.com, or munnari.oz.au to learn the current
status of any Internet Draft.

This Internet Draft will be submitted to the RFC editor as an
architecture specification. Distribution of this Internet Draft is
unlimited. Please send comments to nimrod-wg@bbn.com.

Abstract

We present a scalable internetwork routing architecture, called Nimrod.
The Nimrod architecture is designed to accommodate an internetwork of
arbitrary size and with heterogeneous service requirements and
restrictions, and to admit incremental deployment throughout an
internetwork. The key to Nimrod's scalability is its ability to
represent and manipulate routing-related information at multiple levels
of abstraction.

Contents

1 Introduction
  1.1 Constraints of the Internetworking Environment
  1.2 The Basic Routing Functions
  1.3 Scalability Features
      1.3.1 Clustering
      1.3.2 Restricting Information Distribution
      1.3.3 Selecting Feasible Routes
      1.3.4 Caching
      1.3.5 Limiting Forwarding Information
  1.4 The Internet
      1.4.1 Deployment
2 Architectural Overview
  2.1 Endpoints
  2.2 Maps
      2.2.1 Connectivity Specifications
  2.3 Nodes and Arcs
      2.3.1 Internal Maps
  2.4 Locators
3 Physical Realization
  3.1 Contiguity
  3.2 Multiple Locator Assignment
  3.3 Non-Nimrod Physical Elements
4 Forwarding
  4.1 Indicating Policy
  4.2 Trust
  4.3 Flow Mode
  4.4 Basic Topology Entity Chain (BTEC) Mode
  4.5 Datagram Mode
5 Renumbering
6 Auxiliary Functionality
  6.1 Mobility
      6.1.1 Effects of Mobility
      6.1.2 Approaches
      6.1.3 Summary
  6.2 Multicasting
      6.2.1 Goals and Requirements
      6.2.2 Approaches
      6.2.3 Summary
  6.3 Network Management
  6.4 Security

1 Introduction

Nimrod is a scalable routing architecture designed to accommodate a
continually expanding and diversifying internetwork. First suggested by
Chiappa in [1], the Nimrod architecture has undergone revision and
refinement through the efforts of the Nimrod working group of the IETF.

The goals of Nimrod are as follows:
1. Nimrod should support an internetwork of arbitrary size by providing
   mechanisms to control the amount of routing-related information that
   must be known globally throughout an internetwork.

2. Nimrod should provide service-specific routing in the presence of
   multiple constraints imposed by both service providers and users.

3. Nimrod should be incrementally deployable throughout an internetwork
   and should not require modifications to the existing IP packet
   format.

We have designed the Nimrod architecture to meet these goals. The key
features of this architecture include:

1. Representation of internetwork connectivity and services in the form
   of maps at multiple levels of abstraction.

2. Localized route generation and selection based on maps and session
   service requirements.

3. Source-directed as well as destination-directed packet forwarding.

We describe these features in more detail in sections 1.2 and 1.3 below.

Nimrod is a general routing architecture that can be applied to routing
both within a single routing domain and among multiple routing domains.
As a general internetwork routing architecture designed to deal with
increased internetwork size and diversity, Nimrod is equally applicable
to both the TCP/IP and OSI environments.

In this document, we present the Nimrod architecture. We begin with a
discussion of the requirements of internetworking and a brief overview
of Nimrod. In the second section, we delve into the details of the
Nimrod architecture. In the third section, we describe the routing
functionality supported by the Nimrod architecture. A companion document
is devoted to an analysis of Nimrod deployment strategies and
compatibility with existing internetwork protocols. Parts of the third
and fourth sections are missing in this draft.

1.1 Constraints of the Internetworking Environment

Internetworks are growing and evolving systems, in terms of number,
diversity, and interconnectivity of service providers and users, and
therefore require a routing architecture that can accommodate
internetwork growth and evolution. A complicated mix of factors such as
technological advances, political alliances, and service supply and
demand economics will determine how an internetwork will change over
time. However, correctly predicting all of these factors and all of
their effects on an internetwork may not be possible. Thus, the
flexibility of an internetwork routing architecture is its key to
handling unanticipated requirements.

In developing the Nimrod architecture, we first assembled a list of
internetwork environmental constraints which have implications for
routing. This list, enumerated below, includes observations about the
present Internet; it also includes predictions about internetworks five
to ten years in the future.

1. The Internet will grow to include O(10^9) networks and will retain
   the general organizational structure of backbone, regional, and local
   networks.

2. The number of internetwork users may be unbounded.

3. The capacity of internetwork resources is steadily increasing, but so
   is the demand for these resources.

4. Routers and hosts have finite processing capacity and finite memory,
   and networks have finite transmission capacity.

5. Internetworks comprise different types of communications
   media---including wireline and wireless, terrestrial and satellite,
   shared multiaccess and point-to-point---with different service
   characteristics in terms of throughput, delay, error and loss
   distributions, and privacy.
6. Internetwork elements---networks, routers, hosts, and processes---may
   be mobile.

7. The frequency at which an entity moves is usually inversely
   proportional to the size of the entity, e.g., individual hosts are
   likely to move around more frequently than entire networks.

8. A session may include m sources and n destinations, where m and n are
   greater than one.

9. Service providers will specify offered services and restrictions on
   access to those services. Restrictions may be in terms of when a
   service is available, how much the service costs, which users may
   subscribe to the service and for what purposes, and how the user must
   shape its traffic in order to receive a service guarantee.

10. Users will specify session service requirements which may vary
    widely among sessions. These specifications may be in terms of
    requested qualities of service, what they are willing to pay for
    these services, when they want these services, and which providers
    they wish to use.

11. Service providers and users have a synergistic relationship. That
    is, as users develop more applications with special service
    requirements, service providers will respond with the services to
    meet these demands. Moreover, as service providers deliver more
    services, users will develop more applications that take advantage
    of these services.

12. Support for varied and special services will require more
    processing, memory, and transmission bandwidth on the part of both
    the service providers offering these services and the users
    requesting these services. Hence, many routing-related activities
    will likely be performed not by routers and hosts but rather by
    independent devices acting on their behalf to store, process, and
    distribute routing-related information.

13. Users requiring specialized services (e.g., high guaranteed
    throughput) will usually be willing to incur some delay in obtaining
    these services.

14. Service providers are reluctant to introduce complicated protocols
    into their networks, because they are more difficult to manage.

15. Vendors are reluctant to implement complicated protocols in their
    products, because they take longer to develop.

Collectively, these constraints imply that a successful internetwork
routing architecture must support special features, such as
service-specific routing and component mobility in a large and changing
internetwork, using simple procedures that consume a minimal amount of
internetwork resources. We believe that the Nimrod architecture meets
these goals, and we justify this claim in the remainder of this
document.

1.2 The Basic Routing Functions

Nimrod supports distribution of link-state routing information in the
form of maps, localization of route generation and selection at session
sources and destinations, and specification of packet forwarding at the
sources or destinations.

Link-state routing information distribution permits each service
provider to have control over the services it offers, through both
distributing restrictions in and restricting distribution of its routing
information. It also gives the user (normally acting through an
``agent'') control over the routes generated and selected using the maps
and the session service requirements. Restricting distribution of
routing information serves to reduce the amount of routing information
maintained throughout an internetwork and to keep certain routing
information private.
However, it also leads to inconsistent routing information databases
throughout an internetwork, as not all routing information databases
will be complete or identical. We expect routing information database
inconsistencies to occur often in a large internetwork, regardless of
whether privacy is an issue. The reason is that we expect some devices
to be incapable of maintaining the complete set of routing information
for the internetwork. These devices will select only some of the
distributed routing information for storage in their databases.

With Nimrod, route generation and selection is a local matter under the
control of the users (normally acting through agents) and does not
require global coordination among routers. Thus, we have placed the
responsibility for and the cost of route generation and selection on the
users of a route. Locally-controlled route selection also allows
incremental deployment of and experimentation with new routing
algorithms, as route selection procedures need not be the same at each
location.

Nimrod packet forwarding permits a user (normally acting through an
agent) to exercise control in the forwarding of packets. The user may be
either a session source or destination and may specify the forwarding
path in as much detail as the maps permit. Source- or
destination-controlled packet forwarding enables freedom from forwarding
loops, even in the presence of routing information that is not
consistent throughout the internetwork.

We note that the Nimrod architecture and Inter-Domain Policy Routing
(IDPR) [2] have the above features in common. In developing the Nimrod
architecture, we have drawn upon experience gained with IDPR, and we
expect to be able to make use of portions of the IDPR protocols and
procedures in designing the Nimrod protocols.

1.3 Scalability Features

Nimrod must provide service-specific routing in arbitrarily large
internetworks and hence must employ mechanisms that help to contain the
amount of internetwork resources consumed by the routing functions. We
provide a brief synopsis of each such mechanism below. However, we note
that arbitrary use of these mechanisms does not guarantee that a
scalable routing architecture will result. Rather, when used wisely,
these mechanisms enable one to create a scalable routing architecture.

1.3.1 Clustering

The Nimrod architecture is capable of representing internetwork
connectivity and services at multiple levels of abstraction. Abstraction
of details reduces the amount of information required for routing. The
abstraction hierarchy is formed through iterative clustering of
internetwork entities, beginning with hosts, routers, and networks.
However, Nimrod does not specify a cluster formation algorithm but
instead permits selection of the clustering criteria to apply.
Internetwork entities may be clustered according to relationships among
them, such as ``administered by the same authority'', or so as to
satisfy some objective function, such as ``minimize the expected amount
of forwarding information at each router''. New clusters may be formed
by clustering together existing clusters. However, the same clustering
criteria need not be applied at each level. Repeated clustering of
entities produces a hierarchy of clusters with a unique universal
cluster that contains all others.

All entities within a cluster must satisfy at least one relation, namely
connectivity. That is, if all entities within a cluster are operational,
then any two entities within the cluster must be connected by at least
one path that lies entirely within that cluster. This condition
prohibits the formation of certain types of separated clusters, such as
the following. Consider the clustering relation ``belonging to the same
administrative body''. Suppose that a company has two branches located
at opposite ends of a country and that these two branches must
communicate over a public network not owned by the company. Then the two
branches cannot be members of the same cluster, unless that cluster also
includes the public network connecting them.
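(Illustrative sketch, not part of the draft: the connectivity condition
stated above, checked by breadth-first search on the subgraph induced by
a candidate cluster. All names are invented. Python, standard library
only.)

    from collections import deque

    def valid_cluster(members, links):
        """True if the subgraph induced by `members` is connected."""
        members = set(members)
        adj = {m: set() for m in members}
        for a, b in links:
            if a in members and b in members:   # only intra-cluster links count
                adj[a].add(b)
                adj[b].add(a)
        start = next(iter(members))
        seen = {start}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v] - seen:
                seen.add(w)
                queue.append(w)
        return seen == members

    # The two-branch company of the example: no intra-cluster path between
    # the branches, so the cluster is invalid unless the public network is
    # also a member.
    links = [("branch1", "publicnet"), ("publicnet", "branch2")]
    print(valid_cluster({"branch1", "branch2"}, links))               # False
    print(valid_cluster({"branch1", "branch2", "publicnet"}, links))  # True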
A given clustering applied to an internetwork results in an organization
related to but distinct from the physical organization of the component
hosts, routers, and networks. When the clustering is superimposed over
the physical internetwork entities, the cluster boundaries may not
necessarily coincide with host, router, or network boundaries. Nimrod
performs its routing functions with respect to the abstraction hierarchy
resulting from a clustering, not with respect to the physical
realization of the internetwork. In fact, Nimrod need not even be aware
of the physical components of an internetwork. Network management
functions are the only ones that require knowledge of the physical
components of an internetwork.

1.3.2 Restricting Information Distribution

The Nimrod architecture supports restricted distribution of
routing-related information, both to reduce resource consumption
associated with such distribution and to permit information hiding. Each
cluster determines the portions of its routing information to distribute
and the set of entities to which to distribute this information. We
suggest that each cluster automatically advertise, to its siblings
(i.e., those clusters with a common parent), information that applies to
the cluster as a whole. In response to demand, the cluster may advertise
information about specific portions of the cluster or information that
applies only to specific users. Moreover, recipients of routing-related
information may selectively discard this information. For example, an
entity need not retain information for a cluster that denies it access
to services.

1.3.3 Selecting Feasible Routes

Generating routes that satisfy multiple constraints is usually an
NP-complete problem and hence a computationally intensive process. With
Nimrod, only those users that require routes with special services need
assume the computational load associated with generation and selection
of such routes. Moreover, the Nimrod architecture allows individual
entities to choose their own route generation and selection algorithms.
To reduce the amount of processing required for route generation, one
should choose an algorithm that produces feasible but not necessarily
optimal routes.

1.3.4 Caching

The Nimrod architecture encourages caching of acquired routing-related
information in order to reduce the amount of resources consumed and
delay incurred in obtaining the information in the future. The set of
routes generated as a by-product of generating a particular route is one
example of routing-related information that is amenable to caching;
future requests for any of these routes may be satisfied directly from
the route cache. However, as with any caching scheme, the cached
information may become stale and its use may result in poor quality
routes. Hence, one must consider the expected duration of the usefulness
of different types of routing-related information in determining whether
to cache the information and for how long.
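(Illustrative sketch, not part of the draft: a route cache with
per-entry lifetimes, one simple way to bound how long potentially stale
information is used. The names and the lifetime are invented. Python.)

    import time

    class RouteCache:
        def __init__(self):
            self._entries = {}   # (src, dst) -> (route, expiry time)

        def put(self, src, dst, route, ttl_seconds):
            """Cache a route; the lifetime is chosen per information type."""
            self._entries[(src, dst)] = (route, time.monotonic() + ttl_seconds)

        def get(self, src, dst):
            """Return a cached route, or None if absent or stale."""
            entry = self._entries.get((src, dst))
            if entry is None:
                return None
            route, expiry = entry
            if time.monotonic() > expiry:
                del self._entries[(src, dst)]   # stale: drop, don't reuse
                return None
            return route

    cache = RouteCache()
    cache.put("srcA", "dstB", ["hop1", "hop2", "hop3"], ttl_seconds=300)
    print(cache.get("srcA", "dstB"))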
1.3.5 Limiting Forwarding Information

The Nimrod architecture supports two separate mechanisms for containing
the amount of forwarding information that must be maintained per router.
The first mechanism is the ability to multiplex, over a single path or
tree, multiple traffic flows with similar service requirements
originating in a single source cluster and destined for the same
destination clusters. The second mechanism is the installation and
retention of forwarding information only for active traffic flows.

With Nimrod, the service providers and users share responsibility for
the amount of forwarding information in an internetwork. Users have
control over the establishment of paths, and service providers have
control over the maintenance of paths. This approach is different from
that of the current Internet, in which the routing procedure itself
establishes the forwarding information in routers, based upon entity
reachability and not upon demand for communication with a given entity.

1.4 The Internet

The current IP-based routing architecture has served the Internet well
for many years. However, this architecture places bounds on the size and
structure of the internetwork that it can accommodate. All currently
deployed IP routing procedures express user reachability in terms of IP
address, and all packets are forwarded according to destination IP
address. IP addresses have a fixed internal organization---a portion
containing network number and a portion containing subnet and host
number, whose lengths depend on address class---and a fixed overall
length of 32 bits. Addresses are allocated in blocks by network number.
Together, these characteristics create the following problems in a large
internetwork:

1. Routing and forwarding information explosion. As Internet routing
   information includes reachability at the IP network level and packet
   forwarding is based on destination IP address, the amount of memory
   required to store routing and forwarding information grows at the
   rate at which new networks are added to an internetwork. In the
   Internet, this rate is approximately an annual doubling. Moreover,
   the amount of processing and transmission bandwidth required to
   handle the routing information also increases with this internetwork
   growth.

2. Inefficient use of IP address space. As IP addresses are assigned in
   blocks according to network number, only a small subset of all the
   host addresses associated with a given network number may be in use
   at one time. Nevertheless, the unused host addresses associated with
   that network number are not available to other hosts on the other
   networks in an internetwork and hence constitute wasted IP address
   space.

3. Insufficient IP address space. Regardless of the internal
   organization of the IP address space, there is a hard limit of 2^32
   possible distinct IP addresses. This will not be enough to
   accommodate all of the users associated with the 10^9 networks
   projected for the future Internet.

The IETF community has devoted considerable effort to developing
solutions to the problems associated with IP-based addressing in large
internetworks. In fact, the IETF recently formed a group whose charter
is to specify the requirements for the next generation of the IP
protocol and to review proposed solutions.
The current proposed solutions to the IP addressing problems include but
are not limited to reassignment of existing IP addresses for more
efficient use of IP address space, increasing the size and changing the
structure of the IP address space, and replacing IP addresses with ISO
NSAP addresses. These solutions vary according to the amount of
implementation and deployment effort required and the period over which
they would serve their purpose. None of the proposed solutions has yet
been selected for integration into the Internet.

In developing the Nimrod routing architecture, we have been more
concerned with providing routing in a large and dynamic internetwork
than with restructuring the existing IP address space. We believe that
an excellent solution to the IP addressing problems will emerge from the
IETF effort, and in a companion document we show that the Nimrod routing
architecture is compatible with each of the contending IP addressing
solutions.

1.4.1 Deployment

The protocols based on the Nimrod architecture should be incrementally
deployable, permitting a gradual and manageable introduction over time
and throughout an internetwork. These protocols could be deployed in
isolated areas within the Internet, for example within selected
administrative domains. To reach hosts external to a Nimrod domain,
traffic from internal hosts would use Nimrod forwarding within the
domain and normal IP forwarding from the domain exit point to the
external host. Routing information about such a Nimrod domain would be
distributed outside of the domain using an existing inter-domain routing
protocol such as IDRP [3].

Although the Internet will ultimately require a new IP packet format
that includes new forwarding information as well as larger addresses,
the Nimrod protocols could be deployed in the current Internet without
changing the current IP packet format. Nimrod's packet forwarding
requires packets to carry location information not supplied by the
current IP packet header format. However, this additional forwarding
information could appear in an encapsulating header added by Nimrod
routers acting on behalf of hosts and interpreted only by those routers.
Refer to section 4 for a complete description of the strategies for
deploying Nimrod in an internetwork.

2 Architectural Overview

Nimrod is a hierarchical, map-based routing architecture that has been
designed to support a wide range of user requirements. It is implemented
as a set of protocols and distributed databases. Nimrod's main function
is to manage, in a scalable fashion, how much information about the
network is required to choose a route for a traffic stream, given the
stream's description and requirements (both quality of service
requirements and policy requirements); in other words, to manage the
trade-off between the amount of information about the network and route
quality. The following sections describe the basic architectural
concepts used in Nimrod.

2.1 Endpoints

The basic entity in Nimrod is the endpoint. An endpoint represents a
user of the network layer---for example, a transport layer entity. Each
endpoint has at least one endpoint identifier (EID). Any given EID
corresponds to a single endpoint. EIDs are globally unique, relatively
short bit strings---for example, small multiples of 64 bits. EIDs have
no topological significance whatsoever. For ease of management, EIDs
might be organized hierarchically, but this is not required.

EIDs have a second form, the endpoint label (EL). ELs are ASCII strings
of unlimited length, structured to be used as keys in a distributed
database (much like DNS names). Information about an endpoint---for
example, how to reach it---can be obtained by querying this distributed
database---the Nimrod Locator Server (NLS)---using the endpoint's label
as key.
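(Illustrative sketch, not part of the draft: the EL-keyed lookup just
described, with an in-memory dictionary standing in for the distributed
NLS. All labels, EIDs, and locators here are invented; locators are
explained in section 2.4. Python.)

    NLS = {
        # endpoint label (EL):   EID and current locator(s)
        "example.site-a.host1": {"eid": "0x1F3A", "locators": ["A.B.h1"]},
        "example.site-b.host2": {"eid": "0x2B90", "locators": ["D.E.h2"]},
    }

    def resolve(el):
        """Look up an endpoint by its label, returning its EID and locators."""
        record = NLS.get(el)
        if record is None:
            raise KeyError("no NLS record for " + repr(el))
        return record["eid"], record["locators"]

    print(resolve("example.site-a.host1"))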
2.2 Maps

The basic data structure used for routing is the map. A map expresses
the available connectivity between different points of a network.
Different maps can represent the same region of a physical network at
different levels of detail. A map is a graph composed of nodes and arcs.
Properties of nodes and arcs are contained in attributes associated with
them. Nimrod necessarily includes languages to specify connectivity and
to describe maps.

Maps are used by route servers to generate routes. In general, it is not
required that different route servers have consistent maps. Route
servers can be co-located with routers or be independent entities. Each
host has access to one or more route servers.

2.2.1 Connectivity Specifications

By connectivity between two points we mean the available services and
the restrictions on their use. The following are examples of
connectivity specifications:

o ``Between these two points, there exists best-effort service with no
  restrictions.''

o ``Between these two points, guaranteed 10 ms delay can be arranged for
  traffic streams whose rate is below 1 Mbyte/sec and that have low
  (specified) burstiness.''

o ``Between these two points, best-effort service is offered as long as
  the traffic originates from and is destined to research
  organizations.''

2.3 Nodes and Arcs

A node represents a region of the physical network. The region of the
network represented by a node can be as large or as small as desired: a
node can represent a continent as well as a process running inside a
host. Moreover, as explained in section 3, a region of the network can
simultaneously be represented by more than one node. A node has zero or
more distinguishable border points to which arcs can be attached.

There are two kinds of arcs: unidirectional and multipoint.

Unidirectional Arcs: A unidirectional arc has two distinguishable
connecting points: a head and a tail. The head and tail of a
unidirectional arc are each connected to a border point of a node. The
presence of a unidirectional arc between two given border points
specifies that traffic can flow between those two points in the
direction indicated by the arc (from tail to head). A unidirectional arc
has connectivity attributes that specify the types of service offered by
that arc, and the restrictions associated with the use of these
services. The border points associated with the head and tail of a
unidirectional arc may belong to the same node---such an arc represents
``transit'' traffic through that node.

Multipoint Arcs: A multipoint arc has two or more distinguishable
connecting points. Each connecting point of a multipoint arc is
connected with a border point of a node. A multipoint arc has
connectivity attributes that specify the types of service offered by
that arc. The presence of a multipoint arc indicates that the services
indicated by that arc's connectivity attributes are offered between any
two border points associated with that arc.

Given a map, the border points connected by an arc can belong to
different nodes or to the same node. When all the border points
connected by a given arc belong to the same node, that arc is said to be
a ``transit'' arc of that node.

The distinguishable components of a map are called basic topological
entities (BTEs): nodes, the border points of nodes, arcs, the connecting
points of arcs, and the connectivity specifications of arcs.
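(Illustrative sketch, not part of the draft: one possible in-memory
representation of a map---nodes with border points, and unidirectional
arcs with a head, a tail, and connectivity attributes---plus the
extraction of a node's transit arcs. All names are invented. Python.)

    class Arc:
        def __init__(self, tail, head, services):
            self.tail = tail          # (node, border point) the arc leaves
            self.head = head          # (node, border point) the arc enters
            self.services = services  # connectivity attributes

    class Map:
        def __init__(self):
            self.nodes = {}           # node name -> list of border points
            self.arcs = []

        def transit_arcs(self, node):
            """Arcs with both connecting points on `node`'s own borders."""
            return [a for a in self.arcs
                    if a.tail[0] == node and a.head[0] == node]

    m = Map()
    m.nodes["a"] = ["p1", "p2"]
    m.nodes["b"] = ["q1"]
    m.arcs.append(Arc(("a", "p1"), ("a", "p2"), "best-effort"))  # transit via a
    m.arcs.append(Arc(("a", "p2"), ("b", "q1"), "best-effort"))  # a to b
    print(len(m.transit_arcs("a")))   # -> 1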
2.3.1 Internal Maps

As part of its attributes, a node can have zero or more internal maps. A
route server can obtain a node's internal maps---or any other of the
node's attributes, for that matter---by requesting that information from
a representative of that node; for example, a route server associated
with that node can be such a representative. A node's representative can
in principle reply with different internal maps to different
requests---because, for example, of security concerns. This implies that
different route servers in the network might have different sets of
internal maps for the same node.

Given a map, a route server can obtain a more detailed map of the
network by substituting one of the map's nodes with one of that node's
internal maps. This process can be continued recursively. Presumably, a
route server would expand nodes in the region of the map of current
interest.

Nimrod defines standard internal maps that are intended to be used for
specific purposes. One standard internal map is the ``transit'' map.
This map consists exclusively of the border points of the node and the
unidirectional and multipoint transit arcs that interconnect the node's
border points. This map specifies the services available between the
border points of a node. It is requested and used when a route server
intends to route traffic *through* a given node. The degree to which a
transit map describes the true capabilities of a given node is
determined by the number and types of arcs included in this map. A
transit map---containing no nodes---cannot be further expanded.

A second standard map is the ``detailed'' map. This map consists of both
nodes and arcs. It is intended to give more detail about the region of
the network represented by the original node.

2.4 Locators

A locator is a string of binary digits that identifies a BTE in a map.
Different BTEs necessarily have different locators. A given BTE is
assigned only one locator. A given physical element of the network might
implement more than one BTE---for example, a router that is part of two
different nodes. Though this physical element might therefore be
associated with more than one locator, each of the BTEs that this
physical element implements has only one locator. Locators specify
*where* a BTE is in the network.

A node is said to own those locators that have as a prefix the locator
of the node. In a node that has an internal map, the locators of all
BTEs in this internal map are prefixed by the locator of the original
node. The locators of a node's border points are also prefixed by the
node's locator.

A locator belongs to a node in a map if the node's locator is, of all
nodes in that map, the longest prefix of that locator. For example,
given a node with locator ABCD whose internal map contains a node with
locator ABCDE, locator ABCDEF belongs to the inner node---the node with
locator ABCDE. Given that the nodes in a map have different locators, a
locator can belong to at most one node in any map.
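(Illustrative sketch, not part of the draft: the longest-prefix
ownership rule of this section, applied to the ABCD/ABCDE/ABCDEF example
from the text. Python.)

    def owner(locator, node_locators):
        """Return the node locator that is the longest prefix of `locator`."""
        best = None
        for node in node_locators:
            if locator.startswith(node) and (best is None or len(node) > len(best)):
                best = node
        return best

    # ABCDEF belongs to the inner node ABCDE, not the enclosing node ABCD.
    print(owner("ABCDEF", ["ABCD", "ABCDE"]))   # -> ABCDE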
For any node, any BTE whose locator is prefixed by that node's locator is either one of the node's border points or part of one of the node's internal maps.

All routing map information is expressed in terms of locators, and routing selections are based on locators. EIDs are not used in making routing decisions---see section 4.

3 Physical Realization

We model the network as composed of physical elements: routers and hosts; and communication links. The links can be either point-to-point or multi-point (e.g., ethernets, X.25 networks, IP-only networks, etc.). A Nimrod router implements the set of Nimrod protocols.

The physical representation of a network has associated with it a Nimrod map. This Nimrod map is a function not only of the physical network, but also of the configured associations between elements (locator assignment) and of the configured connectivity (attributes). Nimrod routers and hosts appear as nodes in a map at the right level of detail. Similarly, links appear as arcs in a map at the right level of detail.

3.1 Contiguity

It is required that locators that share a prefix be assigned to a contiguous region of the network. That is, two elements of the network that have been assigned locators that share a prefix should be connected to each other by elements that themselves have been assigned locators with that prefix.

The main consequence of this requirement, and it is not a trivial one, is that ``you cannot take your locator with you.'' As an example (see figure 1), consider two providers x.net and y.net which appear in a Nimrod map as two nodes with locators A and B. Assume that x.net and y.net are not directly connected. Assume that corporation z.com was originally connected to the first provider. Endpoints within z.com have, therefore, been assigned A-prefixed locators. Corporation z.com decides to change providers---severing its physical connection to x.net. The contiguity requirement implies that after the provider change has taken place, endpoints of corporation z.com will have been assigned B-prefixed locators and that it is not possible for them to receive data destined to A-prefixed locators through y.net, as there exists no direct connection between x.net and y.net.

   +++++++++     +++++++++
   +       +     +       +
   + x.net +     + y.net +
   +       +     +       +
   +       +     +       +
   +++++++++     +++++++++
         *         *
          *       *
           *     *
          +++++++++
          +       +
          + z.com +
          +       +
          +++++++++

   Figure 1: Connectivity after switching providers

This implies, among other things, that caching locators must be done carefully.

The contiguity requirement simplifies routing information exchange: if it were permitted for z.com to receive A-prefixed locators through y.net, it would be necessary that a map that contains node B include information about the existence of a group of A-prefixed locators inside node B. Similarly, a map including node A should include information that the set of A-prefixed locators assigned to z.com cannot be found within A. The more situations like this happen, the more the hierarchical nature of Nimrod is subverted toward ``flat routing.''

The contiguity requirement can also be expressed as ``EIDs are stable, locators are ephemeral.''

3.2 Multiple Locator Assignment

Network elements can be assigned more than one locator. Consider the example of figure 2, which shows a physical network composed of routers (RA, RB, RC, and RD), hosts (HA, HB, and HC), and communication links. The figure also shows the locators assigned to hosts and routers.
In this figure, RA and RB have each been assigned one locator. RC has been assigned locators a.y.r1 and b.d.r1. One of these locators shares a prefix with RA's locator; the other shares a prefix with RB's locator. Hosts HA and HB have each been assigned three locators. HC has been assigned one locator.

     a.t.r1          b.t.r1
       ++              ++
      +RA+************+RB+
       ++              ++
         *            *
          *          *
           *        *
            *      *
             *    *
              ++
             +RC+ a.y.r1
              ++  b.d.r1
               *
    ***************************
      *          *           *
     ++         ++          ++
    +HA+       +RD+ c.r1   +HB+
     ++         ++          ++
   a.y.h1        *         a.y.h2
   b.d.h1        *         b.d.h2
   c.h1          *         c.h2
        ********************
                 *
                ++
               +HC+ c.h3
                ++

    Figure 2: Multiple Locators

Many different Nimrod maps for this network are possible. Depending on what communication paths have been set up between points that do not share a prefix, different maps result. A possible Nimrod map for this network is given in figure 3.

        a                  b                 c
 +++++++++++++++    +++++++++++++++    +++++++++++++++++
 +             +    +             +    +               +
 +  a.t        +    +  b.t        +    +               +
 +  ++++       +    +  ++++       +    +               +
 +  +  +*******+****+*+  +        +    +               +
 +  ++++       +    +  ++++       +    +               +
 +   *         +    +   *         +    +               +
 +  ++++       +    +  ++++       +    +               +
 +  +  +       +    +  +  +       +    +               +
 +  ++++ a.y   +    +  ++++ b.d   +    +               +
 +             +    +             +    +               +
 +++++++++++++++    +++++++++++++++    +++++++++++++++++

 Figure 3: Nimrod Map

Notice that even though a.y and b.d are defined on the same hardware, no connection is shown to exist between them. This connection has not been configured. A packet given to a with an associated destination locator prefixed with ``b.d'' would have to travel from a to b via the link joining them before being directed towards its destination. Similarly, there is no connection between the c node and the other two top-level nodes. If desired, these connections could be established. This would involve setting up an exchange of routing information.

In Nimrod, nodes and arcs represent the configured clustering and connectivity of the network. There is no ``lowest level'': it is possible to define and advertise a map that is physically realized inside a CPU, where a node could indicate, for example, a process or a group of processes. The user of this map need not know or care. (``It is turtles all the way down!'', in [4] page 63.)

3.3 Non-Nimrod Physical Elements

A region of the network that is not Nimrod-aware but includes Nimrod-aware routers or hosts connected to it is represented as a link (possibly a multi-point link). An example of this is an IP-only network that is connected to the Nimrod internetwork via Nimrod routers. This network would be modelled as a multi-point link. Nimrod-aware hosts connected to this network are represented as nodes connected to this link. Nimrod packets destined for Nimrod hosts, or for Nimrod routers ``on the other side of the network,'' could be encapsulated inside IP packets.

IP-only hosts connected to this network can be reached from other IP-only clouds by, for example, encapsulating IP packets inside packets of the format being used by Nimrod. Nimrod routers connecting the IP network to the Nimrod internetwork would ``de-capsulate'' packets destined to IP-only hosts. IP-only hosts could, for example, be given locators prefixed by the locator of a Nimrod router that knows how to get packets to them, this way putting them ``inside'' the associated Nimrod router. Other treatments are possible: for example, they could be given locators prefixed with the locator of the arc that represents the IP network.

In the first case, ``within'' a router, only that router needs to know how to forward packets to IP hosts; however, this makes that router a single point of failure. In the second case, all Nimrod routers connected to this arc need to know how to forward IP packets to IP-only hosts. To simplify packet forwarding, the locator for an IP-only host might include the IP address of the host.
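The following sketch illustrates the first treatment above: an IP-only host's locator is built from the locator of the Nimrod router that can reach it, with the IP address embedded. The textual locator syntax and all names here are invented for illustration.

    # Hypothetical sketch: IP-only hosts 'inside' a Nimrod border router.
    def ip_only_host_locator(router_locator, ip_address):
        """Prefix the host's IP address with the locator of the Nimrod
        router that knows how to reach it."""
        return router_locator + "." + ip_address

    def forward_at_border(packet_dest_locator, my_locator):
        """At the border router: if the destination lies 'inside' this
        router, recover the IP address and hand the packet to IP
        forwarding (de-capsulation); otherwise keep Nimrod forwarding."""
        prefix = my_locator + "."
        if packet_dest_locator.startswith(prefix):
            return ("deliver-via-IP", packet_dest_locator[len(prefix):])
        return ("forward-via-Nimrod", packet_dest_locator)

    # Example: a host 10.0.0.7 behind an (invented) router a.r2.
    loc = ip_only_host_locator("a.r2", "10.0.0.7")
    assert forward_at_border(loc, "a.r2") == ("deliver-via-IP", "10.0.0.7")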
4 Forwarding

Nimrod does not specify a packet format. It is possible to use Nimrod with different formats, conceivably simultaneously, in the same network. For example, we anticipate that Nimrod can be used with the packet formats of IPv4, SIPP and TUBA. This section specifies Nimrod's requirements on the packet-forwarding mechanism.

Nimrod supports three forwarding modes:

1. Flow mode: in this mode, the packet header includes a flow-id that maps into state that has been previously set up in routers along the way. Packet forwarding when flow state has been established is relatively simple: follow the instructions in the routers' state. Nimrod includes a mechanism for setting up this state. A more detailed description can be found in section 4.3.

2. BTE chain (BTEC) mode: in this mode, packets carry a list of BTE locators through which the packet is required to go. A more detailed description of the requirements of this mode is given in section 4.4.

3. Datagram mode: in this mode, every packet header carries source and destination locators. Forwarding is done following the procedures indicated in section 4.5.

In all of these modes, the packet header also carries locators and EIDs for the source and destination. In normal operation, forwarding does not take the EIDs into account; only the receiver does. EIDs are carried for demultiplexing at the receiver, and to detect certain error conditions. For example, if the EID is unknown at the receiver, the locator and EID of the source included in the packet could be used to generate an error message (this error message itself should probably not be allowed to be the cause of other error messages). Forwarding can also use the source locator and EID to respond to error conditions; for example, to indicate to the source that the state for a flow-id cannot be found.

Packets can be seen as moving between nodes in a map. A packet's header indicates, implicitly or explicitly, a destination locator. In a packet that uses either the datagram or the BTEC forwarding mode, the destination locator is explicitly indicated in the header. In a packet that uses the flow forwarding mode, the destination locator is implied by the flow-id and the distributed state in the network (it might also be included explicitly). Given a map, a packet moves to the node in this map to which the associated destination locator belongs. If the destination node has a ``detailed'' internal map, the destination locator should belong to one of the nodes in this internal map (otherwise it is an error). The packet goes to this node (and so on, recursively).
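A rough sketch of the header fields implied by the three modes is given below. The field names and the Python representation are invented; a real header would of course be a packed binary format defined by the carrying protocol.

    # Hypothetical sketch of the per-mode header fields described above.
    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class NimrodHeader:
        source_eid: str                    # carried in all three modes
        dest_eid: str
        source_locator: str
        dest_locator: str
        flow_id: Optional[int] = None      # flow mode; also reused while a
                                           # BTEC/datagram packet rides a flow
        btec: Optional[List[str]] = None   # BTEC mode: chain of BTE locators
        pointer: Optional[int] = None      # BTEC/datagram modes: index into
                                           # the chain or into the locators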
4.1 Indicating Policy

A datagram-mode packet can indicate a limited form of policy routing by the choice of destination and source locators. For this choice to exist, the source and destination endpoints must have several locators associated with them. This type of policy routing is capable of, for example, choosing providers. A BTE chain (BTEC) packet indicates policy by specifying the BTEs that the packet should traverse.

Strictly speaking, there is no policy information included in the packet header: in principle, it is not possible to determine what criteria were used to select the route by looking at the header; the packet header only contains the results of the route generation process. Similarly, in a flow mode packet, policy is implicit in the chosen route.

4.2 Trust

A node that does not divulge its internal map can work internally any way its administrators decide, as long as the node satisfies its external characterization as given in its Nimrod map advertisements. Therefore, the advertised Nimrod map should be consistent with a node's actual capabilities.

For example, consider the network shown in figure 4, which shows a physical network and the advertised Nimrod map. The physical network consists of hosts and a router connected together by an ethernet. This node can be subdivided into sub-nodes by assigning locators as shown in the figure and advertising the map shown. The map seems to imply that it is possible to send packets to node a.x without touching node a.y; however, this is actually not enforceable.

          ++
         +RA+ a.r1
          ++
           *
           *
   *******************************
      *                  *
     ++                 ++
    +Ha+ a.x.h1        +Hb+ a.y.h2
     ++                 ++

          Physical Network

                  a
  +++++++++++++++++*++++++++++++++++++++
  +                *                   +
  +             ++++++                 +
  +             +a.r1+                 +
  +             ++++++                 +
  +            *      *                +
  +   a.x     *        *      a.y      +
  + ++++++++ *          * +++++++++    +
  + +      +*            *+       +    +
  + +      +              +       +    +
  + +      +              +       +    +
  + ++++++++              +++++++++    +
  +                                    +
  ++++++++++++++++++++++++++++++++++++++

          Advertised Nimrod Map

  Figure 4: Example of Questionable Hierarchy

More generally, it is reasonable to ask how much trust should be put in the maps obtained by a route server. Even when a node is ``trustworthy,'' and the information received from the node has been authenticated, there is always the possibility of an honest mistake. These are difficult issues that are not unique to Nimrod. Many research and standards groups are addressing them. We plan to incorporate the output of these groups into Nimrod as it becomes available.

4.3 Flow Mode

The header of a flow mode packet includes a flow-id field. This field identifies state that has been established in intermediate routers. This header might also contain locators and EIDs for the source and destination. Nimrod includes protocols to set up and modify flow-related state in intermediate routers. These protocols not only identify the requested route, but also describe the resources requested by the flow---e.g., bandwidth, delay, etc. The result of a set-up attempt might be either confirmation of the set-up or notification of its failure.

4.4 Basic Topology Entity Chain (BTEC) Mode

Routing for a BTEC packet is specified by a list of locators carried in the packet header. The locators correspond to the BTEs that make up the specified path, in the order that they appear along the path. The route indicated by a BTEC packet is ``loose'' because the path is specified in terms of Nimrod BTEs, not physical entities. For example, a locator in the BTEC header could correspond to a type of service between two points of the network without specifying the physical path.

In its most detailed form, the header for a BTEC-mode packet is an alternating list of border-point locators and arc (or connectivity specification) locators. (This list can be abbreviated by omitting, for example, the locator for the head of a unidirectional arc.) Including the locator for a unidirectional Nimrod arc in the header of a BTEC packet specifies that the packet should go from the border point associated with the tail of the arc to the border point associated with the head of that arc. If two consecutive arcs are both multipoint and they intersect at more than one border point, the header of the packet should include the locator for the desired border point.
It is required that any two arcs whose locators appear consecutively in the header of a BTEC packet have at least one border point in common. Given two successive arcs in a BTEC, if the first one is a unidirectional arc, the border point shared by the two arcs should correspond to the head of the first arc; similarly, if the second arc is unidirectional, the border point shared by the two arcs should correspond to the tail of the second arc.

The source-specified routes in both flow mode and BTEC mode are specified in terms of BTEs. In flow setup, state for a flow is instantiated in the switches which provide the BTE, but this is not done in the BTEC case. For efficient handling of BTEC mode packets, i) the packet contains a pointer into the source-specified BTEC, and ii) routers would maintain, for each BTE, a pre-set-up flow which provides connectivity similar to that of the BTE (hereinafter the ``BTEF'', for ``BTE flow''). When a BTEC mode packet shows up at the router at the start of a BTEF, it is ``associated'' with that BTEF until it gets to the end of it, at which time the BTEC is consulted, and the packet is routed onto the next BTEF.

The mechanism is quite simple. All packets contain a ``flow-id'' field, which is not otherwise used in BTEC packets. The flow-id of the BTEF is stored in that field. The packet will then traverse the routers between the start and end of the BTEF, being handled just like any normal packet which is part of a flow, i.e. by the high-efficiency flow-forwarding mechanism. When the packet gets to the router which is the termination of the BTEF, the flow-block will indicate that the packet needs special handling.

This is slightly unusual, in that one doesn't visualize the flow-id field in the packet being modified during transit to refer to different flows. However, provided the flow-id field doesn't overload the source EID (i.e. use the EID as part of the flow-id), everything works quite well.
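The step performed at a BTEF termination can be sketched as follows. This is a minimal illustration under invented names; the BTEF table, packet representation and return values are assumptions, not part of the architecture.

    # Hypothetical sketch of BTEC forwarding over pre-set-up BTE flows.
    # Each router at the start of a BTEF knows the flow-id that carries
    # packets across that BTE (assumed table, keyed by BTE locator).
    btef_table = {"a.arc1": 101, "b.arc7": 102}   # BTE locator -> flow-id

    def at_btef_termination(packet):
        """Invoked where the current BTEF ends: consult the
        source-specified BTE chain, advance the pointer, and write the
        next BTEF's flow-id into the packet's flow-id field."""
        packet["pointer"] += 1
        if packet["pointer"] >= len(packet["btec"]):
            return "deliver-locally"
        next_bte = packet["btec"][packet["pointer"]]
        packet["flow_id"] = btef_table[next_bte]   # packet now rides the next
        return "forward-on-flow"                   # flow like any flow packet

    pkt = {"btec": ["a.arc1", "b.arc7"], "pointer": -1, "flow_id": None}
    at_btef_termination(pkt)   # enters the first BTEF with flow-id 101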
4.5 Datagram Mode

A realistic routing architecture must include an optimization for datagram traffic, by which we mean user transactions which consist of single packets, such as a lookup in a remote translation database. Either of the two previous modes involves unacceptable overhead if much of the network traffic consists of such datagram transactions. A mechanism is needed which is approximately as efficient as the existing ``hop-by-hop'' mechanism. Nimrod has such a mechanism, somewhat novel in the details, and it may be even more efficient than ``hop-by-hop''.

The scheme can be characterized by the way it divides the state in a datagram network between the routers and the packets themselves. Most packets currently contain only a small amount of state associated with the forwarding process (``forwarding state'')---the hop count. Nimrod proposes that enlarging the amount of forwarding state in packets can produce a system with useful properties. The scheme was partially inspired by the efficient source routing mechanism in [SIP] and the locator pointer mechanism in [PIP].

Datagram mode uses something much like the BTEF mechanism. There is a way to guarantee a strictly non-looping path, but without a source route in the packet, using a slight variant of the BTEC mechanism. In the datagram mode, the packet contains, in addition to the locally usable flow-id field:

o the source and destination locators, and

o a pointer into the locators.

The pointer starts out at the lowest level of the source locator, moves up that locator, then to the destination locator, and then down. In addition to these extra fields in the packet, all routers have to contain a minimal set of ``pre-set-up'' flows to certain routers which are at critical places in the abstraction hierarchy.

(The ``pre-set-up'' flows do not actually have to be set up in advance, but can be created on demand. There is a minimum set of flows which do have to be *able* to be set up for the system to operate, however. It is purely a local decision which, if any, of those flows to set up before there is an actual traffic requirement for them. As an efficiency move, when a datagram requires that a flow actually be set up to handle it, the data packet could be sent along with the flow setup request, avoiding the round-trip delay. We call these flows ``datagram mode flows'', or ``DMF's'', realizing that none of them need be created until actually needed.)

The actual operation of the mechanism is fairly simple. While going up the source locator, each ``active'' router (i.e. one that actually makes a decision about where to send the packet, as opposed to handling it as part of a flow) selects a DMF which will take the packet to the ``next higher'' level object in the source locator, advances the pointer, and sends the packet off along that DMF. When it gets to the end of that DMF, the process repeats, until the packet reaches a router which is at the least common intersection of the two locators (e.g., for A.P.Q.R and A.X.Y.Z, this would be when the packet reaches A). The process then inverts, with each active router selecting a DMF which takes the packet to the next lower object in the destination locator. So, A would select a flow to A.X, and once it got to A.X, A.X would select a flow to A.X.Y, etc.

It can easily be seen that the process guarantees that the resulting path is loop-free. Each flow selected must necessarily get the packet closer to its destination (since each flow selection results in the pointer being monotonically advanced through the locators), and the flows themselves are guaranteed not to loop, their paths having been selected prior to being set up.

If the system keeps more than the minimal set of DMF's (which is just up to one border router in internal routers, and down to each object one level down for each border router), and keeps the table sorted for efficient lookups (e.g. in much the same way as the current routing table for hop-by-hop datagrams is), more optimal routing will result. For example, using the case above (a packet from A.P.Q.R to A.X.Y.Z), if A.P.Q is actually a neighbour of A.X.Y, and maintains a flow directly from A.P.Q to A.X.Y, then when the packet reaches A.P.Q, instead of going the rest of the way up and down, the pointer can be set into the destination locator at A.X.Y, and the packet sent there directly.
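The up-then-down pointer walk can be sketched as below. Locators are modelled here as lists of locator elements; the representation and function name are invented for illustration, and the shortcut optimization just described is omitted.

    # Hypothetical sketch of the datagram-mode walk: up the source
    # locator to the least common intersection, then down the
    # destination locator.
    def dmf_path(src, dst):
        """Return the sequence of objects (locator prefixes) traversed
        by the pointer, starting from the source's lowest level."""
        common = 0
        while common < min(len(src), len(dst)) and src[common] == dst[common]:
            common += 1
        up = [src[:i] for i in range(len(src) - 1, common - 1, -1)]   # going up
        down = [dst[:i] for i in range(common + 1, len(dst) + 1)]     # going down
        return up + down

    # For A.P.Q.R -> A.X.Y.Z: up through A.P.Q, A.P, A, then down
    # through A.X, A.X.Y, A.X.Y.Z, exactly as in the text above.
    path = dmf_path(["A", "P", "Q", "R"], ["A", "X", "Y", "Z"])
    assert path == [["A", "P", "Q"], ["A", "P"], ["A"],
                    ["A", "X"], ["A", "X", "Y"], ["A", "X", "Y", "Z"]]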
Traffic monitoring and analysis (again, using purely local algorithms) can result in a database being created over time, which shows which DMF's above and beyond the minimal set are worth keeping around. This traffic monitoring would also show which flows from the required minimal set of DMF's it would be useful to set up in advance of actual traffic which needed them. Again, however, all these sets can be changed in a local, incremental way, without disturbing the operation of the system as a whole.

These new forwarding state fields would not be covered by an end-end authentication system, any more than the existing hop count field (which is also forwarding state) would be. This would prevent problems caused by the fact that the contents of these fields change as the packet traverses the network.

The forwarding of these packets is quite efficient, and in non-active routers, is maximally efficient (perhaps more so than even standard hop-by-hop). In the non-active routers, the packet is associated with a flow in a way that makes hardware processing possible without any software involvement at all. In active routers, the process of looking up the next DMF would be about as expensive as the current routing table lookup, and the main difference would be that the result of that lookup would have to be stored in the packet---not a great expense.

5 Renumbering

This section presents an example of how to ``renumber'' a Nimrod network. Figure 5 shows a network halfway through the process of being renumbered. The figure shows the physical network and the associated locators. The network is formed by router Ra, which is connected to three ethernets. The figure shows five hosts, ``Ha'' to ``He''. To the right of each host two locators are shown. The first locator shown corresponds to the old numbering; the second, to the new numbering. Renumbering has consisted of adding a new level of hierarchy---to simplify the work of Ra, say.

Because it is possible for a network element to have more than one locator, the two sets of locators can be active at the same time. Initially, only the first set of locators is active. This means that the NLS responds with these locators to queries involving endpoints implemented at these hosts. It also means that router Ra knows to which ethernet a packet should be directed, given the locator in the header. (Given a packet destined to one of the hosts, the router would pick one of the three interfaces based on the ``host part'' of the locator---i.e. ``h1'' in locator a.h1.)

When the second set of locators is introduced, for a query involving, for example, an endpoint in Hd, the distributed database would respond with the new locator: for example, a.a.h1. For a time, router Ra would forward based on both sets of locators---because the first set of locators might still be cached by some sources. Eventually, Ra would de-activate the original set of locators. Presumably, Ra would be prepared to forward based on the new set of locators before the NLS is instructed to use them. If a packet containing an old locator is given to Ra after the locator has been de-activated, an error message would be generated. There exists the possibility that the old locators might be re-assigned. If a packet is received by the wrong endpoint, this situation can be detected by looking at the destination EID which is included in the packet header.
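The overlap period in this example can be sketched as a simple sequence of NLS states. The dict-based model and all names below are invented for illustration; a real NLS would be a secured, distributed database.

    # Hypothetical sketch of renumbering with overlapping locator sets.
    nls = {"eid-hd": ["a.h5"]}              # old numbering active

    # Step 1: Ra is prepared to forward on the new locators, then the
    # new set is activated alongside the old one; the NLS answers
    # queries with the new locator first.
    nls["eid-hd"] = ["a.a.h1", "a.h5"]

    # Step 2: after cached copies of the old locators have expired,
    # the old set is de-activated.
    nls["eid-hd"] = ["a.a.h1"]

    def deliver(packet, my_eid):
        """If a re-assigned old locator delivers a packet to the wrong
        endpoint, the destination EID in the header exposes the error."""
        return "accept" if packet["dest_eid"] == my_eid else "error-wrong-endpoint"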
                         ++++
                         +  + a.r1
                         +Ra+
                         ++++
                        *  *  *
                      *    *    *
                    *      *      *
   *****************   *********   ***************************
     *        *            *          *                *
   ++++          ++++          ++++          ++++          ++++
   +  + a.h5     +  + a.h1     +  + a.h2     +  + a.h4     +  + a.h3
   +Hd+ a.a.h1   +Ha+ a.a.h2   +Hb+ a.b.h1   +Hc+ a.c.h1   +He+ a.c.h3
   ++++          ++++          ++++          ++++          ++++

   Figure 5: Renumbering a Network

The renumbering scheme described above implies that it should be possible to update the NLS securely and relatively dynamically. Because renumbering will most likely be infrequent and carefully planned, we expect that the load on this updating mechanism should be manageable. A second implication of this renumbering scheme is a requirement for a secure and simple way to update hosts' and routers' locators.

6 Auxiliary Functionality

We now turn our attention to functionality that must exist in Nimrod, but is not a part of the ``core'' Nimrod architecture. We shall discuss four topics in this context: mobility support, multicasting, network management and security.

Nimrod's approach to auxiliary functionality is as follows. Nimrod does not specify a particular solution to provide the functionality, but requires that the solution have certain characteristics (e.g., scalability). It is the purpose of this section to discuss some of these requirements and evaluate approaches towards meeting them. This attitude towards auxiliary functionality is consistent with Nimrod's general philosophy of flexibility, adaptability and incremental change. Each of these topics is being worked on extensively by the research community and it is not our intention to duplicate those efforts. Instead, we intend to let emerging solutions be grafted onto and used within Nimrod.

For each of the topics mentioned above, we discuss the issues involved, the approaches currently being used or proposed by the research community, and their viability for Nimrod. A summary of the main points of each topic is given at the end of the discussion, so that readers not interested in the somewhat lengthy discussion of the issues may skip to the summary directly.

6.1 Mobility

Nimrod permits some physical devices to be mobile, that is, to change their network attachment points over time. In this section, we discuss the effects of mobility on Nimrod, describe the functionality required to handle mobility and compare some existing approaches.

Nimrod, as a routing and addressing architecture, does not directly concern itself with mobility. That is, Nimrod does not propose a solution to the mobility problem. There are two chief reasons for this. First, mobility is a non-trivial problem whose implications and requirements are still not well understood, and will perhaps be understood only when a mobile internetwork is deployed on a large scale. Second, a number of groups (for instance the mobile-ip working group of the IETF) are studying the problem by itself, and it is not our intention to duplicate those efforts.

The Nimrod architecture carries a functional ``stub'' for mobility, the details of the stub being deferred until later. The stub will be elaborated when a solution that meets the requirements of Nimrod becomes available (for instance, from the IETF mobile-ip effort). We do not, however, preclude the modification of any such solutions to meet the Nimrod requirements, or the development of an independent solution within Nimrod.
Nimrod has a basic feature that helps accommodate mobility in a graceful and natural manner, namely, the separation of the endpoint identifier space from the locator space. Recall from section 2.1 that an endpoint (e.g., a host or a process) has a globally unique endpoint identifier (EID). The location of the endpoint within the topology is given by its locator. When an endpoint moves, its EID remains the same, but its locator might change. Nimrod can route a packet to the endpoint after the move, provided it is able to obtain its new locator. Thus, the problem of providing a solution to mobility in the context of Nimrod may be perceived as one of maintaining a dynamic association between endpoints and locators.

Extending this viewpoint further, one can think of mobility-capable Nimrod as essentially consisting of two ``modules'': the Nimrod routing module and the dynamic association module (DAM). The DAM is an abstraction, embodying the functionality pertinent to maintaining the dynamic association. This is a valuable paradigm because it facilitates the comparison of various mobility schemes from a common viewpoint. Our discussion will be structured based on the DAM abstraction, in two parts, the themes of which are:

o What constitutes mobility for the DAM and Nimrod? Is the realization of mobility as a ``mobility'' module that interacts with Nimrod viable? What are the interactions between Nimrod and such a module? These points will be discussed in section 6.1.1.

o What are some of the approaches one can take in engineering the DAM functionality? We classify some approaches and compare them in section 6.1.2.

A word of caution: the DAM should not be thought of as something equivalent to the current-day DNS - the DAM is a more general concept than that. For instance, consider a mobility solution for Nimrod similar to the scheme described in [5](1). Very roughly, this approach is as follows: every endpoint is associated with a ``home'' locator. If the endpoint moves, it tells a ``home representative'' about its new locator. Packets destined for the endpoint sent to the old locator are picked up by the home representative, and sent to the new locator. In this scheme, the DAM embodies the functionality implemented by all of the home representatives in regard to tracking the mobile hosts. The point is that the association maintenance, while required in some form or other, may not be an explicitly distinct part, but implicit in the way mobility is handled. Thus, the DAM is merely an abstraction useful to our discussion and should not be construed as dictating the design.

(1) This also resembles the current draft proposal of the IETF mobile-ip working group.

6.1.1 Effects of Mobility

One consequence of mobility is a change in the locator of an endpoint. However, not all instances of mobility result in a locator change (for instance, there is no locator change if a host moves within a LAN), and a change in the locator is not the only possible effect of mobility. Mobility might also cause a change in the topology map. This typically happens when entire networks move (e.g., an organization relocates, or a wireless network in a train or plane moves between cells). If the network is a Nimrod network, we might have a change in the connectivity of the node representing the network and hence a change in the map.
In this section, we consider the effects of mobility on the two ``modules'' identified above: Nimrod, which provides routing to a locator, and a hypothetical instantiation of the DAM, which provides a dynamic endpoint-locator association for use by Nimrod. We consider four scenarios, based on whether or not the topology and an endpoint's locator change, and comment on the effect of each scenario on Nimrod and the DAM.

Scenario 1. Neither the locator nor the topology changes. This is the trivial case and affects neither the DAM nor Nimrod. An example of this scenario is when a workstation is moved to a new interface on the same local area network.

Scenario 2. The locator changes but the topology remains the same. This is the case when an endpoint moves from one network to another, thereby changing its locator. The DAM is affected in this case, since it has to note the new EID-locator association and indicate this to Nimrod if necessary. The effect on Nimrod is related to obtaining this change from the DAM. For instance, Nimrod may be informed of this change, or may ask for the association if and when it finds out that the mobile host cannot be reached.

Scenario 3. The locator does not change but the topology changes. One way this could happen is if a network moves and changes its neighbors (a topology change) but remains within the cluster containing it (no locator change). The DAM is not affected, because the EID-locator association has not changed. Nimrod is affected in the sense that the topology map would now have to be updated.

Scenario 4. Both the locator and the topology change. If a network moves out of the cluster containing it, we have a change both in the map and in the locators of the devices in the network. In this case, both Nimrod and the DAM are affected.

In scenarios 3 and 4, it may not be sufficient to simply let Nimrod handle the topological change using the update mechanisms described in section (on RID). These mechanisms are likely to be optimized for relatively slow changes. Mobile wireless networks (in trains and planes, for instance) are likely to produce more frequent changes in topology. Therefore, it might be necessary that topological updates caused by mobility be handled using additional mechanisms. For instance, one might send specific updates to appropriate cluster representatives in the cluster in which the move occurred, so that packets entering that cluster can be routed using the new topology.

We observe that accommodating the mobility of networks, especially fast-moving ones, might require a closer interaction between Nimrod and the DAM than is required for mobility that only causes locator changes. It is beyond the scope of this document to specify the nature of this interaction; however, we note that a solution to mobility should handle the case when a network as a whole moves. Current trends [6] indicate that such situations are likely to be common in the future, when wireless networks will be present in trains, airplanes, ships, etc.

In summary, if we discount the movement of networks, i.e., assume no topology changes, it appears that the mobility solution can be kept fairly independent of Nimrod and can in fact be accommodated by an implementation of the DAM. However, to accommodate network mobility (scenarios 3 and 4), it might be necessary for Nimrod routing/routers to get involved with mobility.
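The four scenarios reduce to a classification over two booleans, sketched below. The function and module names are invented for illustration only.

    # Hypothetical sketch of the four mobility scenarios above.
    def affected_modules(locator_changed, topology_changed):
        """Return which 'modules' must react to a move: the DAM
        (EID-locator association) and/or Nimrod (topology map)."""
        affected = []
        if locator_changed:
            affected.append("DAM")        # scenarios 2 and 4
        if topology_changed:
            affected.append("Nimrod")     # scenarios 3 and 4
        return affected or ["none"]       # scenario 1

    assert affected_modules(False, False) == ["none"]            # scenario 1
    assert affected_modules(True,  False) == ["DAM"]             # scenario 2
    assert affected_modules(False, True)  == ["Nimrod"]          # scenario 3
    assert affected_modules(True,  True)  == ["DAM", "Nimrod"]   # scenario 4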
Beyond the constraints imposed by the interaction with Nimrod, it is desirable that the mobility solution have some ``general'' features. By general, we mean that these are not Nimrod-specific. However, their paramount importance in future applications makes them worth mentioning in this document. The desirable features are:

o Support of both off-line and on-line mobility. Off-line mobility (or portability) refers to the situation in which a session is torn down during the move, while on-line mobility refers to the situation in which the session stays up during the move. While much mobility is currently off-line, trends indicate that a large part of mobility in the future is likely to be on-line. A solution that only supports off-line mobility would probably have limited applications in the future.

o Scalability. One of the primary goals of Nimrod is scalability, and it would be contrary to our design goals if the mobility solution did not scale. There are three directions in which scalability is important: the size of the network, the number of mobile entities and the frequency of movement of the mobile entities. Note that for any given system with a minimum response time (to a move) of t seconds, if the mobile entity changes attachment points faster than 1/t changes per second, the system will fail to track the entity. Augmenting traditional location tracking mechanisms with special techniques such as predictive routing might be necessary in this case. Hooks in the mobility solution for such augmentation are a desirable feature.

o Security. It is likely that in the future there will be increased demand for secure communications. Apart from the non-mobility-specific security mechanisms, the solution should address the following:

-- Authentication. The information sent by a mobile host about its location should be authenticated to prevent impersonation. Additionally, there should be mechanisms to decide whether a mobile user who wishes to join a network has the privileges to do so.

-- Denial of service. The schemes employed for handling mobility could in general be a drain on resources if not controlled carefully. Specifically, the resource-intensive portions of the protocol should be guarded so that inappropriate use of them does not cause excessive load on the network.

6.1.2 Approaches

As mentioned in section 6.1, Nimrod does not provide a solution for mobility, only a functional stub. We require that the protocols comprising the functional stub be independent of the protocols comprising Nimrod routing. This allows for maximum flexibility in designing and developing each set of protocols.

As mentioned earlier, the problem of mobility in the context of Nimrod may be viewed as one of maintaining a dynamic association (DAM) and communicating this association, and changes therein, to Nimrod. Approaches to mobility may be classified based on how different aspects of the DAM are addressed. Our classification identifies two aspects of the mobility solution:

1. How and where to maintain the dynamic association between endpoints and locators? This may be perceived as a problem of database maintenance in a distributed system. The database may be maintained in a centralized fashion, wherein a single entity maintains the association and updates are sent to it by the mobile host, or in a distributed fashion, wherein there are a number of entities that store the associations.
In the distributed case, an entity might either store all of the associations in the network, or store only the associations for some EIDs (for instance, a cluster representative stores associations only for entities within the cluster). We refer to the former distributed method as ``global'' and the latter as ``local''.

A (distributed) database that stores the EID-locator mapping is required by Nimrod even in the absence of mobility. If this service can accommodate dynamic update and retrieval requests at the rate produced by mobility, it is a candidate for a solution. However, we note that the availability of such a system should not be a requirement for the mobility solution.

2. Where to do the remapping between the EID and locator, in case of a change in association? Some candidates are: the source, the ``home'' location of the host that has moved, and any router (say, between the source and the destination) in the network.

Many of the existing approaches, and perhaps some new approaches, to the problem of mobile internetworking may be seen as instantiations of a combination of a dynamic association method and a remapping method. We consider some combinations, as illustrated in Table 1. We discuss four combinations (marked A1 - A4 in the table) and examine their advantages and disadvantages in the context of our requirements. We ignore some approaches (marked X in the table) because they clearly appear to be bad solutions.

   -----------------------------------------
   |             | Source | Home | Routers |
   -----------------------------------------
   |Centralized  |   A1   |  X   |    X    |
   -----------------------------------------
   |Distr. local |   X    |  A2  |   A3    |
   -----------------------------------------
   |Distr. global|   A4   |  X   |    X    |
   -----------------------------------------

   Table 1: Combinations of Association and Remapping Methods

Note that this is but one classification of mobility schemes, and that the remapping and EID-locator maintenance strategies mentioned in the table are not exhaustive. The main intention is to help understand better the kinds of approaches that would be most suitable for Nimrod. In the following, we use the term source to refer to the endpoint that is attempting to communicate with, or sending packets to, a mobile endpoint. The source could be static or mobile. We use the term mobile destination to refer to the endpoint that is the intended destination of the source's packets.

A1. In this approach, all locator-EID mappings are maintained at a centralized location. The source queries the database to get the locator of the mobile destination. Alternatively, the database can send updates to the source when the mobile destination moves. The main advantage of this scheme is its simplicity. Also, no modification to routers is required, and the route from the source to a mobile destination is direct. The main disadvantage is the scheme's vulnerability: if the centralized location goes down, all information is lost. While this scheme may be sufficient for small networks with low mobility, it does not scale adequately to be a long-term solution for Nimrod.

A2. This approach uses locally distributed association maintenance with remapping done at the home. This is the approach being used by the mobile-ip group of the IETF for the draft proposal, and by the Cellular Digital Packet Data (CDPD) consortium.
In this approach, every mobile endpoint is associated with a ``home'', and a ``home representative'' keeps track of the location of every mobile endpoint associated with it. A protocol between a mobile endpoint and the home representative is used to keep the information up-to-date. The source sends the packet using the home locator of the mobile destination, and the home representative forwards it to the mobile destination.

The advantage of this scheme is that it is fairly simple and does not involve either the source or the routers in the network. Furthermore, the mobile destination can keep its location secret (known only to the home representative) - this is likely to be a desirable feature for some mobile hosts in some applications. Finally, most of the control information is confined to the cluster containing the home representative and the mobile host, which is a plus for scalability.

The main disadvantage is a problem often referred to as triangular routing. That is, packets have to go from the source to the home representative before going to the mobile destination. This is especially inefficient if, for instance, both the source and the mobile destination are in, say, England and the home representative is in, say, California. Also, there is still some vulnerability, since if the home representative becomes unreachable, the location of all of the mobile hosts it tracks is lost. Nevertheless, we feel that this approach, or a modification thereof, might be a viable first-cut mobility solution for Nimrod.
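A minimal sketch of A2's remapping step follows. All names are invented, and the real registration protocol (including its authentication) is elided.

    # Hypothetical sketch of the A2 ``home representative'' remapping.
    class HomeRepresentative:
        def __init__(self):
            self.current_locator = {}     # EID -> most recent locator

        def register(self, eid, new_locator):
            """Update sent by the mobile endpoint after it moves
            (in practice this update must be authenticated)."""
            self.current_locator[eid] = new_locator

        def remap(self, packet):
            """A packet arriving at the home locator is forwarded to
            the mobile destination's current locator (the 'triangle')."""
            eid = packet["dest_eid"]
            packet["dest_locator"] = self.current_locator.get(
                eid, packet["dest_locator"])
            return packet

    home = HomeRepresentative()
    home.register("eid-42", "b.d.h9")     # endpoint has moved under b.d
    pkt = {"dest_eid": "eid-42", "dest_locator": "a.y.h9"}   # sent to home
    assert home.remap(pkt)["dest_locator"] == "b.d.h9"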
A3. In each of the previous cases, the routers in the network were not involved in tracking the location of the mobile host. In this approach, state is maintained in the routers. An example is the approach proposed in [7], wherein the packets sent by a mobile host are snooped and state is created. The packets contain the mobile host's home location and its new location. This mapping is maintained at some routers in the network. When a packet intended for the mobile host, addressed to its home location, enters such a router, a translation is made and the packet is redirected to the new location.

An alternate mechanism is to maintain the mapping in all of the border points (e.g., border routers) of the cluster within which the movement took place. A packet from outside the cluster intended for a destination within the cluster would typically enter the cluster through one of the border points. Using the mapping, the border point could determine the most recent locator of the mobile destination and send the packet directly to that locator. If most movements are within low-level clusters, this would scale to large numbers of movements. Furthermore, the packet takes an optimal path (or as optimal as one can get with a hierarchical network) to the new location within the time it takes for the cluster representative to get the new information, which is typically quite small for low-level clusters.

The main disadvantage of this scheme is that routers have to be involved. However, future requirements with regard to scalability and response time might necessitate such an approach. Furthermore, this solution has closer ties to Nimrod routing and is better suited to handle scenarios 3 and 4, where the topology changes as a result of mobility.

A4. In this approach, the locator-EID mapping database is maintained in a distributed manner, perhaps like the present-day Domain Name Service (DNS). The remapping is done as in A1. This reduces the vulnerability problem present in A1. However, because routers are not involved, this approach appears less well-suited to handle mobility that results in topology changes (scenarios 3 and 4). Nevertheless, we do not reject this approach totally, especially if an adequate dynamic EID-locator mapping mechanism is provided independent of Nimrod (i.e., if, for instance, the next-generation DNS can do the mapping maintenance).

All of these approaches seem potentially capable of handling scenarios 1 and 2 of the previous section. Scenarios 3 and 4 are best handled by an approach similar to A3. However, approaches like A3 are more complex and involve more Nimrod entities (e.g., routers) than may be desirable.

We have tried to bring out the various issues governing mobility in Nimrod. In the final analysis, the tradeoffs between the various options will have to be examined vis-a-vis our particular requirements (for instance, the need to support network mobility) in adopting a solution. It is likely that general requirements such as scalability and security will also influence the direction of the approach to mobility in Nimrod.

6.1.3 Summary

o Nimrod permits physical devices to be mobile, but does not specify a particular solution for routing in the face of mobility.

o The fact that the endpoint identifier (EID) space and the locator space are separated in Nimrod helps in accommodating mobility in a graceful and natural manner. Mobility may be perceived, essentially, as dynamism in the endpoint-locator association.

o Nimrod allows two kinds of mobility:

-- Endpoint mobility. For example, when a host in a network moves. This might cause a change in the locator associated with the host, but does not cause a change in the topology map for Nimrod.

-- Network mobility. For example, when a router or an entire network moves. This might cause a change in the topology in addition to the locator.

o Endpoint mobility may be handled by maintaining a dynamic association between endpoints and locators. However, network mobility requires addressing the topology change problem as well.

o Apart from the ability to handle network mobility, it is desirable that the mobility solution be scalable to large networks and large numbers of mobile devices, and that it provide security mechanisms.

o There are a number of existing and emerging solutions to mobility. In particular, adaptation of solutions developed by the IETF is a first-cut possibility for Nimrod.

6.2 Multicasting

Nimrod provides multicast routing and packet forwarding capability. Multicasting is performed by using a multicast delivery tree whose leaves are the multicast destinations. We begin by looking at the similarities and differences between unicast routing and multicast routing.

Both unicast and multicast routing require two phases - route generation and packet forwarding. In the case of unicast routing, Nimrod specifies three modes of packet forwarding - the flow mode, the datagram mode and the BTEC mode; route generation itself is not specified but left to the particular routing agent. In multicasting, Nimrod leaves both the route generation and packet forwarding mechanisms unspecified.
To explain why, we first point out three aspects that make multicasting quite different from unicasting:

o Groups and group dynamism. In multicasting, the destinations are part of a group, whose membership is dynamic. This brings up the following issues:

-- A translation between the group name and the EIDs and locators of the members comprising that group. This is especially relevant in the case of sender-initiated multicasting and policy support.

-- A mechanism to accommodate new group members in the delivery tree in response to the addition of members, and a mechanism to ``prune'' the delivery tree in response to departures.

o State creation. Most solutions to multicasting can essentially be viewed as creating state in routers for multicast packet forwarding. Multicasting solutions differ based on who creates the state; there are several options - e.g., the sender, the receivers or the intermediate routers.

o Route generation. Even more so than in unicast routing, one can choose from a rich spectrum of heuristics with different tradeoffs among a number of parameters (such as cost and delay, algorithmic time complexity and optimality, etc.). For instance, some heuristics produce a low-cost tree with high end-to-end delay, and some produce trees that give the shortest path to each destination but at a higher cost. Heuristics for multicasting are a significant research area today, and we expect advances to result in sophisticated heuristics in the near future.

Noting that there are various possible combinations of route generation, group dynamism handling and state creation for a solution, and that each solution conceivably has applications for which it is the most suitable, we do not specify one particular approach to multicasting in Nimrod. Every implementation of Nimrod is free to use its own multicasting technique, as long as it meets the goals and requirements of Nimrod. Thus, we do not discuss the details of any multicast solution here, only its requirements in the context of Nimrod. Specifically, we structure the discussion in the remainder of this section around the following two themes:

o What are the goals that we want to meet in providing multicasting in Nimrod, and what specific requirements do these goals imply for the multicast solution?

o What are some of the approaches to multicasting in vogue today, and how relevant is each of these approaches to Nimrod?

6.2.1 Goals and Requirements

The chief goals of Nimrod multicasting are as follows:

1. Scalability. Nimrod multicasting must scale in terms of the size of the internetwork, the number of groups supported and the number of members per group. It must also support group dynamism efficiently. This has the following possible implications for the solution:

o Routers not on the direct path to the multicast destinations should not be involved in state management. In a network with a large number of routers, a solution that does involve such routers is unlikely to scale (e.g., the current implementation of mrouted).

o It is likely that there will be a number of applications that have a few members per group (e.g., medical imaging) and a number of applications that have a large number of members per group (e.g., news distribution). Nimrod multicasting should scale for both these situations.
If no single mechanism adequately scales for both sparse and dense group memberships simultaneously, a combination of mechanisms should perhaps be considered.

o In the face of group membership change, there must be a facility for incremental addition or deletion of ``branches'' in the multicast tree. Reconstructing the tree from scratch is not likely to scale.

o It is likely that we will have some well-known groups (i.e., groups which are more or less permanent in existence) and some ephemeral groups. The dynamics of group membership are likely to be different for each class, and the solution should take that into account as appropriate.

2. Policy support. This includes both quality of service and access restrictions, although currently demand is probably higher for QOS. In particular, every path from the source to each destination should satisfy the requested quality of service and conform to the access restrictions. The implications for the multicasting solution are:

o It is likely that many multicasting applications will be cost-conscious in addition to having strict quality-of-service bounds (such as delay and jitter). Balancing these will necessitate dealing with some new parameters - e.g., the tree cost (the sum of the ``cost'' of each link), the tree delay (maximum, mean and variance of end-to-end delay), etc.

o In order to support policy-based routing, we need to know where the destinations are (so that we can decide what route we can take to them). In such a case, a mechanism that provides an association between a group id and a set of destination locators is probably required.

o Some policy constraints are likely to be destination-specific. For instance, a domain might refuse transit service to traffic going to certain destination domains. This presents certain unique problems - in particular, for a single group, multiple trees may need to be built, each tree ``servicing'' disjoint partitions of the multicast destinations. This is illustrated with an example in appendix XX.

3. Resource sharing. Multicasting typically goes hand in hand with large traffic volume or applications with a high demand for resources. This, in turn, implies efficient resource management and sharing where possible. Therefore, it is important that we place an emphasis on interaction with resource reservation. For instance, Nimrod must be able to provide information on which trees are shareable and which are not, so that resource reservation may use it while allocating resources to flows.

6.2.2 Approaches

The approaches to multicasting currently in operation and those being considered by the IETF include the following:

1. Distance Vector Multicast Routing Protocol (DVMRP) [8]. This approach is based upon distance-vector route information distribution and hop-by-hop forwarding. It uses Reverse Path Forwarding (RPF) [9] - a distributed algorithm for constructing an internetwork broadcast tree. DVMRP uses a modified RPF algorithm, essentially a truncated broadcast tree, to build a reverse-shortest-path, sender-based multicast delivery tree. A reverse shortest path from s to d is a path that uses the same intermediate nodes as those in the shortest path from d to s(2). An implementation of DVMRP exists in the current Internet in what is commonly referred to as the MBONE. An improvement to this is in the process of being deployed.

(2) If the paths are symmetric (i.e., cost the same in either direction), the reverse shortest path is the same as the shortest path.
It incorporates ``prune'' messages, which are used to further truncate the tree at routers not on the path to the destinations, and ``graft'' messages, which are employed to undo this truncation if that later becomes necessary. The main advantage of this scheme is that it is simple. The major handicap is scalability. Two issues have been raised in this context. First, if S is the number of active sources and G the number of groups, then the state overhead is O(GS), which might be unacceptable when resources are limited. Second, routers not on a multicast tree are involved (in terms of sending/tracking prune and graft messages) even though they might not be interested in the particular source-group pair. The performance of this scheme is expected to be relatively poor for large networks with sparsely distributed group memberships.

2. Core Based Trees (CBT) [10]. This scheme uses a single tree per group, shared by all sources. This tree has a single router as the core (with additional routers for robustness) from which branches emanate. The chief distinguishing characteristic of CBT is that it is receiver-initiated, i.e., receivers wishing to join a multicast group find the tree (or its core) and attach themselves to it, without any participation from the sources.

The chief motivation behind this scheme is the reduction of the state overhead to O(G), in comparison to DVMRP. Also, only routers on the path between the core and the potential members are involved in the process. Core-based tree formation and packet flow are decoupled from the underlying unicast routing. The main disadvantage is that packets no longer traverse the shortest path from the source to their destinations. The performance in general depends on judicious placement of cores and coordination between them. Traffic concentration on links incident to the core is another problem. There is also a dependence on network entities (in other administrative domains, for instance) for resource reservation and policy routing.

3. Protocol Independent Multicasting (PIM) [11]. Yet another approach based on the distance-vector hop-by-hop combination, this is designed to reap the advantages of both DVMRP and CBT. Using a ``rendezvous point'', a concept similar to the core discussed above, it allows for the simultaneous existence of shared and source-specific multicast trees. In the steady state, data can be delivered over the reverse shortest path from the sender to the receiver (for better end-to-end delay) or over the shared tree. Using two modes of operation, sparse and dense, it provides improved performance both when the group membership in an internetwork is sparse and when it is dense. It is, however, a complex protocol. A limitation of PIM is that the shortest paths are based on the reverse metrics and are therefore truly ``shortest'' only when the links are symmetric.

4. Multicast Open Shortest Path First (MOSPF) [12]. Unlike the abovementioned approaches, this is based on link-state routing information distribution. The packet forwarding mechanism is hop-by-hop. Since every router has complete topology information, every router computes the shortest-path multicast tree from any source to any group using Dijkstra's algorithm. If the router doing the computation falls within the tree computed, it can determine which links it must forward copies onto.
MOSPF inherits the advantages of OSPF and link-state distribution, namely localized route computation (and easy verification of loop-freedom), fast convergence on link-state changes, etc. However, group membership information is sent throughout the network, including over links that are not in the direct path to the multicast destinations. Thus, like DVMRP, this is most suitable for small internetworks, that is, as an intra-domain routing mechanism. 5. Inter-Domain Policy Routing (IDPR)[13]. This approach uses link-state routing information distribution like MOSPF, but uses source-specified packet forwarding. Using the link-state database, the source generates a policy multicast route to the destinations. Using this, the IDPR path-setup procedure sets up state in intermediate entities for packet duplication and forwarding. The state contains information about the next-hop entities for the multicast flow. When a data packet arrives, it is forwarded to each next-hop entity obtained from the state. Among the advantages of this approach are its ability to support policy-based multicast routing with ease, and independence (flexibility) in the choice of the multicasting algorithm used at the source. IDPR also allows resource sharing over multiple multicast trees. One disadvantage is that it makes it relatively more difficult to handle group membership changes (additions and deletions), since such changes must first be communicated to the source of the tree, which will then add branches appropriately. We now discuss the applicability of these approaches to Nimrod. Common to all of the approaches described is the fact that we need to set up state in the intermediate routers for multicast packet forwarding. The approaches differ mainly on who initiates the state creation - the sender (eg. IDPR, PIM), the receiver (eg. CBT, PIM), or the routers themselves, which create state without initiation by the sender or receivers (DVMRP, MOSPF). Nimrod should be able to accommodate both sender-initiated as well as receiver-initiated state creation for multicasting. In the remainder of this section, we discuss the pros and cons of these approaches for Nimrod. Recall that Nimrod uses link-state route information distribution (topology maps) and has three modes of packet forwarding - flow mode, loose source-route mode and datagram mode. An approach similar to that used in IDPR is viable for multicasting using the flow mode. The source can set up state in intermediate routers, which can then appropriately duplicate packets. For the loose source and datagram modes, an approach similar to the one used in MOSPF is applicable. In these situations, the advantages and disadvantages of these approaches in the context of Nimrod are similar to the advantages and disadvantages of IDPR and MOSPF respectively. Sender-based trees can be set up using an approach similar to IDPR, generalizing it to an `n' level hierarchy. A significant advantage of this approach is policy-based routing. The source knows about the policies of clusters/administrative domains that care to advertise them and can choose a route the way it wants (ie, not depend upon other entities to do it, as in some schemes mentioned above). Another advantage is that each source can use the multicast route generation algorithm and packet forwarding scheme that best suits it, instead of being forced to use whatever is implemented elsewhere in the network.
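As a rough illustration of the per-flow state such sender-initiated setup installs, consider the sketch below. Everything here is invented for exposition - neither Nimrod nor IDPR defines these structures - but it shows the essential shape: setup walks a source-computed tree once, recording next hops per flow-id, and data packets thereafter carry only the flow-id:

    # tree: node -> list of child nodes (the source-computed policy tree).
    # routers: node -> {flow_id: [next hops]}, the installed state.
    def install_tree(routers, tree, flow_id, node):
        children = tree.get(node, [])
        routers[node][flow_id] = list(children)
        for child in children:
            install_tree(routers, tree, flow_id, child)

    def forward(routers, node, flow_id, packet, deliver):
        # Data packets carry only the flow-id; each router duplicates
        # them onto the next hops recorded at setup time.
        deliver(node, packet)
        for child in routers[node].get(flow_id, []):
            forward(routers, child, flow_id, packet, deliver)

Note how the tree computation itself is entirely the source's business, which is what makes the approach flexible.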
Further, this approach allows for incrementally deploying new multicast tree generation algorithms as research in that area progresses. CBT-like methods may be used to set up receiver-initiated trees. Nimrod provides link-state maps for generating routes, and a CBT-like method is compatible with this. For instance, a receiver wishing to join a group may generate a (policy) route to the core for that group using its link-state map and attach itself to the tree. A disadvantage of sender-based methods in general seems to be the support of group dynamism. Specifically, if there is a change in the membership of the group, the particular database which contains the group-destination mapping must be updated. In comparison, receiver-oriented approaches seem to be able to accommodate group dynamism more naturally. Nimrod does not preclude the simultaneous existence of multiple approaches to multicasting and the possibility of switching from one to the other depending on the dynamics of group distributions. Interoperability is an issue - that is, the question of whether or not different implementations of Nimrod can participate in the same tree. However, as long as there is agreement in the nature of the state created (ie, the states can be interpreted uniformly for packet forwarding), this should not be a problem. For instance, a receiver wishing to join a source-specified sender-created tree might set up state on a path between itself and a router on the tree, with the sender itself being unaware of it. Packets entering the router would now be additionally forwarded along this new ``branch'' to the new receiver. In conclusion, the architecture of Nimrod can accommodate diverse approaches to multicasting. Each approach has its disadvantages with respect to the requirements mentioned in the previous section. The architecture does not demand that one particular solution be used, and indeed, we expect that a combination of approaches will be employed and engineered in a manner most appropriate to the requirements of the particular application or subscriber. 6.2.3 Summary o Nimrod does not specify a particular multicast route generation algorithm or state creation procedure. Nimrod can accommodate diverse multicast techniques and leaves the choice of the technique to the particular instantiation of Nimrod. o A solution for multicasting within Nimrod should be capable of -- Scaling to large networks, large numbers of multicast groups and large multicast groups. -- Supporting policy, including quality of service and access restrictions. -- Resource sharing. o Multicasting typically requires the setting up of state in intermediate routers for packet forwarding. The state setup may be initiated by the sender (eg. IDPR), by the receiver (eg. CBT), by both (eg. PIM) or by neither (DVMRP??, MOSPF??). The architecture of Nimrod provides sufficient flexibility to accommodate any of these approaches. 6.3 Network Management To Be Specified. 6.4 Security To Be Specified. References [1] J. N. Chiappa, ``A New IP Routing and Addressing Architecture,'' IETF Internet Draft, 1991. [2] M. Steenstrup, ``Inter-Domain Policy Routing Protocol Specification: Version 1,'' RFC 1479, June 1993.
[3] ISO, ``Information Processing Systems--Telecommunications and Information Exchange between Systems--Protocol for Exchange of Inter-Domain Routeing Information among Intermediate Systems to Support Forwarding of ISO 8473 PDUs,'' ISO/IEC DIS 10747, August 1992. [4] R. Wright, Three Scientists and their Gods: Looking for Meaning in an Age of Information. New York: Times Books, first ed., 1988. [5] J. Penners and Y. Rekhter, ``Simple Mobile IP (SMIP),'' Internet Draft, Sep 1993. (draft-penners-mobileip-smip-00.txt). [6] K. A. Wimmer and J. B. Jones, ``Global Development of PCS,'' IEEE Communications Magazine, pp. 22--27, Jun 1992. [7] F. Teraoka, Y. Yokote, and M. Tokoro, ``A Network Architecture Providing Host Migration Transparency,'' in Proceedings of ACM SIGCOMM, 1991. [8] S. Deering and D. Cheriton, ``Multicast routing in datagram internetworks and extended LANs,'' ACM Transactions on Computer Systems, pp. 85--111, May 1990. [9] Y. K. Dalal and R. M. Metcalfe, ``Reverse path forwarding of broadcast packets,'' Communications of the ACM, 21(12), pp. 1040--1048, 1978. [10] A. J. Ballardie, P. F. Francis, and J. Crowcroft, ``Core Based Trees,'' in Proceedings of ACM SIGCOMM, 1993. [11] S. Deering, D. Estrin, D. Farinacci, and V. Jacobson, ``IGMP router extensions for routing to sparse multicast groups,'' Internet Draft, July 1993. [12] J. Moy, ``Multicast extensions to OSPF,'' Internet Draft, Sep 1992. [13] M. Steenstrup, ``Inter-Domain Policy Routing,'' Internet Draft, June 1993.   Received: from PIZZA.BBN.COM by BBN.COM id aa08235; 15 Mar 94 16:06 EST Received: from pizza by PIZZA.BBN.COM id aa02555; 15 Mar 94 15:41 EST Received: from BBN.COM by PIZZA.BBN.COM id aa02551; 15 Mar 94 15:39 EST Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa06370; 15 Mar 94 15:39 EST Received: by ginger.lcs.mit.edu id AA08989; Tue, 15 Mar 94 15:38:54 -0500 Date: Tue, 15 Mar 94 15:38:54 -0500 From: Noel Chiappa Message-Id: <9403152038.AA08989@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Draft Nimrod Architecture Document Cc: jnc@ginger.lcs.mit.edu Before everyone looks at this and goes bonkers, let me emphasize that it's a fairly rough draft, in which different people wrote different sections, and the people who wrote text don't necessarily agree with the contents of each other's sections! :-) It's being put out at an early stage to stimulate discussion. I have a largish set of comments which take exception to various bits and pieces, and I'll be sending them in soon, so fire away! Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa15742; 21 Mar 94 17:35 EST Received: from pizza by PIZZA.BBN.COM id aa05240; 21 Mar 94 17:11 EST Received: from BBN.COM by PIZZA.BBN.COM id aa05236; 21 Mar 94 17:08 EST Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa14158; 21 Mar 94 17:03 EST Received: by ginger.lcs.mit.edu id AA27065; Mon, 21 Mar 94 17:02:25 -0500 Date: Mon, 21 Mar 94 17:02:25 -0500 From: Noel Chiappa Message-Id: <9403212202.AA27065@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: IETF multicast Cc: jnc@ginger.lcs.mit.edu, mwalnut@cnri.reston.va.us Despite our having put the request in shortly after the last IETF, our second WG slot has been denied multi-cast coverage. (The coverage went to "Multiparty Multimedia Session Control WG", which got two multicast slots, and "Transition and Coexistence Including Testing BOF".) Since we will be reviewing the basic architecture document in both sessions, I don't know what will happen on which day.
If anyone who was planning on participating via multi-cast has some specific issue they'd like to cover, please let me know, so we can get to it the first day. Noel PS: If you want to complain, the person to talk to is Megan (mwalnut@cnri).   Received: from PIZZA.BBN.COM by BBN.COM id aa01820; 25 Mar 94 15:37 EST Received: from pizza by PIZZA.BBN.COM id aa00245; 25 Mar 94 15:15 EST Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa00240; 25 Mar 94 15:08 EST To: nimrod-wg@BBN.COM Subject: IETF29 Agenda Date: Fri, 25 Mar 94 15:04:30 -0500 From: Isidro Castineyra Group Name: Nimrod - The New Internet Routing and Addressing Architecture BOF IETF Area: Routing Date/Time: Tuesday, March 29, 1994 0930-1200 PST (multicast) Wednesday, March 31, 1994 0930-1200 PST -------- Proposed Agenda: The main purpose of this meeting is to review the draft Architecture document and to prepare a workplan for the next IETF. 1. Agenda bashing 2. Draft Review & Discussion a. Nimrod Draft Architecture (Isidro Castineyra) 60min b. Discussion 120min 3. Open Issues 60min 4. Workplan 30min 5. Deployment Strategy 30min Chairs: J. Noel Chiappa, Isidro Castineyra (BBN)   Received: from PIZZA.BBN.COM by BBN.COM id aa10140; 29 Mar 94 9:51 EST Received: from pizza by PIZZA.BBN.COM id aa23766; 29 Mar 94 9:23 EST Received: from BBN.COM by PIZZA.BBN.COM id aa23762; 29 Mar 94 9:19 EST Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa07924; 29 Mar 94 9:17 EST Received: by ginger.lcs.mit.edu id AA02566; Tue, 29 Mar 94 09:13:40 -0500 Date: Tue, 29 Mar 94 09:13:40 -0500 From: Noel Chiappa Message-Id: <9403291413.AA02566@ginger.lcs.mit.edu> To: int-serv@isi.edu, nimrod-wg@BBN.COM Subject: A taxonomy of state and tags in datagram networks Cc: big-internet@munnari.oz.au, jnc@ginger.lcs.mit.edu As a result of discussions in the Int-Serv WG sessions yesterday, I'd like to suggest a framework for thinking about, and a set of terms for describing, the state in datagram network systems. I think this framework will both help us understand more clearly what's going on, and the common terminology will allow us to have discussions without getting confused. I want to emphasize that I'm not pushing one design or another, just trying to give us a common framework which we can use to understand, discuss, and evaluate different designs in. I don't distinguish between "state" (which one normally thinks of as data which records something transitory about a given process), and information (which might be seen as a little more permanent), since it's merely a matter of time-scale; there's very transient state (such as the hop-count), intermediate lifetime (such as information about a flow), and long-lived (such as routing table entries). However, it's all just state in the end. I start by observing that even "classic" IPv4 has state in both the routers, and the packets. In the packet, I distinguish between what I call "forwarding state", which records something about the progress of this individual packet through the network (such as the hop count, or the pointer into a source route), and what I call "tags", which are fields which are used, in the routers through which the packet passes, to look up various state stored in the routers. An example of tags in the current IPv4 architecture is the "address", which is used to look up routing table state; a "flow-id" might be a future tag. I further subdivide tags into two sub-classes: "keys" and "hints".
A key is a field without which, or without the state in the router to which it refers, one cannot forward the packet; the "address" is such a field in IPv4. A hint is a field which is not necessary for the forwarding of the packet, but which makes the forwarding more efficient if the hint is correct, and the state in the router to which it refers is present. In the routers, there is a similar distinction between state which must be present for the forwarding process to be successful, which we might call "necessary" state (although I don't like this term, and welcome a better one), such as routing table entries; and cached state, which is not necessary for the correct forwarding of packets. The term "soft state" is, I believe, sometimes used to refer to the latter kind of state. Note that neither of these states is what we refer to as "critical" state, i.e. state which is critical to a given end-end communication, and which, once lost, means the loss of the connection. An example of this is the state of a TCP connection, as stored co-located (i.e. sharing fate) with the application. Having defined this way of looking at state, I'll put some thoughts on how we ought to create and maintain this state in a separate message. Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa15910; 29 Mar 94 11:20 EST Received: from pizza by PIZZA.BBN.COM id aa24445; 29 Mar 94 11:04 EST Received: from BBN.COM by PIZZA.BBN.COM id aa24441; 29 Mar 94 11:02 EST Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa14676; 29 Mar 94 11:00 EST Received: by ginger.lcs.mit.edu id AA03070; Tue, 29 Mar 94 10:54:59 -0500 Date: Tue, 29 Mar 94 10:54:59 -0500 From: Noel Chiappa Message-Id: <9403291554.AA03070@ginger.lcs.mit.edu> To: int-serv@isi.edu, nimrod-wg@BBN.COM Subject: How to create and maintain state in a packet network Cc: big-internet@munnari.oz.au, jnc@ginger.lcs.mit.edu It seems to be an item of rough consensus that in the future internetwork, the routers will need to contain more state about the traffic flowing through them if we want to provide fair sharing of resources etc. A smaller, but influential, group feels that in order to provide certain services we will need to be able to provide service guarantees, and do so in a way which requires explicit reservation of resources. All of this means more state in the routers. The interesting question is how this state gets into the routers; is it inferred from watching the datagrams which flow through the router, or is it installed in some more explicit way? In all of this discussion, I'm passing over the need to retain support for real datagrams, i.e. transactions which require only a single packet. (In fact, I'm going to use the term "datagram" from here on out to refer to such applications, and use the term "packet" to describe the unit of transmission through the network.) There's clearly no point in storing any state about that datagram in the routers; it's come and gone in the same packet, and this discussion is all about state retention, so that's why I don't talk about them. However, don't assume from this discussion of how to handle the packets generated by non-datagram applications that I'm advocating systems which support *only* such packet sequences (which we call "flows"). As another aside, I still think that the unreliable packet ought to be the fundamental building block of the internetwork layer.
I really like the design principle that says that we can take any packet and throw it away with no warning or other action, or take any router and turn it off with no warning, and have the system still work. The component design simplicity (since routers don't have to stand on their heads to retain a packet which they have the only copy of), and overall system robustness, resulting from these two assumptions are absolutely unloseable. Anyway, back to the question of how the state gets into the routers. There is an interesting potential synergy here, because there is thought being given to routing architectures which, for reasons of engineering efficiency, store more state in the routers for long-lived flows. (There may be similar thoughts in the security and billing areas, but I'm not aware of them.) It's important to realize that there is no *fundamental* reason why this state has to be stored in the routers, and looked up via a key in the packets. It could easily be repeated in every packet (as a source route), but we don't plan on doing so for reasons of efficiency in header size (both in terms of bandwidth, and in processing to create and forward the packets). This observation is of some use when thinking about the router state which is used for doing resource allocation. Some of that state might be information about the user's service needs; information which could be sent in each packet, or which can be saved in the router, depending on which makes the most engineering sense. I call such state, which reflects the desires of the user, "user state", even when a copy is cached in the routers. However, other state needed for this cannot be stored in each packet; it's state about the longer-term (i.e. spanning multiple packets) situation. I call this state "server state". There are two schools of thought as to how to proceed. The first says that for reasons of robustness and simplicity, all "user state" (resource class info, source route, etc) ought to be repeated in each packet. For efficiency reasons, the routers may cache such "user state", probably along with precomputed data derived from the user state. (It makes sense to store such cached user state along with any applicable server state, of course.) This school may be subdivided into two subschools, depending on what hint they use in the packet to find this cached state. (It's a hint, not a key, since the state in the router can be discarded at any time without making it impossible to forward the packet.) In one subschool, there's a field (the flow-id) whose sole purpose in life is to be a hint. In the other subschool, a number of other fields (such as source and destination address, port, etc) combine to be the hint. The second school says that it's simply going to be too inefficient to carry all the user state around all the time, and we should just bite the bullet, install it in the routers directly, and include in the packets a key (also called a flow-id, just to be confusing) to find that state. I call this the "installation" school. I'm not sure how much use there is to any intermediate position. It seems to me that to have one internetwork layer subsystem (e.g. resource allocation) carry user state in all the packets, and use a hint in the packets to find it, and have a second (e.g. routing) use a direct installation, and use a key in the packets to find it, makes little sense. We should do one or the other, based on a consideration of the efficiency/robustness tradeoff.
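To make the contrast concrete, here is a toy rendering of the two schools (everything here is invented for illustration; derive_state and route are stand-in helpers, not any real API):

    def derive_state(user_state):      # stand-in for precomputation
        return user_state

    def route(pkt, state):             # stand-in for actual forwarding
        return (pkt, state)

    # Replication school: each packet carries its full user state, and
    # the flow-id is only a *hint* into a cache of derived state.
    def forward_with_hint(router, pkt):
        state = router.cache.get(pkt.flow_id)
        if state is None:
            # A cache miss is harmless: recover from the packet itself
            # and repopulate the cache for subsequent packets.
            state = derive_state(pkt.user_state)
            router.cache[pkt.flow_id] = state
        return route(pkt, state)

    # Installation school: the user state lives only in the routers, and
    # the flow-id is a *key* - forwarding fails if the state is gone.
    def forward_with_key(router, pkt):
        state = router.installed.get(pkt.flow_id)
        if state is None:
            raise LookupError("state lost; the flow must be re-installed")
        return route(pkt, state)

The key/hint distinction from the earlier taxonomy message shows up directly: the first function can always make progress on its own, while the second trades that robustness for smaller packet headers.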
It's a little difficult to make this choice without more information about exactly how much user state the network is likely to have in the future (i.e. we might wind up with 500 byte headers if we include the full source route, resource reservation, etc, etc in every header). It's also difficult without consideration of the actual mechanisms involved. As a general principle, we wish to make recovery from a loss of state as local as possible, to limit the number of entities which have to become involved. For instance, when a router crashes, traffic is rerouted around it without needing to open a new TCP connection. In a similar way, the option of "installation" looks a lot more attractive if it's plausible, and relatively cheap, to reinstall the user state when a router crashes, without otherwise causing a lot of hassle. My intuition tells me that in the long run we're better off just biting the bullet on user state, and going to an installation paradigm with keys, not replicated user state in each packet with hints, but until we see more details it may prove difficult to know for sure which way is the best way. Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa21133; 29 Mar 94 12:43 EST Received: from pizza by PIZZA.BBN.COM id aa24924; 29 Mar 94 12:19 EST Received: from BBN.COM by PIZZA.BBN.COM id aa24920; 29 Mar 94 12:16 EST Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa19273; 29 Mar 94 12:15 EST Received: by ginger.lcs.mit.edu id AA03726; Tue, 29 Mar 94 12:10:36 -0500 Date: Tue, 29 Mar 94 12:10:36 -0500 From: Noel Chiappa Message-Id: <9403291710.AA03726@ginger.lcs.mit.edu> To: int-serv@isi.edu, nimrod-wg@BBN.COM Subject: Re: How to create and maintain state in a packet network Cc: big-internet@munnari.oz.au, jnc@ginger.lcs.mit.edu It seems to me that to have one internetwork layer subsystem (e.g. resource allocation) carry user state in all the packets, and use a hint in the packets to find it, and have a second (e.g. routing) use a direct installation, and use a key in the packets to find it, makes little sense. We should do one or the other... It has been pointed out to me that there are three ways in which to interpret this statement, and it makes sense to take note of the different ways, because the utility of doing this makes different degrees of sense in each. First, there is the meaning I had in mind, where one single flow uses different mechanisms for different subsystems. Second, one flow might use a given technique for all its subsystems, and another flow might use a different technique for all of its; there is potentially some use to this, although I'm not sure the cost in complexity of supporting both mechanisms is worth the benefits. Third, one flow might use one mechanism with one router along its path, and another for a different router. A number of different reasons exist as to why one might do this, including the fact that not all routers may support the same mechanisms simultaneously.
Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa00817; 29 Mar 94 14:01 EST Received: from pizza by PIZZA.BBN.COM id aa25445; 29 Mar 94 13:43 EST Received: from BBN.COM by PIZZA.BBN.COM id aa25441; 29 Mar 94 13:41 EST Received: from zephyr.isi.edu by BBN.COM id aa29412; 29 Mar 94 13:39 EST Received: by zephyr.isi.edu (5.65c/5.61+local-16) id ; Tue, 29 Mar 1994 10:35:03 -0800 Date: Tue, 29 Mar 1994 10:35:03 -0800 From: Bob Braden Message-Id: <199403291835.AA23930@zephyr.isi.edu> To: int-serv@isi.edu, nimrod-wg@BBN.COM, jnc@ginger.lcs.mit.edu Subject: Re: A taxonomy of state and tags in datagram networks Cc: big-internet@munnari.oz.au *> one), such as routing table entries; and cached state which is not necessary *> for the correct forwarding of packets. The term "soft state" is, I believe, *> sometimes used to refer to the latter kind of state. Noel, I would say that cached state is simply an example of soft state. The essential feature of soft state is that the router is allowed to delete it without an explicit deletion request. RSVP state is another example of soft state; if it is deleted, packets cannot be forwarded "correctly" (although they may or may not be forwarded as best-effort datagrams). Yet a router is allowed to time it out and delete the RSVP state that is not refreshed. *> Note that neither of these states is what we refer to as "critical" *> state, i.e. state which is critical to a given end-end communication, and *> which, once lost, means the loss of the connection. An example of this is *> the state of a TCP connection, as stored co-located (i.e. sharing fate) with *> the application. You have to get over this exclusively connectionist view of applications, Noel! Most realtime applications don't have state in that sense. Bob Braden   Received: from PIZZA.BBN.COM by BBN.COM id aa01678; 29 Mar 94 14:18 EST Received: from pizza by PIZZA.BBN.COM id aa25522; 29 Mar 94 13:50 EST Received: from BBN.COM by PIZZA.BBN.COM id aa25518; 29 Mar 94 13:48 EST Received: from pooh.cc.iastate.edu by BBN.COM id aa29792; 29 Mar 94 13:44 EST Received: by iastate.edu with sendmail-5.57/4.7 id ; Tue, 29 Mar 94 12:44:39 -0600 Message-Id: <9403291844.AA18886@iastate.edu> To: Noel Chiappa To: int-serv@isi.edu, nimrod-wg@BBN.COM To: big-internet@munnari.oz.au Subject: Re: How to create and maintain state in a packet network In-Reply-To: Your message of Tue, 29 Mar 94 10:54:59 -0500. <9403291554.AA03070@ginger.lcs.mit.edu> Date: Tue, 29 Mar 94 12:44:39 CST From: John Hascall [...] > Anyway, back to the question of how the state gets into the routers. [...] > There are two schools of thought as to how to proceed. [...school 1) all state in every packet...] [....subschool 1a) dedicated field in packet as hint to state cache...] [....subschool 1b) combine existing fields as hint...] [...school 2) install state in routers, key in packet to find state...] > I'm not sure how much use there is to any intermediate position. It seems to me that there is an intermediate position worth exploring. Have a hint in the packet (preferably a dedicated one in my eyes), and have the initial packet contain the state as an option (a la TCP MSS). Since it is a hint, the router could discard it based on whatever criteria it likes (LRU, timeout, lack of space, crash, etc.). It seems to me you would want a way for a router to request the hint again (if it was discarded, or if the routing changed and a new router wanted the hint).
ICMP or its successor would seem the most obvious mechanism for requesting that it occur in the next convenient packet. John ----------------------------------------------------------------------------- John Hascall An ill-chosen word is the fool's messenger. Systems Software Engineer Project Vincent Iowa State University Computation Center + Ames, IA 50011 + 515/294-9551   Received: from PIZZA.BBN.COM by BBN.COM id aa07565; 29 Mar 94 15:51 EST Received: from pizza by PIZZA.BBN.COM id aa26404; 29 Mar 94 15:34 EST Received: from BBN.COM by PIZZA.BBN.COM id aa26400; 29 Mar 94 15:31 EST Received: from emory.mathcs.emory.edu by BBN.COM id aa05975; 29 Mar 94 15:28 EST Received: by emory.mathcs.emory.edu (5.65/Emory_mathcs.3.4.19) via UUCP id AA05865 ; Tue, 29 Mar 94 15:28:43 -0500 Received: by shlep.sware.com (16.6/2.0) from paradox.sware.com id AA05625; Tue, 29 Mar 94 15:25:28 -0500 Sender: sale@paradox.sware.com MMDF-Warning: Parse error in original version of preceding line at BBN.COM Received: from (localhost) by paradox.sware.com with SMTP (5.59/ CMW+ v2.3-eef) id AA00642; Tue, 29 Mar 94 15:24:32 EST Return-Path: Date: Tue, 29 Mar 94 15:24:32 EST Message-Id: <9403292024.AA00642@paradox.sware.com> From: Ed Sale X-Mailer: InterMail/CMW [1.1alpha] Cc: sale@sware.com To: nimrod-wg@BBN.COM Subject: Re: How to create and maintain state in a packet network In-Reply-To: Your message of Tue, 29 Mar 94 12:10:36 -0500. <9403291710.AA03726@ginger.lcs.mit.edu> Noel, I believe that the endpoints of flows which will require the new services could be responsible for maintaining the extended state in the intermediate hops along the path. Packets exchanged end-end early in the flow's lifetime could contain this state and give the endpoints some assurance that the nodes along the path are able to provide the required service. After this exchange, keys may be used to convey this information until such a time as a node along the path either loses this state for some reason or a new route is established for the flow. At this point the intermediate nodes that need to acquire the state could request it from either their nearest neighbors for the flow or from the endpoints themselves. This allows the packets to carry the extended state only on an as-needed basis. How is it possible to reserve resources for providing *guaranteed* services in an internetwork where routers can and do occasionally fail? In my mind this question boils down to, "How many points of failure do we want to provide redundancy for?" The service-reservation messages would potentially have to be passed to all of the routers which might carry packets on behalf of the flow. In any case, I believe that providing this type of service blurs the line between the network and transport layers to some degree. What are some examples of the kinds of services which are perceived as requiring guaranteed service(s) from the network layer? -- Ed ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Ed Sale SecureWare, Inc. email: sale@sware.com 2957 Clairmont Rd.
#200 phone: (404) 315-6296 x24 Atlanta, GA 30329-1647 fax: (404) 315-0293   Received: from PIZZA.BBN.COM by BBN.COM id aa19819; 30 Mar 94 1:31 EST Received: from pizza by PIZZA.BBN.COM id aa28975; 30 Mar 94 1:10 EST Received: from BBN.COM by PIZZA.BBN.COM id aa28971; 30 Mar 94 1:07 EST Received: from monk.proteon.com by BBN.COM id aa19141; 30 Mar 94 1:07 EST Received: from rockford.proteon.com by monk.proteon.com (4.1/Proteon-1.5) id AA28985; Wed, 30 Mar 94 01:06:59 EST Received: by rockford.proteon.com (4.1/SMI-4.1) id AA04527; Wed, 30 Mar 94 01:06:59 EST Date: Wed, 30 Mar 94 01:06:59 EST From: Avri Doria Message-Id: <9403300606.AA04527@rockford.proteon.com> To: nimrod-wg@BBN.COM Subject: Cosmetics or definitions Reply-To: avri@proteon.com i question whether details in a definition can be called cosmetic and readjusted sometime after those definitions have been deeply embedded into the architecture. some definitions can be left loose, e.g. the nature of the object to which an EID is attached, for it may not really matter what kind of object it is that is atomic or fate sharing. other definitions, e.g. whether a boundary attachment point is a node in the context of the next higher abstraction, seem to me to need to be nailed down. avri   Received: from PIZZA.BBN.COM by BBN.COM id aa19404; 30 Mar 94 12:24 EST Received: from pizza by PIZZA.BBN.COM id aa01569; 30 Mar 94 12:03 EST Received: from BBN.COM by PIZZA.BBN.COM id aa01565; 30 Mar 94 11:58 EST Received: from monk.proteon.com by BBN.COM id aa17581; 30 Mar 94 11:52 EST Received: from rockford.proteon.com by monk.proteon.com (4.1/Proteon-1.5) id AA06554; Wed, 30 Mar 94 11:52:52 EST Received: by rockford.proteon.com (4.1/SMI-4.1) id AA04625; Wed, 30 Mar 94 11:52:51 EST Date: Wed, 30 Mar 94 11:52:51 EST From: Avri Doria Message-Id: <9403301652.AA04625@rockford.proteon.com> To: nimrod-wg@BBN.COM Subject: multicast datagrams (pardon the expression) Reply-To: avri@proteon.com in conversations after yesterday morning's meetings, i felt that it was necessary that there be a way to handle what has been defined as non-existent: a multicast datagram. i envision the following possible case: some service provider (a business that is) sends out a daily packet to a set of subscribers with information they consider vital to their businesses. this would be a sender-originated multicast and new businesses would subscribe in their own good time. the sender would build its multicast list and send it off every morning. it certainly would not make sense for the intervening routers to keep state on such an 'every once in an eternity' type of event. it also would not make sense for state to be created in each router (i forget what you are calling the act of setting up multicast state) just to send the one packet. i am also not sure that it would be reasonable for explorers to be sent out to build a source route (sorry, i forget the acronymic euphemism). avri   Received: from PIZZA.BBN.COM by BBN.COM id aa12892; 30 Mar 94 19:14 EST Received: from pizza by PIZZA.BBN.COM id aa04261; 30 Mar 94 18:54 EST Received: from TTL.BBN.COM by PIZZA.BBN.COM id aa04257; 30 Mar 94 18:51 EST To: avri@proteon.com cc: nimrod-wg@BBN.COM Subject: Re: multicast datagrams (pardon the expression) In-reply-to: Your message of Wed, 30 Mar 94 11:52:51 -0500.
<9403301652.AA04625@rockford.proteon.com> Date: Wed, 30 Mar 94 18:47:55 -0500 From: Ram Ramanathan >in conversations after yesterday morning's meetings, i felt that it was >necessary that there be a way to handle what has been defined as non >existent: a multicast datagram. >i envision the following possible case: some service provider (a >business that is) sends out a daily packet to a set of subscribers with >information they consider vital to their businesses. this would be a >sender originated multicast and new businesses would subscribe in their >own good time. the sender would build its multicast list and send it >off every morning. it certainly would not make sense for the >intervening routers to keep state on such an 'every once in an >eternity' type of event. it also would not make sense for state to be >created in each router (i forget what you are calling the act of >setting up multicast state) just to send the one packet. i am also >not sure that it would be reasonable for explorers to be sent out to >build a source route (sorry, i forget the acronymic euphemism). If we have an 'every once in an eternity' type of scenario, why use multicast at all? Why not send individual unicast datagrams? If there are, for example, n destinations, the difference between n individual datagrams and one multicast datagram is probably insignificant when the event occurs infrequently. The difference in overhead between n unicasts and a multicast to n destinations is significant primarily when you send a *large number* of packets (like in video conf), and then the overhead of n unicasts gets multiplied by that large number and becomes unacceptable. But in this situation, setting up a flow is but a minor price to pay. The point is that "multicasting" itself is simply a more efficient way of sending n unicasts - basically a scalability improvement - and becomes relatively unimportant for infrequent or small volume applications. That said, I do think we should consider this option some and am glad Avri brought it up. However, I am not sure everybody agrees on the definition of a multicast datagram (there was some confusion about this yesterday). The important thing is what kind of state requirements does this imply? No state other than the maps (which are state in a sense)? Some simple state derived from the maps but not explicitly set up by the sender or receiver? Any comments? - Ram.   Received: from PIZZA.BBN.COM by BBN.COM id aa15310; 4 Apr 94 2:38 EDT Received: from pizza by PIZZA.BBN.COM id aa22819; 4 Apr 94 2:15 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa22815; 4 Apr 94 2:10 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa14595; 4 Apr 94 2:09 EDT Received: by ginger.lcs.mit.edu id AA23997; Mon, 4 Apr 94 02:09:47 -0400 Date: Mon, 4 Apr 94 02:09:47 -0400 From: Noel Chiappa Message-Id: <9404040609.AA23997@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: multicast datagrams (pardon the expression) Cc: jnc@ginger.lcs.mit.edu it was necessary that there be a way to handle what has been defined as non existent: a multicast datagram. It turns out that I think there is a way to do this, but first let me point out that in Nimrod "datagram" has the special meaning of an application which sends only one "packet" and then is done, so setup is per se not useful. This presents a conundrum, since you normally can't have a multi-cast group without some setup, so what's the point, really? The answer, obviously, is "source-routed multicast datagrams"!
I.e., the datagram contains the spanning-tree path it should be distributed over. The only bug here is that if the group is a large one, the tree could be a *lot* of data.... (Hmm, maybe IPng needs a 32-bit header length field! :-) The knowledge of who is in the group has to be distributed to all sources, of course... Actually, there's a way to solve that, which is to have regional distribution nodes, who know the members in their area. Then you only have to get the packet to them, and they pass it out further. (This would also limit the size of the tree.) Perhaps such servers would be a general service, one which it is worth connecting up with an installed multi-cast flow? Hmmm... lots of possibilities here. i envision the following possible case: some service provider ... sends out a daily packet to a set of subscribers with information they consider vital to their businesses. ... the sender would build its multicast list and send it off every morning. I was wondering about this application; would it really make sense to do this as multicast, and not a bunch of unicasts? I guess, if the multi-cast group was large enough, it makes sense. Then you're back to the previous problem... not make sense for state to be created in each router (i forget what you are calling the act of setting up multicast state) I think of this as "multicast flow setup". Actually, I've been thinking a rather radical thought, which is that maybe we should ditch separate unicast mechanisms (in flow setup and source routed packets), and only have multicast. Unicast would then be a special case of multicast. (I assume this will make Steve happy! :-) My reasoning involves asking "Is there really enough efficiency advantage in having a unicast flow mechanism, separate from multicast flows, to make it worth the complexity of having two separate mechanisms?" I mean, if multi-cast is used a lot, the mechanism to support it must work reasonably efficiently, right? If so, maybe the answer to the previous question is "no", right? The thing I'm wondering about is scaling; I assume that the complexity of supporting maintenance of large multicast groups, where you have adding and pruning going on, means you need a range of mechanisms to support groups of varying sizes from 3 to 3 million? Perhaps there is some general way of distributing the calculation of the spanning trees, and their maintenance, which will scale? Maybe we will build the trees as collections of "distribution nodes", one at each place the tree branches, and connect up the nodes with unicast flows? Anyway, it's late, and I can't think clearly. Have to think about this... Noel PS: Nimrod is a fairly complex system, so any time I see a feature of dubious utility (such as flow aggregation, separate unicast and multicast mechanisms, etc) my immediate reaction is "oh good, something I can get rid of"! After all, the Nimrod motto *is* "Perfection has been attained, not when there is nothing left to add, but when there is nothing left to take away"!
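For concreteness, the "tree in the packet" idea at the top of this message can be encoded very simply (a toy encoding, invented here; nothing in Nimrod specifies this format):

    # A delivery tree is encoded as a list of (next_hop, subtree) pairs,
    # where each subtree is itself such a list (empty at the leaves).
    def distribute(payload, subtrees, send):
        # Each router strips off its own level of the tree and emits one
        # copy per branch, carrying only that branch's subtree; the
        # receiving router just calls distribute() on what it finds.
        for next_hop, branches in subtrees:
            send(next_hop, (branches, payload))

No router keeps any per-group state at all - which is exactly the point - but the header grows with the size of the tree, which is exactly the bug noted above.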
Received: from PIZZA.BBN.COM by BBN.COM id aa14075; 4 Apr 94 11:34 EDT Received: from pizza by PIZZA.BBN.COM id aa24047; 4 Apr 94 11:13 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa24043; 4 Apr 94 11:11 EDT Received: from nsco.network.com by BBN.COM id aa12374; 4 Apr 94 11:08 EDT Received: from anubis.network.com by nsco.network.com (5.61/1.34) id AA18292; Mon, 4 Apr 94 10:11:33 -0500 Received: from gramarye.network.com by anubis.network.com (4.1/SMI-4.1) id AA00977; Mon, 4 Apr 94 10:07:58 CDT Date: Mon, 4 Apr 94 10:07:58 CDT From: Joel Halpern Message-Id: <9404041507.AA00977@anubis.network.com> To: nimrod-wg@BBN.COM Subject: Re: multicast datagrams (pardon the expression) Noel suggests, in his latest "interesting-gram", that we get rid of separate unicast mechanisms, and consider unicast a special case of multi-cast. I think that the key to evaluating this is how much extra work/state/complexity is involved in the "normal" multicast case. If multicast can be handled, at the internetwork layer, at the same complexity and performance as unicast, then we should seriously consider this. One good thing about this is that it probably allows, with proper thought, for almost any variation on "anycast" that one wants, since that is merely a behavior halfway between unicast and multicast. This should also serve as a warning as to where scaling is likely to be a problem. Not with the protocol/routing, but with the usage we are encouraging. Thank you, Joel M. Halpern jmh@network.com Network Systems Corporation   Received: from PIZZA.BBN.COM by BBN.COM id aa22876; 4 Apr 94 13:56 EDT Received: from pizza by PIZZA.BBN.COM id aa24756; 4 Apr 94 13:39 EDT Received: from TTL.BBN.COM by PIZZA.BBN.COM id aa24752; 4 Apr 94 13:37 EDT To: nimrod-wg@BBN.COM Subject: Re: multicast datagrams (pardon the expression) In-reply-to: Your message of Mon, 04 Apr 94 10:07:58 -0500. <9404041507.AA00977@anubis.network.com> Date: Mon, 04 Apr 94 13:30:49 -0400 From: Ram Ramanathan Joel Halpern writes : >Noel suggests, in his latest "interesting-gram", that we get rid of >separate Unicast mechanisms, and consider unicast a special case of >multi-cast. I think that the key to evaluating this is how much extra >work/state/complexity is involved in the "normal" multicast case. If >multicast can be handled, at the internetwork layer, at the same >complexity and performance as unicast, then we should seriously consider >this. As mentioned in the architecture document, multicasting has the following non-trivial things to consider, that are not a concern in unicast : 1) Groups and group dynamism. 2) Diverse possibilities of state creation - eg. initiated by senders, receivers, both or neither. Each has its advantages and disadvantages and must not be precluded. 3) Consequences of policies in multicasting. If a transit policy precludes some members as destinations and allows the rest of the members, the group is essentially partitioned into two subgroups (an arbitrary number in general), requiring a multicast "forest" not a tree - a rather difficult problem. We have addressed and solved this problem in IDPR, but Nimrod would be more complicated. Etc. I used to be violently for making unicast a special case of multicast a few months ago. But the above and other points have tempered me a bit. I believe we should concentrate on unicast by itself first. Later, when we understand the tradeoffs better, we could combine the two if we agree that the extra baggage for unicast is worth the "uniformity".
>One good thing about this is that it probably allows, with proper >thought, for almost any variation on "anycast" that one wants, since >that is merely a behavior halfway between unicast and multicast. This >should also serve as a warning as to where scaling is likely to be a >problem. Not with the protocol/routing, but with the usage we are >encouraging. Yes, it is powerful. In fact, let us take this one step further in generality. In an ideal world, the network provides an x-y-z-cast. Here, x = nodes in the network. y = a subset of x. z = ANY or ALL (a selector that says if all of y or any of y is the dest). Then, broadcast/unicast/multicast/anycast are special cases of x-y-z-cast: broadcast: y = x, z = ALL. unicast: y = destination, z = ALL. multicast: y = group, z = ALL. anycast: y = group, z = ANY (anycast: communicate with any of a given set of nodes). Of course, the advantage in generalizing particulars is that we can "see" new particulars. A (rather impotent) example is: anyothercast: y = x - group, z = ANY. But other combinations may be interesting. best regards, - Ram. -------------------------------------------------------------- Ram Ramanathan Systems and Technologies Bolt, Beranek and Newman Inc. 10 Moulton Street, Cambridge, MA 02138 Phone : (617) 873-2736 INTERNET : ramanath@bbn.com   Received: from PIZZA.BBN.COM by BBN.COM id aa12640; 4 Apr 94 21:14 EDT Received: from pizza by PIZZA.BBN.COM id aa27433; 4 Apr 94 20:54 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa27429; 4 Apr 94 20:52 EDT Received: from alex.disem.dnd.ca by BBN.COM id aa12030; 4 Apr 94 20:52 EDT Received: by alex.disem.dnd.ca (4.1/SMI-4.1) id AA04461; Mon, 4 Apr 94 20:51:57 EST From: "Capt L. Clement" Message-Id: <9404050151.AA04461@alex.disem.dnd.ca> Subject: Is the NIMROD Proposal available? To: nimrod-wg@BBN.COM Date: Mon, 4 Apr 1994 20:51:57 -0500 (EST) X-Mailer: ELM [version 2.4 PL23] Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Content-Length: 433 Is the NIMROD proposal available for release? If so, where can it be obtained? Thank you for your consideration in this matter. ------------------------------------------------------------------ Capt L Clement, DISEM 3-4-2 clement@disem.dnd.ca National Defence Headquarters 219 Laurier Ave West Ottawa, Canada Tel: 613 992-3851 K1A 0K2 Fax: 613 996-3979 ------------------------------------------------------------------   Received: from PIZZA.BBN.COM by BBN.COM id aa09780; 5 Apr 94 1:13 EDT Received: from pizza by PIZZA.BBN.COM id aa28333; 5 Apr 94 0:45 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa28329; 5 Apr 94 0:42 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa09072; 5 Apr 94 0:42 EDT Received: by ginger.lcs.mit.edu id AA02140; Tue, 5 Apr 94 00:42:23 -0400 Date: Tue, 5 Apr 94 00:42:23 -0400 From: Noel Chiappa Message-Id: <9404050442.AA02140@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: RFC-1609 Cc: jnc@ginger.lcs.mit.edu It might prove somewhat interesting to look at this document. Here's a clip from it: Network related information, referred to as 'network map' in the rest of this paper, should 1. Show the interconnection between the various network elements. This will basically represent the Network as a graph where vertices represent objects like gateways/workstations/subnetworks and edges indicate the connections. 2. Show properties and functions of the various network elements and the interconnections.
Attributes of vertices will represent various properties of the objects e.g., speed, charge, protocol, OS, etc. Functions include services offered by a network element. ... 5. Contain the policy related information, part of which may be private while the other part may be made public. Using this map the following services may be provided ... 2. Route management: - Find alternate routes by referring to the physical and logical configurations. - Generate routing tables considering local policy and policy of transit domains - Check routing tables for routing loops, non-optimality, incorrect paths, etc. 3. Fault management: In case of network failures alternatives may be found and used to bypass the problem node or link. ... 5. Optimization: The information available can be used to carry out various optimizations, for example cost, traffic, response-time, etc. It all sounds familiar, no? However, on reading it, my perception is that they haven't yet gotten their hands around the abstraction problem; i.e. they provide a way to distribute the storage of the map, but provide no way to "simplify" pieces of it. Since that's the really hard one... Still, it's interesting to see someone else going down the map-based road... Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa20674; 5 Apr 94 4:57 EDT Received: from pizza by PIZZA.BBN.COM id aa29174; 5 Apr 94 4:36 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa29170; 5 Apr 94 4:33 EDT Received: from mitsou.inria.fr by BBN.COM id aa20112; 5 Apr 94 4:30 EDT Received: by mitsou.inria.fr (5.65c8/IDA-1.2.8) id AA22174; Tue, 5 Apr 1994 10:34:59 +0200 Message-Id: <199404050834.AA22174@mitsou.inria.fr> To: Ram Ramanathan Cc: nimrod-wg@BBN.COM Subject: Re: multicast datagrams (pardon the expression) In-Reply-To: Your message of "Mon, 04 Apr 1994 13:30:49 EDT." <199404041810.AA02136@sophia.inria.fr> Date: Tue, 05 Apr 1994 10:34:59 +0200 From: Christian Huitema One key point of multicasting is that you often don't know whom you are sending to. In that case, there is no possibility of including a 'delivery tree' in the packet - that technique is only fit for well-controlled conferences where a signalling procedure keeps track of the membership. Christian Huitema   Received: from PIZZA.BBN.COM by BBN.COM id aa18918; 5 Apr 94 12:01 EDT Received: from pizza by PIZZA.BBN.COM id aa01233; 5 Apr 94 11:46 EDT Received: from BBN.COM by PIZZA.BBN.COM id ab01225; 5 Apr 94 11:42 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa17248; 5 Apr 94 11:36 EDT Received: by ginger.lcs.mit.edu id AA07570; Tue, 5 Apr 94 11:35:59 -0400 Date: Tue, 5 Apr 94 11:35:59 -0400 From: Noel Chiappa Message-Id: <9404051535.AA07570@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: multicast datagrams (pardon the expression) Cc: jnc@ginger.lcs.mit.edu One key point of multicasting is that you often don't know whom you are sending to. In that case, there is no possibility of including a 'delivery tree' in the packet - that technique is only fit for well-controlled conferences where a signalling procedure keeps track of the membership. All multicast groups do have two things: a membership list, and a delivery tree; the only questions are whether that list is stored in a distributed fashion, and whether the tree is computed by a distributed algorithm. In most current multicast systems, the answer to both is "yes". (I'm excluding multicast groups which only span a single hardware network; the forwarding infrastructure isn't involved in such groups.)
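The separation being drawn here can be put as a small data-structure observation (a sketch, with invented names; not any structure defined by Nimrod):

    # The two pieces of multicast group state distinguished in the text.
    class MulticastGroup:
        def __init__(self, group_id):
            self.group_id = group_id
            self.members = set()      # membership list: who is in the group
            self.delivery_tree = {}   # node -> next hops: how to reach them
    # The architectural questions are only *where* each piece is stored
    # (centrally or distributed) and *who* computes the delivery tree.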
In particular, one can view the distributed computation and setup of the delivery tree in most current schemes as a "flow setup"; it's just one with no resource allocation or other flow-type things, simply information about packet forwarding paths (like Nimrod). It certainly results in state about that multicast group being stored in routers: if it walks like a flow, and quacks like a flow... Maybe there isn't a use for "real" datagram multicast; i.e. one which has *no* state associated with that flow stored in the routers. However, I for one can't see a way to do it other than i) including the group delivery tree in the packets, or ii) relying on server(s) which know that information, and stick it in the packets (perhaps in a distributed fashion, so that no single server knows the whole tree). Of course, the difference between ii) and multicast flow setup is mostly whether the "servers" are co-located with the routers or not.... Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa25827; 5 Apr 94 13:53 EDT Received: from pizza by PIZZA.BBN.COM id aa01968; 5 Apr 94 13:34 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa01964; 5 Apr 94 13:27 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa24055; 5 Apr 94 13:25 EDT Received: by ginger.lcs.mit.edu id AA09017; Tue, 5 Apr 94 13:25:12 -0400 Date: Tue, 5 Apr 94 13:25:12 -0400 From: Noel Chiappa Message-Id: <9404051725.AA09017@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: How to create and maintain state in a packet network Cc: jnc@ginger.lcs.mit.edu From: Ed Sale I believe that the endpoints of flows which will require the new services could be responsible for maintaining the extended state in the intermediate hops along the path. I think that this is the ultimate fallback position, for reasons of both robustness and simplicity. I.e. if we don't have to engineer the network to *never* punt back to the hosts, it makes the network engineering a lot easier. However, we may still want to do repairs as locally as possible, for a number of reasons... Packets early in the flow's lifetime could be exchanged end-end which contain this state and give the endpoints some assurance that the nodes along the path are able to provide the required service. After this exchange, keys may be used to convey this information You've described what I think of as flow setup, and flow-id's, exactly. until such a time as a node along the path either loses this state for some reason or a new route is established for the flow. At this point the intermediate nodes that need to acquire the state could request it from either their nearest neighbors for the flow or from the endpoints themselves. Right; it's a "simple matter of engineering" (famous last words :-) as to which of the two gets done when, based on cost/benefit tradeoff issues. How is it possible to reserve resources for providing *guaranteed* services in an internetwork where routers can and do occasionally fail? In my mind this question boils down to, "How many points of failure do we want to provide redundancy for?" As Masataka Ohta pointed out, there's no such thing as an absolute guarantee. The best you can do is, as you expend more resources (money :-), you get a higher probability of getting the kind of service you want. The service-reservation messages would potentially have to be passed to all of the routers which might carry packets on behalf of the flow. I don't quite see this.
I can see systems (such as hop-by-hop) where the large degree of local freedom on where packets go, *together with* the strong decoupling between routing and resource allocation, raises issues which can only be solved with this kind of thing. However, if you have an internetwork layer which couples the two more tightly, you can avoid this. Am I missing something? As a semi-worked example, in Nimrod you go to set up a flow with certain resource needs, so you try to do a resource allocation as part of the setup. If it succeeds along the path you specified (which may be specified in terms of high-level entities which actually refer to a number of parallel real physical paths), that means the routers have picked a linearly arranged set of links and switches to carry your traffic, and the resources were allocated. If one of those fails, the routers may be able to select an alternate set, although still within the group named by the high-level entities which you named in your flow path, and allocate the resources you asked for along that path. If so, everything's OK; you may see a slight service interruption. (Actually, you may not wish such automatic local repair; I suppose we could have a switch in the flow setup to disable it.) If not, you get told "sorry, redo the setup, we can't do it any more". In any case, I believe that providing this type of service blurs the line between the network and transport layers to some degree. No. What we are doing is radically enhancing the service model provided by the internetwork layer... What are some examples of the kinds of services which are perceived as requiring guaranteed service(s) from the network layer? Jeez, I'm not an application type, so I have to hand-wave a bit; they can answer better than I. The people doing voice seem to think they have resource floors below which their application just won't work... Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa00992; 6 Apr 94 14:28 EDT Received: from pizza by PIZZA.BBN.COM id aa08990; 6 Apr 94 14:07 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa08986; 6 Apr 94 14:01 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa28584; 6 Apr 94 13:59 EDT Received: by ginger.lcs.mit.edu id AA19774; Wed, 6 Apr 94 13:59:50 -0400 Date: Wed, 6 Apr 94 13:59:50 -0400 From: Noel Chiappa Message-Id: <9404061759.AA19774@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: multicast datagrams (pardon the expression) Cc: jnc@ginger.lcs.mit.edu > regional distribution nodes, who know the members in their area. Then > you only have to get the packet to them, and they pass it out further. > (This would also limit the size of the tree.) Perhaps such servers would > be a general service, one which it is worth connecting up with an > installed multi-cast flow? could you explain how what you propose differs from CBT? Well, depending on what aspect of multicast you're looking at, it may not be very different. I mean, if you're looking at what paths the user data flows along, it may be much the same. As I mentioned in my reply to Christian, I divide multi-cast state into group membership, and distribution (spanning) tree(s). These can basically be handled and calculated separately (with the exception that the former is input to the latter). Most multicast schemes (including CBT) seem to intermix these two functions. In fact, in looking at multicast schemes in general, there are three different important aspects: the creation and distribution of the two classes of state, and the actual paths the data takes.
(For the latter point, CBT uses distribution from centralized point(s); other schemes use separate trees for each source, e.g. MOSPF/DVMRP, or allow either, e.g. PIM.) In each of these three areas, there seem to be different answers that make sense depending on the size of the group, data rate, etc. For instance, if you have a 4 site videoconference, they probably all know who else is in the conference, and it probably makes sense to calculate the spanning tree in a unitary (i.e. non-distributed) way. Separate trees seem to be the way to go in terms of how to distribute the user data for this application, from what I understand. Larry King Live (with call-ins, so it's not all one way, a la HBO), on the other hand, would use something totally different in all these areas. To me, the best way to go for the long term is to provide a framework into which various "local" answers for all of these things fit. Separating out maintenance of group membership and tree calculation from each other, and allowing varying local answers for each, seems like part of the answer. That way, new algorithms for doing either can be deployed incrementally without great upheaval. The question is "what mechanism must be provided in a uniform system-wide way to allow this"; is multi-cast flow setup it? Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa18477; 7 Apr 94 0:04 EDT Received: from pizza by PIZZA.BBN.COM id aa02037; 6 Apr 94 23:42 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa02033; 6 Apr 94 23:40 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa14493; 6 Apr 94 23:39 EDT Received: by ginger.lcs.mit.edu id AA24966; Wed, 6 Apr 94 23:39:20 -0400 Date: Wed, 6 Apr 94 23:39:20 -0400 From: Noel Chiappa Message-Id: <9404070339.AA24966@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Source routes... Cc: jnc@ginger.lcs.mit.edu We've been requested by the IPNg Technical Requirements people to provide a list of the Nimrod requirements for IPng. I'm working on a draft for what we could submit, and I've come across something I'd like to raise. (I don't want to make any assumptions about whether or not the Internet will use Nimrod (although I think something like it will eventually be where the Internet winds up), so I can't tell them exactly what the IPng requirements will be for routing, as other schemes may need different support. However, I can tell them what Nimrod needs.) The issue has to do with source-routed packets; specifically, how one actually forwards such packets. I imagine a mechanism much like the way datagram packets work. I had imagined that if one expressed a source route in terms of, say, a high-level virtual link, the efficient and robust way to actually forward that packet would be for the nodes which are the ends of that virtual link to actually set up a flow which instantiates that virtual link. Any source-routed packet which arrives which specifies that virtual link would be assigned to that flow (i.e. you bash that flow-id into the unused flow-id field in the packet, and fire it down the flow); when it pops out the other end, the router there looks at the next element in the source route. Does this seem like a reasonable model of how to do it? Of course, any Nimrod area could internally do something different, as long as it followed the semantics of the source route, but I imagine this would be the way to go in most cases.
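A minimal sketch of the forwarding model just described, in Python; the class and field names (Packet, VirtualLinkHead, sr_pointer, setup_flow) are illustrative assumptions, not anything the Nimrod spec defines:

    # Sketch: forwarding a source-routed packet by instantiating the
    # virtual link it names as a real flow, per the model above.

    class Packet:
        def __init__(self, source_route, payload):
            self.source_route = source_route  # sequence of virtual-link names
            self.sr_pointer = 0               # forwarding state in the packet
            self.flow_id = None               # unused field, bashed in transit
            self.payload = payload

    class VirtualLinkHead:
        """Router at the near end of a high-level virtual link."""
        def __init__(self):
            self.flows = {}  # virtual-link name -> flow-id of installed flow

        def forward(self, pkt):
            vlink = pkt.source_route[pkt.sr_pointer]
            if vlink not in self.flows:
                # first packet naming this virtual link: set the flow up once
                self.flows[vlink] = self.setup_flow(vlink)
            pkt.flow_id = self.flows[vlink]  # bash flow-id into the packet
            # ...then fire it down the flow; the router at the far end
            # advances sr_pointer and looks at the next element.
            return pkt

        def setup_flow(self, vlink):
            return hash(vlink) & 0xffffffff  # stand-in for real flow setup

The point of the sketch is only that the flow is set up once, by the nodes at the ends of the virtual link, and every later source-routed packet naming that link is mapped onto it.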
Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa21957; 7 Apr 94 11:50 EDT Received: from pizza by PIZZA.BBN.COM id aa04811; 7 Apr 94 11:28 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa04807; 7 Apr 94 11:25 EDT Received: from wd40.ftp.com by BBN.COM id aa20087; 7 Apr 94 11:22 EDT Received: from mailserv-D.ftp.com by ftp.com ; Thu, 7 Apr 1994 11:21:57 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA24901; Thu, 7 Apr 94 11:21:04 EDT Date: Thu, 7 Apr 94 11:21:04 EDT Message-Id: <9404071521.AA24901@mailserv-D.ftp.com> To: jnc@ginger.lcs.mit.edu Subject: Re: Source routes... From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: nimrod-wg@BBN.COM Content-Length: 808 > Does this seem like a reasonable model of how to do it? Yes. But in keeping with Nimrod's model of letting areas do things how they please, I'd word it "the nodes which are the ends of that virtual link take the responsibility for getting the packet across the virtual link in a manner consistent with the Nimrod Architecture". If Nimrod does its best to not specify particular algorithms, etc, then saying that the two nodes set up a flow seems to be specifying an algorithm -- I could imagine that the two nodes could do true hop-by-hop forwarding, ala IPv4, if their local topology was simple enough. Where the underlying 'context' is to be general, keep the wording general, and vice versa... -- Frank Kastenholz FTP Software 2 High Street North Andover, Mass. USA 01845 (508)685-4000   Received: from PIZZA.BBN.COM by BBN.COM id aa29349; 7 Apr 94 13:57 EDT Received: from pizza by PIZZA.BBN.COM id aa05929; 7 Apr 94 13:44 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa05925; 7 Apr 94 13:41 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa28173; 7 Apr 94 13:36 EDT Received: by ginger.lcs.mit.edu id AA03490; Thu, 7 Apr 94 13:36:38 -0400 Date: Thu, 7 Apr 94 13:36:38 -0400 From: Noel Chiappa Message-Id: <9404071736.AA03490@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Nimrod IPng technical requirements text Cc: jnc@ginger.lcs.mit.edu In polling the BBN crew for points to put in this, I came across an interesting point. It was suggested that I include "multicast locators" as a requirement. This caused an immediate fault. Locators are the names of objects in the Nimrod map; you can't *have* a multicast locator! This raises an interesting question. We can have a multi-drop flow, but how do you name the set of things the multi-cast flow is delivering to (i.e. the multi-cast group)? It can't be a locator, right? It has to be an EID, the only other kind of name we've got. This kind of tends to blow a hole in the definition of an "endpoint" as a fate-sharing region, though... That does tie into something Bob Braden said the other day, which is that it's useful to think more about multi-cast applications, where the concept of "critical state" is far less useful. Maybe these are two facets of the same thing. Anyway, that would mean that we have EID's for multicast groups, and, moreover, that a single endpoint can have more than one EID. (It would be useful to be able to tell from looking at an EID whether it names a group, or a single endpoint; we can use the "top bit" hack for that.) So, I think that resolves an old open point about whether endpoints and EID's are in one-one correspondence... of course, there's still the issue of whether a single endpoint can have more than one non-multicast EID. Does this all sound OK?
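As a concrete illustration of the "top bit" hack, a minimal sketch assuming 64-bit EIDs (the width recommended in the requirements note further on); the constant and function names are made up for the example:

    # Sketch: distinguishing group names from endpoint names by the top bit.
    EID_BITS = 64
    GROUP_FLAG = 1 << (EID_BITS - 1)  # top bit set => names a multicast group

    def is_group(eid):
        return bool(eid & GROUP_FLAG)

    def make_group_eid(counter):
        # group EIDs could be handed out by a registry incrementing a counter
        return GROUP_FLAG | counter

    assert not is_group(0x2A)           # ordinary endpoint EID
    assert is_group(make_group_eid(7))  # group EID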
Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa02003; 7 Apr 94 14:38 EDT Received: from pizza by PIZZA.BBN.COM id aa06226; 7 Apr 94 14:19 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa06222; 7 Apr 94 14:18 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa00222; 7 Apr 94 14:10 EDT Received: by ginger.lcs.mit.edu id AA03954; Thu, 7 Apr 94 14:10:36 -0400 Date: Thu, 7 Apr 94 14:10:36 -0400 From: Noel Chiappa Message-Id: <9404071810.AA03954@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Source routes... Cc: jnc@ginger.lcs.mit.edu But in keeping with Nimrod's model of letting areas do things how they please, I'd word it "the nodes which are the ends of that virtual link take the responsibility for getting the packet across the virtual link in a manner consistent with the Nimrod Architecture". I thought that's what I said: "Of course, any Nimrod area could internally do something different, as long as it followed the semantics of the source route". If Nimrod does its best to not specify particular algorithms, etc, then saying that the two nodes set up a flow seems to be specifying an algorithm The reason I bring this sort of stuff up is that to the extent we have "recommended" mechanisms, those mechanisms may find support in the packet format (e.g. a flow-id they can bash locally) useful, as opposed to having to create a new header to wrap the packet for transit across their system. Also, let's be realistic; most people will simply implement what the spec suggests, not go invent some whole new mechanism! Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa02237; 7 Apr 94 14:42 EDT Received: from pizza by PIZZA.BBN.COM id aa06280; 7 Apr 94 14:25 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa06276; 7 Apr 94 14:23 EDT Received: from usc.edu by BBN.COM id aa00396; 7 Apr 94 14:14 EDT Received: from hermosa.usc.edu by usc.edu (4.1/SMI-3.0DEV3-USC+3.1) id AA21256; Thu, 7 Apr 94 11:14:03 PDT Received: by hermosa.usc.edu (4.1/SMI-4.1+ucs-3.6) id AA08573; Thu, 7 Apr 94 11:18:49 PDT Date: Thu, 7 Apr 94 11:18:49 PDT From: "Daniel M. Alexander Zappala" Message-Id: <9404071818.AA08573@hermosa.usc.edu> To: nimrod-wg@BBN.COM, jnc@ginger.lcs.mit.edu In-Reply-To: <9404071736.AA03490@ginger.lcs.mit.edu> (message from Noel Chiappa on Thu, 7 Apr 94 13:36:38 -0400) Subject: Re: Nimrod IPng technical requirements text Reply-To: daniel@catarina.usc.edu >> On Thu, 7 Apr 94 13:36:38 -0400, Noel Chiappa said: > In polling the BBN crew for points to put in this, I came across an > interesting point. It was suggested that I include "multicast locators" as a > requirement. > This caused an immediate fault. Locators are the names of objects in the > Nimrod map; you can't *have* a multicast locator! > This raises an interesting question. We can have a multi-drop flow, but how > do you name the set of things the multi-cast flow is delivering to (i.e. the > multi-cast group)? It can't be a locator, right? It has to be an EID, the only > other kind of name we've got. This kind of tends to blow a hole in the > definition of an "endpoint" as a fate-sharing region, though... > That does tie into something Bob Braden said the other day, which is that it's > useful to think more about multi-cast applications, where the concept of > "critical state" is far less useful. Maybe these are two facets of the same > thing. Why not call it a set of EIDs and give it a set-ID? Set-IDs can refer to the types of non-fatesharing entities that Bob Braden says you need to look into.
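A sketch of the set-ID idea, assuming (purely for illustration) a registry mapping each set-ID to its member EIDs; the join and leave operations capture the fact that the members do not share fate:

    # Sketch: a set-ID (SID) names a set of endpoint EIDs.
    sid_members = {}  # SID -> set of member EIDs

    def join(sid, eid):
        sid_members.setdefault(sid, set()).add(eid)

    def leave(sid, eid):
        sid_members.get(sid, set()).discard(eid)

    def long_form(sid):
        # a set can always be identified by enumerating its elements;
        # the SID is just a short alias for this long form (a point
        # picked up in the exchange which follows)
        return frozenset(sid_members.get(sid, set()))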
Daniel   Received: from PIZZA.BBN.COM by BBN.COM id aa07950; 7 Apr 94 15:55 EDT Received: from pizza by PIZZA.BBN.COM id aa06733; 7 Apr 94 15:38 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa06729; 7 Apr 94 15:35 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa06251; 7 Apr 94 15:26 EDT Received: by ginger.lcs.mit.edu id AA04740; Thu, 7 Apr 94 15:26:46 -0400 Date: Thu, 7 Apr 94 15:26:46 -0400 From: Noel Chiappa Message-Id: <9404071926.AA04740@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Nimrod IPng technical requirements text Cc: jnc@ginger.lcs.mit.edu > This raises an interesting question. We can have a multi-drop flow, but > how do you name the set of things the multi-cast flow is delivering to > (i.e. the multi-cast group)? ... It has to be an EID, the only other > kind of name we've got. ... This kind of tends to blow a hole in the > definition of an "endpoint" as a fate-sharing region, though... Why not call it a set of EIDs and give it a set-ID? Set-IDs can refer to the types of non-fatesharing entities that Bob Braden says you need to look into. Hmm, good idea. I assume there's pretty much a one-one mapping between the concept of "multicast group" and the concept of "set of endpoints", right? Here are some mechanical questions: Does everything work OK if the set-ID's (SID's) come from the same namespace as EID's (perhaps differentiated by the high bit, or something)? Is there any reason to draw them from a different namespace? Also, do packets being sent to SID's look just like packets destined to EID's? I.e., except for the different kind of destination "name", is there any different information which needs to be carried to be useful, etc? Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa12363; 7 Apr 94 17:13 EDT Received: from pizza by PIZZA.BBN.COM id aa07232; 7 Apr 94 16:46 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa07228; 7 Apr 94 16:44 EDT Received: from usc.edu by BBN.COM id aa09634; 7 Apr 94 16:22 EDT Received: from laguna.usc.edu by usc.edu (4.1/SMI-3.0DEV3-USC+3.1) id AA26531; Thu, 7 Apr 94 13:22:52 PDT Received: by laguna.usc.edu (4.1/SMI-4.1+ucs-3.6) id AA05986; Thu, 7 Apr 94 13:29:01 PDT Date: Thu, 7 Apr 94 13:29:01 PDT From: "Daniel M. Alexander Zappala" Message-Id: <9404072029.AA05986@laguna.usc.edu> To: nimrod-wg@BBN.COM, jnc@ginger.lcs.mit.edu In-Reply-To: <9404071926.AA04740@ginger.lcs.mit.edu> (message from Noel Chiappa on Thu, 7 Apr 94 15:26:46 -0400) Subject: Re: Nimrod IPng technical requirements text Reply-To: daniel@catarina.usc.edu >> On Thu, 7 Apr 94 15:26:46 -0400, Noel Chiappa said: >> This raises an interesting question. We can have a multi-drop flow, but >> how do you name the set of things the multi-cast flow is delivering to >> (i.e. the multi-cast group)? ... It has to be an EID, the only other >> kind of name we've got. ... This kind of tends to blow a hole in the >> definition of an "endpoint" as a fate-sharing region, though... > Why not call it a set of EIDs and give it a set-ID? Set-IDs can refer to > the types of non-fatesharing entities that Bob Braden says you need to look > into. > Hmm, good idea. I assume there's pretty much a one-one mapping between the > concept of "multicast group" and the concept of "set of endpoints", > right? Right. Or maybe it's a one-one mapping with a "flow"? I.e. a set of EIDs that do not share a fate but DO share routing and QOS state in the network?
> Here are some mechanical questions: > Does everything work OK if the set-ID's (SID's) come from the same namespace > as EID's (perhaps differentiated by the high bit, or something)? Is there any > reason to draw them from a different namespace? Well, a set can be identified by enumerating its elements, so the long-form of the set-ID is a listing of its constituent EIDs. Same thing as saying you can send a packet to a multicast group by sending a bunch of unicast packets. Of course you prefer to have an alias for this long name, and this alias could be assigned the way you mention. Technically, isn't using the high bit to differentiate an SID from an EID the same thing as splitting the namespace in half? The only concern is keeping enough space for EIDs. > Also, do packets being sent to SID's look just like packets destined to EID's? > I.e., except for the different kind of destination "name", is there any > different information which needs to be carried to be useful, etc? Can't think of any offhand, but it requires more thought. Since you may be treating state in the network differently for SIDs, there may be extra info. Daniel   Received: from PIZZA.BBN.COM by BBN.COM id aa14458; 7 Apr 94 17:51 EDT Received: from pizza by PIZZA.BBN.COM id aa07568; 7 Apr 94 17:33 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa07564; 7 Apr 94 17:32 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa12545; 7 Apr 94 17:16 EDT Received: by ginger.lcs.mit.edu id AA06951; Thu, 7 Apr 94 17:16:38 -0400 Date: Thu, 7 Apr 94 17:16:38 -0400 From: Noel Chiappa Message-Id: <9404072116.AA06951@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Nimrod IPng technical requirements Cc: jnc@ginger.lcs.mit.edu Coming under separate cover is a first crack at the first section of a note on IPng requirements for Nimrod. The first section has to do with packet format issues, and it's pretty well done out; please comment. The second deals with the general interaction with the rest of the internetwork layer, and is still pretty sketchy. Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa14707; 7 Apr 94 17:56 EDT Received: from pizza by PIZZA.BBN.COM id aa07577; 7 Apr 94 17:34 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa07573; 7 Apr 94 17:33 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa12602; 7 Apr 94 17:18 EDT Received: by ginger.lcs.mit.edu id AA06969; Thu, 7 Apr 94 17:18:05 -0400 Date: Thu, 7 Apr 94 17:18:05 -0400 From: Noel Chiappa Message-Id: <9404072118.AA06969@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Nimrod IPng technical requirements (text I) Nimrod and IPng Technical Requirements I don't want to make any assumptions about whether or not the Internet will use Nimrod (although I think something like it will eventually be where the Internet winds up), so I can't tell you exactly what the IPng requirements will be for routing, as other schemes may need different support. However, I can tell you what Nimrod needs. I will tackle the internetwork packet format first (which is simple), and then the whole issue of the interaction with the rest of the internetwork layer, which is a much more difficult topic. In speaking of the packet format, you first need to distinguish between the host-router part of the path, and the router-router part; a format that works OK for one may not do for another. The issue is complicated by the fact that Nimrod can be made to work, albeit not in optimal form, with information/fields missing from the packet in the first host-router hop.
The missing information/fields can be added by the first hop router. (This capability is being used to allow deployment and operation with unmodified IPv4 hosts, although similar techniques could be used with other internetworking protocols.) Access to the full range of Nimrod capabilities will require upgrading of hosts to include the necessary information in the packets they exchange with the routers. Second, Nimrod currently has three planned forwarding modes (flows, datagram, and source-routed packets), and a format that works for one may not work for another; some modes use fields that are not used by other modes. The presence or absence of these fields will make a difference. What Nimrod would like to see in the internetworking packet is: - Source and destination EID fields. These are "shortish", fixed length fields which contain globally unique, topologically insensitive identifiers for endpoints (if you aren't familiar with endpoints, think of them as hosts). A length of at least 48 bits, absolute minimum, is needed for each of these; we would strongly recommend 64. (IPv4 will be able to operate with smaller ones for a while, but will eventually need either a new packet format, or the horrendous kludgery known as Network Address Translators to allow these fields to be only locally unique.) - A globally unique flow-id. This *must not* use one of the two previous EID fields, as in datagram mode (and probably source-routed mode as well) it will be over-written during transit of the network. (The flow is also not identified using the EID's, since, again, datagram mode will not work if you do.) It could most easily be constructed by adding an EID to a locally unique flow-id; the latter should be at least 12 bits absolute minimum (which would be my "out of thin air" guess), but we would strongly recommend a minimum of 16; I would recommend 32. - A hop-count. This has to be more than 8 bits; I would strongly recommend at least 12, and recommend 16 (to make it easy to update). This is not to say that I think networks with diameters larger than 256 are good, or that we should design such nets, but I think limiting the maximum path through the network to 256 hops is likely to bite us down the road the same way making "infinity" 16 in RIP did. When we hit that ceiling, it's going to hurt, and there won't be an easy fix. I will note in passing that we are already seeing path lengths of over 30 hops. - Optional source and destination locators. These are structured, variable length items which are topologically sensitive identifiers for the place in the network to which the traffic is destined. The smallest maximum length supported should be a minimum of 32 bytes per locator, and longer would be even better; I would recommend 256 bytes per locator. - Paired with the above, an optional pointer into the locators. This is "forwarding state" (i.e. state in the packet which records something about its progress across the network) which is used in the datagram forwarding mode to ensure that the packet does not loop. It needs to be large enough to identify locations in either locator; e.g. if locators can be up to 256 bytes, it would need to be 9 bits. - An optional source route. These are used to support the "source routed packet" forwarding mode. Although not designed in detail yet, the syntax will likely look much like source routes in PIP; in Nimrod they will be a sequence of Nimrod entity identifiers, along with clues as to the context in which each identifier is to be interpreted (e.g.
up, down, across, etc). Since those identifiers themselves are variable length (although probably most will be two bytes or less, otherwise the routing overhead inside the named object would be excessive), and the hop count above contemplates the possibility of paths of over 256 hops, it would seem that these might possibly some day exceed 512 bytes, if a lengthy path was specified in terms of the actual physical assets used. - Paired with the above, an optional pointer into the source route. This is also "forwarding state". It needs to be large enough to identify locations anywhere in the source route; e.g. if the source route can be up to 1024 bytes, it would need to be 10 bits. - An internetwork header length. I mention this since the above fields could easily exceed 256 bytes; if they are all to be carried in the internetwork header (see comments below as to where to carry all this information), the header length field needs to be more than 8 bits; I recommend 16 bits. As noted above, it's possible to use Nimrod in a limited mode where needed information/fields are added by the first-hop router. It's thus useful to ask "which of the fields must be present in the host-router header, and which could be added by the router?" The only ones which are absolutely necessary in all packets are the EID's (provided that some means is available to map EID's into locators). As to the others, if the user wishes to use flows, and wants to guarantee that their packets are assigned to the correct flows, the flow-id field is needed. If the user wishes efficient datagram mode, it's probably wise to include the locators in the packet sent to the router. If the user wishes to specify the route for the packets, and does not wish to set up a flow, they need to include the source route. How would additional information/fields be added to the packet? This question is complex, since all the IPng candidates (and in fact, any reasonable inter-networking protocol) are extensible protocols; those extension mechanisms could be used. Also, it would be possible to carry some of the required information as user data in the internetworking packet, with the original user's data encapsulated further inside. Finally, a private inter-router packet format could be defined. It's not clear which path is best, but we can talk about which fields the Nimrod routers need access to, and how often; less used ones could be placed in harder-to-get-to locations (such as in an encapsulated header). The fields to which the routers need access on every hop are the flow-id and the hop-count. The locator/pointer fields are only needed at intervals (in what datagram forwarding mode calls "active" routers), as is the source route (the latter at every object which is named in the source route). Depending on how access control is done, and which forwarding mode is used, the EID's and/or locators might be examined for access control purposes, wherever that function is performed. This is not a complete exploration of the topic, but should give a rough idea of what's going on.
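To make the field budget above concrete, here is a minimal sketch which packs just the fixed-size fields using the recommended widths (64-bit EIDs, a flow-id built from a 64-bit EID plus a 32-bit locally unique part, a 16-bit hop count, a 16-bit header length). The variable-length locators, pointers, and source route are omitted, and this is an illustration of the sizes, not a proposed wire format:

    import struct

    # Network byte order: src EID (64), dst EID (64), flow-id EID part (64),
    # flow-id local part (32), hop count (16), header length (16).
    FIXED_HDR = struct.Struct("!QQQLHH")

    def pack_header(src_eid, dst_eid, flow_eid, flow_local, hops, hdr_len):
        return FIXED_HDR.pack(src_eid, dst_eid, flow_eid, flow_local,
                              hops, hdr_len)

    hdr = pack_header(0x2A, 0x2B, 0x2A, 17, 64, FIXED_HDR.size)
    assert FIXED_HDR.size == 32  # 8 + 8 + 8 + 4 + 2 + 2 bytes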
Received: from PIZZA.BBN.COM by BBN.COM id aa14737; 7 Apr 94 17:57 EDT Received: from pizza by PIZZA.BBN.COM id aa07663; 7 Apr 94 17:41 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa07659; 7 Apr 94 17:39 EDT Received: from inet-gw-3.pa.dec.com by BBN.COM id aa13089; 7 Apr 94 17:27 EDT Received: from nacto1.nacto.lkg.dec.com by inet-gw-3.pa.dec.com (5.65/21Mar94) id AA22750; Thu, 7 Apr 94 14:21:24 -0700 Received: from sneezy.nacto.lkg.dec.com by nacto1.nacto.lkg.dec.com with SMTP id AA01086; Thu, 7 Apr 1994 17:20:59 -0400 Received: by sneezy.nacto.lkg.dec.com (5.65/4.7) id AA06459; Thu, 7 Apr 1994 17:20:58 -0400 To: nimrod-wg@BBN.COM Subject: Re: Nimrod IPng technical requirements text In-Reply-To: <9404071736.AA03490@ginger.lcs.mit.edu> References: <9404071736.AA03490@ginger.lcs.mit.edu> X-Mailer: Poste 2.1 From: David R Oran Date: Thu, 7 Apr 94 17:02:22 -0400 Message-Id: <940407170222.4941@sneezy.nacto.lkg.dec.com.thomas> Encoding: 67 TEXT, 6 TEXT SIGNATURE > This raises an interesting question. We can have a multi-drop flow, but how > do you name the set of things the multi-cast flow is delivering to (i.e. the > multi-cast group)? It can't be a locator, right? It has to be an EID, the only > other kind of name we've got. This kind of tends to blow a hole in the > definition of an "endpoint" as a fate-sharing region, though... > No it doesn't have to be an EID. There's no reason why multicast groups need to (or should) share a namespace with EIDs. There are a few possibilities: a) completely separate semantics AND namespace b) separate semantics with the namespace shared between EIDs and MCGroups (this is what OSI did with multicast NSAPs - useful if the same packet field is likely to carry one or the other, but it isn't often necessary to carry both) c) EIDs and MCGroups are indistinguishable to Routers, but the participating hosts can tell. d) EIDs and MCGroups are identical. See below for more discussion of why it MIGHT matter which of these is chosen. > That does tie into something Bob Braden said the other day, which is that it's > useful to think more about multi-cast applications, where the concept of > "critical state" is far less useful. Maybe these are two facets of the same > thing. > This is true only for multicast applications with minimal "Best effort" delivery semantics. Multicast with causal or total ordering properties certainly DO have critical state! On the other hand, if we very carefully restrict ourselves to only network layer discussions (which may be obvious to you and me, but possibly not to all readers), and further agree that the network layer is not responsible for anything other than best-effort multicast delivery service, then I'm inclined to agree with Bob. > Anyway, that would mean that we have EID's for multicast groups, and, > moreover, that a single endpoint can have more than one EID. (It would be > useful to be able to tell from looking at an EID whether it names a group, or > a single endpoint; we can use the "top bit" hack for that.) > This makes me VERY nervous. If we share EID semantics between individual endpoints and multicast groups, then you can't answer the question "which EIDs are the current receivers for this multicast group". You *could* answer the question "which locators are associated with this multicast group", but then if you consider *mobile* group members then the membership mapping changes when the host machine moves.
Now, from a pragmatic viewpoint I'm not sure any of this matters terribly much since you may never want to know state simultaneously about groups, endpoint-group-participants, and locators of those participants, but history tells me that people err too frequently on the side of collapsing concepts which should be kept separate and later discover that an important degree of freedom or level of indirection has been compromised. I give as an example the discussions around whether EIDs are needed as well as addresses, when the prior architecture used one identifier for both functions! > So, I think that resolves an old open point about whether endpoints and EID's > are in one-one correspondence... of course, there's still the issue of whether > a single endpoint can have more than one non-multicast EID. > > Does this all sound OK? > No. Not yet. Dave. -+-+-+-+-+-+-+ David R. Oran Phone: + 1 508 486-7377 Digital Equipment Corporation Fax: + 1 508 486-5279 LKG 1-2/A19 Email: oran@lkg.dec.com 550 King Street Littleton, MA 01460   Received: from PIZZA.BBN.COM by BBN.COM id aa17981; 7 Apr 94 19:38 EDT Received: from pizza by PIZZA.BBN.COM id aa08471; 7 Apr 94 19:25 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa08467; 7 Apr 94 19:23 EDT Received: from wd40.ftp.com by BBN.COM id aa17197; 7 Apr 94 19:09 EDT Received: from ftp.com by ftp.com ; Thu, 7 Apr 1994 19:09:36 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Thu, 7 Apr 1994 19:09:36 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA01928; Thu, 7 Apr 94 19:08:44 EDT Date: Thu, 7 Apr 94 19:08:44 EDT Message-Id: <9404072308.AA01928@mailserv-D.ftp.com> To: jnc@ginger.lcs.mit.edu Subject: Re: Nimrod IPng technical requirements text From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: nimrod-wg@BBN.COM, jnc@ginger.lcs.mit.edu Content-Length: 1527 > This caused an immediate fault. Locators are the names of objects in the > Nimrod map; you can't *have* a multicast locator! locators name places.... there need not be an object there :-) >This raises an interesting question. We can have a multi-drop flow, but how >do you name the set of things the multi-cast flow is delivering to (i.e. the >multi-cast group)? It can't be a locator, right? It has to be an EID, the only >other kind of name we've got. This kind of tends to blow a hole in the >definition of an "endpoint" as a fate-sharing region, though... how do we name flows? is there another object in nimrodland which needs naming? i.e.
we have names for things (eids), names for places (locators), i would imagine that there also has to be a 'path' which is followed to get from one place to another -- i.e. flows. if you take a somewhat object-oriented approach to things then i would imagine that these paths have many different attributes, among them things like network qos needed, security goop, perhaps the source and destination places. if an attribute is just an attribute, then a path can have many attributes, it can have many destinations, possibly even many sources. there certainly may be optimizations to be made for certain, common, cases. but we should optimize only when we get the general principles right. or is it getting late in the day and are my brain cells starting to go to sleep? -- Frank Kastenholz FTP Software 2 High Street North Andover, Mass. USA 01845 (508)685-4000   Received: from PIZZA.BBN.COM by BBN.COM id aa18316; 7 Apr 94 19:51 EDT Received: from pizza by PIZZA.BBN.COM id aa08567; 7 Apr 94 19:35 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa08563; 7 Apr 94 19:33 EDT Received: from wd40.ftp.com by BBN.COM id aa17507; 7 Apr 94 19:22 EDT Received: from ftp.com by ftp.com ; Thu, 7 Apr 1994 19:22:09 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Thu, 7 Apr 1994 19:22:09 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA01996; Thu, 7 Apr 94 19:21:15 EDT Date: Thu, 7 Apr 94 19:21:15 EDT Message-Id: <9404072321.AA01996@mailserv-D.ftp.com> To: oran@nacto.lkg.dec.com Subject: Re: Nimrod IPng technical requirements text From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: nimrod-wg@BBN.COM Content-Length: 748 small nit...... >>Anyway, that would mean that we have EID's for multicast groups, and, >>moreover, that a single endpoint can have more than one EID. ... > Now, from a pragmatic viewpoint I'm not sure any of this matters terribly > much since you may never want to know state simultaneously about > groups, endpoint-group-participants, and locators of those participants, accounting and security and the 'are you allowed to see this? have you bought it?' lawyer types would definitely want to know this sort of stuff, in some fashion. if we assume that the internet will 'go commercial' then these interests will need to be catered to. grrrrrrrrr. -- Frank Kastenholz FTP Software 2 High Street North Andover, Mass.
USA 01845 (508)685-4000   Received: from PIZZA.BBN.COM by BBN.COM id aa24641; 8 Apr 94 22:06 EDT Received: from pizza by PIZZA.BBN.COM id aa16868; 8 Apr 94 21:50 EDT Received: from lotus.com by PIZZA.BBN.COM id aa16864; 8 Apr 94 21:48 EDT Received: from Mail.Lotus.com (crd.lotus.com) by lotus.com (4.1/SMI-4.1) id AA02062; Fri, 8 Apr 94 21:50:21 EDT Received: by Mail.Lotus.com (4.1/SMI-4.1-DNI) id AA08703; Fri, 8 Apr 94 21:54:24 EDT Date: Fri, 8 Apr 94 21:54:24 EDT From: Robert_Ullmann.LOTUS@crd.lotus.com Message-Id: <9404090154.AA08703@Mail.Lotus.com> Received: by DniMail (v1.0); Fri Apr 8 21:54:21 1994 EDT To: unixml: ;, lotus.com@crd.lotus.com MMDF-Warning: Parse error in original version of preceding line at PIZZA.BBN.COM Subject: hop limit Hi, Keep in mind that the hop limit is a log scale number.
While it is entirely expected to see the number of hops rise from ~16 to ~30 as the number of connected hosts goes from a few hundred to a few million, it isn't reasonable to then expect it to go anywhere near 256. The empirical formula seems to be max hops (the diameter, in some sense) is 2 times base 2 log of number of hosts. The model is that the worst case is a walk all the way up some hierarchy, and then all the way down some other path. The usual routes are always equal to or better than that. To get 256, we would need approximately 10^40 hosts. This is a big number. (10,000,000,000,000,000,000,000,000,000,000,000,000,000 :-) If you assumed each hop offered a branchiness of at least two, 256 hops would let you reach ~10^80 hosts, or something more than the number of neutrons in the observable universe (10^78, if I remember correctly :-) Best Regards, Robert   Received: from PIZZA.BBN.COM by BBN.COM id aa18130; 9 Apr 94 0:10 EDT Received: from pizza by PIZZA.BBN.COM id aa17310; 8 Apr 94 23:56 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa17306; 8 Apr 94 23:55 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa17749; 8 Apr 94 23:54 EDT Received: by ginger.lcs.mit.edu id AA19162; Fri, 8 Apr 94 23:54:28 -0400 Date: Fri, 8 Apr 94 23:54:28 -0400 From: Noel Chiappa Message-Id: <9404090354.AA19162@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: hop limit Cc: jnc@ginger.lcs.mit.edu Keep in mind that the hop limit is a log scale number. With a fairly evenly distributed random graph, this is true. However, I don't think this formula will apply to the network, since people tend to build all kinds of uneven meshes in practice. I think that you may recall the debate in January with Masataka Ohta, in which he proposed that the best model for the network was a planar graph, in which the average path length (and diameter) go as sqrt(N), not log(N). You can't both be right! At that time, I took a position that the average would tend toward log(N). However, my personal guess here is that, due to the non-even nature of real-world networks, while the average will be closer to log(N), the worst case could be pretty bad. I don't think there's necessarily a contradiction between my previous position, and my position here. We have to distinguish between average, and worst case. In fairly even graphs, there's not a lot of variance in the ratio of the average path length to the worst case (i.e. the diameter), for any given size. (As a side-point, my intuition says that in such fairly even graphs, as the graph gets larger, the average path length will asymptote to the diameter, since the longer paths will get you to a far larger % of the large number of total nodes, so the contributions of the shorter paths to the average will diminish. Anyone know if this is right?) Anyway, I think it's reasonable to guess that the worst case will be a lot worse than log(N), since, due to the non-even nature of real world networks, we will probably see a fair amount of variance between the average path length, and the worst case. Real-world experience shows this is accurate, at least so far. E.g., path lengths of more than 16 inside *regionals* (which is why RIP stopped working, and this *did* happen), and a reported path of more than 30 in the Internet about a year ago. In neither case was this anything like the theoretical diameter of a graph with the appropriate number of nodes.
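Along the lines of the model suggested below, here is a small statistical sketch: it compares the worst-case BFS distance in a connected random graph against a planar grid with the same number of nodes, so the log(N)-versus-sqrt(N) behaviour can be seen directly. The construction (random spanning tree plus random chords) is an assumption made just to guarantee connectivity:

    import random
    from collections import deque

    def eccentricity(adj, start):
        """Greatest BFS hop-distance from `start`."""
        dist = {start: 0}
        q = deque([start])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return max(dist.values())

    def random_graph(n, extra=2):
        """Random spanning tree plus extra*n random chords: connected."""
        adj = {i: set() for i in range(n)}
        for i in range(1, n):
            j = random.randrange(i)
            adj[i].add(j); adj[j].add(i)
        for _ in range(extra * n):
            a, b = random.sample(range(n), 2)
            adj[a].add(b); adj[b].add(a)
        return adj

    def grid_graph(side):
        """side x side planar mesh; its diameter is 2*(side-1)."""
        adj = {i: set() for i in range(side * side)}
        for r in range(side):
            for c in range(side):
                i = r * side + c
                if c + 1 < side: adj[i].add(i + 1); adj[i + 1].add(i)
                if r + 1 < side: adj[i].add(i + side); adj[i + side].add(i)
        return adj

    random.seed(1)
    for side in (8, 16, 32):
        n = side * side
        print(n, eccentricity(random_graph(n), 0),  # grows roughly log(N)
              eccentricity(grid_graph(side), 0))    # grows as sqrt(N)

One could extend this by adding a few random long-distance links to the grid and watching the diameter fall, which is exactly the experiment that would show how quickly non-planar links pull a planar graph toward random-graph behaviour.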
While it is entirely expected to see the number of hops rise from ~16 to ~30 as the number of connected hosts goes from a few hundred to a few million, it isn't reasonable to then expect it to go anywhere near 256. The empirical formula seems to be max hops (the diameter, in some sense) is 2 times base 2 log of number of hosts. ... To get 256, we would need approximately 10^40 hosts. ... least two, 256 hops would let you reach ~10^80 hosts Real-world experience, as above, shows that this formula does not apply to the worst case. We may get closer to that as the network gets larger (the variance from the theoretical average is likely to decline), but I think we would be *very* unwise to run that chance; it's one I don't want to take. I'd personally take almost any bet that we'll see path lengths of larger than 256 before 2020, which is within the expected lifetime of IPng. Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa21149; 9 Apr 94 2:10 EDT Received: from pizza by PIZZA.BBN.COM id aa17756; 9 Apr 94 1:58 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa17752; 9 Apr 94 1:56 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa20643; 9 Apr 94 1:55 EDT Received: by ginger.lcs.mit.edu id AA19968; Sat, 9 Apr 94 01:55:00 -0400 Date: Sat, 9 Apr 94 01:55:00 -0400 From: Noel Chiappa Message-Id: <9404090555.AA19968@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Nimrod IPng technical requirements text Cc: jnc@ginger.lcs.mit.edu how do we name flows? With flow-id's... is there another object in nimrodland which needs naming? I can't think of one offhand, but my brain is pretty run down... i would imagine that there also has to be a 'path' which is followed to get from one place to another -- i.e. flows. Hmm. Will we need to name paths, separate from flows? Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa22006; 9 Apr 94 2:44 EDT Received: from pizza by PIZZA.BBN.COM id aa17923; 9 Apr 94 2:33 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa17919; 9 Apr 94 2:31 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa21670; 9 Apr 94 2:31 EDT Received: by ginger.lcs.mit.edu id AA20355; Sat, 9 Apr 94 02:31:10 -0400 Date: Sat, 9 Apr 94 02:31:10 -0400 From: Noel Chiappa Message-Id: <9404090631.AA20355@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Multicast group names Cc: jnc@ginger.lcs.mit.edu >> Why not call it a set of EIDs and give it a set-ID? > I assume there's pretty much a one-one mapping between the > concept of "multicast group" and the concept of "set of endpoints", > right? Right. Or maybe it's a one-one mapping with a "flow"? I.e. a set of EIDs that do not share a fate but DO share routing and QOS state in the network? No, a flow (either unicast or multicast) to me has the general meaning of a path through the network with some user requirement info (all stored as non-critical state in routers) attached to it. This is a whole different thing from simply the destination(s) of a flow (which is what EID's and SID's are); you can have several different flows to the same destination(s), for example. > Does everything work OK if the set-ID's (SID's) come from the same > namespace as EID's (perhaps differentiated by the high bit, or > something)? Is there any reason to draw them from a different namespace? Well, a set can be identified by enumerating its elements, so the long-form of the set-ID is a listing of its constituent EIDs. ... Of course you prefer to have an alias for this long name, and this alias could be assigned the way you mention.
You'd want that alias for sticking in the packet headers; the long form wouldn't be practical for groups of any size at all. Technically, isn't using the high bit to differentiate an SID from an EID the same thing as splitting the namespace in half? The only concern is keeping enough space for EIDs. Yes, we keep the syntax the same, but split the semantics, as Dave Oran pointed out. I dunno, maybe it's a SID if the top N bits are one, or something, but the principle's the same: SID's are drawn from the same namespace as EID's, but you can tell just from looking at one whether it's an SID or an EID. Of course, they name totally different sorts of things, too. Does this sound like the right thing to everyone? > Also, do packets being sent to SID's look just like packets destined to > EID's? I.e., except for the different kind of destination "name", is there > any different information which needs to be carried to be useful, etc? Since you may be treating state in the network differently for SIDs, there may be extra info. Well, I dunno. I'm not sure the routers will know anything about SID's and EID's; they will deal with flows. Perhaps some of the multicast group maintenance mechanisms (and to the extent that we have distributed spanning tree computations, those as well) will need to know about SID's? Maybe not, and they can just work with the multicast flow-ids.... Have to think about that. Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa23848; 11 Apr 94 6:58 EDT Received: from pizza by PIZZA.BBN.COM id aa01851; 11 Apr 94 6:37 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa01847; 11 Apr 94 6:34 EDT Received: from necom830.cc.titech.ac.jp by BBN.COM id aa23166; 11 Apr 94 6:32 EDT Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Mon, 11 Apr 94 19:25:59 +0900 From: Masataka Ohta Return-Path: Message-Id: <9404111026.AA22168@necom830.cc.titech.ac.jp> Subject: Re: hop limit To: Noel Chiappa Date: Mon, 11 Apr 94 19:25:58 JST Cc: nimrod-wg@BBN.COM, jnc@ginger.lcs.mit.edu In-Reply-To: <9404090354.AA19162@ginger.lcs.mit.edu>; from "Noel Chiappa" at Apr 8, 94 11:54 pm X-Mailer: ELM [version 2.3 PL11] > I think that you may recall the debate in January with Masataka Ohta, in which > he proposed that the best model for the network was a planar graph, in which > the average path length (and diameter) go as sqrt(N), not log(N). You can't > both be right! After thinking about the problem, I have noticed that the problem relates to a flat-rate distance. Between nodes located within the flat-rate distance, topology can be (but not necessarily is) truly random, in which case the hop count scales O(log(N)). Beyond the distance, it scales O(sqrt(N)). In Japan, the distance is 15Km. Not so large. Considering that the maximum arc length on the Earth is 20,000Km, we need a hop count of 1300. > At that time, I took a position that the average would tend toward log(N). You are assuming tree-like topology here, not a random graph nor mesh. That is, your position is biased with the current fact that a small number of T3 routers can handle all the backbone traffic and the backbone is mostly tree structured rather than full mesh. Such topology scales O(log(N)). As network traffic increases, we need a lot of routers handling global traffic, in which case we do need a mesh of routers and the hop count will be close to O((size of the Earth)/(flat-rate distance)). Of course, if we pay extra money to connect distant routers, the required maximum hop count becomes smaller.
Such extra connections are also necessary to reduce the load of routers. But, wiring does cost and the improvement is propotional to the cose paid. So, I must concludes that the maximum hop count of 255 is unsafe. Masataka Ohta   Received: from PIZZA.BBN.COM by BBN.COM id aa12607; 11 Apr 94 10:59 EDT Received: from pizza by PIZZA.BBN.COM id aa03059; 11 Apr 94 10:44 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa03055; 11 Apr 94 10:41 EDT Received: from wd40.ftp.com by BBN.COM id aa11319; 11 Apr 94 10:39 EDT Received: from ftp.com by ftp.com ; Mon, 11 Apr 1994 10:39:10 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Mon, 11 Apr 1994 10:39:10 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA29191; Mon, 11 Apr 94 10:38:11 EDT Date: Mon, 11 Apr 94 10:38:11 EDT Message-Id: <9404111438.AA29191@mailserv-D.ftp.com> To: jnc@ginger.lcs.mit.edu Subject: Re: Nimrod IPng technical requirements text From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: nimrod-wg@BBN.COM, jnc@ginger.lcs.mit.edu Content-Length: 758 > how do we name flows? > > With flow-id's... Well, at the end of my note I said: > or is it getting late in the day and are my brain cells starting to go > to sleep? I guess it was late and my brain cells were asleep.... However, being reminded that those things what I was arguing that we needed to name are FLOWS, it would seem that my original thoughts were sort of on line. IF FLOWS have their own name space, i.e. there is a separate, distinct, unique, FLOWID in the packet, and that name is not 'derived' from the source/dest EIDs of the packet, then we could view the endpoints of the flow as attributes of the flow, just as things like qos. -- Frank Kastenholz FTP Software 2 High Street North Andover, Mass. USA 01845 (508)685-4000   Received: from PIZZA.BBN.COM by BBN.COM id aa15343; 11 Apr 94 11:35 EDT Received: from pizza by PIZZA.BBN.COM id aa03294; 11 Apr 94 11:14 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa03290; 11 Apr 94 11:12 EDT Received: from uucp9.netcom.com by BBN.COM id aa13471; 11 Apr 94 11:11 EDT Received: from localhost by netcomsv.netcom.com with UUCP (8.6.4/SMI-4.1) id IAA29839; Mon, 11 Apr 1994 08:11:33 -0700 Received: from cc:Mail UUCPLINK 2.0 by metrico.metricom.com id 9403117660.AA766072762 Mon, 11 Apr 94 06:59:22 Date: Mon, 11 Apr 94 06:59:22 From: Greg_Campbell@metrico.metricom.com Message-Id: <9403117660.AA766072762@metrico.metricom.com> To: kasten@ftp.com Cc: nimrod-wg@BBN.COM, jnc@ginger.lcs.mit.edu Content-Length: 758 Subject: cc:Mail UUCPLINK 2.0 Undeliverable Message User metrico!rfox is not defined Original text follows ----------------------------------------- Received: by ccmail Received: from netcomsv by metricom.com (UUPC/extended 1.11) with UUCP; Mon, 11 Apr 1994 06:57:46 PDT Received: from PIZZA.BBN.COM by netcomsv.netcom.com with SMTP (8.6.4/SMI-4.1) id IAA09628; Mon, 11 Apr 1994 08:02:08 -0700 Received: from pizza by PIZZA.BBN.COM id aa03059; 11 Apr 94 10:44 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa03055; 11 Apr 94 10:41 EDT Received: from wd40.ftp.com by BBN.COM id aa11319; 11 Apr 94 10:39 EDT Received: from ftp.com by ftp.com ; Mon, 11 Apr 1994 10:39:10 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Mon, 11 Apr 1994 10:39:10 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA29191; Mon, 11 Apr 94 10:38:11 EDT Date: Mon, 11 Apr 94 10:38:11 EDT Message-Id: <9404111438.AA29191@mailserv-D.ftp.com> To: jnc@ginger.lcs.mit.edu Subject: Re: Nimrod IPng technical requirements text From: Frank Kastenholz X-ccAdmin: 
Received: from PIZZA.BBN.COM by BBN.COM id aa19073; 11 Apr 94 12:34 EDT Received: from pizza by PIZZA.BBN.COM id aa03658; 11 Apr 94 12:18 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa03654; 11 Apr 94 12:16 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa17711; 11 Apr 94 12:11 EDT Received: by ginger.lcs.mit.edu id AA06931; Mon, 11 Apr 94 12:11:05 -0400 Date: Mon, 11 Apr 94 12:11:05 -0400 From: Noel Chiappa Message-Id: <9404111611.AA06931@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: hop limit Cc: jnc@ginger.lcs.mit.edu

After thinking about the problem, I have noticed that the problem relates to a flat-rate distance. Between nodes located within the flat-rate distance, topology can be (but not necessarily is) truly random, in which case the hop count scales O(log(N)). Beyond the distance, it scales O(sqrt(N)).

If I understand you correctly, you are saying that i) in local regions of the network (e.g. a city), the graph of the network will probably be (to a reasonable approximation) random, but that ii) when you look at the network at a high level (e.g. global), the graph will probably be (to a reasonable approximation) planar?

I think there is something to this; I think the network graph will be more randomly connected locally. However, I'm still not sure that, at the global level, it will be close to the planar end of the random<->planar spectrum. It's just too easy to make non-planar links, and I think they act very quickly to reduce the diameter toward the size predicted for random graphs.

Actually, if someone feels really energetic, they could try creating a model, and see how quickly the diameter of the graph changes, from the value predicted for planar graphs, to the value predicted for random graphs. I think it's probably easier to do this with statistical models than by mathematical analysis.

> At that time, I took a position that the average would tend toward log(N).

You are assuming a tree-like topology here, not a random graph nor a mesh. That is, your position is biased by the current fact that a small number of T3 routers can handle all the backbone traffic and the backbone is mostly tree structured rather than full mesh.

Not really, I don't think. The model I was working with was closer to yours. There would definitely be a hierarchy of carriers and links, but, like the road network, it would still be a mesh.

Of course, if we pay extra money to connect distant routers, the required maximum hop count becomes smaller. Such extra connections are also necessary to reduce the load of routers. But, wiring does cost and the improvement is proportional to the cost paid.

Good point.
Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa21319; 11 Apr 94 13:11 EDT Received: from pizza by PIZZA.BBN.COM id aa03851; 11 Apr 94 12:53 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa03847; 11 Apr 94 12:51 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa19784; 11 Apr 94 12:45 EDT Received: by ginger.lcs.mit.edu id AA07385; Mon, 11 Apr 94 12:45:11 -0400 Date: Mon, 11 Apr 94 12:45:11 -0400 From: Noel Chiappa Message-Id: <9404111645.AA07385@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Nimrod IPng technical requirements text Cc: jnc@ginger.lcs.mit.edu

Move the "flag" that this is a multicast EID out of the EID and make it a real flag bit. Advantages are 1) this doubles the name space rather than halving it, 2) there is a specific flag that specifies the semantics (which has implications for future extension), 3) you can assign multicast ids through some central authority simply by incrementing a counter.

Hmm. 1) I don't think is that important. For 2), I'd assume that we'd assign the same semantics to the top bit if we did it that way, and if there's only one bit, it's kind of hard to do extended semantics. (Maybe the lesson here is that if the top bit is 0, it's an EID, if the top bits are 1111, it's a SID, and other values are reserved for future use?) For 3), can't you do the same even with the encoding scheme I just proposed?

Disadvantages are ... this may have implications for source routing (does it make sense to source route through a multicast?)

I didn't think source routes were going to contain EID's/SID's anyway. I assumed they were going to contain locators (and locator elements, for compact expression; you don't need the whole locator anyway, just an up/across/down bit and the element).

It could be reasonable to have a multicast group that anyone may join or leave at random. The other extreme is the group that is completely controlled from a central location. It would be nice if the creation of a multicast group could carry with it information about the security level of the multicast.

Good point!

My goal would be to allow the router to do EID addition/removal to the multicast group in those situations that are entirely open.

Right, but we need a more general mechanism that will allow users to do the group entry control. Since their policies may not be publicly stateable, it has to be like route selection; capable of being moved totally outside.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa22176; 11 Apr 94 13:25 EDT Received: from pizza by PIZZA.BBN.COM id aa04079; 11 Apr 94 13:13 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa04075; 11 Apr 94 13:11 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa21064; 11 Apr 94 13:06 EDT Received: by ginger.lcs.mit.edu id AA07766; Mon, 11 Apr 94 13:06:31 -0400 Date: Mon, 11 Apr 94 13:06:31 -0400 From: Noel Chiappa Message-Id: <9404111706.AA07766@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Nimrod IPng technical requirements text Cc: jnc@ginger.lcs.mit.edu

we could view the endpoints of the flow as attributes of the flow, just as things like QoS are.

I don't think the internetwork is really going to know *all* the attributes of a flow; there may be policy stuff only the source knows, etc. I'll have to think about whether it's useful to think of the endpoints of a flow as attributes of the flow.
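Returning for a moment to the SID/EID tag rule floated earlier in the day ("top bit 0, it's an EID; top bits 1111, it's a SID; other values reserved"), one possible rendering in code, as a sketch only: the 64-bit width is an assumption, since no identifier size has been settled anywhere in this thread.

    # Sketch of the tag rule: top bit 0 -> EID; top four bits all 1 ->
    # SID; anything else reserved. The 64-bit width is an assumption.
    WIDTH = 64
    TOP_BIT = 1 << (WIDTH - 1)
    TOP4 = 0xF << (WIDTH - 4)

    def classify(ident):
        if ident & TOP_BIT == 0:
            return "EID"
        if ident & TOP4 == TOP4:
            return "SID"
        return "reserved"

    assert classify(42) == "EID"
    assert classify(TOP4 | 7) == "SID"
    assert classify(TOP_BIT | 1) == "reserved"   # 10... patterns

Note that under this rule the EID space gives up one bit (half the namespace) and SID's get a sixteenth of it, which is the space trade-off being debated above.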
Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa23423; 11 Apr 94 13:46 EDT Received: from pizza by PIZZA.BBN.COM id aa04213; 11 Apr 94 13:29 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa04209; 11 Apr 94 13:28 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa22054; 11 Apr 94 13:23 EDT Received: by ginger.lcs.mit.edu id AA08248; Mon, 11 Apr 94 13:23:50 -0400 Date: Mon, 11 Apr 94 13:23:50 -0400 From: Noel Chiappa Message-Id: <9404111723.AA08248@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: hop limit Cc: jnc@ginger.lcs.mit.edu

I read once that the diameter of the human population was pretty small. ... The surprising fact is that according to this metric the distance is said to be pretty small - typically around 5. I have no hard proof, though.

I've heard this too. However, humans are typically very richly connected. So, maybe in the log(N) thing, the base of the log is some function of the average connectivity of the nodes. Thus, a graph with very richly connected nodes (on average) will have a smaller diameter than a graph with sparsely connected nodes. Good point....

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa24500; 11 Apr 94 14:02 EDT Received: from pizza by PIZZA.BBN.COM id aa04327; 11 Apr 94 13:46 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa04323; 11 Apr 94 13:44 EDT Received: from wd40.ftp.com by BBN.COM id aa22972; 11 Apr 94 13:37 EDT Received: from ftp.com by ftp.com ; Mon, 11 Apr 1994 13:37:09 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Mon, 11 Apr 1994 13:37:09 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA01948; Mon, 11 Apr 94 13:36:14 EDT Date: Mon, 11 Apr 94 13:36:14 EDT Message-Id: <9404111736.AA01948@mailserv-D.ftp.com> To: jnc@ginger.lcs.mit.edu Subject: Re: Nimrod IPng technical requirements text From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: nimrod-wg@BBN.COM Content-Length: 859

> I'll have to think about whether it's useful to think of the endpoints of a flow as attributes of the flow.

If you do then multicast and unicast are the same thing.

However, there may be a security aspect here. If flows are set up a la RSVP's 'receiver pull' mechanism, then you and I could be engaged in a one-to-one conversation and any random person could join into the flow and they could read our conversation. Every flow would need some sort of admission control attributes in order to prevent this sort of eavesdropping. If flows are set up in a 'transmitter push' mechanism then one node could be in a position to 'force' a flow onto another node -- but this is not really any worse than a node, today, just sending packets to arbitrary IP addresses.

-- Frank Kastenholz FTP Software 2 High Street North Andover, Mass. USA 01845 (508)685-4000

Received: from PIZZA.BBN.COM by BBN.COM id aa26527; 11 Apr 94 14:33 EDT Received: from pizza by PIZZA.BBN.COM id aa04555; 11 Apr 94 14:13 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa04551; 11 Apr 94 14:11 EDT Received: from bridge2.NSD.3Com.COM by BBN.COM id aa24843; 11 Apr 94 14:07 EDT Received: from remmel.NSD.3Com.COM by bridge2.NSD.3Com.COM with SMTP id AA20943 (5.65c/IDA-1.4.4nsd for ); Mon, 11 Apr 1994 11:07:36 -0700 Received: from localhost.NSD.3Com.COM by remmel.NSD.3Com.COM with SMTP id AA28977 (5.65c/IDA-1.4.4-910725); Mon, 11 Apr 1994 11:07:35 -0700 Message-Id: <199404111807.AA28977@remmel.NSD.3Com.COM> To: Noel Chiappa Cc: nimrod-wg@BBN.COM Subject: Re: hop limit In-Reply-To: Your message of "Mon, 11 Apr 94 13:23:50 EDT."
<9404111723.AA08248@ginger.lcs.mit.edu> Date: Mon, 11 Apr 94 11:07:33 -0700 From: tracym@nsd.3com.com

> I read once that the diameter of the human population was pretty small. ... The surprising fact is that according to this metric the distance is said to be pretty small - typically around 5. I have no hard proof, though.
>
> I've heard this too. However, humans are typically very richly connected. So, maybe in the log(N) thing, the base of the log is some function of the average connectivity of the nodes. Thus, a graph with very richly connected nodes (on average) will have a smaller diameter than a graph with sparsely connected nodes. Good point....

I'd heard a number more like 10 (in grade school), but I'd guess that the "typically" is important. If you step back and only allow connections that have existed in the last month or three, then there may well be strings of outliers that increase the actual maximum diameter greatly. It is easy to imagine strung-out serial topologies of all kinds that shouldn't be precluded.

Tracy

Received: from PIZZA.BBN.COM by BBN.COM id aa28537; 11 Apr 94 15:04 EDT Received: from pizza by PIZZA.BBN.COM id aa04802; 11 Apr 94 14:46 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa04797; 11 Apr 94 14:44 EDT Received: from inet-gw-1.pa.dec.com by BBN.COM id aa26946; 11 Apr 94 14:39 EDT Received: from nacto1.nacto.lkg.dec.com by inet-gw-1.pa.dec.com (5.65/21Mar94) id AA18515; Mon, 11 Apr 94 11:34:09 -0700 Received: from sneezy.nacto.lkg.dec.com by nacto1.nacto.lkg.dec.com with SMTP id AA28799; Mon, 11 Apr 1994 14:33:30 -0400 Received: by sneezy.nacto.lkg.dec.com (5.65/4.7) id AA08497; Mon, 11 Apr 1994 14:31:55 -0400 To: Noel Chiappa Cc: nimrod-wg@BBN.COM Subject: Re: hop limit In-Reply-To: <9404111723.AA08248@ginger.lcs.mit.edu> References: <9404111723.AA08248@ginger.lcs.mit.edu> X-Mailer: Poste 2.1 From: David R Oran Date: Mon, 11 Apr 94 14:31:54 -0400 Message-Id: <940411143154.4941@sneezy.nacto.lkg.dec.com.thomas> Encoding: 52 TEXT, 6 TEXT SIGNATURE

> I read once that the diameter of the human population was pretty small. ... The surprising fact is that according to this metric the distance is said to be pretty small - typically around 5. I have no hard proof, though.
>
> I've heard this too. However, humans are typically very richly connected. So, maybe in the log(N) thing, the base of the log is some function of the average connectivity of the nodes. Thus, a graph with very richly connected nodes (on average) will have a smaller diameter than a graph with sparsely connected nodes. Good point....

There's a famous paper on this, but it wasn't for the whole human population. I wish I remembered the name of the phenomenon... Basically the study started with a mathematician "x" (after whom the metric was named) and looked at who wrote a paper with him. Any of those mathematicians were at distance 1. Then they looked at mathematicians who wrote a paper with someone who wrote a paper with "x". They were at distance 2. etc. At distance 7, all published mathematicians were in the set.

In all this discussion, I'm surprised that nobody has used the international phone system as the analogue. The maximum diameter of the phone system today is (I think) 8 (but maybe still 7), going perhaps to 9 by the year 2010. That's for 2*10**8 phones.
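Those figures can be sanity-checked against the "base of the log is the connectivity" idea above: a roughly tree-shaped switching hierarchy of N nodes with effective fanout k has diameter on the order of log(N)/log(k). A back-of-the-envelope sketch in Python, where the fanout values are arbitrary assumptions:

    # Diameter ~ log(N)/log(k) for N nodes with effective fanout k.
    import math

    N = 2e8                             # the phone count cited above
    for k in (4, 8, 16, 32):
        print(k, round(math.log(N) / math.log(k), 1))
    # prints roughly: 4 -> 13.8 hops, 8 -> 9.2, 16 -> 6.9, 32 -> 5.5

    print(round(N ** (1 / 8), 1))       # ~10.9: fanout implied by diameter 8

By this crude measure, a diameter of 8 for 2*10**8 phones implies an effective fanout of around 11; the log(N) position amounts to betting that the fanout stays well above 2 as the network grows.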
Now, it might not be safe to assume that the "information superhighway", combined with all its dirt roads, will evolve like the phone system, but it seems unlikely that a world-wide internet would function reasonably at all at diameters much larger than 20 or so. For one thing, the store-and-forward delays would completely fry any real-time traffic. If bits were the only problem here, we shouldn't argue about an extra byte for hop count.

On the other hand, by having a large hop dynamic range, you have an interesting conundrum: If you count down, what do you start at? Today it doesn't matter too much since the hop count is really only there to stamp out looping packets and *not* enforce MPL. With a 16 bit hop count, packets can loop for long enough to consume LOTS of resources. Unfortunately, there's no motivation for host system manglers to set the hop count to a reasonable number in a count-down system. A count-up system is even worse - remember DECnet Phase III.

A compromise might be to allocate 16 bits but in host requirements set an absolute upper limit for now of 255. On the other hand, I'm not sure I'd want to work on a network with a diameter over 255. Can you spell "message-switching?"

Dave.

-+-+-+-+-+-+-+ David R. Oran Phone: + 1 508 486-7377 Digital Equipment Corporation Fax: + 1 508 486-5279 LKG 1-2/A19 Email: oran@lkg.dec.com 550 King Street Littleton, MA 01460
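Dave's compromise is easy to state concretely; a minimal sketch in Python, where the constant and function names are made up for illustration and nothing here is a worked-out wire format:

    # Sketch of the compromise: a 16-bit hop-count field on the wire,
    # with host requirements clamping the initial value to 255 for now.
    HOP_FIELD_MAX = 0xFFFF    # what the wire format could carry
    HOP_ADMIN_MAX = 255       # what host requirements would permit today

    def initial_hop_count(requested):
        # hosts may ask for more, but get clamped to the admin limit
        return min(requested, HOP_ADMIN_MAX)

    def forward(hops_remaining):
        # count-down model: decrement, discard at zero to kill loopers
        if hops_remaining <= 1:
            return None           # drop the packet
        return hops_remaining - 1

This keeps the dynamic range for the future while bounding how long a looping packet can live today.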
Received: from PIZZA.BBN.COM by BBN.COM id aa16678; 12 Apr 94 1:09 EDT Received: from pizza by PIZZA.BBN.COM id aa07887; 12 Apr 94 0:56 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa07883; 12 Apr 94 0:54 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa16179; 12 Apr 94 0:54 EDT Received: by ginger.lcs.mit.edu id AA13261; Tue, 12 Apr 94 00:54:22 -0400 Date: Tue, 12 Apr 94 00:54:22 -0400 From: Noel Chiappa Message-Id: <9404120454.AA13261@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: hop limit Cc: jnc@ginger.lcs.mit.edu

it seems unlikely that a world-wide internet would function reasonably at all at diameters much larger than 20 or so. For one thing, the store-and-forward delays would completely fry any real-time traffic.

Hmm, not sure I quite see this. I know how to build a switch which, running at a modest clock rate of 10 Mhz, will switch a 40 byte packet in 4 usec. Assuming a 100 Mbit/sec network, add 4 usec input time, and 4 usec output time (assuming idle interfaces). That gives us 12 usec or so, at least for small packets. (Larger packets get real complex, as depending on relative input and output speeds, you may be able to do "cut through" routing, and overlap input and output times...) Now, the circumference of the earth is about 40K kilometres, so speed of light (300K kilometres per second) delay half way 'round (20K Km) is 65 msec. So, 1000 hops at 12 usec per would add 12 msec, or less than a fifth of the (rather inevitable :-) minimum propagation delay....

On the other hand, by having a large hop dynamic range, you have an interesting conundrum: If you count down, what do you start at? Today it doesn't matter too much since the hop count is really only there to stamp out looping packets ... With a 16 bit hop count, packets can loop for long enough to consume LOTS of resources. Unfortunately, there's no motivation for host system manglers to set the hop count to a reasonable number in a count-down system. A count-up system is even worse

Big-I talked about this topic a while back, and decided the "right" thing was to have the host find out what number to stick in there from the routers. As a first cut, the routers would give back the diameter of the network, times a safety factor. This could still allow a certain amount of looping if the diameter gets big. If you want to really make it tight, have the router return a number which is the path length to the destination, times a safety factor. I'd be happy with the first, at least with Nimrod, since hopefully looping packets would be Really Rare, and if the mechanism to catch them is not really efficient, it's not the end of the world.
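Concretely, the scheme just described might look like the following; a sketch under assumed interfaces, since no such router query exists and both calls on the router object are hypothetical:

    # Sketch: the host asks its first-hop router for a starting value.
    # router.diameter() and router.path_length() are hypothetical calls.
    SAFETY = 2  # arbitrary safety factor

    def starting_hops(router, dest=None):
        if dest is None:
            # first cut: network diameter times a safety factor
            return router.diameter() * SAFETY
        # tighter cut: path length to this destination times the factor
        return router.path_length(dest) * SAFETY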
Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa17209; 12 Apr 94 1:27 EDT Received: from pizza by PIZZA.BBN.COM id aa07976; 12 Apr 94 1:14 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa07972; 12 Apr 94 1:12 EDT Received: from necom830.cc.titech.ac.jp by BBN.COM id aa16780; 12 Apr 94 1:11 EDT Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Tue, 12 Apr 94 14:06:14 +0900 From: Masataka Ohta Return-Path: Message-Id: <9404120506.AA27165@necom830.cc.titech.ac.jp> Subject: Re: hop limit To: Noel Chiappa Date: Tue, 12 Apr 94 14:06:12 JST Cc: nimrod-wg@BBN.COM, jnc@ginger.lcs.mit.edu In-Reply-To: <9404111611.AA06931@ginger.lcs.mit.edu>; from "Noel Chiappa" at Apr 11, 94 12:11 pm X-Mailer: ELM [version 2.3 PL11]

> I think there is something to this; I think the network graph will be more randomly connected locally. However, I'm still not sure that, at the global level, it will be close to the planar end of the random<->planar spectrum. It's just too easy to make non-planar links, and I think they act very quickly to reduce the diameter toward the size predicted for random graphs.

True. It is quite easy if the only purpose is the reduction of the diameter. Add a negligibly small number of long-distance links. That's all. Then, all the long-distance communication will use that small number of links. Thus, the links will be overloaded.

> You are assuming a tree-like topology here, not a random graph nor a mesh. That is, your position is biased by the current fact that a small number of T3 routers can handle all the backbone traffic and the backbone is mostly tree structured rather than full mesh.

This part explains it.

> Not really, I don't think.

You do think so, at least partially. As I have pointed out several times already, you haven't paid enough attention to link load concentration issues everywhere in the NIMROD specification.

> The model I was working with was closer to yours.

I hope so.

> Of course, if we pay extra money to connect distant routers, the required maximum hop count becomes smaller. Such extra connections are also necessary to reduce the load of routers. But, wiring does cost and the improvement is proportional to the cost paid.
>
> Good point.

What? If you can understand this part, you should have been able to understand the whole issue. Strange.

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa17996; 12 Apr 94 1:57 EDT Received: from pizza by PIZZA.BBN.COM id aa08082; 12 Apr 94 1:44 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa08078; 12 Apr 94 1:42 EDT Received: from necom830.cc.titech.ac.jp by BBN.COM id aa17660; 12 Apr 94 1:42 EDT Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Tue, 12 Apr 94 14:35:52 +0859 From: Masataka Ohta Return-Path: Message-Id: <9404120536.AA27352@necom830.cc.titech.ac.jp> Subject: Re: hop limit To: David R Oran Date: Tue, 12 Apr 94 14:35:50 JST Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM In-Reply-To: <940411143154.4941@sneezy.nacto.lkg.dec.com.thomas>; from "David R Oran" at Apr 11, 94 2:31 pm X-Mailer: ELM [version 2.3 PL11]

> was named) and looked at who wrote a paper with him. Any of those mathematicians were at distance 1. Then they looked at mathematicians who wrote a paper with someone who wrote a paper with "x". They were at distance 2. etc.
>
> At distance 7, all published mathematicians were in the set.

I'm afraid you ignore mathematicians who never wrote a co-authored paper.
Anyway, if you want to solve the material mail routing problem between mathematicians, do it with the PTT, not here.

> In all this discussion, I'm surprised that nobody has used the international phone system as the analogue.

Because it can't be an analogue. Moreover, as we have a real model, we don't need any analogue.

> The maximum diameter of the phone system today is (I think) 8 (but maybe still 7), going perhaps to 9 by the year 2010. That's for 2*10**8 phones.

The phone system today aggregates a lot of 64Kbps communication into a 2.4Gbps backbone, which is not the case for the future internet. Or are you satisfied with UUCP over a 14,400 bps modem forever?

> Now, it might not be safe to assume that the "information superhighway",

That's a terrible assumption.

> combined with all its dirt roads, will evolve like the phone system, but it seems unlikely that a world-wide internet would function reasonably at all at diameters much larger than 20 or so. For one thing, the store-and-forward delays would completely fry any real-time traffic.

Use ATM, in a way people in the ATM Forum never imagined. See draft-ohta-ip-over-atm-00.txt on how you can do cell-by-cell relaying on routers.

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa22240; 12 Apr 94 12:12 EDT Received: from pizza by PIZZA.BBN.COM id aa10432; 12 Apr 94 11:53 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa10428; 12 Apr 94 11:51 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa20581; 12 Apr 94 11:49 EDT Received: by ginger.lcs.mit.edu id AA16418; Tue, 12 Apr 94 11:48:55 -0400 Date: Tue, 12 Apr 94 11:48:55 -0400 From: Noel Chiappa Message-Id: <9404121548.AA16418@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: hop limit Cc: jnc@ginger.lcs.mit.edu

> It's just too easy to make non-planar links, and I think they act very quickly to reduce the diameter toward the size predicted for random graphs.

It is quite easy if the only purpose is the reduction of the diameter. Add a negligibly small number of long-distance links. That's all. Then, all the long-distance communication will use that small number of links. Thus, the links will be overloaded.

I have two reactions. First, if the charging policy is at all related to real traffic loads, the extra revenue from all that traffic should enable you to put more capacity in place. Slowly things will stabilize with the number of non-planar long-distance links which are needed to handle the long-distance traffic.

Second, I'm not sure what your model for traffic distribution is, but my model is that there's probably going to be an inverse relationship between the distance between two communicating nodes, and the amount of traffic. This says to me that it's perfectly OK to have a graph which is not as thoroughly connected at the long-distance scale as it is locally, since there will be relatively less long-distance traffic than local.

You haven't paid enough attention to link load concentration issues everywhere in the NIMROD specification.

Load concentrations are things I worry about a lot, but I think there are lots of good reasons to think that the coming information infrastructure will be enough of a mesh to minimize massive hot-spots. The same technology and economic trends that are driving supercomputers toward lots of parallel, relatively slow machines will operate in networking. Also, lots of parallel links and switches will produce a more robust infrastructure.
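The inverse-relationship model a couple of paragraphs up can be made concrete with a toy calculation; a sketch only, under the assumptions that nodes sit at unit spacing and that pairwise traffic falls off as 1/distance:

    # Toy model: traffic between a pair of nodes falls off as 1/distance,
    # so total demand out to reach D grows like the harmonic sum H(D),
    # i.e. only logarithmically. All numbers are illustrative.
    def demand_within(reach):
        return sum(1.0 / d for d in range(1, reach + 1))

    print(demand_within(10))       # ~2.93
    print(demand_within(10000))    # ~9.79: 1000x the reach, ~3.3x the demand

If anything like this holds, the long-haul part of the mesh can be much sparser than the local part without the long-haul links melting, which is the crux of the disagreement here.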
You did have a good point that we have to make sure the routing will scale well in a system that looks like this, but this issue gets looked at a lot now that you have raised it, and I think techniques like high-level virtual links will allow us to reduce the complexity of the high-level map, without losing the ability to spread the load across the multitude of parallel real physical links which make up that high-level virtual link.

>> Of course, if we pay extra money to connect distant routers, the required maximum hop count becomes smaller. Such extra connections are also necessary to reduce the load of routers. But, wiring does cost and the improvement is proportional to the cost paid.

> Good point.

What? If you can understand this part, you should have been able to understand the whole issue. Strange.

I thought that any system in which the improvement is proportional to the cost is a pretty good system. If the users want better service, they pay more money, and what they get for their money is proportional to the money they spend. Sounds good to me...

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa11658; 12 Apr 94 16:36 EDT Received: from pizza by PIZZA.BBN.COM id aa12076; 12 Apr 94 16:19 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa12072; 12 Apr 94 16:15 EDT Received: from quern.epilogue.com by BBN.COM id aa09348; 12 Apr 94 16:08 EDT To: nimrod-wg@BBN.COM Subject: comments on draft routing architecture document Date: Tue, 12 Apr 94 16:08:20 EDT From: dab@epilogue.com Sender: dab@epilogue.com Message-ID: <9404121608.aa23761@quern.epilogue.com>

Now that I'm back home for a day or two it's time to write up some comments I had on the Nimrod Routing Architecture document. Isidro asked me to write up something on bottom up locators; that'll follow along a little later. These are just the easy comments.

In section "1.1 Constraints", constraint 1, you write that the Internet "will retain general organization of backbone, regional, and local networks". One of the big wins of Nimrod to my mind was that it didn't require this sort of organization of the network. I think that if we give people an internetworking layer that can only work in this manner then that's the structure the network will take. I don't believe that's the best structure for the network. I also think that if we start with that assumption then we'll develop an internetworking layer that provides only that ability. Look at the geographic vs provider addressing debates to see what results.

In constraint 7 you write that "the frequency at which an entity moves is usually inversely proportional to the size of the entity, e.g., individual hosts are likely to move around more frequently than entire networks". While this may be true, it's an average over the entire network and I'm not sure it's useful. For a given host or network it may be quite likely to move around. For instance, the probability of the network on an aircraft carrier moving is very high. I believe that we'll need mechanisms that make it as easy as possible for networks to move as well as hosts.

Then in section "5. Renumbering" you say "Because renumbering will, most likely, be infrequent and carefully planned ...". I don't think I believe this premise. For mobile networks I expect renumbering to be quite frequent though perhaps planned. I also expect the network to require renumbering once in a while. Not as a carefully planned thing necessarily but just because the net's getting bigger.
Depending on how the locators are done, this could require renumbering large parts of the network that really have no connection with wherever is forcing the renumbering. In a sufficiently large network, such renumbering requests could be very frequent. I'd suggest, obviously, that we do the numbering in such a way as to avoid this problem if we can, but I'd say that the statement that it will be infrequent is premature until we know how we're doing the numbering.

Dave Bridgham

Received: from PIZZA.BBN.COM by BBN.COM id aa14373; 13 Apr 94 7:07 EDT Received: from pizza by PIZZA.BBN.COM id aa15609; 13 Apr 94 6:49 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa15605; 13 Apr 94 6:48 EDT Received: from mitsou.inria.fr by BBN.COM id aa13637; 13 Apr 94 6:48 EDT Received: by mitsou.inria.fr (5.65c8/IDA-1.2.8) id AA23988; Wed, 13 Apr 1994 12:49:45 +0200 Message-Id: <199404131049.AA23988@mitsou.inria.fr> To: Masataka Ohta Cc: nimrod-wg@BBN.COM Subject: Re: hop limit In-Reply-To: Your message of "Wed, 13 Apr 1994 19:09:22 +0200." <9404131009.AA03675@necom830.cc.titech.ac.jp> Date: Wed, 13 Apr 1994 12:49:44 +0200 From: Christian Huitema

=> My model?
=>
=> Some amount of communication will be within a city.
=>
=> Most communication will be within a single economic unit such as a country, EC or North America.
=>
=> There will be some small, but not negligible, amount of truly global traffic.

This is the classic telecommunication model. But are you sure that it will remain valid in the long run? What strikes all observers of the Internet is the "global village" effect: e.g. I exchange this mail with you, although we are located in different countries, separated by a rather large distance.

On the hop limit per se: the reason why we have a hop limit in the packets at all is loop protection, i.e. making sure that if a loop exists the packet will roll at most "N" times before it is killed. Obviously, this is not very helpful if the maximum number of hops is very large. One would then have to use innovative techniques, e.g. count of traversed networks.

Christian Huitema

Received: from PIZZA.BBN.COM by BBN.COM id aa13143; 13 Apr 94 6:29 EDT Received: from pizza by PIZZA.BBN.COM id aa15441; 13 Apr 94 6:16 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa15437; 13 Apr 94 6:15 EDT Received: from necom830.cc.titech.ac.jp by BBN.COM id aa12716; 13 Apr 94 6:15 EDT Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Wed, 13 Apr 94 19:09:23 +0900 From: Masataka Ohta Return-Path: Message-Id: <9404131009.AA03675@necom830.cc.titech.ac.jp> Subject: Re: hop limit To: Noel Chiappa Date: Wed, 13 Apr 94 19:09:22 JST Cc: nimrod-wg@BBN.COM, jnc@ginger.lcs.mit.edu In-Reply-To: <9404121548.AA16418@ginger.lcs.mit.edu>; from "Noel Chiappa" at Apr 12, 94 11:48 am X-Mailer: ELM [version 2.3 PL11]

> > It's just too easy to make non-planar links, and I think they act very quickly to reduce the diameter toward the size predicted for random graphs.
>
> It is quite easy if the only purpose is the reduction of the diameter. Add a negligibly small number of long-distance links. That's all. Then, all the long-distance communication will use that small number of links. Thus, the links will be overloaded.
>
> I have two reactions. First, if the charging policy is at all related to real traffic loads, the extra revenue from all that traffic should enable you to put more capacity in place.
> Slowly things will stabilize with the number of non-planar long-distance links which are needed to handle the long-distance traffic.

Wrong. It will make the cost so high that networking will die. But, don't mind. It won't be the case. Vendors with larger numbers of routers (and, thus, larger hop counts) can offer cheaper service and others will be kicked out of the market.

> Second, I'm not sure what your model for traffic distribution is, but my model is that there's probably going to be an inverse relationship between the distance between two communicating nodes, and the amount of traffic.

My model?

Some amount of communication will be within a city.

Most communication will be within a single economic unit such as a country, EC or North America.

There will be some small, but not negligible, amount of truly global traffic.

We should investigate what today's telephone traffic pattern is. But, at the same time, we should think that people in the future will behave much more globally.

> This says to me that it's perfectly OK to have a graph which is not as thoroughly connected at the long-distance scale as it is locally, since there will be relatively less long-distance traffic than local.

To me, it is perfectly OK to have a graph which is not directly connected at the long-distance scale.

> You haven't paid enough attention to link load concentration issues everywhere in the NIMROD specification.
>
> Load concentrations are things I worry about a lot, but I think there are lots of good reasons to think that the coming information infrastructure will be enough of a mesh to minimize massive hot-spots.

You don't understand the load concentration issue. Your goal is wrong from the beginning. To minimize the massive hot-spots, we should have a rooted tree topology, where there is only a single, but very hot, spot, which melts the network. The proper goal is to increase the number of hot spots, which makes the spots colder.

> The same technology and economic trends that are driving supercomputers toward lots of parallel, relatively slow machines will operate in networking.

You misunderstand supercomputers. On supercomputers, minimizing latency is an important goal. But making the system large costs superlinearly. MPP costs a lot more than the cost of its components. For communication between distant locations, no one can expect so little latency, because the speed of light is not fast enough. Your approach is economically infeasible.

> Also, lots of parallel links and switches will produce a more robust infrastructure.

Parallel links or parallel processors share their fate too much of the time to be robust. For robustness, we should geographically distribute them.

> You did have a good point that we have to make sure the routing will scale well in a system that looks like this, but this issue gets looked at a lot,

And gets overlooked a lot.

> now that you have raised it, and I think techniques like high-level virtual links will allow us to reduce the complexity of the high-level map, without losing the ability to spread the load across the multitude of parallel real physical links which make up that high-level virtual link.

High-level virtual links will help to reduce hop counts, if they are applicable. But can you imagine some cases when such links are not available? For example, how can you establish such links?

> I thought that any system in which the improvement is proportional to the cost is a pretty good system.
> If the users want better service, they pay more money, and what they get for their money is proportional to the money they spend. Sounds good to me...

People don't want to pay more money when it is avoidable.

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa25150; 13 Apr 94 10:14 EDT Received: from pizza by PIZZA.BBN.COM id aa16444; 13 Apr 94 10:00 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa16440; 13 Apr 94 9:57 EDT Received: from [131.112.4.4] by BBN.COM id aa23952; 13 Apr 94 9:55 EDT Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Wed, 13 Apr 94 22:49:24 +0900 From: Masataka Ohta Return-Path: Message-Id: <9404131349.AA04503@necom830.cc.titech.ac.jp> Subject: Re: hop limit To: Christian Huitema Date: Wed, 13 Apr 94 22:49:22 JST Cc: nimrod-wg@BBN.COM In-Reply-To: <199404131049.AA23988@mitsou.inria.fr>; from "Christian Huitema" at Apr 13, 94 12:49 pm X-Mailer: ELM [version 2.3 PL11]

> => My model?
> =>
> => Some amount of communication will be within a city.
> =>
> => Most communication will be within a single economic unit such as a country, EC or North America.
> =>
> => There will be some small, but not negligible, amount of truly global traffic.
>
> This is the classic telecommunication model. But are you sure that it will remain valid in the long run? What strikes all observers of the Internet is the "global village" effect: e.g. I exchange this mail with you, although we are located in different countries, separated by a rather large distance.

I have completely agreed with you already:

: We should investigate what today's telephone traffic pattern is. But, at the same time, we should think that people in the future will behave much more globally.

> On the hop limit per se: the reason why we have a hop limit in the packets at all is loop protection, i.e. making sure that if a loop exists the packet will roll at most "N" times before it is killed. Obviously, this is not very helpful if the maximum number of hops is very large.

Currently, the maximum allowed is 255. Moreover, the old default value of 30 is now too small. So, I'm using 60 on important mail servers (is there any new IAB recommended value?).

With such a large TTL, if packets loop and multiply exponentially, TTL is meaningless. So, let's assume that packets may loop but do not multiply, and that the effect of a large TTL is only linear.

Moreover, current workstations are powerful enough that even FDDI can be saturated by a pair of them. So, if packets are sent without some handshaking, the current TTL is already large enough that looping packets can saturate the network anyway. Fortunately, most of the protocols, including TCP, won't generate many packets unless handshaking succeeds. So, packets may loop, but the network won't be saturated, because no handshake signal will be returned. Thus, a loop does not consume a lot of network bandwidth. TTL, in this case, is useful only to prevent really infinite looping of packets. Then, isn't a MAXTTL of 4095 acceptable?

> One would then have to use innovative techniques, e.g. count of traversed networks.

TTL these days means exactly that.

Masataka Ohta

Received: from PIZZA.BBN.COM by BBN.COM id aa10022; 14 Apr 94 13:58 EDT Received: from pizza by PIZZA.BBN.COM id aa24831; 14 Apr 94 13:41 EDT Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa24827; 14 Apr 94 13:37 EDT To: nimrod-wg@BBN.COM Subject: Dave's questions Date: Thu, 14 Apr 94 13:37:27 -0400 From: Martha Steenstrup

Hi Dave,

Mea culpa. I'm responsible for the constraints, and so I'll try to justify them.
First, a general statement about the internetworking constraints listed in the draft. These represent predictions of what the Internet will look like over the next ten years and hence the expected environment in which Nimrod would have to operate. These predictions are based on the current state of the Internet, on the growth trends observed over the recent past, and on the predictions of others in the Internet community. However, they are only predictions, not guarantees of the future Internet environment. Please, if these constraints do not constitute a reasonable and complete set, speak up so that we can fix the draft.

We expect to use the constraints in two ways. The first is to define the environment in which Nimrod must work. As we are not going to be perfect predictors of the future Internet, we want Nimrod to be flexible enough to accommodate a variety of network topologies, services, and users. At the very least, Nimrod must be able to handle an internetwork in which the constraints listed in the draft hold. However, the draft does not claim that Nimrod will not work in an environment in which only some of these constraints hold, or that Nimrod will not work in an environment in which additional constraints hold.

The second way we expect to use the constraints is in defining the common cases and hence to help in making engineering tradeoffs when we design the protocol details. For example, if we have two protocols, the first of which is very efficient for the expected common case (like the mobility constraint 7 you mentioned) and not very efficient for the rarer case, and the second of which is less efficient than the first in the common case but more efficient than the first in the rarer case, we may end up opting for the protocol which does the best in the common case.

Do you think constraints 1 and 7 should be removed? or made more strict? Please let us know.

Thanks, m

Received: from PIZZA.BBN.COM by BBN.COM id aa21565; 15 Apr 94 9:41 EDT Received: from pizza by PIZZA.BBN.COM id aa29903; 15 Apr 94 9:15 EDT Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa29899; 15 Apr 94 9:12 EDT To: minutes@cnri.reston.va.us, nimrod-wg@BBN.COM Subject: Nimrod IETF minutes Date: Fri, 15 Apr 94 09:09:26 -0400 From: Isidro Castineyra

Seattle IETF Meeting Proceedings Routing Area Nimrod BOF Current Meeting Report Reported by: J. Noel Chiappa and Isidro Castineyra (BBN)

Minutes of The New Internet Routing And Addressing Architecture BOF (NIMROD)

The objective of this BOF is to design NIMROD: a hierarchical, map-based, routing architecture. Nimrod's stated purpose is to manage in a scalable fashion the trade-off between the amount of information about the network and route quality. A rough draft architecture document was distributed to the group's mailing list in preparation for this meeting. The main purpose of the meeting was the review of the draft architecture document and the preparation of the workplan for the next meeting, scheduled to take place during IETF30. The group met on Tuesday and Wednesday, from 0930 to 1200.

On Tuesday, Isidro Castineyra presented the contents of the draft architecture document. The presentation covered the stated objectives of Nimrod and its main features, and gave an overview of its mechanisms. The following are among the issues raised by the attendees:

1. Mobility

The question that was raised was whether internetworks (nodes in Nimrod parlance) are mobile.
In response to this it was said that in Nimrod nodes are mobile, but that Nimrod does not propose, at this time, a mechanism to support mobility. The draft architecture suggests ways in which Nimrod can support current approaches to mobility.

2. Node expansion

In Nimrod, a node in a map can be expanded, substituting its internal map for the node. The question was raised of when one should look inside a node for more information. This question was added to the open issue list.

3. What is an endpoint

The draft says that an endpoint represents a user of the network layer---a transport layer entity. The question was if this means that TCP/UDP are two endpoints. Chiappa answered that an entity that has an end-to-end connection is an endpoint. It was noted that the concept of entity in the draft should be better defined.

4. EIDs and ELs

The draft proposes two forms of endpoint identifier: the EID (endpoint identifier) and the endpoint labels. The first one is a relatively short bit string, while the second one is more like a DNS name. The question was raised whether both these forms are necessary. It was noted that though the ELs are necessary to perform a distributed look-up, they should not be part of the architecture proper. ELs can be considered a user-interface problem.

5. Multiple EIDs per endpoint

The draft permits an endpoint to have more than one EID. The question was raised whether this was necessary. It was pointed out that there is no apparent way to enforce a single EID per endpoint.

6. Arcs' attributes

The draft defines maps as consisting of arcs and nodes. The arcs are later defined to have attributes. The question is whether it is necessary for an arc to have attributes, as it is more common to have the attributes residing in nodes. It was noted that both models have the same power of representation and that the distinction was cosmetic, but it was agreed that the next version of the draft would try to conform to the more common representation.

7. Connectivity specifications dynamics

Connectivity specifications describe the capabilities of a node. The question was raised whether these specifications are dynamic---that is, whether, for example, they indicate the current load of an element of the network. It was pointed out that dynamic specifications might not scale. It was also pointed out that a specification could have different parts with different degrees of dynamism, and that each part could be distributed differently.

8. Border points

Nodes have border points to which arcs attach. The question was raised of why border points are necessary. It was answered that border points are used to be able to separate the internal description of a node (its internal map) from its connection to the outside.

9. Bidirectional arcs

The architecture uses both unidirectional and multipoint arcs. The question was raised of why bidirectional arcs were not included. It was pointed out that a bidirectional arc can be represented with either unidirectional or multipoint arcs.

On Wednesday a set of issues was chosen for discussion by the group:

A. Arcs and nodes: different representations
B. When to look inside of a node
C. Dynamics of connectivity specifications
D. Workplan

The group decided to continue refining the architecture document using the output of this meeting and discussions on the mailing list. Work on the protocols should also start in this period.
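For readers trying to visualize the map model that items 2, 6, 8, and 9 argue over, one possible rendering in code follows; a sketch only, since Nimrod prescribes none of these types, and keeping the attributes on arcs rather than nodes is exactly the open issue from item 6:

    # Sketch of the map model: nodes with border points and an optional
    # internal map, joined by unidirectional arcs carrying attributes.
    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class Arc:
        src_border: str                 # border point the arc leaves from
        dst_border: str                 # border point the arc attaches to
        attrs: dict = field(default_factory=dict)   # connectivity specs

    @dataclass
    class Node:
        locator: str
        border_points: list = field(default_factory=list)
        internal_map: Optional["Map"] = None    # expanded on demand (item 2)

    @dataclass
    class Map:
        nodes: dict = field(default_factory=dict)   # locator -> Node
        arcs: list = field(default_factory=list)

    def bidirectional(a, b, attrs):
        # Item 9: a bidirectional link is just a pair of unidirectional arcs.
        return [Arc(a, b, dict(attrs)), Arc(b, a, dict(attrs))]

The border points are what let a node's internal map be separated from its external connections (item 8): arcs attach to border points, never directly to the internals.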
Received: from PIZZA.BBN.COM by BBN.COM id aa01667; 15 Apr 94 12:41 EDT Received: from pizza by PIZZA.BBN.COM id aa01300; 15 Apr 94 12:22 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa01294; 15 Apr 94 12:20 EDT Received: from quern.epilogue.com by BBN.COM id aa00354; 15 Apr 94 12:17 EDT To: msteenst@BBN.COM CC: nimrod-wg@BBN.COM In-reply-to: Martha Steenstrup's message of Thu, 14 Apr 94 13:37:27 -0400 <9404141352.aa06677@quern.epilogue.com> Subject: Dave's questions Date: Fri, 15 Apr 94 12:15:22 EDT From: dab@epilogue.com Sender: dab@epilogue.com Message-ID: <9404151215.aa13078@quern.epilogue.com>

Date: Thu, 14 Apr 94 13:37:27 -0400 From: Martha Steenstrup

Do you think constraints 1 and 7 should be removed? or made more strict? Please let us know.

My opinion is that constraint 1 should be removed because I think it makes it too easy to come up with easy answers that later inhibit a mesh network. In fact, I'd prefer to explicitly work towards a network that assumes that cross links at the "leaves" are the norm rather than the exception.

For constraint 7 I can't decide. You didn't say this, but it looks like an awful easy step to go from what you did say to a design decision that makes it more difficult for nets to move than hosts. After all, it's less likely. It could end up that we simply can't do better than making mobile nets harder than mobile hosts. But I wouldn't want to start with that as a target.

Dave

Received: from PIZZA.BBN.COM by BBN.COM id aa16283; 18 Apr 94 11:31 EDT Received: from pizza by PIZZA.BBN.COM id aa15648; 18 Apr 94 11:11 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa15644; 18 Apr 94 11:07 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa14948; 18 Apr 94 11:06 EDT Received: by ginger.lcs.mit.edu id AA27350; Mon, 18 Apr 94 11:06:21 -0400 Date: Mon, 18 Apr 94 11:06:21 -0400 From: Noel Chiappa Message-Id: <9404181506.AA27350@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Dave's questions Cc: jnc@ginger.lcs.mit.edu

My opinion is that constraint 1 ["The Internet ... will retain the general organizational structure of backbone, regional, and local networks"] should be removed because I think it makes it too easy to come up with easy answers that later inhibit a mesh network. In fact, I'd prefer to explicitly work towards a network that assumes that cross links at the "leaves" are the norm rather than the exception.

I agree completely. In my comments (which I need to polish, and put up for review), I said "I expect the general organization style, a loose confederation of autonomous entities, will continue, and the RA must be flexible enough to support this, while still scaling". We have to build something that can support a network constructed like that, as something of a random mesh (although it won't be totally a random mesh, obviously).

For constraint 7 ["The frequency at which an entity moves is usually inversely proportional to the size of the entity"] I can't decide. You didn't say this, but it looks like an awful easy step to go from what you did say to a design decision that makes it more difficult for nets to move than hosts. After all, it's less likely. It could end up that we simply can't do better than making mobile nets harder than mobile hosts. But I wouldn't want to start with that as a target.

I tend to agree that we shouldn't limit ourselves, but two things give me pause. First, it seems like it's very likely to be harder (you need more mechanism to make N things do something, than just one thing).
Second, I'm really wary of "second-system/kitchen-sink" disease; in fact, I'm sure many people think Nimrod already has it Big Time! (I'd agree with them, if we weren't trying to develop a routing architecture that had the flexibility to grow and change without a massive disruption to the underlying network substrate every few years.) Trying to make moving a network as easy, and important, as moving hosts is an ambitious goal that makes me nervous.

Anyway, I'm not saying I disagree with you here; I think we should think about it, but I have to think about it some more.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa18502; 18 Apr 94 12:09 EDT Received: from pizza by PIZZA.BBN.COM id aa15945; 18 Apr 94 11:49 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa15941; 18 Apr 94 11:47 EDT Received: from ftp.com by BBN.COM id aa17116; 18 Apr 94 11:44 EDT Received: from ftp.com by ftp.com ; Mon, 18 Apr 1994 11:44:21 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Mon, 18 Apr 1994 11:44:21 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA21983; Mon, 18 Apr 94 11:43:16 EDT Date: Mon, 18 Apr 94 11:43:16 EDT Message-Id: <9404181543.AA21983@mailserv-D.ftp.com> To: jnc@ginger.lcs.mit.edu Subject: Re: Dave's questions From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: nimrod-wg@BBN.COM, jnc@ginger.lcs.mit.edu Content-Length: 2993

> For constraint 7 ["The frequency at which an entity moves is usually inversely proportional to the size of the entity"] I can't decide. You didn't say this, but it looks like an awful easy step to go from what you did say to a design decision that makes it more difficult for nets to move than hosts. After all, it's less likely. It could end up that we simply can't do better than making mobile nets harder than mobile hosts. But I wouldn't want to start with that as a target.
>
> I tend to agree that we shouldn't limit ourselves, but two things give me pause. First, it seems like it's very likely to be harder (you need more mechanism to make N things do something, than just one thing).
>
> Second, I'm really wary of "second-system/kitchen-sink" disease; in fact, I'm sure many people think Nimrod already has it Big Time!

Consider that things like mobile networks and mobile internetworks are probably going to happen with IPng. I think that Nimrod must be able to allow an entire network or internetwork to move. I do specifically see this as allowing an entire network/internetwork to move, as opposed to just greater than 1 host moving (whatever the difference may be).

For instance, the US Navy is working on developing networks for all of its ships, planes, and shore installations. So, if each plane is a network, and each ship is a network, then all of the planes on an aircraft carrier are each networked, connected to the carrier's net, which would be connected to one (or more) shore installations' networks. As the carrier moves around on the ocean, it will move from one shore base's area/network to another's -- as will all of the planes on the carrier's flight deck. Of course, the planes may also move around, independently of the carrier, so their networks will have to move both as a result of the carrier moving, AND as a result of the plane taking off and flying someplace else. (And it gets even worse when you consider that the Navy tends to operate its ships in battlegroups, which may represent networks, and various ships and planes may be assigned to the bg, entering and later leaving the bg's network...)

This also applies to the civilian world.
Consider, for example, an airliner or a train with a net connection providing services to laptop computers. Or cars (automakers are looking to replace the multitude of point-to-point control wires with a network...).

Finally, one can look at mobility as being an aspect of the same problem as changing providers. In each case, an element of the network's topology is detached from the net's topology and reconnected someplace else. If this problem is solved then you solve the mobility and changing-provider problems. Both physical movement (mobility) and changing provider are merely reasons why the network element changes where in the topology it connects.

-- Frank Kastenholz FTP Software 2 High Street North Andover, Mass. USA 01845 (508)685-4000

Received: from PIZZA.BBN.COM by BBN.COM id aa14453; 19 Apr 94 2:09 EDT Received: from pizza by PIZZA.BBN.COM id aa20616; 19 Apr 94 1:54 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa20608; 19 Apr 94 1:51 EDT Received: from necom830.cc.titech.ac.jp by BBN.COM id aa13814; 19 Apr 94 1:48 EDT Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Tue, 19 Apr 94 14:40:32 +0900 From: Masataka Ohta Return-Path: Message-Id: <9404190540.AA01688@necom830.cc.titech.ac.jp> Subject: Re: Dave's questions To: kasten@ftp.com Date: Tue, 19 Apr 94 14:40:31 JST Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM In-Reply-To: <9404181543.AA21983@mailserv-D.ftp.com>; from "Frank Kastenholz" at Apr 18, 94 11:43 am X-Mailer: ELM [version 2.3 PL11]

> Consider that things like mobile networks and mobile internetworks are probably going to happen with IPng. I think that Nimrod must be able to allow an entire network or internetwork to move. I do specifically see this as allowing an entire network/internetwork to move, as opposed to just greater than 1 host moving (whatever the difference may be).
>
> For instance, the US Navy is working on developing networks for all of its ships, planes, and shore installations. So, if each plane is a network, and each ship is a network, then all of the planes on an aircraft carrier are each networked, connected to the carrier's net, which would be connected to one (or more) shore installations' networks. As the carrier moves around on the ocean, it will move from one shore base's area/network to another's -- as will all of the planes on the carrier's flight deck. Of course, the planes may also move around, independently of the carrier, so their networks will have to move both as a result of the carrier moving, AND as a result of the plane taking off and flying someplace else. (And it gets even worse when you consider that the Navy tends to operate its ships in battlegroups, which may represent networks, and various ships and planes may be assigned to the bg, entering and later leaving the bg's network...)

So? What's wrong with RIP or OSPF?

> This also applies to the civilian world. Consider, for example, an airliner or a train with a net connection providing services to laptop computers. Or cars (automakers are looking to replace the multitude of point-to-point control wires with a network...).

If you try to solve the mobility issue of up to 4G individuals now living on the Earth through the routing mechanism, the routing table instantly explodes.

Forget it.
Masataka Ohta   Received: from PIZZA.BBN.COM by BBN.COM id aa09985; 19 Apr 94 9:42 EDT Received: from pizza by PIZZA.BBN.COM id aa22263; 19 Apr 94 9:29 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa22259; 19 Apr 94 9:27 EDT Received: from wd40.ftp.com by BBN.COM id aa08840; 19 Apr 94 9:23 EDT Received: from ftp.com by ftp.com ; Tue, 19 Apr 1994 09:23:33 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Tue, 19 Apr 1994 09:23:33 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA03534; Tue, 19 Apr 94 09:22:30 EDT Date: Tue, 19 Apr 94 09:22:30 EDT Message-Id: <9404191322.AA03534@mailserv-D.ftp.com> To: mohta@necom830.cc.titech.ac.jp Subject: Re: Dave's questions From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM Content-Length: 2206 > > Consider that things like mobile networks and mobile internetworks > > are probably going to happen with IPng. I think that Nimrod must be > > able to allow an entire network or internetwork to move. I > > specifically see this as allowing an entire network/internetwork to > > move, as opposed to just more than one host moving (whatever the > > difference may be). > So? What's wrong with RIP or OSPF? The routing protocol to use is irrelevant. The real problem is: does the architecture allow only the leaves of the tree to move around on the topology, or does it allow entire subtrees to move around? Also, there is the re-locatoring problem -- if you assume that movement is limited to the leaves of the tree, then it makes sense to allow individual leaves to ask some service "what's my new locator?". If you allow entire subtrees to move around, this might not work -- there might be thousands of leaves in the subtree that just moved -- having them all ask "where am I?" at one time might kill the local network(s) and overload various servers. You'd also have the problem of telling all these nodes that they moved... So, if you allow subtrees to move, you might want to do the relocatoring by (for example) having the routers advertise onto their local networks what the locator prefix for that subnet is, rather than having nodes ask for the prefix when they come up. > > This also applies to the civilian world. Consider, for example, an > > airliner or a train with a net connection providing services to > > laptop computers. Or cars (automakers are looking to replace the > > multitude of point-to-point control wires with a network...). > > If you try to solve the mobility issue for each of the up to 4G people now > living on the Earth through the routing mechanism, the routing table > instantly explodes. > > Forget it. Only if the routing tables have to hold entries for all 4G people. One of the major goals of Nimrod is to allow aggregation of routes, reducing the amount of routing information that has to be kept in the routing tables at any one point in the network. -- Frank Kastenholz FTP Software 2 High Street North Andover, Mass. USA 01845 (508)685-4000
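To make the advertise-the-prefix idea above concrete, here is a toy sketch. Everything in it -- the Node class, names like "carrier.deck", the dotted locator strings -- is hypothetical illustration, not anything specified by Nimrod: the router broadcasts the subtree's current locator prefix, each node derives its own locator from the advertisement, and a subtree move then costs one advertisement instead of thousands of "where am I?" queries.

    # Toy sketch of prefix advertisement for subtree moves (hypothetical names).
    class Node:
        def __init__(self, local_element):
            self.local_element = local_element  # this node's own locator element
            self.locator = None

        def hear_advertisement(self, prefix):
            # Recompute this node's locator from the advertised prefix.
            self.locator = prefix + "." + self.local_element

    nodes = [Node("host%d" % i) for i in range(1000)]

    def router_advertise(prefix):
        # One advertisement onto the local network reaches every node.
        for n in nodes:
            n.hear_advertisement(prefix)

    router_advertise("carrier.deck")    # the subtree as first attached
    router_advertise("shore2.pier")     # the whole subtree "moves"
    assert nodes[0].locator == "shore2.pier.host0"

The point of the sketch is only that the per-move cost is one broadcast, independent of the number of leaves in the subtree.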
Received: from PIZZA.BBN.COM by BBN.COM id aa24370; 19 Apr 94 22:40 EDT Received: from pizza by PIZZA.BBN.COM id aa27181; 19 Apr 94 22:28 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa27177; 19 Apr 94 22:25 EDT Received: from necom830.cc.titech.ac.jp by BBN.COM id aa20555; 19 Apr 94 22:23 EDT Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Wed, 20 Apr 94 11:17:18 +0900 From: Masataka Ohta Return-Path: Message-Id: <9404200217.AA05852@necom830.cc.titech.ac.jp> Subject: Re: Dave's questions To: kasten@ftp.com Date: Wed, 20 Apr 94 11:17:16 JST Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM In-Reply-To: <9404191322.AA03534@mailserv-D.ftp.com>; from "Frank Kastenholz" at Apr 19, 94 9:22 am X-Mailer: ELM [version 2.3 PL11] > > > Consider that things like mobile networks and mobile internetworks > > > are probably going to happen with IPng. I think that Nimrod must be > > > able to allow an entire network or internetwork to move. I > > > specifically see this as allowing an entire network/internetwork to > > > move, as opposed to just more than one host moving (whatever the > > > difference may be). > > > So? What's wrong with RIP or OSPF? > > The routing protocol to use is irrelevant. > > The real problem is: does the architecture allow only the leaves of the > tree to move around on the topology, or does it allow entire subtrees > to move around? Also, there is the re-locatoring problem -- Aha, I agree. Your issue is related to the fact that nimrod cannot support locator change. > if you > assume that movement is limited to the leaves of the tree, then it > makes sense to allow individual leaves to ask some service "what's my > new locator?". The more important question would be "what's someone else's locator?", I think. Even if there exists some protocol to ask it, it will kill the global network. So far, I don't think nimrod works in its proposed form. > Only if the routing tables have to hold entries for all 4G people. One of > the major goals of Nimrod is to allow aggregation of routes, reducing the > amount of routing information that has to be kept in the routing > tables at any one point in the network. In general, people move at random. Around Tokyo, more than 10M people move daily between their homes and offices, with very weak correlation between the locations of home and office. There are a lot of nationwide or international travellers who also travel randomly. Of course there is a backbone of transportation, but the correlation between source and destination is mostly random, and no meaningful aggregation is possible.
Masataka Ohta   Received: from PIZZA.BBN.COM by BBN.COM id aa03918; 20 Apr 94 9:14 EDT Received: from pizza by PIZZA.BBN.COM id aa29539; 20 Apr 94 9:00 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa29535; 20 Apr 94 8:58 EDT Received: from wd40.ftp.com by BBN.COM id aa02486; 20 Apr 94 8:54 EDT Received: from ftp.com by ftp.com ; Wed, 20 Apr 1994 08:54:56 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Wed, 20 Apr 1994 08:54:56 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA17562; Wed, 20 Apr 94 08:53:53 EDT Date: Wed, 20 Apr 94 08:53:53 EDT Message-Id: <9404201253.AA17562@mailserv-D.ftp.com> To: mohta@necom830.cc.titech.ac.jp Subject: Re: Dave's questions From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM Content-Length: 4436 > > The real problem is: does the architecture allow only the leaves of the > > tree to move around on the topology, or does it allow entire subtrees > > to move around? Also, there is the re-locatoring problem -- > > Aha, I agree. Your issue is related to the fact that nimrod cannot > support locator change. I assume by this you mean "Nimrod cannot support changing the locator of a node in the network graph"? True, in the Nimrod architecture document there is no specification of a protocol that does this. But the document specifies an architecture, not protocols. Section 5 of the document discusses renumbering. What is broken about it? > > if you > > assume that movement is limited to the leaves of the tree, then it > > makes sense to allow individual leaves to ask some service "what's my > > new locator?". > > The more important question would be "what's someone else's locator?", > I think. Even if there exists some protocol to ask it, it will kill the > global network. This is true. It is also irrelevant to my discussion. I was talking about what actions have to occur when a node(*) changes its location on the network graph. Specifically, I was dealing with the issue of how a node finds out its locator when it moves. For example, if I take my PC and move it to your network, how does my PC determine its new locator? However, my contention is that this problem is one aspect of a more general problem: how does a node(*) determine its locator under any circumstances? That is, when a node is first created it must determine its locator; when the locator for a node changes (say, I change providers, or a new level is added to the locator hierarchy) the node must learn of its new locator; and when a node moves it must learn its new locator. I believe that one mechanism can be used to solve all these problems. (*) By "node" I mean a node of the Nimrod graph. That is, a node could be a cluster which is composed of other clusters, hosts, routers, networks, etc. You are right in that there is also the problem of getting a 'remote' host's locator when I want to start communicating with that host. But this must happen even if hosts do not move on the network. For example, when "tri-flow.ftp.com" (my mail server) sends this mail message to "necom830.cc.titech.ac.jp" (your machine), it must establish a TCP connection to transfer the message (using SMTP). My machine must find the locator for your machine, even though neither machine is moving and neither machine is likely to move. > > Only if the routing tables have to hold entries for all 4G people.
One of > > the major goals of Nimrod is to allow aggregation of routes, reducing the > > amount of routing information that has to be kept in the routing > > tables at any one point in the network. > > In general, people move at random. Around Tokyo, more than 10M people > move daily between their homes and offices, with very weak correlation > between the locations of home and office. There are a lot of nationwide > or international travellers who also travel randomly. Of course > there is a backbone of transportation, but the correlation between > source and destination is mostly random, and no meaningful aggregation > is possible. The aggregation would occur at some level higher than the individual person. If both your home and your office get service from the same service provider, then aggregation could occur there. For example, if I had connectivity to my home I might get service from Nearnet. Nearnet also provides service to FTP Software. So, a machine at my home might have a locator like nsfnet.nearnet.franks_home.machine. If I move that machine to my office it would then get a locator like nsfnet.nearnet.ftp_software.machine. So the aggregation would occur within Nearnet. If my home service came from PSI instead, then my home locator would be nsfnet.psi.franks_home.machine. When I move the machine to the office, the aggregation would then be at the nsfnet level. In either case, this is invisible to you in Japan since (I assume) Japan's connectivity would be from the Japanese National Backbone to the US Backbone (nsfnet). So the Japanese backbone would need to keep a route to only the nsfnet. It would not need to keep track of routes to nsfnet.nearnet or nsfnet.psi. -- Frank Kastenholz FTP Software 2 High Street North Andover, Mass. USA 01845 (508)685-4000
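A rough sketch of the aggregation point above -- the table entries, link names, and the next_hop helper are all hypothetical illustration, nothing from the Nimrod documents. A routing table far from the destination holds only a high-level locator prefix and matches longest-prefix-first, so both of the example home locators collapse into the single nsfnet entry:

    # Hypothetical sketch: longest-prefix match over dotted locators.
    table = {
        "nsfnet": "link-to-us-backbone",   # one entry covers all of the US
        "jp-backbone": "local-link",       # assumed local entry
    }

    def next_hop(locator):
        parts = locator.split(".")
        # Try the longest matching prefix first, then successively shorter ones.
        for i in range(len(parts), 0, -1):
            prefix = ".".join(parts[:i])
            if prefix in table:
                return table[prefix]
        raise KeyError("no route to " + locator)

    # Both of the example home locators aggregate under "nsfnet":
    assert next_hop("nsfnet.nearnet.franks_home.machine") == "link-to-us-backbone"
    assert next_hop("nsfnet.psi.franks_home.machine") == "link-to-us-backbone"

Whether the machine is reached via nearnet or psi changes only locator elements below the prefix that distant routers actually store.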
Received: from PIZZA.BBN.COM by BBN.COM id aa24000; 22 Apr 94 16:22 EDT Received: from pizza by PIZZA.BBN.COM id aa19211; 22 Apr 94 16:10 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa19207; 22 Apr 94 16:06 EDT Received: from quern.epilogue.com by BBN.COM id aa22706; 22 Apr 94 16:02 EDT To: nimrod-wg@BBN.COM In-reply-to: Noel Chiappa's message of Mon, 18 Apr 94 11:06:21 -0400 <9404181506.AA27350@ginger.lcs.mit.edu> Subject: Dave's questions Date: Fri, 22 Apr 94 16:01:13 EDT From: dab@epilogue.com Sender: dab@epilogue.com Message-ID: <9404221601.aa03047@quern.epilogue.com> Date: Mon, 18 Apr 94 11:06:21 -0400 From: Noel Chiappa Trying to make moving a network as easy, and as important, as moving hosts is an ambitious goal that makes me nervous. This is a goal I'd be willing to punt on if we stumble too hard. But, because of things like what Frank pointed out, I think it's going to be an important requirement of the net in the future. Also, I guess I have this feeling that the design of nimrod is going to make it not quite as hard as we fear. Finally, I think that the way we're going, moving a host is going to be indistinguishable, to nimrod anyway, from moving a network. The design of locators in the architecture document had them growing top down and as far down as you wanted. In other words, the locators didn't have to stop at the interface; they could go inside the machine. Way cool, I've always wanted to do that. Strict bottom-up locators don't necessarily let you do that, but you were pushing for being able to grow both up and down (and I'd pretty much come to that conclusion with my own thinking too). So it looks like we're going to get locators that don't stop at the interface but go inside the machine. Now moving a host looks like moving an entire network. If nimrod can handle one it can handle the other. Dave   Received: from PIZZA.BBN.COM by BBN.COM id aa04263; 23 Apr 94 2:52 EDT Received: from pizza by PIZZA.BBN.COM id aa21848; 23 Apr 94 2:41 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa21844; 23 Apr 94 2:39 EDT Received: from necom830.cc.titech.ac.jp by BBN.COM id aa03991; 23 Apr 94 2:39 EDT Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Sat, 23 Apr 94 15:33:29 +0900 From: Masataka Ohta Return-Path: Message-Id: <9404230633.AA22971@necom830.cc.titech.ac.jp> Subject: Re: Dave's questions To: kasten@ftp.com Date: Sat, 23 Apr 94 15:33:28 JST Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM In-Reply-To: <9404201253.AA17562@mailserv-D.ftp.com>; from "Frank Kastenholz" at Apr 20, 94 8:53 am X-Mailer: ELM [version 2.3 PL11] > I assume by this you mean "Nimrod cannot support changing the locator > of a node in the network graph"? True, in the Nimrod architecture document > there is no specification of a protocol that does this. But the document > specifies an architecture, not protocols. Urrrrr....
> > > if you > > > assume that movement is limited to the leaves of the tree, then it > > > makes sense to allow individual leaves to ask some service "what's my > > > new locator?". Didn't you say "new locator" here? Doesn't it imply locator change? > This is true. It is also irrelevant to my discussion. I was talking > about what actions have to occur when a node(*) changes its location > on the network graph. It depends on what the protocol for locator change looks like. > The aggregation would occur at some level higher than the individual > person. If both your home and your office get service from the same > service provider, then aggregation could occur there. Your assumption is broken. My home will get service from the service providers most convenient to my home. My office will get service from the service providers most convenient to my office. > In either case, this is invisible to you in Japan since (I assume) I didn't say my movement around Tokyo affects something in the US. Masataka Ohta   Received: from PIZZA.BBN.COM by BBN.COM id aa18360; 25 Apr 94 10:43 EDT Received: from pizza by PIZZA.BBN.COM id aa01192; 25 Apr 94 10:27 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa01188; 25 Apr 94 10:24 EDT Received: from wd40.ftp.com by BBN.COM id aa16105; 25 Apr 94 10:09 EDT Received: from ftp.com by ftp.com ; Mon, 25 Apr 1994 10:09:30 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Mon, 25 Apr 1994 10:09:30 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA06713; Mon, 25 Apr 94 10:08:22 EDT Date: Mon, 25 Apr 94 10:08:22 EDT Message-Id: <9404251408.AA06713@mailserv-D.ftp.com> To: nimrod-wg@BBN.COM Subject: bottom-up or top-down From: Frank Kastenholz Reply-To: kasten@ftp.com Content-Length: 7249 On the flight back from Seattle I was tired and suffering from a cold, with the attendant earaches due to the pressurization/depressurization of the cabin. So what else would my brain turn to but Nimrod -- I guess that my sanity was the first casualty of the week. Specifically, I started pondering the discussion started by Dave Bridgham about whether the tree grows from the bottom up or from the top down. The problem that Dave, et al, had was that if the tree grows from the bottom up (that is, the leaves of the tree are assigned level number 0 and the root is assigned level N with N>0) then it must have the same height all over. This is needed so that, given a locator, you can tell where within the tree the locator 'belongs'. Assume that you had a hierarchy of 7 nodes, numbered 1-7, with arcs 1-2, 1-3, 2-4, 2-5, 3-6, and 3-7. Node 1 is the root node and nodes 4, 5, 6, and 7 are the leaf nodes. Further, assume that nodes 1, 2, 4, and 6 have been assigned locator-element "A" and nodes 3, 5, and 7 have element "B" -- that is, the mapping of full locators, starting at the root node, for the individual nodes would be:

    Node   Full Locator
      1    A
      2    A.A
      3    A.B
      4    A.A.A
      5    A.A.B
      6    A.B.A
      7    A.B.B

The graph would look something like:

                     +----+
                     | 1A |
                     +----+
              |-------|  |-----------|
           +----+                 +----+
           | 2A |                 | 3B |
           +----+                 +----+
          |---|  |----|          |--|  |---|
       +----+  +----+        +----+  +----+
       | 4A |  | 5B |        | 6A |  | 7B |
       +----+  +----+        +----+  +----+

(In this note, I'll always use numbers to uniquely name nodes, letters to be the locator elements, and arcs will be named by the two nodes that they connect.) So, suppose that you are at node #4 and have locator A.B -- does it refer to node #5 or to node #3? You might be tempted to say that all locators should be rooted at the root (? :-).
However, if the hierarchy can grow upward, you might not really know where the root is. You cannot root the locator at the leaves, since it would not be unambiguous (e.g., add another node to the above diagram, #8, with locator-element "B", and connect it to the graph via a link 5-8 -- if locators were rooted at the leaves, then the locator "B.B" would identify both nodes 8 and 3 -- oops). So, you have to include some concept of where in the tree the locator "belongs". Dave and others have been assuming that this information is a 'level number' -- in the above picture, the leaves would all be at level 1, the root at level 3. However, this seems to require that all leaf nodes be at the same level, i.e. level 1, which means that adding new levels in a local manner is impossible without some severe graph gyrations (e.g. MAP's notion of using fractions, or sparse number spaces and the like). Now, My Idea... Why not number the levels relative to the node which is generating the locator? For example, in the following locator hierarchy (numbers are the absolute node identifiers [a la EIDs], and letters are the elements of the locators):

                    1A
                   /  \
                  /    \
                2A      3B
               /  \    /  \
             4A   5B  6A   7B
            /  \    \        \
          8A   9B   10A      11A
         /  \                  \
      12A   13B                14A
                              /   \
                           15A    16B

The locator string A.B could refer to nodes 3, 5, 9, 13, or 16. However, if locators are 'qualified' by identifying how many levels up the tree one must go before finding the 'root' of the locator, A.B could then be uniquely used to identify something. For example, node 8 could refer to node 9 as 2.A.B (go up two levels to node 2, and then go to child-node A (node 4) and then child node B (9)). Node 9 could also be referred to as 1.B (up 1 to node 4 and then to child B), or 3.A.A.B. Note that if node 8 thinks that the 'global root' of the locator hierarchy is node 2 (e.g. we've added a new layer at the top but node 8 does not know it yet), then node 8 would not be able to communicate with nodes 1, 3, 6, 7, 11, 14, 15, or 16, because it would not be able to build a well-formed locator for those nodes (that is, node 8 believes that it can go up only two levels, not three, so it cannot build a locator rooted at node 1). If the graph is acyclic (as I've drawn it here), node-to-node forwarding gets real simple. Any given node needs to know only a few things -- which way is 'up' in the graph, and which way it needs to send packets to get to lower, contained, nodes. Obviously, when a packet is passed from one node to another (i.e. across an arc in the graph), things are a bit more complicated, since there might be several links connecting two nodes, each link having different levels/grades of service and so on -- but I do not feel that this is a major problem. More likely, the graph is going to have cycles in it -- people will want to have multiple service providers and so on:

                    1A
                   /  \
                  /    \
                2A      3B
               /  \    / ^ \
              /    \ 6A  |  7B
            4A    B5C<---+    \
           /  \      \         \
         8A   9B     10A       11A
        /  \                     \
     12A   13B                   14A
                                /   \
                             15A    16B

In this example, node 5 is connected to both nodes 2 and 3. Node 5 also has two locator elements assigned to it. From "within" node 2, it has locator element B, and from "within" node 3 it has locator element C; thus, from node 1 there are two, equally valid, locators for node 5: A.A.B and A.B.C (and, obviously, node 10 has locators A.A.B.A and A.B.C.A). This shows that locators really are bound to arcs and not nodes of the map (this was blatantly obvious to me when I drew this graph and, at the same time, observed that node 1 does not need a locator element -- it has no 'containing' node...).
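A small sketch of the relative-level scheme just described. The parent/children encoding and the resolve function are hypothetical illustration (only the fragment of the figure needed for the examples is encoded), not anything from the architecture document:

    # Sketch: resolving "levels-up qualified" locators such as 2.A.B.
    parent = {2: 1, 3: 1, 4: 2, 5: 2, 8: 4, 9: 4}
    children = {1: {"A": 2, "B": 3}, 2: {"A": 4, "B": 5}, 4: {"A": 8, "B": 9}}

    def resolve(start, locator):
        # A locator is "<levels-up>.<element>.<element>...", relative to start.
        parts = locator.split(".")
        node = start
        for _ in range(int(parts[0])):   # first climb the stated number of levels
            node = parent[node]
        for element in parts[1:]:        # then descend, one element per level
            node = children[node][element]
        return node

    # The three names for node 9 given above, all resolved from node 8:
    assert resolve(8, "2.A.B") == 9      # up to node 2, then A (4), then B (9)
    assert resolve(8, "1.B") == 9        # up to node 4, then B
    assert resolve(8, "3.A.A.B") == 9    # up to node 1, then A.A.B

Note that the same suffix resolved from a different start gives a different node -- resolve(1, "0.A.B") is node 5 -- which is exactly the ambiguity the levels-up qualifier pins down.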
===============================================
Now, there is a problem here. What does a host get when it tries to get the locator for some node? (I use the term 'node' here explicitly as a point in the locator graph, and 'host' as the computer that sits on my desk.) There is a mapping service that, given a host-name (probably an FQDN), will return a locator to reach that host (just as we get an IPv4 address out of the DNS today). If a host in node-8 tries to reach a host in node-9, what locator does it get back? There are many locators that refer to node 9. The actual locator will depend on where the asker is. Valid locators to reach node 9 from node 8 are 1.B, 2.A.B and 3.A.A.B. If node 16 is trying to reach node 9, the only valid locator would be 5.A.A.B. This I have not yet figured out. -- Frank Kastenholz FTP Software 2 High Street North Andover, Mass. USA 01845 (508)685-4000   Received: from PIZZA.BBN.COM by BBN.COM id aa18366; 25 Apr 94 10:43 EDT Received: from pizza by PIZZA.BBN.COM id aa01179; 25 Apr 94 10:27 EDT Received: from BBN.COM by PIZZA.BBN.COM id ab01168; 25 Apr 94 10:23 EDT Received: from wd40.ftp.com by BBN.COM id aa16097; 25 Apr 94 10:09 EDT Received: from ftp.com by ftp.com ; Mon, 25 Apr 1994 10:09:26 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Mon, 25 Apr 1994 10:09:26 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA06710; Mon, 25 Apr 94 10:08:18 EDT Date: Mon, 25 Apr 94 10:08:18 EDT Message-Id: <9404251408.AA06710@mailserv-D.ftp.com> To: nimrod-wg@BBN.COM Subject: Architecture Comments From: Frank Kastenholz Reply-To: kasten@ftp.com Content-Length: 14164 I've finally finished reading the architecture document that was mailed out on 15 March. I have a few comments. Some are merely editorial in nature, some are of a bit more substance. Some of these are rather general questions, which I only thought of asking when reading the section indicated.
1. Bullet 1 of section 1.1 (Constraints) says that the Internet will grow to O(10**9) networks. Current thinking is that the number may be as high as 10**12. I don't think that this will really be a problem for Nimrod, but I wanted to point it out.
2. Bullet 7 of 1.1 (constraints still) says that larger entities will move less often than smaller entities (my paraphrasing). While this may be true, I have a feeling that if a large entity (e.g. a network or collection of networks) will move at all, then it will move a lot. Think about networks in planes or cars or ships...
3. In section 1.2 (Basic Routing Functions), it says that Nimrod does distribution using link-state routing. I do understand the advantages of LS over DV routing. My concern is that this may be specification where specification is not required. It seems that a cluster should be free to route within itself in any manner that it wants. The only requirement is that it can truly provide the services/connections that it advertises outside of itself. Or am I missing something?
4. In section 1.3.1 (Clustering), second paragraph. It says that Nimrod does not specify a cluster formation algorithm. It seems to me that we may be sacrificing some simplicity here. I have a feeling (and it is only a feeling, so don't ask me to explain it any more than this) that, since clustering is an essential part of Nimrod, we have to have some description of how clusters are built. I just have this feeling of incompleteness here...
5. In section 1.3.1 (Clustering), third paragraph.
It says that "two branches can not be in the same cluster unless that cluster also contains the network connecting them". I read this as saying that the two PCs on my desk can not be in the same cluster unless the ethernet to which they are both connected is also in the cluster. Yes? What about using virtual links here?
6. In section 1.3.1 (Clustering), last paragraph. It says that clustering is distinct from the physical organization of the components -- cluster boundaries may not coincide with host, router, etc. boundaries. Does this generality add complexity? Would it be simpler to limit clusters to physical entity boundaries? What would we lose by making such a limit (other than generality)?
7. In section 1.4 (The Internet), bullet 1 does not take into account CIDR. Should it?
8. In section 1.4 (The Internet), bullet 3 says that there are 2**32 possible distinct IP addresses and 10**9 networks projected for the future. I think that this may be wrong.
9. Section 2.1, Endpoints. The sentence in the middle of the paragraph probably wants to read "For ease of management, EIDs might be hierarchically administered, but this is not required".
10. Section 2.2, Maps. The third paragraph says that a host has access to one or more route servers. Note that Nimrod must be able to work on an isolated, single ip-network (equivalent) piece of wire with only two PCs on it.
11. In section 2.3 the document talks about border points. What are border points? What use do they serve in the architecture (I don't recall seeing them used anyplace else)? Are border points nodes of the graph? If so, then are the arcs that connect to border points really connecting to border points within the border points? (feel free to recurse as much as desired here :-)
12. In section 2.3, under multi-point arcs. Suppose that I have a mesh network that supports some forms of service allocation or whatever at the datalink level, such as ATM. There are many nodes, all of which can communicate. However, some subset of the nodes is given additional service levels or whatever. How is this dealt with?
13. In 2.3.1, Internal Maps, second paragraph, it says that a "router can obtain more detailed maps ... recursively". At what point does this stop? Is there a logical 'ending' point, or can it be turtles all the way down?
14. In 2.3.1, Internal Maps, third graf, says "A transit map---not containing nodes---cannot be further expanded." This is an odd statement. First of all, a transit map contains some set of nodes and arcs. These nodes are the border-point nodes of the map, and the arcs represent the connectivity services offered by the map. Now, an observation on this: This map contains nodes. Each of these nodes (i.e. the border-point nodes) ought to be expandable (ignoring administrative constraint), showing how that border-point node 'connects' the various arcs which connect to it. Also, the border-point node, being a node, will have arcs connecting to it, and those arcs will have to connect to border-points within the border-point (section 2.3, graf 2). Feel free to recurse at this point as much as you'd like :-) Maybe something extraordinarily brilliant, sublime, and subtle is going on here and I just don't get it. Maybe not.
15. In section 2.4, Locators, graf 1. It says that a BTE is assigned only one locator.
I have always been under the rough impression that the locator hierarchy would closely follow the provider hierarchy, since network providers, backbones, and the like really define the actual topology with which we have to deal. If this impression is correct, the statement in the document says that even if I have two providers, I get only one locator. I do not think that limiting things to a single locator will be acceptable:
- there is the problem of multi-homed hosts (waving your hands and saying 'not allowed' is not allowed).
- when rearranging one's networks, one may wish to do things like assign multiple locators to hosts prior to doing the actual physical changes.
- for mobility, if a concept similar to the base station concept is in use, then the host that moved may be seen as having 2 locators (one for its true position and one for the position of the base station which will 'forward' the traffic).
- anything that gets relocatored should probably appear to be at both the old and the new locators for a while, allowing time for the old locators to drain out of the Internetwork without disrupting 'current' traffic too much.
- dual providers.
16. Section 2.4, Locators, graf 2. You might want to have the sentence that begins "Given that the nodes in a map..." start a new graf.
17. Section 3.1, last graf. The associated diagram shows x.net and y.net as providers and z.com as a service user, and then discusses what happens when z.com changes service providers (from x to y). The paragraph in question says "caching of locators must be done carefully". This brought up a thought. When z.com changes its provider from x.net to y.net, why not have x.net redirect the packets it receives which are destined for z.com to y.net? (Presumably the two (x.net and y.net) are connected via some higher-level entities such as a national backbone.) This may be more 'algorithm' and less 'architecture', but the possibility ought to be mentioned. (Aside: I've found reading the diagrams sometimes confusing, mostly because of the multiple use of the period (.). In some of the diagrams that I've drawn in ascii, I've found the following conventions very helpful:
- periods separate locator elements, only.
- if you want to have things like x.net as the 'name' of a blob in the picture, do it as x-net.
- I generally assign each node a unique id number in the picture, making it easier to show 'which' node the text is referring to.
- I always make locator elements uppercase, single letters and, out of habit, always start at A.
I've found that following rules like this helps make it easy to understand both the text and the diagram.)
18. Section 3.2, Multiple Locator Assignment. How does this fit with section 2.4, Locators, graf 1, where it says that a BTE is assigned only one locator?
19. Fig 2 on page 14. I do not understand how locators are assigned here. We went through this in Seattle so there's probably no need to go into it here.
20. Sect 4, Forwarding, Bullet 2 -- BTEC mode is very similar in general concept to IPv4's source routing, yes?
21. Sect 4, last graf, there are a couple of poorly worded sentences: "Given a map, a packet moves to the node in this map to which the the associated destination locator belongs to" Change the 'belongs to' into 'refers'. And "If the destination node has a ``detailed'' internal map, the destination locator should belong to..." Again, change the 'belong' into 'refer'.
22. Sect 4.4, BTE Chain Mode.
The last sentence of the first graf says "..a locator in the BTEC header could correspond to the type of service...[and not the physical path]" argh! Tilt! Does this say that there are these abstract things in the network called 'types of service' which have locations in the topology? I was sort of under the impression that if you wanted specific types of service, you'd need a flow? The next two grafs of this section seem to be overly complicated by the existence of multi-point arcs. If we get rid of multi-point arcs, aren't things made simpler?
23. Section 4.4, graf 6, it says "ii) routers would maintain, for each BTE, a pre setup flow which provides connectivity similar to that of the BTE." What does "for each BTE" mean? Is it each adjacent BTE? Each BTE in the Internet? Each BTE of which the router is aware? Some randomly selected set of BTEs?
24. Section 4.4, last two grafs. There is an implication here that internet-level headers will (may) change as the packet goes through the internet. This may have ramifications for security and integrity functions of the internet protocols (suppose that a crypto-checksum is run over packets by the source, which the destination uses to authenticate the packet?). Also, obviously the EIDs can not change. I would envision that the EIDs are used by the TCP as a part of the connection identification.
25. Section 4.5, Datagram Mode, graf 4. It says that a packet contains a pointer into the locators. Which locators? There are only two of them. The bullet should include the next paragraph, which describes the use of the pointer (I interpreted the bullet as saying that the pointer is into a chain of locators, much like the source-route pointer in IPv4).
26. Same section, graf 5. The sentence that begins "In addition to these extra fields..." is a non-sequitur with respect to the rest of the graf. Also, this sentence needs to be expanded. What are the 'critical places in the abstraction hierarchy'? How are they defined? How do routers find them?
27. Same section, graf 6. The end of this paragraph ("As an efficiency move...") implies that there is a nimrod packet encapsulation (which can include the flow-setup stuff and the actual datagram).
28. Same section, graf 7. This section seems to imply that routers have presetup flows all the way up the hierarchy (or can set them up on demand). Also, is it possible that the active router would have a shortcut to the destination? The example given discusses a packet going from A.P.Q.R to A.X.Y.Z. It says that a packet must go all the way up to A and then all the way back down. Suppose that a router at Q has a 'private backdoor' to a router at Y (for example, A.P.Q and A.X.Y are two branch offices of the same company, and the company has set up a private link to connect these offices together). Can the router at Q send the packet directly to the router at Y? This is an extremely important ability -- companies may very well wish to route internal traffic over internal links, links which they have control over, to avoid possible provider traffic-based charges, security issues, and so on.
29. Same section, in general. The notion of a border router seems to be introduced here. This appears to be a specific element of the architecture and should be called out (someplace) as such, with its specific qualities and attributes described.
30. Section 5, Renumbering, and its figure. Can the node which gets renumbered be any node of the network, with renumbering occurring recursively downward, until the last turtle is reached?
31.
Section 6.1.1, Effects of Mobility. Observation: If networks can move, and hosts can move, then there might be multiple, redirected hops via multiple home representatives. (I.e., if a network moves, then all packets directed to that network would first go to the network's 'old place' and then be redirected to the network's 'new place'. Now, if a host within that network moves, packets directed to the host are first sent to the network's 'old place', which redirects the packets to the network's 'new place'; the packets then enter the destination host's network, which sees that the destination host has moved and so will redirect the packets again, now to the host's 'new place'.)
32. Section 6.2.1, Approaches (to mobility). This whole section brings up a critical question. How are EIDs structured? Remember, they are here, in effect, defined to be database keys. They cannot be simple, random numbers. If they are, we end up with hosts.txt. There must be hierarchy in the EIDs so that the database which maps EIDs to locators can be distributed and delegated. Otherwise Nimrod does not work...
-- Frank Kastenholz FTP Software 2 High Street North Andover, Mass. USA 01845 (508)685-4000   Received: from PIZZA.BBN.COM by BBN.COM id aa26790; 25 Apr 94 13:02 EDT Received: from pizza by PIZZA.BBN.COM id aa02423; 25 Apr 94 12:41 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa02419; 25 Apr 94 12:39 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa25232; 25 Apr 94 12:34 EDT Received: by ginger.lcs.mit.edu id AA26154; Mon, 25 Apr 94 12:34:26 -0400 Date: Mon, 25 Apr 94 12:34:26 -0400 From: Noel Chiappa Message-Id: <9404251634.AA26154@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Dave's questions Cc: jnc@ginger.lcs.mit.edu From: dab@epilogue.com The design of locators in the architecture document had them growing top down and as far down as you wanted. In other words, the locators didn't have to stop at the interface; they could go inside the machine. Way cool, I've always wanted to do that. Strict bottom-up locators don't necessarily let you do that There are ways to do that with bottom-up; you just have to think of extending "sideways". E.g., if you're running a VM OS, the host becomes a router, and there is a virtual net inside the machine, and each virtual machine gets an interface to that virtual net; the virtual network can have a locator at the same level as the real network the host is attached to. The process continues to work if you have VM OS's running in the VM's. Of course, at a certain point, the area containing all those nets may get so large you want to split it, which brings me to my next point... but you were pushing for being able to grow both up and down Well, "pushing" is a bit too strong; I was contemplating the possibility. The main reason I came to this point was asking myself: suppose area X at level K gets too large, and wants to split; the obvious tack is to become two areas, X1 and X2, at level K. However, suppose this is not possible? Being able to introduce another sub-layer below K, and make X into two things at that level (so instead of K.X.mumble turning into K.X1.mumble, it turns into K.X.1.mumble), is one way out. However, I'm not sure the extra complexity of this is worth it; locators can become even more tricky. Maybe you just have to split into two K-level things. I think it'll become clearer which is the right way to go as more about locators in general, and especially how to find the binding context of a given locator, becomes clear.
E.g., if levels are numbered (both to help with finding binding contexts, and to allow non-unique labels), then if locators can grow in the middle the numbers have to be rationals; uggh. So it looks like we're going to get locators that don't stop at the interface but go inside the machine. Now moving a host looks like moving an entire network. If nimrod can handle one it can handle the other. Yes and no. If you move a single endpoint, you can use simple mechanisms to tell it (and everyone else) it has moved. If you move a group of entities, the mechanisms to notify them all are inevitably more complicated, as they have to scale, etc. Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa14812; 25 Apr 94 15:48 EDT Received: from pizza by PIZZA.BBN.COM id aa03761; 25 Apr 94 15:24 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa03754; 25 Apr 94 15:22 EDT Received: from wd40.ftp.com by BBN.COM id aa07379; 25 Apr 94 15:11 EDT Received: from ftp.com by ftp.com ; Mon, 25 Apr 1994 15:11:27 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Mon, 25 Apr 1994 15:11:27 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA12088; Mon, 25 Apr 94 15:10:20 EDT Date: Mon, 25 Apr 94 15:10:20 EDT Message-Id: <9404251910.AA12088@mailserv-D.ftp.com> To: daniel@catarina.usc.edu Subject: Re: bottom-up or top-down From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: nimrod-wg@BBN.COM Content-Length: 440 > >> On Mon, 25 Apr 94 10:08:22 EDT, Frank Kastenholz said: > > But what happens when you try to ... eeeeek! i sent that out? i didn't mean to! it had bugs (like the one you pointed out). never mind. delete it. expunge it. erase all traces of it from your memory. pretend it never happened. sorry to waste the bandwidth. -- Frank Kastenholz FTP Software 2 High Street North Andover, Mass. USA 01845 (508)685-4000   Received: from PIZZA.BBN.COM by BBN.COM id aa23480; 25 Apr 94 16:29 EDT Received: from pizza by PIZZA.BBN.COM id aa04055; 25 Apr 94 16:03 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa04051; 25 Apr 94 16:01 EDT Received: from usc.edu by BBN.COM id aa15238; 25 Apr 94 15:50 EDT Received: from laguna.usc.edu by usc.edu (4.1/SMI-3.0DEV3-USC+3.1) id AA05567; Mon, 25 Apr 94 12:03:42 PDT Received: by laguna.usc.edu (4.1/SMI-4.1+ucs-3.6) id AA02454; Mon, 25 Apr 94 12:10:15 PDT Date: Mon, 25 Apr 94 12:10:15 PDT From: "Daniel M. Alexander Zappala" Message-Id: <9404251910.AA02454@laguna.usc.edu> To: kasten@ftp.com Cc: nimrod-wg@BBN.COM In-Reply-To: <9404251408.AA06713@mailserv-D.ftp.com> (message from Frank Kastenholz on Mon, 25 Apr 94 10:08:22 EDT) Subject: Re: bottom-up or top-down Reply-To: daniel@catarina.usc.edu >> On Mon, 25 Apr 94 10:08:22 EDT, Frank Kastenholz said: > More likely, the graph is going to have cycles in it -- people > will want to have multiple service providers and so on:

>                     1A
>                    /  \
>                   /    \
>                 2A      3B
>                /  \    / ^ \
>               /    \ 6A  |  7B
>             4A    B5C<---+    \
>            /  \      \         \
>          8A   9B     10A       11A
>         /  \                     \
>      12A   13B                   14A
>                                 /   \
>                              15A    16B

> In this example, node 5 is connected to both nodes 2 and 3. Node 5 > also has two locator elements assigned to it. From "within" node 2, > it has locator element B, and from "within" node 3 it has locator > element C; thus, from node 1 there are two, equally valid, locators > for node 5: A.A.B and A.B.C (and, obviously, node 10 has locators > A.A.B.A and A.B.C.A). But what happens when you try to go up from a node that has several links in the "up" direction? Which path do you take?
Moreover, in your example it is not too confusing, but if node 5 has a link to node 4, then 1.A.A does not uniquely define a path from node 5. Daniel   Received: from PIZZA.BBN.COM by BBN.COM id aa26374; 26 Apr 94 14:24 EDT Received: from pizza by PIZZA.BBN.COM id aa11475; 26 Apr 94 14:06 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa11471; 26 Apr 94 14:03 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa24909; 26 Apr 94 13:59 EDT Received: by ginger.lcs.mit.edu id AA05986; Tue, 26 Apr 94 13:27:03 -0400 Date: Tue, 26 Apr 94 13:27:03 -0400 From: Noel Chiappa Message-Id: <9404261727.AA05986@ginger.lcs.mit.edu> To: int-serv@isi.edu, nimrod-wg@BBN.COM, rsvp@isi.edu, sdrp@catarina.usc.edu Subject: Do we need a 'flows' mailing list? Cc: big-internet@munnari.oz.au, jnc@ginger.lcs.mit.edu During a discussion with Craig, he suggested that I poll people to see if those of us who believe in flows should have a single mailing list for discussing generic flow stuff on. The current situation, with 4 (or more) groups working on flows, means that either i) generic topics cause people to get multiple copies or ii) some people miss useful stuff. For instance, some of the Nimrod people at BBN missed the discussion of multicast flows which happened on the RSVP list. Replies pro and con to me *only*, please (no 'reply-all' :-), and I'll send out a summary. (Volunteers to host same also accepted...) Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa23900; 28 Apr 94 14:56 EDT Received: from pizza by PIZZA.BBN.COM id aa26456; 28 Apr 94 14:40 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa26452; 28 Apr 94 14:35 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa22571; 28 Apr 94 14:34 EDT Received: by ginger.lcs.mit.edu id AA25086; Thu, 28 Apr 94 14:34:24 -0400 Date: Thu, 28 Apr 94 14:34:24 -0400 From: Noel Chiappa Message-Id: <9404281834.AA25086@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: IPng requirements document points.... Cc: jnc@ginger.lcs.mit.edu During a phone conversation with Martha, she had a number of comments on the requirements. I'll note them briefly here, and I'll shortly post a revised version of the requirements. - Instead of a "hop count", substitute a "looping packet detector". It is true that the hop count is per se not part of the forwarding architecture (except inasmuch as we do wish to see a "belt and suspenders" approach to packet looping), but any equivalent mechanism (such as a timestamp) will do. - The pointers into the locator and source route are really not, strictly speaking, necessary, since you should be able to tell what element to process next if things are working correctly. They are thus not necessary, but my engineer's sixth sense says they are well worth it, and will be a big win, on robustness grounds alone (let alone processing efficiency). Wording modified to indicate this. - The kind of source route I have been assuming we were going to have is what you might call "semi-strict", which is to say that the route does not have to name all the individual routers it traverses, but it does have to list topologically contiguous elements, albeit potentially high-level ones. The other possibility is classical "loose" source routing, in which only a few intermediate points through which the route has to pass are named. It has been an open question as to whether Nimrod would provide both, or only one, and this topic will be explored in more detail in a bit.
The wording was modified to indicate the needs of both modes, and to make clear that no choice had been made as to which (or both) to include.

- A header version number is always useful.

- Authentication of some sort is needed. See the recent IAB document from the IAB architecture retreat on security (draft-iab-sec-arch-workshop-00.txt), section 4, and especially section 4.3. There is currently no set way of doing "denial/theft of service" in Nimrod, but this topic is well explored in that document.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa02259; 28 Apr 94 16:41 EDT Received: from pizza by PIZZA.BBN.COM id aa27348; 28 Apr 94 16:25 EDT Received: from BBN.COM by PIZZA.BBN.COM id ab27341; 28 Apr 94 16:23 EDT Received: from ftp.com by BBN.COM id aa01087; 28 Apr 94 16:23 EDT Received: by ftp.com ; Thu, 28 Apr 1994 16:23:12 -0400 Received: by ftp.com ; Thu, 28 Apr 1994 16:23:12 -0400 Date: Thu, 28 Apr 1994 16:23:12 -0400 From: Jim DeMarco Message-Id: <9404282023.AA21551@ftp.com> To: jnc@ginger.lcs.mit.edu Cc: nimrod-wg@BBN.COM In-Reply-To: Noel Chiappa's message of Thu, 28 Apr 94 14:34:24 -0400 <9404281834.AA25086@ginger.lcs.mit.edu> Subject: IPng requirements document points.... Reply-To: jdemarco@ftp.com

> - Instead of a "hop count", substitute a "looping packet detector". It is
>true that the hop count is per se not part of the forwarding architecture
>(except inasmuch as we do wish to see a "belt and suspenders" approach to
>packet looping), but any equivalent mechanism (such as a timestamp) will do.

I believe many current trace-route programs make use of the "hop count" field, and they have proven themselves quite valuable. Are there alternative strategies available to perform the same function that don't require a hop count?

--Jim

Received: from PIZZA.BBN.COM by BBN.COM id aa03117; 28 Apr 94 16:56 EDT Received: from pizza by PIZZA.BBN.COM id aa27523; 28 Apr 94 16:40 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa27518; 28 Apr 94 16:38 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa02007; 28 Apr 94 16:36 EDT Received: by ginger.lcs.mit.edu id AA27140; Thu, 28 Apr 94 16:36:57 -0400 Date: Thu, 28 Apr 94 16:36:57 -0400 From: Noel Chiappa Message-Id: <9404282036.AA27140@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: IPng requirements document points.... Cc: jnc@ginger.lcs.mit.edu

    I believe many current trace-route programs make use of the "hop count"
    field, and they have proven themselves quite valuable.

Good point.

    Are there alternative strategies available to perform the same function
    that don't require a hop count?

Well, for anything source routed, you wouldn't need it, of course. For datagram mode, if we don't use hop count, an explicit inquiry tool could be done. One was almost done for ICMP (you send router X a packet, asking for the next hop it would use for destination Y), before someone figured out the hop-count hack with trace-route. I don't think it's a major issue.

Noel
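The inquiry tool Noel describes would walk a path hop by hop by asking each router for its next hop toward the destination. A toy rendering (the query is simulated with a table lookup; no such ICMP message actually exists, and the names are invented) might be:

    # each "router" answers: what is your next hop toward dest?
    fibs = {
        'R1': {'H': 'R2'},
        'R2': {'H': 'R3'},
        'R3': {'H': 'H'},          # destination is directly attached
    }

    def trace(first_hop, dest, max_hops=32):
        """Collect the path by explicit next-hop queries; a repeated
        router is a directly visible forwarding loop."""
        path, hop = [], first_hop
        while hop != dest and len(path) < max_hops:
            if hop in path:
                return path + [hop], 'loop detected'
            path.append(hop)
            hop = fibs[hop][dest]  # stands in for the per-router inquiry
        return path + [dest], 'ok'

    print(trace('R1', 'H'))        # (['R1', 'R2', 'R3', 'H'], 'ok')

Unlike the TTL hack, this needs no hop count in the header; the cost is one round trip per hop and a new query message type.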
Received: from PIZZA.BBN.COM by BBN.COM id aa03784; 28 Apr 94 17:10 EDT Received: from pizza by PIZZA.BBN.COM id aa27692; 28 Apr 94 16:56 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa27688; 28 Apr 94 16:55 EDT Received: from ftp.com by BBN.COM id aa03007; 28 Apr 94 16:54 EDT Received: from ftp.com by ftp.com ; Thu, 28 Apr 1994 16:54:39 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Thu, 28 Apr 1994 16:54:39 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA25578; Thu, 28 Apr 94 16:53:25 EDT Date: Thu, 28 Apr 94 16:53:25 EDT Message-Id: <9404282053.AA25578@mailserv-D.ftp.com> To: jnc@ginger.lcs.mit.edu Subject: Re: IPng requirements document points.... From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: nimrod-wg@BBN.COM, jnc@ginger.lcs.mit.edu Content-Length: 829

> I believe many current trace-route programs make use of the "hop
> count" field, and they have proven themselves quite valuable.
>
>Good point.
>
> Are there alternative strategies available to perform the same function
> that don't require a hop count?
>
>Well, for anything source routed, you wouldn't need it, of course. For
>datagram mode, if we don't use hop count, an explicit inquiry tool could be
>done. One was almost done for ICMP (you send router X a packet, asking for the
>next hop it would use for destination Y), before someone figured out the hop-
>count hack with trace-route. I don't think it's a major issue.

and, of course, nimrod works by distributing maps. just go get the map and take a look :-)

--
Frank Kastenholz
FTP Software
2 High Street
North Andover, Mass. USA 01845
(508)685-4000

Received: from PIZZA.BBN.COM by BBN.COM id aa03963; 28 Apr 94 17:14 EDT Received: from pizza by PIZZA.BBN.COM id aa27673; 28 Apr 94 16:54 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa27669; 28 Apr 94 16:53 EDT Received: from usc.edu by BBN.COM id aa02859; 28 Apr 94 16:52 EDT Received: from catarina.usc.edu by usc.edu (4.1/SMI-3.0DEV3-USC+3.1) id AA09335; Thu, 28 Apr 94 13:52:13 PDT Received: from catarina.usc.edu by catarina.usc.edu (4.1/SMI-4.1+ucs-3.6) id AA24672; Thu, 28 Apr 94 13:55:20 PDT Message-Id: <9404282055.AA24672@catarina.usc.edu> From: kannan@catarina.usc.edu To: nimrod-wg@BBN.COM Subject: Re: IPng requirements document points.... In-Reply-To: Your message of Thu, 28 Apr 1994 16:36:57 -0400.<9404282036.AA27140@ginger.lcs.mit.edu> Date: Thu, 28 Apr 1994 13:55:20 -0700 Sender: kannan@catarina.usc.edu

> I believe many current trace-route programs make use of the "hop
> count" field, and they have proven themselves quite valuable.

Actually, for most purposes, SNMP should be sufficient in returning the next-hop(s), in a better way, especially when multiple equal cost next hops etc. exist. The only functional difference might be that traceroute works through the forwarding code, while SNMP would only look up a FIB, but the difference to a human being might be insignificant.

> Are there alternative strategies available to perform the same function
> that don't require a hop count?
>
> Well, for anything source routed, you wouldn't need it, of course. For

Noel: Are you asserting that when using source-routed paths, I would not need traceroutes? Why would that be so?
Kannan

Received: from PIZZA.BBN.COM by BBN.COM id aa05800; 28 Apr 94 17:48 EDT Received: from pizza by PIZZA.BBN.COM id aa28076; 28 Apr 94 17:36 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa28072; 28 Apr 94 17:34 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa04987; 28 Apr 94 17:32 EDT Received: by ginger.lcs.mit.edu id AA27802; Thu, 28 Apr 94 17:32:40 -0400 Date: Thu, 28 Apr 94 17:32:40 -0400 From: Noel Chiappa Message-Id: <9404282132.AA27802@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: IPng requirements document points.... Cc: jncjnc@ginger.lcs.mit.edu

    and, of course, nimrod works by distributing maps. just go get the map
    and take a look :-)

Oh, right, yeah; forgot about that! :-)

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa06586; 28 Apr 94 18:08 EDT Received: from pizza by PIZZA.BBN.COM id aa28140; 28 Apr 94 17:43 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa28136; 28 Apr 94 17:41 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa05434; 28 Apr 94 17:40 EDT Received: by ginger.lcs.mit.edu id AA27884; Thu, 28 Apr 94 17:40:36 -0400 Date: Thu, 28 Apr 94 17:40:36 -0400 From: Noel Chiappa Message-Id: <9404282140.AA27884@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: IPng requirements document points.... Cc: jnc@ginger.lcs.mit.edu

    Actually, for most purposes, SNMP should be sufficient in returning the
    next-hop(s)

Well, maybe not. That requires a publicly available SNMP session, something that sites may not want to do. However, specific, limited, ICMP network maintenance things (e.g. Ping) stand a much better chance.

    especially when multiple equal cost next hops etc. exist.

You can fix that by allowing multiple next hops in the ICMP message (something it's hard to do with the hop-count hack). Also, to the extent that the routing decision is taking as input more than just the destination (e.g. TOS), whatever mechanism is doing the lookup should provide all the same info.

    Are you asserting that when using source-routed paths, I would not need
    traceroutes? Why would that be so?

The thinking goes that you *already* know the path. However, there are a few issues. First, we need to have this discussion about the differences between "semi-strict" and "loose" source routing, as alluded to in the notes of Martha's comments. In a loose source routed system, you'd clearly need it. However, even in semi-strict, it might be nice, since if the semi-strict source route was given in terms of high level entities, you might want to see what physical assets it got translated into.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa12213; 29 Apr 94 2:03 EDT Received: from pizza by PIZZA.BBN.COM id aa00886; 29 Apr 94 1:51 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa00882; 29 Apr 94 1:50 EDT Received: from fennel.acc.com by BBN.COM id aa11002; 29 Apr 94 1:50 EDT Received: from by fennel.acc.com (4.1/SMI-4.1) id AB11373; Thu, 28 Apr 94 22:49:30 PDT Message-Id: <9404290549.AB11373@fennel.acc.com> X-Sender: fbaker@fennel.acc.com Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Date: Thu, 28 Apr 1994 22:49:37 -0800 To: kannan@catarina.usc.edu, nimrod-wg@BBN.COM From: Fred Baker Subject: Re: IPng requirements document points....

At 1:55 PM 4/28/94 -0700, kannan@catarina.usc.edu wrote:
>> I believe many current trace-route programs make use of the "hop
>> count" field, and they have proven themselves quite valuable.
>
>Actually, for most purposes, SNMP should be sufficient in returning the
>next-hop(s), in a better way, especially when multiple equal cost next
>hops etc. exist. The only functional difference might be that
>traceroute works through the forwarding code, while SNMP would only look
>up a FIB, but the difference to a human being might be insignificant.

the thing that may be at issue here is that all IP routers send ICMP TIME EXCEEDED, but not all implement "public" as the common community - consider SNMPV2, you may not have the appropriate objects in the general MIB view.

Consider also: if my trace is towards ftp.acc.com, the forwarding code makes the analysis nicely. Out in the net, it will route towards 129.192. At the egress point from CERFNET, it comes to a router which is running OSPF, and routes towards 129.192.64. At SB-8230.acc.com, an ARP table lookup is done, and the packet forwarded. In SNMP, you cannot ask for "whatever arp or route entry most directly relates to 129.192.64.25," you must GET-NEXT through the route entries around the value, and if you find a DIRECT route, check for an ipAddrEntry OR a corresponding ipNetToMediaEntry. If the device exists but no ARP entry is cached, you will be unable to translate the last hop until you first ping the target. This is getting to be a fairly sophisticated application...

>> Are there alternative strategies available to perform the same function
>> that don't require a hop count?
>>
>> Well, for anything source routed, you wouldn't need it, of course. For
>
>Noel: Are you asserting that when using source-routed paths, I would
>not need traceroutes? Why would that be so?

The simple answer is that you already KNOW the route, by virtue of the source-routed path. What you don't necessarily know is: how to calculate the path (traceroute might be useful for this, but then again maybe you want something more sophisticated for flow setup). If the path breaks, you may have difficulty figuring out where it broke. If the path is "loosely" source routed - the path gets you from one AS to the next without telling you how to transit the AS's - you are still lacking a fair bit of information.

_______________________________________________________________________
"There's nothing like hay when you're thirsty!" The White King...

Received: from PIZZA.BBN.COM by BBN.COM id aa22992; 2 May 94 19:02 EDT Received: from pizza by PIZZA.BBN.COM id aa20097; 2 May 94 18:49 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa20089; 2 May 94 18:46 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa22320; 2 May 94 18:44 EDT Received: by ginger.lcs.mit.edu id AA01683; Mon, 2 May 94 18:44:39 -0400 Date: Mon, 2 May 94 18:44:39 -0400 From: Noel Chiappa Message-Id: <9405022244.AA01683@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: New Nimrod document repository... Cc: jnc@ginger.lcs.mit.edu

A new document repository has been set up (and a "thanks" to Frank Kastenholz and FTP Software for the help), at research.ftp.com, in the directory pub/nimrod. It contains the IPng requirements draft, the architecture document, the old JNC I-D, and some other stuff (such as the current jargon list, the list of open architectural points, etc).
Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa23121; 2 May 94 19:06 EDT Received: from pizza by PIZZA.BBN.COM id aa20155; 2 May 94 18:54 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa20151; 2 May 94 18:53 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa22608; 2 May 94 18:52 EDT Received: by ginger.lcs.mit.edu id AA01725; Mon, 2 May 94 18:52:44 -0400 Date: Mon, 2 May 94 18:52:44 -0400 From: Noel Chiappa Message-Id: <9405022252.AA01725@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: New version of IPng requirements Cc: jnc@ginger.lcs.mit.edu

A new draft version of the "Nimrod IPng requirements" document has been prepared. It is available from the Nimrod document repository, on research.ftp.com, in pub/nimrod/ipng_req.txt, as plain ASCII text.

This version has an entire added section which discusses in more general terms the interaction of the routing subsystem with the other subsystems of the internetwork layer. It includes an analysis of the internetwork layer as a collection of subsystems (of which the group of stuff called Nimrod comprises three subsystems), and material on state, flows, and flow setup. It also includes comments from a variety of sources, as discussed on the mailing list.

I need to get comments fairly soon, as I guess we have to have this in by May 10 or so. So, let's say by noon Eastern time next Monday (the 9th).

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa18163; 4 May 94 11:51 EDT Received: from pizza by PIZZA.BBN.COM id aa01685; 4 May 94 11:30 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa01681; 4 May 94 11:26 EDT Received: from research.ftp.com by BBN.COM id aa16616; 4 May 94 11:24 EDT Received: by Research.Ftp.Com (920330.SGI/) for nimrod-wg@bbn.com id AA02549; Wed, 4 May 94 11:20:31 -0400 Received: by Research.Ftp.Com id AA02549; Wed, 4 May 94 11:20:31 -0400 Date: Wed, 4 May 1994 11:20:31 -0400 (EDT) From: Frank Kastenholz Subject: Comments on "Nimrod and IPng Technical Requirements" To: nimrod-wg@BBN.COM Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

> - A looping packet detector. This is any mechanism that will detect a packet
> which is "stuck" in the network; a timeout value in packets, together
> with a check in routers, is an example. If this is a hop-count, it has
> to be more than 8 bits; I would strongly recommend at least 12, and
> recommend 16 (to make it easy to update). This is not to say that I think
> networks with diameters larger than 256 are good, or that we should design
> such nets, but I think limiting the maximum path through the network to
> 256 hops is likely to bite us down the road the same way making
> "infinity" 16 in RIP did (as it did, eventually). When we hit that
> ceiling, it's going to hurt, and there won't be an easy fix. I will
> note in passing that we are already seeing path lengths of over 30 hops.

Is this really needed? Nimrod works by map distribution. So, looking at the map, I can see if there is a loop or not and deal with it. If the map does not show a loop, but there is a loop (i.e. the map is wrong) then I would imagine that there will be enough other problems that killing loopy packets is the least of our worries.

> (e.g. up, down, across, etc).
> Since those identifiers themselves are
> variable length (although probably most will be two bytes or less,
> otherwise the routing overhead inside the named object would be
> excessive), and the hop count above contemplates the possibility of
> paths of over 256 hops, it would seem that these might possibly some
> day exceed 512 bytes, if a lengthy path was specified in terms of the
> actual physical assets used.

In general, I'd suggest that we ask for 'large' length fields, and allow administrative limits to be placed on the max values. Then, as the need arises, we can let the administrative limit grow. This also applies to the hop-count field -- use a big field, but put a 'smallish' admin limit on it.

>3.1.2 The Subsystems of the Internetwork Layer
> The subsystems which are covered by Nimrod are i) routing
>information distribution (in the case of Nimrod, topology map distribution),

Does this include distributing the attributes of the topology? That is, things like link access policies, and the like.

>3.2.2 Flows
>
> A flow, from the user's point of view, is a.....

So, a flow is a defined path through the internetwork which has certain attributes (other than a simple source-to-destination 'connection') associated with it, and these attributes may have been explicitly used in determining, creating, or selecting, the particular path. Yes?

> The packets which belong to a flow could be identified by a tag
>consisting of a number of fields (such as addresses, ports, etc), as opposed
>to a specialized field.

Well,
1. you need more than just the source and destination address/eid/whatever to uniquely id a flow (i.e. there could be >1 flows between machines X and Y, each with its own traits).
2. not all protocols carried over IP have source/dest ports.
3. source/dest port pair might change with each transaction (e.g. SNMP over UDP).

So, the conclusion is that you need more than just the "IP addresses" but you can not rely on any specific information carried in the IP payload; so, therefore, you need an explicit field in the IP header which identifies the flow.

Frank

Received: from PIZZA.BBN.COM by BBN.COM id aa28435; 4 May 94 14:24 EDT Received: from pizza by PIZZA.BBN.COM id aa02822; 4 May 94 14:05 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa02818; 4 May 94 14:03 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa27079; 4 May 94 14:03 EDT Received: by ginger.lcs.mit.edu id AA12882; Wed, 4 May 94 14:03:06 -0400 Date: Wed, 4 May 94 14:03:06 -0400 From: Noel Chiappa Message-Id: <9405041803.AA12882@ginger.lcs.mit.edu> To: kasten@research.ftp.com, nimrod-wg@BBN.COM Subject: Re: Comments on "Nimrod and IPng Technical Requirements" Cc: jnc@ginger.lcs.mit.edu

> - A looping packet detector. This is any mechanism that will detect a
> packet which is "stuck" in the network

    Is this really needed? Nimrod works by map distribution. So, looking at
    the map, I can see if there is a loop or not and deal with it. If the
    map does not show a loop, but there is a loop (i.e. the map is wrong)
    then I would imagine that there will be enough other problems that
    killing loopy packets is the least of our worries.

We thought about this for a while, some time back. (Look in the mailing list archive around 10 Jan 94.) I know that "theoretically" Nimrod should not display routing loops. I'm very suspicious of such statements! Real life seems to produce circumstances which "shouldn't" happen, and we've paid painful prices all along (the 'Titanic', etc) for designs that assumed that.
If nothing else, implementation errors (which do happen) could cause loops. As I said at the time:

    My reasoning is that preventing looping data traffic is very desirable,
    since the side-effects are pretty bad. ... I guess what it really comes
    down to is "how common are loops going to be, and how much will they
    cost if we don't have a [detection mechanism]", versus "how much is the
    [detection mechanism] going to cost us".

Given all this, having a separate mechanism to detect and kill looping packets seems wise. 'Robustness' is a hard thing to quantify, but two completely independent mechanisms to do the same thing seems like it's really robust.

> it would seem that these might possibly some day exceed 512 bytes, if a
> lengthy path was specified in terms of the actual physical assets used.

    In general, I'd suggest that we ask for 'large' length fields, and
    allow administrative limits to be placed on the max values. Then, as
    the need arises, we can let the administrative limit grow.

Good point; I'll make a note to this effect in the introductory section on fields (2.1). I have also gone through and redone the suggested lengths for each field in terms of the new terminology defined in 2.1; if people could please check these lengths, and see what they think of them....

> The subsystems which ... are covered by Nimrod are i) routing
> information distribution (... topology map distribution),

    Does this include distributing the attributes of the topology? That is,
    things like link access policies, and the like.

Yes, I'll note it.

> A flow, from the user's point of view, is a.....

    So, a flow is a defined path through the internetwork which has certain
    attributes (other than a simple source-to-destination 'connection')
    associated with it, and these attributes may have been explicitly used
    in determining, creating, or selecting, the particular path. Yes?

Yes, except that I'd say the "path through the internetwork" is one of the attributes of the flow.

Actually, I guess we need to distinguish between i) a flow as a sequence of packets, and ii) a flow, as the thing which is set up in the routers; they are subtly different! I made a note to this effect in 3.2.3. Do you think we need separate terms (and if so, any suggestions), or do you think it will always be obvious from the context?

> The packets which belong to a flow could be identified by a tag
> consisting of a number of fields (such as addresses, ports, etc), as
> opposed to a specialized field.

    Well ... the conclusion is that you need more than just the "IP
    addresses" but you can not rely on any specific information carried in
    the IP payload; so, therefore, you need an explicit field in the IP
    header which identifies the flow.

I thought I said more or less that:

    Given that you can always find situations where the existing fields
    alone don't do the job, and you *still* need a separate field to do the
    job correctly

Do I need an example, or more of an argument? Remember, this document isn't "Why the internetwork layer needs flows" but "Nimrod IPng tech Rqmts"!
Yes, I do have to explain some stuff, but I can't put *everything* in here (like "Why the internetwork layer needs EID's" :-)

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa29457; 4 May 94 14:39 EDT Received: from pizza by PIZZA.BBN.COM id aa02911; 4 May 94 14:18 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa02907; 4 May 94 14:16 EDT Received: from research.ftp.com by BBN.COM id aa27768; 4 May 94 14:14 EDT Received: by Research.Ftp.Com (920330.SGI/) for nimrod-wg@bbn.com id AA02717; Wed, 4 May 94 14:10:13 -0400 Received: by Research.Ftp.Com id AA02717; Wed, 4 May 94 14:10:13 -0400 Date: Wed, 4 May 1994 14:10:13 -0400 (EDT) From: Frank Kastenholz Subject: Re: Comments on "Nimrod and IPng Technical Requirements" To: Noel Chiappa Cc: nimrod-wg@BBN.COM In-Reply-To: <9405041803.AA12882@ginger.lcs.mit.edu> Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

On Wed, 4 May 1994, Noel Chiappa wrote:

> > A flow, from the user's point of view, is a.....
>
> So, a flow is a defined path through the internetwork which has certain
> attributes (other than a simple source-to-destination 'connection')
> associated with it, and these attributes may have been explicitly used in
> determining, creating, or selecting, the particular path. Yes?
>
> Yes, except that I'd say the "path through the internetwork" is one of the
> attributes of the flow.

I think that this is you saying "tomahtoe" and me saying "tomaytoe"...

> Actually, I guess we need to distinguish between i) a flow as a sequence of
> packets,

Always refer to this as a sequence of packets, or packet sequence or packet flow...

> and ii) a flow, as the thing which is set up in the routers;

And this is just a "flow." Since you describe this flow as "a thing..." it sounds like it needs its own term more than use 'i)' does.

> Do I need an example, or more of an argument?

Neither. I just arrived at the same conclusion via a different path (although, perhaps, the conclusion I came to more strongly requires a specific flow id field in the internetwork header). It tends to validate the conclusion, and it can be filed away and then used if/when required.

Frank

Received: from PIZZA.BBN.COM by BBN.COM id aa17617; 4 May 94 20:32 EDT Received: from pizza by PIZZA.BBN.COM id aa05080; 4 May 94 20:22 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa05076; 4 May 94 20:20 EDT Received: from Princeton.EDU by BBN.COM id aa17189; 4 May 94 20:18 EDT Received: from ponyexpress.Princeton.EDU by Princeton.EDU (5.65b/2.110/princeton) id AA24439; Wed, 4 May 94 20:10:57 -0400 Received: from clytemnestra.Princeton.EDU by ponyexpress.princeton.edu (5.65c/1.113/newPE) id AA21777; Wed, 4 May 1994 20:10:55 -0400 Received: by clytemnestra.Princeton.EDU (4.1/Phoenix_Cluster_Client) id AA12308; Wed, 4 May 94 20:10:52 EDT Message-Id: <9405050010.AA12308@clytemnestra.Princeton.EDU> To: nimrod-wg@BBN.COM Subject: Re: Comments on "Nimrod and IPng Technical Requirements" In-Reply-To: Your message of "Wed, 04 May 1994 14:03:06 EDT." <9405041803.AA12882@ginger.lcs.mit.edu> X-Mailer: exmh version 1.3 4/7/94 Date: Wed, 04 May 1994 20:10:51 EDT From: John Wagner

> > Actually, I guess we need to distinguish between i) a flow as a sequence of
> packets, and ii) a flow, as the thing which is set up in the routers; they are
> subtly different! I made a note to this effect in 3.2.3. Do you think we need
> separate terms (and if so, any suggestions), or do you think it will always be
> obvious from the context?
I think the only thing obvious is that many will misread the meaning no matter what the context. I think these two are not subtly but drastically different. To pull analogies from other fields:

The stream bed (ii) is not the same as the water (i) flowing through it. The storm drain (ii) is not the same as the rain water (i) flowing through it. The wire (ii) is not the same as the electrons (i) flowing through it.

I think all of these are good analogies for what a Nimrod flow is *after the flow setup has been done*. Flow setup builds the pipe network. The flow occurs after the pipes are put together. Using "flow" to represent both the pipes and their internal contents sure seems to lead to problems communicating.

To use another analogy (excuse me while I date myself): back at the '69 World's Fair in New York the GM exhibit showed a nice streamlined car of the future. What was the big selling point? You'd get into the car in front of your house, drive it to the nearest big road *and take your hands off the steering wheel* because you would be able to tell something in your car where you wanted to go and it would worry about the routing (sound familiar?). The goodness of this analogy is that it leads to use of words we want people to think about; road maps, routes, interconnected roads (meshes), etc.

Instead of flows (ii) why not "pre-selected routes", "dynamically defined routes", "network optimized routes", ...? None of these strike me as the right replacement but there has to be something better than "flow".

John Wagner

Received: from PIZZA.BBN.COM by BBN.COM id aa19020; 4 May 94 21:29 EDT Received: from pizza by PIZZA.BBN.COM id aa05325; 4 May 94 21:15 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa05321; 4 May 94 21:14 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa18628; 4 May 94 21:14 EDT Received: by ginger.lcs.mit.edu id AA15703; Wed, 4 May 94 21:07:11 -0400 Date: Wed, 4 May 94 21:07:11 -0400 From: Noel Chiappa Message-Id: <9405050107.AA15703@ginger.lcs.mit.edu> To: big-internet@munnari.oz.au, int-serv@isi.edu, nimrod-wg@BBN.COM, rsvp@isi.edu Subject: "Flows" mailing list. Cc: jnc@ginger.lcs.mit.edu

So, I received a total of 28 replies (including myself) about the question of whether or not we should have a separate flows mailing list. The count was: 5 No, 3 Maybe (as in "I don't know if we need this list, but if you create it add me"), 20 Yes. Since I think that this constitutes rough consensus, the list has been set up.

The list itself is "flows@research.ftp.com", and there's the usual "flows-request@research.ftp.com" for *ALL* requests to be added or deleted (but see below). The kind of things the list should deal with are questions like:

- Do we have a single mechanism across all subsystems (routing, resource allocation, etc) to name the packets which are part of a flow?
- If so, what is it?
- Do we have a single mechanism across all subsystems to install flow state in the routers?
- If so, what is it?
- How do we do multicast flow setup/maintenance, especially for large multicast groups?

All who voted "Yes" or "Maybe" have been added. Everyone on the Nimrod-WG mailing list has also been added; I did that since the "flow" subsystem of the Nimrod group of subsystems will probably be mostly discussed there. If anyone on the Nimrod WG mailing list didn't want on, my apologies in advance, but I thought that would probably also save having most of you write in to say "please add me".
The archives are available for anonymous ftp from research.ftp.com in the directory pub/flow/Archives/ (note the uppercase A!). The 'current' archive file is named 'archive'. Back archives are available as archive-ddMmmyy or archive-ddMmmyy.Z, where ddMmmyy is the date that the archive file was saved. For example, if there are two files, archive-01Mar94.Z and archive-03Mar94.Z, archive-03Mar94.Z will have the traffic from shortly before midnight on 01Mar94 up to shortly before midnight on 03Mar94.

Thanks to Frank Kastenholz for setting this up, and FTP for hosting.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa19927; 5 May 94 10:58 EDT Received: from pizza by PIZZA.BBN.COM id aa08461; 5 May 94 10:39 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa08457; 5 May 94 10:34 EDT Received: from inet-gw-2.pa.dec.com by BBN.COM id aa18376; 5 May 94 10:32 EDT Received: from nacto1.nacto.lkg.dec.com by inet-gw-2.pa.dec.com (5.65/21Mar94) id AA04319; Thu, 5 May 94 07:24:50 -0700 Received: from sneezy.nacto.lkg.dec.com by nacto1.nacto.lkg.dec.com with SMTP id AA04300; Thu, 5 May 1994 10:24:58 -0400 Received: by sneezy.nacto.lkg.dec.com (5.65/4.7) id AA04476; Thu, 5 May 1994 10:25:01 -0400 To: nimrod-wg@BBN.COM Subject: Re: Comments on "Nimrod and IPng Technical Requirements" In-Reply-To: References: X-Mailer: Poste 2.1 From: David R Oran Date: Thu, 5 May 94 10:24:59 -0400 Message-Id: <940505102459.474@sneezy.nacto.lkg.dec.com.thomas> Encoding: 30 TEXT, 6 TEXT SIGNATURE

> Is this really needed? Nimrod works by map distribution. So, looking at
> the map, I can see if there is a loop or not and deal with it. If the
> map does not show a loop, but there is a loop (i.e. the map is wrong)
> then I would imagine that there will be enough other problems that killing
> loopy packets is the least of our worries.
>

If map distribution is done by lower-level flooding, then you might get by without an explicit looping-packet detector (we used to call this "super-macho routing", since if it ever fails, it fails spectacularly). If map distribution is handled on top of the normal IPng forwarding mechanisms, then you can have a serious problem if a misbehaving router ever starts spraying these things around erroneously.

> In general, I'd suggest that we ask for 'large' length fields, and allow
> administrative limits be placed on the max values. Then, as the need arises,
> we can let the administrative limit grow. This also applies to the hop-count
> field -- use a big field, but put a 'smallish' admin limit on it.
>

The problem with punting this to the administrative domain is the difficulty of gracefully changing the value, which might be quite involved and error prone. Based on my experience in setting the count-up limit on DECnet Phase IV networks, and doing fault diagnosis where premature dropping of packets is one of many possible symptoms of non-transitive communication, I would be reluctant to endorse something which does not reasonably auto-configure based on a (conservative) assessment of the actual network diameter. I think Noel was onto the right track when he suggested a procedure similar to MTU or Router Discovery to get the diameter estimate to the hosts.

-+-+-+-+-+-+-+
David R. Oran
Phone: + 1 508 486-7377
Fax: + 1 508 486-5279
Email: oran@lkg.dec.com
Digital Equipment Corporation, LKG 1-2/A19
550 King Street, Littleton, MA 01460

Received: from PIZZA.BBN.COM by BBN.COM id aa01380; 5 May 94 13:52 EDT Received: from pizza by PIZZA.BBN.COM id aa09573; 5 May 94 13:37 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa09569; 5 May 94 13:35 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa00223; 5 May 94 13:34 EDT Received: by ginger.lcs.mit.edu id AA21928; Thu, 5 May 94 13:34:14 -0400 Date: Thu, 5 May 94 13:34:14 -0400 From: Noel Chiappa Message-Id: <9405051734.AA21928@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Comments on "Nimrod and IPng Technical Requirements" Cc: jnc@ginger.lcs.mit.edu

    From: David R Oran

    If map distribution is done by lower-level flooding, then you might get
    by without an explicit looping-packet detector (we used to call this
    "super-macho routing", since if it ever fails, it fails spectacularly).

Yah, the failure mode's what bothers me. I'm worried enough about coding faults, etc, that it seems worth guarding against.

    If map distribution is handled on top of the normal IPng forwarding
    mechanisms

Some map forwarding will have to be; if you ask for a map of some distant location, they aren't going to flood it to you. On the other hand, most "normal" local updating seems like it should be done via flooding.

    Based on my experience in setting the count-up limit on DECnet Phase IV
    networks ... I would be reluctant to endorse something which does not
    reasonably auto-configure ... a procedure similar to MTU or Router
    Discovery to get the diameter estimate to the hosts.

Yah, I agree completely. This is neither expensive, nor difficult, so I think there's no reason not to go this way.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa12897; 6 May 94 9:40 EDT Received: from pizza by PIZZA.BBN.COM id aa14944; 6 May 94 9:30 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa14940; 6 May 94 9:26 EDT Received: from wd40.ftp.com by BBN.COM id aa11914; 6 May 94 9:24 EDT Received: from ftp.com by ftp.com ; Fri, 6 May 1994 09:24:36 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Fri, 6 May 1994 09:24:36 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA11505; Fri, 6 May 94 09:23:20 EDT Date: Fri, 6 May 94 09:23:20 EDT Message-Id: <9405061323.AA11505@mailserv-D.ftp.com> To: oran@nacto.lkg.dec.com Subject: Re: Comments on "Nimrod and IPng Technical Requirements" From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: nimrod-wg@BBN.COM Content-Length: 2326

>> In general, I'd suggest that we ask for 'large' length fields, and allow
>> administrative limits be placed on the max values. Then, as the need arises,
>> we can let the administrative limit grow. This also applies to the hop-count
>> field -- use a big field, but put a 'smallish' admin limit on it.
>>
>The problem with punting this to the administrative domain is the
>difficulty of gracefully changing the value,

Yup. I was imagining a mechanism that would allow, say, 1 or 2 years for a 'phase-in' for new limits. I would hope that we can see, soon enough, when the current admin limits are going to be too small, so that a note can be published to vendors saying 'increase the limit on parameter X' and they would do it as a part of their normal release process and then get fielded in systems as a part of the normal upgrade process. Obviously, we'd need a crystal ball that has a 2 or more year range, at least for these limits.
I'll certainly admit that this might be a bit too optimistic.

> I would be reluctant to endorse something
>which does not reasonably auto-configure based on a (conservative)
>assessment of the actual network diameter. I think Noel was onto the right
>track when he suggested a procedure similar to MTU or Router Discovery to
>get the diameter estimate to the hosts.

Yeah. Would be nice if we could set up a hunk of DNS that contains network-wide configuration parameters (such as max TTL). This could feed into the "local configuration protocol" such as DHCP or BOOTP or the like.

Could also use various other feedback schemes. For example, for TTL, we could use a VJ-like 'slow-start' algorithm. We could start the TTL at a small-ish value (e.g. 16) and then send the packet. A TTL expired message would come back, so we up the TTL. We keep doing this until we either start to get responses from the far node, or the transmitting host decides that there is a loop. I'd try to do loop detection based on the locator of the node (router) sending the TTL expired message. If the locator of the router sending TTL expired message 'N' is topologically closer to the destination than the locator for the router sending TTL expired message 'N-1', then you know that you are 'making progress'.

--
Frank Kastenholz
FTP Software
2 High Street
North Andover, Mass. USA 01845
(508)685-4000

Received: from PIZZA.BBN.COM by BBN.COM id aa08163; 10 May 94 14:58 EDT Received: from pizza by PIZZA.BBN.COM id aa08356; 10 May 94 14:33 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa08352; 10 May 94 14:30 EDT Received: from inet-gw-1.pa.dec.com by BBN.COM id aa06000; 10 May 94 14:28 EDT Received: from xirtlu.zk3.dec.com by inet-gw-1.pa.dec.com (5.65/21Mar94) id AA24519; Tue, 10 May 94 11:22:46 -0700 Received: by xirtlu.zk3.dec.com; id AA29481; Tue, 10 May 1994 11:53:36 -0400 Message-Id: <9405101553.AA29481@xirtlu.zk3.dec.com> To: Noel Chiappa Cc: nimrod-wg@BBN.COM Subject: Re: IPng requirements document points.... In-Reply-To: Your message of "Thu, 28 Apr 94 16:36:57 EDT." <9404282036.AA27140@ginger.lcs.mit.edu> Date: Tue, 10 May 94 11:53:30 -0400 From: bound@zk3.dec.com X-Mts: smtp

Noel and WG,

We have an IPng Directorate Retreat May 19 and 20 in Chicago. The topics are Routing/Addressing, Autoconfig, and Transition. On our Directorate Telechat yesterday I asked if we had absorbed the NIMROD reqs yet. Frank K. was going to check with Noel, but I figured I would probe here too, as I think these requirements are good for the NEXT Internet World, which I believe will be far more complex than today, per all the user types who will want to be on the Internet.

I think the last bottom-line section of the reqs I pulled across from research.ftp.com is what needs to be sent in to the IPng Directorate. At a minimum they should be at the top of any analysis, if readers wish to view them. I say this because the Directorate has now been mandated by the IPng AD's to write up their technical reviews of the IPng proposals. So these folks are really maxed out (like me) and need concise data right now. Kind of like sending something to a high level technical manager or director who you want to read your document: give them the technical gist up front to entice them to read the rest of your technical paper. In this case that's the core requirements. I think stating what will BREAK the core beliefs and abstractions in NIMROD is critical.

/jim
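Frank's 'slow start' TTL probe, above, is concrete enough to sketch. In this toy version (the path table and the 'closeness' judgment are invented stand-ins for real TTL-expired messages and for comparing locators), the probe doubles the TTL until the far node answers, and flags a loop when a TTL-expired report comes from a router no closer to the destination than the previous one:

    def probe(path, closeness, ttl=16):
        """path: the routers a probe traverses, in order; closeness:
        router -> how near its locator is to the destination (bigger is
        closer).  Both stand in for what the real network would report."""
        last = None
        while ttl <= 1024:                  # give up eventually
            if ttl > len(path):             # probe reached the far node
                return 'reached, ttl', ttl
            expired_at = path[ttl - 1]      # hop where the TTL ran out
            if last and closeness[expired_at] <= closeness[last]:
                return 'no progress, loop near', expired_at
            last = expired_at
            ttl *= 2                        # VJ-flavored window growth
        return 'gave up at ttl', ttl

    path = ['R%d' % i for i in range(1, 21)]        # a 20-hop path
    closeness = {r: i for i, r in enumerate(path)}
    print(probe(path, closeness))                   # ('reached, ttl', 32)

Doubling rather than stepping keeps the probe count logarithmic in the path length, at the price of overshooting the diameter estimate.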
Received: from PIZZA.BBN.COM by BBN.COM id aa18482; 11 May 94 17:04 EDT Received: from pizza by PIZZA.BBN.COM id aa15853; 11 May 94 16:43 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa15849; 11 May 94 16:40 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa16888; 11 May 94 16:37 EDT Received: by ginger.lcs.mit.edu id AA09472; Wed, 11 May 94 16:37:18 -0400 Date: Wed, 11 May 94 16:37:18 -0400 From: Noel Chiappa Message-Id: <9405112037.AA09472@ginger.lcs.mit.edu> To: bound@zk3.dec.com Subject: Re: IPng requirements document points.... Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM

    From: bound@zk3.dec.com

    I think these requirements are good for the NEXT Internet World, which
    I believe will be far more complex than today, per all the user types
    who will want to be on the Internet.

By "NEXT", do you mean IPng, or some hypothetical thing after that? If you really think we're soon going to need something beyond IPng, maybe we should wait a bit, and see if we can compress two "product cycles" of the internetwork layer into one. At the very least, we should make it plain to people that IPng is an interim step...

    I think the last bottom-line section of the reqs I pulled across from
    research.ftp.com is what needs to be sent in to the IPng Directorate.

I assume you mean section 3.3, "Specific Interaction Issues"?

What about section 2.2, "Packet Format Fields"? Many of these fields describe information which ought to be provided by the hosts (e.g. the source and destination locators, flow-id, etc). If you don't think lengths ought to be included, I would strongly disagree. This group is working on actual mechanisms, and these are our best thoughts on what size fields we will need to support the designs we are doing.

There are some "world-view" sections that don't need to be in there (such as 3.1 and 3.2), but I'd say all the rest contain stuff which is of direct relevance to what Nimrod thinks it needs...

    At a minimum they should be at the top of any analysis, if readers wish
    to view them. ... So these folks are really maxed out (like me) and
    need concise data right now. Kind of like sending something to a high
    level technical manager or director who you want to read your document:
    give them the technical gist up front to entice them to read the rest
    of your technical paper.

Are you saying you want a shorter document, because this one is too long?

    In this case that's the core requirements. I think stating what will
    BREAK the core beliefs and abstractions in NIMROD is critical.

The core requirements, as best we can codify them, are in the sections I mentioned: 2.2 and 3.3. Leaving out any would tend to do serious damage, the exact nature of which (and our workarounds in response) we could discuss.

Noel

Received: from PIZZA.BBN.COM by BBN.COM id aa25221; 12 May 94 0:32 EDT Received: from pizza by PIZZA.BBN.COM id aa17907; 12 May 94 0:15 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa17901; 12 May 94 0:14 EDT Received: from inet-gw-3.pa.dec.com by BBN.COM id aa24481; 12 May 94 0:11 EDT Received: from xirtlu.zk3.dec.com by inet-gw-3.pa.dec.com (5.65/21Mar94) id AA27262; Wed, 11 May 94 21:08:41 -0700 Received: by xirtlu.zk3.dec.com; id AA05384; Thu, 12 May 1994 00:08:34 -0400 Message-Id: <9405120408.AA05384@xirtlu.zk3.dec.com> To: Noel Chiappa Cc: bound@zk3.dec.com, nimrod-wg@BBN.COM Subject: Re: IPng requirements document points.... In-Reply-To: Your message of "Wed, 11 May 94 16:37:18 EDT."
<9405112037.AA09472@ginger.lcs.mit.edu> Date: Thu, 12 May 94 00:08:28 -0400 From: bound@zk3.dec.com X-Mts: smtp

Noel,

    From: bound@zk3.dec.com

    I think these requirements are good for the NEXT Internet World, which
    I believe will be far more complex than today, per all the user types
    who will want to be on the Internet.

>By "NEXT", do you mean IPng, or some hypothetical thing after that? If you
>really think we're soon going to need something beyond IPng, maybe we should
>wait a bit, and see if we can compress two "product cycles" of the
>internetwork layer into one. At the very least, we should make it plain to
>people that IPng is an interim step...

I mean IPng; we need to get it right now. As far as IPng being an interim step, that is unacceptable to vendors and most customers in the real world. They will lose faith in the IETF if they (we) cannot figure it out with IPng.

I believe forcing the separation of EIDs and Locators permits us great flexibility for the future, putting my long term architecture hat on. This forces us to live with this model and also provides us discrete components to architect and then engineer into the year 2000. I am positive it's the right thing to do (call it 20 years of this industry's intuition).

    I think the last bottom-line section of the reqs I pulled across from
    research.ftp.com is what needs to be sent in to the IPng Directorate.

>I assume you mean section 3.3, "Specific Interaction Issues"?

>What about section 2.2, "Packet Format Fields"? Many of these fields describe
>information which ought to be provided by the hosts (e.g. the source and
>destination locators, flow-id, etc). If you don't think lengths ought to be
>included, I would strongly disagree. This group is working on actual
>mechanisms, and these are our best thoughts on what size fields we will need
>to support the designs we are doing.

I did mean 3.3, but you're right, 2.2 is required to make 3.3 work.

>There are some "world-view" sections that don't need to be in there (such as
>3.1 and 3.2), but I'd say all the rest contain stuff which is of direct
>relevance to what Nimrod thinks it needs...

    At a minimum they should be at the top of any analysis, if readers wish
    to view them. ... So these folks are really maxed out (like me) and
    need concise data right now. Kind of like sending something to a high
    level technical manager or director who you want to read your document:
    give them the technical gist up front to entice them to read the rest
    of your technical paper.

>Are you saying you want a shorter document, because this one is too long?

No, just put a quick overview and then the actual requirements up front, and then put all the rest behind it as supporting discussion. Now, I have read these discussions; that's why I think the requirements are on target. I just think it's best to give them the requirements up front. Let that be the first discussion point, not the technical philosophy. Most likely, if one dislikes the requirements, the technical philosophy discussion will begin anyway.

    In this case that's the core requirements. I think stating what will
    BREAK the core beliefs and abstractions in NIMROD is critical.

>The core requirements, as best we can codify them, are in the sections I
>mentioned: 2.2 and 3.3. Leaving out any would tend to do serious damage, the
>exact nature of which (and our workarounds in response) we could discuss.

I agree, and I think above I made it more clear what I suggested.
take care,
/jim

Received: from PIZZA.BBN.COM by BBN.COM id aa05699; 17 Jun 94 15:27 EDT Received: from pizza by PIZZA.BBN.COM id aa29321; 17 Jun 94 15:07 EDT Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa29317; 17 Jun 94 15:04 EDT To: nimrod-wg@BBN.COM Subject: New Architecture Draft Date: Fri, 17 Jun 94 15:00:42 -0400 From: Isidro Castineyra

There is a new draft of the Nimrod architecture. The draft incorporates comments and suggestions received during the last IETF and from the mail in this list. The file is in

ftp://bbn.com/pub/nimrod-wg/architecture.draft

It is an ascii file. We are hoping to put this in the Internet Draft archive at the beginning of July. Please send comments to the working group mailing list.

Thanks,
Isidro

Isidro Castineyra (isidro@bbn.com)
Bolt Beranek and Newman, Incorporated (617) 873-6233
10 Moulton Street, Cambridge, MA 02138 USA

Received: from PIZZA.BBN.COM by BBN.COM id aa09403; 20 Jun 94 12:36 EDT Received: from pizza by PIZZA.BBN.COM id aa11709; 20 Jun 94 12:19 EDT Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa11705; 20 Jun 94 12:16 EDT To: nimrod-wg@BBN.COM Subject: more stuff on bbn.com Date: Mon, 20 Jun 94 12:17:13 -0400 From: Martha Steenstrup

Hello,

There is a document describing a take on Nimrod functionality in the pub/nimrod-wg directory of bbn.com. It is a compressed postscript file, func.ps.Z, and is available via anonymous FTP. There will be a straight ASCII version later this week, but that version will lack the figures of the postscript version. So I urge you to try the postscript version first.

This document should get you thinking about how one might go about adding internetwork functionality, based on the Nimrod routing architecture. It's a sort of first step toward actual protocols. Please send your Nimrod protocol ideas and your comments on the document to the nimrod-wg mailing list.

Thanks,
Martha

Received: from PIZZA.BBN.COM by BBN.COM id aa13264; 20 Jun 94 13:36 EDT Received: from pizza by PIZZA.BBN.COM id aa12264; 20 Jun 94 13:21 EDT Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa12260; 20 Jun 94 13:19 EDT To: nimrod-wg@BBN.COM Subject: bbn.com Date: Mon, 20 Jun 94 13:20:29 -0400 From: Martha Steenstrup

Apparently, the new files have not yet been placed on bbn.com. (Security precautions at BBN preclude us from writing directly to that machine. Hence, we rely on certain designated people to place things in that directory for us.) I guess I jumped the gun on this one. I will let you know when the document is REALLY there. Sorry about that.

m

Received: from PIZZA.BBN.COM by BBN.COM id aa13727; 20 Jun 94 13:43 EDT Received: from pizza by PIZZA.BBN.COM id aa12344; 20 Jun 94 13:26 EDT Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa12340; 20 Jun 94 13:25 EDT To: nimrod-wg@BBN.COM Subject: all documents Date: Mon, 20 Jun 94 13:25:49 -0400 From: Martha Steenstrup

Hello again,

Just FTPed to bbn.com and all documents are there: the architecture document, the functionality document, the mobility document, and the multicast document. Please let us know if you have any trouble obtaining these.
Thanks,
m

Received: from PIZZA.BBN.COM by BBN.COM id aa16628; 20 Jun 94 14:28 EDT Received: from pizza by PIZZA.BBN.COM id aa12738; 20 Jun 94 14:12 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa12734; 20 Jun 94 14:09 EDT Received: from quern.epilogue.com by BBN.COM id aa15466; 20 Jun 94 14:10 EDT From: Dave Bridgham Sender: dab@epilogue.com To: nimrod-wg@BBN.COM Subject: comments on architecture draft Date: Mon, 20 Jun 94 14:08:54 -0400 Message-ID: <9406201408.aa05437@quern.epilogue.com>

    The Nimrod approach to providing this routing functionality includes
    map distribution according to the ``link-state'' paradigm,

I would describe the link-state paradigm as: each device broadcasts the state of its links through the net by flooding, which enables any device in the flooding area (the flood plain?) to construct a map of the net. This doesn't come very close to my picture of nimrod.

    A map is a graph composed of nodes and arcs. Properties of nodes and
    arcs are contained in attributes associated with them. Nimrod defines
    languages to specify these attributes and to describe maps.

I thought that one of the few conclusions we came to at the last IETF was that arcs did not have attributes. While I was one of the people who argued that it doesn't matter which way you go, because one is convertible to the other, implementation-wise I'm certainly going to convert any arc with attributes to a node, and that's how I'd usually draw it too.

It also solves the problem of where you're allowed to draw cluster boundaries. You can only cut arcs, which aren't entities; they only connect entities.

    The locator of an arc is prefixed by the locator of the node attached
    to the tail of the arc (the node the arc ``leaves''). The locators of
    all attributes of a node are also prefixed by the node's locator. For
    example, the locators for the connectivity specifications in the
    Transit Connectivity attribute are prefixed by the locator of the node.

If arcs are entities then they don't need locators at all.

    defined at this stage. An alternative would be not to assign locators
    to attributes, but assign an attribute number. The attribute would be
    identified by the locator of the node (or arc) and the attribute
    number. The concatenation of these two starts to look suspiciously
    like a locator.

I had always assumed that attributes of a node would be named by a string. Think property lists on atoms in lisp. Strings have the property that they seem more expandable than integers. Maybe from a theoretic point of view they're not really, but in practice it seems to work out that way. Imagine email header fields being named by numbers instead of things like From:, Subject:, or X-todays-funny-saying:. When I concatenate an attribute name with a locator it doesn't really look like a locator anymore.

    Nimrod has no pre-defined ``lowest level'': for example, it is possible
    to define and advertise a map that is physically realized inside a CPU.
    In this map, a node could represent, for example, a process or a group
    of processes. The user of this map need not necessarily know or care.
    (``It is turtles all the way down!'', in [3] page 63.)

Well, I've never written up anything on bottom-up locators, so I suppose it's reasonable that we're still assuming top-down.

    The main consequence of this requirement, and it is not a trivial one,
    is that ``you cannot take your locator with you.'' As an example of
    this, see figure 1, ...

With a little editing of the whitespace this was too funny.
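Dave's property-list suggestion, above, is easy to picture; a toy rendering (the names, values, and representation are all invented for illustration, and this is not any proposed Nimrod encoding) of string-named node attributes against a numeric registry:

    # String-named attributes, like lisp property lists or mail headers:
    # anyone can coin an 'X-...' name without a central registry.
    node = {'locator': 'a:b:c', 'attrs': {}}
    node['attrs']['Transit-Connectivity'] = ['conn-spec-1']
    node['attrs']['X-local-policy'] = 'no commercial transit'

    # Numbered attributes need an assigned-numbers list somewhere:
    ATTR_TRANSIT_CONNECTIVITY = 1      # hypothetical registry entry
    numeric = {'locator': 'a:b:c',
               'attrs': {ATTR_TRANSIT_CONNECTIVITY: ['conn-spec-1']}}

    # With numbers, "node locator + attribute number" concatenates into
    # something locator-shaped, which is the draft's observation; with
    # strings it plainly does not.
    print(sorted(node['attrs']))

The trade-off is the usual one: strings are self-describing and extensible without coordination, numbers are compact in packets and databases.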
    A datagram-mode packet can indicate a limited form of policy routing
    by the choice of destination and source locators. For this choice to
    exist, the source or destination endpoints must have several locators
    associated with them. This type of policy routing is capable of, for
    example, choosing providers.

I don't think you should even think about suggesting that this is possible. I seriously doubt that people will ever be able to do better than first level provider selection by having multiple locators. I'd much rather discourage policy selection by multiple locators and encourage people to come up with algorithms that use the information provided in the network maps to do it reasonably.

    The renumbering scheme described above implies that it should be
    possible to update the DNS (or its equivalent) securely and,
    relatively, dynamically. However, because renumbering will, most
    likely, be infrequent and carefully planned, we expect that the load
    on this updating mechanism should be manageable.

I suggest that if we end up using top-down locators then renumbering will not necessarily be infrequent. Also, renumbering may involve large pieces of the entire internet at times.

Dave Bridgham

Received: from PIZZA.BBN.COM by BBN.COM id aa25332; 20 Jun 94 16:25 EDT Received: from pizza by PIZZA.BBN.COM id aa13467; 20 Jun 94 16:11 EDT Received: from TTL.BBN.COM by PIZZA.BBN.COM id aa13463; 20 Jun 94 16:09 EDT To: nimrod-wg@BBN.COM Subject: even more stuff on bbn.com Date: Mon, 20 Jun 94 16:03:56 -0400 From: Ram Ramanathan

There are two more documents related to Nimrod available for anonymous FTP from bbn.com. One is on mobility support and the other on multicast support. Each of them is available in both .ps and .txt (ascii) format. Anonymous FTP to bbn.com and go to /pub/nimrod-wg. The files are:

mobility.ps
mobility.txt
multicast.ps
multicast.txt

These documents describe the requirements that a mobility/multicast solution should meet, and approaches to solutions, including examples (Mobile-IP and PIM). The style is somewhat more discussion-oriented than the architecture and functionality documents. Please send your comments and suggestions to this mailing list as soon as you can, so that we can talk about it before making it, if appropriate, an internet-draft.

- Ram.
--------------------------------------------------------------
Ram Ramanathan
Advanced Networking R & D
BBN Systems and Technologies
10 Moulton Street, Cambridge, MA 02138
Phone : (617) 873-2736
INTERNET : ramanath@bbn.com

Received: from PIZZA.BBN.COM by BBN.COM id aa17549; 22 Jun 94 17:08 EDT Received: from pizza by PIZZA.BBN.COM id aa29058; 22 Jun 94 16:54 EDT Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa29054; 22 Jun 94 16:50 EDT To: nimrod-wg@BBN.COM Subject: functionality document Date: Wed, 22 Jun 94 14:40:00 -0400 From: Martha Steenstrup

A straight text version of the Nimrod functionality document, minus the figures, should be available on bbn.com by the end of the day. The name of the file is func.txt.

Received: from PIZZA.BBN.COM by BBN.COM id aa19589; 23 Jun 94 14:45 EDT Received: from pizza by PIZZA.BBN.COM id aa05802; 23 Jun 94 14:31 EDT Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa05798; 23 Jun 94 14:28 EDT To: nimrod-wg@BBN.COM cc: isidro@BBN.COM Subject: Dave's comments to the architecutre draft Date: Thu, 23 Jun 94 14:25:29 -0400 From: Isidro Castineyra

David,

Thanks for your comments. My thoughts are below.
AD> = Architecture Draft
DC> = Dave's Comments

AD> The Nimrod approach to providing this routing functionality
AD> includes map distribution according to the ``link-state'' paradigm,

DC> I would describe the link-state paradigm as each device broadcasts
DC> the state of its links through the net by flooding which enables
DC> any device in the flooding area (the flood plain?) to construct a
DC> map of the net. This doesn't come very close to my picture of
DC> nimrod.

Are you talking about the description "link-state" or about the existence of a flooding protocol? If the former, I am not sure if what I think Nimrod is going to have should be called "link-state" or not. If the latter, I think that there is going to be a "hierarchical flooding" mechanism in operation. After all, there has to be a way of discovering a node's map to start with. Something like the following.

In general, each router participates in implementing one or more nodes. A router participates in reliably flooding updates for the maps of those nodes it implements. Consider the figure below (I hate ascii drawings), which shows only routers and their interconnections. The first figure shows the physical network, the second shows the Nimrod map (you might want to print this).

[Figures: the first shows a physical network of routers R1 (a:b), R2 (a:c), R3 (a:d:1), R4 (a:d:2), R5 (a:d:3), R6 (a:d:4), and R7 (a:e) and their interconnections; the second shows the corresponding Nimrod map of nodes a:b, a:c, a:d, and a:e, with R3 through R6 clustered inside node a:d.]

All routers are part of node a. There will be at least two floodings happening: one associated with node a:d, another associated with node a. Routers R1, R2, R3, R4, R6, R7 (but not R5) participate in the flooding associated with node a. Routers R3 to R6 participate in the flooding associated with node a:d. (R5 participates only in the a:d flooding.) R1 does not see what's happening inside a:d (nor do R2 or R7, for that matter). If R1 needs a:d's map, it would have to expressly request it (from R3, perhaps). R5 does not see what's happening outside a:d.

AD> A map is a graph composed of nodes and arcs. Properties of nodes
AD> and arcs are contained in attributes associated with them. Nimrod
AD> defines languages to specify these attributes and to describe maps.

DC> I thought that one of the few conclusions we came to at the last
DC> IETF was that arcs did not have attributes. While I was one of the
DC> people who argued that it doesn't matter which way you go because
DC> one is convertible to the other, implementation-wise I'm certainly
DC> going to convert any arc with attributes to a node and that's how
DC> I'd usually draw it too.

DC> It also solves the problem of where you're allowed to draw cluster
DC> boundaries. You can only cut arcs, which aren't entities; they only
DC> connect entities.

Ram is going to address that in another message.

AD> The locator of an arc is prefixed by the locator of the node
AD> attached to the tail of the arc (the node the arc ``leaves''). The
AD> locators of all attributes of a node are also prefixed by the
AD> node's locator. For example, the locators for the connectivity
AD> specifications in the Transit Connectivity attribute are prefixed
AD> by the locator of the node.

DC> If arcs are entities then they don't need locators at all.
One needs some way to refer to an arc when you are specifying a "source route". Rather than have other types of names, we thought that giving the arcs locators was easier. AD> defined at this stage. An alternative would be not to assign AD> locators to attributes, but assign an attribute number. The AD> attribute would be identified by the locator of the node (or arc) AD> and the attribute number. The concatenation of these two starts to AD> look suspiciously like a locator. DC> I had always assumed that attributes of a node would be named by a DC> string. Think property lists on atoms in lisp. Strings have the DC> property that they seem more expandable than integers. Maybe from DC> a theoretic point of view they're not really, but in practice it DC> seems to work out that way. Imagine email header fields being DC> named by numbers instead of things like From:, Subject, or DC> X-todays-funny-saying:. When I concatenate an attribute name with DC> a locator it doesn't really look like a locator anymore. We are going to need a way to refer to "well known attributes"; those, I think, will be strings. For example, in some data base it will say "the following are the connectivity specifications associated with this arc" (or the node's transit specifications, if we make the change above), but then you will also need to give them a name so that a packet or flow that wishes to use it can refer to it succinctly. I agree with you that perhaps it should not be called a locator, but I was trying to minimize the number of things. AD> Nimrod has no pre-defined ``lowest level'': for example, it is AD> possible to define and advertise a map that is physically realized AD> inside a CPU. In this map, a node could represent, for example, a AD> process or a group of processes. The user of this map need not AD> necessarily know or care. (``It is turtles all the way down!'', in AD> [3] page 63.) DC> Well, I've never written up anything on bottom-up locators so I DC> suppose it's reasonable that we're still assuming top-down. Actually, I am not assuming how the locators are assigned. I believe that the architecture as it stands supports both ways of assignment. Perhaps we should add an explicit note to that effect. AD> The main consequence of this requirement, and it is not a trivial AD> one, is that ``you cannot take your locator with you.'' As an AD> example of this, see figure 1, ... DC> With a little editing of the whitespace this was too funny. AD> A datagram-mode packet can indicate a limited form of policy AD> routing by the choice of destination and source locators. For this AD> choice to exist, the source or destination endpoints must have AD> several locators associated with them. This type of policy routing AD> is capable of, for example, choosing providers. DC> I don't think you should even think about suggesting that this is DC> possible. I seriously doubt that people will ever be able to do DC> better than first level provider selection by having multiple DC> locators. I'd much rather discourage policy selection by multiple DC> locators and encourage people to come up with algorithms that use DC> the information provided in the network maps to do it reasonably. I have no objection to deleting this. I just thought that our customers (Nimrod's would-be users) want to do this, and will try to do this even if we do not like it. AD> The renumbering scheme described above implies that it should be AD> possible to update the DNS (or its equivalent) securely and, AD> relatively, dynamically.
However, because renumbering will, most AD> likely, be infrequent and carefully planned, we expect that the AD> load on this updating mechanism should be manageable. DC> I suggest that if we end up using top-down locators then DC> renumbering will not necessarily be infrequent. Also, renumbering DC> may involve large pieces of the entire internet at times. Agreed. Isidro   Received: from PIZZA.BBN.COM by BBN.COM id aa26078; 23 Jun 94 15:46 EDT Received: from pizza by PIZZA.BBN.COM id aa06295; 23 Jun 94 15:33 EDT Received: from TTL.BBN.COM by PIZZA.BBN.COM id aa06291; 23 Jun 94 15:31 EDT To: nimrod-wg@BBN.COM Subject: Re: Dave's comments to the architecture draft In-reply-to: Your message of Thu, 23 Jun 94 14:25:29 -0400. Date: Thu, 23 Jun 94 15:30:02 -0400 From: Ram Ramanathan >AD> A map is a graph composed of nodes and arcs. Properties of nodes >AD> and arcs are contained in attributes associated with them. Nimrod >AD> defines languages to specify these attributes and to describe maps. > DC> I thought that one of the few conclusions we came to at the last > DC> IETF was that arcs did not have attributes. While I was one of the > DC> people who argued that it doesn't matter which way you go because > DC> one is convertible to the other, implementationwise I'm certainly > DC> going to convert any arc with attributes to a node and that's how > DC> I'd usually draw it too. > DC> It also solves the problem of where you're allowed to draw cluster > DC> boundaries. You can only cut arcs, which aren't entities; they only > DC> connect entities. >Ram is going to address that in another message. I don't think it is a big problem that needs "addressing", but my opinion is that the architecture only specifies that a node *can* have attributes and an arc *can* have attributes. As an implementor, one may choose to have attributes for arcs or not. As Dave mentions, they are both functionally equivalent. I am not in favor of *precluding* arcs from having attributes. However, perhaps the text should be changed to bring out the point more clearly - something like "A map is a graph composed of nodes and arcs. Nodes and arcs may have attributes associated with them. Nimrod specifies the language ...". Regarding clustering, I consider it as replacement of one map with another. Both nodes and arcs can be aggregated. Aggregation results in a new set of nodes and a new set of arcs, with an associated mapping between the old and the new. How the mapping is stored and processed is a subject that belongs in the protocol document. - Ram.   Received: from PIZZA.BBN.COM by BBN.COM id aa11269; 24 Jun 94 10:32 EDT Received: from pizza by PIZZA.BBN.COM id aa10989; 24 Jun 94 10:19 EDT Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa10985; 24 Jun 94 10:16 EDT To: Ram Ramanathan cc: nimrod-wg@BBN.COM Subject: Re: Dave's comments to the architecture draft In-reply-to: Your message of Thu, 23 Jun 94 15:30:02 -0400. Date: Fri, 24 Jun 94 10:13:14 -0400 From: Isidro Castineyra >>Regarding clustering, I consider it as replacement of one map with another. >>Both nodes and arcs can be aggregated. Aggregation results in a new set of >>nodes and a new set of arcs, with an associated mapping between the old >>and the new. How the mapping is stored and processed is a subject that >>belongs in the protocol document. >> I am not convinced that this mapping should be part of Nimrod. I imagine that implementations will keep such a mapping. But how this mapping would be useful to a user of the map is not clear to me.
Moreover, requiring a mapping might be more of a hindrance than a help. Isidro   Received: from PIZZA.BBN.COM by BBN.COM id aa26835; 24 Jun 94 15:02 EDT Received: from pizza by PIZZA.BBN.COM id aa13527; 24 Jun 94 14:41 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa13523; 24 Jun 94 14:39 EDT Received: from quern.epilogue.com by BBN.COM id aa25678; 24 Jun 94 14:39 EDT From: Dave Bridgham Sender: dab@epilogue.com To: isidro@BBN.COM CC: nimrod-wg@BBN.COM In-reply-to: Isidro Castineyra's message of Thu, 23 Jun 94 14:25:29 -0400 <9406231437.aa04238@quern.epilogue.com> Subject: Dave's comments to the architecture draft Date: Fri, 24 Jun 94 14:39:21 -0400 Message-ID: <9406241439.aa11386@quern.epilogue.com> This response contains rather vast quantities of included message. Sorry about that. Date: Thu, 23 Jun 94 14:25:29 -0400 From: Isidro Castineyra AD> = Architecture Draft DC> = Dave's Comments AD> The Nimrod approach to providing this routing functionality AD> includes map distribution according to the ``link-state'' paradigm, DC> I would describe the link-state paradigm as each device broadcasts DC> the state of its links through the net by flooding which enables DC> any device in the flooding area (the flood plain?) to construct a DC> map of the net. This doesn't come very close to my picture of DC> nimrod. Are you talking about the description "link-state" or about the existence of a flooding protocol? If the former, I am not sure if what I think Nimrod is going to have should be called "link-state" or not. If the latter, I think that there is going to be a "hierarchical flooding" mechanism in operation. After all, there has to be a way of discovering a node's map to start with. Something like the following. In general, each router participates in implementing one or more nodes. A router participates in reliably flooding updates for the maps of those nodes it implements. Consider the figure below (I hate ascii drawings) which shows only routers and their interconnections. The first figure shows a physical network, the second shows the Nimrod map (you might want to print this) Ah, I see, I think. I misunderstood just which part of Nimrod you were talking about. I thought you were talking about the map distribution part where route calculators talk to various map servers in the process of figuring out a route to somewhere. Now I think you were talking about how the maps are generated in the first place. Very likely the lowest level map of a nimrod map hierarchy would be produced by a link state protocol of some sort. Of course something else could work here, but I see link state as the most likely at this time. However, I'm not sure this follows for any layers above that. The map at one layer will get abstracted and passed up. For now I assume the abstraction will be largely very simple with hand tuning; maybe someday we'll get good algorithms for automating this. The layer above I assumed would take maps from below to build its maps. I guess this could be described as being similar to the link-state paradigm. One needs some way to refer to an arc when you are specifying a "source route". Rather than have other types of names, we thought that giving the arcs locators was easier. This is true if you give arcs attributes. It is not true if you don't. You only need to specify an arc in a source route if arcs are somehow related to some entity. If they're not, then they don't get specified in source routes and they don't need names.
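(To make this point concrete, here is a rough sketch -- not from any Nimrod document, all names hypothetical -- of what a source route looks like when arcs are attribute-free: the map is nodes plus a neighbor relation, and a route is just a sequence of node locators, so arcs never need names of their own.)

    # Hypothetical sketch: attribute-free arcs are pure connectivity, so a
    # source route is simply a list of node locators.

    class Node:
        def __init__(self, locator, attributes=None):
            self.locator = locator              # e.g. "a:d:1"
            self.attributes = attributes or {}  # only nodes carry attributes
            self.neighbors = set()

    def link(a, b):
        # An arc is unnamed; at most one is ever needed per node pair.
        a.neighbors.add(b)
        b.neighbors.add(a)

    def valid_source_route(nodes_by_locator, route):
        # A route is valid if consecutive locators name linked nodes.
        hops = [nodes_by_locator[loc] for loc in route]
        return all(b in a.neighbors for a, b in zip(hops, hops[1:]))

(If arcs carried attributes, a node pair might need several distinguishable arcs between them, and a route would then have to say which one it uses -- which is where the question of arc locators comes from.)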
We are going to need a way to refer to "well known attributes"; those, I think, will be strings. For example, in some data base it will say "the following are the connectivity specifications associated with this arc" (or the node's transit specifications, if we make the change above), but then you will also need to give them a name so that a packet or flow that wishes to use it can refer to it succinctly. I agree with you that perhaps it should not be called a locator, but I was trying to minimize the number of things. I thought the idea was to encapsulate all the various choices made by the route generator in the source route or flow spec. You seem to be suggesting here that some of the information used in choosing the route is going to be embedded in the source route or flow spec. I assume so that part of the route choice can be made out in the network somewhere. As for the naming, you say the connectivity spec would have a string name and a, presumably different, name so the flow can refer to it succinctly. From the original message I assume this second name would be the number then. I wouldn't use the number at all, but I'm weird. AD> A datagram-mode packet can indicate a limited form of policy AD> routing by the choice of destination and source locators. For this AD> choice to exist, the source or destination endpoints must have AD> several locators associated with them. This type of policy routing AD> is capable of, for example, choosing providers. DC> I don't think you should even think about suggesting that this is DC> possible. I seriously doubt that people will ever be able to do DC> better than first level provider selection by having multiple DC> locators. I'd much rather discourage policy selection by multiple DC> locators and encourage people to come up with algorithms that use DC> the information provided in the network maps to do it reasonably. I have no objection to deleting this. I just thought that our customers (Nimrod's would-be users) want to do this, and will try to do this even if we do not like it. I'm sure some will; I'd like them to be as few as possible. I'd rather spend the time up front building Nimrod from the beginning such that it does the right thing without such kludges. I believe the architecture has it in it and I'd not like that lost in the rush of people setting up their sites with multiple locators to do first level provider selection. Dave   Received: from PIZZA.BBN.COM by BBN.COM id aa00160; 24 Jun 94 15:36 EDT Received: from pizza by PIZZA.BBN.COM id aa13792; 24 Jun 94 15:20 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa13788; 24 Jun 94 15:18 EDT Received: from quern.epilogue.com by BBN.COM id aa28781; 24 Jun 94 15:16 EDT From: Dave Bridgham Sender: dab@epilogue.com To: ramanath@BBN.COM CC: nimrod-wg@BBN.COM In-reply-to: Ram Ramanathan's message of Thu, 23 Jun 94 15:30:02 -0400 <9406231538.aa04470@quern.epilogue.com> Subject: Dave's comments to the architecture draft Date: Fri, 24 Jun 94 15:15:56 -0400 Message-ID: <9406241515.aa11674@quern.epilogue.com> Date: Thu, 23 Jun 94 15:30:02 -0400 From: Ram Ramanathan >AD> A map is a graph composed of nodes and arcs. Properties of nodes >AD> and arcs are contained in attributes associated with them. Nimrod >AD> defines languages to specify these attributes and to describe maps. > DC> I thought that one of the few conclusions we came to at the last > DC> IETF was that arcs did not have attributes.
While I was one of the > DC> people who argued that it doesn't matter which way you go because > DC> one is convertible to the other, implementationwise I'm certainly > DC> going to convert any arc with attributes to a node and that's how > DC> I'd usually draw it too. > DC> It also solves the problem of where you're allowed to draw cluster > DC> boundaries. You can only cut arcs, which aren't entities; they only > DC> connect entities. I can go either way here, my biggest concern was that I thought we came to an agreement on this at the Seattle IETF. An agreement the other way. Even though I was arguing for the side we didn't agree to, I've come since to believe that Noel was right. I think it really does work better to make anything with attributes a node, with links as non-entities that link the nodes together. I don't think it is a big problem that needs "addressing", but my opinion is that the architecture only specifies that a node *can* have attributes and an arc *can* have attributes. As an implementor, one may choose to have attributes for arcs or not. As Dave mentions, they are both functionally equivalent. I am not in favor of *precluding* arcs from having attributes. However, perhaps the text should be changed to bring out the point more clearly - something like "A map is a graph composed of nodes and arcs. Nodes and arcs may have attributes associated with them. Nimrod specifies the language ...". As an implementor, if arcs can have attributes I'd better implement handling arcs with attributes. Either that or have an input converter to re-write any maps containing arcs with attributes into one with only nodes with attributes. Dave   Received: from PIZZA.BBN.COM by BBN.COM id aa03174; 24 Jun 94 16:28 EDT Received: from pizza by PIZZA.BBN.COM id aa14184; 24 Jun 94 16:12 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa14180; 24 Jun 94 16:10 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa02027; 24 Jun 94 16:07 EDT Received: by ginger.lcs.mit.edu id AA09260; Fri, 24 Jun 94 16:07:46 -0400 Date: Fri, 24 Jun 94 16:07:46 -0400 From: Noel Chiappa Message-Id: <9406242007.AA09260@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Re: Dave's comments to the architecture draft Cc: jnc@ginger.lcs.mit.edu From: Dave Bridgham >> I thought that one of the few conclusions we came to at the last >> IETF was that arcs did not have attributes. I can go either way here, my biggest concern was that I thought we came to an agreement on this at the Seattle IETF. There was a lot of heat (and some light :-) about the node and arc model, but I don't recall the exact outcome on this specific issue (do arcs have attributes). I do remember Isidro sort of capitulating and saying "OK, we'll do it that way", and me being unhappy because I wanted people to go with the node/arc model I suggested only if they were convinced it was the Right Thing, not because I was stubborn. I've come since to believe that Noel was right. :-) I think it really does work better to make anything with attributes a node, with links as non-entities that link the nodes together. It certainly makes the implementation more of a simple step from the formal description. The one thing that still worries me is that that formalism isn't the closest match to the kinds of pictures we all draw (where the interfaces are arcs, and nodes represent nets and boxes).
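(Dave's "input converter" is easy to picture. A minimal, hypothetical sketch: rewrite every arc that carries attributes into an intermediate node, so the rest of the implementation only ever sees attributes on nodes. The naming scheme for the invented middle nodes is purely illustrative -- choosing their locators sensibly is exactly the hard part, as comes up later in the thread.)

    # Hypothetical converter: arcs arrive as (end_a, end_b, attributes)
    # triples, where attributes may be None; the output map has attributes
    # on nodes only, plus attribute-free links as locator pairs.

    def convert(node_attrs, arcs):
        links = []
        for i, (a, b, attrs) in enumerate(arcs):
            if not attrs:
                links.append((a, b))
            else:
                mid = "%s|%s|%d" % (a, b, i)  # illustrative locator only
                node_attrs[mid] = attrs       # the arc becomes a node
                links.append((a, mid))
                links.append((mid, b))
        return links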
On the other hand, I don't know how else to deal with the fact that interfaces are going to have attributes, unless we make the interfaces attributes of the router/host nodes, and then all the interface "attributes" are sub-attributes of the interface attribute... Sigh, I keep promising to go off and think about the node/arc model hard, but I keep not having the time. These stupid arguments on Big-I about "should TLN's be different from TSILN's", and then the fixed/variable stuff, keep wasting valuable time. Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa00978; 24 Jun 94 23:38 EDT Received: from pizza by PIZZA.BBN.COM id aa16096; 24 Jun 94 23:27 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa16092; 24 Jun 94 23:25 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa24637; 24 Jun 94 23:25 EDT Received: by ginger.lcs.mit.edu id AA11059; Fri, 24 Jun 94 23:25:31 -0400 Date: Fri, 24 Jun 94 23:25:31 -0400 From: Noel Chiappa Message-Id: <9406250325.AA11059@ginger.lcs.mit.edu> To: dab@epilogue.com, isidro@BBN.COM Subject: Re: Dave's comments to the architecture draft Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM From: Dave Bridgham AD> The Nimrod approach to providing this routing functionality AD> includes map distribution according to the ``link-state'' paradigm, I must confess a certain amount of unease with the use of "link state" here. To me, LS invokes a mental image of things like the new ARPAnet algorithm, IS-IS, OSPF, etc.; i.e. routing architectures which depend on synchronized databases and identical route calculations to make a hop-by-hop routing paradigm work. I prefer to put IDPR and Nimrod in a class I call "map distribution", of which LS is a subset. > One needs some way to refer to an arc when you are specifying a > "source route". Rather than have other types of names, we thought that > giving the arcs locators was easier. This is true if you give arcs attributes. It is not true if you don't. I'd put it that you only need to name the arcs if you allow more than one arc between a pair of nodes; if you don't, then there's no possible confusion. If arcs just represent connectivity, with no attributes, then you don't need to have more than one between any node pair (the second one contains 0 information), but if arcs have attributes, you need to be able to allow multiple arcs. But then we get back to whether arcs have attributes, and what the data will look like anyway... Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa21037; 29 Jun 94 11:52 EDT Received: from pizza by PIZZA.BBN.COM id aa12122; 29 Jun 94 11:29 EDT Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa12118; 29 Jun 94 11:26 EDT To: nimrod-wg@BBN.COM Subject: Administrivia Date: Wed, 29 Jun 94 11:21:34 -0400 From: Isidro Castineyra A couple of things to get us organized for IETF. 1.- The current draft Nimrod documents have been put in host bbn.com in directory /pub/nimrod-wg. These files are accessible via anonymous ftp. There are four different documents and seven files there:
    architecture.draft   draft architecture
    func.ps.Z            postscript version of the functionality
    func.txt             text version of the functionality
    mobility.ps          postscript version of Nimrod's approach to mobility
    mobility.txt         text version of Nimrod's approach to mobility
    multicast.ps         postscript version of Nimrod's approach to multicast
    multicast.txt        text version of Nimrod's approach to multicast
We plan to submit these documents to the Internet Draft archive by the end of the first week of July unless there are objections.
Please send your comments to the list so we can incorporate them. 2.- I will be sending out a draft agenda soon. Let me know of any items you would like to see in it. Thanks, Isidro   Received: from PIZZA.BBN.COM by BBN.COM id aa25995; 29 Jun 94 13:09 EDT Received: from pizza by PIZZA.BBN.COM id aa12781; 29 Jun 94 12:59 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa12777; 29 Jun 94 12:56 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa25078; 29 Jun 94 12:54 EDT Received: by ginger.lcs.mit.edu id AA12237; Wed, 29 Jun 94 12:54:52 -0400 Date: Wed, 29 Jun 94 12:54:52 -0400 From: Noel Chiappa Message-Id: <9406291654.AA12237@ginger.lcs.mit.edu> To: nimrod-wg@BBN.COM Subject: Locators and EID's Cc: jnc@ginger.lcs.mit.edu Everyone on this WG mailing list who thinks that the internetwork should have transport-level names (e.g. EID's) which are separate from routing names (e.g. locators) needs to respond to the recent query from the IPng AD's on the Big-Internet mailing list about whether people want one "name" or two. I'd guess that many of you have given up reading it, but many people whom I know favor this split haven't responded, so please, take the time to do so. Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa05050; 29 Jun 94 15:16 EDT Received: from pizza by PIZZA.BBN.COM id aa13802; 29 Jun 94 15:04 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa13798; 29 Jun 94 15:00 EDT Received: from inet-gw-2.pa.dec.com by BBN.COM id aa04000; 29 Jun 94 15:00 EDT Received: from xirtlu.zk3.dec.com by inet-gw-2.pa.dec.com (5.65/27May94) id AA03691; Wed, 29 Jun 94 11:52:12 -0700 Received: by xirtlu.zk3.dec.com; id AA26345; Wed, 29 Jun 1994 14:52:03 -0400 Message-Id: <9406291852.AA26345@xirtlu.zk3.dec.com> To: Noel Chiappa Cc: nimrod-wg@BBN.COM, sob@hsdndev.harvard.edu, mankin@cmf.nrl.navy.mil, pvm@isi.edu Subject: Re: Locators and EID's In-Reply-To: Your message of "Wed, 29 Jun 94 12:54:52 EDT." <9406291654.AA12237@ginger.lcs.mit.edu> Date: Wed, 29 Jun 94 14:51:56 -0400 From: bound@zk3.dec.com X-Mts: smtp Noel, > Everyone on this WG mailing list who thinks that the internetwork >should have transport-level names (e.g. EID's) which are separate from >routing names (e.g. locators) needs to respond to the recent query from the >IPng AD's on the Big-Internet mailing list about whether people want one >"name" or two. > I'd guess that many of you have given up reading it, but many people >whom I know favor this split haven't responded, so please, take the time to >do so. Not at all. Just very busy. Also a lot of us answered this on the SIPP list. My response was that I only wanted one name for my system. My new Internet node to test IPng, for example, is called sipper (not available yet to incoming packets; still working on the filters). I don't want to remember any other name as an end user. I want that to all be transparent to me. So what I am saying is that I don't want to have to call my node sipper.jimbo. If other qualifiers like zk3.dec.com constitute a name then that's OK. I hate to see the X.400 naming strings in any address, so it's a taste issue, not a technical issue. On TSNs or EIDs/Locators you know I agree with that split at the network packet level and software that drives that separation. But for IPng I think the best we can do is make sure that IPng does not preclude in the future a simple change to IPng to use EIDs and Locators. This can be accomplished with a carefully defined IPng header, address space, and source route.
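(Jim's "carefully defined IPng header" can be illustrated with a toy layout -- purely hypothetical, not any proposed IPng format -- in which fixed-size EIDs name the endpoints and variable-length locators are separate fields that routing is free to rewrite without disturbing the transport-level names.)

    import struct

    # Toy header, for illustration only: 8-byte EIDs, then length-prefixed
    # source and destination locators. Routing may rewrite the locators;
    # transport binds only to the EID pair.

    def pack_header(src_eid, dst_eid, src_loc, dst_loc):
        assert len(src_eid) == 8 and len(dst_eid) == 8
        return (src_eid + dst_eid
                + struct.pack("!H", len(src_loc)) + src_loc
                + struct.pack("!H", len(dst_loc)) + dst_loc)

    def unpack_header(data):
        src_eid, dst_eid, off = data[:8], data[8:16], 16
        (n,) = struct.unpack_from("!H", data, off); off += 2
        src_loc = data[off:off + n]; off += n
        (n,) = struct.unpack_from("!H", data, off); off += 2
        dst_loc = data[off:off + n]
        return src_eid, dst_eid, src_loc, dst_loc

(A connection bound to the EID pair survives a change of locators, which is the mobility property Valdis raises further on.)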
I also think we need a separate working group to just work on EIDs and what they mean, and get some implementation experience based on a spec. /jim   Received: from PIZZA.BBN.COM by BBN.COM id aa15181; 30 Jun 94 10:51 EDT Received: from pizza by PIZZA.BBN.COM id aa19768; 30 Jun 94 10:35 EDT Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa19764; 30 Jun 94 10:31 EDT To: nimrod-wg@BBN.COM Subject: Draft Nimrod Agenda for Toronto Date: Thu, 30 Jun 94 10:28:33 -0400 From: Isidro Castineyra This is the draft agenda for the Toronto IETF. Please send comments and suggestions. Thanks, Isidro ----------------
Group Name: Nimrod - The New Internet Routing and Addressing Architecture
IETF Area: Routing
Date/Time: Tuesday, July 26, 1994 1600-1800 EST (multicast)
           Wednesday, July 27, 1994 1600-1800 EST
Proposed Agenda -- First Session
1. Agenda bashing
2. Architecture
   a. Update (Isidro Castineyra)   30min
   b. Questions                    30min
4. Nimrod Functionality
   a. Overview (Martha Steenstrup) 30min
   b. Questions                    30min
Proposed Agenda -- Second Session
6. Implementation Sketch
   a. Database Structuring (Isidro Castineyra)          15min
   b. Protocol Mechanisms (Ram Ramanathan)              15min
   c. Mapping Functionality to Databases and Protocols  15min
      (Martha Steenstrup)
   d. Discussion                                        30min
8. Multicast and Mobility
   a. Update (Ram Ramanathan)     15min
   b. Questions                   15min
7. Open Issues and Work Plan      15min
   Received: from PIZZA.BBN.COM by BBN.COM id aa20272; 30 Jun 94 12:00 EDT Received: from pizza by PIZZA.BBN.COM id aa20294; 30 Jun 94 11:41 EDT Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa20290; 30 Jun 94 11:39 EDT To: Noel Chiappa cc: nimrod-wg@BBN.COM Subject: Re: Dave's comments to the architecture draft In-reply-to: Your message of Fri, 24 Jun 94 16:07:46 -0400. <9406242007.AA09260@ginger.lcs.mit.edu> Date: Thu, 30 Jun 94 11:27:23 -0400 From: Isidro Castineyra I think David and Noel are right that we agreed to have arcs with no attributes. I'll re-write the architecture document with that in mind. There are basically two approaches I can think of: 1.- As mentioned by Noel: make the interfaces attributes of the router/host nodes, and then all the interface "attributes" are sub-attributes of the interface attribute... The only problem is that in the current draft "interface" is not used as a concept. (The previous draft used something equivalent, the node connecting point, or something like that, but we got rid of that.) 2.- Represent arcs with attributes as two arcs with a node in the middle. I really do not like this approach. These nodes would have to have locators, which I do not know how to assign reasonably. Isidro   Received: from PIZZA.BBN.COM by BBN.COM id aa22367; 30 Jun 94 12:39 EDT Received: from pizza by PIZZA.BBN.COM id aa20709; 30 Jun 94 12:21 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa20705; 30 Jun 94 12:19 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa21163; 30 Jun 94 12:14 EDT Received: by ginger.lcs.mit.edu id AA19695; Thu, 30 Jun 94 12:14:08 -0400 Date: Thu, 30 Jun 94 12:14:08 -0400 From: Noel Chiappa Message-Id: <9406301614.AA19695@ginger.lcs.mit.edu> To: isidro@BBN.COM, jnc@ginger.lcs.mit.edu Subject: Re: Dave's comments to the architecture draft Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM From: Isidro Castineyra we agreed to have arcs with no attributes. ... There are basically two approaches I can think of: ... make the interfaces attributes of the router/host nodes ... Represent arcs with attributes as two arcs with a node in the middle.
I'll try and find some time to think about this (if this stupid IPng drivel will let up on Big-I)... I really do not like this approach. These nodes would have to have locators, which I do not know how to assign reasonably. Well, maybe it's not so bad as all that. If those nodes represent interfaces, then it's legit for them to have locators, and you can either assign them as subsidiary to the locator of the router node, or the locator of the network node. The Internet has previously always done the second, since that's what makes most sense in a routing-table world, but I can see cases where it's useful to have them associated with the router. (Maybe we do both, with overlapping areas, nasty as that sounds.) Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa22407; 30 Jun 94 12:40 EDT Received: from pizza by PIZZA.BBN.COM id aa20728; 30 Jun 94 12:22 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa20724; 30 Jun 94 12:20 EDT Received: from quern.epilogue.com by BBN.COM id aa21231; 30 Jun 94 12:15 EDT From: Dave Bridgham Sender: dab@epilogue.com To: nimrod-wg@BBN.COM In-reply-to: Isidro Castineyra's message of Thu, 30 Jun 94 11:27:23 -0400 <9406301150.aa25045@quern.epilogue.com> Subject: Dave's comments to the architecture draft Date: Thu, 30 Jun 94 12:15:28 -0400 Message-ID: <9406301215.aa25247@quern.epilogue.com> Date: Thu, 30 Jun 94 11:27:23 -0400 From: Isidro Castineyra 1.- As mentioned by Noel: make the interfaces attributes of the router/host nodes, and then all the interface "attributes" are sub-attributes of the interface attribute... The only problem is that in the current draft "interface" is not used as a concept. (The previous draft used something equivalent, the node connecting point, or something like that, but we got rid of that.) Huh? Some nodes are networks, some are interfaces, some are hosts, some are aggregates, and some were invented as a place to stash some extra attributes. When you build the map (for any value of `you') you just put in map nodes wherever you need attributes. The map distribution language needs to be able to specify nodes, the attributes of each, the links between them, and the links off this map to other maps. Route generators need to be able to read a map in the map distribution language and understand the attributes so that they can pick a sequence of nodes whose attributes are all acceptable. I suggest that attributes are tuples of <tag, value> where both the tag and the value are strings. I don't know if the tags should be hierarchical or flat. We'll spend a lot of time over the next few decades adding new tag types. 2.- Represent arcs with attributes as two arcs with a node in the middle. I really do not like this approach. These nodes would have to have locators, which I do not know how to assign reasonably. Take the map of nodes and arcs. Draw clustering circles. Don't locators just fall out? The top-down vs bottom-up issue is still there, but given a map with clustering circles drawn, the locators shouldn't be that hard to come up with. Oh yeah, this reminds me of something I mentioned to Noel on the phone a while back. I thought it made the whole issue of reasonably assigning locators much simpler but Noel said his head hurt and hung up. I mostly forgot about it until now. Next message.
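(Dave's "don't locators just fall out?" can be sketched directly. A hypothetical example, assuming the clustering circles have already been drawn and expressed as a containment tree: a node's locator is just the path of cluster names from the top down.)

    # Hypothetical sketch: clusters as a child -> parent containment tree;
    # a locator falls out as the name path from the top-level cluster.

    clusters = {
        "a": None,   # top-level cluster
        "d": "a",    # cluster a:d drawn inside a
        "1": "d",    # node a:d:1 inside a:d
        "2": "d",    # node a:d:2 inside a:d
    }

    def locator(name):
        parts = []
        while name is not None:
            parts.append(name)
            name = clusters[name]
        return ":".join(reversed(parts))

    print(locator("1"))   # -> a:d:1

(The top-down vs. bottom-up question is then only about who chooses the cluster names, not about how the locators compose.)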
Dave   Received: from PIZZA.BBN.COM by BBN.COM id aa23159; 30 Jun 94 12:56 EDT Received: from pizza by PIZZA.BBN.COM id aa20901; 30 Jun 94 12:42 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa20897; 30 Jun 94 12:39 EDT Received: from quern.epilogue.com by BBN.COM id aa22276; 30 Jun 94 12:38 EDT From: Dave Bridgham Sender: dab@epilogue.com To: nimrod-wg@BBN.COM Subject: grouping strategies Date: Thu, 30 Jun 94 12:37:57 -0400 Message-ID: <9406301237.aa25449@quern.epilogue.com> One rathole we've fallen in more than once is how do we group things in the maps? Specifically, if we have nodes for hosts, networks, and interfaces, does the interface node group with the host or the network? This provided one of the stronger arguments for why we needed overlapping areas; group the interface both ways. Well, how about another solution? I claim that the above thinking is unnecessarily close to IPv4 routing or hierarchical routing. Nimrod allows more flexibility; use it. In particular, put all the nodes for hosts, networks, and interfaces at a site in the same group. No sub-grouping. If the resulting aggregation is not so large as to be unwieldy, this seems like it would be much easier. In other words, within a single site, say a company with no more than 1000 to 5000 hosts, I'd just have a flat locator hierarchy to the hosts. I'd go to more groupings if the site was too large or if there were policy reasons for more structure in the grouping. Probably administrative reasons would cause more groups too. Implementationwise, I picture something like ES-IS to let the packet forwarders know about all the hosts, and link-state-like flooding between the packet forwarders to create the lowest level map. Dave   Received: from PIZZA.BBN.COM by BBN.COM id aa23853; 30 Jun 94 13:12 EDT Received: from pizza by PIZZA.BBN.COM id aa21052; 30 Jun 94 12:56 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa21048; 30 Jun 94 12:54 EDT Received: from wd40.ftp.com by BBN.COM id aa23073; 30 Jun 94 12:55 EDT Received: from ftp.com by ftp.com ; Thu, 30 Jun 1994 12:54:58 -0400 Received: from mailserv-D.ftp.com by ftp.com ; Thu, 30 Jun 1994 12:54:58 -0400 Received: by mailserv-D.ftp.com (5.0/SMI-SVR4) id AA13913; Thu, 30 Jun 94 12:53:01 EDT Date: Thu, 30 Jun 94 12:53:01 EDT Message-Id: <9406301653.AA13913@mailserv-D.ftp.com> To: dab@epilogue.com Subject: Re: grouping strategies From: Frank Kastenholz Reply-To: kasten@ftp.com Cc: nimrod-wg@BBN.COM Content-Length: 687 > In other words, > within a single site, say a company with no more than 1000 to 5000 > hosts, I'd just have a flat locator hierarchy to the hosts. > > I'd go to more groupings if the site was too large or if there were > policy reasons for more structure in the grouping. Probably > administrative reasons would cause more groups too. > > Implementationwise, I picture something like ES-IS to let the packet > forwarders know about all the hosts, and link-state-like flooding > between the packet forwarders to create the lowest level map. How is this different from bridging? -- Frank Kastenholz FTP Software 2 High Street North Andover, Mass.
USA 01845 (508)685-4000   Received: from PIZZA.BBN.COM by BBN.COM id aa24640; 30 Jun 94 13:21 EDT Received: from pizza by PIZZA.BBN.COM id aa21162; 30 Jun 94 13:06 EDT Received: from BBN.COM by PIZZA.BBN.COM id ab21155; 30 Jun 94 13:04 EDT Received: from quern.epilogue.com by BBN.COM id aa23487; 30 Jun 94 13:05 EDT From: Dave Bridgham Sender: dab@epilogue.com To: kasten@ftp.com CC: nimrod-wg@BBN.COM In-reply-to: Frank Kastenholz's message of Thu, 30 Jun 94 12:53:01 EDT <9406301653.AA13913@mailserv-D.ftp.com> Subject: grouping strategies Date: Thu, 30 Jun 94 13:05:11 -0400 Message-ID: <9406301305.aa25646@quern.epilogue.com> Date: Thu, 30 Jun 94 12:53:01 EDT From: Frank Kastenholz How is this different from bridging?
- Broadcasts don't pass through packet forwarders.
- You don't flood packets until you've learned where the host lives.
- It isn't media-layer transparent.
- It'll work between different media.
- The maps thus generated could have attributes with policy and performance information.
- It can work with resource reservation systems.
- I like this scheme and I don't like bridging.
- That's what floats immediately to the top of my mind.
- One thing I didn't say in the first message. I'm not saying that I'd require Nimrod users to set up their site as I've suggested. They can do as they like. I'm saying they could, and if they did, life gets easier and this recurring issue about whether we group the interface node with the host or the network goes away.
Dave   Received: from PIZZA.BBN.COM by BBN.COM id aa25250; 30 Jun 94 13:26 EDT Received: from pizza by PIZZA.BBN.COM id aa21239; 30 Jun 94 13:12 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa21235; 30 Jun 94 13:10 EDT Received: from black-ice.cc.vt.edu by BBN.COM id aa23684; 30 Jun 94 13:09 EDT Received: (from valdis@localhost) by black-ice.cc.vt.edu (8.6.9/8.6.9) id NAA21328; Thu, 30 Jun 1994 13:09:18 -0400 Message-Id: <199406301709.NAA21328@black-ice.cc.vt.edu> To: Noel Chiappa cc: nimrod-wg@BBN.COM Subject: Re: Locators and EID's In-reply-to: Your message of "Wed, 29 Jun 1994 12:54:52 EDT." <9406291654.AA12237@ginger.lcs.mit.edu> From: Valdis.Kletnieks@vt.edu Date: Thu, 30 Jun 1994 13:09:18 +22306356 Sender: valdis@black-ice.cc.vt.edu Noel, the AD, and anybody else who cares: EID's are not locators. They address subtly different issues which may need to be dealt with in IPng (the areas where I see the distinction making the most difference are mobility, multicasting, and anything else we devise where we don't want to nail down both ends of the connection with really big spikes (a la the current [srcaddr,srcport,dstaddr,dstport] quads that specify IP V4 connections)). /Valdis   Received: from PIZZA.BBN.COM by BBN.COM id aa27786; 30 Jun 94 13:57 EDT Received: from pizza by PIZZA.BBN.COM id aa21505; 30 Jun 94 13:41 EDT Received: from KARIBA.BBN.COM by PIZZA.BBN.COM id aa21501; 30 Jun 94 13:39 EDT To: Noel Chiappa cc: nimrod-wg@BBN.COM Subject: Re: Dave's comments to the architecture draft In-reply-to: Your message of Thu, 30 Jun 94 12:14:08 -0400. <9406301614.AA19695@ginger.lcs.mit.edu> Date: Thu, 30 Jun 94 13:35:49 -0400 From: Isidro Castineyra >> I really do not like this approach. These nodes would have to have >> locators, which I do not know how to assign reasonably. >> >>Well, maybe it's not so bad as all that. If those nodes represent interfaces, >>then it's legit for them to have locators, and you can either assign them as >>subsidiary to the locator of the router node, or the locator of the network >>node.
Something with a locator subsidiary to the locator of a node (i.e., having its locator prefixed by the locator of a node) should basically be *inside* the node. I really do not like having it hanging way out, separated by a full arc. Anyway, I would prefer not to talk about routers in this context. It is too much an implementation issue (or node-realization issue). Isidro   Received: from PIZZA.BBN.COM by BBN.COM id aa08311; 30 Jun 94 15:50 EDT Received: from pizza by PIZZA.BBN.COM id aa22510; 30 Jun 94 15:34 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa22506; 30 Jun 94 15:31 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa06450; 30 Jun 94 15:28 EDT Received: by ginger.lcs.mit.edu id AA22121; Thu, 30 Jun 94 15:28:38 -0400 Date: Thu, 30 Jun 94 15:28:38 -0400 From: Noel Chiappa Message-Id: <9406301928.AA22121@ginger.lcs.mit.edu> To: dab@epilogue.com, nimrod-wg@BBN.COM Subject: Re: grouping strategies Cc: jnc@ginger.lcs.mit.edu From: Dave Bridgham In particular, put all the nodes for hosts, networks, and interfaces at a site in the same group. It would be nice to be able to tell, from looking at a locator, what kind of thing it names. Although, perhaps this information is an attribute of the thing which is named? That wouldn't be so bad; you could do things like run consistency checks on the maps to make sure that there was always a node of type "interface" between a node of type "machine" and a node of type "network". In other words, within a single site ... I'd just have a flat locator hierarchy to the hosts. Like IS-IS with a single level 1 area covering multiple physical networks, yes? I'd go to more groupings if the site was too large or if there were policy reasons for more structure in the grouping. Probably administrative reasons would cause more groups too. I think it would be mostly a human factors deal. For instance, it's nice if a locator for a host interface could be assigned automatically (serverless autoconfig) by appending the local physical network address to the locator of the network. (Yes, you could do the same thing by appending the LPNA to the locator of the whole company, if the LPNA is unique within the company.) That way, you can look at an interface locator, and know which network the machine is on, without consulting a map. Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa09687; 30 Jun 94 16:11 EDT Received: from pizza by PIZZA.BBN.COM id aa22738; 30 Jun 94 15:54 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa22733; 30 Jun 94 15:52 EDT Received: from quern.epilogue.com by BBN.COM id aa08349; 30 Jun 94 15:51 EDT From: Dave Bridgham Sender: dab@epilogue.com To: nimrod-wg@BBN.COM In-reply-to: Noel Chiappa's message of Thu, 30 Jun 94 15:28:38 -0400 <9406301928.AA22121@ginger.lcs.mit.edu> Subject: grouping strategies Date: Thu, 30 Jun 94 15:50:04 -0400 Message-ID: <9406301550.aa27373@quern.epilogue.com> Date: Thu, 30 Jun 94 15:28:38 -0400 From: Noel Chiappa It would be nice to be able to tell, from looking at a locator, what kind of thing it names. Although, perhaps this information is an attribute of the thing which is named? Yeah, I'd just make the type of the node another attribute of the node. Why do you wish to infer the node's type from its name? That wouldn't be so bad; you could do things like run consistency checks on the maps to make sure that there was always a node of type "interface" between a node of type "machine" and a node of type "network". I thought I'd successfully gotten you to give up on that idea.
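(For reference, the consistency check being argued about here is cheap to state. A hypothetical sketch, assuming each node carries a type attribute: flag any machine node wired directly to a network node with no interface node in between.)

    # Hypothetical sketch of the disputed check. Each node is a dict:
    # {"type": ..., "neighbors": set-of-locators}.

    def check_map(nodes):
        problems = []
        for loc, node in nodes.items():
            if node["type"] != "machine":
                continue
            for nbr in node["neighbors"]:
                if nodes[nbr]["type"] == "network":
                    problems.append((loc, nbr))  # machine wired straight to net
        return problems

(Whether such typed checks belong in the architecture at all is exactly what the exchange below argues.)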
My choice would be to allow any node to be linked to any other node. If the potential intervening nodes have no attributes, I have no urge to require them in the map. In other words, within a single site ... I'd just have a flat locator hierarchy to the hosts. Like IS-IS with a single level 1 area covering multiple physical networks, yes? Could be. I don't know IS-IS. Dave   Received: from PIZZA.BBN.COM by BBN.COM id aa10232; 30 Jun 94 16:20 EDT Received: from pizza by PIZZA.BBN.COM id aa22879; 30 Jun 94 16:06 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa22875; 30 Jun 94 16:04 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa09303; 30 Jun 94 16:04 EDT Received: by ginger.lcs.mit.edu id AA22547; Thu, 30 Jun 94 16:04:11 -0400 Date: Thu, 30 Jun 94 16:04:11 -0400 From: Noel Chiappa Message-Id: <9406302004.AA22547@ginger.lcs.mit.edu> To: isidro@BBN.COM, jnc@ginger.lcs.mit.edu Subject: Re: Dave's comments to the architecture draft Cc: jnc@ginger.lcs.mit.edu, nimrod-wg@BBN.COM From: Isidro Castineyra > If those nodes represent interfaces, then it's legit for them to have > locators, and you can either assign them as subsidiary to the locator of > the router node, or the locator of the network node. Something with a locator subsidiary to the locator of a node (i.e., having its locator prefixed by the locator of a node) should basically be *inside* the node. I really do not like having it hanging way out separated by a full arc. Oops, yes, when you draw the circle labelled "network with locator A.B.C", that circle includes all the nodes which represent interfaces (with locators of the form A.B.C.). Of course, if you look inside the node A.B.C, in addition to all the enclosed nodes for the interfaces, there has to be a node for the network itself (unless you have N^2 arcs); A.B.C.0 might be the name of that node. When I spoke of "subsidiary to the network node", I was speaking of node A.B.C, but of course there's ambiguity when speaking of "the network node" as to whether you mean A.B.C or A.B.C.0. That's part of why I used to like the model where the lowest level locators all represented interfaces (a network couldn't have a locator on the same level as an interface), but you guys successfully convinced me that the "turtles all the way down" model was more flexible! :-) Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa15581; 30 Jun 94 18:03 EDT Received: from pizza by PIZZA.BBN.COM id aa23676; 30 Jun 94 17:49 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa23672; 30 Jun 94 17:47 EDT Received: from GINGER.LCS.MIT.EDU by BBN.COM id aa14858; 30 Jun 94 17:47 EDT Received: by ginger.lcs.mit.edu id AA23628; Thu, 30 Jun 94 17:47:44 -0400 Date: Thu, 30 Jun 94 17:47:44 -0400 From: Noel Chiappa Message-Id: <9406302147.AA23628@ginger.lcs.mit.edu> To: dab@epilogue.com, nimrod-wg@BBN.COM Subject: Re: grouping strategies Cc: jnc@ginger.lcs.mit.edu From: Dave Bridgham > It would be nice to be able to tell, from looking at a locator, > what kind of thing it names. Why do you wish to infer the node's type from its name? Ah, no good reason, really, now that I think about it. It just seemed like a useful thing... > you could do things like run consistency checks on the maps to make sure > that there was always a node of type "interface" between a node of type > "machine" and a node of type "network". My choice would be to allow any node to be linked to any other node. In what circumstances would it be reasonable to have a node of type "network" joined directly to another node of type "network"?
I think consistency checks are important, particularly once we get to the levels where hand-tuned abstractions start to appear. If the potential intervening nodes have no attributes, I have no urge to require them in the map. Sure, but is a locator an attribute or not? If it is (and I would lean to saying it is), then even an interface with no other attributes has an attribute, its locator. Noel   Received: from PIZZA.BBN.COM by BBN.COM id aa18186; 30 Jun 94 18:57 EDT Received: from pizza by PIZZA.BBN.COM id aa24048; 30 Jun 94 18:45 EDT Received: from BBN.COM by PIZZA.BBN.COM id aa24044; 30 Jun 94 18:43 EDT Received: from quern.epilogue.com by BBN.COM id aa17181; 30 Jun 94 18:43 EDT From: Dave Bridgham Sender: dab@epilogue.com To: nimrod-wg@BBN.COM In-reply-to: Noel Chiappa's message of Thu, 30 Jun 94 17:47:44 -0400 <9406302147.AA23628@ginger.lcs.mit.edu> Subject: grouping strategies Date: Thu, 30 Jun 94 18:43:16 -0400 Message-ID: <9406301843.aa29676@quern.epilogue.com> Date: Thu, 30 Jun 94 17:47:44 -0400 From: Noel Chiappa In what circumstances would it be reasonable to have a node of type "network" joined directly to another node of type "network"? I think consistency checks are important, particularly once we get to the levels where hand-tuned abstractions start to appear. Because I can't think of why I'd use that particular case right now, you'd prohibit that and all other weird cases? I can see connecting two interface nodes directly together in the case of a point-to-point network. Two-interface routers could directly link two interface nodes with no intervening host node. If the potential intervening nodes have no attributes, I have no urge to require them in the map. Sure, but is a locator an attribute or not? If it is (and I would lean to saying it is), then even an interface with no other attributes has an attribute, its locator. I don't know if a locator is an attribute. I see attributes as strictly optional. The locator is the node's name. Without it you can't reference the node, so it may be safely garbage collected. I guess I picture nodes like lisp symbols. The attributes are the property list and the locator is the symbol name. The hierarchical locator structure is like having a hierarchical package system (maybe I'm stretching this a bit too far). Do we tag host nodes with their EIDs? Do EIDs even matter at this level? Single-homed hosts don't need a host node; the interface node is sufficient. Multi-homed hosts (I'm talking about non-routers here) need either a host node or the n^2 interconnect. If the n^2 interconnect, who gets the EID? All of them? If the host node, we need to indicate that this node is not a transit node. That's just policy, I suppose. There's really no need for network nodes or host nodes to have locators (for the moment ignoring what I wrote above about locators being inherent in nodes). Interfaces and aggregations of interfaces are named by locators. Network and host nodes are just there to cut down the number of links (reduce N^2 to N). Maybe it is useful to have unlocator'd nodes. I think I'm rambling. Dave
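(Dave's closing N^2-to-N point in miniature -- a hypothetical sketch: with k interfaces on a multi-homed host, a full interface interconnect needs k*(k-1)/2 arcs, while a single, possibly unlocator'd, host node in the middle needs only k.)

    from itertools import combinations

    # Hypothetical sketch: model a multi-homed host's k interfaces either
    # as a full interconnect or via one host node in the middle.

    def full_interconnect(interfaces):
        return list(combinations(interfaces, 2))   # k*(k-1)/2 arcs

    def via_host_node(interfaces, host="H"):
        return [(host, i) for i in interfaces]     # k arcs

    ifs = ["a:1", "a:2", "a:3", "a:4"]
    print(len(full_interconnect(ifs)))   # 6
    print(len(via_host_node(ifs)))       # 4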