

Apologies for any incoherence throughout this document.  Stream of
consciousness runs fast and rough around the turns. :)

On Fri, 21 Jan 2000 11:41:28 +0900, scozens@pwj.co.jp wrote:
>(This is a reply to an email sent to me privately, but I'm copying
>the response to the list because I feel it is interesting for infobot
>development, especially since it refers to the devel series.)
>Rocco Caputo writes:
>> If geckobot's using POE, the Filter::Reference will let you pass
>> around Perl structures.  It freezes them on the sender's side and
>> thaws them on the receiver's.  Send syntax is basically:
>>  $wheel->put(\@thing)
>> On the receiving end, you get an event that says "you got this
>> thing, \@thing".
>Eee. Um. That's nice, but... isn't it a little trusting? I mean,
>is there some negotiation first, or do you indiscriminately receive
>anything sent to you? Presumably you'll use the received thing as a
>Perl object, and maybe execute methods on it, and the idea of
>running arbitrary untrusted subs is rather scary for me. Sure, it's
>fine if we're just passing around factoids and meta-data, but once
>you get code coming across, I wouldn't touch it with a big stick.

False for at least two reasons, one of which you can pin on me:

First, the example \@thing is intended to be an unblessed Perl data
structure.  It's my fault for implying objectness by using the name
"thing".  I apologize for the confusion there.

And B: Even if it were a blessed reference, Storable's freeze() and
thaw() don't work the way you seem to expect.  freeze() doesn't
include code along with the blessed reference; thaw() doesn't inject
strange code into your program.  The receiver's code is its own.

The rest of the code works the same either way.  If it's an object,
the receiver's package has validators and accessors.  If it's a data
packet, the receiver has a library of validation functions and just
accesses the data directly.

And you can use ref() and can() to make sure the received reference is
blessed into a package that supports the methods you expect.
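
Here's a quick sketch of that kind of checking, in Python rather than
Perl (so isinstance() stands in for ref() and can(); the field names
are invented for illustration):

```python
# Hypothetical sketch: validate a deserialized payload before using it.
# The field names ("factoid", "value") are illustrative, not geckobot's.

def validate_factoid_packet(packet):
    """Accept only a plain dict whose expected fields are strings."""
    if not isinstance(packet, dict):
        return False
    for field in ("factoid", "value"):
        if not isinstance(packet.get(field), str):
            return False
    return True
```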

>> Artur/Sky has a patch pending that lets it use Compress::Zlib
>> transparently, so network traffic can be balanced against CPU.
>Oh, wow. That's special.
>> On the decentralized side, factoids could be propagated like
>> Usenet news messages.  Factoids won't get lost if there are
>> redundant paths.
>OK, let's take this principle and run with it. Here's my complete
>train of thought on how interbot is going to work if we go with a
>peer-to-peer network. How you traverse it is up to you, but I've tried
>to make it a DFA. :)
>i) Infobots have a set of peers. If we're really lucky, these peers
>form a k-connected graph. ->ii,iii,v
>ii) How do you discover who your peers are? Either we hard code in
>config files, or we automatically discover. If we're going to discover,
>which I'd prefer because `discovery' is my programming buzzword of
>the moment, we'll need some sort of central registration scheme
>for infobots. You choose your peers by who's networkologically
>close to you. If you're going to go to a central registry, though,
>you might want to ditch the idea and go with something either
>massively-connected (all bots connect to all other bots) or maybe
>a star-based topology where everyone connects to a MetaBot which
>forwards and proxies (and caches, yes!) queries. You also lose
>the distributed nature of the network to a single point of failure.
>Or should there be multiple proxy bots? How many levels can you go
>that way? -> xii

Discovery is harder, but I like it better.  Hard problems are more
fun, anyway.

About central registries: An abbreviated DNS model might work.  Set up
a single open registrar for the geckobot network.  Distribute its data
among mirrors regularly.  Hardcode the root cache into geckobot.
Geckobots pull a copy of the peer database at install time, and they
periodically update against it.  This is my favorite choice so far.
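
That pull-and-refresh cycle might sketch out like this in Python (the
mirror hostnames, the daily interval, and the fetch callback are all
assumptions of mine, not anything decided):

```python
import random
import time

REFRESH_INTERVAL = 24 * 60 * 60       # refresh roughly daily

# Hypothetical mirror hostnames; real registrar addresses don't exist yet.
MIRRORS = ["registrar.example.net", "mirror1.example.net"]

class PeerDirectory:
    """Pull the peer database at install time; refresh when stale."""

    def __init__(self, fetch):
        self.fetch = fetch            # callable: mirror -> list of peers
        self.peers = []
        self.fetched_at = float("-inf")

    def update(self, now=None):
        now = time.time() if now is None else now
        if now - self.fetched_at < REFRESH_INTERVAL:
            return self.peers         # still fresh; skip the network
        self.peers = self.fetch(random.choice(MIRRORS))
        self.fetched_at = now
        return self.peers
```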

IRC makes an interesting star topology.  Set up a small network of
private servers, and have all the 'bots meet on a reserved channel.
NO HUMANZ ALOWED!!!  Are there any limits on channel sizes?

If you want to go all crazy and impractical and stuff, set up a
virtual private network, and put all the bots on the same subnet.
Sling broadcast packets around:

  Hey, everyone on the subnet, tell me what ``foo'' is.
  [X number of ACKs saying: Foo is yatta yatta]
  [Y number of NAKs saying: I don't know]

VPN is my other favorite choice, if I'm allowed to be a complete loon.
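
Here's a toy of the ask-the-subnet exchange in Python.  It uses
loopback datagrams so it runs anywhere; a real VPN version would set
SO_BROADCAST and send to the subnet's broadcast address.  The port
handling and message format are made up:

```python
import socket

def make_responder(factoids):
    """A peer that ACKs questions it can answer and NAKs the rest."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))        # kernel picks a free port

    def serve_one():
        data, sender = sock.recvfrom(512)
        key = data.decode()
        reply = "ACK " + factoids[key] if key in factoids else "NAK"
        sock.sendto(reply.encode(), sender)

    return sock.getsockname(), serve_one

def ask(peer_addr, question):
    """Send one question and wait briefly for an ACK or NAK."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(2.0)
    sock.sendto(question.encode(), peer_addr)
    reply, _ = sock.recvfrom(512)
    return reply.decode()
```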

>iii) Are peerings bi-directional? Usenet feeds are (generally)
>two-way; I can receive news from my peers, and I post news to
>them as well. I'm named in their config files, and they're named
>in mine. -> iv

I like UDP right now; it's my socket flavor of the week.  UDP sockets
are comfy and easy to broadcast with.

At initial startup, the querist bot grabs a list of servers from one
of the registrar sites or mirrors.  It periodically updates this list,
perhaps daily.

Peerings are one-way: data is propagated by fetch only.

How a querist asks peers for a factoid:

  Create a virtual connection pool; load the question "What is foo?"
  into it.

  Preload it with the N most appropriate peers.  Appropriateness is
  gauged by past willingness to answer and by typical response time.
  Quicker peers float to the top, where they get hit more often.  I
  assume this will balance out the overall bot-network load.

To begin with, there may be a few types of response.  Whatever the
type, note the round trip time for load balancing, and keep a running
average of it, since network conditions constantly change.
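
The running average might be as simple as an exponential moving
average; a sketch (the smoothing factor is an arbitrary guess of
mine):

```python
ALPHA = 0.2   # made-up smoothing factor: how much a new sample counts

def update_rtt(avg, sample):
    """Fold a new round-trip time sample into the running average."""
    if avg is None:               # first sample for this peer
        return sample
    return (1 - ALPHA) * avg + ALPHA * sample
```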

  Positive lookup; here's your factoid.  Stop loading addresses into
  the connection pool.  Wait for any remaining transactions to finish.

  No factoid here.  Discard the remote peer's address.

  No factoid here, and let me know if you find something out.  Here's
  a cookie to remind me that I asked for the data you might be pushing
  at me later.  Cache the remote address in case we can send a factoid
  back to it later.

  Stop asking me questions.  Flag the site as unwilling to transact;
  don't contact it again for a long while (on the order of a day or
  more).  Why recontact at all?  In case the site's policies change.

  No response; eventually time out (on the order of a few seconds).
  This could get *slow*.

The virtual connection pool will eventually run out, either because
there are no more peers to ask or because an ACK stopped new peers
from being loaded into it.  If we received a positive response,
broadcast copies of the factoid to sites that gave us cookies.  If
nobody knows the factoid, throw away the cookies (or be friendly and
NAK them?).

Or something.
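
Pulling the pool and the response types together, the query loop might
sketch out like this in Python (the response tags, the batch size, and
the ask() callback are my inventions, not a spec):

```python
# Sketch of the virtual connection pool described above.

def run_pool(question, peers, ask, pool_size=3):
    """Query peers a few at a time; stop loading new ones on an ACK."""
    pending = list(peers)
    cookies = []                  # peers that want to hear the answer
    answer = None
    while pending and answer is None:
        batch, pending = pending[:pool_size], pending[pool_size:]
        for peer in batch:        # finish the whole batch, even post-ACK
            tag, payload = ask(peer, question)
            if tag == "ACK":      # positive lookup: keep the factoid
                answer = payload
            elif tag == "COOKIE": # "I don't know; tell me if you learn"
                cookies.append(peer)
            # "NAK" just drops the peer; "REFUSED" would flag the site
    return answer, cookies
```

If an answer comes back, the caller would then push copies to every
peer in the cookies list.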

>iv) Does an information request from a bot need to be any different
>from an information request from a human? -> viii

Probably, since additional information may let the bot network
transact more efficiently.

>v) A request for information goes out to all peers, each of which
>replies if they know the answer. ->vi,vii
>vi) We may receive two conflicting answers. Do we simply take the
>first one? Or can we invent a way of validating which is more
>appropriate? If so, do we keep both factoids anyway? In short,
>how are different factoids going to co-exist in a distributed
>knowledge base? -> xii

Combine them:

  A asks, "What is foo?"
  B says, "Foo is what comes before bar."
  C says, "Foo is silence."
  D says, "I don't know; please tell me what you found out."

  A assimilates "Foo is what comes before bar. or silence."

  A says to D, "I found out: Foo is what comes before bar. or
  silence."

Overlapping factoids are a little harder:

  A asks, "What is foo?"
  B says, "Foo is silence, or what comes before bar."
  C says, "Foo is silence."
  D says, "Foo is what comes before bar or a metavariable."

A will have to be smart enough to figure out:

  A assimilates, "Foo is silence or what comes before bar or a
  metavariable."
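
One guess at the "assimilate" rule, sketched in Python: split each
answer on " or " and keep an ordered union of the alternatives.
(Answers with embedded commas, like B's above, would need smarter
parsing than this.)

```python
def merge_factoids(answers):
    """Ordered union of the " or "-separated alternatives."""
    seen = []
    for answer in answers:
        for alt in answer.split(" or "):
            if alt not in seen:
                seen.append(alt)
    return " or ".join(seen)
```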

>vii) How do we propagate the search further? We need a sense of
>a return Path and a way to terminate the search if we've found
>the information. Usenet provides a `cancel' control message to
>do this, effectively. But where do we store state data about a
>query? -> viii, x

Defined under my response to (iii).  Recap: Bot transactions happen
one level at a time.  No more than N-1 outstanding transactions need
to be cleaned up, and the "I don't know, but tell me what you find
out" response gives bots a reason to wait past a positive remote
response.

This doesn't ensure that a found factoid propagates to *everyone*, but
it does help spread the information around.  The factoid will be
cached on more servers for the next time.

>viii) Ideally we'd have a response look the same at every level,
>whether initiated by a human, a bot or another bot further
>down the chain, just to make implementation nice and easy. Why
>should the bot have to care whether it's talking to me or to
>another bot? So, we'd have a message ID stored *locally* keying
>what went to whom and who it's for. -> ix)

The idea in (iii) involves only one level of store and forward at any
time.  Respondents that don't know what a factoid means can pass the
querist back a cookie along with the "I don't know, but please tell
me" response.  The querist can use the cookie to validate the factoid
it's found.  Cookies might expire after (timeout * known_servers +
fudge) seconds.  That gives the querist enough time to finish its
round of queries before the cookies lapse.

The whole idea invalidates the "bots and humans living together"
aspect.  That is, they would have to speak different languages.  That
may be unacceptable.

>ix) However, we can't send the message ID out if we want it to
>be transparent, since I'm a human and I wouldn't send out a
>message ID with my queries. So, we have a situation like this:
>     Human: BotA, tell me about hoge
>BotA - need to tell Human about hoge, add to to-do list.
>don't know about hoge. ask peers, [and this is the key!]
>go back to the event loop.
>     BotA: Human, sorry, don't know. I'll see if I can find out.
>     BotA: BotB, tell me about hoge
>     BotA: BotC, tell me about hoge
>     BotA: BotD, tell me about hoge
>BotB - need to tell BotA about hoge. don't know about hoge.
>ask peers.
>BotF - need to tell BotC about hoge.
>     BotF: BotC, hoge is the Japanese for foo.
>BotC - received new factoid `hoge'. Checking to-do list.
>Have to tell BotA about `hoge'. Clear `hoge'->BotA from
>to-do list.
>     BotC: BotA, hoge is the Japanese for foo.
>BotA - received new factoid `hoge'. Checking to-do list.
>Have to tell Human about `hoge'. Clear `hoge'->Human from
>to-do list.
>     BotA: Human, hoge is the Japanese for foo.
>[Hence, there was no need to keep state data. However... -> x]

The list method in (iii) avoids collisions when BotA and BotB both
ask BotF.  It's also potentially hideously slow.  On the other hand,
the tree method you have in (ix) lets questions percolate through the
network where they might not normally be allowed.

List method example:

  Nick asks BotA: Foo?
  BotA asks BotB: Foo?
  BotA asks BotC: Foo?
  BotB says to BotA: Doesn't know.
  BotC says to BotA: Stop asking.

Tree method:

  Nick asks BotA: Foo?
  BotA asks BotB: Foo?
  BotA asks BotC: Foo?
  BotB says to BotA: Doesn't know, but is looking.
  BotC says to BotA: Stop asking.
  BotB asks BotC: Foo?
  BotC says to BotB: Foo is silence.
  BotB says to BotA: Foo is silence.

The factoid gets back to Nick in the tree structure, where it fails in
the list structure.

On the other other hand, the list structure balances network and bot
loads by favoring peers that seem to have better resources.  Perhaps
a combination of the two methods:

Acquire a list of possible peers at initial startup, from a registrar
or mirror.  Assign them all 0ms response times and assume they're all
willing to answer.  This ensures that every peer is tested before
live statistics are used.

On the first local factoid fail, process the list in list mode.  This
adjusts the response times, and it finds out which peers don't want
to talk with us directly.  Continue to process peers in list mode until
they all have been checked.

On subsequent factoid fails, walk through our list of cooperative
hosts, and pass along a few uncooperative hosts to each one.  In this
mode, the cooperative hosts check to see if they cooperate with one of
the uncooperative hosts.  If so, they act as a proxy.

Proxies return the factoid, and identify the uncooperative (to the
initial querist) host they contacted.  The querist replaces the
uncooperative host in its list with a path to it through the proxy.

These rules recurse as necessary (big gloss over interesting
details here), until each geckobot has built routes to each other
geckobot.  At that point we decide that non-cooperation isn't
workable unless bots (or small nets) decide to firewall themselves
off from others totally.  Then we give them their own registrars
and send them on their way.
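
The route table this builds might look like the following sketch,
where each peer maps to the chain of hops used to reach it (the data
shapes and names are all invented):

```python
def add_proxy_route(routes, unfriendly, proxy):
    """Replace a direct entry for an uncooperative host with the path
    through a proxy that's willing to talk to it for us."""
    routes[unfriendly] = routes.get(proxy, [proxy]) + [unfriendly]
    return routes
```

Because a new route extends the proxy's own route, the chains nest
naturally as the rules recurse.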

Chains would be periodically broken and retested, in case server
policies change.

>x) So, we can reliably pass information between bots and humans
>without keeping any state *in the message* or distinguishing between
>human queries, bot queries and second (n'th) -level bot queries.
>BUT... if we have a k-connected graph, which is nice for redundancy,
>yet we're not keeping track of who's asked who what, how on earth
>do we exhaust the search? -> xi

My entire premise, back at (iii), sort of hinges on bots speaking to
one another in a special way and/or on a special network.

>xi) I guess one solution may be, when receiving the `will try
>to find out' message, check whether we have this from all peers,
>and if so report `none of my peers know about this'. This will
>prune that particular branch of the search. But is that now
>guaranteed to find-or-fail in all circumstances? (I can't get my
>head around the topology of this) And how do we code it into an
>event loop? -> xii
>xii) Don't know.

I am now extremely tired.  Again, please forgive the rough bits.

-- Rocco Caputo / troc@netrus.net