Field of Memes

Kevin A. Lenzo
lenzo@cs.cmu.edu
Carnegie Mellon University
Presented at The Perl Conference 4

Abstract

The Infobot, and its current development branch, the Geckobot, are introduced and used to discuss cultural transmission and replication in the context of rhizomatic, group-writable language repeaters. The structure and design of the Geckobot are outlined, with some motivating philosophy and a discussion of the implications of activating situated agents. This paper, and the accompanying talk, are vehicles for discovering other replicator builders who might be interested in interacting.

I. Overview

It was the poets, as was natural, then, who first began to develop the study of style and delivery. For words are imitations, and we are equipped with a voice that is the most imitative of all our parts; ...
-- Aristotle, in The Art of Rhetoric [Aristotle, p. 217]

In this paper, we will look from a number of perspectives at a new kind of mimic, the Infobot, and at the community it interacts with, and see how that has affected the design of the new Geckobot development system. Following Aristotle's advice in the Rhetoric, we'll collect some familiar things and make them strange, and likewise make some strange things familiar, to deliver the message.

The Infobot [Lenzo] is a situated program that interacts with users over the net or on the desktop. The original, and still primary, mode is through text on Internet Relay Chat (IRC), but that is expanding to other modalities, including speech. One of the most salient points about the 'bot' is that it allows anyone who interacts with it to update the communal store of responses, and it can also pick up certain things by "listening in" on conversations. By storing 'factoids', which are simply asserted but not necessarily true, and by imitating the interaction in the context of its delivery, the program -- often called a "bot" -- becomes a repeater for patterns in the community.
Everyone who interacts with it can be both a reader and an author of the common collection of actions and factoids; anyone can write into, or erase from, the local collective encyclopedia. This construct has no deep understanding of what it processes, but the users who come in contact with the bot have made the simple interface useful. No complete theory described the evolution of this space in advance; it is a medium that has opened only recently, and it is in continuous flux. In this paper, we can give some history, structure, and design choices, but the science is still in the process of creating itself.

This field-creation process is not new; Herbert Simon describes the same sort of thing in The Sciences of the Artificial:

... the main route to the development and improvement of time-sharing systems was to build them and see how they behaved. And this was what was done. They were built, modified, and improved in successive stages. Perhaps theory could have anticipated these experiments and made them unnecessary. In fact it didn't, and I don't know anyone intimately acquainted with these exceedingly complex systems who has very specific ideas as to how it might have done so. To understand them, the systems had to be constructed, and their behavior observed. [Simon, p. 25]

Borrowing from, and with apologies to, Ken Pike [Pike], we'll use some of the perspective devices from his analysis of a set of instructions for assembling a barbecue grill. The whole (two-page) document shows some relatively complex linguistic phenomena, including all sorts of pragmatic assumptions.

Individual pieces, taken atomically, can be seen as 'particles' of a system. Having a conception of the individual pieces helps us identify them and look at them as objects in their own right; it also allows us to talk about the relationships between them.
In the barbecue grill instructions, near the top of the first page, there is a picture of each of the components used during the assembly (different nuts and bolts and so on), with names attached, giving you a way both to isolate them and to group them. The components are a collection of particles, in this terminology. In the Geckobot, the Modules and entities involved are the nuts and bolts; we'll look at them both as particles and in relation to each other.

Waves are traversal paths, or propagation of one kind or another. There are several different ways to look at a system as waves. In the assembly example, one can follow the flow of the document itself, or the sequence of operations that one actually needs to carry out as the event unfolds in time, and the two are quite different. We'll follow several different waves on our tour: how a message moves, how the bot came to be historically, and how the use of the bot changed as the bot and its users co-evolved.

In the barbecue grill example, there is an exploded view of the entire grill, with little dashed lines connecting the screws and parts to where they will be in the completed assembly. This gives an overview of how the entire system should fit together -- the field of relations. After an assembly wave, the particles become the barbecue grill itself, which is a field over the possible configurations of the particles and how they work with or against each other. Once the grill is constructed, it can become a particle too, and enter into interaction on another level.

There are many ways to wield the field perspective on the bot: as an internal system, as a system of interaction with the community in which it is situated, as part of the structure of communities like #perl, and as an expression of the nature of Perl itself. We have no difficulty referring to a field as a thing in itself, or even as a process. Each of these perspectives is a device for considering the whole topic.
The re-description of a field or a wave as a particle is something that reappears all through the semiotic literature, which concerns signals, signs, and meaning. A word can be coined for a process, and that word can then take part in the process itself. The ultimate goal of the Infobot is to aid and abet this unlimited semiosis -- the creation of new sign relations. These can be complete garbage, or they can be useful, depending upon how they are maintained.

II. Philosophical Background

In the Rhetoric, Aristotle shows considerable insight into how words are used persuasively in oratory and writing. One device he describes is the enthymeme, which is often called a syllogism of probability: a thesis with some of its supporting arguments left implied. A complete enthymeme would contain an idea along with the reasoning for it: You should be kind to travelers, for one might be Zeus in disguise, and offending Zeus could have dire consequences. Of course, much more explanation could follow (Zeus is a god, and so on), but all of this can be given simply as the maxim "Be kind to travelers," with the whole mythology of reasoning implied.

Charles Sanders Peirce [Peirce], frustrated with overloaded philosophical terms, started using Firstness, Secondness, and Thirdness to refer to the objects and processes at work in the construction of meaning:

However, the words Quality, Reaction, Representation, might well enough serve to name the conceptions. The names are of little consequence; the point is to apprehend the conceptions. And in order to avoid all false associations, I think it the far best plan to form entirely new scientific names for them. I therefore prefer to designate them as Firstness, Secondness, and Thirdness. [Peirce, p. 147]

I share his frustration, which is unfortunately only extended by the addition of these terms.
His treatment of sign, object, and interpretant, along with abductive inference and the Peircean definitions of Firstness (being or quality), Secondness (unmediated reaction), and Thirdness (representation, law, code), has influenced the structure of the Geckobot. The number three was important in Peirce's formulation: the sign, the object that evokes it, and the interpretant (the concept evoked in the mind of the recipient) form a three-part system of sign creation, in which any two can produce the third, in an infinite generative loop that selects some outcome from the possibilia at hand.

The word 'seme', "meaning unit," which is the basis of "semiosis" and "semiotics," was originally named by Ferdinand de Saussure [Saussure]. Saussure organized the language of linguistics into a system, and many of his terms are still used in similar ways today, not the least of which is the 'seme' meme. Saussure left a legacy of devices for describing language itself, and we rely on these to make a decent agent that interacts through language.

The paradigmatic '-eme' is a class or a named configuration, whereas the syntagmatic '-etic' takes on a realization that depends upon its embedding; these notions are echoed throughout Pike's work, as they are in most recent (twentieth-century) linguistic literature. There are all sorts of -emes that are more or less familiar these days: theme, rheme, meme, seme, phoneme, morpheme, viseme. There are not so many -etics that we can find; apparently that suffix was less appealing than the notion of an '-eme'. Certainly 'phonetics', the systematic study of the realized, physically measurable traces of phonemes when they are produced in context, is a good example; the only other one within reach is 'memetics,' which I will avoid, for fear of alienating the audience.

More concretely, a seme is a unit of meaning that can, for instance, be used to construct meaningful language... which a bot might be able to pick up.
A seme might be a 'semantic fragment'; such fragments could each be realized, and could constrain one another, when assembled into a sentence with a more complete meaning.

Saussure, in his lecture notes from 1906, asserts the right of a study of signs and their relations to exist:

It is therefore possible to conceive of a science which studies the role of signs as part of social life. It would form a part of social psychology, and hence of general psychology. We shall call it semiology (from Greek semeion, 'sign'). It would investigate the nature of signs and the laws governing them. Since it does not yet exist, one cannot say for certain that it will exist. But it has a right to exist, a place ready for it in advance. Linguistics is only one branch of this general science.

While Saussure coined the term, C. S. Peirce and he are both cited as the philosophical roots of semiotics. Whereas Peirce was fond of the number three, Saussure's sign had only two parts -- the signifier (the sound-image) and the signified (the concept it evokes). The signifier stood for the signified, and was passed between my head and yours through language.

The term 'meme', which has been exploited nearly to its breaking point in recent years, is a name for any cultural replicator, first introduced by Richard Dawkins in "The Selfish Gene" in 1976 [Dawkins]. Dawkins often described the meme as behaving parasitically on the sign vehicle, or even potentially as a symbiont, aiding the goals of the host. He quotes N. K. Humphrey as summarizing:

...memes should be regarded as living structures, not just metaphorically but technically. When you plant a fertile meme in my mind you literally parasitize my brain turning it into a vehicle for the meme's propagation in just the same way that a virus may parasitize the host cell. ... [Dawkins, p. 192]

'Meme' might have been, more fully, 'mimeme,' as in /mime/+/-eme/, but Dawkins shortened it to the easier form that is used today.
Put into the terms used here, a meme is a wave, touching particles, simultaneously allowed by and creating the fields around it. The analogy to parasitism has generated a lot of discussion, both for and against, but the definition of a meme as a unit of cultural replication has generated far less debate. This paper, and the talk at The Perl Conference 4.0, are vehicles for the concept of building replicators as much as they are about the bot itself. You might think of the enthymeme, then, as a 'meme complex' (after Dawkins), analogous to a gene complex.

One of the mixed blessings of memes, semes, rhemes, and so on is that there are so many of them. Too many, some might say; we can't see the forest for the trees when it comes to web searching. The bot provides group-crafted responses to expected triggers, so when you do get something useful, it's likely something you want, because it reflects what someone in the appropriate setting said or set.

Attention and speed of response are aspects of IRC that make it so attractive: real-time interaction, rather than the asynchronous delay you get from, say, email or net news. Quoting Herb Simon again:

... This is not an isolated problem. The first generation of management systems installed in large American companies were largely judged to have failed because their designers aimed at providing more information to managers, instead of protecting managers from irrelevant distractions of their attention. A design representation suitable to a world in which the scarce factor is information may be exactly the wrong one for a world in which the scarce factor is attention. [Simon, p. 167]

While this discussion is not intended to be a treatise on semiotics*, any communication system that addresses the exchange of information is, on some level, a theory of how communication takes place, even though many of the assumptions of the theory may remain unstated.
In the next section, we'll examine the context and internal structure of the communication system enacted by the Geckobot.

(*Footnote: Umberto Eco gives a quite readable account in "Semiotics and the Philosophy of Language" [Eco], and for a very approachable introduction, there's "Introducing Semiotics" [Cobley&Jansz] from Totem Books.)

III. The Infobot and the Geckobot

1. Context

a. Motivation: part and particle

The initial motivation for the Infobot was to store up answers to questions that people were tired of answering. To put this in context, the first Infobot ('url', the ancestor of 'purl' and still running today) lived on Eris Free Net (EFNet), the largest Internet Relay Chat (IRC) network, on a chat channel called #macintosh. People ("newbies") would show up and ask the same questions right away -- often things like "Where can I get Ircle?" (a chat client) or "Where do i get PlainTalk?" (an Apple speech product).

Let's look at some of these as particles:

a. Me, finishing up a research job in Japan.

b. #macintosh, as an IRC channel. A 'channel' is just a named locus of conversation; anyone who joins it receives all public messages. The channel also colloquially refers to the polity it encompasses -- the people on it; when you talk about 'the channel' on the channel, you are generally referring to something like 'national will'. #perl and #macintosh both have over one hundred clients on them, so 'what the channel wants' or 'does' is an interesting concept. Moreover, the topic of conversation may have very little to do with the name or the current 'topic' appearing on the channel. Text appears in the channel, identified by the sender's nickname, a string of up to nine characters. The unit of production is a line of text (possibly several sentences) that ends when the user hits enter.
A long message might be 250 characters total, but people tend to speak in short, interleaved sentences and tolerate a lot of 'lag,' or asynchrony; people also learn to follow several discussions at once on a channel, and to filter out irrelevant messages from other interleaved conversations. There are a number of slick IRC clients with nice user interfaces that display a window or tab for each channel and list the nicknames of the users along the side.

c. Internet Relay Chat, as a collection of channels and users. There are several different IRC networks; the #macintosh and #perl channels in this paper are on EFNet, the Eris-Free Net, but there are similar channels on Undernet and other IRC networks. EFNet has about fifty thousand people on it at any given time, on myriad channels.

d. Perl. I had been using Perl (MacPerl in particular) in my research in speech, and the regular expression handling built into the syntax was compelling for text-handling operations, which is what IRC ultimately is (modulo the networking issues).

e. Newbies. #macintosh was (and may still be) the default channel for a number of IRC chat clients for the Macintosh, so people with few preconceptions about the space would turn up from time to time. People new to IRC are often called 'newbies,' and we've all been one. Another class of 'newbie' is someone new to a topic; so there are newbies to IRC proper, and newbies to the Macintosh, or to #macintosh, with its own set of habits. Newbies to Perl often join #perl to ask questions (sometimes even demanding answers) or to engage in conversation about a topic.

f. Questions. "What's the best IRC client for the mac?" "Where can i get Ircle?" "Does anyone know where i can get Fetch?" Before the bots, people who answered at all typed out the responses, or cut and pasted URLs from their private stash or from their browsers. And the questions were often of the same form -- "QuestionWord Verb X?" in the simplest case.

g. Anything in the world we might refer to during interaction: Ircle, EFNet, Perl, IRC, anything that can be associated with a lexical token X. X1 is Y. (X2's dog X3) is Y. Anything that can be referred to by a particle (not in the syntactic sense of particle, but in the particle-wave-field sense of perspective devices).

h. Assertions and responses. These are what people said that could be taken to assert some definition or relation, like 'is'. So if someone asks 'where is Ircle?' you might reply 'http://www.ircle.com'. Call 'Ircle' (the subject) X and the object of the response Y. Since a number of different conversations are going on at the same time, you might clarify what you're referring to by saying 'Ircle is at http://www.ircle.com', which is a simple form to find in the input text. These assertions, whether or not they are in response to a query, can be used to update a database of Xs and the related Ys. These simplistic forms would not be as useful for information extraction from general text, however, unless the text contained a lot of simple statements; general text would require more elaborate machinery.

i. The Infobot itself, once introduced.

Now to follow a causal wave, as a gedanken experiment: (a) was on (b) and had been using (c) a lot, as well as (d), but noticed that (e) (as a class) would ask the same (f) about some (g) and get roughly the same (h). These waves recurred enough times to create a recognizable field, for which (i) was created so as to minimize the energy (or maximize the laziness) of (a) and (b). This coincidentally improved the gain for (e), because the (h) could be quite detailed and triggered by (b) even when (e) asked (f) that (i) couldn't parse, and (i) caught the (f) that no one in (b) wanted to answer anymore. I say 'causal' here only in a retroactive-continuity sense, of course; once the bot was there, everything seemed to be defined around it, and it changed the interactions in the channel.
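The assertion-and-question cycle in (h) can be sketched in a few lines of Perl. This is a minimal sketch under stated assumptions, not the Infobot's actual parser: the function name hear() and the exact patterns are hypothetical, and only the first " is " in a line is treated as the split between X and Y.

```perl
use strict;
use warnings;

# A hypothetical sketch, not the Infobot's parser: learn factoids from
# "X is Y" assertions and answer "what is X?", "where is X?" or "X?".
my %factoid;    # lower-cased subject X => object Y

sub hear {
    my ($who, $line) = @_;

    # Questions: an optional "what/where/who is", then X, then "?".
    if ($line =~ /^\s*(?:(?:what|where|who)\s+(?:is|are)\s+)?(.+?)\s*\?\s*$/i) {
        my $x   = $1;
        my $key = lc $x;
        return exists $factoid{$key}
            ? "$x is $factoid{$key}"
            : "i don't know about '$x'";
    }

    # Assertions: split at the first " is " into subject X and object Y.
    if ($line =~ /^\s*(.+?)\s+is\s+(.+?)\s*$/i) {
        $factoid{ lc $1 } = $2;
        return "okay, $who";
    }

    return;    # neither form: say nothing
}

print hear('Bender', 'Ircle is at http://www.ircle.com'), "\n";  # okay, Bender
print hear('Bender', 'where is Ircle?'), "\n";  # Ircle is at http://www.ircle.com
```

Splitting at the first " is " is exactly the kind of simplistic form described in (h): this sketch would just as happily learn "this is great" as a factoid about "this", which is one reason general text needs more elaborate machinery.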
Later, when purl joined the IRC channel #perl, and the source was finally released freely, the actors there started getting inside the code itself, affecting what messages did, and how, when the bot encountered them -- each message takes on a life of its own in the bot. The community has helped shape both the inner workings of the Infobot, which many people run copies of, and the personality of purl herself on the channel. The openness of the source has allowed people to work on the mechanism itself, rather than accept the existing modes of interaction as given by, perhaps, one exclusive, closed-source, non-free holder of intellectual property. In my estimation, purl is one of the best-maintained Infobots, because the group there is so dedicated to her and relies on her.

b. Repeating, replicating, and interacting

The bot itself is a repeater, perhaps even an amplifier, for the phenomena that evoke waves between the particles (which of course may also act as waves or fields). The action as a whole affects fields all around it, overlapping and interacting with all sorts of things that didn't involve it directly; even this paper is appearing at The Perl Conference, which is, not coincidentally, part of the larger field of Perl.

Since the bot doesn't always quote exactly, but transforms messages before using them, it replicates each message with some changes introduced through the parsing, interpretation, action, and generation processes at work within the bot. This can be overridden by users, however, when a fixed answer should be given; people have used this to make the interactions appear more human, by peppering the bots with responses to common phrases. There is also a way to have random replies, and to insert some variables into the text (such as the name of the speaker) to make the patterns appear less rigid.
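Random replies with substituted variables might work roughly as follows. This sketch assumes that alternatives are separated by '|' and that the literal text $who stands for the speaker's nickname; both conventions, and the function render_reply(), are assumptions for illustration rather than a description of the Infobot's actual factoid syntax.

```perl
use strict;
use warnings;

# A hypothetical sketch: a stored reply may hold several alternatives
# separated by '|' (an assumed syntax); one is chosen at random, and the
# placeholder '$who' is replaced by the speaker's nickname.
sub render_reply {
    my ($template, $who, $pick) = @_;
    $pick //= sub { int rand $_[0] };     # default: uniform random index
    my @choices = split /\s*\|\s*/, $template;
    my $reply   = $choices[ $pick->(scalar @choices) ];
    $reply =~ s/\$who\b/$who/g;           # the literal text "$who"
    return $reply;
}

# Single quotes keep Perl from interpolating $who in the template itself.
print render_reply('hi, $who! | hello, $who | yo', 'jon'), "\n";
```

The chooser is passed in as an optional code reference so that the random choice can be replaced by a fixed one when testing.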
There are several humorous stories of people who had never seen an Infobot thinking it was another person on-channel, rather than a bot, but illusion is not its purpose. It's there to support the local field. Any implementation of a memetic repeater can change the space in which it is activated -- by design, in fact, as each is activated by some motivated actor. The dynamics of #perl were certainly affected when purl was introduced, as with #macintosh before it; there was some resistance initially, but eventually the channel stopped even discussing her removal.

Notions such as karma* ("woohoo! jon++" appearing on-channel would increment a value for 'jon') are now used socially, and I know I've even said "X-plus-plus" aloud when I've approved of something. I suppose that's called "geeking out loud," but these fields have certainly had an impact on the rest of my life, and on the lives of others. The bot alters the field upon insertion; it is something of a Heisenbot, changing what it observes.

[ Footnote: karma has been around CMU for ages on Zephyr, and migrated onto IRC using the bot as a vehicle. ]

Many people have now contributed to the Infobot and the Geckobot: at the level of theory, as code (modules and patches), and as packages of factoids called 'factpacks' that can be fed into the bots. These factpacks include Macintosh error code numbers, RFC information, network suffix lookups ('where is .tt?'), and any number of other topics; they are made freely available to anyone who wants to add that 'knowledge' to their bot.

But a point (particle?) to note: the system can be collapsed. Particles created waves that became a field, which again became a particle, and we are referring to this whole thing right now with a single word -- "Infobot" -- that is a cover term for the whole "seme complex." Semiosis at work.

2. The Geckobot

a. What's in a name?
The name 'Infobot' has a rather unimaginative origin, as the blend makes self-evident, and other uses of the term have inevitably sprung up. The Geckobot, on the other hand, is named in honor of my leopard geckos. I have five, two males and three females, about to turn one year old. They'll start thinking springtime thoughts soon, and I could have several two-egg clutches this season, if I'm lucky and I don't stress them out too much.

The salient point about Eublepharis macularius, however, is that you can influence the sex of a leopard gecko by incubating its egg at different temperatures -- sex is not entirely genetically determined, but is affected by extrinsic, environmental parameters. If an egg is kept between about 84 and 87 degrees Fahrenheit (28.9 - 30.6 C), both males and females are produced, but as the incubation temperature is increased by a few degrees, the sex ratio becomes skewed toward males; slightly colder temperatures will lead to mostly females being born [Bartlett].

Let us look at the two bots, url and purl, with this analogy in mind. 'url' sounds like 'Earl,' a masculine name, and 'purl' like 'Pearl,' a feminine one. This is an accident of naming -- 'url' is as in URL, a Uniform Resource Locator, like 'http://www.infobot.org', which some people pronounce (perhaps only internally) in a way that sounds like a man's name. 'purl' was coined as the 'url' for Perl. However, these simple choices gender-inscribed the bots. purl is almost universally referred to as 'she' whenever a gendered pronoun is evoked. Likewise, url is a 'he'; I believe this was practically determined by the initial choice of name. Naming, and referring to the names of things, is central to what the bot does, and the names and forms chosen at the inscription of each factoid into the bot can return to influence future mark-makers. The term 'Geckobot' is meant to evoke the image of inscription and construction.

b.
Running the Geckobot

The Geckobot is a set of Perl modules that you can download and run for yourself. The Infobot home page is at http://www.infobot.org, and you can get the latest information there, including an archive of the mailing list, with discussion of many of the issues mentioned here. You can also get the source code directly from CVS at SourceForge (http://www.sourceforge.net), and go through the usual rituals to install a Perl module:

    perl Makefile.PL
    make
    make test
    make install

Running the bot is simple; the command

    perl -MGeckobot -e 'shell(verbosity=>11)'

will get you a Geckobot::Console, with Term::ReadLine support, similar to the CPAN shell. The shell() call takes arguments that set the startup parameters of the bot; 'verbosity' at '11' here gets us all the debugging information, and then some. Once the Geckobot comes up, it behaves much like the Infobot on IRC, except in a desktop, I'm-your-personal-Geckobot format that you can use privately.

As the bot loads, you can see it setting up some sane defaults and definitions. These can all be changed by using the 'conf_file' parameter and pointing it at your own setup, but part of what you see when the verbosity is set high is the list of modules as they load:

    [ 49] Module Babelfish OK 0.01 qr/(?ix-sm:^\s*(?:(?:x|trans(?:late))\s+(in|to|from)\s+(italian|en|sp|pt|spanish|french|fr|de|gr|german|ge|it|po)\s+(.*))|(?:(.*?)\s+(in|to|from)\s+(italian|en|sp|pt|spanish|french|fr|de|gr|german|ge|it|po)\s*$))/
    [ 50] Module W3Search OK 0.01 qr/(?i-xsm:^\s*(?:search\s+)?(AltaVista|Dejanews|Excite|Gopher|HotBot|Infoseek|Lycos|Magellan|PLweb|SFgate|Simple|Verity|Google)\s+for\s+(.*))/
    [ 51] Module Convert OK 0.02 qr/(?i-xsm:^convert\s+(.*)$)/

'Convert' has loaded OK, is version 0.02, and responds to the regular expression given there. The qr// operator requires Perl 5.005 or better.
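Each line of the startup listing pairs a module name and version with a trigger pattern. A minimal sketch of such a table, and of matching a line against it, follows. The data layout, the Math entry and its version, and the dispatch() function are hypothetical, and the Convert trigger is a narrowed, pounds-to-kilograms-only variant of the one in the listing. The sketch relies on the fact that a successful match in list context returns the capture groups, which become the action's arguments.

```perl
use strict;
use warnings;

# A hypothetical module table in the spirit of the startup listing: each
# entry pairs a name and version with a qr// trigger, and its action
# receives the trigger's capture groups ($1, $2, ...) as arguments.
my @modules = (
    {
        name    => 'Math',
        version => '0.01',    # made-up version number
        trigger => qr/^\s*(\d+(?:\s*[-+*\/]\s*\d+)+)\s*$/,
        # String eval is tolerable here only because the trigger admits
        # nothing but digits, whitespace and arithmetic operators.
        action  => sub { my ($expr) = @_; return eval $expr },
    },
    {
        name    => 'Convert',
        version => '0.02',
        trigger => qr/^convert\s+([\d.]+)\s*lb\s+to\s+kg\s*$/i,
        action  => sub { my ($lb) = @_; return "$lb lb equal " . $lb * 0.45359237 . ' kg' },
    },
);

# Startup listing. A qr// object stringifies with its flags: older perls
# print "(?i-xsm:...)", perl 5.14 and later print "(?^i:...)".
printf "Module %-8s OK %s qr/%s/\n", @{$_}{qw(name version trigger)} for @modules;

sub dispatch {
    my ($text) = @_;
    for my $m (@modules) {
        # In list context a successful match returns the captures.
        if (my @args = $text =~ $m->{trigger}) {
            return $m->{action}->(@args);
        }
    }
    return;    # no module applied
}

print dispatch('4*22+16'), "\n";    # 104
print dispatch('convert 65 lb to kg'), "\n";
```

Trying the modules in a fixed order and stopping at the first match keeps the behavior predictable when two triggers overlap.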
The default mechanism for deciding whether to invoke a Module's action() at run time checks this expression against the text contents of the Message and, if it matches, squirrels away the parenthesized matches ($1, $2, etc.) internally, so that action() can use them as arguments when it is called.

With the default verbosity (3), you'll eventually get something like

    [ 7] Modules +W3Search +Babelfish +Convert +Exchange +Stock +Dict +Whois +Math +Example +ModuleMgr +Excuse +WServer +Weather +Nickometer +Extras +Purldoc +METAR +Nslookup +Zippy +Zappa +Insult (8)
    [ 8] This is Geckobot 0.49_04 [Bender]
    >

and you can start interacting with it:

    > what is x?
    i don't know about 'x'
    > x is http://www.infobot.org/list/
    okay, Bender
    > x?
    x is http://www.infobot.org/list/
    > yow
    How's it going in those MODULAR LOVE UNITS??
    > 4*22+16
    104
    > convert 65 lb to kg
    65 lb equal 29.48350405 kg
    >

Those last three responses come from the Geckobot::Modules Zippy, Math, and Convert, respectively, where Zippy (from Rich Lafferty) and Convert (from Kevin Meltzer) are contributed modules. Convert, like many of the Geckobot::Modules, requires another Perl module (Math::Units) to work, but will fail gracefully if it is missing. There is a private CPAN bundle on http://www.infobot.org that will allow the CPAN installer to grab almost everything and install it for you. See http://www.perl.com/CPAN/ for more information on CPAN, the Comprehensive Perl Archive Network.

c. In Botspace: the communication system

Now let's turn to the inner workings of the system. For convenience, let's refer to this level of communication as "botspace*," to contrast with "userspace," where you and I interact with the bot through human language, and with "objectspace," internal to an object. First we'll lay out the modules, and then follow some information through botspace. All of the information here is about the current development version of the Infobot, called Geckobot.
This is a redesign and reimplementation that uses very little of the prior Infobot code, and the architecture is substantially different.

[* Thanks to Simon Cozens for this term. ]

The Geckobot is modular in several senses of the term. First, it's a set of Perl modules. Second, it has an internal module system for dynamically adding and removing active code. There are Perl modules that make up the basic classes in the system, and the objects send messages to each other: the Geckobot communicates internally between objects in the same fashion that a user or another bot would communicate with it. The philosophy of the communication system is best summed up in the following maxim:

Entities enact relations through policies over actions in response to messages.

* Geckobot::Entity -- The base class

Every module in the Geckobot is derived from an Entity class, which implements the fetching and storing of relations. The bot itself is an Entity, and so is each of its active components. A 'relation' is concretely defined in the Geckobot as a mapping from an n-tuple to an object, and even 'parameters' are simply relations of the Geckobot object (in fact, they are stored under the 'parameter' relation). When the bot's startup wave begins, it sets up a number of default relations under the 'parameter' relation, such as its own name, where to accept connections from, and so on. Entity implements set() and get() as cover methods for the more general rset() and rget() methods, which take n-tuples as arguments. set('foo'=>'bar') is thus identical to rset('parameter','foo','bar'), but set() can take many attribute-value pairs, so you get the syntactic sugar of constructs like set('a'=>1, 'b'=>2). The values can be anything, including objects and data structures.

* Geckobot::Message: A container Entity for communication

A Geckobot::Message is an Entity that carries the signs from one Entity to another.
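The relation store that every Entity provides, and on which Messages are built as well, can be pictured as a single hash keyed by flattened tuples. The sketch below is an illustration under that assumption, not the Geckobot's Entity class: the package name Sketch::Entity and the NUL-joined keys are hypothetical, while the method names rset(), rget(), set(), and get() and the 'parameter' relation come from the description of Geckobot::Entity.

```perl
use strict;
use warnings;

# A hypothetical sketch of an Entity-style relation store, not the
# Geckobot's code: a relation maps an n-tuple of keys to any value.
# rset(@tuple, $value) and rget(@tuple) are the general forms; set() and
# get() are sugar over the 'parameter' relation.
package Sketch::Entity {
    sub new { my ($class) = @_; return bless { rel => {} }, $class }

    # Store $value under the tuple; the tuple is flattened to one hash key.
    sub rset {
        my ($self, @tuple) = @_;
        my $value = pop @tuple;
        $self->{rel}{ join("\0", @tuple) } = $value;
        return $value;
    }

    sub rget {
        my ($self, @tuple) = @_;
        return $self->{rel}{ join("\0", @tuple) };
    }

    # set(a => 1, b => 2) means rset('parameter', 'a', 1) and rset('parameter', 'b', 2).
    sub set {
        my ($self, %pairs) = @_;
        $self->rset('parameter', $_, $pairs{$_}) for keys %pairs;
        return $self;
    }

    sub get { my ($self, $key) = @_; return $self->rget('parameter', $key) }
}

my $bot = Sketch::Entity->new;
$bot->set(name => 'Bender', verbosity => 3);
$bot->rset('factoid', 'x', 'http://www.infobot.org/list/');   # any n-tuple
printf "%s\n", $bot->get('name');                     # Bender
printf "%s\n", $bot->rget('parameter', 'verbosity');  # 3
```

Flattening the tuple into one key keeps lookups cheap, at the cost of not being able to enumerate, say, all keys under one relation without scanning; a nested hash per relation would be the obvious alternative.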
We must be clear that a method invocation on an object (in objectspace) is not a Message (in botspace) under this definition; a Message object is what is sent between a Sender and a Recipient (the Addressee may be different from the Recipient). Message objects can only be created and received by Dispatcher Entities.

In the Geckobot, the Message takes on something of a life of its own. The original Message is passed through the system and altered by Entities; it can fork, and it can create expectations. It's fair to say 'the Message is the Process,' because the original Message hangs around and directs the show. Since many users can interact with the bot at the same time, this makes the entire thing re-entrant and allows parallel execution of many live Messages.

A Message contains information that pertains to six basic factors: the Sender, the Recipient, the Message itself, the communication Channel, the Code used to operate over it, and the Context of evaluation. Two interesting support methods for a Message are clone(), which produces a copy of the Message, and reply(), which takes the reply as an argument and produces a version of the current Message with the Sender and Recipient reversed. Thus, a Message can clone itself and send a copy out over the network, while maintaining expectations on the input stream.

While it's interesting to allow a Message to create new Message objects and send them to, say, itself, the Message object is not currently a Dispatcher. This may be changed in order to allow full generality; a Message could, for instance, act as a clock through this mechanism.

* Entities sending Messages: Geckobot::Entity::Dispatcher

A Dispatcher can send and receive information, and so may be the Sender and/or the Recipient of Message objects.
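The division of labor between Messages and Dispatchers can be sketched with two small classes. Everything here is hypothetical (the package names, the field names, and the inbox used to record deliveries); only clone(), reply(), and the Sender/Recipient swap follow the description above, and the Code factor is left out for brevity.

```perl
use strict;
use warnings;

# A hypothetical sketch of the Message/Dispatcher split, not the Geckobot
# source. A Message carries its factors as plain fields; clone() makes a
# shallow copy and reply() returns a copy with Sender and Recipient swapped.
package Sketch::Message {
    sub new   { my ($class, %fields) = @_; return bless +{%fields}, $class }
    sub clone { my ($self) = @_; return bless +{%$self}, ref $self }

    sub reply {
        my ($self, %args) = @_;
        my $r = $self->clone;
        @{$r}{qw(sender recipient)} = @{$self}{qw(recipient sender)};
        $r->{text} = $args{reply};
        return $r;
    }
}

# Only Dispatchers send and receive; a received Message lands in an inbox.
package Sketch::Dispatcher {
    sub new     { my ($class, $name) = @_; return bless { name => $name, inbox => [] }, $class }
    sub send    { my ($self, $msg) = @_; $msg->{recipient}->receive($msg); return $msg }
    sub receive { my ($self, $msg) = @_; push @{ $self->{inbox} }, $msg; return }
}

my $console = Sketch::Dispatcher->new('console');
my $bot     = Sketch::Dispatcher->new('bot');

my $m = Sketch::Message->new(
    sender  => $console, recipient => $bot, text => 'x?',
    channel => 'console', context  => {},
);
$console->send($m);                                                   # console -> bot
$bot->send($m->reply(reply => 'x is http://www.infobot.org/list/'));  # bot -> console
printf "%s\n", $console->{inbox}[0]{text};   # x is http://www.infobot.org/list/
```

Because reply() works on a clone, the original Message is left intact and can keep directing the show while its reply travels back.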
The bot instance itself is a Dispatcher, and so can send Messages to other Dispatchers (which can also receive Messages, despite the one-sidedness of the name) to request operations over relations that they can affect, and can receive requests or replies itself. Currently, a Message is the only Entity that is not a Dispatcher; changing this would mean collapsing the functions of Dispatcher into the base class Entity, and then every Entity (even a Message) could send and receive Messages.

For example, the bot instance might send or relay a Message to the DB object, which maintains the persistent database of factoids, to retrieve or set certain entries -- which are relations. As another example, the Console object can relay Messages to the top-level bot Entity for processing and receive a reply in botspace, which it then relays to the user (in userspace).

By now I imagine you're conjuring up the particle, wave, and field perspectives on this. Naturally, the propagating Message can be both particle and wave.

* Geckobot: The top-level bot Entity

The bot itself is a Dispatcher, created with the usual new() method. The shell() class method is just a simple wrapper for it:

    sub Geckobot::shell {
        use lib '.';                              # also look for Geckobot modules in '.'
        my $i = Geckobot->new(@_, console => 1);  # a bot with the TTY Console enabled
        $i->start;
    }

The Geckobot class is the Entity that coordinates requests and passes information around. It contains several methods, but the whole process is invoked by a call to Geckobot::process(), which pre-processes the Message and then routes it through validation (UserProc), Question, and Statement as appropriate. Geckobot::Question parses the question and checks whether any of the Modules apply. If that works, it uses Geckobot::Language to construct and send a reply; otherwise it passes the Message along to Statement, which tries to make itself apply to the Message and assume responsibility for responding.
If all else fails, the Message goes to a catch-all method, Confused, which either discards the Message object or replies to the Sender with a message that it wasn't understood, depending on the context. To give the flavor of how a reply is sent, one snippet looks like this:

    $self->send($m->reply('reply' => $self->lang->one_of('confused', $m)));

That calls the Dispatcher's send() method on the Message object returned by Geckobot::Message::reply(): a version of $m with the Sender and Recipient swapped, whose content is supplied by the bot's Language object (returned from the lang() method), which produces one of the possible 'confused' responses with context taken from $m. The 'reply' key looks redundant here, but is consistent with the method call syntax used for Entities.

* Geckobot::Language

With the transition to Geckobot development, the bot makes a first and flawed cut at internationalization by pulling the language out into separate Perl modules and referring to expressions by abstract tags. This is better than the prior Infobot designs, which scattered regular expressions over English through a number of different files, making the bot rather difficult to port to non-English languages.

* Geckobot::Language::English

The language the Geckobot uses is defined in a subclass of Geckobot::Language, for instance, Geckobot::Language::English. For example, 'X_R_Y' is a cover symbol for statements like "Pittsburgh is rainy," with three substituted symbols: X, the subject; R, the relation or verb; and Y, the object. In the Geckobot::Language::English module, there is a mapping from 'X_R_Y' and three arguments to possible outputs. If the argument order remains the same, as it might for any Subject-Verb-Object language, then one (over-)simply redefines R to be the appropriate verb. X and Y come from the text itself. This will also work if the argument order is different, because the bot simply uses the abstract tag and the three arguments.
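A minimal sketch of the tag-to-template idea, assuming a simple placeholder syntax in which %X, %R and %Y mark the slots. The package names, the templates, and the sample verb-last language are hypothetical; only one_of(), the 'confused' tag, and 'X_R_Y' come from the text.

```perl
use strict;
use warnings;

# Hypothetical base class: each abstract tag maps to a list of templates
# in which %X, %R and %Y mark the slots to fill.
package MiniLanguage;

sub new { my $class = shift; return bless {}, $class }

# Subclasses override templates() to supply their surface forms.
sub templates { return +{} }

# one_of(TAG, X => ..., R => ..., Y => ...): pick one template for TAG at
# random and fill its slots. Returns undef for an unknown tag.
sub one_of {
    my ($self, $tag, %slot) = @_;
    my $choices = $self->templates->{$tag} or return;
    my $text = $choices->[ int rand @$choices ];
    $text =~ s/%([XRY])/defined $slot{$1} ? $slot{$1} : ''/ge;
    return $text;
}

package MiniLanguage::English;
our @ISA = ('MiniLanguage');

sub templates {
    return +{
        confused => [ "I don't know", 'huh?', 'what?' ],
        X_R_Y    => [ '%X %R %Y', 'I think %X %R %Y' ],
    };
}

# A hypothetical verb-last language: same tag and slots, different order.
package MiniLanguage::VerbLast;
our @ISA = ('MiniLanguage');

sub templates { return +{ X_R_Y => ['%X %Y %R'] } }

package main;

my %slots = (X => 'Pittsburgh', R => 'is', Y => 'rainy');
my $en = MiniLanguage::English->new;
my $vl = MiniLanguage::VerbLast->new;
print 'English:   ', $en->one_of('X_R_Y', %slots), "\n";   # either template
print 'verb-last: ', $vl->one_of('X_R_Y', %slots), "\n";   # Pittsburgh rainy is
```

The caller only ever names the tag and the three arguments; the randomized choice among templates is what keeps human-facing output from sounding too mechanical.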
So 'X_R_Y' might actually look more like X_Y_R, with the verb last, and the derived Language would give the surface forms correctly. Parsing, and in particular separating X and Y when X and Y are both qr/.*/, is left as an exercise for the reader.

* Geckobot::DB: Persistent relations

One striking relation in the English system is the relation 'is', or, more accurately, 'English third person singular present of the verb be', which is maintained by the persistent database Entity, DB. For non-English languages, the mapping for this term would be different, but it would be a homomorphic relation. The database is implemented on top of DBM, but there is at least one publicly available fork of the Infobot that uses DBI (named pbotty).

The Geckobot differs from the Infobot in this representation of relations. The Infobot was less flexible when it came to representation, but both versions allow you to store things like 'X R Y' and retrieve them. R here is a verb, like 'is' or 'are'; X is any object, and Y is the associated object, usually text. The entry persists even if the bot is shut down and restarted.

Other relations, such as parameters, can be made persistent in the same way: the relation for the Entity is tied to the DBM with tie(). This suggests a subclass, Geckobot::Entity::Persistent, that has non-volatile relations; this is a future direction. The DB Entity currently defines dbget() and dbset() methods for the factoids, distinct from its parameters, which use the usual Entity methods, get() and set().

* Geckobot::Console

Geckobot::Console implements TTY-mode interaction with the bot, as a console. It can use Term::ANSIColor to help the more salient bits stand out in the display, but it's not required, and those allergic to ANSI color seem to be pleased by that. The Console can also use Term::ReadLine if it's available, much like the CPAN shell (from CPAN.pm, the CPAN installer); this gives you command history and all those goodies we've come to expect.
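Treating both modules as optional amounts to trying to load them at run time and falling back quietly when they are missing. A minimal sketch, assuming hypothetical helper names highlight(), make_term() and read_line(); colored() from Term::ANSIColor and Term::ReadLine's new() and readline() are real, but this is not the Console's actual code.

```perl
use strict;
use warnings;

# Load the optional modules at run time; remember which ones are usable.
my $HAVE_COLOR    = eval { require Term::ANSIColor; 1 } ? 1 : 0;
my $HAVE_READLINE = eval { require Term::ReadLine;  1 } ? 1 : 0;

# highlight(TEXT, WANT_COLOR): bold TEXT when colour is both available and
# wanted; otherwise return it unchanged. (Hypothetical helper.)
sub highlight {
    my ($text, $want_color) = @_;
    return $text unless $HAVE_COLOR && $want_color;
    return Term::ANSIColor::colored($text, 'bold');
}

# make_term(): a Term::ReadLine object when available, else undef.
# (Hypothetical helper.)
sub make_term {
    return $HAVE_READLINE ? Term::ReadLine->new('geckobot') : undef;
}

# read_line(PROMPT, TERM): use Term::ReadLine for history and line editing
# when we have it; otherwise fall back to a plain read from STDIN.
sub read_line {
    my ($prompt, $term) = @_;
    return $term->readline($prompt) if $term;
    print $prompt;
    my $line = <STDIN>;
    chomp $line if defined $line;
    return $line;
}
```

An interactive loop would call make_term() once, then read_line() repeatedly until it returns undef, passing each reply through highlight() with colour switched on or off by a user preference.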
Raw TTY can be cumbersome, so I recommend you install Term::ReadLine if you don't already have it. Running in Console mode is a good way to debug, because you won't annoy other people.

* Geckobot::Server

The Server module is in its infancy, but it uses POE, the Perl Object Environment, to handle connections as if the bot were an IRC server; it also allows you to link bots together to share queries and information. In fact, the entire architecture is designed so that something like POE, the Event module, or even Tk event loops could work with it. POE offers an excellent event-driven framework for the Geckobot.

While the Infobot is bound to IRC, the Geckobot has not, as of this writing, been turned loose in that medium. By the publication of this paper, the Geckobot should be able to connect and interact as the Infobot does.

* Geckobot::Interbot and Geckobot::Language::Interbot

For communication between Geckobots, we want a terse, unambiguous grammar, and so we derive a separate Language object that is used as appropriate. When interacting with humans, randomized outputs keep things from appearing too mechanical, but between bots we want it to be as simple as possible to extract the information we need. A Geckobot::Language::Interbot object is used in place of the human language for these connections.

Interbot mediates inter-bot communication, as the name implies. The Infobot has a way to link up bots using IRC, but the Geckobots will be able to open direct network connections to other friendly bots. This is an area we're working on actively, and the Geckobot Interbot communication is in flux, but it is clear that there will be a mechanism for rhizomatic discovery.

* Making Modules: Geckobot::Module

The Module abstraction simplifies the process of incorporating new functionality. The base class, Geckobot::Module, implements most of the heavy lifting involved in process control and rule matching, but it allows a lot of flexibility.
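To make the shape of a Module concrete before the details, here is a minimal sketch of the common case described next: the base class provides name(), description(), regex(), and a default matches(), and a derived Module supplies only new() and action(). The MiniModule packages, the Excuse example, and the hash-based Message are hypothetical. Where the real base class queues the call to action(), this sketch calls it directly.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the Geckobot::Module base class.
package MiniModule;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

sub name        { return $_[0]{name} }
sub description { return $_[0]{description} }

# regex(QR): set (or just read) the pattern the default matches() uses.
sub regex {
    my ($self, $re) = @_;
    $self->{regex} = $re if defined $re;
    return $self->{regex};
}

# Default matches(MESSAGE): if the regex matches the message text, call
# action() with the captures; if action() returns a plain string, wrap it
# as a reply with Sender and Recipient swapped. (The real base class puts
# the action() call on a queue; this sketch calls it directly.)
sub matches {
    my ($self, $m) = @_;
    my $re = $self->regex or return;
    my @captures = $m->{content} =~ $re or return;
    my $out = $self->action($m, @captures);
    return unless defined $out && !ref $out;
    return +{ %$m, sender => $m->{recipient}, recipient => $m->{sender},
              content => $out };
}

# A hypothetical derived Module for the common case: new() plus action().
package MiniModule::Excuse;
our @ISA = ('MiniModule');

sub new {
    my $class = shift;
    my $self  = $class->SUPER::new(name        => 'excuse',
                                   description => 'offers an excuse');
    $self->regex(qr/\bexcuse\b/i);
    return $self;
}

sub action { return 'solar flares' }

package main;

my $mod   = MiniModule::Excuse->new;
my $reply = $mod->matches({ sender  => 'kevin', recipient => 'bot',
                            content => 'give me an excuse' });
print 'to ', $reply->{recipient}, ': ', $reply->{content}, "\n";  # to kevin: solar flares
```

A Module that needs more than one pattern, like ModuleMgr, would override matches() instead of setting regex().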
All Modules are derived from the Geckobot::Module base class, which in turn is derived from Dispatcher. A derived Module needs to define, at the very least, a new() method, which creates and initializes the Module object at instantiation time. By convention, new() sets some internal variables, including the name and a description, which can be retrieved from any Module through the name() and description() methods implemented in the base class.

Let's look at two possible waves through the Module.

In the most general case, a derived Module would implement a matches() method, which takes a single Message object as input. If matches() returns true, then the Module's action() method will be called. action() is where anything interesting might happen; it's where the action is. action() can create new Messages (since a Module is a Dispatcher), change the given Message, send a reply(), or do nothing. It is even possible for a Module to act on the Message, decorating or changing it, without accepting it in a final condition; this is a way to filter or transform the Message before it reaches another Module, or to do several things in response to a single Message.

In the most common case, however, a Module would rely on the default matches() provided by the base class, Geckobot::Module. If, in new(), the Module uses the regex() method to set a regular expression, matches() will check that regex against the text form of the input. If it matches, then matches() will place a call to the Module's action() method on the queue; the derived Module must provide action(). In this case, if action() returns a string, matches() will pack it up as a Message object and send it back to the original sender by exchanging the Sender and Recipient and dispatching it, using the Message object's reply() method.

* Geckobot::Module::ModuleMgr

ModuleMgr is a special Module that can peer into the state of the bot instance, as well as into any other Modules that are instantiated in the system.
It can retrieve and set parameters at run-time, if the access permissions allow, and give status information for Modules. When Modules are initialized, they can fail due to dependencies, syntax errors, or any number of other reasons, but the failure is noted, and ModuleMgr can get status information out of well-behaved Modules even if they do fail. At the very least it can determine that the code failed to load; if a Module manages to instantiate, it can mark itself as failed and not enabled for operation, and even give some diagnostic information. Many of the existing Modules will warn about failed dependencies this way.

ModuleMgr defines its own matches() method, overriding that of the basic Geckobot::Module, because of the complexity of the language it accepts. Any Module is welcome to do this too, even to accept everything or to ignore everything.

* Other Modules

There are a number of other Modules that do interesting things, which you can explore: Babelfish, Convert, Dict, Extras, Example, Exchange, Excuse, Insult, Math, METAR, Nickometer, Nslookup, Purldoc, WServer, W3Search, Weather, Whois, Zappa, and Zippy, as of release 0.49_04. Many of these were contributed by friends of purl on the #perl channel.

3. Information flow between Entities

I connect to the Geckobot using a Console and say 'x?'. If we follow the information, this is what happens.

Geckobot::Console detects my input and creates a new Message object. Console is a Dispatcher, so it needs to be able to alias and resolve Entities 'behind it'; it puts an entry into the Message object that stands for me as the Sender, and includes the text of my name. The Recipient is the bot itself, the content is what I said, and the Context is the channel and history of the local interactions. The Context includes whether this is a private or public message -- in this case, private, so the bot will consider itself addressed, and reply under all circumstances, even with an 'I don't know'.
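The Message that Console builds at this point can be pictured as a plain structure. This sketch is only illustrative: the hash layout, the field names, and the helpers console_message() and must_reply() are assumptions; the text specifies only what information is recorded.

```perl
use strict;
use warnings;

# Hypothetical: the Message Console might build when I type "x?".
sub console_message {
    my ($nick, $text, $bot, $history) = @_;
    return +{
        sender    => { alias => "console:$nick", name => $nick },  # me, behind Console
        recipient => $bot,                 # the top-level bot
        content   => $text,                # what I said
        channel   => 'console',
        context   => {
            private => 1,                  # a console session is private
            history => [@$history],        # recent local interactions
        },
    };
}

# In a private Context the bot considers itself addressed and always
# answers, even if only with "I don't know". (Hypothetical helper.)
sub must_reply {
    my $m = shift;
    return $m->{context}{private} ? 1 : 0;
}

my $m = console_message('lenzo', 'x?', 'geckobot', []);
my $verdict = must_reply($m) ? 'will reply' : 'may stay silent';
print "$verdict\n";    # will reply
```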
Once the Message has been constructed, Console uses the send() method (since it is derived from Dispatcher), and the Message goes off to the top-level Geckobot instance. The bot sends the Message to its own process() method, which is the default for any Dispatcher that receives a Message directed to it. process() resolves the Sender and attaches the access rights and class to the Message, then checks whether this is an Interbot request -- if so, it goes off to Interbot, avoiding the problems of parsing English. process() also sets several other relations, including the time the Sender was last seen active, and whether continuity can be assumed between this Message and the last one, based on the context and time. After all that (and a little more), it sends the Message on:

    $self->send($m->set('recipient' => $self, 'action' => 'question'));

Naturally, this gets picked up by the bot and forwarded to the 'question' method. Question plays some games to remove some parts of the utterance, then tries to parse it as a question using the Language object. Even if it doesn't parse as a question per se, just 'X' can mean 'tell me about X', so the text is also checked against the database, given a number of reasonable surface transformations.

If the relation is not retrieved from the DB object, Question will then engage the Modules, in turn, over the Message. It will also try extracting the apparent request and passing it to the Modules, so "What is the weather for Pittsburgh" will work even if the Module only declared itself to match qr/weather for (\w+)/, and the Module will receive the part of the input that it is looking for, as long as it fits into Question's schema. If one of the Modules declares that it will handle the Message (by returning a Reply), that goes back to the bot, which forwards it on to the Console instance, which then puts it back over the communication channel.
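How "What is the weather for Pittsburgh" reaches a Module that only declared qr/weather for (\w+)/ can be shown in a few lines: Question strips the question framing, and the Module receives the remaining request together with its captures. The stripping rules and the helper names apparent_request() and dispatch() are hypothetical; the Geckobot's actual transformations are more extensive.

```perl
use strict;
use warnings;

# Hypothetical version of Question's clean-up: drop the question framing
# so that only the apparent request is left.
sub apparent_request {
    my $text = shift;
    $text =~ s/^\s*(?:what|who|where)\s+(?:is|are)\s+(?:the\s+)?//i;
    $text =~ s/\s*\?\s*$//;
    return $text;
}

# Offer the extracted request to each Module in turn; the first Module
# whose pattern matches receives the request text and its captures.
sub dispatch {
    my ($text, @modules) = @_;
    my $request = apparent_request($text);
    for my $mod (@modules) {
        if (my @caps = $request =~ $mod->{regex}) {
            return $mod->{action}->($request, @caps);
        }
    }
    return;    # no Module claimed the request
}

my @modules = (
    {   regex  => qr/weather for (\w+)/,    # what the Module declared
        action => sub {
            my ($request, $city) = @_;
            return "forecast for $city, from request: $request";
        },
    },
);

my $answer = dispatch('What is the weather for Pittsburgh?', @modules);
print "$answer\n";   # forecast for Pittsburgh, from request: weather for Pittsburgh
```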
If the bot doesn't find any way to match, it may send an Interbot request to see if any friendly bots know about X, and the result would tunnel back through the bot, be forwarded to the Console, and go on to the user. Failing that, the Message will then go on to Statement, which will try to parse the Message as a directive to update the database. If it succeeds, it will send a Reply, as Question does, by constructing it with the Language object; otherwise, it passes the Message on to Confused, which, since the Context here is 'private', will respond with one of the Language object's 'confused' messages, presumably in English.

4. Userspace: interacting with the bot through language

When you actually use the bot, you don't have to worry about Messages or Entities. The usual interaction is through text, but we have wrapped the system with CMU Sphinx and Festival for an initial speech-enabled system; CMU Sphinx is an open source, free speech recognition engine, and Festival is a likewise free speech synthesizer. Speech raises a number of new issues in language modeling and in parsing the unknown language that is brought about through naming something; this is beyond the scope of this paper, however.

In the text domain, on IRC, the usage has co-evolved with the contexts the bot has been in. The language that people use on-channel is influenced by whether or not they want to trigger the bot, and they choose their words around this; I have seen several instances of people explicitly avoiding the 'X is Y' form when they didn't want it to be accidentally picked up and repeated, and it's commonplace for people to reply to questions in a way that deliberately invokes a bot to give more information (such as a list of URLs).

The text parsing happens in an unusual way: as the Message is passed between Entities, they fill out registers inside it, decorate it, and pass it on, so there is no top-level full grammar.
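The absence of a top-level grammar can be illustrated by a pipeline in which each stage reads the Message, fills in the register it knows about, and passes the Message on. The stage names and register names are assumptions made for this sketch; no single stage knows the whole language.

```perl
use strict;
use warnings;

# Hypothetical stages. Each inspects the message, may fill in a register,
# and passes the message on; no stage holds a grammar for the whole input.
my @stages = (
    # addressing: strip a leading "purl," or "purl:" and note it
    sub {
        my $m = shift;
        $m->{addressed} = 1 if $m->{text} =~ s/^\s*purl[:,]\s*//i;
    },
    # question form: a trailing "?" marks a question
    sub {
        my $m = shift;
        $m->{question} = 1 if $m->{text} =~ s/\s*\?\s*$//;
    },
    # statement form: "X is Y" fills the X, R and Y registers; the
    # non-greedy X simply takes the first "is" or "are" it finds
    sub {
        my $m = shift;
        return if $m->{question};
        @{$m}{qw(X R Y)} = ($1, $2, $3)
            if $m->{text} =~ /^(.+?)\s+(is|are)\s+(.+)$/;
    },
);

# decorate(TEXT): run a fresh message through every stage in order.
sub decorate {
    my $m = { text => shift };
    $_->($m) for @stages;
    return $m;
}

my $m = decorate('purl, Pittsburgh is rainy');
print join(' | ', map { $_ . '=' . $m->{$_} }
                  sort { $a cmp $b } grep { $_ ne 'text' } keys %$m), "\n";
# R=is | X=Pittsburgh | Y=rainy | addressed=1
```

Adding a stage means adding one more closure; nothing else needs to know about it, which is the property a single monolithic grammar would lose.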
In fact, Parse::RecDescent is not really an option, given our philosophy of passing the Message to an object that encapsulates its own language and function: we don't know what an object is capable of responding to, especially if that object is an Entity in another, remote bot.

Since people are conversing in text on IRC, text is an appropriate thing to encode and repeat. The users need to prune and clean up the responses in the bot, and the 'owner', the person who maintains the bot, must set the parameters of the bot appropriately for each channel. Some channels like chatty bots; some like no-nonsense, only-the-factoids-ma'am bots. Either way, the bots are communal encyclopedic factoid stores, retaining and replicating the texts they are fed in kind, fulfilling the role of memetic individual in userspace.

IV. Discussion

1. The Archimeme

The Infobot and the Geckobot are products of what I call the Archimeme -- the raw impulse to build replicators. These pieces of code (the replicanda) came about as blurred copies of that impulse. The bots themselves perform replication, paying reasonable homage to the Archimeme like little prayer wheels, but are certainly not the only plausible institution of it.

As replicators intended to carry signs between Entities, as well as to store them and repeat them later, the bots are, in software, examples of non-human memetic individuals in the spaces they inhabit -- they have some level of competence over the culture. Certainly, mere repetition and retrieval are shallow definitions of competence, but the bots do interact with other Entities in the transmission of cultural information, and as they accumulate factoids, they reflect at least part of the common disposition and presumptions of the others there. Even creative work includes some degree of replication, and works that actively replicate have a peculiar recursive relation to the Archimeme.
There should be no great mysticism about the Archimeme; it's present in our everyday lives to such a degree that we often take it for granted.

2. Entity soup

In the case of the Geckobot, it is difficult to draw an inside/outside distinction, because the notion of Entity is so fluid -- a Message can pass to another bot, return, and be answered (to the original Sender) by the first bot, all through mechanisms that resemble both internal and external communication. The system itself is designed so that even an Entity can be made of Entities, and so that whole bots can link together rhizomatically and share resources and content. The Geckobot itself can send and receive Messages to and from its 'internal' Entities using the same mechanisms it uses to communicate with other bots or users.

This is similar to notions of a society of mind; even when someone addresses Kevin Lenzo, they talk to many facets of Kevin that parade under the notion of an individual. I am Many-Me, and occasionally the many mes don't agree. The interface is simple, but the collection may not be.

3. Trust

It comes down to how we treat the policies over the affected relations, however. Trust plays a pivotal role in what we allow to happen; if 'I' consider something part of 'Me', it might get special status, at least with respect to something 'outside' -- something 'Other'. And to a greater or lesser degree, we grant some trust to the communities that we interact with.

The Rhetoric [Aristotle] describes the devices for exposing the inherent qualities of an enthymeme, and for using them to influence the audience, by exploiting their biases and interests or by appealing to logic. Trust, or Ethos, plays a large role here -- trust in the speaker, trust in your own judgment, trust in the continuity of things.
The Perl and #perl communities inspire a certain level of trust, and purl assumes it vicariously, by repeating those who are often "correct," and by going to primary sources over the Internet.

Peirce wrote a great deal about the construction of knowledge, and 'justified true belief,' in his writings on Abduction. For him, knowledge was constructed through continuous encounter with the world, and the reliability of that information played a role in the construction of relations. Abductive inference allows us to generate new knowledge by weighing evidence, but not in a strictly deductive manner. Medical diagnosis is often abductive -- from symptoms to causes. The reliability of the tests used, and of the information given, influences whether or not we accept a given explanation or course of action; trust, and even suspicion, are at the heart of the scientific method itself.

Upon receiving a message, you might be required to perform some truth maintenance to resolve apparent conflicts, but you would be less inclined to do so if you mistrusted the source. In the space of #perl, people trust that the answers they get will be helpful (in the main), and so trust (to some degree) the members of that community when it comes to information. This diminishes the otherness of the group as a whole, and as the mores and patterns of the society become internalized, they shape impulses within the Entities.

4. Discovery

Infobots and Geckobots are designed to be linked together to share resources, and it is desirable that they should be able to seek out and discover other bots, perhaps through known bots, or by some registration process. The whole trust propagation issue remains: I might want my personal bot to trust those in my intranet workgroup, but take information from the net at large with lower confidence. If someone in my workgroup adds a trusted host, I might want to set a policy that I accept all of his trusted hosts, and do discovery that way.
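The "accept all of his trusted hosts" policy amounts to a transitive walk over trust lists, with confidence falling off at each hop. A minimal sketch under loudly stated assumptions: the host names, the per-hop decay factor, the cut-off threshold, and the choice to multiply confidences are all hypothetical, not Geckobot behaviour.

```perl
use strict;
use warnings;

# Hypothetical trust lists: which hosts each host says it trusts.
my %trusts = (
    'me.work'    => [ 'alice.work', 'bob.work' ],
    'alice.work' => [ 'carol.net' ],
    'bob.work'   => [ 'carol.net', 'dave.net' ],
    'carol.net'  => [ 'eve.net' ],
);

# discover(START, DECAY, MIN): breadth-first walk from START; each hop
# multiplies confidence by DECAY, and hops that would fall below MIN are
# not taken. With a uniform DECAY, the first (shortest) route found to a
# host is also its highest-confidence route.
sub discover {
    my ($start, $decay, $min) = @_;
    my %conf  = ($start => 1);
    my @queue = ($start);
    while (defined(my $host = shift @queue)) {
        my $next = $conf{$host} * $decay;
        next if $next < $min;
        for my $peer (@{ $trusts{$host} || [] }) {
            next if exists $conf{$peer};
            $conf{$peer} = $next;
            push @queue, $peer;
        }
    }
    delete $conf{$start};    # I don't need to discover myself
    return \%conf;
}

my $c = discover('me.work', 0.5, 0.2);
printf "%-10s %.2f\n", $_, $c->{$_} for sort keys %$c;
```

With these numbers, the workgroup hosts get 0.5, the hosts they vouch for get 0.25, and eve.net, one hop further, would fall to 0.125 and is left out; lowering the threshold widens the circle.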
Another approach is to connect to the big rhizome of Infobots discovering each other around the network. Like tubers, bots or pieces of bots start sending tendrils towards each other and sharing material. One, or part of one, can be cut out and placed in a new bed, and start intermingling with it, carrying replicating material from the prior context.

Validation, Trust, and Discovery remain important issues for the Geckobot. We have no easy answers here. The infobot-dev mailing list has had active threads on Trust and Discovery; you can find more at http://www.infobot.org/list/. This work is a vehicle for discovery of thoughtful replicator builders.

5. #perl, purl, and Perl

I have no doubt that purl has affected #perl and Perl itself*. She deserves a good deal of the credit for catalyzing Yet Another Perl Conference (YAPC), which was largely organized by word of mouth, or more accurately word of hand, on #perl.

[*footnote: For small values of Perl, anyway.]

Many of the people behind the nicknames on #perl are substantial contributors to Perl and the Perl community, as well as text and program authors, documenters, and users. And as #perl presents a forum for discussions, and as people enter and see the discussions of Perl itself, with purl's help, many different systems interpenetrate -- shaping and shaped by the community.

The discussions about how purl does and should behave on the channel have touched on deep issues of social interaction. Since purl will accept factoids from anyone, and anyone can change them, the space has become charged with the upkeep of the communal store. At various times people have wanted locks on factoids, or wanted only certain privileged people to have the right to change them, or wanted disclosure of who said or set something (currently anonymous), or wanted factoids changed to things others would disagree with. Here we see the 'Infobot problem': many disagreeing author/readers vying for a limited space.
When one person thinks X "rules" and another thinks X "sucks," to put it simply, there will be contention over the definition of X, and the definition of X in the bot will be replicated, both publicly and privately, so the definition of X carries some political weight. The Infobot has carried the legacy of small chunks with it from IRC -- responses are usually under 200 characters, which keeps the contributions rather compact; the Geckobot is moving towards being able to display by whatever means the handling object exists to provide.

With the egalitarian Infobot, the actors can just keep changing each other's definitions until they change their behavior somehow, perhaps as compelled to by the other users of the bot. Since the bot is a large meme complex, with many particles and many waves moving through it, everyone who comes in contact with it is both a reader and an author. It's as if you could change the encyclopedia entry for X for everyone whenever you didn't like it.

Internally, the bot has only the shallowest mechanism for storage and retrieval of the factoids, but the users cleanse the common store, making it useful, making value judgments that the bot can't make for itself. Mistakes are corrected, stale URLs are updated, and social codes for presentation, spelling, and language style are enforced. And this happens not simply in one channel with a hundred people on it, but in several marginally related channels simultaneously, each with a different construction and constituency.

Perlfaq Prime (http://www.perlfaq.com) also faces some of these issues, as a collection of authors who can all modify each other's entries -- it is a collection of questions that the community enters and modifies directly, not as a weblog adding comments, but with multiple authorship.
The approach seems to work well there, too, but one can imagine more volatile spaces -- a weblog like Slashdot -- where it might not work, or might not be stable enough to be useful; Slashdot's format is one where posts are made and then followed up on, allowing a thread to expand, but not to contract or change internally. Memepool (http://www.memepool.com) takes a different, non-weblog approach: many authors, each making (possibly many) small contributions, like bricoleurs, but there are no follow-ups, and editing of the 'memes' is the prerogative of the authors and editors. The result is an eclectic collection of topics interesting to people like the people who post to it.

V. Conclusion

The Infobot and the Geckobot are repeaters of language. They are quite open to anything, and have been used for positive and negative ends -- from storing illegally transferred serial numbers for cracked programs to giving out Perl information. These bots are extremely shallow -- they have no notion of 'goodness'. This partly comes from the fact that they just use patterns of text, with no real understanding of what they do. While many thought forms are propagated through text, the bots have few of the filters we take for granted in rational humans like ourselves, so we have to give them proper care and feeding. Hopefully the perspective devices and constructs that we discussed here can aid a discussion of how that care and feeding is done.

We, as individuals, _can_ be selective about what we replicate, what we create, and how we maintain the structures of information. As human beings, we can be conscious of what we do and do not want to put into the field -- not completely, but to a great extent. When you create a Perl module, or answer a question, or ask one, the information moves, filtered, through each of us as Entities, but certainly as Entities with choice. The Archimeme does not rule us dispassionately.
We can choose to build, and to improve in our own ways, and we can choose our policies of reaction to the Messages we receive, given the relations we maintain -- carry out the care and feeding of the social and cultural fields, if you will.

Perl, as a community, is creative on a very high level. All the modules on CPAN, and CPAN itself, show the evidence of that hand of creation; and, while there are sparks, as in any real discussion, the dialogue -- polylogue -- has been a dynamo, letting us all be a little lazier. And we all have a say in how the world is shaped -- whether we will be cruel, whether we will treat a newbie with grandmotherly kindness, or whether we allow our creations to speak louder than words, and give them away. The bots are blurred copies of the community, giving of themselves as if that were their entire reason for being. The Archimeme demands our service, but we can perform it in good conscience.

How do we build the right structures to enable free and open communication, through implementations in free and open software, that can change the way we live in a positive way? Modest experiments of that sort have been carried out in the crafting of the Constitution of the United States, and similar structural codes throughout the world. How will Perl, or any field, be shaped, socially and technically? It's up to you, and you need the right tools for the job. If you build it...

VI. Bibliography

[Aristotle] The Art of Rhetoric, Aristotle; translated and with an introduction by H. C. Lawson-Tancred. Penguin Books, 1991.

[Bartlett] Leopard and Fat-Tailed Geckos, R. D. Bartlett and Patricia Bartlett. Barron's Educational Series, Inc., 1999.

[Cobley&Jansz] Introducing Semiotics, Paul Cobley and Litza Jansz. Totem Books, 1997.

[Dawkins] The Selfish Gene, Richard Dawkins. Oxford University Press, 1999. First published 1976.

[Eco] Semiotics and the Philosophy of Language, Umberto Eco. Indiana University Press, 1986.
[Lenzo] Purl and Infobots, Kevin A. Lenzo. The Perl Journal #10, Volume 3, Number 2, pp. 10-14. Summer 1998.

[Peirce] Reasoning and the Logic of Things, Charles Sanders Peirce; edited by Kenneth Laine Ketner, with an introduction by Kenneth Laine Ketner and Hilary Putnam. Harvard University Press, 1992. Peirce delivered these lectures as what eventually became the William James lecture series at Harvard in 1898.

[Pike] Tagmemics, Discourse, and Verbal Art, Kenneth L. Pike. Michigan Studies in the Humanities, University of Michigan, 1981. This pamphlet contains three papers: "Linguistic Complexity in a Two Page Instruction Sheet," "Levels of Observer Relationship in Verbal Art," and "Grammar versus Reference in the Analysis of Discourse."

[Saussure] Course in General Linguistics, Ferdinand de Saussure; translated and annotated by Roy Harris. Main text copyright 1972 Editions Payot, Paris; translation first published by Open Court Publishing Company, 1986. Saussure's original course notes date from 1906-1911 at the University of Geneva.

[Simon] The Sciences of the Artificial, Herbert A. Simon. The MIT Press, 1981.

Web resources:

The Infobot home page, http://www.infobot.org/
The infobot-dev mailing list archive, http://www.infobot.org/list/
Infobot on SourceForge, http://www.sourceforge.net/project/?group_id=2241
The Comprehensive Perl Archive Network (CPAN), http://www.perl.com/CPAN/
CMU Sphinx, http://www.speech.cs.cmu.edu/sphinx/
The Festival Speech Synthesis System, http://www.speech.cs.cmu.edu/festival/
Eris Free Net, http://www.efnet.org
IRC Help (irchelp), http://www.irchelp.org
Perl Object Environment, http://www.newts.org/~troc/POE/
pbotty, a DBI-based Infobot, http://matrix.linux-help.org/pbotty/
Perlfaq Prime, http://www.perlfaq.com/
/. (Slashdot), http://www.slashdot.org/
Memepool, http://www.memepool.com