Searching, Publicity; How Well It Can Work, How Well It Should

As one with a long-term interest in both Automated Information-Collecting Agents and the Analysis of their Data, I was most fascinated to see the arrival of the Sheep Labs Search, created by the Electric Sheep Company. In fact I was a little startled, as it arrived suddenly and without warning on Easter Monday, a holiday on which many people are away, already filled with all sorts of information about products.

Automata and Homunculi

I thought that, to start with, I would make up a couple of terms to clarify exactly what was meant here.

  • An automaton ("script bot") exists as an object in-world and uses LSL to move about and sense its surroundings. It may use outside resources to make decisions and give out information, but its interaction with the Grid will always be on the same basis as any other script's. In fact all scripted objects are automata of some level, but most are not very sophisticated, simply opening or closing when touched, say. My trams, for example, are automata.

  • A homunculus ("client bot") appears in SL as an avatar but is controlled by some outside resource; libsecondlife or other open client code has been used to breed this new kind of entity. Potentially it has all of the abilities and senses of an avatar; however, some of them are too hard for it to use properly. The eyes are blind, the fingers feel nothing. Sometimes this is an issue of technology, but in many cases it is a matter of cognitive ability. For example, "landbots" are homunculi: they appear to be real people but in fact can only sense land for sale, decide whether it is at an appropriate price, move to it and buy it. They can do this very quickly, but they are not amenable to pleas and threats, nor can they look around and decide whether the plot is pleasantly situated and would sell well (except perhaps crudely through certain algorithms, and certainly with nothing like the sophistication of an actual person). Homunculi could also make use of automata themselves.

  • Neither, as will become significant later on, can read.

A Quick Summary of Operations

This form of search, as far as I understand it, operates as follows: a search homunculus visits areas, looks for objects lying around, and records what they are, what parcel they are on and their price. The exact order in which it does this (does it look for parcels first, then search the parcels? does it look for objects, then work out where they are?) is mostly unimportant, but a few details of it do matter, and I shall mention them later on.
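To make the discussion below a little more concrete, here is a minimal sketch in Python of the kind of index such a homunculus might feed. It is purely an assumption on my part and not the Electric Sheep Company's actual design: each observation is stored as a record (item name, parcel, price, owner, creator), and the words of each item name go into an inverted index so that a query word leads straight to the matching records. The class names, field names and sample items are all hypothetical.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ForSaleRecord:
    """One for-sale object seen by a (hypothetical) search homunculus."""
    item_name: str
    parcel_name: str
    price: int      # price in L$
    owner: str      # the avatar selling it
    creator: str    # the avatar recorded as its creator


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return [word for word in re.split(r"[^a-z0-9]+", text.lower()) if word]


class ItemNameIndex:
    """Inverted index: each word in an item name maps to the records containing it."""

    def __init__(self) -> None:
        self._postings: dict[str, set[ForSaleRecord]] = defaultdict(set)

    def add(self, record: ForSaleRecord) -> None:
        # Only the item name is indexed; parcel name and description are ignored.
        for word in tokenize(record.item_name):
            self._postings[word].add(record)

    def search(self, query: str) -> set[ForSaleRecord]:
        """Return records whose item name contains every word of the query."""
        words = tokenize(query)
        if not words:
            return set()
        results = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            results &= self._postings.get(word, set())
        return results


# Hypothetical sample data.
index = ItemNameIndex()
index.add(ForSaleRecord("Snowball Launcher", "Example Shop", 50, "Shopkeeper", "Shopkeeper"))
index.add(ForSaleRecord("Object", "Example Shop", 100, "Shopkeeper", "Shopkeeper"))

print([r.item_name for r in index.search("snowball")])  # ['Snowball Launcher']
print(index.search("dress"))                             # set()
```

Note that the second record, a box left with the default name "Object", can never be found by a meaningful query, which is the naming problem discussed further on.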

Accuracy and Performance

Given that the technical details of the implementation are likely to be changed and improved, I will not spend much time on them; if I point out a performance issue, for instance, it will likely be fixed in a day or two, making my commentary worthless. Instead I prefer to discuss issues relating to the basic concept of grabbing the names of items for sale, which is not necessarily unique to this particular ESC system either, and may be replicated elsewhere in some other form.

Things which work

It is certainly the case that the engine is able to find things for sale relating to a particular term, though what proportion of the possible items it returns is impossible to say, and I do not know how quickly it will update. It is also rather fast, though again I do not think that it covers the entire Grid. At this point it does not appear to return results reliably for parcel names containing keywords; I am not sure whether this is by design or some sort of error.

It is clearly an advantage to be able to search the entire contents of a shop rather than just whatever the owner has managed to squeeze into the parcel description. I, for instance, have a selection of snowball-throwing devices in my shop, which I do not have the space to list among the search terms for the current Find. Nobody searching for "snowball" with that would come across my shop, but with this new engine they would.

It also has a head start on the current methods of Cheating The Search (a practice euphemistically termed Search Engine Optimisation, but which basically boils down to Camping). This is not a long-term advantage: if it grows at all popular, specific methods of Cheating This Search will arise with great speed. "Keyword spam", the practice of stuffing multiple irrelevant keywords into a listing merely to attract searchers, will be far, far worse, given that a huge number of fake for-sale objects can be created with large quantities of attached text. For the moment, though, it is refreshing.

It also returns items which have a specific person as the creator, not just the seller, since it is able to search on both owner and creator. Did I mention that? It is able to do that, so if you wish to see all items created by a particular person that are set for sale, by anyone, you can. This is mostly of use to those wishing to hunt down people selling Freebies, a practice which I consider generally unethical but which I do not actually care much about for myself, except in some rare cases. There are those who are interested in such activities, though, and they will find this function invaluable.
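The owner/creator distinction can be illustrated with a small hypothetical sketch in Python. It assumes, and this is only an assumption, that each listing records both the avatar selling the object and the avatar named as its creator; the names and items below are invented. Filtering on the creator field turns up copies being sold by somebody else, which is exactly what a Freebie hunter would look for, and also what might catch out a perfectly legitimate reseller.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Listing:
    item_name: str
    owner: str    # avatar who set this copy for sale
    creator: str  # avatar recorded as the object's creator


def listings_by(listings: Iterable[Listing], person: str, role: str) -> list[Listing]:
    """Return listings in which `person` appears as the 'owner' or the 'creator'."""
    if role not in ("owner", "creator"):
        raise ValueError(f"role must be 'owner' or 'creator', not {role!r}")
    return [entry for entry in listings if getattr(entry, role) == person]


# Hypothetical listings: the same free item, once given away by its creator
# and once put up for sale by somebody else.
listings = [
    Listing("Free Hat", owner="Original Maker", creator="Original Maker"),
    Listing("Free Hat", owner="Reseller", creator="Original Maker"),
]

for entry in listings_by(listings, "Original Maker", "creator"):
    print(entry.owner)  # prints "Original Maker", then "Reseller"
```

Searching on the owner field for the same name would return only the first listing; it is the creator search that exposes the second.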

As a side note to the above, I have seen the point put forward by Mr Neva that this will also allow the harassment of people who are quite legitimately reselling items that were sold as transferable with no indication that they were to remain free. This is, I would admit, a danger; I am not sure quite how much of one, as I do not know how common this sort of behaviour is in the world, but I have certainly heard of it.

Things which item-name-based search will not do

There are, however, some issues with this approach which I believe it will not address.

Item names are not perfectly descriptive. They often do not relate explicitly to what is actually being sold: if one has a box containing several items, set to sell its Contents when bought, it doesn't really matter what that box is called, though it might make one's accounting a bit tricky to have umpteen records of sales for something called "Object". For many small business people this will not be significant, though.

As well as this, items sometimes rely on surrounding information which no bot can read; for instance, a dress may be sold in different styles, each in a box with the same name, but each box with a different texture applied to show a customer exactly what they are getting. Or perhaps the dress is sold merely under its name, without the word "dress": I have several items of clothing like this, where the outfit has a name that means very little, because, after all, anyone actually going to buy it will see precisely what it looks like and what it contains from nearby textures and signs.

This is why I mentioned above that bots cannot read. They currently lack the intelligence to interpret context and terrain, and it is likely to be years or decades before one is able to "look" at a sign, say, and take from it information which a Human Being could receive in an instant.

In the past, naming sale boxes suitably has not been necessary, because searching has been done strictly on the basis of the parcel's name and description. Introducing an extra system which does not incorporate these as well may produce some odd results.

The fatal blow, though - and I am sure that experienced Residents will have picked up on this already and be tapping their feet, wondering when I am going to come to it - is that search by item name is absolutely incapable of incorporating vendors, and an awful lot of people use vendors to sell their products. Personally I use them rarely as I prefer buying from individual boxes myself, but I do, for example, have all my free items in a "vendor", just one which does not charge.
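The vendor problem can be illustrated with a small hypothetical sketch in Python. The assumption here, and it is only an assumption about how such a crawler might work, is that the bot records nothing but the name of each object whose built-in "for sale" flag is set; a scripted vendor takes payment through its script rather than through that flag, and the products it offers sit in its inventory, where the bot cannot see them. The object names and contents are invented.

```python
from dataclasses import dataclass, field


@dataclass
class InWorldObject:
    name: str
    for_sale: bool  # the built-in "for sale" flag, which the crawler can see
    # Items offered through this object; invisible to the crawler in this sketch.
    contents: list[str] = field(default_factory=list)


def crawlable_names(objects: list[InWorldObject]) -> list[str]:
    """Names an item-name crawler would record: only objects flagged for sale."""
    return [obj.name for obj in objects if obj.for_sale]


def found_by_keyword(objects: list[InWorldObject], keyword: str) -> bool:
    """True if the keyword appears in any name the crawler could record."""
    return any(keyword.lower() in name.lower() for name in crawlable_names(objects))


shop = [
    InWorldObject("Snowball Launcher", for_sale=True),
    # A scripted vendor: not flagged for sale itself; its products sit in its inventory.
    InWorldObject("Vendor", for_sale=False, contents=["Red Dress", "Blue Dress"]),
]

print(crawlable_names(shop))            # ['Snowball Launcher']
print(found_by_keyword(shop, "dress"))  # False
```

However many dresses the vendor offers, nothing about them reaches the index, while the single boxed item is found at once.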

As a result, I believe the search may in practice produce results biased towards those selling one way rather than another, for no good reason. Neither method is better than the other, and a system which favours one over the other will be distorted. Admittedly, the fact that the current search leads people to put key phrases into parcel names and descriptions may balance this out. Still, I suspect this could be a significant factor, and search engine placement is a very significant thing.

One last point - this works specifically for items for sale, not events or areas. A casino or art gallery would be disadvantaged. Not a huge issue as long as potential users are aware of this.

Ways in which item-name-based search might prove less effective than the current Find

At this stage, by the way, I will not be considering privacy and surveillance implications; they have their own section later on.

As mentioned above, the bias towards object/box sales over vendor sales might be bad, but I do not possess sufficient (or any) statistics to prove this one way or the other. There is also the potential issue of people introducing deliberately libellous and/or distorted items to create a bad impression. Some enemy of mine, were I to have any, which I am sure I do not, could take one of my free and modifiable items, turn it into something disgusting and set it for sale as "Ordinal's Nazi Ageplay Camp Chair - Earn L$$$$$ For Abusing Jewish Babies!!!!!!" which, if sufficiently repeated, would turn up on a search for my name with me as the creator. I would not be terribly happy with such a situation.

There is also the issue that items are sometimes set for sale when they are not actually meant to be sold, either by mistake or because the original version was set that way. This is surprisingly common, and even I have noticed an item set for sale in such circumstances, though a recent post perhaps indicates that it is no longer an issue; I am not sure whether the points mentioned therein refer to this issue, but I will test it when I am able.

I think that the major disadvantage here may come from the unwillingness of people, some of whom control large amounts of land (such as Ms Anshe Chung), to allow the "scraping" of data by search homunculi on their properties. This dramatically reduces the number of potential results. I deal with this in the following section.

Notification and Privacy

I believe that I have been reasonably balanced, and balancedly reasonable, in my treatment of the issue and the Electric Sheep Company so far; I certainly have no personal issue with any of them; and I have done such things as send bug reports which will hopefully improve the system; thus I hope I will not be taken to be suddenly partisan if I say that I consider the way in which this was announced and "rolled out" to have been handled pretty badly all told.

The announcement of its existence was a fait accompli on a holiday - "hello, we have scanned you and put your details on this web site, oh, you might want to opt out maybe". I think many people were aware that ESC were working on a search engine for SL but I don't believe anyone outside of the company had heard anything specific. If they did they didn't tell anyone.

I can understand that a fully opt-in system would require a considerable level of organisation and bureaucracy, particularly on the mainland with its absentee landlords; to be honest it would probably be impractical, and it would definitely mean the project lost a lot of its surprise value. For heaven's sake, though, a week or even a few days of grace, preceded by a message such as "We've got an amazing new search engine! It's great! You'll get loads more sales! But if you don't want to be indexed by our bot, here's your chance to opt out now!" would have respected the well-known fact that people often object to this sort of behaviour on grounds of privacy.

At this point I must say that this is nowhere near the level that slstats.com reached, where there was, to be quite honest, active contempt for objections, and which catalogued considerably more significant data. I don't believe that ESC have shown contempt in this instance, please do not misunderstand; I do think it could have been handled better.

It is traditional at this point for some thoughtless hick to pop up and sneer "it's public data, you put it out there, they don't have any obligation to tell you what they're doing, they're a private company, get over it, you can't stop it, IT'S ONLY A GAME!" Whilst I am not in the habit of replying to people that I have just made up, even if they are accurate conglomerations of responses I have seen elsewhere, I must make a few points.

The issue of what is "public" data and what is acceptable to happen to it is considerably more complex than a binary "this is private, don't use it at all" / "this is public, do what you want" one, as is becoming increasingly clear in the Other World with matters such as Closed-Circuit Television Cameras. My appearance, for instance, is clearly public in that anyone may look at me and I don't mind. Somebody who looks at me consistently for a long period of time, I will find suspicious. The repeated access of "public" information becomes surveillance. The mass access of "public" information can be used for profiling and in concert with other data to produce analyses of behaviour patterns, and not for the benefit of the individuals concerned. And any of this information can be damaging to the individual when it is presented out of context, which, with any mass data gathering escapade, it certainly will be.

It is rational to be concerned about this when it occurs. In fact, I might say that being concerned about it by default is more rational than assuming that it must be okay unless proven otherwise. The "technophobes" here are in fact those who intuitively understand the issues. Information, once gathered, is almost impossible to take back; removing oneself from databases is incredibly difficult unless they are very small ones. Endless legislation goes back and forth on these points in the Other World.

I do not wish to go off on too much of a tangent here, and I am not saying that the ESC are trying to gather data to feed to the New World Order so that they may come and enslave us from helicopters black save for corporate logos. But the mass gathering of information in any way is potentially significant, and if people concerned about that do not feel that their concerns are being addressed they will take action, no matter how much bumpkins say there is no choice and people just need to get over it. A very clear statement about exactly what is gathered, what is stored and what it is used for needs to be put forward, with some advance warning, or else there will be mistrust, resulting in, say, the system being banned from several dozen sims at once.

Merely gathering data regarding items for sale may not in itself seem threatening, given that items are generally put out for sale to the general public. There are, though, I am sure, people who would rather their items were not publicly listed, who don't advertise in the traditional way. Perhaps they want to sell to a specific person, or to a select crowd. Perhaps they don't want the details to go into a database to be aggregated and used to produce market strategy presentations. Perhaps the idea of being entered against their will into the catalogue of somebody with whom they may disagree makes them feel icky. Perhaps they are insane. Who knows? Does it matter? This enterprise doesn't bother me terribly and I am not going to be banning any bots for the foreseeable future, but if others wish to, I cannot say that their wishes should be ignored. The choice, and appropriate information, need to be presented from before Day One.

Other commentary

11 Apr 2007 06:43
Tony Walsh (not verified)

Great article. I like the terminology you use for the bots. I have often used "homonculus" to describe my own avatar, but I think it's better-suited to a bot.

11 Apr 2007 06:44
Nicola Escher (not verified)

So thoroughly well-stated, I feel no need to write something up in my own journal. Kudos.

11 Apr 2007 08:57
Adri (not verified)

Well reasoned, well stated, and wonderfully free of sensationalism, Miss Malaprop. Thank you for this excellent report.

11 Apr 2007 09:32
Prokofy Neva (not verified)

Excellent write-up and very helpful to think through the issues. It's a bit boggling to me to think of how the homonculi can't *read*. I thought that by scanning shapes of words or something they did "read" after a fashion.

My take on this issue of objects-for-sale-in-themselves versus vendors-of-objects-in-them is that the Sheep don't really care about objects for sale. They have SLBoutique.com for selling objects. What they care about is getting a search going. Ultimately, they want to scrape all stuff on all parcels. They'll want to scrape the names of parcels, avatars, the events calendar, the land for sale list -- whatever is to be scanned.

Things for sale is just an easy place to start.

I'm glad you saw fit to say that they should have given a heads up about the big scrape. That would have gone very far towards ameliorating some concerns.

It does leave open the question about the ultimate USE of this data. There are no community checks and balances on a business that didn't see fit to consult with the community before it began scraping.

12 Apr 2007 20:43
Ace Albion (not verified)

For those in the retail business who welcome this, adapting their sales methods to include buyable boxes with clear and useful names or descriptions will be something they can do to take advantage.

I left my other thoughts around on Prokofy's blog and Clickable Culture, but I will say one further thing here:

Whatever assurances relating to policy and information usage that Electric Sheep give, and hold to, they are only relevant to what Electric Sheep do. Nothing is said, or expected, or scrutinised, of the other one, 10 or 1000 object harvesting operators with lower profiles and presumably lower morals.

12 Apr 2007 22:28
Ordinal Malaprop (not verified)

Well, yes, certainly if Electric Sheep give assurances they are only relevant to Sheep, though it is an opportunity to set an example for those doing similar things in the future. But as for saying and expecting things of others, I would say that that already occurs; there is considerable condemnation of the irresponsible users of landbots and campbots and other homunculi, when they are identified - not to imply a connection between that sort of criminal and this project of course, just to show that it isn't simply the profile which affects the response.

13 Apr 2007 19:43
Ace Albion (not verified)

The "condemnation" of landbots and campbots is nothing more than the wailing and gnashing of teeth. I don't mean that the objections are not valid, I mean that they have no effect whatever, and those who utilise these systems continue to use them and profit from them, with no deterrent from doing so other than having to grow a thicker skin.

Whatever the Electric Sheep choose to do, if there is a way for someone to exploit this technology for their own profit, regardless of the detriment of others (or even due to it) they will do so. Brace yourselves, is what I'm saying.

13 Apr 2007 19:55
Ordinal Malaprop (not verified)

I think that is more a sociological point, in that despite their impassioned complaints it is very hard to get a group larger than half a dozen people to actually _do_ anything effective about them on the Grid, though occasionally there is an unexpected mass movement.

One issue is that despite their apparent importance to those who read them, forums and journals and whatnot are actually fairly irrelevant to the day-to-day lives of most on the Grid, who will not know about homunculi and suchlike until they are bitten. The Copybot furore I think spread because of a transition from Forum to World, which is a lesson I believe for protestors and rabble-rousers.