readme for SWSpookShare-0.4.0.2

For newer stuff, check out swspookshare.sourceforge.net

Spookware Implementation of SpookShare

Overview

This is the Spookware implementation of SpookShare (SWSpookShare). At the time of this writing, it is the only implementation of the SpookShare protocol. I hope people will write their own implementations, because none of the ones I write seem to work. They're always full of bugs, or, if not bugs, things that seem like bugs unless you know exactly what you're doing. They are not at all user-friendly. I think this is mainly because of the limitations imposed by running as a CGI over an existing HTTP server.

Versioning

I don't know if there's a standard way to do version naming, but here's the system I use:

a.b.c.d

Version history

Requirements

Description

This is the first (hopefully not the last) implementation of the SpookShare protocol (which you can read about in the help file). It is written in Perl, so it should be fairly portable, except that the programs call each other assuming that they're running under a Un*x shell (using a shebang line; DOS users should be able to get around this by creating batch files).

Contacting the author

If you have any questions, suggestions, etc., feel free to contact T.O.G. of Spookware at chumps_53705.arm@yahoo.leg.com (amputate to send)

SWSS User's Guide

Initial setup

SWSS requires an HTTP server and Perl 5. You can get both for free, if you don't already have them. I recommend Apache (for Windows or Unix).

To get started, just dump the SpookShare directory in your cgi-bin (unzip SWSpookShare-x.x.x.zip directly into your cgi-bin, and it should make its own subdirectory). The 'node' is ss.cgi, which is what you tell people about if you're advertising yourself. admin.cgi is the program used for administration tasks, like advertising your node, cleaning out files, and retrieving data from elsewhere.

Windows users will have to do a little fiddling to get things to work. If you're using Apache, or any web server that supports shebang lines, edit the shebang lines in ss.cgi and admin.cgi so that they use your Perl path (#!c:/perl/bin/perl.exe, or whatever). If your server relies on file extensions, rename the files to use 'pl', or whatever, and then edit the lines in data/config.sssd that refer to ss.cgi and admin.cgi to match.

For Un*x users, make sure the HTTP server has write permission on the data directory, the junk directory, and everything they contain (this is VERY important).

Once you have all that in place, point your web browser at admin.cgi (or ss.cgi, which should give you a link 'Administration', up at the top). Once you're there, you should be able to follow the rest of this.

You must provide admin.cgi with a password when you use it so that people can't fiddle with your stuff. I find it all to be fairly self-explanatory, but that doesn't say much, because I wrote the thing. The first thing you should do, if your system clock is wrong by more than a minute or so, is edit the variable named nowoffset under 'General configuration'. This number is added to your system clock time whenever the current time is needed. Put in a number that makes the resulting time as close to UTC as you can get. (If your system clock is set correctly, you shouldn't have to deal with this.) If you have a very accurate clock, set the variables 'auth_now' and 'auth_now_sysclock' to 'OK'. 'General configuration' lets you change a bunch of simple variables that are used a lot. I forget what they all do, though (see list below).
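To make the nowoffset arithmetic concrete, here is a minimal sketch in Python (SWSS itself is written in Perl, and the function names below are made up for this example). It assumes nowoffset is a whole number of seconds, which seems likely since the 'now' property is counted in seconds, but check your own configuration.

```python
import time

def spookshare_now(nowoffset, system_time=None):
    """Time a node would use: system clock plus nowoffset (assumed seconds)."""
    if system_time is None:
        system_time = time.time()
    return int(system_time) + nowoffset

def suggest_nowoffset(true_utc, system_time):
    """Offset that makes the system clock agree with a trusted UTC reading."""
    return int(round(true_utc - system_time))

# Example: the system clock runs 90 seconds behind a trusted UTC source.
offset = suggest_nowoffset(true_utc=995500090, system_time=995500000)
print(offset)                             # 90
print(spookshare_now(offset, 995500000))  # 995500090
```

If the system clock runs fast instead, the suggested offset comes out negative.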

If you are going to share your files, you should then generate a list of your files. Go into 'make file list'. This should be pretty self-explanatory. You can add extra information to a file by creating an SSSD file in the same directory, with the same name as the file plus the extra extension '.meta.sssd'. This file should contain only 'set property=value' lines, and should not set size, expires, or any CS property. You only need to run makefilelist when you add or remove files. If you have LOTS of files (a few million or so), re-compiling the whole list each time might not be very efficient, but this should be OK for most people.
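As an illustration of the '.meta.sssd' convention, the following Python sketch builds the metadata file name and the 'set property=value' lines for a file. The helper names are hypothetical, and treating "any CS property" as "a name that is CS or starts with CS/" is my reading of the rule above, not something SWSS is known to check.

```python
RESERVED_NAMES = ("size", "expires")

def meta_path(file_path):
    """Name of the metadata file that sits next to file_path."""
    return file_path + ".meta.sssd"

def meta_lines(properties):
    """Turn a dict of properties into 'set property=value' lines."""
    lines = []
    for name, value in properties.items():
        # Assumption: checksum properties are named CS or CS/<scheme>.
        if name in RESERVED_NAMES or name == "CS" or name.startswith("CS/"):
            raise ValueError("%r must not be set in a meta file" % name)
        text = str(value)
        if "\n" in text or "=" in name:
            raise ValueError("each command must fit on one line")
        lines.append("set %s=%s" % (name, text))
    return lines

print(meta_path("music/tune.mp3"))
# music/tune.mp3.meta.sssd
print("\n".join(meta_lines({"description": "A spooky tune", "MP3_bitrate": 128})))
# set description=A spooky tune
# set MP3_bitrate=128
```

Writing the result to disk is then just a matter of saving those lines to the path returned by meta_path.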

Next, you should be able to advertise either your files or your node. If you have a lot of files indexed, it might be better to advertise your node, because advertising your files means uploading your entire file list to someone else's node. If you advertise your node, searches will get forwarded to your machine, and you won't need to upload much data while advertising.

Follow the link labeled 'Advertize'. Again, most of the default parameters should be okay. You must specify the URI of your node if you are advertising your node, or your server root if you are advertising only your files (you can use the address of your node for advertising your files, as long as their URIs start with a slash). You do not need to know your host address: if you use $host_me in place of your host address, the nodes you advertise to will replace it with your address.

You can also change the node(s) which you advertise to. If the 'pick servers to advertize...' checkbox is checked, the advertize program will pick servers out of your netdata file to advertise to (entries are added to your netdata file when you run getdata or when someone advertises to you). Otherwise, you must specify them in the boxes below.

Expires is the minimum amount of time (in seconds) that the data you are advertising is expected to stay accurate. If you are likely to be disconnected once in a while (if you're not on a permanent connection), set this to a relatively low value, like 120 or 300. That way, if you are suddenly disconnected and cannot re-post your data, it will soon expire, and people will stop seeing it and won't try to get your files. When your data expires, you'll need to re-advertise your node/files.

If you want this to be done automatically, choose 'write configuration file' from the drop-down list next to the submit button (this saves your settings without actually advertising your files). After you submit the form, start the 'advertize.pl' program in the ss directory from the command prompt or equivalent (perl advertize.pl; Windows users can double-click advertize.pl in Explorer). This runs the advertize program and tells it to re-advertise your data whenever it expires, until it can no longer reach any of the hosts you're advertising to (or until you stop it with kill or C-c or whatever). You'll probably want to run it in the background or in its own terminal/window, or pipe the output into a log file (like junk/advertize.log). If you don't run advertize.pl, you'll have to go back and re-advertise manually (by submitting the form) every few minutes, or whatever you set your expire time to.

Other stuff

Getdata: This program will get information about other SpookShare nodes and add it to your netdata file. By default, it is automatically called about 1 out of every 5 times a search is forwarded.

Clean expired data: Goes through a datafile and cleans out all expired data. This will help speed up searches. By default, it is automatically called about 1 out of every 5 times someone posts data to your node.

Advertize: Posts data about your node and/or the files indexed by makefilelist to other SpookShare nodes.

Make file list: Compiles a list of files on your web server into an SSSD file, which can be searched by your node, or uploaded to another node using advertize.

General Configuration: Sets general configuration parameters, described below; most of these do not need to be changed.

Configuration variables

SpookShare Protocol

SpookShare is a protocol used to advertise and search for resources on a network. It specifies a data format (SSSD) and a set of parameters for requesting such data. CGI/HTTP is the standard way to transfer data, and should be recognized by all nodes. SpookShare data is made up of 'resource descriptions', which contain a number of named variables, such as URI, size, type, bitrate, conspeed (connection speed of host), CS/scheme (checksum), or anything else you can think of. If this data is stored on a SpookShare node, people can search for it via the node's CGI. Search requests can also have a TTL (time to live) value. If the number of results desired is not found on the node, and the TTL is greater than 0, the node receiving the request may (but is not required to) forward the search to another node.
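The forwarding rule can be sketched in a few lines of Python. This is an illustration only: the function names are invented, and lowering the TTL by one on each forward is an assumption based on the TTL being described as the number of times a search may be forwarded.

```python
def should_forward(results_found, max_results, ttl, willing=True):
    """A node may forward only if it came up short and the TTL allows it."""
    return willing and ttl > 0 and results_found < max_results

def forwarded_ttl(ttl):
    """TTL to pass along with a forwarded search (assumed: one less)."""
    return ttl - 1

print(should_forward(results_found=3, max_results=10, ttl=2))   # True
print(should_forward(results_found=10, max_results=10, ttl=2))  # False
print(should_forward(results_found=3, max_results=10, ttl=0))   # False
print(forwarded_ttl(2))                                         # 1
```

The willing flag stands for the "may, but is not required to" part: a node can always decline to forward.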

Request Parameters

Following is a list of request parameters and the values to which they may be set. These requests are commonly done via CGI (action=list&output=html&...).

CGI parameters:

  • datafile=datafile - for list and putdata. Specifies which datafile to look in or post to. If not specified, servers will often look through multiple datafiles for list, or fall back to net for list and putdata. Sometimes that is good, sometimes bad, so specify or omit this variable accordingly. Specifying it will usually cause the search not to be forwarded. Datafile names do not necessarily correspond to actual filenames. Standard datafile names include:
  • output= - how you want the data to be presented. If the node cannot present the data in the specified format, it should give an error (HTTP 500) message. Standard formats are sssd and html.
  • TTL=somenumber - number of times you want a search to be forwarded if sp_maxresults results are not found in the node's own lists
  • data=sssd - data which you want posted to the node's datafile. Characters which have a special meaning to HTTP or CGIs, such as &, +, and space, must be escaped (% followed by their 2-digit hex value); spaces can usually be sent as + instead. Also, certain strings will be replaced by the receiving node:
  • now=seconds - what you want the remote host to use as current time; if left unspecified (as it should be most of the time), the remote host will use what it thinks to be the current time.
Search parameters:

  • sp_maxresults=somenumber - maximum number of resource descriptions you want sent
  • sp_regexp_someproperty=regularexpression - some property (or 'any' to match any) must match this regular expression
  • sp_words_someproperty=words - some property (or 'any' to match any) must contain these words (words is a space-separated list)
  • sp_min_someproperty=number - some property must be a minimum of this (for numerical properties such as conspeed or size)
  • sp_max_someproperty=number - someproperty must be a maximum of this (for numerical properties such as conspeed or size)
  • sp_eq_someproperty=string - value of someproperty must match this string exactly
  • sp_miscsearchflag - miscellaneous search parameters
  • sp_dont_expire= - set this (to something other than 0) if you don't want the server to automatically expire old data
  • sp_do_autoexpire= - set this if you want data missing either the 'now' or 'expires' property to be automatically expired

The 'sp_words_any' parameter will probably be the most commonly used, since it's easy for humans to use. The regular expressions, however, are the most flexible and powerful.
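To show how these parameters fit together in a request, here is a small Python sketch that builds a search URL and takes care of the escaping described above. The node address is a placeholder and build_search_url is a made-up helper; only the parameter names come from this document.

```python
from urllib.parse import urlencode

def build_search_url(node_uri, search, max_results=10, ttl=2, output="sssd"):
    """Build a 'list' request; search maps sp_* parameter names to values."""
    params = [
        ("action", "list"),
        ("output", output),
        ("TTL", ttl),
        ("sp_maxresults", max_results),
    ]
    params.extend(sorted(search.items()))
    # urlencode escapes &, + and friends as %XX and sends spaces as '+'.
    return node_uri + "?" + urlencode(params)

# Placeholder node address, for illustration only.
node = "http://example.org/cgi-bin/ss.cgi"
print(build_search_url(node, {"sp_words_any": "spooky music"}))
# http://example.org/cgi-bin/ss.cgi?action=list&output=sssd&TTL=2&sp_maxresults=10&sp_words_any=spooky+music
```

The same helper works for the other search parameters, for example {"sp_min_conspeed": 56, "sp_regexp_URI": r"\.mp3$"}.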

    Rules for servers (0.4)

    Data

    SpookShare data is passed in a format called 'SSSD', or 'Simple SpookShare Data'. SSSD consists of a series of one-line commands that describe servers, files on servers, SpookShare nodes, and other resources. It works by setting named properties with:

    A property that is no longer in use can be removed with:

    Once the value of each property is set, a resource description is created from the property values current at the time the command is used:

    Here are the standard properties:

    property name   standard values               what it means
    restype         SSserv                        SpookShare node
                    HTTPserv                      HTTP server
                    file                          file
                    web                           HTML-based web page
                    SSSD                          SSSD document
                    junk                          this RD is junk
                    message                       this RD is a message
    type            text/html, image/jpeg, etc.   MIME type
    URI_base                                      the URIs which follow are relative to this
    URI                                           URI of the resource
    size                                          for files, the number of bytes
    CS/scheme                                     checksum, where scheme is the type of checksum (this is not yet standardized - send suggestions)
    conspeed                                      speed of the host in kbps
    now                                           time at which this resource description was posted, in seconds since 1970
    expires                                       number of seconds after 'now' at which this resource description might no longer be accurate

    For properties specific to a particular kind of media (like MP3), the property name should have the form 'special_subspecial' (example: MP3_bitrate)
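The 'now' and 'expires' properties combine into a simple expiry test. The Python sketch below shows one way to write it; is_expired is a hypothetical helper, a resource description is represented as a plain dict of strings, and treating the exact moment now + expires as still valid is my assumption. The expire_incomplete flag mirrors what sp_do_autoexpire asks for.

```python
def is_expired(rd, current_time, expire_incomplete=False):
    """True if the resource description rd should no longer be trusted."""
    if "now" not in rd or "expires" not in rd:
        # Without both properties there is no expiry time to compute.
        return expire_incomplete
    return current_time > int(rd["now"]) + int(rd["expires"])

rd = {"URI": "http://www.yahoo.com/", "now": "995500000", "expires": "300"}
print(is_expired(rd, 995500200))   # False
print(is_expired(rd, 995500301))   # True
print(is_expired({"URI": "/x"}, 995500301, expire_incomplete=True))  # True
```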

    Here is an example of SSSD:

    null A
    set URI=http://www.yahoo.com/
    set description=Yahoo. It has a spiffy web directory.
    set conspeed=1024
    set now=995500000
    set expires=30000000
    create res
    

    (Such a large 'expires' value should not be used unless the resource is very reliable)
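For anyone writing their own implementation, here is a rough Python sketch of reading SSSD like the example above. It handles only the two commands that appear in this document, 'set property=value' and 'create res', and silently skips any other line; that skipping is a choice made for this sketch, not a rule from the protocol, and parse_sssd is a made-up name.

```python
def parse_sssd(text):
    """Return a list of resource descriptions (dicts) found in SSSD text."""
    current = {}      # property values in effect at this point in the data
    resources = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        command, _, rest = line.partition(" ")
        if command == "set" and "=" in rest:
            # Split on the first '=' only, so values may contain '='.
            name, _, value = rest.partition("=")
            current[name] = value
        elif command == "create" and rest.strip() == "res":
            # Snapshot the properties as they stand right now.
            resources.append(dict(current))
        # Any other command is ignored in this sketch.
    return resources

sample = """set URI=http://www.yahoo.com/
set conspeed=1024
set now=995500000
set expires=30000000
create res
"""
for rd in parse_sssd(sample):
    print(rd["URI"], rd["conspeed"])
# http://www.yahoo.com/ 1024
```

Because properties stay set until changed, a second 'create res' later in the same data would inherit everything that was not overwritten in between.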
