readme for SWSpookShare-0.4.0.2
For newer stuff, check out swspookshare.sourceforge.net
This is the Spookware implementation of SpookShare (SWSpookShare). It is, at the time of this writing, the only implementation of the SpookShare protocol so far. I hope people will write their own implementations, because none of the ones I write seem to work. They're always full of bugs, or, if not bugs, things that seem like bugs unless you know exactly what you're doing. They are very non-user-friendly. I think this is mainly because of the limitations imposed by running as a CGI over an existing HTTP server.
I don't know if there's a standard way to do version naming, but here's the system I use:
a.b.c.d
timeauth to auth_time
R/web to readme.page_
actions. They are now included in the file, and the files are named .html instead of .phtml
auth_time to auth_now
createR to create res
This is the first (hopefully not the last) implementation of the SpookShare protocol (which you can read about in the help file). It is written in Perl, so it should be fairly portable, except that the programs call each other assuming that they're running under a Un*x shell (using a shebang; DOS users should be able to get around this by creating batch files).
If you have any questions, suggestions, etc., feel free to contact T.O.G. of Spookware at chumps_53705.arm@yahoo.leg.com (amputate to send).
SWSS requires an HTTP server and perl5. You can get both of these for free if you don't already have them. I recommend Apache (for Windows or Unix).
To get started, just dump the SpookShare directory in your cgi-bin (unzip SWSpookShare-x.x.x.zip directly into your cgi-bin, and it should make its own subdirectory). The 'node' is ss.cgi, which is what you tell people about if you're advertising yourself. admin.cgi is the program used for administration tasks, like advertising your node, cleaning out files, and retrieving data from elsewhere.
Windows users will have to do a little fiddling to get things to work. If you're using Apache, or any web server that supports shebang lines, edit the shebang lines in ss.cgi and admin.cgi so that they use your perl path (#!c:/perl/bin/perl.exe, or whatever). If your server relies on file extensions, then change them to 'pl', or whatever, and then edit the lines in data/config.sssd that refer to ss.cgi and admin.cgi and change them, too.
For Un*x users, make sure the HTTP server has write permission on the data directory, the junk directory, and everything they contain (this is VERY important).
Once you have all that in place, point your web browser at admin.cgi (or ss.cgi, which should give you a link 'Administration', up at the top). Once you're there, you should be able to follow the rest of this.
You must provide admin.cgi with a password when you use it so that people can't fiddle with your stuff. I find it all to be fairly self-explanatory, but that doesn't say much, because I wrote the thing. The first thing you should do, if your system clock is wrong by more than a minute or so, is edit the variable named nowoffset under 'General configuration'. This number will be added to your system clock time whenever the current time is needed. Put a number in here that will make the resulting time as close as you can get to UTC. (If your system clock is set correctly, you shouldn't have to deal with this.) If you have a clock that is very accurate, set the variables 'auth_now' and 'auth_now_sysclock' to 'OK'. 'General configuration' lets you change a bunch of simple variables that are used a lot. I forgot what they all do, though (see list below).
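As a rough illustration of the nowoffset arithmetic described above (not the actual SWSS code, which is written in Perl), the adjusted time is simply the system clock plus the offset. The function name node_time is made up for this sketch.

```python
import time

def node_time(nowoffset=0, system_time=None):
    """Return the adjusted current time in seconds since 1970.

    nowoffset is added to the system clock, as the readme describes.
    system_time can be passed in for testing; it defaults to the real clock.
    """
    if system_time is None:
        system_time = time.time()
    return int(system_time) + int(nowoffset)

# A system clock that runs 90 seconds slow would need nowoffset=90.
print(node_time(nowoffset=90, system_time=995500000))  # 995500090
```

A negative nowoffset works the same way for a clock that runs fast.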
If you are going to share your files, you should then generate a list of your files. Go into 'make file list'. This should be pretty self-explanatory. You can add extra information to your files by creating an SSSD file with the same name as a file, except with the extra extension '.meta.sssd', in the same directory as the file. This file should only have 'set property=value' lines, and should not set size, expires, or any CS property. You only need to run makefilelist when you add or remove files. If you have LOTS of files (a few million or so), then it might not be so efficient to re-compile the whole list each time, but this should be OK for most people.
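The rules for a '.meta.sssd' file can be checked mechanically. The following is only a sketch of such a check, not part of SWSS: the function name check_meta_sssd is hypothetical, and skipping blank lines is an assumption, since the readme does not say whether they are allowed.

```python
FORBIDDEN = ("size", "expires")

def check_meta_sssd(text):
    """Return a list of problems found in the text of a .meta.sssd file."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are harmless
        if not line.startswith("set ") or "=" not in line:
            problems.append(f"line {lineno}: not a 'set property=value' line")
            continue
        name = line[4:].split("=", 1)[0].strip()
        # size, expires, and CS/... properties should not be set by hand
        if name in FORBIDDEN or name == "CS" or name.startswith("CS/"):
            problems.append(f"line {lineno}: '{name}' should not be set here")
    return problems

example = "set description=Holiday photo\nset size=1234\ncreate res\n"
for problem in check_meta_sssd(example):
    print(problem)
```

Here the second line is rejected because it sets size, and the third because it is not a 'set' line.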
Next, you should be able to advertise either your files or your node. If you have a lot of files indexed, it might be better to advertise your node, because if you advertise your files, you have to upload your entire list of files to someone else's node. If you advertise your node, searches will get forwarded to your machine, and you won't need to upload much data while advertising.
Follow the link labeled 'Advertize'. Again, most of the default parameters should be okay. You must specify the URI of your node if you are advertising your node, or your server root if you are advertising only your files (you can use the address of your node for advertising your files, as long as their URIs start with a slash). You do not need to know your host address. If you use $host_me in place of your host address, the nodes you advertise to will replace it with your address. You can also change the node(s) which you advertise to. If the 'pick servers to advertize...' checkbox is checked, the advertize program will pick servers out of your netdata file to advertise to (stuff is added to your netdata file when you run getdata or someone advertises to you). Otherwise, you must specify them in the boxes below.
Expires is the minimum amount of time (in seconds) that the data you are advertising is expected to be accurate. If you are subject to being disconnected once in a while (if you're not on a permanent connection), this should be set to a relatively low value, like 120 or 300. That way, if you are suddenly disconnected and cannot re-post your data, it will soon expire, and people will no longer see it and try to get your files.
When your data expires, you'll need to re-advertise your node/files. If you want this to be done automatically, choose 'write configuration file' from the drop-down list next to the submit button (this will cause your settings to be saved without actually advertising your files), and after you submit the form, start the 'advertize.pl' program in the ss directory from the command prompt or equivalent (perl advertize.pl; Windows users can double-click advertize.pl in Explorer). This will run the advertize program and tell it to re-advertise your data when it expires, until it can no longer reach any of the hosts you're advertising to (or until you stop it with kill or C-c or whatever). You'll probably want to run it in the background or in its own terminal/window, or pipe the output into a log file (like junk/advertize.log). If you don't run advertize.pl, you'll have to go back and re-advertise your stuff manually (by submitting the form) every few minutes (or whatever you set your expire time to).
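To make the expiry arithmetic concrete, here is a small sketch. It assumes that data counts as expired once 'expires' seconds have passed since its 'now' timestamp; the exact boundary (whether the last second still counts) is a guess, and the function name is invented for the example.

```python
def is_expired(now, expires, current_time):
    """True once `expires` seconds have passed since the data was posted.

    now          - time the data was posted (seconds since 1970)
    expires      - number of seconds the data is expected to stay accurate
    current_time - the time to check against (seconds since 1970)
    """
    return current_time >= now + expires

posted = 995500000
print(is_expired(posted, 300, posted + 120))  # False: still within 300 seconds
print(is_expired(posted, 300, posted + 300))  # True: needs to be re-advertised
```

With a low value such as 120 or 300, a node that drops off the network disappears from search results within a few minutes.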
Getdata: This program will get information about other SpookShare nodes and add it to your netdata file. By default, it is automatically called about 1 out of every 5 times a search is forwarded.
Clean expired data: Goes through a datafile and cleans out all expired data. This will help speed up searches. By default, it is automatically called about 1 out of every 5 times someone posts data to your node.
Advertize: Posts data about your node and/or the files indexed by makefilelist to other SpookShare nodes.
Make file list: Compiles a list of files on your web server into an SSSD file, which can be searched by your node, or uploaded to another node using advertize.
General Configuration: Set general configuration parameters, described below; most of these do not need to be changed.
datafile_net - location of your network data file
datafile_new_net - location where new network data is written (can be the same as datafile_net)
datafile_messages - location of your message file
datafile_new_messages - location where new messages are written (can be the same as datafile_messages)
datafile_myfiles - where the list of your files is kept
canflock - whether or not your system can do flock - should be turned on if possible (put 'on' in there)
auth_now - do you know what time it is?
auth_now_sysclock - does your sysclock know what time it is?
max_connections - number of SpookShare nodes to forward a search to if TTL>0
myname_ss - location of your ss.cgi
myname_admin - location of your admin.cgi
MOTD - message of the day
max_TTL - any TTL greater than this will be reduced to this
nowoffset - number of seconds to add to the time given by your system clock
adminpassword - password required to use the Administration program
default_maxresults - maxresults to use if not specified
max_maxresults - any maxresults higher than this will be lowered to this unless a specific datafile (no forwarding) is specified
postperclean - inverse of chance that network datafile will get cleaned each time it is given data. Set to 0 for never.
listperget - inverse of chance that getdata will be run each time a search is forwarded. Set to 0 for never.
SpookShare is a protocol used to advertise and search for resources on a network. It specifies a data format (SSSD), and a set of parameters for requesting such data. CGI/HTTP is the standard way to transfer data, and should be recognized by all nodes. SpookShare data is made up of 'resource descriptions', which contain a number of named variables, such as URI, size, type, bitrate, conspeed (connection speed of host), CS/scheme (checksum) or anything else you can think of. If this data is stored on a SpookShare node, people can search for it via the SpookShare node's CGI. Search requests can also have a TTL (time to live) value. If the number of results desired is not found on the node, and the TTL is greater than 0, the node receiving the request may (but is not required to) forward the search to another node.
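The forwarding rule can be sketched as follows. The condition comes straight from the paragraph above, and the clamp comes from the max_TTL setting; decrementing the TTL by one on each hop is an assumption (the protocol text only says that TTL is the number of times a search may be forwarded), and both function names are hypothetical.

```python
def should_forward(results_found, maxresults, ttl):
    """A node may forward a search if it found too few results and TTL > 0."""
    return results_found < maxresults and ttl > 0

def forwarded_ttl(ttl, max_ttl):
    """TTL to send along with a forwarded search.

    The incoming TTL is first reduced to max_ttl if it is larger,
    then decremented by one for this hop (assumption), never below 0.
    """
    return max(min(ttl, max_ttl) - 1, 0)

print(should_forward(results_found=3, maxresults=10, ttl=2))  # True
print(forwarded_ttl(ttl=5, max_ttl=3))                        # 2
```

Since forwarding is optional, a real node could also decline for other reasons, such as load.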
Following is a list of request parameters and the values to which they may be set. These requests are commonly done via CGI (action=list&output=html&...).
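For example, a client might assemble such a request like this. This is only a sketch: the node address is a made-up placeholder, and build_request_url is not part of SWSS. Python's urlencode also takes care of percent-escaping special characters in the values.

```python
from urllib.parse import urlencode

def build_request_url(node_uri, **params):
    """Append CGI parameters to a node's URI as a query string."""
    return node_uri + "?" + urlencode(params)

# Hypothetical node address, for illustration only.
url = build_request_url("http://example.com/cgi-bin/ss/ss.cgi",
                        action="list", output="html")
print(url)  # http://example.com/cgi-bin/ss/ss.cgi?action=list&output=html
```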
CGI parameters:
action=actionname - tells what you want the remote host to do, options are:
  page_pagename - give me the page called pagename (this is to help a human interface with it); standard pages include:
    index - an index page with links to other pages
    search - a page with a search form
    help - a page with instructions for end-users
  list_now - give the current time, and some other miscellaneous information:
    auth_now - authority with respect to time - empty string for bad, otherwise, good
    now - current time
    spookshare_version - version of SpookShare protocol being used
    address_you - host address of requester
  list - lists all data matching search parameters
datafile=datafile - for list and putdata. Specifies which datafile to look in or post to. If not specified, servers will often look through multiple data files for list, or revert to net for list and putdata. Sometimes that is good, sometimes bad. Specify or don't specify this variable accordingly. Specifying this will usually cause the search to not be forwarded. datafile names do not necessarily correlate to an actual filename. Standard datafile names include:
  net - network data
  myfiles - files stored on the same machine as the node
  messages - message board/guestbook kind of file, whose contents are usually expired in a different way (or not at all)
output= - how you want the data to be presented. If the node cannot present the specified data in the specified format, it should give an error (HTTP 500) message. Standard formats are: sssd and html
TTL=somenumber - number of times you want a search to be forwarded if maxresults are not found in your lists
data=sssd - data which you want to be posted in their datafile - characters which have a special meaning to HTTP or CGIs, such as &, +, and space, will have to be escaped (% followed by their 2-digit hex value) (spaces can usually be sent as +, also). Also, certain strings will be replaced by the receiving node:
  $now with the current time
  $host_me with the requester's host address
now=seconds - what you want the remote host to use as current time; if left unspecified (as it should be most of the time), the remote host will use what it thinks to be the current time.
sp_maxresults=somenumber - maximum number of resource descriptions you want sent
sp_regexp_someproperty=regularexpression - some property (or 'any' to match any) must match this regular expression
sp_words_someproperty=words - some property (or 'any' to match any) must contain these words (words is a space-separated list)
sp_min_someproperty=number - some property must be a minimum of this (for numerical properties such as conspeed or size)
sp_max_someproperty=number - some property must be a maximum of this (for numerical properties such as conspeed or size)
sp_eq_someproperty=string - value of someproperty must match this string exactly
sp_miscsearchflag - miscellaneous search parameters
  sp_dont_expire= - set this (to something other than 0) if you don't want the server to automatically expire old data
  sp_do_autoexpire= - set this if you want data missing either 'now' or 'expires' properties to be automatically expired
The 'sp_words_any' parameter will probably be the most commonly used, since it's easy for humans to use. The regular expressions, however, are the most flexible and powerful.
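One way a node might apply the per-property search parameters to a single resource description is sketched below. This is not the SWSS implementation. Several details are assumptions the protocol text leaves open: word matching is case-insensitive, all words must appear in the same property value, 'any' is only honoured for regexp and words, and a resource lacking the named property does not match.

```python
import re

def _test(kind, value, wanted):
    """Check one property value against one search parameter."""
    if kind == "regexp":
        return re.search(wanted, value) is not None
    if kind == "words":
        return all(word.lower() in value.lower() for word in wanted.split())
    if kind == "eq":
        return value == wanted
    try:  # "min" and "max" compare numerically
        number, limit = float(value), float(wanted)
    except ValueError:
        return False
    return number >= limit if kind == "min" else number <= limit

def matches(rd, params):
    """True if resource description `rd` (a dict) satisfies every sp_* test."""
    for key, wanted in params.items():
        for kind in ("regexp", "words", "min", "max", "eq"):
            prefix = f"sp_{kind}_"
            if key.startswith(prefix):
                prop = key[len(prefix):]
                break
        else:
            continue  # e.g. sp_maxresults or TTL: not a per-property test
        if prop == "any" and kind in ("regexp", "words"):
            values = list(rd.values())
        else:
            values = [rd[prop]] if prop in rd else []
        if not any(_test(kind, value, wanted) for value in values):
            return False
    return True

rd = {"URI": "http://www.yahoo.com/", "conspeed": "1024",
      "description": "Yahoo. It has a spiffy web directory."}
print(matches(rd, {"sp_words_any": "spiffy yahoo"}))  # True
print(matches(rd, {"sp_min_conspeed": "2000"}))       # False
```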
When data is posted, the receiving node replaces $now with the current time, and $host_me with the remote IP or domain.
SpookShare data is passed in a format called 'SSSD', or 'Simple SpookShare Data'. SSSD consists of a series of one-line commands, which describe servers, files on servers, SpookShare nodes, and other resources. It works by setting named properties with:
set nameofproperty=value
Properties that are no longer in use can be removed with:
null A - nullify all properties
null F - nullify all properties normally associated with a file, but not with a server (specifically: all properties except conspeed, now, expires, and URI_base)
null E space separated list - nullify all properties except those named in the list
Once the value of each property is set, a resource description is created from all the current property values when this command is used:
create res
Here are the standard properties:
property name | standard values | meaning |
---|---|---|
restype | SSserv (SpookShare node), HTTPserv (HTTP server), file, web (HTML-based web page), SSSD (SSSD document), junk (this RD is junk), message (this RD is a message) | resource type |
type | text/html, image/jpeg, etc. | MIME type |
URI_base | | URIs that follow are relative to this |
URI | | URI of the resource |
size | | for files, number of bytes |
CS/scheme | | checksum, where scheme is the type of checksum (this is not yet standardized; send suggestions) |
conspeed | | speed of the host in kbps |
now | | time at which this resource description was posted, in seconds since 1970 |
expires | | number of seconds after 'now' at which this resource description might no longer be accurate |
For property names specific to a specialized medium (like MP3), the property name should be 'special_subspecial' (example: MP3_bitrate).
Here is an example of SSSD:
null A
set URI=http://www.yahoo.com/
set description=Yahoo. It has a spiffy web directory.
set conspeed=1024
set now=995500000
set expires=30000000
create res
(Such a large 'expires' should not be used unless the resource is very reliable.)
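A minimal SSSD reader can be sketched in a few lines. This is not the SWSS parser: it assumes one command per line, that property values persist across 'create res' until they are nullified (which is how the description of the null commands reads), and that unrecognized lines are ignored. The names parse_sssd and SERVER_PROPERTIES are invented for the example, and example.com is a placeholder.

```python
# Properties that 'null F' keeps, per the description of that command.
SERVER_PROPERTIES = {"conspeed", "now", "expires", "URI_base"}

def parse_sssd(text):
    """Parse SSSD text into a list of resource descriptions (dicts)."""
    current = {}
    resources = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("set ") and "=" in line:
            name, value = line[4:].split("=", 1)
            current[name] = value
        elif line == "null A":
            current = {}
        elif line == "null F":
            current = {k: v for k, v in current.items() if k in SERVER_PROPERTIES}
        elif line.startswith("null E"):
            keep = set(line[len("null E"):].split())
            current = {k: v for k, v in current.items() if k in keep}
        elif line == "create res":
            resources.append(dict(current))  # snapshot of current values
        # anything else is silently ignored in this sketch
    return resources

sample = """null A
set URI_base=http://example.com/
set conspeed=1024
set URI=a.txt
set size=10
create res
null F
set URI=b.txt
create res
"""
for rd in parse_sssd(sample):
    print(rd)
```

The second resource description keeps URI_base and conspeed from the first but not its size, because 'null F' clears the file-specific properties.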