I need an introduction. I don't have one. Deal with it. I'm going to describe the differences between the major internet-based file-sharing networks out there (as well as SpookShare and Freenet), because that's something I know about.
Napster was the first widely used filesharing network out there, so I'll start with that. Napster works by having users download a Napster client which runs on their computer. The client makes a single [TCP] connection to one of many servers run by the Napster company. Over that connection, it sends the server information about itself, such as the names of the files available for download, your connection speed, etc. The server takes that information and puts it in a database. When you type a search into your client, the search parameters are sent along your connection to the central server. That server then looks through its database and sends back the addresses of the matching files that others have posted (each address includes both the address of the host machine and the location of the file on that machine). In order to download the file, your client makes a separate connection directly to the machine hosting that file and transfers the file using Napster's own transfer protocol (not HTTP, as far as I can tell). Note the reliance on the central server for doing searches. That's a Bad Thing, because if there's a power outage or a cable cut or the RIAA doesn't like what you're doing, that central server will fail and nobody will be able to find each other's files anymore.
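To make the central-index idea concrete, here's a toy sketch in Python (my own illustration; the names are made up and the real Napster protocol is more involved) of the kind of database the server keeps: one big mapping from filenames to the hosts sharing them.

    # Toy sketch of a Napster-style central index. Names are made up;
    # the real Napster protocol is more involved than this.
    class CentralIndex:
        def __init__(self):
            self.files = []   # (filename, host, port) for every shared file

        def register(self, host, port, filenames):
            # A client announces the files it's sharing over its connection.
            for name in filenames:
                self.files.append((name, host, port))

        def search(self, query):
            # Return the address of every shared file matching the query.
            query = query.lower()
            return [(host, port, name)
                    for (name, host, port) in self.files
                    if query in name.lower()]

    index = CentralIndex()
    index.register("10.0.0.5", 6699, ["freebird.mp3", "stairway.mp3"])
    index.register("10.0.0.9", 6699, ["freebird.mp3"])
    print(index.search("freebird"))

Every search hits this one index, and every download then goes directly to the listed host. If the index machine goes away, so does everyone's ability to find anything.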
Some guy at Nullsoft thought he'd make a file sharing network that didn't rely on any central servers at all, and came up with Gnutella. Gnutella works without any central server whatsoever. When you start up your Gnutella client, it connects to a few other Gnutella clients. When you do a search, the search parameters are sent along all of those connections at once, with a TTL of something like 7 to 16. Every client that gets the search request looks through its collection of files and sends the addresses of any matches back to you, then decreases the TTL by one and sends the search to all of its neighbors, who do the same, until the TTL reaches 1. That avoids reliance on a central server, but there are serious scalability issues here. Say you send out a search to 3 neighbor nodes with a TTL of 10. They each forward the search to 3 or 4 other neighbor nodes, and so on. The search ends up being sent on the order of 3^10 (that's 59049) times. That uses up a lot of bandwidth. When you have more than maybe a hundred people doing this at once, it uses up enough bandwidth that you're not going to have much left over for anything else.
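If you want to check that arithmetic, here's a quick back-of-the-envelope calculation (my own sketch, nothing to do with any real Gnutella client). With a fanout of 3, the last hop alone is 3^10 = 59049 copies of the search, and the total over all ten hops is even bigger:

    # Rough count of how many copies of ONE search get sent when every
    # node forwards it to `fanout` fresh neighbors, decrementing the TTL
    # each hop. Ignores duplicate suppression, so real numbers are a bit
    # lower, but the exponential shape is the point.
    def flood_message_count(fanout, ttl):
        total = 0
        frontier = 1              # nodes holding the search at this hop
        for _ in range(ttl):
            frontier *= fanout    # each forwards to `fanout` neighbors
            total += frontier
        return total

    for ttl in (5, 7, 10):
        print(f"fanout=3, TTL={ttl}: ~{flood_message_count(3, ttl)} messages")
    # fanout=3, TTL=10 comes out around 88,000 messages. For one search.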
Something else I should mention about Gnutella is that it transfers files over HTTP, the same way your browser retrieves web pages. It also does this directly between the machine with the file and the machine that initiated the search (if its user chooses to download). This makes Gnutella not anonymous unless you only do searches and never download anything, because the host can tell exactly who is downloading a file.
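Since the transfer is plain HTTP, grabbing a file from a peer is no different from fetching a web page, and the peer sees your IP address the same way any web server sees its visitors. A minimal sketch (the peer address and path here are made up):

    # Plain HTTP fetch from a sharing peer; the address is hypothetical.
    # The peer can log our IP, which is why downloading on Gnutella is
    # not anonymous even though searching mostly is.
    import urllib.request

    url = "http://10.0.0.5:6346/get/42/freebird.mp3"  # made-up peer
    with urllib.request.urlopen(url) as response:
        data = response.read()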
The next generation of filesharing networks took an approach somewhere between having centralized servers and having everyone be at the same level. Networks like KaZaA have hosts that meet certain criteria, such as having sufficient bandwidth and CPU power, act as 'SuperNodes'. SuperNodes do a lot of the dirty work that every Gnutella client had to do. I'm not quite sure how KaZaA works, because it's closed source (bleck), but it works pretty much like Napster, except that there are a lot more servers (the SuperNodes), and they pass information between themselves. New SuperNodes pop in and out of the network all the time, and they're run by anyone who happens to have enough bandwidth and CPU time, not by a single company, making KaZaA almost as hard to shut down as Gnutella, but much more scalable.
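Since the protocol is closed, I can only guess at the details, but the two-tier idea itself is easy to sketch: ordinary clients register their files with a nearby SuperNode, and a SuperNode answers searches from its own index before passing the query along to other SuperNodes. Everything below is my assumption of the general shape, not KaZaA's actual protocol.

    # Guesswork sketch of a two-tier (SuperNode) search. Not KaZaA's
    # real protocol, just the general shape of it.
    class SuperNode:
        def __init__(self):
            self.index = []    # (filename, host) pairs from my clients
            self.peers = []    # other SuperNodes I exchange queries with

        def register(self, host, filenames):
            for name in filenames:
                self.index.append((name, host))

        def search(self, query, ttl=3):
            hits = [(n, h) for (n, h) in self.index if query in n.lower()]
            if ttl > 0:
                for peer in self.peers:   # pass the query sideways
                    hits += peer.search(query, ttl - 1)
            return hits

    a, b = SuperNode(), SuperNode()
    a.peers.append(b)
    a.register("10.0.0.5", ["freebird.mp3"])
    b.register("10.0.0.9", ["stairway.mp3"])
    print(a.search("stairway"))   # found via the neighboring SuperNode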
KaZaA, like Gnutella, uses plain old HTTP to transfer files. Unfortunately (or possibly fortunately, but I don't like it), KaZaA uses a completely proprietary protocol for organizing the network and doing searches, and their client is Windoze-only, so I'm up a creek if I'm running Linux. It also crashes a lot. I say having a proprietary protocol may be a good thing because then you don't have every Joe Schmoe and his brother writing a client. You did have that with Gnutella, and it seems like a lot of the clients just didn't work. I haven't been able to download anything off Gnutella for a long time. When I am able to, I usually get 99% of the file and then the download stops. On KaZaA, when I try to download a file, I almost always get it. However, the fact that I have to use their (Win-only) software to access the network really turns me off. I was fairly happy with the Gnutella software model (open protocol), but I didn't like the scalability problems.
I invented a little network called SpookShare. It is made up of a bunch of HTTP servers with a little CGI program running on them. All searches are done completely over HTTP. Running everything over HTTP does use up more bandwidth than it should, though, because each search requires a new TCP connection. It does, however, make for a very open and easy-to-implement protocol.
SpookShare works like this: Someone running an HTTP server (this could be you, as the best HTTP server (Apache) is free for Windows or Un*x) wants to share their files. Now that they're running the server, anyone who knows about those files can go get them, but if, say, the server is on a dial-up connection, it'll be hard to give people your address before it changes. That's where SpookShare comes in. As long as there are a few SpookShare nodes on permanent connections, the person who wants to share their files can post their addresses to one of those well-known nodes, where anyone can find out about them. If you do a search on SpookShare (go to a SpookShare server and put in a few words, just like you do with Google or Yahoo), the server will look through all the addresses of files that people have uploaded, like a KaZaA SuperNode. Searches are done depth-first, until the desired number of results is found, instead of everyone-send-this-to-all-your-friends, so as to avoid Gnutella-style bandwidth wastage. If the server doesn't find the desired number of files, it asks another SpookShare server, and so on, until the TTL runs out. The search then 'backs up' and tries again along a different path, as in the sketch below. The client I have written for SpookShare (SWSpookShare, or Spookware SpookShare) is written in Perl and requires a separate HTTP server. It can run on Windows or Un*x.
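Here's the depth-first scheme in miniature (a Python illustration of the idea described above, not the actual SWSpookShare code, which is Perl + CGI): each server checks its own index first, then asks one neighbor at a time, going deep, and only 'backs up' to try the next neighbor when a branch comes up short.

    # Sketch of SpookShare-style depth-first search with a TTL.
    class Node:
        def __init__(self, index):
            self.index = index      # (filename, url) pairs this node knows
            self.neighbors = []

        def search(self, query, wanted, ttl, seen=None):
            seen = seen if seen is not None else set()
            seen.add(id(self))
            hits = [(n, u) for (n, u) in self.index if query in n.lower()]
            if len(hits) >= wanted or ttl == 0:
                return hits[:wanted]
            # Depth-first: ask one neighbor at a time; if that branch
            # doesn't yield enough results, 'back up' and try the next.
            for nb in self.neighbors:
                if id(nb) not in seen:
                    hits += nb.search(query, wanted - len(hits), ttl - 1, seen)
                    if len(hits) >= wanted:
                        break
            return hits[:wanted]

    a = Node([("freebird.mp3", "http://10.0.0.5/freebird.mp3")])
    b = Node([("freebird_live.mp3", "http://10.0.0.9/live.mp3")])
    c = Node([])
    a.neighbors = [c]
    c.neighbors = [b]
    print(a.search("freebird", wanted=2, ttl=4))

Compare this to Gnutella: instead of every node blasting the search to all of its neighbors at once, at most one chain of servers is busy with your search at any moment.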
(If you would like to try SpookShare, run a permanent node, or help develop it (all would be greatly appreciated), check out http://swspookshare.sourceforge.net/.)
I should also mention Freenet. Freenet works completely differently than any of the above networks, because it is meant to do something different. Whereas the networks I mentioned before have a special way to distribute searches and leave the actual transfer of files up to conventional methods (such as HTTP), Freenet is all about moving files around. Freenet was designed as an information storage/retrieval system that would make it almost impossible to remove any specific piece of information, as long as it's popular. Freenet does not do searches. Instead, you must know the exact 'key' of a resource (a file) in order to download it. To retrieve the resource with that key, you ask a single neighbor about it. Requests are done depth-first, like SpookShare searches. If the file with that key is ever found (that doesn't always happen - if a piece of information is not requested enough, it will disappear), the file itself - not a message giving the address of the file - is sent back through the chain of nodes to whoever initiated the request. Every node that the file passed through will keep a copy of the file. This causes files to actually move closer to where they are popular. A side effect is that it's nearly impossible to know who initiated a request, making Freenet users, and those who post information on Freenet, almost completely anonymous. Also note that certain countries (China) are trying to block access to the Freenet website (so it must be good, right?).
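Here's the flavor of that request-and-cache behavior in a toy sketch (my simplification; real Freenet also routes by key closeness, encrypts everything, and evicts unpopular data from its caches): a request for an exact key walks node to node, and the file itself comes back along the chain, with every node on the way keeping a copy.

    # Toy sketch of Freenet-style retrieval by exact key. The file is
    # passed back through the chain of nodes, and each one caches it.
    class FreenetNode:
        def __init__(self, store=None):
            self.store = dict(store or {})   # key -> data held locally
            self.neighbors = []

        def request(self, key, ttl, seen=None):
            seen = seen if seen is not None else set()
            seen.add(id(self))
            if key in self.store:
                return self.store[key]
            if ttl == 0:
                return None
            for nb in self.neighbors:
                if id(nb) not in seen:
                    data = nb.request(key, ttl - 1, seen)
                    if data is not None:
                        self.store[key] = data   # cache on the way back
                        return data
            return None

    a, b = FreenetNode(), FreenetNode()
    c = FreenetNode({"KSK@report": b"the data"})
    a.neighbors = [b]
    b.neighbors = [c]
    print(a.request("KSK@report", ttl=5))
    print("KSK@report" in b.store)   # True: b kept a copy on the way back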
And that's all I have to say about that.
- T.O.G. of Spookware