PRELIMINARY Version 2.0, Wednesday, September 15, 1999
To be submitted to: EDMEDIA2000, Montreal, June, 2000.

Distributed Repository of Programming Examples

Tomasz Müldner
Jodrey School of Computer Science, Acadia University, Wolfville, N.S. B0P 1X0 Canada
email: tomasz.muldner@acadiau.ca

Vicky Shiv
Jodrey School of Computer Science, Acadia University, Wolfville, N.S. B0P 1X0 Canada

Abstract: We present a description of a fully distributed system that can be used to distribute data among any number of authenticated users. The name server is used only for authentication; otherwise the load is evenly distributed. The system has been used to create a distributed repository of programming examples.
With the increased availability of the Internet, personal computers and fast networks, many research efforts are directed at network applications and, more generally, at distributed software. Another reason for the interest in this research is the growing number of universities that have built fully computerized campuses. For example, Acadia University, see [MacDougall et al. 98], has been building such a campus; it provides fast network connections for all students living on campus, as well as electronic classrooms (settings in which every student and the teacher have access to a personal computer, and all the computers are networked, allowing for various types of computer-supported interaction, see [Shneiderman et al 95]). For another example of a computerized campus, see [Holmes 96].

In this paper we tackle the issue of distributing data among a number of networked computers. We consider both connected and partially connected computers (the latter are computers that are not permanently connected and can be reconnected in various networks). This arrangement is typical for computerized campuses, but it also applies to mobile users who travel with their laptops. There are many applications for which one would wish to distribute information, but here we concentrate on a specific case that supports an example-driven learning process: a distributed repository of programming examples, for instance examples of programs in C or Java. Examples are useful if they can be easily browsed and searched, and if various users can share them. They are particularly useful when learning programming, because one always learns from examples of small programs.

This paper is organized as follows. In the first section, we provide basic concepts of distributed information systems and discuss various models. The second section presents our system, the Distributed Repository of User-Classified dOcuments, DRUCO; the final section describes a case study, a repository of programming examples in C.
1. Distributed Information Systems

A distributed information system can be defined as a system in which documents containing information are distributed across multiple machines connected by a network. Data (or documents) are therefore accessible as a shared resource, see [Booth 81]. Such systems are useful because the collective storage of multiple computers provides a more powerful system. Additionally, when resources are duplicated, the failure of one component does not necessarily imply losing the entire set of data. Thus, distributed systems provide parallelism and fault tolerance, making them potentially much more powerful than their individual components, see [Mullender 89]. In the context of computerized campuses and the wide accessibility of the Internet, a distributed information system has an additional important task to fulfill, namely to support information sharing. Below, we review various types of information sharing systems.

The most basic requirement is the ability to exchange and share information. From now on, we assume that information is stored in a document, which serves as a persistent medium for this information (and therefore we use the term "document" interchangeably with the term "information"). A document is not necessarily the same thing as a file; it could be a file, a number of files logically grouped together, or an entry in a database. (Note that this definition of a distributed system is more general than that of a distributed file system, such as Coda, see [ Satyanarayanan, M. 90].)

There are three basic considerations when it comes to sharing documents using an information system: operations on the system, security of the system, and the interface to the system. For a system to be useful, it has to provide various operations such as search, browse, compare, export to and import from the file system, and others.
Some systems support only homogeneous information, that is, they force all users to use the same type of documents. Other systems may support more heterogeneous documents as long as they conform to a certain standard. For example, XXX uses ontologies to translate one type of information into another.

The system should be secure: each user should be able to specify, for every document, which users may access it (for example, to view, copy, traverse, etc.). This functionality can be provided by associating with each document a list of users and their passwords; a user then has to be authenticated before obtaining any information. As an alternative, digital signatures can be used, see [Greenleaf G. 97].

The interface to the system determines which of two possible roles the user has to play to obtain documents that belong to other users. These roles roughly correspond to pulling information and pushing information. If a system supports only pulling information, then the user has to play an active role in obtaining the required information. A notification system can be added to notify a user, for example, when new information becomes available (addition notification) or when existing information is changed (change notification). An information push system tries to push the required information to interested users. These systems may be profile-based, or context- and situation-based; see [Van de Velde at al. 97]. An alternative is to consider active, autonomous providers, such as software agents, that use planning to search for the required information; see [Huhns 98]. A system of intelligent agents, such as a Cooperative Information System, CIS, see [Papazoglou 92], can actively reason about the information it contains and search for further information by connecting to other systems. Now, we elaborate on the pull and push models.
To describe the task of exchanging and sharing information in a pull model, we consider two kinds of applications, here called providers and fetchers. A provider application gives authorized users access to available information; a fetcher application is designed to fetch or browse information from one or more providers. Typically, a provider is implemented as a server and a fetcher as a client; for more information about clients and servers, see [Mullender 89]. A single application may be both a provider (server) and a fetcher (client) at the same time.

For an application A to communicate with another application B, A has to be able to locate B using some kind of naming system. A standard convention on the Internet is to use Uniform Resource Locators, URLs, see XXX. When A provides B's URL, the Domain Name System, DNS, finds B's IP address, and A can then use this address to communicate with B. Unfortunately, this technique does not work if B is currently off-line, or if B uses an Internet Service Provider, ISP, that assigns dynamic IP addresses. Therefore, we often assume that there exists an additional name server residing at a "well-known" static IP address. When application B goes on-line, it connects to the name server, which at that time retrieves B's current IP address and associates it with the application's name. When application A wants to connect to application B, it does so through the name server, which guarantees that B is on-line and that its current IP address is known. (Additionally, the name server can be used to store user names and passwords, and to authenticate users.)

The best known example of the above model is that of a Web server (provider) and a Web browser (fetcher). The user of a browser pulls information provided by the Web server. Recently, Internet Explorer introduced channels, which are lists of web sites that are regularly (with the required frequency) pushed onto the user's desktop.
However, this is not a real push model; instead, it is a "scheduled" pull.

As mentioned above, the push model is best implemented using mobile agents (for a description of mobile agents, see [Wong D. 99]). A user interested in an operation, such as search or compare, may create an agent that moves to various sites and performs this operation on her or his behalf (the agent may perform some action on that site, if allowed, or it may collect the information and carry it to its home site). While it is easy to see how to specify the required information, it is more difficult to decide where the agent should go to perform the required task. There are several possible solutions. First, a solution similar to the one described above, i.e. a central naming server, can be used. The server knows about all kinds of information and the corresponding sites. Of course, instead of a central server, a distributed naming service such as XXX can be used. The problem is that all sites associated with a single type of information have to be homogeneous; for example, the information about technical reports has to be provided in a consistent manner. It would be best to have a generally accepted standard for specifying data, and for this reason we briefly describe XML below.

The eXtensible Markup Language, XML, see [David S. 99], is a meta-language that allows the user to create a specialized language defining the structure of a document. XML has been used in a variety of applications, for example to define Internet Explorer's channels, see [Microsoft]. The main strength of XML is that it provides a domain-specific standard, and XML applications support operations such as search, compare and merge. For each XML-generated language there is a Document Type Definition, DTD, which is basically a grammar for this language. Therefore, an XML document can be verified against its DTD.
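As a minimal sketch of what verifying a document against its DTD can look like, the following Java fragment uses the standard JAXP parser (javax.xml.parsers) with validation switched on. The element names example, title and code, and the class name DtdCheckSketch, are hypothetical and chosen only for illustration; they are not taken from DRUCO or from any published DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical sketch: the DTD below is invented for illustration only.
class DtdCheckSketch {
    // A tiny DTD, given as an internal subset: an example has a title followed by code.
    static final String DTD =
          "<!DOCTYPE example [\n"
        + "  <!ELEMENT example (title, code)>\n"
        + "  <!ELEMENT title (#PCDATA)>\n"
        + "  <!ELEMENT code (#PCDATA)>\n"
        + "]>\n";

    // Returns true if the given document body is well-formed and conforms to DTD.
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // request DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validity errors are not fatal by default; rethrow them so they fail the check.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Conforms to the grammar: title followed by code.
        System.out.println(isValid("<example><title>Hello</title><code>int main() {}</code></example>"));
        // Violates the grammar: the required title element is missing.
        System.out.println(isValid("<example><code>no title</code></example>"));
    }
}
```

With this DTD, the first document should be accepted and the second rejected, because the grammar requires a title element before code.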
In addition, an XML document can be easily converted to a different XML format, if the conversion is defined by another DTD. XML is accompanied by the eXtensible Stylesheet Language, XSL, and the eXtensible Link Language, XLL. XSL provides formatting instructions for displaying an XML document, while XLL supports links that generalize HTML links, in order to ease link maintenance and to allow links to be created for read-only media. Together, XML, XSL and XLL provide three independent ways to define a document's structure, format and links.

In the absence of XML, one may want to use an information system in which the user explicitly provides her or his own structure of the document. In the following section we describe a specific example of a general distributed information system based on user-defined classifications, called the Distributed Repository of User-Classified dOcuments, DRUCO.
2. Distributed Repository of User-Classified Documents

Here we consider a system of documents in which every user can define her or his own classifications. Each document may be stored, or classified, within one or more classifications, and classifications may be nested. Classifications thus resemble the folders, or directories, of a classical file system, and documents resemble files. The entire system can be seen as a tree; leaves, however, may have more than one parent (classification). The user is able to make a part of her or his system available to other users, and to pull (download) a classification from another user (this classification corresponds to a node of the tree, and the entire subtree rooted at this node may be pulled). Permissions may be set so that only selected users have access to some documents. In what follows, we refer to this specific information system as a distributed repository of user-defined classifications, or briefly as a repository.

DRUCO is a system designed to satisfy the above requirements. In this system, each document is represented by a single unit that resembles a file. DRUCO documents are stored separately from the file system, but they can be easily imported from, and exported to, that system. Now, we briefly describe the design and functionality of DRUCO. It is a program that can operate both in off-line mode and in on-line mode, and in the latter mode as a server or as a client. All the operations that can be performed in off-line mode can also be performed in on-line mode. Off-line operations facilitate organizing a repository, that is, creating new classifications, modifying and deleting existing classifications, importing and exporting, and browsing and viewing documents; see Figure 1.

http://evilqueen.acadiau.ca/montreal2000/figure-1.jpg
(Figure 1: DRUCO in off-line mode)

In Figure 1 the leftmost pane shows the current state of the repository. Folder icons indicate whether or not they are currently folded.
The middle pane shows the current state of the DownloadSpace; the outermost folder (Shiv) is a placeholder for all the documents that have been pulled. The rightmost pane can be used to show the contents of the selected file. The user can use menus, the right mouse button, or drag and drop.

Also in off-line mode, the user may specify "permissions", that is, give or revoke the right to access documents. Here, a classification is the smallest unit: the user can grant a permission to one or more users for a specific classification, and these users can then view and possibly pull all documents in this classification. However, whether or not they are able to view the contents of any nested classifications depends on their permissions for those classifications. (Below, we describe how the user can get the list of all registered users.) To manage permissions, the user selects the SecurityManagement pane, see Figure 2.

http://evilqueen.acadiau.ca/montreal2000/figure-2.jpg
(Figure 2: DRUCO in off-line mode)

The UserList pane shows the list of registered users (see Figure 2); the user can select folders and documents in the leftmost pane, and use buttons to assign and revoke permissions.

To facilitate the location of other users, especially in the presence of users with dynamic IP addresses, a central name server has been incorporated into DRUCO. An administrator of this name service manages users, that is, registers them, assigns passwords, etc. The name server runs on a computer with a "well-known" static IP address, which is provided in the configuration file read when the user's application starts. The interface to the name server is shown in Figure 3.

http://evilqueen.acadiau.ca/montreal2000/figure-3.jpg
(Figure 3: DRUCO's name server)

The administrator manages the name server, that is, adds and removes users and sets their passwords, see Figure 4.

http://evilqueen.acadiau.ca/montreal2000/figure-4.jpg
(Figure 4: Administrator's name server)
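The classification tree and the per-classification permission rule described in this section can be sketched in Java as follows. This is only an illustration under stated assumptions: the class Classification and its methods are hypothetical names, not DRUCO's actual implementation, and the sketch assumes that a nested classification is reachable only through a parent the user is permitted to see.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a user-defined classification; not DRUCO's real API.
class Classification {
    final String name;
    final List<Classification> children = new ArrayList<>();
    // Document names; the same document may be listed in several classifications.
    final List<String> documents = new ArrayList<>();
    // Users who have been granted access to this classification.
    private final Set<String> permittedUsers = new HashSet<>();

    Classification(String name) { this.name = name; }

    // Creates a nested classification and returns it.
    Classification addChild(String childName) {
        Classification child = new Classification(childName);
        children.add(child);
        return child;
    }

    void grant(String user)  { permittedUsers.add(user); }
    void revoke(String user) { permittedUsers.remove(user); }
    boolean canAccess(String user) { return permittedUsers.contains(user); }

    // Collects the documents the user may view or pull from this subtree.
    // A permission covers only the documents of one classification; each
    // nested classification is checked against its own permissions.
    List<String> visibleDocuments(String user) {
        List<String> result = new ArrayList<>();
        if (!canAccess(user)) {
            return result; // assumption: no access to a node hides its subtree
        }
        result.addAll(documents);
        for (Classification child : children) {
            result.addAll(child.visibleDocuments(user));
        }
        return result;
    }

    public static void main(String[] args) {
        Classification root = new Classification("C examples");
        root.documents.add("hello.c");
        Classification pointers = root.addChild("pointers");
        pointers.documents.add("swap.c");

        root.grant("shiv");
        System.out.println(root.visibleDocuments("shiv")); // nested classification not yet permitted
        pointers.grant("shiv");
        System.out.println(root.visibleDocuments("shiv")); // now includes the nested documents
    }
}
```

Keeping the permission set on each classification, rather than on each document, mirrors the statement that a classification is the smallest unit for which access can be granted.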
On-line operations can be divided into two types: interactions with the name server and interactions with other users. In order to interact with the name server, the user has to log in to the server by providing a name and password. Upon successful login, the name server retrieves the current IP address of this user and saves this information; therefore DRUCO supports users who have dynamic IP addresses, for example those using ISPs. The user may perform the following operations that interact with the name server: change the password, retrieve the list of all registered users, and retrieve the list of all active users (i.e. users who are currently on-line). The list of active users is used to start an interaction with another user; to connect to another user it is enough to select this user from the active user list, see Figure 5.

http://evilqueen.acadiau.ca/montreal2000/figure-5.jpg
(Figure 5: DRUCO in on-line mode)

Note that this connection doesn't require authentication, because both users must be currently connected to the name server, and therefore they have already been authenticated by that server. Also, the name server provides the current, dynamic IP address of every active user. User A, upon connecting to another user B, may browse those classifications for which B has granted permission, and she or he can also download some of these classifications, see Figure 6.

http://evilqueen.acadiau.ca/montreal2000/figure-6.jpg
(Figure 6: DRUCO in on-line mode)

In Figure 6, the user "shiv" has connected to the user "vicky", and therefore the middle pane shows the part of the classification from Vicky that was made available to Shiv. DRUCO has been implemented in Java 1.2, see [SUN 99a], using RMI, see [SUN 99b].
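The name-server interactions described above (login that records the current IP address, the lists of registered and active users, and lookup of another active user) can be sketched as a small in-memory Java class. This is a simplified illustration, not DRUCO's RMI interface, which the paper does not list: the class NameServerSketch and all method names are hypothetical, the caller passes its own address as a string instead of the server detecting it, and passwords are stored in plain text only to keep the sketch short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for a name server; not DRUCO's actual interface.
class NameServerSketch {
    private final Map<String, String> passwords = new HashMap<>(); // registered users
    private final Map<String, String> activeIps = new HashMap<>(); // on-line users and their current IP

    // Administrator operation: register a user with an initial password.
    void register(String user, String password) {
        passwords.put(user, password);
    }

    // A successful login records the user's current (possibly dynamic) IP address.
    boolean login(String user, String password, String currentIp) {
        if (password == null || !password.equals(passwords.get(user))) {
            return false;
        }
        activeIps.put(user, currentIp);
        return true;
    }

    void logout(String user) {
        activeIps.remove(user);
    }

    Set<String> registeredUsers() { return new TreeSet<>(passwords.keySet()); }
    Set<String> activeUsers()     { return new TreeSet<>(activeIps.keySet()); }

    // Returns the current IP of an active user, or null if the target is off-line.
    // Only a logged-in requester may ask, so both parties are already authenticated.
    String lookup(String requester, String target) {
        if (!activeIps.containsKey(requester)) {
            return null;
        }
        return activeIps.get(target);
    }

    public static void main(String[] args) {
        NameServerSketch server = new NameServerSketch();
        server.register("vicky", "secret1");
        server.register("shiv", "secret2");
        server.login("vicky", "secret1", "192.0.2.10");
        server.login("shiv", "secret2", "192.0.2.20");
        System.out.println(server.activeUsers());
        System.out.println(server.lookup("shiv", "vicky"));
    }
}
```

Because lookup succeeds only when both users are logged in, a fetcher that obtains an address this way can connect to the provider without a second authentication step, which is the property noted in the text.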
3. Case Study: Distributed Repository of Programming Examples

Example-based learning, see [Neal 1989], promotes the idea of using numerous examples to help understand various concepts, and to move these concepts from short-term memory to long-term memory. Examples are useful if they can be easily browsed and searched, and if they can be shared by various users. When learning programming, examples are particularly useful because one always learns from examples of small programs.

Consider as an example a specific class for teaching programming in the C programming language, taking place in an electronic classroom. Before the beginning of the class, the instructor (provider) makes examples of programs in C available to all students in the classroom. (As an alternative, the instructor may divide students into groups and give each group a different example.) The students can then pull these examples to their computers. They can view these examples, export them to a favorite compiler, modify them by creating new versions, and make these versions available to all students or to specific students. The entire process can result in the collaborative development of a useful repository that can be used not only for learning, but also for real every-day programming.

DRUCO is an ideal candidate for implementing a shared repository of examples with a hierarchical structure. It has been used to implement DRUCOC, a repository of examples in C. Here are some pictures of DRUCOC:

DRUCOC (Work Space folder Pane)
http://evilqueen.acadiau.ca/montreal2000/figure-7.jpg

DRUCOC (Another shot of Work Space Pane)
http://evilqueen.acadiau.ca/montreal2000/figure-8.jpg

DRUCOC (Security Management Pane)
http://evilqueen.acadiau.ca/montreal2000/figure-9.jpg

DRUCOC (File Preview Screen)
http://evilqueen.acadiau.ca/montreal2000/figure-10.jpg

Acknowledgments

The implementation of DRUCO described in this paper was completed as part of the Master's thesis written by the second author.
References

[Booth 1981] Booth, Grayce M. The Distributed System Environment. New York: McGraw-Hill, 1981.

[Holmes 96] Holmes, M., Porter, D.: "Student Notebook Computers in Studio Courses"; ED-MEDIA'96 Conference, Boston (June 1996).

[Huhns 98] M.N. Huhns: Agent Foundations for Cooperative Information Systems. In: Proceedings of The Third International Conference on The Practical Applications of Intelligent Agents and Multi-Agent Technology; London 1998; Edited by H.S. Nwana and D.T. Ndumu.

[Mullender 1989] Mullender, Sape. Distributed Systems. New York: ACM Press, 1989.

[Neal 1989] Neal L. R. (1989). A System for Example-Based Programming. ACM, Human-Computer Interactions, HCI'89 Proceedings. pp. 63-67.

[Papazoglou 92] Papazoglou, M.: An Organizational Framework for Intelligent Cooperative IS. IJICS-1 (1). 1992.

[Shneiderman et al 95] Shneiderman, B., Alavi, M., Norman, K, and Borkowski, E. Windows of Opportunity in Electronic Classrooms; Communications of the ACM, 38, 11 (1995).

[Sun Microsystems 1998] Sun Microsystems. The Java Language Environment. On-line. Available: http://www.javasoft.com/docs/white/langenv/ (11 September 1998).

[Van de Velde at al. 97] Van de Velde, W., Geldof, S. and Schrooten, R.: Competition for Attention. In: Intelligent Agents IV. Agent Theories, Architectures and Languages. Lecture Notes in Artificial Intelligence 1365; J.G. Carbonell and J. Siekmann (Eds.) Lecture Notes in Computer Science; Springer Verlag 1997.

[ Satyanarayanan, M. 90] Satyanarayanan, M., Kistler, J.J., Kumar, P., Okasaki, M.E., Siegel, E.H., Steere, D.C. IEEE Transactions on Computers April 1990, Vol. 39, No. 4.

[Greenleaf G. 97] Graham Greenleaf and Roger Clarke, "Privacy Implications of Digital Signatures". At http://www.anu.edu.au/people/Roger.Clarke/DV/DigSig.html

[Wong D. 99] David Wong, Noemi Paciorek, Dana Moore: Java-based Mobile Agents. CACM 42(3): 92-102 (1999).

[David S. 99] Lessons in EDI, knowledge management, and scalable vector graphics from Interdoc's annual XML conference. Sept. 22, 1999. http://www.xml.com

[SUN 99a] Sun Microsystems, Java 2 SDK. Available via http://java.sun.com/jdk/

[SUN 99b] Sun Microsystems. White paper. Available via http://java.sun.com/marketing/collateral/javarmi.html (21 May 1999).