Implementation of SoftMMU - Notes
=================================

note: all labels are purely informational, and are not found in the
source code.

There are 4 bookkeeping structures in my implementation of SoftMMU,
plus a small set of task superpage variables (described in 3):


1) User Zero Page Structure

window  .word $0000
page    .word $0000
        .byte $00

This is the structure that the user program deals with, as defined in
the SoftMMU API.  The last byte is used as an offset into the task's
superpage, where another structure resides.


2) Task Superpage Structure

zpstruct .byte $00
start    .word $0000
end      .word $0000
memsrc   .byte $00

This is a second structure, created in the user program's context for
every chunk allocated.  The first byte links this structure to its
sibling structure in zero page, and the last byte of the zero page
structure links back to this one.  start and end contain the page
offsets into the expanded memory pool for the beginning and ending+1
locations that span the entire allocated chunk.  memsrc contains a
value which indicates which expanded memory device this chunk is
allocated in.

These structures are dynamically allocated in the tsp when the
program allocates memory.  A byte variable, located at offset
tsp_smptr in the tsp, contains a pointer to the next free area in the
task's superpage.  Upon allocation, the xalloc routine creates this
structure at the location pointed to by tsp_smptr, and increases
tsp_smptr by a value of 6 (see the first code sketch below).  xalloc
must also make sure that tsp_smptr never points above $6f, or it will
corrupt other tsp data.  Currently, there is enough room for 10
structures.

When xfree is called, it must decrease tsp_smptr by 6, and shuffle
down any subsequent structures in order to fill the hole created by
the now defunct xfree'd structure.  When moving structures within the
tsp, the pointer in the zpstruct must be updated to reflect the new
location.


3) Task Superpage Variables

This is the only portion of SoftMMU that requires a change in the
LUnix kernel.

First, tsp_smptr must be initialized whenever a task (and its
corresponding superpage) is created.

Second, the memory configuration should remain intact across context
switches, so an 8-byte chunk of tsp memory (tsp_smstate) is reserved
for the current memory configuration, as well as two flag bytes,
tsp_smget and tsp_smlose.  The bits of these flag bytes correspond to
the memory source numbers (see below).  If one of these bits is set,
it indicates that the corresponding memory source needs to save its
status every time the context is received or lost.  The following
code does this fairly quickly:

        ldy #tsp_smget  ;get the flag byte for getting context.
                        ;use tsp_smlose for when the context is lost.
        lda (lk_tsp),y
        beq ++          ;skip if no bits are set
        ldx #$ff        ;start at memory source #0
-       inx
        asl a           ;get next bit
        bcc +           ;skip if it's not set
        pha             ;dispatch to the appropriate routine.
                        ;current memory source # is in .X
        pla
+       bne -           ;keep looping while there are bits set in
                        ;the flag byte
+

This must be performed both when a task loses and when it receives
the current context.  Upon creation of the tsp, tsp_smget and
tsp_smlose should be set to zero, and at every xalloc, the used
memory source should OR its bit into these flags if this feature is
to be used.

The third change to the kernel is to clear out any allocated memory
during the suicide system call.  This is accomplished by tracing from
tsp_smptr back to its original value, calling xfree for every
structure (see the second sketch below).
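Here is a minimal sketch of the structure creation in xalloc
described in 2).  Everything except tsp_smptr and lk_tsp is an
assumed name, not found in the source: "zpoff" holds the offset of
the sibling zero page structure, "news"/"newe" are zero page
temporaries holding the new chunk's start and end+1 page offsets, and
"newsrc" holds the memory source number.  The $6b bound is likewise
an assumption about the exact tsp layout.

mksm    ldy #tsp_smptr
        lda (lk_tsp),y  ;next free offset in the superpage
        cmp #$6b        ;a structure starting at $6b or above would
        bcs +           ;run past $6f: out of slots, fail
        pha             ;remember where the structure begins
        clc
        adc #6
        sta (lk_tsp),y  ;tsp_smptr += 6
        pla
        tay             ;.Y now indexes the new structure
        lda zpoff       ;link to the sibling zp structure
        sta (lk_tsp),y  ;zpstruct
        iny
        lda news        ;start, low byte then high byte
        sta (lk_tsp),y
        iny
        lda news+1
        sta (lk_tsp),y
        iny
        lda newe        ;end+1, low byte then high byte
        sta (lk_tsp),y
        iny
        lda newe+1
        sta (lk_tsp),y
        iny
        lda newsrc      ;memory source number
        sta (lk_tsp),y
        clc             ;carry clear: success
        rts
+       sec             ;carry set: no room for an 11th structure
        rts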
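And a sketch of the suicide-time sweep, under two assumptions not in
the source: tsp_smbase is the initial value of tsp_smptr, and xfree
takes the superpage offset of a structure in .Y.  Since xfree itself
pulls tsp_smptr down by 6, always freeing the last structure avoids
any shuffling.

smkill  ldy #tsp_smptr
        lda (lk_tsp),y  ;current top of the structure area
-       cmp #tsp_smbase ;traced back to the initial value?
        beq +           ;yes, every chunk is released
        sec
        sbc #6          ;offset of the last structure
        tay
        jsr xfree       ;xfree decreases tsp_smptr by 6
        ldy #tsp_smptr
        lda (lk_tsp),y  ;reload the lowered pointer
        jmp -
+       rts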
4) Memory Sources List

memsrc  .byte $00
page    .byte $00,$00,$00,...
top     .byte $00,$00,$00,...
open    .byte $00,$00,$00,...
;jump tables
initlo  .byte $00,$00,$00,...
inithi  .byte $00,$00,$00,...
xpagelo .byte $00,$00,$00,...
xpagehi .byte $00,$00,$00,...
...

There is one instance of the above structure contained in the SoftMMU
core.  memsrc contains the highest memory source number in the
machine (the number of sources minus one), which is also the highest
valid offset into the following lists.  Note that for 3 memory
sources, memsrc is $02, and the lists are 3 bytes long (max offset of
$02).  These values and the list lengths are currently calculated at
compile time, using a large, nasty gob of preprocessor directives.

Each memory source is given a number, which is its offset into the
lists.  Since the lists are usually scanned from the highest offset
downwards (memsrc is loaded to initialize the counter, which is then
decremented to zero), the faster memory sources are listed at the
tail, and the slower ones reside at the head of the list.  For
example, a C64 with both a REU and a GeoRAM would have three memory
sources: the 6502 would be #0 (slowest), the REU would be #1, and the
GeoRAM would be #2 (fastest).

page points to a page in memory that is uniquely allocated to each
memory source, which holds the free memory lists (described below).
The "top" variable contains the offset within this page of the last
entry in the list, while "open" contains the offset of the next free
list entry location.

Free memory lists may or may not be used for base system RAM (like
stock C64 memory), because programs, the kernel, or other
non-SoftMMU data & code might reside there.  For these sources, the
page, open, and top entries are meaningless, no page is allocated,
and other routines must be used to manage this memory.

The jump tables are used to access the specific functions that each
memory source must provide to SoftMMU.  These are described in the
skeletal memory unit example.


5) Free Memory Lists

length  .word $0000
start   .word $0000

A list of these structures is contained in a page which describes all
free areas of a memory source.  Both length and start are page
offsets, meaning that a single memory source can span up to 24 bits
of virtual address space.  (actually, valid values for length are
$0001 - $ffff pages, or 256 bytes through 16 megabytes minus 256
bytes)

I say "virtual" because the 16-bit page offsets can be handled in any
way by the different expansion support routines.  For example, if a
system has 2 banks of 64k each, and needs to change a bank select
register to either $3f or $7f, it could initialize the free memory
lists to contain the following 2 entries:

length  .word $0100     ;64k length
start   .word $3f00     ;$3f0000 - $3fffff

length  .word $0100     ;64k length
start   .word $7f00     ;$7f0000 - $7fffff

This way, the xpage routine would take the upper byte of the page
offset, dump it into the bank select register, and use the lower byte
of the page offset to select which physical page in the 64k bank to
use.  While this is a very fast implementation, it limits the maximum
size of any memory chunk to 64k, so trying to allocate 100k of memory
will fail even though there is 128k of RAM available.  (Note: this
might change in the future if memory fragmentation is implemented)

To remedy this, the free memory list would instead be initialized to
the following single entry:

length  .word $0200     ;128k length
start   .word $0000     ;$000000 - $01ffff

Here, the high byte of the page offset would be queried, and $3f or
$7f would be written to the bank select register if the high byte
were $00 or $01, respectively.  While this is somewhat slower, it
concatenates both banks of 64k into a single virtual chunk of 128k
that is presented to the system, allowing for larger allocation
requests.
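As an illustration, here is a minimal sketch of what the concatenated
variant's xpage routine might look like.  The calling convention is
an assumption (not in the source): the page offset arrives with the
low byte in .X and the high byte in .A, and "bankreg"/"selpage" are
hypothetical registers of this imaginary two-bank expansion.

xpage2bk lsr a           ;high byte ($00 or $01) into carry
         lda #$3f        ;carry clear: first 64k bank
         bcc +
         lda #$7f        ;carry set: second 64k bank
+        sta bankreg     ;select the bank...
         stx selpage     ;...and the physical page inside it
         rts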
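Returning to the jump tables in 4): calls into a memory source's
routines can go through a small trampoline.  A sketch, assuming (not
in the source) a zero page word named "jmpvec"; it calls the init
routine of every source, fastest first.

initall ldx memsrc      ;highest offset = fastest source
-       lda initlo,x    ;fetch this source's init vector...
        sta jmpvec
        lda inithi,x
        sta jmpvec+1
        txa
        pha             ;the callee may clobber .X
        jsr +           ;...and call through it
        pla
        tax
        dex
        bpl -           ;work down to source #0
        rts
+       jmp (jmpvec)    ;rts in the init routine returns to just
                        ;after the jsr above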
(Note: This example might be misleading: if the preceding were
implemented on a C128, and the data were stored in a different bank
than the code that was trying to access it, the CPU would end up
fetching its instructions from the data bank instead of the code
bank.  On the 128, care must be taken that the code & data reside in
the same bank, as the 8502 cannot see more than one bank at a time on
its address bus.)

When xalloc is called, it scans through the free memory lists,
looking for a suitable match, which is the smallest free chunk that
is longer than or equal in size to the allocation request (see the
first sketch below).  Once this is found, the chunk is created
starting at the start of the list entry, the start of the list entry
is increased by the allocated size, and the length of the list entry
is decreased by the allocated size.  If the length is reduced to
zero, and the current list entry is below the "open" variable which
corresponds to this list, "open" is updated to point to the current
list entry, as zero-length free list entries are defunct and
recyclable.

When xfree is called, it must scan through the list to see if the
newly freed chunk of memory is adjacent to another free chunk.  This
coalesces memory, so that contiguous small chunks of free memory may
be grouped together for a later large allocation.  If adjacent chunks
are found (remember that it must check whether both the start and
the end of the chunk are adjacent), the length and/or start are
updated to expand the free chunk.

If no adjacent free chunk is found, a new list entry must be created
at location "open" within the page.  Once this is done, "open" must
be updated to find a new unused list entry location.  If, at this
point, "open" and "top" are equal, both are increased by four.  If
not, the new location is found by scanning upwards from "open" to
"top" until a zero-length list entry is discovered, or "top" is
reached (see the second sketch below).

No attempt is made to order the lists, or to defragment either the
list or the memory chunks, as these should not be necessary.  Being
stored in a single page of 4-byte entries, the maximum number of
free chunks is limited to 64.  If this limit is ever reached, the
memory chunk simply cannot be freed, and the system will continue to
function sans that addressable piece of memory.  For this unfortunate
situation to occur, though, there must be at least 31 non-adjacent
memory allocations, which I foresee as quite unlikely with a maximum
of only 32 processes being supported by LUnix, and the "best fit"
allocation method being used.  However, it is recommended that
programs allocate a small number of large chunks, instead of a large
number of small chunks.
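A sketch of the best-fit scan in xalloc.  Every name here is an
assumption, not found in the source: "fmlist" is a zero page pointer
already set to this source's free list page, "reqlen" is the
requested length in pages, "listtop" holds this source's "top"
offset, "curlen" is a scratch word, and the winner is left in
"bestoff"/"bestlen" ($ffff remaining in bestlen means no fit was
found).

bestfit lda #$ff        ;no candidate yet (a chunk of exactly $ffff
        sta bestlen     ;pages would be missed; fine for a sketch)
        sta bestlen+1
        ldy #0
-       lda (fmlist),y  ;entry length, low byte
        sta curlen
        iny
        lda (fmlist),y  ;entry length, high byte
        sta curlen+1
        dey
        ora curlen
        beq +           ;zero length entry is defunct, skip it
        lda curlen      ;is this chunk large enough?
        cmp reqlen
        lda curlen+1
        sbc reqlen+1
        bcc +           ;no, too small
        lda curlen      ;is it a tighter fit than the best so far?
        cmp bestlen
        lda curlen+1
        sbc bestlen+1
        bcs +           ;no, keep the old candidate
        sty bestoff     ;yes, remember this entry
        lda curlen
        sta bestlen
        lda curlen+1
        sta bestlen+1
+       iny             ;step to the next 4 byte entry
        iny
        iny
        iny
        beq ++          ;wrapped around: whole page scanned
        cpy listtop     ;"top" is the last entry to check
        bcc -
        beq -
++      rts

The caller would then split the entry at bestoff as described above.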
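Finally, a sketch of how "open" might be refreshed after xfree has
written a new entry at it, following the rules above.  Assumed names
(not in the source): "listopen"/"listtop" are this one source's open
and top values, "fmlist" points at the list page, and "lentmp" is a
scratch byte.

newopen ldy listopen
        cpy listtop     ;was the new entry written at "top"?
        beq ++          ;yes: grow the list (see below)
-       iny             ;no: scan upwards for a defunct entry
        iny
        iny
        iny
        cpy listtop
        bcs +           ;reached "top": park "open" there
        lda (fmlist),y  ;length, low byte
        sta lentmp
        iny
        lda (fmlist),y  ;length, high byte
        dey
        ora lentmp
        bne -           ;length nonzero, entry still in use
+       sty listopen    ;zero length hole (or "top") found
        rts
++      tya             ;"open" == "top": both move up by four
        clc
        adc #4
        sta listopen
        sta listtop
        rts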