diff options
author | rsc <devnull@localhost> | 2005-07-13 13:23:47 +0000 |
---|---|---|
committer | rsc <devnull@localhost> | 2005-07-13 13:23:47 +0000 |
commit | bdf5b5cde1d463ce11b1e62eba68dab25f5b7422 (patch) | |
tree | 3906d7e7b28aa29f910c696a0f98c4c514b5eb9d /man/man8/venti.8 | |
parent | df03d60c047a3010f800b26883763764be8d5744 (diff) | |
download | plan9port-bdf5b5cde1d463ce11b1e62eba68dab25f5b7422.tar.gz plan9port-bdf5b5cde1d463ce11b1e62eba68dab25f5b7422.tar.bz2 plan9port-bdf5b5cde1d463ce11b1e62eba68dab25f5b7422.zip |
new man pages
Diffstat (limited to 'man/man8/venti.8')
-rw-r--r-- | man/man8/venti.8 | 431 |
1 files changed, 431 insertions, 0 deletions
diff --git a/man/man8/venti.8 b/man/man8/venti.8 new file mode 100644 index 00000000..2327529f --- /dev/null +++ b/man/man8/venti.8 @@ -0,0 +1,431 @@ +.TH VENTI 8 +.SH NAME +venti.conf \- venti configuration +.SH DESCRIPTION +Venti is a SHA1-addressed archival storage server. +See +.IR venti (7) +for a full introduction to the system. +This page documents the structure and operation of the server. +.PP +A venti server requires multiple disks or disk partitions, +each of which must be properly formatted before the server +can be run. +.SS Disk +The venti server maintains three disk structures, typically +stored on raw disk partitions: +the append-only +.IR "data log" , +which holds, in sequential order, +the contents of every block written to the server; +the +.IR index , +which helps locate a block in the data log given its score; +and optionally the +.IR "bloom filter" , +a concise summary of which scores are present in the index. +The data log is the primary storage. +To improve the robustness, it should be stored on +a device that provides RAID functionality. +The index and the bloom filter are optimizations +employed to access the data log efficiently and can be rebuilt +if lost or damaged. +.PP +The data log is logically split into sections called +.IR arenas , +typically sized for easy offline backup +(e.g., 500MB). +A data log may comprise many disks, each storing +one or more arenas. +Such disks are called +.IR "arena partitions" . +Arena partitions are filled in the order given in the configuration. +.PP +The index is logically split into block-sized pieces called +.IR buckets , +each of which is responsible for a particular range of scores. +An index may be split across many disks, each storing many buckets. +Such disks are called +.IR "index sections" . +.PP +The index must be sized so that no bucket is full. +When a bucket fills, the server must be shut down and +the index made larger. +Since scores appear random, each bucket will contain +approximately the same number of entries. +Index entries are 40 bytes long. Assuming that a typical block +being written to the server is 8192 bytes and compresses to 4096 +bytes, the active index is expected to be about 1% of +the active data log. +Storing smaller blocks increases the relative index footprint; +storing larger blocks decreases it. +To allow variation in both block size and the random distribution +of scores to buckets, the suggested index size is 5% of +the active data log. +.PP +The (optional) bloom filter is a large bitmap that is stored on disk but +also kept completely in memory while the venti server runs. +It helps the venti server efficiently detect scores that are +.I not +already stored in the index. +The bloom filter starts out zeroed. +Each score recorded in the bloom filter is hashed to choose +.I nhash +bits to set in the bloom filter. +A score is definitely not stored in the index of any of its +.I nhash +bits are not set. +The bloom filter thus has two parameters: +.I nhash +(maximum 32) +and the total bitmap size +(maximum 512MB, 2\s-2\u32\d\s+2 bits). +.PP +The bloom filter should be sized so that +.I nhash +\(ti +.I nblock +\(ti +0.7 +\(<= +0.7 \(ti +.IR b , +where +.I nblock +is the expected number of blocks stored on the server +and +.I b +is the bitmap size in bits. +The false positive rate of the bloom filter when sized +this way is approximately 2\s-2\u\-\fInblock\fR\d\s+2. +.I Nhash +less than 10 are not very useful; +.I nhash +greater than 24 are probably a waste of memory. +.I Fmtbloom +(see +.IR venti-fmt (8)) +can be given either +.I nhash +or +.IR nblock ; +if given +.IR nblock , +it will derive an appropriate +.IR nhash . +.SS Memory +Venti can make effective use of large amounts of memory +for various caches. +.PP +The +.I "lump cache +holds recently-accessed venti data blocks, which the server refers to as +.IR lumps . +The lump cache should be at least 1MB but can profitably be much larger. +The lump cache can be thought of as the level-1 cache: +read requests handled by the lump cache can +be served instantly. +.PP +The +.I "block cache +holds recently-accessed +.I disk +blocks from the arena partitions. +The block cache needs to be able to simultaneously hold two blocks +from each arena plus four blocks for the currently-filling arena. +The block cache can be thought of as the level-2 cache: +read requests handled by the block cache are slower than those +handled by the lump cache, since the lump data must be extracted +from the raw disk blocks and possibly decompressed, but no +disk accesses are necessary. +.PP +The +.I "index cache +holds recently-accessed or prefetched +index entries. +The index cache needs to be able to hold index entries +for three or four arenas, at least, in order for prefetching +to work properly. Each index entry is 50 bytes. +Assuming 500MB arenas of +128,000 blocks that are 4096 bytes each after compression, +the minimum index cache size is about 6MB. +The index cache can be thought of as the level-3 cache: +read requests handled by the index cache must still go +to disk to fetch the arena blocks, but the costly random +access to the index is avoided. +.PP +The size of the index cache determines how long venti +can sustain its `burst' write throughput, during which time +the only disk accesses on the critical path +are sequential writes to the arena partitions. +For example, if you want to be able to sustain 10MB/s +for an hour, you need enough index cache to hold entries +for 36GB of blocks. Assuming 8192-byte blocks, +you need room for almost five million index entries. +Since index entries are 50 bytes each, you need 250MB +of index cache. +If the background index update process can make a single +pass through the index in an hour, which is possible, +then you can sustain the 10MB/s indefinitely (at least until +the arenas are all filled). +.PP +The +.I "bloom filter +requires memory equal to its size on disk, +as discussed above. +.PP +A reasonable starting allocation is to +divide memory equally (in thirds) between +the bloom filter, the index cache, and the lump and block caches; +the third of memory allocated to the lump and block caches +should be split unevenly, with more (say, two thirds) +going to the block cache. +.SS Network +The venti server announces two network services, one +(conventionally TCP port +.BR venti , +17034) serving +the venti protocol as described in +.IR venti (7), +and one serving HTTP +(conventionally TCP port +.BR venti , +80). +.PP +The venti web server provides the following +URLs for accessing status information: +.TP +.B /index +A summary of the usage of the arenas and index sections. +.TP +.B /xindex +An XML version of +.BR /index . +.TP +.B /storage +Brief storage totals. +.TP +.BI /set/ variable +The current integer value of +.IR variable . +Variables are: +.BR compress , +whether or not to compress blocks +(for debugging); +.BR logging , +whether to write entries to the debugging logs; +.BR stats , +whether to collect run-time statistics; +.BR icachesleeptime , +the time in milliseconds between successive updates +of megabytes of the index cache; +.BR arenasumsleeptime , +the time in milliseconds between reads while +checksumming an arena in the background. +The two sleep times should be (but are not) managed by venti; +they exist to provide more experience with their effects. +The other variables exist only for debugging and +performance measurement. +.TP +.BI /set/ variable / value +Set +.I variable +to +.IR value . +.TP +.BI /graph/ name / param / param / \fR... +A PNG image graphing the named run-time statistic over time. +The details of names and parameters are undocumented; +see +.B httpd.c +in the venti sources. +.TP +.B /log +A list of all debugging logs present in the server's memory. +.TP +.BI /log/ name +The contents of the debugging log with the given +.IR name . +.TP +.B /flushicache +Force venti to begin flushing the index cache to disk. +The request response will not be sent until the flush +has completed. +.TP +.B /flushdcache +Force venti to begin flushing the arena block cache to disk. +The request response will not be sent until the flush +has completed. +.PD +.PP +Requests for other files are served by consulting a +directory named in the configuration file +(see +.B webroot +below). +.SS Configuration File +A venti configuration file +enumerates the various index sections and +arenas that constitute a venti system. +The components are indicated by the name of the file, typically +a disk partition, in which they reside. The configuration +file is the only location that file names are used. Internally, +venti uses the names assigned when the components were formatted +with +.I fmtarenas +or +.I fmtisect +(see +.IR venti-fmt (8)). +In particular, only the configuration needs to be +changed if a component is moved to a different file. +.PP +The configuration file consists of lines in the form described below. +Lines starting with +.B # +are comments. +.TP +.BI index " name +Names the index for the system. +.TP +.BI arenas " file +.I File +is an arena partition, formatted using +.IR fmtarenas . +.TP +.BI isect " file +.I File +is an index section, formatted using +.IR fmtisect . +.PP +After formatting a venti system using +.IR fmtindex , +the order of arenas and index sections should not be changed. +Additional arenas can be appended to the configuration; +run +.I fmtindex +with the +.B -a +flag to update the index. +.PP +The configuration file also holds configuration parameters +for the venti server itself. +These are: +.TF httpaddr netaddr +.TP +.BI mem " size +lump cache size +.TP +.BI bcmem " size +block cache size +.TP +.BI icmem " size +index cache size +.TP +.BI addr " netaddr +network address to announce venti service +(default +.BR tcp!*!venti ) +.TP +.BI httpaddr " netaddr +network address to announce HTTP service +(default +.BR tcp!*!http ) +.TP +.B queuewrites +queue writes in memory +(default is not to queue) +.TP +.BI webroot " dir +directory tree containing files for HTTP server +to consult for unrecognized URLs +.PD +.PP +The units for the various cache sizes above can be specified by appending a +.LR k , +.LR m , +or +.LR g +(case-insensitive) +to indicate kilobytes, megabytes, or gigabytes respectively. +.SS Command Line +Options to +.I venti +are: +.TP +.BI -c " config +The server configuration file +(default +.BR venti.conf ) +.TP +.BI -o " line +Set a server parameter, using the same syntax +as in the configuration file. +The +.B -o +options override the configuration file. +.TP +.B -d +Produce various debugging information on standard error. +Implies +.BR -s . +.TP +.B -L +Enable logging. By default all logging is disabled. +Logging slows server operation considerably. +.TP +.B -s +Do not run in the background. +Normally, +the foreground process will exit once the Venti server +is initialized and ready for connections. +.PD +.SH EXAMPLE +A simple configuration: +.IP +.EX +% cat venti.conf +index main +isect /tmp/disks/isect0 +isect /tmp/disks/isect1 +arenas /tmp/disks/arenas +mem 10M +bcmem 20M +icmem 30M +% +.EE +.PP +Format the index sections, the arena partition, and +finally the main index: +.IP +.EX +% venti/fmtisect isect0. /tmp/disks/isect0 & +% venti/fmtisect isect1. /tmp/disks/isect1 & +% venti/fmtarenas arenas0. /tmp/disks/arenas & +% wait +% venti/fmtindex venti.conf +% +.EE +.PP +Start the server and check the storage statistics: +.IP +.EX +% venti/venti +% hget http://$sysname/storage +.EE +.SH "SEE ALSO" +.IR venti (1), +.IR venti (3), +.IR venti (7), +.IR venti-backup (8) +.IR venti-fmt (8) +.br +Sean Quinlan and Sean Dorward, +``Venti: a new approach to archival storage'', +.I "Usenix Conference on File and Storage Technologies" , +2002. +.SH BUGS +Setting up a venti server is too complicated. +.PP +Venti should not require the user to decide how to +partition its memory usage. |