Diffstat (limited to 'man/man8/venti.8')
-rw-r--r-- man/man8/venti.8 431
1 files changed, 431 insertions, 0 deletions
diff --git a/man/man8/venti.8 b/man/man8/venti.8
new file mode 100644
index 00000000..2327529f
--- /dev/null
+++ b/man/man8/venti.8
@@ -0,0 +1,431 @@
+.TH VENTI 8
+.SH NAME
+venti.conf \- venti configuration
+.SH DESCRIPTION
+Venti is a SHA1-addressed archival storage server.
+See
+.IR venti (7)
+for a full introduction to the system.
+This page documents the structure and operation of the server.
+.PP
+A venti server requires multiple disks or disk partitions,
+each of which must be properly formatted before the server
+can be run.
+.SS Disk
+The venti server maintains three disk structures, typically
+stored on raw disk partitions:
+the append-only
+.IR "data log" ,
+which holds, in sequential order,
+the contents of every block written to the server;
+the
+.IR index ,
+which helps locate a block in the data log given its score;
+and optionally the
+.IR "bloom filter" ,
+a concise summary of which scores are present in the index.
+The data log is the primary storage.
+To improve robustness, it should be stored on
+a device that provides RAID functionality.
+The index and the bloom filter are optimizations
+employed to access the data log efficiently and can be rebuilt
+if lost or damaged.
+.PP
+The data log is logically split into sections called
+.IR arenas ,
+typically sized for easy offline backup
+(e.g., 500MB).
+A data log may comprise many disks, each storing
+one or more arenas.
+Such disks are called
+.IR "arena partitions" .
+Arena partitions are filled in the order given in the configuration.
+.PP
+The index is logically split into block-sized pieces called
+.IR buckets ,
+each of which is responsible for a particular range of scores.
+An index may be split across many disks, each storing many buckets.
+Such disks are called
+.IR "index sections" .
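Because each bucket covers a contiguous range of scores, a score's bucket can be found by arithmetic alone. This C sketch assumes the ranges are formed by scaling the leading 32 bits of the score. That choice and the scorebucket function are illustrative only and do not describe venti's on-disk layout.

```c
#include <assert.h>
#include <stdint.h>

enum { ScoreSize = 20 };	/* SHA1 score length in bytes */

/*
 * Map a score to one of nbuckets buckets by scaling its first
 * 32 bits, so that each bucket covers one contiguous range of scores.
 */
uint32_t
scorebucket(const uint8_t score[ScoreSize], uint32_t nbuckets)
{
	uint32_t top;

	assert(nbuckets > 0);
	top = (uint32_t)score[0]<<24 | (uint32_t)score[1]<<16 |
		(uint32_t)score[2]<<8 | score[3];
	return (uint32_t)((uint64_t)top * nbuckets >> 32);
}
```

Because scores look uniformly random, this scaling spreads them evenly across the buckets.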
+.PP
+The index must be sized so that no bucket is full.
+When a bucket fills, the server must be shut down and
+the index made larger.
+Since scores appear random, each bucket will contain
+approximately the same number of entries.
+Index entries are 40 bytes long. Assuming that a typical block
+being written to the server is 8192 bytes and compresses to 4096
+bytes, the active index is expected to be about 1% of
+the active data log.
+Storing smaller blocks increases the relative index footprint;
+storing larger blocks decreases it.
+To allow variation in both block size and the random distribution
+of scores to buckets, the suggested index size is 5% of
+the active data log.
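As a rough check on these figures, the following C sketch computes the expected active index size and the suggested 5% allocation. It takes the 40-byte entry size from the text. The function names are hypothetical and are not part of venti.

```c
#include <assert.h>
#include <stdint.h>

enum { IndexEntrySize = 40 };	/* bytes per index entry, per the text */

/*
 * Expected size of the active index for a data log holding
 * logbytes of data whose blocks average blocksize bytes once
 * compressed: one index entry per stored block.
 */
uint64_t
activeindexbytes(uint64_t logbytes, uint64_t blocksize)
{
	assert(blocksize > 0);
	return logbytes / blocksize * IndexEntrySize;
}

/*
 * Suggested index size: 5% of the data log, leaving headroom for
 * smaller blocks and for uneven distribution of scores to buckets.
 */
uint64_t
suggestedindexbytes(uint64_t logbytes)
{
	return logbytes / 100 * 5;
}
```

For a 100GB data log (100 << 30 bytes) of 4096-byte compressed blocks, activeindexbytes returns 1,048,576,000 bytes, just under 1%, while suggestedindexbytes returns 5GB.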
+.PP
+The (optional) bloom filter is a large bitmap that is stored on disk but
+also kept completely in memory while the venti server runs.
+It helps the venti server efficiently detect scores that are
+.I not
+already stored in the index.
+The bloom filter starts out zeroed.
+Each score recorded in the bloom filter is hashed to choose
+.I nhash
+bits to set in the bloom filter.
+A score is definitely not stored in the index if any of its
+.I nhash
+bits are not set.
+The bloom filter thus has two parameters:
+.I nhash
+(maximum 32)
+and the total bitmap size
+(maximum 512MB, 2\s-2\u32\d\s+2 bits).
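The following C sketch shows the set and test operations. The text does not describe how venti picks the nhash bit positions for a score. The sketch therefore assumes double hashing over two 32-bit words taken from the front of the (already uniformly random) SHA1 score. The Bloom type and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { ScoreSize = 20 };	/* SHA1 score length in bytes */

typedef struct Bloom Bloom;
struct Bloom {
	uint8_t *bits;		/* bitmap, kept entirely in memory */
	uint64_t nbits;		/* bitmap size b, in bits */
	int nhash;		/* bits set per score (maximum 32) */
};

/* Read a big-endian 32-bit value from p. */
static uint32_t
get32(const uint8_t *p)
{
	return (uint32_t)p[0]<<24 | (uint32_t)p[1]<<16 | (uint32_t)p[2]<<8 | p[3];
}

/* Bit index of the i'th hash of score (assumed double hashing). */
static uint64_t
bloombit(const Bloom *b, const uint8_t score[ScoreSize], int i)
{
	uint64_t h1 = get32(score), h2 = get32(score+4);

	return (h1 + (uint64_t)i*h2) % b->nbits;
}

/* Allocate a zeroed filter; the caller must check bits for NULL. */
Bloom
bloomalloc(uint64_t nbits, int nhash)
{
	Bloom b;

	assert(nbits > 0 && nhash >= 1 && nhash <= 32);
	b.bits = calloc((nbits+7)/8, 1);
	b.nbits = nbits;
	b.nhash = nhash;
	return b;
}

/* Record score by setting all nhash of its bits. */
void
bloomset(Bloom *b, const uint8_t score[ScoreSize])
{
	for(int i = 0; i < b->nhash; i++){
		uint64_t bit = bloombit(b, score, i);
		b->bits[bit/8] |= 1 << (bit%8);
	}
}

/* Returns 0 if score is definitely absent, 1 if it may be present. */
int
bloommaybe(const Bloom *b, const uint8_t score[ScoreSize])
{
	for(int i = 0; i < b->nhash; i++){
		uint64_t bit = bloombit(b, score, i);
		if((b->bits[bit/8] & (1 << (bit%8))) == 0)
			return 0;
	}
	return 1;
}
```

A zero from bloommaybe lets the server skip the index lookup entirely. A one only means the score may be present, so the index must still be consulted.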
+.PP
+The bloom filter should be sized so that
+.I nhash
+\(mu
+.I nblock
+\(<=
+0.7 \(mu
+.IR b ,
+where
+.I nblock
+is the expected number of blocks stored on the server
+and
+.I b
+is the bitmap size in bits.
+The false positive rate of the bloom filter when sized
+this way is approximately 2\s-2\u\-\fInhash\fR\d\s+2.
+Values of
+.I nhash
+less than 10 are not very useful;
+values greater than 24 are probably a waste of memory.
+.I Fmtbloom
+(see
+.IR venti-fmt (8))
+can be given either
+.I nhash
+or
+.IR nblock ;
+if given
+.IR nblock ,
+it will derive an appropriate
+.IR nhash .
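The sizing rule nhash × nblock ≤ 0.7 × b can be turned into integer arithmetic. This C sketch picks the largest nhash that satisfies the rule, clamped to the 32-hash maximum, and also computes the smallest bitmap for a desired nhash. It illustrates the rule only and is not taken from fmtbloom. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { MaxHash = 32 };	/* maximum nhash, per the text */

/*
 * Largest nhash with nhash * nblock <= 0.7 * nbits,
 * computed in integers as 10 * nhash * nblock <= 7 * nbits.
 * Returns 0 if the bitmap is too small for even one hash.
 */
int
bloomnhash(uint64_t nblock, uint64_t nbits)
{
	uint64_t n;

	assert(nblock > 0);
	n = 7 * nbits / (10 * nblock);
	return n > MaxHash ? MaxHash : (int)n;
}

/* Smallest bitmap size in bits that supports nhash hashes for nblock blocks. */
uint64_t
bloombits(uint64_t nblock, int nhash)
{
	assert(nhash > 0 && nhash <= MaxHash);
	/* round up: nbits >= 10 * nhash * nblock / 7 */
	return (10 * (uint64_t)nhash * nblock + 6) / 7;
}
```

For example, 100 million blocks with the maximum 2^32-bit (512MB) bitmap give nhash = 30. Conversely, nhash = 20 for the same number of blocks needs 2,857,142,858 bits, about 341MB. With a filter sized this way, the false positive rate is roughly 2^-nhash.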
+.SS Memory
+Venti can make effective use of large amounts of memory
+for various caches.
+.PP
+The
+.I "lump cache
+holds recently-accessed venti data blocks, which the server refers to as
+.IR lumps .
+The lump cache should be at least 1MB but can profitably be much larger.
+The lump cache can be thought of as the level-1 cache:
+read requests handled by the lump cache can
+be served immediately, with no disk access or decompression.
+.PP
+The
+.I "block cache
+holds recently-accessed
+.I disk
+blocks from the arena partitions.
+The block cache needs to be able to simultaneously hold two blocks
+from each arena plus four blocks for the currently-filling arena.
+The block cache can be thought of as the level-2 cache:
+read requests handled by the block cache are slower than those
+handled by the lump cache, since the lump data must be extracted
+from the raw disk blocks and possibly decompressed, but no
+disk accesses are necessary.
+.PP
+The
+.I "index cache
+holds recently-accessed or prefetched
+index entries.
+The index cache needs to be able to hold index entries
+for three or four arenas, at least, in order for prefetching
+to work properly. Each index entry is 50 bytes.
+Assuming 500MB arenas of
+128,000 blocks that are 4096 bytes each after compression,
+each arena needs 128,000 \(mu 50 bytes, about 6.4MB, of index cache,
+so the minimum index cache size is about 20MB.
+The index cache can be thought of as the level-3 cache:
+read requests handled by the index cache must still go
+to disk to fetch the arena blocks, but the costly random
+access to the index is avoided.
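The minimum index cache size is the number of entries per arena, multiplied by the entry size and by the number of arenas held. This C sketch uses the 50-byte entry size from the text. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { ICacheEntrySize = 50 };	/* bytes per index cache entry, per the text */

/*
 * Index cache bytes needed to hold the entries for narenas arenas
 * of arenabytes each, filled with blocks of blocksize bytes
 * (after compression).
 */
uint64_t
icacheminbytes(uint64_t arenabytes, uint64_t blocksize, int narenas)
{
	assert(blocksize > 0 && narenas > 0);
	return arenabytes / blocksize * ICacheEntrySize * narenas;
}
```

With 500MB arenas and 4096-byte blocks, one arena's worth of entries takes 6,400,000 bytes. The three or four arenas needed for prefetching take 19,200,000 to 25,600,000 bytes.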
+.PP
+The size of the index cache determines how long venti
+can sustain its `burst' write throughput, during which time
+the only disk accesses on the critical path
+are sequential writes to the arena partitions.
+For example, if you want to be able to sustain 10MB/s
+for an hour, you need enough index cache to hold entries
+for 36GB of blocks. Assuming 8192-byte blocks,
+you need room for almost five million index entries.
+Since index entries are 50 bytes each, you need 250MB
+of index cache.
+If the background index update process can make a single
+pass through the index in an hour, which is possible,
+then you can sustain the 10MB/s indefinitely (at least until
+the arenas are all filled).
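The burst calculation above generalizes to any target rate and duration. This C sketch counts one 50-byte cache entry per block written during the burst. The function name is hypothetical, and the example below assumes binary megabytes.

```c
#include <assert.h>
#include <stdint.h>

enum { ICacheEntrySize = 50 };	/* bytes per index cache entry, per the text */

/*
 * Index cache bytes needed to absorb writes arriving at
 * bytespersec for seconds, in blocks of blocksize bytes,
 * without flushing any entries to the on-disk index.
 */
uint64_t
icacheburstbytes(uint64_t bytespersec, uint64_t seconds, uint64_t blocksize)
{
	uint64_t nentries;

	assert(blocksize > 0);
	nentries = bytespersec * seconds / blocksize;
	return nentries * ICacheEntrySize;
}
```

For 10MB/s (10 << 20 bytes per second) sustained for 3600 seconds with 8192-byte blocks, this gives 4,608,000 entries and 230,400,000 bytes, so the 250MB figure leaves some headroom.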
+.PP
+The
+.I "bloom filter
+requires memory equal to its size on disk,
+as discussed above.
+.PP
+A reasonable starting allocation is to
+divide memory equally (in thirds) between
+the bloom filter, the index cache, and the lump and block caches;
+the third of memory allocated to the lump and block caches
+should be split unevenly, with more (say, two thirds)
+going to the block cache.
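This starting allocation can be written as a small C function. The one-third split, and the two-thirds share of the cache third for the block cache, follow the text. The Memsplit type and splitmemory function are hypothetical. The sketch does not enforce the 512MB bloom filter maximum.

```c
#include <assert.h>
#include <stdint.h>

typedef struct Memsplit Memsplit;
struct Memsplit {
	uint64_t bloom;		/* bloom filter (equal to its on-disk size) */
	uint64_t icmem;		/* index cache */
	uint64_t bcmem;		/* block cache: two thirds of the cache third */
	uint64_t mem;		/* lump cache: the remainder */
};

/* Split total bytes of memory as suggested above; not capped at 512MB. */
Memsplit
splitmemory(uint64_t total)
{
	Memsplit m;
	uint64_t caches;

	assert(total > 0);
	m.bloom = total / 3;
	m.icmem = total / 3;
	caches = total - m.bloom - m.icmem;	/* lump + block caches */
	m.bcmem = caches / 3 * 2;
	m.mem = caches - m.bcmem;
	return m;
}
```

With 1.5GB of memory, this yields a 512MB bloom filter (the maximum), a 512MB index cache, and roughly 341MB and 171MB for the block and lump caches. These map onto the icmem, bcmem and mem parameters described under Configuration File.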
+.SS Network
+The venti server announces two network services, one
+(conventionally TCP port
+.BR venti ,
+17034) serving
+the venti protocol as described in
+.IR venti (7),
+and one serving HTTP
+(conventionally TCP port
+.BR http ,
+80).
+.PP
+The venti web server provides the following
+URLs for accessing status information:
+.TP
+.B /index
+A summary of the usage of the arenas and index sections.
+.TP
+.B /xindex
+An XML version of
+.BR /index .
+.TP
+.B /storage
+Brief storage totals.
+.TP
+.BI /set/ variable
+The current integer value of
+.IR variable .
+Variables are:
+.BR compress ,
+whether or not to compress blocks
+(for debugging);
+.BR logging ,
+whether to write entries to the debugging logs;
+.BR stats ,
+whether to collect run-time statistics;
+.BR icachesleeptime ,
+the time in milliseconds between successive updates
+of megabytes of the index cache;
+.BR arenasumsleeptime ,
+the time in milliseconds between reads while
+checksumming an arena in the background.
+Ideally venti would adjust the two sleep times automatically, but it does not;
+they are settable so that their effects can be studied.
+The other variables exist only for debugging and
+performance measurement.
+.TP
+.BI /set/ variable / value
+Set
+.I variable
+to
+.IR value .
+.TP
+.BI /graph/ name / param / param / \fR...
+A PNG image graphing the named run-time statistic over time.
+The details of names and parameters are undocumented;
+see
+.B httpd.c
+in the venti sources.
+.TP
+.B /log
+A list of all debugging logs present in the server's memory.
+.TP
+.BI /log/ name
+The contents of the debugging log with the given
+.IR name .
+.TP
+.B /flushicache
+Force venti to begin flushing the index cache to disk.
+The response will not be sent until the flush
+has completed.
+.TP
+.B /flushdcache
+Force venti to begin flushing the arena block cache to disk.
+The response will not be sent until the flush
+has completed.
+.PD
+.PP
+Requests for other files are served by consulting a
+directory named in the configuration file
+(see
+.B webroot
+below).
+.SS Configuration File
+A venti configuration file
+enumerates the various index sections and
+arenas that constitute a venti system.
+The components are indicated by the name of the file, typically
+a disk partition, in which they reside. The configuration
+file is the only place where file names are used. Internally,
+venti uses the names assigned when the components were formatted
+with
+.I fmtarenas
+or
+.I fmtisect
+(see
+.IR venti-fmt (8)).
+In particular, only the configuration file needs to be
+changed if a component is moved to a different file.
+.PP
+The configuration file consists of lines in the form described below.
+Lines starting with
+.B #
+are comments.
+.TP
+.BI index " name
+Names the index for the system.
+.TP
+.BI arenas " file
+.I File
+is an arena partition, formatted using
+.IR fmtarenas .
+.TP
+.BI isect " file
+.I File
+is an index section, formatted using
+.IR fmtisect .
+.PP
+After formatting a venti system using
+.IR fmtindex ,
+the order of arenas and index sections should not be changed.
+Additional arenas can be appended to the configuration;
+run
+.I fmtindex
+with the
+.B -a
+flag to update the index.
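A reader for the component lines might look like the following C sketch. It recognizes only the index, arenas, and isect directives and # comments described above, and it assumes whitespace-separated fields. It ignores all other lines, such as the server parameters described next. The Conf type and the parseconf function are hypothetical, and this is not venti's own parser.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { MaxParts = 16, MaxName = 128 };

typedef struct Conf Conf;
struct Conf {
	char index[MaxName];
	char arenas[MaxParts][MaxName];	/* in configuration order */
	int narenas;
	char isect[MaxParts][MaxName];	/* in configuration order */
	int nisect;
};

/* Parse one line; returns -1 if there are too many components. */
static int
parseline(Conf *c, const char *line)
{
	char verb[32], arg[MaxName];

	if(line[0] == '#')
		return 0;	/* comment */
	if(sscanf(line, "%31s %127s", verb, arg) != 2)
		return 0;	/* blank, or a one-word parameter such as queuewrites */
	if(strcmp(verb, "index") == 0)
		strcpy(c->index, arg);
	else if(strcmp(verb, "arenas") == 0){
		if(c->narenas == MaxParts)
			return -1;
		strcpy(c->arenas[c->narenas++], arg);
	}else if(strcmp(verb, "isect") == 0){
		if(c->nisect == MaxParts)
			return -1;
		strcpy(c->isect[c->nisect++], arg);
	}
	/* other lines are server parameters, ignored here */
	return 0;
}

/* Parse a whole configuration held in text; returns 0 on success. */
int
parseconf(Conf *c, const char *text)
{
	char line[256];

	memset(c, 0, sizeof *c);
	while(*text != '\0'){
		size_t n = strcspn(text, "\n");
		if(n >= sizeof line)
			return -1;	/* line too long */
		memcpy(line, text, n);
		line[n] = '\0';
		if(parseline(c, line) < 0)
			return -1;
		text += n;
		if(*text == '\n')
			text++;
	}
	return 0;
}
```

The arrays keep components in file order because the order of arenas and index sections must not change after fmtindex has run.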
+.PP
+The configuration file also holds configuration parameters
+for the venti server itself.
+These are:
+.TF httpaddr netaddr
+.TP
+.BI mem " size
+lump cache size
+.TP
+.BI bcmem " size
+block cache size
+.TP
+.BI icmem " size
+index cache size
+.TP
+.BI addr " netaddr
+network address to announce venti service
+(default
+.BR tcp!*!venti )
+.TP
+.BI httpaddr " netaddr
+network address to announce HTTP service
+(default
+.BR tcp!*!http )
+.TP
+.B queuewrites
+queue writes in memory
+(default is not to queue)
+.TP
+.BI webroot " dir
+directory tree containing files for HTTP server
+to consult for unrecognized URLs
+.PD
+.PP
+The units for the various cache sizes above can be specified by appending a
+.LR k ,
+.LR m ,
+or
+.LR g
+(case-insensitive)
+to indicate kilobytes, megabytes, or gigabytes respectively.
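A size with an optional unit suffix can be parsed as in the C sketch below. The text does not say whether k, m, and g denote powers of 1024 or of 1000; this sketch assumes 1024. The function name is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Convert a size such as "10M", "512k" or "4096" to bytes.
 * Unit letters are case-insensitive; powers of 1024 are assumed.
 * Returns -1 for malformed input.
 */
int64_t
parsesize(const char *s)
{
	char *end;
	long long n;

	n = strtoll(s, &end, 10);
	if(end == s || n < 0)
		return -1;
	switch(tolower((unsigned char)*end)){
	case '\0':
		return n;
	case 'k':
		n *= 1024LL;
		break;
	case 'm':
		n *= 1024LL*1024;
		break;
	case 'g':
		n *= 1024LL*1024*1024;
		break;
	default:
		return -1;
	}
	if(end[1] != '\0')
		return -1;	/* trailing characters after the unit */
	return n;
}
```

Under this assumption, the example configuration's "mem 10M" means 10,485,760 bytes.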
+.SS Command Line
+Options to
+.I venti
+are:
+.TP
+.BI -c " config
+The server configuration file
+(default
+.BR venti.conf )
+.TP
+.BI -o " line
+Set a server parameter, using the same syntax
+as in the configuration file.
+The
+.B -o
+options override the configuration file.
+.TP
+.B -d
+Produce various debugging information on standard error.
+Implies
+.BR -s .
+.TP
+.B -L
+Enable logging. By default all logging is disabled.
+Logging slows server operation considerably.
+.TP
+.B -s
+Do not run in the background.
+Normally,
+the foreground process will exit once the Venti server
+is initialized and ready for connections.
+.PD
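This C sketch of option handling uses POSIX getopt. The Flags type and the parseflags function are hypothetical. The sketch only records the options, keeping the -o lines in order so that a caller can apply them after reading the configuration file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

enum { MaxOpts = 16 };

typedef struct Flags Flags;
struct Flags {
	const char *config;		/* -c, default venti.conf */
	const char *opts[MaxOpts];	/* -o lines, to apply after the file */
	int nopts;
	int debug;			/* -d */
	int logging;			/* -L */
	int foreground;			/* -s, implied by -d */
};

/* Returns 0 on success, -1 on an unknown option or too many -o lines. */
int
parseflags(Flags *f, int argc, char *argv[])
{
	int c;

	assert(argc >= 1);
	*f = (Flags){ .config = "venti.conf" };
	while((c = getopt(argc, argv, "c:o:dLs")) != -1){
		switch(c){
		case 'c':
			f->config = optarg;
			break;
		case 'o':
			if(f->nopts == MaxOpts)
				return -1;
			f->opts[f->nopts++] = optarg;
			break;
		case 'd':
			f->debug = 1;
			f->foreground = 1;	/* -d implies -s */
			break;
		case 'L':
			f->logging = 1;
			break;
		case 's':
			f->foreground = 1;
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

A caller can apply the collected -o lines after it has read the configuration file. This gives them the overriding effect described above.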
+.SH EXAMPLE
+A simple configuration:
+.IP
+.EX
+% cat venti.conf
+index main
+isect /tmp/disks/isect0
+isect /tmp/disks/isect1
+arenas /tmp/disks/arenas
+mem 10M
+bcmem 20M
+icmem 30M
+%
+.EE
+.PP
+Format the index sections, the arena partition, and
+finally the main index:
+.IP
+.EX
+% venti/fmtisect isect0. /tmp/disks/isect0 &
+% venti/fmtisect isect1. /tmp/disks/isect1 &
+% venti/fmtarenas arenas0. /tmp/disks/arenas &
+% wait
+% venti/fmtindex venti.conf
+%
+.EE
+.PP
+Start the server and check the storage statistics:
+.IP
+.EX
+% venti/venti
+% hget http://$sysname/storage
+.EE
+.SH "SEE ALSO"
+.IR venti (1),
+.IR venti (3),
+.IR venti (7),
+.IR venti-backup (8),
+.IR venti-fmt (8)
+.br
+Sean Quinlan and Sean Dorward,
+``Venti: a new approach to archival storage'',
+.IR "Usenix Conference on File and Storage Technologies" ,
+2002.
+.SH BUGS
+Setting up a venti server is too complicated.
+.PP
+Venti should not require the user to decide how to
+partition its memory usage.