So far, in the assignments, you have been working with
distributed and mostly immutable state in the context of
distributed computation (proof of work). In this assignment you
will build a simple distributed key-value store (KVS) that will
support mutable operations. Your KVS will resemble A4 in that
there will be a single frontend node that receives and
processes all client requests. However, unlike previous
assignments, your KVS will need to deal with some failures. We
will also build on this basic KVS in the final assignment.
Changes since @566, initial assignment release
Thu 1 Apr 2021 17:19:05 PDT
Most up-to-date starter.zip
- @667, @610_f1 : KvslibGetResult and FrontEndGetResult value can be nil if err is false (i.e., if no previous put on the key)
- @602 : Added per-key constraint to formal def of write monotonicity
- @588 : Made spec in line with starter code, Start at frontend and storage takes a tracing.Tracer
- @589 : Changed global uniqueness restriction on OpIds to 'For each client, OpIds are unique'
- @591, @581_f1 : starter.zip - FrontEndAddr typo in storage config json file, and comma after OutputFile in tracing server json
- Added a diagram to visually illustrate all the distributed tracing actions
- @569, @571, @572, @573, @576, @580: various fixes to starter files
- @571: Changes to KVS Initialize and Close APIs; kvs.Initialize should create traces, but not kvs.Close
- Updates to all distributed tracing specs; these now capture monotonicity and failure requirements.
Overview
In this assignment you will develop a fairly centralized, but
networked, key-value store (KVS) that supports two operations from
multiple clients: Put and Get.
The KVS is divided into a frontend node that serves requests
from clients and a storage node that holds the
key-value data. In this system, clients are unaware of the storage
node and requests are directed towards a frontend node that issues
the operation to the storage node hosting the data. The frontend
node and the storage node must serve requests concurrently.
Your system will need to survive halting failures of the storage
node (process). However, the storage node will have access to a
local disk that it will use to persist all key-value data. If
the storage node fails and later restarts, it must re-read (recover)
the key-value data into memory from persistent disk storage. The
storage node must serve read-only requests (gets) from memory, but
data mutating operations (puts) must consistently update the disk
state.
Clients must communicate with the frontend node and the frontend
node must communicate with the storage node using a reliable
messaging protocol. As in A4, you must use Go's built-in RPC
library. You will also rely on the same distributed tracing library
as in A4. We will use the captured distributed traces to evaluate
the correctness of your implementation.
kvslib API
Similar to A3/A4, the client will access the KVS through a library
called kvslib. Unlike these previous assignments, one or more clients
may invoke multiple concurrent Get/Put operations against identical
keys.
The kvslib Initialize, Get, and Put API calls below must each create a
new distributed trace instance using Tracer.CreateTrace(). Note that
the Initialize call should create a node-local "kvslib trace" that
only records actions local to the kvslib (the frontend and storage
nodes will have their own node-local traces). In the descriptions
below, if err (of built-in error type) is nil, then the call succeeded;
otherwise err must include a descriptive message of the error. There
are no constraints on the error message; the exact text does
not matter. The kvslib has a KVS struct that must provide the
following API (stub methods are given in kvslib/kvslib.go).
- notify-channel, err ← Initialize(localTracer *tracing.Tracer, clientId string, FrontEndIP:FrontEndPort, ChCapacity)
  Initializes the KVS instance used to connect to the frontend at
  the given FrontEndIP:FrontEndPort. The returned notify-channel
  must have capacity ChCapacity and must be used by kvslib to
  deliver all solution notifications. ChCapacity determines the
  concurrency factor at the client: the client will never have more
  than ChCapacity operations outstanding (pending concurrently) at
  any one time.
If there is an issue with connecting, this should return an
appropriate err value, otherwise err should
be set to nil.
- opId, err ← Get(tracer *tracing.Tracer, clientId string, key string)
This is a non-blocking request from the client to make
a get call for a given key. In case there is an underlying
issue (for example, the frontend cannot be reached), this
should return an appropriate err value,
otherwise err should be set to nil. Note that this
call is non-blocking. The returned value must be delivered
asynchronously to the client via the notify-channel
channel returned in the Initialize call. The
value opId is used to identify this request and
associate the returned value with this request.
- opId, err ← Put(tracer *tracing.Tracer, clientId string, key string, value string)
This is a non-blocking request from the client to update
the value associated with a key. In case there is an underlying
issue (for example, the frontend cannot be reached), this should
return an appropriate err value, otherwise err
should be set to nil. Note that this call is non-blocking. The
value opId is used to identify this request and
associate the returned value with this request. The returned
value must be delivered asynchronously via
the notify-channel channel returned in the Initialize
call.
- err ← Close()
Stops the KVS instance from communicating with the
frontend and from delivering any results via
the notify-channel. If there is an issue with
stopping, this should return an appropriate
err value, otherwise err should be set to nil.
The OpId values must have the following properties:
- Values of opId must monotonically increase at each
  kvslib and be locally unique. They should be comparable at a
  single client to determine client operation request
  order. Globally, across kvslib instances, values
  of opId do not need to be comparable. (A minimal counter
  sketch follows this list.)
- The opId type is uint32. You can assume that a
  client will not issue more than 2**32 operations during the
  execution of your system (you do not need to implement id
  rollover).
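For example, the monotonicity and local-uniqueness requirements can
be met with an atomic counter inside the KVS struct. Below is a
minimal sketch; the opIdCounter field and nextOpId helper are
illustrative names, not part of the required API:

    import "sync/atomic"

    // nextOpId returns the next opId for this kvslib instance:
    // monotonically increasing and locally unique, and safe to call
    // from concurrent Get/Put invocations.
    // (opIdCounter is a hypothetical uint32 field on the KVS struct.)
    func (d *KVS) nextOpId() uint32 {
        return atomic.AddUint32(&d.opIdCounter, 1)
    }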
Items returned via the notify-channel should have the
following type:
type ResultStruct struct {
    opId        uint32
    storageFail bool
    result      *string
}
Values of ResultStruct must conform to the following:
- opId must correspond to some previously issued
  operation that synchronously returned this opId.
- If the operation could not complete successfully because of
  storage node failure, then storageFail should be set to true;
  otherwise it should be set to false.
- If opId corresponds to a Get(k) operation, then result should
  contain the value associated with key k. A get on a key that has
  not been previously put should set result to nil.
- If opId corresponds to a Put(k,v) operation, then result should
  be set to v.
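To make the intended call pattern concrete, here is a hedged usage
sketch from the client's perspective. It assumes kvs is a kvslib KVS
value and tracer is a *tracing.Tracer, both set up as in the starter
client; the addresses, key/value strings, and channel capacity are
placeholders:

    notifyCh, err := kvs.Initialize(tracer, "client1", "127.0.0.1:8080", 10)
    if err != nil {
        log.Fatal(err)
    }
    opId, err := kvs.Put(tracer, "client1", "k", "v1")
    if err != nil {
        log.Fatal(err)
    }
    result := <-notifyCh // blocks until a ResultStruct is delivered
    log.Printf("op %d completed: %+v", opId, result)
    if err := kvs.Close(); err != nil {
        log.Fatal(err)
    }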
KVS system consistency semantics
Your KVS system must provide monotonic read and monotonic write
consistency. These two consistency types are usually used to
describe what a single process in the system observes, without
considering which key is being read/written. In this assignment, we
will use per-key monotonic read/write notions. Your KVS does
not need to provide consistency guarantees across different keys.
Note that the consistency semantics defined more precisely below
specify what clients of your system should observe about the KVS
store. You can achieve these semantics in several different ways in
your implementations of kvslib, frontend, and storage nodes. In
particular, you can build your system to provide stronger
semantics, though this may cost you the KVS
performance bonus (see the rubric below :-)
- Monotonic reads: Assume that a kvslib delivers a
ResultStruct with result string v in response to a client's get(k)
operation with opId id1. Then, a get(k) operation with opId value
id2, such that id2 > id1, issued by the same client should observe
the same or a more recent value for key k.
- Monotonic writes: Assume that a client performs two put
operations: a put(k,v) with opId id1 and a put(k,v') with opId id2
and id1 < id2. Then, put(k,v) must be reflected in the KVS first,
before put(k,v'). That is, it should (1) be possible to observe
the value v for key k, and (2) key k should have value v' as the
final state, assuming no other puts on key k in the system.
- Frontend ordering: The frontend should process requests
from different clients in the order that they are received at the
frontend.
Below are a couple of example traces. Each line in the trace is a
fully completed operation: the operation was first issued and then
returned a ResultStruct. Time flows from top to bottom. There is no
concurrency in these examples.
Here is an example of an execution that violates monotonic reads:
- client1: put(k,v1)
- client1: put(k,v2)
- client1: v2←get(k)
- client1: v1←get(k)
- A violation: the client already observed v2, so the same
  client later observing v1 violates monotonicity.
Here is an example of an execution that violates monotonic writes:
- client1: put(k,v1)
- client1: put(k,v2)
- client1: v1←get(k)
- A violation since v2 should be the final state of key k.
Here is an example of an execution with two clients that satisfies
monotonic reads and monotonic writes:
- client1: put(k,v1)
- client2: put(k,v2)
- client1: v1←get(k)
- client2: v2←get(k)
- client1: v2←get(k)
KVS distributed operation
The diagram above captures the basic operation and interactions
between the kvslib, frontend, storage, and persistent disk for two
operations (edge colors denote distributed traces): put, followed by
a get. The flow is similar to A4: there is a forwarding-style
interaction, extending from the kvslib at the client to the frontend
to the storage node to the disk, and back. A couple of points of
note:
- In this assignment we do not provide any RPC description for
  you to follow. So, it's best to think of the diagram above as a
  conceptual data-flow diagram. (One possible RPC shape is sketched
  after this list.)
- As in A4, you will use distributed traces extensively. The
  distributed tracing library API is unchanged. This spec assumes
  tracing; that is, we require that all of your RPCs carry
  and return tracing tokens, just as described in A4. Note,
  however, that tracing does not extend to the disk.
- The storage node records the put(k,v) state update to disk
  before replying to the frontend. This is important to provide
  strong failure semantics. More on failures below.
- The frontend does not, and should not, implement any
  key-value state caching. It should forward requests to the
  storage node.
- Likewise, the kvslib does not, and should not, implement any
  key-value state caching.
- When servicing a get request, the storage node does not read
  the value for key k from disk. Your storage implementation
  must maintain a consistent in-memory copy of the KVS, which it
  uses to service all get requests.
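Since you design your own RPCs (see below), the following is just one
possible shape for a frontend-to-storage put RPC, shown to illustrate
how tracing tokens can ride along with requests. All type, field, and
method names here are illustrative, and the handler body is elided to
its essentials:

    // Illustrative argument/reply types for a frontend-to-storage put.
    type PutArgs struct {
        Key   string
        Value string
        Token tracing.TracingToken // trace token carried with the request
    }

    type PutReply struct {
        Token tracing.TracingToken // trace token handed back to the caller
    }

    // Put sketches a storage-side net/rpc handler: resume the trace,
    // record actions, update memory and disk, and return the token.
    func (s *StorageRPCHandler) Put(args PutArgs, reply *PutReply) error {
        trace := s.tracer.ReceiveToken(args.Token)
        trace.RecordAction(StoragePut{Key: args.Key, Value: args.Value})
        // ... update the in-memory map and persist to disk here ...
        trace.RecordAction(StorageSaveData{Key: args.Key, Value: args.Value})
        reply.Token = trace.GenerateToken()
        return nil
    }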
Storage node failure and recovery
This assignment features storage failures. There are no kvslib and
frontend failures. Only the storage node will fail (you may assume
that failure here means the process is killed). The storage node can fail
at any time. A storage node may later restart. On restart the
storage node should always be able to recover any stored state
from disk and continue servicing operations. You can use RPCs to
detect storage failures from the frontend. If an RPC from the
frontend node to the storage node fails, then the frontend must
retry exactly once before considering the storage node to have
failed. The frontend should retry after waiting
for storageTimeout seconds,
where storageTimeout is a user-defined
parameter. If the retry request fails, the frontend must let the
corresponding kvslib instance know about the failure. The kvslib
should not perform any retries.
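A hedged sketch of this retry-once logic at the frontend is below;
the tryStorage helper, the call closure, and errStorageFailed are
illustrative, not prescribed:

    import (
        "errors"
        "time"
    )

    var errStorageFailed = errors.New("storage node failed")

    // tryStorage issues one storage RPC attempt and, on failure, waits
    // storageTimeout seconds and retries exactly once. A second failure
    // means the frontend considers the storage node failed and must
    // report the failure to the kvslib (which never retries).
    func (f *FrontEnd) tryStorage(call func() error) error {
        if err := call(); err == nil {
            return nil
        }
        time.Sleep(time.Duration(f.storageTimeout) * time.Second)
        if err := call(); err == nil {
            return nil
        }
        return errStorageFailed
    }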
Your KVS state must be durable. This means that the storage node
must keep track of put requests and save them (or the updated state) to
disk before the client receives a response. When a storage node
is initialized, it should check if there is state on disk (if not,
it should initialize empty state) in the directory specified by the
configuration, and reload the state of the KVS as it was prior to the
failure. The following diagram illustrates the interaction between
storage and disk (without retries):
A couple of points to note in the above diagram:
- There is a notion of initializing the disk state. The
  implementation of this initialization will depend on how you
  decide to use the disk.
- The diagram illustrates one way to implement durability: reply
  to a put request from the storage node only once the write has
  persisted (sketched below). There are other, more complex schemes
  that may give you a performance boost.
- After restart, in-memory state must be completely recovered
  before any new requests are serviced. This, again, is one way to
  ensure correctness, but not the only way.
- The failure may last an arbitrary amount of time. A restart of
  the storage node is not guaranteed.
- Note that the storage node may fail during disk initialization
  and while it is writing to the disk. The diagram does not
  illustrate these scenarios, but your implementation must handle
  them. The storage node should always be able to start
  successfully and it should not lose data.
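One simple durability scheme consistent with the diagram (a sketch
only, assuming the whole KVS fits comfortably in a single file; file
names are illustrative, and finer-grained or faster schemes are
possible): serialize the in-memory map and atomically replace the
on-disk copy via a temp-file rename, and on start load the file if it
exists.

    // imports: encoding/json, io/ioutil, os, path/filepath

    // saveToDisk persists the whole KVS state. Writing to a temp file
    // and renaming it over the real file means a crash mid-write leaves
    // either the old or the new state, never a torn file. (fsync is
    // omitted for brevity; a real implementation may want it.)
    func saveToDisk(diskPath string, state map[string]string) error {
        data, err := json.Marshal(state)
        if err != nil {
            return err
        }
        tmp := filepath.Join(diskPath, "kvs.json.tmp")
        if err := ioutil.WriteFile(tmp, data, 0644); err != nil {
            return err
        }
        return os.Rename(tmp, filepath.Join(diskPath, "kvs.json"))
    }

    // loadFromDisk recovers state on (re)start; a missing file means a
    // first run, so empty state is initialized instead.
    func loadFromDisk(diskPath string) (map[string]string, error) {
        data, err := ioutil.ReadFile(filepath.Join(diskPath, "kvs.json"))
        if os.IsNotExist(err) {
            return make(map[string]string), nil
        } else if err != nil {
            return nil, err
        }
        state := make(map[string]string)
        err = json.Unmarshal(data, &state)
        return state, err
    }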
Below is an end-to-end view of storage failure. This diagram includes
retry actions, but omits some of the above details:
- The above diagram illustrates a corner case in which the kvslib
  observes a failure of a put operation. However, later, the
  kvslib observes that the put failure was actually a
  success. Yes, this is correct behavior. If you would like extra
  credit (see below), you can build your system to eliminate this
  type of inconsistent behavior (warning: this is hard).
Non-kvslib API specifications
As in previous assignments, you will need to complete a client-side
library (kvslib) that calls the frontend node and performs the
operations specified by the client, as well as the frontend and
storage node implementations. The kvslib API was described earlier.
Unlike previous assignments, the frontend and storage nodes also
have APIs that you must follow. These are simple APIs that
initialize and start the frontend/storage daemons. We will use
these to start your frontend/storage instances.
- Frontend
  err ← Start(
      clientAPIListenAddr string,
      storageAPIListenAddr string,
      storageTimeout uint8,
      ftrace *tracing.Tracer
  )
  When successful, this call should not return. The frontend
  server should run until the program is terminated. This
  call should return an error if there are networking issues
  (e.g., incorrect ip:port given) or other unrecoverable
  issues.
  clientAPIListenAddr is the IP:port where the frontend
  will expect clients to connect to it.
  storageAPIListenAddr is the IP:port where the frontend
  will expect the storage node to connect to it. Notice that,
  unlike in previous assignments, the frontend node does not know
  and cannot connect to the storage node; the storage node must
  connect to the frontend instead.
  storageTimeout is the number of seconds to
  wait before retrying a failed storage RPC.
  ftrace is a frontend tracer instance;
  use this to (1) create a new distributed trace instance
  using Tracer.CreateTrace(), and (2) record local frontend
  actions in the created trace that are not associated with
  any particular client request (see below).
- Storage
  err ← Start(
      frontEndAddr string,
      storageAddr string,
      diskPath string,
      strace *tracing.Tracer
  )
  When successful, this call should not return. The storage
  server should run until the program is terminated. This
  call should return an error if there are networking issues
  (e.g., incorrect ip:port given), storage issues (e.g.,
  permission errors), or other unrecoverable issues.
  frontEndAddr is the IP:port of the frontend node that this
  storage node should connect to.
  storageAddr is the local IP:port that this storage
  node should use to connect to the frontend node.
  diskPath is a file system path where the
  storage node can create/read/write files. You can assume
  that this directory is empty when the storage node is run
  for the first time.
  strace is a storage tracer instance; use this
  to (1) create a new distributed trace instance
  using Tracer.CreateTrace(), and (2) record local storage
  actions in the created trace that are not associated with
  any particular client request (see below).
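For reference, here is a hedged skeleton of how a Start
implementation might set up its RPC server with Go's net/rpc; the
handler type and its fields are illustrative, and the storage-facing
listener on storageAPIListenAddr is elided:

    // Start sketches the frontend daemon loop: register an RPC handler,
    // listen on the client-facing address, and serve forever.
    func Start(clientAPIListenAddr string, storageAPIListenAddr string,
        storageTimeout uint8, ftrace *tracing.Tracer) error {
        handler := &FrontEndRPCHandler{storageTimeout: storageTimeout, tracer: ftrace}
        server := rpc.NewServer()
        if err := server.Register(handler); err != nil {
            return err
        }
        listener, err := net.Listen("tcp", clientAPIListenAddr)
        if err != nil {
            return err // e.g., an incorrect ip:port
        }
        // A real implementation would also accept the storage node's
        // connection on storageAPIListenAddr.
        server.Accept(listener) // blocks; Start does not return on success
        return nil
    }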
RPC specifications sketch
None. You get to design your own RPCs.
Distributed tracing semantics
As in A4, all actions that you record in this assignment will be
part of distributed tracing. This means you need to use the updated
trace-based API for recording
actions: Trace.RecordAction(action). The diagram above
illustrates all of the actions described below. However,
this diagram only illustrates the simple case with no concurrent
operations. Additionally, the diagram does not show values for opIds, which are
necessary to verify the monotonicity properties stated informally above
(and more formally below).
As with A4, keep in mind that all ordering constraints are
evaluated according to the happens before relation (i.e., using
vector clocks), not physical clock timestamps. Below we use the
term 'trace' to refer to a trace that was created by a client during
a put or a get operation. We will refer to frontend trace (or
ftrace) or storage trace (strace) when discussing actions recorded
as part of the frontend/storage traces, which are distinct from
client traces (though they are part of the same logical timespace).
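To tie the pieces together, here is a hedged sketch of the
per-operation trace/token lifecycle for a Put, from the kvslib's
perspective. The RPC method and argument names (FrontEnd.Put,
PutArgs, PutReply) are illustrative; the tracing calls are from the
tracing library's token API:

    trace := tracer.CreateTrace()
    trace.RecordAction(KvslibPut{ClientId: clientId, OpId: opId, Key: key, Value: value})
    args := PutArgs{Key: key, Value: value, Token: trace.GenerateToken()}
    var reply PutReply
    if err := rpcClient.Call("FrontEnd.Put", args, &reply); err == nil {
        // Resume the trace from the token the frontend sent back.
        trace = tracer.ReceiveToken(reply.Token)
        trace.RecordAction(KvslibPutResult{OpId: opId, Err: false})
    }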
Actions to be recorded by kvslib
KvslibBegin{clientId} : signifies the start/initialization of a client's kvslib. For each unique ClientId corresponding to a running client,
the tracing log as a whole should contain this action exactly once, as part of the kvslib trace.
This action should happen-before all other actions recorded by the client with ClientId.
KvslibPut{clientId,opId,key,value} : For a given ClientId, Key and Value, kvslib should record this action just before sending a Put operation to the frontend.
- A trace must contain this action, or a KvslibGet action, exactly once
before either of the corresponding KvslibPutResult and KvslibComplete.
- This action must happen-before a corresponding
FrontEndPut and FrontEndPutResult, which must refer to the same key-value pair.
- A trace must contain this action before
StoragePut and StorageSaveData.
KvslibGet{clientId,opId,key} : For a given ClientId and Key, kvslib should record this action just before sending a Get operation to the frontend.
- A trace must contain this action, or a KvslibPut action, exactly once
before either of the corresponding KvslibGetResult and KvslibComplete.
- This action must happen-before a corresponding
FrontEndGet and FrontEndGetResult, which must refer to the same key.
- A trace must contain this action before
StorageGet and StorageGetResult.
KvslibPutResult{opId,err} : kvslib should record this action just after receiving the Put results from frontend.
Here err is a boolean, per storageFail in the ResultStruct.
- If KvslibPut is present, a trace must contain this action exactly once
after KvslibBegin and KvslibPut.
- A trace must contain this action after corresponding
FrontEndPut and FrontEndPutResult.
- A trace must contain this action after StorageSaveData.
- Put results must satisfy the monotonic write condition.
  Assume t1 and t2 are two put-related traces writing to the same
  key and generated by the same client. Let t1 contain
  KvslibPutResult(opId1,err:false) and let t2 contain
  KvslibPutResult(opId2,err:false) such that opId1 < opId2. Then
  t1 must contain a StorageSaveData and t2 must contain a
  StorageSaveData' such that StorageSaveData happens-before
  StorageSaveData'.
KvslibGetResult{opId,key,value,err} : kvslib should record this action just after receiving the Get results from frontend.
Here value is the value of the key key. If Err is true (operation failed), the value must be nil.
value will have the actual value of key key only when Err is false (operation succeeded).
- If KvslibGet is present, a trace must contain this action exactly once
after KvslibBegin and KvslibGet.
- If Err is false and Value is not nil, then the most recent
  instance of StorageLoadSuccess{state} (if it exists) or the
  most recent StorageSaveData{key,_} (if it exists) must
  associate key with value. At least one of these
  StorageLoadSuccess and StorageSaveData actions must exist.
  If both exist, the latest one (the one that happens-after the
  other) is considered, and shall be called the latest relevant
  store action.
- If Err is false and Value is nil, then there must be neither
  an instance of StorageLoadSuccess{state} nor a
  StorageSaveData that associates key with any value in the
  past.
- A trace must contain this action after corresponding
FrontEndGet and FrontEndGetResult.
- A trace must contain this action after
StorageGet and StorageGetResult.
- Get results must satisfy the monotonic read condition.
  Assume t1 and t2 are two get-related traces generated by
  the same client. Let t1 contain
  KvslibGetResult(opId1,key,value,err:false) and let t2
  contain KvslibGetResult(opId2,key,value',err:false) such
  that opId1 < opId2.
  As specified in the second point above, each of these
  KvslibGetResults must be associated with a latest relevant store
  action, associating key with value and with value', respectively.
  Given this context, either value' must be equal to value, or the
  latest relevant store action associated with
  KvslibGetResult(opId1,key,value,err:false) must happen-before the
  latest relevant store action associated with
  KvslibGetResult(opId2,key,value',err:false).
KvslibComplete{clientId} : For a specific clientId, kvslib records this action to mark the end of a client session,
corresponding to the successful completion of KVS.Close() .
For each ClientId corresponding to a running client, the tracing log as a whole should contain this action exactly once,
as part of the kvslib trace.
This action should happen-after all other actions reported by a given clientId.
For each client, OpIds are unique: any two traces from the
same client that contain opIds must have distinct opId values.
Actions to be recorded by frontend
FrontEndStorageStarted{} : Denotes that the frontend has
detected the storage node as being started or restarted. This
action should only be recorded in the frontend trace.
- This action should alternate with
FrontEndStorageFailed. No two FrontEndStorageStarted should
occur without an intermediate FrontEndStorageFailed.
- This action should be recorded only if the frontend has
successfully executed an RPC against the storage node.
- This action must be recorded before any
FrontEndPutResult(err=false) action.
FrontEndStorageFailed{} : Denotes that the frontend has
detected the storage node as failed. This action should only be
recorded in the frontend trace.
- This action should alternate with
FrontEndStorageStarted. No two FrontEndStorageFailed actions
should occur without an intermediate FrontEndStorageStarted.
- This action should be recorded if the frontend has (1) failed
  to successfully execute an RPC against the storage node, (2)
  re-attempted the failed RPC after a timeout, and (3) seen the
  re-attempted RPC fail as well.
- This action must be recorded before any
FrontEndPutResult{Err: true} or FrontEndGetResult{Err: true} action.
Additionally, there must be no FrontEndStorageStarted{} that happens-after this FrontEndStorageFailed{}
and happens-before the FrontEndPut/GetResult{Err: true}.
FrontEndPut{key,value} : In a given trace and with a key & value pair,
the frontend records this action just after receiving a Put operation from a client.
- A trace must contain this action exactly once
after KvslibBegin and KvslibPut, should KvslibPut be present.
- A trace must contain this action before FrontEndPutResult.
- A trace must contain this action before any
StoragePut and StorageSaveData, if they are present.
If they are present, these subsequent actions must refer to the same key,value pair.
FrontEndPutResult{err} : The frontend must record this action after receiving Put results from storage node.
Here err is a boolean, per storageFail in the ResultStruct.
- A trace must contain this action exactly once
after KvslibBegin and KvslibPut, should KvslibPut be present.
- A trace must contain this action after a corresponding
FrontEndPut.
- A trace must contain this action after
StoragePut and StorageSaveData.
- This action should only be recorded with err set to
  true if the frontend has detected storage as failed and has
  (unsuccessfully) retried the operation. A record of this
  action indicates that kvslib will record a corresponding
  later (happens-after) KvslibPutResult with err set to true.
FrontEndGet{key} : In a given trace and with a key key,
frontend records this action just after receiving a Get operation from a client.
- A trace must contain this action exactly once
after KvslibBegin and KvslibGet, should KvslibGet be present.
- A trace must contain this action before FrontEndGetResult.
- A trace must contain this action before
any instances of StorageGet and StorageGetResult.
Any of these subsequent actions should refer to the same key.
FrontEndGetResult{key,value,err} : The frontend should record this action just after receiving the Get results from storage.
Here value is the value of the key key. If err is true (operation failed), the value will be nil.
value will have the actual value of key key only when err is false (operation succeeded).
- A trace must contain this action exactly once
after KvslibBegin and KvslibGet, should KvslibGet be present.
- A trace must contain this action after corresponding
FrontEndGet.
- A trace must contain this action after
  StorageGet and StorageGetResult, and, if err is set to false,
  this action should report the same key,value pair as the StorageGetResult.
- This action should only be recorded with err set to
  true if the frontend has detected storage as failed and has
  (unsuccessfully) retried the operation. A record of this
  action indicates that kvslib will record a corresponding
  later (happens-after) KvslibGetResult with err set to true.
Actions to be recorded by storage
StorageLoadSuccess{state} : The storage records this action just
after successful loading of Key-Value state from disk (or
initializing this state, if it does not exist). It should report
all key-value pairs that were loaded as state, which is a
Go map.
- This should be the first action recorded by the storage
node on start (or restart) and it should be recorded only in
the storage trace.
StoragePut{key,value} : For a given key-value pair key & value, the storage node records this action
just after receiving a Put operation from frontend.
- A trace must contain this action between zero and two times,
after KvslibBegin and KvslibPut, should KvslibPut be present.
It may only appear zero times if the storage node is not running or has failed during the put operation; it may only appear
more than once if the two occurrences are separated by a StorageLoadSuccess.
- A trace must contain this action after FrontEndPut and before FrontEndPutResult.
- Any instances of this action in a trace must happen-before at least one instance of StorageSaveData in the same trace.
StorageSaveData{key,value} : The storage records this action just after durably adding/updating the put value on disk.
- A trace must contain this action between zero and two times,
after KvslibBegin and KvslibPut, should KvslibPut be present.
It may only appear zero times if the storage node is not running or has failed during the put operation;
it may only appear more than once if the two occurrences are separated by a StorageLoadSuccess.
- A trace must contain this action after FrontEndPut and before respective FrontEndPutResult.
- A trace must contain this action after some instance of StorageLoadSuccess, and this action must happen-after
at least one instance of StoragePut in the same trace.
- Effect: across all traces, all instances of StorageGetResult for which this action is the latest relevant store action
must report the same key,value pair.
StorageGet{key} : For a given key key, the storage node records this action
just after receiving a Get operation from frontend.
- A trace must contain this action between zero and two times,
after KvslibBegin and KvslibGet, should KvslibGet be present.
It may only appear zero times if the storage node is not running or has failed during the get operation;
it may only appear more than once if the two occurrences are separated by a StorageLoadSuccess.
- A trace must contain this action after FrontEndGet and before FrontEndGetResult.
- A trace must contain this action after
some instance of StorageLoadSuccess, and this action must happen-before at least one instance of StorageGetResult in the same trace.
StorageGetResult{key,value} : The storage records this action just after a Get on the in-memory key-value store.
Here value is the value of the key key. If key does not exist in the key-value map, then
value must be nil. Otherwise, value will have the actual value of key key.
- A trace must contain this action between zero and two times,
after KvslibBegin and KvslibGet, should KvslibGet be present.
It may only appear zero times if the storage node is not running or has failed during the get operation;
it may only appear more than once if the two occurrences are separated by a StorageLoadSuccess.
- Any instance of this action must report the same value for the given key as the
latest relevant store action associated with the same key.
- A trace must contain this action after FrontEndGet and before FrontEndGetResult.
- A trace must contain this action after
  some instance of StorageLoadSuccess, and this action must happen-after at least one instance of StorageGet in the same trace.
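Putting the storage-side get actions together, here is a hedged
handler sketch (type and field names are illustrative; note that gets
are served purely from the in-memory map, never from disk):

    // Get sketches the storage-side handler for a get: record
    // StorageGet, consult the in-memory map, record StorageGetResult.
    func (s *StorageRPCHandler) Get(args GetArgs, reply *GetReply) error {
        trace := s.tracer.ReceiveToken(args.Token)
        trace.RecordAction(StorageGet{Key: args.Key})
        s.mu.Lock()
        v, ok := s.memory[args.Key]
        s.mu.Unlock()
        if ok {
            reply.Value = &v // v is a fresh local copy per call
        } // otherwise reply.Value stays nil: no previous put on this key
        trace.RecordAction(StorageGetResult{Key: args.Key, Value: reply.Value})
        reply.Token = trace.GenerateToken()
        return nil
    }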
Assumptions you can and cannot make
- You can assume that the frontend will start first, followed by
  the storage node, followed by some number of clients. (But
  remember that the storage node may fail and later restart at any
  time and any number of times.)
- Except for the frontend retry value of storageTimeout
  seconds, you should not make any timing assumptions about how
  long RPCs/communication may take over the network (i.e., assume
  an asynchronous environment).
Implementation notes
The provided starter code contains four applications. Your task is
to implement the frontend app, the storage app, and the kvslib
that will compile with a client application.
Like A3 and A4, A5 also has a cmd/ directory which
contains each application in a separate directory. The four
applications are the client, the frontend, the storage node, and
the tracing server.
While you should be making changes to cmd/frontend/main.go
and cmd/storage/main.go,
you also need to make changes to the client application in cmd/client/main.go;
all client implementation-related changes should occur in kvslib/kvslib.go.
The kvslib/ directory contains the
kvslib package. It defines a KVS struct with stub methods you should implement.
This exact API is required, and will be relied upon for grading.
You may, however, add any fields you need to the KVS struct, as well
as any private helper methods.
The config/ directory contains the
configuration files for the client, storage, frontend, and tracing server.
The storage.go and
frontend.go contain the
configuration and tracing action structs, while
client.go implements a client
based on the kvslib API.
We provide you with a Makefile
to easily build and compile all the above applications.
The go.mod file contains dependency
tracking metadata, including a dependency on
https://github.com/DistributedClocks/tracing,
the tracing library.
The go.sum file contains
auto-generated metadata (checksums for module dependencies).
You should not need to touch or understand this file.
Testing
We will release some smoke tests for you to test your tracing log
files. As before, we recommend that you test your system across a
variety of deployments. One of the best ways to test your system is
to use the tracing output and assert that events are occurring in
the expected order.
Implementation requirements
- The client code must be runnable on CS ugrad machines and be
compatible with Go version 1.15.6.
- Your code must be compatible with the given APIs, and run
successfully against the provided initial code for client, frontend,
storage nodes.
- Your solution can only
use standard library Go
packages.
- Your solution code must be Gofmt'd
using gofmt.
Handin instructions
As with previous assignments, you will use your personal repository
for this assignment under the
CPSC416-2020W-T2
org. Your repository will look
like cpsc416_a5_USERNAME1_USERNAME2 .
We will make the starter code available as part of your handin
repository. Keep the provided file layout at the top-level in
your solution. You may include additional files, but we do not expect
this will be necessary. Do not reference any additional libraries in
your solution.
Rough grading scheme
Your code must not change the API,
tracing structs and applications' configurations and command-line
arguments. Your code must work on ugrad servers. If any of these are
violated, your mark for this assignment is 0. This is true regardless
of how many characters had to be changed to make your solution
compile, and regardless of how well your solution works with a
different API or on a different machine.
Both partners in a team will receive the same assignment mark. That
is, we will not be doing any accounting of who contributed what to the
team's solution.
The high-level A5 mark breakdown looks like this:
- 20% : Non-concurrent put/get requests without faults
- 30% : Concurrent put/get requests without faults
- 30% : Correct failure semantics/recovery at storage node
- 10% : Correct failure semantics in kvslib and frontend
- 10% : Recovery in complex fault scenarios
Please make sure that your code is structured according to our
instructions. We will deduct 10% if we have to move, rename, or edit
files in your repository to make your project work with our
autograding scripts.
Performance bonuses. Do you want to earn more points? It's
easy: just make your highly concurrent, fault-tolerant,
multi-node system fast. How fast? You have to be faster than 90% of
the class. That's only 126 other students, or 63 other A5
groups. There are four performance bonuses:
- Get throughput (5% on top of A5 mark): what's the highest concurrent
get request rate that your system can sustain from multiple
concurrent clients? Assume a 100% get workload.
- Put throughput (5% on top of A5 mark): what's the highest concurrent
put request rate that your system can sustain from multiple
concurrent clients? Assume a 100% put workload.
- P99 get latency (5% on top of A5 mark): what's your system's 99th
  percentile end-to-end get request response time?
- P99 put latency (5% on top of A5 mark): what's your system's 99th
percentile end-to-end put request response time?
Everyone's solutions will be evaluated for a bonus. To qualify,
your solution must be able to take nil values for tracer/trace
instances in the API. That is, we will only run performance
tests on your system if your system can disable tracing (not crash
with nil values for tracer/trace).
To earn a bonus you must respect the other constraints: (1) no
caching of key-value state in the kvslib or in the frontend, (2) no
skipping of fault tolerance logic, (3) no assumptions about when the
storage node fails, (4) no assumptions about network delay/timing,
(5) no changes to APIs or configuration files.
We will not release the performance tests we will use. We will also
not release the distribution of key/value sizes nor their pattern.
We will also not release our definition of sustain for
throughput, though you can think of it as the max request rate that
your system achieves before hitting a knee in the performance curve.
Extra credit: 2% of final course mark
An earlier diagram depicted a case where the storage state is
inconsistent with what the client observes. Specifically, the system
is allowed to fail a client operation due to storage node failure,
even though, internally, the operation succeeded. On storage restart
this inconsistency becomes visible to the client.
To earn this extra credit you must eliminate this inconsistency
problem. You must do so while respecting the other constraints: (1) no
caching of key-value state in the kvslib or in the frontend, (2) no
arbitrary delaying of client operations (clients should observe
storage node failures), (3) no assumptions about when the storage node
fails, (4) no assumptions about network delay/timing, (5) no changes
to APIs or configuration files, etc.
You must explicitly ask for extra credit evaluation. Please post a
private piazza post for this. The extra credit will be marked in a
three-step process. First, you must compose and share a short document
that reviews your solution. Second, you must schedule to meet with
Ivan and do a code review to illustrate how your solution
works. Third, at the end of your code review session, you should do a
demo that illustrates your solution in practice.
Make sure to follow the
course collaboration policy and
refer to the submission
instructions that detail how to submit your solution.
Note that this is a group assignment requiring two people. We strongly
recommend that you work in a team. Please find a partner using post @5
on Piazza.