control_flow
Modules
SlowState #(T, int DELAY)
module SlowState #(T, int DELAY) {
    domain data
    trigger is_stable'0 : T old'0
    action set_slow'0 : T new_value_slow'DELAY
    action set_fast'0 : T new_value_fast'0
    domain reset
    action rst
}
Allows us to update a blob of state with a pipelined update function.
Because the new value for the register takes DELAY cycles to arrive,
the throughput of this module is at best one update per DELAY+1 cycles.
When slow_state.is_stable fires, it provides the currently stored value.
From this value, you can compute a new one and store it with slow_state.set_slow(new_value_slow).
Note that the decision to call set_slow must be made in the same cycle in which is_stable fires.
Only new_value_slow itself may arrive later; it may take as many cycles as desired, as set by DELAY.
Once set_slow has been called, is_stable disengages until new_value_slow has been computed.
You can also set the state immediately with slow_state.set_fast(new_value_fast).
In this case, the new value is visible on is_stable immediately.
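The throughput bound can be checked with a small cycle-level model. The sketch below is plain Python, not SUS, and the name count_slow_updates is made up for illustration. It assumes that a value passed to set_slow becomes visible on is_stable DELAY+1 cycles after the call, and that the user calls set_slow as soon as is_stable fires.

```python
def count_slow_updates(delay: int, num_cycles: int) -> int:
    """Count the set_slow calls accepted within num_cycles (behavioural model).

    Assumption: after a set_slow call, is_stable stays disengaged for `delay`
    cycles and fires again in the cycle after that.
    """
    cycles_until_stable = 0  # 0 means is_stable fires in this cycle
    updates = 0
    for _ in range(num_cycles):
        if cycles_until_stable == 0:
            updates += 1                 # set_slow is called in the same cycle
            cycles_until_stable = delay  # the new value is still in flight
        else:
            cycles_until_stable -= 1
    return updates
```

With delay = 2, count_slow_updates(2, 30) returns 10: one update every 3 cycles, matching the DELAY+1 bound.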
Example, taken from some batching memory reading code. In this code, the goal was
to make sequential double (8 bytes) requests from memory, based on the counts passed via the sizes_fifo.
In this example, the SlowState takes one pipeline stage for the sizes_fifo.pop(),
and one for adding (size * 8) to the current address. This means it has a maximum rate of 1 update per 3 cycles.
SlowState#(T: type int#(FROM: 0, TO: 1 << 64)) current_addr
action set_new_addr : int#(FROM: 0, TO: 1 << 64) new_addr {
    current_addr.set_fast(new_addr)
}
FIFO sizes_fifo
input bool may_request_read
trigger request_read: int addr, int count
when sizes_fifo.may_pop & may_request_read {
    when current_addr.is_stable : int cur_addr {
        int size = sizes_fifo.pop() // pop()'s output may have some pipeline stages
        request_read(cur_addr, size)
        // Add a pipeline stage for next_addr. 64-bit adds are *expensive*
        reg int next_addr = (cur_addr + size * 8) mod (1 << 64)
        current_addr.set_slow(next_addr)
    }
}
SlowStateAdvanced #(T, T RESET_TO, int OLD_DELAY, int NEW_DELAY)
module SlowStateAdvanced #(T, T RESET_TO, int OLD_DELAY, int NEW_DELAY) {
    clock clk
    domain clk
    action rst
    domain data
    output state T old'-OLD_DELAY
    output bool may_update'0
    action update'0 : T new'NEW_DELAY
}
Allows us to update a blob of state with a pipelined update function.
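update() may only be called while may_update is high. Calling it makes may_update go low for UPDATE_PIPELINE_DEPTH cycles. UPDATE_PIPELINE_DEPTH is not among the parameters listed above; it presumably denotes the latency of the update pipeline, which OLD_DELAY and NEW_DELAY describe.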
SlowPipelineEnd #(T, int PIPELINE_DEPTH)
module SlowPipelineEnd #(T, int PIPELINE_DEPTH) {
    domain main
    input T din'PIPELINE_DEPTH
    action input_changed'0
    trigger pipeline_stable'0 : T dout'0
    domain reset
    action rst
}
This module lets you wait until all values within a pipeline have stabilized, and use the results as if the pipeline completed instantaneously.
As opposed to SlowPipelineBegin, this module is meant to go at the end of your pipeline.
Usage:
state float cur_sum'0
SlowPipelineEnd manage_adder
float next_sum'11 = fp32_add(cur_sum, 1.0) // Takes 11 cycles
manage_adder.din = next_sum
when manage_adder.pipeline_stable : float new_sum {
    cur_sum = new_sum
    manage_adder.input_changed()
}
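The idea of waiting for a pipeline to stabilize can be illustrated with a small behavioural model. The sketch below is plain Python rather than SUS. The name run_accumulator and the exact timing are assumptions made for illustration: a result is accepted only once the value leaving the pipeline was computed from the current cur_sum. It is not a description of the module's exact cycle behaviour.

```python
from collections import deque


def run_accumulator(pipeline_depth, num_cycles):
    """Model cur_sum += 1.0 through an adder with pipeline_depth >= 1 cycles of latency.

    Returns the final sum and the list of cycles in which the sum was updated.
    """
    cur_sum = 0.0
    pipe = deque([None] * pipeline_depth)  # in-flight adder results
    age = 0  # cycles for which the current cur_sum has already been fed in
    update_cycles = []
    for cycle in range(num_cycles):
        out = pipe.pop()                # result that entered pipeline_depth cycles ago
        pipe.appendleft(cur_sum + 1.0)  # feed the current input into the adder
        if age >= pipeline_depth:
            # Every stage now holds a result computed from the current cur_sum,
            # so `out` is valid: accept it and restart the wait.
            cur_sum = out
            update_cycles.append(cycle)
            age = 0
        else:
            age += 1
    return cur_sum, update_cycles
```

With an 11-cycle adder, run_accumulator(11, 36) accepts a new sum in cycles 11, 23 and 35, so under these assumptions the loop performs one update every 12 cycles.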
SlowPipelineBegin #(T, int PIPELINE_DEPTH)
module SlowPipelineBegin #(T, int PIPELINE_DEPTH) {
    domain main
    input T din'0
    action input_changed'0
    output T dout'-PIPELINE_DEPTH
    trigger pipeline_stable'0
    domain reset
    action rst
}
This module lets you wait until all values within a pipeline have stabilized, and use the results as if the pipeline completed instantaneously.
As opposed to SlowPipelineEnd, this module is meant to go at the beginning of your pipeline.
Usage:
state float cur_sum'0
SlowPipelineBegin manage_adder
manage_adder.din = cur_sum
float next_sum'11 = fp32_add(manage_adder.dout, 1.0) // Takes 11 cycles
when manage_adder.pipeline_stable {
    cur_sum = next_sum
    manage_adder.input_changed()
}
ParallelWhile #(int COMPUTATION_LATENCY, int REQUEST_DATA_LATENCY)
module ParallelWhile #(int COMPUTATION_LATENCY, int REQUEST_DATA_LATENCY) {
    clock clk
    domain clk
    domain iter_domain
    trigger iter'0 : int #(FROM: 0, TO: TOTAL_CYCLES) cur_iter'0
    action continue'COMPUTATION_LATENCY
    trigger may_start'-REQUEST_DATA_LATENCY
    action start'0
    domain reset
    action rst
}
This module lets you create pipelined feedback loops of variable latency.
Use as follows:
FIFO input_fifo
trigger outflow: int assoc_data
ParallelWhile pw
ParallelState cur_iter
ParallelStore associated_data
when pw.may_start {
    // gather up data
    when input_fifo.may_pop {
        int num_iters = input_fifo.pop()
        cur_iter.init(num_iters)
        associated_data.init(num_iters)
        pw.start()
    }
}
when pw.iter : int iteration {
    cur_iter.link(iteration)
    associated_data.link(iteration)
    int num_iters_left = cur_iter.old
    when num_iters_left > 0 {
        reg reg reg cur_iter.new = UnsafeIntCast#(FROM: 0, TO)(num_iters_left - 1)
        pw.continue()
    } else {
        outflow(associated_data.old)
    }
}
Results do not necessarily arrive in the same order as the inputs, because some inputs take more iterations to complete than others.
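The reordering can be seen in a small behavioural model. The sketch below is plain Python rather than SUS, and it rests on assumptions that the text above does not confirm: loops are modelled as occupying a fixed number of slots that are served round-robin, one slot per cycle, and a free slot accepts a new input on its turn. The name parallel_while and the num_slots parameter are made up for illustration.

```python
from collections import deque


def parallel_while(iter_counts, num_slots):
    """Return the input indices in the order in which their loops finish."""
    pending = deque(enumerate(iter_counts))  # inputs waiting to be started
    slots = [None] * num_slots  # each busy slot holds [input_index, iterations_left]
    finished = []
    cycle = 0
    while pending or any(slot is not None for slot in slots):
        turn = cycle % num_slots  # the slot that is served in this cycle
        if slots[turn] is None:
            if pending:
                index, count = pending.popleft()
                slots[turn] = [index, count]  # corresponds to start()
        elif slots[turn][1] > 0:
            slots[turn][1] -= 1               # corresponds to continue()
        else:
            finished.append(slots[turn][0])   # corresponds to outflow(...)
            slots[turn] = None
        cycle += 1
    return finished
```

For example, parallel_while([3, 0, 1], 2) returns [1, 0, 2]: the second input needs no further iterations and leaves before the first one, which was started earlier.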
ParallelState #(T, int NUM_PARALLEL_STATES, int LATENCY)
module ParallelState #(T, int NUM_PARALLEL_STATES, int LATENCY) {
    action link'0 : int#(FROM: 0, TO: NUM_PARALLEL_STATES) cur_iter_id'0
    output T old'2
    action init'0 : T initial_data'0
    input T new'LATENCY
}
See ParallelWhile.
Similar to ParallelStore.
Must be initialized simultaneously with [ParallelWhile::start] (with [ParallelState::init]). Old data can be read from [ParallelState::old]. Data must be updated through [ParallelState::new] every cycle.
ParallelStore #(T, int NUM_PARALLEL_STATES)
module ParallelStore #(T, int NUM_PARALLEL_STATES) {
    action link'0 : int#(FROM: 0, TO: NUM_PARALLEL_STATES) cur_iter_id'0
    output T old'3
    action init'0 : T initial_data'0
}
See ParallelWhile.
Similar to ParallelState, but only initialized once (with [ParallelStore::init]), simultaneously with [ParallelWhile::start]. The stored data is returned every cycle on [ParallelStore::old].