control_flow

Modules

SlowState #(T, int DELAY)

module SlowState #(T, int DELAY) {
   domain data
    trigger is_stable'0 : T old'0
    action set_slow'0 : T new_value_slow'DELAY
    action set_fast'0 : T new_value_fast'0
   domain reset
    action rst
}

Lets you update a blob of state with a pipelined update function. Because the new value for the register takes DELAY cycles to arrive, the throughput of this module is at best one update per DELAY+1 cycles.

Check slow_state.is_stable to get the currently stored value. From there, you can update it with slow_state.set_slow(new_value_slow). The decision to call set_slow must be made in the same cycle in which is_stable fires, but the value new_value_slow may take as many cycles to compute as needed (this is DELAY). Once set_slow has been called, is_stable disengages until new_value_slow has been computed.

You can also immediately set the state with slow_state.set_fast(new_value_fast). In this case, the new value is visible on is_stable immediately.
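
The timing described above can be sketched as a small cycle-level software model. It is written in Python rather than in the hardware language of this page, and the names SlowStateModel, tick and count_updates are hypothetical. It assumes that after set_slow is called, is_stable stays low for DELAY cycles and fires again on the cycle after that, and that set_fast cancels any slow update still in flight; the text does not confirm either detail.

```python
class SlowStateModel:
    """Behavioural model of SlowState (an illustration, not the real module)."""

    def __init__(self, initial, delay):
        self.delay = delay
        self.value = initial
        self.pending = None
        self.busy = 0  # cycles left until is_stable fires again

    @property
    def is_stable(self):
        return self.busy == 0

    def set_slow(self, new_value):
        # The decision to update must be made while is_stable is high.
        assert self.is_stable, "set_slow called while an update is in flight"
        self.pending = new_value
        # Assumption: the value lands after `delay` cycles and is visible
        # on is_stable one cycle later, giving a period of delay + 1.
        self.busy = self.delay + 1

    def set_fast(self, new_value):
        # Visible immediately. Assumption: cancels a pending slow update.
        self.value = new_value
        self.pending = None
        self.busy = 0

    def tick(self):
        """Advance the model by one clock cycle."""
        if self.busy > 0:
            self.busy -= 1
            if self.busy == 0:
                self.value = self.pending
                self.pending = None


def count_updates(delay, cycles):
    """Increment the state as often as is_stable allows for `cycles` cycles."""
    state = SlowStateModel(initial=0, delay=delay)
    updates = 0
    for _ in range(cycles):
        if state.is_stable:
            state.set_slow(state.value + 1)
            updates += 1
        state.tick()
    return updates, state.value
```

With delay=2 the model accepts an update on cycles 0, 3 and 6, which matches the bound of one update per DELAY+1 cycles.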

Example, taken from batched memory-reading code. The goal was to issue sequential requests for doubles (8 bytes each) from memory, based on the counts passed via sizes_fifo. Here the SlowState spends one pipeline stage on sizes_fifo.pop() and one on adding (size * 8) to the current address, so DELAY is 2 and the maximum rate is 1 update per 3 cycles.

SlowState#(T: type int#(FROM: 0, TO: 1 << 64)) current_addr
 
action set_new_addr : int#(FROM: 0, TO: 1 << 64) new_addr {
    current_addr.set_fast(new_addr)
}
 
FIFO sizes_fifo
input bool may_request_read
trigger request_read: int addr, int count
when sizes_fifo.may_pop & may_request_read {
    when current_addr.is_stable : int cur_addr {
        int size = sizes_fifo.pop() // pop()'s output may have some pipeline stages
        request_read(cur_addr, size)
        // Add a pipeline stage for next_addr. 64-bit adds are *expensive*
        reg int next_addr = (cur_addr + size * 8) mod (1 << 64)
        current_addr.set_slow(next_addr)
    }
}

SlowStateAdvanced #(T, T RESET_TO, int OLD_DELAY, int NEW_DELAY)

module SlowStateAdvanced #(T, T RESET_TO, int OLD_DELAY, int NEW_DELAY) {
  clock clk
   domain clk
    action rst
   domain data
    output state T old'-OLD_DELAY
    output bool may_update'0
    action update'0 : T new'NEW_DELAY
}

Lets you update a blob of state with a pipelined update function. update() may only be called while may_update is high; after the call, may_update goes low for UPDATE_PIPELINE_DEPTH cycles.

SlowPipelineEnd #(T, int PIPELINE_DEPTH)

module SlowPipelineEnd #(T, int PIPELINE_DEPTH) {
   domain main
    input T din'PIPELINE_DEPTH
    action input_changed'0
    trigger pipeline_stable'0 : T dout'0
   domain reset
    action rst
}

This module lets you wait until all values within a pipeline have stabilized, and use the results as if the pipeline completed instantaneously.

As opposed to SlowPipelineBegin, this module is meant to go at the end of your pipeline.

Usage:

state float cur_sum'0
SlowPipelineEnd manage_adder
 
float next_sum'11 = fp32_add(cur_sum, 1.0) // Takes 11 cycles
manage_adder.din = next_sum
when manage_adder.pipeline_stable : float new_sum {
    cur_sum = new_sum
    manage_adder.input_changed()
} 
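
The usage above can be approximated by a cycle-level software model, written here in Python rather than in the hardware language of this page. The function name run_accumulator and its parameters are hypothetical. The model assumes that pipeline_stable fires once the pipeline input has been unchanged for PIPELINE_DEPTH cycles, and that input_changed() restarts that wait; the text does not spell out the exact timing.

```python
from collections import deque


def run_accumulator(pipeline_depth, cycles, increment=1.0):
    """Model `cur_sum += increment` through an adder with a fixed latency.

    Returns a list of (cycle, new_sum) pairs, one per accepted update.
    """
    cur_sum = 0.0
    # Values travelling through the adder pipeline, oldest on the left.
    in_flight = deque([None] * pipeline_depth)
    cycles_since_change = 0
    history = []
    for cycle in range(cycles):
        # The adder accepts the current input every cycle...
        in_flight.append(cur_sum + increment)
        # ...and emits the result for the input from pipeline_depth cycles ago.
        next_sum = in_flight.popleft()
        if cycles_since_change >= pipeline_depth:
            # pipeline_stable: every stage now holds a result of cur_sum.
            cur_sum = next_sum
            cycles_since_change = 0  # input_changed()
            history.append((cycle, cur_sum))
        else:
            cycles_since_change += 1
    return history
```

In this model, stale results that were computed from an older cur_sum are simply ignored until the pipeline has flushed, which is why the sum advances only once per PIPELINE_DEPTH+1 cycles.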

SlowPipelineBegin #(T, int PIPELINE_DEPTH)

module SlowPipelineBegin #(T, int PIPELINE_DEPTH) {
   domain main
    input T din'0
    action input_changed'0
    output T dout'-PIPELINE_DEPTH
    trigger pipeline_stable'0
   domain reset
    action rst
}

This module lets you wait until all values within a pipeline have stabilized, and use the results as if the pipeline completed instantaneously.

As opposed to SlowPipelineEnd, this module is meant to go at the beginning of your pipeline.

Usage:

state float cur_sum'0
SlowPipelineBegin manage_adder
manage_adder.din = cur_sum
 
float next_sum'11 = fp32_add(manage_adder.dout, 1.0) // Takes 11 cycles
when manage_adder.pipeline_stable {
    cur_sum = next_sum
    manage_adder.input_changed()
} 

ParallelWhile #(int COMPUTATION_LATENCY, int REQUEST_DATA_LATENCY)

module ParallelWhile #(int COMPUTATION_LATENCY, int REQUEST_DATA_LATENCY) {
  clock clk
   domain clk
   domain iter_domain
    trigger iter'0 : int #(FROM: 0, TO: TOTAL_CYCLES) cur_iter'0
    action continue'COMPUTATION_LATENCY
    trigger may_start'-REQUEST_DATA_LATENCY
    action start'0
   domain reset
    action rst
}

This module lets you create pipelined feedback loops of variable latency.

Use as follows:

FIFO input_fifo
trigger outflow: int assoc_data
 
ParallelWhile pw
ParallelState cur_iter
ParallelStore associated_data
when pw.may_start {
    // gather up data
    when input_fifo.may_pop {
        int num_iters = input_fifo.pop()
        cur_iter.init(num_iters)
        associated_data.init(num_iters)
        pw.start()
    }
}
 
when pw.iter : int iteration {
    cur_iter.link(iteration)
    associated_data.link(iteration)
    int num_iters_left = cur_iter.old
    when num_iters_left > 0 {
        reg reg reg cur_iter.new = UnsafeIntCast#(FROM: 0, TO)(num_iters_left - 1)
        pw.continue()
    } else {
        outflow(associated_data.old)
    }
}

Results may arrive in a different order than their inputs, because some inputs need more iterations than others and therefore stay in the loop longer.
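
The reordering can be illustrated with a small software model, written in Python rather than in the hardware language of this page. The function parallel_while and its slot-based layout are hypothetical. The model assumes the loop holds one in-flight item per cycle of loop latency and visits the slots round-robin; it is a sketch of the behaviour, not of the actual implementation.

```python
from collections import deque


def parallel_while(inputs, latency):
    """Model a pipelined feedback loop with `latency` interleaved slots.

    inputs: list of (tag, num_iters) pairs, in arrival order.
    Returns the tags in the order in which they leave the loop.
    """
    pending = deque(inputs)
    slots = [None] * latency  # one in-flight item per loop slot
    finished = []
    cycle = 0
    while pending or any(slot is not None for slot in slots):
        slot_id = cycle % latency  # the slot whose turn it is this cycle
        item = slots[slot_id]
        if item is not None:
            tag, iters_left = item
            if iters_left > 0:
                # Corresponds to pw.continue(): go around the loop again.
                slots[slot_id] = (tag, iters_left - 1)
            else:
                # Corresponds to outflow(...): the item leaves the loop.
                finished.append(tag)
                slots[slot_id] = None
        elif pending:
            # Corresponds to pw.start(): a free slot accepts a new item.
            slots[slot_id] = pending.popleft()
        cycle += 1
    return finished
```

For example, parallel_while([("a", 3), ("b", 0), ("c", 1)], latency=2) returns ["b", "a", "c"]: item "b" needs no extra iterations and overtakes "a", which entered the loop first.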

ParallelState #(T, int NUM_PARALLEL_STATES, int LATENCY)

module ParallelState #(T, int NUM_PARALLEL_STATES, int LATENCY) {
    action link'0 : int#(FROM: 0, TO: NUM_PARALLEL_STATES) cur_iter_id'0
    output T old'2
    action init'0 : T initial_data'0
    input T new'LATENCY
}

See ParallelWhile

Similar to ParallelStore

Must be initialized with [ParallelState::init] in the same cycle as [ParallelWhile::start]. Old data can be read from [ParallelState::old]. Data must be updated through [ParallelState::new] every cycle.

ParallelStore #(T, int NUM_PARALLEL_STATES)

module ParallelStore #(T, int NUM_PARALLEL_STATES) {
    action link'0 : int#(FROM: 0, TO: NUM_PARALLEL_STATES) cur_iter_id'0
    output T old'3
    action init'0 : T initial_data'0
}

See ParallelWhile

Similar to ParallelState, but only initialized once, simultaneously with [ParallelWhile::start]. The stored data is returned every cycle on [ParallelStore::old].