control_flow

Modules

SlowState #(T, int DELAY)

module SlowState #(T, int DELAY) {
   domain data
    trigger is_stable'0 : T old'0
    action set_slow'0 : T new_value_slow'DELAY
    action set_fast'0 : T new_value_fast'0
   domain reset
    action rst
}

Allows us to update a blob of state with a pipelined update function. Because the new value for the register takes DELAY cycles to arrive, the throughput of this module is at best one update per DELAY+1 cycles.

Check slow_state.is_stable to get the currently stored value. From this value, you can compute an update and submit it with slow_state.set_slow(new_value_slow). Note that the set_slow action itself must fire in the same cycle as is_stable; only its argument, new_value_slow, may take as many cycles as desired to compute. Once set_slow has fired, is_stable disengages until new_value_slow has been computed.
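To make the throughput claim concrete, here is a minimal behavioural sketch in Python (not the hardware description itself). The class name SlowStateModel, the tick() method and the helper count_updates are hypothetical, and the model assumes that the new value needs DELAY cycles to arrive plus one cycle to be stored. It only covers the set_slow path.

```python
from typing import Optional


class SlowStateModel:
    """Behavioural model of the set_slow path (illustrative only)."""

    def __init__(self, initial: int, delay: int) -> None:
        self.delay = delay
        self.value = initial
        self.pending: Optional[int] = None
        self.countdown = 0  # cycles until is_stable fires again

    def is_stable(self) -> Optional[int]:
        """Return the stored value, or None while an update is in flight."""
        return self.value if self.countdown == 0 else None

    def set_slow(self, new_value: int) -> None:
        """Submit an update; only legal in a cycle where is_stable fired."""
        if self.countdown != 0:
            raise RuntimeError("set_slow called while not stable")
        self.pending = new_value
        # Assumption: DELAY cycles for the value to arrive, plus one to store it.
        self.countdown = self.delay + 1

    def tick(self) -> None:
        """Advance the model by one clock cycle."""
        if self.countdown > 0:
            self.countdown -= 1
            if self.countdown == 0:
                self.value = self.pending


def count_updates(delay: int, cycles: int) -> int:
    """Update as often as possible and count the accepted updates."""
    state = SlowStateModel(initial=0, delay=delay)
    updates = 0
    for _ in range(cycles):
        current = state.is_stable()
        if current is not None:
            state.set_slow(current + 8)
            updates += 1
        state.tick()
    return updates
```

Under these assumptions, count_updates(2, 9) accepts updates in cycles 0, 3 and 6, which matches the rate of one update per DELAY+1 cycles.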

You can also immediately set the state with slow_state.set_fast(new_value_fast). In this case, the new value is visible on is_stable immediately.

The following example is taken from some batching memory-reading code. The goal was to make sequential double (8-byte) requests from memory, based on the counts passed via the sizes_fifo. Here, the SlowState takes one pipeline stage for the sizes_fifo.pop(), and one for adding (size * 8) to the current address. This means it has a maximum rate of 1 update per 3 cycles.

SlowState#(T: type int#(FROM: 0, TO: 1 << 64)) current_addr
 
action set_new_addr : int#(FROM: 0, TO: 1 << 64) new_addr {
    current_addr.set_fast(new_addr)
}
 
FIFO sizes_fifo
input bool may_request_read
trigger request_read: int addr, int count
when sizes_fifo.may_pop & may_request_read {
    when current_addr.is_stable : int cur_addr {
        int size = sizes_fifo.pop() // pop()'s output may have some pipeline stages
        request_read(cur_addr, size)
        // Add a pipeline stage for next_addr. 64-bit adds are *expensive*
        reg int next_addr = (cur_addr + size * 8) mod (1 << 64)
        current_addr.set_slow(next_addr)
    }
}

SlowStateAdvanced #(T, T RESET_TO, int OLD_DELAY, int NEW_DELAY)

module SlowStateAdvanced #(T, T RESET_TO, int OLD_DELAY, int NEW_DELAY) {
  clock clk
   domain clk
    action rst
   domain data
    output state T old'-OLD_DELAY
    output bool may_update'0
    action update'0 : T new'NEW_DELAY
}

Allows us to update a blob of state with a pipelined update function. update() may only be called while may_update is high, upon which may_update goes low for UPDATE_PIPELINE_DEPTH cycles.

SlowPipelineEnd #(T, int PIPELINE_DEPTH)

module SlowPipelineEnd #(T, int PIPELINE_DEPTH) {
   domain main
    input T din'PIPELINE_DEPTH
    action input_changed'0
    trigger pipeline_stable'0 : T dout'0
   domain reset
    action rst
}

This module lets you wait until all values within a pipeline have stabilized, and use the results as if the pipeline completed instantaneously.

As opposed to SlowPipelineBegin, this module is meant to go at the end of your pipeline.

Usage:

state float cur_sum'0
SlowPipelineEnd manage_adder
 
float next_sum'11 = fp32_add(cur_sum, 1.0) // Takes 11 cycles
manage_adder.din = next_sum
when manage_adder.pipeline_stable : float new_sum {
    cur_sum = new_sum
    manage_adder.input_changed()
}

SlowPipelineBegin #(T, int PIPELINE_DEPTH)

module SlowPipelineBegin #(T, int PIPELINE_DEPTH) {
   domain main
    input T din'0
    action input_changed'0
    output T dout'-PIPELINE_DEPTH
    trigger pipeline_stable'0
   domain reset
    action rst
}

This module lets you wait until all values within a pipeline have stabilized, and use the results as if the pipeline completed instantaneously.

As opposed to SlowPipelineEnd, this module is meant to go at the beginning of your pipeline.

Usage:

state float cur_sum'0
SlowPipelineBegin manage_adder
manage_adder.din = cur_sum
 
float next_sum'11 = fp32_add(manage_adder.dout, 1.0) // Takes 11 cycles
when manage_adder.pipeline_stable {
    cur_sum = next_sum
    manage_adder.input_changed()
}

ParallelWhile #(int COMPUTATION_LATENCY, int REQUEST_DATA_LATENCY)

module ParallelWhile #(int COMPUTATION_LATENCY, int REQUEST_DATA_LATENCY) {
  clock clk
   domain clk
   domain iter_domain
    trigger iter'0 : int #(FROM: 0, TO: TOTAL_CYCLES) cur_iter'0
    action continue'COMPUTATION_LATENCY
    trigger may_start'-REQUEST_DATA_LATENCY
    action start'0
   domain reset
    action rst
}

This module lets you create pipelined feedback loops of variable latency.

Use as follows:

FIFO input_fifo
trigger outflow: int assoc_data
 
ParallelWhile pw
ParallelState cur_iter
ParallelStore associated_data
when pw.may_start {
    // gather up data
    when input_fifo.may_pop {
        int num_iters = input_fifo.pop()
        cur_iter.init(num_iters)
        associated_data.init(num_iters)
        pw.start()
    }
}
 
when pw.iter : int iteration {
    cur_iter.link(iteration)
    associated_data.link(iteration)
    int num_iters_left = cur_iter.old
    when num_iters_left > 0 {
        reg reg reg cur_iter.new = UnsafeIntCast#(FROM: 0, TO)(num_iters_left - 1)
        pw.continue()
    } else {
        outflow(associated_data.old)
    }
}

Results may arrive in a different order than their inputs, because some loops take more cycles to execute than others.
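The reordering can be illustrated with a small software model in Python. This is a sketch under stated assumptions, not a description of how the module is implemented: it assumes a fixed ring of slots that are visited round-robin, one per cycle, and the function name run_parallel_while and the (tag, num_iters) job format are made up for illustration.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


def run_parallel_while(jobs: List[Tuple[str, int]], num_slots: int) -> List[str]:
    """Return the job tags in the order in which the jobs finish.

    jobs: (tag, num_iters) pairs in arrival order.
    num_slots: number of loops that can be in flight at once (assumed >= 1).
    """
    pending: Deque[Tuple[str, int]] = deque(jobs)
    # Each slot holds [tag, iterations_left], or None when it is free.
    slots: List[Optional[list]] = [None] * num_slots
    finished: List[str] = []
    cycle = 0
    while pending or any(slot is not None for slot in slots):
        index = cycle % num_slots  # the slot whose turn it is this cycle
        slot = slots[index]
        if slot is None:
            if pending:
                tag, num_iters = pending.popleft()
                slots[index] = [tag, num_iters]  # comparable to pw.start()
        elif slot[1] > 0:
            slot[1] -= 1  # comparable to pw.continue()
        else:
            finished.append(slot[0])  # comparable to outflow(...)
            slots[index] = None
        cycle += 1
    return finished
```

With two slots and the jobs [("a", 3), ("b", 0), ("c", 1)], this model finishes them in the order b, a, c: the short job b overtakes the long job a, and c reuses the slot that b freed.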

ParallelState #(T, int NUM_PARALLEL_STATES, int LATENCY)

module ParallelState #(T, int NUM_PARALLEL_STATES, int LATENCY) {
    action link'0 : int#(FROM: 0, TO: NUM_PARALLEL_STATES) cur_iter_id'0
    output T old'2
    action init'0 : T initial_data'0
    input T new'LATENCY
}

See ParallelWhile

Similar to ParallelStore

Must be initialized simultaneously with [ParallelWhile::start] (using [ParallelState::init]). Old data can be read from [ParallelState::old]. Data must be updated through [ParallelState::new] every cycle.

ParallelStore #(T, int NUM_PARALLEL_STATES)

module ParallelStore #(T, int NUM_PARALLEL_STATES) {
    action link'0 : int#(FROM: 0, TO: NUM_PARALLEL_STATES) cur_iter_id'0
    output T old'3
    action init'0 : T initial_data'0
}

See ParallelWhile

Similar to ParallelState, but only initialized once, simultaneously with [ParallelWhile::start]. The stored data is returned every cycle on [ParallelStore::old].

Semaphore #(int MAX, int INITIAL_PERMITS)

module Semaphore #(int MAX, int INITIAL_PERMITS) {
   domain write
    output bool may_acquire'0
    action acquire'0
   domain read
    action release'0
   domain reset
    action rst
}

The Semaphore is a construct for distributing a limited resource, such as memory buffers or compute units. It has an internal counter called the “permits” counter, which keeps track of how many units of the guarded resource are currently available to be given out.

With may_acquire and acquire(), you request a “permit” for a resource. You use the resource for a while, and when you’re done with it, you return it with release().

For this mode of operation, you start the Semaphore out with INITIAL_PERMITS=MAX permits.

You can also use the Semaphore in reverse mode, where it acts somewhat like a FIFO. In this case, the Semaphore should be set to INITIAL_PERMITS=0. Whenever a compute job becomes available, call release() first; the job can then be claimed with acquire().
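Both modes of operation can be sketched with a small Python model of the permits counter. This is only a behavioural illustration: the class name SemaphoreModel is hypothetical, and the overflow check in release() is an assumption about what MAX means, not documented behaviour.

```python
class SemaphoreModel:
    """Software model of a counting semaphore with a bounded permits counter."""

    def __init__(self, max_permits: int, initial_permits: int) -> None:
        if not 0 <= initial_permits <= max_permits:
            raise ValueError("initial_permits must lie in [0, max_permits]")
        self.max_permits = max_permits
        self.permits = initial_permits

    @property
    def may_acquire(self) -> bool:
        """True while at least one permit is available."""
        return self.permits > 0

    def acquire(self) -> None:
        """Take one permit; only legal while may_acquire is True."""
        if not self.may_acquire:
            raise RuntimeError("acquire() called without may_acquire")
        self.permits -= 1

    def release(self) -> None:
        """Return one permit (assumed never to exceed max_permits)."""
        if self.permits >= self.max_permits:
            raise RuntimeError("release() would exceed max_permits")
        self.permits += 1


# Resource mode: all buffers start out free.
buffers = SemaphoreModel(max_permits=2, initial_permits=2)
buffers.acquire()
buffers.acquire()
buffers_exhausted = not buffers.may_acquire  # both buffers are in use
buffers.release()                            # one buffer is handed back

# Reverse mode: no jobs at first; release() announces a job.
jobs = SemaphoreModel(max_permits=2, initial_permits=0)
jobs_empty_at_start = not jobs.may_acquire
jobs.release()                               # a job becomes available
jobs.acquire()                               # the job is claimed
```

In resource mode the counter starts full and drains as resources are handed out; in reverse mode it starts empty and each release() acts like pushing one token into a FIFO.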

SharedAcquireSemaphore #(int MAX, int INITIAL_PERMITS, int NUM_SEMAPHORES)

module SharedAcquireSemaphore #(int MAX, int INITIAL_PERMITS, int NUM_SEMAPHORES) {
   domain tmp
    output state int#(FROM: 0, TO: MAX+1)[NUM_SEMAPHORES] available_permits
   domain default
    input bool[NUM_SEMAPHORES] releases
    output bool may_acquire
    action acquire_all
   domain reset
    action rst
}

Equivalent to combining multiple Semaphores. The releases are independent, whereas may_acquire and acquire_all are shared among all of them.
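A possible reading of this behaviour, sketched as a Python model: may_acquire is taken to be true only when every semaphore has a permit, and acquire_all is taken to remove one permit from each. Both points are assumptions derived from the description above, and the class name SharedAcquireSemaphoreModel and the method name release are hypothetical.

```python
from typing import List


class SharedAcquireSemaphoreModel:
    """Model of several semaphores with a shared acquire (illustrative only)."""

    def __init__(self, max_permits: int, initial_permits: int,
                 num_semaphores: int) -> None:
        if not 0 <= initial_permits <= max_permits:
            raise ValueError("initial_permits must lie in [0, max_permits]")
        self.max_permits = max_permits
        self.available_permits: List[int] = [initial_permits] * num_semaphores

    @property
    def may_acquire(self) -> bool:
        """Assumption: every semaphore must have at least one permit."""
        return all(permits > 0 for permits in self.available_permits)

    def acquire_all(self) -> None:
        """Assumption: take one permit from every semaphore at once."""
        if not self.may_acquire:
            raise RuntimeError("acquire_all() called without may_acquire")
        self.available_permits = [p - 1 for p in self.available_permits]

    def release(self, releases: List[bool]) -> None:
        """Return a permit to each semaphore whose release flag is set."""
        if len(releases) != len(self.available_permits):
            raise ValueError("one release flag per semaphore is required")
        for index, flag in enumerate(releases):
            if flag:
                if self.available_permits[index] >= self.max_permits:
                    raise RuntimeError("release would exceed max_permits")
                self.available_permits[index] += 1
```

In this model, a single semaphore that is out of permits blocks the shared acquire, even if all the others have permits to spare.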