Splits
Sometimes you're building more complicated hardware structures, and you find yourself in a situation where you've got a repetitive structure that you'd really really want to express as for loops over a series of intermediary arrays. Only, some constraint is holding you back. Perhaps you're implementing some type of cascading structure where Latency Counting inadvertently ties your array elements together causing Net Positive Latency Cycles.
An array declared as a split lets you get around this limitation. Each index in the outermost array becomes a separate wire, and any parameter of these wires not explicitly specified in the declaration can vary freely. Examples of parameters you can vary are:
- Absolute Latency (for
splitwires at multiple points in a pipeline) - Array sizes
- Integer bounds
Example
module AddFloats {
input float[16] floats
output float sum
split float[][5] float_reduction
// Results in:
// float[1] float_reduction_split_0'44
// float[2] float_reduction_split_1'33
// float[4] float_reduction_split_2'22
// float[8] float_reduction_split_3'11
// float[16] float_reduction_split_4'0
float_reduction[4] = input_floats
sum = float_reduction[0]
for int I in 0..4 {
gen int SIZE = pow2#(E: I)
float[SIZE] lefts = float_reduction[I+1][SIZE:]
float[SIZE] rights = float_reduction[I+1][:SIZE]
// We do have to declare sums as a temporary array here to help type inference
// May fix in future https://github.com/pc2/sus-compiler/issues/160
float[SIZE] sums
for int J in 0..SIZE {
sums[J] = fp32_add(lefts, rights)
}
float_reduction[I] = sums
}
}
In the above example, we've implemented a basic floating point TreeAdd. 16 floats are accepted, and a tree of 15 floating point adders from sus-float reduces this down to 1 float. We need split here because:
- We wish to halve the number of floats in each step. We couldn't declare this as a normal 2D array, because it would need to be a staggered array, which non-split wires don't support.
fp32_addhas a latency of 11 cycles on U280 FPGAs. Since we use the same variable for each pair of sequential layers, this would create a net-positive latency cycle fromfloat_reductionto itself. With the split,float_reduction[0]is a distinct wire fromfloat_reduction[1], which is distinct fromfloat_reduction[2], etc. Hence, allowing them to be at different absolute latencies.
Drawbacks of split
- You cannot use a runtime index into the "split" array dimension.
- The parameters of each wire in the split must be Inferred. This may lead to funky constructs.
- Ports cannot be declared
split(Yet... https://github.com/pc2/sus-compiler/issues/196) splitgenerates individual wires. This means in the waveform viewer it's slightly harder to discern.