PythonとVeriloggenでソーティングネットワークを書いてみる
@miyox氏がSynthesijer.Scalaでソーティングネットワークを自動生成していたので,Veriloggenでも試してみた.
VeriloggenはPythonでVerilog HDLのソースコードを組み立てるフレームワークです.
基本的な構成はSynthesijer.Scala版とほぼ同じだけど,Veriloggenの方がVerilog HDLの文法に近く抽象化が少ない.
比較器
_i = [0] def mk_pair(): s = m.Wire('small_' + str(_i[0]), width) l = m.Wire('large_' + str(_i[0]), width) _i[0] += 1 return s, l def prim_net(a, b): s, l = mk_pair() m.Assign(s( Cond(a < b, a, b) )) # small m.Assign(l( Cond(a < b, b, a) )) # large return s, l
mk_pair()で比較用のwire変数を作成し,prim_net()でassignで代入する.比較器を組み合わせ回路として構成する.
ネットワーク1段分
def chain_net(regs, fsm, e): x = regs[0] for i in range(e): s, l = prim_net(x, regs[i+1]) fsm.add( regs[i](s) ) x = l fsm.add( regs[e](x) ) for i in range(e + 1, len(regs)): fsm.add( regs[i](regs[i]) ) fsm.goto_next()
chain_netでパイプラインステージ1段分の回路を組み立てる.Veriloggenはlib.fsmというステートマシンライブラリがあるのでそれを使う.
fsm自体は状態遷移機械を管理するオブジェクトである.そのadd()メソッドを呼び出すことで,各状態での変数の代入を行う.
fsm.goto_next()を呼び出すと,状態変数がひとつ増加し,その状態へ遷移するコードが追加される.fsm.goto_next()を使わずに次の状態に進めたい場合には,代入と同時にfsm.set_next()を呼び出すし,その後,fsm.inc()を呼び出せばよい.
組み立て
# build up fsm = lib.FSM(m, 'fsm') idle = fsm.current() # init state fsm.add(*[registers[i](inputs[i]) for i in range(numports)], cond=kick) fsm.add(busy(1), cond=kick) fsm.goto_next(cond=kick) # connect network for i in range(numports): chain_net(registers, fsm, numports-i-1) # finalize fsm.add(busy(0)) fsm.goto(idle)
fsmは状態遷移を管理するオブジェクト.fsm.current()で現在の状態遷移のラベルが取得できる.
まず,kickがアサートされたら,register*にinput*をコピーし,同時にbusyをアサートする.fsm.add()の名前付き引数でcond=を指定すると,代入する条件が指定出来る.If(cond)( hoge ) として条件を追加することのシンタックスシュガーである.
そして次の状態へ遷る.fsm.goto_next()の名前付き引数でcond=を指定すると,次の状態に遷移する条件を追加できる.指定しない場合やNoneを指定した場合には,無条件で遷移する.
あとは,chain_net()を規定回数呼び出して,ネットワーク全体を組み立てる.最後に,busyをデアサートして,idle状態に遷移して終了. fsm.goto_next()を呼び出すと,状態変数が自動的に1増えるが,fsm.goto()の場合には増えない.
ソーティングネットワークのソースコード全体
import sys import os from veriloggen import * nclk = lib.simulation.next_clock def mkSort(numports=4): m = Module('sort') width = m.Parameter('WIDTH', 32) clk = m.Input('CLK') rst = m.Input('RST') inputs = [ m.Input('input_' + str(i), width) for i in range(numports) ] outputs = [ m.Output('output_' + str(i), width) for i in range(numports) ] kick = m.Input('kick') busy = m.OutputReg('busy') registers = [ m.Reg('registers_' + str(i), width) for i in range(numports) ] for i in range(numports): m.Assign(outputs[i](registers[i])) _i = [0] def mk_pair(): s = m.Wire('small_' + str(_i[0]), width) l = m.Wire('large_' + str(_i[0]), width) _i[0] += 1 return s, l def prim_net(a, b): s, l = mk_pair() m.Assign(s( Cond(a < b, a, b) )) # small m.Assign(l( Cond(a < b, b, a) )) # large return s, l def chain_net(regs, fsm, e): x = regs[0] for i in range(e): s, l = prim_net(x, regs[i+1]) fsm.add( regs[i](s) ) x = l fsm.add( regs[e](x) ) for i in range(e + 1, len(regs)): fsm.add( regs[i](regs[i]) ) fsm.goto_next() # build up fsm = lib.FSM(m, 'fsm') idle = fsm.current() # init state fsm.add(*[registers[i](inputs[i]) for i in range(numports)], cond=kick) fsm.add(busy(1), cond=kick) fsm.goto_next(cond=kick) # connect network for i in range(numports): chain_net(registers, fsm, numports-i-1) # finalize fsm.add(busy(0)) fsm.goto(idle) init = [ busy(0) ] + [ r(0) for r in registers ] + [ fsm.set_init() ] # import assignment into always statement m.Always(Posedge(clk))( If(rst)( init ).Else( fsm.make_case() )) return m def mkSimSort(numports=4): m = Module('simsort') width = m.Parameter('WIDTH', 32) clk = m.Reg('CLK') rst = m.Reg('RST') inputs = [ m.Reg('input_' + str(i), width) for i in range(numports) ] outputs = [ m.Wire('output_' + str(i), width) for i in range(numports) ] kick = m.Reg('kick') busy = m.Wire('busy') uut = m.Instance(mkSort(numports), 'uut', (width,), [clk, rst] + inputs + outputs + [kick, busy]) lib.simulation.setup_waveform(m, uut) lib.simulation.setup_clock(m, clk) lib.simulation.setup_reset(m, rst) m.Initial( [ ip(100 - i) for i, ip in enumerate(inputs) ], kick(0), Wait(rst), nclk(clk), Wait(Not(rst)), nclk(clk), nclk(clk), nclk(clk), kick(1), nclk(clk), kick(0), ) m.Initial( Delay(100), Wait(kick), nclk(clk), Wait(busy), nclk(clk), Wait(Not(busy)), nclk(clk), Systask('finish'), ) return m if __name__ == '__main__': sort = mkSimSort() verilog = sort.to_verilog('tmp.v') print(verilog)
Veriloggenはまだテストベンチが生成できないので,テストベンチはVerilog HDLで書いた.面倒だった.
現在,テストベンチもVeriloggenで書けます.上記のmkSimSortを参照.
module test; initial begin $dumpfile("uut.vcd"); $dumpvars(0, uut); end parameter WIDTH = 32; reg CLK; reg RST; reg [(WIDTH - 1):0] input_0; reg [(WIDTH - 1):0] input_1; reg [(WIDTH - 1):0] input_2; reg [(WIDTH - 1):0] input_3; wire [(WIDTH - 1):0] output_0; wire [(WIDTH - 1):0] output_1; wire [(WIDTH - 1):0] output_2; wire [(WIDTH - 1):0] output_3; reg kick; wire busy; sort uut( CLK, RST, input_0, input_1, input_2, input_3, output_0, output_1, output_2, output_3, kick, busy ); initial begin CLK = 0; forever #5 CLK = ~CLK; end initial begin RST = 0; kick = 0; input_0 = 4; input_1 = 3; input_2 = 2; input_3 = 1; #100; RST = 1; #100; RST = 0; @(posedge CLK); #1; @(posedge CLK); #1; @(posedge CLK); #1; kick = 1; @(posedge CLK); #1; kick = 0; #500; $finish; end endmodule
実行結果とシミュレーション結果
生成されるVerilog HDLのコードはこんな感じ.なお現在のPyverilog.ast_code_generatorはちゃんと整形されたコードを生成します.
module simsort # ( parameter WIDTH = 32 ) ( ); reg CLK; reg RST; reg [(WIDTH - 1):0] input_0; reg [(WIDTH - 1):0] input_1; reg [(WIDTH - 1):0] input_2; reg [(WIDTH - 1):0] input_3; wire [(WIDTH - 1):0] output_0; wire [(WIDTH - 1):0] output_1; wire [(WIDTH - 1):0] output_2; wire [(WIDTH - 1):0] output_3; reg kick; wire busy; sort #( WIDTH ) uut ( CLK, RST, input_0, input_1, input_2, input_3, output_0, output_1, output_2, output_3, kick, busy ); initial begin $dumpfile("uut.vcd"); $dumpvars(0, uut); end initial begin CLK = 0; forever begin #5 CLK = (!CLK); end end initial begin RST = 0; #100; RST = 1; #100; RST = 0; end initial begin input_0 = 100; input_1 = 99; input_2 = 98; input_3 = 97; kick = 0; wait(RST); @(posedge CLK); #1; wait((!RST)); @(posedge CLK); #1; @(posedge CLK); #1; @(posedge CLK); #1; kick = 1; @(posedge CLK); #1; kick = 0; end initial begin #100; wait(kick); @(posedge CLK); #1; wait(busy); @(posedge CLK); #1; wait((!busy)); @(posedge CLK); #1; $finish; end endmodule module sort # ( parameter WIDTH = 32 ) ( input CLK, input RST, input [(WIDTH - 1):0] input_0, input [(WIDTH - 1):0] input_1, input [(WIDTH - 1):0] input_2, input [(WIDTH - 1):0] input_3, output [(WIDTH - 1):0] output_0, output [(WIDTH - 1):0] output_1, output [(WIDTH - 1):0] output_2, output [(WIDTH - 1):0] output_3, input kick, output reg busy ); reg [(WIDTH - 1):0] registers_0; reg [(WIDTH - 1):0] registers_1; reg [(WIDTH - 1):0] registers_2; reg [(WIDTH - 1):0] registers_3; assign output_0 = registers_0; assign output_1 = registers_1; assign output_2 = registers_2; assign output_3 = registers_3; reg [(32 - 1):0] fsm; localparam fsm_init = 0; localparam fsm_1 = 1; wire [(WIDTH - 1):0] small_0; wire [(WIDTH - 1):0] large_0; assign small_0 = (((registers_0 < registers_1))? registers_0 : registers_1); assign large_0 = (((registers_0 < registers_1))? registers_1 : registers_0); wire [(WIDTH - 1):0] small_1; wire [(WIDTH - 1):0] large_1; assign small_1 = (((large_0 < registers_2))? large_0 : registers_2); assign large_1 = (((large_0 < registers_2))? registers_2 : large_0); wire [(WIDTH - 1):0] small_2; wire [(WIDTH - 1):0] large_2; assign small_2 = (((large_1 < registers_3))? large_1 : registers_3); assign large_2 = (((large_1 < registers_3))? registers_3 : large_1); localparam fsm_2 = 2; wire [(WIDTH - 1):0] small_3; wire [(WIDTH - 1):0] large_3; assign small_3 = (((registers_0 < registers_1))? registers_0 : registers_1); assign large_3 = (((registers_0 < registers_1))? registers_1 : registers_0); wire [(WIDTH - 1):0] small_4; wire [(WIDTH - 1):0] large_4; assign small_4 = (((large_3 < registers_2))? large_3 : registers_2); assign large_4 = (((large_3 < registers_2))? registers_2 : large_3); localparam fsm_3 = 3; wire [(WIDTH - 1):0] small_5; wire [(WIDTH - 1):0] large_5; assign small_5 = (((registers_0 < registers_1))? registers_0 : registers_1); assign large_5 = (((registers_0 < registers_1))? registers_1 : registers_0); localparam fsm_4 = 4; localparam fsm_5 = 5; always @(posedge CLK) begin if(RST) begin busy <= 0; registers_0 <= 0; registers_1 <= 0; registers_2 <= 0; registers_3 <= 0; fsm <= fsm_init; end else begin case(fsm) fsm_init: begin if(kick) begin registers_0 <= input_0; registers_1 <= input_1; registers_2 <= input_2; registers_3 <= input_3; end if(kick) begin busy <= 1; end if(kick) begin fsm <= fsm_1; end end fsm_1: begin registers_0 <= small_0; registers_1 <= small_1; registers_2 <= small_2; registers_3 <= large_2; fsm <= fsm_2; end fsm_2: begin registers_0 <= small_3; registers_1 <= small_4; registers_2 <= large_4; registers_3 <= registers_3; fsm <= fsm_3; end fsm_3: begin registers_0 <= small_5; registers_1 <= large_5; registers_2 <= registers_2; registers_3 <= registers_3; fsm <= fsm_4; end fsm_4: begin registers_0 <= registers_0; registers_1 <= registers_1; registers_2 <= registers_2; registers_3 <= registers_3; fsm <= fsm_5; end fsm_5: begin busy <= 0; fsm <= fsm_init; end endcase end end endmodule
動作波形はこんな感じ.
ちゃんとソートされている.
感想
Veriloggen,あんまり書きやすくないかも・・・.精進します. PythonなのにSynthesijer.Scalaよりもソースコードが長くなったのもイマイチ.