PYNQの問題点
FPGAのDMA転送やレジスタ書き込みがPythonから行えるPYNQは通常のFPGA開発と比較してデバッグが非常に容易になり、プログラミングも楽になる素晴らしい環境です。めんどくさいレジスタの設定も自動でやってくれます。
ただ、Pythonというのは諸刃の剣であり、演算が遅かったりGILが邪魔だったりで、FPGAを使いたい場面だとPythonの性質に足を引っ張られます。例えば信号をマルチスレッドでPYNQからDMAで取得したデータをCPUでパイプラインを組んで処理したい!と思っても、Pythonのマルチプロセスのキューライブラリを使うと遅すぎてまともにプログラムが動かないなんてことに陥ります。このような性質はFPGAが得意とするリアルタイム性が用いられるケースと相性が最悪です。
このようにPYNQはPythonでサクサク書いてデバッグできるという性質とFPGAが必要な場面では遅すぎるという二面性があるわけです。
回避するためには通常C言語でZYNQのapiを使ってプログラミングすればよいのでしょうが、多くがubuntuなどのPYNQと同じLinux上で動かすことを想定しておらず、VIVADOのIDE上でプログラムを開発するものが多いので、PYNQで開発したものを同じ環境で高速化したいと思うとちょっとめんどくさいです。
本記事はPYNQのAPIと似たい関数をC言語で呼び出せるC API for PYNQを紹介します。このプロジェクトではC言語でPYNQの基本的な機能をC言語で実装しています。本稿はとりあえずC API for PYNQを使ってLEDを点滅させるまでたどり着いた作業メモです。
C API for PYNQのインストール
インストールは公式のGithubに従ってください
make sudo make install
ライブラリは以上のコマンドをPYNQのubuntu上で実行することでインストールができます
回路設計
適当に回路を作ります。AXI4Liteから4つのレジスタにアクセスして、それぞれのレジスタにLEDを割り当ててます。
myip_v1_0_S00_AXI.v ラップしてるやつ myip_v1_0.v適当なコード
`timescale 1 ns / 1 ps
module myip_v1_0_S00_AXI #
(
// Users to add parameters here
// User parameters ends
// Do not modify the parameters beyond this line
// Width of S_AXI data bus
parameter integer C_S_AXI_DATA_WIDTH = 32,
// Width of S_AXI address bus
parameter integer C_S_AXI_ADDR_WIDTH = 4
)
(
// Users to add ports here
// User ports ends
// Do not modify the ports beyond this line
// Global Clock Signal
input wire S_AXI_ACLK,
// Global Reset Signal. This Signal is Active LOW
input wire S_AXI_ARESETN,
// Write address (issued by master, acceped by Slave)
input wire [C_S_AXI_ADDR_WIDTH-1 : 0] S_AXI_AWADDR,
// Write channel Protection type. This signal indicates the
// privilege and security level of the transaction, and whether
// the transaction is a data access or an instruction access.
input wire [2 : 0] S_AXI_AWPROT,
// Write address valid. This signal indicates that the master signaling
// valid write address and control information.
input wire S_AXI_AWVALID,
// Write address ready. This signal indicates that the slave is ready
// to accept an address and associated control signals.
output wire S_AXI_AWREADY,
// Write data (issued by master, acceped by Slave)
input wire [C_S_AXI_DATA_WIDTH-1 : 0] S_AXI_WDATA,
// Write strobes. This signal indicates which byte lanes hold
// valid data. There is one write strobe bit for each eight
// bits of the write data bus.
input wire [(C_S_AXI_DATA_WIDTH/8)-1 : 0] S_AXI_WSTRB,
// Write valid. This signal indicates that valid write
// data and strobes are available.
input wire S_AXI_WVALID,
// Write ready. This signal indicates that the slave
// can accept the write data.
output wire S_AXI_WREADY,
// Write response. This signal indicates the status
// of the write transaction.
output wire [1 : 0] S_AXI_BRESP,
// Write response valid. This signal indicates that the channel
// is signaling a valid write response.
output wire S_AXI_BVALID,
// Response ready. This signal indicates that the master
// can accept a write response.
input wire S_AXI_BREADY,
// Read address (issued by master, acceped by Slave)
input wire [C_S_AXI_ADDR_WIDTH-1 : 0] S_AXI_ARADDR,
// Protection type. This signal indicates the privilege
// and security level of the transaction, and whether the
// transaction is a data access or an instruction access.
input wire [2 : 0] S_AXI_ARPROT,
// Read address valid. This signal indicates that the channel
// is signaling valid read address and control information.
input wire S_AXI_ARVALID,
// Read address ready. This signal indicates that the slave is
// ready to accept an address and associated control signals.
output wire S_AXI_ARREADY,
// Read data (issued by slave)
output wire [C_S_AXI_DATA_WIDTH-1 : 0] S_AXI_RDATA,
// Read response. This signal indicates the status of the
// read transfer.
output wire [1 : 0] S_AXI_RRESP,
// Read valid. This signal indicates that the channel is
// signaling the required read data.
output wire S_AXI_RVALID,
// Read ready. This signal indicates that the master can
// accept the read data and response information.
input wire S_AXI_RREADY,
output wire [3:0] LED
);
// AXI4LITE signals
reg [C_S_AXI_ADDR_WIDTH-1 : 0] axi_awaddr;
reg axi_awready;
reg axi_wready;
reg [1 : 0] axi_bresp;
reg axi_bvalid;
reg [C_S_AXI_ADDR_WIDTH-1 : 0] axi_araddr;
reg axi_arready;
reg [C_S_AXI_DATA_WIDTH-1 : 0] axi_rdata;
reg [1 : 0] axi_rresp;
reg axi_rvalid;
// Example-specific design signals
// local parameter for addressing 32 bit / 64 bit C_S_AXI_DATA_WIDTH
// ADDR_LSB is used for addressing 32/64 bit registers/memories
// ADDR_LSB = 2 for 32 bits (n downto 2)
// ADDR_LSB = 3 for 64 bits (n downto 3)
localparam integer ADDR_LSB = (C_S_AXI_DATA_WIDTH/32) + 1;
localparam integer OPT_MEM_ADDR_BITS = 1;
//----------------------------------------------
//-- Signals for user logic register space example
//------------------------------------------------
//-- Number of Slave Registers 4
reg [C_S_AXI_DATA_WIDTH-1:0] slv_reg0;
reg [C_S_AXI_DATA_WIDTH-1:0] slv_reg1;
reg [C_S_AXI_DATA_WIDTH-1:0] slv_reg2;
reg [C_S_AXI_DATA_WIDTH-1:0] slv_reg3;
wire slv_reg_rden;
wire slv_reg_wren;
reg [C_S_AXI_DATA_WIDTH-1:0] reg_data_out;
integer byte_index;
reg aw_en;
assign LED[0] = slv_reg0[0];
assign LED[1] = ~slv_reg1[0];
assign LED[2] = slv_reg2[0];
assign LED[3] = ~slv_reg3[0];
// I/O Connections assignments
assign S_AXI_AWREADY = axi_awready;
assign S_AXI_WREADY = axi_wready;
assign S_AXI_BRESP = axi_bresp;
assign S_AXI_BVALID = axi_bvalid;
assign S_AXI_ARREADY = axi_arready;
assign S_AXI_RDATA = axi_rdata;
assign S_AXI_RRESP = axi_rresp;
assign S_AXI_RVALID = axi_rvalid;
// Implement axi_awready generation
// axi_awready is asserted for one S_AXI_ACLK clock cycle when both
// S_AXI_AWVALID and S_AXI_WVALID are asserted. axi_awready is
// de-asserted when reset is low.
always @( posedge S_AXI_ACLK )
begin
if ( S_AXI_ARESETN == 1'b0 )
begin
axi_awready <= 1'b0;
aw_en <= 1'b1;
end
else
begin
if (~axi_awready && S_AXI_AWVALID && S_AXI_WVALID && aw_en)
begin
// slave is ready to accept write address when
// there is a valid write address and write data
// on the write address and data bus. This design
// expects no outstanding transactions.
axi_awready <= 1'b1;
aw_en <= 1'b0;
end
else if (S_AXI_BREADY && axi_bvalid)
begin
aw_en <= 1'b1;
axi_awready <= 1'b0;
end
else
begin
axi_awready <= 1'b0;
end
end
end
// Implement axi_awaddr latching
// This process is used to latch the address when both
// S_AXI_AWVALID and S_AXI_WVALID are valid.
always @( posedge S_AXI_ACLK )
begin
if ( S_AXI_ARESETN == 1'b0 )
begin
axi_awaddr <= 0;
end
else
begin
if (~axi_awready && S_AXI_AWVALID && S_AXI_WVALID && aw_en)
begin
// Write Address latching
axi_awaddr <= S_AXI_AWADDR;
end
end
end
// Implement axi_wready generation
// axi_wready is asserted for one S_AXI_ACLK clock cycle when both
// S_AXI_AWVALID and S_AXI_WVALID are asserted. axi_wready is
// de-asserted when reset is low.
always @( posedge S_AXI_ACLK )
begin
if ( S_AXI_ARESETN == 1'b0 )
begin
axi_wready <= 1'b0;
end
else
begin
if (~axi_wready && S_AXI_WVALID && S_AXI_AWVALID && aw_en )
begin
// slave is ready to accept write data when
// there is a valid write address and write data
// on the write address and data bus. This design
// expects no outstanding transactions.
axi_wready <= 1'b1;
end
else
begin
axi_wready <= 1'b0;
end
end
end
// Implement memory mapped register select and write logic generation
// The write data is accepted and written to memory mapped registers when
// axi_awready, S_AXI_WVALID, axi_wready and S_AXI_WVALID are asserted. Write strobes are used to
// select byte enables of slave registers while writing.
// These registers are cleared when reset (active low) is applied.
// Slave register write enable is asserted when valid address and data are available
// and the slave is ready to accept the write address and write data.
assign slv_reg_wren = axi_wready && S_AXI_WVALID && axi_awready && S_AXI_AWVALID;
always @( posedge S_AXI_ACLK )
begin
if ( S_AXI_ARESETN == 1'b0 )
begin
slv_reg0 <= 0;
slv_reg1 <= 0;
slv_reg2 <= 0;
slv_reg3 <= 0;
end
else begin
if (slv_reg_wren)
begin
case ( axi_awaddr[ADDR_LSB+OPT_MEM_ADDR_BITS:ADDR_LSB] )
2'h0:
for ( byte_index = 0; byte_index <= (C_S_AXI_DATA_WIDTH/8)-1; byte_index = byte_index+1 )
if ( S_AXI_WSTRB[byte_index] == 1 ) begin
// Respective byte enables are asserted as per write strobes
// Slave register 0
slv_reg0[(byte_index*8) +: 8] <= S_AXI_WDATA[(byte_index*8) +: 8];
end
2'h1:
for ( byte_index = 0; byte_index <= (C_S_AXI_DATA_WIDTH/8)-1; byte_index = byte_index+1 )
if ( S_AXI_WSTRB[byte_index] == 1 ) begin
// Respective byte enables are asserted as per write strobes
// Slave register 1
slv_reg1[(byte_index*8) +: 8] <= S_AXI_WDATA[(byte_index*8) +: 8];
end
2'h2:
for ( byte_index = 0; byte_index <= (C_S_AXI_DATA_WIDTH/8)-1; byte_index = byte_index+1 )
if ( S_AXI_WSTRB[byte_index] == 1 ) begin
// Respective byte enables are asserted as per write strobes
// Slave register 2
slv_reg2[(byte_index*8) +: 8] <= S_AXI_WDATA[(byte_index*8) +: 8];
end
2'h3:
for ( byte_index = 0; byte_index <= (C_S_AXI_DATA_WIDTH/8)-1; byte_index = byte_index+1 )
if ( S_AXI_WSTRB[byte_index] == 1 ) begin
// Respective byte enables are asserted as per write strobes
// Slave register 3
slv_reg3[(byte_index*8) +: 8] <= S_AXI_WDATA[(byte_index*8) +: 8];
end
default : begin
slv_reg0 <= slv_reg0;
slv_reg1 <= slv_reg1;
slv_reg2 <= slv_reg2;
slv_reg3 <= slv_reg3;
end
endcase
end
end
end
// Implement write response logic generation
// The write response and response valid signals are asserted by the slave
// when axi_wready, S_AXI_WVALID, axi_wready and S_AXI_WVALID are asserted.
// This marks the acceptance of address and indicates the status of
// write transaction.
always @( posedge S_AXI_ACLK )
begin
if ( S_AXI_ARESETN == 1'b0 )
begin
axi_bvalid <= 0;
axi_bresp <= 2'b0;
end
else
begin
if (axi_awready && S_AXI_AWVALID && ~axi_bvalid && axi_wready && S_AXI_WVALID)
begin
// indicates a valid write response is available
axi_bvalid <= 1'b1;
axi_bresp <= 2'b0; // 'OKAY' response
end // work error responses in future
else
begin
if (S_AXI_BREADY && axi_bvalid)
//check if bready is asserted while bvalid is high)
//(there is a possibility that bready is always asserted high)
begin
axi_bvalid <= 1'b0;
end
end
end
end
// Implement axi_arready generation
// axi_arready is asserted for one S_AXI_ACLK clock cycle when
// S_AXI_ARVALID is asserted. axi_awready is
// de-asserted when reset (active low) is asserted.
// The read address is also latched when S_AXI_ARVALID is
// asserted. axi_araddr is reset to zero on reset assertion.
always @( posedge S_AXI_ACLK )
begin
if ( S_AXI_ARESETN == 1'b0 )
begin
axi_arready <= 1'b0;
axi_araddr <= 32'b0;
end
else
begin
if (~axi_arready && S_AXI_ARVALID)
begin
// indicates that the slave has acceped the valid read address
axi_arready <= 1'b1;
// Read address latching
axi_araddr <= S_AXI_ARADDR;
end
else
begin
axi_arready <= 1'b0;
end
end
end
// Implement axi_arvalid generation
// axi_rvalid is asserted for one S_AXI_ACLK clock cycle when both
// S_AXI_ARVALID and axi_arready are asserted. The slave registers
// data are available on the axi_rdata bus at this instance. The
// assertion of axi_rvalid marks the validity of read data on the
// bus and axi_rresp indicates the status of read transaction.axi_rvalid
// is deasserted on reset (active low). axi_rresp and axi_rdata are
// cleared to zero on reset (active low).
always @( posedge S_AXI_ACLK )
begin
if ( S_AXI_ARESETN == 1'b0 )
begin
axi_rvalid <= 0;
axi_rresp <= 0;
end
else
begin
if (axi_arready && S_AXI_ARVALID && ~axi_rvalid)
begin
// Valid read data is available at the read data bus
axi_rvalid <= 1'b1;
axi_rresp <= 2'b0; // 'OKAY' response
end
else if (axi_rvalid && S_AXI_RREADY)
begin
// Read data is accepted by the master
axi_rvalid <= 1'b0;
end
end
end
// Implement memory mapped register select and read logic generation
// Slave register read enable is asserted when valid address is available
// and the slave is ready to accept the read address.
assign slv_reg_rden = axi_arready & S_AXI_ARVALID & ~axi_rvalid;
always @(*)
begin
// Address decoding for reading registers
case ( axi_araddr[ADDR_LSB+OPT_MEM_ADDR_BITS:ADDR_LSB] )
2'h0 : reg_data_out <= slv_reg0;
2'h1 : reg_data_out <= slv_reg1;
2'h2 : reg_data_out <= slv_reg2;
2'h3 : reg_data_out <= slv_reg3;
default : reg_data_out <= 0;
endcase
end
// Output register or memory read data
always @( posedge S_AXI_ACLK )
begin
if ( S_AXI_ARESETN == 1'b0 )
begin
axi_rdata <= 0;
end
else
begin
// When there is a valid read address (S_AXI_ARVALID) with
// acceptance of read address by the slave (axi_arready),
// output the read dada
if (slv_reg_rden)
begin
axi_rdata <= reg_data_out; // register read data
end
end
end
// Add user logic here
// User logic ends
endmodule
`timescale 1 ns / 1 ps
module myip_v1_0 #
(
// Users to add parameters here
// User parameters ends
// Do not modify the parameters beyond this line
// Parameters of Axi Slave Bus Interface S00_AXI
parameter integer C_S00_AXI_DATA_WIDTH = 32,
parameter integer C_S00_AXI_ADDR_WIDTH = 4
)
(
// Users to add ports here
// User ports ends
// Do not modify the ports beyond this line
// Ports of Axi Slave Bus Interface S00_AXI
input wire s00_axi_aclk,
input wire s00_axi_aresetn,
input wire [C_S00_AXI_ADDR_WIDTH-1 : 0] s00_axi_awaddr,
input wire [2 : 0] s00_axi_awprot,
input wire s00_axi_awvalid,
output wire s00_axi_awready,
input wire [C_S00_AXI_DATA_WIDTH-1 : 0] s00_axi_wdata,
input wire [(C_S00_AXI_DATA_WIDTH/8)-1 : 0] s00_axi_wstrb,
input wire s00_axi_wvalid,
output wire s00_axi_wready,
output wire [1 : 0] s00_axi_bresp,
output wire s00_axi_bvalid,
input wire s00_axi_bready,
input wire [C_S00_AXI_ADDR_WIDTH-1 : 0] s00_axi_araddr,
input wire [2 : 0] s00_axi_arprot,
input wire s00_axi_arvalid,
output wire s00_axi_arready,
output wire [C_S00_AXI_DATA_WIDTH-1 : 0] s00_axi_rdata,
output wire [1 : 0] s00_axi_rresp,
output wire s00_axi_rvalid,
input wire s00_axi_rready,
output LED0,
output LED1,
output LED2,
output LED3
);
// Instantiation of Axi Bus Interface S00_AXI
myip_v1_0_S00_AXI # (
.C_S_AXI_DATA_WIDTH(C_S00_AXI_DATA_WIDTH),
.C_S_AXI_ADDR_WIDTH(C_S00_AXI_ADDR_WIDTH)
) myip_v1_0_S00_AXI_inst (
.S_AXI_ACLK(s00_axi_aclk),
.S_AXI_ARESETN(s00_axi_aresetn),
.S_AXI_AWADDR(s00_axi_awaddr),
.S_AXI_AWPROT(s00_axi_awprot),
.S_AXI_AWVALID(s00_axi_awvalid),
.S_AXI_AWREADY(s00_axi_awready),
.S_AXI_WDATA(s00_axi_wdata),
.S_AXI_WSTRB(s00_axi_wstrb),
.S_AXI_WVALID(s00_axi_wvalid),
.S_AXI_WREADY(s00_axi_wready),
.S_AXI_BRESP(s00_axi_bresp),
.S_AXI_BVALID(s00_axi_bvalid),
.S_AXI_BREADY(s00_axi_bready),
.S_AXI_ARADDR(s00_axi_araddr),
.S_AXI_ARPROT(s00_axi_arprot),
.S_AXI_ARVALID(s00_axi_arvalid),
.S_AXI_ARREADY(s00_axi_arready),
.S_AXI_RDATA(s00_axi_rdata),
.S_AXI_RRESP(s00_axi_rresp),
.S_AXI_RVALID(s00_axi_rvalid),
.S_AXI_RREADY(s00_axi_rready),
.LED({LED3,LED2,LED1,LED0})
);
// Add user logic here
// User logic ends
endmodule
まあ適当にブロック接続に投げ込んでください。 実際のLEDにはPYNQ-Z2の制約ファイルを探して接続してください
この辺が参考になるかも https://qiita.com/nishimuraatsushi/items/3e5f161eca4b2a0f06c8
プログラミング
PYNQの実行ではbitファイルとhwhファイルを扱うと思うのですがこのhwhファイルはXML形式なのでパーサが必要となります。今回はRapidXMLを使いました。
高速軽量XMLパーサ:RapidXMLを触ってみた (1/2):CodeZine(コードジン)
軽量ライブラリなのでincludeするだけです。
いよいよメインのLED点滅です。以下のコードを実行するとレジスタが1と0で書き代わり偶数版のLEDと奇数版のLEDが交互に点滅します。
#include "rapidxml.hpp" #include "rapidxml_utils.hpp" // rapidxml::file #include <iostream> #include <stdio.h> #include <thread> #include <chrono> //pynq_apiはC言語なのでこうじゃ extern "C"{ #include <pynq_api.h> } using namespace std; namespace rx = rapidxml; using std::this_thread::sleep_for; constexpr int TIME_TO_SLEEP = 100; //hwhファイルを差し替えても動くようにパーサを使ってアドレスを探せるようにします。汚くてごめんね string get_ip_addr(string path) { rx::xml_document<> doc; rx::file<> input("Lchika_test.hwh"); doc.parse<0>(input.data()); rx::xml_node<>* node = doc.first_node("EDKSYSTEM")->first_node("MODULES")->first_node("MODULE")->first_node("PARAMETERS"); for ( rx::xml_node<>* child = node->first_node(); child != nullptr; child = child->next_sibling() ) { rx::xml_attribute<>* attr = child->first_attribute("NAME"); if( attr->value() == std::string("C_BASEADDR")){ return child->first_attribute("VALUE")->value(); } } return nullptr; } int main(void){ //hwhファイルからアドレスを探す string addr_str = get_ip_addr("Lchika_test.hwh"); int addr = strtol(addr_str.c_str(), NULL, 16); // ビットファイルを展開する PYNQ_loadBitstream("Lchika_test.bit"); // メモリマップドIOを使って各レジスタにアクセスできるようにする。 PYNQ_MMIO_WINDOW led; PYNQ_createMMIOWindow( &led, addr, 32); int val = 1; for(int i =0; i < 100; i++){ // 点滅 val = ~val; //ちょっと遅延 sleep_for(std::chrono::milliseconds(TIME_TO_SLEEP)); //各レジスタに書き込み PYNQ_writeMMIO(&led, &val, 0, 4); PYNQ_writeMMIO(&led, &val, 4, 4); PYNQ_writeMMIO(&led, &val, 8, 4); PYNQ_writeMMIO(&led, &val, 12, 4); } return 0; }
以下のコマンドでコンパイルします。
g++ -o my_code main.cpp -lpynq -lcma -lpthread -O3
公式のドキュメントでは-lpynq_apiでライブラリを呼べと書いてあるのですが、/usr/libを見るとlibpynqというファイル名なので-lpynqが正解だと思います。