当前位置:网站首页>HLS / chisel uses CORDIC hyperbolic system to realize square root calculation
HLS / chisel uses CORDIC hyperbolic system to realize square root calculation
2022-04-23 02:57:00 【AlwaysDayOne】
CORDIC( Coordinate rotation digital algorithm ) It's a computational triangle 、 Numerical algorithms for hyperbolic and other mathematical functions , Each operation produces a result output . This enables us to adjust the accuracy of the algorithm according to the application requirements ; More accurate results can be obtained by increasing the number of operation iterations .CORDIC Is to use only addition 、 Subtraction 、 Shift and look-up table implementation of a simple algorithm , This algorithm is in FPGA High efficiency in , It is often used in hardware algorithm implementation .
This article mainly introduces CORDIC Introduce the calculation of square root based on hyperbolic system .
HLS / Chisel Realization CORDIC Algorithmic hyperbolic system
principle
stay CORDIC In the vector mode of hyperbolic system , The final output is as follows :
{ x n = K n x 1 2 − y 1 2 y n = 0 z n = z 1 + tanh − 1 ( y 1 / x 1 ) K n = ∏ i = 1 n − 1 1 − 2 − 2 i σ i = { + 1 , y i ⩽ 0 − 1 , y i > 0 − 0.784 < t a n h ( z n ) = y 1 x 1 < 0.784 \begin{aligned} &\left\{\begin{array}{l} x_{n}=K_{n} \sqrt{x_{1}^{2}-y_{1}^{2}} \\ y_{n}=0 \\ z_{n}=z_{1}+\tanh ^{-1}\left(y_{1} / x_{1}\right) \\ K_{n}=\prod_{i=1}^{n-1} \sqrt{1-2^{-2 i}} \end{array}\right.\\ &\sigma_{i}=\left\{\begin{array}{l} +1, y_{i} \leqslant 0 \\ -1, y_{i}>0 \end{array}\right. \end{aligned} \\ -0.784<tanh({z_n}) =\frac {y_1} {x_1}<0.784 ⎩⎪⎪⎨⎪⎪⎧xn=Knx12−y12yn=0zn=z1+tanh−1(y1/x1)Kn=∏i=1n−11−2−2iσi={ +1,yi⩽0−1,yi>0−0.784<tanh(zn)=x1y1<0.784
The specific mathematical derivation can be seen in the article cited above .
We can notice that , In this iteration , x n x_n xn The square root function appears , So we can make a little change : set up a Is the square root to be found , set up x The initial value of a+1,y The initial value of a-1,z Any value can , After iteration x The value is :
x = K ∗ ( a + 1 ) 2 − ( a − 1 ) 2 = K ∗ 4 a = 2 K ∗ a x=K^{*} \sqrt{(a+1)^{2}-(a-1)^{2}}=K^{*} \sqrt{4 a}=2 K^{*} \sqrt{a} x=K∗(a+1)2−(a−1)2=K∗4a=2K∗a
It is known that
K = 1 / P = 0.8297816201389014 P = 1.205136358399695 \begin{gathered} K=1 / P= 0.8297816201389014 \\ P = 1.205136358399695 \end{gathered} K=1/P=0.8297816201389014P=1.205136358399695
Then the output of the vector mode of the hyperbolic system x n x_n xn ride P, Move one more bit to the right to get a The square root of
a = ( p ∗ x n ) > > 1 \sqrt{a}=(p*x_n)>>1 a=(p∗xn)>>1
Input range problem
be aware − 0.784 < t a n h ( z n ) = y 1 x 1 < 0.784 -0.784<tanh({z_n}) =\frac {y_1} {x_1}<0.784 −0.784<tanh(zn)=x1y1<0.784, Then there are − 0.784 < a − 1 a + 1 < 0.784 -0.784< \frac {a-1} {a+1}<0.784 −0.784<a+1a−1<0.784, therefore x The input range of needs to be limited to [0,8] Within the scope of .
Besides , because [0,1] The number in the range is too small , Its error in iterative numerical calculation is too large , The end result is not ideal , So we want the input range to be [1,8].
In order to expand the scope to R+, We can input a Use the shift operation to move N position ,N For the even , Shift to [1,8] Get after range a ‘ a` a‘, Finally through HV The result is obtained after module calculation x n x_n xn, In shift N/2 Just recover .
HLS practice
Be careful : Here only the square operation of the basic class is realized , The input problem is not solved by binary shift , Readers can refer to solve .
#define THETA_TYPE float
#define COS_SIN_TYPE float
#define NUM_ITERATIONS 50
THETA_TYPE cordic_P_mark_n = 1.205136358399695;
void cordic_hv_square_root(COS_SIN_TYPE a, COS_SIN_TYPE &x_out)
{
x = a+1;
y = a-1;
for(int j = 1;j <= NUM_ITERATIONS; j++){
COS_SIN_TYPE temp_x = x;
if(y < 0){
x = x + y >> j;
y = y + temp_x >> j;
theta = theta - cordic_phase_h[j-1];
} else{
x = x - y >> j;
y = y - temp_x >> j;
theta = theta + cordic_phase_h[j-1];
}
}
x_out = (x * cordic_P_mark_n) >> 1;
}
Chisel practice
Fixed point number definition
First define the number of fixed points , Pay attention to chisel.experimental There are fixed-point classes in the package
/* Cordic Global constant definition of */
trait Cordic_Const_Data extends HasDataConfig{
/* Number of iterations configuration */
val NUM_ITERATIONS = 20
}
trait HasDataConfig {
/* Bit width definition of fixed-point number */
val DataWidth = 32
val BinaryPoint = 20
}
The source code to achieve
Here, on the basis of the quoted article, it is changed cordic Algorithm HV Module code , Note that you only need to calculate x,y that will do , There is no need for... In the calculation of hyperbolic system z
- First, implement binary shift , And generate associated objects
class shift_range_1_8 extends Module with HasDataConfig{
/* * @x : Enter the number of bits to be moved Fixed point number type Value range R * @out : Output the shifted result Range [1,8] * @cnt : Number of shifts output , Move left to positive , Shift right to negative * details: Combinational logic circuits will x Shift to the range by double left and right [1,8], And output the number of shifts **/
val io = IO(new Bundle {
val x: FixedPoint = Input(FixedPoint(DataWidth.W, BinaryPoint.BP))
val out: FixedPoint = Output(FixedPoint(DataWidth.W, BinaryPoint.BP))
val cnt: SInt = Output(SInt(log2Ceil(DataWidth).W))
})
val temp_x: FixedPoint = io.x
when(temp_x < FixedPoint.fromDouble(1, DataWidth.W, BinaryPoint.BP)){
/* Than 0.5 Small , Need to move left */
val index = VecInit(Seq.fill(BinaryPoint>>1)(0.B))
for(i <- 0 until (BinaryPoint>>1)){
when((temp_x << (i<<1)) < FixedPoint.fromDouble(1, DataWidth.W, BinaryPoint.BP)){
index(i) := 0.B
}.otherwise{
index(i) := 1.B
}
}
/* The priority encoder first obtains greater than or equal to 1 The location of , namely cnt Value */
val temp_cnt = PriorityEncoder(index.asUInt())
io.cnt := (temp_cnt<<1).asSInt()
io.out := (temp_x << (temp_cnt<<1))
}.elsewhen(temp_x > FixedPoint.fromDouble(8, DataWidth.W, BinaryPoint.BP)){
/* Than 8 Big , Need to move right */
val index = VecInit(Seq.fill(DataWidth>>1)(0.B))
for(i <- 0 until (DataWidth>>1)){
when((temp_x >> (i<<1)) > FixedPoint.fromDouble(8, DataWidth.W, BinaryPoint.BP)){
index(i) := 0.B
}.otherwise{
index(i) := 1.B
}
}
/* The priority encoder first obtains less than or equal to 8 The location of , namely cnt Value */
val temp_cnt = PriorityEncoder(index.asUInt())
io.cnt := -((temp_cnt<<1).asSInt())
io.out := (temp_x >> (temp_cnt<<1))
}.otherwise{
io.cnt := 0.S
io.out := temp_x
}
}
/* * Define the companion object of this class , A factory method is defined to simplify the instantiation and wiring of modules . **/
object shift_range_1_8 {
def apply(x: FixedPoint): (FixedPoint, SInt) = {
/* * @x : Enter the number of bits to be moved Fixed point number type Value range R * @out : Output the shifted result Range [0.5,1] * @cnt : Number of shifts output , Move left to positive , Shift right to negative * @flag : Output x The positive and negative symbols of are 1,0 * details: take x Shift left and right to range [0.5,1], And output the number of shifts and whether to change the symbol **/
val unit = Module(new shift_range_1_8)
unit.io.x := x
(unit.io.out, unit.io.cnt)
}
}
- Realization hv Module to calculate the square root source code
class cordic_hv_square_root_origin extends Module with HasDataConfig with Cordic_Const_Data{
/* * @in : The horizontal axis coordinates of the input vector Fixed point number type in∈[0,8] * @out : Output x^0.5 Fixed point number type * details: utilize cordic The scaling of vector mode iteration in hyperbolic coordinate system is obtained (in+1,in-1) Output of coordinates x, namely 2K(in^0.5), cube 1/K=P, Just move one bit to the right . Be careful HV Mode limits y/x The scope of the , so in∈[0,8] */
val io = IO(new Bundle {
val in: FixedPoint = Input(FixedPoint(DataWidth.W, BinaryPoint.BP))
val out: FixedPoint = Output(FixedPoint(DataWidth.W, BinaryPoint.BP))
})
val x: FixedPoint = io.in + (1.F(DataWidth.W, BinaryPoint.BP))
val y: FixedPoint = io.in - (1.F(DataWidth.W, BinaryPoint.BP))
/* When approaching infinity ,K The value becomes constant */
val cordic_P_h: FixedPoint = FixedPoint.fromDouble(1.205136358399695, DataWidth.W, BinaryPoint.BP)
/* Initializes the calculated register array , formation NUM_ITERATIONS It's a cascade of water */
val current_x: Vec[FixedPoint] = RegInit(VecInit(Seq.fill(NUM_ITERATIONS)(FixedPoint.fromDouble(0.0, DataWidth.W, BinaryPoint.BP)))) // cos
val current_y: Vec[FixedPoint] = RegInit(VecInit(Seq.fill(NUM_ITERATIONS)(FixedPoint.fromDouble(0.0, DataWidth.W, BinaryPoint.BP)))) // sin
for (i <- 1 to NUM_ITERATIONS) {
/* * x[i] = K`(x[i-1] + sigma * 2^(-i) * y[i-1]) * y[i] = K`(y[i-1] + sigma * 2^(-i) * x[i-1]) * z[i] = z[i-1] - sigma * arctanh(2^(-i)) **/
if (i == 1) {
/* The first stage of the pipeline is directly connected to (io.x,io.y) Click to do the operation */
when(y < FixedPoint.fromDouble(0.0, DataWidth.W, BinaryPoint.BP)) {
current_x(i - 1) := x + (y >> i)
current_y(i - 1) := y + (x >> i)
}.otherwise {
current_x(i - 1) := x - (y >> i)
current_y(i - 1) := y - (x >> i)
}
} else {
when(current_y(i - 2) < FixedPoint.fromDouble(0.0, DataWidth.W, BinaryPoint.BP)) {
current_x(i - 1) := current_x(i - 2) + (current_y(i - 2) >> i) // Shift instead of multiplication
current_y(i - 1) := current_y(i - 2) + (current_x(i - 2) >> i)
}.otherwise {
current_x(i - 1) := current_x(i - 2) - (current_y(i - 2) >> i)
current_y(i - 1) := current_y(i - 2) - (current_x(i - 2) >> i)
}
}
}
io.out := (current_x(NUM_ITERATIONS - 1) * cordic_P_h ) >> 1
}
test
import org.scalatest._
import chisel3._
import chiseltest._
import chisel3.experimental._
import scala.math._
class Cordic_Square_Root_Tester extends FlatSpec with ChiselScalatestTester with Matchers {
behavior of "mytest2"
it should "do something" in {
test(new cordic_hv_square_root) {
c =>
c.io.in.poke(2.F(32.W, 20.BP))
c.clock.step(20)
println(s"out ${
c.io.out.peek}\n")
c.io.in.poke(4.F(32.W, 20.BP))
c.clock.step(20)
println(s"out ${
c.io.out.peek}\n")
c.io.in.poke(8.F(32.W, 20.BP))
c.clock.step(20)
println(s"out ${
c.io.out.peek}\n")
c.io.in.poke(16.F(32.W, 20.BP))
c.clock.step(20)
println(s"out ${
c.io.out.peek}\n")
c.io.in.poke(64.F(32.W, 20.BP))
c.clock.step(20)
println(s"out ${
c.io.out.peek}\n")
c.io.in.poke(128.F(32.W, 20.BP))
c.clock.step(20)
println(s"out ${
c.io.out.peek}\n")
// Note that there 32 There are not enough seats
// c.io.in.poke(FixedPoint.fromDouble(65532,32.W, 20.BP))
// c.clock.step(20)
// println(s"out ${c.io.out.peek}\n")
c.io.in.poke(0.00132.F(32.W, 20.BP))
c.clock.step(20)
println(s"out ${
c.io.out.peek}\n")
c.io.in.poke(0.6315.F(32.W, 20.BP))
c.clock.step(20)
println(s"out ${
c.io.out.peek}\n")
c.io.in.poke(0.000632.F(32.W, 20.BP))
c.clock.step(20)
println(s"out ${
c.io.out.peek}\n")
}
}
}
// perform sbt "testOnly Cordic_Square_Root_Tester"
版权声明
本文为[AlwaysDayOne]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/04/202204220648125266.html
边栏推荐
- LeetCode 1450 - 1453
- Practical combat of industrial defect detection project (II) -- steel surface defect detection based on deep learning framework yolov5
- 《信息系统项目管理师总结》第六章 项目人力资源管理
- Numpy append function
- windows MySQL8 zip安装
- Chapter V project quality management of information system project manager summary
- Encapsulation of ele table
- Opencv reads webcam video and saves it locally
- eventBus
- Kubernetes - detailed explanation of pod
猜你喜欢
windows MySQL8 zip安装
Linux Redis——Redis 数据库缓存服务
AC & A2C & A3C
工业互联网+危化安全生产综合管理平台怎样建
Leangoo brain map - shared multi person collaborative mind mapping tool
The interface request takes too long. Jstack observes the lock holding
Windows MySQL 8 zip installation
Airtrack cracking wireless network password (Dictionary running method)
Six very 6 computer driver managers: what software is good for driver upgrade? Recommended by the best computer driver management software abroad
[hcip] detailed explanation of six LSAS commonly used by OSPF
随机推荐
L2-006 樹的遍曆(中後序確定二叉樹&層序遍曆)
Table space capacity query and expansion of Oracle Database
Learn regular expression options, assertions
重大危险源企业如何保障年底前完成双预防机制数字化建设任务
JS using the parameters of art template
MySQL complex query uses temporary table / with as (similar to table variable)
How to deploy a website with only a server and no domain name?
Domestic lightweight Kanban scrum agile project management tool
解决win7 中powershell挖矿占用CPU100%
Log4j知识点记录
The interface request takes too long. Jstack observes the lock holding
学习正则表达式选项、断言
Actual combat of industrial defect detection project (IV) -- ceramic defect detection based on hrnet
[learn junit5 from official documents] [II] [writingtests] [learning notes]
The problem of removing spaces from strings
The space between the left and right of the movie ticket seats is empty and cannot be selected
Airtrack cracking wireless network password (Dictionary running method)
Devil cold rice 𞓜 078 devil answers the market in Shanghai and Nanjing; Communication and guidance; Winning the country and killing and screening; The purpose of making money; Change other people's op
《信息系統項目管理師總結》第六章 項目人力資源管理
Slave should be able to synchronize with the master in tests/integration/replication-psync.tcl