当前位置:网站首页>HLS / chisel uses CORDIC hyperbolic system to realize square root calculation
HLS / chisel uses CORDIC hyperbolic system to realize square root calculation
2022-04-23 02:57:00 【AlwaysDayOne】
CORDIC( Coordinate rotation digital algorithm ) It's a computational triangle 、 Numerical algorithms for hyperbolic and other mathematical functions , Each operation produces a result output . This enables us to adjust the accuracy of the algorithm according to the application requirements ; More accurate results can be obtained by increasing the number of operation iterations .CORDIC Is to use only addition 、 Subtraction 、 Shift and look-up table implementation of a simple algorithm , This algorithm is in FPGA High efficiency in , It is often used in hardware algorithm implementation .
This article mainly introduces CORDIC Introduce the calculation of square root based on hyperbolic system .
HLS / Chisel Realization CORDIC Algorithmic hyperbolic system
principle
stay CORDIC In the vector mode of hyperbolic system , The final output is as follows :
{ x n = K n x 1 2 − y 1 2 y n = 0 z n = z 1 + tanh − 1 ( y 1 / x 1 ) K n = ∏ i = 1 n − 1 1 − 2 − 2 i σ i = { + 1 , y i ⩽ 0 − 1 , y i > 0 − 0.784 < t a n h ( z n ) = y 1 x 1 < 0.784 \begin{aligned} &\left\{\begin{array}{l} x_{n}=K_{n} \sqrt{x_{1}^{2}-y_{1}^{2}} \\ y_{n}=0 \\ z_{n}=z_{1}+\tanh ^{-1}\left(y_{1} / x_{1}\right) \\ K_{n}=\prod_{i=1}^{n-1} \sqrt{1-2^{-2 i}} \end{array}\right.\\ &\sigma_{i}=\left\{\begin{array}{l} +1, y_{i} \leqslant 0 \\ -1, y_{i}>0 \end{array}\right. \end{aligned} \\ -0.784<tanh({z_n}) =\frac {y_1} {x_1}<0.784 ⎩⎪⎪⎨⎪⎪⎧xn=Knx12−y12yn=0zn=z1+tanh−1(y1/x1)Kn=∏i=1n−11−2−2iσi={ +1,yi⩽0−1,yi>0−0.784<tanh(zn)=x1y1<0.784
The specific mathematical derivation can be seen in the article cited above .
We can notice that , In this iteration , x n x_n xn The square root function appears , So we can make a little change : set up a Is the square root to be found , set up x The initial value of a+1,y The initial value of a-1,z Any value can , After iteration x The value is :
x = K ∗ ( a + 1 ) 2 − ( a − 1 ) 2 = K ∗ 4 a = 2 K ∗ a x=K^{*} \sqrt{(a+1)^{2}-(a-1)^{2}}=K^{*} \sqrt{4 a}=2 K^{*} \sqrt{a} x=K∗(a+1)2−(a−1)2=K∗4a=2K∗a
It is known that
K = 1 / P = 0.8297816201389014 P = 1.205136358399695 \begin{gathered} K=1 / P= 0.8297816201389014 \\ P = 1.205136358399695 \end{gathered} K=1/P=0.8297816201389014P=1.205136358399695
Then the output of the vector mode of the hyperbolic system x n x_n xn ride P, Move one more bit to the right to get a The square root of
a = ( p ∗ x n ) > > 1 \sqrt{a}=(p*x_n)>>1 a=(p∗xn)>>1
Input range problem
be aware − 0.784 < t a n h ( z n ) = y 1 x 1 < 0.784 -0.784<tanh({z_n}) =\frac {y_1} {x_1}<0.784 −0.784<tanh(zn)=x1y1<0.784, Then there are − 0.784 < a − 1 a + 1 < 0.784 -0.784< \frac {a-1} {a+1}<0.784 −0.784<a+1a−1<0.784, therefore x The input range of needs to be limited to [0,8] Within the scope of .
Besides , because [0,1] The number in the range is too small , Its error in iterative numerical calculation is too large , The end result is not ideal , So we want the input range to be [1,8].
In order to expand the scope to R+, We can input a Use the shift operation to move N position ,N For the even , Shift to [1,8] Get after range a ‘ a` a‘, Finally through HV The result is obtained after module calculation x n x_n xn, In shift N/2 Just recover .
HLS practice
Be careful : Here only the square operation of the basic class is realized , The input problem is not solved by binary shift , Readers can refer to solve .
#define THETA_TYPE float
#define COS_SIN_TYPE float
#define NUM_ITERATIONS 50
THETA_TYPE cordic_P_mark_n = 1.205136358399695;
void cordic_hv_square_root(COS_SIN_TYPE a, COS_SIN_TYPE &x_out)
{
x = a+1;
y = a-1;
for(int j = 1;j <= NUM_ITERATIONS; j++){
COS_SIN_TYPE temp_x = x;
if(y < 0){
x = x + y >> j;
y = y + temp_x >> j;
theta = theta - cordic_phase_h[j-1];
} else{
x = x - y >> j;
y = y - temp_x >> j;
theta = theta + cordic_phase_h[j-1];
}
}
x_out = (x * cordic_P_mark_n) >> 1;
}
Chisel practice
Fixed point number definition
First define the number of fixed points , Pay attention to chisel.experimental There are fixed-point classes in the package
/* Cordic Global constant definition of */
trait Cordic_Const_Data extends HasDataConfig{
/* Number of iterations configuration */
val NUM_ITERATIONS = 20
}
trait HasDataConfig {
/* Bit width definition of fixed-point number */
val DataWidth = 32
val BinaryPoint = 20
}
The source code to achieve
Here, on the basis of the quoted article, it is changed cordic Algorithm HV Module code , Note that you only need to calculate x,y that will do , There is no need for... In the calculation of hyperbolic system z
- First, implement binary shift , And generate associated objects
class shift_range_1_8 extends Module with HasDataConfig{
/* * @x : Enter the number of bits to be moved Fixed point number type Value range R * @out : Output the shifted result Range [1,8] * @cnt : Number of shifts output , Move left to positive , Shift right to negative * details: Combinational logic circuits will x Shift to the range by double left and right [1,8], And output the number of shifts **/
val io = IO(new Bundle {
val x: FixedPoint = Input(FixedPoint(DataWidth.W, BinaryPoint.BP))
val out: FixedPoint = Output(FixedPoint(DataWidth.W, BinaryPoint.BP))
val cnt: SInt = Output(SInt(log2Ceil(DataWidth).W))
})
val temp_x: FixedPoint = io.x
when(temp_x < FixedPoint.fromDouble(1, DataWidth.W, BinaryPoint.BP)){
/* Than 0.5 Small , Need to move left */
val index = VecInit(Seq.fill(BinaryPoint>>1)(0.B))
for(i <- 0 until (BinaryPoint>>1)){
when((temp_x << (i<<1)) < FixedPoint.fromDouble(1, DataWidth.W, BinaryPoint.BP)){
index(i) := 0.B
}.otherwise{
index(i) := 1.B
}
}
/* The priority encoder first obtains greater than or equal to 1 The location of , namely cnt Value */
val temp_cnt = PriorityEncoder(index.asUInt())
io.cnt := (temp_cnt<<1).asSInt()
io.out := (temp_x << (temp_cnt<<1))
}.elsewhen(temp_x > FixedPoint.fromDouble(8, DataWidth.W, BinaryPoint.BP)){
/* Than 8 Big , Need to move right */
val index = VecInit(Seq.fill(DataWidth>>1)(0.B))
for(i <- 0 until (DataWidth>>1)){
when((temp_x >> (i<<1)) > FixedPoint.fromDouble(8, DataWidth.W, BinaryPoint.BP)){
index(i) := 0.B
}.otherwise{
index(i) := 1.B
}
}
/* The priority encoder first obtains less than or equal to 8 The location of , namely cnt Value */
val temp_cnt = PriorityEncoder(index.asUInt())
io.cnt := -((temp_cnt<<1).asSInt())
io.out := (temp_x >> (temp_cnt<<1))
}.otherwise{
io.cnt := 0.S
io.out := temp_x
}
}
/* * Define the companion object of this class , A factory method is defined to simplify the instantiation and wiring of modules . **/
object shift_range_1_8 {
def apply(x: FixedPoint): (FixedPoint, SInt) = {
/* * @x : Enter the number of bits to be moved Fixed point number type Value range R * @out : Output the shifted result Range [0.5,1] * @cnt : Number of shifts output , Move left to positive , Shift right to negative * @flag : Output x The positive and negative symbols of are 1,0 * details: take x Shift left and right to range [0.5,1], And output the number of shifts and whether to change the symbol **/
val unit = Module(new shift_range_1_8)
unit.io.x := x
(unit.io.out, unit.io.cnt)
}
}
- Realization hv Module to calculate the square root source code
class cordic_hv_square_root_origin extends Module with HasDataConfig with Cordic_Const_Data{
/* * @in : The horizontal axis coordinates of the input vector Fixed point number type in∈[0,8] * @out : Output x^0.5 Fixed point number type * details: utilize cordic The scaling of vector mode iteration in hyperbolic coordinate system is obtained (in+1,in-1) Output of coordinates x, namely 2K(in^0.5), cube 1/K=P, Just move one bit to the right . Be careful HV Mode limits y/x The scope of the , so in∈[0,8] */
val io = IO(new Bundle {
val in: FixedPoint = Input(FixedPoint(DataWidth.W, BinaryPoint.BP))
val out: FixedPoint = Output(FixedPoint(DataWidth.W, BinaryPoint.BP))
})
val x: FixedPoint = io.in + (1.F(DataWidth.W, BinaryPoint.BP))
val y: FixedPoint = io.in - (1.F(DataWidth.W, BinaryPoint.BP))
/* When approaching infinity ,K The value becomes constant */
val cordic_P_h: FixedPoint = FixedPoint.fromDouble(1.205136358399695, DataWidth.W, BinaryPoint.BP)
/* Initializes the calculated register array , formation NUM_ITERATIONS It's a cascade of water */
val current_x: Vec[FixedPoint] = RegInit(VecInit(Seq.fill(NUM_ITERATIONS)(FixedPoint.fromDouble(0.0, DataWidth.W, BinaryPoint.BP)))) // cos
val current_y: Vec[FixedPoint] = RegInit(VecInit(Seq.fill(NUM_ITERATIONS)(FixedPoint.fromDouble(0.0, DataWidth.W, BinaryPoint.BP)))) // sin
for (i <- 1 to NUM_ITERATIONS) {
/* * x[i] = K`(x[i-1] + sigma * 2^(-i) * y[i-1]) * y[i] = K`(y[i-1] + sigma * 2^(-i) * x[i-1]) * z[i] = z[i-1] - sigma * arctanh(2^(-i)) **/
if (i == 1) {
/* The first stage of the pipeline is directly connected to (io.x,io.y) Click to do the operation */
when(y < FixedPoint.fromDouble(0.0, DataWidth.W, BinaryPoint.BP)) {
current_x(i - 1) := x + (y >> i)
current_y(i - 1) := y + (x >> i)
}.otherwise {
current_x(i - 1) := x - (y >> i)
current_y(i - 1) := y - (x >> i)
}
} else {
when(current_y(i - 2) < FixedPoint.fromDouble(0.0, DataWidth.W, BinaryPoint.BP)) {
current_x(i - 1) := current_x(i - 2) + (current_y(i - 2) >> i) // Shift instead of multiplication
current_y(i - 1) := current_y(i - 2) + (current_x(i - 2) >> i)
}.otherwise {
current_x(i - 1) := current_x(i - 2) - (current_y(i - 2) >> i)
current_y(i - 1) := current_y(i - 2) - (current_x(i - 2) >> i)
}
}
}
io.out := (current_x(NUM_ITERATIONS - 1) * cordic_P_h ) >> 1
}
test
import org.scalatest._
import chisel3._
import chiseltest._
import chisel3.experimental._
import scala.math._
class Cordic_Square_Root_Tester extends FlatSpec with ChiselScalatestTester with Matchers {
behavior of "mytest2"
it should "do something" in {
test(new cordic_hv_square_root) {
c =>
c.io.in.poke(2.F(32.W, 20.BP))
c.clock.step(20)
println(s"out ${
c.io.out.peek}\n")
c.io.in.poke(4.F(32.W, 20.BP))
c.clock.step(20)
println(s"out ${
c.io.out.peek}\n")
c.io.in.poke(8.F(32.W, 20.BP))
c.clock.step(20)
println(s"out ${
c.io.out.peek}\n")
c.io.in.poke(16.F(32.W, 20.BP))
c.clock.step(20)
println(s"out ${
c.io.out.peek}\n")
c.io.in.poke(64.F(32.W, 20.BP))
c.clock.step(20)
println(s"out ${
c.io.out.peek}\n")
c.io.in.poke(128.F(32.W, 20.BP))
c.clock.step(20)
println(s"out ${
c.io.out.peek}\n")
// Note that there 32 There are not enough seats
// c.io.in.poke(FixedPoint.fromDouble(65532,32.W, 20.BP))
// c.clock.step(20)
// println(s"out ${c.io.out.peek}\n")
c.io.in.poke(0.00132.F(32.W, 20.BP))
c.clock.step(20)
println(s"out ${
c.io.out.peek}\n")
c.io.in.poke(0.6315.F(32.W, 20.BP))
c.clock.step(20)
println(s"out ${
c.io.out.peek}\n")
c.io.in.poke(0.000632.F(32.W, 20.BP))
c.clock.step(20)
println(s"out ${
c.io.out.peek}\n")
}
}
}
// perform sbt "testOnly Cordic_Square_Root_Tester"
版权声明
本文为[AlwaysDayOne]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/04/202204220648125266.html
边栏推荐
猜你喜欢

基于ele封装下拉菜单等组件

C language 171 Number of recent palindromes

ele之Table表格的封装
![[unity3d] rolling barrage effect in live broadcasting room](/img/61/46a7d6c4bf887fca8f088e7673cf2f.png)
[unity3d] rolling barrage effect in live broadcasting room

接口请求时间太长,jstack观察锁持有情况

1215_ Hello world used by scons

Encapsulation of ele table

Liunx foundation - zabbix5 0 monitoring system installation and deployment

Slave should be able to synchronize with the master in tests/integration/replication-psync.tcl

Solve the problem that PowerShell mining occupies 100% of cpu7 in win7
随机推荐
Face longitude:
Jz76 delete duplicate nodes in linked list
Close the computer port
基于Scrum进行创新和管理
grain rain
Linux Redis——Redis 数据库缓存服务
工业互联网+危化安全生产综合管理平台怎样建
Step principle of logical regression in machine learning
MySQL / SQL Server判断表或临时表存在则删除
Interim summary (Introduction + application layer + transportation layer)
Chapter V project quality management of information system project manager summary
JS using the parameters of art template
Mosaic Routing: implement / home / news
Specific field information of MySQL export table (detailed operation of Navicat client)
B blocks of the 46th ICPC Asian regional competition (Kunming)
The shell monitors the depth of the IBM MQ queue and scans it three times in 10s. When the depth value exceeds 5 for more than two times, the queue name and depth value are output.
LeetCode 1450 - 1453
The penultimate K nodes in jz22 linked list
Les derniers noeuds K de la liste jz22
《信息系统项目管理师总结》第七章 项目沟通管理