simple interface to SIMD instructions


octu0/go-intrin

go-intrin is a Go library that provides a simple interface to SIMD (MMX/SSE/SSE2/AVX/AVX2) instructions. It currently supports Intel x86.

Installation

go get github.com/octu0/go-intrin

Example

Float32/Float64

    import "github.com/octu0/go-intrin/x86"

    func Add() {
        a := [4]float32{1.0, 3.0, 5.0, 11.0}
        b := [4]float32{2.0, 4.0, 6.0, 8.0}
        r := x86.Float32Add(a, b)
        _ = r[0] // 3.0
        _ = r[1] // 7.0
        _ = r[2] // 11.0
        _ = r[3] // 19.0
    }

    func Sub() {
        a := [4]float32{1.0, 3.0, -5.0, -11.0}
        b := [4]float32{2.0, -4.0, 6.0, -8.0}
        r := x86.Float32Sub(a, b)
        _ = r[0] // -1.0
        _ = r[1] // 7.0
        _ = r[2] // -11.0
        _ = r[3] // -3.0
    }

    func Mul() {
        a := [2]float64{1.0, -1.0}
        b := [2]float64{-2.0, -5.0}
        _ = x86.Float64Mul(a, b)
    }

    func Div() {
        a := [4]float64{-1.1, 5.2, 3.0, 4.2}
        b := [4]float64{-100.0, 20.0, 2.0, 2.0}
        _ = x86.Float64Div4(a, b)
    }
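For checking results without cgo, a scalar reference can be useful. The sketch below is pure Go and is not part of go-intrin: `addFloat32x4` and `subFloat32x4` are hypothetical helper names, and the sketch assumes the SIMD functions operate lane by lane on fixed-size arrays, as the examples above suggest.

```go
package main

import "fmt"

// addFloat32x4 is a hypothetical scalar reference (not a go-intrin function).
// It adds two 4-lane float32 vectors lane by lane.
func addFloat32x4(a, b [4]float32) [4]float32 {
	var r [4]float32
	for i := range a {
		r[i] = a[i] + b[i]
	}
	return r
}

// subFloat32x4 is a hypothetical scalar reference (not a go-intrin function).
// It subtracts b from a lane by lane.
func subFloat32x4(a, b [4]float32) [4]float32 {
	var r [4]float32
	for i := range a {
		r[i] = a[i] - b[i]
	}
	return r
}

func main() {
	a := [4]float32{1.0, 3.0, -5.0, -11.0}
	b := [4]float32{2.0, -4.0, 6.0, -8.0}
	fmt.Println(addFloat32x4(a, b)) // [3 -1 1 -19]
	fmt.Println(subFloat32x4(a, b)) // [-1 7 -11 -3]
}
```

A reference like this could be compared against the SIMD result in a unit test, since all values here are exactly representable in float32.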

Int8/Int16/Uint8/Uint16/Int32

    import "github.com/octu0/go-intrin/x86"

    func Add() {
        a := [16]int8{
            1, 2, 3, 4, 5, 6, 7, 8,
            -1, 2, 3, -4, -5, -6, -7, -8,
        }
        b := [16]int8{
            11, 12, 13, 14, 15, 16, 17, 18,
            100, 125, 126, -128, -124, 127, -2, -3,
        }
        _ = x86.Int8Add(a, b)
    }

    func Sub() {
        a := [8]int16{1, 2, 32767, -32768, 1, 1, -32768, 0}
        b := [8]int16{1, 3, 1, 1, -32768, 32767, -1, -1}
        _ = x86.Int16Sub(a, b)
    }

    func Max() {
        a := [16]uint8{
            1, 2, 3, 4, 5, 6, 7, 8,
            255, 250, 0, 255, 127, 6, 7, 8,
        }
        b := [16]uint8{
            0, 1, 10, 255, 254, 0, 7, 8,
            255, 251, 255, 1, 128, 7, 6, 9,
        }
        _ = x86.Uint8Max(a, b)
    }

    func Avg() {
        a := [8]uint16{1, 2, 32767, 65535, 1, 65535, 0, 0}
        b := [8]uint16{1, 3, 1, 1, 65535, 65535, 0, 1}
        // Uint8Avg is the name given in the source; since a and b are
        // [8]uint16, a 16-bit variant is presumably intended.
        _ = x86.Uint8Avg(a, b)
    }

    func Abs() {
        a := [8]int32{1, -2, -3, -100, -255, 1024, 777, -888}
        _ = x86.Int32Abs(a)
    }
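The Max and Avg examples can be modelled in scalar Go to see what values to expect. This sketch is pure Go, not go-intrin code: `maxUint8x16` and `avgUint16x8` are hypothetical names, and it assumes the library wraps the SSE2 intrinsics `_mm_max_epu8` and `_mm_avg_epu16`, whose average rounds up, i.e. `(a + b + 1) >> 1`. If the library uses a different instruction, the rounding may differ.

```go
package main

import "fmt"

// maxUint8x16 is a hypothetical scalar model of a 16-lane unsigned byte max.
func maxUint8x16(a, b [16]uint8) [16]uint8 {
	var r [16]uint8
	for i := range a {
		if a[i] > b[i] {
			r[i] = a[i]
		} else {
			r[i] = b[i]
		}
	}
	return r
}

// avgUint16x8 is a hypothetical scalar model of an 8-lane rounded average.
// It widens to uint32 so that a + b + 1 cannot overflow, then halves.
func avgUint16x8(a, b [8]uint16) [8]uint16 {
	var r [8]uint16
	for i := range a {
		r[i] = uint16((uint32(a[i]) + uint32(b[i]) + 1) >> 1)
	}
	return r
}

func main() {
	ma := [16]uint8{1, 2, 3, 4, 5, 6, 7, 8, 255, 250, 0, 255, 127, 6, 7, 8}
	mb := [16]uint8{0, 1, 10, 255, 254, 0, 7, 8, 255, 251, 255, 1, 128, 7, 6, 9}
	fmt.Println(maxUint8x16(ma, mb))
	// [1 2 10 255 254 6 7 8 255 251 255 255 128 7 7 9]

	aa := [8]uint16{1, 2, 32767, 65535, 1, 65535, 0, 0}
	ab := [8]uint16{1, 3, 1, 1, 65535, 65535, 0, 1}
	fmt.Println(avgUint16x8(aa, ab))
	// [1 3 16384 32768 32768 65535 0 1]
}
```

Widening before the addition is the design point here: averaging 65535 and 65535 in 16 bits would overflow, whereas the hardware instruction computes the intermediate sum at higher precision.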

Benchmark

The performance of SIMD benchmarks can vary depending on the execution environment.
In some cases, performance can be better than native Go even when using cgo to access SIMD instructions.

Darwin & Core i5

    goos: darwin
    goarch: amd64
    pkg: github.com/octu0/go-intrin/x86
    cpu: Intel(R) Core(TM) i5-8210Y CPU @ 1.60GHz
    BenchmarkInt16
    BenchmarkInt16/sum/go
    BenchmarkInt16/sum/go-4         1000000000         0.3100 ns/op
    BenchmarkInt16/sum/simd
    BenchmarkInt16/sum/simd-4       1000000000         0.1590 ns/op
    BenchmarkInt8
    BenchmarkInt8/sum/go
    BenchmarkInt8/sum/go-4          1000000000         0.3011 ns/op
    BenchmarkInt8/sum/simd
    BenchmarkInt8/sum/simd-4        1000000000         0.08489 ns/op

Linux & Xeon

    goos: linux
    goarch: amd64
    pkg: github.com/octu0/go-intrin/x86
    cpu: Intel(R) Xeon(R) W-11955M CPU @ 2.60GHz
    BenchmarkFloat32
    BenchmarkFloat32/sum/go
    BenchmarkFloat32/sum/go-16         585412603         2.097 ns/op
    BenchmarkFloat32/sum/simd
    BenchmarkFloat32/sum/simd-16       1000000000         0.8212 ns/op

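Because each cgo call has overhead, the Grayscale benchmark gets faster as the number of cgo calls is reduced: results range from simd/small (the slowest, well behind pure Go) to simd/full (the fastest, and the only variant that beats pure Go).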

    goos: linux
    goarch: amd64
    pkg: github.com/octu0/go-intrin/x86
    cpu: Intel(R) Xeon(R) W-11955M CPU @ 2.60GHz
    BenchmarkGrayscale
    BenchmarkGrayscale/go
    BenchmarkGrayscale/go-16            1923    596976 ns/op
    BenchmarkGrayscale/simd/small
    BenchmarkGrayscale/simd/small-16    100   10063526 ns/op
    BenchmarkGrayscale/simd/medium
    BenchmarkGrayscale/simd/medium-16   1087   1112508 ns/op
    BenchmarkGrayscale/simd/full
    BenchmarkGrayscale/simd/full-16     3070    382772 ns/op
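To make the Grayscale numbers easier to compare, the ns/op values can be turned into ratios against the pure Go result. The snippet below is only arithmetic on the figures listed above; `speedup` is a hypothetical helper name.

```go
package main

import "fmt"

// speedup returns how many times faster a variant is than the baseline,
// given both timings in ns/op. Values below 1 mean the variant is slower.
func speedup(baseNs, variantNs float64) float64 {
	return baseNs / variantNs
}

func main() {
	const goNs = 596976.0 // BenchmarkGrayscale/go

	variants := []struct {
		name string
		ns   float64
	}{
		{"simd/small", 10063526},
		{"simd/medium", 1112508},
		{"simd/full", 382772},
	}
	for _, v := range variants {
		fmt.Printf("%-12s %.2fx\n", v.name, speedup(goNs, v.ns))
	}
}
```

On these figures the ratios come out at roughly 0.06x, 0.54x, and 1.56x, so simd/small is about 17 times slower than pure Go while simd/full is about 1.6 times faster.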

The pure Go implementation of Grayscale is as follows:

    import (
        "image"
        "image/color"
    )

    func Grayscale(src *image.NRGBA) *image.NRGBA {
        b := src.Bounds()
        w, h := b.Dx(), b.Dy()
        out := image.NewNRGBA(b)
        for y := 0; y < h; y += 1 {
            for x := 0; x < w; x += 1 {
                c := src.NRGBAAt(x, y)
                // BT.709
                gray := byte((0.2126 * float32(c.R)) + (0.7152 * float32(c.G)) + (0.0722 * float32(c.B)))
                out.SetNRGBA(x, y, color.NRGBA{
                    R: gray,
                    G: gray,
                    B: gray,
                    A: 0xff,
                })
            }
        }
        return out
    }

The SIMD version is a C function, called from Go as C.simd_grayscale (this code is used in the simd/full benchmark):

    static void simd_grayscale(uint8_t *out, uint8_t *in, int size) {
        __m128 bt709 = _mm_setr_ps(0.2126f, 0.7152f, 0.0722f, 0.0f);
        uint8_t gray[8];
        for (int i = 0; i < size; i += 16) {
            __m64 m1 = _mm_setr_pi8(in[i+0], in[i+1], in[i+2], in[i+3], 0, 0, 0, 0);
            __m64 m2 = _mm_setr_pi8(in[i+4], in[i+5], in[i+6], in[i+7], 0, 0, 0, 0);
            __m64 m3 = _mm_setr_pi8(in[i+8], in[i+9], in[i+10], in[i+11], 0, 0, 0, 0);
            __m64 m4 = _mm_setr_pi8(in[i+12], in[i+13], in[i+14], in[i+15], 0, 0, 0, 0);
            __m128 rgba1 = _mm_mul_ps(_mm_cvtpu8_ps(m1), bt709);
            __m128 rgba2 = _mm_mul_ps(_mm_cvtpu8_ps(m2), bt709);
            __m128 rgba3 = _mm_mul_ps(_mm_cvtpu8_ps(m3), bt709);
            __m128 rgba4 = _mm_mul_ps(_mm_cvtpu8_ps(m4), bt709);
            __m128 r = _mm_setr_ps(rgba1[0], rgba2[0], rgba3[0], rgba4[0]); // R
            __m128 g = _mm_setr_ps(rgba1[1], rgba2[1], rgba3[1], rgba4[1]); // G
            __m128 b = _mm_setr_ps(rgba1[2], rgba2[2], rgba3[2], rgba4[2]); // B
            // gray = [rgba1, rgba2, rgba3, rgba4]
            __m128 gray_float = _mm_add_ps(_mm_add_ps(r, g), b);
            __m64 gray_u8 = _mm_cvtps_pi8(gray_float);
            memcpy(&gray, &gray_u8, sizeof(__m64));
            uint8_t tmp[16] = {
                gray[0], gray[0], gray[0], 255,
                gray[1], gray[1], gray[1], 255,
                gray[2], gray[2], gray[2], 255,
                gray[3], gray[3], gray[3], 255
            };
            memcpy(out + i, &tmp, 16);
        }
    }
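The C kernel works on the raw pixel bytes, 4 bytes per pixel and 16 bytes (4 pixels) per loop iteration. As a rough way to reason about that memory layout without cgo, here is a pure-Go sketch over the same kind of buffer (for example the Pix slice of an image.NRGBA). `grayscalePix` is a hypothetical helper, not part of go-intrin, and it truncates like the pure Go version; the C kernel converts with `_mm_cvtps_pi8`, so its rounding and saturation may differ.

```go
package main

import "fmt"

// grayscalePix is a hypothetical scalar helper (not a go-intrin function).
// in holds 4 bytes per pixel in R, G, B, A order; out must be at least as
// long as in. Each output pixel becomes (gray, gray, gray, 255).
func grayscalePix(out, in []uint8) {
	for i := 0; i+3 < len(in); i += 4 {
		// BT.709 weights, truncated toward zero when converted to uint8.
		gray := uint8(0.2126*float32(in[i]) + 0.7152*float32(in[i+1]) + 0.0722*float32(in[i+2]))
		out[i], out[i+1], out[i+2], out[i+3] = gray, gray, gray, 0xff
	}
}

func main() {
	// Four pixels: pure red, pure green, pure blue (each at 100), and black.
	in := []uint8{
		100, 0, 0, 10,
		0, 100, 0, 20,
		0, 0, 100, 30,
		0, 0, 0, 40,
	}
	out := make([]uint8, len(in))
	grayscalePix(out, in)
	fmt.Println(out) // [21 21 21 255 71 71 71 255 7 7 7 255 0 0 0 255]
}
```

Unlike the C loop, which always reads 16 bytes at a time, this sketch advances one pixel per iteration, so it also handles buffers whose length is not a multiple of 16.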

Enable FMA

By default, cgo does not allow -mfma in CFLAGS.

Allow it with the following command:

$ go env -w "CGO_CFLAGS_ALLOW=-mfma"

Then build with the fma build tag:

$ go build -tags fma github.com/octu0/go-intrin

Testing

$ go test -tags fma -v ./x86/

License

MIT, see LICENSE file for details.

