A Crypto-Secure Reliable-UDP Library for golang with FEC

xtaci/kcp-go

kcp-go


Introduction

kcp-go is a Reliable-UDP library for golang.

This library is designed to provide smooth, resilient, ordered, error-checked and anonymous delivery of streams over UDP packets. It has been battle-tested with the open-source project kcptun. Millions of devices, ranging from low-end MIPS routers to high-end servers, have deployed kcp-go-powered programs in various applications, including online games, live broadcasting, file synchronization, and network acceleration.

Latest Release

Features

  1. Designed for latency-sensitive scenarios.
  2. Cache-friendly and memory-optimized design, offering an extremely high-performance core.
  3. Handles >5K concurrent connections on a single commodity server.
  4. Compatible with net.Conn and net.Listener, serving as a drop-in replacement for net.TCPConn.
  5. FEC (Forward Error Correction) support with Reed-Solomon Codes.
  6. Packet-level encryption support with AES, TEA, 3DES, Blowfish, Cast5, Salsa20, etc., in CFB mode, generating completely anonymous packets.
  7. Only a fixed number of goroutines are created for the entire server application, with the cost of context switching between goroutines taken into consideration.
  8. Compatible with skywind3000's C version with various improvements.
  9. Platform-dependent optimizations: sendmmsg and recvmmsg exploited for Linux.

Documentation

For complete documentation, see the associated Godoc.

Specification

Frame Format

    NONCE:
      16 bytes cryptographically secure random number, nonce changes for every packet.

    CRC32:
      CRC-32 checksum of data using the IEEE polynomial

    FEC TYPE:
      typeData = 0xF1
      typeParity = 0xF2

    FEC SEQID:
      monotonically increasing in range: [0, (0xffffffff/shardSize) * shardSize - 1]

    SIZE:
      The size of KCP frame plus 2

    KCP Header

    +------------------+
    | conv      uint32 |
    +------------------+
    | cmd       uint8  |
    +------------------+
    | frg       uint8  |
    +------------------+
    | wnd      uint16  |
    +------------------+
    | ts       uint32  |
    +------------------+
    | sn       uint32  |
    +------------------+
    | una      uint32  |
    +------------------+
    | rto      uint32  |
    +------------------+
    | xmit     uint32  |
    +------------------+
    | resendts uint32  |
    +------------------+
    | fastack  uint32  |
    +------------------+
    | acked    uint32  |
    +------------------+
    | data     []byte  |
    +------------------+
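To make the NONCE and CRC32 fields concrete, here is a minimal standard-library sketch that prepends both to a payload and verifies them on receipt. The byte layout (16-byte nonce, then a 4-byte little-endian CRC32 computed over the bytes that follow it) and the names sealFrame and openFrame are assumptions made for illustration, not the library's actual code, and the FEC fields are left out.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"errors"
	"hash/crc32"
)

const (
	nonceSize  = 16                  // NONCE: 16 random bytes per packet
	crcSize    = 4                   // CRC32: IEEE polynomial
	headerSize = nonceSize + crcSize // assumed layout: NONCE | CRC32 | payload
)

// sealFrame prepends a fresh random nonce and a CRC32 of the payload.
// Hypothetical helper for illustration only.
func sealFrame(payload []byte) ([]byte, error) {
	frame := make([]byte, headerSize+len(payload))
	if _, err := rand.Read(frame[:nonceSize]); err != nil {
		return nil, err
	}
	copy(frame[headerSize:], payload)
	sum := crc32.ChecksumIEEE(frame[headerSize:])
	binary.LittleEndian.PutUint32(frame[nonceSize:headerSize], sum)
	return frame, nil
}

// openFrame checks the CRC32 and returns the payload without the header.
func openFrame(frame []byte) ([]byte, error) {
	if len(frame) < headerSize {
		return nil, errors.New("frame too short")
	}
	want := binary.LittleEndian.Uint32(frame[nonceSize:headerSize])
	if crc32.ChecksumIEEE(frame[headerSize:]) != want {
		return nil, errors.New("crc32 mismatch")
	}
	return frame[headerSize:], nil
}
```

A receiver would call openFrame after decryption and drop any packet that returns an error, leaving retransmission to the ARQ layer.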

Layer-Model of KCP-GO

    +-----------------+
    | SESSION         |
    +-----------------+
    | KCP(ARQ)        |
    +-----------------+
    | FEC(OPTIONAL)   |
    +-----------------+
    | CRYPTO(OPTIONAL)|
    +-----------------+
    | UDP(PACKET)     |
    +-----------------+
    | IP              |
    +-----------------+
    | LINK            |
    +-----------------+
    | PHY             |
    +-----------------+

Looking for a C++ client?

  1. https://github.com/xtaci/libkcp -- FEC enhanced KCP session library for iOS/Android in C++

Examples

  1. simple examples
  2. kcptun client
  3. kcptun server

Benchmark

    ===
    Model Name: MacBook Pro
    Model Identifier: MacBookPro14,1
    Processor Name: Intel Core i5
    Processor Speed: 3.1 GHz
    Number of Processors: 1
    Total Number of Cores: 2
    L2 Cache (per Core): 256 KB
    L3 Cache: 4 MB
    Memory: 8 GB
    ===

    $ go test -v -run=^$ -bench .
    beginning tests, encryption:salsa20, fec:10/3
    goos: darwin
    goarch: amd64
    pkg: github.com/xtaci/kcp-go
    BenchmarkSM4-4                    50000     32180 ns/op     93.23 MB/s       0 B/op       0 allocs/op
    BenchmarkAES128-4                500000      3285 ns/op    913.21 MB/s       0 B/op       0 allocs/op
    BenchmarkAES192-4                300000      3623 ns/op    827.85 MB/s       0 B/op       0 allocs/op
    BenchmarkAES256-4                300000      3874 ns/op    774.20 MB/s       0 B/op       0 allocs/op
    BenchmarkTEA-4                   100000     15384 ns/op    195.00 MB/s       0 B/op       0 allocs/op
    BenchmarkXOR-4                 20000000      89.9 ns/op  33372.00 MB/s       0 B/op       0 allocs/op
    BenchmarkBlowfish-4               50000     26927 ns/op    111.41 MB/s       0 B/op       0 allocs/op
    BenchmarkNone-4                30000000      45.7 ns/op  65597.94 MB/s       0 B/op       0 allocs/op
    BenchmarkCast5-4                  50000     34258 ns/op     87.57 MB/s       0 B/op       0 allocs/op
    Benchmark3DES-4                   10000    117149 ns/op     25.61 MB/s       0 B/op       0 allocs/op
    BenchmarkTwofish-4                50000     33538 ns/op     89.45 MB/s       0 B/op       0 allocs/op
    BenchmarkXTEA-4                   30000     45666 ns/op     65.69 MB/s       0 B/op       0 allocs/op
    BenchmarkSalsa20-4               500000      3308 ns/op    906.76 MB/s       0 B/op       0 allocs/op
    BenchmarkCRC32-4               20000000      65.2 ns/op  15712.43 MB/s
    BenchmarkCsprngSystem-4         1000000      1150 ns/op     13.91 MB/s
    BenchmarkCsprngMD5-4           10000000       145 ns/op    110.26 MB/s
    BenchmarkCsprngSHA1-4          10000000       158 ns/op    126.54 MB/s
    BenchmarkCsprngNonceMD5-4      10000000       153 ns/op    104.22 MB/s
    BenchmarkCsprngNonceAES128-4  100000000      19.1 ns/op    837.81 MB/s
    BenchmarkFECDecode-4            1000000      1119 ns/op   1339.61 MB/s    1606 B/op       2 allocs/op
    BenchmarkFECEncode-4            2000000       832 ns/op   1801.83 MB/s      17 B/op       0 allocs/op
    BenchmarkFlush-4                5000000       272 ns/op                      0 B/op       0 allocs/op
    BenchmarkEchoSpeed4K-4             5000    259617 ns/op     15.78 MB/s    5451 B/op     149 allocs/op
    BenchmarkEchoSpeed64K-4            1000   1706084 ns/op     38.41 MB/s   56002 B/op    1604 allocs/op
    BenchmarkEchoSpeed512K-4            100  14345505 ns/op     36.55 MB/s  482597 B/op   13045 allocs/op
    BenchmarkEchoSpeed1M-4               30  34859104 ns/op     30.08 MB/s 1143773 B/op   27186 allocs/op
    BenchmarkSinkSpeed4K-4            50000     31369 ns/op    130.57 MB/s    1566 B/op      30 allocs/op
    BenchmarkSinkSpeed64K-4            5000    329065 ns/op    199.16 MB/s   21529 B/op     453 allocs/op
    BenchmarkSinkSpeed256K-4            500   2373354 ns/op    220.91 MB/s  166332 B/op    3554 allocs/op
    BenchmarkSinkSpeed1M-4              300   5117927 ns/op    204.88 MB/s  310378 B/op    6988 allocs/op
    PASS
    ok      github.com/xtaci/kcp-go 50.349s

    === Raspberry Pi 4 ===

    ➜  kcp-go git:(master) cat /proc/cpuinfo
    processor: 0
    model name: ARMv7 Processor rev 3 (v7l)
    BogoMIPS: 108.00
    Features: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
    CPU implementer: 0x41
    CPU architecture: 7
    CPU variant: 0x0
    CPU part: 0xd08
    CPU revision: 3

    ➜  kcp-go git:(master)  go test -run=^$ -bench .
    2020/01/05 19:25:13 beginning tests, encryption:salsa20, fec:10/3
    goos: linux
    goarch: arm
    pkg: github.com/xtaci/kcp-go/v5
    BenchmarkSM4-4                     20000         86475 ns/op      34.69 MB/s        0 B/op        0 allocs/op
    BenchmarkAES128-4                  20000         62254 ns/op      48.19 MB/s        0 B/op        0 allocs/op
    BenchmarkAES192-4                  20000         71802 ns/op      41.78 MB/s        0 B/op        0 allocs/op
    BenchmarkAES256-4                  20000         80570 ns/op      37.23 MB/s        0 B/op        0 allocs/op
    BenchmarkTEA-4                     50000         37343 ns/op      80.34 MB/s        0 B/op        0 allocs/op
    BenchmarkXOR-4                    100000         22266 ns/op     134.73 MB/s        0 B/op        0 allocs/op
    BenchmarkBlowfish-4                20000         66123 ns/op      45.37 MB/s        0 B/op        0 allocs/op
    BenchmarkNone-4                  3000000           518 ns/op    5786.77 MB/s        0 B/op        0 allocs/op
    BenchmarkCast5-4                   20000         76705 ns/op      39.11 MB/s        0 B/op        0 allocs/op
    Benchmark3DES-4                     5000        418868 ns/op       7.16 MB/s        0 B/op        0 allocs/op
    BenchmarkTwofish-4                  5000        326896 ns/op       9.18 MB/s        0 B/op        0 allocs/op
    BenchmarkXTEA-4                    10000        114418 ns/op      26.22 MB/s        0 B/op        0 allocs/op
    BenchmarkSalsa20-4                 50000         36736 ns/op      81.66 MB/s        0 B/op        0 allocs/op
    BenchmarkCRC32-4                 1000000          1735 ns/op     589.98 MB/s
    BenchmarkCsprngSystem-4          1000000          2179 ns/op       7.34 MB/s
    BenchmarkCsprngMD5-4             2000000           811 ns/op      19.71 MB/s
    BenchmarkCsprngSHA1-4            2000000           862 ns/op      23.19 MB/s
    BenchmarkCsprngNonceMD5-4        2000000           878 ns/op      18.22 MB/s
    BenchmarkCsprngNonceAES128-4     5000000           326 ns/op      48.97 MB/s
    BenchmarkFECDecode-4              200000          9081 ns/op     165.16 MB/s      140 B/op        1 allocs/op
    BenchmarkFECEncode-4              100000         12039 ns/op     124.59 MB/s       11 B/op        0 allocs/op
    BenchmarkFlush-4                  100000         21704 ns/op                        0 B/op        0 allocs/op
    BenchmarkEchoSpeed4K-4              2000        981182 ns/op       4.17 MB/s    12384 B/op      424 allocs/op
    BenchmarkEchoSpeed64K-4              100      10503324 ns/op       6.24 MB/s   123616 B/op     3779 allocs/op
    BenchmarkEchoSpeed512K-4              20     138633802 ns/op       3.78 MB/s  1606584 B/op    29233 allocs/op
    BenchmarkEchoSpeed1M-4                 5     372903568 ns/op       2.81 MB/s  4080504 B/op    63600 allocs/op
    BenchmarkSinkSpeed4K-4             10000        121239 ns/op      33.78 MB/s     4647 B/op      104 allocs/op
    BenchmarkSinkSpeed64K-4             1000       1587906 ns/op      41.27 MB/s    50914 B/op     1115 allocs/op
    BenchmarkSinkSpeed256K-4             100      16277830 ns/op      32.21 MB/s   453027 B/op     9296 allocs/op
    BenchmarkSinkSpeed1M-4               100      31040703 ns/op      33.78 MB/s   898097 B/op    18932 allocs/op
    PASS
    ok      github.com/xtaci/kcp-go/v5      64.151s

Typical Flame Graph

Flame Graph in kcptun

Key Design Considerations

1. Slice vs. Container/List

kcp.flush() loops through the send queue for retransmission checking every 20 ms.

I wrote a benchmark comparing sequential loops through a slice and a container/list:

    BenchmarkLoopSlice-4   2000000000         0.39 ns/op
    BenchmarkLoopList-4    100000000        54.6 ns/op

The list structure introduces heavy cache misses compared to the slice, which has better locality. For 5,000 connections with a 32-segment window and a 20 ms interval, each round scans 5,000 × 32 = 160,000 elements: using a slice costs about 62 μs (0.3% CPU) per round of kcp.flush() calls, while using a list costs 8.7 ms (43.5% CPU).
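The comparison can be reproduced with a small sketch along these lines. The function names are made up for illustration, and roundCostNs simply multiplies connections × window × per-element cost, assuming every connection holds a full window. Plugging in the benchmark figures gives roughly 62 μs per round for the slice (0.39 ns per element) and roughly 8.7 ms for the list (54.6 ns per element).

```go
package main

import "container/list"

// sumSlice walks a contiguous slice; neighbouring elements share cache lines.
func sumSlice(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

// sumList walks a container/list; every element is a separately allocated node.
func sumList(l *list.List) int {
	total := 0
	for e := l.Front(); e != nil; e = e.Next() {
		total += e.Value.(int)
	}
	return total
}

// buildBoth fills a slice and a list with the values 0..n-1.
func buildBoth(n int) ([]int, *list.List) {
	xs := make([]int, 0, n)
	l := list.New()
	for i := 0; i < n; i++ {
		xs = append(xs, i)
		l.PushBack(i)
	}
	return xs, l
}

// roundCostNs estimates the time, in nanoseconds, to scan every queued
// segment once, assuming each connection holds a full window.
func roundCostNs(conns, window int, nsPerElem float64) float64 {
	return float64(conns*window) * nsPerElem
}
```

Wrapping sumSlice and sumList in testing.B benchmarks should show the same order-of-magnitude gap, although the exact numbers depend on the CPU and cache sizes.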

2. Timing Accuracy vs. Syscall clock_gettime

Timing is critical to the RTT estimator. Inaccurate timing leads to false retransmissions in KCP, but calling time.Now() costs 42 cycles (10.5 ns on a 4 GHz CPU, 15.6 ns on my 2.7 GHz MacBook Pro).

The benchmark for time.Now():

BenchmarkNow-4         100000000        15.6 ns/op

In kcp-go, after each kcp.output() function call, the current clock time is updated upon return. For a single kcp.flush() operation, the current time is queried from the system once. For 5,000 connections, this costs 5000 * 15.6 ns = 78 μs (a fixed cost when no packet needs to be sent). For 10 MB/s data transfer with a 1400 MTU, kcp.output() is called around 7500 times, costing 117 μs for time.Now() every second.
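The idea of reading the clock once per flush and reusing that value for every queued segment can be sketched as follows. The segment type, the field name resendAt, and the helper names are hypothetical; the wraparound-safe comparison is the usual trick for 32-bit millisecond timestamps and is an assumption about how such a check might be written, not a copy of the library's code.

```go
package main

import "time"

// refTime anchors the millisecond clock at process start.
var refTime = time.Now()

// currentMs returns elapsed milliseconds as a wrapping 32-bit counter.
func currentMs() uint32 {
	return uint32(time.Since(refTime) / time.Millisecond)
}

// segment is a hypothetical queued packet with a retransmission deadline.
type segment struct {
	resendAt uint32
}

// timeDiff returns later-earlier and stays correct across uint32 wraparound.
func timeDiff(later, earlier uint32) int32 {
	return int32(later - earlier)
}

// countDue reports how many segments have reached their deadline at time now.
func countDue(queue []segment, now uint32) int {
	due := 0
	for i := range queue {
		if timeDiff(now, queue[i].resendAt) >= 0 {
			due++
		}
	}
	return due
}

// flush samples the clock once and reuses it for the whole queue,
// instead of calling time.Now() for every segment.
func flush(queue []segment) int {
	now := currentMs()
	return countDue(queue, now)
}
```

With this structure, the per-flush clock cost stays constant no matter how many segments are waiting in the queue.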

3. Memory Management

Primary memory allocation is done from a global buffer pool, xmit.Buf. In kcp-go, when we need to allocate some bytes, we get them from that pool, which returns a fixed-capacity 1500 bytes (mtuLimit). The rx queue, tx queue, and fec queue all receive bytes from this pool and return them after use to prevent unnecessary zeroing of bytes. The pool mechanism maintains a high watermark for slice objects, allowing these in-flight objects to survive periodic garbage collection, while also being able to return memory to the runtime when idle.
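A fixed-capacity buffer pool of this kind can be sketched with sync.Pool from the standard library. The names bufPool, getBuf, and putBuf are illustrative, and storing pointers to slices is a choice made for this sketch (it avoids an extra allocation on Put); the library's own pool may be organized differently.

```go
package main

import "sync"

// mtuLimit is the fixed capacity of every pooled buffer.
const mtuLimit = 1500

// bufPool hands out pointers to mtuLimit-sized byte slices.
var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, mtuLimit)
		return &b
	},
}

// getBuf returns a buffer of length mtuLimit. Its contents are not zeroed
// and may hold stale bytes from a previous user.
func getBuf() *[]byte {
	return bufPool.Get().(*[]byte)
}

// putBuf restores the full length and returns the buffer to the pool.
// Buffers with the wrong capacity are dropped for the garbage collector.
func putBuf(b *[]byte) {
	if b == nil || cap(*b) < mtuLimit {
		return
	}
	*b = (*b)[:mtuLimit]
	bufPool.Put(b)
}
```

Callers slice the buffer down to the packet length while it is in use and hand it back with putBuf once the packet has been sent or consumed.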

4. Information Security

kcp-go ships with built-in packet encryption powered by various block encryption algorithms operating in Cipher Feedback (CFB) mode. For each packet to be sent, the encryption process starts by encrypting a nonce from the system entropy, ensuring that encryption of the same plaintext never results in the same ciphertext.

The contents of the packets are completely anonymous with encryption, including the headers (FEC, KCP), checksums, and contents. Note that no matter which encryption method you choose at the upper layer, if you disable encryption, the transmission will be insecure, as the header is plaintext and susceptible to tampering, such as jamming the sliding window size, round-trip time, FEC properties, and checksums. AES-128 is suggested for minimal encryption, as modern CPUs come with AES-NI instructions and perform better than salsa20 (check the table above).
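The effect of leading every packet with a random nonce can be illustrated with AES-128 in CFB mode from the standard library. This is a sketch under stated assumptions: the all-zero fixedIV, the function names, and the packet layout (nonce followed by payload) are chosen for illustration and are not the library's actual parameters. Because CFB chains each block on the previous ciphertext, the random first block makes the rest of the ciphertext differ even when the payload repeats.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
)

const nonceSize = 16

// fixedIV is an illustrative constant IV; per-packet uniqueness comes
// from the random nonce that occupies the first plaintext block.
var fixedIV = make([]byte, aes.BlockSize)

// encryptPacket prepends a random nonce and encrypts nonce+payload in CFB mode.
func encryptPacket(key, payload []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	pkt := make([]byte, nonceSize+len(payload))
	if _, err := rand.Read(pkt[:nonceSize]); err != nil {
		return nil, err
	}
	copy(pkt[nonceSize:], payload)
	cipher.NewCFBEncrypter(block, fixedIV).XORKeyStream(pkt, pkt)
	return pkt, nil
}

// decryptPacket reverses encryptPacket and discards the nonce.
func decryptPacket(key, pkt []byte) ([]byte, error) {
	if len(pkt) < nonceSize {
		return nil, errors.New("packet too short")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	out := make([]byte, len(pkt))
	cipher.NewCFBDecrypter(block, fixedIV).XORKeyStream(out, pkt)
	return out[nonceSize:], nil
}
```

Encrypting the same payload twice with the same key yields two different ciphertexts, and both decrypt back to the original payload. CFB on its own provides no integrity protection, which is why a checksum is carried inside the encrypted frame.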

Other possible attacks on kcp-go include:

  • Traffic analysis: Data flow on specific websites may have patterns while exchanging data. This type of eavesdropping has been mitigated by adopting smux to mix data streams and introduce noise. A perfect solution has not yet appeared, but theoretically, shuffling/mixing messages on a larger scale network may mitigate this problem.
  • Replay attack: Since asymmetrical encryption has not been introduced into kcp-go, capturing packets and replaying them on a different machine is possible. Note that hijacking the session and decrypting the contents is still impossible. Upper layers should use an asymmetrical encryption system to guarantee the authenticity of each message (to process each message exactly once), such as HTTPS/OpenSSL/LibreSSL. Signing requests with private keys can eliminate this type of attack.

Connection Termination

Control messages like SYN/FIN/RST in TCP are not defined in KCP. You need a keepalive/heartbeat mechanism at the application level. A real-world example is to use a multiplexing protocol over the session, such as smux (which has an embedded keepalive mechanism). See kcptun for an example.
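For applications that do not use a multiplexer, an application-level heartbeat can be as simple as the sketch below. It works with any net.Conn and treats a read timeout as a dead peer. The one-byte heartbeat message and the helper names are assumptions for illustration; a real protocol would need to distinguish heartbeats from data frames.

```go
package main

import (
	"io"
	"net"
	"time"
)

// sendHeartbeat writes a single zero byte as a liveness signal.
func sendHeartbeat(conn net.Conn) error {
	_, err := conn.Write([]byte{0})
	return err
}

// waitHeartbeat blocks until one byte arrives or the timeout expires.
// A non-nil error means the peer should be considered gone.
func waitHeartbeat(conn net.Conn, timeout time.Duration) error {
	if err := conn.SetReadDeadline(time.Now().Add(timeout)); err != nil {
		return err
	}
	var b [1]byte
	_, err := io.ReadFull(conn, b[:])
	return err
}

// demoPipe exercises both helpers over an in-memory pipe: the first wait
// sees a heartbeat, the second times out because nothing else is sent.
func demoPipe() (first, second error) {
	a, b := net.Pipe()
	defer a.Close()
	defer b.Close()
	go sendHeartbeat(a)
	first = waitHeartbeat(b, time.Second)
	second = waitHeartbeat(b, 50*time.Millisecond)
	return first, second
}
```

In a server, the side that detects the timeout would close the session and release its resources, since no FIN or RST will ever arrive.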

FAQ

Q: I'm handling >5K connections on my server, and the CPU utilization is so high.

A: A standalone agent or gate server for running kcp-go is suggested, not only to reduce CPU utilization but also to improve the precision of RTT measurements (timing), which indirectly affects retransmission. Increasing the update interval with SetNoDelay, such as conn.SetNoDelay(1, 40, 1, 1), will dramatically reduce system load but may lower performance.

Q: When should I enable FEC?

A: Forward error correction is critical for long-distance transmission because packet loss incurs a huge time penalty. In the complex packet routing networks of the modern world, round-trip time-based loss checks are not always efficient. The significant deviation of RTT samples over long distances usually leads to a larger RTO value in typical RTT estimators, which slows down the transmission.
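The principle of recovering a lost packet without waiting for a retransmission can be shown with a deliberately simplified erasure code. The sketch below uses a single XOR parity shard, which can repair exactly one loss per group; it is not Reed-Solomon, which is what kcp-go uses and which tolerates several losses per group. The function names are illustrative, and all shards are assumed to have equal length.

```go
package main

// xorParity returns the byte-wise XOR of all shards.
// All shards are assumed to be the same length.
func xorParity(shards [][]byte) []byte {
	if len(shards) == 0 {
		return nil
	}
	parity := make([]byte, len(shards[0]))
	for _, s := range shards {
		for i := range parity {
			parity[i] ^= s[i]
		}
	}
	return parity
}

// recoverShard rebuilds the shard at index missing by XOR-ing the parity
// with every surviving shard. The entry at that index is never read.
func recoverShard(shards [][]byte, parity []byte, missing int) []byte {
	out := append([]byte(nil), parity...)
	for idx, s := range shards {
		if idx == missing {
			continue
		}
		for i := range out {
			out[i] ^= s[i]
		}
	}
	return out
}
```

The receiver pays extra bandwidth for the parity shard up front and in exchange avoids a full round trip when one packet in the group is dropped, which is the trade-off that matters on long-distance links.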

Q: Should I enable encryption?

A: Yes, for the security of the protocol, even if the upper layer has encryption.

Who is using this?

  1. https://github.com/xtaci/kcptun -- A Secure Tunnel Based On KCP over UDP.
  2. https://github.com/getlantern/lantern -- Lantern delivers fast access to the open Internet.
  3. https://github.com/smallnest/rpcx -- A RPC service framework based on net/rpc like alibaba Dubbo and weibo Motan.
  4. https://github.com/gonet2/agent -- A gateway for games with stream multiplexing.
  5. https://github.com/syncthing/syncthing -- Open Source Continuous File Synchronization.

Links

  1. https://github.com/xtaci/smux/ -- A Stream Multiplexing Library for golang with least memory
  2. https://github.com/xtaci/libkcp -- FEC enhanced KCP session library for iOS/Android in C++
  3. https://github.com/skywind3000/kcp -- A Fast and Reliable ARQ Protocol
  4. https://github.com/klauspost/reedsolomon -- Reed-Solomon Erasure Coding in Go
