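TCP's Plight in the Long Fat Pipe (LFT), and How to Fight It
A long fat pipe is hard to fill with a single TCP connection (a single TCP flow has great difficulty reaching the rated bandwidth of a long fat pipe)!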

iperf -c 172.16.0.2 -i 1 -P 1 -t 2
------------------------------------------------------------
Client connecting to 172.16.0.2, TCP port 5001
TCP window size: 3.54 MByte (default)
------------------------------------------------------------
[ 3] local 172.16.0.1 port 41364 connected with 172.16.0.2 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 129 MBytes 1.09 Gbits/sec
[ 3] 1.0- 2.0 sec 119 MBytes 996 Mbits/sec
[ 3] 0.0- 2.0 sec 248 MBytes 1.04 Gbits/sec
...
tc qdisc add dev enp0s9 root netem delay 100ms limit 10000000
------------------------------------------------------------
Client connecting to 172.16.0.2, TCP port 5001
TCP window size: 853 KByte (default)
------------------------------------------------------------
[ 3] local 172.16.0.1 port 41368 connected with 172.16.0.2 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 2.25 MBytes 18.9 Mbits/sec
[ 3] 1.0- 2.0 sec 2.00 MBytes 16.8 Mbits/sec
[ 3] 2.0- 3.0 sec 2.00 MBytes 16.8 Mbits/sec
[ 3] 3.0- 4.0 sec 2.00 MBytes 16.8 Mbits/sec
[ 3] 4.0- 5.0 sec 2.88 MBytes 24.1 Mbits/sec
[ 3] 0.0- 5.1 sec 11.1 MBytes 18.3 Mbits/sec
net.ipv4.tcp_congestion_control = reno
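For the sender to be allowed to send a BDP (bandwidth-delay product) worth of data, the receiver must advertise a BDP-sized window. For the sender to actually have a BDP worth of data to send, it needs a send buffer at least BDP in size. For a BDP worth of data to pass back-to-back through the long fat pipe, the network must not be congested (i.e., the single flow must have the bandwidth to itself).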
net.core.rmem_max = 13420500
net.ipv4.tcp_rmem = 4096 873800 13420500
iperf -s -w 15m
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 25.6 MByte (WARNING: requested 14.3 MByte)
------------------------------------------------------------
net.core.wmem_max = 13420500
net.ipv4.tcp_wmem = 4096 873800 13420500
...
[ 3] 38.0-39.0 sec 18.5 MBytes 155 Mbits/sec
[ 3] 39.0-40.0 sec 18.0 MBytes 151 Mbits/sec
[ 3] 40.0-41.0 sec 18.9 MBytes 158 Mbits/sec
[ 3] 41.0-42.0 sec 18.9 MBytes 158 Mbits/sec
[ 3] 42.0-43.0 sec 18.0 MBytes 151 Mbits/sec
[ 3] 43.0-44.0 sec 19.1 MBytes 160 Mbits/sec
[ 3] 44.0-45.0 sec 18.8 MBytes 158 Mbits/sec
[ 3] 45.0-46.0 sec 19.5 MBytes 163 Mbits/sec
[ 3] 46.0-47.0 sec 19.6 MBytes 165 Mbits/sec
...
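#!/usr/bin/stap -g
%{
#include <linux/skbuff.h>
#include <net/tcp.h>
%}

function alter_cwnd(skk:long, skbb:long)
%{
    struct sock *sk = (struct sock *)STAP_ARG_skk;
    struct sk_buff *skb = (struct sk_buff *)STAP_ARG_skbb;
    struct tcp_sock *tp = tcp_sk(sk);
    struct tcphdr *th;

    /* Only look at IPv4 packets. */
    if (skb->protocol != htons(ETH_P_IP))
        return;
    /* tcp_hdr() uses the transport-header offset, so it stays valid even
     * if skb->data has already been advanced past the TCP header. */
    th = tcp_hdr(skb);
    /* ACKs coming from the iperf server (port 5001) belong to the test flow. */
    if (ntohs(th->source) == 5001) {
        // Manually pin the congestion window to a constant BDP worth of packets.
        tp->snd_cwnd = 8947;
    }
%}

// Runs after tcp_ack() has done its own cwnd update, then overrides it.
probe kernel.function("tcp_ack").return
{
    alter_cwnd($sk, $skb);
}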
...
[ 3] 0.0- 1.0 sec 124 MBytes 1.04 Gbits/sec
[ 3] 1.0- 2.0 sec 116 MBytes 970 Mbits/sec
[ 3] 2.0- 3.0 sec 116 MBytes 977 Mbits/sec
[ 3] 3.0- 4.0 sec 117 MBytes 979 Mbits/sec
[ 3] 4.0- 5.0 sec 116 MBytes 976 Mbits/sec
...


# ping -f 172.16.0.2
PING 172.16.0.2 (172.16.0.2) 56(84) bytes of data.
.......^C
--- 172.16.0.2 ping statistics ---
1025 packets transmitted, 1018 received, 0.682927% packet loss, time 14333ms
rtt min/avg/max/mdev = 100.217/102.962/184.080/4.698 ms, pipe 10, ipg/ewma 13.996/104.149 ms
#!/usr/bin/stap -g
%{
#include <linux/skbuff.h>
#include <net/tcp.h>
%}

function dump_info(skk:long, skbb:long)
%{
    struct sock *sk = (struct sock *)STAP_ARG_skk;
    struct sk_buff *skb = (struct sk_buff *)STAP_ARG_skbb;
    struct tcp_sock *tp = tcp_sk(sk);
    struct tcphdr *th;

    if (skb->protocol != htons(ETH_P_IP))
        return;
    th = tcp_hdr(skb);
    if (ntohs(th->source) == 5001) {
        u32 inflt = tcp_packets_in_flight(tp);
        // Print cwnd and in-flight packets to see when the window reaches the BDP.
        // srtt_us is stored left-shifted by 3, so >> 3 gives the smoothed RTT in microseconds.
        STAP_PRINTF("RTT:%u curr cwnd:%u curr inflight:%u \n",
                    tp->srtt_us >> 3, tp->snd_cwnd, inflt);
    }
%}

probe kernel.function("tcp_reno_ssthresh").return
{
    // Undo Reno's multiplicative decrease: double the halved ssthresh it returns,
    // so the window that was about to be halved is restored.
    $return = $return * 2
}

probe kernel.function("tcp_ack")
{
    dump_info($sk, $skb);
}
iperf -c 172.16.0.2 -i 1 -P 1 -t 1500
RTT:104118 curr cwnd:8980 curr inflight:8949
RTT:105555 curr cwnd:8980 curr inflight:8828
RTT:107122 curr cwnd:8980 curr inflight:8675
RTT:108641 curr cwnd:8980 curr inflight:8528
...
[ 3] 931.0-932.0 sec 121 MBytes 1.02 Gbits/sec
[ 3] 932.0-933.0 sec 121 MBytes 1.02 Gbits/sec
[ 3] 933.0-934.0 sec 121 MBytes 1.02 Gbits/sec
...
is_cwnd_limited |= (tcp_packets_in_flight(tp) >= tp->snd_cwnd);
root@zhaoya:/home/zhaoya# iperf -c 172.16.0.2 -i 1 -P 1 -t 5
------------------------------------------------------------
Client connecting to 172.16.0.2, TCP port 5001
TCP window size: 853 KByte (default)
------------------------------------------------------------
[ 3] local 172.16.0.1 port 41436 connected with 172.16.0.2 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 62.4 MBytes 523 Mbits/sec
[ 3] 1.0- 2.0 sec 117 MBytes 980 Mbits/sec
[ 3] 2.0- 3.0 sec 128 MBytes 1.08 Gbits/sec
[ 3] 3.0- 4.0 sec 129 MBytes 1.08 Gbits/sec
[ 3] 4.0- 5.0 sec 133 MBytes 1.12 Gbits/sec
[ 3] 0.0- 5.0 sec 570 MBytes 955 Mbits/sec

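A long pipe means the window climbs slowly, losses are detected slowly, and recovery from loss is slow. A fat pipe means the target (the BDP) is large; combined with the weaknesses of a long pipe, reaching that target is even harder.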
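TCP proxy mode: data is relayed to a distant destination through a chain of transparent proxies.
TCP tunnel mode: a high-latency TCP flow is encapsulated in low-latency TCP tunnels and relayed hop by hop.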
https://blog.csdn.net/dog250/article/details/83997773
https://blog.csdn.net/dog250/article/details/81257271
https://blog.csdn.net/dog250/article/details/106955747


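case1: measure the throughput of TCP going directly over the S-D long fat pipe.
case2: measure the throughput of TCP going from S through the tunnel T1, T2 to reach D.
# bind to 172.16.0.1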
iperf -c 172.18.0.1 -B 172.16.0.1 -i 1 -P 1 -t 15
...
[ 3] 9.0-10.0 sec 63.6 KBytes 521 Kbits/sec
[ 3] 10.0-11.0 sec 63.6 KBytes 521 Kbits/sec
[ 3] 11.0-12.0 sec 17.0 KBytes 139 Kbits/sec
[ 3] 12.0-13.0 sec 63.6 KBytes 521 Kbits/sec
[ 3] 13.0-14.0 sec 63.6 KBytes 521 Kbits/sec
[ 3] 14.0-15.0 sec 80.6 KBytes 660 Kbits/sec
[ 3] 0.0-15.2 sec 1.27 MBytes 699 Kbits/sec
# bind to 172.16.0.3
iperf -c 172.18.0.3 -B 172.16.0.3 -i 1 -P 1 -t 15
...
[ 3] 9.0-10.0 sec 127 KBytes 1.04 Mbits/sec
[ 3] 10.0-11.0 sec 445 KBytes 3.65 Mbits/sec
[ 3] 11.0-12.0 sec 127 KBytes 1.04 Mbits/sec
[ 3] 12.0-13.0 sec 382 KBytes 3.13 Mbits/sec
[ 3] 13.0-14.0 sec 127 KBytes 1.04 Mbits/sec
[ 3] 14.0-15.0 sec 90.5 KBytes 741 Kbits/sec
[ 3] 0.0-15.2 sec 2.14 MBytes 1.18 Mbits/sec

