
tcp_nodelay

Posted on: July 15, 2021 at 10:57 AM


The solution to the small-packet problem

Clearly an adaptive approach is desirable.  One  would  expect  a
proposal  for  an  adaptive  inter-packet time limit based on the
round-trip delay observed by TCP.  While such a  mechanism  could
certainly  be  implemented,  it  is  unnecessary.   A  simple and
elegant solution has been discovered.

The solution is to inhibit the sending of new TCP  segments  when
new  outgoing  data  arrives  from  the  user  if  any previously
transmitted data on the connection remains unacknowledged.   This
inhibition  is  to be unconditional; no timers, tests for size of
data received, or other conditions are required.   Implementation
typically requires one or two lines inside a TCP program.
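In short: if data sent earlier has not yet been ACKed, hold back new TCP segments. The inhibition is unconditional: no timers, no checks on the size of the data received, and no other conditions. Implementing it takes only a line or two in the TCP code.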
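A minimal sketch of that single inhibition rule, assuming a hypothetical sender object that tracks only how many bytes are in flight (no receive window, timers, or retransmission are modelled). The class name `NagleSender` and its methods are illustrative, not the API of any real TCP implementation.

```python
class NagleSender:
    """Hypothetical model of the RFC 896 send rule.

    New user data goes out at once only if nothing previously sent is
    still unacknowledged; otherwise it is queued until an ACK arrives.
    """

    def __init__(self):
        self.unacked = 0           # bytes sent but not yet acknowledged
        self.queue = bytearray()   # user data held back by the rule
        self.sent = []             # segments handed to the network, in order

    def _send(self, data) -> None:
        self.sent.append(bytes(data))  # copy, so clearing the queue is safe
        self.unacked += len(data)

    def write(self, data: bytes) -> None:
        """Called when the user writes to the connection."""
        if self.unacked == 0:
            self._send(data)       # idle connection: send immediately
        else:
            self.queue += data     # unacked data outstanding: inhibit

    def on_ack(self, nbytes: int) -> None:
        """Called when the peer acknowledges nbytes of outstanding data."""
        self.unacked -= nbytes
        if self.queue:             # the ACK changes state: send what is queued
            self._send(self.queue)
            self.queue.clear()
```

The whole rule lives in the single `unacked == 0` test in `write`; everything queued behind it leaves as one segment when the next ACK arrives.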


At first glance, this solution seems to imply drastic changes  in
the  behavior of TCP.  This is not so.  It all works out right in
the end.  Let us see why this is so.
In short: at first glance this seems to change TCP's behavior a great deal, but it does not; overall, very little changes. Let's see why.

When a user process writes to a TCP connection, TCP receives some
data.   It  may  hold  that data for future sending or may send a
packet immediately.  If it refrains from  sending  now,  it  will
typically send the data later when an incoming packet arrives and
changes the state of the system.  The state changes in one of two
ways;  the incoming packet acknowledges old data the distant host
has received, or announces the availability of  buffer  space  in
the  distant  host  for  new  data.  (This last is referred to as
"updating the window").    Each time data arrives  on  a  connec-
tion,  TCP must reexamine its current state and perhaps send some
packets out.  Thus, when we omit sending data on arrival from the
user,  we  are  simply  deferring its transmission until the next
message arrives from the distant host.   A  message  must  always
arrive soon unless the connection was previously idle or communi-
cations with the other end have been lost.  In  the  first  case,
the  idle  connection,  our  scheme will result in a packet being
sent whenever the user writes to the TCP connection.  Thus we  do
not  deadlock  in  the idle condition.  In the second case, where
the distant host has failed, sending more data is futile  anyway.
Note  that we have done nothing to inhibit normal TCP retransmis-
sion logic, so lost messages are not a problem.

In short: when a user process writes to a TCP connection, the TCP stack receives the data and either holds it for later sending or sends it immediately.

RFC 896    Congestion Control in IP/TCP Internetworks      1/6/84
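The deferral argument above can be sketched as an event-driven sender that re-examines its state whenever a segment arrives from the peer, whether that segment acknowledges old data or updates the window. This is a hypothetical model: the class and method names are illustrative, byte counts stand in for sequence numbers, and retransmission is omitted.

```python
from typing import Optional


class WindowedNagleSender:
    """Hypothetical sketch: RFC 896 rule plus a receiver window."""

    def __init__(self, window: int):
        self.window = window       # bytes the peer will currently accept
        self.unacked = 0           # bytes sent but not yet acknowledged
        self.queue = bytearray()   # user data waiting to be sent
        self.sent = []             # segments handed to the network

    def _pump(self) -> None:
        # Send as much queued data as the window allows, as one segment.
        room = self.window - self.unacked
        n = min(room, len(self.queue))
        if n > 0:
            self.sent.append(bytes(self.queue[:n]))
            del self.queue[:n]
            self.unacked += n

    def write(self, data: bytes) -> None:
        self.queue += data
        if self.unacked == 0:      # idle connection: send now, so no deadlock
            self._pump()
        # otherwise defer until the next segment from the peer

    def on_segment(self, acked: int = 0, window: Optional[int] = None) -> None:
        self.unacked -= acked      # peer acknowledged old data
        if window is not None:
            self.window = window   # peer announced new buffer space
        self._pump()               # re-examine state; perhaps send
```

For example, with a 4-byte window, a write of `b"ab"` goes out at once, a following `b"cdef"` waits for the ACK of `b"ab"`, and later queued data can be released by a pure window update that acknowledges nothing.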

Examination of the behavior of this scheme under  various  condi-
tions  demonstrates  that the scheme does work in all cases.  The
first case to examine is the one we wanted to solve, that of  the
character-oriented  Telnet  connection.   Let us suppose that the
user is sending TCP a new character every  200ms,  and  that  the
connection  is  via  an Ethernet with a round-trip time including
software processing of 50ms.  Without any  mechanism  to  prevent
small-packet congestion, one packet will be sent for each charac-
ter, and response will be optimal.  Overhead will be  4000%,  but
this  is  acceptable  on  an Ethernet.  The classic timer scheme,
with a limit of 2 packets per second, will  cause  two  or  three
characters to be sent per packet.  Response will thus be degraded
even though on a high-bandwidth  Ethernet  this  is  unnecessary.
Overhead  will  drop  to  1500%, but on an Ethernet this is a bad
tradeoff.  With our scheme, every character the user  types  will
find  TCP with an idle connection, and the character will be sent
at once, just as in the no-control case.  The user  will  see  no
visible  delay.   Thus,  our  scheme  performs as well as the no-
control scheme and provides better responsiveness than the  timer
scheme.
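To see why every keystroke finds the connection idle in the Ethernet case, here is a small discrete-event sketch of the rule. It assumes a perfectly regular typist, an ACK arriving exactly one round-trip time after each packet, and no loss or window limit; the function name `nagle_packets` is hypothetical. Times are integer milliseconds to avoid floating-point ties.

```python
def nagle_packets(n_chars, interval_ms, rtt_ms):
    """Return how many characters each packet carries under the RFC 896 rule."""
    arrivals = [i * interval_ms for i in range(n_chars)]
    packets = []
    queued = 0        # characters held back while data is unacknowledged
    ack_due = None    # when the outstanding packet is ACKed; None if idle
    i = 0
    while i < n_chars or queued:
        next_arrival = arrivals[i] if i < n_chars else None
        if ack_due is not None and (next_arrival is None or ack_due <= next_arrival):
            # An ACK arrives first: flush everything queued as one packet.
            t, ack_due = ack_due, None
            if queued:
                packets.append(queued)
                queued = 0
                ack_due = t + rtt_ms
        else:
            # A character arrives from the user.
            t = next_arrival
            i += 1
            if ack_due is None:
                packets.append(1)    # idle connection: send at once
                ack_due = t + rtt_ms
            else:
                queued += 1          # inhibited until the next ACK


    return packets


print(nagle_packets(25, 200, 50) == [1] * 25)  # True
```

With a 200 ms typing interval and a 50 ms round trip, the ACK always returns before the next keystroke, so each character travels alone, exactly as in the no-control case. The same model with `rtt_ms=5000` gives `[1, 24]`, the long-haul case discussed next.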

The second case to examine is the same Telnet  test  but  over  a
long-haul  link  with  a  5-second  round trip time.  Without any
mechanism to prevent  small-packet  congestion,  25  new  packets
would be sent in 5 seconds.* Overhead here is  4000%.   With  the
classic timer scheme, and the same limit of 2 packets per second,
there would still be 10 packets outstanding and  contributing  to
congestion.  Round-trip time will not be improved by sending many
packets, of course; in general it will be worse since the packets
will  contend  for line time.  Overhead now drops to 1500%.  With
our scheme, however, the first character from the user would find
an  idle  TCP connection and would be sent immediately.  The next
24 characters, arriving from the user at 200ms  intervals,  would
be  held  pending  a  message from the distant host.  When an ACK
arrived for the first packet at the end of 5  seconds,  a  single
packet  with  the 24 queued characters would be sent.  Our scheme
thus results in an overhead reduction to 320% with no penalty  in
response  time.   Response time will usually be improved with our
scheme because packet overhead is reduced, here by  a  factor  of
4.7 over the classic timer scheme.  Congestion will be reduced by
this factor and round-trip delay will decrease sharply.  For this
case, our scheme has a striking  advantage  over  either  of  the
other approaches.

________
  * This problem is not seen in the pure ARPANET case because the
    IMPs will block the host when the count of packets
    outstanding becomes excessive, but in the case where a pure
    datagram local net (such as an Ethernet) or a pure datagram
    gateway (such as an ARPANET / MILNET gateway) is involved, it
    is possible to have large numbers of tiny packets
    outstanding.
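The overhead figures for the long-haul Telnet case can be checked directly. Taking RFC 896's 40 bytes of TCP/IP header per packet and one byte per typed character, overhead is header bytes divided by data bytes. A minimal sketch; the name `overhead_percent` is hypothetical.

```python
HEADER_BYTES = 40  # TCP/IP header bytes per packet, the figure RFC 896 uses


def overhead_percent(packet_sizes):
    """Header bytes as a percentage of data bytes.

    packet_sizes lists the number of one-byte characters carried by each packet.
    """
    data_bytes = sum(packet_sizes)
    header_bytes = HEADER_BYTES * len(packet_sizes)
    return 100 * header_bytes / data_bytes


no_control = overhead_percent([1] * 25)  # 25 single-character packets
nagle = overhead_percent([1, 24])        # first character, then the 24 queued ones
print(no_control, nagle)                 # 4000.0 320.0
```

Two packets carrying 25 bytes of data cost 80 header bytes, which is where the 320% figure comes from.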

We use our scheme for all TCP connections, not just  Telnet  con-
nections.   Let us see what happens for a file transfer data con-
nection using our technique. The two extreme cases will again  be
considered.

As before, we first consider the Ethernet case.  The user is  now
writing data to TCP in 512 byte blocks as fast as TCP will accept
them.  The user's first write to TCP will start things going; our
first  datagram  will  be  512+40  bytes  or 552 bytes long.  The
user's second write to TCP will not cause a send but  will  cause
the  block  to  be buffered.  Assume that the user fills up TCP's
outgoing buffer area before the first ACK comes back.  Then  when
the  ACK  comes in, all queued data up to the window size will be
sent.  From then on, the window will be kept full,  as  each  ACK
initiates  a  sending  cycle  and queued data is sent out.  Thus,
after a one round-trip time initial period when only one block is
sent,  our  scheme  settles down into a maximum-throughput condi-
tion.  The delay in startup is only 50ms on the Ethernet, so  the
startup  transient  is  insignificant.  All three schemes provide
equivalent performance for this case.
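A coarse round-based sketch of the file-transfer startup: the first round trip carries a single 552-byte datagram (512 data bytes plus 40 header bytes), and every later round carries a full window. It assumes the user keeps TCP's buffer full and that the window is a whole number of 512-byte blocks; `blocks_per_round` is a hypothetical helper, and real ACKs arrive per segment rather than once per round.

```python
HEADER_BYTES = 40   # TCP/IP header per datagram, as in RFC 896
BLOCK = 512         # bytes per user write in the example


def blocks_per_round(total_blocks, window_blocks):
    """Blocks sent in each round trip under the RFC 896 rule.

    Round 1: the first write finds the connection idle and goes out alone;
    later writes are queued. Each later round: the ACK lets TCP send as
    much queued data as the window allows.
    """
    rounds = []
    remaining = total_blocks
    while remaining > 0:
        n = 1 if not rounds else min(window_blocks, remaining)
        rounds.append(n)
        remaining -= n
    return rounds


print(BLOCK + HEADER_BYTES)     # 552
print(blocks_per_round(10, 4))  # [1, 4, 4, 1]
```

Only the first round is short, so on a 50 ms Ethernet round trip the startup costs roughly one extra round trip and the window stays full afterwards.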

Finally, let us look at a file transfer over the  5-second  round
trip  time connection.  Again, only one packet will be sent until
the first ACK comes back; the window will then be filled and kept
full.   Since the round-trip time is 5 seconds, only 512 bytes of
data are transmitted in the first 5 seconds.  Assuming a 2K  win-
dow,  once  the first ACK comes in, 2K of data will be sent and a
steady rate of 2K per 5 seconds will  be  maintained  thereafter.
Only  for  this  case is our scheme inferior to the timer scheme,
and the difference is only in the startup transient; steady-state
throughput  is  identical.  The naive scheme and the timer scheme
would both take 250 seconds to transmit a 100K  byte  file  under
the  above  conditions  and  our scheme would take 254 seconds, a
difference of 1.6%.
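The 250-second and 254-second figures follow from a simple rate calculation: at steady state one 2K window is delivered per 5-second round trip, and the only loss is that the first round carries 512 bytes instead of 2K. A sketch assuming K = 1024 bytes; `transfer_time` is a hypothetical helper using the same fluid approximation as the text.

```python
def transfer_time(file_bytes, window, rtt, first_round_bytes=None):
    """Seconds to move file_bytes at a steady rate of one window per RTT.

    If first_round_bytes is given, the first round trip carries only that
    much data (the startup transient) and the rest flows at the steady rate.
    """
    if first_round_bytes is None:
        return file_bytes * rtt / window
    return rtt + (file_bytes - first_round_bytes) * rtt / window


baseline = transfer_time(100 * 1024, 2048, 5)    # naive and timer schemes
nagle = transfer_time(100 * 1024, 2048, 5, 512)  # one block in the first RTT
print(baseline, nagle, f"{(nagle - baseline) / baseline:.1%}")
```

This prints 250.0 and 253.75 seconds; the text rounds 253.75 up to 254 seconds, and 4 / 250 gives its 1.6%.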

Thus, for all cases examined, our scheme provides at least 98% of
the  performance  of  both other schemes, and provides a dramatic
improvement in Telnet performance over paths with long round trip
times.   We  use  our  scheme  in  the  Ford  Aerospace  Software
Engineering Network, and are able to run screen editors over Eth-
ernet and talk to distant TOPS-20 hosts with improved performance
in both cases.
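The post's title, tcp_nodelay, names the socket option that switches this algorithm off for a single connection, so that each small write is sent without waiting for outstanding data to be acknowledged. A minimal sketch using Python's standard `socket` module; the helper name `make_nodelay_socket` is hypothetical.

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with the RFC 896 (Nagle) algorithm disabled."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1: send small segments immediately instead of holding
    # them until previously sent data has been acknowledged.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s


sock = make_nodelay_socket()
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
sock.close()
```

Disabling the algorithm trades the overhead savings described above for lower latency, which is why it is usually reserved for interactive or latency-sensitive traffic.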
