Add New Notes

geekard
2012-08-08 14:26:04 +08:00
commit 5ef7c20052
2374 changed files with 276187 additions and 0 deletions


@@ -0,0 +1,411 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2012-02-27T16:36:05+08:00
====== How do you get ECONNRESET on recv ======
Created Monday 27 February 2012
http://fixunix.com/unix/84635-how-do-you-get-econnreset-recv.html
The man page for __recv__ and __read__ lists the error __ECONNRESET__ as an error
condition that happens when "A connection was forcibly closed __by a peer__."
~~I take this to mean that, assuming a TCP connection, if a~~
~~client is recv'ing from a server, and the server suddenly crashes,~~
~~then on the client side recv will return -1 and set errno to~~
~~ECONNRESET.~~
For the purpose of **robust error handling**, I'm trying to integrate
routines to take care of this sort of thing in my program. But I
simply **can't actually get recv to return ECONNRESET** in any of my
tests. For testing purposes, I set up a simple server and a simple
client. The server sends data, and a background thread raises a
__SIGSEGV__ while the server is sending, causing the whole program to
crash. Meanwhile, the client is recv'ing. But when the server
crashes, the client does not issue an ECONNRESET error. Rather, recv
__returns 0 and errno is set to 0__. No error condition is generated at
all. But the man page says that recv should only return 0 if "the
peer has performed an orderly shutdown". But a SIGSEGV is certainly
not my idea of an "orderly shutdown"!
So, is the behavior of recv in this aspect something that is
implementation defined, i.e. not identical across platforms? Maybe
some UNIX environments return ECONNRESET on recv, but others don't?
Or does recv never return ECONNRESET with TCP?
--------------------------------------------------------
10-04-2007 12:09 AM #2
Re: How do you get ECONNRESET on recv?
On 2007-08-09, chsalvia@gmail.com wrote:
> The man page for recv and read list the error ECONNRESET as an error
> condition that happens when "A connection was forcibly closed by a
> peer." I take this to mean that, assuming a TCP connection, if a
> client is recv'ing from a server, and the server suddenly crashes,
> then on the client side recv will return -1 and set errno to
> ECONNRESET.
Well, your understanding is __probably wrong__.
TCP answers with a RESET when you try to __send__ some data to a peer that
does not want to __read__ that data. In other words, the peer has __closed__ the
connection or has done a __shutdown of reading__.
Normally, if the peer __closes__ the connection, recv returns 0 without any
error. The same applies when the peer application crashes.
Now, if you try to send data to the peer after you got 0 from recv,
you should get a RESET.
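__recv returning 0 means the peer has closed the connection (close()) or has shut down its write side (shutdown(SHUT_WR)).__
In the former case, if this side then calls send, the peer's TCP answers with a RESET segment, and the following send fails with errno
**ECONNRESET**.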
If you try to send the data after you got RESET, you'll __get EPIPE or SIGPIPE__.
So, theoretically, you can see ECONNRESET in recv only if the peer does a
__shutdown(SHUT_RD)__ and you try to send some data after this, which
usually never happens. More often the peer __closes__ the socket unexpectedly
while you are sending many chunks of data, and as a result you get
SIGPIPE: your first send triggers the RESET, and your second send
triggers SIGPIPE, because you didn't see the RESET.
--
Minds, like parachutes, function best when open
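The sequence Andrei describes can be reproduced in one process over the loopback interface. The following is a minimal sketch that assumes Linux. tcp_pair and demo_send_after_fin are hypothetical helper names. MSG_NOSIGNAL makes the failing send return an error instead of raising SIGPIPE. The 100 ms pause is an arbitrary allowance for the RST to come back. On Linux, the RST arrives while this socket is in CLOSE_WAIT, so the second send is expected to fail with EPIPE. Other systems may report ECONNRESET instead.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: connect two TCP sockets over 127.0.0.1 in one process. */
static int tcp_pair(int *client, int *server)
{
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_addr.s_addr = htonl(INADDR_LOOPBACK) }; /* port 0: kernel picks */
    socklen_t len = sizeof addr;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0) return -1;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 || listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) { close(lfd); return -1; }
    int c = socket(AF_INET, SOCK_STREAM, 0);
    if (c < 0 || connect(c, (struct sockaddr *)&addr, sizeof addr) < 0) {
        if (c >= 0) close(c);
        close(lfd);
        return -1;
    }
    int s = accept(lfd, NULL, NULL);
    close(lfd);
    if (s < 0) { close(c); return -1; }
    *client = c;
    *server = s;
    return 0;
}

struct after_fin {
    ssize_t recv_ret;   /* 0: the peer's FIN looks like an orderly close */
    ssize_t send1_ret;  /* accepted locally; the closed peer answers with RST */
    ssize_t send2_ret;  /* fails because the RST has arrived */
    int send2_errno;
};

static int demo_send_after_fin(struct after_fin *r)
{
    int me, peer;
    char buf[16];
    struct timespec pause = { 0, 100 * 1000 * 1000 };   /* 100 ms */
    if (tcp_pair(&me, &peer) < 0) return -1;
    close(peer);                                         /* peer sends FIN */
    r->recv_ret = recv(me, buf, sizeof buf, 0);          /* returns 0 */
    r->send1_ret = send(me, "x", 1, MSG_NOSIGNAL);       /* data reaches a closed socket */
    nanosleep(&pause, NULL);                             /* let the RST come back */
    r->send2_ret = send(me, "y", 1, MSG_NOSIGNAL);       /* -1 instead of SIGPIPE */
    r->send2_errno = errno;
    close(me);
    return 0;
}
```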
10-04-2007 12:09 AM #3
Re: How do you get ECONNRESET on recv?
chsalvia@gmail.com wrote:
> The man page for recv and read list the error ECONNRESET as an error
> condition that happens when "A connection was forcibly closed by a
> peer." I take this to mean that, assuming a TCP connection, if a
> client is recv'ing from a server, and the server suddenly crashes,
> then on the client side recv will return -1 and set errno to
> ECONNRESET.
Actually, no. When the server _system_ suddenly crashes, your
application receives nothing.
ECONNRESET means the connection has received a ReSeT (RST) segment
(ostensibly) from the remote TCP. There are a multitude of reasons
such a segment could be received, including, but not limited to:
*) the remote abused SO_LINGER and did an abortive close of the
connection
*) your application sent data which arrived after the remote called
shutdown(SHUT_RD) or close()
*) the remote TCP hit a retransmission limit and aborted (yes, if the
data segments weren't getting through the chances of the RST making
it are slim, but still non-zero)
*) there was some actual TCP protocol error between the two systems
99 times out of ten if the server _application_ terminates
(prematurely) the normal close() which happens on almost all platforms
will cause TCP to emit a FINished (FIN) segment. That would then be a
recv/read return of zero at your end. Of course if your application
ignored that and then tried to send something, that brings us to the
second bullet item above.
> But the man page says that recv should only return 0 if "the peer
> has performed an orderly shutdown". But a SIGSEGV is certainly not
> my idea of an "orderly shutdown"!
Ah, but as per above, 99 times out of ten, when the OS is cleaning-up
after the SIGSEGV'd application, it goes ahead and calls (the moral
equivalent to) close(), which unless perhaps the application has set
the abortive close SO_LINGER options will result in a FIN being sent.
The TCP code doesn't know the difference between a close() from the
app making a direct call, the system making a close() call on normal
program termination, or one from abnormal termination.
I suppose you could try setting the SO_LINGER options on the server
code to cause an RST when close() is called and then see what killing
the process does. Just be sure that you only do that in a debug
version and/or have code to put SO_LINGER back the way it should be
when doing a "normal" close() in your server app. Hmm, that might be
one of the few valid (IMO) reasons to use that otherwise heinous
direct-to-RST SO_LINGER option... Perhaps one day I will try that
with netperf.
rick jones
--
Process shall set you free from the need for rational thought.
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
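Rick's distinction suggests how the robust error handling asked about in the original question might sort recv() results. A FIN gives a return of 0, and an RST gives ECONNRESET. This is a hedged sketch. The enum and classify_recv are hypothetical names, and the grouping of errno values is one reasonable choice, not a standard. It assumes a non-zero buffer length, because recv() with a zero-length buffer also returns 0.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

enum recv_outcome {
    RECV_DATA,         /* n > 0 bytes arrived */
    RECV_PEER_FIN,     /* 0: FIN from close(), normal exit, or crash cleanup */
    RECV_PEER_RST,     /* ECONNRESET: an RST segment arrived */
    RECV_TIMED_OUT,    /* ETIMEDOUT: retransmission or keepalive gave up */
    RECV_UNREACHABLE,  /* EHOSTUNREACH / ENETUNREACH */
    RECV_RETRY,        /* EINTR, EAGAIN / EWOULDBLOCK: connection still fine */
    RECV_OTHER_ERROR
};

/* Hypothetical helper: n is recv()'s return value, err is errno saved right after it. */
static enum recv_outcome classify_recv(ssize_t n, int err)
{
    if (n > 0) return RECV_DATA;
    if (n == 0) return RECV_PEER_FIN;
    switch (err) {
    case ECONNRESET:   return RECV_PEER_RST;
    case ETIMEDOUT:    return RECV_TIMED_OUT;
    case EHOSTUNREACH:
    case ENETUNREACH:  return RECV_UNREACHABLE;
    case EINTR:
    case EAGAIN:       return RECV_RETRY;
    default:
        /* EWOULDBLOCK may equal EAGAIN, so it cannot be its own case label. */
        return err == EWOULDBLOCK ? RECV_RETRY : RECV_OTHER_ERROR;
    }
}
```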
10-04-2007 12:09 AM #4
Re: How do you get ECONNRESET on recv?
Andrei Voropaev writes:
> So, theoretically, you can see ECONNRESET in recv only if the peer does
> shutdown(SHUT_RD) and you try to send some data after this. Which
> usually never happens. More often the peer closes the socket unexpectedly
> while you are sending many chunks of data and as result you get
> SIGPIPE, because your first send triggers RESET, and your second send
> triggers SIGPIPE, because you didn't see the RESET.
Actually, a more common cause is that the peer uses the SO_LINGER
option, sets l_onoff to 1 (true) and l_linger to 0 (zero time), then
closes the socket. On systems that implement BSD sockets properly,
that causes the system to emit TCP RST and blow away the connection.
Your application will then see ECONNRESET or SIGPIPE or EPIPE,
depending on where it was when the message was received.
--
James Carlson, Solaris Networking
Sun Microsystems / 1 Network Drive 71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677
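James's recipe is the most direct way to make recv() fail with ECONNRESET on purpose, which is what the original poster was trying to do. This is a minimal sketch that assumes a POSIX system with loopback networking. tcp_pair and demo_abortive_close are hypothetical helpers. The peer sets l_onoff = 1 and l_linger = 0, so its close() emits an RST instead of a FIN.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect two TCP sockets over 127.0.0.1 in one process. */
static int tcp_pair(int *client, int *server)
{
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_addr.s_addr = htonl(INADDR_LOOPBACK) }; /* port 0: kernel picks */
    socklen_t len = sizeof addr;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0) return -1;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 || listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) { close(lfd); return -1; }
    int c = socket(AF_INET, SOCK_STREAM, 0);
    if (c < 0 || connect(c, (struct sockaddr *)&addr, sizeof addr) < 0) {
        if (c >= 0) close(c);
        close(lfd);
        return -1;
    }
    int s = accept(lfd, NULL, NULL);
    close(lfd);
    if (s < 0) { close(c); return -1; }
    *client = c;
    *server = s;
    return 0;
}

/* Peer closes abortively; stores recv()'s return value and errno. */
static int demo_abortive_close(ssize_t *ret, int *err)
{
    int me, peer;
    char buf[16];
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };   /* abortive close */
    if (tcp_pair(&me, &peer) < 0) return -1;
    if (setsockopt(peer, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) < 0) {
        close(me);
        close(peer);
        return -1;
    }
    close(peer);                             /* RST goes out, no FIN */
    *ret = recv(me, buf, sizeof buf, 0);     /* blocks until the RST arrives */
    *err = errno;
    close(me);
    return 0;
}
```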
10-04-2007 12:09 AM #5
Re: How do you get ECONNRESET on recv?
In article ,
Rick Jones wrote:
> chsalvia@gmail.com wrote:
> > The man page for recv and read list the error ECONNRESET as an error
> > condition that happens when "A connection was forcibly closed by a
> > peer." I take this to mean that, assuming a TCP connection, if a
> > client is recv'ing from a server, and the server suddenly crashes,
> > then on the client side recv will return -1 and set errno to
> > ECONNRESET.
>
> Actually, no. When the server _system_ suddenly crashes, your
> application receives nothing.
But if you were sending something at the time that it crashed, your
system will keep retransmitting. When the system reboots, it will
respond to the retransmission with a RST, and this will cause you to get
ECONNRESET.
--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
10-04-2007 12:09 AM #6
Re: How do you get ECONNRESET on recv?
Barry Margolin wrote:
> But if you were sending something at the time that it crashed, your
> system will keep retransmitting. When the system reboots, it will
> respond to the retransmission with a RST, and this will cause you to
> get ECONNRESET.
I thought one got some sort of timed-out or unreachable errno or
somesuch?
rick jones
--
No need to believe in either side, or any side. There is no cause.
There's only yourself. The belief is in your own precision. - Jobert
10-04-2007 12:09 AM #7
Re: How do you get ECONNRESET on recv?
In article ,
Rick Jones wrote:
> Barry Margolin wrote:
> > But if you were sending something at the time that it crashed, your
> > system will keep retransmitting. When the system reboots, it will
> > respond to the retransmission with a RST, and this will cause you to
> > get ECONNRESET.
>
> I thought one got some sort of timed-out or unreachable errno or
> somesuch?
Only if the reboot takes longer than the retransmission limit. In the
days when a reboot took several minutes that would be likely, but these
days many systems can reboot in under a minute (unless they have to do
lengthy fsck's), so the ECONNRESET is a possibility.
--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
10-04-2007 12:09 AM #8
Re: How do you get ECONNRESET on recv?
Barry Margolin wrote:
> Only if the reboot takes longer than the retransmission limit. In
> the days when a reboot took several minutes that would be likely,
> but these days many systems can reboot in under a minute (unless
> they have to do lengthy fsck's), so the ECONNRESET is a possibility.
But what of the RFC suggested (or is it mandated?) quiet time on stack
start?-)
rick jones
--
a wide gulf separates "what if" from "if only"
10-04-2007 12:09 AM #9
Re: How do you get ECONNRESET on recv?
In article ,
Rick Jones wrote:
> Barry Margolin wrote:
> > Only if the reboot takes longer than the retransmission limit. In
> > the days when a reboot took several minutes that would be likely,
> > but these days many systems can reboot in under a minute (unless
> > they have to do lengthy fsck's), so the ECONNRESET is a possibility.
>
> But what of the RFC suggested (or is it mandated?) quiet time on stack
> start?-)
If I understand it correctly, this just prohibits the rebooted system
from initiating connections during the quiet time. It doesn't affect
responding to segments received. In fact, the point of the quiet time
is to ensure that new connections don't inadvertently reuse the port and
sequence numbers of connections from before the reboot, which would
prevent responding to those packets with RST.
--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
10-04-2007 12:10 AM #10
Re: How do you get ECONNRESET on recv?
I have written a multi-threaded C client application running on HP-UX
11 PA-RISC that sends a SOAP request via send( ) to a webservice
residing on a Windows PC. The send( ) is always successful, and I
perform no other socket calls until I issue a recv( ) to get the
webservice response. Very occasionally the application either gets no
response or an ECONNRESET response. We suspect some sort of network
issue between the client and server. I have built and deployed my
client application for several other platforms including Solaris and
Windows and have never experienced this issue.
I feel that my application should handle this situation more
gracefully, and to this end my questions are:
Is it safe to assume anything regarding the state of the send( )
request on the webservice server? More specifically, what would be the
proper way for my client application to recover?
On Aug 9, 3:27 pm, Rick Jones wrote:
> chsal...@gmail.com wrote:
> > The man page for recv and read list the error ECONNRESET as an error
> > condition that happens when "A connection was forcibly closed by a
> > peer." I take this to mean that, assuming a TCP connection, if a
> > client is recv'ing from a server, and the server suddenly crashes,
> > then on the client side recv will return -1 and set errno to
> > ECONNRESET.
>
> Actually, no. When the server _system_ suddenly crashes, your
> application receives nothing.
10-04-2007 12:10 AM #11
Re: How do you get ECONNRESET on recv?
Fish Maker wrote:
> I have written a multi-threaded C client application running on HP-UX
> 11 PA-RISC that sends a SOAP request via send( ) to a webservice
> residing on a Windows PC. The send( ) is always successful, and I
> perform no other socket calls until I issue a recv( ) to get the
> webservice response. Very occasionally the application either gets
> no response or an ECONNRESET response. We suspect some sort of
> network issue between the client and server. I have built and
> deployed my client application for several other platforms including
> Solaris and Windows and have never experienced this issue.
> I feel that my application should handle this situation more
> gracefully, and to this end my questions are:
> Is it safe to assume anything regarding the state of the send( )
> request on the webservice server? More specifically, what would be
> the proper way for my client application to recover?
If you have received nothing but the ECONNRESET on the recv(), you can
assume nothing about the state of the request on the server. You do
not know if the server application received the data, nor if it acted
upon the data if it did receive it. To know that you need to receive
some sort of message from the server application.
I'm not fully up on all the terminology, but you may want to web
search on "two phase commit."
rick jones
--
web2.0 n, the dot.com reunion tour...

@@ -0,0 +1,81 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2012-02-27T20:06:34+08:00
====== Summary of TCP Connection Closing ======
Created Monday 27 February 2012
http://blog.csdn.net/shallwake/article/details/5250467
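The subject is broad, so this is only a brief summary. Interested readers can consult UNIX Network Programming, Volume 1, Sections 5.7, 5.12, 5.14, 5.15, 6.6, and 7.5 (the SO_LINGER option).
Take a simple echo server as the example. The client reads characters from standard input and sends them to the server. The server sends them back unchanged, and the client prints what it receives to standard output.
Closing the socket can then happen in the following ways: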
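1. The client closes the connection
1.1 The client calls close()
1.2 The client process exits
1.3 The client calls shutdown()
1.4 The client calls close() with the SO_LINGER option set
1.5 The client crashes: sudden power loss, an unplugged network cable, or an abnormal shutdown means the kernel sends no FIN, and the host has not rebooted
2. The server closes the connection
2.1 The server calls close()
2.2 The server process exits
2.3 The server crashes
2.4 The server crashes, with the SO_KEEPALIVE option set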
========================================================================================
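1.1 and 1.2 are equivalent: even if the client process just exits, the kernel closes the socket automatically. Note that the close only really takes effect when the socket's __reference count drops to 0__. __close() always returns immediately. The system then tries to send all the data remaining in the kernel buffer, and only after that does it send FIN__. So __data a process has sent may not yet have reached the peer when the process exits__.
This brings us to the four-way handshake that closes a TCP connection, which can be viewed as two FIN/ACK pairs. The side performing the active close sends FIN first. On receiving the ACK, it enters the FIN_WAIT2 state, also called the __"half-closed" state__. Note in particular that at this point __the active closer's socket can still receive data but can no longer send any__.
Notes:
1. "Send" here means a send or write system call made after the local TCP has sent its FIN and received the ACK (__which may be triggered by close() or shutdown(SHUT_WR)__). It **does not include data still sitting unsent in the sender's kernel TCP buffer** (data whose send call was made before close and returned successfully).
2. If you keep sending after close or shutdown, send/write __may raise SIGPIPE and then fail with errno EPIPE__.
The passive side has now received the FIN. Normally it __learns that the peer has closed when read(socket) returns 0 (it can still send data from its side)__. When it calls close(socket), the second FIN/ACK pair follows and the active side enters the TIME_WAIT state. This completes the four-way handshake.
That is how a connection closes under normal conditions.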
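Now 1.3: shutdown() differs from close() mainly as follows:
* __shutdown() ignores the reference count and starts the termination sequence at once. When the send direction is shut down, data already queued in the kernel send buffer is still sent, followed by FIN__
* shutdown() can close just __one direction__ of the connection: sending, receiving, or both
__In fact, after shutdown(SHUT_WR) you are in the half-closed situation described above, and the four-way handshake can still complete normally.__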
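The half-close can be tried over loopback in a single process. This is a minimal sketch that assumes a POSIX system. tcp_pair and demo_half_close are hypothetical names, and the "server" simply upper-cases what it read. After shutdown(SHUT_WR), the client can no longer send, but it still reads the server's reply.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <ctype.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect two TCP sockets over 127.0.0.1 in one process. */
static int tcp_pair(int *client, int *server)
{
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_addr.s_addr = htonl(INADDR_LOOPBACK) }; /* port 0: kernel picks */
    socklen_t len = sizeof addr;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0) return -1;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 || listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) { close(lfd); return -1; }
    int c = socket(AF_INET, SOCK_STREAM, 0);
    if (c < 0 || connect(c, (struct sockaddr *)&addr, sizeof addr) < 0) {
        if (c >= 0) close(c);
        close(lfd);
        return -1;
    }
    int s = accept(lfd, NULL, NULL);
    close(lfd);
    if (s < 0) { close(c); return -1; }
    *client = c;
    *server = s;
    return 0;
}

/* Client sends "hello" and half-closes; the server reads to EOF, replies, then closes. */
static int demo_half_close(char *reply, size_t cap, ssize_t *reply_len)
{
    int client, server;
    char buf[64];
    ssize_t n, total = 0;
    if (tcp_pair(&client, &server) < 0) return -1;
    if (send(client, "hello", 5, 0) != 5 || shutdown(client, SHUT_WR) < 0) {
        close(client);
        close(server);
        return -1;
    }
    /* read() returns 0 once the client's FIN arrives. */
    while ((n = read(server, buf + total, sizeof buf - (size_t)total)) > 0) total += n;
    for (ssize_t i = 0; i < total; i++) buf[i] = (char)toupper((unsigned char)buf[i]);
    /* The server-to-client direction is still open. */
    if (write(server, buf, (size_t)total) != total) { close(client); close(server); return -1; }
    close(server);
    *reply_len = 0;
    while ((n = read(client, reply + *reply_len, cap - (size_t)*reply_len)) > 0) *reply_len += n;
    close(client);
    return 0;
}
```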
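===== Now 1.4: why set SO_LINGER? =====
The purpose of SO_LINGER is to __change the default behavior of close()__. It can determine in which state close() returns. It can also make the socket __send an RST immediately (with no TIME_WAIT state)__. In that case no FIN is sent, the receiver gets an ECONNRESET error, and the connection is **torn down at once**.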
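The three behaviors that SO_LINGER selects can be wrapped in one helper. This is a minimal sketch. set_close_mode and the enum are hypothetical names, and the comments summarize the usual BSD-sockets semantics described in UNP section 7.5.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

enum close_mode {
    CLOSE_DEFAULT,   /* l_onoff = 0: close() returns at once; the kernel sends queued data, then FIN */
    CLOSE_ABORTIVE,  /* l_onoff = 1, l_linger = 0: unsent data is discarded and an RST is sent */
    CLOSE_TIMED      /* l_onoff = 1, l_linger = N: a blocking close() waits up to N seconds
                        for queued data and the FIN to be acknowledged */
};

/* Hypothetical helper: choose how a later close(fd) behaves. */
static int set_close_mode(int fd, enum close_mode mode, int seconds)
{
    struct linger lg = { .l_onoff = 0, .l_linger = 0 };
    if (mode == CLOSE_ABORTIVE) {
        lg.l_onoff = 1;
    } else if (mode == CLOSE_TIMED) {
        lg.l_onoff = 1;
        lg.l_linger = seconds;
    }
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
}
```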
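Having gone through 1.1 to 1.4, which of these many ways of closing a connection is best?
To choose, consider the worst case: the peer host crashes, or a network failure stalls packet delivery.
* RST is ruled out: there is not even a TIME_WAIT state, so after a network fault **a socket created later may still receive segments belonging to the destroyed one**.
* close() cannot guarantee that the peer receives the FIN, because close() always **returns immediately**. The kernel tries to send everything in the TCP buffer and then the FIN, but by then __the sending process may already have exited__.
* close() + SO_LINGER can make close() __return only after the ACK arrives__, but it still cannot guarantee that the four-way handshake completes.
* shutdown() first enters the half-closed state. When a subsequent read() returns 0 (the peer's FIN has arrived), the four-way handshake is proceeding normally. __This is the best approach__.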
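The recommended sequence is to half-close, then wait for read() to return 0. It might look like this minimal sketch, which assumes POSIX poll(). graceful_close is a hypothetical name. The timeout applies to each wait rather than to the whole shutdown, which a real server might want to change. Any data that still arrives from the peer is read and discarded.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 0 if the peer's FIN was seen, -1 on error, reset, or timeout. Always closes fd. */
static int graceful_close(int fd, int timeout_ms)
{
    char buf[4096];
    int rc = -1;
    if (shutdown(fd, SHUT_WR) == 0) {              /* queued data, then our FIN */
        for (;;) {
            struct pollfd p = { .fd = fd, .events = POLLIN };
            int r = poll(&p, 1, timeout_ms);
            if (r < 0 && errno == EINTR) continue;
            if (r <= 0) break;                     /* error or timeout: give up */
            ssize_t n = read(fd, buf, sizeof buf);
            if (n == 0) { rc = 0; break; }         /* peer's FIN: handshake completing */
            if (n < 0 && errno != EINTR) break;    /* e.g. ECONNRESET */
            /* n > 0: late data from the peer, discarded */
        }
    }
    close(fd);
    return rc;
}
```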
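On reflection, ordinary cases don't need this much care. Take an online game server. If the client calls close() and the server never finds out, that case falls under 1.5. If the server calls close() and the client never finds out, it falls under 2.3. Either way there is a solution.
Now 1.5 is simple: the server adds a mechanism for detecting broken links, which every large TCP server needs anyway. It __periodically sends small packets to check whether a client has exited abnormally__.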
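An application-level heartbeat like this only needs two timestamps per connection. This is a minimal sketch in which the struct, the function names, and the interval values are hypothetical. The caller updates the timestamps on every send and receive. It sends a small ping when hb_should_ping is true and closes the connection when hb_peer_dead is true.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-connection heartbeat state; times are seconds on a monotonic clock. */
struct heartbeat {
    time_t last_sent;    /* last time anything was sent to the peer */
    time_t last_heard;   /* last time anything arrived from the peer */
    time_t interval;     /* ping after this long without sending */
    time_t dead_after;   /* give up after this long without hearing anything */
};

static time_t monotonic_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec;
}

static void hb_on_send(struct heartbeat *hb, time_t now)    { hb->last_sent = now; }
static void hb_on_receive(struct heartbeat *hb, time_t now) { hb->last_heard = now; }

static bool hb_should_ping(const struct heartbeat *hb, time_t now)
{
    return now - hb->last_sent >= hb->interval;
}

static bool hb_peer_dead(const struct heartbeat *hb, time_t now)
{
    return now - hb->last_heard >= hb->dead_after;
}
```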
========================================================================================
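Server-side closing:
2.1 and 2.2 are equivalent, and in general the same as 1.1 and 1.2, except that the server is now the active closer.
2.3 Server crash: the client never receives an ACK, so it keeps retransmitting its data. With a typical socket implementation, an error (ETIMEDOUT, or EHOSTUNREACH if a router reports the host unreachable) is returned only after about __9 minutes__.
2.4 Server crash while the client has exchanged no data with the server for a long time: setting the __SO_KEEPALIVE__ option lets the client find out.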
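Enabling SO_KEEPALIVE on its own uses system-wide timers. On Linux, the default idle time before the first probe is 7200 seconds (two hours), so detection is slow unless it is tuned. This is a minimal sketch in which enable_keepalive is a hypothetical helper. The per-socket TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options are not available on every system, so they are applied only where the headers define them.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Turn on keepalive; where supported, probe after idle_s seconds of silence,
   every interval_s seconds, and give up after `probes` unanswered probes. */
static int enable_keepalive(int fd, int idle_s, int interval_s, int probes)
{
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0) return -1;
#if defined(TCP_KEEPIDLE) && defined(TCP_KEEPINTVL) && defined(TCP_KEEPCNT)
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof interval_s) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof probes) < 0)
        return -1;
#else
    (void)idle_s; (void)interval_s; (void)probes;   /* system-wide defaults apply */
#endif
    return 0;
}
```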
========================================================================================
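Afterword: networking is a complex discipline, and TCP connection closing alone shows as much. Ordinary programmers usually don't think about it in this much detail, but I believe these issues have always troubled them.
Addendum: in my experiments on Windows, __cases 1.2 and 2.2 behave like close() + SO_LINGER and send an RST directly__, perhaps because the system has to reclaim resources promptly. This **differs from Linux**; try it yourself if you are interested.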


@@ -0,0 +1,52 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2012-02-27T20:43:52+08:00
====== Understanding Socket recv() and send() ======
Created Monday 27 February 2012
http://blog.csdn.net/shallwake/article/details/5273727
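While reading UNP today I came across a very good diagram. Once it is clear, there is little left to puzzle over, so here is a short summary. Note that the diagram assumes input is read from stdin and sent to the socket, and that data received from the socket is written to stdout.
===== send: the kernel send buffer (note: the send and receive buffers are circular) =====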
{{./1.jpg}}
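* tooptr points to the next byte __to be written to the socket__
* toiptr points to the next position that can __accept data from the application__
So:
* The amount of data waiting to be written to the socket is toiptr - tooptr.
* The amount of data the buffer can still accept from stdin is &to[MAXLINE] - toiptr.
* In blocking mode, send returns once the application's data has been copied into the kernel buffer. If there is not enough space for the __entire data__ (for example, because the network is slow), it blocks until there is enough space.
* In nonblocking mode, if the buffer is __full__, send fails immediately with errno EWOULDBLOCK. If there is some space, it returns immediately with __the number of bytes actually copied__.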
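The pointer arithmetic above comes from UNP's nonblocking str_cli example. There, to and fr are buffers the program itself manages and fills or drains with nonblocking read() and write() calls. The following sketch models only the to side under that reading. The struct and function names are hypothetical, and any nonblocking descriptor can stand in for the socket. When everything queued has been written, both pointers go back to the start of the array, which is how UNP's version reuses the buffer.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define MAXLINE 4096

struct tobuf {
    char to[MAXLINE];
    char *toiptr;   /* next free byte: where data read from stdin goes */
    char *tooptr;   /* next byte still to be written to the socket */
};

static void tobuf_init(struct tobuf *b) { b->toiptr = b->tooptr = b->to; }
static size_t tobuf_room(const struct tobuf *b) { return (size_t)(&b->to[MAXLINE] - b->toiptr); }
static size_t tobuf_pending(const struct tobuf *b) { return (size_t)(b->toiptr - b->tooptr); }

/* Append up to n bytes; stands in for read(stdin, toiptr, room). */
static size_t tobuf_put(struct tobuf *b, const char *src, size_t n)
{
    if (n > tobuf_room(b)) n = tobuf_room(b);
    memcpy(b->toiptr, src, n);
    b->toiptr += n;
    return n;
}

/* One write attempt on a nonblocking fd: bytes written, 0 if it would block, -1 on error. */
static ssize_t tobuf_flush_once(struct tobuf *b, int fd)
{
    size_t len = tobuf_pending(b);
    if (len == 0) return 0;
    ssize_t n = write(fd, b->tooptr, len);
    if (n < 0) return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
    b->tooptr += n;
    if (b->tooptr == b->toiptr) b->toiptr = b->tooptr = b->to;   /* all sent: start over */
    return n;
}
```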
====================================================================
===== recv: the kernel receive buffer =====
{{./2.jpg}}
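* froptr points to the next byte __to be passed to the application__
* friptr points to the next position that can __accept data from the socket__
So:
* The amount of data __waiting to be passed to the application__ is friptr - froptr.
* The amount the buffer can still accept __from the socket__ is &fr[MAXLINE] - friptr.
* In blocking mode, if the buffer has no data, recv __blocks until some data arrives and then returns; the length varies__ (it may be a single byte or a whole packet).
* In nonblocking mode, if the buffer has no data, recv fails immediately with errno EWOULDBLOCK. If there is data, it behaves as above.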
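The receive-side rules can be checked with a local socket pair. This is a minimal sketch that assumes a POSIX system. set_nonblocking and try_recv are hypothetical helpers, and the -2 return value for "no data yet" is an arbitrary convention chosen here. An AF_UNIX socket pair is used for brevity. A TCP socket shows the same kinds of behavior: it would block when empty and returns short reads.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Bytes read (possibly fewer than cap), 0 on orderly close,
   -2 if no data is available yet (would block), -1 on other errors. */
static ssize_t try_recv(int fd, void *buf, size_t cap)
{
    ssize_t n = recv(fd, buf, cap, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) return -2;
    return n;
}
```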
====================================================================
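===== Summary: =====
* Blocking or nonblocking, do not expect send(n) or recv(n) to transfer exactly n bytes.
* A clear picture of the kernel buffers helps a great deal in understanding network programming.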
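Because of the first point, code that needs exactly n bytes has to loop. This is a minimal sketch for blocking sockets. send_all and recv_exact are hypothetical names. MSG_NOSIGNAL is assumed to be available; it is defined as 0 otherwise, in which case SIGPIPE must be ignored some other way. These are the loops the Thoughts section below warns against in a server handling many connections. In a blocking client, they are the usual pattern.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef MSG_NOSIGNAL
#define MSG_NOSIGNAL 0   /* not available here: ignore SIGPIPE some other way */
#endif

/* Blocking sockets only: 0 once all len bytes were accepted, -1 on error. */
static int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, MSG_NOSIGNAL);
        if (n < 0) {
            if (errno == EINTR) continue;    /* interrupted by a signal: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* 0 when exactly len bytes arrived, 1 if the peer closed first, -1 on error. */
static int recv_exact(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n == 0) return 1;                /* orderly close before len bytes */
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```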
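===== Thoughts: =====
A well-known server design principle is __"never use a blocking operation"__.
The reasons are easy to see. First, it makes full use of the CPU. Second, it is safer, since a malicious client could otherwise easily make the server block on it.
On the __safety of nonblocking I/O__: I have seen a lot of code that puts a nonblocking send in a loop and does not leave the loop until all n bytes have been sent. That works under normal conditions. But if the network is slow, the diagram above shows that the while() loop will also take a long time to exit, which inevitably delays the server's sending on other sockets. Worse still is a malicious peer that, say, reads one byte and then sleeps.
So I think a high-performance server should neither use blocking I/O nor put any I/O operation in a loop that runs until the expected amount of data is transferred; I will write this up another time.
Use poll, epoll, select, and the like instead.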
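One way to follow this advice is to keep a per-connection output queue. On each poll() round, the server writes only what the kernel will take right now and never loops until everything is sent. This sketch assumes POSIX poll() and that SIGPIPE is ignored elsewhere. The struct conn layout, the 16-connection limit, and the policy of closing a connection on a hard write error are all hypothetical choices for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_CONNS 16
#define OUT_CAP 4096

/* Hypothetical per-connection state: a nonblocking fd and its pending output. */
struct conn {
    int fd;
    char out[OUT_CAP];
    size_t head, tail;   /* out[head..tail) has not been sent yet */
};

static int conn_init(struct conn *c, int fd)
{
    int fl = fcntl(fd, F_GETFL);
    if (fl < 0 || fcntl(fd, F_SETFL, fl | O_NONBLOCK) < 0) return -1;
    c->fd = fd;
    c->head = c->tail = 0;
    return 0;
}

/* Queue up to n bytes without touching the socket; returns how many fit. */
static size_t conn_queue(struct conn *c, const char *data, size_t n)
{
    if (c->head == c->tail) c->head = c->tail = 0;     /* empty: reuse from the start */
    if (n > OUT_CAP - c->tail) n = OUT_CAP - c->tail;
    memcpy(c->out + c->tail, data, n);
    c->tail += n;
    return n;
}

/* One poll() round: at most one write per writable connection, then return. */
static int poll_round(struct conn *cs, size_t ncs, int timeout_ms)
{
    struct pollfd pfd[MAX_CONNS];
    size_t idx[MAX_CONNS], np = 0;
    for (size_t i = 0; i < ncs && np < MAX_CONNS; i++) {
        if (cs[i].fd < 0 || cs[i].head == cs[i].tail) continue;   /* nothing to send */
        pfd[np].fd = cs[i].fd;
        pfd[np].events = POLLOUT;
        pfd[np].revents = 0;
        idx[np++] = i;
    }
    if (np == 0) return 0;
    int ready = poll(pfd, (nfds_t)np, timeout_ms);
    if (ready <= 0) return ready;                   /* timeout, or error such as EINTR */
    for (size_t k = 0; k < np; k++) {
        struct conn *c = &cs[idx[k]];
        if (!(pfd[k].revents & (POLLOUT | POLLERR | POLLHUP))) continue;
        ssize_t n = write(c->fd, c->out + c->head, c->tail - c->head);
        if (n > 0) {
            c->head += (size_t)n;                   /* a partial write is fine: the rest waits */
        } else if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR) {
            close(c->fd);                           /* hypothetical policy: drop a broken peer */
            c->fd = -1;
            c->head = c->tail = 0;
        }
    }
    return ready;
}
```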
