What is the correct way of reading from a TCP socket in C/C++?


Question

Here's my code:

// Not all headers are relevant to the code snippet.
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <cstdlib>
#include <cstring>
#include <unistd.h>

char *buffer;
stringstream readStream;
bool readData = true;

while (readData)
{
    cout << "Receiving chunk... ";

    // Read a bit at a time, eventually "end" string will be received.
    bzero(buffer, BUFFER_SIZE);
    int readResult = read(socketFileDescriptor, buffer, BUFFER_SIZE);
    if (readResult < 0)
    {
        THROW_VIMRID_EX("Could not read from socket.");
    }

    // Concatenate the received data to the existing data.
    readStream << buffer;

    // Continue reading while end is not found.
    readData = readStream.str().find("end;") == string::npos;

    cout << "Done (length: " << readStream.str().length() << ")" << endl;
}

It's a little bit of C and C++, as you can tell. BUFFER_SIZE is 256 - should I just increase it? If so, to what? Does it matter?

I know that if "end" is not received for whatever reason, this will be an endless loop, which is bad - so if you could suggest a better way, please do.

3/22/2009 9:24:50 PM

Accepted Answer

Without knowing your full application it is hard to say what the best way to approach the problem is, but a common technique is to use a header which starts with a fixed-length field that denotes the length of the rest of your message.

Assume that your header consists only of a 4-byte integer which denotes the length of the rest of your message. Then simply do the following.

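#include <stdexcept>
#include <unistd.h>

// This assumes buffer is at least x bytes long,
// and that the socket is blocking.
void ReadXBytes(int socket, unsigned int x, void* buffer)
{
    // Pointer arithmetic on void* is not valid C++, so step through as chars.
    char* out = static_cast<char*>(buffer);
    unsigned int bytesRead = 0;
    while (bytesRead < x)
    {
        ssize_t result = read(socket, out + bytesRead, x - bytesRead);
        if (result < 1)
        {
            // 0 means the peer closed the connection, -1 means an error.
            // Throw your error; std::runtime_error is only a placeholder.
            throw std::runtime_error("read failed or connection closed");
        }

        bytesRead += static_cast<unsigned int>(result);
    }
}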

Then, later in the code:

unsigned int length = 0;
char* buffer = 0;
// we assume that sizeof(length) will return 4 here.
ReadXBytes(socketFileDescriptor, sizeof(length), (void*)(&length));
buffer = new char[length];
ReadXBytes(socketFileDescriptor, length, (void*)buffer);

// Then process the data as needed.

delete [] buffer;

This makes a few assumptions:

  • ints are the same size on the sender and receiver.
  • Endianness is the same on both the sender and receiver.
  • You have control of the protocol on both sides.
  • When you send a message you can calculate the length up front.

Since you often want to know the exact size of the integers you send across the network, define fixed-size types in a header file and use them explicitly, for example:

// These typedefs will vary across different platforms
// such as Linux, Win32, OS X etc., but the idea
// is that an Int8 is always 8 bits, and a UInt32 is always
// 32 bits regardless of the platform you are on.
// These vary from compiler to compiler, so you have to
// look them up in the compiler documentation. Where available,
// <stdint.h> / <cstdint> already provides int8_t, uint32_t, etc.
typedef signed char Int8;  // plain char may be unsigned on some platforms
typedef short int Int16;
typedef int Int32;

typedef unsigned char UInt8;
typedef unsigned short int UInt16;
typedef unsigned int UInt32;

This would change the above to:

UInt32 length = 0;
char* buffer = 0;

ReadXBytes(socketFileDescriptor, sizeof(length), (void*)(&length));
buffer = new char[length];
ReadXBytes(socketFileDescriptor, length, (void*)buffer);

// process

delete [] buffer;
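
Two of the assumptions listed above (integer size and endianness) can be relaxed if the protocol fixes the length field as a 32-bit unsigned integer in network byte order. The following is only a sketch under that assumption: it presumes the sender writes the length with htonl(), and the names readExactly, readMessage and kMaxMessageSize are hypothetical, not part of the original answer. It also caps the length so a corrupt header cannot trigger a huge allocation, and it uses std::vector so nothing leaks if a read throws.

```cpp
#include <arpa/inet.h>   // ntohl
#include <sys/types.h>   // ssize_t
#include <unistd.h>      // read
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Read exactly `count` bytes from a blocking descriptor, or throw.
void readExactly(int fd, void* dest, std::size_t count)
{
    char* out = static_cast<char*>(dest);
    std::size_t done = 0;
    while (done < count)
    {
        ssize_t n = read(fd, out + done, count - done);
        if (n == 0)
            throw std::runtime_error("peer closed the connection");
        if (n < 0)
        {
            if (errno == EINTR)
                continue;  // interrupted by a signal, just retry
            throw std::runtime_error("read failed");
        }
        done += static_cast<std::size_t>(n);
    }
}

// Hypothetical upper bound on a message; pick one that suits your protocol.
const std::uint32_t kMaxMessageSize = 1024 * 1024;

// Read one message: a 4-byte big-endian length, then that many payload bytes.
std::vector<char> readMessage(int fd)
{
    std::uint32_t wireLength = 0;
    readExactly(fd, &wireLength, sizeof(wireLength));

    const std::uint32_t length = ntohl(wireLength);  // network to host order
    if (length > kMaxMessageSize)
        throw std::runtime_error("message length exceeds the allowed maximum");

    std::vector<char> payload(length);
    if (length > 0)
        readExactly(fd, payload.data(), length);
    return payload;
}
```

The sender would mirror this by writing htonl(length) followed by the payload bytes.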

I hope this helps.

8/15/2013 11:56:14 AM

Several pointers:

You need to handle a return value of 0, which tells you that the remote host closed the socket.

For nonblocking sockets, you also need to check for an error return value (-1) and make sure that errno isn't EAGAIN or EWOULDBLOCK, which is expected and only means that no data is available yet.

You definitely need better error handling: you're potentially leaking the buffer pointed to by 'buffer', which, I noticed, you don't allocate anywhere in this code snippet.

Someone else made a good point about how your buffer isn't a null-terminated C string if your read() fills the entire buffer. That is indeed a problem, and a serious one: readStream << buffer keeps reading past the end of the buffer until it happens to hit a zero byte.
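
One way to sidestep the terminator problem is to append exactly the number of bytes that read() reported instead of treating the buffer as a C string. The sketch below assumes the received data is collected in a std::string; appendChunk is a hypothetical helper name, and the 256-byte chunk size simply mirrors the BUFFER_SIZE from the question.

```cpp
#include <sys/types.h>   // ssize_t
#include <unistd.h>      // read
#include <cstddef>
#include <string>

// Read one chunk from fd and append exactly the bytes received.
// Returns read()'s result: > 0 bytes appended, 0 on end of stream,
// -1 on error (errno is set by read()).
ssize_t appendChunk(int fd, std::string& received)
{
    char chunk[256];
    ssize_t n = read(fd, chunk, sizeof(chunk));
    if (n > 0)
    {
        // The byte count comes from read(), so no terminator is needed
        // and embedded zero bytes are preserved.
        received.append(chunk, static_cast<std::size_t>(n));
    }
    return n;
}
```

With a std::stringstream, readStream.write(buffer, readResult) has the same effect.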

Your buffer size is a bit small, but it should work as long as you never ask read() for more than 256 bytes, or whatever size you allocate for it.

If you're worried about getting into an infinite loop when the remote host sends you a malformed message (a potential denial of service attack), then use select() with a timeout on the socket to check for readability. Only read if data is available, and bail out if select() times out.

Something like this might work for you:

fd_set read_set;
struct timeval timeout;

timeout.tv_sec = 60; // Time out after a minute
timeout.tv_usec = 0;

FD_ZERO(&read_set);
FD_SET(socketFileDescriptor, &read_set);

int r = select(socketFileDescriptor + 1, &read_set, NULL, NULL, &timeout);

if (r < 0) {
    // Handle the error
}

if (r == 0) {
    // Timeout - handle that. You could try waiting again, close the socket...
}

if (r > 0) {
    // The socket is ready for reading - call read() on it.
}
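
The select() pattern can be wrapped together with the read() call in one helper. This is only a sketch: readWithTimeout and kTimedOut are hypothetical names, the timeout is given in whole seconds, and the descriptor is assumed to be below FD_SETSIZE, which select() requires.

```cpp
#include <sys/select.h>  // select, fd_set
#include <sys/time.h>    // timeval
#include <sys/types.h>   // ssize_t
#include <unistd.h>      // read
#include <cstddef>

// Hypothetical sentinel, distinct from read()'s own return values.
const ssize_t kTimedOut = -2;

// Wait up to timeoutSeconds for fd to become readable, then read once.
// Returns > 0 bytes read, 0 if the peer closed the connection,
// -1 on error (errno set by select() or read()), kTimedOut on timeout.
ssize_t readWithTimeout(int fd, void* buffer, std::size_t size, int timeoutSeconds)
{
    fd_set readSet;
    FD_ZERO(&readSet);
    FD_SET(fd, &readSet);

    // select() may modify the timeval, so build a fresh one on every call.
    timeval timeout;
    timeout.tv_sec = timeoutSeconds;
    timeout.tv_usec = 0;

    int ready = select(fd + 1, &readSet, NULL, NULL, &timeout);
    if (ready < 0)
        return -1;         // select() itself failed
    if (ready == 0)
        return kTimedOut;  // nothing arrived in time

    return read(fd, buffer, size);
}
```

A receive loop would call this repeatedly and give up, or close the socket, as soon as it returns kTimedOut.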

Depending on the volume of data you expect to receive, the way you scan the entire message repeatedly for the "end;" token is very inefficient. This is better done with a state machine (the states being 'e'->'n'->'d'->';') so that you only look at each incoming character once.
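
A minimal sketch of such a state machine is shown here. EndTokenScanner is a hypothetical name. The state is simply the number of characters of "end;" matched so far, and it is kept between calls so the token is still found when it is split across two reads. The simple fallback on a mismatch (restart at 1 if the byte is 'e', otherwise at 0) is only correct because all four characters of "end;" are distinct; a token with repeated characters would need a proper failure table.

```cpp
#include <cstddef>

// Incremental scanner for the terminator "end;". Each incoming byte is
// examined exactly once, no matter how the data is split across reads.
class EndTokenScanner
{
public:
    EndTokenScanner() : matched_(0) {}

    // Feed newly received bytes. Returns true as soon as the terminator
    // has been seen; any bytes after it in `data` are not examined.
    bool feed(const char* data, std::size_t size)
    {
        static const char token[] = "end;";
        const std::size_t tokenLength = sizeof(token) - 1;

        for (std::size_t i = 0; i < size; ++i)
        {
            if (data[i] == token[matched_])
            {
                ++matched_;
                if (matched_ == tokenLength)
                {
                    matched_ = 0;  // reset so the scanner can be reused
                    return true;
                }
            }
            else
            {
                // Mismatch: this byte may still start a new match.
                matched_ = (data[i] == token[0]) ? 1 : 0;
            }
        }
        return false;
    }

private:
    std::size_t matched_;  // characters of the token matched so far
};
```

In the receive loop, each chunk would be passed to feed() right after it is read, and the loop stops when feed() returns true.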

And seriously, you should consider finding a library to do all this for you. It's not easy getting it right.


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow