8
\$\begingroup\$

I am building a web server from scratch and trying to make certain tasks faster by embedding C code into the mix for performance. Specifically I'm worried about how thestd::string class with the.find() and other functions compare to straight pointer arithmetic.

#include <iostream>#include <map>#include <string>std::map<std::string, std::string> http_request;void parse_header( void * );int main(){    char * msg= "GET / HTTP/1.1\r\n"                "Host: 192.241.213.46:6880\r\n"                "Upgrade-Insecure-Requests: 1\r\n"                "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n"                "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/602.4.8 (KHTML, like Gecko) Version/10.0.3 Safari/602.4.8\r\n"                "Accept-Language: en-us\r\n"                "Accept-Encoding: gzip, deflate\r\n"                "Connection: keep-alive\r\n\r\n";    parse_header( msg );}void parse_header( void *msg ){    char *head = (char *) msg;    char *mid;    char *tail = head;    if( sizeof( msg ) == 0 )    {        return;    }    // Find request type    while( *head++ != ' ');    http_request[ "Type" ] = std::string( ( char * ) msg ).substr( 0 , ( head - 1) - tail );    // Find path    tail = head;    while( *head++ != ' ');    http_request[ "Path" ] = std::string( ( char * ) msg ).substr( tail - ( char *)msg , ( head - 1) - tail );    // Find HTTP version    tail = head;    while( *head++ != '\r');    http_request[ "Version" ] = std::string( ( char * ) msg ).substr( tail - ( char *)msg , ( head - 1) - tail );    // Map all headers from a key to a value    while( true )    {        tail = head + 1;        while( *head++ != '\r' );        mid = strstr( tail, ":" );           // Look for the failed strstr           if( tail > mid )            break;        http_request[ std::string( ( char * ) msg ).substr( tail - ( char *)msg , ( mid ) - tail  ) ] = std::string( ( char * ) msg ).substr( mid + 2 - ( char *) msg , ( head - 3 ) - mid );    }    // Determine if successful    std::cout << http_request[ "Host" ] << std::endl;     std::cout << http_request[ "Upgrade-Insecure-Requests" ] << std::endl;     std::cout << http_request[ "Accept" ] << std::endl;     std::cout << http_request[ "User-Agent" ] << std::endl;     std::cout << http_request[ "Accept-Language" ] << std::endl;     std::cout << http_request[ "Accept-Encoding" ] << std::endl;     std::cout << http_request[ "Connection" ] << std::endl; }

Data comes in via the linuxsocket.send() function, so it is avoid * type. I directly send that data to aparse_header function to create astd::map for easy access. Is it necessary to be this low-level, or can this speed be obtained by the STL?

Note: I killed OO design for a minimal example. The end-goal has an HTTP class.

Note++: I'd prefer to have this code review pertain to performance rather than c++ semantics, I'd like it to be as conforming as possible while also being fast and lightweight. This seems to be the purest form of the cluge of C/C++.

askedMar 6, 2017 at 5:45
Shawnic Hedgehog's user avatar
\$\endgroup\$
5
  • \$\begingroup\$There is something wrong:sizeof(msg) == 0, I don't think this is what you meant.\$\endgroup\$CommentedMar 6, 2017 at 8:12
  • \$\begingroup\$You are right. sizeof the pointer was not what I was meaning! Thanks!\$\endgroup\$CommentedMar 6, 2017 at 17:40
  • \$\begingroup\$This is a false worry. The string implementation is efficient.\$\endgroup\$CommentedMar 6, 2017 at 18:33
  • \$\begingroup\$IF you are worried abut speed usenginx\$\endgroup\$CommentedMar 6, 2017 at 18:35
  • \$\begingroup\$@Loki I use nginx for the webserver, this code is for handling API requests, sorry for the confusion. I benchmarked the code against Jorge Omar Medra's example and my code comes out roughly ~20us or so faster( vs ~50us avg), which is most likely due to less function calls. I guess the question now is: the c++ code provided is more maintainable because everyone hates pointers, but is the difference in speed worth it if I'm the only programmer? I was thinking if you were getting thousands of requests a second 30us is more beneficial than 50us.\$\endgroup\$CommentedMar 6, 2017 at 18:41

2 Answers2

2
\$\begingroup\$

Note++: I'd prefer to have this code review pertain to performance rather than c++ semantics

I get it, I really do (he says unconvincingly); but I also really think that you're going about this backwards. Step 1 isalways to "make it work"; you don't get to "make it fast, make it cheap" untilafter you get it working. So you should momentarily forget about performance and just work on fixing the bugs and infelicities; and then if it turns out to be slow, you find the bottlenecks and improve them. (I'm not going as far as to say you need a profiler; you can find bottlenecks with eyeballs and intuition too. But there's no point in micro-polishing code until the bugs are out.)


So:sizeof(msg)==0 is never true. It looks like you're wanting to check the size of the message itself (like, number of bytes in the message); so, you need to add a function parameter toget the message size.(void *msg, size_t msg_size) would do fine. In C++17 the new party line is probably to use astd::string_view.


while( *head++ != ' '); is a guaranteed segfault, as soon as I throw a malformed request at your server. Don't even give me the opportunity to segfault your code. Say what you mean:

while (head != msg_end && *head == ' ') ++head;

The repeated subexpressionstd::string( ( char * ) msg ) is a red flag: both because you're creating a string over and over (which does heap allocation, which is slow) and because you're using a C-style cast (type-casts are always red flags in both C and C++)and because you're casting awayconst./me checks the function signature again — No, you'renot casting awayconst. But you should be! A parsing function shouldn't be allowed to mutate the message buffer. Make that signature(const void *msg, size_t msg_size). Or even better:(const char *msg, const char *msg_end).


Iostreams are slow; don't usestd::cout << ... if you care about micro-optimizing performance. — But in fact you shouldn't be printing output from within your parsing function anyway! Let's just have the parsing functionreturn the map of headers.

Also, your variable names seem confused. I'd try to make each extracted string run fromhead totail, since that's how we usually talk about things in English. So:

void parse_header(const char *msg, const char *msg_end){    const char *head = msg;    const char *tail = msg;    // Find request type    while (tail != msg_end && *tail != ' ') ++tail;    http_request["Type"] = std::string(head, tail);    // Find path    while (tail != msg_end && *tail == ' ') ++tail;    head = tail;    while (tail != msg_end && *tail != ' ') ++tail;    http_request["Path"] = std::string(head, tail);    // Find HTTP version    while (tail != msg_end && *tail == ' ') ++tail;    head = tail;    while (tail != msg_end && *tail != '\r') ++tail;    http_request["Version"] = std::string(head, tail);    if (tail != msg_end) ++tail;  // skip '\r'    // TODO: what about the trailing '\n'?    // Map all headers from a key to a value    head = tail;    while (head != msg_end && *head != '\r') {        while (tail != msg_end && *tail != '\r') ++tail;        const char *colon = memchr(head, tail, ':');        if (colon == NULL) {            // TODO: malformed headers, what should happen?            break;        }        const char *value = colon+1;        while (value != tail && *value == ' ') ++value;        http_request[ std::string(head, colon) ] = std::string(value, tail);        head = tail+1;        // TODO: what about the trailing '\n'?    }    return http_request;}

Completely untested, mind you; but the most important things about the above code are that- it uses the appropriatestd::string constructor instead of repeatedly allocating a gigantic string and thensubstr'ing parts of it;- it follows your original code (good idea!) in repeating the same basic idiom over and over, mechanically, so that if there ends up being a bug in the logic it's easy to see all the places to apply the fix.


Another thing to try if you havelots of headers, or if you're planning to dolots of lookups in the resulting map, would be to try usingstd::unordered_map instead ofstd::map. Less pointer-chasing often means faster code.

answeredMar 6, 2017 at 20:03
Quuxplusone's user avatar
\$\endgroup\$
6
  • \$\begingroup\$Thanks for the reminder about seg faults. I was assuming too much in my code. I also had no idea about that particular std::string constructor, so yes, that definitely simplified the code! I'm also guessing you'd rather message_end rather than message_length because you'd have to have more math to see if the pointer is in the right position, correct (i.e., tail - msg < msg_end - msg)? Finally, is &msg[ sizeof( msg ) - 1] the cheapest way to get the end of an array?\$\endgroup\$CommentedMar 6, 2017 at 21:10
  • \$\begingroup\$If you're interested, I also benchmarked the two which can be found herepastebin.com/bLBzapgi. This solution is approximately 3x faster than the original.. really goes to show how expensive the heap is.\$\endgroup\$CommentedMar 6, 2017 at 22:09
  • \$\begingroup\$@ShawnicHedgehog: I think you're still misunderstanding whatsizeof actually does. It isnot a synonym forstrlen.\$\endgroup\$CommentedMar 7, 2017 at 0:38
  • 1
    \$\begingroup\$I am aware that sizeof returns the amount of bytes. I'm misusing sizeof in the OP with a pointer. On my machine a character is 1 byte. If I wanted it to be perfect I would do sizeof( data ) / sizeof ( data[0] ) which would find the number of elements in an array. It works in this example because I'm specifying const char array[] instead of const char *array. I thought we are not able to update the OP's code, otherwise I would've fixed it.\$\endgroup\$CommentedMar 7, 2017 at 0:41
  • 1
    \$\begingroup\$Didn't mean to sound torte, hard to set a tone over the internet. Thanks for the advice, strchr seems to be what I'm looking for.\$\endgroup\$CommentedMar 7, 2017 at 1:11
2
\$\begingroup\$

First of all, The Http server send all the headers Line by Line. In your sample i see you created a dummy sample with the next buffer:

char * msg= "GET / HTTP/1.1\r\n"            "Host: 192.241.213.46:6880\r\n"            "Upgrade-Insecure-Requests: 1\r\n"            "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n"            "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/602.4.8 (KHTML, like Gecko) Version/10.0.3 Safari/602.4.8\r\n"            "Accept-Language: en-us\r\n"            "Accept-Encoding: gzip, deflate\r\n"            "Connection: keep-alive\r\n\r\n";

I suppose that you first recive all the header before analyzing it with your methodvoid parse_header( void *msg )... and that is the reason that the task to get each Key-Value is so hard because you are using parsers like this:

http_request[ std::string( ( char * ) msg ).substr( tail - ( char *)msg , ( mid ) - tail  ) ] = std::string( ( char * ) msg ).substr( mid + 2 - ( char *) msg , ( head - 3 ) - mid );

What will I do to improve your code:

  1. I will keep themap<string,string> to store all headers.

  2. Read the header one line by time and, before to continue with the other line, analyze it.

  3. I don't see any issue with the code to parse the first but I will modify it to analyze is a single line and simplify the parsers.

If you read and analyze one line by line, the code to map all the keys with values could be like this:

void parseFirstLine(string line){    string key = "";    string value = "";    int  position, lpost;    // Find request type    position = line.find(' ');    http_request[ "Type" ] = line.substr(0, position);    position++; //Skip character ' '    // Find path    position = line.find(' ', lpost);    http_request[ "Path" ] = line.substr(lpost, (position-lpost));    position++; //Skip character ' '    // Find HTTP version    http_request[ "Version" ] = line.substr(position);}void parseHeader(string line){    string key = "";    string value = "";    if(data.size() == 0) return;    int posFirst = line.find(":",0); //Look for separator ':'    key = line.substr(0, posFirst);    value = line.substr(posFirst + 1);    http_request[key] = value;}

(*) This code could have some bugs because I'm writing this code without any developer environment to review this.

Is your decision where you get each line to analyze, at the moment you get it from socket reading or after finishing of read all the header.

Vogel612's user avatar
Vogel612
25.5k7 gold badges59 silver badges141 bronze badges
answeredMar 6, 2017 at 8:50
Jorge Omar Medra's user avatar
\$\endgroup\$
2
  • \$\begingroup\$I think you misunderstood.. I am the Http server; this is server-side code. Data is received using the socket recv functionman7.org/linux/man-pages/man2/recv.2.html. This function requires a data buffer and a known data buffer length. To my knowledge it would be incredibly more difficult to read 1 line at a time since you have to assign a length, and you don't know the length because the first line is variable, and the specific headers required change if authorization is necessary.\$\endgroup\$CommentedMar 6, 2017 at 17:39
  • \$\begingroup\$Yes, @Shawnic-Hedgehog, i understood you and it is the same case for both sides. The server always recives a Http request which the header are segment by lines followed by the contest, the response has the same structure. I have some sample to read the Request line by line. I have a sample of how to read it, but use the lib curl:github.com/jorgemedra/HTTPSClient/blob/master/HTTPSClient/…. My post is in spanish.\$\endgroup\$CommentedMar 6, 2017 at 18:14

You mustlog in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.