Parsing a HTTP request

Question 1

I'm currently writing a pure Ruby web server, and one of the things I have to do is parse an HTTP request. The method I've pasted below takes an HTTP request and puts it in a map keyed by the header field names.

The biggest issue I faced while doing this was dealing with the lack of an EOF from the TCPSocket on requests with bodies (basically every POST request). This meant that I couldn't just keep calling file.gets until I reached the end of the file, because there was no end. What I ended up doing instead was to create a while true loop that breaks when it finds '\r\n', the last characters before the body, and then read the body separately using the number I get from Content-Length.

Is there a more elegant way in Ruby to do this? I feel like it's unnecessary to use an infinite loop, but I can't think of anything else that would work.

    # Takes a HTTP request and parses it into a map that's keyed
    # by the title of the heading and the heading itself.
    # Request should always be a TCPSocket object.
    def self.parse_http_request(request)
       headers = {}

       #get the first heading (first line)
       headers['Heading'] = request.gets.gsub /^"|"$/, ''.chomp
       method = headers['Heading'].split(' ')[0]

       #parse the header
       while true
          #do inspect to get the escape characters as literals
          #also remove quotes
          line = request.gets.inspect.gsub /^"|"$/, ''

          #if the line only contains a newline, then the body is about to start
          break if line.eql? '\r\n'

          label = line[0..line.index(':')-1]

          #get rid of the escape characters
          val = line[line.index(':')+1..line.length].tap{|val|val.slice!('\r\n')}.strip
          headers[label] = val
       end

       #If it's a POST, then we need to get the body
       if method.eql?('POST')
          headers['Body'] = request.read(headers['Content-Length'].to_i)
       end

       return headers
    end

Answer 1

I don't know if this is the best method to parse an HTTP request. You should probably have a look at one of the simpler Ruby HTTP servers out there to get some ideas. I do know, though, that you're doing some strange things here:

headers['Heading'] = request.gets.gsub /^"|"$/, ''.chomp

It looks like you're calling the #chomp method on '' here, which doesn't make sense. You really should use parentheses here:

headers['Heading'] = request.gets.gsub(/^"|"$/, '').chomp

On the other hand, I can't see what the gsub is doing here; you can probably just get rid of it.
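To see the precedence problem concretely, here is a minimal sketch. The sample request line is an assumption standing in for whatever `request.gets` would return from the socket.

```ruby
# Hypothetical first line of a request, as gets would return it.
raw = "GET /index.html HTTP/1.1\r\n"

# Without parentheses, .chomp binds to the '' literal, so this is really
# raw.gsub(/^"|"$/, ''.chomp) and the trailing "\r\n" survives.
without_parens = raw.gsub /^"|"$/, ''.chomp

# With parentheses, chomp is applied to the result of gsub.
with_parens = raw.gsub(/^"|"$/, '').chomp

# There are no surrounding quotes to strip here, so chomp alone is enough.
chomp_only = raw.chomp

p without_parens
p with_parens
p chomp_only
```

Only the last two forms drop the line terminator, and they give the same string for this input.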

method = headers['Heading'].split(' ')[0]

Probably nitpicking but you can make this easier (and a little faster) by using:

method = headers['Heading'][/[^ ]*/]

That translates to: get the substring till the first space character (so no need to create an array here).
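A quick side-by-side, assuming a sample request line (the string below is made up for illustration):

```ruby
# Hypothetical request line with the line terminator already removed.
request_line = "POST /submit HTTP/1.1"

# split builds an array of all three parts just to take the first one.
via_split = request_line.split(' ')[0]

# String#[] with a regexp returns the first match: the leading run of
# non-space characters.
via_regex = request_line[/[^ ]*/]

p via_split
p via_regex
```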

line = request.gets.inspect.gsub /^"|"$/, ''

That is really strange; you don't need to "inspect" a string just to parse control characters. You can match control characters with double-quoted strings (e.g. "\r\n").
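The difference between the two quoting styles can be checked directly. This is only a sketch; `line` stands in for the blank line that `gets` returns at the end of the headers.

```ruby
# What gets returns for the empty line that terminates the header section.
line = "\r\n"

# Double quotes interpret the escapes: two characters, CR and LF.
matches_double = (line == "\r\n")

# Single quotes keep them literal: backslash, r, backslash, n.
matches_single = (line == '\r\n')
single_length = '\r\n'.length

# inspect produces the escaped, quoted form, which is why the original code
# had to strip quotes and compare against the single-quoted string.
inspected = line.inspect

p matches_double
p matches_single
p single_length
p inspected == '"\r\n"'
```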

    label = line[0..line.index(':')-1]
    #get rid of the escape characters
    val = line[line.index(':')+1..line.length].tap{|val|val.slice!('\r\n')}.strip

The ruby god weeps. You can make this much simpler (and more efficient) by using a regexp here:

    line =~ /(.*?): (.*)/
    label = $1
    val = $2.strip

Modifying the string inside #tap is bad style, and #strip removes surrounding whitespace like "\n" and "\r" anyway.
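Here is the regexp approach applied to a couple of header lines. The sample lines are assumptions, not taken from a real request.

```ruby
# Hypothetical header lines, as gets would return them from the socket.
lines = ["Content-Length: 42\r\n", "Host: localhost:8080\r\n"]

headers = {}
lines.each do |line|
  # (.*?) is lazy, so the label ends at the first ": " in the line.
  # Lines that do not match are skipped instead of raising on a nil $1.
  next unless line =~ /(.*?): (.*)/
  headers[$1] = $2.strip
end

p headers
```

Note that this pattern expects a space after the colon, so a line like `Host:example.com` would be skipped by this sketch.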

For the loop you can do the most natural thing and just write the break condition in the while condition part:

    while (line = request.gets) != "\r\n"
      ...
    end
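Putting the suggestions together, a rewritten method might look roughly like the sketch below. It is not a complete HTTP parser. The `StringIO` object stands in for the TCPSocket, the sample request is invented, and the extra nil check on `gets` is my own addition so the loop also ends if the peer closes the connection early.

```ruby
require 'stringio'

# Sketch of the original method with the review suggestions applied.
def parse_http_request(request)
  headers = {}
  headers['Heading'] = request.gets.chomp
  method = headers['Heading'][/[^ ]*/]

  # Read header lines until the blank line that ends the header section.
  # The nil check (an assumption, not in the original) stops at EOF as well.
  while (line = request.gets) && line != "\r\n"
    headers[$1] = $2.strip if line =~ /(.*?): (.*)/
  end

  # A POST carries a body whose size is given by Content-Length.
  if method == 'POST'
    headers['Body'] = request.read(headers['Content-Length'].to_i)
  end

  headers
end

# Hypothetical request used only to exercise the sketch.
raw = "POST /submit HTTP/1.1\r\n" \
      "Host: example.com\r\n" \
      "Content-Length: 11\r\n" \
      "\r\n" \
      "hello=world"

parsed = parse_http_request(StringIO.new(raw))
p parsed
```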

Answer 2

I know this is a bit dated, but I have been working on the same issue. I think the method you are looking for is request.readpartial. You can see the docs here.

This takes a maxlen number of bytes as an argument. If you reach the end of the data from calling IO#readpartial(maxlen), or hit the maximum number of bytes before the end of the incoming data on the TCPSocket, it will return the data.

This can replace the while true and \r\n logic needed to find the end of the incoming data.
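A small sketch of that behavior, using `IO.pipe` as a stand-in for the TCPSocket (an assumption made so the example runs without a network connection). The write end stays open while reading, so no EOF is sent.

```ruby
reader, writer = IO.pipe

# The writer is left open, so the reader never sees EOF while reading.
writer.write("hello=world")

# Returns as soon as data is available, capped at maxlen bytes.
first = reader.readpartial(5)

# Asks for up to 1024 bytes but returns the 6 that are left.
rest = reader.readpartial(1024)

p first
p rest

writer.close
reader.close
```

When no data is available at all, `readpartial` blocks, and at EOF it raises `EOFError`.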

Comment 1

The check for \r\n is there to detect the end of the header, not the end of the input stream. But readpartial might be necessary to read the whole body when no EOF is sent.
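One way to read a body with `readpartial` when no EOF arrives is to keep reading until Content-Length bytes have been collected. The helper name `read_body` is hypothetical, and `IO.pipe` again stands in for the socket.

```ruby
# Hypothetical helper: collects exactly content_length bytes from io.
def read_body(io, content_length)
  body = +''
  while body.bytesize < content_length
    # Never ask for more than what is still missing from the body.
    body << io.readpartial(content_length - body.bytesize)
  end
  body
end

reader, writer = IO.pipe

# The body arrives in two pieces and the writer stays open (no EOF).
writer.write("hello")
writer.write("=world")

body = read_body(reader, 11)
p body

writer.close
reader.close
```

If the peer closes the connection before the full body arrives, `readpartial` raises `EOFError`, which a real server would need to handle.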

Comment 2

One tip: if I were about to write a Ruby web server, I would take a serious look at Ragel. It is a special DSL for writing a parsing state machine, and it can be compiled to Ruby (and also to C if performance becomes an issue). Zed Shaw used it for Mongrel, which was the most popular Ruby web server for a while.

David Ongaro 5394 silver badges15 bronze badges · Answer 1 · 2015-03-12 01:31:38Z

nrako 1311 bronze badge · Answer 2 · 2015-03-11 16:49:20Z

