  • HTTP Headers Scanning
    Posted on January 10, 2009

    In a typical HTTP transaction, the browser requests a specific page from the server.

    Along with the request, the browser sends several lines of header information. These tell the server what types of data the browser can handle, what type of browser it is, and so forth.

    The server responds with the content that was requested, but it precedes that with its own header lines.

    These are not usually revealed to the end user, but they can tell us a great deal about the server and the pages that it hosts.
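
    As a rough illustration, the header block of a response can be split into a status line and a list of name/value pairs. This is only a minimal Python sketch: the function name parse_response_head is made up for this example, the sample text is a shortened copy of the wget output shown below, and folded (continued) header lines are not handled.

```python
def parse_response_head(raw):
    """Split a raw HTTP response head into (status_line, headers).

    headers is a list of (name, value) pairs, because a name such as
    Set-Cookie may legitimately appear more than once.
    """
    lines = raw.strip().splitlines()
    status_line = lines[0].strip()
    headers = []
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header block
        # Split on the first colon only; values such as dates contain colons.
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return status_line, headers


# Hypothetical sample, shortened from the wget output in this article.
sample = """HTTP/1.1 200 OK
Date: Sat, 10 Jan 2009 07:50:26 GMT
X-Powered-By: PHP/5.2.8
Content-Type: text/html; charset=UTF-8
"""

status, headers = parse_response_head(sample)
print(status)
for name, value in headers:
    print(f"{name} = {value}")
```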

    sample@allguru.net [~]# wget -S http://www.allguru.net/index.php
    --00:50:26--  http://www.allguru.net/index.php
    => `index.php'
    Resolving www.allguru.net...
    Connecting to www.allguru.net||:80... connected.
    HTTP request sent, awaiting response...
    HTTP/1.1 302 Moved Temporarily
    Date: Sat, 10 Jan 2009 07:50:26 GMT
    Server: Apache/2.2.11 (Unix) mod_ssl/2.2.11 OpenSSL/0.9.8i DAV/2 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/
    X-Powered-By: PHP/5.2.8
    Expires: Thu, 19 Nov 1981 08:52:00 GMT
    Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
    Pragma: no-cache
    X-Pingback: http://www.allguru.net/xmlrpc.php
    Set-Cookie: PHPSESSID=5066017a518c53d393ebd00d0b382f10; path=/
    Location: http://www.allguru.net/
    Content-Length: 0
    Keep-Alive: timeout=15, max=100
    Connection: Keep-Alive
    Content-Type: text/html; charset=UTF-8
    Location: http://www.allguru.net/ [following]
    --00:50:26--  http://www.allguru.net/
    => `index.html'
    Reusing existing connection to www.allguru.net:80.
    HTTP request sent, awaiting response...
    HTTP/1.1 200 OK
    Date: Sat, 10 Jan 2009 07:50:26 GMT
    Server: Apache/2.2.11 (Unix) mod_ssl/2.2.11 OpenSSL/0.9.8i DAV/2 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/
    X-Powered-By: PHP/5.2.8
    Expires: Thu, 19 Nov 1981 08:52:00 GMT
    Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
    Pragma: no-cache
    X-Pingback: http://www.allguru.net/xmlrpc.php
    Connection: close
    Content-Type: text/html; charset=UTF-8
    Length: unspecified [text/html]

    [ <=>                                                                                                              ] 39,146       223.01K/s

    00:50:27 (222.69 KB/s) - `index.html' saved [39146]

    There are several headers that are common to all servers, as well as others that vary between servers and between different pages on the same server. The twelve headers that are returned with the O’Reilly home page are representative. I will step through each of these and explain their significance. The order has been changed so as to group related headers together.

    Server Response
    HTTP/1.1 200 OK

    The first header announces the protocol that is being used by the server and reports its response to the page request. In this case, the server is using Version 1.1 of the HTTP protocol, which is typical. The number that follows is the server response code. The value 200 signifies that the requested page was found and is being sent back to the browser. The status message OK is a convenience that reminds us what the numeric code means.

    There are more than 30 possible server response codes but most are rarely seen. You are undoubtedly familiar with code 404, which signifies that the requested page was not found. Codes in the 300 series indicate that the browser is being redirected to another page. Redirection is commonly used in scams to conceal the identity of a web site. The mechanism being used can be determined by looking at that specific code, as I discuss in Chapter 4.
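
    The structure of the status line can be sketched in a few lines of Python. The helper names parse_status_line and status_class are hypothetical, and the class labels are the conventional meanings of the first digit of the code rather than anything specific to this server.

```python
def parse_status_line(line):
    """Return (protocol, code, reason) from a line like 'HTTP/1.1 200 OK'."""
    # Split at most twice so a multi-word reason phrase stays in one piece.
    protocol, code, *reason = line.split(" ", 2)
    return protocol, int(code), reason[0] if reason else ""


def status_class(code):
    """Map a numeric status code to its general class by its first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")


for line in ("HTTP/1.1 200 OK",
             "HTTP/1.1 302 Moved Temporarily",
             "HTTP/1.1 404 Not Found"):
    protocol, code, reason = parse_status_line(line)
    print(protocol, code, reason, "->", status_class(code))
```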

    Date: Thu, 20 Jan 2005 17:08:11 GMT

    The Date header is a timestamp of when the server generated this response, which in practice is when the page was downloaded.

    Last-Modified: Thu, 20 Jan 2005 09:19:26 GMT

    This header is the timestamp of when the content of the page was last modified. This is an important piece of information for the browser because it will check its cache to see if it already has a copy of this page. If it does, then it will compare the timestamp from when that was downloaded with the Last-Modified timestamp. If the latter is later than the prior download then it will retrieve the new version. If the cached version was downloaded after the last change, then the browser will use that instead of continuing with the current download.
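
    The timestamp comparison described above can be sketched as follows. This is a simplified model, not how any particular browser is implemented: in practice the browser usually sends its stored date back in an If-Modified-Since request header and lets the server answer with a 304 response if nothing has changed. The function name cached_copy_is_current and the sample dates other than the two taken from this article are assumptions.

```python
from email.utils import parsedate_to_datetime


def cached_copy_is_current(last_modified, cached_at):
    """Return True if the cached copy was fetched at or after the last change.

    Both arguments are HTTP date strings such as
    'Thu, 20 Jan 2005 09:19:26 GMT'.
    """
    return parsedate_to_datetime(cached_at) >= parsedate_to_datetime(last_modified)


last_modified = "Thu, 20 Jan 2005 09:19:26 GMT"

# Cached the day before the page changed: the copy is stale.
print(cached_copy_is_current(last_modified, "Wed, 19 Jan 2005 12:00:00 GMT"))  # False

# Cached after the last change: the copy can be reused.
print(cached_copy_is_current(last_modified, "Thu, 20 Jan 2005 17:08:11 GMT"))  # True
```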

    This header is also of interest from a forensics perspective because it tells us a little about the history of the page. Looking at these dates from a set of related files can help define when the site was created, which would be on or before the earliest date. The most recent change to any of the files can suggest how long the site has been in operation. Fake bank sites, for example, tend to be created immediately before the associated phishing email is sent out.

    ETag: "a4524-d5f6-41ef779e"

    Entity Tags, or ETags, offer an alternative to comparing timestamps. The browser can compare the bit representation of the ETag from the cached version with the one on the web site. If they are identical, then the cached version will be used. This offers a very slight performance improvement over comparing timestamps directly.
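
    An ETag can also carry forensic information of its own. The sketch below assumes the three-part hexadecimal layout (inode, size, modification time) that Apache has used by default for static files; other servers, and Apache installations with a different configuration, build their ETags differently, so treat the decoding as a guess to be checked against the other headers. The function name decode_apache_etag is hypothetical. For the example ETag, the decoded size and time agree with the Content-Length and Last-Modified values shown in this article.

```python
from datetime import datetime, timezone


def decode_apache_etag(etag):
    """Decode an 'inode-size-mtime' style ETag into its three hex fields.

    Assumption: the ETag has exactly three hexadecimal parts and the last
    one is a modification time in seconds since the Unix epoch.
    """
    inode, size, mtime = (int(part, 16) for part in etag.strip('"').split("-"))
    return {
        "inode": inode,
        "size": size,
        "modified": datetime.fromtimestamp(mtime, tz=timezone.utc),
    }


info = decode_apache_etag('"a4524-d5f6-41ef779e"')
print(info["size"])      # 54774
print(info["modified"])  # 2005-01-20 09:19:26+00:00
```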

    Connection and Keep-Alive
    Connection: Keep-Alive
    Keep-Alive: timeout=15, max=500

    The Connection header tells the browser what kind of connection the server would like to establish. Keep-Alive is the usual value for this and means that the connection between the two computers will stay open after this page has been downloaded. The Keep-Alive header sets the limits on that connection: timeout is the number of seconds that an idle connection will stay open, waiting for a new request, and max is the number of requests that the server will accept on the connection before closing it.
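
    The Keep-Alive value is a comma-separated list of name=number parameters, where timeout is a number of seconds for an idle connection and max is a number of requests. A minimal sketch of pulling those values out, using a hypothetical helper named parse_keep_alive that silently skips anything it does not recognize as a number:

```python
def parse_keep_alive(value):
    """Parse a Keep-Alive header value into a dict of integer parameters."""
    params = {}
    for part in value.split(","):
        name, _, number = part.strip().partition("=")
        if number.isdigit():
            params[name.lower()] = int(number)
    return params


print(parse_keep_alive("timeout=15, max=500"))  # {'timeout': 15, 'max': 500}
```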

    Content-Type: text/html

    This tells the browser what type of content it should expect. text/html is the MIME type of a basic web page. This would be different if the document were an audio file or an Excel spreadsheet or some other type of file.

    Content-Length and Accept-Ranges
    Content-Length: 54774
    Accept-Ranges: bytes

    Content-Length tells the browser how large a document is coming its way. Checking this against the number of bytes actually received provides a simple integrity check for the browser. Accept-Ranges tells the browser that it can, if it needs to, request specific pieces of the requested file, rather than the entire thing. This is not very relevant for regular web pages.
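
    The integrity check mentioned above amounts to comparing one number with the length of what arrived. A small sketch, with a hypothetical function name and a plain dict standing in for the parsed headers:

```python
def body_is_complete(headers, body):
    """Compare the declared Content-Length with the bytes actually received.

    Returns None when the server did not send a Content-Length header,
    as in the second response of the wget example in this article.
    """
    declared = headers.get("Content-Length")
    if declared is None:
        return None
    return int(declared) == len(body)


body = b"x" * 54774  # stand-in for the downloaded page
print(body_is_complete({"Content-Length": "54774"}, body))       # True
print(body_is_complete({"Content-Length": "54774"}, body[:100])) # False
print(body_is_complete({}, body))                                # None
```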

    P3P: policyref="http://www.allguru.net/w3c/p3p.xml",
    CP="CAO DSP COR […]

    This header is used by a growing number of sites to disclose their privacy policy to browsers prior to actually downloading any content. P3P stands for Platform for Privacy Preferences, a project of the World Wide Web Consortium (http://www.w3.org/P3P/).

    X-Cache: MISS from www.allguru.net

    Headers with the X- prefix can represent anything the server wants them to. They are equivalent to the X- headers found in email messages. In this example, X-Cache most likely refers to a cache of dynamically generated pages on the O’Reilly web site, suggesting that this site handles its load using something more than a basic web server.

    Server: Apache/1.3.33 (Unix) PHP/4.3.10 mod_perl/1.29

    The most informative header, from a forensics point of view, is the Server header. This tells us what type of web server is responding to this request. Apache is the most common server on the Internet, and its default configuration offers up a surprising amount of detail in this header.

    In this example, you can see that the site is hosted on a system running Unix. The Apache web server is Version 1.3.33, which is widely used although not the most recent release. In addition, it tells us the specific versions of PHP and the mod_perl module. With a bit of work investigating the release history of these packages and their inclusion in different Linux distributions, you could make an educated guess as to when the computer running this server was set up or last updated. Here are some other examples of Server headers taken from various web sites:
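
    Pulling the individual product names and versions out of a Server header is easy to automate. The sketch below is an approximation that assumes the common product/version token layout with optional comments in parentheses; the regular expression and the function name parse_server_header are illustrative, not a complete parser for every server's output.

```python
import re

# A product token looks like name/version, e.g. Apache/1.3.33
PRODUCT = re.compile(r"([A-Za-z0-9_.+-]+)/([^\s()]+)")
COMMENT = re.compile(r"\(([^)]*)\)")


def parse_server_header(value):
    """Return (products, comments) from a Server header value.

    products maps each product name to its version string;
    comments lists the text found inside parentheses, e.g. 'Unix'.
    """
    comments = COMMENT.findall(value)
    without_comments = COMMENT.sub(" ", value)
    products = dict(PRODUCT.findall(without_comments))
    return products, comments


products, comments = parse_server_header(
    "Apache/1.3.33 (Unix) PHP/4.3.10 mod_perl/1.29"
)
print(products)  # {'Apache': '1.3.33', 'PHP': '4.3.10', 'mod_perl': '1.29'}
print(comments)  # ['Unix']
```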

    Apache, version 1.3 on Mac OS X, version 2.0 and a commercial version
    Server: Apache/1.3.29 (Darwin) PHP/4.3.1
    Server: Apache/2.0.51 (Unix)
    Server: Stronghold/2.4.2 Apache/1.3.6 C2NetEU/2412 (Unix) amarewrite/0.1 mod_fastcgi/2.2.12

    Microsoft Internet Information Server, versions 5 and 6
    Server: Microsoft-IIS/5.0
    Server: Microsoft-IIS/6.0

    Sun ONE Web Server
    Server: Sun-ONE-Web-Server/6.1

    Oracle Application Server
    Server: Oracle-Application-Server-10g OracleAS-Web-Cache-10g
    / H;max-age=300+0;age=73)

    Google’s custom web server
    Server: GWS/2.1

    The amount of information revealed varies according to the type of server. Apache is, by default, very generous. But while this works to our benefit when we want to investigate a web site, it can be viewed as a liability when other people use it to learn about the sites that we control.

    Anyone who wants to break into a server is looking for a vulnerability they can exploit. By revealing the specific versions of Apache, PHP, DAV, etc. that we are running on our server, we may be making their life much easier than it needs to be. If someone knows that a certain vulnerability exists in, say, Apache 1.3.29, they can write a simple wrapper script that runs wget on every IP address in a range, and then runs grep on the headers that are returned, looking for the specific version. I show you how you can limit the information in this header in the section “Controlling HTTP Headers” later in this chapter.
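
    The same check is useful defensively, for auditing the servers that you control. The sketch below covers only the grep step: it takes Server headers that have already been collected and reports which hosts advertise a given version. The function name hosts_running is hypothetical, the addresses are placeholders from the reserved documentation range, and the header values are reused from the examples earlier in this article.

```python
def hosts_running(server_headers, product, version):
    """Return the hosts whose Server header advertises product/version.

    server_headers maps a host name or address to its Server header value.
    """
    needle = f"{product}/{version}"
    return sorted(
        host
        for host, header in server_headers.items()
        # Compare whole tokens so that 1.3.2 does not match 1.3.29.
        if needle in header.split()
    )


# Placeholder inventory of our own machines (documentation addresses).
collected = {
    "192.0.2.10": "Apache/1.3.29 (Darwin) PHP/4.3.1",
    "192.0.2.11": "Apache/2.0.51 (Unix)",
    "192.0.2.12": "Microsoft-IIS/6.0",
}

print(hosts_running(collected, "Apache", "1.3.29"))  # ['192.0.2.10']
```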