Implementing a proxy server with webob and paste

I thought this was kind of neat. An HTTP proxy is a useful tool to have, and there are many types for many different purposes. It's a program that sits between a client and a web server. See the wiki article for all the common uses for a proxy. Anyway, after playing with these for a couple of days, I realized I had some needs at work that could be met by a proxy server I could drop custom code into.

Writing your own might seem like a daunting task, but it's surprisingly easy thanks to Ian Bicking's webob and paste. Let's run through some examples.

This is a proxy server that does nothing except pass the request along and return the response. This won't work for https, but it's on my list of things to track down as I get time.

from paste import httpserver
from paste.proxy import TransparentProxy
httpserver.serve(TransparentProxy(), "0.0.0.0", port=8088)

 

To use it, go into your browser preferences and set the HTTP proxy to the machine running the script, on port 8088. The exact steps vary by browser, of course. Now that all http traffic coming from the browser is routed through our program, adding functionality is pretty easy thanks to wsgi middleware and webob.
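A browser isn't the only client that can be pointed at the proxy. As a minimal sketch, assuming the proxy from the first listing is running on the local machine on port 8088, a script can route its own requests through it using only the standard library. The helper name make_proxy_opener is made up for this illustration, and the import is written for Python 3.

```python
import urllib.request


def make_proxy_opener(host="127.0.0.1", port=8088):
    """Build a urllib opener that sends plain-http requests through the proxy.

    make_proxy_opener is a hypothetical helper; host and port are assumed to
    match the httpserver.serve(...) call in the proxy script.
    """
    proxy_url = "http://%s:%d" % (host, port)
    # Only the "http" scheme is mapped, since the proxy does not handle https.
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


# Usage (needs the proxy to be running; example.com is just a placeholder):
#   opener = make_proxy_opener()
#   response = opener.open("http://example.com/", timeout=10)
#   print(response.getcode())
```

This is handy for driving the proxy from a test script instead of clicking around in a browser.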

Here's middleware that will print every request and corresponding response that goes through it.

 
from webob.dec import wsgify
from paste import httpserver
from paste.proxy import TransparentProxy


def print_trip(request, response):
    """
    just prints the request and response
    """
    # single-argument print() calls behave the same on Python 2 and 3
    print("Request\n==========\n\n")
    print(str(request))
    print("\n\n")
    print("Response\n==========\n\n")
    print(str(response))
    print("\n\n")


class HTTPMiddleware(object):
    """
    passes every request/response pair to a record function
    """

    def __init__(self, app, record_func=print_trip):
        self._app = app
        self._record = record_func

    @wsgify
    def __call__(self, req):
        result = req.get_response(self._app)
        try:
            # hand copies to the recorder so it can't alter what gets returned
            self._record(req.copy(), result.copy())
        except Exception as ex:  # return response at all costs
            print(ex)
        return result

httpserver.serve(HTTPMiddleware(TransparentProxy()), "0.0.0.0", port=8088)

Hopefully this example is enough to show you what is possible. It just so happens that I have a need at my day job that would greatly benefit from capturing all the http traffic generated by our automated tests. So I wrote a record function that saves each request/response to a database so that we can analyze things over subsequent releases. But any need you may have that can be expressed as "When a request comes in I need to..." and/or "When a response comes out I need to..." could probably be patched in pretty simply with a piece of middleware around a proxy.
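The database-backed record function isn't shown in this post, so here is a rough sketch of what one could look like, not the version used at work. It assumes SQLite from the standard library, and the table name and columns are invented for the example. The returned function takes the same (request, response) arguments as print_trip.

```python
import sqlite3
import time


def make_sqlite_recorder(conn):
    """Return a record function that stores each round trip in SQLite.

    The "trips" table and its columns are assumptions made for this sketch.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trips ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "recorded_at REAL, request TEXT, response TEXT)"
    )

    def record(request, response):
        # str() is how print_trip serializes the objects, so do the same here.
        conn.execute(
            "INSERT INTO trips (recorded_at, request, response) VALUES (?, ?, ?)",
            (time.time(), str(request), str(response)),
        )
        conn.commit()

    return record
```

It would be plugged in as HTTPMiddleware(TransparentProxy(), record_func=make_sqlite_recorder(conn)). Since the server may handle requests on other threads, a real version would probably want a connection per thread, or a shared connection guarded by a lock.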

This morning I was thinking about my options for content filtering in my household. I've used dansguardian in the past and it was OK, but not exactly easy for a non-computer-literate parent to set up. So as an experiment, I started imagining what challenges I might come across if I wrote one myself.

Well, as far as content filtering it seems like it should be easy once you have access to the request. The webob request object gives you all kinds of goodies to manipulate, so I should be able to write something like...

 

from webob.dec import wsgify
from webob.exc import HTTPForbidden
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2
from paste import httpserver
from paste.proxy import TransparentProxy

class FacebookFilter(object):
    """
    don't allow traffic to facebook
    """
    def __init__(self, app):
        self._app = app

    @wsgify
    def __call__(self, request):
        url_parts = urlsplit(request.url)
        # netloc is the host[:port] part of the requested URL
        if "facebook.com" in url_parts.netloc:
            return HTTPForbidden(body=b"do your homework")
        else:
            return request.get_response(self._app)

httpserver.serve(FacebookFilter(TransparentProxy()), "0.0.0.0", port=8088)
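One thing to watch with the substring test in FacebookFilter is that it also matches hosts such as "notfacebook.com". A slightly stricter check, sketched here with only the standard library, compares the hostname against a list of blocked domains and their subdomains. The names is_blocked and BLOCKED_DOMAINS are hypothetical and not part of webob or paste.

```python
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2

# Example list for this sketch; a real filter would load its own.
BLOCKED_DOMAINS = ("facebook.com",)


def is_blocked(url, blocked_domains=BLOCKED_DOMAINS):
    """True if the URL's host is a blocked domain or one of its subdomains."""
    # hostname drops any port and userinfo that netloc would still contain
    host = (urlsplit(url).hostname or "").lower()
    for domain in blocked_domains:
        if host == domain or host.endswith("." + domain):
            return True
    return False
```

Inside the filter's __call__, the condition would then read is_blocked(request.url) instead of the substring test.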
 

So filtering requests by anything that you can get from the request seems pretty easy. But what I would really like to do is filter content based on who is accessing it. This took a little more work: I actually had to read an RFC and use the HTTPMiddleware I wrote to get an idea of what headers need to be sent and returned.

Once you get used to the language of the RFC (I recommend plenty of coffee, as these aren't the most exciting things in the world to read), it really isn't that bad. You tell the browser it needs to authenticate by returning a 407 response, which is what webob's "HTTPProxyAuthenticationRequired" does. To trigger the browser to prompt for a username and password, you also need to add a "Proxy-Authenticate" header. That's straight from the RFC.
 
The other side of the equation is how the browser gets credentials back to the proxy. That is through the "Proxy-Authorization" header, which as far as I can tell the browser will pass on every request. When you specify "Basic" authentication, the browser sends the credentials as "Basic " followed by the base64 encoding of the whole "user:password" string. As for a ":" in the username or password: the Basic scheme doesn't allow one in the username, so splitting on the first ":" keeps a password that contains one intact.
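To make the header format concrete, here is a small standard-library sketch of both directions: building the value a browser would send, and taking it apart again on the proxy side. The function names are made up for this example. The decoder splits on the first ":" only, and returns None for anything it can't parse.

```python
import base64


def encode_basic_credentials(user, password):
    """Build the header value a browser would send for Basic auth."""
    raw = ("%s:%s" % (user, password)).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_credentials(header_value):
    """Return (user, password) from a Proxy-Authorization value, or None."""
    scheme, _, encoded = header_value.strip().partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded.strip()).decode("utf-8")
    except (ValueError, TypeError):
        # not valid base64, or not valid UTF-8 once decoded
        return None
    # Split on the first ":" only, so the password may itself contain ":".
    user, sep, password = decoded.partition(":")
    if not sep:
        return None
    return user, password
```

Returning None rather than raising lets the caller treat a malformed header the same as a missing one and simply send the 407 challenge again.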
 

import base64
from webob.dec import wsgify
from webob.exc import HTTPProxyAuthenticationRequired
from paste import httpserver
from paste.proxy import TransparentProxy

class AuthProxy(object):
    """
    Auth proxy example
    """

    def __init__(self):
        self._proxy = TransparentProxy()

    @wsgify
    def __call__(self, request):
        if self.authorize(request):
            return request.get_response(self._proxy)
        else:
            return self.challenge()

    def challenge(self):
        # a 407 plus a Proxy-Authenticate header makes the browser prompt
        response = HTTPProxyAuthenticationRequired()
        response.headers.add(
            "Proxy-Authenticate",
            "Basic realm=\"phaeton proxy\"")

        return response

    def authorize(self, request):
        auth_hdr = request.headers.get("Proxy-Authorization")
        if not auth_hdr:
            return False
        else:
            return self.authenticate_user(*self.get_creds(auth_hdr))

    def authenticate_user(self, user, password):
        """
        real weak authentication scheme
        """
        return True

    def get_creds(self, data):
        # the header looks like "Basic <base64 of user:password>"
        hdr_sep_pos = data.find(" ")
        creds_b64 = data[hdr_sep_pos:].strip()
        creds_str = base64.b64decode(creds_b64).decode("utf-8")
        # split on the first ":" and keep it out of the password
        user, _, password = creds_str.partition(":")
        return user, password

httpserver.serve(AuthProxy(), "0.0.0.0", port=8088)

So there you go, a proxy server that won't let any http traffic out without credentials. Though it does nothing with them, surely you can see how it would be possible to take that user and password and verify them against a database, or maybe LDAP, or an .htpasswd file, or whatever.
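As one illustration of the "verify against something" step, here is a sketch that checks credentials against an in-memory mapping of salted password hashes, using only the standard library. The user_store layout, the ITERATIONS value, and the extra user_store argument are all assumptions for this example; the method in AuthProxy only takes user and password, so a real version would hold the store on the instance.

```python
import binascii
import hashlib
import hmac
import os

ITERATIONS = 100000  # assumed work factor for this sketch


def hash_password(password, salt=None):
    """Return (salt_hex, digest_hex) to store instead of the password."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS)
    return (binascii.hexlify(salt).decode("ascii"),
            binascii.hexlify(digest).decode("ascii"))


def authenticate_user(user, password, user_store):
    """Check credentials against a {user: (salt_hex, digest_hex)} mapping."""
    entry = user_store.get(user)
    if entry is None:
        return False
    salt_hex, expected_hex = entry
    # re-hash the offered password with the stored salt
    _, actual_hex = hash_password(password, binascii.unhexlify(salt_hex))
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(actual_hex, expected_hex)
```

A store would be built once with something like {"alice": hash_password("s3cret")}, and could just as well be backed by a database table holding the same two columns.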

To sum up, I might be woefully naive, but it seems that this proxy stuff isn't all that hard. I hope I've effectively demonstrated what is possible in a relatively small amount of code thanks to wsgi, paste and webob. Thanks for reading.

A typo.

Hi, there is a typo in the second example, where you are logging to the console: `return result` is in the `except` clause. I think the formatter ate an indent, or a `finally` clause is missing.

Good catch

You are right, the TinyMCE editor was fighting me tooth and nail on the formatting of this stuff. I fixed it, thanks for the heads up.
