Implementing a proxy server with webob and paste
I thought this was kind of neat. An HTTP proxy is a useful tool to have, and there are many types for many different purposes. It's a program that you put between a client and a web server. See the wiki article for all the common uses for a proxy. Anyway, after playing with these for a couple of days, I realized that some needs I had at work could be met by a proxy server that I could drop custom code into.
Writing your own might seem like a daunting task, but it's surprisingly easy thanks to Ian Bicking's webob and paste. Let's run through some examples.
This is a proxy server that does nothing except pass the request along and return the response. This won't work for https, but it's on my list of things to track down as I get time.
from paste import httpserver
from paste.proxy import TransparentProxy

httpserver.serve(TransparentProxy(), "0.0.0.0", port=8088)
To use it, go into your browser preferences and set the browser's HTTP proxy to the machine running this script, port 8088. How you do that varies by browser, of course. Now that all HTTP traffic coming from the browser is routed through our program, adding functionality is pretty easy thanks to WSGI middleware and webob.
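If you'd rather poke at the proxy from a script than from a browser, something like the following should do. It's a minimal sketch using only the standard library; it assumes the proxy is listening on localhost port 8088, and `make_proxy_opener` is just a name made up for this example.

```python
import urllib.request


def make_proxy_opener(proxy_url="http://localhost:8088"):
    """Build a urllib opener that sends plain http requests through proxy_url."""
    # Only the "http" scheme is mapped, since the proxy above doesn't do https.
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)
```

With the proxy running, `make_proxy_opener().open("http://example.com/")` should send the request through it (example.com is only a placeholder target).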
Here's middleware that will print every request and corresponding response that goes through it.
from webob.dec import wsgify
from paste import httpserver
from paste.proxy import TransparentProxy

def print_trip(request, response):
    """
    just prints the request and response
    """
    print("Request\n==========\n\n")
    print(str(request))
    print("\n\n")
    print("Response\n==========\n\n")
    print(str(response))
    print("\n\n")

class HTTPMiddleware(object):
    """
    serializes every request and response
    """
    def __init__(self, app, record_func=print_trip):
        self._app = app
        self._record = record_func

    @wsgify
    def __call__(self, req):
        # forward the request to the wrapped app (the proxy) first
        result = req.get_response(self._app)
        try:
            # hand copies to the recorder so it can't alter the real objects
            self._record(req.copy(), result.copy())
        except Exception as ex:  # return response at all costs
            print(ex)
        return result

httpserver.serve(HTTPMiddleware(TransparentProxy()), "0.0.0.0", port=8088)
Hopefully this example is enough to show you what is possible. It just so happens that I have a need at my day job that would greatly benefit from capturing all the HTTP traffic caused by our automated tests. So I wrote a record function that saves each request/response pair to a database so that we can analyze things over subsequent releases. But any need you may have that can be expressed as "When a request comes in I need to..." and/or "When a response comes out I need to..." could probably be patched in pretty simply with a piece of middleware around a proxy.
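For illustration, a record function along those lines might look something like this. It's only a sketch, not the one I use at work: SQLite, the `trips` table and the name `make_sqlite_recorder` are all assumptions made for the example. It opens a fresh connection on every call because a SQLite connection can't be shared across the server's worker threads by default.

```python
import sqlite3
import time
from contextlib import closing


def make_sqlite_recorder(path="traffic.db"):
    """Return a record(request, response) function that saves each trip to SQLite."""
    # Create the (hypothetical) table once, up front.
    with closing(sqlite3.connect(path)) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS trips ("
            "id INTEGER PRIMARY KEY, recorded_at REAL, request TEXT, response TEXT)"
        )
        conn.commit()

    def record(request, response):
        # str() gives the same serialized text that print_trip prints.
        with closing(sqlite3.connect(path)) as conn:
            conn.execute(
                "INSERT INTO trips (recorded_at, request, response) VALUES (?, ?, ?)",
                (time.time(), str(request), str(response)),
            )
            conn.commit()

    return record
```

It would plug into the middleware above as `HTTPMiddleware(TransparentProxy(), record_func=make_sqlite_recorder())`.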
This morning I was thinking about what my options are for content filtering in my household. I've used dansguardian in the past and it was OK, but not exactly easy for a non-computer-literate parent to set up. So, as an experiment, I started imagining what challenges I might come across if I wrote a filter myself.
Well, as far as content filtering goes, it seems like it should be easy once you have access to the request. The webob request object gives you all kinds of goodies to manipulate, so I should be able to write something like...
from urllib.parse import urlsplit  # on Python 2: from urlparse import urlsplit
from webob.dec import wsgify
from webob.exc import HTTPForbidden
from paste import httpserver
from paste.proxy import TransparentProxy

class FacebookFilter(object):
    """
    don't allow traffic to facebook
    """
    def __init__(self, app):
        self._app = app

    @wsgify
    def __call__(self, request):
        url_parts = urlsplit(request.url)
        if "facebook.com" in url_parts.netloc:
            # the message is passed as the error detail of the 403 response
            return HTTPForbidden(detail="do your homework")
        else:
            return request.get_response(self._app)

httpserver.serve(FacebookFilter(TransparentProxy()), "0.0.0.0", port=8088)
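One thing to watch with the substring check: "facebook.com" is also a substring of hosts like "notfacebook.com". A slightly stricter match might look like the sketch below. `is_blocked` and its `blocked_domains` parameter are names invented for this example, and it assumes you pass in the netloc (host, optionally with a port).

```python
from urllib.parse import urlsplit


def is_blocked(host, blocked_domains=("facebook.com",)):
    """True if host is one of blocked_domains or a subdomain of one."""
    # Prefixing "//" makes urlsplit treat the value as a netloc; .hostname
    # drops any port and lowercases the name.
    hostname = urlsplit("//" + host).hostname or ""
    return any(
        hostname == domain or hostname.endswith("." + domain)
        for domain in blocked_domains
    )
```

Inside the filter, the condition would then become `if is_blocked(url_parts.netloc):`.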
So filtering requests by anything that you can get from the request seems pretty easy. But what I would really like to do is filter content based on who is accessing it. This took a little more work: I actually had to read an RFC and use the HTTPMiddleware I wrote above to get an idea of which headers need to be sent and returned.
import base64

from webob.dec import wsgify
from webob.exc import HTTPProxyAuthenticationRequired
from paste import httpserver
from paste.proxy import TransparentProxy

class AuthProxy(object):
    """
    Auth proxy example
    """
    def __init__(self):
        self._proxy = TransparentProxy()

    @wsgify
    def __call__(self, request):
        if self.authorize(request):
            return request.get_response(self._proxy)
        else:
            return self.challenge()

    def challenge(self):
        # a 407 with a Proxy-Authenticate header asks the client for credentials
        response = HTTPProxyAuthenticationRequired()
        response.headers.add(
            "Proxy-Authenticate",
            "Basic realm=\"phaeton proxy\"")
        return response

    def authorize(self, request):
        auth_hdr = request.headers.get("Proxy-Authorization")
        if not auth_hdr:
            return False
        else:
            return self.authenticate_user(*self.get_creds(auth_hdr))

    def authenticate_user(self, user, password):
        """
        real weak authentication scheme
        """
        return True

    def get_creds(self, data):
        # the header value looks like "Basic <base64 of user:password>"
        hdr_sep_pos = data.find(" ")
        creds_b64 = data[hdr_sep_pos:].strip()
        creds_str = base64.b64decode(creds_b64).decode("utf-8")
        sep_pos = creds_str.find(":")
        # skip the ":" itself so it doesn't end up in the password
        return creds_str[:sep_pos], creds_str[sep_pos + 1:]

httpserver.serve(AuthProxy(), "0.0.0.0", port=8088)
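To see the header exchange in isolation, here is a small standard-library sketch of what the client puts in Proxy-Authorization for the Basic scheme and how the proxy side can take it apart again. The two function names are made up for the example; they aren't part of webob or paste.

```python
import base64


def build_proxy_auth(user, password):
    """Build the value a client would send in the Proxy-Authorization header."""
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    return "Basic " + token.decode("ascii")


def parse_proxy_auth(header_value):
    """Split a 'Basic <base64>' header value into (user, password)."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("unsupported auth scheme: %r" % scheme)
    decoded = base64.b64decode(token.strip()).decode("utf-8")
    # Only the first colon separates the two; a password may contain colons.
    user, _, password = decoded.partition(":")
    return user, password
```

Splitting on the first colon only means a password containing ":" survives the round trip.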
So there you go: a proxy server that won't let any HTTP traffic out without credentials. It does nothing with them yet, but surely you can see how it would be possible to take that user and password and verify them against a database, LDAP, an .htpasswd file or whatever.
To sum up, I might be woefully naive, but it seems that this proxy stuff isn't all that hard. And I hope I've effectively demonstrated what is possible in a relatively small amount of code thanks to WSGI, paste and webob. Thanks for reading.
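As a rough idea of what a less weak `authenticate_user` could look like, here is a sketch that checks the password against a salted hash. The `UserStore` class is hypothetical: an in-memory dict standing in for whatever database, LDAP directory or password file you would really use, and the iteration count is just a placeholder value.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=100000):
    """Return (salt, digest) for password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


class UserStore(object):
    """Hypothetical in-memory user table; a stand-in for a real backend."""

    def __init__(self):
        self._users = {}

    def add_user(self, user, password):
        self._users[user] = hash_password(password)

    def authenticate_user(self, user, password):
        entry = self._users.get(user)
        if entry is None:
            return False
        salt, expected = entry
        _, actual = hash_password(password, salt)
        # constant-time comparison of the two digests
        return hmac.compare_digest(actual, expected)
```

The `authenticate_user` method has the same `(user, password)` signature as the one in AuthProxy, so AuthProxy could delegate to a store like this instead of returning True.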