The last article demonstrated how to route any HTTPS traffic leaving a virtual bridge network interface through mitmproxy, so that anything can be cached.
In this article, we’ll focus on the logic to “cache anything” from within our mitmproxy script.
We’d like each fetched file to be stored as is in a directory, which makes it easy to tweak the cached content:
- remove a file from the cache directory to force the proxy to re-fetch it,
- fill the cache directory with your own files,
- mount the cache directory in memory (i.e. as a tmpfs directory).
So, for example:
# for apt
http://http.debian.net/debian/pool/main/r/rtmpdump/librtmp-dev_2.4+20111222.git4e06e21-1_amd64.deb
# would be:
cache_directory/http.debian.net/debian/pool/main/r/rtmpdump/librtmp-dev_2.4+20111222.git4e06e21-1_amd64.deb

# for pip
https://pypi.python.org/simple/python-memcached
# would be
cache_directory/simple/python-memcached/index.html

# for elasticsearch plugin installer
https://github.com/mobz/elasticsearch-head/archive/master.zip
# would be
cache_directory/github.com/mobz/elasticsearch-head/archive/master.zip
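The mapping above can be sketched in a few lines of Python. This is only an illustration, not the script used here: the names `CACHE_DIR`, `url_to_path` and `path_to_url` are hypothetical, storing slash-terminated URLs as `index.html` is an assumption, and query strings are ignored. Since the URL scheme is not part of the path, the reverse conversion has to be told which scheme to use.

```python
import os
from urllib.parse import urlsplit

CACHE_DIR = "cache_directory"  # hypothetical location of the cache
INDEX = "index.html"           # assumed file name for URLs ending with a slash


def url_to_path(url, cache_dir=CACHE_DIR):
    """Map a URL to cache_dir/<host>/<path>; the query string is ignored."""
    parts = urlsplit(url)
    path = parts.path
    if not path or path.endswith("/"):
        # A directory-like URL needs a file name to be stored under.
        path += INDEX
    return os.path.join(cache_dir, parts.netloc, path.lstrip("/"))


def path_to_url(path, cache_dir=CACHE_DIR, scheme="http"):
    """Reverse of url_to_path; the scheme is not stored, so it is passed in."""
    rel = os.path.relpath(path, cache_dir).replace(os.sep, "/")
    host, _, rest = rel.partition("/")
    if rest == INDEX or rest.endswith("/" + INDEX):
        # Turn .../index.html back into the slash-terminated URL.
        rest = rest[:-len(INDEX)]
    return f"{scheme}://{host}/{rest}"
```

With this sketch, `url_to_path("https://github.com/mobz/elasticsearch-head/archive/master.zip")` yields the `cache_directory/github.com/...` path shown above, and `path_to_url(..., scheme="https")` turns it back into the original URL.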
We also want HTTP error codes to be reproduced. A quick hackish way to allow our PoC to do that is to symlink the file to the error code. For example:
# for apt
http://debian.saltstack.com/debian/dists/wheezy-saltstack-2014-07/main/i18n/Translation-en.lzma
# would be a symlink to the non-existent 404 file
cache_directory/debian.saltstack.com/debian/dists/wheezy-saltstack-2014-07/main/i18n/Translation-en.lzma -> 404
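As a rough sketch of the symlink trick, assuming a POSIX filesystem: the helper names `store_error` and `cached_status` are made up for this example. An error is recorded as a dangling symlink whose target is the status code, and a lookup reads the target back before falling back to a regular file check.

```python
import os


def store_error(path, status):
    """Record an HTTP error by symlinking path to the (non-existent) status code."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    if os.path.lexists(path):
        # Replace any previous cache entry, dangling symlink included.
        os.remove(path)
    os.symlink(str(status), path)


def cached_status(path):
    """Return the HTTP status to replay for path, or None if nothing is cached."""
    if os.path.islink(path):
        target = os.readlink(path)
        if target.isdigit():
            return int(target)
    if os.path.isfile(path):
        return 200
    return None
```

Removing the symlink from the cache directory forces a re-fetch, exactly like removing a regular cached file.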
So, we’re going to implement:
- two-way conversion between a URL and its download path in the cache directory,
- convert urls ending with a slash to
- fix URLs that are used without a trailing slash but that the server redirects to the trailing-slash form (hello pip and PyPI),
- serve static files from WSGI, with Python’s static module.
There you go:
I’ve noticed that mitmproxy sometimes hangs during an
call. I’m waiting for this to be reproducible again to debug it, but this
might have to do with the IP stack.
It seems like all our tests are passing with this (our job matrix - deployment of 13 major products - is still running at the time of writing).
I like this kind of solution because:
- the cache directory is easy to manage,
- quirks can be fixed (e.g. the pip/PyPI trailing slash issue),
- there is one single generic caching proxy which all developers can use,
- at the office, everyone shares the same internet connection, so a shared cache matters a lot and is very efficient,
- we can build even if our internet connection is down, as long as no new, uncached dependency is pulled in,
- if a new dependency is pulled in while the internet is down, the developer can upload the package they’ve been using to the caching proxy and voilà, it’ll just work.
The downsides are:
- I’m not sure yet how well mitmproxy scales. Once my new CI server comes in, we’ll be able to run the 13 jobs in parallel and put it to a real test. If it doesn’t scale well, I’ll have to find another way to do lazy SSL certificate generation, or pre-generate the certificates.
- it requires setting a couple of iptables rules and installing the root certificate in containers, which may be a bit hard for many users, although your test script can do that for them.
I think I’m going to continue on this path to make an awesome generic caching proxy for CI purposes.