How to host your WPEngine blog in a subdirectory using a reverse proxy
Having your cake and eating it too
Hosting, scaling and maintaining your own blog is neither fun nor typically a good use of your time. For this reason, many companies, especially time-strapped startups, use hosted services. These typically work by pointing a DNS record at the hosted instance, which results in a blog URL format I'm sure you're familiar with: blog.example.com.
Early on in a venture, this approach is entirely sensible: you get a blog up and running with very little operational complexity or cost. It does, however, appear to come at some cost to how your blog content ranks on search engines. While this cost is hard to confirm and quantify, after considering others' experiences and our own data, we at Poparide decided that the time had come to optimise our search engine rankings by moving our blog from blog.poparide.com to www.poparide.com/blog.
As exciting as the subdirectory vs. subdomain debate is, this article is not about that quandary. It is about how you can have your cake and eat it too! That is, how you can use a hosted service and still have your blog served from a subdirectory of your root domain.
Choosing a reverse proxy
The standard approach to serving a whole website from a different domain than the one it is actually hosted on is to use a reverse proxy.
At Poparide, we use WPEngine to host our blog, as our content creators were most comfortable with WordPress. While WPEngine does support placing a reverse proxy in front of their service, they do not support hosting content within a subdirectory such as example.com/blog. We are neither PHP nor WordPress developers, so the prospect of getting mired in the internals of a WordPress instance was very unappealing to us. For this reason, and because we believe it's a cleaner approach, we opted to treat the hosted WordPress instance as a black box and fake access via a subdirectory in the reverse proxy.
Faking the /blog/ subdirectory in our reverse proxy keeps complexity in one place: the reverse proxy configuration. Because no complex WordPress configuration or plugins are required, it also makes it easier for us to move away from WordPress if we ever want to. However, it did add a requirement when it came to choosing a reverse proxy: it needed to be able to dynamically rewrite both the headers and bodies of requests and responses. For example, we needed a request for example.com/blog/some-neat-article to be translated into a request to WPEngine at ourblogname.wpengine.com/some-neat-article, and any links in the response to be converted from ourblogname.wpengine.com/yet-another-article to example.com/blog/yet-another-article.
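To make the two translations concrete, here is a minimal Python sketch of what the proxy has to do. It only models the behaviour and is not how Nginx implements it; the hostnames are the placeholders used above, and `to_upstream_url` and `rewrite_body` are hypothetical helper names chosen for illustration.

```python
import re

PUBLIC_PREFIX = "/blog/"
UPSTREAM_HOST = "ourblogname.wpengine.com"  # placeholder hostname from the text
PUBLIC_BASE = "example.com/blog"            # where the blog should appear to live


def to_upstream_url(public_path: str) -> str:
    """Map a public path such as /blog/some-neat-article to the hosted instance."""
    if not public_path.startswith(PUBLIC_PREFIX):
        raise ValueError(f"not a blog path: {public_path!r}")
    # Strip the fake /blog/ prefix and keep the rest of the path as-is.
    return f"https://{UPSTREAM_HOST}/{public_path[len(PUBLIC_PREFIX):]}"


def rewrite_body(body: str) -> str:
    """Replace every mention of the upstream host (case-insensitively) with the public base."""
    return re.sub(re.escape(UPSTREAM_HOST), PUBLIC_BASE, body, flags=re.IGNORECASE)


print(to_upstream_url("/blog/some-neat-article"))
print(rewrite_body('<a href="https://ourblogname.wpengine.com/yet-another-article">'))
```

One limitation worth keeping in mind: a body rewrite like this never sees response headers, so a redirect whose Location header points at the WPEngine hostname would slip through unchanged.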
We considered several candidates, but in the end Nginx was a clear winner. It is fast, intuitive to configure and proven. HAProxy was a close runner-up but was ultimately disqualified because it does not support rewriting body content. Beyond Nginx's excellent header and body rewriting capabilities, after experimenting with both we felt that it was a lot friendlier to configure than HAProxy.
Configuring the reverse proxy
Nginx proved to be a breeze to configure, and we quickly had a working configuration up and running that handled:
- stripping of /blog/ from the path for requests to WPEngine
- rewriting of references to the WPEngine subdomain in their responses
- removal of headers in both directions that we didn't want passed around (secrets in cookies, for example; be careful not to leak them!)
- caching of most responses
And here’s roughly what our Nginx configuration looks like!
proxy_cache_path /var/nginx/cache levels=1:2 keys_zone=blog_cache:1m max_size=100m inactive=60m use_temp_path=off;

server {
    listen 80;
    server_name example.com;

    # prevent redirects by WP for missing trailing slashes
    rewrite ^([^?#]*/)([^?#./]+)([?#].*)?$ $1$2/$3 permanent;

    # prevent passing of WP cookie back to client
    proxy_hide_header Set-Cookie;
    # hide a range of other, unimportant headers
    proxy_hide_header link;
    proxy_hide_header wpe-backend;
    proxy_hide_header x-pingback;

    # add high-precedence location to avoid proxying WPEngine's robots.txt file
    location /blog/robots.txt { return 404; }

    location /blog/ {
        proxy_pass https://exampleblog.wpengine.com:443/;
        # proxy_set_header is only inherited from the server level when a location
        # defines none of its own, so all request header overrides live here
        proxy_set_header Host exampleblog.wpengine.com;
        # prevent forwarding of cookies
        proxy_set_header Cookie "";
        # ask WPEngine for uncompressed content as otherwise it complicates re-writing
        proxy_set_header Accept-Encoding "";
        # send the hostname via TLS SNI (off by default; likely needed by the upstream)
        proxy_ssl_server_name on;
        # strip /blog/ from the path
        rewrite /blog/(.*) /$1 break;
        # replace all instances of the WPEngine subdomain in the response
        # (subs_filter comes from the third-party ngx_http_substitutions_filter_module)
        subs_filter_types text/html text/css text/xml;
        subs_filter 'exampleblog.wpengine.com' 'example.com/blog' gi;
    }

    # we cache all content for 60 minutes, ignoring WPEngine cache
    # headers but not caching most errors; Expires would otherwise override
    # proxy_cache_valid, and Set-Cookie would otherwise prevent caching
    proxy_cache blog_cache;
    proxy_ignore_headers Cache-Control Expires Set-Cookie;
    proxy_cache_valid 200 302 60m;
    proxy_cache_valid 404 10m;
}
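The server-level rewrite is the least obvious line in the configuration. One quick way to see which paths it would redirect is to run the same pattern through Python's `re` module, which treats this particular pattern the same way PCRE does. This is a sketch for exploring the rule, assuming Nginx matches `rewrite` against the request path without its query string (so the optional third group rarely matters there); `add_trailing_slash` is a hypothetical helper.

```python
import re

# Same pattern as the server-level rewrite in the Nginx configuration.
TRAILING_SLASH = re.compile(r"^([^?#]*/)([^?#./]+)([?#].*)?$")


def add_trailing_slash(path: str) -> str:
    """Return the permanent-redirect target for path, or path itself if the rule does not fire."""
    match = TRAILING_SLASH.match(path)
    if match is None:
        return path
    head, last_segment, suffix = match.groups()
    return f"{head}{last_segment}/{suffix or ''}"


for p in ["/blog", "/blog/some-neat-article", "/blog/some-neat-article/", "/blog/robots.txt"]:
    print(p, "->", add_trailing_slash(p))
```

Paths that already end in a slash, and final segments containing a dot (such as robots.txt or image files), are left alone, so static assets are not redirected.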
Finer points to consider
Leakage of secrets
By default, a reverse proxy will forward all parts of a request to the backend hosting service. If your blog shares a domain with other functionality, as it does when served from a subdirectory, it is easy to leak cookies related to that other functionality to the backend hosting service. This is not good! Test thoroughly and ensure that any cookies you don't want sent through to the backend are properly stripped out. In Nginx in particular, proxy_set_header directives are only inherited from the server level if a location defines none of its own, so a server-level Cookie override can silently stop applying. Consider also preventing browsers from submitting those cookies in the first place by tuning their Path and Domain attributes.
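One way to check whether a cookie would even reach the proxy is to apply the browser's path-matching rule. The Python sketch below implements the path-match algorithm from RFC 6265 (section 5.1.4) and uses the standard library's `http.cookies.SimpleCookie` to build a `Set-Cookie` header. The `session` cookie and its `/app` path are hypothetical stand-ins for whatever your main application sets.

```python
from http.cookies import SimpleCookie


def path_matches(request_path: str, cookie_path: str) -> bool:
    """Path-match rule from RFC 6265, section 5.1.4."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # e.g. cookie path "/app/" covers "/app/settings"
        if cookie_path.endswith("/"):
            return True
        # e.g. cookie path "/app" covers "/app/settings" but not "/application"
        if request_path[len(cookie_path)] == "/":
            return True
    return False


# Hypothetical session cookie for the main app, scoped to /app instead of /.
# No Domain attribute is set, so browsers treat it as host-only.
cookie = SimpleCookie()
cookie["session"] = "s3cret"
cookie["session"]["path"] = "/app"
cookie["session"]["secure"] = True
cookie["session"]["httponly"] = True

print(cookie.output())
for path in ["/app/settings", "/blog/some-neat-article"]:
    print(path, "sends session cookie:", path_matches(path, cookie["session"]["path"]))
```

With a cookie scoped like this, a browser never attaches it to /blog/ requests, so even a misconfigured proxy has nothing to leak. Cookies that genuinely need Path=/ still have to be stripped at the proxy.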
IP being blocked by WPEngine
Since, with this approach, all of your blog traffic reaches your hosting service from a single IP address or a small range of them, it is likely that the fine Ops folk at your hosting provider will notice this and assume it is nefarious. For this reason, it is probably a good idea to reach out to them and request that your IP addresses be whitelisted. WPEngine has proven receptive to doing this for us.
That being said, if you use basic caching as in the example above, the traffic to your blog hosting service will be pretty insignificant and most likely won’t raise alarms.
Increased bandwidth usage
Be aware that hosting your blog this way means all of your blog traffic enters and exits your network, which may result in extra cost depending on your network provider. Caching certainly helps: not every hit on your blog results in a request to your blog hosting service, which reduces the traffic between your proxy and WPEngine. Responses to your readers, however, still leave your network.
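To get a feel for how much caching cuts the traffic between your proxy and WPEngine, here is a deliberately simplified Python model of a single cached URL with a fixed time-to-live, matching the 60-minute `proxy_cache_valid` in the example configuration. It ignores real Nginx details such as the `inactive` timer, `max_size` eviction and concurrent misses, and the traffic pattern is made up for illustration.

```python
def upstream_fetches(request_times_min: list[float], ttl_min: float = 60.0) -> int:
    """Count requests for one URL that miss a fixed-TTL cache and go to the upstream."""
    fetches = 0
    expires_at = None  # nothing cached yet
    for t in sorted(request_times_min):
        if expires_at is None or t >= expires_at:
            fetches += 1              # cache miss: fetch from the blog host
            expires_at = t + ttl_min  # cached copy stays fresh for ttl_min minutes
    return fetches


# Hypothetical traffic: one request per minute to the same article for 24 hours.
requests = [float(minute) for minute in range(24 * 60)]
print(len(requests), "requests ->", upstream_fetches(requests), "upstream fetches")  # 1440 requests -> 24 upstream fetches
```

Under this model, upstream traffic grows with the number of distinct pages read each hour rather than with the number of readers, which is why a cache like the one in the example keeps WPEngine traffic low.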
And yes, this article is being served up to you using the approach described within. Pretty meta, eh? If you have any thoughts or questions, let us know.