Crawlable JavaScript .NET website using Owin

If you want your content to be discovered online these days it needs to be crawlable. If you want your links to show up on Twitter, Facebook, Google+, Pinterest, etc. with a preview image and description, then your site needs to be crawlable. The only problem with this is that the crawlers out there these days are lazy. The web runs on JavaScript, and almost none of the current crawlers will run it (I have noticed some dynamic content showing up in Google search, but only a small amount).

Creating an HTML Snapshot using PhantomJS

If you want your content to be indexed you have to do the work for the lazy bots yourself. That means creating a snapshot of what your site would look like if the bot wasn't lazy and executed your JavaScript. For in-depth information on how this works you can read up on the AJAX crawling spec as defined by Google.

Basically it works like this: you have a URL such as

bret.ferrier.us/#!/tech-blog/crawlable-javascript-dotnet-website-using-owin

and Google will replace the #! with "?_escaped_fragment_=" and send the following request to your server

bret.ferrier.us/?_escaped_fragment_=/tech-blog/crawlable-javascript-dotnet-website-using-owin

The part of a URL after the # is never sent to the server, so this rewrite is what allows your server to see the client URL and respond accordingly.
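
To make the mapping concrete, here is a small sketch of the rewrite in both directions. The class and method names (EscapedFragment, ToCrawlerUrl, ToPrettyUrl) are made up for illustration and are not part of Owin or any library. It is also simplified: it does plain string replacement and only unescapes on the way back, so treat it as an approximation of the spec rather than a full implementation.

```csharp
using System;

public static class EscapedFragment
{
    private const string Marker = "_escaped_fragment_=";

    // Hashbang URL -> the URL a crawler would request instead.
    public static string ToCrawlerUrl(string prettyUrl)
    {
        var index = prettyUrl.IndexOf("#!", StringComparison.Ordinal);
        if (index < 0) return prettyUrl; // no hashbang, nothing to rewrite

        var basePart = prettyUrl.Substring(0, index);
        var fragment = prettyUrl.Substring(index + 2);
        // Append to an existing query string if there is one.
        var separator = basePart.Contains("?") ? "&" : "?";
        return basePart + separator + Marker + fragment;
    }

    // Crawler URL -> the hashbang URL a browser would load.
    public static string ToPrettyUrl(string crawlerUrl)
    {
        var index = crawlerUrl.IndexOf(Marker, StringComparison.Ordinal);
        if (index < 1) return crawlerUrl; // not a crawler request

        // Drop the '?' or '&' that precedes the marker.
        var basePart = crawlerUrl.Substring(0, index - 1);
        var fragment = crawlerUrl.Substring(index + Marker.Length);
        return basePart + "#!" + Uri.UnescapeDataString(fragment);
    }
}
```

Calling EscapedFragment.ToCrawlerUrl on the first URL above should produce the second one, and ToPrettyUrl should take you back again.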

Show me the code!

So enough talk, let's get into the code. Adding a custom piece of middleware to the Owin pipeline that creates and returns an HTML snapshot turns out to be fairly easy with the help of PhantomJS, a headless browser that can run JavaScript.


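app.Use((context, next) =>
{
    // Crawlers ask for a snapshot by sending ?_escaped_fragment_=... instead of #!...
    if (context.Request.QueryString.HasValue && context.Request.QueryString.Value.StartsWith("_escaped_fragment_="))
    {
        // Rebuild the URL a browser would use so PhantomJS loads the real page.
        var actualUrl = context.Request.Uri.AbsoluteUri.Replace("?_escaped_fragment_=", "#!");

        // Dispose the driver so each request does not leave a phantomjs.exe process running.
        string cachedContent;
        using (var driver = new OpenQA.Selenium.PhantomJS.PhantomJSDriver())
        {
            driver.Navigate().GoToUrl(actualUrl);
            cachedContent = driver.PageSource;
        }

        context.Response.StatusCode = 200;
        context.Response.ContentType = "text/html";
        return context.Response.WriteAsync(cachedContent);
    }

    // Ordinary requests continue down the pipeline.
    return next.Invoke();
});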

This code, which is taken from the Startup.cs class used for configuring Owin, adds a piece of middleware that inspects every request and sends it to PhantomJS if the query string shows it is coming from a bot. I haven't used this in any production environment, but I am using it on this site so that I can share posts and have them indexed. In my testing PhantomJS takes about 0.5 seconds to render this site, which is rendered entirely in the browser using AngularJS. I have set up caching for this site: I push the generated HTML to MongoDB so that I only pay the price of rendering each page for bots once.
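
The caching idea can be sketched roughly like this. It is not the code running on this site: it swaps MongoDB for an in-memory dictionary to stay self-contained, and the SnapshotCache class and its GetOrRender method are hypothetical names. The render function passed in is assumed to be whatever produces the HTML, for example the PhantomJS call from the middleware.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class SnapshotCache
{
    // In-memory stand-in for a persistent store such as MongoDB (an assumption of this sketch).
    private readonly ConcurrentDictionary<string, string> _snapshots =
        new ConcurrentDictionary<string, string>();

    // Hypothetical render function: takes a URL and returns the rendered HTML.
    private readonly Func<string, string> _render;

    public SnapshotCache(Func<string, string> render)
    {
        if (render == null) throw new ArgumentNullException("render");
        _render = render;
    }

    // Returns the stored snapshot for the URL, rendering it only on a cache miss.
    public string GetOrRender(string url)
    {
        return _snapshots.GetOrAdd(url, _render);
    }
}
```

In the middleware you would call GetOrRender(actualUrl) instead of driving PhantomJS directly. One caveat: under concurrent requests for the same uncached URL, GetOrAdd may run the render function more than once, although only one result is kept.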

Details

This should work in any ASP.NET site, but I have only tested it here on my blog, which uses only Web API and not MVC or WebForms. To get the above snippet to work you will also need to add the "Selenium.WebDriver" and "phantomjs.exe" NuGet packages to your project.