Partial Html Cache Clearing

May 26, 2011
Tags: Caching, Sitecore

*The code was updated 8/2/2011 with some insight from Mrunal Brahmbhatt. Much appreciated.

I recently upgraded my system from Sitecore 6.2 to Sitecore 6.4 and was pumped about a lot of the new features, like multi-browser support and the new Rich Text Editor, but mostly about the new built-in multi-target cache management system. I have to say that when I heard the words "partial cache clearing" I completely misunderstood what they meant. I thought it meant partial html cache clearing: when a single page got published, just that item would be removed from the cache. The truth is that when anyone publishes anything, all html cache is cleared for every site listed under the "publish:end" or "publish:end:remote" event in web.config. Sitecore manages a lot more cache than just html cache, so by their thinking, clearing only the html cache is clearing just part of all the cache they're working with.

In that sense they're right, but this strategy is a problem for my particular system because of the large number of sites and editors working on it at any given time. The continually growing number of sites means that I rely on the html cache a lot to minimize the workload on the servers and keep sites loading quickly. I had solved this same issue when working with the Stager Module, but doing it with this new system is a bit different.

Before I go into the details, I will say that I am expecting that you have already set up your system as a multi-target platform and have properly configured your ScalabilitySettings.config file. If you're looking for more on how to set up a multi-target platform, I'd suggest starting with the scaling guide on SDN, which answered all my questions about how to get it working.

Okay, so the first thing you'll need to do is add a new HtmlCacheClearer class to your library. I copied and modified mine from the Sitecore.Publishing.HtmlCacheClearer class.

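using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using Sitecore.Caching;
using Sitecore.Configuration;
using Sitecore.Data;
using Sitecore.Data.Events;
using Sitecore.Data.Items;
using Sitecore.Diagnostics;
using Sitecore.Sites;
using Sitecore.Web;

// Declare this class inside the namespace that your handler's type attribute
// names (YourNamespace.Publishing in the config further down).
public class HtmlCacheClearer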
	{
		// Fields
		private readonly ArrayList _sites = new ArrayList();

		// Methods
		public void ClearCache(object sender, EventArgs args) {
			Assert.ArgumentNotNull(sender, "sender");
			Assert.ArgumentNotNull(args, "args");

			// This runs on the remote targets: only handle the remote publish-end event.
			PublishEndRemoteEventArgs pargs = args as PublishEndRemoteEventArgs;
			if (pargs != null) {

				ID did = new ID(pargs.RootItemId);
				Assert.IsNotNull(did, "publish root item id");
				Database db = Sitecore.Configuration.Factory.GetDatabase(pargs.TargetDatabaseName);
				if (db != null) {
					Item rootItem = db.GetItem(did);
					if (rootItem != null) {
						List<SiteInfo> siList = GetSiteInfo(rootItem);
						foreach(SiteInfo si in siList){
							SiteContext sc = Factory.GetSite(si.Name);
							if (sc != null) {
								HtmlCache htmlCache = CacheManager.GetHtmlCache(sc);
								if (htmlCache != null) {
									htmlCache.Clear();
								}
							}
						}
					}
				}
			}  
		}

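		public List<SiteInfo> GetSiteInfo(Item i){
			// Match against the published root item that was passed in. Sitecore.Context.Item
			// is not available here because this handler runs outside of a page request.
			// Sites without a start item are skipped (an empty string would match every path),
			// and the comparison ignores case because Sitecore paths are not case-sensitive.
			return Factory.GetSiteInfoList().Where(siteInfo => !string.IsNullOrEmpty(siteInfo.StartItem) && i.Paths.ContentPath.IndexOf(siteInfo.StartItem, StringComparison.OrdinalIgnoreCase) >= 0).ToList();
		}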

		// Properties
		public ArrayList Sites {
			get {
				return this._sites;
			}
		}
	}
   

Now you're going to want to add a reference to this class in your web.config file's events section under the "publish:end:remote" event. You'll want to change the handler from:

<event name="publish:end:remote">
	<handler type="Sitecore.Publishing.HtmlCacheClearer, Sitecore.Kernel" method="ClearCache">
		<sites hint="list">
			<site>website</site>
		</sites>
	</handler>
</event>

to:

<event name="publish:end:remote">
	<handler type="YourNamespace.Publishing.HtmlCacheClearer, YourLibrary" method="ClearCache">
		<sites hint="list">
			<site>website</site>
		</sites>
	</handler>
</event>

What changed in this modified method is that we take whatever root item was published, use it to determine which sites were affected, and clear all of the html cache for just those sites. You could, and I may later, change it so that it only removes the html cache keys that reference that single item's path and id, but for now this is an improvement and I don't have a lot of time to test the best way to do that. If you're looking to do it yourself, I'd start by checking whether the "PublishMode" property of the PublishEndRemoteEventArgs is set to "SingleItem". That tells you whether you also need an axes query for all descendants so that their paths and ids can be removed from the cache keys as well.

Also, with this method you won't need to add every new site to the list of sites in this event, unless you modify the code to use the Sites ArrayList in this class. For this example I just pulled the sites from Factory.GetSiteInfoList().
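
To make the per-item idea a bit more concrete, here is a minimal sketch in plain C# with no Sitecore references. Everything in it is hypothetical: PublishedItem and KeysToRemove are names made up for illustration, the cache keys are modeled as plain strings, and it assumes the keys actually contain the item's path or id, which you would need to verify against your own renderings' cache settings. For anything other than a single-item publish, you would pass the descendants in along with the root item.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a published item: just the two values to match on.
public sealed class PublishedItem
{
    public string Path { get; set; }
    public Guid Id { get; set; }
}

public static class PartialCacheSketch
{
    // Returns the cache keys that mention any of the published items by path or id.
    // Assumption: keys embed the item path or the braced id somewhere in their text.
    public static List<string> KeysToRemove(IEnumerable<string> cacheKeys, IEnumerable<PublishedItem> items)
    {
        // Collect every string to search for: each item's path and its id in {braces}.
        List<string> needles = items
            .SelectMany(item => new[] { item.Path, item.Id.ToString("B") })
            .Where(needle => !string.IsNullOrEmpty(needle))
            .ToList();

        // Keep the keys that contain at least one needle, ignoring case.
        return cacheKeys
            .Where(key => needles.Any(needle => key.IndexOf(needle, StringComparison.OrdinalIgnoreCase) >= 0))
            .ToList();
    }
}
```

You would then remove each returned key from the affected site's html cache instead of clearing the whole thing. One caveat of plain substring matching: the path /sitecore/content/home also matches keys for /sitecore/content/homepage, so it can remove a little more than strictly necessary, which is safe but not minimal.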

It should be noted that Sitecore does not run this event during a normal page lifecycle, and there won't be any access to the Log. While the Sitecore application is running, it checks the event queue for changes at an interval set in the web.config file. When it sees a publish event and determines that its own instance name doesn't match the name of the instance that published, it raises this event. If you do attempt to modify this code, you won't be able to rely on the typical messaging methods to know whether your code is failing. I was able to at least write messages to an html cache key and read them using the Cache Manager. It's not pretty, but it worked. So here's to making things work.